Bridging Classical and Quantum with SDP initialized warm-starts for QAOA

REUBEN TATE and MAJID FARHADI, Georgia Institute of Technology, USA
CRESTON HEROLD and GREG MOHLER, Georgia Tech Research Institute, USA
SWATI GUPTA, Georgia Institute of Technology, USA

We study the Quantum Approximate Optimization Algorithm (QAOA) in the context of the Max-Cut problem. Noisy quantum devices are only able to accurately execute QAOA at low circuit depths, while classically-challenging problem instances may call for a relatively high circuit-depth. This is due to the need to build correlations between reachable pairs of vertices in potentially large graphs [16]. To enhance the solving power of low-depth QAOA, we introduce a classical pre-processing step that initializes QAOA with a biased superposition of possible cuts in the graph, referred to as a warm-start. In particular, we initialize QAOA with a solution to a low-rank semidefinite programming relaxation of the Max-Cut problem. Our experimental results show that this variant of QAOA, called QAOA-warm, is able to outperform standard QAOA on lower circuit depths in solution quality and training time. While this improvement is partly due to the classical warm-start, we find strong evidence of further improvement using the QAOA circuit at small depth. We provide experimental evidence of improved performance as well as theoretical properties of the proposed framework.

CCS Concepts: • Hardware → Quantum computation; • Theory of computation → Network optimization; Semidefinite programming.

Additional Key Words and Phrases: QAOA, approximation algorithms, graph theory, Max-Cut, warm-starts

1 INTRODUCTION

There is growing interest in utilizing near-term quantum technology [44] to solve challenging problems in combinatorial optimization. Farhi et al. [17] recently introduced the Quantum Approximate Optimization Algorithm (QAOA), designed specifically for combinatorial optimization problems.
This is a hybrid quantum-classical algorithm, where the state of a quantum processor is controlled by variational parameters $\gamma$ and $\beta$, which are optimized using a classical processor.

Corresponding Author
Authors' addresses: Reuben Tate, reubent@gatech.edu; Majid Farhadi, farhadi@gatech.edu, Georgia Institute of Technology, Atlanta, Georgia, USA, 30332; Creston Herold, creston.herold@gtri.gatech.edu; Greg Mohler, greg.mohler@gtri.gatech.edu, Georgia Tech Research Institute, Atlanta, Georgia, USA, 30332; Swati Gupta, swatig@gatech.edu, Georgia Institute of Technology, Atlanta, Georgia, USA, 30332.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2022 Copyright held by the owner/author(s). 2643-6817/2022/7-ART https://doi.org/10.1145/3549554 ACM Trans. Quantum Comput.

We consider the Max-Cut problem, which is one of the most studied problems in combinatorial optimization. Given a simple weighted graph $G = (V, E)$ with vertex set $V = [n]$, edge set $E \subseteq \binom{V}{2}$, and weights $w : E \to \mathbb{R}$, the Max-Cut problem is to find a partition of $V$ into two sets $S, V \setminus S \subseteq V$ such that the total weight of the edges that are cut by this partitioning, i.e., $\mathrm{cut}(S) := \sum_{e \in E} w_e \cdot \mathbb{1}[e \in S \times (V \setminus S)]$, is maximized. The best-possible (under Unique Games Conjecture) approximation ratio for solving the Max-Cut problem in polynomial time is 0.878 (for graphs with non-negative edge weights), and is given by the seminal Goemans-Williamson (GW) algorithm [22].
This method creates a random partition of the vertex set using the solution to a convex relaxation of Max-Cut using a semi-definite program (SDP). Max-Cut is not only NP-hard to compute but also is hard to classically approximate better than a multiplicative factor of 16/17 for non-negative edge weights [24, 48]. Graphs with both positive and negative edge weights seem harder to find approximate solutions for (e.g., see [12] for a related problem).

Given a perfect noise-free quantum computer, on the other hand, the QAOA algorithm is able to converge to the optimal solution as the number of QAOA stages $p$ increases. This is due to QAOA's asymptotic equivalence to the Quantum Adiabatic Algorithm (QAA) as $p \to \infty$ [17]. The caveat is that increasing $p$ not only increases the number of parameters to be optimized, but makes the circuit more susceptible to quantum noise. Sufficiently deep circuits are effectively inaccessible due to the practical limitations of current and near-term quantum hardware.

In this work, we study the impact of low-rank local optima for Max-Cut relaxations as initializations for QAOA. We refer to these non-standard initializations of the QAOA algorithm as warm starts (following the classical optimization literature [4, 36, 45]) and explore their impact on the performance of the hybrid variational method. Our warm-starts are separable, and are based on a local optimum of Burer-Monteiro's low-rank relaxation of Max-Cut on a given graph [5, 10]. We use standard QAOA mixers with this initialization and refer to this variant as QAOA-warm.

A key result of our study is that, in our numerical simulations, QAOA-warm typically outperforms standard QAOA in quality of solution for low $p$-depth. In particular, we perform numerical simulations on 1264 graph instances of up to 12 nodes. We find that QAOA-warm achieves a higher instance-specific approximation ratio than standard QAOA for 96.8%, 90.0%, 72.8%, and 53.6% of instances for circuit depths $p = 1, 2, 4, 8$ respectively.
While this improvement is partly due to the classical warm-start, we find that the improvement due to the QAOA circuit on the warm-start is significant, e.g., at least 50% improvement in the instance-specific approximation ratio at $p = 1$ compared to $p = 0$ (warm-start initialization) on 74 instances, and at least 80% improvement on 22 instances at depth $p = 1$.

We also explore the variational parameter space with and without initializations using warm-starts, and show interesting theoretical properties for warm-starts. For QAOA-warm at $p = 1$, our numerical simulations indicate that the energy landscape frequently has a more ridge-like structure which can potentially be exploited in regards to optimization of the variational parameters. Additionally, with QAOA-warm, our simulations show an overall improvement in the expected cut values across the landscape.

For graphs with non-negative edge weights, it is known that standard QAOA starts with an approximation ratio of 0.5 at $p = 0$ (as measuring the initial $|+\rangle^{\otimes n}$ state yields the same result as classically sampling all possible cuts uniformly). In contrast, given any graph with non-negative edge weights and a $\kappa$-close solution to the Burer-Monteiro relaxation of the problem (that is, the Burer-Monteiro objective value of the solution is at least a $\kappa$ fraction of the optimal cut) in 2 or 3 dimensions, we prove our pre-processing stage allows QAOA-warm to guarantee at least a $0.75\kappa$ or $0.66\kappa$ approximation (respectively) at any depth, in particular $p = 0$. This general bound augments the current literature where provable guarantees for standard QAOA at low depth are only known for special cases, e.g., regular graphs. For $n$-node even cycles, depth-$p$ QAOA achieves an approximation ratio of at most $\frac{2p+1}{2p+2}$ whenever $n > 2p$ [37]. It is conjectured that the approximation ratio is exactly $\frac{2p+1}{2p+2}$ [17]. On the other hand, for these even cycles, the warm-starts simply result in an optimal cut due to the antipodal structure of locally optimal solutions to the utilized Burer-Monteiro relaxation.

Footnote: Under the Unique Games Conjecture (UGC), a 0.878-approximation ratio is the best ratio we can hope to achieve in polynomial time, which simply means that assuming UGC (and $P \neq NP$) there does not exist a polynomial-time classical algorithm for Max-Cut with a $(0.878 + \epsilon)$-approximation ratio, for any $\epsilon > 0$ [28, 29, 39]. Meanwhile, if only $P \neq NP$ is assumed, it is known that there does not exist a theoretical polynomial-time classical algorithm for Max-Cut that achieves an approximation ratio of greater than $0.941 + \epsilon$ for any $\epsilon > 0$ [24, 48].

In order to give a fair comparison with QAOA, we explore the limitations of QAOA-warm at higher circuit depths. We prove that QAOA-warm does not guarantee convergence to the optimal solution as $p \to \infty$ for certain 3-dimensional initializations (see Theorem 3). This is to be expected since standard QAOA's equivalence to the Quantum Adiabatic Algorithm is dependent on the fact that the initial state $|+\rangle^{\otimes n}$ is a maximum energy eigenstate of the mixing Hamiltonian, but this will not typically be the case for the initial state of QAOA-warm. In other words, for any graph $G$, there exists a graph-dependent depth $p$ such that standard QAOA does at least as well as QAOA-warm.

Related Work. There have been many different approaches to improving QAOA. Zhou et al. [51] proposed the INTERP and FOURIER heuristics to improve parameter optimization. These approaches bootstrap QAOA parameter initialization to the QAOA solver itself, and do not use any classical-side optimization. Zhu et al. [52] introduced layer-dependent mixer operators that rely on an ansatz for the QAOA states.
Sack and Serbyn [46] meanwhile focused on QAOA parameter optimization by connecting QAOA more closely to its quantum adiabatic origins. Our approach meanwhile leverages the considerable body of work on classical solvers. Bärtschi et al. [2] altered the mixing term to use a Grover-like circuit. However, their approach is not well suited to Max-Cut as it does not have a subspace of preferred states. Unreachable states that are independent of initial conditions were explored in [1], and a contrast was drawn between these states and the barren plateau problem, where poor initialization results in inefficient optimization. Our work connects the two cases, finding cases where initial states fail to mix properly and yield low-value approximate solutions.

In a recent parallel study by Egger et al. [15], the authors explore two warm-start techniques. In the former, they perform a continuous relaxation of variables for a Quadratic Unconstrained Binary Optimization (QUBO) and modify the mixer in a way that ensures one achieves optimality as the circuit depth tends to infinity. In the latter, they initialize QAOA based on a single cut that is classically obtained from a Max-Cut instance, and then modify the mixer so that the value of that specific cut can be recovered at depth-1 QAOA. Our approach, on the other hand, is to use low-rank local optima for relaxations to Max-Cut (with rank greater than 1). Additionally, Egger et al. [15] modify the mixer so that the warm start is the lowest energy eigenstate, while we maintain the standard mixer in this work. Overall, since our approach allows more flexibility in the initialization of warm-starts, it ultimately translates to improvements in performance, especially at low-circuit depths (as discussed in Appendix E).

Outline. We believe our study draws interesting connections between classical and quantum hybrid algorithms while positively impacting the performance of the QAOA algorithm.
In the rest of this paper, we review QAOA in Section 2.1 and the Goemans-Williamson algorithm as well as the low-rank Burer-Monteiro formulation for it in Section 2.2, we introduce our key ideas as a preprocessing stage in Section 3, present our computational and theoretical results in Sections 4 and 5 respectively, and conclude with a discussion and open questions in the final section.

Footnote: QUBO is in fact equivalent to Max-Cut [14], and therefore all our results apply to QUBO as well.

2 QUANTUM AND CLASSICAL OPTIMIZATION ALGORITHMS

Before delving into the relevant algorithms in quantum and classical settings, we first define the notion of approximation ratio (AR) for Max-Cut in general weighted graphs. In the QAOA literature, many have adopted the term approximation ratio to mean the performance of a single run of an algorithm on a particular instance. This ratio is typically "normalized" to lie in the interval $[0, 1]$ and is well-defined even when Max-Cut($G$) $= 0$. Given an expected cut value $E_{A,G}$ of the cut obtained using algorithm $A$ on graph $G$, we call such a ratio $\alpha_{A,G}$ the normalized instance-specific approximation ratio and define it as

$$\alpha_{A,G} = \frac{E_{A,G} - \text{Min-Cut}(G)}{\text{Max-Cut}(G) - \text{Min-Cut}(G)}; \qquad (1)$$

for brevity, we will simply use the term instance-specific approximation ratio (or simply the empirical approximation ratio) to gauge the performance of variants of QAOA over the family $\mathcal{G}$ of graph instances we consider (see Section 4.1). We also note that in the classical optimization literature, the term approximation ratio is usually reserved for a theoretical lower bound on the performance of a particular algorithm across all problem instances. We will call such a bound the theoretical worst-case approximation ratio or simply the approximation ratio when clear from context.
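The normalized ratio in Equation (1) can be made concrete with a small brute-force sketch. This is only an illustration for tiny graphs: the function names and the edge-list format `(i, j, weight)` are assumptions introduced here, and Min-Cut and Max-Cut are obtained by enumerating all $2^n$ vertex labelings rather than by any method from the paper.

```python
from itertools import product

def cut_value(edges, assignment):
    """Total weight of edges whose endpoints receive different labels."""
    return sum(w for (i, j, w) in edges if assignment[i] != assignment[j])

def normalized_ratio(expected_cut, edges, n):
    """Equation (1): (E - Min-Cut) / (Max-Cut - Min-Cut), by brute force."""
    values = [cut_value(edges, bits) for bits in product((0, 1), repeat=n)]
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("ratio is undefined when Max-Cut equals Min-Cut")
    return (expected_cut - lo) / (hi - lo)

# Unit-weight triangle: Max-Cut = 2 and Min-Cut = 0 (the empty cut).
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
# A uniformly random cut of the triangle has expected value 1.5.
ratio_uniform = normalized_ratio(1.5, triangle, 3)
```

Because the minimum cut is subtracted, the ratio stays in $[0, 1]$ even when some weights are negative and Max-Cut($G$) is zero, which is the case the normalization is meant to handle.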
For both standard QAOA and our proposed variant QAOA-warm on a particular graph $G$, we obtain a final quantum state $|\psi\rangle$ and $E_{A,G}$ is defined as the expected cut value obtained from measuring $|\psi\rangle$ in the computational basis. Both the Burer-Monteiro and Goemans-Williamson algorithms yield a collection of points $x = (x_1, \ldots, x_n)$ on a hypersphere and $E_{A,G}$ is defined to be the expected value of the cut obtained from performing randomized hyperplane rounding on $x$.

For our simulations in Section 4, we calculate $E_{A,G}$ exactly. Since we are using a full-state vector classical simulator for QAOA (instead of an actual quantum device), we can directly calculate $E_{A,G}$ by reading off the amplitudes of the final quantum state (as opposed to simulating several quantum measurements and approximating $E_{A,G}$ with an empirical average). Similarly, when working with the Burer-Monteiro or GW algorithm, $E_{A,G}$ is computed exactly by analyzing the angles between the points on the hypersphere (as opposed to actually performing hyperplane rounding and approximating $E_{A,G}$ with an empirical average). Precise formulas for $E_{A,G}$ are provided in Sections 2.1 and 2.2.

2.1 The Quantum Approximate Optimization Algorithm

In this section, we review the hybrid quantum-classical algorithm QAOA for the Max-Cut problem. QAOA assigns a quantum spin to every binary output variable. In each of the $p$ layers of the algorithm, the problem Hamiltonian $H_C$ and a mixing Hamiltonian $H_B = \sum_{j \in [n]} \sigma_j^x$, where $\sigma_j^a$ is a Pauli matrix for qubit $j$ with $a = x, y, z$, are alternately applied to the initial quantum processor state $|\psi_0\rangle$, generating a variational wavefunction

$$|\psi_p(\gamma, \beta)\rangle = e^{-i\beta_p H_B} e^{-i\gamma_p H_C} \cdots e^{-i\beta_1 H_B} e^{-i\gamma_1 H_C} |\psi_0\rangle,$$

where $|\psi_0\rangle = |+\rangle^{\otimes n}$ is the standard initial state. Sampling from the final variational state will yield a cut with an expected cut value of:

$$F_p(\gamma, \beta) = \langle \psi_p(\gamma, \beta) | H_C | \psi_p(\gamma, \beta) \rangle.$$
For the maximization problem Max-Cut, the cost Hamiltonian for a graph $G = (V, E)$ (with weights $w : E \to \mathbb{R}$) can be written as

$$H_C = \frac{1}{2} \sum_{(i,j) \in E} w_{ij} \left(1 - \sigma_i^z \sigma_j^z\right).$$

The (near) optimal parameters of the algorithm, $\gamma, \beta$, are found by a classical algorithm to maximize the performance of the QAOA algorithm, with $F_p(\gamma, \beta)$ viewed as a multi-dimensional non-convex function. We let $M_p$ denote the expected cut value with the optimal choice of parameters $\gamma, \beta$, i.e., $M_p = \max_{\gamma, \beta} F_p(\gamma, \beta)$.

Footnote: In particular, for a randomized algorithm $A$ for Max-Cut, the theoretical computer science literature refers to an approximation ratio of $\alpha$ if

$$\alpha \le \min_{G \in \mathcal{G}} \frac{\mathbb{E}[\mathrm{cut}(S_{A,G})]}{\text{Max-Cut}(G)}, \qquad (2)$$

where $(S_{A,G}, V(G) \setminus S_{A,G})$ is the (random) cut returned by $A$ (on $G$), the expectation is over the randomness of algorithm $A$, and the minimum is taken as the worst-case over all positive weighted graphs $\mathcal{G}$. For example, the 0.878 bound for Goemans-Williamson is a worst-case bound, where the minimum is taken over all positive weighted graph instances.

It is not difficult to see that $M_p$ is a non-decreasing function in $p$; moreover, as previously mentioned, $M_p \to \text{Max-Cut}(G)$ as $p \to \infty$ [16]. For graphs with non-negative edge-weights, the ratio $M_p / \text{Max-Cut}(G) \ge 0.5$ for all $p \ge 0$ due to the 0.5-approximation ratio achieved at $p = 0$ for the standard initialization.

To find the optimal variational parameters, one can simply perform a dense grid search for $(\gamma, \beta) \in [-\pi, \pi]^{2p}$, but this would be feasible only for small circuit depths. For scalability, one can instead treat $F_p(\gamma, \beta)$ as a black-box and utilize a classical optimizer to (iteratively) update and find suitable values of $\gamma$ and $\beta$ in an effort to maximize the expected cut value. Any classical optimization algorithm $A$ will eventually terminate at some $(\gamma, \beta) = (\hat{\gamma}, \hat{\beta})$; in the context of instance-specific approximation ratio (Equation 1), the expected cut value is $E_{A,G} = F_p(\hat{\gamma}, \hat{\beta})$.
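To make the definition of $F_p(\gamma, \beta)$ concrete, the following is a minimal full-state-vector sketch in pure Python. It assumes the conventions $e^{-i\gamma H_C}$ and $e^{-i\beta H_B}$ with $H_C$ diagonal and equal to the cut value on each computational basis state; the function names, the `(i, j, weight)` edge format, and the optional `initial` argument (a hook for a non-standard starting state) are illustrative choices, not the authors' implementation.

```python
import cmath
import math

def cut_values(n, edges):
    """Cut value of every computational basis state z (bit q of z labels vertex q)."""
    return [sum(w for i, j, w in edges if ((z >> i) & 1) != ((z >> j) & 1))
            for z in range(2 ** n)]

def apply_mixer(state, n, beta):
    """Apply exp(-i*beta*X) = cos(beta) I - i sin(beta) X to every qubit, in place."""
    c, s = math.cos(beta), -1j * math.sin(beta)
    for q in range(n):
        bit = 1 << q
        for z in range(len(state)):
            if not z & bit:
                a0, a1 = state[z], state[z | bit]
                state[z], state[z | bit] = c * a0 + s * a1, s * a0 + c * a1
    return state

def qaoa_expectation(n, edges, gammas, betas, initial=None):
    """Expected cut value F_p after p = len(gammas) alternating layers."""
    costs = cut_values(n, edges)
    dim = 2 ** n
    # Default start: the uniform superposition |+>^n.
    state = list(initial) if initial is not None else [1 / math.sqrt(dim)] * dim
    for gamma, beta in zip(gammas, betas):
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, costs)]
        state = apply_mixer(state, n, beta)
    return sum(abs(a) ** 2 * c for a, c in zip(state, costs))

triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
depth0 = qaoa_expectation(3, triangle, [], [])           # uniform sampling of cuts
single_edge = [(0, 1, 1.0)]
depth1 = qaoa_expectation(2, single_edge, [math.pi / 2], [math.pi / 8])
```

Under these conventions the depth-0 value on the triangle is the uniform-sampling average, and for a single edge the depth-1 choice $\gamma = \pi/2$, $\beta = \pi/8$ puts all probability on the two cut states. The exponential memory cost limits this approach to small $n$.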
To optimize the variational parameters, we consider four choices of the optimizer: ADAM [30], COBYLA [43], Nelder-Mead [20], and BFGS [19]. Since $F_p(\gamma, \beta)$ is non-convex, classical optimizers are not guaranteed to stop at a globally optimal choice of $\gamma$ and $\beta$, i.e., the expected result of QAOA will not always be equal to $M_p$ (i.e., the expected result of QAOA had we initialized $\gamma$ and $\beta$ optimally). ADAM and BFGS operate with first-order information (i.e., using gradient estimates), whereas COBYLA and Nelder-Mead operate with zeroth-order information (i.e., function value estimates). On quantum devices, gradients are estimated using multiple evaluations of the function $F_p(\gamma, \beta)$ at various $(\gamma, \beta)$; these function evaluations are noisy since $F_p(\gamma, \beta)$ itself is estimated by taking an average of multiple quantum measurements. For this reason, gradient-free optimizers are typically more robust against quantum noise and are recommended in practice over gradient-based methods [32]. Application of machine learning techniques for optimizing the variational parameters (a technique known as meta-learning) has also shown promise in the noisy quantum setting [50]. Recent results regarding the concentration of the (standard) QAOA landscape can also be used to speed up optimization of the variational parameters [8]. Even though the runtimes for various optimizers may significantly differ, we find that the choice of the optimizer has a much smaller impact on the instance-specific approximation ratio achieved for QAOA-warm (discussed in Section 4).

2.2 Classical Optimization Algorithms

In this section we review two classical approximation algorithms for Max-Cut. Recall that given a (weighted) graph $G = (V, E)$ with weights $w : E \to \mathbb{R}$, the Max-Cut problem is to find a partitioning of the vertices into two subsets, $S$ and $\bar{S} = V \setminus S$, that maximizes the total weight of the cut edges, i.e.,

$$\text{Max-Cut}(G) = \max_{S \subseteq V} \sum_{(i,j) \in E} w_{ij}\, \mathbb{1}[i \in S]\, \mathbb{1}[j \notin S].$$
Instead of maximizing over subsets of $V$, one can rewrite the problem as maximizing over $\{-1, 1\}^{|V|}$ instead. To do this, we associate every vertex $i \in V$ with a decision variable $x_i$, where $x_i = +1$ indicates that vertex $i \in S$ and $x_i = -1$ indicates that $i \notin S$. Observe that for an edge $(i, j) \in E$, we have that the edge is cut if and only if $x_i \neq x_j$. Using the above fact, one can easily check that:

$$\frac{1}{4} w_{ij} (x_i - x_j)^2 = \begin{cases} w_{ij}, & (i, j) \text{ is cut}, \\ 0, & (i, j) \text{ is not cut}. \end{cases} \qquad (3)$$

Footnote: For actual quantum devices, the value of $F_p(\gamma, \beta)$ and its gradients can be estimated by taking multiple measurements of $F_p(\gamma, \beta)$ in the computational basis.

Footnote: One can calculate (or approximate) the gradient using a variety of methods. Our implementation approximates the gradient using an analytic forward difference method implemented in TensorFlow Quantum (with default parameters error_order=1 and grid_spacing=0.001). By analytic, we mean that any expectations computed in the calculation are computed exactly (instead of using a sampling-based approximation).

By adding up the contribution of each edge and letting $n = |V|$, it becomes clear that one can reformulate the Max-Cut problem as the following maximization problem:

$$\max_{x \in \{\pm 1\}^n} \mathrm{cut}(x) = \max_{x \in \{\pm 1\}^n} \frac{1}{4} \sum_{(i,j) \in E} w_{ij} (x_i - x_j)^2, \qquad (4)$$
$$= \max_{x \in \{\pm 1\}^n} \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (1 - x_i x_j),$$
$$= \frac{1}{2} W + \max_{x \in \{\pm 1\}^n} \frac{1}{4} \langle -A, x x^T \rangle, \qquad (5)$$

where $A$ is the adjacency matrix of $G$, $\langle \cdot, \cdot \rangle$ denotes the Frobenius product of two matrices, and $W = \sum_{(i,j) \in E} w_{ij}$.

Goemans-Williamson (GW) Algorithm. In the seminal work of Goemans and Williamson [22] in 1995, the authors pioneered the use of semi-definite programs for solving combinatorial problems. Considering $Y = x x^T \succeq 0$ from equation (5), Max-Cut is equivalent to maximizing $\langle -A, Y \rangle$ by a matrix $Y$ from the positive semidefinite cone, subject to having a unit diagonal, in addition to being rank-1. Relaxing
the last constraint gives us a semidefinite program as follows:

$$\begin{aligned} \text{maximize} \quad & \langle -A, Y \rangle \\ \text{subject to} \quad & \langle Y, e_i e_i^T \rangle = 1, \quad \forall i \in [n], \\ & Y \in \mathbb{S}_+^n, \end{aligned} \qquad (6)$$

where $n = |V|$, $e_i$ is the $i$th standard basis vector, and $\mathbb{S}_+^n$ is the set of all $n \times n$ positive semidefinite matrices. The value given by the relaxation above was first considered in 1993 by Delorme and Poljak [13] in the form of an eigenvalue maximization problem with the equivalence shown shortly after by Poljak and Rendl [42] in 1995. The above relaxation is in the form of a semi-definite program and hence, since it is a convex program, it can be solved in polynomial time up to arbitrary precision, e.g., by using interior point methods [40].

For a Cholesky decomposition of $Y = X^T X$ (with $X \in \mathbb{R}^{n \times n}$), one can think of the solution to the above SDP as an embedding which maps vertex $i$ to $X_{:i}$, i.e., the $i$th column of $X$. This embedding can be viewed as a maximizer of a relaxation of equation (4) where $x_i$ still has unit distance from the origin, but now in $\mathbb{R}^n$, i.e., $x_i$ lies on the $(n - 1)$-sphere. To map this high dimensional solution to a cut in the graph, the GW algorithm considers a random hyperplane through the origin to partition the vertices into two sets according to which side of the hyperplane they lie on; Goemans and Williamson [22] showed that this choice of rounding yields an approximation ratio of 0.878 to Max-Cut, when the edge weights are non-negative. More specifically, given a fixed SDP solution $Y = X^T X$ of GW, the expected value of the cut obtained via hyperplane rounding is given by $\frac{1}{\pi} \sum_{(i,j) \in E} w_{ij} \arccos(X_{:i} \cdot X_{:j})$; in the context of instance-specific approximation ratio (Equation 1), we define the expected cut value (on a particular run of GW) as the previous sum, i.e., $E_{A,G} = \frac{1}{\pi} \sum_{(i,j) \in E} w_{ij} \arccos(X_{:i} \cdot X_{:j})$. A similar definition of $E_{A,G}$ is also used for the Burer-Monteiro method which we describe next.
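The exact expected cut under random hyperplane rounding can be evaluated directly from the angles between embedded vertices, since a uniformly random hyperplane through the origin separates two unit vectors with probability equal to their angle divided by $\pi$. The sketch below assumes unit vectors given as tuples and edges as `(i, j, weight)`; the function name is hypothetical.

```python
import math

def hyperplane_expected_cut(vectors, edges):
    """Sum over edges of w_ij * arccos(x_i . x_j) / pi for unit vectors x_i."""
    total = 0.0
    for i, j, w in edges:
        dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
        dot = max(-1.0, min(1.0, dot))  # guard against floating-point round-off
        total += w * math.acos(dot) / math.pi
    return total

# Unit-weight triangle embedded on the circle with vertices 120 degrees apart.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
embedding = [(math.cos(t), math.sin(t)) for t in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
expected = hyperplane_expected_cut(embedding, triangle)
```

For this embedding each edge is cut with probability 2/3, so the expected cut equals the Max-Cut value of 2 for the triangle.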
Burer-Monteiro (BM) Method. Observe that changing variables as $Y = X^T X$ (with $X \in \mathbb{R}^{n \times n}$), one can eliminate the positive semi-definite constraint in (6) and obtain the following equivalent reformulation:

$$\begin{aligned} \text{maximize} \quad & \langle -A, X^T X \rangle \\ \text{subject to} \quad & \|x_i\|_2 = 1, \quad \forall i \in [n], \qquad (7) \\ & x_i \in \mathbb{R}^n, \quad \forall i \in [n], \qquad (8) \end{aligned}$$

where $x_i$ denotes the $i$th column of $X$. Burer and Monteiro [10] proposed relaxing $x_i$ for each vertex to $\mathbb{R}^k$ instead of $\mathbb{R}^n$ in (8), i.e., use $x_i \in \mathbb{R}^k$. Unlike the relaxation used in the Goemans-Williamson algorithm, this modification yields a non-convex optimization problem. We refer to this modification as the Burer-Monteiro rank-$k$ Max-Cut (BM-MC$_k$) relaxation. Given a feasible solution $x : V \to S^{k-1}$ to the BM-MC$_k$ relaxation, we let BM-MC$_k(x)$ denote the BM-MC$_k$ objective value at $x$; for a given graph $G$, we let BM-MC$_k(G) = \max_x \text{BM-MC}_k(x)$ denote the globally optimal BM-MC$_k$ objective value for $G$.

Footnote: Not to be confused with the bra-ket notation, the Frobenius product of two same-sized matrices $A$ and $B$, denoted by $\langle A, B \rangle$, is equal to $\mathrm{Tr}(A^\dagger B)$ where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix and $(\cdot)^\dagger$ denotes conjugate transposition.

Footnote: We use "$Y \succeq 0$" to mean that $Y$ is a positive semidefinite matrix, i.e., $Y$ is a symmetric matrix with real, nonnegative eigenvalues.

Footnote 7: The $d$-sphere, denoted $S^d$, is defined as $S^d = \{x \in \mathbb{R}^{d+1} : \|x\| = 1\}$.

Not only is optimizing a non-convex (non-concave) optimization problem difficult, but even finding a local optimum to a non-convex optimization problem can be challenging due to saddle-points. Nevertheless, first and second-order optimization methods have shown promising performance in converging to high quality local optima for the low-rank BM formulation of Max-Cut (and many other combinatorial optimization problems).
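A small numerical check can illustrate how the rank-$k$ relaxation extends the $\pm 1$ formulation. The sketch assumes the BM-MC$_k$ objective is normalized as $\frac{1}{2}\sum_{(i,j)\in E} w_{ij}(1 - x_i \cdot x_j)$, so that a rank-1 solution reproduces the cut value in equations (3) to (5); this normalization and the helper names are assumptions made for illustration.

```python
import math

def bm_objective(points, edges):
    """(1/2) * sum of w_ij * (1 - x_i . x_j) over edges, for unit vectors x_i."""
    total = 0.0
    for i, j, w in edges:
        dot = sum(a * b for a, b in zip(points[i], points[j]))
        total += 0.5 * w * (1.0 - dot)
    return total

def quarter_squared_distance(points, edges):
    """Equivalent form: (1/4) * sum of w_ij * ||x_i - x_j||^2 over edges."""
    return sum(0.25 * w * sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
               for i, j, w in edges)

triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
# Rank 1: points are +1 or -1, i.e. an ordinary cut with vertex 1 on its own side.
rank1 = [(1.0,), (-1.0,), (1.0,)]
# Rank 2: three unit vectors spaced 120 degrees apart on the circle.
rank2 = [(math.cos(t), math.sin(t)) for t in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
rank1_value = bm_objective(rank1, triangle)
rank2_value = bm_objective(rank2, triangle)
```

On the triangle the rank-1 point gives the cut value 2, while the rank-2 point reaches 2.25, showing that the relaxed objective can exceed Max-Cut($G$) and therefore upper-bounds it.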
Burer and Monteiro invented this heuristic method, motivated by the existence of low rank optimal solutions to the original ($n$ dimensional) SDP whenever the rank is large enough relative to the number of constraints of the SDP, a condition known as the Barvinok-Pataki bound [3, 41]. Their method has shown promising performance in practice, even in constant dimensions, and is an active area of research in non-convex optimization theory [6, 7, 11]. Experiments by Burer, Monteiro, and Zhang [11] demonstrate that BM-MC runs much more quickly while maintaining relatively good solutions; on one particular 20,000-node instance, GW took over 1.5 days to complete, whereas a rank-2 approximate BM-MC solution was found in a little over a second; repeated runs of BM-MC over the course of a couple of minutes on the same graph yielded cuts that were at least as good as those obtained by GW [11]. More details on the runtime of GW and BM-MC can be found in Appendix D.

Recently, Mei et al. [38] showed that, for Max-Cut SDPs corresponding to graphs with non-negative edge-weights, any second-order local optimum for the BM formulation is approximately optimal with respect to the original SDP.

Theorem 1 (Mei et al. [38]). For graphs with non-negative edge weights, the objective at a locally optimal solution, for the above non-convex, rank-$k$ SDP formulation, is within a factor $1 - \frac{1}{k-1}$ of that of the rank-$n$ SDP.

In this work, we refer to a solution $x$ of the BM-MC relaxation in rank $k$ as $\kappa$-close if BM-MC$_k(x) \ge \kappa\, \text{Max-Cut}(G)$. To parse Mei et al.'s result in other words, for an $n$-node graph $G$ with non-negative edge weights, any locally-optimal solution $x$ to BM-MC$_k$ is also $\left(1 - \frac{1}{k-1}\right)$-close since BM-MC$_n(G) \ge \text{Max-Cut}(G)$. The above theorem highlights the fact that increasing $k$ improves performance of the BM formulation; however, for the purposes of this work (and simple mapping to the Bloch sphere), we restrict our attention to rank-2 and rank-3 solutions.
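As a quick arithmetic reading of Theorem 1, the worst-case factor can be evaluated for the first few ranks. This is a sketch that assumes the factor reads $1 - \frac{1}{k-1}$; the symbol $\kappa_k$ is introduced here only for illustration.

```latex
\kappa_k = 1 - \frac{1}{k-1}: \qquad
\kappa_2 = 0, \quad
\kappa_3 = \tfrac{1}{2}, \quad
\kappa_4 = \tfrac{2}{3}, \quad
\kappa_k \to 1 \ \text{as}\ k \to \infty .
```

Read this way, the guarantee is vacuous at rank 2 and gives one half at rank 3. These are worst-case figures only; as noted above, the method is reported to perform well in practice even in constant dimensions.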
We next discuss our key ideas on bringing in warm starts from classical optimization to quantum algorithms.

3 PREPROCESSING STAGE FOR QAOA-WARM

In this section, we discuss our classical preprocessing stage for warm-starts in QAOA, which are obtained through the Burer-Monteiro Max-Cut (BM-MC$_k$) relaxation in rank $k$ (for $k = 2, 3$). Given a classical solution $x_i \in S^{k-1}$ (for $i \in V$ for graph $G = (V, E)$), our warm-starts comprise a separable product state $|s_1\rangle \otimes |s_2\rangle \otimes \cdots \otimes |s_n\rangle$, wherein the pure state of each qubit $|s_i\rangle$ can be represented on its own Bloch sphere at the location of the corresponding vertex $x_i \in S^{k-1}$. These initial qubit positions are obtained using a classical Burer-Monteiro algorithm in rank-2 (BM-MC$_2$) and rank-3 (BM-MC$_3$).

To motivate such an approach for creating warm-starts for QAOA, we highlight two key observations. First, since the objective of BM-MC$_k$ can be written as $\max_{x_i \in S^{k-1}} \sum_{(i,j) \in E} w_{ij} \|x_i - x_j\|^2$, the classical solutions are incentivized to move the adjacent vertices as far apart as possible, ideally, to opposite ends of the sphere. This helps increase the probability of an edge being in a cut obtained not only by hyperplane rounding but also quantum sampling (as long as the corresponding qubits are aligned with the measurement axis as much as possible, i.e., at the north and south poles of the Bloch sphere). In general, if there is a cluster of vertices at both the poles of the sphere, then the probability of capturing the weight of the edges that go across these clusters is increased for both classical and quantum approaches.

Fig. 1. Pie charts (panels: BM-MC$_2$ and BM-MC$_3$) representing best expected cut value (expectation over randomness in sampling) obtained by using (i) hyperplane rounding of the BM-MC$_k$ solution (HR), (ii) quantum sampling of the BM-MC$_k$ solution (QS1), and (iii) quantum sampling of the initial state of standard QAOA (QS2). For every instance, QS2 always yielded the worst result of the three, and for the majority of the instances QS1 ≥ HR. For HR and QS1, the best of 5 (in terms of SDP objective) locally optimal BM-MC solutions are used; for that solution, the best of 5 rotations is used for QS1. The regions marked in gray indicate instances for which QS1 and HR had a tie (difference in instance-specific approximation ratio of at most 0.001).

Next, we find a reduction to the quantum sampling objective from the BM-MC objective for an edge. Consider an edge $e$, such that one of the vertices is located at the top of the Bloch sphere. Then the probability of that edge being cut via quantum sampling and the contribution that edge makes to the BM-MC objective coincide. Consider an edge $e = (i, j)$ such that $x_i = (0, 0, 1)^T$, and $x_j = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta)^T$ (where $\theta$ and $\phi$ are the polar and azimuthal angle respectively). The expected contribution of $e$ to the Max-Cut from quantum sampling is equal to $w_{ij}$ multiplied by the probability that the edge $e$ is cut, i.e., $w_{ij} \sin^2(\theta/2)$. The contribution to the BM-MC$_3$ objective from edge $e$ can be written as $\frac{1}{2} w_{ij} (1 - x_i \cdot x_j)$. By definition, $\cos(\theta) = x_i \cdot x_j$, and thus, the contribution to the BM-MC$_3$ objective is $\frac{1}{2} w_{ij} (1 - \cos(\theta)) = w_{ij} \sin^2(\theta/2)$, which is equivalent to the expected contribution of $e$ from quantum sampling.

A natural question at this point is if there is any improvement in cut quality when applying QAOA-warm to the warm-start initialization compared to simply performing hyperplane rounding on said warm-start, and if quantum sampling of a classical solution is even competitive compared to a hyperplane rounding of the same. We show in Figure 1 that quantum sampling of the warm-start (QS1) initialization outperforms the expected cut obtained using the standard initial state for QAOA (QS2).
Moreover, with an appropriate initial rotation of the warm-start (Section 3.2), QS1 outperforms hyperplane rounding (HR) for the majority of instances. Interpreting QS1 and QS2 as the results of depth-0 QAOA-warm and depth-0 standard QAOA respectively, these results motivate our approach, in the hope that QAOA-warm outperforms standard QAOA (and GW) for circuit depths $p > 0$ as well. We explain next the pipeline for constructing warm-starts using appropriate initial rotations.

Footnote: This may not be true in general for the Max-Cut over the entire graph, due to alignment with the measurement axis.

3.1 Solving BM-MC

We use the Burer-Monteiro algorithm in $k$ dimensions, for $k = 2, 3$, to find approximate solutions to Max-Cut. In each dimension, we begin with $n$ points chosen uniformly at random on the unit circle (for $k = 2$) or unit sphere (for $k = 3$). We represent these points in polar coordinates (for $k = 2$) or spherical coordinates (for $k = 3$); that is, we keep track of the polar angles $\theta$ (for $k = 2, 3$) and azimuthal angles $\phi$ (for $k = 3$) of each point. To find locally optimal solutions, we perform stochastic coordinate ascent by making small random perturbations to these angles (thus maintaining feasibility) and update our solution if the objective increases; see Appendix A.1.2 for more detail. We find 5 local optima and take the best solution. Let $x^* : V \to S^{k-1}$ be a solution to BM-MC$_k$, i.e., a Burer-Monteiro relaxation of Max-Cut in the $k$-dimensional space.

3.2 Random Rotations

Classical hyperplane rounding for BM-MC$_k$ is invariant under a global rotation of the entire solution; quantum sampling, however, is not, as it has a fixed axis along which measurements are performed. For example, in Figure 2 we consider 3 rotations of a particular BM-MC initialization of 3 qubits on the Bloch sphere: although hyperplane rounding is agnostic to a rotation of the Bloch sphere, quantum sampling depends on the choice of the measurement axis. The difference in instance-specific approximation ratio attained by quantum sampling in two different orientations of the same solution on the Bloch sphere demonstrates the importance of choosing a suitable rotation when embedding the BM-MC solution into the Bloch sphere.

Fig. 2. Comparison of hyperplane rounding and quantum sampling for a 3-cycle (Max-Cut = 2): (a) shows a locally optimal BM-MC solution, where any random hyperplane gives a cut of size 2. Both (b) and (c) show two different embeddings of the BM-MC solution from (a) onto the Bloch sphere. In (b), the qubits lie on the $x = 0$ plane and quantum sampling results in an expected cut of 1.875. In (c), all qubits lie on the equator of the Bloch sphere (similar to the standard start of QAOA), so each edge has a probability of 1/2 of being cut, yielding a total expected cut of 1.5. Together, (b) and (c) demonstrate that the orientation of the rotated BM-MC solution is important when embedding it into the Bloch sphere and can result in different expected cuts.

Thus, before mapping the rank-$k$ approximate solutions from BM-MC$_k$, a rotation is performed to mitigate unfavorable orientations of the warm-starts. We consider two types of random rotation schemes: uniform rotations $R_U$ (applied to all the vertices), and random "vertex-at-top" rotations $R_V$, where a vertex is sampled uniformly and mapped to the $(0, 0, 1)$ vector for rank-3 solutions and to $(1, 0)$ for rank-2 solutions. Uniform random rotations can provably recover a significant fraction of the BM-MC objective (see Section 5), whereas vertex-at-top rotations serve as a useful heuristic. We describe both of these rotations in Appendix A.1. We use the shorthand $R_V(x^*)$ and $R_U(x^*)$ to denote the rotations of the approximate solution $x^*$ by a random vertex-at-top rotation $R_V$ and a random uniform rotation $R_U$ respectively.
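The orientation dependence described in Section 3.2 can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes each qubit is in an unentangled state with Bloch polar angle $\theta$, so that it is measured as 1 with probability $\sin^2(\theta/2)$, and it evaluates the expected cut of the 3-cycle from Figure 2. The helper names (`polar_angle`, `cut_probability`, `expected_cut`) are hypothetical and introduced only for illustration.

```python
import math

def polar_angle(circle_angle):
    """Bloch polar angle of a point at `circle_angle` on a great circle through the poles."""
    a = circle_angle % (2 * math.pi)
    return a if a <= math.pi else 2 * math.pi - a

def cut_probability(theta_i, theta_j):
    """Probability that two unentangled qubits are measured with different bits."""
    p_i = math.sin(theta_i / 2) ** 2  # Pr[qubit i reads 1]
    p_j = math.sin(theta_j / 2) ** 2  # Pr[qubit j reads 1]
    return p_i * (1 - p_j) + (1 - p_i) * p_j

def expected_cut(polar_angles, weighted_edges):
    """Expected cut weight when sampling a product state in the computational basis."""
    return sum(w * cut_probability(polar_angles[i], polar_angles[j])
               for i, j, w in weighted_edges)

# Unit-weight 3-cycle; its rank-2 optimum places the vertices 120 degrees apart.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
circle_angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

# Embedding with vertex 0 at the top of the Bloch sphere (Figure 2(b)).
vertex_at_top = [polar_angle(a) for a in circle_angles]
cut_top = expected_cut(vertex_at_top, triangle)      # approx 1.875

# Embedding with every qubit on the equator (Figure 2(c)).
equator = [math.pi / 2] * 3
cut_equator = expected_cut(equator, triangle)        # approx 1.5
```

With one endpoint at the pole, `cut_probability(0.0, theta)` reduces to $\sin^2(\theta/2) = \frac{1}{2}(1 - \cos\theta)$, which is the single-edge identity between quantum sampling and the BM-MC objective used earlier in this section.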
Footnote: Stochastic coordinate ascent works well in practice in finding a local optimum; see, e.g., [38]. Nevertheless, for guaranteed convergence one can use other methods such as (fast) Riemannian Trust-Region methods.

Fig. 3. We begin with a locally optimal solution from BM (top-left). We then apply a rotation $R \in \{R_V, R_U\}$; here we show one such rotation being applied (top-right). Lastly, we use the quantum mapping of Section 3.3 to map this rotated solution to a separable quantum state.

3.3 Mapping to the Bloch Sphere

To map the rotated solutions $R(x^*) = ((\theta_1, \phi_1), \dots, (\theta_n, \phi_n))$ (with $R \in \{R_V, R_U\}$), we can simply map the rank-3 solution for each vertex to the Bloch sphere (see Figure 3) using a tensorizable state for each qubit, i.e., the "quantum mapping" $Q_3$ is given by:

\[ Q_3(x^*) = Q_3(\theta_1, \phi_1) \otimes \cdots \otimes Q_3(\theta_n, \phi_n), \]

where

\[ Q_3(\theta, \phi) = \cos(\theta/2)\,|0\rangle + e^{i\phi}\sin(\theta/2)\,|1\rangle. \]

For rank-2 solutions, let $R(x^*) = (\theta_1, \dots, \theta_n)$ be the rotated approximate solution in polar coordinates, where $\theta_j \in [0, 2\pi)$ for $j = 1, \dots, n$. We embed the solution into the $yz$-plane of the Bloch sphere with the following quantum mapping:

\[ Q_2(x^*) = Q_2(\theta_1) \otimes \cdots \otimes Q_2(\theta_n), \]

where $Q_2(\theta)$ is given by:

\[ Q_2(\theta) = \begin{cases} \cos\left(\frac{\theta}{2}\right)|0\rangle + e^{-i\pi/2}\sin\left(\frac{\theta}{2}\right)|1\rangle, & \theta \in [0, \pi), \\ \cos\left(\pi - \frac{\theta}{2}\right)|0\rangle + e^{i\pi/2}\sin\left(\pi - \frac{\theta}{2}\right)|1\rangle, & \theta \in [\pi, 2\pi). \end{cases} \]

The quantum mapping for rank-2 solutions is motivated by the fact that, for rank-3 solutions, certain initializations along the $x$-axis cause QAOA-warm to perform poorly (see Section 5.2); mapping to the $yz$-plane of the Bloch sphere allows us to avoid these problematic states.
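To make the mappings above concrete, the following sketch builds the single-qubit states $Q_3(\theta, \phi)$ and $Q_2(\theta)$, forms their tensor product, and uses it as the starting state of a small state-vector QAOA simulation. It is an illustration under stated assumptions rather than the authors' code: the layer convention (cost phase $e^{-i\gamma C}$ followed by the standard mixer $e^{-i\beta X}$ on every qubit) is assumed, since the conventions of Section 2.1 are not restated here, and all function names are hypothetical.

```python
import cmath
import math

def q3(theta, phi):
    """Amplitudes (a0, a1) of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return (complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2))

def q2(theta):
    """Rank-2 mapping for a circle angle theta in [0, 2*pi); lands in the yz-plane."""
    if theta < math.pi:
        return q3(theta, -math.pi / 2)
    return q3(2 * math.pi - theta, math.pi / 2)

def bloch_vector(qubit):
    """(<X>, <Y>, <Z>) of a single-qubit pure state."""
    a0, a1 = qubit
    overlap = a0.conjugate() * a1
    return (2 * overlap.real, 2 * overlap.imag, abs(a0) ** 2 - abs(a1) ** 2)

def product_state(qubits):
    """Tensor product of single-qubit states; the first qubit is the most significant bit."""
    state = [1 + 0j]
    for a0, a1 in qubits:
        state = [amp * a for amp in state for a in (a0, a1)]
    return state

def cut_values(n, edges):
    """Cut weight of every bitstring z, with vertex v read from bit n-1-v of z."""
    values = []
    for z in range(2 ** n):
        bits = [(z >> (n - 1 - v)) & 1 for v in range(n)]
        values.append(sum(w for i, j, w in edges if bits[i] != bits[j]))
    return values

def apply_mixer(state, n, beta):
    """Apply exp(-i * beta * X) to every qubit (assumed standard mixer)."""
    c, s = math.cos(beta), -1j * math.sin(beta)
    for v in range(n):
        mask = 1 << (n - 1 - v)
        state = [c * state[z] + s * state[z ^ mask] for z in range(len(state))]
    return state

def qaoa_expectation(n, edges, init_qubits, gammas, betas):
    """Expected cut after len(gammas) QAOA layers applied to a product state."""
    cuts = cut_values(n, edges)
    state = product_state(init_qubits)
    for gamma, beta in zip(gammas, betas):
        state = [cmath.exp(-1j * gamma * c) * a for c, a in zip(cuts, state)]
        state = apply_mixer(state, n, beta)
    return sum(c * abs(a) ** 2 for c, a in zip(cuts, state))

triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
warm = [q2(a) for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]  # vertex 0 at the top
plus = [q3(math.pi / 2, 0.0)] * 3                                # standard |+> start

warm_depth0 = qaoa_expectation(3, triangle, warm, [], [])  # approx 1.875
plus_depth0 = qaoa_expectation(3, triangle, plus, [], [])  # approx 1.5
```

With empty parameter lists the function reduces to quantum sampling of the initial state, so the two depth-0 values reproduce the expected cuts quoted for the 3-cycle in Figure 2. Passing non-empty `gammas` and `betas` evaluates the same objective that a classical optimizer would maximize at depth $p > 0$.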
Algorithm 1: QAOA-warm using BM-MC$_k$
Input: $G = (V, E)$, $w : E \to \mathbb{R}$, $k$, $p$, $R \in \{R_U, R_V\}$
1. $x^* \leftarrow$ BM-MC$_k(G, w)$ // approximate solution
2. $|s_0\rangle \leftarrow \bigotimes_{i \in V} Q_k(R(x^*_i))$, for $i \in [n]$, for random $R$
3. return QAOA$(G, w, |s_0\rangle, p)$

3.4 Performing a biased QAOA

Now we perform QAOA, as described in Section 2.1, while redefining the initial state $|s_0\rangle$ as the tensor product of the qubit states from the previous step, i.e.,

\[ |s_0\rangle = \bigotimes_{i \in V} Q_k(R(x^*(i))), \]

and run QAOA with the chosen depth $p$, optimizing over the $2p$ parameters $\gamma = (\gamma_1, \dots, \gamma_p)$, $\beta = (\beta_1, \dots, \beta_p)$ in order to maximize $F_p(\gamma, \beta) = \langle \psi_p(\gamma, \beta) |\, H_C \,| \psi_p(\gamma, \beta) \rangle$. We initialize $\gamma, \beta$ close to 0, which allows us to start with a solution quality close to what would be obtained by just doing quantum sampling. Moreover, the ridge-like geometry of the parameter landscape (seen in Sections 4.5 and 5.1) also suggests that points near the origin are suitable for initialization of the variational parameters. We summarize QAOA-warm in pseudocode in Algorithm 1. In the next section we present experimental results with warm-starts, followed by theoretical development of properties of standard QAOA and QAOA-warm.

4 RESULTS

In this section, we discuss the results of our numerical simulations of QAOA-warm. We first discuss the details of the preprocessing pipeline and the graph instances used in Section 4.1. In order to compare QAOA-warm to other Max-Cut algorithms, one can use different black-box optimizers, such as ADAM, COBYLA, Nelder-Mead and BFGS. We first run computations to pick a single optimizer, and then to pick the rank of the initialization and the rotation scheme to work with, in Sections 4.2 and 4.3.
In Section 4.4, we next provide aggregate results for QAOA-warm, including (i) a comparison against other Max-Cut algorithms, (ii) the improvement in instance-specific approximation ratio with increased depth, and (iii) trends in (median) instance-specific approximation ratio with varying depth $p$ and graph size. Lastly, to understand the behavior of QAOA-warm, we discuss the qualitative shape and numerical properties of the parameter landscape of QAOA-warm (and standard QAOA) in Section 4.5.

4.1 Experimental Setup

Graph Instances. We consider a collection of 1264 graphs, $\mathcal{G}$, generated as follows. We first generated a set of unweighted graphs, which includes all non-isomorphic (connected) graphs on $n = 2$ to $n = 6$ vertices (142 instances), and 29 random graphs for each size $n = 7$ to $n = 12$ sampled from different random graph generators in Python's NetworkX [23] package. These random graph generators include the Erdős-Rényi, Barabási-Albert, Dual of Barabási-Albert, Watts-Strogatz, and Newman-Watts-Strogatz models (detailed in Appendix A.3). Many experimental studies in the current QAOA literature only consider graphs from a single random graph model (e.g., Erdős-Rényi); however, graphs from such models can have predictable behavior when it comes to Max-Cut, which could potentially have a large influence on the performance of QAOA. For this reason, we construct an ensemble of graphs $\mathcal{G}$ using a variety of random graph generators.

Footnote: Initializing the parameters exactly to 0 is not advised due to issues regarding saddle points. In particular, standard QAOA at $p = 1$ has a saddle point at the origin and thus terminates immediately. We instead initialize $\gamma_j, \beta_j$ by sampling uniformly at random from the interval $[-0.0001, 0.0001]$ for $j = 1, \dots, p$. In the case of standard QAOA, if we still get stuck after a few epochs (due to the saddle point), we discard the run and retry with new randomized initial parameter values.

Footnote: For example, when using the Erdős-Rényi graph model, if each edge appears independently with probability $q$, and if we take a random cut with $s$ vertices on one side and $n - s$ vertices on the other side, then one would observe $q\,s(n - s)$ edges across the cut (in expectation).

Next, we create three weighted versions of each of the 316 unit-weighted instances constructed above, by considering independent edge-weightings drawn from (i) the uniform distribution on $\{-10, -9, \dots, 9, 10\} \setminus \{0\}$, (ii) the uniform distribution on $\{1, 2, \dots, 10\}$, and (iii) weights of the form $\pm 2^j$ with $\Pr[w = 2^j] = \Pr[w = -2^j] = 2^{-j-2}$ for all non-negative integers $j$. The weighted and unweighted instances together give us a total of 1264 instances. The last family of weighted instances is constructed due to the high variation in performance of classical heuristics on similar instances, observed in a previous study by Dunning et al. [14]. Note that only the unit-weight instances and weighting (ii) have exclusively positive edge weights, whereas weightings (i) and (iii) mix positive and negative weights. We will often present results for mixed-weight graphs (positive and negative weights) and positive-only graphs separately.

Running the Preprocessing Stage. We computed QAOA-warm, Goemans-Williamson and standard QAOA solutions for each of the weighted graph instances $G \in \mathcal{G}$. Both standard QAOA and QAOA-warm were run for circuit depths $p \in \{1, 2, 4, 8\}$ and for each optimizer considered, and QAOA-warm was additionally run for each considered rank of BM-MC$_k$ ($k = 2, 3$) and for each rotation type (vertex-at-top and uniform random).
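The three edge-weightings described in Section 4.1 can be sampled with a few lines of standard-library Python. This is a minimal sketch and not the authors' generator: the scheme names and the helper functions are hypothetical, and the signed power-of-two weights are drawn by first sampling $j$ from a geometric distribution with $\Pr[j] = 2^{-j-1}$ and then choosing a uniformly random sign, which yields $\Pr[w = \pm 2^j] = 2^{-j-2}$.

```python
import random

def sample_weight(scheme, rng):
    """Draw one edge weight according to one of the three weighting schemes."""
    if scheme == "mixed_uniform":          # (i) uniform on {-10, ..., 10} \ {0}
        return rng.choice([w for w in range(-10, 11) if w != 0])
    if scheme == "positive_uniform":       # (ii) uniform on {1, ..., 10}
        return rng.randint(1, 10)
    if scheme == "signed_power_of_two":    # (iii) Pr[w = 2^j] = Pr[w = -2^j] = 2^(-j-2)
        j = 0
        while rng.random() < 0.5:          # j is geometric: Pr[j] = 2^(-j-1)
            j += 1
        return rng.choice([1, -1]) * 2 ** j
    raise ValueError(f"unknown scheme: {scheme}")

def weight_edges(edges, scheme, seed=0):
    """Attach independent random weights to a list of (i, j) edges."""
    rng = random.Random(seed)
    return [(i, j, sample_weight(scheme, rng)) for i, j in edges]

# Example: three weighted versions of the same unweighted 4-cycle.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
weighted_versions = {
    scheme: weight_edges(cycle, scheme, seed=7)
    for scheme in ("mixed_uniform", "positive_uniform", "signed_power_of_two")
}
```

Seeding the generator per graph, as done here, is one simple way to keep the weighted instances reproducible across runs.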
We consider the best of 5 warm-starts (in objective value) when selecting BM-MC$_k$ warm-starts, and subsequently the best of 5 random rotations, i.e., the rotation that yields the highest expected cut value at the end of the hybrid-optimization loop. In the experiments, we compute the expected cut values exactly (details in Section 2) rather than simulating quantum measurements or hyperplane rounding. For any two runs of standard QAOA or QAOA-warm that differ only in the choice of optimizer, the same initial parameter values $\gamma_j$ and $\beta_j$ are used (sampled uniformly from the interval $[-0.0001, 0.0001]$ for all $j = 1, \dots, p$). Our implementation of standard QAOA and QAOA-warm utilizes Google's TensorFlow Quantum library and IBM's Qiskit library. The $|+\rangle^{\otimes n}$ state is initialized by applying a Hadamard gate to each qubit in $|0\rangle^{\otimes n}$. For states initialized based on low-rank approximate solutions, we generate the initial state as discussed in Section 3, which is easily implemented using standard rotation gates. For each epoch of each run of standard QAOA or QAOA-warm, our implementation records the values of the variational parameters, the expected cut value at those parameters, and the probability distribution of all cuts. Each run of standard QAOA and QAOA-warm terminated when the difference in successive values of $F_p(\gamma, \beta)$ was less than $\bar{W} \cdot 10^{-6}$, where $\bar{W}$ is the sum of the absolute values of the edge weights. We next summarize the results from these numerical simulations.

4.2 Optimizer Choice

We consider four different optimizers to optimize the $2p$ variational parameters: ADAM, BFGS, Nelder-Mead, and COBYLA, and present comparisons between these optimizers. As demonstrated in Figure 4, when ADAM is compared to the other three optimizers, the expected cut values obtained for QAOA-warm are similar (i.e.
within 0.01 difference in instance-specific AR) for at least 90% of the runs at $p = 1$; this percentage decreases at $p = 8$, but the instance-specific approximation ratios are still relatively similar for the majority of the instances. This suggests that the instance-specific approximation ratios achieved for QAOA-warm are largely independent of the optimizer used; for this reason, all remaining results involving instance-specific approximation ratios for QAOA-warm will be in terms of runs using the ADAM optimizer. It should be noted that the optimizers considered vary in runtime, e.g., in the cost per iteration and the number of iterations required to train the variational parameters (we discuss this further in Appendix D).

Footnote: We suspect this is an artefact of the parameter landscapes becoming flatter with the warm-starts.

Even though the choice of the optimizer had almost no impact on QAOA-warm in terms of the instance-specific approximation ratios obtained, Figure 4 illustrates a noticeable effect on the instance-specific approximation ratio achieved for standard QAOA, especially at the higher circuit depths that we tested ($p = 4$ and $p = 8$). In particular, we find that runs using the ADAM optimizer tend to have better performance for QAOA. For this reason, the remaining results in this paper regarding standard QAOA will only include runs that utilize the ADAM optimizer, in order to obtain a simpler, more direct, and fair comparison with QAOA-warm.

Fig. 4. Histogram of differences in instance-specific approximation ratio (AR) between ADAM and other optimizers for QAOA-warm (top) and standard QAOA (bottom). Overlapping regions are in purple. The red bin indicates instances for which optimizers performed similarly to ADAM.
4.3 Choice of Rank and Rotations

To compare QAOA-warm against standard QAOA, GW, and hyperplane rounding of BM-MC$_k$, we need to narrow down the choice of the BM-MC rank (2 or 3) and the type of rotation (vertex-at-top or uniform random) to use. We explore these two choices in this subsection. Recall that we consider the best-of-5 warm-starts for each type of rotation.

Footnote: We found that restarting standard QAOA multiple times did not impact the results significantly.

Over the 1264 graph instances, we find that vertex-at-top rotations typically perform slightly better than uniform random rotations, especially when rank-3 solutions are used (e.g., at depth $p = 1$, rank-3 vertex-at-top rotations obtain an average instance-specific approximation ratio of 0.9576, whereas rank-3 uniform rotations obtain 0.9440). These results seem reasonable since vertex-at-top rotations rarely end up in states that plateau for warm-starts (see Section 5.2 for an example of such a warm-start). We include a summary of average instance-specific approximation ratios observed across the four choices of rank and rotations in Table 1.

                                  depth p = 1           depth p = 8
                                  vert.     uniform     vert.     uniform
all graphs               rank-2   0.9581    0.9581      0.9726    0.9718
                         rank-3   0.9576    0.9440      0.9688    0.9560
positive-weight graphs   rank-2   0.9569    0.9569      0.9704    0.9697
                         rank-3   0.9556    0.9441      0.9659    0.9548

Table 1. Average instance-specific approximation ratio achieved by QAOA-warm when utilizing different combinations of ranks and rotations during the preprocessing stage. For the top rows, these averages were computed using all the graphs in our graph library $\mathcal{G}$ (see Section 4.1), whereas for the bottom rows, we restrict our attention to only those graphs in $\mathcal{G}$ with positive edge weights.
Each run of standard QAOA and QAOA-warm terminates when the difference in successive values of $F_p(\gamma, \beta)$ is less than $10^{-6}\,\bar{W}$, where $\bar{W}$ is the sum of the absolute values of the edge weights.

Fig. 5. Histogram of differences in instance-specific approximation ratios between QAOA-warm and standard QAOA. Overlapping regions are in purple.

On the other hand, when using rank-2 initializations, there is virtually no difference between the two rotation approaches, as rank-2 solutions were specifically designed to avoid bad states for warm-starts. For ease of presentation, the remainder of the results in this paper will utilize rank-2 initializations with a vertex-at-top rotation scheme, as this appears to be one of the most promising combinations for QAOA-warm.

4.4 Aggregate Results

Here we use aggregated results of QAOA-warm in order to answer three key questions: (Q1) How does QAOA-warm fare compared to standard QAOA and classical Max-Cut algorithms (BM-MC and Goemans-Williamson)? (Q2) How much of QAOA-warm's instance-specific approximation ratio can be attributed to the warm-start itself versus what is done by the quantum circuit? (Q3) What are the trends in QAOA-warm's instance-specific approximation ratio with varying depth and graph size, and how does this compare with standard QAOA?

         p=1                 p=2                 p=4                 p=8
Order    all      positive   all      positive   all      positive   all      positive
WBGS     25.08%   25.08%     26.42%   26.42%     23.97%   23.97%     16.61%   16.61%
WBSG      0.00%    0.00%      0.16%    0.31%      0.32%    0.31%      0.71%    0.79%
WGBS     30.93%   30.93%     32.28%   32.28%     29.98%   29.98%     21.84%   21.84%
WGSB      0.00%    0.00%      0.24%    0.31%      0.32%    0.47%      2.06%    3.30%
WSBG      0.00%    0.00%      0.16%    0.16%      2.61%    2.52%      4.27%    4.56%
WSGB      0.08%    0.16%      0.00%    0.00%      2.29%    2.04%      4.27%    3.62%
BWGS      0.95%    1.89%      0.71%    1.10%      0.16%    0.16%      0.00%    0.00%
BWSG      0.08%    0.16%      0.24%    0.47%      0.08%    0.00%      0.00%    0.00%
BGWS     22.39%   22.39%     17.01%   17.01%      6.25%    6.60%      1.34%    0.47%
BGSW      1.50%    2.36%      4.83%    5.50%     10.44%   10.44%      4.03%    6.29%
BSWG      0.24%    0.47%      0.32%    0.63%      0.71%    1.26%      0.63%    1.26%
BSGW      0.32%    0.63%      0.24%    0.47%      1.34%    1.89%      1.11%    2.04%
GWBS      1.50%    1.89%      1.50%    1.89%      1.27%    1.57%      0.47%    0.31%
GWSB      0.08%    0.16%      0.08%    0.16%      0.08%    0.16%      0.55%    0.94%
GBWS     15.66%   15.66%     10.92%   10.92%      4.91%    4.87%      1.11%    0.63%
GBSW      0.71%    0.63%      3.72%    3.62%      6.09%    6.09%      2.69%    4.72%
GSWB      0.00%    0.00%      0.00%    0.00%      0.08%    0.16%      0.40%    0.63%
GSBW      0.00%    0.00%      0.00%    0.00%      0.08%    0.16%      0.24%    0.31%
SWBG      0.00%    0.00%      0.00%    0.00%      1.27%    1.57%      8.86%    8.86%
SWGB      0.16%    0.31%      0.32%    0.47%      1.74%    1.73%      8.23%    7.39%
SBWG      0.00%    0.00%      0.40%    0.79%      0.87%    1.57%      1.42%    2.20%
SBGW      0.16%    0.31%      0.24%    0.31%      2.69%    2.67%     11.71%   11.71%
SGWB      0.16%    0.31%      0.16%    0.31%      0.08%    0.16%      0.08%    0.16%
SGBW      0.00%    0.00%      0.08%    0.00%      2.37%    1.73%      7.36%    8.33%
Total    100%     100%       100%     100%       100%     100%       100%     100%

Table 2. We consider 4 algorithms: Goemans-Williamson (G), rank-2 Burer-Monteiro with hyperplane rounding (B), QAOA-warm (W), and standard QAOA (S). There is a row for each of the 4! = 24 ways the algorithms can perform relative to one another, with the cell value indicating the percentage of instances for which that ordering occurs. As an example, the top-leftmost value indicates that for 25.08% of instances, W ≥ B ≥ G ≥ S in terms of AR, with W and S being depth-1. The four largest entries in each column are bolded for emphasis. To account for numerical error for nearly solved instances, we declare QAOA-warm (W) as the best as long as it is within 0.001 AR of the best algorithm. We include columns corresponding to the entire graph library $\mathcal{G}$ as well as the subset of $\mathcal{G}$ with positive-weighted edges.

(Q1). To answer the first question, we compare standard QAOA, QAOA-warm, GW, and hyperplane rounding of the BM-MC solutions in Table 2. At depth 1, QAOA-warm is at least as good as the other three algorithms for 56.1% of the instances, whereas standard QAOA is the best for less than 1% of the instances. However, as the circuit depth increases, standard QAOA is the best algorithm for a larger proportion of instances (37.66% of instances at depth $p = 8$); meanwhile, QAOA-warm is still at least as good as the other algorithms for 49.8%, nearly half, of the instances. These results support our claim that warm-starts improve the performance of QAOA at low circuit depths. Standard QAOA achieves the optimal cut in the limit as the circuit depth increases, and thus, for any particular graph, there exists some (instance-dependent) circuit depth $p$ for which standard QAOA beats GW [17]. Current and near-term quantum devices are only able to reliably run QAOA at low circuit depth (due to the presence of quantum noise), and therefore we propose that QAOA-warm can be of significant use in this regime.
Although our current implementation of QAOA-warm does not perform as well at higher circuit depths (compared to standard QAOA), it may be possible to extend QAOA-warm in order to see continued improvement with increased circuit depth by changing the mixers; we discuss this more in Section 6.

We next consider the difference in instance-specific approximation ratios obtained by QAOA-warm and standard QAOA. In Figure 5, we provide a detailed comparison between instance-specific approximation ratios attained by QAOA-warm and standard QAOA in the form of a histogram. We see improvements in the instance-specific approximation ratio ranging from 0.1 to 0.5 when using warm-starts, especially at low circuit depth. These results are consistent with those depicted in Table 2. We note that in this figure, as in the others, we take the best of 5 vertex-at-top rotations for QAOA-warm; in Appendix A.4, we include results for the case where the median and worst (of 5) vertex-at-top rotations are used instead.

(Q2). We now address the second key question regarding how much of the performance of QAOA-warm can be attributed to the warm-start itself. This is an important question to address because if the improvement generated by QAOA-warm is due only to the initial quantum state at $p = 0$ having higher overlap with good solutions, then there would be no point in running the quantum device. To test this, we compare, in Figure 6, the improvement in instance-specific approximation ratio from depth-0 QAOA-warm (i.e., just measuring the initial state obtained from the preprocessing stage) to depth-1 QAOA-warm, as well as the improvement when we change the depth from 1 to 8. For 74 instances, we observed that the instance-specific approximation ratio from QAOA-warm improved by at least 50% when going from $p = 0$ to $p = 1$, and by at least 80% for 22 instances. This shows the promise of using QAOA on top of the warm-starts.
On the other hand, the increase in instance-specific approximation ratio from depth-1 QAOA-warm to depth-8 QAOA-warm is milder, ranging up to 10% for positive-weighted instances and up to 22.3% for general graphs. These results show that running QAOA-warm does yield an increase in instance-specific approximation ratio beyond simply sampling the initial warm-start state; however, the returns diminish with higher circuit depths (this is expected because QAOA-warm can plateau for some instances, Section 5.2).

(Q3). Lastly, to address the third question, we consider how the performance of QAOA-warm varies across $n$ (number of nodes) and $p$ (circuit depth), which we illustrate for our graph library in Figure 7. While there is a significant improvement in performance for standard QAOA with increasing circuit depth, we find that QAOA-warm consistently outperforms standard QAOA (on average), except at $p = 8$. We also see that, at fixed depth, the performance of both standard QAOA and QAOA-warm degrades as the number of nodes increases, while the degradation of QAOA-warm is much flatter compared to standard QAOA. We further discuss pre-processing and parameter search time for QAOA-warm in Appendix D.

4.5 Parameter Landscapes and Trajectories

We now consider all parameter combinations for $\gamma$ and $\beta$ in order to obtain a better understanding of the landscape that we need to optimize over for standard QAOA and QAOA-warm. For any graph $G$, initial state $|s_0\rangle$, and circuit depth $p = 1$, we can plot a parameter landscape, which allows us to visualize the solution quality as a function of the variational parameters $\gamma_1$ and $\beta_1$. In particular, each point $(\gamma_1, \beta_1)$ in the landscape is assigned a color which corresponds to the instance-specific approximation ratio, i.e., the quantity

\[ \frac{F_1(\gamma_1, \beta_1) - \text{Min-Cut}(G)}{\text{Max-Cut}(G) - \text{Min-Cut}(G)}. \]

As an example, we plot the parameter landscape for an example graph $G$ in Figure 8 without and with warm-starts (using 2 vertex-at-top rotations and one uniform rotation).
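The instance-specific approximation ratio defined above rescales an expected cut value so that the minimum cut maps to 0 and the maximum cut maps to 1, which matters when edge weights can be negative. The sketch below computes it by brute-force enumeration of all cuts, which is only practical for small graphs such as those in this study; it is an illustration with hypothetical function names, not the authors' code.

```python
from itertools import product

def cut_weight(bits, edges):
    """Total weight of edges whose endpoints receive different bits."""
    return sum(w for i, j, w in edges if bits[i] != bits[j])

def min_and_max_cut(n, edges):
    """Brute-force Min-Cut and Max-Cut values over all 2^n bit assignments."""
    values = [cut_weight(bits, edges) for bits in product((0, 1), repeat=n)]
    return min(values), max(values)

def instance_specific_ar(value, n, edges):
    """(value - Min-Cut) / (Max-Cut - Min-Cut) for an expected cut `value`."""
    lo, hi = min_and_max_cut(n, edges)
    if hi == lo:
        raise ValueError("all cuts have the same weight; the ratio is undefined")
    return (value - lo) / (hi - lo)

# Unit-weight 3-cycle: Min-Cut = 0, Max-Cut = 2, so an expected cut of 1.875
# corresponds to a ratio of 0.9375.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
ratio_triangle = instance_specific_ar(1.875, 3, triangle)

# Mixed-sign path: Min-Cut = -1 and Max-Cut = 1, so an expected cut of 0 maps to 0.5.
mixed_path = [(0, 1, 1.0), (1, 2, -1.0)]
ratio_mixed = instance_specific_ar(0.0, 3, mixed_path)
```

For graphs with non-negative weights the minimum cut is 0 (all vertices on one side), so the ratio reduces to the usual fraction of the Max-Cut.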
For each parameter landscape, we ran the QAOA training loop twenty times with random initializations of $(\gamma_1, \beta_1)$ and overlaid the trajectories of the parameter values throughout the training loop for the variational parameters. When no warm-start is used, the parameter landscape has many peaks and valleys and a wide range of solution qualities; using a warm-start drastically changes the landscape. Specifically, if we rotate one of the approximate solutions of BM-MC for $G$ using a vertex-at-top rotation, this yields a ridge-like parameter landscape where the optimal parameter values lie near the line $\beta = 0$. This behavior is no longer there for a different vertex-at-top rotation of the same approximate solution.

Fig. 6. The number of instances for which QAOA-warm obtained at least an $x$% improvement in AR as the circuit depth increases from $p = 0$ to $p = 1$ (left) and from $p = 1$ to $p = 8$ (right). For each instance, the best percent improvement (across all five vertex-at-top rotations) is used. Note that percentage improvements in instance-specific approximation ratios go up to 80-120% from $p = 0$ to $p = 1$, and up to 12-20% as depth increases from $p = 1$ to $p = 8$.

Fig. 7. (Panels: All Graphs; Positive-Weighted Graphs.) This figure shows how standard QAOA (dotted) and QAOA-warm (solid) perform as we alter the circuit depth and the number of nodes. For QAOA-warm, we take the best of 5 vertex-at-top rotations. For the left plot, for each $n = 2, \dots, 12$, we find the instance-specific approximation ratio achieved for both standard QAOA and QAOA-warm for each $n$-node instance in $\mathcal{G}$ (see Section 4.1), and take the median of those instance-specific approximation ratios. The right plot is constructed similarly, except only instances in $\mathcal{G}$ with positive edge-weights are considered. We plot the results for circuit depths $p = 1, 2, 4, 8$.
The endpoints of the optimization trajectories on the resultant landscape are scattered, and the ridge-like shape is not as pronounced. When performing a uniform rotation, the globally optimal solution qualities are comparable to the solution qualities when rotating vertex 1 to the top; however, the landscape retains some less symmetric peaks and valleys, and some of the trajectories end at local optima that are far from optimal.

Overall, we see that the rotation used in the preprocessing stage can have a considerable effect on both the shape of the landscape and the solution qualities. Ideally, with a good choice of rotation, the parameter landscape has a ridge-like shape with high solution qualities near the line $\beta = 0$, in which case $\gamma = \beta = 0$ is a natural choice of initialization when running QAOA-warm.

Fig. 8. (Panels: No Warm Start; BM-MC, Vertex 2 to Top; BM-MC, Vertex 1 to Top; BM-MC, Uniform Rotation.) Parameter landscapes for $G$ (top-left) with corresponding SDP solution (bottom-left). For each trajectory of optimization of the variational parameters, we use a black circle to denote the beginning of the trajectory and a white × to denote the end of the trajectory. When no warm start is used, there are many peaks and valleys (top-center). When vertex 1 is rotated to the top, we have a ridge-like landscape with the optimal solutions occurring on the horizontal line $\beta = 0$ (bottom-center). When rotating vertex 2 to the top instead, the parameter landscape is less ridge-like and the endpoints of the trajectories are more scattered (top-right). When using a uniform rotation, we have peaks and valleys similar to when no warm-start was used, but with overall better solution qualities (bottom-right).

To quantify the flatness of the parameter landscapes when using warm-starts, we consider some simple aggregate statistics of the landscapes of all unit-weight graphs in $\mathcal{G}$.
For each graph, we view each point in the parameter landscape as producing an instance-specific approximation ratio in [0, 1]. We compute the minimum, maximum, and average instance-specific approximation ratios found across each landscape. As shown in Figure 9, QAOA-warm landscapes have a lower range of instance-specific approximation ratios, e.g., 80.4% of the instances have a range of at most 0.4 in the instance-specific approximation ratios attained in the landscape. This means that any two choices of the parameters $\gamma_1, \beta_1$ will produce solutions with a difference in instance-specific approximation ratio of at most 0.4. In contrast, only 27.5% of our graph instances have such a range of instance-specific approximation ratios for standard QAOA. We further see that when we use warm-starts, the overall quality of approximation across the parameter landscape improves. This can be seen by observing higher minimum, maximum, and average instance-specific approximation ratios than for standard QAOA.

Footnote: Due to the symmetries in the QAOA circuit for unit-weight graphs, we know that it suffices to check the values of $F_1(\gamma, \beta)$ for $(\gamma, \beta)$ in $[-\pi, \pi] \times [-\pi/4, \pi/4]$ [51].

Footnote: The minimum, maximum, and average are computed by considering a discretization of the landscape. In particular, we consider the values of $F_1(\gamma_1, \beta_1)$ for all $(\gamma_1, \beta_1) \in D = \left\{ \left( \frac{k\pi}{50}, \frac{\ell\pi}{4 \cdot 50} \right) : k = -50, -49, \dots, 50 \text{ and } \ell = -50, -49, \dots, 50 \right\}$.

Fig. 9. This figure shows how various statistics of the parameter landscape change with the variant of QAOA considered (standard QAOA, QAOA-warm with vertex-at-top rotations, and QAOA-warm with random rotations). For each unit-weight graph in our graph library $\mathcal{G}$ (see Section 4.1) and for each QAOA variant, we first generate the parameter landscape; we use a single rank-2 initialization for both rotation schemes considered for QAOA-warm.
For each landscape, we calculate the minimum, maximum, and average across the landscape in addition to the range (the difference between the highest and lowest instance-specific approximation ratio achieved in the landscape).

5 THEORETICAL BOUNDS

In this section, we theoretically analyze QAOA-warm and demonstrate its strengths and weaknesses compared to standard QAOA. The literature on provable approximation guarantees for QAOA is sparse. In 2014, Farhi et al. [17] proved a 0.6924-approximation for 3-regular graphs at $p = 1$; for triangle-free $d$-regular graphs, Wang et al. [49] demonstrated that depth-1 QAOA achieves an approximation ratio strictly above $1/2$, with a margin that shrinks as $d$ grows. Wurtz and Love extend Farhi et al.'s result to show that depth-2 QAOA achieves a 0.7559-approximation on 3-regular graphs (and depth-3 QAOA achieves a 0.7924-approximation ratio on 3-regular graphs under some conjectures). For higher circuit depths and for more general graphs, not much else is known about QAOA approximation bounds.

Our results add to this narrative. We first show that, at $p = 0$ (i.e., before any gates are applied beyond initialization), QAOA-warm on graphs with non-negative edge-weights achieves at least $0.75\kappa$ and $0.66\kappa$ approximations for the Max-Cut when using a $\kappa$-close BM-MC$_2$ and BM-MC$_3$ solution (which may correspond to distributions over cuts) respectively; for $\kappa > 2/3$ and $\kappa > 3/4$ (for rank-2 and rank-3 respectively), this results in an improvement from the $1/2$-approximation provided by standard QAOA at $p = 0$.

Footnote: In this context, we say that a local BM-MC$_k$ solution $x$ is $\kappa$-close if BM-MC$_k(x) \ge \kappa\,$Max-Cut$(G)$.
Though the worst-case results on approximation ratios for the Burer-Monteiro relaxation give trivial bounds for rank-2 and rank-3 solutions, $\kappa$ could be much higher in practice (e.g., in our simulations, we observed $\kappa \ge 0.999$ for all positive-weighted instances for BM-MC$_3$; the same can be said for BM-MC$_2$ with the exception of 19 instances, with the smallest $\kappa$ observed being $\kappa = 0.833$). Next, we discuss QAOA-warm's performance in the case where the initialization has a particular antipodal structure. We prove that such structures naturally arise when considering locally approximate BM-MC solutions for (connected) even cycles. For these cases, vertex-at-top rotations recover the optimal solution. For uniform rotations on these antipodal structures, one can achieve optimality using BM-MC solutions.

5.1 Approximation Bounds for QAOA-warm

We first show that a biased initialization (using classical algorithms) can improve the theoretically known performance of QAOA in some cases. A rank-$n$ SDP relaxation of Max-Cut is solvable in polynomial time. Obtaining an optimal rank-1 solution is the ultimate goal and seems to be the hardest rank-constrained problem. The higher the rank, the more tractable the problem becomes. We show that if one accomplishes the harder objective in the classical phase of QAOA-warm, the quantum phase will be better initialized. Our randomized mapping from a low-rank BM solution to the Bloch sphere is guaranteed to preserve the objective up to a factor of 0.75 and 0.66 at $p = 0$ for rank-2 and rank-3 initialization respectively.

Theorem 2. Let $G$ be a graph with non-negative edge weights. If $x$ is a $\kappa$-close solution to BM-MC$_3$ (for Max-Cut of $G$) in 3 dimensions, then (randomized) initialization of QAOA with $R_U(x)$ has a (worst-case) approximation ratio of $0.66\kappa$ at $p = 0$, i.e., only using quantum sampling with zero circuit depth for QAOA.
Similarly, if $x$ is a $\kappa$-close solution of BM-MC$_2$ (for Max-Cut of $G$) in 2 dimensions, then initialization of QAOA with the quantum state obtained from $x$ gives a $0.75\kappa$-approximate solution at $p = 0$. Note that these approximation factors are lower bounds on the expected fraction of the Max-Cut obtained via random sampling. (In the worst case, it is known that a local optimum (up to second order) for a rank-$k$ formulation of BM-MC is at least a $\kappa = 1 - \frac{1}{k-1}$ approximation of the rank-$n$ SDP relaxation for graphs with non-negative edge weights (Theorem 1) [38].)

Proof. We start by proving the $2/3$ performance of a randomized mapping from BM-MC$_3$ to the Bloch sphere. Let $F'_0 = F'_0(\gamma, \beta)$ be the expected value of Max-Cut obtained by quantum sampling (i.e., QAOA for $p = 0$). Then,

$$\frac{F'_0}{\text{Max-Cut}(G)} \geq \kappa \cdot \frac{F'_0}{\text{BM-MC}_3(x)} \qquad (\text{since } \text{BM-MC}_3(x) \geq \kappa\,\text{Max-Cut}(G))$$

$$\geq \kappa \min_{(i,j) \in E} \frac{\mathbb{E}[\mathbb{1}[i \text{ and } j \text{ have different spins}]]}{\frac{1}{4}\|x_i - x_j\|^2} \qquad \left(\tfrac{a+b}{c+d} \geq \min\left(\tfrac{a}{c}, \tfrac{b}{d}\right) \text{ for } a, b, c, d \geq 0;\ \text{the } w_{ij}\text{'s cancel}\right).$$

It suffices to lower bound the ratio between the edge-wise contribution from quantum sampling and the edge-wise contribution to the semidefinite objective (which upper bounds the BM-MC denominator). Instead of rotating the sphere, we can choose a random direction $r \in S^2$ to correspond to the positive spin of the Bloch sphere. Consider an edge $e = (i, j) \in E$ whose endpoints are at angle $\theta$ on $S^2$ with respect to $x$, i.e., $x_i \cdot x_j = \cos\theta$. Let $\theta_1$ and $\theta_2$ correspond to the angles from $x_i$ and $x_j$ to the positive spin $r$ of the (rotated) sphere. We can write

$$\min_{(i,j) \in E} \frac{\mathbb{E}[\mathbb{1}[i \text{ and } j \text{ have different spins}]]}{\frac{1}{4}\|x_i - x_j\|^2} \geq \min_{\theta \in [0, \pi]} \frac{\mathbb{E}_{\theta_1, \theta_2 \mid \theta}[f(\theta_1, \theta_2)]}{\sin^2(\theta/2)},$$

where

$$f(\theta_1, \theta_2) = \cos^2\frac{\theta_1}{2}\,\sin^2\frac{\theta_2}{2} + \cos^2\frac{\theta_2}{2}\,\sin^2\frac{\theta_1}{2},$$
where we replaced $\mathbb{E}[\mathbb{1}[i \text{ and } j \text{ have different spins}]]$ by a sum of the probabilities of the two cases corresponding to the assignment of different spins to $i$ and $j$, formulated considering that the state is a product state, and observing that $\|x_i - x_j\|^2 = 2 - 2\cos\theta = 4\sin^2(\theta/2)$. We can rewrite the above as

$$\min_{\theta \in [0, \pi]} \frac{\mathbb{E}_{\theta_1, \theta_2 \mid \theta}[g_1(\theta_1) g_2(\theta_2) + g_2(\theta_1) g_1(\theta_2)]}{2(1 - \cos\theta)} = \min_{\theta \in [0, \pi]} \frac{\mathbb{E}_{\theta_1, \theta_2 \mid \theta}[1 - \cos\theta_1 \cos\theta_2]}{1 - \cos\theta},$$

where $g_1(\theta) = 1 + \cos\theta$ and $g_2(\theta) = 1 - \cos\theta$.

To further simplify the notation of our optimization problem, let us assume that instead of rotating $x_i$ and $x_j$ and sampling with respect to a spin direction, we randomly choose the positive spin pivot such that the $z$-axis is now rotated to be at $r \in S^2$. Without loss of generality, assume $x_i = (1, 0, 0)$, $x_j = (\cos\theta, \sin\theta, 0)$, and $r = (\cos\alpha, \sin\alpha\cos\phi, \sin\alpha\sin\phi) \in S^2$ is uniformly sampled from the sphere. Let $h(\theta, \alpha, \phi) = \cos\alpha\,(\cos\theta\cos\alpha + \sin\theta\sin\alpha\cos\phi)$, so that $h = \cos\theta_1\cos\theta_2$. This gives us the following:

$$\min_{\theta \in [0, \pi]} \frac{\mathbb{E}_{\theta_1, \theta_2 \mid \theta}[1 - \cos\theta_1\cos\theta_2]}{1 - \cos\theta} = \min_{\theta \in [0, \pi]} \frac{\frac{1}{4\pi}\int_0^{\pi}\int_0^{2\pi} \left(1 - h(\theta, \alpha, \phi)\right)\sin\alpha \, d\phi \, d\alpha}{1 - \cos\theta}$$

$$= \min_{\theta \in [0, \pi]} \frac{1 - \frac{\cos\theta}{2}\int_0^{\pi}\cos^2\alpha\,\sin\alpha \, d\alpha}{1 - \cos\theta} = \min_{\theta \in [0, \pi]} \frac{1 - \frac{\cos\theta}{2}\left[-\frac{\cos^3\alpha}{3}\right]_0^{\pi}}{1 - \cos\theta} = \min_{\theta \in [0, \pi]} \frac{1 - \frac{\cos\theta}{3}}{1 - \cos\theta} = \frac{2}{3}.$$

This finishes the proof for BM-MC$_3$.

Recall that for BM-MC$_2$, we perform a uniformly random rotation along a unit circle on the Bloch sphere passing through $|0\rangle$ and $|1\rangle$. The proof is similar to the rank $k = 3$ case, and easier. It suffices to lower bound the following ratio by 0.75:

$$\min_{(i,j) \in E} \frac{\mathbb{E}[\mathbb{1}[i \text{ and } j \text{ have different spins}]]}{\frac{1}{4}\|x_i - x_j\|^2} \geq \min_{\theta \in [0, \pi]} \frac{f(\theta)}{\sin^2(\theta/2)}.$$

Here $f(\theta)$ denotes the probability that two unentangled qubits at (angular) distance $\theta$ over the sphere/circle are measured with opposite spins.
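The rank-3 ratio derived above can be sanity-checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the double integral over the pivot direction with a midpoint rule and compares it with the closed form $(1 - \cos\theta/3)/(1 - \cos\theta)$. The function names and grid sizes are illustrative assumptions.

```python
import math


def rank3_ratio(theta, n_alpha=200, n_phi=200):
    """Midpoint-rule estimate of E[1 - cos(t1) cos(t2)] / (1 - cos(theta)),
    where the pivot r is uniform on the unit sphere, x_i = (1, 0, 0) and
    x_j = (cos(theta), sin(theta), 0)."""
    d_alpha = math.pi / n_alpha
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_alpha):
        alpha = (a + 0.5) * d_alpha
        cos_t1 = math.cos(alpha)  # r . x_i
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            # r . x_j for r = (cos a, sin a cos phi, sin a sin phi)
            cos_t2 = (math.cos(theta) * math.cos(alpha)
                      + math.sin(theta) * math.sin(alpha) * math.cos(phi))
            total += (1.0 - cos_t1 * cos_t2) * math.sin(alpha) * d_phi * d_alpha
    expected = total / (4.0 * math.pi)  # normalize by the sphere's surface area
    return expected / (1.0 - math.cos(theta))


def rank3_closed_form(theta):
    """Closed form obtained in the derivation above."""
    return (1.0 - math.cos(theta) / 3.0) / (1.0 - math.cos(theta))


if __name__ == "__main__":
    for k in (2, 4, 6, 8):
        theta = math.pi * k / 8
        print(f"theta = {theta:.3f}: numeric = {rank3_ratio(theta):.4f}, "
              f"closed form = {rank3_closed_form(theta):.4f}")
```

The ratio is smallest at $\theta = \pi$, i.e., for antipodal endpoints, which is where the $2/3$ factor comes from.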
As in the previous proof, we can simplify the ratio to

$$\min_{\theta \in [0, \pi]} \frac{\mathbb{E}_{\theta_1, \theta_2 \mid \theta}[1 - \cos\theta_1\cos\theta_2]}{1 - \cos\theta},$$

where $\theta_1$ and $\theta_2$ are the angles between the two vertices and the pivot. Again, we can think of the vertices as fixed over the sphere and randomly rotate the $|1\rangle$ pivot. Without loss of generality, let $x_i = (1, 0)$ and $x_j = (\cos\theta, \sin\theta)$. The random pivot can be formulated as $(\cos\alpha, \sin\alpha)$, where $\alpha$ is uniformly distributed over $[0, 2\pi)$. We can write $\theta_1 = \alpha$ and $\theta_2 = \alpha - \theta$. The target ratio can be written as

$$\min_{\theta \in [0, \pi]} \frac{\frac{1}{2\pi}\int_0^{2\pi}\left(1 - \cos\alpha\,\cos(\alpha - \theta)\right) d\alpha}{1 - \cos\theta} = \min_{\theta \in [0, \pi]} \frac{1 - \frac{1}{2\pi}\int_0^{2\pi}\cos^2\alpha\,\cos\theta \, d\alpha + 0}{1 - \cos\theta} = \min_{\theta \in [0, \pi]} \frac{1 - \frac{1}{2}\cos\theta}{1 - \cos\theta} = \frac{3}{4}. \qquad \square$$

In addition to preserving a provable fraction of the classical BM-MC objective, QAOA-warm can provably outperform standard QAOA on certain families of graphs. In particular, for even cycles, we show that any locally optimal BM-MC solution is also a globally optimal rank-1 solution. Hence the preprocessing stage yields a warm-start that is simply a collection of antipodal points corresponding to the Max-Cut, which can easily be recovered with a suitable rotation. On the other hand, depth-$p$ standard QAOA is only able to achieve a theoretical worst-case approximation ratio of at most $(2p + 1)/(2p + 2)$ [17, 49]. More details regarding the behavior of QAOA-warm on warm-starts with antipodal structures can be found in Appendix C.

5.2 Limitations of QAOA-warm

We now look at some of the limitations of QAOA-warm. Specifically, we observe a decrease in performance due to warm-starts close to the eigenstates of the mixer with zero eigenvalue.

5.2.1 QAOA-warm at High Circuit Depth. In the case of the standard initialization for QAOA, we know that with the optimal choice of parameters $\gamma, \beta$, the probability of sampling the Max-Cut (with a single measurement) approaches 1 as the circuit depth $p$ approaches infinity.
This is not the case for QAOA-warm:

Theorem 3. There exists a graph $G$ and a warm-start initialization (of $G$) such that for all $p \geq 0$, depth-$p$ QAOA-warm (with any choice of variational parameters $\gamma$ and $\beta$) results in $F_p(\gamma, \beta) = \frac{1}{2}\,$Max-Cut$(G)$, or in other words, the expected cut obtained via quantum sampling is $\frac{1}{2}\,$Max-Cut$(G)$.

Proof idea. Let $G$ be a graph on two vertices connected by an edge of unit weight. Suppose that we run QAOA-warm starting with the state $|s\rangle := |a\rangle \otimes |b\rangle$, where $|a\rangle := |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|b\rangle := |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. Note that $|s\rangle$ corresponds to an optimal solution to BM-MC. For $p = 0$, it is easy to show that we obtain, in expectation, 50% of the Max-Cut of $G$. After the cost term of the circuit is applied, the resulting state is unaffected by the mixing term (since the resulting state is an eigenstate of the mixing Hamiltonian with eigenvalue zero), and thus there is no change in measurement. Even higher circuit depths have no effect in driving the system out of the eigenstate, and thus $F_p(\gamma, \beta) = 0.5$ for all $p$ and any $\gamma, \beta$, i.e., QAOA-warm (initialized with $|s\rangle$) obtains only 50% of the Max-Cut of $G$ (in expectation) regardless of circuit depth or choice of variational parameters. $\square$

We include a complete proof in Appendix B. The previous theorem shows that QAOA-warm may achieve poor instance-specific approximation ratios on specific initial states; however, we next discuss that this behavior is consistent across slight perturbations around this state as well. In Figure 10, at any point $(\theta, \phi)$ we depict the percentage of the Max-Cut obtained using the optimal choice of variational parameters if the initial state of the first qubit is given by the polar and azimuthal angles $\theta$ and $\phi$ and the second qubit is diametrically opposed. Note that the optimal Max-Cut is achieved with probability 1 only when both vertices lie in the $yz$-plane. The worst case occurs when the vertices lie on the $x$-axis; this is consistent with Theorem 3.
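The statement of Theorem 3 can be reproduced with a tiny state-vector simulation of the one-edge graph. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, the cost Hamiltonian is taken to assign value 1 to $|01\rangle$ and $|10\rangle$, and the mixer is taken to be the standard $X_1 + X_2$.

```python
import cmath
import math
import random

# Basis order: index = 2 * q1 + q2, i.e. [|00>, |01>, |10>, |11>].
PLUS_MINUS = [0.5, -0.5, 0.5, -0.5]  # |+>|-> = (|00> - |01> + |10> - |11>) / 2
PLUS_PLUS = [0.5, 0.5, 0.5, 0.5]     # standard QAOA initial state |+>|+>


def apply_cost(state, gamma):
    """Apply exp(-i*gamma*H_C) for one unit-weight edge: the cut states
    |01> and |10> pick up the phase exp(-i*gamma)."""
    phase = cmath.exp(-1j * gamma)
    return [state[0], phase * state[1], phase * state[2], state[3]]


def apply_mixer(state, beta):
    """Apply exp(-i*beta*(X_1 + X_2)) as R (x) R with
    R = cos(beta) I - i sin(beta) X on each qubit."""
    c, s = math.cos(beta), -1j * math.sin(beta)
    # Rotate qubit 1 (most significant bit): pairs (0, 2) and (1, 3).
    a = [c * state[0] + s * state[2], c * state[1] + s * state[3],
         c * state[2] + s * state[0], c * state[3] + s * state[1]]
    # Rotate qubit 2 (least significant bit): pairs (0, 1) and (2, 3).
    return [c * a[0] + s * a[1], c * a[1] + s * a[0],
            c * a[2] + s * a[3], c * a[3] + s * a[2]]


def qaoa_expected_cut(initial, gammas, betas):
    """Expected cut value after len(gammas) QAOA layers on the one-edge graph."""
    state = list(initial)
    for gamma, beta in zip(gammas, betas):
        state = apply_mixer(apply_cost(state, gamma), beta)
    return abs(state[1]) ** 2 + abs(state[2]) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    for depth in range(4):
        gammas = [rng.uniform(0, 2 * math.pi) for _ in range(depth)]
        betas = [rng.uniform(0, 2 * math.pi) for _ in range(depth)]
        print(depth, round(qaoa_expected_cut(PLUS_MINUS, gammas, betas), 12))
```

For comparison, calling `qaoa_expected_cut(PLUS_PLUS, [math.pi / 2], [math.pi / 8])` shows that the standard initialization already reaches the full cut on this graph at depth 1, whereas the $|{+}{-}\rangle$ warm-start stays at one half for every parameter choice.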
In general, in expectation, there is a larger gap to optimality the closer the solution $x$ is to the $x$-axis, which suggests that it is reasonable to embed the approximate solutions of BM-MC$_2$ in the $yz$-plane of the Bloch sphere (as done in our preprocessing stage). Lastly, we believe that this behavior is consistent at larger circuit depths as well.

Fig. 10. A plot of the percentage of the Max-Cut achieved with QAOA-warm (when the optimal variational parameters are chosen) with $p = 1$ for a one-edge graph $G$ at various starting states obtained from solutions $x$, where one point of $x$ has polar angle $\theta$ and azimuthal angle $\phi$ and the remaining point is diametrically opposed. The starting states that perform the worst, i.e., $|{+}{-}\rangle$ and $|{-}{+}\rangle$, are marked with a black ×. For each point in the figure, the optimal variational parameters were estimated by performing a dense grid search over the variational parameter space.

6 DISCUSSION

In this work, we proposed using classical approximate solutions to low-rank Max-Cut formulations to initialize the QAOA algorithm. There are significant differences between classical approximation algorithms for Max-Cut and quantum algorithms. For example, in the classical approach, vertices that share the same 3-dimensional representation on the sphere will always be on the same side of the cut (no matter which hyperplane is selected). In contrast, quantum sampling creates a very different distribution (with a larger support) over cuts, wherein vertices with the same state can be sampled on different sides of the cut. Despite this difference, we observe that as the angle $\theta$ of the vertices to the measurement axis approaches 0, the probability distribution of the classical solution approaches that of the quantum sampling.
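The contrast between hyperplane rounding and quantum sampling can be made concrete with a small sketch. It assumes unentangled qubits measured in the computational basis, so that a qubit with polar angle $\theta$ yields outcome 0 with probability $\cos^2(\theta/2)$; the function name is a hypothetical one chosen for illustration.

```python
import math


def prob_different_spins(theta_i, theta_j):
    """Probability that two unentangled qubits with polar angles theta_i and
    theta_j give different outcomes when measured in the computational basis."""
    p0_i = math.cos(theta_i / 2) ** 2  # P(qubit i is measured as 0)
    p0_j = math.cos(theta_j / 2) ** 2  # P(qubit j is measured as 0)
    return p0_i * (1 - p0_j) + p0_j * (1 - p0_i)


if __name__ == "__main__":
    # Two vertices with the SAME state: hyperplane rounding never separates
    # them, while quantum sampling separates them with probability sin^2(theta)/2.
    for theta in (math.pi / 2, math.pi / 4, math.pi / 16, 0.0):
        split = prob_different_spins(theta, theta)
        print(f"theta = {theta:.4f}: P(different sides) = {split:.4f}")
```

As the shared polar angle shrinks toward 0, the separation probability vanishes, matching the observation that the two distributions agree in that limit.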
Intuitively, as vertices start clustering at the antipodes on the 3-dimensional sphere, quantum sampling of the corresponding qubits and hyperplane rounding of the 3-dimensional representation both give similar cuts. Moreover, SDP-based solutions spread adjacent vertices (with positive edge weights) as far apart as possible on the $k$-dimensional sphere, which can be beneficial for quantum sampling as well.

Standard QAOA is a local algorithm [17]. If the circuit depth $p$ is not high enough, then standard QAOA may fail to achieve near-optimal solutions [9, 16]. However, when one considers the preprocessing stage used in QAOA-warm, such a locality property no longer exists. A clear example of this is BM-MC$_2$ applied to an odd cycle: the optimal solution consists of the vertices evenly spaced apart along the unit circle. However, if a single edge is deleted, the optimal solution collapses to a rank-1 solution. The edge deletion has a global effect on the positions of all the vertices and, consequently, on the probability of each edge being cut. Put another way, although the quantum operations in QAOA-warm are still local, the warm-start encodes information about the global structure of the graph, in which case building up correlations between distant qubits (via a high circuit depth) may not be necessary if a high-quality warm-start is used.

Warm-starts also appear to flatten the energy landscape in terms of $(\gamma, \beta)$. In the most extreme case (for example, Figure 14(b)), the warm-start finds the optimal solution, completely decoupling the QAOA optimization loop from $\gamma_1$ and the cost Hamiltonian $H_C$. Even when this does not occur, warm-starting still appears to make QAOA less sensitive to initial $(\gamma, \beta)$ values by starting off in the neighborhood of a possible solution. In particular, the role of $\gamma$ is diminished, as the warm-start has already begun optimizing the cost energy.
This suggests that QAOA-warm serves as a kind of dimensional reduction, emphasizing the amplitude manipulation of the mixer over the energy weighting of the cost Hamiltonian. This is not a guarantee that the QAOA optimization will find the optimal solution in the reduced space; the reduction may hide the optimal solution for graphs that are especially challenging for SDP solvers. However, this flattening may prove important for physical implementations of QAOA. The warm-start flattened landscapes may make QAOA more robust to both classical and quantum noise that would otherwise complicate the optimal solution search.

In this work, we restricted our attention to rank-2 and rank-3 initializations, whereas in classical methods, one could also attempt to find rank-$k$ ($k > 3$) solutions. These solutions are easier to find and yield provably better approximations as $k$ increases [38]. However, increasing the number of dimensions makes the mapping to quantum states non-trivial. Exploration of higher-rank approximations is left as a future research direction.

Another direction for future work is to apply QAOA-warm to other combinatorial problems. One path is the reduction of other problems in NP to Max-Cut [26]. Alternatively, Quadratic Unconstrained Binary Optimization (QUBO) problems can easily be recast as a Max-Cut problem (and vice versa) with the number of variables differing by at most 1 [14]. Many combinatorial problems are easily expressed in the form of a QUBO [21, 34], so Max-Cut is also an interesting problem to consider from a practical standpoint. However, it may be of interest to see if our approach can be used directly for other combinatorial problems without resorting to such reductions.
Lastly, as seen in Section 5.2, we acknowledge that QAOA-warm, in its current form, has limitations; in particular, increased circuit depth does not necessarily yield optimality of the Max-Cut in the limit. Nonetheless, we believe that QAOA-warm is a promising approach since, even at low circuit depth, it is able to start with relatively high instance-specific approximation ratios (compared to standard QAOA). This performance may be extendable to higher circuit depth via modifications to the mixing Hamiltonian $H_B$; this idea yields positive results in Egger et al.'s work [15], and we believe that mixer modifications may be beneficial to QAOA-warm as well.

7 CONCLUSION

We explored the idea of warm-starts for initializing the quantum state of the QAOA algorithm, and showed promising experimental and theoretical results for low-rank initializations using approximate SDP solutions. On average, we find that QAOA-warm performs better in terms of time and quality of solutions in low-depth circuits, compared to standard QAOA. Moreover, even though a portion of the instance-specific approximation ratios of QAOA-warm can be attributed to the classical warm-start itself, we find that running QAOA-warm introduces significant improvements in expected cut quality beyond simply (quantum) sampling the initial warm-start state for many instances. As the circuit depth increases, however, QAOA-warm is unable to converge to the optimal solution (unlike standard QAOA). We believe that this could be remedied by considering further modifications to QAOA-warm (e.g., modifying the mixing Hamiltonian), although standard mixers might provide easier implementation on certain hardware. We further acknowledge that beyond the instance-specific approximation ratio, there are a variety of methods and metrics with which to measure the performance of QAOA and its possible variants.
We leave such an exploration of the cut distributions (and metrics on those distributions) for potential future work; we refer the reader to a paper by Herrman et al. for such results on standard QAOA [25]. Overall, we believe that the use of the standard mixers with warm-starts allows a principled way of bringing information from classical solvers into quantum algorithms. The concept of warm-starts and the plateauing of the quality of the instance-specific approximation at higher depth $p$ could be of interest to researchers looking at the reachability of the solution state space and at the limitations and strengths of standard QAOA itself.

ACKNOWLEDGMENTS

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001120C0046.

REFERENCES

[1] Vishwanathan Akshay, Hariphan Philathong, Mauro E.S. Morales, and Jacob D. Biamonte. 2019. Reachability Deficits in Quantum Approximate Optimization. arXiv preprint arXiv:1906.11259v2 (2019).
[2] Andreas Bärtschi and Stephan Eidenbenz. 2020. Grover Mixers for QAOA: Shifting Complexity from Mixer Design to State Preparation. arXiv preprint arXiv:2006.00354 (2020).
[3] Alexander I. Barvinok. 1995. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry 13, 2 (1995), 189–202.
[4] Dimitris Bertsimas, Angela King, and Rahul Mazumder. 2016. Best subset selection via a modern optimization lens. The Annals of Statistics 44, 2 (2016), 813–852.
[5] Samuel Burer and Renato DC Monteiro. 2005. Local minima and convergence in low-rank semidefinite programming. Mathematical Programming 103, 3 (2005), 427–444.
[6] Nicolas Boumal, Vladislav Voroninski, and Afonso S. Bandeira. 2018. The non-convex Burer–Monteiro approach works on smooth semidefinite programs. arXiv preprint arXiv:1606.04970 (2018).
[7] Nicolas Boumal, Vladislav Voroninski, and Afonso S Bandeira. 2020. Deterministic Guarantees for Burer-Monteiro Factorizations of Smooth Semidefinite Programs. Communications on Pure and Applied Mathematics 73, 3 (2020), 581–608.
[8] Fernando G.S.L. Brandao, Michael Broughton, Edward Farhi, Sam Gutmann, and Hartmut Neven. 2018. For Fixed Control Parameters the Quantum Approximate Optimization Algorithm's Objective Function Value Concentrates for Typical Instances. arXiv preprint arXiv:1812.04170 (2018).
[9] Sergey Bravyi, Alexander Kliesch, Robert Koenig, and Eugene Tang. 2019. Obstacles to state preparation and variational optimization from symmetry protection. arXiv preprint arXiv:1910.08980 (2019).
[10] Samuel Burer and Renato DC Monteiro. 2003. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming 95, 2 (2003), 329–357.
[11] Samuel Burer, Renato D. C. Monteiro, and Yin Zhang. 2001. Rank-2 Relaxation Heuristics for Max-Cut and Other Binary Quadratic Programs. SIAM Journal on Optimization 12, 2 (2001), 503–521.
[12] M. Charikar and A. Wirth. 2004. Maximizing quadratic programs: extending Grothendieck's inequality. In 45th Annual IEEE Symposium on Foundations of Computer Science. 54–60. https://doi.org/10.1109/FOCS.2004.39
[13] C. Delorme and S. Poljak. 1993. Laplacian eigenvalues and the maximum cut problem. Mathematical Programming 62 (1993), 557–574.
[14] Iain Dunning, Swati Gupta, and John Silberholz. 2018. What works best when? A systematic evaluation of heuristics for Max-Cut and QUBO. INFORMS Journal on Computing 30, 3 (2018), 608–624.
[15] Daniel J Egger, Jakub Marecek, and Stefan Woerner. 2020. Warm-starting quantum optimization. arXiv preprint arXiv:2009.10095 (2020).
[16] Edward Farhi, David Gamarnik, and Sam Gutmann. 2020. The quantum approximate optimization algorithm needs to see the whole graph: A typical case. arXiv preprint arXiv:2004.09002 (2020).
[17] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. 2014. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028 (2014).
[18] William Feller. 1971. An Introduction to Probability Theory and Its Applications (3rd ed.). Vol. 2. Wiley, Hoboken, New Jersey.
[19] Roger Fletcher. 1987. Practical Methods of Optimization (2nd edition). John Wiley and Sons, New York, NY, USA.
[20] Fuchang Gao and Lixing Han. 2012. Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications 51 (2012), 259–277.
[21] Fred Glover, Gary Kochenberger, and Yu Du. 2019. Quantum Bridge Analytics I: A Tutorial on Formulating and Using QUBO Models. arXiv preprint arxiv:1811.11538 (2019).
[22] Michel X Goemans and David P Williamson. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM) 42, 6 (1995), 1115–1145.
[23] Aric Hagberg, Pieter Swart, and Daniel S Chult. 2008. Exploring network structure, dynamics, and function using NetworkX. Technical Report. Los Alamos National Lab. (LANL), Los Alamos, NM (United States).
[24] Johan Håstad. 2001. Some optimal inapproximability results. Journal of the ACM (JACM) 48, 4 (2001), 798–859.
[25] Rebekah Herrman, Lorna Treffert, James Ostrowski, Phillip C Lotshaw, Travis S Humble, and George Siopsis. 2021. Impact of graph structures for QAOA on MaxCut. Quantum Information Processing 20, 9 (2021), 1–21.
[26] Richard M. Karp. 1972. Reducibility among Combinatorial Problems. Springer US, Boston, MA, 85–103.
[27] William Karush. 1939. Minima of functions of several variables with inequalities as side conditions. Master's thesis. University of Chicago.
[28] Subhash Khot. 2002. On the power of unique 2-prover 1-round games. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. 767–775.
[29] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell. 2007.
Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? SIAM J. Comput. 37, 1 (2007), 319–357.
[30] Diederik Kingma and Jimmy Lei Ba. 2017. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2017).
[31] Harold W. Kuhn and Albert W. Tucker. 1951. Nonlinear Programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, Calif., 481–492.
[32] Wim Lavrijsen, Ana Tudor, Juliane Müller, Costin Iancu, and Wibe de Jong. 2021. Classical Optimizers for Noisy Intermediate-Scale Quantum Devices. arXiv preprint arXiv:2004.030043 (2021).
[33] Yin Tat Lee and Swati Padmanabhan. 2019. An $\widetilde{O}(m/\varepsilon^{3.5})$-Cost Algorithm for Semidefinite Programs with Diagonal Constraints. arXiv preprint arXiv:1903.01859 (2019).
[34] Bas Lodewijks. 2020. Mapping NP-hard and NP-complete Optimization Problems to Quadratic Unconstrained Binary Optimization Problems. arXiv preprint arxiv:1911.08043 (2020).
[35] Russell Lyons and Yuval Peres. 2017. Probability on trees and networks. Vol. 42. Cambridge University Press.
[36] Tobia Marcucci and Russ Tedrake. 2020. Warm start of mixed-integer programs for model predictive control of hybrid systems. IEEE Trans. Automat. Control (2020). https://doi.org/10.1109/TAC.2020.3007688
[37] Glen Bigan Mbeng, Rosario Fazio, and Giuseppe E. Santoro. 2019. Quantum Annealing: a journey through Digitalization, Control, and hybrid Quantum Variational schemes. arXiv preprint arXiv:1906.08948 (2019).
[38] Song Mei, Theodor Misiakiewicz, Andrea Montanari, and Roberto I Oliveira. 2017. Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality. arXiv preprint arXiv:1703.08729 (2017).
[39] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz. 2005.
Noise stability of functions with low influences: invariance and optimality. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05). IEEE, 21–30.
[40] Yurii Nesterov and Arkadi Nemirovski. 1994. Interior-point polynomial algorithms in convex programming. SIAM, Philadelphia, PA, USA.
[41] Gábor Pataki. 1998. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Mathematics of Operations Research 23, 2 (1998), 339–358.
[42] Svatopluk Poljak and Franz Rendl. 1995. Nonpolyhedral Relaxations of Graph-Bisection Problems. SIAM Journal on Optimization 5, 3 (1995), 467–487.
[43] M.J.D. Powell. 1994. A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation. Advances in Optimization and Numerical Analysis 275 (1994), 51–67.
[44] John Preskill. 2018. Quantum Computing in the NISQ era and beyond. Quantum 2 (2018), 79.
[45] Ted Ralphs and Menal Güzelsoy. 2006. Duality and warm starting in integer programming. In Proceedings of 2006 NSF Design, Service, and Manufacturing Grantees and Research Conference.
[46] Stefan H. Sack and Maksym Serbyn. 2021. Quantum Annealing Initialization of the Quantum Approximate Optimization Algorithm. arXiv preprint arXiv:2101.05742 [quant-ph] (2021).
[47] The Sage Developers. 2020. SageMath, the Sage Mathematics Software System. https://www.sagemath.org.
[48] Luca Trevisan, Gregory B Sorkin, Madhu Sudan, and David P Williamson. 2000. Gadgets, approximation, and linear programming. SIAM J. Comput. 29, 6 (2000), 2074–2097.
[49] Zhihui Wang, Stuart Hadfield, Zhang Jiang, and Eleanor G Rieffel. 2018. Quantum approximate optimization algorithm for MaxCut: A fermionic view. Physical Review A 97, 2 (2018), 022304.
[50] Max Wilson, Rachel Stromswold, Filip Wudarski, Stuart Hadfield, Norm M. Tubman, and Eleanor G. Rieffel. 2021. Optimizing quantum heuristics with meta-learning. Quantum Machine Intelligence 3, 13 (2021).
[51] Leo Zhou, Sheng-Tao Wang, Soonwon Choi, Hannes Pichler, and Mikhail D Lukin. 2018. Quantum approximate optimization algorithm: performance, mechanism, and implementation on near-term devices. arXiv preprint arXiv:1812.01041 (2018).
[52] Linghua Zhu, Ho Lun Tang, George S. Barron, F.A. Calderon-Vargas, Nicholas J. Mayhall, Edwin Barnes, and Sophia E. Economou. 2020. An Adaptive Quantum Approximate Optimization Algorithm for Solving Combinatorial Problems on a Quantum Computer. arXiv preprint arXiv:2005.10258 [quant-ph] (2020).

A COMPUTATIONAL DETAILS

A.1 Details of Rotations

A.1.1 Random vertex-at-top. We first describe the rotation in 3 dimensions for a vertex $x_i = (\sin\theta_i\cos\phi_i, \sin\theta_i\sin\phi_i, \cos\theta_i)^T$, which is sampled uniformly at random (for $i \in [n]$). The rotation that maps $x_i \in \mathbb{R}^3$ to $(0, 0, 1)$ is obtained by first rotating clockwise about the $z$-axis by $\phi_i$, followed by a clockwise rotation about the $y$-axis by $\theta_i$, followed by a rotation about the $z$-axis by an angle chosen uniformly at random in $[0, 2\pi]$. Indeed, one can check that this rotation maps $x_i$ to $(0, 0, 1)^T$ (which corresponds to the quantum state $|0\rangle$ on the Bloch sphere). For rank-2 solutions, with a vertex $x_i = (\cos\theta_i, \sin\theta_i)$ sampled uniformly at random from $i \in [n]$, we can simply work with polar coordinates and shift all polar angles to obtain the random vertex-at-top rotation. To be precise, we set $\theta'_j = \theta_j - \theta_i$, where $\theta_j$ denotes the angle of the point corresponding to the $j$th vertex in the rank-2 solution.

A.1.2 Uniform at Random Rotation. In this case, we uniformly pick a rotation of the $k$-dimensional sphere and apply it. For rank-3 solutions, one way to accomplish this is by picking a point $\hat{x}$ uniformly at random from the surface of the sphere, rotating $\hat{x}$ to the top of the sphere (in a way similar to the vertex-at-top rotations), and then performing a uniform random rotation in $[0, 2\pi]$ about the $z$-axis. Such an $\hat{x}$ can be generated by picking $u, v$ uniformly at random from the interval $[0, 1]$ and then setting $\phi = 2\pi u$ and $\theta = \arccos(2v - 1)$. The pair $(\theta, \phi)$ will then correspond to the polar and azimuthal angles of the randomly chosen point on the surface of the sphere [18]. For rank-2 solutions, we can simply shift all the angles by some random angle. More precisely, set $\theta'_j = \theta_j + \psi$, where $\theta_j$ denotes the angle of the point corresponding to the $j$th vertex in the rank-2 solution and $\psi$ is chosen uniformly at random in $[0, 2\pi]$.

A.2 Finding approximate BM-MC solutions. In Algorithm 2, we describe our implementation for obtaining the semidefinite programming (SDP) solution for BM-MC$_k$ for $k = 2, 3$ using coordinate ascent. In the algorithm below, we write $U(a, b)$ to denote the uniform distribution on the interval $[a, b]$ (where $a, b \in \mathbb{R}$ with $a < b$). We set $\epsilon = 1/20$ for the experiments in this work. We normalize the angles output by BM-MC$_k$ to enforce the standard range of angles for spherical coordinates without changing the objective value.

Algorithm 2: Obtain Solution for BM-MC$_k$
Input: Weighted graph $G = (V, E)$, $w : E \to \mathbb{R}$, $k \in \{2, 3\}$
  If $k = 2$, let $\theta_1, \ldots, \theta_n \in \mathbb{R}$ be the angles of $n$ points chosen uniformly at random on the 2-dimensional unit circle. If $k = 3$, let $(\theta_1, \phi_1), \ldots, (\theta_n, \phi_n)$ be the spherical coordinates of $n$ points chosen uniformly at random on the 3-dimensional sphere.
  repeat
    for $i = 1$ through $n$ do   // coordinate ascent
      Sample the perturbation value(s) $\Delta\theta$ (and $\Delta\phi$ if $k = 3$) from $U(-\epsilon, \epsilon)$ for small $\epsilon > 0$.
      Update $\theta_i = \theta_i + \Delta\theta$ (and $\phi_i = \phi_i + \Delta\phi$ if $k = 3$) and compute the BM objective.
      If the objective improves, keep the perturbation.
    end
  until no improvement in objective by $\geq 10^{-5} \sum_{e \in E} |w_e|$ within 100 evaluations.

A.3 Graph Instances. As mentioned in Section 4, for our simulations, we first generate a collection of unit-weight graphs.
From each one, we create multiple weighted graphs by applying different edge-weight distributions to the unit-weight graph. Below, we describe the collection of unit-weight graphs that were generated for this process. The collection of non-isomorphic graphs up to 6 nodes was generated using SageMath [47]. The remaining instances were generated using various random graph generators found in the NetworkX package [23]; the parameter names used below are the same as those used in the corresponding NetworkX functions.
• All non-isomorphic connected graphs up to 6 nodes (142 instances).
• Erdos-Renyi (42 instances): for each $n$ from $n = 7$ to $n = 12$, create 7 instances with $p$ sampled from $[0, 1]$ uniformly.
• Random Regular (42 instances): for each $n$ from $n = 7$ to $n = 12$, create 7 instances with $d$ sampled uniformly from valid degrees.
• Barabasi Albert (18 instances): for each $n$ from $n = 7$ to $n = 12$ and for all $m$ in $\{1, 2, 3\}$, create 1 instance (with the initial graph being the star graph on $m + 1$ nodes).
• Dual Barabasi Albert (36 instances): for each $n$ from $n = 7$ to $n = 12$ and for all $\{(m_1, m_2) : m_1, m_2 \in \{1, 2, 3\}, m_1 \neq m_2\}$ with $p = 0.25$, create 1 instance with the initial graph being the star graph on $\max(m_1, m_2) + 1$ nodes.
• Watts Strogatz Graphs (18 instances): for each $n$ from $n = 7$ to $n = 12$, for all $k$ in $\{2, 4, 6\}$, create 1 instance with $p$ sampled uniformly from $[0, 1]$.
• Newman Watts Strogatz Graphs (18 instances): for each $n$ from $n = 7$ to $n = 12$, for all $k$ in $\{2, 4, 6\}$, create 1 instance with $p$ sampled uniformly from $[0, 1]$.

Fig. 11. Illustration depicting the range of graph metrics for our instance library $\mathcal{G}$. When comparing unit-weight Erdős-Rényi graphs (red) with the remaining graphs in $\mathcal{G}$ (blue), there is an increase in the range of values obtained for both graph metrics.
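The random families listed above can be generated along the following lines. This is a hedged sketch rather than the authors' script: the function name `generate_random_instances`, the seeding scheme, and the choice to exclude degree 0 from the "valid degrees" are assumptions, and the 142 small graphs enumerated with SageMath are not reproduced here. It relies on NetworkX's default initial graphs for the two Barabási-Albert generators, which in recent NetworkX releases are star graphs of the sizes stated in the list.

```python
import itertools
import random
from collections import Counter

import networkx as nx


def generate_random_instances(seed=0):
    """Return a list of (family_name, graph) pairs for n = 7, ..., 12."""
    rng = random.Random(seed)

    def new_seed():
        # Fresh integer seed for each NetworkX call, for reproducibility.
        return rng.randrange(2 ** 31)

    instances = []
    for n in range(7, 13):
        for _ in range(7):
            p = rng.random()
            instances.append(("erdos_renyi",
                              nx.erdos_renyi_graph(n, p, seed=new_seed())))
        # Assumption: a degree d is "valid" if 1 <= d < n and n * d is even.
        valid_degrees = [d for d in range(1, n) if (n * d) % 2 == 0]
        for _ in range(7):
            d = rng.choice(valid_degrees)
            instances.append(("random_regular",
                              nx.random_regular_graph(d, n, seed=new_seed())))
        for m in (1, 2, 3):
            instances.append(("barabasi_albert",
                              nx.barabasi_albert_graph(n, m, seed=new_seed())))
        for m1, m2 in itertools.permutations((1, 2, 3), 2):
            instances.append(("dual_barabasi_albert",
                              nx.dual_barabasi_albert_graph(n, m1, m2, 0.25,
                                                            seed=new_seed())))
        for k in (2, 4, 6):
            instances.append(("watts_strogatz",
                              nx.watts_strogatz_graph(n, k, rng.random(),
                                                      seed=new_seed())))
            instances.append(("newman_watts_strogatz",
                              nx.newman_watts_strogatz_graph(n, k, rng.random(),
                                                             seed=new_seed())))
    return instances


if __name__ == "__main__":
    counts = Counter(name for name, _ in generate_random_instances())
    for name, count in sorted(counts.items()):
        print(f"{name}: {count} instances")
```

The per-family counts produced by this sketch match the instance counts given in the list; edge weights would then be assigned separately, as described at the start of this subsection.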
Figure 11 demonstrates how varied our ensemble is with respect to two important graph metrics dependent on the eigenvalues of the normalized Laplacian [35].

A.4 QAOA-Warm with Median and Worst Vertex-At-Top Rotations

For our numerical simulations in Section 4, we use the best of either 5 vertex-at-top rotations or 5 uniform rotations for QAOA-warm. Performing multiple runs of QAOA-warm with different rotations and taking the best allows one to mitigate the possibility of using a warm-start with a poor rotation. We present the results with respect to the median rotation here. We plot the results in Figure 12; we see that the results do not differ much from what was seen in Figure 5. In Figure 13, we also plot the results when the worst of 5 vertex-at-top rotations is used, to give an idea of the worst-case performance of QAOA-warm.

Fig. 12. Histograms comparing the instance-specific approximation ratio in (depth-$p$) QAOA-warm and (depth-$p$) standard QAOA for both $p = 1$ (blue) and $p = 8$ (red), where the median vertex-at-top rotations are used. Overlapping portions of the histogram are in purple. The left plot is generated using the graphs in our graph library $\mathcal{G}$ (see Section 4.1), whereas for the right plot, we restrict our attention to only those graphs in $\mathcal{G}$ with positive edge weights.

Fig. 13. Histograms comparing the instance-specific approximation ratio in (depth-$p$) QAOA-warm and (depth-$p$) standard QAOA for both $p = 1$ (blue) and $p = 8$ (red), where the worst vertex-at-top rotations are used. Overlapping portions of the histogram are in purple. The left plot is generated using the graphs in our graph library $\mathcal{G}$ (see Section 4.1), whereas for the right plot, we restrict our attention to only those graphs in $\mathcal{G}$ with positive edge weights.

B PROOFS

Proof for Theorem 3.

Proof. We consider a graph $G = (V, E)$ on two vertices connected by an edge of unit weight, initialized with $|s\rangle := |a\rangle \otimes |b\rangle$, where $|a\rangle := |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|b\rangle := |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. (Note that $|s\rangle$ is the quantum state obtained from $x^*$, where $x^* = ((1, 0, 0)^T, (-1, 0, 0)^T)$ is an optimal solution to BM-MC$_3$.)

We first consider the $p = 1$ case. For convenience, let $\gamma := \gamma_1$ and $\beta := \beta_1$ for this case. Observe,

$$|s\rangle = |a\rangle \otimes |b\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{2}\left(|00\rangle - |01\rangle + |10\rangle - |11\rangle\right).$$

Note that if we were to do a quantum measurement of this state, we would get each of the four states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with equal probability, i.e., the theorem holds in the $p = 0$ case as well.

Since $H_C$ is the Hamiltonian of the Max-Cut problem, $H_C|01\rangle = 1 \cdot |01\rangle$. Thus, $e^{-i\gamma H_C}|01\rangle = e^{-i\gamma \cdot 1}|01\rangle = e^{-i\gamma}|01\rangle$. Similar calculations show that $e^{-i\gamma H_C}|10\rangle = e^{-i\gamma}|10\rangle$, $e^{-i\gamma H_C}|00\rangle = |00\rangle$, and $e^{-i\gamma H_C}|11\rangle = |11\rangle$, and thus by linearity,

$$|s'\rangle := e^{-i\gamma H_C}|s\rangle = e^{-i\gamma H_C}\,\frac{1}{2}\left(|00\rangle - |01\rangle + |10\rangle - |11\rangle\right) = \frac{1}{2}\left[\left(|00\rangle - |11\rangle\right) + e^{-i\gamma}\left(|10\rangle - |01\rangle\right)\right].$$

For a 2-node graph, $H_B = \sigma^x_1 + \sigma^x_2$, and thus,

$$H_B\left(|00\rangle - |11\rangle\right) = \sigma^x_1|00\rangle - \sigma^x_1|11\rangle + \sigma^x_2|00\rangle - \sigma^x_2|11\rangle = 0,$$

and similarly $H_B\left(|10\rangle - |01\rangle\right) = 0$.

By the above observations and linearity, we have that $H_B|s'\rangle = 0$, i.e., $|s'\rangle$ is an eigenvector of $H_B$ with eigenvalue 0, and thus,

$$|\psi_1(\gamma, \beta)\rangle = e^{-i\beta H_B}|s'\rangle = e^{-i\beta \cdot 0}|s'\rangle = |s'\rangle,$$

i.e., the mixing Hamiltonian has no effect on the quantum state. Writing out $|s'\rangle$, we have

$$|\psi_1(\gamma, \beta)\rangle = \frac{1}{2}\left[\left(|00\rangle - |11\rangle\right) + e^{-i\gamma}\left(|10\rangle - |01\rangle\right)\right].$$

If we repeat all of these calculations in the case that $p > 1$, we get that

$$|\psi_p(\gamma, \beta)\rangle = \frac{1}{2}\left[\left(|00\rangle - |11\rangle\right) + e^{-i\gamma_p} \cdots e^{-i\gamma_1}\left(|10\rangle - |01\rangle\right)\right] = \frac{1}{2}\left[\left(|00\rangle - |11\rangle\right) + e^{-i\sum_{j=1}^{p}\gamma_j}\left(|10\rangle - |01\rangle\right)\right],$$

in which case all four states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ are measured with equal probability, meaning that the expected cut value for $G$ is 50% of the maximum cut in $G$. $\square$

Proof for Theorem 4.

Proof. Without loss of generality, let $G$ be a cycle with $n$ vertices.
Let $x : V \to S^2$ be a local optimum for BM-MC$_3$ (that is, each $x_i$ is a unit vector in $\mathbb{R}^3$). Our proof consists of two steps. First, we show $\operatorname{rank}(\operatorname{span}(\{x_i : i \in V\})) \leq 2$. Next, building upon this characterization of the local optima, we show that in fact the above rank is exactly 1 and all edges are (fully) cut, i.e., the global optimum is achieved.

We use first-order necessary conditions, known as KKT conditions, to derive the desired characterization. Let us formulate the Lagrangian for our constrained optimization problem,

$$\mathcal{L}(x, \lambda) = \sum_{(i,j) \in E} \|x_i - x_j\|^2 - \sum_{i \in V} \lambda_i \left(\|x_i\|^2 - 1\right),$$

where $\lambda_i \in \mathbb{R}$ is a multiplier corresponding to the condition $\|x_i\| = 1$, $\forall i \in V$. It is easy to see that the objective for our constrained optimization problem is equal to $\max_{x : V \to \mathbb{R}^3} \min_{\lambda : V \to \mathbb{R}} \mathcal{L}(x, \lambda)$. Further, using Lagrangian duality theory, we apply the KKT optimality conditions, which require (at any local optimum) that the stationary condition $\frac{\partial \mathcal{L}}{\partial x_i} = 0$ is satisfied for all $i \in V$, in addition to the following feasibility and complementary slackness conditions (which are trivially satisfied):

• Primal feasibility requires $\|x_i\| = 1$, $\forall i \in V$.
• Dual feasibility requires $\lambda_i \in \mathbb{R}$, $\forall i \in V$.
• Complementary slackness requires $\lambda_i = 0$ whenever $\|x_i\| \neq 1$.

The stationary condition can be reformulated as

$$\frac{\partial}{\partial x_i}\left[\sum_{j : (i,j) \in E} (x_i - x_j)^T (x_i - x_j) - \lambda_i \left(x_i^T x_i - 1\right)\right] = 0 \quad \forall i \in V.$$

Thus, for all $i \in V$, we have the following stationary condition:

$$\sum_{j : (i,j) \in E} (x_i - x_j) = \lambda_i x_i.$$

Since our graph is a cycle, where every vertex $i \in V$ has two neighbors, the stationary condition implies a linear dependence of $x_i$ on the vectors corresponding to its neighbors. Hence, $\operatorname{rank}(\operatorname{span}(\{x_i, x_j, x_k\})) \leq 2$ where $j$ and $k$ are the two neighbors of $i$. Note that if this rank is 1, one can easily show that the neighbors of this vertex (and consequently of every vertex) are at antipodal points, $x_j = x_k = -x_i$. Otherwise, $x_j$ and $x_k$ are mirrored with respect to $\operatorname{span}(\{x_i\})$. In this case, these three vectors lie on a unique plane. Inductively, one can show that all vertices of the cycle lie on the same plane.

With all the points lying on the same plane, it remains to show that the additional dimension (direction) in $\mathbb{R}^3$ allows one to (locally) increase the objective. Without loss of generality, let $x_i \in \mathbb{R}^3$ be a point on the unit sphere with polar angle $\theta_i = \pi/2$ and azimuthal angle $\phi_i$. Coloring the vertices of the cycle with two colors $c_i \in \{1, 2\}$, it is easy to see that for $\tilde{x}_i = (1, \pi/2 + (-1)^{c_i} \epsilon, \phi_i)$ (in spherical coordinates) all edges stretch (unless they are antipodal), so the objective increases. This shows that $x$ was not a local optimum in the case of a rank-2 assignment. □

Fig. 14. Parameter landscapes of the instance-specific approximation ratio of the four-cycle $C_4$ for $p = 1$. When no warm start is used, the landscape has many peaks and valleys in the form of local maxima and minima (a). For both BM-MC$_2$ and BM-MC$_3$, a vertex-at-top rotation yields a convex landscape with a ridge-like shape (b), thereby effectively capturing the optimal solution for the 4-cycle. When a uniform rotation is used, a BM-MC$_2$ formulation (c) is able to achieve optimality for some choice of parameters whereas this is not the case for a BM-MC$_3$ formulation (d).

Proof for Observation 5.

Proof. Since we are working only with circuit depth $p = 1$, for simplicity, we use $\gamma$ and $\beta$ to denote $\gamma_1$ and $\beta_1$ respectively. If $|s_0\rangle = Q(R_V(x))$ where $R_V(\cdot)$ is a vertex-at-top rotation and $x$ is as described in the statement of the observation, then it is straightforward to see that there exists a reordering of the vertices such that $|s_0\rangle = |0\rangle^{\otimes \ell} \otimes |1\rangle^{\otimes (n - \ell)}$ where $|S| = \ell$ (and where the first $\ell$ qubits correspond to vertices in $S$).

Let $M = \text{Max-Cut}(G)$. Since $H_C$ is the Hamiltonian of the Max-Cut problem and $|s_0\rangle$ corresponds to an optimal cut, $|s_0\rangle$ is an eigenvector of $H_C$ with eigenvalue $M$. Thus,

$$e^{-i\gamma H_C} |s_0\rangle = e^{-i\gamma \cdot M} |s_0\rangle = c\, |s_0\rangle, \quad (9)$$

where $c = e^{-i\gamma M}$. Using equation (9), we have that

$$|\psi_1(\gamma, \beta)\rangle = e^{-i\beta H_B} e^{-i\gamma H_C} |s_0\rangle = c\, e^{-i\beta H_B} |s_0\rangle = c\, |a\rangle^{\otimes \ell} \otimes |a'\rangle^{\otimes (n - \ell)},$$

where $|a\rangle = \cos(\beta) |0\rangle - i \sin(\beta) |1\rangle$ and $|a'\rangle = \cos(\beta) |1\rangle - i \sin(\beta) |0\rangle$. The $c$ term is a global phase change that does not affect the measurement and can thus be ignored.

The expected energy is the sum of the expected energy for each edge $(i, j) \in E$, and each edge contributes a non-zero amount if and only if both endpoints have a different spin after measurement. However, since $|\psi_1(\gamma, \beta)\rangle$ is an unentangled state, we can consider measuring each vertex independently.

Consider an edge $(i, j) \in E$ and suppose that $i \in S$ and $j \in S$. Then,

$\Pr(i \text{ and } j \text{ measured to be } |0\rangle \text{ and } |1\rangle \text{ respectively})$
$= \Pr(i \text{ measured to be } |0\rangle) \cdot \Pr(j \text{ measured to be } |1\rangle)$ (since $|\psi_1(\gamma, \beta)\rangle$ is unentangled)
$= \Pr(|a\rangle \text{ measured to be } |0\rangle) \cdot \Pr(|a\rangle \text{ measured to be } |1\rangle)$ (by construction)
$= \cos^2(\beta) \cdot \sin^2(\beta)$. (by definition of $|a\rangle$)

Similar calculations show that if $i \in S$ and $j \in S$, then the probability that $i$ is measured to be $|1\rangle$ and $j$ is measured to be $|0\rangle$ is also $\cos^2(\beta) \sin^2(\beta)$, so an edge with both endpoints on the same side of the cut has differing spins with probability $2 \sin^2(\beta) \cos^2(\beta)$. In the case that $i \in S$ and $j \notin S$, one can similarly show that the probability of measuring both to have differing spins is given by $\cos^4(\beta) + \sin^4(\beta)$.

Combining all the calculations above, the expected energy is given by

$$F_1(\gamma, \beta) = 2(W - M) \sin^2\beta \cos^2\beta + M\left(\sin^4\beta + \cos^4\beta\right),$$

where $W$ is the sum of all edge weights (i.e., $W = \sum_{e \in E} w_e$). Observe that when $\beta = 0$, the above equation reduces to $F_1(\gamma, \beta) = M$ as desired. By applying various trigonometric identities and algebraic manipulations, we can rewrite the above function as

$$F_1(\gamma, \beta) = \frac{1}{4}\left[(2M - W) \cos(4\beta) + 2M + W\right].$$

The Max-Cut of a graph is at least half the sum of the edge weights, i.e., $M \geq W/2$ (which implies $2M - W \geq 0$). Since $\cos(4\beta)$ is decreasing in $|\beta|$ for $\beta \in [-\pi/4, \pi/4]$, it must be that $F_1$ is non-increasing in $|\beta|$ for $\beta \in [-\pi/4, \pi/4]$ (and strictly decreasing whenever $2M > W$).
□

(To see that $M \geq W/2$, where $M = \text{Max-Cut}(G)$ and $W$ is the sum of all edge weights, observe that the expected sum of the weights of the edges crossing a random cut, where one independently places each vertex on one side of the cut or the other with probability 1/2, will be $W/2$. Since the expectation is $W/2$, there must exist at least one cut where the sum of the weights crossing the cut is at least $W/2$, i.e., $M \geq W/2$.)

C QAOA-WARM ON ANTIPODAL STRUCTURES

We illustrate a set of graph instances where QAOA-warm has a significant advantage over standard QAOA by considering BM-MC$_k$ solutions that have a special structure. For any positive integer $k$, we say that $x \in (\mathbb{R}^k)^n$ has an optimal antipodal structure (in $\mathbb{R}^k$) for a graph $G = (V, E)$ if there exists $S \subseteq V$ and a unit vector $u \in \mathbb{R}^k$ such that $(S, V \setminus S)$ is a Max-Cut of $G$ and $x_i = u$ if $i \in S$ and $x_i = -u$ if $i \notin S$. That is, the points corresponding to each vertex lie at antipodal points on the sphere in a way determined by some Max-Cut of $G$. If we consider $|s_0\rangle = Q(R_V(x))$ where $R_V$ is a random vertex-at-top rotation and $x$ has optimal antipodal structure, then we essentially recover the Max-Cut. In this case, QAOA-warm with initial state $|s_0\rangle$ yields the Max-Cut of $G$ for $p = 0$.

For any connected bipartite graph and any $k$ (including $k = 2, 3$), one can show that any globally optimal solution $x$ of BM-MC$_k$ will have the antipodal structure described above. For the special case of even cycles, we find that the BM-MC$_3$ optimization of Max-Cut always finds the global optimum. These observations imply that random vertex-at-top rotations recover good solutions from the classical regime.

Theorem 4. For a union of even cycles, any local optimum $x$ for BM-MC$_3$ is a global optimum.

The characterization of local optima above for even cycles implies that initializing QAOA-warm with the random vertex-at-top rotation $Q(R_V(x))$ will also recover the Max-Cut.
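The local-improvement step behind Theorem 4 can be illustrated numerically on the 6-cycle. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the objective is normalized as $\frac{1}{4}\sum_{(i,j) \in E} \|x_i - x_j\|^2$ with unit weights, and the starting point is a hand-picked planar ("twisted") configuration in which consecutive vertices are $2\pi/3$ apart on the equator. That configuration satisfies the stationary condition, yet lifting the two color classes in opposite polar directions increases the objective, whereas the antipodal configuration is left unchanged.

```python
import math


def planar_cycle(n, winding, eps=0.0):
    """Unit vectors for an even n-cycle placed around the equator.

    Vertex i has azimuthal angle 2*pi*winding*i/n and polar angle
    pi/2 + (-1)^i * eps, i.e. the two color classes (by parity of i)
    are lifted in opposite directions when eps != 0.
    """
    points = []
    for i in range(n):
        theta = math.pi / 2 + (-1) ** i * eps
        phi = 2 * math.pi * winding * i / n
        points.append((math.sin(theta) * math.cos(phi),
                       math.sin(theta) * math.sin(phi),
                       math.cos(theta)))
    return points


def relaxed_cut(points):
    """Unit-weight cycle objective: sum over edges of ||x_i - x_j||^2 / 4."""
    n = len(points)
    total = 0.0
    for i in range(n):
        a, b = points[i], points[(i + 1) % n]
        total += sum((s - t) ** 2 for s, t in zip(a, b)) / 4.0
    return total


def stationarity_residual(points):
    """Largest norm of sum_j (x_i - x_j) - lambda_i * x_i over all vertices,
    with lambda_i chosen as the projection coefficient onto x_i."""
    n = len(points)
    worst = 0.0
    for i in range(n):
        x, left, right = points[i], points[i - 1], points[(i + 1) % n]
        g = [2 * x[d] - left[d] - right[d] for d in range(3)]
        lam = sum(g[d] * x[d] for d in range(3))
        worst = max(worst, math.sqrt(sum((g[d] - lam * x[d]) ** 2 for d in range(3))))
    return worst


twisted = planar_cycle(6, winding=2)             # rank-2 stationary point
lifted = planar_cycle(6, winding=2, eps=0.1)     # perturbed out of the plane
antipodal = planar_cycle(6, winding=3)           # rank-1 global optimum
```

Here `relaxed_cut(twisted)` is 4.5 while `relaxed_cut(antipodal)` is 6 (all six edges cut); `relaxed_cut(lifted)` exceeds 4.5, consistent with the claim that a rank-2 stationary point is not a local optimum once the third dimension is available.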
To prove the structure of local optima in Theorem 4, we exploit the structure of the graph and utilize KKT conditions to first show that any local optimum for BM-MC$_3$ has rank at most 2. Further, we show that any rank-2 solution can be improved locally, thus resulting in rank-1 local (and therefore, global) optima. Details of the proof can be found in Appendix B.

It is conjectured that the performance of standard QAOA for $n$-node even cycles is $(2p + 1)/(2p + 2)$ when $n > 2p$ [17, 49]. The above theorem is a concrete example where QAOA-warm outperforms standard QAOA, due to a warm-start with a classical optimal solution.

We find that warm-starts often result in flatter parameter landscapes for QAOA-warm, e.g., see Figure 14 depicting the landscapes for various variants of QAOA-warm on the cycle $C_4$ on four vertices (i.e., $C_4 = (V, E)$ with $V = \{1, 2, 3, 4\}$ and $E = \{(1, 2), (2, 3), (3, 4), (1, 4)\}$). For the vertex-at-top rotation in particular, notice that the solution quality monotonically decreases with $|\beta|$, with the optimal parameters all lying on the line $\beta = 0$. We make this precise in the following observation.

Observation 5. Let $k \in \{2, 3\}$ and $G = (V, E)$ be a graph with weights $w : E \to \mathbb{R}$, and let $S \subseteq V$ be such that $(S, V \setminus S)$ is a Max-Cut of $G$. Let $u$ be a unit vector in $\mathbb{R}^k$. Let $x$ be such that $x_i = u$ if $i \in S$ and $x_i = -u$ if $i \notin S$. If we initialize QAOA-warm with the initial state $|s_0\rangle = Q(R_V(x))$ where $R_V$ is a random vertex-at-top rotation, then we recover the Max-Cut since the states are aligned with $|0\rangle$ and $|1\rangle$. The expected cut value at $(\gamma_1, \beta_1)$ is given by

$$F_1(\gamma_1, \beta_1) = \frac{1}{4}\left[(2M - W) \cos(4\beta_1) + 2M + W\right],$$

where $M = \text{Max-Cut}(G)$ and $W = \sum_{e \in E} w_e$. Observe that $F_1(\gamma_1, 0) = \text{Max-Cut}(G)$ for all $\gamma_1 \in \mathbb{R}$ and $F_1(\gamma_1, \beta_1)$ decreases as $|\beta_1|$ increases for all $|\beta_1| \in [0, \pi/4]$.
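The closed form in Observation 5 can be checked against a direct depth-1 statevector simulation on a small weighted graph. This is a minimal sketch under stated assumptions, not the simulation code used for the paper's experiments: the example graph, the function names, and the convention that each mixer factor is $e^{-i\beta X}$ acting on a single qubit are choices made here. The warm-start state is taken to be the computational basis state of a brute-force maximum cut, which is what a vertex-at-top rotation of an antipodal solution produces.

```python
import cmath
import itertools
import math


def cut_value(bits, edges):
    """Total weight of edges whose endpoints receive different bits."""
    return sum(w for (i, j), w in edges.items() if bits[i] != bits[j])


def brute_force_max_cut(edges, n):
    """Return (bit assignment, value) of a maximum cut by exhaustive search."""
    best = max(itertools.product((0, 1), repeat=n),
               key=lambda b: cut_value(b, edges))
    return best, cut_value(best, edges)


def depth1_expected_cut(edges, n, start_bits, gamma, beta):
    """Expected cut after exp(-i*beta*H_B) exp(-i*gamma*H_C) on |start_bits>."""
    basis = list(itertools.product((0, 1), repeat=n))
    index = {b: k for k, b in enumerate(basis)}
    state = [0j] * len(basis)
    state[index[tuple(start_bits)]] = 1.0 + 0j
    # Cost unitary: diagonal phase exp(-i * gamma * cut(b)) on basis state b.
    state = [amp * cmath.exp(-1j * gamma * cut_value(b, edges))
             for amp, b in zip(state, basis)]
    # Mixer: exp(-i*beta*X) = cos(beta) I - i sin(beta) X on each qubit.
    c, s = math.cos(beta), math.sin(beta)
    for q in range(n):
        new_state = [0j] * len(basis)
        for k, b in enumerate(basis):
            flipped = list(b)
            flipped[q] ^= 1
            new_state[k] = c * state[k] - 1j * s * state[index[tuple(flipped)]]
        state = new_state
    return sum(abs(amp) ** 2 * cut_value(b, edges)
               for amp, b in zip(state, basis))


def closed_form(M, W, beta):
    """F_1 = (1/4) * [(2M - W) cos(4*beta) + 2M + W] from Observation 5."""
    return 0.25 * ((2 * M - W) * math.cos(4 * beta) + 2 * M + W)


# Hypothetical 4-node example graph with positive weights.
edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 1.5, (2, 3): 0.5}
best_bits, M = brute_force_max_cut(edges, 4)
W = sum(edges.values())
simulated = depth1_expected_cut(edges, 4, best_bits, gamma=1.1, beta=0.2)
```

For this graph, `simulated` agrees with `closed_form(M, W, 0.2)` up to floating-point error, and the value does not depend on `gamma`, in line with the cost layer contributing only a global phase.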
The form of the expression for $F_1(\gamma_1, \beta_1)$ follows from the fact that the cost term of the quantum circuit has no effect and that $\beta_1$ can be interpreted as a rotation angle (about the $x$-axis) in the Bloch sphere that moves the state away from the measurement axis. The details of this derivation are included in Appendix B.

In contrast to the vertex-at-top rotations preserving the optimality of antipodal solutions (Observation 5), this is not always the case for uniform rotations. In Figure 14, for example, the uniform rotation does not yield the optimal cut for $C_4$ for any choice of parameters in rank-3. This is due to the potential proximity of uniformly rotated rank-3 solutions to the eigenstates of the mixer, which we can avoid in rank-2 initializations as discussed in Section 5.2. Indeed, if we instead use the BM-MC$_2$ solution with a uniform rotation to obtain the initial state, then there does exist a combination of parameter values that yields the optimal cut (by choosing $\gamma_1 = 0$ and an appropriate choice of $\beta_1$, application of the quantum circuit can be interpreted as a rotation that aligns the initial state with the measurement axis in this case).

D PRE-PROCESSING TIME VS PARAMETER SEARCH TIME

Here, we compare runtimes for various aspects of QAOA-warm to those of standard QAOA and GW. For the preprocessing stage, finding an approximate solution for BM-MC$_k$ takes up the bulk of the time (i.e., 1-3 seconds). The time needed for the rotation applied to the solution and for the mapping of the rotated solution to the Bloch sphere is negligible.

We plot the runtimes for BM-MC$_k$ for $k = 2, 3$ in Figure 15. To get a better idea of scaling, we consider an expanded graph library $\mathcal{G}'$ consisting of 2076 instances; $\mathcal{G}'$ is generated in the same way as $\mathcal{G}$ (see Section 4.1) but we instead consider graphs of up to 19 nodes. Finding approximate solutions to rank-2 BM-MC is considerably faster than rank-3 BM-MC; furthermore, the runtimes are similar regardless of edge weight values.
Plots of GW's runtime for all graphs in $\mathcal{G}'$ are included in Figure 15; as before, we see the runtimes are similar even if we restrict our attention to only positive-weighted graphs. Note that our code for BM-MC$_k$ is not optimized, and faster implementations may be possible. For both classical algorithms (GW and BM-MC$_k$), we see that the runtime increases superlinearly in the number of nodes $n$. In regards to theoretical results, the runtime of GW is dominated by solving the SDP; Lee and Padmanabhan [33] develop an algorithm where one can get within a factor $1 - \epsilon$ of the optimal SDP objective in $\tilde{O}(m/\epsilon^{3.5})$ time, where $m$ is the number of edges in the graph. Similarly, for BM-MC$_k$, Mei et al. [38] show that one can use a variant of the fast Riemannian trust-region algorithm to find a locally optimal solution in polynomial time for $d$-regular graphs.

Fig. 15. This figure shows how the median runtime changes for both GW and BM-MC$_k$ ($k = 2, 3$) as the number of nodes increases. The extended graph library $\mathcal{G}'$ (2076 instances) was used to generate the results above; we also plot the results for just the positive-weighted graphs in $\mathcal{G}'$. The top and bottom of the colored regions correspond to the 75th and 25th percentiles respectively.

Fig. 16. This figure shows how the median runtime changes for the optimization loop of both standard QAOA and QAOA-warm for various optimizers (ADAM, BFGS, and Nelder-Mead) as the circuit depth increases. These runtimes do not include the time taken to evaluate/estimate the function values or gradients of the expected cut value $F_p(\gamma, \beta)$ (since in practice, such calls would be made on the quantum device). These plots were generated by randomly selecting 20 8-node graphs from our graph library $\mathcal{G}$ (see Section 4.1), with 10 of the 20 graphs having only positive edge weights. For each solid colored line (corresponding to the median), there are two dashed lines of the same color above and below representing the 75th and 25th percentiles respectively. On the right, we plot the runtimes for BFGS separately in order to more easily see the trend in runtime as $p$ increases. COBYLA was not included due to technical limitations with our software; in particular, we were unable to gain direct access to the source code needed in order to exclude the runtime of function or gradient evaluations.

(In non-linear optimization, the Karush-Kuhn-Tucker (KKT) conditions are first-order necessary conditions which characterize the set of optimal solutions. The usage of the KKT conditions generalizes the method of Lagrange multipliers [27][31].)

We now consider the runtime of the optimization loop used in both standard QAOA and QAOA-warm, as seen in Figure 16 for various optimizers (ADAM, BFGS, Nelder-Mead). To get an idea of the runtime of the classical portions of the optimization loop, we exclude the time taken to estimate the function values or gradients of $F_p(\gamma, \beta)$. During our preliminary experiments, we found that the number of nodes did not have any noticeable effect on the runtime of the optimization loop for either standard QAOA or QAOA-warm for any of the optimizers. However, for all optimizers, Figure 16 shows that more time is needed to optimize $\gamma$ and $\beta$ as the circuit depth increases. With the exception of BFGS, for all optimizers and circuit depths, it appears that QAOA-warm converges to a set of parameters more quickly compared to standard QAOA.

We now discuss the runtime of the preprocessing stage of QAOA-warm relative to the runtime of QAOA-warm's optimization loop. A direct comparison is difficult since the former is independent of the circuit depth $p$ and the latter is independent of the number of nodes $n$.
However, for the $n$ and $p$ considered in our experiments, it appears (from Figures 15 and 16) that the preprocessing stage takes orders of magnitude longer. We remark that our current implementation for finding approximate BM-MC$_k$ solutions (Algorithm 2) was not designed to find solutions quickly; we suspect other methods can find solutions more quickly. Additionally, we remark that the runtime of the preprocessing stage appears to scale modestly as the number of nodes increases. The trends in Figure 16 also suggest that as the circuit depth $p$ increases, the proportion of QAOA-warm's runtime spent in the preprocessing stage diminishes. Moreover, the real runtime of the optimization loop on an actual quantum device would be longer, since one needs to consider the time needed to query the quantum device in order to estimate the value or gradient of $F_p(\gamma, \beta)$ at every iteration of the optimization loop. Lastly, there is the additional benefit that if one wants to perform multiple QAOA-warm runs with different initializations of the variational parameters or different rotation schemes, then one only needs to find a solution to BM-MC$_k$ once.

(Regarding the exclusion of function and gradient evaluation time from the optimization-loop runtimes: we exclude such portions since including them would not be reflective of the runtime obtained on an actual quantum device; a quantum device can estimate $F_p(\gamma, \beta)$, the expected cut value, in time polynomial in $n$, whereas a numerical simulation would (typically) take time that is exponential in $n$.)

E COMPARISON WITH EGGER ET AL.

Both our approach and that of Egger et al. [15] consider a variant of QAOA initialized with a separable state obtained by some classical method. In this section, we describe the similarities and differences between the two approaches.

In their first approach, which Egger et al. refer to as "Continuous warm-start QAOA", they consider QAOA applied to the Quadratic Unconstrained Binary Optimization (QUBO) problem, which can be formulated as

$$\min_{c \in \{0,1\}^n} c^T \Sigma c, \quad (10)$$

where $\Sigma$ is an $n \times n$ symmetric matrix. One can consider the relaxation,

$$\min_{c \in [0,1]^n} c^T \Sigma c, \quad (11)$$

where now each $c_i$ lies in the interval $[0, 1]$. If $\Sigma$ is positive semidefinite, then Equation 11 is a convex program which can easily be solved by classical optimizers to obtain an optimal solution $c^*$. Next, Egger et al. produce an unentangled state by mapping each $c_i^* \in [0, 1]$ to a portion of a great-circle on the Bloch sphere; more specifically, the initial state $|s_0\rangle$ is given by

$$|s_0\rangle = \bigotimes_{i=1}^{n} R_Y\!\left(2 \arcsin\sqrt{c_i^*}\right) |0\rangle, \quad (12)$$

where $R_Y(\theta)$ is a rotation on the Bloch sphere about the $y$-axis by angle $\theta$. This initialization significantly differs from our initialization scheme in that our relaxation relaxes each variable to a unit vector (instead of a position in an interval) and (for rank-3 initializations) we are not restricted to any particular portion of the Bloch sphere. Additionally, since Max-Cut is equivalent to QUBO [14], our approach can also be used to solve arbitrary QUBO problems; however, this approach by Egger et al. is not applicable to Max-Cut since one can show that the corresponding $\Sigma$ matrix in Equation 11 would not be positive semidefinite.

Egger et al. also modify the mixing Hamiltonian $H_B$ in their approach; in particular, they choose $H_B = \sum_{i=1}^{n} H_{B,i}$ where

$$H_{B,i} = \begin{pmatrix} 2c_i^* - 1 & -2\sqrt{c_i^*(1 - c_i^*)} \\ -2\sqrt{c_i^*(1 - c_i^*)} & 1 - 2c_i^* \end{pmatrix}. \quad (13)$$

One can show that $|s_0\rangle$ from Equation 12 is a ground state of $H_B$ as described above; Egger et al. remark that this allows us to apply the adiabatic theorem and conclude that this variant of QAOA approaches the optimal solution as the circuit depth tends to infinity (assuming an optimal choice of variational parameters). Unfortunately, our QAOA-warm approach has no such guarantees.

Egger et al. also consider another variant of QAOA called "Rounded warm-start QAOA." Unlike their previous approach, this approach is more readily applicable to Max-Cut.
In this approach, a cut $(S, V \setminus S)$ is first generated via classical means (e.g., the rounding procedure found in the GW algorithm). Then, each qubit corresponding to a vertex in $S$ is initialized to $R_Y(\pi/3) |0\rangle$; similarly, each qubit corresponding to a vertex in $V \setminus S$ is initialized to $R_Y(2\pi/3) |0\rangle$. The mixer $H_B$ used in this approach is the same as Equation 13 but with the diagonal elements multiplied by $-1$. It can be shown that this approach allows one to recover the same cut that was initially used to create the initial quantum state. However, unlike their previous approach, the adiabatic theorem is not applicable in this case (since the initial state is no longer a ground state of the mixer), and thus, not much is known about the theoretical convergence of this rounded approach.

(Regarding the matrix $\Sigma$ in Equation 11: in fact, the matrix would be negative definite in the case of Max-Cut. It is straightforward to show that this implies that any locally optimal solution to the proposed relaxation of the QUBO (corresponding to the Max-Cut instance) would yield only purely binary solutions in $\{0, 1\}^n$.)

(Regarding the continuous warm-start initial state: the authors do consider further modifications of the initial state by introducing a parameter $\epsilon > 0$ that ensures the qubits are not initialized too close to the poles; this is done to avoid degenerate initializations that would cause Egger et al.'s variant of QAOA to fail to converge as the circuit depth $p \to \infty$. The mixer is also adjusted accordingly.)

Experimental Comparison. We perform a set of numerical simulations similar to those that Egger et al. used for Max-Cut (i.e., their rounded approach) and compare their approach with our own. When using their approach, for each instance considered, we also use a GW-solver to (classically) obtain 10 cuts, keeping only the best 5. Each of these cuts is used to create a different initial state using their Rounded Warm-Start QAOA approach (with a regularization parameter of $\epsilon = 0.25$, which guarantees that their approach can recover the cut used for initialization at a particular set of variational parameters at $p = 1$). Similar to our numerical simulations for QAOA-warm, we also test their approach at circuit depths $p = 1, 2, 4, 8$. We consider two possible choices regarding the initialization of the variational parameters $\gamma$ and $\beta$: (1) initializing near the origin and (2) initializing at $\beta_1 = \pi/2$, $\gamma_1 = 0$ (with the remaining parameters being initialized to 0 for $p > 1$). The former initialization is the same as that used for QAOA-warm and the latter is guaranteed to produce the cut used to initialize the quantum state.

For the Egger et al. approach, we found that for most instances, when taking the best of 5 cuts produced by GW, at least one of them would be the optimal cut. This yields uninteresting results since choosing $\beta_1 = \pi/2$ (with the remaining parameters being 0) would automatically yield the optimal cut when using Egger et al.'s approach. For this reason, we have removed all such instances (76.2% of the instance library) in our comparison. The removal of such instances also removes all instances for which QAOA-warm achieves optimality at $p = 0$ (26.3% of the instance library).

Fig. 17. Histograms are used to compare the instance-specific approximation ratios of QAOA-warm and Egger et al.'s [15] Rounded Warm-Start QAOA approach in our graph library $\mathcal{G}$ (see Section 4.1) for circuit depths $p = 1$ (blue) and $p = 8$ (red); regions in purple are the overlaps of the two histograms. The left column uses all the graphs in $\mathcal{G}$ (see Section 4.1) whereas the right column only considers those with positive edge weights. The top row considers initializing the variational parameters to the origin for both approaches whereas the bottom row considers initializing $\beta_1 = \pi/2$ (with the remaining parameters at zero) for Egger et al.'s approach.
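The single-qubit claims about Egger et al.'s initial states and mixers (Equations 12 and 13) can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and relies on assumptions stated here: the helper names are hypothetical, the convention $R_Y(\theta)|0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$ is used, and the rounded mixer is taken to be the matrix of Equation 13 with its diagonal negated. Since each $2 \times 2$ mixer $H$ squares to the identity, $e^{-i(\pi/2)H} = -iH$, which is what `rounded_probabilities` applies.

```python
import math


def warm_start_qubit(c):
    """Amplitudes of R_Y(2*arcsin(sqrt(c)))|0>, i.e. (sqrt(1-c), sqrt(c))."""
    theta = 2 * math.asin(math.sqrt(c))
    return (math.cos(theta / 2), math.sin(theta / 2))


def continuous_mixer(c):
    """Single-qubit mixer of Equation 13 for relaxed value c."""
    off = -2 * math.sqrt(c * (1 - c))
    return ((2 * c - 1, off), (off, 1 - 2 * c))


def rounded_mixer(c):
    """Equation 13 with the diagonal entries multiplied by -1."""
    (a, off), (_, d) = continuous_mixer(c)
    return ((-a, off), (off, -d))


def apply(matrix, vec):
    """Matrix-vector product for a 2x2 matrix and a length-2 vector."""
    return tuple(sum(matrix[r][k] * vec[k] for k in range(2)) for r in range(2))


def rounded_probabilities(c):
    """Measurement probabilities after exp(-i*(pi/2)*H) = -i*H acts on the
    warm-start qubit, with H the rounded mixer (gamma = 0, beta = pi/2)."""
    out = tuple(-1j * amp for amp in apply(rounded_mixer(c), warm_start_qubit(c)))
    return tuple(abs(amp) ** 2 for amp in out)


eps = 0.25
angle_low = 2 * math.asin(math.sqrt(eps))        # pi/3
angle_high = 2 * math.asin(math.sqrt(1 - eps))   # 2*pi/3
probs_low = rounded_probabilities(eps)           # qubit started at R_Y(pi/3)|0>
probs_high = rounded_probabilities(1 - eps)      # qubit started at R_Y(2*pi/3)|0>
```

Under these conventions, the continuous warm-start qubit is an eigenvector of the Equation 13 mixer with eigenvalue $-1$ (its lowest eigenvalue), and with $\epsilon = 0.25$ the two rounded initial angles are $\pi/3$ and $2\pi/3$. At $\beta_1 = \pi/2$, $\gamma_1 = 0$ each qubit is mapped to a computational basis state, with the two sides of the cut receiving opposite bits, so the measured bitstring encodes the cut $(S, V \setminus S)$.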
As seen in Figure 17, at low circuit depth ($p = 1$), regardless of the initialization scheme used for Egger et al.'s approach, QAOA-warm is able to outperform it for the majority of the instances. The advantage that QAOA-warm has over Egger et al.'s approach is subdued at higher $p$ (e.g., we see a leftward shift in the histograms when going from $p = 1$ to $p = 8$) or when using the secondary initialization scheme for Egger et al.'s approach.
ACM Transactions on Quantum Computing – Association for Computing Machinery
Published: Feb 24, 2023
Keywords: QAOA