LEAP: Scaling Numerical Optimization Based Synthesis Using an Incremental Approach

ETHAN SMITH and MARC G. DAVIS, University of California, Berkeley, USA
JEFFREY M. LARSON, Argonne National Laboratory, USA
ED YOUNIS, LINDSAY BASSMAN, WIM LAVRIJSEN, and COSTIN IANCU, Lawrence Berkeley National Laboratory, USA

While showing great promise, circuit synthesis techniques that combine numerical optimization with search over circuit structures face scalability challenges due to a large number of parameters, exponential search spaces, and complex objective functions. The LEAP algorithm improves scaling across these dimensions using iterative circuit synthesis, incremental re-optimization, dimensionality reduction, and improved numerical optimization. LEAP draws on the design of the optimal synthesis algorithm QSearch by extending it with an incremental approach to determine constant prefix solutions for a circuit. By narrowing the search space, LEAP improves scalability from four to six qubit circuits. LEAP was evaluated with known quantum circuits such as QFT and physical simulation circuits like the VQE, TFIM, and QITE. LEAP can compile four-qubit unitaries up to 59× faster than QSearch and five- and six-qubit unitaries with up to 1.2× fewer CNOTs compared to the QFAST package. LEAP can reduce the CNOT count by up to 36×, or 7× on average, compared to the CQC Tket compiler. Despite its heuristics, LEAP has generated optimal circuits for many test cases with a priori known solutions. The techniques introduced by LEAP are applicable to other numerical-optimization-based synthesis approaches.

1 INTRODUCTION

Quantum synthesis techniques generate circuits from high-level mathematical descriptions of an algorithm. They can provide a powerful tool for circuit optimization, hardware design exploration, and algorithm discovery. An important quality metric of synthesis, and of compilers in general, is circuit depth, which relates directly to the program performance on hardware.
Short-depth circuits are especially important for noisy intermediate-scale quantum (NISQ) era devices, characterized by limited coherence time and noisy gates. Here synthesis provides a critical capability in enabling experimentation where only the shortest depth circuits provide usable outputs. Furthermore, synthesizing short-depth circuits is a powerful building block useful in circuit partitioning algorithms that can be used to optimize circuits with hundreds of qubits. Reducing the depth of each block can greatly reduce the overall depth of the partitioned circuit.

In general, two concepts are important when thinking about synthesis algorithms [1–6]: circuit structure captures the application of gates on “physical” qubit links, while function captures the gate operations, for example, the rotation angle in $R(\theta)$. Recently introduced techniques [6, 7] can generate short-depth circuits in a topology-aware manner by combining numerical optimization of parameterized gate representations (e.g., $U_3$) to determine function together with search over circuit structures. Regarding circuit depth, their efficacy surpasses that of traditional optimizing compilers such as IBM Qiskit [8] and CQC Tket [9], or of other available synthesis tools such as UniversalQ [10].

Authors’ addresses: Ethan Smith, ethanhs@berkeley.edu; Marc G. Davis, marc.davis@berkeley.edu, University of California, Berkeley, Berkeley, USA; Jeffrey M. Larson, jmlarson@anl.gov, Argonne National Laboratory, Lemont, USA; Ed Younis, edyounis@lbl.gov; Lindsay Bassman, lbassman@lbl.gov; Wim Lavrijsen, wlavrijsen@lbl.gov; Costin Iancu, cciancu@lbl.gov, Lawrence Berkeley National Laboratory, Berkeley, USA.
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. 2643-6817/2022/8-ART $15.00 https://doi.org/10.1145/3548693
ACM Trans. Quantum Comput.

An exemplar of synthesis approaches is QSearch [6], which provides optimal-depth synthesis and has been shown to match known optimal quantum algorithm implementations for circuits such as QFT [11]. QSearch grows a circuit by adding layers of parameterized gates and permuting gate placement at each link, building on the previous best placements to form a circuit structure. A numerical optimizer is run on each candidate circuit structure to instantiate the function that “minimizes” a score (distance from the target unitary based on the Hilbert–Schmidt norm). This score guides the A* search algorithm [12] to extend and evaluate the next partial solution. Other numerical-optimization-based synthesis algorithms [3, 4, 6] use a similar approach.

While providing good-quality results, however, these techniques face scalability challenges: (1) the number of parameters to optimize grows with circuit depth; (2) the number of intermediate solutions to consider is exponential; and (3) the objective function for optimization is complex, and optimizers may get stuck in local minima. LEAP (Larger Exploration by Approximate Prefixes) has been designed to improve the scalability of QSearch, and it introduces several novel techniques directly extensible to the broader class of search or numerical-optimization-based synthesis.

Prefix Circuit Synthesis: Designed to improve scaling, LEAP prunes the search space by limiting backtracking depth and by coarsening the granularity of the backtrack steps.
Our branch-and-bound algorithm monitors progress during search and employs “execution-driven” heuristics to decide which partial solutions are good prefix candidates for the final solution. Whenever a prefix is chosen, the question is whether to reuse the structure (gate placement) or structure and function (gate instantiation) together. The former approach prunes the search space, while the latter prunes both the search and parameter spaces.

Incremental Re-synthesis: The end result of incremental prefix synthesis (or other divide-and-conquer methods, partitioning techniques, etc.) is that circuit pieces are processed in isolation, with the potential of missing the global optimum. Intuitively, LEAP gravitates toward the solution by combining local optimization on disjoint sub-circuits. By chopping and combining pieces of the final circuit, we can create new, unseen sub-circuits for the optimization process. Overall, this technique is designed to improve the solution quality for any divide-and-conquer or other hierarchical approach.

Dimensionality Reduction: This technique could improve both scalability and solution quality. QSearch and LEAP require sets of gates that can fully describe the Hilbert subspace explored by the input transformation. This approach ensures convergence, but in many cases it may overfit the problem. We provide an algorithm to delete any parameterized gates that do not contribute to the solution, thereby reducing the dimension of the optimization problems. When applied directly to the final solution, dimensionality reduction may improve the solution quality by deleting single-qubit gates. Dimensionality reduction may also be applied in conjunction with prefix circuit synthesis, improving both scalability and solution quality.

Multistart Numerical Optimization: This technique affects both scalability and the quality of the solution.
Any standalone numerical optimizer is likely to have a low success rate when applied to problem formulations that involve quantum circuit parameterizations. Multistart optimization [13] improves on the success rate and quality of solution (avoids local minima) by running multiple numerical optimizations in conjunction. Each individual multistart optimization step may become slower, but improved solutions reduce the chance of missing an optimal solution, which would otherwise cause further search expansion.

LEAP has been implemented as an extension to QSearch, and it has been evaluated on traditional “gates” such as mul and adder, as well as full-fledged algorithms such as QFT [11], HLF [14], VQE [15], TFIM [16, 17], and QITE [18]. We compare its behavior with state-of-the-art synthesis approaches: QSearch, QFAST [7], Tket [9], and Qiskit-synth [10]. (The UniversalQ algorithms have been recently incorporated into IBM Qiskit; for brevity, in the rest of this paper we refer to them as Qiskit-synth.) While QSearch scales up to four qubits, LEAP can compile four-qubit unitaries up to 59× faster than QSearch and scales up to six qubits. On well-known quantum circuits such as the Variational Quantum Eigensolver (VQE), the Quantum Fourier Transformation (QFT), and physical simulation circuits such as the Transverse Field Ising Model (TFIM), LEAP with re-synthesis can reduce the CNOT count by up to 48×, or 11× on average. Our heuristics rarely affect solution quality, and LEAP can frequently match optimal-depth solutions. At five and six qubits, LEAP synthesizes circuits with up to 1.19× fewer CNOTs on average compared with QFAST, albeit with an average 3.55× performance penalty. LEAP can be one order of magnitude slower than Qiskit-synth while providing two or more orders of magnitude shorter circuits.
Compared with Tket, LEAP reduces the depth on average by 7.70×, while taking significantly longer in runtime.

All of our techniques affect behavior and performance in a nontrivial way:
• Compared with QSearch, prefix synthesis reduces by orders of magnitude the number of partial solutions explored, leading to significant speedup.
• Incremental re-synthesis reduces circuit depth by 15% on average, albeit with large increases in running time.
• Dimensionality reduction eliminates up to 40% of $U_3$ gates on average (thus reducing the number of parameters) and shortens the circuit critical path.
• Multistart increases the optimizer success rate from 15% (best value observed for any standalone optimizer) to 99%. For a single optimization run, however, multistart is up to 10× slower than the underlying numerical optimizer.

Overall, we believe LEAP provides a very competitive circuit optimizer for circuits on NISQ devices up to six qubits. We believe that our techniques can be easily generalized or transferred directly to other algorithms based on the search of circuit structures or numerical optimization. For example, re-synthesis, dimensionality reduction, and multistart optimization are directly applicable to QFAST; and re-synthesis is applicable to Qiskit-synth. We can expect that synthesis techniques using divide-and-conquer or partitioning methods will be mandatory for scalability to the number of qubits (in thousands) provided by future near-term processors. Our techniques provide valuable information to these budding approaches.

The rest of this paper is structured as follows. In Section 2 we describe the problem and its challenges. The proposed solutions are discussed in Sections 3 through 6. The experimental evaluation is presented in Section 7. In Section 9 we discuss the implications of our approach. Related work is presented in Section 10. In Section 11 we briefly summarize our conclusions.
2 BACKGROUND

In quantum computing, a qubit is the basic unit of quantum information. The general quantum state is represented by a linear combination of two orthonormal basis states (basis vectors). The most common basis is the equivalent of the 0 and 1 values used for bits in classical information theory, respectively $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. The generic qubit state is a superposition of the basis states, namely $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, with complex amplitudes $\alpha$ and $\beta$ such that $|\alpha|^2 + |\beta|^2 = 1$.

The prevalent model of quantum computation is the circuit model introduced in [19], where information carried by qubits (wires) is modified by quantum gates, which mathematically correspond to unitary operations. A complex square matrix $U$ is unitary if its conjugate transpose $U^{*}$ is its inverse, that is, $U U^{*} = U^{*} U = I$.

In the circuit model, a single-qubit gate is represented by a $2 \times 2$ unitary matrix $U$. The effect of the gate on the qubit state is obtained by multiplying the $U$ matrix with the vector representing the quantum state, $|\psi'\rangle = U|\psi\rangle$. The most general form of the unitary for a single-qubit gate is the “continuous” or “variational” gate representation:

$$U(\theta, \phi, \lambda) = \begin{pmatrix} \cos\frac{\theta}{2} & -e^{i\lambda}\sin\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} & e^{i\lambda + i\phi}\cos\frac{\theta}{2} \end{pmatrix}$$

A quantum transformation (algorithm, circuit) on $n$ qubits is represented by a unitary matrix $U$ of size $2^n \times 2^n$. A circuit is described by an evolution in space (application on qubits) and time of gates. Figure 1 shows an example circuit that applies single-qubit and CNOT gates on three qubits.

Circuit Synthesis: The goal of circuit synthesis is to decompose unitaries from $SU(n)$ into a product of terms, where each individual term (e.g., from $SU(2)$ and $SU(4)$) captures the application of a quantum gate on individual qubits. This is depicted in Figure 1.
The quality of a synthesis algorithm is evaluated by the number of gates in the resulting circuit and by the solution distinguishability from the original unitary.

Circuit length provides one of the main optimality criteria for synthesis algorithms: shorter circuits are better. CNOT count is a direct indicator of overall circuit length, since the number of single-qubit generic gates introduced in the circuit is proportional to it, with a constant given by the decomposition rules (e.g., ZXZXZ). Since CNOT gates have low fidelity on NISQ devices, state-of-the-art approaches [1, 2] directly attempt to minimize their count. Longer-term, single-qubit gate count (and circuit critical path) is likely to augment the quality metric for synthesis.

Synthesis algorithms use distance metrics to assess the solution quality. Their goal is to minimize $\|U - U_S\|$, where $U$ is the unitary that describes the transformation and $U_S$ is the computed solution. They choose an error threshold $\epsilon$ and use it for convergence, $\|U - U_S\| \le \epsilon$. Early synthesis algorithms used the diamond norm, while more recent efforts [4, 20] use a metric based on the Hilbert–Schmidt inner product between $U$ and $U_S$:

$$\langle U, U_S \rangle_{HS} = \mathrm{Tr}(U^{*} U_S) \qquad (1)$$

This is motivated by its lower computational overhead.

Fig. 1. Unitaries (above) and tensor products (below). The unitary $U$ represents an $n = 3$ qubit transformation, where $U$ is a $2^3 \times 2^3$ matrix. The unitary is implemented (equivalent or approximated) by the circuit on the right-hand side. The single-qubit unitaries are $2 \times 2$ matrices, while CNOT is a $2^2 \times 2^2$ matrix. The computation performed by the circuit is $(I_2 \otimes U_4 \otimes U_5)(I_2 \otimes \mathrm{CNOT})(U_1 \otimes U_2 \otimes U_3)$, where $I_2$ is the $2 \times 2$ identity matrix and $\otimes$ is the tensor product operator. The right-hand side shows the tensor product of $2 \times 2$ matrices.
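As an illustration of the definitions in this section, the following Python sketch builds the $U(\theta, \phi, \lambda)$ gate, composes the Figure 1 circuit with tensor products, and evaluates a distance derived from the Hilbert–Schmidt inner product. The normalization $1 - |\mathrm{Tr}(U^{*} U_S)|/N$ is an assumption chosen for illustration (the text only states that the metric is based on this inner product). The helper names (`u3`, `kron`, `hs_distance`, `fig1_circuit`) are hypothetical and not part of any LEAP or QSearch API, and the CNOT matrix assumes that the first qubit of the pair is the control.

```python
import cmath
import math

I2 = [[1, 0], [0, 1]]
# CNOT with the first qubit of the pair as control (an assumption).
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def u3(theta, phi, lam):
    """General single-qubit gate U(theta, phi, lambda) as a 2x2 nested list."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (lam + phi)) * c]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def hs_distance(u, v):
    """Illustrative distance 1 - |Tr(u* v)| / N (normalization is an assumption).

    It vanishes when u and v agree up to a global phase.
    """
    n = len(u)
    trace = sum(u[k][i].conjugate() * v[k][i]
                for i in range(n) for k in range(n))
    return 1 - abs(trace) / n


def fig1_circuit(params):
    """(I2 ⊗ U4 ⊗ U5)(I2 ⊗ CNOT)(U1 ⊗ U2 ⊗ U3) for five (theta, phi, lam) triples."""
    u1, u2, u3_gate, u4, u5 = (u3(*p) for p in params)
    first_layer = kron(kron(u1, u2), u3_gate)
    cnot_layer = kron(I2, CNOT)
    last_layer = kron(kron(I2, u4), u5)
    # Gates applied first sit rightmost in the matrix product.
    return matmul(last_layer, matmul(cnot_layer, first_layer))
```

Because the distance depends only on the magnitude of the trace, multiplying a circuit unitary by a global phase leaves it unchanged, which matches the observation in Section 6 that a global phase is physically irrelevant.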
2.1 Optimal-Depth Topology-Aware Synthesis

QSearch [6] introduces an optimal-depth topology-aware synthesis algorithm that has been demonstrated to be extensible across native gate sets (e.g., $\{R_X, R_Z, \mathrm{CNOT}\}$, $\{R_X, R_Z, \mathrm{CZ}\}$) and to multilevel systems such as qutrits. The approach employed in QSearch is canonical for the operation of other synthesis approaches that employ numerical optimization.

Conceptually, the problem can be thought of as a search over a tree of possible circuit structures containing parameterized gates. A search algorithm provides a principled way to walk the tree and evaluate candidate solutions. For each candidate, a numerical optimizer instantiates the function (parameters) of each gate in order to minimize some distance objective function.

[Figure 2: search tree over candidate circuits $U(n_0, \bar{x})$ through $U(n_6, \bar{x})$, annotated “Choose successor with smallest f(n)”, with $f(n) = \text{CNOT count} + a \cdot \min_{\bar{x}} D(U(n, \bar{x}), U_{target})$.]

Fig. 2. Example evolution of the search algorithm for a three-qubit circuit. It starts by placing a layer of single-qubit gates, then generating the next two possible solutions. Each is evaluated, and in this case the upper circuit is closer to the target unitary, leading to a smaller heuristic value. This circuit is then expanded with its possible two successors. These are again instantiated by the optimizer. The second circuit from the top has an acceptable distance and is reported as the solution. The path in blue shows the evolution of the solution. The ansatz circuits enclosed by the dotted line have been evaluated during the search.

QSearch works by extending the circuit structure a layer at a time. At each step, the algorithm places a two-qubit expansion operator in all legal placements. The operator contains one CNOT gate and two $U(\theta, \phi, \lambda)$ gates. QSearch then evaluates these candidates using numerical optimization to instantiate all the single-qubit gates in the structure. An A* [12] heuristic determines which of the candidates is selected for another layer expansion, as well as the destination of the backtracking steps. Figure 2 illustrates this process for a three-qubit circuit.

Although theoretically able to solve for any “program” (unitary) size, the scalability of QSearch is limited in practice to four-qubit programs because of several factors.

The A* strategy determines the number of solutions evaluated: at best this is linear in depth; at worst it is exponential. Any technique to reduce the number of candidates, especially when deep, is likely to improve performance. Our prefix synthesis solution is discussed in Section 3.

Since each expansion operator has two $U_3$ gates, accounting for six parameters, circuit parameterization grows linearly with depth. Numerical optimizers scale at best with a high-degree polynomial in the number of parameters, making optimization of long circuits challenging. Any technique to reduce the number of parameters is likely to improve performance. Dimensionality reduction is discussed further in Section 5.

The scalability and the quality of the numerical optimizer matter. Faster optimizers are desirable, but their quality affects performance nontrivially.
Our experimentation with CMA-ES [21], L-BFGS [22], and Google Ceres [23] shows that the QSearch success rate of obtaining a solution from a valid structure can vary from 20% to 1% for longer circuits. Besides this measurable outcome, the propensity of optimizers to get stuck in local minima and plateaus can have a negative effect on scalability by altering the search path. A more nuanced approach to optimization and judicious allocation of optimization time budget may improve scalability. Our multistart approach is discussed further in Section 6.

Footnote (on the six parameters per expansion operator): In practice, QSearch uses 5 parameters because of commutativity rules between $R_Z(\theta)$ and CNOT gates.

3 PREFIX CIRCUIT SYNTHESIS

The synthesis solution space can be thought of as a tree that enumerates circuit structures of increasing depth: Level 1 contains depth-one structures, Level 2 contains depth-two structures, and so on. For scalability, we want to reach a solution while evaluating the fewest candidates and the shallowest circuits possible. The number of evaluations is given by the search algorithm: in the case of QSearch the path is driven by A*, and scalability is limited by long backtracking chains.

Fig. 3. Synthesis needs to navigate around local minima and plateaus.

Fig. 4. Prefix-based synthesis induces a partitioning of the circuit. Each partition/prefix captures the effect of its associated sub-tree on the search for a solution. Each partition has been subject to optimization: global with respect to the partition itself, but local with respect to the final solution. The resulting circuit in the middle reaches a solution from composing local optima. Re-synthesis combines disjoint partitions in order to form regions that are passed through optimization. Since the new regions have not been subject to optimization, there exists the potential for improvement.

Our idea introduces a simple heuristic to reduce the frequency of backtracking. The approach is “data driven” and inspired by techniques employed in numerical optimization, as shown in Figure 3. Imagine mapping the search tree onto an optimization surface, which will contain plateaus and local minima. Exiting a plateau is characterized by faster progress toward a solution and minima. If the minima are local (partial solution is not acceptable), the algorithm has to walk out of the “valley.” Once out, the algorithm may still be on a plateau, but it can mark the region just explored as not “interesting” for any backtracking. The effect of implementing these principles in the search is illustrated in Figure 4.

The result is a partitioning of the solution space into coarse-grained regions grouped by circuit depth range. During search, backtracking between solutions within a region is performed by using the A* rules. We never backtrack outside of a region to any candidate solution that resides in the previous depth “band.”

Overall, the effect of our strategy can be thought of as determining a prefix structure on the resulting circuit, as shown in Figure 4. The algorithm starts with a pure A* search on circuits up to depth $d_1$. The first viable partial solution at depth $d_1$ is recorded, and the search proceeds to depth $d_2$ in sub-tree A. A* search proceeds in sub-tree A until finding the first viable candidate at depth $d_2$, then proceeds in sub-tree B. At this point we have three regions: the start sub-tree for depth 0 to $d_1$, A for depth $d_1 + 1$ to $d_2$, and B for depth $d_2 + 1$ to $d_3$. In this example the search in sub-tree B fails at depth $d_2 + 1$.
We therefore backtrack to $d_2$, and the search proceeds on the path depicted on the right-hand side of the tree and eventually finds a solution.

One can easily see how by prohibiting backtracking into large solution sub-trees we can reduce the number of evaluated (numerically optimized) candidates and improve scalability. As this changes the A* optimality property of the algorithm, the challenge is determining these sub-trees in a manner that still leads to a short-depth solution.

Prefix Formation: A partial solution describes a circuit structure and its function (gates). We have considered both static and dynamic methods for prefix formation. In our nomenclature, a static approach will choose a prefix circuit whose structure and function are fixed: this is a fully instantiated circuit. A dynamic approach will choose a fixed structure whose function is still parameterized. In the first case, the prefix circuit is completely instantiated with native gates to perform a single computation, while in the latter it can “walk” a much larger Hilbert subspace as induced by the parameterization.

Intuitively, determining a single instantiated prefix circuit is good for scalability. This reduces the number of parameters evaluated in any numerical optimization operation after prefix formation. We have experimented with several strategies for forming instantiated prefix circuits in our synthesis algorithms, but they did not converge or they produced very long circuits.

Prefix Formulation: In LEAP we use a dynamic data-driven approach informed by the evolution of the underlying A* QSearch algorithm, described in Figure 4. Our analysis of the trajectories for multiple examples shows that many paths are characterized by a rapid improvement in solution quality (reduction in Hilbert–Schmidt distance between target unitary and approximate prefix), followed by plateauing induced either by optimizer limitations (local minima) or as an artifact of the particular structures considered (dead-end).
LEAP forms subtrees by first identifying and monitoring plateaus. Since during a plateau the rate of solution quality change is “low,” a “prefix” is formed whenever a solution is evaluated with a jump in the rate of change. The plateau identification heuristic is augmented with a work-based heuristic: we wait to form a prefix until we sample enough partial solutions on a path. This serves several purposes: it gives us more samples in a sub-tree to gain some confidence we have not skipped “the only few viable partial solutions,” and it increases the backtracking granularity by identifying larger subtrees. Even more subtly, the work heuristic decreases the sensitivity of the approach to the thresholds used to assess the rate of change in the plateau identification method. By delaying prefix formation based on work, we avoid jumping directly into another plateau, which would result in superfluously evaluating many solutions that are close in depth to each other.

Solution Optimality: By discarding pure A* search, LEAP gives up on always finding the optimal solution. However, the following observations based on the properties of the solution search space indicate that optimality loss could be small and that the approach can be generalized to other search and numerical optimization-based methods.

First, the solution tree of circuit structures exhibits high symmetry. Partial solutions can be made equivalent by qubit relabeling; all solutions reached from any equivalent structure will have a similar depth. For example, for a circuit with $N$ qubits, a depth 1 circuit with a CNOT on qubits 0 and 1 can be thought of as “equivalent” to the circuit with a CNOT on qubits $N - 2$ and $N - 1$. Symmetry indicates that coarse-grained pruning may be feasible, since a sub-tree may contain many “equivalent” partial solutions.
Second, assuming that the optimal solution has depth $d$, there are many easy-to-find solutions at depth $> d$. In Figure 3, assume that the solution node $S$ at depth $d$ is missed by our strategy. However, there are $\mathit{links}$ solutions at $d + 1$, $\mathit{links}^2$ solutions at $d + 2$, and so forth, trivially obtained by adding identity gates to $S$. In other words, the solution density increases (probably quadratically) with circuit depth increase. If the search has a “decent” partial solution at depth $d$, numerical optimization is likely to find the final solution at very close depth.

Overall, the high-level heuristic goal is to get to optimal depth with a “good enough” partial solution. Our “good enough” criteria combine the Hilbert–Schmidt norm with a measure of work. The pseudocode for the prefix formation algorithm in LEAP is presented in Figure 5.

Algorithm 1 Helper Functions
function s(n)
    return {n + CNOT + U3 ⊗ U3 for all possible CNOT positions}

function p(n, U)
    return min_x D(U(n, x), U)

function h(d)
    return d * a    ▷ a is a constant determined via experiment. See section 3.3.1

function predict_score(a, b, d_i)
    return {predicted CNOTs for depth d_i based on points in a, b}

Algorithm 2 LEAP Prefix Formation
function leap_synthesize(U_target, ϵ, δ)
    s_i ← the best score of prefixes
    n_i ← the prefix structure
    while s_i > ϵ do
        n_i, s_i ← inner_synthesize(U_target, ϵ, δ)
    return n_i, s_i

function inner_synthesize(U_target, ϵ, δ)
    n ← representation of U3 on each qubit
    a ← best depth values of intermediate results
    b ← best depth values of intermediate results
    push n onto queue with priority h(d_best) + 0
    while queue is not empty do
        n ← pop from queue
        for all n_i ∈ s(n) do
            s_i ← p(n_i, U_target)
            d_i ← CNOT count of n_i
            s_p ← predict_score(a, b, d_i)
            if s_i < ϵ then
                return n_i, s_i
            if s_i < s_p then
                return n_i, s_i
            if d_i < δ then
                push n_i onto queue with priority h(d_i) + CNOT count of n_i

Fig. 5. Prefix formation algorithm in LEAP, based on the algorithm in [6].

4 INCREMENTAL RE-SYNTHESIS

The end result of incremental synthesis (or other divide-and-conquer methods, partitioning techniques, etc.) is that circuit pieces are optimized in isolation, with the potential of missing the optimal solution. For LEAP, this is illustrated in Figure 4. Prefix synthesis generates a natural partitioning of the circuit. Each partition is optimized based on knowledge local to its sub-tree. The final solution is composed of local optima. The basic observation here is that by chopping and combining pieces of the circuit generated by prefix synthesis, we can create new, unseen circuits for the optimization process.

For incremental re-synthesis, we use the output circuit from prefix synthesis and its partitioning (the list of depths where prefixes were fixed). The reoptimizer removes circuit segments to create “holes” of a size provided by the user (referred to as the re-synthesis window) centered on the divisions between partitions. Each removed segment is lifted to a unitary, re-synthesized, and substituted back into the original solution. The process continues iteratively until a stopping criterion is reached. This amounts to moving a sliding optimization window across the circuit.
The quality of the solution is determined by the choice of the size of the re-synthesis window, the number of applications (circuit coverage) and stopping criteria, and the numerical optimizer. In LEAP we make several pragmatic choices. The optimization window is chosen long enough to offer reduction potential yet short enough to be optimized quickly. The algorithm reoptimizes exactly once at each boundary in the original partitioning.

The re-synthesis pass allows us to manage the budget given to numerical optimizers. Since each circuit piece is likely to be transformed multiple times, some of the operations can use fast but lower-quality/budget optimization. We use the fastest optimizer available during prefix synthesis, switching during re-synthesis to the higher-quality but slower multistart solver based on [13], described in Section 6.

5 DIMENSIONALITY REDUCTION

The circuit solution provides a parameterized structure instantiated for the solution. This parameterization introduced by the single-qubit $U_3$ gates may overfit the problem. For LEAP, which targets only the CNOT count, this may be a valid concern, and we therefore designed a dimensionality reduction pass.

We use a simple algorithm that attempts to delete $U_3$ gates one at a time and reinstantiates the circuit at each step. This linear complexity algorithm can discover and remove only simple correlations between parameters. More complex cases can be discovered borrowing from techniques for dimensionality reduction for machine learning [24] or numerical optimization [25].

When applied to the final synthesis solution, dimensionality reduction may reduce the circuit critical path even further by deleting $U_3$ gates. It can also be combined with the prefix synthesis. Once a prefix is formed, we can reduce its dimensionality.
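The delete-and-reinstantiate loop can be sketched as follows. This is a toy illustration under stated assumptions rather than the LEAP pass itself: it works on a single qubit, uses $R_Z$ and $R_X$ rotations instead of $U_3$ gates, measures distance as $1 - |\mathrm{Tr}(U^{*} V)|/2$, and substitutes a simple compass (pattern) search for the numerical optimizer. All function names are hypothetical.

```python
import cmath
import math


def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


GATES = {"rz": rz, "rx": rx}


def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def build(gates, params):
    """Unitary of a one-qubit circuit; later gates multiply on the left."""
    u = [[1, 0], [0, 1]]
    for name, t in zip(gates, params):
        u = mul2(GATES[name](t), u)
    return u


def distance(u, v):
    """1 - |Tr(u* v)| / 2, an illustrative phase-invariant distance."""
    tr = sum(u[k][i].conjugate() * v[k][i] for i in range(2) for k in range(2))
    return 1 - abs(tr) / 2


def instantiate(gates, params, target, tol=1e-10, step=0.5):
    """Re-fit parameters with a compass search (stand-in for the real optimizer)."""
    params = list(params)
    best = distance(build(gates, params), target)
    while step > 1e-7 and best > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                d = distance(build(gates, trial), target)
                if d < best:
                    params, best, improved = trial, d, True
        if not improved:
            step /= 2
    return params, best


def reduce_dimension(gates, params, target, eps=1e-8):
    """Try deleting each gate once; keep a deletion if the circuit still fits."""
    gates, params = list(gates), list(params)
    i = 0
    while i < len(gates):
        trial_gates = gates[:i] + gates[i + 1:]
        trial_params, d = instantiate(trial_gates, params[:i] + params[i + 1:],
                                      target)
        if d <= eps:
            gates, params = trial_gates, trial_params  # gate i was redundant
        else:
            i += 1  # gate i is needed; move on
    return gates, params
```

For instance, with `target = build(["rz", "rx"], [0.9, 0.4])`, calling `reduce_dimension(["rz", "rz", "rx", "rx"], [0.5, 0.4, 0.1, 0.3], target)` removes one gate of each kind. The pass makes one deletion attempt per gate, which is why it only catches simple redundancies such as two adjacent rotations about the same axis.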
As numerical optimizers scale exponentially with parameters, this will improve the execution time per invocation. On the other hand, it may affect the quality of the solution as we remove expressive power from prefixes. In the current LEAP version, only the final solution is simplified.

6 MULTISTART OPTIMIZATION
Solving the optimization problem for the objective function in LEAP or QSearch can be difficult. Quantum circuits, even optimal ones, are not unique: a global phase is physically irrelevant and thus does not affect the output. Furthermore, circuits that differ only in a local basis transformation and its inverse surrounding a circuit subsection (e.g., a single 2-qubit gate) are mathematically equivalent. Provided native gate sets may contain equivalences; and single-qubit gates, being rotations, are periodic. As a practical matter, we find that we cannot declare these equivalences to existing optimizers. Furthermore, where they can be used to create constraints or inaccessible regions (e.g., by remapping the periodicity into a single region), we find that they hinder the search, because boundaries can create artificial local minima.
The unavoidable presence of equivalent circuits means that we are essentially overfitting the problem, where changes in parameters can cancel each other out, leading to saddle points, which turn into local minima in the optimization surface because of the periodicity; see Figure 6. The former cause, at best, an increase in the number of iterations as progress slows down because of smaller gradients; the latter risks getting the optimizer stuck.
Another problem comes from the specification of the objective: distance metrics care only about the output, and different circuits can thus result in equal distances from the desired unitary. If no derivatives are available, this results in costly evaluations just to determine that no progress can be made, a problem that gets worse at scale.
But even with a derivative, it closes directions for exploration and shrinks viable step sizes, thus increasing the likelihood of getting stuck in a local minimum.

There are physical differences; in particular such circuits tend to sample different noise profiles. This property forms the basis of randomized compilation.

Fig. 6. Optimization surface near the global minimum for a 4-qubit circuit of depth 6 for the first step in the QITE algorithm, varying 6 (3 pairs) out of 42 parameters equally, showing the effect on the optimization surface for 2 parameters from distinct pairs. (The global minimum is so pronounced only because the remaining 36 parameters are kept fixed at optimum, reducing the total search space; most of the 42-dim surface is flat.) Axes: parameter 1, parameter 2; annotated features: saddle points, global minimum.

In sum, local optimization methods are highly dependent on the starting parameters, yet global optimization methods can require far too many evaluations to be feasible for real-world objectives. An attractive middle ground is an approach that starts many local optimization runs from different points in the domain. Multistart optimization methods are especially appealing when there is some structure in the objective, such as the least-squares form of the objective. Some multistart approaches complete a given local optimization run before starting another, whereas others may interleave points from different runs.
The asynchronously parallel optimization solver for finding multiple minima (APOSMM) [13] begins with a uniform sampling of the domain and then starts local optimization runs from any point subject to constraints: (1) point not yet explored; (2) not a local optimum; and (3) no point available within a distance r with a smaller function value. If no such point is available, more sampling is performed.
The radius r decreases as more points are sampled, thereby allowing past points to start runs. Under certain conditions on the objective function and the local optimization method, the logic of APOSMM can be shown to asymptotically identify all local optima while starting only finitely many local optimization runs.

7 EXPERIMENTAL SETUP
LEAP, available at https://github.com/BQSKit/qsearch, extends QSearch. We evaluated it with Python 3.8.5, using numpy 1.19.5 and Rust 1.48.0 code. For our APOSMM implementation, we integrated with the version in the libEnsemble Python package [26, 27]. We tried two different local optimization methods within APOSMM: the L-BFGS implementation within SciPy [22] and the Google Ceres [23] least-squares optimization routine. For experimental evaluation, we use a 3.35GHz Epyc 7702p based server, with 64 cores and 128 threads.
Our workload consists of known circuits (e.g., mul, add, Quantum Fourier Transform), as well as newly introduced algorithms. VQE [15] starts with a parameterized circuit and implements a hybrid algorithm where parameters are reinstantiated based on the results of the previous run. The TFIM [16] and Quantum Imaginary Time Evolution (QITE) [18] algorithms model the time evolution of a system. They are particularly challenging for NISQ devices as circuit length grows linearly with the simulated time step. In TFIM, each timestep (extension) can be computed and compiled ahead of time from first principles, while in QITE it is dependent on the previous time step.
We evaluate LEAP against QSearch and other available state-of-the-art synthesis software and compilers. QFAST [28] scales better than QSearch by conflating search for structure with numerical optimization, albeit producing longer circuits.
Qiskit-synth [10] uses linear algebra decomposition rules for fast synthesis, but circuits tend to be long. IBM Qiskit [8] provides "traditional" quantum compilation infrastructure using peephole optimization and mapping algorithms. CQC Tket [9] provides another good-quality compilation infrastructure across multiple gate sets.
To showcase the impact of QPU topology, we compile for processors where qubits are fully connected (all-to-all), as well as processors with qubits connected in a nearest-neighbor (linear) fashion.

Table 1. Results for 3-4 qubit synthesis benchmarks. * 3-qubit results were chosen as the best run of two samples.

Columns (3 qubits*: fredkin to qft3; 4 qubits: adder to TFIM-95):
Tool  fredkin  toffoli  grover  hhl  or  peres  qft3  adder  vqe  TFIM-1  TFIM-10  TFIM-22  TFIM-60  TFIM-80  TFIM-95

CNOTs, All-to-All
Qiskit Mapped  8  6  7  5  6  5  6  10  76  6  60  132  360  480  570
QFAST  8  8  7  3  8  7  7  15  43  8  14  16  18  14  21
LEAP  7  6  6  3  6  5  6  12  22  6  12  13  12  15  12
TKet Mapped  8  6  7  3  6  5  6  10  71  6  60  132  360  480  570
Qiskit Synthesized  15  9  29  13  11  11  27  66  566  124  218  218  218  218  218

CNOTs, Linear
Qiskit Mapped  12  13  14  11  11  9  8  20  85  6  60  132  360  480  570
QFAST  8  8  7  4  8  7  8  36  40  6  10  10  12  12  23
LEAP  8  8  7  3  8  7  7  14  24  7  12  13  12  13  12
TKet Mapped  14  9  13  3  12  11  9  16  71  6  60  132  360  480  570
Qiskit Synthesized  30  17  74  30  19  28  70  247  2630  477  523  523  523  523  523

U3s, All-to-All
Qiskit Mapped  10  8  17  10  9  9  11  11  86  7  70  154  420  560  665
QFAST  19  19  17  9  19  17  17  34  91  20  32  36  40  32  46
LEAP  17  15  15  9  15  13  15  26  49  16  28  28  28  31  28
TKet Mapped  10  8  16  5  8  9  11  10  76  7  61  133  361  481  571
Qiskit Synthesized  19  11  42  17  17  12  39  88  671  160  261  261  261  261  261

U3s, Linear
Qiskit Mapped  23  22  30  22  22  20  18  37  106  7  70  154  420  560  665
QFAST  19  19  17  11  19  17  19  76  84  16  24  24  28  28  50
LEAP  19  19  17  9  19  17  17  32  53  18  28  30  28  30  28
Qiskit Synthesized  50  32  126  49  37  45  120  410  4169  785  851  851  851  850  851

Depth, All-to-All
Qiskit Mapped  11  11  16  11  8  8  12  11  116  10  73  157  423  563  668
QFAST  17  17  15  7  17  15  15  21  61  9  21  29  29  29  35
LEAP  15  13  13  7  13  11  13  19  39  13  21  24  19  27  21
TKet Mapped  10  8  16  5  8  9  11  10  76  7  61  133  361  481  571
Qiskit Synthesized  29  17  56  26  21  19  51  121  1062  227  421  421  421  421  421

Depth, Linear
Qiskit Mapped  23  24  29  23  21  18  17  32  136  10  73  157  423  563  668
QFAST  17  17  15  9  17  15  17  63  63  9  13  21  25  21  31
LEAP  17  17  15  7  17  15  15  27  41  15  23  23  25  23  21
TKet Mapped  12  11  15  6  8  8  12  11  104  10  73  157  423  563  668
Qiskit Synthesized  56  34  139  55  38  51  132  390  3949  770  852  852  852  852  852

Parallelism, All-to-All
Qiskit Mapped  1.64  1.27  1.50  1.36  1.88  1.75  1.42  1.91  1.40  1.30  1.78  1.82  1.84  1.85  1.85
QFAST  1.59  1.59  1.60  1.71  1.59  1.60  1.60  2.33  2.20  3.11  2.19  1.79  2.00  1.59  1.91
LEAP  1.60  1.62  1.62  1.71  1.62  1.64  1.62  2.00  1.82  1.69  1.90  1.71  2.11  1.70  1.90
TKet Mapped  19  15  22  6  16  15  16  16  104  10  73  157  423  563  668
Qiskit Synthesized  1.17  1.18  1.27  1.15  1.33  1.21  1.29  1.27  1.16  1.25  1.14  1.14  1.14  1.14  1.14

Parallelism, Linear
Qiskit Mapped  1.52  1.46  1.52  1.43  1.57  1.61  1.53  1.78  1.40  1.30  1.78  1.82  1.84  1.85  1.85
QFAST  1.59  1.59  1.60  1.67  1.59  1.60  1.59  1.78  1.97  2.44  2.62  1.62  1.60  1.90  2.35
LEAP  1.59  1.59  1.60  1.71  1.59  1.60  1.60  1.70  1.88  1.67  1.74  1.87  1.60  1.87  1.90
TKet Mapped  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.03  0.18  0.01  0.10  0.23  0.66  0.90  1.08
Qiskit Synthesized  1.43  1.44  1.44  1.44  1.47  1.43  1.44  1.68  1.72  1.64  1.61  1.61  1.61  1.61  1.61

Time (s), All-to-All
Qiskit Mapped  0.04  0.04  0.05  0.05  0.04  0.08  0.04  0.05  0.36  0.03  0.20  0.40  1.00  1.33  1.67
QFAST  1.82  1.77  1.82  0.23  4.57  0.54  0.70  7.71  553.79  1.29  13.19  12.26  10.87  6.12  11.29
LEAP  2.99  1.89  1.84  0.47  1.01  0.60  0.98  34.57  2006.31  10.56  42.59  16.41  31.73  30.71  51.12
TKet Mapped  0.02  0.01  0.01  0.01  0.01  0.01  0.01  0.03  0.19  0.01  0.11  0.25  0.70  0.94  1.14
Qiskit Synthesized  0.26  0.14  0.86  0.24  0.22  0.19  0.60  1.58  12.10  2.85  3.36  3.50  3.37  3.52  3.32

Time (s), Linear
Qiskit Mapped  0.17  0.15  0.17  0.15  0.16  0.18  0.13  0.20  1.04  0.06  0.32  0.66  1.82  2.54  2.93
QFAST  1.66  1.64  1.78  0.41  1.60  1.25  1.89  16.25  201.63  0.64  1.77  1.85  3.00  2.81  6.08
LEAP  2.42  1.62  1.42  0.21  1.52  1.13  0.72  32.61  765.19  1.93  57.15  18.82  9.54  12.80  11.24
TKet Mapped  0.02  0.02  0.03  0.02  0.03  0.02  0.02  0.04  0.27  0.02  0.17  0.38  1.14  1.44  1.72
Qiskit Synthesized  0.45  0.28  1.96  0.46  0.37  0.39  1.13  4.07  41.59  7.55  7.02  8.25  6.44  8.47  7.00

8 EVALUATION
Summarized results are presented in Table 2, with more details in Tables 3 and 4. We present data for all-to-all and nearest-neighbor chip topology.

Table 2. Summary of the quality metrics (average value) for five- and six-qubit circuit synthesis. * Qiskit's methods are exact, yet due to some post-processing in their mapping pipeline, large errors are shown.

All-to-all
Metric  Qiskit Mapped  Tket Mapped  LEAP  QFAST  Qiskit Synthesis
Time (s)  <1  <1  7.34e3  423  31
Error  1e-16  3e-15  1e-12  1e-4  1e-11
CNOT  240  240  18.85  27.8  1991
U3  270  243.07  41.71  60.9  2155
Depth  207  206.67  29.2  43.9  3912

Linear
Metric  Qiskit Mapped  Tket Mapped  LEAP  QFAST  Qiskit Synthesis
Time (s)  1.4  <1  608  342  76
Error  2.9e-1*  3e-15  1e-12  1e-5  9e-1*
CNOT  250  248.6  18.8  36.4  6115
U3  291  270.27  42.7  78.2  9512
Depth  321  215.47  28  48.6  9004

Table 3. Summary of synthesis results for QSearch and LEAP on the linear topology. LEAP produces results very similar to those of QSearch in significantly less time.
ALG  Qubits  Ref  QSearch CNOT  QSearch Unitary Distance  QSearch Time (s)  LEAP CNOT  LEAP Unitary Distance  LEAP Time (s)
QFT  3  6  7  3.33e-16  2.0  8  2.22e-16  1.7
Toffoli  3  6  8  2.22e-16  3.4  8  2.22e-16  1.6
Fredkin  3  8  8  4.44e-16  2.6  8  3.33e-16  1.7
Peres  3  5  7  0  1.7  7  2.22e-16  1.1
Logical OR  3  6  8  2.22e-16  3.4  8  3.33e-16  1.6
QFT  4  12  14  6.7e-16  2429.3  13  6.7e-16  77.9
TFIM-1  4  6  6  0  13.4  6  0  7.2
TFIM-10  4  60  11  9.08e-11  955.4  11  3.95e-11  47.8
TFIM-22  4  126  12  1.22e-15  2450.3  12  7.77e-16  41.6
TFIM-60  4  360  12  4.44e-16  1391  12  2.22e-16  31.6
TFIM-80  4  480  12  4.44e-16  1553.1  12  2.22e-16  35
TFIM-95  4  570  12  6.66e-16  1221.4  12  2.22e-16  38.1

Table 3 presents a direct comparison between QSearch and LEAP for circuits up to four qubits. Despite its heuristics, LEAP produces optimal depth solutions, matching the reference implementations on nearest-neighbor chip topology. Overall, LEAP can compile four-qubit unitaries up to 59× faster than QSearch.
As shown in Table 4, LEAP scales up to six qubits. In this case, we include full topology data, as well as results for compilation with QFAST, Qiskit, Qiskit-synth, and Tket. On well-known quantum circuits such as VQE and QFT and physical simulation circuits such as TFIM, LEAP with re-synthesis can reduce the CNOT count by up to 48×, or 11× on average, when compared to Qiskit. On average, when compared to Tket, LEAP reduces depth by a factor of 7×. Our heuristics rarely affect solution quality, and LEAP can match optimal depth solutions. At five and six qubits, LEAP synthesizes circuits with up to 1.19× fewer CNOTs on average compared with QFAST, albeit with an average 3.55× performance penalty. LEAP can be one order of magnitude slower than Qiskit-synth while providing circuits that are two or more orders of magnitude shorter.

8.1 Impact of Prefix Synthesis
Most of the speed improvements are directly attributable to prefix synthesis, which reduces by orders of magnitude the number of partial solutions evaluated.
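The prefix decision in Fig. 5 compares a new score against the value returned by predict_score. A minimal sketch of that comparison follows, assuming an ordinary least-squares line through earlier (depth, best score) pairs; the function names, the toy data, and the choice of plain least squares are illustrative assumptions rather than LEAP's actual code.

```python
def fit_line(depths, scores):
    """Ordinary least-squares fit of score = slope * depth + intercept."""
    n = len(depths)
    if n < 2 or len(set(depths)) < 2:
        raise ValueError("need at least two points at distinct depths")
    mean_d = sum(depths) / n
    mean_s = sum(scores) / n
    sxx = sum((d - mean_d) ** 2 for d in depths)
    sxy = sum((d - mean_d) * (s - mean_s) for d, s in zip(depths, scores))
    slope = sxy / sxx
    return slope, mean_s - slope * mean_d


def predict_score(depths, scores, depth):
    """Expected score at `depth`, extrapolated from earlier best scores."""
    slope, intercept = fit_line(depths, scores)
    return slope * depth + intercept


def should_form_prefix(score, depths, scores, depth):
    """Fix a prefix only when the new score beats the regression estimate."""
    return score < predict_score(depths, scores, depth)


best_depths = [1, 2, 3]          # hypothetical depths at which a new best score appeared
best_scores = [0.9, 0.8, 0.7]    # hypothetical best scores (lower is better)
expected = predict_score(best_depths, best_scores, 5)
print(expected, should_form_prefix(0.4, best_depths, best_scores, 5))
```

A score below the fitted line means the search progressed faster than its own history predicts, which is the condition under which a prefix is fixed; a score above the line leaves the search free to backtrack.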
For example, for QFT4, the whole search space contains ≈43M solution candidates. QSearch will explore 2,823 nodes, while LEAP will explore 410. For TFIM-22, these numbers are (≈1.6M, 54,020, 176), respectively. Detailed results are omitted for brevity.
Prefix formation is calculated based on a best-fit line formed by a linear regression of the best scores versus the depth associated with the new best-found score. This linear regression is used as an estimator of the expected score at the current depth. When the score calculated from the heuristic is better than the expected score, this means that the new best score is better than expected; in other words, more progress to the solution has been made than expected. We note that when the search algorithm in QSearch needs to backtrack and search many different nodes, the progress towards the solution is slower, and the calculated score is worse than the expected score. We, therefore, do not form prefixes in this case, which allows LEAP to maintain the important backtracking and searching that makes QSearch optimal.
Table 5 presents the number of prefixes formed during synthesis for each circuit considered. Since prefixes have a depth between three and five qubits, this informs our choice of the re-synthesis window discussed below.

Table 4. Results for 5-6 qubit synthesis benchmarks with QFAST, LEAP, and IBM Qiskit. (* implies the program timed out after 12 hours.)

Columns (5 qubits: grover5 to TFIM-100; 6 qubits: TFIM-1 to TFIM-51):
Tool  grover5  hlf  mul  qaoa  qft5  TFIM-10  TFIM-40  TFIM-60  TFIM-80  TFIM-100  TFIM-1  TFIM-10  TFIM-24  TFIM-31  TFIM-51

CNOTs, All-to-All
Qiskit Mapped  48  13  17  20  20  80  320  480  640  800  10  100  240  310  510
TKet Mapped  48  7  15  20  20  80  320  480  640  800  10  100  240  310  510
LEAP  *  9  13  *  31  18  22  21  21  22  10  *  *  *  *
QFAST  70  13  18  39  46  20  20  24  22  26  12  29  26  24  28
Qiskit Synthesized  570  870  77  750  580  1025  1025  1025  1025  1025  4006  4474  4474  4474  4474

CNOTs, Linear
Qiskit Mapped  131  23  22  55  31  80  320  480  640  800  10  100  240  310  510
TKet Mapped  96  16  24  42  41  80  320  480  640  800  10  100  240  310  510
LEAP  49  15  15  28  30  18  20  20  20  20  10  24  27  29  30
QFAST  60  55  58  69  114  12  18  20  20  21  10  16  20  22  32
Qiskit Synthesized  2503  2578  760  2692  2622  2791  2791  2791  2791  2791  13155  13365  13365  13365  13365

U3s, All-to-All
Qiskit Mapped  78  8  16  20  29  90  360  540  720  900  11  110  264  341  561
TKet Mapped  72  10  16  19  29  81  321  481  641  801  11  101  241  311  511
LEAP  *  22  27  *  65  41  49  45  47  45  26  *  *  *  *
QFAST  145  31  41  83  97  45  45  53  49  57  30  64  58  54  62
Qiskit Synthesized  672  976  87  861  687  1140  1140  1140  1140  1140  4294  4765  4765  4765  4765

U3s, Linear
Qiskit Mapped  235  37  40  93  63  90  360  540  720  900  11  110  264  341  561
TKet Mapped  72  10  15  19  29  81  321  481  641  801  11  101  241  311  511
LEAP  103  35  35  61  65  41  45  45  45  45  26  54  60  64  64
QFAST  125  115  121  143  233  29  41  45  45  47  26  38  46  50  70
Qiskit Synthesized  4008  4046  1190  4264  4165  4400  4401  4401  4401  4400  20375  20659  20658  20656  20658

Depth, All-to-All
Qiskit Mapped  85  16  26  32  26  76  286  426  566  706  16  79  177  226  366
TKet Mapped  85  8  25  32  26  76  286  426  566  706  16  79  177  226  366
LEAP  *  13  22  *  47  23  35  31  31  33  13  *  *  *  *
QFAST  123  21  33  65  85  31  33  49  29  39  13  47  29  29  33
Qiskit Synthesized  1064  1662  138  1451  1089  2008  2008  2008  2008  2008  7872  8841  8841  8841  8841

Depth, Linear
Qiskit Mapped  200  34  40  76  44  76  286  426  566  706  16  79  177  226  366
TKet Mapped  133  17  36  53  43  76  286  426  566  706  16  79  177  226  366
LEAP  71  17  29  41  45  25  27  35  35  31  13  31  31  33  37
QFAST  99  87  77  83  151  17  21  29  21  27  13  17  25  21  45
Qiskit Synthesized  3799  3933  1115  4061  3924  4236  4236  4236  4236  4236  19074  19495  19495  19494  19494

Time (s), All-to-All
Qiskit Mapped  0.16  0.05  0.07  0.07  0.11  0.22  0.88  1.19  1.68  2.03  0.04  0.28  0.62  0.80  1.41
TKet Mapped  0.07  0.03  0.03  0.04  0.04  0.15  0.68  1.03  1.41  1.79  0.03  0.20  0.46  0.59  0.97
LEAP  *  618.62  652.92  *  11418.54  7826.57  16527.44  9069.7  6628.47  1586.35  19233.36  *  *  *  *
QFAST  3187.40  27.70  86.79  249.15  499.49  79.86  69.38  71.98  77.42  215.13  23.14  618.43  191.99  270.70  684.63
Qiskit Synthesized  11.61  14.50  2.65  14.61  14.43  14.35  15.04  14.59  14.27  16.52  82.16  62.93  64.10  63.34  64.62

Time (s), Linear
Qiskit Mapped  1.12  0.24  0.38  0.46  0.34  0.43  1.75  2.57  3.39  4.31  0.09  0.51  1.30  1.61  2.60
TKet Mapped  0.14  0.04  0.05  0.06  0.07  0.25  1.10  1.65  2.12  2.69  0.06  0.34  0.76  0.98  1.58
LEAP  25233.78  165.50  856.36  3525.54  5165.28  11631.55  3585.95  2113.57  1901.41  2835.3  7651.29  145303.80  175491.42  177015.25  47681.98
QFAST  992.38  228.55  213.94  365.15  1901.26  7.67  22.78  26.63  30.28  21.01  5.25  61.68  82.52  408.35  772.39
Qiskit Synthesized  33.20  34.42  12.38  36.25  38.37  35.93  35.53  32.27  34.11  32.41  170.08  161.25  156.66  161.30  159.81

Table 5. Number and location of prefix blocks for various circuits.

ALG  Qubits  CNOT  # of Blocks  Block End Locations
fredkin  3  8  2  5,8
toffoli  3  8  2  6,9
grover3  3  7  2  5,7
hhl  3  3  1  3
or  3  8  2  5,8
peres  3  7  2  6,7
qft3  3  8  2  5,9
qft4  4  18  4  5,13,18,21
adder  4  15  3  8,14,19
vqe  5  20  8  3,7,11,14,18,21,25,28
TFIM-1  4  7  2  5,7
TFIM-10  4  12  3  5,10,12
TFIM-22  4  12  3  5,10,12
TFIM-60  4  12  3  5,10,12
TFIM-80  4  12  3  5,10,12
TFIM-95  4  12  3  5,10,12
mul  5  15  5  3,9,12,16,18
qaoa  5  28  7  6,10,14,19,24,29,35
qft5  5  30  10  5,8,11,15,20,25,30,35,38,40
TFIM-10  5  18  7  3,6,9,13,16,19,21
TFIM-40  5  20  7  3,7,10,13,16,19,21
TFIM-60  5  20  7  3,6,10,15,18,21,24
TFIM-80  5  20  7  3,6,11,16,20,23,24
TFIM-100  5  20  6  5,9,13,17,20,22
TFIM-1  6  10  4  4,7,10,12

8.2 Impact of Incremental Re-synthesis
While significantly reducing depth (with respect to the circuit reference), prefix synthesis can be improved upon by incremental re-synthesis, as shown by the comparison in Table 6. LEAP applies only a single step of re-synthesis. Given the solution from prefix synthesis, LEAP selects a window at each prefix boundary, resynthesizes, and reassembles the circuit. Detailed results are omitted for brevity, but further iterations do little to improve the solution.

Table 6. Summary of the CNOT reduction and time for resynthesis on the linear topology.
ALG  Qubits  Before: CNOT  Before: Unitary Distance  Before: Time (s)  After: CNOT  After: Unitary Distance  After: Time (s)
qft3  3  9  0  1.6  8  0  3.4
logical or  3  8  4.44e-16  1.4  8  4.44e-16  5.9
fredkin  3  8  2.22e-16  1.4  8  2.22e-16  5.7
toffoli  3  9  2.22e-16  1.7  8  0  3.4
adder  4  19  0  48.9  15  2.22e-16  76.7
qft4  4  21  2.22e-16  38.6  18  1.11e-16  190.3
TFIM-10  4  12  8.03e-12  10.3  12  8.03e-12  176.6
TFIM-80  4  12  6.66e-16  4.2  12  6.66e-16  103.8
TFIM-95  4  12  4.44e-16  6.5  12  4.44e-16  113
vqe  4  28  2.47e-11  151.2  20  2.70e-11  2062.8
qft5  5  40  1.22e-15  772.4  30  6.66e-16  4392.8
TFIM-10  5  21  7.97e-12  310.6  18  9.19e-12  11320.8
TFIM-40  5  21  6.66e-16  44  20  0  3541.8
TFIM-60  5  24  0  66.9  20  0  2046.5
TFIM-80  5  24  2.22e-16  73.5  20  2.22e-16  1827.8
TFIM-100  5  22  4.44e-16  55.4  20  1.11e-16  2779.8
mul  5  18  4.44e-16  47.0  15  2.22e-16  809.2
TFIM-1  6  12  2.22e-16  213.3  10  1.11e-16  7437.9

The re-synthesis window in LEAP is chosen pragmatically with a limited depth (7 CNOTs for 3 and 4 qubits, 5 CNOTs for 5 and 6 qubits in our case), to lead to reasonable expectations on execution time, while providing some optimization potential. Incremental re-synthesis reduces circuit depth by 15% on average, albeit in many cases with a significant impact on the runtime.

8.3 Impact of Dimensionality Reduction
LEAP applies a single step of dimensionality reduction at the end of the synthesis process, with the sweep starting at the beginning of the circuit. For brevity, we omit detailed data and note that in this final stage dimensionality reduction eliminates up to 40% of U3 gates (parameters) and shortens the circuit critical path. These results indicate that our approach overfits the problem by inserting too many U3 gates.
We examined the spatial occurrence of single-qubit gate deletion since this may guide any dynamic attempts to eliminate parameters during synthesis for scalability purposes.
Table 7 presents a summary for three-qubit circuits; trends are similar for all other benchmarks considered. The data shows that gate deletion is successful at many circuit layers, indicating that a heuristic for on-the-fly dimensionality reduction may be feasible to develop for even further scalability and quality improvements.

Table 7. Spatial placement of U3 gates deleted. The number of columns denotes circuit stages (CNOTs), and we present the number of gates deleted at each position.

Name  Number of Gates Deleted (per stage)
qft2  2  0  0  0
qft3  2  0  0  1  0  0  1  1
fredkin  3  2  0  1  1  2  0  0  1
toffoli  2  2  1  2  1  2  0  1  0
peres  2  0  1  2  0  1  0  1
logical_or  2  1  2  0  2  1  0  1  0
hhl  2  0  2  0

Table 8. Accuracy and speed of various optimizers on a variety of circuits. APOSMM-N means APOSMM with N starting points. Each optimizer column pair is % Success, Time (s).

ALG  CNOT  BFGS  Ceres  APOSMM-8  APOSMM-12  APOSMM-16  APOSMM-20  APOSMM-24
fredkin  8  89, 0.03  69, 0.01  100, 0.13  100, 0.14  100, 0.14  100, 0.15  100, 0.16
logical_or  8  16, <0.01  55, 0.01  100, 0.13  100, 0.14  100, 0.15  100, 0.16  100, 0.17
peres  7  18, <0.01  73, 0.01  69, 0.08  90, 0.11  92, 0.12  98, 0.13  99, 0.14
toffoli  8  43, 0.01  74, 0.01  100, 0.13  100, 0.14  100, 0.14  100, 0.15  100, 0.17
qft3  8  9, <0.01  26, <0.01  80, 0.10  91, 0.12  95, 0.13  98, 0.14  100, 0.16
qft4  18  1, <0.01  15, 0.02  66, 0.50  83, 0.68  92, 0.82  94, 0.99  99, 1.08
qft5  30  0, <0.01  2, 0.12  8, 1.19  13, 2.78  15, 3.81  25, 7.21  36, 12.10

As discussed in Section 6, dimensionality reduction will reduce the number of parameters for numerical optimization, while reducing overfitting and gate (parameter) correlation that lead to cancellations of gate effects on a qubit.
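The one-gate-at-a-time deletion sweep can be sketched as follows. This is a toy illustration under stated assumptions, not LEAP's code: `greedy_gate_deletion` and `make_toy_reinstantiate` are hypothetical names, and the "re-instantiation" is a closed-form stand-in for numerical optimization in which every gate is a Z rotation with a free angle on one qubit.

```python
import math


def greedy_gate_deletion(gates, reinstantiate, threshold=1e-10):
    """Try to delete each gate once, in order; keep a deletion only if the
    re-instantiated circuit still reaches the target within `threshold`."""
    kept = list(gates)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if reinstantiate(trial) < threshold:
            kept = trial   # gate was redundant; index i now holds the next gate
        else:
            i += 1         # gate is needed; move on
    return kept


def make_toy_reinstantiate(target_angles):
    """Toy stand-in for numerical re-instantiation. With a free-angle Z rotation
    on a qubit the target rotation is matched exactly; without one, that qubit
    contributes sin^2(angle / 2) to the distance."""
    def reinstantiate(gates):
        covered = set(gates)
        return sum(math.sin(angle / 2) ** 2
                   for qubit, angle in target_angles.items()
                   if qubit not in covered)
    return reinstantiate


gates = [0, 0, 1, 0, 1]   # qubit index of each single-qubit rotation
reinstantiate = make_toy_reinstantiate({0: 0.3, 1: 1.1})
reduced = greedy_gate_deletion(gates, reinstantiate)
print(reduced)
```

The sweep calls the re-instantiation routine once per original gate, which matches the linear complexity noted in Section 5; it can only remove gates that are individually redundant, not parameters that are correlated in more complex ways.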
8.4 Impact of Multistart Optimization
When evaluating numerical optimizers used in synthesis, we are interested in determining how often they find the true minimum, since this has a significant impact on both solution quality and execution speed. We evaluated the commonly used local optimization methods Google's Ceres [23] and an implementation of L-BFGS [22], as well as the multistart APOSMM [13] framework. We ran each optimizer 100 times on several circuits to evaluate their accuracy and speed. The results are summarized in Table 8.
The QFT results illustrate that the BFGS and Ceres optimizers perform poorly even on a smaller circuit such as a three-qubit QFT, finding solutions just 9% and 26% of the time, much lower than even APOSMM with 8 starting points. We found that APOSMM with 12 starting points performed well on all but the five-qubit QFT circuit. Since optimizing the parameters of the QFT5 circuit is a much higher-dimensional problem, even APOSMM with 24 starting points found solutions in only 36% of the runs.
While APOSMM is much more accurate than BFGS and Ceres on the circuits we tested, it is also about an order of magnitude slower for larger circuits, even though the local optimization runs are done in parallel. In addition, the slowdown increases with the number of starting points. The time for QFT5 approximately doubles every 4 additional starting points for parallel runs. For our runs in Table 4 we selected 12 starting points since this number was reasonably accurate and takes a reasonable amount of time.
Therefore, when using LEAP, we use Ceres during prefix synthesis because it is fast and scales well, and a missed solution will be found during re-synthesis. During re-synthesis, APOSMM is used, since it is much more likely to find true minima, thus strengthening the optimality of search-based algorithms.
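The sensitivity of a single local run to its starting point, and the way a multistart scheme mitigates it, can be seen on a one-dimensional toy objective. The sketch below is not APOSMM and does not use libEnsemble: it samples a fixed uniform grid instead of sampling at random, uses a fixed radius, and borrows only the idea of condition (3) from Section 6 (start a run only from a sample with no better sample nearby). The objective, step size, and all names are assumptions made for illustration.

```python
import math


def f(x):
    """Toy periodic objective: global minimum f = 0 at x = 2, plus several local minima."""
    u = x - 2.0
    return (1 - math.cos(u)) + 0.5 * (1 - math.cos(5 * u))


def grad(x):
    u = x - 2.0
    return math.sin(u) + 2.5 * math.sin(5 * u)


def local_descent(x, lr=0.05, steps=500):
    """Plain gradient descent; stands in for a local optimizer such as L-BFGS."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def multistart(samples, radius):
    """Start a local run only from samples with no better sample within `radius`."""
    starts = [x for x in samples
              if not any(y != x and abs(y - x) <= radius and f(y) < f(x)
                         for y in samples)]
    finals = [local_descent(x) for x in starts]
    return min(finals, key=f), len(starts)


samples = [2 * math.pi * k / 12 for k in range(12)]   # uniform grid over one period
single = local_descent(0.0)                           # one run from a poor start
best, runs = multistart(samples, radius=0.6)
print(f"single start: f = {f(single):.3f}; multistart ({runs} runs): f = {f(best):.2e}")
```

The single run started at x = 0 settles in a neighboring local minimum, while the filtered multistart reaches the global minimum with only a handful of local runs. The filtering step is what keeps the number of local runs well below the number of samples.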
8.5 Gate Set Exploration
Similar to QSearch, LEAP can target different native gate sets and provide another dimension to circuit optimization or hardware design exploration. Besides CNOT, we have targeted other two-qubit gates supported by QPU manufacturers: CSX (√CNOT), iSWAP, and SQISW (√iSWAP). Here, the square root gates implement the matrix square root of their counterpart, and their composition has been previously studied [29] for generic two-qubit programs. Results are presented in Table 9.

Table 9. Number of two-qubit gates needed to implement various three- to six-qubit circuits. Using CNOT reduces the number of two-qubit gates needed vs iSWAP, whereas a combination of CNOT and iSWAP reduces the number of two-qubit gates even further.

ALG  iSWAP  CNOT
qft4  22  13
tfim-4-22  16  12
tfim-4-95  14  12
vqe  26  21
full adder  30  18
hlf  22  13
mul  18  13
qft5  50  28
tfim-5-40  29  20
tfim-5-100  33  20
tfim-6-24  40  28
tfim-6-51  43  31

ALG  CNOT  SQCNOT  iSWAP  SQISW  CNOT + iSWAP  CNOT + SQCNOT  iSWAP + SQISW
qft3  6  8  7  8  5  5  7
fredkin  7  9  7  9  7  7  8
toffoli  6  7  7  8  6  5  7
peres  5  5  7  8  5  4  6
logical or  6  7  7  8  6  8  7

We make the following observations:
• While CNOT and iSWAP are considered "equivalent" in terms of expressive power, using CNOT gates for larger circuits (five and six qubits) tends to produce observably shorter circuits.
• Mixing two-qubit gates (CNOT+iSWAP) tends to produce shorter circuits than when using CNOT alone.
• The depths of CNOT- and √CNOT-based circuits are very similar. Given that in some implementations the latency of √CNOT gates may be shorter than that of CNOT gates, the former may be able to provide a performance advantage.
• Sleator and Weinfurter [30] prove that the Toffoli gate can be optimally implemented using a five-gate combination of CNOT and √CNOT.
LEAP can reproduce this result, which indicates it may provide a useful tool for discovering optimal implementations of previously proposed gates.
These observations are somewhat surprising and probably worth a more detailed future investigation. While the data indicates that mixing CNOT and iSWAP can produce the shortest circuits, we found that in LEAP the search space size would double, hence the speed to the solution will suffer. Therefore, for our experiments, we kept the CNOT+U3 gate set that was used by QFAST and QSearch.

9 DISCUSSION
Overall, the results indicate that the heuristics employed in LEAP are much faster than QSearch and are still able to produce low-depth solutions in a topology-aware manner. The average depth difference for three- and four-qubit benchmarks between QSearch and LEAP is 0 across physical chip topologies and workload.
We find the prefix formation idea intuitive, easily generalizable, and powerful. The method used to derive prefix formation employs concepts encountered in numerical optimization algorithms and is easily identifiable in other search-based synthesis algorithms: "progress" to the solution, and "region of similarity" or plateau.
The LEAP algorithm indicates that incremental and iterative approaches to synthesis work well. In our case, the results even indicate that one extra step of local optimization can match the efficacy of global optimization. This result bodes well for approaches that scale synthesis past hundreds of qubits through circuit partitioning, such as our QGo [31] optimization and QuEst [32] approximation algorithms.
Dimensionality reduction as implemented in LEAP not only reduces the effects of overfitting by numerical optimization but also opens a promising path for scaling numerical-optimization-based synthesis. Since we were able to delete 40% of parameters from the final solution, we believe that by combining it with prefix synthesis we can further improve LEAP's scalability.
Multistart optimization can be trivially incorporated into any algorithm, and we have indeed already modified the QSearch and QFAST algorithms to incorporate it. Furthermore, the spirit of the multistart "approach" can be employed to further prune the synthesis search space. Whenever a prefix is formed, the synthesis algorithm has explored a plateau and a local minimum. At this stage, a multistart search could be started using as seeds other promising partial solutions within the tree.

Fig. 7. TFIM circuit depth evolution and "fidelity" when executed on the IBM Athens system. "IBM" is compiled with Qiskit, while "Constant Depth" is synthesized with LEAP.

9.1 Scaling Hamiltonian Simulation Circuit Exploration: TFIM and QITE
The prefix formation idea is powerful and showcases how synthesis can turn into a capability tool. TFIM circuits simulate a time-dependent Hamiltonian, where the circuit for each time step "contains" the circuit (computation) associated with the previous time step as a prefix. The circuits generated by the TFIM domain generator grow linearly in size. In our experiments, we observed that after some initial time steps, all circuits for any late time step have an asymptotic constant depth. This observation led to the following experiment: we picked a circuit structure generated for a late simulation step and considered it as a parameterized template for all other simulation steps. We then successfully solved the numerical optimization problem with this template for any TFIM step. This procedure empirically provides us with a fixed-depth (short-depth) template for the TFIM algorithm. Furthermore, this demonstration motivated a successful effort [33] to derive from first principles a fixed-depth circuit for TFIM. The results are presented in Figure 7. Note the highly increased fidelity when running the circuit on the IBM Athens system.
The QITE algorithm presents an interesting challenge to the prefix formation idea. In this case, the next timestep circuit is obtained by extending the "current" circuit with a block dependent on its output after execution. When executing on hardware, synthesis has real-time constraints, and it has to deal with the hardware noise that affects the output. Preliminary results, courtesy of our collaborators Jean-Loup Ville and Alexis Morvan, indicate that the approach taken for TFIM may be successful for QITE. Table 10 summarizes the preliminary observations and indicates that again synthesis produces better-quality circuits than the domain generator or traditional compilation does. Note that in this experiment LEAP was fast enough to produce real-time results during the hardware experiment only for three-qubit circuits.

Table 10. Summary of QITE results when running synthesis on hardware experiments. The structure of any circuit is determined by the output of the previous circuit, hence by hardware noise. Entries are CNOT counts.

QITE size  Qiskit Isometry  QFAST  LEAP
2  3  3  3
3  30-35  10-12  7-12
4  160-200  70-80  30-50

Fig. 8. The effective branching factor of various circuits synthesized by LEAP. The entries are sorted by qubit count, then by circuit length.

9.2 Analysis of Runtime Scaling as a Synthesis Application
Like other synthesis algorithms (that operate on the complete unitary as input), LEAP scales exponentially with the number of qubits. To evaluate the complexity of LEAP's algorithm and compare its runtime to that of other synthesis algorithms, we calculated the effective branching factor of the search in LEAP. The effective branching factor equation, from [34], is given by

$N + 1 = 1 + b^* + (b^*)^2 + \dots + (b^*)^d$    (2)

where d is the depth of the search.
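Equation (2) has no closed-form solution for the branching factor in general, but the right-hand side grows monotonically with it, so it can be solved numerically. The following is a minimal sketch using bisection, assuming that N denotes the number of nodes generated by the search (the usual textbook reading of this formula); the function name and the fixed iteration count are illustrative choices.

```python
def effective_branching_factor(nodes, depth, iterations=200):
    """Solve N + 1 = 1 + b + b^2 + ... + b^d for b by bisection."""
    if depth < 1 or nodes < 0:
        raise ValueError("depth must be >= 1 and nodes >= 0")

    def tree_size(b):
        # Number of nodes in a uniform tree of the given depth, root included.
        return sum(b ** k for k in range(depth + 1))

    lo = 0.0                                       # tree_size(0) = 1 <= N + 1
    hi = (nodes + 1.0) ** (1.0 / depth) + 1.0      # hi^d alone already exceeds N + 1
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if tree_size(mid) < nodes + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


b_uniform = effective_branching_factor(nodes=14, depth=3)   # 1 + 2 + 4 + 8 = 15
b_chain = effective_branching_factor(nodes=3, depth=3)      # 1 + 1 + 1 + 1 = 4
print(round(b_uniform, 6), round(b_chain, 6))
```

A value close to 1 means the search behaves almost like a single chain of nodes, which is the regime in which pruning is doing most of the work.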
We evaluated the effective branching factor of LEAP’s search on multiple circuits, presented in Figure 8. The entries are sorted by qubit count, then by circuit length. As can be seen, there appears to be a negative correlation between the final depth of the solution and the effective branching factor, indicating that LEAP’s pruning is effective at shrinking the search space. The best effective branching factor evaluated was for TFIM-5-80, with $b^* = 1.08$. The worst effective branching factor was for QFT 3, with $b^* = 1.41$. The average effective branching factor was $b^* = 1.22$.

Therefore the overall runtime complexity of LEAP is approximately $O(2^{3n} \cdot d \cdot (b^*)^d)$, which we can rewrite as $O(2^{3n + \log_2 d + d \log_2 b^*})$ by grouping to a common base. Since $\log_2 d \le d \log_2 b^*$, we can further simplify to the complexity $O(2^{3n + d \log_2 b^*})$, where $n$ is the number of qubits, $b^* = 1.22$ is the average effective branching factor, and $d = \lceil \frac{1}{4}(4^n - 3n - 1) \rceil$ is the minimum number of layers built by CNOTs plus one-qubit gates needed to produce an arbitrary $n$-qubit gate according to [35]. Note that LEAP generally finds a shorter solution when possible, so the complexity given is an upper bound on what we expect from LEAP’s performance.

UniversalQ, another synthesis algorithm, has runtime complexity $O(2^{3n})$ [10] for decomposing isometries, which is dominated by matrix operations similar to what LEAP requires for simulation. However, LEAP also must search over permutations, which adds the extra factor to the complexity.

9.3 Application to General Circuit Optimization

LEAP, like QSearch, synthesizes the whole unitary. Thus the complexity of the algorithm is exponential in the number of qubits. However, as seen in the past two examples on Hamiltonian simulation, LEAP has already seen use in synthesizing smaller circuits where the impact of the exponential scaling is not felt.
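To make the exponential scaling concrete, the runtime estimate from Section 9.2 can be evaluated for small qubit counts. The sketch below computes the CNOT lower bound of [35] and the base-2 exponent 3n + d log2(b*) of the simplified complexity; the function names are hypothetical, and the default of 1.22 is simply the average effective branching factor reported in the text. The result is an order-of-magnitude exponent, not a prediction of wall-clock time.

```python
import math


def cnot_lower_bound(num_qubits):
    """ceil((4^n - 3n - 1) / 4), computed in exact integer arithmetic."""
    numerator = 4 ** num_qubits - 3 * num_qubits - 1
    return -(-numerator // 4)  # ceiling division


def log2_runtime_estimate(num_qubits, branching=1.22):
    """Exponent of the simplified bound O(2^(3n + d * log2(b*)))."""
    depth = cnot_lower_bound(num_qubits)
    return 3 * num_qubits + depth * math.log2(branching)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, cnot_lower_bound(n), round(log2_runtime_estimate(n), 2))
```

Because d itself grows as roughly 4^n / 4, the exponent is dominated by the search term for all but the smallest n, which is consistent with the observation that LEAP is most useful on small circuits or small blocks.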
For synthesizing larger circuits, LEAP can be an important building block for partitioning-based circuit optimizers. The runtime of partitioning-based circuit optimization scales with the cost of synthesizing each block and linearly with the number of blocks. Therefore, if the block size is kept small, the dominant scaling factor is the number of blocks to synthesize, and the blocks can be synthesized in parallel. For such partitioned circuits, high-quality block optimization is vital to the quality of the overall optimization. LEAP therefore provides a compelling block synthesizer for partitioners, as it provides high-quality results for smaller block sizes.

Looking further forward, the question remains whether numerical-optimization-based synthesis can be useful in fault-tolerant quantum computing. There, the single-qubit gates will change to Cliffords and the T gate, or another non-Clifford gate that makes the gate set universal. The execution cost model is also expected to be different: CNOTs and Cliffords become cheap, while the non-Clifford operations become expensive. Likely, the non-Cliffords are qualitatively more “expensive” than CNOTs are in NISQ computing. Thus, the optimization objective becomes minimizing the number of non-Clifford gates. We have already shown that LEAP can be retargeted to new gate sets. We also have very strong evidence that adding a multi-objective optimization approach to search-based synthesis works very well under a fault-tolerant quantum computing cost model. The data indicates that it is realistic to expect efficacy improvements similar to those provided by LEAP under the NISQ cost model. This work is ongoing (and due to intellectual property concerns, we cannot disclose more details). As the already mentioned scalable partitioning approaches only leverage LEAP and do not require additional cost models, this bodes very well for the future of numerical-optimization-based synthesis in fault-tolerant quantum computing.
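The shift in cost model described above can be sketched with two simple gate-counting objectives. The gate labels, the flat list representation of a circuit, and the function names are assumptions made for illustration only; they do not reflect LEAP's internal data structures or its undisclosed multi-objective approach.

```python
# Hypothetical gate labels; "T" and "TDG" stand for the T gate and its inverse.
NON_CLIFFORD = {"T", "TDG"}


def nisq_cost(circuit):
    """NISQ-style objective: count two-qubit CNOT gates."""
    return sum(1 for gate in circuit if gate == "CNOT")


def fault_tolerant_cost(circuit):
    """Fault-tolerant-style objective: count non-Clifford gates only."""
    return sum(1 for gate in circuit if gate in NON_CLIFFORD)


if __name__ == "__main__":
    example = ["H", "T", "CNOT", "T", "TDG", "CNOT", "S", "T"]
    print(nisq_cost(example), fault_tolerant_cost(example))
```

Under the first objective the example circuit is judged by its two CNOTs; under the second it is judged by its four T-type gates, so a search guided by one objective may prefer a different circuit than a search guided by the other.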
10 RELATED WORK

A fundamental result that spurred the emergence of quantum circuit synthesis is provided by the Solovay–Kitaev (SK) theorem. The theorem relates circuit depth to the quality of the approximation, and its proof is by construction [36–38]. Different approaches [36, 39–48] to synthesis have been introduced since, with the goal of generating shorter-depth circuits. These can be coarsely classified based on several criteria: target gate set, algorithmic approach, and solution distinguishability.

Target Gate Set: The SK algorithm is applicable to any universal gate set. Later examples include synthesis of z-rotation unitaries with Clifford+V approximation [49] or Clifford+T gates [50]. When ancillary qubits are allowed, one can synthesize single-qubit unitaries with the Clifford+T gate set [50–52]. While these efforts propelled the field of synthesis, they are not used on NISQ devices, which offer a different gate set ($R_x$, $R_z$, CNOT, iSWAP, and Mølmer–Sørensen all-to-all). Several other algorithms [1–3], discussed below, have since emerged.

Algorithmic Approaches: The early attempts inspired by the Solovay–Kitaev algorithm use a recursive (or divide-and-conquer) formulation, sometimes supplemented with search heuristics at the bottom. More recent search-based approaches are illustrated by the meet-in-the-middle [41] algorithm.

Several approaches use techniques from linear algebra for unitary and tensor decomposition. Bullock and Markov [44] use QR matrix factorization via a Givens rotation and Householder transformation [45], but open questions remain as to the suitability for hardware implementation because these algorithms are expressed in terms of row and column updates of a matrix rather than in terms of qubits.

The state-of-the-art upper bounds on circuit depth are provided by techniques [1, 2] that use cosine-sine decomposition. The cosine-sine decomposition was first used in [53] for compilation purposes. In practice, commercial compilers ubiquitously deploy only KAK [5] decompositions for 2-qubit unitaries. The basic formulation of these techniques is topology independent. Specializing for topology increases the upper bound on circuit depth by large constants; Shende et al. [2] mention a factor of 9, improved by Iten et al. [1] to 4×. The published approaches are hard to extend to different qubit gate sets, however, and it remains to be seen whether they can handle qutrits.

Several techniques use numerical optimization, much as we did. They describe the gates in their variational/continuous representation and use optimizers and search to find a gate decomposition and instantiation. The work closest to ours is that of Martinez et al. [3], who use numerical optimization and brute-force search to synthesize circuits for a processor using trapped-ion qubits. Their main advantage is the existence of all-to-all Mølmer–Sørensen gates, which allow a topology-independent approach. The main difference between our work and theirs is that they use randomization and genetic algorithms to search the solution space, while we show a more regimented way. When Martinez et al. describe their results, they claim that Mølmer–Sørensen counts are directly comparable to CNOT counts. By this metric, we seem to generate circuits comparable to or shorter than theirs. It is not clear how their approach behaves when topology constraints are present. The direct comparison is further limited by the fact that they consider only randomly generated unitaries, rather than algorithms or well-understood gates such as Toffoli or Fredkin.

Another topology-independent numerical optimization technique is presented in [4]. The main contribution is to use a quantum annealer to do searches over sequences of increasing gate depth. The authors report results only for two-qubit circuits.

All existing studies focus on the quality of the solution, rather than synthesis speed. They also report results for low-qubit concurrency: Khatri et al. [4] for two-qubit systems, Martinez et al. [3] for systems up to four qubits.

Solution Distinguishability: Synthesis algorithms can be classified as exact or approximate based on distinguishability. This is a subtle classification criterion, since many algorithms can be viewed as either. For example, the divide-and-conquer algorithm Meet-in-the-Middle proposed in [41], although designed for exact circuit synthesis, may also be used to construct an ϵ-approximate circuit. The results seem to indicate that the algorithm failed to synthesize a three-qubit QFT circuit. We classify our implementation as approximate since we rely on numerical optimization and therefore must accept solutions at a small distance from the original unitary.

11 CONCLUSION

In this paper we describe the LEAP compiler and modifications to a search and numerical-optimization-based synthesis algorithm. The results indicate that we can empirically provide optimal-depth circuits in a topology-aware manner for programs up to six qubits. The techniques employed (prefix formation, incremental re-synthesis, dimensionality reduction, and multistart optimization) can be easily generalized to other algorithms from this class. We believe LEAP provides the best-quality optimizer currently available for circuits up to six qubits on NISQ hardware. Furthermore, LEAP is the linchpin in our scalable synthesis algorithms (QGo [31], QuEst [32]) using circuit partitioning techniques. With these algorithms, we have demonstrated the synthesis of circuits up to hundreds of qubits. LEAP has been released as part of the BQSkit (Berkeley Quantum Synthesis Toolkit) infrastructure.
[54] describes a method using Givens rotations and Householder decomposition.

ACKNOWLEDGMENTS

This work was supported by the Advanced Quantum Testbed program and the Quantum Algorithm Teams program of the Advanced Scientific Computing Research for Basic Energy Sciences program, Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and DE-AC02-06CH11357.

REFERENCES

[1] Raban Iten, Roger Colbeck, Ivan Kukuljan, Jonathan Home, and Matthias Christandl. Quantum circuits for isometries. Physical Review A, 93:032318, Mar 2016.
[2] V. V. Shende, S. S. Bullock, and I. L. Markov. Synthesis of quantum-logic circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6):1000–1010, June 2006.
[3] E. Martinez, T. Monz, D. Nigg, P. Schindler, and R. Blatt. Compiling quantum algorithms for architectures with multi-qubit gates. ArXiv e-prints, July 2016.
[4] Sumeet Khatri, Ryan LaRose, Alexander Poremba, Lukasz Cincio, Andrew T. Sornborger, and Patrick J. Coles. Quantum-assisted quantum compiling. arXiv e-prints, page arXiv:1807.00800, Jul 2018.
[5] Robert R. Tucci. An Introduction to Cartan’s KAK Decomposition for QC Programmers. arXiv e-prints, pages quant-ph/0507171, Jul
[6] M. G. Davis, E. Smith, A. Tudor, K. Sen, I. Siddiqi, and C. Iancu. Towards optimal topology aware quantum circuit synthesis. In 2020 IEEE International Conference on Quantum Computing and Engineering (QCE), pages 223–234, 2020.
[7] Ed Younis, Koushik Sen, Katherine Yelick, and Costin Iancu. QFAST: Quantum synthesis using a hierarchical continuous circuit space,
[8] IBM Qiskit. Available at https://qiskit.org/.
[9] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology, 6(1):014003, Nov 2020.
[10] Raban Iten, Oliver Reardon-Smith, Luca Mondada, Ethan Redmond, Ravjot Singh Kohli, and Roger Colbeck. Introduction to UniversalQCompiler. arXiv e-prints, page arXiv:1904.01072, Apr 2019.
[11] Victor Namias. The fractional order Fourier transform and its application to quantum mechanics. IMA Journal of Applied Mathematics, 25(3):241–265, 03 1980.
[12] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, July 1968.
[13] Jeffrey Larson and Stefan M. Wild. Asynchronously parallel optimization solver for finding multiple minima. Mathematical Programming Computation, 10(3), 2 2018.
[14] Sergey Bravyi, David Gosset, and Robert König. Quantum advantage with shallow circuits. Science, 362(6412):308–311, 2018.
[15] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics, 18(2):23023, 2016.
[16] Dongbin Shin, Hannes Hübener, Umberto De Giovannini, Hosub Jin, Angel Rubio, and Noejung Park. Phonon-driven spin-Floquet magneto-valleytronics in MoS2. Nature Communications, 9(1):638, 2018.
[17] Lindsay Bassman, Kuang Liu, Aravind Krishnamoorthy, Thomas Linker, Yifan Geng, Daniel Shebib, Shogo Fukushima, Fuyuki Shimojo, Rajiv K Kalia, Aiichiro Nakano, et al. Towards simulation of the dynamics of materials on quantum computers. Physical Review B, 101(18):184305, 2020.
[18] Mario Motta, Chong Sun, Adrian T. K. Tan, Matthew J. O’Rourke, Erika Ye, Austin J. Minnich, Fernando G. S. L. Brandão, and Garnet Kin-Lic Chan. Determining eigenstates and thermal states on a quantum computer using quantum imaginary time evolution. Nature Physics, 16(2):205–210, 2020.
[19] D Deutsch. Quantum computational networks. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 425(1868):73–90, 09 1989.
[20] Vadym Kliuchnikov, Alex Bocharov, and Krysta M. Svore. Asymptotically optimal topological quantum compiling. Physical Review Letters, 112:140504, Apr 2014.
[21] Nikolaus Hansen. The CMA evolution strategy: A tutorial. CoRR, abs/1604.00772, 2016.
[22] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.
[23] Sameer Agarwal, Keir Mierle, and Others. Ceres solver. http://ceres-solver.org.
[24] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10:66–71, 2009.
[25] Vladislav Sovrasov. Comparison of dimensionality reduction schemes for derivative-free global optimization algorithms. Procedia Computer Science, 136:136–143, 2018. 7th International Young Scientists Conference on Computational Science, YSC2018, 02-06 July 2018, Heraklion, Greece.
[26] Stephen Hudson, Jeffrey Larson, Stefan M. Wild, David Bindel, and John-Luke Navarro. libEnsemble users manual. Technical Report Revision 0.7.2, Argonne National Laboratory, 2021.
[27] Stephen Hudson, Jeffrey Larson, John-Luke Navarro, and Stefan Wild. libEnsemble: A library to coordinate the concurrent evaluation of dynamic ensembles of calculations. IEEE Transactions on Parallel and Distributed Systems, 2021.
[28] Ed Younis, Koushik Sen, Katherine Yelick, and Costin Iancu. QFAST: Conflating search and numerical optimization for scalable quantum circuit synthesis, 2021.
[29] Heng Fan, Vwani Roychowdhury, and Thomas Szkopek. Optimal two-qubit quantum circuits using exchange interactions. Physical Review A, 72(5), Nov 2005.
[30] Tycho Sleator and Harald Weinfurter. Realizable universal quantum logic gates. Phys. Rev. Lett., 74:4087–4090, May 1995.
[31] Xin-Chuan Wu, Marc Grau Davis, Frederic T. Chong, and Costin Iancu.
Optimizing noisy-intermediate scale quantum circuits: A block-based synthesis, 2020.
[32] Tirthak Patel, Ed Younis, Costin Iancu, Wibe de Jong, and Devesh Tiwari. Robust and resource-efficient quantum circuit approximation,
[33] Lindsay Bassman, Roel Van Beeumen, Ed Younis, Ethan Smith, Costin Iancu, and Wibe A. de Jong. Constant-depth circuits for dynamic simulations of materials on quantum computers, 2021.
[34] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, 4th Edition. Pearson, Hoboken, NJ, 2021.
[35] Vivek V. Shende, Igor L. Markov, and Stephen S. Bullock. Minimal universal two-qubit controlled-NOT-based circuits. Physical Review A, 69(6), Jun 2004.
[36] C. M. Dawson and M. A. Nielsen. The Solovay-Kitaev algorithm. Quant. Info. Comput., 6(1):81–95, 2005.
[37] A. B. Nagy. On an implementation of the Solovay-Kitaev algorithm. arXiv:quant-ph/0606077, 2016.
[38] O. Al-Ta’Ani. Quantum Circuit Synthesis using Solovay-Kitaev Algorithm and Optimization Techniques. PhD thesis, Kansas State University, 2015.
[39] A. De Vos and S. De Baerdemacker. Block-ZXZ synthesis of an arbitrary quantum circuit. Phys. Rev. A, 94:052317, Nov 2016.
[40] Alex Bocharov and Krysta M. Svore. Resource-optimal single-qubit quantum circuits. Phys. Rev. Lett., 109:190501, Nov 2012.
[41] Matthew Amy, Dmitri Maslov, Michele Mosca, and Martin Roetteler. A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. Trans. Comp.-Aided Des. Integ. Cir. Sys., 32(6):818–830, June 2013.
[42] Esteban A Martinez, Thomas Monz, Daniel Nigg, Philipp Schindler, and Rainer Blatt. Compiling quantum algorithms for architectures with multi-qubit gates. New Journal of Physics, 18(6):063029, 2016.
[43] B. Giles and P. Selinger. Exact synthesis of multiqubit Clifford+T circuits. Physical Review Letters, 87(3):032332, March 2013.
[44] S. S. Bullock and I. L. Markov. An arbitrary two-qubit computation in 23 elementary gates or less. In Proceedings 2003.
Design Automation Conference (IEEE Cat. No.03CH37451), pages 324–329, June 2003.
[45] J. Urias. Householder factorization of unitary matrices. J. Mathematical Physics, 51:072204, 2010.
[46] Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm, and Martti M. Salomaa. Quantum circuits for general multiqubit gates. Phys. Rev. Lett., 93:130502, Sep 2004.
[47] M. Amy and M. Mosca. T-count optimization and Reed-Muller codes. arXiv:1601.07363v1, 2016.
[48] G. Seroussi and A. Lempel. Factorization of symmetric matrices and trace-orthogonal bases in finite fields. SIAM Journal on Computing, 9(4):758–767, 1980.
[49] Neil J. Ross. Optimal ancilla-free Clifford+V approximation of Z-rotations. Quantum Info. Comput., 15(11-12):932–950, September 2015.
[50] V. Kliuchnikov, D. Maslov, and M. Mosca. Practical approximation of single-qubit unitaries by single-qubit quantum Clifford and T circuits. IEEE Transactions on Computers, 65(1):161–172, Jan 2016.
[51] A. Kitaev, A. Shen, and M. Vyalyi. Classical and Quantum Computation. American Mathematical Society, Boston, MA, 2012.
[52] Adam Paetznick and Krysta M. Svore. Repeat-until-success: Non-deterministic decomposition of single-qubit unitaries. Quantum Info. Comput., 14(15-16):1277–1301, November 2014.
[53] Robert R. Tucci. A rudimentary quantum compiler. arXiv e-prints, pages quant-ph/9902062, Feb 1999.
[54] Nikolay Vitanov. Synthesis of arbitrary SU(3) transformations of atomic qutrits. Phys. Rev. A, 85, 03 2012.
ACM Transactions on Quantum Computing – Association for Computing Machinery
Published: Feb 10, 2023
Keywords: Gate-based quantum computing