Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Machine Learning Optimization of Quantum Circuit Layouts

Machine Learning Optimization of Quantum Circuit Layouts Machine Learning Optimization of uantum Circuit Layouts ALEXANDRU PALER, Aalto University, Finland, University of Texas at Dallas, USA, and Transilvania University of Bras ov, Romania LUCIAN M. SASU, Transilvania University of ov,Bras Romania ADRIAN-CĂTĂLIN FLOREA, Transilvania University of ov,Bras Romania RĂZVAN ANDONIE, Central Washington University, USA and Transilvania University sov, Romania of Bra The quantum circuit layout (QCL) problem is to map a quantum circuit such that the constraints of the device are satisied. We introduce a quantum circuit mapping heuristic, QXX, and its machine learning version, QXX-MLP. The latter infers automatically the optimal QXX parameter values such that the layed out circuit has a reduced depth. In order to speed up circuit compilation, before laying the circuits out, we are using a Gaussian function to estimate the depth of the compiled circuits. This Gaussian also informs the compiler about the circuit region that inluences most the resulting circuit’s depth. We present empiric evidence for the feasibility of learning the layout method using approximation. QXX and QXX-MLP open the path to feasible large scale QCL methods. CCS Concepts: · Hardware → Quantum computation; Software tools for EDA; · Software and its engineering → Compilers. 1 INTRODUCTION The quantum circuit layout problem is deeply related to the topology of the device used to execute the circuit: instructions cannot be applied between arbitrary hardware registers. Before executing a quantum circuit, this is adapted to the device’s register connectivity during a procedurecompilation called . Quantum circuit compilation is often called quantum circuit layout (QCL). The interest in eicient QCL methods is motivated by the current generation of quantum devices, called NISQ devices 29]. V[ery recent work presents worrisome evidence that even very small and shallow circuits are diicult to execute on NISQ [37]. NISQ circuit compilation includes, for example, error-mitigation 11],strategies lag-qubits [ 8[] and not just QCL methods, but the latter play deinitely an signiicant role in the compilation of large scale fully error-corrected circuits. However, scalable compilation is not possible with current state of the art QCL methods. Fast and scalable QCL allows laying out a circuit with multiple QCL parameter values and selecting the best compiled circuit. Without going into details, our preliminary analysis showed that at the time of writing this manuscript, when optimising aggressively, compilation can take up to 1 hour for most QCL methods when presented circuits of approximately 50 qubits. It is imperative to have eicient and conigurable QCL methods. We focus on three research questions: I. How can we determine the best QCL parameter values to minimize the depth of the compiled output circuit? Authors’ addresses: Alexandru Paler, Aalto University, Finland and University of Texas at Dallas, USA and Transilvania Univ sov,ersity of Bra Romania; Lucian M. Sasu, Transilvania University sov,ofRomania; Bra Adrian-Cătălin Florea, Transilvania University sov, Romania; of Bra , , Răzvan Andonie, Central Washington University, USA and Transilvania University ov, Romania. of Bras Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proit or commercial advantage and that copies bear this notice and the full citation on the irst page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speciic permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. 2643-6817/2022/9-ART $15.00 https://doi.org/10.1145/3565271 ACM Trans. Quantum Comput. 2 • Paler et al. Fig. 1. Continuous learning QCL: It is possible to learn our QCL method, which is called QXX (yellow), and replace it with a machine learning model (e.g. neural network - green). Optimal parameter values (blue) for adapting QXX performance are chosen by automatically executing weighted random search (WRS) in a loop. II. Choosing good parameter values for the QCL method should be very fast. How can we establish a time- performance trade-of between searching optimal parameter values and the minimization of the depth? III. Can QCL be sped up by machine learning? More formally, QCL takes as input a cir�cuit incompatible with a device’s (can be error-corrected) register �� connectivity and outputs a circuit � that is compatible Additional . gates are used to overcome the connectivity ��� limitations and compile a device-compatible � cir . Wcuit e deine the�����function between the depths of two ��� circuits, wher|�e| > 0 is the depth of circuit � . We formalize the QCL problem as follo QCL ws:optimization is the minimization of the�����function. |� | ��� �����(� ,� ) = (1) �� ��� |� | �� 1.1 Related Work QCL is already a wide topic, and we will not be providing a thorough exposition of the ield. We refer the reader to the works of [9, 31, 32, 34, 35] for detailed and careful overviews of some the works that inluenced and shaped QCL. Some of the irst discussions about circuit optimality and gate counts appeared in the seminal 4]. paper of [ After quantum circuits started being analysed as reversible circuits formed from Tofoli gates, a large number of exact methods and heuristics was proposed ś a not so recent but complete review30 is].[The complexity class of QCL was discussed irst in 19][ and this has been used as a foundation for proving the complexity of diferent QCL variations such as [7, 32, 34]. In general, there are heuristic QCL methods and exact QCL methods. A recent exact method is the one from [34] (the paper includes a discussion about bottlenecks of exact methods). Automatic quantum circuit compilation does not always generate the best possible circuit. One of the irst attempts to design full algorithm circuits considering hardware connectivity limitations 14]. Another is [ recent example of a hand optimised circuit 26],is [ where, interestingly, the optimisation was achieved by using a results known from the automatic optimisation of circuits. Most quantum circuit design automation tools treat QCL as a sequence of steps like initial placement and gate scheduling. The authors of31[] have a complete discussion of the theoretical implications of placement and scheduling. In practice, QCL has been solved by introducing SWAP gates, but there exist more reined methods like23[]. Solving QCL through search algorithms has been recently presented24in ] (b[eam search) and [41] (A*). The work of [31] introduced a parameterisable search algorithm for QCL, Bounded Mapping Tree. In this work, we focus on QCL as an instance of register allocation as introduce 32] dand byuse [ the by now classic graph view of the quantum device layout. Recently, graph-based QCL approaches (motivate 19]) d by [ ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 3 Fig. 2. The quantum circuit compilation (QCL) procedure takes an input � and circuit transforms it into the functionally �� equivalent � circuit. The QXX method computes an optimal assignment of circuit qubits (registers) to device qubits ��� (registers). seem to achieve the necessary performance for compiling very large circuits to complicated device topologies. Some QCL approaches were proposed using hyper-graphs [3], but more classical approaches are the ones from [10]. Machine learning has started being widely applied in diferent aspects of quantum computing. Nevertheless, compared to [40] we are not compiling only single qubit gates after needing impractical numbers of hours to train a very large network. In parallel and independent to the preparation of this manuscript, machine learning methods for QCL have been presented in28[]. Compared to [28], our method is capable of learning continuously. QCL includes two steps31[]. The irst step is initial placement (e.g., [25]), where a mapping of the circuit qubits to device registers is computed. This step is also called qubit 32 allo ]. The cation second [ QCL step isgate scheduling, where the circuit � is traversed gate-by-gate. �� 1.2 Contribution We present machine learning method (cf. Fig. 1) for the initial placement (i.e. mapping) of quantum circuits. In order to test the feasibility of learning, we develop an initial placement heuristic, which we called QXX for no particular reason. From a technical perspective, our methods are designed and implemented into great detail and capable of being deployed in practice. The machine learning model of QXX is called QXX-MLP, which is the signiicantly faster implementation of the QXX. At the date of writing this manuscript, our method was the irst to efectively learn a QCL heuristic. QXX is conigurable over multiple parameters, and our goal is to have a method that can automatically chose the best parameter values in order to achieve optimally compiled circuits. The novelties of QXX and QXX-MLP (Section 2) are: (1) automatic feature selection ś focusing on the important sub-circuit using a conigurable Gaussian function (Section 2.3); (2) automatic QCL coniguration ś we use weighted random search (Section 2.5) to optimize QXX parameter values. The parameters inluence also the speed of the QCL; (3) demonstrating QCL learnability ś the machine learning version QXX-MLP works as an approximation method for the circuit layout depth (Section 2.6); (4) scalability ś the almost instantaneous layout performance (Sections 3.4 and 3.2 and Table 3). We show that QCL execution time can be shortened while maintaining ����the �optimality. The goal of this work is to highlight the generality and wide applicability of learning QCL methods. For this reason, we focus on benchmarking circuits which are compatible with both non-error-corrected and error- corrected machines. We use synthetic benchmarking circuits, QUEK 35] (Se O[ction 3.1), which capture the properties of Tofoli+H (e.g. arithmetic, quantum chemistry, quantum inance etc.) circuits without being speciic ACM Trans. Quantum Comput. 4 • Paler et al. for a particular application. Section 3 presents and discusses experimental results performed with the QXX and QXX-MLP approaches. Conclusions are synthesized in Section 4. 2 METHODS We present the QXX method and describe the machine learning techniques that are using it. Subsection 2.1 gives the details of QXX. Fig. 1 illustrates the approach followed by this work during the QCL parameter optimization stage: The parameters of the normal QXX method (orange) are optimized using WRS. To further speed up QXX we train a machine learning model, called QXX-MLP, to predict ����the �values obtained by using the QXX for a given circuit and a particular set of parameter values. We employ three methods to evaluate how the parameter optimization of QXX inluences the compiled circuits. First, an eicient parameter optimizer is Weighted Random Search (WRS, Subsection 2.5). Second, we use exhaustive search to collect training data to obtain the QXX-MLP neural network (Subsection 2.6). The latter is used to estimate optimal parameter values. The WRS method starts from an initial set of parameter values and adapts the values in order to minimize the obtained�����. This procedure forms a feedback loop between WRS and the QXX method. Running WRS multiple times for a set of benchmark circuits is equivalent almost exe tocuting an exhaustive search of the parameter space. The exhaustive search data is collected and used to train QXX-MLP. From a methodological point of view, device variabilities are very important and, moreover, these seem to have a luctuating behavior 37[]. However, in this work we do not consider that NISQ qubits have variable idelities [36], or that crosstalk is a concern during NISQ compilation [22]. We will use the following notation. The irst QCL step computes � ,awher liste � [�] = � refers to circuit qubit� being stored on device register �. Computing the list � is the analogue to determining a good starting point for the gate scheduling procedure. We represent two-qubit gates by the tuples (� , � ). Scheduling executes � � the current two-qubit gate if (� [� ], � [� ]) is an edge of device connectivity graph, which willDEVICE be calle . d � � Otherwise, the gate qubits are moved across the device and stored in registers connected by an an edge from ��� ��� DEVICE. The movement introduces additional gates, such as SWAP gates, in order for all(�tuples , � ) to be � � edges ofDEVICE. The mapping is updated accordingly, as illustrated in Fig. 7. In general, the � depth is of ��� lower bounded by|� |. �� 2.1 The QXX Mapping Heuristic QXX is a fast search algorithm to determine a qubit mapping (allocation), and is called by the subsequent gate scheduler to compute a good qubit allocation/mapping/placement. QXX uses an estimation function to predict how a � with minimum depth would have to be initially mapped. ��� A novelty is that QXX uses a Gaussian-like function�calle ����ℎdto estimate the resulting depth (cost) of the layed out circuit calle � d. The qubit mapping is found using the minimum estimated��value ���ℎ.of ��� QXX uses three types of parameters, which we will explain in the following three sections: (1) for coniguring the search space; (2) for adapting �����ℎ to the circuit � ; �� (3) for adapting QXX to the second step of QCL, namely the scheduler/router of gates. 2.2 Search Space Configuration QXX is a combination of breadth-irst search and beam search. The search space is a tree (cf. Fig. 3). Constructing a qubit-to-register mapping is an iterative approach: qubits are selected one after the other, and so are the registers where the qubits mapped initially. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 5 Fig. 3. The search space dimensions of QXX can be configured by adapting the parameters MaxDepth (green) and MaxChildren (red). Each node of the search tree stores a list of possible mappings for which a minimum cost was computed. The maximum length of the listMaxChildr is en (e.g., 3 blue levels). The list is emptied if a lower minimum cost is computeMaxDepth d, or if has been achieved. In the later case, the path with the minimum cost (green) is kept, and all other nodes are removed. For example, consider � = {(� , � ), (� , � ), (� , � )} a circuit of three CNOTs and a mapping � = [� , � ]. �� 1 2 2 3 3 4 1 2 This means that� is allocated to register � , and � to � . After the irst qubit was mapped, we hav|�e | = 1. 1 1 2 2 After all qubits were mapped, we hav |�e| = � . The maximum depth of the tree is � . In the worst case, each node has � children. The tree is augmented one step at a time, by adding a new circuit�qubit to the mapping (in the order of their index, in the current version - we do not analyze the inluence of this choice). This increases the tree’s depth: at each existing leaf node, all possible � mappings of � are considered. Consequently, all the new leaves of a tree are the result of appending � to the previous’ level leaves, which now are usual nodes. New depth estimation values are computed using�the ����ℎ function, each time leaves are added to the tree. Each tree node has an associated�����ℎ cost. The level in the tree equals the length of the mapping for which the cost was computed. The search is stopped after computing a complete mapping with the minimum �����ℎ cost. Thus, the maximum number of leaves per node is added in the unlikely case that all values �����of ℎ are equal. The search space will easily explode for large circuits. We introduce two parameters to prune the search space. In Fig. 3, the result of pruning the search space tree is represented by the green path and the green bounding box. The irst parameter is ����ℎ������whose job is to limit the number of children of equal minimum �����ℎ values. For an arbitrary value �of ���ℎ������< � , the tree will include at le �at vel most �· ����ℎ������nodes. The second parameter is the cut-of threshold �������ℎ which speciies that, at levels indexed by multiples of�������ℎ, all the nodes are removed from the tree, except for the ancestors of the minimum cost leaf. This is because for large circuits (e.g. more than 50 qubits) it is not practical to evaluate all the new combinations � of registers for �· ����ℎ������leaves. 2.3 Configuration of Circuit Depth Estimation The �����ℎ function is used to estimate the depth of the compiled circuit without laying it out. The value of�����ℎ can be calculated once at least two qubits were mapped. Equivalently, the value of �����ℎ is computed only for CNOTs whose qubits were mapped. The �����ℎ function is a sum of a Gaussian functions whose goal is to model the importance of CNOT sub-circuits � . fr Inom other words, �����ℎ is the �� sum of the costs estimated for scheduling the CNOTs of a circuit: ACM Trans. Quantum Comput. 6 • Paler et al. � 2 �����ℎ = ����exp −� · − � (2) �=0 where � ≤ |� | In the above formula,� is the number of CNOTs from � whose qubits were already mapped. The � and � � �� parameters control the spread and position of the Gaussian function. The value �is the of index of the CNOT from the resulting circuit�and ���is the cost of moving the qubits of the CNOT. For example, for the circuit � = {(� , � ), (� , � ), (� , � )} �� 1 2 2 3 3 4 and the mapping� = [� , � ], � = 1 the �����ℎ can be computed for only the irst CNOT, because � and � 1 2 � 3 4 were not mapped. . The scheduled CNOTs are indexed, and we assume that all gates are sequential (parallel gates are executed sequentially, considering the circuit truly a gate list). For example, after scheduling two CNOTs of an hypothetical circuit, the two CNOTs are numbered 1 and 2. In the exponent �of ����ℎ the value of is always in the range [0, 1]. For ����we use the shortest distance on an undirected graph. An example is Fig. 7, wher ����e= 5. For two � � qubits�, �, where (� [�], � [�]) is already an edge of DEVICE, the ����= 1. Otherwise, the distance between two qubits is computed based on the edge weights attached to DEVICE the graph. If the intention is to allow CNOTs at the middle of the circuit to have longer movements on the device, we can set parameters to generate a function like in the top right panel of Fig. 5. For example � = 5,and for� = 0.5, the ����from�����ℎ are weighted with almost zero at the start of the circuit (Fig. 4). The opposite situation is illustrated in the middle panel. � =For 0, the Gaussian is efectively a constant function, such that the cost is the sum of all the CNOT distances. Fig. 4. A quantum circuit and a Gaussian function. a) A four qubit quantum circuit consisting of a sequence of CNOTs; b) The ����uses a Gaussian function. The later is drawn superimposed in order to highlight the weight generated by the Gaussian at each CNOT position in the circuit. The weights are minimal for the first and last CNOT. The maximum value is for the CNOT at the middle of the circuit. 2.4 Configuring QXX for the Scheduler QXX is efectively estimating the output of the gate scheduler,sele which cts the best edge ofDEVICE where to execute the CNOTs of a circuit. The scheduler is modeled as a black box and its functionality is unknown. QXX can work with Qiskit’s StochasticSwap, tket or other tools such as the ones ones from [18, 39]. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 7 Fig. 5. The value of the Gaussian use ��d�by ��ℎ for diferent values of parameters � and � . The horizontal axis illustrates the input value to the Gaussian: the gates’ integer index in the circuit is scaled with the total number of gates in the circuit. The importance (weight) of a gate is highest whenever the value of the Gaussian approaches the maximum value. A gate has low importance when the Gaussian has low values. Starting from the initial placement, QXX estimates the total depth of the circuit after repeatedly mapping the circuit. Without loss of generality, QXX assumes that the scheduler will move both qubits of a CNOT across the DEVICE towards the selected edge. Instead of updating the mapping, the movements of the qubits accumulate are d into anofset variable. Qubit movements is captured by the ��������������parameter. The ��������������is asymmetric, and when processing CNOT qubits, it moves on the DEVICE the qubit with the lowest index by the fraction 1 �������������� −1 , and the qubit with the highest index by . �������������� �������������� As we will show in the Results section, for deep circuits we determine ��������������values closer to 2 (favors high and low index qubits equally), and for shallow circuits the values are higher (favors low indexed qubits). After moving the qubits on the device (Fig. 6), the ofset of a qubit is an estimation of how much the qubit was moved by the scheduler. For example, the ofset of an arbitrary qubit used in 3 CNOTs is the sum of the three movement updates which are obtained after scaling each CNO ��T’s ��with the corresponding �������������� expression. Fig. 6. Device connectivity graph is blue, the weight of all but one edge is 1. For example, the high cost might be due to higher physical error-rates. The movement of qubits on the device is controlled by an additional parameter ��,�calle ����d�: higher values imply a larger estimation of the movement heuristic. Changing the��value �����of �is equivalent to scaling the total value of �����ℎ, because ��������is a common factor in the calculation ���of �. For the weight scale factor��������= 1, as shown in Fig. 7, the minimum distance between the CNOT qubits corresponds to the sum of the edge weights separating them (ive in Fig. 7). ACM Trans. Quantum Comput. 8 • Paler et al. Fig. 7. The efect of the MovementFactor: Device connectivity graph is blue, and the two brown qubits A and B have to be interacted. If not specified, all edges have weight 1. The shortest path between A and B has cost L=5, because there are five edges between A and B. There are multiple options how A and B can be brought together. One of the options is to assume that A and B move to the pink C and D. Assuming that CD is at the middle of the path connecting A and B, then MovementFactor=2 because both A and B are moved on average L/2. A higher movement factor implies that one of the qubits moves less, while the other more. QXX Parameters WRS Obtained Exhaustive Search Name Min Max Increment WRS-Weight Prob. of Change Start Stop Increment MaxDepth 1 55 1 9.35 0.62 1 9 4 MaxChildren 1 55 1 8.00 0.53 1 9 4 B 0 500 0.1 7.76 0.52 0 20 2 C 0 1 0.01 15.06 1.00 0 1 0.25 MovementFactor 1 55 1 3.52 0.23 2 10 4 EdgeCost 0.1 1 0.1 10.59 0.70 0.2 1 0.4 Table 1. Parameter values. The QXX Parameters columns represent the min. and max. value together with the increments be to used with the heuristic.WRS TheObtained columns illustrate the results of the weighted random search method for optimal parameter values. Ranges and increments of QXX’s input parameters. The training of QXX-MLP as well as WRS were performed for a smaller search space whose parameter ranges are presented in the Exhaustive Search columns. 2.5 Weighted Random Search for Parameter Configuration The QXX parameters from Sections 2.2, 2.3, 2.4 have to be tuned for optimal performance of the compilation. We calltrial the evaluation for one set of parameter values. In previous w13 ork ], w [ e introduced the WRS method, a combination of Random Search (RS) and probabilistic greedy heuristic. Instead of a blind RS search, the WRS method uses information from previous trials to guide the search process toward next interesting trials. We use WRS to optimize the following QXX parameters (introduced in Subsection �����2.1): ��ℎ, ����ℎ������, �, � , �������������,�and ��������(see Table 1). Within the same number of trials, diferent optimization methods achieve diferent scores, depending on how łsmartž they are. Due to the nature of QXX, the trial execution times are variable, since it depends on the deined quantum architecture, the topology of the circuit to lay out, as well as the values of the parameters (cf. Fig. 3). Search space reduction and search strategy are inter-connected. For the exhaustive search parameter ranges from Table 1, we limit the time spent evaluating a parameter coniguration by intrtime oducing out parameter a . WRS uses the obtained data to run an instance of fANOVA [15]. 2.6 Learning QXX - Training QXX-MLP The previous section was about searching parameter values. Herein, we go one step further and learn the behavior of the gate scheduler in relation to the mapper parameters and�the ����ℎ function used by QXX. We describe https://github.com/aclorea/goptim ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 9 the method to learn speciic tuples consisting of circuit and QXX parameters, and how to estimate the depth of the mapped circuits. Parameter optimization, and for that reason, eicient initial mapping of the circuit to the device is a regression problem. The training data for learning was obtained as follows: for the parameters from Table 1, we choose smaller ranges and larger increments as illustrated in Exhaustiv the e Search columns. There are three possible values for �������ℎ and three values for����ℎ������. For a given value of �������ℎ, there are 3 × 11 × 5 × 3 × 3 = 1485 possible parameter conigurations. There is a total×of 1485 90 = 133650 layouts for each�������ℎ. Overall, there are 400950 parameter conigurations of the 90 circuits that are being evaluated. Table 1 illustrates the parameter ranges for which we collected data that allows us to compute the ratio �����(� ,� ) for every combination of circuit layout and QXX parameters. The generated exhaustive search �� ��� data is available in the project’s online repository. We considered three candidate models to learn QXX: k Nearest Neighbors (KNN), Random Forest (RF) and MultiLayer Perceptron (MLP). Each model has diferent inductive21bias ], being [ respectively: a localśbased predictor, an ensemble model built by bootstrapping, and a connectionist model, respectively. Despite of their conceptual simplicity, KNN predictors are easily interpretable and passed the test of time [38]. RFs were found as the best models for classiication problems 12], and [ we wanted to investigate their performance on this regression problem as well. Finally, MLPs creates new features through nonlinear input feature transformations, unlike KNN and RF which use raw input attributes. Nonlinear transformations and non-local character of MLPs are considered the premises for the successful deep learning movement [5]. We found MLP as the best model and use it during the parameter optimization stage (as illustrated in Fig. 1) as an approximator for the functionality of QXX. More details are in the Appendix. 3 RESULTS We present empirically obtained results about: 1) the performance of QXX; 2) the quality of the QXX-MLP model; 3) the performance of optimising QXX parameters with WRS. We select QXX parameter values using: a) exhaustive search; b) WRS with the QXX method (orange in Fig. 1) , and c) WRS on the QXX-MLP (green in Fig. 1). The QXX method was implemented and is available online . The current implementation is agnostic of the underlying quantum circuit design framework (e.g. Cirq or Qiskit). QXX is implemented in Python. The exhaustive search was executed on an i7 7700K machine with 32GB of RAM. The QXX model was trained on Intel Xeon W-2145 3.70GHz with 16 cores and 256 GB RAM. The WRS parameter optimization was performed on a laptop grade i5 processor with 16 GB RAM. For the benchmarks and comparisons we used Cirq 0.9, IBM Qiskit 0.25 and tket 0.2. In the following, we describe how the WRS heuristic is used to evaluate the QXX method and its MLP implementation. Afterwards, we present a series of plots that support empirically the performance of QXX. We analyze the inluence of the parameters, and ofer strong evidence in favor of learning quantum circuit layout methods. In particular, we will show that�the ����ℎ Gaussian can shorten the time of necessary to run QXX. This is achieved by pruning the search space and focusing on the most important region of the circuit. We will present examples for how the Gaussian is automatically adapted for deep and shallow circuits. 3.1 Benchmark Circuits We evaluate the�����itness of the QXX method using the QUEKO benchmark suite 35].[These circuits abstract Tofoli based and quantum supremacy like circuits, as well as a variety of NISQ chip layouts. Such benchmarks complement the libraries of reversible adders and quantum algorithms [17]. https://github.com/alexandrupaler/qxx ACM Trans. Quantum Comput. 10 • Paler et al. Fig. 8. Comparison with state of the art methods using QUEKO TFL benchmark circuits. All curves except QXX35 ar].e from [ The curves are computed ater averaging the depth ratios for 10 circuits for each known optimal depth from the benchmark. Horizontal axis is the known optimal depth of the TFL circuits. The vertical axis illustrates the achiev�e�� d �depth �. We have used QXX together with the Qiskit scheduler, and we assume that this is the reason why the QXX curve is close to the Qiskit curve. For shallow circuits the mapper is more important than the scheduler (QXX is beter than Qiskit), and for deeper circuits the importance of the mapper vanishes (QXX and Qiskit perform close to each other). QXX is not a scheduler, but a mapper (cf. Section 2.4). Automatic compilation/mapping of circuits, although applicable to NISQ applications, e.g. Fig 9, is of little practical importance by itself when one wishes fault-tolerance to be taken into account. NISQ circuits, such as for supremacy or for VQE/QAOA, are co-designed and a method like ours is just one piece in a much larger worklow which. We focused on Tofoli+H circuits, QUEKO TFL, because such circuits are not co-design and are also very representative for large scale error-corrected computations. The 90 QUEKO TFL circuits include circuits with known optimal [depths 5, 10, 15of , 20, 25, 30, 35, 40, 45] (meaning that the input circuit and the layout circuit have equal |� depths | = |� |). For each depth value ��� �� there are 10 circuits with 16 qubits. The NISQ machine to map the circuits to is Rigetti Aspen. A perfect QCL method will achie �v�� e ��= 1 on the QUEKO benchmarks. In our experimental setup, the layout procedure uses: 1) QXX for the initial placement (irst QCL step) and 2) the Qiskit StochasticSwap gate scheduling (second QCL step). The results depend on both the initial placement as well as the performance of the StochasticSwap scheduler. We do not conigure the latter and use the same randomization seed. We assume that this is the reason why the QXX performance is close to the Qiskit one. 3.2 Resulting Depths and Scalability Our goal is to show that the behavior of the mapper/compiler can be learned, and that compilation can be sped up using machine learning. QXX achie ��� ves ��values around 30% lower (which means better ś best�����is 1) than Qiskit on the low depth (up to 15 gates) QUEKO TFL circuits. In general, as shown in Fig. 8, the performance of QXX is between Qiskit and tket 33].[tket outperforms Qiskit, and QXX too, on the QUEKO benchmarks, because it has a much smarter scheduler. QXX is not a scheduler, but a mapper (cf. Section 2.4). The results from Fig. 8 are encouraging, because: a) QXX performs better than most compilers; b) there is known variability in the compiler’s performance with respect to benchmark circuits, such that for other circuits ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 11 Search Space Params. Timeout MaxDepth MaxChildren 0.05s 0.5s 5s 20s 1 1 25 0 0 0 1 5 43 0 0 0 1 9 36 3 0 0 5 1 25 0 0 0 5 5 23899 0 0 0 5 9 41014 2457 0 0 9 1 46 0 0 0 9 5 44550 37365 2501 172 9 9 44550 44494 36288 28798 Table 2. The number of WRS timeouts (0.05s, 0.5s, 5s, 20s) is a measure of QCL execution time. We count the number of timeouts when using diferent search space pruning strategies configur ����e�d�by �ℎ and ����ℎ������. the classiication might look completely diferent; c) our results were obtained very fast when using the MLP approach ś we get almost instantaneously the layed out circuit (Section 3.4). −2 For NISQ gate error rates (realistically) upper bounded by ,10 only circuits with a maximum depth of 30 are of practical importance. For shallow circuits the mapper is more important than the scheduler (QXX is better than Qiskit), and for deeper circuits the importance of the mapper vanishes (QXX and Qiskit perform close to each other). Fig. 10 presents results with parameters chosen speciically for shallow circuits. With respect to the scalability of our methods, one interesting aspect are the timeouts. For extreme parameter values (e.g. when����ℎ������and �������ℎ are 9) the execution time of QXX is high although the method has polynomial complexity. We introduced a timeout of 20 seconds for the QXX executions and collected data accordingly. Table 2 illustrates the increasing execution times, it ofers the motivation to learn the method ś the model will have constant execution time irrespective of the parameter value coniguration. Additionally, we notice that computing a good initial placement takes, in the best case, a small fraction of total time spent laying out. Without considering the optimality of the generated circuits, in the worst case, computing the initial placement using QXX can take between 2% and 99% of the total layout time. For more details see Table 3 and Section 3.4. For �������ℎ = 1 the maximum time fraction is 10% ��, for �����ℎ = 5 the maximum is 85%, while for �������ℎ = 9 it is 99%. These values are also in accordance with the execution times presented in Table 2. However, when considering the 100 fastest QXX execution times, for each ��of ���the ��ℎ values, the maximum mapping duration is 4% of the total layout duration. 3.3 QXX parameter optimization using WRS Initially we ran WRS for a total of 1500 trials within the parameters space deined in�T�able ����1.�ℎFor , ����ℎ������and ��������������we use the same limits and steps deined in the table�.,For � , and �������� we generated the values by drawing from a uniform distribution in the speciied range. We ran the classical RS step for 550 trials and computed the weight (importance) of each of the parameters using fANOVA obtaining the values from Table 1. The weight of a parameter measures its importance for the optimization of the itness function. We use the ���� (�����) = ���� (|� |/|� |) itness measure. WRS ran with eight workers for a total time of ��� �� four hours and 11 minutes and the best result (3.99) was obtained at iteration 1391 and again, at a later trial, for a diferent value of �������ℎ. Table 3 shows the combination of parameter values that yield the best results. The subsequent WRS executions use the exhaustive search parameter space deined in Table 1. In this parameter space we use WRS to optimize several conigurations for which we have changed either the evaluation timeout ACM Trans. Quantum Comput. 12 • Paler et al. TFL-45 TFL-25 MLP Name 20s 5s 0.5s 0.05s 20s 5s 0.5s 0.05s MaxDepth 9 9 9 6 8 9 9 3 9 MaxChild 4 3 2 2 4 3 2 2 9 B 5 17.3 8 3.5 6.9 15.7 6.10 2 1.5 C 0.61 0.25 0.02 0.31 0.86 0.91 0.65 0.74 0.32 Mov.Factor 2 4 2 6 6 10 6 7 10 EdgeCost 0.2 0.2 0.2 0.9 0.2 0.2 1 0.2 0.8 Duration(s)35170 28800 21785 11438 10583 8110 6497 4102 ~2 Avg. Ratio 4.093 4.138 4.328 4.465 4.043 3.966 4.279 4.537 4.423 Table 3. Parameter values obtained using WRS on the batch of 90 QUEKO circuits. The TFL-45 and TFL-25 are for QXX and WRS on the the TFL circuits of depths 25 and 45. The MLP column is for when using QXX-MLP with WRS. The last two rows represent the durations and the recorded average Ratios. The MLP approach is very fast and takes approx. 2 seconds. with values from Table 2 or the maximum TFL Depth (either 25 of 45). Interestingly, the 5 seconds timeout, achieves a better performance than the WRS parameter optimization with 20 seconds timeout (cf. large number of timeouts in Table 2). In general, WRS selects high �������ℎ values. This is explainable by the fact that optimal �����values are easier to be found using large search spaces. 3.4 QXX-MLP using WRS: Fast and scalable QCL One of the research questions was if it is feasible to learn the QCL methods in general, and QXX in particular. Moreover, we answer the question łWould it be possible to choose optimal values by a rule of thumb instead of searching for them?ž. For example, is it possible to obtain ��� go�o�in d a timely manner by using�lo ��w����ℎ values? During our experiments, the found parameters did not difer signiicantly between QXX and QXX-MLP. The results from Fig. 10 and Table 3 show that the MLP model of QXX performs well ś within 10% performance decrease compared to the normal QXX. We used a combination of WRS and QXX-MLP in an attempt to minimize the time required to identify an optimal coniguration. QXX-MLP and WRS are almost instantaneous: under 2 seconds for the entire batch of 90 circuits. In contrast, the execution time of WRS using the normal QXX was about 3 hours to ind the optimal parameters for the 90 circuits. A detailed analysis of the speed/optimality tradeofs available in the Appendix (e.g. Fig. 17). We also tested thetransfer of learning: the possibility of training QXX on a set of circuits, and then applying to a diferent type of circuits. Figure 9 illustrates the results of applying QXX-MLP on quantum supremacy circuits (QSE)[34]. The performance of QXX-MLP is more than encouraging: it had the performance of a timedout WRS op- timization. The WRS parameter optimization did not timeout, because MLP inference is a very fast constant time operation. Table 3 shows that QXX-MLP performs similarly to the WRS with a 0.05s timeout. For example, when applied to QXX-MLP, the values chosen by WRS for ��������and ��������������are consistent with the exhaustive search results (cf. Fig. 20 in Appendix): in general, �����an ���= 0.2 is preferable for all the TFL-depth values, and for��������������> 2 is preferable. The preference for large movement factors is obvious for the shallow TFL circuits. Regarding QCL speed: The WRS and QXX-MLP optimization takes a few seconds compared to the hours (cf. Table 3 for laying out all the circuits from Section 3.1) necessary for WRS and QXX. This is a great advantage that comes on the cost of obtaining a trained MLP, which is roughly the same order of magnitude to a WRS parameter ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 13 Fig. 9. Transfer of learning: Laying out QUEKO QSE (supremacy experiment) circuits using parameters learned from QUEKO TFL circuits. Each parameter evaluation executed by WRS was timed out ater 5, 50, 5001and /1002000 �ℎ seconds (10 milliseconds). optimization. MLP training is performed only once. In the case where the MLP model is not used WRS parameter optimization has to be repeated. In a setting where quantum circuits are permanently layed out and executed (like in quantum computing clouds) incremental online learning is a feasible option. The parameter optimization of QXX-MLP seems to be very predictable, because the corresponding curves in Fig. 10 are almost lat. Table 3 provides evidence that the MLP is very conservative with the choice of the values of� and � : The Gauss curve is almost lat, and positioned slightly towards the beginning of the circuit. The latness of the curve and its position could explain the almost constant performance from Fig. 10 and the 10% average performance degradation of QXX-MLP. 3.5 Automatic Subcircuit Selection Our results support the thesis that WRS can adapt the parameters of the QXX Gauss bell curve in order to select the region of the circuit that inluences the total cost of laying it out. The parameter controlling the center of the bell curve is relevant with respect to resulting layout optimality, as well the speed of the layout method (cf. Figs. 19 in Appendix for more details). Fig. 11 is a comparison of the Gauss curves obtained for the diferent TFL circuit depths. It is surprising that values � of > 0 are found, given that these assume an upfront cost of re-routing some early gates. For shallow circuits, WRS prefers � values close to the end of the benchmark circuits, meaning that the last gates are more important than the others. This is in accordance with Fig. 12 where the green curve (�������ℎ = 9) is over the orange curve�(������ℎ = 1) for large values of � > 0.75 in the range of TFL depths from 5 to 30. For deeper circuits, WRS sets the center � of the Gaussian to be closer to the beginning of the circuit. This is in accordance with Fig. 12 where the vertical distance between the green and orange curves is maximum for � < 0.25. Fig. 12 answers the question: Considering the diferent values ����of ���ℎ, where should the Gauss bell be placed relative to the start of the compiled circuit? Intuitively, this means to answer the question: Are the irst gates more important than the last ones, or vice versa? Increasing values � inluence of the performance of QXX ACM Trans. Quantum Comput. 14 • Paler et al. Fig. 10. QXX-MLP achieves approximately 90% of the QXX performance when laying out TFL circuits with depths up to 25 ś the most compatible with current NISQ devices. WRS was applied on: 1) the normal QXX and timing out too long evaluations; 2) the QXX-MLP model. Fig. 11. Two Gaussian curves obtained using WRS: let) on TFL-45 circuits with timeout 5 seconds; right) on TFL-25 circuits with timeout 5 seconds. (cf. Table 3) with decreasing �������ℎ ś the orange ( �������ℎ = 1) and green (�������ℎ = 9) curves swap positions along the vertical axis with the intersection between them being around TFl depth 30. 4 CONCLUSIONS Scalable, conigurable and fast QCL methods are an imperative necessity. In the context of quantum computing clouds, continuous learning is a real possibility, because a large batch of circuits is permanently sent and executed on mainframe like machines. It is feasible to consider machine learning QCL methods for fast and accurate QCL. We introduced QXX, a novel and parameterized QCL method. The QXX method uses a Gaussian function whose parameters determine the circuit region that inluences most of the layout cost. The optimality of QXX is evaluated on the QUEKO benchmark circuits using�the ����function which expresses the factor by which the ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 15 Fig. 12. The�����(the number of times when a parameter value was counted in the best 100 parameter combinations obtained for QXX) values to assess the influence of the Gauss � parameter center on the�����. The Gauss curve is automatically adapted to circuit depth. The horizontal axis in the plots represent the diferent types of QUEKO TFL circuits with known depths. The vertical axis is the ����� value for the corresponding circuit types. There are two curves, the brown one for �������ℎ = 1 and the green one for�������ℎ = 9. For circuits with depths up 30,to the green�������ℎ = 9 performs beter than brown�������ℎ = 1 (lower ����� value) when the Gauss curve is positioned to the� right = 1.0 (represents the end of the circuit) ś the last gates are more important. For circuits of depths larger 30, the than opposite is true ś the first gates are more important. The function �����(� = �,|� |, �������ℎ) is the number of times when a parameter � � �� bounded to � was counted in best 100 parameter combinations obtained for QXX executed for a particular ��value �����ℎof and for QUEKO TFL circuits � of depth|� |. More details in the Appendix. � �� � �� number of gates in the layed out circuit has increased. We illustrate the utility of QXX and its employed Gaussian. We show that the best results are achieved when the bell curve is non-trivially conigured. QXX parameters are optimized using weighted random search (WRS). To increase the speed of the parameter search we train an MLP that learns QXX, and apply WRS on the resulting QXX-MLP. To crosscheck the quality of the WRS optimization and of the MLP model. This work brought empirical evidence that: 1) the performance of QXX (resulting ��� depth ��and speed) is on par with state of the art QCL methods; 2) it is possible to learn the QXX method parameters values and the performance degradation is an acceptable trade-of with respect to achieved speed-up compared to WRS (which per se is orders of magnitude faster than exhaustive search); 3) WRS is inding parameters values which are in accordance with the very expensive exhaustive search. We conjecture that, in general, new cost models are necessary to improve the performance of QCL methods. Using the Gaussian function, we conirmed the observation that the cost of compiling deep circuits is determined only by some of the gates (either at the start or the end of the circuit). From this perspective the Gaussian function worked as a simplistic feature extraction. Future work will focus on more complex techniques to extract features to drive the QCL method. ACM Trans. Quantum Comput. 16 • Paler et al. ACKNOWLEDGMENTS AP was supported by Google Faculty Research Awards and the project NUQAT funded by Transilvania University of Braşov. We are grateful to Bochen Tan for his feedback on a irst version of this manuscript, explaining the QUEKO benchmarks and ofering the scripts to generate and plot the presented results. REFERENCES [1] Razvan Andonie. 2019. Hyperparameter optimization in learning systems. J. Membr. Comput. 1, 4 (2019), 279ś291. https://doi.org/10. 1007/s41965-019-00023-0 [2] Razvan Andonie and Adrian-Catalin Florea. 2020. Weighted Random Search for CNN Hyperparameter Optimization. Int. J. Comput. Commun. Control 15, 2 (2020). https://doi.org/10.15837/ijccc.2020.2.3868 [3] Pablo Andrés-Martínez and Chris Heunen. 2019. Automated distribution of quantum circuits via hypergraph partitioning. Physical Review A 100, 3 (2019), 032308. [4] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A Smolin, and Harald Weinfurter. 1995. Elementary gates for quantum computation. Physical review A 52, 5 (1995), 3457. [5] Yoshua Bengio. 2009. Learning Deep Architectures forFound. AI. Trends Mach. Learn. 2, 1 (Jan. 2009), 1ś127. https://doi.org/10.1561/ [6] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for Hyper-Parameter Optimization.. NIPS, John In Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger (Eds.). 2546ś2554. http://dblp.uni- trier.de/db/conf/nips/nips2011.html [7] Adi Botea, Akihiro Kishimoto, and Radu Marinescu. 2018. On the complexity of quantum circuit compilation. Eleventh Annual In Symposium on Combinatorial Search. [8] Rui Chao and Ben W Reichardt. 2018. Quantum error correction with only two extra qubits. Physical review letters 121, 5 (2018), 050502. [9] Andrew M Childs, Eddie Schoute, and Cem M Unsal. 2019. Circuit Transformations for Quantum Archite14th cturConfer es. In ence on the Theory of Quantum Computation, Communication and Cryptography. [10] Ross Duncan, Aleks Kissinger, Simon Perdrix, and John Van De Wetering. 2020. Graph-theoretic Simpliication of Quantum Circuits with the ZX-calculus. Quantum 4 (2020), 279. [11] Suguru Endo, Simon C Benjamin, and Ying Li. 2018. Practical quantum error mitigation for near-future applications. Physical Review X 8, 3 (2018), 031027. [12] Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do We Need Hundreds of Classiiers to Solve Real World Classiication Problems? J. Mach. Learn. Res. 15, 1 (Jan. 2014), 3133ś3181. [13] Adrian-Catalin Florea and Razvan Andonie. 2019. Weighted Random Search for Hyperparameter Optimization. Int. J. Comput. Commun. Control 14, 2 (2019), 154ś169. https://doi.org/10.15837/ijccc.2019.2.3514 [14] AG Fowler, SJ Devitt, and LCL Hollenberg. 2004. Implementation of Shor’s Algorithm on a Linear Nearest Neighbour Qubit Array. Quantum Inf. Comput.4, quant-ph/0402196 (2004), 237ś251. [15] Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2014. An Eicient Approach for Assessing Hyperparameter Importance. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32(Beijing, China) (ICML’14). JMLR.org, Iś754śIś762. [16] Wim Lavrijsen, Ana Tudor, Juliane Müller, Costin Iancu, and Wibe de Jong. 2020. Classical Optimizers for Noisy Intermediate-Scale Quantum Devices.arXiv preprint arXiv:2004.03004 (2020). [17] Ang Li and Sriram Krishnamoorthy. 2020. QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation. arXiv preprint arXiv:2005.13018 (2020). [18] Gushu Li, Yufei Ding, and Yuan Xie. 2019. Tackling the qubit mapping problem for NISQ-era quantumPrde oce vices. edingsInof the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems . 1001ś1014. [19] Dmitri Maslov, Gerhard W Dueck, D Michael Miller, and Camille Negrevergne. 2008. Quantum circuit simpliication and level compaction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems27, 3 (2008), 436ś444. [20] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven. 2018. Barren plateaus in quantum neural network training landscapes. Nature communications 9, 1 (2018), 1ś6. [21] Tom M. Mitchell. 1980. The Need for Biases in Learning Generalizations. Technical Report. Rutgers University, New Brunswick, NJ. http://dml.cs.byu.edu/~cgc/docs/mldm_tools/Reading/Need%20for%20Bias.pdf [22] Prakash Murali, David C McKay, Margaret Martonosi, and Ali Javadi-Abhari. 2020. Software mitigation of crosstalk on noisy intermediate- scale quantum computers. InProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 1001ś1016. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 17 [23] Beatrice Nash, Vlad Gheorghiu, and Michele Mosca. 2020. Quantum circuit optimizations for NISQ arQuantum chitecturScience es. and Technology 5, 2 (2020), 025010. [24] Shin Nishio, Yulu Pan, Takahiko Satoh, Hideharu Amano, and Rodney Van Meter. 2019. Extracting success from ibm’s 20-qubit machines using error-aware compilation. arXiv preprint arXiv:1903.10963 (2019). [25] Alexandru Paler. 2019. On the inluence of initial qubit placement during NISQ circuit compilation. International WIn orkshop on Quantum Technology and Optimization Problems. Springer, 207ś217. [26] Sam Pallister. 2020. A Jordan-Wigner gadget that reduces T count by more than 6x for quantum chemistry applications. arXiv preprint arXiv:2004.05117 (2020). [27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825ś2830. [28] Matteo G Pozzi, Steven J Herbert, Akash Sengupta, and Robert D Mullins. 2020. Using reinforcement learning to perform qubit routing in quantum compilers. arXiv preprint arXiv:2007.15957 (2020). [29] John Preskill. 2018. Quantum Computing in the NISQ era and beyQuantum ond. 2 (2018), 79. [30] Mehdi Saeedi and Igor L Markov. 2013. Synthesis and optimization of reversible circuitsÐa ACM sur Computing vey. Surveys (CSUR) 45, 2 (2013), 1ś34. [31] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, and Fernando Magno Quintão Pereira. 2019. Qubit allocation as a combination of subgraph isomorphism and token swapping. Proceedings of the ACM on Programming Languages3, OOPSLA (2019), 1ś29. [32] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando Magno Quintão Pereira. 2018. Qubit allocation. In Proceedings of the 2018 International Symposium on Code Generation and Optimization. 113ś125. [33] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. 2020. t| ket>: A retargetable compiler for NISQ devices.Quantum Science and Technology (2020). [34] Bochen Tan and Jason Cong. 2020. Optimal layout synthesis for quantum computing. 2020InIEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 1ś9. [35] B. Tan and J. Cong. 2020. Optimality Study of Existing Quantum Computing Layout Synthesis IEEE ToT ols. rans. Comput. early access (2020), 1ś1. [36] Swamit S Tannu and Moinuddin K Qureshi. 2019. Not all qubits are created equal: a case for variability-aware policies for NISQ-era quantum computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 987ś999. [37] Ellis Wilson, Sudhakar Singh, and Frank Mueller. 2020. Just-in-time Quantum Circuit Transpilation arXiv Reduces preprint Noise. arXiv:2005.12820 (2020). [38] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geofrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. 2007. Top 10 Algorithms in Kno Datawl.Mining. Inf. Syst. 14, 1 (Dec. 2007), 1ś37. https://doi.org/10.1007/s10115-007-0114-2 [39] Chi Zhang, Ari B Hayes, Longfei Qiu, Yuwei Jin, Yanhao Chen, and Eddy Z Zhang. 2021. Time-optimal QubitPrmapping. oceedings of In the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems . 360ś374. [40] Yuan-Hang Zhang, Pei-Lin Zheng, Yi Zhang, and Dong-Ling Deng. 2020. Topological Quantum Compiling with Reinforcement Learning. Physical Review Letters 125, 17 (2020), 170501. [41] Alwin Zulehner, Alexandru Paler, and Robert Wille. 2018. An eicient methodology for mapping quantum circuits to the IBM QX architectures.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems38, 7 (2018), 1226ś1236. A BACKGROUND From an abstract point of view, register connectivity is encoded as a graph. In the simplest form possible, the graph edges are not weighted. The graph edges are unique tuples of circuit registers (� , � ) � � For example, in Fig. 13 the register connectivity of the�cir is cuit the red graph. The unique tuples of device �� registers are the edges of the device graph. For example, in Fig. 13, the device register connectivity is the blue graph. Fig. 4a includes a quantum circuit example: the qubits are represented by horizontal wires, the two qubit gates are vertical lines, the control qubit is marked •with , and the a target with⊕. The red graph from Fig. 13 is obtained by replacing all wires from Fig. 4 with vertices and the CNOTs with edges. ACM Trans. Quantum Comput. 18 • Paler et al. The aim of parameter optimization is to ind the parameters of a given model that return the best performance of an objective function evaluated on a validation set. In simple terms, we want to ind the model parameters that yield the best score on the validation set metric. Fig. 13. The quantum circuit (red) has to be executed on the quantum device (blue). The circuit uses four qubits (vertices marked with C) and the hardware has four registers (blue vertices), too. The circuit assumes that operations can be performed between arbitrary pairs of registers (the edges connecting the registers). The device supports operations only between a reduced set of register pairs. In machine learning, we usually distinguish between the training parameters, which are adapted during the training phase, and the hyperparameters (or meta-parameters), which have to be speciied before the learning phase [1]. In our case, since we do not train (adjust) inner parameters on speciic training sets, we have only hyperparameters, which we will simply call parameters here . Parameter optimization may include a budgeting choice of how many CPU cycles are to be spent on parameter exploration, and how many CPU cycles are to be spent evaluating each parameter choice. Finding the łbestž parameter coniguration for a model is generally very time consuming. There are two inherent causes of this ineiciency: one is related to the search space, which can be a discrete domain. In its most general form, discrete optimization is NP-complete. The second cause is the evaluation of the objective function can be also expensive. We call this evaluation for one set of parameter values trial. a There are several recent attempts to optimize the parameters of quantum circuits. Machine learning optimizers tuned for usage on NISQ devices were recently reviewed by Lavrijsen et al [16]. Several state-of-the-art gradient- free optimizers were compared, capable of handling noisy, black-box, cost functions and stress-test them using a quantum circuit simulation environment with noise injection capabilities on individual gates. Their results indicate that speciically tuned optimizers are essential to obtaining valid results on quantum hardware. Parameter optimizers have a range of applications in quantum computing, including the Variational Quantum Eigensolver and Quantum Approximate Optimization algorithms. However, this approach has the same weaknesses like classical optimization ś global optima are exponentially diicult to achieve [20]. Currently, the most common parameter optimization approaches ar 1,e2,[ 6, 13]: Grid Search, Random Search, derivative-free optimization (Nelder-Mead, Simulated Annealing, Evolutionary Algorithms, Particle Swarm Optimization), and Bayesian optimization (Gaussian Processes, Random Forest Regressions, Tree Parzen Estimators, etc.). Many software libraries are dedicated to parameter optimization, or have parameter optimization capabilities: BayesianOptimization, Hyperopt-sklearn, Spearmint, Optunity 1, 13]. Cloud , etc [ based highly integrated parameter optimizers are ofered by companies like Google (Google Cloud AutoML), Microsoft (Azure ML), and Amazon (SageMaker). B TECHNICAL DETAILS B.1 Weighted Random Search for parameter optimization There are two computational complexity aspects which have to be addressed in order to ind good QXX parameters: a) Reduce the search space and implicitly the number of trials; b) Re and duce the execution time of each trial. In general, the performances of a parameter optimizer is determined by [1, 2]: ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 19 • F1. The execution time of each trial. • F2. The total number of trials ś search space size. • F3. The performance of the search. Search space reduction (F2) and search strategy (F3) are inter-connected and can be addressed in a sequence: F2 is a quantitative criterion (how many). For instance, we can irst reduce the number of parameters (F2), and create this way more lexibility in the following stage for F3. In this work, we do not reduce the number of parameters. F3 is a qualitative criterion (how łsmartž). For instance, (F3), we can irst rank and weight the parameters based on the functional analysis of the variance of the objective function, and then reduce the number of trials (F2) by giving more chances to the more promising trials. There is a trade-of between F2 and F3. To address these issues and reduce the search space, we use the following standard techniques: • Instance selection: reduce the dataset based on statistical sampling (relates to F1). • Feature selection (relates to F1). • Parameter selection: select the most important parameters for optimization (relates to F2 & F3). • Parameter ranking: detect which parameters are more important for the model optimization and weight them (relates to F3 & F2). • Use additional objective functions:number of operations, optimization time, etc. (relates to F3 & F2). On average, the WRS method WRS converges faster than RS [13]. WRS outperformed several state-of-the art optimization methods: RS, Nelder-Mead, Particle Swarm Optimization, Sobol Sequences, Bayesian Optimization, and Tree-structured Parzen Estimator [2]. In the RS approach, parameter optimization translates into the optimization of an objectiv � eof function � variables by generating random values for its parameters and evaluating the function for each of these 6]. values [ The function computes some quality measure or score of the model (e.g., accuracy), and the variables correspond to the parameters. The assumption is to maximize � by executing a given number of trials. Focusing on factor F3, the idea behind the WRS algorithm is that a subset of candidate values that already produced a good result has the potential, in combination with new values for the remaining dimensions, to lead to better values of the objective function. Instead of always generating new values (like in RS), the WRS algorithm uses for a certain number of parameters the so far best obtained values. The exact number of parameters that actually change at each iteration is controlled by the probabilities of change assigned to each parameter. WRS attempts to have a good coverage of the variation of the objective function and determines the parameter importance (the weight) by computing the variation of the objective function. B.2 Training QXX-MLP For the training and validation stages, we had 12 input features: circuit features (extracted using the Python 3 4 5 package networkx) ś ��� _����_���� , ��_����_���� , �����, �����, �� � ������, � �������ś merged with QXX’s parameters ś �������ℎ, ����ℎ������, �, � , �������������,���������. We have chosen PageRank, Smetric and the other metrics, in order to capture as much information about the circuits. The more features used for learning, the better the trained model is. The value to be predicted by the models was ����the �between the depth of the known optimal circuit and resulting circuit. The performance of KNN, RF and MLP was assessed through tenfold cross validation (CV) over the whole dataset. In tenfold CV the available dataset resulted in exhaustive search is split in 10 folds. Each fold is used in Ranking nodes based on the structure of the incoming links. The eiciency of a pair of nodes in a graph is the multiplicative inverse of the shortest path distance between the nodes. The sum of the node degree products for every graph edge. ACM Trans. Quantum Comput. 20 • Paler et al. Model Hyperparameter Values KNN Neighbors sought {2, . . . , 8} KNN Minkowski metric’s � {1, 2} MLP Hidden layer’s size {3, 10, 20, 50, 100} MLP Activation function ReLU, tanh RF Maximum depth of a tree {2, 3, 4, 5, ����} RF Number of trees {2, 5, 10, 20} Table 4. Hyperparameter names and candidate values turn as a validation subset, and the other nine folds are used for training. Finally, the ten performance scores obtained on the validation subsets are averaged and used as an estimation of the model’s performance. For each of the ten train/validation splits, the optimal values of their speciic hyperparameters were sought via grid search, using ivefold CV; the metric to be optimized was mean squared error. The models’ speciic hyperparameters and candidate values are given in Table 4. As both KNN and MLP are sensitive to the scales of the input data, we used a scaler to learn the ranges of input values from the train subsets; the learned ranges were subsequently used to scale the values on both train and validation subsets. We used the reference implementations from scikitślearn 27[], version 0.22.1. Excepting the hyperparameters in Table 4, the hyperparameters of all other models are kept to their defaults. The lowest average values for mean squared error were obtained by RF, closely followed by MLP and KNN. From RF and MLP we preferred the latter due to its higher inference speed and smaller memory footprint. The inal MLP model was prepared by doing a inal grid search for the optimal hyperparameters from Table 4, choosing the best model through ivefold CV. The resulted networks look as follows: the input layer has 12 nodes, fully connected with the (only) hidden layer which hosts 100 neurons; furthermore, this layer is fully connected with the output neuron. ReLU and identity were used as activation functions for the hidden and output layers, respectively. For the hidden layer, both the number of neurons and the activation function were optimized through grid search. C EVALUATION We use the exhaustive search raw data and introduce metrics to evaluate the parameter importanc ��e�of ��ℎ. The function has six parameters (see Section 2.1), and we analyzed individual their importance using WRS (see Subsection 2.5). For example, Table 1 lists the importances (weights) of the individual QXX parameters. These weights were computed under the strong (naïve) independence assumption between the parameters. Usually, parameters are statistically correlated, and we prefer a iner grained understanding of the QXX’s performance. To compare how parameter pair inluence the �����function, we introduce two metrics calle ����� d and ����. To compute these metrics we execute the exhaustive search for the three values��of �����ℎ and consider all the parameter conigurations from Table 1 ś a six-dimensional grid search. For a given value of �������ℎ and a parameter coniguration (all other ive parameters) we average the resulting depth of the compiled cir �cuit, , over the circuits existing in � ��the benchmark. From the total 1485 ��� averages, we sample the lowest 100 values, leading to an approximate 7% sampling rate ≈,0.067, from the total number of parameter conigurations. The function �����(� = �,|� |, �������ℎ) � �� ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 21 is the number of times when a parameter � bounded to � was counted in the best 100 parameter combinations obtained for QXX executed for a particular value ���of ����ℎ for QUEKO TFL circuits � of depth|� |. � �� � �� For example,�����(� = 0, 1, 30) is the number of parameter conigurations wher � = e 0 and QXX was used with a search tree of�������ℎ = 30 to lay out TFL circuits of depth 1. The ����� function can compare how, for diferent circuit depths, �������ℎ inluences the optimal values�.of The ���� function aggregates how diferent values�of are ranked against each other when considering diferent TFL circuit depths: for the same |� |, higher rank values are better. The ���� is used to suggest � �� parameter value ranges. ���� (�, |� |) = �����(�, |� |, �������ℎ ) (3) � �� � �� � �=0 �������ℎ ∈ {1, 5, 9} C.1 Optimum parameters vs. circuit depth To speed-up QXX, we are interested in inding parameter values that keep execution times as low as possible without massively impacting the obtaine �����dvalues. According to WRS, �������ℎ is one of the most important parameters, but it is not immediately obvious if it is possible to achie �����vvalues e optimal using low �������ℎ values. In the following, we form parameter pairs betw ��e�en ����ℎ ∈ {1, 5, 9} and the other ive QXX parameters. The best layouts are obtained for: a) large values��of �����������,�and, b) for shallow circuits,�the ������� has to be preferably low. These observations explainStochasticSwap the gate scheduling method (Fig. 2). The �������������value shows that the scheduler prefers to move a single qubit on the coupling graph. The way how ��������values are inluenced by the TFL depth indicates that there exists a relation between the number of gates in the circuit and the number of edges in the coupling graph. This relation could be modelled �� _����� through a density function like . To the best of our knowledge, the efect of this density function on the �� _����� coupling graph edge weights has not been investigated in the literature by now. Figs. 14 and 15 show how randomly chosen parameter conigurations inluence the depth �����of shallow circuits with depths up to 25. Fig. 17 (optimal parameter coniguration irrespective of the �value �����of �ℎ) is supported by the results from Figs. 18, 19(discussed in the next section ś the Gaussian inluences �������ℎ) and 20 ś the best layouts are obtained for: a) large values�of ������������,�and, b) for shallow circuits,�the �������has to be preferably low. We conjecture that the variability in Figs. 18, 19 and 20 is mostly due to the correlations that exist between the search space size and the timeouts. For example, it can be seen that, with a small exception for medium-depth circuits, the best performing value ���of �ℎ������is correlated with the one�of ������ℎ. The number of timeouts we obtained for high values ���of ����ℎ = 9 is an indication of this observation. As a conclusion, it does not seem to be necessary to increase the breadth of the search if the depth of the search tree is shallow. C.2 GDepth parameters Fig. 19 answers the question: What is the best performing value � considering of the depth ��(�����ℎ) of the QXX search space? How many gates of a circuitare important is answered by the value�of . Out of the 11 used values (cf. Table 1), the irst four.0(,00.2, 0.4, 0.6) are considered being ����, the last four .14, 1.6, 1.8, 2.0 are ������. The remaining three values ar � �e��. Due to these ranges, the values on the vertical axis are normalised to 1. The width of the bell curve from Fig. 5 is conigurable and indicative of which circuit gates are the most important wrt.�����optimality. ACM Trans. Quantum Comput. 22 • Paler et al. Fig. 14. Random parameter configurations and their influence on TFL circuit depth optimality. The axes have the same interpretation like in Fig. 8. Each line corresponds to a random parameter value configuration. Fig. 15. Random parameter configurations and their influence on QSE (supremacy experiment) circuit depth optimality. The axes have the same interpretation like in Fig. 8. Each line corresponds to a random parameter value configuration. The number of preferred gates seems to be a function of the used timeout. The more time spent searching for optimal parameters, the thinner the Gaussian bell. As observed in Table 3 and Table 2, the number of timeouts for �������ℎ = 9 is high, such that the � values for timeout at 20 seconds seem not to obey the scaling observed for the other timeouts. This conirms the results of the exhaustive search as presented in Fig. 19 in Appendix, where the curve for�������ℎ = 9 has a high variation along the vertical axis. This diagram in Fig. 16 is similar to the one from Fig. 10, but the WRS was executed on all benchmarks ś parameters were chosen to be compatible with a much larger range of circuits. The curves do not seem to converge as well as they did for the depth 25 circuits. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 23 Fig. 16. Precision: Laying out TFL circuits with depths up to 45. Fig. 17. It should be possible to find an optimal parameter configuration irrespective of the��value �����of ℎ: Ploting the best depth (let) and�����(right) obtained per TFL circuit depth and �������ℎ parameter. The red (MAX1, MAX5, MAX9) and green (MIN1, MIN5, MIN9) curves are the highest and lowest depths achieved for each �������ℎ value (1,5,9). ACM Trans. Quantum Comput. 24 • Paler et al. Fig. 18. Exhaustive Search: The ����� values to assess the influence of ��the ��ℎ������parameter on the�����. For this we plot the����� obtained for two values�of ������ℎ: blue for 1, and orange for5. Due to the high number of timeouts during the exhaustive search with �������ℎ = 9, the corresponding curve was not ploted. For ����ℎ������= 1, almost for all circuits (with the exception of depth 20, 25 and 30) the ��� best ���is obtained for�������ℎ = 1. As ����ℎ������is increased, the beter results are achieved by larger �������ℎ ś the orange curve is above the blue one. Fig. 19. The value of � can improve both the mapping (higher�����) as well as speed up the search due to lo�w�er �����ℎ: Normalised ����� values to assess the influence of � parameter the on the�����. For this we plot the ����� obtained for three values of �������ℎ: blue for 1, and orange for5, gray for9. Beter results are obtained with decreasing �������ℎ as the value of � is increasing. For example, in the let panel,����performs beter for�������ℎ = 5 for circuits with a depth up to 30. Moreover, � ��� achieves the best depth ratios for �������ℎ = 1. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 25 Fig. 20. Exhaustive Search: let) ���� values for the ��������parameter ś up to circuits of depths 35 QXX achieves beter �����values for edge costs of 0.2, while for deeper circuits a higher value of the edge cost deliv�ers ���b�eter values; right) ���� values for the ��������������parameter ś higher parameter values perform significantly beter than lower values, meaning that, cf. Fig. 7, it is beter to move a single qubit instead of two across the graph. ACM Trans. Quantum Comput. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Quantum Computing Association for Computing Machinery

Machine Learning Optimization of Quantum Circuit Layouts

Loading next page...
 
/lp/association-for-computing-machinery/machine-learning-optimization-of-quantum-circuit-layouts-cGLcLeC95g
Publisher
Association for Computing Machinery
Copyright
Copyright © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISSN
2643-6809
eISSN
2643-6817
DOI
10.1145/3565271
Publisher site
See Article on Publisher Site

Abstract

Machine Learning Optimization of uantum Circuit Layouts ALEXANDRU PALER, Aalto University, Finland, University of Texas at Dallas, USA, and Transilvania University of Bras ov, Romania LUCIAN M. SASU, Transilvania University of ov,Bras Romania ADRIAN-CĂTĂLIN FLOREA, Transilvania University of ov,Bras Romania RĂZVAN ANDONIE, Central Washington University, USA and Transilvania University sov, Romania of Bra The quantum circuit layout (QCL) problem is to map a quantum circuit such that the constraints of the device are satisied. We introduce a quantum circuit mapping heuristic, QXX, and its machine learning version, QXX-MLP. The latter infers automatically the optimal QXX parameter values such that the layed out circuit has a reduced depth. In order to speed up circuit compilation, before laying the circuits out, we are using a Gaussian function to estimate the depth of the compiled circuits. This Gaussian also informs the compiler about the circuit region that inluences most the resulting circuit’s depth. We present empiric evidence for the feasibility of learning the layout method using approximation. QXX and QXX-MLP open the path to feasible large scale QCL methods. CCS Concepts: · Hardware → Quantum computation; Software tools for EDA; · Software and its engineering → Compilers. 1 INTRODUCTION The quantum circuit layout problem is deeply related to the topology of the device used to execute the circuit: instructions cannot be applied between arbitrary hardware registers. Before executing a quantum circuit, this is adapted to the device’s register connectivity during a procedurecompilation called . Quantum circuit compilation is often called quantum circuit layout (QCL). The interest in eicient QCL methods is motivated by the current generation of quantum devices, called NISQ devices 29]. V[ery recent work presents worrisome evidence that even very small and shallow circuits are diicult to execute on NISQ [37]. NISQ circuit compilation includes, for example, error-mitigation 11],strategies lag-qubits [ 8[] and not just QCL methods, but the latter play deinitely an signiicant role in the compilation of large scale fully error-corrected circuits. However, scalable compilation is not possible with current state of the art QCL methods. Fast and scalable QCL allows laying out a circuit with multiple QCL parameter values and selecting the best compiled circuit. Without going into details, our preliminary analysis showed that at the time of writing this manuscript, when optimising aggressively, compilation can take up to 1 hour for most QCL methods when presented circuits of approximately 50 qubits. It is imperative to have eicient and conigurable QCL methods. We focus on three research questions: I. How can we determine the best QCL parameter values to minimize the depth of the compiled output circuit? Authors’ addresses: Alexandru Paler, Aalto University, Finland and University of Texas at Dallas, USA and Transilvania Univ sov,ersity of Bra Romania; Lucian M. Sasu, Transilvania University sov,ofRomania; Bra Adrian-Cătălin Florea, Transilvania University sov, Romania; of Bra , , Răzvan Andonie, Central Washington University, USA and Transilvania University ov, Romania. of Bras Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proit or commercial advantage and that copies bear this notice and the full citation on the irst page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speciic permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. 2643-6817/2022/9-ART $15.00 https://doi.org/10.1145/3565271 ACM Trans. Quantum Comput. 2 • Paler et al. Fig. 1. Continuous learning QCL: It is possible to learn our QCL method, which is called QXX (yellow), and replace it with a machine learning model (e.g. neural network - green). Optimal parameter values (blue) for adapting QXX performance are chosen by automatically executing weighted random search (WRS) in a loop. II. Choosing good parameter values for the QCL method should be very fast. How can we establish a time- performance trade-of between searching optimal parameter values and the minimization of the depth? III. Can QCL be sped up by machine learning? More formally, QCL takes as input a cir�cuit incompatible with a device’s (can be error-corrected) register �� connectivity and outputs a circuit � that is compatible Additional . gates are used to overcome the connectivity ��� limitations and compile a device-compatible � cir . Wcuit e deine the�����function between the depths of two ��� circuits, wher|�e| > 0 is the depth of circuit � . We formalize the QCL problem as follo QCL ws:optimization is the minimization of the�����function. |� | ��� �����(� ,� ) = (1) �� ��� |� | �� 1.1 Related Work QCL is already a wide topic, and we will not be providing a thorough exposition of the ield. We refer the reader to the works of [9, 31, 32, 34, 35] for detailed and careful overviews of some the works that inluenced and shaped QCL. Some of the irst discussions about circuit optimality and gate counts appeared in the seminal 4]. paper of [ After quantum circuits started being analysed as reversible circuits formed from Tofoli gates, a large number of exact methods and heuristics was proposed ś a not so recent but complete review30 is].[The complexity class of QCL was discussed irst in 19][ and this has been used as a foundation for proving the complexity of diferent QCL variations such as [7, 32, 34]. In general, there are heuristic QCL methods and exact QCL methods. A recent exact method is the one from [34] (the paper includes a discussion about bottlenecks of exact methods). Automatic quantum circuit compilation does not always generate the best possible circuit. One of the irst attempts to design full algorithm circuits considering hardware connectivity limitations 14]. Another is [ recent example of a hand optimised circuit 26],is [ where, interestingly, the optimisation was achieved by using a results known from the automatic optimisation of circuits. Most quantum circuit design automation tools treat QCL as a sequence of steps like initial placement and gate scheduling. The authors of31[] have a complete discussion of the theoretical implications of placement and scheduling. In practice, QCL has been solved by introducing SWAP gates, but there exist more reined methods like23[]. Solving QCL through search algorithms has been recently presented24in ] (b[eam search) and [41] (A*). The work of [31] introduced a parameterisable search algorithm for QCL, Bounded Mapping Tree. In this work, we focus on QCL as an instance of register allocation as introduce 32] dand byuse [ the by now classic graph view of the quantum device layout. Recently, graph-based QCL approaches (motivate 19]) d by [ ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 3 Fig. 2. The quantum circuit compilation (QCL) procedure takes an input � and circuit transforms it into the functionally �� equivalent � circuit. The QXX method computes an optimal assignment of circuit qubits (registers) to device qubits ��� (registers). seem to achieve the necessary performance for compiling very large circuits to complicated device topologies. Some QCL approaches were proposed using hyper-graphs [3], but more classical approaches are the ones from [10]. Machine learning has started being widely applied in diferent aspects of quantum computing. Nevertheless, compared to [40] we are not compiling only single qubit gates after needing impractical numbers of hours to train a very large network. In parallel and independent to the preparation of this manuscript, machine learning methods for QCL have been presented in28[]. Compared to [28], our method is capable of learning continuously. QCL includes two steps31[]. The irst step is initial placement (e.g., [25]), where a mapping of the circuit qubits to device registers is computed. This step is also called qubit 32 allo ]. The cation second [ QCL step isgate scheduling, where the circuit � is traversed gate-by-gate. �� 1.2 Contribution We present machine learning method (cf. Fig. 1) for the initial placement (i.e. mapping) of quantum circuits. In order to test the feasibility of learning, we develop an initial placement heuristic, which we called QXX for no particular reason. From a technical perspective, our methods are designed and implemented into great detail and capable of being deployed in practice. The machine learning model of QXX is called QXX-MLP, which is the signiicantly faster implementation of the QXX. At the date of writing this manuscript, our method was the irst to efectively learn a QCL heuristic. QXX is conigurable over multiple parameters, and our goal is to have a method that can automatically chose the best parameter values in order to achieve optimally compiled circuits. The novelties of QXX and QXX-MLP (Section 2) are: (1) automatic feature selection ś focusing on the important sub-circuit using a conigurable Gaussian function (Section 2.3); (2) automatic QCL coniguration ś we use weighted random search (Section 2.5) to optimize QXX parameter values. The parameters inluence also the speed of the QCL; (3) demonstrating QCL learnability ś the machine learning version QXX-MLP works as an approximation method for the circuit layout depth (Section 2.6); (4) scalability ś the almost instantaneous layout performance (Sections 3.4 and 3.2 and Table 3). We show that QCL execution time can be shortened while maintaining ����the �optimality. The goal of this work is to highlight the generality and wide applicability of learning QCL methods. For this reason, we focus on benchmarking circuits which are compatible with both non-error-corrected and error- corrected machines. We use synthetic benchmarking circuits, QUEK 35] (Se O[ction 3.1), which capture the properties of Tofoli+H (e.g. arithmetic, quantum chemistry, quantum inance etc.) circuits without being speciic ACM Trans. Quantum Comput. 4 • Paler et al. for a particular application. Section 3 presents and discusses experimental results performed with the QXX and QXX-MLP approaches. Conclusions are synthesized in Section 4. 2 METHODS We present the QXX method and describe the machine learning techniques that are using it. Subsection 2.1 gives the details of QXX. Fig. 1 illustrates the approach followed by this work during the QCL parameter optimization stage: The parameters of the normal QXX method (orange) are optimized using WRS. To further speed up QXX we train a machine learning model, called QXX-MLP, to predict ����the �values obtained by using the QXX for a given circuit and a particular set of parameter values. We employ three methods to evaluate how the parameter optimization of QXX inluences the compiled circuits. First, an eicient parameter optimizer is Weighted Random Search (WRS, Subsection 2.5). Second, we use exhaustive search to collect training data to obtain the QXX-MLP neural network (Subsection 2.6). The latter is used to estimate optimal parameter values. The WRS method starts from an initial set of parameter values and adapts the values in order to minimize the obtained�����. This procedure forms a feedback loop between WRS and the QXX method. Running WRS multiple times for a set of benchmark circuits is equivalent almost exe tocuting an exhaustive search of the parameter space. The exhaustive search data is collected and used to train QXX-MLP. From a methodological point of view, device variabilities are very important and, moreover, these seem to have a luctuating behavior 37[]. However, in this work we do not consider that NISQ qubits have variable idelities [36], or that crosstalk is a concern during NISQ compilation [22]. We will use the following notation. The irst QCL step computes � ,awher liste � [�] = � refers to circuit qubit� being stored on device register �. Computing the list � is the analogue to determining a good starting point for the gate scheduling procedure. We represent two-qubit gates by the tuples (� , � ). Scheduling executes � � the current two-qubit gate if (� [� ], � [� ]) is an edge of device connectivity graph, which willDEVICE be calle . d � � Otherwise, the gate qubits are moved across the device and stored in registers connected by an an edge from ��� ��� DEVICE. The movement introduces additional gates, such as SWAP gates, in order for all(�tuples , � ) to be � � edges ofDEVICE. The mapping is updated accordingly, as illustrated in Fig. 7. In general, the � depth is of ��� lower bounded by|� |. �� 2.1 The QXX Mapping Heuristic QXX is a fast search algorithm to determine a qubit mapping (allocation), and is called by the subsequent gate scheduler to compute a good qubit allocation/mapping/placement. QXX uses an estimation function to predict how a � with minimum depth would have to be initially mapped. ��� A novelty is that QXX uses a Gaussian-like function�calle ����ℎdto estimate the resulting depth (cost) of the layed out circuit calle � d. The qubit mapping is found using the minimum estimated��value ���ℎ.of ��� QXX uses three types of parameters, which we will explain in the following three sections: (1) for coniguring the search space; (2) for adapting �����ℎ to the circuit � ; �� (3) for adapting QXX to the second step of QCL, namely the scheduler/router of gates. 2.2 Search Space Configuration QXX is a combination of breadth-irst search and beam search. The search space is a tree (cf. Fig. 3). Constructing a qubit-to-register mapping is an iterative approach: qubits are selected one after the other, and so are the registers where the qubits mapped initially. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 5 Fig. 3. The search space dimensions of QXX can be configured by adapting the parameters MaxDepth (green) and MaxChildren (red). Each node of the search tree stores a list of possible mappings for which a minimum cost was computed. The maximum length of the listMaxChildr is en (e.g., 3 blue levels). The list is emptied if a lower minimum cost is computeMaxDepth d, or if has been achieved. In the later case, the path with the minimum cost (green) is kept, and all other nodes are removed. For example, consider � = {(� , � ), (� , � ), (� , � )} a circuit of three CNOTs and a mapping � = [� , � ]. �� 1 2 2 3 3 4 1 2 This means that� is allocated to register � , and � to � . After the irst qubit was mapped, we hav|�e | = 1. 1 1 2 2 After all qubits were mapped, we hav |�e| = � . The maximum depth of the tree is � . In the worst case, each node has � children. The tree is augmented one step at a time, by adding a new circuit�qubit to the mapping (in the order of their index, in the current version - we do not analyze the inluence of this choice). This increases the tree’s depth: at each existing leaf node, all possible � mappings of � are considered. Consequently, all the new leaves of a tree are the result of appending � to the previous’ level leaves, which now are usual nodes. New depth estimation values are computed using�the ����ℎ function, each time leaves are added to the tree. Each tree node has an associated�����ℎ cost. The level in the tree equals the length of the mapping for which the cost was computed. The search is stopped after computing a complete mapping with the minimum �����ℎ cost. Thus, the maximum number of leaves per node is added in the unlikely case that all values �����of ℎ are equal. The search space will easily explode for large circuits. We introduce two parameters to prune the search space. In Fig. 3, the result of pruning the search space tree is represented by the green path and the green bounding box. The irst parameter is ����ℎ������whose job is to limit the number of children of equal minimum �����ℎ values. For an arbitrary value �of ���ℎ������< � , the tree will include at le �at vel most �· ����ℎ������nodes. The second parameter is the cut-of threshold �������ℎ which speciies that, at levels indexed by multiples of�������ℎ, all the nodes are removed from the tree, except for the ancestors of the minimum cost leaf. This is because for large circuits (e.g. more than 50 qubits) it is not practical to evaluate all the new combinations � of registers for �· ����ℎ������leaves. 2.3 Configuration of Circuit Depth Estimation The �����ℎ function is used to estimate the depth of the compiled circuit without laying it out. The value of�����ℎ can be calculated once at least two qubits were mapped. Equivalently, the value of �����ℎ is computed only for CNOTs whose qubits were mapped. The �����ℎ function is a sum of a Gaussian functions whose goal is to model the importance of CNOT sub-circuits � . fr Inom other words, �����ℎ is the �� sum of the costs estimated for scheduling the CNOTs of a circuit: ACM Trans. Quantum Comput. 6 • Paler et al. � 2 �����ℎ = ����exp −� · − � (2) �=0 where � ≤ |� | In the above formula,� is the number of CNOTs from � whose qubits were already mapped. The � and � � �� parameters control the spread and position of the Gaussian function. The value �is the of index of the CNOT from the resulting circuit�and ���is the cost of moving the qubits of the CNOT. For example, for the circuit � = {(� , � ), (� , � ), (� , � )} �� 1 2 2 3 3 4 and the mapping� = [� , � ], � = 1 the �����ℎ can be computed for only the irst CNOT, because � and � 1 2 � 3 4 were not mapped. . The scheduled CNOTs are indexed, and we assume that all gates are sequential (parallel gates are executed sequentially, considering the circuit truly a gate list). For example, after scheduling two CNOTs of an hypothetical circuit, the two CNOTs are numbered 1 and 2. In the exponent �of ����ℎ the value of is always in the range [0, 1]. For ����we use the shortest distance on an undirected graph. An example is Fig. 7, wher ����e= 5. For two � � qubits�, �, where (� [�], � [�]) is already an edge of DEVICE, the ����= 1. Otherwise, the distance between two qubits is computed based on the edge weights attached to DEVICE the graph. If the intention is to allow CNOTs at the middle of the circuit to have longer movements on the device, we can set parameters to generate a function like in the top right panel of Fig. 5. For example � = 5,and for� = 0.5, the ����from�����ℎ are weighted with almost zero at the start of the circuit (Fig. 4). The opposite situation is illustrated in the middle panel. � =For 0, the Gaussian is efectively a constant function, such that the cost is the sum of all the CNOT distances. Fig. 4. A quantum circuit and a Gaussian function. a) A four qubit quantum circuit consisting of a sequence of CNOTs; b) The ����uses a Gaussian function. The later is drawn superimposed in order to highlight the weight generated by the Gaussian at each CNOT position in the circuit. The weights are minimal for the first and last CNOT. The maximum value is for the CNOT at the middle of the circuit. 2.4 Configuring QXX for the Scheduler QXX is efectively estimating the output of the gate scheduler,sele which cts the best edge ofDEVICE where to execute the CNOTs of a circuit. The scheduler is modeled as a black box and its functionality is unknown. QXX can work with Qiskit’s StochasticSwap, tket or other tools such as the ones ones from [18, 39]. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 7 Fig. 5. The value of the Gaussian use ��d�by ��ℎ for diferent values of parameters � and � . The horizontal axis illustrates the input value to the Gaussian: the gates’ integer index in the circuit is scaled with the total number of gates in the circuit. The importance (weight) of a gate is highest whenever the value of the Gaussian approaches the maximum value. A gate has low importance when the Gaussian has low values. Starting from the initial placement, QXX estimates the total depth of the circuit after repeatedly mapping the circuit. Without loss of generality, QXX assumes that the scheduler will move both qubits of a CNOT across the DEVICE towards the selected edge. Instead of updating the mapping, the movements of the qubits accumulate are d into anofset variable. Qubit movements is captured by the ��������������parameter. The ��������������is asymmetric, and when processing CNOT qubits, it moves on the DEVICE the qubit with the lowest index by the fraction 1 �������������� −1 , and the qubit with the highest index by . �������������� �������������� As we will show in the Results section, for deep circuits we determine ��������������values closer to 2 (favors high and low index qubits equally), and for shallow circuits the values are higher (favors low indexed qubits). After moving the qubits on the device (Fig. 6), the ofset of a qubit is an estimation of how much the qubit was moved by the scheduler. For example, the ofset of an arbitrary qubit used in 3 CNOTs is the sum of the three movement updates which are obtained after scaling each CNO ��T’s ��with the corresponding �������������� expression. Fig. 6. Device connectivity graph is blue, the weight of all but one edge is 1. For example, the high cost might be due to higher physical error-rates. The movement of qubits on the device is controlled by an additional parameter ��,�calle ����d�: higher values imply a larger estimation of the movement heuristic. Changing the��value �����of �is equivalent to scaling the total value of �����ℎ, because ��������is a common factor in the calculation ���of �. For the weight scale factor��������= 1, as shown in Fig. 7, the minimum distance between the CNOT qubits corresponds to the sum of the edge weights separating them (ive in Fig. 7). ACM Trans. Quantum Comput. 8 • Paler et al. Fig. 7. The efect of the MovementFactor: Device connectivity graph is blue, and the two brown qubits A and B have to be interacted. If not specified, all edges have weight 1. The shortest path between A and B has cost L=5, because there are five edges between A and B. There are multiple options how A and B can be brought together. One of the options is to assume that A and B move to the pink C and D. Assuming that CD is at the middle of the path connecting A and B, then MovementFactor=2 because both A and B are moved on average L/2. A higher movement factor implies that one of the qubits moves less, while the other more. QXX Parameters WRS Obtained Exhaustive Search Name Min Max Increment WRS-Weight Prob. of Change Start Stop Increment MaxDepth 1 55 1 9.35 0.62 1 9 4 MaxChildren 1 55 1 8.00 0.53 1 9 4 B 0 500 0.1 7.76 0.52 0 20 2 C 0 1 0.01 15.06 1.00 0 1 0.25 MovementFactor 1 55 1 3.52 0.23 2 10 4 EdgeCost 0.1 1 0.1 10.59 0.70 0.2 1 0.4 Table 1. Parameter values. The QXX Parameters columns represent the min. and max. value together with the increments be to used with the heuristic.WRS TheObtained columns illustrate the results of the weighted random search method for optimal parameter values. Ranges and increments of QXX’s input parameters. The training of QXX-MLP as well as WRS were performed for a smaller search space whose parameter ranges are presented in the Exhaustive Search columns. 2.5 Weighted Random Search for Parameter Configuration The QXX parameters from Sections 2.2, 2.3, 2.4 have to be tuned for optimal performance of the compilation. We calltrial the evaluation for one set of parameter values. In previous w13 ork ], w [ e introduced the WRS method, a combination of Random Search (RS) and probabilistic greedy heuristic. Instead of a blind RS search, the WRS method uses information from previous trials to guide the search process toward next interesting trials. We use WRS to optimize the following QXX parameters (introduced in Subsection �����2.1): ��ℎ, ����ℎ������, �, � , �������������,�and ��������(see Table 1). Within the same number of trials, diferent optimization methods achieve diferent scores, depending on how łsmartž they are. Due to the nature of QXX, the trial execution times are variable, since it depends on the deined quantum architecture, the topology of the circuit to lay out, as well as the values of the parameters (cf. Fig. 3). Search space reduction and search strategy are inter-connected. For the exhaustive search parameter ranges from Table 1, we limit the time spent evaluating a parameter coniguration by intrtime oducing out parameter a . WRS uses the obtained data to run an instance of fANOVA [15]. 2.6 Learning QXX - Training QXX-MLP The previous section was about searching parameter values. Herein, we go one step further and learn the behavior of the gate scheduler in relation to the mapper parameters and�the ����ℎ function used by QXX. We describe https://github.com/aclorea/goptim ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 9 the method to learn speciic tuples consisting of circuit and QXX parameters, and how to estimate the depth of the mapped circuits. Parameter optimization, and for that reason, eicient initial mapping of the circuit to the device is a regression problem. The training data for learning was obtained as follows: for the parameters from Table 1, we choose smaller ranges and larger increments as illustrated in Exhaustiv the e Search columns. There are three possible values for �������ℎ and three values for����ℎ������. For a given value of �������ℎ, there are 3 × 11 × 5 × 3 × 3 = 1485 possible parameter conigurations. There is a total×of 1485 90 = 133650 layouts for each�������ℎ. Overall, there are 400950 parameter conigurations of the 90 circuits that are being evaluated. Table 1 illustrates the parameter ranges for which we collected data that allows us to compute the ratio �����(� ,� ) for every combination of circuit layout and QXX parameters. The generated exhaustive search �� ��� data is available in the project’s online repository. We considered three candidate models to learn QXX: k Nearest Neighbors (KNN), Random Forest (RF) and MultiLayer Perceptron (MLP). Each model has diferent inductive21bias ], being [ respectively: a localśbased predictor, an ensemble model built by bootstrapping, and a connectionist model, respectively. Despite of their conceptual simplicity, KNN predictors are easily interpretable and passed the test of time [38]. RFs were found as the best models for classiication problems 12], and [ we wanted to investigate their performance on this regression problem as well. Finally, MLPs creates new features through nonlinear input feature transformations, unlike KNN and RF which use raw input attributes. Nonlinear transformations and non-local character of MLPs are considered the premises for the successful deep learning movement [5]. We found MLP as the best model and use it during the parameter optimization stage (as illustrated in Fig. 1) as an approximator for the functionality of QXX. More details are in the Appendix. 3 RESULTS We present empirically obtained results about: 1) the performance of QXX; 2) the quality of the QXX-MLP model; 3) the performance of optimising QXX parameters with WRS. We select QXX parameter values using: a) exhaustive search; b) WRS with the QXX method (orange in Fig. 1) , and c) WRS on the QXX-MLP (green in Fig. 1). The QXX method was implemented and is available online . The current implementation is agnostic of the underlying quantum circuit design framework (e.g. Cirq or Qiskit). QXX is implemented in Python. The exhaustive search was executed on an i7 7700K machine with 32GB of RAM. The QXX model was trained on Intel Xeon W-2145 3.70GHz with 16 cores and 256 GB RAM. The WRS parameter optimization was performed on a laptop grade i5 processor with 16 GB RAM. For the benchmarks and comparisons we used Cirq 0.9, IBM Qiskit 0.25 and tket 0.2. In the following, we describe how the WRS heuristic is used to evaluate the QXX method and its MLP implementation. Afterwards, we present a series of plots that support empirically the performance of QXX. We analyze the inluence of the parameters, and ofer strong evidence in favor of learning quantum circuit layout methods. In particular, we will show that�the ����ℎ Gaussian can shorten the time of necessary to run QXX. This is achieved by pruning the search space and focusing on the most important region of the circuit. We will present examples for how the Gaussian is automatically adapted for deep and shallow circuits. 3.1 Benchmark Circuits We evaluate the�����itness of the QXX method using the QUEKO benchmark suite 35].[These circuits abstract Tofoli based and quantum supremacy like circuits, as well as a variety of NISQ chip layouts. Such benchmarks complement the libraries of reversible adders and quantum algorithms [17]. https://github.com/alexandrupaler/qxx ACM Trans. Quantum Comput. 10 • Paler et al. Fig. 8. Comparison with state of the art methods using QUEKO TFL benchmark circuits. All curves except QXX35 ar].e from [ The curves are computed ater averaging the depth ratios for 10 circuits for each known optimal depth from the benchmark. Horizontal axis is the known optimal depth of the TFL circuits. The vertical axis illustrates the achiev�e�� d �depth �. We have used QXX together with the Qiskit scheduler, and we assume that this is the reason why the QXX curve is close to the Qiskit curve. For shallow circuits the mapper is more important than the scheduler (QXX is beter than Qiskit), and for deeper circuits the importance of the mapper vanishes (QXX and Qiskit perform close to each other). QXX is not a scheduler, but a mapper (cf. Section 2.4). Automatic compilation/mapping of circuits, although applicable to NISQ applications, e.g. Fig 9, is of little practical importance by itself when one wishes fault-tolerance to be taken into account. NISQ circuits, such as for supremacy or for VQE/QAOA, are co-designed and a method like ours is just one piece in a much larger worklow which. We focused on Tofoli+H circuits, QUEKO TFL, because such circuits are not co-design and are also very representative for large scale error-corrected computations. The 90 QUEKO TFL circuits include circuits with known optimal [depths 5, 10, 15of , 20, 25, 30, 35, 40, 45] (meaning that the input circuit and the layout circuit have equal |� depths | = |� |). For each depth value ��� �� there are 10 circuits with 16 qubits. The NISQ machine to map the circuits to is Rigetti Aspen. A perfect QCL method will achie �v�� e ��= 1 on the QUEKO benchmarks. In our experimental setup, the layout procedure uses: 1) QXX for the initial placement (irst QCL step) and 2) the Qiskit StochasticSwap gate scheduling (second QCL step). The results depend on both the initial placement as well as the performance of the StochasticSwap scheduler. We do not conigure the latter and use the same randomization seed. We assume that this is the reason why the QXX performance is close to the Qiskit one. 3.2 Resulting Depths and Scalability Our goal is to show that the behavior of the mapper/compiler can be learned, and that compilation can be sped up using machine learning. QXX achie ��� ves ��values around 30% lower (which means better ś best�����is 1) than Qiskit on the low depth (up to 15 gates) QUEKO TFL circuits. In general, as shown in Fig. 8, the performance of QXX is between Qiskit and tket 33].[tket outperforms Qiskit, and QXX too, on the QUEKO benchmarks, because it has a much smarter scheduler. QXX is not a scheduler, but a mapper (cf. Section 2.4). The results from Fig. 8 are encouraging, because: a) QXX performs better than most compilers; b) there is known variability in the compiler’s performance with respect to benchmark circuits, such that for other circuits ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 11 Search Space Params. Timeout MaxDepth MaxChildren 0.05s 0.5s 5s 20s 1 1 25 0 0 0 1 5 43 0 0 0 1 9 36 3 0 0 5 1 25 0 0 0 5 5 23899 0 0 0 5 9 41014 2457 0 0 9 1 46 0 0 0 9 5 44550 37365 2501 172 9 9 44550 44494 36288 28798 Table 2. The number of WRS timeouts (0.05s, 0.5s, 5s, 20s) is a measure of QCL execution time. We count the number of timeouts when using diferent search space pruning strategies configur ����e�d�by �ℎ and ����ℎ������. the classiication might look completely diferent; c) our results were obtained very fast when using the MLP approach ś we get almost instantaneously the layed out circuit (Section 3.4). −2 For NISQ gate error rates (realistically) upper bounded by ,10 only circuits with a maximum depth of 30 are of practical importance. For shallow circuits the mapper is more important than the scheduler (QXX is better than Qiskit), and for deeper circuits the importance of the mapper vanishes (QXX and Qiskit perform close to each other). Fig. 10 presents results with parameters chosen speciically for shallow circuits. With respect to the scalability of our methods, one interesting aspect are the timeouts. For extreme parameter values (e.g. when����ℎ������and �������ℎ are 9) the execution time of QXX is high although the method has polynomial complexity. We introduced a timeout of 20 seconds for the QXX executions and collected data accordingly. Table 2 illustrates the increasing execution times, it ofers the motivation to learn the method ś the model will have constant execution time irrespective of the parameter value coniguration. Additionally, we notice that computing a good initial placement takes, in the best case, a small fraction of total time spent laying out. Without considering the optimality of the generated circuits, in the worst case, computing the initial placement using QXX can take between 2% and 99% of the total layout time. For more details see Table 3 and Section 3.4. For �������ℎ = 1 the maximum time fraction is 10% ��, for �����ℎ = 5 the maximum is 85%, while for �������ℎ = 9 it is 99%. These values are also in accordance with the execution times presented in Table 2. However, when considering the 100 fastest QXX execution times, for each ��of ���the ��ℎ values, the maximum mapping duration is 4% of the total layout duration. 3.3 QXX parameter optimization using WRS Initially we ran WRS for a total of 1500 trials within the parameters space deined in�T�able ����1.�ℎFor , ����ℎ������and ��������������we use the same limits and steps deined in the table�.,For � , and �������� we generated the values by drawing from a uniform distribution in the speciied range. We ran the classical RS step for 550 trials and computed the weight (importance) of each of the parameters using fANOVA obtaining the values from Table 1. The weight of a parameter measures its importance for the optimization of the itness function. We use the ���� (�����) = ���� (|� |/|� |) itness measure. WRS ran with eight workers for a total time of ��� �� four hours and 11 minutes and the best result (3.99) was obtained at iteration 1391 and again, at a later trial, for a diferent value of �������ℎ. Table 3 shows the combination of parameter values that yield the best results. The subsequent WRS executions use the exhaustive search parameter space deined in Table 1. In this parameter space we use WRS to optimize several conigurations for which we have changed either the evaluation timeout ACM Trans. Quantum Comput. 12 • Paler et al. TFL-45 TFL-25 MLP Name 20s 5s 0.5s 0.05s 20s 5s 0.5s 0.05s MaxDepth 9 9 9 6 8 9 9 3 9 MaxChild 4 3 2 2 4 3 2 2 9 B 5 17.3 8 3.5 6.9 15.7 6.10 2 1.5 C 0.61 0.25 0.02 0.31 0.86 0.91 0.65 0.74 0.32 Mov.Factor 2 4 2 6 6 10 6 7 10 EdgeCost 0.2 0.2 0.2 0.9 0.2 0.2 1 0.2 0.8 Duration(s)35170 28800 21785 11438 10583 8110 6497 4102 ~2 Avg. Ratio 4.093 4.138 4.328 4.465 4.043 3.966 4.279 4.537 4.423 Table 3. Parameter values obtained using WRS on the batch of 90 QUEKO circuits. The TFL-45 and TFL-25 are for QXX and WRS on the the TFL circuits of depths 25 and 45. The MLP column is for when using QXX-MLP with WRS. The last two rows represent the durations and the recorded average Ratios. The MLP approach is very fast and takes approx. 2 seconds. with values from Table 2 or the maximum TFL Depth (either 25 of 45). Interestingly, the 5 seconds timeout, achieves a better performance than the WRS parameter optimization with 20 seconds timeout (cf. large number of timeouts in Table 2). In general, WRS selects high �������ℎ values. This is explainable by the fact that optimal �����values are easier to be found using large search spaces. 3.4 QXX-MLP using WRS: Fast and scalable QCL One of the research questions was if it is feasible to learn the QCL methods in general, and QXX in particular. Moreover, we answer the question łWould it be possible to choose optimal values by a rule of thumb instead of searching for them?ž. For example, is it possible to obtain ��� go�o�in d a timely manner by using�lo ��w����ℎ values? During our experiments, the found parameters did not difer signiicantly between QXX and QXX-MLP. The results from Fig. 10 and Table 3 show that the MLP model of QXX performs well ś within 10% performance decrease compared to the normal QXX. We used a combination of WRS and QXX-MLP in an attempt to minimize the time required to identify an optimal coniguration. QXX-MLP and WRS are almost instantaneous: under 2 seconds for the entire batch of 90 circuits. In contrast, the execution time of WRS using the normal QXX was about 3 hours to ind the optimal parameters for the 90 circuits. A detailed analysis of the speed/optimality tradeofs available in the Appendix (e.g. Fig. 17). We also tested thetransfer of learning: the possibility of training QXX on a set of circuits, and then applying to a diferent type of circuits. Figure 9 illustrates the results of applying QXX-MLP on quantum supremacy circuits (QSE)[34]. The performance of QXX-MLP is more than encouraging: it had the performance of a timedout WRS op- timization. The WRS parameter optimization did not timeout, because MLP inference is a very fast constant time operation. Table 3 shows that QXX-MLP performs similarly to the WRS with a 0.05s timeout. For example, when applied to QXX-MLP, the values chosen by WRS for ��������and ��������������are consistent with the exhaustive search results (cf. Fig. 20 in Appendix): in general, �����an ���= 0.2 is preferable for all the TFL-depth values, and for��������������> 2 is preferable. The preference for large movement factors is obvious for the shallow TFL circuits. Regarding QCL speed: The WRS and QXX-MLP optimization takes a few seconds compared to the hours (cf. Table 3 for laying out all the circuits from Section 3.1) necessary for WRS and QXX. This is a great advantage that comes on the cost of obtaining a trained MLP, which is roughly the same order of magnitude to a WRS parameter ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 13 Fig. 9. Transfer of learning: Laying out QUEKO QSE (supremacy experiment) circuits using parameters learned from QUEKO TFL circuits. Each parameter evaluation executed by WRS was timed out ater 5, 50, 5001and /1002000 �ℎ seconds (10 milliseconds). optimization. MLP training is performed only once. In the case where the MLP model is not used WRS parameter optimization has to be repeated. In a setting where quantum circuits are permanently layed out and executed (like in quantum computing clouds) incremental online learning is a feasible option. The parameter optimization of QXX-MLP seems to be very predictable, because the corresponding curves in Fig. 10 are almost lat. Table 3 provides evidence that the MLP is very conservative with the choice of the values of� and � : The Gauss curve is almost lat, and positioned slightly towards the beginning of the circuit. The latness of the curve and its position could explain the almost constant performance from Fig. 10 and the 10% average performance degradation of QXX-MLP. 3.5 Automatic Subcircuit Selection Our results support the thesis that WRS can adapt the parameters of the QXX Gauss bell curve in order to select the region of the circuit that inluences the total cost of laying it out. The parameter controlling the center of the bell curve is relevant with respect to resulting layout optimality, as well the speed of the layout method (cf. Figs. 19 in Appendix for more details). Fig. 11 is a comparison of the Gauss curves obtained for the diferent TFL circuit depths. It is surprising that values � of > 0 are found, given that these assume an upfront cost of re-routing some early gates. For shallow circuits, WRS prefers � values close to the end of the benchmark circuits, meaning that the last gates are more important than the others. This is in accordance with Fig. 12 where the green curve (�������ℎ = 9) is over the orange curve�(������ℎ = 1) for large values of � > 0.75 in the range of TFL depths from 5 to 30. For deeper circuits, WRS sets the center � of the Gaussian to be closer to the beginning of the circuit. This is in accordance with Fig. 12 where the vertical distance between the green and orange curves is maximum for � < 0.25. Fig. 12 answers the question: Considering the diferent values ����of ���ℎ, where should the Gauss bell be placed relative to the start of the compiled circuit? Intuitively, this means to answer the question: Are the irst gates more important than the last ones, or vice versa? Increasing values � inluence of the performance of QXX ACM Trans. Quantum Comput. 14 • Paler et al. Fig. 10. QXX-MLP achieves approximately 90% of the QXX performance when laying out TFL circuits with depths up to 25 ś the most compatible with current NISQ devices. WRS was applied on: 1) the normal QXX and timing out too long evaluations; 2) the QXX-MLP model. Fig. 11. Two Gaussian curves obtained using WRS: let) on TFL-45 circuits with timeout 5 seconds; right) on TFL-25 circuits with timeout 5 seconds. (cf. Table 3) with decreasing �������ℎ ś the orange ( �������ℎ = 1) and green (�������ℎ = 9) curves swap positions along the vertical axis with the intersection between them being around TFl depth 30. 4 CONCLUSIONS Scalable, conigurable and fast QCL methods are an imperative necessity. In the context of quantum computing clouds, continuous learning is a real possibility, because a large batch of circuits is permanently sent and executed on mainframe like machines. It is feasible to consider machine learning QCL methods for fast and accurate QCL. We introduced QXX, a novel and parameterized QCL method. The QXX method uses a Gaussian function whose parameters determine the circuit region that inluences most of the layout cost. The optimality of QXX is evaluated on the QUEKO benchmark circuits using�the ����function which expresses the factor by which the ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 15 Fig. 12. The�����(the number of times when a parameter value was counted in the best 100 parameter combinations obtained for QXX) values to assess the influence of the Gauss � parameter center on the�����. The Gauss curve is automatically adapted to circuit depth. The horizontal axis in the plots represent the diferent types of QUEKO TFL circuits with known depths. The vertical axis is the ����� value for the corresponding circuit types. There are two curves, the brown one for �������ℎ = 1 and the green one for�������ℎ = 9. For circuits with depths up 30,to the green�������ℎ = 9 performs beter than brown�������ℎ = 1 (lower ����� value) when the Gauss curve is positioned to the� right = 1.0 (represents the end of the circuit) ś the last gates are more important. For circuits of depths larger 30, the than opposite is true ś the first gates are more important. The function �����(� = �,|� |, �������ℎ) is the number of times when a parameter � � �� bounded to � was counted in best 100 parameter combinations obtained for QXX executed for a particular ��value �����ℎof and for QUEKO TFL circuits � of depth|� |. More details in the Appendix. � �� � �� number of gates in the layed out circuit has increased. We illustrate the utility of QXX and its employed Gaussian. We show that the best results are achieved when the bell curve is non-trivially conigured. QXX parameters are optimized using weighted random search (WRS). To increase the speed of the parameter search we train an MLP that learns QXX, and apply WRS on the resulting QXX-MLP. To crosscheck the quality of the WRS optimization and of the MLP model. This work brought empirical evidence that: 1) the performance of QXX (resulting ��� depth ��and speed) is on par with state of the art QCL methods; 2) it is possible to learn the QXX method parameters values and the performance degradation is an acceptable trade-of with respect to achieved speed-up compared to WRS (which per se is orders of magnitude faster than exhaustive search); 3) WRS is inding parameters values which are in accordance with the very expensive exhaustive search. We conjecture that, in general, new cost models are necessary to improve the performance of QCL methods. Using the Gaussian function, we conirmed the observation that the cost of compiling deep circuits is determined only by some of the gates (either at the start or the end of the circuit). From this perspective the Gaussian function worked as a simplistic feature extraction. Future work will focus on more complex techniques to extract features to drive the QCL method. ACM Trans. Quantum Comput. 16 • Paler et al. ACKNOWLEDGMENTS AP was supported by Google Faculty Research Awards and the project NUQAT funded by Transilvania University of Braşov. We are grateful to Bochen Tan for his feedback on a irst version of this manuscript, explaining the QUEKO benchmarks and ofering the scripts to generate and plot the presented results. REFERENCES [1] Razvan Andonie. 2019. Hyperparameter optimization in learning systems. J. Membr. Comput. 1, 4 (2019), 279ś291. https://doi.org/10. 1007/s41965-019-00023-0 [2] Razvan Andonie and Adrian-Catalin Florea. 2020. Weighted Random Search for CNN Hyperparameter Optimization. Int. J. Comput. Commun. Control 15, 2 (2020). https://doi.org/10.15837/ijccc.2020.2.3868 [3] Pablo Andrés-Martínez and Chris Heunen. 2019. Automated distribution of quantum circuits via hypergraph partitioning. Physical Review A 100, 3 (2019), 032308. [4] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A Smolin, and Harald Weinfurter. 1995. Elementary gates for quantum computation. Physical review A 52, 5 (1995), 3457. [5] Yoshua Bengio. 2009. Learning Deep Architectures forFound. AI. Trends Mach. Learn. 2, 1 (Jan. 2009), 1ś127. https://doi.org/10.1561/ [6] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for Hyper-Parameter Optimization.. NIPS, John In Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger (Eds.). 2546ś2554. http://dblp.uni- trier.de/db/conf/nips/nips2011.html [7] Adi Botea, Akihiro Kishimoto, and Radu Marinescu. 2018. On the complexity of quantum circuit compilation. Eleventh Annual In Symposium on Combinatorial Search. [8] Rui Chao and Ben W Reichardt. 2018. Quantum error correction with only two extra qubits. Physical review letters 121, 5 (2018), 050502. [9] Andrew M Childs, Eddie Schoute, and Cem M Unsal. 2019. Circuit Transformations for Quantum Archite14th cturConfer es. In ence on the Theory of Quantum Computation, Communication and Cryptography. [10] Ross Duncan, Aleks Kissinger, Simon Perdrix, and John Van De Wetering. 2020. Graph-theoretic Simpliication of Quantum Circuits with the ZX-calculus. Quantum 4 (2020), 279. [11] Suguru Endo, Simon C Benjamin, and Ying Li. 2018. Practical quantum error mitigation for near-future applications. Physical Review X 8, 3 (2018), 031027. [12] Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do We Need Hundreds of Classiiers to Solve Real World Classiication Problems? J. Mach. Learn. Res. 15, 1 (Jan. 2014), 3133ś3181. [13] Adrian-Catalin Florea and Razvan Andonie. 2019. Weighted Random Search for Hyperparameter Optimization. Int. J. Comput. Commun. Control 14, 2 (2019), 154ś169. https://doi.org/10.15837/ijccc.2019.2.3514 [14] AG Fowler, SJ Devitt, and LCL Hollenberg. 2004. Implementation of Shor’s Algorithm on a Linear Nearest Neighbour Qubit Array. Quantum Inf. Comput.4, quant-ph/0402196 (2004), 237ś251. [15] Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2014. An Eicient Approach for Assessing Hyperparameter Importance. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32(Beijing, China) (ICML’14). JMLR.org, Iś754śIś762. [16] Wim Lavrijsen, Ana Tudor, Juliane Müller, Costin Iancu, and Wibe de Jong. 2020. Classical Optimizers for Noisy Intermediate-Scale Quantum Devices.arXiv preprint arXiv:2004.03004 (2020). [17] Ang Li and Sriram Krishnamoorthy. 2020. QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation. arXiv preprint arXiv:2005.13018 (2020). [18] Gushu Li, Yufei Ding, and Yuan Xie. 2019. Tackling the qubit mapping problem for NISQ-era quantumPrde oce vices. edingsInof the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems . 1001ś1014. [19] Dmitri Maslov, Gerhard W Dueck, D Michael Miller, and Camille Negrevergne. 2008. Quantum circuit simpliication and level compaction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems27, 3 (2008), 436ś444. [20] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven. 2018. Barren plateaus in quantum neural network training landscapes. Nature communications 9, 1 (2018), 1ś6. [21] Tom M. Mitchell. 1980. The Need for Biases in Learning Generalizations. Technical Report. Rutgers University, New Brunswick, NJ. http://dml.cs.byu.edu/~cgc/docs/mldm_tools/Reading/Need%20for%20Bias.pdf [22] Prakash Murali, David C McKay, Margaret Martonosi, and Ali Javadi-Abhari. 2020. Software mitigation of crosstalk on noisy intermediate- scale quantum computers. InProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 1001ś1016. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 17 [23] Beatrice Nash, Vlad Gheorghiu, and Michele Mosca. 2020. Quantum circuit optimizations for NISQ arQuantum chitecturScience es. and Technology 5, 2 (2020), 025010. [24] Shin Nishio, Yulu Pan, Takahiko Satoh, Hideharu Amano, and Rodney Van Meter. 2019. Extracting success from ibm’s 20-qubit machines using error-aware compilation. arXiv preprint arXiv:1903.10963 (2019). [25] Alexandru Paler. 2019. On the inluence of initial qubit placement during NISQ circuit compilation. International WIn orkshop on Quantum Technology and Optimization Problems. Springer, 207ś217. [26] Sam Pallister. 2020. A Jordan-Wigner gadget that reduces T count by more than 6x for quantum chemistry applications. arXiv preprint arXiv:2004.05117 (2020). [27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825ś2830. [28] Matteo G Pozzi, Steven J Herbert, Akash Sengupta, and Robert D Mullins. 2020. Using reinforcement learning to perform qubit routing in quantum compilers. arXiv preprint arXiv:2007.15957 (2020). [29] John Preskill. 2018. Quantum Computing in the NISQ era and beyQuantum ond. 2 (2018), 79. [30] Mehdi Saeedi and Igor L Markov. 2013. Synthesis and optimization of reversible circuitsÐa ACM sur Computing vey. Surveys (CSUR) 45, 2 (2013), 1ś34. [31] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, and Fernando Magno Quintão Pereira. 2019. Qubit allocation as a combination of subgraph isomorphism and token swapping. Proceedings of the ACM on Programming Languages3, OOPSLA (2019), 1ś29. [32] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando Magno Quintão Pereira. 2018. Qubit allocation. In Proceedings of the 2018 International Symposium on Code Generation and Optimization. 113ś125. [33] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. 2020. t| ket>: A retargetable compiler for NISQ devices.Quantum Science and Technology (2020). [34] Bochen Tan and Jason Cong. 2020. Optimal layout synthesis for quantum computing. 2020InIEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 1ś9. [35] B. Tan and J. Cong. 2020. Optimality Study of Existing Quantum Computing Layout Synthesis IEEE ToT ols. rans. Comput. early access (2020), 1ś1. [36] Swamit S Tannu and Moinuddin K Qureshi. 2019. Not all qubits are created equal: a case for variability-aware policies for NISQ-era quantum computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 987ś999. [37] Ellis Wilson, Sudhakar Singh, and Frank Mueller. 2020. Just-in-time Quantum Circuit Transpilation arXiv Reduces preprint Noise. arXiv:2005.12820 (2020). [38] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geofrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. 2007. Top 10 Algorithms in Kno Datawl.Mining. Inf. Syst. 14, 1 (Dec. 2007), 1ś37. https://doi.org/10.1007/s10115-007-0114-2 [39] Chi Zhang, Ari B Hayes, Longfei Qiu, Yuwei Jin, Yanhao Chen, and Eddy Z Zhang. 2021. Time-optimal QubitPrmapping. oceedings of In the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems . 360ś374. [40] Yuan-Hang Zhang, Pei-Lin Zheng, Yi Zhang, and Dong-Ling Deng. 2020. Topological Quantum Compiling with Reinforcement Learning. Physical Review Letters 125, 17 (2020), 170501. [41] Alwin Zulehner, Alexandru Paler, and Robert Wille. 2018. An eicient methodology for mapping quantum circuits to the IBM QX architectures.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems38, 7 (2018), 1226ś1236. A BACKGROUND From an abstract point of view, register connectivity is encoded as a graph. In the simplest form possible, the graph edges are not weighted. The graph edges are unique tuples of circuit registers (� , � ) � � For example, in Fig. 13 the register connectivity of the�cir is cuit the red graph. The unique tuples of device �� registers are the edges of the device graph. For example, in Fig. 13, the device register connectivity is the blue graph. Fig. 4a includes a quantum circuit example: the qubits are represented by horizontal wires, the two qubit gates are vertical lines, the control qubit is marked •with , and the a target with⊕. The red graph from Fig. 13 is obtained by replacing all wires from Fig. 4 with vertices and the CNOTs with edges. ACM Trans. Quantum Comput. 18 • Paler et al. The aim of parameter optimization is to ind the parameters of a given model that return the best performance of an objective function evaluated on a validation set. In simple terms, we want to ind the model parameters that yield the best score on the validation set metric. Fig. 13. The quantum circuit (red) has to be executed on the quantum device (blue). The circuit uses four qubits (vertices marked with C) and the hardware has four registers (blue vertices), too. The circuit assumes that operations can be performed between arbitrary pairs of registers (the edges connecting the registers). The device supports operations only between a reduced set of register pairs. In machine learning, we usually distinguish between the training parameters, which are adapted during the training phase, and the hyperparameters (or meta-parameters), which have to be speciied before the learning phase [1]. In our case, since we do not train (adjust) inner parameters on speciic training sets, we have only hyperparameters, which we will simply call parameters here . Parameter optimization may include a budgeting choice of how many CPU cycles are to be spent on parameter exploration, and how many CPU cycles are to be spent evaluating each parameter choice. Finding the łbestž parameter coniguration for a model is generally very time consuming. There are two inherent causes of this ineiciency: one is related to the search space, which can be a discrete domain. In its most general form, discrete optimization is NP-complete. The second cause is the evaluation of the objective function can be also expensive. We call this evaluation for one set of parameter values trial. a There are several recent attempts to optimize the parameters of quantum circuits. Machine learning optimizers tuned for usage on NISQ devices were recently reviewed by Lavrijsen et al [16]. Several state-of-the-art gradient- free optimizers were compared, capable of handling noisy, black-box, cost functions and stress-test them using a quantum circuit simulation environment with noise injection capabilities on individual gates. Their results indicate that speciically tuned optimizers are essential to obtaining valid results on quantum hardware. Parameter optimizers have a range of applications in quantum computing, including the Variational Quantum Eigensolver and Quantum Approximate Optimization algorithms. However, this approach has the same weaknesses like classical optimization ś global optima are exponentially diicult to achieve [20]. Currently, the most common parameter optimization approaches ar 1,e2,[ 6, 13]: Grid Search, Random Search, derivative-free optimization (Nelder-Mead, Simulated Annealing, Evolutionary Algorithms, Particle Swarm Optimization), and Bayesian optimization (Gaussian Processes, Random Forest Regressions, Tree Parzen Estimators, etc.). Many software libraries are dedicated to parameter optimization, or have parameter optimization capabilities: BayesianOptimization, Hyperopt-sklearn, Spearmint, Optunity 1, 13]. Cloud , etc [ based highly integrated parameter optimizers are ofered by companies like Google (Google Cloud AutoML), Microsoft (Azure ML), and Amazon (SageMaker). B TECHNICAL DETAILS B.1 Weighted Random Search for parameter optimization There are two computational complexity aspects which have to be addressed in order to ind good QXX parameters: a) Reduce the search space and implicitly the number of trials; b) Re and duce the execution time of each trial. In general, the performances of a parameter optimizer is determined by [1, 2]: ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 19 • F1. The execution time of each trial. • F2. The total number of trials ś search space size. • F3. The performance of the search. Search space reduction (F2) and search strategy (F3) are inter-connected and can be addressed in a sequence: F2 is a quantitative criterion (how many). For instance, we can irst reduce the number of parameters (F2), and create this way more lexibility in the following stage for F3. In this work, we do not reduce the number of parameters. F3 is a qualitative criterion (how łsmartž). For instance, (F3), we can irst rank and weight the parameters based on the functional analysis of the variance of the objective function, and then reduce the number of trials (F2) by giving more chances to the more promising trials. There is a trade-of between F2 and F3. To address these issues and reduce the search space, we use the following standard techniques: • Instance selection: reduce the dataset based on statistical sampling (relates to F1). • Feature selection (relates to F1). • Parameter selection: select the most important parameters for optimization (relates to F2 & F3). • Parameter ranking: detect which parameters are more important for the model optimization and weight them (relates to F3 & F2). • Use additional objective functions:number of operations, optimization time, etc. (relates to F3 & F2). On average, the WRS method WRS converges faster than RS [13]. WRS outperformed several state-of-the art optimization methods: RS, Nelder-Mead, Particle Swarm Optimization, Sobol Sequences, Bayesian Optimization, and Tree-structured Parzen Estimator [2]. In the RS approach, parameter optimization translates into the optimization of an objectiv � eof function � variables by generating random values for its parameters and evaluating the function for each of these 6]. values [ The function computes some quality measure or score of the model (e.g., accuracy), and the variables correspond to the parameters. The assumption is to maximize � by executing a given number of trials. Focusing on factor F3, the idea behind the WRS algorithm is that a subset of candidate values that already produced a good result has the potential, in combination with new values for the remaining dimensions, to lead to better values of the objective function. Instead of always generating new values (like in RS), the WRS algorithm uses for a certain number of parameters the so far best obtained values. The exact number of parameters that actually change at each iteration is controlled by the probabilities of change assigned to each parameter. WRS attempts to have a good coverage of the variation of the objective function and determines the parameter importance (the weight) by computing the variation of the objective function. B.2 Training QXX-MLP For the training and validation stages, we had 12 input features: circuit features (extracted using the Python 3 4 5 package networkx) ś ��� _����_���� , ��_����_���� , �����, �����, �� � ������, � �������ś merged with QXX’s parameters ś �������ℎ, ����ℎ������, �, � , �������������,���������. We have chosen PageRank, Smetric and the other metrics, in order to capture as much information about the circuits. The more features used for learning, the better the trained model is. The value to be predicted by the models was ����the �between the depth of the known optimal circuit and resulting circuit. The performance of KNN, RF and MLP was assessed through tenfold cross validation (CV) over the whole dataset. In tenfold CV the available dataset resulted in exhaustive search is split in 10 folds. Each fold is used in Ranking nodes based on the structure of the incoming links. The eiciency of a pair of nodes in a graph is the multiplicative inverse of the shortest path distance between the nodes. The sum of the node degree products for every graph edge. ACM Trans. Quantum Comput. 20 • Paler et al. Model Hyperparameter Values KNN Neighbors sought {2, . . . , 8} KNN Minkowski metric’s � {1, 2} MLP Hidden layer’s size {3, 10, 20, 50, 100} MLP Activation function ReLU, tanh RF Maximum depth of a tree {2, 3, 4, 5, ����} RF Number of trees {2, 5, 10, 20} Table 4. Hyperparameter names and candidate values turn as a validation subset, and the other nine folds are used for training. Finally, the ten performance scores obtained on the validation subsets are averaged and used as an estimation of the model’s performance. For each of the ten train/validation splits, the optimal values of their speciic hyperparameters were sought via grid search, using ivefold CV; the metric to be optimized was mean squared error. The models’ speciic hyperparameters and candidate values are given in Table 4. As both KNN and MLP are sensitive to the scales of the input data, we used a scaler to learn the ranges of input values from the train subsets; the learned ranges were subsequently used to scale the values on both train and validation subsets. We used the reference implementations from scikitślearn 27[], version 0.22.1. Excepting the hyperparameters in Table 4, the hyperparameters of all other models are kept to their defaults. The lowest average values for mean squared error were obtained by RF, closely followed by MLP and KNN. From RF and MLP we preferred the latter due to its higher inference speed and smaller memory footprint. The inal MLP model was prepared by doing a inal grid search for the optimal hyperparameters from Table 4, choosing the best model through ivefold CV. The resulted networks look as follows: the input layer has 12 nodes, fully connected with the (only) hidden layer which hosts 100 neurons; furthermore, this layer is fully connected with the output neuron. ReLU and identity were used as activation functions for the hidden and output layers, respectively. For the hidden layer, both the number of neurons and the activation function were optimized through grid search. C EVALUATION We use the exhaustive search raw data and introduce metrics to evaluate the parameter importanc ��e�of ��ℎ. The function has six parameters (see Section 2.1), and we analyzed individual their importance using WRS (see Subsection 2.5). For example, Table 1 lists the importances (weights) of the individual QXX parameters. These weights were computed under the strong (naïve) independence assumption between the parameters. Usually, parameters are statistically correlated, and we prefer a iner grained understanding of the QXX’s performance. To compare how parameter pair inluence the �����function, we introduce two metrics calle ����� d and ����. To compute these metrics we execute the exhaustive search for the three values��of �����ℎ and consider all the parameter conigurations from Table 1 ś a six-dimensional grid search. For a given value of �������ℎ and a parameter coniguration (all other ive parameters) we average the resulting depth of the compiled cir �cuit, , over the circuits existing in � ��the benchmark. From the total 1485 ��� averages, we sample the lowest 100 values, leading to an approximate 7% sampling rate ≈,0.067, from the total number of parameter conigurations. The function �����(� = �,|� |, �������ℎ) � �� ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 21 is the number of times when a parameter � bounded to � was counted in the best 100 parameter combinations obtained for QXX executed for a particular value ���of ����ℎ for QUEKO TFL circuits � of depth|� |. � �� � �� For example,�����(� = 0, 1, 30) is the number of parameter conigurations wher � = e 0 and QXX was used with a search tree of�������ℎ = 30 to lay out TFL circuits of depth 1. The ����� function can compare how, for diferent circuit depths, �������ℎ inluences the optimal values�.of The ���� function aggregates how diferent values�of are ranked against each other when considering diferent TFL circuit depths: for the same |� |, higher rank values are better. The ���� is used to suggest � �� parameter value ranges. ���� (�, |� |) = �����(�, |� |, �������ℎ ) (3) � �� � �� � �=0 �������ℎ ∈ {1, 5, 9} C.1 Optimum parameters vs. circuit depth To speed-up QXX, we are interested in inding parameter values that keep execution times as low as possible without massively impacting the obtaine �����dvalues. According to WRS, �������ℎ is one of the most important parameters, but it is not immediately obvious if it is possible to achie �����vvalues e optimal using low �������ℎ values. In the following, we form parameter pairs betw ��e�en ����ℎ ∈ {1, 5, 9} and the other ive QXX parameters. The best layouts are obtained for: a) large values��of �����������,�and, b) for shallow circuits,�the ������� has to be preferably low. These observations explainStochasticSwap the gate scheduling method (Fig. 2). The �������������value shows that the scheduler prefers to move a single qubit on the coupling graph. The way how ��������values are inluenced by the TFL depth indicates that there exists a relation between the number of gates in the circuit and the number of edges in the coupling graph. This relation could be modelled �� _����� through a density function like . To the best of our knowledge, the efect of this density function on the �� _����� coupling graph edge weights has not been investigated in the literature by now. Figs. 14 and 15 show how randomly chosen parameter conigurations inluence the depth �����of shallow circuits with depths up to 25. Fig. 17 (optimal parameter coniguration irrespective of the �value �����of �ℎ) is supported by the results from Figs. 18, 19(discussed in the next section ś the Gaussian inluences �������ℎ) and 20 ś the best layouts are obtained for: a) large values�of ������������,�and, b) for shallow circuits,�the �������has to be preferably low. We conjecture that the variability in Figs. 18, 19 and 20 is mostly due to the correlations that exist between the search space size and the timeouts. For example, it can be seen that, with a small exception for medium-depth circuits, the best performing value ���of �ℎ������is correlated with the one�of ������ℎ. The number of timeouts we obtained for high values ���of ����ℎ = 9 is an indication of this observation. As a conclusion, it does not seem to be necessary to increase the breadth of the search if the depth of the search tree is shallow. C.2 GDepth parameters Fig. 19 answers the question: What is the best performing value � considering of the depth ��(�����ℎ) of the QXX search space? How many gates of a circuitare important is answered by the value�of . Out of the 11 used values (cf. Table 1), the irst four.0(,00.2, 0.4, 0.6) are considered being ����, the last four .14, 1.6, 1.8, 2.0 are ������. The remaining three values ar � �e��. Due to these ranges, the values on the vertical axis are normalised to 1. The width of the bell curve from Fig. 5 is conigurable and indicative of which circuit gates are the most important wrt.�����optimality. ACM Trans. Quantum Comput. 22 • Paler et al. Fig. 14. Random parameter configurations and their influence on TFL circuit depth optimality. The axes have the same interpretation like in Fig. 8. Each line corresponds to a random parameter value configuration. Fig. 15. Random parameter configurations and their influence on QSE (supremacy experiment) circuit depth optimality. The axes have the same interpretation like in Fig. 8. Each line corresponds to a random parameter value configuration. The number of preferred gates seems to be a function of the used timeout. The more time spent searching for optimal parameters, the thinner the Gaussian bell. As observed in Table 3 and Table 2, the number of timeouts for �������ℎ = 9 is high, such that the � values for timeout at 20 seconds seem not to obey the scaling observed for the other timeouts. This conirms the results of the exhaustive search as presented in Fig. 19 in Appendix, where the curve for�������ℎ = 9 has a high variation along the vertical axis. This diagram in Fig. 16 is similar to the one from Fig. 10, but the WRS was executed on all benchmarks ś parameters were chosen to be compatible with a much larger range of circuits. The curves do not seem to converge as well as they did for the depth 25 circuits. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 23 Fig. 16. Precision: Laying out TFL circuits with depths up to 45. Fig. 17. It should be possible to find an optimal parameter configuration irrespective of the��value �����of ℎ: Ploting the best depth (let) and�����(right) obtained per TFL circuit depth and �������ℎ parameter. The red (MAX1, MAX5, MAX9) and green (MIN1, MIN5, MIN9) curves are the highest and lowest depths achieved for each �������ℎ value (1,5,9). ACM Trans. Quantum Comput. 24 • Paler et al. Fig. 18. Exhaustive Search: The ����� values to assess the influence of ��the ��ℎ������parameter on the�����. For this we plot the����� obtained for two values�of ������ℎ: blue for 1, and orange for5. Due to the high number of timeouts during the exhaustive search with �������ℎ = 9, the corresponding curve was not ploted. For ����ℎ������= 1, almost for all circuits (with the exception of depth 20, 25 and 30) the ��� best ���is obtained for�������ℎ = 1. As ����ℎ������is increased, the beter results are achieved by larger �������ℎ ś the orange curve is above the blue one. Fig. 19. The value of � can improve both the mapping (higher�����) as well as speed up the search due to lo�w�er �����ℎ: Normalised ����� values to assess the influence of � parameter the on the�����. For this we plot the ����� obtained for three values of �������ℎ: blue for 1, and orange for5, gray for9. Beter results are obtained with decreasing �������ℎ as the value of � is increasing. For example, in the let panel,����performs beter for�������ℎ = 5 for circuits with a depth up to 30. Moreover, � ��� achieves the best depth ratios for �������ℎ = 1. ACM Trans. Quantum Comput. Machine Learning Optimization of uantum Circuit Layouts • 25 Fig. 20. Exhaustive Search: let) ���� values for the ��������parameter ś up to circuits of depths 35 QXX achieves beter �����values for edge costs of 0.2, while for deeper circuits a higher value of the edge cost deliv�ers ���b�eter values; right) ���� values for the ��������������parameter ś higher parameter values perform significantly beter than lower values, meaning that, cf. Fig. 7, it is beter to move a single qubit instead of two across the graph. ACM Trans. Quantum Comput.

Journal

ACM Transactions on Quantum ComputingAssociation for Computing Machinery

Published: Feb 24, 2023

Keywords: Machine learning

References