Optimized Compiler for Distributed Quantum Computing

DANIELE CUOMO, Department of Physics, University of Naples Federico II
MARCELLO CALEFFI, DIETI, University of Naples Federico II
KEVIN KRSULICH, IBM Quantum, T.J. Watson Research Center
FILIPPO TRAMONTO, Kyndryl Italia Innovation Services
GABRIELE AGLIARDI, Department of Physics, Politecnico di Milano and IBM Italia
ENRICO PRATI, Department of Physics, Università degli Studi di Milano and IFN-CNR
ANGELA SARA CACCIAPUOTI, DIETI, University of Naples Federico II

Practical distributed quantum computing requires the development of efficient compilers, able to make quantum circuits compatible with some given hardware constraints. This problem is known to be tough, even for local computing. Here, we address it on distributed architectures. As generally assumed in this scenario, telegates represent the fundamental remote (inter-processor) operations. Each telegate consists of several tasks: i) entanglement generation and distribution, ii) local operations, and iii) classical communications. Entanglement generation and distribution is an expensive resource, as it is time-consuming. To mitigate its impact, we model an optimization problem that combines running-time minimization with the usage of distributed entangled states. Specifically, we formulate the distributed compilation problem as a dynamic network flow. To enhance the solution space, we extend the formulation by introducing a predicate that manipulates the input circuit and parallelizes telegate tasks. To evaluate our framework, we split the problem into three sub-problems, and solve it by means of an approximation routine. Experiments demonstrate that the run-time is resistant to the problem size scaling. Moreover, we apply the proposed algorithm to compile circuits under different topologies, showing that topologies with a higher ratio between edges and nodes give rise to shallower circuits.

CCS Concepts: Hardware → Quantum computation; Computer systems organization → Distributed architectures; Mathematics of computing → Network optimization.

Additional Key Words and Phrases: Quantum Circuit Compilation, Integer Linear Programming

Author notes: Also with FLY, Future communications Laboratory. Also with CNIT, National Inter-university Consortium for Telecommunications. Also with IBM Client Innovation Center during his contribution to this work.

Authors' addresses: Daniele Cuomo, daniele.cuomo@unina.it, Department of Physics, University of Naples Federico II, Italy; Marcello Caleffi, marcello.caleffi@unina.it, DIETI, University of Naples Federico II, Via Claudio 62, Italy, 80126; Kevin Krsulich, kevin.krsulich@ibm.com, IBM Quantum, T.J. Watson Research Center, Yorktown Heights, New York, 10598; Filippo Tramonto, filippo.tramonto@gmail.com, Kyndryl Italia Innovation Services, Via Circonvallazione Idroscalo snc, Segrate (MI), Italy, 20090; Gabriele Agliardi, gabrielefrancesco.agliardi@polimi.it, Department of Physics, Politecnico di Milano, Piazza Leonardo da Vinci, Milano, 20133 and IBM Italia, Via Circonvallazione Idroscalo, Segrate (MI), 20090; Enrico Prati, enrico.prati@unimi.it, Department of Physics, Università degli Studi di Milano, Via Celoria 16, Milano, 20133 and IFN-CNR, Piazza Leonardo da Vinci 32, Milano, Italia, 20133; Angela Sara Cacciapuoti, angelasara.cacciapuoti@unina.it, DIETI, University of Naples Federico II, Via Claudio 62, Italy, 80126.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2023 Copyright held by the owner/author(s). 2643-6817/2023/1-ART https://doi.org/10.1145/3579367 ACM Trans. Quantum Comput.

1 INTRODUCTION

Distributed architectures are envisioned as a long-term solution to provide practical applications of quantum computing [12, 22, 40, 88]. The general trend [31, 40, 41, 50, 66, 86] shows a common belief in distributed (and quasi-distributed, or multi-core) architectures as physical substrate, allowing a modular and horizontal scale-up of computing resources, rather than relying on the vertical scale-up that comes from advancing a single piece of hardware. On the flip side, by linking distributed quantum processors, several new challenges arise [12, 15, 22, 30, 51, 78, 90]. Here we consider the compilation problem, which is generally tough to solve, even on a single processor, and for which an NP-hardness proof is available [10]. An ever-growing literature offers a variety of proposals for local computing [9, 11, 34, 45, 46, 49, 61, 67, 68, 70, 74, 83, 91, 93, 98] and for distributed computing [8, 23-25, 35, 39, 76, 80, 81, 96, 97].

Even if quantum processors are already available, distributed architectures are at an early stage and must be discussed from several perspectives. A key concept is that of telegates as the fundamental inter-processor operations [22, 86, 88]. Each telegate can be decomposed into several tasks, which we group as follows: (i) the generation and distribution of entangled states among different processors, (ii) local operations and (iii) classical communications. Such tasks make the telegate an expensive resource, especially in terms of running time. As a consequence, telegates have a critical impact on the performance of the overall computation. Despite this limit, telegates offer remarkable opportunities for parallelization.
In fact, much circuit manipulation is possible to keep computation independent from telegate tasks. Therefore, we aim to model an optimization problem that embeds such opportunities.

1.1 Contribution

The overall objective of our work is to analyse in depth strategies to reduce the overhead caused by telegates, which are the main bottleneck in the computation on distributed architectures. Fig. 1 gives a step-by-step overview of the paper, with particular attention to the problem modeling.

Fig. 1. Manuscript overview. Blue blocks denote the steps in the problem modeling, scanned by blue arrows. Red blocks are the main ingredients to the entry blue blocks.

Sec. 2 and 3 are devoted to detailing and justifying our assumptions. As computation model we consider quantum circuits with a universal operator set. The set is based on local operations and on telegates as fundamental inter-processor operations. Here, we optimize telegates to efficiently scale with inter-processor connectivity restrictions.

We move on by rigorously defining the problem (Sec. 4). To come up with our formulation we rely on a wide literature from the Operations Research field, dealing with network scenarios. Specifically, we notice several analogies between our problem and those on dynamic networks, especially the group of multi-commodity flow problems [16-21, 36-38, 79, 84, 85]. The resulting formulation is particularly remarkable, as it is suitable for run-time minimization together with the minimization of resource usage, as a side objective. In an early step, the formulation is deliberately abstract, as it relies on binary relations that are not fully characterized at this stage. We believe that this enhances the modularity of the work and its readability.

Note: refer to [60, 94] for the state of the art on experimental implementations.
In fact, exploring the solution space requires performing costly circuit manipulations, which deserve a dedicated discussion. Nevertheless, right after the abstract description of the problem structure, we proceed with the full characterization of the aforementioned binary relations (Secs. 5.1 and 5.4). These relations define which circuit manipulations are feasible. At first, we use relations to model operations that can run in parallel, and in this context we introduce a relaxed version of parallelism, which we call quasi-parallelism. This relation is based on (automated) circuit manipulation which aims to gather telegates within the same time step. Sec. 5.1 contains a discussion on how to transform the graph, in order to adapt the model to the kind of circuit group one is tackling. After that, we relate all the operations to the partially ordered set induced by circuits expressed in normal forms (see Sec. 5.3).

We then describe our implementation (Sec. 6) and evaluate it by means of numerical experiments on different lattices (Sec. 7), showing that a square lattice gives rise to shallower circuits than a hexagonal lattice, and that the compiler is able to process square lattices faster. We relate this result to the ratio between edges and nodes, which becomes an important index when choosing a topology for distributed quantum computation. Sec. 8 contains the summary of the findings and the conclusions.

2 DISTRIBUTED QUANTUM COMPUTING ESSENTIALS

In this section we describe the main elements featuring in a distributed quantum architecture. One can encode a quantum processor as a set of qubits and a set of sparse tuneable couplings among qubits. If two qubits are coupled, they can interact. We will refer to such couplings as local couplings, to emphasize that they belong to the same node in distributed architectures, as opposed to entanglement links, which are couplings between qubits in different processors.
As detailed in the next sub-section, two remote qubits coupled through an entanglement link cannot be used for computation: consequently, it is useful to classify qubits as either computation qubits or communication qubits. While computation qubits process information during the computation, the communication qubits couple distinct processors through the entanglement. Fig. 2 shows a toy architecture. The purple lines represent the couplings among distributed processors.

2.1 The entanglement link

To couple two processors, a communication protocol, known as entanglement generation and distribution [12, 13, 22], is necessary. We describe it here as three main steps (a similar classification is available in Refs. [12, 59]):
(1) generating a two-qubit maximally entangled state (the two-qubit assumption is general and can be extended to multi-qubit protocols);
(2) distributing the state between different processors (this step implies communication; the interested reader can find in Ref. [13] three different protocols achieving the task);
(3) storing the partial states in the communication qubits.

When the protocol succeeds, the distributed qubits are correlated and can be exploited to perform non-local operations. For this reason we consider this correlation as a virtual link, which we refer to as entanglement link. Entanglement links extend the possible interactions to any distributed computation qubits. Specifically, since the communication qubits are locally coupled with computation qubits, with entanglement links one can perform operations between remote computation qubits, referred to as telegates. More details on the functioning of telegates are reported in Sec. 3.2. However, it is important to keep in mind that, to perform a remote operation, one has to measure the states stored in the communication qubits. As a consequence, an entanglement link is a depletable resource, assigned to a single remote operation.
After the measurement, a new round of entanglement generation and distribution takes place. We now give a mathematical description of a distributed architecture, in order to formally describe the functioning of telegates.

2.2 Mathematical description

So far, we presented the main elements occurring in a distributed quantum architecture, which we can now represent mathematically. Formally, let $\mathcal{N} = (V, P, E)$ be a network triple representing the architecture.

$V = Q \cup C$ is a set of nodes describing qubits; it is the disjoint union of computation qubits $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$ and communication qubits $C = \{c_1, c_2, \ldots, c_{|C|}\}$. We can represent $n$ processors by partitioning $V$ into $P = \{P_1, P_2, \ldots, P_n\}$. Therefore, a sub-set $P_i$ characterizes a processor as its set of qubits/nodes.

$E = L \cup R$ is a set of undirected edges. $L$ represents the local couplings, therefore $L \subseteq \bigcup_{i} P_i \times P_i$. Notice that there is no particular assumption on connectivity or cardinality within processors. This keeps the treatment hardware-independent and allows for heterogeneous architectures. $R$ represents entanglement links. Since entanglement links connect only communication qubits, we introduce, for each processor, the set of those qubits only, i.e., $C_i = C \cap P_i$. Therefore, we have
$$R \subseteq \bigcup_{i,j:\, i \neq j} C_i \times C_j .$$

Fig. 2. Toy distributed quantum architecture with 3 processors.

Fig. 2 shows an exemplary architecture, with three processors in $P$, six computation qubits in $Q$, six communication qubits in $C$, three entanglement links in $R$ and ten local couplings in $L$. Concerning minimal assumptions, we only care about architectures actually able to perform any operation. This translates into a simple connection assumption.

3 OPERATORS

In the following, the gate model architecture of quantum computers is considered. There, a circuit describes a time-ordered quantum evolution as a sequence of quantum gates consisting of unitary operators.
The set of available operators depends on the physical implementation.

3.1 Computation operators

In order to achieve universal quantum computing, one may rely on a universal set of quantum logic gates capable of approximating any possible unitary operator. In the following, we consider a representative universal set of quantum gates, without loss of generality. A sufficient set for local universal quantum computing consists of the three operators {CX, H, T}, where CX is the conditioned bit-flip operator, H is the Hadamard operator and T is the $\pi/4$-phase shift. Indeed, with a polynomial number of repetitions of H and T one can approximate any unitary operator with arbitrary precision [54, 75]. Another sufficient set is {CZ, H, T}, where CZ is a conditioned phase-flip, thanks to the equivalence $\mathrm{CZ}_{i,j} \equiv \mathrm{H}_j\, \mathrm{CX}_{i,j}\, \mathrm{H}_j$. Nevertheless, for practical reasons that will be clear in Sec. 5.2, we find it convenient in the current paper to rely on the extended gate set {CX, CZ, H, T}. Other choices of universal sets are possible, such as those based on trapped ions in a cavity [3], suitable for quantum interfaces where the photonic state is transferred to the cavity mode, and then to the electronic state of the ion via laser pulses [30, 86].

3.2 Universal set

To extend the universality also to distributed architectures, we need at least one remote operator. Since in our gate set {CX, CZ, H, T} one gate acting on two qubits (namely CX, or CZ) is sufficient, it is also enough to have one remote operator. In other words, w.l.o.g. we can show a protocol performing only a CX (or CZ) between remote computation qubits. To represent such a protocol we use the notation RCX (or RCZ).

Note: the interested reader can find a discussion about how to achieve practical entanglement generation and distribution, via heralded-based protocols, in Ref. [59].
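The equivalence between CZ and CX conjugated by Hadamards on the target qubit can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `kron` and `is_close` are illustrative and not part of the paper, and the qubit ordering (control first, target second) is an assumption of this example.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker product: rows are indexed by (i, k), columns by (j, l)."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def is_close(a, b, tol=1e-12):
    """Entry-wise comparison up to a numerical tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[s, s], [s, -s]]

# Two-qubit gates in the basis |00>, |01>, |10>, |11> (control = first qubit).
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

H_target = kron(I2, H)                      # Hadamard acting on the target only
conjugated = matmul(matmul(H_target, CX), H_target)
print(is_close(conjugated, CZ))
```

The check relies on the identity H X H = Z applied to the target qubit whenever the control is in state |1⟩.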
With the different nomenclature we highlight their physical difference. Specifically, CX represents a local gate, while RCX represents a sequence of operations that involves distant qubits. Therefore, in general, implementations of CX and RCX come with different fidelity, latency and required resources.

The functioning of RCX is based on several fundamental steps, which we describe, in turn, by using operators. The first operator models the entanglement link creation; we refer to it as E or, more explicitly, as $\mathrm{E}_{u,v}$. It sets qubits $c_u$ and $c_v$ to the maximally entangled state $|\Phi^{+}\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. The second operator models a measurement of a communication qubit $c_u$ over the computational basis. Namely, the measurement outputs a classical binary variable $b_u \in \{0, 1\}$. We refer to it as $\mathrm{M}_u$; its circuit component is represented in Fig. 3.

Fig. 3. Circuit component representing a measurement $\mathrm{M}_u$.

Fig. 4 shows a possible realization of a generic $\mathrm{RCX}_{i,j}$. Here, there are two qubits $q_i, c_u \in P_a$ and two qubits $q_j, c_v \in P_b$. Let us separate the protocol into three different steps. The first one is the creation of the entanglement link between $c_u$ and $c_v$, i.e., applying $\mathrm{E}_{u,v}$. After that, the second step is the pre-processing: a few local operations occur and then qubits $c_u, c_v$ are measured, getting $b_u$ and $b_v$ respectively. The final step is the post-processing. The binary variables are used to assert whether further operations are required. Specifically, if $b_v = 1$, a Pauli Z operator applies to $q_i$ and, if $b_u = 1$, a Pauli X operator applies to $q_j$. This phase can be compactly referred to with the $\mathrm{Z}^{b_v}$, $\mathrm{X}^{b_u}$ operators. Notice that $b_u$ is local to processor $P_a$ and $b_v$ is local to $P_b$. But $P_a$ uses $b_v$ and $P_b$ uses $b_u$. In other words, a cross classical communication occurs between $P_a$ and $P_b$.

Let us now look at some exemplary applications of RCX over the toy architecture of Fig. 2.
Note: here and throughout the paper, when an operator is subscripted, we are denoting the qubits it is operating on; e.g., $\mathrm{CX}_{i,j}$ is a CX operator with control qubit $q_i$ and target qubit $q_j$.

Example 1. Assume one wants to run an RCX with control qubit $q_2$ and target $q_3$, i.e., $\mathrm{RCX}_{2,3}$. Just run the circuit in Fig. 4, with $i = 2$, $j = 3$, $u = 2$, $v = 4$.

Example 2. Now assume one wants to run $\mathrm{RCX}_{1,3}$. In this case we can still use the entanglement link between $c_2$ and $c_4$. However, qubit $q_1$ is not coupled with $c_2$. To use that link we need to swap the states stored in $q_1$ and $q_2$ before and after running CX.

What happens if one wants to run, say, $\mathrm{RCX}_{1,4}$? In such a case, the qubits belong to two processors having no entanglement link coupling them. There is a really efficient protocol to overcome this problem: it is called entanglement swap and we describe it in the next section.

3.3 The entanglement swap

As pointed out before, one may want to run an RCX operator between a couple of qubits belonging to processors with no entanglement link. Formally, let $P_a$ and $P_b$ be such processors, with $R \cap (C_a \times C_b) = \emptyset$. In the basic scenario, there exists an intermediate processor $P_m$ which has an entanglement link with both $P_a$ and $P_b$, say via four communication qubits such that $c_u \in C_a$, $c_v, c_w \in C_m$ and $c_z \in C_b$. As Fig. 5 shows, we exploit $P_m$ to entangle $c_u$ and $c_z$.

Fig. 4. Protocol performing an $\mathrm{RCX}_{i,j}$. From an operator point of view, this is equivalent to performing $\mathrm{CX}_{i,j}$. However $q_i$ and $q_j$ belong to different processors and that is why we use the notation RCX.

Fig. 5. Entanglement swap protocol. This scenario has three processors $P_a$, $P_m$, $P_b$. $P_m$ has an entanglement link both with $P_a$ and with $P_b$, created respectively by $\mathrm{E}_{u,v}$ and $\mathrm{E}_{w,z}$. At the end of the protocol $c_u$ and $c_z$ are in the maximally entangled state $|\Phi^{+}\rangle$. From an operator point of view, this is equivalent to performing $\mathrm{E}_{u,z}$.

The entanglement swap protocol can be generalized to an arbitrary sequence of intermediate processors. To this aim we introduce the concept of entanglement path.

3.3.1 The entanglement path. Coherently with the standard definition of a path in a graph, an entanglement path is a sequence of entanglement links connecting two processors. Formally, an entanglement path is a sequence $\{P_{i_1}, P_{i_2}, \ldots, P_{i_m}\}$ of $m$ processors such that, for any $j$ with $1 \le j < m$, there is an entanglement link between $P_{i_j}$ and $P_{i_{j+1}}$.

We can therefore entangle two communication qubits $c_u \in C_{i_1}$ and $c_z \in C_{i_m}$ by applying a generalization of the entanglement swap, shown in Appendix A, to $\{P_{i_1}, P_{i_2}, \ldots, P_{i_m}\}$. Since at the end of the protocol $c_u$ and $c_z$ are in the entangled state $|\Phi^{+}\rangle$, an entanglement path is a generalization of an entanglement link.

3.3.2 RCX with entanglement path. In our scenario, the purpose of applying entanglement swap is to perform RCX. For this reason it is interesting to note that we can combine the entanglement swap protocol with the protocol for RCX. The result is shown in Fig. 6. This result generalizes to every path, no matter the length (see Appx. A). We further discuss the latency implications of this result in the next section.

4 DISTRIBUTED QUANTUM CIRCUIT COMPILATION PROBLEM

Usually, in the literature dealing with compiler design [35, 46, 91, 98], a circuit is encoded as a set of layers. Formally, a layer is a set $\ell$ of independent operators, meaning that each operator in $\ell$ acts on a different collection of qubits. A circuit is an enumeration of layers $\mathcal{L} = \{\ell_1, \ell_2, \ldots, \ell_{|\mathcal{L}|}\}$, where the cardinality $|\mathcal{L}|$ is also commonly referred to as circuit depth.
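The layer encoding just described can be illustrated with a short sketch. It assumes a hypothetical gate representation, a pair `(name, qubits)`, and uses a plain as-soon-as-possible placement to group a gate sequence into layers of independent operators; this is only one way to obtain such an encoding and is not claimed to be the procedure used by the authors.

```python
def layerize(gates):
    """Group a time-ordered gate list into layers of independent operators.

    Each gate is a pair (name, qubits), with qubits a non-empty tuple.
    A gate goes into the earliest layer that comes after every layer
    already touching one of its qubits, so dependencies are preserved.
    """
    layers = []
    last_layer = {}  # qubit -> index of the last layer acting on it
    for name, qubits in gates:
        idx = 1 + max(last_layer.get(q, -1) for q in qubits)
        if idx == len(layers):
            layers.append([])
        layers[idx].append((name, qubits))
        for q in qubits:
            last_layer[q] = idx
    return layers

# Hypothetical 4-qubit circuit, not taken from the paper's figures.
circuit = [("H", (0,)), ("CX", (0, 1)), ("T", (2,)), ("CX", (1, 2)), ("H", (3,))]
layers = layerize(circuit)
depth = len(layers)  # circuit depth = number of layers
for i, layer in enumerate(layers, start=1):
    print(i, layer)
print("depth:", depth)
```

In this example the two CX gates share qubit 1, so they cannot sit in the same layer, while the single-qubit gates on untouched qubits all fit in the first one.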
A quantum programmer writes a logical circuit, abstracting from the real architecture and assuming that qubits are fully connected, i.e., any couple of qubits can perform a CX operation directly. Such an abstraction holds also when stepping to distributed architectures (recall that, from a user perspective, CX ≡ RCX). However, NISQ architectures do not provide full coupling. As a consequence, there must be a software interface, namely a compiler, able to map an abstract circuit to an equivalent one that meets the real coupling. In general, such a mapping implies overhead in terms of circuit depth. Therefore, finding a mapping with minimum depth overhead is an optimization problem. We refer to it as the quantum circuit compilation problem (QCC), which is proved to be NP-hard [10]. Its version on distributed architectures, which we refer to as the distributed quantum circuit compilation problem (DQCC), is likely to be at least as hard as QCC. In fact, while in QCC we deal with local connectivity restrictions, in DQCC local connectivity stands alongside remote connectivity (i.e., the entanglement links), which is less dense than the local one. Furthermore, performing a remote operation is much more time-consuming than a local operation. Just consider that a remote operation relies on communication of both quantum and classical information. The above reasons make telegates the bottleneck in distributed computing. Therefore, they are worthy of dedicated analysis to minimize their impact.

Fig. 6. $\mathrm{RCX}_{q_1,q_2}$ with entanglement swap.

Notation            Description
$[k]$               An enumeration set $\{1, 2, \ldots, k\}$
O                   Font mainly used to denote operators
$\Delta_{\mathrm{O}}$          Time to run operator O
$\mathcal{Q}$                 Quotient graph
$\mathcal{L}$                 Circuit encoding
$\mathcal{L}_{\mathrm{O}}$          Circuit where only O operators occur
$d$                 Discrete time step
≺, q                Binary relations
A                   Predicate used to characterize q
$b$                 Boolean variable
$f$                 Flow function
$P_i$               $i$-th quantum processor
s, t                Sources and targets vectors
$D$                 Circuit depth

4.1 Objective function

To optimize a circuit, the first thing we need to do is choose an objective function to rate the expected performance of a circuit. A common approach is to evaluate only those operators which are somehow a bottleneck to computation. Considering the gate set {CX, CZ, H, T}, in the context of fault-tolerant quantum computing [42], the bottleneck is the T operator [4, 82], since error correction protocols are designed for {H, CX}. Conversely, on current NISQ technologies, the bottleneck lies in the CX and CZ operators, which are noisier as they operate on two qubits. The relevant metric can either be the number of occurrences of the subject operator O, namely the O-count, or the number of layers containing O at least once, namely the O-depth.

To rate a compiled circuit on distributed architectures, we do something along the lines of this latter approach. Specifically, the bottleneck is the RCX and the RCZ operators, and each RCX or RCZ implies one occurrence of E. Therefore, we will rate a circuit by means of its E-depth.

As a simple example of E-depth, consider an instance of the problem: a logical circuit where some RCX operators occur. Fig. 7 shows an exemplary one. Let us put ourselves in the worst-case scenario, i.e., all the four qubits belong to different processors. Consequently, all the two-qubit operators are RCX. Without considering the tasks which RCX relies on, there is not much optimization to do and the E-depth is 5.

Fig. 7. Exemplary logical circuit, expressed in the universal gate set {CX, CZ, H, T}.

4.2 Modeling the time domain

It should be clear that E is of central interest in our treatment.
In fact, we are also going to model the time by scanning it as E occurs. Specifically, notice that link generations among different couples of qubits are independent. For this reason we assume that all the possible links generate simultaneously and, as soon as all the states are measured, a new round of simultaneous generations begins.

Clearly, after a measurement M generates a boolean $b$, there is at least one post-processing operator that needs to wait for that boolean to arrive. Generally speaking, the longer the path, the more time $b$ takes to reach its destination. We need to account for that with a proper model. To this aim, we make some observations.

Remark. Consider a generic single-qubit unitary operator U conditioned on $b$. The time required to perform it is largely dominated by the travel time of $b$, whilst the actual time taken by U can be neglected. Furthermore, the travel of $b$ is independent from computation. Hence, we can compactly refer to the post-processing waiting-time as $\Delta_b$.

A second observation is that the travel of $b$ is also independent of entanglement link creations, which we assume to take time $\Delta_{\mathrm{E}}$. It is also logical to assume $\Delta_b \lesssim \Delta_{\mathrm{E}}$ for the following reason: even if $b$ needs to cover a longer distance than the one covered by E, $b$ relies on classical technologies, which are far more efficient than entanglement generation and distribution protocols. For this reason, in our treatment we neglect $\Delta_b$, since it happens in parallel with $\Delta_{\mathrm{E}}$.

Stemming from this, we can model the time domain as a discrete set of steps $d \in \{1, 2, \ldots, D\}$, where $D$ is an unknown time horizon, which is also the E-depth.

Notes: remote connectivity is less dense than the local one because the more communication qubits there are, the fewer computing resources are available. Assigning logical qubits to physical ones, i.e., qubit mapping, is another critical step for compilation and it deserves dedicated analysis [5, 26], out of the scope of this work.
At the beginning of each time step $d$, the whole set of entanglement links is available for telegates. Notice that most of the local operators are expected to run during the creation of the links. This is because we relate them through the following inequality:
$$\Delta_{\mathrm{E}} \gg \Delta_{\mathrm{CX}}, \Delta_{\mathrm{CZ}}, \Delta_{\mathrm{H}}, \Delta_{\mathrm{T}}, \qquad (1)$$
where, for a generic operator O, $\Delta_{\mathrm{O}}$ is the time to run O. Therefore, since E is independent from local operators, we can always attempt to run these while E is running, and also while classical bits $b$ are traveling, as explained in Sec. 3.3.2.

4.3 Modeling the distributed architecture

In light of the above observations, it is reasonable and convenient to consider the whole processor as a network node, and define a function $c$ that provides the number of available links between two processors. Specifically, we first formalized a distributed architecture as the network graph $\mathcal{N} = (V, P, E)$ introduced in subsection 2.2; this step was important to understand the interior behavior of remote operations from a qubit perspective. However, it is now useful to re-state it in a more compact encoding, which highlights the main bottleneck of a distributed quantum architecture: the entanglement links. Formally speaking, we will consider a quotient graph of $\mathcal{N}$.

To avoid weighing down the formalism further, we re-model the instance by considering as main nodes the processors, corresponding to an enumeration for the partition $P$, i.e., $P = \{P_1, P_2, \ldots, P_n\}$. All the entanglement links connecting the same couple of processors now collapse to a single edge with integer capacity $c$, describing how many parallel entanglement links the two processors supply. We refer to this set of edges as
$$F \subseteq \bigcup_{i,j:\, i \neq j} \{P_i\} \times \{P_j\} .$$

Fig. 8. Quotient graph derived from the toy network of Fig. 2. The processors become the nodes; the entanglement links between a couple of processors gather into one edge, with capacity equal to the number of original links.

Hence, the new undirected graph is $\mathcal{Q} = (P, F, c)$.
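The collapse from the qubit-level network to the quotient graph can be sketched in a few lines. The example below assumes a hypothetical encoding: a dictionary mapping each communication qubit to its processor, and a list of entanglement links given as pairs of communication qubits. The instance is invented for illustration and is not the topology of Fig. 2.

```python
from collections import Counter

def quotient_capacities(processor_of, entanglement_links):
    """Collapse parallel entanglement links into capacitated edges.

    processor_of: dict, communication qubit -> processor label.
    entanglement_links: iterable of (qubit, qubit) pairs.
    Returns a dict mapping frozenset({P_i, P_j}) to an integer capacity.
    """
    capacity = Counter()
    for qa, qb in entanglement_links:
        pa, pb = processor_of[qa], processor_of[qb]
        if pa == pb:
            raise ValueError("an entanglement link must join two processors")
        capacity[frozenset((pa, pb))] += 1
    return dict(capacity)

# Hypothetical instance: 3 processors, 6 communication qubits, 3 links.
processor_of = {"c1": 1, "c2": 1, "c3": 2, "c4": 2, "c5": 2, "c6": 3}
links = [("c1", "c3"), ("c2", "c4"), ("c5", "c6")]

capacity = quotient_capacities(processor_of, links)
for edge, cap in capacity.items():
    print(sorted(edge), cap)
```

Using a `frozenset` as edge key keeps the graph undirected: the pair (1, 2) and the pair (2, 1) map to the same edge.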
With this reformulation a remote operation will refer to a control processor and a target processor, i.e., $\mathrm{RCX}_{i,j}$ with $P_i, P_j \in P$. In Fig. 8 we show the quotient graph related to the toy architecture of Fig. 2.

Note: the design of a distributed quantum architecture can easily adapt to satisfy requirements coming from assumptions on classical technologies, since these are very advanced.

4.4 Single layer formulation

Consider a basic circuit expressed as the singleton $\mathcal{L} = \{\ell\}$. Assume that $k$ RCX operators occur in $\ell$. From a logical perspective, all the $k$ operators can run in parallel, by definition of layer. In other words, if the architecture connectivity had infinite capacity (i.e., $c(e) = \infty$, $\forall e \in F$) we could run $\mathcal{L}$ with E-depth 1, which is optimal. As the capacity values decrease, the optimal E-depth value grows, up to E-depth $k$ in the worst case.

Let us formulate an optimization problem for the single-layer case; we will introduce a generalization to any circuit in subsection 4.5. Specifically, the quickest multi-commodity flow [36] wraps this basic scenario. In brief, the goal is to find a flow over time which satisfies the constraints imposed by a set of so-called commodities, which are going to represent the RCX operators of a quantum circuit. The less time the flow takes, the better.

To formalize this problem one can directly model an objective function that evaluates a flow by the time it takes. This is the approach employed in Ref. [63], but for a single commodity. Alternatively, the authors of Ref. [36] propose to start from a formulation of the multi-commodity flow problem over time $\mathrm{MCF}_D$, where $D$ is a given time horizon, namely a maximal number of time steps within which the flow is constrained. We prefer this latter way because dynamic flows like $\mathrm{MCF}_D$ have been deeply studied for a long time [37, 38]. Furthermore, even if this approach has an important drawback, explained at the end of this sub-section, it does not apply to our scenario.
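The two extremes mentioned above (E-depth 1 with infinite capacity, E-depth k in the worst case) can be reproduced with a deliberately simplified sketch. It assumes, unlike the general formulation, that every telegate of the layer uses only the direct edge between its two processors (no entanglement swap), so that the E-depth is the largest ratio between demand and capacity over the edges. The function name and data layout are hypothetical.

```python
import math
from collections import Counter

def e_depth_direct_links(telegates, capacity):
    """E-depth of a single layer when only direct links are used.

    telegates: list of (control_processor, target_processor) pairs.
    capacity: dict frozenset({P_i, P_j}) -> links available per time step
              (math.inf models unlimited connectivity).
    Simplifying assumption of this sketch: no entanglement paths.
    """
    demand = Counter(frozenset(pair) for pair in telegates)
    depth = 0
    for edge, n in demand.items():
        c = capacity.get(edge, 0)
        if c == 0:
            raise ValueError("no direct entanglement link for %s" % sorted(edge))
        steps = 1 if c == math.inf else math.ceil(n / c)
        depth = max(depth, steps)
    return depth

layer = [(1, 2), (1, 2), (1, 2)]  # k = 3 telegates on the same processor pair
edge = frozenset((1, 2))
print(e_depth_direct_links(layer, {edge: math.inf}))
print(e_depth_direct_links(layer, {edge: 2}))
print(e_depth_direct_links(layer, {edge: 1}))
```

With the general formulation a telegate may also be routed along an entanglement path, so this quantity should be read only as an illustration of how capacity drives the E-depth between 1 and k.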
4.4.1 Commodities. To formulate $\mathrm{MCF}_D$, first, we enumerate the occurrences of RCX in $\mathcal{L}$ as a set of commodities $[k] = \{1, 2, \ldots, k\}$. A set of source-sink node couples is associated to the commodities. To do that, let $\mathbf{s} = (s_1, s_2, \ldots, s_k)$ and $\mathbf{t} = (t_1, t_2, \ldots, t_k)$ be two vectors induced by the operators RCX in $\mathcal{L}$ such that
$$\mathrm{RCX}_{s_i,t_i} \in \ell \iff \exists\, i \in [k] : P_{s_i}, P_{t_i} \in P .$$
Namely, $P_{s_i}$ ($P_{t_i}$) is the processor where the control (target) qubit of operation $i$ occurs. We need to use vector notation to admit repetitions.

4.4.2 Decision variables. The decision variables of the optimization problem are the time-dependent functions $f_{i,e}(d) \in \{0, 1\}$, indicating the flow on edge $e \in F$ dedicated to operation $i \in [k]$ at time $d$. The function has a binary co-domain because an operation $i$ uses at most one entanglement link.

4.4.3 Constraints. As usual, the first constraint we introduce is the flow conservation constraint. Formally, $\forall i \in [k]$, $\forall d \in [D]$ and $\forall P_j \in P \setminus \{P_{s_i}, P_{t_i}\}$ the following holds:
$$\sum_{e \in \delta^{-}(P_j)} f_{i,e}(d) - \sum_{e \in \delta^{+}(P_j)} f_{i,e}(d) = 0 \qquad (2)$$
where $\delta^{-}, \delta^{+} : P \to 2^{F}$ are the standard functions outputting the set of entering and exiting edges of the input node, respectively.

Since a flow $f_{i,e}(d) = 1$ identifies the usage of an entanglement link in $e$ to perform $i$, we need to guarantee that the flow going through intermediate links of a path does not stop there. Conversely, whenever an end point of the path occurs in the control or target processor (i.e., $P_{s_i}$ or $P_{t_i}$), the operation demand (or commodity demand) constraint holds instead of the conservation constraint. Namely, $\forall i \in [k]$, this can be written as:
$$\sum_{e \in \delta^{-}(P_{s_i})} \sum_{d \in [D]} f_{i,e}(d) - \sum_{e \in \delta^{+}(P_{s_i})} \sum_{d \in [D]} f_{i,e}(d) = -1 \qquad (3)$$
$$\sum_{e \in \delta^{-}(P_{t_i})} \sum_{d \in [D]} f_{i,e}(d) - \sum_{e \in \delta^{+}(P_{t_i})} \sum_{d \in [D]} f_{i,e}(d) = +1 \qquad (4)$$

Note: the letter $D$ is chosen to highlight that the time horizon is going to be the E-depth.
Constraints (3) and (4) explicitly request that a flow dedicated to $i$ reaches its target $t_i$ without exiting. Symmetrically, it leaves its control processor $s_i$ without returning. Notice that constraint (2) forces the operation demand to be satisfied within a single time step.

The last constraint ensures that, at any time step, the number of operations does not exceed the entanglement resources. Hence, $\forall e \in E$ and $\forall \tau \in [d]$, we introduce a capacity bound:

$$\sum_{i \in [k]} f_{i,e}(\tau) \le c(e) \tag{5}$$

Ultimately, the objective function is the total flow $F = \sum_{e \in E} \sum_{i \in [k]} \sum_{\tau \in [d]} f_{i,e}(\tau)$.

By gathering the above equations, we obtain the Integer Linear Programming formulation (6), which models $\mathrm{MCF}_d$. A flow $f$ perfectly matches a set of entanglement paths used by the telegates.

$$
\begin{aligned}
\text{minimize} \quad & F = \sum_{e \in E} \sum_{i \in [k]} \sum_{\tau \in [d]} f_{i,e}(\tau) \\
\text{subject to} \quad & \sum_{e \in \delta^-(p)} f_{i,e}(\tau) - \sum_{e \in \delta^+(p)} f_{i,e}(\tau) = 0 && \forall i \in [k],\ \forall \tau \in [d],\ \forall p \in P \setminus \{s_i, t_i\}, \\
& \sum_{e \in \delta^-(s_i)} \sum_{\tau \in [d]} f_{i,e}(\tau) - \sum_{e \in \delta^+(s_i)} \sum_{\tau \in [d]} f_{i,e}(\tau) = -1 && \forall i \in [k], \\
& \sum_{e \in \delta^-(t_i)} \sum_{\tau \in [d]} f_{i,e}(\tau) - \sum_{e \in \delta^+(t_i)} \sum_{\tau \in [d]} f_{i,e}(\tau) = +1 && \forall i \in [k], \\
& \sum_{i \in [k]} f_{i,e}(\tau) \le c(e) && \forall e \in E,\ \forall \tau \in [d]
\end{aligned}
\tag{6}
$$

Notice that solutions with cycles are in general feasible, but are senseless in our scenario. By expressing the problem as a minimization of $F$, a solver will avoid any cycle and will try to use as few entanglement links as possible.

Once a solver for $\mathrm{MCF}_d$ is defined, we just need to use it as proposed in Ref. [36]: the solver occurs as a sub-routine within a binary search on the minimum time where a feasible solution exists. Since the search space is over time, the algorithm is, in general, pseudo-logarithmic. In our case, we already know that the worst solution is the one where all the operations run in sequence, i.e., E-depth equal to $k$. Therefore, the time horizon is upper-bounded by $k$ and the binary search makes $\log k$ calls to the sub-routine. Algorithm 1 shows the steps. Notice that the algorithm makes use of an undetermined solver for $\mathrm{MCF}_d$. Since we are facing an NP-hard problem, a real implementation would generally look for sub-optimal solutions.

Algorithm 1: Quickest multi-commodity flow
    Input: Q, [k]
    Output: f
    a ← 1, b ← k
    while a ≤ b do
        d ← ⌊(a + b)/2⌋
        f̄ ← MCF_d(Q, [k])
        if f̄ is feasible then
            f ← f̄
            b ← d − 1
        else
            a ← d + 1

Unfortunately, standard $\mathrm{MCF}_d$ cannot capture all the features of DQCC when $L = \{\ell_1, \ell_2, \ldots, \ell_{|L|}\}$; we need to consider that operations in $[k]$ are related to each other by a logic determined by $L$. Hence, in the following sub-section we model such relations by introducing extra constraints.

4.5 Any layer formulation

As mentioned, the formulation we just gave is not enough to model the DQCC problem for any $L = \{\ell_1, \ell_2, \ldots, \ell_{|L|}\}$, because a circuit generally follows a logic that is related to the order of occurrence given by $L$. Therefore, even if it might happen that two operations could run in any order, in general this is not true. One needs to define an order relation that is consistent with the logic of the circuit.

From an optimization point of view, a critical matter is to choose an order relation that either wraps most of the good solutions or is prone to optimization algorithms. For this reason and for the sake of clarity, we here refer to a generic, irreflexive order relation $\prec$ defined over $[k]$, without giving it a unique definition.

Fig. 9. RCX in logical conflict, as both $i$ and $j$ operate on the second qubit.

Formally, for any $i, j \in [k]$, $i \prec j$ means that to run $j$ we need to ensure that $i$ already ran. Starting from $\prec$, we can define a constraint to add to formulation (6). Namely, $\forall j \in [k]$, $\forall e \in \delta^+(s_j)$ the following holds:

$$f_{j,e}(\tau) \le \min_{i \prec j} \sum_{\bar\tau < \tau} f_{i,e}(\bar\tau) \tag{7}$$

The right part of the inequality is a value in $\{0, 1\}$ and takes value 1 only if all the operations logically preceding $j$ already ran.
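The meaning of the right-hand side of (7) can be illustrated at the level of a schedule, i.e., a map from operations to time steps. The sketch below is illustrative only: the relation is passed in as a set of pairs, the names `may_start` and `respects_order` are hypothetical, and no particular definition of the order relation is assumed.

```python
from typing import Dict, Set, Tuple

Order = Set[Tuple[int, int]]  # a pair (i, j) means that i must run before j


def may_start(j: int, tau: int, schedule: Dict[int, int], order: Order) -> int:
    """Return 1 iff every predecessor of j ran at a time step strictly before tau.

    This mirrors the right-hand side of (7): a minimum over predecessors of a
    0/1 indicator. Unscheduled predecessors count as "not yet run".
    """
    return min(
        (1 if schedule.get(i, tau) < tau else 0 for i, succ in order if succ == j),
        default=1,  # no predecessors: j is never blocked
    )


def respects_order(schedule: Dict[int, int], order: Order) -> bool:
    """True if each scheduled operation starts only after all its predecessors."""
    return all(may_start(j, tau, schedule, order) == 1 for j, tau in schedule.items())
```

For instance, with the relation {(0, 2), (1, 2)}, operation 2 may run at step 2 only if both 0 and 1 ran at step 1.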
Notice that constraint (7) can be linearized: since its left-hand side is bounded by a minimum of linear functions, it is equivalent to the set of independent constraints $f_{j,e}(\tau) \le \sum_{\bar\tau < \tau} f_{i,e}(\bar\tau)$, $\forall i : i \prec j$.

The formulation now models DQCC. In the next section, however, we refine inequality (7) to get a better solution space.

5 ENHANCING PARALLELISM

Fig. 10. Example of how to achieve quasi-parallelism for two RCX in logical conflict.

As before, from an optimization point of view, we are interested in considering as many good solutions as possible. To this aim, we propose an approach that enlarges the space of good solutions. Specifically, we notice that even if two operations $i, j \in [k]$ are such that $i \prec j$, this does not necessarily mean that they must run at different time steps. They may indeed run at the same time step while still respecting the logic imposed by $\prec$.

Consider the example from Fig. 9. Since operations $i$ and $j$ operate over a common qubit, they are in logical conflict. Hence, it is reasonable to think that $i \prec j$ should hold. However, when considering $i$ and $j$ in their extended form, i.e., where communication qubits are explicit, we notice that their logical conflict does not map over all the operations involved. As Fig. 10 shows, the left part of the equivalence is a naive implementation of $i$ followed by $j$, where the extended form completely inherits the logical conflict. Instead, the right part of the equivalence is far more efficient and is still an implementation of the circuit of Fig. 9. As a consequence, even if $i$ and $j$ are in logical conflict, they can run at the same time step. We refer to this property as quasi-parallelism.

For this reason we introduce a new binary relation between operations in $[k]$, which we refer to with the intuitive symbol $\parallel$. As before, we do not give here a unique definition of $\parallel$. Specifically, for any $i, j \in [k]$, we write $i \parallel j$ to mean that operations $i$ and $j$ can run at the same time step, but we do not fix a criterion to establish when $\parallel$ holds. Clearly, operations $i, j \in [k]$ that can run in full parallelism are a special case of quasi-parallelism, and $i \parallel j$ holds.

We can now split constraint (7), by discriminating between operations that can run in quasi-parallelism and the ones that cannot. Formally, $\forall j \in [k]$, $\forall e \in \delta^+(s_j)$ we introduce two new constraints:

$$f_{j,e}(\tau) \le \min_{i \prec j \,\wedge\, i \nparallel j} \sum_{\bar\tau < \tau} f_{i,e}(\bar\tau) \tag{8}$$

$$f_{j,e}(\tau) \le \min_{i \prec j \,\wedge\, i \parallel j} \sum_{\bar\tau \le \tau} f_{i,e}(\bar\tau) \tag{9}$$

To sum up, we propose (10) as the Integer Linear Programming formulation of the DQCC problem. $C$ is the set of constraints coming from the standard $\mathrm{MCF}_d$ formulation given in (6). In what follows we propose a characterization for relation $\parallel$.

$$
\begin{aligned}
\text{minimize} \quad & F = \sum_{e \in E} \sum_{i \in [k]} \sum_{\tau \in [d]} f_{i,e}(\tau) \\
\text{subject to} \quad & C, \\
& f_{j,e}(\tau) \le \min_{i \prec j \,\wedge\, i \nparallel j} \sum_{\bar\tau < \tau} f_{i,e}(\bar\tau) && \forall j \in [k],\ \forall e \in \delta^+(s_j),\ \forall \tau \in [d], \\
& f_{j,e}(\tau) \le \min_{i \prec j \,\wedge\, i \parallel j} \sum_{\bar\tau \le \tau} f_{i,e}(\bar\tau) && \forall j \in [k],\ \forall e \in \delta^+(s_j),\ \forall \tau \in [d]
\end{aligned}
\tag{10}
$$

5.1 Characterization

Our goal is to model $\parallel$ to capture as many solutions as possible, while keeping them feasible for the hardware. With this in mind, we propose the following criterion: given any $i, j$, $i \parallel j$ holds whenever $i$ and $j$ can run within a certain "small enough" time lapse. Specifically, the time lapse depends on the coherence time of communication qubits, which are assumed to be much more affected by noise than computing qubits.

Notice that, when two operations $i, j$ run in quasi-parallelism, the lifetime of the employed communication qubits might grow. Therefore, we need to ensure that it does not exceed the coherence time of the entanglement.

Fig. 11. Three RCX operators in logical conflict.

Formally, let $\Delta_\Phi$ be the coherence time of the entanglement; hence, it runs from the moment E ends up to the beginning of the measurements M.
A complication arises from the fact that $\parallel$ is, in general, an intransitive relation. To understand why this is true, consider the circuit in Fig. 11, with three operations $i, j, h$. In such a scenario we are faced with multiple choices. Namely, running
(1) all $i, j, h$ at different time steps;
(2) all $i, j, h$ at the same time step;
(3) $i, j$ together and $h$ afterwards;
(4) $i$ only, followed by $j, h$ together.

Case (1) is not of interest, because it is the worst solution and no optimization applies. Case (2) is the best solution, but it is not necessarily feasible. In fact, for $\Delta_\Phi$ small enough, we are forced to split the operations, as in one of the cases (3) and (4). This explains the non-transitivity, since $i \parallel j$ and $j \parallel h$, but $i \nparallel h$.

We still need to characterize $\parallel$; hence, we introduce a predicate method that aims to bring RCX operators closer to each other, so that quasi-parallelism is achievable.

5.2 A recursive predicate for the quasi-parallelism relation

As said above, we now introduce a method that verifies whether any two telegates can run in quasi-parallelism. This method, say $A(i, j, \Delta_\Phi)$, is a predicate that is true whenever the operations in input can run in quasi-parallelism. We can finally characterize $\parallel$:

$$i \parallel j \iff A(i, j, \Delta_\Phi).$$

Fig. 12. Two independent RCX, i.e., $i$ and $j$, belonging to different layers.

$A$ works in a recursive fashion, with three different scenarios as base cases.

Base case (i): given two operations $i, j$, if they belong to the same layer, clearly they can run in full parallelism; therefore $A(i, j, \Delta_\Phi)$ is true.

Base case (ii): similarly to (i), if $i, j$ belong to different layers and they are completely independent, $A(i, j, \Delta_\Phi)$ is true. The circuit of Fig. 12 gives an example with $i, j$ in contiguous layers.

Base case (iii): assume $i, j$ contiguous, i.e., in contiguous layers, and both operating on at least one common qubit.
With this base case we want to introduce the possibility that multiple operators may run simultaneously, as exemplified in Fig. 10. For this reason, algorithm $A$ considers all the operators involved in performing an RCX (recall the protocol from Fig. 4). Namely, $A$ pushes forward the post-processing of $i$, i.e., the Pauli operations Z or X, after the pre-processing of $j$, i.e., the CX operations. One can do that by using the following transformation rules, where $b$ denotes a boolean variable:
• $\mathrm{CX}(X^b \otimes I) \equiv (X^b \otimes X^b)\mathrm{CX}$
• $\mathrm{CX}(I \otimes Z^b) \equiv (Z^b \otimes Z^b)\mathrm{CX}$
• $\mathrm{CX}(I \otimes X^b) \equiv (I \otimes X^b)\mathrm{CX}$
• $\mathrm{CX}(Z^b \otimes I) \equiv (Z^b \otimes I)\mathrm{CX}$

Similarly, when a CZ occurs, the following rules apply:
• $\mathrm{CZ}(X^b \otimes I) \equiv (X^b \otimes Z^b)\mathrm{CZ}$
• $\mathrm{CZ}(Z^b \otimes I) \equiv (Z^b \otimes I)\mathrm{CZ}$

After the application of these rules, some post-processing operations might have been propagated also to communication qubits. Specifically, it may happen that an X operation should precede a measurement. However, one can always reduce the depth of the circuit by sending $b$ to the target(s) of the measurement. This is indeed what happens in our first example, Fig. 10, where, instead of performing the X correction on the communication qubit, we combine it with the other X correction, achieving a single operation $X^{b_1 \oplus b_3}$; see also Fig. 13 for a circuit representation.

Footnote: Namely, what $i$ does to its qubits does not affect the qubits $j$ operates on.

At the end of the circuit manipulation, the lifetime of the communication qubits may have risen. If it does not exceed $\Delta_\Phi$, then $A(i, j, \Delta_\Phi)$ is true; otherwise, $A(i, j, \Delta_\Phi)$ is false.

Recursion: consider now the case where $i$ and $j$ are separated by a sequence of local operations $O_1, \ldots, O_n$, assumed to be confined to the universal set $\{\mathrm{CX}, \mathrm{CZ}, \mathrm{H}, \mathrm{T}\}$. In this case, $A$ recursively applies transformations for both $i$ and $j$. Specifically, as long as possible, it pushes forward the post-processing of $i$ by using the former rules together with:
• $\mathrm{T} Z^b \equiv Z^b \mathrm{T}$
• $\mathrm{H} X^b \equiv Z^b \mathrm{H}$

Fig. 13. Propagation of $X^b$. The first wire no longer needs information of $b$. The second wire needs information given by $b \oplus \bar b$. Notice that the measured $\bar b$ is not the same value in the two cases.

Ultimately, as long as possible, $A$ pushes backward the pre-processing of $j$ by using the following standard rules (subscripts denote control and target wires):
• $\mathrm{CX}(\mathrm{T} \otimes I) \equiv (\mathrm{T} \otimes I)\mathrm{CX}$
• $\mathrm{CZ}(\mathrm{T} \otimes I) \equiv (\mathrm{T} \otimes I)\mathrm{CZ}$
• $\mathrm{CZ}(I \otimes \mathrm{T}) \equiv (I \otimes \mathrm{T})\mathrm{CZ}$
• $(\mathrm{CX}_{1,2} \otimes I)(I \otimes \mathrm{CX}_{3,2}) \equiv (I \otimes \mathrm{CX}_{3,2})(\mathrm{CX}_{1,2} \otimes I)$
• $(\mathrm{CX}_{2,1} \otimes I)(I \otimes \mathrm{CX}_{2,3}) \equiv (I \otimes \mathrm{CX}_{2,3})(\mathrm{CX}_{2,1} \otimes I)$
• $\mathrm{CX}_{1,2}(\mathrm{H} \otimes \mathrm{H}) \equiv (\mathrm{H} \otimes \mathrm{H})\mathrm{CX}_{2,1}$
• $\mathrm{CZ}(I \otimes \mathrm{H}) \equiv (I \otimes \mathrm{H})\mathrm{CX}$
• $\mathrm{CX}(I \otimes \mathrm{H}) \equiv (I \otimes \mathrm{H})\mathrm{CZ}$

If $A$ manages to make the post-processing of $i$ and the pre-processing of $j$ contiguous, the validity check reduces to the base case scenario. Otherwise, $A(i, j, \Delta_\Phi)$ is false.

Fig. 14. An expansion, obtained by applying rules from $A$. In this example scenario, RCX and RCZ are interspersed with single-qubit local operators. Notice that boolean variables travel simultaneously. Hence, the timing assumption we made in Sec. 4.2 holds also for complex evaluations such as $Z^{b_1 \oplus b_4}$ and $X^{b_6} Z^{b_3}$.

Footnote: Namely, operations $O_1, \ldots, O_n$ belong to layers between the ones of $i$ and $j$.

So far, we defined $A$ only for $i, j$ without any other remote operation in between. Before generalizing the method to any $i$ and $j$, we prove that our definition of $A$ can be implemented so that it runs in polynomial time. We need this requirement to keep things tractable.

Theorem 1. $A$ has $O(n)$ complexity, with $n$ being the number of operations $A$ considers.

Proof. Assume $n$ local operations, say $O_1, \ldots, O_n$, occur between $i$ and $j$. If $A$ manages to push $i$ forward past $O_1$, it means that its post-processing runs after $O_1$ and it may only propagate vertically, over different qubits, by construction of the rule set. As a consequence, the depth of the circuit has not increased. Furthermore, the post-processing is still composed of Pauli operations of the kind $Z^b$ or $X^b$.
Hence, this holds for any $O_m$ with $1 \le m \le n$, and the recursion is upper-bounded by $O(n)$.

Symmetrically, if $A$ manages to push $j$ backward past $O_n$, it means that the pre-processing can run before $O_n$. Also in this case, the depth has not increased and the pre-processing is still composed of two independent CX operations, again by construction of the rule set. Hence, this holds for any $O_m$ with $1 \le m \le n$, and the recursion is upper-bounded by $O(n)$. □

We can now move on to the general case. Formally, between $i$ and $j$ a remote operation $h$ may occur, which is also in logical conflict with both. For such a scenario, we just add a recursive rule to $A$. Namely, $A(i, j, \Delta_\Phi)$ holds if the following holds:

$$\exists \lambda \in [0, 1] : A(i, h, \lambda \cdot \Delta_\Phi) \wedge A(h, j, (1 - \lambda) \cdot \Delta_\Phi).$$

Take a moment to appreciate why this kind of recursion is feasible. One might think that the validity of $A(i, h, \lambda \cdot \Delta_\Phi)$ and that of $A(h, j, (1 - \lambda) \cdot \Delta_\Phi)$ are not independent, because they both operate on $h$. However, in the former call $A$ evaluates the pre-processing of $h$, while in the latter it evaluates its post-processing. Therefore they can be evaluated independently.

Theorem 2. Generalized $A$ has complexity polynomial in $n$, with $n$ being the number of operations $A$ considers.

Proof. Assume $h_1, \ldots, h_n$ occur between $i$ and $j$. For the purpose of the proof, let $n$ be a power of 2. $A(i, j, \Delta_\Phi)$ can choose any of the $h_1, \ldots, h_n$ operations for the recursion. To keep symmetry, let $A(i, h_{n/2}, \lambda \cdot \Delta_\Phi)$ and $A(h_{n/2}, j, (1 - \lambda) \cdot \Delta_\Phi)$ be the recursive calls. Notice that the operations considered by $A(i, h_{n/2}, \lambda \cdot \Delta_\Phi)$ are $n/2$, as well as the ones considered by $A(h_{n/2}, j, (1 - \lambda) \cdot \Delta_\Phi)$. The result is a recursive binary tree of height $\log n$ and, therefore, a number of calls to $A$ that is polynomial in $n$. The leaves correspond to the base case of the recursion, which is proved to be tractable in Theorem 1. □

Fig. 14 shows an example scenario where we used the rules of $A$, in addition to the first one of Fig. 10.
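The commutation rules that $A$ relies on are ordinary operator identities, so they can be checked numerically for $b = 1$ (the case $b = 0$ is trivial). The following is a minimal, dependency-free sketch; the helper names are hypothetical, matrices are nested lists, and the first qubit is taken as the control.

```python
import cmath

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
R = 2 ** -0.5
H = [[R, R], [R, -R]]
T = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]
# Two-qubit gates in the basis |00>, |01>, |10>, |11>; first qubit is the control.
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)] for r in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two 2x2 matrices."""
    return [
        [a[i][j] * b[k][l] for j in range(2) for l in range(2)]
        for i in range(2)
        for k in range(2)
    ]


def same(a, b, tol=1e-12):
    """Entry-wise comparison up to a small numerical tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def check_rules():
    """Map each rule label to True when both sides are the same matrix."""
    rules = {
        "CX (X@I) = (X@X) CX": (matmul(CX, kron(X, I2)), matmul(kron(X, X), CX)),
        "CX (I@Z) = (Z@Z) CX": (matmul(CX, kron(I2, Z)), matmul(kron(Z, Z), CX)),
        "CX (I@X) = (I@X) CX": (matmul(CX, kron(I2, X)), matmul(kron(I2, X), CX)),
        "CX (Z@I) = (Z@I) CX": (matmul(CX, kron(Z, I2)), matmul(kron(Z, I2), CX)),
        "CZ (X@I) = (X@Z) CZ": (matmul(CZ, kron(X, I2)), matmul(kron(X, Z), CZ)),
        "CZ (Z@I) = (Z@I) CZ": (matmul(CZ, kron(Z, I2)), matmul(kron(Z, I2), CZ)),
        "T Z = Z T": (matmul(T, Z), matmul(Z, T)),
        "H X = Z H": (matmul(H, X), matmul(Z, H)),
        "CX (T@I) = (T@I) CX": (matmul(CX, kron(T, I2)), matmul(kron(T, I2), CX)),
        "CZ (I@T) = (I@T) CZ": (matmul(CZ, kron(I2, T)), matmul(kron(I2, T), CZ)),
        "CZ (I@H) = (I@H) CX": (matmul(CZ, kron(I2, H)), matmul(kron(I2, H), CX)),
        "CX (I@H) = (I@H) CZ": (matmul(CX, kron(I2, H)), matmul(kron(I2, H), CZ)),
    }
    return {label: same(lhs, rhs) for label, (lhs, rhs) in rules.items()}
```

Calling `check_rules()` returns a dictionary with one boolean per rule. As a contrast, a Z on the target does not simply commute with CX, which is why the corresponding rule propagates it to both wires.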
Clearly, our modular architecture is open to modifications or extensions of $A$, should future research highlight more refined requirements.

Remark. Notice that we managed to define $A$ to be independent of the connectivity of $Q$. This was possible thanks to the way we modeled telegates via efficient entanglement paths (see Appx. A). In other words, $A(i, j, \Delta_\Phi)$ works for any solver and regardless of the path the solver chooses to perform $i$ and $j$. As a consequence, the characterization of $A$, and therefore also of $\parallel$, is static and depends only on the logical circuit and on global factors, i.e., $\Delta_\Phi$.

Furthermore, we may relate coherence time and entanglement link creation through $\Delta_E + \Delta_\Phi \approx \Delta_E$. As a consequence, whatever $\Delta_\Phi$ is, $A$ does not significantly affect the duration of each time step. This makes the E-depth a particularly good index for the running time of the overall computation.

5.3 The role of the Clifford group in distributed quantum computing

In our algorithm, we tried to postpone the post-processing as much as possible, to allow classical information to travel across remote computers in the meantime. An ideal result would be to push it to the end of the circuit: indeed, since the post-processing is made only of Pauli gates, if it were located at the end of the circuit, it could be efficiently replaced by a classical computation, removing the need for the quantum state to remain coherent while the information travels.

We show in the next subsection that pushing the post-processing to the end is possible if the circuit belongs to a particular class, namely the Clifford group, generated by the operators $\{\mathrm{CX}, \mathrm{CZ}, \mathrm{H}, \mathrm{T}^2\}$ (or by the minimal sets $\{\mathrm{CX}, \mathrm{H}, \mathrm{T}^2\}$ and $\{\mathrm{CZ}, \mathrm{H}, \mathrm{T}^2\}$). Let us introduce here some basic facts about such a group.

The interest in the Clifford group derives from the fact that it covers a wide spectrum of circuits, but does not include the complexities of the T gate.
The Clifford group can also be efficiently simulated on a classical computer. We already discussed that the T gate represents the most error-prone gate in the fault-tolerant context. On the other hand, it is well known that the Clifford group together with the T gate is universal [75]. For this reason, it makes sense to represent an arbitrary circuit in terms of a Clifford circuit plus as few T gates as possible. This was attempted in the literature in two ways:
• decompose circuits, with the goal of minimizing the number of T gate occurrences [4, 82];
• inject T gates into a Clifford circuit, by means of state preparation [62, 92, 95].

Fig. 15. Example of T gate injection.

A basic example of T injection is shown in Fig. 15, where injection is performed through one auxiliary qubit, prepared in the state

$$|A\rangle = \mathrm{T}\mathrm{H}|0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right). \tag{11}$$

Other facts about the Clifford group are worth reporting. Specifically, distributed architectures based on trapped ions [50, 73, 86] are well suited to work with state injection on Clifford circuits. Indeed, experimental results show that single-qubit gates can run with 99.9999% fidelity [43] and that CX (or CZ) operators can achieve a 99.9% fidelity [7]. Furthermore, the local connectivity for such a processor is complete [64]. This means that a T injection would give a fidelity of ∼99.8997%, if prepared as in equation (11) and the circuit of Fig. 15, without the need for distillation or local routing. As a consequence, future architectures relying on entanglement generation and distribution are likely to supply some T injection module too.

5.4 Circuit normal forms for the Clifford group and implications on the post-processing

As said at the beginning of Sec. 5.3, important benefits could be achieved by postponing the post-processing operations to the end of the circuit, where they can be computed classically. An attempt in this direction is available in Ref. [65], where the authors delay Pauli operations together with non-Pauli ones.
Instead, our approach is to show that the result can always be achieved on the Clifford group, by relying on the normal forms [1, 27, 29, 71]. Such a form is particularly useful for distributed computing and, more generally, for measurement-based computation.

It was shown [29] that any Clifford gate acting on a Pauli state can be represented in the normal form depicted in Fig. 16. This normal form is of practical interest as it can be obtained starting from any Clifford circuit, which is in general not in normal form. Such a result comes from the employment of a ZX-calculus reasoner, e.g. [53]. ZX-calculus [29, 87] is a graphical language that arose as an optimizer for quantum circuits and translates a quantum circuit into a ZX-diagram. The main difference between the diagram and the original circuit is that the former works with ZX-rules, which serve as a reasoning tool to smartly generate a new circuit, equivalent to the original one. ZX-calculus was recently introduced in the literature with the main objective of minimizing circuit gate-depth, and its potential is still being explored, raising increasing interest for its versatility. In fact, we use it here to perform architecture-compliant optimization. Let us describe the few tools and properties we need to benchmark our compiler; the interested reader can refer to the bibliography for a more extended treatment.

Coming back to Fig. 16, dedicated circuit symbols express a generic Pauli state preparation and a generic Pauli measurement. $L_O$ is a set of layers where only the operator $O$ occurs. For example, $L_{\mathrm{CZ}}$ encodes a circuit composed of CZ operators.

Fig. 16. Normal form coming from the ZX-rules applied in Ref. [29]. The circuit is composed, in order, of the blocks $L^{(1)}_{\mathrm{CZ}}$, $L_{\mathrm{CX}}$, $L_{\mathrm{H}}$ and $L^{(2)}_{\mathrm{CZ}}$.

The following remark is a consequence of dealing with Clifford circuits in normal form.

Remark.
While predicate $A$ is running, only Pauli and Hadamard operations concur to its evaluation. Hence, all the post-processing operations can be pushed forward, up to the end of the circuit, and can be computed efficiently by a classical computer. Furthermore, since no post-processing occurs during quantum computation, the entanglement path length has a negligible impact.

The normal form suggests that the problem can be separated into three parts, corresponding to $L^{(1)}_{\mathrm{CZ}}$, $L_{\mathrm{CX}}$ and $L^{(2)}_{\mathrm{CZ}}$. For two of them, i.e., $L^{(1)}_{\mathrm{CZ}}$ and $L^{(2)}_{\mathrm{CZ}}$, the order relation is trivial (as all CZ operators commute), and therefore we can use any quickest multi-commodity flow solver to get a feasible compilation. On the contrary, the optimal characterization of the order relation for $L_{\mathrm{CX}}$ is a conceptually complex task. Indeed, a set of relations with minimal size may not be the best characterization from a practical point of view, if many of the relations involve remote qubits. The topic of optimal CX order relations deserves a dedicated analysis and is the subject of future work.

Let us emphasize the importance of $L_{\mathrm{CZ}}$ circuits, by pointing out some facts from Ref. [71]. The authors therein introduce the Boolean degrees of freedom as a way to count how many different algorithms can be implemented with a class of gates, and show that a generic $L_{\mathrm{CZ}}$ "has roughly half the number of the degrees of freedom" compared to a generic $L_{\mathrm{CX}}$, and roughly a quarter compared to the Clifford group. We validate our compiler performance by solving $L_{\mathrm{CZ}}$ circuits on different architectures in Sec. 7. So, being able to exploit normal forms to isolate two highly expressive blocks $L^{(1)}_{\mathrm{CZ}}$ and $L^{(2)}_{\mathrm{CZ}}$ that can be compiled without resorting to order relations is a very relevant result.

Before discussing the implementation details, let us make a final remark on ZX-calculus. We introduced it in the context of the Clifford group, but it is designed to work more broadly with any circuit [6, 14, 47, 52].
Therefore, we aim to expand our analysis in future works, by investigating normal forms for universal circuits. An interesting result in this sense is available in Ref. [44], where the authors show that a universal circuit can be split into three steps:
(1) the system is prepared in a non-Clifford state; this involves auxiliary qubits that do the work of injecting non-Clifford phases, e.g. the T gate;
(2) an $L_{\mathrm{CX}}$ circuit;
(3) a measurement-based sequence of Clifford operations (which can still be treated with ZX-calculus [28]).

Footnote: Notice that $\mathrm{CZ}_{i,j} \equiv \mathrm{H}_j \mathrm{CX}_{i,j} \mathrm{H}_j$. Thus, we do not need to expand our assumptions on the gate set.

6 IMPLEMENTATION TECHNIQUES

6.1 Time-expansion

Formulation (10) is a particular case of $\mathrm{MCF}_d$, as it slightly departs from the standard formulation. As expected, the problem is still intractable. To understand that, consider this simple scenario: an instance $[k]$ with $k = 2$ such that $1 \parallel 2$. We can restate the problem as follows: decide whether there exists a solution at the first time step. If not, just put operation 2 at the second time step. Unfortunately, deciding whether such a solution exists is NP-hard. Indeed, in Ref. [32], the authors proved the hardness of such a decision problem, even for single-capacity edges.

Therefore, it is reasonable to look for approximations of DQCC. To this aim, we think a good line of research would be to follow a common technique for tackling $\mathrm{MCF}_d$: the time-expansion [38]. Namely, a re-definition of the instance graph, from $Q$ to a new graph $Q^d$. Such a technique is useful because, instead of tackling $\mathrm{MCF}_d$ over $Q$, one can tackle its static version MCF over $Q^d$. Let us introduce it formally for our scenario. A time-expansion of $Q$ is a graph $Q^d = (P^d, E^d)$.
In the standard construction, an edge $(p_u, p_v) \in E$ taking discrete travel time $\theta$ would translate into directed edges $(p_u(\tau), p_v(\tau + \theta)), (p_v(\tau), p_u(\tau + \theta)) \in E^d$, with a shared constraint on the capacity. Nevertheless, edges in $Q$ are assumed to have null travel time. Hence, a time-expansion of $Q$ is particularly efficient, since one just needs to introduce a repetition of $Q$ for each time step $\tau$, which we refer to as $Q(\tau) = (P(\tau), E(\tau))$.

As a consequence, time-dependent sets $\mathbf{s}(\tau)$ and $\mathbf{t}(\tau)$ replace $\mathbf{s}$ and $\mathbf{t}$. We keep using $\mathbf{s}$ and $\mathbf{t}$ as the nodes encoding the commodities, non-localized in time. For each $i$ and $\tau$, we introduce edges $(s_i, s_i(\tau))$ and $(t_i(\tau), t_i)$, both with unit capacity. Since only integral flows are allowed and the demand is exactly 1, for any operation $i$, only one of the edges $\{(s_i, s_i(\tau))\}_\tau$, as well as only one in $\{(t_i(\tau), t_i)\}_\tau$, will have a non-zero flow.

Now that we gave a first intuitive way to encode the sources of the problem, let us optimize it. Notice that operation 1 can always run at time 1, and considering other options is a waste of time and space. As a consequence, for operation 1, we only introduce $(s_1, s_1(1))$ and $(t_1(1), t_1)$. This extends to any operation $i$, which can always run at a time between 1 and $\min\{i, d\}$, by assuming that a solution exists with time horizon $d$. Therefore, for each operation $i$, we introduce the sets of edges $\{(s_i, s_i(\tau)) : 1 \le \tau \le \min\{i, d\}\}$ and $\{(t_i(\tau), t_i) : 1 \le \tau \le \min\{i, d\}\}$. Fig. 17 shows the final graph for an instance $[k]$ with $k = 3$ and time horizon $d = 2$ on an architecture with 4 processors.

Fig. 17. Time-expanded graph of 4 processors, for an instance $[k]$ with $k = 3$ and time horizon $d = 2$.

As said, the time expansion $Q^d$ is a common way to tackle $\mathrm{MCF}_d$ as a static flow problem, and it is particularly efficient in our scenario.
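The construction just described can be sketched as a function that lists the edges of the time-expanded graph. This is an illustration under stated assumptions: the node encoding (tuples tagged "proc", "src" and "snk"), the function name `time_expand`, and the ring topology used in the usage note are all hypothetical, and each undirected link is stored once per time step.

```python
from typing import Dict, Hashable, List, Tuple

Node = Tuple[Hashable, ...]


def time_expand(
    links: Dict[Tuple[str, str], int],
    sources: List[str],
    sinks: List[str],
    horizon: int,
) -> Dict[Tuple[Node, Node], int]:
    """Return {edge: capacity} for the time expansion with null travel times.

    links maps an undirected link (p, q) to its capacity. sources[i - 1] and
    sinks[i - 1] are the control and target processors of operation i.
    """
    edges: Dict[Tuple[Node, Node], int] = {}
    # One copy of the architecture per time step, with no edges between copies.
    for tau in range(1, horizon + 1):
        for (p, q), capacity in links.items():
            edges[(("proc", p, tau), ("proc", q, tau))] = capacity
    # Operation i is attached only to the copies 1 .. min(i, horizon).
    for i, (s, t) in enumerate(zip(sources, sinks), start=1):
        for tau in range(1, min(i, horizon) + 1):
            edges[(("src", i), ("proc", s, tau))] = 1  # unit capacity
            edges[(("proc", t, tau), ("snk", i))] = 1  # unit capacity
    return edges
```

For a hypothetical ring of 4 processors with $k = 3$ and $d = 2$, the expansion has 8 link edges (4 per time step) plus 2 + 4 + 4 commodity edges, since operation 1 is attached to the first copy only.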
Specifically, we could model $Q^d$ by simply introducing $d$ repetitions of $Q$ and, especially, without the need for edges connecting different time steps $Q(\tau), Q(\bar\tau)$. Because of this result, we are also able to implement a time-expansion at a logical level, without actually allocating space for $d$ repetitions of $Q$. This is detailed in Sec. 6.3.

To the best of our knowledge, even if approximation algorithms for MCF [20, 85] and variants [16, 18, 19, 79, 84] have been extensively studied, there seems to be no proposal relatable to ours, modeling DQCC. More formally, no efficient reduction seems possible from our problem to standard formulations, while approximation algorithms proposed in the literature usually rely on LP-relaxation or on greedy criteria. These proposals do not guarantee that constraints (8) and (9) are satisfied. Hence, further studies along this line would be useful to (i) place the problem within its most proper complexity class and (ii) guarantee an approximation ratio.

6.2 Transformation to directed graph

Since the literature dealing with MCF usually assumes a directed graph, we here report a mapping method from an undirected graph to an equivalent one with directed edges. This brings just a constant overhead in space, while it does not affect any approximation factor a solver would rely on. Fig. 18 comes from Ref. [2]. It is a fast approach to map an undirected multi-commodity flow problem to a directed one. Specifically, for each pair of nodes $p_u, p_v$ connected by an edge with capacity $c$, one has to introduce two extra nodes, say $p'_u, p'_v$, and connect them with the directed edge $(p'_u, p'_v)$ of capacity $c$. The last step is creating directed cycles of infinite capacity, where the only bottleneck is $c$.

Fig. 18. Mapping from an undirected graph to a directed one, working for any multi-commodity flow problem. The transformation comes with a constant overhead in the number of nodes and edges.

6.3 Compilation through approximation

We already discussed in Sec. 4.4 how to tackle DQCC as a particular case of quickest multi-commodity flow. In this way we managed to reduce the problem to the resolution of one or more static instances of the MCF.

In Refs. [55, 56] it has been shown that whenever each commodity is a source (or a target) for any other node, then solving it through LP-relaxation outputs an optimal solution to MCF. This result can be of interest when treating fully entangling circuits.

To keep the compiler more general, we opted to investigate algorithms with a guaranteed approximation bound [57, 58, 69]. Specifically, we implemented the pseudo-code outlined in Ref. [33]. This is followed by a proof on the approximation quality for the cases of capacity $c = 1$ and $c > 1$. We focus on the case $c = 1$, but it can be extended to $c > 1$.

By using our formalism, the approximation algorithm aims to run as many non-local operators as possible, i.e., to satisfy as many commodity demands as possible. A computed solution is a sub-set $S \subseteq [k]$. The optimal solution is $S^* \subseteq [k]$, with $|S| \le |S^*|$. The (optimal) approximation bound follows [33, 69]:

$$|S| \ge \frac{|S^*|}{O(\sqrt{m})}, \qquad m = |E| \tag{12}$$

Notice that the guaranteed solution quality decreases as the number of entanglement links grows. It means that we cannot estimate an optimal solution to the DQCC, as for a given time horizon, this affects the quality of the solution space. Furthermore, the time-expansion increases the number of edges and so does the distance $|S^*| - |S|$.

Ultimately, even if the space allocated by the time-expansion grows at most linearly with the number of non-local operations (see Sec. 4.4), this can seriously affect the performance when such an amount is very big. On the contrary, it is possible to keep the time-expansion abstract and compile iteratively as many operations as possible at each time step. This method is detailed in Algorithm 2.
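The iterative scheme can be sketched as follows. This is a minimal sketch, not the implementation evaluated in the paper: `greedy_step` is a hypothetical stand-in for the approximation routine (it routes pending operations along edge-disjoint shortest paths, assuming unit capacities), and order relations are ignored, which matches circuits whose operations all commute.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Link = Tuple[str, str]


def bfs_path(adj, free, src, dst) -> Optional[List[FrozenSet[str]]]:
    """Shortest src-dst path using only links still in `free`, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while parent[node] is not None:
                path.append(frozenset((parent[node], node)))
                node = parent[node]
            return path
        for nxt in adj.get(node, []):
            if nxt not in parent and frozenset((node, nxt)) in free:
                parent[nxt] = node
                queue.append(nxt)
    return None


def greedy_step(links: List[Link], ops: List[Link], pending: Set[int]) -> Set[int]:
    """Hypothetical static solver: serve pending operations on disjoint paths."""
    adj: Dict[str, List[str]] = {}
    for p, q in links:
        adj.setdefault(p, []).append(q)
        adj.setdefault(q, []).append(p)
    free = {frozenset(link) for link in links}  # unit capacity per link
    served = set()
    for i in sorted(pending):
        path = bfs_path(adj, free, ops[i][0], ops[i][1])
        if path is not None:
            free -= set(path)  # the links are consumed for this time step
            served.add(i)
    return served


def iterative_compile(links: List[Link], ops: List[Link]) -> Tuple[int, Dict[int, int]]:
    """Serve as many operations as possible per time step until none is left."""
    pending = set(range(len(ops)))
    schedule: Dict[int, int] = {}
    depth = 0
    while pending:
        served = greedy_step(links, ops, pending)
        if not served:
            raise ValueError("some operation cannot be routed on this architecture")
        depth += 1
        for i in served:
            schedule[i] = depth
        pending -= served
    return depth, schedule
```

On a line of three processors, a long-range operation occupies both links at the first step, and the two short-range operations share the second step.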
Notice that each iteration guarantees the bound of equation (12) and, above all, since the instance decreases in size, the distance $|S^*| - |S|$ tends to decrease as well.

Algorithm 2: Iterative compiler
    Input: Q, [k]
    Output: d
    R ← [k]
    d ← 0
    while R ≠ ∅ do
        S ← MCF(Q, R)
        R ← R ∖ S
        d ← d + 1

Footnote: Better upper-bounds for the worst-case solution should be investigated.

7 EVALUATION

As distributed quantum architectures are still at an early stage, it is hard to predict with confidence what kind of connectivity and resources they will supply. Furthermore, it is worth mentioning that distributed computing, by its nature, presents features coming from routing models as well as compiling models. Hence, we here report and compare an interesting work available in the literature dedicated to routing entangled states [77]. In that manuscript, the authors deal with unreliable optical links to create entanglement and dynamically choose a multi-path solution in order to maximise the entanglement success rate. Even if our network topology relies on the same architecture, we model the linkage through a single path that is dedicated to the entanglement generation and distribution for a time $\Delta_E$, taken big enough to guarantee a high fidelity. This is a fundamental difference, making the two models difficult to compare.

Here we evaluate the square lattice topology proposed in Ref. [77] by comparing it with a hexagon lattice topology. We therefore verify the compiler performance for both lattices in terms of:
• solution quality;
• robustness to scale-up.
We conclude the comparison with the possible implications of the results.

7.1 Set-up

To compare the compiler performance on different topologies, we make use of a generator factor $g$. The number of nodes and edges of each lattice will be expressed as a function of $g$. Because the two lattices differ by definition, it is not trivial to settle a fair comparison.
To do that, we first generate a sample of hexagon lattices $\mathcal{H}$ such that

$$|V| = \tfrac{1}{2} g^{2} + 3g + O(1), \qquad |E| = \tfrac{3}{4} g^{2} + \tfrac{7}{2} g + O(1). \tag{13}$$

We compare $\mathcal{H}$ with two square lattices, say $\mathcal{S}_{\blacktriangledown}$ and $\mathcal{S}_{\blacktriangle}$, that have sizes respectively lower and higher than $\mathcal{H}$ for each $g$ (see Fig. 19). Hence, $\mathcal{S}_{\blacktriangledown}$ is such that

$$|V| = \tfrac{1}{4} g^{2} + \tfrac{3}{2} g + O(1), \qquad |E| = \tfrac{1}{2} g^{2} + 2g + O(1), \tag{14}$$

while $\mathcal{S}_{\blacktriangle}$ is such that

$$|V| = g^{2} + 2g + O(1), \qquad |E| = 2g^{2} + 2g. \tag{15}$$

We show in the next subsection that $\mathcal{S}_{\blacktriangle}$ and $\mathcal{S}_{\blacktriangledown}$ perform better than $\mathcal{H}$ in terms of resulting E-depth. This suggests that the square lattice is a better design for distributed quantum computers, assuming that our compiler performs equally well on different topologies.

Fig. 19. Example of lattices used for the experimental evaluation: (a) square lattice $\mathcal{S}_{\blacktriangledown}$, (b) hexagon lattice $\mathcal{H}$, (c) square lattice $\mathcal{S}_{\blacktriangle}$. They all come from generator $g = 4$.

Since we use Algorithm 2, capacities are assumed to be 1. We already pointed out that such an algorithm can be extended to larger capacities. Notice that different node degrees imply different assumptions on the processor units. The hexagon lattice has node degree upper-bounded by 3 and lower-bounded by 2, which means that each unit has 2 to 3 communication qubits. Similarly, the square lattice has degree upper-bounded by 4. Hence, the communication qubits per unit are 2 to 4. Since our focus here is on distributed compilation, we will assume that each unit has 1 computation qubit. This is especially reasonable when considering that real implementations of distributed architectures may use most of their local resources as auxiliary qubits, meant to keep the computation fault-tolerant.

Concerning the lifetime of the entanglement, $\Delta_{\Phi}$, it starts once the operator E has succeeded in storing the state in the distributed system. While performing E is the hardest part, as it takes a long time Δ [48], once it succeeds the storage on matter qubits performs well [89]. For this reason, we can just assume that the coherence time is long enough to satisfy $\Delta_{\Phi} > 4 \cdot \Delta_{CZ}$, where the factor 4 is an upper bound for the node degrees of the lattices.

For the numerical evaluation we use a generating vector $\mathbf{g} = (1, 2, \ldots, 11)$. Hence, when the generator is fixed to 11, the size of $\mathcal{H}$ reaches $|V| = 96$ and $|E| = 131$, $\mathcal{S}_{\blacktriangledown}$ reaches $|V| = 49$ and $|E| = 84$, while $\mathcal{S}_{\blacktriangle}$ reaches $|V| = 144$ and $|E| = 264$. Ultimately, regarding the circuits, we have already discussed in Sec. 5.4 that from any Clifford circuit we can extract 3 separated sets of 2-qubit gates and focus on $\mathcal{L}_{CZ}$ circuits. For this reason, we here consider $\mathcal{L}_{RCZ}$ circuits. We generate three samples classified by their size (or number of occurring operators). Each sample is composed of 10 random circuits in order to average the results. The sizes of the samples are 256, 512 and 1024.

7.2 Results
To evaluate the results we used the MATLAB environment [72]. The employed machine is a MacBook Air (M1, 2020, 8GB RAM). The first result, shown in Fig. 20, is a comparison on the solution quality, a.k.a. the E-depth.

Fig. 20. Quality scale comparison: depth versus generator for (a) 256 RCZ, (b) 512 RCZ, (c) 1024 RCZ.

As anticipated, the plots show that a square lattice gives better solutions, for any problem size. We can relate this behavior to the edges-to-nodes ratio. Formally, let $r = |E| / |V|$ be such a ratio for a graph $\mathcal{Q}$. Then square lattices have ratio

$$\lim_{g \to \infty} r = 2. \tag{16}$$

Instead, hexagon lattices have a lower ratio:

$$\lim_{g \to \infty} r = \tfrac{3}{2}. \tag{17}$$

This suggests that the bigger the ratio, the better the solutions. The plots also show that the depth achieved by the different lattices may be ruled by the same polynomial function (up to some constant factor).
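The limits of the edges-to-nodes ratio can be checked numerically with a short Python sketch. It uses only the leading terms of the size formulas for the three lattices (the O(1) corrections are dropped, so the values differ slightly from the exact node and edge counts); the dictionary keys and function name are hypothetical.

```python
from fractions import Fraction

# Leading-order (nodes, edges) formulas as functions of the generator g.
# The O(1) terms of the size formulas are deliberately dropped.
LATTICES = {
    "hexagon": (
        lambda g: Fraction(1, 2) * g**2 + 3 * g,
        lambda g: Fraction(3, 4) * g**2 + Fraction(7, 2) * g,
    ),
    "square_small": (
        lambda g: Fraction(1, 4) * g**2 + Fraction(3, 2) * g,
        lambda g: Fraction(1, 2) * g**2 + 2 * g,
    ),
    "square_large": (
        lambda g: Fraction(g**2 + 2 * g),
        lambda g: Fraction(2 * g**2 + 2 * g),
    ),
}


def edge_node_ratio(name, g):
    """Edges-to-nodes ratio of the named lattice at generator g."""
    nodes, edges = LATTICES[name]
    return edges(g) / nodes(g)


# Usage: ratios at the largest generator used in the evaluation,
# and for a very large generator to approach the limits.
for name in LATTICES:
    print(name, float(edge_node_ratio(name, 11)), float(edge_node_ratio(name, 10**6)))
```

Already at generator 11 both square lattices have a larger ratio than the hexagon lattice, and for large generators the ratios approach 2 and 3/2 respectively.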
This is in line with the intuition that a more connected topology allows for shorter depth. Furthermore, we already mentioned in Sec. 6.3 that, even if the approximation algorithm depends on the number of edges, it is called as a subroutine that performs better and better at each iteration. All this may mean that the compiler converges to an optimal depth. On the contrary, if the compiler were affected by the number of edges, the functions should swap at some point, but we never observed such a phenomenon.

To conclude our evaluation, we took the average times for each sample. The results are shown in Fig. 21. Differently from what we got in the solution quality evaluation, where we noticed a similar behaviour for each architecture, the time scale gives new perspectives in the lattice comparison. In fact, $\mathcal{H}$ and $\mathcal{S}_{\blacktriangle}$ seem to need approximately the same time to compile any circuit, with $\mathcal{S}_{\blacktriangle}$ performing slightly worse, which is coherent with the size difference between the two. Instead, $\mathcal{S}_{\blacktriangledown}$ outperforms the other lattices. Furthermore, it seems to be more resistant to scale-up, as its scaling seems to follow a lower-degree function.

Fig. 21. Time scale comparison: seconds versus generator for (a) 256 RCZ, (b) 512 RCZ, (c) 1024 RCZ.

8 CONCLUSION
To conclude this manuscript, let us highlight the main benefits of our framework for treating DQCC, as well as the key findings.
(i) By expressing the problem as a quickest flow problem, we could give a formulation corresponding to a multi-commodity flow problem over fixed time. This approach fits our goals particularly well, because a quickest flow expresses the need to run a circuit as fast as possible, while a flow over fixed time brings a side interest into the minimization of resource usage, which is clearly a desideratum, but still secondary to the overall running-time.
(ii) Quasi-parallelism, represented by constraints (8) and (9), gives the possibility to consider a wider solution space. Quasi-parallelism is grounded on the idea of gathering logically sequenced telegates within the same time step, by means of an efficient circuit manipulation (see predicate $\mathcal{A}$).
(iii) We built our model step by step, rigorously explaining each step. The result is a highly modular work. For example, if one considers only circuits where all operations commute with each other, formulation (6) is enough and approximation bounds are available. Instead, when considering any circuit, one can easily shape the extra constraints of formulation (10). Consider, for example, the quasi-parallelism relation q, which we characterized as the predicate $\mathcal{A}$: by just extending the way $\mathcal{A}$ works, the space of good solutions gets larger.
(iv) Since we modeled the problem as a network flow problem, one can also exploit the huge related literature to get inspiration on how to tackle the problem.
(v) We deeply investigated the literature on quantum circuits and logic in order to tackle big groups of circuits with a form that fits well with the constraints coming from the architecture. This led us to focus on circuits expressed in normal forms. By tackling individual normal forms, the compiler can be tailored to a chosen form and take advantage of its properties. We started by outlining a normal form for Clifford circuits, up to one for universal circuits. From this step-by-step analysis of the circuit, we will be able to improve the compiler in future works, while at the same time being able to evaluate our model by means of a restricted group of circuits.
(vi) We applied our compiler on different topologies.
We focused on square and hexagon lattices and showed that square lattices outperform hexagon ones, both in terms of solution quality (E-depth) and running-time. We gave some perspective on why we obtained such results, showing that the edges-to-nodes ratio is a representative metric.

A ENTANGLEMENT SWAP GENERALIZATION
Within this section we show how to efficiently implement an entanglement path. In Sec. 3.3, we introduced the entanglement swap as a circuit of depth 5. We also claimed that such a depth is fixed when generalizing the entanglement swap to the entanglement path. To this aim, we give an inductive proof of such a statement, starting from the base case with an entanglement path of length 2.

Theorem 3. An entanglement path $\{P_{i_1}, P_{i_2}, \ldots, P_{i_n}\}$ has an implementation with depth 5.

Proof. Consider, as base case, that we want to create a path of length 2. Clearly, we could do that by just putting in strict sequence two entanglement swaps:

[Circuit diagram: two entanglement swaps in strict sequence.]

The colored operators are the only ones we are going to optimize, since the others are independent and no optimization can be applied. What follows is the base case for the induction:

[Circuit identity: on the left, the corrections $Z^{b_1}$, $Z^{b_3}$ and $X^{b_2}$, $X^{b_4}$; on the right, the merged corrections $Z^{b_1 \oplus b_3}$ and $X^{b_2 \oplus b_4}$.]

Specifically, the circuit on the right of the equation has post-processing composed of Z on the first qubit and X on the last qubit. Furthermore, the measurements are now independent from other operations. By assuming that such a shape is preserved in the inductive step, we show that this transformation can be applied to any length:

[Circuit identity: on the left, $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2(n-1)-1}}$ followed by $Z^{b_{2n-1}}$, and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2(n-1)}}$ followed by $X^{b_{2n}}$; on the right, the merged corrections $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n}}$.]

This proves that we can always consider an entanglement path $\{P_{i_1}, P_{i_2}, \ldots, P_{i_n}\}$ to have circuit depth 5. □

We just showed an efficient implementation for the entanglement path.
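The bookkeeping behind this proof is that Pauli corrections of the same type fold into a single correction whose exponent is the XOR of the corresponding measurement outcomes, because Z and X each square to the identity. The following Python sketch illustrates this under stated assumptions: the outcome list, its ordering (odd positions feed Z, even positions feed X) and all function names are hypothetical, chosen only to mirror the post-processing described above.

```python
from functools import reduce
from operator import xor

I2 = ((1, 0), (0, 1))
Z = ((1, 0), (0, -1))
X = ((0, 1), (1, 0))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def pauli_power(pauli, bit):
    """pauli**bit for a classical bit: the Pauli itself or the identity."""
    return pauli if bit else I2


def product_of_corrections(pauli, bits):
    """Apply the conditional corrections one after the other."""
    out = I2
    for bit in bits:
        out = matmul(pauli_power(pauli, bit), out)
    return out


def fold_corrections(outcomes):
    """Fold outcomes b_1..b_2n (index 0 holds b_1) into two exponents.

    Returns (z_exp, x_exp): XOR of the odd-indexed outcomes for Z on the
    first qubit, XOR of the even-indexed outcomes for X on the last qubit.
    """
    if len(outcomes) % 2 != 0:
        raise ValueError("expected an even number of measurement outcomes")
    z_exp = reduce(xor, outcomes[0::2], 0)
    x_exp = reduce(xor, outcomes[1::2], 0)
    return z_exp, x_exp


# Usage: six outcomes, i.e. three swaps along the path.
outcomes = [1, 1, 0, 1, 1, 1]
z_exp, x_exp = fold_corrections(outcomes)
```

Since the folded exponents are computed classically from the outcomes, the number of quantum correction layers does not grow with the path length, which is the point the proof relies on.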
Now we take one last step to exploit this result and perform a generalized remote operation efficiently.

Theorem 4. An RCX over an entanglement path $\{P_{i_1}, P_{i_2}, \ldots, P_{i_{n+2}}\}$ has depth 5.

Proof. Theorem 3 allows us to assume that, to perform a remote operation by using a path of length $n$, the computing qubits interact only with two communication qubits and depend only on the Pauli operations $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n}}$. We can further propagate such operations as follows:

[Circuit identity: on the left, $Z^{b_{2n+2}}$ together with $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1}}$, and $X^{b_{2n+1}}$ together with $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n}}$; on the right, the merged corrections $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1} \oplus b_{2n+2}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n} \oplus b_{2n+1}}$.]

In this way the measurements are independent and the depth of the circuit is not increased. □

REFERENCES
[1] Scott Aaronson and Daniel Gottesman. 2004. Improved simulation of stabilizer circuits. Physical Review A 70, 5 (2004), 052328.
[2] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. 1988. Network flows. (1988).
[3] Nitzan Akerman, Nir Navon, Shlomi Kotler, Yinnon Glickman, and Roee Ozeri. 2015. Universal gate-set for trapped-ion qubits using a narrow linewidth diode laser. New Journal of Physics 17, 11 (2015), 113060.
[4] Matthew Amy, Dmitri Maslov, and Michele Mosca. 2014. Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, 10 (2014), 1476–1489.
[5] Pablo Andres-Martinez and Chris Heunen. 2019. Automated distribution of quantum circuits via hypergraph partitioning. Physical Review A 100, 3 (2019), 032308.
[6] Miriam Backens. 2014. The ZX-calculus is complete for stabilizer quantum mechanics. New Journal of Physics 16, 9 (2014), 093021.
[7] CJ Ballance, TP Harty, NM Linke, and DM Lucas. 2014. High-fidelity two-qubit quantum logic gates using trapped calcium-43 ions. arXiv preprint arXiv:1406.5473 (2014).
[8] Robert Beals, Stephen Brierley, Oliver Gray, Aram W Harrow, Samuel Kutin, Noah Linden, Dan Shepherd, and Mark Stather. 2013. Efficient distributed quantum computing. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 469, 2153 (2013), 20120686.
[9] Kyle EC Booth, Minh Do, J Christopher Beck, Eleanor Rieffel, Davide Venturelli, and Jeremy Frank. 2018. Comparing and integrating constraint programming and temporal planning for quantum circuit compilation. In 28th international conference on automated planning and scheduling.
[10] Adi Botea, Akihiro Kishimoto, and Radu Marinescu. 2018. On the complexity of quantum circuit compilation. In Eleventh annual symposium on combinatorial search.
[11] Lukas Burgholzer, Sarah Schneider, and Robert Wille. 2022. Limiting the Search Space in Optimal Quantum Circuit Mapping. In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 466–471.
[12] Angela Sara Cacciapuoti, Marcello Caleffi, Francesco Tafuri, Francesco Saverio Cataliotti, Stefano Gherardini, and Giuseppe Bianchi. 2019. Quantum internet: networking challenges in distributed quantum computing. IEEE Network 34, 1 (2019), 137–143.
[13] Angela Sara Cacciapuoti, Marcello Caleffi, Rodney Van Meter, and Lajos Hanzo. 2020. When entanglement meets classical communications: Quantum teleportation for the quantum internet. IEEE Transactions on Communications 68, 6 (2020), 3808–3833.
[14] Titouan Carette, Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. 2021. Completeness of Graphical Languages for Mixed State Quantum Mechanics. ACM Transactions on Quantum Computing 2, 4 (2021), 1–28.
[15] Davide Castelvecchi. 2018. The quantum internet has arrived (and it hasn’t). Nature 554, 7690 (2018), 289–293.
[16] Amit Chakrabarti, Chandra Chekuri, Anupam Gupta, and Amit Kumar. 2007. Approximation algorithms for the unsplittable flow problem. Algorithmica 47, 1 (2007), 53–78.
[17] Kaushik Chakraborty, David Elkouss, Bruno Rijsman, and Stephanie Wehner. 2020. Entanglement distribution in a quantum network: A multicommodity flow-based approach. IEEE Transactions on Quantum Engineering 1 (2020), 1–21.
[18] Chandra Chekuri, Sanjeev Khanna, and Bruce Shepherd. 2004. The all-or-nothing multicommodity flow problem. In Proceedings of the 36th annual ACM symposium on theory of computing. 156–165.
[19] Chandra Chekuri, Sanjeev Khanna, and Bruce Shepherd. 2006. An $O(\sqrt{n})$ approximation and integrality gap for disjoint paths and unsplittable flow. Theory of computing 2, 1 (2006), 137–146.
[20] Dae-Sik Choi and In-Chan Choi. 2006. On the effectiveness of the linear programming relaxation of the 0-1 multi-commodity minimum cost network flow problem. In International Computing and Combinatorics Conference. Springer, 517–526.
[21] Claudio Cicconetti, Marco Conti, and Andrea Passarella. 2021. Request Scheduling in Quantum Networks. IEEE Transactions on Quantum Engineering 2 (2021), 2–17.
[22] Daniele Cuomo, Marcello Caleffi, and Angela Sara Cacciapuoti. 2020. Towards a distributed quantum computing ecosystem. IET Quantum Communication 1, 1 (2020), 3–8.
[23] Davood Dadkhah, Mariam Zomorodi, and Seyed Ebrahim Hosseini. 2021. A New Approach for Optimization of Distributed Quantum Circuits. International Journal of Theoretical Physics 60, 9 (2021), 3271–3285.
[24] Omid Daei, Keivan Navi, and Mariam Zomorodi. 2021. Improving the Teleportation Cost in Distributed Quantum Circuits Based on Commuting of Gates. International Journal of Theoretical Physics 60, 9 (2021), 3494–3513.
[25] Omid Daei, Keivan Navi, and Mariam Zomorodi-Moghadam. 2020. Optimized Quantum Circuit Partitioning. International Journal of Theoretical Physics 59, 12 (2020), 3804–3820.
[26] Zohreh Davarzani, Mariam Zomorodi-Moghadam, Mahboobeh Houshmand, and Mostafa Nouri-baygi. 2020. A dynamic programming approach for distributing quantum circuits by bipartite graphs. Quantum Information Processing 19, 10 (2020), 1–18.
[27] Jeroen Dehaene and Bart De Moor. 2003. Clifford group, stabilizer states, and linear and quadratic operations over GF(2). Physical Review A 68, 4 (2003), 042318.
[28] Ross Duncan. 2012. A graphical approach to measurement-based quantum computing. arXiv preprint arXiv:1203.6242 (2012).
[29] Ross Duncan, Aleks Kissinger, Simon Perdrix, and John Van De Wetering. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (2020), 279.
[30] Wolfgang Dür, Raphael Lamprecht, and Stefan Heusler. 2017. Towards a quantum internet. European Journal of Physics 38, 4 (2017).
[31] Andrew Eddins, Mario Motta, Tanvi P Gujarati, Sergey Bravyi, Antonio Mezzacapo, Charles Hadfield, and Sarah Sheldon. 2022. Doubling the size of quantum simulators by entanglement forging. PRX Quantum 3, 1 (2022), 010309.
[32] Shimon Even, Alon Itai, and Adi Shamir. 1975. On the complexity of time table and multi-commodity flow problems. In 16th Annual Symposium on Foundations of Computer Science. IEEE, 184–193.
[33] Li Fei. 2017. Multicommodity Flows and Disjoint Paths Problem. https://cs.gmu.edu/~lifei/teaching/cs684spring17/lec8.pdf.
[34] Davide Ferrari and Michele Amoretti. 2021. Noise-Adaptive Quantum Compilation Strategies Evaluated with Application-Motivated Benchmarks. arXiv preprint arXiv:2108.11874 (2021).
[35] Davide Ferrari, Angela Sara Cacciapuoti, Michele Amoretti, and Marcello Caleffi. 2021. Compiler Design for Distributed Quantum Computing. IEEE Transactions on Quantum Engineering 2 (2021), 1–20.
[36] Lisa Fleischer and Martin Skutella. 2002. The quickest multicommodity flow problem. In International Conference on Integer Programming and Combinatorial Optimization. Springer, 36–53.
[37] Lester R Ford Jr and D.R. Fulkerson. 1958. A suggested computation for Maximal Multi-Commodity Network Flows. Management Science 5, 1 (1958), 97.
[38] Lester R Ford Jr and Delbert Ray Fulkerson. 1958. Constructing maximal dynamic flows from static flows. Operations research 6, 3 (1958), 419–433.
[39] Ranjani G Sundaram, Himanshu Gupta, and CR Ramakrishnan. 2021. Efficient Distribution of Quantum Circuits. In 35th International Symposium on Distributed Computing. Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
[40] Jay Gambetta. 2022. Expanding the IBM Quantum roadmap to anticipate the future of quantum-centric supercomputing.
[41] Alysson Gold, JP Paquette, Anna Stockklauser, Matthew J Reagor, M Sohaib Alam, Andrew Bestwick, Nicolas Didier, Ani Nersisyan, Feyza Oruc, Armin Razavi, et al. 2021. Entanglement across separate silicon dies in a modular superconducting qubit device. npj Quantum Information 7, 1 (2021), 1–10.
[42] Daniel Gottesman. 1998. Theory of fault-tolerant quantum computation. Physical Review A 57, 1 (1998), 127.
[43] TP Harty, DTC Allcock, CJ Ballance, L Guidoni, HA Janacek, NM Linke, DN Stacey, and DM Lucas. 2014. High-fidelity preparation, gates, memory, and readout of a trapped-ion quantum bit. Physical review letters 113, 22 (2014), 220501.
[44] Luke E Heyfron and Earl T Campbell. 2018. An efficient quantum compiler that reduces T count. Quantum Science and Technology 4, 1 (2018), 015004.
[45] Stefan Hillmich, Alwin Zulehner, and Robert Wille. 2021. Exploiting quantum teleportation in quantum circuit mapping. In 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 792–797.
[46] Toshinari Itoko, Rudy Raymond, Takashi Imamichi, Atsushi Matsuo, and Andrew W Cross. 2019. Quantum circuit compilers using gate commutation rules. In Proceedings of the 24th Asia and South Pacific Design Automation Conference. 191–196.
[47] Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. 2018. A complete axiomatisation of the ZX-calculus for Clifford+T quantum mechanics. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science. 559–568.
[48] Norbert Kalb, Andreas A Reiserer, Peter C Humphreys, Jacob JW Bakermans, Sten J Kamerling, Naomi H Nickerson, Simon C Benjamin, Daniel J Twitchen, Matthew Markham, and Ronald Hanson. 2017. Entanglement distillation between solid-state quantum network nodes. Science 356, 6341 (2017), 928–932.
[49] Peter J Karalekas, Nikolas A Tezak, Eric C Peterson, Colm A Ryan, Marcus P da Silva, and Robert S Smith. 2020. A quantum-classical cloud platform optimized for variational hybrid algorithms. Quantum Science and Technology 5, 2 (2020), 024003.
[50] David Kielpinski, Chris Monroe, and David J Wineland. 2002. Architecture for a large-scale ion-trap quantum computer. Nature 417, 6890 (2002), 709–711.
[51] H Jeff Kimble. 2008. The quantum internet. Nature 453, 7198 (2008), 1023–1030.
[52] Aleks Kissinger and John van de Wetering. 2019. Reducing T-count with the ZX-calculus. arXiv preprint arXiv:1903.10477 (2019).
[53] Aleks Kissinger and John van de Wetering. 2020. PyZX: Large Scale Automated Diagrammatic Reasoning. In Proceedings 16th International Conference on Quantum Physics and Logic, Vol. 318. Open Publishing Association, 229–241.
[54] Aleksei Yur’evich Kitaev. 1997. Quantum computations: algorithms and error correction. Uspekhi Matematicheskikh Nauk 52, 6 (1997), 53–112.
[55] D Kleitman, A Martin-Löf, B Rothschild, and A Whinston. 1970. A matching theorem for graphs. Journal of Combinatorial Theory 8, 1 (1970), 104–114.
[56] Daniel J Kleitman. 1971. An algorithm for certain multi-commodity flow problems. Networks 1, 1 (1971), 75–90.
[57] Petr Kolman and Christian Scheideler. 2002. Improved bounds for the unsplittable flow problem. In SODA, Vol. 2. 184–193.
[58] Bernhard Korte and Jens Vygen. 2006. Multicommodity Flows and Edge-Disjoint Paths. In Combinatorial Optimization: Theory and Algorithms. Springer.
[59] Wojciech Kozlowski, Stephanie Wehner, Rodney Van Meter, Bruno Rijsman, Angela Sara Cacciapuoti, and Marcello Caleffi. 2021. Architectural Principles for a Quantum Internet. Internet-Draft draft-irtf-qirg-principles-03. Internet Engineering Task Force. Work in Progress.
[60] Stefan Krastanov, Hamza Raniwala, Jeffrey Holzgrafe, Kurt Jacobs, Marko Lončar, Matthew J Reagor, and Dirk R Englund. 2021. Optically Heralded Entanglement of Superconducting Systems in Quantum Networks. Physical Review Letters 127, 4 (2021), 040503.
[61] Gushu Li, Yufei Ding, and Yuan Xie. 2019. Tackling the qubit mapping problem for NISQ-era quantum devices. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 1001–1014.
[62] Ying Li. 2015. A magic state’s fidelity can be superior to the operations that created it. New Journal of Physics 17, 2 (2015), 023037.
[63] Maokai Lin and Patrick Jaillet. 2014. On the quickest flow problem in dynamic networks – A parametric min-cost flow approach. In Proceedings of the 26th annual ACM-SIAM symposium on discrete algorithms. SIAM, 1343–1356.
[64] Norbert M Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt, Kevin A Landsman, Kenneth Wright, and Christopher Monroe. 2017. Experimental comparison of two quantum computing architectures. Proceedings of the National Academy of Sciences 114, 13 (2017), 3305–3310.
[65] Daniel Litinski. 2019. A game of surface codes: Large-scale quantum computing with lattice surgery. Quantum 3 (2019), 128.
[66] Yehan Liu, Zlatko Minev, Thomas G McConkey, and Jay Gambetta. 2022. Design of interacting superconducting quantum circuits with quasi-lumped models. In American Physical Society (March Meeting).
[67] Liam Madden and Andrea Simonetto. 2022. Best approximate quantum compiling problems. ACM Transactions on Quantum Computing 3, 2 (2022), 1–29.
[68] Marco Maronese, Lorenzo Moro, Lorenzo Rocutto, and Enrico Prati. 2022. Quantum compiling. In Quantum Computing Environments. Springer, 39–74.
[69] Maren Martens. 2009. A simple greedy algorithm for the k-disjoint flow problem. In International Conference on Theory and Applications of Models of Computation. Springer, 291–300.
[70] Dmitri Maslov, Sean M Falconer, and Michele Mosca. 2008. Quantum circuit placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27, 4 (2008), 752–763.
[71] Dmitri Maslov and Martin Roetteler. 2018. Shorter stabilizer circuits via Bruhat decomposition and quantum circuit transformations. IEEE Transactions on Information Theory 64, 7 (2018), 4729–4738.
[72] MATLAB. 2021. R2021b. The MathWorks Inc., Natick, Massachusetts.
[73] C Monroe, R Raussendorf, A Ruthven, KR Brown, P Maunz, L-M Duan, and J Kim. 2014. Large-scale modular quantum-computer architecture with atomic memory and photonic interconnects. Physical Review A 89, 2 (2014), 022317.
[74] Lorenzo Moro, Matteo GA Paris, Marcello Restelli, and Enrico Prati. 2021. Quantum Compiling by Deep Reinforcement Learning. Nature Communications Physics 4, 178 (2021).
[75] Michael A Nielsen and Isaac Chuang. 2002. Quantum computation and quantum information.
[76] Eesa Nikahd, Naser Mohammadzadeh, Mehdi Sedighi, and Morteza Saheb Zamani. 2021. Automated window-based partitioning of quantum circuits. Physica Scripta 96, 3 (2021), 035102.
[77] Mihir Pant, Hari Krovi, Don Towsley, Leandros Tassiulas, Liang Jiang, Prithwish Basu, Dirk Englund, and Saikat Guha. 2019. Routing entanglement in the quantum internet. npj Quantum Information 5, 1 (2019), 1–9.
[78] Stefano Pirandola and Samuel L Braunstein. 2016. Physics: Unite to build a quantum Internet. Nature News 532, 7598 (2016), 169.
[79] Julian Rabbie, Kaushik Chakraborty, Guus Avis, and Stephanie Wehner. 2022. Designing quantum networks using preexisting infrastructure. npj Quantum Information 8, 1 (2022), 1–12.
[80] Mohammad Beheshti Roui, Mariam Zomorodi, Masoomeh Sarvelayati, Moloud Abdar, Hamid Noori, Paweł Pławiak, Ryszard Tadeusiewicz, Xujuan Zhou, Abbas Khosravi, Saeid Nahavandi, et al. 2021. A novel approach based on genetic algorithm to speed up the discovery of classification rules on GPUs. Knowledge-Based Systems 231 (2021), 107419.
[81] Moein Sarvaghad-Moghaddam and Mariam Zomorodi. 2021. A general protocol for distributed quantum gates. Quantum Information Processing 20, 8 (2021), 1–14.
[82] Peter Selinger. 2013. Quantum circuits of T-depth one. Physical Review A 87, 4 (2013), 042302.
[83] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando Magno Quintão Pereira. 2018. Qubit allocation. In Proceedings of the 2018 International Symposium on Code Generation and Optimization. 113–125.
[84] Aravind Srinivasan. 1997. Improved approximations for edge-disjoint paths, unsplittable flow, and related routing problems. In Proceedings 38th Annual Symposium on Foundations of Computer Science. IEEE, 416–425.
[85] Anand Srivastav and Peter Stangier. 2000. On complexity, representation and approximation of integral multicommodity flows. Discrete Applied Mathematics 99, 1-3 (2000), 183–208.
[86] LJ Stephenson, DP Nadlinger, BC Nichol, S An, P Drmota, TG Ballance, K Thirumalai, JF Goodwin, DM Lucas, and CJ Ballance. 2020. High-rate, high-fidelity entanglement of qubits across an elementary quantum network. Physical review letters 124, 11 (2020), 110501.
[87] John van de Wetering. 2020. ZX-calculus for the working quantum computer scientist. arXiv preprint arXiv:2012.13966 (2020).
[88] Rodney Van Meter and Simon J Devitt. 2016. The path to scalable distributed quantum computing. Computer 49, 9 (2016), 31–42.
[89] Pengfei Wang, Chun-Yang Luan, Mu Qiao, Mark Um, Junhua Zhang, Ye Wang, Xiao Yuan, Mile Gu, Jingning Zhang, and Kihwan Kim. 2021. Single ion qubit with estimated coherence time exceeding one hour. Nature communications 12, 1 (2021), 1–8.
[90] Stephanie Wehner, David Elkouss, and Ronald Hanson. 2018. Quantum internet: A vision for the road ahead. Science 362, 6412 (2018).
[91] Robert Wille, Lukas Burgholzer, and Alwin Zulehner. 2019. Mapping quantum circuits to IBM QX architectures using the minimal number of SWAP and H operations. In 2019 56th ACM/IEEE Design Automation Conference. IEEE, 1–6.
[92] Mithuna Yoganathan, Richard Jozsa, and Sergii Strelchuk. 2019. Quantum advantage of unitary Clifford circuits with magic state inputs. Proceedings of the Royal Society A 475, 2225 (2019), 20180427.
[93] Yuan-Hang Zhang, Pei-Lin Zheng, Yi Zhang, and Dong-Ling Deng. 2020. Topological quantum compiling with reinforcement learning. Physical Review Letters 125, 17 (2020), 170501.
[94] Changchun Zhong, Zhixin Wang, Changling Zou, Mengzhen Zhang, Xu Han, Wei Fu, Mingrui Xu, S Shankar, Michel H Devoret, Hong X Tang, et al. 2020. Proposal for heralded generation and detection of entangled microwave–optical-photon pairs. Physical review letters 124, 1 (2020), 010511.
[95] Xinlan Zhou, Debbie W Leung, and Isaac L Chuang. 2000. Methodology for quantum logic gate construction. Physical Review A 62, 5 (2000), 052316.
[96] Mariam Zomorodi-Moghadam, Zohreh Davarzani, Ismail Ghodsollahee, et al. 2021. Connectivity matrix model of quantum circuits and its application to distributed quantum circuit optimization. Quantum Information Processing 20 (2021).
[97] Mariam Zomorodi-Moghadam, Mahboobeh Houshmand, and Monireh Houshmand. 2018. Optimizing teleportation cost in distributed quantum circuits. International Journal of Theoretical Physics 57, 3 (2018), 848–861.
[98] Alwin Zulehner and Robert Wille. 2019. Compiling SU(4) quantum circuits to IBM QX architectures. In Proceedings of the 24th Asia and South Pacific Design Automation Conference. 185–190.

ACM Transactions on Quantum Computing, Association for Computing Machinery

Publisher
Association for Computing Machinery
Copyright
Copyright © 2023 Copyright held by the owner/author(s).
ISSN
2643-6809
eISSN
2643-6817
DOI
10.1145/3579367

Abstract

DANIELE CUOMO , Department of Physics, University of Naples Federico II ∗2 MARCELLO CALEFFI , DIETI, University of Naples Federico II KEVIN KRSULICH, IBM Quantum, T.J. Watson Research Center FILIPPO TRAMONTO , Kyndryl Italia Innovation Services GABRIELE AGLIARDI, Department of Physics, Politecnico di Milano and IBM Italia ENRICO PRATI , Department of Physics, Università degli Studi di Milano and IFN-CNR ∗2 ANGELA SARA CACCIAPUOTI , DIETI, University of Naples Federico II Practical distributed quantum computing requires the development of eicient compilers, able to make quantum circuits compatible with some given hardware constraints. This problem is known to be tough, even for local computing. Here, we address it on distributed architectures. As generally assumed in this telegates scenarior,epresent the fundamental remote (inter-processor) operations. Each telegate consists of several tasks: i) entanglement generation and distribution, ii) local operations, and iii) classical communications. Entanglement generations and distribution is an expensive resource, as it is time-consuming. To mitigate its impact, we model an optimization problem that combines running-time minimization with the usage of distributed entangled states. Speciically, we formulated the distributed compilation problem as a dynamic network low. To enhance the solution space, we extend the formulation, by introducing a predicate that manipulates the circuit given in input and parallelizes telegate tasks. To evaluate our framework, we split the problem into three sub-problems, and solve it by means of an approximation routine. Experiments demonstrate that the run-time is resistant to the problem size scaling. 
Moreover, we apply the proposed algorithm to compile circuits under different topologies, showing that topologies with a higher ratio between edges and nodes give rise to shallower circuits.

CCS Concepts: • Hardware → Quantum computation; • Computer systems organization → Distributed architectures; • Mathematics of computing → Network optimization.

Additional Key Words and Phrases: Quantum Circuit Compilation, Integer Linear Programming

Also with FLY, Future communications Laboratory. Also with CNIT, National Inter-university Consortium for Telecommunications. Also with IBM Client Innovation Center during his contribution to this work.

Authors' addresses: Daniele Cuomo, daniele.cuomo@unina.it, Department of Physics, University of Naples Federico II, Italy; Marcello Caleffi, marcello.caleffi@unina.it, DIETI, University of Naples Federico II, Via Claudio 62, Italy, 80126; Kevin Krsulich, kevin.krsulich@ibm.com, IBM Quantum, T.J. Watson Research Center, Yorktown Heights, New York, 10598; Filippo Tramonto, filippo.tramonto@gmail.com, Kyndryl Italia Innovation Services, Via Circonvallazione Idroscalo snc, Segrate (MI), Italy, 20090; Gabriele Agliardi, gabrielefrancesco.agliardi@polimi.it, Department of Physics, Politecnico di Milano, Piazza Leonardo da Vinci, Milano, 20133 and IBM Italia, Via Circonvallazione Idroscalo, Segrate (MI), 20090; Enrico Prati, enrico.prati@unimi.it, Department of Physics, Università degli Studi di Milano, Via Celoria 16, Milano, 20133 and IFN-CNR, Piazza Leonardo da Vinci 32, Milano, Italia, 20133; Angela Sara Cacciapuoti, angelasara.cacciapuoti@unina.it, DIETI, University of Naples Federico II, Via Claudio 62, Italy, 80126.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2023 Copyright held by the owner/author(s). 2643-6817/2023/1-ART https://doi.org/10.1145/3579367 ACM Trans. Quantum Comput.

1 INTRODUCTION

Distributed architectures are envisioned as a long-term solution to provide practical applications of quantum computing [12, 22, 40, 88]. The general trend [31, 40, 41, 50, 66, 86] shows a common belief in distributed (and quasi-distributed, or multi-core) architectures as physical substrate, allowing a modular and horizontal scale-up of computing resources, rather than relying on the vertical scale-up that comes from advances in a single piece of hardware.

On the flip side, linking distributed quantum processors raises several new challenges [12, 15, 22, 30, 51, 78, 90]. Here we consider the compilation problem, which is generally tough to solve, even on a single processor, and for which an NP-hardness proof is available [10]. An ever-growing literature offers a variety of proposals for local computing [9, 11, 34, 45, 46, 49, 61, 67, 68, 70, 74, 83, 91, 93, 98] and for distributed computing [8, 23–25, 35, 39, 76, 80, 81, 96, 97].

Even if quantum processors are already available, distributed architectures are at an early stage and must be discussed from several perspectives. A key concept is that of telegates as the fundamental inter-processor operations [22, 86, 88]. Each telegate can be decomposed into several tasks, which we group as follows: (i) the generation and distribution of entangled states among different processors, (ii) local operations and (iii) classical communications. Such tasks make the telegate an expensive resource, especially in terms of running time. As a consequence, they have a critical impact on the performance of the overall computation. In contrast to such a limit, telegates offer remarkable opportunities for parallelization.
In fact, much circuit manipulation is possible to keep computation independent from telegate tasks. Therefore, we aim to model an optimization problem that embeds such opportunities.

1.1 Contribution

Fig. 1. Manuscript overview. Blue blocks denote the steps in the problem modeling, scanned by blue arrows. Red blocks are the main ingredients to the entry blue blocks.

The overall objective of our work is to deeply analyse strategies to reduce the overhead caused by telegates, which are the main bottleneck in the computation on distributed architectures. Fig. 1 gives a step-by-step overview of the paper, with particular attention to the problem modeling. Refer to [60, 94] for the state of the art on experimental implementations.

Sec. 2 and 3 are devoted to detailing and justifying our assumptions. As computation model we consider quantum circuits with a universal operator set. The set is based on local operations and on telegates as fundamental inter-processor operations. Here, we optimize telegates to efficiently scale with inter-processor connectivity restrictions.

We move on by rigorously defining the problem (Sec. 4). To come up with our formulation we rely on a wide literature from the Operations Research field, dealing with network scenarios. Specifically, we notice several analogies between our problem and those on dynamic networks, especially the group of multi-commodity flow problems [16–21, 36–38, 79, 84, 85]. The resulting formulation is particularly remarkable, as it is suitable for run-time minimization together with the minimization of resource usage, as a side objective. In an early step, the formulation is deliberately abstract, as it relies on binary relations that are not fully characterized at this stage. We believe that this enhances the modularity of the work and its readability.
In fact, exploring the solution space requires performing costly circuit manipulations, which deserve a dedicated discussion. Nevertheless, right after the abstract description of the problem structure, we proceed with the full characterization of the aforementioned binary relations (Secs. 5.1 and 5.4). These relations define which circuit manipulations are feasible. At first, we use relations to model operations that can run in parallel, and in this context we introduce a relaxed version of parallelism, which we call quasi-parallelism. This relation is based on (automated) circuit manipulation which aims to gather telegates within the same time step. Sec. 5.1 contains a discussion on how to transform the graph, in order to adapt the model to the kind of circuit group one is tackling. After that, we relate all the operations to the partially ordered set induced by circuits expressed in normal forms (see Sec. 5.3).

We then describe our implementation (Sec. 6) and evaluate it by means of numerical experiments on different lattices (Sec. 7), showing that a square lattice gives rise to shallower circuits than a hexagonal lattice, and that the compiler is able to process square lattices faster. We relate such a result to the ratio between edges and nodes, which becomes an important index when choosing a topology for distributed quantum computation. Sec. 8 contains the summary of the findings and the conclusions.

2 DISTRIBUTED QUANTUM COMPUTING ESSENTIALS

In this section we describe the main elements featuring in a distributed quantum architecture. One can encode a quantum processor as a set of qubits and a set of sparse tuneable couplings among qubits. If two qubits are coupled, they can interact. We will refer to such couplings as local couplings, to emphasize that they belong to the same node in distributed architectures, as opposed to entanglement links, which are couplings between qubits in different processors.
As detailed in the next sub-section, two remote qubits coupled through an entanglement link cannot be used for computation: consequently, it is useful to classify qubits as either computation qubits or communication qubits. While computation qubits process information during the computation, the communication qubits couple distinct processors through the entanglement. Fig. 2 shows a toy architecture. The purple lines represent the couplings among distributed processors.

2.1 The entanglement link

To couple two processors, a communication protocol, known as entanglement generation and distribution [12, 13, 22], is necessary. We describe it here as three main steps (a similar classification is available in Refs. [12, 59]):
(1) generating a two-qubit maximally entangled state;
(2) distributing the state between different processors;
(3) storing the partial states in the communication qubits.
The two-qubit assumption in step (1) is general and can be extended to multi-qubit protocols. Step (2) implies communication; the interested reader can find in Ref. [13] three different protocols achieving the task.

When the protocol succeeds, the distributed qubits are correlated and can be exploited to perform non-local operations. For this reason we consider this correlation as a virtual link, which we refer to as entanglement link. Entanglement links extend the possible interactions to any distributed computation qubits. Specifically, since the communication qubits are locally coupled with computation qubits, with entanglement links one can perform operations between remote computation qubits, referred to as telegates. More details on the functioning of telegates are reported in Sec. 3.2. However, it is important to keep in mind that, to perform a remote operation, one has to measure the states stored in the communication qubits. As a consequence, an entanglement link is a depletable resource, assigned to a single remote operation.
After the measurement, a new round of entanglement generation and distribution takes place. We now give a mathematical description of a distributed architecture, in order to formally describe the functioning of telegates.

2.2 Mathematical description

So far, we presented the main elements occurring in a distributed quantum architecture, which we can now represent mathematically. Formally, let $\mathcal{N} = (V, P, E)$ be a network triple representing the architecture. $V = Q \cup C$ is a set of nodes describing qubits; it is the disjoint union of computation qubits $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$ and communication qubits $C = \{c_1, c_2, \ldots, c_{|C|}\}$. We can represent $n$ processors by partitioning $V$ into $P = \{P_1, P_2, \ldots, P_n\}$. Therefore, a sub-set $P_i$ characterizes a processor as its set of qubits/nodes. $E = L \cup R$ is a set of undirected edges. $L$ represents the local couplings, therefore

$$L \subseteq \bigcup_{i} P_i \times P_i .$$

Notice that there is no particular assumption on connectivity nor cardinality within processors. This keeps the treatment hardware-independent and allows for heterogeneous architectures. $R$ represents entanglement links. Since entanglement links connect only communication qubits, we introduce, for each processor, the set of those qubits only, i.e., $C_i = C \cap P_i$. Therefore, we have

$$R \subseteq \bigcup_{i,j:\, i \neq j} C_i \times C_j .$$

Fig. 2. Toy distributed quantum architecture with 3 processors.

Fig. 2 shows an exemplary architecture, with three processors in $P$, six computation qubits in $Q$, six communication qubits in $C$, three entanglement links in $R$ and ten local couplings in $L$. Concerning minimal assumptions, we only care about architectures actually able to perform any operation. This translates into the simple assumption that the network is connected.

3 OPERATORS

In the following, the gate model architecture of quantum computers is considered. There, a circuit describes a time-ordered quantum evolution as a sequence of quantum gates consisting of unitary operators.
The set of available operators depends on the physical implementations. The interested reader can find a discussion about how to achieve practical entanglement generation and distribution, via heralded protocols, in Ref. [59].

3.1 Computation operators

In order to achieve universal quantum computing, one may rely on a universal set of quantum logic gates capable of approximating any possible unitary operator. In the following, we consider a representative universal set of quantum gates, without loss of generality. A sufficient set for local universal quantum computing consists of the three operators {CX, H, T}, where CX is the conditioned bit-flip operator, H is the Hadamard operator and T is the $\pi/4$-phase shift. Indeed, with a polynomial number of repetitions of H and T one can approximate any unitary operator with arbitrary precision [54, 75]. Another sufficient set is {CZ, H, T}, where CZ is a conditioned phase-flip, thanks to the equivalence $\mathrm{CZ}_{i,j} \equiv \mathrm{H}_j\, \mathrm{CX}_{i,j}\, \mathrm{H}_j$. Nevertheless, for practical reasons that will be clear in Sec. 5.2, we find it convenient in the current paper to rely on the extended gate set {CX, CZ, H, T}. Other choices of universal sets are possible, such as those based on trapped ions in a cavity [3], suitable for quantum interfaces where the photonic state is transferred to the cavity mode, and then to the electronic state of the ion via laser pulses [30, 86].

3.2 Universal set

To extend the universality also to distributed architectures, we need at least one remote operator. Since in our gate set {CX, CZ, H, T} one gate acting on two qubits (namely CX or CZ) is sufficient, it is also enough to have one remote operator. In other words, w.l.o.g. we can show a protocol performing only a CX (or CZ) between remote computation qubits. To represent such a protocol we use the notation RCX (or RCZ).
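The equivalence between CZ and CX conjugated by Hadamard gates on the target, recalled in Sec. 3.1, can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`matmul`, `kron`, `close`) and the basis ordering |control, target⟩ are assumptions made here for illustration and are not part of the paper.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[S, S], [S, -S]]
# Assumed basis ordering |control, target>, control is the most significant bit.
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

H_target = kron(I2, H)                            # H acting on the target qubit only
candidate = matmul(H_target, matmul(CX, H_target))  # H_j CX_{i,j} H_j

print(close(candidate, CZ))
```

The check works block by block: when the control is 0 the target sees H·H, which is the identity, and when the control is 1 it sees H·X·H, which is Z.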
With the different nomenclature we highlight their physical difference. Specifically, CX represents a local gate, while RCX represents a sequence of operations that involves distant qubits. Therefore, in general, implementations of CX and RCX come with different fidelity, latency and required resources. The functioning of RCX is based on several fundamental steps, which we describe, in turn, by using operators.

The first operator models the entanglement link creation; we refer to it as E or, more explicitly, as $\mathrm{E}_{u,v}$. It sets qubits $c_u$ and $c_v$ to the maximally entangled state $|\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. The second operator models a measurement of a communication qubit $c_u$ over the computational basis. Namely, the measurement outputs a classical binary variable $b_u \in \{0, 1\}$. We refer to it as $\mathrm{M}_u$, with circuit component represented in Fig. 3.

Fig. 3. Circuit component representing a measurement $\mathrm{M}_u$.

Fig. 4 shows a possible realization of a generic $\mathrm{RCX}_{i,j}$. Here, there are two qubits $q_i, c_u \in P_a$ and two qubits $q_j, c_v \in P_b$. Let us separate the protocol into three different steps. The first one is the creation of the entanglement link between $c_u$ and $c_v$, i.e., applying $\mathrm{E}_{u,v}$. After that, the second step is the pre-processing: a few local operations occur and then qubits $c_u, c_v$ are measured, getting $b_u$ and $b_v$ respectively. The final step is the post-processing. The binary variables are used to assert whether further operations are required. Specifically, if $b_v = 1$, a Pauli Z operator applies to $q_i$ and, if $b_u = 1$, a Pauli X operator applies to $q_j$. This phase can be compactly referred to with the operators $\mathrm{Z}_i^{b_v}$, $\mathrm{X}_j^{b_u}$. Notice that $\mathrm{Z}_i^{b_v}$ is local to processor $P_a$ and $\mathrm{X}_j^{b_u}$ is local to $P_b$. But $\mathrm{Z}_i^{b_v}$ uses $b_v$ and $\mathrm{X}_j^{b_u}$ uses $b_u$. In other words, a cross classical communication occurs between $P_a$ and $P_b$.

Let us now look at some exemplary applications of $\mathrm{RCX}_{i,j}$ over the toy architecture of Fig. 2.
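Before turning to the examples, the three-step protocol just described (link creation, pre-processing with measurements, post-processing with cross-communicated corrections) can be verified on a small state-vector simulation. This is a sketch under stated assumptions rather than the authors' implementation: the qubit ordering, all function names, the specific local gates used in the pre-processing (a standard telegate construction) and the test input `psi` are choices made here, and the link creation E is modeled by a Hadamard followed by a CX on the two communication qubits.

```python
import math
from itertools import product

N = 4                           # assumed qubit order: q_i (control), c_u, c_v, q_j (target)
QI, CU, CV, QJ = 0, 1, 2, 3

def bit(index, q):
    """Value of qubit q in a basis index (qubit 0 is the most significant bit)."""
    return (index >> (N - 1 - q)) & 1

def flip(index, q):
    return index ^ (1 << (N - 1 - q))

def apply_h(state, q):
    s = 1 / math.sqrt(2)
    out = [0j] * len(state)
    for idx, amp in enumerate(state):
        out[idx] += (-s if bit(idx, q) else s) * amp   # component keeping the bit
        out[flip(idx, q)] += s * amp                   # component flipping the bit
    return out

def apply_x(state, q):
    out = [0j] * len(state)
    for idx, amp in enumerate(state):
        out[flip(idx, q)] = amp
    return out

def apply_z(state, q):
    return [-amp if bit(idx, q) else amp for idx, amp in enumerate(state)]

def apply_cx(state, control, target):
    out = [0j] * len(state)
    for idx, amp in enumerate(state):
        out[flip(idx, target) if bit(idx, control) else idx] = amp
    return out

def project(state, q, value):
    """Renormalized post-measurement state after observing `value` on qubit q."""
    kept = [amp if bit(idx, q) == value else 0j for idx, amp in enumerate(state)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in kept))
    return [a / norm for a in kept]

def rcx_outcome(psi, b_u, b_v):
    """Run the telegate on a two-qubit input for a fixed measurement record."""
    state = [0j] * 2 ** N
    for (x, y), amp in zip(product((0, 1), repeat=2), psi):
        state[(x << 3) | y] = amp                    # communication qubits start in |00>
    state = apply_cx(apply_h(state, CU), CU, CV)     # E: Bell pair on (c_u, c_v)
    state = apply_cx(state, QI, CU)                  # pre-processing, control side
    state = apply_cx(state, CV, QJ)                  # pre-processing, target side
    state = apply_h(state, CV)
    state = project(project(state, CU, b_u), CV, b_v)  # measurements M_u, M_v
    if b_u:
        state = apply_x(state, QJ)                   # X on the target uses b_u
    if b_v:
        state = apply_z(state, QI)                   # Z on the control uses b_v
    return [state[(x << 3) | (b_u << 2) | (b_v << 1) | y]
            for x, y in product((0, 1), repeat=2)]

def overlap(a, b):
    """Modulus of the inner product, insensitive to a global phase."""
    return abs(sum(complex(x).conjugate() * y for x, y in zip(a, b)))

psi = [0.5, 0.5j, -0.5, 0.5]                  # arbitrary normalized (entangled) input
expected = [psi[0], psi[1], psi[3], psi[2]]   # CX exchanges |10> and |11>
ok = all(overlap(expected, rcx_outcome(psi, b_u, b_v)) > 1 - 1e-9
         for b_u, b_v in product((0, 1), repeat=2))
print(ok)
```

For every one of the four measurement records the corrected state on the computation qubits matches CX applied to the input, up to a global sign, which is why the comparison uses the modulus of the overlap.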
Here and throughout the paper, when an operator is subscripted, we are denoting the qubits it is operating on, e.g., $\mathrm{CX}_{i,j}$ is a CX operator with control qubit $q_i$ and target qubit $q_j$.

Example 1. Assume one wants to run an RCX with control qubit $q_2$ and target $q_3$, i.e., $\mathrm{RCX}_{2,3}$. Just run the circuit in Fig. 4, with $i = 2$, $j = 3$, $u = 2$, $v = 4$.

Example 2. Now assume one wants to run $\mathrm{RCX}_{1,3}$. In this case we can still use the entanglement link between $c_2$ and $c_4$. However, qubit $q_1$ is not coupled with $c_2$. To use that link we need to swap the states stored in $q_1$ and $q_2$ before and after running CX.

What happens if one wants to run, say, $\mathrm{RCX}_{1,4}$? In such a case, the qubits belong to two processors having no entanglement link coupling them. There is a really efficient protocol to overcome this problem: it is called entanglement swap and we describe it in the next section.

3.3 The entanglement swap

As pointed out before, it might be the case that one wants to run an RCX operator between a couple of qubits belonging to processors with no entanglement link. Formally, let $P_i$ and $P_j$ be such processors, with $R \cap (C_i \times C_j) = \emptyset$. In the basic scenario, there exists an intermediate processor $P_k$ which has an entanglement link with both $P_i$ and $P_j$, say via four communication qubits such that $c_u \in C_i$, $c_v, c_w \in C_k$ and $c_x \in C_j$. As Fig. 5 shows, we exploit $P_k$ to entangle $c_u$ and $c_x$.

Fig. 4. Protocol performing an $\mathrm{RCX}_{i,j}$. From an operator point of view, this is equivalent to performing $\mathrm{CX}_{i,j}$. However, $q_i$ and $q_j$ belong to different processors and that is why we use the notation RCX.

Fig. 5. Entanglement swap protocol. This scenario has three processors $P_i$, $P_k$, $P_j$. $P_k$ has an entanglement link both with $P_i$ and with $P_j$, created respectively by $\mathrm{E}_{u,v}$ and $\mathrm{E}_{w,x}$. At the end of the protocol $c_u$ and $c_x$ are in the maximally entangled state $|\Phi^+\rangle$.
From an operator point of view, this is equivalent to performing $\mathrm{E}_{u,x}$.

The entanglement swap protocol can be generalized to an arbitrary sequence of intermediate processors. To this aim we introduce the concept of entanglement path.

3.3.1 The entanglement path. Coherently with the standard definition of path of a graph, an entanglement path is a sequence of entanglement links connecting two processors. Formally, an entanglement path is a sequence $\{P_{i_1}, P_{i_2}, \ldots, P_{i_m}\}$ of $m$ processors such that, for any $l$ with $1 \le l < m$, there is an entanglement link between $P_{i_l}$ and $P_{i_{l+1}}$. We can therefore entangle two communication qubits $c_u \in C_{i_1}$ and $c_x \in C_{i_m}$ by applying a generalization of the entanglement swap (shown in Appendix A) to $\{P_{i_1}, P_{i_2}, \ldots, P_{i_m}\}$. Since at the end of the protocol $c_u$ and $c_x$ are in the entangled state $|\Phi^+\rangle$, an entanglement path is a generalization of an entanglement link.

3.3.2 RCX with entanglement path. In our scenario, the purpose of applying entanglement swap is to perform RCX. For this reason it is interesting to note that we can combine the entanglement swap protocol with the protocol for RCX. The result is shown in Fig. 6. This result generalizes to every path, no matter the length (see Appx. A). We further discuss in the next section the latency implications coming from this result.

4 DISTRIBUTED QUANTUM CIRCUIT COMPILATION PROBLEM

Usually, in the literature dealing with compiler design [35, 46, 91, 98], a circuit is encoded as a set of layers. Formally, a layer is a set $\ell$ of independent operators, meaning that each operator in $\ell$ acts on a different collection of qubits. A circuit is an enumeration of layers $\mathcal{L} = \{\ell_1, \ell_2, \ldots, \ell_{|\mathcal{L}|}\}$, where the cardinality $|\mathcal{L}|$ is also commonly referred to as circuit depth.
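The layer encoding defined above can be illustrated with a short sketch that groups a gate list into layers of operators acting on disjoint qubits and reports the resulting depth. The greedy as-soon-as-possible rule, the `(name, qubits)` gate representation and the function name `to_layers` are assumptions chosen for this illustration; the paper does not prescribe how layers are obtained.

```python
def to_layers(gates):
    """Greedy as-soon-as-possible layering.

    gates: list of (name, qubits) tuples in program order, where qubits is a
    tuple of qubit indices. Returns a list of layers; within a layer every
    operator acts on a different collection of qubits.
    """
    layers = []
    last_layer = {}                      # qubit -> index of the last layer using it
    for name, qubits in gates:
        # First layer strictly after every layer that already touches these qubits.
        level = 1 + max((last_layer.get(q, -1) for q in qubits), default=-1)
        if level == len(layers):
            layers.append([])
        layers[level].append((name, qubits))
        for q in qubits:
            last_layer[q] = level
    return layers

# Hypothetical five-gate circuit on four qubits.
circuit = [("H", (0,)), ("CX", (0, 1)), ("T", (2,)), ("CX", (2, 3)), ("CX", (1, 2))]
layers = to_layers(circuit)
depth = len(layers)                      # the circuit depth |L|
print(depth, layers)
```

Here the last CX shares a qubit with both earlier CX operators, so it is pushed to a third layer, while H and T on different qubits share the first one.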
A quantum programmer writes a logical circuit, abstracting from the real architecture and assuming that qubits are fully connected, i.e., any couple of qubits can perform a CX operation directly. Such an abstraction holds also when stepping to distributed architectures (recall that, from a user perspective, CX ≡ RCX). However, NISQ architectures do not provide full coupling. As a consequence, there must be a software interface, namely a compiler, able to map an abstract circuit to an equivalent one that meets the real coupling. In general, such a mapping implies overhead in terms of circuit depth. Therefore, finding a mapping with minimum depth overhead is an optimization problem. We refer to it as the quantum circuit compilation problem (QCC), which is proved to be NP-hard [10]. Its version on distributed architectures, which we refer to as the distributed quantum circuit compilation problem (DQCC), is likely to be at least as hard as QCC. In fact, while in QCC we deal with local connectivity restrictions, in DQCC local connectivity stands alongside remote connectivity, i.e., the entanglement links, which is less dense than the local one. Furthermore, performing a remote operation is much more time-consuming than a local operation. Just consider that a remote operation relies on communication of both quantum and classical information. The above reasons make telegates the bottleneck in distributed computing. Therefore, they are worthy of dedicated analysis to minimize their impact.

Fig. 6. $\mathrm{RCX}_{q_1,q_2}$ with entanglement swap.

Notation          Description
$[k]$             An enumeration set $\{1, 2, \ldots, k\}$
$\mathsf{O}$      Font mainly used to denote operators
$\Delta_{\mathsf{O}}$   Time to run operator $\mathsf{O}$
$\mathcal{Q}$     Quotient graph
$\mathcal{L}$     Circuit encoding
$\mathcal{L}_{\mathsf{O}}$   Circuit where only $\mathsf{O}$ operators occur
$z$               Discrete time step
$\prec, \parallel$   Binary relations
$\mathcal{A}$     Predicate used to characterize $\parallel$
$b$               Boolean variable
$f$               Flow function
$P_i$             $i$-th quantum processor
$\mathbf{s}, \mathbf{t}$   Source and target vectors
$d$               Circuit depth

4.1 Objective function

To optimize a circuit, the first thing we need to do is to choose an objective function to rate the expected performance of a circuit. A common approach is to evaluate only those operators which are somehow a bottleneck to computation. Considering the gate set {CX, CZ, H, T}, in the context of fault-tolerant quantum computing [42], the bottleneck is the T operator [4, 82] since error correction protocols are designed for {H, CX}. Conversely, on current NISQ technologies, the bottleneck lies in the CX and CZ operators, which are noisier as they operate on two qubits. The relevant metric can either be the number of occurrences of the subject operator O, namely the O-count, or the number of layers containing O at least once, namely the O-depth. To rate a compiled circuit on distributed architectures, we do something along the lines of this latter approach. Specifically, the bottleneck is the RCX and the RCZ operators, and each RCX or RCZ implies one occurrence of E. Therefore, we will rate a circuit by means of its E-depth.

As a simple example of E-depth, consider an instance of the problem: a logical circuit where some RCX operators occur. Fig. 7 shows an exemplary one. Let us put ourselves in the worst-case scenario, i.e., all the four qubits belong to different processors. Consequently, all the two-qubit operators are RCX. Without considering the tasks which RCX relies on, there is not much optimization to do and the E-depth is 5.

Fig. 7. Exemplary logical circuit, expressed in the universal gate set {CX, CZ, H, T}.

4.2 Modeling the time domain

It should be clear that E is of central interest in our treatment.
In fact, we are also going to model the time by scanning it as E occurs. Specifically, notice that link generations among different couples of qubits are independent. For this reason we assume that all the possible links generate simultaneously and, as soon as all the states are measured, a new round of simultaneous generations begins.

Notes. Remote connectivity is less dense than local connectivity because the more communication qubits there are, the fewer computing resources are available. Assigning logical qubits to physical ones, i.e., qubit mapping, is another critical step for compilation and it deserves dedicated analysis [5, 26], which is out of the scope of this work.

Clearly, after a measurement M generates a boolean $b$, there is at least one post-processing operator that needs to wait for that boolean to arrive. Generally speaking, the longer the path, the more time $b$ takes to reach its destination. We need to account for that with a proper model. To this aim, we make some observations.

Remark. Consider a generic single-qubit unitary operator U. The time required to perform U is largely dominated by the travel time of $b$, whilst the actual time taken by U can be neglected. Furthermore, the travel of $b$ is independent from computation. Hence, we can compactly refer to the post-processing waiting time as $\Delta_b$.

A second observation is that the travel of $b$ is also independent of entanglement link creations, which we assume to take time $\Delta_{\mathsf{E}}$. It is also logical to assume $\Delta_b \lesssim \Delta_{\mathsf{E}}$ for the following reason: even if $b$ needs to cover a longer distance than the one covered by E, $b$ relies on classical technologies, which are far more efficient than entanglement generation and distribution protocols. For this reason, in our treatment we neglect $\Delta_b$, since it happens in parallel with $\Delta_{\mathsf{E}}$.

Stemming from this, we can model the time domain as a discrete set of steps $z \in \{1, 2, \ldots, d\}$, where $d$ is an unknown time horizon, which is also the E-depth.
At the beginning of each time step $z$, the whole set of entanglement links is available for telegates. Notice that most of the local operators are expected to run during the creation of the links, because of the following inequality:

$$\Delta_{\mathsf{E}} \gg \Delta_{\mathsf{CX}},\ \Delta_{\mathsf{CZ}},\ \Delta_{\mathsf{H}},\ \Delta_{\mathsf{T}}, \qquad (1)$$

where, for a generic operator O, $\Delta_{\mathsf{O}}$ is the time to run O. Therefore, since E is independent from local operators, we can always attempt to run these while E is running, and also while classical bits $b$ are traveling, as explained in Sec. 3.3.2.

4.3 Modeling the distributed architecture

In light of the above observations, it is reasonable and convenient to consider the whole processor as a network node, and define a function $c$ that provides the number of available links between two processors. Specifically, we first formalized a distributed architecture as the network graph $\mathcal{N} = (V, P, E)$ introduced in subsection 2.2; this step was important to understand the interior behavior of remote operations from a qubit perspective. However, now it is useful to re-state it in a more compact encoding, which highlights the main bottleneck of a distributed quantum architecture, the entanglement links. Formally speaking, we will consider a quotient graph of $\mathcal{N}$. To not further weigh down the formalism, we re-model the instance by considering as main nodes the processors, corresponding to an enumeration for the partition $P$, i.e., $P = \{P_1, P_2, \ldots, P_n\}$. All the entanglement links connecting the same couple of processors now collapse into a single edge with integer capacity $c$, describing how many parallel entanglement links the two processors supply. We refer to this set of edges as

$$F \subseteq \bigcup_{i,j:\, i \neq j} P_i \times P_j .$$

Fig. 8. Quotient graph derived from the toy network of Fig. 2. The processors become the nodes, the entanglement links between a couple of processors gather into one edge, with capacity equal to the number of original links.

Hence, the new undirected graph is $\mathcal{Q} = (P, F, c)$.
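The collapse of qubit-level entanglement links into capacitated processor-level edges can be sketched in a few lines. The function name `quotient_graph`, the dictionary-based encoding and the example instance below are hypothetical and chosen only for illustration; in particular, the instance is not the architecture of Fig. 2.

```python
from collections import Counter

def quotient_graph(processor_of, entanglement_links):
    """Collapse qubit-level entanglement links into processor-level capacities.

    processor_of: dict mapping each communication qubit to its processor label.
    entanglement_links: iterable of (qubit, qubit) pairs.
    Returns a Counter {frozenset({P_i, P_j}): capacity}, i.e. one undirected
    edge per linked couple of processors, weighted by the number of links.
    """
    capacity = Counter()
    for a, b in entanglement_links:
        pa, pb = processor_of[a], processor_of[b]
        if pa == pb:
            raise ValueError("an entanglement link must join two different processors")
        capacity[frozenset((pa, pb))] += 1
    return capacity

# Hypothetical instance: three processors, six communication qubits, three links.
processor_of = {"c1": "P1", "c2": "P1", "c3": "P2", "c4": "P2", "c5": "P2", "c6": "P3"}
links = [("c1", "c3"), ("c2", "c4"), ("c5", "c6")]
capacities = quotient_graph(processor_of, links)
print(dict(capacities))
```

Using a `frozenset` as the edge key makes the edge undirected, so a link listed as (c3, c1) would contribute to the same capacity as (c1, c3).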
With this reformulation a remote operation will refer to a control processor and a target processor, i.e., $\mathrm{RCX}_{i,j}$ with $P_i, P_j \in P$. In Fig. 8 we show the quotient graph related to the toy architecture of Fig. 2. Note that the design of a distributed quantum architecture can easily adapt to satisfy requirements coming from assumptions on classical technologies, since these are very advanced.

4.4 Single layer formulation

Consider a basic circuit expressed as the singleton $\mathcal{L} = \{\ell\}$. Assume that in $\ell$ there occur $k$ RCX operators. From a logical perspective, all the $k$ operators can run in parallel, by definition of layer. In other words, if the architecture connectivity had infinite capacity (i.e., $c(e) = \infty$, $\forall e \in F$) we could run $\mathcal{L}$ with E-depth 1, which is optimal. As the capacity values decrease, the optimal E-depth value grows, up to E-depth $k$ in the worst case.

Let us formulate an optimization problem for the single-layer case; we will introduce a generalization to any circuit in subsection 4.5. Specifically, the quickest multi-commodity flow [36] wraps this basic scenario. In brief, the goal is to find a flow over time which satisfies the constraints imposed by a set of so-called commodities, which are going to represent the RCX operators of a quantum circuit. The less time the flow takes, the better. To formalize this problem one can directly model an objective function that evaluates a flow by the time it takes. This is an approach employed in Ref. [63], but for a single commodity. Alternatively, the authors of Ref. [36] propose to start from a formulation of the multi-commodity flow problem over time $\mathrm{MCF}_d$, where $d$ is a given time horizon, namely a maximal number of time steps in which the flow is constrained. We prefer this latter way because dynamic flows like $\mathrm{MCF}_d$ have been deeply studied for a long time [37, 38]. Furthermore, even if this approach has an important drawback, explained at the end of this sub-section, it does not apply to our scenario.
4.4.1 Commodities. To formulate $\mathrm{MCF}_d$ (the choice of the letter $d$ highlights that the time horizon is going to be the E-depth), we first enumerate the occurrences of RCX in $\mathcal{L}$ as a set of commodities $[k] = \{1, 2, \ldots, k\}$. A set of source-sink node couples is associated with the commodities. To do that, let $\mathbf{s} = (s_1, s_2, \ldots, s_k)$ and $\mathbf{t} = (t_1, t_2, \ldots, t_k)$ be two vectors induced by the operators RCX in $\mathcal{L}$ such that

$$\mathrm{RCX}_{s_i, t_i} \in \ell \iff \exists\, i \in [k] : P_{s_i}, P_{t_i} \in P .$$

Namely, $P_{s_i}$ ($P_{t_i}$) is the processor where the control (target) qubit of operation $i$ occurs. We need to use vector notation to admit repetitions.

4.4.2 Decision variables. The decision variables of the optimization problem are the time-dependent functions $f_{i,z}(e) \in \{0, 1\}$, indicating the flow on edge $e \in F$ dedicated to operation $i \in [k]$ at time $z$. The function has a binary co-domain because an operation $i$ uses at most one entanglement link.

4.4.3 Constraints. As usual, the first constraint we introduce is the flow conservation constraint. Formally, $\forall i \in [k]$, $\forall z \in [d]$ and $\forall P_u \in P \setminus \{P_{s_i}, P_{t_i}\}$ the following holds:

$$\sum_{e \in \delta^-(P_u)} f_{i,z}(e) - \sum_{e \in \delta^+(P_u)} f_{i,z}(e) = 0 \qquad (2)$$

where $\delta^-, \delta^+ : P \to F$ are the standard functions outputting the set of entering and exiting edges of the input node, respectively.

Since a flow $f_{i,z}(e) = 1$ identifies the usage of an entanglement link in $e$ to perform $i$, we need to guarantee that the flow going through intermediate links of a path does not stop there. Conversely, whenever an end point of the path occurs in the control or target processor, i.e., $P_{s_i}$ or $P_{t_i}$, the operation demand (or commodity demand) constraint holds instead of the conservation constraint. Namely, $\forall i \in [k]$, this can be written as:

$$\sum_{e \in \delta^-(P_{s_i})} \sum_{z \in [d]} f_{i,z}(e) - \sum_{e \in \delta^+(P_{s_i})} \sum_{z \in [d]} f_{i,z}(e) = -1 \qquad (3)$$

$$\sum_{e \in \delta^-(P_{t_i})} \sum_{z \in [d]} f_{i,z}(e) - \sum_{e \in \delta^+(P_{t_i})} \sum_{z \in [d]} f_{i,z}(e) = +1 \qquad (4)$$
The above constraints explicitly request that a flow dedicated to $i$ reaches its target $P_{t_i}$, without exiting. Symmetrically, it leaves its control processor $P_{s_i}$ without returning. Notice that constraint (2) forces the operation demand to be satisfied within a single time step.

The last constraint ensures that, at any time step, the number of operations does not exceed the entanglement resources. Hence, $\forall e \in F$ and $\forall z \in [d]$, we introduce a capacity bound:

$$\sum_{i \in [k]} f_{i,z}(e) \le c(e) \qquad (5)$$

Ultimately, the objective function is the total flow $f = \sum_{e \in F} \sum_{i \in [k]} \sum_{z \in [d]} f_{i,z}(e)$.

By gathering the above equations, we obtain the Integer Linear Programming formulation (6), which models $\mathrm{MCF}_d$. A flow $f$ perfectly matches a set of entanglement paths used by the telegates.

$$
\begin{aligned}
\text{minimize}\quad & f = \sum_{e \in F} \sum_{i \in [k]} \sum_{z \in [d]} f_{i,z}(e) \\
\text{subject to}\quad & \sum_{e \in \delta^-(P_u)} f_{i,z}(e) - \sum_{e \in \delta^+(P_u)} f_{i,z}(e) = 0 && \forall i \in [k],\ \forall z \in [d],\ \forall P_u \in P \setminus \{P_{s_i}, P_{t_i}\}, \\
& \sum_{e \in \delta^-(P_{s_i})} \sum_{z \in [d]} f_{i,z}(e) - \sum_{e \in \delta^+(P_{s_i})} \sum_{z \in [d]} f_{i,z}(e) = -1 && \forall i \in [k], \\
& \sum_{e \in \delta^-(P_{t_i})} \sum_{z \in [d]} f_{i,z}(e) - \sum_{e \in \delta^+(P_{t_i})} \sum_{z \in [d]} f_{i,z}(e) = +1 && \forall i \in [k], \\
& \sum_{i \in [k]} f_{i,z}(e) \le c(e) && \forall e \in F,\ \forall z \in [d]
\end{aligned}
\qquad (6)
$$

Notice that solutions with cycles are in general feasible, but are senseless in our scenario. By expressing the problem as a minimization of $f$, a solver will avoid any cycle and will try to use as few entanglement links as possible.

Once a solver for $\mathrm{MCF}_d$ is defined, we just need to use it as proposed in Ref. [36], namely the solver occurs as a sub-routine within a binary search on the minimum time where a feasible solution exists. Since the search space is over time, the algorithm is, in general, pseudo-logarithmic. Specifically to our case, we already know that the worst solution is where all the operations run in sequence, i.e., E-depth equal to $k$. Therefore, the time horizon is upper-bounded by $k$ and the binary search makes $O(\log k)$ calls to the sub-routine. Algorithm 1 shows the steps. Notice that the algorithm makes use of an undetermined solver for $\mathrm{MCF}_d$. Since we are facing an NP-hard problem, this means that a real implementation would generally look for sub-optimal solutions.

Algorithm 1: Quickest multi-commodity flow
    Input: $\mathcal{Q}$, $[k]$
    Output: $f$
    1  $a \leftarrow 1$, $b \leftarrow k$
    2  while $a \le b$ do
    3      $d \leftarrow \lfloor (a + b)/2 \rfloor$
    4      $\bar{f} \leftarrow \mathrm{MCF}_d(\mathcal{Q}, [k])$
    5      if $\bar{f}$ is feasible then
    6          $f \leftarrow \bar{f}$
    7          $b \leftarrow d - 1$
    8      else
    9          $a \leftarrow d + 1$

Unfortunately, standard $\mathrm{MCF}_d$ cannot capture all the features of DQCC when $\mathcal{L} = \{\ell_1, \ell_2, \ldots, \ell_{|\mathcal{L}|}\}$; we need to consider that operations in $[k]$ are somehow related to each other by a logic determined by $\mathcal{L}$. Hence, in the following sub-section we are going to model such relations by introducing extra constraints.

4.5 Any layer formulation

As mentioned, the formulation we just gave is not enough to model the DQCC problem for any $\mathcal{L} = \{\ell_1, \ell_2, \ldots, \ell_{|\mathcal{L}|}\}$, because a circuit generally follows a logic which is related to the order of occurrence given by $\mathcal{L}$. Therefore, even if it might happen that two operations could run in any order, in general this is not true. One needs to define an order relation which is consistent with the logic of the circuit. From an optimization point of view, a critical matter is to choose an order relation that either wraps most of the good solutions or is prone to optimization algorithms. For this reason and for the sake of clarity, we here refer to a generic, irreflexive, order relation $\prec$ defined over $[k]$, without giving it a unique definition. Formally, for any $i, j \in [k]$, $j \prec i$ means that to run $i$ we need to ensure that $j$ already ran.

Fig. 9. RCX operators in logical conflict, as both $i$ and $j$ operate on the second qubit.

Starting from $\prec$, we can define a constraint to add to formulation (6). Namely, $\forall i \in [k]$, $\forall e \in \delta^-(P_{t_i})$ the following holds:

$$f_{i,z}(e) \le \min_{j \prec i} \sum_{\bar{z} < z} f_{j,\bar{z}}(\bar{e}) \qquad (7)$$

The right part of the inequality is a value in $\{0, 1\}$ and takes value 1 only if all the operations logically preceding $i$ already ran.
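Returning to Algorithm 1, the binary search over the time horizon can be sketched independently of the solver that is plugged in. In the sketch below the callable `solve_mcf` stands in for an $\mathrm{MCF}_d$ solver and is an assumption of this example, as are the names `quickest_flow` and `toy_solver`; the toy solver only covers the degenerate case of $k$ operations sharing one edge of fixed capacity, and the search relies on feasibility being monotone in the horizon.

```python
def quickest_flow(k, solve_mcf):
    """Binary search on the time horizon d in [1, k].

    solve_mcf(d) stands for an MCF_d solver (hypothetical interface): it returns
    a flow if one exists within d time steps, or None otherwise.
    Returns (d, flow) for the smallest feasible horizon, or None.
    """
    low, high = 1, k
    best = None
    while low <= high:
        d = (low + high) // 2
        flow = solve_mcf(d)
        if flow is not None:
            best = (d, flow)
            high = d - 1          # a feasible horizon: try to shrink it
        else:
            low = d + 1           # infeasible: more time steps are needed
    return best

def toy_solver(k, capacity):
    """Toy oracle: k operations between the same two processors, one shared edge."""
    def solve(d):
        if d * capacity < k:
            return None
        # Operation i is scheduled at time step i // capacity + 1.
        return {i: i // capacity + 1 for i in range(k)}
    return solve

horizon, schedule = quickest_flow(5, toy_solver(5, 2))
print(horizon, schedule)
```

With five operations and capacity two, the search probes horizons 3, 1 and 2 and settles on 3, which matches the intuition that each time step can host at most two telegates on the shared edge.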
Notice that constraint (7) can be linearized: bounding a variable from above by the minimum of several linear functions is equivalent to bounding it by each of them separately. Hence, (7) can be mapped to the set of independent constraints $f_{j,e}(t) \le \sum_{\bar t < t} f_{i,e}(\bar t)$, $\forall i : i \prec j$. The formulation now models DQCC. In the next section, however, we refine inequality (7) to get a better solution space.

5 ENHANCING PARALLELISM

Fig. 10. Example of how to achieve quasi-parallelism for two RCX in logical conflict.

As before, from an optimization point of view, we are interested in considering as many good solutions as possible. To this aim, we propose an approach that enlarges the space of good solutions. Specifically, we notice that even if two operations $i, j \in [n]$ are such that $i \prec j$, this does not necessarily mean that they must run at different time steps. They may indeed run at the same time step while still respecting the logic imposed by $\prec$. Consider the example from Fig. 9. Since operations $i$ and $j$ operate over a common qubit, they are in logical conflict. Hence, it is reasonable to think that $i \prec j$ should hold. However, when considering $i$ and $j$ in their extended form (i.e., where communication qubits are explicit), we notice that their logical conflict does not map over all the operations involved. As Fig. 10 shows, the left part of the equivalence is a naive implementation of $i$ followed by $j$, where the extended form completely inherits the logical conflict. Instead, the right part of the equivalence is far more efficient and is still an implementation of the circuit of Fig. 9. As a consequence, even if $i$ and $j$ are in logical conflict, they can run at the same time step. We refer to this property as quasi-parallelism. For this reason we introduce a new binary relation between operations in $[n]$, which we refer to with the intuitive symbol $\parallel$.
As before, we do not give here a unique definition of $\parallel$. Specifically, for any $i, j \in [n]$, we write $i \parallel j$ to mean that operations $i$ and $j$ can run at the same time step, but we do not fix a criterion to establish when $\parallel$ holds. Clearly, operations $i, j \in [n]$ which can run in full parallelism are a special case of quasi-parallelism, and $i \parallel j$ holds. We can now split constraint (7), by discriminating between operations which can run in quasi-parallelism and the ones which cannot. Formally, $\forall j \in [n]$, $\forall e \in \delta(s_j)$ we introduce two new constraints:

$$f_{j,e}(t) \le \min_{i \prec j \,\wedge\, i \nparallel j} \sum_{\bar t < t} f_{i,e}(\bar t) \qquad (8)$$

$$f_{j,e}(t) \le \min_{i \prec j \,\wedge\, i \parallel j} \sum_{\bar t \le t} f_{i,e}(\bar t) \qquad (9)$$

To sum up, we propose (10) as the Integer Linear Programming formulation of the DQCC problem. C is the set of constraints coming from the standard MCF formulation given in (6). In what follows we propose a characterization for relation $\parallel$.

$$\begin{aligned}
\text{minimize} \quad & F = \sum_{e \in E} \sum_{j \in [n]} \sum_{t \in [T]} f_{j,e}(t) \\
\text{subject to} \quad & \mathrm{C}, \\
& f_{j,e}(t) \le \min_{i \prec j \,\wedge\, i \nparallel j} \sum_{\bar t < t} f_{i,e}(\bar t) \quad \forall j \in [n],\ \forall e \in \delta(s_j),\ \forall t \in [T], \\
& f_{j,e}(t) \le \min_{i \prec j \,\wedge\, i \parallel j} \sum_{\bar t \le t} f_{i,e}(\bar t) \quad \forall j \in [n],\ \forall e \in \delta(s_j),\ \forall t \in [T]
\end{aligned} \qquad (10)$$

5.1 Characterization

Our goal is to model $\parallel$ so as to capture as many solutions as possible, while keeping them feasible on the hardware. With this in mind, we propose the following criterion: given any $i, j$, $i \parallel j$ holds whenever $i$ and $j$ can run within a certain "small enough" time lapse. Specifically, the time lapse depends on the coherence time of communication qubits, which are assumed to be much more affected by noise than computing qubits. Notice that, when two operations $i, j$ run in quasi-parallelism, the life-time of the employed communication qubits might grow. Therefore, we need to ensure that it does not exceed the coherence time of the entanglement.

Fig. 11. Three RCX operators in logical conflict.

Formally, let $\Delta_\Phi$ be the coherence time of the entanglement; hence, it runs from the moment E ends up to the beginning of the measurements M.
A complication arises from the fact that $\parallel$ is, in general, an intransitive relation. To understand why this is true, consider the circuit in Fig. 11. In such a scenario we are faced with multiple choices. Namely, running
(1) all of $i, j, k$ at different time steps;
(2) all of $i, j, k$ at the same time step;
(3) $i, j$ together and $k$ afterwards;
(4) $i$ only, followed by $j, k$ together.
Case (1) is not of interest, because it is the worst solution and no optimization applies. Case (2) is the best solution, but it is not necessarily feasible. In fact, for $\Delta_\Phi$ small enough, we are forced to split the operations, as in one of the cases (3) and (4). This explains the non-transitivity, since $i \parallel j$ and $j \parallel k$, but $i \nparallel k$. We still need to characterize $\parallel$; hence, we introduce a predicate method which aims to bring RCX operations closer to each other, so that quasi-parallelism is achievable.

5.2 A recursive predicate for the quasi-parallelism relation

As said above, we now introduce a method which verifies whether any two telegates can run in quasi-parallelism. This method, say $A(i, j, \Delta_\Phi)$, is a predicate, which is true whenever the operations in input can run in quasi-parallelism. We can finally characterize $\parallel$: $i \parallel j \iff A(i, j, \Delta_\Phi)$. $A$ works in a recursive fashion with three different scenarios as base case.

Fig. 12. Two independent RCX, i.e., $i$ and $j$, belonging to different layers.

Base case (i): given two operations $i, j$, if they belong to the same layer, clearly they can run in full parallelism; therefore $A(i, j, \Delta_\Phi)$ is true.
Base case (ii): similarly to (i), if $i, j$ belong to different layers and they are completely independent, $A(i, j, \Delta_\Phi)$ is true. The circuit of Fig. 12 gives an example with $i, j$ in contiguous layers.
Base case (iii): assume $i, j$ contiguous (i.e., in contiguous layers) and both operating on at least one common qubit.
We want to introduce, with this base case, the possibility that multiple operators may run simultaneously, as exemplified in Fig. 10. For this reason, algorithm $A$ considers all the operators involved in performing an RCX (recall the protocol from Fig. 4). Namely, $A$ pushes forward the post-processing of $i$ (i.e., the Pauli operations $Z^{b}$ or $X^{b}$, with $b$ a boolean measurement outcome) after the pre-processing of $j$ (i.e., the CX operations). One can do that by using the following transformation rules:
• $\mathrm{CX}(X^{b} \otimes I) \equiv (X^{b} \otimes X^{b})\,\mathrm{CX}$
• $\mathrm{CX}(I \otimes Z^{b}) \equiv (Z^{b} \otimes Z^{b})\,\mathrm{CX}$
• $\mathrm{CX}(I \otimes X^{b}) \equiv (I \otimes X^{b})\,\mathrm{CX}$
• $\mathrm{CX}(Z^{b} \otimes I) \equiv (Z^{b} \otimes I)\,\mathrm{CX}$
Similarly, when a CZ occurs, the following rules apply:
• $\mathrm{CZ}(X^{b} \otimes I) \equiv (X^{b} \otimes Z^{b})\,\mathrm{CZ}$
• $\mathrm{CZ}(Z^{b} \otimes I) \equiv (Z^{b} \otimes I)\,\mathrm{CZ}$
After the application of these rules, some post-processing operations might have been propagated also to communication qubits. Specifically, it may happen that an $X^{b}$ operation should precede a measurement. However, one can always reduce the depth of the circuit by sending $b$ to the target(s) of the measurement. This is indeed what happens in our first example (Fig. 10), where, instead of performing $X^{b_1}$ on the communication qubit, we opt to combine it with $X^{b_3}$, achieving a single operation $X^{b_1 \oplus b_3}$; see also Fig. 13 for a circuit representation. At the end of the circuit manipulation, the life-time of the communication qubits may have risen. If it does not exceed $\Delta_\Phi$, then $A(i, j, \Delta_\Phi)$ is true; otherwise, $A(i, j, \Delta_\Phi)$ is false.

Footnote (base case (ii)): two operations are completely independent when what $i$ does to its qubits does not affect the qubits $j$ operates on.

Fig. 13. Propagation of $X^{b}$.

Recursion: consider now the case where $i$ and $j$ are separated by a sequence of local operations $O_1, \dots, O_m$, assumed to be confined to the universal set $\{\mathrm{CX}, \mathrm{CZ}, H, T\}$. In this case, $A$ applies, recursively, transformations for both $i$ and $j$. Specifically, as long as possible, it pushes forward the post-processing of $i$ by using the former rules together with:
• $T Z^{b} \equiv Z^{b} T$
• $H X^{b} \equiv Z^{b} H$
In Fig. 13, the first wire no longer needs the information of $b$, whereas the second wire needs the information given by $b \oplus \bar b$. Notice that the measured $\bar b$ does not take the same value in the two cases.

Ultimately, as long as possible, $A$ pushes backward the pre-processing of $j$ by using the following standard rules, where $\mathrm{CX}_{u,v}$ denotes a CX with control $u$ and target $v$:
• $\mathrm{CX}(T \otimes I) \equiv (T \otimes I)\,\mathrm{CX}$
• $\mathrm{CZ}(T \otimes I) \equiv (T \otimes I)\,\mathrm{CZ}$
• $\mathrm{CZ}(I \otimes T) \equiv (I \otimes T)\,\mathrm{CZ}$
• $(\mathrm{CX}_{u,v} \otimes I)(I \otimes \mathrm{CX}_{w,v}) \equiv (I \otimes \mathrm{CX}_{w,v})(\mathrm{CX}_{u,v} \otimes I)$
• $(\mathrm{CX}_{v,u} \otimes I)(I \otimes \mathrm{CX}_{v,w}) \equiv (I \otimes \mathrm{CX}_{v,w})(\mathrm{CX}_{v,u} \otimes I)$
• $\mathrm{CX}_{u,v}(H \otimes H) \equiv (H \otimes H)\,\mathrm{CX}_{v,u}$
• $\mathrm{CZ}(I \otimes H) \equiv (I \otimes H)\,\mathrm{CX}$
• $\mathrm{CX}(I \otimes H) \equiv (I \otimes H)\,\mathrm{CZ}$
If $A$ manages to make the post-processing of $i$ and the pre-processing of $j$ contiguous, the validity check reduces to the base case scenario. Otherwise, $A(i, j, \Delta_\Phi)$ is false.

Fig. 14. An expansion, obtained by applying rules from $A$. In this example scenario, RCX and RCZ are interspersed with single-qubit local operators. Notice that boolean variables travel simultaneously. Hence, the assumption we made in Sec. 4.2 holds also for complex evaluations such as $Z^{b_1 \oplus b_4}$ and $X^{b_6} Z^{b_3}$.

Footnote (recursion case): operations $O_1, \dots, O_m$ belong to layers between the ones of $i$ and $j$.

So far, we defined $A$ only for $i, j$ without any other remote operation in between. Before generalizing the method to any $i$ and $j$, we prove that our definition of $A$ can be implemented so that it runs in polynomial time. We need this requirement to keep things tractable.

Theorem 1. $A$ has $O(m)$ complexity, with $m$ being the number of operations $A$ considers.

Proof. Assume there occur $m$ local operations, say $O_1, \dots, O_m$, between $i$ and $j$. If $A$ manages to push $i$ forward past $O_1$, it means that its post-processing runs after $O_1$ and it may only propagate vertically, over different qubits, by construction of the rule set. As a consequence, the depth of the circuit has not increased. Furthermore, the post-processing is still composed of Pauli operations of the kind $Z^{b}$ or $X^{b}$.
Hence, this holds for any $O_{\bar m}$ with $1 \le \bar m \le m$, and the recursion is upper-bounded by $O(m)$. Symmetrically, if $A$ manages to push $j$ backward past $O_m$, it means that the pre-processing can run before $O_m$. Also in this case, the depth has not increased and the pre-processing is still composed of two independent CX operations, again by construction of the rule set. Hence, this holds for any $O_{\bar m}$ with $1 \le \bar m \le m$, and the recursion is upper-bounded by $O(m)$. □

We can now move on to the general case. Formally, between $i$ and $j$ a remote operation $k$ may occur, which is also in logical conflict with both. For such a scenario, we just add a recursive rule to $A$. Namely, $A(i, j, \Delta_\Phi)$ holds if the following holds:

$$\exists \lambda \in [0, 1] : A(i, k, \lambda \cdot \Delta_\Phi) \wedge A(k, j, (1 - \lambda) \cdot \Delta_\Phi).$$

Take a moment to appreciate why this kind of recursion is feasible. One might think that the validity of $A(i, k, \lambda \cdot \Delta_\Phi)$ and that of $A(k, j, (1 - \lambda) \cdot \Delta_\Phi)$ are not independent, because they both operate on $k$. However, in the former call, $A$ evaluates the pre-processing of $k$, while, in the latter, it evaluates its post-processing. Therefore they can be evaluated independently.

Theorem 2. Generalized $A$ has $O(m)$ complexity, with $m$ being the number of operations $A$ considers.

Proof. Assume there occur $k_1, \dots, k_m$ between $i$ and $j$. For the purpose of the proof, let $m$ be a power of 2. $A(i, j, \Delta_\Phi)$ can choose any of the $k_1, \dots, k_m$ operations for the recursion. To keep symmetry, let $A(i, k_{m/2}, \lambda \cdot \Delta_\Phi)$ and $A(k_{m/2}, j, (1 - \lambda) \cdot \Delta_\Phi)$ be the recursive calls. Notice that the operations considered by $A(i, k_{m/2}, \lambda \cdot \Delta_\Phi)$ are $m/2$, as well as the ones considered by $A(k_{m/2}, j, (1 - \lambda) \cdot \Delta_\Phi)$. The result is a recursive binary tree of height $\log m$ and, therefore, $O(m)$ calls to $A$. The leaves correspond to the base case of the recursion, which is proved to be tractable in Theorem 1. □

Fig. 14 shows an example scenario where we used rules as in $A$, in addition to the first one of Fig. 10.
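The two-qubit and single-qubit rewriting rules used by the predicate can be checked numerically. The sketch below is only a sanity check under stated assumptions: it uses plain Python lists as matrices, takes the first tensor factor as the control of CX, fixes the boolean exponent to 1 (the case 0 is trivial), and the helper names (`matmul`, `kron`, `close`, `RULES`) are illustrative and not part of the paper.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def close(a, b, tol=1e-9):
    """Entry-wise comparison up to a numerical tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]      # control: first qubit
CX_REV = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control: second qubit
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

# Each entry maps a rule to its (left-hand side, right-hand side) matrices.
RULES = {
    "CX(X(x)I) = (X(x)X)CX": (matmul(CX, kron(X, I2)), matmul(kron(X, X), CX)),
    "CX(I(x)Z) = (Z(x)Z)CX": (matmul(CX, kron(I2, Z)), matmul(kron(Z, Z), CX)),
    "CX(I(x)X) = (I(x)X)CX": (matmul(CX, kron(I2, X)), matmul(kron(I2, X), CX)),
    "CX(Z(x)I) = (Z(x)I)CX": (matmul(CX, kron(Z, I2)), matmul(kron(Z, I2), CX)),
    "CZ(X(x)I) = (X(x)Z)CZ": (matmul(CZ, kron(X, I2)), matmul(kron(X, Z), CZ)),
    "CZ(Z(x)I) = (Z(x)I)CZ": (matmul(CZ, kron(Z, I2)), matmul(kron(Z, I2), CZ)),
    "TZ = ZT": (matmul(T, Z), matmul(Z, T)),
    "HX = ZH": (matmul(H, X), matmul(Z, H)),
    "CX(T(x)I) = (T(x)I)CX": (matmul(CX, kron(T, I2)), matmul(kron(T, I2), CX)),
    "CX(H(x)H) = (H(x)H)CX_rev": (matmul(CX, kron(H, H)), matmul(kron(H, H), CX_REV)),
    "CZ(I(x)H) = (I(x)H)CX": (matmul(CZ, kron(I2, H)), matmul(kron(I2, H), CX)),
    "CX(I(x)H) = (I(x)H)CZ": (matmul(CX, kron(I2, H)), matmul(kron(I2, H), CZ)),
}

all_rules_hold = all(close(lhs, rhs) for lhs, rhs in RULES.values())
```

All the listed identities hold exactly, without any global phase, which is why a plain entry-wise comparison suffices here.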
Clearly, our modular architecture is open to modifications or extensions of $A$, should future research highlight more refined requirements.

Remark. Notice that we managed to define $A$ so that it is independent of the connectivity of $Q$. This was possible thanks to the way we modeled telegates via efficient entanglement paths (see Appx. A). In other words, $A(i, j, \Delta_\Phi)$ works for any solver and regardless of the path the solver chooses to perform $i$ and $j$. As a consequence, the characterization of $A$, and therefore also of $\parallel$, is static and depends only on the logical circuit and on global factors, i.e., $\Delta_\Phi$. Furthermore, we may relate coherence time and entanglement link creation through $\Delta_E + \Delta_\Phi \approx \Delta_E$. As a consequence, whatever $\Delta_\Phi$ is, $A$ does not significantly affect the duration of each time step. This makes the E-depth a particularly good index for the running time of the overall computation.

5.3 The role of the Clifford group in distributed quantum computing

In our algorithm, we tried to postpone the post-processing as much as possible, to allow classical information to travel across remote computers in the meantime. An ideal result would be to push it to the end of the circuit: indeed, since the post-processing is made only of Pauli gates, if it were located at the end of the circuit, it could be efficiently replaced by a classical computation, removing the need for the quantum state to remain coherent while the information travels. We show in the next subsection that pushing the post-processing to the end is possible if the circuit belongs to a particular class, namely the Clifford group, generated by the operators $\{\mathrm{CX}, \mathrm{CZ}, H, T^2\}$ (or by the minimal sets $\{\mathrm{CX}, H, T^2\}$ and $\{\mathrm{CZ}, H, T^2\}$). Let us introduce here some basic facts about such a group. The interest in the Clifford group derives from the fact that it covers a wide spectrum of circuits, but does not include the complexities of the $T$ gate.
The Clifford group can also be efficiently simulated on a classical computer. We already discussed that the $T$ gate represents the most error-prone gate in the fault-tolerant context. On the other hand, it is known that the Clifford group together with the $T$ gate is universal [75]. For this reason, it makes sense to represent an arbitrary circuit in terms of a Clifford circuit plus as few $T$ gates as possible. This was attempted in the literature in two ways:
• decompose circuits, with the goal of minimizing the number of $T$ gate occurrences [4, 82];
• inject $T$ gates into a Clifford circuit, by means of state preparation [62, 92, 95].

Fig. 15. Example of $T$ gate injection.

A basic example of $T$ injection is shown in Fig. 15, where injection is performed through one auxiliary qubit, prepared in the state

$$|A\rangle = TH|0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right). \qquad (11)$$

Other facts about the Clifford group are worth reporting. Specifically, distributed architectures based on trapped ions [50, 73, 86] are well suited to work with state injection on Clifford circuits. Indeed, experimental results show that single-qubit gates can run with 99.9999% fidelity [43] and that CX (or CZ) operators can achieve a 99.9% fidelity [7]. Furthermore, the local connectivity for such a processor is complete [64]. This means that a $T$ injection would give a fidelity of $\sim 99.8997\%$, if prepared as in equation (11) and the circuit of Fig. 15, without the need for distillation or local routing. As a consequence, future architectures relying on entanglement generation and distribution are likely to supply some $T$ injection module too.

5.4 Circuit normal forms for the Clifford group and implications on the post-processing

As said at the beginning of Sec. 5.3, important benefits could be achieved by postponing the post-processing to the end of the circuit, where it can be computed classically. An attempt in this direction is available in Ref. [65], where the authors delay Pauli operations together with non-Pauli ones.
Instead, our approach is to show that the result can always be achieved on the Clifford group, by relying on normal forms [1, 27, 29, 71]. Such a form is particularly useful for distributed computing and, more generally, for measurement-based computation. It was shown [29] that any Clifford gate acting on a Pauli state can be represented in the normal form depicted in Fig. 16. This normal form is of practical interest as it can be obtained starting from any Clifford circuit, which is in general not in normal form. Such a result comes from the employment of a ZX-calculus reasoner, e.g. [53]. ZX-calculus [29, 87] is a graphical language that arose as an optimizer for quantum circuits and translates a quantum circuit into a ZX-diagram. The main difference between the diagram and the original circuit is that the former works with ZX-rules, which serve as a reasoning tool to generate a new circuit, equivalent to the original one. ZX-calculus was recently introduced in the literature, with the main objective of minimizing a circuit's gate-depth, and its potential is still being explored, raising increasing interest for its versatility. In fact, we use it here to perform architecture-compliant optimization. Let us describe the few tools and properties we need to benchmark our compiler, while the interested reader can refer to the bibliography for a more extended treatment.

Coming back to Fig. 16, we use a dedicated circuit symbol to express a generic Pauli state preparation and, similarly, another symbol to express a generic Pauli measurement. $L_O$ is a set of layers where only the operator $O$ occurs. For example, $L_{\mathrm{CZ}}$ encodes a circuit composed of CZ operators.

Fig. 16. Normal form coming from the ZX-rules applied in Ref. [29]. The layers appear in the order $L^{(1)}_{\mathrm{CZ}}$, $L_{\mathrm{CX}}$, $L_{H}$, $L^{(2)}_{\mathrm{CZ}}$.

The following remark is a consequence of dealing with Clifford circuits in normal form.

Remark.
While predicate $A$ is running, only Pauli and Hadamard operations contribute to its evaluation. Hence, all the post-processing operations can be pushed forward, up to the end of the circuit, and can be computed efficiently by a classical computer. Furthermore, since no post-processing occurs during quantum computation, the entanglement path length has a negligible impact.

The normal form suggests that the problem can be separated into three parts, corresponding to $L^{(1)}_{\mathrm{CZ}}$, $L_{\mathrm{CX}}$ and $L^{(2)}_{\mathrm{CZ}}$. For two of them, i.e., $L^{(1)}_{\mathrm{CZ}}$ and $L^{(2)}_{\mathrm{CZ}}$, the order relation is trivial (as all CZ operators commute), and therefore we can use any quickest multi-commodity flow solver to get a feasible compilation. On the contrary, the optimal characterization of the order relation for $L_{\mathrm{CX}}$ is a conceptually complex task. Indeed, a set of relations with minimal size may not be the best characterization from a practical point of view, if many of the relations involve remote qubits. The topic of optimal CX order relations deserves a dedicated analysis and is the subject of future work.

Let us emphasize the importance of $L_{\mathrm{CZ}}$ circuits, by pointing out some facts from Ref. [71]. The authors therein introduce the Boolean degrees of freedom as a way to count how many different algorithms can be implemented with a class of gates, and show that a generic $L_{\mathrm{CZ}}$ "has roughly half the number of the degrees of freedom" compared to a generic $L_{\mathrm{CX}}$, and roughly a quarter compared to the Clifford group. We validate our compiler performance by solving $L_{\mathrm{CZ}}$ circuits on different architectures in Sec. 7. So, being able to exploit normal forms to isolate two highly expressive blocks $L^{(1)}_{\mathrm{CZ}}$ and $L^{(2)}_{\mathrm{CZ}}$ that can be compiled without resorting to order relations is a very relevant result.

Before discussing the implementation details, let us make a final remark on ZX-calculus. We introduced it in the context of the Clifford group, but it is designed to work more broadly with any circuit [6, 14, 47, 52].
Therefore, we aim to expand our analysis in future works, by investigating normal forms for universal circuits. An interesting result in this sense is available in Ref. [44], where the authors show that a universal circuit can be split into three steps:
(1) the system is prepared in a non-Clifford state; this involves auxiliary qubits which will do the work of injecting non-Clifford phases, e.g. the $T$ gate;
(2) an $L_{\mathrm{CX}}$ circuit;
(3) a measurement-based sequence of Clifford operations (which can still be treated with ZX-calculus [28]).

Footnote: notice that $\mathrm{CZ}_{u,v} \equiv H_v\, \mathrm{CX}_{u,v}\, H_v$. Thus, we do not need to expand our assumptions on the gate set.

6 IMPLEMENTATION TECHNIQUES

6.1 Time-expansion

Formulation (10) is a particular case of $\mathrm{MCF}_T$, as it slightly departs from the standard formulation. As expected, the problem is still intractable. To understand that, consider this simple scenario: an instance $[n]$ with $n = 2$ such that $1 \parallel 2$. We can restate the problem as follows: decide whether there exists a solution at the first time step. If not, just put operation 2 at the second time step. Unfortunately, deciding whether such a solution exists is NP-hard. Indeed, in Ref. [32], the authors proved the hardness of such a decision problem, even for single-capacity edges. Therefore, it is reasonable to look for approximations of DQCC. To this aim, we think a good line of research would be to follow a common technique for tackling $\mathrm{MCF}_T$: the time-expansion [38], namely a re-definition of the instance graph, from $Q$ to a new graph $Q_T$. Such a technique is useful because, instead of tackling $\mathrm{MCF}_T$ over $Q$, one can tackle its static version MCF over $Q_T$. Let us introduce it formally for our scenario. A time-expansion of $Q$ is a graph $Q_T = (V_T, E_T)$.
According to this criterion, an edge $(v_a, v_b) \in E$ taking discrete travel time $\tau$ would translate into directed edges $(v_a(t), v_b(t + \tau)), (v_b(t), v_a(t + \tau)) \in E_T$, with a shared constraint on the capacity. Nevertheless, edges in $Q$ are assumed to have null travel time. Hence, a time-expansion of $Q$ is particularly efficient, since one just needs to introduce a repetition of $Q$ for each time-step $t$, which we refer to as $Q(t) = (V(t), E(t))$. As a consequence, time-dependent sets $\mathbf{s}(t)$ and $\mathbf{t}(t)$ replace $\mathbf{s}$ and $\mathbf{t}$. We keep using $\mathbf{s}$ and $\mathbf{t}$ as the nodes encoding the commodities, non-localized in time. For each $i$ and $t$, we introduce edges $(s_i, s_i(t))$ and $(t_i(t), t_i)$, both with unit capacity. Since only integral flows are allowed and the demand is exactly 1, for any operation $i$, only one of the edges $\{(s_i, s_i(t))\}_t$, as well as only one in $\{(t_i(t), t_i)\}_t$, will have a non-zero flow.

Now that we gave a first intuitive way to encode the sources of the problem, let us optimize it. Notice that operation 1 can always run at time 1, and considering other options is a waste of time and space. As a consequence, for operation 1, we only introduce $(s_1, s_1(1))$ and $(t_1(1), t_1)$. This extends to any operation $i$, which can always run at a time between 1 and $\min\{i, T\}$, by assuming that a solution exists with time horizon $T$. Therefore, for each operation $i$, we introduce the sets of edges $\{(s_i, s_i(t)) : \forall\, 1 \le t \le \min\{i, T\}\}$ and $\{(t_i(t), t_i) : \forall\, 1 \le t \le \min\{i, T\}\}$. Fig. 17 shows the final graph for an instance $[n]$ with $n = 3$ and time horizon $T = 2$ on an architecture with 4 processors.

Fig. 17. Time-expanded graph of 4 processors, for an instance $[n]$ with $n = 3$ and time horizon $T = 2$.

As said, the time expansion $Q_T$ is a common way to tackle $\mathrm{MCF}_T$ as a static flow problem and it is particularly efficient in our scenario.
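The construction above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the function name `time_expand`, the tuple encoding of time-indexed nodes and the ring topology in the usage example are all hypothetical. It builds one copy of the graph per time step, with no edges across time steps, and attaches each operation `i` to the copies of its endpoints only for steps 1 to min(i, T).

```python
def time_expand(edges, terminals, horizon):
    """Build the edge list of a time-expanded graph with zero travel time.

    edges:     list of undirected processor links (u, v), unit capacity.
    terminals: dict i -> (src, dst), the two processors of operation i,
               with operations numbered 1..n in logical order.
    horizon:   time horizon T.
    Node (u, t) stands for the copy of processor u at time step t;
    ("s", i) and ("t", i) are the time-independent terminals of operation i.
    """
    expanded = []
    for t in range(1, horizon + 1):
        # One independent copy Q(t) per time step.
        expanded += [((u, t), (v, t)) for (u, v) in edges]
    for i, (src, dst) in terminals.items():
        # Operation i can always be scheduled within the first min(i, T) steps.
        for t in range(1, min(i, horizon) + 1):
            expanded.append((("s", i), (src, t)))
            expanded.append(((dst, t), ("t", i)))
    return expanded


# Hypothetical instance: a ring of 4 processors, n = 3 operations, T = 2.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
ops = {1: (0, 2), 2: (1, 3), 3: (0, 1)}
expanded_edges = time_expand(ring, ops, horizon=2)
```

Since no edge connects different copies, the expansion can also be kept implicit and each time step solved on the original graph, which is the design choice the iterative approach relies on.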
Specifically, we could model $Q_T$ by simply introducing $T$ repetitions of $Q$ and, notably, without the need for edges connecting different time-steps $Q(t)$, $Q(\bar t)$. Because of this result, we are also able to implement a time-expansion at a logical level, without actually allocating space for $T$ repetitions of $Q$. This is detailed in Sec. 6.3.

To the best of our knowledge, even if approximation algorithms for MCF [20, 85] and variants [16, 18, 19, 79, 84] have been extensively studied, there seems to be no proposal comparable to ours, modeling DQCC. More formally, no efficient reduction seems possible from our problem to standard formulations, while approximation algorithms proposed in the literature usually rely on LP-relaxation or on greedy criteria. These proposals do not guarantee that constraints (8) and (9) are satisfied. Hence, further studies along this line would be useful to (i) place the problem within its most proper complexity class and (ii) guarantee an approximation ratio.

6.2 Transformation to directed graph

Since the literature dealing with MCF usually assumes a directed graph, we here report a mapping method from an undirected graph to an equivalent one with directed edges. This brings just a constant overhead in space, while it does not affect any approximation factor on which a solver would rely. Fig. 18 comes from [2]. It is a fast approach to map an undirected multi-commodity flow problem to a directed one. Specifically, for each pair of nodes $v_a, v_b$ connected by an edge with capacity $c$, one has to introduce two extra nodes, say $v'_a, v'_b$, and connect them with the directed edge $(v'_a, v'_b)$ of capacity $c$. The last step is creating directed cycles of infinite capacity, where the only bottleneck is $c$.

6.3 Compilation through approximation

We already discussed in Sec. 4.4 how to tackle DQCC as a particular case of quickest multi-commodity flow.
In this way we managed to reduce the problem to the resolution of one or more static instances of the MCF. In Refs. [55, 56] it has been shown that whenever each commodity is a source (or a target) for any other node, then solving it through LP-relaxation outputs an optimal solution to MCF. This result can be of interest when treating fully entangling circuits.

Fig. 18. Mapping from an undirected graph to a directed one, working for any multi-commodity flow problem. The transformation comes with a constant overhead in the number of nodes and edges.

To keep the compiler more general, we opted to investigate algorithms with a guaranteed approximation bound [57, 58, 69]. Specifically, we implemented the pseudo-code outlined in Ref. [33]. This is followed by a proof on the approximation quality for the cases of capacity $c = 1$ and $c > 1$. We focus on the case $c = 1$, but it can be extended to $c > 1$. By using our formalism, the approximation algorithm aims to run as many non-local operators as possible, i.e., to satisfy as many commodity demands as possible. A computed solution is a sub-set $S \subseteq [n]$. The optimal solution is $S^{*} \subseteq [n]$ and $|S| \le |S^{*}|$. The following (optimal) approximation bound holds [33, 69]:

$$|S| \ge \frac{|S^{*}|}{O(\sqrt{m})}, \qquad m = |E| \qquad (12)$$

Notice that the guaranteed solution quality degrades as the number of entanglement links grows. It means that we cannot estimate an optimal solution to DQCC, as, for a given time horizon, this affects the quality of the solution space. Furthermore, the time-expansion increases the number of edges and so does the distance $|S^{*}| - |S|$. Ultimately, even if the space allocated by the time-expansion grows at most linearly with the number of non-local operations (see Sec. 4.4), this can seriously affect the performance when such an amount is very big. On the contrary, it is possible to keep the time-expansion abstract and to compile iteratively as many operations as possible at each time-step. This method is detailed in Algorithm 2.
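The iterative scheme can be sketched as follows. This is a minimal illustration under explicit assumptions: the routine `route_edge_disjoint` is a simple greedy stand-in (breadth-first search over unused unit-capacity links) and is not the approximation algorithm of Ref. [33]; the function names and the adjacency-list input are hypothetical; and no order relation is enforced, so the sketch only applies to circuits whose operations all commute.

```python
from collections import deque


def route_edge_disjoint(adj, demands):
    """Greedy stand-in for a static MCF solver with unit capacities.

    adj:     dict node -> list of neighbouring nodes (undirected graph).
    demands: dict op -> (src, dst).
    Returns the operations that could be routed on edge-disjoint paths.
    """
    used = set()
    served = []
    for op, (src, dst) in demands.items():
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and frozenset((u, v)) not in used:
                    parent[v] = u
                    queue.append(v)
        if dst in parent:
            # Reserve every link on the path found for this operation.
            v = dst
            while parent[v] is not None:
                used.add(frozenset((parent[v], v)))
                v = parent[v]
            served.append(op)
    return served


def iterative_compile(adj, demands):
    """Assign a time step to every operation, one static instance per step."""
    remaining = dict(demands)
    schedule = {}
    step = 0
    while remaining:
        step += 1
        served = route_edge_disjoint(adj, remaining)
        if not served:
            raise ValueError("some operation cannot be routed on this topology")
        for op in served:
            schedule[op] = step
            del remaining[op]
    return schedule


# Hypothetical line of 4 processors and three remote operations.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
plan = iterative_compile(line, {1: (0, 3), 2: (1, 2), 3: (0, 1)})
```

The number of iterations plays the role of the resulting depth; each iteration works on a smaller instance than the previous one, and the graph is never replicated in memory.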
Algorithm 2: Iterative compiler
Input: $Q$, $[n]$
Output: $T$
1    $R \leftarrow [n]$
2    $T \leftarrow 0$
3    while $R \ne \emptyset$ do
4        $S \leftarrow \mathrm{MCF}(Q, R)$
5        $R \leftarrow R \setminus S$
6        $T \leftarrow T + 1$

Notice that each iteration guarantees the bound of equation (12) and, above all, since the instance decreases in size, the distance $|S^{*}| - |S|$ tends to decrease as well.

Footnote: better upper-bounds for the worst-case solution should be investigated.

7 EVALUATION

As distributed quantum architectures are still at an early stage, it is hard to predict with confidence what kind of connectivity and resources they will supply. Furthermore, it is worth mentioning that distributed computing, by its nature, presents features coming from routing models as well as compiling models. Hence, we here report and compare a work available in the literature dedicated to routing entangled states [77]. In that manuscript, the authors deal with unreliable optical links to create entanglement and dynamically choose a multi-path solution in order to maximise the entanglement success-rate. Even if our network topology relies on the same architecture, we model the linkage through a single path which is dedicated to the entanglement generation and distribution for a time $\Delta_E$, taken big enough to guarantee a high fidelity. This is a fundamental difference, making the two models difficult to compare. Here we evaluate the square lattice topology proposed in Ref. [77] by comparing it with a hexagon lattice topology. We therefore verify the compiler performance for both lattices in terms of:
• solution quality;
• robustness to scale-up.
We conclude the comparison with the possible implications of the results.

7.1 Set-up

To compare the compiler performance on different topologies, we make use of a generator factor $g$. The number of nodes and edges of each lattice will be expressed as a function of $g$. Because the two lattices differ by definition, it is not trivial to set up a fair comparison.
To do that, we first generate a sample of hexagon lattices $H$ such that

$$|V| = \tfrac{1}{2} g^2 + 3g + O(1), \qquad |E| = \tfrac{3}{4} g^2 + \tfrac{7}{2} g + O(1). \qquad (13)$$

We compare $H$ with two square lattices, say $S_{\blacktriangledown}$ and $S_{\blacktriangle}$, that have sizes respectively lower and higher than $H$ for each $g$ (see Fig. 19). Hence, $S_{\blacktriangledown}$ is such that

$$|V| = \tfrac{1}{4} g^2 + \tfrac{3}{2} g + O(1), \qquad |E| = \tfrac{1}{2} g^2 + 2g + O(1), \qquad (14)$$

while $S_{\blacktriangle}$ is such that

$$|V| = g^2 + 2g + O(1), \qquad |E| = 2g^2 + 2g. \qquad (15)$$

We show in the next subsection that $S_{\blacktriangle}$ and $S_{\blacktriangledown}$ perform better than $H$ in terms of resulting E-depth. This implies that the square lattice is a better design for distributed quantum computers, assuming that our compiler performs equally well on different topologies.

Fig. 19. Example of lattices used for the experimental evaluation; they all come from generator $g = 4$. (a) Square lattice $S_{\blacktriangledown}$. (b) Hexagon lattice $H$. (c) Square lattice $S_{\blacktriangle}$.

Since we use Algorithm 2, capacities are assumed to be 1. We already pointed out that such an algorithm can be extended to the case $c > 1$. Notice that different node degrees imply different assumptions on the processor units. The hexagon lattice has node degree upper-bounded by 3 and lower-bounded by 2, which means that each unit has 2 to 3 communication qubits. Similarly, the square lattice has degree upper-bounded by 4. Hence, the communication qubits per unit are 2 to 4. Since our focus here is on distributed compilation, we assume that each unit has 1 computation qubit. This is especially reasonable when considering that real implementations of distributed architectures may use most of their local resources as auxiliary qubits, meant to keep the computation fault-tolerant. Concerning the life-time of the entanglement $\Delta_\Phi$, this starts after the operator E has succeeded in storing the state in the distributed system. While performing E is the hardest part, as it takes a long time $\Delta_E$ [48], once it succeeds, the storage on matter qubits performs quite well [89].
For this reason, we can just assume that the coherence time is long enough to satisfy $\Delta_\Phi > 4 \cdot \Delta_{\mathrm{CZ}}$, where the factor 4 is an upper bound for the node degrees of the lattices. For the numerical evaluation we use a generating vector $\mathbf{g} = (1, 2, \dots, 11)$. Hence, when the generator is fixed to 11, the size of $H$ reaches $|V| = 96$ and $|E| = 131$, $S_{\blacktriangledown}$ reaches $|V| = 49$ and $|E| = 84$, while $S_{\blacktriangle}$ reaches $|V| = 144$ and $|E| = 264$. Finally, regarding the circuits, we have already discussed in Sec. 5.4 that from any Clifford circuit we can extract 3 separate sets of 2-qubit gates and focus on $L_{\mathrm{CZ}}$ circuits. For this reason, we here consider $L_{\mathrm{RCZ}}$ circuits. We generate three samples classified by their size (or number of occurring operators). Each sample is composed of 10 random circuits in order to average the results. The sizes of the samples are 256, 512 and 1024.

7.2 Results

To evaluate the results we used the matlab environment [72]. The employed architecture is a MacBook Air (M1, 2020, 8GB RAM). The first result, shown in Fig. 20, is a comparison on the solution quality, a.k.a. the E-depth.

Fig. 20. Quality scale comparison: depth versus generator for (a) 256 RCZ, (b) 512 RCZ, (c) 1024 RCZ.

As anticipated, the plots show that a square lattice gives better solutions, for any problem size. We can relate this behavior to the edges-to-nodes ratio. Formally, let $r = |E| / |V|$ be such a ratio for a graph $Q$. Then square lattices have ratio

$$\lim_{g \to \infty} r = 2. \qquad (16)$$

Instead, hexagon lattices have a lower ratio:

$$\lim_{g \to \infty} r = \tfrac{3}{2}. \qquad (17)$$

This suggests that the bigger the ratio, the better the solutions. The plots also show that the depth achieved by the different lattices may be ruled by the same polynomial function (up to some constant factor).
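The two limits can be reproduced from the leading terms of the lattice sizes. The sketch below is a small arithmetic check, assuming the node and edge counts as read from Eqs. (13)-(15) with the $O(1)$ terms dropped; the function names and the dictionary keys are illustrative.

```python
def lattice_sizes(g):
    """Approximate (|V|, |E|) for each lattice as a function of the generator g.

    The expressions are the leading terms of Eqs. (13)-(15); constant
    O(1) contributions are ignored, so small g values are only indicative.
    """
    return {
        "hexagon": (0.5 * g**2 + 3 * g, 0.75 * g**2 + 3.5 * g),
        "square_small": (0.25 * g**2 + 1.5 * g, 0.5 * g**2 + 2 * g),
        "square_large": (g**2 + 2 * g, 2 * g**2 + 2 * g),
    }


def edge_node_ratio(g):
    """Edges-to-nodes ratio |E| / |V| for each lattice."""
    return {name: e / v for name, (v, e) in lattice_sizes(g).items()}


ratios_at_11 = edge_node_ratio(11)
ratios_large = edge_node_ratio(10**6)
```

For growing `g` the hexagon ratio approaches 3/2 while both square ratios approach 2, in agreement with Eqs. (16) and (17).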
This is in line with the intuition that a more connected topology allows for shorter depth. Furthermore, we already mentioned in Sec. 6.3 that, even if the approximation algorithm depends on the number of edges, it is called as a subroutine that performs better and better at each iteration. All this may mean that the compiler converges to an optimal depth. On the contrary, if the compiler were affected by the number of edges, the curves should swap at some point, but we never observed such a phenomenon.

To conclude our evaluation, we took the average times for each sample. The results are shown in Fig. 21. Differently from what we got in the solution quality evaluation, where we noticed a similar behaviour for each architecture, the time scale gives new perspectives on the lattice comparison. In fact, $\mathcal{H}$ and $\mathcal{S}_{\blacktriangle}$ seem to need approximately the same time to compile any circuit, with $\mathcal{S}_{\blacktriangle}$ performing slightly worse, which is coherent with the size difference between the two. Instead, $\mathcal{S}_{\blacktriangledown}$ outperforms the other lattices. Furthermore, it seems to be more resistant to scale-up, as its scaling seems to follow a lower-degree function.

Fig. 21. Time scale comparison: seconds versus generator for the three lattices. (a) 256 RCZ. (b) 512 RCZ. (c) 1024 RCZ.

8 CONCLUSION

To conclude this manuscript, let us highlight the main benefits of our framework for treating DQCC, as well as the key findings.

(i) By expressing the problem as a quickest flow problem, we could give a formulation corresponding to a multi-commodity flow problem over fixed time. This approach fits our goals particularly well, because a quickest flow expresses the need to run a circuit as fast as possible, while a flow over fixed time brings a side interest in the minimization of resource usage, which is clearly a desideratum, but still secondary to the overall running time.
(ii) Quasi-parallelism, represented by constraints (8) and (9), gives the possibility to consider a wider solution space. Quasi-parallelism is grounded on the idea of gathering logically sequenced telegates within the same time step, by means of an efficient circuit manipulation (see predicate $\mathcal{A}$).

(iii) We built our model step by step, with each step rigorously explained. The result is a highly modular work. For example, if one considers only circuits where all operations commute with each other, formulation (6) is enough and approximation bounds are available. Instead, when considering any circuit, one can easily shape the extra constraints of formulation (10). Consider, for example, the quasi-parallelism relation q: we characterized it as the predicate $\mathcal{A}$. By just extending the way $\mathcal{A}$ works, the space of good solutions gets larger.

(iv) Since we modeled the problem as a network flow problem, one can also exploit the huge related literature to get inspiration on how to tackle the problem.

(v) We deeply investigated the literature on quantum circuits and logic in order to tackle big groups of circuits with a form that fits well with the constraints coming from the architecture. This led us to focus on circuits expressed in normal forms. By tackling individual normal forms, the compiler can be tailored to a chosen form and take advantage of the properties coming from that normal form. We started by outlining a normal form for Clifford circuits, up to one for universal circuits. From this step-by-step analysis of the circuit, we will be able to improve the compiler in future works, while at the same time being able to evaluate our model by means of a restricted group of circuits.

(vi) We applied our compiler on different topologies.
We focused on square and hexagon lattices and showed that square lattices outperform hexagon ones, both in terms of solution quality (E-depth) and running time. We gave some perspective on why we obtained such results, showing that the edges-to-nodes ratio is a representative metric.

A ENTANGLEMENT SWAP GENERALIZATION

Within this section we show how to efficiently implement an entanglement path. In Sec. 3.3, we introduced the entanglement swap as a circuit of depth 5. We also claimed that such a depth is fixed when generalizing the entanglement swap to the entanglement path. To this aim, we give an inductive proof of this statement, starting from the base case of an entanglement path of length 2. Throughout, $b_k \in \{0, 1\}$ denotes the outcome of the $k$-th measurement.

Theorem 3. An entanglement path $\{e_{i_1}, e_{i_2}, \dots, e_{i_n}\}$ has an implementation with depth 5.

Proof. Consider, as base case, that we want to create a path of length 2. Clearly, we could do that by just putting two entanglement swaps in strict sequence:

[Circuit diagram: two entanglement swaps in strict sequence, with Hadamard gates, measurements and outcome-controlled $Z$ and $X$ corrections.]

The colored operators are the only ones we are going to optimize, since the others are independent and no optimization can be applied. What follows is the base case for the induction:

[Circuit identity: the outcome-controlled corrections of the two swaps are merged into a single $Z^{b_1 \oplus b_3}$ and a single $X^{b_2 \oplus b_4}$.]

Specifically, the circuit on the right-hand side of the equation has post-processing composed of $Z$ on the first qubit and $X$ on the last qubit. Furthermore, the measurements are now independent from the other operations. By assuming that such a shape is preserved in the inductive step, we show that this transformation can be applied to any length:

[Circuit identity: the corrections $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2(n-1)-1}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2(n-1)}}$, followed by $Z^{b_{2n-1}}$ and $X^{b_{2n}}$, are merged into $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n}}$.]

This proves that we can always consider an entanglement path $\{e_{i_1}, e_{i_2}, \dots, e_{i_n}\}$ to have circuit depth 5. □

We just showed an efficient implementation for the entanglement path.
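The classical bookkeeping behind this proof can be sketched in a few lines: the exponent of the final Z correction is the XOR of the odd-indexed measurement outcomes, and the exponent of the final X correction is the XOR of the even-indexed ones. The snippet below is only an illustration of that parity argument, not a quantum simulation; the function names and the list-of-bits encoding of the outcomes are assumptions made for this example.

```python
from functools import reduce
from itertools import product
from operator import xor


def merged_corrections(outcomes):
    """Return (z_exponent, x_exponent) for a chain of entanglement swaps.

    outcomes[k - 1] is the k-th measurement outcome (0 or 1), so each swap
    contributes one odd-indexed and one even-indexed outcome.
    Hypothetical encoding, chosen for this sketch only.
    """
    if len(outcomes) % 2 != 0:
        raise ValueError("each swap contributes exactly two outcomes")
    z_exponent = reduce(xor, outcomes[0::2], 0)  # outcomes 1, 3, 5, ...
    x_exponent = reduce(xor, outcomes[1::2], 0)  # outcomes 2, 4, 6, ...
    return z_exponent, x_exponent


def sequential_corrections(outcomes):
    """Apply the corrections swap by swap, as in the unoptimized circuit."""
    z_exponent, x_exponent = 0, 0
    for k in range(0, len(outcomes), 2):
        z_exponent ^= outcomes[k]      # Z correction of this swap
        x_exponent ^= outcomes[k + 1]  # X correction of this swap
    return z_exponent, x_exponent


if __name__ == "__main__":
    # Exhaustive check for a chain of three swaps (six outcomes).
    for bits in product((0, 1), repeat=6):
        assert merged_corrections(list(bits)) == sequential_corrections(list(bits))
    print(merged_corrections([1, 0, 1, 1]))
```

Because the merged exponents depend only on parities, all measurements can be performed first and a single pair of Pauli corrections applied at the end, which is why the depth does not grow with the path length.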
Now we take one last step to exploit such a result and perform a generalized remote operation efficiently.

Theorem 4. An RCX over an entanglement path $\{e_{i_1}, e_{i_2}, \dots, e_{i_{n+2}}\}$ has depth 5.

Proof. Theorem 3 allows us to assume that, to perform a remote operation by using a path of length $n$, the computing qubits interact only with two communication qubits and depend only on the Pauli operations $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n}}$. We can further propagate such operations as follows:

[Circuit identity: the corrections $Z^{b_{2n+2}}$ and $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1}}$ are merged into $Z^{b_1 \oplus b_3 \oplus \cdots \oplus b_{2n-1} \oplus b_{2n+2}}$, and the corrections $X^{b_{2n+1}}$ and $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n}}$ are merged into $X^{b_2 \oplus b_4 \oplus \cdots \oplus b_{2n} \oplus b_{2n+1}}$.]

In this way the measurements are independent and the depth of the circuit is not increased. □

REFERENCES

[1] Scott Aaronson and Daniel Gottesman. 2004. Improved simulation of stabilizer circuits. Physical Review A 70, 5 (2004), 052328.
[2] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. 1988. Network flows. (1988).
[3] Nitzan Akerman, Nir Navon, Shlomi Kotler, Yinnon Glickman, and Roee Ozeri. 2015. Universal gate-set for trapped-ion qubits using a narrow linewidth diode laser. New Journal of Physics 17, 11 (2015), 113060.
[4] Matthew Amy, Dmitri Maslov, and Michele Mosca. 2014. Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, 10 (2014), 1476–1489.
[5] Pablo Andres-Martinez and Chris Heunen. 2019. Automated distribution of quantum circuits via hypergraph partitioning. Physical Review A 100, 3 (2019), 032308.
[6] Miriam Backens. 2014. The ZX-calculus is complete for stabilizer quantum mechanics. New Journal of Physics 16, 9 (2014), 093021.
[7] CJ Ballance, TP Harty, NM Linke, and DM Lucas. 2014. High-fidelity two-qubit quantum logic gates using trapped calcium-43 ions. arXiv preprint arXiv:1406.5473 (2014).
[8] Robert Beals, Stephen Brierley, Oliver Gray, Aram W Harrow, Samuel Kutin, Noah Linden, Dan Shepherd, and Mark Stather. 2013. Efficient distributed quantum computing. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 469, 2153 (2013), 20120686.
[9] Kyle EC Booth, Minh Do, J Christopher Beck, Eleanor Rieffel, Davide Venturelli, and Jeremy Frank. 2018. Comparing and integrating constraint programming and temporal planning for quantum circuit compilation. In 28th international conference on automated planning and scheduling.
[10] Adi Botea, Akihiro Kishimoto, and Radu Marinescu. 2018. On the complexity of quantum circuit compilation. In Eleventh annual symposium on combinatorial search.
[11] Lukas Burgholzer, Sarah Schneider, and Robert Wille. 2022. Limiting the Search Space in Optimal Quantum Circuit Mapping. In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 466–471.
[12] Angela Sara Cacciapuoti, Marcello Caleffi, Francesco Tafuri, Francesco Saverio Cataliotti, Stefano Gherardini, and Giuseppe Bianchi. 2019. Quantum internet: networking challenges in distributed quantum computing. IEEE Network 34, 1 (2019), 137–143.
[13] Angela Sara Cacciapuoti, Marcello Caleffi, Rodney Van Meter, and Lajos Hanzo. 2020. When entanglement meets classical communications: Quantum teleportation for the quantum internet. IEEE Transactions on Communications 68, 6 (2020), 3808–3833.
[14] Titouan Carette, Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. 2021. Completeness of Graphical Languages for Mixed State Quantum Mechanics. ACM Transactions on Quantum Computing 2, 4 (2021), 1–28.
[15] Davide Castelvecchi. 2018. The quantum internet has arrived (and it hasn't). Nature 554, 7690 (2018), 289–293.
[16] Amit Chakrabarti, Chandra Chekuri, Anupam Gupta, and Amit Kumar. 2007. Approximation algorithms for the unsplittable flow problem. Algorithmica 47, 1 (2007), 53–78.
[17] Kaushik Chakraborty, David Elkouss, Bruno Rijsman, and Stephanie Wehner. 2020. Entanglement distribution in a quantum network: A multicommodity flow-based approach. IEEE Transactions on Quantum Engineering 1 (2020), 1–21.
[18] Chandra Chekuri, Sanjeev Khanna, and Bruce Shepherd. 2004. The all-or-nothing multicommodity flow problem. In Proceedings of the 36th annual ACM symposium on theory of computing. 156–165.
[19] Chandra Chekuri, Sanjeev Khanna, and Bruce Shepherd. 2006. An $O(\sqrt{n})$ approximation and integrality gap for disjoint paths and unsplittable flow. Theory of Computing 2, 1 (2006), 137–146.
[20] Dae-Sik Choi and In-Chan Choi. 2006. On the effectiveness of the linear programming relaxation of the 0-1 multi-commodity minimum cost network flow problem. In International Computing and Combinatorics Conference. Springer, 517–526.
[21] Claudio Cicconetti, Marco Conti, and Andrea Passarella. 2021. Request Scheduling in Quantum Networks. IEEE Transactions on Quantum Engineering 2 (2021), 2–17.
[22] Daniele Cuomo, Marcello Caleffi, and Angela Sara Cacciapuoti. 2020. Towards a distributed quantum computing ecosystem. IET Quantum Communication 1, 1 (2020), 3–8.
[23] Davood Dadkhah, Mariam Zomorodi, and Seyed Ebrahim Hosseini. 2021. A New Approach for Optimization of Distributed Quantum Circuits. International Journal of Theoretical Physics 60, 9 (2021), 3271–3285.
[24] Omid Daei, Keivan Navi, and Mariam Zomorodi. 2021. Improving the Teleportation Cost in Distributed Quantum Circuits Based on Commuting of Gates. International Journal of Theoretical Physics 60, 9 (2021), 3494–3513.
[25] Omid Daei, Keivan Navi, and Mariam Zomorodi-Moghadam. 2020. Optimized Quantum Circuit Partitioning. International Journal of Theoretical Physics 59, 12 (2020), 3804–3820.
[26] Zohreh Davarzani, Mariam Zomorodi-Moghadam, Mahboobeh Houshmand, and Mostafa Nouri-baygi. 2020. A dynamic programming approach for distributing quantum circuits by bipartite graphs. Quantum
Information Processing 19, 10 (2020), 1–18.
[27] Jeroen Dehaene and Bart De Moor. 2003. Clifford group, stabilizer states, and linear and quadratic operations over GF(2). Physical Review A 68, 4 (2003), 042318.
[28] Ross Duncan. 2012. A graphical approach to measurement-based quantum computing. arXiv preprint arXiv:1203.6242 (2012).
[29] Ross Duncan, Aleks Kissinger, Simon Perdrix, and John Van De Wetering. 2020. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum 4 (2020), 279.
[30] Wolfgang Dür, Raphael Lamprecht, and Stefan Heusler. 2017. Towards a quantum internet. European Journal of Physics 38, 4 (2017),
[31] Andrew Eddins, Mario Motta, Tanvi P Gujarati, Sergey Bravyi, Antonio Mezzacapo, Charles Hadfield, and Sarah Sheldon. 2022. Doubling the size of quantum simulators by entanglement forging. PRX Quantum 3, 1 (2022), 010309.
[32] Shimon Even, Alon Itai, and Adi Shamir. 1975. On the complexity of time table and multi-commodity flow problems. In 16th Annual Symposium on Foundations of Computer Science. IEEE, 184–193.
[33] Li Fei. 2017. Multicommodity Flows and Disjoint Paths Problem. https://cs.gmu.edu/~lifei/teaching/cs684spring17/lec8.pdf.
[34] Davide Ferrari and Michele Amoretti. 2021. Noise-Adaptive Quantum Compilation Strategies Evaluated with Application-Motivated Benchmarks. arXiv preprint arXiv:2108.11874 (2021).
[35] Davide Ferrari, Angela Sara Cacciapuoti, Michele Amoretti, and Marcello Caleffi. 2021. Compiler Design for Distributed Quantum Computing. IEEE Transactions on Quantum Engineering 2 (2021), 1–20.
[36] Lisa Fleischer and Martin Skutella. 2002. The quickest multicommodity flow problem. In International Conference on Integer Programming and Combinatorial Optimization. Springer, 36–53.
[37] Lester R Ford Jr and D.R. Fulkerson. 1958. A suggested computation for Maximal Multi-Commodity Network Flows. Management Science 5, 1 (1958), 97.
[38] Lester R Ford Jr and Delbert Ray Fulkerson. 1958.
Constructing maximal dynamic flows from static flows. Operations Research 6, 3 (1958), 419–433.
[39] Ranjani G Sundaram, Himanshu Gupta, and CR Ramakrishnan. 2021. Efficient Distribution of Quantum Circuits. In 35th International Symposium on Distributed Computing. Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
[40] Jay Gambetta. 2022. Expanding the IBM Quantum roadmap to anticipate the future of quantum-centric supercomputing.
[41] Alysson Gold, JP Paquette, Anna Stockklauser, Matthew J Reagor, M Sohaib Alam, Andrew Bestwick, Nicolas Didier, Ani Nersisyan, Feyza Oruc, Armin Razavi, et al. 2021. Entanglement across separate silicon dies in a modular superconducting qubit device. npj Quantum Information 7, 1 (2021), 1–10.
[42] Daniel Gottesman. 1998. Theory of fault-tolerant quantum computation. Physical Review A 57, 1 (1998), 127.
[43] TP Harty, DTC Allcock, CJ Ballance, L Guidoni, HA Janacek, NM Linke, DN Stacey, and DM Lucas. 2014. High-fidelity preparation, gates, memory, and readout of a trapped-ion quantum bit. Physical Review Letters 113, 22 (2014), 220501.
[44] Luke E Heyfron and Earl T Campbell. 2018. An efficient quantum compiler that reduces T count. Quantum Science and Technology 4, 1 (2018), 015004.
[45] Stefan Hillmich, Alwin Zulehner, and Robert Wille. 2021. Exploiting quantum teleportation in quantum circuit mapping. In 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 792–797.
[46] Toshinari Itoko, Rudy Raymond, Takashi Imamichi, Atsushi Matsuo, and Andrew W Cross. 2019. Quantum circuit compilers using gate commutation rules. In Proceedings of the 24th Asia and South Pacific Design Automation Conference. 191–196.
[47] Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. 2018. A complete axiomatisation of the ZX-calculus for Clifford+T quantum mechanics. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science.
559–568.
[48] Norbert Kalb, Andreas A Reiserer, Peter C Humphreys, Jacob JW Bakermans, Sten J Kamerling, Naomi H Nickerson, Simon C Benjamin, Daniel J Twitchen, Matthew Markham, and Ronald Hanson. 2017. Entanglement distillation between solid-state quantum network nodes. Science 356, 6341 (2017), 928–932.
[49] Peter J Karalekas, Nikolas A Tezak, Eric C Peterson, Colm A Ryan, Marcus P da Silva, and Robert S Smith. 2020. A quantum-classical cloud platform optimized for variational hybrid algorithms. Quantum Science and Technology 5, 2 (2020), 024003.
[50] David Kielpinski, Chris Monroe, and David J Wineland. 2002. Architecture for a large-scale ion-trap quantum computer. Nature 417, 6890 (2002), 709–711.
[51] H Jeff Kimble. 2008. The quantum internet. Nature 453, 7198 (2008), 1023–1030.
[52] Aleks Kissinger and John van de Wetering. 2019. Reducing T-count with the ZX-calculus. arXiv preprint arXiv:1903.10477 (2019).
[53] Aleks Kissinger and John van de Wetering. 2020. PyZX: Large Scale Automated Diagrammatic Reasoning. In Proceedings 16th International Conference on Quantum Physics and Logic, Vol. 318. Open Publishing Association, 229–241.
[54] Aleksei Yur'evich Kitaev. 1997. Quantum computations: algorithms and error correction. Uspekhi Matematicheskikh Nauk 52, 6 (1997), 53–112.
[55] D Kleitman, A Martin-Löf, B Rothschild, and A Whinston. 1970. A matching theorem for graphs. Journal of Combinatorial Theory 8, 1 (1970), 104–114.
[56] Daniel J Kleitman. 1971. An algorithm for certain multi-commodity flow problems. Networks 1, 1 (1971), 75–90.
[57] Petr Kolman and Christian Scheideler. 2002. Improved bounds for the unsplittable flow problem. In SODA, Vol. 2. 184–193.
[58] Bernhard Korte and Jens Vygen. 2006. Multicommodity Flows and Edge-Disjoint Paths. In Combinatorial Optimization: Theory and Algorithms. Springer.
[59] Wojciech Kozlowski, Stephanie Wehner, Rodney Van Meter, Bruno Rijsman, Angela Sara Cacciapuoti, and Marcello Caleffi. 2021.
Architectural Principles for a Quantum Internet. Internet-Draft draft-irtf-qirg-principles-03. Internet Engineering Task Force. Work in Progress.
[60] Stefan Krastanov, Hamza Raniwala, Jeffrey Holzgrafe, Kurt Jacobs, Marko Lončar, Matthew J Reagor, and Dirk R Englund. 2021. Optically Heralded Entanglement of Superconducting Systems in Quantum Networks. Physical Review Letters 127, 4 (2021), 040503.
[61] Gushu Li, Yufei Ding, and Yuan Xie. 2019. Tackling the qubit mapping problem for NISQ-era quantum devices. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 1001–1014.
[62] Ying Li. 2015. A magic state's fidelity can be superior to the operations that created it. New Journal of Physics 17, 2 (2015), 023037.
[63] Maokai Lin and Patrick Jaillet. 2014. On the quickest flow problem in dynamic networks – A parametric min-cost flow approach. In Proceedings of the 26th annual ACM-SIAM symposium on discrete algorithms. SIAM, 1343–1356.
[64] Norbert M Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt, Kevin A Landsman, Kenneth Wright, and Christopher Monroe. 2017. Experimental comparison of two quantum computing architectures. Proceedings of the National Academy of Sciences 114, 13 (2017), 3305–3310.
[65] Daniel Litinski. 2019. A game of surface codes: Large-scale quantum computing with lattice surgery. Quantum 3 (2019), 128.
[66] Yehan Liu, Zlatko Minev, Thomas G McConkey, and Jay Gambetta. 2022. Design of interacting superconducting quantum circuits with quasi-lumped models. In American Physical Society (March Meeting).
[67] Liam Madden and Andrea Simonetto. 2022. Best approximate quantum compiling problems. ACM Transactions on Quantum Computing 3, 2 (2022), 1–29.
[68] Marco Maronese, Lorenzo Moro, Lorenzo Rocutto, and Enrico Prati. 2022. Quantum compiling. In Quantum Computing Environments. Springer, 39–74.
[69] Maren Martens. 2009.
A simple greedy algorithm for the k-disjoint flow problem. In International Conference on Theory and Applications of Models of Computation. Springer, 291–300.
[70] Dmitri Maslov, Sean M Falconer, and Michele Mosca. 2008. Quantum circuit placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27, 4 (2008), 752–763.
[71] Dmitri Maslov and Martin Roetteler. 2018. Shorter stabilizer circuits via Bruhat decomposition and quantum circuit transformations. IEEE Transactions on Information Theory 64, 7 (2018), 4729–4738.
[72] MATLAB. 2021. R2021b. The MathWorks Inc., Natick, Massachusetts.
[73] C Monroe, R Raussendorf, A Ruthven, KR Brown, P Maunz, L-M Duan, and J Kim. 2014. Large-scale modular quantum-computer architecture with atomic memory and photonic interconnects. Physical Review A 89, 2 (2014), 022317.
[74] Lorenzo Moro, Matteo GA Paris, Marcello Restelli, and Enrico Prati. 2021. Quantum Compiling by Deep Reinforcement Learning. Nature Communications Physics 4, 178 (2021).
[75] Michael A Nielsen and Isaac Chuang. 2002. Quantum computation and quantum information.
[76] Eesa Nikahd, Naser Mohammadzadeh, Mehdi Sedighi, and Morteza Saheb Zamani. 2021. Automated window-based partitioning of quantum circuits. Physica Scripta 96, 3 (2021), 035102.
[77] Mihir Pant, Hari Krovi, Don Towsley, Leandros Tassiulas, Liang Jiang, Prithwish Basu, Dirk Englund, and Saikat Guha. 2019. Routing entanglement in the quantum internet. npj Quantum Information 5, 1 (2019), 1–9.
[78] Stefano Pirandola and Samuel L Braunstein. 2016. Physics: Unite to build a quantum Internet. Nature News 532, 7598 (2016), 169.
[79] Julian Rabbie, Kaushik Chakraborty, Guus Avis, and Stephanie Wehner. 2022. Designing quantum networks using preexisting infrastructure. npj Quantum Information 8, 1 (2022), 1–12.
[80] Mohammad Beheshti Roui, Mariam Zomorodi, Masoomeh Sarvelayati, Moloud Abdar, Hamid Noori, Paweł Pławiak, Ryszard Tadeusiewicz, Xujuan Zhou, Abbas Khosravi, Saeid Nahavandi, et al. 2021. A novel approach based on genetic algorithm to speed up the discovery of classification rules on GPUs. Knowledge-Based Systems 231 (2021), 107419.
[81] Moein Sarvaghad-Moghaddam and Mariam Zomorodi. 2021. A general protocol for distributed quantum gates. Quantum Information Processing 20, 8 (2021), 1–14.
[82] Peter Selinger. 2013. Quantum circuits of T-depth one. Physical Review A 87, 4 (2013), 042302.
[83] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando Magno Quintão Pereira. 2018. Qubit allocation. In Proceedings of the 2018 International Symposium on Code Generation and Optimization. 113–125.
[84] Aravind Srinivasan. 1997. Improved approximations for edge-disjoint paths, unsplittable flow, and related routing problems. In Proceedings 38th Annual Symposium on Foundations of Computer Science. IEEE, 416–425.
[85] Anand Srivastav and Peter Stangier. 2000. On complexity, representation and approximation of integral multicommodity flows. Discrete Applied Mathematics 99, 1-3 (2000), 183–208.
[86] LJ Stephenson, DP Nadlinger, BC Nichol, S An, P Drmota, TG Ballance, K Thirumalai, JF Goodwin, DM Lucas, and CJ Ballance. 2020. High-rate, high-fidelity entanglement of qubits across an elementary quantum network. Physical Review Letters 124, 11 (2020), 110501.
[87] John van de Wetering. 2020. ZX-calculus for the working quantum computer scientist. arXiv preprint arXiv:2012.13966 (2020).
[88] Rodney Van Meter and Simon J Devitt. 2016. The path to scalable distributed quantum computing. Computer 49, 9 (2016), 31–42.
[89] Pengfei Wang, Chun-Yang Luan, Mu Qiao, Mark Um, Junhua Zhang, Ye Wang, Xiao Yuan, Mile Gu, Jingning Zhang, and Kihwan Kim. 2021. Single ion qubit with estimated coherence time exceeding one hour. Nature Communications 12, 1 (2021), 1–8.
[90] Stephanie Wehner, David Elkouss, and Ronald Hanson. 2018. Quantum internet: A vision for the road ahead. Science 362, 6412 (2018).
[91] Robert Wille, Lukas Burgholzer, and Alwin Zulehner. 2019. Mapping quantum circuits to IBM QX architectures using the minimal number of SWAP and H operations. In 2019 56th ACM/IEEE Design Automation Conference. IEEE, 1–6.
[92] Mithuna Yoganathan, Richard Jozsa, and Sergii Strelchuk. 2019. Quantum advantage of unitary Clifford circuits with magic state inputs. Proceedings of the Royal Society A 475, 2225 (2019), 20180427.
[93] Yuan-Hang Zhang, Pei-Lin Zheng, Yi Zhang, and Dong-Ling Deng. 2020. Topological quantum compiling with reinforcement learning. Physical Review Letters 125, 17 (2020), 170501.
[94] Changchun Zhong, Zhixin Wang, Changling Zou, Mengzhen Zhang, Xu Han, Wei Fu, Mingrui Xu, S Shankar, Michel H Devoret, Hong X Tang, et al. 2020. Proposal for heralded generation and detection of entangled microwave–optical-photon pairs. Physical Review Letters 124, 1 (2020), 010511.
[95] Xinlan Zhou, Debbie W Leung, and Isaac L Chuang. 2000. Methodology for quantum logic gate construction. Physical Review A 62, 5 (2000), 052316.
[96] Mariam Zomorodi-Moghadam, Zohreh Davarzani, Ismail Ghodsollahee, et al. 2021. Connectivity matrix model of quantum circuits and its application to distributed quantum circuit optimization. Quantum Information Processing 20 (2021).
[97] Mariam Zomorodi-Moghadam, Mahboobeh Houshmand, and Monireh Houshmand. 2018. Optimizing teleportation cost in distributed quantum circuits. International Journal of Theoretical Physics 57, 3 (2018), 848–861.
[98] Alwin Zulehner and Robert Wille. 2019. Compiling SU(4) quantum circuits to IBM QX architectures. In Proceedings of the 24th Asia and South Pacific Design Automation Conference. 185–190.

Journal

ACM Transactions on Quantum Computing, Association for Computing Machinery

Published: Feb 24, 2023

Keywords: Quantum circuit compilation

References