Efficient Distributed Quantum Computing
Steve Brierley
Heilbronn Institute, Dept. Mathematics, University of Bristol
October 2013
Work with Robert Beals, Oliver Gray, Aram Harrow, Samuel Kutin, Noah Linden, Dan Shepherd & Mark Stather
Summary
Two models of quantum computation:
- Distributed Quantum Computing
- Quantum Parallel RAM
Result: These two models are efficiently related to the standard quantum circuit model.
Method: We use techniques from (classical) parallel computing, namely sorting networks.
Applications:
- Build a quantum computer from a network of small parts
- QPRAM is a new tool in quantum algorithm design
Distributed quantum computing
Quantum circuits:
- Up to N/2 two-qubit gates per time step, acting on any disjoint pairs of qubits
- Not physical, because any two qubits can interact
Distributed quantum computing (DQC):
- N small processors interact in a fixed low-degree topology
Circuits on a DQC
Suppose we want to implement the circuit $C = U_1 U_2 U_3$ on a 1-D nearest-neighbour graph.
The naive approach: use SWAP gates to bring interacting qubits together, one gate at a time.
If there are N qubits, the cost is $O(N^2)$ per timestep.
Overview of our approach
Replace the circuit with
$C = (P_1 U_1^L P_1^{-1}) (P_2 U_2^L P_2^{-1}) (P_3 U_3^L P_3^{-1})$
where the $U_k^L$ are local unitaries.
We can combine some of the permutations (relabelling the $P_k$):
$C = P_1 U_1^L P_2 U_2^L P_3 U_3^L P_4$
- The key idea is to use a sorting network to implement $P_k$
- The algorithm is universal
- The cost depends on the graph but is close to optimal
- For a 1D nearest-neighbour graph the overhead is O(N)
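To make the routing step concrete, here is a minimal classical sketch, assuming the permutation $P_k$ is known in advance (it is fixed by the circuit). It uses odd-even transposition sort, a standard nearest-neighbour sorting network that sorts N items in N rounds, as a stand-in for the sorting network on a 1D line. The names `route_on_line` and `apply_layers` are hypothetical and do not come from the talk.

```python
def route_on_line(dest):
    """Plan adjacent swaps that send the item at position p to position dest[p].

    dest must be a permutation of range(len(dest)). Sorting the destination
    labels with odd-even transposition sort gives a schedule of at most
    len(dest) layers; each layer holds disjoint nearest-neighbour swaps.
    """
    n = len(dest)
    labels = list(dest)
    layers = []
    for r in range(n):
        layer = []
        # Even rounds compare pairs (0,1),(2,3),...; odd rounds (1,2),(3,4),...
        for i in range(r % 2, n - 1, 2):
            if labels[i] > labels[i + 1]:
                labels[i], labels[i + 1] = labels[i + 1], labels[i]
                layer.append((i, i + 1))
        layers.append(layer)
    return layers, labels


def apply_layers(items, layers):
    """Apply the planned swaps, layer by layer, to a list of payloads."""
    items = list(items)
    for layer in layers:
        for a, b in layer:
            items[a], items[b] = items[b], items[a]
    return items


layers, final = route_on_line([2, 0, 1])
print(len(layers))                             # 3 layers for N = 3
print(apply_layers(["a", "b", "c"], layers))   # ['b', 'c', 'a']
```

Each layer here would correspond to one time step of SWAP gates on the line, so the depth of the whole permutation is at most N rather than the $O(N^2)$ of moving qubits one gate at a time.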
Sorting networks
A fixed network of binary comparators: if x < y then swap x, y
- Insertion sort
- Bitonic sort
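As an illustration of what "a fixed network of comparators" means, the sketch below generates the textbook bitonic sorting network for N = 2^k wires and runs it on data. The function names are hypothetical, and the comparators here sort in ascending order (swap when the lower wire holds the larger value), which is a convention chosen for this sketch; flipping the comparison gives the opposite order. Every comparator pairs wires whose indices differ in exactly one bit, which is why this network suits a hypercube.

```python
def bitonic_network(n):
    """Return the comparator layers of a bitonic sorting network.

    n must be a power of two. Each comparator is (i, j, ascending) with
    i < j; the layout does not depend on the data being sorted.
    """
    layers = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            layer = []
            for i in range(n):
                partner = i ^ j  # wire indices differ in a single bit
                if partner > i:
                    ascending = (i & k) == 0
                    layer.append((i, partner, ascending))
            layers.append(layer)
            j //= 2
        k *= 2
    return layers


def run_network(values, layers):
    """Apply the comparators layer by layer and return the result."""
    v = list(values)
    for layer in layers:
        for i, j, ascending in layer:
            # Swap when the pair is out of order for this comparator's direction.
            if (v[i] > v[j]) == ascending:
                v[i], v[j] = v[j], v[i]
    return v


layers = bitonic_network(8)
print(len(layers))                                     # 6 = log2(8) * (log2(8) + 1) / 2
print(run_network([5, 3, 0, 4, 1, 2, 7, 6], layers))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because the layout is independent of the input, the same list of layers can be reused for every input, and the depth grows as log^2 N.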
Example
Example
Suppose we want to implement the circuit $C = U_1 U_2 U_3$ on a 1-D nearest-neighbour graph.
Our approach yields
Emulating circuits on a fixed architecture
Given an architecture constrained by a graph G, what is the cost of emulating a highly parallel circuit?
Theorem:
1) Any circuit can be emulated on a restricted architecture with a depth overhead factor of $D_G$ (the cost of a sorting network on G).
2) If you can do better, you have a better sorting algorithm!
Interesting architectures
The cost depends on the graph...

Graph             Degree   Routing             Cost
1D n.n.           2        Naive approach      O(N^2)
1D n.n.           2        Insertion sort      O(N)
2D n.n.           4        Insertion sort      O(√N)
Hypercube         log N    Bitonic sort        O(log^2 N)
Cyclic butterfly  4        Benes + insertion   O(log N)
Complete graph    N        n/a                 1
Lull
QPRAM on a distributed quantum computer
QPRAM = Circuit model + Parallel access to quantum RAM
Key primitive:
- The global state of the computer has registers $j_1, \ldots, j_N$, $x_1, \ldots, x_N$ and $y_1, \ldots, y_N$
- Locally, processor i controls $j_i$, $x_i$, $y_i$
- Processor i wants to query the memory at processor $j_i$
- Want to replace $y_i$ with $y_i \oplus x_{j_i}$, according to the quantum state $|j_1, \ldots, j_N\rangle$
Algorithm for parallel memory look-ups
Idea: Make the sorting network reversible
- Each node requires S D G T T log log N
- Then the same network works for all inputs
- We can input a superposition of destinations
Algorithm for parallel memory look-ups
1) Each processor submits a question packet $(j_i, Q, y_i, 0)$ and an answer packet $(i, A, 0, x_i)$
2) Sort the packets (with a sorting network) based on the first two indices (Q < A)
   The sequence is now ... (j, Q, y, 0)(j, Q, y, 0) ... (j, Q, y, 0)(j, A, 0, x_j) ...
3) Broadcast the answer $x_j$ using local CNOTs in O(log N) time
4) CNOT each $x_j$ value to the y register
5) Undo the broadcast and sort steps to return $(j_i, Q, y_i \oplus x_{j_i}, 0)$ to processor i
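The packet procedure can be checked on classical inputs with a short simulation. This is only a sketch under stated assumptions: Python's built-in sort stands in for the reversible sorting network, a sequential right-to-left scan stands in for the O(log N) broadcast, and each packet carries an extra return tag (a hypothetical fifth field, not in the packets above) because the classical sort is not literally undone. The function name `parallel_lookup` is also hypothetical.

```python
Q, A = 0, 1  # question packets sort before answer packets (Q < A)


def parallel_lookup(j, x, y):
    """Return [y[i] XOR x[j[i]] for each i] via the sort-and-broadcast steps."""
    n = len(j)
    packets = []
    for i in range(n):
        packets.append([j[i], Q, y[i], 0, i])  # question; last field is the return tag
        packets.append([i, A, 0, x[i], i])     # answer holding memory cell x[i]

    # Sort on (address, type): all questions for address a sit just before its answer.
    packets.sort(key=lambda p: (p[0], p[1]))

    def broadcast():
        # XOR each answer's value into the empty slot of the questions before it.
        # Running this twice restores the slots to zero, so it is its own inverse.
        current = 0
        for p in reversed(packets):
            if p[1] == A:
                current = p[3]
            else:
                p[3] ^= current

    broadcast()
    for p in packets:
        if p[1] == Q:
            p[2] ^= p[3]  # the "CNOT" of x_j onto the y register
    broadcast()           # undo the broadcast

    # Stand-in for undoing the sort: hand each question back to its processor.
    result = [None] * n
    for p in packets:
        if p[1] == Q:
            result[p[4]] = p[2]
    return result


print(parallel_lookup(j=[2, 2, 0, 1], x=[5, 6, 7, 8], y=[1, 0, 3, 0]))  # [6, 7, 6, 6]
```

Two processors query the same address (2) in this example, which is the case the broadcast step exists to handle.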
Distributed quantum memory
Theorem:
1) In the circuit model, the cost of parallel memory access is O(log N log log N)
2) To access even a single piece of quantum data costs Ω(log N)
Applications:
- MultiGrover algorithm
- Element Distinctness problem
Application: MultiGrover
Multiple processors can Grover search the same database held in quantum memory!
The first thing each processor does is form x i x i D
If D requires N log N qubits to store, MultiGrover finds N solutions in the same time as Grover finds 1,
i.e. we have recovered the situation when the database is simple to represent.
Application: Element Distinctness
- Best oracle complexity is $T = O(N^{2/3})$, but this requires $S = O(N^{2/3})$
- When the function is easy to compute but hard to invert, $ST^2 = O(N^2)$
- Grover and Rudolph object that we can achieve this with non-communicating parallel Grover searches
- MultiGrover + Buhrman et al. answers this challenge: $ST = O(N)$
Summary
Two models of quantum computation:
- Distributed Quantum Computing
- Quantum Parallel RAM
Result: Using sorting networks, the two models are efficiently related to the standard quantum circuit model.
Applications:
- Build a quantum computer from a network of small parts
- QPRAM is a new tool in quantum algorithm design
References:
- 1D n.n. graph: Hirata et al., QIC 11, 142 (2011)
- Any graph & QPRAM: Beals et al., Proc. R. Soc. A (arXiv:1207.2307)
- Cyclic butterfly: work in progress