Large-Scale Quantum Architectures
Fred Chong, Director of Computer Engineering, Professor of Computer Science, University of California at Santa Barbara
With Daniel Kudrow, Tzvetan Metodi, Darshan Thaker, Kenneth Bier, Summer Deng, Yu Tomita, Isaac Chuang, Ken Brown, and Diana Franklin
QARC: Quantum Architecture Research Center
Science Fiction?
• 5- and 7-qubit machines have been built [Vandersypen00, Laflamme99]
• D-Wave 128-qubit adiabatic machine
  ◦ 512-qubit machine planned
F. Chong -- QC 2
This Talk
• A systems perspective to guide device development
• Case studies in:
  ◦ Abstractions for new technologies
  ◦ Applying classical optimizations to novel architectures
Outline
• Introduction to Quantum Computing
• Quantum Logic Array [Micro05]
• Compressed Quantum Logic Array [ISCA06]
• Code Generation for Arbitrary Rotations [ISCA13]
• Final Thoughts
Quantum Bits (qubits)
• One qubit is a superposition of 2 basis states, whose complex amplitudes set the measurement probabilities: $|a\rangle = C_0|0\rangle + C_1|1\rangle$
• Every additional qubit doubles the number of states: $|ab\rangle = C_{00}|00\rangle + C_{01}|01\rangle + C_{10}|10\rangle + C_{11}|11\rangle$
• Quantum parallelism on an exponential number of states
• But measurement collapses qubits to single classical values
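A minimal pure-Python sketch of the doubling: an n-qubit state is a list of $2^n$ complex amplitudes, built here by tensor products of single-qubit states, and measurement samples one basis state with probability $|C|^2$. The helper names (`tensor`, `probabilities`, `measure`) are illustrative, not from the talk.

```python
import math
import random

def tensor(a, b):
    """Kronecker product of two state vectors (lists of amplitudes); a is the high bit."""
    return [x * y for x in a for y in b]

def probabilities(state):
    """Born rule: each basis state is observed with probability |amplitude|^2."""
    return [abs(c) ** 2 for c in state]

def measure(state, rng=random):
    """Sample one basis-state index; the superposition collapses to that value."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities(state)):
        cumulative += p
        if r < cumulative:
            return index
    return len(state) - 1  # guard against floating-point round-off

plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # C0 = C1 = 1/sqrt(2)
two_qubits = tensor(plus, plus)               # amplitudes C00, C01, C10, C11
ten_qubits = [1.0]
for _ in range(10):
    ten_qubits = tensor(ten_qubits, plus)     # each qubit doubles the length

print(len(two_qubits), len(ten_qubits))       # 4 1024
print(format(measure(two_qubits), "02b"))     # one classical value: 00, 01, 10 or 11
```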
7-qubit Quantum Computer (Vandersypen, Steffen, Breyta, Yannoni, Sherwood, and Chuang, 2001)
• Bulk spin NMR: nuclear spin qubits
• Decoherence in 1 sec; operations at 1 kHz
• Failure probability = $10^{-3}$ per operation
• Potentially 100 sec @ 10 kHz = $10^{-6}$ per op
[Molecule: pentafluorobutadienyl cyclopentadienyldicarbonyliron complex]
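The failure probabilities follow from a back-of-envelope reading of the slide: roughly one failure per coherence window, so the per-operation failure rate is about 1 / (decoherence time × operation rate). The function name is hypothetical.

```python
def failure_per_op(decoherence_s, op_rate_hz):
    """Rough estimate: one failure per window of decoherence_s * op_rate_hz operations."""
    ops_per_window = decoherence_s * op_rate_hz
    return 1.0 / ops_per_window

print(failure_per_op(1, 1_000))      # 0.001  (1 s at 1 kHz)
print(failure_per_op(100, 10_000))   # 1e-06  (100 s at 10 kHz)
```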
Not Gate
X gate (bit flip, NOT)
• Flips the probabilities of $|0\rangle$ and $|1\rangle$: $X\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}\beta\\ \alpha\end{pmatrix} = \beta|0\rangle + \alpha|1\rangle$
• Conservation of probability: $\sum_i |C_i|^2 = |\alpha|^2 + |\beta|^2 = 1$
• Reversibility ⇒ unitary matrix: $(X^*)^T X = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix} = I$ (* means complex conjugate)
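Controlled Not (Controlled X, CNot)
$\mathrm{CNot}\begin{pmatrix}a\\ b\\ c\\ d\end{pmatrix} = \begin{pmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\end{pmatrix}\begin{pmatrix}a\\ b\\ c\\ d\end{pmatrix} = a|00\rangle + b|01\rangle + d|10\rangle + c|11\rangle$
• Control bit determines whether X operates
• Control bit can also be affected by the operation (e.g., it can become entangled with the target)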
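Universal Quantum Operations
• H gate (Hadamard): $H\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \frac{\alpha+\beta}{\sqrt{2}}|0\rangle + \frac{\alpha-\beta}{\sqrt{2}}|1\rangle$
• T gate: $T\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}e^{-i\pi/8} & 0\\ 0 & e^{i\pi/8}\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \alpha e^{-i\pi/8}|0\rangle + \beta e^{i\pi/8}|1\rangle$
• Z gate (phase flip): $Z\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \alpha|0\rangle - \beta|1\rangle$
• Controlled Not (Controlled X, CNot): $\begin{pmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\end{pmatrix}\begin{pmatrix}a\\ b\\ c\\ d\end{pmatrix} = a|00\rangle + b|01\rangle + d|10\rangle + c|11\rangle$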
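The gate definitions above can be checked numerically. This sketch uses plain Python lists as matrices (helper names are illustrative), verifies that each gate is unitary, i.e. $(M^*)^T M = I$, and applies X, Z and CNOT to the amplitudes used on the slides.

```python
import cmath
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(m):
    """Conjugate transpose (M*)^T."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def is_unitary(m, tol=1e-12):
    """Reversibility check: (M*)^T M equals the identity."""
    p = matmul(dagger(m), m)
    return all(abs(p[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(p)) for j in range(len(p)))

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]
T = [[cmath.exp(-1j * math.pi / 8), 0], [0, cmath.exp(1j * math.pi / 8)]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

alpha, beta = 0.6, 0.8
print(all(is_unitary(g) for g in (X, Z, H, T, CNOT)))  # True
print(matvec(X, [alpha, beta]))     # [0.8, 0.6]   -> beta|0> + alpha|1>
print(matvec(Z, [alpha, beta]))     # [0.6, -0.8]  -> alpha|0> - beta|1>
print(matvec(CNOT, [1, 2, 3, 4]))   # [1, 2, 4, 3] -> a|00> + b|01> + d|10> + c|11>
```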
Quantum Algorithms
• Factorization (Shor's Algorithm)
  ◦ $n^3$ instead of exponential
• Search (Grover's Algorithm)
  ◦ $\sqrt{n}$ function evaluations instead of $n$
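To put numbers on the Grover bullet: with one marked item among N, the standard textbook analysis gives success probability $\sin^2((2k+1)\theta)$ after k iterations, where $\sin\theta = 1/\sqrt{N}$, so about $(\pi/4)\sqrt{N}$ function evaluations suffice versus roughly N/2 expected classically. This sketch uses that textbook formula; it is not taken from the talk.

```python
import math

def success_probability(n_items, k):
    """Probability of measuring the single marked item after k Grover iterations."""
    theta = math.asin(1 / math.sqrt(n_items))
    return math.sin((2 * k + 1) * theta) ** 2

def grover_iterations(n_items):
    """Near-optimal iteration count, floor(pi/4 * sqrt(N))."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))

n = 2 ** 20
k = grover_iterations(n)
print(k, n // 2)                          # 804 quantum queries vs 524288 expected classically
print(success_probability(n, k) > 0.999)  # True
```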
Reliability is hard
• Quantum computing should be hard
  ◦ Short lived
  ◦ Small systems
• Can't copy data (no-cloning theorem)
  ◦ Need to protect it
• Can't measure data
  ◦ How do we detect errors?
Quantum Error Correction (3-qubit code)
X12 and X23 are the parity checks on bits 1, 2 and bits 2, 3 (+1 = bits agree, -1 = bits differ).
X12   X23   Error type      Action
+1    +1    no error        no action
+1    -1    bit 3 flipped   flip bit 3
-1    +1    bit 1 flipped   flip bit 1
-1    -1    bit 2 flipped   flip bit 2
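The decoding table is a lookup from the two parity results to the bit to flip. This is a classical sketch of single bit-flip errors on the 3-bit repetition code; on real hardware the parities are extracted with ancillas so the data itself is never measured, as the following slides show. All names are illustrative.

```python
def parity(b1, b2):
    """+1 if the two bits agree, -1 if they differ (the X12 / X23 checks)."""
    return 1 if b1 == b2 else -1

# Syndrome (X12, X23) -> 0-based index of the bit to flip, following the table
CORRECTION = {
    (1, 1): None,    # no error, no action
    (1, -1): 2,      # bit 3 flipped
    (-1, 1): 0,      # bit 1 flipped
    (-1, -1): 1,     # bit 2 flipped
}

def encode(bit):
    return [bit, bit, bit]

def correct(word):
    """Return the corrected codeword and the syndrome that was observed."""
    syndrome = (parity(word[0], word[1]), parity(word[1], word[2]))
    fixed = list(word)
    flip = CORRECTION[syndrome]
    if flip is not None:
        fixed[flip] ^= 1
    return fixed, syndrome

noisy = encode(1)
noisy[2] ^= 1             # flip bit 3
print(correct(noisy))     # ([1, 1, 1], (1, -1))
```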
Syndrome Measurement
[Figure: circuit that measures the parity X12 of data qubits Y1 and Y2 onto an ancilla initialized to 0; outputs Y1', Y2']
3-bit Error Correction
[Figure: circuit with ancillas A0 and A1 measuring parities X12 and X01 of data qubits Y0, Y1, Y2; outputs Y0', Y1', Y2']
Error Correction is Crucial
• Need continuous error correction
  ◦ can operate on encoded data [Shor96, Steane96, Gottesman99]
• Threshold Theorem [Aharonov97]
  ◦ failure rate of $10^{-4}$ per op can be tolerated
• Practical error rates are $10^{-6}$ to $10^{-9}$
UCSB 4/04
Concatenated Codes (Concatenated Steane Code)
1 logical qubit
Level 1: 7 physical qubits
Level 2: 49 physical qubits
• Reliability increases doubly exponentially.
• Exponentially slower.
• Exponentially greater resources.
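The doubly exponential claim is usually captured by the textbook estimate $p_k = p_{th}(p/p_{th})^{2^k}$ for a distance-3 code concatenated k times. The sketch below assumes that formula and the $10^{-4}$ threshold quoted in this talk, finds how many levels a given physical error rate needs for a target logical rate, and reports the $7^k$ storage and $5^k$ time cost of the Steane code at that level.

```python
def logical_error_rate(p, p_th, level):
    """Textbook estimate for a distance-3 code concatenated `level` times."""
    return p_th * (p / p_th) ** (2 ** level)

def levels_needed(p, p_th, target):
    """Smallest concatenation level whose logical error rate is <= target."""
    if p >= p_th:
        raise ValueError("physical error rate must be below threshold")
    level = 0
    while logical_error_rate(p, p_th, level) > target:
        level += 1
    return level

P_TH = 1e-4  # threshold quoted on the slides
for p in (1e-5, 1e-6):
    k = levels_needed(p, P_TH, target=1e-15)
    # Steane code: 7**k physical qubits per logical qubit, ~5**k slowdown
    print(p, k, 7 ** k, 5 ** k)
```

Each extra level squares the ratio $p/p_{th}$, which is why one order of magnitude better physical error rate saves a whole level here.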
Error Correction Overhead
7-qubit code [Steane96], applied recursively
Recursion (k)   Storage (7^k)   Operations (153^k)   Min. time (5^k)
0               1               1                    1
1               7               153                  5
2               49              23,409               25
3               343             3,581,577            125
4               2,401           547,981,281          625
5               16,807          83,841,135,993       3,125
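Each column of the table is a pure power of its per-level multiplier (7, 153, 5), so the whole table can be regenerated from three numbers. A minimal sketch (function name is illustrative):

```python
def steane_overhead(k):
    """(storage, operations, minimum time) at recursion level k."""
    return 7 ** k, 153 ** k, 5 ** k

print(f"{'k':>2} {'storage':>8} {'operations':>16} {'time':>6}")
for k in range(6):
    storage, ops, min_time = steane_overhead(k)
    print(f"{k:>2} {storage:>8,} {ops:>16,} {min_time:>6,}")
```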
Our Goal
[Figure: complexity (# gates) vs. # of quantum bits (qubits) for experimental demonstrations from 1996 to 2004 (NMR, superconducting, and ion-trap systems; groups including LANL, Oxford, NIST, MIT, Frankfurt, Saclay/Delft, NEC, and Cambridge), against targets of $10^6$ gates (factor a 1024-bit number) and $10^7$ gates (factor a 2048-bit number); qubit axis marked at $10^5$ and $10^6$ qubits]
Quantum Architecture Research:
1. Identify the viability of proposed technologies
   • Quantify the physical bounds and known hardware aspects.
   • Alert physicists to the technological limits that must be overcome for a computationally relevant implementation.
2. Identify the unknowns
   • Scaling to arbitrary sizes is not the same as implementing universal logic with only a few gates.
3. Identify correct microarchitectural abstractions
4. Multidisciplinary field
   ◦ Computer engineers must work closely with theorists and physicists
Trapped Ions for Quantum Computation
[Figure: trap cross-section showing electrodes, substrate, and ion space]
• Trapping electrodes are attached to an aluminum substrate
• Qubits are stored in the internal electronic states of each ion
• Lasers implement logic gates and measurement
• Sympathetic recooling ions reduce vibrational heating
Cirac and Zoller, PRL, v74, 1995; Kielpinski et al., Nature, v417, 2002
Trapped Ions: Example
[Figure: ions a1 to a8 arranged on either side of a ballistic channel]
Quantum Teleportation?
[Figure: Alice's data ion, an entangled ion pair (aka EPR pair) shared by Alice and Bob, and a reliable classical information channel]
• Two ions are entangled in an inseparable state.
• One is sent to Alice and one to Bob.
• Alice measures her data ion together with her half of the pair; this leaves Bob's ion in the data state up to one of four possible errors.
• By sending Bob two bits of classical information identifying the error, Alice lets Bob accurately recreate the data ion.
Bennett et al., PRL, v70, 1993
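The protocol can be checked with a small three-qubit state-vector simulation. This is the textbook circuit (CNOT and Hadamard on Alice's side, then X and/or Z on Bob's side), written as an illustrative sketch: qubit ordering and helper names are assumptions, and no ion-trap physics is modeled.

```python
import math
import random

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
N = 3  # qubit 0 = data ion, 1 = Alice's EPR ion, 2 = Bob's EPR ion

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q (qubit 0 is the most significant bit)."""
    shift = N - 1 - q
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        for out in (0, 1):
            j = (i & ~(1 << shift)) | (out << shift)
            new[j] += gate[out][bit] * amp
    return new

def apply_cnot(state, control, target):
    cs, ts = N - 1 - control, N - 1 - target
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        new[i ^ (1 << ts) if (i >> cs) & 1 else i] += amp
    return new

def measure(state, q, rng):
    """Measure qubit q; return the outcome and the renormalized collapsed state."""
    shift = N - 1 - q
    p1 = sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> shift) & 1)
    m = 1 if rng.random() < p1 else 0
    norm = math.sqrt(p1 if m else 1 - p1)
    return m, [a / norm if ((i >> shift) & 1) == m else 0j for i, a in enumerate(state)]

def teleport(alpha, beta, rng):
    epr = [S, 0, 0, S]                                   # (|00> + |11>)/sqrt(2)
    state = [d * e for d in (alpha, beta) for e in epr]  # data ion (x) EPR pair
    state = apply_cnot(state, 0, 1)   # Alice: CNOT data -> her EPR ion
    state = apply_1q(state, H, 0)     # Alice: Hadamard on the data ion
    m0, state = measure(state, 0, rng)
    m1, state = measure(state, 1, rng)
    # The two classical bits tell Bob which correction recreates the data
    if m1:
        state = apply_1q(state, X, 2)
    if m0:
        state = apply_1q(state, Z, 2)
    base = (m0 << 2) | (m1 << 1)
    return (m0, m1), (state[base], state[base | 1])

bits, bob = teleport(0.6, 0.8j, random.Random(1))
print(bits, bob)   # Bob's amplitudes match (0.6, 0.8j) up to rounding, whatever the bits
```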
Quantum Logic Array Architecture - QLA (overview) [Metodi et al, Micro 2005]
[Figure: logical qubits connected by channels and repeater stations (R), managed by classical control processors; each QLA building tile is a sea of lower-level qubits]
High-Level Architecture Overview (Recursion)
[Figure: classical control processors managing logical qubits; each logical qubit is built from level-1 qubits, which are built from physical ions]
High-Level Architecture Overview
[Figure: array of logical qubits and repeater stations (R) under classical control processors]
• ~100 logical qubits fit in the area of a 90 nm Pentium 4 processor, compared to 55 million classical transistors in each such P4
• One logical qubit: 49 physical ions, 5,292 trap cells, 720 µm × 2,940 µm ≈ 2.11 mm²
Recall the Repeater Stations
[Figure: logical qubits connected through repeater stations (R)]
Inter-Qubit Communication
[Figure: EPR pairs connecting source Q1 to destination Qk; 256 qubits span ~30,000 cells]
• Ballistic channels are too faulty for data to move across very large distances.
• Teleportation allows us to transport data without physically moving the ion.
• EPR pairs still need to move to connect the source and the destination.
• The EPR pairs are purified on arrival using ancillary EPR pairs.
Bennett et al., PRL, v76, 1996
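How much purification costs depends on the protocol. Assuming the BBPSSW recurrence from the cited Bennett et al. paper, applied to Werner-state pairs, each round consumes two pairs and, on success, keeps one with fidelity $F' = \frac{F^2 + ((1-F)/3)^2}{F^2 + 2F(1-F)/3 + 5((1-F)/3)^2}$. The sketch iterates that formula; QLA's actual purification schedule is not specified here, and the function names are hypothetical.

```python
def bbpssw_round(f):
    """One BBPSSW recurrence round on two Werner pairs of fidelity f.

    Returns (fidelity of the surviving pair, probability the round succeeds)."""
    e = (1 - f) / 3
    p_success = f * f + 2 * f * e + 5 * e * e
    return (f * f + e * e) / p_success, p_success

def purify(f, target, max_rounds=50):
    """Repeat rounds until `target`; 2**rounds raw pairs per output pair if every round succeeds."""
    rounds = 0
    while f < target:
        if f <= 0.5 or rounds == max_rounds:
            raise ValueError("target not reachable by this recurrence")
        f, _ = bbpssw_round(f)
        rounds += 1
    return f, rounds, 2 ** rounds

print(purify(0.9, 0.99))   # (final fidelity, rounds, raw pairs per purified pair)
```

The pair count ignores failed rounds, so the true expected cost is higher; it grows exponentially in the number of rounds, which is one reason the number of ancillary EPR pairs matters for channel design.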
Communication Channel: Detail
[Figure: channel segment between two repeaters]
Quantum Repeaters
[Figure: source Q1 and destination Qk connected through 7 repeater stations (R) by EPR pairs; only the ions are shown]
Quantum Repeaters: Teleporting the Data
[Figure: the data is teleported from Q1 to Qk over the EPR pairs connecting the repeater stations (R)]
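Joining neighboring EPR pairs at each repeater is an entanglement swap, and for Werner-state pairs the standard result is $F = F_1F_2 + (1-F_1)(1-F_2)/3$. The sketch below assumes Werner-state links with an illustrative link fidelity of 0.99 (not a number from the talk) to show how end-to-end fidelity drops with the number of hops, which is why pairs are purified before the data is teleported.

```python
def swap(f1, f2):
    """Fidelity after entanglement swapping two Werner pairs with fidelities f1, f2."""
    return f1 * f2 + (1 - f1) * (1 - f2) / 3

def chain_fidelity(links):
    """End-to-end fidelity when consecutive EPR links are joined at each repeater."""
    f = links[0]
    for g in links[1:]:
        f = swap(f, g)
    return f

for hops in (1, 2, 4, 8):
    print(hops, round(chain_fidelity([0.99] * hops), 4))
```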
QLA: Theoretical Expectations
Modular exponentiation $f(x) = a^x \bmod M$ → QFT → period of f(x) → classical post-processing
• Modular exponentiation is built from Toffoli gates
• A fault-tolerant Toffoli gate needs 9 logical qubits and a total of 21 error-correction steps: ~2.5 seconds per gate
Van Meter and Itoh, quant-ph/0408046
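The classical ends of this pipeline are easy to sketch. Below, brute-force period finding stands in for modular exponentiation plus QFT (the part the quantum computer accelerates), and the post-processing recovers factors from an even period r via $\gcd(a^{r/2} \pm 1, M)$. Function names are illustrative.

```python
from math import gcd

def period(a, m):
    """Smallest r > 0 with a**r % m == 1, by brute force.

    In Shor's algorithm, modular exponentiation + QFT replace this step."""
    if gcd(a, m) != 1:
        raise ValueError("a must be coprime to m")
    x, r = a % m, 1
    while x != 1:
        x = (x * a) % m
        r += 1
    return r

def shor_postprocess(a, m):
    """Classical post-processing: turn an even period into two factors, or None to retry."""
    r = period(a, m)
    if r % 2:
        return None                 # odd period: pick another a
    y = pow(a, r // 2, m)
    if y == m - 1:
        return None                 # a**(r/2) = -1 mod m gives only trivial factors
    return gcd(y - 1, m), gcd(y + 1, m)

print(shor_postprocess(7, 15))      # (3, 5)
print(shor_postprocess(2, 21))      # (7, 3)
```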
Factoring an Integer
Modular exponentiation $f(x) = a^x \bmod M$ → QFT → period of f(x) → classical post-processing
• 128-bit: 63,730 Toffoli gates, with 21 ECC steps per Toffoli, for modular exponentiation. Thus $21 \times 63{,}730 + \text{QFT} \approx 1.34 \times 10^6$ time steps ≈ 16 hours ⇒ 16 × 1/0.75 ⇒ ~21 hours
• 512-bit: 397,910 Toffoli gates + QFT ⇒ ~5.5 days
• 1024-bit: 964,919 Toffoli gates + QFT ⇒ ~13.4 days
• 2048-bit: 2,301,767 Toffoli gates + QFT ⇒ ~32 days
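The estimates scale linearly in the Toffoli count. Assuming every ECC step takes the same time, and inferring that step time from the 128-bit row (16 hours for $1.34 \times 10^6$ steps, about 43 ms per step), the sketch below reproduces the other rows with the slide's 1/0.75 factor while ignoring the QFT's cost. The constant names are hypothetical.

```python
ECC_STEPS_PER_TOFFOLI = 21
STEPS_128_BIT = 1.34e6                        # 21 * 63,730 + QFT, from the slide
SECONDS_PER_STEP = 16 * 3600 / STEPS_128_BIT  # inferred from "~16 hours" (~43 ms)
DERATE = 0.75                                 # the slide's "16 * 1/.75" factor

def runtime_days(toffolis):
    """Estimated factoring time in days; the QFT's cost is ignored."""
    steps = ECC_STEPS_PER_TOFFOLI * toffolis
    return steps * SECONDS_PER_STEP / DERATE / 86400

for bits, toffolis in [(512, 397_910), (1024, 964_919), (2048, 2_301_767)]:
    print(bits, round(runtime_days(toffolis), 1))
# 512 5.5
# 1024 13.4
# 2048 32.1
```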
Major QLA Problem!
Area and Classical Resource Explosion
Solution: Specialized Architecture Elements?
Design Pyramid
[Figure: design pyramid trading off area, reliability, and speed under the allowed physical component reliability; QLA's position marked]
Application Constrains Parallelism
• Modular exponentiation component: the Draper carry-lookahead adder (64-qubit adder)
Specialization
• Compute block: ancilla : data = 2 : 1
• Memory block: ancilla : data = 1 : 8
[Figure legend: logical data qubits, logical ancilla qubits]
Area Reduced
[Figure: area reduction factor vs. Shor's algorithm adder input size]
Area Reduced
• CQLA: 28 cm × 28 cm
• QLA: 90 cm × 90 cm
(about 10x less area)
Design Pyramid - CQLA
[Figure: design pyramid (area, reliability, speed) with CQLA and QLA positions marked]
Faster CQLA [Thaker et al, ISCA 2006]
[Figure: memory block, a cache at level 1, and a compute block operating at level 1]
Overall Results
[Figure: improvement factor vs. Shor's algorithm adder input size]
Design Pyramid - CQLA v2
[Figure: design pyramid (area, reliability, speed) with CQLA v2 and QLA positions marked]
Quantum Code Generation for Arbitrary Rotations [Kudrow et al, ISCA 2013]
• Looking at a larger set of benchmarks for the first time, we find that rotations are important, difficult to compile for, and expensive to execute
• Unique sequence for every distinct rotation
  ◦ Can be 4 TB of code!
• Sometimes need dynamic code generation
  ◦ Rotation angles determined at runtime
  ◦ Large code size
Bloch Sphere
• The surface represents all 'pure' qubit states
• The poles are the classical zero and one states
• A point between the poles represents a superposition of states
Rotation Gate
• A rotation gate changes the phase of a qubit
• This is a rotation about the z-axis of the Bloch sphere
• The rotation gate is parameterized by the rotation angle, θ
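In matrix form, a common convention is $R_z(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$, which advances the relative phase (the Bloch-sphere azimuth) by θ while leaving the $|0\rangle$/$|1\rangle$ probabilities unchanged; under this convention the T gate defined earlier in the talk is exactly $R_z(\pi/4)$. The convention and helper names are assumptions of this sketch.

```python
import cmath
import math

def rz(theta):
    """Rotation by theta about the Bloch sphere's z-axis."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def apply(u, state):
    return [u[0][0] * state[0] + u[0][1] * state[1],
            u[1][0] * state[0] + u[1][1] * state[1]]

def relative_phase(state):
    """Bloch-sphere azimuth phi = arg(beta) - arg(alpha)."""
    return cmath.phase(state[1]) - cmath.phase(state[0])

T = rz(math.pi / 4)   # same as diag(e^{-i pi/8}, e^{i pi/8})

plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]       # on the equator, phi = 0
rotated = apply(rz(0.3), plus)
print(round(relative_phase(rotated), 6))          # 0.3: the phase advanced by theta
print([round(abs(a) ** 2, 6) for a in rotated])   # [0.5, 0.5]: probabilities unchanged
```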
Rotation Decomposition
• Most technologies do not have native support for arbitrary rotations
• Fault-tolerant constructions exist only for discrete gates
• Rotations must be approximated using supported gates
• We refer to this as 'decomposition'
Rotation Decomposition
[Figure: a rotation approximated by a gate sequence: H gate, T gate, X gate, H gate, T gate, ...]
Rotation Decomposition
Scaffold (QPL), with a rotation gate:
    module RotatePhi(qbit q) {
        Rz(q, Phi);
    }
QASM, after decomposition:
    module RotatePhi(qbit q) {
        T q
        H q
        Z q
        H q
        T q
        Z q
        ...
    }
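To make decomposition concrete, the sketch below does the simplest possible search: it enumerates every H/T sequence up to a given length and keeps the one closest to $R_z(\theta)$ under a global-phase-invariant distance. This is exhaustive search, not Solovay-Kitaev or SQCT, and its cost grows as $2^L$ in the sequence length L. Only H and T are needed because, up to global phase, $Z = T^4$ and $X = HZH$ for the T defined in this talk. All names are illustrative.

```python
import cmath
import itertools
import math

S = 1 / math.sqrt(2)
GATES = {
    "H": [[S, S], [S, -S]],
    "T": [[cmath.exp(-1j * math.pi / 8), 0], [0, cmath.exp(1j * math.pi / 8)]],
}
IDENTITY = [[1, 0], [0, 1]]

def matmul(a, b):
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]

def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def distance(u, v):
    """Global-phase-invariant distance sqrt(1 - |tr(U^dagger V)| / 2)."""
    tr = sum(u[i][j].conjugate() * v[i][j] for i in range(2) for j in range(2))
    return math.sqrt(max(0.0, 1 - abs(tr) / 2))

def brute_force_decompose(theta, max_len):
    """Best H/T sequence (in time order) of length <= max_len approximating Rz(theta)."""
    target = rz(theta)
    best_dist, best_seq = distance(IDENTITY, target), ()
    for length in range(1, max_len + 1):
        for seq in itertools.product("HT", repeat=length):
            u = IDENTITY
            for name in seq:
                u = matmul(GATES[name], u)   # later gates multiply on the left
            d = distance(u, target)
            if d < best_dist:
                best_dist, best_seq = d, seq
    return best_dist, best_seq

for max_len in (4, 8, 12):
    d, seq = brute_force_decompose(0.3, max_len)
    print(max_len, round(d, 4), "".join(seq))
```

Longer sequences buy precision, but the search space doubles with every added gate, which is the compile-time problem the methods below address.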
Quantum Rotations
Static Compilation
[Figure: Scaffold compiled to QASM before execution; classical processor and quantum co-processor]
Dynamic Compilation
[Figure: Scaffold compiled to QASM on the classical processor at runtime; classical processor and quantum co-processor]
Methods: Solovay-Kitaev
• Widely used method of decomposing rotations
• Recursively factors a rotation
• Increasing recursion depth increases precision
• Our baseline SK includes parallelization and data structure optimizations: 10X faster
Methods: SQCT
• New technique that improves significantly over SK
• Minimizes T gates in a sequence
  ◦ T gates are resource intensive
• Offers the same precision as SK in fewer gates
Methods: Library Construction
• Designed with dynamic compilation in mind
• Minimize compilation time
• Pre-compute a library of rotations
• Quickly concatenate them at runtime to create the desired rotation
• Overshoot the precision of the library to compensate for loss of precision during concatenation
Methods: Library Construction
• Example: binary construction
• Generate library: [figure]
• Concatenate appropriate sequences to approximate the desired angle: T, H, T, Z, T, Z, H, ...
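Because z-rotations compose additively, $R_z(a)R_z(b) = R_z(a+b)$, a binary library can hold one precomputed sequence per angle $2\pi/2^k$ and build any angle from the bits of $\theta/2\pi$. The sketch below, with hypothetical helper names and idealized library entries, picks the entries to concatenate, bounds the truncation error, and shows why the library must overshoot precision: in the worst case the errors of the concatenated entries add linearly (triangle inequality on operator-norm error).

```python
import math

def binary_plan(theta, bits):
    """Library indices k (entry = Rz(2*pi / 2**k)) whose angles sum to theta, truncated to `bits` bits."""
    frac = (theta % (2 * math.pi)) / (2 * math.pi)
    plan = []
    for k in range(1, bits + 1):
        frac *= 2
        if frac >= 1:
            plan.append(k)
            frac -= 1
    return plan

def planned_angle(plan):
    return sum(2 * math.pi / 2 ** k for k in plan)

def entry_precision_needed(target_error, bits):
    """Worst case: errors of up to `bits` concatenated entries add linearly."""
    return target_error / bits

theta, bits = 1.234, 16
plan = binary_plan(theta, bits)
print(plan)                                                         # entries to concatenate
print(abs(theta - planned_angle(plan)) < 2 * math.pi / 2 ** bits)   # True
print(entry_precision_needed(1e-4, bits))   # each entry must be 16x tighter than the target
```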
Methodology
• Tested all methods over a set of random angles
• High-end desktop hardware (Intel quad-core, 32 GB RAM)
• Evaluation metrics:
  ◦ Precision of approximation
  ◦ Compilation time
  ◦ Sequence length (in T gates)
Results: Compilation Time
[Figure: compilation time of each method, annotated with Ion Trap, Linear Systems, Neutral Atom, Superconductor, and Photons]
Results: Sequence Length
Library Size
Dynamic Compilation Summary
• Up to 100,000X speedup for dynamic compilation, with a 5-10X increase in sequence length
Future Work
• Minimize T gates in sequences
• Compare surface codes to concatenated codes
• Optimize for SIMD parallelism
• Use resource estimation for hot code optimization
Final Thoughts
• Systems research on emerging technologies is important to guide device development
• Communicating with materials and device researchers is challenging, as is developing useful abstractions
• Many traditional optimizations can be adapted to novel architectures