Parallel Processing and Circuit Design with Nano-Electro-Mechanical Relays Elad Alon 1, Tsu-Jae King Liu 1, Vladimir Stojanovic 2, Dejan Markovic 3 1 University of California, Berkeley 2 Massachusetts Institute of Technology 3 University of California, Los Angeles
Power Density (W/cm2) Normalized Energy/op CMOS is Scaling, Power Density is Not 10000 1000 100 Power Density Prediction circa 2000 Sun s Surface Rocket Nozzle Nuclear Reactor 100 80 60 8086 10 4004 Hot Plate P6 80088085 8080 286 386 Pentium proc 486 1 1970 1980 1990 2000 2010 Year S. Borkar 40 20 0 10 0 10 1 10 2 10 3 10 4 1/throughput (ps/op) V dd and V t not scaling well power/area not scaling 2
Power Density (W/cm2) Normalized Energy/op CMOS is Scaling, Power Density is Not 10000 1000 Power Density Prediction circa 2000 Sun s Surface Rocket Nozzle 100 80 100 Nuclear Reactor 8086 Core 2 10 4004 Hot Plate P6 80088085 8080 286 386 Pentium proc 486 1 1970 1980 1990 2000 2010 Year S. Borkar 60 40 20 Run in parallel to recoup performance Operate at a lower energy point 0 10 0 10 1 10 2 10 3 10 4 1/throughput (ps/op) V dd and V t not scaling well power/area not scaling Parallelism to improve throughput within power budget 3
Normalized Energy/cycle Normalized Energy/op Where Parallelism Doesn t Help 25 Energy/op vs. Vdd 100 Energy/op vs. 1/throughput 20 80 15 10 E total E tot 60 40 Scale Vdd & V T : 5 E dynamic E leak E dyn E leak 0.1 0.2 0.3 0.4 0.5 Vdd (V) 20 0 10 1 10 2 10 3 10 4 10 5 1/throughput (ps/op) CMOS circuits have well-defined minimum energy Caused by leakage and finite sub-threshold swing Need to balance leakage and active energy Limits energy-efficiency, no matter how slow the circuit runs 4
Normalized Energy/cycle I DS (ma) 25 20 15 10 5 What if There Was No Leakage? Energy/op vs. Vdd E tot = E dyn E total E dynamic 0.1 0.2 0.3 0.4 0.5 Vdd (V) No longer need to balance increasing component of energy as Vdd decreases Relay (mechanical switch) offers near infinite subthreshold slope and no leakage current 1.2 1.0 0.8 0.6 0.4 0.2 0.0 Measured relay I-V W x L = 2mm x 20mm H = g = 0.6mm 34 35 36 37 V GB (V) compliance limit V DS = 1V 5
Outline If we could reliably build relays, would relay circuits offer superior energyefficiency? NEM Relay Structure and Model Digital Circuit Design with NEM Relays Comparisons to CMOS 6
NEM Relay Structure & Operation Electrical Gate Cross-Section Metal Channel Gate Dielectric Closed Relay ( ON ): V gb > V pi (pull-in voltage) W H Source t gap Air gaph Drain Drain t dimple Top View Electrical Gate Source Body Dielectric Bodyody Anchor L W Body Electrode (under body dielectric) Drain On-resistance depends on materials, pressure ( V gb ) Open Relay ( OFF ): V gb < V po (pull-out voltage) Channel Contact Dimples Infinite off-resistance zero leakage 7
Preliminary Fabricated Relay GATE SOURCE Channel DRAIN BODY 1mm Measured I-V Cantilever W,L,H = 2μm,20μm,200nm Actuation gap thickness = 400nm 8
NEM Relay Model Anchor Mass m Spring k 0.5R c C Damper b 0.5R c Mechanical Model S C s G o C d D R sd R cs C gb C gc R cd S o D o C sd_b R sd C sd_b B Compact Verilog-A model for circuit simulation: Mechanical dynamics: spring (k), damper (b), mass (m) Electrical parasitics: non-linear gate-body (C gb ), gate-channel (C gc ), and source/drain-body cap (C sd_b ), contact (R cs,d ) 9
NEM Relay Scaling Scaling (e.g., constant-e) benefits similar to CMOS Pull-in Voltage with Beam Scaling Measured pull-in voltages scale linearly If overcome reliability issues & surface forces scale: {W,L,H,t gap } = {90nm,2.3um,90nm,10nm} V pi = 200mV Mechanical delay also scales linearly (~10ns @90nm) 10
NEM Relay as a Logic Element 4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar Non-inverting logic possible: actuation independent of drain/source voltages Mechanical delay (~10ns) >> electrical t (~1ps) Can stack 200 devices (with 1kW on-resistance) before electrical delay is comparable to mechanical delay 11
Digital Circuit Design with Relays CMOS: delay set by electrical time constant Quadratic delay penalty for stacking devices Buffer & distribute logical/electrical effort over many stages Relays: delay dominated by mechanical movement Want all relays to switch simultaneously Implement logic as a single complex gate 12
Digital Circuit Design with Relays 4 gate delays 1 mechanical delay Delay Comparison vs. CMOS Single mechanical delay vs. several electrical gate delays For reasonable load, relay delay unaffected by fan-out/fan-in Area Comparison vs. CMOS Larger individual devices Fewer devices needed to implement the same logic function 13
Normalized Energy/op CMOS vs. Relays: Digital Logic 100 80 60 40 20 0 10 1 10 2 10 3 10 4 10 5 1/throughput (ps/op) Most processor components exhibit the energy vs. performance tradeoff of static CMOS Control, Datapath, Clock Adder energy performance tradeoff is representative 14
Relay-based Adder Full adder cell: 12 relays vs. 24 transistors XOR free Complementary signals avoid extra mechanical delay (to invert) Relays all sized minimally 15
N-bit Relay-Based Adder Ripple carry configuration Cascade full adder cells to create larger complex gate Stack of N relays, but still single mechanical delay 16
Snapshot: NEM Relays vs. CMOS Energy/op vs. Delay/op across V dd 32-bit adder CMOS Relay 9x 10x Supply Voltage Load Cap per Output Total Gate Cap 0.5 V 0.32-0.9 V 25 ff 25 ff 4.0 pf 125 ff Area 600 mm 2 480 mm 2 Energy is for adder only Compare vs. Sklansky CMOS adder 1 For similar area: >9x lower E/op, >10x greater delay 1 D. Patil et. al., Robust Energy-Efficient Adder Topologies, in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07). 17
Energy-Delay Results: Contact R Low contact R not critical Enables low force, hard contact Good news for reliability More later 18
Energy-Delay Comparison Relays always more energy efficient >10x better, even in the GOPS range Energy/op vs. Delay/op across V dd & C L 30x cap. gain Lower device C g, C d Fewer devices 2.4x Vdd gain No leakage energy Limited by E surf 19
Area vs. Throughput 30 25 W Relay (buffered, 25fF) A Relay /A CMOS 20 15 10 5 W Relay (buffered, 100fF) Au Relay (25fF) Au Relay (100fF) E relay = 20fJ/op 0.5 1 1.5 2 2.5 3 Throughput (GOPS) For high throughputs, relays require area overhead to achieve similar throughputs as CMOS Area overhead is bounded (max. 6x-25x) : CMOS needs parallelism to maintain E min for throughputs over 1GOPS 20
Revised Power Breakdown I/O energy dominant if core is ~30x more efficient Need to explore processing of analog signals too No or limited linear gain in relays rely on switching 1 * F. Chen et. al., Integrated Circuit Design with NEM Relays, IEEE ICCAD, Nov. 2008. 21
Conclusions NEM relays offer unique characteristics Nearly ideal I on /I off Significantly lower C g than CMOS Switching delay largely independent of electrical t Relay circuits show potential for order of magnitude better energy efficiency Examined 32-bit adders Exploring memories, multipliers, complete microcontroller Key challenges are reliability and scaling Circuit level insights critical: engineer contact R 22
Acknowledgements Hei Kam, Fred Chen, Hossein Fariborzi, Abhinav Gupta, Jaeseok Jeon, Rhesa Nathanael, Vincent Pott, Matthew Spencer, Cheng Wang Berkeley Wireless Research Center NSF DARPA FCRP (C2S2, MSD) Intel Foundation Graduate Fellowship 23