WARM SRAM: A Novel Scheme to Reduce Static Leakage Energy in SRAM Arrays


Mahadevan Gomathisankaran, Iowa State University, gmdev@iastate.edu
Akhilesh Tyagi, Iowa State University, tyagi@iastate.edu

Outline
1. Introduction
2. Proposed Circuit Technique
3. Reducing Static Energy in On-Chip Caches
4. Model Validity
5. Conclusion and Future Work

Produced with LaTeX seminar style & PSTricks

INTRODUCTION
* Expected increase in the static leakage current
* Feature size to reach 22nm in 2016
* Leakage current to increase by a factor of 1K-10K in going from 180nm to 70nm
* Leakage current will play a major role in circuit design
* Not only arrays but also high fan-out logic will be affected
* New design methodologies have to be invented to avoid the Red Brick Wall
* We propose warmup-CMOS, which uses depletion-mode transistors

SUBTHRESHOLD LEAKAGE IN CMOS
* Various leakage mechanisms: PN reverse bias, weak inversion, DIBL, GIDL, punchthrough
* Leakage current:

$$I_{sub} = A \cdot \exp\!\left(\frac{q}{n k T}\left(V_g - V_s - V_{th0} - \gamma V_s + \eta V_{ds}\right)\right) \cdot B \qquad (1)$$

$$A = \mu_0 C_{ox} \frac{W_{eff}}{L_{eff}} \left(\frac{kT}{q}\right)^2 e^{1.8}, \qquad B = 1 - \exp\!\left(-\frac{q V_{ds}}{kT}\right)$$
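To make the exponential dependence in equation (1) concrete, here is a minimal Python sketch of the subthreshold current. Only $V_{th0} = 0.2$ V, $V_{dd} = 1$ V and the 0.148 V source level are taken from the slides; the values of `n`, `gamma`, `eta`, `mu0_cox` and `w_over_l` are hypothetical placeholders chosen for illustration, so only the trend is meaningful, not the absolute currents.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def subthreshold_current(v_g, v_s, v_ds, *, v_th0=0.2, n=1.5, gamma=0.2,
                         eta=0.05, mu0_cox=2e-4, w_over_l=2.0, temp_k=300.0):
    """Evaluate equation (1). All keyword defaults except v_th0 are
    hypothetical illustration values, not parameters from the slides."""
    kt_over_q = K_B * temp_k / Q_E                        # thermal voltage kT/q
    a = mu0_cox * w_over_l * kt_over_q ** 2 * math.exp(1.8)
    b = 1.0 - math.exp(-v_ds / kt_over_q)
    exponent = (v_g - v_s - v_th0 - gamma * v_s + eta * v_ds) / (n * kt_over_q)
    return a * math.exp(exponent) * b


# Off transistor (V_g = 0) with its source at ground versus with its
# source raised to 0.148 V while the drain stays at 1 V.
off_normal = subthreshold_current(v_g=0.0, v_s=0.0, v_ds=1.0)
off_raised = subthreshold_current(v_g=0.0, v_s=0.148, v_ds=1.0 - 0.148)
print(f"leakage ratio normal/raised: {off_normal / off_raised:.1f}")
```

Raising the source acts on the exponent three ways at once (lower $V_g - V_s$, the $\gamma V_s$ body term, and a smaller $V_{ds}$), which is why a modest shift of a virtual rail can cut leakage sharply.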


EARLIER RESEARCH
Gated-$V_{dd}$
+ Interposes a high-$V_t$ transistor between the circuit and one of the power supply rails
+ Reduces the leakage current of a normal transistor to effectively the leakage current of the high-$V_t$ control transistor
- Contents of the cell are lost
- Control algorithm should be smart
ABB-MTCMOS
+ Dynamically raises $V_t$ by modulating the back-gate bias voltage, i.e., $V_t = V_{t0} + \gamma\left(\sqrt{\phi_{bi} + V_{sb}} - \sqrt{\phi_{bi}}\right)$
- Higher energy/delay per transition and higher $V_{dd+}$ offset the leakage power savings
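The back-gate bias relation quoted for ABB-MTCMOS can be sketched numerically as follows. The square-root form is the usual body-effect expression; the `gamma` and `phi_bi` defaults are hypothetical values chosen only to show the trend, not process data from the slides.

```python
import math


def threshold_with_body_bias(v_t0, v_sb, gamma=0.4, phi_bi=0.8):
    """V_t = V_t0 + gamma * (sqrt(phi_bi + V_sb) - sqrt(phi_bi)).

    gamma (V^0.5) and phi_bi (V) are assumed illustration values.
    """
    return v_t0 + gamma * (math.sqrt(phi_bi + v_sb) - math.sqrt(phi_bi))


# Sweep the source-to-body bias for a device with V_t0 = 0.2 V.
for v_sb in (0.0, 0.25, 0.5, 1.0):
    print(f"V_sb = {v_sb:.2f} V -> V_t = {threshold_with_body_bias(0.2, v_sb):.3f} V")
```

The threshold grows only with the square root of the bias, so large shifts in $V_t$ need comparatively large bias voltages.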


DVS
+ In sub-micron processes, leakage current increases exponentially with supply voltage
+ Supply voltage is reduced to an optimum value (knee point of the curve, $1.5 \cdot V_t$)
+ Two-fold reduction (both voltage and current) of the leakage power is achieved
- A memory cell in standby (drowsy) mode cannot be read or written

What is Missing?
* A comprehensive solution that has much lower control overhead and still achieves the maximum possible leakage reduction
* Reduction is maximum if the circuit is in standby or low-leakage mode whenever it is not used


OUR PROPOSED SOLUTION
[Figure: Warm Inverter schematic with nodes $V_{PWR}$, $V_{GND}$, IN, OUT and ACC, and two depletion devices. Labels: $V_{dd}$ = 1V, $V_{TdepN}$ = 0.65V, $V_{TdepP}$ = 0.65V, $V_{TP}$ = 0.2V, $V_{TN}$ = 0.2V]

* Our solution uses depletion-mode devices
* The circuit is warm, i.e., when not accessed, $V_{PWR}$ is less than $V_{dd}$ and $V_{GND}$ is greater than GND
* Compared to a normal inverter in the same technology, the warm inverter achieves a 377X leakage current reduction

Steady State Response
IN (V)   OUT (V)   V_PWR (V)   V_GND (V)   I_off (pA)
0.0      0.949     0.949       0.148       10
1.0      0.052     0.852       0.052       01

Limitations:
* Performance penalty, as there is an NMOS in the charging path and a PMOS in the discharging path
* Energy penalty: extra switching energy $\xi = 0.3\,C_{diff}$ J
* Cascading effect: for a cross-coupled inverter we get High = 742mV, Low = 225mV, $I_{off}$ = 515pA (compare with the actual $I_{off}$ of 6.25nA)

Performance Impact
        t_pLH (ps)   t_pHL (ps)   t_r (ps)   t_f (ps)
Base    16.8         10.54        33.63      17.31
New     25.9         16.32        40.72      30.89
%Inc    54.2         54.80        21.10      78.50
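The %Inc row follows directly from the Base and New rows. A short Python check, using only the delay values quoted in the Performance Impact table (the dictionary keys are names chosen here for illustration):

```python
# Delays in ps, copied from the Performance Impact table.
base = {"t_plh": 16.8, "t_phl": 10.54, "t_r": 33.63, "t_f": 17.31}
warm = {"t_plh": 25.9, "t_phl": 16.32, "t_r": 40.72, "t_f": 30.89}


def percent_increase(old, new):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


increase = {name: percent_increase(base[name], warm[name]) for name in base}
for name, pct in increase.items():
    print(f"{name}: +{pct:.1f}%")
```

The same helper applied to the cross-coupled case gives the leakage improvement factor as 6250 pA / 515 pA, roughly 12X.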

APPLICATION TO CACHES
[Figure: Cache architecture of an n-way set-associative cache: address decoder, tag array and data array with word lines and bit lines, column muxes, sense amps, comparator (Hit?), and mux/output driver (DATA)]

Cache Access Timing for a 32KB, 4-way, 1 RW Port, 1 Sub-bank Cache
               Data Array Delay (ps)   Tag Array Delay (ps)
Decoder        208.572                 99.410
Wordline       115.975                 44.415
Bitline        11.765                  11.898
Senseamp       72.625                  44.625
Compare        -                       112.912
Mux Driver     -                       150.077
Sel Inverter   -                       16.612
Total          408.936                 479.949

* L1 cache sizes are typically 32KB - 64KB (Athlon has 128KB)
* L1 miss rates are on average 2%
* On-chip L2 caches are in the range of 256KB (Centrino has 1MB)
* We used CACTI 3.0 to find the cache access timing

Simulation Setup:
[Figure: Warm SRAM configuration. SRAM cells 1 to 16 share one depletion device pair between the $V_{dd}$/$V_{PWR}$ and $V_{GND}$/GND rails; $W_{depn} = W_{min}$, $W_{depp} = 4 \cdot W_{min}$. Inset: basic SRAM cell with BIT, $\overline{BIT}$ and WL, $V_t$ = 0.39V]

* A depletion device pair per cell would increase the area and hence offset the energy savings
* The wordline access signal is used to control the depletion devices
* $PMOS_{dep}$ is $4W_{min}$; as cache read is in the critical path, this is justified
* Up to a 6X increase in bitline delay (data array) will have no impact on cache access time
* Simulation is performed in HSPICE for a subarray of size 128X256
* WL is not affected by the addition of $16 \cdot C_g$
* $\overline{WL}$ is generated from WL, and since it drives only $64 \cdot C_g$ its delay can be made one tenth of that of WL
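The claim that the data-array bitline can slow down by up to 6X without affecting access time can be checked against the CACTI delays in the cache access timing table. The sketch below assumes, as the slides state for the read analysis, that the tag path is the critical path, so the data path only has to finish no later than the tag path; the function and variable names are illustrative.

```python
# Component delays in ps, from the cache access timing table.
data_array = {"decoder": 208.572, "wordline": 115.975,
              "bitline": 11.765, "senseamp": 72.625}
tag_array = {"decoder": 99.410, "wordline": 44.415, "bitline": 11.898,
             "senseamp": 44.625, "compare": 112.912,
             "mux_driver": 150.077, "sel_inverter": 16.612}

tag_total = sum(tag_array.values())
data_total = sum(data_array.values())
slack = tag_total - data_total      # time the data path can lose for free


def data_path_delay(bitline_scale):
    """Data-array delay when only the bitline delay is multiplied."""
    return data_total + (bitline_scale - 1.0) * data_array["bitline"]


print(f"slack between tag and data paths: {slack:.3f} ps")
for scale in (4.5, 6.0):
    hidden = data_path_delay(scale) <= tag_total
    print(f"{scale}X bitline -> {data_path_delay(scale):.3f} ps, hidden: {hidden}")
```

With about 71 ps of slack and a bitline delay under 12 ps, both the 6X bound and the 4.5X slowdown reported for the read operation stay behind the tag path.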

Leakage Reduction:
* Leakage power reduction: 23X
* $V_H$ has moved closer to $V_{TdepN}$, because one $NMOS_{dep}$ is shared among 16 SRAM cells
* $V_L$ has moved closer to $V_{dd} - V_{TdepP}$, but not as much as $V_H$, because the width of $PMOS_{dep}$ has been increased

Steady State Response of a WARM SRAM Cell
Param              Base    Warm SRAM
I_L (pA)           6250    262
V(BIT) (V)         1.0     0.686
V(BIT-bar) (V)     0.0     0.252

Analysis of Write Operation:
* Transition delay values are as shown in the table
* The write operation is not affected by the presence of the depletion-mode devices, for two reasons:
  * A faster $\overline{WL}$ means $V_{GND}$ transits to zero even before the access transistors are turned on
  * Since bits transit from a non-zero initial value to $V_H$, the peak current required for the transition is smaller and can be supplied by the single $NMOS_{dep}$

Transient Analysis Parameters and Response
Param                    Value     Param            Value
WL t_r and t_f           100 ps    Base t_r         47.0 ps
WL-bar t_r and t_f       10 ps     Base t_f         22.0 ps
WL pulse width           200 ps    Warm SRAM t_r    50.1 ps
V_bitpre                 0.5 V     Warm SRAM t_f    00.0 ps

Analysis of Write Operation (contd.):
* Irrespective of bit state changes, the $V_{PWR}$ node and one of the output nodes ($OUT_H$) need to be pulled up
* Considering the capacitance of the $V_{PWR}$ node and the $OUT_H$ node, the extra energy would be $327.9 \cdot C_{diff}$
* For a 70nm device this would be 36fJ, or 0.14fJ per bit that does not change state
* Warm SRAM uses more energy when 70 bits or fewer undergo a state transition
* This extra energy (36fJ) is insignificant compared to the dynamic energy per access (0.3nJ), hence we ignored its impact

Write Energy Comparison
No. of Bits   Energy (fJ)          Peak Current (mA)
              Base   Warm SRAM     Base   Warm SRAM
256           320    144           5.53   0.997
192           240    132           4.14   0.930
128           160    118           2.75   0.840
64            80     99            1.36   0.735
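The per-bit and relative figures on this slide can be reproduced with a few lines of Python. The sketch assumes the 36 fJ is spread over one 256-bit row (the row width of the simulated 128X256 subarray); the variable names are illustrative.

```python
EXTRA_ENERGY_FJ = 36.0            # extra pull-up energy per access, from the slide
BITS_PER_ROW = 256                # assumed row width (128X256 subarray)
DYNAMIC_ENERGY_FJ = 0.3e6         # 0.3 nJ dynamic energy per access, in fJ

extra_per_bit_fj = EXTRA_ENERGY_FJ / BITS_PER_ROW
extra_fraction = EXTRA_ENERGY_FJ / DYNAMIC_ENERGY_FJ

# Write Energy Comparison table: bits written -> (base fJ, warm fJ).
write_energy_fj = {256: (320, 144), 192: (240, 132), 128: (160, 118), 64: (80, 99)}
base_per_bit_fj = {bits: base / bits for bits, (base, _) in write_energy_fj.items()}

print(f"extra energy per unchanged bit: {extra_per_bit_fj:.2f} fJ")
print(f"extra energy / dynamic energy:  {extra_fraction:.2e}")
print(f"base write energy per bit:      {base_per_bit_fj}")
```

The base energy is a constant 1.25 fJ per written bit, while the warm energy falls more slowly with the bit count because the fixed pull-up cost is paid on every access; that is why the warm cell loses only when few bits change state.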

Analysis of Read Operation:
* Tag array access forms the critical path, hence Warm SRAM is used only in the data array
* Since we use high-$V_t$ access transistors in the SRAM cell, the access time for a precharge voltage of 0.5V closely matches CACTI's estimated value
* Bitline delay increases by 4.5X for Warm SRAM, which increases neither the cache access time nor the wave-pipelined cycle time
* The extra energy estimated for the write operation also applies to reads
* As the $V_{PWR}$ node takes a finite amount of time to discharge, the extra energy depends on the inter-access time

Analysis of Read Operation (contd.):
Read Energy w.r.t. Inter-Access Time
Base Read Energy: 25.92 fJ

Time (ns)   Energy (fJ)   Extra Energy (fJ)
25          23.99         -1.93
50          33.86         7.94
75          41.56         15.64
100         47.22         21.30
125         51.38         25.46
150         55.27         29.35
175         57.45         31.53
200         59.44         33.52
300         59.44         33.52

[Figure: Discharging of the $V_{PWR}$ node. HSPICE plot from warm_sram_array.sp, voltage versus time. Marked points: 7.33e-09 s, 6.84e-01 V; 3.01e-07 s, 6.92e-01 V; 4.01e-07 s, 2.65e-01 V]
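The Extra Energy column is the read energy minus the 25.92 fJ base read energy, and it saturates once the inter-access time reaches 200 ns. A small Python sketch that looks up the extra energy for an arbitrary inter-access time is given below. The linear interpolation between table rows and the clamping outside the tabulated range are assumptions made here for illustration; the slides only give the tabulated points.

```python
from bisect import bisect_right

BASE_READ_FJ = 25.92
# (inter-access time in ns, read energy in fJ), from the table.
READ_ENERGY_FJ = [(25, 23.99), (50, 33.86), (75, 41.56), (100, 47.22),
                  (125, 51.38), (150, 55.27), (175, 57.45), (200, 59.44),
                  (300, 59.44)]


def read_energy_fj(t_ns):
    """Read energy at inter-access time t_ns (linear interpolation, clamped)."""
    times = [t for t, _ in READ_ENERGY_FJ]
    if t_ns <= times[0]:
        return READ_ENERGY_FJ[0][1]
    if t_ns >= times[-1]:
        return READ_ENERGY_FJ[-1][1]
    i = bisect_right(times, t_ns)
    (t0, e0), (t1, e1) = READ_ENERGY_FJ[i - 1], READ_ENERGY_FJ[i]
    return e0 + (e1 - e0) * (t_ns - t0) / (t1 - t0)


def extra_read_energy_fj(t_ns):
    """Energy above the base read energy; matches the Extra Energy column."""
    return read_energy_fj(t_ns) - BASE_READ_FJ


for t in (25, 62.5, 200, 1000):
    print(f"{t} ns -> extra {extra_read_energy_fj(t):.2f} fJ")
```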

Architecture Level Estimation:
* SPEC2000 integer benchmarks running on SimpleScalar 3.0 are used to estimate the energy savings for a hypothetical 32KB, 4-way L1 cache
* Two sources of extra energy:
  * Energy to bring Warm SRAM to the normal state (max 33.52fJ per access)
  * Generation of access control signals (20fJ per access)
* Average net energy savings for a 0.5ns cache access time (cycle time) is 94.11%

Access Percentage w.r.t. Time
Benchmark   50 Cycles   100 Cycles
crafty      59.73       9.15
eon         77.91       6.06
gcc         77.85       5.47
twolf       70.40       6.46
gzip        79.73       5.61
bzip        86.92       4.90
mcf         68.47       11.02
perlbmk     77.32       3.37
parser      75.18       7.36
vpr         69.59       7.81
Avg         74.31       6.721

Net Energy Savings
Prog      Exec Cycles    Mem Access     Energy Penalty per access (µJ)   %Net Saving (0.2 ns/cyc)   %Net Saving (0.5 ns/cyc)
crafty    396782412      195828079      5.93                             91.28                      94.02
eon       350714953      240118536      6.06                             90.57                      93.74
gcc       393784461      223031723      5.68                             91.45                      94.09
twolf     444314516      172189507      4.76                             92.58                      94.54
gzip      277336702      169725136      4.21                             91.22                      94.00
bzip      269543836      185471790      4.19                             91.10                      93.95
mcf       487390086      195632037      5.23                             92.57                      94.54
perlbmk   346674071      216796572      5.71                             90.82                      93.84
parser    326925643      190878110      4.91                             91.26                      94.01
vpr       421717636      185474202      5.09                             92.16                      94.37
Avg       371518431.60   197514569.20   5.18                             91.50                      94.11
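The slides do not spell out the accounting behind the net-saving columns. The Python sketch below is one plausible back-of-the-envelope reading, not the authors' model, and it rests on several assumptions: every one of the 32KB x 8 data cells leaks at the per-cell currents from the steady-state table (6250 pA base, 262 pA warm) at a 1 V supply, the tag array is ignored, and the energy penalty column is the total penalty over the whole run in µJ. Under these assumptions the crafty row comes out within about 0.1 percentage points of the tabulated savings.

```python
CELLS = 32 * 1024 * 8        # data bits in a 32KB cache (assumed cell count)
I_BASE_A = 6250e-12          # base cell leakage current, A
I_WARM_A = 262e-12           # warm cell leakage current, A
VDD = 1.0                    # supply voltage, V (assumed for both cases)


def net_saving_percent(exec_cycles, penalty_uj, cycle_ns):
    """Leakage energy saved by Warm SRAM after paying the energy penalty."""
    run_time_s = exec_cycles * cycle_ns * 1e-9
    base_leak_j = CELLS * I_BASE_A * VDD * run_time_s
    warm_leak_j = CELLS * I_WARM_A * VDD * run_time_s
    warm_total_j = warm_leak_j + penalty_uj * 1e-6
    return 100.0 * (1.0 - warm_total_j / base_leak_j)


# crafty row: execution cycles, memory accesses, energy penalty (uJ).
cycles, accesses, penalty_uj = 396782412, 195828079, 5.93
saving_02 = net_saving_percent(cycles, penalty_uj, 0.2)
saving_05 = net_saving_percent(cycles, penalty_uj, 0.5)

# Upper bound on the penalty if every access paid the maximum cost
# (33.52 fJ wake-up + 20 fJ control signal generation).
max_penalty_uj = accesses * (33.52 + 20.0) * 1e-15 * 1e6

print(f"estimated saving at 0.2 ns/cycle: {saving_02:.2f}%")
print(f"estimated saving at 0.5 ns/cycle: {saving_05:.2f}%")
print(f"worst-case penalty bound: {max_penalty_uj:.2f} uJ")
```

The longer cycle time gives a higher saving because the leakage energy grows with wall-clock time while the per-access penalty does not.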

MODEL VALIDITY
* $N_d$ (donor concentration) and $d_I$ (implantation depth) could be varied to get the required device characteristics
* Two operating points need to be verified:
  * $NMOS_{dep}$ should be cut off when $V_{sb}$ = 0.65V and $V_g$ = 0V
  * When $V_{gs}$ = 1V the gate should have gain comparable to what is predicted by the enhancement model
* The device should operate in the cut-off or surface accumulation region
* We solved $V_T|_{V_{sb}=0.65}$ = -0.65V for various values of $d_I$ and obtained viable values for $N_d$
* For all these values of $N_d$ the requirement $V_{gs} > V_N$ is met

Process parameters for NMOS_dep
γ_I     d_I (10^-10 m)   σ       N_d (10^18 cm^-3)   V_T0 (V)   V_N (mV)
1.5γ    24.21            0.625   28.2                -0.6786    -37.06
2.0γ    48.41            1.5     14.23               -0.6881    -54.84
3.0γ    100              5       5.667               -0.7084    -78.78

CONCLUSIONS AND FUTURE WORK
* Static leakage is one of the biggest challenges facing the semiconductor industry in the near future
* We have achieved more than 90% leakage energy reduction in on-chip L1 caches without any performance loss
* Our technique is immediately applicable to any lower-level caches (L2)
* On-chip caches constitute a major fraction of a processor's area, hence considerable leakage energy could be saved by using our methodology
* Currently investigating the use of the warmup-CMOS design style in logic blocks
* Working on an analytical model capturing the relationship between the threshold of the depletion devices and the leakage reduction

THANK YOU!! Questions?