Variation-Resistant Dynamic Power Optimization for VLSI Circuits

Process-Variation-Resistant Dynamic Power Optimization for VLSI Circuits
Fei Hu
Department of ECE, Auburn University, AL 36849
Ph.D. Dissertation Committee: Dr. Vishwani D. Agrawal, Dr. Foster Dai, Dr. Darrel Hankerson
November 16, 2005

Outline
- Introduction
- Background
  - Dynamic power dissipation
  - Glitch reduction
  - Previous LP model
- Process-variation-resistant LP model
  - Process variation
  - Delay model
  - LP model based on worst-case timing
  - LP model based on statistical timing
- Input-specific optimization
  - Without process variation
  - With process variation
- Experimental results
- Conclusion
Fei Hu, PhD Dissertation

Introduction
- Power components for CMOS circuits
  - $P_{avg} = P_{static} + P_{dynamic}$
  - $P_{dynamic} \approx \frac{1}{2} k C_L V_{dd}^2 f_{clk}$
- Power dissipation problem
  - For constant die size, total capacitance increases by 40% when transistor size is reduced by 70%
  - Clock frequency is scaled up faster than the minimum feature size (MFS)
  - Leakage power increases dramatically as the MFS shrinks into the submicron region
  - The architectural trend towards programmability and reusability increases the demand for power

VLSI Chip Power Density
[Figure: power density (W/cm^2, logarithmic axis from 1 to 10000) versus year (1970 to 2010) for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, and P6, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Intel]

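Background
- Dynamic power dissipation
  - $P_{dyn} = P_{switching} + P_{short\text{-}circuit}$
- Switching power dissipation
  - $P_{switching} = \frac{1}{2} k C_L V_{dd}^2 f_{clk}$
[Figure: CMOS gate between $V_{dd}$ and Gnd driving load capacitance $C_L$ with charging current $i_c$; one transistor on, the other off]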
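As a quick numeric illustration of the switching-power expression, the sketch below evaluates $\frac{1}{2} k C_L V_{dd}^2 f_{clk}$ for a hypothetical gate. The slide does not define $k$; it is assumed here to be the average number of output transitions per clock cycle, so glitches raise $k$. All component values are made up for the example.

```python
def switching_power(k, c_load, v_dd, f_clk):
    """Average switching power P = 1/2 * k * C_L * V_dd^2 * f_clk, in watts."""
    return 0.5 * k * c_load * v_dd ** 2 * f_clk

# Hypothetical operating point: 20 fF load, 1.2 V supply, 1 GHz clock.
base = switching_power(k=0.2, c_load=20e-15, v_dd=1.2, f_clk=1e9)
# Assumption: glitches double the number of transitions per cycle.
glitchy = switching_power(k=0.4, c_load=20e-15, v_dd=1.2, f_clk=1e9)

print(f"{base * 1e6:.2f} uW without glitches, {glitchy * 1e6:.2f} uW with glitches")
```

Because the expression is linear in $k$, any reduction in glitch transitions translates directly into the same relative reduction in switching power.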

Background
- Short-circuit power dissipation
  - Short-circuit current flows when both PMOS and NMOS are on
  - Strongly affected by the rise and fall times of the input signals; significant when the input rise/fall time is much longer than the output rise/fall time
  - Can be kept to an insignificant portion of $P_{dyn}$

Background
- Glitch reduction
  - An important dynamic power reduction technique
  - Glitch power accounts for 30~70% of $P_{dyn}$ in typical circuits
  [Figure: static glitch]
- Related techniques
  - Balanced delay
  - Hazard filtering
  - Transistor/gate sizing
  - Linear programming approach

Glitch reduction
[Figure: original circuit and modified versions, annotated with gate delay values]
- Balanced path / path balancing
  - Equalize the delays of all paths incident on a gate
  - Balancing requires insertion of delay buffers
- Hazard/glitch filtering
  - Utilize the glitch-filtering effect of a gate
  - Not necessary to insert buffers

Glitch reduction
- Transistor/gate sizing
  - Find transistor sizes in the circuit that realize the required delays
  - No need to insert buffers
  - Suffers from the nonlinearity of the delay model: large solution space; numeric convergence and global optimization are not guaranteed
- Linear programming approach
  - Adopts both path balancing and hazard filtering
  - Finds the optimal delay assignments of gates
  - Uses technology mapping to map the gate delay assignments to transistor/gate dimensions
  - Guaranteed optimal solution; a convenient way to solve a large-scale optimization problem

Previous LP approach
[Figure: example circuit with numbered nodes; gate 7, with inputs 5 and 6 and delay $d_7$, is annotated with timing windows $(t_5, T_5)$, $(t_6, T_6)$, $(t_7, T_7)$]
- Timing window $(t, T)$
- Gate constraints:
  - $T_7 \ge T_5 + d_7$
  - $T_7 \ge T_6 + d_7$
  - $t_7 \le t_5 + d_7$
  - $t_7 \le t_6 + d_7$
  - $d_7 > T_7 - t_7$
- Circuit delay constraints:
  - $T_{11} \le maxdelay$
  - $T_{12} \le maxdelay$
- Objective: minimize the sum of buffer delays
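The timing-window constraints can be checked on a small netlist without an LP solver. The sketch below is not the LP itself: it assumes gate delays are already fixed, takes the tightest windows the constraints allow (earliest arrival t as a minimum, latest arrival T as a maximum over the inputs), and reports gates that violate the filtering condition d > T - t. The netlist format and gate names are hypothetical.

```python
def timing_windows(gates, primary_inputs):
    """gates: list of (name, input_names, delay) in topological order.
    Returns {name: (t, T)} with earliest (t) and latest (T) arrival times."""
    window = {pi: (0.0, 0.0) for pi in primary_inputs}
    for name, inputs, delay in gates:
        t = min(window[i][0] for i in inputs) + delay
        T = max(window[i][1] for i in inputs) + delay
        window[name] = (t, T)
    return window

def unfiltered_gates(gates, window):
    """Gates whose window width T - t is not smaller than their own delay."""
    return [name for name, _, delay in gates
            if window[name][1] - window[name][0] >= delay]

# Unbalanced: g2 sees one input at time 1 (through g1) and one at time 0.
circuit = [("g1", ["a"], 1.0), ("g2", ["g1", "b"], 1.0)]
w = timing_windows(circuit, ["a", "b"])
print(w["g2"], unfiltered_gates(circuit, w))

# Path balancing: a unit-delay buffer on input b closes the window at g2.
balanced = [("g1", ["a"], 1.0), ("buf", ["b"], 1.0), ("g2", ["g1", "buf"], 1.0)]
wb = timing_windows(balanced, ["a", "b"])
print(wb["g2"], unfiltered_gates(balanced, wb))
```

In the LP formulation the delays are the unknowns, and the solver trades off such buffer insertions against increasing gate delays for hazard filtering.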

Process-variation-resistant optimization
- Motivation
  - Gate delay is assumed fixed in previous models
  - Gate delay varies in real circuits
    - Environmental factors: temperature, $V_{dd}$
    - Physical factors: process variations
  - Effect of delay variation
    - Glitch-filtering conditions are corrupted
    - Power dissipation increases from the optimized value
    - Leakage variation is possible and requires separate investigation
- Our proposal
  - Consider delay variations in dynamic power optimization
  - Only consider process variations (the major source of delay variation)

Process and delay variations
- Process variations
  - Variations due to the semiconductor process: $V_T$, $t_{ox}$, $L_{eff}$, $W_{wire}$, $TH_{wire}$, etc.
  - Inter-die variation
    - Constant within a die; varies from one die to another on a wafer or in a wafer lot
  - Intra-die variation
    - Variation within a die
    - Due to equipment limitations or statistical effects in the fabrication process, e.g., variation in doping concentration
    - Spatial correlations and deterministic variation due to CMP and the optical proximity effect

Process and delay variations
- Delay variation
  - First-order gate delay model: $T_d = \frac{C_L V_{dd}}{I} = \frac{C_L V_{dd}}{\mu C_{ox} (W/L) (V_{dd} - V_t)^2}$
  - Gate delay is sensitive to process variations
- Related previous work
  - Static timing analysis
    - Worst-case timing analysis
    - Statistical timing analysis
  - Power optimization under process variations
    - Voltage scaling, multi-$V_{dd}$/$V_{th}$ considering critical delay variations
    - Gate sizing using a statistical delay model
  - No work on glitch power optimization
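To see why the first-order model makes gate delay sensitive to process parameters, the sketch below evaluates $T_d = C_L V_{dd} / (\mu C_{ox} (W/L)(V_{dd} - V_t)^2)$ for a nominal and a shifted threshold voltage. All device values are hypothetical, and any constant factor in the model cancels in the ratio, so only the relative change is meaningful here.

```python
def gate_delay(c_load, v_dd, mu_cox, w_over_l, v_t):
    """First-order delay T_d = C_L*V_dd / (mu*C_ox*(W/L)*(V_dd - V_t)^2)."""
    return c_load * v_dd / (mu_cox * w_over_l * (v_dd - v_t) ** 2)

# Hypothetical device: 10 fF load, 1.2 V supply, V_t = 0.40 V nominal.
nominal = gate_delay(c_load=10e-15, v_dd=1.2, mu_cox=200e-6, w_over_l=4.0, v_t=0.40)
# Same device with V_t shifted up by 10% (0.44 V).
slow = gate_delay(c_load=10e-15, v_dd=1.2, mu_cox=200e-6, w_over_l=4.0, v_t=0.44)

print(f"delay change for +10% V_t: {100 * (slow / nominal - 1):.1f}%")
```

The quadratic dependence on $(V_{dd} - V_t)$ means a threshold shift is amplified in the delay, which is what corrupts glitch-filtering conditions derived for fixed delays.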

Delay model and implications
- Random gate delay model
  - $D_{total,i} = D_{nom,i} + \Delta D_{inter,i} + \Delta D_{intra,i}$
  - Truncated normal distribution
  - Assume independence
  - Variation expressed as the $\sigma/d_{nom,i}$ ratio
- Effect of inter-die variations
  - Depends on their effect on switching activity
  - Definition of the glitch-filtering probability: $P_{glt} = P\{t_2 - t_1 < d\}$
    - Signal arrival times $t_1$, $t_2$
    - Gate inertial delay $d$
  - Theorem 1 states the change of $P_{glt}$ due to inter-die variation:
    $\Delta P_{glt} = \frac{1}{2}\left[\mathrm{erf}\left(\frac{k}{\sqrt{2}}\right) - \mathrm{erf}\left(\frac{k}{\sqrt{2 + 2(rk)^2}}\right)\right]$
    - $\mathrm{erf}()$, the error function
    - $k$, a path- and gate-dependent constant
    - $r$, the $\sigma/d_{nom,i}$ ratio for inter-die variations

Delay model and implications
- Effect of inter-die variations
  - For a large inter-die variation, $r = 0.15$: $\Delta P_{glt} < 5.3 \times 10^{-3}$
  - Negligible effect on switching activity
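The bound for r = 0.15 can be checked numerically. The sketch below assumes Theorem 1 reads $\Delta P_{glt} = \frac{1}{2}[\mathrm{erf}(k/\sqrt{2}) - \mathrm{erf}(k/\sqrt{2 + 2(rk)^2})]$, which is a reconstruction from the slide text, and scans the path- and gate-dependent constant k over a hypothetical range of [0, 10].

```python
import math

def delta_p_glt(k, r):
    """Change in glitch-filtering probability due to inter-die variation
    (assumed form of Theorem 1; see the lead-in)."""
    return 0.5 * (math.erf(k / math.sqrt(2.0))
                  - math.erf(k / math.sqrt(2.0 + 2.0 * (r * k) ** 2)))

r = 0.15
# Scan k on a fine grid; the range [0, 10] is an assumption for illustration.
worst = max(delta_p_glt(i / 1000.0, r) for i in range(0, 10001))

print(f"max over k in [0, 10]: {worst:.2e}")
```

Under this reading the maximum stays below the $5.3 \times 10^{-3}$ quoted on the slide, which supports treating inter-die variation as negligible for switching activity.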

Delay model and implications
- Process-variation-resistant design
  - Can be achieved by path balancing and glitch filtering
  - Critical delay may increase
  - Theorem 2 states that a solution is guaranteed only if the circuit delay is allowed to increase
  - Proved by example, assuming 10% variation
[Figure: example circuit annotated with delay values 2.1 and 3.9]

LP model based on worst-case timing
- Timing model
[Figure: timing model]

LP model based on worst-case timing
- Constraints
  - Gate constraints
    - $Tb_i \ge Ta_1$; $Tb_i \ge Ta_j$; $Tb_i \ge Ta_k$
    - $tb_i \le ta_1$; $tb_i \le ta_j$; $tb_i \le ta_k$
    - $Ta_i = Tb_i + d_i (1 + 3r)$
    - $ta_i = tb_i + d_i (1 - 3r)$
  - Glitch-filtering constraints
    - $Tb_i - tb_i < d_i (1 - 3r)\,\alpha$, where $r < 0.33$ (33%)
  - Delay constraints for POs
    - $Ta_i \le D_{max}$
- Parameters
  - $r$, the $\sigma/d_{nom,i}$ ratio
  - $D_{max}$, the circuit delay parameter
  - $\alpha$, the optimism factor, in $[1, \infty]$; $\alpha = 1$: all glitches filtered; $\alpha = \infty$: no glitch filtered
- Objective
  - Minimize the number of buffers inserted (sum of buffer delays)
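A minimal sketch of the worst-case window arithmetic for one gate, assuming the constraints read $Ta_i = Tb_i + d_i(1+3r)$, $ta_i = tb_i + d_i(1-3r)$ and $Tb_i - tb_i < d_i(1-3r)\alpha$. The input windows and the delay value are hypothetical; in the LP these quantities are variables rather than fixed numbers.

```python
def input_window(input_windows):
    """Tightest (tb, Tb) over the (ta, Ta) windows at the gate inputs."""
    tb = min(ta for ta, _ in input_windows)
    Tb = max(Ta for _, Ta in input_windows)
    return tb, Tb

def worst_case_output_window(input_windows, d, r):
    """Output window with the gate delay varying within d*(1 -/+ 3r)."""
    tb, Tb = input_window(input_windows)
    return tb + d * (1 - 3 * r), Tb + d * (1 + 3 * r)

def glitch_filtered(input_windows, d, r, alpha=1.0):
    """Check Tb - tb < d*(1 - 3r)*alpha; alpha = 1 means no relaxation."""
    tb, Tb = input_window(input_windows)
    return Tb - tb < d * (1 - 3 * r) * alpha

inputs = [(2.0, 2.5), (2.2, 2.8)]          # hypothetical (ta, Ta) pairs
print(worst_case_output_window(inputs, d=1.0, r=0.05))
print(glitch_filtered(inputs, d=1.0, r=0.05))   # 0.8 < 0.85
print(glitch_filtered(inputs, d=1.0, r=0.10))   # 0.8 < 0.70 fails
```

The example shows how a larger variation ratio r shrinks the guaranteed filtering margin, so the optimizer must either narrow the input window or increase the gate delay.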

LP model based on statistical timing
- Worst-case timing tends to be too pessimistic
- Statistical timing model with random variables
[Figure: gate i with delay $d_i$ and input window $(tb_i, Tb_i)$, driven by gates 1, j, and k with output windows $(ta_1, Ta_1)$, $(ta_j, Ta_j)$, $(ta_k, Ta_k)$, and producing output window $(ta_i, Ta_i)$]

LP model based on statistical timing
- Minimum/maximum statistics are needed for $tb_i$, $Tb_i$
  - $tb_i = \mathrm{Min}(ta_1, ta_j, ta_k)$
  - $Tb_i = \mathrm{Max}(Ta_1, Ta_j, Ta_k)$
- Previous works
  - The Min or Max of two normal random variables is not necessarily normally distributed
  - It can be approximated with a normal distribution
  - This requires complex operations, e.g., integration, exponentiation
- Challenge for the LP approach
  - Requires a simple approximation without nonlinear operations
- Our approximation for $C = \mathrm{Max}(A, B)$, where A, B, and C are Gaussian RVs
  - $\mu_C = \mathrm{Max}(\mu_A, \mu_B)$
  - $\mu_C + 3\sigma_C = \mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B)$

LP model based on statistical timing
- Min-Max statistics approximation error
  - Negligible when $\mu_A - \mu_B > 3(\sigma_A + \sigma_B)$
  - Largest when $\mu_A = \mu_B$
- $\mu_C = \mathrm{Max}(\mu_A, \mu_B)$
- $\sigma_C = \frac{1}{3}\left[\mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B) - \mu_C\right]$
[Figure: probability P versus x, showing the CDFs of A and B, the actual CDF of Max(A,B), and the approximated CDF of Max(A,B)]
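The Max approximation and its error behaviour can be sketched as follows. `approx_max` implements the two rules $\mu_C = \mathrm{Max}(\mu_A, \mu_B)$ and $\mu_C + 3\sigma_C = \mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B)$. For comparison, `exact_max_mean` uses Clark's closed-form mean of the maximum of two independent Gaussians; that comparison is an addition for illustration and is not part of the slides.

```python
import math

def approx_max(mu_a, sigma_a, mu_b, sigma_b):
    """LP-friendly approximation of C = Max(A, B): returns (mu_c, sigma_c)."""
    mu_c = max(mu_a, mu_b)
    upper = max(mu_a + 3 * sigma_a, mu_b + 3 * sigma_b)
    return mu_c, (upper - mu_c) / 3.0

def exact_max_mean(mu_a, sigma_a, mu_b, sigma_b):
    """Exact mean of Max(A, B) for independent Gaussians (Clark's formula)."""
    theta = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    alpha = (mu_a - mu_b) / theta
    cdf = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * alpha ** 2) / math.sqrt(2.0 * math.pi)
    return mu_a * cdf + mu_b * (1.0 - cdf) + theta * pdf

# Equal means: the case where the slide says the error is largest.
print(approx_max(0, 1, 0, 1), exact_max_mean(0, 1, 0, 1))
# Well-separated means (difference > 3*(sigma_a + sigma_b)): error negligible.
print(approx_max(10, 1, 0, 1), exact_max_mean(10, 1, 0, 1))
```

The approximation only needs max operations, which an LP expresses as pairs of inequalities, whereas the exact statistics require the error function and exponentials.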

LP model based on statistical timing
- Variables
  - Timing and delay variables with mean $\mu$ and standard deviation $\sigma$
  - Auxiliary variables $T$, $t$, $W = Tb - tb$, $\mu$, $\sigma$
- Constraints
  - Gate constraints
    - Timing window at the inputs of a two-input gate i
      - $\mu_{Tb_i} \ge \mu_{Ta_1}$; $T_{Tb_i} \ge \mu_{Ta_1} + 3\sigma_{Ta_1}$
      - $\mu_{Tb_i} \ge \mu_{Ta_2}$; $T_{Tb_i} \ge \mu_{Ta_2} + 3\sigma_{Ta_2}$
      - $\sigma_{Tb_i} = (T_{Tb_i} - \mu_{Tb_i})/3$
      - $\mu_{tb_i} \le \mu_{ta_1}$; $t_{tb_i} \le \mu_{ta_1} - 3\sigma_{ta_1}$
      - $\mu_{tb_i} \le \mu_{ta_2}$; $t_{tb_i} \le \mu_{ta_2} - 3\sigma_{ta_2}$
      - $\sigma_{tb_i} = (\mu_{tb_i} - t_{tb_i})/3$
    - Timing window at the outputs
      - $\mu_{Ta_i} = \mu_{Tb_i} + \mu_{d_i}$; $\sigma_{Ta_i} = k(\sigma_{Tb_i} + r\mu_{d_i})$
      - $\mu_{ta_i} = \mu_{tb_i} + \mu_{d_i}$; $\sigma_{ta_i} = k(\sigma_{tb_i} + r\mu_{d_i})$

LP model based on statistical timing
- Constraints
  - Gate constraint: linear approximation
    - $\sigma_{Ta_i} = \sqrt{\sigma_{Tb_i}^2 + (r\mu_{d_i})^2} \approx k(\sigma_{Tb_i} + r\mu_{d_i})$
    - $k \in [0.707, 1]$; choose $k = 0.85$, since $\frac{A + B}{\sqrt{2}} \le \sqrt{A^2 + B^2} \le A + B$
  - Glitch-filtering constraints
    - $\mu_{W_i} = \mu_{Tb_i} - \mu_{tb_i}$
    - $\sigma_{W_i} = k(\sigma_{Tb_i} + \sigma_{tb_i})$
    - $\mu_{d_i} - \mu_{W_i} > 3k(\sigma_{W_i} + r\mu_{d_i})$
    [Figure: distribution of $d_i - W_i$ with the $3\sigma$ margin and probability P marked]
  - Circuit delay constraint
    - $\mu_{Ta_i}(1 + 3r) \le D_{max}$
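The range for k follows from $(A+B)/\sqrt{2} \le \sqrt{A^2+B^2} \le A+B$ for non-negative A and B. The sketch below checks that ratio over a small grid of hypothetical $\sigma_{Tb}$ and $\mu_d$ values (with r = 0.15) and reports the relative error of using k = 0.85.

```python
import math

K = 0.85  # value chosen on the slide

def sigma_exact(sigma_tb, r, mu_d):
    """Root-sum-square combination of the two standard deviations."""
    return math.sqrt(sigma_tb ** 2 + (r * mu_d) ** 2)

def sigma_linear(sigma_tb, r, mu_d, k=K):
    """Linear stand-in used in the LP: k * (sigma_Tb + r * mu_d)."""
    return k * (sigma_tb + r * mu_d)

ratios = []
errors = []
for sigma_tb in (0.0, 0.1, 0.5, 1.0, 5.0):      # hypothetical values
    for mu_d in (1.0, 2.0, 10.0):               # hypothetical values
        exact = sigma_exact(sigma_tb, 0.15, mu_d)
        ratios.append(exact / (sigma_tb + 0.15 * mu_d))
        errors.append(abs(sigma_linear(sigma_tb, 0.15, mu_d) - exact) / exact)

print(f"sqrt(A^2+B^2)/(A+B) in [{min(ratios):.3f}, {max(ratios):.3f}]")
print(f"worst relative error with k=0.85: {100 * max(errors):.1f}%")
```

Choosing k near the middle of [0.707, 1] bounds the relative error of the linear form on both sides, at the cost of being slightly optimistic or pessimistic depending on how unequal the two terms are.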

LP model based on statistical timing
- Parameters
  - $r$, the $\sigma/d_{nom,i}$ ratio
  - $D_{max}$, the circuit delay parameter
  - $\alpha$, the optimism factor: $\mu_{d_i} - \mu_{W_i} > 3k(\sigma_{W_i} + r\mu_{d_i})\,\alpha$
    - $\alpha = 1$: no relaxation
    - $\alpha < 1$: optimistic about the actual glitch width
    - $\alpha = 0$: reduces to the previous model
- Objective
  - Minimize the number of buffers inserted (sum of buffer delays)

Input-specific optimization
- Motivation
  - Previous LP models guarantee glitch filtering for any input vector sequence
    - $T_i - t_i < d_i$ for all gates
  - Redundancy in optimization
    - Insertion of more buffers
    - Increased overhead in power and area
  - In reality, circuits operate in embedded environments
  - Optimize for the input vector sequences that can actually occur at the circuit inputs, e.g., functional vectors
  - Same reduction in power dissipation with less overhead

Input-specific optimization
- Glitch generation pattern
  - Input vector pair that can potentially generate a glitch
  [Figure: AND gate example showing input transitions that can produce a glitch at the output]
- Glitch generation probability $P_g[i]$
  - Probability that a glitch-generation pattern occurs at the inputs of gate i
  - Steady-state signal values match the pattern

Input-specific optimization
- Application to the previous model without process variation
  - Static optimization
    - Only static glitches/hazards are considered
  - Relaxation of constraints
    - Relax glitch-filtering constraints where glitches are unlikely to happen
    - $T_i - t_i < d_i \Rightarrow (T_i - t_i)\,\beta_i < d_i$
    - Selective relaxation: $\beta_i = 0$ if $P_g[i] = 0$; $\beta_i = 1$ if $P_g[i] > 0$
    - Generalized relaxation: $\beta_i = 1 - e^{-P_g[i]/\tau}$
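The two relaxation rules can be compared in a few lines. The generalized form $\beta_i = 1 - e^{-P_g[i]/\tau}$ is a reconstruction of the slide's formula and should be read as an assumption; the probabilities and window values below are hypothetical.

```python
import math

def beta_selective(p_g):
    """Drop the constraint only where no glitch-generation pattern occurs."""
    return 0.0 if p_g == 0 else 1.0

def beta_generalized(p_g, tau):
    """Smooth relaxation (assumed form): 1 - exp(-P_g / tau)."""
    return 1.0 - math.exp(-p_g / tau)

def relaxed_filter_ok(T, t, d, beta):
    """Relaxed glitch-filtering constraint (T - t) * beta < d."""
    return (T - t) * beta < d

for p_g in (0.0, 0.01, 0.2):
    print(p_g, beta_selective(p_g), round(beta_generalized(p_g, tau=0.05), 3))

# A window twice as wide as the gate delay violates the strict constraint
# but is accepted once the gate is known never to see a glitch pattern.
print(relaxed_filter_ok(T=3.0, t=1.0, d=1.0, beta=beta_selective(0.2)))
print(relaxed_filter_ok(T=3.0, t=1.0, d=1.0, beta=beta_selective(0.0)))
```

With a small $\tau$ the generalized rule approaches the selective one, while a larger $\tau$ also loosens constraints on gates that rarely see a glitch pattern, trading buffers for some residual glitch power.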

Input-specific optimization
- Application to the process-variation-resistant LP model based on statistical timing
  - Static optimization
  - Relaxation of constraints: $\mu_{d_i} > [\mu_{W_i} + 3k(\sigma_{W_i} + r\mu_{d_i})\,\alpha]\,\beta_i$
    - Selective relaxation
    - Generalized relaxation
- Tuning factor (TF)
  - Original objective: minimize $\sum_j d_j$, $j \in$ buffers
  - Current objective: minimize $\sum_j d_j + TF \cdot \frac{1}{N} \sum_i d_i$, $j \in$ buffers, $i \in$ other gates

Input-specific optimization
- Why a tuning factor is needed
  - The dominating path affects the critical delay distribution
[Figure: example circuit; labels: "Dominating path", delay 41, "Can be [1,41]"]

Experimental results
- Experimental procedure
  - Flow chart
  [Figure: flow chart with blocks Circuit, Data extraction, Constraint set data, LP models, AMPL, Gate delays, Circuit generation, Optimized circuit, Logic simulations, Results; parameters $D_{max}$, $r$]
  - Power estimation
    - Event-driven logic simulation
    - Fanout-weighted sum of switching activities
    - Variations of $C_L$ and $V_{dd}$ ignored
    - Monte Carlo simulation with 1,000 samples of delays under process variation
- Results analysis
  - Un-Opt: unit-delay circuit
  - Opt: previous optimization
  - Opt1: process-variation-resistant optimization, worst-case timing
  - Opt2: process-variation-resistant optimization, statistical timing
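A minimal sketch of how the 1,000 Monte Carlo delay samples might be drawn, following the delay model $D_{total,i} = D_{nom,i} + \Delta D_{inter,i} + \Delta D_{intra,i}$ with truncated normal components. Several details are assumptions made for the example and are not stated in the slides: truncation at $\pm 3\sigma$ by resampling, one shared inter-die shift per die, and independent intra-die terms per gate. The gate names and nominal delays are hypothetical.

```python
import random

def truncated_normal(rng, sigma, limit=3.0):
    """Zero-mean normal sample, resampled until within +/- limit*sigma."""
    if sigma == 0:
        return 0.0
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= limit * sigma:
            return x

def sample_die(rng, nominal_delays, r_inter, r_intra):
    """One Monte Carlo sample of all gate delays on a die."""
    shift = truncated_normal(rng, 1.0)  # shared inter-die shift, in sigmas
    delays = {}
    for gate, d_nom in nominal_delays.items():
        inter = shift * r_inter * d_nom               # same direction for all gates
        intra = truncated_normal(rng, r_intra * d_nom)  # independent per gate
        delays[gate] = d_nom + inter + intra
    return delays

rng = random.Random(0)
nominal = {"g1": 1.0, "g2": 2.5, "buf1": 0.5}   # hypothetical optimized delays
samples = [sample_die(rng, nominal, r_inter=0.05, r_intra=0.15)
           for _ in range(1000)]
print(min(s["g2"] for s in samples), max(s["g2"] for s in samples))
```

Each sampled delay set would then be fed to the event-driven logic simulator to obtain one point of the power and critical-delay distributions.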

Experimental results: small variation
Power dissipation under no process variation

         Un-Opt  Opt (w/o proc. var.)    Opt1 (worst-case proc.)  Opt2 (statistical proc.)
Circuit  Pwr.    Pwr.   Buf.  maxdelay   Pwr.   Buf.  Dmax        Pwr.   Buf.  Dmax
c432     -       0.74     95        17   0.74     96    20        0.74     99    20
c432     -       0.74     66        34   0.74     91    40        0.74     91    40
c499     -       0.94     80        11   0.94     88    13        0.94     97    13
c499     -       0.94     48        22   0.94     88    26        0.94    129    26
c880     -       0.54     63        24   0.54     45    28        0.54     76    28
c880     -       0.54     29        72   0.54     37    83        0.54     37    83
c1355    -       0.93    224        24   0.93    296    28        0.93    305    28
c1355    -       0.93    160        72   0.93    296    83        0.93    273    83
c1908    -       0.53     84        40   0.53     68    46        0.52    136    46
c1908    -       0.55     54       120   0.53     92   138        0.52    198   138
c2670    -       0.74    157        32   0.79    244    37        0.73    313    37
c2670    -       0.74     26        96   0.75     80   111        0.73    168   111
c3540    -       0.60    219        47   0.59    228    55        0.59    306    55
c3540    -       0.59    103       141   0.61    152   163        0.59    303   163
c5315    -       0.56    281        49   0.62    228    57        0.55    401    57
c5315    -       0.56    113       147   0.58    130   170        0.55    460   170
c6288    -       0.13    881       124   0.15    801   143        0.14   1685   143
c6288    -       0.13    864       372   0.14    922   428        0.13   1213   428
c7552    -       0.52    369        43   0.64    180    50        0.52    464    50
c7552    -       0.52     62       129   0.56    162   149        0.52    879   149

Experimental results: small variation
Power distribution under 5% inter-die, 5% intra-die variation

                   Un-Opt            Opt (w/o proc. var.)  Opt1 (worst-case proc.)  Opt2 (statistical proc.)
Circuit  Maxdelay  Mean   Max. Dev.  Mean   Max. Dev.      Mean   Max. Dev.         Mean   Max. Dev.
                   Pwr.   (%)        Pwr.   (%)            Pwr.   (%)               Pwr.   (%)
c432           17  8      17.5       0.78   12.8           0.75   7.0               0.75   4.5
c432           34  8      17.5       0.76   8.2            0.74   0.1               0.74   0.1
c499           11  6      12.9       0      12.6           0.95   0.7               0.95   0.7
c499           22  6      12.9       0.99   12.6           0.94   0.0               0.94   0.1
c880           24  3      7.1        0.62   23.1           0.58   13.9              0.55   7.5
c880           72  3      7.1        0.57   12.8           0.55   1.1               0.54   -
c1355          24  1.10   18.1       0.99   10.6           0.96   5.5               0.95   4.2
c1355          72  1.10   18.1       0.98   8.8            0.93   0.3               0.93   0.1
c1908          40  1.15   2          0.64   28.6           0.62   22.8              0.58   21.6
c1908         120  1.15   2          0.64   21.5           0.54   5.9               0.54   6.5
c2670          32  1.17   21.8       0.80   11.6           0.81   5.5               0.75   4.8
c2670          96  1.17   21.8       0.77   6.1            0.78   5.2               0.74   1.8
c3540          47  1.15   18.9       0.66   15.2           0.65   12.9              0.63   9.7
c3540         141  1.15   18.9       0.62   7.2            0.63   5.1               0.59   1.3
c5315          49  1.12   14.9       0.62   13.8           0.67   9.9               0.59   9.1
c5315         147  1.12   14.9       0.60   10.3           0.61   6.8               0.56   3.7
c6288         124  1.46   49.9       0.27   131.6          0.28   105.9             0.24   93.6
c6288         372  1.46   49.9       0.26   128.3          0.23   76.8              0.18   56.0
c7552          43  1.17   19.6       0.57   12.4           0.72   13.3              0.57   11.8
c7552         129  1.17   19.6       0.56   9.3            0.58   5.1               0.53   3.5

Experimental results: small variation
- Power timing analysis
  - Example: c432
  [Figure: power distributions for c432 with maxdelay=17 and maxdelay=26]
  - Complete suppression of power variation

Experimental results: small variation
- Critical delay distribution
  [Figure: nominal delay and maximum deviation of the critical delay]
  - Similar nominal delay
  - Reduced variation by Opt2 for c880, c2670, c5315, c7552

Experimental results: large variation
Power dissipation under no process variation

         Un-opt.  Opt (w/o proc. var.)    Opt1 (worst-case proc.)  Opt2 (statistical proc.)
Circuit  Pwr.     Pwr.   Buf.  maxdelay   Pwr.   Buf.  Dmax        Pwr.   Buf.  Dmax
c432     0        0.74     66        34   0.75     87    50        0.74     88    50
c432     0        0.74     58        68   0.74     81    99        0.74    106    99
c499     0        0.94     48        22   0.97     88    32        0.94     88    32
c499     0        0.94      0        33   0.97      0    48        0.94    129    48
c880     0        0.54     35        48   0.58     36    70        0.54     57    70
c880     0        0.54     30       120   0.59     29   174        0.54     62   174
c1355    0        0.93    192        48   0.95    264    70        0.93    305    70
c1355    0        0.93    128       120   0.96    264   174        0.93    305   174
c1908    0        0.53     62        80   0.55     41   116        0.52    135   116
c1908    0        0.54     34       200   0.56     12   290        0.52    190   290
c2670    0        0.74     34        64   0.80     39    93        0.74    249    93
c2670    0        0.74      9       160   0.78     95   232        0.73    211   232
c3540    0        0.59    139        94   0.62    149   137        0.59    281   137
c3540    0        0.59     78       235   0.65     52   341        0.59    311   341
c5315    0        0.56    167        98   0.66     93   143        0.55    399   143
c5315    0        0.56     53       245   0.60    144   356        0.55    418   356
c6288    0        0.13    870       228   0.14   1303   331        0.13   1121   331
c6288    0        0.13    857       620   0.13    939   899        0.13   1473   899
c7552    0        0.52     91        86   0.69     64   125        0.52    481   125
c7552    0        0.52     44       215   0.60    622   312        0.52    645   312

Experimental results: large variation
Power distribution under 15% intra-die and 5% inter-die variation

                   Un-opt            Opt (w/o proc. var.)  Opt1 (worst-case proc.)  Opt2 (statistical proc.)
Circuit  Maxdelay  Mean   Max. Dev.  Mean   Max. Dev.      Mean   Max. Dev.         Mean   Max. Dev.
                   Pwr.   (%)        Pwr.   (%)            Pwr.   (%)               Pwr.   (%)
c432           34  9      19.8       0.78   12.6           0.78   12.1              0.76   11.1
c432           68  9      19.8       0.77   10.3           0.75   6.1               0.74   3.7
c499           22  7      14.0       2      15.3           0.98   1.7               0.95   2.0
c499           33  7      14.0       0.99   10.2           0.97   1.4               0.95   -
c880           48  4      8.0        0.62   26.5           0.63   15.7              0.59   18.2
c880          120  4      8.0        0.60   22.7           0.60   5.6               0.55   8.6
c1355          48  1.13   21.8       6      19.7           0.98   7.3               0.98   10.2
c1355         120  1.13   21.8       5      18.8           0.97   1.7               0.94   3.0
c1908          80  1.16   23.1       0.72   49.6           0.66   30.1              0.64   35.8
c1908         200  1.16   23.1       0.66   32.3           0.62   18.8              0.58   21.4
c2670          64  1.19   25.4       0.81   13.6           0.90   16.0              0.80   13.6
c2670         160  1.19   25.4       0.80   11.2           0.82   8.6               0.76   6.2
c3540          94  1.16   20.7       0.67   19.5           0.69   16.9              0.66   17.8
c3540         235  1.16   20.7       0.66   16.1           0.71   11.7              0.62   10.1
c5315          98  1.13   16.5       0.67   24.6           0.74   16.3              0.63   20.8
c5315         245  1.13   16.5       0.64   19.0           0.66   13.9              0.60   13.4
c6288         228  1.45   52.2       0.43   274.3          0.36   193.4             0.38   223.8
c6288         620  1.45   52.2       0.41   264.0          0.31   161.5             0.26   125.3
c7552          86  1.17   21.9       0.64   25.8           0.78   16.0              0.59   18.7
c7552         215  1.17   21.9       0.60   20.2           0.65   11.2              0.56   11.8

Experimental results: large variation
- Critical delay distribution
  [Figure: nominal delay and maximum deviation (%) of the critical delay]
  - Similar nominal delay
  - Reduced delay variation by Opt2

Experimental results: input-specific optimization
Application to Opt under no process variation (IS-Opt)

                   Un-Opt  Opt (w/o proc. var.)    IS-Opt (input-specific w/o proc.)
Circuit  maxdelay  Pwr.    Pwr.   Delay  Buffers   Pwr.   Delay  Buffers
c432           34  -       0.74      34       66   0.74      35       66
c432           68  -       0.74      68       58   0.74      69       41
c499           22  -       0.94      22       48   0.94      22       33
c499           33  -       0.94      33        0   0.95      33        0
c880           48  -       0.54      51       35   0.54      49       32
c880          120  -       0.54     121       30   0.54     122       24
c1355          48  -       0.93      48      192   0.93      48      113
c1355         120  -       0.93     121      128   0.93     120       25
c1908          80  -       0.53      82       62   0.54      86       52
c1908         200  -       0.54     203       34   0.53     204        3
c2670          64  -       0.74      65       34   0.74      66       30
c2670         160  -       0.74     163        9   0.74     162        1
c3540          94  -       0.59      95      139   0.59     101      122
c3540         235  -       0.59     239       78   0.59     239       73
c5315          98  -       0.56     100      167   0.56     104      170
c5315         245  -       0.56     249       53   0.56     250       52
c6288         228  -       0.13     226      870   0.13     228      870
c6288         620  -       0.13     620      857   0.13     620      853
c7552          86  -       0.52      89       91   0.52      88       84
c7552         215  -       0.52     220       44   0.52     221       38

Experimental results: input-specific optimization
Application to Opt2 under process variation (IS-Opt2), under 15% intra-die and 5% inter-die variation

               Un-opt.  Opt2 (statistical proc.)           IS-Opt2 (input-specific statistical proc.)
Cir.    D Max  Nom.     Nom.   Mean   Max Dev.  No.        Nom.   Mean   Max Dev.  No.
               Pwr.     Pwr.   Pwr.   (%)       Buf.       Pwr.   Pwr.   (%)       Buf.
c432       50  -        0.74   0.76   11.1        88       0.74   0.76   9.3         81
c432       99  -        0.74   0.74   3.7        106       0.74   0.74   3.3         76
c499       32  -        0.94   0.95   2.0         88       0.94   0.95   1.9         88
c499       48  -        0.94   0.95   -          129       0.94   0.95   1.8         58
c880       70  -        0.54   0.59   18.2        57       0.54   0.59   20.4        38
c880      174  -        0.54   0.55   8.6         62       0.54   0.56   9.0         38
c1355      70  -        0.93   0.98   10.2       305       0.93   1      13.1       253
c1355     174  -        0.93   0.94   3.0        305       0.93   0.95   4.7        160
c1908     116  -        0.52   0.64   35.8       135       0.52   0.64   34.7       107
c1908     290  -        0.52   0.58   21.4       190       0.52   0.57   18.4       104
c2670      93  -        0.74   0.80   13.6       249       0.73   0.79   11.3       186
c2670     232  -        0.73   0.76   6.2        211       0.73   0.75   4.3         79
c3540     137  -        0.59   0.66   17.8       281       0.59   0.65   15.6       247
c3540     341  -        0.59   0.62   10.1       311       0.59   0.61   7.4        188
c5315     143  -        0.55   0.63   20.8       399       0.55   0.63   2          389
c5315     356  -        0.55   0.60   13.4       418       0.55   0.60   13.2       413
c6288     331  -        0.13   0.38   223.8     1121       0.13   0.38   225.2     1115
c6288     899  -        0.13   0.26   125.3     1473       0.13   0.26   125.5     1243
c7552     125  -        0.52   0.59   18.7       481       0.52   0.58   18.1       389
c7552     312  -        0.52   0.56   11.8       645       0.52   0.55   10.9       520

Experimental results: input-specific optimization
- Trade-off by generalized relaxation
  - c432 circuit with varying $\tau$ value
  [Figure: number of buffers and power distribution for c432 as $\tau$ varies]
  - Reduction in the number of buffers comes with degradation of the power distribution

Experimental results: input-specific optimization
- Critical delay
  [Figure: nominal delay and maximum deviation of the critical delay]
  - Similar performance for Opt2 and IS-Opt2

Conclusions
- Proposed a dynamic power optimization technique that is resistant to process variation
  - Considers process variation in terms of delay variations: inter-die and intra-die
  - Proves that inter-die variation has a negligible effect on switching activity and power
- Constructed two new LP models
  - Worst-case timing analysis
  - Statistical timing analysis
- Input-specific optimization to reduce the number of buffers
  - Circuit optimized for a certain input vector sequence
- Experimental results
  - Complete suppression of power variation for small circuits and variations
  - Significant reduction of power and delay variations for larger circuits and variations: 53% reduction in power deviation and 40% reduction in delay deviation under 15% intra-die and 5% inter-die variation
  - Input-specific optimization significantly reduces the trade-off (buffers) with equivalent power and delay performance: IS-Opt2 vs. Opt2, up to 63% reduction in buffers

Questions
For more questions, contact me at hufei01@auburn.edu