Energy Delay Optimization

Similar documents
EE M216A.:. Fall Lecture 5. Logical Effort. Prof. Dejan Marković

Lecture 5. Logical Effort Using LE on a Decoder

EE141-Fall 2011 Digital Integrated Circuits

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1

Parallel Processing and Circuit Design with Nano-Electro-Mechanical Relays

School of EECS Seoul National University

EE241 - Spring 2001 Advanced Digital Integrated Circuits

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

EE M216A.:. Fall Lecture 4. Speed Optimization. Prof. Dejan Marković Speed Optimization via Gate Sizing

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Digital Integrated Circuits A Design Perspective

Using MOS Models. C.K. Ken Yang UCLA Courtesy of MAH EE 215B

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Integrated Circuits & Systems

EECS 151/251A Homework 5

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

Announcements. EE141- Spring 2003 Lecture 8. Power Inverter Chain

A Mathematical Solution to. by Utilizing Soft Edge Flip Flops

CSE493/593. Designing for Low Power

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

EE141Microelettronica. CMOS Logic

MODULE III PHYSICAL DESIGN ISSUES

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006

ELEC516 Digital VLSI System Design and Design Automation (spring, 2010) Assignment 4 Reference solution

Dynamic operation 20

Digital Integrated Circuits A Design Perspective

EE141- Spring 2007 Digital Integrated Circuits

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

Digital Integrated Circuits A Design Perspective

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Integrated Circuits & Systems

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Lecture 4: Implementing Logic in CMOS

EE241 - Spring 2003 Advanced Digital Integrated Circuits

Skew-Tolerant Circuit Design

Where Does Power Go in CMOS?

Semiconductor Memories

Design and Optimization of Low-power CMOS Logic Using. Logical Effort Model with Slope Correction

Integrated Circuits & Systems

EE 447 VLSI Design. Lecture 5: Logical Effort

Chapter 5 CMOS Logic Gate Design

EE371 - Advanced VLSI Circuit Design

Next, we check the race condition to see if the circuit will work properly. Note that the minimum logic delay is a single sum.

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

MODULE 5 Chapter 7. Clocked Storage Elements

EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders

ALU A functional unit

Lecture 6: Logical Effort

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

ECE429 Introduction to VLSI Design

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

Chapter 8. Low-Power VLSI Design Methodology

Designing Sequential Logic Circuits

GMU, ECE 680 Physical VLSI Design

9/18/2008 GMU, ECE 680 Physical VLSI Design

Introduction to Computer Engineering. CS/ECE 252, Fall 2012 Prof. Guri Sohi Computer Sciences Department University of Wisconsin Madison

! Memory. " RAM Memory. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

University of Toronto. Final Exam

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Logic Synthesis and Verification

Objective and Outline. Acknowledgement. Objective: Power Components. Outline: 1) Acknowledgements. Section 4: Power Components

Digital Integrated Circuits A Design Perspective

Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Where are we? Data Path Design

Announcements. EE141- Fall 2002 Lecture 7. MOS Capacitances Inverter Delay Power

Datapath Component Tradeoffs

Digital Integrated Circuits A Design Perspective

Clock Strategy. VLSI System Design NCKUEE-KJLEE

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

Digital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories

Memory, Latches, & Registers

Slides for Lecture 19

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

The Linear-Feedback Shift Register

Variation-Resistant Dynamic Power Optimization for VLSI Circuits

Lecture 8-1. Low Power Design

Chapter Overview. Memory Classification. Memory Architectures. The Memory Core. Periphery. Reliability. Memory

Sequential Logic Worksheet

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Digital Integrated Circuits 2nd Inverter

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

COMP 103. Lecture 16. Dynamic Logic

CMPEN 411. Spring Lecture 18: Static Sequential Circuits

Lecture 4: CMOS review & Dynamic Logic

and V DS V GS V T (the saturation region) I DS = k 2 (V GS V T )2 (1+ V DS )

Transcription:

EE M216A.:. Fall 21 Lecture 8 Energy Delay Optimization Prof. Dejan Marković ee216a@gmail.com Some Common Questions Is sizing better than V DD for energy reduction? What are the optimal values of gate size and V DD? What is the optimal ratio of leakage / switching for min E? Shall we increase or decrease V DD for energy reduction? What is the optimal circuit topology? How many levels of parallelism is good? Etc. EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 2 2

Energy Minimization Problem Goal: achieve the lowest energy under the delay constraint E max Energy Unoptimized design W V DD E min W & V DD W & V DD & V TH D min D max (f clk max ) (f clk min ) Delay EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 3 3 Energy Delay Sensitivity Slope of E D curve around a design point (e.g. (A,B )) E/ A S A = D/ A A = A Energy (A S,B ) A f(a,b ) S B D f(a,b) Delay [D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz,.W. Brodersen, JSSC, Aug 4] EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 4 4

Solution: Equal Sensitivities A fixed point is reached when all sensitivities are equal f(a 1,B) E = S A ( D) + S B D Energy (A,B ) D D f(a,b ) f(a,b) Delay EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 5 5 Circuit Optimization Constraints eference design D min sizing @ V DD max, V TH Energy/op topology A topology B Delay Goal: find optimal E D tradeoff for a logic function EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 6 6

Alpha power based Delay Model Combined with Logical Effort Formulation (*) FO4 delay (norm.) 4 3.5 3 2.5 2 1.5 1.5 simulation model V on =.37 V α d = 1.53.5.6.7.8.9 1 V dd / V dd Fitting parameters V on, α d, K d Effective fanout. V DD = 1.2V, FO4 (V DD ) = 25ps (*) [Sutherland et al., Logical Effort, 1999] EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 7 7 Energy Model Switching energy Leakage energy with: D: the cycle time I (S in ): normalized leakage current with inputs in state S in EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 8 8

Adjusting Switching Energy of a Gate V dd, i V dd, i + 1 i i+1 Sizing W i W par,i W out Supply W i p nom,i W i W wire W i+1 = energy stored on the logic gate i EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 9 9 Optimization Setup eference/nominal circuit Sized for D min @ V DD nom Define delay constraint t D con = D min (1 + d inc /1) En nergy erence Minimize energy under delay constraint V DD scaling (global, discrete, per stage) Gate sizing (W) Threshold adjustment (V TH ) Optional buffering D min Delay EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 1 1

Profitability of Optimization ( E/ D) Gate sizing (W) for equal h eff (D min ) Supply voltage (V DD ) S(V DD ) V DD V TH V DD Threshold voltage (V TH ) S(V TH ) V TH V TH EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 11 11 Circuit Optimization Examples Inverter chain Memory decoder Branching Inactive gates Tree adder Long wires e convergent paths Multiple active outputs addr input m = 16 m = 4 n = n = 12 (A 15, B 15 ) predecoder 3 15 C W m = 2 n = 3 word driver m = 1 n = 255 C L C L S 15 word line (A, B ) C in S EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 12 12

Example: Inverter Chain Properties of inverter chain Single path topology Energy increases geometrically from input to output 1 W 1 =1 W 2 W 3 W N C L Goal Find optimal sizing W = [W 1, W 2,, W N ], supply voltage, and buffering strategy to achieve the best energy delay tradeoff EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 13 13 Inverter Chain: Gate Sizing & Buffering effective fanou ut, h eff 25 2 15 1 5 nom opt d inc = 5% 3% 1% 1% % 1 2 3 4 5 6 7 stage [Ma, Franzon, IEEE JSSC, 9/94] Variable taper achieves minimum energy educe number of stages at large d inc EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 14 14

Inverter Chain: V DD Optimization nom 1. 8.6.4 V dd / V dd.8.2 d inc = 5% % nom opt 1 2 3 4 5 6 7 stage effective fanou ut, h eff 18 15 1% 12 1% 3% 9 6 3 nom opt d inc = 5% 3% 1% 1% % 1 2 3 4 5 6 7 stage Variable taper achieved by voltage scaling V DD reduces energy of the final load first EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 15 15 Inverter Chain: Optimization esults energy reduct tion (%) 1 8 6 4 2 cw gvdd 2Vdd cvdd Sensitivity ( norm) 1..8.6.4.2 cw gvdd 2Vdd 1 2 3 4 5 d inc (%) 1 2 3 4 5 d inc (%) Parameter with the largest sensitivity has the largest potential for energy reduction Two discrete supplies mimic per stage V DD EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 16 16

SAM Decoder Energy Profile Energy (nor rm) addr input 1 8 6 4 2 m = 8 Internal energy peak predecoder word driver m = 4 m = 2 m = 1 3 15 C W C L word line EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 17 17 W vs. V DD for educing Energy Peak energy 1 8 6 4 2 1 cw: E (-52%) 2Vdd: E (-25%) E 8 7 d D min,7 +5% inc =1% 6 E 7-19% 4 D min,7 energy 2 1 2 3 4 5 6 7 8 9 stage 1 2 3 4 5 6 7 8 9 stage V DD less effective than W optimization Buffering also reduces energy peak [B. Amrutur, Ph.D. Thesis, Stanford, 8/99] EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 18 18

Tree Adder: Optimization esults eference: all paths are critical erence D min, E Sizing Opt. 1.1D min,.45e 2 V DD Opt. 1.1D min,.73e Internal energy W more effective than V DD For d inc = 1%: E W = 55%, E 2Vdd = 27% EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 19 19 A Look at Tuning Variables 1% excess delay 3 7% energy reduction 1 8 solid: inverter chain dashed: 8/256 decoder dotted: 64-bit adder 1 5 1 4 (V th min ) Energy reduction (%) 6 4 2 W V dd V th Sensitivity 1 3 1 2 1 1 (V dd max ) (V th ) V th V dd W 2 4 6 8 1 Delay increase (%) 1.5.75 1 1.25 1.5 D / D Peak performance is very power inefficient! EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 2 2

Circuit Level esults: Tree Adder E/E 1.5 S W S Vth =.2 S W =22 S Vdd =1.5 1 S Vth =22 S Vdd =16 65%.5 S W =1 S Vth =1 S Vdd =1 5.5 1 D/D 15 1.5 esult of joint (V DD, V TH, W) optimization: 65% of energy saved without delay penalty 25% better delay without energy cost [D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz,.W. Brodersen, JSSC, Aug 4] Limited range of efficient circuit optimization EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 21 21 A Look at Tuning Variables It is best when variables don t reach their boundary values V DD / V DD Supply voltage 1.75 1.5 1.25 1.75.5 25.25 Threshold voltage reliability Sens(V DD ) = 5 limit Sens(W) = 1 Sens(V TH ) = 1 5 25 25 5 75 1 Delay increment (%) V TH (mv) 15 2 25 3 Limited range of tuning variables 35 5 25 25 5 75 1 Delay increment (%) EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 22 22

A Look at Tuning Variables When a variable reaches bound, fewer variables to work with V DD / V DD Supply voltage Threshold voltage 1.75 reliability Sens(V DD ) = 1.5 5 limit Sens(W) = 1.25 1 Sens(V TH ) = 1 1.75.5 25.25 V TH (mv) 15 2 25 Sens(V TH ) = Sens(V( DD ) = 16 3 Sens(W) = 22 5 25 25 5 75 1 Delay increment (%) Limited range of tuning variables 35 5 25 25 5 75 1 Delay increment (%) EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 23 23 Tree Adder: Joint Optimization energy reductio on (%) 1 8 6 4 2 67% 2Vdd + cw 2Vdd gvdd + cw gvdd cw 2 4 6 8 1 d inc (%) energy reduction (% %) @ D min 5 4 3 2 1 42% 1 1. 115 1.15 nom V dd / V dd lure Device fai Choose a more efficient variable Higher V DD yields a lower energy solution EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 24 24

An Update: Slope Aware Delay Model LE overestimates delay Assumes equal slopes Add slope correction K i : gate &V V DD dependentd N g h g h D = gi hi + pi i= 1 Ki i i i 1 i 1 EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 25 25 esult: Adder Example 16 bits, output load: C L = 512 ff (total FO ~ 256) Logical Effort: largely overestimates non critical paths rnal power (µw) Inter 5 45 4 35 3 A Synthesized Adder B Simulation LE with Slope Corr Synthesis Timing Original LE Model Matlab Optimized Adders 25.8 1 1.2 1.4 1.6 1.8 2 Delay (normalized to the synthesized adder) [C.C. Wang, D. Markovic, TCAS-II, Aug 9] EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 26 26

Lessons from Circuit Optimization Sensitivity based optimization framework Equal marginal costs Energy efficient design To reduce energy, it is not always best to reduce V DD Effectiveness of tuning variables Sizing is the most effective for small delay increments Vdd is better for large delay increments Peak performance is VEY power inefficient About 7% energy reduction for 2% delay penalty Limited performance range of tuning variables Additional variables for higher energy efficiency EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 27 27 Choosing Circuit Topology: Optimal egister? 64 bit ALU C in =2 E G C in =1 in b=8 Add C L =32 Cp D S Cp Cp (a) Cycle-latch (CL) Cycle latch (CL) Q Clk Clk Clk 1 Clk 1SS Q S Q D M M Clk Clk 1 Clk 1 Clk Clk 1 Clk (b) Static Master-slave Latch-pair (SMS) Static Master Slave Latch Pair (SMS) Given energy delay tradeoff for adder and register (two register options), what is the best energy delay tradeoff in the ALU? EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 28 28

Balancing Sensitivity Across Circuit Blocks Upsize lower activity blocks (Add) to save energy 2.5 ALU Add Energy (norm.) 2 1.5 1.5 eg CL SMS Add CL SMS 5 1 15 2 25 Delay (FO4) S Add = S eg Energy eg EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 29 29 Micro Architectural Optimization E-D tradeoff par, t-mux Area E-D trade-off Vdd, Vth, W Macro Arch. Micro Arch. Circuit interleaving folding E parallel pipeline time-mux mux # of bits throughput algorithm delay cct topology EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 3 3 E W Vth Vdd

educing the Supply Voltage (while maintaining performance) Concurrency: trading off clock frequency versus area to reduce power f F1 eference design F2 : register, F1,F2: combinational logic (adders, ALUs, etc) [Chandrakasan et al., JSSC 92] C : average switching capacitance EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 31 31 A Parallel Implementation unning slower lowers required V DD (quadratic power reduction) F1 F2 f /2 F1 F2 Almost cancels f /2 EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 32 32

Parallelism Example (9nm CMOS) Power (assuming ov par = 7.5%) t p (norm.) 5 4.5 4 3.5 3 2.5 2 15 1.5 1.5.6.7.8.9 1 V DD (norm.) From J. abaey (UCB) How many levels of parallelism? EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 33 33 The More Parallel the Better? Total Energy eference Parallel From J. abaey (UCB) Supply voltage, V DD Leakage and overhead start to dominate at high levels of parallelism, causing min E to increase Optimum voltage also increases with parallelism EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 34 34

Increasing use of Concurrency Saturates P Fixed Throughput Nominal design (no concurrency) Overhead + leakage Concurrency P min Only option: educe V TH as well! But: Must consider Leakage V DD From J. abaey (UCB) EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 35 35 A Pipelined Implementation Shallower logic reduces required supply voltage F1 F2 f f This example assumes equal V DD for par / pipe designs Assuming ov pipe = 1% EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 36 36

Mapping into the Energy Delay Space Energy delay for ALU realizations with varying parallelism E Op P = 3 P =4 P = 5 P = 2 erence Fixed throughput Minimum EDP Increasing parallelism (P) Delay = 1/Throughput Level of concurrency depends on target performance ule of thumb: If speed exceeds MEP point, add parallelism EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 37 37 E Op / nominal E Op Leakage is Not Necessarily a Bad Thing E Op for three architectures (, par, pipe) of ALU Set V TH, adjust V DD to maintain performance, plot E op 1.8 V th -18mV max.81v dd.6 V th -95mV max.4.57v dd Topology Inv Add Dec 2.2 nominal V -14mV th (E 8 5 2 parallel max Lk /E Sw ) opt.8.5.2.52v pipeline dd Adapt to process and 1-2 1-1 1 1 1 activity variations E Leakage /E Switching Optimal designs have high leakage (E Lk /E Sw.5)! EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 38 38

Parallelism and Pipelining in E D Space f A B A B f f erence (a) 2 A B A B f f f f (c) pipeline pipeline 2 parallel (b) parallel f Ene ergy/op erence It is important to link back to E D tradeoff parallel/pipeline Time/Op EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 39 39 Time Multiplexing A f f f f A 2f 2f f f f f (a) erence for time-mux (b) time-multiplex erence A time mux Energ gy/op erence time mux For throughputs below min EDP: Absorb unused time slack by increasing Clk freq. (and V DD ) Again comes with some area and capacitance overhead Time/Op EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 4 4

Energy Area Tradeoff High throughput: Parallelism = Large Area 4 3 2 1 1 2 1 3 1 4 1 5 64 b ALU Max Eop A = A = 1 5 1 3 A A Ttarget Low throughput: Time Mux = Small Area EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 41 41 Putting it All Together Balance logic depth (L d ) within a block, adjust latency to reach T Clk Apply V DD and W optimization to the underlying pipelines Micro Arch. (block) Circuit (datapath) Latency add Speed Power Area mult Ld W Energy Min delay S W =,S Vdd =2 W W opt @ V DD S W =S Vdd =2 V DD scaling Target delay S W =S Vdd <2 Cycle time (T Clk ) V DD Delay EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 42 42

Motivation for System Level Optimization Optimizations at the architecture or system level can enable more effective power minimization at the circuit level (while maintaining performance), such as Enabling a reduction in supply voltage educing the effective switching capacitance for a given function (physical capacitance, activity) educing the switching rates educing leakage Optimizations at higher abstraction levels tend to have greater potential impact While circuit techniques may yield improvements in the 1 5% range, architecture and algorithm optimizations have reported orders of magnitude power reduction EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 43 43 Some Energy Inspired Design Guidelines For maximum performance Maximize use of concurrency at the cost of area For given performance Optimal amount of concurrency for minimum energy For given energy Least amount of concurrency that meets performance goals For minimum energy Solution with minimum overhead (direct mapping between function and architecture) EEM216A.:. Fall 21 Lecture 8: Energy Delay D. Optimization Markovic / Slide 44 44