EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis

Similar documents
EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

CSE493/593. Designing for Low Power

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Lecture 8-1. Low Power Design

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Chapter 8. Low-Power VLSI Design Methodology

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

EE 447 VLSI Design. Lecture 5: Logical Effort

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview

Integrated Circuits & Systems

ECE321 Electronics I

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Low Power Design Methodologies and Techniques: An Overview

Lecture 6: Logical Effort

Objective and Outline. Acknowledgement. Objective: Power Components. Outline: 1) Acknowledgements. Section 4: Power Components

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

ECE429 Introduction to VLSI Design

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Novel Devices and Circuits for Computing

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

VLSI Design I; A. Milenkovic 1

EE141-Fall 2011 Digital Integrated Circuits

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

Area-Time Optimal Adder with Relative Placement Generator

Very Large Scale Integration (VLSI)

Fault Modeling. 李昆忠 Kuen-Jong Lee. Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan. VLSI Testing Class

Slide Set 6. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design

Lecture 7: Logic design. Combinational logic circuits

! Memory. " RAM Memory. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

EE141- Fall 2002 Lecture 27. Memory EE141. Announcements. We finished all the labs No homework this week Projects are due next Tuesday 9am EE141

Section 3: Combinational Logic Design. Department of Electrical Engineering, University of Waterloo. Combinational Logic

COMP 103. Lecture 16. Dynamic Logic

Chapter 5 CMOS Logic Gate Design

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Lecture 8: Combinational Circuit Design

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

Chapter 2 Fault Modeling

EE141Microelettronica. CMOS Logic

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

Digital Integrated Circuits A Design Perspective

Digital Logic. CS211 Computer Architecture. l Topics. l Transistors (Design & Types) l Logic Gates. l Combinational Circuits.

ESE570 Spring University of Pennsylvania Department of Electrical and System Engineering Digital Integrated Cicruits AND VLSI Fundamentals

EEC 118 Lecture #16: Manufacturability. Rajeevan Amirtharajah University of California, Davis

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

Hw 6 and 7 Graded and available Project Phase 2 Graded Project Phase 3 Launch Today

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

VLSI. Faculty. Srikanth

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Floating Point Representation and Digital Logic. Lecture 11 CS301

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Computer Science. 19. Combinational Circuits. Computer Science COMPUTER SCIENCE. Section 6.1.

VLSI Design, Fall Logical Effort. Jacob Abraham

Lecture 4. Adders. Computer Systems Laboratory Stanford University

CSE370: Introduction to Digital Design

ESE570 Spring University of Pennsylvania Department of Electrical and System Engineering Digital Integrated Cicruits AND VLSI Fundamentals

VLSI Design Verification and Test Simulation CMPE 646. Specification. Design(netlist) True-value Simulator

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

CS470: Computer Architecture. AMD Quad Core

Lecture 2: Metrics to Evaluate Systems

Chapter 2 Boolean Algebra and Logic Gates

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

Hardware Design I Chap. 4 Representative combinational logic

ALU A functional unit

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

MODULE III PHYSICAL DESIGN ISSUES

Lecture 22 Chapters 3 Logic Circuits Part 1

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

CSE140: Components and Design Techniques for Digital Systems. Logic minimization algorithm summary. Instructor: Mohsen Imani UC San Diego

A Novel LUT Using Quaternary Logic

Dynamic Combinational Circuits. Dynamic Logic

Intro To Digital Logic

VLSI Design I; A. Milenkovic 1

Vidyalankar S.E. Sem. III [CMPN] Digital Logic Design and Analysis Prelim Question Paper Solution

Lecture 7 Circuit Delay, Area and Power

Miscellaneous Lecture topics. Mary Jane Irwin [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.]

EECS 151/251A Homework 5

Design of System Elements. Basics of VLSI

S No. Questions Bloom s Taxonomy Level UNIT-I

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Lecture 5: DC & Transient Response

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

Transcription:

EEC 216 Lecture #2: Metrics and Logic Level Power Estimation Rajeevan Amirtharajah University of California, Davis

Announcements PS1 available online tonight R. Amirtharajah, EEC216 Winter 2008 2

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 3

Power Consumption at the System Level 500 MHz Pentium III with 17 in. Monitor: 150-200 W Server: 300 W Mainframe: 10-20 kw Author s home office: 2 CPUs, 2 monitors, laptop, 3 printers, scanner, plus misc. peripherals Nameplate rating: 2.4 kw Max power (incl. peripherals): 700 W Typical usage: 150-170 W Average over 10 days: 77 W (9% of total consumption) From B. Hayes, The Computer and the Dynamo, American Scientist, Sep-Oct 2001 R. Amirtharajah, EEC216 Winter 2008 4

Power Consumption at the National Level All office equipment consumes 74 TW-hours / year (about 2% US total) Adding telecoms increases total to 3.2 % Power down and sleep modes could save 23-40 TWh Technology has dramatically improved power cost of computing ENIAC (1940s): 18,000 vacuum tubes, 174 kw, roughly 10 W / tube Today s microprocessors: 100 M transistors, 100 W, roughly 1 μw / transistor (would draw 10 GW if no change from 1940s) From B. Hayes, The Computer and the Dynamo, American Scientist, Sep-Oct 2001 R. Amirtharajah, EEC216 Winter 2008 5

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 6

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 7

Permissions to Use Conditions & Acknowledgment Permission is granted to copy and distribute this slide set for educational purposes only, provided that the complete bibliographic citation and following credit line is included: "Copyright 2002 J. Rabaey et al." Permission is granted to alter and distribute this material provided that the following credit line is included: "Adapted from (complete bibliographic citation). Copyright 2002 J. Rabaey et al." This material may not be copied or distributed for commercial purposes without express written permission of the copyright holders. Slides 9-20, 22-5 Adapted from CSE477 VLSI Digital Circuits Lecture Slides by Vijay Narayanan and Mary Jane Irwin, Penn State University R. Amirtharajah, EEC216 Winter 2008 8

NOR Gate Transition Probabilities Switching activity is a strong function of the input signal statistics P A, P B are the probabilities that inputs A, B are 1 A B A B C L 0 P A 1 0 1 P B P 0 1 = P 0 x P 1 = (1-(1-P A )(1-P B )) (1-P A )(1-P B ) R. Amirtharajah, EEC216 Winter 2008 9

Transition Probabilities for Basic Gates NOR OR NAND AND XOR P 0 1 = P x out=0 P out=1 (1 - (1 - P A )(1 - P B )) x (1 - P A )(1 - P B ) (1 - P A )(1 - P B ) x (1 - (1 - P A )(1 - P B )) P A P B x (1 - P A P B ) (1 - P A P B ) x P A P B (1 - (P A + P B -2P A P B )) x (P A + P B -2P A P B ) 0.5 0.5 A B For X: P 0 1 = For Z: P 0 1 = X Z R. Amirtharajah, EEC216 Winter 2008 10

Transition Probabilities for Basic Gates NOR OR NAND AND XOR P 0 1 = P x out=0 P out=1 (1 - (1 - P A )(1 - P B )) x (1 - P A )(1 - P B ) (1 - P A )(1 - P B ) x (1 - (1 - P A )(1 - P B )) P A P B x (1 - P A P B ) (1 - P A P B ) x P A P B (1 - (P A + P B -2P A P B )) x (P A + P B -2P A P B ) 0.5 0.5 A B For X: P 0 1 = P x 0 P 1 = (1-P A ) P A = 0.25 For Z: P 0 1 = P 0 x P 1 = (1-P X P B ) P X P B X Z = 3/16 R. Amirtharajah, EEC216 Winter 2008 11

Problems With Input-to-Output Evaluation Signal and transition probabilities are progressively evaluated from input to output node Straightforward approach determined by circuit topology Does not deal with feedback (for example in sequential circuits) Assumes input signal probabilities are independent 0.5 A For X: P 0 1 = For Z: P 0 1 = X Z R. Amirtharajah, EEC216 Winter 2008 12

Problems With Input-to-Output Evaluation Signal and transition probabilities are progressively evaluated from input to output node Straightforward approach determined by circuit topology Does not deal with feedback (for example in sequential circuits) Assumes input signal probabilities are independent 0.5 A X Z For X: P 0 1 = P 0 x P 1 = (1-P A ) P A = 0.25 For Z: P 0 1 = P 0 x P 1 = (1-P X P A ) P X P A = 3/16? R. Amirtharajah, EEC216 Winter 2008 13

Problems With Input-to-Output Evaluation Signal and transition probabilities are progressively evaluated from input to output node Straightforward approach determined by circuit topology Does not deal with feedback (for example in sequential circuits) Assumes input signal probabilities are independent 0.5 A X Z For X: P 0 1 = P 0 x P 1 = (1-P A ) P A = 0.25 For Z: P 0 1 = P 0 x P 1 = (1-P X P A ) P X P A = 0! Z = A A = 0 R. Amirtharajah, EEC216 Winter 2008 14

Intersignal Correlations Determining switching activity is complicated by the fact that signals exhibit correlation in space and time, e.g. by circuits with reconvergent fanout 0.5 0.5 A B X Z Reconvergent fanout: R. Amirtharajah, EEC216 Winter 2008 15

Intersignal Correlations Determining switching activity is complicated by the fact that signals exhibit correlation in space and time, e.g. by circuits with reconvergent fanout 0.5 0.5 A B P(X=1) = (1-0.5)(1-0.5)x(1-(1-0.5)(1-0.5)) = 3/16 X Z Reconvergent fanout: P(Z=1) = (1-3/16 x 0.5) x (3/16 x 0.5) = 0.085? R. Amirtharajah, EEC216 Winter 2008 16

Intersignal Correlations Determining switching activity is complicated by the fact that signals exhibit correlation in space and time, e.g. by circuits with reconvergent fanout 0.5 0.5 A B P(X=1 B=1) = 1 X Z Reconvergent fanout: P(Z=1) = P(X=1 B=1) x P(B=1) = 0.5! P(Z=1) = P(B=1) x P(X=1 B=1) Have to use conditional probabilities! R. Amirtharajah, EEC216 Winter 2008 17

Logic Restructuring Logic restructuring: changing the topology of a logic network to reduce transitions AND: P 0 1 = P 0 x P 1 = (1 - P A P B ) x P A P B 0.5 0.5 (1-0.25)*0.25 = 3/16 A A W 7/64 0.5B B X 15/256 0.5 0.5 C C 0.5 D F D 0.5 0.5 3/16 Y Z 3/16 15/256 F Chain implementation has a lower overall switching activity than the tree implementation for random inputs Ignores glitching effects R. Amirtharajah, EEC216 Winter 2008 18

Input Ordering Two alternate implementations with inputs reordered, same function so output stats identical 0.5 A B 0.2 C 0.1 X F 0.2 B C 0.1 A 0.5 X F Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5) R. Amirtharajah, EEC216 Winter 2008 19

Input Ordering Two alternate implementations with inputs reordered, same function so output stats identical 0.5 A B 0.2 (1-0.5x0.2)x(0.5x0.2)=0.09 X C F 0.1 (1-0.2x0.1)x(0.2x0.1)=0.0196 0.2 B X C 0.1 A F 0.5 Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5) R. Amirtharajah, EEC216 Winter 2008 20

Time Multiplexing Resources Time multiplexing can conserve area at the expense of increased switched capacitance Consider the case of parallel vs. serial data transmission: more transitions due to muxing A p A 1 P0 1 0 C L A P0 1 0.5 A B p B 0 P0 1 0 C L B C L B R. Amirtharajah, EEC216 Winter 2008 21

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 22

Glitching in Static CMOS Networks Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards) Glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value A B C X Z ABC 101 000 X Z Unit Delay R. Amirtharajah, EEC216 Winter 2008 23

Glitching in Static CMOS Networks Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards) Glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value A B C X Z ABC 101 000 X Z Unit Delay R. Amirtharajah, EEC216 Winter 2008 24

Glitching In A Ripple Carry Adder Cin S15 3 S14 S2 S1 S0 S Output Voltage (V) 2 1 0 S3 S4 S15 Cin S2 S5 S10 S1 S0 0 2 4 6 8 10 12 Time (ps) R. Amirtharajah, EEC216 Winter 2008 25

Balanced Delay Paths to Reduce Glitching Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs 0 0 0 F 1 1 F2 2 0 0 F 1 1 0 F 3 0 0 F 2 1 F 3 So equalize the lengths of timing paths through logic Example: F 1 and F 2 have unit delay R. Amirtharajah, EEC216 Winter 2008 26

Switching Activity in Sequential Circuits Estimating activity more difficult for sequential than combinational circuits Due to activity of state nodes typically being unknown at beginning of estimation process Compute activity assuming steady state Assume sequential circuits implement FSM where states follow Markov process Solve for probabilities of states using state transition probabilities Best implemented in CAD tools R. Amirtharajah, EEC216 Winter 2008 27

Some Notes on Tools Switch-level simulators (e.g. IRSIM) Replace MOSFETs with ideal switch and equivalent resistance in series for driver Uses capacitors as loads Very fast, captures switching activity, not very accurate Lookup Table simulators (e.g. PowerMill, Nanosim) Create lookup table for device I-V curve from Spice, rather than implement I DS equations Fast, more accurate than switches, sometimes fails in odd operating regimes like subthreshold Spice-like analog circuit simulations Most accurate accuracy, most time consuming R. Amirtharajah, EEC216 Winter 2008 28

Tool Caveats Beware of Garbage In-Garbage Out problems Rely heavily on process modeling for device I-V curves Generating I-V curve lookup tables can introduce errors Inconsistent simulations: e.g., circuit works in Spice but not in LUT-based tool Sanity check simulation outputs Make sure peak currents consistent with total device width in design by estimating on resistance of every pullup/pulldown network in parallel Convert power number to equivalent switched capacitance and compare to hand estimate of total capacitance in design R. Amirtharajah, EEC216 Winter 2008 29

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 30

Alternative Power Estimation Flows Large # Input Patterns Circuit Simulator Large # Current Waveforms Average Average Probabilities & Activities Analysis Tools Power Top approach very compute-intensive Difficult to determine minimum number of simulation runs, long time to simulate large designs Provides greatest accuracy R. Amirtharajah, EEC216 Winter 2008 31

High-Level Power Estimation Estimate power early in design process Feedback on system partitioning, architecture, instruction set choice Rough estimate enables optimization of major power consuming modules Only relative accuracy required Final design stages use device-based power estimation, absolute accuracy required Logic synthesis or custom approach generates design Switch-level simulations Lookup table based simulations SPICE-like analog circuit simulations R. Amirtharajah, EEC216 Winter 2008 32

Top Level Power Estimation System Level Power Estimation Allows designer to analyze power impact of system partitioning among software, FPGAs, ASICs, etc. Spreadsheet analysis using library of models for entire components Models created from measurements, low-level power estimation Instruction Level Power Estimation Run assembly instructions on target processor and measure power Create database of power cost for individual instructions, pairs, maybe entire frequently used traces Add in cache misses, pipeline stalls, etc. R. Amirtharajah, EEC216 Winter 2008 33

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 34

Behavioral Level Power Estimation Start at the behavioral specification or algorithm level Typically assume some architecture for execution Must predict memory configuration, number of accesses, bus architecture, average wire length, number of bus transactions, control path complexity Components of power include capacitance and activity Physical capacitance can be computed using estimated gate count, previous design experiences Two styles of activity prediction Static: based on estimating access frequency of resources Dynamic: based on profiling simulations with user supplied inputs R. Amirtharajah, EEC216 Winter 2008 35

C r P = r Static Activity Prediction Start with behavioral specification of function (HDL, C) Static analysis of control-data flow graph to determine frequency of resource accesses f r Fast: requires only one analysis pass through program Ignores data dependencies, must use approximations or profiling (dynamic activity prediction) results to incorporate these C V 2 f r DD r { all datapath, control, memory, bus resources} determined by empirical models (for example, benchmarking previous designs) Low absolute accuracy but captures general trends Dynamic prediction slow since requires many sims R. Amirtharajah, EEC216 Winter 2008 36

Outline Announcements Aside: Environmental Impact of Electronics Finish Lecture 1 Topics Intersignal Correlations Glitches High Level Power Estimation Behavioral Level Power Estimation Architectural Level Power Estimation R. Amirtharajah, EEC216 Winter 2008 37

Architecture Level Power Estimation Start at register-transfer (RTL) level Primitives are functional units: adders, multipliers, controllers, register files, memory arrays Gate, circuit, layout level details not yet specified Floorplan may not be available, must estimate interconnect and clock distribution Two strategies: analytical and empirical methods Each technique can be subdivided further Goal: Relate power consumption of RTL to physical capacitance and switching activity R. Amirtharajah, EEC216 Winter 2008 38

Analytical Method 1: Complexity-Based Complexity of a chip architecture can be described in terms of gate equivalents Primitives are specified as the number of reference gates required to implement Example: 256 AND gates + 256 full adders = 16 x 16 array multiplier Gate-equivalents specified in database or provided by user Power estimated by multiplying gate equivalent by P = average power per gate GE ( ) i i { functional units} E typ + C i L V 2 DD f A i R. Amirtharajah, EEC216 Winter 2008 39

Gate Equivalent Average Power P = i GEi { functional units} ( ) E + C i V 2 typ L DD f A i Model Parameters: GE i : number of gate equivalents in unit i E typ C i L A i : average energy consumed by gate : average capacitive load incl. fanout & wire : activity (average percentage of gates switching Improve accuracy by including models for different gates, circuit styles, layout techniques, special blocks R. Amirtharajah, EEC216 Winter 2008 40

Example: SRAM Array 0 1 A 0 A 1 Memory Array A p n 2 1 0 1 2 k 1 k = p n D0 D 1 R. Amirtharajah, EEC216 Winter 2008 41 Dm

P Memory Cell Array Power Model 2 k = + 2 memcell w col 2 ( ) c l n C V V f Model Parameters: 2 k : Number of cells in a row c w : bit line wire capacitance per unit length l col : length of column (bit line) C : wordline transistor drain capacitance cd : bit line voltage swing (often very small) V bl Can use similar models to optimize memory partitioning R. Amirtharajah, EEC216 Winter 2008 42 cd DD bl

Tradeoffs of Complexity-Based Method Requires very little information to obtain estimate Energy per gate based on technology parameters Number of gate equivalents from library or experience Activity factor supplied by user, possibly from experience Interconnect length and capacitance modeled by derivative of Rent s Rule Does not model activity very accurately Misses data dependence User must supply good estimate per functional unit to analyze architectural tradeoffs accurately R. Amirtharajah, EEC216 Winter 2008 43

Analytical Method 2: Activity-Based Methods attempt to model activity more accurately than complexity methods Idea is to relate power of functional unit to amount of computation performed Use information theory concept of entropy as metric for computational work Power is proportional to: Capacitance x Activity Area x Entropy Output Entropy of Functional Unit with m outputs: H o = 2 m i= 1 p R. Amirtharajah, EEC216 Winter 2008 44 i log 2 1 p i

Area and Entropy Estimates Functional unit with n inputs, m outputs: Area 2 n n H o, n 2 n H o, n 10 Average entropy of all nodes in functional unit (where H i is input entropy, similarly defined): 2 3 Entropy H + 2H n + m i o ( ) Assumes entropy decreases quadratically with logic depth R. Amirtharajah, EEC216 Winter 2008 45

Activity-Based Power Estimate Methodology 1. Run many RTL simulations to measure input and output entropies 2. Compute area and average node entropy 3. Compute average power No timing information so glitching power totally unaccounted for Implicitly assumes capacitance uniformly distributed over all circuit nodes Methods still topic of research, yet to demonstrate practical value R. Amirtharajah, EEC216 Winter 2008 46

Empirical Method 1: Fixed Activity Relate power consumption of RTL components to measured power of existing implementations (macromodeling) Best suited for library-based design implementation Example: Power Factor Approximation P = i k G i i { functional units} Model Parameters: k i : power factor constant G i : hardware complexity metric : activation frequency f i R. Amirtharajah, EEC216 Winter 2008 47 f i

Power Factor Approximation Example Try to estimate power for an N x N array multiplier Hardware complexity goes as square of input word length N Clock frequency f mult specified by algorithm and performance constraints Power factor k mult determined from past designs reported in literature (15 fw/bit 2 -Hz for 1.2 μm CMOS at 5 V supply) P = 2 k N f mult mult Ex: 32 x 32 multiply at 233 MHz consumes 3.58 mw Does not account for data dependence since power factor is assumed constant R. Amirtharajah, EEC216 Winter 2008 48

Empirical Method 2: Activity Sensitive Attempt to incorporate data activity statistics into power estimates Count number of bit transitions in input vector stream and multiply by experimentally determined constant fudge factor Develop activity model empirically through RTL simulations for typical input streams Construct multiple models depending on function, for example a datapath activity model and a control path activity model Classic example: Dual-Bit Type (DBT) model R. Amirtharajah, EEC216 Winter 2008 49

Dual-Bit Type Model Observation: fixed-point, 2 s-complement data streams often characterized by two distinct activity regions Data bits (LSBs) exhibit activity similar to uniformly distributed white noise Sign bits (MSBs) depend on sign transition probability, which is in turn related to temporal correlation ρ Different empirically determined coefficients characterize switched (effective) capacitance and complexity in data (C U and N U, respectively) and sign (C S and N S ) regions: P = ( N C + N C ) V 2 f U U S S DD R. Amirtharajah, EEC216 Winter 2008 50

DBT Data from Landman TCAD 96 P = 2 k N f mult mult R. Amirtharajah, EEC216 Winter 2008 51

Dual-Bit Type Model Breakpoints Can derive expressions for different bit region breakpoints based on data stream statistics such as mean (μ), variance (σ 2 ), and correlation (ρ): ρ = cov( X t, X ) 1 t σ 2 ( ) μ 3σ BP1 = log + 2 BP0 = log σ + ΔBP0 2 ΔBP0 = log 2 1 ρ 2 + ρ 8 R. Amirtharajah, EEC216 Winter 2008 52

Control Path Activity Model Observation: control path words often lack definite structure (unlike datapath) Words formed by concatenating independent fields or boolean flags, condition codes Use transition probabilities and activity factors for complex gates like we analyzed earlier Combine with complexity measurements as well Different empirically determined coefficients characterize switched (effective) capacitance (C I for inputs, C O for outputs) and complexity: P = ( N α C N + N α C N ) V 2 f I I I M N I,N O,N M are complexity of input, output, and state transition logic R. Amirtharajah, EEC216 Winter 2008 53 O O O M DD

Summary: Architectural Power Estimation Bridge gap between physical power estimation and conceptual level power estimation Start at register-transfer (RTL) level Primitives are functional units but gate, circuit, layout, floorplan details not yet specified Analytical methods aim to estimate power from physical primitives Estimate capacitance by gate count, area Determine activity from user input, entropy Empirical methods based on measurements of previous designs or ensemble of simulations Model datapath and control activity separately R. Amirtharajah, EEC216 Winter 2008 54

Next Topic: Interconnect Power Look at estimating interconnect power and driving long wires R. Amirtharajah, EEC216 Winter 2008 55