Low Power Design Methodologies and Techniques: An Overview

Similar documents
EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

Chapter 8. Low-Power VLSI Design Methodology

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

School of EECS Seoul National University

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

VLSI Design Verification and Test Simulation CMPE 646. Specification. Design(netlist) True-value Simulator

CSE493/593. Designing for Low Power

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Chapter 5 CMOS Logic Gate Design

Lecture 8-1. Low Power Design

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

MODULE III PHYSICAL DESIGN ISSUES

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Area-Time Optimal Adder with Relative Placement Generator

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

EE141Microelettronica. CMOS Logic

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

L15: Custom and ASIC VLSI Integration

Power Dissipation. Where Does Power Go in CMOS?

Power Consumption in CMOS CONCORDIA VLSI DESIGN LAB

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders

Objective and Outline. Acknowledgement. Objective: Power Components. Outline: 1) Acknowledgements. Section 4: Power Components

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.

VLSI Design I; A. Milenkovic 1

Very Large Scale Integration (VLSI)

Dynamic operation 20

Issues on Timing and Clocking

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Digital Integrated Circuits A Design Perspective

Chapter Overview. Memory Classification. Memory Architectures. The Memory Core. Periphery. Reliability. Memory

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

Semiconductor memories

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Lecture 2: CMOS technology. Energy-aware computing

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

SEMICONDUCTOR MEMORIES

DKDT: A Performance Aware Dual Dielectric Assignment for Tunneling Current Reduction

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

9/18/2008 GMU, ECE 680 Physical VLSI Design

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

Lecture 8: Combinational Circuit Design

CMOS Logic Gates. University of Connecticut 181

ECE321 Electronics I

VLSI Signal Processing

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

Datapath Component Tradeoffs

Introduction to CMOS VLSI Design (E158) Lecture 20: Low Power Design

EE 447 VLSI Design. Lecture 5: Logical Effort

Simulation of Logic Primitives and Dynamic D-latch with Verilog-XL

Dynamic Power Management under Uncertain Information. University of Southern California Los Angeles CA

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

Reducing power in using different technologies using FSM architecture

THE INVERTER. Inverter


Vidyalankar S.E. Sem. III [CMPN] Digital Logic Design and Analysis Prelim Question Paper Solution

Where Does Power Go in CMOS?

Microprocessor Power Analysis by Labeled Simulation

Technology Mapping for Reliability Enhancement in Logic Synthesis

MODULE 5 Chapter 7. Clocked Storage Elements

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

On Potential Design Impacts of Electromigration Awareness

A novel Capacitor Array based Digital to Analog Converter

Digital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories

Lecture 6: Logical Effort

Announcements. EE141- Fall 2002 Lecture 7. MOS Capacitances Inverter Delay Power

Errata of K Introduction to VLSI Systems: A Logic, Circuit, and System Perspective

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Parallel Processing and Circuit Design with Nano-Electro-Mechanical Relays

University of Toronto. Final Exam

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Digital Integrated Circuits 2nd Inverter

WARM SRAM: A Novel Scheme to Reduce Static Leakage Energy in SRAM Arrays

Energy Delay Optimization

Design of System Elements. Basics of VLSI

T st Cost Reduction LG Electronics Lee Y, ong LG Electronics 2009

CMOS Logic Gates. University of Connecticut 172

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

Modeling and Optimization for Soft-Error Reliability of Sequential Circuits

Evaluating Power. Introduction. Power Evaluation. for Altera Devices. Estimating Power Consumption

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

5.0 CMOS Inverter. W.Kucewicz VLSICirciuit Design 1

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

! Memory. " RAM Memory. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 7, JULY

Efficient Circuit Analysis under Multiple Input Switching (MIS) Anupama R. Subramaniam

Transcription:

Design Design Methodologies and Techniques: An Overview Department of EE-Systems University of Southern California Los Angeles CA 90064 pedram@ceng.usc.edu Outline Introduction Analysis/Estimation Techniques Synthesis/Optimization Techniques Summary Page /LP

Design What Is the Power Problem? Power: Biggest challenge facing the industry Ö Power goes up 2X per generation for high end CPU /analysis tools are ecellent at enabling tradeoffs Ö But what if performance cannot be traded-off? Ö Need innovative solutions to reduce Power 40 A264-300 HP PA8000 35 30 25 A2064A UltraSparc-67 PPro-50 HP PA7200 PPro200 MIPS R0000 Power(W) 20 5 0 5 SuperSparc2-90 PPC 604-20 PP-66 MIPS R5000 MIPS R4400 PP66 PPC 60-80 PP-00 PP-33 486-66 i486c-33 DX4 00 PPC603e-00 0 i386 i386c-33 '9 '93 '94 '95 '94 '96 [Source: Microprocessor Report] Process Technology Trend 997 999 200 2003 2006 2009 202 Min. Feature Size (μm) 0.25 0.8 0.5 0.3 0. 0.07 0.05 Threshold Voltage (V) 0.5 0.45 0.4 0.35 0.3 0.25 0.2 T o (nm) 5 4 3 3 2.5 C interconnect (af/μm) 50 36 29 2 20 6 4 Norminal I on (μa/μm)(nmos/pmos) 600/280 600/280 600/280 600/280 600/280 600/280 600/280 Ma I off (μa/μm) (For minimum L device) 3 3 3 0 0 Supply Voltage (V) 2.5.8.8.5.2 0.9 0.6 On-chip across-chip clock freq. (MHz) 350 526 727 928 08 468 827 Off-chip peripheral bus freq. (MHz) 75/75 00.263 00/362 25/464 25/554 50/734 50/93 Usable transistors (millions / cm 2 ) 8 4 6 24 40 64 00 Chip size (cm 2 ) 3 3.4 3.85 4.3 5.2 6.2 7.5 Chip pad count 256-800 300-976 352-93 43-458 524-968 666-2656 846-3578 Cost-Perf. Designs - notebooks, desktop personal computers, telecom Data etracted from NTRS 97 /LP Page 2

Design Design Technology Trend Power (watt) 000 Vcc (volt) 0 Current (Amp) 000 00 00 0. 0 30 00 70 Process Minimum Geometry (nm) Based on Information provided by Shakhar Borkar, Intel Data etracted from EDAR 99 Eample Applications Portable Electronics (PC, PDA, Wireless) IC Cost (Packaging and Cooling) Reliability (Electromigration, Latch-up) Signal Integrity (Switching Noise, DC Voltage Drop) Thermal Design Data Path Clock Where Does the Power Go? Varies from one design to net Memory Control, IO Data Path Clock Memory Control, IO /LP Page 3

Design What Has Worked? Voltage and process scaling (3/Generation) Design methodologies Ö Power management through HW/SW, trade area for lower power Architecture Design Power down techniques ÖClock gating, dynamic power management Dynamic voltage scaling based on workload Power conscious RT/ logic synthesis Better cell library design and resizing methods Ö Cap. reduction, threshold control, transistor layout Opportunities for Power Savings System Behavioral RT-Level Logic > 70 % 40-70 % 25-40% 5-25 % HW/SW Co-design Custom ISA Algorithm Design Communication Synthesis Scheduling, Binding Pipelining Behavioral Transformations Clock Gating, Precomputation Operand Isolation State Assignment, Retiming Logic Restructuring Technology Mapping, Rewiring Pin Ordering & Phase Assignment Physical 0-5 % Savings Fanout Optimization, Buffering Transistor Sizing, Placement Partitioning, Clock Tree Design Glitch Elimination Page 4 /LP

Design System Realistic Estimation Epectations Seconds > 50 % Instruction-Level Models IP Core Models Stochastic Models Program Simulation Architectural Minute 25-50 % Entropic Bounds Architectural Simulation I/O and Memory Accesses RT-Level Minutes 5-30% RT-Level Macromodels HDL Simulation Quick Synthesis Gate Hour 0-20 % Probabilistic Simulation Gate-Level Simulation Sampling and Compaction ASIC Library Models Transistor Hours Speed 5-0 % Error Parasitic Etraction Accurate Timing Analysis Circuit-Level Simulation Sources of Power Consumption in CMOS Power Consumption Active Power Standby Power Short-Circuit Capacitive Subthreshold P-N Junction Interconnect Gate Internal Diffusion /LP Page 5

Design Power Dissipation Equations CMOS circuits only dissipate power when node voltages are changing V DD C short circuit charge 2 DD P = 0.5CV fn + Q V fn + I SC DD leak f: frequency of clocking N: number of times gate switches in a clock cycle V DD leakage current Power Estimation Techniques Static (non-simulative) - useful for synthesis and architectural eploration Ö Probability-based Ö Entropy-based Dynamic (simulative) - useful for final power evaluation and validation Ö Direct (flat and hierarchical) Ö Sampling-based Ö Compaction-based Hybrid (high-level simulation + low-level analytical model evaluation) Ö Power macromodels for datapath, control, memory Ö Instruction-level models for microprocessors, DSPs /LP Page 6

Design Probabilistic Power Estimation Capture primary input and register output statistics Ö E.g., signal probability, switching activity, spatiotemporal correlation Obtain load capacitances for all nodes Ö Estimate at the RT and logic level Ö Etract after layout Propagate input statistics throughout the circuit Ö Account for reconvergent fanout, Ö Use appropriate delay model Ö Iterate in the case of finite state machines Calculate node power and hence total power Zero Gate Delays 0 g 0 0 g 2 0 f 0 g 0 g 2 0 f g g 2 0 0 f 0 Page 7 /LP

Design General Gate Delays A gate may switch many times for a vector pair 2 Static Probabilities Define p g as static probability of g = A B C I J K For AND gate: p I = p A p B For OR gate: p J = - ( - p B ) ( - p C ) Above equations assume that A, B and C are uncorrelated Page 8 /LP

Design Computing Static Probabilities f = a b + b c Write f as a disjoint sum-of-products epression where each product-term has a null intersection with any other Then, f = a b + a b c p f = p a p b + ( - p a ) p b p c Spatial Correlations A B C I J K p K p I p J since I and J are correlated - both depend on B Need to compute static probabilities taking into account the spatial correlations Page 9 /LP

Design Temporal Correlations 0000000000 0000000000 I J K p K = p I p J For temporally uncorrelated data, sw K = 2 p I p J (- p I p J ) For temporally correlated data, this is not true E.g., every on input I is immediately followed by a 0 Need to compute switching probabilities taking into account the temporal correlations Sequential Circuits Primary Inputs Present State Lines PI PS Combinational Logic FF PO NS Primary Outputs Net State Lines Two issues for sequential circuits: Probability of machine being in a particular state Correlation in time: Given a primary input and present state, net state is uniquely determined by the net state combinational logic Solutions: Solve C-K Equations for calculating steady state probabilities Unroll the state machine Page 0 /LP

Design Reducing the Simulation Time RT-Level or Gate-Level Netlist Probabilistic Compaction Compacted Sequence Input Sequence Statistical Sampling Sampled Sequence Event or Cycle-Based Simulation Report Power Library Data Simple Random Sampling (SRS) Input vectors Generate one sample of k units NO Efficiency: depends on population characteristics and sampling procedure Do circuit level simulation Calculate mean power & confidence interval Converged? YES Difficult distributions Report Power /LP Page

Design SRS Results C7552 C6288 C535 C3540 C2670 C908 C355 C880 C432 Biased Sequence Random Sequence 0 2000 4000 6000 8000 Stratified Random Sampling (StRS) Estimate the transistor-level power, assuming that the zero-delay power estimate is readily available Input vectors (population) Zero-Delay Simulation of the Entire population Population Stratification Lower sample variance leads to faster convergence Random Sampling and PowerMill Simulation Report Power Convergent? YES NO /LP Page 2

Design Sampling on FSMs Method : Functional simulation + sampling on comb. circuit Input Seq. Logic Input Seq. + State Lines Logic FFs Method 2: Do machine warm-up for every sampling unit Warm-up vectors + sampling unit Logic FFs Warm-up sequence length can be quite large Probabilistic Compaction (PC) Sequence compaction : Generate a new, but shorter, input sequence which ehibits nearly the same spatiotemporal correlations as the initial sequence Probabilistic Model in out Initial Seq. Sequence Generation New Seq. Target Circuit /LP Page 3

Design PC by Eample Initial sequence 0000000 000000000 00000000 Avg. bit activity 23/6 Compacted sequence 0000 0000 0000 Avg. bit activity 3 /2 /2 8/5 Stochastic State Machine Model /4 6 /3 /4 7 /2 0 /2 /2 /2 5 /2 2/3 /2 4 /2 Comparison with SRS L = 4,000 Compaction Ratio 0 circuit C6288 C3540 C908 C355 C880 C432 0% 2% 4% 6% 8% SRS Hierarchical /LP Page 4

Design Compaction for FSMs Input Sequence Target Circuit n z n = out ( n, s n ) Markov Chain Logic p( n s n ) s n+ = net ( n, s n ) s n FFs Dynamic Markov Trees S = (0000, 000, 00, 00, 00, 00, 00, 00) DMT 0 2 6 2 2 3 2 3 2 3 3 3 3 3 4 4 3 3 4 DMT 2 6 2 2 3 2 3 2 3 3 3 3 3 4 4 3 3 4 5 5 5 5 2 6 6 3 6 6 7 7 3 2 3 7 7 8 8 8 8 2 3 2 Upper subtree Lower subtrees /LP Page 5

Design Higher Order DMTs A lag-k Markov chain which correctly models the input sequence, also models the joint k-step conditional probabilities of the primary inputs and state lines p(v i ) v i DMT 0 DMT p(v j v i ) v j DMT 2 p(v k v j v i ) v k High Order DMT Results s9234 s820 s5378 s423 s96 shiftreg planet mc dk7 bbara Compaction Ratio 0 Order Order 2 0% 0% 20% 30% 40% 50% 60% /LP Page 6

Design Power Characterization Transistor-Level Netlist Gate-Level Netlist SPICE Simulation Event-Driven Simulation SPICE Parameters RT-Level Power/Delay Characterization Data Gate-Level Power/Delay Characterization Data Power Macromodeling Training Set Module Macromodel Equation Form Model Variables Low - Level Simulation Regression Analysis Analytic Model Reduction Macro Models Regression Coefficients /LP Page 7

Design Dual Bit Type Model Consider a data path block: Sign bit Pwr = C 0 + C S + C 2 S 2 + C 3 S 3 + C 4 S 4 Module MSB region LSB region S S 2 : avg. switching activity of LSB (MSB) region of operand S 3 S 4 : avg. switching activity of LSB (MSB) region of operand 2 Bitwise Data Model Consider a random logic block: Pwr = C0 + inputs C i S i S i : avg. switching activity of input signal i Module More parameters lead to a higher degree of accuracy, but increase the computational overhead /LP Page 8

Design Cycle-Accurate Macro-Models Average Power Peak Power Power Distribution Cycle- Accurate Macro- Models Signal Integrity Switching Noise DC Voltage Drop Macro-Model Construction Starting point: an eact power consumption function Pwr = f ( V, V 2 ) = Order terms Equation simplification + Order 2 + + terms Order n terms module module module Eact function After order reduction After input grouping /LP Page 9

Design Variable reduction: find the most significant variables, by using a statistical test Regression Analysis module After variable reduction Population stratification: makes the macro-model accuracy less sensitive to different input conditions Training set design: reduces the macro-model bias A general linear regression model: Pwr = C0 + k i = C i X i Integrated Macro-Modeling Flow Powermill Netlist Powermill Config. Technology File Input Spatial Correlation Analysis for Primary Inputs Training Set Generation and Simulation Circuit Simulator (Powermill) Program Flow Variable Reduction and Curve Fitting Output Library Generation Macro-model Library /LP Page 20

Design Macro-Modeling Results 4 3 EAP (%) 2 0 C355 C908 C2670 C3540 C432 C535 C6288 C7552 Average EAP = 2.03% C880 Mul6 ADD6 20 5 0 5 0 C355 C908 C2670 C3540 C432 C535 C6288 C7552 Average ECP = 0.04% C880 Mul6 ADD6 ECP (%) Adaptive Macro-Modeling Static macro-model Training set Adaptive macro-model Training set Macro-model generation Macro-model Cycle-based simulation Power estimates Low-level simulation Random sampling Macro-model generation Macro-model Cycle-based simulation Power estimates /LP Page 2

Design Standard Cell Characterization SPICE Netlist Configuration File Command Scripts Technology Data Input Function Validation Characterization File Generation and Simulation Circuit Simulator (HSPICE) Program Flow Generalized Data Base (Timing, Power, Capacitance, Wire load, etc.) Library Eportation Synopsys Lib. VHDL Simulation Lib. Verilog Simulation Lib. Customized Lib. Output Design Tool Characteristics Accurate model Ö Include both output load (C) and input slope (S) Ö Nonlinear equation f=k 0 +k *C+k 2 *C 2 +k 3 *S+k 4 *C*S Ö Variable domain stratification, different sets of coefficients for different strata Ö Default sensitization method is pin-to-pin, can be etended to pattern-dependent by customization Fast model Ö Small equation form Ö Can be customized to full table-lookup library Easy library maintenance Ö Manage the libraries through Graphic User Interface Ö Generalized data base enables data reuse /LP Page 22

Design A Power Estimation Methodology Database Input Vector Sequences Cell Library Interconnect Estimates Multipleers Interconnect Estimate Power Memory Hard IP Blocks Glue Logic Controller Behavioral/RTL Simulation Design Planning Clock Tree Datapath Soft IP Blocks HDL Description Library Models Macromodels Analytic Models Prob. Analysis Low Level Simulation Sampling/Compaction RTL Synthesis Flow RTL Code RTL Synthesis Event or Cycle-Based Simulation Power Macromodeling Switching Activity Calculator RTL netlist Power Optimization Power Optimized Netlist Data Trace for Ports and Registers /LP Page 23

Design RTL Optimization for YResource Assignment ÖModule and/or register allocation and binding YClock Gating YState Encoding ÖTwo-level and multi-level logic realizations YMultiple Supply Voltage Scheduling ÖPipelined and non-pipelined designs Register Assignment for Avoid value change at the input of functional units during idle cycles a R 0 R / R 3 R 2 + b R a b a Z = + + c = ( + b) + c 2 R 0 R 3 / R 2 c R 3 + R Z /LP Page 24

Design An RTL Structure a b c Cycle R R 2 R 3 / + R 2 / R 3 + R 2 3 4 5 a b b c < a, > a a < R or R3, > < b, > a a + b < + b, > a b a b 2 + < R or R3, > < +, c > 2 OUT Can insert an isolation latch here Z a A RTL Structure Cycle R 0 R R 2 R 3 / + R 0 / R 3 R 2 b c R 2 3 4 5 a a a + b a + b b b c c a a + b a b 2 + < a, > < a, > a < + b, > a < + b, > a < b, > a b < +, c > 2 + OUT Z /LP Page 25

Design Module Assignment for O O O 2 3 Do module binding to minimize the input toggling of modules = a + b y = c + d y = e + f y a a b y e + + f y c d + y b + y c + d e y f + y O 3 O O 2 O O 2 O 3 Gated Clocks Disable clocking of input registers if the current instruction does not access it Clock Instruction Register Decode Logic Clock Registers A B ALU /LP Page 26

Design Logic Synthesis Flow Gate Netlist Cell Library Logic Synthesis Power Information from Cell Library Mapped Netlist Power Optimization Event or Cycle-Based Simulation Switching Activity Information Power Optimal Netlist Logic Optimization for YCombinational logic optimization ÖTechnology Mapping ÖGate Resizing ÖPin Assignment YSequential logic optimization ÖPrecomputation ÖState Re-encoding ÖPower management for control units Page 27 /LP

Design Technology Mapping Outputs of nodes of combinational circuit mapped to library gates under the cost function of load capacitance multiplied by the switching activity To minimize power dissipation, high switching activity points are either hidden within gates or driven by smaller gates Minimum power realization under zero delay model can be obtained using dynamic programming Circuit and Gate Library 0.09 0.79 0.09 Gate Area Intrinsic Input Load Capacitances Capacitances INV 928 0.029 0.054 NAND2 392 0.42 0.0747 AOI22 2320 0.340 0.033 /LP Page 28

Design Minimum Area Mapping AOI22 0.79 Area = 2320 + 928 = 3248 Power = 0.79 (0.340 + 0.054) + 0.79 (0.029) = 0.0887 Note: Ignore capacitances at circuit inputs Minimum Power Mapping 0.09 0.09 WIRE 0.79 Area = 392 * 3 = 476 Power = 0.09 (0.42 + 0.0747) * 2 + 0.79 (0.42) = 0.0726 /LP Page 29

Design Sequential Precomputation Selectively precompute the outputs of the logic circuit one clock cycle before they are required and use the precomputed values to reduce switching activity in the net clock cycle Original Circuit X X 2 X n R A R 2 f Predictor functions: g = g 2 = f = f = 0 /LP Page 30

Design Subset Input Disabling Architecture X X 2 X n R A R 3 R 2 LE f g g 2 g = g 2 = f = f = 0 A Comparator Eample C<n-> D<n-> C<n-2> D<n-> C<0> D<0> R C > D R 3 R 2 LE f g = C<n-> D<n-> g 2 = C<n-> D<n-> /LP Page 3

Design Layout Optimization for and Noise Floorplanning and/or Placement Gate Resizing Gated Clock Tree Design Driver Design Chip-Package Interface Design Power Bus Planning and Sizing Gated Clock Tree Design Controller - sinks - Steiner points clock tree edges source gate control signals Ö Clock tree : binary tree Ö Controller tree : star tree /LP Page 32

Design Clock Tree Layout source Steiner nodes Sinks Merging sectors Ö Use the Deferred Merge Embedding algorithm of to layout the clock tree Ö The bottom-up merging process is guided by the minimum switched capacitance heuristic Ö Top-down phase determines the eact locations of the Steiner nodes in their respective merging sectors Buffered clock tree versus Gated clock tree 800 600 400 200 0 Switched Capacitance r r2 r3 r4 r5 Buffered Gated Gate Red. 70 60 50 40 30 20 0 0 Area r r2 r3 r4 r5 Ö The average module activity was set to 40% /LP Page 33

Design Conclusions Analysis tools : Ö Simulation based techniques provide the best accuracy Ö Transistor level power estimation is mature; a satisfactory gate and RT level simulation engine is needed Ö Probability-based techniques are good for relative accuracy analysis and to guide the synthesis engine Optimization Tools: Ö Gate level optimization / re-mapping provides 20-40% power reduction Ö Behavioral and RTL power synthesis provides 40-60% power reduction Ö System-level optimization provides 2-5 X power reduction Power analysis, tradeoffs and optimizations for design levels higher than RTL are not mature /LP Page 34