Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006


Managing Physical Design Issues in ASIC Toolflows
Topics: Logical Effort; Physical Design Issues (Clock Distribution, Power Distribution, Wire Delay, Power Consumption, Capacitive Coupling)
For each physical design issue we ask three questions:
1. What is the issue?
2. How do custom designers address the issue?
3. How can we approximate these approaches in an ASIC toolflow?
6.375 Spring 2006, L06: Managing Physical Design Issues in ASIC Toolflows

Which gate topology and transistor sizing is optimal?
Ideally, given a gate topology, we would like to answer two questions in a lightweight and technology-independent way:
1. What is the optimal transistor sizing?
2. What is the optimal number of stages?

Review of the simple RC model for the CMOS inverter
[Figure: inverter schematic (V_in, V_out) next to its switch-level RC equivalent, with effective resistance R_eff, gate capacitance C_g, and drain capacitance C_d.]
$R_{eff} = R_{eff,N} = R_{eff,P}$
$C_g = C_{g,N} + C_{g,P}$
$C_d = C_{d,N} + C_{d,P}$

A gate template is a gate with the same drive current as a minimum-sized inverter
$R_{eff} = R_{inv}$
[Figure: template gate modeled by R_eff and its gate and drain capacitances.]

We begin by deriving an equation for unitless delay in terms of a template

Determine the RC values for an actual gate relative to the template, where α is the factor by which the actual gate's transistor widths are scaled relative to the template:
$C_{in} = \alpha C_{in,T}$, $C_p = \alpha C_{p,T}$, $R_{eff} = R_{eff,T} / \alpha$

Derive the absolute delay in terms of the template:
$d_{abs} = K R_{eff} (C_{out} + C_p) = K \frac{R_{eff,T}}{\alpha} (C_{out} + \alpha C_{p,T}) = K R_{eff,T} C_{in,T} \frac{C_{out}}{C_{in}} + K R_{eff,T} C_{p,T}$

$K R_{eff,T} C_{in,T}$ is a function of the transistor widths in the template and is independent of the actual transistor widths.
$C_{out} / C_{in}$ is a function of the actual transistor widths and is also called the gate fanout.
$K R_{eff,T} C_{p,T}$ should be roughly independent of the actual transistor widths.

Normalize this delay to the delay of a minimum-sized inverter with no parasitics, $\tau = K R_{inv} C_{inv}$:
$d = \frac{d_{abs}}{\tau} = \frac{R_{eff,T} C_{in,T}}{R_{inv} C_{inv}} \cdot \frac{C_{out}}{C_{in}} + \frac{R_{eff,T} C_{p,T}}{R_{inv} C_{inv}}$

For our 0.18 µm technology, τ ≈ 10 ps.

We begin by deriving an equation for unitless delay in terms of a template (continued)
$d = \frac{d_{abs}}{\tau} = \frac{R_{eff,T} C_{in,T}}{R_{inv} C_{inv}} \cdot \frac{C_{out}}{C_{in}} + \frac{R_{eff,T} C_{p,T}}{R_{inv} C_{inv}} = g h + p$
Logical Effort (g), $\frac{R_{eff,T} C_{in,T}}{R_{inv} C_{inv}}$, compares the characteristic RC time constant of the gate to that of a minimum-sized inverter and is independent of the actual transistor widths.
Electrical Effort (h), $\frac{C_{out}}{C_{in}}$, is the fanout of the gate and is a function of the actual transistor widths.
Parasitic Delay (p), $\frac{R_{eff,T} C_{p,T}}{R_{inv} C_{inv}}$, is relative to a minimum-sized inverter and is roughly independent of the actual transistor widths.

Logical effort is simply the ratio of a gate's input capacitance to that of a minimum inverter with the same current drive
[Figure: transistor-level schematics of an inverter, a NAND, and a NOR, labeled with relative transistor widths.]
Gate      Input cap  g                  p
Inverter  3 units    1 (by definition)  1
NAND      4 units    4/3                6/3 = 2
NOR       5 units    5/3                6/3 = 2

Examples illustrating unitless delay of gates with equal drive strength
[Figure: each gate drives a load of 10 units; the input capacitances are 6 (inverter), 8 (NAND), and 10 (NOR).]
Gate      g                  p  h             d = gh + p
Inverter  1 (by definition)  1  10/6 = 1.67   2.67
NAND      4/3                2  10/8 = 1.25   3.67
NOR       5/3                2  10/10 = 1     3.67

Examples illustrating unitless delay of gates with similar area
[Figure: each gate drives a load of 10 units; the input capacitances are 9 (inverter), 5 (NAND), and 5 (NOR).]
Gate      g                  p  h             d = gh + p
Inverter  1 (by definition)  1  10/9 = 1.11   2.11
NAND      4/3                2  10/5 = 2      4.67
NOR       5/3                2  10/5 = 2      5.33
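The two sets of examples above can be checked with a few lines of Python. This is only a sketch: the names `GATES` and `delay_for` are made up for illustration, and the (g, p) pairs and capacitances are the ones quoted on the slides.

```python
from fractions import Fraction

# (logical effort g, parasitic delay p) for each gate, as quoted on the slides.
GATES = {
    "inverter": (Fraction(1), 1),
    "nand": (Fraction(4, 3), 2),
    "nor": (Fraction(5, 3), 2),
}

def delay_for(gate, c_in, c_out):
    """Unitless stage delay d = g*h + p, with electrical effort h = c_out / c_in."""
    g, p = GATES[gate]
    h = Fraction(c_out, c_in)
    return float(g * h + p)

# Equal drive strength: input caps 6, 8, 10, each driving a load of 10.
equal_drive = {name: delay_for(name, c_in, 10)
               for name, c_in in [("inverter", 6), ("nand", 8), ("nor", 10)]}

# Similar area: input caps 9, 5, 5, each driving a load of 10.
similar_area = {name: delay_for(name, c_in, 10)
                for name, c_in in [("inverter", 9), ("nand", 5), ("nor", 5)]}

for label, result in [("equal drive", equal_drive), ("similar area", similar_area)]:
    print(label, {name: round(d, 2) for name, d in result.items()})
```

With equal drive strength the NAND and NOR come out at the same delay, while at similar area the NOR is slowest because its higher logical effort is multiplied by the same fanout.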

Path delay (D) is just the sum of the stage delays
$D = \sum_i d_i = \sum_i (g_i h_i + p_i) = \sum_i g_i h_i + \sum_i p_i$
[Figure: three sizings of the same two-stage path (g = 4/3 per stage, total parasitic delay 4) driving a load of 4 units.]
(4/3)x(1/1) + (4/3)x(4/1) + 4 = 10.67
(4/3)x(2/1) + (4/3)x(4/2) + 4 = 9.33
(4/3)x(4/1) + (4/3)x(4/4) + 4 = 10.67

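What is the optimal delay for any general two-stage topology?
[Figure: two gates in series with logical efforts g_1, g_2 and parasitic delays p_1, p_2. C_1 is the input capacitance of the first stage, C_2 that of the second stage, and C_3 is the load.]
Form the unitless delay equation:
$D = (g_1 h_1 + p_1) + (g_2 h_2 + p_2) = g_1 \frac{C_2}{C_1} + p_1 + g_2 \frac{C_3}{C_2} + p_2$
The only free variable is $C_2$. Minimize with respect to $C_2$:
$\frac{\partial D}{\partial C_2} = \frac{g_1}{C_1} - \frac{g_2 C_3}{C_2^2} = 0$
$\frac{g_1}{C_1} = \frac{g_2 C_3}{C_2^2} \Rightarrow g_1 \frac{C_2}{C_1} = g_2 \frac{C_3}{C_2} \Rightarrow g_1 h_1 = g_2 h_2$
Minimal delay occurs when the stage efforts are equal.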

Key Result: Delay is minimized when effort is shared equally among stages
$D = \sum_i d_i = \sum_i (g_i h_i + p_i) = \sum_i g_i h_i + \sum_i p_i$
Stage efforts 1.33 and 5.33: (4/3)x(1/1) + (4/3)x(4/1) + 4 = 10.67
Stage efforts 2.67 and 2.67: (4/3)x(2/1) + (4/3)x(4/2) + 4 = 9.33
Stage efforts 5.33 and 1.33: (4/3)x(4/1) + (4/3)x(4/4) + 4 = 10.67
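As a quick numerical check of the key result, the sketch below evaluates the two-stage delay for the three sizings above and compares them with the closed-form optimum that follows from g_1 C_2/C_1 = g_2 C_3/C_2. The function names are illustrative, and the values C_1 = 1, C_3 = 4, g = 4/3, p = 2 are read off the slide's arithmetic rather than stated explicitly there.

```python
import math

def two_stage_delay(c1, c2, c3, g1, g2, p1, p2):
    """Unitless delay of a two-stage path: D = g1*C2/C1 + p1 + g2*C3/C2 + p2."""
    return g1 * c2 / c1 + p1 + g2 * c3 / c2 + p2

def best_c2(c1, c3, g1, g2):
    """C2 that equalizes the stage efforts, from g1*C2/C1 = g2*C3/C2."""
    return math.sqrt(g2 * c1 * c3 / g1)

G, P = 4 / 3, 2      # logical effort and parasitic delay of each stage
C1, C3 = 1, 4        # input capacitance and load (assumed from the slide's numbers)

delays = {c2: two_stage_delay(C1, c2, C3, G, G, P, P) for c2 in (1, 2, 4)}
c2_opt = best_c2(C1, C3, G, G)

for c2, d in delays.items():
    print(f"C2 = {c2}: D = {d:.2f}")
print(f"optimal C2 = {c2_opt:.2f}")
```

The middle sizing coincides with the closed-form optimum, which is why it gives the smallest of the three delays.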

We now generalize this result with some additional terminology
[Figure: N-stage path from C_in to C_out.]
Path delay: $D = \sum d_i = \sum g_i h_i + \sum p_i$ (sum of stage delays)
Path logical effort: $G = \prod g_i$ (product of stage logical efforts)
Path electrical effort: $H = \prod h_i = C_{out} / C_{in}$ (product of stage electrical efforts; the internal capacitances cancel out)
Path effort: $F = \prod f_i = \prod (g_i h_i) = GH$ (product of stage efforts)
Optimal stage effort for N stages: $f_{OPT} = F^{1/N}$
Optimal path delay: $D_{OPT} = \sum f_{OPT} + \sum p_i$ (optimal delay occurs when $g_1 h_1 = g_2 h_2 = \dots = g_N h_N$)

Steps for transistor sizing
1. Calculate the path effort
2. Calculate the optimal path delay
3. Assign each stage equal effort
4. Work backwards from C_out, assigning C_in values for each stage
5. Convert the C_in values into transistor sizes

Finding the path effort and optimal delay (Steps 1 and 2)
[Figure: gate-level schematics of Topology A, Topology B, and Topology C, each gate labeled with its logical effort.]
Topology  G     N  P  D_OPT                D_OPT (H=1)  D_OPT (H=12)
A         2.96  4  7  4(2.96H)^(1/4) + 7   12.25        16.77
B         3.33  2  6  2(3.33H)^(1/2) + 6   9.65         18.64
C         3.33  2  9  2(3.33H)^(1/2) + 9   12.65        21.64
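Steps 1 and 2 reduce to one formula, D_OPT = N (GH)^(1/N) + P, so the table is easy to recompute. The sketch below does that for the three topologies; the dictionary layout and function name are illustrative, and the G, N, P values are the ones listed in the table.

```python
def optimal_path_delay(G, H, N, P):
    """D_OPT = N * F^(1/N) + P, with path effort F = G * H."""
    F = G * H
    return N * F ** (1.0 / N) + P

# (path logical effort G, number of stages N, total parasitic delay P)
TOPOLOGIES = {
    "A": (2.96, 4, 7),
    "B": (3.33, 2, 6),
    "C": (3.33, 2, 9),
}

table = {
    name: {H: optimal_path_delay(G, H, N, P) for H in (1, 12)}
    for name, (G, N, P) in TOPOLOGIES.items()
}

for name, row in table.items():
    print(name, {H: round(d, 2) for H, d in row.items()})
```

Topology B is fastest for the small load, while the four-stage Topology A wins once the path electrical effort grows to 12.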

Finding actual transistor sizes for the H=1 case (Steps 3-5)
[Figure: Topology B with C_in = 6 and C_out = 6.]
Path effort: $F = GH = (3.33)(1) = 3.33$
Divide the path effort equally among the stages: $f_{OPT} = F^{1/N} = (3.33)^{1/2} = 1.82$
C_out and C_in are given in units of equivalent transistor gate width (capacitance).
The stage effort of the NOR gate must equal 1.82. We know its logical effort is 5/3, so we can find its input capacitance: $(5/3)(6 / C_{in,NOR}) = 1.82$, so $C_{in,NOR} = 5.5$.
Double-check that the stage effort of the first stage works out: $(2)(5.5/6) \approx 1.82$
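Steps 3 to 5 can be sketched as a short backward pass over the path. The function below is a hypothetical helper, not part of any tool: it takes the stage logical efforts (2 for the first gate and 5/3 for the NOR, as used in the worked example) plus the path's input and output capacitance, and returns the input capacitance of every stage.

```python
import math

def size_path(logical_efforts, c_in, c_out):
    """Assign each stage the same effort and work backwards from the load.

    Returns (f_opt, caps) where caps[i] is the input capacitance of stage i.
    """
    n = len(logical_efforts)
    path_effort = math.prod(logical_efforts) * c_out / c_in   # F = G * H
    f_opt = path_effort ** (1.0 / n)                          # equal stage effort

    caps = []
    load = c_out
    for g in reversed(logical_efforts):
        stage_c_in = g * load / f_opt   # from f = g * (load / c_in)
        caps.append(stage_c_in)
        load = stage_c_in               # this stage is the previous stage's load
    caps.reverse()
    return f_opt, caps

f_opt, caps = size_path([2, 5 / 3], c_in=6, c_out=6)
print(f"stage effort = {f_opt:.2f}")
print("input capacitances:", [round(c, 2) for c in caps])
```

The first entry of `caps` recovers the given C_in, which is the same double-check the slide performs by hand.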

Finding actual transistor sizes for the H=1 case (Steps 3-5)
[Figure: Topology B annotated with the resulting input capacitances (6 at the path input, 5.5 at the NOR gate, 6 at the load) and the corresponding transistor widths.]

How many stages of inverters are required if we want to drive a large load?
[Figure: chain of N inverters from C_in to C_out.]
$D_{OPT} = N F^{1/N} + N p_{inv}$
$\frac{\partial D_{OPT}}{\partial N} = -F^{1/N} \ln(F^{1/N}) + F^{1/N} + p_{inv} = 0$
There is no simple closed-form solution, but we can examine this function numerically.

Optimum number of stages for varying parasitic delays and stage effort
[Plot: optimum number of stages (N) versus total path effort (F).]

Optimum stage effort for varying parasitic delays
[Plot: optimum stage effort (f_OPT) versus parasitic delay (p_inv).]
f_OPT ≈ 4 is a good rule of thumb.
For p_inv = 0, f_OPT = e = 2.7.

A good rule-of-thumb is to target a stage effort around four
[Figure: inverter chain from C_in to C_out.]
Stage effort = logical effort x electrical effort
Minimum delay occurs when the stage effort is about 3.4-3.8.
Some derivations use e = 2.718..., but this ignores parasitics.
The optimum is broad: stage efforts of 2.4-6.0 are within 15-20% of the minimum.
Fan-out-of-four (FO4) is a convenient design size (~5τ).
FO4 delay: the delay of an inverter driving four copies of itself.
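The numerical examination can be sketched in a few lines. Differentiating D = N F^(1/N) + N p_inv with respect to N and writing ρ = F^(1/N) gives the condition p_inv + ρ(1 - ln ρ) = 0, which the code below solves by bisection. The function names, the bracketing interval, and the rounding used to pick an integer stage count are illustrative choices.

```python
import math

def optimum_stage_effort(p_inv, lo=1.0, hi=50.0, iterations=100):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection.

    The left-hand side is positive at rho = 1 and decreases for rho > 1,
    so the sign of the midpoint tells us which half holds the root.
    """
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_num_stages(path_effort, p_inv):
    """Nearest integer to ln(F) / ln(rho); an approximation to the best N."""
    rho = optimum_stage_effort(p_inv)
    return max(1, round(math.log(path_effort) / math.log(rho)))

for p_inv in (0.0, 0.5, 1.0, 2.0):
    print(f"p_inv = {p_inv}: optimum stage effort = {optimum_stage_effort(p_inv):.2f}")
print("stages for F = 64, p_inv = 1:", approx_num_stages(64, 1.0))
```

With no parasitics the root is e, and with p_inv = 1 it lands inside the 3.4-3.8 range quoted above, which is where the stage-effort-of-four rule comes from.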

Do the topologies in our original example have the optimum number of stages?
[Plot: optimum number of stages (N) versus total path effort (F).]
Topology  G     N  P  F (H=1)  D (H=1)  F (H=12)  D (H=12)
A         2.96  4  7  2.96     12.25    35.52     16.77
B         3.33  2  6  3.33     9.65     39.96     18.64
C         3.33  2  9  3.33     12.65    39.96     21.64


Clock Distribution: The Issue
The clock propagates across the entire chip.
We cannot really distribute the clock instantaneously with a perfectly regular period.

Clock Distribution: The Issue
Two forms of variability
Clock Skew: difference in clock arrival time at two spatially distinct points
[Figure: clock waveforms at points A and B, showing the skew between them and the resulting usable clock period.]

Clock Distribution: The Issue
Two forms of variability
Clock Jitter: difference in clock period over time
[Figure: clock waveform in which Period A != Period B.]

Clock Distribution: The Issue
Why is minimizing skew and jitter hard?
Central clock driver
Clock distribution network: variations in trace length, metal width and height, and coupling caps
Local clock buffers: variations in local clock load, local power supply, local gate length and threshold, and local temperature

Clock Distribution: Custom Approach
Clock grids have lower skew but high power
The grid feeds flops directly, with no local buffers.
The clock driver tree spans the height of the chip.
Internal levels are shorted together.

Clock Distribution: Custom Approach
Trees have more skew but less power
H-Tree: recursive pattern to distribute signals uniformly with equal delay over an area
RC-Tree: each branch is individually routed to balance RC delay

Clock Distribution: Custom Approach
Active deskewing circuits in Intel Itanium
[Figure: Itanium clock network with a Phase Locked Loop (PLL), active deskew circuits (which cancel out systematic skew), and regional grids.]

Clock Distribution: Custom Approach
Other techniques
Use latch-based design
  Time borrowing helps reduce the impact of clock uncertainty
  Timing analysis can be more difficult
Make logical partitioning match physical partitioning
  Limits global communication, where skew is usually the worst
  Helps break the distribution problem into smaller subproblems
Use globally asynchronous, locally synchronous design
  Divides the design into synchronous regions which communicate through asynchronous channels
  Requires overhead for inter-domain communication
Use asynchronous design
  Avoids clocks altogether
  Incurs its own forms of control overhead

Clock Distribution: ASIC Approach
Clock Tree Synthesis
Modern back-end tools include clock tree synthesis
  Creates balanced RC-trees
  Uses special clock buffer standard cells
  Can add clock shielding
  Can exploit useful clock skew
Automatic clock tree generation still results in significantly worse clock uncertainties as compared to custom clock trees

[Figure: example of clock tree synthesis using commercial ASIC back-end tools.]


Power Distribution: The Issue
Possible IR drop across the power network
[Figure: two gates, each modeled by R_eff with gate and drain capacitances, connected to VDD and GND through the power network.]

Power Distribution: The Issue
IR drop can be static or dynamic
Static IR drop
Dynamic IR drop
Are these parasitic capacitances bad?
[Figure: two gates, each modeled by R_eff with gate and drain capacitances, connected to VDD and GND through the power network.]

Power Distribution: Custom Approach
Carefully tailor the power network
[Figure: three power network layouts with alternating VDD (V) and GND (G) wires.]
Routed power distribution on two stacked layers of metal (one for VDD, one for GND). OK for low-cost, low-power designs with few layers of metal.
Power grid. Interconnected vertical and horizontal power bars. Common on most high-performance designs. Often well over half of the total metal on the upper, thicker layers is used for VDD/GND.
Dedicated VDD/GND planes. Very expensive. Only used on the Alpha 21264. Simplified circuit analysis. Dropped on subsequent Alphas.

Power Distribution: ASIC Approach
Strapping and rings for standard cells

Power Distribution: ASIC Approach
Power rings partition the power problem
Early physical partitioning and prototyping is essential
Can use special filler cells to help add decoupling cap

[Figure: example of a power distribution network using commercial ASIC back-end tools.]


Wire Delay: The Issue
Large RC makes long wires slow
[Figure: driver (R_driver) feeding a wire modeled as N series resistances R_1 ... R_N with capacitances C_1 ... C_N to ground, ending in C_load.]
$Delay \propto \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j$
Wire delay increases quadratically with length

Wire Delay: Custom Approach
Manual insertion of repeaters
[Figure: the same wire with a repeater inserted between consecutive segments R_1, C_1 ... R_N, C_N.]
$Delay \propto \sum_{i=1}^{N} R_i C_i$
Wire delay increases linearly with length
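The quadratic-versus-linear behavior is easy to see numerically. The sketch below implements the two sums for a wire cut into equal segments; it is a simplified model that ignores the driver resistance, the load capacitance, and the delay of the repeaters themselves, and the unit segment values are arbitrary.

```python
def unrepeated_delay(rs, cs):
    """Sum over segments of C_i times all the resistance upstream of it."""
    delay = 0.0
    r_upstream = 0.0
    for r, c in zip(rs, cs):
        r_upstream += r          # R_1 + ... + R_i
        delay += r_upstream * c
    return delay

def repeated_delay(rs, cs):
    """With a repeater after every segment, each segment only sees its own R_i * C_i."""
    return sum(r * c for r, c in zip(rs, cs))

for n in (4, 8, 16):
    rs, cs = [1.0] * n, [1.0] * n
    print(n, unrepeated_delay(rs, cs), repeated_delay(rs, cs))
```

Doubling the number of unit segments roughly quadruples the unrepeated delay (n(n+1)/2 for unit segments) but only doubles the repeated one.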

Wire Delay: Custom Approach
Several issues with repeater insertion
Repeater must connect to transistor layers
Blocks other routes with vias that connect down
Requires space on active layers for buffer transistors
Repeaters are often grouped in preallocated repeater boxes spread around the chip, and thus repeater location might not give ideal spacing

Wire Delay: Impact on RTL
Make logical, physical partitioning match
  Limits global communication
  Helps simplify automatic buffer insertion
Add extra pipeline stages for wire delay
  P4 included 2 stages just for driving signals
  Requires very early physical prototyping
Use latency insensitive methodology
  Create macroblocks with registered interfaces
  Enables pipelining wires late in the design cycle
[Figure: the 20-stage P4 pipeline: TC Next IP, TC Fetch, Drive, Alloc, Rename, Queue, Schedule 1, Schedule 2, Schedule 3, Dispatch 1, Dispatch 2, Register File 1, Register File 2, Execute, Flags, Branch Check, Drive.]

Wire Delay: ASIC Approach
Front-end tools include rough wire-load models
  Usually statistical in nature
  Helps the synthesis tool with technology mapping
Back-end tools include better wire-load models
  After trial placement can use Manhattan distance
  Tool will automatically insert buffers where necessary

Power Consumption: The Issue
Power has been increasing rapidly
[Plot: power in Watts (log scale, 0.1 to 1000) versus year (1970 to 2020) for Intel processors from the 8080, 8086, and 386 through the Pentium and Pentium 4 processors, extrapolated toward a possible 1000 W CPU. Source: Intel.]

Power Consumption: The Issue
Why is it a problem?
Power dissipation is the limiting factor in many systems
  Battery weight and life for portable devices
  Packaging and cooling costs for tethered systems
  Case temperature for laptop/wearable computers
  Fan noise for media hubs
Example 1: Cellphone
  3 Watt hard power limit; any more and customers complain
  Battery life is a strong product differentiator
Example 2: Internet data center
  ~8,000 servers, ~2 MegaWatts
  25% of operational costs are in the electricity bill for supplying power and running air-conditioning to remove heat

Power Consumption: The Issue
Main forms are dynamic and static power
[Figure: two gates, each modeled by R_eff with gate and drain capacitances.]
Dynamic Power: switching power used to charge up the load capacitance
$P_{dynamic} = \alpha F \cdot \frac{1}{2} C V_{DD}^2$
Static Power: subthreshold leakage power when a transistor is off
$P_{static} = V_{DD} I_{off}$
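A small sketch makes the scaling in these two formulas concrete. The function names and every numeric value below (activity factor, frequency, capacitance, supply voltage, off current) are hypothetical and chosen only for illustration.

```python
def dynamic_power(alpha, freq_hz, cap_farads, vdd):
    """P_dynamic = alpha * F * (1/2) * C * VDD^2."""
    return alpha * freq_hz * 0.5 * cap_farads * vdd ** 2

def static_power(vdd, i_off):
    """P_static = VDD * I_off."""
    return vdd * i_off

# Hypothetical operating point: 10% activity, 100 MHz, 1 nF switched, 1.8 V, 1 mA leakage.
p_dyn = dynamic_power(alpha=0.1, freq_hz=100e6, cap_farads=1e-9, vdd=1.8)
p_dyn_half_vdd = dynamic_power(alpha=0.1, freq_hz=100e6, cap_farads=1e-9, vdd=0.9)
p_stat = static_power(vdd=1.8, i_off=1e-3)

print(f"dynamic: {p_dyn * 1e3:.2f} mW, at half VDD: {p_dyn_half_vdd * 1e3:.2f} mW")
print(f"static:  {p_stat * 1e3:.2f} mW")
```

Halving the supply voltage cuts the dynamic term by a factor of four at the same frequency, whereas halving only the frequency halves power without changing the energy spent per operation.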

Power Consumption: Custom Approach
$P_{dynamic} = \alpha F \cdot \frac{1}{2} C V_{DD}^2$
Reduce Activity
  Clock gating so the clock node of inactive logic doesn't switch
  Data gating so the data nodes of inactive logic don't switch
  Bus encodings to minimize transitions
  Balance logic paths to avoid glitches during settling
Reduce Frequency
  Doesn't save energy, just reduces the rate at which it is consumed
  Lower power means less heat dissipation, but the computation must run longer

Power Consumption: Custom Approach
$P_{dynamic} = \alpha F \cdot \frac{1}{2} C V_{DD}^2$
Reduce Switched Capacitance
  Careful transistor sizing (small transistors off the critical path)
  Tighter layout (good floorplanning)
  Segmented tri-state bus structures
Reduce Supply Voltage
  Need to lower frequency as well, so the power savings are more than quadratic
  Can lower statically for cells off the critical path
  Can lower dynamically for just-in-time computation

Power Consumption: Custom Approach
$P_{static} = V_{DD} I_{off}$
Reduce Supply Voltage
  In addition to dynamic power reduction, reducing VDD can help reduce static power
Reduce Off Current
  Increase the length of transistors off the critical path
  Use high-Vt cells off the critical path (extra Vt increases fab costs)
  Use stacked devices
  Use power gating (i.e., switch off the power supply with a large transistor)

Power Consumption
Reducing activity with clock gating
Don't clock a flip-flop if not needed
Avoids transitioning downstream logic
Enable adds control logic complexity
P4 has hundreds of gated clock domains
[Figure: the Enable signal passes through a latch (transparent on clock low); the latched enable gates the Global Clock to produce the Gated Local Clock that drives the flip-flop (D, Q). Waveforms shown: Clock, Enable, Latched Enable, Gated Clock.]
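The role of the latch can be illustrated with a tiny sample-based simulation. This is a behavioral sketch only, assuming the gated clock is the AND of the clock and the latched enable; the sample sequences are made up, and a real design would describe this in an HDL or use a library clock-gating cell.

```python
def gated_clock(clk, enable):
    """Latch the enable while the clock is low, then AND it with the clock."""
    latched = 0
    out = []
    for c, e in zip(clk, enable):
        if c == 0:
            latched = e      # latch is transparent only while the clock is low
        out.append(c & latched)
    return out

def naive_gated_clock(clk, enable):
    """Plain AND gate with no latch, shown for comparison."""
    return [c & e for c, e in zip(clk, enable)]

clk    = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [0, 1, 1, 0, 0, 0, 1, 1]   # enable changes while the clock is high at samples 1 and 3

gated = gated_clock(clk, enable)
naive = naive_gated_clock(clk, enable)
print("latched:", gated)
print("naive:  ", naive)
```

In the latched version, an enable that changes while the clock is high has no effect until the next low phase, so every gated pulse is either a full clock pulse or absent; the plain AND gate lets mid-phase enable changes through to the clock node.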

Power Consumption
Reducing activity with data gating
[Figure: inputs A and B feed both a shifter and an adder, and a Shift/Add Select mux picks the shifter (1) or the adder (0). The shifter is infrequently used. In the second version, AND gates block A and B from reaching the shifter when it is not selected.]
Could use a transparent latch instead of an AND gate to reduce the number of transitions, but it would be bigger and slower.

Power Consumption
Reducing supply voltage
Both static and dynamic voltage scaling are possible
Delay rises sharply as the supply voltage approaches Vt
[Plot: delay versus supply voltage. Source: Horowitz.]

Power Consumption
Parallel architecture to reduce energy
8-bit adder/cmp: 40 MHz at 5 V, area = 530 kµ², base power P_ref
Two parallel interleaved adder/cmp units: 20 MHz at 2.9 V, area = 1,800 kµ² (3.4x), power = 0.36 P_ref
One pipelined adder/cmp unit: 40 MHz at 2.9 V, area = 690 kµ² (1.3x), power = 0.39 P_ref
Pipelined and parallel: 20 MHz at 2.0 V, area = 1,961 kµ² (3.7x), power = 0.2 P_ref
[Chandrakasan et al., IEEE JSSC 27(4), April 1992]
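These ratios follow from the dynamic power formula once capacitance, frequency, and voltage are all expressed relative to the base design. In the sketch below the voltages and frequencies come from the slide, but the relative switched capacitances (2.15, 1.15, and 2.5) are assumptions that are not on the slide; they were chosen to be consistent with the quoted power figures.

```python
def relative_power(cap_ratio, freq_ratio, vdd, vdd_ref=5.0):
    """Power relative to the base design: (C / C_ref) * (F / F_ref) * (VDD / VDD_ref)^2."""
    return cap_ratio * freq_ratio * (vdd / vdd_ref) ** 2

# cap_ratio values are ASSUMED (not given on the slide).
designs = {
    "parallel": relative_power(cap_ratio=2.15, freq_ratio=0.5, vdd=2.9),
    "pipelined": relative_power(cap_ratio=1.15, freq_ratio=1.0, vdd=2.9),
    "pipelined + parallel": relative_power(cap_ratio=2.5, freq_ratio=0.5, vdd=2.0),
}

for name, ratio in designs.items():
    print(f"{name}: {ratio:.2f} x P_ref")
```

The point of the example is that extra hardware (more switched capacitance) can still lower power, because the added throughput lets the supply voltage drop and power falls with the square of VDD.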

Power Consumption: ASIC Approach
Minimize activity
  Automatic clock gating is possible if we write Verilog so tools can infer gating
  Partition designs so a minimal number of components is activated to perform each operation
Floorplan units to reduce the length of power-hungry global wires
Use the lowest voltage and slowest frequency necessary to reach target performance
Use pipelined and parallel architectures if possible
Modern standard cell libraries include low-power cells, high-Vt cells, and low-Vt cells; tools can automatically replace non-critical cells to optimize for power

Capacitive Coupling: The Issue
Delay is a function of switching on neighbors
[Figure: neighboring wires A and B, with coupling capacitance C_AB between them and capacitance C_B from B to ground.]
Most of the wire capacitance is to neighboring wires.
If A switches, it injects voltage noise on B, where the magnitude depends on the capacitive divider formed: $C_{AB} / (C_{AB} + C_B)$
If A switches in the opposite direction while B switches, the coupling capacitance effectively doubles (Miller effect).
If A switches in the same direction while B switches, the coupling capacitance disappears.
These effects can lead to a large variance in the possible delay of the B driver, possibly a factor of 5 or 6 between best and worst case.
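The divider and the Miller factors can be captured in two small functions. This is a first-order sketch with made-up capacitance values (coupling capacitance twice the ground capacitance); the labels "same", "quiet", and "opposite" are illustrative names for what neighbor A is doing while B switches.

```python
def noise_fraction(c_ab, c_b):
    """Fraction of A's voltage swing coupled onto a floating B: C_AB / (C_AB + C_B)."""
    return c_ab / (c_ab + c_b)

def effective_capacitance(c_ab, c_b, neighbor):
    """Capacitance B's driver must charge, with a Miller factor of 0, 1, or 2 on C_AB."""
    miller = {"same": 0, "quiet": 1, "opposite": 2}[neighbor]
    return c_b + miller * c_ab

C_AB, C_B = 2.0, 1.0   # hypothetical values, arbitrary units

print("noise fraction:", round(noise_fraction(C_AB, C_B), 3))
for case in ("same", "quiet", "opposite"):
    print(case, effective_capacitance(C_AB, C_B, case))
```

With these values the load seen by B's driver ranges from 1 to 5 units depending only on what the neighbor does, which is the kind of best-to-worst spread the slide warns about.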

Capacitive Coupling
Custom vs ASIC Approach
Custom Approach
  Avoid placing simultaneously switching signals next to each other for long parallel runs (use swizzling)
  Reroute signals which will be quiet during switching in between simultaneously switching signals
  Route signals close to power rails for capacitance ballast
  Extensive dynamic signal simulation
ASIC Approach
  Automatic routers can specifically avoid long straight routes; sometimes this causes the router to avoid the most direct route
  Critical nets (such as the clock) can use automatic shielding
  Static timing tools help focus dynamic signal simulation
  Fixing a coupling problem can require a point change which itself might cause new problems

Take away points
Logical effort is a useful tool for quickly determining transistor sizing and the number of stages
It is essential to consider physical design issues early and often in ASIC design
Physical prototyping enables designers to evaluate the impact of physical design issues early in the design process
Making logical partitioning match physical partitioning helps expose physical design tradeoffs at the RTL level
Next Lecture: Arvind will introduce using guarded atomic actions to describe hardware