Managing Physical Design Issues in ASI Toolflows 6.375 omplex Digital Systems hristopher Batten February 1, 006
Managing Physical Design Issues in ASI Toolflows Logical Effort Physical Design Issues lock Distribution Power Distribution Wire Delay Power onsumption apacitive oupling 1. What is the issue?. How do custom designers address the issue? 3. How can we approximate these approaches in an ASI toolflow? 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows
Which gate topology and transistor sizing is optimal? Ideally, given a gate topology, we would like to answer two questions in a lightweight and technology independent way: 1. What is the optimal transistor sizing?. What is the optimal number of stages? 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 3
Review of the simple R model for the MOS inverter R eff V in V out V in V out g d R eff R eff = R eff,n = R eff,p g = g,n + g,p d = d,n + d,p 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 4
A gate template is gate with same drive current as minimum sized inverter R eff = R inv R eff d 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 5
We begin by deriving an equation for unitless delay in terms of a template Determine R for an actual gate relative to the template in = α in,t p = α Derive absolute delay in terms of the template p,t R EFF = R EFF,T α d abs = = K R EFF R K α EFF,T ( + ) out α in,t p = K R out in EFF R + K α EFF,T out + K R α p,t EFF p = K R = K R EFF,T EFF in,t out in in out in + K R + K R EFF,T EFF p,t p Independent of actual transistor widths Function of transistor widths in template Independent of actual transistor widths Should be rough 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 6
We begin by deriving an equation for unitless delay in terms of a template Determine R for an actual gate relative to the template in = α in,t p = α Derive absolute delay in terms of the template p,t R EFF = R EFF,T α d abs = = K R EFF R K α EFF,T ( + ) out α in,t p = K R out in EFF R + K α EFF,T out + K R α p,t EFF p = K R = K R EFF,T Function of the actual transistor widths Also called the gate fanout EFF in,t out in in out in + K R + K R EFF,T EFF p,t p 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 7
We begin by deriving an equation for unitless delay in terms of a template Determine R for an actual gate relative to the template in = α in,t p = α Derive absolute delay in terms of the template p,t R EFF = R EFF,T α d abs = = K R EFF R K α EFF,T ( + ) out α in,t p = K R out in EFF R + K α EFF,T out + K R α p,t EFF p = K R = K R EFF,T EFF in,t out in in out in + K R + K R EFF,T EFF p,t p Normalize this delay to the delay of an min inverter with no parasitics d = d abs τ = K R K R EFF,T inv in,t inv out in + K R K R EFF,T inv inv p,t = R EFF,T R inv in,t inv out in + R EFF,T R inv inv p,t For our 0.18um technology, τ 10ps 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 8
We begin by deriving an equation for unitless delay in terms of a template d = d abs τ = R EFF,T R inv in,t inv out in + R EFF,T R inv inv p,t Logical Effort (g) Electrical Effort (h) Parasitic Delay (p) Parasitic Delay is relative to a minimum sized inverter and is roughly independent of actual transistor widths Electrical Effort is the fanout of the gate and is a function of actual transistor widths Logical Effort compares characteristic R time constant of gate to minimum sized inverter and is independent of actual transistor widths 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 9
We begin by deriving an equation for unitless delay in terms of a template d = d abs τ = R EFF,T R inv in,t inv out in + R EFF,T R inv inv p,t d = d abs τ = g h + p Parasitic Delay is relative to a minimum sized inverter and is roughly independent of actual transistor widths Electrical Effort is the fanout of the gate and is a function of actual transistor widths Logical Effort compares characteristic R time constant of gate to minimum sized inverter and is independent of actual transistor widths 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 10
Logical effort is simply ratio of input cap to min inverter with same current drive 1 4 4 1 1 Inverter Input ap = 3 units g = 1 (definition) p = 1 NAND Input ap = 4 units g = 4/3 p = 6/3 = NOR Input ap = 5 units g = 5/3 p = 6/3 = 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 11
Examples illustrating unit-less delay of gates with equal drive strength 4 10 4 4 4 4 10 8 8 10 Inverter g = 1 (definition) p = 1 h = 10/6 = 1.67 d = gh+p =.67 NAND g = 4/3 p = h = 10/8 = 1.5 d = gh+p = 3.67 NOR g = 5/3 p = h = 10/10 = 1 d = gh+p = 3.67 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 1
Examples illustrating unit-less delay of gates with similar area 6 3 10.5.5.5.5 10 4 4 1 1 10 Inverter g = 1 (definition) p = 1 h = 10/9 = 1.11 d = gh+p =.11 NAND g = 4/3 p = h = 10/5 = d = gh+p = 4.67 NOR g = 5/3 p = h = 10/5 = d = gh+p = 5.33 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 13
Path delay (D) is just the sum of the stage delays D = ( gi hi + pi ) = ( gi hi ) di = + i i i i p i 4 (4/3)x(/) + (4/3)x(4/) + 4 = 10.67 4 (4/3)x(/) + (4/3)x(4/) + 4 = 9.33 4 4 (4/3)x(4/) + (4/3)x(4/4) + 4 = 10.67 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 14
6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 15 What is the optimal delay for any general two stage topology? 1 3 g 1 g p 1 p ( ) ( ) + + + = + + + = 3 1 1 1 1 1 1 p g p g p h g p g h D Form unitless delay equation Only free variable is ( ) 0 g g D 3 1 1 = = Minimize with respect to ( ) 1 1 3 1 1 3 1 1 h g g h g g g g = = = Minimal delay occurs when stage effort is equal
Key Result: Delay is minimized when effort is shared equally among stages D = ( gi hi + pi ) = ( gi hi ) di = + i i i i p i 1.33 5.33 4 (4/3)x(/) + (4/3)x(4/) + 4 = 10.67.67.67 4 (4/3)x(/) + (4/3)x(4/) + 4 = 9.33 5.33 4 1.33 4 (4/3)x(4/) + (4/3)x(4/4) + 4 = 10.67 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 16
We now generalize this result with some additional terminology in out Path delay D = Σ d i = Σ g i h i + Σ p i Sum of stage delays Path logical effort Path electrical effort Path effort Optimal stage effort for N stages Optimal path delay G = Π g i H = Π h i = out / in F = Π f i = Π (g i h i ) = GH f OPT = F 1/N D OPT = Σ f OPT + Σ p i Product of stage LE Product of stage EE (Internal s cancel out) Product of stage efforts Optimal delay when g 1 h 1 = g h = = g N h N 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 17
Steps for transistor sizing 1. alculate path effort. alculate optimal path delay 3. Assign each stage equal effort 4. Work from out backwards assigning in values for each stage 5. onvert in values into transistors sizes 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 18
Finding the path effort and optimal delay (Steps 1 and ) Topology A Topology B Topology 4/3 4/3 4/3 4/3 4 4 10/3 8 1 5/3 5/3 5/3 4/3 G N P D OPT H=1 H=1 1 A B.96 3.33 4 7 6 4(.96H) 1/4 + 7 (3.33H) 1/ + 6 1.5 9.65 16.77 18.64 3.33 9 (3.33H) 1/ + 9 1.65 1.64 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 19
Finding actual transistor sizes for H=1 case (Steps 3-5) 6 Path Effort (F) = GH = (3.33)(1) = 3.33 6 6 Divide path effort equally among stages F OPT = F 1/N = (3.33) 1/ = 1.8 out and in are given in equivalent gate transistor width cap Stage effort of nor gate must equal 1.8 We know logical effort is 5/3, so we can find (5/3)(6/ ) = 1.8 = 5.5 Double check that stage effort of first stage works out ()(5.5/6) = 1.8 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 0
Finding actual transistor sizes for H=1 case (Steps 3-5) 6 6 5.5 6 4 4 4 1 1 4 4 4 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 1
How many stages of inverters required if want to drive large load? in out D D OPT OPT N = = NF F 1 / N 1 / N + F Np 1 / N inv ( 1 / N ) + p = 0 ln F inv No simple closed form solution, but we can examine this function numerically 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows
Optimum number of stages for varying parasitic delays and stage effort Optimum Number of Stages (N) Total Path Effort (F) 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 3
Optimum stage effort for varying parasitic delays Optimum Stage Effort (f OPT ) f OPT 4 is a good rule-of-thumb P inv = 0, F OPT = e =.7 Parasitic Delay (P inv ) 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 4
A good rule-of-thumb is to target a stage effort around four in Minimum delay when: Stage effort = logical effort x electrical effort 3.4-3.8 Some derivations use e =.718.. this ignores parasitics Broad optimum, stage efforts of.4-6.0 within 15-0% of minimum Fan-out-of-four (FO4) is convenient design size (~5τ) FO4 delay: Delay of inverter driving four copies of itself out 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 5
Do the topologies in our original example have the optimum number of stages? Optimum Numr of Stages (N) Total Path Effort (F) G N P F (H=1) D (H=1) F (H=1) D (H=1) A.96 4 7.96 1.5 35.5 16.77 B 3.33 6 3.33 9.65 39.96 18.64 3.33 9 3.33 1.65 39.96 1.64 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 6
Managing Physical Design Issues in ASI Toolflows Logical Effort Physical Design Issues lock Distribution Power Distribution Wire Delay Power onsumption apacitive oupling 1. What is the issue?. How do custom designers address the issue? 3. How can we approximate these approaches in an ASI toolflow? 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 7
lock Distribution: The Issue lock propagates across entire chip lock annot really distribute clock instantaneously with a perfectly regular period 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 8
lock Distribution: The Issue Two forms of variability lock Skew Difference in clock arrival time at two spatially distinct points B A A B Skew Usable lock Period 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 9
lock Distribution: The Issue Two forms of variability Period A!= Period B lock Jitter Difference in clock period over time 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 30
lock Distribution: The Issue Why is minimizing skew and jitter hard? lock Distribution Network Variations in trace length, metal width and height, coupling caps entral lock Driver Variations in local clock load, local power supply, local gate length and threshold, local temperature Local lock Buffers 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 31
lock Distribution: ustom Approach lock grids lower skew but high power Grid feeds flops directly, no local buffers lock driver tree spans height of chip Internal levels shorted together 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 3
lock Distribution: ustom Approach Trees have more skew but less power H-Tree R-Tree Recursive pattern to distribute signals uniformly with equal delay over area Each branch is individually routed to balance R delay 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 33
lock Distribution: ustom Approach Active deskewing circuits in Intel Itanium Active Deskew ircuits (cancels out systematic skew) Phase Locked Loop (PLL) Regional Grid 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 34
lock Distribution: ustom Approach Other techniques Use latch-based design Time borrowing helps reduce impact of clock uncertainty Timing analysis can be more difficult Make logical partitioning match physical partitioning Limits global communication where skew is usually the worst Helps break distribution problem into smaller subproblems Use globally asynchronous, locally synchronous design Divides design into synchronous regions which communicate through asynchronous channels Requires overhead for inter-domain communication Use asynchronous design Avoids clocks all together Incurs its own forms of control overhead 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 35
lock Distribution: ASI Approach lock Tree Synthesis Modern back-end tools include clock tree synthesis reates balanced R-trees Uses special clock buffer standard cells an add clock shielding an exploit useful clock skew Automatic clock tree generation still results in significantly worse clock uncertainties as compare to custom clock trees 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 36
Example of clock tree synthesis using commercial ASI back-end tools 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 37
Example of clock tree synthesis using commercial ASI back-end tools 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 38
Power Distribution: The Issue Possible IR drop across power network VDD VDD R eff R eff g Reff d g Reff d GND GND 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 39
Power Distribution: The Issue IR drop can be static or dynamic Static IR Drop Dynamic IR Drop Are these parasitic capacitances bad? VDD VDD R eff R eff g Reff d g Reff d GND GND 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 40
Power Distribution: ustom Approach arefully tailor power network G V G V B A Routed power distribution on two stacked layers of metal (one for VDD, one for GND). OK for lowcost, low-power designs with few layers of metal. V G V G V G V G V G V G Power Grid. Interconnected vertical and horizontal power bars. ommon on most high-performance designs. Often well over half of total metal on upper thicker layers used for VDD/GND. V G V G V G V V G V G V G V Dedicated VDD/GND planes. Very expensive. Only used on Alpha 164. Simplified circuit analysis. Dropped on subsequent Alphas. G G V G V G 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 41
Power Distribution: ASI Approach Strapping and rings for standard cells 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 4
Power Distribution: ASI Approach Power rings partition the power problem Early physical partitioning and prototyping is essential an use special filler cells to help add decoupling cap 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 43
Example of power distribution network using commercial ASI back-end tools 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 44
Example of power distribution network using commercial ASI back-end tools 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 45
Wire Delay: The Issue Large R makes long wires slow R driver R 1 R R N load 1 N Delay N i j = i j = 1 R j i Wire delay increases quadratically 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 46
Wire Delay: ustom Approach Manual insertion of repeaters R driver R 1 R R N load 1 N Delay N i R i i Wire delay increases linearly 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 47
Wire Delay: ustom Approach Several issues with repeater insertion Repeater must connect to transistor layers Blocks other routes with vias that connect down Requires space on active layers for buffer transistors Repeaters often grouped in preallocated repeater boxes spread around chip, and thus repeater location might not give ideal spacing 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 48
Wire Delay: Impact on RTL Make logical, physical partitioning match Limits global communication Helps simplify automatic buffer insertion Add extra pipeline stages for wire delay P4 included stages just for driving signals Requires very early physical prototyping Use latency insensitive methodology reate macroblocks with registered interfaces Enables pipelining wires late in design cycle 1 3 4 5 6 7 8 9 10 11 1 13 14 15 16 17 18 19 0 T Next IP T Fetch Drive Alloc Rename Queue Schedule 1 Schedule Schedule 3 Dispatch 1 Dispatch Register File 1 Register File Execute Flags Branch heck Drive 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 49
Wire Delay: ASI Approach Front-end tools include rough wire-load models Usually statistical in nature Helps synthesis tool with technology mapping Back-end tools include better wire-load models After trial placement can use Manhattan distance Tool will automatically insert buffers where necessary 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 50
Power onsumption: The Issue Power has been increasing rapidly Power (Watts) 1000 100 10 1 8086 386 Pentium 4 proc Pentium proc 1000W PU? 0.1 8080 1970 1980 1990 000 010 00 [ Source: Intel ] 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 51
Power onsumption: The Issue Why is it a problem? Power dissipation is limiting factor in many systems Battery weight and life for portable devices Packaging and cooling costs for tethered systems ase temperature for laptop/wearable computers Fan noise for media hubs Example 1: ellphone 3 Watt hard power limit any more and customers complain Battery life is a strong product differentiator Example : Internet data center ~8,000 servers, ~ MegaWatts 5% of operational costs are in electricity bill for supplying power and running air-conditioning to remove heat 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 5
Power onsumption: The Issue Main forms are dynamic and static power R eff R eff g Reff d g Reff d Dynamic Power Switching power used to charge up load capacitance P dynamic = α F(1/) V DD Static Power Subthreshold leakage power when transistor is off P static = V DD I off 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 53
Power onsumption: ustom Approach P dynamic = α F (1/) V DD Reduce Activity lock gating so clock node of inactive logic doesn t switch Data gating so data nodes of inactive logic doesn t switch Bus encodings to minimize transitions Balance logic paths to avoid glitches during settling Reduce Frequency Doesn t save energy, just reduces rate at which it is consumed Lower power means less heat dissipation but must run longer 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 54
Power onsumption: ustom Approach P dynamic = α F (1/) V DD Reduce Switched apacitance areful transistor sizing (small transistors off critical path) Tighter layout (good floorplanning) Segmented tri-state bus structures Reduce Supply Voltage Need to lower frequency as well quadratic+ power savings an lower statically for cells off critical path an lower dynamically for just-in-time computation 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 55
Power onsumption: ustom Approach P static = V DD I OFF Reduce Supply Voltage In addition to dynamic power reduction, reducing Vdd can help reduce static power Reduce Off urrent Increase length of transistors off critical path Use high-vt cells off critical path (extra Vt increases fab costs) Use stacked devices Use power gating (ie switch off the power supply with a large transistor) 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 56
Power onsumption Reducing activity with clock gating Don t clock flip-flop if not needed Avoids transitioning downstream logic Enable adds control logic complexity P4 has hundreds of gated clock domains Global lock D Enable Latch (transparent on clock low) Gated Local lock Q lock Enable Latched Enable Gated lock 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 57
Power onsumption Reducing activity with data gating A B Shifter 1 Shifter infrequently used Adder 0 Shift/Add Select A B Shifter 1 Adder 0 ould use transparent latch instead of AND gate to reduce number of transitions, but would be bigger and slower. 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 58
Power onsumption Reducing supply voltage Both static and dynamic voltage scaling is possible Delay rises sharply as supply voltage approaches Vt [ Source: Horowitz ] 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 59
Power onsumption Parallel architecture to reduce energy 8-bit adder/cmp 40MHz at 5V, area = 530 kµ Base power P ref Two parallel interleaved adder/cmp units 0MHz at.9v, area = 1,800 kµ (3.4x) Power = 0.36 P ref One pipelined adder/cmp unit 40MHz at.9v, area = 690 kµ (1.3x) Power = 0.39 P ref + + + + + Pipelined and parallel 0MHz at.0v, area = 1,961 kµ (3.7x) Power = 0. P ref + + + + handrakasan et. al, IEEE JSS 7(4), April 199 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 60
Power onsumption: ASI Approach Minimize activity Automatic clock gating is possible if we write Verilog so tools can infer gating Partition designs so minimal number of components activated to perform each operation Floorplan units to reduce length of power-hungry global wires Use lowest voltage and slowest frequency necessary to reach target performance Use pipelined and parallel architectures if possible Modern standard cell libraries include low-power cells, high-vt cells, and low-vt cells tools can automatically replace non-critical cells to optimize for power 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 61
apacitive oupling: The Issue Delay is a function of switching on neighbors A B AB B Most of the wire capacitance is to neighboring wires If A switches then it injects voltage noise on where the magnitude depends on capacitive divider formed [ AB /( AB + B ) ] If A switches in opposite direction while B switches, coupling capacitance effectively doubles (Miller effect) If A switches in same direction while B switches, coupling capacitance disappears These effects can lead to large variance in possible delay of B driver, possibly factor of 5 or 6 between best and worst case 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 6
apacitive oupling ustom vs ASI Approach ustom Approach Avoid placing simultaneously switching signals next to each other for long parallel runs (use swizzling) Reroute signals which will be quiet during switching in between simultaneous switching signals Route signals close to power rails for capacitance ballast Extensive dynamic signal simulation ASI Approach Automatic routers can specifically avoid long straight routes, sometimes this causes the router to avoid the most direct route ritical nets (such as the clock) can use automatic shielding Static timing tools help focus dynamic signal simulation Fixing a coupling problem can require a point change which itself might cause new problems 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 63
Take away points Logical effort is a useful tool for quickly determining transistor sizing and number of stages It is essential to consider physical design issues early and often in ASI design Physical prototyping enables designers to evaluate impact of physical design issues early in the design process with Making logical partitioning match physical partitioning helps expose physical design tradeoffs at the RTL level Next Lecture: Arvind will introduce using guarded atomic actions to describe hardware 6.375 Spring 006 L06 Managing Physical Design Issues in ASI Toolflows 64