Timing-Aware Decoupling Capacitance Allocation in Power Distribution Networks

Similar documents
Static Timing Analysis Considering Power Supply Variations

UPC. 5. Properties and modeling of onchip Power Distribution Networks. Decoupling capacitance

Fast Buffer Insertion Considering Process Variation

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

Network Flow-based Simultaneous Retiming and Slack Budgeting for Low Power Design

Buffered Clock Tree Sizing for Skew Minimization under Power and Thermal Budgets

Clock Buffer Polarity Assignment Utilizing Useful Clock Skews for Power Noise Reduction

Statistical Analysis of BTI in the Presence of Processinduced Voltage and Temperature Variations

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Skew Management of NBTI Impacted Gated Clock Trees

Integrated Circuits & Systems

A Temperature-aware Synthesis Approach for Simultaneous Delay and Leakage Optimization

DKDT: A Performance Aware Dual Dielectric Assignment for Tunneling Current Reduction

Dynamic Power Estimation for Deep Submicron Circuits with Process Variation

Early-stage Power Grid Analysis for Uncertain Working Modes

EEC 118 Lecture #5: CMOS Inverter AC Characteristics. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

A Robustness Optimization of SRAM Dynamic Stability by Sensitivity-based Reachability Analysis

Timing Analysis in Presence of Supply Voltage and Temperature Variations

Minimizing Clock Latency Range in Robust Clock Tree Synthesis

Luis Manuel Santana Gallego 31 Investigation and simulation of the clock skew in modern integrated circuits

Power Dissipation. Where Does Power Go in CMOS?

Variation-aware Clock Network Design Methodology for Ultra-Low Voltage (ULV) Circuits

Timing Analysis with Clock Skew

Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems

Efficient Circuit Analysis under Multiple Input Switching (MIS) Anupama R. Subramaniam

On Potential Design Impacts of Electromigration Awareness

VLSI GATE LEVEL DESIGN UNIT - III P.VIDYA SAGAR ( ASSOCIATE PROFESSOR) Department of Electronics and Communication Engineering, VBIT

A Mathematical Solution to. by Utilizing Soft Edge Flip Flops

A Clustering Technique for Fast Electrothermal Analysis of On-Chip Power Distribution Networks

Where Does Power Go in CMOS?

DesignConEast 2005 Track 4: Power and Packaging (4-WA1)

Name: Answers. Mean: 38, Standard Deviation: 15. ESE370 Fall 2012

Lecture 14: Circuit Families

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Lecture 2: CMOS technology. Energy-aware computing

A Geometric Programming-based Worst-Case Gate Sizing Method Incorporating Spatial Correlation

An Algorithm for Optimal Decoupling Capacitor Sizing and Placement for Standard Cell Layouts Λ

Efficient Optimization of In-Package Decoupling Capacitors for I/O Power Integrity

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

EE 560 CHIP INPUT AND OUTPUT (I/0) CIRCUITS. Kenneth R. Laker, University of Pennsylvania

Midterm. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Lecture Outline. Pass Transistor Logic. Restore Output.

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Session 8C-5: Inductive Issues in Power Grids and Packages. Controlling Inductive Cross-talk and Power in Off-chip Buses using CODECs

Lecture 4: DC & Transient Response

PARADE: PARAmetric Delay Evaluation Under Process Variation *

Lecture 5: DC & Transient Response

CROSSTALK NOISE ANALYSIS FOR NANO-METER VLSI CIRCUITS

Efficient Decoupling Capacitance Budgeting Considering Operation and Process Variations

LECTURE 380 TWO-STAGE OPEN-LOOP COMPARATORS - II (READING: AH ) Trip Point of an Inverter

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Short Papers. Efficient In-Package Decoupling Capacitor Optimization for I/O Power Integrity

Problem Formulation for Arch Sim and EM Model

Lecture 16: Circuit Pitfalls

CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators

The Linear-Feedback Shift Register

MOSFET and CMOS Gate. Copy Right by Wentai Liu

Variation-Resistant Dynamic Power Optimization for VLSI Circuits

Latch/Register Scheduling For Process Variation

Dynamic operation 20

Lecture 6: DC & Transient Response

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Review: 1st Order RC Delay Models. Review: Two-Input NOR Gate (NOR2)

EE5780 Advanced VLSI CAD

Digital Integrated Circuits A Design Perspective

Transistor Sizing for Radiation Hardening

Switching circuits: basics and switching speed

Next, we check the race condition to see if the circuit will work properly. Note that the minimum logic delay is a single sum.

What Does VLV Testing Detect?

Lecture 21: Packaging, Power, & Clock

Signal integrity in deep-sub-micron integrated circuits

ESE535: Electronic Design Automation. Delay PDFs? (2a) Today. Central Problem. Oxide Thickness. Line Edge Roughness

Capturing Post-Silicon Variations using a Representative Critical Path

9/18/2008 GMU, ECE 680 Physical VLSI Design

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Preamplifier in 0.5µm CMOS

VLSI Design and Simulation

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Utilizing Redundancy for Timing Critical Interconnect

Efficient Checking of Power Delivery Integrity for Power Gating

COMBINATIONAL LOGIC. Combinational Logic

CD4070BM CD4070BC Quad 2-Input EXCLUSIVE-OR Gate CD4077BM CD4077BC Quad 2-Input EXCLUSIVE-NOR Gate

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

EEC 118 Lecture #16: Manufacturability. Rajeevan Amirtharajah University of California, Davis

TAU 2014 Contest Pessimism Removal of Timing Analysis v1.6 December 11 th,

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines. " Where transmission lines arise? " Lossless Transmission Line.

Toward More Accurate Scaling Estimates of CMOS Circuits from 180 nm to 22 nm

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Lecture 12 CMOS Delay & Transient Response

EECS 151/251A Homework 5

Features Y Wide supply voltage range 3 0V to 15V. Y High noise immunity 0 45 VDD (typ ) Y Low power TTL fan out of 2 driving 74L

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

On Critical Path Selection Based Upon Statistical Timing Models -- Theory and Practice

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

On Co-Optimization Of Constrained Satisfiability Problems For Hardware Software Applications

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 7, JULY

University of Toronto. Final Exam

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Topic 4. The CMOS Inverter

ECE315 / ECE515 Lecture 11 Date:

Transcription:

Timing-Aware Decoupling Capacitance Allocation in Power Distribution Networks Sanjay Pant, David Blaauw Electrical Engineering and Computer Science University of Michigan 1/22

Power supply integrity issues Functional failure Performance failure Motivation Ldi/dt drop becoming significant Large amounts of extrinsic decap added to suppress Ldi/dt Explicitly added decap is not free Decap oxide leakage increasing with each technology generation Decap leakage may limit the amount of extrinsic decap Proposed work Decap added optimally to improve circuit performance Utilizes the timing slack available in the circuit Non-critical gates can tolerate relatively larger supply drop [Apache]-ITRS 2004 2/22

Outline Traditional Methods and Prior Work Proposed Approach Experimental Results Conclusion 3/22

Prior Works On Decap Allocation Allocate decap with objective of minimizing drop at all nodes Decap sizes w j are the opt. variables V n (t) Vdd g n = violation at node n 90%Vdd g n Noise Metric = Σ g n nodes ts n te n t Minimize Subject to Noise Metric constraints on decap sizes Adjoint sensitivity method for sensitivity of noise metric to decap sizes Sapatnekar [ISPD-02], Roy [DAC-00], Li [DAC-05] 4/22

Prior approaches Proposed Approach Constrain voltage drop at all nodes regardless of connected gates being critical May not be optimal for maximum performance Observation Gates which are not timing critical can afford relatively larger voltage drop Lesser decap area and leakage for same performance if circuit has timing slack Path with timing slack Larger drop Proposed approach Allocate decaps in order to minimize circuit delay Not focused on minimizing the drop at all the power grid nodes Utilizes timing slacks for driving the decap allocation optimization problem 5/22

Outline Traditional Methods and Prior Work Proposed Approach Primal Problem Lagrangian Relaxation and Gradient Computation Path-based greedy algorithm Experimental Results Conclusion 6/22

Problem Definition POWER GRID VDD V 1 (t) V 2 (t) VDD 1 2 3 4 V 3 (t) V 4 (t) Decap Candidates C i, decap sizes for minimum circuit delay =? ΔV 1 ΔV 2 V n (t) 1 2 ΔV 3 ΔV 4 3 4 Vdd ΔV n τ n Gate n assumed to be operating at supply voltage (Vdd ΔV n ) Conservative analysis which assumes all gates switching with local worst drops 7/22

Gate Delay Model Characterize delay of gate, i from its input j as a linear function of Local supplies, Vdd i and Vss i (reduction in drive strength) Input driver s supplies, Vdd j and Vss j (input signal swing) D ji = D ji0 +k ji Vdd i + l ji Vss i + m ji Vdd j + n ji Vss j tr ji = tr ji0 +p ji Vdd i + q ji Vss i + r ji Vdd j + s ji Vss j Library re-characterization A load-slope based 7x7 table containing D ji0, k, l, m, n Delay measured w.r.t. 50%Vdd nominal point 8/22

Primal Problem (PP) Cell Decap w j src 5 6 7 8 in1 in2 in3 in4 V dd3 3 V ss3 V dd4 4 V ss4 a 3 a 4 V dd1 1 V ss1 V dd2 2 V ss2 a 1 out1 sink 0 a 2 out2 Minimize Σ C i Subject to (1) a j T 0 j = input(0) (2) a j + D ji a i j = input(i), gates i (3) D ji = D ji0 +k ji Vdd i + l ji Vss i + m ji Vdd j + n ji Vss j i = input(j), gates j (4) 0 C i w max, i = 1..N decap (5) Voltage Supplies a fn of decap sizes Gx ( t) + Cx& ( t) = u( t) 9/22

Outline Traditional Methods and Prior Work Proposed Approach Primal Problem Lagrangian Relaxation and Gradient Computation Path-based greedy algorithm Experimental Results Conclusion 10/22

Lagrangian Relaxation Problem (LRP) λ 53 λ 31 5 3 a 3 1 a 1 λ 10 src 6 7 8 λ 63 λ 74 4 a 4 λ 32 2 a 2 λ 20 sink 0 λ 84 λ 42 Minimize Σ C i + Σ λ j0 (a j T 0 )+ ΣΣλ ji (a j + D ji -a i ) Subject to (1) D ji = D ji0 +k ji Vdd i + l ji Vss i + m ji Vdd j + n ji Vss j i = input(j), gates j (2) 0 C i w max, i = 1..N decap (3) Voltage Supplies a fn of decap sizes (4) λ 0 Gx ( t) + Cx& ( t) = u( t) λ ji denotes the criticality of gate i from its input j 11/22

Kuhn Tucker Conditions If λ is optimal, sensitivity of objective fn wrt. arrival times = 0 obj ai = 0 k=output(i) gates i Σ λ ik = Σ λ ji gates i λ 53 j=input(i) λ 31 5 3 a 3 1 a 1 src 6 7 8 λ 63 4 a 4 λ 32 2 a 2 sink 0 Using KT conditions, obj. becomes independent of a i for given set of λ Minimize Σ C i + Σ Σλ ji D ji - Σ λ j0 T 0 12/22

Solving the Lagrangian Relaxation Problem Using the delay model expression, objective function becomes a linear function of supply voltages Minimize Σ C i + Σ Σλ ji D ji - Σ λ j0 T 0 D ji = D ji0 +k ji Vdd i + l ji Vss i + m ji Vdd j + n ji Vss j Minimize Σ C i + Σ α i ΔVdd i + Σ β i ΔVss i - Σ λ j0 T 0 Subject to (1) 0 C i C max, i = 1..N decap (2) Voltage Supplies a fn of decap sizes Where, α i = Σλ ji. k ji + Σλ ik.m ik and β i = Σλ ji.l ji + Σλ ik.n ik j=input(i) k=output(i) j=input(i) k=output(i) Needed: Gradients of the objective function wrt decap sizes ( C i + Σ α i ΔVdd i + Σ β i ΔVss i ) C 13/22

Gradient Computation Direct method: Single variable parameter, many measurement variables Total # of simulations = # of nodes in the grid Adjoint method: Many variable parameters, single measurement objective Total # of simulations = # of gates in the circuit Proposed Approach: Many variable parameters, many measurement variables Modified adjoint sensitivity method Measure derivative of voltage in the original circuit at all decap locations Simulate adjoint circuit with multiple current excitations simultaneously applied Z C T = ψ ( T t) v& 0 C Sensitivity of objective function obtained in only one simulation of adjoint grid C ( t) dt Conn, Visweswariah [SPECS, Jiffytune] 14/22

Gradient Computation Illustration ΔV 1 ΔV 2 τ 1 τ 2 ΔV 3 ΔV 4 τ τ 4 3 Original Grid Σ α i ΔV i C τ 1 α 1 τ 2 α 2 τ 3 α 3 α 4 τ 4 Z T Adjoint Grid = ψ C ( T t) v& C 0 C ( t) dt 15/22

Overall Global Optimization Flow Set initial set of λs and decap sizes Original grid simulation Formulate Lagrangian subproblem λ ji k+1 = λ ji k+1 - ρ k s i k Update set of λs Adjoint grid simulation Gradient computation using convolution Solve Lagrangian subproblem Compute slacks AT and RAT based on voltage drops Convergence? Original grid simulation Decap Sizes 16/22

Path Based Heuristic Start with min. decap sizes Original grid simulation Find critical path from STA Adjoint grid simulation Gradient Computation Greedily allocate ΔC Decap budget met yes Decap sizes no 17/22

Outline Traditional Methods and Prior Work Proposed Approach Primal Problem Lagrangian Relaxation and Gradient Computation Path-based greedy algorithm Experimental Results Conclusion 18/22

Experimental Setup Current profiles - A triangular waveform applied at all the gates Gates placed through APR LANCELOT used for non-linear optimization C++ MNA solver used for original and adjoint grid simulation ISCAS85 benchmarks synthesized in 0.13μ library Power Grid description Name # Layers Die area # nodes # elements #C4s Grid1 4 700μx700μ 10,804 17,468 12Vdd, 12Vss Grid2 4 1.2mmx1.2mm 17,530 29,746 28Vdd, 28Vss 19/22

Experimental Results Iso-decap comparison with uniform decap distribution Grid ckt # gates # decaps decap budget Nom. Circuit delay % delay redn. runtimes uniform Global opt. Greedy opt. Global opt. Greedy opt. Global opt. Greedy opt. Grid1 c432 212 476 2.38nF 1.498ns 1.798ns 1.621ns 1.640ns 9.84% 8.79% 11m15s 1m15s Grid1 c499 553 595 2.98nF 1.233ns 1.480ns 1.308ns 1.394ns 11.62% 5.81% 9m41s 1m57s Grid1 c1355 654 793 3.97nF 1.839ns 2.207ns 1.878ns 1.913ns 14.90% 13.32% 11m43s 2m58s Grid1 c1908 543 579 2.89nF 2.088ns 2.506ns 2.251ns 2.256ns 10.17% 9.98% 20m24s 25.83s Grid1 c2670 1043 1190 3.57nF 1.622ns 1.946ns 1.754ns 1.764ns 9.86% 9.35% 52m33s 8m41s Grid2 c3540 1492 1559 7.79nF 2.301ns 2.761ns 2.498ns 2.564ns 9.52% 7.31% 109m59s 23m49s Grid2 c5315 2002 2217 6.65nF 2.080ns 2.769ns 2.409ns 2.416ns 13.00% 12.75% 221m24s 61m18s Grid2 c6288 Grid2 c7552 3595 2360 3712 2571 8.15nF 7.18nF 5.186ns 2.975ns 6.223ns 3.571ns - - 5.820ns 3.262ns - - 6.48% 8.65% >4hrs >4hrs 188m36 s 63m03s Avg Delay Reduction: 10.11% 20/22

Experimental Results Iso-delay comparison with uniform decap distribution Grid Ckt Nom. Delay Delay constraint Decap Allocated Uniform Optim. % decap redn. Grid1 c432 1.498ns 1.640ns 3.55nF 2.38nF 32.98% Grid1 c499 1.233ns 1.394ns 3.49nF 2.98nF 17.60% Grid1 c1355 1.839ns 1.913ns 6.65nF 3.97nF 40.33% Grid1 c1908 2.088ns 2.256ns 6.15nF 2.89nF 52.92% Grid2 c2670 1.622ns 1.764ns 6.96nF 3.57nF 95.80% Grid2 c3540 2.301ns 2.564ns 10.04nF 7.80nF 22.37% Circuit Delay (ns) 2.1 2.0 1.9 1.8 1.7 1.6 1.5 Grid2 c5315 2.080ns 2.416ns 12.20nF 6.65nF 45.56% Grid2 c6288 5.186ns 5.820ns 9.74nF 8.15nF 16.31% 1.0 1.5 2.0 2.5 3.0 3.5 4.0 Total Decap (nf) Grid2 c7552 2.978ns 3.262ns 13.26nF 7.18nF 45.85% Avg Decap Reduction: 35.51% 21/22

Conclusions and Future Work Proposed a method for timing aware decap allocation Efficient sensitivity computation of circuit delay to decap sizes Utilizes timing slacks available in the circuit Iso-decap comparison with uniform decap distribution demonstrates 10% improvement in circuit timing Iso-delay comparison with uniform decap distribution demonstrates 35% reduction in total decap Future Work Validation on larger circuits Explore convergence and better optimization algorithms such as IPOPT Exploit grid-locality for reducing run-time of grid simulations 22/22