Fast Buffer Insertion Considering Process Variation

Similar documents
Making Fast Buffer Insertion Even Faster via Approximation Techniques

Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources

Buffer Insertion Considering Process Variation

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Timing-Aware Decoupling Capacitance Allocation in Power Distribution Networks

Simultaneous Buffer Insertion and Wire Sizing Considering Systematic CMP Variation and Random Leff Variation

DESIGN uncertainty in nanometer technology nodes

Utilizing Redundancy for Timing Critical Interconnect

Buffered Clock Tree Sizing for Skew Minimization under Power and Thermal Budgets

Accurate Estimation of Global Buffer Delay within a Floorplan

TAU 2014 Contest Pessimism Removal of Timing Analysis v1.6 December 11 th,

An Optimal Algorithm of Adjustable Delay Buffer Insertion for Solving Clock Skew Variation Problem

Statistical Timing Analysis with Path Reconvergence and Spatial Correlations

Luis Manuel Santana Gallego 71 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model 1

Parameterized Timing Analysis with General Delay Models and Arbitrary Variation Sources

Problems in VLSI design

Advanced Computational Methods for VLSI Systems. Lecture 5 Statistical VLSI modeling and analysis. Zhuo Feng

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

Variability Aware Statistical Timing Modelling Using SPICE Simulations

Buffer Insertion for Noise and Delay Optimization

TAU 2015 Contest Incremental Timing Analysis and Incremental Common Path Pessimism Removal (CPPR) Contest Education. v1.9 January 19 th, 2015

Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model

Minimizing Clock Latency Range in Robust Clock Tree Synthesis

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 7, JULY

Accounting for Non-linear Dependence Using Function Driven Component Analysis

DesignConEast 2005 Track 4: Power and Packaging (4-WA1)

Pre-Layout Estimation of Individual Wire Lengths

A Geometric Programming-based Worst-Case Gate Sizing Method Incorporating Spatial Correlation

Variation-Resistant Dynamic Power Optimization for VLSI Circuits

Representative Path Selection for Post-Silicon Timing Prediction Under Variability

Capturing Post-Silicon Variations using a Representative Critical Path

Dynamic operation 20

PARADE: PARAmetric Delay Evaluation Under Process Variation *

Gate Delay Calculation Considering the Crosstalk Capacitances. Soroush Abbaspour and Massoud Pedram. University of Southern California Los Angeles CA

Steiner Trees in Chip Design. Jens Vygen. Hangzhou, March 2009

Pre and post-silicon techniques to deal with large-scale process variations

A Framework for Statistical Timing Analysis using Non-Linear Delay and Slew Models

Probabilistic Dual-Vth Leakage Optimization Under Variability

A Temperature-aware Synthesis Approach for Simultaneous Delay and Leakage Optimization

On Optimal Physical Synthesis of Sleep Transistors

ESE535: Electronic Design Automation. Delay PDFs? (2a) Today. Central Problem. Oxide Thickness. Line Edge Roughness

Hurwitz Stable Reduced Order Modeling and D2M Delay Calculation Metric Motivation Review: Moments. Content. Moments: Review. Motivation.

SINCE the early 1990s, static-timing analysis (STA) has

PARADE: PARAmetric Delay Evaluation Under Process Variation * (Revised Version)

Variation-aware Clock Network Design Methodology for Ultra-Low Voltage (ULV) Circuits

Statistical Timing Analysis with AMECT: Asymptotic MAX/MIN Approximation and Extended Canonical Timing Model

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

Luis Manuel Santana Gallego 31 Investigation and simulation of the clock skew in modern integrated circuits

Longest Path Selection for Delay Test under Process Variation

AFrameworkforScalablePost-SiliconStatistical Delay Prediction under Spatial Variations

Simultaneous Buffer and Wire Sizing for Performance and Power Optimization

Statistical Analysis of Random Telegraph Noise in Digital Circuits

Design of Integrated-Circuit Interconnects with Accurate Modeling of Chemical-Mechanical Planarization

Synthesizing a Representative Critical Path for Post-Silicon Delay Prediction

Chapter 2 Process Variability. Overview. 2.1 Sources and Types of Variations

Incremental Latin Hypercube Sampling

ULTRALOW VOLTAGE (ULV) circuits, where the supply

On Critical Path Selection Based Upon Statistical Timing Models -- Theory and Practice

Dept. Information Systems Engineering, Osaka Univ., Japan

Power-Driven Global Routing for Multi-Supply Voltage Domains

Dynamic Power Estimation for Deep Submicron Circuits with Process Variation

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Robust Extraction of Spatial Correlation

Static Timing Analysis Considering Power Supply Variations

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

Statistical Performance Analysis and Optimization of Digital Circuits

Skew Management of NBTI Impacted Gated Clock Trees

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

Stochastic Capacitance Extraction Considering Process Variation with Spatial Correlation

Statistical Analysis of BTI in the Presence of Processinduced Voltage and Temperature Variations

An Automated Approach for Evaluating Spatial Correlation in Mixed Signal Designs Using Synopsys HSpice

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Performance-Impact Limited Area Fill Synthesis

CROSSTALK NOISE ANALYSIS FOR NANO-METER VLSI CIRCUITS

Max Operation in Statistical Static Timing Analysis on the Non-~Gaussian Variation Sources for VLSI Circuits

EE5780 Advanced VLSI CAD

Name: Answers. Mean: 38, Standard Deviation: 15. ESE370 Fall 2012

Implementation of Clock Network Based on Clock Mesh

DC and Transient. Courtesy of Dr. Daehyun Dr. Dr. Shmuel and Dr.

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

EEC 118 Lecture #16: Manufacturability. Rajeevan Amirtharajah University of California, Davis

POST-SILICON TIMING DIAGNOSIS UNDER PROCESS VARIATIONS

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

The Wire. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. July 30, 2002

A Scalable Statistical Static Timing Analyzer Incorporating Correlated Non-Gaussian and Gaussian Parameter Variations

Simple and Accurate Models for Capacitance Increment due to Metal Fill Insertion

Interconnect s Role in Deep Submicron. Second class to first class

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Modelling and Compensating for Clock Skew Variability in FPGAs

Efficient Monte Carlo based Incremental Statistical Timing Analysis

ECE260B CSE241A Winter Statistical Timing Analysis and SPICE Simulation

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

Technology Mapping for Reliability Enhancement in Logic Synthesis

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

Skew Management of NBTI Impacted Gated Clock Trees

Delay and Power Estimation

A Hierarchical Analysis Methodology for Chip-Level Power Delivery with Realizable Model Reduction

Dynamic Adaptation for Resilient Integrated Circuits and Systems

Transcription:

Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel, Mindspeed

Agenda Introduction and motivation Modeling Problem formulation Detailed algorithms with complexity analysis Experimental results Conclusion 2

Buffer Insertion Flashback Buffer insertion and sizing is a commonly used technique for highperformance chip designs to minimize delay Classic results on buffer insertion Two-pin nets: closed form for optimal solution [Bakoglu 90] Multi-pin nets: dynamic-programming based algorithm to find the optimal solution [Van Ginneken 90] Extensions Multiple buffer libraries considering power minimization [Lillis 96] Wire segmentation [Alpert DAC97] Simultaneous buffer insertion and wire sizing [Chu, ISPD97] Simultaneous tree construction and buffer insertion [Okamoto DAC96] Simultaneous dual Vdd assignment and buffered tree construction [Tam DAC05].. 3

Design Optimization in Nanometer Manufacturing Probabilistic design approaches showed great promise to achieve better design quality Compared to deterministic approaches, statistical circuit tuning achieved 20% area reduction [Choi DAC04] 17% power reduction [Mani DAC05] Buffer insertion considering process variation is also gaining attention recently Limited consideration of process variation Wire-length variation [Khandelwal ICCAD03] Independency assumption on process variation Ignores global and spatial correlation [Xiong DATE05], [He ISPD05] High complexity Our major contributions: theoretical foundations Numerical integration to obtain JPDF [Xiong DATE05] that lifts these Applicable to only special routing structures limitations Two-pin nets only [Deng ICCAD05] 4

Agenda Introduction and motivation Modeling Problem formulation Detailed algorithms with complexity analysis Experimental results Conclusion 5

Modeling Linear delay model for buffer Input capacitance (Cb), output resistance (Rb), and intrinsic delay (Tb) π-model for interconnect Wire capacitance (Cw) and wire resistance (Rw) 6 How to model these quantities with correlated process variation?

7 First-order Canonical Form for Variation Modeling A = a + a X + a X + L+ a X + 0 Mean value E(A) = a 0 1 Random variables X 1, X 2,, X n model Die-to-die global variation: instances are affected in the same way Within-die spatial correlation: instances physically nearby are more likely to be similar [Agarwal ASPDAC03, Chang ICCAD03, Khandelwal DAC05] Random variable X Ra model Independent variation: instances next to each other are different All X i follow independent normal distributions Well accepted practice in SSTA [Chang ICCAD03, Visweswariah DAC04] In vector form, write device and interconnect with process variation Device: T b = T b0 + γ bt X, C b = C b0 + η bt X, R b = R b0 + ζ bt X Interconnect: C w = C w0 + η wt X, R w = R w0 + ζ wt X 1 2 2 n n a R X Ra

Buffer Insertion Considering Process Variation Given: a routing tree with required arrival time (RAT) and loading capacitance specified at sinks, and N possible buffer locations Considering: both FEOL device and BEOL interconnect process variations Find: locations to insert buffers So that: the timing slack at the root is maximized Timing slack: min i (RAT i delay i ) s 2 sinks s 0 s 1 s 3 root s 4 possible buffer locations 8

Agenda Introduction and motivation Modeling Problem formulation Detailed algorithms Key operations for buffering solutions Transitive-closure pruning rule Complexity analysis Experimental results Conclusion 9

Key Operations in Van Ginneken Algorithm Associate each node with two metrics (C t, T t ) Downstream loading capacitance (C t ) and RAT (T t ) DP-based alg propagates potential solutions bottom-up [Van Ginneken, 90] Add a wire Add a buffer C = C + C t n w 1 T = T R L R C 2 t n w n w w C t = C b T = T T R L t n b b n C t, T t C w, R w C n, T n C t, T t C n, T n Merge two solutions C = C + C t n m T = min( T, T ) t n m C n, T n C t, T t C m, T m 10 How to define these operations in statistical sense?

Atomic Operations 11 Keep all quantities in canonical form after operations Maintain correlation w.r.t. sources of variation Updated solutions can still be handled by the same set of operations Add a wire C = C + C t n w T = T R L 1 R 2 C Add a buffer Ct = Cb T = T T R L t n w n w w t n b b n Merge two solutions C t = C n + C m T = min( T, T ) t n m Addition/subtraction of two canonical forms is another canonical form T T A + B= a + α X + b + β X Multiplications Minimum ( ) ( ) 0 0 T = ( a + b ) + ( α + β) X 0 0 No longer a canonical form

Approximate Multiplication as Canonical Form Multiplication of two canonical forms results in a quadratic term Matrix Γ = αβ T C = A B= ( a + α X)( b + β X) T T 0 0 T T T T 0 0 ( 0α 0β ) αβ T T 0 0 γ = ab + b + a X+ X X = ab + X+ X ΓX Approximate it as a canonical form by matching the mean and variance with that of the exact solution EC ( ) EC ( ) γ γ 2 2 ' T T C = E( C) + γ X = c T 0 + η X E(C) is the mean value (first moment) of C E(C 2 ) is the second moment of C E(C 2 )-E(C) 2 is the variance C is a new canonical form with the same mean and variance as C 12

Closed Form for Moment Computation 13 1 st Moment 2 nd Moment T T E( C) = c+ γ EX ( ) + EX ( ΓX) 0 E( C ) = E( c + 2cγ X+ 2X Γ Xγ X+ 2 c X Γ X+ X γγ X+ ( X ΓX) ) 2 2 T T T T T T T 2 0 0 0 2 T T T T T T 2 0 0γ γ 0 γγ = c+ 2 c EX ( ) + 2 EX ( Γ X X) + EX ( (2 cγ+ ) X) + E(( XΓX) ) Theorem: If X is an independent multivariate normal distribution ~N(0,I), then for any vector γ and matrix Γ T EX ( Γ X) = tr( Γ) 2 2 ' T T EX Γ X EC γ ( X) ) = 0 EC ( ) T T A B C = E( C) + γ X = c0 + η X T T 2 γ γ 2 2 E(( X Γ X) ) = 2 tr( Γ ) + tr( Γ) Trace of a matrix (tr) equals to the sum of all diagonal elements In general, tr(γ) and tr(γ 2 ) are expensive, but if Γ=αβ T +εη T (a row rank matrix), we can show tr 2 2 2 ( Γ ) = β T α + ηε T, tr( Γ ) = ( β T α) + ( ηε T ) + 2( β T α)( ηε T )

Approximate Minimum as Canonical Form Minimum of two canonical forms is also not a canonical form Approximate it as a canonical from by matching the exact mean and variance Tightness probability of A: Φ is the CDF of a standard normal distribution θ is given by T T min( A, B) = c + ( T β + T α ) X + c X T P A B Exact mean and variance can be computed in closed form [Clack 65] Well known for statistical timing analysis 0 A 2 2 θ = σ A + σ B 2cov( A, B) A B R R 0 0 = ( > ) =Φ Design for mean value design for nominal value because of mean shift b0 a0 E(min( A, B)) = TAa0 + TBb0 θφ min( a0, b0) θ Design for mean value Design for nominal value 14 a b θ

Agenda Introduction and motivation Modeling Problem formulation Detailed algorithms Key operations for buffering solutions Transitive-closure pruning rule Complexity analysis Experimental results Conclusion 15

Deterministic Pruning Rule If T 1 >T 2 and C 1 < C 2 (C 1, T 1 ) dominates (C 2, T 2 ) Dominated solution (C 2, T 2 ) is redundant RAT Redundant solutions Deterministic pruning has linear time complexity because of the following two desired properties Ordering property Can we achieve the same time complexity Either A>B or A<B holds for statistical pruning? Transitive ordering (transitive-closure) property A>B, B>C A>C Make it possible to sort solutions in order Load Assume sorted by load linear time to prune redundant solutions 16

Statistical Pruning Rule (C 1, T 1 ) dominates (C 2, T 2 ) P(C 1 < C 2 ) 0.5 and P(T 1 > T 2 ) 0.5 Properties of this statistical pruning rule Ordering property Given: T 1 and T 2 as two dependent random variables Then: either P(T 1 >T 2 ) 0.5 or P(T 1 <T 2 ) 0.5 holds Transitive-closure ordering property Given T 1, T 2, and T 3 as three dependent random variables with a joint normal distribution, If: P(T 1 >T 2 ) 0.5, P(T 2 >T 3 ) 0.5 Then: P(T 1 >T 3 ) 0.5 Transitive-closure property can be extended to the more general case > P(T 1 >T 2 ) p, P(T 2 >T 3 ) p P(T 1 >T 3 ) p for any p 2 [0.5, 1] Statistical pruning has the same linear time complexity as deterministic pruning 17

Deterministic vs Statistical Buffering 18 For solution (C n, T n ) in node t Z 1 = ADD-WIRE(C n, T n ); Z 2 = ADD-BUFFER(Z 1 ); For solution (C m, T m ) from subtree m For solution (C n, T n ) from subtree n (C t, T t )=MERGE((C m, T m ),(C n, T n )); Z = PRUNE(Z); All quantities are deterministic values For solution (C n, T n ) in node t Z 1 = STAT-ADD-WIRE(C n, T n ); Z 2 = STAT-ADD-BUFFER(Z 1 ); Same O(N 2 ) complexity as the classic deterministic buffering algorithm Deterministic merge and pruning operations can be combined into one linear time operation New complexity result: O(N*log 2 (N)) [Wei, DAC 03] For solution (C m, T m ) from subtree m For solution (C n, T n ) from subtree n Statistic merge and pruning can not be combined Statistic version s complexity is higher (C t, T t )=STAT-MERGE((C m, T m ),(C n, T n )); Z = STAT-PRUNE(Z); All quantities are canonical forms

When Merge and Prune can be Combined? Made possible via merge-sort like operation in deterministic case RAT 1 5 7 RAT 5 7 RAT 1 3 5 2 3 8 Load 1 2 3 Load 1 3 6 Load 2 4 6 9 11 19 Because of the following property: Min(A 1,B 1 ) Min(A 2,B 1 ) if A 1 A 2 Min(A,B) Min(A+δA,B) Min(A,B) is a nondecreasing function of inputs In statistic case, such a property does not hold (even for mean) E(min( A, B)) E(min( A, B)) E(min( A, B)) > 0 < 0 > 0 EA ( ) σ ρ( AB, ) A

Agenda Introduction and motivation Modeling Problem formulation Detailed algorithms Key operations for buffering solutions Transitive-closure pruning rule Complexity analysis Experimental results Conclusion 20

Experimental Setting Variation setting Global, spatial, and independent variations all to be 5% of the nominal value Spatial variation used a grid model similar to [Chang, ICCAD03] Grid size 500um Correlation distance about 2mm (beyond that, no spatial correlation) Benchmarks Two sets of benchmarks from public domain [Shi, DAC03] Deterministic design for worst case (WORST) All parameters projected to its respective 3-sigma values 21

Runtime Comparison Compared with T2P proposed in [Xiong DATE05] Only known work that considered both device and interconnect variations JPDF computed via expensive numerical integration No global and spatial correlation considered Heuristic pruning rules (T2P) Re-implement T2P under the same first-order variation model, but still use its heuristic pruning rule Bench Sink Buf Loc WORST T2P This work p1 269 537 0 25.4 1.0 (25.4x) p2 603 1205 0.01-4.3 r1 267 533 0-3.6 r2 598 1195 0-15.0 r3 862 1723 0.02-27.5 r4 1903 3805 0.04-88.9 r5 3101 6201 0.08-195.8 22

Monte Carlo Simulation Results For a given a buffered routing tree with 10K MC runs, delay PDF at the root PDF from Monte Carlo roughly follows a normal distribution Our approximation technique captures the PDF well 0.06 0.05 Monte Carlo Norm Approx. 0.07 0.06 This work WORST 0.04 0.03 0.02 0.05 0.04 0.03 0.02 Yield Loss 0.01 0.01 0 1910 1920 1930 1940 1950 1960 1970 0 1910 1920 1930 1940 1950 1960 1970 1980 1990 Figure-of-merits: 3-sigma delay vs yield loss 3-sigma delay for red PDF 23

Timing Optimization Comparison based on MC WORST This Work Buffer Mean 3-sigma Delay Yield Loss Buffer mean 3-sigma Delay Yield p1 58 2374 2403 0% 60 (3.3%) 2375 2403 100% p2 149 3161 3203 0% 156 (4.5%) 3161 3204 100% r1 59 772 790 0% 65 (9.2%) 771 790 100% r2 112 1109 (1.7%) 1128 (1.5%) 35.3% 135 (17%) 1090 1111 100% r3 173 1127 (0.7%) 1147 (0.5%) 1.6% 188 (8%) 1119 1142 100% r4 320 1700 (1.5%) 1723 (1.4%) 54.9% 374 (14.4%) 1674 1699 100% r5 544 1958 (1%) 1986 (1.0%) 17.9% 608 (10.5%) 1938 1966 100% Avg 0.7% 0.6% 15.7% 9.6% Buffer insertion considering process variation improves timing yield by 15% on average More effective for large benchmarks Relative mean (or 3-sigma) delay improvement is small large mean values More buffers are inserted in order to achieve this gain 24

Conclusion and Future Work 25 Developed a novel algorithm for buffer insertion considering process variation Two major theoretical contributions An effective approximation technique to handle nonlinear multiplication operation, all through closed form computation A provably transitive-closure pruning rule Maybe useful for other applications Timing optimization shown that considering process variation can improve timing yield by more than 15% Future work Theoretically examine the impact of process variation on buffering Apply the theories in this work to other design applications

Questions? 26