Buffer Insertion for Noise and Delay Optimization

Similar documents
Making Fast Buffer Insertion Even Faster via Approximation Techniques

Accurate Estimation of Global Buffer Delay within a Floorplan

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Interconnect s Role in Deep Submicron. Second class to first class

Closed Form Expressions for Delay to Ramp Inputs for On-Chip VLSI RC Interconnect

Fast Buffer Insertion Considering Process Variation

PARADE: PARAmetric Delay Evaluation Under Process Variation *

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Efficient Crosstalk Estimation

Very Large Scale Integration (VLSI)

Area-Time Optimal Adder with Relative Placement Generator

CMPEN 411 VLSI Digital Circuits Spring 2012

Problems in VLSI design

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

Digital Integrated Circuits. The Wire * Fuyuzhuo. *Thanks for Dr.Guoyong.SHI for his slides contributed for the talk. Digital IC.

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines. " Where transmission lines arise? " Lossless Transmission Line.

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Simultaneous Buffer and Wire Sizing for Performance and Power Optimization

Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model

PARADE: PARAmetric Delay Evaluation Under Process Variation * (Revised Version)

Buffer Delay Change in the Presence of Power and Ground Noise

Interconnect (2) Buffering Techniques.Transmission Lines. Lecture Fall 2003

An Optimal Algorithm of Adjustable Delay Buffer Insertion for Solving Clock Skew Variation Problem

A Sequential Quadratic Programming Approach to Concurrent Gate and Wire Sizing *

Signal integrity in deep-sub-micron integrated circuits

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

EE5780 Advanced VLSI CAD

Analytical Delay Models for VLSI Interconnects Under Ramp Input

Skew Management of NBTI Impacted Gated Clock Trees

Lecture 5: DC & Transient Response

ECE 342 Electronic Circuits. Lecture 35 CMOS Delay Model

Digital Integrated Circuits (83-313) Lecture 5: Interconnect. Semester B, Lecturer: Adam Teman TAs: Itamar Levi, Robert Giterman 1

Implementation of Clock Network Based on Clock Mesh

Interconnects. Wire Resistance Wire Capacitance Wire RC Delay Crosstalk Wire Engineering Repeaters. ECE 261 James Morizio 1

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Accurate Estimating Simultaneous Switching Noises by Using Application Specific Device Modeling

9/18/2008 GMU, ECE 680 Physical VLSI Design

Integrated Circuits & Systems

The Wire. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. July 30, 2002

5.0 CMOS Inverter. W.Kucewicz VLSICirciuit Design 1

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

CPE/EE 427, CPE 527 VLSI Design I L18: Circuit Families. Outline

Lecture 9: Interconnect

Chapter 5 CMOS Logic Gate Design

Name: Answers. Mean: 38, Standard Deviation: 15. ESE370 Fall 2012

EE115C Digital Electronic Circuits Homework #5

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 07: Pass Transistor Logic

ECE 342 Solid State Devices & Circuits 4. CMOS

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

Lecture 4: DC & Transient Response

Analysis of Substrate Thermal Gradient Effects on Optimal Buffer Insertion

Properties of CMOS Gates Snapshot

Analytical Delay Models for VLSI Interconnects Under Ramp Input

ECE414/514 Electronics Packaging Spring 2012 Lecture 6 Electrical D: Transmission lines (Crosstalk) Lecture topics

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Homework #2 10/6/2016. C int = C g, where 1 t p = t p0 (1 + C ext / C g ) = t p0 (1 + f/ ) f = C ext /C g is the effective fanout

Lecture 6: DC & Transient Response

EE141. Administrative Stuff

Noise and Delay Uncertainty Studies for Coupled RC Interconnects

Chapter 2 Fault Modeling

HIGH-PERFORMANCE circuits consume a considerable

EE141-Spring 2008 Digital Integrated Circuits EE141. Announcements EECS141 EE141. Lecture 24: Wires

Static Timing Analysis Considering Power Supply Variations

Combinational Logic Design

TAU 2015 Contest Incremental Timing Analysis and Incremental Common Path Pessimism Removal (CPPR) Contest Education. v1.9 January 19 th, 2015

VLSI Design, Fall Logical Effort. Jacob Abraham

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

Lecture 5: DC & Transient Response

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

DC and Transient. Courtesy of Dr. Daehyun Dr. Dr. Shmuel and Dr.

Buffered Clock Tree Sizing for Skew Minimization under Power and Thermal Budgets

Topics to be Covered. capacitance inductance transmission lines

Lecture 8: Combinational Circuit Design

Steiner Trees in Chip Design. Jens Vygen. Hangzhou, March 2009

Interconnects. Introduction

Skew Management of NBTI Impacted Gated Clock Trees

Luis Manuel Santana Gallego 31 Investigation and simulation of the clock skew in modern integrated circuits

Digital Integrated Circuits A Design Perspective

1 Caveats of Parallel Algorithms

HOW TO CHOOSE & PLACE DECOUPLING CAPACITORS TO REDUCE THE COST OF THE ELECTRONIC PRODUCTS

Variation-aware Clock Network Design Methodology for Ultra-Low Voltage (ULV) Circuits

Digital Integrated Circuits A Design Perspective

Two-pole Analysis of Interconnection Trees *

Issues on Timing and Clocking

Name: Grade: Q1 Q2 Q3 Q4 Q5 Total. ESE370 Fall 2015

The Linear-Feedback Shift Register


and V DS V GS V T (the saturation region) I DS = k 2 (V GS V T )2 (1+ V DS )

Technology Mapping for Reliability Enhancement in Logic Synthesis

SUFFIX TREE. SYNONYMS Compact suffix trie

POWER-OPTIMAL SIMULTANEOUS BUFFER INSERTION/SIZING AND WIRE SIZING

Driver Waveform Computation for Timing Analysis with Multiple Voltage Threshold Driver Models

VLSI GATE LEVEL DESIGN UNIT - III P.VIDYA SAGAR ( ASSOCIATE PROFESSOR) Department of Electronics and Communication Engineering, VBIT

CMOS device technology has scaled rapidly for nearly. Modeling and Analysis of Nonuniform Substrate Temperature Effects on Global ULSI Interconnects

Logic Effort Revisited

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

Parallel VLSI CAD Algorithms. Lecture 1 Introduction Zhuo Feng

Announcements. EE141- Fall 2002 Lecture 25. Interconnect Effects I/O, Power Distribution

Transcription:

Buffer Insertion for Noise and Delay Optimization Charles J Alpert IBM Austin esearch Laoratory Austin, TX 78660 alpert@austinimcom Anirudh Devgan IBM Austin esearch Laoratory Austin, TX 78660 devgan@austinimcom Stephen T Quay IBM Microelectronics Division Austin, TX 78660 quayst@austinimcom Astract Buffer insertion has successfully een applied to reduce delay in gloal interconnect paths; however, existing techniques only optimize delay and timing slack With the increasing ratio of coupling to total capacitance and the use of aggressive dynamic logic circuit families, noise is ecoming a major design ottleneck We present comprehensive uffer insertion techniques for noise and delay optimization Our experiments on a microprocessor design show that our approach fixes all noise violations that were identified y a detailed, simulation-ased noise analysis tool Further, we show that the performance penalty induced y optimizing oth delay and noise as opposed to only delay is 2% 1 Introduction Performance optimization has always een a critical step in the design of integrated circuits Process technology scaling has made interconnect performance more minant than transistor and logic performance With the continued scaling of process technology, the interconnect resistance per unit length continues to increase, the capacitance per unit length remains roughly constant and logic delay continues to decrease These trends have caused interconnect delay to ecome more minant than logic delay Process technology options, such copper wires, can only provide temporary relief The trend of increasing interconnect minance is expected to continue Interconnect-driven timing optimization techniques, such as wire sizing, uffer insertion and gate sizing have gained widespread acceptance in deep sumicron design [7] In particular, uffer insertion techniques have een successful in reducing interconnect delay To the first order, interconnect delay is proportional to the square of the length of the wire Inserting uffers effectively divides the wire into smaller segments, which makes the interconnect delay almost linear in terms length (plus the uffer delays Additional advantages of uffer insertion will make this optimization even more pervasive as the ratio of device to interconnect delay continues to decrease Several works study delay-driven uffer insertion Closed formed solutions for 2-pin nets are proposed in [1] [4] [5] and [9] In [16], Van Ginneken develops a dynamic programming algorithm which finds the optimal uffer placement under the Elmore delay model [10] In [12], Lillis et al extends this algorithm to simultaneously perform wire sizing, while also minimizing the total numer of uffers Finally, Alpert and Devgan [1] propose a wire segmenting pre-processing algorithm to handle the one uffer per wire limitation of Van Ginneken s algorithm, which results in a smooth trade-off etween solution quality and run time Although timing optimization has always een critical in the design process, present day design techniques and process technologies are making noise analysis and avoidance as important The shrinking of the minimum distance etween adjacent wires has caused an increase in their coupled capacitance Furthermore, as the ratio of wire thickness to width continues to increase, so will the ratio of coupling to total capacitance Coupling capacitance can cause a switching net to induce noise onto a neighoring net, resulting in an incorrect functional response Further, the widespread use of dynamic logic circuits has made noise avoidance even more critical since these logic families are more susceptile to noise failure It is no longer sufficient or even acceptale to optimize only for delay Noise avoidance techniques must ecome an integral part of the performance optimization environment Buffer insertion provides a suitale platform for optimizing oth timing and noise (a ( Figure 1 Noise on a victim net (a without and ( with a uffer Figure 1(a shows the noise effect that an aggressor net (top can have on a victim net (ottom The coupling capacitance may cause an input signal on the aggressor net to induce a noise pulse on the victim net If the resulting noise is greater than the tolerale noise margin of the sink, then an electrical fault results Figure 1( shows how inserting a uffer can distriute the capacitive coupling etween the two newly created wires, resulting in smaller noise pulses on the input of the inserted uffer and on the sink If the amplitude of these noise pulses are less than he noise margins for the sink and the uffer, then the circuit will function correctly Noise analysis is typically performed through detailed circuit simulation or through reduced order interconnect analysis (eg, AWE[13] and ICE[15] Although the latter is more efficient, it is still too slow to e used within an optimization tool Instead, we apt the noise metric of [8] The rest of the paper is as follows Section 2 presents notation and definitions In Section 3, we derive a formula for the maximum wire length such that no noise violation is induced and also present two optimal algorithms for noise 1-58113-049-x-98/0006/$350 35 th Design Automation Conference Copyright 1998 ACM DAC98-06/98 San Francisco, CA USA

avoidance Section 4 presents a third algorithm for minimizing delay such that all noise constraints are satisfied Finally, Section 5 presents experimental results 2 Preliminaries A routing tree T = ( V, E contains a set of n 1 wires E and a set of n nodes V = {{ so} SI IN} where so is the unique source node, SI is the set of sink nodes, and IN is the set of internal nodes A wire ( e = ( u, E with length l e is an ordered pair of nodes in which the signal propagates from u to v Each node v SI IN has a unique parent wire ( uv, E The tree is assumed to e inary, ie, each node can have at most two children 1 Let the left and right children of v e denoted y Tleft( and Tright( respectively Assume that if v has only one child, then it is Tleft( The path from node u to v, denoted y path( u,, is the set of wires that connect u to v We are also given a uffer lirary B = { 1, 2,, m } A uffer insertion solution is a mapping M: IN B { } which either assigns a uffer or no uffer, denoted y, to each internal node of 2 T Let M = { v IN: M( B} denote the numer of internal nodes with inserted uffers Wires are segmented as in [1] to create as many internal nodes as necessary to form a reasonale set of uffer locations Assigning k uffers to T induces k + 1 nets, and hence k + 1 sutrees, each with no internally placed uffers For each v V, let T( = ({ v} SI T( IN T(, E T(, the sutree rooted at v, e the maximal sutree of T such that v is the source and T( contains no internal uffers Oserve that if v SI, T( contains only one node 21 Delay Optimization As in [1][12][16], we apt the Elmore delay model [10] for interconnect delays and a linear model for gate delays For each gate v, let C v denote the input capacitance, v the resistance and K v the intrinsic delay of v Let C e and e respectively denote the lumped capacitance and resistance for each wire e E The capacitive load seen at node v is the total lumped capacitance C T( of T(, ie, = + (1 C T( C w C e w SI T( e E T( The Elmore delay for a wire e = ( u, is given y Delay( e = e ( C e 2 + C T( The delay through a gate v { so} B is given y Delay( = K v + v C T( If v IN, M( =, then Delay( = 0 The total delay from v V to si SI is given y Delay( v si = Delay( e + Delay( u (2 e = ( u, w path( v, si Each sink si has a given required arrival time AT( si, and assume that the input signal arrives at the source node at time zero The condition si SI, Delay( so si AT( si must hold for the circuit to meet timing requirements For every v V, let qv ( = min e the slack at where si is ds ( the v AT( si Delay( v si v ds( set of all sinks that are wnstream from v Oserve that the circuit meets its timing if and only if qso ( 0 1 A non-inary tree can e converted into a inary tree y inserting 2 wires A uffer with placed zero resistance on an internal and capacitance node with degree where appropriate d as having one input, one output, and d 1 fanouts is interpreted 22 Noise Avoidance In [8], Devgan proposes a coupled noise estimation metric which is an upper ound for C and overdamped LC circuits The metric depends on the resistance of the victim net, the resistance of the gate driving the victim net, coupling capacitances to the aggressor nets, and the rise times and the slopes of the signals on the aggressor nets For example, consider the three aggressor nets and the single 2- pin victim net in Figure 2 The wire in the victim net is segmented into seven new wires such that each new wire is completely coupled to either 0, 1 or 2 of the aggressor nets Figure 2 Wire segmenting scheme for multiple aggressor nets The coupling capacitance from an aggressor net can e modeled as some fraction of the wire capacitance of the victim net Given t simultaneously switching aggressor nets 1,, t near wire e, let λ 1, λ, t e the ratios of coupling to wire capacitance from the aggressor nets to e, and let µ 1, µ t, e the slopes (ie, power supply voltage over input rise time of the aggressor net signals The total current induced y the aggressor nets on e is I e t I e = C e ( λ j µ j (3 j = 1 Often, information aout neighoring aggressor nets is incomplete, especially when uffer insertion is performed efore routing When performing uffer insertion in estimation mode, one might assume that (i each wire is coupled to exactly one aggressor net, (ii the slope of all aggressor nets is µ, and (iii some fixed ratio λ of the total capacitance of each wire is due to coupling capacitance Under these assumptions I e = λµ C e for each wire e Let I T( e the total wnstream current seen at v, ie, I T( = I e (4 e E T( Each wire adds to the noise induced on the victim net The amount of additional noise induced from a wire e = ( u, is given y Noise( e = e ( I e 2 + I T( The total additional noise seen at a sink si starting at some upstream node v is given y Noise( v si = I T( v + Noise( e (5 e path( v si where v = 0 if there is no gate at v The path from v to si has no intermediate uffers If an intermediate uffer is present, the noise computation egins from the output of the uffer, since the uffer is a restoring stage Each node v SI B has a predetermined noise margin NM( The condition v { so} { IN M( B}, si SI T(, Noise( v si NM( si must hold for there to e no electrical faults We define the noise slack for every v V as NS ( v = min Noise slack serves equivalently si NM( si Noise( v si SIT ( as v a noise margin for internal nodes Noise constraints for wnstream sinks in T( are satisfied

if and only if NS( is greater than the noise seen at v Oserve that NS( 0 for each v { so} { IN M( B} if there are no noise violations 23 Prolem Formulations We study two different uffer insertion prolems The first prolem seeks to fix all noise violations with the fewest possile uffers Delay is not considered Prolem 1: Given a tree T = ({ so} SI IN, E, a uffer lirary B, and noise margins NM( for each v SI B, find a solution M: IN ( B { } which minimizes M, such that NS( 0 for each v { so} { IN M( B} This prolem may e useful for non-critical nets, for which delay optimization is unnecessary For timing-critical nets, we must consider oth noise and delay at the same time Prolem 2: Given a tree T = ({ so} SI IN, E, a uffer lirary B, and noise margins NM( for each v SI B, find a solution M: IN ( B { } which minimizes qso ( such that NS( 0 for each v { so} { IN M( B} A third formulation can seek to minimize the total numer of uffers inserted y while satisfying oth noise and timing constraints Algorithm 3, which is used to solve Prolem 2, can also e applied to address this third formulation using an extension of Lillis et al, [12] to Van Ginneken [16] 3 Noise Constrained Buffer Insertion We egin with the simplest case of a wire with uniform width and neighoring coupling capacitance, as shown in Figure 3 For each wire e, let = e l e and I = I e l e respectively e the wire resistance per unit length and the current per unit length Since current is a constant times wire capacitance, we can use a π-model to represent its distriution Figure 3 Uniform coupling capacitance for a single wire Theorem 1 For a given wire e = ( u, in a routing tree T, a uffer needs to e inserted on e to satisfy noise constraints if and only if I T v l e ----- ---------- ( ----- 2 I ---------- T( 2 2NS( + + + ------------------ (6 I I I Proof: For noise constraints to e satisfied, we must have ( Il e + I T( + l e (( Il e 2 + I T( NS( which is a quadratic in Solving for yields the theorem 4 l e 3 l e Figure 4 Iterative application of Theorem 1 1 si Theorem 2 A net that has een optimally uffered to minimize delay alone may e susceptile to noise violations Proof: Consider a wire e = ( u, in which u and v are gates in the uffered net (a similar analysis holds for a path from u to v Let µ e the slope of an aggressor net, and let 2 λ e the ratio of coupling to total capacitance for e If there is a noise violation at v, then the following must hold l e ----- ----- 2 2NM( > + + -------------------- Cλµ Solving for NM( yields Cλµ 2 2 l NM( -------------- l e < (8 2 e + ------------- For any fixed values for the parameters on the right side of the inequality, a noise violation will occur if the noise margin is small enough Even if NM( is reasonaly large, an aggressor net can have a very large slope µ and a high coupling ratio λ which would also cause a violation 31 Noise Avoidance for Single-Sink Trees Theorem 1 suggests how to insert uffers for single-sink trees Begin at the sink and work up the tree, updating the total wnstream current and noise slacks of visited internal nodes At each internal node, use Theorem 1 to decide if a uffer should e inserted The algorithm terminates when the source node is reached Input: T = ({ so} { si} IN, E uffer type Output: M: IN B Buffer Insertion Solution 1 Set I T si = 0, NS( si = NM( si, v = si 2 while ( v so Let e = ( u, e the parent wire for v 3 if ( I T( + I e + ( I e T ( v + I e 2 < NS( Set I T( u = I T( + I e, Set NS( u = NS( e ( I T( + I e 2 Set v = u 4 else Let I = I e l e, = e l e Set l I ----- ---------- T( ----- 2 I ---------- T( 2 2NS( = + + + ----------------- I I I Create internal node w (with parent u and child on e at distance l from v Set M( w = Set I T( w = 0, NS( w = NM(, and v = w 5 NS( so = NS( so so I T( so if NS( so < 0, create internal node w at so Set M( w = Figure 5 Algorithm 1, Noise Avoidance for Single-Sink Trees Algorithm 1, Noise Avoidance for Single-Sink Trees, is presented in Figure 5 The algorithm accepts a routing tree, and a single uffer type Step 1 initializes the current and noise slack of the sink node, then Steps 2-4 clim up the tree visiting each node in turn Step 3 examines whether or not a uffer needs to e inserted on the current wire e = ( u,, y computing the noise from placing a uffer at u If this noise is less than the noise slack, no uffer needs to e inserted, so the algorithm computes the wnstream current and noise slack for node u, then moves to the next wire But, if the noise is larger then the noise slack, then a uffer must e inserted Step 4 computes the maximum length that this uffer may e inserted from v, and inserts it there at a new internal node w Finally, Step 5 computes the noise slack at the driver and inserts a uffer right after the driver if there is a noise violation (which can only occur if so > Optimality follows from the fact that uffers are always inserted their maximal distance up the tree, according to Theorem 1 (7

32 Noise Avoidance for Multi-Sink Trees Some difficulty arises in extending Algorithm 1 to multiple sinks Let e l ( e r, I l ( I r, and NS l ( NS r respectively e the wire, current and noise slack for the left (right ranch of an internal node v with two children It is possile that I l NS l, I r NS r and ( I l + I r > min( NS l, NS r, ie, the noise constraints for the left and right ranches are satisfied ut merging the left and right ranches will induce a noise violation Thus, a uffer must e placed on either the left or right ranch immediately after v One cannot immediately deduce which ranch to choose since one needs to know the characteristics and location of the gate driving v Since the algorithm is ottom-up, the location of this gate has not yet een determined Input: v Current node to e processed Output: S List of candidate solutions for node v Gloals: T = ({ so} SI IN, E B Buffer lirary 1 if v SI, then set S = {( 0, NM(, } 2 else if v has only one child then S = Algorithm 2( Tleft( 3 else if v has two children S l = Algorithm 2( Tleft( S r = Algorithm 2( Tright( Set i = 1 and j = 1 4 while j S l and k S r Let α l = ( I e the i th l, NS l, M l candidate in list S l Let α r = ( I e the j th r, NS r, M r candidate in list S r 5 if ( I l + I r > min( NS l, NS r then Insert nodes w l, w r just after v on the left and right ranches Set M l = M l, M r = M r Assign M l ( wl =, M r ( wr = 6 S = S { ( I r, min( NM(, NS r, M l M r } S = S {( I l, min( NM l, NS(, M l M r } else 7 S, } = S { ( I l + I r min( NS l, NS r, M l M r 8 if v is the source then for each α = ( I T, NS(, M S NS( so = NS( so so I T( so if NS( so < 0 create internal node w at so Set M( w = return S 9 Let e = ( u, e the parent wire of v for each α = ( I T(, NS(, M S if I T( + I e + ( I then ( v + I e 2 < NS( Set α = ( I T( + I e, NS( e ( I T( I e 2, M else 10 Let I = I e l e, = e l e Set l I ----- ---------- T( ----- 2 I ---------- T( 2 2NS( = + + + ----------------- I I I Create internal node w (with parent u and child on wire e at distance l from v M( w =, α = ( I( l e l, NM( I e ( l e l, M 11 Prune S of inferior solutions and return S Figure 6 Algorithm 2, Noise Avoidance for Multi-Sink Trees We propose to generate a set of candidate solutions for each node and propagate these candidate solutions up the tree, in the same spirit as in Van Ginneken s algorithm [16] A candidate α is a 3-tuple ( I T(, NS T(, M where I T( is the wnstream current seen at v, NS T( is the noise slack for v, and M is the current solution for the sutree T( Let M = Ml M r denote the resulting solution from merging the left solution M l and the right solution M r for an internal node v with two children For each w in T(, we assign M( w = if either M l ( w = or M r ( w = and M( w = otherwise Whenever a node with two children is encountered and a uffer needs to e inserted, oth a left and right candidate is generated The candidates are sorted in non-decreasing order y wnstream current so that inferior solutions can e pruned [16] Given two candidates α 1 = ( I 1, NS 1, M 1 and α 2 = ( I 2, NS 2, M 2, α 1 is inferior to α 2 if and only if I 1 > I 2 and NS 1 NS 2 Algorithm 2 is similar to Algorithm 1, except for Steps 4-7 which handle nodes with two children Step 4 iterates through each candidate for the left ranch and each candidate in the right ranch using the Van Ginneken s linear merging technique Step 5 tests whether merging the two candidates results in a noise violation, and if not, Step 7 merges the two sets of candidates without inserting a uffer If there is a violation, then two new solutions, one with a uffer on the left ranch and one a new uffer on the right ranch, are generated and inserted into the current list of candidates When the algorithm terminates, the solution(s in S with the fewest numer of uffers is chosen The algorithm returns an optimal solution to Prolem 1 in time quadratic in V As for Algorithm 1, one can otain an optimal solution for a uffer lirary with multiple uffer types y selecting the uffer type with least resistance 4 Optimizing Noise and Delay To address Prolem 2, we modify the approach of Van Ginneken [16] to include noise avoidance Whenever Van Ginneken s algorithm considers inserting a uffer, we check the noise constraints; if they have een violated, the uffer is not inserted Hence, our algorithm generates fewer solutions than Van Ginneken s algorithm since it prunes solutions which have noise violations Input: T = ({ so} SI IN, E B Buffer lirary Output: α Best candidate solution for source so 1 T = Segment_Wires( T, B [1] 2 S = Find_Cands( so 3 for each α = ( C T( so, qso (, I T( so, NS(, M S Set qv ( = qv ( K so so C T( Set NS( so = NS( so I T( so so 4 return ( C T( so, qso (, I T( so, NSso (, M S with maximum qso ( such that NS( so 0 Figure 7 Algorithm 3: Optimizing Noise and Delay As efore, a list of candidates is computed for each node, except that a candidate α = ( C T(, qv (, I T(, NS(, M is now a 5-tuple where C T( is the load seen at v, qv ( is the slack at v, I T( is the wnstream current seen at v, NS( is the noise slack at v, and M is the current solution Figure 7 illustrates Algorithm 3, which is the same as Van Ginneken s 3 except for the modifications in oldface Step 1 of Figure 7 segments the wires to generate sufficient possile uffer locations Step 2 calls Find_Cands which 3 The algorithm can e modified to handle inverting uffers [12]

returns a list of candidate solutions Step 3 adds the driver delay and computes the noise slack, then the candidate with the est timing slack, such that noise constraints are satisfied, is returned in Step 4 The Find_Cands procedure shown in Figure 8 It takes the node v as input, recursively computes the lists of possile candidates for all the nodes in T(, and then returns the candidates for v Find_Cands consists of four main parts: Steps 1-4 constructs candidates for the children of v and merges them to form S, the set of candidates for v Step 5 inserts considers each uffer type in the lirary and adds the uffer which yields the largest slack such that noise constraints are satisfied A uffer will not e inserted if there is a noise violation This step is the fundamental difference etween Algorithm 3 and Van Ginneken s algorithm [16] Step 6 computes the new load, slack, current and noise slack for each candidate induced y the parent wire of v Finally, Step 7 prunes inferior candidates from S, using the pruning schemes of [12][16] Modifications for noise avoidance not increase the O( V 2 B 2 time complexity of Van Ginneken s algorithm since the modifications are mostly constant time operations for ookkeeping and pruning Input: v Current node to e processed Output: S List of candidate solutions for node v Gloals: T = ({ so} SI IN, E B Buffer lirary S = S = 1 if v SI then S = {( C v, AT(, 0, NM(, M } 2 else if v has only one child then for each α Find_Cands( Tleft( S = S { α} 3 else if v has two children S l = Find_Cands( Tleft( S r = Find_Cands( Tright( Set i = 1 and j = 1 4 while i S l and j S r Let α l = ( C e j th l, q l, I l, NS l, M l candidate in S l Let α r = ( C e k th r, q r, I r, NS r, M r candidate in S r Let q = min( q l, q r, NS = min( NS l, NS r S = S { ( C l + C r, q, I l + I r, NS, M l M r } if q l q r then i = i + 1 if q r q l then j = j + 1 5 if v is a feasile uffer location then for each uffer B Find ( C T(, qv (, I T(, NS(, M S that maximizes qv ( K C such that NS( I T( 0 if such a candidate exists then Set M( = and S = S ( C, qv ( K C, 0, NM(, M S = S S 6 Let e = ( u, e the parent wire for v for each α = ( C T(, qv (, I T(, NS(, M S Set q = q( e ( C e 2 + C T( Set NS = NS( e ( I e 2 + I T( S = S { ( C T( + C e, q, I T( + I e, NS, M } α 7 Prune S of inferior solutions and return S Figure 8 Find_Cands Procedure( Theorem 3 If B = { } and NM( NM( si, si SI, then Algorithm 3 returns an optimal solution to Prolem 2 Proof: efer to [2] for a proof With multiple uffers in the uffer lirary, the optimality of Algorithm 3 is no longer guaranteed However, in practice, Algorithm 3 generates solutions that are very close to optima; our experimental results in the next section strongly support this claim 5 Experimental esults The algorithm we use in our experiments is an extended version of Algorithm 3, called BuffOpt, which can trade-off etween delay reduction, noise avoidance, and the total numer of uffers [12] We refer to the algorithm which optimizes only delay [1][12][16] as DOpt DOpt is the same as Algorithm 3 ut without the oldface text For our experiments, we selected a set of 500 critical nets from a modern Power PC microprocessor design Tale 1 shows the distriution of the sizes of these nets To verify BuffOpt, we also ran a detailed, simulation-ased noise analysis tool, called 3dnoise [14] 3dnoise was run oth efore and after BuffOpt and DOptTo perform noise analysis, 3dnoise uses accurate moment-matching ased techniques that are similar to ICE [15] #Sinks 1 2 3 4 5-7 8-10 11-20 21-40 41-57 #Nets 165 33 33 38 55 69 63 39 12 Tale 1: Sink distriution of the 500 test nets We ran BuffOpt, DOpt and 3dnoise all in estimation mode (see Susection 22 assuming a 07 coupling to total capacitance ratio from a single aggressor net with rise time 025 nanoseconds and a power supply voltage of 18V The tolerale noise margin for every gate was set to 08V The uffer lirary contained 5 inverting and 6 non-inverting uffers of varying power levels Our experiments show that BuffOpt eliminated all noise prolems in the design, DOpt could not fix all noise prolems, and the average delay penalty from using BuffOpt instead of DOpt was less than 2% 51 BuffOpt Successfully Avoids Noise We ran BuffOpt on the 500 nets, and oserved that BuffOpt identified 423 noise violations and successfully inserted uffers to fix all of them To verify BuffOpt s identification of noise violations, we ran 3dnoise on the nets efore running BuffOpt The accurate analysis of 3dnoise identified 386 nets with noise violations, all of which were also identified y BuffOpt BuffOpt identified 423-386 = 37 more nets with violations, which shows that the noise metric [8] is slightly conservative We also ran 3dnoise on the nets after running BuffOpt, and 3dnoise identified no noise violations This data is summarized in Tale 2 # Nets # Optimized # Noise Violations Before BuffOpt 500 0 386 After BuffOpt 500 423 0 Tale 2: Numer of noise violations reported y 3dnoise efore and after running BuffOpt

52 Optimizing Delay Alone is Insufficient We now compare BuffOpt to DOpt (optimal delay-driven uffer insertion in terms of noise avoidance Since BuffOpt never inserted more than four uffers on any net, we ran DOpt four times in which no solution was allowed to have more than k uffers where k ranged from 1 to 4 We denote one such run of DOpt y DOpt(k Tale 3 compares DOpt(k with BuffOpt TBI stands for total uffers inserted (ie, times the numer of nets with uffers 4 d d, and #NVs stands for numer of noise violations Method # Nets with d Buffers TBI # NVs Total d=0 d=1 d=2 d=3 d=4 CPU None 500 0 386 N/A DOpt(1 7 493 493 150 216 DOpt(2 5 19 476 971 37 613 DOpt(3 5 16 77 402 1376 21 936 DOpt(4 4 16 29 35 416 1843 13 1135 BuffOpt 77 161 232 28 2 717 0 980 Tale 3: BuffOpt versus DOpt(k for noise Oserve that running DOpt(4 causes the addition of 1126 more uffers than BuffOpt, yet there are still 13 noise violations educing the maximum numer of uffers that can e inserted y DOpt only increases the numer of noise violations and still causes more uffers to e inserted for k 2 Thus, as shown y Theorem 2, noise avoidance must e integrated into the algorithm to guarantee no noise violations Finally, oserve that BuffOpt uses less CPU than DOpt(k for k 3 This occurs ecause BuffOpt prunes candidates with noise violation, which gives BuffOpt fewer total candidates to analyze 53 The Delay Penalty is Small # Buffers 0 1 2 3 4 Avg D # Times 77 161 232 28 2 423 BuffOpt 0 1248 3360 10434 455 3011 DOpt 0 1358 3382 10531 503 3071 Delay Penalty 0 877 064 093 1050 199 Tale 4: Average delay reduction from uffer insertion Finally, we compare DOpt to BuffOpt in terms of total delay We first ran BuffOpt on the 500 nets and recorded the uffers inserted for each net We then ran DOpt for the same numer of uffers as BuffOpt inserted in order to make an apples to apples comparison We computed the reduction in total delay for each net and averaged the results y the numer of uffers inserted The cumulative results are presented in Tale 4 For example, there were 232 nets for which two uffers were inserted, and on average BuffOpt reduced delay y 3360 picoseconds while DOpt reduced delay y 3382 picoseconds The overall delay penalty is 4 For example, BuffOpt inserted ( 0 77 + ( 1 161 + ( 2 232 + ( 3 84 + ( 4 2 = 717 uffers 100( 3382 3360 3382 = 064 percent The average delay reduction over all 423 nets is computed y taking the total delay reduction for all nets and dividing y 423 5 Oserve from the last column that the overall average delay penalty is only 6 ps, or equivalently 199%, from avoiding noise Thus, Buffopt is ale to integrate noise into a delaydriven algorithm with virtually no loss in total delay ecall that Algorithm 3 is optimal when the uffer lirary contains a single uffer type, ut we could not guarantee optimality for a larger uffer lirary The DOpt results in Tale 4 form an upper ound on the optimal solution to Prolem 2 since DOpt is optimal for delay alone Oserve that even with a uffer lirary of size 11, BuffOpt solutions are virtually optimal since they are on average within 2% of an upper ound Acknowledgments The authors would like to thank Joe ahmeh for his help with 3dnoise and the PowerPC design data eferences [1] C J Alpert and A Devgan, Wire Segmenting for Improved Buffer Insertion, DAC, 1997, pp 588-593 [2] C J Alpert, S T Quay and A Devgan, Comprehensive Buffer Insertion Techniques, sumitted to IEEE Trans CAD, April, 1998 [3] C J Alpert, S T Quay and A Devgan, Comprehensive Buffer Insertion Techniques for Noise Avoidance, US patent filed, 1997 [4] H B Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, 1990 [5] C C N Chu and D F Wong, Closed Form Solution to Simultaneous Buffer Insertion/Sizing and Wire Sizing, International Symposium on Physical Design, 1997, pp 192-197 [6] C C N Chu and D F Wong, A New Approach to Simultaneous Buffer Insertion and Wire Sizing, ICCAD, 1997, pp 614-621 [7] J Cong, L He, C-K Koh, and P H Madden, Performance Optimization of VLSI Interconnect Layout, Integration: the VLSI Journal, 21, 1996, pp 1-94 [8] A Devgan, Efficient Coupled Noise Estimation for On-Chip Interconnects, ICCAD, 1997, pp 147-151 [9] S Dhar and M A Franklin, Optimum Buffer Circuits for Driving Long Uniform Lines, IEEE Journal of Solid-State Circuits, 26(1, 1991, pp 32-40 [10] W C Elmore, The Transient esponse of Damped Linear Network with Particular egard to Wideand Amplifiers, J Applied Physics, 19, 1948, pp 55-63 [11] M Z-W Kang, W W-M Dai, T Dillinger, and D P LaPotin, Delay Bounded Buffered Tree Construction for Timing Driven Floorplanning, ICCAD, 1997, pp 707-712 [12] J Lillis, C-K Cheng and T-T Y Lin, Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model, IEEE Journal of Solid-State Circuits, 31(3, 1996, pp 437-447 [13] L T Pillage and A ohrer Asymptotic Waveform Evaluation for Timing Analysis IEEE Trans CAD 9(4:352-366, April, 1990 [14] J ahmeh, The 3d-Noise User Guide, IBM Internal eport, 1997 [15] C atzlaff and L T Pillage, ICE: apid Interconnect circuit Circuit Evaluator using Asymptotic Waveform Evaluation, IEEE Trans CAD, pp 763-776, June 1994 [16] L P P P van Ginneken, Buffer Placement in Distriuted C-tree Networks for Minimal Elmore Delay, ISCAS, 1990, pp 865-868 5 Eg, for BuffOpt, the weighted average of 3011 is given y (( 161 1248 + ( 3360 232 + ( 10434 28 + ( 455 2 423