GREEDY WIRE-SIZING IS LINEAR TIME

Chris C. N. Chu and D. F. Wong
cnchu@cs.utexas.edu, wong@cs.utexas.edu
Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712

ABSTRACT

In interconnect optimization by wire-sizing, minimizing weighted sink delay has been shown to be the key problem. Wire-sizing with many important objectives, such as minimizing total area subject to delay bounds or minimizing maximum delay, can all be reduced to solving a sequence of weighted sink delay problems by Lagrangian relaxation [1, 3]. GWSA, first introduced in [10] for discrete wire-sizing and later extended in [2] to continuous wire-sizing, is a greedy wire-sizing algorithm for the weighted sink delay problem. Although GWSA has been experimentally shown to be very efficient, no mathematical analysis of its convergence rate has ever been reported. In this paper, we consider GWSA for continuous wire-sizing. We prove that GWSA converges linearly to the optimal solution, which implies that the run time of GWSA is linear with respect to the number of wire segments for any fixed precision of the solution. Moreover, we also prove that this is true for any starting solution. This is a surprising result because previously it was believed that in order to guarantee convergence, GWSA had to start from a solution in which every wire segment is set to the minimum (or maximum) possible width. Our result implies that GWSA can use a good starting solution to achieve faster convergence. We demonstrate this point by showing that the minimization of maximum delay using Lagrangian relaxation can be sped up by 57.7%.

1. INTRODUCTION

With the evolution of VLSI fabrication technology, interconnect delay has become the dominant factor in deep submicron design. In many systems designed today, as much as 50% to 70% of the clock cycle is consumed by interconnect delay [7]. As technology continues to scale down, we expect that the significance of interconnect delay will further increase in the near future.

Wire-sizing has been shown to be an effective technique for interconnect optimization. Much work has been done during the past few years. See [7] for a survey. In particular, the problem of minimizing weighted sink delay has drawn a lot of attention.
Basically, a routing tree with a source, a set of sinks and a set of wire segments is given. Associated with each sink is a non-negative weight representing the criticality of the sink. The problem is to determine the width of each wire segment so that the weighted sum of the delays from the source to the sinks is minimized. Solving this problem is a key to solving problems with many other important objectives, such as minimizing total area subject to delay bounds or minimizing maximum delay, because [1, 3] have shown that those problems can all be reduced by Lagrangian relaxation to a sequence of weighted sink delay problems. So having efficient algorithms for the weighted sink delay problem is very important for interconnect optimization.

This work was partially supported by the Texas Advanced Research Program under Grant No. 00365888 and by a grant from the Intel Corporation.

For the problem of minimizing weighted sink delay under the Elmore delay model [11], a widely used technique is optimal local re-sizing. The basic idea is to iteratively and greedily re-size the wire segments. In each iteration, the wire segments in the tree are examined one by one. When a wire segment is examined, it is re-sized optimally while keeping the widths of all other segments fixed. This technique was first introduced in [10] and was later extended to many other wire, buffer, gate, driver and/or transistor sizing problems [1, 2, 4, 5, 6, 8, 9].

In [10], discrete wire-sizing (i.e. the segment widths must be chosen from a given set of discrete choices) was considered. The proposed algorithm was called GWSA (Greedy Wire-Sizing Algorithm). GWSA does not give the optimal solution directly, as it can converge to non-optimal solutions. Rather, GWSA is used to get lower and upper bounds on the segment widths of the optimal solution. Then a dynamic programming technique is used to find the optimal solution among all the possible solutions satisfying the lower and upper bounds. As the lower and upper bounds obtained by GWSA are close to each other in most cases, the dynamic programming step is usually very efficient. In [2], GWSA was extended to continuous wire-sizing (i.e. the segment widths can be from a continuous range of real numbers).
It was proved in [2] that for continuous wire-sizing, GWSA always converges to the optimal solution, provided that all segments are set to their minimum (or maximum) possible widths for the starting solution. However, the convergence rate of GWSA is not known.

In this paper, we analyze the convergence of GWSA for continuous wire-sizing. One of our contributions is a proof that the convergence rate of GWSA is linear. This implies that the run time of GWSA is $O(n \log(1/\epsilon))$, where $n$ is the number of wire segments and $\epsilon$ specifies the precision of the solution (see Theorem 2). So GWSA runs in time linear in $n$ for a fixed precision.

For all previous algorithms using optimal local re-sizing, the convergence always depends on the fact that the solution of optimal local re-sizing satisfies a special dominance property [10]. That is, if a wire-sizing solution is dominated by the optimal solution (i.e. the width of every segment in the solution is smaller than or equal to that in the optimal solution), then the solution after an optimal local re-sizing of any segment will still be dominated by the optimal solution. So if we start from a solution with every segment set to its minimum possible width (this solution is obviously dominated by the optimal solution), then after any number of optimal local re-sizings, the solution will still be dominated by the optimal solution. In other words, for any segment, the optimal width is always an upper bound on the width obtained by optimal local re-sizing. Since segment widths are non-decreasing during optimal local re-sizing and are upper bounded, the solution must converge (to a lower bound of the optimal solution for discrete wire-sizing, and to the optimal solution for continuous wire-sizing). A similar property holds for wire-sizing solutions which dominate the optimal solution. Therefore, previously, in order to guarantee convergence, GWSA always set all segments to their minimum (or maximum) possible widths for the starting solution.

Another contribution of this paper is a proof that for continuous wire-sizing, GWSA always converges to the optimal solution from any starting solution. This is done by proving the convergence of GWSA without using the dominance property. So by using a good starting solution for GWSA, faster convergence can be achieved.

This result on the starting solution is particularly useful in optimizing other objectives (e.g. minimizing total area subject to delay bounds or minimizing maximum delay) by Lagrangian relaxation. A problem with other objectives can be solved optimally by reducing it to a sequence of weighted sink delay problems using the Lagrangian relaxation technique. Previously, before solving each weighted sink delay problem, all segments were reset to their minimum (or maximum) possible widths to form the starting solution for GWSA, in order to guarantee convergence. However, since two consecutive weighted sink delay problems in the sequence are almost the same (except that the sink weights are changed by a little bit), the optimal solution of the first weighted sink delay problem is close to the optimal solution of the second one, and hence is a good starting solution for the second one. So it is better not to reset the wire-sizing solution before solving each weighted sink delay problem. We experimentally verify that our new approach of not resetting is much better than the previous approach of resetting each time.
We show that our approach can speed up the minimization of maximum delay using Lagrangian relaxation by 57.7%.

The rest of this paper is organized as follows. In Section 2, we present the weighted sink delay problem and the algorithm GWSA considered in [2]. In Section 3, we analyze the convergence of GWSA. In Section 4, we present experimental results showing the linearity of the run time of GWSA and the speedup on optimizing other objectives using Lagrangian relaxation.

2. THE WEIGHTED SINK DELAY PROBLEM AND THE ALGORITHM GWSA

In this section, we first present the continuous wire-sizing problem with weighted sink delay objective and then the algorithm GWSA considered in [2].

Assume that we are given a routing tree $T$ implementing a signal net which consists of a source (at the root) with driver resistance $R_D$, a set of $n$ wire segments $W = \{W_1, W_2, \ldots, W_n\}$, and a set of $m$ sinks $N = \{N_1, N_2, \ldots, N_m\}$ (at the leaves) with load capacitances $c^s_i$, $1 \le i \le m$. Associated with each sink $N_i$ is a non-negative weight $\lambda_i$ representing the criticality of the sink. Assume without loss of generality that $\sum_{i=1}^{m} \lambda_i = 1$. Basically, the problem is to minimize the weighted sink delay for the routing tree by changing the widths of the wire segments. See Figure 1 for an example of a routing tree.

[Figure 1. An example of a routing tree, with $m = 4$ sinks, $n = 9$ wire segments and driver resistance $R_D$.]

Let $dec(W_i)$ be the set of descendant wire segments or sinks of $W_i$ (excluding $W_i$). Let $ans(W_i)$ be the set of ancestor wire segments of $W_i$ (excluding $W_i$). Let $path(N_i)$ be the set of wire segments on the path from the driver to the sink $N_i$. For example, for the routing tree shown in Figure 1, $dec(W_1) = \{W_2, W_3, W_4, N_1, N_2\}$, $ans(W_1) = \{\}$, $dec(W_8) = \{N_3\}$, $ans(W_8) = \{W_5, W_6, W_7\}$, and $path(N_3) = \{W_5, W_6, W_7, W_8\}$.

For $1 \le i \le n$, let $x_i$ be the width of wire segment $W_i$, and let $L_i$ and $U_i$ be respectively the lower bound and the upper bound on the width of $W_i$. Therefore, $L_i \le x_i \le U_i$ for $1 \le i \le n$. Let $x = (x_1, x_2, \ldots, x_n)$, which will be referred to as a wire-sizing solution.

A wire segment is modeled as a $\pi$-type RC circuit as shown in Figure 2.
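The tree notation introduced above can be made concrete with a small sketch. The adjacency below is only one topology consistent with the sets quoted for Figure 1; the branching inside the $W_1$ subtree is an assumption, and the helper names `dec`, `ans` and `path` simply mirror the notation of the text.

```python
# Hypothetical adjacency for the tree of Figure 1: the source drives W1 and W5.
# The branching below W1 is an assumption; the text only fixes the sets it quotes.
children = {
    "source": ["W1", "W5"],
    "W1": ["W2"], "W2": ["W3", "W4"], "W3": ["N1"], "W4": ["N2"],
    "W5": ["W6"], "W6": ["W7"], "W7": ["W8", "W9"],
    "W8": ["N3"], "W9": ["N4"],
}
parent = {child: node for node, kids in children.items() for child in kids}


def dec(w):
    """Descendant wire segments and sinks of w, excluding w itself."""
    found, stack = set(), list(children.get(w, []))
    while stack:
        node = stack.pop()
        found.add(node)
        stack.extend(children.get(node, []))
    return found


def ans(w):
    """Ancestor wire segments of w, excluding w itself and the source."""
    found, node = set(), parent[w]
    while node != "source":
        found.add(node)
        node = parent[node]
    return found


def path(sink):
    """Wire segments on the path from the driver to the given sink, top-down."""
    segments, node = [], parent[sink]
    while node != "source":
        segments.append(node)
        node = parent[node]
    return segments[::-1]
```

Storing both the child lists and the parent map keeps each query linear in the size of the subtree or path it visits, which is the same pair of traversals (bottom-up and top-down) that the algorithm later relies on.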
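The resistance and capacitance of wire segment $W_i$ are $\hat{r}_i / x_i$ and $\hat{c}_i x_i + f_i$ respectively, where $\hat{r}_i$ is the unit width wire resistance, $\hat{c}_i$ is the unit width wire area capacitance, and $f_i$ is the wire fringing capacitance of $W_i$.

[Figure 2. The model of wire segment $W_i$ by a $\pi$-type RC circuit: a resistance $\hat{r}_i / x_i$ between two capacitances of $(\hat{c}_i x_i + f_i)/2$ each.]

Let $\mu_i = \sum_{N_j \in dec(W_i)} \lambda_j$, i.e. $\mu_i$ is the total downstream sink weight of segment $W_i$. Let $R_i(x) = \sum_{W_j \in ans(W_i)} \mu_j \hat{r}_j / x_j$, i.e. $R_i(x)$ is a weighted upstream wire resistance of segment $W_i$. Let $C_i(x) = \sum_{W_j \in dec(W_i)} \hat{c}_j x_j$, i.e. $C_i(x)$ is the total downstream wire area capacitance of segment $W_i$. Let $C^f_i = \sum_{W_j \in dec(W_i)} f_j + \sum_{N_j \in dec(W_i)} c^s_j$, i.e. $C^f_i$ is the total downstream wire fringing capacitance and sink capacitance of segment $W_i$.

The Elmore delay model [11] is used for delay calculation. For a wire-sizing solution $x$, the Elmore delay from the source to the sink $N_i$ is given by

$$D_i(x) = R_D \Big( \sum_{W_j \in W} \hat{c}_j x_j + \sum_{W_j \in W} f_j + \sum_{N_j \in N} c^s_j \Big) + \sum_{W_j \in path(N_i)} \frac{\hat{r}_j}{x_j} \Big( \frac{\hat{c}_j x_j}{2} + C_j(x) + \frac{f_j}{2} + C^f_j \Big).$$

Then the weighted sink delay problem can be written as:

$$\text{Minimize } D(x) = \sum_{i=1}^{m} \lambda_i D_i(x) \quad \text{subject to } L_i \le x_i \le U_i, \; 1 \le i \le n.$$

Now we present the algorithm GWSA proposed in [2] for solving the weighted sink delay problem. GWSA is a greedy algorithm based on iteratively re-sizing the wire segments. In each iteration, the wire segments are examined one by one. When a wire segment $W_i$ is examined, it is re-sized optimally while keeping the widths of all other segments fixed. This operation is called an optimal local re-sizing of $W_i$. The following lemma gives a formula for optimal local re-sizing.

Lemma 1. For a wire-sizing solution $x = (x_1, x_2, \ldots, x_n)$, the optimal local re-sizing of $W_i$ is given by changing the width of $W_i$ to

$$x_i = \min \Big\{ U_i, \max \Big\{ L_i, \sqrt{ \frac{\mu_i \hat{r}_i}{\hat{c}_i} \cdot \frac{C_i(x) + f_i/2 + C^f_i}{R_i(x) + R_D} } \Big\} \Big\}.$$

Proof outline: By extending the proof of the corresponding lemma in [2] (which did not consider wire fringing capacitance), we can show that

$$D(x) = \hat{c}_i x_i \big( R_i(x) + R_D \big) + \frac{\mu_i \hat{r}_i}{x_i} \Big( C_i(x) + \frac{f_i}{2} + C^f_i \Big) + \text{terms independent of } x_i.$$

Note that $R_i(x)$ and $C_i(x)$ are also independent of $x_i$. Hence the result follows as in [2]: a function of the form $a x_i + b / x_i$ with $a, b > 0$ is minimized over $[L_i, U_i]$ by clamping $\sqrt{b/a}$ to that interval.

Let $children(W_i)$ be the set of all children wire segments of $W_i$ and let $p_i$ be the index of the parent wire segment of $W_i$. Then the algorithm GWSA is given below.

ALGORITHM GWSA:

S1. Let $x$ be some starting wire-sizing solution.

S2. Compute the $\mu_i$'s and $C^f_i$'s by a bottom-up traversal of $T$ using the following formulas:

$$\mu_i := \begin{cases} \lambda_j & \text{if } W_i \text{ connects directly to sink } N_j \\ \sum_{W_k \in children(W_i)} \mu_k & \text{otherwise} \end{cases}$$

$$C^f_i := \begin{cases} c^s_j & \text{if } W_i \text{ connects directly to sink } N_j \\ \sum_{W_k \in children(W_i)} (f_k + C^f_k) & \text{otherwise} \end{cases}$$

S3. Compute all $C_i(x)$'s by a bottom-up traversal of $T$ using the following formula:

$$C_i(x) := \sum_{W_k \in children(W_i)} \big( \hat{c}_k x_k + C_k(x) \big)$$

S4. Perform a top-down traversal of $T$. For each $W_i$:

$$R_i(x) := R_{p_i}(x) + \mu_{p_i} \hat{r}_{p_i} / x_{p_i}$$

$$x_i := \min \Big\{ U_i, \max \Big\{ L_i, \sqrt{ \frac{\mu_i \hat{r}_i}{\hat{c}_i} \cdot \frac{C_i(x) + f_i/2 + C^f_i}{R_i(x) + R_D} } \Big\} \Big\}$$

S5. Repeat S3 and S4 until no improvement.

Note that since $C_i(x)$ and $R_i(x)$ are computed incrementally in steps S3 and S4, each iteration of GWSA takes only $O(n)$ time. For the original GWSA in [2], in S1, $x_i$ is set to $L_i$ for all $i$. Then the dominance property can be applied to show that the algorithm converges. However, the convergence rate is not known. Also, if some other starting wire-sizing solution is used in S1, it is not clear whether the algorithm will still converge. In the next section, we show that GWSA always converges linearly for any starting solution.

3. CONVERGENCE ANALYSIS OF GWSA

In this section, we first prove that the algorithm GWSA always converges to the optimal solution for any starting solution (Theorem 1). Then we prove that the convergence rate for any starting solution is always linear. This implies that the run time of GWSA is $O(n \log(1/\epsilon))$ for any starting solution, where $\epsilon$ specifies the precision of the solution (Theorem 2).

For the following two lemmas, we focus on segment $W_i$ for some fixed $i$. Note that during the $n$ optimal local re-sizing operations just before the local re-sizing of $W_i$ at a particular iteration (except the first iteration), each wire segment is re-sized exactly once. Intuitively, the following two lemmas show that during these $n$ re-sizing operations, if the changes in all the segment widths are small, then the change in the width $x_i$ during the local re-sizing of $W_i$ at that iteration will be even smaller.

For some $t \ge 1$, let $x = (x_1, \ldots, x_n)$, $x' = (x'_1, \ldots, x'_n)$ and $x'' = (x''_1, \ldots, x''_n)$ be respectively the wire-sizing solutions just before the local re-sizing of $W_i$ at iterations $t$, $t+1$ and $t+2$ of GWSA. Let

$$q'_i = \sqrt{ \frac{\mu_i \hat{r}_i}{\hat{c}_i} \cdot \frac{C_i(x) + f_i/2 + C^f_i}{R_i(x) + R_D} } \quad \text{and} \quad q''_i = \sqrt{ \frac{\mu_i \hat{r}_i}{\hat{c}_i} \cdot \frac{C_i(x') + f_i/2 + C^f_i}{R_i(x') + R_D} }.$$

So by Lemma 1, $x'_i = \min\{U_i, \max\{L_i, q'_i\}\}$ and $x''_i = \min\{U_i, \max\{L_i, q''_i\}\}$.

Lemma 2. For any $\delta > 0$, if $\frac{1}{1+\delta} \le \frac{x'_j}{x_j} \le 1 + \delta$ for all $j$, then $\frac{1}{1+\rho\delta} \le \frac{q''_i}{q'_i} \le 1 + \rho\delta$ for some constant $0 < \rho < 1$.

Proof: If $\frac{1}{1+\delta} x_j \le x'_j \le (1+\delta) x_j$ for all $j$, we have

$$\frac{1}{1+\delta} R_i(x) \le R_i(x') \le (1+\delta) R_i(x) \quad \text{and} \quad \frac{1}{1+\delta} C_i(x) \le C_i(x') \le (1+\delta) C_i(x).$$

Let

$$\rho = \max_{1 \le i \le n} \left\{ \frac{1}{1 + R_D \Big/ \sum_{W_j \in ans(W_i)} \mu_j \hat{r}_j / L_j}, \; \frac{1}{1 + (f_i/2 + C^f_i) \Big/ \sum_{W_j \in dec(W_i)} \hat{c}_j U_j} \right\}.$$

Note that $\rho$ is a constant such that $0 < \rho < 1$. Since $x_j \ge L_j$ for all $j$, we have $R_i(x) = \sum_{W_j \in ans(W_i)} \mu_j \hat{r}_j / x_j \le \sum_{W_j \in ans(W_i)} \mu_j \hat{r}_j / L_j$. So $\rho \ge 1 / (1 + R_D / R_i(x))$, or equivalently,

$$R_i(x) \le \rho \big( R_i(x) + R_D \big).$$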

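Hence

$$R_i(x') + R_D \le (1+\delta) R_i(x) + R_D = \delta R_i(x) + \big( R_i(x) + R_D \big) \le \rho\delta \big( R_i(x) + R_D \big) + \big( R_i(x) + R_D \big) = (1 + \rho\delta) \big( R_i(x) + R_D \big) \quad (1)$$

and

$$R_i(x') + R_D \ge \frac{1}{1+\delta} R_i(x) + R_D = R_i(x) + R_D - \frac{\delta}{1+\delta} R_i(x) \ge R_i(x) + R_D - \frac{\rho\delta}{1+\delta} \big( R_i(x) + R_D \big) = \Big( 1 - \frac{\rho\delta}{1+\delta} \Big) \big( R_i(x) + R_D \big) > \frac{1}{1+\rho\delta} \big( R_i(x) + R_D \big) \quad (2)$$

as $\delta > 0$ and $0 < \rho < 1$.

Similarly, since $x_j \le U_j$ for all $j$, we have $C_i(x) = \sum_{W_j \in dec(W_i)} \hat{c}_j x_j \le \sum_{W_j \in dec(W_i)} \hat{c}_j U_j$. So $\rho \ge 1 / \big( 1 + (f_i/2 + C^f_i) / C_i(x) \big)$, or equivalently, $C_i(x) \le \rho \big( C_i(x) + f_i/2 + C^f_i \big)$. Hence we can prove similarly that

$$C_i(x') + \frac{f_i}{2} + C^f_i \le (1 + \rho\delta) \Big( C_i(x) + \frac{f_i}{2} + C^f_i \Big) \quad (3)$$

$$C_i(x') + \frac{f_i}{2} + C^f_i > \frac{1}{1+\rho\delta} \Big( C_i(x) + \frac{f_i}{2} + C^f_i \Big) \quad (4)$$

By the definitions of $q'_i$ and $q''_i$, and by (2) and (3), we have

$$q''_i = \sqrt{ \frac{\mu_i \hat{r}_i}{\hat{c}_i} \cdot \frac{C_i(x') + f_i/2 + C^f_i}{R_i(x') + R_D} } \le \sqrt{ \frac{\mu_i \hat{r}_i}{\hat{c}_i} \cdot \frac{(1+\rho\delta) \big( C_i(x) + f_i/2 + C^f_i \big)}{\frac{1}{1+\rho\delta} \big( R_i(x) + R_D \big)} } = (1 + \rho\delta) \, q'_i.$$

Similarly, by (1) and (4), we can prove that $q''_i \ge \frac{1}{1+\rho\delta} q'_i$. As a result, $\frac{1}{1+\rho\delta} \le \frac{q''_i}{q'_i} \le 1 + \rho\delta$.

Lemma 3. For any $\delta > 0$, if $\frac{1}{1+\delta} \le \frac{x'_j}{x_j} \le 1 + \delta$ for all $j$, then $\frac{1}{1+\rho\delta} \le \frac{x''_i}{x'_i} \le 1 + \rho\delta$ for some constant $0 < \rho < 1$.

Proof: By Lemma 2, if $\frac{1}{1+\delta} x_j \le x'_j \le (1+\delta) x_j$ for all $j$, then $\frac{1}{1+\rho\delta} q'_i \le q''_i \le (1+\rho\delta) q'_i$, where the constant $\rho$ is as defined in Lemma 2. By Lemma 1, $x'_i = \min\{U_i, \max\{L_i, q'_i\}\}$ and $x''_i = \min\{U_i, \max\{L_i, q''_i\}\}$. In order to prove $\frac{1}{1+\rho\delta} x'_i \le x''_i$, we consider three cases:

Case 1) $q'_i < L_i$. Then $x'_i = L_i$. So $\frac{1}{1+\rho\delta} x'_i = \frac{1}{1+\rho\delta} L_i < L_i \le x''_i$.

Case 2) $q''_i > U_i$. Then $x''_i = U_i$. So $\frac{1}{1+\rho\delta} x'_i \le \frac{1}{1+\rho\delta} U_i < U_i = x''_i$.

Case 3) $q'_i \ge L_i$ and $q''_i \le U_i$. Then $q'_i \ge L_i$ implies $x'_i \le q'_i$, and $q''_i \le U_i$ implies $q''_i \le x''_i$. So $\frac{1}{1+\rho\delta} x'_i \le \frac{1}{1+\rho\delta} q'_i \le q''_i \le x''_i$.

Similarly, by considering the cases $q'_i > U_i$, $q''_i < L_i$, and ($q'_i \le U_i$ and $q''_i \ge L_i$), we can prove $x''_i \le (1+\rho\delta) x'_i$. As a result, $\frac{1}{1+\rho\delta} \le \frac{x''_i}{x'_i} \le 1 + \rho\delta$.

The following two lemmas give bounds on the changes of segment widths after each iteration of GWSA. Let $x^{(0)} = (x^{(0)}_1, x^{(0)}_2, \ldots, x^{(0)}_n)$ be the starting wire-sizing solution, and for $t \ge 1$, let $x^{(t)} = (x^{(t)}_1, x^{(t)}_2, \ldots, x^{(t)}_n)$ be the wire-sizing solution just after iteration $t$ of GWSA.

Lemma 4. For any $t \ge 0$ and $\delta > 0$, if $\frac{1}{1+\delta} \le \frac{x^{(t+1)}_i}{x^{(t)}_i} \le 1 + \delta$ for all $i$, then $\frac{1}{1+\rho\delta} \le \frac{x^{(t+2)}_i}{x^{(t+1)}_i} \le 1 + \rho\delta$ for all $i$, for some constant $0 < \rho < 1$.

Proof: Assume without loss of generality that the wire segments are indexed in such a way that a top-down traversal of $T$ is in the order $W_1, W_2, \ldots, W_n$. The lemma can be proved by induction on $i$.

Base case: Consider the wire segment $W_1$. At iteration $t+1$, before the local re-sizing of $W_1$, the wire-sizing solution is $(x^{(t)}_1, x^{(t)}_2, \ldots, x^{(t)}_n)$.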
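At iteration $t+2$, before the local re-sizing of $W_1$, the wire-sizing solution is $(x^{(t+1)}_1, x^{(t+1)}_2, \ldots, x^{(t+1)}_n)$. Since $\frac{1}{1+\delta} \le \frac{x^{(t+1)}_i}{x^{(t)}_i} \le 1 + \delta$ for all $i$, by Lemma 3, we have $\frac{1}{1+\rho\delta} \le \frac{x^{(t+2)}_1}{x^{(t+1)}_1} \le 1 + \rho\delta$.

Induction step: Assume that the induction hypothesis is true for $i = 1, \ldots, k$. At iteration $t+1$, before the local re-sizing of $W_{k+1}$, the wire-sizing solution is $(x^{(t+1)}_1, \ldots, x^{(t+1)}_k, x^{(t)}_{k+1}, \ldots, x^{(t)}_n)$. At iteration $t+2$, before the local re-sizing of $W_{k+1}$, the wire-sizing solution is $(x^{(t+2)}_1, \ldots, x^{(t+2)}_k, x^{(t+1)}_{k+1}, \ldots, x^{(t+1)}_n)$. By the induction hypothesis, $\frac{1}{1+\rho\delta} \le \frac{x^{(t+2)}_i}{x^{(t+1)}_i} \le 1 + \rho\delta$, and hence $\frac{1}{1+\delta} \le \frac{x^{(t+2)}_i}{x^{(t+1)}_i} \le 1 + \delta$ (as $\rho < 1$) for $i = 1, \ldots, k$. Also, it is given that $\frac{1}{1+\delta} \le \frac{x^{(t+1)}_i}{x^{(t)}_i} \le 1 + \delta$ for $i = k+1, \ldots, n$. So by Lemma 3, $\frac{1}{1+\rho\delta} \le \frac{x^{(t+2)}_{k+1}}{x^{(t+1)}_{k+1}} \le 1 + \rho\delta$. Hence the lemma is proved.

Let $\delta = \max_{1 \le i \le n} \Big\{ \frac{U_i - L_i}{L_i} \Big\}$.

Lemma 5. For any $t \ge 0$, $\frac{1}{1+\rho^t\delta} \le \frac{x^{(t+1)}_i}{x^{(t)}_i} \le 1 + \rho^t\delta$ for all $i$, for some constant $0 < \rho < 1$.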

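Proof: This can be proved by induction on $t$.

Base case: Consider $t = 0$. Note that for any wire-sizing solution $x = (x_1, \ldots, x_n)$, $L_i \le x_i \le U_i$ for all $i$. So

$$\frac{x^{(1)}_i}{x^{(0)}_i} \le \frac{U_i}{L_i} = 1 + \frac{U_i - L_i}{L_i} \le 1 + \delta.$$

Similarly, we can prove that $\frac{x^{(1)}_i}{x^{(0)}_i} \ge \frac{1}{1+\delta}$ for all $i$.

Induction step: Assume that the induction hypothesis is true for $t$. Therefore, $\frac{1}{1+\rho^t\delta} \le \frac{x^{(t+1)}_i}{x^{(t)}_i} \le 1 + \rho^t\delta$ for all $i$. So by Lemma 4 (applied with $\rho^t\delta$ in place of $\delta$), $\frac{1}{1+\rho^{t+1}\delta} \le \frac{x^{(t+2)}_i}{x^{(t+1)}_i} \le 1 + \rho^{t+1}\delta$ for all $i$. Hence the lemma is proved.

Theorem 1. GWSA always converges to the optimal wire-sizing solution for any starting solution.

Proof: For any constant $0 < \rho < 1$, $1 + \rho^t\delta \to 1$ as $t \to \infty$, and the terms $\rho^t\delta$ form a convergent geometric series. So by Lemma 5, the algorithm GWSA always converges for any starting wire-sizing solution. It was proved in [2] that if GWSA converges, then the resulting wire-sizing solution is optimal. So the theorem follows.

Let $x^* = (x^*_1, x^*_2, \ldots, x^*_n)$ be the optimal wire-sizing solution. The following lemma proves that the convergence rate of GWSA is linear with convergence ratio $\rho$.

Lemma 6. For any $t \ge 0$, $\dfrac{|x^{(t)}_i - x^*_i|}{x^*_i} \le \dfrac{\delta(1+\delta)\rho^t}{1-\rho}$ for all $i$.

Proof: For any $t \ge 0$ and for any $i$, we consider two cases.

Case 1) $\frac{(1+\delta)\rho^t}{1-\rho} \ge 1$. Then

$$\frac{x^{(t)}_i}{x^*_i} \le \frac{U_i}{L_i} \le 1 + \delta \le 1 + \frac{\delta(1+\delta)\rho^t}{1-\rho}.$$

Similarly, we can prove $\frac{x^{(t)}_i}{x^*_i} \ge 1 \Big/ \Big( 1 + \frac{\delta(1+\delta)\rho^t}{1-\rho} \Big)$.

Case 2) $\frac{(1+\delta)\rho^t}{1-\rho} < 1$. By Theorem 1, $x^{(k)}_i \to x^*_i$ as $k \to \infty$, so by Lemma 5,

$$\frac{x^{(t)}_i}{x^*_i} = \prod_{k=t}^{\infty} \frac{x^{(k)}_i}{x^{(k+1)}_i} \le \prod_{k=t}^{\infty} (1 + \rho^k\delta),$$

and

$$\ln \prod_{k=t}^{\infty} (1 + \rho^k\delta) = \sum_{k=t}^{\infty} \ln(1 + \rho^k\delta) = \sum_{k=t}^{\infty} \Big( \rho^k\delta - \frac{\rho^{2k}\delta^2}{2} + \frac{\rho^{3k}\delta^3}{3} - \cdots \Big) \quad (5)$$

$$= \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j} \cdot \frac{\delta^j \rho^{tj}}{1 - \rho^j} \le \sum_{j=1}^{\infty} \frac{1}{j} \Big( \frac{\delta\rho^t}{1-\rho} \Big)^j \quad (6)$$

$$= \ln \frac{1}{1 - \frac{\delta\rho^t}{1-\rho}} \quad (7)$$

where (5) is because $\ln(1+x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots$ for $0 < x < 1$, (6) is because $0 < \rho < 1$, which implies $0 < (1-\rho)^j \le 1 - \rho \le 1 - \rho^j$ for $j \ge 1$, and (7) is because $0 < \frac{\delta\rho^t}{1-\rho} < \frac{(1+\delta)\rho^t}{1-\rho} < 1$ and $\ln\frac{1}{1-x} = x + \frac{1}{2}x^2 + \frac{1}{3}x^3 + \cdots$ if $0 < x < 1$. So

$$\frac{x^{(t)}_i}{x^*_i} \le \frac{1}{1 - \frac{\delta\rho^t}{1-\rho}} = 1 + \frac{\frac{\delta\rho^t}{1-\rho}}{1 - \frac{\delta\rho^t}{1-\rho}} \le 1 + \frac{\frac{\delta\rho^t}{1-\rho}}{1 - \frac{\delta}{1+\delta}} = 1 + \frac{\delta(1+\delta)\rho^t}{1-\rho},$$

where the last inequality uses $\frac{\delta\rho^t}{1-\rho} < \frac{\delta}{1+\delta}$, which is the condition of this case. By a symmetric argument, $\frac{x^{(t)}_i}{x^*_i} \ge 1 \Big/ \Big( 1 + \frac{\delta(1+\delta)\rho^t}{1-\rho} \Big)$.

Therefore, for both cases,

$$\frac{1}{1 + \frac{\delta(1+\delta)\rho^t}{1-\rho}} \le \frac{x^{(t)}_i}{x^*_i} \le 1 + \frac{\delta(1+\delta)\rho^t}{1-\rho}.$$

It is easy to see that $1 - \frac{\delta(1+\delta)\rho^t}{1-\rho} \le 1 \Big/ \Big( 1 + \frac{\delta(1+\delta)\rho^t}{1-\rho} \Big)$. So for any $t \ge 0$ and for all $i$,

$$\frac{|x^{(t)}_i - x^*_i|}{x^*_i} \le \frac{\delta(1+\delta)\rho^t}{1-\rho}.$$

Since the convergence rate of GWSA is linear and the run time of each GWSA iteration is $O(n)$, we have the following theorem.

Theorem 2. The total run time of GWSA for any starting solution is $O(n \log(1/\epsilon))$, where $\epsilon$ specifies the precision of the final wire-sizing solution (i.e. for the optimal solution $x^*$, the final solution $x$ satisfies $|x_i - x^*_i| / x^*_i \le \epsilon$ for all $i$).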
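Proof: By Lemma 6, for any $t \ge 0$ and for all $i$,

$$\frac{|x^{(t)}_i - x^*_i|}{x^*_i} \le \frac{\delta(1+\delta)\rho^t}{1-\rho}.$$

In order to guarantee that $|x^{(t)}_i - x^*_i| / x^*_i \le \epsilon$ for all $i$, it suffices that the number of iterations $t$ satisfies

$$\frac{\delta(1+\delta)\rho^t}{1-\rho} \le \epsilon, \quad \text{or equivalently,} \quad t \ge \log_{1/\rho} \frac{\delta(1+\delta)}{\epsilon(1-\rho)}.$$

In other words, at most $\Big\lceil \log_{1/\rho} \frac{\delta(1+\delta)}{\epsilon(1-\rho)} \Big\rceil$ iterations are enough. Since each iteration of GWSA takes $O(n)$ time, the total run time is $O(n \log(1/\epsilon))$.

Therefore, to obtain a solution with any fixed precision, only a constant number of GWSA iterations are needed. This implies that the run time of GWSA is $O(n)$. In practice, even for very accurate solutions, GWSA usually takes only a few iterations. So, as we demonstrate in the next section, GWSA is very efficient in practice.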
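The algorithm analyzed in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-segment tree and every numeric value are made up, segments are indexed in top-down order so that a parent always precedes its children, and the names `static_terms`, `downstream_cap`, `weighted_delay` and `gwsa` are hypothetical. The stopping rule (largest relative width change at most `eps`) is also an assumption standing in for "until no improvement" in step S5.

```python
import math

# Hypothetical routing tree: segment i hangs off segment parent[i]
# (None = attached to the driver). Indices follow a top-down order,
# so parent[i] < i. All numbers are illustrative, not from the paper.
parent = [None, 0, 0, None, 3]
sink_cap = {1: 2.0, 2: 1.0, 4: 3.0}   # c^s of the sink at each leaf segment
sink_wt = {1: 0.5, 2: 0.2, 4: 0.3}    # lambda of that sink; weights sum to 1
r_hat = [1.0, 2.0, 2.0, 1.5, 1.0]     # unit-width wire resistance
c_hat = [1.0, 1.0, 1.0, 1.0, 1.0]     # unit-width wire area capacitance
f = [0.2, 0.1, 0.1, 0.2, 0.1]         # wire fringing capacitance
L = [0.5] * 5                         # lower width bounds
U = [4.0] * 5                         # upper width bounds
R_D = 1.0                             # driver resistance
n = len(parent)
children = [[k for k in range(n) if parent[k] == i] for i in range(n)]


def static_terms():
    """S2: bottom-up pass for mu_i and C^f_i (independent of the widths)."""
    mu, cf = [0.0] * n, [0.0] * n
    for i in reversed(range(n)):
        if children[i]:
            mu[i] = sum(mu[k] for k in children[i])
            cf[i] = sum(f[k] + cf[k] for k in children[i])
        else:
            mu[i], cf[i] = sink_wt[i], sink_cap[i]
    return mu, cf


def downstream_cap(x):
    """S3: bottom-up pass for the downstream area capacitance C_i(x)."""
    C = [0.0] * n
    for i in reversed(range(n)):
        C[i] = sum(c_hat[k] * x[k] + C[k] for k in children[i])
    return C


def weighted_delay(x):
    """Weighted Elmore delay D(x), with the path sums regrouped by segment."""
    mu, cf = static_terms()
    C = downstream_cap(x)
    total_cap = sum(c_hat[i] * x[i] + f[i] for i in range(n)) + sum(sink_cap.values())
    wire = sum(mu[i] * r_hat[i] / x[i] * ((c_hat[i] * x[i] + f[i]) / 2 + C[i] + cf[i])
               for i in range(n))
    return R_D * total_cap + wire


def gwsa(x, eps=1e-10, max_iter=1000):
    """Repeat S3 and S4 until no width changes by more than a relative eps."""
    mu, cf = static_terms()
    x = list(x)
    for it in range(1, max_iter + 1):
        C = downstream_cap(x)
        R = [0.0] * n
        change = 0.0
        for i in range(n):                      # S4: top-down pass
            p = parent[i]
            if p is not None:
                R[i] = R[p] + mu[p] * r_hat[p] / x[p]
            q = math.sqrt(mu[i] * r_hat[i] * (C[i] + f[i] / 2 + cf[i])
                          / (c_hat[i] * (R[i] + R_D)))
            new = min(U[i], max(L[i], q))       # clamp as in Lemma 1
            change = max(change, abs(new - x[i]) / x[i])
            x[i] = new
        if change <= eps:
            return x, it
    return x, max_iter


x_min, it_min = gwsa(L)                              # all widths at minimum
x_max, it_max = gwsa(U)                              # all widths at maximum
x_mid, it_mid = gwsa([1.7, 0.9, 3.1, 2.2, 0.6])      # arbitrary start
```

Running the three calls and comparing `x_min`, `x_max` and `x_mid` is a quick way to observe the behavior stated in Theorem 1 on a toy instance: the starting solution affects only the iteration count, not the solution reached.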

4. EXPERIMENTAL RESULTS

In this section, we demonstrate the linearity of the run time of GWSA in practice and the use of better starting solutions to speed up the optimization of other objectives using Lagrangian relaxation. We run the algorithm GWSA on an IBM PC with a 200 MHz Pentium Pro processor.

Figure 3 shows the linearity of the run time of GWSA. We use the clock trees r1 to r5 in [12]. The number of segments in these trees ranges from 533 to 6201. In order to have more data points, we construct 10 trees from each tree by dividing each tree edge into $k$ segments, where $k = 1, \ldots, 10$. So we have 50 trees with the number of segments ranging from 533 to 62010. For each tree, we run GWSA with $\epsilon$ equal to $10^{-5}$. The run time is plotted against the number of segments in Figure 3. It can be seen that the run time of GWSA is linear in practice.

[Figure 3. Run time of GWSA versus number of segments. Vertical axis: CPU time (s), 0.00 to 4.50. Horizontal axis: number of segments ($\times 10^3$), 0.00 to 60.00.]

To demonstrate the usefulness of being able to use any starting wire-sizing solution, we solve the problem of minimizing the maximum sink delay of the clock trees r1 to r5. This problem is reduced by Lagrangian relaxation to a sequence of weighted sink delay problems. Previously, before solving each weighted sink delay problem, all segments were reset to their minimum possible widths to form the starting solution of GWSA. Our result implies that GWSA will still converge even if we do not reset the segment widths. So in our new approach, we do not reset, and therefore the optimal solution of a weighted sink delay problem is used as a better starting solution for the next one in the sequence. The run times of the previous approach and our new approach are listed in Table 1. For the old approach, each weighted sink delay problem takes about 4 iterations of GWSA. For our approach, each weighted sink delay problem takes only 1.6 iterations of GWSA on average. The overall improvement in run time is 57.7% on average.

Table 1. Demonstration of the usefulness of being able to use any starting solution. The run times for the old approach (reset to minimum width before each call to GWSA) and our new approach (do not reset) are listed.

Circuit          CPU time (s)
Name    Size     Old approach    Our approach    Improv.
r1      533      1.95            0.88            54.9%
r2      1195     7.85            3.32            57.7%
r3      1723     11.97           5.09            57.5%
r4      3805     55.34           22.54           59.3%
r5      6201     71.59           29.4            58.9%
Average                                          57.7%

ACKNOWLEDGMENT

We thank Chung-Ping Chen for his help in carrying out the experiments.

REFERENCES

[1] Chung-Ping Chen, Yao-Wen Chang, and D. F. Wong. Fast performance-driven optimization for buffered clock trees based on Lagrangian relaxation. In Proc. IEEE Intl. Conf. on Computer-Aided Design, pages 405-408, 1996.
[2] Chung-Ping Chen and D. F. Wong. A fast algorithm for optimal wire-sizing under Elmore delay model. In Proc. IEEE ISCAS, volume 4, pages 412-415, 1996.
[3] Chung-Ping Chen, Hai Zhou, and D. F. Wong. Optimal non-uniform wire-sizing under the Elmore delay model. In Proc. IEEE Intl. Conf. on Computer-Aided Design, pages 38-43, 1996.
[4] Chris C. N. Chu, Chung-Ping Chen, and D. F. Wong. Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation. Technical Report TR98-06, Department of Computer Sciences, University of Texas at Austin, February 1998.
[5] Jason Cong and Lei He. An efficient approach to simultaneous transistor and interconnect sizing. In Proc. IEEE Intl. Conf. on Computer-Aided Design, pages 181-186, 1996.
[6] Jason Cong and Lei He. Optimal wiresizing for interconnects with multiple sources. ACM Trans. Design Automation of Electronic Systems, 1(4), October 1996.
[7] Jason Cong, Lei He, Cheng-Kok Koh, and Patrick H. Madden. Performance optimization of VLSI interconnect layout. INTEGRATION, the VLSI Journal, 21:1-94, 1996.
[8] Jason Cong and Cheng-Kok Koh. Simultaneous driver and wire sizing for performance and power optimization. In Proc. IEEE Intl. Conf. on Computer-Aided Design, pages 206-212, 1994.
[9] Jason Cong, Cheng-Kok Koh, and Kwok-Shing Leung. Simultaneous buffer and wire sizing for performance and power optimization. In Proc. Intl. Symp. on Low Power Electronics and Design, pages 271-276, August 1996.
[10] Jason Cong and Kwok-Shing Leung. Optimal wiresizing under the distributed Elmore delay model. In Proc. IEEE Intl. Conf. on Computer-Aided Design, pages 634-639, 1993.
[11] W. C. Elmore. The transient response of damped linear networks with particular regard to wideband amplifiers. J. Applied Physics, 19:55-63, 1948.
[12] R. S. Tsay. An exact zero-skew clock routing algorithm. IEEE Trans. Computer-Aided Design, 12(2):242-249, February 1993.