
A Novel Metric for Interconnect Architecture Performance

Parthasarathi Dasgupta, Andrew B. Kahng, and Swamy Muddu
CSE Department, UCSD, La Jolla, CA 92093-0114
ECE Department, UCSD, La Jolla, CA 92093-0407
partha@cs.ucsd.edu, {abk, smuddu}@ucsd.edu

Abstract

We propose a new metric for evaluation of interconnect architectures. This metric is computed by optimal assignment of wires from a given wire length distribution (WLD) to a given interconnect architecture (IA). This new metric, the rank of an IA, is a single number that gives the number of connections in the WLD that meet a specific target delay when embedded in the IA. A dynamic programming algorithm is presented to exactly compute the rank of an IA with respect to a given WLD within practical runtimes. We use our new IA metric to quantitatively compare impacts of geometric parameters as well as process and material technology advances. For example, we observe that a 42% reduction in Miller coupling factor achieves the same rank improvement as a 38% reduction in inter-layer dielectric permittivity for a 1M gate design in the 130nm technology.

1 Introduction

Performance evaluation of interconnect architectures (IA) is typically made with respect to delay, crosstalk noise, number of interconnection layers and congestion. These factors are often studied with respect to global lines, which are critical to meet performance requirements. However, such studies often fail to consider factors such as via blockage and repeater insertion in semi-global and local layers¹. Previous IA quality measures are also typically independent of design parameters (e.g., Rent parameter or wire length distribution) and do not permit quantified comparison of different types of IA improvements (materials, dimensions, etc.). In this paper, we propose a novel metric for interconnect architecture performance which returns a single number for a given IA and wire length distribution (WLD). Our metric is the rank of the IA with respect to a WLD.
The metric is computed by optimal assignment of connections (and repeaters from a fixed repeater resource) from the WLD to the IA with the objective of maximizing the number of longest wires that meet their clock-frequency-dependent target delays.

The rest of the paper is organized as follows. Section 2 describes the previous works on performance evaluation and optimization of IAs. Section 3 introduces the proposed metric and Section 4 gives a dynamic programming (DP) algorithm for computation of rank by optimal assignment of wires. Section 5 describes our experimental setup and results for rank-based comparison of various IA improvements. Section 6 provides conclusions and future research directions.

This work was partially supported by Cadence Design Systems, Inc. and the MARCO Gigascale Silicon Research Center. Parthasarathi Dasgupta's permanent address: Dept. of Management Information Systems, Indian Institute of Management. Email: partha@iimcal.ac.in

¹ Via blockage effect decreases the total wiring area available in a layer. Repeater insertion in global layers increases the via blockage in local layers. These two effects thus increase the number of layers required for routing a design and must be taken into account during IA evaluation.

2 Related Works

Performance evaluation of IAs has been extensively studied in recent years. [6] presents a study of scaling interconnect parameters on delay and signal integrity. They perform delay and noise analysis of global lines over different technology nodes with varying critical line lengths. Similar studies in [10] and [2] study the effect of changing geometric parameters and technology constraints on delay, crosstalk and signal integrity. However, these works do not consider local and semi-global lines in performance measurement. [11] evaluates delay of global lines and prescribes design techniques for improving delay. [5] also computes the delay of global lines, but considers repeater insertion to minimize total delay.
In [1] and [13], geometric parameters of interconnect lines in local, semi-global and global layers are optimized to increase routing density while minimizing area and delay. [13] gives an optimal top-to-bottom design methodology for minimizing delay, number of wiring layers, and area. Repeaters are inserted to minimize the delay of global lines within a given repeater area resource. In most of these recent studies, effects of via blockage and repeater insertion on design are not considered when measuring performance. In [7] and [3], these factors are shown to strongly affect the number of layers needed to achieve a given IA quality. Repeater insertion is also shown to severely limit the delay performance of an IA in [13]. We take these factors into consideration when defining a new IA metric.

3 A New Metric for Interconnect Architecture Evaluation

Ultimately, quality of an IA should reflect how well the IA allows designers to meet both performance requirements and manufacturing constraints. We seek a quality metric which is simple, efficiently computable, design-dependent, frequency-dependent and sensitive to interconnect geometric parameters as well as material properties. Given an interconnect architecture and a wire length distribution², our new metric determines the quality of the IA by optimal assignment of wires from the WLD to the IA subject to the constraints that (i) longer wires are assigned to higher layers and (ii) longer wires are buffered first to meet their target delays. The quality of an IA is determined by the rank of the first wire assignment that fails to meet its target delay (Definition 2).

Definition 1 The rank of a wire is its index in the wire length distribution (WLD), where the wires have been arranged in order of non-increasing lengths.

Definition 2 The rank of an interconnect architecture $\alpha$, denoted by $r(\alpha)$, is a non-negative integer, and is given by the index of the highest-rank wire in the WLD that fails to meet its target delay, within a specified repeater area budget, subject to the condition that all the wires of the WLD can be assigned in the given architecture.
Definition 3 An interconnect architecture $\alpha$ has a rank $r(\alpha) = 0$ if not all the wires in the given WLD can be assigned to its layer-pairs even without meeting the delay requirements.

² The WLD used in this study is the stochastic wire length distribution of [4].

The assumptions made for rank computation are as follows (see Figure 1).

- The given IA is characterized by layer-pairs. All wires in a layer-pair have identical values of width and thickness. The spacing between any two adjacent wires in a given layer-pair and the height of the inter-layer dielectric (ILD) between any two consecutive layer-pairs are constants.
- All wires in the architecture are L-shaped. Each segment of an L-shaped wire is routed in one layer of a layer-pair. Via area for the L, and for the ends of the L segments, is computed as a part of the wire.
- Longer wires are routed on upper layer-pairs and shorter wires are routed on lower layer-pairs.
- The maximum area available for repeaters is specified as a percentage of total die area³. All gates are placed evenly in the entire die.
- Repeaters used in all wires of a given layer-pair are of uniform size.
- Repeaters are inserted starting from the longer wires and proceeding to the shorter wires.

Then, the problem of computing the rank of an IA can be well-defined as follows:

Input: (see Table 1 for notation)
- Interconnect architecture $\alpha$ with fixed number of layer-pairs and fixed values of width, spacing, height, and thickness per layer.
- WLD $w$ containing $n$ wires.
- Available repeater area $A_R$.
- Upper bounds $d_i$ on the maximum permissible delay of wire $i$, $1 \le i \le n$, in the WLD $w$.

Objective: Assign wires from the given WLD to the layer-pairs of the given architecture $\alpha$, and insert repeaters, such that $r(\alpha)$ is maximum.

Figure 1: Longer wires are assigned to higher layer-pairs. Shorter wires are assigned to lower layer-pairs. Repeaters are inserted in longer wires first to meet the target delay requirements. (The figure shows layer-pair $j$, layer-pair $j+1$, and the repeaters.)

4 Rank Computation using Dynamic Programming

To compute the rank of the IA, a maximum number of wires should be assigned to its layer-pairs, satisfying delay requirements. To achieve this, we require an optimal combination of wires assigned to layer-pairs, repeaters inserted in the wires, and vias. Such an optimal combination is not guaranteed by greedy top-down assignment of wires to layer-pairs with repeater insertion.
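To make Definitions 1 to 3 concrete, the following minimal Python sketch orders wires by rank and counts how many of the longest wires meet their target delays for one fixed, already evaluated assignment. It follows the reading of rank used in the abstract (the number of longest wires that meet their targets); the function names `rank_order` and `architecture_rank` and the boolean-list input are assumptions made for illustration, not part of the paper.

```python
def rank_order(lengths):
    """Return wire indices sorted by non-increasing length (Definition 1).

    Position 0 of the result is the rank-1 (longest) wire.
    """
    return sorted(range(len(lengths)), key=lambda k: lengths[k], reverse=True)


def architecture_rank(meets_delay, all_wires_fit):
    """Count the longest wires that meet their target delay.

    meets_delay[k] is True if the wire of rank k+1 meets its target delay
    in the assignment being evaluated (hypothetical input format).
    all_wires_fit is False when the WLD cannot be embedded at all, in
    which case the rank is 0 (Definition 3).
    """
    if not all_wires_fit:
        return 0
    rank = 0
    for ok in meets_delay:
        if not ok:
            break  # first wire that misses its target ends the count
        rank += 1
    return rank
```

For example, `architecture_rank([True, True, False, True], True)` counts only the first two wires, because the third-longest wire misses its target.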
³ In the current version of our implementation, we do not reconcile implied driver and receiver sizing with total gate area budget. However, the DP algorithm can be extended to address this.

Table 1: Table of notations.
Notation | Description
$\alpha$ | Architecture with $m$ layer-pairs, fixed width, spacing, and thickness
$\eta_i$ | Repeater count in wire $i$
$\nu$ | Number of vias contributed by a wire in $\alpha$
$\tau_j$ | Delay of wire segment between two consecutive repeaters in layer-pair $j$
$a$ | Switching constant of repeater
$A_d$ | Die area
$A_R$ | Maximum repeater area
$A_w^j$ | Total wiring area in layer-pair $j$
$A_v^j$ | Total area allocated in layer-pair $j$ for vias from wires assigned to layer-pairs $1, \ldots, j-1$
$A_u^j$ | Total area allocated in layer-pair $j$ for vias from repeaters used in layer-pairs $1, \ldots, j-1$
$b$ | Switching constant of repeater
$B_j$ | Available area for wire assignment in layer-pair $j$
$c_o$ | Input capacitance of minimum-sized inverter
$c_j$ | Capacitance per unit length of wire on layer-pair $j$
$C_L$ | Load capacitance
$c_p$ | Parasitic capacitance of transistor in driver
$d_i$ | Target delay of wire $i$ in WLD
$D_i$ | Delay of wire $i$
$f_c$ | Target clock frequency
$i$ | Index of wire in WLD
$i'$ | Index of a wire in WLD that meets target delay
$j$ | Index of layer-pairs in IA
$l_i$ | Length of wire $i$
$l_{max}$ | Maximum wire length
$m$ | Number of layer-pairs in IA $\alpha$
$M$ | Array storing feasibility of wire assignment
$n$ | Number of wires in WLD
$p$ | Variable representing wire index
$q$ | Variable representing layer-pair index
$r$ | Variable representing repeater area
$r_o$ | Output resistance of minimum-sized inverter
$r_j$ | Resistance per unit length of wire on layer-pair $j$
$s_j$ | Repeater size corresponding to layer-pair $j$
$S_j$ | Spacing between adjacent wires in layer-pair $j$
$s_j^{opt}$ | Optimal repeater size in layer-pair $j$
$v_a$ | Area of a via (obtained from process parameters)
$w$ | Wire length distribution
$W_j$ | Width of wire in layer-pair $j$
$z_r$ | Number of repeaters used for area $r$

Figure 2 shows a counterexample with all four wires to be assigned having equal length. RC delay of the upper layer-pair is much larger than that of the bottom layer-pair.
Greedy wire assignment assigns two wires to the upper layer-pair and repeaters to meet target delay, but this exhausts the repeater budget of eight repeaters. Wires assigned to the lower layer-pair thus fail to meet target delay. The optimal solution has rank 4 while the greedy solution has rank 2.

Figure 2: Suboptimality of greed. (a) shows the greedy wire assignment to two consecutive layer-pairs achieving rank = 2: wires in layer-pair $j$ meet target delay, wires in layer-pair $j+1$ fail to meet target delay. (b) shows the optimal wire assignment achieving rank = 4: the wire in layer-pair $j$ meets target delay, and wires in layer-pair $j+1$ meet target delay.

Exhaustive search over all possible layer-assignments and repeater configurations is impractical. We now give a dynamic programming (DP) algorithm that performs optimal rank computation in reasonable time. The DP runs in $m$ stages, where $m$ = total number of layer-pairs in the architecture. The problem of rank computation is considered as a collection of subproblems, where each subproblem is characterized by the four parameters (i) number of wires to be assigned, (ii) number of layer-pairs used for assigning the wires, (iii) repeater area used to satisfy the delay constraints, and (iv) number of wires assigned that meet delay constraints. Let $i$, $j$, $r$ and $i'$ respectively denote the elements of the 4-tuple in the order. A four-dimensional boolean array $M$ of cells is defined with dimensions corresponding to $i$, $j$, $r$ and $i'$. If $i$ wires can be assigned to $j$ layer-pairs, such that $i'$ wires meet their target delay using at most $r$ repeater area and the remaining $n - i$ wires can be assigned to $m - j$ layer-pairs (ignoring delay requirements), then the value of $M[i, j, r, i']$ is 1. If the assignment is infeasible, then $M[i, j, r, i']$ is 0. The DP populates the cells of $M$ according to the recurrence relation given by Equation (1), where $1 \le i, i', i_1, i_2 \le n$, $1 \le j \le m$, $1 \le r, r_1, r_2 \le A_R$, and $z_{r_1}$ and $z_{r_2}$ are the number of repeaters corresponding to repeater areas $r_1$ and $r_2$ respectively.

$$ M[i, j+1, r, i'] = \bigvee_{\substack{r_1 + r_2 \le r \\ i_1 + i_2 = i'}} \Big( \underbrace{M[i_1, j, r_1, i_1]}_{1} \wedge \underbrace{M'(i_1, j+1, z_{r_1}, r - r_1, r_2, i_2, i)}_{2} \wedge \underbrace{M''(n, i, m, j+1, z_{r_1} + z_{r_2})}_{3} \Big) \tag{1} $$

The definitions of the terms 1, 2, 3 in Equation (1) are as follows.

- $M[i_1, j, r_1, i_1]$ corresponds to previously computed entries of $M$.
- $M'(i_1, j+1, z_{r_1}, r - r_1, r_2, i_2, i)$ indicates whether it is possible to assign wires $i_1 + 1, \ldots, i_1 + i_2$ meeting delay requirements to the $(j+1)$st layer-pair using at most $r - r_1$ repeater area, given that $i_1$ wires have already been assigned to layer-pairs $1, \ldots, j$ using $r_1$ repeater area, and also that $i - i_1 - i_2$ wires fit into layer-pair $j+1$, ignoring the delay constraints. $r_2 \le r - r_1$ denotes the actual repeater area used for assigning $i_2$ wires to the $(j+1)$st layer-pair. $z_{r_1}$ is used to compute the via area used in the $(j+1)$st layer-pair due to $r_1$ repeater area used in layer-pairs $1, \ldots, j$. $M'(.)$ is 1 if the assignment is feasible, and 0 otherwise.
- $M''(n, i, m, j+1, z_{r_1} + z_{r_2})$ indicates whether it is possible to assign $(n - i)$ wires to the (remaining) last $(m - (j+1))$ layer-pairs ignoring the delay requirements, given that $r_1 + r_2$ repeater area has been used in wires in layer-pairs $1, \ldots, j+1$. $z_{r_1} + z_{r_2}$ is the repeater count corresponding to $r_1 + r_2$ repeater area used in layer-pairs $1, \ldots, j+1$. $M''(.)$ is 1 if the assignment is feasible, and 0 otherwise.

For rank computation, we need to find the maximum value of $i'$ for which $M[i, j, A_R, i']$ is 1 for $1 \le i \le n$ and $1 \le j \le m$.

Algorithm 1: Rank computation
Input: number of wires $n$, number of layer-pairs $m$, maximum repeater area $A_R$
Output: Rank of architecture $r(\alpha)$
1. Initialize M($n$, $A_R$)
2. update M($n$, $m$, $A_R$) // this is the key step in rank computation
3. for $j = m$ to 1
4.   for $i = n$ to 1
5.     for $i' = i$ to 1
6.       if $M[i, j, A_R, i'] == 1$ then
7.         return($i'$)
8. return(0)
Figure 3: Algorithm for computation of IA rank.

Algorithm 2: Initialize M
Input: number of wires $n$, maximum repeater area $A_R$
Output: Initialized Boolean Array $M$
1. for $i = 1$ to $n$
2.   for $r = 1$ to $A_R$
3.     if $M'(0, 1, 0, r, r_2, i', i) == 1$ and $M''(n, i, m, 1, z_{r_2}) == 1$ then $M[i, 1, r, i'] = 1$
4.     else $M[i, 1, r, i'] = 0$
Figure 4: Initialization of data structure $M$.

Algorithm 3: DP (update M)
Input: number of wires $n$, number of layer-pairs $m$, maximum repeater area $A_R$
Output: Boolean array $M$
1. for $i = 2$ to $n$
2.   for $j = 1$ to $m - 1$
3.     for $i' = 1$ to $i$
4.       for $r = 1$ to $A_R$
5.         $M[i, j+1, r, i'] = 0$
6.         for $i_1 = 1$ to $i$
7.           for $r_1 = 1$ to $r$
8.             $p = M[i_1, j, r_1, i_1] \wedge M'(i_1, j+1, z_{r_1}, r - r_1, r_2, i' - i_1, i) \wedge M''(n, i, m, j+1, z_{r_1} + z_{r_2})$
9.             $M[i, j+1, r, i'] = M[i, j+1, r, i'] \vee p$
10.            if $M[i, j+1, r, i'] == 1$ then goto step 4
Figure 5: Procedure for updating boolean array $M$.

The DP algorithm starts wire assignment from the topmost layer-pair and proceeds to the lower layer-pairs. Longer wires are assigned to the higher layer-pairs and shorter wires to the lower layer-pairs. In each iteration of the DP (Steps 8-10 in Figure 5), we compute a binary value that indicates the feasibility of wire assignment to a sequence of layer-pairs. Starting with the longest wire and the topmost layer-pair, we compute the feasibility of assigning $n$ wires to $m$ layer-pairs using at most $A_R$ repeater area. Wire assignment with repeater insertion in a specific layer-pair is performed by a function $M'(.)$. Wire assignment to a sequence of layer-pairs without considering delay requirements is performed by $M''(.)$. The algorithm starts with an initial set of values of the boolean array $M[i, j, r, i']$, for $1 \le i \le n$, $j = 1$, and $r = 1$ to $A_R$, set by the function Initialize M in Step 1 of Figure 4. At the first iteration, the value of $M[i, j+1, r, i']$ is computed for $r \ge 1$ and for $j = 1$. Subsequent iterations will compute the new values of $M[i, j+1, r, i']$ from the pre-computed values of $M[\,]$, and the values returned by $M'(.)$ and $M''(.)$.

The time complexities of the procedures given above are as follows. $M'(.)$ has a worst-case complexity of $O(n + A_R)$. $M''(.)$ has a worst-case complexity of $O(n)$. The function update M has a worst-case complexity of $O(m \cdot n^4 \cdot A_R^3)$. Overall, the worst-case time complexity of our algorithm for rank computation (Figure 3) is $O(m \cdot n^4 \cdot A_R^3)$.

4.1 Delay Computation and Repeater Insertion

Target delay for wire $i$ in the WLD is defined as $d_i = \frac{l_i}{l_{max}} \cdot \frac{1}{f_c}$, where $d_i$ represents the normalized (with respect to length) delay of the wire, $l_i$ is the length of wire $i$, and $l_{max}$ is the maximum wire length in WLD. Longer wires have a larger value of $d_i$ and hence have more stringent delay requirements than shorter wires. The actual delay $D_i$ of a wire depends on the layer-pair to which it is assigned. Repeater insertion is performed according to the following rules.

- Longer wires are buffered before shorter wires.
- Incremental insertion of repeaters is performed until target delay is met or repeaters cannot be placed at appropriate intervals for a wire.
- Repeaters inserted in all wires of a layer-pair are of uniform size. The optimal repeater size $s_j^{opt}$ is determined using constants $r_j$ and $c_j$ of a layer-pair. Thus, the number of repeater types is equal to the number of layer-pairs.
- The number and size of repeaters inserted in a wire depends upon the layer-pair⁴ to which the wire is assigned (as well as the wire length and delay constraints).

The delay of wire $i$ assigned to layer-pair $j$ is computed from the model of interconnect given in [15]. Specifically, delay of a wire segment of length $l$ between any two consecutive repeaters is given by [15]:

$$ \tau_j = b\, R_{tr} (C_L + c_p) + b\, (c_j R_{tr} + r_j C_L)\, l + a\, r_j c_j\, l^2 \tag{2} $$

where $R_{tr}$ is transistor equivalent resistance, $a$ and $b$ are constants⁵ that depend on the switching model of the repeater, and $C_L$ and $c_p$ are load and parasitic capacitances respectively. Also, $r_j$ and $c_j$ are determined completely by the wire width, spacing and thickness of a layer-pair. Repeater size $s_j$ is expressed as a multiple of the minimum inverter size. The size of the repeater required to minimize total wire delay is a function of wire parameters and is determined by $R_{tr} = r_o / s_j$ and $C_L = s_j \cdot c_o$, where $r_o$ is the output resistance and $c_o$ is the input capacitance of a minimum-sized inverter. On layer-pair $j$, the total delay of a wire of length $l_i$ with $\eta_i$ repeaters, each of size $s_j$, is given by the following equation [15]:

$$ D_i = \eta_i \tau_j = \eta_i \left[ b\, r_o (c_o + c_p) + b \left( \frac{c_j r_o}{s_j} + r_j c_o s_j \right) \frac{l_i}{\eta_i} + a\, r_j c_j \frac{l_i^2}{\eta_i^2} \right] = \eta_i\, b\, r_o (c_o + c_p) + b \left( \frac{r_o c_j}{s_j} + r_j c_o s_j \right) l_i + \frac{a\, r_j c_j\, l_i^2}{\eta_i} \tag{3} $$

To make $D_i \le d_i$, we insert $\eta_i$ repeaters each of size $s_j$ in wire $i$. A closed form solution for $\eta_i$ and $s_j$ cannot be obtained by solving $D_i \le d_i$. Instead, we (i) determine optimum repeater size $s_j$ for layer-pair $j$ to minimize delay [14] and (ii) insert repeaters⁶ of size $s_j$ incrementally in a wire until $D_i \le d_i$.
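The delay model and the incremental insertion rule can be illustrated with a short Python sketch. It evaluates the total wire delay in the form of Equation (3) and increases the stage count until the target is met. The function names `wire_delay` and `stages_to_meet`, the `max_stages` cap, and the treatment of $\eta$ as a stage count starting at 1 (the driver counted as the first stage) are assumptions for illustration; the defaults $a = 0.4$ and $b = 0.7$ are the values the paper attributes to [15].

```python
def wire_delay(length, stages, size, r_j, c_j, r_o, c_o, c_p, a=0.4, b=0.7):
    """Total delay of a wire split into `stages` equal segments, per Eq. (3).

    `size` is the repeater size s_j as a multiple of the minimum inverter.
    """
    driver = stages * b * r_o * (c_o + c_p)            # per-stage switching term
    linear = b * (c_j * r_o / size + r_j * c_o * size) * length
    distributed = a * r_j * c_j * length ** 2 / stages  # distributed RC term
    return driver + linear + distributed


def stages_to_meet(length, target, size, max_stages, **wire):
    """Smallest stage count whose delay is at most `target`, else None.

    Mirrors incremental insertion: add one stage at a time and stop as
    soon as the target delay is met. `max_stages` is a hypothetical cap
    standing in for "repeaters cannot be placed at appropriate intervals".
    """
    for stages in range(1, max_stages + 1):
        if wire_delay(length, stages, size, **wire) <= target:
            return stages
    return None
```

Because the first term grows with the stage count while the last term shrinks, the delay in this model is not monotone in the number of stages, so a sufficiently tight target can be unreachable for any stage count.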
Optimum repeater size required to minimize total delay $D_i$ is obtained by setting $\partial D_i / \partial s_j = 0$ and is given by

$$ s_j^{opt} = \sqrt{\frac{c_j\, r_o}{c_o\, r_j}} \tag{4} $$

To compute wire area available for routing in a layer-pair, wire count and the number of repeaters inserted in the wires are required. We compute the number of repeaters corresponding to a repeater area as the ratio of repeater area to the repeater size.

$$ z_{r_1} = \frac{r_1}{s_j} \tag{5} $$

In Subsections 4.2 and 4.3, we describe the evaluation of $M'(.)$ and $M''(.)$. In these procedures, Equation (5) is used to obtain the number of vias to be allocated in a given layer-pair for repeaters already inserted in higher layer-pairs.

⁴ Resistance per unit length, capacitance per unit length, and ground capacitance depend on parameters of the layer-pair.
⁵ $a = 0.4$ and $b = 0.7$ for wire delay computation [15].
⁶ In this work we assume uniform size repeaters in all wires of a layer-pair.

4.2 Wire Assignment to Layer-Pair With Delay Requirements

We now explain key aspects of the procedure wire_assign for computing $M'(.)$ in the Equation (1) recurrence. This procedure returns a boolean value indicating the feasibility of assignment of wires to a layer-pair considering delay requirements.

Algorithm 4: Wire assignment (with delay requirements) wire_assign
Input: number of wires $i_1$ above layer-pair $j$, current layer-pair $j$, number of repeaters $z_{r_1}$ used for wires in layer-pairs $1, \ldots, j-1$, repeater area $r_3$ available for assignment in layer-pair $j$, number of wires $i_2$ required to meet target delay in layer-pair $j$, total number of wires $i$ to be assigned up to current layer-pair, die area $A_d$, repeater size $s_j^{opt}$, WLD $w$, target delay $d$ of wires.
Output: Boolean value $M'(i_1, j, z_{r_1}, r_3, r_2, i_2, i)$, $r_2$ = repeater area actually used in current layer-pair $j$
1. $B_j = A_d - A_v^{j-1} - A_u^{j-1}$ // $A_v^{j-1}$, $A_u^{j-1}$ are computed from $i_1$ and $z_{r_1}$ respectively
2. $p = i_1 + 1$
3. while $p \le i_1 + i_2$
4.   wire area = $l_p \cdot (W_j + S_j)$
5.   if (wire area $\le B_j$) then goto step (6) else return(0)
6.   assign wire $p$ to layer-pair $j$
7.   $B_j = B_j$ - wire area
8.   while ($D_p > d_p$ AND repeater area $< r_3$)
9.     compute $D_p$
10.    repeater area = repeater area + $s_j^{opt}$
11.  if (repeater area == $r_3$) then return(0)
12.  $p = p + 1$
13. if wires $i_1 + i_2 + 1, \ldots, i$ cannot be assigned then return(0)
14. return(1)
Figure 6: Algorithm for assignment of wires to single layer-pair considering delay requirements.

It assumes that $i_1$ wires are assigned to layer-pairs $1, \ldots, j-1$ meeting delay requirements using $z_{r_1}$ number of repeaters. $i - i_1$ wires are to be assigned to the current layer-pair ($j$), of which $i_2$ wires should meet the target delay within an available repeater area $r_3$. Initially, the available area ($B_j$) for assignment of wires in layer-pair $j$ is computed from die area $A_d$, via area ($A_v^{j-1}$) used by wires in layer-pairs $1, \ldots, j-1$, and via area ($A_u^{j-1}$) used by repeaters inserted in wires on the layer-pairs $1, \ldots, j-1$. Wires are assigned incrementally in the current layer-pair until either no more area is available for assignment of wires, or the number of wires assigned is equal to the specified count. The procedure returns 0 if the former condition is satisfied. For each wire assigned, its actual delay $D_i$ is computed, and is compared with its target delay $d_i$. If $D_i > d_i$, then repeaters of size $s_j^{opt}$ are inserted incrementally until $D_i \le d_i$ or repeater area used is not less than the available area $r_3$. The procedure returns 0 if the available area for repeaters is used up before the delay in the wire reaches the desired bound. If the procedure is able to successfully assign $i_2$ wires within the available repeater area, it next attempts to assign the remaining $i - i_1 - i_2$ wires to the current layer-pair ignoring delay constraints. If the assignment is unsuccessful, it returns 0. If all the above assignments can be done successfully, the procedure returns 1.
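The control flow of the single layer-pair assignment can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: wires are 0-indexed and already sorted longest first, the available area `area` (playing the role of $B_j$) is passed in rather than derived from via counts, the delay model is supplied as a callable `delay(length, stages)`, and the budget may be used up exactly (the paper's Step 11 rejects that case). The function returns the repeater area used instead of a boolean so that the caller can see $r_2$.

```python
def wire_assign(lengths, targets, i1, i2, i, pitch, area, r3, s_opt, delay):
    """Try to place wires i1 .. i-1 in one layer-pair.

    The first i2 of them must meet their target delay using at most r3
    repeater area; the rest only need to fit. Returns the repeater area
    used, or None if the assignment is infeasible.
    pitch = wire width + spacing (W_j + S_j); s_opt must be positive.
    """
    used = 0.0
    for p in range(i1, i1 + i2):            # wires that must meet delay
        wire_area = lengths[p] * pitch
        if wire_area > area:
            return None                     # out of routing area
        area -= wire_area
        stages = 1
        while delay(lengths[p], stages) > targets[p]:
            if used + s_opt > r3:
                return None                 # repeater budget exhausted
            used += s_opt                   # insert one more repeater
            stages += 1
    for p in range(i1 + i2, i):             # remaining wires, delay ignored
        wire_area = lengths[p] * pitch
        if wire_area > area:
            return None
        area -= wire_area
    return used
```

A toy call such as `wire_assign([4.0, 3.0, 2.0, 1.0], [8.0, 9.0, 4.0, 1.0], 0, 2, 4, 1.0, 10.0, 3.0, 1.0, lambda l, k: l * l / k)` needs a single repeater on the longest wire and fits all four wires.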

Algorithm 5: Greedy assignment (greedy_assign)
Input: total number of wires $n$, index $i$ of last wire assigned so far, number of layer-pairs $m - j - 1$, number of repeaters $z_{r_1} + z_{r_2}$ used for repeater area ($r_1 + r_2$) for wires in layer-pairs $1, \ldots, j+1$, die area $A_d$.
Output: Boolean value $M''(n, i, m, j+1, z_{r_1} + z_{r_2})$
1. for $q = m$ to $j + 2$ // $q$ is the layer-pair index
2.   compute $B_q = A_d - (z_{r_1} + z_{r_2}) \cdot \nu \cdot v_a$
3. $q = m$ // start with bottommost layer-pair
4. $p = n$ // start with smallest wire
5. while ($q > j + 1$)
6.   if ($p = i$) then return(1)
7.   $A_w^q = l_p (W_q + S_q)$
8.   $A_v^q = 0$
9.   while ($A_w^q + A_v^q \le B_q$)
10.    assign wire $p$ to layer-pair $q$
11.    compute $A_w^q = A_w^q + l_p (W_q + S_q)$
12.    compute $A_v^q = p \cdot \nu \cdot v_a$
13.    $p = p - 1$
14.    if ($p = i$) then return(1)
15.  $q = q - 1$
16. return(0)
Figure 7: Greedy algorithm for assignment of wires to layer-pairs without considering delay bounds.

4.3 Wire Assignment to Layer-Pairs Without Delay Requirements

The procedure greedy_assign computes the feasibility of assigning $n - i$ wires to $m - j - 1$ layer-pairs without considering delay requirements. Wire assignment is performed to layer-pairs greedily in a bottom-up manner until all the layer-pairs are full. Salient aspects are as follows.

- Repeaters assigned to wires in layer-pairs $1, \ldots, j$ are routed using vias passing through all the layer-pairs below. Wire assignments in all the layer-pairs $j+2, \ldots, m$ take into account area occupied by repeater vias. This is computed in Steps 9 and 10 of procedure greedy_assign. The area remaining in layer-pair $j$ for wire assignment is $A_d - A_u^j$ (after removing area corresponding to repeater vias).
- Wires are assigned bottom-up starting from layer-pair $m$, and from wire $n$. Incremental assignment is performed to one layer-pair at a time until $A_w^j + A_v^j > A_d - A_u^j$. If the available area in a layer-pair is zero, then wire assignment starts from the next higher layer-pair.

The procedure greedy_assign returns 1 if all $n - i$ wires can be assigned within the available $m - j - 1$ layer-pairs. It returns 0 otherwise.

Lemma 1 greedy_assign is optimal.

Proof.
The procedure has the following characteristics: (i) wires are assigned in ascending order of their lengths, starting from the shorter wires at the bottom layers; (ii) it uses strictly more wires in the lower layers; (iii) it has strictly more wiring resource in the higher layers and (iv) it has less wiring demand in the upper layers. Thus, at any stage, if there is some extra space in any lower layer-pair, then some wires can always be moved from a higher layer-pair to the lower layer-pair to get an improved solution. This procedure thus attempts to pack wires in the layer-pairs strictly in a bottom-up manner, and uses the optimum number of layer-pairs.

5 Performance Studies

We study the variation in rank for different IAs and WLDs. Architectures chosen for study are based on TSMC parameters for the 180nm, 130nm and 90nm technology nodes (given in Table 3) [12]. WLDs are generated for 1M, 4M and 10M gate designs using the method of [4] and Rent parameter $p$ = 0.6. Variation of rank is studied with varying ILD permittivity, Miller coupling factor, target clock frequency, and maximum repeater area. To reduce runtime, we perform coarsening of the WLD for large instances.

5.1 Coarsening of the WLD

The time complexities of our proposed algorithms are very large. For the large gate counts used in our studies, naive implementation of the basic algorithm requires exorbitant runtime. We reduce instance complexity by forming bunches of connections given by the WLD, such that each bunch is a collection of wires of uniform size, and assignment of the connections to the layer-pairs is done in bunches of several wires instead of the simple method of one wire at a time. The rank of the architecture is determined by the actual number of wires present in the maximum set of bunches that can be feasibly assigned to the layer-pairs. Hence, error in rank computation due to bunching can be at most the size of the maximum bunch formed from the given WLD. The relation between maximum number of bunches, bunch size and the number of wires is given by: max number of bunches = number of wires / bunch size.
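As a rough illustration of the bunching step, the Python sketch below groups wires of identical length into bunches of at most a given size. The function name `bunch_wld` and the `(length, count)` output format are assumptions made for this example; the paper does not specify a data layout.

```python
from collections import Counter


def bunch_wld(lengths, bunch_size):
    """Split a WLD into bunches of equal-length wires.

    Returns (length, count) pairs, longest wires first, where every
    count is at most bunch_size (assumed to be a positive integer).
    """
    counts = Counter(lengths)
    bunches = []
    for length in sorted(counts, reverse=True):
        remaining = counts[length]
        while remaining > 0:
            take = min(bunch_size, remaining)   # last bunch may be smaller
            bunches.append((length, take))
            remaining -= take
    return bunches
```

With this sketch, 100 wires of one length and a bunch size of 40 produce bunches of 40, 40 and 20 wires, and the DP would then operate on three items instead of 100.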
In our bunching procedure, all the wires of a bunch are of uniform length. For instance, for a set of 100 wires of identical size, if the bunch size is specified as 40, we generate three bunches, of sizes 40, 40 and 20 respectively. Delay considerations for a bunch can be easily obtained from those of a single wire in the bunch. However, our proposed bunching scheme may not be appropriate for an input WLD with very few wires of identical size (see footnote 7).

Footnote 7: For further reduction of runtime, a different instance size reduction, orthogonal to the bunching technique, is also available. In this binning technique, we replace a group of wires with a single wire whose length is the mean of all wire lengths in the group. Thus, for example, if we have a set of wires of lengths 5996, 5997, 5998, 5999, and 6000 with counts 3, 2, 2, 1 and 1 respectively, then the binning procedure will reduce this set to a single wire length of 5998 with a count of 9. This reduces the size of the distribution by a factor of 5. Binning can also be used on bunched wires. While this separate coarsening technique is available, we did not use it because practical runtimes were achievable using only bunching. (However, since our present results may be partly compromised by the effects of large bunch size, our ongoing experimentation is incorporating a reduced bunch size in conjunction with binning.) It is important to note that bunching and binning do not change the time complexity of our DP algorithm.

5.2 Experimental Results and Implications

Dielectric constant, Miller coupling factor, target clock frequency and maximum repeater area are varied to study the effect of these parameters on rank. A baseline design is chosen and each of the above parameters is varied one at a time to observe the variation in rank. We performed experiments with baseline designs of 4M gates in the 90nm, 1M gates in the 130nm, and 1M gates in the 180nm technology nodes. For space reasons, here we report experiments with a single baseline design of 1M gates in the 130nm technology node. The baseline parameters are given in Table 2.

Parameter                  Baseline value
k                          3.9
Miller coupling factor     2
Repeater area fraction     0.4
Semi-global layer-pairs    2
Global layer-pairs         1
Target clock frequency     500MHz

Table 2: Baseline parameters for the 180nm, 130nm and 90nm technology node designs.

WLDs are generated for the 1M and 4M gate designs based on the stochastic WLD model of [4] with Rent parameter p = 0.6. Initially, die area is computed based on the gate pitch ($g$) and the number of gates ($N$) as

$$\text{Die area due to gates} = g^2 N.$$

Maximum repeater allocation ($A_r$) is specified as a fraction of the die area and is added to the actual die area. Then, the actual die area used is given by Equation (6).

$$A_r = \text{Max repeater fraction} \times A_d, \qquad A_d = A_r + \text{Die area due to gates} \qquad (6)$$

The gate pitch for computing the actual wire lengths in the WLD is then obtained by distributing gates evenly in the actual die area $A_d$. A clock frequency of 1.7GHz is chosen for 130nm (maximum MPU clock frequency based on ITRS 2001 [8]). The bunch size used is 10000. The technology parameters chosen for the study of variation of rank are given in Table 3. For die area computation, gate pitch is taken as 12.6 × Tech Node (based on empirical data from ITRS [8]). The variation of rank with dielectric constant, Miller coupling factor, target clock frequency, and repeater area is given in Table 4.

Parameter             180nm      130nm      90nm
M1 minimum width      0.230µm    0.160µm    0.120µm
M1 minimum spacing    0.230µm    0.180µm    0.12µm
M1 thickness          0.483µm    0.336µm    0.26µm
Mx minimum width      0.280µm    0.200µm    0.14µm
Mx minimum spacing    0.280µm    0.210µm    0.14µm
Mx thickness          0.588µm    0.340µm    0.30µm
Mt minimum width      0.440µm    0.440µm    0.42µm
Mt minimum spacing    0.460µm    0.460µm    0.42µm
Mt thickness          0.960µm    1.020µm    0.88µm
V1 minimum width      0.260µm    0.190µm    0.13µm
Vx minimum width      0.260µm    0.260µm    0.13µm
Vt minimum width      0.360µm    0.360µm    0.36µm

Table 3: Technology parameters used for study of variation of rank for the 130nm technology node. Parameters for the 180nm and 90nm technology nodes are also given. For 180nm, x = 2, 3, 4, 5 and t = 6. For 130nm, x = 2, 3, 4, 5, 6 and t = 7. For 90nm, x = 2, 3, 4, 5, 6, 7 and t = 8.
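The die-area bookkeeping of Equation (6) can be made concrete with a short Python sketch. This is an illustration under stated assumptions rather than the code used for the experiments: the two relations of Equation (6) are solved for $A_d$, giving $A_d = g^2 N / (1 - f)$ for repeater fraction $f$; "distributing gates evenly" is assumed to mean a square grid, so the effective pitch is $\sqrt{A_d / N}$; and the function name `die_areas` and its parameter names are hypothetical. The inputs below are the 130nm, 1M gate baseline values quoted in the text.

```python
from math import sqrt, isclose


def die_areas(tech_node_um, num_gates, repeater_fraction):
    """Return (gate_area, repeater_area, die_area, effective_pitch).

    Lengths are in micrometres and areas in square micrometres.
    """
    if not 0.0 <= repeater_fraction < 1.0:
        raise ValueError("repeater_fraction must lie in [0, 1)")
    gate_pitch = 12.6 * tech_node_um           # empirical pitch quoted in the text
    gate_area = gate_pitch ** 2 * num_gates    # die area due to gates, g^2 N
    # Equation (6): A_r = f * A_d and A_d = A_r + g^2 N, so A_d = g^2 N / (1 - f).
    die_area = gate_area / (1.0 - repeater_fraction)
    repeater_area = repeater_fraction * die_area
    # Assumption: gates spread evenly on a square grid over the actual die area.
    effective_pitch = sqrt(die_area / num_gates)
    return gate_area, repeater_area, die_area, effective_pitch


gate_area, repeater_area, die_area, pitch = die_areas(0.13, 1_000_000, 0.4)
print(f"A_d = {die_area:.4g} um^2, A_r = {repeater_area:.4g} um^2, pitch = {pitch:.4g} um")
```

Under these assumptions the effective pitch grows by a factor of $1/\sqrt{1 - f}$ over the gate-only pitch, which is how a larger repeater allocation lengthens every wire in the WLD.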
K       rank        M       rank        C (Hz)      rank        R       rank
3.90    0.397288    2.00    0.397288    5.00e+08    0.397288    0.10    0.117438
3.80    0.402596    1.95    0.401711    6.00e+08    0.391980    0.20    0.210967
3.70    0.407019    1.90    0.407019    7.00e+08    0.388441    0.30    0.303728
3.60    0.413212    1.85    0.412327    8.00e+08    0.385787    0.40    0.397288
3.50    0.418520    1.80    0.418520    9.00e+08    0.384018    0.50    0.491019
3.40    0.424713    1.75    0.423828    1.00e+09    0.382249    -       -
3.30    0.430021    1.70    0.429136    1.10e+09    0.309706    -       -
3.20    0.437098    1.65    0.435329    1.20e+09    0.309706    -       -
3.10    0.444175    1.60    0.441521    1.30e+09    0.309706    -       -
3.00    0.450368    1.55    0.449483    1.40e+09    0.309706    -       -
2.90    0.458330    1.50    0.456561    1.50e+09    0.309706    -       -
2.80    0.465364    1.45    0.463594    1.60e+09    0.235608    -       -
2.70    0.474210    1.40    0.471556    1.70e+09    0.235608    -       -
2.60    0.482172    1.35    0.479518    -           -           -       -
2.50    0.491904    1.30    0.488365    -           -           -       -
2.40    0.501635    1.25    0.498096    -           -           -       -
2.30    0.512251    1.20    0.507828    -           -           -       -
2.20    0.522867    1.15    0.518444    -           -           -       -
2.10    0.534368    1.10    0.529060    -           -           -       -
2.00    0.547637    1.05    0.540560    -           -           -       -
1.90    0.560907    1.00    0.553830    -           -           -       -
1.80    0.575947    -       -           -           -           -       -

Table 4: Variation of rank for the 130nm, 1M gate design. The second sub-column in each column corresponds to normalized rank. Legend: K = ILD permittivity; M = Miller coupling factor; C = target clock frequency; R = maximum repeater fraction of die area.

We observe that reduction in dielectric constant enables reduction in coupling capacitance and delay. For the 130nm technology node (Table 4), a reduction of 38% in k produces the same increase in rank as a 42.5% change in Miller coupling factor (see footnote 8). For ease of comparison, we normalized rank with respect to the total number of wires in the WLD. Simulations were performed on a dual-processor Intel Xeon system with 2GB of memory, running Linux OS. No rank computation has runtime greater than 200s in our implementation.

Footnote 8: In our experiments, we considered the minimum value of Miller coupling factor to be 1.0. This can be achieved by double-sided shielding of lines.

6 Conclusions and Future Directions

In this paper, we have proposed a new metric for evaluation of the quality of interconnect architectures. A dynamic programming method for rank computation is presented.
The variation of rank with K, M, C and R (see Table 4 for notation) for different geometric parameters and technology nodes is studied. Results show, in general, an improvement of rank with decreasing values of K, M and C, and increasing values of R. Comparison of trends in the variation of rank for the 130nm technology node for a 1M gate design indicates that reductions in M can have almost the same performance impact as reductions in K. The variation of rank with several geometric and technology parameters shows the need to co-optimize across several material, process, and design characteristics to achieve high-rank embeddings of future WLDs in future interconnect architectures. In other words, it is not possible to enable future MPU-class designs by material improvements alone. In our study, the delay requirement of wires in the WLD is assumed to be linear in wire length. This requirement becomes unreasonable since the actual delay of the connections in the IA is proportional to the square of length. Thus, we are currently studying alternative models for per-connection delay requirement. We are also pursuing direct optimization of interconnect architectures according to our proposed metric, with the goal of evaluating ITRS and foundry BEOL architectures.

References

[1] M. B. Anand, M. Kakumu and H. Shibata, Multiobjective Optimization of VLSI Interconnect Parameters, IEEE Trans. on CAD of Integrated Circuits and Systems, 17(12), 1998, pp. 231-239.
[2] M. T. Bohr, Interconnect Scaling - The Real Limiter to High Performance ULSI, IEDM Tech. Dig., 1995, pp. 241-244.
[3] Q. Chen, J. A. Davis, J. D. Meindl and P. Zarkesh-Ha, A Compact Physical via Blockage Model, IEEE Trans. on VLSI Systems, 8(6), 2000, pp. 689-692.
[4] J. A. Davis, V. K. De and J. D. Meindl, A Stochastic Wire-length Distribution for Gigascale Integration (GSI) - Part 1: Derivation and Validation, IEEE Trans. on Electron Devices, 45(3), 1998, pp. 580-589.
[5] M. Edahiro, Y. Hayashi and S. Takahashi, Interconnect Design Strategy: Structures, Repeaters and Materials with Strategic System Performance Analysis (S²PAL) Model, IEEE Trans. on Electron Devices, 48(2), 2001, pp. 345-356.
[6] C. Hu, S.-Y. Oh, O. S. Nakagawa and D. Sylvester, Interconnect Scaling: Signal Integrity and Performance in Future High-Speed CMOS Designs, Symposium on VLSI Technology: Digest of Technical Papers, 1998, pp. 45-47.
[7] A. B. Kahng, S. Mantik and D. Stroobandt, Toward Accurate Models of Achievable Routing, IEEE Trans. on CAD of Circuits and Systems, 20(5), 2001, pp. 648-659.
[8] The International Technology Roadmap for Semiconductors, 2001 edition, International Sematech, Austin, Texas, December 2001. http://public.itrs.net/
[9] A. B. Kahng and D. Stroobandt, Wiring layer assignment with consistent stage delays, Proc. Intl. Workshop SLIP, 2000, pp. 115-122.
[10] K. Rahmat, O. S. Nakagawa, S.-Y. Oh and J. Moll, A Scaling Scheme for interconnect in deep submicron, HP technical literature, ULSI Laboratory, Hewlett-Packard Co., Palo Alto, CA 94304.
[11] S. Odanaka and K. Yamashita, Interconnect Scaling Scenario Using a Chip Level Interconnect Model, IEEE Trans. on Electron Devices, 47(1), 2000, pp. 151-162.
[12] Taiwan Semiconductor Manufacturing Company Ltd. http://www.tsmc.com/
[13] R. Venkatesan, J. A. Davis, K. A. Bowman and J. D. Meindl, Optimal n-tier Multilevel Interconnect Architecture for Gigascale Integration (GSI), IEEE Trans. on VLSI Systems, 9(6), 2001, pp. 899-912.
[14] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, 1990.
[15] R. H. J. M. Otten and R. K. Brayton, Planning for Performance, Proc. of DAC, 1998, pp. 122-127.