A Network-Flow Based Cell Sizing Algorithm

Huan Ren and Shantanu Dutt
Dept. of ECE, University of Illinois-Chicago

Abstract

We propose a network flow based algorithm for the area-constrained timing-driven discrete cell-sizing problem. In order to simplify the problem, we divide it into two subproblems: 1) sizing cells on critical and near-critical paths to reduce the circuit delay under the available cell area quota, and 2) down-sizing cells on non-critical paths without increasing the circuit delay to provide more available area quota. Cell sizing is performed by solving these two problems iteratively (the critical and near-critical paths, and the non-critical paths, are updated after each iteration). We model each of the two subproblems as a min-cost network flow problem. In the network flow model, the available sizes of each cell are modeled as nodes. Flow passing through a node indicates choosing the corresponding cell size, and the total flow cost reflects the objective function value, i.e., critical path delay for the first problem, and cell area for the second problem. We tested our algorithm on the ISCAS 85 benchmark. An exhaustive search method for the first subproblem, with exponential time complexity, has been developed for comparison. The results show that the improvement for the first subproblem obtained by our method is only % worse (.9% vs. .9%) than the optimal solution, and our method is 60 times faster than the optimal method. We also compare our complete algorithm to a recently proposed discrete sizing algorithm. On average, the circuit delay obtained by our algorithm is about 9% better, while our run time is about 40% more.

I. INTRODUCTION

In order to achieve a balance between design quality and time-to-market, cell library based design is becoming the dominant design methodology over the custom design method, even for high performance ICs.
Usually in a cell library, several different cell implementations are available for the same function, with different sizes, intrinsic delays, driving resistances and input capacitances. Choosing the cell with an appropriate size, i.e., cell sizing, is a very effective approach to improve timing. Cell sizing has been studied for a long time. There are generally two related cell sizing problems: 1) given a total cell area constraint, minimizing the circuit delay, and 2) given a circuit delay constraint, minimizing the total cell area. In this paper, we target the first problem.

Many known cell sizing methods [6], [9] assume the availability of a continuous range of cell sizes, i.e., the size of a cell can take any value in a range. Then, the obtained gate size is rounded to the nearest available size in the library. However, a large number of realistic cell libraries are sparse, e.g., geometrically spaced instead of uniformly spaced [0]. Geometrically spaced gate sizes are desired in order to cover a large size range with a relatively small number of cell instances. Also, it has been proved in [4] that under certain conditions, the set of optimal gate sizes must satisfy a geometric progression. With a sparse library, the simple rounding scheme can introduce huge deterioration from the continuous solution [0].

On the other hand, few time-efficient methods are known that can directly handle timing optimization with discrete cell sizes, since this problem is NP-complete. The technique in [7] uses multi-dimensional descent optimization, which iteratively improves a current solution by changing the size choices of a set of cells that produces the largest improvement. It is not clear how well this method can avoid being trapped in a local optimum. In [0], a new rounding method is developed based on an initial continuous solution.

This work was supported by NSF grant CCR-004097.
Instead of only rounding to the nearest available size, it visits cells in topological order in the circuit, and tries several discrete sizes around the continuous solution for each cell. In order to reduce the search space, after a new cell is visited, it performs a pruning step that discards obviously inferior solutions, and a merging step that keeps only several representative solutions within a certain quality region. The run time of this method is significantly larger than that of the method in [7].

In this paper, we propose a network-flow based method for the area-constrained timing optimization cell sizing problem. We divide the problem into two cell sizing subproblems. The first subproblem is sizing cells on non-critical paths to minimize the total cell area under the constraint of not increasing the circuit delay, i.e., area-driven non-critical cell sizing (AD-NCri). The second subproblem is the pseudo-dual of the first one: sizing cells on critical and near-critical paths to optimize the circuit delay under the available cell area quota, i.e., timing-driven critical cell sizing (TD-Cri). Solving the first subproblem provides more available area quota for the second subproblem.

We model each of the two problems as a min-cost network flow problem. In the network flow model, the different size options of a cell are modeled by nodes in the network graph. The flow cost represents the objective function value (timing for the TD-Cri subproblem, and area for the AD-NCri subproblem) when the cell sizes corresponding to the nodes in the graph that have flows through them are chosen. The constraint metric value (area for the TD-Cri subproblem, and timing for the AD-NCri subproblem) is represented by the amounts of certain flows in the network flow graph. By setting the capacities of the arcs that carry these flows according to the given upper bounds on the constraint metrics, the constraints are guaranteed to be satisfied.
Therefore, by solving a min-cost flow in the network graph, a near-optimal cell sizing can be determined by choosing the cell sizes whose corresponding nodes have the min-cost flow through them. The near-optimality comes from having to constrain the flow to adhere to certain discrete requirements, like going through exactly one size-option node per cell. The standard min-cost network flow, as a continuous optimization method, usually produces invalid solutions that violate the discrete requirement. In our method, we use an auxiliary objective-function-independent arc cost to guide the min-cost flow to follow the discrete requirement.

The rest of the paper is organized as follows. Section II provides an overview of our method. A general view of our size selection network graph (SSG) is presented in Sec. III. In Secs. IV-VII, we discuss various issues of the SSG. Section VIII briefly describes an optimal exhaustive search method to which we compare our network-flow based technique. Section IX presents experimental results and we conclude in Sec. X.

II. OVERVIEW OF OUR METHOD

Our cell sizing method starts from an initial sizing solution that may be far from optimal. The objective is to improve the critical path delay of the circuit by resizing cells under a given total cell area constraint. Our method employs an iterative improvement approach that solves the two subproblems in every iteration. In each iteration $I_j$, we first set up a timing improvement objective $d^o_j$. $d^o_j$ is determined according to the actual timing improvement $d_{j-1}$ obtained in the previous iteration $I_{j-1}$: if $d_{j-1} \ge d^o_{j-1}$, i.e., the timing improvement objective in $I_{j-1}$ is achieved, then $d^o_j = d^o_{j-1}$; otherwise, $d^o_j$ is reduced to a fraction of $d^o_{j-1}$. In the first iteration, we set $d^o_1$ empirically to be 0% of the initial circuit delay.

Let $D_j$ be the circuit delay at the beginning of $I_j$. We define the set $P^c_j$ of critical and near-critical paths in $I_j$ to be the set of paths that have delays larger than $D_j - d^o_j$. Obviously, to achieve our timing improvement objective $d^o_j$, we have to resize cells on paths in $P^c_j$. We also define the set $P^{nc}_j$ of non-critical paths to be the set of paths that have delays smaller than a threshold $D_j - 2d^o_j$. Cells on paths in $P^{nc}_j$ can be downsized to provide more available cell area quota for sizing cells in $P^c_j$, as long as the delays of these paths do not exceed the threshold.
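As an illustration only, the objective update and the path classification described above can be sketched in Python as follows. The function names, the `shrink` divisor and the sample data are hypothetical; in particular, the divisor applied when the objective is missed and the non-critical threshold used in the usage lines are assumptions made for this sketch.

```python
def next_objective(prev_objective, prev_improvement, shrink=2.0):
    """Return the timing improvement objective for the next iteration.

    `shrink` is an assumed divisor, applied when the previous objective was missed.
    """
    if prev_improvement >= prev_objective:
        return prev_objective          # objective achieved: keep it
    return prev_objective / shrink     # objective missed: relax it


def classify_paths(path_delays, crit_threshold, noncrit_threshold):
    """Split paths into (critical or near-critical, non-critical) sets."""
    critical = {p for p, d in path_delays.items() if d > crit_threshold}
    noncritical = {p for p, d in path_delays.items() if d < noncrit_threshold}
    return critical, noncritical


# Hypothetical data: four paths, circuit delay D_j = 10.0, objective 1.0.
delays = {"p1": 10.0, "p2": 9.5, "p3": 7.0, "p4": 8.5}
D_j, objective = 10.0, 1.0
critical, noncritical = classify_paths(
    delays,
    crit_threshold=D_j - objective,
    noncrit_threshold=D_j - 2 * objective,  # assumed non-critical threshold
)
```

In this sketch, path `p4` falls between the two thresholds and is therefore neither resized for timing nor downsized for area in the current iteration.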
Note that the threshold on the delays of paths in $P^{nc}_j$ could be expanded to $D_j - d^o_j$. However, this may cause a thrashing problem, i.e., a non-critical path in $I_j$ becomes a critical path in $I_{j+1}$, since the target circuit delay in $I_{j+1}$ will become $D_j - 2d^o_j$ if we achieve the delay improvement objective in $I_j$. If the thrashing problem occurs, we may end up downsizing certain cells in $P^{nc}_j$ in $I_j$, and upsizing the same cells in $P^c_{j+1}$ in $I_{j+1}$. In each iteration, we first solve the AD-NCri problem on cells in $P^{nc}_j$ with a path delay upper bound of $D_j - 2d^o_j$ to squeeze out as much cell area quota as possible, and then solve the TD-Cri problem on cells in $P^c_j$ under the available area quota to improve the circuit delay.

A. Timing Objective Function

Minimizing the circuit delay, i.e., the max of the path delays, is not easy to handle for analytical optimization methods, since there is no closed-form expression for it. To solve this problem, we design a continuous timing objective function (instead of a discontinuous one like max) that is strongly correlated to the max function, i.e., one in which paths with higher delays have significantly higher contributions to the function than lower-delay paths. Thus, by minimizing such a function, more critical paths essentially get more improvement, which is a good approximation of minimizing the max-delay path.

We define $CS_j(n_i)$ to be the set of critical sinks of a net $n_i$ in $I_j$, which are: 1) all sinks in $P^c_j$ if the net is in $P^c_j$, or 2) the sink with the minimum slack otherwise. A timing cost $t^c_j(n_i)$ of a net $n_i$ in $I_j$, proposed in [8], is defined as:

$$t^c_j(n_i) = \sum_{u \in CS_j(n_i)} \frac{D_j(u, n_i)}{(a_j(n_i))^{\beta}} \qquad (1)$$

where $u$ is a sink cell of net $n_i$, $D_j(u, n_i)$ is the net delay of $n_i$ to $u$, and $a_j(n_i)$ is the allocated slack of $n_i$ at the beginning of $I_j$. The allocated slack is defined as the path slack divided by the number of nets in the path.
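The timing cost of a single net can be sketched as below. This is a minimal illustration under stated assumptions: the function and variable names are hypothetical, the allocated slack is assumed to be strictly positive, and the exponent value `beta=2` is chosen only for the example.

```python
def allocated_slack(path_slack, nets_on_path):
    """Path slack divided evenly among the nets on the path."""
    return path_slack / nets_on_path


def net_timing_cost(sink_delays, critical_sinks, alloc_slack, beta):
    """Sum of net delays to the critical sinks, scaled by 1 / alloc_slack**beta.

    Assumes alloc_slack > 0; zero or negative slack needs separate handling.
    """
    total_delay = sum(sink_delays[u] for u in critical_sinks)
    return total_delay / alloc_slack ** beta


# Hypothetical net with two critical sinks; beta = 2 is an illustrative value only.
sink_delays = {"u1": 2.0, "u2": 3.0}
tight = net_timing_cost(sink_delays, ["u1", "u2"], allocated_slack(4.0, 4), beta=2)
loose = net_timing_cost(sink_delays, ["u1", "u2"], allocated_slack(4.0, 2), beta=2)
```

The same net delays yield a larger cost when the allocated slack is smaller, which is how nets on more critical paths come to dominate the objective.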
$\beta$ is the exponent of the allocated slack, used to adjust the weight difference between the costs of nets on paths with different criticalities. Based on experimental results, β = works best in this scenario. The timing objective function $F^t_j$ for the TD-Cri problem in $I_j$ is the summation of $t^c_j(n_i)$ over all nets in $P^c_j$, given as:

$$F^t_j = \sum_{n_i \in P^c_j} t^c_j(n_i) \qquad (2)$$

Note that in this objective function the nets on more critical paths have a higher contribution, since the net cost is inversely proportional to the allocated slack, and thus will be optimized more. This is a desirable outcome, especially in a scenario where there is a quota on the resource (e.g., total area) available for optimization.

B. Handling Timing Constraints

To handle the path delay constraints in the AD-NCri problem, we use an interconnect slack allocation method. The available slacks of paths in $P^{nc}_j$ are allocated to the interconnects between two cells on them. Hence, if the delay of every interconnect after sizing is less than or equal to the delay of that interconnect before sizing plus its allocated slack, the path delay constraint is satisfied. Therefore, the path delay constraint is converted to interconnect delay constraints. The slack allocation is based on the delay-area sensitivity of interconnects. Interconnects in which cells can be downsized a lot without significantly increasing the delay should be allocated more slack, so that we can downsize more without violating constraints. More details about interconnect slack allocation are provided in Sec. V-B.

III. THE SIZE SELECTION GRAPH (SSG)

We model both sizing subproblems in each iteration of our sizing algorithm as min-cost network flow problems. A network flow graph called the size selection graph (SSG) is constructed for each sizing subproblem. Let the set of paths on which cell sizes need to be determined be $P$. $P = P^c_j$ for the TD-Cri subproblem in iteration $I_j$, and $P = P^{nc}_j$ for the AD-NCri subproblem in $I_j$.
In an SSG, the set of sizing options of each cell in $P$ is modeled as a set of sizing option nodes, and each sizing option node corresponds to one sizing option. We use flows in the SSG to model cell size selection, i.e., a size option of a cell is chosen by a flow in the SSG if its corresponding option node in the SSG is on the path of the flow. Furthermore, if we can set the costs of arcs between option nodes in the SSG in such a way that the total cost incurred by a flow through the SSG that selects an option for each cell in $P$ is equal to the objective function (i.e., total cell area for the AD-NCri subproblem or Eqn. 2 for the TD-Cri subproblem) for the selected sizes, then the min-cost flow in the SSG selects the optimal cell sizing scheme. Hence the problem of finding the optimal cell sizing is converted to the problem of finding the min-cost flow in a graph. The general structure of our SSG is described below.

Fig. 1. (a) Nets in $P$ of a circuit. (b) The corresponding SSG, which includes net structures and net spanning structures. $N_i$ is the net structure corresponding to net $n_i$. M converting structures are attached to the net structures of nets related to the constraint metric M. Flows from M converting structures are routed to the sink through an M constraining arc.

A. The SSG Structure

We employ a divide-and-conquer approach in constructing the SSG, i.e., first a mini network flow graph called a net structure is constructed for each net in $P$, and then the net structures of connected nets are connected by net spanning structures to form the complete SSG, as shown in Fig. 1. Thus, the SSG has a topology similar to that of the paths in $P$. A net structure $N_j$ is a child net structure of $N_i$ if they are connected and $N_j$ follows $N_i$ in the flow direction in the SSG (i.e., the signal direction in the circuit is from net $n_i$ to net $n_j$); correspondingly, $N_i$ is a parent net structure of $N_j$.
Each net structure contains the sets of option nodes for the cells in the corresponding net. Let us denote the set of sizing options of a cell $u$ as $S_u$, and the $l$-th sizing option in it as $S^l_u$. For a net $n_i$ with a driving cell $u_d$, the net structure is constructed by connecting each sizing option $S^l_{u_d}$ of $u_d$ to all sizing option nodes in $S_{u_k}$ of every sink cell $u_k$ to form a complete bipartite subgraph between $S_{u_d}$ and $S_{u_k}$. An example is shown in Fig. 2. The net has one driving cell $u_d$ and two sink cells $u_j$ and $u_k$; see Fig. 2(a). The corresponding net structure is shown in Fig. 2(b). Here, we only show two sizing options in each option set. A flow $f$ through the net structure is also shown, which selects one size option each for $u_d$, $u_k$ and $u_j$. $f$ is a valid flow, since it selects only one size for each cell, which is required for a valid cell sizing scheme. With the complete bipartite subgraphs between the sizing option sets of the drive cell and the sink cells in a net structure, every possible combination of size choices of cells in the net has a corresponding flow through the net structure, and hence is considered in our size selection process.

Fig. 2. (a) A net in $P$. (b) The net structure corresponding to the net; $f$ is a flow through the net structure.

There are two major issues that need to be tackled in constructing the SSG in order to correctly map the min-cost flow in the SSG to an optimal sizing choice. These are:

(1) Mapping the objective function value of each net $n_i$ to the cost of the flow through the corresponding net structure $N_i$. As shown in Fig. 2(b), in $N_i$ there is one set of arcs $(S_{u_d}, S_{u_k})$ between the sizing option set $S_{u_d}$ of the drive cell $u_d$ of $n_i$ and the sizing option set $S_{u_k}$ of each sink cell $u_k$ of $n_i$. A valid flow through $N_i$ passes through one arc in each such arc set.
Hence, for a net $n_i$ with $m$ sinks, we need a partition scheme to divide the objective function value of $n_i$ into $m$ non-overlapping parts, and attach each part to the costs of arcs in each arc set $(S_{u_d}, S_{u_k})$, so that the sum of the costs of the arcs passed by a valid flow is equal to the objective function value with the cell sizes selected by the flow. The arc costs determined in this way are called objective-function-dependent costs (F-costs), as opposed to the objective-function-independent costs that will be introduced shortly. A product-term based partition scheme is developed, which will be explained in Sec. IV.

(2) The min-cost flow that is determined must be a valid flow. An invalid flow, i.e., one selecting more than one option size for some cells, cannot be converted to a valid sizing scheme. A valid flow requires the satisfaction of two types of consistency. First, the sizing option nodes of a particular cell have to be chosen in a mutually exclusive manner (consistency in a sizing option set). Second, if a cell is connected to multiple nets, its sizing options will also be included in each of the corresponding net structures. In such cases, the selected sizing options for the cell must be consistent across all these net structures (consistency across net structures). Net spanning structures and an auxiliary objective-function-independent arc cost for consistency (C-cost) are used to ensure the min-cost flow is a valid flow; these are discussed in detail in Sec. VI.

Algorithm FlowSize
1. Construct a net structure, as depicted in Fig. 2, for each net in $P$.
2. Determine the F-cost of each arc in the net structures, so that the total cost incurred by a valid flow through them correctly represents the objective function value of the corresponding net. Connect net structures with net spanning structures.
3. Determine the C-cost of arcs in the net structures and the net spanning structures (see Sec. VI), so that the min-cost flow satisfies the consistency requirements in cell size selection.
4. Add M converting structures and the M constraining arc (Sec. V) for each constraint metric M to make sure the value of M, which is equal to the flow amount on the M constraining arc, is equal to or less than the given upper bound, which is equal to the capacity of the M constraining arc.
5. The resulting arc cost formulation with both F-costs and C-costs is a concave function. Determine the min-cost flow using the near-optimal network flow concave-cost minimization algorithm of [].
6. Select the size options chosen by the valid min-cost flow as cell sizes.
Fig. 3. Network flow algorithm for solving the TD-Cri and the AD-NCri problems.

B. Constraint Satisfaction Technique

Constraint satisfaction is handled by converting the constraint metric values to flow amounts in the SSG, and converting the given upper bound to arc capacities. The constraint satisfaction is guaranteed by the basic network flow property that the flow amount on an arc $e$ is equal to or less than the capacity of $e$. More specifically, for a constraint metric M with a given upper bound value $b_M$, let the set of nets that are related to the metric be $A_M$. Note that the cell area metric (for TD-Cri problems) is related to all nets, while interconnect delay metrics (for AD-NCri problems) only relate to the net that the interconnect belongs to. An M converting structure is added to each net structure $N$ corresponding to a net $n \in A_M$. The output flows of all M converting structures are gathered by an M constraining arc, which is connected to the sink, as shown in Fig. 1(b). The function of the M converting structure in $N$ is to send out a flow of amount equal to the value of M in net $n$ with the selected cell sizes.
The capacity of the M constraining arc is set to $b_M$. Therefore, we make sure the value of M, which is equal to the total flow amount on the M constraining arc, is smaller than or equal to the given upper bound. A detailed discussion of this issue is in Sec. V. The high-level pseudo code of our method FlowSize for solving the two subproblems is given in Fig. 3.

IV. OBJECTIVE FUNCTION PARTITION AND F-COST DETERMINATION

In this section, we base our explanation mainly on the timing objective function (Eqn. 2), which is obviously more complex than the area objective function. According to Eqn. 2, the timing objective function of each net $n_i$ in iteration $I_j$ is just the timing cost $t^c_j(n_i)$ of $n_i$. On the other hand, the area objective function of a net $n_i$ should be $\sum_{u \in n_i} A(u)/\mathrm{degree}(u)$, where $A(u)$ is the size of a cell $u$, and $\mathrm{degree}(u)$ is the number of nets that include $u$. Hence, the area of a cell will not be counted multiple times.

We assume a lumped capacitance and resistance net delay model, which is widely used in cell sizing [4], [5], [9]. For net $n_i$, the delay $D(u, n_i)$ to a sink cell $u$ of $n_i$ is:

$$D(u, n_i) = R_d \Big( c \cdot L_{n_i} + \sum_{u \in SC(n_i)} C_u \Big)$$

where $R_d$ is the driving resistance of $n_i$, $c$ is the unit wirelength (WL) capacitance, $L_{n_i}$ is the WL of $n_i$, $SC(n_i)$ is the set of sink cells of $n_i$, and $C_u$ is the input capacitance of cell $u$. In the pre-placement sizing stage, the WL of a net is usually estimated according to the fan-out number of the net. In the post-placement resizing stage, it can be estimated using one of several well-known models, e.g., HPBB. With this delay model, in iteration $I_j$ for a net $n_i$ with $m$ sinks, the timing cost $t^c_j(n_i)$ of $n_i$ is:

$$t^c_j(n_i) = \frac{m}{a^{\beta}_j(n_i)} R_d \Big( c \cdot L_{n_i} + \sum_{u \in SC(n_i)} C_u \Big) \qquad (3)$$

There are $m$ sets of arcs $(S_{u_d}, S_{u_k})$ in the corresponding net structure $N_i$ between the size option node set $S_{u_d}$ of the drive cell $u_d$ in $n_i$ and the size option node set $S_{u_k}$ of every sink cell $u_k$ in $n_i$.
A valid flow passes through one arc in each arc set, and if an arc $(S^p_{u_d}, S^q_{u_k})$ in set $(S_{u_d}, S_{u_k})$ has a flow on it, then we choose size options $S^p_{u_d}$ and $S^q_{u_k}$ for cells $u_d$ and $u_k$, respectively. Hence, our idea for determining the arc cost is to divide the timing cost into $m$ non-overlapping parts $A_k(S^x_{u_d}, S^y_{u_k})$ ($1 \le k \le m$), where the value of each part $A_k(S^x_{u_d}, S^y_{u_k})$ only depends on the size choices of the drive cell $u_d$ and sink cell $u_k$. Then, the F-cost of an arc $(S^p_{u_d}, S^q_{u_k})$ is set to be $A_k(S^p_{u_d}, S^q_{u_k})$, and the total cost of the $m$ arcs passed by a valid flow is equal to the timing cost with the cell sizes chosen by the flow.

The partition of the timing cost is done based on the product terms in it. In Eqn. 3, let us denote the coefficient $m / a^{\beta}_j(n_i)$ as $\alpha$. The parameters affected by cell sizing options are $R_d$ and $C_u$. Hence, if a product term (from now on we will use "term" for short) in the timing cost formula includes $R_d$, its value is determined by the size of $u_d$, and if a term includes $C_{u_k}$, its value is determined by the size of $u_k$. For example, the term $\alpha R_d C_{u_k}$ is determined by the choices of sizes of both $u_d$ and $u_k$, and the term $\alpha R_d \, c \, L_{n_i}$ is determined only by the size of $u_d$. Hence, we can partition the timing cost into $m$ non-overlapping parts by putting the terms in it into corresponding parts according to the cell sizes these terms depend on.

Specifically, we consider three kinds of terms. Terms whose values depend on the sizes of the drive cell $u_d$ and a sink cell $u_k$ are put in the part $A_k(S^x_{u_d}, S^y_{u_k})$, and terms whose values only depend on the size of a sink cell $u_k$ are also put in the part $A_k(S^x_{u_d}, S^y_{u_k})$. On the other hand, terms whose values only depend on the size of the drive cell $u_d$, e.g., $\alpha \, c \, R_d L_{n_i}$, are evenly distributed among the $m$ parts, e.g., each part gets a term $\alpha \, c \, R_d L_{n_i} / m$.
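The term-based partition can be checked numerically with a small sketch. The code below is illustrative only: the function names and the sample resistances and capacitances are hypothetical, and `wire_cap` stands for the product of the unit WL capacitance and the net WL. It builds the F-costs for every arc of a net structure and verifies that, for each valid flow, the arc costs add up to the unpartitioned timing cost.

```python
from itertools import product


def net_arc_costs(alpha, wire_cap, r_drive, sink_caps):
    """F-costs for one net structure.

    r_drive[p]      : driving resistance of drive-cell option p
    sink_caps[k][q] : input capacitance of option q of sink cell k
    wire_cap        : c * L, the wire capacitance of the net
    Returns one dict per sink cell, mapping arc (p, q) to its F-cost.
    """
    m = len(sink_caps)
    return [
        {(p, q): alpha * r_drive[p] * (caps[q] + wire_cap / m)
         for p, q in product(range(len(r_drive)), range(len(caps)))}
        for caps in sink_caps
    ]


def timing_cost(alpha, wire_cap, r, chosen_caps):
    """Unpartitioned timing cost: alpha * R_d * (c*L + sum of sink capacitances)."""
    return alpha * r * (wire_cap + sum(chosen_caps))


# Hypothetical net: two drive options, two sinks with two options each.
alpha, wire_cap = 2.0, 2.0
r_drive = [4.0, 2.0]
sink_caps = [[1.0, 2.0], [1.0, 3.0]]
costs = net_arc_costs(alpha, wire_cap, r_drive, sink_caps)

# Every valid flow picks one drive option p and one option q_k per sink.
for p, q0, q1 in product(range(2), range(2), range(2)):
    flow_cost = costs[0][(p, q0)] + costs[1][(p, q1)]
    direct = timing_cost(alpha, wire_cap, r_drive[p],
                         [sink_caps[0][q0], sink_caps[1][q1]])
    assert abs(flow_cost - direct) < 1e-9
```

Each dictionary in `costs` corresponds to one complete bipartite arc set between the drive cell's options and one sink cell's options, so the number of arc costs grows with the product of the two option counts.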

Fig. 4. The M converting structure on a net structure and the M constraining arc.

Fig. 6. A complementary branch arc is added to make the total branch flow amount independent of the cell sizes chosen.

Note that for most delay models, e.g., the lumped capacitance and resistance, linear, and Elmore models, all terms fall into the three kinds mentioned above, and there is no term whose value depends on the sizes of multiple sink cells. Therefore, our term-based partition method applies to most delay models. Furthermore, in the area objective function, the value of each term obviously only depends on the size of one cell. Hence, our term-based partition method also applies to the area objective function.

To sum up, to solve the TD-Cri problem, for an arc $(S^p_{u_d}, S^q_{u_k})$ in a net structure, its cost $cost(S^p_{u_d}, S^q_{u_k})$ is $A_k(S^p_{u_d}, S^q_{u_k})$, which is given as:

$$cost(S^p_{u_d}, S^q_{u_k}) = \alpha R_d(S^p_{u_d}) \, C_{u_k}(S^q_{u_k}) + \alpha R_d(S^p_{u_d}) \, c \, L_{n_i} / m$$

The notations $R_d(S^p_{u_d})$ and $C_{u_k}(S^q_{u_k})$ indicate that the values of $R_d$ and $C_{u_k}$ depend on the chosen sizes $S^p_{u_d}$ and $S^q_{u_k}$, respectively.

V. CONSTRAINT SATISFACTION TECHNIQUES

As we mentioned before, the key part of our constraint satisfaction techniques is the M converting structure, which converts the value of a constraint metric M of a net to its output flow amount. The output flows of the M converting structures are gathered by an M constraining arc whose capacity is set to the upper bound on M to ensure constraint satisfaction. As shown in Fig. 4, the M converting structure on a net structure $N_i$ of net $n_i$ consists of a set of branch arcs from all arcs in $N_i$, which are then connected to a central node. Here, we extend the term-based objective function partition method introduced in Sec. IV to constraint metrics M (i.e., the cell area for TD-Cri problems, and interconnect delays for AD-NCri problems).
For a net $n_i$ with $m$ sinks, we divide M of $n_i$ into $m$ non-overlapping parts, with the value of each part $A_k(S^x_{u_d}, S^y_{u_k})$ only depending on the sizes of the drive cell $u_d$ and a sink cell $u_k$ in $n_i$. When there is a valid flow on an arc $(S^p_{u_d}, S^q_{u_k})$ in $N_i$, part of the flow, with amount $A_k(S^p_{u_d}, S^q_{u_k})$, is branched to the central node through the branch arc in the M converting structure from $(S^p_{u_d}, S^q_{u_k})$. Therefore, the total output flow from the central node is equal to the value of M of net $n_i$ with the chosen cell sizes. There are still two issues that need to be solved: 1) how to ensure that the branch flow amount on the branch arc from arc $(S^p_{u_d}, S^q_{u_k})$ is $A_k(S^p_{u_d}, S^q_{u_k})$, and 2) in each iteration, how to convert the given path delay constraint to interconnect delay constraints, i.e., how to perform the slack allocation.

Fig. 5. First method to ensure the flow amount on the branch arc.

A. Flow Amount and Shunting Branch Arcs

In order to make sure that the branch flow amount on the branch arc from arc $(S^p_{u_d}, S^q_{u_k})$ is $A_k(S^p_{u_d}, S^q_{u_k})$, an important condition is that when there is a valid flow on $(S^p_{u_d}, S^q_{u_k})$, the flow amount $f$ is always fixed, i.e., irrespective of the chosen size of any cell. Then, we can ensure the flow amount on the branch arc by: first, setting the capacity of the branch arc to $A_k(S^p_{u_d}, S^q_{u_k})$, and second, setting the capacity of the part of $(S^p_{u_d}, S^q_{u_k})$ from the branch point to $S^q_{u_k}$ to $f - A_k(S^p_{u_d}, S^q_{u_k})$, as shown in Fig. 5. However, with the M converting structure shown in Fig. 4, the fixed flow amount condition cannot be satisfied. This is because in the net structure $N_i$, the total amount of the flow leaving $N_i$ to the sink through the M converting structure, i.e., the M value of $n_i$, is dependent on the size options chosen. Hence, the flow amount sent from $N_i$ to its child net structures in the SSG is not fixed. Our approach here is to add an extra complementary branch arc to each arc in $N_i$, as shown in Fig. 6.
The capacity of the complementary branch arc from an arc $(S^p_{u_d}, S^q_{u_k})$ is set to $A^{max}_k(S_{u_d}, S_{u_k}) - A_k(S^p_{u_d}, S^q_{u_k})$, where $A^{max}_k(S_{u_d}, S_{u_k})$ is the maximum of $A_k(e)$ over all arcs $e$ in the arc set between $S_{u_d}$ and $S_{u_k}$. The flow on the complementary branch arc is branched to the sink. Correspondingly, the capacity of the part of each arc in $(S_{u_d}, S_{u_k})$ after the branch point is set to $f - A^{max}_k(S_{u_d}, S_{u_k})$. Hence, no matter which arc in the arc set between $S_{u_d}$ and $S_{u_k}$ has flow on it, the total branch flow amount on both the branch arc in the M converting structure and the complementary branch arc is $A^{max}_k(S_{u_d}, S_{u_k})$. Therefore, the total branch flow from a net structure to the sink is a constant irrespective of the chosen sizes. With this property, the fixed flow amount condition can be satisfied. The detailed flow amount determination is explained in Sec. VII. The F-costs of all arcs in M converting structures, and of all complementary branch arcs, are 0.

B. Slack Allocation

In the AD-NCri problem, in order to reduce cell area as much as possible under delay constraints, slack allocation should be biased towards interconnects that can produce a large area reduction with a small delay increase. Note that for an interconnect from cell $u_d$ to $u_k$, downsizing $u_k$ will not increase the delay of the interconnect. Hence, we only consider the delay increase caused by downsizing cell $u_d$. Assume cell $u_d$ has $n$ sizing options $S^1_{u_d}, \ldots, S^n_{u_d}$ ordered from small sizes to large sizes, and currently (at the beginning of the current iteration) the size of $u_d$ is $S^p_{u_d}$. We determine $p-1$ delay-area ratios for each interconnect $(u_d, u_k)$ from $u_d$. Each ratio $r_q(u_d, u_k)$ ($1 \le q \le p-1$) is calculated as the delay increase $d_q(u_d, u_k)$ of $(u_d, u_k)$ divided by the area decrease when the size of $u_d$ is reduced from $S^p_{u_d}$ to $S^q_{u_d}$. Note that when calculating $d_q(u_d, u_k)$, cells other than $u_d$ are considered at their current sizes.

The slack allocation proceeds in decreasing order of the ratios of interconnects. Every time, we select a max ratio $r_q(u_d, u_k)$ among all unselected ratios, and attempt to allocate a slack of amount $d_q(u_d, u_k)$ to the interconnect $(u_d, u_k)$, so that there is enough slack to reduce the size of $u_d$ from its current size $S^p_{u_d}$ to $S^q_{u_d}$. However, if $(u_d, u_k)$ has already been assigned a larger slack (note that there is more than one ratio for each interconnect, so there is also more than one attempt to allocate slack to an interconnect), or if allocating a slack of amount $d_q(u_d, u_k)$ violates the path delay constraints, then the slack allocation attempt either is not necessary or fails, and is hence canceled.

Note that one drawback of the above slack allocation method is that we cannot guarantee that all allocated slacks are used in the downsizing process. This is especially because, for each interconnect, besides downsizing its drive cell, we are also downsizing its sink cells. Hence the actual delay increase of an interconnect $(u_d, u_k)$ when downsizing $u_d$ from $S^p_{u_d}$ to $S^q_{u_d}$ is usually smaller than $d_q(u_d, u_k)$.
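The allocation loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name, the data layout, and the path model (each interconnect lies on a known set of paths, and each path has a remaining slack budget) are assumptions. Ratios are taken as given inputs and processed in the order stated in the text.

```python
def allocate_slack(candidates, paths_of, path_slack):
    """Greedy interconnect slack allocation.

    candidates : list of (ratio, interconnect, slack_needed) tuples, one per
                 candidate smaller size of the interconnect's drive cell
    paths_of   : interconnect -> list of paths that contain it (assumed known)
    path_slack : path -> available slack on that path
    Returns a dict mapping each interconnect to its allocated slack.
    """
    remaining = dict(path_slack)
    allocated = {}
    # Process candidates in the order stated in the text: decreasing ratio.
    for ratio, edge, need in sorted(candidates, key=lambda c: c[0], reverse=True):
        have = allocated.get(edge, 0.0)
        if have >= need:
            continue  # a larger slack is already assigned: attempt unnecessary
        extra = need - have
        if any(remaining[p] < extra for p in paths_of[edge]):
            continue  # would violate a path delay constraint: attempt fails
        for p in paths_of[edge]:
            remaining[p] -= extra
        allocated[edge] = need
    return allocated


# Hypothetical example: two interconnects sharing one path with slack 3.0.
candidates = [(3.0, "e1", 2.0), (2.0, "e2", 2.0), (1.0, "e1", 1.0)]
paths_of = {"e1": ["P"], "e2": ["P"]}
allocated = allocate_slack(candidates, paths_of, {"P": 3.0})
```

In the example, the second attempt fails because only 1.0 of path slack remains, and the third is skipped because `e1` already holds a larger allocation.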
To tackle the unused-slack problem, when solving the AD-NCri problem, we use several iterations of the slack allocation and network flow based sizing process. If an interconnect has unused slack in the previous iteration, we will not allocate any slack to it in the next iteration.

VI. CONSISTENT OPTION SELECTION

The net spanning structure is designed to maintain consistency across net structures, i.e., if a cell belongs to multiple nets, the size chosen for this cell in each corresponding net structure has to be the same. Note that if multiple nets have a common cell, their net structures must be connected in the SSG by net spanning structures. Let $u$ be a common cell of multiple nets. Among these nets, $u$ is a sink cell of $k$ nets $n_{i_1}, \ldots, n_{i_k}$, and the driving cell of $l$ nets $n_{j_1}, \ldots, n_{j_l}$. The corresponding net structures are shown in Fig. 7. These net structures are connected by a net spanning structure. Each of the above net structures will contain the corresponding size option set $S_u$ of $u$; see Fig. 7. For consistency, we need to make sure that the selected options in each $S_u$ are the same across all these net structures.

Fig. 7. The net spanning structure for maintaining consistency of option selection for a cell $u$ that is common to multiple net structures.

Let us first consider consistency among the parent net structures $N_{i_1}, \ldots, N_{i_k}$, where $N_{i_m}$ is the net structure of net $n_{i_m}$. To maintain consistency, a set of bridge arcs is added in the spanning structure, as shown in Fig. 7. Each bridge arc corresponds to one option node $S^i_u$ of $S_u$, and the $S^i_u$'s in all these net structures $N_{i_1}, \ldots, N_{i_k}$ are connected to a collection node at the start end of the corresponding bridge arc. Hence, if in $N_{i_m}$ the flow passes through $S^i_u$, it will then pass through the corresponding bridge arc.
Therefore, the roblem of maintaining consistency among N i,..., N i is translated to ensuring the mutual exclusiveness in the bridge arc set, i.e., only one bridge arc can have flow on it. For consistency between the arent net structures N im s and the child net structures N jr s, we connect a distribution node at the termination end of each bridge arc to the same otion in in each N jr that the collection node of the bridge arc is connected to. In this way, if an otion Su i is chosen (assed by flow) in N im, the flow will also be sent to Su i in each N jr through the corresonding bridge arc. Thus, if mutual exclusiveness of the bridge arc set for is satisfied, the same otions will be selected in each net structure to which u belongs as shown in Fig. 7. The F-cost of all arcs in the sanning structure is 0. Besides consistency between f d Fig. 8. The invalid slit flow situation.... net structures, we also need to satisfy the consistency within each size otion set, i.e., only one node in each size otion set can have flow through it. In order to achieve this, we must mae sure there is no slit flow as shown in Fig. 8. This is equivalent to satisfy the mutual exclusiveness condition in all arc sets (, S v ) between two sizing otion sets and S v. Techniques for satisfying the mutual exclusiveness condition for an arc set are discussed in Subsec. VI-A A. C-cost and Satisfying Mutual Exclusiveness Condition The standard min-cost networ flow solves a linear rogramming roblem (a continuous otimization roblem), and thus cannot directly handle the discrete requirements in our method, such as the mutually exclusiveness condition. The basic idea we use to coe with this roblem is to attach an objective-function-indeendent cost, C-cost, to those sets of arcs for which mutual exclusiveness condition has to be satisfied, i.e., arcs between two sets of sizing otions and

bridge arcs. Then, the total cost cost(e) for such arcs is $C(e) + C'(e)$, where $C(e)$ is the F-cost and $C'(e)$ is the C-cost, while for other arcs cost(e) is only $C(e)$. Let $\Delta C$ be the cost difference between the continuous min-cost flow (which usually violates the mutual exclusiveness conditions) of cost $C_{min}$, and a valid flow (one that meets all mutual exclusiveness conditions) of cost $C_{val}$, both in the network flow graph with only $C(e)$ as the arc cost, i.e., $\Delta C = C_{val} - C_{min}$. We determine $C'(e)$ according to two requirements: (1) the total C-cost $C'_{val}$ incurred by any valid flow is the same, and (2) it is smaller than the total C-cost $C'_{inval}$ incurred by any invalid flow by at least $\Delta C + \epsilon$, for a small $\epsilon > 0$. Then, in the network flow graph with both $C(e)$ and $C'(e)$ costs, the min-cost valid flow, of cost $C_{val}^{min} + C'_{val}$, must also have smaller cost than any invalid flow, since for any invalid flow that incurs an F-cost of $C_{inval}$ and a C-cost of $C'_{inval}$, its total cost is

$$C_{inval} + C'_{inval} \ge C_{min} + C'_{inval} \ge C_{min} + C'_{val} + \Delta C + \epsilon = (C_{val} + C'_{val}) + \epsilon > C_{val}^{min} + C'_{val}.$$

Therefore, to ensure the mutual exclusiveness condition in an arc set, $C'(e)$ is set as:

$$C'(e) = \Delta C + \epsilon \qquad (4)$$

for each arc e in the set. It is easy to see that for any valid flow, which goes through only one arc in the set, the incurred C-cost is always $\Delta C + \epsilon$. Otherwise, if an invalid flow passes through $k > 1$ arcs in the arc set, the incurred C-cost becomes $k(\Delta C + \epsilon)$, which is larger than the C-cost of a valid flow by at least $\Delta C + \epsilon$. Hence the two conditions for $C'(e)$ are satisfied.

We note at this point that an arc e in our network flow graph with a total cost cost(e) incurs a binary flow cost for a flow f through it, i.e., the cost incurred is cost(e) if f > 0, and 0 otherwise. This is because the flow in our model represents a binary variable: selecting a size option or not. This cost formulation differs from that of a standard network flow graph, in which the cost incurred by a flow on an arc is a linear function of the flow amount, and it results in the incurred arc cost being a concave function of the flow amount f.
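As a rough numerical illustration of the C-cost argument and the binary arc cost described above, the following Python sketch evaluates one arc set under a valid flow and under a split flow. It is only a sketch under stated assumptions: the function names, the F-cost values, the cost gap `delta_c` and the margin `eps` are all hypothetical and not taken from the experiments.

```python
def binary_arc_cost(cost_e, flow):
    """Fixed-charge cost: the full arc cost is paid as soon as any flow uses the arc."""
    return cost_e if flow > 0 else 0.0


def arc_set_cost(f_costs, flows, c_cost):
    """Total cost of one arc set: F-cost plus C-cost on every arc that carries flow."""
    return sum(binary_arc_cost(f + c_cost, x) for f, x in zip(f_costs, flows))


# Hypothetical data for one arc set with three arcs.
f_costs = [10.0, 12.0, 7.0]  # assumed F-cost of each arc
delta_c = 4.0                # assumed gap between a valid flow and the continuous optimum
eps = 0.5                    # assumed small positive margin
c_cost = delta_c + eps       # C-cost attached to every arc in the set (Eqn. 4)

valid = [1.0, 0.0, 0.0]      # all flow on one arc: mutual exclusiveness holds
split = [0.5, 0.0, 0.5]      # flow split over two arcs: invalid

valid_c = sum(c_cost for x in valid if x > 0)  # C-cost paid by the valid flow
split_c = sum(c_cost for x in split if x > 0)  # C-cost paid by the split flow

print(arc_set_cost(f_costs, valid, c_cost))    # total cost of the valid flow
print(arc_set_cost(f_costs, split, c_cost))    # total cost of the split flow
print(split_c - valid_c >= delta_c + eps)      # extra C-cost paid by the split flow
```

With these assumed numbers the split flow pays the C-cost twice, so its C-cost exceeds that of the valid flow by exactly the required margin.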
We note that obtaining a min-cost flow in a network with concave arc costs is NP-hard, and we solve the problem with an approximation algorithm proposed in [11] to obtain a near-optimal solution.

VII. FLOW AMOUNT AND CAPACITY DETERMINATION

As we mentioned in Sec. V, determining the incoming flow amount to each net structure is necessary for determining appropriate arc capacities. A net structure has two types of outgoing flows: 1) flow into the M converting structure and complementary branch arcs; 2) supply flow to its child net structures. Let $u_k$ denote the sink nodes in a net $n_i$, and $u_d$ denote the drive node. The first type of outgoing flow has a fixed amount $\sum_{u_k \in n_i} A_{max}(S_{u_d}, S_{u_k})$ for net structure $N_i$, irrespective of the chosen sizing options, as discussed in Sec. V. The incoming flow amount must be sufficient to cover the two outgoing flows, and thus the required incoming amount $f_{in}(N_i)$ of a net structure $N_i$ is recursively given as:

$$f_{in}(N_i) = \sum_{u_k \in n_i} A_{max}(S_{u_d}, S_{u_k}) + \sum_{N_j \in child(N_i)} \frac{f_{in}(N_j)}{d_{in}(N_j)} \quad \text{if } N_i \text{ has any child net structure} \qquad (5)$$

$$f_{in}(N_i) = \sum_{u_k \in n_i} A_{max}(S_{u_d}, S_{u_k}) \quad \text{otherwise (} N_i \text{ is a leaf net structure in the SSG)} \qquad (6)$$

where $child(N_i)$ is the set of child net structures of $N_i$, and $d_{in}(N_j)$ is the incoming degree of $N_j$, i.e., the number of parent net structures of $N_j$. In Eqn. 5, we assume that the required flow amount for a net structure is sent uniformly from all its parent net structures. The determination of the flow needed in each structure starts from the boundary condition given in Eqn. 6 for leaf net structures, which are directly connected to the sink. The incoming flow amount needed for these net structures is the total of their first type of outgoing flows. Starting from the leaf net structures, we visit the other net structures in reverse topological order and determine their required incoming flow amount according to Eqn. 5.
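The recursive computation of the required incoming flow can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function `required_inflow`, the dictionary layout and the example values are assumptions, and `fixed_out` stands in for the fixed first-type outgoing flow of each net structure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def required_inflow(children, fixed_out):
    """Return f_in for every net structure.

    children[n]  : list of child net structures of n (hypothetical layout)
    fixed_out[n] : fixed first-type outgoing flow of n (assumed given)
    """
    # Incoming degree d_in: the number of parent net structures of each node.
    d_in = {n: 0 for n in fixed_out}
    for n in fixed_out:
        for c in children.get(n, []):
            d_in[c] += 1

    # TopologicalSorter expects node -> predecessors. Passing the children as
    # "predecessors" yields every child before its parents, i.e. a reverse
    # topological order starting from the leaf net structures.
    order = TopologicalSorter({n: children.get(n, []) for n in fixed_out}).static_order()

    f_in = {}
    for n in order:
        # Each child's demand is shared uniformly among its parents.
        supply = sum(f_in[c] / d_in[c] for c in children.get(n, []))
        f_in[n] = fixed_out[n] + supply
    return f_in


# Hypothetical SSG fragment: N1 and N2 are both parents of the leaf N3.
children = {"N1": ["N3"], "N2": ["N3"], "N3": []}
fixed_out = {"N1": 4.0, "N2": 6.0, "N3": 10.0}
f_in = required_inflow(children, fixed_out)
print(f_in["N1"], f_in["N2"], f_in["N3"])
```

In this example the leaf N3 needs 10 units, half of which is requested from each of its two parents, so N1 and N2 need 9 and 11 units respectively.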
After obtaining $f_{in}$ for each net structure, we can determine its arc capacities as follows. For incoming arcs to a net structure $N_i$ from net spanning structures (arcs after bridge arcs in Fig. 7), the capacities are $f_{in}(N_i)$, to provide the required incoming flow. For outgoing arcs from a node in the size option node set $S_u$ in $N_i$ to net spanning structures (arcs before bridge arcs in Fig. 7), let $N_{j_1}, \ldots, N_{j_t}$ be the child net structures of $N_i$ that also have cell u; the capacity of these arcs is then the total amount of flow needed by $N_{j_1}, \ldots, N_{j_t}$ from $N_i$: $\sum_{r=1}^{t} f_{in}(N_{j_r})/d_{in}(N_{j_r})$.

Within a net structure, the capacities of the arcs in each arc set $(S_{u_d}, S_{u_k})$ in $N_i$ are the same and are derived as follows; $u_d$ is the driving cell and $u_k$ is any sink cell of the corresponding net $n_i$. For an arc set $(S_{u_d}, S_{u_k})$, if $S_{u_k}$ is not connected to any arc in outgoing spanning structures from $N_i$, the capacity of each arc in it is set to $A_{max}(S_{u_d}, S_{u_k})$, so that sufficient flow can be sent to branch arcs in the M converting structure and complementary branch arcs. Otherwise, let $N_{j_1}, \ldots, N_{j_t}$ be the child net structures of $N_i$ that are connected to $S_{u_k}$ (via spanning structures). Note that this means that $u_k$ is a common cell in nets $n_i$ and $n_{j_1}, \ldots, n_{j_t}$. Then, the capacity of each arc in the set is set to $A_{max}(S_{u_d}, S_{u_k}) + \sum_{r=1}^{t} f_{in}(N_{j_r})/d_{in}(N_{j_r})$, to send enough flow to the net spanning structures.

VIII. AN OPTIMAL EXHAUSTIVE-SEARCH ALGORITHM

In order to verify the solution quality of our network flow based method, we implemented an exhaustive search method for the TD-Cri problem, and compare our solution quality to the optimal one. The idea of this exhaustive search is to try every possible cell sizing solution, and choose the one with best critical

path delay. However, if no proper solution pruning method is applied, the run time of the search would be unacceptable even for the smallest circuit in the benchmark. We propose three pruning methods that maintain the optimality of the solution but greatly reduce the run time. In the exhaustive search method, we visit cells in topological order, and when visiting a cell, we combine all partial solutions we have for the visited cells with the possible size choices of the current cell to form new partial solutions. The pruning process happens after new partial solutions are generated.

Fig. 9. Visited cells, unvisited cells and the boundary between them. (Figure labels: input; visited cells; unvisited cells; $c_j$; $c_k$; boundary; output.)

A partial solution is pruned when: 1) it fails to meet the area constraint; 2) it gives a longer delay than the critical path delay produced by our method; 3) there is another, better partial solution (generated in the search process or extracted from the complete solution of our method) that gives smaller total area and better arrival times at the outputs of cells on the boundary of the visited region (those connected to unvisited cells, as shown in Fig. 9) in both of the following cases: (a) the unvisited cells are all at their maximum sizes, and (b) the unvisited cells are all at their minimum sizes.

The first two pruning methods obviously do not change the optimality of the exhaustive search method. For the third condition, let us first denote the arrival time at the output of a cell $c_i$ as $A_o(c_i)$. Figure 9 shows a single path situation: $c_j$ is the visited cell at the boundary of the visited region, and $c_k$ is the unvisited cell connected to it. We have two partial solutions $\tau_1$ and $\tau_2$, and $\tau_2$ is the better solution according to our third pruning condition. Then for any complete solution $\tau_1^{com}$ expanded from $\tau_1$, we can also expand $\tau_2$ to $\tau_2^{com}$ by choosing exactly the same sizes for the unvisited cells in $\tau_2$ as in $\tau_1^{com}$.
Since $\tau_2$ has smaller area than $\tau_1$, if $\tau_1^{com}$ meets the constraints, so does $\tau_2^{com}$. The delay from the output of $c_j$ to the end of the path is then the same for both complete solutions, because the unvisited cells have identical sizes. Since $\tau_2$ is better than $\tau_1$, we have $A_o^{\tau_1}(c_j) > A_o^{\tau_2}(c_j)$, where $A_o^{\tau_1}(c_j)$ is the $A_o(c_j)$ value according to partial solution $\tau_1$, and $A_o^{\tau_2}(c_j)$ is the $A_o(c_j)$ value according to partial solution $\tau_2$. Hence, the total delay of the path for $\tau_1^{com}$ is greater than the total delay for $\tau_2^{com}$. Therefore, $\tau_1$ cannot be expanded to an optimal solution, and our third pruning method does not affect the optimality of the method.

IX. EXPERIMENTAL RESULTS

We tested our algorithm on ISCAS 85 benchmarks. We use the same industrial 0.8um standard cell library as in [12], which provides four cell implementations for each function with different areas, driving resistances, input capacitances and intrinsic delays. The interval between the four available sizes $w_1 < w_2 < w_3 < w_4$ for each cell increases approximately exponentially, i.e., $(w_4 - w_3) \approx 2(w_3 - w_2) \approx 4(w_2 - w_1)$. We expanded the cell library by adding six artificial size options with proportional driving resistances and input capacitances for each cell. Three size options are added between $w_3$ and $w_4$ with uniform spacing between them, and the other three added options are made larger than $w_4$. The intervals between the last three newly added size options are the same as between the first three newly added size options. We use a linear approximation determined from the four options provided in the original library to calculate the driving resistances and input capacitances of the added options. Other electrical parameters we use are: unit length interconnect resistance r = 7.6 0 ohm/µm, unit length interconnect capacitance c = 8 0 8 f/µm. Results were obtained on Pentium IV machines with GB of main memory.

TABLE I. Results for our method and the optimal exhaustive search method. Four sizes are available for each cell. %T is the percentage timing improvement over the initial solution, and %A is the percentage change of total cell area (a negative value means deterioration). # crit. cells is the number of cells in critical and near-critical paths.
Columns: Ckt; # cells; # crit. cells; our method (%T, %A, runtime in secs); optimal (%T, %A, runtime in secs)
C432 60 50.9-9.9 9 3. -0.0 03
C499 0 5.9-9.4 4.4-9.3 9
C880 383 77 4.8-9.9 9 6. -0.0 95
C1355 544 85 6.8-8. 7.0-8.8 489
C1908 880 88 6. -9.8 39 7.0-0.0 556
C2670 .3K 9 9.6-9.5 38. -9.7 908
C3540 .7K 4 6.8-9.0 69 9.0-9.4 74
C5315 .3K 38 0.0-6.9 70 0. -7.9 38
C6288 .4K 99.5-8. 97.7-9. 4998
C7552 3.5K 99 9.9-8..5-7.5 6847
Avg. 336 0.9-8.9 48.9-9. 96

TABLE II. Results for our complete iterative cell sizing method.
Columns: Ckt; %T; %A; runtime (secs); %T over [10]
C432 3.5-9.9 08 0.0
C499 4.8-9.9 65 9.5
C880 5.0-9.9 84 7.
C1355 9. -9.9 450 8.8
C1908 9.8-9.3 687 7.
C2670 4.0-9.8 6.
C3540 8.5-9.5 977 7.6
C5315 9. -9. 954 9.5
C6288 4.6-9.5 358.0
C7552 8.9-9. 3796 6.
Avg. 4.8-9.6 305 9.0

A. Comparing with the Optimal Method for the TD-Cri Problem

Since our optimal exhaustive search method cannot handle many size options for each cell, we only use the four options provided by the original library for each cell (without the six expanded options). With exactly the same initial sizing solution and the cell area upper bound set to. times the initial area, the improvements in critical path delay obtained by our network flow method and by the optimal exhaustive search method over the initial sizing solution are listed in Table I. Compared to the optimal solution, the improvement obtained by our method is only % worse (.9% vs. .9%); note that the solutions of both methods satisfy the area constraint. Furthermore, our runtime is 60X less than that

of the exhaustive search method even with all three pruning conditions.

B. Comparing with [10]

We also tested our complete iterative sizing method. The initial sizing solution is obtained by rounding continuous sizing solutions. The area constraint is set to. times the initial area, and each cell has 10 possible sizes (including the six expanded ones). The timing improvement we obtain is listed in Table II. We also implemented the method in [10], and tested it with the same area constraints on the ten benchmark circuits. The percentage differences in critical path delay between our solutions and those of the method in [10] are also listed. On average, our method obtains about 9% better critical path delay than the method in [10]. Our run time is about 1.4 times the run time of the method in [10]. However, our method still has good scalability. We plotted the run times of our method against the numbers of cells in the benchmark circuits, as shown in Fig. 10. The plot best matches a linear function, which is a desirable complexity.

Fig. 10. Run time vs. number of cells.

X. CONCLUSIONS

We presented a novel timing-driven network flow based discrete cell sizing algorithm. We simplify the problem by dividing it into two subproblems: sizing cells on critical and near-critical paths to improve circuit delay, and downsizing cells on non-critical paths to provide more cell area quota. We proposed network flow based models to solve both subproblems. Cell size options are modeled as nodes in the network flow graph, and the cost of flows passing through various nodes is equal to the objective function value when the cell sizes corresponding to these nodes are chosen. Thus, by solving for a min-cost flow, we can determine the cell sizes that optimize the objective function. Various techniques are proposed to ensure that the min-cost flow we obtain from the continuous network flow model meets the discrete mutual exclusiveness condition, which is required to maintain consistency in cell size selection. Constraints in the two subproblems are converted to constraints on flow amounts in the network flow model by constraint metric converting structures, and are satisfied by setting appropriate arc capacities. The results show that the timing improvement obtained with our method is near optimal, and of high quality compared to other discrete sizing methods. Furthermore, our method is also quite scalable, with an almost linear time complexity.

REFERENCES

[1] I. Adler and N. Megiddo, A simplex algorithm whose average number of steps is bounded between two quadratic functions of the smaller dimension, Journal of the ACM (JACM), Volume 32, Issue 4, October 1985, pp. 871-895.
[2] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Chapter 10, p. 360, and Chapter 11, p. 402, Prentice Hall.
[3] R. K. Ahuja and J. B. Orlin, Scaling Network Simplex Algorithm, Operations Research, Vol. 40, Supplement 1: Optimization, 1992, pp. S5-S13.
[4] F. Beeftink, P. Kudva, D. Kung, and L. Stok, Gate-size selection for standard cell libraries, Proc. International Conference on Computer-Aided Design, 1998, pp. 545-550.
[5] M. Berkelaar and J. Jess, Gate Sizing in MOS Digital Circuits with Linear Programming, European Design Automation Conference, 1990, pp. 217-221.
[6] C. Chen, C. Chu, and D. Wong, Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation, IEEE Trans. Computer-Aided Design, vol. 18, no. 7, pp. 1014-1025, 1999.
[7] O. Coudert, Gate sizing for constrained delay/power/area optimization, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, pp. 465-472, 1997.
[8] S. Dutt, H. Ren, F. Yuan and V. Suthar, A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs, Proc. International Conference on Computer-Aided Design, 2006, pp. 375-382.
[9] J. Fishburn and A. Dunlop, Tilos: A posynomial programming approach to transistor sizing, Proc. International Conference on Computer-Aided Design, 1985, pp. 326-328.
[10] S. Hu, M. Ketkar, J. Hu, Gate Sizing For Cell Library-Based Designs, Proc. Design Automation Conference, 2007, pp. 847-852.
[11] D. Kim and P. Pardalos, A solution approach to the fixed charge network flow problem using a dynamic slope scaling procedure, ORL, pp. 195-203, 1999.
[12] Yang Xiaojian, Choi Bo-Kyung, M. Sarrafzadeh, Timing-driven placement using design hierarchy guided constraint generation, International Conference on Computer-Aided Design, pp. 177-180, 2002.