392 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 3, MARCH 2013


An Optimal Allocation Algorithm of Adjustable Delay Buffers and Practical Extensions for Clock Skew Optimization in Multiple Power Mode Designs

Kyoung-Hwan Lim, Deokjin Joo, Student Member, IEEE, and Taewhan Kim, Senior Member, IEEE

Abstract: Satisfying a clock skew constraint is one of the most important tasks in clock tree synthesis. Moreover, the task becomes much harder when the clock tree is designed in a multiple power mode environment, in which the voltage applied to some design modules varies as the power mode changes. Recently, it has been shown that an adjustable delay buffer (ADB), whose delay can be tuned dynamically, can effectively solve the clock skew problem under multiple power modes. However, because of the area and control overhead of ADBs, it is very important to minimize the number of ADBs allocated. This paper provides a complete solution to the problem of clock skew optimization using ADBs under multiple power modes. We propose a linear-time algorithm that simultaneously computes: 1) the minimum (optimal) number of ADBs to be used; 2) the location where each ADB is to be placed; and 3) the delay value of each ADB to be assigned to each power mode. Experimental results show that, in comparison with the previous work, which iteratively performs ADB allocation, placement, and value assignment, our integrated algorithm produces consistently better designs for all tested benchmarks: it reduces the number of ADBs by 9.27% on average under the skew bound of ps, even with shorter clock latencies than those of the previous algorithm.
To make the approach practically feasible, we also propose a new ADB design technique and systematic algorithmic solutions that address discrete delay values, slew rate variation, nonzero initial ADB delay, and a possible exploration of ADB resizing.

Index Terms: Adjustable delay buffer (ADB), cell allocation, clock skew, clock tree synthesis, multiple power modes.

Manuscript received March 11, 2012; revised June 3, 2012 and July 24, 2012; accepted August 20, Date of current version February 14,
This work was supported in part by the Basic Science Research Program through the National Research Foundation under Grant , by the Center for Integrated Smart Sensors funded by the Ministry of Education, Science, and Technology as the Global Frontier Project (CISS ), and by the Ministry of Knowledge Economy, Korea, under the Information Technology Research Center Support Program supervised by the National IT Industry Promotion Agency (NIPA) under Grant NIPA-2012-H
A preliminary version of this paper was presented in [1]. This paper was recommended by Associate Editor C. C.-N. Chu.
K.-H. Lim is with Samsung Electronics Company, Ltd., Yongin , Korea (e-mail: kh12.lim@samsung.com).
D. Joo and T. Kim are with the School of Electrical Engineering and Computer Science, Seoul National University, Seoul , Korea (e-mail: jdj@ssl.snu.ac.kr; tkim@ssl.snu.ac.kr).
Color versions of one or more of the figures in this paper are available online at
Digital Object Identifier /TCAD

I. Introduction

In synchronous circuit design, all sequential elements are synchronized by a common signal, usually called the clock signal. Ideally, the clock signal should reach all sequential elements from the clock source at the same time. In practice, however, the clock signal paths from the clock source to the sequential elements differ in timing because of variations in path lengths and buffer characteristics.
The largest difference among the arrival times of a clock signal is called clock skew, and achieving zero clock skew is very difficult. One possible solution is to limit the clock skew to a bound within which the resulting timing variations can be tolerated. Extensive research on clock tree optimization, such as clock routing, clock buffer insertion or sizing, and wire sizing, has been performed to minimize clock skew (e.g., [2]–[8]). A common assumption of those works is that the generated clock tree operates under a single (fixed) power mode. In multiple power or voltage mode designs, however, the clock signal delay on a path may change as the applied power mode (i.e., operating condition) changes. Thus, a clock tree optimized to meet a clock skew constraint in one power mode may violate the constraint in another. Even if the previous works are applied with the clock skew constraint of every power mode, the resulting clock tree is highly likely to use a substantially longer wirelength, or there may exist no clock tree that satisfies the clock skew constraint in every power mode.

On the other hand, post-silicon tuning (e.g., [9]–[12]), such as inserting adjustable delay buffers (ADBs), is a widely used method for dealing with timing problems caused by process and environmental variations. Because the delay of an ADB can be controlled by its delay control inputs [13], clock skew variation caused by process variation can be tuned after manufacturing through properly inserted ADBs. The works that have used ADBs in clock tree synthesis aim to minimize the statistical variation of clock skew under a single power mode, and the use of ADBs for clock skew optimization in multiple power modes has not yet been intensively investigated.
The idea of using ADBs in multiple power modes is to replace some of the normal clock buffers with ADBs so that the clock skew constraint in each power mode can be met. When the power mode changes during execution, for example, from Mode-1 to Mode-2, the delays of the ADBs in the clock tree that were adjusted for Mode-1 are readjusted to meet the clock skew constraint under Mode-2. Since an ADB is much bigger than a normal buffer and requires control lines, ADB-based clock skew optimization in multiple power modes involves a set of related problems: 1) (problem-1) allocating a minimum number of ADBs; 2) (problem-2) finding the normal buffers (or locations) in the clock tree to be replaced by ADBs; and 3) (problem-3) determining the delay values of the ADBs to be assigned in each power mode. To our knowledge, the works in [14]–[16] are the only ones that have considered the use of ADBs to minimize clock skew in multiple power mode domains. The authors of [14] and [15] proposed a linear-time optimal algorithm for solving problem-3 and attempted to solve problems-1 and -2 heuristically in a greedy manner by repeatedly applying the algorithm for problem-3. The work in [16] proposed an efficient algorithm adopting a two-stage approach: a top-down ADB insertion followed by a bottom-up ADB elimination. Even though the two-stage approach reduces the runtime compared with [14] and [15], it still does not guarantee optimality. Moreover, [14]–[16] do not address the practically important issues caused by ADB allocation: 1) (problem-4) the consideration of the ADB's base delay, and 2) (problem-5) the consideration of the ADB's output slope change. (As will be discussed later, the base delay of an ADB is closely related to the size of its implementation, whereas its output slope varies dynamically with its delay value.)
In summary, the key contribution of our work consists of two parts: 1) the design of a linear-time optimal algorithm that solves the fundamental problems 1, 2, and 3 simultaneously, and 2) the development of systematic and effective postprocessing algorithms, based on the optimal solution to problems 1, 2, and 3, that address the practical problems 4 and 5. This paper is an extended version of the preliminary work in [1]. The extension includes: 1) a formal proof of the optimality of the proposed linear-time ADB allocation algorithm that minimizes the number of ADBs; 2) a technique that considers the discrete delay increments of ADBs; 3) a new ADB design technique that solves the output slew variation problem; 4) a technique that supports the nonzero base delay of ADBs; 5) an ADB refinement technique that reduces the total cost of capacitors in the ADBs; and 6) a diverse set of experimental data validating the effectiveness of the proposed practical extensions.

Fig. 1. Two logic structures of ADB. (a) Capacitor-based structure [17]. (b) Inverter-based structure [16], [18].

The remainder of this paper is organized as follows. In Section II, we describe the internal logic structure of an ADB and propose an ADB optimization flow in clock tree synthesis. In Section III, we propose an integrated algorithm that solves the ADB allocation, placement, and delay assignment problems together, give a complete proof of its linear-time optimality, and handle the discrete delay values of ADBs. In Section IV, we discuss the characteristics of ADB implementation, propose a new ADB structure to cope with the output slew variation problem, and propose a tuning algorithm to address the ADB's base delay problem, followed by refinement techniques that reduce the total capacitors and transistors of ADBs.
We provide in Section V a set of diverse experimental data to validate the effectiveness of our proposed approach. Conclusions are given in Section VI.

II. ADB Structure and Synthesis Flow

A. ADB Logic Structure

An ADB is a buffer that provides more than one delay value, selected dynamically by control inputs. In other words, any special buffer design that provides multiple delays is acceptable as an ADB. Fig. 1 shows two widely used ADB structures. Fig. 1(a) shows a capacitor-based ADB implementation [17], which consists of two inverters, a capacitor bank, and a capacitor bank controller. Assuming uniformly sized capacitors in the bank of Fig. 1(a), the ADB delay is linearly proportional to the number of capacitors activated in the bank. On the other hand, Fig. 1(b) shows an inverter-based ADB implementation [16], [18], which consists of parallel inverters and their SELECT pins. The SELECT signals provide several driving modes, by which the ADB delay is adjusted. For example, when all SELECT signals are set to logic 0, the inverters connected to the corresponding SELECT pins are turned off, so only the leftmost and rightmost inverters in Fig. 1(b) are on and drive the output. The ADB delay can be adjusted by setting some of the SELECT signals to logic 1 to activate their parallel inverters: the more SELECT signals are enabled, the shorter the ADB delay.

Note that as the maximum delay an ADB must generate increases, its inverter-based implementation requires more parallel transistors, while its capacitor-based implementation requires more capacitors. In addition, the precision of the ADB delay values is determined by the size of the set of parallel transistors in the inverter-based implementation and by the size of the capacitor bank in the capacitor-based implementation. The capacitor-based implementation offers finer delay granularity than the inverter-based implementation because delays are easier to fine-tune with capacitors than with inverters. However, the capacitor-based implementation is relatively larger than the inverter-based one. For this reason, the inverter-based implementation suits design domains with very high clock skew variation, while the capacitor-based implementation suits high-speed designs, in which delicate delay adjustment is necessary. In this paper, we use the capacitor-based ADB in Fig. 1(a).

B. Synthesis Flow

Fig. 2 shows our proposed synthesis flow that generates an ADB-based clock tree. The input clock tree can be constructed by any traditional clock tree generation scheme. We first find an optimal ADB allocation and its delay value assignment under a given clock skew bound B, using the given arrival time information for every power mode. Our ADB allocation, presented in Section III-B, is guaranteed to find a minimum number of ADBs under B. In the subsequent step (Section III-C), the optimal results are iteratively explored to support the discrete delay values of ADBs. The last two (yellow) steps in Fig. 2 solve the problem of the initial delay (i.e., base delay) of ADBs (Section IV-B) and the problem of the output slope change (Section IV-A), combined with a possible ADB size reduction (Section IV-C).

Fig. 2. Proposed synthesis flow of clock tree optimization using ADBs.

III. ADB Allocation, Placement, and Delay Assignment

A. Key Observations

The ADB-based clock skew optimization problem is described as follows.

Problem 1 (ADB-Based Clock Skew Optimization Problem): Given an initial buffered clock tree T, power modes Mode-1, Mode-2, ..., Mode-K, clock signal arrival times on all power modes, and a clock skew bound B, find the buffers in T to be replaced by ADBs and assign delay values to the ADBs for every power mode such that the number of ADBs is minimized while the clock skew constraint is satisfied for all power modes.

Before describing the proposed ADB allocation algorithm in detail, we use clock timing analysis to illustrate the key observations on which it is built. Consider the clock tree rooted at a buffered node $n_1$ in Fig. 3(a), which has five flip-flops (FFs) E, F, G, H, and I with their arrival times in a power mode. We examine whether the buffer $n_2$ should be replaced by an ADB and, if so, what delay it should be assigned. First, we introduce some definitions and notations to facilitate the discussion.

Definition 1 (Fully Reducible and Fully Simplified Timing Tree): A clock subtree T, together with the arrival times to its FFs, is called fully reducible if the ADB allocation and delay assignment for T have been completed for every power mode. The fully reduced timing tree of a fully reducible subtree T consists of the root of T and two children of the root, which represent the FFs with the latest and earliest arrival times.
If more than one FF has the latest (or earliest) arrival time, any of them can be chosen. For example, if the subtree in Fig. 3(b) with two buffers $n_1$ and $n_2$ becomes fully reducible by allocating an ADB only to $n_2$ with a delay of 3 on a power mode, the fully reduced timing tree of $n_1$ on that power mode is shown on the right in Fig. 3(b).

We use the following notation.
1) $\alpha_i$ ($\ge 0$): the delay increment¹ to be assigned to an ADB in node $n_i$; $\alpha_i = 0$ means no ADB is needed at $n_i$.
2) $est_i$ and $lst_i$: the earliest and latest arrival times to the FFs whose clock signals pass through $n_i$. For example, in Fig. 3(a), $est_1 = 2$, $est_2 = 2$, $lst_1 = 14$, and $lst_2 = 10$.
3) $lst_{max}$: the latest arrival time to the FFs of a clock tree. That is, when $n_1$ is the root of the clock tree, $lst_{max}$ is another notation for $lst_1$.
4) $est_{i\setminus\{k_1,\ldots,k_r\}}$ and $lst_{i\setminus\{k_1,\ldots,k_r\}}$: the earliest and latest arrival times to the FFs whose clock signals pass through $n_i$ but not through any of $n_{k_1}, \ldots, n_{k_r}$. If there is no such clock signal, $est_{i\setminus\{k_1,\ldots,k_r\}} = \infty$ and $lst_{i\setminus\{k_1,\ldots,k_r\}} = 0$. For example, in Fig. 3(a), $est_{1\setminus\{2\}} = 5$ and $lst_{1\setminus\{2\}} = 14$.
5) $slk_i$ ($\ge 0$): the clock skew of the clock tree rooted at $n_i$, i.e., $lst_i - est_i$; $slk_{i\setminus\{k_1,\ldots,k_r\}}$ is defined as $lst_{i\setminus\{k_1,\ldots,k_r\}} - est_{i\setminus\{k_1,\ldots,k_r\}}$. For example, in Fig. 3(a), $slk_1 = 12$ and $slk_2 = 8$. Note that $slk_i$ is a fixed value, independent of the delay value assigned to an ADB, if any, in node $n_i$.

¹ For conciseness, we also call it the delay value when this causes no confusion.

The skew bound B is assumed to satisfy $B \ge slk_i$ for every node $n_i$, because otherwise no ADB allocation can resolve the clock skew violation.

Let us examine whether we need an ADB in node $n_2$ of the clock tree rooted at $n_1$ in Fig. 3(a). With the clock skew bound B = 10, we make the following observations.
1) If an ADB is not allocated to $n_2$, then no matter what delay value is assigned to $n_1$, we always have $lst_{max} - est_2 > B$.
2) If an ADB is allocated to $n_2$, we can make $lst_{max} - (est_2 + \alpha_2) \le B$ by assigning a delay increment (e.g., 3 or 4) to the ADB. Thus, an ADB should be allocated to $n_2$ to satisfy the skew constraint on the power mode. Once a delay value (e.g., $\alpha_2 = 3$) is assigned, the resulting fully reduced (arrival) timing tree is shown on the right in Fig. 3(b).
3) Consider the different arrival times shown on the left clock tree in Fig. 3(c). Since $est_2$ is large enough to satisfy the skew constraint, no ADB is needed at $n_2$, producing the fully simplified timing tree shown on the right in Fig. 3(c).
4) Last, there could be an arrival time relation in which the skew constraint can never be met even though an ADB is used in $n_2$. For example, in Fig. 3(d), since the delay value of an ADB is nonnegative, the value of $lst_{max} - est_1$ (= 12 − 1) can never be reduced. In this case, B should be reset to a number $\ge 11$.

Fig. 3. Relations between the ADB allocation and the arrival time distribution (the clock skew bound B is set to 10). (a) Simple clock tree with two buffered nodes $n_1$ and $n_2$. (b) Case of arrival time distribution where $n_2$ should be replaced with an ADB. (c) Case where $n_2$ needs no ADB. (d) Case where ADB allocation never resolves the clock skew violation.

We formalize observations 1)–4) in the following three cases. Consider a clock tree with two buffered nodes $n_1$ and $n_2$ such that $n_1$ is the root of the tree and $n_2$ is a child of $n_1$, as in Fig. 3(a). Suppose that all $est_{(\cdot)}$, $lst_{(\cdot)}$, and $slk_{(\cdot)}$ values of the clock tree before the ADB allocation to $n_2$ are available, where $(\cdot)$ denotes any node associated with the ADB allocation (here, $n_1$ and $n_2$). When an ADB with delay value $\alpha_2$ is allocated to $n_2$, the resulting clock skew can be expressed as

$$\max\{slk_{1\setminus\{2\}},\ slk_2,\ lst_2 + \alpha_2 - est_{1\setminus\{2\}},\ lst_{1\setminus\{2\}} - (est_2 + \alpha_2)\}. \tag{1}$$

1) Case 1 ($lst_{max} - est_{1\setminus\{2\}} \le B$ and $lst_{max} - est_2 > B$): Since $lst_{max} - est_2 > B$, $\alpha_2$ must be positive. We set

$$\alpha_2 = est_{1\setminus\{2\}} - est_2. \tag{2}$$

We can confirm $\alpha_2 > 0$ by using $lst_{max} - est_2 > B$, $slk_2 \le B$, and $slk_{1\setminus\{2\}} \le B$.² The clock skew in (1) then becomes $\max\{slk_{1\setminus\{2\}}, slk_2, lst_2 - est_2, lst_{1\setminus\{2\}} - est_{1\setminus\{2\}}\} = \max\{slk_{1\setminus\{2\}}, slk_2, slk_2, slk_{1\setminus\{2\}}\} \le B$.
2) Case 2 ($lst_{max} - est_{1\setminus\{2\}} \le B$ and $lst_{max} - est_2 \le B$): With $\alpha_2 = 0$, i.e., no ADB at $n_2$, the skew expression in (1) simplifies to $\max\{slk_{1\setminus\{2\}}, slk_2, lst_2 - est_{1\setminus\{2\}}, lst_{1\setminus\{2\}} - est_2\}$. If $lst_{max} = lst_{1\setminus\{2\}}$, the clock skew becomes $\max\{slk_{1\setminus\{2\}}, slk_2, lst_2 - est_{1\setminus\{2\}}, lst_{max} - est_2\} \le \max\{slk_{1\setminus\{2\}}, slk_2, lst_{max} - est_{1\setminus\{2\}}, lst_{max} - est_2\} \le B$. Otherwise, i.e., if $lst_{max} = lst_2$, the clock skew becomes $\max\{slk_{1\setminus\{2\}}, slk_2, lst_{max} - est_{1\setminus\{2\}}, lst_{1\setminus\{2\}} - est_2\} \le \max\{slk_{1\setminus\{2\}}, slk_2, lst_{max} - est_{1\setminus\{2\}}, lst_{max} - est_2\} \le B$. Thus, no ADB is needed at $n_2$.
3) Case 3 ($lst_{max} - est_{1\setminus\{2\}} > B$): If $lst_{max} = lst_{1\setminus\{2\}}$, then $lst_{max} - est_{1\setminus\{2\}} = lst_{1\setminus\{2\}} - est_{1\setminus\{2\}} = slk_{1\setminus\{2\}} > B$, which violates the assumption that $B \ge slk_{(\cdot)}$ for every $(\cdot)$. If instead $lst_{max} = lst_2$, the term $lst_2 + \alpha_2 - est_{1\setminus\{2\}}$ in (1) becomes $lst_{max} - est_{1\setminus\{2\}} + \alpha_2 > B$, since $\alpha_2 \ge 0$ and $lst_{max} - est_{1\setminus\{2\}} > B$. Hence, the skew bound B can never be met by ADB allocation.

² From $lst_{max} = \max\{lst_2, lst_{1\setminus\{2\}}\}$, $lst_{max} - est_2 > B$, and $lst_2 - est_2 \le B$, $lst_{max}$ must be $lst_{1\setminus\{2\}}$. Then, from $lst_{max} - est_2 > B$ and $lst_{1\setminus\{2\}} - est_{1\setminus\{2\}} \le B$, it follows that $est_{1\setminus\{2\}} > est_2$.

B. Proposed Algorithm

Our proposed algorithm for ADB-based clock skew optimization, called CLK-ADB, is iterative and processes a clock tree T in a bottom-up fashion. CLK-ADB topologically sorts all the internal buffered nodes of T, excluding the leaf buffered nodes, and performs the following two steps iteratively for the clock subtrees rooted at the nodes in the sorted list.

Fig. 4. Illustration of the two-step ADB allocation of CLK-ADB for a clock signal timing tree rooted at $n_i$. The tree contains two child nodes $n_{k_1}$ and $n_{k_2}$. According to Cases 1, 2, and 3, $n_{k_1}$ belongs to Case 1, needing an ADB with a delay value of 4 to meet the clock skew constraint, while $n_{k_2}$ belongs to Case 2, not needing an ADB.

Step 1) Allocating ADBs: Consider allocating ADBs to the nodes in the clock subtree rooted at node $n_i$. At this point, all descendant nodes of $n_i$ other than its child nodes $n_{k_1}, \ldots, n_{k_r}$ have already been processed for ADB allocation in previous iterations. Thus, all the subtrees rooted at the processed descendant nodes have been replaced with their fully simplified timing trees. Now, CLK-ADB determines for each child node $n_{k_j}$, $j = 1, \ldots, r$, whether an ADB is needed. The ADB allocation to $n_{k_j}$ is determined according to the three cases above, by replacing the notations $lst_{max}$, $est_2$, $lst_2$, and $est_{1\setminus\{2\}}$ in the inequalities of Cases 1, 2, and 3 with $lst_i$, $est_{k_j}$, $lst_{k_j}$, and $est_{i\setminus\{k_1,\ldots,k_r\}}$, respectively.

Step 2) Assigning delay values: Let $A_{case1}$ denote the subset of the child nodes $n_{k_1}, \ldots, n_{k_r}$ that Step 1 determined to need ADBs. The delay value $\alpha_{k_j}$ of the ADB at each node $n_{k_j} \in A_{case1}$ is set according to the generalization of (2):

$$\alpha_{k_j} = est_{min} - est_{k_j} \tag{3}$$

where $est_{min} = est_{i\setminus A_{case1}}$. Note that the path of $est_{min}$ always exists. Otherwise, $A_{case1} = \{n_{k_1}, \ldots, n_{k_r}\}$ and $lst_{max} = lst_{k_j}$ for some child node $n_{k_j}$; the second inequality of Case 1 for that child then gives $slk_{k_j} = lst_{k_j} - est_{k_j} > B$, which contradicts $slk_{k_j} \le B$. Once the computed delay values are assigned, the corresponding subtrees are reduced to fully simplified timing trees. It should be noted that $\alpha_{k_j}$ in (3) is the smallest value that can be assigned as a delay value to the ADB in node $n_{k_j}$ to meet the clock skew constraint.
This directly implies that a minimally sized capacitor bank should be allocated in each ADB during ADB allocation. However, this does not always lead to a globally minimal allocation of ADB capacitors, even though CLK-ADB allocates a minimal number of ADBs. In Section IV-C, we propose an ADB cost refinement technique to further reduce the total cost of ADB capacitors.

Fig. 4 shows an example of ADB allocation by CLK-ADB for a clock signal timing tree rooted at node $n_i$. The tree has two buffered child nodes $n_{k_1}$ and $n_{k_2}$. In Step 1, all $est_{(\cdot)}$ and $lst_{(\cdot)}$ values are computed. Then, the three cases are checked for each of $n_{k_1}$ and $n_{k_2}$. It turns out that $n_{k_1}$ belongs to Case 1, requiring an ADB to satisfy the clock skew constraint, while $n_{k_2}$ belongs to Case 2, not needing an ADB. In Step 2, the delay value of the allocated ADB is computed by (3) with $est_{min} = est_{i\setminus\{k_1\}}$ since $A_{case1} = \{n_{k_1}\}$, and the clock timing tree rooted at $n_i$ is fully simplified.

A complete example: Fig. 5(a)–(e) shows step-by-step results of ADB allocation and delay assignment by CLK-ADB for the clock subtrees rooted at the node set L = {Buf0, Buf1, Buf2, Buf5} in Fig. 5(a), with clock skew bound B = 10. We assume that the topologically sorted list of L is (Buf5, Buf1, Buf2, Buf0). The clock subtree rooted at Buf5 has the form in Fig. 3(a): for Mode-1, $lst_{max} = 15$, $est_{5\setminus\{7\}} = 6$, and $est_7 = 8$, which satisfy Case 2, while for Mode-2, $lst_{max} = 14$, $est_{5\setminus\{7\}} = 5$, and $est_7 = 2$, which satisfy Case 1, so an ADB is allocated at Buf7 with a delay value of 3 (= $est_{min} - est_7$ = 5 − 2) on Mode-2. The clock tree with the resulting fully simplified timing subtree rooted at Buf5 is shown in Fig. 5(b). Note that the timing numbers 6 and 5 (or 15 and 14) at G (or H) indicate the earliest (or latest) clock arrival times to the FFs reached through Buf5 for Mode-1 and Mode-2, respectively. Next, we look into the clock subtree rooted at Buf1 in Fig. 5(b). Since the subtree has no clock path other than those through Buf3 or Buf4, $est_{1\setminus\{3,4\}} = \infty$ by definition, from which we can easily check that both Buf3 and Buf4 belong to Case 2 for Mode-1 and Mode-2. The resulting simplified timing tree is shown in Fig. 5(c). Similarly, the clock subtree rooted at Buf2 is processed, and the resulting simplified clock tree is shown in Fig. 5(d). Finally, processing the clock subtree rooted at Buf0 shows that only Buf1 needs an ADB, with a delay increment of 2 on Mode-1 and 3 on Mode-2. The whole clock tree with the ADB allocation and delay assignment is shown in Fig. 5(e), where the blue nodes represent ADBs. Note that an ADB is never allocated to the root node Buf0.

Fig. 6 depicts the flow of CLK-ADB. While checking the three cases in Step 1, if any tested clock subtree satisfies Case 3 on some power mode, we report that the input clock tree can never meet the clock skew constraint. Otherwise, whenever a clock subtree satisfies Case 1 for a child node of its root on a power mode, an ADB is allocated at that child node with the corresponding delay assignment.

Let N and K be the number of buffered nodes in the input clock tree and the number of power modes, respectively. Each iteration of CLK-ADB spends constant time per power mode on each child node it examines, and every buffered node is examined as a child at most once, so the time complexity of CLK-ADB is bounded by $O(NK)$. Since K is small, mostly not exceeding 6, CLK-ADB runs in linear time. The optimality of the ADB allocation by CLK-ADB is formally given by Lemma 1 and Theorem 1.

Lemma 1: The clock tree produced by CLK-ADB always contains a clock signal path with the earliest arrival time to an FF on which no ADBs are allocated.
Proof: We use induction on the height h of clock subtrees, measured in the number of buffered nodes. For h = 1, i.e., clock subtrees rooted at nodes with no child node, the claim trivially holds since there is no buffered node other than the root, and thus no ADB allocation. For h = l (> 1), consider a clock subtree rooted at a node $n_i$ of height l. We claim that the clock signal path corresponding to $est_{min}$ is a path with 1) the earliest arrival time to an FF and 2) no ADBs. Property 1 follows from the delay assignment in Step 2 of CLK-ADB: for every child node $n_{k_j} \in A_{case1}$, the value of $est_{k_j}$ increased by $\alpha_{k_j}$ in (3) never exceeds $est_{min}$. Property 2 holds by the induction hypothesis on the subtrees of height l − 1 or below.

Theorem 1: CLK-ADB uses a minimal number of ADBs to meet the clock skew constraint for all power modes.

Proof: We use induction on the height h of clock subtrees, measured in the number of buffered nodes. For h = 1, i.e., clock subtrees rooted at nodes with no child node, the claim is trivially true since the corresponding subtrees contain no ADB. For h = l (> 1), consider a clock subtree rooted at a node $n_i$ of height l. Let $n_{k_1}, \ldots, n_{k_r}$ be the child nodes of $n_i$, and let $N_{opt}(T_x)$ and $N_{ADB}(T_x)$ denote, respectively, the optimal number of ADBs and the number of ADBs used by CLK-ADB for the clock tree $T_x$ rooted at $n_x$. By the induction hypothesis, $N_{ADB}(T_{k_j}) = N_{opt}(T_{k_j})$ for all $j = 1, \ldots, r$. Moreover, a clock skew violation, if any, caused by two clock arrival times in different subtrees rooted at child nodes requires one or more new ADBs. Hence,

$$N_{opt}(T_i) \ge N_{opt}(T_{k_1}) + \cdots + N_{opt}(T_{k_r}). \tag{4}$$

More precisely, for each $n_{k_j} \in A_{case1}$, at least one more ADB must be allocated in the subtree rooted at $n_{k_j}$ to resolve the clock skew violation, since by Lemma 1 the clock path of $est_{k_j}$, whose arrival time must be increased, contains no ADBs at all. Thus,

$$N_{opt}(T_i) \ge N_{opt}(T_{k_1}) + \cdots + N_{opt}(T_{k_r}) + |A_{case1}|. \tag{5}$$

On the other hand, for each $n_{k_j} \in A_{case1}$, CLK-ADB allocates only one ADB, at the root of that subtree, so

$$N_{ADB}(T_i) = N_{opt}(T_{k_1}) + \cdots + N_{opt}(T_{k_r}) + |A_{case1}|. \tag{6}$$

Since the allocation produced by CLK-ADB satisfies the skew constraint, $N_{opt}(T_i) \le N_{ADB}(T_i)$; combined with (5) and (6), $N_{ADB}(T_i) = N_{opt}(T_i)$.

Fig. 5. Example showing step-by-step results by CLK-ADB for processing clock subtrees. (a) Clock tree T before the bottom-up ADB allocation of CLK-ADB with clock skew bound B = 10. The topologically sorted node list is (Buf5, Buf1, Buf2, Buf0), and two power modes Mode-1 and Mode-2 are considered. (b) Processing clock subtree rooted at Buf1. (Case 1 holds on Mode-2 for Buf7. Thus, an ADB is allocated at Buf7 with a delay value of α2 = +3 on Mode-2.) (c) Processing clock subtree rooted at Buf2. (No ADBs are needed at Buf3 and Buf4.) (d) Processing clock subtree rooted at Buf0. (An ADB is needed at Buf1.) (e) Complete clock tree T after the ADB allocation of CLK-ADB where the allocated ADBs and delay assignment are shown with blue color.

Fig. 6. Flow of CLK-ADB for solving the ADB-based clock skew optimization problem. The green boxes represent the termination of the flow. The outer loop indicates the iteration of CLK-ADB, and the inner loop indicates the testing of each child node of the root of the subtree corresponding to the current iteration for ADB allocation with delay assignment.
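The two steps of CLK-ADB can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` class, the `clk_adb` function, and the input format (per-mode arrival times of the FFs driven directly by each buffered node, already including the original buffer delays) are hypothetical. A post-order recursion stands in for the topologically sorted node list, a node is counted as an ADB if it needs a positive delay in at least one power mode, and the sketch raises an error when the assumption $B \ge slk$ is violated or Case 3 is detected.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A buffered clock-tree node (hypothetical representation)."""
    name: str
    # ffs[m] = arrival times, in power mode m, of the FFs driven directly by this node
    ffs: list
    children: list = field(default_factory=list)


def clk_adb(root, num_modes, bound):
    """Bottom-up ADB allocation following Cases 1-3 and Eq. (3).

    Returns (alphas, summary): alphas maps a node name to its per-mode delay
    increments (only nodes needing an ADB in some mode appear); summary[m] is
    the (est, lst) pair of the fully simplified tree at the root in mode m.
    """
    alphas = {}

    def simplify(node):
        # Post-order recursion handles children before parents (bottom-up order).
        child_sums = [simplify(c) for c in node.children]
        summary = []
        for m in range(num_modes):
            direct = node.ffs[m]
            est_rest = min(direct, default=math.inf)  # est_{i\{k1..kr}}
            lst_rest = max(direct, default=0)         # lst_{i\{k1..kr}}
            if not node.children:  # leaf buffered node: nothing to allocate
                summary.append((est_rest, lst_rest))
                continue
            ests = [s[m][0] for s in child_sums]
            lsts = [s[m][1] for s in child_sums]
            if any(l - e > bound for e, l in zip(ests, lsts)):
                raise ValueError(f"{node.name}: a child violates B >= slk")
            lst_i = max([lst_rest] + lsts)  # plays the role of lst_max
            if lst_i - est_rest > bound:    # Case 3: bound unreachable
                raise ValueError(f"{node.name}: Case 3 in mode {m}")
            case1 = {j for j, e in enumerate(ests) if lst_i - e > bound}
            # est_min = est_{i \ A_case1}: direct FFs plus Case-2 children
            est_min = min([est_rest] + [e for j, e in enumerate(ests) if j not in case1])
            new_lst = lst_rest
            for j, child in enumerate(node.children):
                a = est_min - ests[j] if j in case1 else 0  # Eq. (3)
                if a > 0:
                    alphas.setdefault(child.name, [0] * num_modes)[m] = a
                new_lst = max(new_lst, lsts[j] + a)
            summary.append((est_min, new_lst))
        return summary

    return alphas, simplify(root)


# Fig. 3(a) summary values: FFs outside n2 span 5..14, FFs under n2 span 2..10.
n2 = Node("n2", ffs=[[2, 10]])
n1 = Node("n1", ffs=[[5, 14]], children=[n2])
print(clk_adb(n1, num_modes=1, bound=10))  # ({'n2': [3]}, [(5, 14)])

# Buf5 subtree of Fig. 5; the lst values under Buf7 (12 and 9) are hypothetical.
buf7 = Node("Buf7", ffs=[[8, 12], [2, 9]])
buf5 = Node("Buf5", ffs=[[6, 15], [5, 14]], children=[buf7])
print(clk_adb(buf5, num_modes=2, bound=10))  # ({'Buf7': [0, 3]}, [(6, 15), (5, 14)])
```

The first call reproduces $\alpha_2 = 3$ from the Fig. 3(a) discussion, and the second mirrors the Buf5 step of Fig. 5: Case 2 on Mode-1, Case 1 on Mode-2 with a delay of 3, and simplified (earliest, latest) times of (6, 15) and (5, 14). Each FF arrival time is touched a constant number of times per mode, consistent with the linear-time claim.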

C. Consideration of Discrete Delay Values

The proposed ADB allocation algorithm and the existing algorithms [14]–[16] assume continuous ADB delay values. However, the unit size of the capacitor bank in Fig. 1(a) determines the delay precision of an ADB. CLK-ADB employs a simple roundoff scheme to handle the discrete delay values of ADBs. Because the roundoff could cause a clock skew violation, we control potential violations by intentionally tightening the clock skew bound B. That is, we repeat the following two-step procedure (called ADB-RD) until no clock skew violation remains:
Step 1) Apply CLK-ADB with clock skew bound B and round off the delay values of the ADBs.
Step 2) If there is a clock skew violation, reset $B = B - \Delta B$ and go to Step 1.
The value of $\Delta B$ is set such that $R \cdot \Delta B = B$, where R is the maximum number of iterations (Steps 1 and 2) the designer allows. We refer to the combination of CLK-ADB and ADB-RD as CLK-ADB-RD.

IV. Coping With Physical Limitations of ADBs

Since normal buffers and ADBs have inherently different characteristics, replacing some buffers with ADBs to optimize clock tree timing raises a number of issues in addition to the ADB allocation problem of Section III: 1) maintaining a consistent output slew of ADBs; 2) supporting the nonzero initial (i.e., base) delay of ADBs; and 3) configuring the size of the capacitor banks in ADBs.

A. Maintaining a Consistent Output Slope

The delay of each ADB allocated in the clock tree varies according to the power mode of the system. The predefined delay value of an ADB for a power mode is established by controlling the number of capacitors activated in the ADB.
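Under the linear delay model of Section II-A (uniformly sized capacitors, so each activated capacitor adds a fixed delay step), choosing how many capacitors to activate for a given delay increment is a roundoff, which is where the discrete-delay issue of Section III-C originates. The sketch below is illustrative only: the functions `capacitor_settings` and `tightened_bounds` and the per-capacitor step `unit_ps` are hypothetical, and the bound schedule assumes $\Delta B$ is chosen so that $R \cdot \Delta B = B$, as we read the ADB-RD description.

```python
def capacitor_settings(alphas_ps, unit_ps, bank_size):
    """Round each per-mode delay increment to a whole number of capacitors.

    Assumes the linear model: every activated capacitor adds unit_ps of delay.
    Returns (counts, realized delays, rounding errors), one entry per mode.
    """
    counts, realized, errors = [], [], []
    for alpha in alphas_ps:
        n = round(alpha / unit_ps)  # nearest step; Python rounds exact .5 ties to even
        if n > bank_size:
            raise ValueError(f"{alpha} ps needs {n} capacitors, bank has {bank_size}")
        counts.append(n)
        realized.append(n * unit_ps)
        errors.append(n * unit_ps - alpha)  # |error| <= unit_ps / 2
    return counts, realized, errors


def tightened_bounds(bound_ps, max_iters):
    """Bounds tried by an ADB-RD-style loop: B, B - dB, ..., with max_iters * dB = B."""
    step = bound_ps / max_iters
    return [bound_ps - r * step for r in range(max_iters)]


counts, realized, errors = capacitor_settings([0.0, 3.0, 8.0], unit_ps=2.5, bank_size=8)
print(counts, realized, errors)  # [0, 1, 3] [0.0, 2.5, 7.5] [0.0, -0.5, -0.5]
print(tightened_bounds(10, 5))   # [10.0, 8.0, 6.0, 4.0, 2.0]
```

Each rounding error is at most half a capacitor step, but errors on different ADBs can point in opposite directions; for example, two FFs behind different ADBs rounded in opposite directions can see their skew change by up to `unit_ps`, which is the kind of violation ADB-RD absorbs by tightening B.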
However, the ADB's output slope varies with the number of activated capacitors, and a very low output slew rate results when a long ADB delay is established. To cope with this low output slew rate, we upgrade the conventional capacitor-based ADB by adding a new circuit consisting of six transistors. The devised circuit is shown in Fig. 7(b): both the input and output signals of the original ADB in Fig. 7(a) are fed to the new circuit; the left four transistors of the added circuit generate an inverted signal only when the two input signals have switched to the same logic state, while the right two transistors form a normal inverter that produces the correct output. The new ADB works as follows. Suppose the two inputs (i.e., the original and delayed inputs) and the output are all at logic 0. When the original input switches from 0 to 1, the output stays at 0 because the left four transistors in the added circuit do not let the output go to 1. Then, once the input signal has passed through the capacitor bank for the predefined delay, both the original and delayed inputs are at 1; this drives the output of the left four transistors to 0, which in turn produces 1 at the final output. Attaching the new circuit elements to the ADB thus prevents the low output slew rate that a long delay causes in the conventional ADB implementation. Note that a technique similar to the one in Fig. 7(b) is commonly used in practice to maintain the output slew of design components other than ADBs.

Fig. 7. (a) Conventional capacitor-based ADB structure. (b) Proposed updated ADB structure, which maintains a consistent output slew rate that is immune to the ADB delay variation seen in the conventional structure.

B. Supporting Nonzero Base Delay

If an ADB allocated in the clock tree is not used in a power mode, that is, its delay increment α = 0, the ADB is ideally assumed to behave exactly like the buffer previously placed at the same location. However, when all the capacitors in the capacitor bank of the ADB are turned off, the delay is actually larger than the buffer delay, i.e., the delay increment of the ADB is not zero. This problem is especially serious for large ADBs that adjust delay over hundreds of picoseconds. Our experiments show that the nonzero base delay caused by parasitic effects is at most 5 ps for small ADBs that adjust delay up to 40 ps with a 25.6 fF load capacitance. However, ADBs that adjust delay up to 200 ps with the same load capacitance incur a base delay of about 40 ps even when all capacitors in the capacitor bank are turned off. Consequently, the substantially large base delay of ADBs can distort the clock skew calculation performed in the previous section. Rather than attempting to design an ADB that completely eliminates its parasitic effect, we propose a path-based greedy algorithm that systematically refines the ADB allocation result. Our ADB refinement procedure consists of the following four steps.

1) Compute the actual clock arrival times at the FFs in the clock tree. If at least one power mode violates the skew constraint, i.e., $slk_{root} > B$ due to the nonzero base delay of ADBs, go to step 2.
2) Let $lst_{slk}$ and $est_{slk}$ be the latest and earliest arrival times of the power mode for which $slk_{root}$ is the largest. If $slk_{root} \le B$, stop. Otherwise, go to step 3 to increase the delay on the clock path of $est_{slk}$.
3) Examine the ADBs on the path of $est_{slk}$ from the top to the bottom of the clock tree. For each ADB, we try to increase its delay value for the power mode without resizing the ADB.
   a) If no ADB from the current ADB down to the bottom ADB can increase its delay without resizing, go to step 4.
   b) Let D be the maximum delay increase the current ADB can provide without resizing. Increase the ADB's delay value (by up to D) until i) the clock skew violation of $slk_{root}$ is resolved, or ii) $lst_{slk}$ is increased (see footnote 3). For i), go to step 1. For ii), repeat step 3a) for the next ADB.
4) Examine the ADBs and normal buffers (treating each normal buffer as an ADB with zero delay) on the path of $est_{slk}$ from the top to the bottom nodes. For each ADB, we try to increase its delay value for the power mode, allowing the ADB to be resized.
   a) Increase the ADB's delay value by resizing the ADB until i) the clock skew violation of $slk_{root}$ is resolved, or ii) $lst_{slk}$ is increased. For i), go to step 1. For ii), repeat step 4a) for the next ADB.
Note that the repeated application of step 4a) is guaranteed to terminate eventually by executing i).

Steps 1 and 2 compute the actual clock delays and identify the clock signal path whose delay should be increased to resolve the worst clock skew violation. Step 3 then attempts to increase the ADBs' delay values without resizing them, thus incurring no area overhead, while Step 4 increases the ADBs' delays by resizing them.
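A minimal Python sketch of steps 1–4 for a single power mode may help make the greedy loop concrete. It rests on simplifying assumptions of our own: the arrival time at each sink is its base arrival (already including any nonzero base delay) plus the sum of the delay increments of the elements on its path; `headroom` plays the role of D in step 3b); and "resizing" in step 4 is modeled simply by lifting that cap. Unlike step 3b), it returns to step 1 after every increase instead of continuing down the path, and the `guard` term stops an increase before the latest arrival would start to grow (condition ii). The function and parameter names are hypothetical.

```python
import math
from typing import Dict, List


def refine(paths: Dict[str, List[str]], base: Dict[str, float],
           delay: Dict[str, float], headroom: Dict[str, float],
           B: float, max_iters: int = 1000) -> Dict[str, float]:
    """Greedy refinement for ONE power mode (simplified, hypothetical sketch).

    paths:    sink -> buffers/ADBs on its clock path, ordered top to bottom
    base:     sink -> arrival time without ADB delay increments (ps)
    delay:    element -> current delay increment (ps); updated in place
    headroom: element -> extra delay available without resizing (ps);
              normal buffers are simply absent (headroom 0)
    """
    for _ in range(max_iters):
        # Step 1: actual arrival times at the sinks.
        arr = {s: base[s] + sum(delay.get(e, 0.0) for e in p)
               for s, p in paths.items()}
        latest = max(arr.values())
        earliest = min(arr, key=arr.get)
        slk_root = latest - arr[earliest]
        # Step 2: stop once the skew bound holds.
        if slk_root <= B:
            return delay
        # Step 3 (no resizing) first, then step 4 (resizing allowed).
        for resize in (False, True):
            moved = False
            for e in paths[earliest]:  # top to bottom
                # Stop before any sink driven through e overtakes the latest one.
                guard = min(latest - arr[s] for s, p in paths.items() if e in p)
                cap = math.inf if resize else headroom.get(e, 0.0)
                step = min(cap, slk_root - B, guard)
                if step > 0:
                    delay[e] = delay.get(e, 0.0) + step
                    # Hypothetical bookkeeping: a resized element is sized
                    # exactly for its new delay, so no headroom remains.
                    headroom[e] = 0.0 if resize else headroom.get(e, 0.0) - step
                    moved = True
                    break
            if moved:
                break
        else:
            raise RuntimeError("no element on the earliest path can be adjusted")
    raise RuntimeError("iteration limit reached")


paths = {"s1": ["b0", "a1"], "s2": ["b0", "a2"], "s3": ["b0", "b3"]}
base = {"s1": 100.0, "s2": 130.0, "s3": 135.0}
print(refine(paths, base, {"a1": 0.0, "a2": 0.0},
             {"a1": 10.0, "a2": 5.0}, B=10.0))  # {'a1': 25.0, 'a2': 0.0}
```

In this toy input, ADB `a1` first spends its 10 ps of headroom (step 3), and the remaining 15 ps requires resizing (step 4). The shared top buffer `b0` is never touched because it also drives the latest sink `s3`, so increasing it would raise $lst_{slk}$ as well.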
Note that finding a polynomial-time algorithm for, or proving the NP-completeness of, the nonzero base delay problem is a separate, nontrivial task. It does not appear to be a simple extension of the zero base delay case: intuitively, under the zero base delay assumption, allocating ADBs to a clock tree to fix the clock skew violation in one power mode does not affect the ADB allocation in another power mode, whereas with a nonzero base delay the ADB allocation problem must be considered for all power modes together. Besides an algorithmic solution, the nonzero base delay problem may also be addressed with analog design techniques, for example, by properly upscaling the front and back inverters in the ADB.

Footnote 3: As the delay value of an ADB is increased in Step 3b) or Step 4a), the clock path corresponding to $lst_{slk}$ may change as a result of the delay increase. In this case, the increase of the delay value stops at the moment $lst_{slk}$ starts to increase.

C. Discussion on Further Reducing ADB Cost

Even though CLK-ADB uses a minimal number of ADBs, the total implementation cost of the ADBs may not be minimal, because the area of an ADB is determined in proportion to the largest delay value that the ADB must establish. Furthermore, the proposed new ADB structure increases the implementation cost. We apply the following two simple methods, called Method A and Method B, to explore the possibility of further reducing the total cost of the ADB implementations. We first apply Method A to reduce the total cost of the capacitor banks in the ADBs, and then apply Method B to reduce the total cost of the transistors in the ADBs by using our newly designed ADBs selectively.

1) Method A: Maximally Trading Large Capacitors With Small Ones:
A.1) Pick the ADB $A_i$ with the largest size. Then, find the ADBs that are in a timing dependence relation with $A_i$, and sort them from the smallest size to the largest.
A.2) Let C be the size of the unit capacitor bank. Check whether reducing the capacitor size of $A_i$ by C causes a clock skew violation. If it does, go to step A.3. Otherwise, reduce the capacitor size of $A_i$ by C and repeat step A.2.
A.3) Check whether reducing the capacitor size of $A_i$ by 2C and increasing the capacitor size of an ADB from the sorted list by C causes a clock skew violation. If it does, exclude $A_i$ from further size reduction and go to step A.1. Otherwise, reduce the capacitor size of $A_i$ by 2C, increase the capacitor size of the selected ADB by C, and repeat step A.3.

2) Method B: Minimally Replacing ADBs With Newly Designed ADBs:
B.1) Collect all the ADBs allocated by CLK-ADB-RD that violate the slew constraint. If none are collected, stop. Otherwise, pick the ADB $A_i$ with the lowest slew rate.
B.2) Replace $A_i$ with the newly designed ADB of Fig. 7(b). This may also make some of the other collected ADBs satisfy the slew constraint. Go to step B.1.

V. Experimental Results

A. Experimental Setup

We implemented our proposed algorithm CLK-ADB and the algorithm in [14] in C++ on a system with a GHz Intel Xeon CPU and 8 GB of memory. All input clock trees were generated using Synopsys IC Compiler with the 45-nm Nangate Open Cell Library. As clock tree synthesis parameters, the maximum load capacitance was set to 51.2 fF and buffer sizing was disabled. The buffers in the resulting clock trees had 18.8 fan-outs on average. The wire segments between buffering elements had a lumped resistance of 70 Ω and a lumped capacitance of 8.4 fF on average. We tested eight benchmark circuits: three ISCAS 95 benchmarks, three ITC 99 benchmarks, and two ISPD 09 benchmarks. Each benchmark was partitioned into six to ten power subdomains, each of which can operate in two

10 LIM et al.: OPTIMAL ALLOCATION ALGORITHM OF ADBs AND PRACTICAL EXTENSIONS 401 TABLE I Comparison of Results Produced by the Previous ADB Allocation [14] and Our CLK-ADB Using a Simple Roundoff Scheme and Our ADB-RD Skew Using Simple Roundoff Scheme Combining Our ADB-RD Original Bound [14] CLK-ADB [14] + ADB-RD CLK-ADB-RD Circuit #FFs #Bufs Skew Latency B #ADBs Skew #ADBs Skew #ADBs/Area Skew Ctrl WL #ADBs/Area Skew Ctrl WL (ps) (ps) (ps) (ps) (ps) (ps) (μm) (ps) (μm) / / s / / / / / / s / / / / / / s / / / / / / b / / / / / / b / / / / / / b / / / / / / f / / / / / / f / / / / Area indicates the total size of ADBs in μm 2. On average, 9%, 11%, and 10% reductions of the total number of ADBs, the total ADB area, and the control wire length are achieved, respectively, by CLK-ADB. Fig. 8. Numbers of ADBs used by CLK-ADB with ADB-RD as the clock skew constraint is relaxed. different voltage levels 0.95 V and 1.1 V. We assumed to use four different power modes and found the worst clock skew over the whole power modes. We have also assumed that each ADB can be adjusted with a granularity of 10 ps. The experiment associated with the real implementation of ADB has been done by using HSpice of version B. Assessing the Performance of CLK-ADB and ADB-RD Table I summarizes the results produced by the previous ADB allocation algorithm in [14] and our CLK-ADB in Section III-B by using a simple roundoff scheme applied to each ADB independently or integrating our roundoff scheme ADB-RD in Section III-C to support the discrete delay value of ADB. The first column shows the tested benchmark circuits: ISCAS 95 benchmarks s35932, s38417, and s38584, ITC 99 benchmarks b17, b18, and b22, and ISPD 09 benchmarks f31 and f34. The second, third, fourth, fifth, and sixth columns show the numbers of FFs, buffers in each tested clock tree, the worst clock skew, the maximum latency extracted from IC Compiler, and the skew bound B, respectively. 
The seventh and eighth columns show the results produced by the ADB allocation algorithm in [14] and by CLK-ADB, each followed by a simple roundoff of every ADB. Notice that the number of ADBs used by CLK-ADB is minimal. However, both algorithms fail to meet the clock skew bound when the simple roundoff scheme is applied. On the other hand, the last six columns show the results produced by the ADB allocation algorithm in [14] and by CLK-ADB when combined with our ADB-RD to meet the clock skew constraint. The comparison shows that CLK-ADB with ADB-RD consistently uses fewer ADBs and a smaller total control wirelength than [14] with ADB-RD, while satisfying the clock skew constraint. The control wirelength is estimated by finding a minimum spanning tree on a graph whose nodes are the ADBs and whose edge weights are the half-perimeter wirelengths between ADBs. Overall, the ADB reduction is about 10% on average for clock skew bounds of 30–50 ps, even with shorter clock latencies. For example, for s35932, [14] requires 60 ADBs to meet a skew bound of 30 ps under all power modes, whereas CLK-ADB uses 55 ADBs. Note that we could not compare the performance of the ADB allocation algorithm in [16] with ours, because it assumes a different ADB structure and does not clearly describe the discrete characteristics of ADBs; it does not even specify a scheme or algorithm to support discrete ADB delay values. Fig. 8 shows the numbers of ADBs used by CLK-ADB combined with ADB-RD as the clock skew constraint changes for s35932, s38417, and s. We can see that there is a good tradeoff between the skew bound and the number of ADBs, which implies that CLK-ADB-RD can be used to find

11 402 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 3, MARCH 2013 TABLE II Transition Delay Table of ADB With Five-Stage Capacitor Bank Number of Activated Unit Capacitors Load Voltage Input Cap. Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall Transition (ff) Delay Delay Delay Delay Delay Delay Delay Delay Delay Delay Delay Delay (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) ps V ps ps V ps Fig. 9. ADB allocation and placement result by CLK-ADB-RD for design B17 at B = 50 ps in Table I. Large blue and small red dots indicate ADBs and buffers, respectively. CLK-ADB-RD allocates 13 ADBs, whereas the algorithm in [14] allocates 15 ADBs. alternative solutions in terms of skew bound and the number of ADBs. Fig. 9 shows the allocation and placement by CLK- ADB-RD for B17 under B = 50 ps. The dots represent the buffer locations, among which the blue dots represent the ADBs allocated by CLK-ADB-RD where a total of 13 ADBs are allocated whereas 15 ADBs are allocated in [14]. C. Validation of the Practical Characteristics of ADBs 1) Nonuniform Discrete Delay Variation: To extract the real timing characteristics of ADBs, we have implemented and experimented ADBs using HSpice We initially implemented a conventional ADB that consists of two inverters: a capacitor bank and its controller. Table II summarizes the timing results of an ADB composed of a five-stage capacitor bank. We implemented each switch and capacitor in the capacitor bank with minimum size nmos transistors. The first column indicates the voltage applied to the ADB, and the second column indicates the input transition time to the ADB. The third column shows the four different load capacitances that we used. The pairs of the remaining columns show the delays for the rising edge and the falling edge when we turn on none (0), one (1), two (2), three (3), four (4), and all five (5) switches in the capacitor bank. 
The results indicate that the discrete delay step of an ADB is not constant; it varies with the input transition time, the load capacitance, and the operating voltage. As an extreme example, the delay of the ADB changes from ps to ps with a small granularity of 1.92 ps when the input transition time is 37.5 ps and the load capacitance is 0.4 fF under a 1.1 V operating voltage, while it changes from 1216 ps to 1242 ps with a large granularity of 5.2 ps when the input transition time is 150 ps and the load capacitance is fF under a 0.95 V operating voltage. This nonuniform delay granularity violates the assumption used in [1], [14], and [15] that a uniform granularity of 10 ps applies. Note that our CLK-ADB-RD places no restriction on the delay granularity. We implemented various ADBs with 5–50 internal unit capacitors and switches, and created an ADB delay table through HSpice simulation. We then estimated the delays of all possible configurations by interpolating the table, and used these delays to check the effectiveness of our practical extensions of CLK-ADB-RD.

2) Nonuniform Output Slope Variation: Table III shows how the output slope changes when we select ADBs of different sizes and activate different numbers of capacitors. In this experiment, we set the input transition time to 150 ps, the load capacitance to 25.6 fF, and the operating voltage to 0.95 V. We then changed the number of activated capacitors in the circuit and observed the changes in its output slew, where the slews are measured at. We implemented the internal capacitor bank with minimum-sized nMOS transistors, and controlled the number of capacitors and switches to change the size of the ADB and its adjustable delay range. The first column in Table III indicates the number of unit capacitors in the internal capacitor bank of the ADB. The next five pairs of columns show the variation of output slews for the rising

12 LIM et al.: OPTIMAL ALLOCATION ALGORITHM OF ADBs AND PRACTICAL EXTENSIONS 403 TABLE III Output Slew Variation of Traditional ADB Structure Rates of Activated Capacitors 0% 20% 40% 80% 100% Variation (%) Size Slew Slew Slew Slew Slew Slew Slew Slew Slew Slew of (Rising) (Falling) (Rising) (Falling) (Rising) (Falling) (Rising) (Falling) (Rising) (Falling) Rising Falling ADB (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) Supply voltage of 0.95 V, input transition of 150 ps, and load capacitance of 6.4 ff are applied to the ADB in Fig. 7(a). As the slew varies depending on load capacitance, ADB inverter sizing, and ADB capacitor bank sizing, the designer should control such parameters to balance the rising and falling slews. edge and the falling edge when the numbers of activated unit capacitors are 0%, 20%, 40%, 80%, and 100%. The last two columns show the relative difference between the cases when all switches are turned off and when all switches are turned on for the rising and falling edges. The results reveal that the output slew varies depending on the number of activated capacitors in the capacitor bank. For the ADB with 50 unit capacitors, the output slew is ps for the rising edge and 185 ps for the falling edge when all switches connected to capacitors are turned off. Moreover, the output slew increases as the number of activated capacitors increases. For example, the output slew of ADB with 50 unit capacitors rises up to ps for the rising edge and ps for the falling edge. As shown in the table, for ADBs with less than or equal to ten unit capacitors, the variation of output slew is within 10% for both the rising and falling edges, which we accept the variation as tolerable variation in our experiment. We also call the ADBs with tolerable variation small-sized ADBs, which will be used in the ADB cost reduction procedure (Method B in Section IV-C). Table V-C2 shows the result of output slew variation obtained by simulating the newly designed ADBs in Fig. 
7(b). It is verified that all the output slews are small and uniform. 3) Nonzero Base Delay: Another characteristic of the ADB implementation is the nonzero base delay increment due to the parasitic effect. This invalidates the assumption that the ADB delay can be set to 0 when it is not used for a power mode. We have implemented the new ADB structure in Fig. 7(b) and compared its base delay with a normal buffer. Fig. 10 shows the changes of the base delay of an ADB as the range of capacitor bank in the ADB changes when the load capacitance of 25.6 ff is assumed. The base delay proportionally increases with the size of ADB, which determines the parasitic resistance and capacitance. This nonzero base delay may cause a clock skew violation. For example, an ADB implementation whose delay should increase up to ps under 25.6 ff load capacitance will inherently entail a 60.2 ps base delay. Then, consider the case when the ADB delay of ps adjusted under power mode 1 needs to be readjusted into 0 ps under power mode 2, because the ADB will not be used in power mode 2. One solution might be to install the ADB and one buffer in Fig. 10. Change in base delay of ADB as the range of capacitor bank in ADB changes under load capacitance = 25.6 ff. parallel with transmission gate to select one of them in a power mode. Instead, in this paper, we have proposed a greedy ADB refinement algorithm in Section IV-B. D. Assessing the Performance of Practical Extensions of CLK- ADB-RD 1) CLK-ADB-RD Supporting Base Delay: Table V summarizes the results produced by the CLK-ADB-RD without and with the use of our refinement algorithm (specified as refalg in the table) in Section IV-B to support the practical issue of nonzero base delay. 
The comparison shows that, without considering the nonzero base delay, the resultant clock skew greatly exceeds the clock skew bound B for all test cases, whereas our ADB refinement algorithm meets the clock skew constraint for all test cases under the nonzero base delay, at an area penalty of 18% on average.

2) CLK-ADB-RD Supporting Slew Variation and Refining ADB Cost: Fig. 11(a) and (b) show the distribution of ADB sizes (in terms of the number of unit capacitors) produced by the ADB cost refinement technique Method A of Section IV-C for test case s35932 under 30 ps and 40 ps clock skew constraints, respectively. The distributions show that the majority of the ADBs (82%–86%) have no more than ten unit capacitors. Fig. 12 shows the (normalized) ADB cost optimized by applying Method A followed by Method B to the ADB allocation results produced by CLK-ADB-RD, in which the

13 404 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 3, MARCH 2013 TABLE IV Output Slew Variation of a New ADB Structure Rates of Activated Capacitors Variation (%) 0% 20% 40% 80% 100% Size Slew Slew Slew Slew Slew Slew Slew Slew Slew Slew of (Rising) (Falling) (Rising) (Falling) (Rising) (Falling) (Rising) (Falling) (Rising) (Falling) Rising Falling ADB (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) (ps) Supply voltage of 0.95 V, input transition of 150 ps, and load capacitance of 6.4 ff are applied to the ADB in Fig. 7(b). As the slew delays vary depending on load capacitance, ADB inverter sizing, and ADB capacitor bank sizing, the designer should control such parameters to balance the rising and falling slews. TABLE V Comparison of Results Produced by CLK-ADB-RD Assuming Base Delay = 0 and CLK-ADB-RD Combining the Refinement Algorithm in Section IV-B to Support the Nonzero Base Delay Original Skew CLK-ADB-RD CLK-ADB-RD+ref-alg Increased Circuit #FF #buf Skew Bound B Skew Skew Area (ps) (ps) (ps) (ps) (%) S S S B B B Avg Fig. 11. ADB distribution in terms of the number of unit capacitors for s35932 (a) under 30 ps and (b) under 40 ps skew constraints, respectively. initial ADB cost is measured by the total area of ADBs allocated by CLK-ADB-RD using the new designed ADBs exclusively for the output slew conservation. However, the greedy redistribution of capacitors among ADBs by Method A enables Method B to minimally use the newly designed ADBs to meet the output slew constraint. The overall area reduction is 29.5% on average. Fig. 12. (Normalized) total area of ADBs after the application of cost reduction technique Method A followed by the selective ADB replacement technique Method B. VI. Conclusion This paper proposed a complete solution to the problem of clock skew minimization using ADBs under multiple power modes. 
We proposed a linear-time algorithm that simultaneously solved the problems of computing the minimum (optimal) number of ADBs to be used, the location at which each ADB is to be placed, and the delay value of each

14 LIM et al.: OPTIMAL ALLOCATION ALGORITHM OF ADBs AND PRACTICAL EXTENSIONS 405 ADB to be assigned to each power mode. To be practically feasible, we also proposed a new ADB design technique and systematic algorithmic solutions to address the problems of discrete delay values, slew rate variation, nonzero base delay of ADB, and a possible exploration of ADB resizing, which have not been completely addressed by the previous works. Through extensive experiments, it was confirmed that our proposed ADB allocation flow was able to provide a practically useful solution to the clock skew optimization problem for the designs with multiple power modes that support diverse platforms or applications [19], [20]. References [1] K.-H. Lim and T. Kim, An optimal algorithm for allocation, placement, and delay assignment of adjustable delay buffers for clock skew minimization in multi-voltage mode designs, in Proc. IEEE Asia South- Pacific Design Autom. Conf., Jan. 2011, pp [2] C. J. Alpert, A. Devgan, and S. T. Quay, Buffer insertion with accurate gate and interconnect delay computation, in Proc. ACM/IEEE Design Autom. Conf., Jun. 1999, pp [3] J. Cong, C. Koh, and K. Leung, Simultaneous buffer and wire sizing for performance and power optimization, in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, Aug. 1996, pp [4] C. C. N. Chu and M. D. F. Wong, An efficient and optimal algorithm for simultaneous buffer and wire sizing, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 9, pp , Sep [5] I.-M. Liu, T.-L. Chou, A. Aziz, and M. D. F. Wong, Zero-skew clock tree construction by simultaneous routing, wire sizing and buffer insertion, in Proc. ACM Int. Symp. Phys. Design, 2000, pp [6] T. Okamoto and J. Cong, Buffered Steiner tree construction with wire sizing for interconnect layout optimization, in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 1996, pp [7] J.-L. Tsai, T.-H. Chen, and C.-P. 
Chen, Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing, IEEE Trans. Comput.- Aided Des. Integr. Circuits Syst., vol. 23, no. 4, pp , Apr [8] K. Wang, Y. Ran, H. Jiang, and M. Marek-Sadowska, General skew constrained clock network sizing based on sequential linear programming, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 24, no. 5, pp , May [9] S. Hu and J. Hu, Unified adaptivity optimization of clock and logic signals, in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2007, pp [10] V. Khandelwal and A. Srivastava, Variability-driven formulation for simultaneous gate sizing and post-silicon tunability allocation, in Proc. ACM Int. Symp. Phys. Des., 2007, pp [11] J.-L. Tsai and L. Zhang, Statistical timing analysis driven post-silicontunable clock-tree synthesis, in Proc. IEEE/ACM Int. Conf. Comput.- Aided Des., Nov. 2005, pp [12] E. Takahashi, Y. Kasai, M. Murakawa, and T. Higuchi, A post-silicon clock timing adjustment using genetic algorithms, in Proc. Symp. Very Large Scale Integr. Circuits, Jun. 2003, pp [13] S. Tam, S. Rusu, U. Nagarji Desai, R. Kim, J. Zhang, and I. Young, Clock generation and distribution for the first IA-64 microprocessor, IEEE J. Solid-State Circuits, vol. 35, no. 11, pp , Nov [14] Y.-S. Su, W.-K. Hon, C.-C. Yang, S.-C. Chang, and Y.-J. Chang, Value assignment of adjustable delay buffers for clock skew minimization in multi-voltage mode designs, in Proc. IEEE/ACM Int. Conf. Comput.- Aided Design, Nov. 2009, pp [15] Y.-S. Su, W.-K. Hon, C.-C. Yang, S.-C. Chang, and Y.-J. Chang, Clock skew minimization in multi-voltage mode designs using adjustable delay buffers, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 29, no. 12, pp , Dec [16] K.-Y. Lin, H.-T. Lin, and T.-Y. Ho, An efficient algorithm of adjustable delay buffer insertion for clock skew minimization in multiple dynamic supply voltage designs, in Proc. IEEE Asia-South Pacific Des. Autom. Conf., Jan. 2011, pp [17] A. 
Kapoor, N. Jayakumar, and S. P. Khatri, A novel clock distribution and dynamic de-skewing methodology, in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2004, pp [18] G. N. Roberts, Adjustable buffer driver, U.S. Patent , [19] E. H. Nam, K. S. Choi, J.-Y. Choi, H. J. Min, and S. L. Min, Hardware platforms for Flash memory/nvram software development, J. Comput. Sci. Eng., vol. 3, no. 3, pp , Sep [20] K. Deray and S. J. Simoff, Designing technology for visualisation of interactions on mobile devices, J. Comput. Sci. Eng., vol. 3, no. 4, pp , Dec Kyoung-Hwan Lim received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2004, and the M.S. and Ph.D. degrees in electrical and computer engineering from Seoul National University, Seoul, Korea, in 2007 and 2012, respectively. He is currently with the System LSI Division, Samsung Electronics Company, Ltd., Yongin, Korea. His current research interests include clock tree optimization, high-level synthesis, and timing closure. Deokjin Joo (S 11) received the B.S. degree and the M.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2009 and 2011, respectively. He is currently pursuing the Ph.D. degree with the School of Electrical Engineering and Computer Science, Seoul National University. His current research interests include clock tree synthesis for low-power and thermal-resilient design. Taewhan Kim (SM 08) received the B.S. degree in computer science and statistics and the M.S. degree in computer science from Seoul National University, Seoul, Korea, and the Ph.D. degree in computer science from the University of Illinois at Urbana- Champaign, Urbana, in He is currently a Professor with the School of Electrical Engineering and Computer Science, Seoul National University. 
After graduation, he was with Lattice Semiconductor Corporation and Synopsys, Inc., San Jose, CA, for six years, specializing in design automation tool development. He has published over 160 technical papers in international journals and conferences, including the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ACM TODAES, DAC, ICCAD, and ASPDAC. His current research interests include computer-aided design of integrated circuits ranging from architectural synthesis to physical designs, specifically focusing on power, thermal, noise, reliability, and 3-D integrated circuit design issues. Dr. Kim is the Editor-in-Chief of the International Journal of Computing Science and Engineering.

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan,

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 1 Lecture 04: Timing Analysis Static timing analysis STA for sequential circuits

More information

Minimizing Clock Latency Range in Robust Clock Tree Synthesis

Minimizing Clock Latency Range in Robust Clock Tree Synthesis Minimizing Clock Latency Range in Robust Clock Tree Synthesis Wen-Hao Liu Yih-Lang Li Hui-Chi Chen You have to enlarge your font. Many pages are hard to view. I think the position of Page topic is too

More information

Skew Management of NBTI Impacted Gated Clock Trees

Skew Management of NBTI Impacted Gated Clock Trees Skew Management of NBTI Impacted Gated Clock Trees Ashutosh Chakraborty ECE Department The University of Texas at Austin Austin, TX 78703, USA ashutosh@cerc.utexas.edu David Z. Pan ECE Department The University

More information

Reducing Delay Uncertainty in Deeply Scaled Integrated Circuits Using Interdependent Timing Constraints

Reducing Delay Uncertainty in Deeply Scaled Integrated Circuits Using Interdependent Timing Constraints Reducing Delay Uncertainty in Deeply Scaled Integrated Circuits Using Interdependent Timing Constraints Emre Salman and Eby G. Friedman Department of Electrical and Computer Engineering University of Rochester

More information

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis EE115C Winter 2017 Digital Electronic Circuits Lecture 19: Timing Analysis Outline Timing parameters Clock nonidealities (skew and jitter) Impact of Clk skew on timing Impact of Clk jitter on timing Flip-flop-

More information

ULTRALOW VOLTAGE (ULV) circuits, where the supply

ULTRALOW VOLTAGE (ULV) circuits, where the supply 1222 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 31, NO. 8, AUGUST 2012 Variation-Aware Clock Network Design Methodology for Ultralow Voltage (ULV) Circuits Xin

More information

University of Toronto. Final Exam

University of Toronto. Final Exam University of Toronto Final Exam Date - Apr 18, 011 Duration:.5 hrs ECE334 Digital Electronics Lecturer - D. Johns ANSWER QUESTIONS ON THESE SHEETS USING BACKS IF NECESSARY 1. Equation sheet is on last

More information

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp 2-7.1 Spiral 2 7 Capacitance, Delay and Sizing Mark Redekopp 2-7.2 Learning Outcomes I understand the sources of capacitance in CMOS circuits I understand how delay scales with resistance, capacitance

More information

STATIC TIMING ANALYSIS

STATIC TIMING ANALYSIS STATIC TIMING ANALYSIS Standard Cell Library NanGate 45 nm Open Cell Library Open-source standard cell library Over 62 different functions ranging from buffers, to scan-able FFs with set and reset, to

More information

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3 MOSIS REPORT Spring 2010 MOSIS Report 1 MOSIS Report 2 MOSIS Report 3 MOSIS Report 1 Design of 4-bit counter using J-K flip flop I. Objective The purpose of this project is to design one 4-bit counter

More information

Clock Skew Scheduling in the Presence of Heavily Gated Clock Networks

Clock Skew Scheduling in the Presence of Heavily Gated Clock Networks Clock Skew Scheduling in the Presence of Heavily Gated Clock Networks ABSTRACT Weicheng Liu, Emre Salman Department of Electrical and Computer Engineering Stony Brook University Stony Brook, NY 11794 [weicheng.liu,

More information

Lecture 21: Packaging, Power, & Clock

Lecture 21: Packaging, Power, & Clock Lecture 21: Packaging, Power, & Clock Outline Packaging Power Distribution Clock Distribution 2 Packages Package functions Electrical connection of signals and power from chip to board Little delay or

More information

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Mark McDermott Electrical and Computer Engineering The University of Texas at Austin 9/27/18 VLSI-1 Class Notes Why Clocking?

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences Elad Alon Homework #9 EECS141 PROBLEM 1: TIMING Consider the simple state machine shown

More information

DKDT: A Performance Aware Dual Dielectric Assignment for Tunneling Current Reduction

DKDT: A Performance Aware Dual Dielectric Assignment for Tunneling Current Reduction DKDT: A Performance Aware Dual Dielectric Assignment for Tunneling Current Reduction Saraju P. Mohanty Dept of Computer Science and Engineering University of North Texas smohanty@cs.unt.edu http://www.cs.unt.edu/~smohanty/

More information

Santa Claus Schedules Jobs on Unrelated Machines

Santa Claus Schedules Jobs on Unrelated Machines Santa Claus Schedules Jobs on Unrelated Machines Ola Svensson (osven@kth.se) Royal Institute of Technology - KTH Stockholm, Sweden March 22, 2011 arxiv:1011.1168v2 [cs.ds] 21 Mar 2011 Abstract One of the

More information

Steiner Trees in Chip Design. Jens Vygen. Hangzhou, March 2009

Steiner Trees in Chip Design. Jens Vygen. Hangzhou, March 2009 Steiner Trees in Chip Design Jens Vygen Hangzhou, March 2009 Introduction I A digital chip contains millions of gates. I Each gate produces a signal (0 or 1) once every cycle. I The output signal of a

More information

Design for Variability and Signoff Tips

Design for Variability and Signoff Tips Design for Variability and Signoff Tips Alexander Tetelbaum Abelite Design Automation, Walnut Creek, USA alex@abelite-da.com ABSTRACT The paper provides useful design tips and recommendations on how to

More information

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002 CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING 18-322 DIGITAL INTEGRATED CIRCUITS FALL 2002 Final Examination, Monday Dec. 16, 2002 NAME: SECTION: Time: 180 minutes Closed

More information

AS computer hardware technology advances, both

AS computer hardware technology advances, both 1 Best-Harmonically-Fit Periodic Task Assignment Algorithm on Multiple Periodic Resources Chunhui Guo, Student Member, IEEE, Xiayu Hua, Student Member, IEEE, Hao Wu, Student Member, IEEE, Douglas Lautner,

More information

iretilp : An efficient incremental algorithm for min-period retiming under general delay model

iretilp : An efficient incremental algorithm for min-period retiming under general delay model iretilp : An efficient incremental algorithm for min-period retiming under general delay model Debasish Das, Jia Wang and Hai Zhou EECS, Northwestern University, Evanston, IL 60201 Place and Route Group,

More information

Topics to be Covered. capacitance inductance transmission lines

Topics to be Covered. capacitance inductance transmission lines Topics to be Covered Circuit Elements Switching Characteristics Power Dissipation Conductor Sizes Charge Sharing Design Margins Yield resistance capacitance inductance transmission lines Resistance of

More information

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Binary Search Introduction Problem Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Strategy 1: Random Search Randomly select a page until the page containing

More information

Reducing power in using different technologies using FSM architecture

Reducing power in using different technologies using FSM architecture Reducing power in using different technologies using FSM architecture Himani Mitta l, Dinesh Chandra 2, Sampath Kumar 3,2,3 J.S.S.Academy of Technical Education,NOIDA,U.P,INDIA himanimit@yahoo.co.in, dinesshc@gmail.com,

More information

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015 University of Pennsylvania Department of Electrical and System Engineering Circuit-Level Modeling, Design, and Optimization for Digital Systems ESE370, Fall 2015 Final Tuesday, December 15 Problem weightings

More information

EE371 - Advanced VLSI Circuit Design

EE371 - Advanced VLSI Circuit Design EE371 - Advanced VLSI Circuit Design Midterm Examination May 7, 2002 Name: No. Points Score 1. 18 2. 22 3. 30 TOTAL / 70 In recognition of and in the spirit of the Stanford University Honor Code, I certify

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 18: March 27, 2018 Dynamic Logic, Charge Injection Lecture Outline! Sequential MOS Logic " D-Latch " Timing Constraints! Dynamic Logic " Domino

More information

Making Fast Buffer Insertion Even Faster via Approximation Techniques

Making Fast Buffer Insertion Even Faster via Approximation Techniques Making Fast Buffer Insertion Even Faster via Approximation Techniques Zhuo Li, C. N. Sze, Jiang Hu and Weiping Shi Department of Electrical Engineering Texas A&M University Charles J. Alpert IBM Austin

More information

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors. Testability Lecture 6: Logic Simulation Shaahin Hessabi Department of Computer Engineering Sharif University of Technology Adapted from the presentation prepared by book authors Slide 1 of 27 Outline What

More information

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University EE 466/586 VLSI Design Partha Pande School of EECS Washington State University pande@eecs.wsu.edu Lecture 9 Propagation delay Power and delay Tradeoffs Follow board notes Propagation Delay Switching Time

More information

Area-Time Optimal Adder with Relative Placement Generator

Area-Time Optimal Adder with Relative Placement Generator Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is

More information

The Linear-Feedback Shift Register

The Linear-Feedback Shift Register EECS 141 S02 Timing Project 2: A Random Number Generator R R R S 0 S 1 S 2 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 1 1 0 0 The Linear-Feedback Shift Register 1 Project Goal Design a 4-bit LFSR SPEED, SPEED,

More information

Perfect-Balance Planar Clock. Routing with Minimal Path Length UCSC-CRL March 26, University of California, Santa Cruz

Perfect-Balance Planar Clock. Routing with Minimal Path Length UCSC-CRL March 26, University of California, Santa Cruz Perfect-Balance Planar Clock Routing with Minimal Path Length Qing Zhu Wayne W.M. Dai UCSC-CRL-93-17 supercedes UCSC-CRL-92-12 March 26, 1993 Board of Studies in Computer Engineering University of California,

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 23: April 17, 2018 I/O Circuits, Inductive Noise, CLK Generation Lecture Outline! Packaging! Variation and Testing! I/O Circuits! Inductive

More information

COS597D: Information Theory in Computer Science October 19, Lecture 10

COS597D: Information Theory in Computer Science October 19, Lecture 10 COS597D: Information Theory in Computer Science October 9, 20 Lecture 0 Lecturer: Mark Braverman Scribe: Andrej Risteski Kolmogorov Complexity In the previous lectures, we became acquainted with the concept

More information

Timing Analysis with Clock Skew

Timing Analysis with Clock Skew , Mark Horowitz 1, & Dean Liu 1 David_Harris@hmc.edu, {horowitz, dliu}@vlsi.stanford.edu March, 1999 Harvey Mudd College Claremont, CA 1 (with Stanford University, Stanford, CA) Outline Introduction Timing

More information

Lecture 8: Combinational Circuits

Lecture 8: Combinational Circuits Introduction to CMOS VLSI Design Lecture 8: Combinational Circuits David Harris Harvey Mudd College Spring 004 Outline ubble Pushing Compound Gates Logical Effort Example Input Ordering symmetric Gates

More information

CMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017

CMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017 CMSC CMSC : Lecture Greedy Algorithms for Scheduling Tuesday, Sep 9, 0 Reading: Sects.. and. of KT. (Not covered in DPV.) Interval Scheduling: We continue our discussion of greedy algorithms with a number

More information

A Temperature-aware Synthesis Approach for Simultaneous Delay and Leakage Optimization

A Temperature-aware Synthesis Approach for Simultaneous Delay and Leakage Optimization A Temperature-aware Synthesis Approach for Simultaneous Delay and Leakage Optimization Nathaniel A. Conos and Miodrag Potkonjak Computer Science Department University of California, Los Angeles {conos,

More information

Issues on Timing and Clocking

Issues on Timing and Clocking ECE152B TC 1 Issues on Timing and Clocking X Combinational Logic Z... clock clock clock period ECE152B TC 2 Latch and Flip-Flop L CK CK 1 L1 1 L2 2 CK CK CK ECE152B TC 3 Clocking X Combinational Logic...

More information

Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model

Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model Yang Shang 1, Chun Zhang 1, Hao Yu 1, Chuan Seng Tan 1, Xin Zhao 2, Sung Kyu Lim 2 1 School of Electrical

More information

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space Jianhua Liu, Yi Zhu, Haikun Zhu, John Lillis 2, Chung-Kuan Cheng Department of Computer Science and Engineering University of

More information

Lecture 8: Logic Effort and Combinational Circuit Design

Lecture 8: Logic Effort and Combinational Circuit Design Lecture 8: Logic Effort and Combinational Circuit Design Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline q Logical Effort q Delay in a Logic Gate

More information

Skew Management of NBTI Impacted Gated Clock Trees

Skew Management of NBTI Impacted Gated Clock Trees International Symposium on Physical Design 2010 Skew Management of NBTI Impacted Gated Clock Trees Ashutosh Chakraborty and David Z. Pan ECE Department, University of Texas at Austin ashutosh@cerc.utexas.edu

More information

Interconnect s Role in Deep Submicron. Second class to first class

Interconnect s Role in Deep Submicron. Second class to first class Interconnect s Role in Deep Submicron Dennis Sylvester EE 219 November 3, 1998 Second class to first class Interconnect effects are no longer secondary # of wires # of devices More metal levels RC delay

More information

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació. Xarxes de distribució del senyal de rellotge. Clock skew, jitter, interferència electromagnètica, consum, soroll de conmutació. (transparències generades a partir de la presentació de Jan M. Rabaey, Anantha

More information

Efficient Circuit Analysis under Multiple Input Switching (MIS) Anupama R. Subramaniam

Efficient Circuit Analysis under Multiple Input Switching (MIS) Anupama R. Subramaniam Efficient Circuit Analysis under Multiple Input Switching (MIS) by Anupama R. Subramaniam A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Approved

More information

Gradient Clock Synchronization

Gradient Clock Synchronization Noname manuscript No. (will be inserted by the editor) Rui Fan Nancy Lynch Gradient Clock Synchronization the date of receipt and acceptance should be inserted later Abstract We introduce the distributed

More information

Chapter 5 CMOS Logic Gate Design

Chapter 5 CMOS Logic Gate Design Chapter 5 CMOS Logic Gate Design Section 5. -To achieve correct operation of integrated logic gates, we need to satisfy 1. Functional specification. Temporal (timing) constraint. (1) In CMOS, incorrect

More information

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1>

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1> Chapter 3 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 3 Chapter 3 :: Topics Introduction Latches and Flip-Flops Synchronous Logic Design Finite

More information

Itanium TM Processor Clock Design

Itanium TM Processor Clock Design Itanium TM Processor Design Utpal Desai 1, Simon Tam, Robert Kim, Ji Zhang, Stefan Rusu Intel Corporation, M/S SC12-502, 2200 Mission College Blvd, Santa Clara, CA 95052 ABSTRACT The Itanium processor

More information

CMPUT 675: Approximation Algorithms Fall 2014

CMPUT 675: Approximation Algorithms Fall 2014 CMPUT 675: Approximation Algorithms Fall 204 Lecture 25 (Nov 3 & 5): Group Steiner Tree Lecturer: Zachary Friggstad Scribe: Zachary Friggstad 25. Group Steiner Tree In this problem, we are given a graph

More information

Exam Spring Embedded Systems. Prof. L. Thiele

Exam Spring Embedded Systems. Prof. L. Thiele Exam Spring 20 Embedded Systems Prof. L. Thiele NOTE: The given solution is only a proposal. For correctness, completeness, or understandability no responsibility is taken. Sommer 20 Eingebettete Systeme

More information

Price of Stability in Survivable Network Design

Price of Stability in Survivable Network Design Noname manuscript No. (will be inserted by the editor) Price of Stability in Survivable Network Design Elliot Anshelevich Bugra Caskurlu Received: January 2010 / Accepted: Abstract We study the survivable

More information

Power Dissipation. Where Does Power Go in CMOS?

Power Dissipation. Where Does Power Go in CMOS? Power Dissipation [Adapted from Chapter 5 of Digital Integrated Circuits, 2003, J. Rabaey et al.] Where Does Power Go in CMOS? Dynamic Power Consumption Charging and Discharging Capacitors Short Circuit

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 4 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI Contents Delay estimation Simple RC model Penfield-Rubenstein Model Logical effort Delay

More information

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview 407 Computer Aided Design for Electronic Systems Simulation Instructor: Maria K. Michael Overview What is simulation? Design verification Modeling Levels Modeling circuits for simulation True-value simulation

More information

Luis Manuel Santana Gallego 71 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model 1

Luis Manuel Santana Gallego 71 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model 1 Luis Manuel Santana Gallego 71 Appendix 1 Clock Skew Model 1 Steven D. Kugelmass, Kenneth Steiglitz [KUG-88] 1. Introduction The accumulation of clock skew, the differences in arrival times of signal in

More information

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering CMOS logic gates João Canas Ferreira University of Porto Faculty of Engineering March 2016 Topics 1 General structure 2 General properties 3 Cell layout João Canas Ferreira (FEUP) CMOS logic gates March

More information

Logic BIST. Sungho Kang Yonsei University

Logic BIST. Sungho Kang Yonsei University Logic BIST Sungho Kang Yonsei University Outline Introduction Basics Issues Weighted Random Pattern Generation BIST Architectures Deterministic BIST Conclusion 2 Built In Self Test Test/ Normal Input Pattern

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 24: April 19, 2018 Crosstalk and Wiring, Transmission Lines Lecture Outline! Crosstalk! Repeaters in Wiring! Transmission Lines " Where transmission

More information

Problem Set 9 Solutions

Problem Set 9 Solutions CSE 26 Digital Computers: Organization and Logical Design - 27 Jon Turner Problem Set 9 Solutions. For each of the sequential circuits shown below, draw in the missing parts of the timing diagrams. You

More information

10/12/2016. An FSM with No Inputs Moves from State to State. ECE 120: Introduction to Computing. Eventually, the States Form a Loop

10/12/2016. An FSM with No Inputs Moves from State to State. ECE 120: Introduction to Computing. Eventually, the States Form a Loop University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering An FSM with No Inputs Moves from State to State What happens if an FSM has no inputs? ECE 120: Introduction to Computing

More information

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo Digital Integrated Circuits Designing Combinational Logic Circuits Fuyuzhuo Introduction Digital IC Dynamic Logic Introduction Digital IC EE141 2 Dynamic logic outline Dynamic logic principle Dynamic logic

More information

EE141-Fall 2011 Digital Integrated Circuits

EE141-Fall 2011 Digital Integrated Circuits EE4-Fall 20 Digital Integrated Circuits Lecture 5 Memory decoders Administrative Stuff Homework #6 due today Project posted Phase due next Friday Project done in pairs 2 Last Lecture Last lecture Logical

More information

Utilizing Redundancy for Timing Critical Interconnect

Utilizing Redundancy for Timing Critical Interconnect 1 Utilizing Redundancy for Timing Critical Interconnect Shiyan Hu, Qiuyang Li, Jiang Hu, Peng Li Abstract Conventionally, the topology of signal net routing is almost always restricted to Steiner trees,

More information

PCB Project: Measuring Package Bond-Out Inductance via Ground Bounce

PCB Project: Measuring Package Bond-Out Inductance via Ground Bounce PCB Project: Measuring Package Bond-Out Inductance via Ground Bounce Kylan Roberson July 9, 014 Abstract In this experiment I looked into a way of measuring the ground bounce generated by capacitively

More information

Static CMOS Circuits. Example 1

Static CMOS Circuits. Example 1 Static CMOS Circuits Conventional (ratio-less) static CMOS Covered so far Ratio-ed logic (depletion load, pseudo nmos) Pass transistor logic ECE 261 Krish Chakrabarty 1 Example 1 module mux(input s, d0,

More information

Design of Control Modules for Use in a Globally Asynchronous, Locally Synchronous Design Methodology

Design of Control Modules for Use in a Globally Asynchronous, Locally Synchronous Design Methodology Design of Control Modules for Use in a Globally Asynchronous, Locally Synchronous Design Methodology Pradnya Deokar Department of Electrical and Computer Engineering, VLSI Design Research Laboratory, Southern

More information

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines. " Where transmission lines arise? " Lossless Transmission Line.

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines.  Where transmission lines arise?  Lossless Transmission Line. ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 24: April 19, 2018 Crosstalk and Wiring, Transmission Lines Lecture Outline! Crosstalk! Repeaters in Wiring! Transmission Lines " Where transmission

More information

Problems in VLSI design

Problems in VLSI design Problems in VLSI design wire and transistor sizing signal delay in RC circuits transistor and wire sizing Elmore delay minimization via GP dominant time constant minimization via SDP placement problems

More information

LECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State

LECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State Today LECTURE 28 Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State Time permitting, RC circuits (where we intentionally put in resistance

More information

VLSI Design Verification and Test Simulation CMPE 646. Specification. Design(netlist) True-value Simulator

VLSI Design Verification and Test Simulation CMPE 646. Specification. Design(netlist) True-value Simulator Design Verification Simulation used for ) design verification: verify the correctness of the design and 2) test verification. Design verification: Response analysis Specification Design(netlist) Critical

More information

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics Lecture 23 Dealing with Interconnect Impact of Interconnect Parasitics Reduce Reliability Affect Performance Classes of Parasitics Capacitive Resistive Inductive 1 INTERCONNECT Dealing with Capacitance

More information

Lecture 6: Greedy Algorithms I

Lecture 6: Greedy Algorithms I COMPSCI 330: Design and Analysis of Algorithms September 14 Lecturer: Rong Ge Lecture 6: Greedy Algorithms I Scribe: Fred Zhang 1 Overview In this lecture, we introduce a new algorithm design technique

More information

Robust Network Codes for Unicast Connections: A Case Study
