Modelling and Compensating for Clock Skew Variability in FPGAs


Pete Sedcole, Justin S. Wong and Peter Y. K. Cheung
Department of Electrical & Electronic Engineering, Imperial College London, South Kensington campus, London SW7 2AZ, UK

Abstract

As integrated circuits are scaled down it becomes difficult to maintain uniformity in process parameters across each individual die. To avoid significant performance loss through pessimistic over-design, new design strategies are required that are cognisant of within-die performance variability. This paper examines the effect of process variability on the clock resources in FPGA devices. A model of variation in clock skew in FPGA clock networks is presented. Techniques for reducing the impact of variations on the performance of implemented designs are proposed and analysed, demonstrating that skew variation can be reduced by 70% or more through a combination of phase adjustment and clock rerouting. Measurements on a Virtex-5 FPGA validate the feasibility and benefits of the proposed compensation strategies.

1. Introduction

The fabrication of integrated circuits involves processes and materials that cannot be perfectly controlled. Manufacturing variations result in devices whose performance and power consumption vary, both between dice and, more recently, between circuit elements within a single die. This variability is expected to increase as transistor sizes are scaled down [1]. Field-Programmable Gate Arrays (FPGAs), often on the cutting edge of technology scaling, are susceptible to process and material variations, possibly more so than other high-performance integrated circuits. Unlike in ASICs, the critical paths of the circuit that the FPGA implements are not known until after fabrication, which results in particularly pessimistic circuit timing. Since variability cannot be eliminated by improving the fabrication process, new design techniques are required that are aware of and manage the variability.
In our previous work, we reported on measurements of logic and routing variation in FPGAs using both ring oscillators [2] and an improved at-speed testing method [3]. We have also developed techniques for quantifying the variability in clock skew within FPGAs [4], which indicated that clock skew variability is comparable to logic path delay variability. With the knowledge gained from the experimental work in [4], this paper proposes a model to predict the effect of within-die parameter variations on FPGA clock networks. Because of the flexibility required in the clock routing within an FPGA, the structure of the clock network is substantively different to an ASIC clock tree, and is affected differently by variability. The model predicts the variation in the clock skew between any two register locations. An accurate model of the clock skew variation is beneficial, as it allows timing tools to reduce the required guard-band for the skew. Furthermore, we propose post-configuration compensation techniques to reduce the impact of clock skew variability, enabling more aggressive timing to be achieved. These are analysed using the clock skew variation model. The feasibility of the techniques is demonstrated by experimental measurements from a Xilinx Virtex-5 FPGA.

2. Background

2.1 Related work

The effect of process variability on clock trees has previously been examined in ASIC devices. This includes work employing Monte Carlo simulations [5], [6] as well as approaches based on canonical or numerical analysis of the classical H-tree clock structure [7], [8]. Unlike an FPGA clock network, which is fixed (although programmable), in ASICs the clock tree design and routing can be optimised to the application before fabrication. By including awareness of variability in the optimisation process, the impact of variation can be reduced. For example, Venkataraman, Sze and Hu have investigated skew scheduling and clock routing incorporating variability awareness [9].
Rajaram and Pan describe a technique for reducing skew variation by inserting cross-links in the clock tree [10]. Skew variation may be corrected post-fabrication by using active de-skewing techniques, commonly employing elements in the clock tree with adjustable delays [11], [12], [13]. This technique has recently been investigated for FPGAs [14], [15]. The only published work to date on FPGA clock variability is our previous report on the measurement of skew variability [4]. An in-depth analysis of the impact of variability on FPGA clock trees is so far lacking in the literature.

2.2 FPGA clock trees

The clock network in an integrated circuit is generally designed to manage the skew between any two points in the device. A design with zero nominal skew can be achieved by employing the well-known H-tree structure. An FPGA clock network must balance the minimal-skew requirement with sufficient flexibility to implement the clocking requirements of many different circuits. Inevitably, providing this flexibility

reduces the symmetry in the clock distribution, which has implications for the sensitivity of the clock to variations.

Fig. 1. The clock tree structure in a Virtex-4/5 type of device. The device is divided into a number of fixed-sized clock regions.

Fig. 2. The clock tree structure in a Stratix-II type of device. The structure is based on an H-tree, resulting in clock octants regardless of the size of the device. [Diagram labels: central buffer, programmable quadrant buffer, programmable branch buffer, switch block, clock octant.]

Clock networks in FPGAs generally come in two flavours. A spine-and-branch approach is typified by the Xilinx Virtex-4 [16] and Virtex-5 [17] devices, and is represented by the diagram in Fig. 1. The clock is distributed on a hierarchical network of linear spines, where each spine taps directly off the higher-level spine. In the Virtex-4 and Virtex-5 architectures, all clock regions are of equal size: larger devices have a higher number of separate clock regions.

The Stratix-II [18] and Stratix-III [19] devices from Altera favour a structure that resembles the traditional H-tree design, as shown in Fig. 2. Again, the structure is hierarchical: the higher levels of the hierarchy use an H-tree network, which minimises delay differences. At the lower levels, the clock is distributed to rows of logic blocks along linear branches. With this structure the device is divided into clock octants (or sixteen parts for the Stratix-III) regardless of the size of the device.

Although the clock networks in Altera devices are more balanced than those in Xilinx devices, FPGAs from both vendors exhibit definite differences in clock routing delay across the chip. Point-to-point clock skew (as reported by vendor timing tools) is typically of the order of hundreds of picoseconds in mid-range devices. In all cases, the clock network comprises duplicate resources to enable multiple clocks to be distributed throughout the device.
A Virtex-5 XC5VLX50 device, for example, has 32 central buffers, each of which drives a separate vertical spine, and each region has 10 horizontal spine and branch lines [17]. Hierarchical levels are connected by some form of crossbar switch, and buffers at any level can in general be disabled to reduce dynamic power dissipation. In addition to the global clocking network, FPGAs also have regional clock buffers and distribution networks available. These are not considered in this paper, and will be the subject of future work.

Fig. 3. Example of clock routing to two spatially separated registers at locations u and v. The first seven labelled resources are shared in this example. [Diagram labels: source register u, destination register v, unit length wire, clock resources U_i on the route to u and V_i on the route to v.]

3. Clock Network Variation

3.1 Model

In order to better understand the effects of variability on the clock skew, an analysis of delay variations in the clock network is presented in this section. The outcome of the analysis is a model, which is then used in Section 4 to study strategies that compensate for clock skew variability. Consider two register locations in an FPGA, placed at positions u and v. As shown in Fig. 3, the clock is routed to each location along the dedicated clock resources, and the

resources may be shared for some part of the routing. In order to model the spatial correlation in delay variation, each wire segment is divided into unit lengths. The unit length is arbitrary, although the accuracy of the model will be better with a smaller unit length. The deviation from nominal delay along each wire unit, and through each buffer, from the source to location u is described by a variable $U_i$. Similarly, the variation in delays from the source to v along the clock tree is modelled with variables $V_i$. Note that all variables $U_i$, $V_i$ have zero mean. The actual clock skew between locations u and v is

$$s(u,v) = s_0(u,v) + \sum_i w_i U_i - \sum_i w_i V_i \tag{1}$$

where $s_0(u,v)$ is the nominal clock skew. The summations exclude the variables corresponding to shared clock resources (such as $U_1$ to $U_7$ and $V_1$ to $V_7$ in the example of Fig. 3), as they do not contribute to the skew between u and v. The variable $w_i$ is a weighting, equal to 1 for buffers and proportional to the wire segment length for wire units. The values of the weights are determined in the next section. The variance in skew is:

$$\mathrm{Var}[s(u,v)] = \mathrm{Var}\left[\sum_i w_i U_i - \sum_i w_i V_i\right] \tag{2}$$

This can be expanded:

$$\mathrm{Var}[s(u,v)] = \sum_i w_i^2\,\mathrm{Var}[U_i] + \sum_i \sum_{j \neq i} w_i w_j\,\mathrm{cov}[U_i, U_j] + \sum_i w_i^2\,\mathrm{Var}[V_i] + \sum_i \sum_{j \neq i} w_i w_j\,\mathrm{cov}[V_i, V_j] - 2 \sum_{i,j} w_i w_j\,\mathrm{cov}[U_i, V_j] \tag{3}$$

The variance of the clock skew between two locations in the FPGA can therefore be calculated from the covariance matrix of the buffer and wire unit delays of the clock tree routing. It is necessary to determine the covariance between each buffer and wire delay. There are three cases to consider: the covariance between two buffers, the covariance between two wire units, and the covariance between a buffer and a wire.

Buffer-buffer: Where $U_i$ and $U_j$ correspond to buffer delay variation, we assume a homogeneous and isotropic spatial correlation function, $\rho_b(d)$, which depends only on the distance d between the two buffers.
This assumption is common in the literature (e.g., [20]). The covariance is then simply:

$$\mathrm{cov}[U_i, U_j] = \sigma_{U_i} \sigma_{U_j} \rho_b(d) \tag{4}$$

Wire-wire: For the case where $U_i$ and $U_j$ both correspond to wire delays, a similar assumption is made for the correlation in delay variation. In this case the spatial correlation function is $\rho_w(d)$, where d is the distance between the mid-points of the two wire units. Thus

$$\mathrm{cov}[U_i, U_j] = \sigma_{U_i} \sigma_{U_j} \rho_w(d) \tag{5}$$

Buffer-wire: The model assumes that there is no spatial correlation between buffer delay variation and wire delay variation. This is reasonable, since variation in buffer delay is the result of FEOL processes (footnote 1), whereas wire delay variation is a consequence of BEOL (footnote 2) process variation. Therefore, where $U_i$ and $U_j$ correspond to a buffer delay and a wire delay:

$$\mathrm{cov}[U_i, U_j] = 0 \tag{6}$$

It should be noted that similar equations can be used to express the covariance in the clock routing to v and the covariance between clock trees ($\mathrm{cov}[V_i, V_j]$ and $\mathrm{cov}[U_i, V_j]$ respectively).

3.2 Weights

We now determine the values to assign to the weights $w_i$, based on the Elmore delay of a wire [21]. Recall that in the Elmore delay model, the total resistance and capacitance of a wire are divided into a finite number N of distributed resistances and capacitances $R_i$ and $C_i$, $i = 1, \ldots, N$. For the case of interest, each $R_i$ and $C_i$ is a random variable and corresponds to one of the unit lengths described earlier. We define a time constant for each unit length of wire, $X_i = R_i C_i$, such that $U_i = X_i - E[X_i]$. Note that $dX_i/dU_i = 1$. The propagation delay of the wire is given by [21]:

$$t_w = \sum_{i=1}^{N} R_i \sum_{j=i}^{N} C_j \tag{7}$$

We are interested in the sensitivity of the overall propagation delay of the wire to a change in a variable $U_k$. Therefore we calculate the partial derivative:

$$\frac{\partial t_w}{\partial U_k} = \frac{\partial t_w}{\partial X_k} \frac{dX_k}{dU_k} \tag{8}$$

$$= \frac{\partial}{\partial X_k} \left( C_k \sum_{i=1}^{k-1} R_i + R_k C_k + R_k \sum_{j=k+1}^{N} C_j \right) \tag{9}$$

$$= \frac{1}{R_k} \sum_{i=1}^{k-1} R_i + 1 + \frac{1}{C_k} \sum_{j=k+1}^{N} C_j \tag{10}$$

This value is itself a random quantity.
We can calculate the mean of this value by taking the expected value, noting that $E[R_i] = \bar{R}$ and $E[C_i] = \bar{C}$:

$$E\left[\frac{\partial t_w}{\partial U_k}\right] = E\left[\frac{1}{R_k}\right] \sum_{i=1}^{k-1} \bar{R} + 1 + E\left[\frac{1}{C_k}\right] \sum_{j=k+1}^{N} \bar{C} \approx N \tag{11}$$

We see that variation in a wire unit will cause variation in total delay in proportion to the number of wire units in that segment. In other words, the variation in delay of the wire increases superlinearly with length. This makes intuitive sense, since the wire delay also increases superlinearly with length.

Footnote 1: Front-End-Of-the-Line, the fabrication steps involving the patterning of silicon.
Footnote 2: Back-End-Of-the-Line, the fabrication steps for depositing metal layers.
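The weighting argument can be checked numerically. The sketch below is illustrative only: it evaluates the Elmore delay of (7) and the sensitivity expression of (10), then averages the latter over randomly perturbed unit values. The function names, the normalised nominal values (R = C = 1), the Gaussian perturbation and the 5% spread are assumptions made for the example, not values taken from the derivation.

```python
import random

def elmore_delay(R, C):
    """Elmore delay of eq. (7): sum_i R_i * sum_{j>=i} C_j."""
    total = 0.0
    downstream_c = 0.0
    # Walk from the far end so the downstream capacitance accumulates in one pass.
    for r, c in zip(reversed(R), reversed(C)):
        downstream_c += c
        total += r * downstream_c
    return total

def sensitivity(R, C, k):
    """Eq. (10) for wire unit k (0-based index in this sketch)."""
    return sum(R[:k]) / R[k] + 1.0 + sum(C[k + 1:]) / C[k]

def mean_sensitivity(n_units, k, sigma=0.05, trials=20000, seed=1):
    """Monte Carlo estimate of eq. (11) with assumed Gaussian unit values."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        R = [rng.gauss(1.0, sigma) for _ in range(n_units)]
        C = [rng.gauss(1.0, sigma) for _ in range(n_units)]
        acc += sensitivity(R, C, k)
    return acc / trials

if __name__ == "__main__":
    print(elmore_delay([1.0] * 4, [1.0] * 4))   # nominal delay, 4 units
    print(elmore_delay([1.0] * 8, [1.0] * 8))   # nominal delay, 8 units
    print(mean_sensitivity(8, 3))               # should be close to N = 8
```

With nominal values the expression in (10) evaluates to exactly N for every unit in the segment, which is why a single weight per segment is a reasonable simplification. Doubling the number of units more than doubles the nominal delay, in line with the superlinear growth noted in the text.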

The weight $w_i$ for a wire unit is set to the total number of units in the segment. This weighting only applies to wire units, so for buffers $w_i = 1$.

3.3 Case study

The model derived above can be used to calculate the expected variance of the clock skew between any two locations on the FPGA. Conventionally, variation in delay or skew is accounted for by guard-bands: margins added to the nominal delay or skew to allow for the worst-case variation. Typically a margin of three times the standard deviation is used for the guard-band. Thus, if the clock skew has a standard deviation σ of 100ps, the guard-band will be ±300ps.

Fig. 4. Clock skew variation modelling results. (a) Virtex-5 style device, high correlation in spatial variation. (b) Virtex-5 style device, low correlation in spatial variation. (c) Stratix-III style device, high correlation in spatial variation. (d) Stratix-III style device, low correlation in spatial variation. Two types of device are modelled, one with a spine-and-branch clock network (Virtex-5 style) and one with an H-tree clock network (Stratix-III style). The variance in clock skew relative to a fixed location (25, 5) is computed for both high and low spatial correlation. Using the variance values, the 3σ guard-banding values are plotted as a function of location. The z-scale of the plots is in units of the standard deviation in delay of one clock buffer.

Fig. 4 shows some 3σ guard-bands calculated using the proposed model for two different device types, corresponding to the two FPGA clock-tree styles discussed in Section 2.2. For each of the two device types, two different levels of spatially-correlated variation are modelled. For the high correlation model, the spatial correlation functions $\rho_b(d)$ and $\rho_w(d)$ fall as $d^{-0.3}$ and asymptote to 0.2. For the low correlation model, $\rho_b(d)$ and $\rho_w(d)$ fall as $d^{-2}$ and asymptote to 0.1. The total level of variability is set to $\sigma_U$ = 10% of delay for buffers and $\sigma_U$ = 5% for wire units.
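To make the calculation concrete, the following sketch evaluates the skew variance of (3) and the corresponding 3σ guard-band for two short, non-shared clock branches. It is a minimal illustration under stated assumptions: the `Element` record, the single correlation function `rho` shared by buffers and wires, and its exact functional form (equal to 1 at d = 0 and decaying towards a floor) are hypothetical choices for the example; only the floor of 0.2 and the exponent of 0.3 echo the high-correlation case described above.

```python
import math
from dataclasses import dataclass

@dataclass
class Element:
    kind: str      # "buffer" or "wire"
    x: float       # position (mid-point for a wire unit), in logic-block units
    y: float
    sigma: float   # standard deviation of the delay deviation
    weight: float  # w_i: 1 for buffers, units-in-segment for wire units

def rho(d, floor=0.2, power=0.3):
    """Hypothetical spatial correlation: 1 at d = 0, decaying towards `floor`."""
    return floor + (1.0 - floor) * (1.0 + d) ** (-power)

def cov(a, b):
    """Eqs. (4)-(6): zero between a buffer and a wire, else sigma_a * sigma_b * rho(d)."""
    if a.kind != b.kind:
        return 0.0
    if a is b:
        return a.sigma ** 2
    d = math.hypot(a.x - b.x, a.y - b.y)
    return a.sigma * b.sigma * rho(d)

def skew_variance(u_path, v_path):
    """Eq. (3), written as Var[sum_i c_i Z_i] with c = +w on the u branch, -w on the v branch."""
    terms = [(e.weight, e) for e in u_path] + [(-e.weight, e) for e in v_path]
    return sum(ci * cj * cov(ei, ej) for ci, ei in terms for cj, ej in terms)

def guard_band(u_path, v_path, n_sigma=3.0):
    """n-sigma guard-band on the skew between the two branch end-points."""
    return n_sigma * math.sqrt(max(0.0, skew_variance(u_path, v_path)))

if __name__ == "__main__":
    # Each branch: one buffer followed by a wire segment of three units (weight 3 each).
    u = [Element("buffer", 0, 0, 0.10, 1)] + [Element("wire", 1 + i, 0, 0.005, 3) for i in range(3)]
    v = [Element("buffer", 0, 9, 0.10, 1)] + [Element("wire", 1 + i, 9, 0.005, 3) for i in range(3)]
    print(guard_band(u, v))
```

Only the non-shared resources are passed in, mirroring the exclusion of shared resources from the sums in (1). Moving the two branches further apart lowers the cross-covariance term and therefore raises the guard-band, which is the position dependence that the case study plots.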
The plots are calculated assuming the register at one end of a signal path has been placed at location (25, 5) in the FPGA. The required guard-band to add to the clock skew for the second register is location-dependent. As expected, if the two end-point registers are placed within the same clock region, the required guard-banding is lower than if they are placed further apart. Although there are differences between the spine-and-branch and the H-tree clock distribution schemes, the total level of variability in the clock skew remains broadly similar for devices of the same size. Note also that where the variation has low spatial correlation, the necessary guard-banding is less position-dependent (the plots are "flatter"), as would be expected, although it is still advantageous to place both registers within the same clock region.

The model can be used during place-and-route to provide more aggressive timing than would be possible using a single global guard-band value for skew. The model calculations are computationally inexpensive and could be performed as necessary during place-and-route. Alternatively, to avoid extra time overhead during place-and-route, the guard-band values could be pre-computed for various register locations and approximations used during place-and-route.

4. Variation Compensation

In this section we propose methods to mitigate variability in clock skew. The effectiveness of the methods is studied

by modifying the model of Section 3.1, and by experiments on a Xilinx Virtex-5 XC5VLX50 FPGA.

TABLE I. MODEL PARAMETERS

Model parameter               Value
Logic block rows              80
Logic block columns           40
Buffer delay                  µ = 1.0, σ = 10%
Wire unit delay               µ = 0.1, σ = 5%
High spatial corr. function   ρ(d) = 0.3d
Low spatial corr. function    ρ(d) = 0.1d

4.1 Clock phase adjustment

Modern high-end FPGAs include several very flexible clock generating resources, such as PLLs and Digital Clock Managers (DCMs) [17], [19]. In both Stratix and Virtex devices, in addition to being capable of synthesizing many clock frequencies, these clock generators are also able to produce phase-shifted clocks where the amount of phase-shifting can be changed at runtime. Using this capability, it is possible to generate an additional clock of the same frequency as the main clock but phase-adjusted to compensate for skew variations. The amount of phase adjustment can be tuned for each FPGA. Since this requires an additional DCM/PLL to generate the second clock, it is only possible if there are unused DCMs/PLLs in the FPGA.

Although this technique can compensate exactly for the skew variation between any two particular register locations, it clearly cannot achieve this for all paths, as this would require a DCM/PLL for every register on the FPGA. A practical approach is to compensate for the random skew variation between two clock regions, by supplying one of the regions with a phase-adjusted clock tuned to compensate for the average offset in skew between the two regions. We call this technique regional phase compensation.

A further improvement may be possible by constraining the placement of registers within each region. If registers are placed close together they are more likely to experience the same variation in clock skew. Therefore, by placing all source and sink registers of critical paths between the two regions close together, the phase adjustment can be more finely tuned to the local variation.
We term this local phase compensation.

It is necessary to modify the model of Section 3.1 to include these adaptations. The modification is straightforward. Examining Fig. 3, it can be seen that phase compensation will cause the variation in skew between the two divergent branches of the clock tree to be exactly cancelled up to some fixed point along each branch (for example, up to just after $U_9$ and $V_9$). When calculating the skew variance under phase compensation, it is sufficient to disregard the terms corresponding to the clock tree before the compensation points.

Note that there will be an increase in power consumption from using spare DCM/PLL resources. If there are no such spare resources, or the power overhead is unacceptable, gains may still be made by splitting the main clock and routing it through two central clock buffers. Stochastic differences in the buffer delays will produce a phase shift between the two resulting clocks. The phase shift will not be controllable, however, so the effectiveness of this approach is limited.

Fig. 5. Required skew guard-bands after compensating for skew variation with dual phase-adjusted clocks, based on a model of a Virtex-5 style FPGA with high correlation in spatial variation. (a) After regional skew correction by clock phase adjustment. (b) After local skew correction by clock phase adjustment. Assumes a source register placed at location (25, 5). Guard-band values are again plotted relative to clock buffer standard deviation in delay.

The results of the modified clock skew model are shown in Fig. 5. Again, one register is fixed at location (25, 5). The guard-banding required when two registers are supplied by phase-adjusted clocks is plotted as a function of the placement location of the second register. Fig. 5(a) shows the case where the clock phases are adjusted to cancel regional variations in skew. Fig. 5(b) is an example of the more aggressive local phase compensation.
This assumes that all registers for critical paths between regions are placed within 3 × 3 logic blocks in each region. The graphs can be compared to the baseline case in Fig. 4(a). The regional phase compensation scheme reduces the guard-band by up to 42%, and the local phase compensation reduces the guard-band by up to 49%. Both schemes are most effective for registers placed a long way apart.

4.2 Clock resource re-routing

As mentioned in Section 2.2, the buffers and wires that are used for clock signal routing in FPGAs are duplicated at each level, to provide flexibility and to allow multiple clocks to be distributed. Stochastic variations in the buffers, wires and switches will cause each duplicate resource to exhibit

different delays. It is possible to exploit these differences, given a particular FPGA and one clock net of interest, by selecting the clock routing that gives the best clock skew.

As an example, consider the Virtex-5 FPGA from Xilinx. In this device there are 10 nominally identical horizontal clock spines per region. At each register there is a multiplexer which determines which clock spine is connected to the clock input of the register. The clock signal can be routed on all 10 clock lines simultaneously, and the best signal selected at each register by reconfiguring the multiplexer. The best signal may be the one with the closest-to-nominal skew. Alternatively, a clock signal with a deviation in skew could be selected to compensate for reduced slack caused by path delay variations.

Nominal skew objective: By choosing the signal with the closest-to-nominal skew, the skew variance will be reduced and therefore the required guard-band will also be smaller. To include this in the model, we need to quantify the effect of selection on the skew variance. Firstly, note that the duplicated clock resources are physically close together, so they will exhibit the same correlated delay variation. The difference in skew of N duplicated resources is therefore a stochastic quantity of zero mean, which we will denote by the random variable $X_i$, $i = 1, \ldots, N$. Assuming that $X_i$ is approximately normally distributed with variance $\sigma^2$, its distribution can be described by

$$P(-x < X_i < x) = \mathrm{erf}\left(\frac{x}{\sqrt{2}\,\sigma}\right) \tag{12}$$

where erf(x) is the error function. Let us label the value of $X_i$ which is closest to zero by Y. It is straightforward to show that the probability density of Y is:

$$f_Y(x) = \frac{N}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right) \left[1 - \mathrm{erf}\left(\frac{|x|}{\sqrt{2}\,\sigma}\right)\right]^{N-1} \tag{13}$$

The variance of Y is defined as $\mathrm{Var}[Y] = \int_{-\infty}^{\infty} x^2 f_Y(x)\,dx$ which, while not possible to solve analytically, can be computed numerically. For N = 10, the variance is $\mathrm{Var}[Y] = [\ldots]\,\sigma^2$.
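The numerical evaluation mentioned above can be sketched with a simple trapezoidal rule. This is an illustration rather than the computation used for the reported result: the function names, the integration span of eight standard deviations and the step count are assumptions, and the density follows (13) with the absolute value of x inside the error function so that it is valid on both sides of zero.

```python
import math

def closest_to_zero_pdf(x, n, sigma=1.0):
    """Eq. (13): density of the member of n i.i.d. N(0, sigma^2) samples closest to zero."""
    gauss = math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
    tail = 1.0 - math.erf(abs(x) / (math.sqrt(2.0) * sigma))
    return n * gauss * tail ** (n - 1)

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def variance_of_closest(n, sigma=1.0):
    """Var[Y] = integral of x^2 f_Y(x) dx; Y has zero mean by symmetry."""
    span = 8.0 * sigma
    return integrate(lambda x: x * x * closest_to_zero_pdf(x, n, sigma), -span, span)

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, variance_of_closest(n))  # in units of sigma^2
```

For N = 1 the routine returns the unselected variance, which gives a quick sanity check; larger N yields a progressively smaller fraction of σ², and that fraction is the scaling applied to the variance terms of the duplicated resources.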
This is applied to the model of (3) by scaling the variance terms corresponding to the duplicated resources. The covariance terms remain the same.

Positive skew objective: For a given register, instead of selecting the clock routing that gives the most nominal skew, one may choose to select the routing that gives the most positive skew. This will yield the most slack for paths that end at that register, although at the expense of slack for paths originating at the register. In this case, we select the maximum value of $X_i$, which is the order statistic $X_{(N)}$. The variance values in the model of (3) will be replaced by $\mathrm{Var}[X_{(N)}]$, and the guard-band will be reduced by $E[X_{(N)}]$. Order statistics have been extensively studied; mean and variance tables are readily available, such as in [22].

Fig. 6. Guard-bands after compensating for skew by clock phase adjustments and clock re-routing, for a high amount of spatially correlated delay. (a) Nominal skew objective. (b) Most positive skew objective.

Results from the modified models for the clock resource re-routing strategies are plotted in Fig. 6. Both models assume that regional differences in phase are compensated for by the clock phase adjustment described above, and then the best of 10 available regional clock trees is used to route the clock signal. The graphs should therefore be compared to Fig. 5(a). By choosing the resources which give the nearest-to-nominal skew, the guard-band can be reduced by an additional 10% to 40% over regional phase compensation alone. The benefit is greatest when the two registers are placed close together. The most positive skew objective yields an additional 30% to 90% reduction in guard-band compared with regional phase compensation.

The results in Fig. 5(a) and Fig. 5(b) are based on a high level of spatial correlation. The model has also been used to investigate the situation where the delay variation is more stochastic. The results are broadly similar.
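For the positive skew objective described earlier in this section, the mean and variance of the largest of N normal samples are normally read from published tables. As a hedged alternative, the sketch below estimates them by direct numerical integration of the density of the maximum, N φ(x) Φ(x)^(N-1), for a standard normal variable. The function names and integration settings are assumptions for illustration; results are in units of σ (mean) and σ² (variance).

```python
import math

def phi(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_pdf(x, n):
    """Density of the largest of n i.i.d. standard normal samples."""
    return n * phi(x) * Phi(x) ** (n - 1)

def moments_of_max(n, span=10.0, steps=40000):
    """Return (mean, variance) of the maximum via the trapezoidal rule on [-span, span]."""
    h = 2.0 * span / steps
    m1 = 0.0
    m2 = 0.0
    for i in range(steps + 1):
        x = -span + i * h
        edge = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        p = edge * max_pdf(x, n)
        m1 += x * p
        m2 += x * x * p
    m1 *= h
    m2 *= h
    return m1, m2 - m1 * m1

if __name__ == "__main__":
    mean, var = moments_of_max(10)
    print(mean, var)
```

In the terms used above, the returned mean multiplied by σ plays the role of $E[X_{(N)}]$, the amount by which the guard-band is reduced, and the returned variance multiplied by σ² replaces the variance terms of the duplicated resources in (3).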
The guard-band result for the clock resource re-routing for nominal skew is shown in Fig. 7 as an example. Compared to the highly correlated variation case of Fig. 6(a), the method offers less of an improvement for closely-spaced registers, and overall the guard-band has less locational dependence, as would be expected.

4.3 Experimental results

In order to validate the feasibility of the proposed skew variability compensation techniques, experiments have been

performed on a Xilinx Virtex-5 XC5VLX50-1 FPGA. These are designed to determine whether or not it is possible to change the clock phase for a region to compensate for skew variation, and whether different parallel clock resources do actually exhibit different delays.

Fig. 7. Guard-band after compensating for skew by clock phase adjustments and clock re-routing. The model assumes low spatial correlation in delay variation and the clock re-routing targets nominal skew.

Fig. 8. A simplified diagram of the test circuitry used in the Virtex-5 experiment. Two clock regions ("top" and "bottom") are supplied with separate clocks. The phase offset between the clocks can be adjusted dynamically. A total of 32 paths connect the two regions, 16 in either direction. [Diagram labels: clock generation with phase adjust; 4 possible central buffer locations; 9 possible regional buffers; 16 down paths and 16 up paths, each consisting of a launch register, test path and capture register.]

A simplified diagram of the test circuitry used is shown in Fig. 8. Two clock regions in the FPGA were supplied with separate clocks of the same frequency, where the phase offset between the two clocks can be adjusted dynamically. The phase adjustment was achieved using the Virtex embedded Digital Clock Managers [17]. Thirty-two paths were placed and routed in the FPGA between the two clock regions, 16 in each direction. Paths originating in the lower of the two regions are termed "up" paths, the others "down" paths. The observable delay of each path could be measured accurately using the method reported in [3]. The observed delay of a path is in reality the sum of the path propagation delay and the clock skew between the start and end registers of the path. An additional 192 paths were placed and routed in other regions of the FPGA, and were used for calibrating the measurements for environmental changes.
The experiment involved measuring the observable delay of the 32 test paths for different clock phase offsets, and when different clock resources were used to route the clock of the top-most region. Since the paths under test are invariant, any change in observed delay is caused by changes in clock skew.

Fig. 9. Empirical measurements and post-calibration values of observed path delay for all 32 paths (16 "up" and 16 "down") under test. Each path is measured 36 times, each time with a different combination of central buffer location and regional clock routing.

The raw measured path delays for all 36 combinations of clock routing are plotted in the left half of Fig. 9. It can be seen that changing the resources on which the clock is routed causes a change in measured delay of up to ±50ps. This is significant when compared to the variation in LUT delay, which has a standard deviation of approximately 11ps in this device [4]. The mean measured path delay is 3705ps. Note that there is a difference in the ensemble measurements of the up paths (3807ps) compared to the down paths (3603ps). There are also differences in delays between paths within the up group and within the down group. These differences are partially due to process variability and partially due to differences in the placement and routing of each path.

Since we are interested in compensating for clock skew variation, it is necessary to calibrate the initial data-set to produce a set of values where the effect of other sources of variation in the delay has been removed. The measurements were first calibrated to remove expected differences in delay using the path and skew timing reported by the vendor timing tools. The resulting values for the delay of each path were then shifted towards the mean to counteract the variance introduced by the LUT in each path. The resulting post-calibrated values, plotted in the right half of Fig. 9, are somewhat artificial but realistic.
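The up/down asymmetry quoted above gives a feel for the size of a regional phase correction. The sketch below is a worked arithmetic example under a simplifying assumption: a phase offset between the two regional clocks adds to the observed delay of paths in one direction and subtracts equally from paths in the other. The function names and the 30ps phase step are hypothetical and are not Virtex-5 specifications.

```python
def regional_phase_correction(mean_up_ps, mean_down_ps):
    """Offset that equalises the two group means, and the resulting common mean."""
    offset = (mean_up_ps - mean_down_ps) / 2.0
    balanced = (mean_up_ps + mean_down_ps) / 2.0
    return offset, balanced

def quantise(offset_ps, step_ps):
    """Nearest offset achievable with a hypothetical phase-shift granularity."""
    return round(offset_ps / step_ps) * step_ps

if __name__ == "__main__":
    offset, balanced = regional_phase_correction(3807.0, 3603.0)
    applied = quantise(offset, 30.0)   # 30 ps step is an assumed value
    print(offset, balanced)            # 102.0 3705.0
    print(applied, offset - applied)   # 90.0 12.0
```

Under this assumption an ideal offset of 102ps would bring both groups to the overall mean of 3705ps. A finite phase step leaves a residual mismatch, which is one way to picture the limit imposed by phase adjustment granularity.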
The effects of the different experiments are summarised in Fig. 10. The graph shows the timing offset (degradation) of the slowest path for a given test, relative to the case of nil variation. Nil variation is estimated as the mean of all calibrated delays. In order to gain an insight into how the degree of connectivity between regions affects the results, three bars are shown for each experiment: the case where the regions are connected by just one path in each direction, as well as the cases of four paths and sixteen paths. To give a meaningful sense of scale to the results, standard deviations of LUT delay, $\sigma_L$, are also plotted on the graph. Using the initial assignment of clock resources and no phase

correction, the slowest path delay is degraded by over 10$\sigma_L$ compared to the case of zero skew variation. This is mainly due to the difference between the up and down path delays. By trying four different locations of the main clock buffer, but changing nothing else, this can be reduced by approximately half in this particular instance.

A much greater improvement is possible by actively adjusting the clock phase between the two clock regions to cancel the difference in the up and down delays. Using this technique, the timing degradation is reduced to about 1 to 4$\sigma_L$. The effectiveness of this technique is to some extent limited by the granularity of the phase adjustment possible using the Virtex-5. With infinitely-adjustable phase, the improvement would be slightly better, as indicated by the "Phase (ideal)" results.

The best result from this series of experiments came from a combination of phase adjustment and clock re-routing. By judicious selection of resources on which to route the clock to the top region, the effect of skew variation could be completely cancelled for the cases of one or four paths. The experimental setup does not account for the negative impact of this approach on the slack of other circuit paths. Nevertheless, it demonstrates the effectiveness the technique can have.

Fig. 10. The observed delay of the slowest path using different compensation techniques. The delay is plotted relative to the nil-variation baseline. Different numbers of paths are considered: 1, 4 and 16 paths in either direction. A scale of LUT delay standard deviations is also plotted for reference.

5. Conclusions

The clock distribution networks in FPGAs are substantially different to those in ASICs. The effect of process variability on clock skew, and approaches to mitigate such effects, must therefore also be different. This paper described a proposed clock skew variability model for FPGAs.
The model can be used to predict guard-band requirements on clock skew. In addition, two techniques for compensating for skew variability were presented. These involved adjusting the phase of the clock between regions, and using the stochastic differences in duplicated clock resources to achieve better skew timings. Results predicted by the model show that these techniques could significantly reduce the skew guard-band. Phase adjustments alone reduced the guard-band by almost 50%; by additionally routing the clock through the optimal resources the guard-band could be reduced by 70% or more. A reduced skew guard-band ultimately yields better timing. The feasibility of the techniques was also verified experimentally using a Virtex-5 FPGA.

Acknowledgements

The authors wish to acknowledge the financial support of the EPSRC under Platform Grant EP/C549481/1.

References

[1] S. R. Nassif, "Design for variability in DSM technologies," in Proc. IEEE International Symposium on Quality Electronic Design.
[2] P. Sedcole and P. Y. K. Cheung, "Within-die delay variability in 90nm FPGAs and beyond," in Proc. IEEE International Conference on Field Programmable Technology.
[3] J. S. Wong, P. Sedcole, and P. Y. K. Cheung, "Self-characterization of combinatorial circuit delays in FPGAs," in Proc. IEEE International Conference on Field Programmable Technology.
[4] P. Sedcole, J. S. Wong, and P. Y. K. Cheung, "Characterisation of FPGA clock variability," in Proc. International Symposium on Very Large Scale Integration.
[5] V. Mehrotra and D. Boning, "Technology scaling impact of variation on clock skew and interconnect delay," in International Interconnect Technology Conference.
[6] S. Zanella, A. Nardi, A. Neviani, M. Quarantelli, S. Saxena, and C. Guardiani, "Analysis of the impact of process variations on clock skew," IEEE Transactions on Semiconductor Manufacturing, vol. 13, no. 4, Nov.
[7] A. Agarwal, V. Zolotov, and D. T. Blaauw, "Statistical clock skew analysis considering intradie-process variations," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 8, Aug.
[8] M. Hashimoto, T. Yamamoto, and H. Onodera, "Analysis of clock skew variation in H-tree structure," in Proc. IEEE International Symposium on Quality Electronic Design.
[9] G. Venkataraman, C. N. Sze, and J. Hu, "Skew scheduling and clock routing for improved tolerance to process variations," in Proc. Asia and South Pacific Design Automation Conference.
[10] A. Rajaram and D. Z. Pan, "Fast incremental link insertion in clock networks for skew variability reduction," in Proc. IEEE International Symposium on Quality Electronic Design.
[11] A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, A. Macii, E. Macii, M. Poncino, and L. Benini, "Dynamic thermal clock skew compensation using tunable delay buffers," in Proc. International Symposium on Low Power Electronics and Design.
[12] A. Kapoor, N. Jayakumar, and S. P. Khatri, "A novel clock distribution and dynamic de-skewing methodology," in Proc. International Conference on Computer Aided Design.
[13] J.-L. Tsai, L. Zhang, and C. C.-P. Chen, "Statistical timing analysis driven post-silicon-tunable clock-tree synthesis," in Proc. International Conference on Computer Aided Design.
[14] S. Sivaswamy and K. Bazargan, "Statistical generic and chip-specific skew assignment for improving timing yield of FPGAs," in Proc. Field-Programmable Logic and Applications.
[15] S. Sivaswamy and K. Bazargan, "Statistical analysis and process variation-aware routing and skew assignment for FPGAs," ACM Transactions on Reconfigurable Technology and Systems, vol. 1, no. 1, Mar.
[16] Virtex-4 User Guide, Xilinx Inc., February.
[17] Virtex-5 User Guide v3.0, Xilinx Inc., February.
[18] Stratix II Device Handbook, Altera Corp., May.
[19] Stratix III Device Handbook, Altera Corp., May.
[20] J. Xiong, V. Zolotov, and L. He, "Robust extraction of spatial correlation," in Proc. International Symposium on Physical Design.
[21] W. C. Elmore, "The transient response of damped linear networks with particular regard to wideband amplifiers," Journal of Applied Physics, vol. 19, no. 1, Jan.
[22] H. J. Godwin, "Some low moments of order statistics," The Annals of Mathematical Statistics, vol. 20, no. 2, Jun. 1949.


More information

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic

More information

Single Stuck-At Fault Model Other Fault Models Redundancy and Untestable Faults Fault Equivalence and Fault Dominance Method of Boolean Difference

Single Stuck-At Fault Model Other Fault Models Redundancy and Untestable Faults Fault Equivalence and Fault Dominance Method of Boolean Difference Single Stuck-At Fault Model Other Fault Models Redundancy and Untestable Faults Fault Equivalence and Fault Dominance Method of Boolean Difference Copyright 1998 Elizabeth M. Rudnick 1 Modeling the effects

More information

EE 330 Lecture 39. Digital Circuits. Propagation Delay basic characterization Device Sizing (Inverter and multiple-input gates)

EE 330 Lecture 39. Digital Circuits. Propagation Delay basic characterization Device Sizing (Inverter and multiple-input gates) EE 330 Lecture 39 Digital ircuits Propagation Delay basic characterization Device Sizing (Inverter and multiple-input gates) Review from last lecture Other MOS Logic Families Enhancement Load NMOS Enhancement

More information

Overlay Aware Interconnect and Timing Variation Modeling for Double Patterning Technology

Overlay Aware Interconnect and Timing Variation Modeling for Double Patterning Technology Overlay Aware Interconnect and Timing Variation Modeling for Double Patterning Technology Jae-Seok Yang, David Z. Pan Dept. of ECE, The University of Texas at Austin, Austin, Tx 78712 jsyang@cerc.utexas.edu,

More information

Impact of Modern Process Technologies on the Electrical Parameters of Interconnects

Impact of Modern Process Technologies on the Electrical Parameters of Interconnects Impact of Modern Process Technologies on the Electrical Parameters of Interconnects Debjit Sinha, Jianfeng Luo, Subramanian Rajagopalan Shabbir Batterywala, Narendra V Shenoy and Hai Zhou EECS, Northwestern

More information

Using A54SX72A and RT54SX72S Quadrant Clocks

Using A54SX72A and RT54SX72S Quadrant Clocks Application Note AC169 Using A54SX72A and RT54SX72S Quadrant Clocks Architectural Overview The A54SX72A and RT54SX72S devices offer four quadrant clock networks (QCLK0, 1, 2, and 3) that can be driven

More information

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &

More information

Programmable Logic Devices

Programmable Logic Devices Programmable Logic Devices Mohammed Anvar P.K AP/ECE Al-Ameen Engineering College PLDs Programmable Logic Devices (PLD) General purpose chip for implementing circuits Can be customized using programmable

More information

Design and Implementation of Carry Tree Adders using Low Power FPGAs

Design and Implementation of Carry Tree Adders using Low Power FPGAs 1 Design and Implementation of Carry Tree Adders using Low Power FPGAs Sivannarayana G 1, Raveendra babu Maddasani 2 and Padmasri Ch 3. Department of Electronics & Communication Engineering 1,2&3, Al-Ameer

More information

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences EECS5 J. Wawrzynek Spring 22 2/22/2. [2 pts] Short Answers. Midterm Exam I a) [2 pts]

More information

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder M.S.Navya Deepthi M.Tech (VLSI), Department of ECE, BVC College of Engineering, Rajahmundry. Abstract: Quantum cellular automata (QCA) is

More information

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of ESE532: System-on-a-Chip Architecture Day 20: November 8, 2017 Energy Today Energy Today s bottleneck What drives Efficiency of Processors, FPGAs, accelerators How does parallelism impact energy? 1 2 Message

More information

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Miloš D. Ercegovac Computer Science Department Univ. of California at Los Angeles California Robert McIlhenny

More information

Chapter 1: Logic systems

Chapter 1: Logic systems Chapter 1: Logic systems 1: Logic gates Learning Objectives: At the end of this topic you should be able to: identify the symbols and truth tables for the following logic gates: NOT AND NAND OR NOR XOR

More information

A Novel Flow for Reducing Clock Skew Considering NBTI Effect and Process Variations

A Novel Flow for Reducing Clock Skew Considering NBTI Effect and Process Variations A Novel Flow for Reducing Clock Skew Considering NBTI Effect and Process Variations Jifeng Chen, and Mohammad Tehranipoor University of Connecticut, Storrs, CT 6269, USA {jifeng.chen, tehrani}@engr.uconn.edu

More information

NANO-CMOS DESIGN FOR MANUFACTURABILILTY

NANO-CMOS DESIGN FOR MANUFACTURABILILTY NANO-CMOS DESIGN FOR MANUFACTURABILILTY Robust Circuit and Physical Design for Sub-65nm Technology Nodes Ban Wong Franz Zach Victor Moroz An u rag Mittal Greg Starr Andrew Kahng WILEY A JOHN WILEY & SONS,

More information

POLITECNICO DI TORINO Repository ISTITUZIONALE

POLITECNICO DI TORINO Repository ISTITUZIONALE POLITECNICO DI TORINO Repository ISTITUZIONALE Modeling of thermally induced skew variations in clock distribution network Original Modeling of thermally induced skew variations in clock distribution network

More information

Accounting for Non-linear Dependence Using Function Driven Component Analysis

Accounting for Non-linear Dependence Using Function Driven Component Analysis Accounting for Non-linear Dependence Using Function Driven Component Analysis Lerong Cheng Puneet Gupta Lei He Department of Electrical Engineering University of California, Los Angeles Los Angeles, CA

More information

PLA Minimization for Low Power VLSI Designs

PLA Minimization for Low Power VLSI Designs PLA Minimization for Low Power VLSI Designs Sasan Iman, Massoud Pedram Department of Electrical Engineering - Systems University of Southern California Chi-ying Tsui Department of Electrical and Electronics

More information

Low-complexity generation of scalable complete complementary sets of sequences

Low-complexity generation of scalable complete complementary sets of sequences University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2006 Low-complexity generation of scalable complete complementary sets

More information

Variability Aware Statistical Timing Modelling Using SPICE Simulations

Variability Aware Statistical Timing Modelling Using SPICE Simulations Variability Aware Statistical Timing Modelling Using SPICE Simulations Master Thesis by Di Wang Informatics and Mathematical Modelling, Technical University of Denmark January 23, 2008 2 Contents List

More information