Modelling and Compensating for Clock Skew Variability in FPGAs
Pete Sedcole, Justin S. Wong and Peter Y. K. Cheung
Department of Electrical & Electronic Engineering, Imperial College London
South Kensington campus, London SW7 2AZ, UK

Abstract
As integrated circuits are scaled down, it becomes difficult to maintain uniformity in process parameters across each individual die. To avoid significant performance loss through pessimistic over-design, new design strategies are required that are cognisant of within-die performance variability. This paper examines the effect of process variability on the clock resources in FPGA devices. A model of variation in clock skew in FPGA clock networks is presented. Techniques for reducing the impact of variations on the performance of implemented designs are proposed and analysed, demonstrating that skew variation can be reduced by 70% or more through a combination of phase adjustment and clock rerouting. Measurements on a Virtex-5 FPGA validate the feasibility and benefits of the proposed compensation strategies.

1. Introduction
The fabrication of integrated circuits involves processes and materials that cannot be perfectly controlled. Manufacturing variations result in devices whose performance and power consumption vary, both between dice and, more recently, between circuit elements within a single die. This variability is expected to increase as transistor sizes are scaled down [1].

Field-Programmable Gate Arrays (FPGAs), often on the cutting edge of technology scaling, are susceptible to process and material variations, possibly more so than other high-performance integrated circuits. Unlike in ASICs, the critical paths of the circuit an FPGA implements are not known until after fabrication, which results in particularly pessimistic circuit timing. Since variability cannot be eliminated by improving the fabrication process, new design techniques are required that are aware of and manage the variability.
In our previous work, we reported on measurements of logic and routing variation in FPGAs using both ring oscillators [2] and an improved at-speed testing method [3]. We have also developed techniques for quantifying the variability in clock skew within FPGAs [4], which indicated that clock skew variability is comparable to logic path delay variability. Building on the experimental work in [4], this paper proposes a model to predict the effect of within-die parameter variations on FPGA clock networks. Because of the flexibility required in the clock routing within an FPGA, the structure of the clock network is substantively different from an ASIC clock tree, and is affected differently by variability. The model predicts the variation in the clock skew between any two register locations. An accurate model of the clock skew variation is beneficial, as it allows timing tools to reduce the required guard-band for the skew.

Furthermore, we propose post-configuration compensation techniques to reduce the impact of clock skew variability, enabling more aggressive timing to be achieved. These are analysed using the clock skew variation model. The feasibility of the techniques is demonstrated by experimental measurements from a Xilinx Virtex-5 FPGA.

2. Background

2.1 Related work
The effect of process variability on clock trees has previously been examined for ASIC devices. This includes work employing Monte Carlo simulations [5], [6] as well as approaches based on canonical or numerical analysis of the classical H-tree clock structure [7], [8]. Unlike an FPGA clock network, which is fixed (although programmable), an ASIC clock tree can be designed and routed for the application before fabrication. By including awareness of variability in the optimisation process, the impact of variation can be reduced. For example, Venkataraman, Sze and Hu have investigated skew scheduling and clock routing incorporating variability awareness [9].
Rajaram and Pan describe a technique for reducing skew variation by inserting cross-links in the clock tree [10]. Skew variation may be corrected post-fabrication by active de-skewing techniques, commonly employing elements in the clock tree with adjustable delays [11], [12], [13]. This technique has recently been investigated for FPGAs [14], [15]. The only published work to date on FPGA clock variability is our previous report on the measurement of skew variability [4]; an in-depth analysis of the impact of variability on FPGA clock trees is so far lacking in the literature.

2.2 FPGA clock trees
The clock network in an integrated circuit is generally designed to manage the skew between any two points in the device. A design with zero nominal skew can be achieved by employing the well-known H-tree structure. An FPGA clock network must balance the minimal-skew requirement against sufficient flexibility to implement the clocking requirements of many different circuits. Inevitably, providing this flexibility
reduces the symmetry in the clock distribution, which has implications for the sensitivity of the clock to variations.

Clock networks in FPGAs generally come in two flavours. A spine-and-branch approach is typified by the Xilinx Virtex-4 [16] and Virtex-5 devices [17], and is represented by the diagram in Fig. 1. The clock is distributed on a hierarchical network of linear spines, where each spine taps directly off the higher-level spine. In the Virtex-4 and -5 architectures, all clock regions are of equal size: larger devices have a higher number of separate clock regions.

The Stratix-II [18] and -III [19] devices from Altera favour a structure that resembles the traditional H-tree design, as shown in Fig. 2. Again, the structure is hierarchical: the higher levels of the hierarchy use an H-tree network, which minimises delay differences. At the lower levels, the clock is distributed to rows of logic blocks along linear branches. With this structure the device is divided into clock octants (or sixteen parts for the Stratix-III) regardless of the size of the device.

Fig. 1. The clock tree structure in a Virtex-4/5 type of device. The device is divided into a number of fixed-sized clock regions.
Fig. 2. The clock tree structure in a Stratix-II type of device (central buffer, programmable quadrant buffers, programmable branch buffers, switch blocks, clock octants). The structure is based on an H-tree, resulting in clock octants regardless of the size of the device.

Although the clock networks in Altera devices are more balanced than those in Xilinx devices, FPGAs from both vendors exhibit definite differences in clock routing delay across the chip. Point-to-point clock skew (as reported by vendor timing tools) is typically of the order of hundreds of picoseconds in mid-range devices. In all cases, the clock network comprises duplicate resources to enable multiple clocks to be distributed throughout the device.
A Virtex-5 XC5VLX50 device, for example, has 32 central buffers, each of which drives a separate vertical spine, and each region has 10 horizontal spine and branch lines [17]. Hierarchical levels are connected by some form of crossbar switch, and buffers at any level can generally be disabled to reduce dynamic power dissipation. In addition to the global clocking network, FPGAs also have regional clock buffers and distribution networks. These are not considered in this paper, and will be the subject of future work.

Fig. 3. Example of clock routing to two spatially separated registers at locations u and v. The route consists of buffers and unit-length wire segments; the first seven labelled resources are shared in this example ($U_i = V_i$ for $i = 1..7$).

3. Clock Network Variation

3.1 Model
To better understand the effects of variability on the clock skew, an analysis of delay variations in the clock network is presented in this section. The outcome of the analysis is a model, which is then used in Section 4 to study strategies that compensate for clock skew variability.

Consider two register locations in an FPGA, placed at positions u and v. As shown in Fig. 3, the clock is routed to each location along the dedicated clock resources, and the
resources may be shared for some part of the routing. To model the spatial correlation in delay variation, each wire segment is divided into unit lengths. The unit length is arbitrary, although the accuracy of the model improves with a smaller unit length. The deviation from nominal delay along each wire unit, and through each buffer, from the source to location $u$ is described by a variable $U_i$. Similarly, the deviations in delay from the source to $v$ along the clock tree are modelled by variables $V_i$. All variables $U_i$, $V_i$ have zero mean. The actual clock skew between locations $u$ and $v$ is

$$s(u,v) = s_0(u,v) + \sum_i w_i U_i - \sum_i w_i V_i \qquad (1)$$

where $s_0(u,v)$ is the nominal clock skew. The summations exclude the variables corresponding to shared clock resources (such as $U_1$ to $U_7$ and $V_1$ to $V_7$ in the example of Fig. 3), as these do not contribute to the skew between $u$ and $v$. The variable $w_i$ is a weighting, equal to 1 for buffers and proportional to the wire segment length for wire units. The values of the weights are determined in the next section. The variance in skew is:

$$\mathrm{Var}[s(u,v)] = \mathrm{Var}\Big[\sum_i w_i U_i - \sum_i w_i V_i\Big] \qquad (2)$$

This can be expanded:

$$\begin{aligned}
\mathrm{Var}[s(u,v)] ={}& \sum_i w_i^2\,\mathrm{Var}[U_i] + \sum_i \sum_{j \neq i} w_i w_j\,\mathrm{cov}[U_i,U_j] \\
&+ \sum_i w_i^2\,\mathrm{Var}[V_i] + \sum_i \sum_{j \neq i} w_i w_j\,\mathrm{cov}[V_i,V_j] \\
&- 2\sum_{i,j} w_i w_j\,\mathrm{cov}[U_i,V_j] \qquad (3)
\end{aligned}$$

The variance of the clock skew between two locations in the FPGA can therefore be calculated from the covariance matrix of the buffer and wire unit delays of the clock tree routing. This requires the covariance between each pair of buffer and wire delays. There are three cases to consider: the covariance between two buffers, the covariance between two wire units, and the covariance between a buffer and a wire unit.

Buffer-buffer: Where $U_i$ and $U_j$ correspond to buffer delay variation, we assume a homogeneous and isotropic spatial correlation function, $\rho_b(d)$, which depends only on the distance $d$ between the two buffers.
This assumption is common in the literature (e.g., [20]). The covariance is then simply:

$$\mathrm{cov}[U_i,U_j] = \sigma_{U_i}\sigma_{U_j}\,\rho_b(d) \qquad (4)$$

Wire-wire: Where $U_i$ and $U_j$ both correspond to wire delays, a similar assumption is made for the correlation in delay variation. In this case the spatial correlation function is $\rho_w(d)$, where $d$ is the distance between the mid-points of the two wire units. Thus

$$\mathrm{cov}[U_i,U_j] = \sigma_{U_i}\sigma_{U_j}\,\rho_w(d) \qquad (5)$$

Buffer-wire: The model assumes that there is no spatial correlation between buffer delay variation and wire delay variation. This is reasonable, since variation in buffer delay is the result of FEOL¹ processes, whereas wire delay variation is a consequence of BEOL² process variation. Therefore, where $U_i$ and $U_j$ correspond to a buffer delay and a wire delay respectively:

$$\mathrm{cov}[U_i,U_j] = 0 \qquad (6)$$

Similar equations express the covariance within the clock routing to $v$ and the covariance between the two routes ($\mathrm{cov}[V_i,V_j]$ and $\mathrm{cov}[U_i,V_j]$ respectively).

3.2 Weights
We now determine the values to assign to the weights $w_i$, based on the Elmore delay of a wire [21]. Recall that in the Elmore delay model, the total resistance and capacitance of a wire are divided into a finite number $N$ of distributed resistances and capacitances $R_i$ and $C_i$, $i = 1,\dots,N$. In the case of interest, each $R_i$ and $C_i$ is a random variable and corresponds to one of the unit lengths described earlier. We define a time constant $X_i = R_i C_i$ for each unit length of wire, such that $U_i = X_i - E[X_i]$. Note that $dX_i/dU_i = 1$. The propagation delay of the wire is given by [21]:

$$t_w = \sum_{i=1}^{N} R_i \sum_{j=i}^{N} C_j \qquad (7)$$

We are interested in the sensitivity of the overall propagation delay of the wire to a change in a variable $U_k$. We therefore calculate the partial derivative:

$$\frac{\partial t_w}{\partial U_k} = \frac{\partial t_w}{\partial X_k}\,\frac{dX_k}{dU_k} \qquad (8)$$

$$= \frac{1}{X_k}\Big[C_k \sum_{i=1}^{k-1} R_i + R_k C_k + R_k \sum_{j=k+1}^{N} C_j\Big] \qquad (9)$$

$$= \frac{1}{R_k}\sum_{i=1}^{k-1} R_i + 1 + \frac{1}{C_k}\sum_{j=k+1}^{N} C_j \qquad (10)$$

This value is also a random variable.
We can calculate its mean by taking the expected value, noting that $E[R_i] = R$ and $E[C_i] = C$:

$$E\Big[\frac{\partial t_w}{\partial U_k}\Big] = E\Big[\frac{1}{R_k}\Big]\sum_{i=1}^{k-1} R + 1 + E\Big[\frac{1}{C_k}\Big]\sum_{j=k+1}^{N} C \approx (k-1) + 1 + (N-k) = N \qquad (11)$$

where the approximation holds to first order ($E[1/R_k] \approx 1/R$, $E[1/C_k] \approx 1/C$). We see that variation in a wire unit causes variation in total delay in proportion to the number of wire units in that segment. In other words, the variation in delay of the wire increases superlinearly with length. This makes intuitive sense, since the wire delay also increases superlinearly (quadratically) with length.

¹ Front-End-Of-the-Line (FEOL): the fabrication steps involving the patterning of silicon.
² Back-End-Of-the-Line (BEOL): the fabrication steps for depositing metal layers.
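The sensitivity result can be checked numerically. The Python below is a minimal illustration, not the authors' code: it evaluates the Elmore delay of eq. (7) and the per-unit sensitivity of eq. (10) for an N-unit wire. With uniform unit values every unit has a sensitivity of exactly N. The Monte Carlo helper additionally assumes, purely for illustration, that every unit resistance and capacitance is an independent normal variable with a 5% spread, in which case the mean sensitivity stays close to N.

```python
import random


def elmore_delay(R, C):
    """Eq. (7): t_w = sum_i R_i * sum_{j >= i} C_j for an N-unit RC ladder."""
    return sum(R[i] * sum(C[i:]) for i in range(len(R)))


def unit_sensitivity(R, C, k):
    """Eq. (10) for wire unit k (0-based here, 1-based in the text):
    sum_{i<k} R_i / R_k + 1 + sum_{j>k} C_j / C_k."""
    return sum(R[:k]) / R[k] + 1.0 + sum(C[k + 1:]) / C[k]


def mean_sensitivity(n, k, rel_sigma=0.05, trials=2000, seed=1):
    """Monte Carlo estimate of E[eq. (10)], assuming (for illustration only) that
    every R_i and C_i is an independent normal with mean 1 and spread rel_sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        R = [rng.gauss(1.0, rel_sigma) for _ in range(n)]
        C = [rng.gauss(1.0, rel_sigma) for _ in range(n)]
        total += unit_sensitivity(R, C, k)
    return total / trials


if __name__ == "__main__":
    n = 8
    unit = [1.0] * n
    print(elmore_delay(unit, unit))                              # 36.0 = N(N+1)/2
    print([unit_sensitivity(unit, unit, k) for k in range(n)])   # every entry 8.0 = N
    print([round(mean_sensitivity(n, k), 2) for k in range(n)])  # each close to N
```

Because each unit's deviation enters the total delay with a factor of roughly N, the weight of a wire unit grows with the number of units in its segment.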
The weight $w_i$ for a wire unit is therefore set to the total number of units in the segment. This weighting applies only to wire units; for buffers, $w_i = 1$.

3.3 Case study
The model derived above can be used to calculate the expected variance of the clock skew between any two locations on the FPGA. Conventionally, variation in delay or skew is accounted for by guard-bands: margins added to the nominal delay or skew to allow for the worst-case variation. Typically a margin of three times the standard deviation is used for the guard-band. Thus, if the clock skew has a standard deviation σ of 100ps, the guard-band will be ±300ps.

Fig. 4 shows 3σ guard-bands calculated using the proposed model for two different device types, corresponding to the two FPGA clock-tree styles discussed in Section 2.2. For each of the two device types, two different levels of spatially correlated variation are modelled. For the high correlation model, the spatial correlation functions $\rho_b(d)$ and $\rho_w(d)$ fall as $d^{-0.3}$ and asymptote to 0.2. For the low correlation model, $\rho_b(d)$ and $\rho_w(d)$ fall as $d^{-2}$ and asymptote to 0.1. The total level of variability is set to $\sigma_U$ = 10% of delay for buffers and $\sigma_U$ = 5% for wire units (Table I).

Fig. 4. Clock skew variation modelling results: (a) Virtex-5 style device, high correlation in spatial variation; (b) Virtex-5 style device, low correlation in spatial variation; (c) Stratix-III style device, high correlation in spatial variation; (d) Stratix-III style device, low correlation in spatial variation. Two types of device are modelled, one with a spine-and-branch clock network (Virtex-5 style) and one with an H-tree clock network (Stratix-III style). The variance in clock skew relative to a fixed location (25, 5) is computed for both high and low spatial correlation, and the resulting 3σ guard-band values are plotted as a function of location. The z-scale of the plots is in units of the standard deviation in delay of one clock buffer.
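Equations (3) to (6) and the 3σ rule combine into a few lines of code. The sketch below is a minimal Python illustration, not the authors' tool: it writes eq. (3) as a quadratic form over signed weights, uses the buffer and wire-unit spreads of Table I (0.10 and 0.005 in units of nominal buffer delay), and substitutes a hypothetical correlation function, 0.2 + 0.8·exp(-d/5), for the paper's $\rho_b$ and $\rho_w$. The `Resource` record and the straight-line toy routes are likewise assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    """An unshared clock resource on the route to u or to v (illustrative)."""
    branch: str    # "u" or "v"
    kind: str      # "buffer" or "wire"
    x: float       # buffer position, or wire-unit midpoint (logic-block units)
    y: float
    sigma: float   # standard deviation of this resource's delay deviation
    weight: float  # w_i: 1 for a buffer, units-per-segment for a wire unit


def covariance(a, b, rho_b, rho_w):
    """Covariance of two delay deviations, following eqs. (4)-(6)."""
    if a.kind != b.kind:
        return 0.0  # eq. (6): buffer (FEOL) and wire (BEOL) variation uncorrelated
    d = math.hypot(a.x - b.x, a.y - b.y)
    rho = rho_b(d) if a.kind == "buffer" else rho_w(d)
    return a.sigma * b.sigma * rho  # eqs. (4) and (5); requires rho(0) == 1


def skew_variance(resources, rho_b, rho_w):
    """Var[s(u, v)] of eq. (3) as a quadratic form: signed weights are +w_i on
    the route to u and -w_i on the route to v, which yields the -2 cov[U_i, V_j]
    cross terms. Shared resources must already be excluded."""
    signed = [r.weight if r.branch == "u" else -r.weight for r in resources]
    return sum(signed[i] * signed[j] * covariance(ri, rj, rho_b, rho_w)
               for i, ri in enumerate(resources)
               for j, rj in enumerate(resources))


def guard_band(resources, rho_b, rho_w):
    """3-sigma skew guard-band, in the same delay units as sigma."""
    return 3.0 * math.sqrt(skew_variance(resources, rho_b, rho_w))


if __name__ == "__main__":
    # Hypothetical correlation: decays from 1 towards an asymptote of 0.2.
    rho = lambda d: 0.2 + 0.8 * math.exp(-d / 5.0)
    # Toy routes: after the shared trunk, each register is reached through
    # n buffers spaced one block apart, to the right (u) and left (v).
    for n in (1, 2, 4, 8):
        route = [Resource("u", "buffer", k, 0.0, 0.10, 1.0) for k in range(1, n + 1)]
        route += [Resource("v", "buffer", -k, 0.0, 0.10, 1.0) for k in range(1, n + 1)]
        print(n, round(guard_band(route, rho, rho), 3))
```

In the toy routes the guard-band grows with the number of unshared buffers n, mirroring the observation that registers in the same clock region need a smaller guard-band. The constant 0.2 floor of the assumed correlation drops out entirely here, because a component common to all resources affects both routes equally when they contain the same number of identical resources.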
The plots are calculated assuming the register at one end of a signal path has been placed at location (25, 5) in the FPGA. The guard-band that must be added to the clock skew for the second register is location-dependent. As expected, if the two end-point registers are placed within the same clock region, the required guard-band is lower than if they are placed further apart. Although there are differences between the spine-and-branch and the H-tree clock distribution schemes, the total level of variability in the clock skew remains broadly similar for devices of the same size. Note also that where the variation has low spatial correlation, the necessary guard-band is less position-dependent (the plots are flatter), as would be expected, although it is still advantageous to place both registers within the same clock region.

The model can be used during place-and-route to achieve more aggressive timing than would be possible using a single global guard-band value for skew. The model calculations are computationally inexpensive and could be performed as needed during place-and-route. Alternatively, to avoid extra time overhead during place-and-route, the guard-band values could be pre-computed for various register locations and approximations used instead.

4. Variation Compensation
In this section we propose methods to mitigate variability in clock skew. The effectiveness of the methods is studied
by modifying the model of Section 3.1, and by experiments on a Xilinx Virtex-5 XC5VLX50 FPGA.

TABLE I
MODEL PARAMETERS
Logic block rows: 80
Logic block columns: 40
Buffer delay: µ = 1.0, σ = 10%
Wire unit delay: µ = 0.1, σ = 5%
High spatial corr. function: ρ(d) = 0.3d
Low spatial corr. function: ρ(d) = 0.1d

4.1 Clock phase adjustment
Modern high-end FPGAs include several very flexible clock generating resources, such as PLLs and Digital Clock Managers (DCMs) [17], [19]. In both Stratix and Virtex devices, in addition to being able to synthesize many clock frequencies, these clock generators can produce phase-shifted clocks, where the amount of phase shift can be changed at runtime. Using this capability, it is possible to generate an additional clock of the same frequency as the main clock but phase-adjusted to compensate for skew variations. The amount of phase adjustment can be tuned for each FPGA. Since this requires an additional DCM/PLL to generate the second clock, it is only possible if there are unused DCMs/PLLs in the FPGA.

Although this technique can compensate exactly for the skew variation between any two particular register locations, it clearly cannot do so for all paths, as this would require a DCM/PLL for every register on the FPGA. A practical approach is to compensate for the random skew variation between two clock regions, by supplying one of the regions with a phase-adjusted clock tuned to compensate for the average offset in skew between the two regions. We call this technique regional phase compensation.

A further improvement may be possible by constraining the placement of registers within each region. If registers are placed close together, they are more likely to experience the same variation in clock skew. Therefore, by placing all source and sink registers of critical paths between the two regions close together, the phase adjustment can be more finely tuned to the local variation.
We term this local phase compensation.

The model of Section 3.1 must be modified to include these adaptations, which is straightforward. Examining Fig. 3, it can be seen that phase compensation causes the variation in skew between the two divergent branches of the clock tree to be exactly cancelled up to some fixed point along each branch (for example, up to just after $U_9$ and $V_9$). When calculating the variance for the phase-compensated schemes, it is sufficient to disregard the terms corresponding to the clock tree before the compensation points.

Note that using spare DCM/PLL resources increases power consumption. If there are no such spare resources, or the power overhead is unacceptable, gains may still be made by splitting the main clock and routing it through two central clock buffers. Stochastic differences in the buffer delays will produce a phase shift between the two resulting clocks. The phase shift will not be controllable, however, so the effectiveness of this approach is limited.

The results of the modified clock skew model are shown in Fig. 5. Again, one register is fixed at location (25, 5). The guard-band required when the two registers are supplied by phase-adjusted clocks is plotted as a function of the placement location of the second register. Fig. 5(a) shows the case where the clock phases are adjusted to cancel regional variations in skew. Fig. 5(b) is an example of the more aggressive local phase compensation.

Fig. 5. Required skew guard-bands after compensating for skew variation with dual phase-adjusted clocks, based on a model of a Virtex-5 style FPGA with high correlation in spatial variation: (a) after regional skew correction by clock phase adjustment; (b) after local skew correction by clock phase adjustment. A source register is assumed to be placed at location (25, 5). Guard-band values are again plotted relative to the clock buffer standard deviation in delay.
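The effect of regional phase compensation can be illustrated with a small Monte Carlo sketch in Python. It assumes a simplified, hypothetical structure: each region's clock has one trunk deviation (the clock tree before the compensation point) plus an independent leaf deviation per register, 16 paths connect the two regions, and the phase offset is set per device to the mean skew of those paths, as if the phase were continuously adjustable. The spreads `sigma_trunk` and `sigma_leaf` are illustrative values, not measurements.

```python
import random
import statistics


def simulate(n_devices=2000, n_paths=16, sigma_trunk=0.3, sigma_leaf=0.1, seed=7):
    """Return (3-sigma skew spread without compensation, with regional phase
    compensation), pooled over simulated devices. Illustrative model only:
    skew of path k = (trunk_a + leaf_a[k]) - (trunk_b + leaf_b[k])."""
    rng = random.Random(seed)
    raw, compensated = [], []
    for _ in range(n_devices):
        # Deviation of each region's clock tree before the compensation point.
        trunk_a, trunk_b = rng.gauss(0.0, sigma_trunk), rng.gauss(0.0, sigma_trunk)
        # Register-specific deviations after the compensation point.
        leaf_a = [rng.gauss(0.0, sigma_leaf) for _ in range(n_paths)]
        leaf_b = [rng.gauss(0.0, sigma_leaf) for _ in range(n_paths)]
        skews = [(trunk_a + la) - (trunk_b + lb) for la, lb in zip(leaf_a, leaf_b)]
        # Idealised per-device phase offset: the mean skew of the inter-region paths.
        phase = statistics.fmean(skews)
        raw.extend(skews)
        compensated.extend(s - phase for s in skews)
    return 3 * statistics.pstdev(raw), 3 * statistics.pstdev(compensated)


if __name__ == "__main__":
    before, after = simulate()
    print(f"3-sigma guard-band: {before:.3f} uncompensated, {after:.3f} compensated")
```

The trunk terms cancel exactly, in line with disregarding the clock tree before the compensation points; what remains is the leaf spread, slightly reduced by averaging over the 16 paths.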
The local phase compensation result in Fig. 5(b) assumes that all registers for critical paths between regions are placed within 3 × 3 logic blocks in each region. The graphs can be compared to the baseline case in Fig. 4(a). The regional phase compensation scheme reduces the guard-band by up to 42%, and the local phase compensation reduces the guard-band by up to 49%. Both schemes are most effective for registers placed a long way apart.

4.2 Clock resource re-routing
As mentioned in Section 2.2, the buffers and wires that are used for clock signal routing in FPGAs are duplicated at each level, to provide flexibility and to allow multiple clocks to be distributed. Stochastic variations in the buffers, wires and switches will cause each duplicate resource to exhibit
different delays. Given a particular FPGA and one clock net of interest, these differences can be exploited by selecting the clock routing that gives the best clock skew.

As an example, consider the Virtex-5 FPGA from Xilinx. In this device there are 10 nominally identical horizontal clock spines per region. At each register there is a multiplexer which determines which clock spine is connected to the clock input of the register. The clock signal can be routed on all 10 clock lines simultaneously, and the best signal selected at each register by reconfiguring the multiplexer. The best signal may be the one with the closest-to-nominal skew. Alternatively, a clock signal with a deviation in skew could be selected to compensate for reduced slack caused by path delay variations.

Nominal skew objective: By choosing the signal with the closest-to-nominal skew, the skew variance is reduced and therefore the required guard-band is also smaller. To include this in the model, we need to quantify the effect of selection on the skew variance. Firstly, note that the duplicated clock resources are physically close together, so they will exhibit the same correlated delay variation. The difference in skew between the $N$ duplicated resources is therefore a stochastic quantity of zero mean, which we denote by the random variables $X_i$, $i = 1,\dots,N$. Assuming that $X_i$ is approximately normally distributed with variance $\sigma^2$, its distribution can be described by

$$P(-x < X_i < x) = \mathrm{erf}\Big(\frac{x}{\sqrt{2}\,\sigma}\Big) \qquad (12)$$

where erf(x) is the error function. Let us label the value of $X_i$ which is closest to zero by $Y$. It is straightforward to show that, for independent $X_i$, the probability density of $Y$ is:

$$f_Y(x) = \frac{N}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{x^2}{2\sigma^2}\Big)\Big[1 - \mathrm{erf}\Big(\frac{|x|}{\sqrt{2}\,\sigma}\Big)\Big]^{N-1} \qquad (13)$$

The variance of $Y$ is defined as $\mathrm{Var}[Y] = \int_{-\infty}^{\infty} x^2 f_Y(x)\,dx$, which, while not possible to evaluate analytically, can be computed numerically. For $N = 10$, the variance is $\mathrm{Var}[Y] = \ldots\,\sigma^2$.
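Equation (13) and the variance integral can be evaluated numerically as sketched below in Python. The sketch assumes, as the derivation does, that the N duplicate-resource offsets are independent N(0, σ²) variables. The trapezoidal rule on ±6σ and the Monte Carlo cross-check, which simply picks the draw nearest zero out of N, are choices made for illustration rather than the paper's procedure.

```python
import math
import random


def nearest_density(x, n, sigma=1.0):
    """Eq. (13): density of Y, the one of n independent N(0, sigma^2) values closest to zero."""
    phi = math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
    tail = 1.0 - math.erf(abs(x) / (math.sqrt(2) * sigma))  # P(|X_j| > |x|), from eq. (12)
    return n * phi * tail ** (n - 1)


def var_nearest(n, sigma=1.0, steps=20000):
    """Var[Y] = integral of x^2 f_Y(x) dx (E[Y] = 0 by symmetry),
    by the trapezoidal rule on [-6 sigma, 6 sigma]."""
    a, b = -6.0 * sigma, 6.0 * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * x * nearest_density(x, n, sigma)
    return total * h


def var_nearest_mc(n, sigma=1.0, trials=50_000, seed=3):
    """Monte Carlo cross-check: take the draw nearest zero out of n, average its square."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        y = min((rng.gauss(0.0, sigma) for _ in range(n)), key=abs)
        acc += y * y
    return acc / trials


if __name__ == "__main__":
    print("numerical:  ", var_nearest(10))
    print("Monte Carlo:", var_nearest_mc(10))
```

With `sigma=1`, `var_nearest(10)` gives the coefficient of σ² directly, and the Monte Carlo estimate should agree with it to within sampling noise.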
The reduced variance $\mathrm{Var}[Y]$ is applied to the model of (3) by scaling the variance terms corresponding to the duplicated resources; the covariance terms remain the same.

Positive skew objective: For a given register, instead of selecting the clock routing that gives the most nominal skew, one may choose the routing that gives the most positive skew. This yields the most slack for paths that end at that register, although at the expense of slack for paths originating at the register. In this case, we select the maximum value of $X_i$, which is the order statistic $X_{(N)}$. The variance values in the model of (3) are replaced by $\mathrm{Var}[X_{(N)}]$, and the guard-band is reduced by $E[X_{(N)}]$. Order statistics have been extensively studied; tables of their means and variances are readily available, for example in [22].

Fig. 6. Guard-bands after compensating for skew by clock phase adjustments and clock re-routing, for a high amount of spatially correlated delay: (a) nominal skew objective; (b) most positive skew objective.

Results from the modified models for the clock resource re-routing strategies are plotted in Fig. 6. Both models assume that regional differences in phase are compensated for by the clock phase adjustment described above, and that the best of 10 available regional clock trees is then used to route the clock signal. The graphs should therefore be compared to Fig. 5(a). By choosing the resources that give the nearest-to-nominal skew, the guard-band can be reduced by an additional 10% to 40% over regional phase compensation alone. The benefit is greatest when the two registers are placed close together. The most positive skew objective yields a further 30% to 90% reduction in guard-band compared with regional phase compensation.

The results in Fig. 5(a) and Fig. 5(b) are based on a high level of spatial correlation. The model has also been used to investigate the situation where the delay variation is less spatially correlated. The results are broadly similar.
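The mean and variance of the maximum order statistic can also be computed rather than looked up. The Python sketch below integrates the standard density of the largest of N independent N(0, σ²) variables, N·φ(x)·Φ(x)^(N-1); independence and normality are the same assumptions as in the nominal-skew case, and the trapezoidal rule on ±8σ is an illustrative choice.

```python
import math


def max_moments(n, sigma=1.0, steps=20000):
    """Mean and variance of X_(N), the largest of n independent N(0, sigma^2) values,
    from the density n * phi(x) * Phi(x)**(n - 1), by the trapezoidal rule on [-8 sigma, 8 sigma]."""
    a, b = -8.0 * sigma, 8.0 * sigma
    h = (b - a) / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        phi = math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
        cdf = 0.5 * (1.0 + math.erf(x / (math.sqrt(2) * sigma)))
        f = n * phi * cdf ** (n - 1)
        m1 += weight * x * f
        m2 += weight * x * x * f
    m1 *= h
    m2 *= h
    return m1, m2 - m1 * m1  # (E[X_(N)], Var[X_(N)])


if __name__ == "__main__":
    mean, var = max_moments(10)
    print(f"N = 10: E = {mean:.4f} sigma, Var = {var:.4f} sigma^2")
```

For N = 2 the results reduce to the closed forms E = σ/√π and Var = (1 − 1/π)σ², a convenient check; for N = 10 the mean is about 1.54σ, in line with standard tables of normal order statistics.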
The guard-band result for clock resource re-routing with the nominal skew objective is shown in Fig. 7 as an example. Compared to the highly correlated variation case of Fig. 6(a), the method offers less improvement for closely spaced registers, and overall the guard-band has less locational dependence, as would be expected.

4.3 Experimental results
To validate the feasibility of the proposed skew variability compensation techniques, experiments have been
performed on a Xilinx Virtex-5 XC5VLX50-1 FPGA. These were designed to determine whether it is possible to change the clock phase for a region to compensate for skew variation, and whether different parallel clock resources actually exhibit different delays.

Fig. 7. Guard-band after compensating for skew by clock phase adjustments and clock re-routing. The model assumes low spatial correlation in delay variation, and the clock re-routing targets nominal skew.

A simplified diagram of the test circuitry used is shown in Fig. 8. Two clock regions in the FPGA were supplied with separate clocks of the same frequency, where the phase offset between the two clocks could be adjusted dynamically. The phase adjustment was achieved using the Virtex-5 embedded Digital Clock Managers [17]. Thirty-two paths were placed and routed in the FPGA between the two clock regions, 16 in each direction. Paths originating in the lower of the two regions are termed up paths, the others down paths. The observable delay of each path could be measured accurately using the method reported in [3]. The observed delay of a path is in reality the sum of the path propagation delay and the clock skew between the start and end registers of the path. An additional 192 paths were placed and routed in other regions of the FPGA, and were used for calibrating the measurements for environmental changes.

Fig. 8. A simplified diagram of the test circuitry used in the Virtex-5 experiment (clock generation with phase adjustment, 4 possible central buffer locations, 9 possible regional buffers, 16 up and 16 down test paths). Two clock regions (top and bottom) are supplied with separate clocks. The phase offset between the clocks can be adjusted dynamically. A total of 32 paths connect the two regions, 16 in either direction.
The experiment involved measuring the observable delay of the 32 test paths for different clock phase offsets, and when different clock resources were used to route the clock of the top-most region. Since the paths under test are unchanged, any change in observed delay must be caused by changes in clock skew.

The raw measured path delays for all 36 combinations of clock routing are plotted in the left half of Fig. 9. It can be seen that changing the resources on which the clock is routed causes a change in measured delay of up to ±50ps. This is significant when compared to the variation in LUT delay, which has a standard deviation of approximately 11ps in this device [4].

The mean measured path delay is 3705ps. Note that there is a difference in the ensemble measurements of the up paths (3807ps) compared to the down paths (3603ps). There are also differences in delay between paths within the up group and within the down group. These differences are partially due to process variability and partially due to differences in the placement and routing of each path. Since we are interested in compensating for clock skew variation, it is necessary to calibrate the initial data set to produce a set of values from which the effect of other sources of delay variation has been removed. The measurements were first calibrated to remove expected differences in delay using the path and skew timing reported by the vendor timing tools. The resulting values for the delay of each path were then shifted towards the mean to counteract the variance introduced by the LUT in each path. The resulting post-calibration values, plotted in the right half of Fig. 9, are somewhat artificial but realistic.

Fig. 9. Empirical measurements and post-calibration values of observed path delay for all 32 paths (16 up and 16 down) under test. Each path is measured 36 times, each time with a different combination of central buffer location and regional clock routing.
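The up/down imbalance above suggests how a phase offset between the two regions can be chosen. The following Python sketch is hypothetical: it assumes that delaying the top region's clock by φ shortens every up path's observed delay by φ and lengthens every down path's by φ, chooses φ to balance the slowest up and down paths, and optionally rounds φ to a phase-step granularity (`step`, an illustrative value rather than a Virtex-5 figure). The reported ensemble means are used as stand-ins for single paths.

```python
def best_phase(up, down, step=None):
    """Phase offset phi (ps) for the top region's clock and the resulting slowest
    observed path delay. Assumed model (hypothetical): delaying the top clock by
    phi shortens every up path (captured in the top region) by phi and lengthens
    every down path (launched in the top region) by phi."""
    worst_up, worst_down = max(up), max(down)
    phi = (worst_up - worst_down) / 2.0    # balances the two slowest paths
    if step is not None:
        phi = round(phi / step) * step     # nearest achievable phase setting
    slowest = max(worst_up - phi, worst_down + phi)
    return phi, slowest


if __name__ == "__main__":
    up_ps, down_ps = [3807.0], [3603.0]            # reported ensemble means as stand-ins
    print(best_phase(up_ps, down_ps))              # continuous phase adjustment
    print(best_phase(up_ps, down_ps, step=25.0))   # hypothetical 25ps phase step
```

With the ensemble means the continuous optimum is φ = 102ps, which brings the slowest path to 3705ps, the overall mean; rounding to the coarser hypothetical step leaves a small residual degradation, which illustrates why limited phase granularity constrains this kind of compensation.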
The effects of the different experiments are summarised in Fig. 10. The graph shows the timing offset (degradation) of the slowest path for a given test, relative to the case of nil variation. Nil variation is estimated as the mean of all calibrated delays. To gain insight into how the degree of connectivity between regions affects the results, three bars are shown for each experiment: the case where the regions are connected by just one path in each direction, and the cases of four and sixteen paths. To give a meaningful sense of scale to the results, standard deviations of LUT delay, $\sigma_L$, are also plotted on the graph.

Using the initial assignment of clock resources and no phase
correction, the slowest path delay is degraded by over $10\sigma_L$ compared to the case of zero skew variation. This is mainly due to the difference between the up and down path delays. By trying four different locations of the main clock buffer, but changing nothing else, this degradation can be reduced by approximately half in this particular instance. A much greater improvement is possible by actively adjusting the clock phase between the two clock regions to cancel the difference between the up and down delays. Using this technique, the timing degradation is reduced to about 1 to $4\sigma_L$. The effectiveness of this technique is to some extent limited by the granularity of the phase adjustment possible on the Virtex-5; with infinitely adjustable phase, the improvement would be slightly better, as indicated by the "Phase (ideal)" results.

The best result from this series of experiments came from a combination of phase adjustment and clock re-routing. By judicious selection of the resources on which to route the clock to the top region, the effect of skew variation could be completely cancelled for the cases of one or four paths. The experimental setup does not account for the negative impact of this approach on the slack of other circuit paths; nevertheless, it demonstrates how effective the technique can be.

Fig. 10. The observed delay of the slowest path using different compensation techniques. The delay is plotted relative to the nil-variation baseline. Different numbers of paths are considered: 1, 4 and 16 paths in either direction. A scale of LUT delay standard deviations is also plotted for reference.

5. Conclusions
Clock distribution networks in FPGAs are substantially different from those in ASICs. The effect of process variability on clock skew, and the approaches to mitigate such effects, must therefore also be different. This paper described a clock skew variability model for FPGAs.
The model can be used to predict guard-band requirements for clock skew. In addition, two techniques for compensating for skew variability were presented: adjusting the phase of the clock between regions, and exploiting the stochastic differences between duplicated clock resources to achieve better skew timing. Results predicted by the model show that these techniques can significantly reduce the skew guard-band. Phase adjustment alone reduced the guard-band by almost 50%; by additionally routing the clock through the optimal resources, the guard-band could be reduced by 70% or more. A smaller skew guard-band ultimately yields better timing. The feasibility of the techniques was also verified experimentally on a Virtex-5 FPGA.

Acknowledgements

The authors wish to acknowledge the financial support of the EPSRC under Platform Grant EP/C549481/1.

References

[1] S. R. Nassif, "Design for variability in DSM technologies," in Proc. IEEE International Symposium on Quality Electronic Design.
[2] P. Sedcole and P. Y. K. Cheung, "Within-die delay variability in 90nm FPGAs and beyond," in Proc. IEEE International Conference on Field Programmable Technology.
[3] J. S. Wong, P. Sedcole, and P. Y. K. Cheung, "Self-characterization of combinatorial circuit delays in FPGAs," in Proc. IEEE International Conference on Field Programmable Technology.
[4] P. Sedcole, J. S. Wong, and P. Y. K. Cheung, "Characterisation of FPGA clock variability," in Proc. International Symposium on Very Large Scale Integration.
[5] V. Mehrotra and D. Boning, "Technology scaling impact of variation on clock skew and interconnect delay," in International Interconnect Technology Conference.
[6] S. Zanella, A. Nardi, A. Neviani, M. Quarantelli, S. Saxena, and C. Guardiani, "Analysis of the impact of process variations on clock skew," IEEE Transactions on Semiconductor Manufacturing, vol. 13, no. 4, Nov.
[7] A. Agarwal, V. Zolotov, and D. T. Blaauw, "Statistical clock skew analysis considering intradie-process variations," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 8, Aug.
[8] M. Hashimoto, T. Yamamoto, and H. Onodera, "Analysis of clock skew variation in H-tree structure," in Proc. IEEE International Symposium on Quality Electronic Design.
[9] G. Venkataraman, C. N. Sze, and J. Hu, "Skew scheduling and clock routing for improved tolerance to process variations," in Proc. Asia and South Pacific Design Automation Conference.
[10] A. Rajaram and D. Z. Pan, "Fast incremental link insertion in clock networks for skew variability reduction," in Proc. IEEE International Symposium on Quality Electronic Design.
[11] A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, A. Macii, E. Macii, M. Poncino, and L. Benini, "Dynamic thermal clock skew compensation using tunable delay buffers," in Proc. International Symposium on Low Power Electronics and Design.
[12] A. Kapoor, N. Jayakumar, and S. P. Khatri, "A novel clock distribution and dynamic de-skewing methodology," in Proc. International Conference on Computer Aided Design.
[13] J.-L. Tsai, L. Zhang, and C. C.-P. Chen, "Statistical timing analysis driven post-silicon-tunable clock-tree synthesis," in Proc. International Conference on Computer Aided Design.
[14] S. Sivaswamy and K. Bazargan, "Statistical generic and chip-specific skew assignment for improving timing yield of FPGAs," in Proc. Field-Programmable Logic and Applications.
[15] S. Sivaswamy and K. Bazargan, "Statistical analysis and process variation-aware routing and skew assignment for FPGAs," ACM Transactions on Reconfigurable Technology and Systems, vol. 1, no. 1, Mar.
[16] Virtex-4 User Guide, Xilinx Inc., February.
[17] Virtex-5 User Guide v3.0, Xilinx Inc., February.
[18] Stratix II Device Handbook, Altera Corp., May.
[19] Stratix III Device Handbook, Altera Corp., May.
[20] J. Xiong, V. Zolotov, and L. He, "Robust extraction of spatial correlation," in Proc. International Symposium on Physical Design.
[21] W. C. Elmore, "The transient response of damped linear networks with particular regard to wideband amplifiers," Journal of Applied Physics, vol. 19, no. 1, Jan.
[22] H. J. Godwin, "Some low moments of order statistics," The Annals of Mathematical Statistics, vol. 20, no. 2, Jun. 1949.
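The compensation strategies discussed in the results and conclusions above (uncompensated skew, quantised phase adjustment, ideal phase adjustment, and phase adjustment combined with clock re-routing) can be illustrated with a toy Monte Carlo model. This is a sketch under stated assumptions, not the authors' skew variability model. It assumes Gaussian path delays and skews, with hypothetical sigma values expressed in units of σ_L. The slowest up/down path pair is modelled as max(d_up - skew, d_down + skew), and re-routing is treated as choosing the best of k independent duplicated routes. The function names `worst_path_delay`, `best_offset`, `degradation` and `simulate`, and all parameter values, are hypothetical.

```python
import random
from statistics import mean


def worst_path_delay(d_up, d_down, skew):
    """Slowest of the bottom->top ("up") and top->bottom ("down") paths.

    skew = clock arrival at the top region minus arrival at the bottom region.
    A later capture clock gives an up path extra time (d_up - skew), while the
    down path loses the same amount (d_down + skew). Toy model, assumed.
    """
    return max(d_up - skew, d_down + skew)


def best_offset(d_up, d_down, skew, phase_step=None):
    """Phase offset for the top-region clock that balances the two paths.

    phase_step=None models an infinitely adjustable phase; otherwise the
    ideal offset is rounded to the nearest available (hypothetical) phase tap.
    """
    ideal = (d_up - d_down) / 2.0 - skew
    if phase_step is None:
        return ideal
    return phase_step * round(ideal / phase_step)


def degradation(d_up, d_down, skew, offset=0.0):
    """Slowest-path delay relative to the same paths with zero skew."""
    return worst_path_delay(d_up, d_down, skew + offset) - max(d_up, d_down)


def simulate(trials=2000, n_paths=4, k_routes=4, sigma_path=1.0,
             sigma_skew=3.0, phase_step=2.0, seed=1):
    """Mean degradation per strategy; all sigmas are hypothetical (units of sigma_L)."""
    rng = random.Random(seed)
    results = {"none": [], "phase": [], "phase_ideal": [], "phase+reroute": []}
    for _ in range(trials):
        # Slowest of n_paths random path delays in each direction.
        d_up = max(rng.gauss(0.0, sigma_path) for _ in range(n_paths))
        d_down = max(rng.gauss(0.0, sigma_path) for _ in range(n_paths))
        # Skew seen on each of k_routes duplicated clock routes; route 0 is
        # the default route used by the strategies that do not re-route.
        skews = [rng.gauss(0.0, sigma_skew) for _ in range(k_routes)]
        s = skews[0]
        results["none"].append(degradation(d_up, d_down, s))
        results["phase"].append(
            degradation(d_up, d_down, s, best_offset(d_up, d_down, s, phase_step)))
        results["phase_ideal"].append(
            degradation(d_up, d_down, s, best_offset(d_up, d_down, s)))
        # Re-routing: pick the route whose quantised compensation works best.
        results["phase+reroute"].append(min(
            degradation(d_up, d_down, c, best_offset(d_up, d_down, c, phase_step))
            for c in skews))
    return {name: mean(values) for name, values in results.items()}


if __name__ == "__main__":
    for name, value in simulate(n_paths=16, phase_step=1.0).items():
        print(f"{name:>14}: {value:+.2f} sigma_L")
```

Because the worst-path function is V-shaped in the phase offset, rounding to the nearest tap can never do worse than applying no offset. As `phase_step` shrinks, the quantised result approaches `phase_ideal`, which loosely mirrors the Phase (ideal) comparison in Fig. 10. Adding candidate routes (`k_routes`) can only help in this model, since the default route is always among the candidates.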