Parametric Yield in FPGAs Due to Within-die Delay Variations: A Quantitative Analysis

Pete Sedcole and Peter Y. K. Cheung
Dept. Electrical and Electronic Engineering, Imperial College London, UK
{pete.sedcole,p.cheung}@imperial.ac.uk

ABSTRACT
Variations in the semiconductor fabrication process result in variability in parameters between transistors on the same die, a problem exacerbated by lithographic scaling. The reconfigurability of Field-Programmable Gate Arrays presents the opportunity to compensate for within-die delay variability. This paper presents three reconfiguration-based strategies for compensating within-die stochastic delay variability in FPGAs: reconfiguring the entire FPGA, relocating subcircuits within an FPGA, and reconfiguring signal paths within a design. The yield of each strategy is analysed and compared with worst-case design and statistical static timing analysis (SSTA). It is demonstrated that significant improvements in circuit yield and timing are possible using SSTA alone, and that these improvements can be enhanced by employing reconfiguration-based techniques.

Categories and Subject Descriptors
B.6. [Integrated Circuits]: Design Styles: Logic arrays; B.7. [Integrated Circuits]: Types and Design Styles: Advanced technologies

General Terms
Performance

Keywords
Delay, FPGA, modelling, process variation, reconfiguration, statistical theory, within-die variability, yield.

1. INTRODUCTION
Variations in process parameters during semiconductor fabrication are manifested in the variability of the performance of the resulting integrated circuits. Historically, performance parameters have varied from wafer to wafer or lot to lot.
At-speed testing techniques combined with speed-binning have been employed to partially compensate for variations in propagation delay between dice. In deep-submicron technology nodes, variations in transistor and wire parameters within the same die are expected to become significant [5, 8]. The parametric difference between two nominally identical features on the same die is partly stochastic and partly correlated, with the correlation depending on physical locality. Importantly, several sources of stochastic variation are intrinsic to the materials used in fabrication [3, ]. Stochastic variability therefore cannot be eliminated by improving the fabrication process, and is in fact predicted to increase relative to other sources of variability [5].

Like other high-performance integrated circuits, Field-Programmable Gate Arrays (FPGAs) are affected by parametric variability. However, their reconfigurability gives FPGAs two distinct advantages over ASIC solutions. Firstly, the actual performance of each FPGA can be measured and characterised by configuring the device with Built-In Self-Test (BIST) circuits. Secondly, in theory it is possible to compensate for, or even make use of, the variations in performance by adapting the application circuit based on the measured parameters of the target FPGA (see for example [8]). There are a number of ways in which a circuit could be made adaptive to within-FPGA variations in performance.

FPGA 7, February 8, 7, Monterey, California, USA. Copyright 7 ACM.
The approach taken has a significant impact on the development of parametric test techniques, circuit design methods and tools, so it is crucially important to quantify the performance improvement a given approach is expected to provide. The novel contributions of this paper are: (a) a discussion of generalised reconfiguration-based strategies for variation-adaptive circuits in FPGAs (Section 3); (b) analytical models based on the statistical theory underlying each strategy, as well as statistical static timing analysis and worst-case design (Section 4); and (c) comparisons of the various techniques using the models, verified by Monte Carlo simulations (Section 5). This fundamental theoretical research provides a basis for exploring variability-adaptation techniques.

2. BACKGROUND
The manufacture of high-performance digital integrated circuits requires rigorous control over many process variables, each of which influences propagation delay to a different extent [5]. Deviations from nominal values in process variables can be systematic or stochastic [4, ]. The effect can be localised to a few transistors, a die, a wafer or an entire lot. Systematic variations induce a shift in circuit parameters. Their sources include, for example, mask errors due to inaccuracies in the process model, lithographic off-axis focusing errors and reticle stepper alignment errors. Stochastic variations increase the spread of circuit parameters, and come from sources such as vibrations during lithography, wafer unevenness, and non-uniformity in resist thickness. Importantly, some sources of stochastic variability are not caused by imperfections in the fabrication process but are the result of the discrete or granular nature of materials at nanometre scales. Such sources of variation are termed intrinsic and include line-edge roughness, random discrete dopants, and oxide thickness fluctuations [3, ]. As these sources cannot be corrected by improving the process, they must be compensated for by new devices or by novel circuit- and system-level design techniques.

Within-die variability exacerbates verification complexity. The conventional approach to verifying circuit designs in the presence of variation uses static timing analysis coupled with corner-case or Monte Carlo simulations. This is feasible if parameters are constant over the entire circuit. In the extreme case, where every transistor and wire in the design has parameters which vary independently, the dimensionality of the parameter space becomes too large for simulation-based methods to be practical. Statistical static timing analysis (SSTA) is a promising new approach to timing analysis which incorporates the effects of within-die variations [5, 9]. The analysis can consider complete end-to-end signal paths or propagate statistically described delays block-by-block through the circuit. Lin, Hutton and Le recently applied SSTA to enhance placement and routing techniques in FPGAs [3].

Some research has been reported on reducing variability in FPGA architectures. Nabaa et al. describe a self-characterising and adaptive FPGA which compensates for variability using body-biasing [4]. Wong et al.
determine yields using SPICE and numerical methods, and use the information to investigate the effect of LUT and cluster sizes []. The work presented in this paper uses analytical models. Moreover, we examine the effect of end-user reconfiguration strategies on yield.

Delay variability testing is similar to delay fault detection, which is performed using at-speed tests. In FPGAs, at-speed testing can be performed by Built-In Self-Test circuits, which can be design-independent [, 6] or design-specific [7, ]. Published work describing at-speed testing for within-FPGA variability includes Li et al. [] and Katsuki et al. [9]. Li et al. used an array of ring oscillators to detect process variations in commercially available FPGAs from Xilinx. Katsuki et al. made measurements on a custom-designed 90nm LUT array, and also describe a yield enhancement scheme in which placement is optimised based on the measured LUT variations [8].

3. STRATEGIES
This section describes several strategies for variability-adaptive design in FPGAs. The focus is on delay variations. For reference, a typical signal path in an FPGA is shown in Figure 1. A path is made up of a number of elements, which can be considered to be individual basic features (such as LUTs, interconnect switches or wire segments) or groups of basic features. The propagation delay of the path is the sum of the element delays.

[Figure 1: A signal path is made up of a number of elements, each contributing to the signal propagation delay. Figure labels: clk, local wiring, switch matrix, wire segment, switch matrix, local wiring, LUT, other logic and wiring, clk; element delays d1 to d7.]

Note that all strategies focus on stochastic rather than correlated variation. The speed-binning process gives a guarantee on the maximum delay of the slowest element in the FPGA.
Since we do not assume any knowledge about the nature of the correlated variation in the FPGA, to be conservative it is necessary to assume that there are no parts of the FPGA which are significantly faster. In other words, we assume that the correlated variation is negligible.

For completeness, worst-case timing analysis is considered first. In this case, parametric yield is a manufacturing issue for the FPGA vendor. The remaining techniques are based on achieving a design-specific yield determined by the end-user.

How the variability-adaptive strategies are executed in practice is an important consideration. However, before embarking on a practical implementation of any approach, it is critically important to know its theoretical limitations. The knowledge obtained from the theoretical investigation can then be used to pursue the development of the most profitable strategy. This paper concentrates on the theory underpinning each approach. Nevertheless, some practical constraints are assumed. In particular, it will be necessary for the end-user to be able to perform at-speed testing of the design in the FPGA, most likely using BIST techniques to avoid the expense of automatic test equipment. Moreover, it is assumed that BIST will indicate only whether a given path or circuit passes or fails the at-speed test. It is improbable that BIST would quantify path delays, much less the delays of individual LUTs and wires.

3.1 Worst-case timing
The simplest timing strategy is to assign an upper bound on the value of the delay for each path element in the die. These worst-case values must take into account all sources of variability, both within-die and between dice. In speed-binned FPGAs, the delay values can be determined during at-speed testing. If there is little within-die variability, testing is straightforward: each die can be characterised by measuring the delay of a localised test structure.
However, if within-die variation is a significant part of overall delay variability, the test coverage must be either exhaustive, or at least sufficiently comprehensive to enable statistically reliable bounds on the slowest element in the die.

Note: the path delay equals the sum of the element delays provided the signal path does not include chains of unbuffered wire segments, which would result in a more complex path delay function. In practice, FPGA routing is buffered, so this is a valid assumption.
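To illustrate what a statistically reliable bound on the slowest element means, the sketch below uses the same independent-normal element model that Section 4.2 adopts. It computes the per-element delay d for which all L elements on a die are faster than d with probability Y, that is D(d)^L = Y, and then checks that bound by Monte Carlo. The function names and the numbers used (1000 elements, mean 1.0, sigma 10% of the mean) are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def slowest_element_bound(Y: float, L: int, mu: float, sigma: float) -> float:
    """Delay d with P(slowest of L i.i.d. N(mu, sigma^2) elements <= d) = Y.

    The slowest element is below d only if all L elements are, so
    D(d)**L = Y, i.e. d is the Y**(1/L) quantile of the element delay.
    """
    return NormalDist(mu, sigma).inv_cdf(Y ** (1.0 / L))


def simulated_yield(d: float, L: int, mu: float, sigma: float,
                    trials: int = 400, seed: int = 0) -> float:
    """Monte Carlo fraction of dice whose slowest element delay is <= d."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        slowest = max(rng.gauss(mu, sigma) for _ in range(L))
        passed += slowest <= d
    return passed / trials


# Hypothetical die: 1000 elements, mean delay 1.0, sigma 10% of the mean.
d = slowest_element_bound(Y=0.9, L=1000, mu=1.0, sigma=0.1)
print(f"bound = {d:.3f}, simulated yield = {simulated_yield(d, 1000, 1.0, 0.1):.2f}")
```

The bound grows with L, because a larger die has more chances to contain an unusually slow element. This is why exhaustive or near-exhaustive test coverage is needed once within-die variation dominates.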

Using worst-case timing, all parametric yield issues are the responsibility of the FPGA vendor, and end-user designs are guaranteed to operate correctly at-speed. The designed delay for a signal path is the sum of the worst-case delays of the individual elements. Under stochastic variation, it is improbable that all elements in a path will exhibit near worst-case delays, so the design is overly conservative.

3.2 Statistical static timing analysis
Research into statistical static timing analysis applied to FPGA designs has only recently begun [3]. It is instructive to examine the use of SSTA in FPGAs, particularly as a comparison for the reconfiguration-based techniques described below. Statistical static timing analysis improves on worst-case timing by taking into account the probability that each path element has a given delay. The delay of a complete path can therefore be described statistically. Conventional STA will identify a single path in a given circuit implementation as the critical path, having the least slack. However, when a design is implemented in FPGAs with significant within-die variation, the critical path may differ from device to device depending on the variation in each die. Using SSTA, a circuit can be designed such that the required timing is achieved at a given yield, taking into consideration all paths, not just the path with the least nominal slack.

SSTA can be path-based or block-based. A path-based scheme examines complete end-to-end signal paths separately, making it highly accurate but computationally expensive. Block-based schemes are faster and resemble conventional static timing analysers, in that maximum delay values are propagated through the leaf nodes of a path in parallel. This is less accurate, as the maximum of statistically described values can in general only be estimated. Theoretically, using SSTA in FPGAs requires at-speed testing of all end-user products.
The tests would need to be specific to the end-user designs. Testing could be omitted by selecting a sufficiently high design yield, such that the risk of not testing is acceptably low. That decision would be application- and market-dependent. This strategy, although not using the full benefit of SSTA, would nevertheless outperform worst-case timing, and would be more amenable to in-field upgrades. Ultimately, statistical static timing analysis enables a trade-off between product parametric yield and speed.

3.3 Multiple configurations
We now examine the class of strategies which make use of the reconfigurability of the FPGA. The first of these is predicated on the use of multiple implementations of the same circuit design. Statistically, a given implementation of a circuit has a certain probability of passing at-speed testing when configured on an FPGA. If several implementations of the circuit are generated, then there is an increased probability that at least one of the implementations will meet at-speed requirements. In this context, a circuit implementation is stored as a configuration bitstream. All configurations are functionally identical, and could be generated from the same netlist if the placement and routing of the netlist differs between configurations. Ideally, each configuration uses a different set of resources for the critical path (or near-critical paths). If resource usage is highly correlated between configurations, the effectiveness of this technique is diminished.

[Figure 2: Relocating regions in an FPGA. Subcircuit modules can be swapped, shifted into spare space, or moved to a spare region.]

This strategy requires a specific at-speed test for each configuration. Tests are run one by one in a given FPGA until a configuration is found which passes or all configuration options are exhausted. In the latter case, the FPGA is failed. The multiple-configuration strategy adds a degree of freedom (the number of configurations) to the design space, in addition to parametric yield and speed.
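The test-until-pass procedure can be simulated directly. The sketch below is a minimal Monte Carlo model built on the simplifying assumptions that Section 4.4 later adopts:

- each of C configurations has P near-critical paths whose stochastic delays are independent standard normals, in units of σ_π;
- configurations are fully independent;
- a configuration passes when every path delay is below tau_rel = (T − µ_π)/σ_π.

The simulated yield is compared with the closed form 1 − [1 − D_π(T)^P]^C. The sketch also evaluates the minimum number of configurations from the bound on C derived in Section 4.4. The function names and the values P = 10 and tau_rel = 1.0 are hypothetical.

```python
import math
import random
from typing import Optional


def path_pass_prob(tau_rel: float) -> float:
    """D_pi(T) for a normal path delay; tau_rel = (T - mu_pi) / sigma_pi."""
    return 0.5 * (1.0 + math.erf(tau_rel / math.sqrt(2.0)))


def analytic_yield(tau_rel: float, P: int, C: int) -> float:
    """Probability that at least one of C independent configurations passes."""
    return 1.0 - (1.0 - path_pass_prob(tau_rel) ** P) ** C


def configs_needed(Y: float, tau_rel: float, P: int) -> int:
    """Smallest integer C satisfying the bound on C derived in Section 4.4."""
    u = math.erf(tau_rel / math.sqrt(2.0))
    return math.ceil(math.log(1.0 - Y) / math.log(1.0 - ((1.0 + u) / 2.0) ** P))


def run_config_tests(rng: random.Random, tau_rel: float, P: int,
                     C: int) -> Optional[int]:
    """Try configurations one by one; return the index of the first that passes.

    Each configuration has P independent near-critical path delays, drawn in
    units of sigma_pi. Returns None if all C configurations fail.
    """
    for config in range(C):
        if all(rng.gauss(0.0, 1.0) < tau_rel for _ in range(P)):
            return config
    return None


def simulated_yield(tau_rel: float, P: int, C: int,
                    trials: int = 20000, seed: int = 0) -> float:
    """Fraction of simulated FPGAs for which some configuration passes."""
    rng = random.Random(seed)
    passed = sum(run_config_tests(rng, tau_rel, P, C) is not None
                 for _ in range(trials))
    return passed / trials


# Hypothetical design: 10 near-critical paths, target 1 sigma above the mean path delay.
for C in (1, 2, 4, 8):
    print(C, round(analytic_yield(1.0, 10, C), 3), round(simulated_yield(1.0, 10, C), 3))
print("configurations for 99% yield:", configs_needed(0.99, 1.0, 10))
```

Correlated configurations, which the text notes are unavoidable in practice, would make the simulated yield fall below the independent closed form.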
Because the number of configurations is an additional design parameter, the multiple-configuration strategy will theoretically outperform statistical static timing analysis. The approach has limitations, however. Several circuit configurations and test configurations must be generated and stored. Storage can be particularly problematic if the strategy is implemented on-line in an embedded system. Design constraints will generally preclude completely uncorrelated configurations.

3.4 Region relocation
The second strategy which exploits reconfiguration involves reconfiguring and relocating parts of a complete circuit. The premise is similar to the multiple configuration case: different implementations of the same circuit increase the probability that there exists one implementation that passes at-speed testing. In this case, instead of completely reconfiguring the FPGA, different configurations are created by partitioning the circuit into modules and then assembling the modules in different ways. Different approaches to this strategy are illustrated in Figure 2. Fundamentally, the circuit design must be sufficiently modular that critical (or near-critical) paths are encapsulated in module blocks. Moreover, the design must support some degree of relocation of the modular blocks. This can include, for example, swapping the location of modules in the FPGA, or shifting modules into unused areas. With appropriate constraints on the circuit design, it is possible to store the implemented circuit modules as partial bitstreams, and perform module relocation using dynamic reconfiguration [6]. A relocatable at-speed test configuration is required for each module. Compared with the multiple configuration case, the amount of bitstream storage required is reduced, and the implementation of the circuit only needs to be generated once.

The approach has some limitations. Implementing relocatable modular circuits increases the complexity of system design, in particular in the connectivity between modules.
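Under one simplifying assumption, the R! assignments of modules to regions need not be enumerated one by one. The assumption, which the paper does not make, is that the pass/fail outcome of every module in every region is known, for example from R² relocatable at-speed tests. Finding an assignment in which every module passes is then a bipartite perfect-matching problem. The sketch below solves it with the standard augmenting-path (Kuhn) algorithm. It also estimates by Monte Carlo how often such an assignment exists when each module/region pair passes independently with probability q, for comparison with the approximation [1 − (1 − q)^R]^{2R} of Theorem 1 in Section 4.5. All function names and parameter values are hypothetical.

```python
import math
import random
from typing import List


def module_pass_prob(tau_rel: float, P: int, R: int) -> float:
    """q in (13): a module holding P/R near-critical paths passes in one region."""
    return (0.5 * (1.0 + math.erf(tau_rel / math.sqrt(2.0)))) ** (P / R)


def region_relocation_yield(q: float, R: int) -> float:
    """Approximation (12) of Theorem 1: [1 - (1 - q)^R]^(2R)."""
    return (1.0 - (1.0 - q) ** R) ** (2 * R)


def has_perfect_matching(works: List[List[bool]]) -> bool:
    """works[m][r] is True if module m passed its at-speed test in region r.

    Kuhn's augmenting-path algorithm: every module must be assigned to a
    distinct region in which it passes.
    """
    R = len(works)
    owner = [-1] * R  # owner[r] = module currently placed in region r

    def try_place(m: int, seen: List[bool]) -> bool:
        for r in range(R):
            if works[m][r] and not seen[r]:
                seen[r] = True
                # Take a free region, or move its current module elsewhere.
                if owner[r] == -1 or try_place(owner[r], seen):
                    owner[r] = m
                    return True
        return False

    return all(try_place(m, [False] * R) for m in range(R))


def monte_carlo_relocation_yield(q: float, R: int, trials: int = 2000,
                                 seed: int = 0) -> float:
    """Fraction of random pass/fail tables (entries pass with prob. q) with a valid assignment."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        works = [[rng.random() < q for _ in range(R)] for _ in range(R)]
        ok += has_perfect_matching(works)
    return ok / trials


# Hypothetical: 6 regions; compare the Theorem 1 approximation with simulation.
for q in (0.5, 0.7, 0.9):
    print(q, round(region_relocation_yield(q, 6), 3),
          round(monte_carlo_relocation_yield(q, 6), 3))
```

The Monte Carlo estimate offers a way to gauge, for a given R and q, how much the failure modes that Theorem 1 neglects actually matter.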

A further limitation is that, while there are many ways to assemble circuit modules to form different implementations of the system, the implementations are clearly not all independent. The space of potential solutions is therefore large and not trivial to search.

3.5 Path reconfiguration
A signal path, when implemented on a particular FPGA, may fail at-speed testing. Rather than discard the circuit, it may be possible to reduce the propagation delay by making adjustments to the way the path is implemented. A change to a signal path is called isochronal if, in the absence of within-die variability, the path delay would remain the same. Examples of isochronal changes include altering the equation implemented by a LUT, swapping off-path LUT inputs, or even coarse alterations such as rerouting the path through a different logic cell or component. Although such changes are nominally isochronal, the path delay will be affected when within-die variations are present. The delay may then be reduced enough for the path to meet speed requirements.

4. MODELLING AND ANALYSIS
In the preceding section, several broad strategies for variability-aware design were described. Before pursuing an implementation of any particular strategy, it is expedient to determine quantitatively the benefits the approach will provide. This section presents an analysis of each of the strategies of Section 3. The objective is to determine bounds on the relative yield or speed improvement of each approach. In the models which follow, die-to-die variation is ignored as it can be accounted for by speed-binning. It is assumed that any within-die variation present is stochastic in nature, and that correlated within-die variation is negligible.

4.1 Notation
Some of the notation used in the following analysis is listed in Table 1. Other notation will be introduced as necessary. The error function, erf, has the usual definition:

$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-x^2}\,dx$$

The complementary error function, erfc, is defined as $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$.
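Every yield expression that follows is built from the normal cumulative distribution written in terms of erf. Each relative-target-delay result inverts it through √2·erf⁻¹(2p − 1). Python's standard library provides math.erf and math.erfc but no inverse error function. The small sketch below therefore relies on the identity √2·erf⁻¹(2p − 1) = Φ⁻¹(p), where Φ⁻¹ is the standard normal quantile available as statistics.NormalDist().inv_cdf. The helper names are our own and hypothetical.

```python
import math
from statistics import NormalDist


def normal_cdf(d: float, mu: float, sigma: float) -> float:
    """D(d) = 1/2 [1 + erf((d - mu) / (sqrt(2) sigma))], the form used in (1) and (5)."""
    return 0.5 * (1.0 + math.erf((d - mu) / (math.sqrt(2.0) * sigma)))


def sqrt2_erfinv_2p_minus_1(p: float) -> float:
    """sqrt(2) * erfinv(2p - 1), computed as the standard normal quantile Phi^-1(p)."""
    return NormalDist().inv_cdf(p)


# erfc(z) = 1 - erf(z)
z = 0.7
print(math.erfc(z), 1.0 - math.erf(z))

# Round trip: the quantile inverts the CDF (x is roughly 1.96 here).
x = sqrt2_erfinv_2p_minus_1(0.975)
print(x, normal_cdf(x, 0.0, 1.0))
```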
4.2 Worst-case timing
An FPGA has many types of primitives which may form elemental parts of a signal path, such as LUTs, wires, interconnect switch points, multipliers, and so on. For timing modelling, a path element is composed of the smallest interesting segment of a signal path, such as a LUT together with the input and output wiring. Parametric yield modelling can be applied to one or more different elemental types.

Assume there are K types of element of interest for parametric yield in an FPGA, with $L_k$ elements of type k, and that the delay through any given type-k element is normally distributed: $N(\mu_k, \sigma_k^2)$. The cumulative probability distribution for the delay of a type-k element is:

$$D_k(d_k) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{d_k - \mu_k}{\sqrt{2}\,\sigma_k}\right)\right] \tag{1}$$

The parametric yield of an element type, $Y_k$, is an order statistic which depends on $L_k$, the total number of elements of that type in the FPGA. The manufacturing parametric yield of the FPGA is:

$$Y = \prod_{k=1}^{K} Y_k = \prod_{k=1}^{K} \left[D_k(d_k)\right]^{L_k} \tag{2}$$

Table 1: Notation used in the analysis.
$T$: target path delay
$D$: cumulative delay distribution
$\mu_\pi$: mean delay of a path π
$\sigma_\pi^2$: variance of the delay of a path π
$\tau = T - \mu_\pi$: target path delay relative to the mean
$d_i = \mu_i + X_i$: delay of path element i
$X_i$: random variable of delay of element i
$Z_\pi = \sum_{i=1}^{N} X_i$: random variable of delay of path π
$N$: number of elements in a path
$P$: number of paths in a circuit
$Y$: die yield

We will assume that the yield of each elemental type is balanced, such that $Y_k = Y^{1/K}$. If this is the case, the designed-for delay of a signal path of N elements is:

$$T = \sum_{i=1}^{N} d_i = \sum_{i=1}^{N} \left[\mu_i + \sqrt{2}\,\sigma_i\,\mathrm{erf}^{-1}\left(2\,Y^{1/(K L_{k_i})} - 1\right)\right] \tag{3}$$

where the ith element has mean delay $\mu_i$ and variance $\sigma_i^2$, dependent on its type $k_i$. Considering the case where parametric yield is applied to LUTs only, for an FPGA with L LUTs, the relative target delay for a given yield Y is:

$$\frac{T - \mu_\pi}{\sigma_\pi} = \sqrt{2N}\,\mathrm{erf}^{-1}\left(2\,Y^{1/L} - 1\right) \tag{4}$$

4.3 Statistical static timing analysis
Path-based statistical static timing analysis is examined here, as it is generally more accurate than block-based SSTA.
For a single path π in a given circuit implementation, with mean delay $\mu_\pi$ and variance $\sigma_\pi^2$, the yield of the path will be:

$$Y_\pi = D_\pi(T) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{T - \mu_\pi}{\sqrt{2}\,\sigma_\pi}\right)\right] \tag{5}$$

In general, a circuit implementation will have a number of paths which will contribute to yield loss. The impact each path has on the die yield is related to the path delay mean and variance. For simplicity, assume that P of the paths have sufficiently little delay slack that they impact on yield; these we will label near-critical paths. The remaining paths in the circuit have a negligible effect on yield. Moreover, assume each near-critical path has the same mean delay and variance, and therefore the same yield. To be consistent, the same approximation will be made when analysing the other strategies. The yield of the circuit is the product of the yields of the paths:

$$Y = D_\pi(T)^P \tag{6}$$

The relative target delay for a given yield is:

$$\frac{T - \mu_\pi}{\sigma_\pi} = \sqrt{2}\,\mathrm{erf}^{-1}\left(2\,Y^{1/P} - 1\right) \tag{7}$$
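Equations (4) and (7) can be compared numerically. The sketch below evaluates both relative target delays (T − µ_π)/σ_π using the Φ⁻¹ form of √2·erf⁻¹(2p − 1). For worst-case timing, the per-element quantile is set by all L LUTs on the die. SSTA only requires the P near-critical paths to meet timing. The function names and the values of Y, L, N and P are hypothetical, chosen for illustration rather than taken from the paper's experiments.

```python
import math
from statistics import NormalDist


def worst_case_relative_delay(Y: float, N: int, L: int) -> float:
    """Equation (4): sqrt(2N) * erfinv(2 Y^(1/L) - 1) == sqrt(N) * Phi^-1(Y^(1/L))."""
    return math.sqrt(N) * NormalDist().inv_cdf(Y ** (1.0 / L))


def ssta_relative_delay(Y: float, P: int) -> float:
    """Equation (7): sqrt(2) * erfinv(2 Y^(1/P) - 1) == Phi^-1(Y^(1/P))."""
    return NormalDist().inv_cdf(Y ** (1.0 / P))


def ssta_yield(tau_rel: float, P: int) -> float:
    """Equation (6): D_pi(T)^P with tau_rel = (T - mu_pi) / sigma_pi."""
    return (0.5 * (1.0 + math.erf(tau_rel / math.sqrt(2.0)))) ** P


# Hypothetical example: 10 000 LUTs, 5-element paths, 100 near-critical paths, 90% yield.
Y, L, N, P = 0.9, 10_000, 5, 100
print("worst case:", round(worst_case_relative_delay(Y, N, L), 2))
print("SSTA:      ", round(ssta_relative_delay(Y, P), 2))
```

With these assumed numbers, the worst-case target lies several σ_π further from the mean than the SSTA target. This is the conservatism described in Section 3.1.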

The SSTA yield expression (6) assumes that all the P paths are independent and separate. If the near-critical paths share segments, the yield will be higher than (6), as the effective number of paths ($P_{\mathrm{eff}}$) is lower. In the limit, $P_{\mathrm{eff}} \to 1$ as the correlation between paths approaches unity.

4.4 Multiple configurations
To be consistent with the SSTA evaluation, it is again assumed that a circuit has P near-critical paths, each with mean delay $\mu_\pi$ and variance $\sigma_\pi^2$. The yield of an individual path is given by (5). For a single configuration, the die yield is given by (6). When C independent configurations are available, and the fastest configuration is chosen for each FPGA through at-speed testing, the circuit yield is:

$$Y = 1 - \left[1 - D_\pi(T)^P\right]^C \tag{8}$$

To derive this, note that for an FPGA to fail, it must fail at-speed tests for all C configurations. The probability that it fails for a single configuration is $1 - D_\pi(T)^P$, and therefore the probability that it fails in all configurations is $\left[1 - D_\pi(T)^P\right]^C$. The relative target delay for a given yield is:

$$\frac{T - \mu_\pi}{\sigma_\pi} = \sqrt{2}\,\mathrm{erf}^{-1}\left(2\left[1 - (1 - Y)^{1/C}\right]^{1/P} - 1\right) \tag{9}$$

From (9) it is possible to determine how many independent configurations are required to achieve a required yield Y, given a target path delay of T:

$$C \geq \frac{\ln(1 - Y)}{\ln\left(1 - \left(\frac{1 + u}{2}\right)^P\right)} \tag{10}$$

where

$$u = \mathrm{erf}\left(\frac{T - \mu_\pi}{\sqrt{2}\,\sigma_\pi}\right) \tag{11}$$

It should be emphasised that this technique, and the analysis, depend on the independence of configurations. Configurations are independent if the near-critical paths in each configuration use different resources. In practice, this will not be the case. Correlations between configurations have the effect of reducing the effective value of C. The limiting case is where each configuration uses exactly the same resources, and the effective value of C is unity.

4.5 Region relocation
The strategy of subdividing the circuit into many separate modular regions which can be assembled in different ways is considered next.
Assume that the modularisation of a circuit creates R identical regions and R subcircuit modules, each of which can be assigned to any of the R regions. Clearly, there are R! possible permutations for placing the subcircuits. The yield of this strategy is the probability of finding at least one assignment within the R! implementations where all subcircuits function (that is, pass at-speed testing). However, unlike the multiple configuration scheme, the implementations are not independent. Assume that the circuit is subdivided evenly into R subcircuits such that each subcircuit has P/R near-critical paths. We term this a balanced division. Unbalanced divisions will be examined later.

Theorem 1. Given a balanced subdivision of a circuit with P near-critical paths into R subcircuit modules, and considering all possible assignments of modules to regions, the yield of the system can be approximated by:

$$Y \approx \left[1 - (1 - q)^R\right]^{2R} \tag{12}$$

where:

$$q = D_\pi(T)^{P/R} = \left[\frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{T - \mu_\pi}{\sqrt{2}\,\sigma_\pi}\right)\right)\right]^{P/R} \tag{13}$$

Note that q is the yield of an individual subcircuit module assigned to a single region.

Proof. The subdivision of the circuit is balanced. Therefore, it is reasonable to assume that any given pairing of a subcircuit and a region will have a fixed probability of functioning, given by q in (13). The probability that a given subcircuit does not function in a given region is $(1 - q)$. The yield of the region relocation scheme can be derived by determining the probability that, of the R! possible assignments of subcircuits to regions, no combination can be found where all subcircuits function. We denote this event (that no combination works) as E. It is conjectured that E generally occurs due to one of two scenarios: either there is a subcircuit which does not function in any region, or there is a region in which none of the subcircuits function. Other situations (such as there being two subcircuits which only function in one region) are sufficiently rare that they can be ignored.
For a given subcircuit module $m_i$, the probability that it does not function in any region (event $F_{m_i}$) is:

$$P(F_{m_i}) \equiv P(m_i \text{ cannot be placed}) = (1 - q)^R \tag{14}$$

Since there are R modules, the probability that there is a subcircuit which cannot be placed is:

$$P(F_m) = P\left(\bigcup_i F_{m_i}\right) = 1 - \left[1 - (1 - q)^R\right]^R \tag{15}$$

Similarly, over all R regions, the probability that there exists a region in which no subcircuit works is:

$$P(F_r) = P\left(\bigcup_i F_{r_i}\right) = 1 - \left[1 - (1 - q)^R\right]^R \tag{16}$$

where $F_{r_i}$ is the event that no subcircuit functions in region $r_i$. The yield is therefore:

$$\begin{aligned} Y &= 1 - P(E) &\qquad (17)\\ &\approx 1 - P(F_m \cup F_r) &\qquad (18)\\ &\approx 1 - \left[P(F_m) + P(F_r) - P(F_m)P(F_r)\right] &\qquad (19) \end{aligned}$$

which can be reduced to (12) by substitution of (15) and (16). Note that events $F_r$ and $F_m$ are only weakly dependent for large R, and so $P(F_m \cap F_r) \approx P(F_m)P(F_r)$.

The relative target delay given a yield Y is:

$$\frac{T - \mu_\pi}{\sigma_\pi} = \sqrt{2}\,\mathrm{erf}^{-1}\left(2\left[1 - \left(1 - Y^{1/(2R)}\right)^{1/R}\right]^{R/P} - 1\right) \tag{20}$$

If the subdivision of the circuit is unbalanced, the yield will be reduced. The limiting case is where all P near-critical paths are allocated to a single subcircuit. There are then R
different placements of this subcircuit, while the placement of the remaining subcircuits is irrelevant. The yield is then similar to the multiple configuration case:

$$Y_{\text{unbalanced}} = 1 - \left[1 - D_\pi(T)^P\right]^R \tag{21}$$

4.6 Path reconfiguration
The path reconfiguration scheme is complex to analyse. The analysis can be approached by treating the strategy as an iterative process in which repeated alterations are made to an initial circuit implementation. The result is an estimate of yield after a number of iterations. The drawback is that it is not possible to derive a closed-form expression for the relative delay for a given yield.

An overview of the approach is as follows. The initial yield of a circuit implementation is determined by statistical STA. The paths which fail make up some proportion of the total number of paths (see Figure 3(a)). Path reconfiguration is applied to the failing paths. The reconfigured paths have a different delay distribution, as depicted in Figure 3(b). The resulting yield is calculated from the mean and variance of the reconfigured paths.

[Figure 3: The probability density function (PDF) $P(Z_\pi = x)$ of stochastic path delay $Z_\pi$ before and after reconfiguration of a single path element, plotted against delay. (a) Initial PDF of path delay, with the failing paths lying above the target delay τ. (b) After path reconfiguration: the initial PDF, the PDF of the reconfigured paths, and the remaining failing paths above τ.]

To begin with, consider a single path π in a circuit implementation, for example as depicted in Figure 1. As described in the worst-case timing analysis, the path is constructed from a number of elements, each of which contributes a delay $d_i$ to the overall path delay $d_\pi$:

$$d_\pi = \sum_i d_i \tag{22}$$

A path will fail at-speed testing if $d_\pi > T$. The delay $d_i$ of an element is a random variable, and each element may have a different mean $\mu_i$ and variance $\sigma_i^2$. The mean path delay $\mu_\pi$ is a function of the design and die-to-die variation, while within-die stochastic variability affects the variance of each elemental delay. To make it easier to isolate the stochastic variability, define:

$$X_i = d_i - \mu_i \tag{23}$$

$$Z_\pi = d_\pi - \mu_\pi = \sum_i X_i \tag{24}$$

$$\tau = T - \mu_\pi \tag{25}$$

We will assume that the variables $X_i$ are normally distributed with mean 0 and variance $\sigma_i^2$. The at-speed test will pass the path if $Z_\pi < \tau$.

The aim of path reconfiguration is to improve the speed of a path by making isochronal changes (using reconfiguration) to elements in the path. An isochronal change, by definition, does not affect the mean delay of the path $\mu_\pi$. Instead, a change to element i results in a new value for $X_i$, which we will denote $X_i'$. The reconfigured paths will have a new mean $\mu_{\pi,r}$ and variance $\sigma_{\pi,r}^2$. The mean is given by:

$$\mu_{\pi,r} = E\{Z_\pi \mid Z_\pi > \tau\} - E\{X_i \mid Z_\pi > \tau\} + E\{X_i'\} \tag{26}$$

Assuming that there is no dependency between the original element delay and the new element delay, the expected value $E\{X_i'\} = 0$. Therefore, on average, after the change the path delay will have decreased by the expected value of the original elemental stochastic delay $X_i$, conditioned on the path having failed. The first term of (26) is straightforward to find:

$$E\{Z_\pi \mid Z_\pi > \tau\} = \sqrt{\frac{2}{\pi}}\;\frac{\sigma_\pi \exp\left(-\frac{\tau^2}{2\sigma_\pi^2}\right)}{\mathrm{erfc}\left(\frac{\tau}{\sqrt{2}\,\sigma_\pi}\right)} \tag{27}$$

To quantify the improvement in path speed, we need to determine the expected value of $X_i$ given that the path failed at-speed testing. Following this, we need to quantify the variance of the reconfigured paths, so that the new yield can be determined.

Theorem 2. The expected value $E\{X_i \mid Z_\pi > \tau\}$ can be approximated by:

$$E\{X_i \mid Z_\pi > \tau\} \approx \frac{\sigma_i^2}{2\sigma_\pi^2}\left(\tau + \sqrt{\tau^2 + \frac{8\sigma_\pi^2}{\pi}}\right) \tag{28}$$

where $\sigma_\pi^2 = \sum_i \sigma_i^2$.

Proof. To start, let us find the probability density function $P(X_i = x \mid Z_\pi > \tau)$. This is the probability that a given stochastic delay will be x given that the path failed at-speed testing.
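By Bayes' rule:

$$P(X_i = x \mid Z_\pi > \tau) = \frac{P(Z_\pi > \tau \mid X_i = x)\,P(X_i = x)}{P(Z_\pi > \tau)} \tag{29}$$

The terms $P(X_i = x)$ and $P(Z_\pi > \tau)$ are straightforward results, as the variables $X_i$ and $Z_\pi$ are (assumed to be) normally distributed:

$$P(X_i = x) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{x^2}{2\sigma_i^2}\right) \tag{30}$$

$$P(Z_\pi > \tau) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\tau}{\sqrt{2}\,\sigma_\pi}\right) \tag{31}$$

The remaining term $P(Z_\pi > \tau \mid X_i = x)$ also follows from a normal distribution:

$$P(Z_\pi > \tau \mid X_i = x) = P(Z_\rho > \tau - x) \tag{32}$$

$$= \frac{1}{2}\,\mathrm{erfc}\left(\frac{\tau - x}{\sqrt{2}\,\sigma_\rho}\right) \tag{33}$$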
where $Z_\rho$ is the partial sum $Z_\rho = \sum_{j \neq i} X_j$, and $\sigma_\rho^2 = \sum_{j \neq i} \sigma_j^2$. Making these substitutions:

$$P(X_i = x \mid Z_\pi > \tau) = \frac{\mathrm{erfc}\left(\frac{\tau - x}{\sqrt{2}\,\sigma_\rho}\right) \exp\left(-\frac{x^2}{2\sigma_i^2}\right)}{\sqrt{2\pi}\,\sigma_i\,\mathrm{erfc}\left(\frac{\tau}{\sqrt{2}\,\sigma_\pi}\right)} \tag{34}$$

The expected value is found from the integral:

$$E\{X_i \mid Z_\pi > \tau\} = \int_{-\infty}^{\infty} x\,P(X_i = x \mid Z_\pi > \tau)\,dx \tag{35}$$

Unfortunately, this integral cannot be solved analytically. However, inspection of (34) shows that the function $P(X_i = x \mid Z_\pi > \tau)$ is approximately Gaussian in shape. It is therefore possible to estimate the expected value by determining the location of the peak in the function. This is achieved in the standard way, by determining where the derivative of (34) is zero. The following approximation is used to simplify the result:

$$\mathrm{erfc}(u) \approx \frac{2}{\sqrt{\pi}}\,\frac{\exp(-u^2)}{u + \sqrt{u^2 + 4/\pi}} \tag{36}$$

and (28) follows.

Thus far, we have found the average decrease in the delay of a failing path after the isochronal change. To determine the yield of the path reconfiguration scheme when applied to a large number of paths, it is also necessary to know the variance of the delay after the change. It is then possible to determine the fraction of reconfigured paths which still fail to meet the timing requirement, as shown in Figure 3(b). The variance of a reconfigured path is determined by finding the variance of the part of the path not being reconfigured, and adding the variance of the reconfigured element:

$$\mathrm{Var}\{Z_\pi'\} = \mathrm{Var}\{Z_\rho \mid Z_\pi > \tau\} + \mathrm{Var}\{X_i'\} \tag{37}$$

Here, $Z_\pi'$ is the stochastic delay of the reconfigured path, and $Z_\rho$ is again the sum of the stochastic delays for the elements not undergoing reconfiguration. Note that $\mathrm{Var}\{X_i'\} = \sigma_i^2$.

Theorem 3. The variance $\mathrm{Var}\{Z_\rho \mid Z_\pi > \tau\}$ can be approximated by:

σ π,r = Var{Z ρ Z π > τ} σi σ i (38) σπ

The proof of Theorem 3 is omitted for reasons of limited space; it follows from the derivation of Theorem 2. The resulting path yield after each iteration of path reconfigurations is the yield of the previous iteration plus the yield of the reconfigured paths, weighted by the fraction of paths that were reconfigured:
$$Y_{\pi,r} \leftarrow Y_{\pi,r} + \left(1 - Y_{\pi,r}\right)\left[\frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{\tau - \mu_{\pi,r}}{\sqrt{2}\,\sigma_{\pi,r}}\right)\right] \tag{39}$$

The yield of the chip is a function of the path yield and the number of independent critical paths $P$ in the design:

$$Y_{\text{chip},r} = Y_{\pi,r}^{P} \tag{40}$$

Again, where critical paths are correlated (that is, they share segments), the effective value of $P$ is reduced.

4.7 Summary

The derived expressions for the yields of the different strategies are summarised in Table 1.

Strategy | Yield
SSTA | $Y = D_\pi(T)^P$
Mult. conf. | $Y = 1 - \left[1 - D_\pi(T)^P\right]^C$
Region reloc. | $Y \le \left[1 - \left(1 - D_\pi(T)^{P/R}\right)^R\right]^R$
Path reconf. | $Y_r = Y_{\pi,r}^P$, with $Y_{\pi,r} \leftarrow Y_{\pi,r} + (1 - Y_{\pi,r})\,D_{\pi,r}(\tau)$

where

$$D_\pi(T) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{T - \mu_\pi}{\sqrt{2}\,\sigma_\pi}\right), \qquad D_{\pi,r}(\tau) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{\tau - \mu_{\pi,r}}{\sqrt{2}\,\sigma_{\pi,r}}\right)$$

Table 1: Summary of the analysis.

Figure 4: Die yield for a target elemental delay. Four different values of elemental variability are plotted, covering a standard deviation of % to % of the mean delay.

5. EXPERIMENTAL RESULTS

This section presents the results of simulations and experiments, which serve two objectives. The primary objective is to examine and compare the yield enhancement offered by the alternative strategies under different conditions. The second is to verify the expressions derived in the previous section using Monte Carlo simulations.

The worst-case strategy is examined first. Assuming an FPGA of moderate size (around 7 logic elements) and parametric yield applied to LUTs only, die yield curves are plotted against target elemental delay in Figure 4. The target delay is normalised to the mean LUT delay. The different curves correspond to different amounts of LUT delay variability, from a standard deviation of % of the mean up to % of the mean. An increase in variability has a significant impact on the achievable speed of the device. For the remaining graphs in this section, the elemental variation is set to $\sigma_i$ = %, a realistic value for FPGAs fabricated in 90nm technology [17].

Using statistical static timing analysis (SSTA) provides a significant improvement, as shown in Figure 5.
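The summary expressions are straightforward to evaluate. The sketch below implements them under the same assumptions as the analysis (independent Gaussian path delays, independent configurations, balanced regions) and adds a small Monte Carlo check of the SSTA and multiple-configuration formulas, in the spirit of the Monte Carlo verification described above. The function names and the example parameters ($\sigma_i$ equal to 5% of the mean element delay, $P = 10$, five elements per path, $R = 5$) are illustrative assumptions, not values from the paper. For path reconfiguration, the per-iteration terms $D_{\pi,r}(\tau)$ are passed in as inputs, since they depend on $\mu_{\pi,r}$ and $\sigma_{\pi,r}$.

```python
import math
import random


def d_pi(T, mu_pi, sigma_pi):
    """D_pi(T): probability that a single Gaussian path delay meets the target T."""
    return 0.5 + 0.5 * math.erf((T - mu_pi) / (math.sqrt(2.0) * sigma_pi))


def yield_ssta(D, P):
    """All P independent critical paths must meet timing."""
    return D ** P


def yield_multi_conf(D, P, C):
    """At least one of C independent configurations must meet timing."""
    return 1.0 - (1.0 - D ** P) ** C


def yield_region_upper(D, P, R):
    """Balanced-subdivision upper bound: each of R subcircuits (P/R paths each) needs at
    least one region in which all of its paths meet timing; placement conflicts are ignored."""
    return (1.0 - (1.0 - D ** (P / R)) ** R) ** R


def yield_path_reconf(D, d_r_per_iteration, P):
    """Iterate update (39) from the SSTA path yield D, then apply (40).
    d_r_per_iteration supplies D_{pi,r}(tau) for each iteration (taken as given here)."""
    y = D
    for d_r in d_r_per_iteration:
        y = y + (1.0 - y) * d_r
    return y ** P


def monte_carlo_yield(T, mu_i, sigma_i, n_elem, P, C, trials=10000, seed=7):
    """Fraction of simulated dice on which at least one of C independent configurations
    has all P disjoint paths (n_elem Gaussian elements each) meeting T."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        for _ in range(C):
            if all(sum(rng.gauss(mu_i, sigma_i) for _ in range(n_elem)) <= T
                   for _ in range(P)):
                good += 1
                break
    return good / trials


if __name__ == "__main__":
    # Hypothetical parameters: 5 elements per path, sigma_i = 5% of mu_i, P = 10 paths.
    mu_i, sigma_i, n_elem, P = 1.0, 0.05, 5, 10
    mu_pi, sigma_pi = n_elem * mu_i, math.sqrt(n_elem) * sigma_i
    T = mu_pi + 2.0 * sigma_pi  # target path delay two standard deviations above the mean
    D = d_pi(T, mu_pi, sigma_pi)
    print("SSTA:     ", yield_ssta(D, P), "MC:", monte_carlo_yield(T, mu_i, sigma_i, n_elem, P, 1))
    print("3 configs:", yield_multi_conf(D, P, 3), "MC:", monte_carlo_yield(T, mu_i, sigma_i, n_elem, P, 3))
    print("5 regions (upper bound):", yield_region_upper(D, P, 5))
```

With a single configuration or a single region, the multiple-configuration and region-relocation expressions both reduce to the SSTA yield, which is a quick consistency check on the table.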
In Figure 5, yield curves are plotted for circuits with differing numbers of critical paths, from 1 to 5, each comprising five elements. Note in particular the scale of the x-axis compared to Figure 4. Indicated on the graph is a timing/yield point for a design with critical paths, showing that if the target path delay is chosen to be .95 the mean path delay, 85% of dice will be able to achieve this or better. Each critical path in this scenario is constructed from five elemental components, and each path is disjoint; there are no common elements shared by any two critical paths. Where critical paths do share elemental components, their delays are correlated and the effective number of paths is reduced, shifting the yield curve to the left. The limiting case, where all critical paths are completely correlated, is the curve for a single critical path, $P = 1$.

Figure 5: Circuit yield vs target path delay using statistical static timing analysis. The curves represent different numbers of critical paths $P$ in the circuit, from 1 to 5.

The remaining results in this section assume a circuit with the equivalent of 5 uncorrelated paths of near-critical delay ($P = 5$). Further work is required to ascertain the value of $P$ for an arbitrary circuit.

Next, the multiple configuration strategy is examined. In Figure 6, yield curves are plotted for a design with between two and ten independent configurations. The dashed line indicates the yield curve obtained by applying statistical static timing analysis to the same design. The multiple configuration scheme provides some improvement over SSTA alone. However, if the configurations are not independent, the improvement is reduced; in the limiting case, where all configurations are fully correlated, the yield is the same as with SSTA.

Figure 6: Circuit yield vs target path delay in the multiple configuration strategy.

Figure 7: The minimum necessary number of configurations required to achieve a target yield and path timing in the multiple configuration scheme.

When using this strategy to improve performance, it is desirable to know how many independent configurations would
be required to meet a given yield and timing target. Figure 7 plots the number of configurations needed to achieve the target yield for a range of path timing targets. The timing targets are expressed in terms of the mean and standard deviation of the path delay, and range from $\mu_\pi + .75\sigma_\pi$ (aggressive) to $\mu_\pi + 4.5\sigma_\pi$ (conservative). It can be observed that the required number of configurations escalates rapidly as the timing becomes even modestly more aggressive, particularly when the required yield is high.

Yield curves for the region relocation strategy are depicted in Figure 8. The number of regions graphed ranges from 4 to. Note that for a large number of regions, the critical paths in the design ($P = 5$) cannot be divided evenly between the regions, which causes the simulations to diverge from the theoretical results. The limiting scenario for unbalanced subdivisions, which arises when all critical paths are contained within a single region, is plotted for the four-region case only (dashed curve).

Figure 8: Circuit yield vs target path delay in the region relocation strategy. The theoretical upper bound (for balanced subdivisions) is shown for different numbers of regions from 4 to. The lower (non-balanced) bound is shown for four regions.

As with the multiple configuration strategy, it is of interest to know how many regions would be necessary to achieve a

given target yield and timing. Figure 9 graphs this information. Note that the timing targets are more aggressive than in the multiple configuration case, ranging from $\mu_\pi + .5\sigma_\pi$ (aggressive) to $\mu_\pi + 3.\sigma_\pi$ (conservative).

Figure 9: The minimum necessary number of regions required to achieve a target yield and path timing in the region relocation scheme.

The outcome of the final strategy, path reconfiguration, is shown in Figure 10. The initial yield curve (dashed line) is that achieved by statistical STA. As noted earlier, the 5 critical paths in the design are made up of five elements each. The curves plotted in Figure 10 show the effect of successive iterations, in each of which every failing path has one element reconfigured. For this graph, the delay distributions of all elements are identical. The improvement is underestimated, particularly in the fourth and fifth iterations, because of the approximations made in the analysis.

Figure 10: Circuit yield vs target path delay in the path reconfiguration strategy. The yield is plotted after reconfiguring each of the five elements forming the path.

Figure 11: A comparison of the yields of the different strategies.

Figure 12: Relative timing vs yield for the reconfiguration strategies, compared with SSTA.

The yields of all strategies are compared in Figures 11 and 12. The first graph plots the extreme cases for a design with 5 uncorrelated critical paths, each comprising 5 elements. Taking the 85% yield point as an example, and comparing against the reference worst-case design, the strategies rank from slowest to fastest as follows:
1. SSTA provides a 3.% improvement in the path delay over worst-case design.
2. The multiple configuration strategy achieves a 34.8% improvement with independent configurations.
3. Reconfiguring up to all five elements per path results in a 36.6% improvement.
4. Region relocation provides a 44.7% improvement when the design is divided into balanced subcircuits.

The graph of Figure 12 combines the theoretical yields of the reconfiguration techniques and compares them with SSTA. Here, the expected circuit timing is plotted against target yield. The timing is expressed as an offset from the mean path delay ($\mu_\pi$) in units of the standard deviation of path delay ($\sigma_\pi$). For example, a 99% chip yield using SSTA alone means that the slowest expected path is $3.54\sigma_\pi$ slower than the mean path delay. The graph shows that multiple configurations and path reconfiguration provide a similar range of improvement over SSTA, particularly at high target yields, and that adding more configurations gives diminishing returns. The region relocation strategy outperforms both with relatively few regions (eight in this case).
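The relative-timing comparison can be reproduced by inverting the summary yield expressions for a target chip yield, measuring path delay as an offset of $k$ standard deviations so that $D_\pi(\mu_\pi + k\sigma_\pi) = \Phi(k)$. The sketch below does this for SSTA and for the multiple configuration strategy, and also computes the smallest number of independent configurations that meets a yield target at a given offset, the quantity plotted in Figure 7. It assumes independent paths and configurations, and the function names are hypothetical. With an assumed $P = 50$, the SSTA offset at 99% yield comes out at about $3.54\sigma_\pi$, in line with the value quoted above.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal, so D_pi(mu_pi + k * sigma_pi) = PHI.cdf(k)


def ssta_offset(target_yield, P):
    """Offset k (in units of sigma_pi) at which PHI.cdf(k) ** P equals the target yield."""
    return PHI.inv_cdf(target_yield ** (1.0 / P))


def multi_conf_offset(target_yield, P, C):
    """Offset k at which 1 - (1 - PHI.cdf(k) ** P) ** C equals the target yield."""
    per_config = 1.0 - (1.0 - target_yield) ** (1.0 / C)  # yield each configuration needs
    return PHI.inv_cdf(per_config ** (1.0 / P))


def required_configs(target_yield, P, k):
    """Smallest number of independent configurations C meeting the target yield
    at a timing target of mu_pi + k * sigma_pi."""
    y1 = PHI.cdf(k) ** P  # yield of a single configuration
    if y1 >= target_yield:
        return 1
    if y1 <= 0.0:
        return math.inf  # no number of configurations suffices
    return math.ceil(math.log(1.0 - target_yield) / math.log(1.0 - y1))


if __name__ == "__main__":
    P = 50  # assumed number of independent critical paths
    print(f"SSTA, 99% yield: {ssta_offset(0.99, P):.2f} sigma_pi")
    for C in (2, 5, 10):
        print(f"{C:2d} configurations, 99% yield: {multi_conf_offset(0.99, P, C):.2f} sigma_pi")
    for k in (3.0, 3.5, 4.0):
        print(f"mu_pi + {k} sigma_pi at 99% yield needs {required_configs(0.99, P, k)} configuration(s)")
```

Each extra configuration raises the per-configuration failure probability to a higher power, so the offset improves quickly for the first few configurations and then flattens, consistent with the diminishing returns noted above.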

6. CONCLUSIONS AND FUTURE WORK

Within-die delay variation will become increasingly significant in future technology nodes. This paper presented three strategies for compensating for variability by exploiting the reconfigurability of FPGAs: reconfiguring the entire FPGA, relocating subcircuits to different regions within the FPGA, and reconfiguring individual signal path elements. Using probability theory, the yield of each approach was modelled and compared with statistical static timing analysis and worst-case design, demonstrating the benefits of reconfiguration-based techniques.

The analysis presented in this paper provides a foundation from which to explore delay-variability-adaptive design in FPGAs. In addition to strengthening the verification of this work through more accurate simulations, we plan to refine simplifying assumptions such as path and configuration independence. We will also investigate the implementation of the delay-adaptive strategies presented.

7. ACKNOWLEDGEMENTS

The authors are grateful for the financial support of the UK Engineering and Physical Sciences Research Council (Platform Grant EP/C54948/). Thanks also to Dr. R. Sedcole for advice and suggestions on statistical theory.

8. REFERENCES

[1] M. Abramovici and C. E. Stroud. BIST-based delay-fault testing in FPGAs. Journal of Electronic Testing: Theory and Applications, 9(5): , Oct 2003.
[2] A. Asenov, S. Kaya, and A. R. Brown. Intrinsic parameter fluctuations in decananometer MOSFETs introduced by gate line edge roughness. IEEE Trans. Electron Devices, 5(5):54 6, May 2003.
[3] A. Asenov, S. Kaya, and J. H. Davies. Intrinsic threshold voltage fluctuations in decananometer MOSFETs due to local oxide thickness variations. IEEE Trans. Electron Devices, 49(6): 9, Jun.
[4] Y. Cao, P. Gupta, A. B. Kahng, D. Sylvester, and J. Yang. Design sensitivities to variability: Extrapolations and assessments in nanometer VLSI. In Proc. IEEE International ASIC/SOC Conference.
[5] H. Chang, V. Zolotov, S. Narayan, and C. Visweswariah. Parameterized block-based statistical timing analysis with non-gaussian parameters, nonlinear delay functions. In Proc. Design Automation Conference, 2005.
[6] P. Girard, O. Héron, S. Pravossoudovitch, and M. Renovell. High quality TPG for delay faults in look-up tables of FPGAs. In Proc. IEEE International Workshop on Electronic Design, Test and Applications, 2004.
[7] I. G. Harris, P. R. Menon, and R. Tessier. BIST-based delay path testing in FPGA architectures. In Proc. IEEE International Test Conference.
[8] K. Katsuki, M. Kotani, K. Kobayashi, and H. Onodera. A yield and speed enhancement scheme under within-die variations on 90nm LUT array. In Proc. IEEE Custom Integrated Circuits Conference, 2005.
[9] K. Katsuki, M. Kotani, K. Kobayashi, and H. Onodera. Measurement results of within-die variations on a 90nm LUT array for speed and yield enhancement of reconfigurable devices. In Proc. Asia and South Pacific Design Automation Conference, 2006.
[10] K. S. Kim, S. Mitra, and P. G. Ryan. Delay defect characteristics and testing strategies. IEEE Design & Test of Computers, (5):8 6, Sept-Oct 2003.
[11] A. Kraśniewski. Evaluation of testability of path delay faults for user-configured programmable devices. In Proc. Field-Programmable Logic and Applications, 2003.
[12] X.-Y. Li, F. Wang, T. La, and Z.-M. Ling. FPGA as process monitor: an effective method to characterize poly gate CD variation and its impact on product performance and yield. IEEE Transactions on Semiconductor Manufacturing, 7(3):67 7, Aug. 2004.
[13] Y. Lin, M. Hutton, and L. He. Placement and timing for FPGAs considering variations. In Proc. Field-Programmable Logic and Applications, 2006.
[14] G. Nabaa, N. Azizi, and F. N. Najm. An adaptive FPGA architecture with process variation compensation and reduced leakage. In Proc. Design Automation Conference, 2006.
[15] S. R. Nassif. Design for variability in DSM technologies. In Proc. IEEE International Symposium on Quality Electronic Design.
[16] P. Sedcole, B. Blodget, T. Becker, J. Anderson, and P. Lysaght. Modular dynamic reconfiguration in Virtex FPGAs. IEE Proceedings Computers and Digital Techniques, 53(3):57 64, May 2006.
[17] P. Sedcole and P. Y. K. Cheung. Within-die delay variability in 90nm FPGAs and beyond. In Proc. IEEE International Conference on Field Programmable Technology, 2006.
[18] C. Visweswariah. Death, taxes and failing chips. In Proc. Design Automation Conference, 2003.
[19] C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, and S. Narayan. First-order incremental block-based statistical timing analysis. In Proc. Design Automation Conference, 2004.
[20] H.-Y. Wong, L. Cheng, Y. Lin, and L. He. FPGA device and architecture evaluation considering process variation. In Proc. International Conference on Computer Aided Design, 2005.
