SRAM supply voltage scaling: a reliability perspective

Abstract

SRAM leakage power is a significant fraction of the total power consumption on a chip. Traditional SRAM supply voltage scaling reduces the leakage power, but it increases the stored-data failure rate (e.g., due to soft errors). Accordingly, this work studies SRAM leakage power reduction under a data-reliability constraint ensured by system-level design techniques: error correction, supply voltage reduction, and data refresh (scrubbing). A statistical (probabilistic) setup is used to model failure mechanisms such as soft errors and process variations, and error probability is used as the metric for the data-failure rate. Error models that combine the various SRAM cell failure mechanisms are developed. Using these error models, system-level optimization of leakage power under a constant data error-probability requirement is studied. Circuit-level simulation results and leakage power reduction estimates for the CMOS 90nm technology are presented.

1 Introduction

With technology scaling, the SRAM size on a chip, and hence the SRAM leakage power contribution to the total power, increases. For low duty-cycle applications, like sensors, the SRAM leakage power dominates the total power consumption [1]. The popular supply-voltage reduction technique can reduce the leakage power. However, supply-voltage reduction increases the failure rate of stored data. Stored SRAM-cell data faces the following failure mechanisms: (i) soft errors due to cosmic particles or alpha particles from die packaging, (ii) parametric failures, such as read-upset and access-time failure, due to process variations, (iii) supply-noise induced failures, (iv) gate-leakage fluctuations due to trapped charge in the gate oxide, and (v) permanent defects. With the exception of (v), these failure mechanisms worsen with supply voltage reduction [2, 3, 4]. Thus, any voltage-scaling based leakage power reduction is achieved at the cost of lower data reliability.

Past works have addressed these data-reliability issues in isolation [2, 3, 4, 5, 6]. Making larger SRAM cells is an obvious way to increase reliability, since larger cells resist soft errors and combat parametric failures; however, larger cells increase leakage power and SRAM area. Using a system-design approach, this work studies SRAM leakage power reduction at a constant reliability level set by a high supply voltage. System-level techniques consisting of error correction, supply voltage reduction, and periodic data refresh (or scrubbing [7]) are studied in a leakage power optimization framework with constant data reliability. Failures are modeled in a probabilistic setup. The focus is on system-level optimization without changing circuit parameters like $V_T$, $L$, or $W$; thus, the SRAM cell's design and area are unaffected. The accomplished goals are as follows:

- Error models that combine the various SRAM cell failure mechanisms are developed, while accounting for the spatially fixed or random nature of the errors.
- The supply voltage dependencies of the failure mechanisms, a key ingredient in the optimization, are estimated by circuit-level Monte-Carlo simulations and macro-models.
- An error-probability constrained optimization framework is developed, which accepts SRAM cell parameters as input and optimizes leakage power over supply voltage and refresh time while accounting for data-refresh and error-correction overhead.
Remarks: (i) Only bounded-distance decoding based block codes are considered; thus, LDPC, turbo, and convolutional codes are not considered [8]. (ii) Multiple-bit failures have been reported in sub-90nm SRAMs (e.g., [4]). The dependencies of these failures are not known. Address permutation schemes can interleave SRAM data with negligible energy overhead and make the failures statistically independent. For simplicity, such address interleaving is assumed.

Notation: Supply voltage is denoted by $v$, and 1.0V is the high supply voltage. Probability is denoted by $p$, and $r$ is used for a probability rate. Data lifetime and refresh time are denoted by $t_0$ and $t_r$, respectively. Average leakage power is denoted by $P_l$, and $E$ is used for energy. $E[\cdot]$ and $P[\cdot]$ denote statistical expectation and probability, respectively. Finally, $[n,k,d]$ represents the error-correction code (ECC) parameters [8].

Organization: In Section 2, the optimization framework and the failure-probability combination models are developed. In Section 3, error-probability calculations are illustrated using circuit-level Monte-Carlo simulations and macro-models. In Section 4, leakage power optimization results are presented. Section 5 concludes the paper.

2 Leakage power optimization framework

As envisioned, the optimization problem has a leakage power per bit (power per bit) cost function, which is optimized over the choices of refresh time $t_r$, ECC, and supply voltage $v$. The optimization constraint is that the error probability of any decoded SRAM block should equal the decoding error probability associated with a $[31,26,3]$ Hamming-coded SRAM block at a supply voltage of $v = 1.0$V. A SEC-DED code is chosen for the target error probability since it is used in contemporary SRAM. (All Hamming codes fall into the category of single-error correcting, double-error detecting (SEC-DED) codes.)

The optimization framework has the following ingredients: (i) supply voltage $v$; (ii) SRAM cell leakage power $P_l(v)$; (iii) SRAM cell soft-error rate $r_s(v)$; (iv) spatial parametric failure rate $p_{pf}(v)$; (v) supply-noise induced error rate $r_n(v)$; (vi) oxide trap-charge assisted erratic error rate $r_{ef}(v)$; (vii) the data-lifetime parameter $t_0$; (viii) SRAM cell parameters such as read and write energy ($E_r$ and $E_w$, respectively); and (ix) ECC parameters such as block length, information bits, minimum distance, and encoding and decoding energy. These parameters, except (ix), are expected as inputs by the optimization program; the ECC family is used as a variable in the optimization. A schematic diagram of the framework is shown in Figure 1, and a code sketch of these inputs is given below.

Figure 1. The optimization program accepts error probabilities and error rates of the error mechanisms, the data lifetime, and memory parameters as input. The optimizer predicts the minimum leakage power achievable within the specified ECC families.

These inputs are estimated or simulated for the 90nm CMOS technology (courtesy: STMicroelectronics) and used to exemplify the optimization framework results (output). The supply voltage is discretized to the set $\{0.3\mathrm{V}, 0.4\mathrm{V}, \ldots, 1.0\mathrm{V}\}$, and the optimizer computes power per bit on this set of supply voltages. (Supply-voltage quantization is flexible in the optimization program; this particular discrete set is used only for the results presented in Section 4.) At 0.2V, the selected SRAM cell was not writeable. Failure rates for the various error mechanisms at these discrete supply voltages are estimated in Section 3.
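For concreteness, the ingredients (i)-(ix) can be collected into a small input structure consumed by the optimization program. The following Python sketch is illustrative only; the class and field names are assumptions of this write-up, not the interface of the paper's tool.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class SramInputs:
    """Per-cell inputs (i)-(viii); voltage-dependent quantities are
    callables evaluated on the discrete supply grid."""
    v_grid: Sequence[float]                # (i)   candidate supply voltages, V
    p_leak: Callable[[float], float]       # (ii)  P_l(v), leakage power per cell, W
    r_soft: Callable[[float], float]       # (iii) r_s(v), soft-error rate, 1/s
    p_param: Callable[[float], float]      # (iv)  p_pf(v), parametric failure prob.
    r_noise: Callable[[float], float]      # (v)   r_n(v), supply-noise error rate, 1/s
    r_erratic: Callable[[float], float]    # (vi)  r_ef(v), trap-charge error rate, 1/s
    t0: float                              # (vii) data lifetime, s
    e_read: float                          # (viii) E_r, read energy per cell, J
    e_write: float                         #        E_w, write energy per cell, J

@dataclass
class EccInputs:
    """(ix) Parameters of one candidate [n, k, d] block code."""
    n: int         # block length
    k: int         # information bits
    d: int         # minimum Hamming distance
    e_ecc: float   # E_ECC, encoding + decoding energy per block, J
```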
Then, the error-probability due to these mechanisms is upper-bounded by, p e (v) t[r n (v) + r e f (v) + r s (v)], if p e (v) 1. (2) Observe that for p e (v) 1, this error probability increases with time-period t. Error check and refresh (scrubbing) at periodic rate mitigates this error mechanism. Next, the differentiation between errors and erasures will be introduced. An error is a flipped bit, while an erasure is a bit that is known to be defective. An erasure is similar to a don t care ( ). The differentiation is important since an erasure is easier to decode compared to an error. In simple terms, no information (erasure) is better than wrong information (flipped bit). Consider the simplest repetition coding (TMR) for a single bit. The codewords to be stored corresponding to bits 0 and 1 are (000) and (111), respectively. On using majority rule, two bit flips (errors) lead to an incorrect decision. But correct decision can be made if two bits are in erasure. Decoding errors and erasures together was studied by Forney as generalized decoding [9]. Some ECC families (e.g., BCH codes) jointly decode errors and erasures (generalized decoding). With generalized decoding, if an error-correction code has minimum Hamming distance d, then x-errors and y-erasures can be corrected if, 2x + y < d. (3) 2 Supply-voltage quantization is flexible in the optimization program. Only for results presented in Section 4, this particular discrete set is chosen. 2

Thus, the repetition code can correct up to two erasures or one error; loosely speaking, two erasures and one error contribute equally to decoding failure. This distinction is useful since parametric failures happen at fixed locations (on the time scale of decoding), while noise-induced errors happen at random locations. Before decoding, the locations of parametric failures can be learned by writing and reading test patterns in the SRAM cells. Note that this advantage in error resilience comes at the cost of a small decoding overhead. The erasure probability $p_x$ is given by

  $p_x(v) = p_{pf}(v)$.  (4)

The error probability in generalized decoding depends on the pair $(p_e, p_x)$, and it is computed for any $[n,k,d]$ ECC using (2), (3), and (4). On the other hand, if all bit flips are treated as errors, then the error probability in this specialized decoding depends on $(p_e + p_x)$, and the condition for correct decoding is

  $2(x + y) < d$.  (5)

The error probability for specialized decoding is simply the probability that $\lceil d/2 \rceil$ or more bits out of $n$ flip, with each flip having probability $(p_e + p_x)$. This distinction between generalized and specialized decoding is used to compare power per bit reduction in Section 4; a code sketch of both computations is given at the end of this section.

Let $[n,k,d]$ be the ECC parameters; the number of redundant parity bits is $(n-k)$. The power per bit cost function, including the data-refresh overhead, is given by

  $P_b(v) = \frac{n}{k}\, P_l(v) + \frac{n\,(E_r + E_w)}{k\, t_r} + \frac{E_{ECC}}{t_r}$.  (6)

The data-refresh overhead becomes negligible when $t_r$ and $t_0$ are large. This is reasonable since leakage power is significant only when the data lifetime is large. For the 90nm standard-$V_T$ technology, $t_0 > 1$ sec has negligible refresh power overhead for low-complexity codes like SEC-DED; this $t_0 = 1$ sec value is used in the following sections. Finally, the optimization constraint is set by the decoding error probability of a $[31,26,3]$-Hamming coded SRAM cell block at a supply of $v = 1.0$V. The error-probability estimation for the various error mechanisms is discussed next.
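Under the stated independence assumption (address interleaving), the block decoding-error probabilities for the two decoding styles follow from (3)-(5) by summing multinomial and binomial terms. A minimal sketch, with function names that are this write-up's rather than the paper's:

```python
from math import comb

def block_error_generalized(n, d, p_e, p_x):
    """Decoding-error probability under generalized decoding: a block of n
    bits decodes correctly iff its x errors and y erasures satisfy (3),
    2x + y < d. Each bit is independently an error (prob. p_e) or an
    erasure (prob. p_x), per the interleaving assumption."""
    p_ok = 0.0
    for x in range(n + 1):
        for y in range(n - x + 1):
            if 2 * x + y < d:
                p_ok += (comb(n, x) * comb(n - x, y)
                         * p_e**x * p_x**y * (1.0 - p_e - p_x)**(n - x - y))
    return 1.0 - p_ok

def block_error_specialized(n, d, p_e, p_x):
    """Specialized decoding (5): erasures are lumped with errors, so the
    block fails when ceil(d/2) or more of the n bits flip, each flip
    having probability p_e + p_x."""
    p = p_e + p_x
    t = (d - 1) // 2   # number of correctable flips
    p_ok = sum(comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(t + 1))
    return 1.0 - p_ok

# e.g., for the [31, 26, 3] code at illustrative probabilities, generalized
# decoding gives the smaller block error probability:
# block_error_generalized(31, 3, 1e-9, 1e-7) < block_error_specialized(31, 3, 1e-9, 1e-7)
```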
3 SRAM cell error-probability estimation

Estimation methods and models for the various failure mechanisms are discussed in this section. A typical FIT rate for an SRAM cell at $v = 1.0$V is of the order of 0.001 failures per $10^9$ hours per cell, which equals an error-probability rate of $2.77 \times 10^{-16}$ per second. This rate is extremely low and expensive to measure experimentally. Therefore, a modeling approach is used to estimate the error-probability rate of an SRAM cell at different voltages. These error-probability rates are also affected by process variations; this is modeled using circuit-level Monte-Carlo simulations. These modeling methods are not absolute, and there are better (e.g., experimental) ways to estimate the error-probability rates. But the optimizer is separate from its inputs: since these error-probability rates are inputs, any superior estimates can always be used to calculate the optimized power per bit.

3.1 Soft-error rate estimation

SRAM cells retain data as charge at a storage node, and radioactive particles act as a noise mechanism that disturbs this stored charge, causing errors. At lower supply voltages the stored charge decreases, making it easier for radioactive particles to flip the stored bit. Thus, the soft-error rate increases as the supply voltage $v$ is reduced (see [10], for example).

Soft-error rate estimation uses the circuit shown in Figure 2. The feedback inverter pair represents an SRAM cell without the access transistors; L and R are mnemonics for left and right. The inverters L and R hold the stored bit when the access transistors are off. The noise current $i(t)$ induced by a radioactive particle is modeled by the following two-parameter waveform:

  $i(t) \equiv i(t, q, \tau) = \frac{2q}{\tau \sqrt{\pi}} \sqrt{\frac{t}{\tau}} \exp\left(-\frac{t}{\tau}\right)$,  (7)

where $q$ represents the total charge and $\tau$ is a time parameter. For the CMOS 90nm technology, $\tau = 90$ ps has been estimated [10]. The charge $q = \int_0^\infty i(t)\,dt$ characterizes the magnitude of the noise. For any $v$ and $i(t)$ as in (7), there is a charge threshold $q_c(v)$, called the critical charge, at which the stored bit in the SRAM cell flips [11]. (A noise current $i(t)$ at other nodes, e.g., the access-transistor gate, can also cause a bit flip; however, the fraction of such upsets is negligible.)

Figure 2. The circuit for critical-charge estimation is illustrated. An analytic noise current source $i(t)$ models the effect of a radioactive particle.

The soft-error rate is then given by [10]

  $r_s(v) = K_s \exp(-\alpha_s\, q_c(v))$,  (8)

where $K_s$ and $\alpha_s$ are constants independent of $v$; a short numerical sketch of (7)-(8) follows.
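The sketch below checks that the waveform (7) integrates to the injected charge $q$ and evaluates the rate model (8). The values of $K_s$ and $\alpha_s$ are technology-fit constants; the numbers used here are placeholders, not the paper's fitted values.

```python
import numpy as np

def particle_current(t, q, tau=90e-12):
    """Noise-current waveform (7): i(t) = (2q/(tau*sqrt(pi))) * sqrt(t/tau)
    * exp(-t/tau); tau = 90 ps is the estimate for the 90 nm node [10]."""
    return (2.0 * q / (tau * np.sqrt(np.pi))) * np.sqrt(t / tau) * np.exp(-t / tau)

# Sanity check: the integral of i(t) over time recovers the injected charge q.
dt, q = 1e-14, 10e-15                 # 10 fC is an illustrative charge
t = np.arange(0.0, 2e-9, dt)          # ~22 time constants; the tail is negligible
assert abs(particle_current(t, q).sum() * dt - q) / q < 1e-2

def soft_error_rate(q_c, K_s=1.0, alpha_s=1e15):
    """Rate model (8): r_s = K_s * exp(-alpha_s * q_c).
    K_s and alpha_s are placeholder constants (per-second, per-coulomb)."""
    return K_s * np.exp(-alpha_s * q_c)
```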

With process variations, SRAM cells have a critical-charge distribution. Let $Q_c(v)$ be the random critical charge of an SRAM cell. The expected soft-error rate is given by

  $r_s(v) = K_s\, E[\exp(-\alpha_s\, Q_c(v))]$.  (9)

For refresh time $t_r$, and if $t_r\, r_s(v) \ll 1$, the soft-error probability is given by

  $p_s(v) = t_r\, K_s\, E[\exp(-\alpha_s\, Q_c(v))]$.  (10)

(In this work, $t_r\, r_s \ll 1$ is satisfied for $t_r < 10^{13}$ sec.) Using Monte-Carlo simulations, this equation determines the soft-error probability used in the optimization.

3.2 Parametric failure probability models

Parametric failures consist of read-upset, write failure, hold failure, write-time failure, and access-time failure. Of these, read-upset, write failure, and hold failure can be estimated using voltage transfer characteristics (VTCs) and noise margins, while access-time and write-time failures can be made arbitrarily small by allowing a large enough time for reading and writing, respectively [3, 5]. For brevity, only read-upset probability estimation and write-time calculation are illustrated; the other failure probabilities can be estimated similarly, and the reader can refer to the literature.

For read-upset probability estimation, the read noise margin (RNM) is needed, and for RNM calculations appropriate VTCs are needed. The circuits of Figure 3(a) are used to compute the two VTCs for the RNM calculation. The decoupled L and R inverters of the SRAM cell are biased as during the read operation. In the first circuit of Figure 3(a), $V_L$ is swept from 0 to $v$ and $V_R$ is tabulated; similarly, using the second circuit of Figure 3(a), $V_R$ is swept and $V_L$ is tabulated. These tabulated functions form the butterfly graph shown in Figure 3(b). The RNM is defined by the smaller of the two squares $S_1$ and $S_2$: $\mathrm{rnm}(v) = \min(s_1, s_2)$, where $s_i$ is the side of square $S_i$, $i = 1, 2$.

Figure 3. (a) The circuits used to derive the VTCs for RNM calculations. (b) The butterfly curve derived from the VTCs in Figure 3(a).

The RNM is a random variable due to process variations. Let $\mathrm{RNM}(v)$ be the random RNM of an SRAM cell. A negative $\mathrm{rnm}(v)$, i.e., the absence of the butterfly structure in Figure 3(b), signifies a read-upset. Thus, $p_r(v)$ is given by

  $p_r(v) = P[\mathrm{RNM}(v) \leq 0]$.  (11)

It has been shown that the empirical $\mathrm{RNM}(v)$ exhibits a Gaussian distribution over a large number of trials [5]. Thus, $\mathrm{RNM}(v) \sim \mathcal{N}(\mu_r(v), \sigma_r^2(v))$, and estimating $\mu_r(v)$ and $\sigma_r(v)$ for each $v$ suffices to compute $p_r(v)$ in (11). The calculation procedures for the hold and write failure probabilities are similar, except that the write noise margin exhibits a one-sided distribution in Monte-Carlo circuit simulations. A short code sketch of the read-upset estimate follows.
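A minimal sketch of the Gaussian-fit estimate of (11) from Monte-Carlo RNM samples; the sample statistics below are synthetic, for illustration only.

```python
import numpy as np
from statistics import NormalDist

def read_upset_probability(rnm_samples):
    """Estimate p_r(v) = P[RNM(v) <= 0] of (11) by fitting a Gaussian
    N(mu_r, sigma_r^2) to Monte-Carlo RNM samples, per [5]."""
    mu = float(np.mean(rnm_samples))
    sigma = float(np.std(rnm_samples, ddof=1))
    return NormalDist(mu, sigma).cdf(0.0)

# Illustration with synthetic samples (mu = 120 mV, sigma = 30 mV):
rng = np.random.default_rng(0)
print(read_upset_probability(rng.normal(0.120, 0.030, 10_000)))  # ~ Phi(-4) = 3.2e-5
```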
Write-time calculation is presented next; it extends analogously to access-time calculation. While writing a bit, the bit lines are pre-charged to complementary levels and the access transistors are turned on for a time $t_w$ (the write time). If the bit is not written within this time $t_w$, a write-time failure happens. An estimate of the write time $t_w$ is developed such that the fraction of cells in write-time failure is insignificant compared to the other failure probabilities. Direct estimation with Monte-Carlo simulations would require an enormous number of trials; accordingly, an extreme-value theory based prediction method is used [12]. Let $T_w(v)$ be the random write time of an SRAM cell. The write operation is successful (from the timing perspective) if $T_w \leq t_w$, where $t_w$ is the write time fixed by the designer. The residual probability function is defined as

  $R_w(t, x, v) := P[T_w(v) > t + x \mid T_w(v) > x]$.  (12)

Extreme-value theory tells us that if $\lim_{x \to \infty} R_w(t,x,v)$ converges, then the limit is exponential. Thus,

  $R_w(t, v) := \lim_{x \to \infty} R_w(t, x, v) = \exp(-\alpha_w(v)\, t)$,  (13)

if the limit $R_w(t,v)$ exists. If this convergence holds, then a suitable $x$ and $\alpha_w(v)$ are needed for the probability computation. The empirical $\ln R_w(t,x,v)$ (with 2000 trials) is shown in Figure 4 for $v = 0.4$V and $x$ such that $P[T_w > x] = 0.1$. The parameter $\alpha_w(v)$ is the least-squares slope of $\ln R_w(t,x,v)$. Then $t_w$ is increased until $P[T_w > t_w]$ is negligible compared to the other failure probabilities. A code sketch of this tail fit follows.
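A sketch of the tail fit behind (12)-(13): pick $x$ at the 90th percentile of the Monte-Carlo write times (so $P[T_w > x] = 0.1$), form the empirical residual survival function from the excesses $T_w - x$, and take the least-squares slope of its logarithm. Function and variable names are this write-up's.

```python
import numpy as np

def fit_alpha_w(write_times, tail_fraction=0.1):
    """Fit alpha_w(v) of (13): choose x so that P[T_w > x] = tail_fraction,
    form the empirical residual survival function R_w(t, x, v) of (12)
    from the excesses T_w - x, and take the least-squares slope of
    ln R_w versus t."""
    tw = np.asarray(write_times)
    x = np.quantile(tw, 1.0 - tail_fraction)
    t = np.sort(tw[tw > x] - x)                            # residual lifetimes
    r_w = 1.0 - np.arange(1, t.size + 1) / (t.size + 1.0)  # empirical P[excess > t]
    slope, _ = np.polyfit(t, np.log(r_w), 1)
    return -slope                                          # alpha_w = -d(ln R_w)/dt

# Illustration with 2000 synthetic write times whose tail is exponential
# with a 100 ps scale, so the fitted alpha_w should be near 1e10 per second:
rng = np.random.default_rng(1)
print(fit_alpha_w(5e-10 + rng.exponential(1e-10, 2000)))
```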

Figure 4. The exponential dependence of $R_w(t,x,v)$ on $t$ is illustrated for $v = 0.4$V, where $P[T_w > x] = 0.1$.

3.3 Simplifying assumptions

Supply noise: Supply-voltage noise affects the noise-margin based failure estimation techniques, since concepts like the VTC are defined for a fixed supply voltage. The traditional method of dealing with supply noise is to provide a 100mV margin. This work assumes the same approach, to avoid non-trivial noise and error-probability modeling difficulties.

Oxide trap-charge induced errors: Oxide trap-charge induced errors have been modeled as random telegraph noise in the literature. However, important parameters, such as the rate of trapping/detrapping, the trap-charge density, and the magnitude of the gate-leakage current, are not known. Accordingly, their relative characterization with respect to the other failure mechanisms is non-trivial and is left as future work. The optimizer does, however, accept random noise induced by trap charges as an input.

Using these SRAM cell error-probability modeling techniques, the estimates shown in Figure 5 were obtained; these are the inputs to the optimization framework. The hold-failure probability is negligible compared to the read-upset probability and is not shown.

Figure 5. Obtained estimates for soft errors and parametric failures are compared in this semilog plot. At low voltages, parametric failures are significant; at high voltages, the error probability consists of only soft errors.

4 Leakage power optimization results

For the error-probability data of Section 3, power per bit optimization results are presented for a data lifetime of $t_0 \geq 1$ sec. Note that the read-write energy for an SRAM cell is in the range of a few pJ, and the average leakage current for an SRAM cell is in the range of a few nA; thus, the leakage power contribution is significant only when the read-write activity is occasional. A data lifetime $t_0 \geq 1$ sec is coherent with the assumption that the leakage power is significant, and the results presented are for $t_0 = 1$ sec.

To understand the advantage of data refresh, the power per bit cost function $P_b(v)$ is plotted against $v$ with the ECC restricted to the $[31,26,3]$ Hamming code. The refresh time $t_r$ is chosen to meet the target error probability (set by the SEC-DED code and soft errors at $v = 1.0$V). For $v \leq 0.6$V, where parametric failures are dominant, the probability constraint cannot be met by refresh, and $t_r$ is set to zero, which makes $P_b(v)$ infinite (see (6) and Figure 5); since parametric failures are spatially fixed, data refresh cannot combat their effect on the error probability. The power per bit $P_b(v)$ can be reduced by 61% at a constant error probability.

Figure 6. For the $[31,26,3]$ Hamming code, $P_b(v)$ can be reduced by 61% with a constant error probability maintained by data refresh. The refresh time is shown by the dotted curve ($t_0 = 1$ sec).

When the ECC choice includes more families (e.g., BCH codes), the following optimization procedure is used (a code sketch is given after this description). As before, the error-probability constraint is set by the $[31,26,3]$ code and the soft-error rate at $v = 1.0$V. Recall that if errors and erasures (parametric failures) are distinguished, the setup is called generalized decoding; if errors and erasures are combined, it is called specialized decoding. The decoding failure events for the two cases were given by (3) and (5), respectively. For each ECC with parameters $[n,k,d]$, and for each $v$, a refresh time $t_r$ is calculated such that the error-probability constraint is satisfied. If the probability constraint cannot be met due to parametric failures, then $t_r$ is set to zero, which makes $P_b(v)$ infinite. Once the data-refresh times have been computed, $P_b(v)$ is optimized over the choice of $v$; this yields an optimized power per bit for every ECC. Finally, $P_b(v)$ is optimized over ECCs with the same minimum distance $d$, which can be thought of as the complexity of decoding.
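A sketch of this procedure, reusing the input structures sketched in Section 2 and the cost function (6). Here refresh_time is a hypothetical helper, not part of the paper's tool, that returns the largest $t_r$ meeting the target block error probability at a given (code, $v$) pair, or 0.0 if parametric failures make the constraint unreachable.

```python
def power_per_bit(v, t_r, code, cell):
    """Cost function (6):
    P_b(v) = (n/k) P_l(v) + n (E_r + E_w) / (k t_r) + E_ECC / t_r."""
    if t_r <= 0.0:
        return float("inf")  # constraint unreachable: no refresh time works
    return ((code.n / code.k) * cell.p_leak(v)
            + code.n * (cell.e_read + cell.e_write) / (code.k * t_r)
            + code.e_ecc / t_r)

def optimize(cell, codes, p_target, refresh_time):
    """For each (code, v) pair, obtain the refresh time meeting the target
    block error probability (refresh_time returns 0.0 if none exists),
    then minimize the power per bit over the supply grid and the codes."""
    candidates = [
        (power_per_bit(v, refresh_time(code, v, p_target), code, cell), v, code)
        for code in codes
        for v in cell.v_grid
    ]
    return min(candidates, key=lambda c: c[0])  # (min P_b, best v, best code)
```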

Power reduction is measured against the per-cell leakage at $v = 1.0$V for the $[31,26,3]$ code. The average leakage per cell at $v = 0.3$V sets an upper bound of 94% on the power per bit reduction. The result of this optimization procedure for generalized and specialized decoding is plotted in Figure 7 as a function of $\lfloor (d-1)/2 \rfloor$, the number of errors that can be corrected. With increasing $d$, the power per bit reduction gets closer to the upper bound, and generalized decoding approaches the upper bound at a faster rate.

Figure 7. The power per bit reduction gets close to the upper bound as the minimum distance $d$ of the ECC increases. Generalized decoding based power reduction approaches the upper bound at a faster rate.

Remarks: Coding introduces delay and parity overhead. Since $p_e(v)$ and $p_x(v)$ are close to zero, the parity overhead can be made negligible. For decoding delays, note that codes with $n \leq 1024$ were used in the optimization. If $n \leq 1024$ and $p_x(v) \leq 10^{-5}$, then the probability that no cell in the block is in error is approximately $(1 - n\,p_x(v)) \geq 0.99$. Thus, more than 99% of decoding cases require only a parity check (small delay).

Availability: The power optimization tool, with documentation, is available at https://bwrcs.eecs.berkeley.edu/freshram/

5 Conclusions

The SRAM leakage power reduction problem was studied in this work. It was noted that SRAM supply voltage scaling reduces the leakage power but increases the data error probability. Therefore, SRAM leakage power reduction at a constant data error probability, using system-level design techniques, was studied. A probabilistic analysis framework was developed for the various error mechanisms. Failures were distinguished by whether their locations are fixed or random, leading to generalized decoding. System-level techniques like error correction, supply voltage reduction, and data refresh were used. A leakage power reduction of 93% was estimated (for the CMOS 90nm technology) over multiple coding families. Data refresh tackles random errors effectively. Specialized decoding, in which erasures and errors are combined, achieves an inferior power reduction compared to generalized decoding.

6 Acknowledgements

The authors wish to acknowledge the contributions of the students, faculty, and sponsors of the Berkeley Wireless Research Center, the National Science Foundation Infrastructure Grant No. 0403427, technology access from STMicroelectronics, and the support of the Gigascale Silicon Research Center (GSRC), one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program. Discussions on this topic with Dr. T. M. Mak, Dr. M. Spica, Dr. M. Zhang, Dr. M. Roncken, and Dr. R. Mathur from Intel Corporation were very helpful.

References

[1] M. Sheets et al., "A (6x3) cm^2 self-contained energy-scavenging wireless sensor network node," in Wireless Personal Multimedia Communications, Abano Terme, Italy, 2004.
[2] M. Agostinelli et al., "Erratic fluctuations of SRAM cache Vmin at the 90nm process technology node," in IEEE International Electron Devices Meeting (IEDM) Technical Digest, Dec. 2005, pp. 655-658.
[3] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 12, pp. 1859-1880, Dec. 2005.
[4] V. Degalahal et al., "Soft errors issues in low-power caches," IEEE Trans. on VLSI Systems, vol. 13, no. 10, pp. 1157-1166, Oct. 2005.
[5] K. Agarwal and S. Nassif, "The impact of random device variation on SRAM cell stability in sub-90nm CMOS technologies," IEEE Trans. on VLSI Systems, vol. 16, no. 1, pp. 86-97, Jan. 2008.
[6] E. Alon, V. Stojanovic, and M. Horowitz, "Circuits and techniques for high-resolution measurement of on-chip power supply noise," IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 820-828, Apr. 2005.
[7] S. S. Mukherjee et al., "Cache scrubbing in microprocessors: myth or necessity?" in Proc. of the 10th IEEE Pacific Rim Intl. Symp. on Dependable Computing, Mar. 2004, pp. 37-42.
[8] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, 2nd ed. Amsterdam: North-Holland, 1977.
[9] G. D. Forney, Jr., "Generalized minimum distance decoding," IEEE Trans. on Information Theory, vol. 12, no. 2, pp. 125-131, Apr. 1966.
[10] P. Hazucha and C. Svensson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," IEEE Trans. on Nuclear Science, vol. 47, no. 6, pp. 2586-2594, Dec. 2000.
[11] L. B. Freeman, "Critical charge calculations for a bipolar SRAM array," IBM J. Res. Dev., vol. 40, no. 1, pp. 119-129, 1996.
[12] A. A. Balkema and L. de Haan, "Residual life time at great age," The Annals of Probability, vol. 2, no. 5, pp. 792-804, Oct. 1974.