PREDICTION OF SEMICONDUCTOR LIFETIME USING BAYESIAN LINEAR MODELS WITH MIXED DISTRIBUTIONS


Olivia Bluder 1,2 and Jürgen Pilz 1

1 Alpen-Adria-University of Klagenfurt, Universitätsstrasse 64-68, 9020 Klagenfurt, Austria
2 KAI - Kompetenzzentrum Automobil- und Industrie-Elektronik GmbH, Europastrasse 8, 9524 Villach, Austria

Address correspondence to Olivia Bluder: KAI - Kompetenzzentrum Automobil- und Industrie-Elektronik GmbH, Europastrasse 8, 9524 Villach, Austria, olivia.bluder@k-ai.at

Key Words: Bayesian linear models, semiconductor reliability, cross validation, mixed distributions.

ABSTRACT

This paper analyzes two models to predict cycles to failure of semiconductor devices. Both are Bayesian linear models dependent on the test parameters of cyclic stress tests. The first approach models the data with a normal distribution and non-informative priors. The quality of the posterior predictive distributions is poor, because the model ignores the occurrence of two competing mechanisms for device failure. The second approach includes this information in the model by using a mixture of two normal distributions and informative as well as non-informative priors. Correlation analysis shows that the mixing proportion depends on the peak temperature of the device under test. To compare the performance of the two models, cross validation is used. The analysis shows a significant increase in quality for the model with the mixed distribution.

1 INTRODUCTION

Modeling and predicting lifetimes of power semiconductor devices has become increasingly important in recent years. Since testing resources, especially time, are restricted, reliable prediction methods for the lifetime of Devices under Test (DUT) are required. For this study 10 datasets containing Cycles to Failure (CTF) of Smart Power ICs (Glavanovics et al., 2001) tested with a temperature cycle stress test system (Glavanovics et al., 2007) are used. Currently these tests are modeled with a log-normal distribution (Bluder, 2008) to predict the required parts per million (ppm) quantiles. Generally, predictions of mean lifetime are done based on physical acceleration models, e.g.

Arrhenius or Coffin-Manson (Escobar et al., 2006), but for the given data these models are insufficient (see figure 1). Figure 1 shows the result of applying a Coffin-Manson model to the given data. To achieve linear correlation a logarithmic x-axis is used. It illustrates that at least two Coffin-Manson models must be considered to model the data well.

Figure 1: Applying Coffin-Manson models to the data

Based on this result another prediction model needs to be used. We propose Bayesian linear models, because data of previous tests and expert knowledge are available.

2 DATA CHARACTERISTICS

Investigating the characteristics of the given data is the first step of the modeling process. The data contain CTF values and test settings of 148 Smart Power ICs. Smart means that each device includes several protection functions against over-temperature, over-current, open load, etc. These devices are frequently used in automotive applications, e.g. to replace mechanical switches and relays. All DUTs belong to the same type, but they have been tested under different electrical stress conditions. Generally, 14 to 16 DUTs belong

to the same test; this means that they are tested under the same conditions. The test system measures and records the state of every DUT. The Cycles to Failure (CTF) of each DUT depend on the test parameters. Backward elimination indicated that the four most important parameters are (Bluder, 2009):

o clamping voltage, V_Cl [V]
o peak current, I [A]
o pulse length, t_p [µs]
o repetition time, t_rep [ms] (time between two stress pulses = 1/frequency)

Previous investigations (Bluder, 2008) showed that the logarithmic CTFs follow a normal distribution, hence to visualize the output, normal probability plots over a logarithmic x-axis are used.

Figure 2: Results of lifetime tests

Figure 2 shows the results of six lifetime tests. The shift in mean CTFs (x-direction) between the tests is due to the different stress levels. Basically, devices under higher stress fail earlier. Figure 2 also illustrates that test T13 behaves differently than the

surrounding tests. T13 seems to indicate a transition between two groups of tests, because it shows a mixture of two distributions. The division into two groups is based on the assumption of two dominating failure mechanisms. Both failure mechanisms occur in all tests, but the ratio of failure one to failure two depends on the test settings. The split into two branches is confirmed by physical failure analysis. Devices with a logCTF value less than 6 form the first branch. These DUTs do not show a visible failure spot in physical inspection, although cracks at the surface can be observed (see figure 3(a)). These cracks are only superficial and therefore do not cause the device failure; instead these devices fail in a short circuit condition due to local degradation processes in the lower metal layers. Devices failing in the second branch (logCTF > 6) show visible failure spots (see figure 3(b)). Based on this investigation, test data can be divided into tests with a heavy first branch (e.g. T1 and T15) and tests with a heavy second branch (e.g. T04, T16 and T09_ML).

Figure 3: Images from physical failure analysis of a smart power semiconductor switch: (a) 1st group, (b) 2nd group
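The split into a short-lifetime and a long-lifetime branch can be illustrated with simulated logCTF values; all numbers below (branch means, spread, mixing ratio) are invented for the sketch and are not the measured data.

```python
import numpy as np

# Simulate logCTF values from two failure branches. The branch means (4.5 and
# 7.5), the spread (0.5) and the mixing ratio (0.3) are invented placeholders.
rng = np.random.default_rng(7)
n_devices = 160
in_first_branch = rng.random(n_devices) < 0.3          # branch-1 indicator
log_ctf = rng.normal(np.where(in_first_branch, 4.5, 7.5), 0.5)

# With branch means well separated, the logCTF = 6 threshold from the text
# recovers the simulated branch membership almost perfectly.
share_first = np.mean(log_ctf < 6.0)
print(round(share_first, 2))
```

A normal probability plot of such data shows the same kink between the two branches as test T13 in figure 2.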

3 BAYESIAN LINEAR MODEL

Linear Models (LM) are used for normally distributed data. Since the given data follow a log-normal distribution, a logarithmic transformation leads to normally distributed data.

3.1 Model definition

Bayesian LMs derive from Bayes' law (Gill, 2008), which states that the posterior distribution of the model parameters p(θ|y) is equal to the conditional probability of the data given the model parameters p(y|θ) times the prior distribution of the model parameters p(θ), divided by the probability of the data p(y). The conditional probability can be expressed by the likelihood function L(θ|y), hence the law converts to:

p(θ|y) ∝ L(θ|y) p(θ)   (1)

The equation changes to a proportion, because the probability of the given data is constant and therefore can be neglected. First the likelihood function needs to be calculated. The logarithmically transformed data (log10 CTF = y) follow a normal distribution y ~ N(µ, σ²I), where the vector of means is modeled with a LM dependent on the four critical test parameters:

µ = Xβ + ε = β_0 + β_1 V_Cl + β_2 I + β_3 t_p + β_4 t_rep + ε   (2)

with X the matrix of normalized covariates and normal random errors ε ~ N(0, σ²I). In this case normalized covariates means relative deviations from reference values. The reference values are defined as V_Cl = 5.6 V, I = 3.6 A, t_p = 600 µs and t_rep = 9 ms. With these assumptions the likelihood function is defined as follows:

L(β, σ² | X, y) = (2πσ²)^(-n/2) exp( -(y - Xβ)^t (y - Xβ) / (2σ²) )   (3)

To complete the posterior distribution, suitable prior distributions for the parameters are needed. The suggestion is to use non-informative prior distributions, because in this case the amount of data is sufficiently large to apply Bayesian learning. The priors for these parameters are defined as:

p(β_i) ∝ const.   and   p(σ²) ∝ 1/σ²   (4)

over the supports (-∞, ∞) and [0, ∞), respectively. With this the model is defined and the posterior distribution can be calculated.
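For concreteness, the log of the likelihood in equation (3) can be evaluated numerically; the covariate matrix, coefficients and noise level below are invented toy values, not the paper's data.

```python
import numpy as np

def log_likelihood(beta, sigma2, X, y):
    """Logarithm of the normal linear-model likelihood of equation (3)."""
    n = len(y)
    resid = y - X @ beta
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - resid @ resid / (2.0 * sigma2)

# Toy data: an intercept plus four normalized covariates (relative deviations).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 4))])
beta_true = np.array([6.9, -0.4, -1.1, -0.3, 0.2])
y = X @ beta_true + rng.normal(scale=0.5, size=50)

print(log_likelihood(beta_true, 0.25, X, y))
```

The likelihood peaks near the generating coefficients, which is where the posterior concentrates under the flat priors of equation (4).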

3.2 Posterior Distributions

According to equation 1 the joint posterior distribution is given as

p(β, σ² | y) ∝ L(β, σ² | y) p(β) p(σ²).   (5)

The posterior distributions for the model parameters and the variance can be obtained by integration. This leads to a multivariate t distribution for the model parameters and an inverse gamma distribution for the variance (Gill, 2008). Table 1 shows the summary statistics for the posterior distributions.

Table 1: Summary statistics (mean and standard deviation) of the posterior distributions of β_0, β_1, β_2, β_3, β_4 and σ² for the Bayesian LM

From a physical point of view the obtained mean values make sense, because:

o The intercept (β_0) is comparable to the mean of the reference data set (= 6.90).
o The model parameters β_1, β_2 and β_3 correspond to the voltage, the current and the pulse length. A negative value indicates that increasing this parameter leads to a decrease in lifetime.
o β_4 needs to be positive, since increasing the repetition time results in a lower stress level and hence in a longer lifetime.

The covariates of the LM do not carry physical dimensions since relative deviations have been used; therefore the parameter with the highest absolute value (β_2) has the highest impact. The standard deviations of the posterior distributions shown in table 1 indicate

that a learning process from the prior to the posterior distribution has taken place. Therefore using non-informative priors is justified.

3.3 Model quality

With these results the posterior predictive distribution can be calculated. This is important because the main aim is to make predictions for future tests. The posterior predictive distribution is the predicted distribution of new data after observing and including the information of old data. For a given set of new data (y_new), the posterior predictive distribution is (Gill, 2008):

p(y_new | X, y) = ∫_Θ p(y_new | θ, X, y) p(θ | X, y) dθ   (6)

with θ = {β, σ²}. To determine the goodness of fit of the posterior predictive distribution, which is important to identify problems and weaknesses of the model, the Bayesian χ²-test (Hamada et al., 2007) is used. This test is similar to the standard Pearson χ²-test, but with the difference that instead of one test several tests (in this work 10000) are performed. The critical value is the percentage of failed χ²-tests. Predicting the distribution for all 10 datasets and calculating the goodness of fit shows that the model still has weaknesses. Only tests with a heavy second branch can be predicted well (see figure 4(b)); tests which show a mixed behavior or have a heavy first branch cannot be modeled sufficiently well (see figure 4(a)). One reason for this result may be the higher number of devices failing in the second branch than in the first one. This means that the second branch has more influence on the model parameters.
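Under the flat priors of equation (4), the posterior quantities from Section 3.2 have the closed form sketched below; the toy data are invented for the example and stand in for the real cycle-stress results.

```python
import numpy as np

def posterior_summary(X, y):
    """Closed-form posterior under p(beta) ∝ const. and p(sigma^2) ∝ 1/sigma^2:
    beta | y is multivariate t centred at the least-squares estimate, and
    sigma^2 | y is inverse gamma (Gill, 2008). Returns the t location, its
    scale matrix and its degrees of freedom."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y            # posterior mean of beta
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)            # posterior scale for sigma^2
    return beta_hat, s2 * xtx_inv, n - k

# Invented toy data: intercept plus four normalized covariates.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 4))])
y = X @ np.array([6.9, -0.4, -1.1, -0.3, 0.2]) + rng.normal(scale=0.5, size=500)
beta_hat, scale, dof = posterior_summary(X, y)
```

Draws from this multivariate t, pushed through equation (2) plus normal noise, give samples from the posterior predictive distribution of equation (6).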

Figure 4: Comparison of posterior predictive and measured distribution: (a) test with heavy first branch, (b) test with heavy second branch

The goodness of fit test showed that this model can only be used for devices sharing the same failure mechanism. From experience and physical failure inspection it is known that two failure mechanisms occur, as explained in Section 2. An increase in quality may be achieved by adapting the model to the observed behavior of the DUTs. The idea is to include the mixture of two failure mechanisms in the distribution of the data.

4 BAYESIAN LINEAR MODEL WITH MIXED DISTRIBUTIONS

4.1 Model definition

For the given data the idea of adapting the model to the DUTs' behavior can be put into practice by using a mixture of two normal distributions with a mixing proportion π (Escobar and West, 1995; Diebolt and Robert, 1994):

y ~ π N(µ_1, σ²) + (1 - π) N(µ_2, σ²)   (7)

As before, the vectors of means (µ_1 and µ_2) will be modeled with LMs dependent on the four test parameters clamping voltage, current, pulse length and repetition rate:

µ_1 = Xβ + ε = β_0 + β_1 V_Cl + β_2 I + β_3 t_p + β_4 t_rep + ε
µ_2 = Xγ + ε = γ_0 + γ_1 V_Cl + γ_2 I + γ_3 t_p + γ_4 t_rep + ε   (8)

with X the matrix of normalized covariates and normal random errors ε ~ N(0, σ²I). For the mixture model it is not possible to use only non-informative prior distributions, since it has to be ensured that the intercepts (β_0 and γ_0) differ and correspond to the mean values of the first and second branch, respectively.

Figure 5: Densities of tests with mixed behavior

Investigations of previous tests show (see figure 5) that the mean of the first branch devices lies in the interval [2, 6] and that of the second branch devices in [6, 10]. This information is included in the following prior distributions:

β_0 ~ U[2, 6]   and   γ_0 ~ U[6, 10]   (9)

The priors for the remaining model parameters and the variance are chosen as before, non-informative. As a last step of the model definition, the mixing proportion will be modeled dependent on a parameter. This proportion is the probability of devices failing into the first branch. The classification into the two branches was done with the uniform

priors assumed in equation 9, hence the first branch contains devices with logCTF values from 2 to 6. Data analysis showed that the square root of the probabilities correlates with the calculated peak temperatures (T_peak) of the DUTs (see figure 6). The peak temperature itself depends on the four test settings.

Figure 6: Dependence between T_peak and percentage of devices failing in the first branch

For the correlation analysis shown in figure 6, datasets with a peak temperature outside the interval [325, 350] have been excluded, because they do not show a mixing behavior. The coefficient of correlation R² of the linear fit confirms the relationship. These results lead to the following function for the mixing proportion:

π(T_peak) = 0 for T_peak ≤ 326.9
π(T_peak) = (0.039 T_peak - 12.761)² for 326.9 < T_peak < 352.4   (10)
π(T_peak) = 1 for T_peak ≥ 352.4

With the definitions stated above the model is defined and the posterior distributions can be calculated.
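Because the square root of the first-branch probability is linear in the peak temperature within the transition region, the piecewise mixing proportion can be written as a clamped square; the coefficients used below are hypothetical placeholders, not the fitted values of equation (10).

```python
import numpy as np

def mixing_proportion(t_peak, a, b):
    """Mixing proportion in the form of equation (10): sqrt(pi) is linear in
    the peak temperature, clamped so that pi = 0 below the transition region
    and pi = 1 above it."""
    root = np.clip(a * np.asarray(t_peak, dtype=float) + b, 0.0, 1.0)
    return root ** 2

# Hypothetical coefficients for illustration only; with a = 0.05 and b = -16
# the transition region spans peak temperatures from 320 to 340.
pi_vals = mixing_proportion([310.0, 330.0, 350.0], a=0.05, b=-16.0)
print(pi_vals)  # pi = 0 below the transition, 0.25 inside it, 1 above it
```

Clamping the square root rather than the probability itself reproduces the three cases of the piecewise definition in a single expression.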

4.2 Posterior Distributions

As before, the joint posterior distribution is defined by equation 5, but for the model with the mixed distribution the posterior needs to be simulated. The simulation is done with the slice sampling algorithm (Casella et al., 2002) in MATLAB. Using mixed distributions leads to more flexibility in the model, but also to a higher number of parameters to estimate with the same amount of data. The consequence is more variation in the posterior distribution (see table 2).

Table 2: Summary statistics (mean, standard deviation and the 2.5%, 5%, 50%, 95% and 97.5% quantiles) of the posterior distributions of β_0, ..., β_4, γ_0, ..., γ_4 and σ² for the Bayesian LM with mixed distribution

From a physical point of view, the analysis of the summary statistics of the posterior distributions given in table 2 leads to similar results as for the first model. The mean values of the parameters are reasonable, because the intercepts correlate with the mean values of the two branches, the model parameters corresponding to the voltage (β_1, γ_1), the current (β_2, γ_2) and the pulse length (β_3, γ_3) are negative, and the ones corresponding to the repetition rate (β_4, γ_4) are positive. The uniform priors given to the intercepts (β_0 and

γ_0) are also justified, because the posterior densities do not show truncation at the applied limits.

The quality of the second model cannot be checked with the Bayesian χ²-test, which is only valid for normally distributed data without mixture. As a first indicator of model quality the densities of the data and of the posterior predictive distribution can be compared. Figure 7 shows that the posterior predictive distributions based on the mixture do not model the data well in all cases. In some cases the model tends to exaggerate (see figure 7(b)).

Figure 7: Comparison of the posterior predictive and the measured distribution of the data: (a) good fit, (b) exaggerated fit

Another indicator of an increase in quality may be the decrease in the mean of the variance of the data (σ²) from 0.55 to 0.20, but table 2 also shows higher variations in the model parameters, which may lead to flatter distributions with less information. The mentioned methods cannot reasonably be used to compare the two models because they contain subjective judgments; therefore an objective method is needed.
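Section 4.2 simulates the posterior by slice sampling in MATLAB; as a language-neutral illustration, a minimal univariate slice sampler with the stepping-out procedure can look as follows. This is a sketch of the algorithm family only, not the implementation used in the paper.

```python
import numpy as np

def slice_sample(logpdf, x0, n_samples, w=1.0, rng=None):
    """Univariate slice sampling: draw a vertical level under the density,
    step out an interval that covers the horizontal slice, then sample from
    it with shrinkage on rejection."""
    if rng is None:
        rng = np.random.default_rng()
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        log_u = logpdf(x) + np.log(rng.random())  # auxiliary slice height
        left = x - w * rng.random()               # randomly position interval
        right = left + w
        while logpdf(left) > log_u:               # step out to cover the slice
            left -= w
        while logpdf(right) > log_u:
            right += w
        while True:                               # sample and shrink
            x_new = rng.uniform(left, right)
            if logpdf(x_new) > log_u:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples[i] = x
    return samples

# Sanity check against a standard normal log-density (constants can be dropped).
draws = slice_sample(lambda t: -0.5 * t * t, x0=0.0, n_samples=5000, w=2.0,
                     rng=np.random.default_rng(3))
```

The window width w affects only the efficiency of the sampler, not the stationary distribution, which makes slice sampling convenient for posteriors without a closed form such as the mixture model above.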

5 COMPARISON OF THE TWO MODELS

For the comparison of the two models, leave-one-out cross validation (LOOCV) is used (Refaeilzadeh et al., 2009). Normally, a more efficient cross validation method would be preferred (e.g. 10-fold cross validation), but in this case the computational effort remains manageable because the number of tests is small. For the LOOCV the posterior predictive distribution for each test (= validation dataset) is simulated iteratively by using the data of the remaining 9 tests (= training dataset) to simulate the posterior distribution. Next the sum of squared errors of each prediction is calculated based on the validation dataset:

Σ_{i=1}^{n} (prediction_i - validation_i)²   (11)

where n is the sample size of the validation dataset. As a critical value for the LOOCV the mean of the sum of squared errors over all 10 datasets is chosen. This means that the model with the smaller critical value performs better. Cross validation confirms the assumption that the Bayesian LM with mixed distributions fits the data better. The critical value of this model (= 7.75) is significantly smaller than the one of the Bayesian LM (= 57.8). To check the overall increase in quality, a pairwise comparison was also performed (see figure 8).
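The LOOCV loop over whole test datasets can be sketched generically; `fit` and `predict` are placeholders for the posterior simulation and posterior predictive mean of the paper, and the toy datasets are invented.

```python
import numpy as np

def loocv_score(datasets, fit, predict):
    """Leave-one-out cross validation over whole test datasets: hold out one
    test, fit on the remaining ones, accumulate the sum of squared errors of
    equation (11), and return the mean over all held-out tests."""
    sse = []
    for i, held_out in enumerate(datasets):
        training = [d for j, d in enumerate(datasets) if j != i]
        model = fit(training)
        prediction = predict(model, held_out)
        sse.append(np.sum((prediction - held_out["y"]) ** 2))
    return float(np.mean(sse))

# Toy stand-ins: "fit" is the grand mean of the training values, "predict"
# broadcasts it over the held-out devices.
fit = lambda training: np.mean(np.concatenate([d["y"] for d in training]))
predict = lambda model, d: np.full(len(d["y"]), model)
datasets = [{"y": np.array([1.0, 1.0])}, {"y": np.array([3.0, 3.0])}]
print(loocv_score(datasets, fit, predict))  # 8.0
```

Holding out whole tests, rather than single devices, matches the intended use of the model: predicting an entire future test from its settings.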

Figure 8: Comparison in pairs of sum of squared errors for each test

In figure 8 the sums of squared errors of the prediction for each test are plotted. The tests are sorted according to decreasing peak temperature; that is, test T1 has the highest peak temperature and test T09 the lowest. For better visibility the y-axis is truncated. In 9 out of 10 cases the mixture leads to a smaller sum of squared errors. Only for test T15 is the sum of squared errors slightly higher for the model using the mixture. This may be caused by an inaccurate temperature calculation combined with an error of the model given in equation 10: the probability of devices failing in the first branch in test T15 is 0.9, but the model estimates a considerably lower probability, which leads to an overweighting of the second branch. Also striking are the peaks at tests T1, T04 and T09 for the Bayesian LM. One reason for the bad prediction for test T1 is that the stress level for this test was the highest, hence these DUTs had the lowest lifetime among all DUTs (see figure 2). In this case the prediction is an extrapolation, which may cause a large deviation between the predicted and the measured values. The same happens for the test with the lowest stress level, located at the opposite end of the lifetime scale, namely test T09.

The error is also high for T13, T04 and T14, because these tests show mixtures. These results indicate on the one hand poor extrapolation quality for the Bayesian LM and on the other hand the need for the mixture for model improvement.

6 SUMMARY AND CONCLUSION

This work has shown that the lifetime of semiconductor devices is not linearly dependent on the applied test settings, because the Bayesian LM based on the four test parameters leads to poor results. One reason is the occurrence of two dominating failure mechanisms. To include this information in the model, a mixture of two normal distributions has been used to model the logCTF values. Correlation analysis of the data with measured and calculated test parameters has pointed out that the mixing proportion can be modeled dependent on the peak temperature. This model leads to significantly better results than the model without mixture, as demonstrated by leave-one-out cross validation. The mixture model does not lead to better results for all tests; therefore further improvement is needed.

BIBLIOGRAPHY

Bluder, O. (2008). Statistical Analysis of Smart Power Switch Life Test Results. Diploma thesis, Alpen-Adria-University of Klagenfurt, Austria.

Bluder, O. (2009). Bayesian Lifetime Modeling for Power Semiconductor Devices. Proc. World Congress on Engineering and Computer Science 2009, San Francisco, USA.

Casella, G., Mengersen, K. L., Robert, C. P., Titterington, D. M. (2002). Perfect slice samplers for mixture distributions. Journal of the Royal Statistical Society B 64.

Dey, D. K., Ghosh, S. K., Mallick, B. K. (2000). Generalized Linear Models: a Bayesian Perspective. New York: Marcel Dekker Inc.

Diebolt, J., Robert, C. P. (1994). Estimation of Finite Mixture Distributions through Bayesian Sampling. Journal of the Royal Statistical Society B 56.

Escobar, L. A., Meeker, W. Q. (2006). A Review of Accelerated Test Models. Statistical Science 21.

Escobar, M. D., West, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association 90.

Gill, J. (2008). Bayesian Methods. Boca Raton (FL): Chapman & Hall/CRC.

Glavanovics, M., Estl, H., Bachofner, A. (2001). Reliable Smart Power Systems ICs for Automotive and Industrial Application - The Infineon Smart Multichannel Switch Family. Proc. 43rd International Conference Power Electronics, Intelligent Motion, Power Quality (PCIM), Nürnberg, Germany.

Glavanovics, M., Köck, H., Eder, H., Košel, V., Smorodin, T. (2007). A new cycle test system emulating inductive switching waveforms. Proc. 12th European Conference on Power Electronics and Applications (EPE), Aalborg, Denmark, 1-9.

Hamada, M. S., Wilson, A. G., Reese, C. S., Martz, H. F. (2007). Bayesian Reliability. New York: Springer Science + Business Media.

Kohavi, R. (1995). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proc. of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Quebec, Canada.

Refaeilzadeh, P., Tang, L., Liu, H. (2009). Cross Validation. In Özsu, M. T., Liu, L. (Eds.): Encyclopedia of Database Systems. Springer.

Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association 88.


More information

Robust Bayesian Simple Linear Regression

Robust Bayesian Simple Linear Regression Robust Bayesian Simple Linear Regression October 1, 2008 Readings: GIll 4 Robust Bayesian Simple Linear Regression p.1/11 Body Fat Data: Intervals w/ All Data 95% confidence and prediction intervals for

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

Advanced Technique for Dielectric Analyses

Advanced Technique for Dielectric Analyses Chroma Systems Solutions, Inc. Advanced Technique for Dielectric Analyses 190xx Series Hipot Testers Keywords: Dielectric Analyses, Destructive testing, Threshold Detection, Breakdown, Test Damage. Title:

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

The linear model is the most fundamental of all serious statistical models encompassing:

The linear model is the most fundamental of all serious statistical models encompassing: Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x

More information

Optimum Test Plan for 3-Step, Step-Stress Accelerated Life Tests

Optimum Test Plan for 3-Step, Step-Stress Accelerated Life Tests International Journal of Performability Engineering, Vol., No., January 24, pp.3-4. RAMS Consultants Printed in India Optimum Test Plan for 3-Step, Step-Stress Accelerated Life Tests N. CHANDRA *, MASHROOR

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Gibbs Sampling in Latent Variable Models #1

Gibbs Sampling in Latent Variable Models #1 Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Research Article Applying Hierarchical Bayesian Neural Network in Failure Time Prediction

Research Article Applying Hierarchical Bayesian Neural Network in Failure Time Prediction Mathematical Problems in Engineering Volume 2012, Article ID 953848, 11 pages doi:10.1155/2012/953848 Research Article Applying Hierarchical Bayesian Neural Network in Failure Time Prediction Ling-Jing

More information

Prior information, but no MCMC:

Prior information, but no MCMC: Physikalisch-Technische Bundesanstalt Braunschweig and Berlin National Metrology Institute Prior information, but no MCMC: A Bayesian Normal linear regression case study K. Klauenberg, G. Wübbeler, B.

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )

Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv: ) Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv:1809.05778) Workshop on Advanced Statistics for Physics Discovery aspd.stat.unipd.it Department of Statistical Sciences, University of

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Expectation Propagation for Approximate Bayesian Inference

Expectation Propagation for Approximate Bayesian Inference Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

A Bayesian Treatment of Linear Gaussian Regression

A Bayesian Treatment of Linear Gaussian Regression A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Behavioral Data Mining. Lecture 2

Behavioral Data Mining. Lecture 2 Behavioral Data Mining Lecture 2 Autonomy Corp Bayes Theorem Bayes Theorem P(A B) = probability of A given that B is true. P(A B) = P(B A)P(A) P(B) In practice we are most interested in dealing with events

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Bayesian Remaining Useful Lifetime Prediction of Thermally Aged Power MOSFETs

Bayesian Remaining Useful Lifetime Prediction of Thermally Aged Power MOSFETs Bayesian Remaining Useful Lifetime Prediction of Thermally Aged Power MOSFETs Mehrdad Heydarzadeh, Serkan Dusmez, Mehrdad Nourani, Bilal Akin 1 Department of Electrical Engineering, The University of Texas

More information

Using Model Selection and Prior Specification to Improve Regime-switching Asset Simulations

Using Model Selection and Prior Specification to Improve Regime-switching Asset Simulations Using Model Selection and Prior Specification to Improve Regime-switching Asset Simulations Brian M. Hartman, PhD ASA Assistant Professor of Actuarial Science University of Connecticut BYU Statistics Department

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Timevarying VARs. Wouter J. Den Haan London School of Economics. c Wouter J. Den Haan

Timevarying VARs. Wouter J. Den Haan London School of Economics. c Wouter J. Den Haan Timevarying VARs Wouter J. Den Haan London School of Economics c Wouter J. Den Haan Time-Varying VARs Gibbs-Sampler general idea probit regression application (Inverted Wishart distribution Drawing from

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

How to build an automatic statistician

How to build an automatic statistician How to build an automatic statistician James Robert Lloyd 1, David Duvenaud 1, Roger Grosse 2, Joshua Tenenbaum 2, Zoubin Ghahramani 1 1: Department of Engineering, University of Cambridge, UK 2: Massachusetts

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Anomaly Detection for the CERN Large Hadron Collider injection magnets Anomaly Detection for the CERN Large Hadron Collider injection magnets Armin Halilovic KU Leuven - Department of Computer Science In cooperation with CERN 2018-07-27 0 Outline 1 Context 2 Data 3 Preprocessing

More information

10 Introduction to Reliability

10 Introduction to Reliability 0 Introduction to Reliability 10 Introduction to Reliability The following notes are based on Volume 6: How to Analyze Reliability Data, by Wayne Nelson (1993), ASQC Press. When considering the reliability

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Data: Singly right censored observations from a temperatureaccelerated

Data: Singly right censored observations from a temperatureaccelerated Chapter 19 Analyzing Accelerated Life Test Data William Q Meeker and Luis A Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W Q Meeker and L A Escobar Based on the authors

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Weakly informative priors

Weakly informative priors Department of Statistics and Department of Political Science Columbia University 21 Oct 2011 Collaborators (in order of appearance): Gary King, Frederic Bois, Aleks Jakulin, Vince Dorie, Sophia Rabe-Hesketh,

More information

Self Adaptive Particle Filter

Self Adaptive Particle Filter Self Adaptive Particle Filter Alvaro Soto Pontificia Universidad Catolica de Chile Department of Computer Science Vicuna Mackenna 4860 (143), Santiago 22, Chile asoto@ing.puc.cl Abstract The particle filter

More information

Evaluation of a DCB based Transfer Molded Component with High Temperature Swings

Evaluation of a DCB based Transfer Molded Component with High Temperature Swings First published in proc. of PCIM, Nuremberg, Germany,, pp. 57-578 Evaluation of a DCB based Transfer Molded Component with High Temperature Swings R. Amro, J. Lutz TU Chemnitz, Reichenhainerstr. 7. A.-F.-

More information

Weakly informative priors

Weakly informative priors Department of Statistics and Department of Political Science Columbia University 23 Apr 2014 Collaborators (in order of appearance): Gary King, Frederic Bois, Aleks Jakulin, Vince Dorie, Sophia Rabe-Hesketh,

More information

Unit 20: Planning Accelerated Life Tests

Unit 20: Planning Accelerated Life Tests Unit 20: Planning Accelerated Life Tests Ramón V. León Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and on their class notes. 11/13/2004

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17 MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Monitoring Wafer Geometric Quality using Additive Gaussian Process

Monitoring Wafer Geometric Quality using Additive Gaussian Process Monitoring Wafer Geometric Quality using Additive Gaussian Process Linmiao Zhang 1 Kaibo Wang 2 Nan Chen 1 1 Department of Industrial and Systems Engineering, National University of Singapore 2 Department

More information

Conjugate Analysis for the Linear Model

Conjugate Analysis for the Linear Model Conjugate Analysis for the Linear Model If we have good prior knowledge that can help us specify priors for β and σ 2, we can use conjugate priors. Following the procedure in Christensen, Johnson, Branscum,

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

variability of the model, represented by σ 2 and not accounted for by Xβ

variability of the model, represented by σ 2 and not accounted for by Xβ Posterior Predictive Distribution Suppose we have observed a new set of explanatory variables X and we want to predict the outcomes ỹ using the regression model. Components of uncertainty in p(ỹ y) variability

More information

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Gunter Spöck, Hannes Kazianka, Jürgen Pilz Department of Statistics, University of Klagenfurt, Austria hannes.kazianka@uni-klu.ac.at

More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

Markov Switching Regular Vine Copulas

Markov Switching Regular Vine Copulas Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS057) p.5304 Markov Switching Regular Vine Copulas Stöber, Jakob and Czado, Claudia Lehrstuhl für Mathematische Statistik,

More information

1 Priors, Contd. ISyE8843A, Brani Vidakovic Handout Nuisance Parameters

1 Priors, Contd. ISyE8843A, Brani Vidakovic Handout Nuisance Parameters ISyE8843A, Brani Vidakovic Handout 6 Priors, Contd.. Nuisance Parameters Assume that unknown parameter is θ, θ ) but we are interested only in θ. The parameter θ is called nuisance parameter. How do we

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Prediction for Two-Phase Degradation. A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree

Prediction for Two-Phase Degradation. A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree Bayesian Degradation Analysis Considering Competing Risks and Residual-Life Prediction for Two-Phase Degradation A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio

More information

A Note on Lenk s Correction of the Harmonic Mean Estimator

A Note on Lenk s Correction of the Harmonic Mean Estimator Central European Journal of Economic Modelling and Econometrics Note on Lenk s Correction of the Harmonic Mean Estimator nna Pajor, Jacek Osiewalski Submitted: 5.2.203, ccepted: 30.0.204 bstract The paper

More information