A Data-Driven Model for Software Reliability Prediction
Author: Jung-Hua Lo
IEEE International Conference on Granular Computing (2012)
Presented by Young Taek Kim, KAIST SE Lab, 9/4/2013
Contents
- Introduction
- Background
- Overall Approach
- Detailed Process
- Experimental Results
- Conclusion
- Discussion
SW Reliability Prediction
Definition of software reliability: the probability of failure-free operation of a software product in a specified environment for a specified time.
SRM (Software Reliability Model):
- Estimates how reliable the software is now.
- Predicts its reliability in the future.
Two categories of SRMs:
- Analytical models: NHPP SRMs
- Data-driven models: ARIMA, SVM
Data-Driven Model
Limitations of analytical models:
- Software behavior changes during the testing phase.
- The assumption that all faults are independent and equally detectable is violated by real datasets.
Data-driven models:
- Far fewer impractical assumptions: developed directly from collected failure data.
- Easy to make abstractions and generalizations of the software failure process via regression or time-series analysis.
Motivation
Problems:
- An actual software failure data set is rarely purely linear or purely nonlinear.
- No general model is suitable for all situations.
Proposed solution: a hybrid strategy combining a linear and a nonlinear prediction model:
- ARIMA model: good performance in predicting linear data
- SVM model: successful applications to nonlinear data
Stationarity
Statistical properties (mean, variance, covariance, etc.) are all constant over time:
(1) E(y_t) = μ_y for all t
(2) Var(y_t) = E[(y_t − μ_y)²] = σ_y² for all t
(3) Cov(y_t, y_{t−k}) = γ_k for all t
[Figure: a series whose properties change over time (μ_1, σ_1², γ_1 vs. μ_2, σ_2², γ_2) becomes stationary, with constant μ, σ², γ, after differencing.]
ACF (Autocorrelation Function)
The correlation between observations at different distances apart (lag k):
r_k = Σ_{t=k+1..n} (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1..n} (y_t − ȳ)²
where ȳ = (1/n) Σ_{t=1..n} y_t.
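As a minimal sketch (not from the slides), the sample ACF can be computed directly from the formula above; the function name `sample_acf` and the test series are illustrative:

```python
def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for lags 1..max_lag.

    r_k = sum_{t=k+1..n} (y_t - mean)(y_{t-k} - mean) / sum_{t=1..n} (y_t - mean)^2
    """
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]
```

For the short series [1, 2, 3, 4, 5] this gives r_1 = 0.4 and r_2 = −0.1, matching a hand calculation of the formula.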
PACF (Partial ACF)
The degree of association between y_t and y_{t−k} when the effects of the other time lags 1, 2, …, k−1 are removed:
r_11 = r_1
r_kk = (r_k − Σ_{j=1..k−1} r_{k−1,j} · r_{k−j}) / (1 − Σ_{j=1..k−1} r_{k−1,j} · r_j)   for k = 2, 3, …
where r_{kj} = r_{k−1,j} − r_kk · r_{k−1,k−j} for j = 1, 2, …, k−1.
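The recursion above (the Durbin-Levinson scheme) can be sketched as follows; this is an illustrative implementation, not from the slides, taking the sample autocorrelations r_1, r_2, … as input:

```python
def sample_pacf(r, max_lag):
    """Partial autocorrelations r_kk from autocorrelations r = [r_1, r_2, ...]."""
    phi = {}  # phi[(k, j)] holds r_kj from the recursion
    out = []
    for k in range(1, max_lag + 1):
        if k == 1:
            rkk = r[0]  # r_11 = r_1
        else:
            num = r[k - 1] - sum(phi[(k - 1, j)] * r[k - 1 - j] for j in range(1, k))
            den = 1 - sum(phi[(k - 1, j)] * r[j - 1] for j in range(1, k))
            rkk = num / den
        phi[(k, k)] = rkk
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - rkk * phi[(k - 1, k - j)]
        out.append(rkk)
    return out
```

For an AR(1) process with coefficient 0.5 the theoretical autocorrelations are r_k = 0.5^k, and the PACF correctly cuts off after lag 1 (the lag-2 value is 0).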
Removing Non-Stationarity
Differencing — differenced series: y′_t = y_t − y_{t−1}
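First-order differencing is a one-liner; as a small illustrative sketch (names not from the slides), note how a series with a linear trend becomes constant, i.e. stationary, after differencing:

```python
def difference(y):
    """First-order differencing: y'_t = y_t - y_(t-1)."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]
```

For example, the trending series [3, 5, 7, 9] differences to the constant series [2, 2, 2].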
3 Prediction Models for Stationary Data
AR (Auto Regressive) model — uses past values in the forecast:
AR(p): y_t = α_1 y_{t−1} + α_2 y_{t−2} + … + α_p y_{t−p} + ε_t
MA (Moving Average) model — uses past residuals (random events) in the forecast:
MA(q): y_t = ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q}
ARMA (Auto Regressive & Moving Average) model — combination of AR and MA:
ARMA(p, q): y_t = α_1 y_{t−1} + α_2 y_{t−2} + … + α_p y_{t−p} + ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q}
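To make the ARMA form concrete, here is a minimal sketch (not from the paper) that generates an ARMA(1,1) series from a fixed sequence of shocks; the function names and parameter values are illustrative:

```python
def generate_arma(shocks, alpha=0.6, beta=0.3):
    """Generate an ARMA(1,1) series: y_t = alpha*y_(t-1) + e_t + beta*e_(t-1),
    starting from y_0 = 0 and e_0 = 0."""
    y, y_prev, e_prev = [], 0.0, 0.0
    for e in shocks:
        y_t = alpha * y_prev + e + beta * e_prev
        y.append(y_t)
        y_prev, e_prev = y_t, e
    return y
```

Setting beta = 0 reduces this to a pure AR(1) model, and alpha = 0 to a pure MA(1) model, mirroring how ARMA combines the two.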
AR (Auto Regressive) Model (1/2)
AR(p): y_t = α_1 y_{t−1} + α_2 y_{t−2} + … + α_p y_{t−p} + ε_t
- α_i: autoregressive coefficient
- ε_t: error at time t
Selection of a model:
- ACF decreases exponentially — directly if 0 < α < 1, in an oscillating pattern if −1 < α < 0.
- PACF identifies the order of the AR model: it cuts off after lag p.
[Figure: for an AR(1) series, the ACF decays exponentially (oscillating) while the PACF cuts off at lag 1; both plots show 5% significance limits.]
MA (Moving Average) Model (1/2)
MA(q): y_t = ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q}
- β_i: MA parameter
- ε_t: error at time t
Example — 3-period moving average of annual sales, e.g. MA(3) for 2003 = (1000 + 1500 + 1250) / 3 = 1250:

Year  Sales(B$)  MA(3)
2000  1000       -
2001  1500       -
2002  1250       -
2003  900        1250
2004  1600       1217
2005  950        1250
2006  1650       1150
2007  1750       1400
2008  1200       1450
2009  2000       1533
2010  2100       1650
2011  -          1767
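The MA(3) column in the table is a simple 3-period moving-average forecast: the prediction for each year is the mean of the previous three observations. A minimal sketch (the function name is illustrative, not from the slides) reproduces the rounded values:

```python
def moving_average_forecast(values, window=3):
    """Forecast for period t = mean of the previous `window` observations."""
    return [
        sum(values[t - window:t]) / window
        for t in range(window, len(values) + 1)
    ]

sales = [1000, 1500, 1250, 900, 1600, 950, 1650, 1750, 1200, 2000, 2100]
forecasts = moving_average_forecast(sales)  # first entry is the forecast for 2003
```

The first forecast is (1000 + 1500 + 1250) / 3 = 1250 and the last, for 2011, is (1200 + 2000 + 2100) / 3 ≈ 1767, matching the table.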
MA (Moving Average) Model (2/2)
Selection of a model:
- ACF identifies the order of the MA model: it cuts off after lag q.
- PACF decreases exponentially — directly if 0 < a < 1, in an oscillating pattern if −1 < a < 0.
[Figure: for an MA(1) series, the ACF cuts off at lag 1 while the PACF decays exponentially (oscillating); both plots show 5% significance limits.]
ARMA Model
ARMA(p, q) = AR(p) + MA(q):
y_t = α_1 y_{t−1} + α_2 y_{t−2} + … + α_p y_{t−p} + ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q}
The procedures for model identification give a guideline for determining p and q for ARMA.
ARIMA Model
Auto Regressive Integrated Moving Average (Box and Jenkins, 1970)
A linear model for forecasting time-series data: future values are a linear function of several past observations.
ARIMA(p, d, q):
- Auto regression of order p
- Integrated differencing of order d (extends the model to non-stationary time series)
- Moving average of order q
SVM (Support Vector Machine)
Proposed by Vladimir N. Vapnik (1995)
An algorithm (or recipe) for maximizing a particular mathematical function with respect to a given collection of data.
4 key concepts:
- Separating hyperplane
- Maximum-margin hyperplane
- Soft margin
- Kernel function
Separating Hyperplane
f(x, w, b) = sign(w·x + b)
- w·x + b > 0: the point is classified as +1
- w·x + b < 0: the point is classified as −1
The separating hyperplane acts as the classifier.
[Figure: two point classes (+1 and −1) separated by the hyperplane w·x + b = 0.]
Maximum Margin
f(x, w, b) = sign(w·x + b)
Support vectors are the data points that the margin pushes up against.
Only the support vectors are needed to specify the separating hyperplane!
M = margin width
[Figure: the maximum-margin hyperplane between the two classes, with support vectors x+ and x− lying on the margin boundaries.]
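The decision rule f(x, w, b) = sign(w·x + b) and the margin width, which for a canonical hyperplane is M = 2/||w||, can be sketched as follows (an illustrative sketch, not the paper's implementation):

```python
import math

def classify(x, w, b):
    """Linear SVM decision rule: sign(w . x + b)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else -1

def margin_width(w):
    """Margin width of a canonical separating hyperplane: M = 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))
```

For example, with w = (1, 0) and b = −1, the point (2, 0) lies on the +1 side and (0, 0) on the −1 side; with w = (3, 4) the margin width is 2/5.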
Kernel Function (1/2)
Nonlinear SVMs:
- Datasets that are linearly separable with some noise work out great in the original space.
- But what are we going to do if the dataset is just too hard to separate linearly?
- Solution: map the data to a higher-dimensional space, e.g. x ↦ (x, x²).
[Figure: 1-D data that is not linearly separable on the x-axis becomes separable after mapping to (x, x²).]
Kernel Function (2/2)
Nonlinear SVMs — feature spaces:
General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is linearly separable: Φ: x → φ(x).
Definition of a kernel function: a function that corresponds to an inner product in some expanded feature space.
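As an illustrative sketch (not from the slides): the explicit map x ↦ (x, x²) from the previous slide makes 1-D data separable by the second coordinate, while a kernel such as the RBF kernel computes an inner product in an expanded feature space without ever constructing it:

```python
import math

def phi(x):
    """Explicit feature map for 1-D input: x -> (x, x^2)."""
    return (x, x * x)

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * (x - z)^2): an inner product
    in an (implicit, infinite-dimensional) feature space."""
    return math.exp(-gamma * (x - z) ** 2)
```

For the points {−2, 2} (one class) and {0} (the other), no threshold on x separates them, but after phi the line x₂ = 2 does.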
Genetic Algorithm
A search and optimization technique (J. Holland, 1975) based on Darwin's principle of natural selection.
Basic operations: crossover and mutation.
Flow:
1. Create an initial, random population (potential solutions).
2. Evaluate the fitness of each individual.
3. If an optimal or "good" solution is found, stop (END).
4. Otherwise: selection (kill unfit individuals) → crossover → mutation → go to step 2.
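The flow above can be sketched as a minimal GA maximizing a toy fitness function; this is an illustrative sketch (function names, operators, and parameters are assumptions, not the paper's GA):

```python
import random

def genetic_search(fitness, n_pop=20, n_gen=50, mut_rate=0.2, seed=0):
    """Minimal GA over real-valued chromosomes:
    evaluate -> selection -> crossover -> mutation, repeated n_gen times."""
    rng = random.Random(seed)
    pop = [rng.uniform(-5, 5) for _ in range(n_pop)]   # initial random population
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)            # evaluate fitness
        parents = pop[: n_pop // 2]                    # selection: keep the fitter half
        children = []
        while len(children) < n_pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                        # crossover: blend two parents
            if rng.random() < mut_rate:
                child += rng.gauss(0, 0.5)             # mutation: random perturbation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_search(lambda x: -(x - 2.0) ** 2)       # toy fitness, maximum at x = 2
```

The returned chromosome converges toward the fitness maximum at x = 2; in the paper's setting the chromosome would instead encode SVM parameters.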
Overall Approach (1/2)
ARIMA path (linear):
Data Set → Model Identification → Model Estimation → is model checking satisfied? (if not, repeat identification) → Trained ARIMA Model (linear forecasting)
SVM path (nonlinear) — driven by the nonlinear residual of the ARIMA model:
Random initial population (chromosomes 1..N of SVM parameters) + initial parameters → Training SVM Model → Trained SVM Model → Fitness Evaluation → stop criteria met? (if not, genetic operations and retrain) → Trained SVM Model (nonlinear forecasting)
The linear forecast and the nonlinear forecast are summed (+) to give the software reliability prediction.
Overall Approach (2/2)
X_t = L_t + N_t
- X_t: time-series data
- L_t: linear part of the time-series data
- N_t: nonlinear part of the time-series data
After ARIMA model processing, we can get L_t and ε_t:
- L_t: predicted value of the ARIMA model
- ε_t: residual at time t from the linear model, ε_t = X_t − L_t
Finally, the residuals ε_t are modeled by the SVM model with a GA (Genetic Algorithm).
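The decomposition and recombination steps are simple arithmetic; a minimal sketch (illustrative names, with stand-ins for the trained models' outputs):

```python
def residuals(actual, linear_preds):
    """Residual series fed to the SVM: e_t = X_t - L_t (ARIMA prediction)."""
    return [x - l for x, l in zip(actual, linear_preds)]

def hybrid_forecast(linear_pred, residual_pred):
    """Hybrid prediction: ARIMA linear forecast + SVM forecast of the residual."""
    return linear_pred + residual_pred
```

For example, if the ARIMA predictions for [10, 12, 15] were [9, 13, 14], the residual series [1, −1, 1] is what the GA-tuned SVM would be trained on.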
ARIMA Process (1/2)
Flow: Data Set → Model Identification → Parameter Estimation → is model checking satisfied? → SW Reliability Prediction (if not, repeat identification)
Model identification — stationarize the input data:
- Differencing: determine d
- ACF / PACF checking
Determination of the values of p and q by ACF/PACF checking:

        AR(p)             MA(q)             ARMA(p, q)
ACF     Tails off         Cuts off after q  Tails off
PACF    Cuts off after p  Tails off         Tails off

Parameter estimation — MLE (Maximum Likelihood Estimation):
find the set of parameters θ_1, θ_2, …, θ_k that maximizes L(θ_1, θ_2, …, θ_k) = f(x_1, x_2, …, x_N; θ_1, θ_2, …, θ_k).
ARIMA Process (2/2)
Model checking — residual randomness check:
- The residuals of a well-fitted model will be random and follow the normal distribution.
- Check their ACF and PACF.
SVM Process (1/2)
Inputs: random initial population (chromosomes 1..N), initial parameters, nonlinear residual.
- Because of the randomness of the input data, the initial population is selected randomly (e.g. the parameters C, ε, σ).
- The data set is divided into two parts: training and testing data.
SVM Process (2/2)
Fitness evaluation and genetic operations:
- The higher the fitness value, the better the survivability.
- High-fitness candidate chromosomes are retained and combined to produce new offspring.
- A GA is applied to the SVM parameter search because:
  - there is no theoretical method for determining a kernel function and its parameters, and
  - there is no a priori knowledge for setting the kernel parameter C.
- Applied GA operations: crossover and mutation.
Experimental Results (1/2)
Collected data: cumulative number of failures, x_i, at time t_i.
Data Set (DS-1): RADC (Rome Air Development Center) project reported by Musa — 21 weeks of testing, 136 observed failures.
Output: predicted value x_{i+1} using (x_1, x_2, …, x_i).
[Figures: goodness-of-fit curves and relative-error curves.]
Experimental Results (2/2)
Collected data: cumulative number of failures, x_i, at time t_i.
Data Set (DS-2): 28 weeks of software testing, 234 observed failures.
Output: predicted value x_{i+1} using (x_1, x_2, …, x_i).
[Figures: goodness-of-fit curves and relative-error curves.]
Conclusion
- Proposed a hybrid methodology for forecasting software reliability that exploits the unique strengths of the ARIMA model and the SVM model.
- Test results showed an improvement in prediction performance.
Discussion
Pros:
- Provides a possible solution to the difficulty of SRM selection.
- Improves software reliability prediction performance.
Cons:
- Does not present detailed test methods (e.g. stop criteria for the SVM, parameter-estimation criteria for ARIMA).
Thank you!