Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models


Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Facoltà di Economia, Università dell'Aquila umberto.triacca@gmail.com

Introduction In this lesson we present a method for constructing an ARMA(p, q) model: the so-called Box-Jenkins modeling strategy.

Introduction The Box-Jenkins approach to building ARMA(p, q) models was described in a highly influential book by statisticians George Box and Gwilym Jenkins in 1970. Box, G.E.P. and G.M. Jenkins (1970) Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Introduction The Box-Jenkins modelling procedure involves a preliminary analysis (data transformation) and an iterative three-stage process: 1. Model identification; 2. Model estimation; 3. Model checking.

Introduction Each stage addresses a question. Preliminary analysis: Is the time series stationary? 1. Model identification: What class of models probably produced the (transformed) series? 2. Model estimation: What are the model parameters? 3. Model checking: Are the residuals from the estimated model white noise?

The assumption of stationarity The assumption that our time series is a realization of a stationary process is clearly fundamental in time series analysis. The Box-Jenkins methodology requires that the ARMA(p, q) process used to describe the DGP be both stationary and invertible. Thus, in order to construct an ARMA model, we must first determine whether our time series can be considered a realization of a stationary process. If it cannot, we must transform the time series in order to achieve stationarity.

The assumption of stationarity A time series can be considered a realization of a stationary stochastic process if: 1. there is no systematic change in mean (no trend); 2. there is no systematic change in variance; 3. there is no periodic variation.

Data Transformation In this stage a very useful tool is the graph of the series. From the plot of the time series values we can obtain useful indications concerning the stationarity of the process. If the observed values of the time series seem to fluctuate with constant variation around a constant mean, then it is reasonable to suppose that the process is stationary, otherwise, it is nonstationary.

Time series Figure: Time plot of a series generated by a stationary ARMA process.

Time series In practice, many time series cannot be considered realizations of stationary processes.

Time series Consider, as an example, the Airline series. Figure: Monthly totals in thousands of international airline passengers from January 1949 to December 1960.

Time series The plot shows that: 1 The number of passengers tends to increase over time (positive trend). 2 The spread or variance in the counts of passengers tends to increase over time. 3 The number of passengers tends to peak in certain months in each year.

Time series Figure: Monthly totals in thousands of international airline passengers from January 1949 to December 1960. Figure: Time plot of a series generated by a stationary ARMA process.

Time series Conclusion: this time series cannot be considered a realization of a stationary process. Figure: Monthly totals in thousands of international airline passengers from January 1949 to December 1960.

Making a time series stationary Goal: make the airline data set stationary.

Variance stabilizing techniques First, we want to stabilize the increasing variability of the series.

Variance stabilizing techniques To stabilize the variance, we can use the Box-Cox Transformation:

The Box-Cox Transformation The Box-Cox transformation is
y_t = (x_t^λ − 1)/λ if λ ≠ 0,
y_t = log(x_t) if λ = 0,
where the parameter λ is chosen by the analyst. Different values of λ yield different transformations. Popular choices of the parameter λ are 0 and 1/2.
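The transformation above is easy to code directly; this is a minimal sketch in Python, applied to an arbitrary toy series rather than the airline data:

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive observation x."""
    if lam == 0:
        return math.log(x)            # the lambda = 0 case
    return (x ** lam - 1.0) / lam     # the general case

# The two popular parameter choices applied to a toy series.
series = [112.0, 118.0, 132.0, 129.0, 121.0]
logged = [box_cox(x, 0.0) for x in series]   # lambda = 0
rooted = [box_cox(x, 0.5) for x in series]   # lambda = 1/2
```

Note that as λ → 0, (x^λ − 1)/λ → log(x), so the lambda = 0 branch makes the transformation continuous in λ.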

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 Why is it often the case that either λ = 0 or λ = 1/2 is adequate?

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 Consider a time series x_t such that x_t = μ_t + v_t, where μ_t is a nonstochastic mean level. Suppose that the variance of the time series x_t has the form var(x_t) = var(v_t) = μ_t^2 σ^2. The variance of the series varies according to the mean level.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 We want to find a transformation g of x_t such that the variance of g(x_t) is constant.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 By using Taylor's approximation we have g(x_t) ≈ g(μ_t) + g'(μ_t)(x_t − μ_t). Thus var(g(x_t)) ≈ [g'(μ_t)]^2 var(x_t) = [g'(μ_t)]^2 μ_t^2 σ^2.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 We require that var(g(x_t)) = constant. Therefore g is chosen such that g'(μ_t) = 1/μ_t.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 This implies that g(μ_t) = log(μ_t), resulting in the usual logarithmic transformation.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 If var(x_t) = μ_t σ^2, then g'(μ_t) = μ_t^(−1/2), which implies that g(μ_t) = 2 μ_t^(1/2), resulting in the square-root transformation.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 If x_t = μ_t + v_t and var(x_t) = μ_t^2 σ^2, the appropriate transformation is the log transformation.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 If x_t = μ_t + v_t and var(x_t) = μ_t σ^2, the appropriate transformation is the square-root transformation.

Mathematical Foundation of the Box-Cox Transformation with λ equal to 0 or 1/2 If the variance of the series appears to increase quadratically with the mean, the logarithmic transformation (λ = 0) is appropriate; if the variance increases linearly with the mean, we should use λ = 1/2, that is, the square-root transformation.

Time series Figure: Monthly totals in thousands of international airline passengers from January 1949 to December 1960. Consider the log transformation y_t = log(x_t), t = 1, 2, ..., T.

Time series Figure: Log of monthly totals in thousands of international airline passengers from January 1949 to December 1960. The log transformation has removed the increasing variability.

Time series In order to remove the trend and the seasonal component, we decide to use the differencing method. By applying the filter ∆_12 = 1 − L^12 we remove the seasonal component. Figure: (1 − L^12) log of monthly totals in thousands of international airline passengers.

Time series Finally, we use the filter ∆ = 1 − L in order to remove the nonstationarity in mean.

Time series The transformed series is given by z_t = ∆ ∆_12 log(x_t), t = 1, 2, ..., T. We see that differencing has removed the trend and the seasonal component. Figure: (1 − L)(1 − L^12) log of monthly totals in thousands of international airline passengers.
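The two differencing filters can be sketched with a single helper function; the series below is a hypothetical stand-in for the airline counts, not the actual data:

```python
import math

def diff(series, lag=1):
    """Apply the filter (1 - L^lag): returns x_t - x_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical monthly values with a linear trend and a yearly spike.
x = [100.0 + 5.0 * t + (10.0 if t % 12 == 6 else 0.0) for t in range(36)]
logged = [math.log(v) for v in x]
seasonal = diff(logged, lag=12)  # (1 - L^12) log x_t: removes the seasonal component
z = diff(seasonal, lag=1)        # (1 - L)(1 - L^12) log x_t: removes the trend in mean
```

Each application of (1 − L^s) shortens the series by s observations, which is why the stationary transformed series starts at some index k > 1.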

Time series Figure : (1 L)(1 L 12 ) Log of Monthly totals in thousands of international airline passengers Figure : Time plot of a series generated by a stationary ARMA process.

The DGP's model The DGP generates the observed series x_1, ..., x_T; the transformed series z_k, ..., z_T is then described by an ARMA model.

Conclusion After the data have been rendered stationary, we are ready to fit an appropriate model to the data. This is the subject of the next lessons.

Lesson 13 BIS: The Identification of ARMA Models Dipartimento di Ingegneria e Scienze dell'Informazione e Matematica, Università dell'Aquila, umberto.triacca@ec.univaq.it

Identification Consider an ARMA process x_t ~ ARMA(p, q). Before an ARMA(p, q) model can be estimated, we need to select the orders p and q of the AR and MA polynomials. Following Box and Jenkins's terminology, we will refer to this step as identification of the appropriate ARMA model.

Identification The guidelines for the choice of p and q come from the shape of two sample functions: 1. the Sample AutoCorrelation Function (SACF); 2. the Sample Partial AutoCorrelation Function (SPACF).

Identification The sample autocorrelation and partial autocorrelation functions should reflect (up to sampling variation) the properties of the theoretical autocorrelation and partial autocorrelation functions of the process. In order to identify the orders of the model, the SACF and SPACF are therefore compared with the theoretical behavior of the ACF and PACF, respectively.

Identification The theoretical behavior of the ACF and PACF:
If x_t ~ WN(0, σ^2), then ρ_k = 0 and π_k = 0 for all k;
If x_t is an AR(p) process, then ρ_k ≠ 0 for all k with ρ_k → 0 as k → ∞, and π_k ≠ 0 for k ≤ p, π_k = 0 for k > p;
If x_t is an MA(q) process, then ρ_k ≠ 0 for k ≤ q, ρ_k = 0 for k > q, and π_k ≠ 0 for all k with π_k → 0 as k → ∞;
If x_t ~ ARMA(p, q), then ρ_k ≠ 0 for all k with ρ_k → 0 as k → ∞, and π_k ≠ 0 for all k with π_k → 0 as k → ∞.
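The SACF compared with this theoretical behavior uses the standard estimator ρ̂_k = c_k/c_0; a minimal Python sketch follows (the SPACF would additionally require, e.g., the Durbin-Levinson recursion, omitted here for brevity):

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat_k = c_k / c_0 for k = 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n   # sample autocovariance at lag 0
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf

# A perfectly alternating series shows strong negative lag-1 autocorrelation.
rho = sample_acf([1.0, -1.0] * 50, max_lag=2)
```

The 1/n normalization of each c_k is the conventional choice, since it guarantees a positive semi-definite autocovariance sequence.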

Identification If x_t is an AR(p) process, then ρ_k decays exponentially (either directly or in an oscillatory fashion) and π_k cuts off after lag p.

Identification If x_t is an MA(q) process, then ρ_k cuts off after lag q and π_k decays exponentially (either directly or in an oscillatory fashion).

Identification If x_t ~ ARMA(p, q), then both ρ_k and π_k decay exponentially (either directly or in an oscillatory fashion).

Identification The identification of a pure autoregressive or moving average process is reasonably straightforward using the sample autocorrelation and partial autocorrelation functions. On the other hand, as we will see, for ARMA(p, q) processes with p and q both non-zero, the SACF and SPACF are much more difficult to interpret.

Identifying the orders p and q by using Information Criteria Mixed models can be particularly difficult to identify by using the correlogram and the partial correlogram. For this reason, in recent years information-based criteria such as the AIC (Akaike Information Criterion), the BIC (Bayesian Information Criterion) and others have come to be preferred.

Model Identification The AIC statistic is defined as AIC(p, q) = ln(σ̂^2) + 2(p + q)/T, where σ̂^2 is the maximum likelihood estimate of the white noise variance. Among a set of candidate models, we select the values of p and q for our fitted model that minimize AIC(p, q).

Model Identification Intuitively, one can think of 2(p + q)/T as a penalty term that discourages over-parameterization.

Model Identification There is empirical evidence that AIC tends to pick models which are over-parameterized. The BIC is a criterion which attempts to correct this overfitting tendency of the AIC. It is defined as BIC(p, q) = ln(σ̂^2) + ln(T)(p + q)/T.
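Both criteria are cheap to compute once σ̂^2 is available from estimation; a minimal sketch of the two formulas:

```python
import math

def aic(sigma2_hat, p, q, T):
    """AIC(p, q) = ln(sigma2_hat) + 2 (p + q) / T."""
    return math.log(sigma2_hat) + 2.0 * (p + q) / T

def bic(sigma2_hat, p, q, T):
    """BIC(p, q) = ln(sigma2_hat) + ln(T) (p + q) / T."""
    return math.log(sigma2_hat) + math.log(T) * (p + q) / T

# With the same fit, BIC's penalty exceeds AIC's once ln(T) > 2, i.e. T >= 8.
a = aic(1.5, 1, 1, 100)
b = bic(1.5, 1, 1, 100)
```

Both criteria share the fit term ln(σ̂^2) and differ only in the weight attached to the number of parameters p + q.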

Model Identification We note that BIC penalizes larger models more heavily than AIC, since ln(T)/T > 2/T for T ≥ 8.

Model Identification The procedure for using these criteria is the following: 1. Set upper bounds P and Q for the AR and MA orders, respectively. 2. Fit all possible ARMA(p, q) models for p ≤ P and q ≤ Q, using a common sample size T. 3. The best models AIC(p_A, q_A) and BIC(p_B, q_B) satisfy, respectively, AIC(p_A, q_A) = min_{p ≤ P, q ≤ Q} AIC(p, q) and BIC(p_B, q_B) = min_{p ≤ P, q ≤ Q} BIC(p, q).
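The three steps above amount to a small grid search. In this sketch the residual variances are hypothetical placeholders; in practice each σ̂^2 would come from maximum-likelihood estimation of the corresponding model:

```python
import math

T, P, Q = 100, 2, 2

# Hypothetical sigma2_hat values for each (p, q) on the grid (illustrative
# placeholders, not estimates from real data).
sigma2_hat = {
    (0, 0): 4.00, (0, 1): 2.10, (0, 2): 1.90,
    (1, 0): 2.30, (1, 1): 1.50, (1, 2): 1.48,
    (2, 0): 2.00, (2, 1): 1.49, (2, 2): 1.47,
}

def bic(s2, p, q):
    return math.log(s2) + math.log(T) * (p + q) / T

# Step 3: pick the (p, q) minimizing the criterion over the whole grid.
grid = [(p, q) for p in range(P + 1) for q in range(Q + 1)]
p_B, q_B = min(grid, key=lambda pq: bic(sigma2_hat[pq], *pq))
```

With these placeholder values BIC selects (1, 1): its heavier penalty outweighs the tiny variance reductions offered by the larger mixed models.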

Model Identification The theoretical properties of these criteria have been investigated. It is known that BIC is consistent, in the sense that the probability of selecting the true model approaches 1 (if the true model is in the candidate list), whereas AIC is not.

Some examples

Some examples The blue dotted parallel lines show approximate 95% confidence intervals for the null hypotheses H_0: ρ_k = 0 and H_0: π_k = 0, respectively.


Some examples Table: Selection of the ARMA order by AIC and BIC.

(p, q):  (2,2)   (2,1)   (1,2)   (2,0)   (0,2)   (1,1)   (1,0)   (0,1)
AIC:     288.7   286.8   286.7   306.6   293.7   285.2   325.5   320.4
BIC:     304.4   299.9   299.8   317.1   304.2   289.4   333.4   328.3

Both criteria are minimized by the ARMA(1,1) model.