ARIMA Modelling and Forecasting

Economic time series often appear nonstationary, because of trends, seasonal patterns, cycles, etc. However, their differences may appear stationary:

$\Delta x_t = x_t - x_{t-1}$ (first difference)
$\Delta^2 x_t = \Delta x_t - \Delta x_{t-1} = x_t - 2x_{t-1} + x_{t-2}$ (second difference)

etc. In the case of a seasonal pattern, the seasonal differences could be stationary. Define

$\Delta_S x_t = x_t - x_{t-S}$

where $S = 4$ or $S = 12$, say.

The ARIMA (autoregressive integrated moving average) model is

$\phi(L)\,\Delta_S^d x_t = \theta(L)\,\varepsilon_t$

where $\phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$, and $d = 0$, 1 or (occasionally) 2.

Assuming no seasonal pattern ($S = 1$), we speak of the ARIMA$(p,d,q)$ model, where $p$, $d$ and $q$ are integer values to be chosen.

(c) James Davidson 2014 5.1 09/05/2014
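The differencing operations can be illustrated in a few lines (a minimal sketch in plain Python; the series `x` is made-up data, and `diff` is a helper defined here, not a library function):

```python
def diff(x, lag=1):
    """Return the lag-th difference of a series: x_t - x_{t-lag}."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# Hypothetical quarterly series, purely for illustration.
x = [10.0, 11.5, 12.1, 14.0, 13.2, 15.8, 16.4, 18.9]

d1 = diff(x)            # first difference:  Delta x_t
d2 = diff(diff(x))      # second difference: Delta^2 x_t = x_t - 2x_{t-1} + x_{t-2}
s4 = diff(x, lag=4)     # seasonal difference with S = 4
```

Note that each round of differencing shortens the series by `lag` observations.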
Box-Jenkins Methodology

Box and Jenkins (Time Series Analysis: Forecasting and Control, 1970) advocated a forecasting technique based on the ARIMA model. The basic steps described by B-J are as follows:

1. Choose the smallest $d$ such that the series appears stationary.
2. Choose values for $p$ and $q$, and estimate the ARMA model for the series $\Delta^d x_t$ by ML.
3. Check the correlogram of the residuals for evidence of autocorrelation.
4. Repeat 2 and 3 as necessary to choose the most parsimonious model that accounts for the autocorrelation.

The procedure of choosing $d$, $p$ and $q$ is called (by B-J) model identification. B-J advocate using the known autocorrelation patterns of AR and MA processes to choose $p$ and $q$. Thus, the correlogram of an AR process dies out exponentially as the lag increases, while the correlogram of an MA process cuts off after $q$ lags. If the correlogram does not die out fast enough (or an estimated AR root is close to 1), it may be necessary to increase $d$.
Testing for Uncorrelatedness

Recall the correlogram of a stationary process,

$\rho_j = \dfrac{\mathrm{Cov}(x_t, x_{t+j})}{\mathrm{Var}(x_t)}$

Suppose $x_t$ is an independent series. Then $\rho_j = 0$ for all $j \neq 0$. The sample counterparts are

$r_j = \dfrac{\sum_{t=1}^{T-j}(x_t - \bar x)(x_{t+j} - \bar x)}{\sum_{t=1}^{T}(x_t - \bar x)^2}, \quad j = 1, 2, 3, \ldots$

and under independence these statistics should be small, on average. It can be shown that $\mathrm{s.e.}(r_j) \approx T^{-1/2}$ and that $\sqrt{T}\, r_j \to_d N(0,1)$. When $x_t$ is serially independent, it can also be shown that $r_j$ and $r_k$ for $j \neq k$ are asymptotically independent of each other. Therefore, under the null hypothesis of independence, for any chosen $m$,

$Q = T \sum_{j=1}^{m} r_j^2 \to_d \chi^2_m$

The problem is to choose $m$ large enough to detect all deviations from independence, but it should not generally exceed $T/3$.
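The sample correlogram and the Q statistic can be computed directly from the formulas above (a sketch in plain Python; `acf` and `box_pierce_q` are names chosen here for illustration, not library functions):

```python
def acf(x, m):
    """Sample autocorrelations r_1, ..., r_m of the series x."""
    T = len(x)
    xbar = sum(x) / T
    denom = sum((v - xbar) ** 2 for v in x)
    return [sum((x[t] - xbar) * (x[t + j] - xbar) for t in range(T - j)) / denom
            for j in range(1, m + 1)]

def box_pierce_q(x, m):
    """Q = T * sum of squared sample autocorrelations; chi^2(m) under independence."""
    return len(x) * sum(r * r for r in acf(x, m))
```

Under the null, `box_pierce_q(x, m)` would be compared with the upper quantiles of the chi-squared distribution with m degrees of freedom.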
Testing an ARIMA Specification

Let $\hat\varepsilon_t$ denote the residuals from ARIMA$(p,d,q)$ estimation. Then let

$r_j = \dfrac{\sum_{t=j+1}^{T} \hat\varepsilon_t \hat\varepsilon_{t-j}}{\sum_{t=1}^{T} \hat\varepsilon_t^2}$

Let the null hypothesis be that the specification is correct, that is, $\varepsilon_t \sim \mathrm{iid}(0, \sigma^2)$ in

$\phi(L)\,\Delta_S^d x_t = \theta(L)\,\varepsilon_t$

Box and Pierce (1970) show that in this case

$Q = T \sum_{j=1}^{m} r_j^2 \to_d \chi^2_{m-p-q}$

Ljung and Box (1978) suggest a small-sample correction:

$Q^* = T(T+2) \sum_{j=1}^{m} \dfrac{r_j^2}{T-j}$

Since $T \to \infty$ while $m$ is fixed, note that this has the same asymptotic distribution as $Q$. Simulations suggest that its small-sample distribution is closer to the limit case.
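The Ljung-Box correction is a one-line change to the Q computation. A sketch, using the residual autocorrelation formula above (no demeaning, since residuals have mean near zero by construction; `ljung_box_q` is an illustrative name):

```python
def ljung_box_q(resid, m):
    """Ljung-Box statistic Q* = T(T+2) * sum_j r_j^2 / (T-j), where r_j is the
    residual autocorrelation; chi^2(m-p-q) under a correct specification."""
    T = len(resid)
    denom = sum(e * e for e in resid)
    q = 0.0
    for j in range(1, m + 1):
        r_j = sum(resid[t] * resid[t + j] for t in range(T - j)) / denom
        q += r_j ** 2 / (T - j)
    return T * (T + 2) * q
```

As T grows with m fixed, the weights T(T+2)/(T-j) approach T, so Q* and Q share the same limit distribution.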
Testing for Nonlinear Model Features

Suppose the model has uncorrelated disturbances. Does this mean that they are independent? Nonlinear dependence (given uncorrelatedness of the levels) can be tested by the Q statistic computed for the squares of the series,

$Q = T \sum_{j=1}^{m} r_j^2$ where $r_j = \dfrac{\sum_{t=j+1}^{T}(\hat\varepsilon_t^2 - \hat\sigma^2)(\hat\varepsilon_{t-j}^2 - \hat\sigma^2)}{\sum_{t=1}^{T}(\hat\varepsilon_t^2 - \hat\sigma^2)^2}$

This statistic has the $\chi^2_m$ distribution in the limit, when the disturbances are independent (see McLeod and Li (1983), Li and Mak (1994)).

If the series is autocorrelated, this test will tend to reject whether or not there is nonlinear dependence. Thus, the test should be performed only if the usual Q test does not reject. At least fourth moments are required to exist for the test to be valid.
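The same Q computation applied to the squared residuals can be sketched as follows (the function name `mcleod_li_q` is ours; the squares are demeaned by their sample mean, as in the formula above):

```python
def mcleod_li_q(resid, m):
    """Q statistic computed on squared residuals; chi^2(m) in the limit when
    the disturbances are independent (McLeod-Li test)."""
    T = len(resid)
    s = [e * e for e in resid]           # squared residuals
    sbar = sum(s) / T                    # estimate of sigma^2
    denom = sum((v - sbar) ** 2 for v in s)
    q = 0.0
    for j in range(1, m + 1):
        r_j = sum((s[t] - sbar) * (s[t + j] - sbar) for t in range(T - j)) / denom
        q += r_j ** 2
    return T * q
```

A large value signals dependence in the squares (e.g. ARCH effects) even when the levels are uncorrelated.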
Consistent Model Selection Criteria

A consistent criterion is one that selects the correct model with probability 1 as $T \to \infty$, when this is one of the alternatives examined. Suppose there are $M$ models under consideration, indexed by $k = 1, \ldots, M$. Let the maximized (quasi-) log-likelihood be

$L_T = -\dfrac{T}{2}\log\left(T^{-1}\sum_{t}\hat\varepsilon_t^2\right)$

Let $L_{kT}$ = maximized log-likelihood for model $k$, and $p_k$ = number of fitted parameters in model $k$. If we simply selected the model $k$ giving the largest $L_{kT}$, we should tend to over-fit, by including too many parameters. Some penalty for allowing extra parameters is necessary. Hence, for some sequence $r_T$, choose $k$ to maximize

$A_{kT} = L_{kT} - p_k r_T$

For consistency we require $r_T \to \infty$ as $T \to \infty$, but $r_T/T \to 0$.
Let L k plimt 1 L kt. T If the true model (case k m) is among those considered, we should find L k L m for all k. A consistent criterion should choose the case k m with probability 1asT. Cases: 1. If the kth model is incorrect in the sense L k L m then A mt pr L m 1. A kt L k 2. Suppose L k L m but p k p m (kth model correct but over-parameterized). By assumption, L kt L mt O p 1 while L kt O p T. Hence, A mt A kt L mt p m r T p 1 r k p m T L kt p k r T L kt p k r T 1 with probability 1asT. O p T 1 Note: the second r.h.s. term is O p r T /T and dominates the last one as T. In each case, the false model is rejected w.p. 1asT. Pmodel m selected can be made as near 1 as desired by taking T large enough. (c) James Davidson 2014 5.7 09/05/2014
Three popular criteria:

1. Akaike criterion: maximize $L_T - k$
2. Schwarz criterion: maximize $L_T - \frac{1}{2}k\log T$
3. Hannan-Quinn criterion: maximize $L_T - k\log\log T$

When the number of parameters in the true model is finite, and it is included in the models compared:

The Akaike criterion is not consistent. The Schwarz and H-Q criteria are consistent. The Schwarz criterion favours the most parsimonious models. The Akaike criterion tends to select larger (less parsimonious) models than the other two. It is argued that the Akaike criterion can be viewed as consistent when the class of models includes infinite-parameter cases (e.g. AR($\infty$)).

Caution: some books/software packages reverse the signs of the criteria, so that the rule is to choose the model that minimises the criterion. There is no accepted convention in this respect - read the small print!
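The three penalties can be compared numerically. A sketch in the "maximize" form used here, where larger is better (the log-likelihood value, k and T below are hypothetical):

```python
import math

def criteria(loglik, k, T):
    """Return (Akaike, Schwarz, Hannan-Quinn) criterion values in 'maximize'
    form: the maximized log-likelihood minus the respective penalty."""
    return (loglik - k,
            loglik - 0.5 * k * math.log(T),
            loglik - k * math.log(math.log(T)))

# Hypothetical example: a 3-parameter model on a sample of T = 200.
a, s, h = criteria(-100.0, 3, 200)
```

For k = 3 and T = 200 the Schwarz penalty is the harshest and Akaike's the lightest, so s < h < a, which is why Schwarz favours parsimony while Akaike tends to select larger models.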
Forecasting: General Principles

Given information represented by the sigma-field $F_t$, we wish to forecast a variable $Y$ (e.g. $Y = x_{t+m}$). If $Z_t$ is an $F_t$-measurable forecast function, let the mean squared error of forecast (MSE),

$E(Y - Z_t)^2$

be the preferred loss function. The aim is to choose $Z_t$ to make the loss as small as possible. Note, other loss functions are possible - this one is symmetric, weighting positive and negative errors equally. As we know, setting $Z_t$ equal to $E(Y\,|\,F_t)$ gives the minimum MSE forecast. Therefore, consider the ARMA model:

$x_{t+m} = \phi_1 x_{t+m-1} + \cdots + \phi_p x_{t+m-p} + \varepsilon_{t+m} + \theta_1 \varepsilon_{t+m-1} + \cdots + \theta_q \varepsilon_{t+m-q}$

Note that

$E(x_{t+m}\,|\,F_t) = \phi_1 E(x_{t+m-1}\,|\,F_t) + \cdots + \phi_p E(x_{t+m-p}\,|\,F_t) + \theta_m \varepsilon_t + \cdots + \theta_q \varepsilon_{t+m-q}$

The last terms vanish if $m > q$, and $E(x_{t+m-j}\,|\,F_t) = x_{t+m-j}$ if $j \geq m$.
Forecasting using the ARIMA

Let hats denote estimated parameters and residuals. The forecast of period $T+m$ from date $T$, for $m = 1, 2, 3, \ldots$, is

$\hat x_{T+m|T} = \sum_{j=1}^{m-1} \hat\phi_j \hat x_{T+m-j|T} + \sum_{j=m}^{p} \hat\phi_j x_{T+m-j} + \sum_{j=m}^{q} \hat\theta_j \hat\varepsilon_{T+m-j}$

(with the convention $\hat\phi_j = 0$ for $j > p$). The first sum is empty if $m = 1$. The second sum is empty if $m > p$. The third sum is empty if $m > q$.

If $d = 1$: fit the ARMA in the differences, and compute the forecasts as

$\hat x_{T+m|T} = x_T + \sum_{j=1}^{m} \widehat{\Delta x}_{T+j|T}$

Note that in this case the intercept is the coefficient of the linear trend (drift).
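The forecast recursion above (future values replaced by their own forecasts, future shocks set to zero) can be sketched in plain Python; `arma_forecast` is an illustrative name, and the inputs are a fitted model's coefficient lists and residuals:

```python
def arma_forecast(x, resid, phi, theta, m):
    """m-step-ahead ARMA(p,q) forecasts: future x's are replaced by their
    forecasts, and future shocks by zero, as in the recursion above."""
    p, q = len(phi), len(theta)
    hist_x = list(x)          # observed x_1..x_T, extended with forecasts
    hist_e = list(resid)      # residuals, extended with zero future shocks
    fcasts = []
    for _ in range(m):
        hist_e.append(0.0)    # the unknown future shock
        f = sum(phi[j] * hist_x[-1 - j] for j in range(p)) \
          + sum(theta[j] * hist_e[-2 - j] for j in range(q))
        hist_x.append(f)
        fcasts.append(f)
    return fcasts
```

For an AR(1) the forecasts decay geometrically towards zero (the mean of the demeaned process); for a pure MA(q), forecasts beyond horizon q are exactly zero.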
Forecast Confidence Intervals

The forecast error variances can be calculated from the MA($\infty$) form of the model. Ignoring errors in the parameter estimates, the difference between the $m$-period-ahead forecast and the out-turn is found by putting the unknown future shocks to zero.

Case 1: $d = 0$.

$x_{T+m} = a + \dfrac{\theta(L)}{\phi(L)}\varepsilon_{T+m} = a + \varepsilon_{T+m} + b_1 \varepsilon_{T+m-1} + b_2 \varepsilon_{T+m-2} + \cdots$

(say, defining $b_1, b_2, \ldots$). The forecast of $x_{T+m}$ (assuming parameters known) is

$\hat x_{T+m|T} = a + b_m \varepsilon_T + b_{m+1}\varepsilon_{T-1} + b_{m+2}\varepsilon_{T-2} + \cdots$

Hence, the forecast error is

$f_{T+m|T} = x_{T+m} - \hat x_{T+m|T} = \varepsilon_{T+m} + b_1 \varepsilon_{T+m-1} + \cdots + b_{m-1}\varepsilon_{T+1}$

Assuming that $\varepsilon_t \sim \mathrm{iid}(0, \sigma^2)$, note that

$E f_{T+m|T} = 0$ (unbiased forecasts)
$\mathrm{Var}(f_{T+m|T}) = \sigma^2(1 + b_1^2 + \cdots + b_{m-1}^2)$
Case 2: $d = 1$. Then

$x_{T+m} = x_T + \Delta x_{T+1} + \cdots + \Delta x_{T+m}$

and hence

$x_{T+m} - \hat x_{T+m|T} = \varepsilon_{T+m} + (1 + b_1)\varepsilon_{T+m-1} + \cdots + (1 + b_1 + \cdots + b_{m-1})\varepsilon_{T+1}$

and

$\mathrm{Var}(f_{T+m|T}) = \sigma^2(1 + c_1^2 + \cdots + c_{m-1}^2)$ where $c_j = 1 + \sum_{i=1}^{j} b_i$

Notice the difference between the two cases. If $d = 0$, the forecast error variance tends as $m \to \infty$ to the unconditional variance of the process:

$\mathrm{Var}(f_{T+m|T}) \to \sigma^2 \sum_{j=0}^{\infty} b_j^2$ (with $b_0 = 1$)

If $d = 1$, then $\mathrm{Var}(f_{T+m|T}) = O(m)$.
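The $b_j$ weights and the two variance formulas can be computed with the standard recursion for the MA($\infty$) coefficients of an ARMA model (a sketch; `psi_weights` and `fe_variance` are illustrative names, and sigma^2 is normalised to 1 by default):

```python
def psi_weights(phi, theta, n):
    """First n MA(infinity) weights b_0 = 1, b_1, ... of an ARMA(p,q),
    via the recursion b_j = theta_j + sum_i phi_i * b_{j-i}."""
    b = [1.0]
    for j in range(1, n):
        bj = theta[j - 1] if j <= len(theta) else 0.0
        bj += sum(phi[i] * b[j - 1 - i] for i in range(min(len(phi), j)))
        b.append(bj)
    return b

def fe_variance(b, m, sigma2=1.0, d=0):
    """m-step forecast error variance: d=0 uses the b_j directly,
    d=1 uses the cumulated weights c_j = 1 + b_1 + ... + b_j."""
    if d == 0:
        w = b[:m]
    else:
        w = [sum(b[:j + 1]) for j in range(m)]
    return sigma2 * sum(v * v for v in w)
```

With an AR(1) coefficient of 0.5, the d = 0 variance converges as m grows, while the d = 1 variance keeps increasing roughly linearly in m.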
State Space Modelling

A formalized representation of model dynamics. Let $y_t$ ($m \times 1$) represent a vector of observed variables, and $\alpha_t$ ($r \times 1$) an unobserved state vector. The evolution of $y_t$ is described by two equations:

State equation (or transition equation):
$\alpha_t = T\alpha_{t-1} + v_t, \quad Ev_t = 0, \quad Ev_t v_t' = Q$

Measurement equation:
$y_t = H\alpha_t + w_t, \quad Ew_t = 0, \quad Ew_t w_t' = R$

Also assume $Ev_t w_t' = 0$. Optionally, the matrices $T$, $H$, $Q$ and $R$ can be time-dependent, and receive $t$ subscripts. Optionally, either the measurement equation or the state equation can include explanatory variables ("control variables") treated as exogenous. An important motivation for the state-space form is that almost any linear time series model can be cast into this form.
Example: the ARMA(p,q). Let $m = 1$, $r = \max(p, q+1)$, and consider the ARMA$(r, r-1)$ case: extend the AR or MA orders as required, by specifying zero coefficients. Defining a scalar state variable $z_t$, put

$\alpha_t = \begin{pmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-r+1} \end{pmatrix}, \quad T = \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_{r-1} & \phi_r \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \quad v_t = \begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$

(basically, an application of the companion form) and also

$H = (1, \theta_1, \ldots, \theta_{r-1}), \quad R = 0, \text{ hence } w_t = 0$

Note that the state equation effectively takes the form $\phi(L) z_t = \varepsilon_t$. The system resolves as

$y_t = H\alpha_t = \theta(L) z_t$

or $\phi(L) y_t = \theta(L)\varepsilon_t$, where $\theta_0 = 1$.
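Constructing the companion-form matrices for given coefficient lists is mechanical. A sketch using plain Python lists of rows (`arma_state_space` is our name for the helper):

```python
def arma_state_space(phi, theta):
    """Companion-form matrices (T, H) for an ARMA(p,q), with r = max(p, q+1)
    and the coefficient vectors padded with zeros as on the slide."""
    r = max(len(phi), len(theta) + 1)
    ph = phi + [0.0] * (r - len(phi))            # phi_1 ... phi_r
    th = theta + [0.0] * (r - 1 - len(theta))    # theta_1 ... theta_{r-1}
    T = [[0.0] * r for _ in range(r)]
    T[0] = ph[:]                  # first row holds the AR coefficients
    for i in range(1, r):
        T[i][i - 1] = 1.0         # shifted identity below the first row
    H = [1.0] + th                # H = (1, theta_1, ..., theta_{r-1})
    return T, H
```

For an ARMA(1,1), r = 2 and the state vector is $(z_t, z_{t-1})'$, so T is 2 x 2 with first row $(\phi_1, 0)$.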
The Kalman Filter

This is a computer algorithm for solving a state space model recursively. It has uses in model simulation and forecasting. It can provide a convenient vehicle for evaluating the likelihood function in dynamic models, generating the sequence of residuals from the data. It is popular in applied work, since numerous software packages are available to compute it, having a wide range of applications.

Consider the above state-space model. Suppose we wish to predict the sequence of states $\alpha_t$, given a sequence of observations $y_1, \ldots, y_t$. Assume that the matrices $T$, $H$, $Q$ and $R$ are fixed and known. (In the general formulation, they can depend on $t$.) Let the estimator of the state vector $\alpha_t$ at time $t$ be $a_t$. Let the covariance matrix of the state estimator be

$\sigma^2 P_t = E(a_t - \alpha_t)(a_t - \alpha_t)'$
Given $a_{t-1}$ and $P_{t-1}$, the basic Kalman recursion for period $t$ is now as follows:

Prediction step:
$a_{t|t-1} = Ta_{t-1}$
$P_{t|t-1} = TP_{t-1}T' + Q$

Updating step:
$a_t = a_{t|t-1} + P_{t|t-1}H'(HP_{t|t-1}H' + R)^{-1}(y_t - Ha_{t|t-1})$
$P_t = P_{t|t-1} - P_{t|t-1}H'(HP_{t|t-1}H' + R)^{-1}HP_{t|t-1}$

The matrix $K_t = P_{t|t-1}H'(HP_{t|t-1}H' + R)^{-1}$ is called the Kalman gain. Starting the iterations at initial values $a_0$ and $P_0$ satisfying the steady-state condition $P_0 = TP_0T' + Q$, the filter provides a sequence of one-step-ahead estimates of the state. The Kalman formulae are derived to minimise the mean squared error $\sigma^2 P_t$, conditional on information up to date $t$. The forecast of $y_t$, given information up to date $t-1$, is then $\hat y_{t|t-1} = Ha_{t|t-1}$. To forecast multiple steps ahead, simply iterate the prediction steps without updating. A smoothing recursion can also be used to estimate $\alpha_1, \ldots, \alpha_T$ from the full sample $y_1, \ldots, y_T$.
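For the scalar case ($m = r = 1$) the recursion reduces to a few lines. A sketch, with `_`-suffixed arguments standing in for the scalar system quantities T, H, Q, R:

```python
def kalman_scalar(y, T_, H_, Q_, R_, a0, P0):
    """Scalar Kalman filter implementing the recursion above.
    Returns the one-step-ahead predictions H*a_{t|t-1} and filtered states a_t."""
    a, P = a0, P0
    preds, states = [], []
    for yt in y:
        # prediction step
        a_pred = T_ * a
        P_pred = T_ * P * T_ + Q_
        preds.append(H_ * a_pred)
        # updating step, via the Kalman gain K
        K = P_pred * H_ / (H_ * P_pred * H_ + R_)
        a = a_pred + K * (yt - H_ * a_pred)
        P = P_pred - K * H_ * P_pred
        states.append(a)
    return preds, states
```

A useful sanity check: with R = 0 (no measurement noise) and H = 1, the gain is 1 and the filtered state tracks the observations exactly.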
Illustration - the EWMA (exponentially weighted moving average)

Consider the scalar Gaussian state-space model, sometimes called a local level model:

$\alpha_t = \alpha_{t-1} + \eta_t, \quad \eta_t \sim NID(0, q\sigma^2)$
$y_t = \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim NID(0, \sigma^2)$

so that in this case $T = 1$ and $H = 1$, $R = 1$ and $Q = q$ (all scalars). Note that this is an ARIMA(0,1,1). We may solve it as $\Delta y_t = \eta_t + \Delta\varepsilon_t$. Observe that

$\gamma_0 = E(\eta_t + \Delta\varepsilon_t)^2 = (2 + q)\sigma^2$

whereas

$\gamma_j = E(\eta_t + \Delta\varepsilon_t)(\eta_{t-j} + \Delta\varepsilon_{t-j}) = \begin{cases} -\sigma^2 & j = 1 \\ 0 & j > 1 \end{cases}$

Hence $\Delta y_t$ is an MA(1) process, and the MA coefficient $\theta$ is found as a function of $q$ by solving the quadratic

$\dfrac{\theta}{1 + \theta^2} = \dfrac{\gamma_1}{\gamma_0} = -\dfrac{1}{2 + q}$
Kalman Recursion

To start the recursion set $a_1 = y_1$ and $P_1 = E(y_1 - \alpha_1)^2/\sigma^2 = 1$. The stationary form of the updating equation for $P_t$ is

$P = (P + q) - \dfrac{(P + q)^2}{1 + P + q}$

which simplifies to $P^2 + Pq = q$, with solutions

$P = \tfrac{1}{2}\left(-q \pm \sqrt{q^2 + 4q}\right)$

where the positive solution is the relevant one. The prediction and updating equations therefore yield the steady-state form

$a_t = a_{t-1} + \dfrac{P + q}{1 + P + q}(y_t - a_{t-1})$

which rearranges as

$a_t = (1 - \lambda)a_{t-1} + \lambda y_t$ where $\lambda = \dfrac{P + q}{1 + P + q}$

It is easy to see that this recursion has the solution

$a_t = \lambda \sum_{j=0}^{t-1} (1 - \lambda)^j y_{t-j}$

hence "EWMA".
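The steady-state solution and the resulting EWMA recursion can be checked numerically (a sketch; the function names are ours, not part of any library):

```python
import math

def ewma_weight(q):
    """Steady-state Kalman quantities for the local level model:
    P is the positive root of P^2 + P*q = q, and the EWMA weight is
    lambda = (P + q)/(1 + P + q)."""
    P = 0.5 * (-q + math.sqrt(q * q + 4.0 * q))
    lam = (P + q) / (1.0 + P + q)
    return P, lam

def ewma(y, lam):
    """Steady-state recursion a_t = (1 - lam) a_{t-1} + lam y_t, a_1 = y_1."""
    a = y[0]
    out = [a]
    for v in y[1:]:
        a = (1.0 - lam) * a + lam * v
        out.append(a)
    return out
```

As q grows (a noisy level relative to the measurement error), lambda approaches 1 and the filter tracks the latest observation closely; as q shrinks, lambda falls and the filter smooths more heavily.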
Comment

The Kalman filter has proved a very important tool in engineering applications, such as rocketry. (The Apollo missions were its first major application.) It is also much promoted as a method for econometrics. It is therefore worth bearing in mind:

The Kalman filter is just a tool for doing certain recursive calculations, especially in ARIMA-type models. It's not a modelling paradigm of itself. There is nothing the Kalman filter can do that cannot be done by more direct means, coded for the problem at hand. It cannot handle nonlinear time series models, or long memory models - it is ultimately limited in scope. I'm not sure that it deserves the prominence it often receives in econometric time series syllabuses. (A personal view!)