Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7)

Key here: after achieving stationarity, identify the dependence structure (and use it for forecasting). (overshort example)

Let Z_t be the stationary time series after transforming (including estimating out time trends and other covariates) the original time series Y_1, ..., Y_n. White-noise H_0: no dependence structure remains (all autocorrelations are zero).

Autocorrelation function (ACF or SACF): measures the linear association between time series observations separated by a lag of m time units:

    r_m = \frac{\sum_{t=b}^{n-m} (Z_t - \bar{Z})(Z_{t+m} - \bar{Z})}{\sum_{t=b}^{n} (Z_t - \bar{Z})^2},   \bar{Z} = \frac{\sum_{t=b}^{n} Z_t}{n - b + 1}

SE of r_m is

    S_{r_m} = \sqrt{\frac{1 + 2 \sum_{l=1}^{m-1} r_l^2}{n - b + 1}}   (b = 1 unless differencing is used)

Call r_m the sample autocorrelation function: SACF(m) or ACF(m). Sometimes used: t_{r_m} = r_m / S_{r_m}.

Autocorrelation plot (or SAC): used to determine stationarity and to identify MA(q) dependence structure (SAS code to produce these plots follows below)
- bar-plot r_m vs. m for various lags m (sketch)
- lines often added to represent \pm 2 SE's (sketch): rough 95% confidence intervals
- if r_m is more than 2 SE's away from zero, consider it significant
- equivalently, compare t_{r_m} to 2 (for low lags m <= 3, use 1.6, because the low lags are the most important to pick up)

SAC terminology:
- spike: r_m is significant
- cuts off: no significant spikes after r_m
- dies down: decreases in a steady fashion
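A minimal SAS sketch of how such plots are requested (the data set name a1 and series name Z are placeholders, not from the handout):

    proc arima data=a1;
      identify var=Z nlag=24;  /* prints the SACF, SPACF, and IACF for lags 1-24 */
    run;

With ODS Graphics enabled, IDENTIFY also produces the bar-style autocorrelation plots with the 2-SE reference bands described above.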
SAC & stationarity:
1. Z_t is stationary if the SAC either cuts off fairly quickly or dies down fairly quickly; sometimes it dies down in a damped exponential fashion (sketches)
2. Z_t is nonstationary if the SAC dies down extremely slowly (sketch)
3. Ch. 9 of Bowerman & O'Connell: "... if the SAC cuts off fairly quickly, it will often [but not always] do so after a lag k that is less than or equal to 2."

MA(q) dependence structure: moving average process of order q

model:  Z_t = \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - ... - \theta_q a_{t-q}
- Z_t: stationary transformed time series
- a_t: random shocks
- \theta_i: unknown parameters
- \delta: unknown parameter [only included if \bar{Z} is statistically different from 0]

identify using SAC: the first q terms of the SAC will be non-zero, then drop to zero (sketch)

Partial Autocorrelation Function (PACF or SPACF): the autocorrelation of time series observations separated by a lag of m, with the effects of the intervening observations eliminated (a worked m = 2 case follows below):

    r_{1,1} = r_1   (for m = 1)

    r_{m,m} = \frac{r_m - \sum_{l=1}^{m-1} r_{m-1,l} \, r_{m-l}}{1 - \sum_{l=1}^{m-1} r_{m-1,l} \, r_l}   (for m >= 2)

where r_m = SACF(m) and r_{m,l} = r_{m-1,l} - r_{m,m} r_{m-1,m-l} for l = 1, ..., m-1.

SE of r_{m,m} is S_{r_{m,m}} = 1 / \sqrt{n - b + 1} (b = 1 unless differencing is used).

Call r_{m,m} the sample partial autocorrelation function: SPACF(m) or PACF(m). Sometimes used: t_{r_{m,m}} = r_{m,m} / S_{r_{m,m}}.

The Inverse Autocorrelation Function (IACF) is similar to the PACF, and rarely discussed.
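Worked m = 2 step of the recursion above (the numbers are hypothetical, for illustration only):

    r_{2,2} = \frac{r_2 - r_{1,1} r_1}{1 - r_{1,1} r_1} = \frac{r_2 - r_1^2}{1 - r_1^2}

so with, say, r_1 = 0.6 and r_2 = 0.4, r_{2,2} = (0.4 - 0.36)/(1 - 0.36) = 0.0625: most of the lag-2 autocorrelation is explained by the intervening observation.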
Partial Autocorrelation Plot (or SPAC)
- bar-plot r_{m,m} vs. m for various lags m (sketch)
- lines often added to represent \pm 2 SE's; compare t_{r_{m,m}} to 2 (sketch)
- use the SPAC to identify AR(p) dependence structure; terminology as before: spike, cuts off, and dies down
- the first p terms of the SPAC will be non-zero, then drop to zero (sketch)

AR(p) dependence structure: autoregressive process of order p (a SAS sketch for fitting AR(1) follows below)

recall the [special case] first-order autocorrelation: \epsilon_t = \phi \epsilon_{t-1} + a_t; generalize to account for an error dependence structure where the current time series value depends on past values:

    \epsilon_t = \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + ... + \phi_p \epsilon_{t-p} + a_t

More common representation for AR(p):

    Z_t = \delta + \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + ... + \phi_p Z_{t-p} + a_t

- the \phi_i are unknown parameters; random shocks a_t iid N(0, \sigma^2)
- \delta = \mu (1 - \phi_1 - ... - \phi_p), where \mu = E[Z_t]
- if the Z_t are residuals, \mu = 0, so it is common to assume \delta = 0
- special case: Random Walk Model, Z_t = Z_{t-1} + a_t
- AR(1) is a discrete-time Markov chain with continuous state space (the distribution at time t depends only on the state at time t - 1)

(GE investment example)

Some convenient notation: the backshift operator

    B Z_t = Z_{t-1}
    B^2 Z_t = B(B Z_t) = B Z_{t-1} = Z_{t-2}

AR(p):  Z_t = \delta + \phi_1 Z_{t-1} + ... + \phi_p Z_{t-p} + a_t
            = \delta + \phi_1 B Z_t + ... + \phi_p B^p Z_t + a_t
            = \delta + (\phi_1 B + ... + \phi_p B^p) Z_t + a_t
        so  (1 - \phi_1 B - ... - \phi_p B^p) Z_t - \delta = a_t

MA(q):  Z_t = \delta + a_t - \theta_1 a_{t-1} - ... - \theta_q a_{t-q}
            = \delta + a_t - \theta_1 B a_t - ... - \theta_q B^q a_t
            = \delta + (1 - \theta_1 B - ... - \theta_q B^q) a_t
        so  (1 - \theta_1 B - ... - \theta_q B^q)^{-1} (Z_t - \delta) = a_t
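A minimal SAS sketch for fitting the AR(1) special case (data set a1 and series Z are assumed names, as before):

    proc arima data=a1;
      identify var=Z;               /* SAC dies down, SPAC cuts off after lag 1 => try AR(1) */
      estimate p=1 method=uls plot; /* fits Z_t = delta + phi_1 Z_{t-1} + a_t by ULS        */
    run;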
Interpreting and representing common dependence structures:

AR(p) Autoregressive Process: current & future values (of Z_t) depend on the previous p time series values (Z_t):

    (1 - \phi_1 B - ... - \phi_p B^p) Z_t - \delta = a_t

MA(q) Moving Average Process: current & future values (of Z_t) depend on the previous q random shocks (a_t) (less intuitive):

    (1 - \theta_1 B - ... - \theta_q B^q)^{-1} (Z_t - \delta) = a_t

(Gas price example)

ARMA(p,q) dependence structure: mixed autoregressive-moving average model. If AR and MA alone aren't enough, what about a composite (AR and MA) model?

    Z_t = \delta + \underbrace{\phi_1 Z_{t-1} + ... + \phi_p Z_{t-p}}_{AR(p)} + \underbrace{a_t - \theta_1 a_{t-1} - ... - \theta_q a_{t-q}}_{MA(q)}

in backshift notation, ARMA(p,q):

    (1 - \phi_1 B - \phi_2 B^2 - ... - \phi_p B^p) Z_t = \delta + (1 - \theta_1 B - \theta_2 B^2 - ... - \theta_q B^q) a_t
    (1 - \theta_1 B - ... - \theta_q B^q)^{-1} [ (1 - \phi_1 B - ... - \phi_p B^p) Z_t - \delta ] = a_t

Invertibility: a model assumption (in addition to stationarity); intuitively, the weights (\phi_l & \theta_l) on past observations decrease for larger l (spelled out for MA(1) after the table below).

Common Dependence Structures for Stationary Time Series

    Model      SAC                                   SPAC
    MA(1)      cuts off after lag 1                  dies down, dominated by damped exponential decay
    MA(2)      cuts off after lag 2                  dies down, in mixture of damped exp. decay & sine waves
    AR(1)      dies down in damped exponential decay cuts off after lag 1
    AR(2)      dies down, in mixture of damped       cuts off after lag 2
               exp. decay & sine waves
    ARMA(1,1)  dies down in damped exp. decay        dies down in damped exp. decay
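What invertibility buys, spelled out for MA(1) (a standard result, stated here for illustration; it is not derived in the handout): when |\theta_1| < 1, the MA(1) model can be inverted to an infinite autoregression,

    Z_t - \delta = -\sum_{j=1}^{\infty} \theta_1^{j} (Z_{t-j} - \delta) + a_t,

so the weight \theta_1^j on an observation j periods back decays geometrically, matching the intuition above. (Likewise, AR(1) is stationary when |\phi_1| < 1.)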
Several approaches exist to estimate the \phi_l's, \theta_l's, and \beta_j's, and to deal with the initial lag:
- ULS (unconditional least squares): MA(q) & AR(p); also called nonlinear least squares; minimizes the SS error
- YW (Yule-Walker): AR(p); generalized least squares, using OLS residuals to estimate the covariance across observations

ARIMA(p,d,q) dependence structure: Autoregressive Integrated Moving Average model; a very flexible family of models, useful for prediction

Recall differencing:
- First difference: Z_t = Y_t - Y_{t-1}, t = 2, ..., n
- Second difference: W_t = Z_t - Z_{t-1} = Y_t - 2 Y_{t-1} + Y_{t-2}, t = 3, ..., n
- Pros: helps make a time series more stationary (handles "stubborn" trends)
- Cons: can destroy cyclic behavior (harder to forecast)
- Useful when transformations and the addition of time-related predictors (low-order polynomial, trigonometric, dummy) do not make the time series stationary

After differencing, AR and MA dependence structures may exist: ARIMA(p,d,q)
- p: AR(p) (value at time t depends on the previous p values)
- d: # of differences (need to take the d-th difference to make the series stationary)
- q: MA(q) (value at time t depends on the previous q random shocks)
- use the SAC and SPAC to select p and q; but how to select d? usually look at plots of the time series and choose the lowest d that makes it stationary (also the SAC)
- recall backshift notation; for d = 1: Z_t = Y_t - Y_{t-1} = Y_t - B Y_t = (1 - B) Y_t; general d: Z_t = (1 - B)^d Y_t (the d = 2 case is expanded below)

ARIMA Forecasting & Goodness of Fit (note the order in which SAS does this):

    \underbrace{(1 - B)^d}_{Differencing} Y_t = \underbrace{\beta_0 + \beta_1 X_{t,1} + ... + \beta_{k-1} X_{t,k-1}}_{Linear\ Model} + \underbrace{(1 - \phi_1 B - ... - \phi_p B^p)^{-1}}_{Autoregressive} \underbrace{(1 - \theta_1 B - ... - \theta_q B^q)}_{Moving\ Average} \underbrace{a_t}_{Independence},   a_t iid N(0, \sigma^2)

(given p, d, and q, SAS estimates the \beta_j's, \phi_l's, and \theta_l's)
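As a quick check of the backshift form (simple algebra, included for illustration), expanding (1 - B)^2 recovers the second difference defined above:

    (1 - B)^2 Y_t = (1 - 2B + B^2) Y_t = Y_t - 2 Y_{t-1} + Y_{t-2} = W_t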
ARIMA(p,d,q) model rewritten, with t = 1, ..., n:

    Y_t = g_1(Y_1, ..., Y_{t-1}) + g_2(X_{t,1}, ..., X_{t,k-1}) + g_3(a_1, ..., a_t)

where
- g_1 = linear combination (LC) of previous observations (Differencing)
- g_2 = LC of predictors at time t, in terms of the parameters \beta_j (Linear Model)
- g_3 = function of the random shocks, in terms of the parameters \phi_l & \theta_l (AR & MA dependence structures)

Fitting the model gives estimates & standard errors for the \beta_j's, \phi_l's, & \theta_l's, plus diagnostics for goodness of fit.

Predicted values (point forecast from the Box-Jenkins model, even for times t > n):

    \hat{Y}_t = \hat{g}_1(Y_1, ..., Y_{t-1}) + \hat{g}_2(X_{t,1}, ..., X_{t,k-1}) + \hat{g}_3(\hat{a}_1, ..., \hat{a}_t)

- in g_1: estimate Y_l with \hat{Y}_l if there is no observation at time l (l > n)
- in g_2: estimate \beta_j with b_j
- in g_3: estimate \phi_l & \theta_l with \hat{\phi}_l & \hat{\theta}_l
- the predictors Y_1, ..., Y_{t-1}, X_{t,1}, ..., X_{t,k-1} are related, so multicollinearity is present

Numerical diagnostics: what are S and Q^* like for a better model? (both smaller)

Standard Error: a measure of overall fit; in SAS: "Std Error Estimate":

    S = \sqrt{ \frac{\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}{n - n_p} },   n_p = # of parameters in the model

Ljung-Box statistic: diagnostic checking; in SAS: the lag-6 \chi^2 in the "Autocorrelation Check of Residuals"; the residuals reflect the model assumptions; checks the adequacy of the overall Box-Jenkins model (for these data; a numeric illustration follows below):

    Q^* = n' (n' + 2) \sum_{m=1}^{M} (n' - m)^{-1} r_m^2(\hat{a})   (this is the Ljung-Box statistic)

- n' = n - d, where d = degree of differencing
- r_m(\hat{a}) = RSAC: the sample autocorrelation of the residuals at lag m
- M = somewhat arbitrary number of lags to consider, usually a multiple of 6
- basic idea: look at the local dependence among the residuals in the first M sample autocorrelations

Under H_0: the model is adequate, Q^* ~ \chi^2_{M - n_p} (sketch)
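Numeric illustration of the Ljung-Box check (hypothetical numbers, not from the handout): with first differencing (d = 1), n = 100 observations, M = 12 lags, and n_p = 3 estimated parameters (say \delta, \phi_1, \theta_1):

    n' = n - d = 99,   Q^* = 99 \cdot 101 \sum_{m=1}^{12} (99 - m)^{-1} r_m^2(\hat{a}),

compared against \chi^2_{12-3} = \chi^2_9; a small Q^* (large p-value) is consistent with an adequate model.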
Graphical assessment of model adequacy: what dependence structure remains? Look at the Residual Sample Autocorrelation (RSAC) & Residual Sample Partial Autocorrelation (RSPAC) plots.

Interval Forecasting

recall: Estimate \pm (Critical Value) x (Standard Error); note: get the Critical Value from the sampling distribution (sketch)
- get the point forecast \hat{Y}_t = \hat{Y}_{n+\tau} (most interested in \tau > 0)
- SE_{n+\tau}: the standard error of \hat{Y}_{n+\tau}; depends on S; usually, smaller S means smaller SE_{n+\tau}

    \hat{Y}_{n+\tau} \pm t_{n - n_p}(1 - \alpha/2) \cdot SE_{n+\tau}

Based on the historical data and the selected Box-Jenkins model, we are (1 - \alpha)100% confident that the true value of Y at time n + \tau will be inside this interval.

General SAS code for ARIMA(p,d,q), Y in terms of X_1, ..., X_{k-1} (a filled-in instance follows at the end of this section):

    proc arima data=a1;
      identify var=Y(d) crosscorr=(X1 ... Xk-1);
      estimate p=p q=q input=(X1 ... Xk-1) method=uls plot;
      forecast lead=L alpha=a noprint out=fout;
    run;

    option      description
    d, p, q     differencing, AR, & MA settings (as before)
    plot        adds RSAC & RSPAC plots
    L           # of time points after the last observed to forecast
    a           sets the confidence limits; a = .10 gives 90% conf. limits
    noprint     optional, suppresses output
    out=fout    optional, sends forecast data to the fout data set

How models map to the p, q, and d settings:

    AR model                                                                     p
    Z_t = \delta + \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + a_t                         2
    Z_t = \delta + \phi_1 Z_{t-1} + \phi_3 Z_{t-3} + a_t                         (1,3)

    MA model                                                                     q
    Z_t = \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3}  3
    Z_t = \delta + a_t - \theta_1 a_{t-1} - \theta_3 a_{t-3} - \theta_7 a_{t-7}  (1,3,7)

    Differencing                                                                 d
    First:  Z_t = Y_t - Y_{t-1} = (1 - B) Y_t                                    1
    Second: Z_t = (1 - B)(1 - B) Y_t                                             (1,1)
    Lagged: Z_t = Y_t - Y_{t-7}                                                  7

For a = .10, the data set fout will contain the variables Y, forecast, std, l90, u90, residual (what about time or X_1 ... X_{k-1}?)
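A filled-in instance of the template above, as a hedged sketch (the data set a1, response Y, single covariate X1, and the lead of 12 are assumed placeholder choices, not from the handout): an ARIMA(1,1,1) fit with one input:

    proc arima data=a1;
      identify var=Y(1) crosscorr=(X1);             /* d = 1: first difference          */
      estimate p=1 q=1 input=(X1) method=uls plot;  /* ARIMA(1,1,1) with covariate X1   */
      forecast lead=12 alpha=.10 out=fout;          /* 12 steps ahead, 90% conf. limits */
    run;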
(gas price example, continued)

How to get forecast values? Easiest to use backshift notation. Forecasting equation for ARIMA(1,1,1) with one covariate (the start of the forecast recursion is spelled out at the end of this handout):

    (1 - B) Y_t = (\beta_0 + \beta_1 X_t) + (1 - \phi_1 B)^{-1} (1 - \theta_1 B) a_t
    (1 - \phi_1 B)(1 - B) Y_t = (1 - \phi_1 B)(\beta_0 + \beta_1 X_t) + (1 - \theta_1 B) a_t
    (1 - (1 + \phi_1) B + \phi_1 B^2) Y_t = \beta_0 (1 - \phi_1) + \beta_1 (1 - \phi_1 B) X_t + (1 - \theta_1 B) a_t
    Y_t - (1 + \phi_1) Y_{t-1} + \phi_1 Y_{t-2} = \beta_0 (1 - \phi_1) + \beta_1 (X_t - \phi_1 X_{t-1}) + a_t - \theta_1 a_{t-1}

    \hat{Y}_t = \hat{\beta}_0 (1 - \hat{\phi}_1) + \hat{\beta}_1 (X_t - \hat{\phi}_1 X_{t-1}) + \hat{a}_t - \hat{\theta}_1 \hat{a}_{t-1} + (1 + \hat{\phi}_1) Y_{t-1} - \hat{\phi}_1 Y_{t-2}

Note: \hat{a}_t = 0, \hat{a}_l = Y_l - \hat{Y}_l for l < t, and \hat{a}_l = 0 for l > n.

Summary of Box-Jenkins Models: choosing a good model (choice of p, d, & q)
- for useful forecasts, meet the model assumptions
  - stationarity: transform Y, add predictors, differencing (d)
- account for what influences Y:
  - obvious effects: effects of predictors
  - less obvious: dependence structure (SAC & SPAC together)
    - previous values (autoregressive, p)
    - previous errors (moving average, q)
- assess model adequacy
  - RSAC & RSPAC die down quickly; should have nothing left
  - small standard error (S) & small Ljung-Box statistic (Q^*)
- forecast (point & interval) with an adequate model
  - want tight forecasting intervals
  - how far into the future? (t = n + \tau, \tau > 0)
  - good summary / comparison plot: forecast with confidence limits (sketch)

Now a case study.
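Spelling out the start of the forecast recursion referenced above (direct substitution into the \hat{Y}_t equation; no quantities beyond those already defined): for the first future time point t = n + 1, \hat{a}_{n+1} = 0, so

    \hat{Y}_{n+1} = \hat{\beta}_0 (1 - \hat{\phi}_1) + \hat{\beta}_1 (X_{n+1} - \hat{\phi}_1 X_n) - \hat{\theta}_1 \hat{a}_n + (1 + \hat{\phi}_1) Y_n - \hat{\phi}_1 Y_{n-1}

with \hat{a}_n = Y_n - \hat{Y}_n from the fitted model; for \tau >= 2, any unobserved Y_{n+\tau-1}, Y_{n+\tau-2} on the right-hand side are replaced by their own forecasts.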