Decision 411: Class 9
- Presentation/discussion of HW#3
- Introduction to ARIMA models
- Rules for fitting nonseasonal models
- Differencing and stationarity
- "Reading the tea leaves": ACF and PACF plots
- Unit roots
- Estimation issues
- Example: the UNITS series

HW#3 issues
- Data transformations needed?
- How to model seasonality?
- Outliers?
- Lagged effects?
- Comparison of economic impact of promotion variables (sums of coefficients)
- Forecast for next period
Transformations?
- If the dependent variable has only a slight trend, it may be hard to tell if growth is exponential or if seasonality is multiplicative: a log transformation may be unnecessary
- Effects of promotion variables on sales are probably additive if promotional activities are carried out on a per-package basis (rather than mass media)
- Variables cannot be logged if they contain zeroes

Seasonal adjustment?
- A priori seasonal adjustment of the dependent variable in a regression model may be problematic due to effects of independent variables (seasonal adjustment may distort the effects of other variables if those effects do not naturally vary with the season)
- Seasonal adjustment can be performed inside the regression model by using an externally-supplied seasonal index, and perhaps also the product of the seasonal index and the time index, as a separate regressor (or else by using seasonal dummies)
- If the dependent variable is transformed via seasonal adjustment or logging, it will be necessary to untransform the final forecasts. Economic interpretations of coefficients are also affected.
Outliers?
- In general, outliers in data need to be carefully considered for their effects on model estimation and prediction; they should not be removed high-handedly
- Outliers in "random" variables may or may not be expected to occur again in the future
- Outliers in "decision" variables may be the most informative data points (experiments!)
- Do effects remain linear for large values of independent variables? (Residual plots may shed light on this.)

Lagged effects?
- When fitting regression models to time series data, cross-correlation plots may help to reveal lags of independent variables that are useful regressors
- However, lack of significant cross-correlations does not prove that lagged variables won't be helpful in the context of a multiple regression model, where other variables are also present
- If there are a priori reasons for believing that an independent variable may have lagged effects, then those lags probably should be included in a preliminary "all likely suspects" model
- Usually only low-order lags (e.g., lag 1 and lag 2, in addition to lag 0) are important
ARIMA models
- Auto-Regressive Integrated Moving Average
- An adaptation of discrete-time filtering methods developed in the 1930s and 1940s by electrical engineers (Norbert Wiener et al.)
- Statisticians George Box and Gwilym Jenkins developed systematic methods for applying them to business & economic data in the 1970s (hence the name "Box-Jenkins models")

ARIMA models are...
- Generalized random walk models fine-tuned to eliminate all residual autocorrelation
- Generalized exponential smoothing models that can incorporate long-term trends and seasonality
- Stationarized regression models that use lags of the dependent variables and/or lags of the forecast errors as regressors
- The most general class of forecasting models for time series that can be stationarized* by transformations such as differencing, logging, and/or deflating

*A time series is "stationary" if all of its statistical properties (mean, variance, autocorrelations, etc.) are constant in time. Thus, it has no trend, no heteroscedasticity, and a constant degree of "wiggliness."
Why use ARIMA models?
- ARIMA model-fitting tools allow you to systematically explore the autocorrelation pattern in a time series in order to pick the "optimal" statistical model, assuming there is really a stable pattern in the data!
- ARIMA options let you fine-tune model types such as RW, SES, and multiple regression (add trend to SES, add autocorrelation correction to RW or multiple regression)

Construction of an ARIMA model
1. Stationarize the series, if necessary, by differencing (& perhaps also logging, deflating, etc.)
2. Regress the stationarized series on lags of itself and/or lags of the forecast errors, as needed to remove all traces of autocorrelation from the residuals
What ARIMA stands for
- A series which needs to be differenced to be made stationary is an "integrated" (I) series
- Lags of the stationarized series are called "auto-regressive" (AR) terms
- Lags of the forecast errors are called "moving average" (MA) terms

ARIMA terminology
- A non-seasonal ARIMA model can be (almost) completely summarized by three numbers:
  p = the number of autoregressive terms
  d = the number of nonseasonal differences
  q = the number of moving-average terms
- This is called an "ARIMA(p,d,q)" model
- The model may also include a constant term (or not)
ARIMA models we've already met
- ARIMA(0,0,0)+c = mean (constant) model
- ARIMA(0,1,0) = RW model
- ARIMA(0,1,0)+c = RW-with-drift model
- ARIMA(1,0,0)+c = 1st-order AR model
- ARIMA(1,1,0)+c = differenced 1st-order AR model
- ARIMA(0,1,1) = SES model
- ARIMA(0,1,1)+c = SES + trend = RW + correction for lag-1 autocorrelation
- ARIMA(1,1,2) = LES w/ damped trend (leveling off)
- ARIMA(0,2,2) = generalized LES (including Holt's)

The ARIMA "filtering box"
[Diagram: the time series enters a filter with three adjustable knobs, p, d, and q; the output is split into signal (forecasts) and noise (residuals).]
Objective: adjust the knobs until the residuals are white noise (uncorrelated)
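The named special cases above can be collected in a small lookup table. This is purely a mnemonic helper (not part of any statistics library); the names come directly from the list on this slide:

```python
# Hypothetical helper: map (p, d, q, has_constant) to the familiar model names
# listed above.  Purely illustrative, not part of any package.
ARIMA_SPECIAL_CASES = {
    (0, 0, 0, True):  "mean (constant) model",
    (0, 1, 0, False): "random walk",
    (0, 1, 0, True):  "random walk with drift",
    (1, 0, 0, True):  "1st-order autoregressive model",
    (1, 1, 0, True):  "differenced 1st-order AR model",
    (0, 1, 1, False): "simple exponential smoothing",
    (0, 1, 1, True):  "SES with trend",
    (1, 1, 2, False): "LES with damped trend",
    (0, 2, 2, False): "generalized LES (including Holt's)",
}

def describe_arima(p, d, q, has_constant):
    """Return the common name of an ARIMA(p,d,q) model, if it has one."""
    key = (p, d, q, has_constant)
    default = f"ARIMA({p},{d},{q})" + ("+c" if has_constant else "")
    return ARIMA_SPECIAL_CASES.get(key, default)

print(describe_arima(0, 1, 1, False))  # simple exponential smoothing
```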
ARIMA forecasting equation
- Let Y denote the original series
- Let y denote the differenced (stationarized) series:
  - No difference (d=0): y_t = Y_t
  - First difference (d=1): y_t = Y_t − Y_{t−1}
  - Second difference (d=2): y_t = (Y_t − Y_{t−1}) − (Y_{t−1} − Y_{t−2}) = Y_t − 2Y_{t−1} + Y_{t−2}
- Forecasting equation for y:
  ŷ_t = μ + φ_1·y_{t−1} + ... + φ_p·y_{t−p} − θ_1·e_{t−1} − ... − θ_q·e_{t−q}
  (constant, then AR terms (lagged values of y), then MA terms (lagged errors))
- By convention, the AR terms are + and the MA terms are −
- Not as bad as it looks! Usually p+q ≤ 2, and either p=0 or q=0 (pure AR or pure MA model)
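The mechanics of nonseasonal differencing can be sketched in a few lines of plain Python (an illustrative helper, not taken from any particular package):

```python
def difference(Y, d=1):
    """Apply d orders of nonseasonal differencing: y[t] = Y[t] - Y[t-1], repeated d times."""
    y = list(Y)
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

Y = [1, 3, 6, 10, 15]        # made-up series for illustration
print(difference(Y, 1))      # first differences: [2, 3, 4, 5]
print(difference(Y, 2))      # second differences: [1, 1, 1]

# Sanity check of the identity on this slide:
# the second difference equals Y[t] - 2*Y[t-1] + Y[t-2]
assert difference(Y, 2)[0] == Y[2] - 2 * Y[1] + Y[0]
```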
Undifferencing the forecast
- The differencing (if any) must be reversed to obtain a forecast for the original series:
  - If d=0: Ŷ_t = ŷ_t
  - If d=1: Ŷ_t = ŷ_t + Y_{t−1}
  - If d=2: Ŷ_t = ŷ_t + 2Y_{t−1} − Y_{t−2}
- Fortunately, your software will do all of this automatically!

Why do you need both AR and MA terms?
- In general, you don't: usually it suffices to use only one type or the other
- Some series are better fitted by AR terms, others are better fitted by MA terms (at a given level of differencing)
- Rough rule of thumb: if the stationarized series has positive autocorrelation at lag 1, AR terms often work best. If it has negative autocorrelation at lag 1, MA terms often work best.
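The undifferencing formulas above translate directly into code. A minimal sketch (hypothetical helper, using made-up numbers for the history Y):

```python
def undifference_forecast(y_hat, Y, d):
    """Convert a one-step forecast y_hat for the differenced series
    back to the scale of the original series Y (the formulas above)."""
    if d == 0:
        return y_hat                      # Yhat_t = yhat_t
    if d == 1:
        return y_hat + Y[-1]              # Yhat_t = yhat_t + Y_{t-1}
    if d == 2:
        return y_hat + 2 * Y[-1] - Y[-2]  # Yhat_t = yhat_t + 2*Y_{t-1} - Y_{t-2}
    raise ValueError("d > 2 not handled in this sketch")

history = [10.0, 12.0, 15.0]              # illustrative values only
print(undifference_forecast(3.0, history, 1))  # 3 + 15 = 18.0
print(undifference_forecast(3.0, history, 2))  # 3 + 30 - 12 = 21.0
```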
Interpretation of AR terms
- A series displays autoregressive (AR) behavior if it apparently feels a "restoring force" that tends to pull it back toward its mean
- In an AR(1) model, the AR(1) coefficient determines how fast the series tends to return to its mean. If the coefficient is near zero, the series returns to its mean quickly; if the coefficient is near 1, the series returns to its mean slowly.
- In a model with 2 or more AR coefficients, the sum of the coefficients determines the speed of mean reversion, and the series may also show an oscillatory pattern

Interpretation of MA terms
- A series displays moving-average (MA) behavior if it apparently undergoes random "shocks" whose effects are felt in two or more consecutive periods
- The MA(1) coefficient is (minus) the fraction of last period's shock that is still felt in the current period
- The MA(2) coefficient, if any, is (minus) the fraction of the shock two periods ago that is still felt in the current period, and so on
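The claim that an AR(1) coefficient near 1 means slow mean reversion can be illustrated with the deterministic impulse response of the AR(1) recursion (deviations from the mean decay as y_t = φ·y_{t−1}; the coefficient values below are arbitrary examples):

```python
def ar1_impulse_response(phi, shock=1.0, steps=10):
    """Deterministic decay of a unit shock in an AR(1) model, measured as
    deviation from the mean: y_t = phi * y_{t-1}, starting from y_0 = shock."""
    y = [shock]
    for _ in range(steps):
        y.append(phi * y[-1])
    return y

fast = ar1_impulse_response(0.3)   # coefficient near zero: shock dies out quickly
slow = ar1_impulse_response(0.95)  # coefficient near one: shock persists
print(fast[10])   # ~0.0000059: essentially back at the mean after 10 periods
print(slow[10])   # ~0.60: still 60% of the shock remains after 10 periods
```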
Tools for identifying ARIMA models: ACF and PACF plots
- The autocorrelation function (ACF) plot shows the correlation of the series with itself at different lags
  - The autocorrelation of Y at lag k is the correlation between Y and LAG(Y,k)
- The partial autocorrelation function (PACF) plot shows the amount of autocorrelation at lag k that is not explained by lower-order autocorrelations
  - The partial autocorrelation at lag k is the coefficient of LAG(Y,k) in an AR(k) model, i.e., in a regression of Y on LAG(Y,1), LAG(Y,2), up to LAG(Y,k)

AR and MA "signatures"
- ACF that dies out gradually and PACF that cuts off sharply after a few lags: AR signature
  - An AR series is usually positively autocorrelated at lag 1 (or even borderline nonstationary)
- ACF that cuts off sharply after a few lags and PACF that dies out more gradually: MA signature
  - An MA series is usually negatively autocorrelated at lag 1 (or even mildly overdifferenced)
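The sample autocorrelation behind the ACF plot is easy to compute by hand; a minimal sketch (the PACF would additionally require regressing Y on its first k lags, which is omitted here):

```python
def acf(x, k):
    """Sample autocorrelation of x at lag k: the correlation of x with LAG(x,k),
    computed with the usual shared-mean, shared-variance convention."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)                              # lag-0 sum of squares
    ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))     # lag-k cross products
    return ck / c0

# An alternating series is strongly negatively autocorrelated at lag 1
# and positively autocorrelated at lag 2:
x = [1, -1] * 10
print(acf(x, 1))  # -0.95
print(acf(x, 2))  # 0.9
```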
[ACF and PACF plots]
- AR signature: mean-reverting behavior, slow decay in ACF (usually positive at lag 1), sharp cutoff after a few lags in PACF. Here the signature is AR(2) because of 2 spikes in the PACF.
- MA signature: noisy pattern, sharp cutoff in ACF (usually negative at lag 1), gradual decay in PACF. Here the signature is MA(1) because of 1 spike in the ACF.
AR or MA? It depends!
- Whether a series displays AR or MA behavior often depends on the extent to which it has been differenced
- An "underdifferenced" series has an AR signature (positive autocorrelation)
- After one or more orders of differencing, the autocorrelation will become more negative and an MA signature will emerge

The autocorrelation spectrum
[Diagram: a spectrum running from positive autocorrelation (left) through no autocorrelation (center) to negative autocorrelation (right):
  Nonstationary -> Auto-Regressive -> White Noise -> Moving-Average -> Overdifferenced
Moving rightward: add DIFF or add AR terms. Moving leftward: add MA terms or remove DIFF.]
Model-fitting steps
1. Determine the order of differencing
2. Determine the numbers of AR & MA terms
3. Fit the model; check to see if residuals are white noise, the highest-order coefficients are significant (w/ no "unit roots"), and the forecasts look reasonable. If not, return to step 1 or 2.
In other words, move right or left in the autocorrelation spectrum by appropriate choices of differencing and AR/MA terms, until you reach the center (white noise).

1. Determine the order of differencing
- If the series apparently (or logically) has a stable long-term mean, then perhaps no differencing is needed
- If the series has a consistent trend, you MUST use at least one order of differencing [or else include a trended regressor]; otherwise the long-term forecasts will revert to the mean of the historical sample
- If two orders of differencing are used, you should suppress the constant (otherwise you will get a quadratic trend in the forecasts)
How much differencing is needed?
- Usually the best order of differencing is the least amount needed to stationarize the series
- When in doubt, you can also try the next higher order of differencing & compare models
- If the lag-1 autocorrelation is already zero or negative, you don't need more differencing
- Beware of overdifferencing: if there is more autocorrelation and/or higher variance after differencing, you may have gone too far!

2. Determine # of AR & MA terms
- After differencing, inspect the ACF and PACF at low-order lags (1, 2, 3, ...):
  - Positive decay pattern in ACF at lags 1, 2, 3, ... plus a single* positive spike in PACF at lag 1: AR=1
  - Negative spike in ACF at lag 1 plus a negative decay pattern in PACF at lags 1, 2, 3, ...: MA=1
- MA terms often work well with differences: ARIMA(0,1,1) = SES, ARIMA(0,2,2) = LES.
*Two positive spikes in PACF: AR=2; two negative spikes in ACF: MA=2; etc.
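The signature-reading guidelines above can be caricatured as a tiny decision rule. This is only a sketch: the inputs (spike counts and the lag-1 autocorrelation) would come from inspecting the plots, and the comparisons are judgment calls, not sharp thresholds:

```python
def suggest_terms(acf_spikes, pacf_spikes, lag1_acf):
    """Crude signature reader (illustrative only; real identification is done by eye).

    acf_spikes / pacf_spikes: number of significant low-order spikes in each plot.
    lag1_acf: lag-1 autocorrelation of the stationarized series.
    """
    if lag1_acf > 0 and pacf_spikes < acf_spikes:
        return {"AR": pacf_spikes, "MA": 0}   # gradual ACF decay, sharp PACF cutoff
    if lag1_acf < 0 and acf_spikes < pacf_spikes:
        return {"AR": 0, "MA": acf_spikes}    # sharp ACF cutoff, gradual PACF decay
    return {"AR": 0, "MA": 0}                 # ambiguous: revisit the differencing

print(suggest_terms(acf_spikes=8, pacf_spikes=1, lag1_acf=0.7))   # AR(1) signature
print(suggest_terms(acf_spikes=1, pacf_spikes=6, lag1_acf=-0.4))  # MA(1) signature
```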
3. Fit the model & check
- Residual time series plot looks stationary?
- No residual autocorrelation? (Or is there a signature of AR or MA terms that still need to be added?)
- Highest-order coefficient(s) significant?
- Sum of coefficients of same type < 1?

UNITS series: begin with the ARIMA(0,0,0)+c model, i.e., the constant model. Residuals = original data with the mean subtracted, showing a strong upward trend, lag-1 autocorrelation close to 1.0, and a slowly-declining pattern in the ACF: differencing is obviously needed.
After 1 nonseasonal difference: ARIMA(0,1,0)+c (random walk with drift) model. Lag-1 residual autocorrelation is still positive, the ACF still declines slowly, and there are 2 technically-significant spikes in the PACF. Is it stationary (e.g., mean-reverting)? If we think so, try adding AR=2.

After adding AR=2: ARIMA(2,1,0)+c model. No significant spikes in the ACF or PACF; residuals appear to be stationary. The mean is the estimated constant long-term trend (0.416 per period). The AR(2) coefficient is significant (t=2.49). The AR coefficients sum to less than 1.0 (no "unit root," i.e., not seriously underdifferenced). White noise std. dev. (RMSE) = 1.35, versus 1.44 for RW.
What's the difference between MEAN and CONSTANT in the ARIMA output?
- MEAN = mean of the stationarized series
  - If no differencing was used, this is just the mean of the original series. If one order of differencing was used, this is the long-term average trend.
- CONSTANT = the intercept term (μ) in the forecasting equation
- Connection (a mathematical identity): CONSTANT = MEAN × (1 − sum of AR coefficients)

Estimated white noise standard deviation?
- This is just the RMSE of the model in transformed units, adjusted for the # of coefficients estimated. I.e., it is the same statistic that is called the "standard error of the estimate" in regression.
- If a natural log transform was used, it is the RMS error in percentage terms.
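The MEAN/CONSTANT identity can be checked numerically with the ARIMA(2,1,0)+c estimates for the UNITS series quoted in these notes (MEAN = 0.416, φ_1 = 0.250, φ_2 = 0.201, CONSTANT = 0.228):

```python
# Verify CONSTANT = MEAN * (1 - sum of AR coefficients)
# using the UNITS ARIMA(2,1,0)+c estimates reported on these slides.
mean_trend = 0.416          # MEAN: the long-term average trend (d=1)
ar_coeffs = [0.250, 0.201]  # phi_1, phi_2

constant = mean_trend * (1 - sum(ar_coeffs))
print(round(constant, 3))   # 0.228, matching the reported constant mu
```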
[Time sequence plot for UNITS: ARIMA(2,1,0) with constant, showing actual values, forecasts, and 95.0% limits.]
The plot of data and forecasts shows that the trend in the forecasts equals the historical average trend. A good model?? What if we had gone to 2 orders of differencing instead?

With 2 nonseasonal differences and no constant: ARIMA(0,2,0) model. Residuals are definitely stationary but now strongly negatively autocorrelated at lag 1, with a single negative spike in the ACF. Overdifferenced, or MA=1?? Note that the RMSE is larger than it was with one difference (1.69 vs. 1.44), not a good sign.
After adding MA=1: ARIMA(0,2,1) model, essentially a linear exponential smoothing model. Residuals look fine, no autocorrelation. The estimated MA(1) coefficient is only 0.75 (not dangerously close to 1.0, so not seriously overdifferenced). White noise std. dev. = 1.37, about the same as the (2,1,0)+c model.

[Time sequence plot for UNITS: ARIMA(0,2,1), showing actual values, forecasts, and 95.0% limits.]
The forecast plot shows that the trend in this model is the local trend estimated at the end of the series, flatter than in the previous model. Confidence limits widen more rapidly than in the previous model because a time-varying trend is assumed, as in LES.
Equation of first model
The ARIMA(2,1,0)+c forecasting equation is
  Ŷ_t − Y_{t−1} = μ + φ_1·(Y_{t−1} − Y_{t−2}) + φ_2·(Y_{t−2} − Y_{t−3})
where μ = 0.228, φ_1 = 0.250, and φ_2 = 0.201. Thus, the predicted change is a constant plus multiples of the last two changes ("momentum").

Equation of second model
The ARIMA(0,2,1) forecasting equation is
  Ŷ_t − Y_{t−1} = (Y_{t−1} − Y_{t−2}) − θ_1·e_{t−1}
where the MA(1) coefficient is θ_1 = 0.753. Thus, the predicted change equals the last change minus a multiple of the last forecast error. An ARIMA(0,2,1) model is almost the same as Brown's LES model, with the correspondence α ≈ 1 − ½θ_1. Hence this model is similar to an LES model with α ≈ 0.625.
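The first model's equation is simple enough to evaluate by hand. A sketch of the one-step forecast, using the coefficients quoted above but a made-up three-point history:

```python
def forecast_arima_210c(Y, mu, phi1, phi2):
    """One-step forecast from the ARIMA(2,1,0)+c equation above:
    Yhat_t = Y_{t-1} + mu + phi1*(Y_{t-1} - Y_{t-2}) + phi2*(Y_{t-2} - Y_{t-3})."""
    return Y[-1] + mu + phi1 * (Y[-1] - Y[-2]) + phi2 * (Y[-2] - Y[-3])

# Illustrative history (made-up numbers); coefficients are the UNITS estimates:
history = [10.0, 12.0, 13.0]
f = forecast_arima_210c(history, mu=0.228, phi1=0.250, phi2=0.201)
print(f)  # 13 + 0.228 + 0.25*1 + 0.201*2 = 13.88
```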
ARIMA(0,2,2) vs. LES
The general ARIMA(0,2,2) model (w/o constant) is:
  Ŷ_t − Y_{t−1} = (Y_{t−1} − Y_{t−2}) − θ_1·e_{t−1} − θ_2·e_{t−2}
- Brown's LES is exactly equivalent to (0,2,2) with θ_1 = 2 − 2α, θ_2 = −(1 − α)²
- Holt's LES is exactly equivalent to (0,2,2) with θ_1 = 2 − α − αβ, θ_2 = −(1 − α)
- The ARIMA model is somewhat more general because it allows α and β outside (0,1)

Model comparison shows that the two best ARIMA models (B and D) perform about equally well for one-period-ahead forecasts. They differ mainly in their assumptions about the trend and the width of confidence limits for longer-term forecasts. An LES model has been included for comparison; it is essentially the same as ARIMA(0,2,1). Note that α ≈ 0.61.
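The two correspondences can be cross-checked numerically. Brown's one-parameter LES is a special case of Holt's; the reparameterization used below (Brown's a maps to Holt's α = a(2−a), β = a/(2−a)) is a standard identity stated here as an assumption, and the check confirms it is consistent with the θ formulas on this slide:

```python
def brown_thetas(a):
    """MA coefficients of the ARIMA(0,2,2) equivalent of Brown's LES (per the slide)."""
    return 2 - 2 * a, -(1 - a) ** 2

def holt_thetas(alpha, beta):
    """MA coefficients of the ARIMA(0,2,2) equivalent of Holt's LES (per the slide)."""
    return 2 - alpha - alpha * beta, -(1 - alpha)

# Brown with parameter a should match Holt with alpha = a*(2-a), beta = a/(2-a):
a = 0.4
b = brown_thetas(a)
h = holt_thetas(a * (2 - a), a / (2 - a))
print(b)  # theta_1 = 1.2, theta_2 = -0.36
```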
Estimation of ARIMA models
- Like SES and LES models, ARIMA models face a "start-up" problem at the beginning of the series: prior lagged values are needed
- Naive approach:
  - AR models can be initialized by assuming that prior values of the differenced series are equal to the mean
  - MA models can be initialized by assuming that prior errors were zero

Backforecasting
- A better approach: use "backforecasting" to estimate the prior values of the differenced series and the forecast errors
- The key idea: a stationary time series looks the same (statistically speaking) whether it is traversed forward or backward in time. Hence the same model that forecasts the future of a series can also forecast its past.
How backforecasting works
- On each iteration of the estimation algorithm, the algorithm first makes a backward pass through the series and forecasts into the past until the forecasts decay to the mean. Then it turns around and starts forecasting in the forward direction.
- Squared error calculations are based on all the errors in the forward pass, even those in the "prior" period. Theoretically, this yields the maximally efficient estimates of the coefficients.

Backforecasting caveats
- If the model is badly mis-specified (e.g., under- or over-differenced, or over-fitted with too many AR or MA terms), backforecasting may drive the coefficient estimates to perverse values and/or yield too-optimistic error statistics
- If in doubt, try estimating the model without backforecasting until you are fairly sure it is properly identified. Turning on backforecasting should yield slight changes in coefficients & improved error stats, but not a radical change.
Backforecasting is an estimation option for ARIMA models, as well as for SES, LES, Holt, and Winters models. (The stopping criteria are thresholds for the change in the residual sum of squares and parameter estimates below which the estimation will stop.)

Number of iterations
- ARIMA models are estimated by an iterative, nonlinear optimization algorithm (like Solver in Excel). The precision of the estimation is controlled by two stopping criteria (normally you don't need to change these).
- A large number of iterations (≥10) indicates that the optimal coefficients were hard to find, possibly a problem with the model such as redundant AR/MA parameters or a unit root.
What is a unit root?
Suppose that a time series has the true equation of motion:
  Y_t = φ·Y_{t−1} + ε_t
where ε_t is a stationary process (mean-reverting, constant variance & autocorrelation, etc.). If the true value of φ is 1, then Y is said to have an (autoregressive) "unit root." Its true equation is then
  Y_t − Y_{t−1} = ε_t
which is just a random walk (perhaps with drift and correlated steps).

Why are unit roots important?
- If a series has an AR unit root, then a jump (discontinuity) in the series permanently raises its expected future level, as in a random walk
- If it doesn't have an AR unit root, then it eventually reverts to a long-term mean or trend line after a jump
- A series that has a unit root is much less predictable at long forecast horizons than one that does not
- It can be hard to tell whether a trended series has a unit root or whether it is reverting to a trend line
Unit root tests
- A crude unit root test is to fit an AR model and see whether the AR coefficients sum to almost exactly 1
- A more sensitive test is to regress DIFF(Y) on LAG(Y,1), i.e., fit the model
    Y_t − Y_{t−1} = α + β·Y_{t−1} + ε_t
  and test whether β is significantly different from zero using a one-sided t-test with 1.75 times the usual critical value (the "Dickey-Fuller test"). If not, then you can't reject the unit root hypothesis.
- Fancier unit root and cointegration tests regress DIFF(Y) on LAG(Y,1) plus a bunch of lags of DIFF(Y) ("augmented DF test") and/or other variables (e.g., a time trend), with more complicated critical values

Does UNITS have a unit root?

  Degrees of freedom for error             20      50      100     Infinity
  Critical t value (.05, 1-tailed)         1.725   1.676   1.660   1.645
  Critical Dickey-Fuller value (= t*1.75)  3.018   2.933   2.905   2.879

By the DF test, we can't reject the unit root hypothesis for the UNITS series.
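The crude Dickey-Fuller setup described above (regress DIFF(Y) on LAG(Y,1) and examine the t-statistic on β) can be sketched with an ordinary least-squares fit. This is a bare-bones illustration, not a substitute for a proper DF implementation, and the simulated series is arbitrary:

```python
import numpy as np

def dickey_fuller_t(Y):
    """t-statistic for beta in the regression diff(Y) = alpha + beta*lag(Y,1) + e,
    the crude Dickey-Fuller setup described on this slide."""
    Y = np.asarray(Y, dtype=float)
    dy = np.diff(Y)
    X = np.column_stack([np.ones(len(dy)), Y[:-1]])        # intercept + lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - 2)                     # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

# A strongly mean-reverting AR(1) (phi = 0.3): the unit root should be rejected,
# i.e., the t-statistic should be well below the ~ -2.9 DF cutoff for this n.
rng = np.random.default_rng(0)
Y, y = [], 0.0
for _ in range(200):
    y = 0.3 * y + rng.standard_normal()
    Y.append(y)
print(dickey_fuller_t(Y))  # strongly negative
```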
But we can reject the unit root hypothesis for DIFF(UNITS), suggesting that 1 order of differencing is sufficient.

Practical implications (for us)
- In regression analysis, if both Y and X have unit roots (i.e., are random walks), then the regression of Y on X may yield a spurious estimate of the relation of Y to X (better to regress DIFF(Y) on DIFF(X) in that case)
- In ARIMA analysis, if a unit root exists on either the AR or the MA side of the model, the model can be simplified by using a higher or lower order of differencing
- This brings us to...
Over- or under-differencing
What if we had identified a grossly wrong order of differencing? For example, suppose we hadn't differenced the UNITS series at all, or suppose we had differenced it more than twice?

Here is the original UNITS series again. If we ignore the obvious trend in the time series plot, we see that the ACF and PACF display a (suspiciously!) strong AR(1) signature. Suppose we just add an AR(1) term...
The estimated AR(1) coefficient is 0.998 (essentially 1.0), which is equivalent to an order of differencing. An AR factor mimics an order of differencing when the sum of the AR coefficients is 1.0 (a sign of a unit root), so the model is really telling us that a higher order of differencing is needed.

Adding an AR(2) term does not help. Now the sum of the AR coefficients is almost exactly equal to 1.0. Again, this means that there is an AR unit root: a higher order of differencing is needed.
On the other hand, suppose we grossly overdifferenced the series. Here 3 orders of differencing have been used (not recommended!). The series now displays a suspiciously strong MA(1) or MA(2) signature.

If we add an MA(1) term, its estimated coefficient is 0.995 (essentially 1.0), which is exactly canceling one order of differencing. An MA factor cancels an order of differencing if the sum of the MA coefficients is 1.0 (indicating an MA unit root, also called a "non-invertible" model).
Adding an MA(2) term does not really improve the results, and the sum of the MA coefficients is now almost exactly equal to 1.0. Again, this tells us less differencing was needed.

Conclusions
- If the series is grossly under- or over-differenced, you will see what appears to be a suspiciously strong AR or MA signature. This is not good!!
- The correct order of differencing usually reduces rather than increases the patterns in the ACF and PACF
- When coefficients are estimated, if the coefficients of the same type sum to almost exactly 1.0, this indicates that a higher or lower order of differencing should have been used
The rules of the ARIMA game
The preceding guidelines for fitting ARIMA models to data can be summed up in the following rules.

ARIMA rules: differencing
- Rule 1: If the series has positive autocorrelations out to a high number of lags and/or a non-zero trend, then it probably needs a higher order of differencing.
- Rule 2: If the lag-1 autocorrelation is zero or negative, the series does not need a higher order of differencing. If the lag-1 autocorrelation is -0.5 or more negative, the series may already be overdifferenced.
- Rule 3: The optimal order of differencing is often (not always, but often) the order of differencing at which the standard deviation is lowest.
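Rule 3 lends itself to a short mechanical check. A sketch that compares the standard deviation of the series at each order of differencing and keeps the lowest (ties go to the smaller d, matching the "least amount needed" advice earlier):

```python
def stdev(x):
    """Population standard deviation of a list."""
    m = sum(x) / len(x)
    return (sum((v - m) ** 2 for v in x) / len(x)) ** 0.5

def best_differencing_order(Y, max_d=2):
    """Rule 3 heuristic: pick the order of differencing (0..max_d) whose
    differenced series has the smallest standard deviation."""
    best_d, best_sd, y = 0, stdev(Y), list(Y)
    for d in range(1, max_d + 1):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
        sd = stdev(y)
        if sd < best_sd:          # strict <, so ties favor less differencing
            best_d, best_sd = d, sd
    return best_d

print(best_differencing_order([t for t in range(20)]))      # linear trend -> 1
print(best_differencing_order([t * t for t in range(20)]))  # quadratic trend -> 2
```

Note this is only a rough screen: as Rule 3 itself says, the lowest-variance order is "often, not always" the right one, and the other rules (lag-1 autocorrelation, residual checks) still apply.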
ARIMA rules: differencing, continued
- Rule 4: (a) A model with no orders of differencing assumes that the original series is stationary (among other things, mean-reverting). (b) A model with one order of differencing assumes that the original series has a constant average trend (e.g., a random walk or SES-type model, with or without drift). (c) A model with two orders of total differencing assumes that the original series has a time-varying trend (e.g., an LES-type model).

ARIMA rules: constant term or not?
- Rule 5: (a) A model with no orders of differencing normally includes a constant term. The estimated mean is the average level. (b) In a model with one order of differencing, a constant should be included if there is a long-term trend. The estimated mean is the average trend. (c) A model with two orders of total differencing normally does not include a constant term.
ARIMA rules: AR signature
- Rule 6: If the partial autocorrelation function (PACF) of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is positive, i.e., if the series appears slightly "underdifferenced," then consider adding one or more AR terms to the model. The lag beyond which the PACF cuts off is the indicated number of AR terms.

ARIMA rules: MA signature
- Rule 7: If the autocorrelation function (ACF) of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is negative, i.e., if the series appears slightly "overdifferenced," then consider adding an MA term to the model. The lag beyond which the ACF cuts off is the indicated number of MA terms.
ARIMA rules: residual signature
- Rule 8: If an important AR or MA term has been omitted, its signature will usually show up in the residuals. For example, if an AR(1) term (only) has been included in the model, and the residual PACF clearly shows a spike at lag 2, this suggests that an AR(2) term may also be needed.

ARIMA rules: cancellation
- Rule 9: It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term, particularly if the parameter estimates in the original model need more than 10 iterations to converge.
ARIMA rules: unit roots
- Rule 10: If there is a unit root in the AR part of the model, i.e., if the sum of the AR coefficients is almost exactly 1, you should try reducing the number of AR terms by one and increasing the order of differencing by one.
- Rule 11: If there is a unit root in the MA part of the model, i.e., if the sum of the MA coefficients is almost exactly 1, you should try reducing the number of MA terms by one and reducing the order of differencing by one.
- Rule 12: If the long-term forecasts appear erratic or unstable, and/or the model takes 10 or more iterations to estimate, there may be a unit root in the AR or MA coefficients.