In this chapter, we assume the model is known exactly and consider the calculation of forecasts and their properties for both deterministic trend models and ARIMA models.

9.1 Minimum Mean Square Error Forecasting

Suppose the series is available up to time t, namely Y_1, Y_2, …, Y_t. We wish to forecast the value of Y_{t+l}. We call t the forecast origin and l the lead time, and denote the forecast by Ŷ_t(l). We shall develop methods based on minimizing the mean square forecasting error. It turns out that Ŷ_t(l) = E(Y_{t+l} | Y_1, Y_2, …, Y_t).
9.2 Deterministic Trends

We assume the model Y_t = µ_t + X_t, where µ_t is a deterministic trend function and {X_t} is white noise with zero mean and variance γ_0. Then

Ŷ_t(l) = E(µ_{t+l} + X_{t+l} | Y_1, …, Y_t) = E(µ_{t+l} | Y_1, …, Y_t) + E(X_{t+l} | Y_1, …, Y_t) = µ_{t+l}.

The forecast error is e_t(l) = Y_{t+l} − Ŷ_t(l) = X_{t+l}. Since E(e_t(l)) = 0, the forecast is unbiased. The forecast error variance is Var(e_t(l)) = Var(X_{t+l}) = γ_0.
Ex. In Exhibits 3.5 and 3.6, we estimated the average monthly temperature series in Dubuque, Iowa, by the cosine trend

µ̂_t = 46.2660 + (−26.7079) cos(2πt) + (−2.1697) sin(2πt).

The start time is January 1964 and the end time is December 1975. Here µ̂_t is a periodic function with period 1, so we get the same forecast temperature for the same month each year. For example, the forecast temperature for June 1976 is µ̂_{1976+5/12} = µ̂_{5/12} = 68.31°F.
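As a quick check of this computation, the periodic trend forecast can be sketched in a few lines of Python. The coefficients are the fitted values quoted above; the function name is ours.

```python
import math

def cosine_trend(t):
    """Fitted cosine trend for the Dubuque temperature series
    (coefficients from Exhibits 3.5/3.6); t is measured in years."""
    return (46.2660
            - 26.7079 * math.cos(2 * math.pi * t)
            - 2.1697 * math.sin(2 * math.pi * t))

# June 1976 corresponds to t = 1976 + 5/12; the trend has period 1,
# so this equals the trend value at t = 5/12.
june_1976 = cosine_trend(1976 + 5 / 12)
```

Because the trend is periodic, the forecast for any future June repeats the fitted June value.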
9.3 ARIMA Forecasting For ARIMA models, the forecasts can be expressed in several different ways.
9.3.1 AR(1)

The model is Y_t − µ = φ(Y_{t−1} − µ) + e_t. Replacing t by t + l gives Y_{t+l} − µ = φ(Y_{t+l−1} − µ) + e_{t+l}. Taking the conditional expectation E(· | Y_1, …, Y_t),

Ŷ_t(l) − µ = φ[Ŷ_t(l−1) − µ] + E(e_{t+l} | Y_1, …, Y_t) = φ[Ŷ_t(l−1) − µ],

or Ŷ_t(l) = µ + φ[Ŷ_t(l−1) − µ]. The last equation is the difference equation form of the forecast. Iterating,

Ŷ_t(l) − µ = φ[Ŷ_t(l−1) − µ] = φ²[Ŷ_t(l−2) − µ] = ⋯ = φ^l (Y_t − µ),

so the lead-l forecast may also be expressed explicitly as Ŷ_t(l) = µ + φ^l (Y_t − µ). Since |φ| < 1, we have Ŷ_t(l) → µ for large l. In numerical work, applying the difference equation form recursively accumulates round-off error, so we should carry many decimal places in such calculations.
The estimated model is Y_t − 74.3293 = 0.5705(Y_{t−1} − 74.3293) + e_t, and the last observed value of the color property is Y_t = 67. We get the lead-l forecast Ŷ_t(l) = 74.3293 + (0.5705)^l (67 − 74.3293). We can implement a function to calculate Ŷ_t(l) for any l: for example, Ŷ_t(1) = 70.14793, Ŷ_t(2) = 71.94383, Ŷ_t(5) = 73.88636, Ŷ_t(10) = 74.30253.
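The explicit form makes these forecasts easy to compute without recursion, avoiding the round-off accumulation mentioned above. A minimal sketch using the estimates quoted in the example (the function name and default arguments are ours):

```python
def ar1_forecast(l, y_t=67.0, mu=74.3293, phi=0.5705):
    """Lead-l minimum mean square error forecast for the fitted AR(1):
    Yhat_t(l) = mu + phi**l * (Y_t - mu)."""
    return mu + phi ** l * (y_t - mu)

# Forecasts approach the mean mu geometrically as the lead grows.
leads = [ar1_forecast(l) for l in (1, 2, 5, 10)]
```

The values reproduce the four forecasts quoted in the text.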
Now we calculate the forecast error e_t(l) for the AR(1) model. Recall

Y_{t+l} − µ = e_{t+l} + φ(Y_{t+l−1} − µ) = e_{t+l} + φe_{t+l−1} + φ²(Y_{t+l−2} − µ) = ⋯ = e_{t+l} + φe_{t+l−1} + ⋯ + φ^{l−1} e_{t+1} + φ^l (Y_t − µ),

and Ŷ_t(l) − µ = φ^l (Y_t − µ). Therefore, for the AR(1) model,

e_t(l) = Y_{t+l} − Ŷ_t(l) = e_{t+l} + φe_{t+l−1} + ⋯ + φ^{l−1} e_{t+1}.

In particular, e_t(1) = e_{t+1}.
9.3.2 MA(1)

The invertible model is Y_t = µ + e_t − θe_{t−1}. Replace t by t + 1 and take E(· | Y_1, …, Y_t):

Ŷ_t(1) = µ + E(e_{t+1} | Y_1, …, Y_t) − θE(e_t | Y_1, …, Y_t) = µ − θe_t.

For l > 1, Ŷ_t(l) = µ + E(e_{t+l} | Y_1, …, Y_t) − θE(e_{t+l−1} | Y_1, …, Y_t) = µ.
9.3.3 The Random Walk with Drift

The nonstationary model is Y_t = Y_{t−1} + θ_0 + e_t. Then E(Y_{t+l} | Y_1, …, Y_t) = E(Y_{t+l−1} | Y_1, …, Y_t) + θ_0 + E(e_{t+l} | Y_1, …, Y_t), so

Ŷ_t(l) = Ŷ_t(l−1) + θ_0 = ⋯ = Y_t + θ_0 l.

Now consider the forecast error. Since Y_{t+l} = Y_{t+l−1} + θ_0 + e_{t+l} = ⋯ = Y_t + θ_0 l + e_{t+l} + e_{t+l−1} + ⋯ + e_{t+1}, we have

e_t(l) = Y_{t+l} − Ŷ_t(l) = e_{t+l} + e_{t+l−1} + ⋯ + e_{t+1}.

So the forecast is unbiased, as E(e_t(l)) = 0, and Var(e_t(l)) = σ²_e l. In general, an ARIMA process is nonstationary if and only if Var(e_t(l)) grows without limit.
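Both the forecast and its error variance are simple closed forms, which a short sketch can illustrate (the numbers are hypothetical and the function names are ours):

```python
def rw_drift_forecast(y_t, theta0, l):
    """Lead-l forecast for the random walk with drift: Y_t + theta0 * l."""
    return y_t + theta0 * l

def rw_drift_error_var(sigma2_e, l):
    """Forecast error variance grows linearly in the lead: sigma2_e * l."""
    return sigma2_e * l

# The forecast extends the drift line; the uncertainty grows without bound.
f4 = rw_drift_forecast(100.0, 0.5, 4)      # 100 + 0.5*4 = 102.0
v5 = rw_drift_error_var(2.0, 5)            # 2.0 * 5 = 10.0
```

The linearly growing variance is the simplest instance of the unbounded-variance behavior of nonstationary models discussed below.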
9.3.4 ARMA(p,q)

For a stationary invertible ARMA(p,q) model, the difference equation form is

Ŷ_t(l) = Σ_{i=1}^{p} φ_i Ŷ_t(l−i) + θ_0 − Σ_{j=1}^{q} θ_j E(e_{t+l−j} | Y_1, …, Y_t),

in which

E(e_{t+j} | Y_1, …, Y_t) = 0 for j > 0, and e_{t+j} for j ≤ 0.

Here Ŷ_t(l) is the true forecast when l > 0, but Ŷ_t(l) = Y_{t+l} for l ≤ 0.
Ex. Consider the ARMA(1,2) model Y_s = φY_{s−1} + θ_0 + e_s − θ_1 e_{s−1} − θ_2 e_{s−2}. Then

Ŷ_t(1) = φY_t + θ_0 − θ_1 e_t − θ_2 e_{t−1},
Ŷ_t(2) = φŶ_t(1) + θ_0 − θ_2 e_t,
Ŷ_t(l) = φŶ_t(l−1) + θ_0, l ≥ 3.

The forecast may be expressed explicitly in terms of µ = θ_0/(1 − φ): for l ≥ 2,

Ŷ_t(l) − µ = φ^l (Y_t − µ) − (φ^{l−1} θ_1 + φ^{l−2} θ_2) e_t − φ^{l−1} θ_2 e_{t−1}.
For ARMA(p,q) models, the noise terms e_{t−(q−1)}, …, e_{t−1}, e_t appear directly in the computation of the forecasts for leads l = 1, 2, …, q. However, for l > q, the autoregressive portion of the difference equation takes over, and we have

Ŷ_t(l) = φ_1 Ŷ_t(l−1) + φ_2 Ŷ_t(l−2) + ⋯ + φ_p Ŷ_t(l−p) + θ_0 for l > q. (1)

So the nature of the forecasts for long lead times is determined by the autoregressive parameters φ_1, …, φ_p. Recall that θ_0 = µ(1 − φ_1 − φ_2 − ⋯ − φ_p), so (1) may be rewritten as

Ŷ_t(l) − µ = φ_1 [Ŷ_t(l−1) − µ] + φ_2 [Ŷ_t(l−2) − µ] + ⋯ + φ_p [Ŷ_t(l−p) − µ] for l > q. (2)
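The difference-equation form translates directly into a recursion: terms at leads ≤ 0 use observed values and stored residuals, while terms at positive leads use earlier forecasts, and the MA terms drop out once l > q. A sketch, assuming the parameterization Y_t = Σφ_i Y_{t−i} + θ_0 + e_t − Σθ_j e_{t−j} (the function and argument names are ours):

```python
def arma_forecasts(phi, theta, theta0, y_hist, e_hist, max_lead):
    """Difference-equation forecasts for an ARMA(p,q) model.

    phi, theta : AR and MA coefficients (theta enters with a minus sign)
    y_hist     : observed values [..., Y_{t-1}, Y_t] (most recent last)
    e_hist     : residuals      [..., e_{t-1}, e_t] (most recent last)
    """
    forecasts = []
    for l in range(1, max_lead + 1):
        val = theta0
        for i, ph in enumerate(phi, start=1):
            k = l - i                 # lead of the term Yhat_t(l - i)
            past = forecasts[k - 1] if k >= 1 else y_hist[len(y_hist) - 1 + k]
            val += ph * past
        for j, th in enumerate(theta, start=1):
            k = l - j
            if k <= 0:                # E(e_{t+k} | ...) = e_{t+k} only for k <= 0
                val -= th * e_hist[len(e_hist) - 1 + k]
        forecasts.append(val)
    return forecasts
```

With phi=[φ], theta=[] and θ_0 = µ(1 − φ) this reproduces the AR(1) forecasts; with phi=[φ], theta=[θ_1, θ_2] it reproduces the ARMA(1,2) pattern shown in the example above.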
Now we discuss the forecast error e_t(l) for general (stationary or nonstationary) ARIMA models. Appendix G shows that every ARIMA model has a truncated linear process representation:

Y_{t+l} = C_t(l) + I_t(l) for l ≥ 1, (3)

where C_t(l) is a function of Y_t, Y_{t−1}, …, and

I_t(l) = e_{t+l} + Ψ_1 e_{t+l−1} + Ψ_2 e_{t+l−2} + ⋯ + Ψ_{l−1} e_{t+1} for l ≥ 1. (4)

Taking E(· | Y_1, …, Y_t) in (3),

Ŷ_t(l) = E(C_t(l) | Y_1, …, Y_t) + E(I_t(l) | Y_1, …, Y_t) = C_t(l).

Therefore, e_t(l) = Y_{t+l} − Ŷ_t(l) = I_t(l) = e_{t+l} + Ψ_1 e_{t+l−1} + ⋯ + Ψ_{l−1} e_{t+1}.
It follows that the forecasts are unbiased, E(e_t(l)) = 0 for l ≥ 1, and

Var(e_t(l)) = σ²_e Σ_{j=0}^{l−1} Ψ_j² for l ≥ 1.

For stationary ARMA models and large l, the variances increase but are bounded above:

Var(e_t(l)) → σ²_e Σ_{j=0}^{∞} Ψ_j² = γ_0.

For nonstationary ARIMA models, the forecasts are still unbiased; however, the Ψ_j weights do not decay to zero as j increases, so the error variances increase without bound.
9.3.5 Nonstationary Models

Recall that {Y_t} ~ ARIMA(p, d, q) means that

∇^d Y_t = φ_1 ∇^d Y_{t−1} + φ_2 ∇^d Y_{t−2} + ⋯ + φ_p ∇^d Y_{t−p} + e_t − θ_1 e_{t−1} − θ_2 e_{t−2} − ⋯ − θ_q e_{t−q},

where

∇^d Y_t = ∇^{d−1} Y_t − ∇^{d−1} Y_{t−1} = ⋯ = Σ_{i=0}^{d} (−1)^i C(d, i) Y_{t−i} = Y_t − C(d,1) Y_{t−1} + C(d,2) Y_{t−2} − ⋯ + (−1)^{d−1} C(d, d−1) Y_{t−d+1} + (−1)^d Y_{t−d},

with C(d, i) the binomial coefficients. So an ARIMA(p,d,q) model can be naturally expressed as a nonstationary ARMA(p+d,q) model. We can get the forecasts Ŷ_t(l) and the forecast errors e_t(l) similarly to those for stationary ARMA(p,q) models.
Ex. The ARIMA(1,1,1) model is Y_t − Y_{t−1} = φ(Y_{t−1} − Y_{t−2}) + θ_0 + e_t − θe_{t−1}. It has an ARMA(2,1) expression:

Y_t = (1 + φ)Y_{t−1} − φY_{t−2} + θ_0 + e_t − θe_{t−1}.

In the above equation, replace t by t + 1, t + 2, …, or t + l, and take E(· | Y_1, …, Y_t). We get the forecasts

Ŷ_t(1) = (1 + φ)Y_t − φY_{t−1} + θ_0 − θe_t,
Ŷ_t(2) = (1 + φ)Ŷ_t(1) − φY_t + θ_0,
Ŷ_t(l) = (1 + φ)Ŷ_t(l−1) − φŶ_t(l−2) + θ_0, l ≥ 3.
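The same recursion can be written out explicitly for this ARIMA(1,1,1) example; note that the residual e_t enters only at lead 1 (the numbers in the usage line are illustrative and the function name is ours):

```python
def arima111_forecasts(y_t, y_tm1, e_t, phi, theta, theta0, max_lead):
    """ARIMA(1,1,1) forecasts via its ARMA(2,1) form
    Y_t = (1+phi)Y_{t-1} - phi*Y_{t-2} + theta0 + e_t - theta*e_{t-1}."""
    f = []
    for l in range(1, max_lead + 1):
        prev1 = f[l - 2] if l >= 2 else y_t                         # Yhat_t(l-1)
        prev2 = f[l - 3] if l >= 3 else (y_t if l == 2 else y_tm1)  # Yhat_t(l-2)
        val = (1 + phi) * prev1 - phi * prev2 + theta0
        if l == 1:
            val -= theta * e_t        # the MA term survives only at lead 1
        f.append(val)
    return f

# Illustrative values: Y_t = 10, Y_{t-1} = 9, e_t = 1, phi = 0.5, theta = 0.3
forecasts = arima111_forecasts(10.0, 9.0, 1.0, 0.5, 0.3, 0.0, 3)
```

Each forecast beyond lead 2 depends only on the two preceding forecasts, exactly as in the difference equation above.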
For all nonstationary ARIMA models, the forecasts are unbiased, but the forecast error variance, Var(e_t(l)) = σ²_e Σ_{j=0}^{l−1} Ψ_j², grows without bound. Some examples:

1. for the random walk with drift, Ψ_j = 1;
2. for the IMA(1,1) model, Ψ_j = 1 − θ for j ≥ 1;
3. for the ARI(1,1) model, Ψ_j = (1 − φ^{j+1})/(1 − φ) for j ≥ 1.

So with nonstationary series, the distant future is quite uncertain. Section 9.9 gives the summary of forecasting with certain ARIMA models.
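For the ARI(1,1) case, the Ψ-weights can be generated from the recursion Ψ_j = (1+φ)Ψ_{j−1} − φΨ_{j−2} implied by its nonstationary ARMA(2,0) form, and checked against the closed form above (a sketch; the function name is ours):

```python
def ari11_psi_weights(phi, n):
    """First n Psi-weights of an ARI(1,1) model, from the recursion
    Psi_j = (1 + phi) * Psi_{j-1} - phi * Psi_{j-2}, Psi_0 = 1, Psi_1 = 1 + phi."""
    psi = [1.0, 1.0 + phi]
    for j in range(2, n):
        psi.append((1 + phi) * psi[j - 1] - phi * psi[j - 2])
    return psi[:n]
```

Since the weights converge to 1/(1 − φ) rather than to zero, the partial sums of Ψ_j² grow without bound, illustrating the unbounded forecast error variance.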
9.4 Prediction Limits

9.4.1 Deterministic Trends

The model is Y_t = µ_t + X_t. Recall that Ŷ_t(l) = µ_{t+l}, e_t(l) = X_{t+l}, and Var(e_t(l)) = γ_0. Suppose the stochastic component X_t is normally distributed. Then so is the forecast error e_t(l). For a given confidence level 1 − α, we use the standard normal percentile z_{1−α/2} = F^{−1}(1 − α/2) (where F is the cdf of the standard normal distribution) to claim

P[ −z_{1−α/2} < (Y_{t+l} − Ŷ_t(l))/√Var(e_t(l)) < z_{1−α/2} ] = 1 − α. (5)

Thus we may be (1 − α)100% confident that the future observation Y_{t+l} will be contained within the prediction limits Ŷ_t(l) ± z_{1−α/2} √Var(e_t(l)).
Ex. The monthly average temperature series in Dubuque, Iowa, is modeled by the trend µ̂_t = 46.2660 + (−26.7079) cos(2πt) + (−2.1697) sin(2πt) with forecast error standard deviation √Var(e_t(l)) = √γ_0 = 3.7°F. The predicted temperature for June 1976 is 68.3°F. Thus, with 95% confidence, the average June 1976 temperature lies within 68.3 ± 1.96(3.7) = 68.3 ± 7.252, or [61.05°F, 75.55°F]. In practice, the correct forecast error variance will be slightly larger than Var(e_t(l)), since we use estimated parameters.
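The interval arithmetic is simple enough to verify directly (a sketch using the numbers from this example; the function name is ours):

```python
def prediction_interval(forecast, error_sd, z=1.96):
    """(1 - alpha)100% prediction limits: forecast +/- z_{1-alpha/2} * sd(e_t(l)).
    For the deterministic-trend model the error sd is the same at every lead."""
    half = z * error_sd
    return forecast - half, forecast + half

# June 1976 temperature: forecast 68.3 F, error sd 3.7 F, 95% limits
lo, hi = prediction_interval(68.3, 3.7)
```

This reproduces the interval quoted above, about [61.05°F, 75.55°F].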
9.4.2 ARIMA Models

If the white noise terms {e_t} are normally distributed, then so is the forecast error e_t(l). We know that Var(e_t(l)) = σ²_e Σ_{j=0}^{l−1} Ψ_j². Both σ²_e and the Ψ-weights must be estimated from the observed time series. For large sample sizes, the estimation will have little effect on the actual prediction limits.
Ex. (TS-ch9.R) The AR(1) model estimation of the industrial color property in Exhibit 9.1 gives φ = 0.5705, µ = 74.3293, σ²_e = 24.8, and we have

Ŷ_t(l) = 74.3293 + (0.5705)^l (67 − 74.3293),
Var(e_t(l)) = σ²_e (1 − φ^{2l})/(1 − φ²).

Thus the 95% confidence interval for the l-step-ahead prediction is

74.3293 + (0.5705)^l (67 − 74.3293) ± 1.96 √[24.8 (1 − 0.5705^{2l})/(1 − 0.5705²)].

We can calculate such intervals for l = 1, 2, 5, 10, etc.
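The example comes from an R script, but a Python sketch that assembles the forecast and limits for any lead is just as direct (estimates from above as defaults; the function name is ours):

```python
import math

def ar1_prediction_limits(l, y_t=67.0, mu=74.3293, phi=0.5705,
                          sigma2_e=24.8, z=1.96):
    """95% prediction limits for the fitted AR(1) color-property model,
    using Var(e_t(l)) = sigma2_e * (1 - phi**(2l)) / (1 - phi**2)."""
    forecast = mu + phi ** l * (y_t - mu)
    var = sigma2_e * (1 - phi ** (2 * l)) / (1 - phi ** 2)
    half = z * math.sqrt(var)
    return forecast - half, forecast + half

# Interval widths grow with the lead but stay bounded for this
# stationary model, approaching 2 * 1.96 * sqrt(sigma2_e / (1 - phi**2)).
intervals = {l: ar1_prediction_limits(l) for l in (1, 2, 5, 10)}
```

The widths increase with l toward the stationary limit, matching the bounded-variance behavior of Section 9.3.4.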
9.5 Forecasting Illustrations

9.5.1 Deterministic Trends

We use the example of the temperature series in Dubuque, Iowa, to show how to plot the series with the forecast and confidence band.
Ex. 9.2 (cont) The model fits quite well with a relatively small error variance, so the forecast limits are quite close to the fitted trend forecast.
9.5.2 ARIMA Models

Notice how the forecasts approach the mean exponentially as the lead time increases. Also note how the prediction limits increase in width.
9.6 Updating ARIMA Forecasts

The forecast with origin time t and lead time l + 1 is Ŷ_t(l + 1). Once the observation Y_{t+1} at time t + 1 is known, we may update to the forecast Ŷ_{t+1}(l). The truncated linear process shows that

Y_{t+l+1} = C_t(l + 1) + e_{t+l+1} + Ψ_1 e_{t+l} + ⋯ + Ψ_l e_{t+1}.

Note that C_t(l + 1) and e_{t+1} are functions of Y_{t+1}, Y_t, …. Taking E(· | Y_1, …, Y_{t+1}) on both sides,

Ŷ_{t+1}(l) = C_t(l + 1) + Ψ_l e_{t+1} = Ŷ_t(l + 1) + Ψ_l [Y_{t+1} − Ŷ_t(1)],

where Y_{t+1} − Ŷ_t(1) is the actual forecast error at time t + 1 once Y_{t+1} has been observed.
Ex. The AR(1) model for the industrial color property in Exhibit 9.1 gives Ŷ_35(1) = 70.14793 and Ŷ_35(2) = 71.94383. If we now observe the next value Y_36 = 65, then we update the forecast of Y_37 as

Ŷ_36(1) = 71.94383 + 0.5705(65 − 70.14793) = 69.00694.
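The updating formula is one line of arithmetic; here it reproduces the numbers above, using the fact that for an AR(1) model Ψ_1 = φ (the function name is ours):

```python
def update_forecast(old_lead_lp1, psi_l, y_new, old_lead_1):
    """Updating rule: Yhat_{t+1}(l) = Yhat_t(l+1) + Psi_l * (Y_{t+1} - Yhat_t(1))."""
    return old_lead_lp1 + psi_l * (y_new - old_lead_1)

# AR(1) color-property example: Psi_1 = phi = 0.5705,
# Yhat_35(1) = 70.14793, Yhat_35(2) = 71.94383, observed Y_36 = 65
yhat_36_1 = update_forecast(71.94383, 0.5705, 65.0, 70.14793)
```

Because Y_36 came in below its forecast, the updated forecast of Y_37 is revised downward.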
9.7 Forecast Weights and Exponentially Weighted Moving Averages

We wish to express the forecasts explicitly in terms of the observed series Y_t, Y_{t−1}, …, Y_1. In general, an invertible ARIMA(p,d,q) process has an inverted form:

Y_t = π_1 Y_{t−1} + π_2 Y_{t−2} + π_3 Y_{t−3} + ⋯ + e_t.

Then Y_{t+l} = π_1 Y_{t+l−1} + π_2 Y_{t+l−2} + ⋯ + e_{t+l}. Applying E(· | Y_1, …, Y_t) on both sides for l = 1, 2, …, we get

Ŷ_t(1) = π_1 Y_t + π_2 Y_{t−1} + π_3 Y_{t−2} + ⋯,
Ŷ_t(2) = π_1 Ŷ_t(1) + π_2 Y_t + π_3 Y_{t−1} + ⋯, etc.
Now we determine the π-weights. An ARIMA(p,d,q) model may be written as a nonstationary ARMA(p+d,q) model:

Y_t = ϕ_1 Y_{t−1} + ϕ_2 Y_{t−2} + ⋯ + ϕ_{p+d} Y_{t−p−d} + e_t − θ_1 e_{t−1} − ⋯ − θ_q e_{t−q}.
Ex. Consider the nonstationary IMA(1,1) model Y_t = Y_{t−1} + e_t − θe_{t−1}. Use (9.7.2), or substitute e_t = Y_t − π_1 Y_{t−1} − π_2 Y_{t−2} − ⋯ for e_t and e_{t−1}. Matching coefficients gives π_1 = 1 − θ and π_j = θπ_{j−1} for j ≥ 2. Therefore

π_j = θπ_{j−1} = θ^{j−1} π_1 = (1 − θ)θ^{j−1}, j ≥ 1,

and

Ŷ_t(1) = (1 − θ)Y_t + (1 − θ)θY_{t−1} + (1 − θ)θ² Y_{t−2} + ⋯.

Here the π-weights decrease exponentially, so Ŷ_t(1) is called an exponentially weighted moving average (EWMA). We also have

Ŷ_t(1) = (1 − θ)Y_t + θŶ_{t−1}(1) = Ŷ_{t−1}(1) + (1 − θ)[Y_t − Ŷ_{t−1}(1)].

These show how to update forecasts from origin t − 1 to origin t.
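The EWMA updating equation gives a compact way to run one-step forecasts over a whole series. A sketch, assuming the recursion is started at the first observation (a common but arbitrary startup choice; the function name is ours):

```python
def ewma_forecasts(series, theta):
    """One-step-ahead EWMA forecasts for an IMA(1,1) model:
    Yhat_t(1) = (1 - theta) * Y_t + theta * Yhat_{t-1}(1),
    with the recursion started at the first observation."""
    f = series[0]                 # startup value (assumption)
    out = []
    for y in series:
        f = (1 - theta) * y + theta * f
        out.append(f)
    return out
```

Each output value equals the exponentially weighted sum (1 − θ)(Y_t + θY_{t−1} + θ²Y_{t−2} + ⋯) up to the startup term, so recent observations dominate the forecast.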
9.8 Forecasting Transformed Series

9.8.1 Differencing

For nonstationary ARIMA models, we use differencing to achieve stationarity. Two methods of forecasting may be used: (1) forecast the original nonstationary series directly, or (2) forecast the stationary differenced series and then sum to obtain the forecast of the original series. Both methods lead to the same forecasts, because taking conditional expectation is a linear operation and differencing is a linear transformation with constant coefficients.
9.8.2 Log Transformations

Let Y_t denote the original series and let Z_t = log(Y_t). Since exp is convex, Jensen's inequality gives

E(Y_{t+l} | Y_1, …, Y_t) ≥ exp[E(Z_{t+l} | Z_1, …, Z_t)],

with equality holding only in trivial cases. Thus exp[Ẑ_t(l)] is not the minimum mean square error forecast of Y_{t+l}. Fact: if X has a normal distribution with mean µ and variance σ², then E(exp(X)) = exp(µ + σ²/2).
In our case, µ = E(Z_{t+l} | Z_1, …, Z_t) and

σ² = Var(Z_{t+l} | Z_1, …, Z_t) = Var(Ẑ_t(l) + e_t(l) | Z_1, …, Z_t) = Var(C_t(l) + e_t(l) | Z_1, …, Z_t) = Var(e_t(l) | Z_1, …, Z_t) = Var(e_t(l)).

Thus the minimum mean square error forecast in the original series is given by

Ŷ_t(l) = exp[Ẑ_t(l) + ½ Var(e_t(l))].
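The lognormal moment fact E[exp(X)] = exp(µ + σ²/2) that drives this correction can be checked by simulation (the parameter values and sample size here are illustrative, not from the text):

```python
import math
import random

# Monte Carlo check: for X ~ N(mu, sigma^2), the sample mean of exp(X)
# should be close to exp(mu + sigma^2 / 2), not exp(mu).
random.seed(42)
mu, sigma = 0.0, 0.5
n = 200_000
sim_mean = sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n
exact = math.exp(mu + sigma ** 2 / 2)
naive = math.exp(mu)   # the uncorrected back-transform exp(Zhat)
```

The simulated mean lands near the corrected value and noticeably above the naive back-transform, illustrating why the variance term must be added.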
However, if Z_t has a normal distribution, then Y_t = exp(Z_t) has a lognormal distribution, for which a criterion based on the mean absolute error may be preferable. The optimal forecast under this criterion is the median of Y_{t+l} conditional on Y_t, Y_{t−1}, …, Y_1. Since the log function is monotone, it preserves medians, and for a normal distribution the median equals the mean. So in this case

Ŷ_t(l) = median[Y_{t+l} | Y_t, …, Y_1] = exp[median(Z_{t+l} | Z_t, …, Z_1)] = exp[Ẑ_t(l)] = exp(µ).
9.9 Summary of Forecasting with Certain ARIMA Models

This section summarizes Ŷ_t(l), e_t(l), Var(e_t(l)), and Ψ_j for the AR(1), MA(1), IMA(1,1), and IMA(2,2) models.