Forecasting

1. Optimal Forecast Criterion - Minimum Mean Square Error Forecast

We have now considered how to determine which ARIMA model to fit to our data, how to estimate the parameters of that model, and how to examine its goodness of fit. We are finally in a position to use our model to forecast new values of the time series.

Suppose we have observed a time series up to time T, so that the values X_1, X_2, ..., X_T are known, and consequently Z_1, Z_2, ..., Z_T are known as well. We denote this collection of known information by

I_T = {X_1, X_2, ..., X_T; Z_1, Z_2, ..., Z_T}

Suppose we want to forecast the value X_{T+k}, k time units into the future. We now introduce the most prevalent optimality criterion for forecasting.

Theorem: The k-step-ahead forecast X_{T+k,T} (which is a function of I_T) that minimizes the mean square error E(X_{T+k} - X_{T+k,T})^2 is the conditional expectation

X_{T+k,T} = E(X_{T+k} | I_T)

This optimal forecast is referred to as the Minimum Mean Square Error (MMSE) Forecast. It is unbiased because

E(X_{T+k} - X_{T+k,T}) = E[E(X_{T+k} - X_{T+k,T} | I_T)] = E[X_{T+k,T} - X_{T+k,T}] = 0

Lemma: Suppose Z and W are real random variables. Then

min_h E[(Z - h(W))^2] = E[(Z - E(Z | W))^2]

That is, the conditional (posterior) mean E(Z | W) minimizes the quadratic loss (mean square error).

Proof:

E[(Z - h(W))^2] = E[(Z - E(Z | W) + E(Z | W) - h(W))^2]
= E[(Z - E(Z | W))^2] + 2E[(Z - E(Z | W))(E(Z | W) - h(W))] + E[(E(Z | W) - h(W))^2]

Conditioning on W, the cross term is zero. Thus

E[(Z - h(W))^2] = E[(Z - E(Z | W))^2] + E[(E(Z | W) - h(W))^2],

which is minimized when h(W) = E(Z | W).

Note: We usually omit the word "optimal" in the k-periods-ahead (optimal) forecast, because in the time series context, when we refer to a forecast we mean the optimal minimum mean square error forecast by default.
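The lemma can be checked numerically. The sketch below is illustrative only (the model Z = W^2 + noise and the competing predictors are made up for this example, not taken from the notes): the conditional mean E(Z | W) = W^2 attains a smaller sample MSE than other functions of W.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical model: Z = W**2 + noise, so E(Z | W) = W**2.
W = rng.normal(size=n)
Z = W**2 + rng.normal(size=n)

# Sample MSE of the conditional-mean predictor vs. two competitors.
mse_cond = np.mean((Z - W**2) ** 2)      # h(W) = E(Z | W); MSE ~ Var(noise) = 1
mse_const = np.mean((Z - 1.0) ** 2)      # best constant predictor, E(Z) = 1
mse_abs = np.mean((Z - np.abs(W)) ** 2)  # an ad-hoc alternative

print(mse_cond, mse_const, mse_abs)
```

The conditional-mean MSE is close to the noise variance (1), while both competitors do strictly worse, as the lemma predicts.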

2. Forecast in MA(1)

The MA(1) model is

X_t = Z_t + θZ_{t-1}, where Z_t ~ WN(0, σ^2)

Given the observed data {X_1, X_2, ..., X_T}, the white noise terms Z_t are not directly observed; however, we can perform a recursive estimation of the white noise as follows. Given an initial condition Z_0 (whose exact value is not important), we have:

Z_1 = X_1 - θZ_0
Z_2 = X_2 - θZ_1
...
Z_T = X_T - θZ_{T-1}

Therefore, given {X_1, X_2, ..., X_T}, we can also obtain {Z_1, Z_2, ..., Z_T}, so the entire information known can be written as

I_T = {X_1, X_2, ..., X_T; Z_1, Z_2, ..., Z_T}

Since X_{T+1} = Z_{T+1} + θZ_T, the one-period-ahead (optimal) forecast is

X_{T+1,T} = E(X_{T+1} | I_T) = E(Z_{T+1} + θZ_T | I_T)
= E(Z_{T+1} | I_T) + θE(Z_T | I_T)
= E(Z_{T+1}) + θE(Z_T | Z_T)
= 0 + θZ_T = θZ_T

Note: Per convention in the time series literature, we write E(Z_T | Z_T) = Z_T, although one can write this more rigorously as E(Z_T | Z_T = z_T) = z_T. Similarly, we write E(X_T | X_T) = X_T, which can be written more rigorously as E(X_T | X_T = x_T) = x_T.

The one-period-ahead forecast error is

e_{T+1,T} = X_{T+1} - X_{T+1,T} = Z_{T+1}

Since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:

E(X_{T+1} - X_{T+1,T})^2 = Var(X_{T+1} - X_{T+1,T}) = Var(Z_{T+1}) = σ^2

Given I_T = {X_1, ..., X_T; Z_1, ..., Z_T} and X_{T+2} = Z_{T+2} + θZ_{T+1}, the two-periods-ahead (optimal) forecast is

X_{T+2,T} = E(X_{T+2} | I_T) = E(Z_{T+2} + θZ_{T+1} | I_T)
= E(Z_{T+2} | I_T) + θE(Z_{T+1} | I_T)
= 0 + θ·0 = 0
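The recursive white-noise estimation and the MA(1) forecasts above can be sketched in code. This is an illustrative sketch: θ, σ, and the simulated series are hypothetical values, and the initial condition is set to Z_0 = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, T = 0.6, 1.0, 500  # hypothetical parameter values

# Simulate an MA(1) series: X_t = Z_t + theta * Z_{t-1}, t = 1..T.
Z_true = rng.normal(scale=sigma, size=T + 1)  # Z_0, Z_1, ..., Z_T
X = Z_true[1:] + theta * Z_true[:-1]          # X_1, ..., X_T

# Recursive estimation of the white noise: Z_t = X_t - theta * Z_{t-1},
# starting from the (unimportant) initial condition Z_0 = 0.
Z = np.zeros(T + 1)
for t in range(1, T + 1):
    Z[t] = X[t - 1] - theta * Z[t - 1]

# MMSE forecasts from origin T:
forecast_1 = theta * Z[T]  # X_{T+1,T} = theta * Z_T
forecast_2 = 0.0           # X_{T+2,T} = 0, the unconditional mean
print(forecast_1, forecast_2)
```

Because |θ| < 1, the effect of the arbitrary initial condition Z_0 dies out geometrically (at rate θ^t), so Z_T is recovered essentially exactly for large T.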

The two-periods-ahead forecast error is

e_{T+2,T} = X_{T+2} - X_{T+2,T} = Z_{T+2} + θZ_{T+1}

Since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:

E(X_{T+2} - X_{T+2,T})^2 = Var(X_{T+2} - X_{T+2,T}) = Var(Z_{T+2} + θZ_{T+1}) = (1 + θ^2)σ^2

In summary, for more than one period ahead, just as in the two-periods-ahead case, the forecast is zero, the unconditional mean. That is,

X_{T+2,T} = E(X_{T+2} | I_T) = 0
X_{T+3,T} = E(X_{T+3} | I_T) = 0
...

The MA(1) process is not forecastable more than one period ahead (apart from the unconditional mean).

Forecast for MA(1) with Intercept (non-zero mean)

If the MA(1) model includes an intercept,

X_t = μ + Z_t + θZ_{t-1}, where Z_t ~ WN(0, σ^2),

we can forecast using the same approach. For example, since X_{T+1} = μ + Z_{T+1} + θZ_T, the one-period-ahead (optimal) forecast is

X_{T+1,T} = E(X_{T+1} | I_T) = E(μ + Z_{T+1} + θZ_T | I_T)
= μ + E(Z_{T+1} | I_T) + θE(Z_T | I_T)
= μ + 0 + θZ_T = μ + θZ_T

e_{T+1,T} = X_{T+1} - X_{T+1,T} = Z_{T+1}

Since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:

E(X_{T+1} - X_{T+1,T})^2 = Var(X_{T+1} - X_{T+1,T}) = Var(Z_{T+1}) = σ^2

3. Generalization to MA(q)

For an MA(q) process, we can forecast up to q out-of-sample periods. Can you write down the (optimal) point forecast for each period, the forecast error, and the forecast error variance? Please practice before the exam.

4. Forecast in AR(1)

For the AR(1) model,

X_t = φX_{t-1} + Z_t, where Z_t ~ WN(0, σ^2)

Given I_T = {X_1, X_2, ..., X_T; Z_1, Z_2, ..., Z_T}, the one-period-ahead (optimal) forecast can be computed as follows:

X_{T+1} = φX_T + Z_{T+1}

So, given data I_T, the one-period-ahead forecast is

X_{T+1,T} = E(X_{T+1} | I_T) = E(φX_T + Z_{T+1} | I_T)
= E(φX_T | I_T) + E(Z_{T+1} | I_T)
= φX_T + 0 = φX_T

e_{T+1,T} = X_{T+1} - X_{T+1,T} = Z_{T+1}

The forecast error variance is

Var(e_{T+1,T}) = Var(Z_{T+1}) = σ^2

Two-Periods-Ahead Forecast

By back-substitution:

X_{T+2} = φX_{T+1} + Z_{T+2}
= φ(φX_T + Z_{T+1}) + Z_{T+2}
= φ^2 X_T + φZ_{T+1} + Z_{T+2}

So, given data I_T, the two-periods-ahead (optimal) forecast is

X_{T+2,T} = E(X_{T+2} | I_T) = E(φ^2 X_T + φZ_{T+1} + Z_{T+2} | I_T)
= φ^2 X_T + 0 + 0 = φ^2 X_T

That is, the two-periods-ahead forecast is also a linear function of the last observed value, but with coefficient φ^2.

e_{T+2,T} = X_{T+2} - X_{T+2,T} = φZ_{T+1} + Z_{T+2}

The forecast error variance is

Var(e_{T+2,T}) = Var(φZ_{T+1} + Z_{T+2}) = (1 + φ^2)σ^2

k-Periods-Ahead Forecast

Similarly,

X_{T+k} = φ^k X_T + φ^{k-1} Z_{T+1} + ... + φZ_{T+k-1} + Z_{T+k}

So, given I_T, the k-periods-ahead optimal forecast is

X_{T+k,T} = E(X_{T+k} | I_T) = φ^k X_T

AR(1) with Intercept

If the AR(1) model includes an intercept,

X_t = α + φX_{t-1} + Z_t, where Z_t ~ WN(0, σ^2),

then the one-period-ahead forecast is

X_{T+1,T} = E(X_{T+1} | I_T) = α + E(φX_T | I_T) + E(Z_{T+1} | I_T)

= α + φX_T

What is the two-periods-ahead forecast?

Forecasting AR(1) Recursively

Alternatively, rather than using the back-substitution method directly as we have shown above, we can forecast the AR(1) recursively as follows:

X_{T+2} = φX_{T+1} + Z_{T+2}

So, given data I_T, the two-periods-ahead (optimal) forecast is

X_{T+2,T} = E(X_{T+2} | I_T) = E(φX_{T+1} + Z_{T+2} | I_T) = φE(X_{T+1} | I_T) = φX_{T+1,T}

Similarly, one can obtain the general recursive forecast relationship:

X_{T+k,T} = E(X_{T+k} | I_T) = E(φX_{T+k-1} + Z_{T+k} | I_T) = φE(X_{T+k-1} | I_T) = φX_{T+k-1,T} = ... = φ^{k-1} X_{T+1,T}

This way, once you have computed the one-step-ahead forecast,

X_{T+1,T} = E(X_{T+1} | I_T) = E(φX_T + Z_{T+1} | I_T) = E(φX_T | I_T) + E(Z_{T+1} | I_T) = φX_T + 0 = φX_T,

you can obtain the rest of the forecasts easily.

Note: As mentioned in class, I prefer the back-substitution method (introduced first) in general, because it also provides you with the forecast error directly.

5. AR(p) Process

Consider an AR(2) model with intercept:

X_t = α + φ_1 X_{t-1} + φ_2 X_{t-2} + Z_t, where Z_t ~ WN(0, σ^2)

We have

X_{T+1} = α + φ_1 X_T + φ_2 X_{T-1} + Z_{T+1}

Then the one-period-ahead forecast is

X_{T+1,T} = E(X_{T+1} | I_T) = α + E(φ_1 X_T | I_T) + E(φ_2 X_{T-1} | I_T) + E(Z_{T+1} | I_T)
= α + φ_1 X_T + φ_2 X_{T-1}

The two-periods-ahead forecast (by the recursive forecasting method) is

X_{T+2,T} = E(X_{T+2} | I_T) = α + E(φ_1 X_{T+1} | I_T) + E(φ_2 X_T | I_T) + E(Z_{T+2} | I_T)
= α + φ_1 X_{T+1,T} + φ_2 X_T
= α + φ_1 (α + φ_1 X_T + φ_2 X_{T-1}) + φ_2 X_T
= (1 + φ_1)α + (φ_1^2 + φ_2)X_T + φ_1 φ_2 X_{T-1}
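The recursive forecasting rule for AR models is easy to turn into code. A sketch for the AR(2) with intercept follows; the parameter values α, φ_1, φ_2 and the last two observations are hypothetical, chosen only for illustration.

```python
# Recursive multi-step forecasting for an AR(2) with intercept:
# X_{T+k,T} = alpha + phi1 * X_{T+k-1,T} + phi2 * X_{T+k-2,T},
# where the "forecast" of an already-observed value is the value itself.

def ar2_forecasts(alpha, phi1, phi2, x_last, x_second_last, k):
    """Return the 1..k step-ahead forecasts as a list."""
    history = [x_second_last, x_last]  # X_{T-1}, X_T
    forecasts = []
    for _ in range(k):
        nxt = alpha + phi1 * history[-1] + phi2 * history[-2]
        forecasts.append(nxt)
        history.append(nxt)  # feed each forecast back in
    return forecasts

# Hypothetical parameter values and data:
alpha, phi1, phi2 = 0.5, 0.6, 0.2
x_T, x_Tm1 = 2.0, 1.5

f = ar2_forecasts(alpha, phi1, phi2, x_T, x_Tm1, k=3)
print(f)
```

The two-step value agrees with the closed form (1 + φ_1)α + (φ_1^2 + φ_2)X_T + φ_1 φ_2 X_{T-1} derived above.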

Alternatively, the two-periods-ahead forecast (by the back-substitution method) can be derived as follows:

X_{T+2} = α + φ_1 X_{T+1} + φ_2 X_T + Z_{T+2}
= α + φ_1 (α + φ_1 X_T + φ_2 X_{T-1} + Z_{T+1}) + φ_2 X_T + Z_{T+2}
= (1 + φ_1)α + (φ_1^2 + φ_2)X_T + φ_1 φ_2 X_{T-1} + φ_1 Z_{T+1} + Z_{T+2}

Therefore,

X_{T+2,T} = E(X_{T+2} | I_T) = (1 + φ_1)α + (φ_1^2 + φ_2)X_T + φ_1 φ_2 X_{T-1}

e_{T+2,T} = X_{T+2} - X_{T+2,T} = φ_1 Z_{T+1} + Z_{T+2}

The forecast error variance is

Var(e_{T+2,T}) = Var(φ_1 Z_{T+1} + Z_{T+2}) = (1 + φ_1^2)σ^2

What are the forecasts for more future periods? Please practice before the exam.

6. ARMA Process

Consider an ARMA(1,1) model:

X_t = φX_{t-1} + Z_t + θZ_{t-1}, where Z_t ~ WN(0, σ^2)

Again we can use the back-substitution method to compute the point forecasts:

X_{T+1} = φX_T + Z_{T+1} + θZ_T
X_{T+2} = φX_{T+1} + Z_{T+2} + θZ_{T+1} = φ(φX_T + Z_{T+1} + θZ_T) + Z_{T+2} + θZ_{T+1}

One-step-ahead forecast:

X_{T+1,T} = E(X_{T+1} | I_T) = E(φX_T | I_T) + E(Z_{T+1} | I_T) + E(θZ_T | I_T) = φX_T + θZ_T

Two-step-ahead forecast:

X_{T+2,T} = E(X_{T+2} | I_T) = E(φX_{T+1} | I_T) + E(Z_{T+2} | I_T) + E(θZ_{T+1} | I_T)
= φE(X_{T+1} | I_T) = φ(φX_T + θZ_T)

The two-step-ahead forecast error is

e_{T+2,T} = X_{T+2} - X_{T+2,T} = φZ_{T+1} + Z_{T+2} + θZ_{T+1} = (φ + θ)Z_{T+1} + Z_{T+2}

Again, since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:

E(X_{T+2} - X_{T+2,T})^2 = Var(X_{T+2} - X_{T+2,T}) = Var[(φ + θ)Z_{T+1} + Z_{T+2}] = [(φ + θ)^2 + 1]σ^2
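The ARMA(1,1) forecast formulas can be sketched directly. The values of φ, θ, X_T, Z_T, and σ^2 below are hypothetical; in practice Z_T would come from a recursive white-noise estimation as in the MA(1) section.

```python
# MMSE forecasts for ARMA(1,1): X_t = phi*X_{t-1} + Z_t + theta*Z_{t-1}.
# One step:  X_{T+1,T} = phi * X_T + theta * Z_T
# Two steps: X_{T+2,T} = phi * X_{T+1,T}

def arma11_forecasts(phi, theta, x_T, z_T):
    f1 = phi * x_T + theta * z_T
    f2 = phi * f1
    return f1, f2

def arma11_two_step_error_var(phi, theta, sigma2):
    # Var(e_{T+2,T}) = [(phi + theta)^2 + 1] * sigma^2
    return ((phi + theta) ** 2 + 1) * sigma2

f1, f2 = arma11_forecasts(phi=0.7, theta=0.3, x_T=1.0, z_T=0.4)
print(f1, f2, arma11_two_step_error_var(0.7, 0.3, 1.0))
```

Note how the MA part contributes only to the one-step forecast; from two steps on, the forecast decays geometrically through the AR coefficient, mirroring the derivation above.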

7. ARIMA Process

Suppose X_t follows the ARIMA(1,1,0) model. That means its first difference Y_t = ∇X_t = X_t - X_{t-1} follows the ARMA(1,0) model, which is simply the AR(1) model:

Y_t = φY_{t-1} + Z_t, where Z_t ~ WN(0, σ_Z^2)

Note: we use σ_Z^2 instead of σ^2 here so that you get familiar with different notations.

Substituting Y_t = X_t - X_{t-1} and Y_{t-1} = X_{t-1} - X_{t-2}, we can write the original ARIMA(1,1,0) model as

X_t - X_{t-1} = φ(X_{t-1} - X_{t-2}) + Z_t

That is,

X_t = (1 + φ)X_{t-1} - φX_{t-2} + Z_t

One-step-ahead forecast:

X_{T+1,T} = E[X_{T+1} | I_T] = E[(1 + φ)X_T - φX_{T-1} + Z_{T+1} | I_T] = (1 + φ)X_T - φX_{T-1}

Thus the forecast error is

e_{T+1,T} = X_{T+1} - X_{T+1,T} = (1 + φ)X_T - φX_{T-1} + Z_{T+1} - [(1 + φ)X_T - φX_{T-1}] = Z_{T+1}

and the variance of the forecast error is

Var(e_{T+1,T}) = Var(Z_{T+1}) = σ_Z^2

Two-step-ahead forecast:

X_{T+2} = (1 + φ)X_{T+1} - φX_T + Z_{T+2}
= (1 + φ)[(1 + φ)X_T - φX_{T-1} + Z_{T+1}] - φX_T + Z_{T+2}
= (1 + φ + φ^2)X_T - (φ + φ^2)X_{T-1} + Z_{T+2} + (1 + φ)Z_{T+1}

Therefore,

X_{T+2,T} = E[X_{T+2} | I_T] = (1 + φ + φ^2)X_T - (φ + φ^2)X_{T-1}

Thus the forecast error is

e_{T+2,T} = X_{T+2} - X_{T+2,T} = Z_{T+2} + (1 + φ)Z_{T+1}

and the variance of the forecast error is

Var(e_{T+2,T}) = [1 + (1 + φ)^2]σ_Z^2 = (2 + 2φ + φ^2)σ_Z^2

8. Prediction (Forecast) Error and Prediction Interval

Recall that the k-step-ahead prediction error is defined as

e_{T+k,T} = X_{T+k} - E(X_{T+k} | I_T)

By the law of total variance,

Var(Y) = E[Var(Y | X)] + Var[E(Y | X)]

We can derive the variance of the prediction error:

Var(e_{T+k,T}) = Var(X_{T+k} | I_T)

Based on the predictor and its error variance, we may construct prediction intervals. For example, if the X's are normally distributed, the 95% prediction interval is

E(X_{T+k} | I_T) ± 1.96 √Var(X_{T+k} | I_T)
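Putting sections 7 and 8 together, the ARIMA(1,1,0) point forecasts and their 95% prediction intervals can be sketched as follows. The values of φ, σ_Z^2, and the last two observations are hypothetical, and normality of the Z's is assumed for the intervals.

```python
import math

def arima110_forecasts_with_intervals(phi, sigma2_z, x_T, x_Tm1):
    """One- and two-step-ahead forecasts and 95% prediction intervals
    for the ARIMA(1,1,0) model X_t = (1+phi) X_{t-1} - phi X_{t-2} + Z_t."""
    f1 = (1 + phi) * x_T - phi * x_Tm1
    f2 = (1 + phi + phi**2) * x_T - (phi + phi**2) * x_Tm1
    var1 = sigma2_z                            # Var(e_{T+1,T}) = sigma_Z^2
    var2 = (2 + 2 * phi + phi**2) * sigma2_z   # Var(e_{T+2,T})
    z = 1.96  # 97.5% standard normal quantile, for a 95% interval
    ci1 = (f1 - z * math.sqrt(var1), f1 + z * math.sqrt(var1))
    ci2 = (f2 - z * math.sqrt(var2), f2 + z * math.sqrt(var2))
    return (f1, ci1), (f2, ci2)

# Hypothetical parameter values and data:
(f1, ci1), (f2, ci2) = arima110_forecasts_with_intervals(
    phi=0.5, sigma2_z=1.0, x_T=10.0, x_Tm1=9.0)
print(f1, ci1)
print(f2, ci2)
```

As a consistency check, the two-step forecast also satisfies the recursive relation X_{T+2,T} = (1 + φ)X_{T+1,T} - φX_T.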