Forecasting

Let {y_t} be a covariance stationary and ergodic process, e.g., an ARMA(p, q) process, with Wold representation

    y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j},   ε_t ~ WN(0, σ²)
        = μ + ε_t + ψ_1 ε_{t−1} + ψ_2 ε_{t−2} + ⋯

Let I_t = {y_t, y_{t−1}, …} denote the information set available at time t. Recall,

    E[y_t] = μ,   var(y_t) = σ² Σ_{j=0}^∞ ψ_j²

Goal: Using I_t, produce optimal forecasts of y_{t+h} for h = 1, 2, …, s. Define y_{t+h|t} as the forecast of y_{t+h} based on I_t with known parameters. The forecast error is

    ε_{t+h|t} = y_{t+h} − y_{t+h|t}

and the mean squared error of the forecast is

    MSE(ε_{t+h|t}) = E[ε²_{t+h|t}] = E[(y_{t+h} − y_{t+h|t})²]

Theorem: The minimum MSE forecast (best forecast) of y_{t+h} based on I_t is

    y_{t+h|t} = E[y_{t+h} | I_t]

Proof: See Hamilton, pages 72–73.

Note:

    y_{t+h} = μ + ε_{t+h} + ψ_1 ε_{t+h−1} + ⋯ + ψ_{h−1} ε_{t+1} + ψ_h ε_t + ψ_{h+1} ε_{t−1} + ⋯
Remarks

1. The computation of E[y_{t+h} | I_t] depends on the distribution of {ε_t} and may be a very complicated nonlinear function of the history of {ε_t}. Even if {ε_t} is an uncorrelated process (e.g., white noise), it may be the case that E[ε_{t+1} | I_t] ≠ 0.

2. If {ε_t} is independent white noise, then E[ε_{t+1} | I_t] = 0 and E[y_{t+h} | I_t] will be a simple linear function of {ε_t}:

    y_{t+h|t} = μ + ψ_h ε_t + ψ_{h+1} ε_{t−1} + ⋯

Linear Predictors

A linear predictor of y_{t+h} is a linear function of the variables in I_t.

Theorem: The minimum MSE linear forecast (best linear predictor) of y_{t+h} based on I_t is

    y_{t+h|t} = μ + ψ_h ε_t + ψ_{h+1} ε_{t−1} + ⋯

Proof: See Hamilton, page 74.

The forecast error of the best linear predictor is

    ε_{t+h|t} = y_{t+h} − y_{t+h|t}
              = μ + ε_{t+h} + ψ_1 ε_{t+h−1} + ⋯ + ψ_{h−1} ε_{t+1} + ψ_h ε_t + ⋯
                − (μ + ψ_h ε_t + ψ_{h+1} ε_{t−1} + ⋯)
              = ε_{t+h} + ψ_1 ε_{t+h−1} + ⋯ + ψ_{h−1} ε_{t+1}

and the MSE of the forecast error is

    MSE(ε_{t+h|t}) = σ²(1 + ψ_1² + ⋯ + ψ²_{h−1})
Remarks

1. E[ε_{t+h|t}] = 0
2. ε_{t+h|t} is uncorrelated with any element in I_t
3. The form of y_{t+h|t} is closely related to the IRF
4. MSE(ε_{t+h|t}) = var(ε_{t+h|t}) ≤ var(y_t)
5. lim_{h→∞} y_{t+h|t} = μ
6. lim_{h→∞} MSE(ε_{t+h|t}) = var(y_t)

Example: BLP for MA(1) process

Here y_t = μ + ε_t + θ ε_{t−1}, ε_t ~ WN(0, σ²), so ψ_1 = θ and ψ_h = 0 for h > 1. Therefore,

    y_{t+1|t} = μ + θ ε_t
    y_{t+2|t} = μ
    y_{t+h|t} = μ for h > 1

The forecast errors and MSEs are

    ε_{t+1|t} = ε_{t+1},              MSE(ε_{t+1|t}) = σ²
    ε_{t+2|t} = ε_{t+2} + θ ε_{t+1},  MSE(ε_{t+2|t}) = σ²(1 + θ²)
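The MA(1) forecast MSEs above can be checked by simulation. A minimal sketch (the parameter values μ = 2, θ = 0.5, σ = 1 are illustrative choices, and the innovations ε_t are treated as observed so the BLP can be formed directly):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, theta, sigma = 2.0, 0.5, 1.0   # illustrative parameter values
n = 200_000

# Simulate an MA(1): y_t = mu + eps_t + theta * eps_{t-1}
eps = rng.normal(0.0, sigma, n + 2)
y = mu + eps[1:n + 1] + theta * eps[0:n]

# One-step BLP: y_{t+1|t} = mu + theta * eps_t  (eps_t = eps[t+1] here)
e1 = y[1:] - (mu + theta * eps[1:n])
# Two-step BLP: y_{t+2|t} = mu, so the error is y_{t+2} - mu
e2 = y[2:] - mu

mse1 = np.mean(e1**2)   # should be close to sigma^2
mse2 = np.mean(e2**2)   # should be close to sigma^2 * (1 + theta^2)
```

With these values the sample MSEs come out near σ² = 1 and σ²(1 + θ²) = 1.25, matching the formulas on the slide.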
Prediction Confidence Intervals

If {ε_t} is Gaussian, then

    y_{t+h} | I_t ~ N(y_{t+h|t}, σ²(1 + ψ_1² + ⋯ + ψ²_{h−1}))

A 95% confidence interval for the h-step prediction has the form

    y_{t+h|t} ± 1.96 · [σ²(1 + ψ_1² + ⋯ + ψ²_{h−1})]^{1/2}

Predictions with Estimated Parameters

Let ŷ_{t+h|t} denote the BLP with estimated parameters:

    ŷ_{t+h|t} = μ̂ + ψ̂_h ε̂_t + ψ̂_{h+1} ε̂_{t−1} + ⋯

where ε̂_t is the estimated residual from the fitted model. The forecast error with estimated parameters is

    ε̂_{t+h|t} = y_{t+h} − ŷ_{t+h|t}
              = (μ − μ̂) + ε_{t+h} + ψ_1 ε_{t+h−1} + ⋯ + ψ_{h−1} ε_{t+1}
                + (ψ_h ε_t − ψ̂_h ε̂_t) + (ψ_{h+1} ε_{t−1} − ψ̂_{h+1} ε̂_{t−1}) + ⋯

Obviously,

    MSE(ε̂_{t+h|t}) ≠ MSE(ε_{t+h|t}) = σ²(1 + ψ_1² + ⋯ + ψ²_{h−1})

Note: Most software computes

    MSE^(ε_{t+h|t}) = σ̂²(1 + ψ̂_1² + ⋯ + ψ̂²_{h−1})
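The 95% interval formula above is easy to implement once the Wold coefficients ψ_j are available. A minimal sketch, assuming known parameters (the AR(1) values φ = 0.8, σ² = 1 are illustrative; for an AR(1), ψ_j = φ^j as derived later in these notes):

```python
import numpy as np

def prediction_interval(forecast, psi, sigma2, h, z=1.96):
    """95% interval for an h-step forecast given Wold weights psi,
    where psi[0] = 1 and psi[j] = psi_j.
    MSE_h = sigma2 * (1 + psi_1^2 + ... + psi_{h-1}^2)."""
    mse = sigma2 * np.sum(np.asarray(psi[:h], dtype=float) ** 2)
    half = z * np.sqrt(mse)
    return forecast - half, forecast + half

# Illustrative AR(1) with phi = 0.8, so psi_j = phi**j
phi, sigma2 = 0.8, 1.0
psi = phi ** np.arange(10)           # [1, 0.8, 0.64, ...]
lo, hi = prediction_interval(forecast=0.0, psi=psi, sigma2=sigma2, h=2)
# h = 2: MSE = sigma^2 * (1 + 0.8^2) = 1.64, half-width = 1.96 * sqrt(1.64)
```

As h grows, the interval width approaches 1.96 · var(y_t)^{1/2}, consistent with remark 6 above.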
Computing the Best Linear Predictor

The BLP y_{t+h|t} may be computed in many different but equivalent ways. The algorithm for computing y_{t+h|t} from an AR(1) model is simple, and the methodology extends to general ARMA models as well as multivariate models.

Example: AR(1) Model

    y_t − μ = φ(y_{t−1} − μ) + ε_t,   ε_t ~ WN(0, σ²)

with μ, φ, σ² known. In the Wold representation, ψ_j = φ^j. Starting at t and iterating forward h periods gives

    y_{t+h} = μ + φ^h (y_t − μ) + ε_{t+h} + φ ε_{t+h−1} + ⋯ + φ^{h−1} ε_{t+1}
            = μ + φ^h (y_t − μ) + ε_{t+h} + ψ_1 ε_{t+h−1} + ⋯ + ψ_{h−1} ε_{t+1}

The best linear forecasts of y_{t+1}, y_{t+2}, …, y_{t+h} are computed using the chain rule of forecasting (law of iterated projections):

    y_{t+1|t} = μ + φ(y_t − μ)
    y_{t+2|t} = μ + φ(y_{t+1|t} − μ) = μ + φ(φ(y_t − μ)) = μ + φ²(y_t − μ)
    ⋮
    y_{t+h|t} = μ + φ(y_{t+h−1|t} − μ) = μ + φ^h (y_t − μ)

The corresponding forecast errors are

    ε_{t+1|t} = y_{t+1} − y_{t+1|t} = ε_{t+1}
    ε_{t+2|t} = y_{t+2} − y_{t+2|t} = ε_{t+2} + φ ε_{t+1} = ε_{t+2} + ψ_1 ε_{t+1}
    ⋮
    ε_{t+h|t} = y_{t+h} − y_{t+h|t} = ε_{t+h} + φ ε_{t+h−1} + ⋯ + φ^{h−1} ε_{t+1}
              = ε_{t+h} + ψ_1 ε_{t+h−1} + ⋯ + ψ_{h−1} ε_{t+1}
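The chain rule of forecasting for the AR(1) can be sketched in a few lines. The parameter values below (μ = 1, φ = 0.5, y_t = 5) are illustrative only:

```python
import numpy as np

def ar1_forecasts(y_t, mu, phi, h_max):
    """Chain rule of forecasting for an AR(1):
    y_{t+h|t} = mu + phi * (y_{t+h-1|t} - mu) = mu + phi**h * (y_t - mu)."""
    forecasts = []
    prev = y_t
    for _ in range(h_max):
        prev = mu + phi * (prev - mu)   # feed the previous forecast back in
        forecasts.append(prev)
    return np.array(forecasts)

fc = ar1_forecasts(y_t=5.0, mu=1.0, phi=0.5, h_max=3)
# Agrees with the closed form mu + phi**h * (y_t - mu): 3.0, 2.0, 1.5
```

Note how the forecasts decay geometrically toward μ, matching remark 5 (lim_{h→∞} y_{t+h|t} = μ).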
The forecast error variances are

    var(ε_{t+1|t}) = σ²
    var(ε_{t+2|t}) = σ²(1 + φ²) = σ²(1 + ψ_1²)
    ⋮
    var(ε_{t+h|t}) = σ²(1 + φ² + ⋯ + φ^{2(h−1)}) = σ² (1 − φ^{2h})/(1 − φ²)
                   = σ²(1 + ψ_1² + ⋯ + ψ²_{h−1})

Clearly,

    lim_{h→∞} y_{t+h|t} = μ = E[y_t]
    lim_{h→∞} var(ε_{t+h|t}) = σ²/(1 − φ²) = σ² Σ_{h=0}^∞ ψ_h² = var(y_t)

AR(p) Models

Consider the AR(p) model

    φ(L)(y_t − μ) = ε_t,   ε_t ~ WN(0, σ²)
    φ(L) = 1 − φ_1 L − ⋯ − φ_p L^p

The forecasting algorithm for AR(p) models is essentially the same as that for AR(1) models once we put the AR(p) model in state space form. Let X_t = y_t − μ. The AR(p) in state space form is

    [ X_t       ]   [ φ_1  φ_2  ⋯  φ_{p−1}  φ_p ] [ X_{t−1}   ]   [ ε_t ]
    [ X_{t−1}   ] = [ 1    0    ⋯  0        0   ] [ X_{t−2}   ] + [ 0   ]
    [ ⋮         ]   [      ⋱                     ] [ ⋮         ]   [ ⋮   ]
    [ X_{t−p+1} ]   [ 0    ⋯    1  0            ] [ X_{t−p}   ]   [ 0   ]

or

    ξ_t = F ξ_{t−1} + w_t,   var(w_t) = Σ_w
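The AR(1) forecast-variance formula and its limit can be verified numerically. A short sketch with illustrative values φ = 0.8, σ² = 1:

```python
import numpy as np

phi, sigma2 = 0.8, 1.0               # illustrative AR(1) parameters
h = np.arange(1, 51)

# var(eps_{t+h|t}) = sigma^2 * (1 - phi^(2h)) / (1 - phi^2)
mse_h = sigma2 * (1 - phi ** (2 * h)) / (1 - phi**2)

# Unconditional variance var(y_t) = sigma^2 / (1 - phi^2)
var_y = sigma2 / (1 - phi**2)

# mse_h starts at sigma^2, increases monotonically, and converges to var_y
```

The sequence mse_h rises from σ² at h = 1 toward var(y_t), exactly as the two limits on the slide require.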
Starting at t and iterating forward h periods gives

    ξ_{t+h} = F^h ξ_t + w_{t+h} + F w_{t+h−1} + ⋯ + F^{h−1} w_{t+1}

Then the best linear forecasts of y_{t+1}, y_{t+2}, …, y_{t+h}, computed using the chain rule of forecasting, are

    ξ_{t+1|t} = F ξ_t
    ξ_{t+2|t} = F ξ_{t+1|t} = F² ξ_t
    ⋮
    ξ_{t+h|t} = F ξ_{t+h−1|t} = F^h ξ_t

The forecast for y_{t+h} is given by μ plus the first element of ξ_{t+h|t} = F^h ξ_t:

    ξ_{t+h|t} = F^h [ y_t − μ, y_{t−1} − μ, …, y_{t−p+1} − μ ]′

The forecast errors are given by

    w_{t+1|t} = ξ_{t+1} − ξ_{t+1|t} = w_{t+1}
    w_{t+2|t} = ξ_{t+2} − ξ_{t+2|t} = w_{t+2} + F w_{t+1}
    ⋮
    w_{t+h|t} = ξ_{t+h} − ξ_{t+h|t} = w_{t+h} + F w_{t+h−1} + ⋯ + F^{h−1} w_{t+1}

and the corresponding forecast MSE matrices are

    var(w_{t+1|t}) = var(w_{t+1}) = Σ_w
    var(w_{t+2|t}) = var(w_{t+2}) + F var(w_{t+1}) F′ = Σ_w + F Σ_w F′
    ⋮
    var(w_{t+h|t}) = Σ_{j=0}^{h−1} F^j Σ_w F^{j}′

Notice that

    var(w_{t+h|t}) = Σ_w + F var(w_{t+h−1|t}) F′
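The companion-form forecast ξ_{t+h|t} = F^h ξ_t is a one-liner once F is built. A minimal sketch for an AR(2) with illustrative coefficients φ = (0.5, 0.2) and μ = 0:

```python
import numpy as np

def arp_forecast(y_hist, mu, phi, h):
    """h-step AR(p) forecast via the companion (state space) form.
    y_hist: [y_t, y_{t-1}, ..., y_{t-p+1}]; phi: [phi_1, ..., phi_p]."""
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi                    # first row holds the AR coefficients
    F[1:, :-1] = np.eye(p - 1)       # subdiagonal identity shifts the state down
    xi = np.asarray(y_hist, dtype=float) - mu
    xi_h = np.linalg.matrix_power(F, h) @ xi
    return mu + xi_h[0]              # forecast = mu + first element of xi_{t+h|t}

# AR(2) sanity check against the chain rule done by hand:
fc1 = arp_forecast([2.0, 1.0], mu=0.0, phi=[0.5, 0.2], h=1)  # 0.5*2 + 0.2*1 = 1.2
fc2 = arp_forecast([2.0, 1.0], mu=0.0, phi=[0.5, 0.2], h=2)  # 0.5*1.2 + 0.2*2 = 1.0
```

For p = 1 this reduces to the scalar AR(1) recursion on the previous slide, since F is then just φ.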
Forecast Evaluation: Diebold-Mariano Test for Equal Predictive Accuracy

Let {y_t} denote the series to be forecast, and let y¹_{t+h|t} and y²_{t+h|t} denote two competing forecasts of y_{t+h} based on I_t. For example, y¹_{t+h|t} could be computed from an AR(p) model and y²_{t+h|t} could be computed from an ARMA(p, q) model. The forecast errors from the two models are

    ε¹_{t+h|t} = y_{t+h} − y¹_{t+h|t}
    ε²_{t+h|t} = y_{t+h} − y²_{t+h|t}

The h-step forecasts are assumed to be computed for t = t_0, …, T, for a total of T_0 forecasts, giving

    {ε¹_{t+h|t}}_{t=t_0}^T,   {ε²_{t+h|t}}_{t=t_0}^T

Because the h-step forecasts use overlapping data, the forecast errors in {ε¹_{t+h|t}}_{t=t_0}^T and {ε²_{t+h|t}}_{t=t_0}^T will be serially correlated.
The accuracy of each forecast is measured by a particular loss function

    L(y_{t+h}, yⁱ_{t+h|t}) = L(εⁱ_{t+h|t}),   i = 1, 2

Some popular loss functions are:

    L(εⁱ_{t+h|t}) = (εⁱ_{t+h|t})² : squared error loss
    L(εⁱ_{t+h|t}) = |εⁱ_{t+h|t}| : absolute value loss

To determine whether one model predicts better than another, we may test the null hypothesis

    H_0: E[L(ε¹_{t+h|t})] = E[L(ε²_{t+h|t})]

against the alternative

    H_1: E[L(ε¹_{t+h|t})] ≠ E[L(ε²_{t+h|t})]

The Diebold-Mariano test is based on the loss differential

    d_t = L(ε¹_{t+h|t}) − L(ε²_{t+h|t})

The null of equal predictive accuracy is then

    H_0: E[d_t] = 0

The Diebold-Mariano test statistic is

    S = d̄ / (avar^(d̄))^{1/2} = d̄ / (LRV^_d / T_0)^{1/2}

where

    d̄ = (1/T_0) Σ_{t=t_0}^T d_t
    LRV_d = γ_0 + 2 Σ_{j=1}^∞ γ_j,   γ_j = cov(d_t, d_{t−j})

Note: The long-run variance is used in the statistic because the sample of loss differentials {d_t}_{t=t_0}^T is serially correlated for h > 1.
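The statistic above can be sketched directly. This version uses squared error loss and truncates the long-run variance at h − 1 lags (a common choice for h-step forecasts, since the errors are then MA(h−1) under the null; the simulated errors are purely illustrative):

```python
import numpy as np

def diebold_mariano(e1, e2, h=1):
    """DM statistic for equal predictive accuracy under squared error loss.
    Estimates LRV_d = gamma_0 + 2 * sum_{j=1}^{h-1} gamma_j."""
    d = e1**2 - e2**2                 # loss differential d_t
    T0 = d.size
    dbar = d.mean()
    dc = d - dbar
    lrv = np.sum(dc**2) / T0          # gamma_0
    for j in range(1, h):             # add 2 * gamma_j for j = 1, ..., h-1
        lrv += 2.0 * np.sum(dc[j:] * dc[:-j]) / T0
    return dbar / np.sqrt(lrv / T0)

# Illustrative use with simulated forecast errors of equal accuracy
rng = np.random.default_rng(1)
e1 = rng.normal(size=500)
e2 = rng.normal(size=500)
S = diebold_mariano(e1, e2, h=2)
# With equally accurate forecasts, |S| should typically stay below 1.96
```

Other truncation rules for the long-run variance (e.g., Newey-West with a data-driven bandwidth) are also used in practice.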
Diebold and Mariano (1995) show that under the null of equal predictive accuracy,

    S ~ᴬ N(0, 1)

So we reject the null of equal predictive accuracy at the 5% level if |S| > 1.96. One-sided tests may also be computed.