Lecture 8: ARIMA Forecasting

Please read Chapters 7 and 8 of MWH Book
Predicting Error

1. y denotes a random variable (stock price, weather, etc.)
2. Sometimes we want to make a prediction (a guess). Let c denote the predicted value.
3. y − c is the prediction error, or loss.
4. Consider the quadratic loss function (y − c)^2.
5. The quadratic loss function puts a heavy penalty on big prediction errors.
6. The quadratic function is differentiable, a nice property.
Unconditional Mean

1. The goal is to find the best guess that minimizes the expected quadratic loss
   min_c E(y − c)^2. (1)
2. The first order condition is
   −2E(y − c) = 0 (2)
   and the solution is the unconditional mean
   c = E(y). (3)
3. So the unconditional mean is the best guess that minimizes the quadratic loss:
   E(y − E(y))^2 ≤ E(y − c)^2, ∀c (4)
   The equality holds only when c = E(y).
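As a quick numerical illustration of this result (a sketch with made-up simulated data, not part of the lecture), we can check that the sample mean attains the smallest sample analogue of the expected quadratic loss among a few candidate guesses:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical draws of the random variable y
y = rng.normal(loc=5.0, scale=2.0, size=100_000)

def expected_loss(c):
    """Sample analogue of E(y - c)^2."""
    return np.mean((y - c) ** 2)

# Evaluate the loss at the sample mean and at two other guesses
c_star = y.mean()
candidates = (c_star - 1.0, c_star, c_star + 1.0)
losses = {c: expected_loss(c) for c in candidates}

# The sample mean attains the smallest loss among the candidates
best = min(losses, key=losses.get)
print(best == c_star)  # True
```

Any other guess c increases the loss by exactly (E(y) − c)^2, which is why the mean wins.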
Unconditional Variance

1. If we use the unconditional mean as the guess, the expected quadratic loss is
   E(y − E(y))^2 = var(y). (5)
2. So the unconditional variance measures the accuracy of the prediction.
3. We get a more accurate prediction as the variance gets smaller.
Application

1. Suppose one day you meet a stranger. You want to guess his salary.
2. Your best guess is the unconditional mean E(salary).
3. The intuition is that we can treat the stranger as an average person, who earns the average salary.
4. The unconditional mean can be obtained by averaging the salaries of all persons (the population).
Conditional Mean

1. You can obtain a better guess by using more information.
2. Suppose this person is white. Intuitively, the better guess is the average salary for whites, or the mean of salary conditional on the fact that the person is white.
3. Let x denote the new information (the conditioning variable). In this example x is skin color.
4. E(y|x) denotes the conditional mean, the mean of y conditional on x taking a specific value.
5. Note that the unconditional mean is a unique number. But the conditional mean is a variable that takes values E(y|x = white), E(y|x = black), E(y|x = yellow).
Conditional Mean is a Random Variable

1. The unconditional mean is the mean for the population, while the conditional mean is the mean for a sub-population.
2. A sub-population is specified by the conditioning variable. The sub-population varies as the conditioning variable varies.
3. The conditional mean is a random variable since it is a function of the conditioning (random) variable.
4. Therefore it makes sense to ask, say, what is the expected value of the conditional mean: E[E(y|x)] = ?
Law of Iterated Expectation (LIE)

1. We have the following important result:
   E[E(y|x)] = E(y). (Law of Iterated Expectation) (6)
2. LIE says that the mean of the conditional mean is the unconditional mean.
3. Note that the expectation inside the bracket (the conditional mean) is taken with respect to y while holding x constant. The outside expectation is taken with respect to x. Some textbooks use subscripts to make this distinction clear: E_x[E_y(y|x)].
Intuition for Law of Iterated Expectation

We can always divide a population into several sub-populations and calculate the mean for each one of them. Then the mean for the population equals the weighted average of the means for the sub-populations. The weight is determined by the distribution of the conditioning variable.
Application

1. Let y be the time spent on Facebook by a Miami student.
2. There are two ways to obtain E(y). First, we can calculate it directly by averaging y over all Miami students.
3. Let the conditioning variable x be a student's major.
4. To obtain E(y) indirectly, we first compute E(y|major = eco), E(y|major = math), etc. Then we compute the weighted average of those major averages. The weight for E(y|major = eco) is the proportion of eco-major students among all Miami students.
Proof of Law of Iterated Expectation

For a discrete random variable the proof is

E(y) = Σ_j y_j P(y = y_j)
     = Σ_j y_j ( Σ_i P(y = y_j, x = x_i) )
     = Σ_j y_j ( Σ_i P(y = y_j | x = x_i) P(x = x_i) )
     = Σ_i ( Σ_j y_j P(y = y_j | x = x_i) ) P(x = x_i)
     = Σ_i E(y | x = x_i) P(x = x_i)
     = E[E(y|x)]
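The discrete proof above can be verified numerically. The sketch below (with a hypothetical three-group example, not from the lecture) computes the conditional means and group weights, then checks that their weighted average recovers the unconditional mean:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical discrete conditioning variable: x is a group label 0, 1, 2
x = rng.choice([0, 1, 2], size=200_000, p=[0.5, 0.3, 0.2])
# y depends on the group, plus noise
y = np.where(x == 0, 10.0, np.where(x == 1, 20.0, 30.0)) + rng.normal(size=x.size)

# Conditional means E(y | x = i) and group weights P(x = i)
groups = np.unique(x)
cond_means = np.array([y[x == g].mean() for g in groups])
weights = np.array([np.mean(x == g) for g in groups])

# LIE: the weighted average of conditional means equals the unconditional mean
lie_mean = np.sum(weights * cond_means)
print(abs(lie_mean - y.mean()) < 1e-8)  # True
```

The identity holds exactly in the sample (up to floating point), mirroring the population proof line by line.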
Conditional Mean is the Optimal Predictor

1. We have an important result:
   Theorem 1: E[y − E(y|x)]^2 ≤ E[y − g(x)]^2, ∀g(·) (7)
2. This theorem says that the conditional mean is the best predictor (based on the quadratic loss function) for y among all predictors that use x.
3. This theorem justifies using the average salary for whites, E(salary|color = white), as the predictor.
4. This theorem provides the foundation for modern forecasting theory.
Forecasting MA(1)

1. Consider an MA(1) process that ends at the n-th period
   y_t = e_t + θ_1 e_{t−1}, (t = 1, 2, ..., n)
   where e_t is white noise with zero serial correlation: cov(e_t, e_{t−j}) = 0, ∀j ≠ 0. This implies
   E(e_{n+k}|Ω_n) = 0, (∀k > 0) (8)
   where Ω_n = (y_n, y_{n−1}, ..., y_1, e_n, e_{n−1}, ..., e_1) is the information set at time n.
2. The one-step forecast is
   E(y_{n+1}|Ω_n) = E((e_{n+1} + θ_1 e_n)|Ω_n) = θ_1 e_n
   due to (8). We can obtain e recursively: letting e_0 = 0, then e_1 = y_1, e_2 = y_2 − θ_1 e_1, ...
3. The two-step forecast is
   E(y_{n+2}|Ω_n) = E((e_{n+2} + θ_1 e_{n+1})|Ω_n) = 0
   In general, for an MA(1) process, E(y_{n+k}|Ω_n) = 0, ∀k ≥ 2.
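The recursive recovery of the shocks can be sketched as follows (a Python illustration with a made-up θ_1 and simulated data; the initial condition e_0 = 0 introduces an error that dies out geometrically when |θ_1| < 1):

```python
import numpy as np

rng = np.random.default_rng(2)
theta1 = 0.6  # hypothetical MA(1) coefficient
n = 500

# Simulate an MA(1) process y_t = e_t + theta1 * e_{t-1}
e = rng.normal(size=n + 1)       # e[0] plays the role of the true e_0
y = e[1:] + theta1 * e[:-1]      # y[t-1] corresponds to y_t, t = 1..n

# Recover the shocks recursively, starting from e_0 = 0:
# e_1 = y_1, then e_t = y_t - theta1 * e_{t-1}
e_hat = np.zeros(n)
e_hat[0] = y[0]
for t in range(1, n):
    e_hat[t] = y[t] - theta1 * e_hat[t - 1]

# Forecasts from period n: one step ahead uses the last recovered shock;
# two or more steps ahead revert to the unconditional mean, zero
forecast_1 = theta1 * e_hat[-1]
forecast_k = 0.0  # for any k >= 2
```

By period n the recovered shock e_hat[-1] is essentially identical to the true e_n, since the initial error shrinks by a factor θ_1 each period.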
Forecasting AR(1)

1. Consider an AR(1) process
   y_t = ϕ_1 y_{t−1} + e_t, (t = 1, 2, ..., n)
2. The one-step forecast is
   E(y_{n+1}|Ω_n) = E((ϕ_1 y_n + e_{n+1})|Ω_n) = ϕ_1 y_n
3. The two-step forecast is
   E(y_{n+2}|Ω_n) = E((ϕ_1 y_{n+1} + e_{n+2})|Ω_n) = ϕ_1 E(y_{n+1}|Ω_n) = ϕ_1^2 y_n
4. So we can obtain all other forecasts recursively (using an R for loop):
   E(y_{n+k}|Ω_n) = ϕ_1 E(y_{n+k−1}|Ω_n) = ϕ_1^2 E(y_{n+k−2}|Ω_n) = ... = ϕ_1^k y_n
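The slide suggests a for loop for the recursion; here is a sketch in Python (hypothetical values for ϕ_1 and y_n) confirming that the loop reproduces the closed form ϕ_1^k y_n:

```python
import numpy as np

phi1 = 0.8   # hypothetical AR(1) coefficient
y_n = 2.5    # hypothetical last observed value

# Recursive k-step forecasts: E(y_{n+k} | Omega_n) = phi1 * E(y_{n+k-1} | Omega_n)
forecasts = []
prev = y_n
for k in range(1, 11):
    prev = phi1 * prev
    forecasts.append(prev)

# The closed form phi1**k * y_n agrees with the recursion
closed_form = [phi1 ** k * y_n for k in range(1, 11)]
print(np.allclose(forecasts, closed_form))  # True
```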
Forecasting Error

1. Again consider an AR(1) process
   y_t = ϕ_1 y_{t−1} + e_t, (t = 1, 2, ..., n)
2. The one-step forecasting error is
   y_{n+1} − E(y_{n+1}|Ω_n) = y_{n+1} − ϕ_1 y_n = e_{n+1}
3. The two-step forecasting error is
   y_{n+2} − E(y_{n+2}|Ω_n) = e_{n+2} + ϕ_1 e_{n+1}
   Exercise: prove it!
4. In general, the k-step forecasting error is
   y_{n+k} − E(y_{n+k}|Ω_n) = e_{n+k} + ϕ_1 e_{n+k−1} + ϕ_1^2 e_{n+k−2} + ... + ϕ_1^{k−1} e_{n+1}
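A numerical check of the two-step error decomposition (a sketch with hypothetical parameter values; it does not replace the proof asked for in the exercise):

```python
import numpy as np

rng = np.random.default_rng(3)
phi1 = 0.7   # hypothetical AR(1) coefficient
y_n = 1.0    # hypothetical value at the forecast origin

# Simulate two future periods of the AR(1): y_t = phi1 * y_{t-1} + e_t
e_n1, e_n2 = rng.normal(size=2)
y_n1 = phi1 * y_n + e_n1
y_n2 = phi1 * y_n1 + e_n2

# Two-step forecast and its error
forecast_2 = phi1 ** 2 * y_n
error_2 = y_n2 - forecast_2

# The error matches e_{n+2} + phi1 * e_{n+1}
print(abs(error_2 - (e_n2 + phi1 * e_n1)) < 1e-12)  # True
```

Substituting the AR(1) recursion forward gives y_{n+2} = ϕ_1^2 y_n + ϕ_1 e_{n+1} + e_{n+2}, which is exactly what the check confirms.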
Limit of Forecast

1. For both MA(1) and AR(1), we see that as k → ∞, E(y_{n+k}|Ω_n) → 0, the unconditional mean. So another interpretation of the unconditional mean is that it is the limit of the long run forecast.
2. We can also show that the limit of the variance of the forecasting error is the unconditional variance (Exercise: prove it).
3. In terms of the 95% prediction interval, the limit is a band given by the unconditional mean plus or minus 1.96 times the unconditional standard deviation.
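Both limits can be seen numerically for the AR(1) case. The sketch below (hypothetical ϕ_1, σ^2, and y_n; the error-variance formula follows from the k-step error on the previous slide) shows the forecast shrinking to zero and the forecast-error variance approaching σ^2/(1 − ϕ_1^2):

```python
import numpy as np

phi1 = 0.8    # hypothetical AR(1) coefficient
sigma2 = 1.0  # hypothetical variance of the white noise e_t
y_n = 3.0     # hypothetical last observation

# k-step forecast and forecast-error variance for the AR(1)
k = 200
forecast_k = phi1 ** k * y_n
error_var_k = sigma2 * sum(phi1 ** (2 * j) for j in range(k))

# Long-run limits: unconditional mean 0, variance sigma2 / (1 - phi1^2)
uncond_var = sigma2 / (1 - phi1 ** 2)
print(abs(forecast_k) < 1e-10, abs(error_var_k - uncond_var) < 1e-10)

# Limiting 95% interval: 0 +/- 1.96 * unconditional standard deviation
band = 1.96 * np.sqrt(uncond_var)
```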
Forecasting Nonstationary Series

1. So far we have assumed the time series is stationary. For a nonstationary series, we need to take the difference first, then estimate ARMA models.
2. For example, for ARIMA(1,1,0), we first obtain the forecasts for the first difference:
   E(Δy_{n+1}|Ω_n) = ϕ_1 Δy_n (9)
   E(Δy_{n+2}|Ω_n) = ϕ_1 E(Δy_{n+1}|Ω_n) = ϕ_1^2 Δy_n (10)
   ... = ...
3. Then we obtain the forecasts for the levels:
   E(y_{n+1}|Ω_n) = y_n + E(Δy_{n+1}|Ω_n) (11)
   E(y_{n+2}|Ω_n) = E(y_{n+1}|Ω_n) + E(Δy_{n+2}|Ω_n) (12)
   ... = ...
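The two-stage procedure, forecast the differences, then cumulate them back to levels, can be sketched as follows (hypothetical ϕ_1, last level, and last difference):

```python
import numpy as np

phi1 = 0.5   # hypothetical AR coefficient for the differenced series
y_n = 10.0   # hypothetical last observed level
dy_n = 0.4   # hypothetical last observed first difference

# Stage 1: forecast the differences, E(dy_{n+k} | Omega_n) = phi1**k * dy_n
h = 5
dy_forecasts = [phi1 ** k * dy_n for k in range(1, h + 1)]

# Stage 2: cumulate the difference forecasts onto the last level
level_forecasts = y_n + np.cumsum(dy_forecasts)

# One-step level forecast equals y_n + phi1 * dy_n
print(abs(level_forecasts[0] - (y_n + phi1 * dy_n)) < 1e-12)  # True
```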
Forecasting Trending Series

1. We already know that a random walk with a drift term is trending:
   y_t = d + y_{t−1} + e_t
   or equivalently,
   Δy_t = d + e_t
2. More generally, for a trending series, the Box-Jenkins (BJ) methodology assumes a stochastic trend and considers, for instance, an ARIMA(p,1,q) with a drift term
   ϕ(L)Δy_t = d + θ(L)e_t
   where ϕ(L) is a p-th order polynomial in the lag (or backward shift) operator,
   ϕ(L) = 1 − ϕ_1 L − ϕ_2 L^2 − ... − ϕ_p L^p,
   and θ(L) is a q-th order polynomial in the lag operator.
Forecasting Seasonal Series

There are two ways to forecast a series with seasonality:

1. We can use dummy variables. For example, consider an AR(1) with three quarterly dummies
   y_t = β_0 + β_1 D_1 + β_2 D_2 + β_3 D_3 + ϕ_1 y_{t−1} + e_t
   where D_i equals one for the i-th quarter. The fourth quarter, the omitted one or base group, is captured by β_0; β_1 measures the difference between the first and fourth quarters, etc.
2. More parsimoniously, we can use a seasonal AR(1) model. For quarterly data, consider
   y_t = β_0 + γ_4 y_{t−4} + e_t
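The dummy construction in the first approach can be sketched as follows (a hypothetical twenty-quarter sample; only the dummy columns are built, the lagged y column is omitted for brevity):

```python
import numpy as np

# Hypothetical quarter label for each of 20 observations, cycling 1..4
quarters = np.array([1, 2, 3, 4] * 5)

# Three dummies, with the fourth quarter as the omitted base group
D1 = (quarters == 1).astype(float)
D2 = (quarters == 2).astype(float)
D3 = (quarters == 3).astype(float)

# Regressor matrix for y_t = b0 + b1*D1 + b2*D2 + b3*D3 + phi1*y_{t-1} + e_t
# (intercept plus the three dummies; the lagged y column would be appended)
X = np.column_stack([np.ones_like(D1), D1, D2, D3])

# In the fourth quarter all dummies are zero, so only the intercept is active
print(np.all(X[quarters == 4][:, 1:] == 0))  # True
```

Dropping one dummy avoids perfect collinearity with the intercept, which is why the fourth quarter serves as the base group.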
Seasonal ARIMA Model

For a series that shows both trending behavior and seasonality, we can consider the seasonal ARIMA model
ϕ(L)Δ_s y_t = d + θ(L)e_t
where Δ_s y_t ≡ y_t − y_{t−s} denotes the seasonal difference. We let s = 4 for quarterly data, and s = 12 for monthly data. Basically, the trend is captured by the drift term d, and the seasonality is accounted for by taking the seasonal difference. Once again, after estimating the model, we obtain forecasts using the conditional means.
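The effect of the seasonal difference can be seen on a toy series (hypothetical data with a linear trend plus a fixed quarterly pattern): Δ_s removes the seasonal pattern entirely, leaving only the drift.

```python
import numpy as np

s = 4  # quarterly data

# Hypothetical series: linear trend 0.5*t plus a repeating quarterly pattern
t = np.arange(40)
seasonal = np.tile([5.0, -2.0, 1.0, -4.0], 10)
y = 0.5 * t + seasonal

# Seasonal difference: Delta_s y_t = y_t - y_{t-s}
dy_s = y[s:] - y[:-s]

# The seasonal pattern cancels; what remains is the constant drift s * 0.5 = 2
print(np.allclose(dy_s, 2.0))  # True
```

In an estimated seasonal ARIMA, the constant left after differencing is what the drift term d picks up.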