Monitoring Forecasting Performance
Identifying when and why return prediction models work
Allan Timmermann and Yinchu Zhu
University of California, San Diego
June 21, 2015
Outline
Testing for time-varying forecasting performance
Exploiting time-varying forecasting performance
A simulation experiment
Empirical results
Conditional forecast comparison
1. At time $t$, we have two predictions, $\hat{y}_{t+1,1}$ (challenger) and $\hat{y}_{t+1,2}$ (benchmark), for the quantity $y_{t+1}$.
2. At time $t+1$, we compute the realized relative loss of the two predictions:
$$\Delta L_{t+1} = L(\hat{y}_{t+1,1}, y_{t+1}) - L(\hat{y}_{t+1,2}, y_{t+1}).$$
We ask the following questions:
Does $\Delta L_{t+1}$ depend on information $z_t$ observed at time $t$?
If so, can we exploit this?
Comparison of methodologies
Two models with parameters $\beta_1$ and $\beta_2$, with $(\beta_{1,\ast}, \beta_{2,\ast})$ the true parameter value and $(\hat{\beta}_{t,1}, \hat{\beta}_{t,2})$ the estimated value.
$$\Delta L_{t+1}(\beta_1, \beta_2) = L(\hat{y}_{t+1,1}(\beta_1), y_{t+1}) - L(\hat{y}_{t+1,2}(\beta_2), y_{t+1})$$
$H_0^{(1)}: E[\Delta L_{t+1}(\hat{\beta}_{t,1}, \hat{\beta}_{t,2})] = 0$. (Diebold and Mariano (1995))
$H_0^{(2)}: E[\Delta L_{t+1}(\beta_{1,\ast}, \beta_{2,\ast})] = 0$. (West (1996), Clark and McCracken (2001), etc.)
$H_0^{(3)}: E[\Delta L_{t+1}(\hat{\beta}_{t,1}, \hat{\beta}_{t,2}) \mid z_t] = 0$. (Giacomini and White (2006))
$H_0^{(4)}: E[\Delta L_{t+1}(\beta_{1,\ast}, \beta_{2,\ast}) \mid z_t] = 0$.
We are interested in $H_0^{(3)}$.
Testing for time-varying forecasting performance
$$H_0: E(\Delta L_{t+1} \mid z_t) = 0 \quad \text{almost surely},$$
where $\Delta L_{t+1} = \Delta L_{t+1}(\hat{\beta}_{t,1}, \hat{\beta}_{t,2})$, with $\hat{\beta}_{t,1}$ and $\hat{\beta}_{t,2}$ based on a rolling window.
Notice that $H_0$ is equivalent to
$$H_0: E(\Delta L_{t+1} h(z_t)) = 0 \quad \forall h.$$
Directing power to chosen directions: Giacomini and White (2006)
Directing power to all directions: Escanciano (2007)
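Giacomini and White (2006) approach
We first choose an $\mathbb{R}^{d_h}$-valued function $h$ and then test $H_{0,h}: E(\Delta L_{t+1} h(z_t)) = 0$.
$$J_{h,t} := T \left[ T^{-1} \sum_{t=1}^{T-1} \Delta L_{t+1} h(z_t) \right]' \hat{\Omega}_{h,t}^{-1} \left[ T^{-1} \sum_{t=1}^{T-1} \Delta L_{t+1} h(z_t) \right]$$
$$\hat{\Omega}_{h,t} = T^{-1} \sum_{t=1}^{T-1} (\Delta L_{t+1})^2 h(z_t) h(z_t)'$$
In other words, for a test of $H_0$ with nominal size $\alpha$, we compare $J_{h,t}$ with $\chi^2_{d_h, 1-\alpha}$, the $1-\alpha$ quantile of the chi-square distribution with $d_h$ degrees of freedom.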
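A minimal sketch of a Wald-type statistic of this form, assuming the particular choice $h(z) = (1, z)'$ so that $d_h = 2$. The function name `gw_test` and the normalization by the number of available pairs `n` are illustrative choices, not taken from the slides. With two degrees of freedom the chi-square survival function is $\exp(-J/2)$, so the p-value needs no statistics library.

```python
import math


def gw_test(delta_loss, z):
    """Wald-type test of E[dL_{t+1} * h(z_t)] = 0 with h(z) = (1, z)'.

    delta_loss[i] is the loss differential realized one period after z[i].
    Returns (J, p_value). Assumption: d_h = 2, so the chi-square(2)
    survival function exp(-J / 2) gives the p-value in closed form.
    """
    n = len(delta_loss)
    if n != len(z) or n < 3:
        raise ValueError("need two equally long series with at least 3 points")
    # Sample mean of dL * h(z): the 2-vector (g0, g1).
    g0 = sum(delta_loss) / n
    g1 = sum(d * x for d, x in zip(delta_loss, z)) / n
    # Omega-hat = mean of dL^2 * h(z) h(z)', a symmetric 2x2 matrix [[a, b], [b, c]].
    a = sum(d * d for d in delta_loss) / n
    b = sum(d * d * x for d, x in zip(delta_loss, z)) / n
    c = sum(d * d * x * x for d, x in zip(delta_loss, z)) / n
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("Omega-hat is singular")
    # J = n * g' Omega^{-1} g, using the closed-form inverse of a 2x2 matrix.
    j_stat = n * (c * g0 * g0 - 2.0 * b * g0 * g1 + a * g1 * g1) / det
    return j_stat, math.exp(-j_stat / 2.0)
```

For other choices of $h$ the same quadratic form applies, but the matrix inverse and the chi-square quantile would need a numerical library.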
Escanciano (2007) approach
1. Define $R_T(u) = T^{-1/2} \sum_{t=1}^{T-1} \Delta L_{t+1} w_u(z_t)$, with $w_u(z) = w(z, u)$ chosen such that
$$E(\Delta L_{t+1} w_u(z_t)) = 0 \ \forall u \iff E(\Delta L_{t+1} h(z_t)) = 0 \ \forall h,$$
e.g. $w(z, u) = 1\{z \le u\}$.
2. Compute $M_{w,t} = \int R_T(u)^2 \phi(u)\, du$, where $\phi(\cdot) > 0$ is some kernel, e.g. the pdf of $N(0, 1)$.
3. Simulate $M^{\ast}_{w,t} = \int R^{\ast}_T(u)^2 \phi(u)\, du$, where $R^{\ast}_T(u) = T^{-1/2} \sum_{t=1}^{T-1} V_t \Delta L_{t+1} w_u(z_t)$ and $V_t$ is iid with $E V_t = 0$ and $E V_t^2 = 1$.
4. In a test for $H_0$ of nominal size $\alpha$, compare $M_{w,t}$ with the $1 - \alpha$ quantile of $M^{\ast}_{w,t}$.
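The four steps can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the integral over $u$ is replaced by a Riemann sum on a fixed grid from -4 to 4 (so $z$ is assumed to be roughly standardized), the multipliers $V_t$ are Rademacher draws (mean 0, variance 1), and the name `escanciano_type_test` is hypothetical.

```python
import math
import random


def escanciano_type_test(delta_loss, z, n_boot=200, seed=0):
    """Cramer-von Mises-type statistic with w(z, u) = 1{z <= u}.

    Assumptions: grid approximation of the integral, N(0, 1) density as
    the weight, Rademacher multipliers. Returns (statistic, p_value).
    """
    n = len(delta_loss)
    step = 0.1
    grid = [-4.0 + step * k for k in range(81)]
    # Riemann weights: step size times the standard normal density at u.
    weights = [step * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
               for u in grid]
    # Indicators w_u(z_t), computed once and reused in every bootstrap draw.
    ind = [[1.0 if x <= u else 0.0 for x in z] for u in grid]

    def cvm(series):
        total = 0.0
        for row, wt in zip(ind, weights):
            r = sum(s * i for s, i in zip(series, row)) / math.sqrt(n)
            total += r * r * wt
        return total

    stat = cvm(delta_loss)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        # Multiply each differential by an independent +1 / -1 draw.
        perturbed = [d * rng.choice((-1.0, 1.0)) for d in delta_loss]
        if cvm(perturbed) >= stat:
            exceed += 1
    # Share of simulated statistics at least as large as the observed one.
    return stat, exceed / n_boot
```

Rejecting when the returned p-value is below $\alpha$ corresponds to comparing the statistic with the simulated $1 - \alpha$ quantile in step 4.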
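Exploiting relative forecasting performance
If $H_0$ is rejected, we can consider the following simple method:
$$\Delta L_{t+1} = \gamma_0 + \gamma_1 z_t + \xi_{t+1}$$
1. At time $t$, we run the above regression using OLS with a rolling window.
2. At time $t$, our new prediction for $y_{t+1}$ is
$$\hat{y}_{t+1,sw} = \hat{y}_{t+1,1}\, 1\{\hat{\gamma}_{0,t} + \hat{\gamma}_{1,t} z_t > 0\} + \hat{y}_{t+1,2}\, 1\{\hat{\gamma}_{0,t} + \hat{\gamma}_{1,t} z_t \le 0\}.$$
Here positive values of $\Delta L_{t+1}$ are taken to favor the challenger (benchmark loss minus challenger loss, as in the MSE definition used in the empirical results).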
Exploiting relative forecasting performance
If $E(\Delta L_{t+1}) = 0$ but $E(\Delta L_{t+1} \mid z_t) \neq 0$, then the challenger model is sometimes better and sometimes worse than the benchmark, depending on $z_t$.
In that case the switching rule would do better than always using either the challenger model or the benchmark model.
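The switching rule amounts to a rolling regression followed by a sign check. The sketch below assumes that the differential is signed so that positive values favor the challenger, and that at most the last `window` pairs enter the regression; the helper names `ols_line` and `switching_forecast` are illustrative.

```python
def ols_line(x, y):
    """Intercept and slope from regressing y on a constant and x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        # No variation in x: fall back to the mean of y with zero slope.
        return my, 0.0
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def switching_forecast(delta_loss, z, z_now, challenger, benchmark, window):
    """Return the challenger forecast when the fitted differential is positive.

    delta_loss[i] is the differential realized one period after z[i],
    signed so that positive values favor the challenger (an assumption).
    """
    g0, g1 = ols_line(z[-window:], delta_loss[-window:])
    return challenger if g0 + g1 * z_now > 0.0 else benchmark
```

For example, `switching_forecast([-1.0, 0.0, 1.0, 2.0], [-1.0, 0.0, 1.0, 2.0], 0.5, 0.03, 0.01, 4)` selects the challenger forecast, because the fitted line has zero intercept and unit slope, so the predicted differential at `z_now = 0.5` is positive.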
Why should this work: an experiment
Why don't we simply include $z_t$ in the model? One answer is that it depends on the tradeoff between specification error and estimation error.
Consider the following data generating process:
$$y_{t+1} = \alpha + \beta_{s_t} x_t + \sigma_{s_t} \varepsilon_{t+1} \quad \text{and} \quad z_t = m_{s_t} + \sigma_u u_t,$$
where $x_t$, $\varepsilon_t$, $u_t$ are iid $N(0, 1)$ and $s_t \in \{1, 2\}$ is iid Bernoulli with $P(s_t = 1) = p$.
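A minimal sketch of how one sample from this two-state process could be drawn. The function name `simulate_dgp` and the dictionary-based parameter layout are illustrative, and any parameter values passed in are the caller's choice; the calibrated values used in the experiment are not reproduced here.

```python
import random


def simulate_dgp(n, alpha, beta, sigma, m, sigma_u, p, seed=0):
    """Draw (x_t, z_t, y_{t+1}) for n periods from the two-state process.

    beta, sigma and m are dicts keyed by the state, 1 or 2. The state is
    drawn independently each period with P(s_t = 1) = p.
    """
    rng = random.Random(seed)
    x, z, y_next = [], [], []
    for _ in range(n):
        s = 1 if rng.random() < p else 2
        x_t = rng.gauss(0.0, 1.0)
        x.append(x_t)
        # z_t is a noisy signal of the state through its mean m[s].
        z.append(m[s] + sigma_u * rng.gauss(0.0, 1.0))
        # The slope and the noise scale of y_{t+1} both depend on the state.
        y_next.append(alpha + beta[s] * x_t + sigma[s] * rng.gauss(0.0, 1.0))
    return x, z, y_next
```

The signal $z_t$ never enters the equation for $y_{t+1}$ directly; it is informative only through the unobserved state, which is why adding it linearly to the regression need not help.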
Why should this work: an experiment
We compare the prevailing mean forecast $\hat{y}_{t+1,0}$ (benchmark) with the following:
$\hat{y}_{t+1,1}$: univariate model $y_{t+1} = \alpha + \beta x_t + \varepsilon_{t+1}$, estimated by OLS
$\hat{y}_{t+1,2}$: bivariate model $y_{t+1} = \alpha + \beta x_t + \gamma z_t + \varepsilon_{t+1}$, estimated by OLS
$\hat{y}_{t+1,3}$: true model, estimated by MLE
$\hat{y}_{t+1,4}$: univariate model $y_{t+1} = \alpha + \beta x_t + \varepsilon_{t+1}$ with the switching rule described before
Let
$$\Delta L^{(i)}_{t+1} = (y_{t+1} - \hat{y}_{t+1,0})^2 - (y_{t+1} - \hat{y}_{t+1,i})^2$$
and compute, by simulation, $E\,\Delta L^{(i)}_{t+1}$.
Why should this work: an experiment
The parameters in the true data generating process are set to match the real data.
All regressions use a rolling window of length 240.
We simulate 2.7 million random samples.

$E\,\Delta L^{(1)}_{t+1}$   $E\,\Delta L^{(2)}_{t+1}$   $E\,\Delta L^{(3)}_{t+1}$   $E\,\Delta L^{(4)}_{t+1}$
0.0780                      0.0020                      0.1028                      0.1037
Empirical results: motivation
Goyal and Welch (2008): no univariate prediction model seems to outperform the prevailing mean model out of sample.
Paye and Timmermann (2006), Rapach and Wohar (2006), Goyal and Welch (2008), Rapach, Strauss and Zhou (2010): there are breaks in model parameters; predictability varies with the economic cycle.
Henkel, Martin and Nardari (2010), Dangl and Halling (2012), Johannes, Korteweg and Polson (2014): models with regime switching or time-varying coefficients perform better.
Empirical results: data description
We consider the dataset in Goyal and Welch (2008). The goal is to forecast the S&P 500 monthly return $r_{t+1}$.
There are 14 predictors, including
financial variables: dp (dividend-price ratio), lnv (log realized volatility), etc.
macro variables: inflation, tbl (T-bill rate)
We add more macro variables: UG (unemployment gap), GDP, and Cash (firms' cash holdings).
Empirical results: MSE as loss function
Model 1: fit 14 univariate models and use the average of these 14 forecasts as $\hat{r}_{t+1,1}$
Model 2: use the forecast of the prevailing mean model as $\hat{r}_{t+1,2}$
We look at the p-values of the tests for model instability and the t-stats for $E\,\Delta L^{MSE}_{t+1}$ and $E\,\Delta L^{MSE}_{t+1,sw}$, where
$$\Delta L^{MSE}_{t+1} = (r_{t+1} - \hat{r}_{t+1,2})^2 - (r_{t+1} - \hat{r}_{t+1,1})^2$$
$$\Delta L^{MSE}_{t+1,sw} = (r_{t+1} - \hat{r}_{t+1,2})^2 - (r_{t+1} - \hat{r}_{t+1,sw})^2$$
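The quantities on this slide can be sketched in a few lines. The slides do not say how the t-statistics are computed, so the plain iid standard error below is an assumption (a serial-correlation-robust variance would be a natural alternative), and the function names are illustrative.

```python
import math


def equal_weight_average(forecasts_by_model):
    """Average the forecasts of several models, period by period."""
    k = len(forecasts_by_model)
    return [sum(col) / k for col in zip(*forecasts_by_model)]


def mse_differential(r, f_benchmark, f_other):
    """Benchmark squared error minus the other forecast's squared error."""
    return [(a - b) ** 2 - (a - c) ** 2
            for a, b, c in zip(r, f_benchmark, f_other)]


def t_stat_of_mean(d):
    """t-statistic for E[d] = 0, assuming an iid standard error."""
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0.0:
        raise ValueError("differentials have zero sample variance")
    return mean / math.sqrt(var / n)
```

Passing the averaged forecast as `f_other` gives the first differential; passing the switching forecast gives the second. A positive t-statistic then indicates lower squared error than the prevailing mean.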
Empirical results: MSE as loss function

Z             $J_{h,t}$ (p-val)   $M_{w,t}$ (p-val)   $E\,\Delta L^{MSE}_{t+1}$ (t-stat)   $E\,\Delta L^{MSE}_{t+1,sw}$ (t-stat)
X             0.02**              0.23                1.82*                                1.97**
$X^2$         0.08*               0.04**                                                   1.20
$\ln V$       0.02**              0.00***                                                  1.42
UG            0.01**              0.01***                                                  1.48
GDP           0.04**              0.25                                                     1.41
Cash          0.08*               0.04**                                                   1.28
UG, X         0.02**              0.06*                                                    1.82*
UG, $X^2$     0.03**              0.07*                                                    1.36
UG, $\ln V$   0.02**              0.08*                                                    1.80*
UG, GDP       0.03**              0.07*                                                    1.45
UG, Cash      0.03**              0.07*                                                    1.82*

The t-statistic for $E\,\Delta L^{MSE}_{t+1}$ does not involve $Z$ and is reported once, in the first row.
Empirical results: MSE as loss function
[Figure: MSE switching rule of the 14-variable GW model average with Z = X; horizontal axis: time, 1970 to 2010.]
Red line is cumulated $\Delta L^{MSE}_{t+1,sw}$; the blue line is cumulated $\Delta L^{MSE}_{t+1}$.
Empirical results: utility as loss function
The data and models are exactly the same as before, but we change the loss function.
$$\Delta L^{utility}_{t+1} = U(r_{t+1}, \hat{r}_{t+1,1}) - U(r_{t+1}, \hat{r}_{t+1,2})$$
$$\Delta L^{utility}_{t+1,sw} = U(r_{t+1}, \hat{r}_{t+1,sw}) - U(r_{t+1}, \hat{r}_{t+1,2})$$
where
$$U(r, m) = r\, w(m) - \frac{\gamma}{2} r^2 w^2(m),$$
$\gamma$ measures the risk aversion, and $w(m)$ is the portfolio weight on the risky asset assuming that its conditional mean is $m$.
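A small sketch of the utility-based differential. The slides do not give the form of $w(m)$; the mean-variance weight $w(m) = m / (\gamma \sigma^2)$ used below, with a fixed return variance `sigma2`, is an assumption made for illustration, as are the function names.

```python
def portfolio_weight(m, gamma, sigma2):
    """Assumed mean-variance weight on the risky asset: m / (gamma * sigma2)."""
    return m / (gamma * sigma2)


def utility(r, m, gamma, sigma2):
    """Realized quadratic utility U(r, m) = r*w(m) - (gamma/2) * r^2 * w(m)^2."""
    w = portfolio_weight(m, gamma, sigma2)
    return r * w - 0.5 * gamma * r * r * w * w


def utility_differential(r, m_challenger, m_benchmark, gamma, sigma2):
    """Utility under the challenger forecast minus utility under the benchmark."""
    return utility(r, m_challenger, gamma, sigma2) - utility(r, m_benchmark, gamma, sigma2)
```

With `r = 0.1`, `m = 0.02`, `gamma = 2` and `sigma2 = 0.04`, the assumed weight is 0.25 and the realized utility is 0.025 - 0.000625 = 0.024375.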
Empirical results: utility as loss function

Z             $J_{h,t}$ (p-val)   $M_{w,t}$ (p-val)   $E\,\Delta L^{utility}_{t+1}$ (t-stat)   $E\,\Delta L^{utility}_{t+1,sw}$ (t-stat)
X             0.14                0.06*               1.80*                                    1.85*
$X^2$         0.17                0.01**                                                       1.67*
$\ln V$       0.19                0.07*                                                        1.58
UG            0.02**              0.00***                                                      2.43**
GDP           0.19                0.04**                                                       1.79*
Cash          0.03**              0.00***                                                      1.89*
UG, X         0.04**              0.01**                                                       2.29**
UG, $X^2$     0.04**              0.02**                                                       2.40**
UG, $\ln V$   0.02**              0.02**                                                       2.26**
UG, GDP       0.04**              0.02**                                                       2.12**
UG, Cash      0.03**              0.02**                                                       2.55**

The t-statistic for $E\,\Delta L^{utility}_{t+1}$ does not involve $Z$ and is reported once, in the first row.
Empirical results: utility as loss function
[Figure: quadratic utility switching rule of the 14-variable GW model average with Z = UG; horizontal axis: time, 1970 to 2010.]
Red line is cumulated $\Delta L^{utility}_{t+1,sw}$; the blue line is cumulated $\Delta L^{utility}_{t+1}$.