Forecasting 1: Comparing Forecasting Models


1 Forecasting 1: Comparing Forecasting Models. Peter Reinhard Hansen, European University Institute, February 25-26, 2013.

2 Summary: Forecast Comparison Tests. Testing for Equal Predictive Ability (EPA): comparing two (or more) competing forecasts; question: are the forecasts equally good? Testing for Superior Predictive Ability (SPA): multiple competing forecasts, one being a benchmark; question: does any alternative beat the benchmark? Model Confidence Set (MCS): multiple competing forecasts; question: which alternatives might be the best? Discard inferior forecasts.

3 Notation. $Y_t$ is the time series of interest, $t = 1, \dots, T$. $\hat{Y}_{T+h,T}$ is the forecast of $Y_{T+h}$ made at time $T$. Example (regression-based forecast): $\hat{Y}_{T+h,T} = \hat{\beta}_T' X_T$, where $X_T$ is a vector of predictors and $\hat{\beta}_T = \big(\sum_{t=1}^{T-h} X_t X_t'\big)^{-1} \sum_{t=1}^{T-h} X_t Y_{t+h}$.

4 Out-of-Sample Forecast Evaluation. Out-of-sample period: $t = n+1, \dots, n+m = T$. Forecasts are evaluated with your favorite loss function $L(Y_t, \hat{Y}_{t,t-h})$, $t = n+1, \dots, n+m$. Example (mean squared prediction error): $L(Y_t, \hat{Y}_{t,t-h}) = (Y_t - \hat{Y}_{t,t-h})^2$. Now we can compare forecasts in terms of their out-of-sample performance, $\frac{1}{m}\sum_{t=n+1}^{n+m}(Y_t - \hat{Y}_{t,t-h})^2$.
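A minimal sketch (not from the slides) of computing this out-of-sample average loss for two competing forecasts; the arrays y, yhat1, and yhat2 are hypothetical stand-ins for the realized values and the forecasts over t = n+1, ..., n+m.

```python
import numpy as np

def mspe(y, yhat):
    """Average squared prediction error over the evaluation sample."""
    e = np.asarray(y) - np.asarray(yhat)
    return np.mean(e ** 2)

# hypothetical evaluation-sample data
rng = np.random.default_rng(0)
y = rng.normal(size=200)
yhat1 = 0.5 * y + rng.normal(scale=0.8, size=200)   # forecast 1
yhat2 = rng.normal(scale=1.0, size=200)             # forecast 2

print(mspe(y, yhat1), mspe(y, yhat2))
```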

5 Diebold-Mariano Test for Equal Predictive Ability

7 DM: Comparing Two Forecasts. Forecast 1, $\hat{Y}^{(1)}_{t,t-h}$, is compared with Forecast 2, $\hat{Y}^{(2)}_{t,t-h}$, via $\frac{1}{m}\sum_{t=n+1}^{n+m}\big[(Y_t - \hat{Y}^{(1)}_{t,t-h})^2 - (Y_t - \hat{Y}^{(2)}_{t,t-h})^2\big]$. A value below zero means Forecast 1 had the better out-of-sample performance. Significantly better? Define the expected loss differential $\mu = E[d_t]$, where $d_t = (Y_t - \hat{Y}^{(1)}_{t,t-h})^2 - (Y_t - \hat{Y}^{(2)}_{t,t-h})^2$. Now test $H_0: \mu = 0$, a test for Equal Predictive Ability. How? Use $\frac{\bar{d} - 0}{SE(\bar{d})}$.

8 DM: General Framework. The general framework in DM: $d_t = L(Y_t, \hat{Y}^{(1)}_{t,t-h}) - L(Y_t, \hat{Y}^{(2)}_{t,t-h})$. Assumptions are imposed so that $\{d_t\}$ is stationary (ergodic) with short memory, $\bar{d} = \frac{1}{m}\sum_{t=n+1}^{n+m} d_t \xrightarrow{p} E(d_t)$, and $\sqrt{m}(\bar{d} - E[d_t]) \xrightarrow{d} N(0, \sigma^2_d)$.

9 The Diebold-Mariano Test. Implementing the DM test requires an estimate of $\sigma^2_d$. DM propose $\hat{\sigma}^2_d = \sum_{j=-(h-1)}^{h-1} \hat{\gamma}_j$, where $\hat{\gamma}_j = \frac{1}{m}\sum_{t=n+1+j}^{n+m}(d_t - \bar{d})(d_{t-j} - \bar{d})$. Then $\hat{\sigma}^2_d \xrightarrow{p} \sigma^2_d$ if $d_t$ is at most $(h-1)$-dependent. However, $\hat{\sigma}^2_d$ is not guaranteed to be positive.
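A minimal sketch of the DM test as just described, assuming squared-error loss and an h-step forecast horizon; the truncated variance estimator mirrors the slide's formula and, as noted, can come out negative.

```python
import numpy as np
from scipy import stats

def dm_test(y, yhat1, yhat2, h=1):
    """Diebold-Mariano test sketch: squared-error loss, truncated variance."""
    d = (np.asarray(y) - np.asarray(yhat1))**2 - (np.asarray(y) - np.asarray(yhat2))**2
    m = d.size
    dbar = d.mean()
    # autocovariances up to lag h-1, each normalized by m as on the slide
    gammas = [np.sum((d[j:] - dbar) * (d[:m - j] - dbar)) / m for j in range(h)]
    sigma2 = gammas[0] + 2 * sum(gammas[1:])   # may be negative for h > 1
    dm = dbar / np.sqrt(sigma2 / m)
    return dm, 2 * (1 - stats.norm.cdf(abs(dm)))
```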

10 A Comment on HAC Estimation. Two common misunderstandings about Heteroskedasticity and Autocorrelation Consistent (HAC) estimation. 1) "If $\gamma_j = 0$ for $|j| > H$, then one can use the Newey-West estimator (the HAC estimator with Bartlett kernel), $\sum_{j=-H}^{H}\big(1 - \frac{|j|}{H+1}\big)\hat{\gamma}_j$." No! It is not consistent for $\sum_{j=-H}^{H}\gamma_j$. That is why $H \to \infty$ as the sample size goes to infinity.

11 A Comment on HAC Estimation. 2) "If $\gamma_j = 0$ for $|j| > H$, then one can use $\sum_{j=-H}^{H}\hat{\gamma}_j$; there is no point estimating $\gamma_j$ for $|j| > H$ because these are known to be zero." Wrong! Often there is a linear combination of the noisy zeros, $\sum_{|j|>H} a_j\hat{\gamma}_j$, that is negatively correlated with $\sum_{|j|\le H}\hat{\gamma}_j$, so that $\mathrm{var}\big(\sum_{|j|\le H}\hat{\gamma}_j + \sum_{|j|>H} a_j\hat{\gamma}_j\big) = \mathrm{var}\big(\sum_{|j|\le H}\hat{\gamma}_j\big) + \mathrm{var}\big(\sum_{|j|>H} a_j\hat{\gamma}_j\big) + 2\,\mathrm{cov}\big(\sum_{|j|\le H}\hat{\gamma}_j, \sum_{|j|>H} a_j\hat{\gamma}_j\big) < \mathrm{var}\big(\sum_{|j|\le H}\hat{\gamma}_j\big)$.
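To make the two estimators being contrasted concrete, here is a small illustration of my own (not from the slides): the plain truncated sum of autocovariances versus the Newey-West (Bartlett-kernel) estimator, both with bandwidth H.

```python
import numpy as np

def autocovariances(x, max_lag):
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    return np.array([np.sum((x[j:] - xbar) * (x[:n - j] - xbar)) / n
                     for j in range(max_lag + 1)])

def lrv_truncated(x, H):
    """Plain truncated sum of autocovariances up to lag H."""
    g = autocovariances(x, H)
    return g[0] + 2 * g[1:].sum()

def lrv_newey_west(x, H):
    """Newey-West (Bartlett kernel) estimator with bandwidth H."""
    g = autocovariances(x, H)
    w = 1 - np.arange(1, H + 1) / (H + 1)
    return g[0] + 2 * (w * g[1:]).sum()
```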

12 Back to Diebold-Mariano. We have $\bar{d} \xrightarrow{p} E(d_t)$, $\sqrt{m}(\bar{d} - E[d_t]) \xrightarrow{d} N(0, \sigma^2_d)$, and $\hat{\sigma}^2_d \xrightarrow{p} \sigma^2_d$. So if $E(d_t) = 0$, then $DM = \frac{\bar{d}}{\sqrt{\hat{\sigma}^2_d/m}} \xrightarrow{d} N(0, 1)$.

13 Limitation of the DM Test. The result that $DM = \bar{d}/\sqrt{\hat{\sigma}^2_d/m} \xrightarrow{d} N(0,1)$ is largely driven by the assumed stationarity of $\{d_t\}$. This stationarity assumption is not compatible with recursively estimated parameters. Recall that in the regression model the variance of the prediction error had two sources, $X_n' \hat{\Sigma}_{\hat{\beta}_n} X_n + \hat{\sigma}^2_u(X_n)$, where $\hat{\Sigma}_{\hat{\beta}_n} = O_p(1/n)$. With recursively estimated parameters we should therefore expect $\mathrm{var}(L(Y_t, \hat{Y}_{t,t-h}))$ to be decreasing in $t$.

14 West's Test for Equal Predictive Ability

16 West's Framework. Unlike the Diebold-Mariano framework, parameter uncertainty matters in West's framework... to some extent. Let the $i$-th forecast be given by $\hat{Y}^{(i)}_{t+h,t} = \hat{Y}^{(i)}_{t+h,t}(\hat{\theta}^{(i)}_t)$, where $\hat{\theta}^{(i)}_t$ are parameters that have been estimated, with $\hat{Y}^{(i)}_{t+h,t} \in \mathcal{F}_t$. General loss functions, $L(Y_{t+h}, \hat{Y}^{(i)}_{t+h,t})$, are allowed.

17 West's Framework. To analyze the effects of parameter estimation, define $l_{i,t}(\theta) = L(Y_{t+h}, \hat{Y}^{(i)}_{t+h,t}(\theta_{(i)}))$ and stack the competing forecasts: $l_t(\theta) = \big(L(Y_{t+h}, \hat{Y}^{(1)}_{t+h,t}(\theta_{(1)})), \dots, L(Y_{t+h}, \hat{Y}^{(M)}_{t+h,t}(\theta_{(M)}))\big)'$.

18 West's Framework. Define $\hat{\theta}_t = (\hat{\theta}'_{(1),t}, \dots, \hat{\theta}'_{(M),t})'$ and $\theta^* := \mathrm{plim}_{t\to\infty}\hat{\theta}_t$. The null hypothesis of interest in West (1996) is $E[l_t(\theta^*)] = (c, \dots, c)'$: all forecasts are equally good... once their estimates have converged to their probability limits.

19 The Asymptotic Analysis. Assume smoothness: $l_t(\theta) = l_t(\theta^*) + \dot{l}_t(\theta^*)(\theta - \theta^*) + \text{remainder}$. Assume $\hat{\theta}_t - \theta^* = A_t \bar{S}_t$, where $A_t \xrightarrow{p} A$ and $\bar{S}_t = \frac{1}{t}\sum_{\tau=1}^{t} s_\tau(\theta^*)$ with $E[s_\tau(\theta^*)] = 0$, so that $\sqrt{t}(\bar{S}_t - 0) \xrightarrow{d} N(0, \Sigma_{ss})$.

20 The Asymptotic Analysis. Out-of-sample evaluation: $\bar{l} = \frac{1}{m}\sum_{t=n+1}^{n+m} l_t(\hat{\theta}_t)$. Then $\sqrt{m}\big[\bar{l} - E l_t(\theta^*)\big] = m^{-1/2}\sum_{t=n+1}^{n+m}\big[l_t(\theta^*) - E l_t(\theta^*)\big] + m^{-1/2}\sum_{t=n+1}^{n+m}\dot{l}_t(\theta^*)(\hat{\theta}_t - \theta^*) + \dots$, where the second term is $m^{-1/2}\sum_{t=n+1}^{n+m}\dot{l}_t(\theta^*)(\hat{\theta}_t - \theta^*) = m^{-1/2}\sum_{t=n+1}^{n+m}\underbrace{E[\dot{l}_t(\theta^*)]}_{=F}A_t\bar{S}_t(\theta^*) + \text{small} = m^{-1/2}FA\sum_{t=n+1}^{n+m}\bar{S}_t(\theta^*) + \text{small}$.

21 The Asymptotic Analysis. Joint convergence: $\Big(m^{-1/2}\sum_{t=n+1}^{n+m}\big[l_t(\theta^*) - E l_t(\theta^*)\big],\; m^{-1/2}A\sum_{t=n+1}^{n+m}\bar{S}_t(\theta^*)\Big)' \xrightarrow{d} N(0, \Sigma)$, with $\Sigma = \begin{pmatrix}\Sigma_{ll} & \Pi\Sigma_{ls}A' \\ \Pi A\Sigma_{sl} & 2\Pi A\Sigma_{ss}A'\end{pmatrix}$, where $\Pi = \frac{\pi - \log(1+\pi)}{\pi}$ with $\pi = \lim_{T\to\infty}\frac{m}{n}$.

22 The Asymptotic Analysis. Combining, $\sqrt{m}\Big[\underbrace{\tfrac{1}{m}\sum_{t=n+1}^{n+m} l_t(\hat{\theta}_t)}_{=\bar{l}} - E l_t(\theta^*)\Big] = m^{-1/2}\sum_{t=n+1}^{n+m}\big[l_t(\theta^*) - E l_t(\theta^*)\big] + m^{-1/2}FA\sum_{t=n+1}^{n+m}\bar{S}_t(\theta^*) + \dots$, so that $\sqrt{m}\big[\bar{l} - E l_t(\theta^*)\big] \xrightarrow{d} N\big(0,\; \Sigma_{ll} + \underbrace{\Pi(2FA\Sigma_{ss}A'F' + \Sigma_{ls}A'F' + FA\Sigma_{sl})}_{\text{parameter estimation}}\big)$.

23 Discussion. Parameter estimation contributes to the asymptotic variance unless $\Pi(2FA\Sigma_{ss}A'F' + \Sigma_{ls}A'F' + FA\Sigma_{sl}) = 0$. This happens if $\lim_{T\to\infty}\frac{m}{n} = 0$ (so that $\Pi = 0$), or if $F = E[\dot{l}_t(\theta^*)] = 0$. Note that $F \neq 0$ means that $\theta^* = \mathrm{plim}_{t\to\infty}\hat{\theta}_t$ does not minimize the expected loss!

24 Discussion. Is $E[l_t(\theta^*)] = (c, \dots, c)'$ the appropriate null hypothesis? It says that all forecasts are equally good... once we have estimated all parameters without error, so there is no penalty for model complexity. Testing $E[l_t(\hat{\theta}_t)] = (c, \dots, c)'$ seems more appealing from a practical viewpoint.

25 Testing for SPA (Superior Predictive Ability)

26 Testing for Superior Predictive Ability. Testing for equal predictive ability: $H_0: E(L_{i,t} - L_{j,t}) = 0$, i.e., forecast $i$ and forecast $j$ are equally good. Testing for superior predictive ability: $H_0: E(L_{i,t} - L_{j,t}) \le 0$, i.e., forecast $i$ is better than (or as good as) forecast $j$. The former is a simple hypothesis, the latter is a composite hypothesis.

27 White (2000) Reality Check for Data Snooping

29 Setup. Let $L_{i,t}$ be the loss of the $i$-th forecast at time $t$, for $i = 0, 1, \dots, M$, where $i = 0$ is the benchmark model. Loss differentials: $d_{i,t} = L_{0,t} - L_{i,t}$, $i = 1, \dots, M$. Assume stationarity and define $\mu_i = E(d_{i,t})$. Hypothesis of interest ("benchmark is best"): $\mu_1 \le 0, \dots, \mu_M \le 0$.

30 Asymptotic Analysis. Hypothesis of interest ("benchmark is best"): with $d_t = (d_{1,t}, \dots, d_{M,t})'$, $H_0: \mu = E(d_t) \le 0 \in \mathbb{R}^M$. Assume $n^{1/2}(\bar{d} - \mu) \xrightarrow{d} N(0, \Omega)$. Test statistic: $T_{\max} = \max_{1\le i\le M}\bar{d}_i$.

31 Asymptotic Analysis. Distribution of the test statistic $T_{\max} = \max_{1\le i\le M} n^{1/2}\bar{d}_i$ under the null hypothesis $H_0: \mu \le 0$? E.g., based on the asymptotic result $n^{1/2}(\bar{d} - \mu) \xrightarrow{d} N(0, \Omega)$. Problem: $\mu$ is not unique under the null. The RC follows convention and uses the least favorable configuration (LFC): $\mu = 0$.

32 Asymptotic Analysis. From $n^{1/2}(\bar{d} - \mu) \xrightarrow{d} N(0, \Omega)$, under the LFC $\mu = 0$ we have $n^{1/2}\bar{d} \stackrel{A}{\sim} N(n^{1/2}\mu, \Omega)$. White proposes two ways to derive critical values for $T_{\max} = \max_{1\le i\le M} n^{1/2}\bar{d}_i$.

33 Asymptotic Analysis. Critical values for $T_{\max} = \max_{1\le i\le M} n^{1/2}\bar{d}_i$, where $n^{1/2}\bar{d} \stackrel{A}{\sim} N(n^{1/2}\mu, \Omega)$. 1) Simulation: draw $X \sim N(0, \hat{\Omega})$ where $\hat{\Omega} \xrightarrow{p} \Omega$ and compute $\max_i X_i$; given enough simulations we obtain a good estimate of the distribution of $\max_i X_i$. 2) Bootstrap: resample $d^*_t - \bar{d}$ (this pretends $\mu = 0$) and compute $\bar{d}^*(b)$ for $b = 1, \dots, B$; then $\max_i n^{1/2}\bar{d}^*_i(b)$ produces a bootstrap estimate of the $T_{\max}$-distribution (when $\mu = 0$).
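A minimal sketch of the bootstrap route (option 2), not White's original implementation: it uses a simple moving-block bootstrap, and the loss matrix losses (benchmark in column 0) and the block length are hypothetical choices.

```python
import numpy as np

def reality_check(losses, B=1000, block=4, seed=0):
    """Bootstrap Reality Check p-value. losses: (n, M+1) array, column 0 = benchmark."""
    rng = np.random.default_rng(seed)
    d = losses[:, [0]] - losses[:, 1:]            # d_{i,t} = L_{0,t} - L_{i,t}
    n, M = d.shape
    dbar = d.mean(axis=0)
    t_max = np.sqrt(n) * dbar.max()
    t_boot = np.empty(B)
    for b in range(B):
        starts = rng.integers(0, n, size=int(np.ceil(n / block)))
        idx = (starts[:, None] + np.arange(block)).ravel()[:n] % n   # moving blocks
        dstar = d[idx].mean(axis=0)
        t_boot[b] = np.sqrt(n) * (dstar - dbar).max()                # centered at the LFC mu = 0
    return t_max, np.mean(t_boot >= t_max)
```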

34 Least Favorable Configuration. Why does the LFC use $\mu = 0$? Denote the distribution of $T_{\max}$, as a function of $\mu$, by $F_\mu(t)$ (it will also depend on other things: $\Omega$, the sample size $n$, etc.). Determine $c_\alpha = c^\mu_\alpha$ so that $F_\mu(c^\mu_\alpha) = 1 - \alpha$. So $c^\mu_\alpha$ can be interpreted as a critical value: $P_\mu(T_{\max} > c^\mu_\alpha) = \alpha$.

35 LFC Critical Value. The critical value $c^\mu_\alpha$ depends on $\mu$, which is unknown. How do we pick a critical value $c_\alpha$ so that $P_\mu(T_{\max} > c_\alpha) \le \alpha$ whenever the null hypothesis is true? $H_0$ can be true in many ways, $\mu \le 0$. With the LFC critical value: pick $c_\alpha$ so that $\sup_{\mu\le 0} P_\mu(T_{\max} > c_\alpha) \le \alpha$, i.e., LFC: $c_\alpha = \sup_{\mu\le 0} c^\mu_\alpha$. Conservative when $\mu \neq 0$.

36 Hansen (2005) Testing for Superior Predictive Ability

38 2. WHAT IS A TEST FOR SPA? Variable of interest: $Y_t$, $t = 1, \dots, n$. Benchmark forecast: $\hat{Y}_{0,t}$, with loss $L_{0,t} = L(Y_t, \hat{Y}_{0,t})$. Alternative forecast $k = 1$: $\hat{Y}_{1,t}$, $L_{1,t} = L(Y_t, \hat{Y}_{1,t})$. ... Alternative forecast $k = m$: $\hat{Y}_{m,t}$, $L_{m,t} = L(Y_t, \hat{Y}_{m,t})$.

39 3. TEST OF EPA AND SPA. Define the relative performance variables (contrasts) $X_{k,t} = L_{0,t} - L_{k,t}$, $k = 1, \dots, m$, $t = 1, \dots, n$, and $\mu_k = E(X_{k,t})$, $k = 1, \dots, m$. $\mu_k > 0$ ($\mu_k < 0$) means that the $k$-th forecast is better (worse) than the benchmark. Test for Equal Predictive Ability: $H_0: \mu = 0$ (an $m\times 1$ vector of zeros); Diebold & Mariano (JBES, 1995) and West (EMA, 1996). Test for Superior Predictive Ability: $H_0: \mu_k \le 0$, $k = 1, \dots, m$; White, "The Reality Check for Data Snooping" (EMA, 2000).

40 4. MOTIVATION FOR A TEST FOR SPA. Forecasters: a forecaster, currently using a particular forecasting model/method/rule, wants to know whether a better forecast is available. Testing economic theory: economic theory may identify a particular model as the best forecasting model, or imply certain parameter restrictions (the Efficient Markets Hypothesis; can restrictions from an RBC model improve forecasts?). Relations to data mining, model mining, and pre-testing: "He who mines data may strike fool's gold" (Peter Coy in Business Week).

41 A Coin Flip Example. Suppose we observed $x = 67$ heads out of $n = 100$ coin flips. Is it plausible that this is a fair coin? $q \equiv P(X \ge 67) = \sum_{i=67}^{n}\binom{n}{i}p^i(1-p)^{n-i}$ when $p = 0.5$. If $m = 1{,}000$ coins were flipped 100 times each, and 67 was the largest number of heads, $x_{\max} = \max_{k=1,\dots,1000} x_k$, then the relevant probability is $P(X_{\max} \ge 67) = 1 - (1-q)^{1000}$, which is far larger than $q$.
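Both probabilities in the example can be computed directly; a small sketch using scipy (the example itself does not prescribe any software):

```python
from scipy.stats import binom

# single fair coin: probability of 67 or more heads in 100 flips
q = binom.sf(66, 100, 0.5)          # P(X >= 67) = P(X > 66)

# largest head count among m = 1000 independent fair coins
p_max = 1 - (1 - q) ** 1000         # P(X_max >= 67)

print(q, p_max)
```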

42 5. COMPARING FORECASTS IS MORE COMPLICATED. Dependence across models: the predictive abilities of different forecasting models are not independent. Autocorrelation: the performance of individual models may be autocorrelated. Unknown distribution: the distribution is more complicated than the binomial. Solution: bootstrap, bootstrap, and asymptotics.

43 6. CONTRIBUTIONS OF THIS PAPER. Theory: derive the distribution of a class of test statistics for SPA; the testing problem involves nuisance parameters; important implication for the Reality Check of White. Test: propose a new test for SPA based on two modifications of the RC that improve power substantially; related to "Asymptotic Tests of Composite Hypotheses" (Hansen). Implementation: bootstrap implementation. Application: comparison of forecasting models for annual US inflation.

44 7. FRAMEWORK WITH EXAMPLES. At time $t-h$ we are to make a choice: $\delta_{k,t-h}$, $k = 0, 1, \dots, m$. At time $t$ the state of nature is observed: $\xi_t$. The objective is to minimize expected loss: $E(L_{k,t}) = E[L(\xi_t, \delta_{k,t-h})]$. Example (point forecast and MSE loss): $\delta_{k,t-h} = \hat{Y}_{k,t}$, $k = 0, 1, \dots, m$, $\xi_t = Y_t$, and $E(L_{k,t}) = E(Y_t - \hat{Y}_{k,t})^2$.

45 Example: Value-at-Risk. Objective: the $\alpha$-quantile of a conditional distribution of $\xi_t$. Compare $\delta_{k,t-h}$, $k = 0, 1, \dots, m$, different VaRs, using a loss function $L_{k,t}$ that penalizes deviations of $\delta_{k,t-h}$ from the $\alpha$-quantile. Substituting $\hat{\xi}_t$ for $\xi_t$ is problematic, see Hansen & Lunde. Example: Trading Rules. Let $\delta_{k,t-h}$, $k = 0, 1, \dots, m$, be a binary variable that instructs one to take a long/short position in a particular asset. The object of interest is to maximize expected profit, so $L_{k,t} = -\delta_{k,t-h} r_t$. Applications: technical trading rules (300,000 rules), Sullivan, Timmermann & White (JoF); calendar effects (9,400 rules), Sullivan, Timmermann & White (JoE); "Does Anything Beat a GARCH(1,1)?" (330 models), Hansen & Lunde (JoAE).

46 8. ASSUMPTIONS AND THEORETICAL RESULTS. Define $X_t = (X_{1,t}, \dots, X_{m,t})'$ and $\mu = (\mu_1, \dots, \mu_m)' = E(X_t)$. The null hypothesis is $H_0: \mu \le 0$; the maintained hypothesis is $\mu \in \mathbb{R}^m$. Assumption 1: $n^{1/2}(\bar{X} - \mu) \xrightarrow{d} N_m(0, \Omega)$, where $\bar{X} = n^{-1}\sum_{t=1}^{n} X_t$ and $\Omega = \lim_{n\to\infty}\mathrm{var}(n^{1/2}\bar{X})$. Diebold & Mariano (1995), West (1996), and White (2000) make primitive assumptions that imply Assumption 1. There is no need to assume a particular model for the DGP!

47 $H_0$ involves multiple inequalities. Linear: Perlman (1969), Judge & Yancey (1986), Wolak (1987, 1989b), Dufour. Non-linear: Wolak (1989a). Restricted alternative: Gourieroux, Holly & Monfort (1982), Andrews. See also Goldberger. These tests require an estimate of $\Omega$ (an $m\times m$ matrix); in our framework $m$ can be large relative to the sample size $n$. Aspects of forecast comparisons: comparing multiple nested models, see Harvey & Newbold (2000); estimated parameters and non-differentiable loss functions, see McCracken (2000); regression-based tests, see West & McCracken (1998); forecast encompassing, Harvey, Leybourne & Newbold (1998), West (2001), and Clark & McCracken (2001); forecasts involving cointegrated variables, see Corradi, Swanson & Olivetti.

48 Assumption 2: Our test statistic has the form $T_n = \varphi(n^{1/2}\bar{X}, V_n)$, where $V_n \xrightarrow{p} v_0 \in \mathbb{R}^q$, $\varphi$ is continuous, and (a) $\varphi(u, v) \ge 0$ and $\varphi(0, v) = 0$; (b) $\varphi(u, v) = \varphi(u^+, v)$, where $u^+_k = \max(0, u_k)$, $k = 1, \dots, m$; (c) $\varphi(u, v) \to \infty$ if $u_k \to \infty$ for any $k = 1, \dots, m$. Interpretation: (a) normalization: 0 means no evidence against $H_0$; (b) models with sample performance worse than the benchmark provide no evidence against $H_0$; (c) the test statistic diverges to infinity as the evidence against $H_0$ increases.

49 Theorem 1: Suppose Assumptions 1 and 2 hold. Define the matrix $\Omega^o$ with elements $\Omega^o_{ij} = \Omega_{ij}1_{\{\mu_i = \mu_j = 0\}}$, $i, j = 1, \dots, m$, and let $F_{\mu,\Omega}$ be the cdf of $\varphi(Z, v_0)$, where $Z \sim N_m(0, \Omega^o)$. Under the null hypothesis, $\mu \le 0$, it holds that $\varphi(n^{1/2}\bar{X}, V_n) \xrightarrow{d} F_{\mu,\Omega}$. Under the alternative, $\mu \not\le 0$, $\varphi(n^{1/2}\bar{X}, V_n) \xrightarrow{p} \infty$. Thus the asymptotic distribution of $T_n = \varphi(n^{1/2}\bar{X}, V_n)$ depends on $\mu$ and $\Omega$, so these are nuisance parameters! Only the forecasting models for which $\mu_k = 0$ matter for the asymptotic distribution of $T_n$.

50 The asymptotic distribution of $T_n$, $F_{\mu,\Omega}$, is given by an $m_0$-dimensional Gaussian distribution, where $m_0 \le m$ is the number of $k$s with $\mu_k = 0$. Ways to handle the nuisance parameter problem: substitute consistent estimators, $F_{\hat{\mu},\hat{\Omega}}$ for $F_{\mu,\Omega}$; or use the Least Favorable Configuration (LFC), $F_{0,\Omega}$ in place of $F_{\mu,\Omega}$.

51 9. THE REALITY CHECK FOR DATA SNOOPING. The Reality Check for Data Snooping, White (EMA, 2000). $H_0: \mu \le 0$ is equivalent to $\max_k \mu_k \le 0$. The RC applies the test statistic $T^{RC}_n = n^{1/2}\bar{X}_{\max}$, where $\bar{X}_{\max} = \max(\bar{X}_1, \dots, \bar{X}_m)$. Many values of $\mu$ are consistent with the null hypothesis $\mu \le 0$. The RC is based on the LFC: it derives the distribution of $T^{RC}_n$ under $H_0$ as if $\mu = 0$, and obtains an estimate of $F_{0,\Omega}$ using bootstrap techniques.

52 A Simple Example. Suppose that $m = 2$ and $X_t \sim \text{iid } N_2(\mu, \Omega)$, where $\mu = (0, \gamma)'$ and $\Omega = \mathrm{diag}(1, \omega^2)$. Forecast $k = 1$ is as good as the benchmark ($\mu_1 = 0$); forecast $k = 2$ is worse than the benchmark ($\mu_2 = \gamma < 0$). Then $n^{1/2}\bar{X} \sim N_2(n^{1/2}\mu, \Omega)$, where $\bar{X} = n^{-1}\sum_{t=1}^{n} X_t$, and $P(n^{1/2}\bar{X}_{\max} \le x) = P(n^{1/2}\bar{X}_1 \le x,\, n^{1/2}\bar{X}_2 \le x) = \Phi(x)\,\Phi\big(\tfrac{x - n^{1/2}\gamma}{\omega}\big)$.

53 A Simple Example, Continued. Thus the exact distribution is given by $P(n^{1/2}\bar{X}_{\max} \le x) = \Phi(x)\,\Phi\big(\tfrac{x - n^{1/2}\gamma}{\omega}\big)$. Since $\gamma < 0$, for large $n$ we have $\Phi\big(\tfrac{x - n^{1/2}\gamma}{\omega}\big) \approx 1$ for $x > 0$, such that $P(n^{1/2}\bar{X}_{\max} \le x) \approx P(n^{1/2}\bar{X}_1 \le x) = \Phi(x)$ for $x > 0$. The RC (LFC) assumes $\gamma = 0$, because then $P(n^{1/2}\bar{X}_{\max} \le x) = \Phi(x)\,\Phi(\tfrac{x}{\omega})$. But if $\omega$ is large, then $\Phi(x)\Phi(\tfrac{x}{\omega}) < \Phi(x)$ even for $x \gg 0$. The RC's critical value is defined from the poor and volatile alternative, $\bar{X}_2$, yet $\bar{X}_2$ is irrelevant for the distribution of $T^{RC}_n = n^{1/2}\bar{X}_{\max}$!
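A small numerical sketch of this two-model example (the values of n, gamma, and omega below are hypothetical): it computes the 5% critical value implied by the exact distribution and by the LFC, showing that the LFC value is larger, which is the source of the power loss.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# two-model example: mu = (0, gamma), Omega = diag(1, omega^2)
n, gamma, omega, alpha = 100, -0.5, 5.0, 0.05

exact_cdf = lambda x: norm.cdf(x) * norm.cdf((x - np.sqrt(n) * gamma) / omega)
lfc_cdf   = lambda x: norm.cdf(x) * norm.cdf(x / omega)        # pretends gamma = 0

crit_exact = brentq(lambda x: exact_cdf(x) - (1 - alpha), -10, 10)
crit_lfc   = brentq(lambda x: lfc_cdf(x) - (1 - alpha), -10, 10)
# as n grows, crit_exact approaches the single-model value norm.ppf(1 - alpha)
print(crit_exact, crit_lfc)
```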

54 10. TWO ASPECTS OF THE RC'S POWER PROBLEM. If $m$ is large, $T^{RC}_n = n^{1/2}\bar{X}_{\max} = \max(n^{1/2}\bar{X}_1, \dots, n^{1/2}\bar{X}_m)$ has distribution $\Phi\big(\tfrac{x - n^{1/2}\mu_1}{\omega_1}\big)\cdots\Phi\big(\tfrac{x - n^{1/2}\mu_m}{\omega_m}\big)$. Different variances: the critical value is defined from the $k$s with the largest $\omega_k$. Many poor models: suppose $\mu_k < 0$ for $k \ge 2$, $\mu_1 = 0$, and $\omega_k = 1$; then $T^{RC}_n \stackrel{A}{\sim} \Phi(x)$, whereas the RC applies $\Phi(x)^m$. Implication: the RC's critical value is too large... power is lost. POWER IS IMPORTANT!

55 11. A NEW TEST FOR SPA. Two modifications... Standardize the test statistic (compare apples to apples): $T^{SPA}_n = t_{\max} = \max\big(\tfrac{n^{1/2}\bar{X}_1}{\hat{\omega}_1}, \dots, \tfrac{n^{1/2}\bar{X}_m}{\hat{\omega}_m}\big)$. Use sample information to determine $m_0$, the number of binding inequalities. Employ a bootstrap method to estimate $F_{\mu,\Omega}$.

56 The estimator $\hat{\mu}^u_k = 0$ results in a conservative test. The constrained estimator $\hat{\mu}^l_k = \min(\bar{X}_k, 0)$ results in a liberal test. Consider instead $\hat{\mu}^c_k = \bar{X}_k$ if $\bar{X}_k < -A_{k,n}$ and $\hat{\mu}^c_k = 0$ if $\bar{X}_k \ge -A_{k,n}$, where $A_{k,n} \to 0$ and $A_{k,n}\big/\big(\omega_k\sqrt{\tfrac{2\log\log n}{n}}\big) \to \infty$. Then $\mu_k = 0 \Rightarrow P(\hat{\mu}^c_k = 0) \to 1$ and $\mu_k < 0 \Rightarrow P(\hat{\mu}^c_k < 0) \to 1$ as $n \to \infty$, due to the law of the iterated logarithm: $\limsup_{n\to\infty}\frac{\bar{X}_k - \mu_k}{\omega_k\sqrt{2\log\log n / n}} = 1$ almost surely.


58 Thus we have three estimators: $\hat{\mu}^l_k = \bar{X}_k 1_{\{\bar{X}_k \le 0\}}$, $\hat{\mu}^c_k = \bar{X}_k 1_{\{\bar{X}_k < -A_{k,n}\}}$, $\hat{\mu}^u_k = 0$. Theorem 3: Let $F^i_n$ be the cdf of $T_n = \varphi(n^{1/2}Z^i_n, V_n)$, where $n^{1/2}(Z^i_n - \hat{\mu}^i) \sim N_m(0, \Omega)$, $i = l, c, u$. Given Assumptions 1 and 2, $F^c_n \to F_{\mu,\Omega}$ as $n \to \infty$ and $F^l_n(x) \le F^c_n(x) \le F^u_n(x)$ for all $n$ and all $x \in \mathbb{R}$. The correction term, $A_{k,n}$, is somewhat arbitrary. We use $A_{k,n} = \tfrac{1}{4}n^{-1/4}\hat{\omega}_k$ and verify that $A_{k,n} \xrightarrow{a.s.} 0$ and $A_{k,n}\big/\big(\omega_k\sqrt{\tfrac{2\log\log n}{n}}\big) = \frac{(\hat{\omega}_k/\omega_k)\,n^{1/4}}{4\sqrt{2\log\log n}} \xrightarrow{a.s.} \infty$ as $n \to \infty$.
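A minimal sketch of an SPA_c-style test along the lines above (not the author's code): studentized statistics, the threshold recentring with A_{k,n} = n^{-1/4} omega_k / 4, and a simple moving-block bootstrap. The input d (columns are loss differentials L_{0,t} - L_{k,t}), the block length, and the crude scale estimate are simplifying assumptions.

```python
import numpy as np

def spa_test(d, B=1000, block=4, seed=0):
    """SPA-type test sketch: d is (n, m) with d[:, k] = L_{0,t} - L_{k,t}."""
    rng = np.random.default_rng(seed)
    n, m = d.shape
    dbar = d.mean(axis=0)
    omega = d.std(axis=0, ddof=1)                 # crude scale estimate (ignores autocorrelation)
    t_spa = max(0.0, np.max(np.sqrt(n) * dbar / omega))
    # recentring: only models that look clearly poor keep a negative mean under the null
    A = 0.25 * n ** (-0.25) * omega
    mu_c = np.where(dbar < -A, dbar, 0.0)
    t_boot = np.empty(B)
    for b in range(B):
        starts = rng.integers(0, n, size=int(np.ceil(n / block)))
        idx = (starts[:, None] + np.arange(block)).ravel()[:n] % n
        dstar = d[idx].mean(axis=0)
        z = np.sqrt(n) * (dstar - dbar + mu_c) / omega
        t_boot[b] = max(0.0, z.max())
    return t_spa, np.mean(t_boot >= t_spa)
```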

59 Summary of the three choices. Estimator: lower bound $\hat{\mu}^l_k = \min(\bar{X}_k, 0)$; consistent distribution $\hat{\mu}^c_k = \bar{X}_k 1_{\{\bar{X}_k < -A_{k,n}\}}$; upper bound (Reality Check) $\hat{\mu}^u_k = 0$. Similarity: non-similar / similar / non-similar. p-values: liberal / consistent / conservative. Sensitive to irrelevant models: no / no / yes.

60 Asymptotic Tests of Composite Hypotheses. General theory for asymptotic tests of composite hypotheses, e.g., tests of multiple linear equalities. ATCH, Theorem 5: a necessary condition for a test to be asymptotically unbiased is that it is asymptotically similar on the boundary of the null hypothesis, $\Theta_0$. In the present framework this amounts to determining the $k$s for which $\mu_k = 0$.

61 12. SIMULATION EXPERIMENT. Design I: $X_{k,t} \sim N(0, \sigma^2_k)$, $k = 1, \dots, m_0$; $X_{k,t} \sim N(\mu_k, \sigma^2_k)$ with $\mu_k \sim -\chi^2_1$, $k = m_0+1, \dots, m$. Design II (mean squared error loss): $L_{k,t} = (Y_t - \hat{Y}_t)^2$. We let $Y_t - \hat{Y}_{0,t} \sim N(0, 1)$ such that $L_{0,t} \sim \chi^2_1$; $L_{k,t}$ iid Uniform on $(0, 2L_{0,t})$, $k = 1, \dots, m_0$; $L_{k,t}$ iid Uniform on $(0, 2(L_{0,t} + \zeta_{k,t}))$, $k = m_0+1, \dots, m$, where $\zeta_{k,t}$ is such that $\mu_k < 0$ for $k = m_0+1, \dots, m$.

62 Table 2: Type I errors of RC and SPA (Design I, tests at the 5% level). Panel C: $m = 10$ and $m_0 = 5$; Panel D: $m = 100$. Columns report rejection frequencies for $RC_l$, $RC_c$, $RC_u$, $SPA_l$, $SPA_c$, $SPA_u$ at various sample sizes $n$.

63 Table 3: Type I errors of RC and SPA (Design II, tests at the 5% level). Same layout as Table 2: Panel C: $m = 10$ and $m_0 = 5$; Panel D: $m = 100$.


66 13. FORECASTING US INFLATION. Attempting to forecast annual US inflation, $Y_t$ (GDP price index). Regression-based forecasts using 1, 2, or 3 regressors, $X_{k,t}$, chosen from a set of macroeconomic variables, $Z_{i,t}$: lagged inflation; employment/inventories/GDP; interest rates; fuel/energy; deterministic variables (quarterly dummies). Forecasts are given by $\hat{Y}_{k,t+1} = \hat{\beta}'_{k,t}X_{k,t+1}$, $t = 0, \dots, n-1$, $k = 1, \dots, m$, where $\hat{\beta}_{k,t}$ is the LS estimator from regressing $Y_\tau$ on $X_{k,\tau}$ using observations up to time $t$.

67 Data are quarterly: 1951:Q1-2000:Q4. The evaluation period is 1963:Q1-2000:Q4 (some data are used for initial estimation). The benchmark forecast is the random walk forecast: $\hat{Y}_{0,t} = Y_{t-5}$. Large universe: all regression models with 1, 2, or 3 regressors, which results in $\binom{27}{3} + \binom{27}{2} + \binom{27}{1} = 3303$ models; we also include the average model, $\hat{Y}_{3304,t} = \frac{1}{3303}\sum_{k=1}^{3303}\hat{Y}_{k,t}$. Small universe: regression models with three regressors that always include lagged inflation (coefficient fixed to be one).
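A quick check of the combinatorics behind the 3,304-model large universe (assuming the 27-regressor choice set implied by the slide):

```python
from math import comb

n_models = comb(27, 3) + comb(27, 2) + comb(27, 1)   # 2925 + 351 + 27 = 3303
print(n_models, n_models + 1)                        # +1 for the average model -> 3304
```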

68 The candidate variables:
$Y_t$: annual inflation.
$Z_{1,t}, Z_{2,t}$: lags of annual inflation $Y_t$.
$Z_{3,t}, Z_{4,t}$: quarterly inflation.
$Z_{5,t}$: quarterly inflation relative to the previous year's inflation.
$Z_{6,t}, Z_{7,t}$: changes in employment in the manufacturing sector.
$Z_{8,t}$: quarterly employment relative to the average of the previous year.
$Z_{9,t}$: quarterly employment relative to the average of the previous two years.
$Z_{10,t}, Z_{11,t}$: quarterly changes in real inventory.
$Z_{12,t}, Z_{13,t}$: quarterly changes in quarterly GDP.
$Z_{14,t}$: interest paid on the 3-month T-bill.
$Z_{15,t}, Z_{16,t}$: changes in the 3-month T-bill.
$Z_{17,t}, Z_{18,t}$: changes in the 3-month T-bill relative to the level of the T-bill.
$Z_{19,t}, Z_{20,t}$: changes in prices of fuel and energy.
$Z_{21,t}, Z_{22,t}$: changes in prices of food.
$Z_{23,t}$-$Z_{26,t}$: quarterly dummies (first, second, third, and fourth quarter).
$Z_{27,t}$: constant.

69 Table 6: Large universe, evaluated by MAD; $m = 3304$ models, $n = 160$ (sample size), $B = 2{,}000$ resamples, $q = 0.5$ (dependence). Rows report the benchmark, best-performing, most significant, median, and worst models (loss, t-statistic, p-value), together with SPA p-values and 10%, 5%, and 1% critical values for $RC_l$, $RC_c$, $RC_u$, $SPA_l$, $SPA_c$, $SPA_u$.

70 Table 7: Small universe, evaluated by MAD; $m = 352$ models, $n = 160$, $B = 2{,}000$ resamples, $q = 0.5$. Same layout as Table 6.

71 Table 8: Full universe, evaluated by MAD; $m = 3{,}656$ models, $n = 160$, $B = 2{,}000$ resamples, $q = 0.5$. Same layout as Table 6.

72 Regressors appearing in the best-performing models (significance indicated by stars in the original table): annual inflation (lags); quarterly inflation relative to the previous year's inflation; changes in employment in the manufacturing sector; quarterly employment relative to the average of the previous year; quarterly changes in quarterly GDP (current and lagged); interest paid on the 3-month T-bill; changes in the 3-month T-bill; changes in the 3-month T-bill relative to the level of the T-bill.

73 14. SUMMARY. Test for Superior Predictive Ability: find a better forecasting model; test economic theory. A new test for SPA differs from existing tests in the choice of test statistic and the distribution under the null hypothesis. The new test avoids sensitivity to the inclusion of irrelevant models, lack of power, and bias. Is there a need to revisit existing applications? E.g., are calendar effects real? The random walk inflation forecast is significantly outperformed.

74 A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)? Peter Reinhard Hansen, Department of Economics, Brown University; Asger Lunde, Department of Information Science, The Aarhus School of Business. We thank the Danish Research Agency for financial support.

75 1.1. Results. Exchange rate data: no evidence that the GARCH(1,1) is outperformed. IBM return data: the GARCH(1,1) is clearly outperformed; the best models have a leverage effect; best is the A-PARCH(2,2). General results: the mean specification is irrelevant; Gaussian versus t-specification: ambiguous.

76 4.2. The Conditional Variance (Scale Parameter). The scale parameter is modelled as $h^2_t = \sigma^2_t(\mathcal{F}_{t-1}; \theta)$. GARCH-type models for the conditional variance:
ARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon^2_{t-i}$
GARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon^2_{t-i} + \sum_{j=1}^{q}\beta_j\sigma^2_{t-j}$
IGARCH: $\sigma^2_t = \omega + \varepsilon^2_{t-1} + \sum_{i=2}^{p}\alpha_i(\varepsilon^2_{t-i} - \varepsilon^2_{t-1}) + \sum_{j=1}^{q}\beta_j(\sigma^2_{t-j} - \varepsilon^2_{t-1})$
Taylor/Schwert: $\sigma_t = \omega + \sum_{i=1}^{p}\alpha_i|\varepsilon_{t-i}| + \sum_{j=1}^{q}\beta_j\sigma_{t-j}$

77 More conditional variance specifications:
A-GARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}[\alpha_i\varepsilon^2_{t-i} + \gamma_i\varepsilon_{t-i}] + \sum_{j=1}^{q}\beta_j\sigma^2_{t-j}$
NA-GARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}\alpha_i(\varepsilon_{t-i} + \gamma_i\sigma_{t-i})^2 + \sum_{j=1}^{q}\beta_j\sigma^2_{t-j}$
V-GARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}\alpha_i(e_{t-i} + \gamma_i)^2 + \sum_{j=1}^{q}\beta_j\sigma^2_{t-j}$
Thr.-GARCH: $\sigma_t = \omega + \sum_{i=1}^{p}\alpha_i[(1-\gamma_i)\varepsilon^+_{t-i} - (1+\gamma_i)\varepsilon^-_{t-i}] + \sum_{j=1}^{q}\beta_j\sigma_{t-j}$
GJR-GARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}[\alpha_i + \gamma_i I_{\{\varepsilon_{t-i}>0\}}]\varepsilon^2_{t-i} + \sum_{j=1}^{q}\beta_j\sigma^2_{t-j}$
log-GARCH: $\log\sigma_t = \omega + \sum_{i=1}^{p}\alpha_i|e_{t-i}| + \sum_{j=1}^{q}\beta_j\log\sigma_{t-j}$
EGARCH: $\log\sigma^2_t = \omega + \sum_{i=1}^{p}[\alpha_i e_{t-i} + \gamma_i(|e_{t-i}| - E|e_{t-i}|)] + \sum_{j=1}^{q}\beta_j\log\sigma^2_{t-j}$
NGARCH: $\sigma^\delta_t = \omega + \sum_{i=1}^{p}\alpha_i|\varepsilon_{t-i}|^\delta + \sum_{j=1}^{q}\beta_j\sigma^\delta_{t-j}$ (this is A-PARCH without the leverage effect)

78 Further specifications:
A-PARCH: $\sigma^\delta_t = \omega + \sum_{i=1}^{p}\alpha_i[|\varepsilon_{t-i}| - \gamma_i\varepsilon_{t-i}]^\delta + \sum_{j=1}^{q}\beta_j\sigma^\delta_{t-j}$
GQ-ARCH: $\sigma^2_t = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i} + \sum_{i=1}^{p}\alpha_{ii}\varepsilon^2_{t-i} + \sum_{i<j}\alpha_{ij}\varepsilon_{t-i}\varepsilon_{t-j} + \sum_{j=1}^{q}\beta_j\sigma^2_{t-j}$
H-GARCH: $\sigma^\delta_t = \omega + \sum_{i=1}^{p}\alpha_i\delta\sigma^\delta_{t-i}\big[|e_{t-i} - \kappa| - \tau(e_{t-i} - \kappa)\big]^\nu + \sum_{j=1}^{q}\beta_j\sigma^\delta_{t-j}$
Aug-GARCH: $\sigma^2_t = |\delta\phi_t - \delta + 1|^{1/\delta}$ if $\delta \neq 0$ and $\sigma^2_t = \exp(\phi_t - 1)$ if $\delta = 0$, where $\phi_t = \omega + \sum_{i=1}^{p}\big[\alpha_{1i}|\varepsilon_{t-i} - \kappa|^\nu + \alpha_{2i}\max(0, \kappa - \varepsilon_{t-i})^\nu\big]\phi_{t-j} + \sum_{i=1}^{p}\big[\alpha_{3i}f(|\varepsilon_{t-i} - \kappa|, \nu) + \alpha_{4i}f(\max(0, \kappa - \varepsilon_{t-i}), \nu)\big]\phi_{t-j} + \sum_{j=1}^{q}\beta_j\phi_{t-j}$, with $f(x, \nu) = (x^\nu - 1)/\nu$.

79 11. RESULTS. Exchange rate data (DM/USD), benchmark: ARCH(1). For each criterion (MSE1, MSE2, QLIKE, R2LOG, MAD1, MAD2) the table reports the benchmark, worst, median, and best performance together with the naive, $SPA_l$, $SPA_c$, and $SPA_u$ p-values.

80 Exchange rate data (DM/USD), benchmark: GARCH(1,1). Same layout as the previous table.

81 IBM data, benchmark: ARCH(1). Same layout.

82 IBM data, benchmark: GARCH(1,1). Same layout.

83 IBM data, White's RC, benchmark: ARCH(1). The table reports, for each criterion, the benchmark, worst, median, and best performance together with the naive, $SPA^o_l$, $SPA^o_c$, and RC p-values.

84 IBM data, White's RC, benchmark: GARCH(1,1). Same layout as the previous table.

85 Romano & Wolf (2005) Stepwise Testing

87 Stepwise Testing. The multiple hypotheses of interest are $H_i: \mu_i \le 0$, $i = 1, \dots, M$. The RC and SPA tests control $\Pr(\text{rejecting one or more true hypotheses}) \le \alpha$ when all hypotheses are true: weak control of the familywise error rate (FWER). Romano-Wolf: in a mixed environment of true and false hypotheses, control $\Pr(\text{rejecting one or more true hypotheses}) \le \alpha$.

88 Stepwise Testing. When the SPA test rejects the null $H_0: \mu_i \le 0$ for all $i = 1, \dots, M$, we can conclude that the models in $\{i: T_i = n^{1/2}\bar{d}_i > c_\alpha\}$ are significantly better than the benchmark, $i = 0$. The stepwise procedure continues: start with $\mathcal{M} = \{1, \dots, M\}$, reject some hypotheses, then do another test for SPA with the hypotheses not rejected in the first round. With fewer hypotheses the critical value for $T_{\max}$ is smaller, so additional hypotheses may be rejected. Continue until no additional hypothesis can be rejected.

89 Intuition from Holm's Improvement of the Bonferroni Test. Hypotheses $H_1, \dots, H_M$. Let $R_i$ be the event that we reject $H_i$, $i = 1, \dots, M$. The probability of rejecting one or more hypotheses satisfies $\Pr(R_1 \cup R_2 \cup \dots \cup R_M) \le \Pr(R_1) + \dots + \Pr(R_M)$. So $\Pr(R_i) \le \frac{\alpha}{M}$ implies $\Pr(R_1 \cup \dots \cup R_M) \le \alpha$. Holm: if $K$ hypotheses are rejected in the first step, test the non-rejected ones again using $\Pr(R_i) \le \frac{\alpha}{M-K}$. Continue until no additional hypothesis can be rejected.
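A minimal sketch of Holm's step-down procedure on a vector of p-values (the p-values below are made up for illustration):

```python
import numpy as np

def holm_stepdown(pvals, alpha=0.05):
    """Holm's step-down procedure: keep rejecting while the smallest remaining
    p-value clears alpha divided by the number of hypotheses still in play."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    reject = np.zeros(p.size, dtype=bool)
    for step, i in enumerate(order):
        if p[i] <= alpha / (p.size - step):
            reject[i] = True
        else:
            break
    return reject

print(holm_stepdown([0.001, 0.012, 0.04, 0.2]))   # -> [True, True, False, False]
```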

90 Hansen, Lunde, Nason (2010) Model Confidence Set

91 Model Confidence Sets for Forecasting Models. Peter Reinhard Hansen, Stanford University and CREATES; Asger Lunde, Aarhus School of Business, Aarhus University and CREATES; James M. Nason, Federal Reserve Bank of Atlanta.

92 1. OUTLINE OF TALK. Motivation (forecasting US inflation); Model Confidence Sets (MCS); key properties; MCS p-values; bootstrap implementation; simulation study; empirical application to US inflation forecasts.

93 2. MOTIVATION (FORECASTING US INFLATION). We revisit Stock & Watson (1999), who compare a large number of inflation forecasting models. They consider a Phillips curve, $\pi_{t+h} - \pi_t = \phi + \beta(L)u_t + \gamma(L)(1-L)\pi_t + e_{t+h}$, where $\pi_t$ is the rate of inflation and $u_t$ is the unemployment rate; alternative specifications where $u_t$ is replaced by other macro variables; ... random walk, AR(p), multivariate forecast combinations, and principal component models.

94 Stock and Watson, JME (1999), Table 2: forecasting performance of alternative real activity measures. Columns: PUNEW and GMDC, each over two subsamples between 1970 and 1996, reporting the relative MSE and $\lambda$ (standard errors in parentheses). Rows: no change; univariate; "gaps" specifications (ip DT, gmpyq DT, msmtq DT, lpnag DT, ipxmca LV, hsbp LN, lhmu25 LV); first-difference specifications (ip DLN, gmpyq DLN, msmtq DLN, lpnag DLN, dipxmca DLV, dhsbp DLN, dlhmu25 DLV, dlhur DLV); Phillips curve RMSEs (% per annum) for LHUR.

95 3. THE BEST FORECASTING MODEL. Setup and objective: to forecast $Y_t$, $t = 1, \dots, n$, with competing forecasts $\hat{Y}_{i,t}$, $i = 1, \dots, m$, evaluated with a loss function $L_{i,t} = L(Y_t, \hat{Y}_{i,t})$ and ranked in terms of expected loss, $E(L_{i,t})$. The best model, $i^*$, solves $E(L_{i^*,t}) = \min_i E(L_{i,t})$. More general setups: interval forecasts, density forecasts, Value-at-Risk.

96 Which is the best model? Too difficult a question for most data sets: often there is not sufficient information in the data to identify a single model as significantly superior to all other models. Aim of our paper: given a set of competing models, $\mathcal{M}_0$, construct a procedure that yields $\hat{\mathcal{M}}_{1-\alpha}$, where $\hat{\mathcal{M}}_{1-\alpha}$ contains the best model with probability $1-\alpha$.

97 3.1. Tests for EPA and SPA. EPA: test for Equal Predictive Ability, $H_0: E(L_{1,t}) = \dots = E(L_{m,t})$. Diebold and Mariano (1995) ($m = 2$): assume $d_t = L_{1,t} - L_{2,t}$ satisfies a CLT, then $\bar{d}/\sqrt{\widehat{\mathrm{var}}(\bar{d})} \xrightarrow{d} N(0, 1)$. Improvements suggested by Harvey, Leybourne, and Newbold (1997).

98 West (1996) (parameter uncertainty): assume $L_t = (L_{1,t}, \dots, L_{m,t})'$ satisfies a CLT, $n^{1/2}(\bar{L}_n - \xi) \xrightarrow{d} N_m(0, \Omega)$. Refinements: estimation scheme (West and McCracken, 1998); non-differentiable loss functions (McCracken, 2000); nested models (Harvey & Newbold, 2000, and Clark & McCracken, 2001); conditional EPA, $H_0: E(L_{1,t}\mid\mathcal{F}_{t-1}) = E(L_{2,t}\mid\mathcal{F}_{t-1})$ (Giacomini & White, 2003).

99 SPA: test for Superior Predictive Ability. Given a benchmark model, $i = 0$: $H_0: \mu_i = E(L_{0,t} - L_{i,t}) \le 0$ for all $i = 1, \dots, m$. White (2000): the Reality Check for data snooping (RC), a test of multiple ($m$) inequalities. Hansen (2005): a test for Superior Predictive Ability; proposes two modifications of the RC, a standardized test statistic and a data-dependent choice for $\mu$ under the null hypothesis; advantages: more power and not sensitive to irrelevant models. Romano & Wolf (2003): stepwise multiple testing to determine all alternatives that dominate the benchmark.

100 3.2. MCS versus SPA. The MCS method is benchmark-free. SPA can answer Q_SPA: is the benchmark model significantly outperformed or not? MCS addresses the question Q_MCS: which models may be the best? The MCS procedure is based on tests for EPA, which avoids the composite hypothesis testing problems. The MCS procedure yields a p-value for each model!

101 We seek the best forecasting model(s). It may be tempting to apply the RC/SPA $m$ times, using each of the models as the benchmark; the benchmarks that we fail to reject survive. The properties of this procedure are somewhat unclear.

102 4. THEORY OF MODEL CONFIDENCE SETS. Consider a set, $\mathcal{M}_0$, with objects indexed by $i = 1, \dots, m$. The objects are evaluated over the sample $t = 1, \dots, n$ in terms of a loss function, $L_{i,t}$. Define the relative performance variables $d_{ij,t} = L_{i,t} - L_{j,t}$ for all $i, j \in \mathcal{M}_0$, about which we assume that $E(d_{ij,t})$ is finite and does not depend on $t$ for all $i, j \in \mathcal{M}_0$.

103 Definition 1: The set of superior objects is defined by $\mathcal{M}^* = \{i \in \mathcal{M}_0: E(d_{ij,t}) \le 0 \text{ for all } j \in \mathcal{M}_0\}$. Let $\mathcal{M}^- = \{i \in \mathcal{M}_0: E(d_{ij,t}) > 0 \text{ for some } j \in \mathcal{M}_0\}$. The MCS procedure applies tests of $H_{0,\mathcal{M}}: E(d_{ij,t}) = 0$ for all $i, j \in \mathcal{M} \subseteq \mathcal{M}_0$, against the alternative $H_{A,\mathcal{M}}: E(d_{ij,t}) \neq 0$ for some $i, j \in \mathcal{M}$. $H_{0,\mathcal{M}^*}$ is always true given our definition of $\mathcal{M}^*$; $H_{0,\mathcal{M}}$ is always false if $\mathcal{M}$ contains elements from both $\mathcal{M}^*$ and $\mathcal{M}^-$.

104 The MCS procedure is based on an equivalence test, $\delta_\mathcal{M}$, used to test $H_{0,\mathcal{M}}$ for any $\mathcal{M} \subseteq \mathcal{M}_0$, and an elimination rule, $e_\mathcal{M}$, that identifies the object to be removed from $\mathcal{M}$ if $H_{0,\mathcal{M}}$ is rejected. Definition 2 (MCS algorithm): Step 0: initially set $\mathcal{M} = \mathcal{M}_0$. Step 1: test $H_{0,\mathcal{M}}$ using $\delta_\mathcal{M}$ at level $\alpha$. Step 2: if $H_{0,\mathcal{M}}$ is accepted, define $\hat{\mathcal{M}}_{1-\alpha} = \mathcal{M}$; otherwise use $e_\mathcal{M}$ to eliminate an object from $\mathcal{M}$ and repeat Steps 1 and 2. We refer to the set, $\hat{\mathcal{M}}_{1-\alpha}$, of surviving objects (those that survive all tests without being eliminated) as the Model Confidence Set.

105 Underlying testing principle: intersection-union tests of Berger (1982), applied by Pantula (1989) for selection of the lag length in an autoregressive process and the order of integration, e.g., I(0), I(1), or I(2), and by Johansen (1988) for selecting the cointegration rank (sequential trace test of Anderson, 1951). Related statistical procedures (Miller 1966, Gupta & Panchapakesan 1979, Hsu 1996): multiple comparisons with best (= MCS) (Horrace & Schmidt, 2000); multiple comparisons with control (= RC/SPA) (Romano & Wolf, 2003).

106 If $(\delta_\mathcal{M}, e_\mathcal{M})$ satisfies the following assumption, the term Confidence Set is appropriate for our MCS. Assumption 1: for any $\mathcal{M} \subseteq \mathcal{M}_0$ we assume (a) $\delta_\mathcal{M}$ is level $\alpha$ asymptotically; (b) $\delta_\mathcal{M}$ has unit power asymptotically; (c) inferior models are asymptotically removed first. Theorem 1 (properties of the MCS): given Assumption 1, it holds that (i) $\lim_{n\to\infty} P(\mathcal{M}^* \subseteq \hat{\mathcal{M}}_{1-\alpha}) \ge 1-\alpha$, and (ii) $\lim_{n\to\infty} P(i \in \hat{\mathcal{M}}_{1-\alpha}) = 0$ for all $i \in \mathcal{M}^-$.

107 More on MCS properties. The MCS procedure does not suffer from a build-up of Type I errors, because $\hat{\mathcal{M}}_{1-\alpha}$ is determined after the first acceptance: the probability that one (or more) superior models will be eliminated is asymptotically bounded by $\alpha$ (the level used in all tests), so the familywise (Type I) error rate is $\le \alpha$. The MCS procedure exploits the fact that the power converges to unity as the sample size increases: all inferior models are (eventually) eliminated. When the test lacks power, the MCS will be too large and will contain models from $\mathcal{M}^-$ (a Type II error). The MCS should be large when the data do not contain sufficient information to tell the good and bad models apart.

108 When $\mathcal{M}^*$ consists of a single model, we obtain a stronger result. Corollary 1: if Assumption 1 holds and $\mathcal{M}^*$ is a singleton, then $\lim_{n\to\infty} P(\mathcal{M}^* = \hat{\mathcal{M}}_{1-\alpha}) = 1$.

109 4.1. MCS p-values. How plausible is it that model $i$ is the best? MCS p-value: model $i \in \hat{\mathcal{M}}_{1-\alpha}$ if and only if $\hat{p}_i \ge \alpha$. The MCS p-value of the model eliminated at step $k$, $e_k$, is the largest of the test p-values up to and including that step, and the last surviving model has $\hat{p}_m = 1.00$. (The slide illustrates this with a small table of elimination orders, test p-values, and MCS p-values.)

110 4.2. Implementation of the MCS. Assumption 1 places conditions on the equivalence test and the elimination rule; we propose particular ones that satisfy these conditions. The bootstrap implementation is justified by the following assumption. Assumption 2: for some $r > 2$ and $\delta > 0$ it holds that $E|d_{ij,t}|^{r+\delta} < \infty$ for all $i, j \in \mathcal{M}_0$, and that $\{d_{ij,t}\}_{i,j\in\mathcal{M}_0}$ is $\alpha$-mixing of order $-r/(r-2)$.

111 Individual t-statistics: $t_i = \bar{d}_{i\cdot}/\sqrt{\widehat{\mathrm{var}}(\bar{d}_{i\cdot})}$ for $i \in \mathcal{M}$, where $\bar{d}_{i\cdot} = \bar{L}_i - \bar{L}_\cdot$ measures how good model $i$ is relative to the average sample loss across all models. Note that $H_{0,\mathcal{M}}$ is equivalent to $E(\bar{d}_{i\cdot}) = 0$ for all $i \in \mathcal{M}$. Test statistic: $T_{\max} = \max_{i\in\mathcal{M}} t_i$. The asymptotic distribution of $T_{\max}$ is non-standard (it depends on nuisance parameters); use the bootstrap.

112 Elimination rule: we need an elimination rule, $e_\mathcal{M}$, that satisfies Assumption 1. With $T_{\max}$ a natural elimination rule is $e_\mathcal{M} = \arg\max_{i\in\mathcal{M}} t_i$, which removes the model that contributes most to the test statistic, $T_{\max}$, among the models with a sample performance that is worse than the average across models. Specifically, $e_\mathcal{M}$ selects the object that has the largest standardized excess loss relative to the average across all models in $\mathcal{M}$.

113 5. BOOTSTRAP IMPLEMENTATION. 1. (Sample and bootstrap statistics) (a) Obtain the variables $L_{i,t}$ for $i = 1, \dots, m$ and $t = 1, \dots, n$. (b) Calculate the sample averages for each model, $\bar{L}_i = \frac{1}{n}\sum_{t=1}^{n}L_{i,t}$. (c) The corresponding bootstrap variables are given by $L^*_{b,i,t} = L_{i,\tau_{b,t}}$ for $b = 1, \dots, B$, $i = 1, \dots, m$, and $t = 1, \dots, n$, e.g., $(\tau_{b,1}, \dots, \tau_{b,n}) = (4, 5, 7, \dots, n-1, n, 1, \dots, 2, 3)$; calculate the bootstrap sample averages $\bar{L}^*_{b,i} = \frac{1}{n}\sum_{t=1}^{n}L^*_{b,i,t}$. (d) The only variables that need to be stored are $\bar{L}_i$ and $\zeta^*_{b,i} = \bar{L}^*_{b,i} - \bar{L}_i$.

114 2. (Sequential testing) Initialize by setting $\mathcal{M} = \mathcal{M}_0$. (a) Let $m$ denote the number of elements in $\mathcal{M}$ and calculate $\bar{L}_\cdot = \frac{1}{m}\sum_{i=1}^{m}\bar{L}_i$ and $\bar{d}_{i\cdot} = \bar{L}_i - \bar{L}_\cdot$. Define $t_i = \bar{d}_{i\cdot}/\sqrt{\widehat{\mathrm{var}}(\bar{d}_{i\cdot})}$, where $\zeta^*_{b,\cdot} = \frac{1}{m}\sum_{i=1}^{m}\zeta^*_{b,i}$ and $\widehat{\mathrm{var}}(\bar{d}_{i\cdot}) = \frac{1}{B}\sum_{b=1}^{B}(\zeta^*_{b,i} - \zeta^*_{b,\cdot})^2$, and calculate $T_{\max} = \max_{i\in\mathcal{M}} t_i$. (b) The bootstrap estimate of the distribution of $T_{\max}$ is given by the empirical distribution of $T^*_{\max,b} = \max_{i\in\mathcal{M}} t^*_{b,i}$, for $b = 1, \dots, B$, where $t^*_{b,i} = (\zeta^*_{b,i} - \zeta^*_{b,\cdot})/\sqrt{\widehat{\mathrm{var}}(\bar{d}_{i\cdot})}$.

115 (c) The p-value of $H_{0,\mathcal{M}}$ is given by $\hat{p}(m) = \frac{1}{B}\sum_{b=1}^{B}1_{\{T^*_{\max,b} > T_{\max}\}}$, where $1_{\{\cdot\}}$ is the indicator function. (d) If $\hat{p}(m) < \alpha$, where $\alpha$ is the level of the test, then $H_{0,\mathcal{M}}$ is rejected and $e_\mathcal{M} = \arg\max_i t_i$ is eliminated from $\mathcal{M}$. (e) Steps 2(a)-(e) are repeated until the first acceptance. The resulting set of models is denoted $\hat{\mathcal{M}}_{1-\alpha}$ and referred to as the $(1-\alpha)$ MCS.
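A compact sketch of the bootstrap MCS algorithm in steps 1-2 above (not the authors' code): it uses a simple moving-block bootstrap for the time indices, the loss matrix losses and the block length are hypothetical inputs, and for simplicity surviving models are assigned an MCS p-value of one.

```python
import numpy as np

def mcs(losses, alpha=0.10, B=1000, block=4, seed=0):
    """Model Confidence Set sketch. losses: (n, m) array of L_{i,t}."""
    rng = np.random.default_rng(seed)
    n, m = losses.shape
    Lbar = losses.mean(axis=0)
    zeta = np.empty((B, m))                       # zeta*_{b,i} = Lbar*_{b,i} - Lbar_i
    for b in range(B):
        starts = rng.integers(0, n, size=int(np.ceil(n / block)))
        idx = (starts[:, None] + np.arange(block)).ravel()[:n] % n
        zeta[b] = losses[idx].mean(axis=0) - Lbar
    alive, pvals, p_prev = list(range(m)), np.ones(m), 0.0
    while len(alive) > 1:
        d = Lbar[alive] - Lbar[alive].mean()                          # d_i. = Lbar_i - Lbar_.
        z = zeta[:, alive] - zeta[:, alive].mean(axis=1, keepdims=True)
        se = np.sqrt((z ** 2).mean(axis=0))                           # var(d_i.) as on the slide
        t = d / se
        t_max = t.max()
        t_max_boot = (z / se).max(axis=1)
        p = np.mean(t_max_boot > t_max)
        p_prev = max(p_prev, p)                                       # MCS p-value of eliminated model
        if p >= alpha:
            break
        worst = alive[int(np.argmax(t))]
        pvals[worst] = p_prev
        alive.remove(worst)
    return alive, pvals
```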

116 6. SIMULATION EXPERIMENTS. We consider two designs that are based on the $m$-dimensional vector $\mu = \frac{\lambda}{\sqrt{n}}\big(0, \tfrac{1}{m-1}, \dots, \tfrac{m-2}{m-1}, 1\big)'$ of relative performances (we ensure that $E(d_{ij,t}) = \mu_i - \mu_j$). $\mathcal{M}^*$ consists of a single element, unless $\lambda = 0$, where $\mathcal{M}^* = \mathcal{M}_0$. The covariance structure is primarily defined by $X_t \sim \text{iid } N_m(0, \Sigma)$, where $\Sigma_{ij} = 1$ for $i = j$ and $\Sigma_{ij} = \rho$ for $i \neq j$, for some $0 \le \rho \le 1$.

117 6.1. Design I. In this design we define the (vector of) loss variables to be $L_t = \mu + a_t X_t$, where $\ln(a_t)$ follows an AR(1) process, $\ln(a_t) = c + \varphi\ln(a_{t-1}) + \varphi\varepsilon_t$ with $\varepsilon_t \sim \text{iid } N(0, 1)$ and the constant $c$ chosen so that $\mathrm{var}(L_t) = 1$. $\varphi = 0$ corresponds to homoskedastic errors and $\varphi > 0$ to (GARCH-type) heteroskedastic errors. For our simulations we select $\lambda = 0, 5, 10, 20$, $\rho = 0.00, 0.50, 0.75, 0.95$, $\varphi = 0.0, 0.5, 0.8$, and $m = 10, 40, 100$, with 2,500 repetitions. We use the block bootstrap ($l = 2$) and $B = 1{,}000$ resamples. We use $n = 250$ as the sample size, because this is of the order of magnitude that is common in empirical studies of macroeconomic variables.

118 Simulation Design I ($n = 250$), Panel A: $\varphi = 0$. The table reports the frequency at which $\mathcal{M}^* \subseteq \hat{\mathcal{M}}_{90\%}$ (size) and the average number of elements in $\hat{\mathcal{M}}_{90\%}$ (power), for $m = 10, 40, 100$, $\rho = 0, 0.5, 0.75, 0.95$, and several values of $\lambda$. Size and power properties appear to be good.

119 The MCS becomes better at separating the inferior models from the superior model as $E(d_{ij,t})$ increases (as $\lambda$ increases). We also note that a strong correlation makes it easier to separate inferior models from the superior model. This is not surprising because $\mathrm{var}(d_{ij,t}) = \mathrm{var}(L_{i,t}) + \mathrm{var}(L_{j,t}) - 2\,\mathrm{cov}(L_{i,t}, L_{j,t}) = 2(1-\rho)$, which is decreasing in $\rho$. When $\lambda > 0$, the consistency result of Corollary 1 applies, which gives us $\hat{\mathcal{M}} = \mathcal{M}^*$ (when both statistics equal one).

120 Simulation Design I ($n = 250$, continued), Panel C: $\varphi = 0.8$. Same layout as Panel A. Heteroskedasticity adds power to the MCS procedure: the average number of models in $\hat{\mathcal{M}}_{90\%}$ tends to fall as $\varphi$ is increased.

121 6.2. Design II (MSE-type loss). This design has $L_t \sim \text{iid } N_{10}(\mu, \Sigma)$ with the covariance structure $\Sigma_{ij} = \rho^{|i-j|}$, for $\rho = 0.00, 0.50, 0.75$. For the mean vector we use $\mu = (0, \dots, 0, \tfrac{1}{10}, \dots, \tfrac{1}{25})'$, where $m = 10$, so the number of zeros in $\mu$ defines the number of elements in $\mathcal{M}^*$. We report results when $\mathcal{M}^*$ consists of a single, two, and five models.

122 Figure: one good model. Panels show the frequency at which the best model is in the MCS and the average size of the MCS, as functions of the sample size, for $\rho = 0, 0.5, 0.75$.

123 Figure: two good models. Same panels as above.

124 Figure: five good models. Same panels as above.

125 7. EMPIRICAL APPLICATION TO US INFLATION. We revisit the comparison of inflation forecasts by Stock & Watson (1999). The basic forecasting model is a Phillips curve model, $\pi_{t+h} - \pi_t = \phi + \beta(L)x_t + \gamma(L)(1-L)\pi_t + e_{t+h}$, where $\pi_t$ is the rate of inflation, (PUNEW) CPI-U all items or (GMDC) the personal consumption expenditure implicit price deflator, and $x_t$ represents the unemployment rate $u_t$, many other macro variables, or principal component variables. Simpler models such as a random walk model and AR(p) models are also evaluated.

126 Figure: 1-month inflation (%) for PUNEW and GMDC over the first and second subsamples. The sample period starts in 1960.

127 MCS p-values for Stock and Watson, JME (1999, Table 2). For PUNEW in two subsamples, the table lists each variable and transformation (no change (month), no change (year), univariate AR; gaps specifications: dtip DT, dtgmpyq DT, dtmsmtq DT, dtlpnag DT, ipxmca LV, hsbp LN, lhmu25 LV; first-difference specifications: ip DLN, gmpyq DLN, msmtq DLN, lpnag DLN, dipxmca DLV, dhsbp DLN, dlhmu25 DLV, dlhur DLV; Phillips curve: LHUR) with its RMSE and MCS p-values $p_6$, $p_9$, $p_{12}$. P-values marked with stars are in $\hat{\mathcal{M}}_{90\%}$.

128 MCS p-values for Stock and Watson, JME (1999, Table 4). For PUNEW in two subsamples, the table reports the RMSE and MCS p-values $p_6$, $p_9$, $p_{12}$ for the no-change (month and year) and univariate forecasts, and for multiple-factor, one-factor, and combination (mean, median, ridge regression) forecasts based on all indicators (Panel A), real activity indicators (Panel B), interest rates (Panel C), and money (Panel D), plus the Phillips curve (LHUR).

129 The MCS contains few models in the first subsample and many (almost all) models in the second subsample, because the (population) ranking of models has changed and the second subsample is less informative, due to the decline in inflation variability. Multifactor and one-factor models consistently do very well; e.g., only the multi-factor and one-factor specifications for all indicators and real activity indicators appear in the MCS for PUNEW in the 1970:M1-1983:M12 subsample. The random walk model does poorly in both subsamples and for both inflation measures.

130 8. SUMMARY. Introduced Model Confidence Sets (MCS) for forecasting models: more informative than model-selection methods; serve as a tool to trim the set of candidate models and to find combinations of models that may yield better forecasts; informative about the individual models (p-values). Simulations. Empirical application to inflation forecasts.

131 What's good about an MCS, $\hat{\mathcal{M}}_{1-\alpha}$? General: based on a criterion specified by the user, typically a loss function $L$; does not rely on correct specification, only on the stochastic properties of $\{L_{i,t}\}$. Honest: acknowledges the limitations of empirical analyses; depends on the amount of information in the data under investigation. Informative: the interpretation of an MCS is analogous to a confidence interval for a parameter (a model selection criterion is analogous to a point estimate). Natural: more than one model may qualify as being the best model.

132 Model Confidence Set: Regression Models


Robust Performance Hypothesis Testing with the Sharpe Ratio

Robust Performance Hypothesis Testing with the Sharpe Ratio Robust Performance Hypothesis Testing with the Sharpe Ratio Olivier Ledoit Michael Wolf Institute for Empirical Research in Economics University of Zurich Outline 1 The Problem 2 Solutions HAC Inference

More information

EVALUATING DIRECT MULTI-STEP FORECASTS

EVALUATING DIRECT MULTI-STEP FORECASTS EVALUATING DIRECT MULTI-STEP FORECASTS Todd Clark and Michael McCracken Revised: April 2005 (First Version December 2001) RWP 01-14 Research Division Federal Reserve Bank of Kansas City Todd E. Clark is

More information

Darmstadt Discussion Papers in Economics

Darmstadt Discussion Papers in Economics Darmstadt Discussion Papers in Economics The Effect of Linear Time Trends on Cointegration Testing in Single Equations Uwe Hassler Nr. 111 Arbeitspapiere des Instituts für Volkswirtschaftslehre Technische

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

Bootstrap tests of multiple inequality restrictions on variance ratios

Bootstrap tests of multiple inequality restrictions on variance ratios Economics Letters 91 (2006) 343 348 www.elsevier.com/locate/econbase Bootstrap tests of multiple inequality restrictions on variance ratios Jeff Fleming a, Chris Kirby b, *, Barbara Ostdiek a a Jones Graduate

More information

ECON3327: Financial Econometrics, Spring 2016

ECON3327: Financial Econometrics, Spring 2016 ECON3327: Financial Econometrics, Spring 2016 Wooldridge, Introductory Econometrics (5th ed, 2012) Chapter 11: OLS with time series data Stationary and weakly dependent time series The notion of a stationary

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Multivariate Time Series: Part 4

Multivariate Time Series: Part 4 Multivariate Time Series: Part 4 Cointegration Gerald P. Dwyer Clemson University March 2016 Outline 1 Multivariate Time Series: Part 4 Cointegration Engle-Granger Test for Cointegration Johansen Test

More information

9. AUTOCORRELATION. [1] Definition of Autocorrelation (AUTO) 1) Model: y t = x t β + ε t. We say that AUTO exists if cov(ε t,ε s ) 0, t s.

9. AUTOCORRELATION. [1] Definition of Autocorrelation (AUTO) 1) Model: y t = x t β + ε t. We say that AUTO exists if cov(ε t,ε s ) 0, t s. 9. AUTOCORRELATION [1] Definition of Autocorrelation (AUTO) 1) Model: y t = x t β + ε t. We say that AUTO exists if cov(ε t,ε s ) 0, t s. ) Assumptions: All of SIC except SIC.3 (the random sample assumption).

More information

E 4101/5101 Lecture 9: Non-stationarity

E 4101/5101 Lecture 9: Non-stationarity E 4101/5101 Lecture 9: Non-stationarity Ragnar Nymoen 30 March 2011 Introduction I Main references: Hamilton Ch 15,16 and 17. Davidson and MacKinnon Ch 14.3 and 14.4 Also read Ch 2.4 and Ch 2.5 in Davidson

More information

Forecasting Levels of log Variables in Vector Autoregressions

Forecasting Levels of log Variables in Vector Autoregressions September 24, 200 Forecasting Levels of log Variables in Vector Autoregressions Gunnar Bårdsen Department of Economics, Dragvoll, NTNU, N-749 Trondheim, NORWAY email: gunnar.bardsen@svt.ntnu.no Helmut

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? CIRJE-F-736 Are Forecast Updates Progressive? Chia-Lin Chang National Chung Hsing University Philip Hans Franses Erasmus University Rotterdam Michael McAleer Erasmus University Rotterdam and Tinbergen

More information

Comparing Predictive Accuracy, Twenty Years Later: On The Use and Abuse of Diebold-Mariano Tests

Comparing Predictive Accuracy, Twenty Years Later: On The Use and Abuse of Diebold-Mariano Tests Comparing Predictive Accuracy, Twenty Years Later: On The Use and Abuse of Diebold-Mariano Tests Francis X. Diebold April 28, 2014 1 / 24 Comparing Forecasts 2 / 24 Comparing Model-Free Forecasts Models

More information

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Denisa Banulescu 1 Christophe Hurlin 1 Jérémy Leymarie 1 Olivier Scaillet 2 1 University of Orleans 2 University of Geneva & Swiss

More information

A Test of Cointegration Rank Based Title Component Analysis.

A Test of Cointegration Rank Based Title Component Analysis. A Test of Cointegration Rank Based Title Component Analysis Author(s) Chigira, Hiroaki Citation Issue 2006-01 Date Type Technical Report Text Version publisher URL http://hdl.handle.net/10086/13683 Right

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Working Paper Series. Reality Checks and Comparisons of Nested Predictive Models. Todd E. Clark and Michael W. McCracken. Working Paper A

Working Paper Series. Reality Checks and Comparisons of Nested Predictive Models. Todd E. Clark and Michael W. McCracken. Working Paper A RESEARCH DIVISION Working Paper Series Reality Checks and Comparisons of Nested Predictive Models Todd E. Clark and Michael W. McCracken Working Paper 2010-032A March 2011 FEDERAL RESERVE BANK OF ST. LOUIS

More information

MFE Financial Econometrics 2018 Final Exam Model Solutions

MFE Financial Econometrics 2018 Final Exam Model Solutions MFE Financial Econometrics 2018 Final Exam Model Solutions Tuesday 12 th March, 2019 1. If (X, ε) N (0, I 2 ) what is the distribution of Y = µ + β X + ε? Y N ( µ, β 2 + 1 ) 2. What is the Cramer-Rao lower

More information

Time Series Models for Measuring Market Risk

Time Series Models for Measuring Market Risk Time Series Models for Measuring Market Risk José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department June 28, 2007 1/ 32 Outline 1 Introduction 2 Competitive and collaborative

More information

Department of Economics, Vanderbilt University While it is known that pseudo-out-of-sample methods are not optimal for

Department of Economics, Vanderbilt University While it is known that pseudo-out-of-sample methods are not optimal for Comment Atsushi Inoue Department of Economics, Vanderbilt University (atsushi.inoue@vanderbilt.edu) While it is known that pseudo-out-of-sample methods are not optimal for comparing models, they are nevertheless

More information

Heteroskedasticity in Time Series

Heteroskedasticity in Time Series Heteroskedasticity in Time Series Figure: Time Series of Daily NYSE Returns. 206 / 285 Key Fact 1: Stock Returns are Approximately Serially Uncorrelated Figure: Correlogram of Daily Stock Market Returns.

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH LECURE ON HAC COVARIANCE MARIX ESIMAION AND HE KVB APPROACH CHUNG-MING KUAN Institute of Economics Academia Sinica October 20, 2006 ckuan@econ.sinica.edu.tw www.sinica.edu.tw/ ckuan Outline C.-M. Kuan,

More information

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 516 Robust Performance Hypothesis Testing with the Variance Olivier Ledoit and Michael

More information

1 Phelix spot and futures returns: descriptive statistics

1 Phelix spot and futures returns: descriptive statistics MULTIVARIATE VOLATILITY MODELING OF ELECTRICITY FUTURES: ONLINE APPENDIX Luc Bauwens 1, Christian Hafner 2, and Diane Pierret 3 October 13, 2011 1 Phelix spot and futures returns: descriptive statistics

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics STAT-S-301 Introduction to Time Series Regression and Forecasting (2016/2017) Lecturer: Yves Dominicy Teaching Assistant: Elise Petit 1 Introduction to Time Series Regression

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot November 2, 2011 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

VAR-based Granger-causality Test in the Presence of Instabilities

VAR-based Granger-causality Test in the Presence of Instabilities VAR-based Granger-causality Test in the Presence of Instabilities Barbara Rossi ICREA Professor at University of Pompeu Fabra Barcelona Graduate School of Economics, and CREI Barcelona, Spain. barbara.rossi@upf.edu

More information

Multivariate Out-of-Sample Tests for Granger Causality

Multivariate Out-of-Sample Tests for Granger Causality Multivariate Out-of-Sample Tests for Granger Causality Sarah Gelper and Christophe Croux K.U.Leuven, Faculty of Economics and Applied Economics, Naamsestraat 69, 3000 Leuven, Belgium Abstract A time series

More information

TERMS OF TRADE: THE AGRICULTURE-INDUSTRY INTERACTION IN THE CARIBBEAN

TERMS OF TRADE: THE AGRICULTURE-INDUSTRY INTERACTION IN THE CARIBBEAN (Draft- February 2004) TERMS OF TRADE: THE AGRICULTURE-INDUSTRY INTERACTION IN THE CARIBBEAN Chandra Sitahal-Aleong Delaware State University, Dover, Delaware, USA John Aleong, University of Vermont, Burlington,

More information

11. Further Issues in Using OLS with TS Data

11. Further Issues in Using OLS with TS Data 11. Further Issues in Using OLS with TS Data With TS, including lags of the dependent variable often allow us to fit much better the variation in y Exact distribution theory is rarely available in TS applications,

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

TECHNICAL WORKING PAPER SERIES APPROXIMATELY NORMAL TEST FOR EQUAL PREDICTIVE ACCURACY IN NESTED MODELS. Todd E. Clark Kenneth D.

TECHNICAL WORKING PAPER SERIES APPROXIMATELY NORMAL TEST FOR EQUAL PREDICTIVE ACCURACY IN NESTED MODELS. Todd E. Clark Kenneth D. TECHNICAL WORKING PAPER SERIES APPROXIMATELY NORMAL TEST FOR EQUAL PREDICTIVE ACCURACY IN NESTED MODELS Todd E. Clark Kenneth D. West Technical Working Paper 326 http://www.nber.org/papers/t0326 NATIONAL

More information

Financial Econometrics

Financial Econometrics Financial Econometrics Nonlinear time series analysis Gerald P. Dwyer Trinity College, Dublin January 2016 Outline 1 Nonlinearity Does nonlinearity matter? Nonlinear models Tests for nonlinearity Forecasting

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Forecasting. Bernt Arne Ødegaard. 16 August 2018

Forecasting. Bernt Arne Ødegaard. 16 August 2018 Forecasting Bernt Arne Ødegaard 6 August 208 Contents Forecasting. Choice of forecasting model - theory................2 Choice of forecasting model - common practice......... 2.3 In sample testing of

More information

Dynamic Asset Allocation - Identifying Regime Shifts in Financial Time Series to Build Robust Portfolios

Dynamic Asset Allocation - Identifying Regime Shifts in Financial Time Series to Build Robust Portfolios Downloaded from orbit.dtu.dk on: Jan 22, 2019 Dynamic Asset Allocation - Identifying Regime Shifts in Financial Time Series to Build Robust Portfolios Nystrup, Peter Publication date: 2018 Document Version

More information

Tests of the Co-integration Rank in VAR Models in the Presence of a Possible Break in Trend at an Unknown Point

Tests of the Co-integration Rank in VAR Models in the Presence of a Possible Break in Trend at an Unknown Point Tests of the Co-integration Rank in VAR Models in the Presence of a Possible Break in Trend at an Unknown Point David Harris, Steve Leybourne, Robert Taylor Monash U., U. of Nottingam, U. of Essex Economics

More information

econstor Make Your Publications Visible.

econstor Make Your Publications Visible. econstor Make Your Publications Visible. A Service of Wirtschaft Centre zbwleibniz-informationszentrum Economics Hansen, Peter Reinhard Working Paper Asymptotic tests of composite hypotheses Working Paper,

More information

Economics 618B: Time Series Analysis Department of Economics State University of New York at Binghamton

Economics 618B: Time Series Analysis Department of Economics State University of New York at Binghamton Problem Set #1 1. Generate n =500random numbers from both the uniform 1 (U [0, 1], uniformbetween zero and one) and exponential λ exp ( λx) (set λ =2and let x U [0, 1]) b a distributions. Plot the histograms

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models Journal of Finance and Investment Analysis, vol.1, no.1, 2012, 55-67 ISSN: 2241-0988 (print version), 2241-0996 (online) International Scientific Press, 2012 A Non-Parametric Approach of Heteroskedasticity

More information

Econometrics. 9) Heteroscedasticity and autocorrelation

Econometrics. 9) Heteroscedasticity and autocorrelation 30C00200 Econometrics 9) Heteroscedasticity and autocorrelation Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Heteroscedasticity Possible causes Testing for

More information

Time Series 2. Robert Almgren. Sept. 21, 2009

Time Series 2. Robert Almgren. Sept. 21, 2009 Time Series 2 Robert Almgren Sept. 21, 2009 This week we will talk about linear time series models: AR, MA, ARMA, ARIMA, etc. First we will talk about theory and after we will talk about fitting the models

More information

Sample Exam Questions for Econometrics

Sample Exam Questions for Econometrics Sample Exam Questions for Econometrics 1 a) What is meant by marginalisation and conditioning in the process of model reduction within the dynamic modelling tradition? (30%) b) Having derived a model for

More information

Out-of-Sample Forecasting of Unemployment Rates with Pooled STVECM Forecasts

Out-of-Sample Forecasting of Unemployment Rates with Pooled STVECM Forecasts Out-of-Sample Forecasting of Unemployment Rates with Pooled STVECM Forecasts Costas Milas Department of Economics, Keele University Rimini Centre for Economic Analysis Philip Rothman Department of Economics,

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

The Size and Power of Four Tests for Detecting Autoregressive Conditional Heteroskedasticity in the Presence of Serial Correlation

The Size and Power of Four Tests for Detecting Autoregressive Conditional Heteroskedasticity in the Presence of Serial Correlation The Size and Power of Four s for Detecting Conditional Heteroskedasticity in the Presence of Serial Correlation A. Stan Hurn Department of Economics Unversity of Melbourne Australia and A. David McDonald

More information

Ultra High Dimensional Variable Selection with Endogenous Variables

Ultra High Dimensional Variable Selection with Endogenous Variables 1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High

More information

IEOR E4703: Monte-Carlo Simulation

IEOR E4703: Monte-Carlo Simulation IEOR E4703: Monte-Carlo Simulation Output Analysis for Monte-Carlo Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Output Analysis

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Generalized Autoregressive Score Models

Generalized Autoregressive Score Models Generalized Autoregressive Score Models by: Drew Creal, Siem Jan Koopman, André Lucas To capture the dynamic behavior of univariate and multivariate time series processes, we can allow parameters to be

More information

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Evaluating density forecasts: forecast combinations, model mixtures, calibration and sharpness

Evaluating density forecasts: forecast combinations, model mixtures, calibration and sharpness Second International Conference in Memory of Carlo Giannini Evaluating density forecasts: forecast combinations, model mixtures, calibration and sharpness Kenneth F. Wallis Emeritus Professor of Econometrics,

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

ON A BOOTSTRAP TEST FOR FORECAST EVALUATIONS

ON A BOOTSTRAP TEST FOR FORECAST EVALUATIONS ON A BOOTSTRAP TEST FOR FORECAST EVALUATIONS MARIÁN VÁVRA WORKING PAPER 5/25 National Bank of Slovakia www.nbs.sk Imricha Karvaša 83 25 Bratislva reserarch@nbs.sk June 25 ISSN 337-583 The views and results

More information

Introductory Econometrics. Review of statistics (Part II: Inference)

Introductory Econometrics. Review of statistics (Part II: Inference) Introductory Econometrics Review of statistics (Part II: Inference) Jun Ma School of Economics Renmin University of China October 1, 2018 1/16 Null and alternative hypotheses Usually, we have two competing

More information

The Empirical Behavior of Out-of-Sample Forecast Comparisons

The Empirical Behavior of Out-of-Sample Forecast Comparisons The Empirical Behavior of Out-of-Sample Forecast Comparisons Gray Calhoun Iowa State University April 30, 2010 Abstract This paper conducts an empirical comparison of several methods for comparing nested

More information

Selecting a Nonlinear Time Series Model using Weighted Tests of Equal Forecast Accuracy

Selecting a Nonlinear Time Series Model using Weighted Tests of Equal Forecast Accuracy Selecting a Nonlinear Time Series Model using Weighted Tests of Equal Forecast Accuracy Dick van Dijk Econometric Insitute Erasmus University Rotterdam Philip Hans Franses Econometric Insitute Erasmus

More information

Time Series Econometrics For the 21st Century

Time Series Econometrics For the 21st Century Time Series Econometrics For the 21st Century by Bruce E. Hansen Department of Economics University of Wisconsin January 2017 Bruce Hansen (University of Wisconsin) Time Series Econometrics January 2017

More information

LONG-MEMORY FORECASTING OF U.S. MONETARY INDICES

LONG-MEMORY FORECASTING OF U.S. MONETARY INDICES LONG-MEMORY FORECASTING OF U.S. MONETARY INDICES John Barkoulas Department of Finance & Quantitative Analysis Georgia Southern University Statesboro, GA 30460, USA Tel. (912) 871-1838, Fax (912) 871-1835

More information

Financial Econometrics

Financial Econometrics Financial Econometrics Estimation and Inference Gerald P. Dwyer Trinity College, Dublin January 2013 Who am I? Visiting Professor and BB&T Scholar at Clemson University Federal Reserve Bank of Atlanta

More information

Asymptotic Inference about Predictive Accuracy using High Frequency Data

Asymptotic Inference about Predictive Accuracy using High Frequency Data Asymptotic Inference about Predictive Accuracy using High Frequency Data Jia Li and Andrew Patton Department of Economics Duke University March 2014 Li and Patton (Duke) High Frequency Predictive Accuracy

More information

Tests of Equal Predictive Ability with Real-Time Data

Tests of Equal Predictive Ability with Real-Time Data Tests of Equal Predictive Ability with Real-Time Data Todd E. Clark Federal Reserve Bank of Kansas City Michael W. McCracken Board of Governors of the Federal Reserve System April 2007 (preliminary) Abstract

More information

A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb October 26, 2015 > Abstract In this paper, we utilize a new

More information

Predicting bond returns using the output gap in expansions and recessions

Predicting bond returns using the output gap in expansions and recessions Erasmus university Rotterdam Erasmus school of economics Bachelor Thesis Quantitative finance Predicting bond returns using the output gap in expansions and recessions Author: Martijn Eertman Studentnumber:

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

Testing for non-stationarity

Testing for non-stationarity 20 November, 2009 Overview The tests for investigating the non-stationary of a time series falls into four types: 1 Check the null that there is a unit root against stationarity. Within these, there are

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information