Introduction to Forecasting and Forecast Evaluation


Introduction to Forecasting and Forecast Evaluation

Lecture Notes to Accompany Talk

Norman Rasmus Swanson, Rutgers University

contact: nswanson@econ.rutgers.edu

Prepared for the Bank of Canada, May

Outline

Part I. Prediction Basics - Loss Functions, Optimal Prediction, and Model Selection
Part II. Parameter Estimation Error, Bootstrap Techniques, and Model Selection
Part III. What Should We Be Predicting? - Real-Time Data
Part IV. Methods of Prediction - Some Comments
Part V. Density Based Model Selection

1 Part I - Prediction Basics: Loss Functions, Optimal Prediction, and Model Selection

1.1 Loss Functions

From an economic policy perspective, one of the main uses of econometric and statistical methods is to provide forecasts of macroeconomic and financial variables. For instance: given the rate of inflation over the past twelve months, what will be the rate of inflation next month? What will it be two months from now? Such predictions have important consequences for the formulation of economic policy (e.g. setting the bank lending rate).

Suppose that the Federal Reserve Board forecasts a 1.5% annualized inflation rate for July 2008, while the Department of the Treasury provides a forecast of 1%. How can we decide which of the two is more reliable? One key question thus concerns how we can measure the relative accuracy of different forecasts. Different models yield different forecasts, so we want to choose the model producing the most accurate one.

Many econometric techniques deal with the in-sample evaluation of models, and only recently has attention focused on out-of-sample model evaluation. A key difference between the two approaches is that in-sample methods tend to select models that are too large.

* Overfitting is a problem.
* In-sample inference is another.

Some of the issues arising are:

(i) Choice of the loss function. Suppose $X_{t+1}$ is the rate of inflation at time $t+1$, and $X^{(i)}_{t+1|t}$ is the rate of inflation forecasted at time $t$ using model $i$. The forecast error implied by model $i$ is
$$u_{i,t+1} = X_{t+1} - X^{(i)}_{t+1|t}.$$

We want to choose model $j$ over model $i$ if, on average, model $j$ produces smaller errors. Smaller in which sense?

* Quadratic loss function: choose model $j$ if, on average, $u_{j,t+1}^2 < u_{i,t+1}^2$.
* Mean absolute loss function: choose model $j$ if, on average, $|u_{j,t+1}| < |u_{i,t+1}|$.
* Other sorts of loss functions? Direction of change (contingency tables), profitability, etc.
* Sometimes we are more concerned about positive errors than negative errors (or vice versa), so we may want to use an asymmetric loss function, such as linex (linear exponential) loss.

(ii) Is it possible that for some loss function model $j$ beats model $i$, while for other loss functions model $i$ beats model $j$? In general: yes. If we choose the right model, in the sense that we correctly specify the joint distribution of all of the relevant variables, then no other model can win. On the other hand, if the models we compare are (generally) misspecified, then the ranking of models is loss function specific (i.e. we would like to assume that all models are approximations to the truth - any other assumption seems overly strong).

* This has implications for data transformation, particularly when using one model to predict more than one transformation of a variable.

(iii) What is the effect of parameter estimation error (i.e. of estimating the models used in prediction)? Suppose we forecast inflation at $t+1$ simply using inflation at time $t$, and we use a simple linear model, say:
$$X_t = \beta_0 + \beta_1 X_{t-1} + u_t.$$
The true forecasting error is
$$u_{t+1} = X_{t+1} - \beta_0 - \beta_1 X_t.$$

However, we have to replace the unknown parameters with estimates, say $\hat\beta_0$ and $\hat\beta_1$. Thus the estimated forecast error becomes
$$\hat u_{t+1} = X_{t+1} - \hat\beta_0 - \hat\beta_1 X_t = u_{t+1} - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)X_t.$$

(iv) Choice of forecast horizon. Given information up to time $t$, do we want to forecast inflation at $t+1$, $t+2$, ..., $t+k$? Again, unless we have the right model, model $i$ can beat model $j$ at some forecast horizons, while model $j$ beats model $i$ at others.

1.2 Optimal Prediction

(i) Quadratic loss functions

Consider a time series $y_t$, $t = 1, 2, \ldots, T$. Suppose we want to find the optimal predictor of $y_{t+h}$, $h$ steps ahead, using information available at time $t$. Let $\mathcal{F}_t = \sigma(y_1, \ldots, y_t, X_1, \ldots, X_t)$, where $X_t$ is a (possibly vector valued) series that may help to predict $y_t$. The optimal $h$-step ahead predictor for $y_{t+h}$, given $\mathcal{F}_t$, is the function $\hat y_{t+h|t}$ such that
$$E\big((y_{t+h} - \hat y_{t+h|t})^2\big) < E\big((y_{t+h} - \tilde y_{t+h|t})^2\big)$$
for any $\tilde y_{t+h|t} \neq \hat y_{t+h|t}$. We know that $\hat y_{t+h|t} = E(y_{t+h}|\mathcal{F}_t)$ (i.e. the best predictor is the conditional expectation of $y_{t+h}$ given $\mathcal{F}_t$). In fact, suppose that $\tilde y_{t+h|t}$ is an $\mathcal{F}_t$-measurable function (e.g. any continuous function of $y_1, \ldots, y_t, X_1, \ldots, X_t$ is $\mathcal{F}_t$-measurable). Then
$$E\big((y_{t+h} - \tilde y_{t+h|t})^2\big) = E\Big(\big((y_{t+h} - E(y_{t+h}|\mathcal{F}_t)) - (\tilde y_{t+h|t} - E(y_{t+h}|\mathcal{F}_t))\big)^2\Big)$$
$$= E\big((y_{t+h} - E(y_{t+h}|\mathcal{F}_t))^2\big) + E\big((\tilde y_{t+h|t} - E(y_{t+h}|\mathcal{F}_t))^2\big)$$
$$> E\big((y_{t+h} - E(y_{t+h}|\mathcal{F}_t))^2\big),$$

as $E\big((y_{t+h} - E(y_{t+h}|\mathcal{F}_t))(\tilde y_{t+h|t} - E(y_{t+h}|\mathcal{F}_t))\big) = 0$. Thus, if we want to minimize the squared error, the conditional expectation is the best predictor.

Prediction with linear models

Suppose that $y_{t+1} = \alpha y_t + \epsilon_{t+1}$, where $\epsilon_t$ is a white noise process with zero mean and variance $\sigma^2_\epsilon$ (i.e. consider an AR(1) process). Then the best one-step predictor is $E(y_{t+1}|y_t) = \alpha y_t$, and the best $h$-step predictor is $E(y_{t+h}|y_t) = \alpha^h y_t$. Correspondingly, the one-step ahead prediction error is $u_{t+1} = y_{t+1} - \alpha y_t = \epsilon_{t+1}$, and
$$u_{t+h} = y_{t+h} - \alpha^h y_t = \epsilon_{t+h} + \alpha\epsilon_{t+h-1} + \ldots + \alpha^{h-2}\epsilon_{t+2} + \alpha^{h-1}\epsilon_{t+1}.$$
AR(p) processes can be treated in an analogous way.

Prediction with nonlinear models

Suppose
$$y_{t+1} = a\,g(y_t) + \epsilon_{t+1},$$
where, say,
$$g(y_t) = y_t + 1/(1 + \exp(-y_t))$$
or
$$g(y_t) = y_t^2.$$
Now,
$$E(y_{t+1}|y_t) = a\,g(y_t).$$
Note further that:
$$y_{t+2} = a\,g(y_{t+1}) + \epsilon_{t+2} = a\,g(a\,g(y_t) + \epsilon_{t+1}) + \epsilon_{t+2}.$$
Because of the $\epsilon_{t+1}$ term entering into the nonlinear function $g$, it is not immediate how to get the two-step ahead prediction error.

In this case we can approximate $E(y_{t+2}|y_t)$ with
$$a\,g(a\,g(y_t)) = a\,g(E(y_{t+1}|y_t)).$$
Broadly speaking, in order to get the $h$-step ahead forecast, we begin by taking the one-step ahead forecast (for which we know the closed form expression); then we predict one period ahead again, replacing $y_{t+1}$ (which is not observable) with $E(y_{t+1}|y_t)$, that is, with its predicted value given the information at time $t$. We then proceed through the subsequent steps in the same manner.

So far we have considered cases in which $y_t$ depends only on its own past. Consider now the following model:
$$y_{t+1} = \beta_0 + \beta_1 X_t + \epsilon_{t+1},$$
so that $E(y_{t+1}|X_t) = \beta_0 + \beta_1 X_t$. In order to compute $h$-step ahead forecasts, for $h > 1$, we need to know the data generating process of $X_t$. In this case we approximate $X_{t+1}$ with $E(X_{t+1}|X_t)$. That is, use:
$$E(y_{t+2}|X_t) = \beta_0 + \beta_1 E(X_{t+1}|X_t).$$

(ii) Asymmetric loss functions

We have seen that in the case of quadratic loss the best predictor is the conditional mean. In this case the problem of selecting the optimal forecast is equivalent to the problem of correctly specifying the conditional mean. However, there are several instances in which we are more concerned about positive errors ($y_{t+h} - \hat y_{t+h|t} > 0$) than about negative errors ($y_{t+h} - \hat y_{t+h|t} < 0$). Needless to say, arriving at the airport 5 minutes too late is more costly than arriving 5 minutes too early. In this case, then, we want to penalize positive errors more heavily. Two well known asymmetric loss functions are linex (linear exponential) loss and lin-lin (linear-linear) loss. If we use linex loss, then we want to find the predictor $\hat y_{t+h|t}$ such that
$$E\big(\exp(a(y_{t+h} - \hat y_{t+h|t})) - a(y_{t+h} - \hat y_{t+h|t}) - 1\big) < E\big(\exp(a(y_{t+h} - \tilde y_{t+h|t})) - a(y_{t+h} - \tilde y_{t+h|t}) - 1\big), \quad a \neq 0,$$

for any $\tilde y_{t+h|t} \neq \hat y_{t+h|t}$. Note that for $a > 0$, the loss is approximately linear to the left of the origin and exponential to the right of the origin, and vice versa for $a < 0$. Thus, for $a > 0$, positive errors are considered more costly than negative errors. Christoffersen and Diebold (1997) show that in this case the best predictor is
$$\hat y_{t+h|t} = E(y_{t+h}|\mathcal{F}_t) + \frac{a}{2}\mathrm{Var}(y_{t+h}|\mathcal{F}_t),$$
where $\mathrm{Var}(y_{t+h}|\mathcal{F}_t)$ is the variance of $y_{t+h}$ conditional on the information available at time $t$. This formula is valid when $y_{t+h}|\mathcal{F}_t \sim N(E(y_{t+h}|\mathcal{F}_t), \mathrm{Var}(y_{t+h}|\mathcal{F}_t))$. Note that for $a > 0$ (more weight on positive errors) the optimal predictor is larger than the optimal MSE predictor. In fact, as we are more concerned about positive errors, we purposely prefer an overestimate. Also, note that while $E(y_{t+h}|\mathcal{F}_t)$ is an unbiased predictor of $y_{t+h}$, given the information available at time $t$, for $a > 0$, $E(y_{t+h}|\mathcal{F}_t) + \frac{a}{2}\mathrm{Var}(y_{t+h}|\mathcal{F}_t)$ is an upwardly biased predictor. In this case, knowledge of the optimal predictor requires knowledge of the joint specification of the conditional mean and variance.

Another asymmetric loss is lin-lin loss. If we use a lin-lin loss, then we want to find the predictor $\hat y_{t+h|t}$ such that
$$E\big(a|y_{t+h} - \hat y_{t+h|t}|\,1\{y_{t+h} > \hat y_{t+h|t}\} + b|y_{t+h} - \hat y_{t+h|t}|\,1\{y_{t+h} \leq \hat y_{t+h|t}\}\big)$$
$$< E\big(a|y_{t+h} - \tilde y_{t+h|t}|\,1\{y_{t+h} > \tilde y_{t+h|t}\} + b|y_{t+h} - \tilde y_{t+h|t}|\,1\{y_{t+h} \leq \tilde y_{t+h|t}\}\big)$$
for any $\tilde y_{t+h|t} \neq \hat y_{t+h|t}$, with $a > 0$, $b > 0$. This loss function increases linearly in the error, but for $a > b$ it penalizes positive errors more heavily. If $y_{t+h}|\mathcal{F}_t$ is $N(E(y_{t+h}|\mathcal{F}_t), \mathrm{Var}(y_{t+h}|\mathcal{F}_t))$, then Christoffersen and Diebold show that the optimal predictor under lin-lin loss is given by
$$\hat y_{t+h|t} = E(y_{t+h}|\mathcal{F}_t) + \big(\mathrm{Var}(y_{t+h}|\mathcal{F}_t)\big)^{1/2}\,\Phi^{-1}(a/(a+b)),$$
where $\Phi$ denotes the CDF of a standard normal. When $a > b$, $\Phi^{-1}(a/(a+b)) > 0$, and the optimal predictor is upwardly biased.
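To make the lin-lin result concrete, here is a minimal numerical sketch (not from the notes; all parameter values are hypothetical) verifying that, under conditional normality, the closed-form predictor $\mu + \sigma\Phi^{-1}(a/(a+b))$ minimizes expected lin-lin loss:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical conditional distribution: y_{t+h} | F_t ~ N(mu, sigma^2)
mu, sigma = 1.0, 2.0
a, b = 1.0, 2.0   # lin-lin weights: a on positive errors, b on negative errors

rng = np.random.default_rng(0)
y = rng.normal(mu, sigma, size=200_000)   # draws from the conditional law

def linlin_loss(pred):
    """Expected lin-lin loss of the predictor `pred`, by Monte Carlo."""
    u = y - pred
    return np.mean(a * np.abs(u) * (u > 0) + b * np.abs(u) * (u <= 0))

# Closed form: the optimal predictor is the a/(a+b) conditional quantile
closed_form = mu + sigma * norm.ppf(a / (a + b))

# Numerical minimization over a grid of candidate predictors
grid = np.linspace(mu - 2 * sigma, mu + 2 * sigma, 161)
numerical = grid[np.argmin([linlin_loss(p) for p in grid])]

print(closed_form, numerical)   # both near mu - 0.43*sigma when a=1, b=2
```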

Example - GARCH

Consider the following GARCH(1,1) model:
$$y_t = \sigma_t\epsilon_t, \quad \epsilon_t \sim \text{iid } N(0,1),$$
$$\sigma_t^2 = \omega_0 + \omega_1\sigma_{t-1}^2 + \omega_2 y_{t-1}^2,$$
with $\omega_1 + \omega_2 < 1$, $\omega_0 > 0$, $\omega_1 > 0$, $\omega_2 > 0$. Now, note that
$$\sigma_t^2 = \sigma_0^2\,\omega_1^t + \omega_0\sum_{j=0}^{t-1}\omega_1^j + \omega_2\sum_{j=0}^{t-1}\omega_1^j y_{t-1-j}^2,$$
and so $\sigma_t^2$ is a measurable function of the past squared returns. Thus, the relevant information set is $\mathcal{F}_{t-1} = \sigma(y_1, \ldots, y_{t-1})$. Now, while
$$E(y_t|\mathcal{F}_{t-1}) = E(\sigma_t\epsilon_t|\mathcal{F}_{t-1}) = \sigma_t E(\epsilon_t|\mathcal{F}_{t-1}) = 0,$$
$$E(y_t^2|\mathcal{F}_{t-1}) = \mathrm{Var}(y_t|\mathcal{F}_{t-1}) = E(\sigma_t^2\epsilon_t^2|\mathcal{F}_{t-1}) = \sigma_t^2 E(\epsilon_t^2|\mathcal{F}_{t-1}) = \sigma_t^2.$$
Hence, if the loss function is quadratic, the optimal predictor is $E(y_t|\mathcal{F}_{t-1}) = 0$. If instead the loss function is linex with $a = 1$, the best predictor is $E(y_t|\mathcal{F}_{t-1}) + \frac{1}{2}\mathrm{Var}(y_t|\mathcal{F}_{t-1}) = 0.5\sigma_t^2$. Finally, if the loss is lin-lin with parameters $a = 1$ and $b = 2$, the optimal predictor is $E(y_t|\mathcal{F}_{t-1}) + \sigma_t\Phi^{-1}(1/3) \approx -0.43\sigma_t$.
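A small simulation sketch of this example (the parameter values are illustrative assumptions, not from the notes): generate a GARCH(1,1) path, recover the one-step-ahead conditional variance, and compute the optimal predictors under the three loss functions just discussed:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical GARCH(1,1) parameters with omega_1 + omega_2 < 1
w0, w1, w2 = 0.1, 0.8, 0.1
T = 500

y, sig2 = np.zeros(T), np.zeros(T)
sig2[0] = w0 / (1 - w1 - w2)          # start from the unconditional variance
y[0] = np.sqrt(sig2[0]) * rng.standard_normal()
for t in range(1, T):
    sig2[t] = w0 + w1 * sig2[t - 1] + w2 * y[t - 1] ** 2
    y[t] = np.sqrt(sig2[t]) * rng.standard_normal()

# Conditional variance of y_{T+1} given F_T (one step ahead)
sig2_next = w0 + w1 * sig2[-1] + w2 * y[-1] ** 2
sig_next = np.sqrt(sig2_next)

pred_quadratic = 0.0                            # conditional mean
pred_linex = 0.5 * sig2_next                    # a = 1: mu + (a/2) Var
pred_linlin = sig_next * norm.ppf(1.0 / 3.0)    # a = 1, b = 2: mu + sigma * Phi^{-1}(1/3)
print(pred_quadratic, pred_linex, pred_linlin)
```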

1.3 Model Selection

Comparing possibly misspecified forecasting models

So far we have considered the issue of optimal prediction for given loss functions. In practice, the true data generating process (DGP) is unknown, and so we form optimal predictions for given model(s), which may be (dynamically) misspecified. For example, suppose that we believe that $y_t$ follows an AR(1) process, so that the optimal $h$-step ahead predictor is $\alpha^h y_t$. However, if the DGP is instead a SETAR (self-exciting threshold autoregressive) process, say
$$y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-1}1\{y_{t-1} > \tau\} + \epsilon_t,$$
then the optimal forecast under the AR(1) assumption is clearly not optimal at all.

Furthermore, in practice we need to define the relevant information set. Again, suppose that $y_t$ is an AR(2) process, but we are just considering an AR(1) model. Then $\alpha^h y_t$ is indeed the optimal predictor (under quadratic loss) for the information set $\mathcal{F}_t = \sigma(y_t)$; but we are neglecting the information contained in $y_{t-1}$. In this case $E(y_{t+h}|y_t)$ is correctly specified, but
$$E(y_{t+h}|y_t) \neq E(y_{t+h}|y_t, y_{t-1}),$$
so there is dynamic misspecification.

Finally, the $h$-step ahead prediction error is heteroskedastic and autocorrelated, and, in the case of linex loss for example, failing to take this into consideration would lead to a non-optimal forecast.

Thus, in practice we want to be able to compare the relative predictive ability of two or more, possibly misspecified, models. Note that the ranking of the models, in the misspecified case, is loss function specific. On the other hand, if we correctly specify all conditional aspects, then the right model will beat all competitors, regardless of the loss function choice.

Diebold and Mariano (1995) propose a test of the null hypothesis of equal predictive ability against the alternative of unequal predictive ability. For the time being, we neglect the issue of parameter estimation error. Let $u_{0,t+h}$ and $u_{1,t+h}$ be the $h$-step ahead prediction errors made when predicting $y_{t+h}$ using information available up to time $t$.

For example, for $h = 1$,
$$u_{0,t+1} = y_{t+1} - \beta_{01} - \beta_{02}y_t - \beta_{03}x_t,$$
and
$$u_{1,t+1} = y_{t+1} - \beta_{11} - \beta_{12}y_t - \beta_{13}z_t.$$
It is important that the two models we are comparing be nonnested (i.e. neither is a special case of the other). Under the assumption that $u_{0,t}$ and $u_{1,t}$ are strictly stationary, the hypotheses for this test of equal predictive accuracy are:
$$H_0: E(f(u_{0,t}) - f(u_{1,t})) = 0$$
and
$$H_A: E(f(u_{0,t}) - f(u_{1,t})) \neq 0,$$
where $f$ is some continuous, positive valued loss function. The relevant statistic is
$$DM_T = \frac{1}{\hat\sigma_T}\frac{1}{T^{1/2}}\sum_{t=1}^{T-1}\big(f(u_{0,t+1}) - f(u_{1,t+1})\big),$$
where $\hat\sigma_T^2$ is a consistent estimator of
$$\sigma_0^2 = \lim_{T\to\infty}\mathrm{Var}\Big(T^{-1/2}\sum_{t=1}^{T-1}\big(f(u_{0,t+1}) - f(u_{1,t+1})\big)\Big).$$

Note why we require nonnestedness. Suppose that model 1 is nested in model 0, e.g.
$$u_{0,t+1} = y_{t+1} - \beta_{01} - \beta_{02}y_t - \beta_{03}x_t$$
and
$$u_{1,t+1} = y_{t+1} - \beta_{11} - \beta_{12}y_t.$$
If model 1 is indeed correctly (dynamically) specified for the conditional mean, then the null is equivalent to $\beta_{01} = \beta_{11}$, $\beta_{02} = \beta_{12}$, and $\beta_{03} = 0$, and so under the null $u_{0,t} = u_{1,t}$ for all $t$. Moreover, in practice we do not observe $u_{0,t}$ and $u_{1,t}$; we only observe $\hat u_{0,t}$ and $\hat u_{1,t}$ (which depend on estimated parameters). But we still have that $\hat\sigma_T$ and
$$\frac{1}{T^{1/2}}\sum_{t=1}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big)$$
go to zero in probability, and the statistic no longer has a well defined limiting distribution.

As we allow for (dynamic) misspecification under both hypotheses, in general $u_{0,t}$ and $u_{1,t}$ are not martingale difference sequences (i.e. $E(u_{0,t}|\mathcal{F}_{t-1}) \neq 0$), and they are in general autocorrelated. Thus, we need to use a heteroskedasticity and autocorrelation robust covariance (HAC) estimator for the long run variance. We can use a Newey-West (1987) type estimator, for example. Namely, define
$$\hat\sigma_T^2 = \frac{1}{T}\sum_{t=1}^{T-1}(d_{t+1} - \bar d)^2 + \frac{2}{T}\sum_{\tau=1}^{l_T}w_\tau\sum_{t=\tau+1}^{T-1}(d_{t+1} - \bar d)(d_{t+1-\tau} - \bar d),$$
where $d_{t+1} = f(u_{0,t+1}) - f(u_{1,t+1})$, $\bar d = T^{-1}\sum_t d_{t+1}$, and $w_\tau = 1 - \tau/(l_T+1)$.

The following is needed in the sequel.

Assumption A: (i) $(u_{0,t}, u_{1,t})$ is a strictly stationary and strong mixing process with size $2r/(r-1)$, $r > 1$; (ii) $E(f(u_{i,t})^4) < \infty$, $i = 0, 1$.

ASIDE: Broadly speaking, $u_{0,t}$ is a strong mixing process if it is asymptotically independent, i.e. if $u_{0,t}$ is asymptotically independent of its infinite past. More formally, define $\mathcal{F}_{-\infty}^n = \sigma(u_{0,-\infty}, \ldots, u_{0,n})$ to be the information set generated by the history of the series up to time $n$, and analogously let $\mathcal{F}_{n+m}^{\infty} = \sigma(u_{0,n+m}, u_{0,n+m+1}, \ldots, u_{0,\infty})$ be the information set generated by the history of the series from time $n+m$ on. If $u_{0,t}$ is a strong mixing process, then for any $u_{0,t}$ with $t \geq n+m$, $E\big((u_{0,t} - E(u_{0,t}))|\mathcal{F}_{-\infty}^n\big)$ goes to zero as $m \to \infty$. The size has to do with the rate at which this quantity goes to zero as $m \to \infty$.

Proposition 1: Let Assumption A hold. Then, as $T \to \infty$, $l_T \to \infty$, and $l_T/T^{1/4} \to 0$, under $H_0$,
$$DM_T \stackrel{d}{\to} N(0,1),$$
and under $H_A$, for any $\varepsilon > 0$,
$$\Pr\big(|T^{-1/2}DM_T| > \varepsilon\big) \to 1.$$

Thus, we compare $DM_T$ with the critical values of a standard normal random variable. Suppose that we do not reject $H_0$ if $-1.96 \leq DM_T \leq 1.96$, and otherwise reject $H_0$. This gives a test with asymptotic size equal to 0.05 and unit asymptotic power. Note that the same result holds for generic forecast horizons (i.e. for $h > 1$).

Proof - Sketch: Under both hypotheses,
$$\hat\sigma_T^2 \stackrel{pr}{\to} \sigma_0^2 = \lim_{T\to\infty}\mathrm{Var}\Big(T^{-1/2}\sum_{t=1}^{T-1}d_{t+1}\Big).$$
Also, by the central limit theorem for mixing processes,
$$T^{-1/2}\sum_{t=1}^{T-1}\big(d_{t+1} - E(d_t)\big) \stackrel{d}{\to} N(0, \sigma_0^2).$$
Thus, when $E(d_t) = 0$ (i.e. when the null is true),
$$T^{-1/2}\sum_{t=1}^{T-1}d_{t+1} \stackrel{d}{\to} N(0, \sigma_0^2),$$
while when $E(d_t) \neq 0$ (i.e. under the alternative), $T^{-1/2}\sum_{t=1}^{T-1}d_{t+1}$ diverges at rate $T^{1/2}$.

Note that many applied practitioners do not even implement the simple DM test, instead relying on point estimates of mean square errors and related statistics when comparing alternative prediction models.
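As a concrete illustration, here is a minimal sketch (not from the notes) of the $DM_T$ statistic with a Newey-West long-run variance; quadratic loss and the truncation rule are illustrative choices:

```python
import numpy as np

def dm_test(u0, u1, loss=lambda u: u ** 2, lag=None):
    """Diebold-Mariano statistic for equal predictive accuracy of two
    nonnested models, given their forecast error series u0 and u1."""
    d = loss(np.asarray(u0)) - loss(np.asarray(u1))   # loss differential d_t
    T = len(d)
    lag = int(T ** 0.25) if lag is None else lag      # common practical truncation
    dc = d - d.mean()
    # Newey-West HAC estimator of the long run variance of d_t
    var = dc @ dc / T
    for tau in range(1, lag + 1):
        w = 1.0 - tau / (lag + 1.0)                   # Bartlett weights
        var += 2.0 * w * (dc[tau:] @ dc[:-tau]) / T
    return np.sqrt(T) * d.mean() / np.sqrt(var)       # compare with +/-1.96 at 5%

# Illustrative usage with simulated (hypothetical) forecast errors
rng = np.random.default_rng(2)
print(dm_test(rng.standard_normal(200), 1.1 * rng.standard_normal(200)))
```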

2 Part II - Parameter Estimation Error, Bootstrap Techniques, and Model Selection

2.1 Parameter Estimation Error

In practice, we do not observe the true forecasting error. For simplicity, consider
$$u_{0,t+1} = y_{t+1} - \beta_{01} - \beta_{02}y_t - \beta_{03}x_t.$$
However, we do not know the vector $\beta$. Thus, we need to replace the parameters with their estimators and take into account the error due to the fact that the parameters are estimated. There are three main sampling schemes: (i) the fixed scheme, (ii) the recursive scheme, and (iii) the rolling scheme.

When interested in out of sample forecasting (and when we need to estimate parameters), we typically split the sample $T$ into two subsamples: a regression period, with $R$ observations, and a prediction period, with $P$ observations, where $T = R + P$.

Fixed estimation scheme: Use the first $R$ observations to estimate the parameters, call them $\hat\beta_R$, and construct a sequence of $P$ prediction errors, defined as
$$\hat u_{0,t+1} = y_{t+1} - \hat\beta_{01,R} - \hat\beta_{02,R}y_t - \hat\beta_{03,R}x_t,$$
for $t = R, \ldots, R+P-1$.

Recursive estimation scheme: Use the first $R$ observations to compute $\hat\beta_R$, and construct the first prediction error:
$$\hat u_{0,R+1} = y_{R+1} - \hat\beta_{01,R} - \hat\beta_{02,R}y_R - \hat\beta_{03,R}x_R.$$
Then use all observations up to time $R+1$ to construct $\hat\beta_{R+1}$, and get the second prediction error:
$$\hat u_{0,R+2} = y_{R+2} - \hat\beta_{01,R+1} - \hat\beta_{02,R+1}y_{R+1} - \hat\beta_{03,R+1}x_{R+1}.$$
Proceed in the same manner until you have a sequence of $P$ prediction errors, defined as
$$\hat u_{0,t+1} = y_{t+1} - \hat\beta_{01,t} - \hat\beta_{02,t}y_t - \hat\beta_{03,t}x_t,$$
for $t = R, \ldots, R+P-1$,

where $\hat\beta_t$ is the estimator computed using observations up to time $t$.

Rolling estimation scheme: Use the first $R$ observations to compute $\hat\beta_R$, and construct the first prediction error:
$$\hat u_{0,R+1} = y_{R+1} - \hat\beta_{01,R} - \hat\beta_{02,R}y_R - \hat\beta_{03,R}x_R.$$
Then, observations from $t = 2$ up to $t = R+1$ are used to construct $\hat\beta_{2,R+1}$, and a second prediction error is constructed:
$$\hat u_{0,R+2} = y_{R+2} - \hat\beta_{01,2,R+1} - \hat\beta_{02,2,R+1}y_{R+1} - \hat\beta_{03,2,R+1}x_{R+1}.$$
Thereafter, use observations from $t = 3$ to $t = R+2$ to obtain another prediction error. Proceed in the same manner, estimating the parameters using the most recent $R$ observations, until you have a sequence of $P$ prediction errors:
$$\hat u_{0,t+1} = y_{t+1} - \hat\beta_{01,t-R+1,t} - \hat\beta_{02,t-R+1,t}y_t - \hat\beta_{03,t-R+1,t}x_t,$$
for $t = R, \ldots, R+P-1$, where $\hat\beta_{t-R+1,t}$ is the estimator computed using observations from time $t-R+1$ up to time $t$, that is, using the most recent $R$ observations (see West and McCracken (1998) for an overview of the properties of the various sampling schemes).

The most commonly used approach is the recursive scheme. Intuitively it makes sense to use the information contained in new observations as soon as it becomes available. However, one must also be aware of structural breaks due to changing data definitions, changing model specifications, changing tastes and preferences, etc.

(i) Effect of parameter estimation error when performing tests of equal predictive accuracy.

Let
$$u_{0,t+1} = y_{t+1} - w_{0,t}'\beta_0,$$
where $w_{0,t} = (1, y_t, x_t)'$ and $\beta_0 = (\beta_{01}, \beta_{02}, \beta_{03})'$, and
$$u_{1,t+1} = y_{t+1} - w_{1,t}'\beta_1,$$
where $w_{1,t} = (1, y_t, z_t)'$ and $\beta_1 = (\beta_{11}, \beta_{12}, \beta_{13})'$. Further, define
$$\hat u_{0,t+1} = y_{t+1} - w_{0,t}'\hat\beta_{0,t}, \qquad \hat u_{1,t+1} = y_{t+1} - w_{1,t}'\hat\beta_{1,t},$$
where the parameters have been estimated recursively.
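The three schemes differ only in which observations enter each regression. A minimal sketch (hypothetical linear model and simulated data) that generates the $P$ prediction errors under each scheme:

```python
import numpy as np

def forecast_errors(y, x, R, scheme="recursive"):
    """One-step-ahead prediction errors for the model
    y_{t+1} = b1 + b2*y_t + b3*x_t + u_{t+1}, with the parameters
    re-estimated by OLS under a fixed, recursive, or rolling scheme."""
    T = len(y)
    errors = []
    for t in range(R - 1, T - 1):                 # predict y[t+1] with data through t
        lo = 0 if scheme in ("fixed", "recursive") else t - R + 1
        hi = R - 1 if scheme == "fixed" else t
        Z = np.column_stack([np.ones(hi - lo), y[lo:hi], x[lo:hi]])
        b, *_ = np.linalg.lstsq(Z, y[lo + 1:hi + 1], rcond=None)
        errors.append(y[t + 1] - b @ np.array([1.0, y[t], x[t]]))
    return np.array(errors)

# Illustrative data from a hypothetical DGP
rng = np.random.default_rng(3)
T = 300
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()

for s in ("fixed", "recursive", "rolling"):
    print(s, forecast_errors(y, x, R=150, scheme=s).std())
```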

We observe $\hat u_{0,t+1}$ and $\hat u_{1,t+1}$, and so construct the Diebold-Mariano statistic over the out of sample period as
$$\widehat{DM}_P = \frac{1}{\hat\sigma_P}\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big), \qquad (1)$$
with
$$\hat\sigma_P^2 = \frac{1}{P}\sum_{t=R}^{T-1}(\hat d_{t+1} - \bar{\hat d})^2 + \frac{2}{P}\sum_{\tau=1}^{l_P}w_\tau\sum_{t=R+\tau+1}^{T-1}(\hat d_{t+1} - \bar{\hat d})(\hat d_{t+1-\tau} - \bar{\hat d}),$$
where $\hat d_{t+1} = f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})$ and $\bar{\hat d} = P^{-1}\sum_{t=R}^{T-1}\hat d_{t+1}$.

Assume that $f$ is a differentiable function (this rules out lin-lin loss, for example). Via a mean value expansion around $\beta_0$ and $\beta_1$, we have:
$$\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big) = \frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(u_{0,t+1}) - f(u_{1,t+1})\big) \qquad (2)$$
$$+ \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(\tilde u_{0,t+1})'\,P^{1/2}(\hat\beta_{0,t} - \beta_0) - \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_1}f(\tilde u_{1,t+1})'\,P^{1/2}(\hat\beta_{1,t} - \beta_1), \qquad (3)$$
where $\tilde u_{0,t+1} = y_{t+1} - w_{0,t}'\tilde\beta_{0,t}$, with $\tilde\beta_{0,t} \in (\beta_0, \hat\beta_{0,t})$, and $\tilde u_{1,t+1}$ is defined in an analogous manner.

Note that the term on the right hand side of (2) is the same term we had in the absence of parameter estimation error (i.e. as if we knew the parameters). The main issue we shall address is the following: do the last two terms above vanish in probability as the sample gets large? In other words, does the effect of parameter estimation error vanish as the sample size gets large? Under which conditions will it vanish?

We shall show that, in the context of DM type tests:

(a) Regardless of the choice of the loss function $f$, parameter estimation error vanishes if, as $T \to \infty$, $P/R \to 0$ (i.e. if the estimation period grows at a faster rate than the prediction period grows).

Suppose that $T = 10100$, $R = 10000$, and $P = 100$; in this case $P = R^{1/2}$, so $P/R = R^{-1/2} \to 0$ as $R \to \infty$. In general, suppose that $R = T - T^\delta$, $\delta < 1$, and $P = T^\delta$ (so that $T = R + P$). In this case $P/R = T^\delta/(T - T^\delta) \to 0$ as $T \to \infty$. In practice, this occurs when the period used for estimation is much longer than the period used for out of sample forecasting.

(b) If the same loss function is used for estimation and out of sample prediction, then parameter estimation error vanishes regardless of the relative rates at which $P$ and $R$ grow as the sample size gets large (for the case of the DM test). This is, for example, the case in which we use nonlinear (or ordinary) least squares for estimation and employ a quadratic (MSE) loss function. More generally, this occurs when we estimate the parameters via an m-estimator and we use the same loss function for out of sample prediction (as we shall see below). Several authors (see e.g. Granger (1969) and Weiss (1996)) point out that the right way to proceed is to use the same loss for estimation and prediction.

(c) Finally, if $P/R \to \pi > 0$, and we use a different loss function for estimation and prediction, then the contribution of parameter estimation error does not vanish. In particular, it will affect the covariance of the limiting distribution, and we need to take it into account if we want to perform valid inference.

(ii) m-Estimators. Let:
$$\hat\theta_T = \arg\min_{\theta\in\Theta}\frac{1}{T}\sum_{t=1}^T m(y_t, X_t, \theta). \qquad (4)$$
Examples:
* $m(y_t, X_t, \theta) = (y_t - X_t'\theta)^2$: OLS;
* $m(y_t, X_t, \theta) = (y_t - h(X_t, \theta))^2$: nonlinear least squares;
* $m(y_t, X_t, \theta) = -\log f(y_t|X_t; \theta)$: maximum likelihood or quasi-MLE (QMLE).

Now, define:
$$\theta^\dagger = \arg\min_{\theta\in\Theta}E\big(m(y_t, X_t, \theta)\big). \qquad (5)$$
Note that, given the above expression for $\hat\theta_T$,

$$\frac{1}{T}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \hat\theta_T) = 0,$$
because of the first order conditions. Also,
$$\nabla_\theta E\big(m(y_t, X_t, \theta)\big)\Big|_{\theta=\theta^\dagger} = E\big(\nabla_\theta m(y_t, X_t, \theta^\dagger)\big) = 0.$$
If the uniform law of large numbers holds, that is, if
$$\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^T\big(m(y_t, X_t, \theta) - E(m(y_t, X_t, \theta))\big)\Big| \stackrel{pr}{\to} 0,$$
then $\hat\theta_T \stackrel{pr}{\to} \theta^\dagger$ (consistency).

Now, by a mean value expansion around $\theta^\dagger$,
$$\frac{1}{T}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \hat\theta_T) = \frac{1}{T}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \theta^\dagger) + \frac{1}{T}\sum_{t=1}^T\nabla^2_\theta m(y_t, X_t, \tilde\theta_T)\,(\hat\theta_T - \theta^\dagger),$$
where $\tilde\theta_T \in (\hat\theta_T, \theta^\dagger)$. Since $\frac{1}{T}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \hat\theta_T) = 0$, we have:
$$\sqrt{T}\,(\hat\theta_T - \theta^\dagger) = -\Big(\frac{1}{T}\sum_{t=1}^T\nabla^2_\theta m(y_t, X_t, \tilde\theta_T)\Big)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \theta^\dagger)$$
$$= -\Big(E\big(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)\big)\Big)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \theta^\dagger)$$
$$\quad - \Bigg(\Big(\frac{1}{T}\sum_{t=1}^T\nabla^2_\theta m(y_t, X_t, \tilde\theta_T)\Big)^{-1} - \Big(E\big(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)\big)\Big)^{-1}\Bigg)\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \theta^\dagger). \qquad (6)$$
Now, if the uniform law of large numbers holds, that is, if
$$\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^T\big(\nabla^2_\theta m(y_t, X_t, \theta) - E(\nabla^2_\theta m(y_t, X_t, \theta))\big)\Big| \stackrel{pr}{\to} 0,$$
and $E\big(\nabla^2_\theta m(y_t, X_t, \theta)\big)$ is a positive definite matrix, then
$$\Big(\frac{1}{T}\sum_{t=1}^T\nabla^2_\theta m(y_t, X_t, \tilde\theta_T)\Big)^{-1} - \Big(E\big(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)\big)\Big)^{-1} \stackrel{pr}{\to} 0.$$

Note that $E\big(\nabla_\theta m(y_t, X_t, \theta^\dagger)\big) = 0$, by the first order conditions. Under regularity conditions (see e.g. West (1996)), the central limit theorem applies, and
$$\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \theta^\dagger) \stackrel{d}{\to} N(0, V),$$
where
$$V = \lim_{T\to\infty}\mathrm{Var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \theta^\dagger)\Big).$$
Now, the term on the last line of (6) is the product of something going to zero in probability with something converging in distribution; therefore it goes to zero in probability (product rule). Thus,
$$\sqrt{T}\,(\hat\theta_T - \theta^\dagger) \stackrel{d}{\to} N(0, MVM),$$
with $M = \big(E(\nabla^2_\theta m(y_t, X_t, \theta^\dagger))\big)^{-1}$.

Now, we need estimators of $M$ and $V$; call them $\hat M$ and $\hat V$. By the uniform law of large numbers, a consistent estimator of $M$ is given by
$$\hat M = \Big(\frac{1}{T}\sum_{t=1}^T\nabla^2_\theta m(y_t, X_t, \hat\theta_T)\Big)^{-1}.$$
As for $\hat V$: if $E\big(\nabla_\theta m(y_t, X_t, \theta^\dagger)\nabla_\theta m(y_s, X_s, \theta^\dagger)'\big) = 0$ for all $t \neq s$, then
$$\hat V = \frac{1}{T}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \hat\theta_T)\nabla_\theta m(y_t, X_t, \hat\theta_T)';$$
if instead $E\big(\nabla_\theta m(y_t, X_t, \theta^\dagger)\nabla_\theta m(y_s, X_s, \theta^\dagger)'\big) \neq 0$ for some $t \neq s$, then we need to use a Newey-West (HAC, heteroskedasticity and autocorrelation robust) estimator. In this case,
$$\hat V = \frac{1}{T}\sum_{t=1}^T\nabla_\theta m(y_t, X_t, \hat\theta_T)\nabla_\theta m(y_t, X_t, \hat\theta_T)' + \frac{2}{T}\sum_{\tau=1}^{l_T}w_\tau\sum_{t=\tau+1}^T\nabla_\theta m(y_t, X_t, \hat\theta_T)\nabla_\theta m(y_{t-\tau}, X_{t-\tau}, \hat\theta_T)'.$$
Under the same type of assumptions as in West (1996),
$$\big(\hat M\hat V\hat M\big)^{-1/2}\sqrt{T}\,(\hat\theta_T - \theta^\dagger) \stackrel{d}{\to} N(0, I).$$
Note that when $V = M^{-1}$ (the analog of the spherical errors condition in the linear model), the covariance $MVM$ simplifies to $M$.

Turning to our example, let $\hat u_{0,t+1} = y_{t+1} - w_{0,t}'\hat\beta_{0,t}$.

Suppose that $\hat\beta_{0,t}$ is an m-estimator, defined as
$$\hat\beta_{0,t} = \arg\min_{\beta_0\in B_0}\frac{1}{t}\sum_{j=2}^t m(y_j - w_{0,j-1}'\beta_0). \qquad (7)$$
Note that if $m$ is a quadratic function, then
$$\hat\beta_{0,t} = \arg\min_{\beta_0\in B_0}\frac{1}{t}\sum_{j=2}^t(y_j - w_{0,j-1}'\beta_0)^2,$$
and so $\hat\beta_{0,t}$ is the OLS estimator. Also define
$$\beta_0^\dagger = \arg\min_{\beta_0\in B_0}E\big(m(y_j - w_{0,j-1}'\beta_0)\big), \qquad (8)$$
so that if $m$ is a quadratic function, then
$$\beta_0^\dagger = \arg\min_{\beta_0\in B_0}E\big((y_j - w_{0,j-1}'\beta_0)^2\big),$$
and if model 0 is indeed correctly specified, then $\beta_0^\dagger$ denotes the parameters of the conditional expectation. $\hat\beta_{1,t}$ and $\beta_1^\dagger$ are defined in the same manner.

We want to test
$$H_0: E(f(u_{0,t}) - f(u_{1,t})) = 0$$
versus
$$H_A: E(f(u_{0,t}) - f(u_{1,t})) \neq 0,$$
where $u_{0,t+1} = y_{t+1} - w_{0,t}'\beta_0^\dagger$ and $u_{1,t+1} = y_{t+1} - w_{1,t}'\beta_1^\dagger$.

Consider a non-standardized version of the DM statistic, also called, for the sake of simplicity, $\widehat{DM}_P$:
$$\widehat{DM}_P = \frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big)$$
$$= \frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(u_{0,t+1}) - f(u_{1,t+1})\big) \qquad (9)$$
$$+ \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(\bar u_{0,t+1})'\,P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) - \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_1}f(\bar u_{1,t+1})'\,P^{1/2}(\hat\beta_{1,t} - \beta_1^\dagger), \qquad (10)$$

where $\bar u_{0,t+1} = y_{t+1} - w_{0,t}'\bar\beta_{0,t}$, with $\bar\beta_{0,t} \in (\beta_0^\dagger, \hat\beta_{0,t})$, and where $\bar u_{1,t+1}$ is defined in an analogous manner. The term in (9) is the DM statistic for the case in which we know the underlying parameters. For the sake of simplicity, let's concentrate on the first piece in (10). Let $m_t(\beta_0) = m(y_t - w_{0,t-1}'\beta_0)$. By a mean value expansion around $\beta_0^\dagger$,
$$\frac{1}{t}\sum_{j=2}^t\nabla_\beta m_j(\hat\beta_{0,t}) = \frac{1}{t}\sum_{j=2}^t\nabla_\beta m_j(\beta_0^\dagger) + \frac{1}{t}\sum_{j=2}^t\nabla^2_\beta m_j(\bar\beta_{0,t})\,(\hat\beta_{0,t} - \beta_0^\dagger),$$
with $\bar\beta_{0,t} \in (\hat\beta_{0,t}, \beta_0^\dagger)$. Now, the left hand side above is identically zero by the first order conditions (see equation (7)); thus
$$t^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = -\Big(\frac{1}{t}\sum_{j=2}^t\nabla^2_\beta m_j(\bar\beta_{0,t})\Big)^{-1}\frac{1}{t^{1/2}}\sum_{j=2}^t\nabla_\beta m_j(\beta_0^\dagger). \qquad (11)$$
Hereafter, let $f_t(\beta_0) = f(y_{t+1} - w_{0,t}'\beta_0)$ and $f_t(\beta_1) = f(y_{t+1} - w_{1,t}'\beta_1)$.

Along the lines of West (1996), we now state the following assumptions.

Assumption A1: $f$ is twice continuously differentiable in $\beta_i$, and $\sup_{\beta_i\in B_i}\|\nabla^2_{\beta_i}f_t(\beta_i)\| < C$, $i = 0, 1$.

Assumption A2: $\Big(\frac{1}{t}\sum_{j=2}^t\nabla^2_\beta m_j(\bar\beta_{i,t})\Big)^{-1}$ converges almost surely, uniformly in $t$, to $B_i$, where $B_i$ is negative definite, $i = 0, 1$.

Assumption A3: (i) $(y_t, X_t, w_{i,t})$, $i = 0, 1$, is a strictly stationary, strong mixing sequence with size $4(4+\psi)/\psi$, $\psi > 0$; (ii) $f$ and $m$ are twice continuously differentiable in $\beta$ over the interior of $B$, and $\nabla_\beta m$, $\nabla^2_\beta m$, $\nabla_\beta f$, $\nabla^2_\beta f$ are $2r$-dominated (more simply, they have $2r$ finite moments, uniformly in $B$) with $r \geq 2(2+\psi)$.

Assumption A4: $\beta_i^\dagger$ is uniquely identified (i.e. $E\big(m(y_j - w_{i,j-1}'\beta_i^\dagger)\big) < E\big(m(y_j - w_{i,j-1}'\beta_i)\big)$ for any $\beta_i \neq \beta_i^\dagger$), $i = 0, 1$.

Assumption A5: $T = R + P$, and as $T \to \infty$, $P, R \to \infty$, with $P/R \to \pi$, $0 \leq \pi \leq \infty$.

Hereafter, the notation $o_P(1)$ denotes a term which approaches zero in probability. Recall that we are considering the term $\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'\,P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger)$.

Proposition 2: Let Assumptions A1-A5 hold. Then:

(i) If $f = m$ (i.e. if we are using the same loss for estimation and testing), then
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'\,P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = o_P(1).$$

(ii) If $\pi = 0$ (i.e. if, as $T \to \infty$, $P/R \to 0$), then
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'\,P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = o_P(1).$$

(iii) In all other cases (i.e. $\pi > 0$ and $f \neq m$),
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'\,P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) \stackrel{d}{\to} N\big(0,\, 2\Pi F_0'B_0 S_{h_0h_0}B_0F_0\big),$$
where $\Pi = 1 - \pi^{-1}\ln(1+\pi)$ for $0 < \pi < \infty$, and $\Pi = 1$ for $\pi = \infty$. Also,
$$F_0 = E\big(\nabla_{\beta_0}f(u_{0,t+1})\big), \quad S_{h_0h_0} = \sum_{j=-\infty}^{\infty}E\big(\nabla_\beta m_1(\beta_0^\dagger)\nabla_\beta m_{1+j}(\beta_0^\dagger)'\big), \quad B_0 = \Big(E\big(\nabla^2_\beta m_t(\beta_0^\dagger)\big)\Big)^{-1}.$$

Proof (available upon request)

Thus, we have seen that there are two important cases in which the effect of parameter estimation error vanishes in probability. Broadly speaking, if we use a different loss function for estimation and testing, then we are (a priori) ruling out the use of an optimal predictor. Also, let's stop standardizing the DM test (by $\sigma$), while still calling it $\widehat{DM}_P$.

Proposition 3: Let Assumptions A1-A5 hold, and let $f \neq m$ (different loss for estimation and testing) and $\pi > 0$. Then, under $H_0$,
$$\widehat{DM}_P = \frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big) \stackrel{d}{\to} N(0, \Omega),$$
where
$$\Omega = S_{ff} + 2\Pi F_0'B_0S_{h_0h_0}B_0F_0 + 2\Pi F_1'B_1S_{h_1h_1}B_1F_1 - \Pi\big(S_{fh_0}B_0F_0 + F_0'B_0S_{fh_0}'\big)$$
$$\quad - 2\Pi\big(F_1'B_1S_{h_1h_0}B_0F_0 + F_0'B_0S_{h_0h_1}B_1F_1\big) + \Pi\big(S_{fh_1}B_1F_1 + F_1'B_1S_{fh_1}'\big),$$
and where, for $i, l = 0, 1$:
$$F_i = E\big(\nabla_{\beta_i}f(u_{i,t+1})\big), \quad B_i = \Big(E\big(\nabla^2_\beta m_j(\beta_i^\dagger)\big)\Big)^{-1}, \quad S_{h_ih_l} = \sum_{j=-\infty}^{\infty}E\big(\nabla_\beta m_1(\beta_i^\dagger)\nabla_\beta m_{1+j}(\beta_l^\dagger)'\big),$$
$$S_{fh_i} = \sum_{j=-\infty}^{\infty}E\big((f(u_{0,1}) - f(u_{1,1}))\nabla_\beta m_{1+j}(\beta_i^\dagger)'\big), \quad S_{ff} = \sum_{j=-\infty}^{\infty}E\big((f(u_{0,1}) - f(u_{1,1}))(f(u_{0,1+j}) - f(u_{1,1+j}))\big).$$

Under the alternative, for some $\varepsilon > 0$,
$$\Pr\Big(\Big|\frac{1}{P}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big)\Big| > \varepsilon\Big) \to 1,$$
and so $\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big)$ diverges at rate $P^{1/2}$.

In order to implement a valid DM test in the case of non-vanishing parameter estimation error, we need to consistently estimate all the pieces of the covariance matrix in Proposition 3. For consistent estimation of $F_i$ and $B_i$ we can simply use sample means evaluated at the estimated parameters. Namely, we can use
$$\hat F_i = \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_i}f(\hat u_{i,t+1}) \quad \text{and} \quad \hat B_i = \Big(\frac{1}{P}\sum_{t=R}^{T-1}\nabla^2_{\beta_i}m_t(\hat\beta_{i,t})\Big)^{-1}.$$

However, for the long run covariance matrices we need to use a HAC (Newey-West type) estimator. Define, for $i, l = 0, 1$:
$$\hat S_{h_ih_l} = \frac{1}{P}\sum_{\tau=-l_P}^{l_P}w_\tau\sum_{t=R+l_P}^{T-1}\nabla_\beta m_t(\hat\beta_{i,t})\nabla_\beta m_{t+\tau}(\hat\beta_{l,t})',$$
$$\hat S_{fh_i} = \frac{1}{P}\sum_{\tau=-l_P}^{l_P}w_\tau\sum_{t=R+l_P}^{T-1}\Big(f(\hat u_{0,t}) - f(\hat u_{1,t}) - \bar{\hat d}\Big)\nabla_\beta m_{t+\tau}(\hat\beta_{i,t})',$$
$$\hat S_{ff} = \frac{1}{P}\sum_{\tau=-l_P}^{l_P}w_\tau\sum_{t=R+l_P}^{T-1}\Big(f(\hat u_{0,t}) - f(\hat u_{1,t}) - \bar{\hat d}\Big)\Big(f(\hat u_{0,t+\tau}) - f(\hat u_{1,t+\tau}) - \bar{\hat d}\Big),$$
where $\bar{\hat d} = \frac{1}{P}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t}) - f(\hat u_{1,t})\big)$.

Given A1-A4 above, if we let $w_\tau = 1 - |\tau|/(l_P+1)$, then as $P \to \infty$, $l_P \to \infty$, and $l_P/P^{1/4} \to 0$, the estimators $\hat S_{h_ih_l}$, $\hat S_{fh_i}$, and $\hat S_{ff}$ are consistent for $S_{h_ih_l}$, $S_{fh_i}$, and $S_{ff}$.

Note that in practice we do not know $\pi$; a natural estimate is $\hat\pi = P/R$. Also, we do not observe the rates at which $P$ and $R$ grow. Thus, unless $R$ is much larger than $P$, it is worthwhile to use the formula for the covariance which takes parameter estimation error into account whenever we use a different loss for estimation and prediction. Of note is that a recent key paper by Giacomini and White discusses conditional predictive inference, in which case the data are conditioned on and parameter estimation error essentially vanishes.

2.2 Bootstrap Techniques for Critical Value Construction

2.2.1 Introduction to the Bootstrap

Inference on parameters is typically based on asymptotic critical values. But how good is the normal approximation? Can we improve over inference based upon the normal approximation? We shall see that bootstrap critical values can provide refinements over asymptotic critical values under various circumstances.

First, let us outline the logic underlying the bootstrap; then we shall see how the use of the bootstrap can lead to more accurate inference.

Consider a very simple situation. We have a sample of $T$ iid observations, $X_1, \ldots, X_T$, and we want to test the null hypothesis
$$H_0: E(X_1) = \mu \quad \text{versus} \quad H_A: E(X_1) \neq \mu.$$
Note that given the identical distribution assumption, $E(X_1) = E(X_2) = \ldots = E(X_T)$. Consider the t-statistic
$$t_{\mu,T} = \frac{T^{-1/2}\sum_{t=1}^T(X_t - \mu)}{\hat\sigma_X},$$
where
$$\hat\sigma_X^2 = \frac{1}{T}\sum_{t=1}^T\Big(X_t - \frac{1}{T}\sum_{t=1}^T X_t\Big)^2.$$
Provided that $\mathrm{var}(X_1) < \infty$, we know that under $H_0$, $t_{\mu,T} \stackrel{d}{\to} N(0,1)$. Thus, we compare $t_{\mu,T}$ with the 2.5% and 97.5% critical values of a standard normal, and we reject at the 5% significance level if $t_{\mu,T} < -1.96$ or $t_{\mu,T} > 1.96$.

The idea underlying the bootstrap is to pretend that the sample is the population, and to draw from the sample as many bootstrap samples as needed in order to construct many bootstrap statistics. The simplest form of bootstrap is the iid nonparametric bootstrap, which is suitable for iid observations.

Imagine that we put all $T$ observations into an urn, and we then make $T$ draws with replacement (i.e. we make one draw, get one observation, put it back into the urn, draw another one, put it back, and so on). Let $X_1^*, X_2^*, \ldots, X_T^*$ be the resampled observations, and note that $X_1^* = X_t$, $t = 1, \ldots, T$, each with probability $1/T$. In other words, $X_1^*, X_2^*, \ldots, X_T^*$ is equal to $X_{I_1}, X_{I_2}, \ldots, X_{I_T}$, where, for $i = 1, \ldots, T$, $I_i$ is a random variable taking values $1, 2, \ldots, T$ with equal probability $1/T$. $X_1^*, X_2^*, \ldots, X_T^*$ forms a bootstrap sample. Needless to say, we can repeat the same operation and get a second bootstrap sample, and so on.

Note that, given the original sample, the probability law governing the resampled series is nothing other than the probability law of the $I_i$, $i = 1, \ldots, T$. As the $I_i$ are iid discrete uniform random variables on $\{1, \ldots, T\}$, the $X_i^*$ are also iid, conditional on the sample.

Let $E^*$ and $\mathrm{Var}^*$ denote the mean and the variance of the resampled series, conditional on the sample (note that $E^*$ and $\mathrm{Var}^*$ are mean and variance operators in terms of the law governing the bootstrap, i.e. in terms of the $I_i$, $i = 1, \ldots, T$). Given the identical distribution, $E^*(X_1^*) = E^*(X_2^*) = \ldots = E^*(X_T^*)$, and
$$E^*(X_1^*) = X_1\frac{1}{T} + X_2\frac{1}{T} + \ldots + X_T\frac{1}{T} = \frac{1}{T}\sum_{t=1}^T X_t.$$
Also,
$$E^*\Big(\frac{1}{T}\sum_{t=1}^T X_t^*\Big) = \frac{1}{T}\sum_{t=1}^T X_t.$$
Thus, the bootstrap mean is equal to the sample mean. Given that $X_1^*, \ldots, X_T^*$ are iid observations,
$$\mathrm{Var}^*\Big(\frac{1}{T^{1/2}}\sum_{t=1}^T X_t^*\Big) = \mathrm{Var}^*(X_1^*) = E^*(X_1^{*2}) - \big(E^*(X_1^*)\big)^2 = \frac{1}{T}\sum_{t=1}^T X_t^2 - \Big(\frac{1}{T}\sum_{t=1}^T X_t\Big)^2.$$
Thus, the bootstrap variance is equal to the sample variance. Let
$$\hat\sigma_X^{*2} = \frac{1}{T}\sum_{t=1}^T\Big(X_t^* - \frac{1}{T}\sum_{t=1}^T X_t^*\Big)^2.$$
Given that $X_1^*, \ldots, X_T^*$ are iid with mean and variance equal to the sample mean and sample variance,
$$t_{\mu,T}^* = \frac{1}{\hat\sigma_X^*}\frac{1}{T^{1/2}}\sum_{t=1}^T\Big(X_t^* - \frac{1}{T}\sum_{t=1}^T X_t\Big) \stackrel{d^*}{\to} N(0,1),$$

where $\stackrel{d^*}{\to}$ denotes convergence in distribution according to the bootstrap probability measure, conditional on the sample. Note, importantly, that $t_{\mu,T}^* \stackrel{d^*}{\to} N(0,1)$ regardless of whether the null hypothesis is true or not. Thus, under the null, $t_{\mu,T}$ and $t_{\mu,T}^*$ have the same limiting distribution. Under the alternative, $t_{\mu,T}^* \stackrel{d^*}{\to} N(0,1)$ while $t_{\mu,T}$ diverges (to $\pm\infty$).

This suggests proceeding in the following manner. Construct $B$ ($B$ large) bootstrap statistics, say $t_{\mu,T}^{*(1)}, \ldots, t_{\mu,T}^{*(B)}$, and sort them from smallest to largest. Suppose $B = 1000$; then the 25th sorted bootstrap statistic gives the 2.5% significance level critical value, say $z_{T,2.5\%}^*$, and the 975th gives the 97.5% significance level critical value, say $z_{T,97.5\%}^*$. If $B$ is large enough, then rejecting $H_0$ if $t_{\mu,T} < z_{T,2.5\%}^*$ or $t_{\mu,T} > z_{T,97.5\%}^*$, and not rejecting if $z_{T,2.5\%}^* \leq t_{\mu,T} \leq z_{T,97.5\%}^*$, yields a test with asymptotic size equal to 5% and unit asymptotic power.

It is important to note that in this case the bootstrap higher moments are also equal to the corresponding sample moments. In fact, given independence, $E^*(X_1^{*3}) = \frac{1}{T}\sum_{t=1}^T X_t^3$, and so on for the fourth moment, etc.

Question: Is inference based on $z_{T,2.5\%}^*$ and $z_{T,97.5\%}^*$ more accurate than inference based on standard normal approximations (e.g. based on using $\pm 1.96$)?

Answer: Yes. Why? This can be shown using an Edgeworth expansion (see lecture notes).
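A minimal sketch of the procedure just described (simulated sample; $B = 1000$ as in the text); the centering of each bootstrap statistic at the sample mean is what makes the bootstrap distribution mimic the null whether or not $H_0$ is true:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(scale=1.0, size=100)    # hypothetical iid sample
T, B, mu0 = len(X), 1000, 1.0               # test H0: E(X_1) = 1

def t_stat(sample, center):
    return np.sqrt(len(sample)) * (sample.mean() - center) / sample.std()

t_obs = t_stat(X, mu0)

# Each bootstrap statistic is centered at the *sample* mean
t_boot = np.array([t_stat(X[rng.integers(0, T, size=T)], X.mean())
                   for _ in range(B)])

z_lo, z_hi = np.percentile(t_boot, [2.5, 97.5])           # bootstrap critical values
print(t_obs, (z_lo, z_hi), t_obs < z_lo or t_obs > z_hi)  # True => reject at 5%
```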

2.2.2 Bootstrap with Time Series

The iid nonparametric bootstrap does not work with dependent observations. The reason is that the resampled observations are iid, while the actual observations are not.

In the case of dependent observations, things are more complicated. On the one hand, we want to draw blocks of data long enough to preserve the dependence structure present in the original sample; on the other hand, we want a large enough number of blocks that are independent of each other. The most used resampling method for time series data is the block bootstrap of Künsch (1989), which we consider below.

Let $T = bl$, where $b$ denotes the number of blocks and $l$ denotes the length of each block. We first draw a discrete uniform random variable, $I_1$, that can take values $0, 1, \ldots, T-l$, each with probability $1/(T-l+1)$. The first block is given by $X_{I_1+1}, \ldots, X_{I_1+l}$. We then draw another discrete uniform random variable, say $I_2$, and a second block of length $l$ is formed, $X_{I_2+1}, \ldots, X_{I_2+l}$. Continue in the same manner until the last discrete uniform, $I_b$, is drawn, so that the last block is $X_{I_b+1}, \ldots, X_{I_b+l}$. Call $X_t^*$ the resampled series, and note that $X_1^*, X_2^*, \ldots, X_T^*$ corresponds to $X_{I_1+1}, X_{I_1+2}, \ldots, X_{I_b+l}$. Thus, conditional on the sample, the only random element is the beginning of each block. In particular, $X_1^*, \ldots, X_l^*, X_{l+1}^*, \ldots, X_{2l}^*, \ldots, X_{T-l+1}^*, \ldots, X_T^*$ can, conditional on the sample, be treated as $b$ iid blocks indexed by discrete uniform random variables. The results above hold under some restrictions on the block length.

2.2.3 Finally... Bootstrap DM (and Data Snooping) Tests

Given the above discussion, we can use the block bootstrap in the context of DM tests. Namely, to test
$$H_0: E\big(f(u_{0,t+1}) - f(u_{1,t+1})\big) = 0 \quad \text{versus} \quad H_A: E\big(f(u_{0,t+1}) - f(u_{1,t+1})\big) \neq 0,$$
use
$$\widehat{DM}_P^* = \frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\Big[\big(f(\hat u_{0,t+1}^*) - f(\hat u_{1,t+1}^*)\big) - \frac{1}{T}\sum_{t=2}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\big)\Big].$$
Then, the empirical distribution of this statistic can be used to obtain critical values for $\widehat{DM}_P$.

Moreover, consider the White (2000) data snooping test for comparing many models against a benchmark, so that we have

$$H_0: \max_{k=1,\ldots,m}E\big(f(u_{0,t+1}) - f(u_{k,t+1})\big) \leq 0$$
versus
$$H_A: \max_{k=1,\ldots,m}E\big(f(u_{0,t+1}) - f(u_{k,t+1})\big) > 0,$$
and
$$S_P = \max_{k=1,\ldots,m}\widehat{DM}_{P,k} = \max_{k=1,\ldots,m}\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{k,t+1})\big).$$
Here, we need only construct
$$S_P^* = \max_{k=1,\ldots,m}\widehat{DM}_{P,k}^* = \max_{k=1,\ldots,m}\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\Big[\big(f(\hat u_{0,t+1}^*) - f(\hat u_{k,t+1}^*)\big) - \frac{1}{T}\sum_{t=2}^{T-1}\big(f(\hat u_{0,t+1}) - f(\hat u_{k,t+1})\big)\Big].$$

The only caveat with both of these tests is that a recentering of the parameter estimates used in the construction of either $f(\hat u_{0,t+1}^*) - f(\hat u_{1,t+1}^*)$ or $f(\hat u_{0,t+1}^*) - f(\hat u_{k,t+1}^*)$ generally needs to be made, because the bootstrap estimator of the forecast model parameters is characterized by a location bias. Additionally, the bootstrap component is constructed over the last $P$ observations, while the sample component is constructed over all $T$ observations. These adjustments to the usual approach to bootstrap test statistics discussed above arise because of the use of recursive (or rolling) estimation schemes. If parameter estimation error is assumed to vanish, then the recentering of the parameter estimates does not need to be made, although the $\frac{1}{T}$ term still needs to be subtracted from the bootstrap statistics.

References:

Corradi, Valentina and Norman R. Swanson, 2006, Predictive Density Evaluation, in: Handbook of Economic Forecasting, eds. Clive W.J. Granger, Graham Elliott and Allan Timmermann, Elsevier, Amsterdam.

Corradi, Valentina and Norman R. Swanson, 2007, Nonparametric Bootstrap Procedures for Predictive Inference Based on Recursive Estimation Schemes, International Economic Review, 48.
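A sketch of the block resampling step underlying these bootstrap tests (Künsch's moving block bootstrap; the block length and the series are illustrative choices):

```python
import numpy as np

def block_bootstrap(x, l, rng):
    """One moving-block bootstrap resample of the series x (Kunsch, 1989):
    draw b = T/l blocks of length l, with uniform random start points."""
    T = len(x)
    b = T // l                                    # assumes T = b * l for simplicity
    starts = rng.integers(0, T - l + 1, size=b)   # I_i uniform on {0, ..., T-l}
    return np.concatenate([x[s:s + l] for s in starts])

rng = np.random.default_rng(5)
x = np.zeros(200)
for t in range(1, 200):                           # a hypothetical AR(1) series
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

# Bootstrap distribution of the sample mean of a dependent series
means = np.array([block_bootstrap(x, l=10, rng=rng).mean() for _ in range(1000)])
print(means.std())   # bootstrap standard error of the mean
```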

3 Part III - What Should We Be Predicting? Real-Time Data

Real-time data are important in policy making contexts, as most macroeconomic data are revised over time.

Table 1: Generic Real-Time Dataset. Rows index the calendar date $j$ the data pertain to; columns index the release date $i$; entry $X_i(j)$ is the value for calendar date $j$ as reported in the vintage released at date $i$. Entries are blank where the data for date $j$ have not yet been released.

Data pertain to | i=1950:05 | i=1950:06 | i=1950:07 | ... | i=t | ... | i=2011:03 | i=2011:04
j=1950:04       | X_i(j)    | X_i(j)    | X_i(j)    | ... | X_i(j) | ... | X_i(j) | X_i(j)
j=1950:05       |           | X_i(j)    | X_i(j)    | ... | X_i(j) | ... | X_i(j) | X_i(j)
j=1950:06       |           |           | X_i(j)    | ... | X_i(j) | ... | X_i(j) | X_i(j)
...             |           |           |           |     |        |     |        |
j=t             |           |           |           |     |        | ... | X_i(j) | X_i(j)
j=2011:02       |           |           |           |     |        |     | X_i(j) | X_i(j)
j=2011:03       |           |           |           |     |        |     |        | X_i(j)

Figure 1: Output Growth Rates, First, and Second Revision Errors, 1965:4-2006:4. (Panels show output growth rates, first revision errors, and second revision errors; the revision errors are on the order of $10^{-3}$.)

Note: First revision errors are defined as follows: ${}^{t+2}u_t^{t+1} = {}^{t+2}X_t - {}^{t+1}X_t$, where ${}^{t+1}X_t$ is the annualized growth rate of output pertaining to calendar date $t$ and available at time $t+1$. Similarly, second revision errors are defined as ${}^{t+3}u_t^{t+2} = {}^{t+3}X_t - {}^{t+2}X_t$. See Sections 2 and 3 for further details.

Many papers are linked to the real-time data research center at the Philadelphia Federal Reserve Bank website.

In many papers, some of the earlier ones being Amato and Swanson (2001), Bernanke and Boivin (2003), and Croushore and Stark (2001, 2003), complete revision histories for the variables examined are considered. One way of thinking about this sort of data is using regressions of the form:
$${}^fX_t = \alpha + {}^{t+1}X_t\,\beta + W_{t+1}'\gamma + \varepsilon_{t+1}, \qquad (12)$$
where $W_{t+1}$ is an $m \times 1$ vector of variables representing the conditioning information set available at time period $t+1$, and $\varepsilon_{t+1}$ is an error term assumed to be uncorrelated with ${}^{t+1}X_t$ and $W_{t+1}$. The null hypothesis of interest in this model is that $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, based on the notion of testing for the rationality of ${}^{t+1}X_t$ as a predictor of ${}^fX_t$ by finding out whether the conditioning information in $W_{t+1}$, available in real time to the data issuing agency, could have been used to construct better conditional predictions of the final data. Notice that this hypothesis, if rejected, is consistent with the errors-in-variables hypothesis.[1]

Following Keane and Runkle (1990), the test of rationality of ${}^{t+1}X_t$ in the context of model (12) can be broken down into two sub-hypotheses, namely (i) unbiasedness and (ii) efficiency. The hypothesis of unbiasedness can be tested by imposing the restriction that $\gamma = 0$ and testing $\alpha = 0$, $\beta = 1$. Efficiency requires that $\alpha = 0$, $\beta = 1$, and $\gamma = 0$.

For a complete discussion of the above ideas, including references, see:

Swanson, Norman R. and Dick van Dijk, 2006, Are Reporting Agencies Getting It Right? Data Rationality and Business Cycle Asymmetry, Journal of Business and Economic Statistics, 24.

Corradi, Valentina, Andres Fernandez and Norman R. Swanson, 2009, Information in the Revision Process of Real-Time Data, Journal of Business and Economic Statistics, 27.

[1] For further discussion of the relationship between errors-in-variables hypotheses and rationality hypotheses, the reader is referred to Croushore and Stark (2003) and Faust, Rogers, and Wright (2004), where the errors-in-variables and rational forecast models are associated with the notions of noise and news, respectively.
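A sketch of how the rationality regression (12) might be implemented (all data are simulated and hypothetical; a pure-noise revision is built in, so the test tends to reject, consistent with the errors-in-variables story):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
n, m = 200, 2
X_final = rng.standard_normal(n)                  # "final" data, fX_t
X_first = X_final + 0.2 * rng.standard_normal(n)  # first release {t+1}X_t: noise revision
W = rng.standard_normal((n, m))                   # conditioning variables W_{t+1}

Z = np.column_stack([np.ones(n), X_first, W])     # regressors (1, {t+1}X_t, W'_{t+1})
coef, *_ = np.linalg.lstsq(Z, X_final, rcond=None)
e = X_final - Z @ coef
s2 = e @ e / (n - Z.shape[1])
V = s2 * np.linalg.inv(Z.T @ Z)                   # classical OLS covariance (iid errors assumed)

r = coef - np.concatenate([[0.0, 1.0], np.zeros(m)])   # H0: alpha=0, beta=1, gamma=0
wald = r @ np.linalg.solve(V, r)
# With pure-noise revisions, beta is attenuated below 1, so rejection is likely.
print(wald, chi2.sf(wald, df=2 + m))              # small p-value => reject rationality
```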

Suffice it to say that one of the biggest issues is how to compare models. Should the target to be predicted be some sort of final release, or instead an earlier release upon which most economic agents base their decisions? Do revisions converge to zero, in the sense that all preliminary data converge to some true final values? What about definitional (and benchmark) revisions, as opposed to revisions associated with the arrival of new information? Can we predict revisions?

See the special issue of the Journal of Business and Economic Statistics in which the above Corradi et al. paper appears, as well as the key Aruoba et al. and McCracken et al. papers therein.

4 Part IV - Methods of Prediction - Some Comments

Three sorts of prediction methods are of particular interest to policy setters.

(i) Individual prediction models

Thus far, we have discussed numerous individual linear and nonlinear models, including AR models, threshold switching models, and GARCH models, for example. It is certainly feasible to construct prediction models with many equations, in which case correlation across predictions of different variables becomes important (e.g., common shocks driving multiple yields in a yield curve model). In the current context, we are also concerned with correlation across different predictors of the same variable (see the discussion of Aiolfi and Timmermann (2006) below).

(ii) Diffusion indices or other data reduction type models

Consider the standard diffusion index forecasting approach:
$$X_t = \Lambda_{0,t}F_{0,t} + u_t, \qquad (13)$$
where $X_t$ is an $N \times 1$ vector, $\Lambda_{0,t}$ is an $N \times r$ matrix of factor loadings, and $F_{0,t}$ is the unobserved $r \times 1$ factor vector. We want to use the factors to predict $y_t$, $h$ steps ahead. For this purpose, we might consider the following simple index model with no factor dynamics:
$$y_{t+h} = \beta_{0,1,t}F_{0,1,t} + \ldots + \beta_{0,r,t}F_{0,r,t} + \Gamma'W_t + \epsilon_{t+h} = F_{0,t}'\beta_{0,t} + \Gamma'W_t + \epsilon_{t+h}. \qquad (14)$$
Factors can be estimated using principal components, or using versions of the Kalman filter for various variants of this model. There are many data reduction techniques, such as bagging, boosting, the lasso, the garrote, ridge regression, and the elastic net, that offer additional approaches to data shrinkage. Some of these set the coefficients on many variables to zero, and are hence very parsimonious, while others are not. All of these are discussed in the following paper, where they are combined with diffusion index modelling:

Hyun Hak Kim and Norman R. Swanson, 2010, Forecasting Financial and Macroeconomic Variables Using Data Reduction Methods: New Empirical Evidence, Working Paper, Rutgers University.
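A sketch of the two-step diffusion index approach (all dimensions and the DGP are hypothetical, and the $\Gamma'W_t$ term is omitted for brevity): estimate the factors by principal components from a large panel, then run the forecasting regression (14) on the estimated factors:

```python
import numpy as np

rng = np.random.default_rng(7)
T, N, r, h = 200, 50, 2, 1
F = rng.standard_normal((T, r))                      # latent factors F_{0,t}
Lam = rng.standard_normal((N, r))                    # loadings Lambda_0
X = F @ Lam.T + rng.standard_normal((T, N))          # panel: X_t = Lambda F_t + u_t
y = F @ np.array([1.0, -0.5]) + 0.5 * rng.standard_normal(T)  # target driven by factors

# Principal components: factors proportional to left singular vectors of standardized X
Xs = (X - X.mean(0)) / X.std(0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
F_hat = np.sqrt(T) * U[:, :r]                        # estimated factor space

# Forecast regression y_{t+h} = F_t' beta + eps_{t+h}, estimated by OLS
Z = np.column_stack([np.ones(T - h), F_hat[:-h]])
beta, *_ = np.linalg.lstsq(Z, y[h:], rcond=None)
y_hat = np.array([1.0, *F_hat[-1]]) @ beta           # h-step forecast from the last factors
print(y_hat)
```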

(iii) Forecast pooling and forecast combination

A stylized fact in empirical economics is that it is difficult to beat the simple forecast average (i.e., the average of the predictions from multiple models, using equal weights). Bagging is a form of model averaging. Bayesian model averaging is another popular type of averaging. Pooling, or forecast combination, is sometimes done using simple regression, for example by estimating regressions of the form discussed in the example below, with the weights ($w_i$) estimated using least squares on a pre-sample, say.

Timmermann, A. (2006), Forecast Combinations, in Handbook of Economic Forecasting, eds. Clive W.J. Granger, Graham Elliott and Allan Timmermann, Elsevier, Amsterdam, notes that forecast combinations may so frequently be found to yield better forecasts because of model misspecification, instability (nonstationarities), and estimation error when the number of models is large relative to the sample size.

Aiolfi, M. and A. Timmermann (2006), Persistence in Forecasting Performance and Conditional Combination Strategies, Journal of Econometrics, 135, note that correlation (persistence) in the forecasting performance of linear and nonlinear time series models is prevalent in a large cross section of economic variables used for prediction in the G7 countries. They find it useful to: first (i) sort models into clusters using past performance; then (ii) pool forecasts within each cluster; and then (iii) estimate optimal forecast combination weights for the clusters, with shrinkage towards equal weights.

For further references, see Stock, James H. and Watson, Mark W., 2005, An Empirical Comparison of Methods for Forecasting Using Many Predictors, Working Paper, Princeton University.

Example: Combined Bivariate ADL Model

As in Stock and Watson (2005), we implement a combined bivariate autoregressive distributed lag (ADL) model. Forecasts are constructed by combining individual forecasts computed from bivariate ADL models. The $i$-th ADL model includes $p_{i,x}$ lags of $X_{i,t}$ and $p_{i,y}$ lags of $Y_t$, and has the form
$$\hat Y_{t+h}^{ADL} = \hat\alpha + \hat\beta_i(L)X_{i,t} + \hat\phi_i(L)Y_t.$$
The combined forecast is
$$\hat Y_{T+h|T}^{Comb,h} = \sum_{i=1}^n w_i\,\hat Y_{i,T+h|T}^{ADL,h}.$$
Here, we might set $w_i = 1/n$, where $n = 146$. See the Appendix for tables and results using this sort of forecast combination.
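A minimal sketch of the combined bivariate ADL forecast with equal weights (one lag of each variable, simulated data, and $n = 10$ rather than 146, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
T, n, h = 200, 10, 1
X = rng.standard_normal((T, n))                       # candidate predictors X_{i,t}
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 * y[t - 1] + 0.3 * X[t - 1, 0] + rng.standard_normal()

forecasts = np.empty(n)
for i in range(n):
    # i-th ADL(1,1) model: y_{t+h} = alpha + beta*X_{i,t} + phi*y_t + eps_{t+h}
    Z = np.column_stack([np.ones(T - h), X[:-h, i], y[:-h]])
    b, *_ = np.linalg.lstsq(Z, y[h:], rcond=None)
    forecasts[i] = b @ np.array([1.0, X[-1, i], y[-1]])

w = np.full(n, 1.0 / n)                               # equal weights, w_i = 1/n
print(w @ forecasts)                                  # combined forecast
```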

5 Part V - Density Based Model Selection

Thus far we have discussed only point estimate based model selection (e.g. using mean square forecast error). Density based model selection is also important, as we are interested not only in good point predictions, but also in good interval predictions, for example.

5.1 Comparing Models Using Simulated Distributions

Assume that we have a series of real business cycle (RBC) models, and our objective is to compare the joint distribution of historical variables with the joint distribution of the simulated variables from these RBC models. Hereafter, for the sake of simplicity, but without loss of generality, we limit our attention to the joint distribution of (actual and model-based) current and previous period output growth. The extension to an arbitrary (but finite) number of lags of a given variable (or variables) follows immediately, but the computational demand increases substantially as the number of random variables being examined increases. We shall follow:

Corradi, Valentina and Norman R. Swanson, 2007, Evaluation of Dynamic Stochastic General Equilibrium Models Based on Distributional Comparison of Simulated and Historical Data, Journal of Econometrics, 136.

Consider $m$ RBC models, and set model 1 as the benchmark model. In keeping with our focus on current and lagged values of the variable of interest, let
$$Y_t = (\Delta\log X_t,\, \Delta\log X_{t-1}), \qquad Y_{j,n}(\hat\theta_{j,T}) = (\Delta\log X_{j,n}(\hat\theta_{j,T}),\, \Delta\log X_{j,n-1}(\hat\theta_{j,T})).$$
Also, let $F_0(u; \theta_0)$ denote the distribution of $Y_t$ evaluated at $u$, and let $F_j(u; \theta_j^\dagger)$ denote the distribution of $Y_{j,n}(\theta_j^\dagger)$, where $\theta_j^\dagger$ is the probability limit of $\hat\theta_{j,T}$, taken as $T \to \infty$, and where $u \in U \subset \Re^2$, with $U$ possibly unbounded. Accuracy is measured in terms of squared error.

The squared (approximation) error associated with model $j$, $j = 1, \ldots, m$, is measured in terms of the (weighted) average over $U$ of $\big(F_j(u; \theta_j^\dagger) - F_0(u; \theta_0)\big)^2$, where $u \in U$, and $U$ is a possibly unbounded set in $\Re^2$. Thus, the rule is to choose Model 1 over Model 2 if
$$\int_U\big(F_1(u; \theta_1^\dagger) - F_0(u; \theta_0)\big)^2\phi(u)\,du < \int_U\big(F_2(u; \theta_2^\dagger) - F_0(u; \theta_0)\big)^2\phi(u)\,du,$$
where $\int_U\phi(u)\,du = 1$ and $\phi(u) \geq 0$ for all $u \in U \subset \Re^2$. For any evaluation point, this measure defines a norm and is a typical goodness of fit measure.

The hypotheses of interest are:
$$H_0: \max_{j=2,\ldots,m}\int_U\Big[\big(F_0(u; \theta_0) - F_1(u; \theta_1^\dagger)\big)^2 - \big(F_0(u; \theta_0) - F_j(u; \theta_j^\dagger)\big)^2\Big]\phi(u)\,du \leq 0$$
and
$$H_A: \max_{j=2,\ldots,m}\int_U\Big[\big(F_0(u; \theta_0) - F_1(u; \theta_1^\dagger)\big)^2 - \big(F_0(u; \theta_0) - F_j(u; \theta_j^\dagger)\big)^2\Big]\phi(u)\,du > 0.$$
Thus, under $H_0$, no model can provide a better approximation (in a squared error sense) to the distribution of $Y_t$ than the approximation provided by model 1.

If interest focuses on confidence intervals, so that the objective is to approximate $\Pr(\underline{u} \leq Y_t \leq \bar u)$, then the null and alternative hypotheses can be stated as:
$$H_0': \max_{j=2,\ldots,m}\Big[\big((F_1(\bar u; \theta_1^\dagger) - F_1(\underline{u}; \theta_1^\dagger)) - (F_0(\bar u; \theta_0) - F_0(\underline{u}; \theta_0))\big)^2 - \big((F_j(\bar u; \theta_j^\dagger) - F_j(\underline{u}; \theta_j^\dagger)) - (F_0(\bar u; \theta_0) - F_0(\underline{u}; \theta_0))\big)^2\Big] \leq 0$$
versus
$$H_A': \max_{j=2,\ldots,m}\Big[\big((F_1(\bar u; \theta_1^\dagger) - F_1(\underline{u}; \theta_1^\dagger)) - (F_0(\bar u; \theta_0) - F_0(\underline{u}; \theta_0))\big)^2 - \big((F_j(\bar u; \theta_j^\dagger) - F_j(\underline{u}; \theta_j^\dagger)) - (F_0(\bar u; \theta_0) - F_0(\underline{u}; \theta_0))\big)^2\Big] > 0.$$

If interest focuses on testing the null of equal accuracy of two distribution models (analogous to the pairwise conditional mean comparison setup of Diebold and Mariano (1995)), we can simply state the hypotheses as:
$$H_0'': \int_U\Big[\big(F_0(u; \theta_0) - F_1(u; \theta_1^\dagger)\big)^2 - \big(F_0(u; \theta_0) - F_2(u; \theta_2^\dagger)\big)^2\Big]\phi(u)\,du = 0.$$
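A sketch of the squared-distance accuracy measure underlying these hypotheses, in one dimension for simplicity (the notes' setting is bivariate); empirical CDFs stand in for $F_0$ and the models' simulated distributions, and a standard normal density is an illustrative choice of $\phi$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)

def ecdf(sample, u):
    """Empirical CDF of `sample` evaluated at each point of the grid `u`."""
    return (sample[None, :] <= u[:, None]).mean(axis=1)

actual = rng.standard_normal(5000)          # stand-in for historical data Y_t
model1 = 0.9 * rng.standard_normal(5000)    # simulated data from model 1
model2 = rng.standard_normal(5000) + 0.5    # simulated data from model 2

u = np.linspace(-4.0, 4.0, 401)             # evaluation grid U
phi = norm.pdf(u)                           # weight function (integrates to ~1 on U)
du = u[1] - u[0]

def sq_dist(model):
    """Weighted average squared distance between model and actual CDFs."""
    return (((ecdf(model, u) - ecdf(actual, u)) ** 2) * phi).sum() * du

# Choose model 1 over model 2 if its weighted squared distance is smaller
print(sq_dist(model1), sq_dist(model2))
```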


More information

ARIMA Modelling and Forecasting

ARIMA Modelling and Forecasting ARIMA Modelling and Forecasting Economic time series often appear nonstationary, because of trends, seasonal patterns, cycles, etc. However, the differences may appear stationary. Δx t x t x t 1 (first

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

2.5 Forecasting and Impulse Response Functions

2.5 Forecasting and Impulse Response Functions 2.5 Forecasting and Impulse Response Functions Principles of forecasting Forecast based on conditional expectations Suppose we are interested in forecasting the value of y t+1 based on a set of variables

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? CIRJE-F-736 Are Forecast Updates Progressive? Chia-Lin Chang National Chung Hsing University Philip Hans Franses Erasmus University Rotterdam Michael McAleer Erasmus University Rotterdam and Tinbergen

More information

Research Division Federal Reserve Bank of St. Louis Working Paper Series

Research Division Federal Reserve Bank of St. Louis Working Paper Series Research Division Federal Reserve Bank of St. Louis Working Paper Series Tests of Equal Predictive Ability with Real-Time Data Todd E. Clark and Michael W. McCracken Working Paper 2008-029A http://research.stlouisfed.org/wp/2008/2008-029.pdf

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

ECON3327: Financial Econometrics, Spring 2016

ECON3327: Financial Econometrics, Spring 2016 ECON3327: Financial Econometrics, Spring 2016 Wooldridge, Introductory Econometrics (5th ed, 2012) Chapter 11: OLS with time series data Stationary and weakly dependent time series The notion of a stationary

More information

Some Recent Developments in Predictive Accuracy Testing With Nested Models and (Generic) Nonlinear Alternatives

Some Recent Developments in Predictive Accuracy Testing With Nested Models and (Generic) Nonlinear Alternatives Some Recent Developments in redictive Accuracy Testing With Nested Models and (Generic) Nonlinear Alternatives Valentina Corradi 1 and Norman R. Swanson 2 1 University of Exeter 2 Rutgers University August

More information

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E. Forecasting Lecture 3 Structural Breaks Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, 2013 1 / 91 Bruce E. Hansen Organization Detection

More information

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53 State-space Model Eduardo Rossi University of Pavia November 2014 Rossi State-space Model Fin. Econometrics - 2014 1 / 53 Outline 1 Motivation 2 Introduction 3 The Kalman filter 4 Forecast errors 5 State

More information

Non-nested model selection. in unstable environments

Non-nested model selection. in unstable environments Non-nested model selection in unstable environments Raffaella Giacomini UCLA (with Barbara Rossi, Duke) Motivation The problem: select between two competing models, based on how well they fit thedata Both

More information

VAR Models and Applications

VAR Models and Applications VAR Models and Applications Laurent Ferrara 1 1 University of Paris West M2 EIPMC Oct. 2016 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Econometría 2: Análisis de series de Tiempo

Econometría 2: Análisis de series de Tiempo Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 IX. Vector Time Series Models VARMA Models A. 1. Motivation: The vector

More information

Bootstrap Procedures for Recursive Estimation Schemes With Applications to Forecast Model Selection

Bootstrap Procedures for Recursive Estimation Schemes With Applications to Forecast Model Selection Bootstrap rocedures for Recursive Estimation Schemes With Applications to Forecast Model Selection Valentina Corradi and Norman R. Swanson 2 Queen Mary, University of London and 2 Rutgers University June

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Bagging and Forecasting in Nonlinear Dynamic Models

Bagging and Forecasting in Nonlinear Dynamic Models DBJ Discussion Paper Series, No.0905 Bagging and Forecasting in Nonlinear Dynamic Models Mari Sakudo (Research Institute of Capital Formation, Development Bank of Japan, and Department of Economics, Sophia

More information

Forecasting. A lecture on forecasting.

Forecasting. A lecture on forecasting. Forecasting A lecture on forecasting. Forecasting What is forecasting? The estabishment of a probability statement about the future value of an economic variable. Let x t be the variable of interest. Want:

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

Title. Description. var intro Introduction to vector autoregressive models

Title. Description. var intro Introduction to vector autoregressive models Title var intro Introduction to vector autoregressive models Description Stata has a suite of commands for fitting, forecasting, interpreting, and performing inference on vector autoregressive (VAR) models

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

Lecture 6: Dynamic Models

Lecture 6: Dynamic Models Lecture 6: Dynamic Models R.G. Pierse 1 Introduction Up until now we have maintained the assumption that X values are fixed in repeated sampling (A4) In this lecture we look at dynamic models, where the

More information

Vector Auto-Regressive Models

Vector Auto-Regressive Models Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

The regression model with one stochastic regressor (part II)

The regression model with one stochastic regressor (part II) The regression model with one stochastic regressor (part II) 3150/4150 Lecture 7 Ragnar Nymoen 6 Feb 2012 We will finish Lecture topic 4: The regression model with stochastic regressor We will first look

More information

Econometrics of financial markets, -solutions to seminar 1. Problem 1

Econometrics of financial markets, -solutions to seminar 1. Problem 1 Econometrics of financial markets, -solutions to seminar 1. Problem 1 a) Estimate with OLS. For any regression y i α + βx i + u i for OLS to be unbiased we need cov (u i,x j )0 i, j. For the autoregressive

More information

ECON 616: Lecture 1: Time Series Basics

ECON 616: Lecture 1: Time Series Basics ECON 616: Lecture 1: Time Series Basics ED HERBST August 30, 2017 References Overview: Chapters 1-3 from Hamilton (1994). Technical Details: Chapters 2-3 from Brockwell and Davis (1987). Intuition: Chapters

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

Linear Models in Econometrics

Linear Models in Econometrics Linear Models in Econometrics Nicky Grant At the most fundamental level econometrics is the development of statistical techniques suited primarily to answering economic questions and testing economic theories.

More information

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50 GARCH Models Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 50 Outline 1 Stylized Facts ARCH model: definition 3 GARCH model 4 EGARCH 5 Asymmetric Models 6

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Testing for Regime Switching in Singaporean Business Cycles

Testing for Regime Switching in Singaporean Business Cycles Testing for Regime Switching in Singaporean Business Cycles Robert Breunig School of Economics Faculty of Economics and Commerce Australian National University and Alison Stegman Research School of Pacific

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Nonstationary Time Series:

Nonstationary Time Series: Nonstationary Time Series: Unit Roots Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana September

More information

Department of Economics, Vanderbilt University While it is known that pseudo-out-of-sample methods are not optimal for

Department of Economics, Vanderbilt University While it is known that pseudo-out-of-sample methods are not optimal for Comment Atsushi Inoue Department of Economics, Vanderbilt University (atsushi.inoue@vanderbilt.edu) While it is known that pseudo-out-of-sample methods are not optimal for comparing models, they are nevertheless

More information

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014 Warwick Business School Forecasting System Summary Ana Galvao, Anthony Garratt and James Mitchell November, 21 The main objective of the Warwick Business School Forecasting System is to provide competitive

More information

Comparing Nested Predictive Regression Models with Persistent Predictors

Comparing Nested Predictive Regression Models with Persistent Predictors Comparing Nested Predictive Regression Models with Persistent Predictors Yan Ge y and ae-hwy Lee z November 29, 24 Abstract his paper is an extension of Clark and McCracken (CM 2, 25, 29) and Clark and

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs Questions and Answers on Unit Roots, Cointegration, VARs and VECMs L. Magee Winter, 2012 1. Let ɛ t, t = 1,..., T be a series of independent draws from a N[0,1] distribution. Let w t, t = 1,..., T, be

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

1 Introduction to Generalized Least Squares

1 Introduction to Generalized Least Squares ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the

More information

Forecasting the term structure interest rate of government bond yields

Forecasting the term structure interest rate of government bond yields Forecasting the term structure interest rate of government bond yields Bachelor Thesis Econometrics & Operational Research Joost van Esch (419617) Erasmus School of Economics, Erasmus University Rotterdam

More information

Lectures on Structural Change

Lectures on Structural Change Lectures on Structural Change Eric Zivot Department of Economics, University of Washington April5,2003 1 Overview of Testing for and Estimating Structural Change in Econometric Models 1. Day 1: Tests of

More information

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Combining Macroeconomic Models for Prediction

Combining Macroeconomic Models for Prediction Combining Macroeconomic Models for Prediction John Geweke University of Technology Sydney 15th Australasian Macro Workshop April 8, 2010 Outline 1 Optimal prediction pools 2 Models and data 3 Optimal pools

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? MPRA Munich Personal RePEc Archive Are Forecast Updates Progressive? Chia-Lin Chang and Philip Hans Franses and Michael McAleer National Chung Hsing University, Erasmus University Rotterdam, Erasmus University

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY PREFACE xiii 1 Difference Equations 1.1. First-Order Difference Equations 1 1.2. pth-order Difference Equations 7

More information

Reality Checks and Nested Forecast Model Comparisons

Reality Checks and Nested Forecast Model Comparisons Reality Checks and Nested Forecast Model Comparisons Todd E. Clark Federal Reserve Bank of Kansas City Michael W. McCracken Board of Governors of the Federal Reserve System October 2006 (preliminary and

More information

7. Forecasting with ARIMA models

7. Forecasting with ARIMA models 7. Forecasting with ARIMA models 309 Outline: Introduction The prediction equation of an ARIMA model Interpreting the predictions Variance of the predictions Forecast updating Measuring predictability

More information

Research Division Federal Reserve Bank of St. Louis Working Paper Series

Research Division Federal Reserve Bank of St. Louis Working Paper Series Research Division Federal Reserve Bank of St. Louis Working Paper Series Evaluating the Accuracy of Forecasts from Vector Autoregressions Todd E. Clark and Michael W. McCracken Working Paper 2013-010A

More information

Econometrics I. Professor William Greene Stern School of Business Department of Economics 25-1/25. Part 25: Time Series

Econometrics I. Professor William Greene Stern School of Business Department of Economics 25-1/25. Part 25: Time Series Econometrics I Professor William Greene Stern School of Business Department of Economics 25-1/25 Econometrics I Part 25 Time Series 25-2/25 Modeling an Economic Time Series Observed y 0, y 1,, y t, What

More information

Principles of forecasting

Principles of forecasting 2.5 Forecasting Principles of forecasting Forecast based on conditional expectations Suppose we are interested in forecasting the value of y t+1 based on a set of variables X t (m 1 vector). Let y t+1

More information

Augmenting our AR(4) Model of Inflation. The Autoregressive Distributed Lag (ADL) Model

Augmenting our AR(4) Model of Inflation. The Autoregressive Distributed Lag (ADL) Model Augmenting our AR(4) Model of Inflation Adding lagged unemployment to our model of inflationary change, we get: Inf t =1.28 (0.31) Inf t 1 (0.39) Inf t 2 +(0.09) Inf t 3 (0.53) (0.09) (0.09) (0.08) (0.08)

More information

A Forecast Rationality Test that Allows for Loss Function Asymmetries

A Forecast Rationality Test that Allows for Loss Function Asymmetries A Forecast Rationality Test that Allows for Loss Function Asymmetries Andrea A. Naghi University of Warwick This Version: March 2015 Abstract In this paper, we propose a new forecast rationality test that

More information

EVALUATING DIRECT MULTI-STEP FORECASTS

EVALUATING DIRECT MULTI-STEP FORECASTS EVALUATING DIRECT MULTI-STEP FORECASTS Todd Clark and Michael McCracken Revised: April 2005 (First Version December 2001) RWP 01-14 Research Division Federal Reserve Bank of Kansas City Todd E. Clark is

More information

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic Chapter 6 ESTIMATION OF THE LONG-RUN COVARIANCE MATRIX An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic standard errors for the OLS and linear IV estimators presented

More information

1 Teaching notes on structural VARs.

1 Teaching notes on structural VARs. Bent E. Sørensen February 22, 2007 1 Teaching notes on structural VARs. 1.1 Vector MA models: 1.1.1 Probability theory The simplest (to analyze, estimation is a different matter) time series models are

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Ch.10 Autocorrelated Disturbances (June 15, 2016)

Ch.10 Autocorrelated Disturbances (June 15, 2016) Ch10 Autocorrelated Disturbances (June 15, 2016) In a time-series linear regression model setting, Y t = x tβ + u t, t = 1, 2,, T, (10-1) a common problem is autocorrelation, or serial correlation of the

More information

11. Further Issues in Using OLS with TS Data

11. Further Issues in Using OLS with TS Data 11. Further Issues in Using OLS with TS Data With TS, including lags of the dependent variable often allow us to fit much better the variation in y Exact distribution theory is rarely available in TS applications,

More information

Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods

Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods Robert V. Breunig Centre for Economic Policy Research, Research School of Social Sciences and School of

More information

Econometric Forecasting

Econometric Forecasting Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 1, 2014 Outline Introduction Model-free extrapolation Univariate time-series models Trend

More information

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information