
Lecture Notes: Prediction and Simulation Based Specification Testing and Model Selection

Copyright Valentina Corradi and Norman Rasmus Swanson. Contact: nswanson@econ.rutgers.edu

Course Notes Outline - Professor Norman Rasmus Swanson, Rutgers University (nswanson@econ.rutgers.edu)

Predictive and Simulation Based Specification Testing and Model Selection

Part I. Prediction Basics
(i) Introduction
(ii) Optimal Prediction

Part II. Parameter Estimation Error and Bootstrap Techniques
(i) Parameter Estimation Error
(ii) Bootstrap Techniques for Critical Value Construction

Part III. Linear and Nonlinear Predictive Accuracy Testing with Nested and Nonnested Models
(i) Granger Causality
(ii) Comparing the Predictive Accuracy of Two Nested Models

Part IV. Multiple Models, Simulated Data, and In- and Out-of-Sample Specification and Predictive Accuracy Testing
(i) Predictive Accuracy Tests with Recursive Estimation
(ii) Correct In-Sample Specification Testing Using CK Type Tests
(iii) Comparing Discrete Time Models Using Simulated Distributions

Part V. Density Forecast Evaluation
(i) The Kullback-Leibler Information Criterion Approach
(ii) A Predictive Density Accuracy Test for Comparing Multiple Misspecified Models
(iii) Predictive Density Testing for Continuous Time Models
(iv) Testing Using Continuous Time Finance Models

References: All references are listed at the end of the lecture notes.

1 Part I - Prediction Basics

1.1 Introduction

Prediction is an important area of economics. Examples where prediction is used include policy setting at the government level, production and inventory accumulation decisions at the firm level, and investment and asset allocation decisions at the individual level.

Consider the following example. From an economic policy perspective, one of the main uses of econometric and statistical methods is to provide forecasts of macroeconomic and financial variables. For instance: given the rate of inflation over the past twelve months, what will be the rate of inflation next month? What will be the rate two months from now? These predictions have important consequences for the formulation of economic policy (e.g. setting the bank lending rate, etc.).

Suppose that the Federal Reserve Board forecasts a 1.5% annualized inflation rate for July 2008, while the Department of the Treasury provides a forecast of 1%. How can we decide which of the two is more reliable? One key question thus concerns how we can measure the relative accuracy of different forecasts. Different models yield different forecasts, so we want to choose the model producing the most accurate one.

Many econometric techniques deal with the in-sample evaluation of models, and only recently has attention focused on out-of-sample model evaluation. A key difference between the two approaches is that in-sample methods tend to select models that are too large.

* Overfitting is a problem.
* In-sample inference is another.

Some of the issues arising are:

(i) Choice of the loss function. Suppose $X_{t+1}$ is the rate of inflation at time $t+1$, and $X^i_{t+1/t}$ is the rate of inflation forecasted at time $t$ using model $i$.

The forecast error implied by model $i$ is $u_{i,t+1} = X_{t+1} - X^i_{t+1/t}$. We want to choose model $j$ over model $i$ if, on average, model $j$ produces smaller errors. Smaller in which sense?

* Quadratic loss function: choose model $j$ if, on average, $u^2_{j,t+1} < u^2_{i,t+1}$.
* Mean absolute loss function: choose model $j$ if, on average, $|u_{j,t+1}| < |u_{i,t+1}|$.
* Other sorts of loss functions? Direction of change, profitability, etc.
* Sometimes we are more concerned about positive errors than negative errors, or vice versa, so we may want to use an asymmetric loss function, such as the linex (linear exponential) loss.

(ii) Is it possible that for some loss function model $j$ beats model $i$, while for another loss function model $i$ beats model $j$? In general: yes (a small simulation after point (iii) below illustrates this). If we choose the right model, in the sense that we correctly specify the joint distribution of all of the relevant variables, then no other model can win. On the other hand, if the models we compare are misspecified, then the ranking of models is loss function specific (i.e. we would like to assume that all models are approximations to the truth - any other assumption seems overly strong).

(iii) What is the effect of parameter estimation error (i.e. of estimating the models used in prediction)? Suppose we forecast inflation at $t+1$ simply using inflation at time $t$, and we use a simple linear model. Say:
$$X_{t+1} = \beta_0 + \beta_1 X_t + u_{t+1}.$$
The true forecast error is $u_{t+1} = X_{t+1} - \beta_0 - \beta_1 X_t$. However, we have to replace the unknown parameters with estimates, say $b_0$ and $b_1$. Thus the estimated forecast error becomes
$$\hat u_{t+1} = X_{t+1} - b_0 - b_1 X_t = u_{t+1} - (b_0 - \beta_0) - (b_1 - \beta_1)X_t.$$
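Returning to point (ii): the following minimal simulation sketch (not from the notes; the two error distributions are hypothetical) shows how the ranking of two misspecified models can flip between quadratic and absolute loss.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
# Model i: mostly small errors, occasionally very large ones (a mixture).
big = rng.random(T) < 0.1
u_i = np.where(big, rng.normal(scale=5.0, size=T), rng.normal(scale=0.3, size=T))
# Model j: moderate errors throughout.
u_j = rng.normal(scale=1.0, size=T)

print("quadratic loss:", np.mean(u_i**2).round(3), "vs", np.mean(u_j**2).round(3))
print("absolute  loss:", np.mean(np.abs(u_i)).round(3), "vs", np.mean(np.abs(u_j)).round(3))
# Model j wins under quadratic loss, while model i wins under absolute loss:
# with misspecified models, the ranking is loss function specific.
```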

(iv) Choice of forecast horizon. Given information up to time $t$, do we want to forecast inflation at $t+1$, $t+2$, ..., $t+k$? Again, unless we have the right model, model $i$ can beat model $j$ for given forecast horizons, but model $j$ can beat model $i$ for different forecast horizons.

1.2 Optimal Prediction

In this subsection we discuss optimal prediction in the context of various loss functions.

(i) Quadratic loss functions

Consider a time series $y_t$, $t = 1, 2, \ldots, T$. Suppose we want to find the optimal predictor of $y_t$, $h$ steps ahead, using the information available at time $t$. Let $\mathcal{F}_t = \sigma(y_1, \ldots, y_t, X_1, \ldots, X_t)$, where $X_t$ is a (possibly vector valued) other series that may help to predict $y_t$. The optimal $h$-step ahead predictor for $y_{t+h}$ given $\mathcal{F}_t$ is the function $\hat y_{t+h/t}$ such that
$$E(y_{t+h} - \hat y_{t+h/t})^2 < E(y_{t+h} - \tilde y_{t+h/t})^2,$$
for any $\tilde y_{t+h/t} \neq \hat y_{t+h/t}$. We know that
$$\hat y_{t+h/t} = E(y_{t+h}|\mathcal{F}_t),$$
i.e. the best predictor is the conditional expectation of $y_{t+h}$ given $\mathcal{F}_t$. In fact, suppose that $\tilde y_{t+h/t}$ is an $\mathcal{F}_t$-measurable function (e.g. any continuous function of $y_1, \ldots, y_t, X_1, \ldots, X_t$ is $\mathcal{F}_t$-measurable); then
$$E(y_{t+h} - \tilde y_{t+h/t})^2 = E\left((y_{t+h} - E(y_{t+h}|\mathcal{F}_t)) - (\tilde y_{t+h/t} - E(y_{t+h}|\mathcal{F}_t))\right)^2$$

$$= E(y_{t+h} - E(y_{t+h}|\mathcal{F}_t))^2 + E(\tilde y_{t+h/t} - E(y_{t+h}|\mathcal{F}_t))^2 > E(y_{t+h} - E(y_{t+h}|\mathcal{F}_t))^2,$$
as $E\left((y_{t+h} - E(y_{t+h}|\mathcal{F}_t))(\tilde y_{t+h/t} - E(y_{t+h}|\mathcal{F}_t))\right) = 0$. Thus, if we want to minimize the squared error, the conditional expectation is the best predictor.

Prediction with linear models

Suppose that $y_{t+1} = \alpha y_t + \epsilon_{t+1}$, where $\epsilon_t$ is a white noise process with zero mean and variance $\sigma^2_\epsilon$ (i.e. consider an AR(1) process). Then the best one-step predictor is
$$E(y_{t+1}|y_t) = \alpha y_t,$$
and the best $h$-step predictor is
$$E(y_{t+h}|y_t) = \alpha^h y_t;$$
correspondingly, the one-step ahead prediction error is $u_{t+1} = y_{t+1} - \alpha y_t = \epsilon_{t+1}$, and
$$u_{t+h} = y_{t+h} - \alpha^h y_t = \epsilon_{t+h} + \alpha\epsilon_{t+h-1} + \ldots + \alpha^{h-2}\epsilon_{t+2} + \alpha^{h-1}\epsilon_{t+1}.$$
AR(p) processes can be treated in an analogous way.
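A minimal sketch of the AR(1) case just described, assuming an illustrative value of $\alpha$ and unit innovation variance; the $h$-step predictor is $\alpha^h y_t$, and the $h$-step error variance is $\sigma^2_\epsilon\sum_{j=0}^{h-1}\alpha^{2j}$.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, T, h = 0.7, 200, 4           # illustrative parameter, sample size, horizon
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha * y[t - 1] + rng.normal()

y_hat = alpha ** h * y[-1]          # best h-step predictor under quadratic loss
err_var = sum(alpha ** (2 * j) for j in range(h))   # h-step error variance (sigma_eps = 1)
print(f"{h}-step forecast: {y_hat:.3f}, error variance: {err_var:.3f}")
```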

Prediction with nonlinear models

Suppose that
$$y_{t+1} = a\,g(y_t) + \epsilon_{t+1},$$
where, say,
$$g(y_t) = y_t\,\frac{1}{1 + \exp(-y_t)} \quad \text{or} \quad g(y_t) = y_t^2.$$
Now, $E(y_{t+1}|y_t) = a\,g(y_t)$. Note further that:
$$y_{t+2} = a\,g(y_{t+1}) + \epsilon_{t+2} = a\,g(a\,g(y_t) + \epsilon_{t+1}) + \epsilon_{t+2}.$$
Because of the $\epsilon_{t+1}$ term entering into the nonlinear function $g$, it is not immediate how to get the two-step ahead prediction error. In this case we can approximate $E(y_{t+2}|y_t)$ with $a\,g(a\,g(y_t)) = a\,g(E(y_{t+1}|y_t))$.

Broadly speaking, in order to get the $h$-step ahead forecast, we begin by taking the one-step ahead forecast, of which we know the closed form expression; we then predict one period ahead again, replacing $y_{t+1}$ (which is not observable) with $E(y_{t+1}|y_t)$, that is, replacing it with its predicted value given the information at time $t$. We then proceed in the next steps in the same manner.

So far we have considered cases in which $y_t$ depends only on its own past. Consider now the following model:
$$y_{t+1} = \beta_0 + \beta_1 X_t + \epsilon_{t+1},$$
so that $E(y_{t+1}|X_t) = \beta_0 + \beta_1 X_t$. In order to compute $h$-step ahead forecasts, for $h > 1$, we need to know the data generating process of $X_t$. In this case we approximate $X_{t+1}$ with $E(X_{t+1}|X_t)$; that is, we use:
$$E(y_{t+2}|X_t) = \beta_0 + \beta_1 E(X_{t+1}|X_t).$$
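A minimal sketch of the iterated (plug-in) approximation just described, using the logistic-weighted $g$ from the text and a hypothetical value of $a$:

```python
import numpy as np

def g(y):
    return y * 1.0 / (1.0 + np.exp(-y))   # the first g(.) from the text

a, y_t = 0.8, 1.5                  # hypothetical parameter and current value
one_step = a * g(y_t)              # E(y_{t+1}|y_t), known in closed form
two_step = a * g(one_step)         # plug-in approximation to E(y_{t+2}|y_t)
print(one_step, two_step)
# a*g(E(y_{t+1}|y_t)) differs from E(a*g(y_{t+1})|y_t) unless g is linear,
# which is why the two-step forecast is only an approximation.
```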

(ii) Asymmetric loss functions

We have seen that in the case of quadratic loss the best predictor is the conditional mean. In this case the problem of selecting the optimal forecast is equivalent to the problem of correctly specifying the conditional mean. However, there are several instances in which we are more concerned about positive errors ($y_{t+h} - \hat y_{t+h/t} > 0$) than about negative errors ($y_{t+h} - \hat y_{t+h/t} < 0$). Needless to say, arriving at the airport 15 minutes too late is more costly than arriving 15 minutes too early. In this case, then, we want to more heavily penalize positive errors.

Two well known asymmetric loss functions are linex loss (linear exponential loss) and lin-lin (linear-linear) loss. If we use linex loss, then we want to find the predictor $\hat y_{t+h/t}$ such that, for $a \neq 0$,
$$E\left(\exp(a(y_{t+h} - \hat y_{t+h/t})) - a(y_{t+h} - \hat y_{t+h/t}) - 1\right) < E\left(\exp(a(y_{t+h} - \tilde y_{t+h/t})) - a(y_{t+h} - \tilde y_{t+h/t}) - 1\right),$$
for any $\tilde y_{t+h/t} \neq \hat y_{t+h/t}$. Note that for $a > 0$ the loss is approximately linear to the left of the origin, while it is exponential to the right of the origin, and vice versa for $a < 0$. Thus, for $a > 0$, positive errors are considered more costly than negative errors. Christoffersen and Diebold (1997) show that in this case the best predictor is
$$\hat y_{t+h/t} = E(y_{t+h}|\mathcal{F}_t) + \frac{a}{2}Var(y_{t+h}|\mathcal{F}_t),$$
where $Var(y_{t+h}|\mathcal{F}_t)$ is the variance of $y_{t+h}$ conditional on the information available at time $t$. This formula is valid when $y_{t+h}|\mathcal{F}_t \sim N(E(y_{t+h}|\mathcal{F}_t), Var(y_{t+h}|\mathcal{F}_t))$.

Note that for $a > 0$ (more weight on positive errors) the optimal predictor is larger than the optimal MSE predictor. In fact, as we are more concerned about positive errors, we purposely prefer an overestimate: while $E(y_{t+h}|\mathcal{F}_t)$ is an unbiased predictor of $y_{t+h}$ given the information available at time $t$, for $a > 0$,
$$E(y_{t+h}|\mathcal{F}_t) + \frac{a}{2}Var(y_{t+h}|\mathcal{F}_t)$$
is an upwardly biased predictor. In this case, knowledge of the optimal predictor requires knowledge of the joint specification of the conditional mean and variance.

Another asymmetric loss is lin-lin loss. If we use a lin-lin loss, then we want to find the predictor $\hat y_{t+h/t}$ such that
$$E\left(a|y_{t+h} - \hat y_{t+h/t}|\,1\{y_{t+h} > \hat y_{t+h/t}\} + b|y_{t+h} - \hat y_{t+h/t}|\,1\{y_{t+h} \leq \hat y_{t+h/t}\}\right)$$
$$< E\left(a|y_{t+h} - \tilde y_{t+h/t}|\,1\{y_{t+h} > \tilde y_{t+h/t}\} + b|y_{t+h} - \tilde y_{t+h/t}|\,1\{y_{t+h} \leq \tilde y_{t+h/t}\}\right)$$

for any $\tilde y_{t+h/t} \neq \hat y_{t+h/t}$ and $a > 0$, $b > 0$. This loss function increases linearly in the error, but for $a > b$ it more heavily penalizes positive errors. If $y_{t+h}|\mathcal{F}_t$ is $N(E(y_{t+h}|\mathcal{F}_t), Var(y_{t+h}|\mathcal{F}_t))$, then Christoffersen and Diebold show that the optimal predictor under lin-lin loss is given by
$$\hat y_{t+h/t} = E(y_{t+h}|\mathcal{F}_t) + Var(y_{t+h}|\mathcal{F}_t)^{1/2}\,\Phi^{-1}(a/(a+b)),$$
where $\Phi$ denotes the CDF of a standard normal. If $a > b$, then $\Phi^{-1}(a/(a+b)) > 0$ and so the optimal predictor is upwardly biased.

Example - GARCH(1,1)

Consider the following GARCH(1,1) model, with $\omega_1 + \omega_2 < 1$, $\omega_0 > 0$, $\omega_1 > 0$, $\omega_2 > 0$:
$$y_t = \sigma_t\epsilon_t, \quad \epsilon_t \; iid\; N(0,1), \qquad \sigma^2_t = \omega_0 + \omega_1\sigma^2_{t-1} + \omega_2 y^2_{t-1}.$$
Now note that
$$\sigma^2_t = \omega_1^t\sigma^2_0 + \omega_0\sum_{j=0}^{t-1}\omega_1^j + \omega_2\sum_{j=0}^{t-1}\omega_1^j y^2_{t-1-j},$$
and so $\sigma^2_t$ is a measurable function of the past squared returns. Thus the relevant information set is $\mathcal{F}_{t-1} = \sigma(y_1, \ldots, y_{t-1})$. Now, while
$$E(y_t|\mathcal{F}_{t-1}) = E(\sigma_t\epsilon_t|\mathcal{F}_{t-1}) = \sigma_t E(\epsilon_t|\mathcal{F}_{t-1}) = 0,$$
$$E(y^2_t|\mathcal{F}_{t-1}) = Var(y_t|\mathcal{F}_{t-1}) = E(\sigma^2_t\epsilon^2_t|\mathcal{F}_{t-1}) = \sigma^2_t E(\epsilon^2_t|\mathcal{F}_{t-1}) = \sigma^2_t.$$
Hence, if the loss function is quadratic, the optimal predictor is $E(y_t|\mathcal{F}_{t-1}) = 0$. If instead the loss function is linex with $a = 1$, the best predictor is $E(y_t|\mathcal{F}_{t-1}) + 0.5\,Var(y_t|\mathcal{F}_{t-1}) = 0.5\sigma^2_t$. Finally, if the loss is lin-lin with parameters $a = 1$ and $b = 2$, the optimal predictor is $\sigma_t\Phi^{-1}(1/3) \approx -0.43\sigma_t$.
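A small sketch computing the three optimal predictors of the GARCH example for a given (hypothetical) value of $\sigma_t$, under quadratic, linex ($a = 1$) and lin-lin ($a = 1$, $b = 2$) loss; the formulas follow Christoffersen and Diebold as stated above.

```python
from scipy.stats import norm

sigma_t = 2.0                           # conditional std. dev., illustrative value
mse_pred = 0.0                          # quadratic loss: the conditional mean
linex_pred = 0.5 * sigma_t**2           # linex with a = 1: mean + (a/2)*variance
linlin_pred = sigma_t * norm.ppf(1/3)   # lin-lin with a = 1, b = 2: the 1/3 quantile
print(mse_pred, linex_pred, linlin_pred)   # 0.0, 2.0, about -0.86
```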

(iii) Comparing possibly misspecified forecasting models

So far we have considered the issue of optimal prediction for given loss functions. In practice, the true data generating process (DGP) is unknown, and so we form optimal predictions for given models, which may be dynamically misspecified.

For example, suppose that we believe that $y_t$ follows an AR(1) process, so that the optimal $h$-step ahead predictor is $\alpha^h y_t$. However, if for example the DGP is a SETAR (self-exciting threshold autoregressive) process, say
$$y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-1}1\{y_{t-1} > \tau\} + \epsilon_t,$$
then the optimal forecast under the AR(1) assumption is clearly not optimal at all.

Furthermore, in practice we need to define the relevant information set. Again, suppose that $y_t$ is an AR(2) process, but we are just considering an AR(1) model. Then $\alpha^h y_t$ is indeed the optimal predictor under quadratic loss for the information set $\mathcal{F}_t = \sigma(y_t)$; but we are neglecting the information contained in $y_{t-1}$. In this case $E(y_{t+h}|y_t)$ is indeed correctly specified, but
$$E(y_{t+h}|y_t) \neq E(y_{t+h}|y_t, y_{t-1}),$$
so there is dynamic misspecification.

Finally, the $h$-step ahead prediction error is heteroskedastic and autocorrelated and, in the case of linex loss for example, failing to take this into consideration would lead to a non-optimal forecast.

Thus, in practice we want to be able to compare the relative predictive ability of two (or more) possibly misspecified models. Note that the ranking of the models, in the misspecified case, is loss function specific. On the other hand, if we correctly specify all conditional aspects, then the right model will beat all competitors, regardless of the loss function choice.

Diebold and Mariano (1995) propose a test for the null hypothesis of equal predictive ability, against the alternative of unequal predictive ability. For the time being, we neglect the issue of parameter estimation error. Let $u_{0,t+h}$ and $u_{1,t+h}$ be the $h$-step ahead prediction errors, when predicting $y_{t+h}$ using information available up to time $t$. For example, for $h = 1$,
$$u_{0,t+1} = y_{t+1} - \beta_{01} - \beta_{02}y_t - \beta_{03}x_t$$
and
$$u_{1,t+1} = y_{t+1} - \beta_{11} - \beta_{12}y_t - \beta_{13}z_t.$$
It is important that the two models we are comparing be nonnested (i.e. neither is a special case of the other). We shall later see why this is important.

Under the assumption that $u_{0,t}$ and $u_{1,t}$ are strictly stationary, the hypotheses for this test of equal predictive accuracy are:
$$H_0: E(f(u_{0,t}) - f(u_{1,t})) = 0$$
and
$$H_A: E(f(u_{0,t}) - f(u_{1,t})) \neq 0,$$
where $f$ is some continuous, positive valued loss function. The relevant statistic is
$$DM_T = \frac{T^{-1/2}\sum_{t=1}^{T-1}(f(u_{0,t+1}) - f(u_{1,t+1}))}{\hat\sigma},$$
where $\hat\sigma^2$ is a consistent estimator of
$$\lim_{T\to\infty}Var\left(T^{-1/2}\sum_{t=1}^{T-1}(f(u_{0,t+1}) - f(u_{1,t+1}))\right).$$
Note why we require non-nestedness. Suppose that model 1 is nested in model 0, e.g.
$$u_{0,t+1} = y_{t+1} - \beta_{01} - \beta_{02}y_t - \beta_{03}x_t \quad \text{and} \quad u_{1,t+1} = y_{t+1} - \beta_{11} - \beta_{12}y_t.$$

If model 0 is indeed correctly dynamically specified for the conditional mean, then the null is equivalent to $\beta_{01} = \beta_{11}$, $\beta_{02} = \beta_{12}$, and $\beta_{03} = 0$, and so under the null $u_{0,t} = u_{1,t}$ for all $t$. In practice, as we shall see below, we do not observe $u_{0,t}$ and $u_{1,t}$; we only observe $\hat u_{0,t}$ and $\hat u_{1,t}$, which depend on estimated parameters. But we still have that $\hat\sigma$ and $T^{-1/2}\sum_{t=1}^{T-1}(f(u_{0,t+1}) - f(u_{1,t+1}))$ go to zero in probability, and the statistic no longer has a well defined limiting distribution.

As we allow for dynamic misspecification under both hypotheses, in general $u_{0,t}$ and $u_{1,t}$ are not martingale difference sequences (i.e. $E(u_{0,t}|\mathcal{F}_{t-1}) \neq 0$) and are in general autocorrelated. Thus, we need to use a heteroskedasticity and autocorrelation robust (HAC) estimator for the long run variance. We can use a Newey-West (1987) type estimator, for example. Namely, define
$$\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T-1}(d_{t+1} - \bar d)^2 + \frac{2}{T}\sum_{\tau=1}^{l}w_\tau\sum_{t=\tau+1}^{T-1}(d_{t+1} - \bar d)(d_{t+1-\tau} - \bar d),$$
where
$$d_{t+1} = f(u_{0,t+1}) - f(u_{1,t+1}), \quad \bar d = \frac{1}{T}\sum_{t=1}^{T-1}d_{t+1}, \quad \text{and} \quad w_\tau = 1 - \frac{\tau}{l+1}.$$

The following is needed in the sequel.

Assumption A: (i) $(u_{0,t}, u_{1,t})$ is a strictly stationary and strong mixing process with size $-2r/(r-1)$, with $r > 1$; (ii) $E(f(u_{i,t})^4) < \infty$, $i = 0, 1$.

ASIDE: Broadly speaking, $u_{0,t}$ is a strong mixing process if it is asymptotically independent, i.e. if $u_{0,t}$ is (asymptotically) independent of its infinite past. More formally, define $\mathcal{F}^n = \sigma(\ldots, u_{0,n-1}, u_{0,n})$ to be the information set generated by the history of the series up to time $n$, and analogously $\mathcal{F}_{n+m} = \sigma(u_{0,n+m}, u_{0,n+m+1}, \ldots)$ is the information set generated by the history of the series from time $n+m$ up to infinity. More precisely, if $u_{0,t}$ is a strong mixing process, then for any $u_{0,t} \in \mathcal{F}_{n+m}$, $E(u_{0,t} - E(u_{0,t}|\mathcal{F}^n))^2$

goes to zero as $m \to \infty$. The size has to do with the rate at which this quantity goes to zero as $m \to \infty$.

Proposition 1: Let Assumption A hold. Then, as $T \to \infty$, $l \to \infty$ and $l/T^{1/4} \to 0$: under $H_0$,
$$DM_T \stackrel{d}{\to} N(0,1),$$
and under $H_A$, for any $\varepsilon > 0$,
$$\Pr\left(T^{-1/2}|DM_T| > \varepsilon\right) \to 1.$$
Thus, we compare $DM_T$ with the critical values of a standard normal random variable. Suppose that we do not reject $H_0$ if $-1.96 \leq DM_T \leq 1.96$, and otherwise we reject $H_0$. This gives a test with asymptotic size equal to 0.05 and unit asymptotic power. Note that the same result holds for a generic forecast horizon, i.e. for $h > 1$.

Proof - Sketch: Under both hypotheses,
$$\hat\sigma^2 \stackrel{pr}{\to} \sigma^2_0 = \lim_{T\to\infty}Var\left(T^{-1/2}\sum_{t=1}^{T-1}d_{t+1}\right).$$
Also, by the central limit theorem for mixing processes,
$$T^{-1/2}\sum_{t=1}^{T-1}(d_{t+1} - E(d_t)) \stackrel{d}{\to} N(0, \sigma^2_0).$$
Thus, when $E(d_t) = 0$, i.e. when the null is true, $T^{-1/2}\sum_{t=1}^{T-1}d_{t+1} \stackrel{d}{\to} N(0, \sigma^2_0)$, while when $E(d_t) \neq 0$, i.e. under the alternative, $T^{-1/2}\sum_{t=1}^{T-1}d_{t+1}$ diverges at rate $T^{1/2}$.

Note that many applied practitioners do not even implement the simple DM test, instead relying on point estimates of mean square errors and related statistics when comparing alternative prediction models.
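For concreteness, here is a minimal sketch of the DM test with a Newey-West variance estimator; parameter estimation error is ignored, as in this subsection, and the simulated error series and the lag truncation rule are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def dm_test(u0, u1, loss=np.square, l=None):
    d = loss(u0) - loss(u1)                 # loss differential d_t
    T = d.shape[0]
    l = l or int(T ** 0.25)                 # lag truncation, consistent with l/T^{1/4} -> 0
    dbar = d.mean()
    dd = d - dbar
    s2 = dd @ dd / T                        # tau = 0 term
    for tau in range(1, l + 1):
        w = 1.0 - tau / (l + 1.0)           # Bartlett weights
        s2 += 2.0 * w * (dd[tau:] @ dd[:-tau]) / T
    stat = np.sqrt(T) * dbar / np.sqrt(s2)
    return stat, 2 * (1 - norm.cdf(abs(stat)))   # two-sided normal p-value

rng = np.random.default_rng(2)
u0, u1 = rng.normal(size=500), rng.normal(size=500)
print(dm_test(u0, u1))   # equal accuracy by construction: should not reject
```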

2 Part II - Parameter Estimation Error and Bootstrap Techniques

2.1 Parameter Estimation Error

In practice, we do not observe the true forecast error. For simplicity, consider
$$u_{0,t+1} = y_{t+1} - \beta_{01} - \beta_{02}y_t - \beta_{03}x_t.$$
However, we do not know the parameter vector. Thus, we need to replace the parameters with their estimators, and take into account the error due to the fact that the parameters are estimated. There are three main sampling schemes: (i) the fixed scheme, (ii) the recursive scheme, and (iii) the rolling scheme. When interested in out-of-sample forecasting, and when we need to estimate parameters, we typically split the sample into two subsamples: a regression period, with $R$ observations, and a prediction period, with $P$ observations, where $T = R + P$.

Fixed estimation scheme: Use the first $R$ observations to estimate the parameters, call them $\hat\beta_R$, and construct a sequence of $P$ prediction errors, defined as
$$\hat u_{0,t+1} = y_{t+1} - \hat\beta_{01,R} - \hat\beta_{02,R}y_t - \hat\beta_{03,R}x_t,$$
for $t = R, \ldots, R + P - 1$.

Recursive estimation scheme: Use the first $R$ observations to compute $\hat\beta_R$, and construct the first prediction error:
$$\hat u_{0,R+1} = y_{R+1} - \hat\beta_{01,R} - \hat\beta_{02,R}y_R - \hat\beta_{03,R}x_R.$$
Then use all observations up to time $R+1$ to construct $\hat\beta_{R+1}$, and get the second prediction error:
$$\hat u_{0,R+2} = y_{R+2} - \hat\beta_{01,R+1} - \hat\beta_{02,R+1}y_{R+1} - \hat\beta_{03,R+1}x_{R+1}.$$
Proceed in the same manner until you have a sequence of $P$ prediction errors, defined as:
$$\hat u_{0,t+1} = y_{t+1} - \hat\beta_{01,t} - \hat\beta_{02,t}y_t - \hat\beta_{03,t}x_t,$$

for $t = R, \ldots, R + P - 1$, where $\hat\beta_t$ is the estimator computed using observations up to time $t$.

Rolling estimation scheme: Use the first $R$ observations to compute $\hat\beta_R$, and construct the first prediction error:
$$\hat u_{0,R+1} = y_{R+1} - \hat\beta_{01,R} - \hat\beta_{02,R}y_R - \hat\beta_{03,R}x_R.$$
Then observations from $t = 2$ up to $t = R + 1$ are used to construct $\hat\beta_{2,R+1}$, and a second prediction error is constructed:
$$\hat u_{0,R+2} = y_{R+2} - \hat\beta_{01,2,R+1} - \hat\beta_{02,2,R+1}y_{R+1} - \hat\beta_{03,2,R+1}x_{R+1}.$$
Thereafter, use observations from $t = 3$ to $t = R + 2$ and obtain another prediction error. Proceed in the same manner, estimating the parameters using the most recent $R$ observations, until you have a sequence of $P$ prediction errors:
$$\hat u_{0,t+1} = y_{t+1} - \hat\beta_{01,t-R+1,t} - \hat\beta_{02,t-R+1,t}y_t - \hat\beta_{03,t-R+1,t}x_t,$$
for $t = R, \ldots, R + P - 1$, where $\hat\beta_{t-R+1,t}$ is the estimator computed using observations from time $t - R + 1$ up to time $t$, that is, using the most recent $R$ observations (see West and McCracken (1998) for an overview of the properties of the various sampling schemes).

The most commonly used approach is the recursive scheme. Intuitively, it makes sense to use the information contained in new observations as soon as it becomes available. However, one must also be aware of structural breaks due to changing data definitions, changing model specification, changing tastes and preferences, etc. The three schemes are sketched in code below.
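A minimal sketch of the three schemes for one-step forecast errors from a simple AR(1) regression; the DGP, parameter values, and sample split are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, R = 300, 200                          # P = T - R out-of-sample observations
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 + 0.6 * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(T - 1), y[:-1]])   # regressors: (1, y_t)
Y = y[1:]                                        # target: y_{t+1}

def ols(a, b):
    return np.linalg.lstsq(a, b, rcond=None)[0]

errors = {"fixed": [], "recursive": [], "rolling": []}
b_fix = ols(X[:R], Y[:R])                        # fixed: estimate once on first R obs.
for t in range(R, T - 1):
    errors["fixed"].append(Y[t] - X[t] @ b_fix)
    errors["recursive"].append(Y[t] - X[t] @ ols(X[:t], Y[:t]))          # all obs. up to t
    errors["rolling"].append(Y[t] - X[t] @ ols(X[t - R:t], Y[t - R:t]))  # most recent R obs.
for k, v in errors.items():
    print(k, np.mean(np.square(v)).round(4))
```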

16 and û,t+ = y t+ w,t β,t, where the parameters have been estimated recursively. We observe û 0,t+ and û,t+, and so construct the Diebold-Mariano statistic using the out of sample period, as: DM P = P /2 σ P t=r fû 0,t+ fû t+, with where σ 2 P = P t=r d t+ d l P w τ τ= t=r+τ+ d t+ = fû 0,t+ fû t+ d t+ d 2 d t+ τ d, and d = d t+. t= Assume that f is a differentiable function this rules out Lin-lin loss, for example, via a mean value expansion around β 0 and β, we have: P /2 t=r fû 0,t+ fû t+ = P /2 t=r fu 0,t+ fu t+ 2 where +P t=r +P t=r β0 fũ 0,t+ P /2 β 0t β 0 β fũ,t+ P /2 β t β, 3 ũ 0,t+ = y t+ w 0,t β 0,t with β 0,t β 0, β t and ũ,t+ is defined in an analogous manner. Note that the term on the right hand side of 2 is the same term we had in the absence of parameter estimation error i.e. as if we knew the parameters. he main issue we shall address is the following: Do the last two terms above vanish in probability as the sample gets large? In other words, does the effect of parameter estimation error vanish as the sample size get large? Under which conditions will it vanish? 6

We shall show that, in the context of DM type tests:

(a) Regardless of the choice of the loss function $f$, parameter estimation error vanishes if, as $T \to \infty$, $P/R \to 0$, i.e. if the estimation period grows at a faster rate than the prediction period. Suppose that $T = 10100$, $R = 10000$ and $P = 100$; in this case $P = R^{1/2}$, so $P/R = R^{-1/2} \to 0$ as $R \to \infty$. In general, suppose that $P = T^\delta$, $\delta < 1$, and $R = T - T^\delta$, so that $T = R + P$. In this case $P/R = T^\delta/(T - T^\delta) \to 0$ as $T \to \infty$. In practice, this occurs when the period used for estimation is much longer than the period used for out-of-sample forecasting.

(b) If the same loss function is used for estimation and out-of-sample prediction, then parameter estimation error vanishes regardless of the relative rates at which $P$ and $R$ grow as the sample size gets large (for the case of the DM test). This is, for example, the case in which we use (nonlinear or ordinary) least squares for estimation and we employ a quadratic (MSE) loss function. More generally, this occurs when we estimate parameters via an m-estimator and we use the same loss function for out-of-sample prediction, as we shall see below. Several authors (see e.g. Granger (1969) and Weiss (1996)) point out that the right way to proceed is to use the same loss for estimation and prediction.

(c) Finally, if $P/R \to \pi > 0$ and we use a different loss function for estimation and prediction, then the contribution of parameter estimation error does not vanish. In particular, it will affect the covariance of the limiting distribution, and we need to take it into account if we want to perform valid inference.

(ii) m-Estimators

Let:
$$\hat\theta_T = \arg\min_{\theta \in \Theta}\frac{1}{T}\sum_{t=1}^{T}m(y_t, X_t, \theta).$$
Examples:
$$m(y_t, X_t, \theta) = (y_t - X_t'\theta)^2 \quad \text{(OLS)}$$
$$m(y_t, X_t, \theta) = (y_t - h(X_t, \theta))^2 \quad \text{(Nonlinear Least Squares)}$$
$$m(y_t, X_t, \theta) = -\log f(y_t|X_t; \theta) \quad \text{(Maximum Likelihood or Quasi-MLE (QMLE))}.$$

Now, define:
$$\theta^\dagger = \arg\min_{\theta \in \Theta}E(m(y_t, X_t, \theta)).$$
Note that, given the above expression for $\hat\theta_T$,
$$\frac{1}{T}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \hat\theta_T) = 0,$$
because of the first order conditions. Also,
$$\nabla_\theta E(m(y_t, X_t, \theta))\Big|_{\theta = \theta^\dagger} = E(\nabla_\theta m(y_t, X_t, \theta^\dagger)) = 0.$$
If the uniform law of large numbers holds, that is, if
$$\sup_{\theta \in \Theta}\left|\frac{1}{T}\sum_{t=1}^{T}m(y_t, X_t, \theta) - E(m(y_t, X_t, \theta))\right| \stackrel{pr}{\to} 0,$$
then $\hat\theta_T \stackrel{pr}{\to} \theta^\dagger$ (consistency). Now, by a mean value expansion around $\theta^\dagger$,
$$\frac{1}{T}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \hat\theta_T) = \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \theta^\dagger) + \frac{1}{T}\sum_{t=1}^{T}\nabla^2_\theta m(y_t, X_t, \bar\theta_T)(\hat\theta_T - \theta^\dagger),$$
where $\bar\theta_T \in (\hat\theta_T, \theta^\dagger)$. Now, $\frac{1}{T}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \hat\theta_T) = 0$. Thus:
$$\sqrt{T}(\hat\theta_T - \theta^\dagger) = -\left(\frac{1}{T}\sum_{t=1}^{T}\nabla^2_\theta m(y_t, X_t, \bar\theta_T)\right)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \theta^\dagger)$$
$$= -E\left(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)\right)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \theta^\dagger)$$
$$- \left(\left(\frac{1}{T}\sum_{t=1}^{T}\nabla^2_\theta m(y_t, X_t, \bar\theta_T)\right)^{-1} - E\left(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)\right)^{-1}\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \theta^\dagger).$$
Now, if the uniform law of large numbers holds, that is, if
$$\sup_{\theta \in \Theta}\left\|\frac{1}{T}\sum_{t=1}^{T}\nabla^2_\theta m(y_t, X_t, \theta) - E(\nabla^2_\theta m(y_t, X_t, \theta))\right\| \stackrel{pr}{\to} 0,$$

and $E(\nabla^2_\theta m(y_t, X_t, \theta^\dagger))$ is a positive definite matrix, then
$$\left(\frac{1}{T}\sum_{t=1}^{T}\nabla^2_\theta m(y_t, X_t, \bar\theta_T)\right)^{-1} - E\left(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)\right)^{-1} \stackrel{pr}{\to} 0.$$
Note that $E(\nabla_\theta m(y_t, X_t, \theta^\dagger)) = 0$, by the first order conditions. Under regularity conditions (see e.g. West (1996)), the central limit theorem applies and
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \theta^\dagger) \stackrel{d}{\to} N(0, V), \quad \text{where } V = \lim_{T\to\infty}Var\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \theta^\dagger)\right).$$
Now, the last term in the expansion above is the product of something going to zero in probability with something converging in distribution, and therefore it goes to zero in probability (product rule). Thus,
$$\sqrt{T}(\hat\theta_T - \theta^\dagger) \stackrel{d}{\to} N(0, M^{-1}VM^{-1}), \quad \text{with } M = E(\nabla^2_\theta m(y_t, X_t, \theta^\dagger)).$$
Now, we need estimators of $M$ and $V$; call them $\hat M_T$ and $\hat V_T$. By the uniform law of large numbers, a consistent estimator of $M$ is given by
$$\hat M_T = \frac{1}{T}\sum_{t=1}^{T}\nabla^2_\theta m(y_t, X_t, \hat\theta_T).$$
As for the estimator of $V$: if $E(\nabla_\theta m(y_t, X_t, \theta^\dagger)\nabla_\theta m(y_s, X_s, \theta^\dagger)') = 0$ for all $t \neq s$, then
$$\hat V_T = \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \hat\theta_T)\nabla_\theta m(y_t, X_t, \hat\theta_T)';$$
if instead $E(\nabla_\theta m(y_t, X_t, \theta^\dagger)\nabla_\theta m(y_s, X_s, \theta^\dagger)') \neq 0$ for some $t \neq s$, then we need to use a Newey-West HAC (heteroskedasticity and autocorrelation robust) estimator. In this case,
$$\hat V_T = \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta m(y_t, X_t, \hat\theta_T)\nabla_\theta m(y_t, X_t, \hat\theta_T)' + \frac{2}{T}\sum_{\tau=1}^{l}w_\tau\sum_{t=\tau+1}^{T}\nabla_\theta m(y_t, X_t, \hat\theta_T)\nabla_\theta m(y_{t-\tau}, X_{t-\tau}, \hat\theta_T)'.$$
Under the same type of assumptions as in West (1996),
$$\left(\hat M_T^{-1}\hat V_T\hat M_T^{-1}\right)^{-1/2}\sqrt{T}(\hat\theta_T - \theta^\dagger) \stackrel{d}{\to} N(0, I).$$
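A small sketch of the sandwich covariance $\hat M_T^{-1}\hat V_T\hat M_T^{-1}$ for the OLS case $m = (y_t - X_t'\theta)^2$, with a Newey-West choice of $\hat V_T$; the simulated data and the lag truncation rule are illustrative choices.

```python
import numpy as np

def sandwich_cov(x, y, l):
    T, k = x.shape
    theta = np.linalg.lstsq(x, y, rcond=None)[0]
    u = y - x @ theta
    s = -2.0 * x * u[:, None]            # scores: gradient of (y - x'theta)^2
    M = 2.0 * x.T @ x / T                # Hessian: (1/T) sum 2*x*x'
    V = s.T @ s / T                      # tau = 0 term
    for tau in range(1, l + 1):
        w = 1.0 - tau / (l + 1.0)        # Bartlett weights
        G = s[tau:].T @ s[:-tau] / T
        V += w * (G + G.T)
    Minv = np.linalg.inv(M)
    return Minv @ V @ Minv / T           # approximate covariance of theta-hat

rng = np.random.default_rng(4)
T = 400
x = np.column_stack([np.ones(T), rng.normal(size=T)])
y = x @ np.array([1.0, 0.5]) + rng.normal(size=T)
print(np.sqrt(np.diag(sandwich_cov(x, y, l=int(T ** 0.25)))))   # HAC standard errors
```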

Note that $V = M$ is equivalent to the condition of spherical errors in the linear model.

Turning to our example, let
$$\hat u_{0,t+1} = y_{t+1} - w_{0,t}'\hat\beta_{0,t}.$$
Suppose that $\hat\beta_{0,t}$ is an m-estimator, defined as
$$\hat\beta_{0,t} = \arg\min_{\beta_0 \in B_0}\frac{1}{t}\sum_{j=2}^{t}m(y_j - w_{0,j-1}'\beta_0).$$
Note that if $m$ is a quadratic function, then
$$\hat\beta_{0,t} = \arg\min_{\beta_0 \in B_0}\frac{1}{t}\sum_{j=2}^{t}(y_j - w_{0,j-1}'\beta_0)^2,$$
and so $\hat\beta_{0,t}$ is the OLS estimator. Also define
$$\beta_0^\dagger = \arg\min_{\beta_0 \in B_0}E\left(m(y_j - w_{0,j-1}'\beta_0)\right),$$
so that if $m$ is a quadratic function, then
$$\beta_0^\dagger = \arg\min_{\beta_0 \in B_0}E(y_j - w_{0,j-1}'\beta_0)^2,$$
and if model 0 is indeed correctly specified, then $\beta_0^\dagger$ denotes the parameter of the conditional expectation. $\hat\beta_{1,t}$ and $\beta_1^\dagger$ can be defined in the same manner.

We want to test
$$H_0: E(f(u_{0,t}) - f(u_{1,t})) = 0 \quad \text{versus} \quad H_A: E(f(u_{0,t}) - f(u_{1,t})) \neq 0,$$
where
$$u_{0,t+1} = y_{t+1} - w_{0,t}'\beta_0^\dagger \quad \text{and} \quad u_{1,t+1} = y_{t+1} - w_{1,t}'\beta_1^\dagger.$$
From the DM statistic, consider:
$$P^{-1/2}\sum_{t=R}^{T-1}\left(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\right) = P^{-1/2}\sum_{t=R}^{T-1}\left(f(u_{0,t+1}) - f(u_{1,t+1})\right)$$

$$+ \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) - \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_1}f(u_{1,t+1})'P^{1/2}(\hat\beta_{1,t} - \beta_1^\dagger) + o_P(1),$$
where the $o_P(1)$ term absorbs the difference between gradients evaluated at mean values $\bar\beta_{0,t} \in (\hat\beta_{0,t}, \beta_0^\dagger)$ (and analogously for model 1) and gradients evaluated at the pseudo-true values. The first term on the right hand side is the DM statistic for the case in which we know the underlying parameters. For the sake of simplicity, let us concentrate on the term involving $\hat\beta_{0,t}$. Let $m_t(\beta_0) = m(y_t - w_{0,t-1}'\beta_0)$. By a mean value expansion around $\beta_0^\dagger$,
$$\frac{1}{t}\sum_{j=2}^{t}\nabla_\beta m_j(\hat\beta_{0,t}) = \frac{1}{t}\sum_{j=2}^{t}\nabla_\beta m_j(\beta_0^\dagger) + \frac{1}{t}\sum_{j=2}^{t}\nabla^2_\beta m_j(\bar\beta_{0,t})(\hat\beta_{0,t} - \beta_0^\dagger),$$
with $\bar\beta_{0,t} \in (\hat\beta_{0,t}, \beta_0^\dagger)$. Now, the left hand side above is identically zero by the first order conditions; thus
$$t^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = -\left(\frac{1}{t}\sum_{j=2}^{t}\nabla^2_\beta m_j(\bar\beta_{0,t})\right)^{-1}\frac{1}{t^{1/2}}\sum_{j=2}^{t}\nabla_\beta m_j(\beta_0^\dagger).$$
Hereafter, let $f_t(\beta_0) = f(u_t(\beta_0)) = f(y_{t+1} - w_{0,t}'\beta_0)$, and let $f_t(\beta_1) = f(u_t(\beta_1)) = f(y_{t+1} - w_{1,t}'\beta_1)$. Along the lines of West (1996), we now state the following assumptions.

Assumption A1: $f(u_{i,t})$ is twice continuously differentiable in $\beta_i$, and
$$\sup_{\beta_i \in B_i}\left\|\partial^2 f_t(\beta_i)/\partial\beta_i\partial\beta_i'\right\| < C, \quad i = 0, 1.$$

Assumption A2: $\sup_t\left\|\frac{1}{t}\sum_{j=2}^{t}\nabla^2_\beta m_j(\bar\beta_{i,t}) - B_i^{-1}\right\| \stackrel{a.s.}{\to} 0$, where $B_i$ is negative definite, $i = 0, 1$.

Assumption A3: (i) $(y_t, x_t, w_{i,t})$, $i = 0, 1$, is a strictly stationary strong mixing sequence with size $-4(4+\psi)/\psi$, $\psi > 0$; (ii) $f$ and $m$ are twice continuously differentiable in $\beta$ over the interior of $B$, and $\nabla_\beta m$, $\nabla^2_\beta m$, $\nabla_\beta f$, $\nabla^2_\beta f$ are $2r$-dominated (more simply, have $2r$ finite moments) uniformly in $B$, with $r \geq 2(2+\psi)$.

Assumption A4: $\beta_i^\dagger$ is uniquely identified, i.e. $E(m(y_j - w_{i,j-1}'\beta_i^\dagger)) < E(m(y_j - w_{i,j-1}'\beta_i))$ for any $\beta_i \neq \beta_i^\dagger$, $i = 0, 1$.

Assumption A5: $T = R + P$ and, as $T \to \infty$, $P, R \to \infty$ and $P/R \to \pi$, with $0 \leq \pi \leq \infty$.

Hereafter, the notation $o_P(1)$ denotes a term which approaches zero in probability. Recall that we are considering
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger).$$

Proposition 2: Let Assumptions A1, A2, A3, A4 and A5 hold. Then:

(i) If $f = m$ (i.e. if we are using the same loss for estimation and testing), then:
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = o_P(1).$$

(ii) If $\pi = 0$ (i.e. if, as $T \to \infty$, $P/R \to 0$), then:
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = o_P(1).$$

(iii) In all other cases (i.e. $\pi > 0$ and $f \neq m$):
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) \stackrel{d}{\to} N(0, 2\Pi F_0'B_0 S_{h_0h_0}B_0'F_0),$$
where $\Pi = 1 - \pi^{-1}\ln(1+\pi)$ for $0 < \pi < \infty$, and $\Pi = 1$ for $\pi = \infty$. Also,
$$F_0 = E(\nabla_{\beta_0}f(u_{0,t+1})), \quad S_{h_0h_0} = \sum_{j=-\infty}^{\infty}E\left(\nabla_\beta m_1(\beta_0^\dagger)\nabla_\beta m_{1+j}(\beta_0^\dagger)'\right),$$

and $B_0 = \left(E(\nabla^2_\beta m_t(\beta_0^\dagger))\right)^{-1}$.

Proof - Sketch:

(i) From Lemma A3 in West (1996), for all $a < 0.5$, $\sup_{t \geq R}\left\|t^a(\hat\beta_{0,t} - \beta_0^\dagger)\right\| = o_P(1)$. Now,
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger)$$
can be written as
$$E(\nabla_{\beta_0}f(u_{0,t+1}))'\frac{1}{P}\sum_{t=R}^{T-1}P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) + \frac{1}{P}\sum_{t=R}^{T-1}\left(\nabla_{\beta_0}f(u_{0,t+1}) - E(\nabla_{\beta_0}f(u_{0,t+1}))\right)'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger).$$
The second term is $o_P(1)$, as $\sup_{t \geq R}\|\hat\beta_{0,t} - \beta_0^\dagger\| = o_P(1)$ and
$$P^{-1/2}\sum_{t=R}^{T-1}\left(\nabla_{\beta_0}f(u_{0,t+1}) - E(\nabla_{\beta_0}f(u_{0,t+1}))\right) = O_P(1),$$
because of the central limit theorem, given A3 and A4. Now, if $f = m$,
$$E(\nabla_{\beta_0}m(u_{0,t+1})) = E\left(\nabla_\beta m(y_{t+1} - w_{0,t}'\beta_0^\dagger)\right) = 0,$$
because of the first order conditions defining $\beta_0^\dagger$, so the first term vanishes as well.

(ii) Note that
$$\left|\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger)\right| \leq \left(\frac{1}{P}\sum_{t=R}^{T-1}\left\|\nabla_{\beta_0}f(u_{0,t+1})\right\|\right)\sup_{t \geq R}\left\|P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger)\right\|,$$
and
$$\frac{1}{P}\sum_{t=R}^{T-1}\left\|\nabla_{\beta_0}f(u_{0,t+1})\right\| \stackrel{pr}{\to} E\left\|\nabla_{\beta_0}f(u_{0,t+1})\right\| < \infty,$$

while, as $P/R \to 0$ and $\sup_{t \geq R}\|t^a(\hat\beta_{0,t} - \beta_0^\dagger)\| = o_P(1)$,
$$\sup_{t \geq R}\left\|P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger)\right\| = o_P(1).$$

(iii) Given the decomposition in (i) and the expansion of $t^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger)$ above,
$$\frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_0}f(u_{0,t+1})'P^{1/2}(\hat\beta_{0,t} - \beta_0^\dagger) = -E(\nabla_{\beta_0}f(u_{0,t+1}))'B_0\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\frac{1}{t}\sum_{j=2}^{t}\nabla_\beta m_j(\beta_0^\dagger) + o_P(1),$$
and Lemma A5 in West (1996) shows that
$$\lim_{T\to\infty}Var\left(\frac{1}{P^{1/2}}\sum_{t=R}^{T-1}\frac{1}{t}\sum_{j=2}^{t}\nabla_\beta m_j(\beta_0^\dagger)\right) = 2\Pi S_{h_0h_0}.$$

Thus, we have seen that there are two important cases in which the effect of parameter estimation error vanishes in probability: namely, when the prediction period grows at a slower rate than the estimation period, and when we use the same loss function for estimation and testing. For example, when we use a quadratic loss function and we estimate the parameters by OLS, then parameter estimation error vanishes. As mentioned above, it has been suggested by Granger (1969, 1993), and more recently by Weiss (1996), that the right approach is indeed to use the same loss function for estimation and prediction. Broadly speaking, if we use a different loss function for estimation and testing, then we are a priori ruling out the use of an optimal predictor.

Proposition 3: Let Assumptions A1, A2, A3, A4 and A5 hold, and let $f \neq m$ (different loss for estimation and testing) and $\pi > 0$. Then, under $H_0$,
$$P^{-1/2}\sum_{t=R}^{T-1}\left(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\right) \stackrel{d}{\to} N(0, \Omega),$$
with
$$\Omega = S_{ff} + 2\Pi F_0'B_0 S_{h_0h_0}B_0'F_0 + 2\Pi F_1'B_1 S_{h_1h_1}B_1'F_1 - \Pi\left(S_{fh_0}B_0'F_0 + F_0'B_0 S_{fh_0}'\right)$$
$$- 2\Pi\left(F_1'B_1 S_{h_1h_0}B_0'F_0 + F_0'B_0 S_{h_0h_1}B_1'F_1\right) + \Pi\left(S_{fh_1}B_1'F_1 + F_1'B_1 S_{fh_1}'\right),$$

where, for $i, l = 0, 1$:
$$F_i = E(\nabla_{\beta_i}f(u_{i,t+1})), \quad B_i = \left(E(\nabla^2_\beta m_j(\beta_i^\dagger))\right)^{-1}, \quad S_{h_ih_l} = \sum_{j=-\infty}^{\infty}E\left(\nabla_\beta m_1(\beta_i^\dagger)\nabla_\beta m_{1+j}(\beta_l^\dagger)'\right),$$
$$S_{fh_i} = \sum_{j=-\infty}^{\infty}E\left((f(u_{0,1}) - f(u_{1,1}))\nabla_\beta m_{1+j}(\beta_i^\dagger)'\right), \quad S_{ff} = \sum_{j=-\infty}^{\infty}E\left((f(u_{0,1}) - f(u_{1,1}))(f(u_{0,1+j}) - f(u_{1,1+j}))\right).$$
Under the alternative, for some $\varepsilon > 0$,
$$\Pr\left(\left|\frac{1}{P}\sum_{t=R}^{T-1}\left(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1})\right)\right| > \varepsilon\right) \to 1,$$
and so $P^{-1/2}\sum_{t=R}^{T-1}(f(\hat u_{0,t+1}) - f(\hat u_{1,t+1}))$ diverges at rate $P^{1/2}$.

In order to implement a valid DM test in the case of non-vanishing parameter estimation error, we need to consistently estimate all the pieces of the covariance matrix in Proposition 3. For consistent estimation of $F_i$ and $B_i$ we can just use the sample means evaluated at the estimated parameters. Namely, we can use:
$$\hat F_i = \frac{1}{P}\sum_{t=R}^{T-1}\nabla_{\beta_i}f(\hat u_{i,t+1}) \quad \text{and} \quad \hat B_i = \left(\frac{1}{P}\sum_{t=R}^{T-1}\nabla^2_\beta m_t(\hat\beta_{i,t})\right)^{-1}.$$
However, for the long run covariance matrices we need to use a HAC (Newey-West type) covariance estimator. Define, for $i, l = 0, 1$:
$$\hat S_{h_ih_l} = \frac{1}{P}\sum_{\tau=-l_P}^{l_P}w_\tau\sum_{t=R+l_P+1}^{T-1-l_P}\nabla_\beta m_t(\hat\beta_{i,t})\nabla_\beta m_{t+\tau}(\hat\beta_{l,t})',$$
$$\hat S_{fh_i} = \frac{1}{P}\sum_{\tau=-l_P}^{l_P}w_\tau\sum_{t=R+l_P+1}^{T-1-l_P}\left(\hat d_t - \bar d\right)\nabla_\beta m_{t+\tau}(\hat\beta_{i,t})', \quad \text{with } \hat d_t = f(\hat u_{0,t}) - f(\hat u_{1,t}), \; \bar d = \frac{1}{P}\sum_{t=R}^{T-1}\hat d_{t+1},$$

$$\hat S_{ff} = \frac{1}{P}\sum_{\tau=-l_P}^{l_P}w_\tau\sum_{t=R+l_P+1}^{T-1-l_P}\left(\hat d_t - \bar d\right)\left(\hat d_{t+\tau} - \bar d\right),$$
where
$$w_\tau = 1 - \frac{|\tau|}{l_P + 1}.$$
Given A1-A4 above, if, as $P \to \infty$, $l_P \to \infty$ and $l_P/P^{1/4} \to 0$, then $\hat S_{h_ih_l}$, $\hat S_{fh_i}$ and $\hat S_{ff}$ are consistent for $S_{h_ih_l}$, $S_{fh_i}$ and $S_{ff}$. Note that in practice we do not know $\pi$; a natural estimate for $\pi$ is $\hat\pi = P/R$. Also, we do not observe the rate at which $P$ and $R$ grow. Thus, unless $R$ is much larger than $P$, it is worthwhile using the formula for the covariance which takes into account parameter estimation error whenever we use a different loss for estimation and prediction. Of note is that a recent key paper by Giacomini and White discusses conditional predictive inference, in which case the data are conditioned on and parameter estimation error essentially vanishes.

2.2 Bootstrap Techniques for Critical Value Construction

2.2.1 Introduction to the Bootstrap

Inference on parameters is based on asymptotic critical values. But how good is the normal approximation? Can we improve over inference based upon the normal approximation? We shall see that bootstrap critical values can provide refinements over asymptotic critical values under various circumstances. First, let us outline the logic underlying the bootstrap, and then we shall see how the use of the bootstrap can lead to more accurate inference.

Consider a very simple situation. We have a sample of iid observations, $X_1, \ldots, X_T$, and we want to test the null hypothesis
$$H_0: E(X_1) = \mu$$
versus
$$H_A: E(X_1) \neq \mu.$$
Note that, given the identical distribution assumption, $E(X_1) = E(X_2) = \ldots = E(X_T)$.

Consider the t-statistic
$$t_{\mu,T} = \frac{T^{-1/2}\sum_{t=1}^{T}(X_t - \mu)}{\hat\sigma_X}, \quad \text{where} \quad \hat\sigma^2_X = \frac{1}{T}\sum_{t=1}^{T}\left(X_t - \frac{1}{T}\sum_{s=1}^{T}X_s\right)^2.$$
Provided that $Var(X_1) < \infty$, we know that under $H_0$, $t_{\mu,T} \stackrel{d}{\to} N(0,1)$. Thus, we compare $t_{\mu,T}$ with the 2.5% and 97.5% critical values of a standard normal, and we reject at the 5% significance level if $t_{\mu,T} < -1.96$ or $t_{\mu,T} > 1.96$.

The idea underlying the bootstrap is to pretend that the sample is the population, and to draw from the sample as many bootstrap samples as needed in order to construct many bootstrap statistics. The simplest form of bootstrap is the iid nonparametric bootstrap, which is suitable for iid observations. Imagine that we put all the observations into an urn, and we then make draws with replacement (i.e. we make one draw, get one observation, put it back into the urn, get another one, put it back into the urn, and so on). Let $X^*_1, X^*_2, \ldots, X^*_T$ be the resampled observations, and note that $X^*_i = X_t$, $t = 1, \ldots, T$, each with probability $1/T$. In other words, $X^*_1, X^*_2, \ldots, X^*_T$ is equal to $X_{I_1}, X_{I_2}, \ldots, X_{I_T}$, where, for $i = 1, \ldots, T$, $I_i$ is a random variable taking values $1, 2, \ldots, T$ with equal probability $1/T$. $X^*_1, X^*_2, \ldots, X^*_T$ forms a bootstrap sample. Needless to say, we can repeat the same operation and get a second bootstrap sample, and so on.

Note that, given the original sample, the probability law governing the resampled series is nothing other than the probability law of the $I_i$, $i = 1, \ldots, T$. As the $I_i$ are iid discrete uniform random variables on $\{1, \ldots, T\}$, the $X^*_i$ are also iid, conditional on the sample. Let $E^*$ and $Var^*$ denote the mean and the variance of the resampled series, conditional on the sample (note that $E^*$ and $Var^*$ are mean and variance operators in terms of the law governing the bootstrap, i.e. in terms of the $I_i$, $i = 1, \ldots, T$). Now, given the identical distribution,
$$E^*(X^*_1) = E^*(X^*_2) = \ldots = E^*(X^*_T),$$

and
$$E^*(X^*_1) = X_1\frac{1}{T} + X_2\frac{1}{T} + \ldots + X_T\frac{1}{T} = \frac{1}{T}\sum_{t=1}^{T}X_t.$$
Also,
$$E^*\left(\frac{1}{T}\sum_{t=1}^{T}X^*_t\right) = E^*(X^*_1) = \frac{1}{T}\sum_{t=1}^{T}X_t.$$
Thus, the bootstrap mean is equal to the sample mean. Given that $X^*_1, \ldots, X^*_T$ are iid observations (conditional on the sample),
$$Var^*\left(T^{-1/2}\sum_{t=1}^{T}X^*_t\right) = Var^*(X^*_1) = E^*(X^{*2}_1) - \left(E^*(X^*_1)\right)^2 = \frac{1}{T}\sum_{t=1}^{T}X^2_t - \left(\frac{1}{T}\sum_{t=1}^{T}X_t\right)^2.$$
Thus, the bootstrap variance is equal to the sample variance. Let
$$\hat\sigma^{*2}_X = \frac{1}{T}\sum_{t=1}^{T}\left(X^*_t - \frac{1}{T}\sum_{s=1}^{T}X^*_s\right)^2.$$
Given that $X^*_1, \ldots, X^*_T$ are iid with mean and variance equal to the sample mean and sample variance,
$$t^*_{\mu,T} = \frac{T^{-1/2}\sum_{t=1}^{T}\left(X^*_t - \frac{1}{T}\sum_{s=1}^{T}X_s\right)}{\hat\sigma^*_X} \stackrel{d^*}{\to} N(0,1),$$
where $\stackrel{d^*}{\to}$ denotes convergence in distribution according to the bootstrap probability measure, conditional on the sample.
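A minimal sketch of the iid nonparametric bootstrap just described: resample with replacement, recompute the t-statistic (centered at the sample mean) $B$ times, and read off percentile critical values. The data and the value of $B$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.exponential(size=200)            # skewed data, illustrative
T, B = X.size, 999
xbar = X.mean()

def tstat(sample, center):
    return np.sqrt(T) * (sample.mean() - center) / sample.std()

t_boot = np.empty(B)
for b in range(B):
    Xs = rng.choice(X, size=T, replace=True)   # draws from the "urn"
    t_boot[b] = tstat(Xs, xbar)                # centered at the sample mean
t_boot.sort()
lo, hi = t_boot[int(0.025 * (B + 1)) - 1], t_boot[int(0.975 * (B + 1)) - 1]
print("bootstrap 2.5% and 97.5% critical values:", lo.round(3), hi.round(3))
# Compare the observed t-statistic for H0: EX = mu with (lo, hi)
# instead of -1.96 and 1.96.
```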

Note, importantly, that $t^*_{\mu,T} \stackrel{d^*}{\to} N(0,1)$ regardless of whether the null hypothesis is true or not. Thus, under the null, $t_{\mu,T}$ and $t^*_{\mu,T}$ have the same limiting distribution, while under the alternative $t^*_{\mu,T} \stackrel{d^*}{\to} N(0,1)$ but $t_{\mu,T}$ diverges to infinity.

This suggests proceeding in the following manner. Construct $B$ ($B$ large) bootstrap statistics, say $t^{*(1)}_{\mu,T}, \ldots, t^{*(B)}_{\mu,T}$, and sort them from smallest to largest. Suppose $B = 1000$; then the 25th bootstrap statistic gives the 2.5% critical value, say $z^*_{T,2.5\%}$, and the 975th bootstrap statistic gives the 97.5% critical value, say $z^*_{T,97.5\%}$. If $B$ is large enough, then rejecting $H_0$ if $t_{\mu,T} < z^*_{T,2.5\%}$ or $t_{\mu,T} > z^*_{T,97.5\%}$, and not rejecting if $z^*_{T,2.5\%} \leq t_{\mu,T} \leq z^*_{T,97.5\%}$, yields a test with asymptotic size equal to 5% and unit asymptotic power.

It is important to note that in this case the bootstrap higher moments are also equal to the sample moments. In fact, given (conditional) independence,
$$E^*\left(T^{-1/2}\sum_{t=1}^{T}\left(X^*_t - \frac{1}{T}\sum_{s=1}^{T}X_s\right)\right)^3 = T^{-1/2}\,\frac{1}{T}\sum_{t=1}^{T}\left(X_t - \frac{1}{T}\sum_{s=1}^{T}X_s\right)^3,$$
and so on for the fourth moments, etc.

Question: Is inference based on $z^*_{T,2.5\%}$ and $z^*_{T,97.5\%}$ more accurate than inference based on the standard normal approximation (e.g. based on using $\pm 1.96$)? Answer: Yes. Why? We show why using the Edgeworth expansion (nothing to do with the Edgeworth box!). Under mild assumptions (satisfied for the sample mean in the iid case, provided there are enough finite moments), we can express the distribution of the t-statistic as the CDF of a standard normal plus other terms capturing deviations from normality. Namely,
$$\Pr(t_{\mu,T} \leq x) = \Phi(x) + T^{-1/2}p_1(x)\phi(x) + T^{-1}p_2(x)\phi(x) + T^{-3/2}p_3(x)\phi(x) + \ldots,$$
where $\Phi(x)$ and $\phi(x)$ are the cumulative distribution function and the density of a standard normal evaluated at $x$, $p_1(x)$ is a polynomial in $x$ depending on the central third moment, $p_2(x)$ is a polynomial in $x$ depending on the central fourth moment, etc.

Therefore, $p_1(x)$ captures the deviation from normality in the form of skewness, and $p_2(x)$ captures the deviation from normality in the sense of excess kurtosis; the successive terms capture more complex deviations and higher order effects. From the expansion above we see that the error of the normal approximation is of order $T^{-1/2}$. Analogously, we can write the Edgeworth expansion for $t^*_{\mu,T}$, i.e.
$$\Pr{}^*(t^*_{\mu,T} \leq x) = \Phi(x) + T^{-1/2}\hat p_1(x)\phi(x) + T^{-1}\hat p_2(x)\phi(x) + T^{-3/2}\hat p_3(x)\phi(x) + \ldots,$$
where $\hat p_1(x)$ is a polynomial in $x$ depending on the sample central third moment, $\hat p_2(x)$ is a polynomial in $x$ depending on the sample central fourth moment, etc. Therefore, as sample moments converge to population moments, and as we know that under mild assumptions the convergence is at rate $T^{-1/2}$, we have that
$$\hat p_1(x) - p_1(x) = O_P(T^{-1/2}), \quad \hat p_2(x) - p_2(x) = O_P(T^{-1/2}), \quad \text{etc.}$$
Recall that $\Pr^*(t^*_{\mu,T} \leq x)$ depends on the sample, and so it is a random variable, while $\Pr(t_{\mu,T} \leq x)$ is a number between 0 and 1 (!) depending on $T$. Thus,
$$\Pr{}^*(t^*_{\mu,T} \leq x) - \Pr(t_{\mu,T} \leq x) = O_P(T^{-1}), \quad \text{while} \quad \Pr(t_{\mu,T} \leq x) - \Phi(x) = O(T^{-1/2}).$$
Hence, if we approximate $\Pr(t_{\mu,T} \leq x)$ with the standard normal CDF we have an error of order $O(T^{-1/2})$, while if we approximate it with the bootstrap distribution $\Pr^*(t^*_{\mu,T} \leq x)$ we have an error of order $O_P(T^{-1})$. Thus, the bootstrap distribution provides a more accurate approximation than the normal CDF.

In practice, we do not compare $\Pr(t_{\mu,T} \leq x)$ with $\Phi(x)$; instead we compare $t_{\mu,T}$ with $z^*_{T,\alpha}$. Let $z_{T,\alpha}$ be defined by $\Pr(t_{\mu,T} \leq z_{T,\alpha}) = \alpha$, and analogously define $z^*_{T,\alpha}$ by $\Pr^*(t^*_{\mu,T} \leq z^*_{T,\alpha}) = \alpha$ (recall that the bootstrap moments are the sample moments).

Whenever we have an Edgeworth expansion, we can always obtain a Cornish(-Fisher) expansion by inversion:
$$z_{T,\alpha} = z_\alpha + T^{-1/2}q_1(\alpha) + T^{-1}q_2(\alpha) + T^{-3/2}q_3(\alpha) + \ldots,$$
where $q_1(\alpha)$, $q_2(\alpha)$ are polynomials in $\alpha$ capturing skewness and kurtosis, and
$$z^*_{T,\alpha} = z_\alpha + T^{-1/2}\hat q_1(\alpha) + T^{-1}\hat q_2(\alpha) + T^{-3/2}\hat q_3(\alpha) + \ldots,$$
where $\hat q_1(\alpha)$, $\hat q_2(\alpha)$ are polynomials in $\alpha$ capturing sample skewness and sample kurtosis. Now,
$$\hat q_1(\alpha) - q_1(\alpha) = O_P(T^{-1/2}) \quad \text{and} \quad \hat q_2(\alpha) - q_2(\alpha) = O_P(T^{-1/2}).$$
Thus,
$$z^*_{T,\alpha} - z_{T,\alpha} = O_P(T^{-1}), \quad \text{while} \quad z_{T,\alpha} - z_\alpha = O(T^{-1/2}).$$
Therefore, we see that inference based on bootstrap critical values is more accurate than that based on asymptotic normal critical values.

2.2.2 Bootstrapping GMM Estimators with Time Series

The iid nonparametric bootstrap does not work with dependent observations. The reason is that the resampled observations are iid, while the actual observations are not. In the case of dependent observations, things are more complicated: on one hand we want to draw blocks of data long enough to preserve the dependence structure present in the original sample, while on the other hand we want to have a large enough number of blocks that are independent of each other.

The most used resampling method for time series data is the block bootstrap of Künsch (1989), which we shall consider below. Let $T = bl$, where $b$ denotes the number of blocks and $l$ denotes the length of each block. We first draw a discrete uniform random variable, $I_1$, that can take values $0, 1, \ldots, T - l$, each with probability $1/(T - l + 1)$; the first block is given by $X_{I_1+1}, \ldots, X_{I_1+l}$. We then draw another discrete uniform random variable, say $I_2$, and a second block of length $l$ is formed, say $X_{I_2+1}, \ldots, X_{I_2+l}$. Continue in the same manner, until you draw the last discrete uniform, say $I_b$, and so the last block is $X_{I_b+1}, \ldots, X_{I_b+l}$.

Let us call $X^*_t$ the resampled series, and note that $X^*_1, X^*_2, \ldots, X^*_T$ corresponds to $X_{I_1+1}, X_{I_1+2}, \ldots, X_{I_b+l}$. Thus, conditional on the sample, the only random element is the beginning of each block. In particular, $(X^*_1, \ldots, X^*_l), (X^*_{l+1}, \ldots, X^*_{2l}), \ldots, (X^*_{T-l+1}, \ldots, X^*_T)$, conditional on the sample, can be treated as $b$ iid blocks, determined by iid discrete uniform random variables.

It can be shown that, conditional on the sample and for all samples except a set of probability measure approaching zero,
$$E^*\left(\frac{1}{T}\sum_{t=1}^{T}X^*_t\right) = \frac{1}{T}\sum_{t=1}^{T}X_t + O_P(l/T)$$
and
$$Var^*\left(T^{-1/2}\sum_{t=1}^{T}X^*_t\right) = \frac{1}{l}\frac{1}{T}\sum_{t=l}^{T-l}\sum_{j=-(l-1)}^{l-1}(l - |j|)(X_t - \bar X)(X_{t+j} - \bar X) + O_P(l^2/T),$$
where $\bar X = \frac{1}{T}\sum_{t=1}^{T}X_t$, $E^*$ and $Var^*$ denote the expectation and the variance operators with respect to $P^*$ (the probability law governing the resampled series, or equivalently the probability law governing the iid uniform random variables, conditional on the sample), and $O_P(l/T)$ ($O_P(l^2/T)$) denotes a term converging in probability $P$ to zero as $l/T \to 0$ ($l^2/T \to 0$).

Proof - Sketch of the two results above:
$$E^*\left(\frac{1}{T}\sum_{t=1}^{T}X^*_t\right) = E^*\left(\frac{1}{bl}\sum_{i=1}^{b}\sum_{j=1}^{l}X_{I_i+j}\right) = E^*\left(\frac{1}{l}\sum_{j=1}^{l}X_{I_1+j}\right),$$
as the $I_i$, $i = 1, \ldots, b$, are independent uniform random variables, and so, conditional on the sample, blocks are independent and identically distributed (note that, conditional on the sample, all randomness is due to $I_1, \ldots, I_b$, which are iid uniform random variables). Thus, the above expression can be rewritten as:
$$\frac{1}{l}(X_1 + X_2 + \ldots + X_l)\Pr{}^*(I_1 = 0) + \frac{1}{l}(X_2 + X_3 + \ldots + X_{l+1})\Pr{}^*(I_1 = 1) + \ldots + \frac{1}{l}(X_{l+1} + \ldots + X_{2l})\Pr{}^*(I_1 = l) + \ldots$$

$$\ldots + \frac{1}{l}(X_{T-l+1} + X_{T-l+2} + \ldots + X_T)\Pr{}^*(I_1 = T - l),$$
where
$$\Pr{}^*(I_1 = 0) = \Pr{}^*(I_1 = 1) = \ldots = \Pr{}^*(I_1 = T - l) = \frac{1}{T - l + 1}.$$
Note that for $l \leq t \leq T - l + 1$ the observation $X_t$ appears in $l$ summands, while $X_1$ and $X_T$ appear in only one, $X_2$ and $X_{T-1}$ in two, ..., and $X_{l-1}$ and $X_{T-l+2}$ in $l - 1$. Thus, summing up the terms above, we have that:
$$E^*\left(\frac{1}{T}\sum_{t=1}^{T}X^*_t\right) = \frac{1}{T - l + 1}\sum_{t=l}^{T-l+1}X_t + O_P(l/T) = \frac{1}{T}\sum_{t=1}^{T}X_t + O_P(l/T).$$
Now, we want to sketch the proof of the variance result. As the $I_i$, $i = 1, 2, \ldots, b$, are iid, and given the result just obtained,
$$Var^*\left(T^{-1/2}\sum_{t=1}^{T}X^*_t\right) = Var^*\left(b^{-1/2}\sum_{i=1}^{b}l^{-1/2}\sum_{j=1}^{l}X_{I_i+j}\right) = Var^*\left(l^{-1/2}\sum_{j=1}^{l}X_{I_1+j}\right)$$
$$= E^*\left(\frac{1}{l}\sum_{k=1}^{l}\sum_{j=1}^{l}(X_{I_1+k} - \bar X_a)(X_{I_1+j} - \bar X_a)\right) + O_P(l^2/T),$$
where $\bar X_a = \frac{1}{T}\sum_{t=1}^{T}X_t$. The first term on the RHS above is in turn equal to
$$\frac{1}{l}\sum_{k=1}^{l}\sum_{j=1}^{l}(X_k - \bar X_a)(X_j - \bar X_a)\Pr{}^*(I_1 = 0) + \frac{1}{l}\sum_{k=1}^{l}\sum_{j=1}^{l}(X_{k+1} - \bar X_a)(X_{j+1} - \bar X_a)\Pr{}^*(I_1 = 1) + \ldots$$
$$+ \frac{1}{l}\sum_{k=1}^{l}\sum_{j=1}^{l}(X_{k+T-l} - \bar X_a)(X_{j+T-l} - \bar X_a)\Pr{}^*(I_1 = T - l)$$

$$= \frac{1}{l}\frac{1}{T}\sum_{t=l}^{T-l}\sum_{j=-(l-1)}^{l-1}(l - |j|)(X_t - \bar X_a)(X_{t+j} - \bar X_a) + O_P(l^2/T).$$
Now, Künsch shows that, conditional on the sample (except a set with probability measure approaching zero), as $l \to \infty$,
$$t^{b*}_{\mu,T} = \frac{T^{-1/2}\sum_{t=1}^{T}\left(X^*_t - E^*(X^*_t)\right)}{\left(Var^*\left(T^{-1/2}\sum_{t=1}^{T}X^*_t\right)\right)^{1/2}} \stackrel{d^*}{\to} N(0,1).$$
Let
$$t^{HAC}_{\mu,T} = \frac{T^{-1/2}\sum_{t=1}^{T}(X_t - \mu)}{\hat\sigma_{HAC}},$$
where $\hat\sigma^2_{HAC}$ is an HAC covariance estimator. Thus, if we use the block bootstrap, we know that $t^{HAC}_{\mu,T}$ and $t^{b*}_{\mu,T}$ have the same limiting distribution, and so bootstrap critical values are asymptotically valid, as explained below.
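A minimal sketch of the overlapping-block resampling scheme described above, applied to an MA(1) series; the block length rule and the DGP are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 512
e = rng.normal(size=T + 1)
X = e[1:] + 0.6 * e[:-1]                 # MA(1): serially dependent series

l = 4                                    # block length, of order T^{1/4}
b = T // l                               # number of blocks, T = b*l
B = 999
boot_means = np.empty(B)
for i in range(B):
    starts = rng.integers(0, T - l + 1, size=b)          # I_1, ..., I_b
    Xs = np.concatenate([X[s:s + l] for s in starts])    # paste the b blocks
    boot_means[i] = Xs.mean()

# Var*(sqrt(T) * bootstrap mean) approximates the long-run variance,
# which for this MA(1) is (1 + 0.6)^2 = 2.56; with a finite block length
# the block bootstrap understates it somewhat.
print(T * boot_means.var())
```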

Recall that in the iid case (iid observations and iid bootstrap),
$$E^*\left(\frac{1}{T}\sum_{t=1}^{T}X^*_t\right) = \frac{1}{T}\sum_{t=1}^{T}X_t \quad \text{and} \quad Var^*\left(T^{-1/2}\sum_{t=1}^{T}X^*_t\right) = \frac{1}{T}\sum_{t=1}^{T}X^2_t - \left(\frac{1}{T}\sum_{t=1}^{T}X_t\right)^2.$$
In the case of the block bootstrap with dependent observations (and the same is true if we use the block bootstrap with iid observations),
$$E^*\left(\frac{1}{T}\sum_{t=1}^{T}X^*_t\right) = \frac{1}{T}\sum_{t=1}^{T}X_t + O_P\left(\frac{l}{T}\right) \quad \text{and} \quad Var^*\left(T^{-1/2}\sum_{t=1}^{T}X^*_t\right) = \hat\sigma^2_{HAC} + O_P\left(\frac{l^2}{T}\right).$$
As a consequence, it is no longer true that
$$\Pr(t^{HAC}_{\mu,T} \leq x) - \Pr{}^*(t^{b*}_{\mu,T} \leq x) = O_P(T^{-1}).$$
Götze and Hipp (1996), for the case of stationary mixing observations, show that if we choose the block length $l$ equal to the lag truncation parameter used in the construction of the HAC variance estimator (i.e. $l = m_T$), then
$$\Pr(t^{HAC}_{\mu,T} \leq x) - \Pr{}^*(t^{b*}_{\mu,T} \leq x) = O_P(lT^{-1}) + O_P(l^{-1}T^{-1/2}).$$
Thus, for $l = T^{1/4}$ (see footnote 2),
$$\Pr(t^{HAC}_{\mu,T} \leq x) - \Pr{}^*(t^{b*}_{\mu,T} \leq x) = O_P(T^{-3/4}).$$

2.2.3 Bootstrap Refinements for GMM Estimators

Now we outline how to bootstrap GMM estimators, and we see how bootstrap critical values can provide an improvement over asymptotic normal critical values (see Andrews (2002) for complete details). Improvement over the standard asymptotic approximation is called higher order refinement. In the sequel, we require that
$$E\left(g_t(\beta^\dagger_{GMM})g_{t-k}(\beta^\dagger_{GMM})'\right) = 0 \quad \text{for all } k > \kappa,$$
where $\kappa$ is finite; that is, the correlation between the moment conditions is zero after the $\kappa$-th lag. Currently, for the case of general nonlinear GMM estimators, there are no results about bootstrap higher order refinements for the general case in which $\kappa = \kappa_T$, with $\kappa_T \to \infty$ as $T \to \infty$ (see footnote 3). For some generality, consider the case in which the variance of the moment conditions depends on the parameters, and therefore we use a two-step GMM approach.

Footnote 2: Note that while we need $m_T/T^{1/4} \to 0$ for the case of possibly heterogeneous observations, in the strictly stationary case we can allow for $m_T = T^{1/4}$.

Footnote 3: Inoue and Shintani (2006) provide GMM refinements in the case of $\kappa = \kappa_T \to \infty$ for linear IV overidentified estimators.

In the first step, we use an arbitrary $p \times p$ weighting matrix, say $\hat\Omega$, and we compute
$$\tilde\beta_{T,GMM} = \arg\min_{\beta \in B}\left(\frac{1}{T}\sum_{t=1}^{T}g_t(\beta)\right)'\hat\Omega\left(\frac{1}{T}\sum_{t=1}^{T}g_t(\beta)\right) = \arg\min_{\beta \in B}G_T(\beta)'\hat\Omega G_T(\beta),$$
where we write $\tilde\beta_{T,GMM}$ for the first-step and $\hat\beta_{T,GMM}$ for the second-step estimator. Given $\tilde\beta_{T,GMM}$, we compute the second-step estimator
$$\hat\beta_{T,GMM} = \arg\min_{\beta \in B}G_T(\beta)'\hat\Omega_T(\tilde\beta_{T,GMM})^{-1}G_T(\beta),$$
where
$$\hat\Omega_T(\tilde\beta_{T,GMM}) = \frac{1}{T}\sum_{t=1}^{T}g_t(\tilde\beta_{T,GMM})g_t(\tilde\beta_{T,GMM})' + \frac{2}{T}\sum_{j=1}^{\kappa}\sum_{t=j+1}^{T}g_t(\tilde\beta_{T,GMM})g_{t-j}(\tilde\beta_{T,GMM})'.$$
The two-step GMM covariance matrix estimator is given by
$$\hat\sigma^2_T = \left(\hat D_T(\hat\beta_{T,GMM})'\hat\Omega_T(\tilde\beta_{T,GMM})^{-1}\hat D_T(\hat\beta_{T,GMM})\right)^{-1}, \quad \text{where} \quad \hat D_T(\hat\beta_{T,GMM}) = \frac{1}{T}\sum_{t=1}^{T}\nabla_\beta g_t(\beta)\Big|_{\beta=\hat\beta_{T,GMM}}.$$
Let $\hat\sigma^2_{ii,T}$ be the $ii$-th element of $\hat\sigma^2_T$. Suppose $g_t(\beta) = g(y_t, X_t, Z_t, \beta)$, and we resample $b$ blocks of length $l$ of $(y_t, X_t, Z_t)$ in order to obtain $(y^*_t, X^*_t, Z^*_t)$. Let
$$g^*_t(\beta) = g(y^*_t, X^*_t, Z^*_t, \beta) - E^*\left(g(y^*_t, X^*_t, Z^*_t, \hat\beta_{T,GMM})\right),$$
where
$$E^*\left(g(y^*_t, X^*_t, Z^*_t, \hat\beta_{T,GMM})\right) = \frac{1}{T - l + 1}\sum_{t=1}^{T}w_t\,g(y_t, X_t, Z_t, \hat\beta_{T,GMM}),$$

with
$$w_t = t/l \;\text{ for } t = 1, \ldots, l-1, \qquad w_t = 1 \;\text{ for } t = l, \ldots, T-l+1, \qquad w_t = (T-t+1)/l \;\text{ for } t = T-l+2, \ldots, T.$$
The weight $w_t$ is smaller than one for the first and last $l$ observations, as they have fewer chances of being drawn. Note that, in general, $g(y^*_t, X^*_t, Z^*_t, \hat\beta_{T,GMM})$ has a non-zero bootstrap mean even if $g(y_t, X_t, Z_t, \hat\beta_{T,GMM})$ has zero sample mean; hence the need for recentering the bootstrap moment conditions. In fact, after recentering, $E^*(g^*_t(\hat\beta_{T,GMM})) = 0$.

Now, define the bootstrap counterpart of $\tilde\beta_{T,GMM}$ as
$$\tilde\beta^*_{T,GMM} = \arg\min_{\beta \in B}\left(\frac{1}{T}\sum_{t=1}^{T}g^*_t(\beta)\right)'\hat\Omega\left(\frac{1}{T}\sum_{t=1}^{T}g^*_t(\beta)\right) = \arg\min_{\beta \in B}G^*_T(\beta)'\hat\Omega G^*_T(\beta),$$
where $g^*_t(\beta)$ is the recentered bootstrap moment condition defined above. Also, define the bootstrap counterpart of $\hat\beta_{T,GMM}$ as
$$\hat\beta^*_{T,GMM} = \arg\min_{\beta \in B}G^*_T(\beta)'\hat\Omega^*_T(\tilde\beta^*_{T,GMM})^{-1}G^*_T(\beta),$$
where
$$\hat\Omega^*_T(\tilde\beta^*_{T,GMM}) = \frac{1}{T}\sum_{t=1}^{T}g^*_t(\tilde\beta^*_{T,GMM})g^*_t(\tilde\beta^*_{T,GMM})' + \frac{2}{T}\sum_{j=1}^{\kappa}\sum_{t=j+1}^{T}g^*_t(\tilde\beta^*_{T,GMM})g^*_{t-j}(\tilde\beta^*_{T,GMM})'$$
is the bootstrap analog of $\hat\Omega_T(\tilde\beta_{T,GMM})$.

The bootstrap covariance matrix is given by
$$\hat\sigma^{*2}_T = \left(\hat D^*_T(\hat\beta^*_{T,GMM})'\hat\Omega^*_T(\tilde\beta^*_{T,GMM})^{-1}\hat D^*_T(\hat\beta^*_{T,GMM})\right)^{-1}, \quad \text{where} \quad \hat D^*_T(\hat\beta^*_{T,GMM}) = \frac{1}{T}\sum_{t=1}^{T}\nabla_\beta g^*_t(\beta)\Big|_{\beta=\hat\beta^*_{T,GMM}}.$$
Now, let $\hat\sigma^{*2}_{ii,T}$ be the $ii$-th element of $\hat\sigma^{*2}_T$. We are interested in testing
$$H_0: \beta_i = \beta^\dagger_{i,GMM} \quad \text{versus} \quad H_A: \beta_i \neq \beta^\dagger_{i,GMM}.$$
Define the t-statistic as
$$t_{\beta_i,T} = \frac{T^{1/2}(\hat\beta_{i,T,GMM} - \beta^\dagger_{i,GMM})}{\hat\sigma_{ii,T}},$$
and its bootstrap analog as
$$t^*_{\beta_i,T} = \frac{T^{1/2}(\hat\beta^*_{i,T,GMM} - \hat\beta_{i,T,GMM})}{\hat\sigma^*_{ii,T}}.$$
Now, $\hat\sigma^{*2}_{ii,T}$ is the bootstrap counterpart of $\hat\sigma^2_{ii,T}$, but it does not coincide with $Var^*(T^{1/2}(\hat\beta^*_{i,T,GMM} - \hat\beta_{i,T,GMM}))$. This is because the dependence in the sample moment conditions and in the bootstrap moment conditions is not the same. This is due to the so-called join point problem: blocks are independent, conditional on the sample, so the last observation of a block and the first observation of the next block are uncorrelated; however, this is not true in the original sample. As there are $b$ join points (as many as the blocks), this has to be taken into account.

Summarizing, the issue is that $\hat\sigma^{*2}_{ii,T}$ properly mimics $\hat\sigma^2_{ii,T}$ (i.e. $E^*(\hat\sigma^{*2}_{ii,T}) \approx \hat\sigma^2_{ii,T}$), but $\hat\sigma^{*2}_{ii,T}$ is NOT $Var^*(T^{1/2}(\hat\beta^*_{i,T,GMM} - \hat\beta_{i,T,GMM}))$. We thus need a correction factor. Define $\tilde\sigma^2_{ii,T}$ as the $ii$-th element of
$$\left(\hat D^{*\prime}_T\hat\Omega^{*-1}_T\hat D^*_T\right)^{-1}\hat D^{*\prime}_T\hat\Omega^{*-1}_T\,\tilde\Omega^*_T\,\hat\Omega^{*-1}_T\hat D^*_T\left(\hat D^{*\prime}_T\hat\Omega^{*-1}_T\hat D^*_T\right)^{-1},$$

where
$$\tilde\Omega^*_T = E^*\left(\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}g^*_t(\hat\beta_{T,GMM})g^*_s(\hat\beta_{T,GMM})'\right) = \frac{1}{T - l + 1}\sum_{t=0}^{T-l}\frac{1}{l}\sum_{j=1}^{l}\sum_{i=1}^{l}g_{t+j}(\hat\beta_{T,GMM})g_{t+i}(\hat\beta_{T,GMM})'.$$
Note that $\tilde\sigma^2_{ii,T} = Var^*(T^{1/2}(\hat\beta^*_{i,T,GMM} - \hat\beta_{i,T,GMM}))$, up to terms vanishing in probability. The correction factor is thus given by
$$\tau_{ii,T} = \frac{\hat\sigma^*_{ii,T}}{\tilde\sigma_{ii,T}}.$$
Now, consider the adjusted bootstrap statistic,
$$\tilde t^*_{\beta_i,T} = \frac{T^{1/2}(\hat\beta^*_{i,T,GMM} - \hat\beta_{i,T,GMM})}{\hat\sigma^*_{ii,T}}\,\tau_{ii,T} = \frac{T^{1/2}(\hat\beta^*_{i,T,GMM} - \hat\beta_{i,T,GMM})}{\tilde\sigma_{ii,T}},$$
which is given by the product of the bootstrap analog of the t-statistic and the correction term. We construct $B$ corrected bootstrap statistics, $\tilde t^*_{\beta_i,T}$, and use them to get the corrected bootstrap critical values $z^*_{\alpha/2,T}$ and $z^*_{1-\alpha/2,T}$; for example, if $B = 1000$ and $\alpha = 0.05$, then $z^*_{\alpha/2,T}$ is the 25th smallest corrected bootstrap statistic and $z^*_{1-\alpha/2,T}$ is the 975th smallest.

Now, under $H_0$,
$$\Pr\left(t_{\beta_i,T} < -1.96 \;\text{or}\; t_{\beta_i,T} > 1.96\right) - 0.05 = O(T^{-1/2}).$$
If we instead use the corrected bootstrap critical values $z^*_{\alpha/2,T}$ and $z^*_{1-\alpha/2,T}$, then
$$\Pr\left(t_{\beta_i,T} < z^*_{\alpha/2,T} \;\text{or}\; t_{\beta_i,T} > z^*_{1-\alpha/2,T}\right) - \alpha = O\left(T^{-(1/2+\xi)}\right), \quad \xi > 0.$$
Thus, as $\xi > 0$, inference based on bootstrap critical values is more accurate than inference based on asymptotic standard normal critical values. In the case of iid data, $l = 1$ and $\xi = 1/2$, so that we have the same order of improvement as that for the sample mean (one cannot do better than this). In the dependent case, when $l = T^{1/4}$, $\xi$ can be arbitrarily close to $1/4$.

Remarks:

(i) Thus far, we have considered the case of equally tailed tests, in the sense that we compare $t_{\beta_i,T}$ with the 2.5% and 97.5% critical values. Needless to say, in finite samples $z^*_{\alpha/2,T} \neq -z^*_{1-\alpha/2,T}$, as the bootstrap distribution is not symmetric for finite $T$. However, if we impose symmetry and compare $|t_{\beta_i,T}|$ with $z^*_{1-\alpha/2,T}$, then
$$\Pr\left(|t_{\beta_i,T}| > z^*_{1-\alpha/2,T}\right) - \alpha = O\left(T^{-(1+\xi)}\right),$$
where again $\xi$ cannot be larger than $1/4$. The smaller order of error is due to the fact that, by imposing symmetry, the first term in the Edgeworth and Cornish-Fisher expansions disappears.

(ii) Broadly speaking, in the iid case (iid bootstrap) we have an improvement over standard normal critical values of order $T^{-1/2}$. In the dependent case (block bootstrap), by choosing the block length of order $T^{1/4}$, we have an improvement not larger than, but arbitrarily close to, $T^{-1/4}$. This is due to the fact that in the iid data/iid bootstrap case the bootstrap moments approach the sample moments at rate $T^{-1/2}$, while in the block bootstrap case the bootstrap moments approach the sample moments at rate $T^{-1/4}$; the latter is true even if we have iid data but we resample using blocks.

(iii) Note that if the moment conditions are a martingale difference sequence (the dynamically correctly specified case), then $\kappa = 0$. However, this does not help: we still need to use the block bootstrap, in order to capture dependence in the higher (higher than second) moments.

3 Part III - Linear and Nonlinear Predictive Accuracy Testing With Nested and Nonnested Models

As the title of this section suggests, we are now ready to discuss in detail a number of predictive accuracy tests used for comparing linear and nonlinear models, potentially under misspecification, allowing for parameter estimation error, and for both nested and nonnested alternatives.

3.1 Granger Causality

We say that $X_t$ Granger causes $Y_t$ if the lags of $X_t$ help to predict $Y_t$ (Granger (1969)). More formally, if
$$f(y_t|\mathcal{F}^{-x}_t) = f(y_t|\mathcal{F}_t),$$
then we say that $X_t$ is not Granger causal for $Y_t$, where $\mathcal{F}_t$ denotes some relevant information set, and $\mathcal{F}^{-x}_t$ is the same information set, except without any past values of $X_t$. Typically, when performing a causality test, the null is that of non-causality, versus the alternative of causality. Causality tests are often performed by regressing $Y_t$ on its lags, on the lags of $X_t$, and on lags of other relevant variables, and then testing whether the coefficients on the lags of $X_t$ are all equal to zero or not.

More formally, consider
$$H_0: X_t \text{ does not cause } Y_t \quad \text{versus} \quad H_A: X_t \text{ causes } Y_t.$$
We could then estimate the following model, say:
$$Y_t = c + \sum_{i=1}^{p}\alpha_i Y_{t-i} + \sum_{j=1}^{q}\beta_j X_{t-j} + e_t,$$
and the null and alternative could be restated as
$$H_0: \beta_j = 0 \text{ for all } j = 1, \ldots, q$$
versus
$$H_A: \beta_j \neq 0 \text{ for at least one } j = 1, \ldots, q.$$
Thus, to test the null, one could use the usual F, Wald, Lagrange Multiplier or Likelihood Ratio tests, for example.
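A minimal sketch of such a test, implementing the F statistic directly (simulated data; the lag orders and parameter values are illustrative choices):

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f(y, x, p=2, q=2):
    T = y.size
    m = max(p, q)
    # Restricted model: constant and p lags of y; unrestricted adds q lags of x.
    Z_r = np.column_stack([np.ones(T - m)] + [y[m - i:T - i] for i in range(1, p + 1)])
    Z_u = np.column_stack([Z_r] + [x[m - j:T - j] for j in range(1, q + 1)])
    yy = y[m:]
    ssr = lambda Z: np.sum((yy - Z @ np.linalg.lstsq(Z, yy, rcond=None)[0]) ** 2)
    ssr_r, ssr_u = ssr(Z_r), ssr(Z_u)
    df2 = (T - m) - Z_u.shape[1]
    F = ((ssr_r - ssr_u) / q) / (ssr_u / df2)
    return F, 1 - f_dist.cdf(F, q, df2)

rng = np.random.default_rng(7)
T = 400
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()   # X does cause Y here
print(granger_f(y, x))   # small p-value: reject the null of non-causality
```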


More information

Ch.10 Autocorrelated Disturbances (June 15, 2016)

Ch.10 Autocorrelated Disturbances (June 15, 2016) Ch10 Autocorrelated Disturbances (June 15, 2016) In a time-series linear regression model setting, Y t = x tβ + u t, t = 1, 2,, T, (10-1) a common problem is autocorrelation, or serial correlation of the

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information

2.5 Forecasting and Impulse Response Functions

2.5 Forecasting and Impulse Response Functions 2.5 Forecasting and Impulse Response Functions Principles of forecasting Forecast based on conditional expectations Suppose we are interested in forecasting the value of y t+1 based on a set of variables

More information

Comparing Nested Predictive Regression Models with Persistent Predictors

Comparing Nested Predictive Regression Models with Persistent Predictors Comparing Nested Predictive Regression Models with Persistent Predictors Yan Ge y and ae-hwy Lee z November 29, 24 Abstract his paper is an extension of Clark and McCracken (CM 2, 25, 29) and Clark and

More information

VAR Models and Applications

VAR Models and Applications VAR Models and Applications Laurent Ferrara 1 1 University of Paris West M2 EIPMC Oct. 2016 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

Using all observations when forecasting under structural breaks

Using all observations when forecasting under structural breaks Using all observations when forecasting under structural breaks Stanislav Anatolyev New Economic School Victor Kitov Moscow State University December 2007 Abstract We extend the idea of the trade-off window

More information

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs Questions and Answers on Unit Roots, Cointegration, VARs and VECMs L. Magee Winter, 2012 1. Let ɛ t, t = 1,..., T be a series of independent draws from a N[0,1] distribution. Let w t, t = 1,..., T, be

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

11. Further Issues in Using OLS with TS Data

11. Further Issues in Using OLS with TS Data 11. Further Issues in Using OLS with TS Data With TS, including lags of the dependent variable often allow us to fit much better the variation in y Exact distribution theory is rarely available in TS applications,

More information

Estimation and Testing of Forecast Rationality under Flexible Loss

Estimation and Testing of Forecast Rationality under Flexible Loss Review of Economic Studies (2005) 72, 1107 1125 0034-6527/05/00431107$02.00 c 2005 The Review of Economic Studies Limited Estimation and Testing of Forecast Rationality under Flexible Loss GRAHAM ELLIOTT

More information

Some Recent Developments in Predictive Accuracy Testing With Nested Models and (Generic) Nonlinear Alternatives

Some Recent Developments in Predictive Accuracy Testing With Nested Models and (Generic) Nonlinear Alternatives Some Recent Developments in redictive Accuracy Testing With Nested Models and (Generic) Nonlinear Alternatives Valentina Corradi 1 and Norman R. Swanson 2 1 University of Exeter 2 Rutgers University August

More information

Forecasting the unemployment rate when the forecast loss function is asymmetric. Jing Tian

Forecasting the unemployment rate when the forecast loss function is asymmetric. Jing Tian Forecasting the unemployment rate when the forecast loss function is asymmetric Jing Tian This version: 27 May 2009 Abstract This paper studies forecasts when the forecast loss function is asymmetric,

More information

The Functional Central Limit Theorem and Testing for Time Varying Parameters

The Functional Central Limit Theorem and Testing for Time Varying Parameters NBER Summer Institute Minicourse What s New in Econometrics: ime Series Lecture : July 4, 008 he Functional Central Limit heorem and esting for ime Varying Parameters Lecture -, July, 008 Outline. FCL.

More information

Bayesian Semiparametric GARCH Models

Bayesian Semiparametric GARCH Models Bayesian Semiparametric GARCH Models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics xibin.zhang@monash.edu Quantitative Methods

More information

Bayesian Semiparametric GARCH Models

Bayesian Semiparametric GARCH Models Bayesian Semiparametric GARCH Models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics xibin.zhang@monash.edu Quantitative Methods

More information

Vector Auto-Regressive Models

Vector Auto-Regressive Models Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53 State-space Model Eduardo Rossi University of Pavia November 2014 Rossi State-space Model Fin. Econometrics - 2014 1 / 53 Outline 1 Motivation 2 Introduction 3 The Kalman filter 4 Forecast errors 5 State

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Comparing Predictive Accuracy, Twenty Years Later: On The Use and Abuse of Diebold-Mariano Tests

Comparing Predictive Accuracy, Twenty Years Later: On The Use and Abuse of Diebold-Mariano Tests Comparing Predictive Accuracy, Twenty Years Later: On The Use and Abuse of Diebold-Mariano Tests Francis X. Diebold April 28, 2014 1 / 24 Comparing Forecasts 2 / 24 Comparing Model-Free Forecasts Models

More information

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E. Forecasting Lecture 3 Structural Breaks Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, 2013 1 / 91 Bruce E. Hansen Organization Detection

More information

Autoregressive Moving Average (ARMA) Models and their Practical Applications

Autoregressive Moving Average (ARMA) Models and their Practical Applications Autoregressive Moving Average (ARMA) Models and their Practical Applications Massimo Guidolin February 2018 1 Essential Concepts in Time Series Analysis 1.1 Time Series and Their Properties Time series:

More information

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia GARCH Models Estimation and Inference Eduardo Rossi University of Pavia Likelihood function The procedure most often used in estimating θ 0 in ARCH models involves the maximization of a likelihood function

More information

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

Nonparametric Bootstrap Procedures for Predictive Inference Based on Recursive Estimation Schemes

Nonparametric Bootstrap Procedures for Predictive Inference Based on Recursive Estimation Schemes Nonparametric Bootstrap rocedures for redictive Inference Based on Recursive Estimation Schemes Valentina Corradi and Norman R. Swanson 2 Queen Mary, University of London and 2 Rutgers University March

More information

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Moving average processes Autoregressive

More information

FORECAST-BASED MODEL SELECTION

FORECAST-BASED MODEL SELECTION FORECAST-ASED MODEL SELECTION IN THE PRESENCE OF STRUCTURAL REAKS Todd E. Clark Michael W. McCracken AUGUST 2002 RWP 02-05 Research Division Federal Reserve ank of Kansas City Todd E. Clark is an assistant

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Comprehensive Examination Quantitative Methods Spring, 2018

Comprehensive Examination Quantitative Methods Spring, 2018 Comprehensive Examination Quantitative Methods Spring, 2018 Instruction: This exam consists of three parts. You are required to answer all the questions in all the parts. 1 Grading policy: 1. Each part

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Bootstrapping the Grainger Causality Test With Integrated Data

Bootstrapping the Grainger Causality Test With Integrated Data Bootstrapping the Grainger Causality Test With Integrated Data Richard Ti n University of Reading July 26, 2006 Abstract A Monte-carlo experiment is conducted to investigate the small sample performance

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference Università di Pavia GARCH Models Estimation and Inference Eduardo Rossi Likelihood function The procedure most often used in estimating θ 0 in ARCH models involves the maximization of a likelihood function

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Generalized Method of Moment

Generalized Method of Moment Generalized Method of Moment CHUNG-MING KUAN Department of Finance & CRETA National Taiwan University June 16, 2010 C.-M. Kuan (Finance & CRETA, NTU Generalized Method of Moment June 16, 2010 1 / 32 Lecture

More information

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006 Analogy Principle Asymptotic Theory Part II James J. Heckman University of Chicago Econ 312 This draft, April 5, 2006 Consider four methods: 1. Maximum Likelihood Estimation (MLE) 2. (Nonlinear) Least

More information

Econometrics II - EXAM Outline Solutions All questions have 25pts Answer each question in separate sheets

Econometrics II - EXAM Outline Solutions All questions have 25pts Answer each question in separate sheets Econometrics II - EXAM Outline Solutions All questions hae 5pts Answer each question in separate sheets. Consider the two linear simultaneous equations G with two exogeneous ariables K, y γ + y γ + x δ

More information

Title. Description. var intro Introduction to vector autoregressive models

Title. Description. var intro Introduction to vector autoregressive models Title var intro Introduction to vector autoregressive models Description Stata has a suite of commands for fitting, forecasting, interpreting, and performing inference on vector autoregressive (VAR) models

More information

EVALUATING DIRECT MULTI-STEP FORECASTS

EVALUATING DIRECT MULTI-STEP FORECASTS EVALUATING DIRECT MULTI-STEP FORECASTS Todd Clark and Michael McCracken Revised: April 2005 (First Version December 2001) RWP 01-14 Research Division Federal Reserve Bank of Kansas City Todd E. Clark is

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Research Division Federal Reserve Bank of St. Louis Working Paper Series

Research Division Federal Reserve Bank of St. Louis Working Paper Series Research Division Federal Reserve Bank of St. Louis Working Paper Series Tests of Equal Predictive Ability with Real-Time Data Todd E. Clark and Michael W. McCracken Working Paper 2008-029A http://research.stlouisfed.org/wp/2008/2008-029.pdf

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50 GARCH Models Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 50 Outline 1 Stylized Facts ARCH model: definition 3 GARCH model 4 EGARCH 5 Asymmetric Models 6

More information

Discrete time processes

Discrete time processes Discrete time processes Predictions are difficult. Especially about the future Mark Twain. Florian Herzog 2013 Modeling observed data When we model observed (realized) data, we encounter usually the following

More information

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Prof. Massimo Guidolin 019 Financial Econometrics Winter/Spring 018 Overview ARCH models and their limitations Generalized ARCH models

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

A time series is called strictly stationary if the joint distribution of every collection (Y t

A time series is called strictly stationary if the joint distribution of every collection (Y t 5 Time series A time series is a set of observations recorded over time. You can think for example at the GDP of a country over the years (or quarters) or the hourly measurements of temperature over a

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? MPRA Munich Personal RePEc Archive Are Forecast Updates Progressive? Chia-Lin Chang and Philip Hans Franses and Michael McAleer National Chung Hsing University, Erasmus University Rotterdam, Erasmus University

More information

Vector autoregressions, VAR

Vector autoregressions, VAR 1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

7. Forecasting with ARIMA models

7. Forecasting with ARIMA models 7. Forecasting with ARIMA models 309 Outline: Introduction The prediction equation of an ARIMA model Interpreting the predictions Variance of the predictions Forecast updating Measuring predictability

More information

On detection of unit roots generalizing the classic Dickey-Fuller approach

On detection of unit roots generalizing the classic Dickey-Fuller approach On detection of unit roots generalizing the classic Dickey-Fuller approach A. Steland Ruhr-Universität Bochum Fakultät für Mathematik Building NA 3/71 D-4478 Bochum, Germany February 18, 25 1 Abstract

More information

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] 1 Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] Insights: Price movements in one market can spread easily and instantly to another market [economic globalization and internet

More information

The loss function and estimating equations

The loss function and estimating equations Chapter 6 he loss function and estimating equations 6 Loss functions Up until now our main focus has been on parameter estimating via the maximum likelihood However, the negative maximum likelihood is

More information

University of Pavia. M Estimators. Eduardo Rossi

University of Pavia. M Estimators. Eduardo Rossi University of Pavia M Estimators Eduardo Rossi Criterion Function A basic unifying notion is that most econometric estimators are defined as the minimizers of certain functions constructed from the sample

More information

Diagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations

Diagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations Diagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations Farhat Iqbal Department of Statistics, University of Balochistan Quetta-Pakistan farhatiqb@gmail.com Abstract In this paper

More information

Reality Checks and Nested Forecast Model Comparisons

Reality Checks and Nested Forecast Model Comparisons Reality Checks and Nested Forecast Model Comparisons Todd E. Clark Federal Reserve Bank of Kansas City Michael W. McCracken Board of Governors of the Federal Reserve System October 2006 (preliminary and

More information