Linear Regression with Time Series Data


University of Copenhagen, Department of Economics. Econometrics II: Linear Regression with Time Series Data. Morten Nyboe Tabor.

Learning Goals

- Give an interpretation of the linear regression model for stationary time series and explain the model's identifying assumptions.
- Derive the method of moments (MM) estimator and state the assumptions used to derive the estimator.
- Estimate and interpret the parameters.
- Explain the sufficient conditions for consistency, unbiasedness, and asymptotic normality of the method of moments estimator in the linear regression model.
- Construct misspecification tests and analyze to what extent a statistical model is congruent with the data.
- Explain the consequences of autocorrelated, heteroskedastic, and non-normal residuals for the properties of the MM estimator.

Outline

1. The Linear Regression Model: Definition, Interpretation, and Identification; Method of Moments (MM) Estimation
2. Properties of the Estimator: Consistency; Unbiasedness; Example: Bias in AR(1) Model; Asymptotic Distribution
3. Dynamic Completeness and Autocorrelation: A Dynamically Complete Model; Autocorrelation of the Error Term; Consequences of Autocorrelation
4. Model Formulation and Misspecification Testing: Model Formulation; Misspecification Testing; Model Formulation and Misspecification Testing in Practice
5. The Frisch-Waugh-Lovell Theorem
6. Recap: Linear Regression Model with Time Series Data

1. The Linear Regression Model

Time Series Regression Models

Consider the linear regression model
$$y_t = x_t'\beta + \epsilon_t = x_{1t}\beta_1 + x_{2t}\beta_2 + \dots + x_{kt}\beta_k + \epsilon_t,$$
for $t = 1, 2, \dots, T$. Note that $y_t$ and $\epsilon_t$ are $1 \times 1$, while $x_t$ and $\beta$ are $k \times 1$.

The interpretation of the model depends on the variables in $x_t$:

1. If $x_t$ contains contemporaneously dated variables, it is denoted a static regression: $y_t = x_t'\beta + \epsilon_t$.
2. A simple model for $y_t$ given the past is an autoregressive model: $y_t = \theta y_{t-1} + \epsilon_t$.
3. More complicated dynamics are allowed in the autoregressive distributed lag (ADL) model: $y_t = \theta_1 y_{t-1} + x_t'\varphi_0 + x_{t-1}'\varphi_1 + \epsilon_t$.
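As a concrete illustration (not part of the original slides), the three model types can be simulated in a few lines of Python; all parameter values below are illustrative assumptions, not taken from the lecture.

```python
# Simulate one series from each model type on the slide: a static regression,
# an AR(1), and an ADL(1,1). Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(42)
T = 200
x = rng.standard_normal(T)      # exogenous regressor x_t
eps = rng.standard_normal(T)    # error term eps_t

# 1. Static regression: y_t = x_t * beta + eps_t
beta = 0.5
y_static = beta * x + eps

# 2. Autoregressive model: y_t = theta * y_{t-1} + eps_t
theta = 0.8
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = theta * y_ar[t - 1] + eps[t]

# 3. ADL(1,1): y_t = theta1 * y_{t-1} + phi0 * x_t + phi1 * x_{t-1} + eps_t
theta1, phi0, phi1 = 0.6, 0.5, -0.2
y_adl = np.zeros(T)
for t in range(1, T):
    y_adl[t] = theta1 * y_adl[t - 1] + phi0 * x[t] + phi1 * x[t - 1] + eps[t]
```

The loops make the dependence on the past explicit: the static series depends only on the contemporaneous $x_t$, while the AR and ADL series carry their own lags.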

Interpretation of Regression Models

Consider again the regression model
$$y_t = x_t'\beta + \epsilon_t, \quad t = 1, 2, \dots, T. \qquad (\ast)$$
As it stands, the equation is a tautology: it is not informative on $\beta$! Why? For any $\beta$, we can find a residual $\epsilon_t$ so that $(\ast)$ holds. We have to impose restrictions on $\epsilon_t$ to ensure a unique solution to $(\ast)$. This is called identification in econometrics.

Assume that $(\ast)$ represents the conditional expectation,
$$E(y_t \mid x_t) = x_t'\beta, \quad \text{so that} \quad E(\epsilon_t \mid x_t) = 0. \qquad (\ast\ast)$$
This condition states a zero conditional mean. We say that $x_t$ is predetermined. Under assumption $(\ast\ast)$ the coefficients are the partial (ceteris paribus) effects
$$\frac{\partial E(y_t \mid x_t)}{\partial x_{jt}} = \beta_j.$$

Socrative Question 1

Consider the linear regression model,
$$y_t = x_t'\beta + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (\ast)$$
where we assume the regressors are predetermined: $E(\epsilon_t \mid x_t) = 0$.

Q: Which of the following assumptions is not implied by predeterminedness?

(A) $E(\epsilon_t) = 0$.
(B) There is no feedback from $y_t$ to $x_t$.
(C) $E(\epsilon_t \mid x_1, x_2, \dots, x_t, \dots, x_T) = 0$.
(D) $E(x_t\epsilon_t) = 0$.
(E) Don't know.

Please go to www.socrative.com, click Student Login, and enter room id Econometrics2.

Exercise 1

Consider the linear regression model,
$$y_t = x_t'\beta + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (\ast)$$
where we assume the regressors are predetermined: $E(\epsilon_t \mid x_t) = 0$.

Throughout the course, we use the following results from probability theory:

Law of iterated expectations: $E(v_t) = E(E(v_t \mid w_t))$.
Linearity of conditional expectations: $E(f(v_t) w_t \mid v_t) = f(v_t) E(w_t \mid v_t)$.

Q1. Use the law of iterated expectations to show that predeterminedness implies the moment condition $E(x_t\epsilon_t) = 0$.
Q2. Show that predeterminedness implies a zero unconditional mean of the residuals: $E(\epsilon_t) = 0$.
Q3. Show that strict exogeneity, $E(\epsilon_t \mid x_1, x_2, \dots, x_t, \dots, x_T) = 0$, implies predeterminedness.

Identification

Predeterminedness implies the so-called moment condition,
$$E(x_t\epsilon_t) = 0, \qquad (\ast)$$
stating that $x_t$ and $\epsilon_t$ are uncorrelated.

Now insert the model definition, $\epsilon_t = y_t - x_t'\beta$, in $(\ast)$ to obtain
$$E(x_t(y_t - x_t'\beta)) = 0 \iff E(x_t y_t) - E(x_t x_t')\beta = 0.$$
This is a system of $k$ equations in the $k$ unknown parameters, $\beta$, and if $E(x_t x_t')$ is non-singular we can find the so-called population estimator,
$$\beta = E(x_t x_t')^{-1} E(x_t y_t),$$
which is unique.

The parameters in $\beta$ are identified by $(\ast)$ and the non-singularity condition. The latter is the well-known condition for no perfect multicollinearity.

Method of Moments (MM) Estimation

From a given sample $(y_t, x_t)$, $t = 1, 2, \dots, T$, we cannot compute expectations. In practice we replace expectations with sample averages and obtain the MM estimator
$$\hat{\beta} = \left( T^{-1} \sum_{t=1}^T x_t x_t' \right)^{-1} \left( T^{-1} \sum_{t=1}^T x_t y_t \right).$$
Note that the MM estimator coincides with the OLS estimator.

For MM to work, i.e., $\hat\beta \to \beta$, we need a law of large numbers (LLN) to apply, i.e.,
$$T^{-1} \sum_{t=1}^T x_t y_t \to E(x_t y_t) \quad \text{and} \quad T^{-1} \sum_{t=1}^T x_t x_t' \to E(x_t x_t').$$

Note the two distinct conditions for OLS to converge to the true value:
1. The moment condition $E(x_t\epsilon_t) = 0$ should be satisfied.
2. A law of large numbers should apply.

A central part of econometric analysis is to ensure these conditions.
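The estimator above can be sketched numerically. The following Python snippet (not from the course, which uses OxMetrics) builds the sample moment matrices for a simulated static regression and confirms that the MM estimator coincides with OLS computed by a standard least-squares routine; the DGP is an illustrative assumption.

```python
# MM estimator from sample moments, compared with OLS from np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(0)
T = 500
beta_true = np.array([1.0, -0.5])
X = np.column_stack([np.ones(T), rng.standard_normal(T)])  # x_t = (1, x_{2t})'
y = X @ beta_true + rng.standard_normal(T)

# Sample moments: T^{-1} sum x_t x_t' and T^{-1} sum x_t y_t
Sxx = X.T @ X / T
Sxy = X.T @ y / T
beta_mm = np.linalg.solve(Sxx, Sxy)   # MM estimator

# OLS via least squares: numerically identical
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The factor $T^{-1}$ cancels in the estimator, which is why the MM and OLS formulas give the same numbers.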

Main Assumption

We impose assumptions to ensure that a LLN applies to the sample averages.

Main Assumption. Consider a time series $y_t$ and the $k \times 1$ vector time series $x_t$. We assume:
1. The process $z_t = (y_t, x_t')'$ has a joint stationary distribution.
2. The process $z_t$ is weakly dependent, so that $z_t$ and $z_{t+h}$ become approximately independent as $h \to \infty$.

Interpretation: Think of (1) as replacing identical distributions for IID data. Think of (2) as replacing independent observations for IID data.

Under the main assumption, most of the results for linear regression on random samples carry over to the time series case.

Introduction and Motivation for Theoretical Exercises

During lectures you are asked to work actively on theoretical exercises. These are based on the learning goals for the course and focus on theoretical derivations and interpretations.

The idea of the theoretical exercises is to apply the principles covered in the readings and videos, typically to a model that is slightly different from the one covered there. The exercises include several questions, with progression in difficulty. Try to solve as much as you can.

We follow up using Socrative quizzes and a review of the answers. A written solution is also available on Absalon.

Theoretical Exercise 1.1

We consider the first-order autoregressive model for $z_t$, given by
$$z_t = \delta + \theta z_{t-1} + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (\ast)$$
where the initial value $z_0$ is given.

Q1. State the predeterminedness assumption and the moment conditions for the model in $(\ast)$.
Q2. Derive the method of moments (MM) estimator of the parameters $\delta$ and $\theta$.
Q3. (Advanced) Write the model in $(\ast)$ for $t = 1, 2, \dots, T$ using matrix notation. Hint: Define the $T \times 1$ vector $Z = (z_1, z_2, \dots, z_T)'$.
Q4. (Advanced) Write the sample moments for the model in $(\ast)$ using matrix notation and derive the method of moments estimator of $\delta$ and $\theta$.

Socrative Question 2

We consider the first-order autoregressive model for $z_t$, given by
$$z_t = \delta + \theta z_{t-1} + \epsilon_t, \quad t = 1, 2, \dots, T, \quad z_0 \text{ given.} \qquad (\ast)$$

Q. The predeterminedness assumption and the moment conditions for the model in $(\ast)$ are given by?

(A) $E(\epsilon_t \mid x_t) = 0$ and $E(x_t\epsilon_t) = 0$.
(B) $E(\epsilon_t \mid z_t) = 0$ and $E(z_t\epsilon_t) = 0$.
(C) $E(\epsilon_t \mid z_{t-1}) = 0$ and $E(z_{t-1}\epsilon_t) = 0$.
(D) $E(\epsilon_t \mid 1, z_{t-1}) = 0$ and $E(z_{t-1}\epsilon_t) = 0$ and $E(1 \cdot \epsilon_t) = 0$.
(E) Don't know.

Please go to www.socrative.com, click Student Login, and enter room id Econometrics2.

Socrative Question 3

We consider the first-order autoregressive model for $z_t$, given by
$$z_t = \delta + \theta z_{t-1} + \epsilon_t, \quad t = 1, 2, \dots, T, \quad z_0 \text{ given.} \qquad (\ast)$$

Q. What is the method of moments estimator of $\beta = (\delta, \theta)'$?

(A) $\hat\beta = \left( \sum_{t=1}^T x_t x_t' \right)^{-1} \sum_{t=1}^T x_t y_t$.
(B) $\hat\beta = \left( \sum_{t=1}^T z_{t-1} z_{t-1}' \right)^{-1} \sum_{t=1}^T z_{t-1} z_t$.
(C) $\hat\beta = \begin{pmatrix} \sum_{t=1}^T 1 & \sum_{t=1}^T z_{t-1} \\ \sum_{t=1}^T z_{t-1} & \sum_{t=1}^T z_{t-1}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=1}^T z_t \\ \sum_{t=1}^T z_{t-1} z_t \end{pmatrix}$.
(D) $\hat\beta = \begin{pmatrix} \sum_{t=1}^T z_t^2 & \sum_{t=1}^T z_{t-1} z_t \\ \sum_{t=1}^T z_t z_{t-1} & \sum_{t=1}^T z_{t-1}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=1}^T z_t \\ \sum_{t=1}^T z_{t-1} z_t \end{pmatrix}$.
(E) Don't know.

Please go to www.socrative.com, click Student Login, and enter room id Econometrics2.

Theoretical Exercise 1.1 Continued

We consider the first-order autoregressive model for $z_t$, given by
$$z_t = \delta + \theta z_{t-1} + \epsilon_t = x_t'\beta + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (\ast)$$
where the initial value $z_0$ is given.

We can always rewrite the model as
$$y_t = x_t'\beta + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (\ast\ast)$$
by defining $y_t = z_t$ and the $(2 \times 1)$ vectors $x_t = (1, z_{t-1})'$ and $\beta = (\delta, \theta)'$.

Given the moment conditions, $E(x_t\epsilon_t) = 0$, the method of moments estimator of $\beta = (\delta, \theta)'$ is given by
$$\hat\beta = \left( \sum_{t=1}^T x_t x_t' \right)^{-1} \sum_{t=1}^T x_t y_t. \qquad (\ast\ast\ast)$$

Q5. Rewrite the expression for $\hat\beta$ in $(\ast\ast\ast)$ in terms of the true parameters $\beta$ and a sampling error which depends only on $x_t$ and $\epsilon_t$ for $t = 1, 2, \dots, T$.
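As a hedged numerical companion to the formula above, the following Python sketch simulates an AR(1) with constant and computes the MM/OLS estimator with $x_t = (1, z_{t-1})'$; the DGP values $\delta = 0.5$ and $\theta = 0.8$ are assumptions for the example, not part of the exercise.

```python
# MM/OLS estimation of the AR(1) with constant, using x_t = (1, z_{t-1})'.
import numpy as np

rng = np.random.default_rng(1)
T, delta, theta = 1000, 0.5, 0.8

z = np.zeros(T + 1)                           # z[0] is the given initial value
for t in range(1, T + 1):
    z[t] = delta + theta * z[t - 1] + rng.standard_normal()

X = np.column_stack([np.ones(T), z[:-1]])     # rows are x_t' = (1, z_{t-1})
y = z[1:]                                     # y_t = z_t
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (sum x_t x_t')^{-1} sum x_t y_t
delta_hat, theta_hat = beta_hat
```

Stacking the regressors row-wise makes the sample moment sums simple matrix products, mirroring the matrix notation asked for in Q3 and Q4.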

Socrative Question 4

Consider the autoregressive distributed lag model for $x_t$, given by
$$x_t = \theta x_{t-1} + \gamma y_t + \epsilon_t, \quad t = 1, 2, \dots, T, \quad x_0 \text{ given.} \qquad (\ast)$$

Q. What is the method of moments estimator of $\beta = (\theta, \gamma)'$?

(A) $\hat\beta = \left( \sum_{t=1}^T y_t y_t' \right)^{-1} \sum_{t=1}^T y_t x_t$.
(B) $\hat\beta = \left( \sum_{t=1}^T x_t x_t' \right)^{-1} \sum_{t=1}^T x_t y_t$.
(C) $\hat\beta = \begin{pmatrix} \sum_{t=1}^T x_{t-1}^2 & \sum_{t=1}^T x_{t-1} y_t \\ \sum_{t=1}^T y_t x_{t-1} & \sum_{t=1}^T y_t^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=1}^T x_{t-1} x_t \\ \sum_{t=1}^T y_t x_t \end{pmatrix}$.
(D) $\hat\beta = \begin{pmatrix} \sum_{t=1}^T x_t^2 & \sum_{t=1}^T x_t x_{t-1} \\ \sum_{t=1}^T x_{t-1} x_t & \sum_{t=1}^T x_{t-1}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=1}^T x_t y_t \\ \sum_{t=1}^T x_{t-1} y_t \end{pmatrix}$.
(E) Don't know.

Please go to www.socrative.com, click Student Login, and enter room id Econometrics2.

2. Properties of the Estimator

Monte Carlo Illustration: Static Regression

1. We simulate M replications from the data-generating process,
$$y_t = b_1 z_{1t} + e_t, \quad t = 1, 2, \dots, T, \qquad (1)$$
where $z_{1t} \sim N(0, 1)$, $e_t \sim N(0, 1)$, and $b_1 = 1$.

2. For each replication of $(y_t, z_{1t})$, we estimate the static linear regression model by OLS,
$$y_t = \beta_0 + \beta_1 z_{1t} + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (2)$$
which gives the estimates $\hat\beta^{(m)}_T = (\hat\beta^{(m)}_{0,T}, \hat\beta^{(m)}_{1,T})'$ given the sample size T.

3. We plot the distributions of $\hat\beta^{(m)}_T$ for $m = 1, 2, \dots, M$, and the Monte Carlo mean, $M^{-1}\sum_{m=1}^M \hat\beta^{(m)}_T$, plus/minus two Monte Carlo standard deviations (MCSD) for an increasing sample size $T \in \{20, 30, \dots, 1000\}$.

What do you see?

Simulation Settings in OxMetrics

We run the simulations in OxMetrics. Choose Model, select the PcGive module with the category Monte Carlo and the model class Static Experiment using PcNaive. We use the following settings:

[Figure: screenshot of the PcNaive simulation settings.]

What Do You See?

[Figures: densities of the estimates of the coefficient on Za and of the Constant, and the Monte Carlo means ± 2 MCSD as T increases from 20 to 1000.]

The Linear Regression Model and the MM Estimator

Socrative Question 5

Consider the linear regression model, $y_t = x_t\beta + \epsilon_t$, for $t = 1, 2, \dots, T$, where we assume that $x_t$ is a single explanatory variable with $E(x_t) = 0$. The MM estimator can be written as
$$\hat\beta = \frac{\sum_{t=1}^T x_t y_t}{\sum_{t=1}^T x_t^2} = \beta + \frac{\sum_{t=1}^T x_t \epsilon_t}{\sum_{t=1}^T x_t^2}.$$

Q: In addition to stationarity of $(y_t, x_t)$, what is required for consistency of the MM estimator, $\hat\beta \to \beta$ for $T \to \infty$?

(A) Strict exogeneity: $E(\epsilon_t \mid x_1, x_2, \dots, x_T) = 0$.
(B) Predeterminedness: $E(\epsilon_t \mid x_t) = 0$.
(C) Moment conditions: $E(x_t\epsilon_t) = 0$.
(D) Conditional homoskedasticity and no serial correlation: $E(\epsilon_t^2 \mid x_t) = \sigma^2$ and $E(\epsilon_t\epsilon_s \mid x_t, x_s) = 0$.
(E) Don't know.

(More importantly, can you sketch the proof of consistency and explain how the assumptions are used?)

Consistency

Consistency is the first requirement for an estimator: $\hat\beta$ should converge to $\beta$.

Result 1: Consistency. Let $y_t$ and $x_t$ obey the main assumption. If the regressors obey the moment condition, $E(x_t\epsilon_t) = 0$, then the OLS estimator is consistent, i.e., $\hat\beta \to \beta$ as $T \to \infty$.

The OLS estimator is consistent if the regression represents the conditional expectation of $y_t$ given $x_t$. In this case $E(\epsilon_t \mid x_t) = 0$, and $\hat\beta_i$ has a ceteris paribus interpretation.

The conditions in Result 1 are sufficient but not necessary. We will see examples of estimators that are consistent even if the conditions are not satisfied (related to unit roots and cointegration).

Illustration of Consistency

Consider the regression model with a single explanatory variable, $k = 1$,
$$y_t = x_t\beta + \epsilon_t, \quad t = 1, 2, \dots, T,$$
with $E(x_t\epsilon_t) = 0$ and $E(x_t) = 0$. Write the OLS estimator as
$$\hat\beta = \frac{T^{-1}\sum_{t=1}^T y_t x_t}{T^{-1}\sum_{t=1}^T x_t^2} = \frac{T^{-1}\sum_{t=1}^T (x_t\beta + \epsilon_t)x_t}{T^{-1}\sum_{t=1}^T x_t^2} = \beta + \frac{T^{-1}\sum_{t=1}^T \epsilon_t x_t}{T^{-1}\sum_{t=1}^T x_t^2},$$
and look at the terms as $T \to \infty$:
$$\mathrm{plim}\; T^{-1}\sum_{t=1}^T x_t^2 = \sigma_x^2, \quad 0 < \sigma_x^2 < \infty, \qquad (\ast)$$
$$\mathrm{plim}\; T^{-1}\sum_{t=1}^T \epsilon_t x_t = E(\epsilon_t x_t) = 0, \qquad (\ast\ast)$$
where the LLN applies under the main assumption (Assumption 1); $(\ast)$ holds for a stationary process ($\sigma_x^2$ is the limiting variance of $x_t$), and $(\ast\ast)$ follows from predeterminedness.
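A minimal numerical sketch of this argument (assuming a standard normal regressor and error, which the slide does not specify): the sampling error $\sum \epsilon_t x_t / \sum x_t^2$ shrinks as T grows, so $\hat\beta$ approaches the true value.

```python
# Consistency in the single-regressor model: |beta_hat - beta| for a small
# and a very large sample. DGP with independent N(0,1) regressor and error.
import numpy as np

rng = np.random.default_rng(7)
beta = 1.0

def beta_hat(T, rng):
    x = rng.standard_normal(T)      # stationary regressor with E(x_t) = 0
    eps = rng.standard_normal(T)    # error with E(x_t * eps_t) = 0
    y = beta * x + eps
    return np.sum(x * y) / np.sum(x**2)

errors = {T: abs(beta_hat(T, rng) - beta) for T in (50, 500_000)}
```

For T = 500,000 the estimation error is essentially zero, illustrating the plim arguments above.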

Exercise: Proof of Unbiasedness

In the linear regression model, $y_t = x_t'\beta + \epsilon_t$, $t = 1, 2, \dots, T$, where $y_t$ and $x_t$ are stationary, the MM estimator can be written as
$$\hat\beta = \beta + \left( \sum_{t=1}^T x_t x_t' \right)^{-1} \sum_{t=1}^T x_t \epsilon_t.$$
Assuming strict exogeneity, $E(\epsilon_t \mid x_1, x_2, \dots, x_T) = 0$, the MM estimator is conditionally and unconditionally unbiased.

Q1. Use the law of iterated expectations to prove that the assumption of strict exogeneity implies that the MM estimator is conditionally unbiased:
$$E(\epsilon_t \mid x_1, x_2, \dots, x_T) = 0 \implies E(\hat\beta \mid x_1, x_2, \dots, x_T) = \beta.$$
Why is predeterminedness not sufficient for conditional unbiasedness?

Q2. Use the law of iterated expectations to prove that conditional unbiasedness implies unconditional unbiasedness:
$$E(\hat\beta \mid x_1, x_2, \dots, x_T) = \beta \implies E(\hat\beta) = \beta.$$

Unbiasedness

A stronger requirement for an estimator is unbiasedness: $E(\hat\beta) = \beta$.

Result 2: Unbiasedness. Let $y_t$ and $x_t$ obey the main assumption. If the regressors are strictly exogenous, $E(\epsilon_t \mid x_1, x_2, \dots, x_t, \dots, x_T) = 0$, then the OLS estimator is unbiased, i.e., $E(\hat\beta \mid x_1, x_2, \dots, x_T) = \beta$.

Unbiasedness requires strict exogeneity, which is not fulfilled in a dynamic regression. Consider the first-order autoregressive model $y_t = \theta y_{t-1} + \epsilon_t$. Here $y_t$ is a function of $\epsilon_t$, so $\epsilon_t$ cannot be uncorrelated with $y_t, y_{t+1}, \dots, y_T$.

Result 3: Estimation bias in dynamic models. In general, the OLS estimator is not unbiased in regressions with lagged dependent variables.

Example: Finite Sample Bias in an AR(1)

Consider a first-order autoregressive, AR(1), model,
$$y_t = \theta y_{t-1} + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (\ast)$$
where $\theta = 0.9$. The time series $y_t$ satisfies Assumption 1 of stationarity and weak dependence (we will return to the properties of $y_t$ later). The OLS estimator is
$$\hat\theta = \left( \sum_{t=1}^T y_{t-1}^2 \right)^{-1} \sum_{t=1}^T y_{t-1} y_t.$$
We are often interested in $E(\hat\theta)$ to check for bias for a given T. This is typically difficult to derive analytically. But if we could draw realizations of $\hat\theta$, then we could estimate $E(\hat\theta)$.

Finite Sample Bias in an AR(1)

MC simulation:
1. Construct (randomly) M artificial data sets from the AR(1) model $(\ast)$.
2. Estimate the model $(\ast)$ for each data set and get the estimates $\hat\theta^{(m)}$, for $m = 1, 2, \dots, M$.
3. Compute the Monte Carlo mean, the Monte Carlo standard deviation, and an estimate of the bias:
$$\mathrm{MEAN}(\hat\theta) = M^{-1}\sum_{m=1}^M \hat\theta^{(m)}$$
$$\mathrm{MCSD}(\hat\theta) = \sqrt{ M^{-1}\sum_{m=1}^M \left( \hat\theta^{(m)} - \mathrm{MEAN}(\hat\theta) \right)^2 }$$
$$\mathrm{Bias}(\hat\theta) = \mathrm{MEAN}(\hat\theta) - \theta.$$
Note that by the LLN (for independent observations, since the $\hat\theta^{(m)}$'s are independent):
$$\mathrm{MEAN}(\hat\theta) = M^{-1}\sum_{m=1}^M \hat\theta^{(m)} \to E(\hat\theta) \quad \text{for } M \to \infty.$$
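The Monte Carlo recipe above can be sketched in plain Python (the lecture runs it in PcNaive); M, T, and the seed are arbitrary choices for the illustration. Note that the estimated bias is negative, in line with Result 3.

```python
# MC estimate of E(theta_hat), MCSD, and bias for OLS in an AR(1), theta = 0.9.
import numpy as np

rng = np.random.default_rng(2024)
theta, T, M = 0.9, 50, 2000

theta_hats = np.empty(M)
for m in range(M):
    y = np.zeros(T + 1)                       # one artificial data set
    for t in range(1, T + 1):
        y[t] = theta * y[t - 1] + rng.standard_normal()
    # OLS without constant: sum(y_{t-1} y_t) / sum(y_{t-1}^2)
    theta_hats[m] = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)

mean = theta_hats.mean()          # MEAN(theta_hat), MC estimate of E(theta_hat)
mcsd = theta_hats.std(ddof=1)     # MCSD(theta_hat)
bias = mean - theta               # negative: OLS is biased downward in small T
```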

Finite Sample Bias in an AR(1): Monte Carlo Simulations in PcNaive

We use Monte Carlo simulations to illustrate the potential bias of the OLS estimator for $\theta$ in the AR(1) model $(\ast)$. We consider the parameter values $\theta \in \{0.0, 0.5, 0.9\}$. This can be done quite easily in the module PcNaive in OxMetrics:

1. In OxMetrics, click the Model menu and select Model.
2. In the window that pops up, select the PcGive module. In Category select Monte Carlo, and in Model class select AR(1) Experiment using PcNaive.
3. Click Formulate.
4. In the Formulate window, specify the DGP, the estimating model, the number of replications, the sample length(s), and some output settings. See the next slide.

Finite Sample Bias in an AR(1): PcNaive

[Figure: screenshot of the PcNaive experiment specification.]

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0

[Figures: density of the Monte Carlo estimates (Ya_1), centered close to 0, and the Monte Carlo mean ± 2 MCSD for T = 20, ..., 200.]

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0.5

[Figures: density of the Monte Carlo estimates (Ya_1), centered near 0.5, and the Monte Carlo mean ± 2 MCSD for T = 20, ..., 200.]

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0.9

[Figures: density of the Monte Carlo estimates (Ya_1) over roughly 0.84-0.94, and the Monte Carlo mean ± 2 MCSD for T = 100, ..., 1000.]

Finite Sample Bias in an AR(1)

In a Monte Carlo simulation, we take an AR(1) as the DGP and estimation model:
$$y_t = 0.9\, y_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, 1).$$

[Figure: mean of the OLS estimate in the AR(1) model, with MEAN and MEAN ± 2 MCSD approaching the true value 0.9 from below as T increases from 10 to 100.]

Asymptotic Distribution

To derive the asymptotic distribution we need a CLT, which requires additional restrictions on $\epsilon_t$.

Result 4: Asymptotic distribution and OLS variance. Let $y_t$ and $x_t$ obey the main assumption and let $x_t$ and $\epsilon_t$ be uncorrelated, $E(\epsilon_t x_t) = 0$. Furthermore, assume (conditional) homoskedasticity and no serial correlation, i.e.,
$$E(\epsilon_t^2 \mid x_t) = \sigma^2$$
$$E(\epsilon_t\epsilon_s \mid x_t, x_s) = 0 \quad \text{for all } t \neq s.$$
Then as $T \to \infty$, the OLS estimator is asymptotically normal with the usual variance:
$$\sqrt{T}\left( \hat\beta - \beta \right) \xrightarrow{D} N\!\left(0, \sigma^2 E(x_t x_t')^{-1}\right).$$
Inserting estimators, we can test hypotheses using
$$\hat\beta \overset{a}{\sim} N\!\left(\beta, \hat\sigma^2 \left( \sum_{t=1}^T x_t x_t' \right)^{-1} \right).$$
Recall that $E(\epsilon_t x_t) = 0$ is implied by $E(\epsilon_t \mid x_t) = 0$.
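One way to see Result 4 at work, sketched here in Python under an assumed DGP with normal data (not the lecture's own code): across replications, the t-ratio built from the usual OLS variance rejects a true hypothesis at the 1.96 threshold in roughly 5% of samples, as the normal approximation predicts.

```python
# Rejection frequency of |t| > 1.96 for a true null in a static regression
# satisfying the assumptions of Result 4.
import numpy as np

rng = np.random.default_rng(3)
T, M, beta = 200, 2000, 1.0

tstats = np.empty(M)
for m in range(M):
    x = rng.standard_normal(T)
    y = beta * x + rng.standard_normal(T)
    b = np.sum(x * y) / np.sum(x**2)          # OLS estimate (k = 1)
    resid = y - b * x
    s2 = np.sum(resid**2) / (T - 1)           # error variance estimate
    se = np.sqrt(s2 / np.sum(x**2))           # usual OLS standard error
    tstats[m] = (b - beta) / se               # t-ratio under the true beta

rejection_rate = np.mean(np.abs(tstats) > 1.96)
```

A rejection rate close to 0.05 indicates that the asymptotic normal approximation works well at this sample size.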

3. Dynamic Completeness and Autocorrelation

A Dynamically Complete Model

Result 4 (the asymptotic distribution of the OLS estimator) required no serial correlation in the error terms. The precise condition for no serial correlation looks strange; often we disregard the conditioning and consider whether $\epsilon_t$ and $\epsilon_s$ are uncorrelated.

We say that a model is dynamically complete if
$$E(y_t \mid x_t, y_{t-1}, x_{t-1}, y_{t-2}, x_{t-2}, \dots, y_1, x_1) = E(y_t \mid x_t) = x_t'\beta,$$
i.e., $x_t$ contains all relevant information in the available information set.

No serial correlation is practically the same as dynamic completeness: all systematic information in the past of $y_t$ and $x_t$ is used in the regression model. This is often taken as an important design criterion for a dynamic regression model. We should always test for no autocorrelation in time series models.

Autocorrelation of the Error Term

If $\mathrm{Cov}(\epsilon_t, \epsilon_s) \neq 0$ for some $t \neq s$, we have autocorrelation of the error term. This is detected from the estimated residuals, and $\mathrm{Cov}(\hat\epsilon_t, \hat\epsilon_s) \neq 0$ is referred to as residual autocorrelation. The two terms are often used synonymously, but residual autocorrelation does not imply that the DGP has autocorrelated errors.

Typically, autocorrelation is taken as a signal of misspecification. Different possibilities:
1. Autoregressive errors in the DGP.
2. Dynamic misspecification.
3. Omitted variables and non-modelled structural shifts.
4. Misspecified functional form.

The solution to the problem depends on the interpretation.

Consequences of Autocorrelation

Autocorrelation will not violate the assumptions for Result 1 (consistency) in general. But $E(x_t\epsilon_t) = 0$ is violated if the model includes a lagged dependent variable. Look at an AR(1) model with error autocorrelation, i.e., the two equations
$$y_t = \theta y_{t-1} + \epsilon_t$$
$$\epsilon_t = \rho\epsilon_{t-1} + v_t, \quad v_t \sim IID(0, \sigma_v^2).$$
Both $y_{t-1}$ and $\epsilon_t$ depend on $\epsilon_{t-1}$, so $E(y_{t-1}\epsilon_t) \neq 0$.

Result 5: Inconsistency of OLS. In a regression model including the lagged dependent variable, the OLS estimator is not consistent in the presence of autocorrelation of the error term.

Even if OLS is consistent, the standard formula for the variance in Result 4 is no longer valid. It is possible to derive a valid variance, the so-called heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors.
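A minimal sketch of Result 5 under assumed parameter values $\theta = \rho = 0.5$ (the slide leaves the values unspecified): because $y_{t-1}$ and $\epsilon_t$ are correlated, OLS converges to the first autocorrelation of $y_t$, which here equals $(\theta + \rho)/(1 + \theta\rho) = 0.8$, not to $\theta = 0.5$.

```python
# OLS in an AR(1) with AR(1) errors is inconsistent: with theta = rho = 0.5,
# theta_hat converges to (theta + rho)/(1 + theta*rho) = 0.8, not 0.5.
import numpy as np

rng = np.random.default_rng(5)
theta, rho, T = 0.5, 0.5, 200_000

v = rng.standard_normal(T + 1)
eps = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    eps[t] = rho * eps[t - 1] + v[t]      # autocorrelated error
    y[t] = theta * y[t - 1] + eps[t]      # lagged dependent variable

theta_hat = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)
```

The very large T makes the point: more data does not help, because the probability limit itself is wrong.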

Simulation Illustration: Inconsistency in a Dynamic Model

Data-generating process. We simulate $y_t$ from the stationary AR(2) model with autocorrelated errors,
$$y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \epsilon_t, \qquad (\ast)$$
$$\epsilon_t = \rho\epsilon_{t-1} + v_t, \qquad (\ast\ast)$$
for $t = 1, 2, \dots, T$, where $v_t$ is assumed IIDN(0,1) and the initial values are $y_0 = y_{-1} = 0$. We set $\theta_1 = 0.7$ and $\theta_2 = 0.2$. We simulate M = 10,000 replications with sample sizes T = 200, 1000. We first simulate with $\rho = 0.0$ (no autocorrelation in $\epsilon_t$), then with $\rho = 0.9$ (high autocorrelation in $\epsilon_t$).

Estimated model. For the simulated data, we estimate the AR(2) model in $(\ast)$ by MM/OLS:
$$\begin{pmatrix} \hat\theta_1 \\ \hat\theta_2 \end{pmatrix} = \left( \sum_{t=1}^T \begin{pmatrix} y_{t-1} \\ y_{t-2} \end{pmatrix} \begin{pmatrix} y_{t-1} & y_{t-2} \end{pmatrix} \right)^{-1} \sum_{t=1}^T \begin{pmatrix} y_{t-1} \\ y_{t-2} \end{pmatrix} y_t.$$
We plot the distributions of $\hat\theta_1$, $\hat\theta_2$, $\sqrt{T}(\hat\theta_1 - \theta_1)$, and $\sqrt{T}(\hat\theta_2 - \theta_2)$. Consistency of the estimator requires the moment condition, which is violated due to the autocorrelation of the error term.

Simulation Illustration: Inconsistency in Dynamic Model
[Figures: simulated distributions of θ̂_1 and θ̂_2, and of √T(θ̂_1 − θ_1) and √T(θ̂_2 − θ_2), for T = 200, 1000 with ρ = 0.0 and ρ = 0.9.]


(1) Autoregressive Errors in the DGP

Consider the case where the errors are truly autoregressive:
    y_t = x_t'β + ε_t
    ε_t = ρ ε_{t-1} + v_t,   v_t ~ IID(0, σ_v²).
If ρ is known we can write
    (y_t − ρ y_{t-1}) = (x_t − ρ x_{t-1})'β + (ε_t − ρ ε_{t-1}),
i.e.,
    y_t = ρ y_{t-1} + x_t'β − x_{t-1}'ρβ + v_t.
The transformation is analogous to the GLS transformation in the case of heteroskedasticity.
The GLS model is subject to a so-called common factor restriction: three regressors but only two parameters, ρ and β. Estimation is non-linear and can be carried out by maximum likelihood.
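When ρ is unknown, one classical feasible version of this quasi-differencing transformation is the Cochrane-Orcutt iteration: estimate β, estimate ρ from the residuals, transform, and repeat. A minimal sketch (the simulated DGP, the tolerance, and the iteration cap are assumptions for illustration; the slide's ML approach is the textbook alternative):

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-8, max_iter=50):
    """Alternate between beta given rho (OLS on quasi-differenced data) and rho given beta."""
    rho = 0.0
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        ys = y[1:] - rho * y[:-1]          # quasi-differenced y_t - rho*y_{t-1}
        Xs = X[1:] - rho * X[:-1]          # quasi-differenced regressors
        beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        e = y - X @ beta                   # residuals at the current beta
        rho_new = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
        if abs(rho_new - rho) < tol:
            return beta, rho_new
        rho = rho_new
    return beta, rho

# Illustration on simulated data: y_t = 1 + 2*x_t + eps_t, eps_t = 0.5*eps_{t-1} + v_t.
rng = np.random.default_rng(0)
T = 2000
x = rng.standard_normal(T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.5 * eps[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(T), x])
beta_hat, rho_hat = cochrane_orcutt(y, X)
```

With strictly exogenous x_t, as here, the iteration recovers both the slope (2.0) and ρ (0.5); with a lagged dependent variable in x_t it would not, which is exactly the consistency caveat on the next slide.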

Consistent estimation of the parameters in the GLS model requires
    E((x_t − ρ x_{t-1})(ε_t − ρ ε_{t-1})) = 0.
But then ε_{t-1} should be uncorrelated with x_t, i.e., E(ε_t x_{t+1}) = 0. Consistency of GLS therefore requires stronger assumptions than consistency of OLS.
The GLS transformation is rarely used in modern econometrics:
1. Residual autocorrelation does not imply that the error term is autoregressive. There is no a priori reason to believe that the transformation is correct.
2. The requirement for consistency of GLS is strong.

(2) Dynamic Misspecification

Residual autocorrelation indicates that the model is not dynamically complete, so
    E(y_t | x_t) ≠ E(y_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, x_{t-2}, ..., y_1, x_1).
The (dynamic) model is misspecified and should be reformulated. The natural remedy is to extend the list of lags of x_t and y_t.
If the autocorrelation seems to be of first order, a starting point is the GLS transformation. The AR(1) structure is only indicative, however, and we look at the unrestricted ADL model
    y_t = α_0 y_{t-1} + x_t'α_1 + x_{t-1}'α_2 + η_t.
The finding of first-order autocorrelation is only used to extend the list of regressors; the common factor restriction is removed.

(3) Omitted Variables

Omitted variables can in general also produce autocorrelation. Let the DGP be
    y_t = x_1t'β_1 + x_2t'β_2 + ε_t,   (*)
and consider the estimation model
    y_t = x_1t'β_1 + u_t.   (**)
Then the error term is u_t = x_2t'β_2 + ε_t, which is autocorrelated if x_2t is persistent.
An example is a DGP that exhibits a level shift, e.g., (*) includes the dummy variable
    x_2t = 0 for t < T_0,   x_2t = 1 for t ≥ T_0.
If x_2t is not included in (**), the residuals will be systematic. Again the solution is to extend the list of regressors, x_t.

Simulation Illustration: Omitted Level-Shift
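The mechanism behind this illustration is easy to reproduce: when the DGP contains a level shift and we regress on a constant only, the residuals inherit the shift and are strongly autocorrelated even though the true errors are IID. The shift size, break date, and sample length below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T, T0 = 200, 100
shift = np.where(np.arange(T) >= T0, 1.0, 0.0)    # the level-shift dummy x_2t
y = 1.0 + 5.0 * shift + rng.standard_normal(T)    # DGP: y_t = 1 + 5*x_2t + eps_t, eps_t IID

# Estimation model omits the dummy: regress y_t on a constant only.
u = y - y.mean()                                  # residuals u_t contain the omitted shift
r1 = (u[:-1] @ u[1:]) / (u @ u)                   # first-order residual autocorrelation
```

Here r1 is large (well above 0.5), so a no-autocorrelation test rejects and correctly signals misspecification; including the dummy removes the problem.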

(4) Misspecified Functional Form

If the true relationship y_t = g(x_t) + ε_t is non-linear, the residuals from a linear regression will typically be autocorrelated. The solution is to reformulate the functional form of the regression line.

4. Model Formulation and Misspecification Testing

Model Formulation and Properties of the Estimator

We consider the linear regression model for time series,
    y_t = x_t'β + ε_t.   (*)
If there is no perfect multicollinearity in x_t, we can always compute the OLS estimator β̂ and consider the estimated model,
    y_t = x_t'β̂ + ε̂_t,   for the sample t = 1, 2, ..., T.   (**)
Under specific assumptions we can derive properties of the estimator β̂:
1. Consistency: β̂ → β as T → ∞.
2. Unbiasedness: E[β̂ | x_1, x_2, ..., x_T] = β.
3. Asymptotic distribution: β̂ approximately N(β, σ²(Σ_{t=1}^T x_t x_t')^{-1}).
However, if (*) does not satisfy the specific assumptions, we cannot use the results in (1)-(3) to interpret the estimated coefficients or to do statistical inference (e.g. test the hypothesis H_0: β_i = 0).

Misspecification Testing

We can never show that the model is well specified, but we can certainly find out if it is not in a specific direction: misspecification testing.
We consider the following standard tests on the estimated errors, ε̂_t:
1. Test for no autocorrelation.
2. Test for no heteroskedasticity.
3. Test for normality of the error term.
If all tests are passed, we may think of the model as representing the main features of the data, and we can use the results in (1)-(3), e.g. for testing hypotheses on the estimated parameters.

Socrative Question 6

For the estimated model for consumption, we get the following output for the test for no autocorrelation of order 1-2:

    Error autocorrelation coefficients in auxiliary regression:
              Coefficient   Std.Error   t-value
    Lag 1     0.18842       0.1305      1.444
    Lag 2     0.092372      0.103       0.8972
    RSS = 0.0304648   sigma = 0.000210102
    Testing for error autocorrelation from lags 1 to 2
    Chi^2(2) = 5.7742 [0.0557] and F form F(2,145) = 2.8826 [0.0592]

Q: What is your conclusion to the LM test for no autocorrelation of order 1-2?
(A) We cannot reject the null of autocorrelation.
(B) We cannot reject the null of no autocorrelation.
(C) We reject the null of autocorrelation.
(D) We reject the null of no autocorrelation.
(E) Don't know.

Please go to www.socrative.com, click Student login, and enter room id Econometrics2.

Socrative Question 7

We estimate the model for the (log) first-difference in consumption,
    Δc_t = δ + γ_1 Δy_t + γ_2 Δw_t + ε_t,   ε_t ~ IID(0, σ²),
by method of moments using quarterly data for 1973(1)-2010(4). We get the following results for the test for no autocorrelation of order 1-5:

    Error autocorrelation coefficients in auxiliary regression:
              Coefficient   Std.Error   t-value
    Lag 1     0.31365       0.08351     3.756
    Lag 2     0.0036568     0.08552     0.04276
    Lag 3     0.024997      0.08545     0.2925
    Lag 4     0.14346       0.08624     1.663
    Lag 5     0.17008       0.08378     2.03
    RSS = 0.0326929   sigma = 0.000228622
    Testing for error autocorrelation from lags 1 to 5
    Chi^2(5) = 18.574 [0.0023]** and F form F(5,143) = 4.0113 [0.0020]**

Q: What are the properties of the MM estimator of θ = (δ, γ_1, γ_2)'?
(A) Consistent and asymptotically normally distributed.
(B) Inconsistent.
(C) Consistent, but not asymptotically normally distributed.
(D) Consistent and asymptotically normally distributed, but the standard asymptotic variance (Result 4) is wrong.
(E) Don't know.


Statistical Testing

1. Null hypothesis:
2. Test statistic:
3. Asymptotic distribution of the test statistic under the null:

Socrative Question 8

Assume that we get the following output for the test for heteroskedasticity:

    Heteroscedasticity coefficients:
              Coefficient    Std.Error   t-value
    dy        0.0010634      0.0017557   0.60567
    dw        -0.00019348    0.0023341   -0.082893
    ECM_1     0.0011928      0.0018455   0.64629
    dy^2      0.10949        0.038701    2.8291
    dw^2      -0.029320      0.075593    -0.38787
    ECM_1^2   0.026705       0.057277    0.46624
    RSS = 2.92449e-005   sigma = 0.000450655   effective no. of parameters = 7
    Regression in deviation from mean
    Testing for heteroscedasticity using squares
    Chi^2(6) = 13.418 [0.0369]* and F-form F(6,144) = 2.3407 [0.0346]*

Q: Which of the following statements is correct about the MM estimator?
(A) β̂ is inconsistent.
(B) β̂ is inconsistent, but √T(β̂ − β) is asymptotically normal.
(C) β̂ is consistent, but √T(β̂ − β) is not asymptotically normal.
(D) β̂ is consistent and √T(β̂ − β) is asymptotically normal.
(E) Don't know.

Socrative Question 9

Q: Why do we care about normality of the error term?
(A) If ε_t ~ NIID(0, σ²), then the MM estimator converges faster to the asymptotic normal distribution and it is efficient.
(B) The MM estimator is inconsistent if ε_t is not normally distributed and independent over time.
(C) The t-test of the null that β_i = 0, t_{β_i = 0} = β̂_i / s.e.(β̂_i), is only asymptotically N(0, 1) if ε_t ~ NIID(0, σ²).
(D) The MM estimator is not asymptotically normally distributed unless ε_t is normally distributed.
(E) Don't know.

Socrative Question 10

For the estimated model for consumption, we get the misspecification tests:

    AR 1-5 test:     F(5,142) = 1.9539 [0.0891]
    Normality test:  Chi^2(2) = 36.929 [0.0000]**
    Hetero test:     F(6,144) = 2.3407 [0.0346]*

Q: What do you conclude regarding the misspecification tests?
(A) The estimated residuals are not autocorrelated, homoskedastic, and normal.
(B) The estimated residuals are homoskedastic and normal, but autocorrelated.
(C) The estimated residuals are not autocorrelated, but heteroskedastic and non-normal.
(D) The estimated residuals are autocorrelated, heteroskedastic, and non-normal.
(E) Don't know.

(1) Tests for No Autocorrelation

Let ε̂_t (t = 1, 2, ..., T) be the estimated residuals from the original regression model y_t = x_t'β + ε_t.
A test for no autocorrelation (of first order) is based on the hypothesis γ = 0 in the auxiliary regression
    ε̂_t = x_t'δ + γ ε̂_{t-1} + u_t,
where x_t is included because it may be correlated with ε̂_{t-1}. A valid test is the t-ratio for γ = 0. Alternatively, there is the Breusch-Godfrey LM test,
    LM = T · R² ~ χ²(1).
Note that x_t and ε̂_t are orthogonal, so any explanatory power in the auxiliary regression is due to ε̂_{t-1}.
The Durbin-Watson (DW) test is derived for finite samples, but it is based on strict exogeneity and is therefore not valid in many models.
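A minimal first-order Breusch-Godfrey test along these lines can be sketched as follows. The simulated example, with an AR(1) error coefficient of 0.8, is an assumption for illustration:

```python
import numpy as np

def breusch_godfrey(y, X):
    """First-order Breusch-Godfrey LM test: regress residuals on (x_t, e_{t-1})."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]    # OLS residuals
    Z = np.column_stack([X[1:], e[:-1]])                # auxiliary regressors
    u = e[1:] - Z @ np.linalg.lstsq(Z, e[1:], rcond=None)[0]
    tss = (e[1:] - e[1:].mean()) @ (e[1:] - e[1:].mean())
    r2 = 1.0 - (u @ u) / tss
    return len(e[1:]) * r2                              # LM = T*R^2, asympt. chi^2(1)

rng = np.random.default_rng(2)
T = 500
x = rng.standard_normal(T)
X = np.column_stack([np.ones(T), x])

eps_ar = np.zeros(T)
for t in range(1, T):
    eps_ar[t] = 0.8 * eps_ar[t - 1] + rng.standard_normal()
lm_ar = breusch_godfrey(1.0 + 2.0 * x + eps_ar, X)                    # AR(1) errors
lm_iid = breusch_godfrey(1.0 + 2.0 * x + rng.standard_normal(T), X)   # IID errors
```

With autocorrelated errors the statistic far exceeds the 5% χ²(1) critical value of 3.84; with IID errors it is typically small.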

(2) Test for No Heteroskedasticity

Consider the auxiliary regression
    ε̂_t² = γ_0 + x_1t γ_1 + ... + x_kt γ_k + x_1t² δ_1 + ... + x_kt² δ_k + u_t.
A test for unconditional homoskedasticity is based on the null hypothesis
    γ_1 = ... = γ_k = δ_1 = ... = δ_k = 0.
The alternative is that the variance of ε_t depends on x_it or the squares x_it² for some i = 1, 2, ..., k.
The LM test is LM = T · R² ~ χ²(2k).
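This test can be sketched the same way: regress the squared residuals on the regressors and their squares and compute T · R². The simulated DGP with error variance 1 + x_t² is an assumption for illustration:

```python
import numpy as np

def hetero_lm(y, X):
    """LM test for no heteroskedasticity: regress squared residuals on x_t and x_t^2.
    Assumes the first column of X is a constant."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]    # OLS residuals
    Z = np.column_stack([X, X[:, 1:] ** 2])             # constant, levels, and squares
    e2 = e ** 2
    u = e2 - Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
    r2 = 1.0 - (u @ u) / ((e2 - e2.mean()) @ (e2 - e2.mean()))
    return len(e) * r2                                  # asympt. chi^2(2k) under the null

rng = np.random.default_rng(3)
T = 500
x = rng.standard_normal(T)
X = np.column_stack([np.ones(T), x])
lm_hom = hetero_lm(1.0 + 2.0 * x + rng.standard_normal(T), X)
lm_het = hetero_lm(1.0 + 2.0 * x + np.sqrt(1 + x**2) * rng.standard_normal(T), X)
```

When the error variance depends on x_t² the statistic is far above the χ²(2) critical values; under homoskedasticity it stays small.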

(3) Test for Normality of the Error Term

Skewness (S) and kurtosis (K) are the estimated third and fourth central moments of the standardized estimated residuals u_t = (ε̂_t − ε̄)/σ̂:
    S = T^{-1} Σ_{t=1}^T u_t³   and   K = T^{-1} Σ_{t=1}^T u_t⁴.
The normal distribution is symmetric and has S = 0, while K = 3 (K − 3 is often referred to as excess kurtosis). If K > 3, the distribution has fat tails.
Under the assumption of normality,
    ξ_S = (T/6) S² ~ χ²(1)
    ξ_K = (T/24) (K − 3)² ~ χ²(1).
It turns out that ξ_S and ξ_K are independent, which gives the Jarque-Bera joint test of normality:
    ξ_JB = ξ_S + ξ_K ~ χ²(2).
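The Jarque-Bera statistic is a few lines of code. As an assumed stand-in for non-normal residuals, the example below compares a normal sample with an exponential one (skewed and fat-tailed):

```python
import numpy as np

def jarque_bera(e):
    """Jarque-Bera statistic xi_JB = (T/6)*S^2 + (T/24)*(K-3)^2 for residuals e."""
    T = len(e)
    u = (e - e.mean()) / e.std()    # standardized residuals
    S = np.mean(u ** 3)             # skewness
    K = np.mean(u ** 4)             # kurtosis
    return T / 6 * S**2 + T / 24 * (K - 3)**2

rng = np.random.default_rng(4)
jb_norm = jarque_bera(rng.standard_normal(1000))    # normal sample: small statistic
jb_exp = jarque_bera(rng.exponential(size=1000))    # skewed sample: huge statistic
```

The exponential sample has S ≈ 2 and K ≈ 9, so ξ_JB is orders of magnitude above the χ²(2) critical values, while the normal sample is not rejected.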

Notes on Normality of the Error Term

Often normality of the error terms is rejected due to the presence of a few large outliers that the model cannot account for (i.e. they are captured by the error term).
Start with a plot and a histogram of the (standardized) estimated residuals. Are there any large outliers of, say, more than three standard deviations?
A big residual at time T_0 may be accounted for by including a dummy variable (x_t = 1 for t = T_0, x_t = 0 for all other t).
Note that the results for the linear regression model hold without assuming normality of ε_t. However, the normal distribution is a natural benchmark, and given normality:
- β̂ converges faster to the asymptotic normal distribution.
- The OLS estimator coincides with the maximum likelihood (ML) estimator.

Model Formulation and Misspecification Testing in Practice

Finding a well-specified model for the data of interest can be hard in practice. Potential reasons include:
- Which variables to include in x_t? Economic theory can often guide us.
- How many lags to include to obtain a dynamically complete model?
- Are the model parameters constant over the sample, or do we need to include structural breaks?
- Do we need to include dummy variables to capture large outliers?
Think about which model and which formulation is useful to represent the characteristics of the data.
In practice we use an iterative procedure:
1. Estimate the model for a specific formulation.
2. Do misspecification tests on the estimated errors.
3. If the misspecification tests reject, re-formulate the model. Repeat (1) and (2) until the misspecification tests are not rejected.
General-to-Specific: start with a larger model and simplify it by removing insignificant variables.

5. The Frisch-Waugh-Lovell Theorem

Motivation

Consider the linear regression model,
    y_t = μ + β_1 x_t + ε_t,   t = 1, 2, ..., T,   (3)
and the linear regression model for the de-meaned variables ỹ_t = y_t − ȳ and x̃_t = x_t − x̄, given by
    ỹ_t = b_1 x̃_t + u_t,   t = 1, 2, ..., T.   (4)
Question: Would you expect β̂_1 to equal b̂_1?

Motivation: The Frisch-Waugh-Lovell Theorem

Consider the linear regression model,
    y_t = μ + β_1 x_t + ε_t,   t = 1, 2, ..., T,   (5)
and the linear regression model for the de-meaned variables ỹ_t = y_t − ȳ and x̃_t = x_t − x̄, given by
    ỹ_t = b_1 x̃_t + u_t,   t = 1, 2, ..., T.   (6)
Question: Would you expect β̂_1 to equal b̂_1?
[Figure: (a) scatterplot of y_t (vertical axis) against x_t (horizontal axis); (b) scatterplot of ỹ_t against x̃_t.]
The Frisch-Waugh-Lovell theorem tells us that β̂_1 = b̂_1.

Model

Consider the more general model with x_1t (K_1 × 1) and x_2t (K_2 × 1),
    y_t = x_1t'β_1 + x_2t'β_2 + ε_t,   t = 1, ..., T.
The Frisch-Waugh-Lovell theorem tells us that we can obtain the OLS estimate of β_1 in two ways:
1. (The usual way) Regress y_t on (x_1t, x_2t) and obtain β̂_1.
2. (A three-step procedure) Regress y_t on x_2t and obtain the residuals, y_t*. Next, regress x_1t on x_2t and obtain the residuals, x_1t*. Lastly, regress y_t* on x_1t* and obtain β̂_1.
The two estimates are numerically identical. A formal proof will be posted on Absalon.
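The numerical identity is easy to verify directly. The dimensions and coefficient values below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(5)
T, K1, K2 = 100, 2, 3
X1 = rng.standard_normal((T, K1))
X2 = rng.standard_normal((T, K2))
y = X1 @ np.array([1.0, -0.5]) + X2 @ np.array([0.3, 0.3, 0.3]) + rng.standard_normal(T)

def ols(X, y):
    """OLS coefficients (works for vector or matrix left-hand sides)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# 1. The usual way: regress y on (X1, X2) jointly and keep the first K1 coefficients.
beta_full = ols(np.column_stack([X1, X2]), y)[:K1]

# 2. The three-step procedure: partial X2 out of both y and X1, then regress.
y_star = y - X2 @ ols(X2, y)        # residuals from regressing y on X2
X1_star = X1 - X2 @ ols(X2, X1)     # residuals from regressing each column of X1 on X2
beta_fwl = ols(X1_star, y_star)
```

The two estimates agree up to floating-point error, which is exactly the FWL statement; setting X2 to a column of ones reproduces the de-meaning example above.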

Why is it a useful theorem?

In some cases, it allows us to greatly simplify the model:
- De-meaning variables: x_2t = 1.
- De-trending variables: see PS #1.
- Removing fixed effects in a panel data model (Econometrics I).

6. Recap: Linear Regression Model with Time Series Data

Recap: y_t = x_t'β + ε_t

Main assumption, cross-section vs. time series:
- Cross-section: independent and identically distributed observations.
- Time series: weak dependence and stationarity.
These are technical requirements for the application of a LLN and a CLT.

OLS assumptions required for implementation and properties:
(1) No perfect collinearity in the X variables.
(2) Strict exogeneity: E(ε_t | x_1, x_2, ..., x_T) = 0.
(3) Contemporaneous exogeneity (predeterminedness): E(ε_t | x_t) = 0.
(4) Moment conditions: E(x_t ε_t) = 0.
(5) Conditional homoskedasticity: E(ε_t² | x_t) = σ².
(6) No serial correlation: E(ε_t ε_s | x_t, x_s) = 0 for t ≠ s.

Properties:
- Computation, β̂ = (X'X)^{-1}X'y: requires only (1).
- Unbiasedness, E(β̂) = β: requires (1) and strict exogeneity (2).
- Consistency, plim β̂ = β: requires (1) and the moment conditions (4).
- Asymptotic distribution, β̂ approximately normal: requires (1) and (4); conditional homoskedasticity (5) and no serial correlation (6) are needed to obtain the usual OLS variance.

Note: Since (2) E(ε_t | x_1, ..., x_T) = 0 ⇒ (3) E(ε_t | x_t) = 0 ⇒ (4) E(x_t ε_t) = 0, but the reverse implications do not hold, the earlier conditions are stronger and therefore not necessary as long as the weaker assumption required for a given property is fulfilled.