The regression model with one stochastic regressor (part II) 3150/4150 Lecture 7 Ragnar Nymoen 6 Feb 2012
We will finish Lecture topic 4: The regression model with stochastic regressor. We will first look at an application: the Norwegian Phillips curve over a long historical period (separate note): a reminder about the importance of variable transformation for obtaining a conditional expectations function that is linear in parameters. Then we look at a special case of the theory: regression with variables that are jointly normally distributed. It is useful as a reference and for introducing two issues that did not arise in RM1: Why regress y on x and not x on y? And what is the relationship between regression and correlation, and between regression and causality? Finally, we define exogeneity as an econometric concept, and
extend the regression model to time series data. References: see Lecture 6 and the more detailed references that we give below.
Binormal variables I Before we begin: remember that regression does not require normally distributed variables: in fact RM1 already showed that! Assume that we have stochastic variables (y_i, x_i), i = 1, 2, ..., n, that are generated by the following system of linear equations: y_i = µ_y + ε_{y,i} (1), x_i = µ_x + ε_{x,i} (2), where µ_y and µ_x are parameters and ε_{y,i} and ε_{x,i} have a joint normal probability distribution: (ε_{x,i}, ε_{y,i})' ~ N(0, Σ), with Σ = [σ_x², ω_xy; ω_xy, σ_y²] (3)
Binormal variables II ε_{x,i} and ε_{y,i} are therefore bivariate normal with expectation zero and covariance matrix [σ_x², ω_xy; ω_xy, σ_y²]. The correlation coefficient between ε_{x,i} and ε_{y,i} is ρ_xy = ω_xy/(σ_x σ_y). It is the population correlation coefficient. Since linear combinations of normally distributed variables are also normally distributed, it follows that y_i and x_i given by (1), (2) are also normally distributed.
Binormal variables III From the properties of the normal distribution: the distribution of y_i conditional on x_i is also normal, with expectation E[y_i | x_i] = (µ_y − ρ_xy (σ_y/σ_x) µ_x) + ρ_xy (σ_y/σ_x) x_i = β_1 + β_2 x_i (4), where β_1 = µ_y − ρ_xy (σ_y/σ_x) µ_x and β_2 = ρ_xy (σ_y/σ_x). We will not derive this, but if you are interested, see e.g., BN ch. 4.5.6 and 5.7
Binormal variables IV If we define the stochastic variables e_i, i = 1, 2, ..., n, by e_i = y_i − E(y_i | x_i) (5), we see that the regression model y_i = β_1 + β_2 x_i + e_i (6) gives y_i as the sum of the conditional expectations function (4) and the disturbance e_i. This is of course a general characterization of RM2; what we have gained by assuming a bivariate normal is that β_1 and β_2 have been expressed as functions of the underlying population parameters µ_x, µ_y, σ_x², σ_y² and ρ_xy.
Binormal variables V Note that e_i can be written as e_i = µ_y + ε_{y,i} − β_1 − β_2(µ_x + ε_{x,i}) = ε_{y,i} − (ω_xy/σ_x²) ε_{x,i}, which can be used to show: E(e_i) = 0, E(e_i ε_{x,i}) = 0, Var(e_i) ≡ σ² = σ_y²(1 − ρ_xy²) (7), and E(x_i e_i) = 0 for all i. In particular (7) shows that the reduction in unexplained variance of y_i relative to total variance of y_i is due to correlation.
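The results on these slides can be checked by simulation. The sketch below (parameter values are assumed for illustration, not taken from the lecture) draws (x_i, y_i) from the system (1)–(3) and verifies that the OLS slope matches β_2 = ρ_xy σ_y/σ_x and that the residual variance matches σ_y²(1 − ρ_xy²) from (7):

```python
import numpy as np

# Assumed population parameters (illustrative values only)
rng = np.random.default_rng(0)
mu_x, mu_y = 2.0, 5.0
sigma_x, sigma_y, rho = 1.5, 2.0, 0.6
omega_xy = rho * sigma_x * sigma_y            # covariance of the errors
cov = [[sigma_x**2, omega_xy], [omega_xy, sigma_y**2]]
n = 200_000

# Generate the system (1)-(3)
eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)
x = mu_x + eps[:, 0]                          # equation (2)
y = mu_y + eps[:, 1]                          # equation (1)

# OLS slope and intercept via the usual moment formulas
b2 = np.cov(x, y)[0, 1] / np.var(x)
b1 = y.mean() - b2 * x.mean()
resid_var = np.var(y - b1 - b2 * x)

print(b2, rho * sigma_y / sigma_x)            # both ~0.8
print(resid_var, sigma_y**2 * (1 - rho**2))   # both ~2.56
```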
Regression, correlation and causality I The statistical system given by (1), (2) and (3) is mapped into model form: y_i = β_1 + β_2 x_i + e_i (8), x_i = µ_x + ε_{x,i} (9), where (8) is the conditional model of y_i given x_i (what we wish to explain) and (9) is the marginal model of x_i (what we do not try to explain). Note: this does not mean that (8) and (9) prove that x_i is causing y_i!
Regression, correlation and causality II An equally valid model of the statistical system is x_i = γ_1 + γ_2 y_i + ε_i (10), y_i = µ_y + ε_{y,i} (11), where ε_i has similar properties as e_i, but for the case where we model x_i conditionally on y_i. γ_2 can be shown to be γ_2 = ρ_xy (σ_x/σ_y), which is not β_2 and not 1/β_2 either.
Regression, correlation and causality III Note that you have shown in Seminar exercise 1 that the same results hold in the data, i.e., when the population parameters are replaced by empirical moments! So we have two conditional models, one representing x → y and the other y → x. How can we tell which of them represents causation? The general answer is that we cannot assert causality from regression alone; that can only be done with reference to (subject matter) theory!
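A small simulation (parameter values assumed for illustration) makes the point on the previous slides concrete: regressing y on x gives a slope near ρ_xy σ_y/σ_x, regressing x on y gives a slope near ρ_xy σ_x/σ_y, and the latter is not the reciprocal of the former:

```python
import numpy as np

# Illustrative population values (assumed, not from the lecture)
rng = np.random.default_rng(1)
sigma_x, sigma_y, rho = 1.5, 2.0, 0.6
omega = rho * sigma_x * sigma_y
xy = rng.multivariate_normal([0, 0],
                             [[sigma_x**2, omega], [omega, sigma_y**2]],
                             size=200_000)
x, y = xy[:, 0], xy[:, 1]

beta2 = np.cov(x, y)[0, 1] / np.var(x)    # y on x: ~0.8
gamma2 = np.cov(x, y)[0, 1] / np.var(y)   # x on y: ~0.45

print(beta2, gamma2, 1 / beta2)           # gamma2 differs from 1/beta2
```

Note that the product beta2 * gamma2 equals ρ_xy², the squared correlation, which is symmetric in x and y; the two regressions only coincide (up to inversion) when the correlation is perfect.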
Regression, correlation and causality IV Recall the picture of econometrics as a combined discipline that we began with! Looking ahead to intermediate and advanced courses: with both cross section data and time series data we often have access to natural experiments that can make it possible to substantiate a causal interpretation.
Exogeneity defined I Part of the specification of RM2 was that E(e_i | x_h) = 0 for all i and h (12), which implies that the disturbance e_i is uncorrelated with all x_h variables: cov(e_i, x_h) = 0 for all i and h (13). We showed that, because of conditioning, we have for h = i that cov(e_i, x_i) = 0 (14) is an inherent property of the model. It always holds. However: (13) is a more general statement than (14).
Exogeneity defined II It is therefore customary to include (13) as an assumption in the model specification. This assumption is called the assumption of an exogenous explanatory variable, cf. HGL p. 402 and BN. We will now look at two examples where exogeneity fails, but with different consequences for the OLS estimators.
The measurement error model Measurement error in the regressor I HGL Ch 10.2, BN ch. 6.3. Assume that the parameter of interest is the relationship between an observable variable y_i and an unobservable variable x*_i (permanent income is the example in HGL): y_i = β_1 + β_2 x*_i + v_i. With the same assumptions as for RM2, but using the symbols x*_i and v_i in place of x_i and e_i, this can be formulated as a regression model. However, that model would be irrelevant for practice since x*_i is unobservable.
The measurement error model Measurement error in the regressor II To formulate a model in observables we extend the list of assumptions with x_i = x*_i + u_i, where u_i is a random measurement error that is uncorrelated with both v_i and x*_i. It is tempting to say that y_i = β_1 + β_2 x_i + e_i (15) is a valid regression model.
The measurement error model Measurement error in the regressor III However, since e_i in this case must be e_i = v_i − β_2 u_i, then cov(e_i, x_i) = −β_2 var(u_i) ≠ 0 (16), showing that x_i cannot be regarded as exogenous in (15). If we estimate (15) by OLS, what do we get in terms of properties? We will only motivate an answer, since a precise answer will use probability limits that will be explained under Topic 6.
The measurement error model Measurement error in the regressor IV As always, the OLS estimator for β_2 can be written as β̂_2 = Σ_{i=1}^n (x_i − x̄) y_i / Σ_{i=1}^n (x_i − x̄)² = β_2 + Σ_{i=1}^n (x_i − x̄) e_i / Σ_{i=1}^n (x_i − x̄)². Unlike in RM2, we cannot show E[Σ_{i=1}^n (x_i − x̄) e_i / Σ_{i=1}^n (x_i − x̄)²] = 0 with the use of conditional expectation, because x_i and e_i contain common stochastic variables.
The measurement error model Measurement error in the regressor V Intuitively, however, we can guess that there is going to be a bias, since Σ_{i=1}^n (x_i − x̄) e_i is an empirical counterpart to cov(e_i, x_i), which is non-zero from the specification of the model. This turns out to be true: in fact, failure of exogeneity of x implies that β̂_2 becomes inconsistent: we do not get exactly the true β_2 even in infinitely large samples. Looking ahead: the method of moments (Topic 10) can be used instead of OLS to obtain a consistent estimator.
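The inconsistency can be seen in a simulation (all parameter values below are assumed for illustration). Even with a very large sample, OLS on the error-ridden regressor x_i = x*_i + u_i does not recover β_2; it settles at the attenuated value β_2 · var(x*)/(var(x*) + var(u)):

```python
import numpy as np

# Assumed illustrative parameters: beta2 = 2, var(x*) = 1, var(u) = 0.25
rng = np.random.default_rng(2)
n = 500_000
beta1, beta2 = 1.0, 2.0
x_star = rng.normal(0, 1, n)        # unobservable regressor x*_i
u = rng.normal(0, 0.5, n)           # measurement error, var(u) = 0.25
v = rng.normal(0, 1, n)             # structural disturbance
x = x_star + u                      # observable regressor
y = beta1 + beta2 * x_star + v

b2_hat = np.cov(x, y)[0, 1] / np.var(x)
plim = beta2 * 1.0 / (1.0 + 0.25)   # attenuation toward zero: 2/1.25 = 1.6

print(b2_hat, plim)                 # both ~1.6, well below beta2 = 2
```

Making n larger only makes β̂_2 cluster more tightly around 1.6, not around 2: this is inconsistency, not just finite-sample bias.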
The measurement error model Measurement error in y If the only departure from RM2 is that we have y*_i = β_1 + β_2 x_i + v_i, where y*_i is unobservable, the consequences are different. As long as the measurement error in y is uncorrelated with x, the model in terms of the observables has the same properties as before. In particular: no bias of the OLS estimator for β_2! Show as a DIY!
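As a companion to the DIY, here is a simulation sketch (parameter values assumed): if y_i = y*_i + w_i with w_i uncorrelated with x_i, the composite disturbance v_i + w_i is still uncorrelated with x_i, so the OLS slope is unaffected; only the disturbance variance grows:

```python
import numpy as np

# Assumed illustrative DGP: beta1 = 1, beta2 = 2
rng = np.random.default_rng(3)
n = 500_000
x = rng.normal(0, 1, n)
v = rng.normal(0, 1, n)             # structural disturbance
w = rng.normal(0, 1, n)             # measurement error in y
y_star = 1.0 + 2.0 * x + v          # unobservable dependent variable
y = y_star + w                      # what we actually observe

b2_hat = np.cov(x, y)[0, 1] / np.var(x)
print(b2_hat)                       # ~2.0: no bias, just a larger sigma^2
```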
The Lucas critique Rational expectations and the Lucas critique I The measurement error model can be used to explain the famous Lucas critique in macroeconomics. Let x*_t represent the expected value of x_t. Under the hypothesis of adaptive expectations the OLS estimator of β_2 remains consistent. But under the assumption of rational expectations, u_t in x_t = x*_t + u_t represents a random expectations error. The result is that OLS gives an inconsistent estimator of the structural parameter β_2.
The Lucas critique Rational expectations and the Lucas critique II Inconsistent because the OLS estimator is contaminated by parameters of the expectations formation process. Moreover: since expectations change when policy changes, the OLS estimator β̂_2 is subject to structural breaks: it will change when policy changes and will be an unreliable guide to judge the effects of economic policies. Looking ahead: later courses discuss both the theory and the relevance of the Lucas critique (it can in fact be tested!). If interested: BN 5.12 is relatively detailed compared to other introductory books.
Models for time series data I For time series data we use t as a subscript for the stochastic variables/observations. It is also customary to replace n by T. If we formulate a static model y_t = β_1 + β_2 x_t + e_t (17) for time series data, the specification of RM2 will in essence be unchanged, with e.g., assumption d. written as cov(e_t, e_{t±s} | x_t) = 0 for s ≠ 0, which is called the assumption of no autocorrelation in the disturbances.
Models for time series data II For the static model (17) the hypothesis of no autocorrelation often fails. This regularly shows up in the OLS residuals ê_t from (17), which are usually highly correlated with ê_{t−1} (and often with older residuals as well). The explanation is that time series variables are typically serially correlated: y_t is usually highly correlated with y_{t−1}, and x_t is correlated with x_{t−1}. Therefore the independent sampling assumption of RM2 is irrelevant for the case of time series data.
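A simulation sketch of this point (the data-generating process below is assumed, not from the lecture): when y_t and x_t are both persistent series, the residuals from the static regression (17) inherit strong first-order autocorrelation:

```python
import numpy as np

# Assumed DGP: persistent AR(1) regressor and AR(1) disturbance
rng = np.random.default_rng(5)
T = 5_000
x = np.zeros(T)
e = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal()   # serially correlated x_t
    e[t] = 0.8 * e[t - 1] + rng.normal()   # autocorrelated disturbance
y = 1.0 + 0.5 * x + e

# Static OLS regression (17) and its residual autocorrelation
b2 = np.cov(x, y)[0, 1] / np.var(x)
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x
r1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print(r1)                                  # strong positive autocorrelation
```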
A simple dynamic model I The solution of the problem with autocorrelation is either to correct the OLS estimators, or to represent the serial correlation of y_t and x_t in the conditional expectation (dynamic econometric models). The simplest example of a dynamic model is y_t = β_1 + β_2 y_{t−1} + e_t, with −1 < β_2 < 1 (18), where the explanatory variable replacing x_t is the history of the y variable. This type of equation is called an autoregressive model of order one (AR(1)). It is a linear stochastic difference equation.
A simple dynamic model II In terms of properties of estimators: how close does this model come to RM2? The answer is: so close that it can be seen as a variant of RM2. To complete the specification of the dynamic regression model, we can define the conditional expectation function and the disturbance properties: E(y_t | y_{t−1}) = β_1 + β_2 y_{t−1}, E(e_t | y_{t−1}) = 0, var(e_t | y_{t−1}) = σ², cov(e_t, e_{t±s} | y_{t−1}) = 0
A simple dynamic model III What can we say about cov(e_{t±s}, y_{t−1}) in this model? For s = 0, we have from E(e_t | y_{t−1}) = 0 that cov(e_t, y_{t−1}) = 0, but as we know, exogeneity requires that y_{t−1} is uncorrelated with all disturbances, both past and future. The mathematical solution for y_t in (18) is found by repeated substitution of y_{t−1}, y_{t−2} and so on back to infinity: y_t = β_1 Σ_{i=0}^∞ β_2^i + Σ_{i=0}^∞ β_2^i e_{t−i} (19). (19) shows that
A simple dynamic model IV y_{t−1} is uncorrelated with e_t and all future disturbances, but y_{t−1} is correlated with e_{t−1} and all other past disturbances. Hence y_{t−1} is not exogenous in (18), but y_{t−1} is not completely endogenous either. We have an intermediate case between exogeneity and endogeneity, and we say that y_{t−1} is a pre-determined variable in (18). In the case of a pre-determined explanatory variable the OLS estimators β̂_2 and β̂_1 are consistent, but with finite sample biases that are due to the correlation between y_{t−1} and past disturbances.
A simple dynamic model V As an example, we have that E(β̂_2 − β_2) ≈ −2β_2/T for the simplest case with β_1 = 0 (no drift).
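A Monte Carlo sketch of this approximation (sample size, replication count and starting scheme are choices made here, not taken from the lecture): simulate the AR(1) model (18) with β_1 = 0 many times and compare the average OLS estimate with β_2 − 2β_2/T:

```python
import numpy as np

# Assumed settings: beta2 = 0.5, T = 25, 10,000 replications
rng = np.random.default_rng(4)
beta2, T, reps = 0.5, 25, 10_000
burn = 50                                     # burn-in so y starts near the
est = np.empty(reps)                          # stationary distribution
for r in range(reps):
    e = rng.normal(0, 1, T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        y[t] = beta2 * y[t - 1] + e[t]        # model (18) with beta1 = 0
    y = y[burn:]
    # OLS without intercept: regress y_t on y_{t-1}
    est[r] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

bias = est.mean() - beta2
print(bias, -2 * beta2 / T)                   # both around -0.04
```

The simulated bias is negative (β̂_2 understates β_2 in short samples) and shrinks as T grows, consistent with the −2β_2/T approximation.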
[Figure: plot of the bias formula for β_2 = 0.5 over T = 1, 2, ..., 100; the bias shrinks in magnitude from about 0.08 toward 0.01 as T grows]
Summary of the regression model I As long as the regressor is deterministic or exogenous, and the classical assumptions about the disturbance properties hold, the regression model gives OLS estimators that are BLUE. In the case of stochastic x, the proof is in terms of conditional and iterated expectations. With normally distributed disturbances, hypothesis tests and confidence intervals can be based on percentiles from the t-distribution. Consistency of estimators also holds. We have only proved that for the case of a deterministic regressor: the theory of probability limits is needed for the case of stochastic x.
Summary of the regression model II Without normally distributed disturbances, the t-test is approximately valid, and the degree of approximation becomes better with larger n. If x is a pre-determined stochastic regressor, there is a (small) bias in the OLS estimator. That bias is decreasing in the sample size. Hence, for typical sample sizes (more than 30 observations) the case of pre-determinedness can be regarded as a variant of RM2: the properties are very similar.