Freeing up the Classical Assumptions. Introductory Econometrics: Topic 5 (1 / 94)



2 The Multiple Regression Model: Freeing Up the Classical Assumptions
Some or all of the classical assumptions were needed for the derivations in the Topic 3 and Topic 4 slides.
Derivation of the OLS estimator itself required only the assumption of a linear relationship between Y and X.
But showing that the OLS estimator has desirable properties did require assumptions.
The proof of the Gauss-Markov theorem required all classical assumptions except Normality of the errors.
Derivation of confidence intervals and hypothesis testing procedures required all classical assumptions.
But what if some or all of the classical assumptions are false? This set of slides (based on Chapter 5 of the textbook) shows how to relax them.

3 Begin by discussing some general theory, before considering some special cases. Two general categories:
Problems which call for use of the Generalized Least Squares (GLS) estimator. Heteroskedasticity and autocorrelated errors will be discussed in this category.
Problems which call for use of the so-called Instrumental Variables (IV) estimator.

4 Basic Theoretical Results
Previously derived theoretical results using the multiple regression model with the classical assumptions:
Y_i = α + β_1 X_1i + ... + β_k X_ki + ε_i
1. E(ε_i) = 0: mean zero errors.
2. var(ε_i) = E(ε_i²) = σ²: constant variance errors (homoskedasticity).
3. cov(ε_i, ε_j) = 0 for i ≠ j.
4. ε_i is Normally distributed.
5. Explanatory variables are fixed. They are not random variables.

5 Remember: Assumption 1 is innocuous (if the error had a non-zero mean, we could include it as part of the intercept; it would have no effect on estimation of the slope coefficients in the model).
Assumption 4 can be relaxed (approximately) by using asymptotic theory (not discussed in this course, but see the Appendix to Chapter 3 if you are interested).
Assumption 5 we will still maintain (we will discuss this later in the context of "instrumental variables" estimation).
For now, focus on Assumptions 2 and 3.

6 Heteroskedasticity relates to Assumption 2. Autocorrelation (also called serial correlation) relates to Assumption 3.
Basic ideas: under the classical assumptions, the Gauss-Markov theorem says "OLS is BLUE". But if Assumption 2 or 3 is violated, this no longer holds (OLS is still unbiased, but is no longer "best", i.e. no longer minimum variance).
Concepts/proofs/derivations often use the following strategy: the model is transformed to create a new model which does satisfy the classical assumptions. We know OLS (on the transformed model) will be BLUE, and all the theory we worked out for the OLS estimator will hold, except that it holds for the transformed model.
The OLS estimator applied to such a transformed model is called the Generalized Least Squares (GLS) estimator.

7 Heteroskedasticity
Heteroskedasticity occurs when the error variance differs across observations.
Example (house prices): errors reflect mis-pricing of houses. If a house sells for much more (or less) than comparable houses, it will have a big error. The error variance measures the dispersion of the errors, i.e. the degree of possible mis-pricing of houses.
Suppose small houses are all very similar; then large pricing errors are unlikely. Suppose big houses can be very different from one another; then large pricing errors are possible.
If this story is true, we expect the errors for big houses to have a larger variance than those for small houses.
Statistically: heteroskedasticity is present and is associated with the size of the house.

8 Assumption 2 is replaced by var(ε_i) = σ²ω_i² for i = 1, ..., N.
Note: ω_i varies with i, so the error variance differs across observations.
Begin with some theoretical results assuming ω_i² is known.
What are the properties of the OLS estimator if heteroskedasticity is present?

9 To make derivations easier, go back to the simple regression model:
Y_i = βX_i + ε_i
Classical assumptions hold, except for Assumption 2: we now have heteroskedasticity.
Remember that the OLS estimator can be written in various ways:
β̂ = Σ X_i Y_i / Σ X_i² = β + Σ X_i ε_i / Σ X_i²
Under the classical assumptions, we proved:
β̂ is N(β, σ² / Σ X_i²)
We used this to derive confidence intervals and hypothesis testing procedures.

10 Under heteroskedasticity, most of our previous derivations still work. The error variance did not appear in the proof that OLS is unbiased, nor in showing it has a Normal distribution. Hence, we will not repeat the derivations here but simply state the following results:
Under the present assumptions (i.e. allowing for heteroskedasticity), OLS is still unbiased (i.e. E(β̂) = β) and it is Normally distributed.
New result: under the present assumptions,
var(β̂) = σ² Σ X_i² ω_i² / (Σ X_i²)²

11 Proof (using properties of the variance operator):
var(β̂) = var(β + Σ X_i ε_i / Σ X_i²)
= var(Σ X_i ε_i / Σ X_i²)
= (1 / (Σ X_i²)²) var(Σ X_i ε_i)
= (1 / (Σ X_i²)²) Σ X_i² var(ε_i)
= σ² Σ X_i² ω_i² / (Σ X_i²)²
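As a sanity check on this variance formula, a small Monte Carlo sketch (Python with NumPy; the design X on [1, 5] and the choice ω_i² = X_i² are illustrative assumptions, not from the slides) compares the simulated variance of the OLS slope with σ² Σ X_i² ω_i² / (Σ X_i²)²:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, reps = 50, 1.0, 20000
X = np.linspace(1.0, 5.0, N)      # fixed regressors (no intercept, as on the slide)
omega2 = X**2                     # assumed pattern: error variance grows with X^2
beta = 2.0

# theoretical variance of OLS under heteroskedasticity
var_theory = sigma2 * np.sum(X**2 * omega2) / np.sum(X**2)**2

# Monte Carlo: draw heteroskedastic errors, re-estimate beta by OLS each time
eps = rng.normal(size=(reps, N)) * np.sqrt(sigma2 * omega2)
Y = beta * X + eps
draws = (Y @ X) / np.sum(X**2)    # OLS slope in the no-intercept model, per replication
var_mc = draws.var()
```

The empirical variance of the 20,000 OLS slopes should land close to the formula, while the mean of the draws stays at β (OLS is still unbiased).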

12 Key theoretical point: if heteroskedasticity is present, the variance of the OLS estimator is different from what it was under the classical assumptions. But var(β̂) enters the formula for confidence intervals and test statistics.
Key points for empirical practice: if heteroskedasticity is present and you ignore it, simply using the OLS estimator in a software package, the software package will use the incorrect formula for var(β̂). It will use the formula which holds under the classical assumptions, where it should be using var(β̂) = σ² Σ X_i² ω_i² / (Σ X_i²)².
Confidence intervals and test statistics will be incorrect.

13 Summary: OLS is still unbiased if heteroskedasticity is present (so as an estimate it may be okay).
But almost everything else (confidence intervals, hypothesis tests, etc.) will be incorrect.
The only case where using OLS is acceptable is if you make sure the computer is using the correct formula, var(β̂) = σ² Σ X_i² ω_i² / (Σ X_i²)².
We will return to this point later in our discussion of the heteroskedasticity consistent estimator (to be defined later).

14 The Generalized Least Squares Estimator under Heteroskedasticity
Idea: transform the model to create a new model which does obey the classical assumptions.
The original regression model is:
Y_i = βX_i + ε_i (1)
Consider a transformed model where we divide both sides by ω_i:
Y_i/ω_i = β(X_i/ω_i) + ε_i/ω_i
or (to make the notation compact):
Y_i* = βX_i* + ε_i* (2)
We can prove that the transformed model given in (2) satisfies the classical assumptions.

15 Since ε_i has mean zero, so will ε_i*.
The key thing to verify is that var(ε_i*) is constant:
var(ε_i*) = var(ε_i/ω_i) = (1/ω_i²) var(ε_i) = σ²ω_i²/ω_i² = σ²
So the error variances in (2) are constant.
Important point: the transformed model in (2) satisfies the classical assumptions. Hence, all our OLS results (applied to the transformed model) can be used: OLS on the transformed model is BLUE, OLS confidence intervals (using the transformed data) are correct, etc.
We don't have to re-do all the proofs of Topic 3 (just plug in Y_i*, X_i*, ε_i* instead of Y_i, X_i, ε_i).

16 The Generalized Least Squares Estimator
The previous reasoning suggests OLS on the transformed model provides a good estimator:
β̂_GLS = Σ X_i* Y_i* / Σ X_i*²
In terms of the original data this is:
β̂_GLS = Σ (X_i Y_i / ω_i²) / Σ (X_i² / ω_i²)
This is called the Generalized Least Squares (GLS) estimator. I have written "GLS" as a subscript on it to make explicit that it is not the same as OLS on the original model.
Note: I am still working with the simple regression model, but the extension to multiple regression is immediate. Simply divide every explanatory variable (and the dependent variable) by ω_i and then do OLS on the transformed model.
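The equivalence between the two formulas above can be checked numerically. In this sketch (simulated data; the choice ω_i = X_i is an illustrative assumption) the weighted-sum formula and OLS on the transformed data give identical results:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = rng.uniform(1.0, 5.0, N)
omega = X                          # assumed: error std proportional to X
eps = rng.normal(0.0, 1.0, N) * omega
beta = 2.0
Y = beta * X + eps

# GLS written in terms of the original data
beta_gls = np.sum(X * Y / omega**2) / np.sum(X**2 / omega**2)

# equivalently: OLS on the transformed model Y/omega = beta*(X/omega) + eps/omega
Ys, Xs = Y / omega, X / omega
beta_ols_transformed = np.sum(Xs * Ys) / np.sum(Xs**2)
```

Both lines compute the same number, and with 200 observations the estimate sits close to the true β = 2.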

17 Properties of the GLS estimator (under heteroskedasticity)
Since GLS is equivalent to OLS on the transformed model, we can use the OLS results from the Topic 3 slides (and apply them to the transformed model). That is, plug in X_i* and Y_i* instead of X_i and Y_i in all our old formulae.
Since the transformed model satisfies the classical assumptions, we can draw on old results to say:
β̂_GLS is N(β, σ² / Σ X_i*²)
Thus, (under heteroskedasticity) GLS is unbiased with
var(β̂_GLS) = σ² / Σ X_i*² = σ² / Σ (X_i² / ω_i²)

18 The Gauss-Markov theorem tells us that, under the classical assumptions, OLS is BLUE.
Here β̂_GLS is equivalent to OLS estimation of a transformed model which does satisfy the classical assumptions. Hence, under heteroskedasticity, it follows immediately that β̂_GLS is BLUE.
An implication of this is that:
var(β̂_GLS) ≤ var(β̂_OLS)
Thus, GLS is a better estimator than OLS. Both are unbiased, but GLS has a smaller variance (it is more efficient).

19 The fact that β̂_GLS is N(β, σ² / Σ X_i*²) can be used to derive confidence intervals and hypothesis tests exactly as before. We will not repeat this material. The formulae are the same as before, except with X_i* and Y_i* instead of X_i and Y_i.

20 Heteroskedasticity: Estimation if Error Variances are Unknown
The derivations above assumed that ω_i² is known. In practice, it will usually be the case that ω_i² is unknown. How to proceed?
Either figure out what ω_i² is, or replace ω_i² by an estimate. Alternatively, a heteroskedasticity consistent estimator (HCE) can be used.
Digression: consistency is an asymptotic concept (asymptotic derivations are not done in this course).
Intuition 1: consistency has some similarities to unbiasedness.
Intuition 2: a consistent estimator is one which, as the sample size goes to infinity, goes to the true value.

21 Fixing up a Heteroskedasticity Problem by Logging
In some cases, a log-linear regression will be homoskedastic even if the linear regression is heteroskedastic.
Note: if variables have values which are zero or negative, you cannot log them.
But even if you log only some of your variables (or even only the dependent variable), it is sometimes enough to fix up a heteroskedasticity problem.
Remember: be careful with the interpretation of coefficients when you log variables (see Topic 4 slides).
Heteroskedasticity tests (see below) can be used to see whether logging fixes up a heteroskedasticity problem.
Note: solving a heteroskedasticity problem by logging is not called GLS.

22 Doing GLS by Transforming the Model
In many cases, the heteroskedasticity can be related to an explanatory variable. Consider the multiple regression model:
Y_i = α + β_1 X_1i + ... + β_k X_ki + ε_i
under the classical assumptions, except that
var(ε_i) = σ²ω_i² = σ²Z_i²
where Z_i is an explanatory variable (usually Z_i will be one of X_2i, ..., X_ki). This captures the idea that "the error variances vary directly with an explanatory variable".

23 If you suspect "the error variances vary inversely with an explanatory variable", you could use:
var(ε_i) = σ² (1/Z_i²)
Note: variances must be positive, which is why I have used Z_i².
An alternative choice is to use the exponential function (e.g. var(ε_i) = σ² exp(Z_i)).

24 Remember: under heteroskedasticity, GLS says we should transform our data as:
Y_i/ω_i = β(X_i/ω_i) + ε_i/ω_i
and then use OLS on the transformed model. But here we have ω_i = Z_i. So divide all your variables by Z_i and then do OLS.
Empirical tip: experiment with different choices for Z (usually, it will be one of X_1, ..., X_k).
Note: you cannot divide by zero and, hence, you cannot use this transformation with a variable which has Z_i = 0 for any observation. In particular, you cannot use this transformation with dummy variables.
If the heteroskedasticity is characterized by var(ε_i) = σ² exp(Z_i), then zero values of Z_i are acceptable.
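The divide-by-Z transformation above can be sketched in a multiple-regression setting with an intercept. This is simulated data under the assumption var(ε_i) = σ²Z_i² with Z chosen as the single regressor X_1; note that the intercept column must also be divided by Z, so it becomes 1/Z_i in the transformed model:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
X1 = rng.uniform(1.0, 10.0, N)
Z = X1                                  # assumed: error variance rises with X1
eps = rng.normal(0.0, 0.5, N) * Z       # var(eps_i) = sigma^2 * Z_i^2
Y = 1.0 + 2.0 * X1 + eps

# transformed model: divide Y, the intercept column and X1 by Z
W = np.column_stack([1.0 / Z, X1 / Z])  # intercept column becomes 1/Z_i
coef, *_ = np.linalg.lstsq(W, Y / Z, rcond=None)
alpha_gls, beta1_gls = coef
```

After the transformation, the coefficient on 1/Z is still α and the coefficient on X_1/Z is still β_1, so the estimates should sit near the true values 1 and 2.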

25 The previous slide has "error variances vary directly with Z".
If the error variances vary inversely with Z, the transformed model becomes:
Y_i Z_i = αZ_i + β_1 X_1i Z_i + ... + β_k X_ki Z_i + ε_i Z_i
The GLS estimator is obtained by multiplying all your variables by Z_i and then doing OLS with these new variables.
Interpretation: in the original regression, β_j is the marginal effect of X_j on Y (ceteris paribus), and this interpretation holds in the transformed model. But now β_j appears as the coefficient on X_j Z (or X_j / Z).

26 What if heteroskedasticity is present, but you cannot relate it to a single variable, Z?
It is desirable to do GLS if you can. That is, if you can find a transformed model which does not suffer from heteroskedasticity (you must check with heteroskedasticity tests).
If you cannot, remember that OLS is still unbiased, so it is an adequate second-best estimator. But the variance formula we derived under the classical assumptions no longer holds. The correct formula is:
var(β̂) = σ² Σ X_i² ω_i² / (Σ X_i²)²
So one thing you can do is use OLS with this correct formula to calculate the variance.

27 Problem: we do not know σ²ω_i². Solution: replace it with an estimate.
Since var(ε_i) = E(ε_i²) = σ²ω_i², we can use the squared OLS residuals, ε̂_i², as estimates of σ²ω_i².
Thus, an estimate of var(β̂) is:
vâr(β̂) = Σ X_i² ε̂_i² / (Σ X_i²)²
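This residual-based variance estimate can be sketched directly (simulated data; the heteroskedasticity pattern, error std proportional to X, is an illustrative assumption). For contrast, the sketch also computes the naive variance formula that assumes homoskedasticity:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
X = rng.uniform(1.0, 5.0, N)
eps = rng.normal(size=N) * X          # assumed: error std proportional to X
Y = 2.0 * X + eps

beta_hat = np.sum(X * Y) / np.sum(X**2)
resid = Y - beta_hat * X

# heteroskedasticity-consistent variance estimate (White/HC0-style form)
var_hce = np.sum(X**2 * resid**2) / np.sum(X**2)**2

# naive variance formula that assumes homoskedastic errors
s2 = np.sum(resid**2) / (N - 1)
var_naive = s2 / np.sum(X**2)
```

With the error variance rising in X, the squared residuals are larger exactly where X² is large, so the naive formula understates var(β̂) relative to the HCE-style estimate.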

28 Summary: use OLS to estimate β, then use vâr(β̂) in the formulae for confidence intervals, etc.
This is an example of a heteroskedasticity consistent estimator (HCE). They can be calculated automatically in Gretl.
Advantages: HCEs are easy to calculate and you do not need to know the form that the heteroskedasticity takes.
Disadvantages: HCEs are not as efficient as the GLS estimator (i.e. they will have a larger variance).

29 Testing for Heteroskedasticity
If heteroskedasticity is NOT present, then OLS is fine (it is BLUE). But if it is present, you should use GLS (or an HCE). Thus, it is important to know whether heteroskedasticity is present.
There are many tests; here I will describe two of the most popular ones: the White test and the Breusch-Pagan test.

30 Suppose you think the error variance might depend on any or all of the variables Z_1, ..., Z_p (which may be the same as the explanatory variables in the regression itself). The Breusch-Pagan test involves the following steps:
Run OLS on the original regression (ignoring heteroskedasticity), obtain the residuals, ε̂_i, and use them to calculate:
σ̂² = Σ ε̂_i² / N
Run a second regression of the equation:
ε̂_i²/σ̂² = γ_0 + γ_1 Z_1i + ... + γ_p Z_pi + v_i
Calculate the Breusch-Pagan test statistic using the regression sum of squares (RSS) from this second regression:
BP = RSS / 2
This test statistic has a χ²(p) distribution, from which a critical value can be obtained.
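The steps above can be sketched end to end on simulated heteroskedastic data (one Z variable, so p = 1; the data-generating process is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 400
Z = rng.uniform(1.0, 5.0, N)
eps = rng.normal(size=N) * Z          # heteroskedastic: error std grows with Z
Y = 1.0 + 2.0 * Z + eps

# step 1: OLS on the original regression, get residuals and sigma^2 estimate
Xmat = np.column_stack([np.ones(N), Z])
b, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
resid = Y - Xmat @ b
sigma2_hat = np.sum(resid**2) / N

# step 2: regress resid^2 / sigma2_hat on an intercept and Z
w = resid**2 / sigma2_hat
g, *_ = np.linalg.lstsq(Xmat, w, rcond=None)
fitted = Xmat @ g
RSS = np.sum((fitted - w.mean())**2)  # regression (explained) sum of squares

# step 3: BP statistic, compared with chi-squared(p) where p = 1 here
BP = RSS / 2.0
```

With heteroskedasticity this strong, BP comfortably exceeds the 5% χ²(1) critical value of 3.84, so the test rejects homoskedasticity.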

31 The White Test for Heteroskedasticity
Similar to the Breusch-Pagan test. The White test involves the following steps:
Run OLS on the original regression (ignoring heteroskedasticity) and obtain the residuals, ε̂_i.
Run a second regression of the equation:
ε̂_i² = γ_0 + γ_1 Z_1i + ... + γ_p Z_pi + v_i
and obtain the R² from this regression.
Calculate the White test statistic:
W = N R²
This test statistic has a χ²(p) distribution, from which a critical value can be obtained.
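A matching sketch of the White test on the same kind of simulated data (again p = 1; adding Z² to the auxiliary regression would be the common extension mentioned below):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 400
Z = rng.uniform(1.0, 5.0, N)
eps = rng.normal(size=N) * Z          # heteroskedastic errors
Y = 1.0 + 2.0 * Z + eps

# original regression and residuals
Xmat = np.column_stack([np.ones(N), Z])
b, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
resid = Y - Xmat @ b

# auxiliary regression of squared residuals on an intercept and Z
e2 = resid**2
g, *_ = np.linalg.lstsq(Xmat, e2, rcond=None)
fitted = Xmat @ g
R2 = 1.0 - np.sum((e2 - fitted)**2) / np.sum((e2 - e2.mean())**2)

W = N * R2                            # compare with chi-squared(p), p = 1 here
```

As with Breusch-Pagan, the statistic exceeds the 5% χ²(1) critical value of 3.84 when heteroskedasticity of this form is present.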

32 An advantage of the White and Breusch-Pagan tests is that you just need to choose Z_1, ..., Z_p. Usually these are just the explanatory variables in the original regression, although with the White test researchers sometimes also include squares and cross-products of the explanatory variables.
A disadvantage is that, if the tests indicate that heteroskedasticity is present, they do not offer much guidance on how you should try to transform the model to do GLS. All you know is that heteroskedasticity is present and is related to one (or several) of the variables Z_1, ..., Z_p.

33 Recommendations for Empirical Practice
If you think you might have a heteroskedasticity problem, begin by doing a heteroskedasticity test.
If your tests indicate heteroskedasticity is present, then do some experimentation to see if you can solve the problem:
Sometimes logging your dependent variable (and some or all of the explanatory variables) will be enough to fix the problem.
Sometimes multiplying/dividing all your explanatory variables by some variable (Z_j) is enough to fix the problem. Try different choices for Z_j.
Note: every time you try such a transformation, you must do a heteroskedasticity test to check whether it has fixed the problem.

34 If you cannot find a transformation which fixes the heteroskedasticity problem, then use an HCE.
Remember: if heteroskedasticity is present, then hypothesis tests involving the βs will be incorrect. So wait until after you have corrected the problem (or are using an HCE) before doing hypothesis testing (e.g. to find out which of your explanatory variables are insignificant).
Remember to be careful with your interpretation of the marginal effects of coefficients:
If logging solves the heteroskedasticity problem, see page 111 of the textbook for the interpretation of logged coefficients.
If multiplying/dividing by Z_j solves the problem, see page 129 for the interpretation of coefficients.

35 Example: Explaining House Prices
Data set used before; here is a reminder: N = 546 houses sold in Windsor, Canada.
The dependent variable, Y, is the sales price of the house in Canadian dollars. The explanatory variables are:
X_1 = the lot size of the property (in square feet)
X_2 = the number of bedrooms
X_3 = the number of bathrooms
X_4 = the number of storeys (excluding the basement)

36 D_1 = 1 if the house has a driveway (= 0 if it does not)
D_2 = 1 if the house has a recreation room (= 0 if not)
D_3 = 1 if the house has a basement (= 0 if not)
D_4 = 1 if the house has gas central heating (= 0 if not)
D_5 = 1 if the house has air conditioning (= 0 if not)

37 The Breusch-Pagan and White tests require selecting variables which might be related to the error variance. I set these variables to be the explanatory variables in the original regression.
I find BP = and W = . Critical values for both of these tests are taken from the χ²(9) distribution. The 5% critical value is 16.92.
Since both test statistics are greater than the critical value, both of these tests indicate that heteroskedasticity is present.
Thus, although the OLS estimates presented in the Topic 3 slides for this data set were unbiased, the confidence intervals and hypothesis tests were incorrect.

38 I next experimented with different transformations until I found one that eliminated the heteroskedasticity problem.
The one that worked was to take logs of all variables except for the dummies (since the log of zero is undefined, you cannot take the log of a dummy variable).
In this log-linear regression, the values of the two heteroskedasticity tests are BP = and W = , which are both less than the 5% critical value. Hence, both tests fail to reject the hypothesis of homoskedasticity.
Thus, the log transformation has solved the heteroskedasticity problem in this example.
Remember: β_j in a log-linear regression is interpreted as an elasticity.

39 Autocorrelation
We continue our discussion of problems which call for the use of the Generalized Least Squares estimator by considering autocorrelation. This arises with time series data, so we will use t = 1, ..., T to denote observations (rather than i = 1, ..., N).
Reminder of basic theoretical results previously derived using the multiple regression model with the classical assumptions:
1. E(ε_i) = 0: mean zero errors.
2. var(ε_i) = E(ε_i²) = σ²: constant variance errors (homoskedasticity).
3. cov(ε_i, ε_j) = 0 for i ≠ j.
4. ε_i is Normally distributed.
5. Explanatory variables are fixed. They are not random variables.

40 Remember: Assumption 1 is innocuous (if the error had a non-zero mean, we could include it as part of the intercept; it would have no effect on estimation of the slope coefficients in the model).
Assumption 4 can be relaxed (approximately) by using asymptotic theory.
Assumption 5 we will still maintain.
Autocorrelation (also called serial correlation) relates to Assumption 3.

41 Basic ideas: under the classical assumptions, the Gauss-Markov theorem says "OLS is BLUE". But if Assumption 2 or 3 is violated, this no longer holds (OLS is still unbiased, but is no longer "best", i.e. no longer minimum variance).
Concepts/proofs/derivations use the following strategy: the model is transformed to create a new model which does satisfy the classical assumptions. We know OLS (on the transformed model) will be BLUE, and all the theory we worked out for the OLS estimator will hold, except that it holds for the transformed model.
The OLS estimator applied to such a transformed model is called the Generalized Least Squares (GLS) estimator.

42 Autocorrelated Errors
Use the multiple regression model under the classical assumptions, with the exception that the errors follow an autoregressive process of order 1 (AR(1)):
ε_t = ρε_{t-1} + u_t
where it is u_t which satisfies the classical assumptions. So E(u_t) = 0, var(u_t) = σ² and cov(u_t, u_s) = 0 (for t ≠ s).
We also assume -1 < ρ < 1. To preview later material, this restriction ensures stationarity and means you do not have to worry about problems relating to unit roots and cointegration (definitions will be provided later on).
Focus on the AR(1) case, but note that the AR(p) errors case is a simple extension:
ε_t = ρ_1 ε_{t-1} + ρ_2 ε_{t-2} + ... + ρ_p ε_{t-p} + u_t
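An AR(1) error process is easy to simulate, which also lets us check the stationary variance and first-order autocovariance formulas derived on the following slides (σ_ε² = σ²/(1-ρ²) and cov(ε_t, ε_{t-1}) = ρσ_ε²). The parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
T, rho, sigma = 200000, 0.7, 1.0
u = rng.normal(0.0, sigma, T)

# build eps_t = rho * eps_{t-1} + u_t
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1.0 - rho**2)   # start from the stationary distribution
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]

var_theory = sigma**2 / (1.0 - rho**2)  # stationary variance sigma_eps^2
cov1_theory = rho * var_theory          # cov(eps_t, eps_{t-1})

var_mc = eps.var()
cov1_mc = np.mean(eps[1:] * eps[:-1])
```

With 200,000 periods, the sample variance and lag-1 autocovariance match the formulas to within a few percent.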

43 Variances and Covariances of the Regression Error
The assumptions above specified the properties of u_t, but we need to know the properties of ε_t.
Notation: σ_ε² = var(ε_t) = E(ε_t²), where the last equality follows since the errors have mean zero.
Derivation of the variance of the regression errors (the textbook does the derivation in a different way):
σ_ε² = var(ρε_{t-1} + u_t) = ρ² var(ε_{t-1}) + var(u_t) = ρ²σ_ε² + σ²
which implies σ_ε² = σ² / (1 - ρ²)
This derivation used properties of the variance operator, the fact that ε_{t-1} and u_t are independent of one another, and that ε_t is homoskedastic.

44 The derivation of the covariance between different regression errors will be done in Problem Sheet 4:
cov(ε_t, ε_{t-1}) = ρσ_ε²
For errors more than one period apart, it can be shown that:
cov(ε_t, ε_{t-s}) = ρ^s σ_ε²
Thus, the regression model with autocorrelated errors violates Assumption 3: the regression errors are NOT uncorrelated with one another. Hence, we need to work with a GLS estimator.

45 The GLS Estimator for the Autocorrelated Errors Case
Remember: GLS can be interpreted as OLS on a suitably transformed model. In this case, the appropriate transformation is referred to as "quasi-differencing". To explain what this is, consider the regression model:
Y_t = α + β_1 X_1t + ... + β_k X_kt + ε_t
This holds for every time period, so we can take it at period t-1 and multiply both sides of the equation by ρ:
ρY_{t-1} = ρα + ρβ_1 X_1,t-1 + ... + ρβ_k X_k,t-1 + ρε_{t-1}
Subtract this equation from the original regression equation:
Y_t - ρY_{t-1} = α - ρα + β_1 (X_1t - ρX_1,t-1) + ... + β_k (X_kt - ρX_k,t-1) + ε_t - ρε_{t-1}

46 or
Y_t* = α(1 - ρ) + β_1 X_1t* + ... + β_k X_kt* + u_t
But u_t satisfies the classical assumptions, so OLS on this transformed model will be GLS (which will be BLUE).
Note: the transformed variables are "quasi-differenced":
Y_t* = Y_t - ρY_{t-1}
X_1t* = X_1t - ρX_1,t-1, etc.
The case with ρ = 1 is called "differenced"; this is not quite the same, so we say "quasi"-differenced.

47 One (relatively minor) issue: if our original data run from t = 1, ..., T, then Y_1* = Y_1 - ρY_0 will involve Y_0 (and the same issue arises for the explanatory variables). But we do not observe such "initial conditions".
There are many ways of treating initial conditions. The simplest is to work with data from t = 2, ..., T (and use the t = 1 values of the variables as initial conditions).
Summary: if we knew ρ, then we could quasi-difference the data and do OLS on the transformed data (which is equivalent to GLS). In practice, we rarely (if ever) know ρ. Hence, replace ρ by an estimate, ρ̂. This is what the Cochrane-Orcutt procedure does.

48 The Cochrane-Orcutt Procedure
Remember: with autocorrelated errors, GLS is BLUE. However, OLS (on the original data) is still unbiased.
The Cochrane-Orcutt procedure begins with OLS and then uses the OLS residuals to estimate ρ. It goes through the following steps:

49 1. Do an OLS regression of Y_t on an intercept, X_1t, ..., X_kt and produce the OLS residuals, ε̂_t.
2. Do an OLS regression of ε̂_t on ε̂_{t-1}, which will provide ρ̂.
3. Quasi-difference all variables to produce Y_t* = Y_t - ρ̂Y_{t-1}, X_1t* = X_1t - ρ̂X_1,t-1, etc.
4. Do an OLS regression of Y_t* on an intercept, X_1t*, ..., X_kt*, thus producing GLS estimates of the coefficients.
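The four steps above can be sketched on simulated AR(1)-error data (one regressor; ρ = 0.6 and the other parameter values are illustrative assumptions). The first observation is dropped when quasi-differencing, the simplest treatment of initial conditions:

```python
import numpy as np

rng = np.random.default_rng(7)
T, rho_true = 500, 0.6
X = rng.uniform(0.0, 10.0, T)
u = rng.normal(0.0, 1.0, T)
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1 - rho_true**2)
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]
Y = 1.0 + 2.0 * X + eps

# step 1: OLS on the original data, keep residuals
M = np.column_stack([np.ones(T), X])
b_ols, *_ = np.linalg.lstsq(M, Y, rcond=None)
e = Y - M @ b_ols

# step 2: regress e_t on e_{t-1} to estimate rho
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1]**2)

# step 3: quasi-difference, dropping the first observation
Ys = Y[1:] - rho_hat * Y[:-1]
Xs = X[1:] - rho_hat * X[:-1]

# step 4: OLS on the quasi-differenced data (intercept estimates alpha*(1 - rho))
Ms = np.column_stack([np.ones(T - 1), Xs])
b_co, *_ = np.linalg.lstsq(Ms, Ys, rcond=None)
beta_co = b_co[1]
```

The estimated ρ̂ lands near the true 0.6, and the slope from the quasi-differenced regression near the true β = 2.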

50 Autocorrelation Consistent Estimators
Remember: with heteroskedasticity we discussed the heteroskedasticity consistent estimator (HCE). It is less efficient than GLS, but is a correct second-best solution when GLS is difficult to implement.
Similar issues hold for autocorrelated errors. There exist autocorrelation consistent estimators which allow for the correct use of OLS methods when you have autocorrelated errors.
We will not explain these, but many popular econometrics software packages include them. The most popular is the Newey-West estimator (which corrects for heteroskedasticity as well).

51 Testing for Autocorrelated Errors
If ρ = 0, then doing OLS on the original data is fine (OLS is BLUE). However, if ρ ≠ 0, then a GLS estimator such as the Cochrane-Orcutt estimator is better.
This motivates testing H_0: ρ = 0 against H_1: ρ ≠ 0. There are several such tests; here we describe some of the most popular.

52 Breusch-Godfrey Test
This is a test of H_0: ρ_1 = 0, ρ_2 = 0, ..., ρ_p = 0 in the regression model with AR(p) errors:
ε_t = ρ_1 ε_{t-1} + ρ_2 ε_{t-2} + ... + ρ_p ε_{t-p} + u_t
The Breusch-Godfrey test involves the following steps:
1. Run a regression of Y_t on an intercept, X_1, ..., X_k using OLS and produce the residuals, ε̂_t.
2. Run a second regression of ε̂_t on an intercept, X_1, ..., X_k, ε̂_{t-1}, ..., ε̂_{t-p} using OLS and produce the R².
3. Calculate the test statistic: LM = TR²
If H_0 is true, then LM has an (approximate) χ²(p) distribution. Thus, the critical value is taken from statistical tables for the Chi-square distribution.
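A sketch of these steps on simulated data with AR(1) errors, testing against AR(2) (p = 2). The data-generating process is an illustrative assumption, and this variant uses the T - p usable observations in the auxiliary regression (implementations differ slightly in how they treat the first p periods):

```python
import numpy as np

rng = np.random.default_rng(10)
T, p, rho = 400, 2, 0.5
X = rng.uniform(0.0, 10.0, T)
u = rng.normal(size=T)
eps = np.empty(T)
eps[0] = u[0]
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]
Y = 1.0 + 2.0 * X + eps

# step 1: OLS residuals from the original regression
M = np.column_stack([np.ones(T), X])
b, *_ = np.linalg.lstsq(M, Y, rcond=None)
e = Y - M @ b

# step 2: regress e_t on intercept, X_t and p lags of e (first p obs dropped)
dep = e[p:]
lagmat = np.column_stack([e[p - 1 - j : T - 1 - j] for j in range(p)])
A = np.column_stack([np.ones(T - p), X[p:], lagmat])
g, *_ = np.linalg.lstsq(A, dep, rcond=None)
fit = A @ g
R2 = 1.0 - np.sum((dep - fit)**2) / np.sum((dep - dep.mean())**2)

# step 3: LM statistic, compared with chi-squared(p)
LM = (T - p) * R2
```

Since the errors really are autocorrelated, LM far exceeds the 5% χ²(2) critical value of 5.99.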

53 The Box-Pierce and Ljung Tests
These test H_0: ρ_1 = 0, ρ_2 = 0, ..., ρ_p = 0.
Both are based on the idea that, if the errors are not autocorrelated, then the correlations between different errors should be zero. Replace errors by residuals: ε̂_t are the residuals from an OLS regression of Y on an intercept, X_1, ..., X_k.
The correlations between ε̂_t and ε̂_{t-s} are:
r_s = Σ_{t=s+1}^T ε̂_t ε̂_{t-s} / Σ_{t=s+1}^T ε̂_t²

54 The Box-Pierce test statistic (sometimes called the Q test statistic) is:
Q = T Σ_{j=1}^p r_j²
The p means that AR(p) errors are being tested for. The Ljung test statistic is:
Q* = T(T+2) Σ_{j=1}^p r_j² / (T - j)
Critical values for both are taken from χ²(p) tables. Econometrics software packages such as Gretl will calculate these test statistics.
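Both statistics are straightforward to compute from the residual autocorrelations. This sketch uses white-noise draws as a stand-in for OLS residuals (so H_0 is true and neither statistic should be large), and a common variant of r_s with the full-sample sum of squares in the denominator:

```python
import numpy as np

rng = np.random.default_rng(8)
T, p = 1000, 4
e = rng.normal(size=T)   # stand-in for OLS residuals, no autocorrelation

# residual autocorrelations r_1, ..., r_p
denom = np.sum(e**2)
r = np.array([np.sum(e[s:] * e[:-s]) for s in range(1, p + 1)]) / denom

# Box-Pierce and Ljung statistics, both compared with chi-squared(p)
Q_bp = T * np.sum(r**2)
Q_lb = T * (T + 2) * np.sum(r**2 / (T - np.arange(1, p + 1)))
```

The Ljung version rescales each term by (T+2)/(T-j) > 1, so it always slightly exceeds Box-Pierce; under H_0 both should fall well below the upper tail of χ²(4).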

55 Warning: in some cases, one of the explanatory variables will be the dependent variable from a previous period (a "lagged dependent variable"). For instance:
Y_t = α + δY_{t-1} + βX_t + ε_t
The Box-Pierce and Ljung tests are not appropriate in this case. The Breusch-Godfrey test, however, is still appropriate.
The textbook discusses two other approaches: the Durbin-Watson statistic and Durbin's h-test.

56 Instrumental Variable Methods
Overview: under the classical assumptions, OLS is BLUE. When we relax Assumption 2 or 3 (e.g. to allow for heteroskedasticity or autocorrelated errors), OLS is no longer BLUE but it is still unbiased (although GLS is better).
However, in the case we are about to consider, OLS will be biased and an entirely different estimator will be called for: the instrumental variables (IV) estimator.
Here we relax the assumption that the explanatory variables are not random variables. For simplicity, we use the simple regression model, but the results generalize to the case of multiple regression.

57 Theory Motivating the IV Estimator
Reminder of basic theoretical results previously derived using the multiple regression model with the classical assumptions:
1. E(ε_i) = 0: mean zero errors.
2. var(ε_i) = E(ε_i²) = σ²: constant variance errors (homoskedasticity).
3. cov(ε_i, ε_j) = 0 for i ≠ j.
4. ε_i is Normally distributed.
5. Explanatory variables are fixed. They are not random variables.

58 Remember: Assumption 1 is innocuous. Assumption 4 can be relaxed (approximately) by using asymptotic theory. Assumptions 2 and 3 were covered above (heteroskedasticity and autocorrelated errors). Now we will focus on relaxing Assumption 5.
Note: when the explanatory variables are random, many derivations we did before with the expected value and variance operators become much more difficult or impossible. For this reason, most relevant results are asymptotic. But asymptotic methods are not covered in this course (see the appendix to Chapter 5 if you are interested).
These lecture slides provide some intuition, hints at derivations, and discussion of things relevant for empirical practice.

59 Case 1: Explanatory Variable is Random But is Uncorrelated with the Error
If X_i is now a random variable, we have to make some assumptions about its distribution. Assume the X_i are independent and identically distributed (i.i.d.) random variables with:
E(X_i) = μ_X
var(X_i) = σ_X²
In Case 1 we assume the explanatory variable and the errors are uncorrelated with one another:
cov(X_i, ε_i) = E(X_i ε_i) = 0

60 Remember, under the classical assumptions:
β̂ is N(β, σ² / Σ X_i²)
This result can still be shown to hold approximately in this case (I will not provide details; some are given in the textbook).
Bottom line: if we relax the assumptions of Normality and fixed explanatory variables, we get exactly the same results as for OLS under the classical assumptions (but here they hold approximately), provided the explanatory variables are uncorrelated with the error term.

61 Case 2: Explanatory Variable is Correlated with the Error Term

Assume the X_i are i.i.d. random variables with:
E(X_i) = µ_X
var(X_i) = σ²_X
In Case 2 we will assume the explanatory variable and the errors are correlated with one another:
cov(X_i, ε_i) = E(X_i ε_i) ≠ 0
It turns out that, in this case, OLS is biased and a new estimator is called for: the instrumental variables (IV) estimator. Why is this? We will not provide a proof, but outline the basic idea.

62 The proof that OLS is biased begins in the same manner as the proof of unbiasedness under the classical assumptions (see Topic 3 slides or page 68 of the textbook). We can get up to the following stage in the proof:
E(β̂) = β + E(Σ X_i ε_i / Σ X_i²)
But at this stage we can go no farther, other than to note that there is no reason to think that E(Σ X_i ε_i / Σ X_i²) = 0 and, in fact, it is not.
Intuition (ignoring Σ X_i² in the denominator): the numerator is E(Σ X_i ε_i) = Σ E(X_i ε_i) = Σ cov(X_i, ε_i) ≠ 0.
Important point: if the error and explanatory variable are correlated, then OLS is biased and should be avoided. Soon we will offer some explanation for why this might occur, but first we introduce a new estimator to handle this case.
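The bias described above can be seen in a small simulation. The following is a pure-Python sketch of my own (not from the textbook); the data-generating process is illustrative: X and ε share a common random component, so cov(X_i, ε_i) > 0 and the OLS slope systematically overshoots the true β.

```python
import random

random.seed(42)

def simulate_ols_bias(beta=2.0, n=20000):
    """Simulate Y = beta*X + eps where X and eps share a common component,
    so cov(X, eps) != 0, and return the OLS slope (regression through the origin)."""
    xs, ys = [], []
    for _ in range(n):
        common = random.gauss(0, 1)          # shared shock makes X endogenous
        x = random.gauss(0, 1) + common
        eps = random.gauss(0, 1) + common    # cov(X, eps) = var(common) = 1
        xs.append(x)
        ys.append(beta * x + eps)
    # OLS slope through the origin: sum(x*y) / sum(x^2)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

beta_ols = simulate_ols_bias()
print(round(beta_ols, 2))  # noticeably above the true beta = 2
```

With this design the probability limit of the OLS slope is β + cov(X, ε)/var(X) = 2 + 1/2 = 2.5, so the bias does not vanish even in very large samples.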

63 The Instrumental Variables Estimator

An instrumental variable (IV), Z_i, is a random variable which is uncorrelated with the error but is correlated with the explanatory variable. Formally, an instrumental variable is assumed to satisfy the following assumptions:
E(Z_i) = µ_Z
var(Z_i) = σ²_Z
cov(Z_i, ε_i) = E(Z_i ε_i) = 0
cov(X_i, Z_i) = E(X_i Z_i) − µ_X µ_Z = σ_XZ ≠ 0
Assuming an instrumental variable exists (something we will return to later), the IV estimator is:
β̂_IV = Σ_{i=1}^N Z_i Y_i / Σ_{i=1}^N Z_i X_i
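The formula above is easy to compute directly. Here is a minimal pure-Python sketch (illustrative data-generating process of my own, not the textbook's) that compares OLS and IV on the same endogenous regressor:

```python
import random

random.seed(0)

beta, n = 2.0, 20000
zs, xs, ys = [], [], []
for _ in range(n):
    common = random.gauss(0, 1)            # shared shock: makes X endogenous
    z = random.gauss(0, 1)                 # instrument: independent of the error
    x = z + common + random.gauss(0, 1)    # correlated with Z and with the error
    eps = common + random.gauss(0, 1)
    zs.append(z); xs.append(x); ys.append(beta * x + eps)

# OLS slope: sum(x*y)/sum(x^2); IV slope: sum(z*y)/sum(z*x)
b_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
b_iv = sum(z * y for z, y in zip(zs, ys)) / sum(z * x for z, x in zip(zs, xs))
print(round(b_ols, 2), round(b_iv, 2))  # OLS biased away from 2; IV close to 2
```

Because Z is uncorrelated with ε but correlated with X, the IV estimator recovers β even though OLS does not.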

64 The asymptotic derivations in the textbook (not covered in this course) imply (approximately):
β̂_IV is N(β, σ²(σ²_Z + µ²_Z) / [N(σ_XZ + µ_X µ_Z)²])
Note: this implies E(β̂_IV) = β, so it is unbiased (approximately).
This formula can be used to calculate confidence intervals, hypothesis tests, etc. (comparable to the Topic 3 slide derivations).
In practice, the unknown means and variances can be replaced by their sample counterparts. Thus, µ_X can be replaced by X̄, σ²_Z by the sample variance Σ(Z_i − Z̄)²/(N − 1), etc.
No additional details of how this is done, but note that econometrics software packages do IV estimation. Note: this is sometimes called the two stage least squares estimator (and Gretl uses this term).

65 Using the IV Estimator in Practice

What if you have a multiple regression model involving more than one explanatory variable? Answer: you need at least one instrumental variable for each explanatory variable that is correlated with the error.
Note: if even one of the explanatory variables in a multiple regression model is correlated with the errors, OLS estimates of all coefficients can be biased.
What if you have more instrumental variables than you need? Use the generalized instrumental variables estimator (GIVE).

66 The Generalized Instrumental Variables Estimator

Illustrate in the simple regression model with one explanatory variable, X. X is correlated with the error, so an IV estimator is needed. Suppose two instrumental variables exist: Z_1 and Z_2.
GIVE involves running an initial regression of the explanatory variable on the instruments:
X_i = γ_0 + γ_1 Z_1i + γ_2 Z_2i + u_i
OLS estimation of this initial regression provides fitted values:
X̂_i = γ̂_0 + γ̂_1 Z_1i + γ̂_2 Z_2i.

67 It can be shown that X̂ is uncorrelated with the errors in the original regression and, hence, that it is a suitable instrument. This is what GIVE uses as an instrument. Thus, the GIVE is given by:
β̂_GIVE = Σ_{i=1}^N X̂_i Y_i / Σ_{i=1}^N X̂_i X_i
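The two steps above (first-stage regression, then IV with the fitted values) can be sketched in pure Python. This is an illustrative simulation of my own, assuming a simple data-generating process with two valid instruments; the first-stage OLS is solved via the demeaned 2x2 normal equations:

```python
import random

random.seed(1)

def first_stage_fitted(x, z1, z2):
    """OLS of x on an intercept, z1 and z2; returns the fitted values.
    The demeaned 2x2 normal equations are solved by Cramer's rule."""
    n = len(x)
    m1, m2, mx = sum(z1) / n, sum(z2) / n, sum(x) / n
    d1 = [v - m1 for v in z1]; d2 = [v - m2 for v in z2]; dx = [v - mx for v in x]
    S11 = sum(v * v for v in d1); S22 = sum(v * v for v in d2)
    S12 = sum(u * v for u, v in zip(d1, d2))
    S1x = sum(u * v for u, v in zip(d1, dx)); S2x = sum(u * v for u, v in zip(d2, dx))
    det = S11 * S22 - S12 * S12
    g1 = (S1x * S22 - S2x * S12) / det
    g2 = (S2x * S11 - S1x * S12) / det
    return [mx + g1 * d1[i] + g2 * d2[i] for i in range(n)]

# Illustrative data: X endogenous, two valid instruments Z1 and Z2
beta, n = 2.0, 20000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
common = [random.gauss(0, 1) for _ in range(n)]          # shared shock: endogeneity
x = [z1[i] + 0.5 * z2[i] + common[i] + random.gauss(0, 1) for i in range(n)]
y = [beta * x[i] + common[i] + random.gauss(0, 1) for i in range(n)]

xhat = first_stage_fitted(x, z1, z2)                     # first-stage fitted values
b_give = sum(h * yi for h, yi in zip(xhat, y)) / \
         sum(h * xi for h, xi in zip(xhat, x))
print(round(b_give, 2))  # close to the true beta = 2
```

This is exactly the "two stage least squares" logic: the second stage uses X̂ as the instrument for X.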

68 The Hausman Test

The Hausman test is used to see if the IV estimator is necessary. If X is correlated with the error, then OLS is biased and you should use an IV estimator. However, if X is uncorrelated with the error, then (under the classical assumptions) OLS is BLUE and, hence, is more efficient than IV.
Basic idea of the Hausman test (Gretl will calculate it for you): let H_0 be the hypothesis that the explanatory variables are uncorrelated with the error. If H_0 is true, then OLS and IV are both acceptable estimators and should give roughly the same result. However, if H_0 is false, then OLS is not acceptable, but IV is, and the two can be quite different. The Hausman test is based on measuring the difference between β̂ and β̂_IV.

69 It can be shown that the Hausman test can be done using OLS methods. Example: simple regression with one instrument. The original regression model of interest is:
Y_i = α + βX_i + ε_i
The Hausman test uses this regression but adds the instrumental variable to it:
Y_i = α + βX_i + γZ_i + ε_i
It can be shown that the Hausman test is equivalent to a t-test of H_0: γ = 0. If the coefficient on Z is significant, you reject H_0 and use the IV estimator. If the coefficient on Z is insignificant, do OLS on the original regression model.
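The augmented-regression version of the test described above is straightforward to code. The following pure-Python sketch (my own illustrative simulation, not the textbook's) regresses Y on an intercept, X and Z and returns the t-statistic on Z's coefficient, comparing an endogenous and an exogenous X:

```python
import random

random.seed(3)

def hausman_t(y, x, z):
    """Regress y on an intercept, x and z; return the t-statistic on z's
    coefficient. A large |t| suggests x is correlated with the error."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    dx = [v - mx for v in x]; dz = [v - mz for v in z]; dy = [v - my for v in y]
    Sxx = sum(v * v for v in dx); Szz = sum(v * v for v in dz)
    Sxz = sum(u * v for u, v in zip(dx, dz))
    Sxy = sum(u * v for u, v in zip(dx, dy)); Szy = sum(u * v for u, v in zip(dz, dy))
    det = Sxx * Szz - Sxz * Sxz
    b = (Sxy * Szz - Szy * Sxz) / det                 # coefficient on x
    g = (Szy * Sxx - Sxy * Sxz) / det                 # coefficient on z
    resid = [dy[i] - b * dx[i] - g * dz[i] for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - 3)          # residual variance
    se_g = (s2 * Sxx / det) ** 0.5                    # standard error of gamma-hat
    return g / se_g

n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
common = [random.gauss(0, 1) for _ in range(n)]
eps = [common[i] + random.gauss(0, 1) for i in range(n)]
x_endog = [z[i] + common[i] + random.gauss(0, 1) for i in range(n)]  # correlated with eps
x_exog = [z[i] + random.gauss(0, 1) for i in range(n)]               # uncorrelated with eps

t_endog = hausman_t([2.0 * x_endog[i] + eps[i] for i in range(n)], x_endog, z)
t_exog = hausman_t([2.0 * x_exog[i] + eps[i] for i in range(n)], x_exog, z)
print(round(t_endog, 1), round(t_exog, 1))  # typically |t| exceeds 1.96 only in the endogenous case
```

In the endogenous design, Z picks up explanatory power for Y beyond X, so its coefficient is strongly significant; in the exogenous design it is not.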

70 Example: Hausman test in the multiple regression model

Suppose we have 3 explanatory variables (X_1, X_2 and X_3), 2 of which might be correlated with the error (X_2 and X_3). For X_2 we have two possible instrumental variables (Z_1 and Z_2); for X_3 we have three possible instrumental variables (Z_3, Z_4 and Z_5).
Since there are potentially more instruments than explanatory variables, we might want to use GIVE. The Hausman test runs the initial GIVE regressions (see the discussion of the GIVE) to get fitted values, X̂_2 and X̂_3, then adds these fitted values to the original regression and tests if they are significant. Precise steps are given on the next slide.

71 Run an OLS regression of X_2 on an intercept, Z_1 and Z_2. Obtain the fitted values, X̂_2.
Run an OLS regression of X_3 on an intercept, Z_3, Z_4 and Z_5. Obtain the fitted values, X̂_3.
Run an OLS regression of Y on an intercept, X_1, X_2, X_3, X̂_2 and X̂_3.
Do an F-test of the hypothesis that the coefficients on X̂_2 and X̂_3 are jointly equal to zero.
If the F-test rejects, then proceed with GIVE; if not, use OLS on the original regression.

72 The Sargan Test

The Hausman test tells you whether it is necessary to do instrumental variables estimation, if we have valid instrumental variables. But how do we know that Z truly is a valid instrument? Remember: to be a valid instrument, Z must be correlated with X but uncorrelated with the regression error. The latter is not easy to check, since the regression error is not observed.
The issue of testing whether instruments truly are valid ones is difficult. If you have only one potential instrument for every explanatory variable, there is no way of testing instrument validity. The next slide illustrates why this is so.

73 Suppose there is one potential instrument, Z, for a simple regression model:
Y_i = βX_i + ε_i
For Z to be an instrument we must have cov(Z_i, ε_i) = 0. ε_i is unobserved, but why not create a test statistic based on cov(Z_i, ε_i) using residuals?
But which residuals should you use? IV residuals (i.e. ε̂_i^IV = Y_i − β̂_IV X_i) sound plausible, but will be unacceptable if Z is not a valid instrument. OLS residuals are inappropriate since they could very well be biased (and you cannot use a Hausman test to check this, since you are not sure that Z is a valid instrument). In fact, there is no way of testing whether the instruments are valid in such a case.

74 But if there are more potential instruments than explanatory variables, there is a test for whether the instruments are valid: the Sargan test.
Suppose you have k explanatory variables which all are potentially correlated with the error, and r instrumental variables where r > k. The Sargan test involves the following steps:
1. Run a regression of Y on X_1, .., X_k using GIVE and obtain the IV residuals, ε̂_i^IV.
2. Run an OLS regression of ε̂_i^IV on all the instruments, Z_1, .., Z_r, and obtain the R².
3. The Sargan test statistic is NR², and the critical value can be obtained from the χ²(r − k) distribution.
4. If the test statistic is less than the critical value, conclude that the instruments are valid; else they are not.
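The steps above can be sketched in pure Python for the case k = 1, r = 2. This is an illustrative simulation of my own (not from the textbook): one run uses two valid instruments, the other swaps in an invalid instrument that is correlated with the error, and the NR² statistic is compared in each case to the χ²(1) distribution:

```python
import random

random.seed(5)

def ols2_fit(y, a, b):
    """OLS of y on an intercept plus two regressors a and b.
    Returns (fitted values, R-squared), via the demeaned normal equations."""
    n = len(y)
    ma, mb, my = sum(a) / n, sum(b) / n, sum(y) / n
    da = [v - ma for v in a]; db = [v - mb for v in b]; dy = [v - my for v in y]
    Saa = sum(v * v for v in da); Sbb = sum(v * v for v in db)
    Sab = sum(u * v for u, v in zip(da, db))
    Say = sum(u * v for u, v in zip(da, dy)); Sby = sum(u * v for u, v in zip(db, dy))
    det = Saa * Sbb - Sab * Sab
    c1 = (Say * Sbb - Sby * Sab) / det
    c2 = (Sby * Saa - Say * Sab) / det
    fitted = [my + c1 * da[i] + c2 * db[i] for i in range(n)]
    sst = sum(v * v for v in dy)
    ess = sum((f - my) ** 2 for f in fitted)
    return fitted, ess / sst

def sargan_nr2(y, x, z1, z2):
    """Sargan statistic N*R^2 for one endogenous regressor and two instruments:
    GIVE estimate, IV residuals, then regress the residuals on the instruments."""
    xhat, _ = ols2_fit(x, z1, z2)                        # first stage
    b_give = sum(h * yi for h, yi in zip(xhat, y)) / \
             sum(h * xi for h, xi in zip(xhat, x))       # GIVE estimate
    resid = [yi - b_give * xi for yi, xi in zip(y, x)]   # IV residuals
    _, r2 = ols2_fit(resid, z1, z2)                      # residuals on instruments
    return len(y) * r2                                   # compare to chi2(r - k) = chi2(1)

n = 5000
common = [random.gauss(0, 1) for _ in range(n)]          # makes X endogenous
z1 = [random.gauss(0, 1) for _ in range(n)]
z_good = [random.gauss(0, 1) for _ in range(n)]          # valid instrument
eps = [common[i] + random.gauss(0, 1) for i in range(n)]
z_bad = [eps[i] + random.gauss(0, 1) for i in range(n)]  # invalid: correlated with error
x = [z1[i] + z_good[i] + common[i] + random.gauss(0, 1) for i in range(n)]
y = [2.0 * x[i] + eps[i] for i in range(n)]

s_valid = sargan_nr2(y, x, z1, z_good)
s_invalid = sargan_nr2(y, x, z1, z_bad)
print(round(s_valid, 1), round(s_invalid, 1))  # the invalid pair gives a far larger statistic
```

With valid instruments the statistic behaves like a χ²(1) draw (5% critical value 3.84); with the invalid instrument the residuals are strongly correlated with it and NR² explodes.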

75 Why Might the Explanatory Variable Be Correlated with the Error?

There are many different reasons why the explanatory variables might be correlated with the errors. In these slides we discuss two reasons and give examples:
Errors in variables problem
Simultaneous equations model

76 Errors in Variables

What if you want to run the regression:
Y_i = βX*_i + ε_i
Suppose this regression satisfies the classical assumptions, but you do not observe X*_i; instead you observe:
X_i = X*_i + v_i
where v_i is i.i.d. with mean zero and variance σ²_v and is independent of ε_i.

77 In other words, X is observed with error. Substituting X*_i = X_i − v_i into the original regression yields a new regression:
Y_i = β(X_i − v_i) + ε_i = βX_i + ε*_i
where ε*_i = ε_i − βv_i.
What is the covariance between the explanatory variable, X_i, and the error, ε*_i, in this new regression?
cov(X_i, ε*_i) = E[(X*_i + v_i)(ε_i − βv_i)] = −βσ²_v ≠ 0
Hence measurement error in the explanatory variables (but not the dependent variable) causes them to be correlated with the regression error.
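The resulting bias (attenuation towards zero) is easy to see in a simulation. This pure-Python sketch is my own illustration, assuming var(X*) = var(v) = 1, so the plim of the OLS slope is β·σ²_*/(σ²_* + σ²_v) = β/2:

```python
import random

random.seed(7)

beta, n = 2.0, 20000
x_star = [random.gauss(0, 1) for _ in range(n)]        # true regressor, variance 1
y = [beta * xs + random.gauss(0, 1) for xs in x_star]  # classical assumptions hold
x_obs = [xs + random.gauss(0, 1) for xs in x_star]     # measured with error, var(v) = 1

# OLS slope of Y on the mismeasured X (regression through the origin)
b = sum(xo * yi for xo, yi in zip(x_obs, y)) / sum(xo * xo for xo in x_obs)
print(round(b, 2))  # attenuated: plim is beta * 1/(1 + 1) = 1, not beta = 2
```

The noisier the measurement (larger σ²_v), the more the OLS slope is pulled towards zero.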

78 The Simultaneous Equations Model

Example: supply and demand model.
Demand curve: Q_D = β_D P + ε_D
Supply curve: Q_S = β_S P + ε_S
Q_D is quantity demanded, Q_S is quantity supplied, P is price. In equilibrium, Q_D = Q_S, and this condition is used to solve for the equilibrium price and quantity. Price and quantity are determined by this model, so they are both endogenous variables. These two equations are a simultaneous equations model.

79 Suppose an econometrician collects data on price and quantity and runs a regression of quantity on price. This will yield an estimate β̂. But would it be an estimate of β_D or β_S? There is no way of knowing. OLS will probably estimate neither the supply curve nor the demand curve.

80 How might we solve this problem? Include an exogenous variable (one not determined by the model). E.g. suppose Q_D also depends on income, I:
Q_D = β_D P + γI + ε_D
Notation: the original supply and demand model is the structural form. When we solve the model so that only exogenous variables appear on the right-hand side, we get the reduced form.

81 Setting Q_D = Q_S and solving for the equilibrium P and Q: set the right-hand sides of the demand and supply equations equal to each other and rearrange:
P = −[γ/(β_D − β_S)] I + (ε_S − ε_D)/(β_D − β_S) = π_1 I + ε_1
Substitute this expression for P into the supply equation:
Q = β_S (π_1 I + ε_1) + ε_S = β_S π_1 I + β_S ε_1 + ε_S = π_2 I + ε_2
Note: I am using π_1 = −γ/(β_D − β_S) and π_2 = β_S π_1 as notation for the reduced form coefficients. ε_1 = (ε_S − ε_D)/(β_D − β_S) and ε_2 = β_S ε_1 + ε_S are reduced form errors which are functions of the structural form errors.
This example shows what the reduced and structural forms are, but what are the implications for the econometrics?

82 Directly Estimating Structural Equations using OLS Leads to Biased Estimates

Why not run a regression based on the demand curve, i.e. a regression of Q on P and I? We can show that the explanatory variable P is correlated with the error. As shown in this section of the lecture slides, OLS is then not appropriate.
To prove this, assume ε_S and ε_D are uncorrelated and homoskedastic:
var(ε_D) = σ²_D
var(ε_S) = σ²_S
Assume the exogenous variable I is not random.

83 cov(P, ε_D) = E(P ε_D) = E[(π_1 I + ε_1) ε_D] = E[(ε_S − ε_D) ε_D / (β_D − β_S)] = −σ²_D/(β_D − β_S) ≠ 0

84 OLS Estimation of Reduced Form Equations is Fine

It is not hard to show that the reduced form errors do satisfy the classical assumptions. Thus, OLS estimation of the reduced form equations is BLUE.
So π̂_1 and π̂_2 are BLUE, but what are they estimates of? For instance, π_1 = −γ/(β_D − β_S), which is neither the slope of the supply curve nor the slope of the demand curve.
But here we can estimate the slope of the supply curve using π̂_1 and π̂_2. We have π_2 = β_S π_1 and, thus, β̂_S = π̂_2/π̂_1.
Such a strategy is called indirect least squares.
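Indirect least squares can be sketched in a few lines of pure Python. This simulation is my own illustration (the structural parameter values are made up, not the textbook's): estimate the two reduced form coefficients by OLS, take their ratio, and note that it matches the IV estimator of Q on P using income as the instrument:

```python
import random

random.seed(11)

def slope(y, x):
    """OLS slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Illustrative structural parameters: demand slope, supply slope, income effect
b_d, b_s, gamma, n = -1.0, 1.0, 1.0, 20000
inc = [random.gauss(0, 1) for _ in range(n)]   # exogenous income
e_d = [random.gauss(0, 1) for _ in range(n)]
e_s = [random.gauss(0, 1) for _ in range(n)]
# Reduced form from equating demand and supply:
# P = (gamma*I + e_d - e_s) / (b_s - b_d), then Q = b_s*P + e_s
p = [(gamma * inc[i] + e_d[i] - e_s[i]) / (b_s - b_d) for i in range(n)]
q = [b_s * p[i] + e_s[i] for i in range(n)]

pi1 = slope(p, inc)            # reduced-form coefficient in the P equation
pi2 = slope(q, inc)            # reduced-form coefficient in the Q equation
b_s_ils = pi2 / pi1            # indirect least squares estimate of the supply slope
b_s_iv = sum(inc[i] * q[i] for i in range(n)) / sum(inc[i] * p[i] for i in range(n))
print(round(b_s_ils, 2), round(b_s_iv, 2))  # both recover the supply slope b_s = 1
```

The two estimates coincide because π̂_2/π̂_1 = (Σ I_i Q_i)/(Σ I_i P_i), which is exactly the IV formula with I as the instrument.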

85 How does all this relate to instrumental variables? It is easy to show that indirect least squares is equivalent to instrumental variables estimation of the supply curve using I as an instrument. This example sheds light on how to choose instruments.
Note: here we are estimating the supply curve, but there is no way of estimating the demand curve. Why is this? The key thing is that income does not appear in the supply curve. A general rule is: if you have an exogenous variable which is excluded from an equation, then it can be used as an instrumental variable for that equation. This occurs for the supply curve, but not the demand curve.

86 An example where the explanatory variable could be correlated with the error

Suppose we are interested in estimating the returns to schooling and have data from a survey of many individuals on:
The dependent variable: Y = income
The explanatory variable: X = years of schooling
And other explanatory variables like experience, age, occupation, etc., which we will ignore here to simplify the exposition.
My contention is that, in such a regression, it probably is the case that X is correlated with the error and, thus, OLS will be inappropriate.

87 To understand why, first think of how errors are interpreted in this regression. An individual with a positive error is earning an unusually high level of income. That is, his/her income is more than his/her education would suggest. An individual with a negative error is earning an unusually low level of income. That is, his/her income is less than his/her education would suggest. What might be correlated with this error? Perhaps each individual has some underlying quality (e.g. intelligence, ambition, drive, talent, luck or even family encouragement). () Introductory Econometrics: Topic 5 87 / 94

88 This quality would likely be associated with the error (e.g. individuals with more drive tend to achieve unusually high incomes). But this quality would also affect the schooling choice of the individual. For instance, ambitious students would be more likely to go to university.
Summary: ambitious, intelligent, driven individuals would tend to have both more schooling and more income (i.e. positive errors). So both the error and the explanatory variable would be influenced by this quality. The error and the explanatory variable probably would be correlated with one another.

89 How do you choose instrumental variables? There is a lot of discussion in the literature of how to do this. But this is too extensive and complicated for this course, so we offer a few practical thoughts.
An instrumental variable should be correlated with the explanatory variable, but not with the error. Sometimes economic theory (or common sense) suggests variables with this property. In our example, we want a variable which is correlated with the schooling decision, but is unrelated to the error (i.e. to the factors which might explain why individuals have unusually high/low incomes). An alternative way of saying this: we want to find a variable which affects the schooling choice, but has no direct effect on income.

90 Characteristics of parents or older siblings have been used as instruments. Justification: if either of your parents had a university degree, then you probably come from a family where education is valued (increasing the chances that you go to university). However, your employer will not care that your parents went to university (so there is no direct effect on your income).
Other researchers have used geographical location variables as instruments. Justification: if you live in a community with a university or college, you are more likely to go to university. However, your employer will not care where you lived, so the location variable will have no direct effect on your income.

91 Advice for Empirical Practice

Divide your variables into the dependent variable, the one(s) which may be endogenous explanatory variables, the ones which are exogenous explanatory variables and the ones which might be instruments. Economic theory and common sense can help you make an initial classification; the previous example illustrates this. Then use the Hausman and Sargan tests to try to see if your classification is a good one and, if not, experiment with adjusting the classifications.
Also: a valid instrument should not have explanatory power for Y (after controlling for the effect of the exogenous explanatory variables). Hence, you can experiment with regressions including the instruments: they should not be significant (in a regression which passes the Hausman test).

92 Chapter Summary

The chapter discusses violations of the classical assumptions and breaks into a "GLS" part and an "IV" part.
If the errors either have different variances (heteroskedasticity) or are correlated with one another, then OLS is unbiased, but is no longer the best estimator. The best estimator is GLS. If heteroskedasticity is present, then the GLS estimator can be calculated using OLS on a transformed model. If a suitable transformation cannot be found, then a heteroskedasticity consistent estimator should be used. There are many tests for heteroskedasticity, including the Breusch-Pagan and the White test.

93 If the errors are autocorrelated, the GLS estimator is OLS on a transformed model. The required transformation involves quasi-differencing each variable. The Cochrane-Orcutt procedure is a popular way of implementing the GLS estimator. There are many tests for autocorrelated errors, including the Breusch-Godfrey test, the Box-Pierce test and the Ljung-Box test.
In many applications, it is implausible to treat the explanatory variables as fixed. Hence, it is important to allow for them to be random variables. If the explanatory variables are random and all of them are uncorrelated with the regression error, then the standard methods associated with OLS still work. If the explanatory variables are random and some of them are correlated with the regression error, then OLS is biased. The instrumental variables estimator is not.


More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Topic 7: Heteroskedasticity

Topic 7: Heteroskedasticity Topic 7: Heteroskedasticity Advanced Econometrics (I Dong Chen School of Economics, Peking University Introduction If the disturbance variance is not constant across observations, the regression is heteroskedastic

More information

ECON2228 Notes 10. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 48

ECON2228 Notes 10. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 48 ECON2228 Notes 10 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 10 2014 2015 1 / 48 Serial correlation and heteroskedasticity in time series regressions Chapter 12:

More information

Linear Models in Econometrics

Linear Models in Econometrics Linear Models in Econometrics Nicky Grant At the most fundamental level econometrics is the development of statistical techniques suited primarily to answering economic questions and testing economic theories.

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

Unit Root and Cointegration

Unit Root and Cointegration Unit Root and Cointegration Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt@illinois.edu Oct 7th, 016 C. Hurtado (UIUC - Economics) Applied Econometrics On the

More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

EC402 - Problem Set 3

EC402 - Problem Set 3 EC402 - Problem Set 3 Konrad Burchardi 11th of February 2009 Introduction Today we will - briefly talk about the Conditional Expectation Function and - lengthily talk about Fixed Effects: How do we calculate

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity

More information

1 Correlation between an independent variable and the error

1 Correlation between an independent variable and the error Chapter 7 outline, Econometrics Instrumental variables and model estimation 1 Correlation between an independent variable and the error Recall that one of the assumptions that we make when proving the

More information

Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis. 17th Class 7/1/10

Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis. 17th Class 7/1/10 Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis 17th Class 7/1/10 The only function of economic forecasting is to make astrology look respectable. --John Kenneth Galbraith show

More information

Linear Regression with Time Series Data

Linear Regression with Time Series Data u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f e c o n o m i c s Econometrics II Linear Regression with Time Series Data Morten Nyboe Tabor u n i v e r s i t y o f c o p e n h a g

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis,

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis, 08 Endogenous Right-Hand-Side Variables Andrius Buteikis, andrius.buteikis@mif.vu.lt http://web.vu.lt/mif/a.buteikis/ Introduction Consider a simple regression model: Y t = α + βx t + u t Under the classical

More information

Graduate Econometrics Lecture 4: Heteroskedasticity

Graduate Econometrics Lecture 4: Heteroskedasticity Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model

More information

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

ECON2228 Notes 10. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 54

ECON2228 Notes 10. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 54 ECON2228 Notes 10 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 10 2014 2015 1 / 54 erial correlation and heteroskedasticity in time series regressions Chapter 12:

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of Wooldridge, Introductory Econometrics, d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of homoskedasticity of the regression error term: that its

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects.

1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1) Identifying serial correlation. Plot Y t versus Y t 1. See

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Friday, June 5, 009 Examination time: 3 hours

More information

Heteroskedasticity and Autocorrelation

Heteroskedasticity and Autocorrelation Lesson 7 Heteroskedasticity and Autocorrelation Pilar González and Susan Orbe Dpt. Applied Economics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 7. Heteroskedasticity

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

1 Regression with Time Series Variables

1 Regression with Time Series Variables 1 Regression with Time Series Variables With time series regression, Y might not only depend on X, but also lags of Y and lags of X Autoregressive Distributed lag (or ADL(p; q)) model has these features:

More information

ECON 4230 Intermediate Econometric Theory Exam

ECON 4230 Intermediate Econometric Theory Exam ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance. Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =

More information

Environmental Econometrics

Environmental Econometrics Environmental Econometrics Jérôme Adda j.adda@ucl.ac.uk Office # 203 EEC. I Syllabus Course Description: This course is an introductory econometrics course. There will be 2 hours of lectures per week and

More information

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Econometrics Homework 4 Solutions

Econometrics Homework 4 Solutions Econometrics Homework 4 Solutions Question 1 (a) General sources of problem: measurement error in regressors, omitted variables that are correlated to the regressors, and simultaneous equation (reverse

More information

Covers Chapter 10-12, some of 16, some of 18 in Wooldridge. Regression Analysis with Time Series Data

Covers Chapter 10-12, some of 16, some of 18 in Wooldridge. Regression Analysis with Time Series Data Covers Chapter 10-12, some of 16, some of 18 in Wooldridge Regression Analysis with Time Series Data Obviously time series data different from cross section in terms of source of variation in x and y temporal

More information

ECONOMETRICS HONOR S EXAM REVIEW SESSION

ECONOMETRICS HONOR S EXAM REVIEW SESSION ECONOMETRICS HONOR S EXAM REVIEW SESSION Eunice Han ehan@fas.harvard.edu March 26 th, 2013 Harvard University Information 2 Exam: April 3 rd 3-6pm @ Emerson 105 Bring a calculator and extra pens. Notes

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C = Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =

More information

8. Instrumental variables regression

8. Instrumental variables regression 8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics GMM, HAC estimators, & Standard Errors for Business Cycle Statistics Wouter J. Den Haan London School of Economics c Wouter J. Den Haan Overview Generic GMM problem Estimation Heteroskedastic and Autocorrelation

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Introductory Econometrics

Introductory Econometrics Introductory Econometrics Violation of basic assumptions Heteroskedasticity Barbara Pertold-Gebicka CERGE-EI 16 November 010 OLS assumptions 1. Disturbances are random variables drawn from a normal distribution.

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

INTRODUCTORY REGRESSION ANALYSIS

INTRODUCTORY REGRESSION ANALYSIS ;»»>? INTRODUCTORY REGRESSION ANALYSIS With Computer Application for Business and Economics Allen Webster Routledge Taylor & Francis Croup NEW YORK AND LONDON TABLE OF CONTENT IN DETAIL INTRODUCTORY REGRESSION

More information

Course information EC2020 Elements of econometrics

Course information EC2020 Elements of econometrics Course information 2015 16 EC2020 Elements of econometrics Econometrics is the application of statistical methods to the quantification and critical assessment of hypothetical economic relationships using

More information

Auto correlation 2. Note: In general we can have AR(p) errors which implies p lagged terms in the error structure, i.e.,

Auto correlation 2. Note: In general we can have AR(p) errors which implies p lagged terms in the error structure, i.e., 1 Motivation Auto correlation 2 Autocorrelation occurs when what happens today has an impact on what happens tomorrow, and perhaps further into the future This is a phenomena mainly found in time-series

More information