Notes on Instrumental Variables Methods

Michele Pellizzari
IGIER-Bocconi, IZA and frdb

These notes are largely based on the textbook by Jeffrey M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT Press. Contact details: Michele Pellizzari, IGIER-Bocconi, via Roentgen 1, Milan (Italy). michele.pellizzari@unibocconi.it

1 The Instrumental Variable Estimator

Instrumental variable estimation is the classical solution to the problem of endogeneity. A variable is considered endogenous in a model when it is correlated with the error term, so that the key OLS identification assumption fails. For example, consider the following multivariate linear model:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_{K-1} x_{K-1} + \beta_K x_K + u \tag{1.1}$$

where the following assumptions hold: $E(u) = 0$; $Cov(x_j, u) = 0$ for all $j = 1, 2, \dots, K-1$; but $Cov(x_K, u) \neq 0$, and we say that $x_K$ is endogenous.

Problems of endogeneity may arise from various sources; the two most common are omitted variables and measurement error. Notice that the problem is particularly worrisome because the endogeneity of one regressor typically prevents consistent estimation of all the other parameters of the model. [1]

[1] You are going to show this in a problem set.

The instrumental variable (IV) approach to solving endogeneity is based on the idea of finding an external variable $z_1$, the instrument, that satisfies the following two important properties:

IV Assumption 1: $Cov(z_1, u) = 0$, which essentially means that the instrument should not be endogenous itself, i.e. it should be exogenous;

IV Assumption 2: $Cov(z_1, x_K \mid x_1, \dots, x_{K-1}) \neq 0$, and possibly this correlation should be large (positive or negative does not matter). Conditioning on the other exogenous variables of the model guarantees that the correlation between the instrument and the endogenous variable is not spuriously driven by other regressors. The importance of this detail will be clear later.

Finding such an instrumental variable is the most difficult part of the entire procedure. There is no predetermined procedure that can be followed to find a good instrument. It's all about being smart and creative. And convincing. In fact, as we will see later, while Assumption 2 can be tested empirically, there is no way to test Assumption 1 (other than having an alternative instrument) and you will just have to find an idea for an instrument that is so smart and convincing that people reading your results will have nothing to criticize!

To give you an example of a good instrument, here is one that is often used in studies of family economics. Suppose you want to study the wage penalty that young mothers pay when they re-enter the labour market after pregnancy. So, suppose you have data on a sample of working women and your main equation has wages on the left-hand side and, on the right-hand side, an indicator for whether the woman had a baby in the previous 6-12 months plus a set of controls. You are obviously (and rightly) worried that the motherhood indicator is endogenous in this equation. Your brilliant idea for the instrument is the gender of previous children in the family. There is ample evidence that the likelihood of having a second baby is higher if the first baby was a girl. This is particularly true in developing countries but it also applies to industrialized ones. Even more robust is the higher likelihood of an additional pregnancy if in the family there are only children of the same sex (if you have two boys you are more likely to go for a third child than if you had a boy and a girl). These types of variables seem fairly exogenous as (generally) one cannot do much to choose the gender of one's children (although selective abortion is an issue in some developing countries; a leading paper on this issue is Oster, E., Hepatitis B and the Case of the Missing Women, Journal of Political Economy, vol. 113(6)).

Once you have found your brilliant idea for an instrument, things become easy. In the original model there are K parameters to be estimated. To do that we can use the following set of K moment conditions:

$$E(x_1 u) = 0, \quad E(x_2 u) = 0, \quad \dots, \quad E(x_{K-1} u) = 0, \quad E(z_1 u) = 0$$

which can be written jointly as:

$$E(z'u) = 0 \tag{1.2}$$

where $z = (x_1, x_2, \dots, x_{K-1}, z_1)$ is the vector that includes all the exogenous variables of the model: all the $x$'s, excluding $x_K$ that is endogenous, and the instrument $z_1$. Using condition 1.2, and writing $x = (x_1, x_2, \dots, x_K)$ for the vector of regressors, it is easy to show that the vector of parameters is in fact identified, provided that $E(z'x)$ is invertible (which is what Assumption 2 is needed for):

$$E(z'u) = E[z'(y - x\beta)] = E(z'y) - E(z'x)\beta = 0 \quad \Rightarrow \quad \beta = [E(z'x)]^{-1} E(z'y) \tag{1.3}$$

Now, we can derive a consistent estimator of $\beta$ by simply applying the analogy principle to equation 1.3:

$$\hat{\beta}_{IV} = \left[ N^{-1} \sum_{i=1}^{N} z_i' x_i \right]^{-1} \left[ N^{-1} \sum_{i=1}^{N} z_i' y_i \right] \tag{1.4}$$

The basic results in asymptotic theory guarantee that $\hat{\beta}_{IV}$ is a consistent and asymptotically normal estimator of $\beta$.
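As a rough numerical illustration of equation 1.4, the sketch below simulates data from a model with one endogenous regressor and compares OLS with the IV estimator. The data-generating process, the coefficient values and the function names (`iv_estimator`, `ols_estimator`) are assumptions made for this example and are not part of the notes; a constant is added to both `X` and `Z` so that the intercept is estimated as well.

```python
import numpy as np


def iv_estimator(Z, X, y):
    """Sample analogue of equation 1.4: solve (Z'X) b = Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


def ols_estimator(X, y):
    """Ordinary least squares: solve (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data-generating process (illustrative values only).
rng = np.random.default_rng(0)
N = 100_000
x1 = rng.normal(size=N)   # exogenous regressor
z1 = rng.normal(size=N)   # instrument: unrelated to u, related to x2
u = rng.normal(size=N)    # structural error

# x2 is endogenous because it depends on u.
x2 = 0.5 * x1 + 1.0 * z1 + 0.8 * u + rng.normal(size=N)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + u

const = np.ones(N)
X = np.column_stack([const, x1, x2])   # regressors of the main model
Z = np.column_stack([const, x1, z1])   # exogenous variables: x2 replaced by z1

beta_ols = ols_estimator(X, y)
beta_iv = iv_estimator(Z, X, y)

print("true:", [1.0, 2.0, -1.5])
print("OLS: ", beta_ols.round(3))
print("IV:  ", beta_iv.round(3))
```

With this design the IV estimates should be close to the true values, while OLS is off for the coefficient of the endogenous regressor and, because the two regressors are correlated, for the coefficient of `x1` as well.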

Obviously, this estimation method extends directly to cases in which the number of endogenous variables is larger than one, in which case we will have to find (at least) one instrument for each of them. For example, if in the previous model $x_{K-1}$ were also endogenous, we would have to find an additional instrument $z_2$ such that $E(z_2 u) = 0$ and $Cov(z_2, x_{K-1}) \neq 0$. Then, we would simply redefine the vector of all exogenous variables as $z = (x_1, x_2, \dots, x_{K-2}, z_2, z_1)$ and proceed as before to compute $\hat{\beta}_{IV}$.

Models of the type discussed in this section, where the number of instruments is exactly equal to the number of endogenous variables, are called just identified. In the following section we consider over-identified models, that is, models where the number of instruments exceeds the number of endogenous variables.

2 Multiple Instruments: The Two-Stages Least Squares Estimator (2SLS)

In some fortunate cases you are smart enough to find more than just one instrument for each (or some) of the endogenous variables. In these cases the model is called over-identified, meaning that there is more than just one way to compute a consistent estimator for the parameters.

Let us keep things simple and consider a model with just one endogenous variable, the same model of the previous section, but several instruments $z_1, \dots, z_M$, all of them satisfying the conditions to be valid instruments:

$$Cov(z_h, u) = 0 \quad \forall h = 1, \dots, M$$
$$Cov(x_K, z_h \mid x_1, \dots, x_{K-1}) \neq 0 \quad \forall h = 1, \dots, M$$

In principle, with all these instruments we could construct up to M different IV estimators. Actually, a lot more. In fact, any linear combination of two or more of the M instruments is also a valid instrument. So the potential set of $\hat{\beta}_{IV}$ that we could construct is very large and the question is which one to choose.

Remember that one of the properties of a good instrument is that it should be strongly correlated with the endogenous variable. [2] Hence, it seems reasonable to choose as instrument the one particular linear combination of all instruments that maximizes the correlation with the endogenous variable. But how do we find such a linear combination? The simplest way to do that is to run an OLS regression of the endogenous variable $x_K$ on all the instruments:

$$x_K = \vartheta_1 z_1 + \dots + \vartheta_M z_M + \delta_1 x_1 + \dots + \delta_{K-1} x_{K-1} + e \tag{2.1}$$

The estimated $\vartheta$'s and $\delta$'s obtained from such a regression will be used as the coefficients of the linear combination:

$$\hat{x}_K = \hat{\vartheta}_1 z_1 + \dots + \hat{\vartheta}_M z_M + \hat{\delta}_1 x_1 + \dots + \hat{\delta}_{K-1} x_{K-1} \tag{2.2}$$

Now we can proceed as if we had only one instrument, $\hat{x}_K$. Notice that in equation 2.1 I have also included all the other exogenous variables in the model. We will discuss the reason for this in the next section. For now, make a little act of faith and accept that this is the right way to proceed.

Let us also further clarify why the OLS coefficients of equation 2.1 are actually the coefficients of the linear combination of the instruments (and the other exogenous variables) that maximizes the correlation with $x_K$. Remember that the OLS method minimizes the sum of squared residuals. In other words, the OLS method looks for the coefficients that make the right-hand side of equation 2.1 the most similar to the left-hand side, which essentially amounts to maximizing the covariance between the two sides (as you make them as similar as possible). [3]

To conclude, this new IV estimator, which is called Two Stages Least Squares (2SLS), can be derived using $\hat{x}_K$ as a single instrument for $x_K$ and applying the same procedure as in section 1. Define $\hat{z}$, the vector of exogenous variables, analogously to $z$ in section 1: $\hat{z} = (x_1, x_2, \dots, x_{K-1}, \hat{x}_K)$. And compute the 2SLS estimator as:

$$\hat{\beta}_{2SLS} = \left[ N^{-1} \sum_{i=1}^{N} \hat{z}_i' x_i \right]^{-1} \left[ N^{-1} \sum_{i=1}^{N} \hat{z}_i' y_i \right] \tag{2.3}$$

Again, thanks to the basic results of asymptotic theory, we automatically know that $\hat{\beta}_{2SLS}$ is consistent and asymptotically normal. [4]

But why did we call this estimator two-stages least squares? That's because it can be obtained by a simple two-step procedure:

1. First stage. Regress each endogenous variable on all exogenous ones, i.e. all instruments and all exogenous regressors, and obtain predicted values;
2. Second stage. In the main model, replace the endogenous variables with their predictions from the first stage regressions and run OLS.

The resulting OLS estimator from the second stage regression is in fact $\hat{\beta}_{2SLS}$. [5]

Finally, notice that when the model is just-identified, the simple IV and the two-stages procedure lead exactly to the same estimator. To see this equality, notice that the OLS estimator from the second stage regression can be written in full matrix notation as:

$$\hat{\beta}_{2SLS} = (\hat{Z}'\hat{Z})^{-1} (\hat{Z}'Y) \tag{2.4}$$

Also, remember that $\hat{Z}$ is the matrix of predictions from the first stage regression and can thus be expressed as: [6]

$$\hat{Z} = Z(Z'Z)^{-1} Z'X \tag{2.5}$$

If we now substitute this expression into the full-matrix notation of $\hat{\beta}_{2SLS}$, we obtain exactly the estimator of equation 2.3 in full-matrix notation:

$$\hat{\beta}_{2SLS} = (\hat{Z}'\hat{Z})^{-1} (\hat{Z}'Y) = \Big( \underbrace{X'Z(Z'Z)^{-1}Z'}_{\hat{Z}'} \, \underbrace{Z(Z'Z)^{-1}Z'X}_{\hat{Z}} \Big)^{-1} (\hat{Z}'Y) = \Big( \underbrace{X'Z(Z'Z)^{-1}Z'}_{\hat{Z}'} X \Big)^{-1} (\hat{Z}'Y) = (\hat{Z}'X)^{-1} (\hat{Z}'Y)$$

When the model is just-identified, $Z'X$ is a square invertible matrix, so the last expression simplifies further to $(Z'X)^{-1}(Z'Y)$, which is the simple IV estimator of section 1.

[2] We will clarify the reason why such a condition is important in the next section, when we discuss the issue of weak instruments.

[3] Another, simpler way of saying this is that $\hat{x}_K$ is the linear projection of $x_K$ on all the exogenous variables, i.e. all instruments as well as all the exogenous regressors.

[4] Finding its exact asymptotic variance-covariance matrix is a bit more complex than usual because we should take into account the fact that in computing this estimator we are using one variable that is itself an estimate. All standard statistical packages compute 2SLS with correct standard errors and we skip this derivation. However, you should keep in mind that, whenever in an estimation procedure you use a variable that is itself an estimate, you should worry about the computation of the standard errors.

[5] The standard errors of this second stage regression, however, will have to be adjusted to account for the fact that one (or more) of the regressors are estimates.

[6] The prediction from the first stage is simply $Z$ times the estimated set of coefficients, whose expression is in fact $(Z'Z)^{-1}Z'X$. The matrix notation is useful because it automatically takes into account that only one of the elements in $\hat{Z}$ is actually estimated while all the others simply repeat themselves.
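The two-step procedure and the matrix expressions 2.3 to 2.5 can be checked numerically. The sketch below is only an illustration under assumed values: the simulated model, the two instruments `z1` and `z2` and all variable names are hypothetical. It verifies that the second-stage OLS coincides with equation 2.3 and that, with a single instrument, 2SLS coincides with the simple IV estimator.

```python
import numpy as np

# Hypothetical data: one endogenous regressor xK, two instruments z1 and z2.
rng = np.random.default_rng(1)
N = 5_000
x1 = rng.normal(size=N)
z1 = rng.normal(size=N)
z2 = rng.normal(size=N)
u = rng.normal(size=N)
xK = 0.5 * x1 + 0.7 * z1 + 0.4 * z2 + 0.8 * u + rng.normal(size=N)
y = 1.0 + 2.0 * x1 - 1.5 * xK + u

const = np.ones(N)
X = np.column_stack([const, x1, xK])       # regressors of the main model
Z = np.column_stack([const, x1, z1, z2])   # all exogenous variables

# First stage (equation 2.5): regress every column of X on Z.
# The exogenous columns are reproduced exactly; only xK is really predicted.
Z_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Second stage (equation 2.4): OLS of y on the first-stage predictions.
beta_two_step = np.linalg.lstsq(Z_hat, y, rcond=None)[0]

# Equation 2.3 in matrix form: (Z_hat'X)^{-1} Z_hat'y.
beta_2sls = np.linalg.solve(Z_hat.T @ X, Z_hat.T @ y)

# Just-identified case: keep only z1 and compare 2SLS with simple IV.
Z_just = np.column_stack([const, x1, z1])
Z_hat_just = Z_just @ np.linalg.lstsq(Z_just, X, rcond=None)[0]
beta_2sls_just = np.linalg.solve(Z_hat_just.T @ X, Z_hat_just.T @ y)
beta_iv_just = np.linalg.solve(Z_just.T @ X, Z_just.T @ y)

print(np.allclose(beta_two_step, beta_2sls))
print(np.allclose(beta_2sls_just, beta_iv_just))
```

Note that the sketch only reproduces the point estimates: the standard errors printed by a plain second-stage OLS would still need the adjustment mentioned in the footnotes.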

3 Additional (but important!) Notes on Instrumental Variables

3.1 Why do we put all the exogenous variables in the first stage regression?

To clarify this point, let us consider a very simple example of a model with just two regressors, one of which is endogenous:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \tag{3.1}$$

with $E(u) = 0$, $Cov(x_1, u) = 0$ but $Cov(x_2, u) \neq 0$. Also suppose that we have a valid instrument for $x_2$, a variable $z_1$ such that $Cov(z_1, u) = 0$ and $Cov(z_1, x_2) \neq 0$.

Now, consider what happens if we omit $x_1$ from the first-stage regression, i.e. if we run the first-stage regression only on the instrument. We still want to allow for the possibility that $x_1$ enters the specification of $x_2$, so let us write the first stage regression as follows:

$$x_2 = \vartheta_0 + \vartheta_1 z_1 + (\delta_1 x_1 + e) = \vartheta_0 + \vartheta_1 z_1 + v \tag{3.2}$$

where $v$ is a composite error term equal to $\delta_1 x_1 + e$. If the two regressors $x_1$ and $x_2$ are unrelated to each other then $\delta_1$ would be equal to zero.

So, if we run the first stage regression without $x_1$, the prediction that we obtain is $\tilde{x}_2 = \tilde{\vartheta}_0 + \tilde{\vartheta}_1 z_1$. [7] The residual of this regression is $\tilde{v} = x_2 - \tilde{x}_2$ and, by the analogy principle, it converges in probability to the composite error term $v = \delta_1 x_1 + e$. So, when we replace $x_2$ in equation 3.1 with $\tilde{x}_2$ we obtain the following:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 \tilde{x}_2 + (\beta_2 \tilde{v} + u) \tag{3.3}$$

which shows that, unless $\beta_2$ or $\delta_1$ is equal to zero, $x_1$ will be correlated with the error term of the second-stage regression and will, thus, be endogenous and impede identification of all the parameters in the model. In fact, the error term of the second stage regression is $\beta_2 \tilde{v} + u$ and it is asymptotically equal to $\beta_2 (\delta_1 x_1 + e) + u$, which is by construction correlated with $x_1$.

Notice that the instances which make the omission of $x_1$ in the first-stage regression irrelevant are rather peculiar. If $\delta_1$ is equal to zero, that means that $x_1$ and $x_2$ are uncorrelated with each other (conditional on the instrument), which makes the inclusion of $x_1$ in the main model also irrelevant for the consistent estimation of $\beta_2$. If, instead, $\beta_2$ is equal to zero then the true model simply does not include $x_2$, which eliminates any problem with endogeneity from the very start.

[7] We use $\tilde{\cdot}$ instead of $\hat{\cdot}$ to differentiate this analysis from the one developed in section 2.

3.2 Weak instruments: why should the instrument be highly correlated with the endogenous variable?

Consider a simple model with just one regressor, which is endogenous:

$$y = \beta_0 + \beta_1 x_1 + u \tag{3.4}$$

and suppose there is one valid instrument $z$ available to construct an IV estimator. We know that this estimator converges in probability to the following expression:

$$\hat{\beta}_{IV} \xrightarrow{p} \frac{Cov(z, y)}{Cov(z, x_1)} = \frac{Cov[z, (\beta_0 + \beta_1 x_1 + u)]}{Cov(z, x_1)} = \beta_1 + \frac{Cov(z, u)}{Cov(z, x_1)}$$

If the instrument is valid, then asymptotically $Cov(z, u) = 0$ and the probability limit of $\hat{\beta}_{IV}$ is simply $\beta_1$. However, this result is correct only as $N \to \infty$, while in small samples the sample counterpart of $Cov(z, u)$ will never be exactly equal to zero due to sampling variation. [8] If the instrument is valid, then, $\hat{\beta}_{IV}$ is certainly consistent but might be subject to a small sample bias (just like all estimators). Notice, however, that if the instrument is weak, that is only weakly correlated with the endogenous variable, then $Cov(z, x_1)$ is small and the small sample bias might in fact become very large even if $Cov(z, u)$ is also small. [9]

[8] Notice that when we talk about small samples we do not necessarily mean samples of small size. We simply intend to refer to non-asymptotic properties. In this terminology, a small sample is any sample with $N$ smaller than $\infty$.

[9] How strong the correlation between the instrument and the endogenous variable should be is a very subtle issue. In principle, the instrument should capture all the variation in the endogenous variable that is not correlated with the error term, leaving out the part that induces endogeneity. For this reason we do not want such correlation to be too high. At the same time, however, the instrument should be strong enough to avoid weak-instrument bias in small samples. There is no technical solution to this trade-off and you will have to evaluate the goodness of your instrument in the light of the specific setting, case by case.
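A small Monte Carlo experiment can make the weak-instrument argument concrete. The sketch below is a hypothetical design, not taken from the notes: it repeatedly draws samples of 200 observations from model 3.4 with a valid instrument, computes the sample analogue of $Cov(z, y)/Cov(z, x_1)$, and compares a strong first stage (coefficient 1.0) with a weak one (coefficient 0.05). Medians are used to summarize the results because, with a weak instrument, a few replications produce extremely large estimates.

```python
import numpy as np


def simulate_iv(first_stage_coef, n_obs=200, n_reps=2000, seed=0):
    """Return n_reps IV estimates of beta_1 (true value 2.0) from samples of size n_obs."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_reps, n_obs))   # valid instrument: independent of u
    u = rng.normal(size=(n_reps, n_obs))   # structural error
    # x is endogenous (it depends on u); first_stage_coef sets instrument strength.
    x = first_stage_coef * z + 0.8 * u + rng.normal(size=(n_reps, n_obs))
    y = 1.0 + 2.0 * x + u
    # Sample Cov(z, y) / Cov(z, x) in each replication; after demeaning z,
    # sum(z_c * y) equals the sum of cross-products of deviations from the means.
    z_c = z - z.mean(axis=1, keepdims=True)
    return (z_c * y).sum(axis=1) / (z_c * x).sum(axis=1)


true_beta = 2.0
est_strong = simulate_iv(first_stage_coef=1.0)
est_weak = simulate_iv(first_stage_coef=0.05)

# Median absolute error of the estimates around the true coefficient.
mae_strong = np.median(np.abs(est_strong - true_beta))
mae_weak = np.median(np.abs(est_weak - true_beta))

print("median estimate, strong instrument:", np.median(est_strong).round(3))
print("median estimate, weak instrument:  ", np.median(est_weak).round(3))
print("median abs. error (strong, weak):  ", mae_strong.round(3), mae_weak.round(3))
```

In both designs the instrument is perfectly valid, so any difference between the two sets of estimates comes only from the strength of the first stage.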

The weak-instrument problem is particularly worrisome since several studies have shown that weak instruments can induce potentially very large biases even with relatively big samples (up to 100,000 observations). [10] As a rule of thumb, an instrument is considered weak if the t statistic in the first-stage regression falls below a conventional threshold. If you have more than one instrument you should look at the F statistic for the test of joint significance of all instruments (excluding the other exogenous regressors).

[10] See Staiger, D. and J.H. Stock, Instrumental Variables Regression with Weak Instruments, Econometrica, vol. 65.

3.3 Testing endogeneity: the Hausman test

Is it possible to test whether a regressor is endogenous? The answer to this question is yes; however, such a test can only be performed once an instrument for the potentially endogenous regressor is found. And we know that finding instruments is the most complicated part of the entire process, so in some sense the test for endogeneity comes a bit too late. In principle we would like to know whether a regressor is endogenous or not before wasting our precious time looking for a nice idea for an instrument. It is only when we have found one that the endogeneity test can effectively be performed. [11]

The test, called the Hausman test after Jerry Hausman, who first proposed it, is based on the following idea. Suppose you have a model where you suspect one (or more) regressors to be endogenous and you have found the necessary valid instruments to construct an IV estimator. Now, if the regressor(s) are really endogenous, the OLS estimator will be biased while the IV estimator will be consistent:

$$H_1: E(u \mid x) \neq 0 \ \text{(endogeneity)}: \qquad \hat{\beta}_{OLS} \xrightarrow{p} \beta + \text{bias}, \qquad \hat{\beta}_{IV} \xrightarrow{p} \beta, \qquad \hat{\beta}_{IV} - \hat{\beta}_{OLS} \xrightarrow{p} -\text{bias} \neq 0$$

Under the null hypothesis that the model is not affected by endogeneity, both estimators are consistent:

$$H_0: E(u \mid x) = 0 \ \text{(no endogeneity)}: \qquad \hat{\beta}_{OLS} \xrightarrow{p} \beta, \qquad \hat{\beta}_{IV} \xrightarrow{p} \beta, \qquad \hat{\beta}_{IV} - \hat{\beta}_{OLS} \xrightarrow{p} 0$$

The idea, then, is to test hypothesis $H_0$ by testing that the difference between the OLS and the IV estimator is asymptotically equal to zero. To this end, we construct the quadratic form of such difference:

$$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[ Var(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \overset{a}{\sim} \chi^2_K \tag{3.5}$$

and, since we know that both $\hat{\beta}_{IV}$ and $\hat{\beta}_{OLS}$ are asymptotically normal, the quadratic form will be asymptotically distributed according to a $\chi^2$ distribution with K degrees of freedom. [12]

The computation of the Hausman test, however, poses one little problem. If you look at equation 3.5 you notice that, having produced $\hat{\beta}_{IV}$ and $\hat{\beta}_{OLS}$, we can directly compute the difference but we have no clue about how to calculate the variance of the difference. All we obtain from the two estimations are the variance-covariance matrices of each estimator, i.e. $AVar(\hat{\beta}_{IV})$ and $AVar(\hat{\beta}_{OLS})$, but we know nothing about their covariance. So how do we compute the variance of the difference?

Fortunately, a simple theorem tells us how to do this. You find the proof of the theorem in the appendix, while here we only sketch the intuition, which goes as follows: under $H_0$ we know that $\hat{\beta}_{OLS}$ is not only consistent but also efficient, i.e. it is the linear estimator with the smallest possible variance. Using this fact, it is possible to show that:

$$Var(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) = Var(\hat{\beta}_{IV}) - Var(\hat{\beta}_{OLS}) \tag{3.6}$$

With this formula we can directly compute the Hausman test since both $Var(\hat{\beta}_{IV})$ and $Var(\hat{\beta}_{OLS})$ are already known from the estimation of $\hat{\beta}_{IV}$ and $\hat{\beta}_{OLS}$.

[11] As you can guess from this short preamble, I am not a great fan of the Hausman test.

[12] Remember that K is the number of parameters of the model, i.e. the dimensionality of the vector $\beta$.
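The following sketch illustrates how equations 3.5 and 3.6 can be put to work, under several simplifying assumptions that are not in the notes. The errors are taken to be homoskedastic, and both variance matrices use the same error-variance estimate from the OLS residuals. The statistic is computed only for the coefficient of the single suspected regressor, so it is compared with a $\chi^2$ with 1 degree of freedom rather than K; this keeps the variance difference a positive scalar, whereas the full matrix difference may not be invertible when some regressors are exogenous. The function name `hausman_scalar` and the simulated data are hypothetical.

```python
import numpy as np


def hausman_scalar(y, X, Z, endog_col):
    """Hausman statistic for one coefficient, using Var(IV) - Var(OLS) as in equation 3.6."""
    n, k = X.shape
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    # First-stage predictions and 2SLS/IV estimate.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    # Common error variance from OLS residuals (assumption: consistent under H0).
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - k)
    var_ols = sigma2 * np.linalg.inv(X.T @ X)
    var_iv = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    diff = beta_iv[endog_col] - beta_ols[endog_col]
    var_diff = var_iv[endog_col, endog_col] - var_ols[endog_col, endog_col]
    return diff**2 / var_diff


def make_data(endogeneity, n=5_000, seed=0):
    """Simulate y, X, Z; `endogeneity` is the loading of u in the suspected regressor."""
    rng = np.random.default_rng(seed)
    x1, z1, u = rng.normal(size=(3, n))
    x2 = 0.5 * x1 + z1 + endogeneity * u + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + u
    const = np.ones(n)
    return y, np.column_stack([const, x1, x2]), np.column_stack([const, x1, z1])


CHI2_1_CRITICAL_5PCT = 3.841  # 5% critical value of a chi-squared with 1 d.o.f.

h_endog = hausman_scalar(*make_data(endogeneity=0.8), endog_col=2)
h_exog = hausman_scalar(*make_data(endogeneity=0.0), endog_col=2)

print("H with endogenous regressor:", round(h_endog, 2))
print("H with exogenous regressor: ", round(h_exog, 2))
print("reject H0 (endogenous case):", h_endog > CHI2_1_CRITICAL_5PCT)
```

In the endogenous design the statistic should be far above the critical value, while in the exogenous design it behaves like a draw from a $\chi^2_1$ and will exceed the critical value only occasionally.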

3.4 Over-identification test

When the model is over-identified, i.e. when we have more instruments than endogenous variables, we may not want to use all of them. In fact, there is a trade-off (that we are not going to analyse in detail) between the power of the first stage regression and the efficiency of the IV estimator: the more instruments we use the more powerful the first-stage regression will be, in the sense that it will explain a larger and larger fraction of the variance of the endogenous variable, but also the more instruments we use the larger the variance of the estimator, i.e. the less efficient it will be.

To give you an extreme example, imagine having just one endogenous variable and two instruments. From what you have learned so far, the best thing to do in such a case is simply to use both instruments in a 2SLS procedure. However, suppose that the two instruments are almost perfectly collinear (if they were perfectly collinear, you would not be able to run the first-stage regression) so that, conditional on one of them, there is very little additional information to be exploited from the other. In such a case, you would expect the estimators produced using either one or the other of the two instruments to be asymptotically identical (and probably very similar also in small samples). However, it is easy to guess that the one produced using both instruments will be the least efficient: the use of two instruments reduces the available degrees of freedom without adding much information.

So, how do we choose which instrument(s) to keep in case of over-identification? The common practice is to keep those that appear to be most significant in the first stage regression. However, one could also construct a formal test to compare the estimators produced with two different subsets of instruments. If the test shows that the two estimates are asymptotically identical, then there is no need to use all instruments jointly. We are not going to see over-identification tests in detail.