Estimating and Testing Non-Linear Models Using Instrumental Variables

Size: px

Start display at page:

Download "Estimating and Testing Non-Linear Models Using Instrumental Variables"

Lee Barton
6 years ago
Views:

1 Estimating and Testing Non-Linear Models Using Instrumental Variables Lance Lochner University of Western Ontario NBER Enrico Moretti University of California Berkely NBER, CEPR and IZA November, 2012 Abstract In many empirical studies, researchers seek to estimate causal relationships using instrumental variables. When few valid instruments are available, researchers typically estimate models that impose a linear relationship between the dependent variable and the endogenous variable, even when the true model is likely to be non-linear. In the presence of non-linearity, ordinary least squares (OLS) and instrumental variable (IV) estimators identify different weighted averages of the underlying marginal causal effects, so the traditional Hausman test (applied to mis-specified linear models) is uninformative about endogeneity. We build on this insight to develop a new exogeneity test that is robust to non-linearity in the endogenous regressor. This test compares the IV estimator applied to the mis-specified linear model with an appropriately weighted average of all marginal effects estimated from the correctly specified non-linear model using OLS. Notably, our test works well even when only a single valid instrument is available, and the true model cannot be estimated using IV methods. We re-visit three recent empirical examples to show how the test can be used to shed new light on the role of non-linearity. We thank Josh Angrist, David Card, Pedro Carneiro, Jim Heckman, Guido Imbens, the editor, two anonymous referees and seminar participants at the 2008 UM/MSU/UWO Summer Labor Conference, UCSD, and Stanford for insightful suggestions. We are also grateful to Matias Cattaneo and Javier Cano Urbina, who provided excellent research assistance and insightful substantive comments, as well as Martijn van Hasselt and Youngki Shin for their many comments and suggestions.

2 1 Introduction Many recent empirical papers seek to estimate causal relationships using instrumental variables (IV), including two-stage least squares (2SLS) estimators, when concerns about causality arise. In many cases, only a single valid instrument is available, so researchers are often limited to estimating a linear relationship between the dependent variable and the potentially endogenous regressor. A model frequently estimated in practice has the following form: y i = s i β L + x iγ L + ν i, (1) where y i is the outcome of individual i; s i is the potentially endogenous regressor; and x i is a k 1 vector of exogenous covariates (including an intercept). In this paper, we consider the case of a discrete endogenous regressor s i. For example, s i might represent years of schooling: s i {0, 1, 2, 3,..., S}. Conclusions about exogeneity of s i and consistency of the ordinary least squares (OLS) estimator are typically based on a comparison of OLS and IV estimates of β L. When a standard Hausman test (Hausman 1978) indicates a significant difference between OLS and IV estimates, it is common to conclude that endogeneity of s i plays an important confounding role in OLS. Yet, in many economics applications, the true relationship between y i and s i is unlikely to be linear. In particular, suppose that the true model has the more general form: y i = D ij β j + x iγ + ε i, (2) j=1 where D ij = 1[s i j] reflects a dummy variable equal to one if total years of schooling are at least j, E(ε i ) = 0, and E(ε i x i ) = 0. Here, the β j represent grade-specific effects of moving from j 1 to j years of schooling. The key difference between the models in equations (1) and (2) is that the former assumes a constant effect of s i across all levels of s i while the latter does not. For example, in the classic case of the return to education, the model in (1) assumes that the effect of an extra year of elementary school is identical to the effect of the last years of high school and college, while the model in (2) allows for sheepskin effects and other non-linearities which are likely to arise in practice. While non-linearities are likely to be important in many applications, relatively few studies have focused on their practical implications when instrumental variables may be needed. 1 The 1 Angrist, Graddy, and Imbens (2000), Lochner and Moretti (2001), and Mogstad and Wiswall (2010) are notable 1

3 difficulty in estimating a specification like that of equation (2) when endogenity concerns arise is that there may be many β j parameters to estimate, while researchers typically have very few valid instruments. As a consequence, empirical studies commonly estimate models like equation (1) even when there is no theoretical reason to do so and in some cases there is prima-facie evidence of important non-linearities. In this paper, we demonstrate that when the true relationship between y i and s i is non-linear as in equation (2) but the estimated model is linear as in equation (1), OLS and IV methods estimate different weighted averages of all grade-specific effects, which implies that the standard Hausman test is uninformative about endogeneity. Based on this insight, we develop a new endogeneity specification test that is robust to a general non-linear relationship between the dependent variable and endogenous regressor. Importantly, this test only requires a single (even binary) instrument. We are not the first to point out that estimates from a mis-specified linear model yield weighted averages of each grade-specific effect. Yitzhaki (1996) derives these weights in the context of OLS, while Angrist and Imbens (1995) and Heckman, Urzua, and Vytlacil (2006) derive weights for IV estimators in the presence of both non-linearity and parameter heterogeneity. Angrist and Imbens (1995) show conditions under which 2SLS estimates a local average treatment effect (LATE). 2 In a very general setting, Heckman, Urzua, and Vytlacil (2006) discuss ordered and unordered choice models with unobserved heterogeneity and nonlinearity, developing weights for treatment effects using general instruments. Heckman and Vytlacil (2005) emphasize that in the presence of parameter heterogeneity, there is no single effect of the regressor on an outcome, and different estimation strategies provide estimates of different parameters of interest or different average effects. While most of the literature focuses on parameter heterogeneity across individuals with a constant marginal effect (i.e. y i is linear in s i ), we consider the opposite case, assuming a non-linear relationship between y i and s i that is the same for all individuals. 3 Our setting is a special case of that used by Heckman, Urzua, and Vytlacil (2006); however, our emphasis on non-linearities and the endogeneity test are novel. exceptions. 2 Intuitively, the LATE reflects the effect of a regressor on outcomes for individuals induced to change their behavior in response to a change in the value of the instrument. 3 Studies focused on parameter heterogeneity include Imbens and Angrist (1994), Wooldridge (1997), Heckman and Vytlacil (1998, 1999, 2005), Card (1999), Kling (2000), Moffitt (2009), and Carneiro, Heckman and Vytlacil (2010). 2

4 We begin by showing that inappropriately assuming linearity will generally yield different OLS and IV/2SLS estimates even in the absence of endogeneity, since these estimators can be written as weighted averages of causal responses to each marginal change in the regressor, where the sets of weights differ for the estimators. 4 An appealing feature of our setting is that the weights have an intuitive interpretation, are functions of observable quantities, and can be easily estimated under very general assumptions. Therefore, it is possible to directly compare the OLS and IV weights. This insight leads to our main contribution: a new specification test for endogeneity in the presence of non-linearities. Because OLS and IV/2SLS applied to mis-specified linear models identify different weighted averages of marginal effects, the traditional Hausman test is uninformative about endogeneity of the regressor. It may reject equality of OLS and IV/2SLS estimates even when the regressor is exogenous, and it may fail to reject equality when the regressor is endogenous. order to implement the standard Hausman test for equation (2), one would need to estimate all β j parameters using IV methods. This is often impossible in practice due to the large number of instruments required in most applications. For example, when s reflects years of schooling, this would require 20 valid instruments for each potential grade level. The test that we propose can be thought of as a generalization of the standard Hausman test and is informative about the consistency of OLS estimates for all β j effects in equation (2). Our test re-weights OLS estimates of the β j s from equation (2) using estimated IV/2SLS weights and compares this with the corresponding IV/2SLS estimator of β L in equation (1). In Under fairly general conditions, our test can be implemented even when only a single valid (binary) instrument is available. 5 It is important to note that we test whether the weighted average of all OLS ˆβ j asymptotic biases equals zero. Therefore, our test has no power against the possibility that some OLS ˆβ j 4 Relative to the existing literature, our models are closer to those typically estimated in practice. Angrist and Imbens (1995) only consider discrete regressors that are indicators that place observations into mutually exclusive categories, and they interact their instrument (also assumed to be discrete) with each of these regressors to create a large set of effective instruments. The Heckman, Urzua, and Vytlacil (2006) discussion of instrumental variables estimation in ordered choice models is left implicit on all covariates affecting the outcome variable. 5 Lochner and Moretti (2001) and Mogstad and Wiswall (2010) suggest that comparing re-weighted OLS estimates with IV/2SLS estimates may be a useful heuristic approach for assessing the importance of non-linearities. In this paper, we develop a formal econometric test for exogeneity based on this insight. Note that our test differs conceptually and practically from the omnibus specification tests developed by White (1981), which essentially compare different weighted generalized least squares estimators for a general nonlinear function. 3

5 estimates are asymptotically biased upwards and others downwards in such a way as to exactly cancel each other when averaged using the IV/2SLS weights. Still, rejection of the null implies that OLS estimates are inconsistent. Furthermore, we discuss conditions under which all ˆβ j asymptotic biases would be of the same sign, in which case our test is equivalent to testing whether all OLS ˆβ j estimates are consistent. The fact that our test requires only a single instrument should make it attractive to empirical researchers. In many contexts, researchers can easily use OLS to estimate models with general non-linear relationships between their outcome variable and a potentially endogenous regressor (e.g. regressing log wages on a set of 20 schooling dummies), yet they often have very few valid instruments at their disposal. In these cases, the models that can be estimated using IV/2SLS are often very restricted as in equation (1). Still, the researcher can use our test to establish whether the OLS estimates are consistent without having to estimate the general non-linear model using IV/2SLS. If our test fails to reject exogeneity, researchers can have some confidence in their OLS estimates. However, if our test rejects, it does not help in estimating the true model. Our test, therefore, offers only a partial solution to the problem of non-linear models with few instruments. We stress that even if exogeneity cannot be rejected, researchers should exercise caution when conducting inference using OLS estimates of equation (2) when the instruments are not sufficiently strong. Like the Hausman test, our test does not have much power when instruments are weak. As Wong (1997) and Guggenberger (2010) demonstrate, this can cause size problems with inference in a two-stage approach where the Hausman test is used to determine exogeneity in a first stage, and OLS estimates are used in a second stage when exogeneity cannot be rejected. 6 Monte Carlo simulations confirm that similar inference problems can arise when using our test with insufficiently strong instruments. In the last part of the paper, we demonstrate the practical usefulness of our test by re-examining three recent empirical papers in which estimated 2SLS effects differ from OLS effects. In one example, our test suggests that schooling is exogenous for incarceration among white men. This is empirically useful, since it lends credibility to OLS estimates that suggest a highly non-linear 6 Specifically, Wong (1997) and Guggenberger (2010) provide simulation evidence (in a linear regression model like equation 1) for the null rejection probability of a simple hypothesis test conditional on a standard Hausman pretest for exogeneity not rejecting. Their findings indicate that when regressor endogeneity is small, the null rejection probability of the hypothesis test may be substantially higher than the nominal size if the instruments are not sufficiently strong. 4

6 relationship between educational attainment and the probability of imprisonment. In contrast, our test strongly rejects exogeneity of schooling for incarceration among black men, while the standard Hausman test does not. In this case, the endogeneity of schooling is obscured when non-linearities are ignored. Our other examples produce greater concordance between the standard Hausman test and our exogeneity test; however, for different reasons. The rest of the paper is organized as follows. In Section 2, we show conditions under which OLS, IV and 2SLS estimates of β L in equation (1) can be written as weighted averages of the true underlying β j parameters in the non-linear model given by equation (2). Section 3 develops an exogeneity test that can be used to determine consistency of the OLS estimator for equation (2). Section 4 presents the results from three previous empirical examples, and Section 5 concludes. 2 Estimating Non-Linear Relations Under Linearity Assumptions In this section, we consider IV/2SLS and OLS estimators when equation (1) is estimated, but the true model is described by equation (2). We show conditions under which these estimators converge to a weighted average of each grade-specific β j effect and discuss the weights. 2.1 IV Estimation with a Single Instrument We first consider IV estimation with a single instrument, discussing OLS as a special case. We focus on the case where the potentially endogenous variable s i is discrete. 7 Throughout the paper, we assume ε i is independent across individuals with E(ε i ) = 0, x i is distributed with density F (x), and E(ε i x i ) = 0. It is useful to decompose schooling in the population as s i = x i δ s + η i, where δ s = [E(x i x i )] 1 E(x i s i ) by construction and E(x i η i ) = 0. The following IV assumption is standard. Assumption 1. The instrument is uncorrelated with the error in the outcome equation, E(ε i z i ) = 0, and correlated with schooling after linearly controlling for x i, E(η i z i ) 0. 7 While we study the case of a discrete endogenous regressor, OLS and IV estimators will also yield different weighted averages of marginal effects when the regressor is continuous.when the endogenous regressor is continuous, one could always discretize the problem and directly apply our methodology. Alternatively, the insights of Yithzaki (1996) might be used to develop weights and a related test specifically designed for the continuous regressor case. 5

7 Let M x = I x(x x) 1 x and s = M x s for any variable s. (We drop the i subscripts when we refer to the vector or matrix version of a variable that vertically stacks all individual-specific values.) With a single instrument, 2SLS estimation of the linear model (equation 1) is equivalent to the following IV estimator: ( where Wj IV = ( z s) 1 z 1 D j = N ˆβ IV L = (z M x s) 1 z M x y = ( z s) 1 z D j β j + ( z s) 1 z ε = j=1 j=1 Wj IV β j + ( z s) 1 z ε ) ( N / z i D ij i=1 N 1 N i=1 z i s i ). Since S j=1 D ij = s i, these W IV j sum to one over j = 1,..., S. We refer to them as weights even though they may be negative for some j. 8 One helpful assumption is monotonicity in the effects of the instrument on schooling. Although monotonicity is not necessary for deriving and estimating weights, it does help ensure that they are non-negative and simplifies their interpretation. Monotonicity implies that the instrument either causes everyone to weakly increase or causes everyone to weakly decrease their schooling. Without loss of generality, we assume that s i is weakly increasing in z i. Define s i (z) to be the value of s i for individual i when z i = z. Assumption 2. (Monotonicity) The instrument does not decrease schooling: P r[s i (z) < s i (z )] = 0 for all z > z. To facilitate our analysis of ˆβ L IV, it is useful to decompose z i = x i δ z+ζ i where δ z = [E(x i x i )] 1 E(x i z i ) and E(x i ζ i ) = 0. Proposition 1. If Assumption 1 holds, then ˆβ L IV p S j=1 ωj IV β j, where ωj IV = P r(s i j)e(ζ i s i j) [P r(s i k)e(ζ i s i k)] k=1 (3) 8 When they cannot be shown to be non-negative, we use weights with quotation marks to distinguish them from cases when they are known to be proper weights that are both non-negative and sum to one. 6

8 sum to unity over all j = 1,..., S. Furthermore, if E(z i x i ) = x i δ z and Assumption 2 (Monotonicity) holds, then the weights are non-negative and can be written as ωj IV = E{Cov(z i, D ij x i )} E{Cov(z i, D ik x i )} k=1 0. (4) Proof: See Online Appendix A. This result shows that estimating the mis-specified linear model using IV yields a consistent estimate of a weighted average of all grade-specific marginal effects. The weights on all gradespecific effects are straightforward to estimate. From a 2SLS regression of D ij on s i and x i using z i as an instrument for s i, the coefficient estimate on s i equals W IV j. When the instrument affects all persons in the same direction and its expectation conditional on x i is linear (e.g. x s are mutually exclusive and exhaustive categorical indicator variables), the weights are non-negative and depend on the strength of the covariance between the instrument and each schooling transition indicator conditional on other covariates. In general, different instruments yield estimates of different weighted averages, even if the instruments are all valid. With Assumption 1, E(z i x i ) = x i δ z, and E(ε i x i ) = 0, it is straightforward to show that the IV estimator converges to a weighted average of all conditional (on x) IV estimators, where the weights are proportional to the covariance between the instrument and schooling conditional on x: ˆβ L IV p β IV (x)h(x)df (x), where β IV (x) = Cov(z i,y i x) Cov(z i,s i x) is the population analogue of the IV estimator conditional on x i = x and h(x) = Cov(z i,s i x) Cov(zi,s i a)df (a) h(x) 0 under Assumption 2). Notice that β IV (x) = S is a weighting function that integrates to one for different x (with j=1 β j ωj IV (x), where ωj IV (x) = Cov(z i,d ij x i ) Cov(z i,s i x) are x-specific IV weights for each grade-specific effect, β j. So, each x-specific IV estimator is simply a weighted average of the grade-specific β j effects, where the weights are proportional to the covariance between the instrument and D ij conditional on x. Some re-arranging shows that we can write the IV weights from equations (3) or (4) as ω IV j = ω IV j (x)h(x)df (x). 9 These results complement the IV/2SLS analyses of Angrist and Imbens (1995) and Heckman, Urzua, and Vytlacil (2006), who also consider parameter heterogeneity along with non-linearity. In 9 In Online Appendix A, we further show that with a binary instrument, the ωj IV (x) weights can be more easily interpreted along the lines of the LATE analysis of Angrist and Imbens (1995). 7

9 order to ease interpretation in the presence of parameter heterogeneity, Angrist and Imbens (1995) make strong assumptions about the additional x i covariates and how they enter in estimation. Specifically, they assume that the x i regressors are indicator variables that place individuals into mutually exclusive categories and that the instrumental variable (also assumed to be discrete) is interacted with all of these additional covariates. By contrast, Heckman, Urzua, and Vytlacil (2006) consider a very general setting for ordered and unordered choice models; however, their discussion of IV estimation for these models implicitly conditions on all covariates x i (deriving IV weights analogous to ωj IV (x) in our setting). Results in this section could, therefore, be derived as a special case of their analysis. While our analysis ignores heterogeneity in the grade-specific effects, it considers estimation under common assumptions about covariates and the way they typically enter during estimation. We are not focused on finding an economic interpretation for the IV estimator, since the weights we consider can easily be estimated. Instead, we are interested in empirically comparing the OLS and IV weights and deriving a test for whether the different weights can explain differences between the two estimators when linearity is incorrectly assumed. Since OLS is a special case of IV estimation, in the absence of endogeneity, the OLS estimator for the linear model also converges to a weighted average of the grade-specific effects, β j, where the weights are non-negative and sum to one. Corollary 1. If E(ε i s i ) = 0 then where the sum to unity over all j = 1,..., S. ˆβ L OLS p j=1 ωj OLS = P r(s i j)e(η i s i j) P r(s i k)e(η i s i k) k=1 ω OLS j β j (5) 0 (6) Proof: This result largely follows from Proposition 1 replacing z i with s i. Online Appendix A shows that the OLS weights are always non-negative. The empirical counterpart to the OLS weights, W OLS j p ω OLS, is simply the coefficient estimate on s i in an OLS regression of D ij on s i and x i. Therefore, only data on x i and s i are needed to construct consistent estimates of the asymptotic weights. Of course, the weights implied by OLS j 8

10 estimation will not generally equal the weights implied by IV estimation. 10 In Section 4, we graph estimated OLS and IV weights in a few different empirical applications. Researchers often estimate linear specifications rather than more general non-linear models, because they are limited in the instrumental variables at their disposal. Yet, there is no reason to expect OLS and IV estimators for a mis-specified linear model to be equal even in the absence of endogeneity (i.e. if s i and z i are both uncorrelated with ε i ) or individual-level parameter heterogeneity (i.e. all β j parameters are the same for everyone). As a result, standard Hausman tests applied to the mis-specified linear model may reject the null hypothesis of exogenous s due simply to non-linearity in the relationship between y and s. Below, we develop a chi-square test for whether OLS estimation of equation (2) yields consistent estimates of the underlying β j parameters (i.e. whether E(ε i s i ) = 0) even when only a single valid instrumental variable is available. However, first, we generalize our key results to the case of many instruments SLS Estimation with Multiple Instruments We now generalize the results to the case where we have I distinct instruments for schooling, z i = (z i1... z ii ), but the researcher still estimates the linear-in-schooling model (1). Let s i = x i θ x + z i θ z + ξ i, with ˆθ x and ˆθ z reflecting the corresponding OLS estimates of θ x and θ z. Further define the predicted value of schooling conditional on x and z: ŝ = x ˆθ i x + z i ˆθ z. Then, ˆβ 2SLS L = (ŝ M x ŝ) 1 ŝ M x y = W j β j + (ŝ M x ŝ) 1 ŝ M x ε (7) where the weights W j = (ŝ M x ŝ) 1 ŝ M x D j = (ˆθ zz M x z ˆθ z ) 1 ˆθ z z M x D j reflect consistent estimates of ω j from 2SLS estimation of j=1 D ij = s i ω j + x iα j + ψ ij, j {1,..., S}. (8) 10 For example, consider the case with no x regressors (except an intercept). It is straightforward to show that ωj+1 OLS ωj OLS (E(s i ) j) P r(s i = j), which is positive for j < E(s i ), zero for j = E(s i ), and negative when j > E(s i). This implies that OLS estimation of the linear specification places the most weight on grade-specific β j effects near the mean schooling level. When schooling is uniformly distributed in the population, the weights decay symmetrically as one moves away from the mean in either direction. Contrast this with the IV weights in the case of a binary instrument z i {0, 1} satisfying the monotonicity assumption. In this case, IV places all the weight on schooling margins that are affected by the instrument, while the underlying distribution of schooling in the population is irrelevant. 9

11 We assume that Assumption 1 holds for all z il instruments and that we have sufficient variation in z i conditional on x i for identification. Let ζ i = (ζ i1,..., ζ ii ) be the I 1 vector collecting all ζ il = z il x i δ zl, where δ zl = [E(x i x i )] 1 E(x i z il ) was introduced above in the single-instrument case. Assumption 3. The covariance matrix for z i after partialling out x i, E(ζ i ζ i ), is full rank. As with the single-instrument IV estimator, we can show that the linear 2SLS estimator converges in probability to a weighted average of all grade-specific effects. Letting ω IV jl reflect the grade j weight from the single-instrument IV estimator using z il as the instrument as defined by equation (3), the 2SLS estimator weight on any β j is a weighted average of each of these single-instrument IV estimator weights. Proposition 2. Under Assumptions 1 and 3, ˆβ L 2SLS unity over all j = 1,..., S and Ω l = sum to unity over all l = 1,..., I. I θ zl k=1 θ zm m=1 k=1 p S j=1 P r(s i k)e(ζ il s i k) P r(s i k)e(ζ im s i k) E(z il x i ) = x i δ zl, then all ω IV jl, Ω l, and ω j are non-negative. Proof: See Online Appendix A. ω j β j, where ω j = I l=1 Ω l ω IV jl sum to Furthermore, if each instrument satisfies Assumption 2 and Not surprisingly, one can also show that the 2SLS estimator converges in probability to a weighted average of the probability limits of all single-instrument IV estimators, where the weights are given by Ω l in equation (9). 11 (9) 3 A Wald Test for Consistent OLS Estimation of All β j s When at least one valid instrumental variable is available, the analysis of Section 2 suggests a practical test for whether OLS estimates of B (β 1,..., β S ) from equation (2), ˆB, are consistent If we define β L IV,l = plim ˆβ L IV,l where ˆβ L IV,l is the single-instrument IV estimator using z il as an instrument for s i in estimating equation (1), then ˆβ L 2SLS p I l=1 Ω l β L IV,l, where Ω l is defined by equation (9). 12 Formally, ˆB = (D M x D) 1 D M x y, where M x and y are defined earlier and D reflects the stacked N S matrix of (D i1,..., D is ) for all individuals. 10

12 We now develop a test that compares the 2SLS estimator for the linear model with the weighted sum of the unrestricted grade-specific OLS estimates of the β j s, using the estimated 2SLS weights W (W 1,..., W S ). Intuitively, if E(ε i s i ) = 0 so OLS estimates of equation (2) are consistent, then the re-weighted sum of these OLS estimates (using the 2SLS weights) should asymptotically equal the 2SLS estimator from the linear model, i.e. ˆβ L 2SLS W ˆB p 0. This will not generally be true when E(ε i D ij ) 0 for any j. Recall that ˆB is given by OLS estimation of equation (2), while ˆβ L 2SLS is given by 2SLS estimation of equation (1). Applying 2SLS to equation (8) yields estimates W j and ˆα j for all j. In order to derive our test statistic, we frame estimation of ˆB, ˆβ 2SLS L, and W as a stacked generalized method of moments (GMM) problem. This establishes joint normality of ( ˆB, ˆβ 2SLS L, W ) and facilitates estimation of the covariance matrix for all of these estimators. From this, a straightforward application of the delta-method yields the variance of ˆβ L 2SLS W ˆB, which is used in developing a chi-square test statistic for the null hypothesis that ˆT ˆβ L 2SLS W ˆB p 0. It is necessary to introduce some additional notation in order to define the test statistic. We first define the regressors for OLS estimation of equation (2), X 1i = (D i x i ), and the regressors, X 2i = (s i x i ), and instruments, Z 2i = (z i x i ), used in 2SLS estimation of equations (1) and (8). Denote the corresponding matrices for all individuals as X 1, X 2, and Z 2, respectively. Next, let Θ = (B γ β L γ L W 1 α 1... W S α S ) reflect the full set of parameters to be estimated. Finally, let ˆΘ denote the corresponding vector of parameter estimates, where (B γ ) is estimated by OLS and (β L γ L ) and all (W j α j ) are estimated via 2SLS. The variance of Θ can be consistently estimated from ˆV = ÂˆΛÂ, (10) where Â = [X 1 X 1] I 2 [ ˆX 2 ˆX 2 ] 1ˆΓ 2, (11) ˆΓ 2 = (Z 2 Z 2) 1 Z 2 X 2, ˆX2 = Z 2ˆΓ2, and 0 reflects conformable matrices of zeros. 13 Furthermore, ˆε ˆΛ = 1 N 2 i (X 1i X 1i) ˆε iˆν i (X 1i Z 2i) ˆε i ˆΨ i (X 1i Z 2i) ˆε N iˆν i (Z 2i X 1i) ˆν i 2(Z 2i Z 2i) ˆν i ˆΨ i (Z 2i Z 2i), (12) i=1 ˆε i ˆΨi (Z 2i X 1i) ˆν i ˆΨi (Z 2i Z 2i) ˆΨi ˆΨ i (Z 2i Z 2i) 13 See the proof of Theorem 1 in Online Appendix A. 11

13 where ˆε i = y i D i ˆB x iˆγ, ˆν i = y i s i ˆβL 2SLS x iˆγl, and ˆΨ i = ( ˆψ 1i ˆψ2i... ˆψSi ) with ˆψ ij = D ij s i W j ˆα j x i. Finally, define ˆT T ( ˆΘ) = ˆβ L 2SLS W ˆB, and let Ĝ ˆT = ( Ŵ 0 x 1 0 x ( ˆβ 1 0 x) ( ˆβ 2 0 x)... ( ˆβ S 0 x)) represent the (2S (S + 2)K) 1 jacobian vector for T ( ˆΘ) (where 0 x is a K 1 zero vector). It is now possible to derive a chi-square test statistic. Theorem 1. Under Assumptions 1 and 3, if E(ε i s i ) = 0, then [ ( W N = N ˆβ ] 2SLS L W ˆB) 2 d χ Ĝ ˆV 2 (1). (13) Ĝ Proof: See Online Appendix A. It is important to note that ˆT p 0 need not imply that ˆB p B for two reasons. First, this test cannot tell us anything about whether ˆβ j p βj for some grade transition j if ω j = 0. In other words, the test only provides information about the effects of grade transitions that are affected by the instrument. Second, the ˆβ j OLS estimates may be asymptotically biased upward for some j and downward for others. In general, ˆB p B B + {E(D i D i ) E(D ix i )[E(x ix i )] 1 E(x i D i )} 1 E(D i ε i ). Thus, ˆT p 0 for any B satisfying ω (B B ) = 0. A test based on Theorem 1 would have no power against these alternatives; although, rejection of the null hypothesis would imply that ˆB does not consistently estimate B. Under reasonable conditions, W N can serve as a valid test statistic for the null hypothesis that ˆB p B. If ω j > 0 for all j (a testable assumption) and if E(ε i D ij ) = E(ε i s i j) were either non-negative for all j or non-positive for all j, then all ˆβ j would be asymptotically biased in the same direction and B B ω (B B ) 0. In this case, testing whether ˆT p 0 would be equivalent to testing for consistency of ˆB. 14 To better understand these conditions, consider a standard latent index ordered choice model for schooling of the form: s i = µ(z i, x i ) + v i (14) s i = j if and only if j s i < j + 1. (15) 14 In the case where some ω j = 0, the test would be equivalent to testing for consistency of all β j with ω j > 0. 12

14 Assume that all x regressors and instruments z are independent of both errors: (ε i, v i ) (z i, x i ). It is straightforward to show that if E(ε v) is weakly monotonic in v, then E(ε i s i j) will be either non-positive or non-negative for all j. 15 Monotonicity of E(ε v) is trivially satisfied by all joint elliptical distributions (e.g. bivariate normal or t distributions), which produce linear conditional expectation functions. Intuitively, one is only likely to fail to reject the null hypothesis of ˆT p 0 when B B in cases where individuals with both high and low propensities for education (conditional on observable characteristics) have a higher (or lower) unobserved ε than individuals with an average propensity for schooling. In the case of an ordered choice model, this would imply a U-shaped (or inverted U-shaped) relationship for E(ε v). In many economic contexts, these perverse cases seem unlikely. We also note that if more than one valid instrument are available, then those instruments can be used in different combinations to perform separate tests. Because each 2SLS estimator (distinguished by the set of instruments used) converges to a different weighted average of the true B parameters (i.e. ω zb where z denotes the set of instruments used), it is unlikely that one would reject the null of ω zb = ω zb for all sets of instruments unless B = B. 16 To demonstrate the extent to which non-linearity can induce differences between OLS and IV estimates that our new exogeneity test can account for (while standard Hausman or Durbin-Wu- Hausman tests cannot), we perform a Monte Carlo simulation exercise based on Card s (1995) schooling earnings model. These results are discussed in detail in Online Appendix B; however, we note here that our test (see Theorem 1) performs well in two important respects. First, the test has nearly identical performance to the standard Hausman test (applied to estimates from the mis-specified linear model) in the absence of nonlinearities. Thus, there is no cost to using our test rather than the more traditional Hausman test that fails to account for non-linearities. Second, our test has very similar properties regardless of the extent of non-linearity, rejecting equality of the re-weighted OLS and IV estimates at noticeably higher rates for even small deviations from exogeneity as long as the instruments are sufficiently strong. Of course, when the instruments are relatively weak, our test (like the standard Hausman test for a linear model) has little power to detect endogeneity since the IV estimates tend to have large 15 Strictly speaking, weak monotonicity is only required over the range of v covered by j µ(z, x) (i.e. for v [1 µ(z, x), S µ(z, x)]), so behavior in the tails of the distribution is irrelevant. See Online Appendix A for details. 16 Because these test statistics are not generally independent, the critical values for this type of joint testing procedure are likely to be quite complicated. We do not address this issue here. 13

15 standard errors. In these cases, negligible amounts of endogeneity may be difficult to detect with our test. This can lead to poor size properties when conducting inference using OLS estimates of the β j parameters as discussed by Wong (1997) and Guggenberger (2010) who study this issue in the context of linear models and use of the Hausman test to determine exogeneity. Monte Carlo results presented in Online Appendix B suggest caution when using OLS estimates for inference even if our test fails to reject exogeneity if the instruments are relatively weak. This is particularly true when the IV and re-weighted OLS estimates are quite different but the IV estimates are very imprecise. 4 Practical Use of our Test and Three Empirical Examples To demonstrate the practical value of our test, we re-examine three recent empirical papers on the effects of individual and maternal schooling in which estimated 2SLS effects differ from corresponding OLS estimates. In all cases, the econometric specification assumed linearity. 17 the presence of non-linearities, differences between OLS and 2SLS weights may explain at least some of the difference between the two estimates. For each of the three cases, we examine the extent to which re-weighting the OLS estimates of the β j s helps reconcile the difference between the linearly mis-specified OLS and 2SLS estimates. We then test whether schooling is exogenous using both the standard Hausman test and our proposed generalization that accounts for potential non-linearities. Results are reported in Table Columns 1 and 2 reproduce OLS and 2SLS estimates using the same models and similar data used in the original papers. For example, the first row indicates that using the Lochner and Moretti (2004) data for white men, a regression of an indicator for incarceration on years of schooling and controls yields an OLS coefficient equal to , and a 2SLS coefficient equal to The 2SLS estimates use as instrumental variables three indicators for different compulsory schooling ages. column 3. The difference between OLS and 2SLS is reported in The 2SLS estimate is about 10% larger than the OLS estimate (in absolute value), even though most reasonable explanations for the endogeneity of schooling suggest that the OLS estimate should overstate the importance of schooling. The corresponding OLS and 2SLS estimates 17 The instruments used in these examples have been employed in numerous studies examining a wide array of outcomes. See, e.g. Lochner (2011). 18 Details regarding samples and estimating specifications are reported in the bottom of Table 1. In 14

16 for Blacks are and , respectively. There are several well-understood reasons why one might find a larger 2SLS estimate (relative to the OLS estimate), including the presence of measurement error and heterogeneous effects. It is also possible that non-linearity in the incarceration-schooling relationship may play a role. This seems particularly relevant here, since non-linearities appear to be important based on OLS estimates. The bars in Figures 1 and 2 plot OLS estimates of the grade-specific effects β j of moving from j 1 to j years of schooling. If the linearity assumption were correct and these estimates were consistent, all of the estimated ˆβ j should be the same. Instead, the estimated ˆβ j suggest that the marginal effects of different grade transitions vary considerably across years of schooling. Unless there are much stronger biases for some grades than others, the figures suggest strong non-linearities in the effect of schooling on imprisonment, with the strongest effect for high school graduation (11 to 12). Based on these findings, Lochner and Moretti (2004) suggest that high school graduation is an important margin for incarceration among men, but they are hesitant to draw strong conclusions from these general OLS estimates due to concerns about endogeneity. The lines in Figures 1 and 2 report estimates of the OLS and 2SLS weights, as defined in Section 2. These weights are clearly very different for white men: the OLS weights are high for years of schooling between 12 and 16, while the 2SLS weights are highest at 12 years of schooling, implying that the effect of moving from 11 to 12 years of schooling figures prominently in the 2SLS estimates. This is not surprising, since the instruments adopted (compulsory schooling laws) are most effective at shifting schooling levels just before or at high school graduation. For black men, the effect of compulsory schooling is strong at earlier grades, so that the weights are more shifted to the left. In column 4 of Table 1, we re-weight the estimated grade-specific effects ( ˆβ j ) using the 2SLS weights in Figure 1. For whites, the re-weighted OLS estimates are , larger than the 2SLS estimates. The re-weighted OLS estimates are larger, because 2SLS puts more weight on the large β j associated with moving from 11 to 12 years of schooling. For blacks, the re-weighted OLS estimate is smaller, because the 2SLS weights are more shifted to the left and, therefore, put less weight on larger β j. The last three columns of Table 1 are the most important, since they report on different tests for the exogeneity of schooling. Column 5 presents test statistics and associated p-values for our proposed test of exogeneity (see Theorem 1), which is valid in the presence of non-linearities. Columns 6 and 7 present results from the standard Hausman test and the Durbin-Wu-Hausman 15

17 test (applied to the linear-in-schooling specification), respectively, which are both incorrect in the presence of non-linearities. For white men, our test fails to reject, which is quite important in practice, since it suggests that our OLS estimates of the β j in Figure 1 are consistent. Given a high first stage F-statistic of and the fact that the re-weighted OLS estimate is very close to the 2SLS estimate (a difference of less than 10%), we can feel confident in concluding from our OLS estimates of β j that high school completion has the greatest effect on incarceration rates while college attendance has much weaker effects. This is extremely useful, since with only three available instruments, it is impossible to estimate all 20 β j parameters using 2SLS. Indeed, it is not possible to precisely estimate highly restricted two-parameter non-linear models using 2SLS. Fortunately, our test suggests that this is not necessary in this context. The case of incarceration for black men is different: our test strongly rejects the hypothesis that the re-weighted OLS and 2SLS estimates are the same (p-value of.0005), while the standard Hausman test fails to reject. Re-weighting the OLS estimates for the β j parameters reveals that the OLS estimates are significantly biased towards zero, on average, since the re-weighted OLS estimate is compared to the 2SLS estimate of Unfortunately, we cannot draw any strong conclusions about the relative importance of different grades due to these biases. In general, these findings show the empirical relevance of our concerns that, in the presence of non-linearity, the standard Hausman test can fail to detect an endogeneity problem when one exists. In the second panel, we turn to estimates of the effect of maternal schooling on infant health from Currie and Moretti (2003). The instrument in this case is an indicator for college proximity. (First stage F-statistics for the instruments are ) In this case, the re-weighted OLS estimates (column 4) are generally quite similar to the OLS estimates (column 1). Looking at Figures 3 and 4, it is clear why: the OLS and 2SLS weights are nearly identical. Not surprisingly, our test and the standard Hausman test produce very similar test statistics and the same conclusions: exogeneity cannot be rejected for either child health outcome. Finally, in the bottom panel, we turn to estimates of the private return to schooling using three dummies for compulsory schooling as instruments. While this analysis is based on that of Acemoglu and Angrist (2000), we consider the effects of schooling on log annual earnings rather than weekly wages for white men in their 40s. Figure 5 reports the OLS estimates of the β j parameters as well as the OLS and 2SLS weights. OLS estimates indicate that an additional year of schooling translates into an 8.2% increase in annual earnings, while the 2SLS estimates suggest 16

18 a much larger return. The re-weighted OLS estimates fall in between the OLS and 2SLS estimates, although they are much closer to the OLS estimates. The effect of re-weighting is minor despite substantially different OLS and 2SLS weights. Our test rejects the hypothesis that the re-weighted OLS and 2SLS estimates are equal, even though the instruments are not particularly strong in this application (the first stage F-statistic for the instruments is only 29.5). 5 Conclusions In applied work, it is often the case that OLS and IV estimates differ, and sometimes the direction of the difference is not what one might expect based on economic theory and plausible assumptions on the direction of endogeneity bias. Influential work by Angrist and Imbens (1994, 1995) and Heckman and Vytlacil (2005) has clarified the interpretation of IV estimates as a local average treatment effect when the regression parameter of interest is heterogenous. Our work complements the existing understanding of the differences between IV and OLS estimates when the model is mis-specified. We focus on the case where the true model is non-linear, but the researcher estimates a linear model. This case has become increasingly relevant as the growing emphasis on the validity of instruments has led many empirical researchers to estimate linear relationships with very few instruments. Yet, in many instances the true relationship between the dependent and independent variables may be quite non-linear, as is frequently suggested by more general specifications estimated using OLS. 19 The main contribution of this paper is to develop a simple generalization of the Hausman test to assess whether different weighting and non-linearity explains the difference between linear IV/2SLS and OLS estimators. Under fairly weak conditions, this serves as a specification test for exogeneity of the regressor in a general non-linear context. A particularly appealing feature of our test is that it only requires a single instrument, the primary reason many researchers turn to linear models rather than estimate more general non-linear models that can be estimated using OLS. Our test offers researchers the ability to estimate general non-linear models using OLS and then easily test whether those estimates are consistent. 19 Of course, mis-specification in the structural equation is only one of many concerns for valid inference. Other concerns include the strength and exogeneity of the instrument(s). Our approach, assumes away instrument-related problems, instead addressing problems associated with mis-specification in the structural equation if at least one exogenous instrument is available. 17

19 References Daron Acemoglu and Joshua D. Angrist. How Large are Human-Capital Externalities? Evidence from Compulsory-Schooling Laws, in NBER Macroeconomics Annual 2000, Vol. 15, 9 74, National Bureau of Economic Research, Angrist, D. Joshua, Kathryn Graddy, and Guido W. Imbens. The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish, Review of Economic Studies, 67, , Angrist, D. Joshua, and Guido W. Imbens. Two-Stage least Squares Estimation of Average causal Effects in Models with variable Treatment Intensity, JASA, 90, , Card, David. Using Geographic Variation in College Proximity to Estimate the Return to Schooling in Aspects of Labour Economics: Essays in Honour of John Vanderkamp, edited by Louis Christofides, E. Kenneth Grant and Robert Swindinsky. University of Toronto Press Card, David. The Causal Effect of Education on Earnings. In Orley Ashenfelter and David Card, editors, Handbook of Labor Economics Volume 3A. Amsterdam: Elsevier, Carneiro, Pedro, James Heckman and Edward Vytlacil. Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin, Econometrica, 78(1), Currie, Janet, and Enrico Moretti. Mother s Education and the Intergenerational Transmission of Human Capital: Evidence from College Openings, Quarterly Journal of Economics, 118(4), Guggenberger, Patrik. The Impact of a Hausman Pretest on the Asymptotic Size of a Hypothesis Test, Econometric Theory, 26, , Hausman, J.A. Specification Tests in Econometrics, Econometrica, 46(6), , Heckman, James J., and Edward Vytlacil, Instrumental Variables Methods for the Correlated Random Coefficient Model, Journal of Human Resources, 33(4), Heckman, James J., and Edward Vytlacil. Structural Equations, Treatment Effects, and Econometric Policy Evaluation, Econometrica, Econometric Society, vol. 73(3), , Heckman, James J., Lance Lochner, and Petra Todd. Earnings Functions and Rates of Return, Journal of Human Capital, 2(1), Imbens, Guido, and Joshua Angrist. Identification and Estimation of Local Average Treatment Effects, Econometrica, 62(2),

20 Hungeford, Thomas, and Gary Solon. Sheepskin Effects in the Returns to Education, Review of Economics and Statistics, Jaeger, David, and Marianne Page. Degrees matter: New Evidence on Sheepskin Effects in the Returns to Education, Review of Economics and Statistics, 78(4), Kling, Jeffrey. Interpreting Instrumental Variables Estimates of the Returns to Schooling, Journal of Business and Statistics, Lochner, Lance. Nonproduction Benefits of Education: Crime, Health, and Good Citizenship, in E. Hanushek, S. Machin, and L. Woessmann (eds.), Handbook of the Economics of Education, Vol. 4, Ch. 2, Amsterdam: Elsevier Science, Lochner, Lance, and Enrico Moretti. The Effect of Education on Criminal Activity: Evidence from Prison Inmates, Arrests and Self-Reports, NBER Working Paper No. 8605, Lochner, Lance, and Enrico Moretti. The Effect of Education on Criminal Activity: Evidence from Prison Inmates, Arrests and Self-Reports, American Economic Review. 94(1), Lochner, Lance, and Enrico Moretti. Estimating and Testing Non-Linear Models Using Instrumental Variables NBER Working Paper No , Moffitt, Robert. Estimating Marginal Treatment Effects in Heterogeneous Populations, Annales deconomie et de Statistique, Special Issue on Econometrics of Evaluation, Fall Mogstad, Magne, and Matthew Wiswall. Linearity in Instrumental Variables Estimation: Problems and Solutions, Working Paper, Park, Jin Heun. Estimation of Sheepskin Effects Using the Old and New Measures of Educational Attainment in the CPS, Economic Letters 62, White, Halbert, Consequences and Detection of Misspecified Nonlinear Regression Models, Journal of the American Statistical Association, 76, , Wong, Ka-fu. Effects on Inference of Pretesting the Exogeneity of a Regressor, Economic Letters, 56, , Wooldridge, Jeffrey. On Two Stage Least Squares Estimation of the Average Treatment Effect in Random Coefficient Models, Economics Letters, 56, Yitzhaki, Shlomo. On Using Linear Regressions in Welfare Economics, Journal of Business and Economic Statistics, 14, ,

Estimating and Testing Non-Linear Models Using Instrumental Variables

Estimating and Testing Non-Linear Models Using Instrumental Variables Lance Lochner University of Western Ontario NBER Enrico Moretti University of California Berkeley NBER, CEPR and IZA April 29, 2011