Topics in Applied Econometrics and Development - Spring 2014

Isaac Roberts
1 Topic 2: Topics in Applied Econometrics and Development - Spring 2014

2 Single-Equation Linear Model
The population model is linear in its parameters:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
- $y, x_1, x_2, \dots, x_K$: observable random scalars (we can observe them in a random sample from the population)
- $u$: unobservable random disturbance or error
- $\beta_0, \beta_1, \beta_2, \dots, \beta_K$: parameters (constants) we want to estimate

3 The error term $u$ can consist of a variety of things, including:
- omitted variables
- measurement error
Key conditions for OLS to consistently estimate the $\beta_j$:
- $E(u) = 0$: the error (in the population) has mean zero
- $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K$: the error (in the population) is uncorrelated with each of the regressors

4 $E(u) = 0$: this assumption is without loss of generality (WLOG) when an intercept is included (we consider this case in what follows).
A sufficient condition for $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K$, is:
$E(u \mid x_1, x_2, \dots, x_K) = E(u \mid x) = 0$ (zero conditional mean assumption)

5 Under the population model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
and the assumption $E(u \mid x_1, x_2, \dots, x_K) = 0$, we have the population regression function:
$E(y \mid x_1, x_2, \dots, x_K) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$

6 The population regression function
$E(y \mid x_1, x_2, \dots, x_K) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$
includes the case where the $x_j$ are nonlinear functions of underlying explanatory variables.
Example:
$E(savings \mid income, size, age, college) = \beta_0 + \beta_1 \log(income) + \beta_2\, size + \beta_3\, age + \beta_4\, college + \beta_5\, college \cdot age$

7 $x_j$ is endogenous if it is correlated with $u$.
$x_j$ is exogenous if it is uncorrelated with $u$.
Endogeneity usually arises in 3 ways:
1) Omitted variables
2) Measurement error
3) Simultaneity

8 1) Omitted variables
Omitted variables: we want to control for one or more additional variables but, usually because of data unavailability, we cannot include them in a regression model.
Suppose that $E(y \mid x, q)$ is the conditional expectation of interest, which can be written as a function linear in parameters and additive in $q$.
- $q$ is unobserved
- we can estimate $E(y \mid x)$, but this will have no particular relationship to $E(y \mid x, q)$ when $q$ and $x$ are correlated.

10 We can represent this situation as follows:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + (q + u)$
- $q$ is part of the error term
- $q$ and $x_j$ correlated $\Rightarrow$ $x_j$ endogenous
Could you give an example in which this problem is found?

11 Correlation of explanatory variables with unobservables is often due to self-selection. If agents choose the value of $x_j$, this choice might depend on factors $q$ that are unobservable to the analyst.
Example:
- Omitted ability in a wage equation. An individual's years of schooling are likely to be correlated with unobserved ability.

12 Example: Wage Equation with Unobserved Ability
Write the wage equation as:
$\log(wage) = \beta_0 + \beta_1\, exper + \beta_2\, exper^2 + \beta_3\, educ + \gamma\, abil + v$
- $v$ is such that $E(v \mid exper, educ, abil) = 0$
- Let $abil$ be uncorrelated with $exper$ and $exper^2$ once $educ$ has been partialed out: $abil = \delta_0 + \delta_3\, educ + r$, with $r$ uncorrelated with $exper$ and $exper^2$
- Then $\operatorname{plim} \hat\beta_3 = \beta_3 + \gamma\delta_3$, while the coefficients on $exper$ and $exper^2$ are consistently estimated by the OLS regression that omits ability
- If $\delta_3 > 0$ then $\operatorname{plim} \hat\beta_3 > \beta_3$ (because $\gamma > 0$ by definition) $\Rightarrow$ the return to education is overestimated in large samples.
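The large-sample result on this slide can be checked with a small simulation. The sketch below is only an illustration: every numerical value (coefficients, distributions, sample size) is a hypothetical choice, not taken from any data set. It generates data from the wage equation with ability, runs OLS without ability, and compares the coefficient on educ with $\beta_3 + \gamma\delta_3$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population values, chosen only for illustration.
beta0, beta1, beta2, beta3, gamma = 1.0, 0.04, -0.001, 0.08, 0.5
delta0, delta3 = -2.0, 0.2

exper = rng.uniform(0, 30, n)
educ = rng.normal(12, 2, n)
r = rng.normal(0, 1, n)                 # uncorrelated with exper, exper^2, educ
abil = delta0 + delta3 * educ + r       # abil = delta0 + delta3*educ + r
v = rng.normal(0, 0.3, n)
log_wage = (beta0 + beta1 * exper + beta2 * exper**2
            + beta3 * educ + gamma * abil + v)

# OLS regression that omits ability.
X = np.column_stack([np.ones(n), exper, exper**2, educ])
b_ols = np.linalg.lstsq(X, log_wage, rcond=None)[0]

plim_b3 = beta3 + gamma * delta3        # large-sample value predicted by the slide
print("OLS coefficient on educ: ", b_ols[3])
print("beta3 + gamma*delta3:    ", plim_b3)
print("OLS coefficient on exper:", b_ols[1], "(true value:", beta1, ")")
```

With these assumed values the educ coefficient should settle near $\beta_3 + \gamma\delta_3$ rather than $\beta_3$, while the exper coefficient stays close to its true value.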

13 2) Measurement Error
Measurement error: we want to measure the effect of a variable, say $x_K^*$, but we can observe only an imperfect measure of it, say $x_K$. When we plug $x_K$ in for $x_K^*$, we put a measurement error into $u$.
- $u$ and $x_K$ may be correlated.

14 Remark
Measurement error is an issue only when the variables on which we can collect data differ from the variables that influence decisions by individuals, families, firms, and so on.
Example:
- Suppose we are estimating the effect of peer group behavior on teenage drug usage
- The behavior of a teenager's peer group is self-reported
- Self-reporting may be a mismeasure of actual peer group behavior
BUT: this is not a problem if we are more interested in the effects of how a teenager perceives his or her peer group.

15 Measurement Error in the Dependent Variable
Suppose the dependent variable is the only variable measured with error.
$y^*$: the variable that we would like to explain. The regression model is linear:
$y^* = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + v$
For example, $y^*$ could be annual family saving.

16 We are interested in $E(y^* \mid x_1, x_2, \dots, x_K)$
- $y$: observable measure of $y^*$ (in general $y \neq y^*$)
- $e_0 = y - y^*$: population measurement error

17 From
$y^* = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + v$
and
$y^* = y - e_0$
we can write:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + v + e_0$

18 Since $y, x_1, x_2, \dots, x_K$ are observed, we can estimate this model by OLS. We just ignore that $y$ is an imperfect measure of $y^*$ and proceed as usual.
Does OLS with $y$ in place of $y^*$ produce consistent estimators of the $\beta_j$?
- Important: what we assume about the relationship between the measurement error $e_0$ and the explanatory variables $x_j$
- Usual assumption: measurement error in $y$ is statistically independent of each explanatory variable ($e_0$ is uncorrelated with $x$) $\Rightarrow$ OLS estimators are consistent.
Further, the usual OLS inference procedures (t statistics, F statistics) are asymptotically valid.

19 Measurement Error in an Explanatory Variable
Measurement error in an explanatory variable is considered a much more important problem than measurement error in the response variable.
Consider the model with a single explanatory variable measured with error:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K^* + v$
- $y, x_1, x_2, \dots, x_{K-1}$: observable
- $x_K^*$: not observable
- Assume: 1) $v$ has zero mean; 2) $v$ is uncorrelated with $x_1, x_2, \dots, x_{K-1}, x_K^*$
If $x_K^*$ were observed, OLS estimation would produce consistent estimators.

20 We observe $x_K$, a measure of $x_K^*$.
Assumption: $v$ is also uncorrelated with $x_K$.
The measurement error in the population is $e_K = x_K - x_K^*$, and this can be positive, negative, or zero.
Assume: $E(e_K) = 0$ (average measurement error in the population is zero). This has no practical consequences because we include an intercept in the model.
Since $v$ is assumed to be uncorrelated with $x_K^*$ and $x_K$, $v$ is also uncorrelated with $e_K$.

21 Usual assumption: $e_K$ is uncorrelated with the explanatory variables not measured with error: $E(x_j e_K) = 0$, $j = 1, \dots, K-1$.
The key assumptions involve the relationship between the measurement error and $x_K^*$ and $x_K$ $\Rightarrow$ 2 assumptions in the econometrics literature.

22 1st assumption: $e_K$ is uncorrelated with the observed measure, $x_K$:
$\mathrm{Cov}(x_K, e_K) = 0$
$\Rightarrow$ $e_K$ is correlated with the unobserved variable $x_K^*$

23 From
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K^* + v$
and
$x_K^* = x_K - e_K$
we have
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + (v - \beta_K e_K)$
We assumed: $v$ and $e_K$ both have zero mean and are uncorrelated with each $x_j$ (including $x_K$)
$\Rightarrow$ $v - \beta_K e_K$ has zero mean and is uncorrelated with the $x_j$
$\Rightarrow$ OLS estimation with $x_K$ in place of $x_K^*$ produces consistent estimators of all of the $\beta_j$

24 Since $v$ is uncorrelated with $e_K$, the variance of the error is
$\mathrm{Var}(v - \beta_K e_K) = \sigma_v^2 + \beta_K^2 \sigma_{e_K}^2$
$\Rightarrow$ measurement error increases the error variance. This is not surprising and doesn't violate any of the OLS assumptions.
Since in this case OLS has all its nice properties, this is not usually what econometricians have in mind when referring to measurement error in an explanatory variable.

25 2nd assumption: measurement error is uncorrelated with the unobserved explanatory variable:
$\mathrm{Cov}(x_K^*, e_K) = 0$
This is the classical errors-in-variables (CEV) assumption.
This assumption comes from writing the observed measure as the sum of the true explanatory variable and the measurement error:
$x_K = x_K^* + e_K$
and assuming the two components of $x_K$ are uncorrelated.
We still maintain the assumptions that $v$ is uncorrelated with $x_K^*$ and $x_K$, and therefore with $e_K$.

26 $\mathrm{Cov}(x_K^*, e_K) = 0$ and $x_K = x_K^* + e_K$ $\Rightarrow$ $x_K$ and $e_K$ must be correlated:
$\mathrm{Cov}(x_K, e_K) = E(x_K e_K) = E(x_K^* e_K) + E(e_K^2) = \sigma_{e_K}^2$
From $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + (v - \beta_K e_K)$, we see that correlation between $x_K$ and $e_K$ causes problems for OLS.
$\Rightarrow$ The OLS regression of $y$ on $x_1, x_2, \dots, x_K$ generally gives inconsistent estimators of all of the $\beta_j$

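27 If $x_K^*$ is uncorrelated with $x_j$, all $j \neq K$, then so is $x_K$. It follows that $\operatorname{plim} \hat\beta_j = \beta_j$, all $j \neq K$.
It is possible to show that
$\operatorname{plim}(\hat\beta_K) = \beta_K \left( \dfrac{\sigma_{r_K^*}^2}{\sigma_{r_K^*}^2 + \sigma_{e_K}^2} \right)$
where $r_K^*$ is the linear projection error in
$x_K^* = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + r_K^*$
Since $0 < \dfrac{\sigma_{r_K^*}^2}{\sigma_{r_K^*}^2 + \sigma_{e_K}^2} < 1$, $\operatorname{plim} \hat\beta_K \neq \beta_K$ whenever $\beta_K \neq 0$, with $|\operatorname{plim} \hat\beta_K| < |\beta_K|$. This is the attenuation bias in OLS due to classical errors-in-variables.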

28 Attenuation bias
In large samples, the estimated OLS effect will be attenuated (shrunk toward zero) as a result of the presence of classical errors-in-variables.
- If $\beta_K > 0$, $\hat\beta_K$ will tend to underestimate $\beta_K$
- If $\beta_K < 0$, $\hat\beta_K$ will tend to overestimate $\beta_K$
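A minimal simulation sketch of the attenuation factor, assuming hypothetical values throughout ($\sigma_{r_K^*}^2 = \sigma_{e_K}^2 = 1$, so the factor is 1/2). It regresses $y$ on the mismeasured $x_K$ and compares the coefficient with $\beta_K$ times the factor from the previous slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical population values, chosen only for illustration.
beta0, beta1, betaK = 1.0, 0.5, 2.0
sigma2_r, sigma2_e = 1.0, 1.0

x1 = rng.normal(size=n)
r_star = rng.normal(0, np.sqrt(sigma2_r), n)    # projection error r_K^*
xK_star = 0.5 * x1 + r_star                     # true (unobserved) regressor
eK = rng.normal(0, np.sqrt(sigma2_e), n)        # classical measurement error
xK = xK_star + eK                               # observed measure
v = rng.normal(size=n)
y = beta0 + beta1 * x1 + betaK * xK_star + v

# OLS using the mismeasured x_K in place of x_K^*.
X = np.column_stack([np.ones(n), x1, xK])
b = np.linalg.lstsq(X, y, rcond=None)[0]

attenuation = sigma2_r / (sigma2_r + sigma2_e)
print("OLS coefficient on x_K:", b[2])
print("beta_K * attenuation:  ", betaK * attenuation)
```

Because $x_K^*$ is correlated with $x_1$ in this design, the coefficient on $x_1$ is also inconsistent, in line with the remark that all the $\beta_j$ are generally affected.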

29 Example: Measurement Error in Family Income
Problem: estimate the causal effect of family income on college grade point average, after controlling for high school grade point average and SAT score:
$colgpa = \beta_0 + \beta_1\, faminc^* + \beta_2\, hsgpa + \beta_3\, SAT + v$
- $faminc^*$ is actual annual family income
Precise data on colgpa, hsgpa, and SAT are relatively easy to obtain from school records. But family income, especially as reported by students, could be mismeasured.

30 Suppose $faminc = faminc^* + e_1$, and the CEV assumptions hold ($faminc$ and $e_1$ correlated)
$\Rightarrow$ using reported family income in place of actual family income will bias the OLS estimator of $\beta_1$ toward zero.
Consequence: a hypothesis test of $H_0: \beta_1 = 0$ will have a higher probability of Type II error (one fails to reject a false null hypothesis).

32 3) Simultaneity
Simultaneity: at least one of the explanatory variables is determined simultaneously along with $y$.
If $x_K$ is determined partly as a function of $y$, then $x_K$ and $u$ are generally correlated.
Could you give an example in which this problem is found?

33 Example
- $y$: city murder rate
- $x_K$: size of the police force
- The size of the police force is partly determined by the murder rate.
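To see why simultaneity breaks OLS, here is a stylized two-equation sketch. The structural coefficients are hypothetical: police lowers murder ($\beta = -0.5$), while the murder rate raises police ($\gamma = 0.5$). With these particular values and unit-variance shocks, the OLS slope converges to zero, so the regression would suggest that police has no effect at all.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta, gamma = -0.5, 0.5       # hypothetical structural coefficients

u = rng.normal(size=n)        # shock to the murder-rate equation
w = rng.normal(size=n)        # shock to the police equation

# Solve y = beta*x + u and x = gamma*y + w for the equilibrium (x, y).
x = (gamma * u + w) / (1 - beta * gamma)
y = (u + beta * w) / (1 - beta * gamma)

# OLS slope from regressing y on x (with an intercept).
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("true beta:          ", beta)
print("OLS slope:          ", slope)
print("corr(x, u), nonzero:", np.corrcoef(x, u)[0, 1])
```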

35 Remarks
The distinctions among the 3 forms of endogeneity are not always sharp. An equation can have more than one source of endogeneity.
Example:
- Suppose we are looking at the effect of alcohol consumption on worker productivity (measured by wages)
- We would worry that:
1) Alcohol usage is correlated with unobserved factors that also affect wage (e.g. family background) $\Rightarrow$ omitted variables
2) Alcohol demand generally depends on income $\Rightarrow$ simultaneity
3) Alcohol usage may be imprecisely measured $\Rightarrow$ measurement error

36 Motivation for Instrumental Variables
Consider again the linear population model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
- $E(u) = 0$: the error (in the population) has mean zero
- $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K-1$: the error (in the population) is uncorrelated with each of the first $K-1$ regressors
- BUT: $x_K$ might be correlated with $u$.
Hence:
- $x_1, x_2, \dots, x_{K-1}$: exogenous
- $x_K$: endogenous

37 The endogeneity can come from any of the sources we discussed: 1) Omitted variables; 2) Measurement error; 3) Simultaneity.
To fix ideas: think of $u$ as containing an omitted variable that is uncorrelated with all explanatory variables except $x_K$.
As we saw: $\mathrm{Cov}(x_K, u) \neq 0$ $\Rightarrow$ OLS results are inconsistent.

38 Instrumental variables (IV) provide a general solution to the problem of an endogenous explanatory variable.
IV approach with $x_K$ endogenous:
- we need an observable variable, $z_1$, not in the population model, that satisfies 2 conditions
1st condition
- $\mathrm{Cov}(z_1, u) = 0$: $z_1$ must be uncorrelated with $u$
In other words: $z_1$ is exogenous in the population model. This is the exogeneity condition.

39 2nd condition
- Consider the linear projection of $x_K$ onto all the exogenous variables:
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
- Key assumption: $\theta_1 \neq 0$ (the coefficient on $z_1$ is nonzero)
- This condition is often loosely described as "$z_1$ is correlated with $x_K$". That statement is not quite correct. The condition $\theta_1 \neq 0$ means that $z_1$ is partially correlated with $x_K$ once the other exogenous variables $x_1, \dots, x_{K-1}$ have been netted out.

40 When $z_1$ satisfies $\mathrm{Cov}(z_1, u) = 0$ and $\theta_1 \neq 0$, it is an instrumental variable (IV) candidate for $x_K$ (or, simply, $z_1$ is an instrument for $x_K$).
The linear projection
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
is called a reduced form equation for the endogenous explanatory variable $x_K$.
- The reduced form involves writing an endogenous variable as a linear projection onto all exogenous variables.
- This terminology also conveys that there is nothing necessarily structural about this equation.

41 From the structural equation:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
and the reduced form for $x_K$:
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
we obtain a reduced form for $y$:
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$
where:
- $v = u + \beta_K r_K$: reduced form error
- $\alpha_j = \beta_j + \beta_K \delta_j$
- $\lambda_1 = \beta_K \theta_1$
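The substitution behind the reduced form for $y$ can be written out in one step. The derivation below uses only the symbols defined on this slide; the final remark (recovering $\beta_K$ as a ratio) is an added observation that follows from $\lambda_1 = \beta_K \theta_1$.

```latex
\begin{align*}
y &= \beta_0 + \sum_{j=1}^{K-1} \beta_j x_j
     + \beta_K \Bigl( \delta_0 + \sum_{j=1}^{K-1} \delta_j x_j + \theta_1 z_1 + r_K \Bigr) + u \\
  &= \underbrace{(\beta_0 + \beta_K \delta_0)}_{\alpha_0}
     + \sum_{j=1}^{K-1} \underbrace{(\beta_j + \beta_K \delta_j)}_{\alpha_j} x_j
     + \underbrace{\beta_K \theta_1}_{\lambda_1} z_1
     + \underbrace{(u + \beta_K r_K)}_{v}
\end{align*}
% Added remark: if \theta_1 \neq 0, then \beta_K = \lambda_1 / \theta_1,
% the ratio of the two reduced-form coefficients on z_1.
```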

42 By the assumptions, $v$ is uncorrelated with all explanatory variables in the reduced form equation
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$
$\Rightarrow$ OLS consistently estimates the reduced form parameters $\alpha_j$ and $\lambda_1$.
Estimates of the reduced form parameters are sometimes of interest in their own right. But estimating the structural parameters is generally more useful.

43 Example
Suppose:
- $x_K$: job training hours per worker
- $y$: measure of average worker productivity
- job training grants were randomly assigned to firms.
It is natural to use for $z_1$ either:
1) a binary variable indicating whether a firm received a job training grant
2) the actual amount of the grant per worker (if the amount varies by firm)

44 The parameter $\beta_K$ in the structural equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
is the effect of job training on worker productivity.
The parameter $\lambda_1$ in the reduced form equation
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$
is the effect of receiving the job training grant on worker productivity, which is of some interest. (But estimating the effect of an hour of general job training is more valuable.)

45 We can show that the assumptions we have made on the IV $z_1$ solve the identification problem for the $\beta_j$ in the structural equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
Identification means that we can write the $\beta_j$ in terms of population moments in observable variables.

46 Write the structural equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
as
$y = x\beta + u$
where $x \equiv (1, x_1, x_2, \dots, x_K)$ and $\beta \equiv (\beta_0, \beta_1, \dots, \beta_K)'$.
Write $z \equiv (1, x_1, x_2, \dots, x_{K-1}, z_1)$ (vector of all exogenous variables). Both $x$ and $z$ are $1 \times (K+1)$.

47 Since by assumption $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K-1$, and $\mathrm{Cov}(z_1, u) = 0$, we have:
$E(z'u) = 0$
Multiply the equation $y = x\beta + u$ through by $z'$:
$z'y = z'x\beta + z'u$
Taking expectations:
$E(z'y) = E(z'x)\beta + E(z'u)$ $\Rightarrow$ $E(z'x)\beta = E(z'y)$
where $E(z'x)$ is $(K+1) \times (K+1)$ and $E(z'y)$ is $(K+1) \times 1$.

48 The equation $E(z'x)\beta = E(z'y)$ represents a system of $K+1$ linear equations in the $K+1$ unknowns $\beta_0, \beta_1, \dots, \beta_K$.
This system has a unique solution if and only if the $(K+1) \times (K+1)$ matrix $E(z'x)$ has full rank:
$\operatorname{rank} E(z'x) = K+1$
(we need $\theta_1 \neq 0$ for this rank condition to hold).
In this case, the solution is
$\beta = [E(z'x)]^{-1} E(z'y)$
This equation identifies the vector $\beta$.

49 Let $\{(x_i, y_i, z_{i1}) : i = 1, 2, \dots, N\}$ be a random sample from the population.
The IV estimator of $\beta$ is
$\hat\beta = \left( N^{-1} \sum_{i=1}^{N} z_i' x_i \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} z_i' y_i \right)$
which we can write as
$\hat\beta = (Z'X)^{-1} Z'Y$
- $Z$: $N \times (K+1)$ data matrix
- $X$: $N \times (K+1)$ data matrix
- $Y$: $N \times 1$ data vector on the $y_i$
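The matrix formula $\hat\beta = (Z'X)^{-1}Z'Y$ translates almost directly into code. The sketch below is an illustration on simulated data: the helper name `iv_estimator` and every numerical value are assumptions made for the example. The error $u$ contains an omitted variable $q$ that also drives $x_K$, so OLS is inconsistent while IV is not.

```python
import numpy as np

def iv_estimator(Z, X, y):
    """Just-identified IV estimator: solves (Z'X) b = Z'y for b."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(2)
n = 200_000
beta = np.array([1.0, 0.5, 2.0])      # hypothetical (beta_0, beta_1, beta_K)

q = rng.normal(size=n)                # omitted variable, part of u
x1 = rng.normal(size=n)               # exogenous regressor
z1 = rng.normal(size=n)               # instrument: Cov(z1, u) = 0, theta_1 = 1
xK = 0.5 * x1 + 1.0 * z1 + q + rng.normal(size=n)
u = q + rng.normal(size=n)
y = beta[0] + beta[1] * x1 + beta[2] * xK + u

X = np.column_stack([np.ones(n), x1, xK])   # regressors (1, x_1, x_K)
Z = np.column_stack([np.ones(n), x1, z1])   # exogenous variables (1, x_1, z_1)

b_iv = iv_estimator(Z, X, y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("IV  estimate of beta_K:", b_iv[2])
print("OLS estimate of beta_K:", b_ols[2])
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical choice; it computes the same estimator.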

50 Remarks
When searching for instruments, the conditions $\mathrm{Cov}(z_1, u) = 0$ and $\theta_1 \neq 0$ are equally important in identifying $\beta$.
However, there is one practically important difference between them:
- condition $\theta_1 \neq 0$ can be tested
- condition $\mathrm{Cov}(z_1, u) = 0$ must be maintained. The reason is that $u$ is unobservable.
Condition $\theta_1 \neq 0$ can and should be tested. In fact, the strength of the rejection of $H_0: \theta_1 = 0$ (in a p-value sense) is important for determining the finite sample properties, particularly the bias, of the IV estimator.

51 Example: Instrumental Variables for Education in a Wage Equation
Consider a wage equation for the U.S. population:
$\log(wage) = \beta_0 + \beta_1\, exper + \beta_2\, exper^2 + \beta_3\, educ + u$
where $u$ is thought to be correlated with $educ$ because of omitted ability.
Suppose that we collect data on mother's education: $motheduc$

52 For $motheduc$ to be a valid instrument for $educ$ we need to:
1) Assume that $motheduc$ is uncorrelated with $u$
2) Have $\theta_1 \neq 0$ in the reduced form equation
$educ = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \theta_1\, motheduc + r$
There is little doubt that $educ$ and $motheduc$ are partially correlated. Also, this correlation is easily tested given a random sample from the population.
Potential problem with $motheduc$ as an instrument for $educ$:
- $motheduc$ might be correlated with the omitted factors in $u$: mother's education is likely to be correlated with child's ability and other family background characteristics that might be in $u$.

53 Suppose instead we use the last digit of one's social security number. This is a poor IV candidate for the opposite reason.
The last digit is randomly determined $\Rightarrow$ it is independent of other factors that affect earnings ($\mathrm{Cov}(lastdigit, u) = 0$ holds).
BUT: the last digit is also independent of education ($\theta_1 \neq 0$ does not hold).

54 Challenge: come up with convincing instruments.
Angrist and Krueger (1991) propose using quarter of birth as an IV for education. In the simplest case, let:
- $frstqrt$: dummy variable (= 1 for people born in the 1st quarter of the year and 0 otherwise)
Quarter of birth is arguably independent of unobserved factors (such as ability) that affect wage.
In addition, we must have $\theta_1 \neq 0$ in the reduced form:
$educ = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \theta_1\, frstqrt + r$

55 Compulsory school attendance laws induce a relationship between $educ$ and $frstqrt$:
- At least some people are forced, by law, to attend school longer than they otherwise would, and this fact is correlated with quarter of birth.
We can determine the strength of this association in a particular sample by estimating the reduced form and obtaining the t statistic for $H_0: \theta_1 = 0$.

56 Hence: it can be very difficult to find a good IV because the variable must satisfy 2 different, often conflicting, criteria.
For $motheduc$, the issue in doubt is whether condition $\mathrm{Cov}(motheduc, u) = 0$ holds.
For $frstqrt$, the concern is with condition $\theta_1 \neq 0$. Since this can be tested, $frstqrt$ has more appeal as an instrument.
- However, the partial correlation between $educ$ and $frstqrt$ is small, and this can lead to finite sample problems.

57 Another issue: the sense in which we are estimating the return to education for the entire population of working people. Suppose: - the return to education is not constant across people - we use frstqrt as an IV to estimate the return to education Then: the IV results estimate the return only for those people induced to obtain more schooling because they were born in the 1st quarter of the year. These make up a relatively small fraction of the population.

58 Convincing instruments sometimes arise in the context of program evaluation Suppose: individuals are randomly selected to be eligible for the program. - Examples: job training programs and school voucher programs. Actual participation is almost always voluntary, and it may be endogenous because it can depend on unobserved factors that affect the response. However: reasonable to assume that eligibility is exogenous. Because job training and eligibility are correlated, the eligibility can be used as an IV for job training.

59 Natural experiments are a common source of instrumental variables.
Natural experiment: some (often unintended) feature of the setup we are studying produces exogenous variation in an otherwise endogenous explanatory variable.
Example: the quarter of birth instrument of Angrist and Krueger (1991) seems, at least initially, to be a good natural experiment.

60 Sensible IVs need not come from natural experiments.
Economists often use regional variation in prices or taxes as instruments for endogenous explanatory variables appearing in individual-level equations.
Example:
- Suppose we want to estimate the effects of alcohol consumption on performance in college
- The local price of alcohol can be used as an IV for alcohol consumption, provided other regional factors that affect college performance have been appropriately controlled for.
- Idea: the price of alcohol can be assumed to be exogenous to each individual.

61 Example: College Proximity as an IV for Education
Card (1995) uses wage data for 1976 and a dummy variable indicating whether a man grew up in the vicinity of a four-year college as an IV for years of schooling.
He also includes several other controls: experience and experience squared, a black indicator, southern and urban indicators, and regional and urban indicators.
- IV estimate of the return to schooling: 13.2%
- OLS estimate of the return to schooling: 7.5%

62 Thus, the IV estimate is almost twice the OLS estimate. This is a counterintuitive result if we thought that OLS suffered from an upward omitted variable bias.
One possibility: the OLS estimator suffers from attenuation bias as a result of measurement error (but the classical errors-in-variables assumption for education is questionable).
Another possibility: the instrument is not exogenous in the wage equation, because location is not entirely exogenous.

63 Multiple Instruments: Two-Stage Least Squares
Consider again the linear population model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
- $E(u) = 0$: the error (in the population) has mean zero
- $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K-1$: the error (in the population) is uncorrelated with each of the first $K-1$ regressors
- $x_K$ can be correlated with $u$.
Now: assume that we have more than one instrumental variable for $x_K$.

64 Let $z_1, z_2, \dots, z_M$ be variables such that
$\mathrm{Cov}(z_h, u) = 0$, $h = 1, 2, \dots, M$
so each $z_h$ is exogenous in the population model.
If each of the $z_h$ has some partial correlation with $x_K$, we could have $M$ different IV estimators.
Actually, there are many more than this (more than we can count), since any linear combination of $x_1, x_2, \dots, x_{K-1}, z_1, z_2, \dots, z_M$ is uncorrelated with $u$.
The question is: which IV estimator should we use?

65 Under standard assumptions (notably homoskedasticity), the two-stage least squares (2SLS) estimator is the most efficient IV estimator.
Define the vector of exogenous variables again by
$z \equiv (1, x_1, x_2, \dots, x_{K-1}, z_1, \dots, z_M)$
- This is a $1 \times L$ vector ($L = K + M$).
Out of all possible linear combinations of $z$ that can be used as an instrument for $x_K$, the method of 2SLS chooses the one that is most highly correlated with $x_K$.
The linear combination of $z$ most highly correlated with $x_K$ is given by the linear projection of $x_K$ on $z$.

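66 Write the reduced form for $x_K$ as
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$
- By definition: $r_K$ has zero mean and is uncorrelated with each RHS variable.
As any linear combination of $z$ is uncorrelated with $u$,
$x_K^* \equiv \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M$
is uncorrelated with $u$. (Here $x_K^*$ denotes the linear projection of $x_K$ on $z$, not the mismeasured variable of the measurement error slides.)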

67 If we could observe $x_K^*$, we would use it as an instrument for $x_K$ in the structural equation.
However: $\delta_j$ and $\theta_j$ are population parameters $\Rightarrow$ $x_K^*$ is not a usable instrument.
But: we can consistently estimate (by OLS) the parameters in the equation
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$

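68 The sample analogues of the $x_{iK}^*$ for each observation $i$ are the OLS fitted values:
$\hat{x}_{iK} = \hat\delta_0 + \hat\delta_1 x_{i1} + \hat\delta_2 x_{i2} + \dots + \hat\delta_{K-1} x_{i,K-1} + \hat\theta_1 z_{i1} + \dots + \hat\theta_M z_{iM}$
For each observation $i$, define the vector
$\hat{x}_i \equiv (1, x_{i1}, x_{i2}, \dots, x_{i,K-1}, \hat{x}_{iK})$, $i = 1, 2, \dots, N$.
Using $\hat{x}_i$ as the instruments for $x_i$ gives the IV estimator
$\hat\beta = \left( \sum_{i=1}^{N} \hat{x}_i' x_i \right)^{-1} \left( \sum_{i=1}^{N} \hat{x}_i' y_i \right) = (\hat{X}'X)^{-1} \hat{X}'Y$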

69 $\hat\beta$ can be obtained from the following steps:
1. First-stage regression: obtain the fitted values $\hat{x}_K$ from the regression of
$x_K$ on $1, x_1, \dots, x_{K-1}, z_1, \dots, z_M$
(the $i$ subscript is omitted for simplicity)
2. Second-stage regression: run the OLS regression of
$y$ on $1, x_1, \dots, x_{K-1}, \hat{x}_K$
This 2nd stage produces the $\hat\beta$.
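The two steps can be sketched in NumPy as follows. This is an illustration on simulated data with hypothetical parameter values and $M = 2$ instruments; it also checks numerically that the second-stage OLS coefficients coincide with the IV form $(\hat{X}'X)^{-1}\hat{X}'Y$. Note that the standard errors reported by an ordinary second-stage OLS routine would not be valid, since its residuals are computed with $\hat{x}_K$ rather than $x_K$.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n, M = 200_000, 2
beta = np.array([1.0, 0.5, 2.0])          # hypothetical (beta_0, beta_1, beta_K)

x1 = rng.normal(size=n)                   # exogenous regressor
Z_excl = rng.normal(size=(n, M))          # instruments z_1, z_2
q = rng.normal(size=n)                    # omitted variable inside u
xK = 0.5 * x1 + Z_excl @ np.array([1.0, 0.5]) + q + rng.normal(size=n)
u = q + rng.normal(size=n)
y = beta[0] + beta[1] * x1 + beta[2] * xK + u

const = np.ones(n)
X = np.column_stack([const, x1, xK])
Z = np.column_stack([const, x1, Z_excl])  # all exogenous variables

# 1st stage: regress x_K on 1, x_1, z_1, ..., z_M and keep the fitted values.
xK_hat = Z @ ols(Z, xK)

# 2nd stage: regress y on 1, x_1, x_K-hat.
X_hat = np.column_stack([const, x1, xK_hat])
b_2sls = ols(X_hat, y)

# Same estimator written as an IV estimator with X-hat as instruments.
b_iv_form = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print("2SLS (two-step):", b_2sls)
print("2SLS (IV form): ", b_iv_form)
```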

70 Alert: common harmful mistake
It is possible to show that the following seemingly sensible two-step procedure is generally inconsistent:
1st stage: regress $x_K$ on $1, z_1, \dots, z_M$ and obtain the fitted values, say $\tilde{x}_K$.
2nd stage: run the regression of $y$ on $1, x_1, \dots, x_{K-1}, \tilde{x}_K$.
You CANNOT omit $x_1, \dots, x_{K-1}$ in the 1st-stage regression.
It is best to use a software package with a 2SLS command rather than explicitly carry out the two-step procedure.
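A simulated comparison of the correct first stage with the mistaken one. The design is hypothetical and built so that $x_1$ is correlated with the instrument, which is the situation in which dropping $x_1$ from the first stage does damage. With these assumed values the mistaken procedure converges to about 1.6 instead of the true $\beta_K = 2$.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(5)
n = 400_000

z1 = rng.normal(size=n)                   # instrument
x1 = 0.5 * z1 + rng.normal(size=n)        # exogenous, but correlated with z1
q = rng.normal(size=n)                    # omitted variable inside u
xK = 0.5 * x1 + 1.0 * z1 + q + rng.normal(size=n)
u = q + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * xK + u         # hypothetical beta_K = 2
const = np.ones(n)

# Correct 2SLS: the first stage includes x1.
Z = np.column_stack([const, x1, z1])
xK_hat = Z @ ols(Z, xK)
b_right = ols(np.column_stack([const, x1, xK_hat]), y)

# Mistaken procedure: the first stage omits x1.
Z_bad = np.column_stack([const, z1])
xK_tilde = Z_bad @ ols(Z_bad, xK)
b_wrong = ols(np.column_stack([const, x1, xK_tilde]), y)

print("correct 2SLS, coefficient on x_K:      ", b_right[2])
print("mistaken two-step, coefficient on x_K: ", b_wrong[2])
```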

71 Remember that in the case of 1 endogenous variable and 1 instrument, we had the condition that in the linear projection of $x_K$ onto all the exogenous variables:
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
we had $\theta_1 \neq 0$ (the coefficient on $z_1$ is nonzero).
What is the analogue of this condition when more than 1 instrument is available with 1 endogenous explanatory variable?

72 When more than 1 instrument is available with 1 endogenous explanatory variable, we need at least one of the $\theta_j$ in the equation
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$
to be $\neq 0$.
That is: we need at least 1 exogenous variable that does not appear in the structural equation to induce variation in $x_K$ that cannot be explained by $x_1, \dots, x_{K-1}$.
Remark: identification of $\beta$ does not depend on the values of the $\delta_h$ in this equation.

73 To test this condition, we simply test the null hypothesis
$H_0: \theta_1 = 0, \theta_2 = 0, \dots, \theta_M = 0$
against the alternative that at least one of the $\theta_j$ is $\neq 0$.
This test gives a compelling reason for explicitly running the 1st-stage regression. A standard F statistic can be used to test this hypothesis.
If we cannot reject this hypothesis against the alternative that at least one of the $\theta_j$ is $\neq 0$, then we should have serious reservations about the 2SLS procedure: the instruments do not pass a minimal requirement.
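A sketch of the first-stage F statistic, computed from restricted and unrestricted sums of squared residuals under the usual homoskedastic formula. The function name `first_stage_F` and the simulated data are assumptions for illustration; a p-value would additionally require the F distribution with $(M, N - L)$ degrees of freedom, which is left out here to keep the example NumPy-only.

```python
import numpy as np

def first_stage_F(x_endog, X_exog, Z_excl):
    """F statistic for H0: all coefficients on the excluded instruments are zero."""
    n = len(x_endog)
    const = np.ones(n)
    W_r = np.column_stack([const, X_exog])            # restricted: no instruments
    W_ur = np.column_stack([const, X_exog, Z_excl])   # unrestricted: full first stage

    def ssr(W):
        resid = x_endog - W @ np.linalg.lstsq(W, x_endog, rcond=None)[0]
        return resid @ resid

    M = W_ur.shape[1] - W_r.shape[1]                  # number of restrictions
    return ((ssr(W_r) - ssr(W_ur)) / M) / (ssr(W_ur) / (n - W_ur.shape[1]))

rng = np.random.default_rng(6)
n = 5_000
x1 = rng.normal(size=n)
Z_relevant = rng.normal(size=(n, 2))       # instruments that do move x_K
Z_irrelevant = rng.normal(size=(n, 2))     # instruments unrelated to x_K
xK = 0.5 * x1 + Z_relevant @ np.array([0.3, 0.2]) + rng.normal(size=n)

F_strong = first_stage_F(xK, x1, Z_relevant)
F_irrelevant = first_stage_F(xK, x1, Z_irrelevant)
print("F with relevant instruments:  ", F_strong)
print("F with irrelevant instruments:", F_irrelevant)
```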

74 The model with:
- 1 endogenous variable
- $M > 1$ instruments
is overidentified (there are $M - 1$ overidentifying restrictions).
If each $z_h$ has some partial correlation with $x_K$, then we have $M - 1$ more exogenous variables than needed to identify the parameters in the structural equation.

75 Example:
- 1 endogenous variable
- $M = 2$
There is 1 overidentifying restriction $\Rightarrow$ we could discard 1 of the instruments and still achieve identification.

76 Testing Overidentifying Restrictions
Suppose we have more instruments than we need to identify an equation.
Then: we can test whether the additional instruments are valid in the sense that they are uncorrelated with $u_1$.
Write the equation in the form:
$y_1 = z_1 \delta_1 + y_2 \alpha_1 + u_1$
- $y_2$: $1 \times G_1$ vector of endogenous variables in the population model
- $z_1$: $1 \times L_1$
- $z_2$: $1 \times L_2$
- $z$: $1 \times L$ vector of all exogenous variables ($z = (z_1, z_2)$)
- $L_2 > G_1$: the model is overidentified

77 We could use any $1 \times G_1$ subset of $z_2$ as instruments for $y_2$ in estimating the equation
$y_1 = z_1 \delta_1 + y_2 \alpha_1 + u_1$
Hausman (1978) suggested comparing the 2SLS estimator using all instruments to 2SLS using a subset that just identifies the equation.
If all instruments are valid, the estimates should differ only as a result of sampling error.

78 Test the validity of all overidentification restrictions: - based on the observation that the residuals from 2SLS should be uncorrelated with the set of exogenous variables if the instruments are truly exogenous

79 A test for validity of the overidentification restrictions is obtained from the OLS regression of
$\hat{u}_1$ on $z$
where $\hat{u}_1$ are the 2SLS residuals using all of the instruments $z$ (simply estimate the regression $y_1 = z_1 \delta_1 + y_2 \alpha_1 + u_1$ by 2SLS and obtain the 2SLS residuals, $\hat{u}_1$). Obtain $R_u^2$, the usual R-squared.
The null hypothesis is that all instruments are valid: $E(z'u_1) = 0$.
Under the null, $N R_u^2 \overset{a}{\sim} \chi^2_{Q_1}$, where $Q_1 \equiv L_2 - G_1$ is the number of overidentifying restrictions.
If we reject the null hypothesis, then our logic for choosing the IVs must be reexamined (1 or more of the IVs are not exogenous).
If we fail to reject the null, then we can have some confidence in the overall set of instruments used.
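The $N R_u^2$ statistic can be sketched as below. The function name `overid_stat` and the simulated design (3 instruments for 1 endogenous variable, so $Q_1 = 2$) are assumptions for illustration. Two outcomes are generated: one in which all instruments are valid, and one in which the third instrument enters the error term directly. For reference, the 5% critical value of $\chi^2_2$ is about 5.99.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares (y may have several columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def overid_stat(y, X, Z):
    """N * R^2 from regressing the 2SLS residuals on all exogenous variables Z."""
    X_hat = Z @ ols(Z, X)                    # first stage for every column of X
    b_2sls = ols(X_hat, y)
    u_hat = y - X @ b_2sls                   # 2SLS residuals use the actual X
    e = u_hat - Z @ ols(Z, u_hat)
    r2 = 1.0 - (e @ e) / np.sum((u_hat - u_hat.mean()) ** 2)
    return len(y) * r2

rng = np.random.default_rng(7)
n = 5_000
x1 = rng.normal(size=n)
Z_excl = rng.normal(size=(n, 3))             # z_1, z_2, z_3
q = rng.normal(size=n)
xK = 0.5 * x1 + Z_excl.sum(axis=1) + q + rng.normal(size=n)

u_valid = q + rng.normal(size=n)             # all instruments exogenous
u_invalid = u_valid + 0.3 * Z_excl[:, 2]     # z_3 is correlated with the error

const = np.ones(n)
X = np.column_stack([const, x1, xK])
Z = np.column_stack([const, x1, Z_excl])

stat_valid = overid_stat(1.0 + 0.5 * x1 + 2.0 * xK + u_valid, X, Z)
stat_invalid = overid_stat(1.0 + 0.5 * x1 + 2.0 * xK + u_invalid, X, Z)
print("N*R^2, all instruments valid:", stat_valid)
print("N*R^2, one invalid instrument:", stat_invalid)
```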

80 Example: Overidentifying Restrictions in the Wage Equation
Estimate for working women the equation
$\log(wage) = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \alpha_1\, educ + u_1$
- $educ$ and $u_1$ may be correlated
Use $motheduc$, $fatheduc$, $huseduc$ (mother's education, father's education, husband's education) as instruments for $educ$ in a 2SLS procedure $\Rightarrow$ 2 overidentifying restrictions.

81 Let $\hat{u}_1$ be the 2SLS residuals from the equation
$\log(wage) = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \alpha_1\, educ + u_1$
using all instruments.
The test statistic is $N$ times the R-squared from the OLS regression of
$\hat{u}_1$ on $1, exper, exper^2, motheduc, fatheduc, huseduc$
Under $H_0$, $N R_u^2 \overset{a}{\sim} \chi^2_{Q_1}$ with $Q_1 = 2$.
Using the data on working women in Wooldridge: $R_u^2 =$ [...]; overidentification test statistic: [...]; p-value: [...]
$\Rightarrow$ the overidentifying restrictions are not rejected at any reasonable level.
$\Rightarrow$ we can have some confidence in the overall set of instruments used.

82 Example: testing the validity of $z_4$ as an IV, assuming that $z_3$ is a valid IV.
- Consider the model:
$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u$
- $y_2$ is (possibly) endogenous
- $z_3$ and $z_4$ are IVs
Test:
- We run 2SLS with $z_3$ as the only IV
- Compute $\hat{u}_3 = y_1 - \hat\beta_0 - \hat\beta_1 y_2 - \hat\beta_2 z_1 - \hat\beta_3 z_2$
- Estimate the regression model $\hat{u}_3 = \delta_0 + \delta_1 z_4 + error$ and, in particular, test the significance of $z_4$.
This is a valid test for the validity of $z_4$ as an IV. BUT it needs to assume that $z_3$ is a valid IV.

83 Next paper:
*Angrist, Joshua D., and Alan B. Krueger. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments." Journal of Economic Perspectives, 15(4).

84 The earliest applications of instrumental variables involved the estimation of demand and supply curves.
Several economists were interested in estimating the elasticities of demand and supply for products ranging from herring (a fish) to butter, usually with time series data.
If the demand and supply curves shift over time, the observed data on quantities and prices reflect a set of equilibrium points on both curves.
$\Rightarrow$ An OLS regression of quantities on prices fails to identify (that is, trace out) either the supply or the demand relationship.

85 P.G. Wright (1928) applied instrumental variables to estimating the elasticities of supply and demand for flaxseed, the source of linseed oil.
Wright suggested that certain "curve shifters" (what we would now call IVs) can be used to address the problem:
"Such additional factors may be factors which: (A) affect demand conditions without affecting cost conditions or which (B) affect cost conditions without affecting demand conditions."
- Demand curve shifter: the price of a substitute good (such as cottonseed)
- Supply curve shifter: yield per acre, which is primarily determined by the weather.

86 Wright (1928) observed: "Success with this method depends on success in discovering factors of the type A and B."
He used 6 different supply shifters to estimate the demand curve and then averaged the 6 instrumental variables estimates.
The resulting average elasticity of demand for flaxseed was [...]. His average instrumental variables estimate of the elasticity of supply was 2.4.
Wright's econometric advance went unnoticed by the subsequent literature. Not until the 1940s were IVs and related methods rediscovered and extended.

87 Wright's (1928) method of averaging the different instrumental variables estimates does not necessarily produce the most efficient estimate.
Other estimators may combine the information in different instruments to produce an estimate with less sampling variability.
The most efficient way to combine multiple instruments is usually 2SLS (developed by Theil (1953)):
1st stage: the endogenous RHS variable (price in this application) is regressed on all the instruments.
2nd stage: the predicted values of price (based on the data for the instruments and the coefficients estimated from the first-stage regression) are then plugged directly into the equation of interest in place of the endogenous regressor.

88 Instrumental Variables and Measurement Error: IV can also overcome measurement error problems in explanatory variables. Measurement error can arise for many reasons: - the limited ability of statistical agencies to collect accurate information - deviation between the variables specified in economic theory and those collected in practice.

89 If an explanatory variable is measured with additive random errors, then the coefficient on that variable in an OLS regression will be biased toward zero in a large sample. The higher the proportion of variability that is due to errors, the greater the bias. Instrumental variables provide a consistent estimate even in the presence of measurement error.
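A small simulation illustrates both claims. The setup is an assumption made for this sketch: the true slope is 1, the regressor is observed with additive noise of the same variance as the signal, and a second, independently mismeasured report of the same variable serves as the instrument. With equal signal and noise variances, the OLS slope should shrink toward about half the true value.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

random.seed(2)
n = 20000
true_slope = 1.0

# True regressor and outcome (hypothetical data-generating process).
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [true_slope * xt + random.gauss(0, 0.5) for xt in x_true]

# Two noisy measurements of x_true with independent additive errors.
x_obs = [xt + random.gauss(0, 1) for xt in x_true]
x_second = [xt + random.gauss(0, 1) for xt in x_true]

# OLS on the mismeasured regressor: attenuated by Var(x) / (Var(x) + Var(error)).
b_ols = cov(y, x_obs) / cov(x_obs, x_obs)

# IV using the second measurement: its error is unrelated to the error in x_obs.
b_iv = cov(y, x_second) / cov(x_obs, x_second)

print(f"OLS slope (attenuated): {b_ols:.3f}")
print(f"IV slope:               {b_iv:.3f}")
```

The instrument works here only because the two measurement errors are independent of each other and of the outcome error; a second report that shares the first report's mistakes would not remove the bias.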

90 Instrumental Variables and Omitted Variables: IV helps overcome omitted variables problems in estimates of causal relationships. Studies of this type are usually concerned with estimating a narrowly defined causal relationship: - effect of schooling, training or military service on earnings - the impact of smoking or medical treatments on health - the effect of social insurance programs on labor supply - the effect of policing on crime. The observed association between the outcome and explanatory variable of interest in these and many other examples is likely to be misleading in the sense that it partly reflects omitted factors that are related to both variables.

91 If these factors could be measured and held constant in a regression, the omitted variables bias would be eliminated. One solution to the omitted variables problem: assign the variable of interest randomly. - Example: social experiments assign people to a job training program or to a control group. Random assignment assures that participation in the program is not correlated with omitted personal or social factors.

92 Randomized experiments are not always possible: we cannot force a randomly chosen group of people to quit smoking. On the other hand, it may be possible to find a degree of exogenous variation in variables like schooling, smoking and minimum wages.

93 How can instrumental variables solve the omitted variables problem? - Suppose that we would like to use the following cross-sectional regression equation to measure the return to schooling, denoted ρ: Y_i = α + ρ S_i + β A_i + ε_i - Y_i : person i's log wage - S_i : person i's highest grade of schooling completed - A_i : person i's ability or motivation. Data on A_i are typically unavailable.

94 Without additional information, the parameter of interest, ρ, is not identified (that is, we cannot deduce it from the joint distribution of earnings and schooling alone). Suppose we have a third variable, the instrument Z_i, which is correlated with schooling but otherwise unrelated to earnings. (That is, Z_i is uncorrelated with the omitted variables and the regression error ε_i.) IVs solve the omitted variables problem by using only part of the variability in schooling (specifically, a part that is uncorrelated with the omitted variables) to estimate the relationship between schooling and earnings.
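The identification argument can be written out in one step. Taking the covariance of both sides of the wage equation from slide 93 with the instrument, and using the two instrument conditions stated on this slide, gives the population IV formula. The notation is that of the slides; nothing else is assumed beyond a nonzero covariance between schooling and the instrument.

```latex
\begin{aligned}
\operatorname{Cov}(Y_i, Z_i)
  &= \rho\,\operatorname{Cov}(S_i, Z_i)
   + \beta\,\operatorname{Cov}(A_i, Z_i)
   + \operatorname{Cov}(\varepsilon_i, Z_i) \\
  &= \rho\,\operatorname{Cov}(S_i, Z_i)
  \quad\text{since } \operatorname{Cov}(A_i, Z_i) = \operatorname{Cov}(\varepsilon_i, Z_i) = 0, \\[4pt]
\rho &= \frac{\operatorname{Cov}(Y_i, Z_i)}{\operatorname{Cov}(S_i, Z_i)}
  \quad\text{provided } \operatorname{Cov}(S_i, Z_i) \neq 0 .
\end{aligned}
```

Replacing the population covariances with their sample counterparts gives the IV estimate of ρ, which requires no data on A_i.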

95 IV methods allow us to estimate the coefficient of interest consistently and free from asymptotic bias from omitted variables, without actually having data on the omitted variables or even knowing what they are. If there is more than one valid instrument, the coefficient of interest can be estimated by 2SLS. "Natural experiments" provide instruments that are used to overcome omitted variables bias: these are situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment. Good instrument: correlated with the endogenous regressor for reasons the researcher can verify and explain, but uncorrelated with the outcome variable for reasons beyond its effect on the endogenous regressor.

96 Maddala (1977, p. 154) asks, "Where do you get such a variable?" Angrist & Krueger: good instruments often come from detailed knowledge of the economic mechanism and institutions determining the regressor of interest. In the case of schooling, human capital theory suggests that people make schooling choices by comparing the costs and benefits of alternatives. Possible sources of instruments: 1) Differences in costs: loan policies or other subsidies; 2) Institutional constraints.

97 Angrist and Krueger (1991): use of natural experiments to eliminate omitted variables bias. Most states required students to enter school in the calendar year in which they turned 6, so school start age is a function of date of birth. Those born late in the year are young for their grade. - December 31st birthday cutoff: - Children born in the 4th quarter enter school at age 5 - Children born in the 1st quarter enter school at age 6. Compulsory schooling laws typically require students to remain in school until their 16th birthdays, so these groups of students will be in different grades when they reach the legal dropout age.

98 The combination of school start age policies and compulsory schooling laws creates a natural experiment in which children are compelled to attend school for different lengths of time depending on their birthdays. Use data from the 1980 census. Look at the relationship between educational attainment and quarter of birth for men born from 1930 to 1959.

100 Figure 1: - younger birth cohorts finished more schooling - men born early in the calendar year tend to have lower average schooling levels. This 10-year birth cohort was selected because men this age tend to have a relatively flat age-earnings profile. (But the pattern of less education for men born early in the year holds for men born in the 1940s and 1950s as well.) An individual's date of birth is probably unrelated to the person's innate ability, motivation or family connections, so date of birth should provide a valid instrument for schooling.

101

102 Figure 2: average earnings by quarter of birth for the same sample. - This figure shows the reduced form relationship between the instruments and the dependent variable. - Earnings rise with work experience, so older cohorts tend to have higher earnings. - On average, men born in early quarters of the year almost always earn less than those born later in the year. - This reduced form relationship parallels the quarter-of-birth pattern in schooling. Figures 1 and 2: it is clear that the differences in education and earnings associated with quarter of birth are discrete blips, rather than smooth changes related to the gradual effects of aging.

103 Intuition behind the IV: - differences in earnings by quarter of birth are assumed to be accounted for solely by differences in schooling by quarter of birth, so the estimated return to schooling is simply the appropriately rescaled difference in average earnings by quarter of birth. Only a small part of the variability in schooling (the part associated with quarter of birth) is used to identify the return to education.

104 They find: - men born in the first quarter have about 1/10 of a year less schooling than men born in later quarters - men born in the first quarter earn about 1 percent (roughly 0.01 log points) less than men born in later quarters - The ratio of the difference in earnings to the difference in schooling, about 0.10, is an IV estimate of the proportional earnings gain from an additional year of schooling.
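The ratio on this slide is a Wald estimate: the difference in mean outcomes between two instrument groups divided by the difference in mean treatment. The sketch below shows the arithmetic only. The group means are hypothetical round numbers, chosen so that the schooling gap is 0.1 year and the log-earnings gap is 0.01; they are not the published Angrist and Krueger figures.

```python
def wald_estimate(mean_y_group1, mean_y_group0, mean_s_group1, mean_s_group0):
    """Wald (grouping) IV estimate for a binary instrument.

    Divides the between-group difference in the mean outcome by the
    between-group difference in the mean of the endogenous regressor.
    """
    return (mean_y_group1 - mean_y_group0) / (mean_s_group1 - mean_s_group0)

# Hypothetical group means (illustrative assumptions, not census values):
# group1 = men born in the first quarter, group0 = men born in later quarters.
log_wage_q1, log_wage_later = 5.89, 5.90      # mean log weekly wage
schooling_q1, schooling_later = 12.69, 12.79  # mean years of schooling

return_to_schooling = wald_estimate(
    log_wage_q1, log_wage_later, schooling_q1, schooling_later
)
print(f"Wald estimate of the return to schooling: {return_to_schooling:.3f}")
```

A log-earnings gap of 0.01 over a schooling gap of 0.1 year gives a ratio of about 0.10, that is, roughly a 10 percent earnings gain per additional year of schooling.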

105 Need a well-developed story or model motivating the choice of instruments. These stories have implications that can be used to support or refute a behavioral interpretation of the IV estimates. Example: the interpretation of Figures 1 and 2 as resulting from the interaction of school start-age policy and compulsory schooling. - Support for this interpretation: quarter of birth is unrelated to earnings and educational attainment for those with a college degree or higher. - Those with a college degree or higher are unconstrained by compulsory schooling laws, so if quarter of birth were related to education or earnings in this sample, the rationale motivating the use of quarter of birth as an instrument would have been refuted.

106 Interpreting Estimates with Heterogeneous Responses: Difficulty in interpreting IV estimates: not every observation's behavior is affected by the instrument. IV methods use only part of the variation in an explanatory variable, that is, they change the behavior of only some people. Example: Angrist and Krueger (1991) study - the quarter-of-birth instrument is most relevant for those who are at high probability of quitting school as soon as possible, with little or no effect on those who are likely to proceed on to college.

107 In other words: - IVs provide an estimate for a specific group (namely, people whose behavior can be manipulated by the instrument). In the example: the quarter-of-birth instruments used by Angrist and Krueger (1991) generate an estimate for those whose level of schooling was changed by that instrument.

108 Angrist and Krueger (1991) view: - IVs often solve the first-order problem of eliminating omitted variables bias for a well-defined population. - Since the sample size and range of variability in many empirical studies are quite limited, extrapolation to other populations is naturally somewhat speculative. (A fertilizer that helps corn to grow in Iowa will probably have a beneficial effect in California as well, though one can't be sure.) - Existence of heterogeneous treatment effects would be a reason for analyzing more natural experiments, not fewer, to understand the source and extent of heterogeneity in the effect of interest.

109 Besides that: - the population one learns about in a natural experiment is often of intrinsic interest - Example: Angrist and Krueger (1991) IV estimates are relevant for assessing the economic rewards to increases in schooling induced by legal and institutional changes from policies designed to keep children from dropping out of high school
