Topics in Applied Econometrics and Development - Spring 2014


1 Topic 2: Topics in Applied Econometrics and Development - Spring 2014

2 Single-Equation Linear Model
The population model is linear in its parameters:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
- $y, x_1, x_2, \dots, x_K$: observable random scalars (we can observe them in a random sample of the population)
- $u$: unobservable random disturbance or error
- $\beta_0, \beta_1, \beta_2, \dots, \beta_K$: parameters (constants) we want to estimate

3 The error term $u$ can consist of a variety of things, including:
- omitted variables
- measurement error
Key conditions for OLS to consistently estimate the $\beta_j$:
- $E(u) = 0$: the error (in the population) has mean zero
- $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K$: the error (in the population) is uncorrelated with each of the regressors

4 $E(u) = 0$: this assumption is without loss of generality (WLOG) when an intercept is included (we consider this case in what follows).
Sufficient for $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K$, is to have:
$E(u \mid x_1, x_2, \dots, x_K) = E(u \mid x) = 0$ (zero conditional mean assumption)

5 Under the population model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$ and the assumption $E(u \mid x_1, x_2, \dots, x_K) = 0$, we have the population regression function:
$E(y \mid x_1, x_2, \dots, x_K) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$

6 The population regression function
$E(y \mid x_1, x_2, \dots, x_K) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$
includes the case where the $x_j$ are nonlinear functions of underlying explanatory variables.
Example:
$E(savings \mid income, size, age, college) = \beta_0 + \beta_1 \log(income) + \beta_2\, size + \beta_3\, age + \beta_4\, college + \beta_5\, college \cdot age$

7 $x_j$ is endogenous if it is correlated with $u$.
$x_j$ is exogenous if it is uncorrelated with $u$.
Endogeneity usually arises in 3 ways:
1) Omitted variables
2) Measurement error
3) Simultaneity

8 1) Omitted variables
Omitted variables: we want to control for one or more additional variables but, usually because of data unavailability, we cannot include them in a regression model.
Suppose that $E(y \mid x, q)$ is the conditional expectation of interest, which can be written as a function linear in parameters and additive in $q$.
- $q$ is unobserved
- we can estimate $E(y \mid x)$, but this will have no particular relationship to $E(y \mid x, q)$ when $q$ and $x$ are correlated.

10 We can represent this situation as follows:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + (q + u)$
- $q$ is part of the error term
- $q$ and $x_j$ correlated $\Rightarrow$ $x_j$ endogenous
Could you give an example in which this problem is found?

11 Correlation of explanatory variables with unobservables is often due to self-selection. If agents choose the value of $x_j$, this might depend on factors $q$ that are unobservable to the analyst.
Example:
- Omitted ability in a wage equation. An individual's years of schooling are likely to be correlated with unobserved ability.

12 Example: Wage Equation with Unobserved Ability
Write the wage equation as:
$\log(wage) = \beta_0 + \beta_1\, exper + \beta_2\, exper^2 + \beta_3\, educ + \gamma\, abil + v$
- $v$ is such that $E(v \mid exper, educ, abil) = 0$
- Let $abil$ be uncorrelated with $exper$ and $exper^2$ once $educ$ has been partialed out: $abil = \delta_0 + \delta_3\, educ + r$, with $r$ uncorrelated with $exper$ and $exper^2$
- Then: $\mathrm{plim}\, \hat\beta_3 = \beta_3 + \gamma\delta_3$, and the coefficients on $exper$ and $exper^2$ are consistently estimated by the OLS regression that omits ability
- If $\delta_3 > 0$ then $\mathrm{plim}\, \hat\beta_3 > \beta_3$ (because $\gamma > 0$ by definition) $\Rightarrow$ the return to education is overestimated in large samples.
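
The probability limit above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: all parameter values (`beta3`, `gamma`, `delta3`, the distributions of `exper` and `educ`) are hypothetical, chosen only so that ability is related to education and unrelated to experience.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population parameters (assumptions, not from the source).
beta3, gamma, delta3 = 0.08, 0.50, 0.10

exper = rng.uniform(0.0, 30.0, n)
educ = rng.normal(12.0, 2.0, n)                  # drawn independently of exper
abil = delta3 * educ + rng.normal(0.0, 1.0, n)   # abil = delta0 + delta3*educ + r, with delta0 = 0
v = rng.normal(0.0, 0.5, n)
log_wage = (1.0 + 0.03 * exper - 0.0005 * exper**2
            + beta3 * educ + gamma * abil + v)

# OLS of log(wage) on (1, exper, exper^2, educ): ability is omitted.
X = np.column_stack([np.ones(n), exper, exper**2, educ])
coef = np.linalg.lstsq(X, log_wage, rcond=None)[0]

exper_coef = coef[1]
educ_coef = coef[3]
predicted_plim = beta3 + gamma * delta3          # beta3 + gamma*delta3

print(f"OLS coefficient on educ: {educ_coef:.4f}")
print(f"beta3 + gamma*delta3:    {predicted_plim:.4f}")
```

In this design the coefficient on educ settles near $\beta_3 + \gamma\delta_3$ rather than $\beta_3$, while the coefficient on exper stays close to its true value.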

13 2) Measurement Error
Measurement Error: we want to measure the effect of a variable, say $x_K^*$, but we can observe only an imperfect measure of it, say $x_K$. When we plug $x_K$ in for $x_K^*$, we put a measurement error into $u$.
- $u$ and $x_K$ may be correlated.

14 Remark
Measurement error is an issue only when the variables on which we can collect data differ from the variables that influence decisions by individuals, families, firms, and so on.
Example:
- Suppose we are estimating the effect of peer group behavior on teenage drug usage
- The behavior of a teenager's peer group is self-reported
- Self-reporting may be a mismeasure of actual peer group behavior.
BUT: this is not a problem if we are more interested in the effects of how a teenager perceives his peer group.

15 Measurement Error in the Dependent Variable
Suppose the dependent variable is the only variable measured with error.
$y^*$: the variable that we would like to explain.
The regression model is linear:
$y^* = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + v$
For example, $y^*$ could be annual family saving.

16 We are interested in $E(y^* \mid x_1, x_2, \dots, x_K)$
- $y$: observable measure of $y^*$ ($y \neq y^*$)
- $e_0 = y - y^*$: population measurement error

17 From
$y^* = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + v$
and
$y^* = y - e_0$
we can write:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + v + e_0$

18 Since $y, x_1, x_2, \dots, x_K$ are observed, we can estimate this model by OLS. We just ignore that $y$ is an imperfect measure of $y^*$ and proceed as usual.
Does OLS with $y$ in place of $y^*$ produce consistent estimators of the $\beta_j$?
- Important: what we assume about the relationship between the measurement error $e_0$ and the explanatory variables $x_j$
- Usual assumption: measurement error in $y$ is statistically independent of each explanatory variable ($e_0$ is uncorrelated with $x$) $\Rightarrow$ OLS estimators are consistent. Further, the usual OLS inference procedures (t statistics, F statistics) are asymptotically valid

19 Measurement Error in an Explanatory Variable
Measurement error in an explanatory variable is considered a much more important problem than measurement error in the response variable.
Consider the model with a single explanatory variable measured with error:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K^* + v$
- $y, x_1, x_2, \dots, x_{K-1}$: observable
- $x_K^*$: not observable
- Assume: 1) $v$ has zero mean; 2) $v$ is uncorrelated with $x_1, x_2, \dots, x_{K-1}, x_K^*$
If $x_K^*$ were observed, OLS estimation would produce consistent estimators.

20 We observe $x_K$, a measure of $x_K^*$.
Assumption: $v$ is also uncorrelated with $x_K$.
The measurement error in the population is $e_K = x_K - x_K^*$, and this can be positive, negative, or zero.
Assume: $E(e_K) = 0$ (average measurement error in the population is zero). This has no practical consequences because we include an intercept in the model.
Since $v$ is assumed to be uncorrelated with $x_K^*$ and $x_K$, $v$ is also uncorrelated with $e_K$.

21 Usual assumption: $e_K$ is uncorrelated with the explanatory variables not measured with error: $E(x_j e_K) = 0$, $j = 1, \dots, K-1$.
The key assumptions involve the relationship between the measurement error and $x_K^*$ and $x_K$ $\Rightarrow$ 2 assumptions in the econometrics literature

22 1st assumption: $e_K$ is uncorrelated with the observed measure, $x_K$: $\mathrm{Cov}(x_K, e_K) = 0$ $\Rightarrow$ $e_K$ is correlated with the unobserved variable $x_K^*$

23 From
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K^* + v$
and
$x_K^* = x_K - e_K$
we have
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + (v - \beta_K e_K)$
We assumed: $v$ and $e_K$ both have zero mean and are uncorrelated with each $x_j$ (including $x_K$) $\Rightarrow$ $v - \beta_K e_K$ has zero mean and is uncorrelated with the $x_j$ $\Rightarrow$ OLS estimation with $x_K$ in place of $x_K^*$ produces consistent estimators of all of the $\beta_j$

24 Since $v$ is uncorrelated with $e_K$, the variance of the error is
$\mathrm{Var}(v - \beta_K e_K) = \sigma_v^2 + \beta_K^2 \sigma_{e_K}^2$
$\Rightarrow$ measurement error increases the error variance. This is not surprising and doesn't violate any of the OLS assumptions.
Since in this case OLS has all its nice properties, this is not usually what econometricians have in mind when referring to measurement error in an explanatory variable.

25 2nd assumption: measurement error is uncorrelated with the unobserved explanatory variable: $\mathrm{Cov}(x_K^*, e_K) = 0$
This is the classical errors-in-variables (CEV) assumption.
This assumption comes from writing the observed measure as the sum of the true explanatory variable and the measurement error, $x_K = x_K^* + e_K$, and assuming the two components of $x_K$ are uncorrelated.
We still maintain the assumptions that $v$ is uncorrelated with $x_K^*$ and $x_K$, and therefore with $e_K$

26 $\mathrm{Cov}(x_K^*, e_K) = 0$ and $x_K = x_K^* + e_K$ $\Rightarrow$ $x_K$ and $e_K$ must be correlated:
$\mathrm{Cov}(x_K, e_K) = E(x_K e_K) = E(x_K^* e_K) + E(e_K^2) = \sigma_{e_K}^2$
From $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + (v - \beta_K e_K)$, we see that correlation between $x_K$ and $e_K$ causes problems for OLS.
OLS regression of $y$ on $x_1, x_2, \dots, x_K$ gives inconsistent estimators of all of the $\beta_j$

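27 If $x_K^*$ is uncorrelated with $x_j$, all $j \neq K$, then so is $x_K$. It follows that $\mathrm{plim}\, \hat\beta_j = \beta_j$, all $j \neq K$.
It is possible to show that
$\mathrm{plim}\, \hat\beta_K = \beta_K \left( \dfrac{\sigma_{r_K^*}^2}{\sigma_{r_K^*}^2 + \sigma_{e_K}^2} \right)$
where $r_K^*$ is the linear projection error in
$x_K^* = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + r_K^*$
Since $0 < \dfrac{\sigma_{r_K^*}^2}{\sigma_{r_K^*}^2 + \sigma_{e_K}^2} < 1$, $|\mathrm{plim}\, \hat\beta_K| < |\beta_K|$. This is the attenuation bias in OLS due to classical errors-in-variables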

28 Attenuation bias
In large samples, the estimated OLS effect will be attenuated as a result of the presence of classical errors-in-variables.
- If $\beta_K > 0$, $\hat\beta_K$ will underestimate $\beta_K$
- If $\beta_K < 0$, $\hat\beta_K$ will overestimate $\beta_K$
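
The attenuation factor $\sigma_{r_K^*}^2 / (\sigma_{r_K^*}^2 + \sigma_{e_K}^2)$ can be illustrated with a small simulation. This is a sketch under assumed values that are not in the original slides: one correctly measured regressor `x1`, a true coefficient `beta_K = 2`, and unit variances for both the projection error and the measurement error, so the predicted attenuation factor is 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta_K = 2.0                          # hypothetical true coefficient on x_K*

x1 = rng.normal(size=n)               # regressor measured without error
r_star = rng.normal(size=n)           # projection error r_K*, variance 1
x_star = 0.5 * x1 + r_star            # true regressor x_K* (unobserved in practice)
e_K = rng.normal(size=n)              # classical measurement error, variance 1
x_obs = x_star + e_K                  # observed measure x_K = x_K* + e_K
y = 1.0 + 1.0 * x1 + beta_K * x_star + rng.normal(size=n)

# OLS of y on (1, x1, x_obs), i.e. with x_K in place of x_K*.
X = np.column_stack([np.ones(n), x1, x_obs])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

attenuation = 1.0 / (1.0 + 1.0)       # sigma_r^2 / (sigma_r^2 + sigma_e^2)
predicted_plim = beta_K * attenuation

print(f"OLS coefficient on x_obs: {coef[2]:.3f}  (predicted plim {predicted_plim:.3f})")
print(f"OLS coefficient on x1:    {coef[1]:.3f}  (true value 1.0)")
```

Because `x_star` is correlated with `x1` in this design, the coefficient on `x1` is also pushed away from its true value, which matches the remark that all of the $\beta_j$ are in general estimated inconsistently.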

29 Example: Measurement Error in Family Income
Problem: Estimate the causal effect of family income on college grade point average, after controlling for high school grade point average and SAT score:
$colgpa = \beta_0 + \beta_1\, faminc^* + \beta_2\, hsgpa + \beta_3\, SAT + v$
- $faminc^*$ is actual annual family income
Precise data on colgpa, hsgpa, and SAT are relatively easy to obtain from school records. But family income, especially as reported by students, could be mismeasured.

30 Suppose $faminc = faminc^* + e_1$, and the CEV assumptions hold ($faminc$ and $e_1$ correlated) $\Rightarrow$ using reported family income in place of actual family income will bias the OLS estimator of $\beta_1$ toward zero.
Consequence: a hypothesis test of $H_0: \beta_1 = 0$ will have a higher probability of Type II error (one fails to reject a false null hypothesis)

32 3) Simultaneity
Simultaneity: at least one of the explanatory variables is determined simultaneously along with $y$. If $x_K$ is determined partly as a function of $y$, then $x_K$ and $u$ are generally correlated.
Could you give an example in which this problem is found?

33 Example
- $y$: city murder rate
- $x_K$: size of the police force
- The size of the police force is partly determined by the murder rate.

35 Remarks
The distinctions among the 3 forms of endogeneity are not always sharp. An equation can have more than one source of endogeneity.
Example:
- Suppose we are looking at the effect of alcohol consumption on worker productivity (measured by wages)
- We would worry that:
1) Alcohol usage is correlated with unobserved factors that also affect wage (e.g. family background) $\Rightarrow$ omitted variables
2) Alcohol demand generally depends on income $\Rightarrow$ simultaneity
3) Alcohol usage may be imprecisely measured $\Rightarrow$ measurement error

36 Motivation for Instrumental Variables
Consider again the linear population model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
- $E(u) = 0$: the error (in the population) has mean zero
- $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K-1$: the error (in the population) is uncorrelated with each of the first $K-1$ regressors
- BUT: $x_K$ might be correlated with $u$.
Hence:
- $x_1, x_2, \dots, x_{K-1}$: exogenous
- $x_K$: endogenous

37 The endogeneity can come from any of the sources we discussed: 1) Omitted variables; 2) Measurement error; 3) Simultaneity
To fix ideas: think of $u$ as containing an omitted variable that is uncorrelated with all explanatory variables except $x_K$.
As we saw: $\mathrm{Cov}(x_K, u) \neq 0$ $\Rightarrow$ OLS is inconsistent

38 Instrumental variables (IV) provide a general solution to the problem of an endogenous explanatory variable.
IV approach with $x_K$ endogenous:
- we need an observable variable, $z_1$, not in the population model, that satisfies 2 conditions
1st condition
- $\mathrm{Cov}(z_1, u) = 0$: $z_1$ must be uncorrelated with $u$
In other words: $z_1$ is exogenous in the population model. This is the exogeneity condition.

39 2nd condition
- Consider the linear projection of $x_K$ onto all the exogenous variables:
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
- Key assumption: $\theta_1 \neq 0$ (the coefficient on $z_1$ is nonzero)
- This condition is often loosely described as "$z_1$ is correlated with $x_K$". That statement is not quite correct. The condition $\theta_1 \neq 0$ means that $z_1$ is partially correlated with $x_K$ once the other exogenous variables $x_1, \dots, x_{K-1}$ have been netted out.

40 When $z_1$ satisfies $\mathrm{Cov}(z_1, u) = 0$ and $\theta_1 \neq 0$, then it is an instrumental variable (IV) candidate for $x_K$ (or, simply, $z_1$ is an instrument for $x_K$).
The linear projection
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
is called a reduced form equation for the endogenous explanatory variable $x_K$.
- The reduced form involves writing an endogenous variable as a linear projection onto all exogenous variables.
- This terminology also conveys that there is nothing necessarily structural about this equation

41 From the structural equation:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
and the reduced form for $x_K$:
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
we obtain a reduced form for $y$:
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$
where:
- $v = u + \beta_K r_K$: reduced form error
- $\alpha_j = \beta_j + \beta_K \delta_j$
- $\lambda_1 = \beta_K \theta_1$
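
The substitution behind the reduced form for $y$ can be written out term by term. The derivation below only plugs the reduced form for $x_K$ into the structural equation and collects terms; the summation notation is introduced here for compactness and is not in the original slides.

```latex
\begin{aligned}
y &= \beta_0 + \sum_{j=1}^{K-1} \beta_j x_j + \beta_K x_K + u \\
  &= \beta_0 + \sum_{j=1}^{K-1} \beta_j x_j
     + \beta_K \Bigl( \delta_0 + \sum_{j=1}^{K-1} \delta_j x_j + \theta_1 z_1 + r_K \Bigr) + u \\
  &= \underbrace{(\beta_0 + \beta_K \delta_0)}_{\alpha_0}
     + \sum_{j=1}^{K-1} \underbrace{(\beta_j + \beta_K \delta_j)}_{\alpha_j} x_j
     + \underbrace{\beta_K \theta_1}_{\lambda_1} z_1
     + \underbrace{(u + \beta_K r_K)}_{v}
\end{aligned}
```

One consequence, when $\theta_1 \neq 0$, is that $\beta_K = \lambda_1 / \theta_1$: the structural coefficient is the ratio of the two reduced form coefficients on $z_1$.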

42 By the assumptions, $v$ is uncorrelated with all explanatory variables in the reduced form equation
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$
$\Rightarrow$ OLS consistently estimates the reduced form parameters $\alpha_j$ and $\lambda_1$.
Estimates of the reduced form parameters are sometimes of interest in their own right. But estimating the structural parameters is generally more useful.

43 Example
Suppose:
- $x_K$: job training hours per worker
- $y$: measure of average worker productivity
- job training grants were randomly assigned to firms.
It is natural to use for $z_1$ either:
1) a binary variable indicating whether a firm received a job training grant
2) the actual amount of the grant per worker (if the amount varies by firm)

44 The parameter $\beta_K$ in the structural equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
is the effect of job training on worker productivity.
The parameter $\lambda_1$ in the reduced form equation
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$
is the effect of receiving the job training grant on worker productivity, which is of some interest. (But estimating the effect of an hour of general job training is more valuable.)

45 We can show that the assumptions we have made on the IV $z_1$ solve the identification problem for the $\beta_j$ in the structural equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
Identification means that we can write the $\beta_j$ in terms of population moments in observable variables.

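46 Write the structural equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
as
$y = x\beta + u$
where $x \equiv (1, x_2, \dots, x_K)$ (the constant is absorbed into $x$ as its first element, so $x$ is $1 \times K$).
Write $z \equiv (1, x_2, \dots, x_{K-1}, z_1)$ (vector of all exogenous variables).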

47 Since by assumption $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K-1$, and $\mathrm{Cov}(z_1, u) = 0$, we have:
$E(z'u) = 0$
Multiply equation $y = x\beta + u$ through by $z'$:
$z'y = z'x\beta + z'u$
Taking expectations:
$E(z'y) = E(z'x)\beta + E(z'u)$ $\Rightarrow$ $E(z'x)\beta = E(z'y)$
where $E(z'x)$ is $K \times K$ and $E(z'y)$ is $K \times 1$

48 Equation $E(z'x)\beta = E(z'y)$ represents a system of $K$ linear equations in the $K$ unknowns $\beta_1, \beta_2, \dots, \beta_K$.
This system has a unique solution if and only if the $K \times K$ matrix $E(z'x)$ has full rank:
$\mathrm{rank}\, E(z'x) = K$ (we need $\theta_1 \neq 0$ for this rank condition to hold)
In this case, the solution is
$\beta = [E(z'x)]^{-1} E(z'y)$
This equation identifies the vector $\beta$

49 Let $\{(x_i, y_i, z_{i1}) : i = 1, 2, \dots, N\}$ be a random sample from the population.
The IV estimator of $\beta$ is
$\hat\beta = \left( N^{-1} \sum_{i=1}^{N} z_i' x_i \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} z_i' y_i \right)$
which we can write as
$\hat\beta = (Z'X)^{-1} Z'Y$
- $Z$: $N \times K$ data matrix
- $X$: $N \times K$ data matrix
- $Y$: $N \times 1$ data vector on the $y_i$.
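
The matrix form $\hat\beta = (Z'X)^{-1} Z'Y$ translates almost directly into code. The sketch below is illustrative only: the helper name `iv_estimator` and the simulated design (one exogenous regressor, one instrument, an omitted factor `q` that makes `xK` endogenous, true coefficients 1, 2, 3) are assumptions, not part of the original slides.

```python
import numpy as np

def iv_estimator(Z, X, y):
    """Just-identified IV estimator: solves (Z'X) b = Z'y for b."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(1)
n = 100_000

x1 = rng.normal(size=n)                    # exogenous regressor
z1 = rng.normal(size=n)                    # instrument: excluded from y, moves xK
q = rng.normal(size=n)                     # omitted factor, ends up in u
xK = 0.5 * x1 + 1.0 * z1 + q + rng.normal(size=n)
u = q + rng.normal(size=n)                 # Cov(xK, u) != 0, Cov(z1, u) = 0
y = 1.0 + 2.0 * x1 + 3.0 * xK + u

const = np.ones(n)
X = np.column_stack([const, x1, xK])       # regressors (constant first)
Z = np.column_stack([const, x1, z1])       # exogenous variables, z1 replaces xK

b_iv = iv_estimator(Z, X, y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("IV :", np.round(b_iv, 3))
print("OLS:", np.round(b_ols, 3))
```

In this design OLS overstates the coefficient on `xK`, while the IV estimate is close to the true value of 3. Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience, not a change to the estimator.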

50 Remarks
When searching for instruments, conditions $\mathrm{Cov}(z_1, u) = 0$ and $\theta_1 \neq 0$ are equally important in identifying $\beta$.
However, there is one practically important difference between them:
- condition $\theta_1 \neq 0$ can be tested
- condition $\mathrm{Cov}(z_1, u) = 0$ must be maintained. The reason is that $u$ is unobservable.
Condition $\theta_1 \neq 0$ can and should be tested. In fact, the strength of the rejection of $H_0: \theta_1 = 0$ (in a p-value sense) is important for determining the finite sample properties, particularly the bias, of the IV estimator.

51 Example: Instrumental Variables for Education in a Wage Equation
Consider a wage equation for the U.S. population:
$\log(wage) = \beta_0 + \beta_1\, exper + \beta_2\, exper^2 + \beta_3\, educ + u$
where $u$ is thought to be correlated with $educ$ because of omitted ability.
Suppose that we collect data on mother's education: $motheduc$

52 For $motheduc$ to be a valid instrument for $educ$ we need to:
1) Assume that $motheduc$ is uncorrelated with $u$
2) Have $\theta_1 \neq 0$ in the reduced form equation
$educ = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \theta_1\, motheduc + r$
There is little doubt that $educ$ and $motheduc$ are partially correlated. Also, this correlation is easily tested given a random sample from the population.
Potential problem with $motheduc$ as an instrument for $educ$:
- $motheduc$ might be correlated with the omitted factors in $u$: mother's education is likely to be correlated with child's ability and other family background characteristics that might be in $u$.

53 Suppose we want to use the last digit of one's social security number as an instrument. This is a poor IV candidate for the opposite reason.
The last digit is randomly determined $\Rightarrow$ it is independent of other factors that affect earnings ($\mathrm{Cov}(lastdigit, u) = 0$ holds).
BUT: the last digit is also independent of education ($\theta_1 \neq 0$ does not hold)

54 Challenge: come up with convincing instruments.
Angrist and Krueger (1991) propose using quarter of birth as an IV for education. In the simplest case, let:
- $frstqrt$: dummy variable (= 1 for people born in the 1st quarter of the year and 0 otherwise)
Quarter of birth is arguably independent of unobserved factors (such as ability) that affect wage.
In addition, we must have $\theta_1 \neq 0$ in the reduced form:
$educ = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \theta_1\, frstqrt + r$

55 Compulsory school attendance laws induce a relationship between $educ$ and $frstqrt$:
- At least some people are forced, by law, to attend school longer than they otherwise would, and this fact is correlated with quarter of birth.
We can determine the strength of this association in a particular sample by estimating the reduced form and obtaining the t statistic for $H_0: \theta_1 = 0$.

56 Hence: it can be very difficult to find a good IV because the variable must satisfy 2 different, often conflicting, criteria.
For $motheduc$, the issue in doubt is whether condition $\mathrm{Cov}(motheduc, u) = 0$ holds.
For $frstqrt$, the concern is with condition $\theta_1 \neq 0$. Since this can be tested, $frstqrt$ has more appeal as an instrument.
- However, the partial correlation between $educ$ and $frstqrt$ is small, and this can lead to finite sample problems

57 Another issue: the sense in which we are estimating the return to education for the entire population of working people. Suppose: - the return to education is not constant across people - we use frstqrt as an IV to estimate the return to education Then: the IV results estimate the return only for those people induced to obtain more schooling because they were born in the 1st quarter of the year. These make up a relatively small fraction of the population.

58 Convincing instruments sometimes arise in the context of program evaluation Suppose: individuals are randomly selected to be eligible for the program. - Examples: job training programs and school voucher programs. Actual participation is almost always voluntary, and it may be endogenous because it can depend on unobserved factors that affect the response. However: reasonable to assume that eligibility is exogenous. Because job training and eligibility are correlated, the eligibility can be used as an IV for job training.

59 A common source of instrumental variables is natural experiments.
Natural experiment: when some (often unintended) feature of the setup we are studying produces exogenous variation in an otherwise endogenous explanatory variable.
Example: Angrist and Krueger's (1991) quarter of birth seems, at least initially, to be a good natural experiment.

60 Sensible IVs need not come from natural experiments.
Economists often use regional variation in prices or taxes as instruments for endogenous explanatory variables appearing in individual-level equations.
Example:
- Suppose we want to estimate the effects of alcohol consumption on performance in college
- the local price of alcohol can be used as an IV for alcohol consumption, provided other regional factors that affect college performance have been appropriately controlled for.
- Idea: the price of alcohol can be assumed to be exogenous to each individual.

61 Example: College Proximity as an IV for Education
Card (1995) uses wage data for 1976, with a dummy variable that indicates whether a man grew up in the vicinity of a four-year college as an IV for years of schooling.
He also includes several other controls: experience and experience squared, a black indicator, southern and urban indicators, and regional and urban indicators.
- IV estimate of the return to schooling: 13.2%
- OLS estimate of the return to schooling: 7.5%

62 Thus, the IV estimate is almost twice the OLS estimate.
This is a counterintuitive result if we thought that OLS suffered from an upward omitted variable bias.
One possibility: the OLS estimators suffer from attenuation bias as a result of measurement error (but the classical errors-in-variables assumption for education is questionable).
Another possibility: the instrument is not exogenous in the wage equation, because location is not entirely exogenous.

63 Multiple Instruments: Two-Stage Least Squares
Consider again the linear population model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
- $E(u) = 0$: the error (in the population) has mean zero
- $\mathrm{Cov}(x_j, u) = 0$, $j = 1, 2, \dots, K-1$: the error (in the population) is uncorrelated with each of the first $K-1$ regressors
- $x_K$ can be correlated with $u$.
Now: assume that we have more than one instrumental variable for $x_K$.

64 Let $z_1, z_2, \dots, z_M$ be variables such that
$\mathrm{Cov}(z_h, u) = 0$, $h = 1, 2, \dots, M$
so each $z_h$ is exogenous in the population model.
If each of the $z_h$ has some partial correlation with $x_K$, we could have $M$ different IV estimators. Actually, there are many more than this (more than we can count), since any linear combination of $x_1, x_2, \dots, x_{K-1}, z_1, z_2, \dots, z_M$ is uncorrelated with $u$.
The question is: which IV estimator should we use?

65 The two-stage least squares (2SLS) estimator is the most efficient IV estimator.
Define the vector of exogenous variables again by
$z \equiv (1, x_1, x_2, \dots, x_{K-1}, z_1, \dots, z_M)$
- This is a $1 \times L$ vector ($L = K + M$).
Out of all possible linear combinations of $z$ that can be used as an instrument for $x_K$, the method of 2SLS chooses that which is most highly correlated with $x_K$.
The linear combination of $z$ most highly correlated with $x_K$ is given by the linear projection of $x_K$ on $z$.

66 Write the reduced form for $x_K$ as
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$
- By definition: $r_K$ has zero mean and is uncorrelated with each RHS variable.
As any linear combination of $z$ is uncorrelated with $u$,
$x_K^* \equiv \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M$
is uncorrelated with $u$.

67 If we could observe $x_K^*$, we would use it as an instrument for $x_K$ in the structural equation.
However: $\delta_j$ and $\theta_j$ are population parameters $\Rightarrow$ $x_K^*$ is not a usable instrument.
But: we can consistently estimate the parameters in equation
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$

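68 The sample analogues of the $x_{iK}^*$ for each observation $i$ are the OLS fitted values:
$\hat x_{iK} = \hat\delta_0 + \hat\delta_1 x_{i1} + \hat\delta_2 x_{i2} + \dots + \hat\delta_{K-1} x_{i,K-1} + \hat\theta_1 z_{i1} + \dots + \hat\theta_M z_{iM}$
For each observation $i$, define the vector
$\hat x_i \equiv (1, x_{i1}, x_{i2}, \dots, x_{i,K-1}, \hat x_{iK})$, $i = 1, 2, \dots, N$.
Using $\hat x_i$ as the instruments for $x_i$ gives the IV estimator
$\hat\beta = \left( \sum_{i=1}^{N} \hat x_i' x_i \right)^{-1} \left( \sum_{i=1}^{N} \hat x_i' y_i \right) = (\hat X'X)^{-1} \hat X'Y$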

69 $\hat\beta$ can be obtained from the following steps:
1. First-stage regression: obtain the fitted values $\hat x_K$ from the regression of $x_K$ on $1, x_1, \dots, x_{K-1}, z_1, \dots, z_M$ (the $i$ subscript is omitted for simplicity)
2. Second-stage regression: run the OLS regression of $y$ on $1, x_1, \dots, x_{K-1}, \hat x_K$
This 2nd stage produces the $\hat\beta$.
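
The two steps above can be sketched in a few lines. This is a minimal illustration, not a full implementation: the function name `two_stage_least_squares` and the simulated data (one exogenous regressor, two instruments, true coefficients 1, 2, 3) are hypothetical. The function returns the estimate both from the IV formula $(\hat X'X)^{-1}\hat X'Y$ and from the explicit second-stage OLS so the two can be compared.

```python
import numpy as np

def two_stage_least_squares(X_exog, x_endog, Z_excl, y):
    """2SLS with a single endogenous regressor.

    X_exog : included exogenous regressors, without the constant
    x_endog: the endogenous regressor x_K
    Z_excl : excluded instruments z_1, ..., z_M
    Returns (beta from the IV formula, beta from the explicit second stage).
    """
    n = len(y)
    const = np.ones(n)
    # First stage: regress x_K on ALL exogenous variables and keep fitted values.
    W = np.column_stack([const, X_exog, Z_excl])
    x_hat = W @ np.linalg.lstsq(W, x_endog, rcond=None)[0]
    X = np.column_stack([const, X_exog, x_endog])
    X_hat = np.column_stack([const, X_exog, x_hat])
    # IV formula with x_hat as the instrument for x_K.
    beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    # Second stage: OLS of y on (1, x_1, ..., x_{K-1}, x_hat).
    beta_second_stage = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta_iv, beta_second_stage

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.normal(size=n)
Z = rng.normal(size=(n, 2))                    # two instruments
q = rng.normal(size=n)                         # omitted factor
xK = 0.5 * x1 + Z @ np.array([1.0, 0.5]) + q + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xK + q + rng.normal(size=n)

beta_iv, beta_second_stage = two_stage_least_squares(x1, xK, Z, y)
print("IV formula:  ", np.round(beta_iv, 3))
print("Second stage:", np.round(beta_second_stage, 3))
```

The two coefficient vectors coincide up to rounding error. The standard errors reported by a manually run second-stage OLS are not the correct 2SLS standard errors, because they are built from residuals that use $\hat x_K$ rather than $x_K$; this sketch computes point estimates only.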

70 Alert: common harmful mistake
It is possible to show that the following seemingly sensible two-step procedure is generally inconsistent:
- 1st stage: regress $x_K$ on $1, z_1, \dots, z_M$ and obtain the fitted values, say $\tilde x_K$.
- 2nd stage: run the regression of $y$ on $1, x_1, \dots, x_{K-1}, \tilde x_K$.
You CANNOT omit $x_1, \dots, x_{K-1}$ in the 1st-stage regression.
It is best to use a software package with a 2SLS command rather than explicitly carry out the two-step procedure.
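
A small simulation makes the warning concrete. The design below is an assumption made for illustration (it is not in the original slides): the instrument `z1` is correlated with the included exogenous regressor `x1`, which is the situation in which dropping `x1` from the first stage does damage. The true coefficient on `xK` is 3.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 100_000
x1 = rng.normal(size=n)
z1 = 0.5 * x1 + rng.normal(size=n)        # instrument correlated with x1
q = rng.normal(size=n)                    # omitted factor making xK endogenous
xK = 1.0 * x1 + 1.0 * z1 + q + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xK + q + rng.normal(size=n)
const = np.ones(n)

# Correct first stage: x1 is included alongside the instrument.
W = np.column_stack([const, x1, z1])
x_hat = W @ ols(W, xK)
b_correct = ols(np.column_stack([const, x1, x_hat]), y)

# Mistaken first stage: x1 is left out.
W_bad = np.column_stack([const, z1])
x_tilde = W_bad @ ols(W_bad, xK)
b_wrong = ols(np.column_stack([const, x1, x_tilde]), y)

print(f"coefficient on xK, correct 2SLS:     {b_correct[2]:.3f}")
print(f"coefficient on xK, x1 omitted in 1st: {b_wrong[2]:.3f}")
```

With `x1` omitted from the first stage, the part of `xK` not captured by `x_tilde` is still correlated with `x1`, so the second-stage error is correlated with a regressor and the estimate does not converge to 3.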

71 Remember that in the case of 1 endogenous variable and 1 instrument, we had the condition that in the linear projection of $x_K$ onto all the exogenous variables:
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$
we had $\theta_1 \neq 0$ (the coefficient on $z_1$ is nonzero).
What is the analogue of this condition when more than 1 instrument is available with 1 endogenous explanatory variable?

72 When more than 1 instrument is available with 1 endogenous explanatory variable, we need at least one of the $\theta_j$ in equation
$x_K = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$
to be $\neq 0$.
That is: we need at least 1 exogenous variable that does not appear in the structural equation to induce variation in $x_K$ that cannot be explained by $x_1, \dots, x_{K-1}$.
Remark: identification of $\beta$ does not depend on the values of the $\delta_h$ in this equation.

73 To test this condition, we simply test the null hypothesis
$H_0: \theta_1 = 0, \theta_2 = 0, \dots, \theta_M = 0$
against the alternative that at least one of the $\theta_j$ is $\neq 0$.
This test gives a compelling reason for explicitly running the 1st-stage regression. A standard F statistic can be used to test this hypothesis.
If we cannot reject this hypothesis against the alternative that at least one of the $\theta_j$ is $\neq 0$, then we should have serious reservations about the 2SLS procedure: the instruments do not pass a minimal requirement.
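
The first-stage F statistic for $H_0: \theta_1 = \dots = \theta_M = 0$ can be computed by comparing the restricted and unrestricted first-stage regressions. The sketch below uses the usual homoskedasticity-based F statistic; the function name `first_stage_F` and the simulated data are hypothetical, and critical values or p-values are left to a statistics package.

```python
import numpy as np

def first_stage_F(X_exog, Z_excl, x_endog):
    """F statistic for excluding all instruments from the first-stage regression."""
    n = len(x_endog)
    const = np.ones(n)
    W_r = np.column_stack([const, X_exog])     # restricted: instruments excluded
    W_u = np.column_stack([W_r, Z_excl])       # unrestricted: instruments included

    def ssr(W):
        resid = x_endog - W @ np.linalg.lstsq(W, x_endog, rcond=None)[0]
        return resid @ resid

    M = W_u.shape[1] - W_r.shape[1]            # number of restrictions
    dof = n - W_u.shape[1]                     # residual degrees of freedom
    return ((ssr(W_r) - ssr(W_u)) / M) / (ssr(W_u) / dof)

rng = np.random.default_rng(4)
n = 5_000
x1 = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
noise = rng.normal(scale=np.sqrt(2.0), size=n)

xK_relevant = 0.5 * x1 + Z @ np.array([1.0, 0.5]) + noise   # instruments matter
xK_irrelevant = 0.5 * x1 + noise                            # instruments do not matter

F_relevant = first_stage_F(x1, Z, xK_relevant)
F_irrelevant = first_stage_F(x1, Z, xK_irrelevant)
print(f"F with relevant instruments:   {F_relevant:.1f}")
print(f"F with irrelevant instruments: {F_irrelevant:.2f}")
```

The restricted regression keeps the included exogenous variables, so the statistic measures the partial explanatory power of the instruments, in line with the partial-correlation reading of the condition on the $\theta_j$.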

74 The model with:
- 1 endogenous variable
- $M > 1$ instruments
is overidentified (there are $M - 1$ overidentifying restrictions).
If each $z_h$ has some partial correlation with $x_K$, then we have $M - 1$ more exogenous variables than needed to identify the parameters in the structural equation

75 Example:
- 1 endogenous variable
- $M = 2$
There is 1 overidentifying restriction $\Rightarrow$ we could discard 1 of the instruments and still achieve identification

76 Testing Overidentifying Restrictions
Suppose we have more instruments than we need to identify an equation. Then: we can test whether the additional instruments are valid in the sense that they are uncorrelated with $u$.
Write the equation in the form:
$y_1 = z_1 \delta_1 + y_2 \alpha_1 + u_1$
- $y_2$: $1 \times G_1$ vector of endogenous variables in the population model
- $z_1$: $1 \times L_1$
- $z_2$: $1 \times L_2$
- $z$: $1 \times L$ vector of all exogenous variables ($z = (z_1, z_2)$)
- $L_2 > G_1$: the model is overidentified

77 We could use any $1 \times G_1$ subset of $z_2$ as instruments for $y_2$ in estimating equation
$y_1 = z_1 \delta_1 + y_2 \alpha_1 + u_1$
Hausman (1978) suggested comparing the 2SLS estimator using all instruments to 2SLS using a subset that just identifies the equation.
If all instruments are valid, the estimates should differ only as a result of sampling error.

78 Test the validity of all overidentification restrictions: - based on the observation that the residuals from 2SLS should be uncorrelated with the set of exogenous variables if the instruments are truly exogenous

79 A test for validity of the overidentification restrictions is obtained from the OLS regression of
$\hat u_1$ on $z$
where $\hat u_1$ are the 2SLS residuals using all of the instruments $z$ (simply estimate regression $y_1 = z_1 \delta_1 + y_2 \alpha_1 + u_1$ by 2SLS and obtain the 2SLS residuals, $\hat u_1$). Obtain $R_u^2$, the usual R-squared.
Null hypothesis is that all instruments are valid: $E(z'u_1) = 0$
Under the null, $N R_u^2 \overset{a}{\sim} \chi^2_{Q_1}$, where $Q_1 \equiv L_2 - G_1$ is the number of overidentifying restrictions.
If we reject the null hypothesis, then our logic for choosing the IVs must be reexamined (1 or more of the IVs are not exogenous).
If we fail to reject the null, then we can have some confidence in the overall set of instruments used.
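
The $N R_u^2$ statistic can be computed in three steps: estimate by 2SLS, form residuals with the actual endogenous regressor, and regress those residuals on all exogenous variables. The sketch below covers the case of one endogenous regressor; the function name `overid_statistic` and both simulated designs are assumptions for illustration. It returns the statistic and its degrees of freedom and leaves the $\chi^2$ p-value to a statistics package.

```python
import numpy as np

def overid_statistic(X_exog, x_endog, Z_excl, y):
    """N * R^2 from regressing 2SLS residuals on all exogenous variables.

    Returns (statistic, degrees of freedom) for one endogenous regressor.
    """
    n = len(y)
    const = np.ones(n)
    W = np.column_stack([const, X_exog, Z_excl])           # all exogenous variables
    x_hat = W @ np.linalg.lstsq(W, x_endog, rcond=None)[0]
    X = np.column_stack([const, X_exog, x_endog])
    X_hat = np.column_stack([const, X_exog, x_hat])
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)       # 2SLS estimate
    u_hat = y - X @ beta                                   # residuals use the actual x_K
    fitted = W @ np.linalg.lstsq(W, u_hat, rcond=None)[0]
    ssr = np.sum((u_hat - fitted) ** 2)
    sst = np.sum((u_hat - u_hat.mean()) ** 2)
    r2 = 1.0 - ssr / sst
    return n * r2, Z_excl.shape[1] - 1

rng = np.random.default_rng(5)
n = 20_000
x1 = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
q = rng.normal(size=n)
xK = 0.5 * x1 + Z[:, 0] + Z[:, 1] + q + rng.normal(size=n)

u_valid = q + rng.normal(size=n)                     # both instruments exogenous
u_invalid = q + 0.5 * Z[:, 1] + rng.normal(size=n)   # second instrument enters u

stat_valid, df = overid_statistic(x1, xK, Z, 1.0 + 2.0 * x1 + 3.0 * xK + u_valid)
stat_invalid, _ = overid_statistic(x1, xK, Z, 1.0 + 2.0 * x1 + 3.0 * xK + u_invalid)
print(f"valid instruments:   N*R^2 = {stat_valid:.2f} (df = {df})")
print(f"invalid instrument:  N*R^2 = {stat_invalid:.2f} (df = {df})")
```

With valid instruments the statistic behaves like a draw from a $\chi^2_1$ distribution; when one instrument enters the error term, it becomes very large.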

80 Example: Overidentifying Restrictions in the Wage Equation
Estimate for working women the equation
$\log(wage) = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \alpha_1\, educ + u_1$
- $educ$ and $u_1$ may be correlated
Use $motheduc$, $fatheduc$, $huseduc$ (mother's education, father's education, husband's education) as instruments for $educ$ in a 2SLS procedure $\Rightarrow$ 2 overidentifying restrictions.

81 Let $\hat u_1$ be the 2SLS residuals from equation
$\log(wage) = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \alpha_1\, educ + u_1$
using all instruments.
The test statistic is $N$ times the R-squared from the OLS regression of
$\hat u_1$ on $1, exper, exper^2, motheduc, fatheduc, huseduc$
Under $H_0$, $N R_u^2 \overset{a}{\sim} \chi^2_{Q_1}$ with $Q_1 = 2$.
Using the data on working women in Wooldridge:
- $R_u^2$ =
- Overidentification test statistic:
- p-value:
$\Rightarrow$ the overidentifying restrictions are not rejected at any reasonable level $\Rightarrow$ we can have some confidence in the overall set of instruments used.

82 Example: To test for the validity of $z_4$ as an IV, assuming that $z_3$ is a valid IV.
- Consider the model: $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u$
- $y_2$ is (possibly) endogenous
- $z_3$ and $z_4$ are IVs
Test:
- We run 2SLS with $z_3$ as the only IV
- Compute $\hat u_3 = y_1 - \hat\beta_0 - \hat\beta_1 y_2 - \hat\beta_2 z_1 - \hat\beta_3 z_2$
- Evaluate the regression model $\hat u_3 = \delta_0 + \delta_1 z_4$ and, in particular, test the significance of $z_4$.
This is a valid test for the validity of $z_4$ as an IV. BUT it needs to assume that $z_3$ is a valid IV.

83 Next paper: *Angrist, Joshua D., and Alan B. Krueger. Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4):

84 The earliest applications of instrumental variables involved the estimation of demand and supply curves.
Several economists were interested in estimating the elasticities of demand and supply for products ranging from herring (a fish) to butter, usually with time series data.
If the demand and supply curves shift over time, the observed data on quantities and prices reflect a set of equilibrium points on both curves.
$\Rightarrow$ OLS regression of quantities on prices fails to identify (that is, trace out) either the supply or demand relationship.

85 P.G. Wright (1928) applied instrumental variables to estimating the elasticities of supply and demand for flaxseed, the source of linseed oil.
Wright suggested that certain curve shifters (what we would now call IVs) can be used to address the problem: "Such additional factors may be factors which: (A) affect demand conditions without affecting cost conditions or which (B) affect cost conditions without affecting demand conditions."
- Demand curve shifter: the price of a substitute good (such as cottonseed)
- Supply curve shifter: yield per acre, which is primarily determined by the weather.

86 Wright (1928) observed: "Success with this method depends on success in discovering factors of the type A and B."
He used 6 different supply shifters to estimate the demand curve and then averaged the 6 instrumental variables estimates.
The resulting average elasticity of demand for flaxseed was
His average instrumental variables estimate of the elasticity of supply was 2.4.
Wright's econometric advance went unnoticed by the subsequent literature. Not until the 1940s were IVs and related methods rediscovered and extended.

87 Wright's (1928) method of averaging the different instrumental variables estimates does not necessarily produce the most efficient estimate.
Other estimators may combine the information in different instruments to produce an estimate with less sampling variability.
The most efficient way to combine multiple instruments is usually 2SLS (developed by Theil (1953)):
- 1st stage: the endogenous RHS variable (price in this application) is regressed on all the instruments.
- 2nd stage: the predicted values of price (based on the data for the instruments and the coefficients estimated from the first-stage regression) are then plugged directly into the equation of interest in place of the endogenous regressor

88 IV and Measurement Error IV can also overcome measurement error problems in explanatory variables. Measurement error can arise for many reasons: - the limited ability of statistical agencies to collect accurate information - deviation between the variables specified in economic theory and those collected in practice.

89 If an explanatory variable is measured with additive random errors, then the coefficient on that variable in an OLS regression will be biased toward zero in a large sample. The higher the proportion of variability that is due to errors, the greater the bias. Instrumental variables provide a consistent estimate even in the presence of measurement error.
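A small simulation can make the attenuation result concrete. In the single-regressor case with classical measurement error, the OLS slope converges to the true slope times var(true x) / (var(true x) + var(error)). The sketch below assumes a true slope of 2 and equal signal and noise variances, and it uses a second, independently mismeasured report of the same variable as the instrument; all of these are assumptions made for illustration, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta = 2.0  # hypothetical true slope

x_true = rng.normal(size=n)            # correctly measured regressor (unobserved)
x_obs = x_true + rng.normal(size=n)    # observed with additive random error
x_obs2 = x_true + rng.normal(size=n)   # second noisy report, used as the instrument
y = 1.0 + beta * x_true + rng.normal(size=n)

# Share of the variance of x_obs that is signal: 1 / (1 + 1) = 0.5 by construction.
reliability = 0.5

# OLS slope of y on the mismeasured regressor: attenuated toward zero.
ols_slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# IV slope: the instrument's error is independent of the error in x_obs,
# so the ratio of covariances recovers beta in large samples.
iv_slope = np.cov(x_obs2, y)[0, 1] / np.cov(x_obs2, x_obs)[0, 1]

print(f"OLS: {ols_slope:.3f} (theory: {beta * reliability:.3f})  IV: {iv_slope:.3f}")
```

The instrument works here only because its measurement error is unrelated to the error in the regressor; two reports that share the same error would not help.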

90 IV and Omitted Variables IV helps overcome omitted variables problems in estimates of causal relationships. Studies of this type are usually concerned with estimating a narrowly defined causal relationship: - the effect of schooling, training or military service on earnings - the impact of smoking or medical treatments on health - the effect of social insurance programs on labor supply - the effect of policing on crime. The observed association between the outcome and the explanatory variable of interest in these and many other examples is likely to be misleading, in the sense that it partly reflects omitted factors that are related to both variables.

91 If these factors could be measured and held constant in a regression, the omitted variables bias would be eliminated. One solution to the omitted variables problem: assign the variable of interest randomly. - Example: social experiments assign people to a job training program or to a control group. Random assignment assures that participation in the program is not correlated with omitted personal or social factors.

92 Randomized experiments are not always possible: we cannot force a randomly chosen group of people to quit smoking. On the other hand, it may be possible to find a degree of exogenous variation in variables like schooling, smoking and minimum wages.

93 How can instrumental variables solve the omitted variables problem? - Suppose that we would like to use the following cross-sectional regression equation to measure the return to schooling, denoted $\rho$: $Y_i = \alpha + \rho S_i + \beta A_i + \varepsilon_i$ - $Y_i$: person i's log wage - $S_i$: person i's highest grade of schooling completed - $A_i$: ability or motivation. Data on $A_i$ are typically unavailable.

94 Without additional information, the parameter of interest, ρ, is not identified (that is, we cannot deduce it from the joint distribution of earnings and schooling alone). Suppose we have a third variable, the instrument Z i, which is correlated with schooling but otherwise unrelated to earnings. (That is, Z i is uncorrelated with the omitted variables and the regression error ε i.) IVs solve the omitted variables problem by using only part of the variability in schooling (specifically, a part that is uncorrelated with the omitted variables) to estimate the relationship between schooling and earnings.
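As a rough illustration of this argument, the simulation below generates wages from the schooling equation with an unobserved ability term and compares OLS (which omits ability) with the simple IV ratio cov(Z, Y) / cov(Z, S). The return of 0.08, the binary instrument, and all other parameter values are hypothetical choices for the sketch, not estimates from any study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
rho = 0.08  # hypothetical return to schooling

ability = rng.normal(size=n)                 # A_i: unobserved by the researcher
z = rng.integers(0, 2, size=n).astype(float) # Z_i: binary instrument, independent of ability
schooling = 12.0 + 1.0 * z + 1.0 * ability + rng.normal(size=n)
log_wage = 1.0 + rho * schooling + 0.1 * ability + rng.normal(scale=0.3, size=n)

# OLS of log wage on schooling, with ability omitted:
# the slope also picks up the part of ability that moves with schooling.
ols_slope = np.cov(schooling, log_wage)[0, 1] / np.var(schooling, ddof=1)

# IV uses only the variation in schooling that is driven by Z.
iv_slope = np.cov(z, log_wage)[0, 1] / np.cov(z, schooling)[0, 1]

print(f"true rho: {rho:.3f}  OLS: {ols_slope:.3f}  IV: {iv_slope:.3f}")
```

The IV estimate is consistent here only because the simulated instrument is independent of ability and affects wages solely through schooling; with real data those two conditions have to be argued, not assumed.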

95 IV methods allow us to estimate the coefficient of interest consistently, free from asymptotic bias from omitted variables, without actually having data on the omitted variables or even knowing what they are. If there is more than one valid instrument, the coefficient of interest can be estimated by 2SLS. Natural experiments provide instruments that are used to overcome omitted variables bias: these are situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment. Good instrument: correlated with the endogenous regressor for reasons the researcher can verify and explain, but uncorrelated with the outcome variable for reasons beyond its effect on the endogenous regressor.

96 Maddala (1977, p. 154) asks, "Where do you get such a variable?" Angrist & Krueger: good instruments often come from detailed knowledge of the economic mechanism and institutions determining the regressor of interest. In the case of schooling, human capital theory suggests that people make schooling choices by comparing the costs and benefits of alternatives. Possible sources of instruments: 1) Differences in costs: loan policies or other subsidies; 2) Institutional constraints.

97 Angrist and Krueger (1991): use of natural experiments to eliminate omitted variables bias. Most states required students to enter school in the calendar year in which they turned 6, so school start age is a function of date of birth: those born late in the year are young for their grade. - With a December 31st birthday cutoff: - Children born in the 4th quarter enter school at age 5 - Children born in the 1st quarter enter school at age 6 - Compulsory schooling laws typically require students to remain in school until their 16th birthdays, so these groups of students will be in different grades when they reach the legal dropout age.

98 The combination of school start age policies and compulsory schooling laws creates a natural experiment in which children are compelled to attend school for different lengths of time depending on their birthdays. - Use data from the 1980 census. - Look at the relationship between educational attainment and quarter of birth for men born from 1930 to 1959.

99 [Figure 1: educational attainment by quarter of birth]

100 Figure 1 (men born in the 1930s): - younger birth cohorts finished more schooling - men born early in the calendar year tend to have lower average schooling levels. This 10-year birth cohort was selected because men of this age tend to have a relatively flat age-earnings profile. (But the pattern of less education for men born early in the year holds for men born in the 1940s and 1950s as well.) An individual's date of birth is probably unrelated to the person's innate ability, motivation or family connections, so date of birth should provide a valid instrument for schooling.

101 [Figure 2: average earnings by quarter of birth]

102 Figure 2: average earnings by quarter of birth for the same sample. - This figure shows the reduced form relationship between the instruments and the dependent variable. - Earnings rise with work experience, so older cohorts tend to have higher earnings. - On average, men born in early quarters of the year almost always earn less than those born later in the year. - This reduced form relationship parallels the quarter-of-birth pattern in schooling. Figures 1 and 2: it is clear that the differences in education and earnings associated with quarter of birth are discrete blips, rather than smooth changes related to the gradual effects of aging.

103 Intuition behind the IV: - Differences in earnings by quarter of birth are assumed to be accounted for solely by differences in schooling by quarter of birth, so the estimated return to schooling is simply the appropriately rescaled difference in average earnings by quarter of birth. - Only a small part of the variability in schooling (the part associated with quarter of birth) is used to identify the return to education.

104 They find: - men born in the first quarter have about 1/10 of a year less schooling than men born in later quarters - men born in the first quarter earn about 1 percent less (roughly 0.01 log points) than men born in later quarters - The ratio of the difference in earnings to the difference in schooling, about 0.10, is an IV estimate of the proportional earnings gain from an additional year of schooling.
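The ratio on this slide can be written out as a worked equation. The notation is introduced here for illustration: the bars denote sample means of log earnings Y and years of schooling S, the subscript Q1 marks men born in the first quarter, and "later" marks men born in the other three quarters. The earnings gap is taken as roughly 0.01 log points (about 1 percent), which is the value implied by a schooling gap of 1/10 of a year and a ratio of about 0.10.

```latex
\hat{\rho}_{IV}
  = \frac{\bar{Y}_{Q1} - \bar{Y}_{\text{later}}}
         {\bar{S}_{Q1} - \bar{S}_{\text{later}}}
  \approx \frac{-0.01}{-0.10}
  = 0.10
```

In words, an extra year of schooling is associated with roughly a 10 percent gain in earnings for the men whose schooling was shifted by quarter of birth.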

105 Need a well-developed story or model motivating the choice of instruments. These stories have implications that can be used to support or refute a behavioral interpretation of the IV estimates. Example: the interpretation of Figures 1 and 2 as resulting from the interaction of school start-age policy and compulsory schooling laws. - Support for this interpretation: quarter of birth is unrelated to earnings and educational attainment for those with a college degree or higher. - People with a college degree or higher are unconstrained by compulsory schooling laws. If quarter of birth had been related to education or earnings in this sample, the rationale motivating the use of quarter of birth as an instrument would have been refuted.

106 Interpreting Estimates with Heterogeneous Responses Difficulty in interpreting IV estimates: not every observation's behavior is affected by the instrument. IV methods use only part of the variation in an explanatory variable; that is, they work by changing the behavior of only some people. Example: in the Angrist and Krueger (1991) study, the quarter-of-birth instrument is most relevant for those who have a high probability of quitting school as soon as possible, and it has little or no effect on those who are likely to proceed on to college.

107 In other words: - IVs provide an estimate for a specific group: namely, people whose behavior can be manipulated by the instrument. In the example: the quarter-of-birth instruments used by Angrist and Krueger (1991) generate an estimate for those whose level of schooling was changed by that instrument.

108 Angrist and Krueger (1991) view: - IVs often solve the first-order problem of eliminating omitted variables bias for a well-defined population. - Since the sample size and range of variability in many empirical studies are quite limited, extrapolation to other populations is naturally somewhat speculative. (A fertilizer that helps corn to grow in Iowa will probably have a beneficial effect in California as well, though one can't be sure.) - The existence of heterogeneous treatment effects would be a reason for analyzing more natural experiments, not fewer, to understand the source and extent of heterogeneity in the effect of interest.

109 Besides that: - the population one learns about in a natural experiment is often of intrinsic interest - Example: the Angrist and Krueger (1991) IV estimates are relevant for assessing the economic rewards to increases in schooling induced by legal and institutional changes, such as policies designed to keep children from dropping out of high school.


More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

Inference in Regression Model

Inference in Regression Model Inference in Regression Model Christopher Taber Department of Economics University of Wisconsin-Madison March 25, 2009 Outline 1 Final Step of Classical Linear Regression Model 2 Confidence Intervals 3

More information

Job Training Partnership Act (JTPA)

Job Training Partnership Act (JTPA) Causal inference Part I.b: randomized experiments, matching and regression (this lecture starts with other slides on randomized experiments) Frank Venmans Example of a randomized experiment: Job Training

More information

EC402 - Problem Set 3

EC402 - Problem Set 3 EC402 - Problem Set 3 Konrad Burchardi 11th of February 2009 Introduction Today we will - briefly talk about the Conditional Expectation Function and - lengthily talk about Fixed Effects: How do we calculate

More information

Birkbeck Working Papers in Economics & Finance

Birkbeck Working Papers in Economics & Finance ISSN 1745-8587 Birkbeck Working Papers in Economics & Finance Department of Economics, Mathematics and Statistics BWPEF 1809 A Note on Specification Testing in Some Structural Regression Models Walter

More information

Lecture 14. More on using dummy variables (deal with seasonality)

Lecture 14. More on using dummy variables (deal with seasonality) Lecture 14. More on using dummy variables (deal with seasonality) More things to worry about: measurement error in variables (can lead to bias in OLS (endogeneity) ) Have seen that dummy variables are

More information

Sociology 593 Exam 2 March 28, 2002

Sociology 593 Exam 2 March 28, 2002 Sociology 59 Exam March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably means that

More information

Topic 10: Panel Data Analysis

Topic 10: Panel Data Analysis Topic 10: Panel Data Analysis Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Introduction Panel data combine the features of cross section data time series. Usually a panel

More information

Solutions to Problem Set 5 (Due November 22) Maximum number of points for Problem set 5 is: 220. Problem 7.3

Solutions to Problem Set 5 (Due November 22) Maximum number of points for Problem set 5 is: 220. Problem 7.3 Solutions to Problem Set 5 (Due November 22) EC 228 02, Fall 2010 Prof. Baum, Ms Hristakeva Maximum number of points for Problem set 5 is: 220 Problem 7.3 (i) (5 points) The t statistic on hsize 2 is over

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

Endogeneity. Tom Smith

Endogeneity. Tom Smith Endogeneity Tom Smith 1 What is Endogeneity? Classic Problem in Econometrics: More police officers might reduce crime but cities with higher crime rates might demand more police officers. More diffuse

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate

More information

Econometrics. 7) Endogeneity

Econometrics. 7) Endogeneity 30C00200 Econometrics 7) Endogeneity Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Common types of endogeneity Simultaneity Omitted variables Measurement errors

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including

More information

ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply

ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply Dimitri Dimitropoulos Fall 2014 UToronto 1 / 55 References RW Section 3. Wooldridge, J. (2008). Introductory Econometrics:

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Simultaneous quations and Two-Stage Least Squares So far, we have studied examples where the causal relationship is quite clear: the value of the

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

Chapter 6 Stochastic Regressors

Chapter 6 Stochastic Regressors Chapter 6 Stochastic Regressors 6. Stochastic regressors in non-longitudinal settings 6.2 Stochastic regressors in longitudinal settings 6.3 Longitudinal data models with heterogeneity terms and sequentially

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3

More information

Linear Models in Econometrics

Linear Models in Econometrics Linear Models in Econometrics Nicky Grant At the most fundamental level econometrics is the development of statistical techniques suited primarily to answering economic questions and testing economic theories.

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

CHAPTER 7. + ˆ δ. (1 nopc) + ˆ β1. =.157, so the new intercept is = The coefficient on nopc is.157.

CHAPTER 7. + ˆ δ. (1 nopc) + ˆ β1. =.157, so the new intercept is = The coefficient on nopc is.157. CHAPTER 7 SOLUTIONS TO PROBLEMS 7. (i) The coefficient on male is 87.75, so a man is estimated to sleep almost one and one-half hours more per week than a comparable woman. Further, t male = 87.75/34.33

More information

Experiments and Quasi-Experiments

Experiments and Quasi-Experiments Experiments and Quasi-Experiments (SW Chapter 13) Outline 1. Potential Outcomes, Causal Effects, and Idealized Experiments 2. Threats to Validity of Experiments 3. Application: The Tennessee STAR Experiment

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information