Instrumental Variables

Size: px

Start display at page:

Download "Instrumental Variables"

Charlene Lynette Poole
5 years ago
Views:

1 Instrumental Variables Department of Economics University of Wisconsin-Madison January 22, 2018

2 Model Lets start with the constant treatment effect model For now take that to be Y i = αt i + X iβ + u i α measures the causal effect of T i on Y i. We don t want to just run OLS because we are worried that T i is not randomly assigned, that is that T i and u i are correlated. There are a number of different reasons that might be true-i think the main thing that we are worried about in the treatment effects literature is omitted variables. In this course we are going to think about a lot of different ways of dealing with this potential problem for estimating α.

3 Instrumental Variables Lets start with instrumental variables I want to think about three completely different approaches for estimating α The first is the GMM approach and the second two will come from a Simultaneous Equations framework

4 To justify OLS we would need E (T i u i ) = 0 E(X i u i ) = 0 The focus of IV is to try to relax the first assumption (There is much less concern about the second)

5 Lets suppose that we have an instrument Z i for which E (Z i u i ) = 0 and we continue to assume that E(X i u i ) = 0 We also will stick with the exactly identified case (1 dimensional Z i ) Define Then we know Z i = [ Zi X i [ E Zi ] [, Xi Ti = X i ] [ α, B = β ( )] Y i Xi B = 0 ]

6 Turning this into a GMM estimator we get 0 = 1 N = 1 N N i=1 Z i ( N Zi Y i i=1 Y i X i ( 1 N B IV ) N i=1 Z i X i ) B IV which we can solve as ( 1 B IV N And that is IV. N i=1 Z i X i ) 1 1 N N Zi Y i i=1

7 Consistency Notation I will use to mean asymptotically equivalent. Typically the phrase A n B n means that (A n B n ) p 0 So I want to be formal in this sense However I will not be completely formal as this could also mean almost sure convergence, convergence in distribution or some sort of uniform convergence. The differences between these are not relevant for anything we will do in class.

8 Consistency of IV ( 1 N B IV = N i=1 ( 1 N = N i=1 ( 1 =B + N B Z i X i Z i X i N i=1 ) 1 1 N ) 1 1 N Z i X i N Zi Y i i=1 N i=1 ) 1 1 N ( Z i X i B + Z i u i ) N Zi u i i=1

9 since 1 N N Zi u i 0 i=1 So (assuming iid sampling) this only took two assumptions. The moment conditions and the fact that you can invert ( ) E Zi Xi As we will discuss this assumption is typically a bigger deal than in OLS

10 Furthermore we get (Huber-White) standard errors from N ( B IV B) = ( 1 N Under standard conditions: N i=1 Z i X i ) 1 [ 1 N N i=1 Z i u i ] so N ( B IV B) 1 N N i=1 ( ( )) Zi u i N 0, E u 2 i Zi Zi ( ( N 0, E Zi Xi ) 1 E ( u 2 i Z i Z i ) ( E X i Z i ) 1 )

11 We approximate this by ( ) E Zi Xi 1 N ( ) E u 2 i Zi Zi N i=1 N i=1 Z i X i û 2 i Z i Z i

12 It actually turns out that IV is not unbiased even when it is consistent, so lets derive the asymptotic bias more generally Before that I want to take a detour and discuss partitioned regression which will turn out to be really useful for this and we will use it many times in this course

13 Partitioned Regression Think about the standard regression model (in large matrix notation) Y = X 1 β 1 + X 2 β 2 + u We will define M 2 I X 2 ( X 2 X 2 ) 1 X 2. Two facts about M 2

14 Fact 1: M 2 is idempotent ( ( ) M 2 M 2 = I X 2 X 1 2 X 2 X 2 ) ( ( ) ) I X 2 X 1 2 X 2 X 2 =I 2X 2 ( X 2 X 2 ) 1 X 2 + X 2 ( X 2 X 2 ) 1 X 2X 2 ( X 2 X 2 ) 1 X 2 =I X 2 ( X 2 X 2 ) 1 X 2 =M 2

15 Fact 2: M 2 Y is the Residuals from Regression For any potential dependent variable (say Y), M 2 Y is the residuals I would get if I regressed Y on X 2 To see that let the regression coefficients be be ĝ and generically let Ỹ be residuals from a regression of Y on X 2 so that Ỹ Y X 2 ĝ ( ) =Y X 2 X 1 2 X 2 X 2 Y [ ( ) ] = I X 2 X 1 2 X 2 X 2 Y =M 2 Y.

16 An important special case of this is that if I regress something on itself, the residuals are all zero That is M 2 X 2 = X 2 X 2 (X 2X 2 ) 1 X 2X 2 = 0

17 If I think of the GMM moment equations for least squares I get the two equations ) 0 = X 1 (Y X 1 β1 X 2 β2 ) 0 = X 2 (Y X 1 β1 X 2 β2 We can solve for β 2 in the second as β 2 = ( X 2X 2 ) 1 X 2 ( Y X 1 β1 ) Now plug this into the first ) 0 = X 1 (Y X 1 β1 X 2 β2 = X 1 (Y ( ) )) X 1 β1 X 2 X 1 2 X 2 X 2 (Y X 1 β1 = X 1M 2 Y X 1M 2 X 1 β1

18 Or β 1 = ( X 1M 2 X 1 ) 1 X 1 M 2 Y = ( X 1 X 1 ) 1 X Ỹ That is if I 1 Run a regression of X 1 on X 2 and form its residuals X 1 2 Run a regression of Y on X 2 and form its residuals Ỹ 3 Run a regression of Ỹ on X 1 Since I derived this from the sample analogue of the GMM moment equations (or the normal equations), this gives me exactly the same result as if I had run the full regression of Y on X 1 and X 2

19 It turns out the same idea works for IV. Put everything we had before into large Matrix notation and we can write GMM as: 0 = Z ( Y T α IV X β IV ) The second can be solved as Now plug this into the first 0 = X ( Y T α IV X β IV ) β IV = ( X X ) 1 X (Y T α IV ) ( 0 = Z Y T α IV X β ) IV ( = Z Y T α IV X ( X X ) ) 1 X (Y T α IV ) = Z M X Y Z M X T α IV

20 so α IV = ( Z M X T ) 1 Z M X Y = Z Ỹ Z T cov( Z i, Ỹ i ) cov( Z i, T i ) (This last expression assumes that there is an intercept in the model. If not it would be expected values rather than covariances, but covariances make things easier to interpret-at least to me)

21 To see consistency from this perspective note that so Ỹ =M X Y =αm X T + M X Xβ + M X u =α T + ũ α IV cov( Z i, Ỹ i ) cov( Z i, T i ) cov( Z i, α T i + ũ i ) cov( Z i, T i ) = α + cov( Z i, ũ i ) cov( Z i, T i )

22 This formula is helpful. In order for the model to be consistent you need cov( Z i, ũ i ) = 0 cov( Z i, T i ) 0 But more generally for the asymptotic bias to be small you want cov( Z i, ũ i ) to be small cov( Z i, T i ) to be large This means that in practice there is some tradeoff between them. If your instrument is not very powerful, a little bit of correlation in cov( Z i, ũ i ) could lead to a large asymptotic bias.

23 As a concrete example lets compare IV to OLS. OLS is really just a special case of IV with Z i = T i Then we get α IV α + cov( Z i, ũ i ) cov( Z i, T i ) α OLS α + cov( T i, ũ i ) cov( T i, T i ) If cov( Z i, ũ i ) = 0 and cov( T i, ũ i ) 0 then IV is consistent and OLS is not However, cov( Z i, ũ i ) < cov( T i, ũ i ) does not guarantee less bias because it also depends on cov( Z i, T i ) = 0 and cov( T i, T i ) 0

24 Assumptions Lets think about the assumptions Lets ignore the first stage thing and assume that isn t a problem we looked at two different things GMM My derivation E(Z i u i ) = 0, E(X i u i ) = 0 cov( Z i, ũ i ) = 0 Are these the same?

25 Well... We know that GMM gives consistent estimates and if cov( Z i, ũ i ) 0 then α is not consistent. So the first assumptions better imply the second However, it doesn t have to be the other way around. We need cov( Z i, ũ i ) = 0 to get consistent estimates of α, but we haven t required consistent estimates of β.

26 To see it doesn t go the other way, suppose that I can write with E(X i, ε i ) = 0, E(Z i, ε i ) = 0 We can write u i = X iδ + ε i Y i =αt i + X iβ + u i αt i + X i(β + δ) + ε i So IV gives a consistent estimate of α but not β

27 Simultaneous equations The second and third way to see IV comes from the simultaneous equations framework Y i = αt i + X iβ + u i T i = ρy i + X iγ + Z i δ + ν i Thee are called the structural equations Note the difference between X i and Z i in that we restrict what can affect what. We could also have stuff that affects Y i but not not T i but lets not worry about that (we are still allowing this as a possibility as some of the γ coefficients could be zero) The model with ρ = 0 simplifies things, but lets focus on what happens when it isn t

28 We assume that E(u i X i, Z i ) = 0 E(v i X i, Z i ) = 0 but notice that if ρ 0, then almost for sure T i is correlated with u i because u i influences T i through Y i

29 It is useful to calculate the reduced form for T i, namely T i = ρy i + X iγ + Z i δ + ν i = ρ [ αt i + X iβ ] + u i + X i γ + Z i δ + ν i = ραt i + X i [ρβ + γ] + Z iδ + (ρu i + ν i ) = X i ρβ + γ 1 ρα + δ Z i 1 ρα + ρu i + ν i 1 ρα = X iβ 2 + Z iδ 2 + νi where β2 ρβ + γ 1 ρα δ2 δ 1 ρα ν i ρu i + ν i 1 ρα Note that E(νi X i, Z i ) = 0, so one can obtain a consistent estimate of β2 and δ 2 by regressing T i on X i and Z i.

30 This is called the reduced form equation for T i Note that the parameters here are not the fundamental structural parameters themselves, but the are a known function of these parameters To me this is the classic definition of reduced form (you need to have a structural model)

31 We can obtain a consistent estimate of α as long as we have an exclusion restriction That is we need some Z i that affects T i but not Y i directly I want to show this in two different ways

32 Method 1 We can also solve for the reduced form for Y i Y i = αt i + X iβ + u i αγ + β = X i 1 αρ + Z αδ i 1 αρ + αν i + u i 1 αρ = X i β1 + Z i δ1 + u i with β1 αγ + β 1 αρ δ1 αδ 1 αρ u i αν i + u i 1 αρ Like the other reduced form, we can get a consistent estimate of β 1 and δ 1 by regressing Y i on X i and Z i.

33 Notice then that δ1 δ2 = α So we can get a consistent estimate of α simply by taking the ratio of the reduced form coefficients It also gives another interpretation of IV: δ 2 is the causal effect of Z i on T i δ 1 is the causal effect of Z i on Y i -it only operates through T i

34 Z i Y i 2 T i If we increase Z i by one unit this leads T i to increase by δ 2 units which causes Y i to increase by δ 2 α units Thus the causal effect of Z i on Y i is δ 2 α units

35 This illustrates another important way to think about an instrument: the key assumption is that T i is the only channel through which Z i influences Y i

36 Suppose there was another d i a Z i Y i 2 T i Then the causal effect of Z i on Y i would be δ2 α + da and IV would be δ 2 α + da δ 2

37 In the exactly identified case (i.e. one Z i ), this is numerically identical to IV. To see why, note that in a regression of Y i and T i on (X i, Z i ) yields δ 1 = Z iỹi Z i Z i so δ 2 = Z i T i Z i Z i δ 1 δ 2 = Z iỹi Z i T i = α IV This is just math-it does not require that the Structural equation" determining T i be correct

38 Method 2 Define T f i X iβ 2 + Z iδ 2 and suppose that T f i were known to the econometrician Now notice that Y i = αt i + X iβ + u i [ ] = α T f i + νi + X iβ + u i = αt f i + X iβ 2 + (αν i + u i ) One could get a consistent estimate of α by regressing Y i on X i and T f i.

39 Two Stage Least Squares In practice we don t know T f i but can get a consistent estimate of it from the fitted values of a reduced form regression call this T i (it is crucial that the reduced form gives us consistent estimates of β 2 and δ 2 ) That is: 1 Regress T i on X i and Z i, form the predicted value T i 2 Regress Y on X i and T i To run the second regression one needs to be able to vary T i separately from X i which can only be done if there is a Z i

40 It turns out that 2SLS is also is numerically identical to IV (with 1 instrument) Note that so T = Z ( Z Z ) 1 Z T B 2SLS = ( [ T X ] [ T X ]) 1 [ T X ] Y However, note that we can write X = Z ( Z Z ) 1 Z X That is projecting X on (X, Z) and using it to predict X will be a perfect fit.

41 That means that(using notation from earlier) that [ T X ] = Z ( Z Z ) 1 Z X Then (in the exactly identified case) we can write ( B ( 2SLS = X Z Z Z ) 1 ( Z Z Z Z ) ) 1 1 Z X ( X Z Z Z ) 1 Z Y ( ( = X Z Z Z ) 1 1 Z X ) ( X Z Z Z ) 1 Z Y = (Z X ) 1 ( Z Z ) ( X Z ) 1 ( X Z Z Z ) 1 Z Y (Z X ) 1 ( = = ( Z X ) 1 Z Y = β IV Z Z ) ( Z Z ) 1 Z Y

42 3 Interpretations Thus with 1 instrument we have 3 equivalent ways to derive IV: 1 GMM estimator or (Z T) 1 Z Y 2 Ratio of reduced form estimates-rescaling the reduced form 3 2SLS-direct effect of fitted model With more than one instrument only one of these procedures works-we ll worry about that later

43 Examples There are three main reasons people use IV 1 Simultaneity bias: ρ 0 2 Omitted Variable bias : There are unobservables that contribute to u i that are correlated with T i 3 Measurement Error: We do not observe T i perfectly While the first is the original reason for IV, in practice omitted variable bias is typically the biggest concern A classic (perhaps the classic) example is the returns to schooling.

44 Returns to Schooling This comes from the Card s chapter in the 1999 Handbook of Labor Economics Lets assume that log(w i ) = αs i + X iβ + ε i where W i is wages, S i is schooling, and X i is other stuff The biggest problem is unobserved ability

45 We are worried about ability bias we want to use instrumental variables A good instrument should have two qualities: It should be correlated with schooling It should be uncorrelated with unobserved ability (and other unobservables) Many different things have been tried. Lets go through some of them

46 Family Background If my parents earn quite a bit of money it should be easier for me to borrow for college Also they might put more value on education This should make me more likely to go This has no direct effect on my income-wisconsin did not ask how much education my Father had when they made my offer But is family background likely to be uncorrelated with unobserved ability?

47 Closeness of College If I have a college in my town it should be much easier to attend college I can live at home If I live on campus I can travel to college easily I can come home for meals and to get my clothes washed I can hang out with my friends from High school But is this uncorrelated with unobserved ability?

48 Quarter of Birth This is the most creative Consider the following two aspects of the U.S. education system (this actually varies from state to state and across time but ignore that for now), People begin Kindergarten in the calendar year in which they turn 5 You must stay in school until you are 16 Now consider kids who: Can t stand school and will leave as soon as possible Obey truancy law and school age starting law Are born on either December 31,1972 or January 1,1973

49 Those born on December 31 will turn 5 in the calendar year 1977 and will start school then (at age 4) will stop school on their 16th birthday which will be on Dec. 31, 1988 thus they will stop school during the winter break of 11th grade Those born on January 1 will turn 5 in the calendar year 1978 and will start school then (at age 5) will stop school on their 16th birthday which will be on Jan. 1, 1989 thus they will stop school during the winter break of 10th grade

50 The instrument is a dummy variable for whether you are born on Dec. 31 or Jan 1 This is pretty cool: For reasons above it will be correlated with education No reason at all to believe that it is correlated with unobserved ability The Fact that not everyone obeys perfectly is not problematic: An instrument just needs to be correlated with schooling, it does not have to be perfectly correlated In practice we can t just use the day as an instrument, use quarter of birth instead

51 Policy Changes Another possibility is to use institutional features that affect schooling Here often institutional features affect one group or one cohort rather than others

55 Consistently IV estimates are higher than OLS Why? Bad Instruments Ability Bias Measurement Error Publication Bias Discount Rate Bias

56 Heterogeneous Treatment Effects OK lets go back to the simple model but think about heterogeneous treatment effects following Imbens and Angrist (1994) We can write Y i = T i Y 1i + (1 T i ) Y 0i = T i (Y 1i Y 0i ) + Y 0i = i T i + Y 0i Assume our binary instrument Z i is correlated with T i but not with i or Y 0i (or equivalently Y 0i or Y 1i ) Does IV estimate the ATE?

57 There are 4 different types of people those for whom T i = 1 when: : Z i = 1, Z i = 0 : Z i = 1 only : Never : Z i = 0 only Imbens and Angrist s monotonicity rules out 4 as a possibility We also assume that Z i is independent of type Let µ, µ, and µ represent the sample proportions of the three groups and S i an indicator of the type

58 We know α 2SLS = Y 1 Y 0 E (Y i Z i = 1) E (Y i Z i = 0) T 1 T 0 E (T i Z i = 1) E (T i Z i = 0) Lets figure out the numerator and denominator E (Y i Z i = 1) E (Y i Z i = 0) =E ( i T i + Y 0i Z i = 1) E ( i T i + Y 0i Z i = 0) =E ( i T i Z i = 1) E ( i T i Z i = 0) = [µ E( i T i S i =, Z i = 1) + µ E( i T i S i =, Z i = 1) + µ E( i T i S i =, Z i = 1)] [µ E( i T i S i =, Z i = 0) + µ E( i T i S i =, Z i = 0) + µ E( i T i S i =, Z i = 0)] = [µ E( i S i = ) + µ E( i S i = ) + µ 0] [µ E( i S i = ) + µ 0 + µ 0] =µ E( i S i = )

59 E (T i Z i = 1) E (T i Z i = 0) = [µ E(T i S i =, Z i = 1) + µ E(T i S i =, Z i = 1) + µ E(T i S i =, Z i = 1)] [µ E(T i S i =, Z i = 0) + µ E(T i S i =, Z i = 0) + µ E(T i S i =, Z i = 0)] = [µ + µ ] =µ [µ )] Thus E (Y i Z i = 1) E (Y i Z i = 0) E (T i Z i = 1) E (T i Z i = 0) =E( i S i = ) They call this the local average treatment effect (LATE)

60 Measurement Error Another way people use instruments is for measurement error In the classic model lets get rid of X s so we want to measure the effect of T on Y. Y i = β 0 + αt i + u i and lets not worry about other issues so assume that cov(t i, u i ) = 0. The complication is that I don t get to observe T i, I only get to observe a noisy version of it: τ 1i = T i + ξ i where ξ i is i.i.d measurement error with variance σ 2 ξ

61 What happens if I run the regression on τ 1i instead of T i? α Cov (τ 1i, Y i ) Var(τ 1i ) = Cov (T i + ξ i, β 0 + αt i + u i ) Var(T i + ξ i ) = α Var (T i) Var(T i ) + σ 2 ξ

62 Why is OLS biased? Lets rewrite the model as Y i = β 0 + αt i + u i = β 0 + αt i + αξ i + u i αξ i = β 0 + ατ 1i + (u i αξ i ). You can see the problem with OLS: τ 1i = T i + ξ i is correlated with (u i αξ i )

63 Now suppose we have another measure of T i, τ 2i = T i + η i where η i is uncorrelated with everything else in the model. Using this as an instrument gives us a solution. τ 2i is correlated with τ 1i (through T i ),but uncorrelated with (u i αξ i ) so we can use τ 2i as an instrument for τ 1i.

64 Twins (Here we will think about both measurement error and fixed effect approaches) log(w if ) = θ f + αs if + u if The problem is that θ f is correlated with S if We can solve by differencing log(w f ) = α S f + u f if S f is uncorrelated with u f, then we can use this to get consistent estimates of α

65 The problem here is that a little measurement error can screw up things quite a bit because the variance of S f is small. A solution of this is to get two measures on schooling Ask me about my schooling Also ask my brother about my schooling do the same think for my brother s schooling This gives us two different measure of S if. Use one as an instrument for the other

67 The Colonial Origins of Comparative Development: An Empirical Investigation by Acemoglu, Johnson, and Robinson Time for a non-education example-a very important paper on an extremely important issue Their goal is to estimate the effects of institutions on Economic Development The idea is that countries with better institutions should develop more

68 Specifically they want to estimate their equation: log(y i ) = µ + αr i + X iγ + ε i where y i is GDP per capita R i is protection against expropriation" X i is other stuff Lets take a preliminary look

69 VOL. 91 NO. 5 ACEMOGLU ET AL.: THE COLONIAL ORIGINS OF DEVELOPMENT 1379 TABLE 2-OLS REGRESSIONS Whole Base Whole Whole Base Base Whole Base world sample world world sample sample world sample (1) (2) (3) (4) (5) (6) (7) (8) Dependent variable is log output per Dependent variable is log GDP per capita in 1995 worker in 1988 Average protection against expropriation (0.04) (0.06) (0.06) (0.05) (0.06) (0.06) (0.04) (0.06) risk, Latitude (0.49) (0.51) (0.70) (0.63) Asia dummy (0.19) (0.23) Africa dummy (0.15) (0.17) "Other" continent dummy (0.20) (0.32) R Number of observations Notes: Dependent variable: columns (1)-(6), log GDP per capita (PPP basis) in 1995, current prices (from the World Bank's World Development Indicators 1999); columns (7)-(8), log output per worker in 1988 from Hall and Jones (1999). Average protection against expropriation risk is measured on a scale from 0 to 10, where a higher score means more protection against expropriation, averaged over 1985 to 1995, from Political Risk Services. Standard errors are in parentheses. In regressions with continent dummies, the dummy for America is omitted. See Appendix Table Al for more detailed variable definitions

70 THE AMERICAN ECONOMIC REVIEW DECEMBER HKG S CAN r) ~~~~~~~~~~~~MLTBHS H GM PER DOMTW) N- 8 SLV BO[GU IDN. ~~~HTI SDN 'MM TGO RD 7 A l NEBRGD NGA 0~~~~~~~~ a) EhE TZA m~ 6m 0 4' Average Expropriation Risk FIGURE 2. OLS RELATIONSHIP BETWEEN EXPROPRIATION RISK AND INCOME

The problem is that there is no reason to believe institutions are set at random THE AMERICAN ECONOMIC REVIEW DECEMBER 2001 their income They suggest per current using mortalities institutions in of

71 The problem is that there is no reason to believe institutions are set at random THE AMERICAN ECONOMIC REVIEW DECEMBER 2001 their income They suggest per current using mortalities institutions in of these settlers countries.2 as an instrument More with the following argument specifically, our theory can be schematically nstitutions on eco- summarized as a source of exogs. In this paper, we (potential) settler tional differences > settlements mortality y Europeans,' and possible source of early current eory rests on three institutions institutions es of colonization ferent sets of instiuropean powers set plified by the Belngo. These instituuch protection for hey provide checks ernment expropria- current performance. We use data on the mortality rates of soldiers, bishops, and sailors stationed in the colonies be- Lets see whattween they the findseventeenth and nineteenth centuries, largely based on the work of the historian Philip D. Curtin. These give a good indication of the

72 VOL 91 NO. 5 ACEMOGLU ET AL.: THE COLONIAL ORIGINS OF DEVELOPMENT 1377 TABLE 1-DESCRIPTIVE STATISTICS By quartiles of mortality Whole world Base sample (1) (2) (3) (4) Log GDP per capita (PPP) in (1.1) (1.1) Log output per worker in (with level of United States (1.1) (1.0) normalized to 1) Average protection against expropriation risk, (1.8) (1.5) Constraint on executive in (2.3) (2.3) Constraint on executive in (1.8) (2.1) Constraint on executive in first year of independence (2.4) (2.4) Democracy in (2.6) (3.0) European settlements in (0.4) (0.3) Log European settler mortality n.a (1.1) Number of observations Notes: Standard deviations are in parentheses. Mortality is potential settler mortality, measured in terms of deaths per annum per 1,000 "mean strength" (raw mortality numbers are adjusted to what they would be if a force of 1,000 living people were kept in place for a whole year, e.g., it is possible for this number to exceed 1,000 in episodes of extreme mortality as those

73 THE AMERICAN ECONOMIC REVIEW DECE 10 USA NZL CAN AUS SGP C) co) IND GMB v 8 ' MYS,fflLANGAB 0) K i?r) Z A ^~~~~~L JAM -F,6._ T ~~~~~~~~CMR GINGH NABGA MDG 4 SDN MLI 4 ~~~~~~~~~~~HTI ZAR Log of Settler Mortality

74 TABLE 3-DETERMINANTS OF INSTITUTIONS European settlements in (0.73) (0.93) (0.90) (1.20) Log European settler mortality (0.17) (0.18) (0.24) (0.25) (0.02) (0.02) Latitude (1.80) (1.70) (2.30) (2.40) (0.19) R Number of observations (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Panel A Dependent Variable Is Average Protection Against Expropriation Risk in Constraint on executive in (0.08) (0.09) Democracy in (0.06) (0.07) Constraint on executive in first year of independence (0.08) (0.08) European settlements in (0.61) (0.78) Log European settler mortality (0.13) (0.14) Latitude (1.40) (1.50) (1.40) (1.51) (1.34) R Number of observations Dependent Variable Is European Dependent Variable Is Constraint Dependent Variable Is Settlements in Panel B on Executive in 1900 Democracy in

75 O. 5 ACEMOGLU ET AL.: THE COLONIAL ORIGINS OF DEVELOPMENT 10 'Ivp LO) < PANGA tl FJ GUY AGO Xi PAKIND SDN GMB 0a co BGD NERMD NGA tl 6 ETH TA n- 6 SI Log of Settler Mortality FIGURE 1. REDUCED-FORM RELATIONSHIP BETWEEN INCOME AND SETTLER MORTALITY

76 TABLE 4-IV REGRESSIONS OF LOG GDP PER CAPITA Base Base Base sample, Base Base sample sample dependen Base sample Base sample sample sample with with variable i Base Base without without without without continent continent log outpu sample sample Neo-Europes Neo-Europes Africa Africa dummies dummies per worke (1) (2) (3) (4) (5) (6) (7) (8) (9) Panel A: Two-Stage Least Squares Average protection against expropriation risk (0.16) (0.22) (0.36) (0.35) (0.10) (0.12) (0.30) (0.46) (0.17) Latitude (1.34) (1.46) (0.84) (1.8) Asia dummy (0.40) (0.52) Africa dummy (0.36) (0.42) "Other" continent dummy (0.85) (1.0) Panel B: First Stage for Average Protection Against Expropriation Risk in Log European settler mortality (0.13) (0.14) (0.13) (0.14) (0.22) (0.24) (0.17) (0.18) (0.13) Latitude (1.34) (1.50) (1.43) (1.40) Asia dummy (0.49) (0.50) Africa dummy (0.41) (0.41) "Other" continent dummy (0.84) (0.84) R

77 Overidentification What happens when we have more than one instrument? Lets think about a general case in which Z i is multidimensional Let K Z be the dimension of Zi Let K X denote the dimension of Xi Now we have more equations then parameters so we can no longer solve for B using ( 0 = Z Y X B ) because this gives us K Z equations in K X unknowns.

78 A simple solution is follow GMM and weight the moments by some K Z K Z weighting matrix Ω and then minimize [ ] [ ] Z (Y X B) Ω Z (Y X B) which gives ( 2X Z ΩZ Y X B ) = 0 (notice that in the exactly identified case X Z Ω drops out) We can solve directly for our estimator ( B GMM = X Z ΩZ X ) 1 X Z ΩZ Y

79 Two staged least squares is a special case of this: B 2SLS = ( X Z ( Z Z ) 1 Z X ) 1 X Z ( Z Z ) 1 Z Y Notice that this is the same as B GMM when Ω = (Z Z ) 1

80 Consistency ( 1 B GMM = N X Z Ω 1 ) 1 N Z X 1 N X Z Ω 1 N Z (X B + U) ( 1 =B + N X Z Ω 1 ) 1 N Z X 1 N X Z Ω 1 N Z U ( ( ) ( )) 1 ( ) B + E Xi Zi ΩE Zi Xi E Xi Zi ΩE(Zi u i ) =B

81 Inference N ( B B) = ( 1 N X Z Ω 1 ) 1 N Z X 1 N X Z Ω 1 Z U N Using a standard central limit theorem with i.i.d. data Thus N 1 Z U = 1 N N i=1 ( N 0, E ( Z i u i u 2 i Z i Z i N ( B B) N ( 0, A VA ) )) with ( V =E A = ( E Xi Zi ( ) X i Z i ( ΩE ) ΩE u 2 i Zi Zi ( Z i X i ) ( ) ΩE Zi Xi )) 1

82 From GMM results we know that the efficient weighting matrix is ( Ω =E u 2 i Zi Zi ) 1 in which case the Covariance matrix simplifies to ( ( E X i Z i ) ( E u 2 i Z i Z i ) 1 E ( Z i X i ) ) 1 This also means that under homoskedasticity two staged least squares is efficient.

83 Overidentification Tests Lets think about testing in the following way. Suppose we have two instruments so that we have three sets of moment conditions ( 0 = Z 1 Y T α X β ) ( 0 = Z 2 Y T α X β ) ( 0 = X Y T α X β )

84 As before we can use partitioned regression to deal with the X s and then write the first two moment equations as ) 0 = Z 1 (Ỹ T α ) 0 = Z 2 (Ỹ T α The way I see the overidentification test is whether we can find an α that solves both equations.

85 That is let α 1 = Z 1Ỹ Z 2 T α 2 = Z 1Ỹ Z 2 T If α 1 α 2 then the test will not reject the model, otherwise it will For this reason I am not a big fan of overidentification tests: If you have two crappy instruments with roughly the same bias you will fail to reject Why not just estimate α 1 and α 2 and look at them? It seems to me that you learn much more from that than a simple F-statistic

Instrumental Variables

Instrumental Variables Department of Economics University of Wisconsin-Madison September 27, 2016 Treatment Effects Throughout the course we will focus on the Treatment Effect Model For now take that to