Instrumental Variables, Simultaneous and Systems of Equations

Chapter 6 Instrumental Variables, Simultaneous and Systems of Equations 61 Instrumental variables In the linear regression model y i = x iβ + ε i (61) we have been assuming that bf x i and ε i are uncorrelated This happens to be a critical assumption because without it none of the proofs of consistency and unbiasedness of OLS or GLS would remain valid An alternative method of estimation to deal with this problem is the method of Instrumental Variables (IV) The idea in IV estimation is the existence of an additional set of variables z i that are correlated with x i, but uncorrelated with ε i 611 Assumptions Recall the assumptions of the classical regression model: (1) linearity, (2) full rank, (3) exogeneity of the independent variables, (4) homoscedasticity and nonautocorrelation, (5) stochastic or nonstochastic data, and (6) normally distributed disturbances All but assumption (3) will hold here Our new assumption is: ASSUMPTION 3IV : E[ε i x i ]=η i (62) The implication from this assumption is that the disturbances and the regressors are now correlated: ASSUMPTION 3IV : E[x i ε i ]=γ (63) where γ 0 We say that the regressors X are no longer exogenous Now, assume that there exist an additional set of variables Z such that the two following properties hold: 1 EXOGENEITY: Z is uncorrelated with the disturbance ε 55

56 6 Instrumental Variables, Simultaneous and Systems of Equations 2 RELEVANCE: Z is correlated with the independent variables X Z is called the set of instrumental variables 612 Ordinary Least Squares Under ASSUMPTION 3IV the OLS estimator, b, is no longer unbiased: E[b X]=β +(X X) 1 X η β (64) Hence, the Gauss-Markov theorem does not longer hold Not only that OLS is biased, it is also inconsistent 613 Instrumental Variables estimator The derivation of the IV estimator is the following: y = Xβ + ε (65) Z y = Z Xβ + Z ε Z y = Z Xβ b IV = (Z X) 1 Z y The estimate of the asymptotic variance of b IV is: EstAsyVar[b IV ]= ˆσ 2 (Z X) 1 (Z Z)(X Z) 1, (66) where ˆσ 2 is the consistent estimator of σ 2 and it is given by: ˆσ 2 = ˆε ˆε n, (67) with ˆε being just the estimated residuals from the IV regression 614 Two-Stage Least Squares The IV approach works fine when the number of columns in Z is the same as the number of columns in X However, what if we have more instruments than regressors in the model? Then b IV cannot be estimated because Z X will not be full rank and cannot be inverted Because every linear combination of Z is uncorrelated with ε, one naive approach would be simply to drop columns in Z until we have the same

61 Instrumental variables 57 number of columns as in X Clearly this means that we are loosing some informations A better approach is to use the projection of the columns of X on the column space of Z: ˆX=Z(Z Z) 1 Z X (68) Then, we just use ˆX as the set of instruments Z in the IV estimation b IV = ( ˆX X) 1 ˆX y (69) = [X Z(Z Z)Z X] 1 X Z(Z Z) 1 Z y = [X (I M Z )X] 1 X (I M Z )y = ( ˆX ˆX) 1 ˆX y M Z is the residual maker matrix based on Z, hence (I M Z ) is the (idempotent) projection matrix based on Z When ˆX is the set of instruments, the IV estimator is computed by OLS of y on ˆX Therefore, we can see b IV as a two step approach; first computing ˆX and then using ˆX in Equation 69 This is why this is called the two-stage least squares (2SLS) estimator A key point here is that the asymptotic covariance matrix in Equation 66 should not use the ˆσ 2 presented in Equation 67 because this last one is inconsistent for σ 2 615 Hausman and Wu Test If X is uncorrelated with ε, the asymptotic covariance of the OLS estimator is never larger than that of the IV estimator Hence, if X is not endogenous (not correlated with ε), OLS is the preferred estimator Obviously, if X is endogenous we have to use IV The problem in choosing between OLS and IV is that sometimes it is not clear whether X is correlated with ε Recall that ε is unobserved, so the question is not that trivial Fortunately Hausman (1978) and Wu (1973) developed a test The idea behind the Hausman s test is as follows The null hypothesis is that X is uncorrelated with ε Under the null we have two consistent estimators of β, b LS and b IV Under the alternative only b IV is consistent Then, if we examine: d=b IV b LS, (610) under the null, plim d=0, whereas under the alternative, plim d 0 d is given by: and the Wald statistic used to test this hypothesis is: d=( ˆX ˆX) 1 ˆX e, (611) H = d [( ˆX ˆX) 1 (X X) 1 ] 1 d s 2 (612)

58 6 Instrumental Variables, Simultaneous and Systems of Equations 616 Instrumental variables and the Hausman test in Stata Based on data from the 1980 census, we want to estimate the following equation: rent i = β 0 + β 1 hsngval i + β 2 pcturban i + u i, (613) where rent is the median monthly gross rent, hsngval is the median dollar value of owner-occupied housing, and pcturban is the percentage of the population living in urban areas Because random shocks that affect rental rates in a state probably also affect housing values, we treathsngval as endogenous (correlated with u i ) Because hsngval is potentially endogenous, we need at least one instrument that is correlated with hsngval but uncorrelated with u i These excluded exogenous variables must not affect rent, because if they do then they should be included in the regression We have family income (faminc) and region of the country (region) that we believe are correlated withhsngval but not with u i The set of instrumental variables is then pcturban, faminc, 2region, 3region, and4region Let s say that we first estimate the model using OLS: use http://wwwstata-presscom/data/r11/hsng regress rent pcturban hsngval Source SS df MS Number of obs = 50 -------------+------------------------------ F( 2, 47) = 4754 Model 409835269 2 204917635 Prob > F = 00000 Residual 202595931 47 431055172 R-squared = 06692 -------------+------------------------------ Adj R-squared = 06551 Total 6124312 49 124985959 Root MSE = 20762 rent Coef Std Err t P> t [95% Conf Interval] pcturban 5248216 2490782 211 0040 0237408 1025902 hsngval 0015205 0002276 668 0000 0010627 0019784 _cons 1259033 1418537 888 0000 9736603 1544406 The 2SLS command is: ivregress 2sls rent pcturban (hsngval = faminc iregion) Instrumental variables (2SLS) regression Number of obs = 50 Wald chi2(2) = 9076 Prob > chi2 = 00000 R-squared = 05989 Root MSE = 22166 rent Coef Std Err z P> z [95% Conf Interval] hsngval 0022398 0003284 682 0000 0015961 0028836 pcturban 081516 2987652 027 0785-504053 667085 _cons 1207065 1522839 793 0000 9085942 1505536 Instrumented: hsngval Instruments: pcturban faminc 2region 3region 4region Notice the big difference between the OLS estimate and the 2SLS estimate for hsngval If in addition we are interested in the result from the first stage, we need to type The 2SLS command is:

62 Simultaneous equations 59 ivregress 2sls rent pcturban (hsngval = faminc iregion), first First-stage regressions ----------------------- Number of obs = 50 F( 5, 44) = 1966 Prob > F = 00000 R-squared = 06908 Adj R-squared = 06557 Root MSE = 92534821 hsngval Coef Std Err t P> t [95% Conf Interval] pcturban 1822201 1150167 158 0120-4958092 4140211 faminc 2731324 6818931 401 0000 1357058 4105589 region 1 (empty) 2-5095038 4122112-124 0223-1340261 3212533 3-177805 4072691-044 0665-9986019 6429919 4 1341379 4048141 331 0002 5255296 2157228 _cons -1867187 1199548-156 0127-4284717 5503439 For the Hausman-Wu test we need to run estat endogenous Tests of endogeneity Ho: variables are exogenous Durbin (score) chi2(1) = 128473 (p = 00003) Wu-Hausman F(1,46) = 159067 (p = 00002) This command reports the Durbin and the Hausman-Wu endogeneity test The null hypothesis in these tests is that the variable under consideration (hsngval) can be treated as exogenous Here both test statistics are highly significant, so we reject the null of exogeneity and we must treathsngval as endogenous 62 Simultaneous equations An illustrative example of a system of simultaneous equations is a model of market equilibrium: Demand equation : q d,t = α 1 p t + α 2 x t + ε d,t (614) Supply equation : q s,t = β 1 p t + ε s,t (615) Equilibrium condition : q d,t = q s,t = q t (616) There are structural equations derived theoretically, where each describes a particular aspect of the economy Because the market equilibrium jointly determined price and quantity, they are jointly dependent or endogenous variables This is a complete system of equations because the number of endogenous variables is equal to the number of equations Because OLS is inconsistent in this system, potential estimators include IV and 2SLS

60 6 Instrumental Variables, Simultaneous and Systems of Equations 621 The identification problem The identification problem should be solved before estimation We say that two theories are observationally equivalent if both are consistent with the same data Then we say that the structure is unidentified Figure 61 shows that the observed data that is consistent with the two structures in panels (b) and (c) We say that in panel (a) the structure is unidentified If we have additional information, for example, that supplied was fixed, then this rules out (c) and the correct structure is (b) Fig 61 Market equilibria and observational equivalence, from [Greene (2008)] 622 Single equation estimation methods If we are only interested in a single equation of the system, we can employ the IV or the 2SLS estimators presented previously The Stata example in Equation 613 can also be viewed as a simultaneous equation problem if we write the model as: hsngval i = α 0 + α 1 faminc i + α 2 2region i +α 3 3region i + α 4 4region i + v i, (617) rent i = β 0 + β 1 hsngval i + β 2 pcturban i + u i, (618)

62 Simultaneous equations 61 The results are the same as before 623 System methods of estimation The system of M equation can be formulated as: y 1 Z 1 0 0 0 δ 1 ε 1 y 2 = 0 Z 2 0 0 δ 2 + ε 2 y n 0 0 0 Z M δ M ε M or y=zδ + ε (619) where 1 E[ε X]=0, and E[εε X]= Σ = Σ I (620) OLS on the system is inconsistent IV is consistent, but is inefficient compared to estimators that make use of the cross-sectional correlations of the disturbances Three techniques are generally used for joint estimation of the entire system of equations: three-stage least squares, GMM, and full information maximum likelihood (FIML) 624 Three-stage least squares The three-stage least squares estimator of the system presented in Equation 619 is: ˆδ 2SLS =[Ẑ (Σ 1 I)Ẑ] 1 Ẑ (Σ 1 I)y (621) That has the following asymptotic covariance matrix: AsyVar[ ˆδ 2SLS ]=[ Z (Σ 1 I) Z] 1, (622) which is estimated with [Ẑ (Σ 1 I)Ẑ] 1 Under certain conditions, 3SLS satisfies the requirements for an IV estimator, hence it is consistent Moreover, 3SLS is asymptotically efficient Consider the simple example with M = 2 that relates consumption (consump) to private and government wages paid (wagepriv and wagegovt) Simultaneously, private wages depend on consumption, total government expenditures (govt), and the lagged stock of capital in the economy (capital1) The model can be written as: 1 is the kronecker product, which multiplies each of the elements of the first matrix by the entire second matrix

62 6 Instrumental Variables, Simultaneous and Systems of Equations consump = β 0 + β 1 wagepriv+β 2 wagegovt+ε 1 (623) wagepriv = β 3 + β 4 consump+β 5 govt+β 6 capital1+ε 2 (624) We assume that consump and wagepriv are endogenous while wagegovt, govt, and capital1 are exogenous The Stata commands for this example of 3SLS are: use http://wwwstata-presscom/data/r11/klein reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1) Three-stage least-squares regression Equation Obs Parms RMSE "R-sq" chi2 P consump 22 2 1776297 09388 20802 00000 wagepriv 22 3 2372443 08542 8004 00000 Coef Std Err z P> z [95% Conf Interval] consump wagepriv 8012754 1279329 626 0000 5505314 1052019 wagegovt 1029531 3048424 338 0001 432051 1627011 _cons 193559 3583772 540 0000 1233184 2637996 wagepriv consump 4026076 2567312 157 0117-1005764 9057916 govt 1177792 5421253 217 0030 1152461 2240338 capital1-0281145 0572111-049 0623-1402462 0840173 _cons 1463026 1026693 142 0154-5492552 3475306 Endogenous variables: consump wagepriv Exogenous variables: wagegovt govt capital1 Because the consumption equation is just identified the number of excluded exogenous variables from this equation (govt and capital1) is equal to the number of endogenous variables included (consump and wagepriv) the 3SLS point estimates are identical to the 2SLS estimates 63 Systems of Equations A multiple equation structure can be written as y 1 = X 1 β 1 + ε 1, (625) y 2 = X 2 β 2 + ε 2, y M = X M β M + ε M, where there are M equations and T observations in the sample

63 Systems of Equations 63 631 Seemingly Unrelated Regressions In the seemingly unrelated regressions (SUR) models the regressions are related because the (contemporaneous) errors associated with the dependent variables may be correlated Equations 624 can be expressed as: y i = X i β i + ε i i=1,2,,m, (626) where X i is strictly exogenous: ε =[ε 1,ε 2,,ε M] (627) and the disturbances are homoscedastic: E[ε X 1,X 2,,X M ]=0, (628) E[ε m ε m X 1,X 2,,X M ]=σ mm I T (629) As before, there are T observations to estimate M equation Each equation has K i regressors, for a total of K = M i=1 K i We require at least T > K i observations In addition, we assume that disturbances are uncorrelated across observations but correlated across equations: E[ε it ε js X 1,X 2,,X M ]=σ i j, if t = s and 0 otherwise (630) Therefore, the formulation of the disturbance is: E[ε i ε j X 1,X 2,,X M ]=σ i j I, (631) or σ 11 I σ 12 I σ 1M I E[εε σ 21 I σ 22 I σ 2M I X 1,X 2,,X M ]=Ω = σ M1 I σ M2 I σ MM I 632 GLS on Seemingly Unrelated Regressions Each of the M equations is, by itself, a classical regression Hence, using OLS equation by equation yields consistent, but not necessarily efficient parameters A more efficient estimator is to use generalized least squares to the stacked model

64 6 Instrumental Variables, Simultaneous and Systems of Equations y 1 X 1 0 0 β 1 ε 1 y 2 = 0 X 2 0 β 2 + ε 2 = Xβ + ε y n 0 0 X M β M ε M For the tth observation, the M M covariance matrix of the disturbances is σ 11 σ 12 σ 1M σ 21 σ 22 σ 2M Σ = σ M1 σ M2 σ MM Therefore, in Equation 631, and Ω = Σ I (632) Ω 1 = Σ 1 I (633) Then, the GLS estimator for the system in Equations 626 is ˆβ = [X Ω 1 X] 1 XΩ 1 y (634) = [X (Σ 1 I)X] 1 X (Σ 1 I)y The GLS results covered before apply to this model The equations are linked only by the disturbances How much efficiency is gained when using GLS over OLS? 1 If the equations are unrelated, GLS yields the same estimates as OLS 2 if we have the same X for all equations, then GLS and OLS are identical 3 If the regression in one block of equations are a subset of those in another, then GLS has no efficiency gains over OLS in the estimation of the smaller set of equations With unrestricted correlation of the disturbances and different regressors in the equations, two propositions that generally apply are: 1 The greater the correlation of the disturbances, the greater the efficiency gain in using GLS 2 The less the correlation in the X matrices, the greater the efficiency gain in using GLS 633 SUR in Stata Consider the following system of two equations: price = β 0 + β 1 foreign+β 2 lenght+u 1 (635) weight = γ 0 + γ 1 foreign+γ 2 lenght+u 2 (636)

63 Systems of Equations 65 Even if the share the same X and SUR is this system is the same as OLS, the benefit in using arises, if for example, we want to test H 0 : β 2 = γ 2 The commands in Stata are: use http://wwwstata-presscom/data/r11/auto sureg (price foreign length) (weight foreign length), small dfk The optionssmall anddfk are used to obtain small-sample statistics comparable with OLS Seemingly unrelated regression Equation Obs Parms RMSE "R-sq" F-Stat P price 74 2 2474593 03154 1635 00000 weight 74 2 2502515 08992 31654 00000 Coef Std Err t P> t [95% Conf Interval] price foreign 2801143 766117 366 0000 1286674 4315611 length 9021239 1583368 570 0000 5891219 1215126 _cons -1162135 3124436-372 0000-1779777 -544493 weight foreign -1336775 7747615-173 0087-2868332 194782 length 3144455 1601234 1964 0000 2827921 3460989 _cons -285025 3159691-902 0000-3474861 -2225639 Then we can test H 0 : β 2 = γ 2 by using: test length The optionssmall anddfk are used to obtain small-sample statistics comparable with OLS ( 1) [price]foreign = 0 ( 2) [weight]foreign = 0 F( 2, 142) = 1799 Prob > F = 00000 Now, consider the following system where the X are different across equations: Here the command is: price = β 0 + β 1 foreign+β 2 mpg+β 3 displ+u 1 (637) weight = γ 0 + γ 1 foreign+γ 2 lenght+u 2 (638) sureg (price foreign mpg displ) (weight foreign length), corr That yields the following output Seemingly unrelated regression Equation Obs Parms RMSE "R-sq" chi2 P price 74 3 2165321 04537 4964 00000 weight 74 2 2452916 08990 66184 00000 Coef Std Err z P> z [95% Conf Interval]

66 6 Instrumental Variables, Simultaneous and Systems of Equations price foreign 305825 6857357 446 0000 1714233 4402267 mpg -1049591 5847209-180 0073-2195623 9644042 displacement 1818098 4286372 424 0000 9779842 2658211 _cons 3904336 1966521 199 0047 500263 7758645 weight foreign -1473481 7544314-195 0051-2952139 517755 length 3094905 1539895 2010 0000 2793091 3396718 _cons -2753064 3039336-906 0000-3348763 -2157365 Correlation matrix of residuals: price weight price 10000 weight 03285 10000 Breusch-Pagan test of independence: chi2(1) = 7984, Pr = 00047 The option corr displays the correlation between the disturbances in Equations 638 and 638 In this case the correlation is 03285 and we reject the null hypothesis that the correlation is zero Hence, GLS is more efficient than OLS