Lecture Notes on Advanced Econometrics
Lecture 8: Instrumental Variables Estimation
Takashi Yamano
Fall Semester

Endogenous Variables

Consider a population model:

  y1i = α1 y2i + β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui

We call y2 an endogenous variable when y2 is correlated with u. As we have studied earlier, y2 would be correlated with u if (a) there are omitted variables that are correlated with both y1 and y2, (b) y2 is measured with errors, or (c) y1 and y2 are simultaneously determined (we will cover this issue later). For all of these problems, we can identify the source of the problem as the correlation between the error term and one or more of the independent variables. For all of these problems, we can apply instrumental variables (IV) estimation, because instrumental variables are used to cut the correlation between the error term and the independent variables.

To conduct IV estimation, we need to have instrumental variables (or instruments, in short) that are (R1) uncorrelated with u but (R2) partially and sufficiently strongly correlated with y2 once the other independent variables are controlled for. It turns out that it is very difficult to find proper instruments! In practice, we can test the second requirement (R2), but we cannot test the first requirement (R1) because u is unobservable.

To test the second requirement (R2), we need to express a reduced form equation of y2 in terms of all of the exogenous variables. The exogenous variables include all of the independent variables that are not correlated with the error term, plus the instrumental variable, z. The reduced form equation for y2 is

  y2 = δ0 + δz z + δ1 x1 + ... + δk-1 xk-1 + v

For the instrumental variable to satisfy the second requirement (R2), the estimated coefficient on z must be significant. In this case, we have one endogenous variable and one instrumental variable. When we have the same number of endogenous and instrumental variables, we say the endogenous variables are just identified.
When we have more instrumental variables than endogenous variables, we say the endogenous variables are over-identified. In this case,
we need to use two-stage least squares (2SLS) estimation. We will come back to 2SLS later.

Define x = (y2, 1, x2, ..., xk-1) as a 1-by-k vector, z = (z, 1, x2, ..., xk-1) as a 1-by-k vector of all exogenous variables, X as an n-by-k matrix that includes one endogenous variable and k-1 other regressors, and Z as an n-by-k matrix that includes one instrumental variable and the same k-1 other regressors:

      | y2,1  1  x1,2 ... x1,k-1 |        | z1  1  x1,2 ... x1,k-1 |
  X = | y2,2  1  x2,2 ... x2,k-1 | ,  Z = | z2  1  x2,2 ... x2,k-1 |
      |  ...                     |        | ...                    |
      | y2,n  1  xn,2 ... xn,k-1 |        | zn  1  xn,2 ... xn,k-1 |

The instrumental variables (IV) estimator is

  β̂_IV = (Z'X)^(-1) Z'Y

Notice that we can take the inverse of Z'X because both Z and X are n-by-k matrices, so Z'X is a k-by-k matrix, which has full rank, k. This indicates that there is no perfect collinearity in Z. The condition that Z'X has full rank k is called the rank condition.

Consistency of the IV Estimator

The consistency of the IV estimators can be shown by using the two requirements for IVs:

  β̂_IV = (Z'X)^(-1) Z'(Xβ + u)
       = β + (Z'X)^(-1) Z'u
       = β + (Z'X/n)^(-1) (Z'u/n)

From the first requirement (R1), plim Z'u/n = 0. From the second requirement (R2), plim Z'X/n = A, where A = E(z'x) has full rank. Therefore, the IV estimator is consistent when the IVs satisfy the two requirements.

A Bivariate Model

Let's consider a simple bivariate model:

  y1 = β0 + β1 y2 + u

We suspect that y2 is an endogenous variable, cov(y2, u) ≠ 0. Now, consider a variable, z, which is correlated with y2 but not correlated with u: cov(z, y2) ≠ 0 but cov(z, u) = 0. Consider cov(z, y1):

  cov(z, y1) = cov(z, β0 + β1 y2 + u)
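The IV formula above can be computed directly with matrix algebra. The following is a minimal numerical sketch with simulated data; the data-generating process, variable names, and coefficient values are all hypothetical (not from the lecture), chosen so that y2 is endogenous while z satisfies (R1) and (R2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical DGP: y2 is endogenous because it shares the unobserved
# component c with the error u; z satisfies (R1) cov(z, u) = 0 and
# (R2) cov(z, y2) != 0.
c = rng.normal(size=n)
z = rng.normal(size=n)
y2 = 1.0 + 0.8 * z + c + rng.normal(size=n)
u = 0.5 * c + rng.normal(size=n)
y1 = 2.0 + 1.5 * y2 + u                 # true slope on y2 is 1.5

X = np.column_stack([np.ones(n), y2])   # regressors: constant, y2
Z = np.column_stack([np.ones(n), z])    # instruments: constant, z

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y1)    # (Z'X)^{-1} Z'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y1)   # OLS, biased here

print(beta_iv[1], beta_ols[1])
```

Because cov(y2, u) > 0 in this design, the OLS slope is biased upward, while the IV slope stays close to the true value of 1.5.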
             = cov(z, β0) + β1 cov(z, y2) + cov(z, u)

Because cov(z, β0) = 0 and cov(z, u) = 0, we find that cov(z, y1) = β1 cov(z, y2), so

  β1 = cov(z, y1) / cov(z, y2)

The sample analog of this estimator is

  β̂1 = Σi (zi - z̄)(y1i - ȳ1) / Σi (zi - z̄)(y2i - ȳ2),

which is (Z'X)^(-1) Z'Y in the bivariate case. Thus, we reach the same conclusion as with the matrix form.

The problem in practice is the first requirement, cov(z, u) = 0. We cannot empirically confirm this requirement because u cannot be observed. Thus, the validity of this assumption is left to economic theory or economists' common sense. Recent studies show that even the first requirement can be problematic when the correlation between the endogenous and instrumental variables is weak. We will discuss this later.

Example 1: Card (1995), CARD.dta. A dummy variable for growing up near a 4-year college (nearc4) as an IV for educ.

OLS:

. reg lwage educ

      Source |       SS       df       MS           Number of obs =        3
-------------+------------------------------        F(  ,     38) =    39.54
       Model |  58.553536          58.553536        Prob > F      =        .
    Residual |    534.658     38   .7756857         R-squared     =     .987
-------------+------------------------------        Adj R-squared =     .984
       Total |     59.646     39  .96956335         Root MSE      =     .439

       lwage |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |       .594       .8697    8.53    .        .464674       .577
       _cons |    5.57883      .38895   43.47    .       5.494748      5.6477

Correlation between nearc4 (an IV) and educ:

. reg educ nearc4

      Source |       SS       df       MS           Number of obs =        3
-------------+------------------------------        F(  ,     38) =     63.9
       Model |    448.644           448.644         Prob > F      =        .
    Residual |     3.4759     38     7.9767         R-squared     =       .8
-------------+------------------------------        Adj R-squared =       .5
       Total |       56.8     39   7.658643         Root MSE      =    .6494
        educ |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
      nearc4 |       .899      .36988   7.994    .         .65693      .3347
       _cons |       .698      .85646   48.69    .           .539     .86594

Thus, nearc4 satisfies one of the two requirements to be a good candidate as an IV.

IV Estimation:

. ivreg lwage (educ = nearc4)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS           Number of obs =        3
-------------+------------------------------        F(  ,     38) =      5.7
       Model |    -34.443           -34.443         Prob > F      =        .
    Residual |   93.75354     38     .39776         R-squared     =        .
-------------+------------------------------        Adj R-squared =        .
       Total |     59.646     39  .96956335         Root MSE      =   .55686

       lwage |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |      .8866        .693    7.53    .          .3658     .39634
       _cons |    3.76747     .348867    .799    .         3.8344     4.4554

Instrumented:  educ
Instruments:   nearc4

Note that you can obtain the same coefficient by estimating OLS on lwage with the predicted educ (predicted by nearc4). However, the standard errors would be incorrect. In the IV estimation above, the standard errors are already corrected.

End of Example 1

The Two-Stage Least Squares Estimation

Again, let's consider a population model:

  y1 = α1 y2 + β0 + β1 x1 + β2 x2 + ... + βk xk + u    (1)

where y2 is an endogenous variable. Suppose that there are m instrumental variables. The instruments, z = (1, x1, ..., xk, z1, ..., zm), are correlated with y2. From the reduced form equation of y2 on all exogenous variables (the exogenous independent variables plus the instruments), we have

  y2 = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + δk+1 z1 + ... + δk+m zm + ε
  y2 = ŷ2 + ε
ŷ2 is a linear projection of y2 on all of the exogenous variables. Because ŷ2 is projected with exogenous variables that are not correlated with the error term, u, in (1), ŷ2 is not correlated with u, while ε is correlated with u. Thus, we can say that by estimating y2 with all exogenous variables, we have divided y2 into two parts: one that is correlated with u (ε) and one that is not (ŷ2). The projection of y2 on Z can be written as

  ŷ2 = Zδ̂ = Z(Z'Z)^(-1) Z'y2

When we use the two-step procedure (as we discuss later), we use this ŷ2 in the place of y2. But now, we treat y2 as a variable in X and project X itself on Z:

  X̂ = ZΠ̂ = Z(Z'Z)^(-1) Z'X = Pz X

Π̂ is a matrix of coefficients, with one row for each column of Z and one column for each column of X. Its first column contains the reduced-form coefficients (δ0, δ1, ..., δk+m)' for y2; the remaining columns simply reproduce the exogenous variables in X. Thus, y2 in X is expressed as a linear projection, while each other independent variable in X is expressed by itself.

Pz = Z(Z'Z)^(-1)Z' is an n-by-n symmetric and idempotent matrix (i.e., Pz Pz = Pz). We use X̂ as instruments for X and apply the IV estimation as in

  β̂_2SLS = (X̂'X)^(-1) X̂'Y

This can also be written as

  β̂_2SLS = (X'Pz X)^(-1) X'Pz Y = (X'Z(Z'Z)^(-1)Z'X)^(-1) X'Z(Z'Z)^(-1)Z'Y    (2)

This is the 2SLS estimator. It is called two-stage because it looks as if we take two steps, creating the projected X̂ to estimate the 2SLS estimators. But we do not need to take two steps, as (2) shows. We can estimate the 2SLS estimators in one step by using X and Z. (This is what econometrics packages do.)

The Two-Step Procedure
It is still a good idea to know how to estimate the 2SLS estimators by a two-step procedure:

Step 1: Obtain ŷ2 by estimating an OLS regression of y2 on all of the exogenous variables, including all of the instruments (the first-stage regression).
Step 2: Use ŷ2 in the place of y2 to estimate y1 against ŷ2 and all of the exogenous independent variables, not the instruments (the second-stage regression).

The estimated coefficients from the two-step procedure should be exactly the same as 2SLS. However, you must be aware that the standard errors from the two-step procedure are incorrect, usually smaller than the correct ones. Thus, in practice, avoid using predicted variables as much as you can! Econometric packages will provide you with 2SLS results based on (2), so you do not need to use the two-step procedure.

We use the first-stage regression to test the second requirement for IVs. In the first-stage regression, we should conduct an F-test on all instruments to see if the instruments are jointly significant in explaining the endogenous variable, y2. As we discuss later, instruments should be strongly correlated with y2 to have reliable 2SLS estimators.

Consistency of 2SLS

Assumption 2SLS.1: For the vector z, E(z'u) = 0, where z = (1, x1, ..., xk, z1, ..., zm).
Assumption 2SLS.2: (a) rank E(z'z) = k+m+1; (b) E(z'x) has full column rank.

(b) is the rank condition for identification: z must be sufficiently linearly related to x so that E(z'x) has full column rank. The order condition is m ≥ h, where h is the number of endogenous variables. Thus, the order condition indicates that we must have at least as many instruments as endogenous variables.

Under Assumptions 2SLS.1 and 2SLS.2, the 2SLS estimators in (2) are consistent. Under homoskedasticity,

  σ̂² = (n - k)^(-1) Σi ûi²,   with   Avar(β̂_2SLS) = σ̂² (X̂'X̂)^(-1),

where ûi is the 2SLS residual computed with the original y2 (not ŷ2), is a valid estimator of the asymptotic variance of β̂_2SLS.
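The equivalence of the one-step formula (2) and the two-step procedure can be checked numerically. Below is a sketch with simulated data (the DGP, names, and coefficient values are hypothetical); note that only the coefficients agree, not the second-stage standard errors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
c = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments (m = 2)
x1 = rng.normal(size=n)                           # exogenous regressor
y2 = 0.6 * z1 + 0.6 * z2 + 0.5 * x1 + c + rng.normal(size=n)
u = 0.7 * c + rng.normal(size=n)
y1 = 1.0 + 1.2 * y2 + 0.8 * x1 + u                # true coefs 1.0, 1.2, 0.8

X = np.column_stack([np.ones(n), y2, x1])         # (1, y2, x1)
Z = np.column_stack([np.ones(n), z1, z2, x1])     # all exogenous variables

# One step: beta = (X'Pz X)^{-1} X'Pz y, without forming the n-by-n Pz
PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)       # Pz X, an n-by-3 matrix
beta_one = np.linalg.solve(X.T @ PzX, PzX.T @ y1)

# Two steps: first regress y2 on Z, then OLS of y1 on (1, y2_hat, x1)
y2_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y2)
X2 = np.column_stack([np.ones(n), y2_hat, x1])
beta_two = np.linalg.solve(X2.T @ X2, X2.T @ y1)

print(beta_one, beta_two)   # identical up to floating-point rounding
```

The second-step OLS standard errors would use the residuals y1 - X2 β̂, which is the wrong error term; a package computes them from y1 - X β̂ instead.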
Under heteroskedasticity, the heteroskedasticity-robust (White) variance estimator is

  Avar(β̂_2SLS) = (X̂'X̂)^(-1) X̂'Σ̂X̂ (X̂'X̂)^(-1)

Here are some tests for IV estimation. See Wooldridge (Introductory Econometrics) for details.

Testing for Endogeneity
(i) Estimate the reduced form model using the endogenous variable as the dependent variable:
  y2 = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + δk+1 z1 + ... + δk+m zm + ε
(ii) Obtain the residual, ε̂.
(iii) Estimate
  y1 = α1 y2 + β0 + β1 x1 + β2 x2 + ... + βk xk + δ ε̂ + u
(iv) If δ̂ is significant, then y2 is endogenous.

Testing the Over-Identifying Restrictions
(i) Estimate β̂_2SLS and obtain û.
(ii) Regress û on z and x.
(iii) Get R² and compute nR², which is distributed as chi-squared with degrees of freedom equal to the number of over-identifying restrictions. If this is significantly different from zero, then at least some of the IVs are not exogenous.

Example 2: Card (1995), card.dta again.

OLS:

. reg lwage educ exper expersq black smsa south

      Source |       SS       df       MS           Number of obs =        3
-------------+------------------------------        F(  6,    33) =     4.93
       Model |     7.6565      6    8.69469         Prob > F      =        .
    Residual |   4.475997     33     .48647         R-squared     =      .95
-------------+------------------------------        Adj R-squared =      .89
       Total |     59.646     39  .96956335         Root MSE      =    .3749

       lwage |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |       .749       .3554      .3    .         .67357      .8883
       exper |    .835958      .66478    .575    .           .756     .96635
     expersq |       -.49        .378    -7.5    .          -.864      -.677
       black |    -.89636       .7666   -.758    .          -.499      -.557
        smsa |       .643      .55733    .365    .         .38876     .99583
       south |     -.4865         .58   -8.59    .        -.54546     -.9584
       _cons |   4.733664       .6766      7.    .            4.6     4.8667
IV Estimation: nearc2 and nearc4 as IVs.

. ivreg lwage (educ = nearc2 nearc4) exper expersq black smsa south

Instrumental variables (2SLS) regression

      Source |       SS       df       MS           Number of obs =        3
-------------+------------------------------        F(  6,    33) =       .3
       Model |  86.368644      6     4.3787         Prob > F      =        .
    Residual |   56.44747     33   .6863949         R-squared     =     .455
-------------+------------------------------        Adj R-squared =     .438
       Total |     59.646     39  .96956335         Root MSE      =     .465

       lwage |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |     .68487       .4869    3.38    .         .65499     .56983
       exper |         .9        .779    5.69    .        .776865     .67358
     expersq |       -.35        .357  -6.574    .          -.998      -.677
       black |      -.977       .5687   -.938    .53         -.545       .996
        smsa |     .65736       .3335   3.846    .         .57363       .769
       south |     -.9587        .347    -4.5    .          -.448     -.49956
       _cons |       3.73       .8956   3.994    .        .665743    4.878463

Instrumented:  educ
Instruments:   nearc2 nearc4 exper expersq ... south

. ivreg lwage (educ = nearc2 nearc4 fatheduc motheduc) exper expersq black smsa south

Instrumental variables (2SLS) regression

      Source |       SS       df       MS           Number of obs =
-------------+------------------------------        F(  6,     3) =    83.66
       Model |    8.49483      6    8.69939         Prob > F      =        .
    Residual |       3.58      3    .448678         R-squared     =      .57
-------------+------------------------------        Adj R-squared =      .57
       Total |  48.999484      9    .933397         Root MSE      =     .386

       lwage |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |        .73         .63    7.93    .         .75334      .4839
       exper |     .98944        .948    .435    .         .83496     .75385
     expersq |       -.449         .43    -6.3    .         -.3359       -.66
       black |     -.54635        .593   -5.87    .          -.765     -.99655
        smsa |      .5854      .95975   7.698    .            .46     .89854
       south |      -.746        .866  -5.936    .        -.46688       -.783
       _cons |      4.678         .68   9.657    .        3.83664    4.686956

Instrumented:  educ
Instruments:   nearc2 nearc4 fatheduc motheduc exper expersq ... south
IV Estimation: fatheduc and motheduc as IVs.

. ivreg lwage (educ = fatheduc motheduc) exper expersq black smsa south

Instrumental variables (2SLS) regression

      Source |       SS       df       MS           Number of obs =
-------------+------------------------------        F(  6,     3) =    83.44
       Model |    8.47754      6    8.79557         Prob > F      =        .
    Residual |      3.533      3    .448368         R-squared     =      .59
-------------+------------------------------        Adj R-squared =      .58
       Total |  48.999484      9    .933397         Root MSE      =    .3857

       lwage |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |      .9993        .756   7.834    .          .7496      .4946
       exper |     .98884        .953    .395    .           .899      .7538
     expersq |      -.4487          .4    -6.3    .         -.3356       -.669
       black |      -.559      .59598    -5.8    .         -.4983      -.9968
        smsa |       .597        .968   7.693    .          .4553     .893988
       south |     -.7797        .874  -5.936    .         -.4783       -.784
       _cons |      4.645       .8975   9.479    .       3.834865    4.693436

Instrumented:  educ
Instruments:   fatheduc motheduc exper expersq ... south

Which ones are better?

End of Example 2

Weak Instruments

Remember the two requirements for instrumental variables to be valid: (R1) uncorrelated with u but (R2) partially and sufficiently strongly correlated with the endogenous variable once the other independent variables are controlled for. The first requirement is difficult to confirm because we cannot observe u. So we rely on economic theory or natural experiments to find instrumental variables that satisfy the first requirement. The second requirement, however, can be checked by conducting some analyses.

One way of checking the second requirement is to estimate a regression model of the endogenous variable on the exogenous variables, which include the instrumental variables and the other exogenous variables. Suppose that we have one endogenous variable, y2, and m instrumental variables, z1, ..., zm. Then estimate the following model:

  y2i = β0 + δ1 zi1 + ... + δm zim + β1 xi1 + ... + βk xik + ui    (3)
Then obtain the F-statistic on the coefficients of the instrumental variables:

  H0: δ1 = ... = δm = 0.

If the F-statistic is small, then we call the instrumental variables weak. When the instrumental variables are weak, the IV or 2SLS estimators could be inconsistent or have large standard errors.

Inconsistency

To examine how weak instruments can make IV and 2SLS estimators inconsistent, let us consider a simple bivariate model:

  y1 = β0 + β1 y2 + u

In a bivariate model, we can write

  plim β̂1,IV = β1 + cov(z, u)/cov(z, y2)

Because corr(z, u) = cov(z, u)/[sd(z) sd(u)] (see Wooldridge, p. 74), this can be rewritten as

  plim β̂1,IV = β1 + [corr(z, u)/corr(z, y2)] · (σu/σy2)

Thus if z is only weakly correlated with the endogenous variable, y2, i.e., corr(z, y2) is very small, the IV estimator could be severely biased even when the correlation between z and u is very small.

Consider now the OLS estimator of the bivariate model, ignoring the endogeneity of y2:

  plim β̂1,OLS = β1 + cov(y2, u)/var(y2) = β1 + corr(y2, u) · (σu/σy2)

Here we have the endogeneity bias. But the size of the endogeneity bias could be smaller than the bias created by a weak instrument. To compare the two, let us take the ratio of the two asymptotic biases:

  (plim β̂1,IV - β1) / (plim β̂1,OLS - β1) = [corr(z, u)/corr(z, y2)] / corr(y2, u)

When this ratio is larger than 1, the bias in the IV estimator is larger than the bias in the OLS estimator. When the instrumental variable is completely uncorrelated with u, the bias in
the IV estimator is zero. When it has even a very small correlation with u, the size of the ratio depends on the size of the denominator, which could be very small when the correlation between the instrumental variable and the endogenous variable, corr(z, y2), is small. The implication from this simple model can also be applied to more complicated models with more than one endogenous variable and more than one instrumental variable.

Low Precision

Weak instrumental variables can also lead to large standard errors of the IV/2SLS estimators. The variance of the IV estimator is

  V(β̂_IV) = σ² (X'Z(Z'Z)^(-1)Z'X)^(-1)

This can be rearranged as

  V(β̂_IV) = [(X'X)^(-1)X'Z (Z'Z)^(-1)Z'X]^(-1) σ²(X'X)^(-1)
           = [Π̂_Z,X Π̂_X,Z]^(-1) V(β̂_OLS)

where Π̂_Z,X = (X'X)^(-1)X'Z and Π̂_X,Z = (Z'Z)^(-1)Z'X are the coefficients from projections of Z on X and of X on Z, respectively. If the correlations between Z and X are low, then Π̂_Z,X and Π̂_X,Z have small values, which makes the variance of the IV estimators large. The same argument applies to the 2SLS estimators. In empirical estimation, we often find large standard errors in the IV/2SLS estimators; this could be caused by weak instruments.

Thus, weak instrumental variables can cause inconsistency and imprecision in the IV/2SLS estimators. But how weak is weak?

A Rule of Thumb to Find Weak Instruments

Staiger and Stock (1997) suggest that the F-statistic on the instrumental variables in (3) of this lecture note should be larger than 10 to keep the maximum bias in IV estimators small. If you are willing to accept a larger maximum bias, the threshold is an F-statistic larger than 5. If the number of instrumental variables is one, the F-statistic can be replaced by the t-statistic.

In practice, it is quite difficult to find valid instrumental variables that are not weak. I personally find searching for instrumental variables time-consuming and not so rewarding. One can never be sure about the validity of IVs.
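The rule of thumb can be applied by computing the first-stage F-statistic directly from restricted and unrestricted sums of squared residuals. A sketch under homoskedasticity, with a hypothetical simulated DGP containing one strong and one weak candidate instrument:

```python
import numpy as np

def first_stage_F(y2, exog, inst):
    """F-statistic for H0: all m excluded instruments have zero
    coefficients in the first-stage regression of y2 on [exog, inst]."""
    ssr = lambda M: np.sum((y2 - M @ np.linalg.lstsq(M, y2, rcond=None)[0]) ** 2)
    full = np.column_stack([exog, inst])   # unrestricted regressor matrix
    m = inst.shape[1]                      # number of excluded instruments
    n, kf = full.shape
    return ((ssr(exog) - ssr(full)) / m) / (ssr(full) / (n - kf))

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
z_strong = rng.normal(size=n)
z_weak = rng.normal(size=n)
# y2 loads heavily on z_strong and barely on z_weak (hypothetical DGP)
y2 = 0.8 * z_strong + 0.02 * z_weak + 0.5 * x1 + rng.normal(size=n)

exog = np.column_stack([np.ones(n), x1])
F_strong = first_stage_F(y2, exog, z_strong[:, None])
F_weak = first_stage_F(y2, exog, z_weak[:, None])
print(F_strong, F_weak)   # compare each to the rule-of-thumb threshold of 10
```

With one instrument, the F-statistic is just the square of the first-stage t-statistic, which is why the rule can equivalently be stated in terms of t.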
You should look for natural experiments or randomized experiments that could be used as instrumental variables.
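The inconsistency result above, that a weak instrument can amplify even a tiny correlation between z and u, can be illustrated by simulation. All numbers below are hypothetical; the probability limits in the comments follow from the bias formulas in the Inconsistency subsection:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
c = rng.normal(size=n)                     # unobserved confounder
z = rng.normal(size=n)
y2 = 0.1 * z + c + rng.normal(size=n)      # corr(z, y2) is small: weak IV
u = 0.5 * c + 0.05 * z + rng.normal(size=n)  # corr(z, u) tiny but nonzero
y1 = 1.0 * y2 + u                          # true beta1 = 1

cov = lambda a, b: np.cov(a, b)[0, 1]
beta_iv = cov(z, y1) / cov(z, y2)    # plim = 1 + cov(z,u)/cov(z,y2) = 1.5
beta_ols = cov(y2, y1) / np.var(y2)  # plim = 1 + cov(y2,u)/var(y2) ~ 1.25
print(beta_iv, beta_ols)
```

Here corr(z, u) is only about 0.045, yet the IV bias (about 0.5) is roughly twice the OLS endogeneity bias (about 0.25), because the denominator cov(z, y2) is also small. A weak, slightly invalid instrument can be worse than no instrument at all.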