Exercise Sheet 4
Instrumental Variables and Two Stage Least Squares Estimation
ECONOMETRICS I. UC3M

1. [W 15.1] Consider a simple model to estimate the effect of personal computer ($PC$) ownership on the college grade point average ($GPA$) for the graduating seniors at a large public university:

$GPA = \beta_0 + \beta_1 PC + u,$

where $PC$ is a binary variable indicating computer ownership.

a) Why might $PC$ ownership be correlated with $u$?
b) Explain why $PC$ is likely to be related to parents' annual income. Does this mean parental income is a good IV for $PC$? Why or why not?
c) Suppose that, four years ago, the university gave grants to buy computers to roughly one-half of the incoming students, and the students who received grants were randomly chosen. Carefully explain how you would use this information to construct an instrumental variable for $PC$.

2. [W 15.2] Suppose that you wish to estimate the effect of class attendance on student performance. A basic model is:

$stndfnl = \beta_0 + \beta_1 atndrte + \beta_2 priGPA + \beta_3 ACT + u,$

where the variables are defined as usual.

a) Let $dist$ be the distance from the student's living quarters to the lecture hall. Do you think $dist$ is uncorrelated with $u$?
b) Assuming that $dist$ and $u$ are uncorrelated, what other assumption must $dist$ satisfy in order to be a valid IV for $atndrte$?
c) Suppose we add the interaction $priGPA \cdot atndrte$:

$stndfnl = \beta_0 + \beta_1 atndrte + \beta_2 priGPA + \beta_3 ACT + \beta_4\, priGPA \cdot atndrte + u.$

If $atndrte$ is correlated with $u$, then, in general, so is $priGPA \cdot atndrte$. What might be a good IV for $priGPA \cdot atndrte$? [Hint: if $E[u \mid priGPA, ACT, dist] = 0$, as happens when $priGPA$, $ACT$ and $dist$ are all exogenous, then any function of $priGPA$ and $dist$ is uncorrelated with $u$.]

3.
[W 15.5] Assume that $\sigma_x = \sigma_u$ in a regression model

$y = \beta_0 + \beta_1 x + u,$

so that the population variation of the error term is the same as it is in $x$. Suppose that the instrumental variable, $z$, is slightly correlated with $u$: $\mathrm{Cov}(z, u) = 0.1$. Suppose also that $z$ and $x$ have a somewhat stronger correlation: $\mathrm{Cov}(x, z) = 0.2$.
a) What is the asymptotic bias in the IV estimator?
b) How much correlation would have to exist between $x$ and $u$ before OLS has more asymptotic bias than 2SLS?

4. [W 15.7] The following is a simple model to measure the effect of a school choice program on standardised test performance:

$score = \beta_0 + \beta_1 choice + \beta_2 faminc + u_1,$

where $score$ is the score on a statewide test, $choice$ is a binary variable indicating whether a student attended a choice school in the past year, and $faminc$ is family income. The IV for $choice$ is $grant$, the dollar amount granted to students to use for tuition at choice schools. The grant amount differed by family income level, which is why we control for $faminc$ in the equation.

a) Even with $faminc$ in the equation, why might $choice$ be correlated with $u_1$?
b) If, within each income class, the grant amounts were assigned randomly, is $grant$ uncorrelated with $u_1$?
c) Write the reduced form equation for $choice$. What is needed for $grant$ to be partially correlated with $choice$?
d) Write the reduced form equation for $score$. Explain why this is useful. [Hint: How do you interpret the coefficient on $grant$?]

5. [W 15.8] Suppose you want to test whether girls who attend a girls' high school do better in mathematics than girls who attend coeducational schools. You have a random sample of senior high school girls from a state in the USA, and $score$ is the score on a standardised mathematics test. Let $girlhs$ be a dummy variable indicating whether a student attends a girls' high school.

a) What other factors would you control for in the equation?
b) Write an equation relating $score$ to $girlhs$ and the other factors you listed in part (a).
c) Suppose that parental support and motivation are unmeasured factors in the error term in part (b). Are these likely to be correlated with $girlhs$? Explain your answer.
d) Discuss the assumptions needed for the number of girls' high schools within a twenty-mile radius of a girl's home to be a valid IV for $girlhs$.

6.
[W 15.13] The data in FERTIL2.RAW include, for women in Botswana during 1988, information on number of children, years of education, age and economic status variables.

a) Estimate this model by OLS,

$children = \beta_0 + \beta_1 educ + \beta_2 age + \beta_3 age^2 + u,$

and interpret the estimates. In particular, holding $age$ fixed, what is the estimated effect of another year of education on fertility? If 100 women receive another year of education, how many fewer children are they expected to have?
b) $frsthalf$ is a dummy variable equal to 1 if the woman was born during the first six months of the year. Assuming that $frsthalf$ is uncorrelated with the error term from part (a), show that $frsthalf$ is a reasonable IV for $educ$.
c) Estimate the model from part (a) by using $frsthalf$ as an IV for $educ$. Compare the estimated effect of $educ$ with the OLS estimate from part (a) and test the potential endogeneity of $educ$.
d) Add the binary variables $electric$, $tv$ and $bicycle$ to the model and assume these are exogenous. Estimate the equation by OLS and 2SLS and compare the estimated coefficients on $educ$. Interpret the coefficient on $tv$ and explain why television ownership has a negative effect on fertility.

7. [W 15.14] Use the data in CARD.RAW for this exercise.

a) Consider the equation

$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \dots + u,$

where the other explanatory variables include $exper^2$, $black$, $smsa$ and $south$. In order for IV to be consistent, the IV for $educ$, $nearc4$ (a dummy variable indicating whether someone grew up close to a four-year college), must be uncorrelated with $u$. Could $nearc4$ be correlated with a component of the error term, such as unobserved ability? Explain your answer.
b) For a subsample of the men in the data set, an $IQ$ score is available. Regress $IQ$ on $nearc4$ to check whether average IQ scores vary by whether a man grew up near a four-year college. What do you conclude?
c) Now regress $IQ$ on $nearc4$, $smsa66$ and the regional dummy variables $reg662, \dots, reg669$. Are $IQ$ and $nearc4$ related after the geographic dummy variables have been partialled out? Reconcile this with your findings from part (b).
d) From parts (b) and (c), what do you conclude about the importance of controlling for $smsa66$ and the 1966 regional dummies in the $\log(wage)$ equation?

8.
[Wooldridge, 16.1] Consider a two-equation system in supply and demand form, that is, with the same variable $y_1$ (typically quantity) appearing on the left-hand side:

$y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
$y_1 = \alpha_2 y_2 + \beta_2 z_2 + u_2.$

a) If $\alpha_1 = 0$ or $\alpha_2 = 0$, explain why a reduced form exists for $y_1$. (Remember, a reduced form expresses $y_1$ as a linear function of the exogenous variables and the structural errors.) If $\alpha_1 \neq 0$ and $\alpha_2 = 0$, find the reduced form for $y_2$.
b) If $\alpha_1 \neq 0$, $\alpha_2 \neq 0$ and $\alpha_1 \neq \alpha_2$, find the reduced form for $y_1$. Does $y_2$ have a reduced form in this case?
c) Is the condition $\alpha_1 \neq \alpha_2$ likely to be met in supply and demand examples? Explain.

9. [Wooldridge, 16.2] Let $corn$ denote per capita consumption of corn in bushels, at the county level, $price$ the price per bushel of corn, $income$ the per capita county income and $rainfall$
inches of rainfall during the last corn-growing season. The following simultaneous equations model imposes the equilibrium condition that supply equals demand:

$corn = \alpha_1 price + \beta_1 income + u_1$
$corn = \alpha_2 price + \beta_2 rainfall + u_2.$

Which is the supply equation and which is the demand equation? Explain.

10. [Wooldridge, 16.4] Suppose that annual earnings and alcohol consumption are determined by the SEM:

$\log(earnings) = \beta_0 + \beta_1 alcohol + \beta_2 educ + u_1$
$alcohol = \gamma_0 + \gamma_1 \log(earnings) + \gamma_2 educ + \gamma_3 \log(price) + u_2,$

where $price$ is a local price index for alcohol, which includes state and local taxes. Assuming that $educ$ and $price$ are exogenous, and that $\beta_1$, $\beta_2$, $\gamma_1$, $\gamma_2$ and $\gamma_3$ are all different from zero, which equation is identified? How would you estimate that equation?

11. [Wooldridge, 16.9] Use the data file SMOKE.RAW for this exercise.

a) A model to estimate the effects of smoking on annual income (possibly through lost work days due to illness, or productivity effects) is

$\log(income) = \beta_0 + \beta_1 cigs + \beta_2 educ + \beta_3 age + \beta_4 age^2 + u_1,$

where $cigs$ is the number of cigarettes smoked per day, on average. How do you interpret $\beta_1$?
b) To reflect the fact that cigarette consumption might be jointly determined with income, a demand for cigarettes equation is

$cigs = \gamma_0 + \gamma_1 \log(income) + \gamma_2 educ + \gamma_3 age + \gamma_4 age^2 + \gamma_5 \log(cigpric) + \gamma_6 restaurn + u_2,$

where $cigpric$ is the price of a pack of cigarettes (in cents), and $restaurn$ is a binary variable equal to unity if the person lives in a state with restaurant smoking restrictions. Assuming these are exogenous to the individual, what signs would you expect for $\gamma_5$ and $\gamma_6$?
c) Under what assumption is the income equation from part (a) identified?
d) Estimate the income equation by OLS and discuss the estimate of $\beta_1$.
e) Estimate the reduced form for $cigs$, regressing $cigs$ on all exogenous variables. Are $\log(cigpric)$ and $restaurn$ significant in the reduced form?
f) Estimate the income equation by 2SLS. Compare the resulting estimate of $\beta_1$ with the OLS estimate.
g) Do you think that cigarette prices and restaurant smoking restrictions are exogenous in the income equation?

12. [Wooldridge, 16.10] Use the data file MROZ.RAW and consider the example of the labour supply for active female labour market participants.
a) Estimate the labour supply equation by 2SLS using $exper$ and $exper^2$ as instruments for $\log(wage)$,

$\log(hours) = \alpha_1 \log(wage) + \beta_{10} + \beta_{11} educ + \beta_{12} age + \beta_{13} kidslt6 + \beta_{14} nwifeinc + u_1,$

and compare the results to those obtained with $hours$ as the dependent variable.
b) In the labour supply equation from part (a), allow $educ$ to be endogenous because of omitted ability. Use $motheduc$ and $fatheduc$ as IVs for $educ$. Note that you now have two endogenous variables in the equation.
c) Test the overidentifying restrictions in the 2SLS estimation from part (b). Do the IVs pass the test?
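
Note on the computational exercises. The sheet does not prescribe any software, so the following is only a minimal sketch, in Python with the standard library alone, of the idea behind the estimation exercises. It uses simulated data, not any of the .RAW files, and every name and number in it (simulate, true_beta, the coefficients 0.5 and 0.8, the sample size) is an assumption made for illustration. With one endogenous regressor and one instrument, the IV estimator Cov(z, y) / Cov(z, x) coincides with 2SLS, and the sketch compares it with the OLS slope when x is correlated with the error.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=20000, true_beta=1.0, seed=0):
    """Simulate y = 2 + true_beta * x + u with x endogenous and z a valid IV.

    Hypothetical design: z is independent of u, x depends on z and on v,
    and u shares the component v with x, so Cov(x, u) != 0.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)             # instrument, independent of the error
        vi = rng.gauss(0, 1)             # common shock that makes x endogenous
        ui = 0.8 * vi + rng.gauss(0, 1)  # structural error, correlated with x
        xi = 0.5 * zi + vi               # first stage: x depends on z
        z.append(zi)
        x.append(xi)
        y.append(2.0 + true_beta * xi + ui)
    return z, x, y


z, x, y = simulate()

# OLS slope: Cov(x, y) / Var(x); inconsistent here because Cov(x, u) != 0.
beta_ols = cov(x, y) / cov(x, x)

# Simple IV slope: Cov(z, y) / Cov(z, x); consistent if Cov(z, u) = 0
# and Cov(z, x) != 0.
beta_iv = cov(z, y) / cov(z, x)

print("OLS slope:", round(beta_ols, 3))
print("IV slope: ", round(beta_iv, 3))
```

In this design the OLS slope should settle well above the assumed true value of 1, while the IV slope should be close to it. Exercises with several regressors or several instruments need a full 2SLS routine from an econometrics package rather than this one-regressor formula.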