Problem Set # 1 Master in Business and Quantitative Methods
Contents
0.1 Problems on endogeneity of the regressors
0.2 Lab exercises on endogeneity of the regressors
0.1 Problems on endogeneity of the regressors

Problem 1
For the simple regression model $y_i = \mu + u_i$, $u_i \sim N(0, \sigma^2)$, prove that the sample mean is consistent and asymptotically normally distributed. Now consider the alternative estimator
$$\hat{\mu} = \sum_i w_i y_i, \qquad w_i = \frac{i}{n(n+1)/2} = \frac{i}{\sum_i i}.$$
Note that $\sum_i w_i = 1$ and $\sum_i i^2 = n(n+1)(2n+1)/6$. Prove that this is a consistent estimator of $\mu$ and obtain its asymptotic variance.

Problem 2
The following equation explains weekly hours of television viewing by a child in terms of the child's age, mother's education, father's education, and number of siblings:
$$\mathit{tvhours}^*_i = \beta_0 + \beta_1 \mathit{age}_i + \beta_2 \mathit{age}_i^2 + \beta_3 \mathit{motheduc}_i + \beta_4 \mathit{fatheduc}_i + \beta_5 \mathit{sibs}_i + u_i$$
We are worried that $\mathit{tvhours}^*$ is measured with error in the survey. Let $\mathit{tvhours}$ denote the reported hours of television viewing per week.
(1) What do the classical errors-in-variables (CEV) assumptions require in this application?
(2) Do you think the CEV assumptions are likely to hold? Explain.

Problem 3
In the context of instrumental variables estimation we have shown that the least squares estimator $\hat{\beta}$ is biased and inconsistent. Nonetheless, $\hat{\beta}$ does estimate something. Derive the asymptotic covariance matrix of $\hat{\beta}$, and show that $\hat{\beta}$ is asymptotically normally distributed.

Problem 4
Prove that the limiting distribution of $\hat{\beta}_{IV}$ is
$$\sqrt{n}\,(\hat{\beta}_{IV} - \beta) \xrightarrow{d} N\left(0,\; \sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}\right),$$
and present the asymptotic distribution of $\hat{\beta}_{IV}$.
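Problem 4 does not define its symbols. The block below is a sketch of the notation under the usual textbook convention, which is an assumption and not something stated in this problem set: $X$ is the $n \times K$ regressor matrix, $Z$ the $n \times K$ instrument matrix, and $u$ the error vector with variance $\sigma^2$. The second display is only a possible starting point for the proof.

```latex
\hat{\beta}_{IV} = (Z'X)^{-1} Z'y, \qquad
Q_{ZZ} = \operatorname{plim} \tfrac{1}{n} Z'Z, \quad
Q_{ZX} = \operatorname{plim} \tfrac{1}{n} Z'X, \quad
Q_{XZ} = \operatorname{plim} \tfrac{1}{n} X'Z = Q_{ZX}'.

% Substituting y = X\beta + u into the estimator gives
\sqrt{n}\,(\hat{\beta}_{IV} - \beta)
  = \left( \tfrac{1}{n} Z'X \right)^{-1} \tfrac{1}{\sqrt{n}} Z'u .
```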
Problem 5
Suppose that we want to estimate the marginal effect of family income on the average grade obtained at university (colgpa). Besides family income, the average grade in high school (hsGPA) and the grade obtained in the university entrance exam (SAT) also explain colgpa. Consider the model for colgpa:
$$\mathit{colgpa}_t = \beta_0 + \beta_1 \mathit{faminc}^*_t + \beta_2 \mathit{hsGPA}_t + \beta_3 \mathit{SAT}_t + u_t,$$
where $\mathit{faminc}^*$ is the true value of family income. However, the data on family income provided by the students may contain measurement error, i.e., $\mathit{faminc} = \mathit{faminc}^* + e$, where $e \sim N(0, \sigma_e^2)$.
1. What are the consequences of this measurement error for the properties of the OLS estimators? Justify your answer with a formal proof.
2. Propose an alternative method to estimate the previous model and explain in detail its assumptions and properties.

Problem 6
Consider a model for the health of an individual:
$$\mathit{health}_i = \beta_0 + \beta_1 \mathit{age}_i + \beta_2 \mathit{weight}_i + \beta_3 \mathit{height}_i + \beta_4 \mathit{male}_i + \beta_5 \mathit{work}_i + \beta_6 \mathit{exercise}_i + u_i,$$
where health is some quantitative measure of the person's health; age, weight, height, and male are self-explanatory; work is weekly hours worked; and exercise is the hours of exercise per week.
a. Why might you be concerned about exercise being correlated with the error term $u$?
b. Suppose you can collect data on two additional variables, disthome and distwork, the distances from home and from work to the nearest health club or gym. Discuss whether these are likely to be uncorrelated with $u$.

Problem 7
Consider the model
$$y^* = \beta x^* + \varepsilon.$$
Prove that when only $x^*$ is measured with error, the squared correlation between $y$ and $x$ is less than that between $y^*$ and $x^*$. (Note the assumption that $y^* = y$.) Does the same hold true if $y^*$ is also measured with error?
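The claim in Problem 7 can be checked numerically before proving it. The following is a minimal simulation sketch, not a proof; the parameter values, the seed, and the helper names `squared_corr` and `simulate` are all hypothetical choices made for illustration, and only the Python standard library is used.

```python
import random


def squared_corr(a, b):
    """Squared sample correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab * sab / (saa * sbb)


def simulate(n=20000, beta=1.0, sd_x=1.0, sd_eps=1.0, sd_meas=1.0, seed=0):
    """Return r^2 of y with the true regressor and with the mismeasured one."""
    rng = random.Random(seed)
    # Correctly measured regressor and outcome (y is observed without error).
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [beta * xt + rng.gauss(0.0, sd_eps) for xt in x_true]
    # Observed regressor = true regressor + classical measurement error.
    x_obs = [xt + rng.gauss(0.0, sd_meas) for xt in x_true]
    return squared_corr(y, x_true), squared_corr(y, x_obs)


r2_true, r2_obs = simulate()
print(f"r^2 with true x: {r2_true:.3f}   r^2 with observed x: {r2_obs:.3f}")
```

With these invented parameters the population values are 0.5 for the true regressor and 0.25 for the mismeasured one, so the simulated figures should land close to those.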
0.2 Lab exercises on endogeneity of the regressors

Problem 8
Consider a consumption function of the form
$$c_t = \alpha + \beta y_t + u_t.$$
It is estimated using 204 observations on aggregate U.S. consumption and disposable personal income. The data set is called consumption.txt. Suppose that there is a possibility of bias due to correlation between $y_t$ and $u_t$. Consider instrumental variables estimation using $y_{t-1}$ as the instrument for $y_t$; the constant term is, of course, its own instrument. One observation is lost because of the lagged values, so the results are based on 203 quarterly observations. Estimate the model using the instrumental variables method and compare the results with the OLS estimates. Test for possible endogeneity using the Wu version of the Hausman test.

Problem 9
The previous exercise specifies a very simple consumption model, and one might wonder whether this meager specification helps explain the finding in the Hausman test. Use the data to carry out the test in a more elaborate specification,
$$c_t = \beta_1 + \beta_2 y_t + \beta_3 i_t + \beta_4 c_{t-1} + \varepsilon_t,$$
where $c_t$ is the log of real consumption, $y_t$ is the log of real disposable income, and $i_t$ is the interest rate (90-day T-bill rate). (Hint: use $y_{t-1}$ as the instrument for $y_t$.) Compare these results to those obtained previously.

Problem 10
Use the data in WAGE2.xls for this exercise:
(i) We want to explain log(wage) based on educ, but educ is an endogenous variable. Use IV to estimate this model, using sibs as an instrumental variable. In addition, run the regression of educ on sibs to test the correlation between educ and sibs.
(ii) The variable brthord is birth order (brthord is one for a first-born child, two for a second-born child, and so on). Explain why educ and brthord might be negatively correlated. Regress educ on brthord to determine whether there is a statistically significant negative correlation.
(iii) Now, suppose that we include number of siblings as an explanatory variable in the wage equation; this controls for family background, to some extent:
$$\log(\mathit{wage})_i = \beta_0 + \beta_1 \mathit{educ}_i + \beta_2 \mathit{sibs}_i + u_i.$$
Suppose that we want to use brthord as an IV for educ, assuming that sibs is exogenous. The reduced form for educ is:
$$\mathit{educ}_i = \pi_0 + \pi_1 \mathit{sibs}_i + \pi_2 \mathit{brthord}_i + v_i.$$
State and test the identification assumption.
(iv) Estimate the equation from part (iii) using brthord as an IV for educ (and sibs as its own IV). Comment on the standard errors of the estimates of $\beta_1$ and $\beta_2$.
(v) Using the fitted values from part (iii), $\widehat{\mathit{educ}}$, compute the correlation between $\widehat{\mathit{educ}}$ and sibs. Use this result to explain your findings from part (iv).

Problem 11
The data in FERTIL2.xls include, for women in Botswana during 1988, information on number of children, years of education, age, and religious and economic status variables.
(i) Estimate this model by OLS,
$$\mathit{children}_i = \beta_0 + \beta_1 \mathit{educ}_i + \beta_2 \mathit{age}_i + \beta_3 \mathit{age}_i^2 + u_i,$$
and interpret the estimates. In particular, holding age fixed, what is the estimated effect of another year of education on fertility? If 100 women receive another year of education, how many fewer children are they expected to have?
(ii) frsthalf is a dummy variable equal to one if the woman was born during the first six months of the year. Assuming that frsthalf is uncorrelated with the error term from part (i), show that frsthalf is a reasonable IV candidate for educ. (Hint: You need to do a regression.)
(iii) Estimate the model from part (i) by using frsthalf as an IV for educ. Compare the estimated effect of education with the OLS estimate from part (i).
(iv) Add the binary variables electric, tv, and bicycle to the model and assume these are exogenous. Estimate the equation by OLS and 2SLS and compare the estimated coefficients on educ. Interpret the coefficient on tv and explain why television ownership has a negative effect on fertility.
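The relevance check in Problem 11 (ii) and the IV estimate in part (iii) can be sketched on simulated data. This is a minimal illustration under strong simplifying assumptions: there is one endogenous regressor `x`, one binary instrument `z`, and no other controls, and all numbers are invented. It does not use FERTIL2.xls and says nothing about the actual results; the helper names are hypothetical, and only the Python standard library is used.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope_and_t(y, x):
    """Slope and conventional t statistic from regressing y on a constant and x."""
    n = len(y)
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    ssr = sum((yi - intercept - slope * xi) ** 2 for yi, xi in zip(y, x))
    se = (ssr / (n - 2) / (cov(x, x) * (n - 1))) ** 0.5
    return slope, slope / se


def iv_slope(y, x, z):
    """Just-identified IV slope with one instrument z for one regressor x."""
    return cov(z, y) / cov(z, x)


# Simulated data; every number below is invented for illustration.
rng = random.Random(0)
n = 50_000
TRUE_BETA = -0.2
z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]  # binary instrument
a = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # unobserved confounder
x = [10.0 - 1.0 * zi + ai + rng.gauss(0.0, 1.0) for zi, ai in zip(z, a)]
y = [5.0 + TRUE_BETA * xi - 0.5 * ai + rng.gauss(0.0, 1.0) for xi, ai in zip(x, a)]

# Relevance: regress the endogenous regressor on the instrument.
first_stage_slope, first_stage_t = ols_slope_and_t(x, z)
# OLS is inconsistent here because x and the error share the confounder a.
b_ols, _ = ols_slope_and_t(y, x)
b_iv = iv_slope(y, x, z)

print(f"first-stage slope {first_stage_slope:.3f} (t = {first_stage_t:.1f})")
print(f"OLS slope {b_ols:.3f}, IV slope {b_iv:.3f}, true value {TRUE_BETA}")
```

In the actual exercise the first stage and the structural equation also include age and its square, so a matrix-based routine or an econometrics package would be the natural tool there.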
Problem 12
The aim of this exercise is to compare the estimates obtained with 2SLS with others obtained by incorrect methods. The data set is WAGE2.xls.
(i) Use 2SLS to estimate
$$\log(\mathit{wage})_i = \beta_0 + \beta_1 \mathit{educ}_i + \beta_2 \mathit{exper}_i + \beta_3 \mathit{tenure}_i + \beta_4 \mathit{black}_i + u_i,$$
using sibs as an instrumental variable for educ.
(ii) Now estimate the regression of educ on sibs, exper, tenure and black. Then plug the fitted values $\widehat{\mathit{educ}}$ into the equation of part (i) and estimate it by OLS. Compare these $\hat{\beta}_j$ and their standard errors with the ones obtained in part (i). Comment on these standard errors.
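The comparison asked for in Problem 12 can be previewed on simulated data. The sketch below is a simplified illustration, assuming a single endogenous regressor `x`, a single instrument `z`, and no exogenous controls; the data, parameter values, and helper names are all invented, and it does not use WAGE2.xls. It runs the manual two-step procedure and then computes the slope standard error twice: once from the second-stage residuals (what the plug-in OLS regression reports) and once from residuals based on the actual regressor (the 2SLS formula).

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def slope_se(y, x_resid, x_design, a, b):
    """Slope SE using residuals y - a - b*x_resid and the variation in x_design."""
    n = len(y)
    ssr = sum((yi - a - b * xi) ** 2 for yi, xi in zip(y, x_resid))
    md = mean(x_design)
    sdd = sum((xi - md) ** 2 for xi in x_design)
    return (ssr / (n - 2) / sdd) ** 0.5


# Simulated data; every number below is invented for illustration.
rng = random.Random(1)
n = 20_000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]        # instrument
conf = [rng.gauss(0.0, 1.0) for _ in range(n)]     # unobserved confounder
x = [zi + ci + rng.gauss(0.0, 1.0) for zi, ci in zip(z, conf)]
y = [1.0 * xi + ci + rng.gauss(0.0, 1.0) for xi, ci in zip(x, conf)]

# Step 1: first stage, fitted values of the endogenous regressor.
p0, p1 = ols(x, z)
x_hat = [p0 + p1 * zi for zi in z]

# Step 2: plug-in OLS of y on the fitted values.
a_plug, b_plug = ols(y, x_hat)

# Direct IV slope, for comparison with the plug-in coefficient.
mz, mx, my = mean(z), mean(x), mean(y)
b_iv = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
        / sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)))

se_plug = slope_se(y, x_hat, x_hat, a_plug, b_plug)   # reported by plug-in OLS
se_2sls = slope_se(y, x, x_hat, a_plug, b_plug)       # 2SLS residuals use actual x

print(f"plug-in slope {b_plug:.4f}, IV slope {b_iv:.4f}")
print(f"plug-in SE {se_plug:.4f}, 2SLS SE {se_2sls:.4f}")
```

In this design the two slopes agree up to floating-point error while the standard errors do not, because the second-stage residuals are computed with the fitted regressor and not the observed one. Whether the plug-in standard error comes out larger or smaller depends on the parameter values chosen.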