Econometrics Problem Set 6
WISE, Xiamen University
Spring 2016-17

Conceptual Questions

1. This question refers to the estimated regressions shown in Table 1, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

(a) (SW 7.1) Add * (5%) and ** (1%) to Table 1 to indicate the statistical significance of the coefficients.

Solution: All coefficients (including the intercepts) should have **, except for the coefficient on X_6 (South) in column (3), which should have no asterisk.

(b) (SW 7.2) Using the regression results in column (1):

i. Is the college-high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval of the difference.

Solution: The t-statistic is 5.46/0.21 = 26.0 > 1.96, so the coefficient is statistically significant at the 5% level. The 95% confidence interval for the college-high school earnings difference is 5.46 ± 1.96 × 0.21 = [5.05, 5.87].
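A quick numerical check of these two figures (not part of the original answer key; the same arithmetic recurs throughout the problem set):

# t-statistic and 95% confidence interval for the College coefficient in column (1)
5.46 / 0.21                      # about 26.0
5.46 + c(-1, 1) * 1.96 * 0.21    # about [5.05, 5.87]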

ii. Is the male-female earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

Solution: The t-statistic is -2.64/0.20 = -13.2, and |-13.2| > 1.96, so the coefficient is statistically significant at the 5% level. The 95% confidence interval for the male-female earnings difference is -2.64 ± 1.96 × 0.20 = [-3.03, -2.25].

(c) (SW 7.3) Using the regression results in column (2):

i. Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence interval to explain your answer.

Solution: From column (2), age is statistically significant at the 5% level. Using a t-test, the t-statistic is 0.29/0.04 = 7.25, with a p-value of 4.2 × 10^(-13), implying that the coefficient on age is statistically significant at the 1% level.

ii. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Construct a 95% confidence interval for the expected difference between their earnings.

Solution: The expected difference is ΔAge × β_Age with ΔAge = 34 - 29 = 5 years, so the 95% confidence interval is 5 × [0.29 ± 1.96 × 0.04] = [$1.06, $1.84].

(d) (SW 7.4) Using the regression results in column (3):

i. Do there appear to be important regional differences? Use an appropriate hypothesis test to explain your answer.

Solution: The F-statistic testing whether the coefficients on the regional regressors are all zero is 6.10. The 1% critical value (from the F_(3,∞) distribution) is 3.78. Because 6.10 > 3.78, the regional effects are significant at the 1% level.

ii. Juanita is a 28-year-old female college graduate from the South. Molly is a 28-year-old female college graduate from the West. Jennifer is a 28-year-old female college graduate from the Midwest.

α) Construct a 95% confidence interval for the difference in expected earnings between Juanita and Molly.

Solution: Because West is the omitted region, the difference in expected earnings is (X_6,Juanita - X_6,Molly) β_6 = β_6. The 95% confidence interval is β̂_6 ± z_0.025 × SE(β̂_6) = -0.27 ± 1.96 × 0.26 = [-0.78, 0.24].

β) Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juanita and Jennifer.

Solution: The expected difference between Juanita and Jennifer is (X_5,Juanita - X_5,Jennifer) β_5 + (X_6,Juanita - X_6,Jennifer) β_6 = -β_5 + β_6. A 95% confidence interval could be constructed using the general methods discussed in Section 7.3. In this case, an easy way to do this is to omit Midwest from the regression and replace it with X_5 = West. In this new regression the coefficient on South measures the difference in wages between the South and the Midwest, and a 95% confidence interval can be computed directly.
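The same interval can also be computed without re-estimating the model, directly from the coefficient estimates and their covariance matrix. Below is a minimal simulated sketch of that calculation (not part of the original answer key; the data, variable names, and parameter values are invented for illustration):

set.seed(1)
n      <- 2000
region <- factor(sample(c("Northeast", "Midwest", "South", "West"), n, replace = TRUE))
age    <- runif(n, 25, 34)
ahe    <- 10 + 0.3 * age + 0.7 * (region == "Northeast") + 0.6 * (region == "Midwest") -
          0.3 * (region == "South") + rnorm(n, sd = 6)
# West is the omitted category, as in Table 1
m  <- lm(ahe ~ age + relevel(region, ref = "West"))
b  <- coef(m); V <- vcov(m)
iS <- grep("South", names(b)); iM <- grep("Midwest", names(b))
d  <- b[iS] - b[iM]                                  # estimate of beta_South - beta_Midwest
se <- sqrt(V[iS, iS] + V[iM, iM] - 2 * V[iS, iM])    # SE of the difference
d + c(-1, 1) * 1.96 * se                             # 95% confidence interval

With the AER package loaded (as in the empirical questions below), vcov(m) could be replaced by vcovHC(m, type = "HC1") to obtain heteroskedasticity-robust standard errors.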

Dependent variable: average hourly earnings (AHE).

Regressor            (1)       (2)       (3)
College (X_1)        5.46      5.48      5.44
                    (0.21)    (0.21)    (0.21)
Female (X_2)        -2.64     -2.62     -2.62
                    (0.20)    (0.20)    (0.20)
Age (X_3)                      0.29      0.29
                              (0.04)    (0.04)
Northeast (X_4)                          0.69
                                        (0.30)
Midwest (X_5)                            0.60
                                        (0.28)
South (X_6)                             -0.27
                                        (0.26)
Intercept           12.69      4.40      3.75
                    (0.14)    (1.05)    (1.06)

Summary Statistics
F-statistic for regional effects = 0               6.10
SER                  6.27      6.22      6.21
R^2                  0.176     0.190     0.194
n                    4000      4000      4000

Table 1: Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

(e) (SW 7.5) The regression shown in column (2) was estimated again, this time using data from 1992 (4000 observations selected at random from the March 1993 CPS, converted into 1998 dollars using the consumer price index). The results are

ÂHE = 0.77 + 5.29 College - 2.59 Female + 0.40 Age,    SER = 5.85, R^2 = 0.21.
      (0.98)  (0.20)        (0.18)         (0.03)

Comparing this regression to the regression for 1998 shown in column (2), was there a statistically significant change in the coefficient on College?

Solution: The t-statistic for the difference in the College coefficients is

t = (β̂_College,1998 - β̂_College,1992) / SE(β̂_College,1998 - β̂_College,1992).

Because β̂_College,1998 and β̂_College,1992 are computed from independent samples, they are independent, which means that cov(β̂_College,1998, β̂_College,1992) = 0. Thus,

var(β̂_College,1998 - β̂_College,1992) = var(β̂_College,1998) + var(β̂_College,1992),

which implies that SE(β̂_College,1998 - β̂_College,1992) = (0.21^2 + 0.20^2)^(1/2). Thus,

t_act = (5.48 - 5.29) / (0.21^2 + 0.20^2)^(1/2) = 0.6552.

There is no significant change, since the calculated t-statistic is less than 1.96, the 5% critical value.

2. (SW 7.7) Data were collected from a random sample of 220 home sales from a community in 2003. Let P denote the selling price (in $1000s), BDR the number of bedrooms, Bath the number of bathrooms, Hsize the size of the house (in square feet), Lsize the lot size (in square feet), Age the age of the house (in years), and Pr a binary variable that is equal to 1 if the condition of the house is reported as "poor". An estimated regression yields

P̂ = 119.2 + 0.485 BDR + 23.4 Bath + 0.156 Hsize + 0.002 Lsize + 0.090 Age - 48.8 Pr,
     (23.9)  (2.61)      (8.94)      (0.011)       (0.00048)     (0.311)     (10.5)
SER = 41.5, R^2 = 0.72.

(a) Is the coefficient on BDR statistically significantly different from zero?

Solution: For BDR, the t-statistic is 0.485/2.61 = 0.1858 < 1.96. Therefore, the coefficient on BDR is not statistically significantly different from zero.

(b) Typically five-bedroom houses sell for much more than two-bedroom houses. Is this consistent with your answer to (a) and with the regression more generally?

Solution: The coefficient on BDR measures only the partial effect of the number of bedrooms, holding Hsize (house size) constant. Yet a typical five-bedroom house is much larger than a typical two-bedroom house. Therefore the result in (a) says little about the conventional wisdom.

(c) A homeowner purchases 2000 square feet from an adjacent lot. Construct a 99% confidence interval for the change in the value of her house.

Solution: The 99% confidence interval for the change in price is 2000 × [0.002 ± z_0.005 × 0.00048] = [1.53, 6.47] (in $1000s), where z_0.005 ≈ 2.5758.
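A one-line numerical check of this interval (not part of the original answer key):

# 99% CI for the price change from 2000 extra square feet of lot, in $1000s
2000 * (0.002 + c(-1, 1) * qnorm(0.995) * 0.00048)   # about [1.53, 6.47]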

(d) Lot size is measured in square feet. Do you think that another scale might be more appropriate? Why or why not?

Solution: The scale of the variables should be chosen to make the regression results easy to read and to interpret. If the lot size were measured in thousands of square feet, the estimated coefficient would be 2 instead of 0.002.

(e) The F-statistic for omitting BDR and Age from the regression is F = 0.08. Are the coefficients on BDR and Age jointly statistically different from zero at the 10% level?

Solution: The 10% critical value from the F_(2,∞) distribution is 2.30. Because 0.08 < 2.30, the coefficients are not jointly significant at the 10% level.

3. (SW 7.9) Consider the regression model

Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i.

Use the transform-the-regression approach discussed in class to transform the regression so that you can use a t-statistic to test

(a) β_1 = β_2;

Solution: Adding and subtracting β_2 X_1i on the right-hand side of the equation gives

Y_i = β_0 + β_1 X_1i - β_2 X_1i + β_2 X_1i + β_2 X_2i + u_i
    = β_0 + (β_1 - β_2) X_1i + β_2 (X_1i + X_2i) + u_i.

Thus you can estimate

Y_i = β_0 + γ X_1i + β_2 (X_1i + X_2i) + u_i

and test whether γ = 0.

(b) β_1 + a β_2 = 0, where a is a constant;

Solution: Adding and subtracting a β_2 X_1i on the right-hand side of the equation gives

Y_i = β_0 + β_1 X_1i + a β_2 X_1i - a β_2 X_1i + β_2 X_2i + u_i
    = β_0 + (β_1 + a β_2) X_1i + β_2 (X_2i - a X_1i) + u_i.

Thus you can estimate

Y_i = β_0 + γ X_1i + β_2 (X_2i - a X_1i) + u_i

and test whether γ = 0.
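For concreteness, here is a small simulated illustration of the approach in (a) (not part of the original answer key; the data and parameter values are invented). The t-statistic on X_1 in the transformed regression is the t-statistic for H_0: β_1 = β_2:

set.seed(123)
n  <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); u <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.5 * x2 + u     # true beta_1 = beta_2, so the null is true
w  <- x1 + x2                         # constructed regressor X_1 + X_2
summary(lm(y ~ x1 + w))               # the coefficient on x1 estimates gamma = beta_1 - beta_2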

(c) β_1 + β_2 = 1; (Hint: You can redefine the dependent variable in the regression.)

Solution: Adding and subtracting β_2 X_1i on the right-hand side of the equation, and subtracting X_1i from both sides of the equation, gives

Y_i - X_1i = β_0 + β_1 X_1i + β_2 X_1i - X_1i - β_2 X_1i + β_2 X_2i + u_i
           = β_0 + (β_1 + β_2 - 1) X_1i + β_2 (X_2i - X_1i) + u_i.

Thus you can estimate

Y_i - X_1i = β_0 + γ X_1i + β_2 (X_2i - X_1i) + u_i

and test whether γ = 0. Alternatively, you can ignore the hint, estimate

Y_i = β_0 + γ X_1i + β_2 (X_2i - X_1i) + u_i,

and test whether γ = 1.

(d) β_1 + β_2 = a, where a is a constant.

Solution: Adding and subtracting β_2 X_1i on the right-hand side of the equation, and subtracting a X_1i from both sides of the equation, gives

Y_i - a X_1i = β_0 + β_1 X_1i + β_2 X_1i - a X_1i - β_2 X_1i + β_2 X_2i + u_i
             = β_0 + (β_1 + β_2 - a) X_1i + β_2 (X_2i - X_1i) + u_i.

Thus you can estimate

Y_i - a X_1i = β_0 + γ X_1i + β_2 (X_2i - X_1i) + u_i

and test whether γ = 0. Alternatively, you can estimate

Y_i = β_0 + γ X_1i + β_2 (X_2i - X_1i) + u_i

and test whether γ = a.

4. (SW 7.10) Show that the following two formulas for the homoskedasticity-only F-statistic are equivalent:

F = [(SSR_restricted - SSR_unrestricted)/q] / [SSR_unrestricted/(n - k_unrestricted - 1)]

and

F = [(R^2_unrestricted - R^2_restricted)/q] / [(1 - R^2_unrestricted)/(n - k_unrestricted - 1)].
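Solution sketch (not part of the original answer key): both regressions have the same dependent variable, so they share the same total sum of squares TSS, and R^2 = 1 - SSR/TSS for each. Hence

R^2_unrestricted - R^2_restricted = (SSR_restricted - SSR_unrestricted)/TSS   and   1 - R^2_unrestricted = SSR_unrestricted/TSS.

Substituting these into the second formula, the common factor 1/TSS cancels from the numerator and denominator, leaving the first formula.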

5. (SW 7.11) A school district undertakes an experiment to estimate the effect of class size on test scores in second-grade classes. The district assigns 50% of its previous year's first-graders to small second-grade classes (18 students per classroom) and 50% to regular-size classes (21 students per classroom). Students new to the district are handled differently: 20% are randomly assigned to small classes and 80% to regular-size classes. At the end of the second-grade school year, each student is given a standardized exam. Let Y_i denote the exam score for the i-th student, X_1i denote a binary variable that equals 1 if the student is assigned to a small class, and X_2i denote a binary variable that equals 1 if the student is newly enrolled. Let β_1 denote the causal effect on test scores of reducing class size from regular to small.

(a) Consider the regression Y_i = β_0 + β_1 X_1i + u_i. Do you think that E(u_i | X_1i) = 0? Is the OLS estimator of β_1 unbiased and consistent? Explain.

Solution: Treatment (assignment to small classes) was not randomly assigned in the population (the continuing and newly enrolled students) because the proportion of treated students differs between continuing and newly enrolled students. Thus, the treatment indicator X_1 is correlated with X_2. If newly enrolled students perform systematically differently on standardized tests than continuing students (perhaps because of adjustment to a new school), then this difference becomes part of the error term u in the regression. This leads to correlation between X_1 and u, so that E(u | X_1) ≠ 0. Because E(u | X_1) ≠ 0, β̂_1 is biased and inconsistent.

Statistically, if the true model is Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i, that is, if β_2 ≠ 0, then we have the necessary conditions for omitted variable bias: the variables X_1i and X_2i are correlated, and X_2i partly explains Y_i.

(b) Consider the regression Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i. Do you think that E(u_i | X_1i, X_2i) depends on X_1? Is the OLS estimator of β_1 unbiased and consistent? Explain. Do you think that E(u_i | X_1i, X_2i) depends on X_2? Is the OLS estimator of β_2 unbiased and consistent? Explain.

Solution: Because treatment was randomly assigned conditional on enrollment status (continuing or newly enrolled), E(u | X_1, X_2) will not depend on X_1. This means that the assumption of conditional mean independence is satisfied, and β̂_1 is unbiased and consistent. However, because X_2 was not randomly assigned (newly enrolled students may, on average, have attributes other than being newly enrolled that affect test scores), E(u | X_1, X_2) may depend on X_2, so that β̂_2 may be biased and inconsistent.

Statistically, assume that E(u_i | X_1i, X_2i) = E(u_i | X_2i), which may not be equal to zero. Intuitively, once you control for student type (continuing or newly enrolled), class size is randomly assigned. For convenience, assume further that E(u_i | X_2i) = γ_0 + γ_2 X_2i.

Define v_i = u_i - E(u_i | X_1i, X_2i). (Note that E(v_i | X_1i, X_2i) = 0.) Thus,

Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i
    = β_0 + β_1 X_1i + β_2 X_2i + E(u_i | X_1i, X_2i) + v_i
    = β_0 + β_1 X_1i + β_2 X_2i + E(u_i | X_2i) + v_i
    = β_0 + β_1 X_1i + β_2 X_2i + γ_0 + γ_2 X_2i + v_i
    = (β_0 + γ_0) + β_1 X_1i + (β_2 + γ_2) X_2i + v_i.

Thus, the conditional mean of the error term in this model is zero; that is, least squares assumption 1 holds. The ordinary least squares estimate β̂_1 is an unbiased estimate of β_1. The ordinary least squares estimate β̂_2, however, is an unbiased estimate of β_2 + γ_2, not of β_2.

6. The Bonferroni test of the joint hypothesis β_1 = β_1,0 and β_2 = β_2,0 based on the critical value c > 0 uses the following rule: do not reject if |t_1| ≤ c and |t_2| ≤ c; otherwise, reject, where t_1 and t_2 are the t-statistics that test the restrictions on β_1 and β_2, respectively. For a significance level of 5% and two restrictions, the Bonferroni critical value c equals 2.241. For the following questions, use a large-sample approximation for your test statistics.

(a) Using the above critical value, what is the probability of rejecting the null when the null is true

i. when ρ_(β̂_1, β̂_2) = 0?

Solution: The probability that the Bonferroni test does not reject the null hypothesis when the hypothesis is true is Pr(|t_1| < 2.2414, |t_2| < 2.2414). Asymptotically, the estimates β̂_1 and β̂_2 are jointly normally distributed, and for jointly normal random variables zero correlation implies independence. Hence,

Pr(|t_1| < 2.2414, |t_2| < 2.2414) = Pr(|t_1| < 2.2414) × Pr(|t_2| < 2.2414).

For a standard normal random variable z, Pr(|z| < 2.2414) = 0.975. Hence, the probability of rejecting the null when the null is true equals 1 - 0.975^2 = 0.0494.

ii. when ρ_(β̂_1, β̂_2) = 0.5? (Hint: If t_1 and t_2 are two jointly normally distributed random variables with correlation equal to 0.5, then Pr(|t_1| < 2.2414, |t_2| < 2.2414) = 0.9535.)

Solution: Using the hint directly, the probability of rejecting the null when the null is true is 1 - 0.9535 = 0.0465.
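These rejection probabilities can be checked numerically. Below is a minimal sketch (not part of the original answer key), assuming the mvtnorm package is available; it computes Pr(|t_1| < c, |t_2| < c) for jointly standard normal t-statistics with correlation ρ and returns the implied size of the test:

library(mvtnorm)
c_bonf <- 2.2414
bonferroni_size <- function(rho) {
  sigma  <- matrix(c(1, rho, rho, 1), 2, 2)    # correlation matrix of (t1, t2)
  accept <- pmvnorm(lower = c(-c_bonf, -c_bonf),
                    upper = c(c_bonf, c_bonf), sigma = sigma)
  1 - as.numeric(accept)                       # probability of rejecting a true null
}
bonferroni_size(0)     # about 0.0494
bonferroni_size(0.5)   # about 0.0465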

iii. when ρ_(β̂_1, β̂_2) = 1?

Solution: When the correlation between β̂_1 and β̂_2 equals one, then Pr(|t_1| < 2.241, |t_2| < 2.241) = Pr(|t_1| < 2.241) = 0.975. Hence, the probability of rejecting the null when the null is true is 1 - 0.975 = 0.025.

(b) Comment on the size and power of the Bonferroni test as the correlation between β̂_1 and β̂_2 increases.

Solution: As the correlation between β̂_1 and β̂_2 increases, the size of the test (the probability of rejecting the null when the null is true) decreases. As a result, the power of the test (the probability of rejecting the null when the null is false) also falls.

Empirical Questions

For these empirical exercises, the required datasets and a detailed description of them can be found at www.wise.xmu.edu.cn/course/gecon/written.html.

7. (SW E7.3) The data set used in this empirical exercise (CollegeDistance) contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In this exercise you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.)

Solution: The R code required for each question is listed within its respective solution. The code listed here reads the data and loads the required package.

# read and attach the data
CD <- read.csv("D:/R/CollegeDistance.csv")
# attaching allows you to directly access variable names
attach(CD)
# the AER package loads the functions used below (coeftest, vcovHC, linearHypothesis)
library(AER)

The table below summarises the regressions used to answer the questions.

Dependent variable: years of completed education (ED).

                          Model
Regressor        (1)          (2)          (3)
Dist           -0.0734      -0.0308      -0.0326
               (0.0134)     (0.0116)     (0.0126)
Bytest                       0.0924       0.0931
                            (0.0030)     (0.0030)
Female                       0.1434       0.1439
                            (0.0503)     (0.0503)
Black                        0.3538       0.3384
                            (0.0675)     (0.0689)
Hispanic                     0.4024       0.3492
                            (0.0737)     (0.0774)
Incomehi                     0.3666       0.3741
                            (0.0622)     (0.0623)
Ownhome                      0.1456       0.1433
                            (0.0648)     (0.0652)
Dadcoll                      0.5699       0.5740
                            (0.0763)     (0.0764)
Momcoll                      0.3792       0.3787
                            (0.0836)     (0.0835)
Cue80                        0.0244       0.0283
                            (0.0093)     (0.0095)
Stwmfg80                    -0.0502      -0.0426
                            (0.0196)     (0.0199)
Urban                                     0.0652
                                         (0.0634)
Tuition                                  -0.1848 .
                                         (0.0988)
Intercept      13.9559       8.8614       8.8935
               (0.0378)     (0.2411)     (0.2437)

Summary Statistics
F-statistic for Black and Hispanic (model (2))      22.155
F-statistic for Urban and Tuition (model (3))                     2.4253
SER             1.807        1.538        1.538
R^2             0.0075       0.2829       0.2838
R̄^2            0.0072       0.2809       0.2814
N               3796         3796         3796

Model Specifications among Three Regressions of Years of Completed Education on Distance to the Nearest College and Other Independent Variables. Heteroscedasticity-Robust Standard Errors in Parentheses under Coefficients. Significance Level (Using Two-Sided Test): ***, 0.1%; **, 1%; *, 5%; ., 10%.

(a) An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 year if distance to the nearest college were decreased by 20 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.

Solution:

# model estimation
model1 <- lm(formula = yrsed ~ dist, data = CD)
# heteroskedasticity-robust standard errors
coeftest(model1, vcov. = vcovHC(model1, type = "HC1"))
# summary of model estimation
summary(model1)

The education advocacy group's claim corresponds to a coefficient on Dist of 0.15/(-2) = -0.075, since Dist is measured in tens of miles and a 20-mile decrease is a change of -2 units. The 95% confidence interval for β_Dist from column (1) of the table above is (-0.0734 - 1.96 × 0.0134, -0.0734 + 1.96 × 0.0134), or (-0.099664, -0.047136), which includes the group's claimed value of -0.075. Thus, the advocacy group's claim is consistent with the estimated regression.

(b) Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? To answer this question, construct a table like Table 7.1 in the textbook. Include a simple specification [constructed in (a)], a base specification (that includes a set of important control variables), and several modifications of the base specification. Discuss how the estimated effect of Dist on ED changes across specifications.

Solution:

# base specification
m2 <- lm(formula = yrsed ~ dist + bytest + female + black + hispanic + incomehi
         + ownhome + dadcoll + momcoll + cue80 + stwmfg80, data = CD)
# heteroskedasticity-robust standard errors
coeftest(m2, vcov = vcovHC(m2, type = "HC1"))
# summary of model estimation
summary(m2)

# extended specification, adding urban and tuition
m3 <- lm(formula = yrsed ~ dist + bytest + female + black + hispanic + incomehi
         + ownhome + dadcoll + momcoll + cue80 + stwmfg80 + urban + tuition, data = CD)
# heteroskedasticity-robust standard errors
coeftest(m3, vcov = vcovHC(m3, type = "HC1"))
# summary of model estimation
summary(m3)
vcov_m3 <- vcovHC(m3, type = "HC1")
# joint significance test for urban and tuition
linearHypothesis(m3, c("urban = 0", "tuition = 0"), vcov = vcov_m3)

The simple specification is shown in column (1) of the table above, which includes only Dist, the distance to the nearest college. From the empirical questions in the previous homework, we know that, apart from Dist, additional regressors controlling for characteristics of the student, the student's family, and the local labour market, such as Bytest, Female, Black, Hispanic, Incomehi, Ownhome, Dadcoll, Momcoll, Cue80, and Stwmfg80, are also significant. Column (2) of the table shows the base specification controlling for these important factors. In column (2) the coefficient on Dist is -0.0308, which differs considerably from the result of the simple specification in column (1); R^2 and SER also change substantially. Column (3) shows a further specification that adds two more regressors, Urban and Tuition, which are not jointly significant (F-statistic = 2.4253, p-value = 0.08859). The coefficient on Dist in column (3), -0.0326, changes little from the result in column (2), and R^2 and SER change very little between the last two columns. From the base specification in column (2), the 95% confidence interval for β_Dist is (-0.0308 - 1.96 × 0.0116, -0.0308 + 1.96 × 0.0116), or (-0.053536, -0.008064), which does not include the group's claimed value of -0.075. Similar results are obtained from the regression in column (3).

(c) It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this result consistent with the regressions that you constructed in part (b)?

Solution:

# heteroskedasticity-robust variance-covariance matrix
vcov_m2 <- vcovHC(m2, type = "HC1")
# joint significance test for black and hispanic
linearHypothesis(m2, c("black = 0", "hispanic = 0"), vcov = vcov_m2)

Yes. The base specification in column (2) shows that the estimated coefficients β̂_Black and β̂_Hispanic are positive, large, and statistically significant both separately and jointly (the F-statistic is 22.155 and the p-value is nearly zero).

(d) Graph a 95% joint confidence interval for the coefficients on blacks and Hispanics.

Solution:

# ellipse plot of the 95% joint confidence set for the coefficients on black and hispanic
car::ellipse(m2$coef[5:6], vcov_m2[5:6, 5:6], sqrt(qchisq(0.95, 2)),
             add = FALSE, xlab = "Black", ylab = "Hispanic")
mtext("Coefficients of Black and Hispanic", side = 3, line = 3, font = 2, outer = FALSE)
mtext("95% Joint Confidence Interval", side = 3, line = 1.5, font = 3, outer = FALSE)

[Figure: 95% joint confidence ellipse for the coefficients on Black and Hispanic, with Black on the horizontal axis (roughly 0.20 to 0.50) and Hispanic on the vertical axis (roughly 0.3 to 0.5).]