Answers to Problem Set #4

1. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance-covariance matrix are given by:

    (β̂_1, β̂_2, β̂_3) = (2, 3, -1)

with estimated covariance matrix (rows and columns ordered β̂_1, β̂_2, β̂_3):

             β̂_1   β̂_2   β̂_3
    β̂_1      3    -2     1
    β̂_2     -2     4     0
    β̂_3      1     0     3

Test each of the following hypotheses and state the conclusion:

a. β_2 = 0

    t = (β̂_2 - β_2)/σ̂_β̂2 = (3 - 0)/√4 = 1.5

Since the critical t-value is approximately 1.67, we fail to reject the null hypothesis.

b. β_1 + 2β_2 = 5

Restate the null in terms of a new parameter, γ = 5 - β_1 - 2β_2:

    H_0: γ = 0

The t-statistic is t = γ̂/σ̂_γ = -3/3.317 ≈ -0.905, where

    γ̂ = 5 - β̂_1 - 2β̂_2 = 5 - 2 - 2(3) = -3
    σ̂²_γ = Var(5 - β̂_1 - 2β̂_2) = Var(β̂_1) + 4·Var(β̂_2) + 2(-1)(-2)·Cov(β̂_1, β̂_2)
          = 3 + 4(4) + 4(-2) = 11
    σ̂_γ = √11 ≈ 3.317

Since the t-statistic is less than the critical t-value (approximately 1.67) in absolute value, we fail to reject the null hypothesis.

c. β_1 - β_2 + β_3 = 4
Again, restate the null in terms of a new parameter, γ = 4 - β_1 + β_2 - β_3:

    H_0: γ = 0

The t-statistic is t = γ̂/σ̂_γ = 6/4 = 1.5, where

    γ̂ = 4 - β̂_1 + β̂_2 - β̂_3 = 4 - 2 + 3 + 1 = 6
    σ̂²_γ = Var(4 - β̂_1 + β̂_2 - β̂_3)
          = Var(β̂_1) + Var(β̂_2) + Var(β̂_3)
            + 2(-1)(1)·Cov(β̂_1, β̂_2) + 2(-1)(-1)·Cov(β̂_1, β̂_3) + 2(1)(-1)·Cov(β̂_2, β̂_3)
          = 3 + 4 + 3 + (-2)(-2) + 2(1) + (-2)(0) = 16
    σ̂_γ = √16 = 4

Since the t-statistic is less than the critical t-value (approximately 1.67), we fail to reject the null hypothesis.

2. Suppose that, from a sample of 100 observations, you run a regression of the dependent variable on 3 independent variables or regressors. Some of the output from this regression looks as follows:

    S.E. of regression:  2.5321
    Variance of y:       4.326

Conduct the F-test for the overall significance of the coefficients. The test has the following null and alternative hypotheses:

    H_0: β_1 = β_2 = β_3 = 0
    H_A: β_j ≠ 0 for at least one j = 1, 2, 3

First, find the sum of squared residuals. The standard error of the regression is

    s = sqrt[ Σ_t ε̂_t² / (T - K) ]

so

    Σ_t ε̂_t² = s²(T - K) = (2.5321)²(100 - 3) = 621.92
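All three tests in problem 1 are instances of the same rule: for H_0: a'β = c, the statistic is t = (a'β̂ - c)/√(a'V̂a). A minimal numpy sketch using the estimates and covariance matrix given in problem 1 (numpy is assumed; this code is not part of the original answer key):

```python
import numpy as np

b = np.array([2.0, 3.0, -1.0])          # estimates from problem 1
V = np.array([[ 3.0, -2.0, 1.0],
              [-2.0,  4.0, 0.0],
              [ 1.0,  0.0, 3.0]])       # estimated covariance matrix

def t_stat(a, c):
    """t-statistic for H0: a'beta = c, using Var(a'b) = a'Va."""
    return (a @ b - c) / np.sqrt(a @ V @ a)

# The answer key defines gamma = c - a'beta, so its t-statistics have the
# opposite sign; only the magnitude matters for a two-sided test.
print(t_stat(np.array([0.0, 1.0, 0.0]), 0.0))   # (a) beta2 = 0:          |t| = 1.5
print(t_stat(np.array([1.0, 2.0, 0.0]), 5.0))   # (b) b1 + 2*b2 = 5:      |t| ~ 0.905
print(t_stat(np.array([1.0, -1.0, 1.0]), 4.0))  # (c) b1 - b2 + b3 = 4:   |t| = 1.5
```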
Next, find the total sum of squares, Σ_t (y_t - ȳ)²:

    var(y) = Σ_t (y_t - ȳ)² / (T - 1)
    4.326 = Σ_t (y_t - ȳ)² / 99
    Σ_t (y_t - ȳ)² = 428.274

Note that the R² for this regression is then

    R² = 1 - Σ_t ε̂_t² / Σ_t (y_t - ȳ)² = 1 - 621.92/428.274 = -0.452

which is impossible: the R² of a regression with an intercept is by definition bounded between 0 and 1, so the reported output is internally inconsistent. If the R² had turned out positive, you could use the following relationship between the R² and the F-statistic for the overall significance test:

    F = [R²/(K - 1)] / [(1 - R²)/(T - K)]

3. Show that:

    R̄² = R² - [(K - 1)/(T - K)](1 - R²)

The adjusted R² (written R̄²) and R² are defined as:

    R̄² = 1 - [Σ_t ε̂_t²/(T - K)] / [Σ_t (y_t - ȳ)²/(T - 1)]
    R² = 1 - Σ_t ε̂_t² / Σ_t (y_t - ȳ)²
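The arithmetic in problem 2 above can be reproduced in a few lines of plain Python (no econometrics library needed; the two input numbers are the ones read off the regression output):

```python
T, K = 100, 3
ser = 2.5321        # S.E. of regression, as reported in the output
var_y = 4.326       # sample variance of y, as reported in the output

ssr = ser**2 * (T - K)      # sum of squared residuals: s^2 * (T - K)
tss = var_y * (T - 1)       # total sum of squares: var(y) * (T - 1)
r2 = 1 - ssr / tss          # R-squared

print(round(ssr, 2))   # 621.92
print(round(tss, 3))   # 428.274
print(round(r2, 3))    # -0.452  (impossible, so the output is inconsistent)
```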
Rewrite R̄²:

    R̄² = 1 - (1 - R²)(T - 1)/(T - K)
        = 1 - (T - 1)/(T - K) + R²(T - 1)/(T - K)
        = (T - K - T + 1)/(T - K) + R²(T - 1)/(T - K)
        = (1 - K)/(T - K) + R²(T - 1)/(T - K)
        = R²(T - 1)/(T - K) - (K - 1)/(T - K)
        = R²[(T - K + K - 1)/(T - K)] - (K - 1)/(T - K)
        = R²[1 + (K - 1)/(T - K)] - (K - 1)/(T - K)
        = R² + R²(K - 1)/(T - K) - (K - 1)/(T - K)
        = R² - [(K - 1)/(T - K)](1 - R²)

4. Given the following regression model

    y_t = β_1 + β_2 x_2t + β_3 x_3t + ε_t

you are asked to test the null hypothesis H_0: β_2 + 3β_3 = 1.

(a) Construct the restricted regression that imposes the constraint implied by the null hypothesis and discuss how you would use an F-test to test this null hypothesis.

Substituting β_2 = 1 - 3β_3 gives the restricted model:

    y_t = β_1 + (1 - 3β_3) x_2t + β_3 x_3t + ε_t
    y_t = β_1 + x_2t + β_3 (x_3t - 3x_2t) + ε_t
    y_t - x_2t = β_1 + β_3 (x_3t - 3x_2t) + ε_t

Construct two new variables, ỹ_t = (y_t - x_2t) and x̃_t = (x_3t - 3x_2t), and run the following two regressions:

    (U): y_t = β_1 + β_2 x_2t + β_3 x_3t + ε_t
    (R): ỹ_t = β_1 + β_3 x̃_t + ε_t

Obtain the sum of squared residuals from each and construct the F-statistic:

    F = [(SSE_R - SSE_U)/(3 - 2)] / [SSE_U/(T - 3)] = (SSE_R - SSE_U) / [SSE_U/(T - 3)]
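The recipe in part (a) can be sketched end to end on simulated data (numpy; every number in the data-generating process below is invented for illustration, and the truth is chosen to satisfy the null, β_2 + 3β_3 = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
# Illustrative truth satisfying H0: beta3 = 0.5, so beta2 = 1 - 3*0.5 = -0.5
y = 1.0 + (-0.5) * x2 + 0.5 * x3 + rng.normal(size=T)

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

ones = np.ones(T)
sse_u = ssr(y, np.column_stack([ones, x2, x3]))             # unrestricted (U)
sse_r = ssr(y - x2, np.column_stack([ones, x3 - 3 * x2]))   # restricted (R)

F = (sse_r - sse_u) / (sse_u / (T - 3))   # one restriction in the numerator
print(F)  # compare with the critical F(1, T-3) value (about 3.89 at the 5% level)
```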
If the F-statistic is greater than the critical F_{1, T-3} value, we reject the null hypothesis that β_2 + 3β_3 = 1.

b. Suppose instead that you are asked to test this null hypothesis using a t-statistic. Describe how you would construct the test in this case. You need not write down the exact OLS expressions for quantities such as Cov(β̂_i, β̂_j). Remember that V(aX + bY) = a²V(X) + b²V(Y) + 2ab·Cov(X, Y), and that out of the estimation you will obtain the variances and covariances of all the coefficient estimates.

Restate the null in terms of a new parameter, γ = β_2 + 3β_3 - 1:

    H_0: γ = 0

The t-statistic is t = γ̂/σ̂_γ, where

    γ̂ = β̂_2 + 3β̂_3 - 1
    σ̂²_γ = Var(β̂_2 + 3β̂_3) = Var(β̂_2) + 9·Var(β̂_3) + 2(1)(3)·Cov(β̂_2, β̂_3)
    σ̂_γ = sqrt[ Var(β̂_2) + 9·Var(β̂_3) + 6·Cov(β̂_2, β̂_3) ]

Calculate the t-statistic and compare it to the critical t-value. If the t-statistic exceeds the critical value in absolute value, reject the null hypothesis.

Another way to solve this problem is to impose the restriction in the regression model. Substituting β_2 = γ - 3β_3 + 1:

    Y_t = β_1 + (γ - 3β_3 + 1) X_2t + β_3 X_3t + ε_t
    Y_t = β_1 + γ X_2t + β_3 (X_3t - 3X_2t) + X_2t + ε_t
    Y_t - X_2t = β_1 + γ X_2t + β_3 (X_3t - 3X_2t) + ε_t

You can estimate this expression by least squares and test the hypothesis that γ = 0.

5. Suppose you estimate by OLS the model

    y_t = β x_t + ε_t        (1)

However, the true model is given by the expression

    y_t = β x_t + γ z_t + u_t        (2)

where u_t is the residual term and z_t is a regressor that was omitted in the first regression. Furthermore, suppose that γ < 0 and that Cov(x, z) > 0.
(a) Ignoring that γ < 0 and that Cov(x, z) > 0, in general, under what conditions will the estimator β̂ be an unbiased estimator of the true β?

Unbiasedness requires that E(u_t | x_t) = 0. The OLS estimator is

    β̂ = Σ_t (y_t - ȳ)(x_t - x̄) / Σ_t (x_t - x̄)²
       = Σ_t (β x_t + γ z_t + u_t - β x̄ - γ z̄ - ū)(x_t - x̄) / Σ_t (x_t - x̄)²

Ignoring the omitted variable (i.e., setting γ = 0):

    β̂ = Σ_t (β x_t + u_t - β x̄ - ū)(x_t - x̄) / Σ_t (x_t - x̄)²
       = β · Σ_t (x_t - x̄)(x_t - x̄)/Σ_t (x_t - x̄)² + Σ_t u_t (x_t - x̄)/Σ_t (x_t - x̄)²
       = β + Σ_t u_t (x_t - x̄) / Σ_t (x_t - x̄)²

(the ū term drops out because Σ_t ū(x_t - x̄) = 0). Taking the expected value, conditional on x_t:

    E(β̂ | x) = β + Σ_t [(x_t - x̄) / Σ_s (x_s - x̄)²] E(u_t | x)

Using the Law of Iterated Expectations:

    E(β̂) = β + E{ Σ_t [(x_t - x̄) / Σ_s (x_s - x̄)²] E(u_t | x) }

From this expression we can see that E(β̂) = β when E(u_t | x_t) = 0; otherwise β̂ is in general biased.

b. Calculate the bias in β̂ when we estimate equation (1) by OLS (i.e., we omit z_t).
Calculate β̂:

    β̂ = Σ_t (y_t - ȳ)(x_t - x̄) / Σ_t (x_t - x̄)²
       = Σ_t (β x_t + γ z_t + u_t - β x̄ - γ z̄ - ū)(x_t - x̄) / Σ_t (x_t - x̄)²
       = β + γ · Σ_t (z_t - z̄)(x_t - x̄)/Σ_t (x_t - x̄)² + Σ_t u_t (x_t - x̄)/Σ_t (x_t - x̄)²

Taking the expected value, conditional on x_t, and assuming E(u_t | x_t) = 0 holds:

    E(β̂) = β + γ · E[ Σ_t (z_t - z̄)(x_t - x̄) / Σ_t (x_t - x̄)² ]

where the second term is the omitted-variable BIAS.

c. Given the conditions of the problem, is β̂ from the previous part biased upwards or downwards? Explain why.

Recall that γ < 0 and Cov(x_t, z_t) > 0:

    E(β̂) = β + γ · E[ Σ_t (z_t - z̄)(x_t - x̄) / Σ_t (x_t - x̄)² ]
               (-)                  (+)
          = β + (negative bias)

Therefore, the estimate β̂ will be biased downward.
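The downward bias in part (c) is easy to see in a small Monte Carlo (numpy; all parameter values below are invented for illustration, chosen so that γ < 0 and Cov(x, z) > 0 as in the problem):

```python
import numpy as np

rng = np.random.default_rng(42)
beta, gamma = 2.0, -1.0          # gamma < 0, as in the problem
n, reps = 500, 2000

estimates = []
for _ in range(reps):
    x = rng.normal(size=n)
    z = 0.8 * x + rng.normal(size=n)   # Cov(x, z) = 0.8 > 0
    u = rng.normal(size=n)
    y = beta * x + gamma * z + u
    # OLS of y on x alone (the misspecified regression (1)), in deviation form
    xd = x - x.mean()
    estimates.append(((y - y.mean()) @ xd) / (xd @ xd))

# Plim is beta + gamma * Cov(x, z)/Var(x) = 2 + (-1)(0.8) = 1.2: biased downward
print(np.mean(estimates))
```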
Comments on Problem Set #4 (Empirical)

There are a number of ways to approach this problem. I have provided some results below, but they are by no means exhaustive. As a first step, include all of the explanatory variables — DRUGS, ENROLL, MATH87, SES, and URBAN — in a baseline regression. Note that the variable ENROLL is ENROLLMENT/1000, so its coefficient can be interpreted per 1000 students.

Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 402
Excluded observations: 5

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C             0.012374     0.083436      0.148309     0.8822
DRUGS        -0.916503     0.436590     -2.099230     0.0364
ENROLL        0.067648     0.077162      0.876704     0.3812
MATH87        0.641167     0.037062     17.29973      0.0000
SES           0.133049     0.039321      3.383634     0.0008
URBAN         0.000446     0.000880      0.507469     0.612

R-squared            0.548663    Mean dependent var   -0.025206
Adjusted R-squared   0.542964    S.D. dependent var    0.902533
S.E. of regression   0.610152    Sum squared resid     147.4253
Log likelihood      -368.7840

Since the p-values for ENROLL and URBAN are greater than 0.05, these variables are individually insignificant. When you conduct an F-test of whether they are jointly significant, you will also find a p-value greater than 0.05. Dropping them yields the following regression:

Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 405
Excluded observations: 2

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C             0.07039      0.07595       0.98278      0.3266
DRUGS        -0.73534      0.413900     -1.77662      0.0764
MATH87        0.63976      0.036826     17.37269      0.0000
SES           0.14094      0.039050      3.6392       0.0003

R-squared            0.54252     Mean dependent var   -0.026229
Adjusted R-squared   0.539099    S.D. dependent var    0.899902
S.E. of regression   0.610941    Akaike info criterion 1.86294
Sum squared resid    149.6727    Schwarz criterion     1.90738
Log likelihood      -373.0943    F-statistic           158.544
Durbin-Watson stat   1.954262    Prob(F-statistic)     0.000000
Notice that when we drop the ENROLL and URBAN variables, the DRUGS variable becomes insignificant at the 95% level. A Wald test of the joint significance of the DRUGS, ENROLL, and URBAN coefficients from the baseline regression above yields:

Wald Test:
Equation: EQ01
Null Hypothesis: C(2)=0, C(3)=0, C(6)=0

F-statistic    1.808365    Probability    0.145052
Chi-square     5.425094    Probability    0.143881

This implies that the DRUGS, URBAN, and ENROLL variables are jointly insignificant at the 95% level. It is a matter of choice whether or not to drop these variables; here, we decide to leave them in the regression. Given the sample size, we do not need to worry about degrees of freedom.

Now, we look more carefully at subsamples using dummy variables. We run the following regression with separate male and female interaction terms:

Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 402
Excluded observations: 5

Variable           Coefficient   Std. Error   t-Statistic   Prob.
MALE*MATH87         0.572959     0.050841     11.26970      0.0000
(1-MALE)*MATH87     0.70513      0.053825     13.10033      0.0000
MALE*DRUGS         -1.14846      0.475200     -2.416700     0.0161
(1-MALE)*DRUGS     -0.697427     0.510966     -1.36497      0.173
MALE*ENROLL         0.14982      0.095915      1.562023     0.119
(1-MALE)*ENROLL     0.017400     0.098318      0.176980     0.8596
MALE*SES            0.178070     0.054866      3.245536     0.0013
(1-MALE)*SES        0.094114     0.054720      1.719902     0.0862
MALE*URBAN          0.000568     0.001203      0.472155     0.637
(1-MALE)*URBAN      0.000409     0.001268      0.322687     0.747

R-squared            0.555037    Mean dependent var   -0.025206
Adjusted R-squared   0.544821    S.D. dependent var    0.902533
S.E. of regression   0.608911    Akaike info criterion 1.870273
Sum squared resid    145.3430    Schwarz criterion     1.969687
Log likelihood      -365.9249    Durbin-Watson stat    1.983788

A couple of implications from this regression deserve comment. The F-statistics which test whether females and males respond differently all yield p-values greater than 0.15.
This means that males and females respond the same way to DRUGS, ENROLL, SES, URBAN, and MATH87 in terms of their MATH91 performance. However, compared to the average response, the following observations hold. First, males at schools where drugs are prominent tend to perform worse than females at the same schools; the effect drugs have on females appears insignificant. Second, males at schools with high enrollment tend to perform better than females at the same schools. Third, socioeconomic status affects the test performance of males more than that of females. Finally, males and females do not respond differently in terms of their previous math scores or the degree of urbanization. The latter finding could be the result of including both the ENROLL and URBAN variables in the regression: these two variables are most likely highly correlated, so ENROLL may be taking explanatory power away from the URBAN variable. Even though males and females do not respond differently, the t-statistics reveal that males attending schools with high enrollment, a low level of drugs, and high socioeconomic status perform better on average. We have to consider this when making a recommendation on how to improve math scores. For instance, fighting a drug problem will be successful in raising test scores if the school is male-dominated in terms of enrollment.
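For readers without EViews, the mechanics of the interaction regression and the joint F-test for equal male/female responses can be sketched in numpy. The data below are simulated stand-ins (the actual MATH91/MATH87 dataset is not reproduced here), so only the mechanics carry over:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
male = rng.integers(0, 2, size=n).astype(float)
math87 = rng.normal(size=n)
drugs = rng.normal(size=n)
# Invented coefficients, equal across sexes by construction
y = 0.6 * math87 - 0.9 * drugs + rng.normal(size=n)

# Unrestricted: interact each regressor with MALE and (1-MALE), as in the tables
X_u = np.column_stack([np.ones(n),
                       male * math87, (1 - male) * math87,
                       male * drugs, (1 - male) * drugs])
# Restricted: pool the sexes (equal coefficients for males and females)
X_r = np.column_stack([np.ones(n), math87, drugs])

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

q = X_u.shape[1] - X_r.shape[1]   # number of restrictions (2)
F = ((ssr(y, X_r) - ssr(y, X_u)) / q) / (ssr(y, X_u) / (n - X_u.shape[1]))
print(F)  # a small F says males and females respond the same way, as found above
```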