CHAPTER 8

SOLUTIONS TO PROBLEMS

8.1 Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

8.2 With $\mathrm{Var}(u \mid inc, price, educ, female) = \sigma^2 inc^2$, $h(\mathbf{x}) = inc^2$, where $h(\mathbf{x})$ is the heteroskedasticity function defined in equation (8.21). Therefore, $\sqrt{h(\mathbf{x})} = inc$, and so the transformed equation is obtained by dividing the original equation by inc:

$$beer/inc = \beta_0 (1/inc) + \beta_1 + \beta_2 (price/inc) + \beta_3 (educ/inc) + \beta_4 (female/inc) + (u/inc).$$

Notice that $\beta_1$, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

8.3 False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.3, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.3 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. WLS could have more bias than OLS, or less.

8.4 (i) These variables have the anticipated signs. If a student takes courses where grades are, on average, higher (as reflected by higher crsgpa), then his/her grades will be higher. The better the student has been in the past (as measured by cumgpa), the better the student does (on average) in the current semester. Finally, tothrs is a measure of experience, and its coefficient indicates an increasing return to experience.

The t statistic for crsgpa is very large, over five using the usual standard error (which is the larger of the two). Using the robust standard error for cumgpa, its t statistic is about 2.61, which is also significant at the 5% level.
The t statistic for tothrs is only about 1.17 using either standard error, so it is not significant at the 5% level.

(ii) This is easiest to see without other explanatory variables in the model. If crsgpa were the only explanatory variable, $H_0\colon \beta_{crsgpa} = 1$ means that, without any information about the student, the best predictor of term GPA is the average GPA in the student's courses; this holds essentially by definition. (The intercept would be zero in this case.) With additional explanatory variables it is not necessarily true that $\beta_{crsgpa} = 1$ because crsgpa could be correlated with characteristics of the student. (For example, perhaps the courses students take are influenced by ability, as
measured by test scores and past college performance.) But it is still interesting to test this hypothesis. The t statistic using the usual standard error is $t = (.900 - 1)/.175 \approx -.57$; using the heteroskedasticity-robust standard error gives $t \approx -.60$. In either case we fail to reject $H_0\colon \beta_{crsgpa} = 1$ at any reasonable significance level, certainly including 5%.

(iii) The in-season effect is given by the coefficient on season, which implies that, other things equal, an athlete's GPA is about .16 points lower when his/her sport is competing. The t statistic using the usual standard error is about −1.60, while that using the robust standard error is about −1.96. Against a two-sided alternative, the t statistic using the robust standard error is just significant at the 5% level (the standard normal critical value is 1.96), while using the usual standard error, the t statistic is not quite significant at the 10% level (cv ≈ 1.65). So the standard error used makes a difference in this case. This example is somewhat unusual, as the robust standard error is more often the larger of the two.

8.5 (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.

(ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.

(iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.

(iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

(v) We just plug the values of the independent variables into the OLS regression line:

$$\widehat{smokes} = .656 - .069\,\log(67.44) + .012\,\log(6{,}500) - .029(16) + .020(77) - .00026(77^2) \approx .0052.$$

Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
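The arithmetic in Problem 8.5 can be checked with a few lines of Python. This is only a sketch: the coefficient values are taken to be those used in parts (ii) through (v), the dictionary keys and function names are illustrative labels rather than anything from the text, and the person in part (v) is assumed to face no restaurant smoking restriction because no such term appears in the plug-in calculation.

```python
import math

# Coefficients as used in Problem 8.5; the keys are illustrative labels.
COEF = {
    "const": 0.656,
    "log_cigpric": -0.069,
    "log_income": 0.012,
    "educ": -0.029,
    "age": 0.020,
    "age_sq": -0.00026,
    "restaurn": -0.101,
}


def turning_point(b_linear, b_quadratic):
    """Value of x where b_linear * x + b_quadratic * x**2 reaches its extremum."""
    return -b_linear / (2.0 * b_quadratic)


def smokes_hat(cigpric, income, educ, age, restaurn=0):
    """Fitted probability of smoking from the linear probability model."""
    return (
        COEF["const"]
        + COEF["log_cigpric"] * math.log(cigpric)
        + COEF["log_income"] * math.log(income)
        + COEF["educ"] * educ
        + COEF["age"] * age
        + COEF["age_sq"] * age ** 2
        + COEF["restaurn"] * restaurn  # assumed 0 for the person in part (v)
    )


educ_effect = 4 * COEF["educ"]                            # part (ii)
age_peak = turning_point(COEF["age"], COEF["age_sq"])     # part (iii)
p_hat = smokes_hat(cigpric=67.44, income=6500, educ=16, age=77)  # part (v)

print(round(educ_effect, 3), round(age_peak, 2), round(p_hat, 4))
```

The three printed values should agree with the −.116, 38.46, and roughly .005 quoted in the solution.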
SOLUTIONS TO COMPUTER EXERCISES

8.6 (i) Given the equation

$$sleep = \beta_0 + \beta_1\,totwrk + \beta_2\,educ + \beta_3\,age + \beta_4\,age^2 + \beta_5\,yngkid + \beta_6\,male + u,$$

the assumption that the variance of u given all explanatory variables depends only on gender is

$$\mathrm{Var}(u \mid totwrk, educ, age, yngkid, male) = \mathrm{Var}(u \mid male) = \delta_0 + \delta_1\,male.$$
Then the variance for women is simply $\delta_0$ and that for men is $\delta_0 + \delta_1$; the difference in variances is $\delta_1$.

(ii) After estimating the above equation by OLS, we regress $\hat{u}_i^2$ on $male_i$, $i = 1, 2, \ldots, 706$ (including, of course, an intercept). We can write the results as

$$\hat{u}^2 = \underset{(20{,}546.4)}{189{,}359.2} - \underset{(27{,}296.5)}{28{,}849.6}\,male + residual$$

$$n = 706,\quad R^2 = .0016.$$

Because the coefficient on male is negative, the estimated variance is higher for women.

(iii) No. The t statistic on male is only about −1.06, which is not significant at even the 20% level against a two-sided alternative.

8.7 (i) The estimated equation with both sets of standard errors (heteroskedasticity-robust standard errors in brackets) is

$$\widehat{price} = \underset{\substack{(29.48)\\ [36.28]}}{-21.77} + \underset{\substack{(.00064)\\ [.00122]}}{.00207}\,lotsize + \underset{\substack{(.013)\\ [.017]}}{.123}\,sqrft + \underset{\substack{(9.01)\\ [8.28]}}{13.85}\,bdrms$$

$$n = 88,\quad R^2 = .672.$$

The robust standard error on lotsize is almost twice as large as the usual standard error, making lotsize much less significant (the t statistic falls from about 3.23 to about 1.70). The t statistic on sqrft also falls, but it is still very significant. The variable bdrms actually becomes somewhat more significant, but it is still barely significant. The most important change is in the significance of lotsize.

(ii) For the log-log model,

$$\widehat{\log(price)} = \underset{\substack{(0.65)\\ [0.76]}}{5.61} + \underset{\substack{(.038)\\ [.041]}}{.168}\,\log(lotsize) + \underset{\substack{(.093)\\ [.101]}}{.700}\,\log(sqrft) + \underset{\substack{(.028)\\ [.030]}}{.037}\,bdrms$$

$$n = 88,\quad R^2 = .643.$$

Here, the heteroskedasticity-robust standard error is always slightly greater than the corresponding usual standard error, but the differences are relatively small. In particular, log(lotsize) and log(sqrft) still have very large t statistics, and the t statistic on bdrms is not significant at the 5% level against a one-sided alternative using either standard error.
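For readers who want to see where two sets of standard errors like those in Problem 8.7 come from, the following is a minimal sketch using NumPy on synthetic data. It assumes the simplest robust variance formula (often labeled HC0, with no degrees-of-freedom adjustment); the solution does not say which variant was used, and the function name and simulated data are purely illustrative.

```python
import numpy as np


def ols_usual_and_robust_se(X, y):
    """OLS estimates with usual and HC0 heteroskedasticity-robust standard errors.

    X must already include a column of ones for the intercept.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Usual standard errors: sigma^2 * (X'X)^{-1}, valid under homoskedasticity.
    sigma2 = resid @ resid / (n - p)
    se_usual = np.sqrt(np.diag(sigma2 * xtx_inv))

    # Robust (sandwich) standard errors: (X'X)^{-1} [sum u_i^2 x_i x_i'] (X'X)^{-1}.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_usual, se_robust


# Synthetic example (not the housing data): the error standard deviation grows with x.
rng = np.random.default_rng(0)
n_obs = 2000
x = rng.uniform(1.0, 5.0, size=n_obs)
y = 1.0 + 2.0 * x + x * rng.normal(size=n_obs)
X = np.column_stack([np.ones(n_obs), x])

beta, se_usual, se_robust = ols_usual_and_robust_se(X, y)
print("coefficients:", beta)
print("usual se:    ", se_usual)
print("robust se:   ", se_robust)
```

The point estimates are the same under either choice; only the standard errors, and hence the t statistics, change.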
(iii) As we discussed in Section 6.2, using the logarithmic transformation of the dependent variable often mitigates, if not entirely eliminates, heteroskedasticity. This is certainly the case here, as no important conclusions in the model for log(price) depend on the choice of standard error. (We have also transformed two of the independent variables to make the model of the constant elasticity variety in lotsize and sqrft.)

8.8 After estimating equation (8.18), we obtain the squared OLS residuals $\hat{u}^2$. The full-blown White test is based on the R-squared from the auxiliary regression (with an intercept),

$\hat{u}^2$ on llotsize, lsqrft, bdrms, llotsize², lsqrft², bdrms², llotsize⋅lsqrft, llotsize⋅bdrms, and lsqrft⋅bdrms,

where "l" in front of lotsize and sqrft denotes the natural log. [See equation (8.19).] With 88 observations the n-R-squared version of the White statistic is 88(.109) ≈ 9.59, and this is the outcome of an (approximately) $\chi^2_9$ random variable. The p-value is about .385, which provides little evidence against the homoskedasticity assumption.

8.9 (i) The estimated equation is

$$\widehat{votea} = \underset{(4.74)}{37.66} + \underset{(.071)}{.252}\,prtystra + \underset{(1.407)}{3.793}\,democa + \underset{(0.392)}{5.779}\,\log(expenda) - \underset{(0.397)}{6.238}\,\log(expendb) + \hat{u}$$

$$n = 173,\quad R^2 = .801,\quad \bar{R}^2 = .796.$$

You can convince yourself that regressing the $\hat{u}_i$ on all of the explanatory variables yields an R-squared of zero, although it might not be exactly zero in your computer output due to rounding error. Remember, this is how OLS works: the estimates $\hat{\beta}_j$ are chosen to make the residuals be uncorrelated in the sample with each independent variable (as well as have zero sample average).

(ii) The B-P test entails regressing the $\hat{u}_i^2$ on the independent variables in part (i). The F statistic for joint significance (with 4 and 168 df) is about 2.33 with p-value ≈ .058. Therefore, there is some evidence of heteroskedasticity, but not quite at the 5% level.

(iii) Now we regress $\hat{u}_i^2$ on $\widehat{votea}_i$ and $(\widehat{votea}_i)^2$, where the $\widehat{votea}_i$ are the OLS fitted values from part (i). The F test, with 2 and 170 df, is about 2.79 with p-value ≈ .065.
This is slightly less evidence of heteroskedasticity than provided by the B-P test, but the conclusion is very similar.
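The two tests used in Problem 8.9, and the n-R-squared statistic in Problem 8.8, all come from an auxiliary regression of the squared OLS residuals. The sketch below shows one way to compute them with NumPy. The function names and the simulated data are illustrative assumptions, and p-values are left out because they would require a statistics library for the F and chi-square distributions.

```python
import numpy as np


def r_squared(Z, v):
    """R-squared from an OLS regression of v on Z (Z includes a constant column)."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    resid = v - Z @ coef
    centered = v - v.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def f_and_lm(r2, n, k):
    """F and LM (n times R-squared) statistics for an auxiliary regression with k regressors."""
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm_stat = n * r2
    return f_stat, lm_stat


def heteroskedasticity_tests(X, y):
    """Breusch-Pagan and special-case White statistics; X includes a constant column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    u2 = (y - fitted) ** 2
    # Breusch-Pagan: squared residuals on the original regressors (k = p - 1 slopes).
    bp = f_and_lm(r_squared(X, u2), n, p - 1)
    # Special-case White: squared residuals on the fitted values and their squares (k = 2).
    Z = np.column_stack([np.ones(n), fitted, fitted ** 2])
    white = f_and_lm(r_squared(Z, u2), n, 2)
    return bp, white


# Check against Problem 8.8: n = 88, R-squared = .109, nine auxiliary regressors.
f_88, lm_88 = f_and_lm(0.109, 88, 9)
print("LM for Problem 8.8:", round(lm_88, 2))

# Synthetic example with error variance that rises with x.
rng = np.random.default_rng(1)
n_obs = 2000
x = rng.uniform(1.0, 5.0, size=n_obs)
y = 1.0 + 2.0 * x + x * rng.normal(size=n_obs)
X = np.column_stack([np.ones(n_obs), x])
(bp_f, bp_lm), (white_f, white_lm) = heteroskedasticity_tests(X, y)
print("B-P:   F =", round(bp_f, 2), " LM =", round(bp_lm, 2))
print("White: F =", round(white_f, 2), " LM =", round(white_lm, 2))
```

The first printed value reproduces the 88(.109) ≈ 9.59 calculation from Problem 8.8. In the synthetic example both tests should reject homoskedasticity by a wide margin, since the heteroskedasticity is built in.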
8.10 (i) By regressing sprdcvr on an intercept only we obtain $\hat{\mu} \approx .515$ (se ≈ .021). The asymptotic t statistic for $H_0\colon \mu = .5$ is (.515 − .5)/.021 ≈ .71, which is not significant at the 10% level, or even the 20% level.

(ii) 35 games were played on a neutral court.

(iii) The estimated LPM is

$$\widehat{sprdcvr} = \underset{(.045)}{.490} + \underset{(.050)}{.035}\,favhome + \underset{(.095)}{.118}\,neutral - \underset{(.050)}{.023}\,fav25 + \underset{(.092)}{.018}\,und25$$

$$n = 553,\quad R^2 = .0034.$$

The variable neutral has by far the largest effect (if the game is played on a neutral court, the probability that the spread is covered is estimated to be about .12 higher) and, except for the intercept, its t statistic is the only t statistic greater than one in absolute value (about 1.24).

(iv) Under $H_0\colon \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$, the response probability does not depend on any explanatory variables, which means neither the mean nor the variance depends on the explanatory variables. [See equation (8.38).]

(v) The F statistic for joint significance, with 4 and 548 df, is about .47 with p-value ≈ .76. There is essentially no evidence against $H_0$.

(vi) Based on these variables, it is not possible to predict whether the spread will be covered. The explanatory power is very low, and the explanatory variables are jointly very insignificant. The coefficient on neutral may indicate something is going on with games played on a neutral court, but we would not want to bet money on it unless it could be confirmed with a separate, larger sample.

8.11 (i) The estimates are given in equation (7.31). Rounded to four decimal places, the smallest fitted value is .0066 and the largest fitted value is .5577.

(ii) The estimated heteroskedasticity function for each observation i is $\hat{h}_i = \widehat{arr86}_i (1 - \widehat{arr86}_i)$, which is strictly between zero and one because $0 < \widehat{arr86}_i < 1$ for all i. The weights for WLS are $1/\hat{h}_i$. To show the WLS estimate of each parameter, we report the WLS results using the same equation format as for OLS:

$$\widehat{arr86} = \underset{(.018)}{.448} - \underset{(.019)}{.168}\,pcnv + \underset{(.0051)}{.0054}\,avgsen - \underset{(.0033)}{.0018}\,tottime - \underset{(.003)}{.025}\,ptime86 - \underset{(.005)}{.045}\,qemp86$$

$$n = 2{,}725,\quad R^2 = .0744.$$
The coefficients on the significant explanatory variables are very similar to the OLS estimates. The WLS standard errors on the slope coefficients are generally lower than the nonrobust OLS standard errors. A proper comparison would be with the robust OLS standard errors.

(iii) After WLS estimation, the F statistic for joint significance of avgsen and tottime, with 2 and 2,719 df, is about .88 with p-value ≈ .41. They are not close to being jointly significant at the 5% level. If your econometrics package has a command for WLS and a test command for joint hypotheses, the F statistic and p-value are easy to obtain. Alternatively, you can obtain the restricted R-squared using the same weights as in part (ii) and dropping avgsen and tottime from the WLS estimation. (The unrestricted R-squared is .0744.)

8.12 (i) The heteroskedasticity-robust standard error for $\hat{\beta}_{white} \approx .129$ is about .026, which is notably higher than the nonrobust standard error (about .020). The heteroskedasticity-robust 95% confidence interval is about .078 to .179, while the nonrobust CI is, of course, narrower, about .090 to .168. The robust CI still excludes the value zero by some margin.

(ii) There are no fitted values less than zero, but there are 231 greater than one. Unless we do something to those fitted values, we cannot directly apply WLS, as $\hat{h}_i$ will be negative in 231 cases.

8.13 (i) The equation estimated by OLS is

$$\widehat{colgpa} = \underset{(.33)}{1.36} + \underset{(.092)}{.412}\,hsgpa + \underset{(.010)}{.013}\,ACT - \underset{(.026)}{.071}\,skipped + \underset{(.057)}{.124}\,PC$$

$$n = 141,\quad R^2 = .259,\quad \bar{R}^2 = .238.$$

(ii) The F statistic obtained for the White test is about 3.58. With 2 and 138 df, this gives p-value ≈ .031. So, at the 5% level, we conclude there is evidence of heteroskedasticity in the errors of the colgpa equation. (As an aside, note that the t statistics for each of the terms are very small, and we could have simply dropped the quadratic term without losing anything of value.)

(iii) In fact, the smallest fitted value from the regression in part (ii) is about .027, while the largest is about .165.
Using these fitted values as the $\hat{h}_i$ in a weighted least squares regression gives the following:

$$\widehat{colgpa} = \underset{(.30)}{1.40} + \underset{(.083)}{.402}\,hsgpa + \underset{(.010)}{.013}\,ACT - \underset{(.022)}{.076}\,skipped + \underset{(.056)}{.126}\,PC$$

$$n = 141,\quad R^2 = .306,\quad \bar{R}^2 = .286.$$

There is very little difference in the estimated coefficient on PC, and the OLS t statistic and WLS t statistic are also very close. Note that we have used the usual OLS standard error, even though
it would be more appropriate to use the heteroskedasticity-robust form (since we have evidence of heteroskedasticity). The R-squared in the weighted least squares estimation is larger than that from the OLS regression in part (i), but, remember, these are not comparable.

(iv) With robust standard errors (that is, with standard errors that are robust to misspecifying the function $h(\mathbf{x})$), the equation is

$$\widehat{colgpa} = \underset{(.31)}{1.40} + \underset{(.086)}{.402}\,hsgpa + \underset{(.010)}{.013}\,ACT - \underset{(.021)}{.076}\,skipped + \underset{(.059)}{.126}\,PC$$

$$n = 141,\quad R^2 = .306,\quad \bar{R}^2 = .286.$$

The robust standard errors do not differ by much from those in part (iii); in most cases, they are slightly higher, but all explanatory variables that were statistically significant before are still statistically significant. But the confidence interval for $\beta_{PC}$ is a bit wider.

8.14 (i) I now get $R^2 = .0527$, but the other estimates seem okay.

(ii) One way to ensure that the unweighted residuals are being provided is to compare them with the OLS residuals. They will not be the same, of course, but they should not be wildly different.

(iii) The R-squared from the regression $\breve{u}_i^2$ on $\breve{y}_i$, $\breve{y}_i^2$, $i = 1, \ldots, 807$ is about .027. We use this as $R^2_{\hat{u}^2}$ in equation (8.15) but with k = 2. This gives F = 11.15, and so the p-value is about zero.

(iv) The substantial heteroskedasticity found in part (iii) shows that the feasible GLS procedure described on page 279 does not, in fact, eliminate the heteroskedasticity. Therefore, the usual standard errors, t statistics, and F statistics reported with weighted least squares are not valid, even asymptotically.

(v) The weighted least squares equation with robust standard errors is

$$\widehat{cigs} = \underset{(37.31)}{5.64} + \underset{(.54)}{1.30}\,\log(income) - \underset{(8.97)}{2.94}\,\log(cigpric) - \underset{(.149)}{.463}\,educ + \underset{(.115)}{.482}\,age - \underset{(.0012)}{.0056}\,age^2 - \underset{(.72)}{3.46}\,restaurn$$

$$n = 807,\quad R^2 = .1134.$$

The substantial differences in standard errors compared with equation (8.36) are another indication that our proposed correction for heteroskedasticity did not really do the trick. With the exception of restaurn, all standard errors got notably bigger; for example, the standard error for
log(cigpric) doubled. All variables that were significant with the nonrobust standard errors remain significant, but the confidence intervals are much wider in several cases.

[Instructor's Note: You can also do this exercise with regression (8.34) used in place of (8.32). This gives a somewhat larger estimated income effect.]

8.15 (i) In the following equation, estimated by OLS, the usual standard errors are in ( ) and the heteroskedasticity-robust standard errors are in [ ]:

$$\widehat{e401k} = \underset{\substack{(.081)\\ [.079]}}{-.506} + \underset{\substack{(.0006)\\ [.0006]}}{.0124}\,inc - \underset{\substack{(.000005)\\ [.000005]}}{.000062}\,inc^2 + \underset{\substack{(.0039)\\ [.0038]}}{.0265}\,age - \underset{\substack{(.00005)\\ [.00004]}}{.00031}\,age^2 - \underset{\substack{(.0121)\\ [.0121]}}{.0035}\,male$$

$$n = 9{,}275,\quad R^2 = .094.$$

There are no important differences; if anything, the robust standard errors are smaller.

(ii) This is a general claim. Since $\mathrm{Var}(y \mid \mathbf{x}) = p(\mathbf{x})[1 - p(\mathbf{x})]$, we can write $E(u^2 \mid \mathbf{x}) = p(\mathbf{x}) - [p(\mathbf{x})]^2$. Written in error form,

$$u^2 = p(\mathbf{x}) - [p(\mathbf{x})]^2 + v.$$

In other words, we can write this as a regression model $u^2 = \delta_0 + \delta_1 p(\mathbf{x}) + \delta_2 [p(\mathbf{x})]^2 + v$, with the restrictions $\delta_0 = 0$, $\delta_1 = 1$, and $\delta_2 = -1$. Remember that, for the LPM, the fitted values, $\hat{y}_i$, are estimates of $p(\mathbf{x}_i) = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik}$. So, when we run the regression $\hat{u}_i^2$ on $\hat{y}_i$, $\hat{y}_i^2$ (including an intercept), the intercept estimate should be close to zero, the coefficient on $\hat{y}_i$ should be close to one, and the coefficient on $\hat{y}_i^2$ should be close to −1.

(iii) The White F statistic is about 310.3, which is very significant. The coefficient on $\widehat{e401k}$ is about 1.010, the coefficient on $\widehat{e401k}^2$ is about −.970, and the intercept is about −.009. This accords quite well with what we expect to find.

(iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are

$$\widehat{e401k} = \underset{(.076)}{-.488} + \underset{(.0005)}{.0126}\,inc - \underset{(.000004)}{.000062}\,inc^2 + \underset{(.0037)}{.0255}\,age - \underset{(.00004)}{.00030}\,age^2 - \underset{(.0117)}{.0055}\,male$$

$$n = 9{,}275,\quad R^2 = .108.$$

There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.
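The WLS procedure for a linear probability model used in this problem (and in Problem 8.11) can be sketched as follows. This is an illustration on simulated data, not the retirement-plan data set; the function name is hypothetical, and the check that every OLS fitted value lies strictly between zero and one mirrors the difficulty noted in Problem 8.12.

```python
import numpy as np


def lpm_wls(X, y):
    """WLS for a linear probability model with weights 1 / [yhat * (1 - yhat)].

    X includes a constant column and y is a 0/1 array stored as floats.
    """
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_ols
    if fitted.min() <= 0.0 or fitted.max() >= 1.0:
        # h_hat would be zero or negative, so the weights are undefined.
        raise ValueError("OLS fitted values fall outside (0, 1)")
    h_hat = fitted * (1.0 - fitted)
    # Dividing each row by sqrt(h_hat) and running OLS is equivalent to WLS.
    scale = 1.0 / np.sqrt(h_hat)
    beta_wls, *_ = np.linalg.lstsq(X * scale[:, None], y * scale, rcond=None)
    return beta_ols, beta_wls, h_hat


# Simulated binary outcome whose true response probability is linear in x.
rng = np.random.default_rng(2)
n_obs = 20000
x = rng.uniform(0.0, 1.0, size=n_obs)
prob = 0.2 + 0.5 * x
y = (rng.uniform(size=n_obs) < prob).astype(float)
X = np.column_stack([np.ones(n_obs), x])

beta_ols, beta_wls, h_hat = lpm_wls(X, y)
print("OLS:", beta_ols)
print("WLS:", beta_wls)
```

When the linear model for the response probability is correct, OLS and WLS estimate the same parameters, so the two sets of estimates should be close, as they are in part (iv).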