Econ 388 R. Butler 2014 revisions Lecture 15

I. HETEROSKEDASTICITY: both pure and impure (the impure version is due to an omitted regressor that is correlated with the included regressors in the model)

Heteroskedasticity = when the variance of the error is not constant across the observations, that is, when the assumption $V(\varepsilon_i) = \sigma^2$ is violated, so that $V(\varepsilon_i) = \sigma_i^2$ instead. The variance changes for at least some observations.

A. Pure heteroskedasticity: there are no correlated omitted variables that cause the variance to change.
1. So if $\sigma_i^2 = \sigma^2 Z_i^2$, then we still get PURE heteroskedasticity whenever Z
a. is one of the independent variables already in the model, or
b. is not an independent variable in our model, but is not supposed to be in the model (it is not an omitted variable), or
c. is not an independent variable in the model but is supposed to be (it is an omitted variable), BUT it is uncorrelated with all the other independent variables.
2. More generally, if the variance is some function of omitted variables, namely $Z_1$ and $Z_2$, so that $\sigma_i^2 = f(Z_{1i}, Z_{2i})$, AND at least one of these should be in the regression but is NOT, AND the one that should be in the regression is correlated with the variables that are included as independent variables, then we have IMPURE heteroskedasticity. Otherwise, we have PURE heteroskedasticity.
3. If there is pure heteroskedasticity, then the following holds:
a. The estimated coefficients are unbiased but not efficient (in particular, OLS estimates are no longer BLUE). OLS will not be asymptotically efficient either. There is another kind of estimator, called a generalized least squares (GLS) or weighted least squares (WLS) estimator, that will be more efficient and also unbiased.
b. The estimates of the variances are (or may be) biased, thus invalidating tests of significance (t-tests and p-values).
c. IF $\sigma_i^2$ is positively correlated with $(X_{ki} - \bar{X}_k)^2$, which is sometimes the case with economic data, then the expected value of the estimated variance will be smaller than the true variance. Hence, OLS would be understating the true variance, and the resulting t-statistics would be too high (p-values too low).
We aren't sure of the direction of the bias otherwise (the more general case).

B. Impure heteroskedasticity: the heteroskedasticity is due to an omitted independent variable that is correlated with one or more of the included independent variables. In this case, the heteroskedasticity would be associated with biased coefficients (unlike the PURE case), and the estimated variances of the associated coefficients would also be biased. Wooldridge focuses on the case of pure heteroskedasticity in Chapter 8.
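For the simple regression $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with fixed X and independent errors, a short derivation (a sketch of the standard textbook argument, not taken from these notes) shows where the direction claimed in item 3c comes from. Write $w_i = (X_i-\bar X)^2/\sum_j (X_j-\bar X)^2$, so the $w_i$ are non-negative and sum to one:

```latex
\hat\beta_1-\beta_1=\frac{\sum_i (X_i-\bar X)\,\varepsilon_i}{\sum_i (X_i-\bar X)^2}
\quad\Longrightarrow\quad
V(\hat\beta_1)=\frac{\sum_i (X_i-\bar X)^2\,\sigma_i^2}{\left[\sum_i (X_i-\bar X)^2\right]^2}
=\frac{\sum_i w_i\,\sigma_i^2}{\sum_i (X_i-\bar X)^2}

\widehat V_{OLS}(\hat\beta_1)=\frac{s^2}{\sum_i (X_i-\bar X)^2},
\qquad
s^2\approx\frac{1}{n}\sum_i \sigma_i^2 \quad\text{(large } n\text{)}

\sum_i w_i\,\sigma_i^2-\frac{1}{n}\sum_i \sigma_i^2=\sum_i\Bigl(w_i-\frac{1}{n}\Bigr)\sigma_i^2
```

The last expression is n times the sample covariance between $w_i$ and $\sigma_i^2$. It is therefore positive when $\sigma_i^2$ is positively correlated with $(X_i-\bar X)^2$. In that case the true variance uses a weighted average of the $\sigma_i^2$ that favors the high-variance observations, while the OLS formula uses roughly the simple average, so OLS understates the variance.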
II. Robust standard errors. With pure heteroskedasticity, the OLS estimates are unbiased (and consistent; how would you prove this?), but the standard errors may be biased (and inconsistent). There are two general approaches to handling the heteroskedasticity problem:
1) Use weighted least squares, in which you get both new standard errors and new estimated coefficients (the standard errors may change a lot; the estimated coefficients will typically not change by very much); or
2) Keep the OLS estimated coefficients (they are, after all, unbiased and consistent), but adjust the standard errors.
The first approach (approach 1) was standard until recently, but it suffers because you have to model the form of heteroskedasticity, and you are never sure if you are modeling it correctly. So the latter approach (approach 2) is now becoming standard, since robust standard errors have been discovered. These automatically adjust for any (unknown) form of heteroskedasticity (so with pure heteroskedasticity and a large sample, you are probably getting the correct results). Besides being called robust standard errors, they are also known as heteroskedastic-consistent estimates, White estimates, White-adjusted standard errors, etc. (see chapter 8 for more names). Hypothesis testing proceeds using the OLS estimates of the coefficients and the heteroskedastic-consistent standard errors.

Getting the robust standard errors is especially easy in Stata; just use the robust option as follows:

regress y x1 x2 x3, robust;

In SAS, there are many ways to get robust standard errors, but probably the easiest is as follows (id should identify each observation uniquely; if several observations share an id, this gives cluster-robust rather than White standard errors):

proc genmod; class id; model y=x1 x2 x3; repeated subject=id; run;

Example using the Utah CPS data: [[[[[ut_cps_hetcov.do]]]]]

Heteroskedasticity will also invalidate the usual F-tests of statistical significance. So Wooldridge outlines a heteroskedastic-consistent approach to testing linear restrictions in chapter 8.
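Both ideas in this section can be sketched numerically. The sketch below is in Python with NumPy rather than Stata, and the function names `ols`, `ols_and_hc0_se` and `robust_lm` are hypothetical; it is an illustration under these assumptions, not the lecture's code. It computes White's heteroskedasticity-consistent covariance $(X'X)^{-1}\left(\sum_i \hat u_i^2 x_i x_i'\right)(X'X)^{-1}$ in its original (HC0) form. It then computes the regress-ones-on-products LM statistic that the Stata program below builds by hand. Stata's `robust` option also rescales the HC0 matrix by $n/(n-k)$, so its standard errors are slightly larger in small samples.

```python
import numpy as np


def ols(X, y):
    """OLS fit; X must already contain any intercept column. Returns (beta, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def ols_and_hc0_se(X, y):
    """Coefficients, conventional OLS standard errors, and White (HC0) robust ones."""
    n, k = X.shape
    beta, u = ols(X, y)
    XtX_inv = np.linalg.inv(X.T @ X)
    s2 = u @ u / (n - k)                      # homoskedastic estimate of sigma^2
    se_ols = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * (u ** 2)[:, None])      # sum_i u_i^2 x_i x_i'
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_ols, se_hc0


def robust_lm(X_kept, X_dropped, y):
    """Heteroskedasticity-robust LM test that the X_dropped coefficients are all zero.

    Mirrors the Stata steps: restricted residuals, residuals of each dropped
    regressor on the kept ones, their products, then regress 1 on the products
    without a constant. Returns (LM = N - SSR, chi-square degrees of freedom).
    """
    _, u = ols(X_kept, y)
    r = np.column_stack([ols(X_kept, X_dropped[:, j])[1]
                         for j in range(X_dropped.shape[1])])
    products = r * u[:, None]
    _, e = ols(products, np.ones(len(y)))
    return len(y) - e @ e, X_dropped.shape[1]


# Simulated example: error s.d. rises with x (pure heteroskedasticity); z is irrelevant.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1, 5, n)
z = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=x, size=n)

X_kept = np.column_stack([np.ones(n), x])
beta, se_ols, se_hc0 = ols_and_hc0_se(np.column_stack([X_kept, z]), y)
lm, df = robust_lm(X_kept, z[:, None], y)
print("beta:", beta)
print("OLS s.e.:", se_ols)
print("HC0 s.e.:", se_hc0)
print(f"robust LM = {lm:.3f}, chi-square df = {df}")
```

Under the null hypothesis, the LM statistic is compared with a chi-square critical value with `df` degrees of freedom (about 3.84 for one restriction at the 5 percent level).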
Wooldridge's heteroskedasticity-robust test of linear restrictions is illustrated with the program below (with some of the relevant output following):

STATA:
* program to do hetcov robust standard errors and show a robust "F-test" (Wooldridge) ;
* for the heterosk-robust LM stat, see p. 401 in Davidson/MacKinnon, ;
* ESTIMATION & INFERENCE IN ECONOMETRICS ;
(((bunch of preliminary code to read and create variables)))
gen ones = 1;
gen lnwage = log(wklywg);
* testing education variables assuming homoskedasticity;
regress lnwage age white male no_h_sc high_sch some_col college exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
test (no_h_sc=0) (high_sch=0) (some_col=0) (college=0);
regress lnwage age white male no_h_sc high_sch some_col college exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn, robust;
* TESTING EDUCATION VARIABLES while allowing for heteroskedastic errors--see references at top;
* get the residuals from the restr. model (next) and multiply them by residuals from the ;
* four auxiliary regressions of the omitted variables, then regress ones on all of these ;
regress lnwage age white male exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn ;
predict uhat, residuals;
regress no_h_sc age white male exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
predict no_uhat, residuals;
regress high_sch age white male exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
predict h_uhat, residuals;
regress some_col age white male exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
predict som_uhat, residuals;
regress college age white male exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
predict col_uhat, residuals;
gen uhat1 = uhat*no_uhat;
gen uhat2 = uhat*h_uhat;
gen uhat3 = uhat*som_uhat;
gen uhat4 = uhat*col_uhat;
regress ones uhat1 uhat2 uhat3 uhat4, noconstant;
gen lm_heter = e(N) - e(rss);
*rss=sum of squared residuals;
sum lm_heter;

SOME RELEVANT OUTPUT FOLLOWS:

. * testing education variables assuming homoskedasticity;
. regress lnwage age white male no_h_sc high_sch some_col college exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn;

      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------           F( 15,   178) =    4.30
       Model |  32.0048301    15  2.13365534           Prob > F      =  0.0000
    Residual |   88.425208   178  .496770832           R-squared     =  0.2658
-------------+------------------------------           Adj R-squared =  0.2039
       Total |  120.430038   193  .623989835           Root MSE      =  .70482

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0105187   .0043251     2.43   0.016     .0019836    .0190538
       white |  -.3686956   .3612728    -1.02   0.309    -1.081624    .3442333
        male |   .2193335   .1139829     1.92   0.056     -.005598     .444265
     no_h_sc |  -.8134617   .2869731    -2.83   0.005    -1.379769   -.2471545
    high_sch |  -.7969473   .2302653    -3.46   0.001    -1.251349   -.3425461
    some_col |  -.5734992   .2164294    -2.65   0.009    -1.000597   -.1464015
     college |  -.3698765   .2280029    -1.62   0.107    -.8198131    .0800601
        exec |   .2756144   .3040339     0.91   0.366    -.3243602     .875589
    tech_sal |  -.0885604   .3174986    -0.28   0.781     -.715106    .5379851
    serv_occ |  -.1359019   .3035032    -0.45   0.655    -.7348292    .4630255
    oper_occ |   .1333178   .2912388     0.46   0.648    -.4414073    .7080428
    ag_cnstr |   .3985858   .1995688     2.00   0.047     .0047605    .7924111
       manuf |   .1959387   .1610785     1.22   0.225    -.1219305     .513808
       trade |    .296175   .1618434     1.83   0.069    -.0232038    .6155537
    pub_admn |   .2680521   .2307576     1.16   0.247    -.1873205    .7234247
       _cons |   6.245909   .5433321    11.50   0.000     5.173708     7.31811
------------------------------------------------------------------------------

. test (no_h_sc=0) (high_sch=0) (some_col=0) (college=0);

 ( 1)  no_h_sc = 0
 ( 2)  high_sch = 0
 ( 3)  some_col = 0
 ( 4)  college = 0

       F(  4,   178) =    3.42
            Prob > F =    0.0100

. regress lnwage age white male no_h_sc high_sch some_col college exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn, robust;

Regression with robust standard errors                 Number of obs =     194
                                                       F( 15,   178) =    6.75
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2658
                                                       Root MSE      =  .70482

------------------------------------------------------------------------------
             |               Robust
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0105187   .0056464     1.86   0.064    -.0006238    .0216611
       white |  -.3686956     .19091    -1.93   0.055    -.7454337    .0080425
        male |   .2193335   .1135732     1.93   0.055    -.0047893    .4434563
     no_h_sc |  -.8134617   .1969169    -4.13   0.000    -1.202054   -.4248696
    high_sch |  -.7969473   .2275902    -3.50   0.001     -1.24607   -.3478251
    some_col |  -.5734992   .1696881    -3.38   0.001    -.9083585   -.2386399
     college |  -.3698765   .2128015    -1.74   0.084    -.7898149    .0500618
        exec |   .2756144   .2602141     1.06   0.291    -.2378871     .789116
    tech_sal |  -.0885604   .2537479    -0.35   0.727    -.5893017    .4121808
    serv_occ |  -.1359019   .2470563    -0.55   0.583     -.623438    .3516343
    oper_occ |   .1333178   .2230161     0.60   0.551    -.3067779    .5734134
    ag_cnstr |   .3985858   .1245866     3.20   0.002      .152729    .6444427
       manuf |   .1959387   .1415148     1.38   0.168    -.0833238    .4752013
       trade |    .296175   .1398489     2.12   0.036     .0201998    .5721502
    pub_admn |   .2680521   .2022238     1.33   0.187    -.1310125    .6671167
       _cons |   6.245909   .4179341    14.94   0.000     5.421166    7.070652
------------------------------------------------------------------------------

. * TESTING EDUCATION VARIABLES while allowing for heteroskedastic errors;
. regress lnwage age white male exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn ;

      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------           F( 11,   182) =    4.38
       Model |  25.2021097    11  2.29110088           Prob > F      =  0.0000
    Residual |  95.2279284   182  .523230376           R-squared     =  0.2093
-------------+------------------------------           Adj R-squared =  0.1615
       Total |  120.430038   193  .623989835           Root MSE      =  .72335

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0119868   .0043631     2.75   0.007     .0033779    .0205956
       white |  -.3493217   .3685086    -0.95   0.344     -1.07642    .3777767
        male |   .2618552   .1154233     2.27   0.024     .0341152    .4895948
        exec |   .5035434   .3021245     1.67   0.097    -.0925736     1.09966
    tech_sal |  -.0004907   .3229976    -0.00   0.999     -.637792    .6368106
    serv_occ |  -.1067625   .3080585    -0.35   0.729    -.7145878    .5010628
    oper_occ |   .1415224   .2978529     0.48   0.635    -.4461664    .7292111
    ag_cnstr |   .2995889   .2027568     1.48   0.141     -.100467    .6996451
       manuf |   .1779896   .1620999     1.10   0.274    -.1418471     .497826
       trade |   .2076753    .161685     1.28   0.201    -.1113428    .5266935
    pub_admn |   .3284559   .2331479     1.41   0.161    -.1315645    .7884763
       _cons |   5.495721   .5067873    10.84   0.000     4.495787    6.495655
------------------------------------------------------------------------------

. predict uhat, residuals;
(923 missing values generated)

. regress no_h_sc age white male exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn;
. predict no_uhat, residuals;
. regress high_sch age white male exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn;
. predict h_uhat, residuals;
. regress some_col age white male exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn;
. predict som_uhat, residuals;
. regress college age white male exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn;

((((NOTE I LEFT OUT THE ABOVE REGRESSION RESULTS with school dummy dep var))))

. regress ones uhat1 uhat2 uhat3 uhat4, noconstant;

      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------           F(  4,   190) =    3.24
       Model |  12.3835641     4  3.09589103           Prob > F      =  0.0134
    Residual |  181.616436   190  .955875978           R-squared     =  0.0638
-------------+------------------------------           Adj R-squared =  0.0441
       Total |         194   194           1           Root MSE      =  .97769

------------------------------------------------------------------------------
        ones |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       uhat1 |  -2.242635   .7839338    -2.86   0.005    -3.788967   -.6963038
       uhat2 |  -1.309697   .4139533    -3.16   0.002    -2.126232   -.4931626
       uhat3 |  -1.154552   .4458074    -2.59   0.010    -2.033919   -.2751839
       uhat4 |  -.7433876   .4347062    -1.71   0.089    -1.600858    .1140825
------------------------------------------------------------------------------

. gen lm_heter = e(N) - e(rss);
. *rss=sum of squared residuals;
. sum lm_heter;

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    lm_heter |    1117    12.38356           0   12.38356   12.38356

The lm_heter statistic (12.38) is Chi-square with four degrees of freedom under the null hypothesis that the schooling dummies are jointly insignificant (remember, this is a large sample test of that hypothesis that is robust to whether or not there is heteroskedasticity in the model). This is statistically significant at slightly larger than the one percent level (p ≈ 0.015), close to the test without the heteroskedasticity correction (the F-test above, with p = 0.0100).

III. Detecting heteroskedasticity

A. Goldfeld-Quandt test: While this test requires that the researcher identify the factor of proportionality to order the data into thirds (i.e., it requires that Z be identified to divide the sample into higher and lower variance groups), it provides an exact test statistic even in relatively small samples.
The validity of the Parks and White tests, on the other hand, hinges on having sample sizes that are suitably large.
The null hypothesis to be investigated is $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \cdots = \sigma_n^2$.

[Figure: scatter of Y against X, with the sample divided along X into groups I, II and III]

To do the test of this hypothesis, we proceed as follows:
a. Divide the data into three groups (of roughly equal sizes, $n_I + n_{II} + n_{III} = n$).
b. Run separate regressions on groups I and III. Let $s_I^2$ and $s_{III}^2$ represent the corresponding estimators of $\sigma^2$. Under the null hypothesis of homoskedasticity, we have

$$\frac{s_{III}^2}{s_I^2} \sim F(n_{III} - k,\ n_I - k)$$

where k = the number of coefficients including the intercept.
*Place the larger $s^2$ in the numerator for this test. [[[why? this makes a two-tailed test into a one-tailed test. explain]]]

[Figure: the F-distribution density function, with the "Fail to reject Ho" region to the left of the critical value and the "Reject Ho" region to the right]

Under the null hypothesis one would expect $s_{III}^2/s_I^2$ to be fairly close to one, and large differences from one would provide the basis for rejecting the null hypothesis. Illustrate below, after the discussion of the modern approaches.
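The three-group procedure above can be sketched as follows. This is a Python/NumPy illustration under stated assumptions, not the lecture's code. The function name `goldfeld_quandt` is hypothetical. The data are sorted by an assumed proportionality factor `order_by` (the Z of the text). Groups I and III are the bottom and top thirds, and the middle third is dropped. As instructed, the larger residual variance goes in the numerator.

```python
import numpy as np


def goldfeld_quandt(X, y, order_by):
    """Goldfeld-Quandt F statistic comparing the low and high thirds of the data.

    X must include the intercept column; k = X.shape[1] coefficients.
    Returns (F, numerator df, denominator df), larger variance on top.
    """
    n, k = X.shape
    idx = np.argsort(order_by)
    third = n // 3
    groups = {"I": idx[:third], "III": idx[n - third:]}   # middle third is dropped
    s2, dof = {}, {}
    for name, g in groups.items():
        beta, *_ = np.linalg.lstsq(X[g], y[g], rcond=None)
        u = y[g] - X[g] @ beta
        dof[name] = len(g) - k
        s2[name] = u @ u / dof[name]                      # estimate of sigma^2 in the group
    hi, lo = ("III", "I") if s2["III"] >= s2["I"] else ("I", "III")
    return s2[hi] / s2[lo], dof[hi], dof[lo]


# Simulated example: error s.d. proportional to x, so Z = x.
rng = np.random.default_rng(1)
n = 300
x = np.linspace(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)
X = np.column_stack([np.ones(n), x])
F, df_num, df_den = goldfeld_quandt(X, y, order_by=x)
print(f"F({df_num}, {df_den}) = {F:.2f}")
```

Compare F with the upper critical value of F(df_num, df_den). If SciPy is available, `scipy.stats.f.sf(F, df_num, df_den)` gives the upper-tail probability.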
B. Modern approaches

1. Goldfeld-Quandt assumes that you know how to partition the data, but offers an exact test even in small samples. The modern approaches are all large sample tests, but they assume that you don't know the form of heteroskedasticity, except that the variance is correlated with one or more of the independent variables included in the analysis.

2. Breusch-Pagan Test for Heteroskedasticity: see the text for an explanation. This test and the White test are general purpose tests for heteroskedasticity, strictly valid only when the data sets are large. Homoskedasticity suggests that the variance is unrelated to the values of the explanatory variables. In a heteroskedastic model, by contrast, the variance of the errors is related to the values of the independent variables through some function f, as follows:

$$\sigma_i^2 = f(X_{1i}, X_{2i}, \ldots, X_{ki})$$

An important note here: this does not say anything about impure heteroskedasticity. The Xs on the right hand side of the above equation are always included in the model, and even if one of them were excluded from the model, it would have to be correlated with the included Xs to yield impure heteroskedasticity. In the absence of a visit from the covariance angel, we probably don't know what form f takes. Breusch and Pagan suggest that this could be a linear function, so that the test regression takes the form (if k=3, so there are 3 slope regressors in the original model):

$$\hat\varepsilon_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i$$

$nR^2$ from this regression will be distributed as Chi-square, with 3 degrees of freedom (one per slope term), under the null hypothesis that there is no heteroskedasticity. Large values of this statistic (beyond the critical values) would indicate that there is heteroskedasticity.

3. White Test for Heteroskedasticity. White suggested taking a (second order) Taylor series expansion of $\sigma_i^2 = f(X_{1i}, X_{2i}, \ldots, X_{ki})$ that includes cross product terms (since cross product terms are also relevant in plim arguments).
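We again show this for the simple case of three regressors, though the extension to many regressors is straightforward (since you just keep including all the squares and cross products of the original regressors):

$$\sigma_i^2 = f(X_{1i}, X_{2i}, X_{3i}) \approx \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{1i}^2 + \alpha_5 X_{2i}^2 + \alpha_6 X_{3i}^2 + \alpha_7 X_{1i}X_{2i} + \alpha_8 X_{1i}X_{3i} + \alpha_9 X_{2i}X_{3i}$$

Next replace $\sigma_i^2$ with $\hat\varepsilon_i^2$ ($\hat\varepsilon_i$ is the residual from the original model), and run the following regression: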
$$\hat\varepsilon_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{1i}^2 + \alpha_5 X_{2i}^2 + \alpha_6 X_{3i}^2 + \alpha_7 X_{1i}X_{2i} + \alpha_8 X_{1i}X_{3i} + \alpha_9 X_{2i}X_{3i} + u_i$$

If the product (number of observations × $R^2$) is high, then we reject the null hypothesis of homoskedastic errors. $nR^2$ is distributed as Chi-square with degrees of freedom equal to the number of non-intercept terms in the last equation (9 here). Large values of $nR^2$, larger than the critical value for the Chi-square distribution, indicate that the null hypothesis of homoskedastic errors is rejected. Like the F-distribution, the Chi-square is a right-skewed distribution whose shape depends on the degrees of freedom parameter.

4. Simple Version of the White Test. When the number of regressors is moderate or large, the White test will obviously include a lot of cross product terms that will eat up a lot of degrees of freedom. An alternative White test, not quite as general, is to regress the squared residuals on the predicted value of Y and the predicted value of Y squared. Under the null hypothesis of no heteroskedasticity, the resulting $nR^2$ will be Chi-square with 2 degrees of freedom.

Examples of some of these tests follow: [[[ut_cps_hettest.do]]]]

* testing for heteroskedasticity ;
regress lnwage age white male no_h_sc high_sch some_col college exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
predict resids, residuals;
predict yhat;
hettest;
hettest, rhs;
imtest, preserve white;
** take the yhat and resids to form a White's test: est var=a0+a1 yhat +a2 yhat^2 ;
gen yhatsq=yhat*yhat;
gen residsq=resids*resids;
regress residsq yhat yhatsq;
gen lm_white=e(N)*e(r2);
sum lm_white;

SOME OF THE OUTPUT FOLLOWS:

. regress lnwage age white male no_h_sc high_sch some_col college exec tech_sal serv_occ oper_occ
> ag_cnstr manuf trade pub_admn;

      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------           F( 15,   178) =    4.30
       Model |  32.0048301    15  2.13365534           Prob > F      =  0.0000
    Residual |   88.425208   178  .496770832           R-squared     =  0.2658
-------------+------------------------------           Adj R-squared =  0.2039
       Total |  120.430038   193  .623989835           Root MSE      =  .70482

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0105187   .0043251     2.43   0.016     .0019836    .0190538
       white |  -.3686956   .3612728    -1.02   0.309    -1.081624    .3442333
        male |   .2193335   .1139829     1.92   0.056     -.005598     .444265
     no_h_sc |  -.8134617   .2869731    -2.83   0.005    -1.379769   -.2471545
    high_sch |  -.7969473   .2302653    -3.46   0.001    -1.251349   -.3425461
    some_col |  -.5734992   .2164294    -2.65   0.009    -1.000597   -.1464015
     college |  -.3698765   .2280029    -1.62   0.107    -.8198131    .0800601
        exec |   .2756144   .3040339     0.91   0.366    -.3243602     .875589
    tech_sal |  -.0885604   .3174986    -0.28   0.781     -.715106    .5379851
    serv_occ |  -.1359019   .3035032    -0.45   0.655    -.7348292    .4630255
    oper_occ |   .1333178   .2912388     0.46   0.648    -.4414073    .7080428
    ag_cnstr |   .3985858   .1995688     2.00   0.047     .0047605    .7924111
       manuf |   .1959387   .1610785     1.22   0.225    -.1219305     .513808
       trade |    .296175   .1618434     1.83   0.069    -.0232038    .6155537
    pub_admn |   .2680521   .2307576     1.16   0.247    -.1873205    .7234247
       _cons |   6.245909   .5433321    11.50   0.000     5.173708     7.31811
------------------------------------------------------------------------------

. predict resids, residuals;
(923 missing values generated)

. predict yhat;
(option xb assumed; fitted values)

. hettest;

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lnwage

         chi2(1)      =     3.15
         Prob > chi2  =   0.0758
**est var regressed on yhat, no heterosk at 5 percent level**

. hettest, rhs;

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: age white male no_h_sc high_sch some_col college exec tech_sal
                    serv_occ oper_occ ag_cnstr manuf trade pub_admn

         chi2(15)     =    46.67
         Prob > chi2  =   0.0000
**est var regressed on all the rhs var, heterosk at better than 1 percent level**

. imtest, preserve white;

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(76)     =    51.44
         Prob > chi2  =   0.9862
**est var regressed on x and cross products, no heteroskedasticity**

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      51.44     76    0.9862
            Skewness |      19.39     15    0.1968
            Kurtosis |       3.72      1    0.0538
---------------------+-----------------------------
               Total |      74.54     92    0.9080
---------------------------------------------------

. ** take the yhat and resids to form a White's test: est var=a0+a1 yhat +a2 yhat^2 ;
. gen yhatsq=yhat*yhat;
. gen residsq=resids*resids;
(923 missing values generated)
. regress residsq yhat yhatsq;

      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------           F(  2,   191) =    0.42
       Model |   1.3143371     2  .657168559           Prob > F      =  0.6555
    Residual |  296.511071   191  1.55241398           R-squared     =  0.0044
-------------+------------------------------           Adj R-squared = -0.0060
       Total |  297.825408   193  1.54313683           Root MSE      =   1.246

------------------------------------------------------------------------------
     residsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        yhat |   .0489963   4.985543     0.01   0.992    -9.784797     9.88279
      yhatsq |  -.0206454   .4091188    -0.05   0.960    -.8276167    .7863258
       _cons |   .9180861   15.14971     0.06   0.952    -28.96413    30.80031
------------------------------------------------------------------------------

. gen lm_white=e(N)*e(r2);
. sum lm_white;

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    lm_white |    1117    .8561439           0   .8561439   .8561439
**est var regressed on yhat and yhatsq, chi-square with 2 degrees of freedom, no heteroskedasticity**

In SAS, use the SPEC option to test for heteroskedasticity:

proc reg; model y=x1 x2 x3/spec; run;

or, for another variant of White's test for heteroskedasticity, use proc model:

proc model; parms b0-b3; y=b0+b1*x1+b2*x2+b3*x3; fit y/white; run;

Both of these produce a White-like test using the Xs and cross products of the Xs; again the null is no heteroskedasticity.

[[[[[[ 3x5 Quiz: Testing for heteroskedasticity using the professors' salaries as the example: Below is the regression of the squared residuals on experience and the square of experience (the original regression, from which the residuals were calculated, was the professors' salaries regressed on experience):

RESID_SQUARE = 446043 + 35440 experien - 9364 exper_sq

Predictor       Coef      StDev       T       P
Constant      446043    4419907    0.06   0.956
experien       35440    7404898    0.44   0.674
exper_sq       -9364      60656   -0.36   0.730

S = 3604350   R-Sq = .041   R-Sq(adj) = 0.0%

a. What type of test for heteroskedasticity is being employed here?
b. Is there evidence of heteroskedasticity? Explain. ]]]]]]
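The $nR^2$ versions of the Breusch-Pagan test, White's full test and the simple White test in Section III.B differ only in which regressors enter the auxiliary regression for $\hat\varepsilon_i^2$. Below is a minimal Python/NumPy sketch that builds all three. It assumes simulated data with three slope regressors, and the helper names `n_r_squared` and `white_terms` are hypothetical. It reports each statistic with its chi-square degrees of freedom but not p-values. With dummy regressors, squared dummies duplicate the levels and should be dropped before counting degrees of freedom.

```python
import numpy as np
from itertools import combinations_with_replacement


def n_r_squared(u2, Z):
    """n*R^2 from regressing squared residuals u2 on Z plus an intercept.

    Returns (statistic, chi-square degrees of freedom = number of columns of Z).
    Assumes the columns of Z are not collinear (e.g. no squared dummies).
    """
    n = len(u2)
    A = np.column_stack([np.ones(n), Z])
    coef, *_ = np.linalg.lstsq(A, u2, rcond=None)
    resid = u2 - A @ coef
    centered = u2 - u2.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return n * r2, Z.shape[1]


def white_terms(X):
    """Levels, squares and cross products of the slope regressors (no constant)."""
    k = X.shape[1]
    cols = [X[:, j] for j in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(k), 2)]
    return np.column_stack(cols)


# Simulated data: three regressors, error s.d. proportional to the first one.
rng = np.random.default_rng(2)
n = 500
X = rng.uniform(1, 4, size=(n, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=X[:, 0], size=n)

A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
yhat = A @ beta
u2 = (y - yhat) ** 2                                          # squared OLS residuals

bp = n_r_squared(u2, X)                                       # Breusch-Pagan, 3 df
white = n_r_squared(u2, white_terms(X))                       # full White test, 9 df
simple = n_r_squared(u2, np.column_stack([yhat, yhat ** 2]))  # simple White, 2 df
for name, (stat, df) in [("Breusch-Pagan", bp), ("White", white), ("simple White", simple)]:
    print(f"{name:14s} nR^2 = {stat:7.2f}  (chi-square, {df} df)")
```

The Breusch-Pagan and simple-White regressors are linear combinations of the White regressors. As a result, the full White $nR^2$ can never be smaller than either of them in the same sample; the price is the extra degrees of freedom. The 5 percent chi-square critical values for 2, 3 and 9 degrees of freedom are about 5.99, 7.81 and 16.92.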