Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

I. Definition and Problems

We now relax another classical assumption. This is a problem that arises often with cross sections of individuals, households or firms. It can be a problem with time series data, too.

Homoskedasticity exists when the variance of the disturbances is constant:

$$\mathrm{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$$

This is the assumption of equal (homo) spread (skedasticity) in the distribution of the disturbances for all observations. The variance is a constant. It is independent of anything else, including the values of the independent variables.

Heteroskedasticity exists when the variance of the disturbances is variable:

$$\mathrm{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma_i^2$$

The variance of the disturbances can take on a different value for each observation in the sample. This is the most general specification. More often, $\sigma_i^2$ is related to one or more of the independent variables. Heteroskedasticity violates one of the basic classical assumptions.

Example: Suppose we estimate a cross-sectional savings function. The variance of the disturbances may increase with disposable income (DI), due to increased 'discretionary income'. At low incomes, more of income has to be devoted to basic necessities, which leaves little room for saving to vary across households. The distributions flatten out as DI rises. Take 3 distinct income classes (low, medium and high):

[Two graphs: distributions of the disturbances for the low, medium and high income classes.]

Other examples: the variance might be related to the size of some economic aggregate (e.g., corporation, metropolitan area, state or region).
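The sketch below makes the definition concrete by drawing disturbances for three hypothetical income classes. It assumes, purely for illustration, that the standard deviation of $\varepsilon_i$ is proportional to disposable income, $\sigma_i = 0.1 \cdot DI_i$. The income levels, the proportionality constant and the function name are illustrative assumptions and do not come from these notes.

```python
import random
import statistics


def disturbance_spread_by_class(n_per_class=500, seed=0):
    """Sample standard deviation of simulated disturbances in each income class.

    Hypothetical assumption: sigma_i = 0.1 * DI_i, so the variance of the
    disturbances rises with disposable income (heteroskedasticity).
    """
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    income_levels = {"low": 10.0, "medium": 50.0, "high": 100.0}  # illustrative DI values
    spread = {}
    for label, di in income_levels.items():
        sigma_i = 0.1 * di  # standard deviation grows with income
        draws = [rng.gauss(0.0, sigma_i) for _ in range(n_per_class)]
        spread[label] = statistics.stdev(draws)
    return spread


if __name__ == "__main__":
    for label, sd in disturbance_spread_by_class().items():
        print(f"{label:>6}: sample sd of disturbances = {sd:.2f}")
```

The three sample standard deviations should come out close to 1, 5 and 10, so the distribution "flattens out" as income rises. Under homoskedasticity they would all be roughly equal.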

What happens if we use OLS in a regression with known heteroskedasticity?

(i) The estimated coefficients are still unbiased. Homoskedasticity is not a necessary condition for unbiasedness. This is the same result as with multicollinearity.

(ii) But the OLS estimators are inefficient. This means that they're no longer BLUE, as in the case of serial correlation. This is different from multicollinearity.

Return to the 2-variable regression:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

With homoskedasticity:

$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum x_i^2}$$

With heteroskedasticity:

$$\mathrm{Var}(\hat\beta_1) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$$

where $x_i = X_i - \bar{X}$. Of course, if $\sigma_i^2 = \sigma^2$, we can simplify:

$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2 \sum x_i^2}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2}{\sum x_i^2}$$

The earlier formula for calculating the standard error depends on the assumption of homoskedasticity. It is a special case of the more general formula. The OLS estimators will be inefficient because they're no longer minimum variance. The formulae for the OLS estimators of the coefficients themselves are still the same under heteroskedasticity.

Intuition: We want the 'best fit' possible for our regression line. Our sample regression function should lie as close as possible to the population regression function. OLS 'equally weights' each observation: it assumes each observation contributes the same amount of 'information' to this estimation. With heterogeneity in the variances, that is no longer appropriate, and observations should not be equally weighted. Those associated with a tighter distribution of ε contribute more information. Those associated with a wider distribution of ε contribute less information about the basis of this economic behaviour. A priori, observations from a wider distribution have more potential error.

By disregarding heteroskedasticity and using the OLS formula, we would produce biased estimates of the standard errors. In general, we won't know the direction of the bias. As a result, statistical inference (t-tests, F-tests, confidence intervals) would be inappropriate.

Weighted Least Squares (WLS)

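WLS essentially takes advantage of this heterogeneity in its estimators. Assume $\sigma_i$ is known. Transform the data by dividing by $\sigma_i$:

$$\frac{Y_i}{\sigma_i} = \beta_0 \left(\frac{1}{\sigma_i}\right) + \beta_1 \left(\frac{X_i}{\sigma_i}\right) + \left(\frac{\varepsilon_i}{\sigma_i}\right)$$

or use * to denote the transformed variables:

$$Y_i^* = \beta_0 W_i + \beta_1 X_i^* + \varepsilon_i^*$$

where $W_i = 1/\sigma_i$ is the 'weight' given to the ith observation. This is just the inverse of the standard deviation. There is no longer a constant term in the regression. OLS estimation of the transformed model is WLS on the original model (denote the estimators by $\hat\beta_0^*$ and $\hat\beta_1^*$).

But why did we do this? As a result:

$$\mathrm{Var}(\varepsilon_i^*) = E\left(\varepsilon_i^{*2}\right) = E\left[\left(\frac{\varepsilon_i}{\sigma_i}\right)^2\right] = \frac{\sigma_i^2}{\sigma_i^2} = 1$$

This means that the OLS estimators of the transformed data are BLUE, because the disturbances are now homoskedastic. Their variance is not only constant but also equal to 1. In this case $\hat\beta_0^*$ and $\hat\beta_1^*$ will be BLUE, since the transformed model meets all the classical assumptions, including homoskedasticity. Recall that the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased, but not efficient.

Another way to motivate the distinction between OLS and WLS is to look at the 'objective functions' of the estimation. Under OLS we minimise the residual sum of squares:

$$\sum e_i^2 = \sum \left(Y_i - \hat\beta_0 - \hat\beta_1 X_i\right)^2$$

But under WLS we minimise a weighted residual sum of squares:

$$\sum \frac{e_i^2}{\sigma_i^2} = \sum \left(\frac{Y_i - \hat\beta_0^* - \hat\beta_1^* X_i}{\sigma_i}\right)^2$$

Each squared residual is weighted by $W_i^2 = 1/\sigma_i^2$, so observations from wider distributions count for less.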
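The transformation above can be sketched in plain Python for the 2-variable model, assuming $\sigma_i$ is known. The sketch divides $Y_i$, the constant and $X_i$ by $\sigma_i$, then runs OLS through the origin on the two transformed regressors $W_i$ and $X_i^*$. The function names are hypothetical, and the numbers in the demonstration are made up for illustration.

```python
def wls_known_sigma(x, y, sigma):
    """WLS for Y_i = b0 + b1*X_i + e_i with known sigma_i (hypothetical helper).

    Implemented as OLS through the origin on the transformed model
    Y*_i = b0*W_i + b1*X*_i + e*_i, where W_i = 1/sigma_i.
    """
    w = [1.0 / s for s in sigma]                  # W_i: the transformed constant
    xs = [xi / s for xi, s in zip(x, sigma)]      # X*_i = X_i / sigma_i
    ys = [yi / s for yi, s in zip(y, sigma)]      # Y*_i = Y_i / sigma_i

    # Normal equations for regression through the origin on (W, X*):
    # [sww swx] [b0]   [swy]
    # [swx sxx] [b1] = [sxy]
    sww = sum(a * a for a in w)
    swx = sum(a * b for a, b in zip(w, xs))
    sxx = sum(b * b for b in xs)
    swy = sum(a * c for a, c in zip(w, ys))
    sxy = sum(b * c for b, c in zip(xs, ys))
    det = sww * sxx - swx * swx
    b0 = (sxx * swy - swx * sxy) / det
    b1 = (sww * sxy - swx * swy) / det
    return b0, b1


def weighted_ssr(x, y, sigma, b0, b1):
    """WLS objective function: sum of (e_i / sigma_i)^2."""
    return sum(((yi - b0 - b1 * xi) / s) ** 2 for xi, yi, s in zip(x, y, sigma))


if __name__ == "__main__":
    # Made-up illustrative data; sigma_i is assumed known
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.0, 2.5, 2.0, 5.5, 4.0]
    sigma = [0.5, 1.0, 1.5, 2.0, 2.5]
    ols = wls_known_sigma(x, y, [1.0] * len(x))  # equal sigma_i: WLS reduces to OLS
    wls = wls_known_sigma(x, y, sigma)
    print("OLS:", ols, "weighted SSR:", weighted_ssr(x, y, sigma, *ols))
    print("WLS:", wls, "weighted SSR:", weighted_ssr(x, y, sigma, *wls))
```

Passing equal $\sigma_i$ reproduces the OLS estimates. Because WLS minimises the weighted objective, the WLS estimates never give a larger weighted residual sum of squares than the OLS estimates do.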

1. The formulae for the WLS estimates of $\beta_0$ and $\beta_1$ aren't worth committing to memory, so don't write them down. The key is that they look similar to the formulae under homoskedasticity, except for the weighting factor. This is true of both 2- and k-variable models. With software packages, you don't have to know them. Just transform the data and run the regression through the origin.

2. If $\sigma_i = \sigma$ for all observations, then the WLS estimators are the OLS estimators. OLS is a special case of this more general procedure.

II. Detection

How do we know when our disturbances are heteroskedastic? The key is that we never observe the true disturbances or the distributions from which they are drawn. In other words, we never observe $\sigma_i^2$ (at least not unless we see the entire population). For example, take our original example of the savings function. If we had the entire census of 4 million Singaporeans, we'd be able to calculate it. We'd know how the dispersion in the disturbances varies with disposable income. But in samples we have to make an educated guess. We consider 4 diagnostic tests or indicators.

1. A Priori Information

Heteroskedasticity might be 'anticipated' (e.g., based on past empirical work). Check the relevant literature in this area, which might show clear and persistent evidence of heteroskedasticity. For example, check both domestic and overseas studies of savings regressions. Is it a commonly reported problem in this empirical work? The key is that you see it coming. The remaining tests are post-mortems.

2. Graphical Methods

Key: We'd like information on the disturbances $\varepsilon_i$, but all we ever see are the residuals $e_i$. We want to know whether or not the squared residuals $e_i^2$ exhibit any 'systematic pattern'. With homoskedasticity we'll see something like this:

[Graph: squared residuals $e_i^2$ plotted against an explanatory variable under homoskedasticity.]

There is no relationship between $e_i^2$ and the explanatory variable. Even if we get this pattern, we can't rule out the possibility of heteroskedasticity. We may have to plot the squared residuals against other explanatory variables. The same could be done for the squared residuals against the fitted values $\hat{Y}_i$. With heteroskedasticity, in contrast, we'll see patterns like these:

[Graphs: squared residuals showing systematic patterns against an explanatory variable.]

3. Park Test

The Park test is just a formalisation of plotting the squared residuals against another variable (often one of the explanatory variables). Use a two-step procedure:

1. Run OLS on your regression. Retain the squared residuals $e_i^2$. Assume that:

$$\sigma_i^2 = \sigma^2 Z_i^{\beta}$$

This implies a 'log-log' linear relationship between the squared residuals and Z.

2. Estimate the following:

$$\ln e_i^2 = \ln \sigma^2 + \beta \ln Z_i + u_i = \alpha + \beta \ln Z_i + u_i$$

Test $H_0\colon \beta = 0$. If the null is rejected, this suggests that heteroskedasticity is present. You need to choose which variables might be related to the squared residuals (often an independent variable is used). If $\beta > 0$, there is an upward-sloping curved relationship. If $\beta < 0$, there is a downward-sloping curved relationship.

One problem is that rejection of $H_0$ is a sufficient, but not a necessary, condition for heteroskedasticity: failing to reject $H_0$ does not rule it out. Another problem is that this test imposes an assumed relationship between a particular variable and the squared residuals.

4. White Test

This gets around the main problem of the Park test: we don't have to specify in advance which variable the variance of the disturbances is related to, or in what functional form. Use a three-step procedure:

1. Run OLS on your regression (assume $X_2$ and $X_3$ are the two independent variables). Retain the squared residuals $e_i^2$.

2. Estimate the following auxiliary regression:

$$e_i^2 = \alpha_0 + \alpha_1 X_{2i} + \alpha_2 X_{3i} + \alpha_3 X_{2i}^2 + \alpha_4 X_{3i}^2 + \alpha_5 X_{2i} X_{3i} + u_i$$

3. Test the overall significance of the auxiliary regression using $nR^2$, where n is the sample size and $R^2$ is the coefficient of determination of the auxiliary regression. Under the null of homoskedasticity, $nR^2$ follows the chi-square distribution. Its degrees of freedom equal the number of slope coefficients in the auxiliary regression (5 here).

III. Remedial Measures

What do you do when heteroskedasticity is suspected?

(i) When $\sigma_i$ is known, transform the data by dividing both dependent and independent variables by $\sigma_i$ and run OLS. This is the weighted least squares procedure. It is not a very interesting situation, because this information is rarely available.

(ii) When $\sigma_i$ is unknown, determine the likely form of the heteroskedasticity. Transform the data accordingly, and run weighted least squares.

Two examples: Suppose we have the following regression for a cross section of cities:

$$CR_i = \beta_0 + \beta_1 EXP_i + \beta_2 P_i + \varepsilon_i$$

where: CR = per capita crime rate; EXP = per capita expenditures on police; P = population.

The first slope coefficient picks up the effectiveness of police expenditures at the margin (expected negative). The second says that crime might increase with the size of the metropolitan population (expected positive).

(a) Suppose we suspect that:

$$\mathrm{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2 P_i^2$$

where $\sigma^2$ is a constant. Transform the data and estimate the following:

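$$\frac{CR_i}{P_i} = \beta_0 \left(\frac{1}{P_i}\right) + \beta_1 \left(\frac{EXP_i}{P_i}\right) + \beta_2 + u_i, \qquad u_i = \frac{\varepsilon_i}{P_i}$$

The disturbances are now homoskedastic. (The proof is left to you as an exercise.) This doesn't change the interpretation of the coefficients, since we are dividing both sides by the same variable. Note, though, that $\beta_2$ now plays the role of the constant term, while $\beta_0$ is the coefficient on $1/P_i$.

(b) Suppose we now suspect that:

$$\mathrm{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2 \widehat{CR}_i^2$$

where $\widehat{CR}_i$ is the fitted value. Transform the data and estimate the following:

$$\frac{CR_i}{\widehat{CR}_i} = \beta_0 \left(\frac{1}{\widehat{CR}_i}\right) + \beta_1 \left(\frac{EXP_i}{\widehat{CR}_i}\right) + \beta_2 \left(\frac{P_i}{\widehat{CR}_i}\right) + u_i$$

The disturbances are now homoskedastic. Operationally, this second example requires two steps:

(1) Run OLS on the original equation with the untransformed data and compute the fitted values $\widehat{CR}_i$. (Recall that the estimated coefficients are still unbiased, although they are inefficient.)

(2) Transform the data by dividing the dependent and independent variables by these fitted values, and estimate as above.

(iii) Use heteroskedasticity-corrected (HC, or White) standard errors. Heteroskedasticity does not bias the OLS estimates, but it does affect their standard errors. The HC technique directly adjusts the standard errors of the OLS estimates to take account of heteroskedasticity. In the 2-variable model, this amounts to replacing the unknown $\sigma_i^2$ in the general variance formula with the squared OLS residuals:

$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum x_i^2 e_i^2}{\left(\sum x_i^2\right)^2}$$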
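For the 2-variable model, the HC adjustment can be sketched by computing two variances from the same OLS residuals. One is the conventional variance $\hat\sigma^2/\sum x_i^2$; the other is the White (HC0) version $\sum x_i^2 e_i^2/(\sum x_i^2)^2$. This is a minimal plain-Python sketch, and the function name is hypothetical. Regression software typically offers this estimator along with small-sample variants (often labelled HC1 to HC3), so check which one a package reports.

```python
import math


def ols_with_white_se(x, y):
    """2-variable OLS with conventional and White (HC0) standard errors for the slope.

    Hypothetical helper for illustration; returns (b0, b1, se_conventional, se_white).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]                     # x_i = X_i - X_bar
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # OLS residuals

    # Conventional formula: assumes a common sigma^2, estimated by SSR/(n - 2)
    s2 = sum(ei * ei for ei in e) / (n - 2)
    se_conv = math.sqrt(s2 / sxx)

    # White HC0: replace sigma_i^2 in sum(x_i^2 sigma_i^2)/(sum x_i^2)^2 with e_i^2
    se_white = math.sqrt(sum(d * d * ei * ei for d, ei in zip(dx, e)) / sxx ** 2)
    return b0, b1, se_conv, se_white


if __name__ == "__main__":
    # Tiny made-up dataset for illustration
    x = [1.0, 2.0, 3.0, 4.0]
    y = [1.0, 3.0, 3.0, 5.0]
    b0, b1, se_conv, se_white = ols_with_white_se(x, y)
    print(f"b0={b0:.3f} b1={b1:.3f} conventional SE={se_conv:.4f} White SE={se_white:.4f}")
```

The coefficient estimates are identical under both approaches; only the standard error changes. In this tiny made-up dataset the White standard error (0.12) is smaller than the conventional one (about 0.28). That is consistent with the point that, in general, we won't know the direction of the bias in the conventional formula.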

IV. Questions for Discussion: Q.3

V. Computing Exercise: Johnson, Ch, -5
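As a possible starting point for computing work, the Park and White tests from Section II can be sketched in plain Python. This is a minimal sketch, not a reference implementation, and it is not based on the Johnson exercise data. It assumes the squared OLS residuals from step 1 are already available; the demonstration computes them from simulated data with a hypothetical data-generating process. The helper names `ols_fit`, `park_test` and `white_test` are hypothetical. Instead of computing a p-value, the sketch compares $nR^2$ with the 5% critical value of the chi-square distribution with 5 degrees of freedom (about 11.07).

```python
import math
import random

CHI2_5DF_CRIT_5PCT = 11.07  # 5% critical value of chi-square with 5 df


def _solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (m[r][k] - sum(m[r][c] * beta[c] for c in range(r + 1, k))) / m[r][r]
    return beta


def ols_fit(y, columns):
    """OLS of y on a constant plus the regressor columns; returns (coefs, R^2, SSR)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coefs = _solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return coefs, 1.0 - ssr / tss, ssr


def park_test(e2, z):
    """Park test step 2: regress ln(e_i^2) on ln(Z_i); returns (beta_hat, se_beta).

    The t-statistic for H0: beta = 0 is beta_hat / se_beta.
    """
    ln_e2 = [math.log(v) for v in e2]
    ln_z = [math.log(v) for v in z]
    coefs, _, ssr = ols_fit(ln_e2, [ln_z])
    n = len(ln_z)
    z_bar = sum(ln_z) / n
    se_beta = math.sqrt(ssr / (n - 2) / sum((v - z_bar) ** 2 for v in ln_z))
    return coefs[1], se_beta


def white_test(e2, x2, x3):
    """White test steps 2-3: auxiliary regression of e_i^2; returns (n * R^2, df)."""
    aux = [x2, x3,
           [a * a for a in x2],
           [b * b for b in x3],
           [a * b for a, b in zip(x2, x3)]]
    _, r2, _ = ols_fit(e2, aux)
    return len(e2) * r2, len(aux)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 200
    x2 = [rng.uniform(1.0, 10.0) for _ in range(n)]
    x3 = [rng.uniform(1.0, 10.0) for _ in range(n)]
    # Hypothetical DGP: sd of the disturbance proportional to X2 (heteroskedastic)
    y = [1.0 + 0.5 * a + 0.3 * b + rng.gauss(0.0, a) for a, b in zip(x2, x3)]
    coefs, _, _ = ols_fit(y, [x2, x3])                  # step 1: OLS
    e2 = [(yi - coefs[0] - coefs[1] * a - coefs[2] * b) ** 2
          for yi, a, b in zip(y, x2, x3)]               # retain squared residuals
    beta_hat, se_beta = park_test(e2, x2)
    stat, df = white_test(e2, x2, x3)
    print(f"Park: beta_hat={beta_hat:.2f}, t={beta_hat / se_beta:.2f}")
    print(f"White: nR^2={stat:.2f} on {df} df (5% critical value {CHI2_5DF_CRIT_5PCT})")
```

Both tests are ordinary OLS regressions run on the squared residuals. The Park test assumes a particular variable Z and a log-log form. The White test instead uses the regressors, their squares and their cross product.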