Homer Russell

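$\hat\beta = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + A\varepsilon$, where $A = (X'X)^{-1}X'$.

$E[\hat\beta] = \beta + A\,E[\varepsilon] = \beta$, so $\hat\beta$ is unbiased.

$\mathrm{Cov}[\hat\beta] = E[(\hat\beta - \beta)(\hat\beta - \beta)'] = E[A\varepsilon\varepsilon'A'] = A\,E[\varepsilon\varepsilon']\,A' = A\sigma^2 I A' = \sigma^2 AA' = \sigma^2 (X'X)^{-1}$.

Fitted values are given by $\hat Y = X\hat\beta = X(X'X)^{-1}X'Y = HY$. $H = X(X'X)^{-1}X'$ is called the hat matrix (it puts the hats on the Y's) and is of order $n \times n$.

The error sum of squares, SSE, is

$$\min S = Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta = Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X(X'X)^{-1}X'Y = Y'Y - \hat\beta'X'Y.$$

The estimate of $\sigma^2$ is based on this.

Normal model

Add the assumptions that the error variables are i.i.d. with $\varepsilon_i \sim N(0, \sigma^2)$. Then $Y$ has a MVN (multivariate normal) distribution with mean $X\beta$ and covariance matrix $\sigma^2 I$. It follows (using MVN theory) that $\hat\beta = (X'X)^{-1}X'Y$ is also MVN, with mean $\beta$ and covariance matrix $\sigma^2(X'X)^{-1}$.

The maximum likelihood method produces the same estimates of the $\beta$'s as least squares. Its estimate of $\sigma^2$ is again a multiple of SSE, as before. In fact we use

$$\hat\sigma^2 = \frac{1}{n-k-1}\left(Y'Y - \hat\beta'X'Y\right).$$

In addition we can show that

$$\frac{\hat\beta_i - \beta_i}{\mathrm{ese}(\hat\beta_i)} \sim t_{n-k-1}, \qquad \text{where } \mathrm{ese}(\hat\beta_i) = \sqrt{c_{ii}}\,\hat\sigma$$

and $c_{ii}$ is the $i$th diagonal element of $(X'X)^{-1}$.

Residuals

$SSE = \hat\varepsilon'\hat\varepsilon = (Y - X\hat\beta)'(Y - X\hat\beta)$. Note that $\hat\varepsilon = Y - \hat Y = Y - HY = (I - H)Y$. Since $I - H$ is symmetric and idempotent, it follows that $\mathrm{Cov}[\hat\varepsilon] = \sigma^2(I - H)$ and $V[\hat\varepsilon_i] = \sigma^2(1 - h_{ii})$. Here $h_{ii}$ is the $i$th diagonal element of $H = X(X'X)^{-1}X'$.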
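The matrix results above can be checked numerically. The sketch below uses Python/NumPy rather than the R used elsewhere in these notes, and simulated data with n = 7 and k = 2 (hypothetical, not the Illustration's data). It solves the normal equations and builds H. It then confirms that $HY = X\hat\beta$ and that SSE equals $Y'Y - \hat\beta'X'Y$.

```python
import numpy as np

# Hypothetical data: n = 7 observations, k = 2 predictors (not the notes' data set)
rng = np.random.default_rng(0)
n, k = 7, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 50, n)
y = 0.5 + 0.9 * x1 + 0.02 * x2 + rng.normal(0, 0.3, n)

# Design matrix X: a column of ones for beta_0, then the predictors
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimate: solve the normal equations (X'X) beta = X'Y
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X' and fitted values Y_hat = H Y
XtX_inv = np.linalg.inv(XtX)
H = X @ XtX_inv @ X.T
y_fit = H @ y

# SSE two ways: residual sum of squares, and Y'Y - beta_hat' X'Y
resid = y - y_fit
sse = resid @ resid
sse_short = y @ y - beta_hat @ (X.T @ y)

# Estimate of sigma^2 on n - k - 1 degrees of freedom
sigma2_hat = sse / (n - k - 1)

# Estimated Cov(beta_hat) = sigma2_hat (X'X)^{-1}; ese = square roots of its diagonal
cov_beta = sigma2_hat * XtX_inv
ese = np.sqrt(np.diag(cov_beta))
print(beta_hat, ese, sigma2_hat)
```

H is symmetric and idempotent, and its trace equals the number of parameters, k + 1. The diagonal elements $h_{ii}$ therefore average $(k+1)/n$.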

The residuals have different variances. $h_{ii}$ is a measure of the leverage/influence of the $i$th observation. Its value depends explicitly on the predictors for that observation, and $V[\hat\varepsilon_i] \to 0$ as $h_{ii} \to 1$.

We can define the standardised residual as

$$\frac{\hat\varepsilon_i}{\mathrm{ese}(\hat\varepsilon_i)} = \frac{\hat\varepsilon_i}{\hat\sigma\sqrt{1 - h_{ii}}}.$$

The standardised residuals do have the same variance (= 1, approximately, since $\hat\sigma$ replaces $\sigma$). Checking them is a recommended part of model diagnostics: an observation with a large standardised residual has an unusual response. (Related idea: Cook's distances, for identifying influential observations.)

Illustration of multiple linear regression (with 2 explanatory variables)

Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, or $E[Y \mid x] = \beta_0 + \beta_1 x_1 + \beta_2 x_2$; $\varepsilon_i \sim N(0, \sigma^2)$.

Data: $n = 7$ observations of $(y, x_1, x_2)$.

R: the data are in a data frame called illus, with variables y, x1 and x2. pairs(illus) gives us a matrix plot of scatterplots for pairs of variables (see over).

In matrix form we compute $Y$, $X$, $X'X$, $(X'X)^{-1}$, $X'Y$ and hence $\hat\beta = (X'X)^{-1}X'Y$.
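Standardised residuals can be computed directly from the leverages. This Python/NumPy sketch uses simulated n = 7, k = 2 data, which is hypothetical. In R the same quantities come from `hatvalues(mod)` and `rstandard(mod)`; the sketch reproduces the formula $\hat\varepsilon_i/(\hat\sigma\sqrt{1 - h_{ii}})$ by hand.

```python
import numpy as np

# Hypothetical data (not the notes' data): n = 7, k = 2 predictors
rng = np.random.default_rng(1)
n, k = 7, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 50, n)])
y = X @ np.array([0.5, 0.9, 0.02]) + rng.normal(0, 0.3, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                      # leverages h_ii
resid = y - H @ y                   # residuals (I - H) Y
sigma_hat = np.sqrt(resid @ resid / (n - k - 1))

# ese(resid_i) = sigma_hat * sqrt(1 - h_ii); standardised residual = resid_i / ese
ese_resid = sigma_hat * np.sqrt(1 - h)
sres = resid / ese_resid

for i in range(n):
    print(f"{i + 1}: h = {h[i]:.3f}, resid = {resid[i]:.3f}, sres = {sres[i]:.3f}")
```

Observations with $h_{ii}$ near 1 have residuals forced towards 0. For those points the standardised residual is more informative than the raw one.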

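[Figure: pairs(illus), a matrix of scatterplots of y, x1 and x2]

>> Fitted model
$\hat y = \ldots + \ldots x_1 + \ldots x_2$

>> Estimate of error variance $\sigma^2$
$\hat\sigma^2 = \frac{1}{n-k-1}\left(Y'Y - \hat\beta'X'Y\right) = \frac{1}{4}(\ldots) = \ldots$

>> Standard errors/covariances/distributions of estimators
Estimate of $\mathrm{Cov}(\hat\beta) = \hat\sigma^2 (X'X)^{-1} = \ldots$
$\mathrm{ese}(\hat\beta_0) = \ldots$, $\mathrm{ese}(\hat\beta_1) = 0.253$, $\mathrm{ese}(\hat\beta_2) = 0.021$
So: $\hat\beta_0 \sim N(\beta_0, 0.5685)$, $\hat\beta_1 \sim N(\beta_1, \ldots)$, $\hat\beta_2 \sim N(\beta_2, \ldots)$

>> Testing the β's
Testing $H_0: \beta_0 = 0$ we have $t = \hat\beta_0/\mathrm{ese}(\hat\beta_0) = \ldots$ on 4 d.f. Similarly, for $\beta_1$: $t = 3.55$, and for $\beta_2$: $t = 0.8$.
Now $P(|t_4| > 2.776) = 0.05$, so $\hat\beta_1$ is "significantly different from 0". We conclude that $\beta_1 \neq 0$, which suggests that $x_1$ may be a useful predictor of the response.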

>> 95% CIs for parameters
$\beta_0$: $\hat\beta_0 \pm 2.776\,\mathrm{ese}(\hat\beta_0)$, i.e. $\hat\beta_0 \pm 2.086$, i.e. $(-2.30, 1.87)$
$\beta_1$: $\hat\beta_1 \pm 2.776\,\mathrm{ese}(\hat\beta_1)$, i.e. $(0.196, 1.60)$
$\beta_2$: $\hat\beta_2 \pm 2.776\,\mathrm{ese}(\hat\beta_2)$, i.e. $(-0.041, 0.076)$

The CI for $\sigma^2$ is based on the $\chi^2_4$ distribution of $4\hat\sigma^2/\sigma^2$. It is $\{4\hat\sigma^2/11.14,\ 4\hat\sigma^2/0.4844\}$, i.e. $(0.030, 0.695)$.

Using R
> attach(illus)
> summary(y)
> summary(x1)
> summary(x2)
(each gives Min., 1st Qu., Median, Mean, 3rd Qu., Max.)
> mod = lm(y ~ x1 + x2)
> summary(mod)
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept), x1 *, x2
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.290 on 4 degrees of freedom
Multiple R-Squared: 0.833
F-statistic: … on 2 and 4 DF
> summary.aov(mod)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x1 *, x2 and Residuals
> mod$fit
> mod$res
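The t ratios and intervals above follow a fixed recipe. This Python/NumPy sketch applies it to simulated n = 7, k = 2 data (hypothetical). It uses the quantiles quoted in the notes: $t_{4,0.025} = 2.776$, and 11.14 and 0.4844 for $\chi^2_4$. These constants are valid only for 4 residual degrees of freedom. In R, `confint(mod)` gives the coefficient intervals.

```python
import numpy as np

# Hypothetical data with n = 7 and k = 2, so n - k - 1 = 4 residual d.f.
rng = np.random.default_rng(2)
n, k = 7, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 50, n)])
y = X @ np.array([0.5, 0.9, 0.02]) + rng.normal(0, 0.3, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
df = n - k - 1
sigma2_hat = resid @ resid / df
ese = np.sqrt(sigma2_hat * np.diag(XtX_inv))

# t ratios for H0: beta_i = 0, each on 4 d.f.; compare |t| with 2.776
t_ratio = beta_hat / ese

# 95% CIs: beta_hat_i +/- t_{4, 0.025} * ese(beta_hat_i)
t_crit = 2.776
ci = np.column_stack([beta_hat - t_crit * ese, beta_hat + t_crit * ese])

# 95% CI for sigma^2 from 4 * sigma2_hat / sigma^2 ~ chi^2_4
# (11.14 = upper 2.5% point, 0.4844 = lower 2.5% point of chi^2_4)
ci_sigma2 = (df * sigma2_hat / 11.14, df * sigma2_hat / 0.4844)

print(t_ratio, ci, ci_sigma2)
```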

>> Fitted values v responses
> plot(y, mod$fit, pch=8)
[Figure: mod$fit against y]

>> Residuals v x1, v x2 and v fitted values
> plot(x1, mod$res, pch=8)
> plot(x2, mod$res, pch=8)
> plot(mod$fit, mod$res, pch=8)
[Figures: mod$res against x1, against x2 and against mod$fit]

>> Using the "hat" matrix
$H = X(X'X)^{-1}X' = \ldots$, and then the fitted values are $\hat Y = HY = \ldots$ (see fitted values above).

>> Standardised residuals
$\hat\sigma = \ldots$
$\hat\varepsilon_1 = \ldots$, $h_{11} = \ldots$, $\mathrm{ese}(\hat\varepsilon_1) = \hat\sigma(1 - h_{11})^{1/2} = \ldots$, sres $= \hat\varepsilon_1/\mathrm{ese}(\hat\varepsilon_1) = \ldots$
$\hat\varepsilon_7 = \ldots$, $h_{77} = \ldots$, $\mathrm{ese}(\hat\varepsilon_7) = \hat\sigma(1 - h_{77})^{1/2} = \ldots$, sres $= \hat\varepsilon_7/0.137 = \ldots$
Similarly for the others. The full table has columns Row, FITS1 (fitted value), RESI1 (residual), HI1 (leverage $h_{ii}$) and SRES (standardised residual).

Interpretation of fitted models: health warning

The interpretation of fitted models involving more than one explanatory variable has to be approached with care. When we examine the importance/effect of an explanatory variable, we are examining its importance/effect in the presence of any other variables already in the model. The coefficients/parameters, and the associated standard errors and P values, do not tell the whole story and can sometimes be misleading.

We need to weigh up all the evidence: all the plots and summary statistics, including the contributions to the total sum of squares and the overall coefficient of determination.

Returning to the data of the Illustration above, let us examine the fits of models involving only one explanatory variable, x1 or x2.

First, x1 on its own
> mod10 = lm(y ~ x1)
> summary(mod10)
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept), x1 **
Residual standard error: … on 5 degrees of freedom
F-statistic: 20.59 on 1 and 5 DF
> summary.aov(mod10)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x1 ** and Residuals

The total variation in the responses is $S_{yy} = 2.0171$. Variable x1 explains 1.6230 of this total (80.5%). The coefficient associated with it (0.99) is highly significant (significantly different from 0): it has a very small P value (0.006, which is < 1%).

The variation explained by x1 (1.6230) is the same as in the model mod above, which was the fit of y on x1 and x2 (in that order). Returning to the fit in mod, we see that, in the presence of x1, x2 explains only a further small part of the total variation (83.3% − 80.5%, about 2.8%). Together the two variables explain 83.3% of the total variation, not much more than the 80.5% explained by x1 alone. In the presence of x1, we gain little by including x2.

Now x2 on its own
> mod11 = lm(y ~ x2)
> summary(mod11)
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept), x2
Residual standard error: … on 5 degrees of freedom
F-statistic: 2.21 on 1 and 5 DF
> summary.aov(mod11)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x2 and Residuals

The total variation in the responses is $S_{yy} = 2.0171$. Variable x2 explains only 0.618 of this total (30.6%). The coefficient associated with it (0.051) is not significant (not significantly different from 0): it has a sizeable P value (0.197).

Back to fitting both variables: what happens if we fit x2 first, then x1?
> mod12 = lm(y ~ x2 + x1)
> summary(mod12)
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept), x2, x1 *
Residual standard error: 0.290 on 4 degrees of freedom
Multiple R-Squared: 0.833
F-statistic: … on 2 and 4 DF
> summary.aov(mod12)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x2, x1 * and Residuals

The variation explained by x2 in model 11, the fit of x2 alone (0.618), is the same as in model 12, the fit of y on x2 and x1 (in that order). Returning to model 12, we see that, in the presence of x2, x1 explains a further 83.3% − 30.6% (about 52.7%) of the total variation. Together the two variables explain 83.3% of the total variation, much more than the 30.6% explained by x2 alone. In the presence of x2, we gain a lot by including x1.

It is clearer and simpler if we fit first the variable which explains most of the variation in the responses, then the variable of next importance, and so on.

The following pages contain supplementary material on aspects of statistical modelling.
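The order effect described above can be checked with sequential (extra) sums of squares, which is what `summary.aov()` reports. This Python/NumPy sketch uses hypothetical correlated predictors. It shows that the split of the explained variation depends on the order of fitting, while the total SSR does not.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Hypothetical data with two correlated predictors (not the notes' data)
rng = np.random.default_rng(3)
n = 7
x1 = rng.uniform(0, 10, n)
x2 = 3 * x1 + rng.normal(0, 5, n)
y = 0.5 + 0.9 * x1 + 0.02 * x2 + rng.normal(0, 0.3, n)

one = np.ones(n)
s_yy = np.sum((y - y.mean()) ** 2)                 # total variation S_yy
sse_1 = sse(np.column_stack([one, x1]), y)
sse_2 = sse(np.column_stack([one, x2]), y)
sse_12 = sse(np.column_stack([one, x1, x2]), y)

# Sequential sums of squares for the two orders of fitting
ss_x1_first = s_yy - sse_1          # x1 entered first
ss_x2_given_x1 = sse_1 - sse_12     # further SS for x2 in the presence of x1
ss_x2_first = s_yy - sse_2          # x2 entered first
ss_x1_given_x2 = sse_2 - sse_12     # further SS for x1 in the presence of x2

r2_full = 1 - sse_12 / s_yy
print(ss_x1_first, ss_x2_given_x1, ss_x2_first, ss_x1_given_x2, r2_full)
```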

Extra material on statistical modelling

A further illustration: models with qualitative explanatory variables (factors)

Data: $n = 22$ pairs $(x_i, y_i)$, where y is the response. The data arise under two different sets of conditions (type = 0 or 1) and are presented below sorted by x within type (columns Row, y, x, type).

[Figure: y against x, distinguishing the two types]

We model the responses first ignoring the variable type.

R: the data are in a data frame called illus3, with variables y, x and type.
> attach(illus3)
> mod3 = lm(y ~ x)
> plot(x, y, pch=8)
> abline(mod3)
[Figure: y against x with the fitted line from mod3]

> summary(mod3)
Residuals: Min, 1Q, Median, 3Q, Max
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept) *** and x ***
Residual standard error: … on 20 degrees of freedom
F-statistic: 69.4 on 1 and 20 DF, p-value: 6.01e-08
> summary.aov(mod3)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x *** and Residuals
> mod3$fit
> mod3$res

>> Fitted values v responses
> plot(y, mod3$fit, pch=8)
>> Residuals v fitted values
> plot(mod3$fit, mod3$res, pch=8)
[Figures: mod3$fit against y; mod3$res against mod3$fit]

We now model the responses using a model which includes the qualitative variable type. It was declared as a factor when the data frame was set up:
> type = factor(c(rep(0,14), rep(1,8)))
> mod4 = lm(y ~ x + type)

> summary(mod4)
Residuals: Min, 1Q, Median, 3Q, Max
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept) ***, x *** and type ***
Residual standard error: … on 19 degrees of freedom
Multiple R-Squared: 0.925
F-statistic: … on 2 and 19 DF, p-value: 2.001e-11
> summary.aov(mod4)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x ***, type *** and Residuals

Interpreting the output: the fit is $\hat y = \ldots + \ldots x$ (+ … if type = 1). So, for example:
observation 1: x = …4, type = 1, $\hat y = \ldots$
observation 20: x = 9.1, type = …, $\hat y = \ldots$
> mod4$fit

> mod4$res

>> Fitted values v responses
> plot(y, mod4$fit, pch=8)
>> Residuals v x and v fitted values
> plot(x, mod4$res, pch=8)
> plot(mod4$fit, mod4$res, pch=8)
[Figures: mod4$fit against y; mod4$res against x and against mod4$fit]

The total variation in the responses is $S_{yy} = \ldots$. Variable x explains … of this total (77.6%). The coefficient associated with it (0.6090) is highly significant (significantly different from 0): it has a negligible P value.

In the presence of x, the variable type explains a further 92.5% − 77.6% (about 14.9%) of the total variation, and its coefficient is also highly significant. Together the two variables explain 92.5% of the total variation. In the presence of x, we gain much by including type.

* * * * *

Finally, we extend the previous model (mod4) by allowing for an interaction between the explanatory variables x and type. An interaction exists between two explanatory variables when the effect of one on a response variable is different at different values/levels of the other. For example, consider the effect of policyholder's age and gender on a response variable, claim rate. If the effect of age on claim rate is different for males and females, then there is an interaction between age and gender.

> mod5 = lm(y ~ x * type)
> summary(mod5)
Residuals: Min, 1Q, Median, 3Q, Max
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept) ***, x ***, type and x:type
Residual standard error: 0.68 on 18 degrees of freedom
Multiple R-Squared: 0.9256
F-statistic: 74.6 on 3 and 18 DF, p-value: 2.388e-10
> summary.aov(mod5)
Df, Sum Sq, Mean Sq, F value, Pr(>F) for x ***, type ***, x:type and Residuals

Comment: the interaction appears to have added nothing, since the coefficient of determination is effectively unchanged from the previous model. We also note that the extra parameter value is small and not significant. In this particular case an interaction term is not helpful: including it has simply confused the issue.
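The common-slope model (mod4) and the interaction model (mod5) differ only in their design matrices. This Python/NumPy sketch codes type as a 0/1 dummy, the role the factor plays in R, and uses simulated data with 14 + 8 observations (hypothetical). It shows that the interaction model reproduces two separate straight-line fits, one per type. It also shows that adding the interaction can never increase SSE.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, r @ r

# Hypothetical data: 14 observations with type = 0 and 8 with type = 1 (n = 22)
rng = np.random.default_rng(4)
type_ = np.array([0] * 14 + [1] * 8)
x = rng.uniform(0, 12, 22)
y = 1.0 + 0.6 * x + 2.0 * type_ + rng.normal(0, 0.6, 22)

one = np.ones(22)
# Common-slope model (like y ~ x + type): type shifts the intercept only
b_add, sse_add = ols(np.column_stack([one, x, type_]), y)
# Interaction model (like y ~ x * type): type shifts both intercept and slope
b_int, sse_int = ols(np.column_stack([one, x, type_, x * type_]), y)

s_yy = np.sum((y - y.mean()) ** 2)
print("R^2 common slope:     ", 1 - sse_add / s_yy)
print("R^2 with interaction: ", 1 - sse_int / s_yy)

# Separate straight-line fits for each type, for comparison with b_int
b0, _ = ols(np.column_stack([one[type_ == 0], x[type_ == 0]]), y[type_ == 0])
b1, _ = ols(np.column_stack([one[type_ == 1], x[type_ == 1]]), y[type_ == 1])
```

In the interaction fit, the type-0 line has intercept b_int[0] and slope b_int[1]. The type-1 line adds b_int[2] to the intercept and b_int[3] to the slope.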
Where an interaction term does improve the fit and its coefficient is significant, both variables and the interaction between them should be included in the model.

Comparing full and reduced models

We may want to compare a model with a restricted version of itself:
Full model with k predictors: $Y = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m + \beta_{m+1} x_{m+1} + \dots + \beta_k x_k + \varepsilon$
Reduced model with m predictors: $Y = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m + \varepsilon$
We can write the restriction as $H_0: \beta_{m+1} = \beta_{m+2} = \dots = \beta_k = 0$.
The full model includes an additional k − m predictors.
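The comparison can be sketched numerically with the extra-sum-of-squares statistic, $F = [(SSE_{\text{reduced}} - SSE_{\text{full}})/(k - m)]/[SSE_{\text{full}}/(n - k - 1)]$. This Python/NumPy sketch uses hypothetical data with n = 7, k = 2, m = 1. In R the same test is `anova(reduced, full)`. With a single restriction, F equals the square of the full model's t ratio for the dropped coefficient.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Hypothetical data (not the notes' data): n = 7, full model k = 2, reduced m = 1
rng = np.random.default_rng(5)
n, k, m = 7, 2, 1
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 50, n)
y = 0.5 + 0.9 * x1 + 0.02 * x2 + rng.normal(0, 0.3, n)
one = np.ones(n)

X_full = np.column_stack([one, x1, x2])
X_red = np.column_stack([one, x1])            # reduced model: H0 is beta_2 = 0
sse_full = sse(X_full, y)
sse_red = sse(X_red, y)

# Extra-sum-of-squares F statistic on (k - m, n - k - 1) d.f.
F = ((sse_red - sse_full) / (k - m)) / (sse_full / (n - k - 1))

# t ratio for beta_2 in the full model; with one restriction F = t^2
XtX_inv = np.linalg.inv(X_full.T @ X_full)
beta = XtX_inv @ X_full.T @ y
sigma2 = sse_full / (n - k - 1)
t2 = beta[2] / np.sqrt(sigma2 * XtX_inv[2, 2])
print(F, t2 ** 2)
```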

The test is based on the relative reduction in the residual/unexplained variation SSE that results from including the extra k − m predictors in the model (equivalently, the increase in the regression/explained variation SSR). Under $H_0$:

$$\frac{(SSE_{\text{reduced}} - SSE_{\text{full}})/(k - m)}{SSE_{\text{full}}/(n - k - 1)} \sim F_{k-m,\,n-k-1}.$$

If the F value is large, we reject $H_0$ and conclude that the extra predictors should be included in the model.

Ref: the Illustration above: n = 7, k = 2, m = 1.
Full model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; $SSE_{\text{full}} = \ldots$ on 4 df (2 predictors).
Reduced model A: $Y = \beta_0 + \beta_1 x_1 + \varepsilon$ (1 predictor), i.e. $H_0: \beta_2 = 0$. Fitting the regression of y on x1 alone gives $SSE_{\text{reduced}} = \ldots$ on 5 df. So $F = [(\ldots - \ldots)/1]/[\ldots/4] = 0.7$ on 1, 4 df. The P value is very high, so the restriction ($H_0$) can stand: the predictor x2 adds nothing of value to the model.
Reduced model B: $Y = \beta_0 + \beta_2 x_2 + \varepsilon$ (1 predictor), i.e. $H_0: \beta_1 = 0$. Fitting the regression of y on x2 alone gives $SSE_{\text{reduced}} = \ldots$ on 5 df. So $F = (\ldots - \ldots)/(\ldots/4) = 12.6$ on 1, 4 df. The P value is < 0.05, so we reject the restriction: we should include the predictor x1.

Notes:
In the full model, explained variation SSR = ….
Predictor x1 explains 1.6230, followed by x2 with a further ….
Specifying the predictors in reverse order, x2 explains 0.618, followed by x1 with a further ….
x1 has a t value of 3.55 with P value 0.024 (note $3.55^2 \approx 12.6$, the F value for reduced model B).
x2 has a t value of only 0.8 with P value ….
$R^2 = 0.833$. Fitting x1 only: $R^2 = 0.805$; fitting x2 only: $R^2 = 0.306$.

Partial correlation

The significance of an individual predictor can be found from the t ratio of the corresponding β parameter. Another approach is to examine the partial correlation of response and predictor. This is the correlation of response and predictor with dependence on all other predictors removed. Consider the model $Y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i$ with data $(y_i, x_i, z_i)$, $i = 1, \dots, n$.
We consider the direct effect of x on the response variable Y by finding the partial correlation of y and x with dependence on z removed. It is given the symbol $r_{yx\cdot z}$. First, remove the linear effect of z from y: regress y on z and call the residuals from the fit $e_{yz}$. Next, remove the linear effect of z from x: regress x on z and call the residuals from the fit $e_{xz}$. We can now investigate the relationship between y and x (with dependence on z removed) by regressing $e_{yz}$ on $e_{xz}$.
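The residual-on-residual construction can be sketched in Python/NumPy on simulated data (hypothetical; x and z correlated). The sketch computes $r_{yx\cdot z}$ as the correlation of $e_{yz}$ and $e_{xz}$. It checks the result against the standard closed form $(r_{yx} - r_{yz}r_{xz})/\sqrt{(1 - r_{yz}^2)(1 - r_{xz}^2)}$. It also confirms that the slope from regressing $e_{yz}$ on $e_{xz}$ equals the coefficient of x in the full model, a standard least-squares identity.

```python
import numpy as np

def residuals(target, z):
    """Residuals from the least-squares regression of target on z (with an intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ coef

# Hypothetical data (not from the notes): x and z are correlated, y depends on both
rng = np.random.default_rng(6)
n = 30
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.8 * z + rng.normal(size=n)

# Remove the linear effect of z from y and from x
e_yz = residuals(y, z)
e_xz = residuals(x, z)

# Partial correlation r_{yx.z} = ordinary correlation of the two residual series
r_partial = np.corrcoef(e_yz, e_xz)[0, 1]

# Closed form in terms of the ordinary correlations
r = np.corrcoef([y, x, z])
r_yx, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]
r_formula = (r_yx - r_yz * r_xz) / np.sqrt((1 - r_yz**2) * (1 - r_xz**2))

# Regressing e_yz on e_xz (both have mean zero) recovers the coefficient of x in the full model
slope = (e_xz @ e_yz) / (e_xz @ e_xz)
full_coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)
print(r_partial, r_formula, slope, full_coef[1])
```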

The ordinary correlation coefficient between $e_{yz}$ and $e_{xz}$ is the required partial correlation coefficient of y and x with dependence on z removed, i.e. $r_{yx\cdot z}$ = correlation coefficient between $e_{yz}$ and $e_{xz}$.

The partial correlation coefficient $r_{yx\cdot z}$ is related to the ordinary correlation coefficients as follows:

$$r_{yx\cdot z} = \frac{r_{yx} - r_{yz}\, r_{xz}}{\sqrt{(1 - r_{yz}^2)(1 - r_{xz}^2)}}$$

(Note: in this formula x and y are interchangeable, as of course they must be.)

Partial correlation coefficients are also used in stepwise regression methods. They are computed between the response and potential new predictors, with the effect of all existing predictors in the model removed.

Ref: the Illustration above again: n = 7, k = 2, m = 1.
Partial correlation of y and x1:
y on x2: $r_{y2} = \ldots$; $\hat y = \ldots + \ldots x_2$; residuals from the fit $e_{y2}$
x1 on x2: $r_{12} = \ldots$; $\hat x_1 = \ldots + \ldots x_2$; residuals from the fit $e_{12}$
Partial correlation coefficient of y and x1 = correlation coefficient of $e_{y2}$ and $e_{12}$ = …
Ordinary correlation coefficient of y and x1 = …
Partial correlation of y and x2:
y on x1: $r_{y1} = \ldots$; $\hat y = \ldots + \ldots x_1$; residuals from the fit $e_{y1}$
x2 on x1: $r_{12} = \ldots$; $\hat x_2 = \ldots + \ldots x_1$; residuals from the fit $e_{21}$
Partial correlation coefficient of y and x2 = correlation coefficient of $e_{y1}$ and $e_{21}$ = …
Ordinary correlation coefficient of y and x2 = …

Factors/dummy variables

The relationship between the response Y and predictors x may be different for different values of some category variable, for example:
income with x and gender
yield with x and investment type
sales with x and country

We can incorporate a category variable in a regression analysis in two ways. We can declare it as a factor, or we can explicitly use one or more indicator variables, which in this context are often referred to as dummy variables. In the further illustration above we incorporated the variable type by declaring it as a factor with two levels. We could alternatively have incorporated it by explicitly introducing a dummy numerical variable.

Regression of Y on x and type (= 0 or 1): $Y = \beta_0 + \beta_1 x + \beta_2\,\text{type} + \varepsilon$, so
$E[Y \mid \text{type} = 0] = \beta_0 + \beta_1 x$
$E[Y \mid \text{type} = 1] = \beta_0 + \beta_1 x + \beta_2$
$\beta_2$ is the additional intercept for type = 1.

Suppose the category variable is country, with 3 values (Scotland, England, Wales). An alternative to declaring country as a factor with 3 levels is to introduce 2 dummy variables:
dv1 (= 1 for Scotland, 0 otherwise)
dv2 (= 1 for England, 0 otherwise)

In this approach, to regress Y on x and country, we can use the model
$Y = \beta_0 + \beta_1 x + \beta_2\,dv1 + \beta_3\,dv2 + \varepsilon$
$E[Y \mid \text{Wales}] = \beta_0 + \beta_1 x$
$E[Y \mid \text{Scotland}] = (\beta_0 + \beta_2) + \beta_1 x$
$E[Y \mid \text{England}] = (\beta_0 + \beta_3) + \beta_1 x$
$\beta_2$ is the additional intercept for Scotland; $\beta_3$ is the additional intercept for England.

These examples are both common slope (with differing intercepts) models. There are also common intercept (with differing slopes) models.

Multicollinearity

Consider the model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$. Suppose the data points for our predictors $(x_1, x_2)$ lie on a straight line ($r = \pm 1$). Then the model is not of full rank and we cannot fit it (try it and see). If r is close to ±1, the estimation process becomes unstable (small changes in the errors have big effects) and the standard errors of estimation are high. We can predict the response for values close to the line which the data $(x_1, x_2)$ are close to, but not elsewhere. We cannot separate the effects of one predictor from the other, and so we cannot sensibly investigate the effect of $x_1$ or $x_2$ alone. This situation is described as one of multicollinearity. Of course, its seriousness is a matter of degree. Some predictors are related in subtle (not very obvious) ways, and this can be a real problem. Some statistical software packages provide warnings when high correlations among predictors are present.

Multiple correlation

Multiple correlation measures the strength of the relationship between the response Y and the set of predictors. There are two approaches, both of which we have met before:
(i) the correlation coefficient between the responses and the fitted responses;
(ii) the coefficient of determination, given by $R^2 = \dfrac{\sum (\hat y_i - \bar y)^2}{\sum (y_i - \bar y)^2} = \dfrac{SSR}{SST}$.
For a model with an intercept, $R^2$ in (ii) is the square of the correlation in (i).

In stepwise regression we introduce the predictors one by one.
Each additional predictor introduced into the model will add an amount (≥ 0) to the overall SSR, reducing SSE by a corresponding amount*. We generally choose as the next predictor to introduce the one which will increase $R^2$ the most.
* Note: the degrees of freedom will change.
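The selection rule just described can be sketched in Python/NumPy. The sketch adds, at each step, the candidate giving the largest $R^2$. It has no stopping rule, unlike a real stepwise procedure, which would use an F, partial-correlation or (in R's `step()`) AIC criterion. The data and the three candidates x1, x2, x3 are hypothetical. At each step it also checks that $R^2$ equals the squared correlation between y and the fitted values.

```python
import numpy as np

def fit_r2(X, y):
    """OLS fit of y on X (X includes a column of ones); returns fitted values and R^2 = SSR/SST."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_fit = X @ beta
    sst = np.sum((y - y.mean()) ** 2)
    ssr = np.sum((y_fit - y.mean()) ** 2)
    return y_fit, ssr / sst

# Hypothetical data with three candidate predictors (not from the notes)
rng = np.random.default_rng(9)
n = 30
cands = {"x1": rng.normal(size=n), "x2": rng.normal(size=n), "x3": rng.normal(size=n)}
y = 1.0 + 2.0 * cands["x1"] + 0.5 * cands["x3"] + rng.normal(size=n)
one = np.ones(n)

def design(names):
    """Design matrix with an intercept column followed by the named predictors."""
    return np.column_stack([one] + [cands[c] for c in names])

# Forward selection: at each step add the candidate giving the largest R^2
chosen, r2_path = [], []
remaining = list(cands)
while remaining:
    best = max(remaining, key=lambda name: fit_r2(design(chosen + [name]), y)[1])
    chosen.append(best)
    remaining.remove(best)
    y_fit, r2 = fit_r2(design(chosen), y)
    r2_path.append(r2)
    # Multiple correlation: corr(y, y_fit)^2 equals R^2 for a model with an intercept
    print(chosen, round(r2, 4), round(np.corrcoef(y, y_fit)[0, 1] ** 2, 4))
```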

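Heteroscedasticity

In many situations in economics and finance, the specification that the model be homoscedastic is inappropriate. We therefore investigate models in which the assumption that the error variance is constant is dropped. Instead we specify a heteroscedastic model with $V[\varepsilon_i] = \sigma_i^2$. In high-variance sections of the data the residuals are likely to be larger than in low-variance sections. OLS will therefore place more weight on the observations in the former sections, ensuring a good fit there. The OLS estimators are still unbiased and consistent, but they are not minimum variance.

In the model with one predictor, $Y_i = a + b x_i + \varepsilon_i$, $V[\varepsilon_i] = \sigma_i^2$, we find

$$\hat b = \frac{S_{xy}}{S_{xx}} = \sum c_i y_i \quad \left(c_i = \frac{x_i - \bar x}{S_{xx}}\right), \qquad E[\hat b] = b, \qquad \mathrm{Var}[\hat b] = \frac{\sum (x_i - \bar x)^2 \sigma_i^2}{S_{xx}^2}.$$

The basic technique used to take account of heteroscedasticity is weighted least squares (WLS). For the model above, we minimise

$$\sum \frac{(Y_i - a - b x_i)^2}{\sigma_i^2},$$

which leads to

$$\hat b^* = \frac{\sum (x_i - \bar x^*)(y_i - \bar y^*)/\sigma_i^2}{\sum (x_i - \bar x^*)^2/\sigma_i^2}, \quad \text{where } \bar x^* = \frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}, \quad \bar y^* = \frac{\sum y_i/\sigma_i^2}{\sum 1/\sigma_i^2}.$$

We can achieve this by scaling/weighting the original data and then using OLS.

In general, then, consider the model $Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$ and the transformed model

$$Y_i^* = \frac{\beta_0}{\sigma_i} + \beta_1 x_{1i}^* + \dots + \beta_k x_{ki}^* + \varepsilon_i^*,$$

where $Y_i^* = Y_i/\sigma_i$, $x_{ji}^* = x_{ji}/\sigma_i$ ($j = 1, 2, \dots, k$) and $\varepsilon_i^* = \varepsilon_i/\sigma_i$. The transformed model is homoscedastic: $V[\varepsilon_i^*] = 1$. OLS on the transformed model is equivalent to WLS on the original model.

Potential drawback: do we need to know the values of the $\sigma_i$? Actually we can manage with a knowledge of only the relative sizes of the error variances. We may indeed have some relevant information which will help us to assess the relative sizes, but it is still asking a lot.

One important special case is to assume that the error variance is related to one of the predictors, in particular that $V[\varepsilon_i] \propto x_{ji}^2$ for some j. Suppose $V[\varepsilon_i] = c\,x_{2i}^2$. We then take $Y_i^* = Y_i/x_{2i}$, $x_{ji}^* = x_{ji}/x_{2i}$ ($j = 1, 2, \dots, k$) and $\varepsilon_i^* = \varepsilon_i/x_{2i}$, and fit the model

$$\frac{Y_i}{x_{2i}} = \frac{\beta_0}{x_{2i}} + \beta_1 \frac{x_{1i}}{x_{2i}} + \beta_2 + \beta_3 \frac{x_{3i}}{x_{2i}} + \dots + \beta_k \frac{x_{ki}}{x_{2i}} + \frac{\varepsilon_i}{x_{2i}}.$$

This is also a homoscedastic model: $V[\varepsilon_i^*] = c$.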
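The equivalence between OLS on the scaled data and WLS can be checked numerically. This Python/NumPy sketch uses simulated data in which $\sigma_i$ is assumed proportional to $x_{1i}$ (hypothetical). It divides every row of the design matrix, including the column of ones, by $\sigma_i$ and solves by OLS. The result is compared with the weighted normal equations $(X'WX)\beta = X'WY$, $W = \mathrm{diag}(1/\sigma_i^2)$. In R the direct route would be `lm(y ~ x1 + x2, weights = 1/sigma^2)`.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(0, 5, n)
# Assumed (hypothetical) error s.d.: sigma_i proportional to x1_i
sigma = 0.3 * x1
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, sigma)

X = np.column_stack([np.ones(n), x1, x2])

# OLS on the transformed model: divide every row (including the column of ones) by sigma_i
Xs = X / sigma[:, None]
ys = y / sigma
beta_scaled, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# Direct WLS: solve (X' W X) beta = X' W y with W = diag(1 / sigma_i^2)
W = np.diag(1 / sigma**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Only relative sizes matter: scaling by x1_i (proportional to sigma_i) gives the same estimate
beta_rel, *_ = np.linalg.lstsq(X / x1[:, None], y / x1, rcond=None)
print(beta_scaled, beta_wls, beta_rel)
```

The last fit illustrates the "relative sizes" remark in the text. Multiplying every $\sigma_i$ by a common constant rescales all weights equally and leaves the estimates unchanged.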
