This is the linearity assumption. The expected value of Y is really a straight-line function of x.

Size: px

Start display at page:

Download "This is the linearity assumption. The expected value of Y is really a straight-line function of x."

Gregory Knight
6 years ago
Views:

1 B90.30 / C.007 NOTE for Wedesday 4 FEB 00 Hadouts: mpleregressoiferece CoefIterp Matrx otato Pamphlets o multple regresso varable selecto We eed to talk about fereces that we make wth regresso. Frst we wll eed to be very clear about the model assumptos. We ll do ths for smple regresso (K ) ad later for multple regresso (K ). E(Y ) β 0 β x Ths s the learty assumpto. The expected value of Y s really a straght-le fucto of x. Y β 0 β x ε Ths s the addtve error (ose) assumpto. I partcular, the rght sde β β x e ε. s ot (β 0 β x ) ε. It s also ot 0 ε, ε,, ε are depedet radom varables, each wth the same dstrbuto The depedece meas that ε ad ε j are depedet radom varables whe j. It also meas that ε s depedet of x ad of all the other depedet varables. The same dstrbuto assumpto really meas that the ε s are a radom sample. E(ε ) 0 ad D(ε ) The ε s are assumed to be take from a populato wth mea zero. Ths s a costless assumpto. If E(ε ) ψ 0, the we would rewrte the model as Y (β 0 ψ) β x (ε ψ). I ths rewrtte form (β 0 ψ) s a ukow tercept ad the (ε ψ) are uobserved ose terms wth mea 0. The ε s are assumed to all have the same stadard devato. We eed to be wary about possble volatos of ths assumpto, as t s farly commo for D(ε ) to be larger for those data pots that have large x.

2 Ths says also that D(Y ) D(β 0 β x ε ) D( ε ). The sgal β 0 β x s ot radom, ad t does ot cotrbuted to the stadard devato. ε, ε,, ε are ormally dstrbuted Beleve t or ot, we do t eed to make ths assumpto all cases. If our oly cocers are about meas, stadard devatos, ad correlatos we do ot eed to voke ormalty. The ormal dstrbuto assumpto s crtcal for the feretal acts of cofdece tervals, hypothess tests, ad predctos. Ay actvty that gets to usg the t dstrbuto or the F dstrbuto must be based o the assumpto of ormalty. As a cosequece of all these assumptos, cludg ormalty, we ca show b ~ N β, The ~ meas s dstrbuted as. The otato N( mea, varace ) s also used here. That s, b s ormally dstrbuted wth mea β ad wth varace. The stadard devato s of course. There s a terestg dervato for ths. b xy ( x x)( Y Y ) ( x ) x Y It s terestg that we ca drop Y from ths arthmetc. ( x x)( β β x ε ) 0 ( x x)( β x ε ) Note that β 0 dsappears.

3 ( x x) x ( x x) β ε ( x x)( x x) ( x x) β ε Note that x ca be retroduced. β x x ε β x x. Recall that ( x x) ε ce ths s a lear combato of the ε s, t must be ormally dstrbuted. (The x s are treated as o-radom.) ce E(ε ) 0, ths also shows that E(b ) β. All that remas s fdg Var(b ). Var(b ) Var β ( x x) ε Var ( x x) ε Var ( x x) ce β s ot radom, t s ot volved the varace. 3 ε

4 { ε} Var ( x x) ( x x) Var ( ε) x x x x ( ) b 0 ~ N β0, x Cov(b 0, b ) x (Ths does ot requre the ormal assumpto.) ( ) s ε ~ χ Ths s the ch-squared dstrbuto. You wll also see ths χ expressed as s ε ~. It ca be show that the expected value of a ch-squared radom varable s ts degrees of freedom. Thus E( χ ), leadg us to E( s ε ). We wll say s ε s a ubased estmate of. It does ot follow that s ε s resd a ubased estmate of. ce M resd result ca be expressed as resd ~ χ. s ε s statstcally depedet of (b 0, b ) 4 s ε, ths A derved quatty, such as b, has a stadard devato that depeds o ukow thgs (usually ). Here D(b ). We ca estmate ths stadard devato by estmatg the ukow thg, ad we call ths

5 estmate the stadard error. ce s ε estmates, we wrte E(b ) sε stadard error of b. Alas, ot everyoe uses ths strct defto for stadard error. I may problems wth ormal dstrbutos ad estmates of from a ch-squared dstrbuto, we ca create radom quatty - E radom quatty E radom quatty ad t wll have a t dstrbuto. The exact statemets are a bt trcky, ad that wll be omtted here. The degrees of freedom s the degrees of freedom umber from the ch-squared dstrbuto. Ths leads us to b E( b ) ( b ) E b s ε β ~ t Ths ca be used to create a α cofdece terval for β. Thus sε α cofdece terval for β s b ± tα/;. mlar logc gets a α cofdece terval for β 0. α cofdece terval for β 0 s x b0 ± tα/; sε. The tervals for β 0 ad β are ot statstcally depedet. If t happes that β 0, the regresso s dstrbuted as χ, depedet of resd. If β 0, ths dstrbuto s a o-cetral ch-squared, but that s aother story. Mregresso If t happes that β 0, the F follows the F dstrbuto wth Mresd (, ) degrees of freedom. If β 0, the dstrbuto s o-cetral F ad teds to be bg. (That s why we reject β 0 wth large F statstcs.) 5

6 Let s suppose that we a target value x ew md, ad we wated to predct the populato average Y that would result from ths x ew. Ths s askg for a predcto of E Y ew β 0 β x ew, whch s somethg we geerally do ot eed to do! The pot predcto s clearly b 0 b x ew. We ca obta the varace of ths predcto: Var(b 0 b x ew ) Var ( b ) x Cov ( b, b ) x Var ( b ) 0 ew 0 ew x xew x xew ( x xewx xew ) ew ( x x) I ths expresso, x ad are based o the orgal data pots, ad x ew s ot part of the orgal data. It follows that we ca make a t dstrbuto out of ths: ( b0 b xew) ( β 0 β xew) ( x x) s ε ew ~ t The, of course, we ca gve a - α cofdece terval for β 0 β x ew. Here t s: b 0 b x ew ± t s α/; ε ( x x ) ew Ths s all logcally correct, but t begs the questo. Why exactly mght you wat a cofdece terval for β 0 β x ew? 6

7 Vastly more useful s a predcto for Y ew. The obvous predcto s Ŷ ew b 0 b x ew. I ths expresso, we regard x ew as o-radom, but we certaly must regard b 0 ad b (ad thus Ŷ ) as radom. ew Let s use the symbol Y ew as the ew value, whch we ll see evetually. The quatty Yˆ ew Yew s the predcto error. We ca get to the statstcal propertes of ths by usg the model o Y ew. pecfcally, we have Y ew β 0 β x ew ε ew. The ew ose term ε ew s statstcally depedet of everythg that came before t. Certaly the predcto error show that the mea s zero. Ŷ ew - Y ew wll follow a ormal dstrbuto. We ca Ŷ - Y ew ] E ( b b x ) ( β β x ε ) E[ ew 0 ew 0 ew ew ( E b0 E b xew ) ( 0 xew E( ew )) β β ε 0 The varace s trcker: Var[ ew Ŷ - Y ew ] Var ( b b x ) ( β β x ε ) 0 ew 0 ew ew Var ( b b x ) ε 0 ew ew Var ( b ) x Cov ( b, b ) x Var ( b ) Var ( ε ) 0 ew 0 ew ew x xew x xew ( x xewx xew ) ew ( x x) Ths leads to a t dstrbuto: 7

8 s ε Yˆ ew Y ew ( x x) ew ~ t You wll frequetly see the very useful predcto terval for Y ew : b 0 b x ew ± t s α/; ε ( x x ) ew It should be oted that the tervals for two ew values, say statstcally depedet of each other. x ad x are ot ew ew You wll also see plot of predcto bads. These are ot a cofdece rego for the le. It must be oted that these dstrbutoal results deped o the correctess of the model. It s curous to ote that the results hold exactly oly f we make o effort to check the tegrty of the model! Let s thk about that. If we check the model, perhaps by examg the data or lookg at resdual plots, we mght take actos. We mght replace Y by ts logarthm. We mght delete a pot as beg a outler. The actos are, of course, possble sources of errors. As a smple llustrato of that, let s recall the smple (pre-regresso) problem based o a sample of values w, w,, w. We make the model that these are a sample from a ormal populato wth mea μ w ad stadard devato w. s The a 95% cofdece terval for μ w s w ± t 0.05; -. Ths s true f the model s true ad we do t exame our data! uppose that we have a rule for fdg outlers ad throwg them away. Whether we make correct or correct decsos ay partcular case, we wll stll be messg up the 95% cofdece. Of course, every practcg statstca would look at the data to check for assumptos. We would look very foolsh f we dd ot. But let s at least be hoest ad admt that lookg at the data may alter our cofdece values! 8

9 Resduals also help detfy model falure, especally through usg the resdual-versusftted plot. (There s a secto the mple Lear Regresso pamphlet o ths; the commoly ecoutered s spreadg resduals. The cure s takg logarthms of Y. Let s exame ths the cotext of the alary data set. Ths s M\ALARY.MTP Here s the resdual versus ftted plot: Resduals Versus the Ftted Values (respose s ALARY) 0000 Resdual Ftted Value Ths s a clear case of expadg resduals. The cure for ths oe s to replace Y by log Y. * We recommed base-e logarthms. * If the Y s have some zeroes or egatve values, use log(y c), choosg c to elevate all the values to just above zero. We ca pot out also the gross sze pheomeo. Varables that ru over several orders of magtude are almost always treated wth logarthms. The resdual versus ftted plot wll also let us detect curvature. It ca fd werd values (outlers?), but we have to be careful wth such fuy pots. If ay pot s werd, t must be checked for correct codg. If the pot s ot recorded correctly, fx t ad start over. If t caot be fxed, the dscard t, but record ths acto the report. 9

10 If the pot s correct, try to decde f t has udue mpact o the regresso. If removg the pot gves a bg chage b, the you mght cosder removg t. I ay case, f you remove a pot, that acto must be cluded the report. For multple regresso the model s Y β 0 β x β x β 3 x 3 β K x K ε We say that ths s regresso wth K predctors. For K, we call t smple regresso ad for K, t s multple regresso. The otato s a problem. The subscrpt-free verso of the model s Y β 0 β x β x β 3 x 3 β K x K ε I ths form, the depedet varables would be descrbed as X, X,, X K. (We ca use upper case letters whe we thk of them as varables.) I applcatos, these wll have ames. The model mght look lke ALE β 0 β OCK OCK β HIRT HIRT β PANT PANT ε Ths would avod the cofuso of the double subscrpts o the x terms. The estmatg crtero s aga least squares. pecfcally, we try to mmze ( Y ) b0 bx bx b3x3... bkxk You ll also see ths as Y K b x j j j 0 Ths requres that we detfy x 0 for every. We have a hadout o the matrx otato. X X X Y requres the verso of a p-by-p matrx. Ths ca fal for a umber of reasos: 0 A crtcal ssue s that the soluto vector b -

11 * Rak(X) < p. The p colums of X are ot learly depedet vectors. The hadout o matrx otato shows ths. * Not eough data, meag < p. I that case, the matrx X has more colums tha rows. Ths automatcally makes Rak(X) < p. There s also the strage computatoal cocer whch Rak(X) s just barely p. Ths happes whe oe of the colums of X s very, very close to beg a exact lear combato of other colums. We wo t dscuss ths detal, as the problem shows up as collearty. The measure of ths cocer s the rato largest egevalue of smallest egevalue of ( X X) ( X X). There s ths dffcult terpretato ssue wth regard to the estmated coeffcets. Exame the amal set. Cosder the regresso of Gestato o (Bweght, Tsleep). Get matrx plot. Dscover the Bweght eeds logs. The ru the regresso, fdg that Gestato also eeds logs. For coherece, we ll eed to elmate ay amal that s ot complete o these three varables. Here s the regresso: Regresso Aalyss: log(g) versus log(w), Tsleep The regresso equato s log(g) log(w) Tsleep Predctor Coef E Coef T P Costat log(w) Tsleep R-q 6.% R-q(adj) 60.7% Aalyss of Varace ource DF M F P Regresso Resdual Error Total Ths s a good regresso (although t s depedet a absurd way o the style whch the amals were chose). Let s look at the estmated coeffcet The terpretato here s that

12 a oe ut chage log(bw) leads to a chage of 0.78 log(g), holdg Tsleep costat There s aother way to look at ths. Let log(bw)[tsleep] be the resdual from regressg log(bw) o Tsleep That s, ths s the part of log(bw) that caot be explaed by Tsleep. mlarly, let log(g)[tsleep] be the resdual from regressg log(g) o Tsleep Ths s the part of log(g) that caot be explaed by Tsleep. Now... f we regress log(g)[tsleep] o log(bw)[tsleep], we get Regresso Aalyss: log(g)[tsleep] versus log(w)[tsleep] The regresso equato s log(g)[tsleep] log(w)[tsleep] Predctor Coef E Coef T P Costat log(w)[tsleep] R-q 3.5% R-q(adj) 3.% Aalyss of Varace ource DF M F P Regresso Resdual Error Total Note that the slope coeffcet s 0.78! As a sde thought questo, why s the tercept zero? May thgs eed to be checked a multple regresso. The set CONDO09.mtp s useful for ths. * The resdual versus ftted should be patterless. If the resduals expad to the rght, you ll eed to use log(y ), or perhaps log(y c). * Wth other strage patters the resdual versus ftted plot, you may wsh to plot the resduals agast each depedet varable. Ths wll suggest that we have curvature DELEV. It s hard to kow f ths was worth correctg, but we dd.

13 ome people worry about teractos of predctors. The geeral method for dog ths s to add a product term. I the amal regresso of log(g) o {log(w), Tsleep}, cosder also the regresso of log(g) o {log(w), Tsleep, log(w) Tsleep}. That s, the regresso s exteded wth a ordary product. For ths example, the mprovemet R s trflg, ad the ew varable (the product) s ot statstcally sgfcat. Wth a p-value of 0.0 t does look terestg. Warg: If you do varable selecto ad you decde to keep the log(w) Tsleep teracto, you must also keep the predctors log(w) ad Tsleep. I other words, ever cosder regressg log(g) o {log(w), log(w) Tsleep}. I a multple regresso, the t statstcs are used to test hypotheses about dvdual coeffcets. If the model s Y β 0 β H H β K K β Q Q β V V ε, the the t statstc prted the le for b Q (the estmate of β Q ) tests the hypothess H 0 (Q): β Q 0 agast the alteratve H (Q): β Q 0. Ths must be terpreted the cotext of ths regresso ad wth the predctors {H, K, V}. Ths s most deftely ot depedet of the tests o β H, β K, or β V. uppose stead that you wshed to test the hypothess H 0 : β H β K agast the alteratve H : β H β K. Ths kd of ferece s ot automated Mtab. We ca get t, but we ll have to work for t. As a sde ote, you ca oly cosder β H β K f varables H ad K are the same uts. We are gog to use the partal F test. Ths s explaed eatly Chatterjee ad Had. Let s suppose that we have a full model ad a reduced model. I our example, the full model s Y β 0 β H H β K K β Q Q β V V ε the reduced model s Y β 0 β β Q Q β V V ε where H K 3

14 A requremet for usg ths partal F test s that the reduced model must be a sub-model of the full model. Ths meas that there s some choce for the parameters the full model that produces the reduced model. tart from Y β 0 β H H β K K β Q Q β V V ε ad the make the choce that β H β K to wrte Y β 0 β H H β H K β Q Q β V V ε Ths ca be regrouped as Y β 0 β H (H K ) β Q Q β V V ε Now reamg H K as ad also lettg β be aother ame for the assumed commo value of β H ad β K wll produce the reduced model. Do regressos for both the full model ad the reduced model. Observe that regresso (full) > regresso (reduced) resd (full) < resd (reduced) regresso (full) - regresso (reduced) resd (reduced) - resd (full) 4

15 The partal F statstc ca be wrtte several ways: F { regresso full regresso reduced } resd full DF ( full) resd ν { resd reduced resd full } resd full DF ( full) resd ν { regresso full regresso reduced } M ( full) resd ν { resd reduced resd full } M ( full) resd ν I every form, the umerator s based o a sum squares dfferece. How much better s the ft wth the full model? The value ν s the umber of parameters that dstgush H 0 from H. For ths example, the full model has E(Y ) β 0 β H H β K K β Q Q β V V ad t takes fve parameters to specfy ths (β 0, β H, β K, β Q, β V ) the reduced model has E(Y ) β 0 β β Q Q β V V For ths example, ν. ad t takes four parameters to specfy ths (β 0, β, β Q, β V ) The deomator of the partal F statstc s the mea square resdual from the full model. Ths deomator estmates whether H 0 s true or ot. The degrees of freedom for the partal F are (ν, DF resd (full) ). The rule s to reject H 0 at level α f the partal F equals or exceeds F α,df, the upper α pot for the F ν resd full dstrbuto wth (ν, DF resd (full) ) degrees of freedom. We are out of the habt of lookg up cutoff pots for the F dstrbuto because most software prts the p-value. We would just reject at level α f p α. The partal F s ot arraged by Mtab. You have to do two rus ad assemble ths yourself. Ths also meas that you wo t get a p-value. You could the use prted tables of the F dstrbuto, whch gve the upper 5% ad % pots. ce Mtab s 5

16 already actve for the work you re dog, you ca get p-value yourself. Let s suppose that your partal F works out to the value 7. wth (, 7) degrees of freedom. Use Calc Probablty Dstrbutos F ad the fll the resultg pael as follows: Ths starts wth the default rado butto settg at Cumulatve probablty, whch s exactly what you wat. The output s ths: Cumulatve Dstrbuto Fucto F dstrbuto wth DF umerator ad 7 DF deomator x P( X < x ) The probablty to the left of your 7. s foud to be The probablty to the rght (correspodg to the rejecto rego for H 0 versus H ) s the You ca report the p

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model Lecture 7. Cofdece Itervals ad Hypothess Tests the Smple CLR Model I lecture 6 we troduced the Classcal Lear Regresso (CLR) model that s the radom expermet of whch the data Y,,, K, are the outcomes. The