Unit 2. Regression and Correlation

PubHlth, Spring. Unit 2. Regression and Correlation. Page 1 of 80

"Don't let us quarrel," the White Queen said in an anxious tone. "What is the cause of lightning?"
"The cause of lightning," Alice said very decidedly, for she felt quite certain about this, "is the thunder - oh no!" she hastily corrected herself. "I meant the other way."
"It's too late to correct it," said the Red Queen: "when you've once said a thing, that fixes it, and you must take the consequences."
- Carroll

Menopause heralds a complex interplay of hormonal and physiologic changes. Some are temporary discomforts (e.g., hot flashes, sleep disturbances, depression). Others are long-term changes that increase the risk of significant chronic health conditions, bone loss and osteoporosis in particular. Recent observations of an association between depressive symptoms and low bone mineral density (BMD) raise the intriguing possibility that alleviation of depression might confer a risk benefit with respect to bone mineral density loss and osteoporosis.

However, the finding of an association in a simple (one predictor) linear regression model analysis has multiple possible explanations, only one of which is causal. Others include, but are not limited to:
(1) the apparent association is an artifact of the confounding effects of exercise, body fat, education, smoking, etc.;
(2) there is no relationship and we have observed a chance event of low probability (it can happen!);
(3) the pathway is the other way around (low BMD causes depressive symptoms), albeit highly unlikely; and/or
(4) the finding is spurious due to study design flaws (selection bias, misclassification, etc.).

In settings where multiple, related predictors are associated with the outcome of interest, multiple predictor linear regression analysis allows us to study the joint relationships among the multiple predictors (depressive symptoms, exercise, body fat, etc.) and a single continuous outcome (BMD). In this example, we might be especially interested in using multiple predictor linear regression to isolate the effect of depressive symptoms on BMD, holding all other predictors constant (adjustment). Or, we might want to investigate the possibility of synergism or interaction.

Table of Contents

Topic
Learning Objectives
1. Simple Linear Regression
   a. Definition of the Linear Regression Model
   b. Estimation
   c. The Analysis of Variance Table
   d. Assumptions for the Straight Line Regression
   e. Hypothesis Testing
   f. Confidence Interval Estimation
2. Introduction to Correlation
   a. Pearson Product Moment Correlation
   b. Hypothesis Test for Correlation
3. Multivariable Regression
   a. Introduction, Indicator and Design Variables
   b. The Analysis of Variance Table
   c. The Partial F Test
   d. Multiple Partial Correlation
4. Multivariable Model Development
   a. Introduction
   b. Example - Human p53 and Breast Cancer Risk
   c. Guidelines for Multivariable Analyses of Large Sets
5. Goodness-of-Fit and Regression Diagnostics
   a. Introduction and Terminology
   b. Assessment of Normality
   c. Cook-Weisberg Test of Heteroscedasticity
   d. Method of Fractional Polynomials
   e. Ramsey Test for Omitted Variables
   f. Residuals, Leverage, & Cook's Distance

Learning Objectives

When you have finished this unit, you should be able to:
- Explain the concepts of association, causation, confounding, mediation, and effect modification;
- Construct and interpret a scatter plot with respect to: evidence of association, assessment of linearity, and the presence of outlying values;
- State the multiple predictor linear regression model and the assumptions necessary for its use;
- Perform and interpret the Shapiro-Wilk and Kolmogorov-Smirnov tests of normality;
- Explain the relevance of the normal probability distribution;
- Explain and interpret the coefficients (and standard errors) and the analysis of variance table outputs of a single or multiple predictor regression model estimation;
- Explain and compare crude versus adjusted estimates (betas) of association;
- Explain and interpret regression model estimates of effect modification (interaction);
- Explain and interpret overall and adjusted R-squared measures of association;
- Explain and interpret overall and partial F-tests;
- Draft an analysis plan for a multiple predictor regression model analysis; and
- Explain and interpret selected regression model diagnostics: residuals, leverage, and Cook's distance.

1. Simple Linear Regression

a. Definition of the Linear Regression Model

A simple linear regression model is a particular model of how the mean μ (the average value) of one continuous outcome random variable Y (e.g. Y = bone mineral density) varies, depending on the value of a single (usually continuous) predictor variable X (e.g. X = depressive symptoms). Specifically, it says that the average values of the outcome variable, as X changes, lie on a straight line ("regression line").

The estimation and hypothesis testing involved are extensions of ideas and techniques that we have already seen. In linear regression, we observe an outcome or dependent variable Y at several levels of the independent or predictor variable X (there may be more than one predictor X, as seen later). A linear regression model assumes that the values of the predictor X have been fixed in advance of observing Y. However, this is not always the reality. Often Y and X are observed jointly and are both random variables.

Correlation

Correlation considers the association of two random variables, Y and X. The techniques of estimation and hypothesis testing are the same for linear regression and correlation analyses. Exploring the relationship begins with fitting a line to the points.

We develop the linear regression model analysis for a simple example involving one predictor and one outcome.

Example. Source: Kleinbaum, Kupper, and Muller 1988
Suppose we have observations of age and weight for n = 11 chicken embryos. The predictor of interest is X = AGE. The outcome of interest is weight. For purposes of illustration, suppose we are interested in two models of weight. In one, the outcome variable is Y = WT. In the other, the outcome is the logarithm of weight, Z = LOGWT.

Notation: WT = Y, AGE = X, LOGWT = Z

The data are n = 11 pairs of $(X_i, Y_i)$ where X = AGE and Y = WT:
$(X_1, Y_1) = (6, 0.029)$, ..., $(X_{11}, Y_{11}) = (16, 2.812)$
and, equivalently, n = 11 pairs of $(X_i, Z_i)$ where X = AGE and Z = LOGWT:
$(X_1, Z_1) = (6, -1.538)$, ..., $(X_{11}, Z_{11}) = (16, 0.449)$
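To follow along numerically, here is a short Python sketch that sets up the chick-embryo data. Only the first and last pairs are printed above, so the intermediate values are an assumption. They are taken to be the published Kleinbaum, Kupper and Muller (1988) values and should be checked against that source before reuse. LOGWT is assumed to be the base-10 logarithm of WT, rounded to three decimals, which reproduces both listed pairs.

```python
import math

# Chick-embryo data. The intermediate values are an assumption (see the lead-in).
AGE = list(range(6, 17))  # ages 6, 7, ..., 16, so n = 11
WT = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
      0.425, 0.738, 1.130, 1.882, 2.812]

# Assumption: LOGWT = log10(WT), rounded to 3 decimals.
LOGWT = [round(math.log10(w), 3) for w in WT]

for x, y, z in zip(AGE, WT, LOGWT):
    print(f"AGE = {x:2d}   WT = {y:5.3f}   LOGWT = {z:6.3f}")
```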

Though simple, it helps to be clear in the research question: How does weight change with age? Does it change linearly? In the language of analysis of variance we are asking the following: Can the variability in weight be explained, to a significant extent, by variations in age? What is a good functional form that relates age to weight?

Always begin with a scatter plot of the data! Plot the predictor X on the horizontal and the outcome Y on the vertical. A graph allows you to see things that you cannot see in the numbers alone: range, patterns, outliers, etc. Here, let's begin with a plot of X = AGE versus Y = WT.

[Figure: Scatter Plot of WT vs AGE]

What to look for in a scatter plot of the data:
- The average and median of X
- The range and pattern of variability in X
- The average and median of Y
- The range and pattern of variability in Y
- The nature of the relationship between X and Y
- The strength of the relationship between X and Y
- The identification of any points that might be influential

Example, age (X) and weight (Y) of chicken embryos:
- The plot suggests a relationship between AGE and WT
- A straight line might fit well, but another model might be better
- We have adequate ranges of values for both AGE and WT
- There are no outliers

We might have gotten any of a variety of scatter plots:
[Figure: No relationship between X and Y]
[Figure: Linear relationship between X and Y]

[Figure: Non-linear relationship between X and Y]
[Figure: Note the arrow pointing to the outlying point. Fit of a linear model will yield an estimated slope that is spuriously non-zero.]
[Figure: Note the arrow pointing to the outlying point. Fit of a linear model will yield an estimated slope that is spuriously near zero.]

[Figure: Note the arrow pointing to the outlying point. Fit of a linear model will yield an estimated slope that is spuriously high.]

Example, continued: age (X) and log weight (Z) of chicken embryos. The X-Y plot on page 6 is rather bowl shaped. Here we consider an X-Z scatter plot. It is much more linear looking, suggesting that perhaps a better model relates the logarithm of WT (Z) to AGE:

[Figure: Scatter Plot of LOGWT vs AGE]

We'll investigate two models.
1) $WT = \beta_0 + \beta_1 AGE$
2) $LOGWT = \beta_0 + \beta_1 AGE$

A little review of your high school introduction to straight line relationships
[Figure: three straight lines, with Slope > 0, Slope = 0, and Slope < 0]

Definition of the Straight Line Model

Population: $Y = \beta_0 + \beta_1 X$ is the relationship in the population. It is measured with error:
$Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ = measurement error.
Sample: $Y = \hat\beta_0 + \hat\beta_1 X + e$, where $e$ = residual.
$\hat\beta_0$, $\hat\beta_1$, and $e$ are our guesses of $\beta_0$, $\beta_1$ and $\varepsilon$.

We do NOT know the value of $\beta_0$ or $\beta_1$ or $\varepsilon$. We do have values of $\hat\beta_0$, $\hat\beta_1$ and $e$. The values of $\hat\beta_0$, $\hat\beta_1$ and $e$ are obtained by the method of least squares estimation. To see if $\hat\beta_0 \approx \beta_0$ and $\hat\beta_1 \approx \beta_1$, we perform regression diagnostics.

A little notation, sorry!
$Y$ = the outcome or dependent variable
$X$ = the predictor or independent variable
$\mu_Y$ = the expected value of Y for all persons in the population
$\mu_{Y|X=x}$ = the expected value of Y for the sub-population for whom X = x
$\sigma^2_Y$ = variability of Y among all persons in the population
$\sigma^2_{Y|X=x}$ = variability of Y for the sub-population for whom X = x

b. Estimation

There are a variety of methods for obtaining estimates of $\beta_0$ and $\beta_1$. In this course, we will consider two of them: maximum likelihood estimation and least squares estimation.

Maximum Likelihood Estimation - This requires use of a probability distribution model. For example, we might assume that the outcome variable Y is distributed normal, with mean values that lie on the regression line. Maximum likelihood estimation chooses estimates of $\beta_0$ and $\beta_1$ that, when applied to the data, give us the largest value possible for the likelihood of the data that was actually observed.

Least Squares Estimation - NO probability distribution model required here! Least squares estimation chooses estimates of $\beta_0$ and $\beta_1$ that yield the smallest total of squared vertical distances (observed to predicted).

When the outcome variable Y is distributed normal, Maximum Likelihood Estimation = Least Squares Estimation (for the estimates of $\beta_0$ and $\beta_1$).

How Least Squares Estimation works. Theoretically, we could draw lots of possible lines through the X-Y scatter of the data points. Which one is the closest? And what do we mean by "close" anyway? Consider the following:
$Y$ = observed
$\hat Y$ = predicted, meaning that $\hat Y = \hat\beta_0 + \hat\beta_1 x$
$(Y - \hat Y)$ = vertical distance between observed outcome and predicted outcome

In least squares estimation, "close" means the following. We'd like the observed Y and its corresponding prediction $\hat Y$ to be as close as possible. This is the same as wanting $(Y - \hat Y)^2$ to be as small as possible.

It is not possible to choose $\hat\beta_0$ and $\hat\beta_1$ so that it minimizes $(Y_1 - \hat Y_1)^2$ individually and minimizes $(Y_2 - \hat Y_2)^2$ individually and ... and minimizes $(Y_n - \hat Y_n)^2$ individually.

So, instead, we choose $\hat\beta_0$ and $\hat\beta_1$ that make their total as small as possible:
$\sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \left(Y_i - [\hat\beta_0 + \hat\beta_1 X_i]\right)^2$

How Least Squares Estimation works, a picture.
[Figure]

The total (a total of squared differences) that we want to minimize has a variety of names.
$\sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \left(Y_i - [\hat\beta_0 + \hat\beta_1 X_i]\right)^2$ is variously called:
- residual sum of squares
- sum of squares about the regression line
- sum of squares due error (SSE)
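You can try the idea of drawing lots of lines and keeping the closest one directly. The Python sketch below is a crude grid search. It illustrates the criterion only and is not how regression software computes the fit. It assumes the published chick-embryo LOGWT values (Kleinbaum, Kupper and Muller 1988), of which only the first and last appear in these notes. The function name `sse` is hypothetical. The grid line with the smallest SSE lands close to the least squares solution.

```python
# Chick-embryo data (assumed values, see lead-in): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]


def sse(b0, b1, x, y):
    """Sum of squared vertical distances between observed y and b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Candidate intercepts -2.800, -2.795, ..., -2.600 and slopes 0.1900, 0.1902, ..., 0.2020
candidates = [(i / 1000, j / 10000)
              for i in range(-2800, -2599, 5)
              for j in range(1900, 2021, 2)]

# Keep the candidate line whose SSE is smallest.
best_b0, best_b1 = min(candidates, key=lambda b: sse(b[0], b[1], AGE, LOGWT))
print(f"closest line on the grid: intercept {best_b0:.3f}, slope {best_b1:.4f}")
print(f"its SSE: {sse(best_b0, best_b1, AGE, LOGWT):.6f}")
```

A finer grid would creep closer to the exact answer, but calculus gives it directly.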

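For the calculus lover: a little calculus yields the solution for the estimates $\hat\beta_0$ and $\hat\beta_1$.
Consider $SSE = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \left(Y_i - [\hat\beta_0 + \hat\beta_1 X_i]\right)^2$
Step 1: Differentiate with respect to $\hat\beta_1$. Set the derivative equal to 0 and solve.
Step 2: Differentiate with respect to $\hat\beta_0$. Set the derivative equal to 0, insert $\hat\beta_1$ and solve.

$\beta_1$ is the unknown slope in the population. Its estimate is denoted $\hat\beta_1$ or $b_1$.
$\beta_0$ is the unknown intercept in the population. Its estimate is denoted $\hat\beta_0$ or $b_0$.

How to use some summation calculations to obtain these estimates. Calculate
$S_{xx} = \sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - N\bar X^2$
$S_{yy} = \sum_{i=1}^{n} (Y_i - \bar Y)^2 = \sum_{i=1}^{n} Y_i^2 - N\bar Y^2$
$S_{xy} = \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y) = \sum_{i=1}^{n} X_i Y_i - N\bar X \bar Y$

Review. These expressions make use of a special notation called the "summation notation". The capital S indicates summation. In $S_{xy}$, the first subscript x is saying $(x - \bar x)$. The second subscript y is saying $(y - \bar y)$:
$S_{xy} = \sum (X - \bar X)(Y - \bar Y)$, where S indicates summation, subscript x refers to $(X - \bar X)$ and subscript y refers to $(Y - \bar Y)$.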
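As a check on the summation notation, the Python sketch below computes $S_{xx}$, $S_{yy}$ and $S_{xy}$ in two ways. It uses deviations about the means and also the shortcut forms ($\sum X^2 - N\bar X^2$ and so on), then confirms that the two agree. It assumes the published chick-embryo LOGWT values (Kleinbaum, Kupper and Muller 1988), of which only the first and last appear in these notes.

```python
# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n

# Definitional forms: sums of squared (or cross-multiplied) deviations about the means.
Sxx = sum((x - xbar) ** 2 for x in AGE)
Syy = sum((y - ybar) ** 2 for y in LOGWT)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))

# Shortcut forms: raw sums of squares (or cross-products) minus N times the mean(s).
Sxx_short = sum(x * x for x in AGE) - n * xbar ** 2
Syy_short = sum(y * y for y in LOGWT) - n * ybar ** 2
Sxy_short = sum(x * y for x, y in zip(AGE, LOGWT)) - n * xbar * ybar

print(f"Sxx = {Sxx:.5f} (shortcut {Sxx_short:.5f})")
print(f"Syy = {Syy:.5f} (shortcut {Syy_short:.5f})")
print(f"Sxy = {Sxy:.5f} (shortcut {Sxy_short:.5f})")
```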

Formulae for Estimated Slope and Intercept

Slope: $\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2} = \dfrac{S_{xy}}{S_{xx}} = \dfrac{\widehat{\mathrm{cov}}(X, Y)}{\widehat{\mathrm{var}}(X)}$
Intercept: $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
Prediction of Y: $\hat Y = \hat\beta_0 + \hat\beta_1 X = b_0 + b_1 X$

Do these estimates make sense?

Slope: $\hat\beta_1 = \widehat{\mathrm{cov}}(X, Y) / \widehat{\mathrm{var}}(X)$. The linear movement in Y with linear movement in X is measured relative to the variability in X.
$\hat\beta_1 = 0$ says: With a unit change in X, Y is overall as likely to increase as to decrease; there is no linear trend.
$\hat\beta_1 \neq 0$ says: With a unit increase in X, Y increases also ($\hat\beta_1 > 0$) or Y decreases ($\hat\beta_1 < 0$).

Intercept: $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. If the linear model is incorrect, or if the true model does not have a linear component, we obtain $\hat\beta_1 \approx 0$. Then $\hat Y \approx \bar Y$ is our best guess of an unknown Y.
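Here is a minimal Python sketch of the slope and intercept formulae. It makes the same assumption about the chick-embryo LOGWT values (only the first and last pairs appear in these notes). It also checks that the covariance-over-variance form gives the same slope and that the fitted line passes through $(\bar X, \bar Y)$. The helper `predict` is a hypothetical name.

```python
# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))

b1 = Sxy / Sxx         # slope
b0 = ybar - b1 * xbar  # intercept

# Equivalent form of the slope: estimated covariance over estimated variance.
# Both use the divisor n - 1, which cancels.
cov_xy = Sxy / (n - 1)
var_x = Sxx / (n - 1)


def predict(x):
    """Predicted value on the fitted line, yhat = b0 + b1 * x."""
    return b0 + b1 * x


print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
print(f"cov/var gives the same slope: {cov_xy / var_x:.4f}")
print(f"prediction at the mean age {xbar:.0f}: {predict(xbar):.4f} (= mean LOGWT {ybar:.4f})")
```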

Illustration in Stata
Command:
. regress wt age
Partial listing of output, annotations in red:
[Stata coefficient table for wt on age: columns Coef., Std. Err., t, P>|t|, 95% Conf. Interval; rows age and _cons]
In this table, the Coef. for age is the slope, $b_1$, and the Coef. for _cons is the intercept, $b_0$.
The fitted line is therefore $\widehat{WT} = b_0 + b_1 \cdot AGE$.

Overlay of Least Squares Line on the X-Y Scatter Plot
[Figure: Scatter Plot of WT vs AGE, with the fitted least squares line]
As we might have guessed, the straight line model may not be the best choice. However, it is worth noting that the bowl shape of the scatter plot does have a linear component. So, without the plot, we might have believed the straight line fit is okay.

Illustration of straight line model fit to Z = LOGWT versus X = AGE.
Command:
. regress logwt age
Partial listing of output, annotations in red:
[Stata coefficient table for logwt on age: columns Coef., Std. Err., t, P>|t|, 95% Conf. Interval; rows age and _cons]
In this table, the Coef. for age is the slope, $b_1$, and the Coef. for _cons is the intercept, $b_0$.
Thus, the fitted line is $\widehat{LOGWT} = -2.6893 + 0.1959 \cdot AGE$.

Overlay of Least Squares Line on the X-Y Scatter Plot
[Figure: Scatter Plot of LOGWT vs AGE, with the fitted least squares line]
Much better fit.

Now You Try
Prediction of Weight from Height
Source: Dixon and Massey (1969)
[Table: Individual, Height (X), Weight (Y)]

Some preliminary calculations have been done for you:
X= X = 49,068 X Y 09,380 = xx Y=4.667 Y = 46,00 S = Syy = 5, Sxy =

Slope: $\hat\beta_1 = \dfrac{S_{xy}}{S_{xx}}$ = ______ = ______
Intercept: $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$ = ______ - (5.09)(______) = ______

c. The Analysis of Variance Table

The analysis of variance table is used to assess the explanatory power of the model just fit. Analysis of variance calculations include sums of squares, degrees of freedom and mean squares.

Sums of squares. An analysis of variance is an analysis of a total variability that is also called a "total sum of squares":
TOTAL SUM OF SQUARES, $TSS = \sum_{i=1}^{n} (Y_i - \bar Y)^2$
- The TSS measures the total variability of the observed outcomes Y about their mean ("the average").
- TSS is thus 100% of what we are trying to explain (the whole pie) using the model just fit.

In a simple linear regression analysis of variance, the TSS is split into just two components:
MODEL SUM OF SQUARES, $MSS = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$. This component is the portion (wedge of the pie) of the variability in outcome that is explained by the model just fit.
RESIDUAL SUM OF SQUARES, $RSS = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$. This component is the portion (the remainder of the pie) of the variability in outcome that remains left over as unexplained by the model just fit.

In an analysis of variance we compare the portion of the total that is explained by the fitted model with the portion of the total that remains left over as residual.

Here is the partition (Note: look closely and you'll see that both sides are the same):
$(Y_i - \bar Y) = (\hat Y_i - \bar Y) + (Y_i - \hat Y_i)$

Some algebra (not shown) confirms the partition of the total sum of squares into its two components:
$\sum_{i=1}^{n} (Y_i - \bar Y)^2 = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 + \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$
TSS = RSS + MSS
(Total sum of squares = Residual sum of squares + Model sum of squares)

Degrees of freedom (df). Each of the three sums of squares (TSS, RSS, and MSS) is a calculation that utilizes every data point, all n of them. They differ, however, in the constraints that were also utilized. Tip! Every time a constraint is placed on the data, a degree of freedom is lost.

To start with, the data are a random sample of mutually independent outcomes, sample size = n. The key here is "mutually independent", because it means "free to vary". Thus, to start with, and before any constraints, degrees of freedom = sample size = n.

TSS: 1 degree of freedom is lost because there is 1 constraint on the data. In computing the total sum of squares, squared deviations are measured about the sample mean $\bar Y$. There is 1 constraint on the data in fixing $\bar Y$. Thus, TSS degrees of freedom = (n-1).

RSS: 2 degrees of freedom are lost because there are 2 constraints on the data. In computing the residual sum of squares, squared deviations are measured about the predicted values $\hat Y = \hat\beta_0 + \hat\beta_1 X$. Now there are 2 constraints on the data, one for fixing $\hat\beta_0$ and the second for fixing $\hat\beta_1$. Thus, RSS degrees of freedom, simple linear regression = (n-2).

MSS: Tip! Now we have to think about this as follows: count one degree of freedom for each regression parameter AFTER the intercept. In simple linear regression there are two regression model parameters, one for the slope and one for the intercept. Thus, after the intercept, there is just 1 regression parameter and it is for the slope. MSS degrees of freedom = (1).
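You can verify the partition and the degrees of freedom numerically. This Python sketch fits the straight line to the chick-embryo data, assuming the published LOGWT values (only the first and last appear in these notes). It then checks that TSS = RSS + MSS and that (n-1) = (n-2) + 1.

```python
# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in AGE]  # predicted values on the fitted line

TSS = sum((y - ybar) ** 2 for y in LOGWT)               # observed Y about the mean
RSS = sum((y - yh) ** 2 for y, yh in zip(LOGWT, yhat))  # observed Y about the line
MSS = sum((yh - ybar) ** 2 for yh in yhat)              # fitted line about the mean

# Degrees of freedom: n minus the constraints used by each sum of squares.
df_total = n - 1  # one constraint: ybar
df_resid = n - 2  # two constraints: b0 and b1
df_model = 1      # one parameter after the intercept: the slope

print(f"TSS = {TSS:.5f} = RSS {RSS:.5f} + MSS {MSS:.5f}")
print(f"df: total {df_total} = residual {df_resid} + model {df_model}")
```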

Mean squares. A sum of squares by itself is not a variance estimate because it is a measure of all the variability. Examples are all the variability about the mean (TSS), all the variability of the model about the mean (MSS), or all the variability of the observations about their associated predicted values (RSS). Instead, mean squares are variance estimates. They are defined:
mean square = variance estimate = (sum of squares) / (degrees of freedom)

The analysis of variance in simple linear regression compares the two variance estimates, "due model" versus "due residual", to assess the explanatory power of the model just fit.

1. The relationship between X and Y has a linear component with a non-zero slope: $\beta_1 \neq 0$.
A good prediction of Y is the fitted line: $\hat Y = \hat\beta_0 + \hat\beta_1 X$.
Consider the "due model" deviations:
$(\hat Y_i - \bar Y) = (\hat\beta_0 + \hat\beta_1 X_i - \bar Y) = (\bar Y - \hat\beta_1 \bar X + \hat\beta_1 X_i - \bar Y) = \hat\beta_1 (X_i - \bar X)$

2. The relationship between X and Y (if any) has no linear component: $\beta_1 = 0$.
A better prediction of Y is the average of the Y's: $\hat Y = \hat\beta_0 = \bar Y$.
Here, consider the "due residual" deviations:
$(Y_i - \hat Y_i) = (Y_i - \hat\beta_0) = (Y_i - \bar Y)$

A straight line relationship is helpful:
- $MSS = \sum (\hat Y_i - \bar Y)^2$ is relatively large
- $RSS = \sum (Y_i - \hat Y_i)^2$ is relatively small
- (MODEL mean square) / (RESIDUAL mean square) will be large

A straight line relationship is not helpful:
- $MSS = \sum (\hat Y_i - \bar Y)^2$ is relatively small
- $RSS = \sum (Y_i - \hat Y_i)^2$ is relatively large
- (MODEL mean square) / (RESIDUAL mean square) will be small (close to 1)

Summary of Analysis of Variance Terms

1. TSS: The "total" or "total, corrected" refers to the variability of Y about $\bar Y$.
$\sum_{i=1}^{n} (Y_i - \bar Y)^2$ is called the total sum of squares.
Degrees of freedom = df = (n-1).
Division of the total sum of squares by its df yields the total mean square.

2. RSS: The "residual" or "due error" refers to the variability of Y about $\hat Y$.
$\sum_{i=1}^{n} (Y_i - \hat Y_i)^2$ is called the residual sum of squares.
Degrees of freedom = df = (n-2).
Division of the residual sum of squares by its df yields the residual mean square.

3. MSS: The "model" or "due regression" refers to the variability of $\hat Y$ about $\bar Y$.
$\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2 = \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$ is called the regression sum of squares.
Degrees of freedom = df = 1.
Division of the regression sum of squares by its df yields the regression mean square or model mean square.

Source | df | Sum of Squares | Mean Square
Model | 1 | $MSS = SS(\text{model}) = \sum (\hat Y_i - \bar Y)^2$ | MSS/1 = SS(model)/1
Residual | (n-2) | $RSS = SS(\text{residual}) = \sum (Y_i - \hat Y_i)^2$ | RSS/(n-2) = SS(residual)/(n-2)
Total, corrected | (n-1) | $TSS = SS(\text{total}) = \sum (Y_i - \bar Y)^2$ |

Hint: The entry in the mean square column is always the sum of squares divided by the degrees of freedom.
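Putting the pieces together, this Python sketch builds the analysis of variance table for the chick-embryo data, along with the summary statistics Stata prints beside it. It makes the same assumption about the LOGWT values (only the first and last appear in these notes). The adjusted R-squared uses the usual formula $1 - (1-R^2)(n-1)/(n-p-1)$, which I am assuming matches Stata's definition. Here p is the number of predictors after the intercept.

```python
import math

# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]

n = len(AGE)
p = 1  # number of predictors after the intercept
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in AGE]

TSS = sum((y - ybar) ** 2 for y in LOGWT)
RSS = sum((y - yh) ** 2 for y, yh in zip(LOGWT, yhat))
MSS = TSS - RSS

# Mean square = sum of squares / degrees of freedom
ms_model = MSS / p
ms_resid = RSS / (n - p - 1)
F = ms_model / ms_resid
r_squared = MSS / TSS
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
root_mse = math.sqrt(ms_resid)

print(f"{'Source':<10}{'df':>4}{'SS':>12}{'MS':>14}")
print(f"{'Model':<10}{p:>4}{MSS:>12.5f}{ms_model:>14.5f}")
print(f"{'Residual':<10}{n - p - 1:>4}{RSS:>12.5f}{ms_resid:>14.7f}")
print(f"{'Total':<10}{n - 1:>4}{TSS:>12.5f}{TSS / (n - 1):>14.5f}")
print(f"F({p}, {n - p - 1}) = {F:.1f}   R-squared = {r_squared:.4f}   "
      f"Adj R-squared = {adj_r_squared:.4f}   Root MSE = {root_mse:.5f}")
```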

Be careful! Analysis of variance answers a limited question. Does the fit of the straight line model explain a significant portion of the variability of the individual Y about $\bar Y$? Is this better than using $\bar Y$ alone?
Analysis of variance does NOT address:
- Is the choice of the straight line model correct?
- Would another functional form be a better choice?

Illustration in Stata
Command:
. regress logwt age
Partial listing of output (now I'm showing you the analysis of variance portion), annotations in red:
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square)
Model | MSS | 1 | MSS/1
Residual | RSS | 9 | RSS/9
Total | TSS | 10 |
Number of obs = 11
F(1, 9) = MS(Model)/MS(Residual)
Prob > F = p-value of the overall F test
R-squared = SS(Model)/SS(Total)
Adj R-squared = 0.9981 = R-squared adjusted for n and # predictors
Root MSE = .02807 = $\sqrt{MS(\text{Residual})}$

d. Assumptions for a Straight Line Regression Analysis

See again, page 12. Least squares estimation does not require a probability model. However, if we want to do hypothesis tests or confidence interval estimation or both, then we do need a probability model.

Assumptions of Simple Linear Regression
1. The outcomes $Y_1, Y_2, \ldots, Y_n$ are independent.
2. The values of the predictor variable X are fixed and measured without error.
3. At each value of the predictor variable X = x, the distribution of the outcome Y is normal with mean $\mu_{Y|X=x} = \beta_0 + \beta_1 x$ and common variance $\sigma^2_{Y|x}$.

Assumptions 1-3 also mean that, for each individual i,
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent and identically distributed Normal with mean $\mu_\varepsilon = 0$ and variance $\sigma^2_\varepsilon = \sigma^2_{Y|x}$.

With these assumptions, the comparison of the "due model" versus "due residual" variance estimates is an F-statistic under the null hypothesis of zero slope:
$F = \dfrac{\text{mean square (due model)}}{\text{mean square (due residual)}}$, with df = 1, (n-2) when the null is true.

Null Hypothesis true ($\beta_1 = 0$):
- Due model mean square has expected value $\sigma^2_{Y|X}$.
- Due residual mean square, MS(residual), has expected value $\sigma^2_{Y|X}$.
- F = MS(model)/MS(residual) will be close to 1.

Null Hypothesis not true ($\beta_1 \neq 0$):
- Due model mean square has expected value $\sigma^2_{Y|X} + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$.
- Due residual mean square, MS(residual), has expected value $\sigma^2_{Y|X}$.
- F = MS(model)/MS(residual) will be LARGER than 1.

Illustration in Stata for the model of Y = LOGWT to X = AGE:
. regress logwt age
Output (another partial listing), annotations in red:
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square)
Model | MSS | 1 | MSS/1
Residual | RSS | 9 | RSS/9
Total | TSS | 10 |
Number of obs = 11
F(1, 9) = MS(Model)/MS(Residual)
R-squared = SS(Model)/SS(Total)
Adj R-squared = 0.9981 = R-squared adjusted for n and # predictors
Root MSE = .02807 = $\sqrt{MS(\text{Residual})}$

This output corresponds to the following.
Source | df | Sum of Squares | Mean Square
Due model | 1 | $MSS = \sum (\hat Y_i - \bar Y)^2$ = 4.22106 | MSS/1 = 4.22106
Due residual | (n-2) = 9 | $RSS = \sum (Y_i - \hat Y_i)^2$ | RSS/(n-2)
Total, corrected | (n-1) = 10 | $TSS = \sum (Y_i - \bar Y)^2$ = 4.22815 |

Stata provides other information, too:

R-SQUARED = (MSS)/(TSS)
This is the percent of the total sum of squares TSS that is explained by the fit of the current model (in this case, the straight line model).
- Tip! As predictors are added to the model, R-SQUARED can only increase (or, at worst, stay the same). Eventually, we need to adjust this measure to take this into account. See ADJUSTED R-SQUARED.

F(1, 9) = [mean square(model)] / [mean square(residual)] = [4.22106] / [mean square(residual)], with df = 1, 9
This is the overall F test introduced on page 26.

Prob > F = achieved significance level (p-value)
This is the result of the p-value calculation for the F test. Thus, it is the probability of an F statistic value as extreme or more extreme than the value attained for the observed data, under the null hypothesis assumption. In this example, p-value < 0.0001, prompting rejection of the null hypothesis $H_O$. We conclude that the fitted line is a statistically significant improvement in the model of the outcome Y compared to using the simple model of the average Y.

Root MSE = $\sqrt{\text{mean square(residual)}}$
This is used in many hypothesis test and confidence interval calculations.

e. Hypothesis Testing

Simple Linear Regression Model: $Y = \beta_0 + \beta_1 X$

While there are more than two statistical tests, there are just two hypothesis test questions in simple linear regression:
1. Does the fit of a straight line explain statistically significantly more of the variability in outcomes than the null model that says there is no systematic relationship between X and Y?
   - Overall F-test
   - t-test of zero slope
   - t-test of zero correlation
   Tip! These are all equivalent!
2. Given the fit of a straight line relationship, is the intercept statistically significantly different from zero; that is, does the line pass through the origin?
   - t-test of zero intercept

Overall F-Test
Research Question: Does the fitted regression model explain statistically significantly more of the variability among the outcomes Y than is explained by the average of the Y's?
Assumptions: As before (see page 25).
$H_O$ and $H_A$: $H_O: \beta_1 = 0$; $H_A: \beta_1 \neq 0$
Test Statistic: $F = \dfrac{\text{mean square(model)}}{\text{mean square(residual)}}$, df = 1, (n-2)

Evaluation rule: Under the null hypothesis, F-statistic values will be close to 1. Under the alternative hypothesis, $\beta_1 \neq 0$, F-statistic values will tend to be larger than 1. Thus, our p-value calculation answers: What are the chances of obtaining our value of the F, or one that is larger, if we believe the null hypothesis that $\beta_1 = 0$?

Calculations: For our data, we obtain
$\text{p-value} = \Pr\left[F_{1,(n-2)} \ge \dfrac{\text{mean square(model)}}{\text{mean square(residual)}} \,\middle|\, \beta_1 = 0\right] = \Pr\left[F_{1,9} \ge \text{observed } F\right] \ll .0001$

Evaluate: Under the null hypothesis that $\beta_1 = 0$, the chances of obtaining an F-statistic value as (or more) extreme as ours were less than 1 chance in 10,000. This is a very small likelihood! Statistical rejection.

Interpret: The fitted straight line model explains statistically significantly more of the variability in Y = LOGWT than is explained by the average of LOGWT alone.

Stay tuned. Later, we'll see that the analysis does not stop here.

T-test of Zero Slope

Preliminaries: (1) The overall F test and the test of the slope are equivalent; (2) the test of the slope uses a t-score approach to hypothesis testing; and (3) it can be shown that $\{\text{t-score for slope}\}^2 = \{\text{overall F}\}$.

Research Question: Is the slope $\beta_1 = 0$?
Assumptions: As before.
$H_O$ and $H_A$: $H_O: \beta_1 = 0$; $H_A: \beta_1 \neq 0$
Test Statistic: To compute the t-score, we need an estimate of the standard error of $\hat\beta_1$:
$\widehat{SE}(\hat\beta_1) = \sqrt{\dfrac{\text{mean square(residual)}}{\sum_{i=1}^{n} (X_i - \bar X)^2}}$
Our t-score is therefore:
$t\text{-score} = \dfrac{(\text{observed}) - (\text{expected})}{\widehat{se}(\text{observed})} = \dfrac{\hat\beta_1 - 0}{\widehat{se}(\hat\beta_1)}$, df = (n-2)
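Here is a short Python sketch of the t-test of zero slope, assuming the published chick-embryo LOGWT values (only the first and last appear in these notes). It also confirms preliminary (3): the squared t-score equals the overall F. The p-value is not computed here, because the Python standard library has no t distribution.

```python
import math

# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Syy = sum((y - ybar) ** 2 for y in LOGWT)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))

b1 = Sxy / Sxx
MSS = b1 ** 2 * Sxx       # model sum of squares = b1^2 * sum (X - Xbar)^2
RSS = Syy - MSS           # residual sum of squares
ms_resid = RSS / (n - 2)  # mean square(residual)

se_b1 = math.sqrt(ms_resid / Sxx)  # estimated standard error of the slope
t_slope = (b1 - 0) / se_b1         # (observed - expected) / se, df = n - 2
F = (MSS / 1) / ms_resid           # overall F, df = 1, n - 2

print(f"b1 = {b1:.4f}, se(b1) = {se_b1:.5f}, t = {t_slope:.2f} with df = {n - 2}")
print(f"t squared = {t_slope ** 2:.1f}, overall F = {F:.1f}")
```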

Illustration in Stata
t = (Coef)/(Std. Err.) = 73.18
[Stata coefficient table for logwt on age: columns Coef., Std. Err., t, P>|t|, 95% Conf. Interval; rows age and _cons]

Review: Recall what we mean by a t-score. T = 73.18 says the estimated slope is estimated to be 73.18 standard error units away from its expected value of zero.

Check that $\{\text{t-score}\}^2 = \{\text{Overall F}\}$: $[73.18]^2 \approx 5355.3$, which is close.

Evaluation rule: The p-value calculation answers: Assuming the null hypothesis model ($\beta_1 = 0$), what were the chances of obtaining an estimated slope (0.1959) that is as extreme as 73.18 standard error units away (in either direction!) from its expected value of 0?

Calculations: For our data, we obtain the two sided p-value
$= 2\Pr\left[t_{(n-2)} \ge \left|\dfrac{\hat\beta_1 - 0}{\widehat{se}(\hat\beta_1)}\right|\right] = 2\Pr\left[t_9 \ge 73.18\right] \ll .0001$

Evaluate: Under the null hypothesis that $\beta_1 = 0$, the chances of obtaining a t-statistic value as (or more) extreme as 73.18 were less than 1 chance in 10,000. This is a very small likelihood! Statistical rejection.

Interpret: The interpretation is the same as for the overall F-test.

T-test of Zero Intercept

Tip! This is rarely of interest.

Research Question: Is the intercept $\beta_0 = 0$?
Assumptions: As before.
$H_O$ and $H_A$: $H_O: \beta_0 = 0$; $H_A: \beta_0 \neq 0$
Test Statistic: To compute the t-score for the intercept, we need an estimate of the standard error of $\hat\beta_0$:
$\widehat{SE}(\hat\beta_0) = \sqrt{\text{mean square(residual)}\left[\dfrac{1}{n} + \dfrac{\bar X^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}\right]}$
Our t-score is therefore:
$t\text{-score} = \dfrac{(\text{observed}) - (\text{expected})}{\widehat{se}(\text{observed})} = \dfrac{\hat\beta_0 - 0}{\widehat{se}(\hat\beta_0)}$, df = (n-2)

Illustration in Stata
t = (Coef)/(Std. Err.) for the _cons row
[Stata coefficient table for logwt on age: columns Coef., Std. Err., t, P>|t|, 95% Conf. Interval; rows age and _cons]
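Here is the same calculation for the intercept, as a Python sketch. It makes the same assumption about the chick-embryo LOGWT values (only the first and last appear in these notes).

```python
import math

# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Syy = sum((y - ybar) ** 2 for y in LOGWT)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))

b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
ms_resid = (Syy - b1 * Sxy) / (n - 2)  # RSS / (n - 2)

# se(b0) = sqrt( MS(residual) * [1/n + Xbar^2 / sum (X - Xbar)^2] )
se_b0 = math.sqrt(ms_resid * (1 / n + xbar ** 2 / Sxx))
t_intercept = (b0 - 0) / se_b0  # df = n - 2

print(f"b0 = {b0:.4f}, se(b0) = {se_b0:.5f}, t = {t_intercept:.2f} with df = {n - 2}")
```

The intercept's standard error is roughly ten times the slope's here. That is because the mean age (11) lies far from AGE = 0, where the intercept sits.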

Evaluation rule: The p-value calculation answers: Assuming the null hypothesis model ($\beta_0 = 0$), what were the chances of obtaining an estimated intercept (-2.6893) that is as many standard error units away (in either direction!) from its expected value of 0 as the one observed?

Calculations: For these data the two sided p-value
$= 2\Pr\left[t_{(n-2)} \ge \left|\dfrac{\hat\beta_0 - 0}{\widehat{se}(\hat\beta_0)}\right|\right] = 2\Pr\left[t_9 \ge |\text{observed } t|\right] \ll .0001$

Evaluate: Under the null hypothesis that $\beta_0 = 0$, the chances of obtaining a t-statistic value as (or more) extreme as ours were less than 1 chance in 10,000. This is a very small likelihood! Statistical rejection.

Interpret: Conclude that the intercept is statistically significantly different from zero or, equivalently, that the straight line relationship does not pass through the origin.

f. Confidence Interval Estimation

Simple Linear Regression Model: $Y = \beta_0 + \beta_1 X$

Confidence intervals are helpful in providing information about the range of possible parameter values that are consistent with the observed data. A simple linear regression analysis might include confidence interval estimation of four parameters: (1) slope: $\beta_1$; (2) intercept: $\beta_0$; (3) mean of the population for whom $X = x_0$: $\beta_0 + \beta_1 x_0$; and (4) predicted response for an individual with $X = x_0$: $\beta_0 + \beta_1 x_0$. In all instances, the confidence coefficient is a percentile of the Student t-distribution with df = (n-2).

Parameter | Estimate | SE (Estimate) | Confidence Coefficient
Slope: $\beta_1$ | $\hat\beta_1$ | $\sqrt{\dfrac{\text{mean square(residual)}}{\sum (X_i - \bar X)^2}}$ | Percentile of Student t, df = (n-2)
Intercept: $\beta_0$ | $\hat\beta_0$ | $\sqrt{\text{mean square(residual)}\left[\dfrac{1}{n} + \dfrac{\bar X^2}{\sum (X_i - \bar X)^2}\right]}$ | Percentile of Student t, df = (n-2)
Mean: $\mu_{Y|X=x_0} = \beta_0 + \beta_1 x_0$ | $\hat\beta_0 + \hat\beta_1 x_0$ | $\sqrt{\text{mean square(residual)}\left[\dfrac{1}{n} + \dfrac{(x_0 - \bar X)^2}{\sum (X_i - \bar X)^2}\right]}$ | Percentile of Student t, df = (n-2)
Prediction: $Y_{X=x_0} = \beta_0 + \beta_1 x_0$ | $\hat\beta_0 + \hat\beta_1 x_0$ | $\sqrt{\text{mean square(residual)}\left[1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar X)^2}{\sum (X_i - \bar X)^2}\right]}$ | Percentile of Student t, df = (n-2)

Review of PubHlth 540! (1) For a 95% CI, the correct percentile is the 97.5th percentile; and more generally (2) for a $(1-\alpha)100\%$ CI, the correct percentile is the $(1-\alpha/2)100$th percentile.

Stata illustration for the model which fits Y = LOGWT to X = AGE. How nice, Stata pretty much gives it to you!
[Stata coefficient table for logwt on age: columns Coef., Std. Err., t, P>|t|, 95% Conf. Interval; rows age and _cons]

95% Confidence Interval for the Slope, $\beta_1$
1) Estimate = $\hat\beta_1$ = 0.1959
2) SE (Estimate) = $\widehat{se}(\hat\beta_1)$ = 0.00268
3) Confidence coefficient = 97.5th percentile of Student t = $t_{.975,\, df=9}$ = 2.262
95% Confidence Interval for Slope $\beta_1$ = Estimate ± (confidence coefficient)*SE = 0.1959 ± (2.262)(0.00268) = (0.1898, 0.2019)

95% Confidence Interval for the Intercept, $\beta_0$
1) Estimate = $\hat\beta_0$ = -2.6893
2) SE (Estimate) = $\widehat{se}(\hat\beta_0)$
3) Confidence coefficient = 97.5th percentile of Student t = $t_{.975,\, df=9}$ = 2.262
95% Confidence Interval for Intercept $\beta_0$ = Estimate ± (confidence coefficient)*SE = -2.6893 ± (2.262)$\widehat{se}(\hat\beta_0)$ = (-2.7585, -2.6200)
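Here is a Python sketch of the two confidence intervals above, assuming the published chick-embryo LOGWT values (only the first and last appear in these notes). The confidence coefficient is hard-coded as 2.2621572, the 97.5th percentile of Student t with 9 df, because the Python standard library has no t quantile function. The helper name `ci` is hypothetical.

```python
import math

# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]
T975 = 2.2621572  # 97.5th percentile of Student t, df = 9 (hard-coded)

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Syy = sum((y - ybar) ** 2 for y in LOGWT)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
ms_resid = (Syy - b1 * Sxy) / (n - 2)

se_b1 = math.sqrt(ms_resid / Sxx)
se_b0 = math.sqrt(ms_resid * (1 / n + xbar ** 2 / Sxx))


def ci(estimate, se, coeff=T975):
    """Estimate +/- (confidence coefficient) * SE."""
    return estimate - coeff * se, estimate + coeff * se


ci_slope = ci(b1, se_b1)
ci_intercept = ci(b0, se_b0)
print(f"slope     {b1:.4f}  95% CI ({ci_slope[0]:.4f}, {ci_slope[1]:.4f})")
print(f"intercept {b0:.4f}  95% CI ({ci_intercept[0]:.4f}, {ci_intercept[1]:.4f})")
```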

Confidence Intervals for Predictions

Stata code. Green = comment, black = command, blue = output.

. * Confidence Intervals for Fit of Y=LOGWT to X=AGE
. * Obtain conf coeff as 97.5th percentile of Student t w df=9
. display invttail(9,.025)
2.2621572
. regress logwt age
<output not shown>
. * Obtain predicted values yhat
. predict yhat, xb
. * Obtain se for predicted individual sei
. predict sei, stdf
. * Obtain se for predicted mean semean
. predict semean, stdp
. * 95% Confidence Intervals for Individual Predictions
. generate cllowi = yhat - (2.2621572*sei)
. generate cluppi = yhat + (2.2621572*sei)
. list logwt yhat cllowi cluppi
<output shown below>
. * 95% Confidence Intervals for Mean Predictions
. generate cllowm = yhat - (2.2621572*semean)
. generate cluppm = yhat + (2.2621572*semean)
. list logwt yhat cllowm cluppm
<output shown below>

[Listing: 95% Confidence Intervals for Individual Predictions, columns logwt, yhat, cllowi, cluppi]
[Listing: 95% Confidence Intervals for Mean Predictions, columns logwt, yhat, cllowm, cluppm]
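For readers without Stata, here is a Python sketch of the same confidence intervals for mean and individual predictions. It assumes the published chick-embryo LOGWT values (only the first and last appear in these notes). The two standard-error functions use the formulae from the confidence interval table and correspond to what Stata's `stdp` and `stdf` options of `predict` report. The function names are hypothetical, and the t percentile is hard-coded.

```python
import math

# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]
T975 = 2.2621572  # 97.5th percentile of Student t, df = 9 (hard-coded)

n = len(AGE)
xbar = sum(AGE) / n
ybar = sum(LOGWT) / n
Sxx = sum((x - xbar) ** 2 for x in AGE)
Syy = sum((y - ybar) ** 2 for y in LOGWT)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(AGE, LOGWT))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
root_mse = math.sqrt((Syy - b1 * Sxy) / (n - 2))


def se_mean(x0):
    """SE of the estimated mean of Y at X = x0 (the quantity stdp reports)."""
    return root_mse * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)


def se_indiv(x0):
    """SE for predicting one new individual at X = x0 (the quantity stdf reports)."""
    return root_mse * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / Sxx)


print(f"{'age':>4}{'logwt':>8}{'yhat':>9}{'mean_lo':>9}{'mean_hi':>9}{'ind_lo':>9}{'ind_hi':>9}")
for x, y in zip(AGE, LOGWT):
    yhat = b0 + b1 * x
    print(f"{x:>4}{y:>8.3f}{yhat:>9.4f}"
          f"{yhat - T975 * se_mean(x):>9.4f}{yhat + T975 * se_mean(x):>9.4f}"
          f"{yhat - T975 * se_indiv(x):>9.4f}{yhat + T975 * se_indiv(x):>9.4f}")
```

The individual intervals are always the wider pair, because they add the residual variance (the "1 +" term) to the uncertainty in the fitted mean.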

2. Introduction to Correlation

a. Pearson Product Moment Correlation

What is a correlation coefficient?
A correlation coefficient is a measure of the association between two paired random variables (e.g. height and weight). The Pearson product moment correlation, in particular, is a measure of the strength of the straight line relationship between the two random variables. The Spearman correlation is a measure of the strength of the monotone increasing (or decreasing) relationship between the two random variables.

Formula for the Pearson Product Moment Correlation ρ
The population parameter designation is rho, written as ρ. The estimate of ρ, based on information in a sample, is represented using r.

Some preliminaries:
(1) Suppose we are interested in the correlation between X and Y.
(2) $\widehat{\mathrm{cov}}(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{(n-1)} = \dfrac{S_{xy}}{(n-1)}$. This is the covariance(X, Y).
(3) $\widehat{\mathrm{var}}(x) = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)^2}{(n-1)} = \dfrac{S_{xx}}{(n-1)}$, and similarly
(4) $\widehat{\mathrm{var}}(y) = \dfrac{\sum_{i=1}^{n} (y_i - \bar y)^2}{(n-1)} = \dfrac{S_{yy}}{(n-1)}$

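Pearson Product Moment Correlation
$\hat\rho = r = \dfrac{\widehat{\mathrm{cov}}(x, y)}{\sqrt{\widehat{\mathrm{var}}(x)\,\widehat{\mathrm{var}}(y)}} = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

Tip! If you absolutely have to do it by hand, an equivalent (more calculator friendly) formula is
$\hat\rho = r = \dfrac{\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)/n}{\sqrt{\left[\sum x_i^2 - \left(\sum x_i\right)^2/n\right]\left[\sum y_i^2 - \left(\sum y_i\right)^2/n\right]}}$

The correlation r can take on values between -1 and +1 only. Because the units of x and y cancel in the formula, the correlation coefficient is said to be dimensionless: it is independent of the units of x or y.

Sign of the correlation coefficient (positive or negative) = Sign of the estimated slope $\hat\beta_1$.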
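Here is a Python sketch of the two correlation formulae, assuming the published chick-embryo LOGWT values (only the first and last appear in these notes). It also shows that r is unaffected by a change of scale and origin. Multiplying AGE by 10 and adding 5 to LOGWT, both arbitrary hypothetical changes, leaves r unchanged.

```python
import math

# Chick-embryo data (assumed values): X = AGE, Y = LOGWT.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]


def pearson_r(x, y):
    """r = Sxy / sqrt(Sxx * Syy), using deviations about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def pearson_r_calculator(x, y):
    """The 'calculator friendly' version, built from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    den = math.sqrt((sum(a * a for a in x) - sx ** 2 / n)
                    * (sum(b * b for b in y) - sy ** 2 / n))
    return num / den


r = pearson_r(AGE, LOGWT)
r_calc = pearson_r_calculator(AGE, LOGWT)
# Hypothetical change of scale and origin: AGE times 10, LOGWT plus 5.
r_rescaled = pearson_r([10 * a for a in AGE], [b + 5 for b in LOGWT])
print(f"r = {r:.4f}, calculator form {r_calc:.4f}, after rescaling {r_rescaled:.4f}")
```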

Relationship between slope $\hat\beta_1$ and the sample correlation r

Because $\hat\beta_1 = \dfrac{S_{xy}}{S_{xx}}$ and $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, a little algebra reveals the following interrelationships:
$r = \sqrt{\dfrac{S_{xx}}{S_{yy}}}\,\hat\beta_1$ and $\hat\beta_1 = \sqrt{\dfrac{S_{yy}}{S_{xx}}}\,r$

Thus, beware!!! It is possible to have a very large (positive or negative) r accompanying a slope that is very close to zero, inasmuch as:
- A very large r might reflect a very large $S_{xx}$, all other things equal.
- A very large r might reflect a very small $S_{yy}$, all other things equal.
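You can check the identity $\hat\beta_1 = \sqrt{S_{yy}/S_{xx}}\,r$, and the warning after it, in a few lines of Python. The chick-embryo LOGWT values are assumed as in the published data (only the first and last appear in these notes). The second data set is made up purely for illustration: a nearly flat line with very little scatter.

```python
import math


def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy) for paired data."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxx, syy, sxy


# Chick-embryo data (assumed values): check b1 = sqrt(Syy/Sxx) * r.
AGE = list(range(6, 17))
LOGWT = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]
Sxx, Syy, Sxy = sums_of_squares(AGE, LOGWT)
b1 = Sxy / Sxx
r = Sxy / math.sqrt(Sxx * Syy)
print(f"b1 = {b1:.4f};  sqrt(Syy/Sxx) * r = {math.sqrt(Syy / Sxx) * r:.4f}")

# Made-up data: slope 0.001 with tiny alternating scatter, so Syy is very small.
x_flat = list(range(10))
y_flat = [0.001 * a + (0.0001 if a % 2 == 0 else -0.0001) for a in x_flat]
sxx_f, syy_f, sxy_f = sums_of_squares(x_flat, y_flat)
slope_flat = sxy_f / sxx_f
r_flat = sxy_f / math.sqrt(sxx_f * syy_f)
print(f"nearly flat line: slope = {slope_flat:.5f}, r = {r_flat:.4f}")
```

In the made-up example the slope is about 0.001, yet r is about 0.999. This is the "very small $S_{yy}$" case from the list above.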

b. Hypothesis Test of Zero Correlation
Recall (see page 8): The null hypothesis of zero correlation is equivalent to the null hypothesis of zero slope.
Research Question: Is the correlation ρ = 0? Is the slope β1 = 0?
Assumptions: As before. See page 5.
HO and HA:
HO: ρ = 0
HA: ρ ≠ 0
Test Statistic: A little algebra (not shown) yields a very nice formula for the t-score that we need.
$$t\text{-score} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = (n-2)$$
We can find this information in our output. Recall the first example and the model of Y=LOGWT to X=AGE:
Stata illustration for the model which fits Y=LOGWT to X=AGE.
Tip! The Pearson correlation, r, is the square root of the R-squared in the output, taking the sign of the estimated slope.
[Stata regress output: ANOVA table (Source, SS, df, MS for Model, Residual, Total) with Number of obs = 11, F(1, 9), Prob > F, R-squared, Adj R-squared, Root MSE]
Pearson Correlation, $r = \sqrt{0.9983} = 0.9991$

Substitution into the formula for the t-score yields
$$t\text{-score} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.9991\sqrt{9}}{\sqrt{1-0.9983}} = 72.69$$
Note: The value .9991 in the numerator is $r = \sqrt{R^2} = \sqrt{.9983} = .9991$.
This is very close to the value of the t-score (73.8) that was obtained for testing the null hypothesis of zero slope. The two t-scores are algebraically identical, so the discrepancy is probably rounding error. I did the calculations on my calculator using 4 significant digits. Stata probably used more significant digits. - cb
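The two t-scores agree exactly at full precision. This Python sketch computes $t = r\sqrt{n-2}/\sqrt{1-r^2}$ from the handout's rounded R-squared (.9983, n = 11), and then computes both t-scores on a small hypothetical data set, where they match. The function names are illustrative only.

```python
import math


def t_from_r(r, n):
    """t-score for HO: rho = 0, on df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def t_from_slope(x, y):
    """t-score for HO: beta1 = 0, computed as b1 / se(b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


# Handout values: R-squared = .9983 and n = 11 (df = 9)
t_handout = t_from_r(math.sqrt(0.9983), 11)
print(round(t_handout, 2))

# Hypothetical data: at full precision the two t-scores agree
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)
t_r, t_slope = t_from_r(r, n), t_from_slope(x, y)
print(t_r, t_slope)
```

When r is close to 1, the quantity $1 - r^2$ carries only one or two significant digits after rounding (here .0017). That is why a 4-digit hand calculation can drift from the software value.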

Multivariable Linear Regression
a. Definition, Indicator Variables, and Design Variables
It is possible to consider multiple predictors in a linear regression model, and these can be any mix of continuous or discrete. There is still one outcome variable Y that is continuous and assumed distributed normal.
Example
Consider a hypothetical study of the recognition of vertebral fracture among older hospitalized women. Suppose we are interested in understanding the correlates of the continuous outcome, length of hospital stay, measured in days. We might hypothesize that both older age and history of prior fracture are predictors of longer length of hospital stay.
Y = length of hospital stay
X1 = age, continuous
X2 = history of prior vertebral fracture (1=yes, 0=no)
A multivariable linear model that relates Y to X1 and X2 is the following
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$
The General Multivariable Linear Model
Similarly, it is possible to consider a multivariable model that includes p predictors:
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
p = # predictors, apart from the intercept
Each X1 ... Xp can be either discrete or continuous.
Data are comprised of n data points of the form $(Y_i, X_{1i}, \ldots, X_{pi})$.
For the i-th individual, we have a vector of predictor variable values that is represented
$$\mathbf{X}_i' = \left[X_{1i}, X_{2i}, \ldots, X_{pi}\right]$$

Assumptions
The assumptions required are an extension of the ones we saw previously.
1. The separate observations Y1, Y2, ..., Yn are independent.
2. The values of the predictor variables X1 ... Xp are fixed and measured without error.
3. For each vector value of the predictor variable X=x, the distribution of values of Y follows a normal distribution with mean equal to $\mu_{Y|X=x}$ and common variance equal to $\sigma^2_{Y|x}$.
4. The separate means $\mu_{Y|X=x}$ lie on the line with definition $\mu_{Y|X=x} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$
Indicator Variables (0/1 Predictors)
Indicator variables are commonly used as predictors in multivariable regression models. We let
1 = value of indicator when trait is present
0 = value of indicator when trait is not present
The estimated regression coefficient β associated with an indicator variable has a straightforward interpretation, namely:
β = predicted change in outcome Y that accompanies presence of trait, holding the other predictors constant
Examples of Indicator Variables
SEXF = 1 if individual is female, 0 otherwise
TREAT = 1 if individual received experimental treatment, 0 otherwise

Design Variables (Meaningful sets of 0/1 predictor variables)
What do you do if you have a nominal predictor with more than 2 possible values? Answer: design variables.
Design variables are sets of indicator variables that together define values of nominal variables. If a nominal variable has k possible values, (k-1) indicator variables are needed to distinguish the entire range of possibilities.
Examples of Design Variables
Suppose a randomized trial seeks to compare medical therapy versus angioplasty versus bypass surgery for the treatment of myocardial infarction. Thus, the original treatment variable TREAT is nominal with 3 possible values:
TREAT = 1 if treatment is medical therapy
        2 if treatment is angioplasty
        3 if treatment is bypass surgery
We cannot put TREAT into a regression model as is because the estimated regression coefficient would be uninterpretable. So TREAT is replaced with a set of design variables. For example, we might include the following set:
TR_ANG = 1 if treatment is angioplasty, 0 otherwise
TR_SUR = 1 if treatment is bypass surgery, 0 otherwise
A set of design variables comprised of (3-1) = 2 indicator variables summarizes the three possible values of treatment. The reference category is medical therapy.
Subgroup                   Value of TR_ANG   Value of TR_SUR
TREAT=1 ("medical")        0                 0
TREAT=2 ("angioplasty")    1                 0
TREAT=3 ("surgery")        0                 1
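The mapping in the table above can be written as a small Python sketch. Each level other than the reference gets its own 0/1 indicator, and the reference level (medical therapy, TREAT = 1) is coded as all zeros. The DESIGN dictionary and the function name are hypothetical. For a nominal variable with k levels, DESIGN would list k - 1 entries.

```python
# Hypothetical coding from the example: TREAT = 1 medical, 2 angioplasty, 3 surgery.
# Medical therapy (TREAT = 1) is the reference category, so it gets no indicator.
DESIGN = {"TR_ANG": 2, "TR_SUR": 3}


def design_variables(treat):
    """Map TREAT (1, 2 or 3) to the (3 - 1) = 2 design variables."""
    if treat not in (1, 2, 3):
        raise ValueError(f"TREAT must be 1, 2 or 3, got {treat!r}")
    return {name: int(treat == level) for name, level in DESIGN.items()}


for treat in (1, 2, 3):
    print(treat, design_variables(treat))
```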

Guidelines for the Definition of Indicator and Design Variables
1) Consider the choice of the reference group. Often this choice will be straightforward. It might be one of the following categories of values of the nominal variable:
- The unexposed
- The placebo
- The standard
- The most frequent
2) K levels of the nominal predictor: (K-1) indicator variables
When the number of levels of the nominal predictor variable = k, define (k-1) indicator variables that will identify persons in each of the separate groups, apart from the reference group.
3) In general (this is not hard and fast), treat the (k-1) design variables as a set.
- Enter the set together
- Remove the set together
- In general, retain all (k-1) of the indicator variables, even when only a subset are significant.

b. The Analysis of Variance Table
The ideas of the analysis of variance table introduced previously (see page 0) apply here, as well.
1. TSS: "Total" or "total, corrected"
$$TSS = \sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2$$ is the variability of Y about $\bar{Y}$.
Degrees of freedom = df = (n-1).
2. MSS: "Regression" or "due model"
$$MSS = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2$$ is the variability of $\hat{Y}$ about $\bar{Y}$.
Degrees of freedom = df = p = # predictors apart from intercept
3. RSS: "Residual" or "due error"
$$RSS = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$ is the variability of Y about $\hat{Y}$.
Degrees of freedom = df = (n-1) - (p)
Source             df           Sum of Squares                                    Mean Square
Model              p            $MSS = \sum (\hat{Y}_i - \bar{Y})^2$              (MSS)/p
Residual           (n-1) - p    $RSS = \sum (Y_i - \hat{Y}_i)^2$                  (RSS)/(n-1-p)
Total, corrected   (n-1)        $TSS = \sum (Y_i - \bar{Y})^2$

Overall F Test
The overall F test introduced previously (see page 8) also applies, yielding an overall F-test to assess the significance of the variance explained by the model. Note that the degrees of freedom are different here; this is because there are now p predictors instead of 1 predictor.
HO: β1 = β2 = ... = βp = 0
HA: At least one βi ≠ 0
$$F = \frac{\text{mean square(model)}}{\text{mean square(residual)}} \quad \text{with } df = p,\ (n-1-p)$$
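Here is a minimal Python sketch of the analysis of variance table and the overall F for a multivariable model. It solves the least squares normal equations directly, which is fine for small teaching examples but is not how statistical packages do it. The hospital-stay data are hypothetical (X1 = age, X2 = prior fracture). The sketch confirms that TSS = MSS + RSS, with df = p and (n-1-p).

```python
def ols(X, y):
    """Least squares coefficients [b0, b1, ..., bp] for y on the columns of X.

    X is a list of rows [x1, ..., xp]; an intercept column is added. The
    normal equations (X'X) b = X'y are solved by Gauss-Jordan elimination
    with partial pivoting."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [aij - factor * acj for aij, acj in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def anova_table(X, y):
    """TSS, MSS, RSS, their df, and the overall F = MS(model) / MS(residual)."""
    b = ols(X, y)
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    yhat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    tss = sum((yi - ybar) ** 2 for yi in y)
    mss = sum((fi - ybar) ** 2 for fi in yhat)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    df_model, df_resid = p, (n - 1) - p
    return {
        "TSS": tss, "MSS": mss, "RSS": rss,
        "df": (df_model, df_resid),
        "F": (mss / df_model) / (rss / df_resid),
        "R2": mss / tss,
    }


# Hypothetical data: X1 = age, X2 = prior fracture (1 = yes, 0 = no), Y = days
X = [[70, 0], [75, 1], [80, 0], [72, 1], [85, 1], [68, 0], [90, 1], [77, 0]]
stay = [5, 9, 7, 8, 12, 4, 14, 6]
table = anova_table(X, stay)
print(table)
```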

c. The Partial F Test
The partial F test is a statistical technique for assessing associations while controlling for confounding. It is appropriate only when the two models being compared are hierarchical.
What are hierarchical models?
Hierarchical models are two models of a particular type. The setting is that we are interested in comparing the two models. The descriptor "hierarchical" means that all of the predictors in one of the two models (the smaller model) are contained in the larger model.
For example, suppose we are doing a multiple linear regression analysis of the outcome Y = length of hospital stay. We are interested in comparing two models and they are hierarchical:
Predictors in smaller model = {AGE, SEX}
Predictors in larger model = {AGE, SEX + HISTORY OF FRACTURE}
"Hierarchical" is satisfied because all of the predictors (here, AGE and SEX) that are contained in the smaller model are contained in the larger model.
The important point to note is this. The comparison of these two models is an analysis of the nature and significance of the extra predictor, HISTORY OF FRACTURE, for the prediction of length of hospital stay, adjusting for (controlling for) all of the variables in the smaller model (AGE, SEX).
Thus, the comparison of the hierarchical models is addressing the following question: What is the significance of HISTORY OF FRACTURE for the prediction of Y = length of hospital stay, after controlling for the effects of AGE and SEX?

Statistical Definition of the Partial F Test
Research Question: Does inclusion of the "extra" predictors explain significantly more of the variability in outcome compared to the variability that is explained by the predictors that are already in the model?
HO: Addition of $X_{p+1} \ldots X_{p+k}$ is of no statistical significance for the prediction of Y after controlling for the predictors $X_1 \ldots X_p$, meaning that: $\beta_{p+1} = \beta_{p+2} = \cdots = \beta_{p+k} = 0$
HA: Not
$$F_{PARTIAL} = \frac{\{\text{Extra regression sum of squares}\}/\{\text{Extra regression df}\}}{\{\text{Residual sum of squares}_{LARGE}\}/\{\text{Residual df}_{LARGE}\}} = \frac{\left[MSS(X_1 \ldots X_p X_{p+1} \ldots X_{p+k}) - MSS(X_1 \ldots X_p)\right]/\left[(p+k) - p\right]}{RSS(X_1 \ldots X_p X_{p+1} \ldots X_{p+k})/\left[(n-1) - (p+k)\right]}$$
Numerator df = (p+k) - (p) = k
Denominator df = (n-1) - (p+k)
HO true: The extra predictors are not significant in adjusted analysis. F value = small, p-value = large
HO false: The extra predictors are significant in adjusted analysis. F value = large, p-value = small
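The partial F computation can be sketched in Python as a function of the sums of squares that appear in two ANOVA tables. The p-value helper uses the closed form $P[F_{2,\nu} > f] = (1 + 2f/\nu)^{-\nu/2}$, which is exact only when the numerator df is 2. Other numerator df require an F distribution routine, which the Python standard library does not provide. All numbers below are hypothetical.

```python
def partial_f(mss_large, mss_small, rss_large, n, p, k):
    """Partial F for adding k extra predictors to a model that already has p.

    Numerator df = (p + k) - p = k; denominator df = (n - 1) - (p + k)."""
    df_num = k
    df_den = (n - 1) - (p + k)
    f = ((mss_large - mss_small) / df_num) / (rss_large / df_den)
    return f, df_num, df_den


def pvalue_f_2df(f, df_den):
    """Upper-tail probability P[F(2, df_den) > f].

    Exact closed form that holds ONLY when the numerator df equals 2."""
    return (1 + 2 * f / df_den) ** (-df_den / 2)


# Hypothetical sums of squares: n = 30, p = 2 predictors, k = 2 extra
f, d1, d2 = partial_f(mss_large=46.0, mss_small=40.0, rss_large=50.0, n=30, p=2, k=2)
print(f, d1, d2, pvalue_f_2df(f, d2))

# A small F on (2, 62) df gives a large p-value (about .77): HO is not rejected
print(round(pvalue_f_2df(0.26, 62), 3))
```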

d. Multiple Partial Correlation
The concept of a partial correlation is related to that of a partial F test.
"To what extent are two variables, say X and Y, correlated after accounting for a control variable, say Z?"
Preliminary 1: Regress X on the control variable Z
- Obtain the residuals
- These residuals represent the information in X that is independent of Z
Preliminary 2: Now regress Y on the control variable Z
- Obtain the residuals
- These residuals represent the information in Y that is independent of Z
These two sets of residuals permit you to look at the relationship between X and Y, independent of Z.
Partial correlation (X,Y | controlling for Z) = Correlation (residuals of X regressed on Z, residuals of Y regressed on Z)
If there is more than one control variable Z, the result is a multiple partial correlation.
A nice identity allows us to compute a partial correlation by hand from the sums of squares produced during multivariable model development.
Recall that $R^2$ = [model sum of squares]/[total sum of squares] = MSS / TSS
A squared partial correlation is also a ratio of sums of squares.
Tip! A partial F statistic is a ratio of mean squares.

$$\left[\text{Partial Correlation}(X,Y \mid Z)\right]^2 = \frac{MSS\ \text{Model}(X, Z) - MSS\ \text{Model}(Z \text{ alone})}{RSS\ \text{Residual}(Z \text{ alone model})}$$
The hypothesis test of a zero partial correlation is the partial F test introduced previously.
Research Question: Controlling for Z, is there a linear correlation between X and Y?
HO: $\rho_{XY|Z} = 0$
HA: Not
$$F_{PARTIAL} = \frac{\left[MSS(X, Z) - MSS(Z)\right]/(2-1)}{RSS(X, Z)/\left[(n-1)-2\right]} = \frac{\{\text{Extra regression sum of squares}\}/\{\text{Extra regression df} = 1\}}{\{SS\ \text{Residual}_{LARGE}\}/\{df\ \text{Residual}_{LARGE}\}}$$
Numerator df = (2) - (1) = 1
Denominator df = (n-1) - (2)
Tip! Notice that the denominator of the partial F test contains the residual sum of squares (RSS) for the large model, whereas the denominator of the partial correlation contains the residual sum of squares (RSS) for the small model!
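The two routes to a partial correlation can be compared in a Python sketch on hypothetical data. The first route correlates the residuals of X on Z with the residuals of Y on Z. The second uses the sums-of-squares identity, which yields the squared partial correlation. The two-predictor fit solves the 2x2 normal equations on centered data by Cramer's rule.

```python
import math


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residuals_on(v, z):
    """Residuals from the simple regression of v on z (with intercept)."""
    vc, zc = center(v), center(z)
    b = dot(vc, zc) / dot(zc, zc)
    return [a - b * c for a, c in zip(vc, zc)]


def partial_corr_residuals(x, y, z):
    """Correlation of (X residuals on Z) with (Y residuals on Z)."""
    ex, ey = residuals_on(x, z), residuals_on(y, z)
    return dot(ex, ey) / math.sqrt(dot(ex, ex) * dot(ey, ey))


def partial_corr_sq_from_ss(x, y, z):
    """[MSS(X,Z) - MSS(Z alone)] / RSS(Z alone), with Y as the outcome."""
    yc, xc, zc = center(y), center(x), center(z)
    tss = dot(yc, yc)
    mss_z = dot(yc, zc) ** 2 / dot(zc, zc)   # model with Z alone
    rss_z = tss - mss_z
    # Model with X and Z: 2x2 normal equations on centered data
    a11, a12, a22 = dot(xc, xc), dot(xc, zc), dot(zc, zc)
    c1, c2 = dot(xc, yc), dot(zc, yc)
    det = a11 * a22 - a12 ** 2
    bx = (c1 * a22 - c2 * a12) / det
    bz = (a11 * c2 - a12 * c1) / det
    mss_xz = bx * c1 + bz * c2               # equals b'X'y at the solution
    return (mss_xz - mss_z) / rss_z


x = [2, 4, 5, 7, 8, 10, 11, 13]   # hypothetical
y = [3, 5, 4, 8, 9, 10, 14, 13]   # hypothetical
z = [1, 2, 2, 3, 5, 5, 6, 7]      # hypothetical control variable
r_res = partial_corr_residuals(x, y, z)
r2_ss = partial_corr_sq_from_ss(x, y, z)
print(r_res, r_res ** 2, r2_ss)
```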

Multivariable Model Development
a. Introduction
Tip! Be careful in the use of such "text book" approaches as forward stepwise, backward elimination, best subsets, etc.!!
A detailed discussion of multivariable model development is beyond the scope of this course. However, we introduce the basic ideas and suggest some strategies. These ideas will be seen to be applicable in the setting of logistic regression analysis also.
Appropriate strategies for model development should instead be guided by the goal of the analysis and subject matter considerations. There is no single best strategy.
For example, our approach might be very different depending on the design of our epidemiological study:
Setting/Goal of Analysis: Priorities in Model Development
- Randomized controlled trial (Goal: Does the intervention work?): The treatment variable might be the last variable entered into the model. Thus, we address the question: What is the independent significance of the experimental intervention controlling for all other influences?
- Epidemiological description: We might want to retain as few predictors as possible so as to ensure generalizability of our findings.
- Risk assessment: A public health investigation of hazardous exposures and outcomes might seek to be generous in its identification of possible health risks so as to not miss anything.

b. Example: Human P53 and Breast Cancer Risk
Note: This is not the only strategy.
Source: Matthews et al. Parity Induced Protection Against Breast Cancer 2007.
Background: Substantial epidemiologic evidence suggests that early first pregnancy confers a reduced lifetime risk of breast cancer. In laboratory studies of mice, similar observations have been made. Laboratory studies of mice have also explored the relationship between parity, expression of the tumor suppressor gene p53 and subsequent breast cancer tumor development.
Lesley et al hypothesized that mammary tissue cultured from women who had an early full term pregnancy would have increased levels of p53 as compared to nulliparous women and as compared to women whose first full term pregnancy was later in life.
Research Question: What is the relationship of Y=p53 expression to parity and age at first pregnancy, after adjustment for current age and established breast cancer risk factors, specifically the following: age at first menses, family history of breast cancer, menopausal status, and history of oral contraceptive use?
Note: Age at first pregnancy is considered in each of two ways: (1) continuous, in years; and (2) age at first pregnancy < 24 years versus age at first pregnancy > 24 years.
Design: Observational cohort.

Characteristics of Analysis Sample (n=68)
Continuous characteristics
  Age, years: mean 39
  Age at First Menses, years
  Age at First Pregnancy, years (n=51, parous only)
  P53 Score: mean 3.5
Categorical characteristics                  n     %
  Family History Breast Cancer              20   29%
  Hormone Replacement Therapy User           3    4%
  Post Menopausal                           19   28%
  Oral Contraceptive User                   56   82%
Parity Status (nominal!)
  Nulliparous                               17   25%
  Early Parous (< 24 years)                 32   47%
  Late Parous (> 24 years)                  19   28%
Number of Pregnancies (ordinal!)
  0                                         17   25%
  1                                          9   13%
  2                                         24   35%
  3 or 4                                    18   26%

Study Variables (Variable: Label; Definition/Codings)
Outcome, Y
  p53: P53; continuous
Predictors of interest
  parous: Parity status; 1 = ever parous, 0 = not
  pregnum: Number of pregnancies; 0 = 0 pregnancies, 1 = 1 pregnancy, 2 = 2 pregnancies, 3 = 3+ pregnancies
  one: 0/1 indicator of 1 pregnancy; 1 if (pregnum=1), 0 otherwise
  two: 0/1 indicator of 2 pregnancies; 1 if (pregnum=2), 0 otherwise
  threep: 0/1 indicator of 3 or more pregnancies; 1 if (pregnum=3), 0 otherwise
  agepreg: Age at first pregnancy; continuous, years; missing for never parous
  early: 0/1 indicator first pregnancy at age < 24; 1 = yes, 0 = no; missing for never parous
  late: 0/1 indicator first pregnancy at age > 24; 1 = yes, 0 = no; missing for never parous
Potential Covariates
  agecurr: Current age; continuous, years
  agemen: Age at first menses; continuous, years
  famhx: 0/1 indicator of family history of breast cancer; 1 if any family hx of breast ca, 0 otherwise
  menop: 0/1 indicator of post-menopause; 1 if yes, 0 otherwise
  oc: 0/1 indicator of ever used oral contraceptives; 1 if yes, 0 otherwise

Step 1: Fit one predictor models.
- Retain for further consideration: predictors with crude significance levels of association p < .25
- Retain for further consideration: predictors of a priori interest that you know you want to retain (for example: a blocking variable such as study site in a multicenter randomized trial)
- Report estimated regression coefficients, SE, 95% confidence intervals, p
Example
[Table: one predictor models. Columns: Predictor; R² = % Variance Explained; Significance of Overall F Test; Remark. Rows: parous; one, two, threep*; agepreg; early, late; agecurr; agemen; famhx; menop; oc. Remark "Retain for further consideration" for parous; one, two, threep; agepreg; early, late.]
* Note: I did not assume that Y=p53 is linearly related to PREGNUM. Thus, instead of using PREGNUM as is, this predictor is replaced by three design variables: ONE, TWO and THREEP. The referent group is thus PREGNUM=0, representing nulliparous.
[Table: estimates from the one predictor models. Columns: Predictor, X; $\hat{\beta} = \Delta p53/\Delta X$; $se(\hat{\beta})$; 95% CI; p*. Rows: Parous, One, Two, Threep, Agepreg, Early, Late, agecurr, agemen, Famhx, Menop, Oc.]
* Note: Significance of t-test for predictor, adjusted for the other design variables.

Step 2: Fit a step 2 multiple linear model.
- The predictors in this model are the candidates from step 1.
- Retain for further consideration: predictors with adjusted significance levels of association p < .10
- Retain for further consideration: predictors of a priori interest, regardless of the significance of their crude associations in step 1.
Example - continued
Caution!! In fitting a multiple predictor model with design variables, especially, care needs to be taken to avoid what is called overfitting. Overfitting occurs when two variables have definitions that are equivalent. It also occurs when two or more predictors in the model are collinear.
In this example, to avoid overfitting:
(1) Parity will be modeled using TWO and THREEP
(2) Age at first pregnancy will be modeled using EARLY and LATE
[Table: ADJUSTED estimates. Columns: Predictor, X; $\hat{\beta} = \Delta p53/\Delta X$; $se(\hat{\beta})$; 95% CI; p*. Rows: Two, Threep, Early, Late.]
* Note: Significance of Wald t-test for predictor, adjusted for the other variables in the model.

Step 3: Fit a step 3 multiple linear model.
- The predictors in this model are a subset of the predictors in step 2, and are the ones with adjusted significance levels p < .10
- Compare the step 2 and step 3 models using a partial F test.
[Table: Step 2 model (Source, SSQ, DF, MSQ = SSQ/DF). Model: two, threep, early, late; Residual; Total, corrected.]
[Table: Step 3 model (Source, SSQ, DF, MSQ = SSQ/DF). Model: two, threep; Residual; Total, corrected.]
$$\text{Partial } F_{2,62} = \frac{\left[SS\ \text{model}(4 \text{ predictor model}) - SS\ \text{model}(2 \text{ predictor model})\right]/(4-2)}{SS\ \text{residual}(4 \text{ predictor model})/\left[(n-1)-4\right]} = 0.26$$
p-value = Probability [$F_{2,62}$ > 0.26] = .7730
This is not statistically significant, suggesting that EARLY and LATE are not significant predictors of P53 after adjustment for TWO and THREEP. Consider dropping EARLY and LATE.

Step 4: Investigate confounding by considering as possible confounders the extra variables in the step 2 model that are not in the step 3 model.
- For each confounder, one at a time, perform a partial F test that compares reduced (small) model = step 3 model with the full (large) model = step 3 model + confounder of interest.
A Suggested Statistical Criterion for Determination of Confounding
A variable Z might be judged to be a confounder of an X-Y relationship if BOTH of the following are satisfied:
1) Its inclusion in a model that already contains X as a predictor has adjusted significance level < .10 or < .05; and
2) Its inclusion in the model alters the estimated regression coefficient for X by 15-20% or more, relative to the model that contains only X as a predictor.
$$\Delta\beta = \frac{\hat{\beta}_{\text{with confounder}} - \hat{\beta}_{\text{without confounder}}}{\hat{\beta}_{\text{with confounder}}} \times 100$$
Example: No evidence of confounding of the Y=p53, X=TWO relationship by EARLY or LATE
[Table: Potential Confounder = EARLY, LATE. Rows: significance of 1 df partial F test; $\hat{\beta}$(TWO) WITH confounder; $\hat{\beta}$(TWO) WITHOUT confounder; Δβ = 4.4% (EARLY), 6.7% (LATE).]
Example: No evidence of confounding of the Y=p53, X=THREEP relationship by EARLY or LATE
[Table: Potential Confounder = EARLY, LATE. Rows: significance of 1 df partial F test; $\hat{\beta}$(THREEP) WITH confounder; $\hat{\beta}$(THREEP) WITHOUT confounder; Δβ.]
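The two-part confounding criterion can be written out as a Python sketch. Δβ is computed relative to the estimate WITH the potential confounder, as in the formula above. The cutoffs alpha = .10 and 15% are assumptions, taken from the lower ends of the suggested ranges. The function names and the coefficient values are hypothetical.

```python
def delta_beta_pct(beta_with, beta_without):
    """Percent change in the coefficient for X, relative to the estimate
    WITH the potential confounder (the denominator used above)."""
    return 100 * (beta_with - beta_without) / beta_with


def is_confounder(p_partial_f, beta_with, beta_without,
                  alpha=0.10, min_change_pct=15.0):
    """Both criteria: the 1 df partial F test for Z is significant at alpha,
    AND the coefficient for X moves by at least min_change_pct percent."""
    change = abs(delta_beta_pct(beta_with, beta_without))
    return p_partial_f < alpha and change >= min_change_pct


# Hypothetical estimates for the coefficient of X
print(delta_beta_pct(0.50, 0.40))        # about 20 (percent)
print(is_confounder(0.03, 0.50, 0.40))   # both criteria met
print(is_confounder(0.03, 1.00, 0.96))   # change of only about 4%
print(is_confounder(0.40, 0.50, 0.40))   # partial F not significant
```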

Step 5: Investigate effect modification, considering as your starting point the step 4 model.
- Begin with your "near final" model; this is your step 4 model.
- Create interaction variables. These will be defined as pairwise products of the predictor variables.
- For each interaction variable, one at a time, perform a partial F test that compares reduced model = step 4 model with full model = step 4 model + interaction variable.
A Suggested Statistical Criterion for Assessment of Interaction
A candidate interaction variable might be judged to be worth retaining in the model if BOTH of the following are satisfied:
1) The partial F test for its inclusion has significance level < .05.
2) Its inclusion in the model alters the estimated regression coefficient for the main effects by 15-20% or more.
Example
The results to this point suggest that a good model is one containing TWO and THREEP. The potential predictors EARLY and LATE were not significant after adjustment for TWO and THREEP. Thus, we might stop here and not explore potential effect modification. However, one of the hypotheses noted in the background (see again the Background for this example) expresses an interest in the possibility that first pregnancy at an earlier age might influence P53. So we will explore it here. This also has the advantage of illustrating the mechanics of step 5.

[Table: Step 5 model, contains potential modifier (Source, SSQ, DF, MSQ = SSQ/DF). Model: two, threep, early; Residual; Total, corrected.]
[Table: Step 4 model (Source, SSQ, DF, MSQ = SSQ/DF). Model: two, threep; Residual; Total, corrected.]
$$\text{Partial } F_{1,63} = \frac{\left[SS\ \text{model}(3 \text{ predictor model}) - SS\ \text{model}(2 \text{ predictor model})\right]/(3-2)}{SS\ \text{residual}(3 \text{ predictor model})/\left[(n-1)-3\right]}$$
p-value = Probability [$F_{1,63}$ > observed F] = .547: not significant; the null is NOT rejected.
Conclude: no evidence of modification of TWO and THREEP by EARLY.
Thus, the final model is
$$\widehat{p53} = \hat{\beta}_0 + \hat{\beta}_1 \cdot TWO + \hat{\beta}_2 \cdot THREEP$$
% variance explained = 20.7%
Significance of overall F test = .0006
The significance of the overall F test can be seen from the Stata output:
[Stata regress output: Number of obs = 67; F(2, 64) = 8.35; Prob > F = 0.0006; Root MSE = .95363]

c. Guidelines for Multivariable Analysis of Large Data Sets
#1. State the Research Questions.
Aim for a focus that is explicit, complete, and focused, including:
- Statement of population
- Definition of outcome
- Specification of hypotheses (predictor-outcome relationships)
- Identification of (including nature of) hypothesized covariate relationships
#2. Define the Analysis Variables.
For each research question, note for each analysis variable its hypothesized role:
- Outcome
- Predictor
- Confounder
- Effect Modifier
- Intermediary (also called intervening)
#3. Prepare a "Clean" Data Set Ready for Analysis (Data Management)
For each variable, check its distribution, especially:
- Completeness
- Occurrence of logical errors
- Within form consistency
- Between form consistency
- Range

#4. Describe the Analysis Data
This description serves three purposes:
1) Identifies the population actually represented by the sample
2) Defines the range(s) of relationships that can be explored
3) Identifies, tentatively, the functional form of the relationships
Methods include:
- Frequency distributions for discrete variables
- Mean, standard deviation, percentiles for continuous variables
- Bar charts
- Box and whisker plots
- Scatter plots
#5. Assessment of Confounding
The identification of confounders is needed for the correct interpretation of the predictor-outcome relationships. Confounders need to be controlled in analyses of predictor-outcome relationships.
Methods include:
- Cross-tabulations and single predictor regression models to determine whether suspected confounders are predictive of outcome and are related to the predictor of interest.
- This step should include a determination that there is a confounder-exposure relationship among controls.
#6. Single Predictor Regression Model Analyses
The fit of these models identifies the nature and magnitude of crude associations. It also permits assessment of the appropriateness of the assumed functional form of the predictor-outcome relationship.
- Cross-tabulations
- Graphical displays (Scatter plots)
- Estimation of single predictor models

Goodness-of-Fit and Regression Diagnostics
a. Introduction and Terminology
Neither prediction nor estimation has meaning when the estimated model is a poor fit to the data:
Our eye tells us:
- A better fitting relationship between X and Y is quadratic
- We notice different sizes of discrepancies
- Some observed Y are close to the fitted $\hat{Y}$ (e.g. near X=8)
- Other observed Y are very far from the fitted $\hat{Y}$ (e.g. near X=5)
Poor fits of the data to a fitted line can occur for several reasons, and can occur even when the fitted line explains a large proportion ($R^2$) of the total variability in response:
- The wrong functional form (link function) was fit.
- Extreme values (outliers) exhibit uniquely large discrepancies between observed and fitted values.
- One or more important explanatory variables have been omitted.
- One or more model assumptions have been violated.

Consequences of a poor fit include:
- We learn the wrong biology.
- Comparisons of group differences aren't "fair" because they are unduly influenced by a minority.
- Comparisons of group means aren't "fair" because we used the wrong standard error.
- Predictions are wrong because the fitted model does not apply to the case of interest.
Available techniques of goodness-of-fit assessment are of two types:
1. Systematic: those that explore the appropriateness of the model itself
   Have we fit the correct model? Should we fit another model?
2. Case Analysis: those that investigate the influence of individual data points
   Are there a small number of individuals whose inclusion in the analysis influences excessively the choice of the fitted model?

Systematic Component Goodness-of-Fit Assessment: Some Terminology
Link: The functional form (and the assumed underlying distribution of the errors) is sometimes called the link.
Example: When μ is a mean, $\mu = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$ is called the linear link.
Example: When μ is a proportion, we might model $\ln\left[\mu/(1-\mu)\right] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$. This is called the logit link.
Normality: In the linear model regression analysis, we assume that the errors ε follow a Normal(0, σ²) distribution.
Recall: The errors ε are estimated by the residuals e.
Heteroscedasticity: If the assumption of constant variance of the errors ε is not true, we say there is heteroscedasticity of errors, or non-homogeneity of errors.

Goodness-of-Fit Assessment: Some Terminology - continued
Case Analysis
Residual: The residual is the difference between the observed outcome Y and the fitted outcome $\hat{Y}$.
$$e = Y - \hat{Y}$$
It estimates the unobservable error ε.
Outlier: An outlier is an observation whose residual is unusually large.
Note: As before, we will rescale the sizes of the residuals via standardization so that we can interpret their magnitudes on the scale of SE units.
Leverage: The leverage is a measure of the unusualness of the value of the predictor X.
Leverage = distance (observed X, center of X in sample)
Predictor values with high leverages have, potentially, a large influence on the choice of the fitted model.
Influence: Measures of influence gauge the change in the fitted model with the omission of the data point.
Example: Cook's Distance
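For simple linear regression the three case-analysis quantities have closed forms. This Python sketch uses them: leverage $h_i = 1/n + (x_i-\bar{x})^2/S_{xx}$, the internally studentized residual $e_i/\sqrt{s^2(1-h_i)}$, and Cook's distance $r_i^2 h_i/[p'(1-h_i)]$, where p' = 2 is the number of regression parameters. The data are hypothetical. The last point sits far out in X and off the trend of the others, so it has both the largest leverage and the largest Cook's distance.

```python
import math


def case_diagnostics(x, y):
    """Residuals, leverages, studentized residuals and Cook's distances
    for the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [b - (b0 + b1 * a) for a, b in zip(x, y)]            # residuals
    s2 = sum(ei ** 2 for ei in e) / (n - 2)                   # MSE
    h = [1 / n + (a - xbar) ** 2 / sxx for a in x]            # leverages
    stud = [ei / math.sqrt(s2 * (1 - hi)) for ei, hi in zip(e, h)]
    p_params = 2                                              # b0 and b1
    cook = [ri ** 2 * hi / (p_params * (1 - hi)) for ri, hi in zip(stud, h)]
    return e, h, stud, cook


x_obs = [1, 2, 3, 4, 5, 6, 7, 8, 20]                          # hypothetical
y_obs = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 8.0, 12.0]        # hypothetical
e, h, stud, cook = case_diagnostics(x_obs, y_obs)
for row in zip(x_obs, e, h, stud, cook):
    print("x=%5.1f  e=%6.2f  h=%5.3f  r=%6.2f  D=%7.3f" % row)
```

The leverages sum to 2, the number of regression parameters. Note that the influential last point does not have the largest raw residual, because it pulls the fitted line toward itself. This is why residuals alone are not enough and case analysis examines all three quantities.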

A Feel for Residual, Leverage, Influence
Large residuals may or may not be influential
[Figure, first panel (y versus x): Large residual, low leverage. The large residual exerts a large influence.]
[Figure, second panel (y versus x): Large residual, low leverage. Despite its size, the large residual exerts only small influence.]

A Feel for Residual, Leverage, Influence
High leverage may or may not be influential
[Figure, first panel (y versus x): High leverage, small residual. The high leverage exerts a large influence.]
[Figure, second panel (y versus x): High leverage, small residual. Despite its size, the large leverage exerts only small influence.]
Thus, case analysis is needed to discover all of:
- high leverage
- large residuals
- large influence

Overview of Techniques of Goodness-of-Fit Assessment: Linear Model
Systematic Component (Question Addressed: Procedure)
- Error Distribution: Is it reasonable to assume a normal distribution of errors with a constant variance? HO: ε ~ Normal(0, σ²). Procedure: Shapiro-Wilk test of normality; Cook-Weisberg test of heteroscedasticity
- Functional Form: Is the choice of functional form relating the predictors to outcome a good one? Procedure: Method of fractional polynomials.
- Systematic Violation: Have we failed to include any important explanatory (predictor) variables? Procedure: Ramsey Test for omitted variables.
Case Analysis (Question Addressed: Procedure)
- Are there outliers with respect to the outcome values? Studentized residuals
- Are there outliers with respect to the predictor variable values? Leverage
- Are there individual observations with unduly large influence on the fitted model? Cook's distance (influence)

b. Assessment of Normality
Recall what we are assuming with respect to normality:
Simple Linear Regression: At each level x of the predictor variable X, the outcomes Y are distributed normal with mean $= \mu_{Y|x} = \beta_0 + \beta_1 x$ and constant variance $\sigma^2_{Y|x}$
Multiple Linear Regression: At each vector level $\mathbf{x} = [x_1, x_2, \ldots, x_p]$ of the predictor vector X, the outcomes Y are distributed normal with mean $= \mu_{Y|\mathbf{x}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$ and constant variance $\sigma^2_{Y|\mathbf{x}}$
[Figure: what this looks like (courtesy of a picture on the web!)]
Violations of Normality are sometimes, but not always, a serious problem
- When not to worry: Estimation and hypothesis tests of regression parameters are fairly robust to modest violations of normality
- When to worry: Predictions are sensitive to violations of normality
- Beware: Sometimes the cure for violations of normality is worse than the problem.

Some graphical assessments of normality and what to watch out for:
Method: What to watch out for
1. Histogram of outcome variable Y and/or histogram of residuals: Look for normal shape of the histogram.
2. Histogram of residuals (or studentized or jackknife residuals): Look for normal shape of the histogram.
3. Quantile-quantile plot of the quantiles of the residuals versus the quantiles of the assumed normal distribution of the residuals: Normally distributed residuals will appear, approximately, linear.
Stata Illustration (note: This example uses a data set from another source, not this lecture)
Histogram with overlay normal; Quantile-Quantile Plot with reference = Normal
. histogram weight, normal title("Histogram with Overlay Normal")
. qnorm weight, title("Simple Normal QQ-Plot for Y=Weight")
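For readers without Stata, this Python sketch computes the points that a normal quantile-quantile plot displays: each sorted observation paired with the corresponding quantile of a normal distribution that has the sample's mean and standard deviation. The plotting positions (i - 0.5)/n are one common convention. They are an assumption here, and Stata's qnorm may use a different rule. The weight values are hypothetical. If the points fall close to a straight line, normality is reasonable.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """(theoretical normal quantile, observed value) pairs for a normal QQ plot.

    The reference normal has the sample mean and SD; plotting positions
    (i - 0.5)/n are one common convention."""
    n = len(values)
    ref = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    return [(ref.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(ordered, start=1)]


weights = [61.2, 58.4, 72.9, 66.0, 70.3, 63.8, 59.9, 68.1, 75.5, 64.7]  # hypothetical, kg
points = normal_qq_points(weights)
for q, v in points:
    print(f"{q:7.2f} {v:7.2f}")
```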


More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

Lecture Notes Types of economic variables

Lecture Notes Types of economic variables Lecture Notes 3 1. Types of ecoomc varables () Cotuous varable takes o a cotuum the sample space, such as all pots o a le or all real umbers Example: GDP, Polluto cocetrato, etc. () Dscrete varables fte

More information

Summary of the lecture in Biostatistics

Summary of the lecture in Biostatistics Summary of the lecture Bostatstcs Probablty Desty Fucto For a cotuos radom varable, a probablty desty fucto s a fucto such that: 0 dx a b) b a dx A probablty desty fucto provdes a smple descrpto of the

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

Statistics: Unlocking the Power of Data Lock 5

Statistics: Unlocking the Power of Data Lock 5 STAT 0 Dr. Kar Lock Morga Exam 2 Grades: I- Class Multple Regresso SECTIONS 9.2, 0., 0.2 Multple explaatory varables (0.) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (0.2) Exam 2 Re- grades Re-

More information

Chapter 14 Logistic Regression Models

Chapter 14 Logistic Regression Models Chapter 4 Logstc Regresso Models I the lear regresso model X β + ε, there are two types of varables explaatory varables X, X,, X k ad study varable y These varables ca be measured o a cotuous scale as

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

Probability and. Lecture 13: and Correlation

Probability and. Lecture 13: and Correlation 933 Probablty ad Statstcs for Software ad Kowledge Egeers Lecture 3: Smple Lear Regresso ad Correlato Mocha Soptkamo, Ph.D. Outle The Smple Lear Regresso Model (.) Fttg the Regresso Le (.) The Aalyss of

More information

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model 1. Estmatg Model parameters Assumptos: ox ad y are related accordg to the smple lear regresso model (The lear regresso model s the model that says that x ad y are related a lear fasho, but the observed

More information

Econometric Methods. Review of Estimation

Econometric Methods. Review of Estimation Ecoometrc Methods Revew of Estmato Estmatg the populato mea Radom samplg Pot ad terval estmators Lear estmators Ubased estmators Lear Ubased Estmators (LUEs) Effcecy (mmum varace) ad Best Lear Ubased Estmators

More information

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use. INTRODUCTORY NOTE ON LINEAR REGREION We have data of the form (x y ) (x y ) (x y ) These wll most ofte be preseted to us as two colum of a spreadsheet As the topc develops we wll see both upper case ad

More information

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uverst Mchael Bar Sprg 5 Mdterm xam, secto Soluto Thursda, Februar 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes exam.. No calculators of a kd are allowed..

More information

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger Example: Multple lear regresso 5000,00 4000,00 Tro Aders Moger 0.0.007 brthweght 3000,00 000,00 000,00 0,00 50,00 00,00 50,00 00,00 50,00 weght pouds Repetto: Smple lear regresso We defe a model Y = β0

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model ECON 48 / WH Hog The Smple Regresso Model. Defto of the Smple Regresso Model Smple Regresso Model Expla varable y terms of varable x y = β + β x+ u y : depedet varable, explaed varable, respose varable,

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the

More information

Regresso What s a Model? 1. Ofte Descrbe Relatoshp betwee Varables 2. Types - Determstc Models (o radomess) - Probablstc Models (wth radomess) EPI 809/Sprg 2008 9 Determstc Models 1. Hypothesze

More information

ESS Line Fitting

ESS Line Fitting ESS 5 014 17. Le Fttg A very commo problem data aalyss s lookg for relatoshpetwee dfferet parameters ad fttg les or surfaces to data. The smplest example s fttg a straght le ad we wll dscuss that here

More information

Linear Regression with One Regressor

Linear Regression with One Regressor Lear Regresso wth Oe Regressor AIM QA.7. Expla how regresso aalyss ecoometrcs measures the relatoshp betwee depedet ad depedet varables. A regresso aalyss has the goal of measurg how chages oe varable,

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

Functions of Random Variables

Functions of Random Variables Fuctos of Radom Varables Chapter Fve Fuctos of Radom Varables 5. Itroducto A geeral egeerg aalyss model s show Fg. 5.. The model output (respose) cotas the performaces of a system or product, such as weght,

More information

Lecture 1 Review of Fundamental Statistical Concepts

Lecture 1 Review of Fundamental Statistical Concepts Lecture Revew of Fudametal Statstcal Cocepts Measures of Cetral Tedecy ad Dsperso A word about otato for ths class: Idvduals a populato are desgated, where the dex rages from to N, ad N s the total umber

More information

Logistic regression (continued)

Logistic regression (continued) STAT562 page 138 Logstc regresso (cotued) Suppose we ow cosder more complex models to descrbe the relatoshp betwee a categorcal respose varable (Y) that takes o two (2) possble outcomes ad a set of p explaatory

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1 STA 08 Appled Lear Models: Regresso Aalyss Sprg 0 Soluto for Homework #. Let Y the dollar cost per year, X the umber of vsts per year. The the mathematcal relato betwee X ad Y s: Y 300 + X. Ths s a fuctoal

More information

ε. Therefore, the estimate

ε. Therefore, the estimate Suggested Aswers, Problem Set 3 ECON 333 Da Hugerma. Ths s ot a very good dea. We kow from the secod FOC problem b) that ( ) SSE / = y x x = ( ) Whch ca be reduced to read y x x = ε x = ( ) The OLS model

More information

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance Chapter, Part A Aalyss of Varace ad Epermetal Desg Itroducto to Aalyss of Varace Aalyss of Varace: Testg for the Equalty of Populato Meas Multple Comparso Procedures Itroducto to Aalyss of Varace Aalyss

More information

Module 7: Probability and Statistics

Module 7: Probability and Statistics Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to

More information

Chapter Two. An Introduction to Regression ( )

Chapter Two. An Introduction to Regression ( ) ubject: A Itroducto to Regresso Frst tage Chapter Two A Itroducto to Regresso (018-019) 1 pg. ubject: A Itroducto to Regresso Frst tage A Itroducto to Regresso Regresso aalss s a statstcal tool for the

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uverst Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn:

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn: Chapter 3 3- Busess Statstcs: A Frst Course Ffth Edto Chapter 2 Correlato ad Smple Lear Regresso Busess Statstcs: A Frst Course, 5e 29 Pretce-Hall, Ic. Chap 2- Learg Objectves I ths chapter, you lear:

More information

: At least two means differ SST

: At least two means differ SST Formula Card for Eam 3 STA33 ANOVA F-Test: Completely Radomzed Desg ( total umber of observatos, k = Number of treatmets,& T = total for treatmet ) Step : Epress the Clam Step : The ypotheses: :... 0 A

More information

CHAPTER 2. = y ˆ β x (.1022) So we can write

CHAPTER 2. = y ˆ β x (.1022) So we can write CHAPTER SOLUTIONS TO PROBLEMS. () Let y = GPA, x = ACT, ad = 8. The x = 5.875, y = 3.5, (x x )(y y ) = 5.85, ad (x x ) = 56.875. From equato (.9), we obta the slope as ˆβ = = 5.85/56.875., rouded to four

More information

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution: Chapter 4 Exercses Samplg Theory Exercse (Smple radom samplg: Let there be two correlated radom varables X ad A sample of sze s draw from a populato by smple radom samplg wthout replacemet The observed

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER II STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Postpoed exam: ECON430 Statstcs Date of exam: Jauary 0, 0 Tme for exam: 09:00 a.m. :00 oo The problem set covers 5 pages Resources allowed: All wrtte ad prted

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

C. Statistics. X = n geometric the n th root of the product of numerical data ln X GM = or ln GM = X 2. X n X 1

C. Statistics. X = n geometric the n th root of the product of numerical data ln X GM = or ln GM = X 2. X n X 1 C. Statstcs a. Descrbe the stages the desg of a clcal tral, takg to accout the: research questos ad hypothess, lterature revew, statstcal advce, choce of study protocol, ethcal ssues, data collecto ad

More information

Simple Linear Regression - Scalar Form

Simple Linear Regression - Scalar Form Smple Lear Regresso - Scalar Form Q.. Model Y X,..., p..a. Derve the ormal equatos that mmze Q. p..b. Solve for the ordary least squares estmators, p..c. Derve E, V, E, V, COV, p..d. Derve the mea ad varace

More information

Regression. Linear Regression. A Simple Data Display. A Batch of Data. The Mean is 220. A Value of 474. STAT Handout Module 15 1 st of June 2009

Regression. Linear Regression. A Simple Data Display. A Batch of Data. The Mean is 220. A Value of 474. STAT Handout Module 15 1 st of June 2009 STAT Hadout Module 5 st of Jue 9 Lear Regresso Regresso Joh D. Sork, M.D. Ph.D. Baltmore VA Medcal Ceter GRCC ad Uversty of Marylad School of Medce Claude D. Pepper Older Amercas Idepedece Ceter Reducg

More information

STA 105-M BASIC STATISTICS (This is a multiple choice paper.)

STA 105-M BASIC STATISTICS (This is a multiple choice paper.) DCDM BUSINESS SCHOOL September Mock Eamatos STA 0-M BASIC STATISTICS (Ths s a multple choce paper.) Tme: hours 0 mutes INSTRUCTIONS TO CANDIDATES Do ot ope ths questo paper utl you have bee told to do

More information

Chapter 11 The Analysis of Variance

Chapter 11 The Analysis of Variance Chapter The Aalyss of Varace. Oe Factor Aalyss of Varace. Radomzed Bloc Desgs (ot for ths course) NIPRL . Oe Factor Aalyss of Varace.. Oe Factor Layouts (/4) Suppose that a expermeter s terested populatos

More information

Chapter 5 Properties of a Random Sample

Chapter 5 Properties of a Random Sample Lecture 6 o BST 63: Statstcal Theory I Ku Zhag, /0/008 Revew for the prevous lecture Cocepts: t-dstrbuto, F-dstrbuto Theorems: Dstrbutos of sample mea ad sample varace, relatoshp betwee sample mea ad sample

More information

residual. (Note that usually in descriptions of regression analysis, upper-case

residual. (Note that usually in descriptions of regression analysis, upper-case Regresso Aalyss Regresso aalyss fts or derves a model that descres the varato of a respose (or depedet ) varale as a fucto of oe or more predctor (or depedet ) varales. The geeral regresso model s oe of

More information

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67.

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67. Ecoomcs 3 Itroducto to Ecoometrcs Sprg 004 Professor Dobk Name Studet ID Frst Mdterm Exam You must aswer all the questos. The exam s closed book ad closed otes. You may use your calculators but please

More information

Applied Statistics and Probability for Engineers, 5 th edition February 23, b) y ˆ = (85) =

Applied Statistics and Probability for Engineers, 5 th edition February 23, b) y ˆ = (85) = Appled Statstcs ad Probablty for Egeers, 5 th edto February 3, y.8.7.6.5.4.3.. -5 5 5 x b) y ˆ.3999 +.46(85).6836 c) y ˆ.3999 +.46(9).744 d) ˆ.46-3 a) Regresso Aalyss: Ratg Pots versus Meters per Att The

More information

Chapter 3 Sampling For Proportions and Percentages

Chapter 3 Sampling For Proportions and Percentages Chapter 3 Samplg For Proportos ad Percetages I may stuatos, the characterstc uder study o whch the observatos are collected are qualtatve ature For example, the resposes of customers may marketg surveys

More information

Statistics. Correlational. Dr. Ayman Eldeib. Simple Linear Regression and Correlation. SBE 304: Linear Regression & Correlation 1/3/2018

Statistics. Correlational. Dr. Ayman Eldeib. Simple Linear Regression and Correlation. SBE 304: Linear Regression & Correlation 1/3/2018 /3/08 Sstems & Bomedcal Egeerg Departmet SBE 304: Bo-Statstcs Smple Lear Regresso ad Correlato Dr. Ama Eldeb Fall 07 Descrptve Orgasg, summarsg & descrbg data Statstcs Correlatoal Relatoshps Iferetal Geeralsg

More information

Chapter 2 Supplemental Text Material

Chapter 2 Supplemental Text Material -. Models for the Data ad the t-test Chapter upplemetal Text Materal The model preseted the text, equato (-3) s more properl called a meas model. ce the mea s a locato parameter, ths tpe of model s also

More information

ln( weekly earn) age age

ln( weekly earn) age age Problem Set 4, ECON 3033 (Due at the start of class, Wedesday, February 4, 04) (Questos marked wth a * are old test questos) Bll Evas Sprg 08. Cosder a multvarate regresso model of the form y 0 x x. Wrte

More information

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1 Hadout #8 Ttle: Foudatos of Ecoometrcs Course: Eco 367 Fall/05 Istructor: Dr. I-Mg Chu Lear Regresso Model So far we have focused mostly o the study of a sgle radom varable, ts correspodg theoretcal dstrbuto,

More information

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter LOGISTIC REGRESSION Notato Model Logstc regresso regresses a dchotomous depedet varable o a set of depedet varables. Several methods are mplemeted for selectg the depedet varables. The followg otato s

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

Lecture 2: Linear Least Squares Regression

Lecture 2: Linear Least Squares Regression Lecture : Lear Least Squares Regresso Dave Armstrog UW Mlwaukee February 8, 016 Is the Relatoshp Lear? lbrary(car) data(davs) d 150) Davs$weght[d]

More information

4. Standard Regression Model and Spatial Dependence Tests

4. Standard Regression Model and Spatial Dependence Tests 4. Stadard Regresso Model ad Spatal Depedece Tests Stadard regresso aalss fals the presece of spatal effects. I case of spatal depedeces ad/or spatal heterogeet a stadard regresso model wll be msspecfed.

More information

Special Instructions / Useful Data

Special Instructions / Useful Data JAM 6 Set of all real umbers P A..d. B, p Posso Specal Istructos / Useful Data x,, :,,, x x Probablty of a evet A Idepedetly ad detcally dstrbuted Bomal dstrbuto wth parameters ad p Posso dstrbuto wth

More information

Multiple Choice Test. Chapter Adequacy of Models for Regression

Multiple Choice Test. Chapter Adequacy of Models for Regression Multple Choce Test Chapter 06.0 Adequac of Models for Regresso. For a lear regresso model to be cosdered adequate, the percetage of scaled resduals that eed to be the rage [-,] s greater tha or equal to

More information

Chapter Statistics Background of Regression Analysis

Chapter Statistics Background of Regression Analysis Chapter 06.0 Statstcs Backgroud of Regresso Aalyss After readg ths chapter, you should be able to:. revew the statstcs backgroud eeded for learg regresso, ad. kow a bref hstory of regresso. Revew of Statstcal

More information

ENGI 4421 Propagation of Error Page 8-01

ENGI 4421 Propagation of Error Page 8-01 ENGI 441 Propagato of Error Page 8-01 Propagato of Error [Navd Chapter 3; ot Devore] Ay realstc measuremet procedure cotas error. Ay calculatos based o that measuremet wll therefore also cota a error.

More information

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch

More information

Lecture 1: Introduction to Regression

Lecture 1: Introduction to Regression Lecture : Itroducto to Regresso A Eample: Eplag State Homcde Rates What kds of varables mght we use to epla/predct state homcde rates? Let s cosder just oe predctor for ow: povert Igore omtted varables,

More information

Previous lecture. Lecture 8. Learning outcomes of this lecture. Today. Statistical test and Scales of measurement. Correlation

Previous lecture. Lecture 8. Learning outcomes of this lecture. Today. Statistical test and Scales of measurement. Correlation Lecture 8 Emprcal Research Methods I434 Quattatve Data aalss II Relatos Prevous lecture Idea behd hpothess testg Is the dfferece betwee two samples a reflecto of the dfferece of two dfferet populatos or

More information

Chapter 2 Simple Linear Regression

Chapter 2 Simple Linear Regression Chapter Smple Lear Regresso. Itroducto ad Least Squares Estmates Regresso aalyss s a method for vestgatg the fuctoal relatoshp amog varables. I ths chapter we cosder problems volvg modelg the relatoshp

More information

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 THE ROYAL STATISTICAL SOCIETY 06 EAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 The Socety s provdg these solutos to assst cadtes preparg for the examatos 07. The solutos are teded as learg ads ad should

More information

is the score of the 1 st student, x

is the score of the 1 st student, x 8 Chapter Collectg, Dsplayg, ad Aalyzg your Data. Descrptve Statstcs Sectos explaed how to choose a sample, how to collect ad orgaze data from the sample, ad how to dsplay your data. I ths secto, you wll

More information

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b CS 70 Dscrete Mathematcs ad Probablty Theory Fall 206 Sesha ad Walrad DIS 0b. Wll I Get My Package? Seaky delvery guy of some compay s out delverg packages to customers. Not oly does he had a radom package

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

Continuous Distributions

Continuous Distributions 7//3 Cotuous Dstrbutos Radom Varables of the Cotuous Type Desty Curve Percet Desty fucto, f (x) A smooth curve that ft the dstrbuto 3 4 5 6 7 8 9 Test scores Desty Curve Percet Probablty Desty Fucto, f

More information

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen.

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen. .5 x 54.5 a. x 7. 786 7 b. The raked observatos are: 7.4, 7.5, 7.7, 7.8, 7.9, 8.0, 8.. Sce the sample sze 7 s odd, the meda s the (+)/ 4 th raked observato, or meda 7.8 c. The cosumer would more lkely

More information

Homework Solution (#5)

Homework Solution (#5) Homework Soluto (# Chapter : #6,, 8(b, 3, 4, 44, 49, 3, 9 ad 7 Chapter. Smple Lear Regresso ad Correlato.6 (6 th edto 7, old edto Page 9 Rafall volume ( vs Ruoff volume ( : 9 8 7 6 4 3 : a. Yes, the scatter-plot

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Sum Mean n

Sum Mean n tatstcal Methods I (EXT 75) Page 147 ummary data Itermedate Calculatos X = 83 Y = 8 X = 51 Y = 368 Mea of X = X = 5.1875 Mea of Y = Y = 14.5 XY = 1348 = 16 Correcto factors ad Corrected values (ums of

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek Partally Codtoal Radom Permutato Model 7- vestgato of Partally Codtoal RP Model wth Respose Error TRODUCTO Ed Staek We explore the predctor that wll result a smple radom sample wth respose error whe a

More information

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation 4//6 Appled Statstcs ad Probablty for Egeers Sth Edto Douglas C. Motgomery George C. Ruger Chapter Smple Lear Regresso ad Correlato CHAPTER OUTLINE Smple Lear Regresso ad Correlato - Emprcal Models -8

More information

MEASURES OF DISPERSION

MEASURES OF DISPERSION MEASURES OF DISPERSION Measure of Cetral Tedecy: Measures of Cetral Tedecy ad Dsperso ) Mathematcal Average: a) Arthmetc mea (A.M.) b) Geometrc mea (G.M.) c) Harmoc mea (H.M.) ) Averages of Posto: a) Meda

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I Chapter 8 Heterosedastcty Recall MLR 5 Homsedastcty error u has the same varace gve ay values of the eplaatory varables Varu,..., = or EUU = I Suppose other GM assumptos hold but have heterosedastcty.

More information

Bayes (Naïve or not) Classifiers: Generative Approach

Bayes (Naïve or not) Classifiers: Generative Approach Logstc regresso Bayes (Naïve or ot) Classfers: Geeratve Approach What do we mea by Geeratve approach: Lear p(y), p(x y) ad the apply bayes rule to compute p(y x) for makg predctos Ths s essetally makg

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information