Lecture 15: Effect modification and confounding in logistic regression
Sandy Eckel, seckel@jhsph.edu
16 May 2008

Today's logistic regression topics
  Including categorical predictors
    create dummy/indicator variables, just like for linear regression
  Comparing nested models that differ by two or more variables for logistic regression
    Chi-square (X^2) Test of Deviance, i.e., the likelihood ratio test
    analogous to the F-test for nested models in linear regression
  Effect Modification and Confounding

Example
  Mean SAT scores were compared for the 50 US states. The goal of the study was to compare overall SAT scores using state-wide predictors such as
    per-pupil expenditures
    average teachers' salary

Variables
  Outcome
    Total SAT score [sat_low]: 1 = low, 0 = high
  Primary predictor
    Average expenditures per pupil [expen], in thousands
    Continuous, range: 3.65-9.77, mean: 5.9
    Doesn't include 0: center at $5,000 per pupil
  Secondary predictor
    Mean teacher salary in thousands, in quartiles
      salary1 = lowest quartile
      salary2 = 2nd quartile
      salary3 = 3rd quartile
      salary4 = highest quartile
    Four dummy variables for four categories; must exclude one category to create a reference group
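The dummy-variable coding described for the salary quartiles can be sketched in a few lines of Python. This is only an illustration: the function name `quartile_dummies`, the choice of quartile 1 as the reference group, and the five example quartile values are assumptions, not part of the lecture data.

```python
def quartile_dummies(quartile, reference=1, levels=(1, 2, 3, 4)):
    """Return 0/1 indicators for every level except the reference level."""
    if quartile not in levels:
        raise ValueError(f"unknown quartile: {quartile}")
    # The reference level gets no indicator: it is the group coded all zeros.
    return {f"salary{k}": int(quartile == k) for k in levels if k != reference}

# Hypothetical salary quartiles for five states (illustrative values only)
states = [1, 3, 4, 2, 1]
rows = [quartile_dummies(q) for q in states]
for q, row in zip(states, rows):
    print(q, row)
```

A state in the reference quartile has all three indicators equal to 0, so each salary slope compares one quartile with the reference group.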
Analysis Plan
  Assess primary relationship (parent model)
  Add secondary predictor in separate model (extended model)
  Determine if secondary predictor is statistically significant
    How? Use the Chi-square test of deviance

Models and Results (note that only exponentiated slopes are shown; values rounded to two decimals)

Model 1 (Parent): Only primary predictor
  $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Expenditure} - 5)$

     sat_low | Odds Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      expenc |       2.48        0.82   2.74    0.006       1.30        4.76

Model 2 (Extended): Primary predictor and secondary predictor
  $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Expenditure} - 5) + \beta_2 I(\text{Salary}=2) + \beta_3 I(\text{Salary}=3) + \beta_4 I(\text{Salary}=4)$

     sat_low | Odds Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      expenc |       1.80        0.80   1.32    0.187       0.75        4.29
     salary2 |       2.78        2.82   1.01    0.312       0.38       20.22
     salary3 |       2.92        3.27   0.96    0.338       0.33       26.21
     salary4 |       4.36        6.15   1.05    0.296       0.28       69.04

The X^2 Test of Deviance
  We want to compare the parent model to an extended model, which differs by the three dummy variables for the four salary quartiles.
  The X^2 test of deviance compares nested logistic regression models.
  We use it for nested models that differ by two or more variables, because the Wald test cannot be used in that situation.

Performing the Chi-square test of deviance for nested logistic regression
  1. Get the log likelihood (LL) from both models
       Parent model: LL = -28.94
       Extended model: LL = -28.25
  2. Find the deviance for both models
       Deviance = -2(log likelihood)
       Parent model: Deviance = -2(-28.94) = 57.88
       Extended model: Deviance = -2(-28.25) = 56.50
  Deviance is analogous to the residual sum of squares (RSS) in linear regression; it measures the deviation still available in the model.
  A saturated model is one in which every Y is perfectly predicted.
Performing the Chi-square test of deviance for nested logistic regression, cont.
  3. Find the change in deviance between the nested models
       = deviance(parent) - deviance(extended)
       = 57.88 - 56.50
       = 1.38 = Test Statistic (X^2)
  4. Evaluate the change in deviance
       The change in deviance is an observed Chi-square statistic
       df = # of variables added
       H0: all new β's are 0 in the population, i.e., H0: the parent model is better

The Chi-square test of deviance for our nested logistic regression example
  H0: After adjusting for per-pupil expenditures, all the slopes on salary indicators are 0 ($\beta_2 = \beta_3 = \beta_4 = 0$)
  X^2(obs) = 1.38, df = 3
  With 3 df and α = 0.05, X^2(cr) is 7.81
  X^2(obs) < X^2(cr): fail to reject H0
  Conclude: After adjusting for per-pupil expenditure, teachers' salary is not a statistically significant predictor of low SAT scores

Notes about the Chi-square deviance test
  The deviance test gives us a framework in which to add several predictors to a model simultaneously
  Can only handle nested models
  Analogous to the F-test for linear regression
  Also known as the "likelihood ratio test"

How can I do the Chi-square deviance test in R?
  1. Fit the parent model
       fit.parent <- glm(y ~ x1, family = binomial())
  2. Fit the extended model (the parent model is nested within the extended model)
       fit.extended <- glm(y ~ x1 + x2 + x3, family = binomial())
  3. Perform the Chi-square deviance test
       anova(fit.parent, fit.extended, test = "Chi")

  Example output:

    Analysis of Deviance Table

    Model 1: y ~ x1
    Model 2: y ~ x1 + x2 + x3
      Resid. Df Resid. Dev Df Deviance P(>|Chi|)
    1        48     64.250
    2        46     48.821  2   15.429 0.0004464

  In this output, Deviance (15.429) is the Chi-square test statistic, Df (2) is its degrees of freedom, and P(>|Chi|) is the p-value.
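The four steps of the deviance test for the SAT example can also be checked by hand. The sketch below is in Python, using only the standard library; the helper `chi2_sf` is a hypothetical name, and it evaluates the chi-square upper tail with the standard closed forms for integer degrees of freedom rather than calling a statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X2 > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # odd df: start from df = 1, erfc(sqrt(x/2)), and step up two df at a time
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total

# Log likelihoods reported for the parent and extended SAT models
ll_parent, ll_extended = -28.94, -28.25
dev_parent = -2 * ll_parent        # deviance of the parent model
dev_extended = -2 * ll_extended    # deviance of the extended model
x2_obs = dev_parent - dev_extended # change in deviance = test statistic
df = 3                             # three salary indicators were added
p_value = chi2_sf(x2_obs, df)
print(round(x2_obs, 2), df, round(p_value, 2))
```

Working with the p-value instead of the critical value 7.81 leads to the same decision: the p-value is far above 0.05, so H0 is not rejected.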
Effect Modification and Confounding in Logistic Regression
Heart Disease, Smoking and Coffee Example

Effect modification in logistic regression
  Just like with linear regression, we may want to allow different relationships between the primary predictor and outcome across levels of another covariate
  We can model such relationships by fitting interaction terms in logistic regressions
  Modelling effect modification will require dealing with two or more covariates

Logistic models with two covariates
  $\text{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
  Then:
    $\text{logit}(p \mid X_1 = x_1 + 1, X_2 = x_2) = \beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2$
    $\text{logit}(p \mid X_1 = x_1, X_2 = x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
    $\Delta$ in log-odds $= \beta_1$
  $\beta_1$ is the change in log-odds for a 1 unit change in $X_1$, provided $X_2$ is held constant.

Interpretation in General
  Also: $\log\left(\frac{\text{odds}(Y = 1 \mid X_1 + 1, X_2)}{\text{odds}(Y = 1 \mid X_1, X_2)}\right) = \beta_1$
  And: $OR = \exp(\beta_1)$ !!
  $\exp(\beta_1)$ is the multiplicative change in odds for a 1 unit increase in $X_1$, provided $X_2$ is held constant.
  The result is similar for $X_2$.
  What if the effects of each of $X_1$ and $X_2$ depend on the presence of the other? Effect modification!
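The claim that a one-unit increase in X1 multiplies the odds by exp(β1) at any fixed value of X2 can be checked numerically. The coefficients in this Python sketch are hypothetical, chosen only for illustration; they are not estimates from the lecture.

```python
import math

def odds(x1, x2, b0, b1, b2):
    """Odds of Y = 1 under logit(p) = b0 + b1*x1 + b2*x2 (no interaction)."""
    return math.exp(b0 + b1 * x1 + b2 * x2)

# Hypothetical coefficients (assumed for illustration only)
b0, b1, b2 = -1.0, 0.5, 0.8

# Odds ratio for a one-unit increase in x1, at several (x1, x2) starting points
ratios = [odds(x1 + 1, x2, b0, b1, b2) / odds(x1, x2, b0, b1, b2)
          for x1 in (0, 1, 2.5) for x2 in (0, 1)]

for r in ratios:
    print(round(r, 6), round(math.exp(b1), 6))
```

Every ratio equals exp(b1) because the model has no interaction term. Once an interaction is added, the ratio depends on x2, which is exactly what effect modification means.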
Data: Coronary Heart Disease (CHD), Smoking and Coffee
  n = 151

Study Information
  Study Facts:
    Case-control study (disease = CHD)
    40-50 year-old males previously in good health
  Study questions:
    Is smoking and/or coffee related to an increased odds of CHD?
    Is the association of coffee with CHD higher among smokers? That is, is smoking an effect modifier of the coffee-CHD association?

Fraction with CHD by smoking and coffee
  The number in each cell is the proportion of the total number of individuals with that smoking/coffee combination that have CHD.

                    Not coffee drinker    Coffee drinker
    Non-smoker      15/57 = .26           15/36 = .42
    Smoker          11/19 = .58           25/39 = .64

Pooled data (ignoring smoking)
  Proportion with CHD: .53 among coffee drinkers, .34 among non-coffee drinkers
  Odds ratio of CHD comparing coffee to non-coffee drinkers:
    $\frac{.53/(1-.53)}{.34/(1-.34)} = 2.2$
  95% CI = (1.14, 4.24)
Among Non-Smokers
  P(CHD | Coffee drinker) = 15/(15+21) = .42
  P(CHD | Not coffee drinker) = 15/(15+42) = .26
  Odds ratio of CHD comparing coffee to non-coffee drinkers:
    $\frac{.42/(1-.42)}{.26/(1-.26)} = 2.06$
  95% CI = (.82, 4.9)

Among Smokers
  P(CHD | Coffee drinker) = 25/(25+14) = .64
  P(CHD | Not coffee drinker) = 11/(11+8) = .58
  Odds ratio of CHD comparing coffee to non-coffee drinkers:
    $\frac{.64/(1-.64)}{.58/(1-.58)} = 1.29$
  95% CI = (.42, 4.0)

Plot Odds Ratios and 95% CIs
  [Figure: odds ratios and 95% CIs]

Define Variables
  $Y_i$ = 1 if CHD case, 0 if control
  $coffee_i$ = 1 if coffee drinker, 0 if not
  $smoke_i$ = 1 if smoker, 0 if not
  $p_i = \Pr(Y_i = 1)$
  $n_i$ = number observed at pattern $i$ of the Xs
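The stratum-specific and pooled odds ratios can be recomputed directly from the cell counts given for non-smokers and smokers. The Python sketch below assumes the intervals are log-based (Woolf) Wald intervals; the slides do not name the method, but this choice reproduces the reported limits. The function name `odds_ratio_ci` is illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-based (Woolf) 95% CI for a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    or_ = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# (coffee cases, coffee controls, non-coffee cases, non-coffee controls)
non_smokers = (15, 21, 15, 42)
smokers = (25, 14, 11, 8)
# Pooled table: add the two strata cell by cell (ignores smoking)
pooled = tuple(n + s for n, s in zip(non_smokers, smokers))

for label, table in [("non-smokers", non_smokers),
                     ("smokers", smokers),
                     ("pooled", pooled)]:
    or_, lower, upper = odds_ratio_ci(*table)
    print(label, round(or_, 2), round(lower, 2), round(upper, 2))
```

Computed from the counts, the non-smoker odds ratio is exactly 2.0; the 2.06 on the slide comes from using the proportions rounded to .42 and .26.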
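Logistic Regression Model
  $Y_i$ are independent
  Random part
    $Y_i$ are from a Binomial($n_i$, $p_i$) distribution
  Systematic part
    log odds($Y_i = 1$) (or logit($Y_i = 1$)) is a function of
      Coffee
      Smoking
      and the coffee-smoking interaction
    $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$

Interpretations: stratify by smoking status
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  If smoke = 0:
    $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i$
  If smoke = 1:
    $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 \times 1 + \beta_3 coffee_i \times 1 = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) coffee_i$
  $\exp(\beta_1)$: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among non-smokers
  $\exp(\beta_1 + \beta_3)$: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among smokers

Interpretations: stratify by coffee drinking
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  If coffee = 0:
    $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_2 smoke_i$
  If coffee = 1:
    $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 \times 1 + \beta_2 smoke_i + \beta_3 \times 1 \times smoke_i = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) smoke_i$
  $\exp(\beta_2)$: odds ratio of being a CHD case for smokers -vs- non-smokers among non-coffee drinkers
  $\exp(\beta_2 + \beta_3)$: odds ratio of being a CHD case for smokers -vs- non-smokers among coffee drinkers

Interpretations
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  $\frac{e^{\beta_0}}{1 + e^{\beta_0}}$: probability of CHD if all X's are zero, i.e., the fraction of cases among non-smoking, non-coffee drinking individuals in the sample (determined by the sampling plan)
  $\exp(\beta_3)$: ratio of odds ratios
    What do we mean by this?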
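$\exp(\beta_3)$ Interpretations
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  $\exp(\beta_3)$: factor by which the odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers is multiplied for smokers as compared to non-smokers
  or
  $\exp(\beta_3)$: factor by which the odds ratio of being a CHD case for smokers -vs- non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers
  COMMON IDEA: additional multiplicative change in the odds ratio, beyond the smoking or coffee drinking effect alone, when you have both of these risk factors present

Some Special Cases: No smoking or coffee drinking effects
  Given $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  If $\beta_1 = \beta_2 = \beta_3 = 0$:
    Neither smoking nor coffee drinking is associated with increased risk of CHD

Some Special Cases: Only one effect
  Given $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  If $\beta_2 = \beta_3 = 0$:
    Coffee drinking, but not smoking, is associated with increased risk of CHD
  If $\beta_1 = \beta_3 = 0$:
    Smoking, but not coffee drinking, is associated with increased risk of CHD

Some Special Cases
  Given $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$
  If $\beta_3 = 0$:
    Smoking and coffee drinking are both associated with risk of CHD, but the odds ratio of CHD-smoking is the same at both levels of coffee
    Smoking and coffee drinking are both associated with risk of CHD, but the odds ratio of CHD-coffee is the same at both levels of smoking
    Common idea: the effects of these risk factors are purely additive (on the log-odds scale); there is no interaction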
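The "ratio of odds ratios" reading of $\exp(\beta_3)$ follows in one step from the stratified interpretations of the interaction model. A short worked derivation, using the same symbols as the model above (the subscripted $OR$ labels are shorthand introduced here):

```latex
\begin{align*}
\frac{OR_{\text{coffee}\mid\text{smoke}=1}}{OR_{\text{coffee}\mid\text{smoke}=0}}
  &= \frac{\exp(\beta_1 + \beta_3)}{\exp(\beta_1)} = \exp(\beta_3) \\[4pt]
\frac{OR_{\text{smoke}\mid\text{coffee}=1}}{OR_{\text{smoke}\mid\text{coffee}=0}}
  &= \frac{\exp(\beta_2 + \beta_3)}{\exp(\beta_2)} = \exp(\beta_3)
\end{align*}
```

Both ratios reduce to the same quantity, which is why the two verbal interpretations of $\exp(\beta_3)$ are equivalent, and why $\beta_3 = 0$ means the coffee odds ratio is the same in both smoking strata.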
Model 1: main effect of coffee
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i$

Logit estimates                          Number of obs = 151
                                         LR chi2(1)    = 5.65
                                         Prob > chi2   = 0.0175
Log likelihood = -100.64332              Pseudo R2     = 0.0273

         chd |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      coffee |    0.7875      0.3347    2.35    0.019     0.1314      1.4435
 (Intercept) |   -0.6539      0.2418   -2.70    0.007    -1.1278     -0.1800

Model 2: main effects of coffee and smoke
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i$

Logit estimates                          Number of obs = 151
                                         LR chi2(2)    = 15.19
                                         Prob > chi2   = 0.0005
Log likelihood = -95.869718              Pseudo R2     = 0.0734

         chd |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      coffee |    0.5270      0.3542    1.49    0.137    -0.1672      1.2212
       smoke |    1.1020      0.3610    3.05    0.002     0.3944      1.8095
 (Intercept) |   -0.9572      0.2703   -3.54    0.000    -1.4870     -0.4274

Model 3: main effects of coffee and smoke AND their interaction
  $\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 coffee_i + \beta_2 smoke_i + \beta_3 coffee_i \times smoke_i$

Logit estimates                          Number of obs = 151
                                         LR chi2(3)    = 15.55
                                         Prob > chi2   = 0.0014
Log likelihood = -95.694169              Pseudo R2     = 0.0751

         chd |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      coffee |    0.6931      0.4525    1.53    0.126    -0.1937      1.5800
       smoke |    1.3481      0.5535    2.44    0.015     0.2632      2.4330
coffee_smoke |   -0.4318      0.7295   -0.59    0.554    -1.8615      0.9979
 (Intercept) |   -1.0296      0.3008   -3.42    0.001    -1.6192     -0.4401

Comparing Models 1 & 2
  Question: Is smoking a confounder?

  Model 1
    Variable      Est     se      z
    Intercept    -.65    .24   -2.7
    Coffee        .79    .33    2.4

  Model 2
    Variable      Est     se      z
    Intercept    -.96    .27   -3.5
    Coffee        .53    .35    1.5
    Smoking      1.1     .36    3.1
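The Model 3 estimates can be turned into the stratum-specific odds ratios described in the interpretation slides. The Python sketch below uses the Model 3 coefficients rounded to four decimals; the digits assume that the zeros lost from the printed Stata output are restored as indicated in the comments, which is consistent with the reported z statistics and confidence limits.

```python
import math

# Model 3 coefficients (log odds ratio scale), rounded to four decimals.
# Assumption: the smoke coefficient reads 1.3481 once the dropped zero is restored.
b_coffee = 0.6931
b_smoke = 1.3481
b_interaction = -0.4318

# Coffee odds ratios within each smoking stratum
or_coffee_nonsmokers = math.exp(b_coffee)
or_coffee_smokers = math.exp(b_coffee + b_interaction)

# Smoking odds ratios within each coffee stratum
or_smoke_noncoffee = math.exp(b_smoke)
or_smoke_coffee = math.exp(b_smoke + b_interaction)

# Ratio of odds ratios: the same number whichever way the table is stratified
ratio_of_ors = math.exp(b_interaction)

print(round(or_coffee_nonsmokers, 2), round(or_coffee_smokers, 2))
print(round(or_smoke_noncoffee, 2), round(or_smoke_coffee, 2))
print(round(ratio_of_ors, 2))
```

The two coffee odds ratios agree with the values computed directly from the stratified 2x2 tables, as expected for a model with one parameter per smoking/coffee combination.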
Look at Confidence Intervals
  Without smoking
    $OR = e^{0.79} = 2.2$
    95% CI for log(OR): 0.79 ± 1.96(0.33) = (0.13, 1.44)
    95% CI for OR: $(e^{0.13}, e^{1.44})$ = (1.14, 4.24)
  With smoking (adjusting for smoking)
    $OR = e^{0.53} = 1.7$
  Smoking does not confound the relationship between coffee drinking and CHD, since 1.7 is in the 95% CI from the model without smoking

Conclusion regarding confounding
  So, ignoring smoking, the CHD and coffee OR is 2.2 (95% CI: 1.14-4.24)
  Adjusting for smoking gives more modest evidence for a coffee effect
  However, smoking does not appear to be an important confounder

Interaction Model
  Question: Is smoking an effect modifier of the CHD-coffee association?

  Model 3
    Variable          Est     se      z
    Intercept        -1.0    .30   -3.4
    Coffee            .69    .45    1.5
    Smoking          1.3     .55    2.4
    Coffee*Smoking   -.43    .73   -.59

Testing Interaction Term
  Z = -0.59, p-value = 0.554
  We fail to reject H0: interaction slope = 0
  And we conclude there is little evidence that smoking is an effect modifier!
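The confidence-interval arithmetic and the Wald test for the interaction term can be reproduced with a short Python sketch. It uses the coffee coefficients and standard errors from Models 1 and 2 and the interaction row from Model 3, each rounded to four decimals; the function names `wald_or_ci` and `two_sided_p` are illustrative.

```python
import math

def wald_or_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a log odds ratio and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Coffee effect without smoking (Model 1) and adjusting for smoking (Model 2)
unadjusted = wald_or_ci(0.7875, 0.3347)
adjusted = wald_or_ci(0.5270, 0.3542)

# Wald test of the interaction slope in Model 3
z_interaction = -0.4318 / 0.7295
p_interaction = two_sided_p(z_interaction)

print([round(v, 2) for v in unadjusted])
print([round(v, 2) for v in adjusted])
print(round(z_interaction, 2), round(p_interaction, 3))
```

The adjusted odds ratio of about 1.7 falls inside the unadjusted interval, which is the informal check used above to judge that smoking is not an important confounder.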
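Question: Model selection
  What model should we choose to describe the relationship of coffee and smoking with CHD?

Fitted Values
  We can back-transform the fitted log odds (the inverse logit transform) to get fitted probabilities, and compare them with the observed proportions, using each of the three models

  Model 1: $\hat{p} = \frac{e^{-.65 + .79\,Coffee}}{1 + e^{-.65 + .79\,Coffee}}$
  Model 2: $\hat{p} = \frac{e^{-.96 + .53\,Coffee + 1.1\,Smoking}}{1 + e^{-.96 + .53\,Coffee + 1.1\,Smoking}}$
  Model 3: $\hat{p} = \frac{e^{-1.03 + .69\,Coffee + 1.3\,Smoking - .43\,(Coffee \times Smoking)}}{1 + e^{-1.03 + .69\,Coffee + 1.3\,Smoking - .43\,(Coffee \times Smoking)}}$

Observed vs Fitted Values
  [Table: observed and fitted proportions with CHD for each smoking/coffee combination]

Saturated Model
  Note that the fitted values from Model 3 exactly match the observed values, indicating a saturated model that gives perfect predictions
  Although the saturated model will always result in a perfect fit, it is usually not the best model (e.g., when there are continuous covariates or many covariates)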
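The comparison of observed and fitted values can be sketched in Python. The observed proportions come from the stratified cell counts (cases over cases plus controls), and the coefficients are the three models' estimates rounded to four decimals, assuming the zeros dropped from the printed output are restored as shown. The function names are illustrative.

```python
import math

def expit(eta):
    """Inverse logit: turn a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Observed fraction with CHD, keyed by (coffee, smoke)
observed = {(0, 0): 15 / 57, (1, 0): 15 / 36, (0, 1): 11 / 19, (1, 1): 25 / 39}

def model1(coffee, smoke):
    return expit(-0.6539 + 0.7875 * coffee)

def model2(coffee, smoke):
    return expit(-0.9572 + 0.5270 * coffee + 1.1020 * smoke)

def model3(coffee, smoke):
    return expit(-1.0296 + 0.6931 * coffee + 1.3481 * smoke
                 - 0.4318 * coffee * smoke)

for key, obs in observed.items():
    fitted = [m(*key) for m in (model1, model2, model3)]
    print(key, round(obs, 3), [round(f, 3) for f in fitted])
```

Model 3 has four parameters for four covariate patterns, so its fitted values reproduce the observed proportions up to rounding of the coefficients; Models 1 and 2 smooth across patterns and do not.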
Likelihood Ratio Test
  The Likelihood Ratio Test will help decide whether or not additional term(s) significantly improve the model fit
  The Likelihood Ratio Test (LRT) statistic for comparing nested models is -2 times the difference between the log likelihoods (LLs) for the Null -vs- Extended models
  We've already done this earlier in today's lecture!!
    The Chi-square (X^2) Test of Deviance is the same thing as the Likelihood Ratio Test
  Used to compare any pair of nested logistic regression models and get a p-value associated with H0: the new β's all = 0

Example summary write-up
  A case-control study was conducted with 151 subjects, 66 (44%) of whom had CHD, to assess the relative importance of smoking and coffee drinking as risk factors. The observed fractions of CHD cases by smoking and coffee strata are: .26 for non-smoking non-coffee drinkers, .42 for non-smoking coffee drinkers, .58 for smoking non-coffee drinkers, and .64 for smoking coffee drinkers.

Example Summary: Unadjusted ORs
  The odds of CHD was estimated to be 3.4 times higher among smokers compared to non-smokers
    95% CI: (1.7, 7.9)
  The odds of CHD was estimated to be 2.2 times higher among coffee drinkers compared to non-coffee drinkers
    95% CI: (1.1, 4.3)

Example Summary: Adjusted ORs
  Controlling for the potential confounding of smoking, the coffee odds ratio was estimated to be 1.7 with 95% CI: (.85, 3.4). Hence, the evidence in these data is insufficient to conclude that coffee has an independent effect on CHD beyond that of smoking.
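As a worked instance of the likelihood ratio test described above, the interaction model (Model 3) can be compared with the main-effects model (Model 2) using the log likelihoods reported in their Stata output. The subscripted $LL$ labels are shorthand introduced here.

```latex
\begin{align*}
X^2 &= -2\,\bigl[LL_{\text{Model 2}} - LL_{\text{Model 3}}\bigr] \\
    &= -2\,\bigl[-95.8697 - (-95.6942)\bigr] \\
    &\approx 0.35, \qquad df = 1
\end{align*}
```

With 1 df and α = 0.05 the Chi-square critical value is 3.84, so 0.35 does not come close to rejecting H0: β3 = 0. This agrees with the Wald test of the interaction term (Z = -0.59, p-value = 0.554), and the statistic matches the difference between the two models' LR chi2 values (15.55 - 15.19) up to rounding.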
Example Summary: effect modification
  Finally, we estimated the coffee odds ratio separately for smokers and non-smokers to assess whether smoking is an effect modifier of the coffee-CHD relationship. For the smokers and non-smokers, the coffee odds ratio was estimated to be 1.3 (95% CI: .42, 4.0) and 2.0 (95% CI: .82, 4.9) respectively. There is little evidence of effect modification in these data.

Summary of Lecture 15
  Including categorical predictors in logistic regression
    create dummy/indicator variables, just like for linear regression
  Comparing nested models that differ by two or more variables for logistic regression
    Chi-square (X^2) Test of Deviance, i.e., the likelihood ratio test
    analogous to the F-test for nested models in linear regression
  Effect Modification and Confounding in logistic regression