Binary Data in Epidemiology Epidemiologic Measures of Association with Time to Event Data



Lecture 3: Measures of Association
October 3, 2013

Biost 536 / Epi 536
Categorical Data Analysis in Epidemiology

Scott S. Emerson, M.D., Ph.D.
Professor of Biostatistics
University of Washington

Lecture Outline
- Measures of association (science)
  - With full time to event data
  - With binary data
- Precision of inference (statistics)

Binary Data in Epidemiology
Epidemiologic Measures of Association with Time to Event Data
Identifying Risk Factors and Evaluating Preventive and Therapeutic Agents

Focus of Epidemiologic Research (in broad categories)
- Disease surveillance: morbidity, mortality
  - Quantifying patterns of incidence, prevalence
- Identifying risk factors or protective factors for disease / death
  - Comparing patterns of incidence across exposure groups
- Evaluating preventive strategies and treatments
  - Comparing patterns of incidence / outcomes across interventions

Scientific Classification of Binary Data
- Clinical outcomes
  - Diagnosis of disease, cure, disease progression, death
- Exposures sometimes dichotomized
  - Genotypes, environmental exposures, behavioral exposures
- Indicators of risk status
  - Time, place, population

Categorical Data Analysis, AUT 2013

Risk Sets

Summarizing Disease Epidemiology
Most often, we recognize that the probability of an event depends in some way upon time.
- In many cases, that time dependence is something we merely want to adjust for as we compare different groups
  - It is not as important to contrast the event probability over time
- Examples
  - Comparing rates of cancer between smokers and nonsmokers, we know cancer occurs more often late in life
  - Comparing exposure groups for premature delivery, we know that the probability of delivery increases as pregnancy continues
- We thus find it convenient to couch many of our analyses of binary data in terms that also consider time to event

Alternative ways of describing the probability of observing an event (disease diagnosis, progression, death, ...) over time. Any of the first four completely specifies all of the others. The random variable $T$ measures the time of the event, and $\lambda(t)$ denotes the incidence rate (hazard).
- Cumulative distribution function: $F(t) = \Pr(T \le t) = 1 - e^{-\int_0^t \lambda(u)\,du}$
- Survivor function: $S(t) = \Pr(T > t) = 1 - F(t) = e^{-\int_0^t \lambda(u)\,du}$
- Density: $f(t) = \lim_{\Delta t \to 0} \dfrac{\Pr(t \le T < t + \Delta t)}{\Delta t} = \dfrac{dF(t)}{dt}$
- Incidence rate (hazard): $\lambda(t) = \lim_{\Delta t \to 0} \dfrac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \dfrac{f(t)}{1 - F(t)}$
- Cumulative incidence from $t_0$: $\Pr(T \le t \mid T > t_0) = \dfrac{F(t) - F(t_0)}{1 - F(t_0)} = 1 - e^{-\int_{t_0}^{t} \lambda(u)\,du}$

An Aside: Prevalence
In epidemiology we also consider another binary variable: prevalence.
- Prevalence is in some sense a composite of
  - Disease incidence
  - Disease case fatality rate
  - Mortality rate from all other causes
- Whenever it is feasible, we would generally prefer to study separately
  - Disease incidence rates
  - Disease case fatality rates
- However, in some settings (e.g., health services, pharmaceutics) prevalence may indeed be most important, and studying that

Risk Factors for Disease
Summarizing disease D epidemiology in a specified population
- Incidence rate (over time)
- Cumulative incidence (over time)
An exposure E is a risk factor if the disease epidemiology differs in some way across populations having different levels of E.
- Differs at some time for some levels of E
  - Does not have to differ at all times
  - Does not have to differ for all groups
- An association between the distribution of D and E
My working definition of causality: E would be a cause if experimentally altering the value of E invariably alters the distribution of D in some way.
- (Even RCTs do not prove that E is the proximal cause, but they come closer than observational studies can.)
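As a numerical check of the relations among $F$, $S$, $f$, $\lambda$, and cumulative incidence listed earlier, here is a minimal Python sketch. It assumes a Weibull hazard with illustrative shape and scale values (`K` and `SCALE` are hypothetical choices, not from the lecture), chosen because the integrated hazard then has the closed form $(t/\text{SCALE})^K$.

```python
import math

# Illustrative Weibull model; K (shape) and SCALE are hypothetical choices.
K, SCALE = 1.5, 10.0


def hazard(t: float) -> float:
    """Incidence rate lambda(t) = (K / SCALE) * (t / SCALE)**(K - 1)."""
    return (K / SCALE) * (t / SCALE) ** (K - 1)


def cumulative_hazard(t: float) -> float:
    """Integral of lambda(u) from 0 to t, which is (t / SCALE)**K for a Weibull."""
    return (t / SCALE) ** K


def survivor(t: float) -> float:
    """S(t) = Pr(T > t) = exp(-integral of the hazard from 0 to t)."""
    return math.exp(-cumulative_hazard(t))


def cdf(t: float) -> float:
    """F(t) = Pr(T <= t) = 1 - S(t)."""
    return 1.0 - survivor(t)


def density(t: float, dt: float = 1e-5) -> float:
    """f(t) = dF/dt, approximated by a central difference."""
    return (cdf(t + dt) - cdf(t - dt)) / (2.0 * dt)


def cumulative_incidence(t0: float, t: float) -> float:
    """Pr(T <= t | T > t0) = [F(t) - F(t0)] / [1 - F(t0)]."""
    return (cdf(t) - cdf(t0)) / (1.0 - cdf(t0))


if __name__ == "__main__":
    t, t0 = 5.0, 2.0
    # The hazard should match f(t) / S(t) ...
    print(hazard(t), density(t) / survivor(t))
    # ... and cumulative incidence should match 1 - exp(-integral from t0 to t).
    print(cumulative_incidence(t0, t),
          1.0 - math.exp(-(cumulative_hazard(t) - cumulative_hazard(t0))))
```

Any other hazard could be substituted; only `cumulative_hazard` would then need a numerical integral instead of the closed form.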

Comparing Distributions
Consider the following three graphs depicting the mortality experience predicted using US Social Security data from 2009.
- What measures would you look at descriptively to decide that there is an association between sex and mortality?
- In particular, how can you read from the survival curves
  - Comparisons of survival at a particular time
  - Comparisons of medians (quantiles) of the distributions
  - Comparisons of means
  - Comparisons of geometric means
  - Comparisons of hazards
- Is there effect modification?

Detecting Associations with Full Data
- No association
  - Graphs of densities, survival functions, or hazards will show coincident curves
- Association
  - Curves are different somewhere
- Association in some specific summary measure (e.g., mean, ...)
  - Curves differ in a particular aspect

Detecting Effect Modification with Full Data
- Effect modification depends on the summary measure used to quantify effect
- No effect modification using some specific summary measure
  - Graphs of densities, survival functions, or hazards will have the same appearance with respect to the corresponding aspect

Comparing Densities: Difficult
- Mode / multimodality: age at highest peak
- Survival probability at t: area to the right of t
- Median (other quantiles): area under the curve to the left of the median age will equal ½
- Means: not easily visualized
- Hazard: density divided by area under the curve to the right
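To make the dependence on the summary measure concrete, the sketch below computes several of these measures for two hypothetical exponential survival distributions (constant hazards `LAM_A` and `LAM_B`, chosen only for illustration). For exponentials the ratios of medians, means, and geometric means all equal the inverse hazard ratio at every time, while the difference in survival probabilities changes with $t$: the same pair of distributions gives a comparison that is constant over time on one scale and time-varying on another.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# Hypothetical constant hazards (events per year) for two groups.
LAM_A, LAM_B = 0.02, 0.03


def summaries(lam: float, t: float) -> dict:
    """Summary measures of an exponential distribution with hazard lam."""
    return {
        "survival_at_t": math.exp(-lam * t),
        "median": math.log(2.0) / lam,
        "mean": 1.0 / lam,
        # E[log T] = -(gamma + log lam) for an exponential, so the geometric mean is:
        "geometric_mean": math.exp(-EULER_GAMMA) / lam,
        "hazard": lam,
    }


if __name__ == "__main__":
    for t in (10.0, 40.0):
        a, b = summaries(LAM_A, t), summaries(LAM_B, t)
        print(f"t={t}: survival difference {a['survival_at_t'] - b['survival_at_t']:.4f}, "
              f"median ratio {a['median'] / b['median']:.3f}, "
              f"hazard ratio {b['hazard'] / a['hazard']:.3f}")
```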

Comparing Survival Curves: S(t) vs t (figure)
- Multimodality: change in convexity
- Difference in survival at t: vertical separation at t
- Difference in quantiles: horizontal separation at p
- Difference in means: area between curves
- Hazard: slope divided by height of the curve

Comparing Survival Curves: S(t) vs log t (figure)
- Difference in survival at t: vertical separation at t
- Log ratio of quantiles: horizontal separation at p
- Log ratio of geometric means: area between curves
- Hazard: slope divided by height of the curve, further divided by t

Comparing Survival Curves: log S(t) vs t (figure)
- Log ratio of survival at t: vertical separation at t

Comparing CDFs: log(1 − S(t)) vs t (figure)
- Log ratio of event probability at t: vertical separation at t

Odds of Death: log[(1 − S(t))/S(t)] vs t (figure)
- Log odds ratio of event at t: vertical separation at t

Comparing Hazards: λ(t) vs t (figure)
- Difference in survival at $t_0$: difference in the exponentiated negative area under the curve to the left of $t_0$
- Difference in hazards at t: vertical separation at t

Comparing Hazards: log λ(t) vs t (figure)
- Log hazard ratio at t: vertical separation at t

Comparing Hazards: log λ(t) vs log t (figure)
- Shape of distribution: Weibull is a straight line; exponential is a flat line
- Log hazard ratio at t: vertical separation at t

Common Measures of Association
Common epidemiologic measures of association with full time to event data compare the incidence rate $\lambda_1(t)$ in the exposed with $\lambda_0(t)$ in the unexposed.
- Used most often: relative risk (RR)
- Also defined: risk difference (RD)

$$RR(t) = \frac{\lambda_1(t)}{\lambda_0(t)} \qquad RD(t) = \lambda_1(t) - \lambda_0(t)$$

Derived Measures of Association
- Relative risk difference: proportionate increased risk in exposed relative to controls
- Attributable risk: proportion of risk in the exposed that is due to exposure
- Population attributable risk: proportion of disease in the population attributable to the prevalence $p$ of exposure

$$RRD(t) = \frac{\lambda_1(t) - \lambda_0(t)}{\lambda_0(t)} = RR(t) - 1$$
$$AR(t) = \frac{\lambda_1(t) - \lambda_0(t)}{\lambda_1(t)} = \frac{RR(t) - 1}{RR(t)}$$
$$PAR(t) = \frac{p\lambda_1(t) + (1-p)\lambda_0(t) - \lambda_0(t)}{p\lambda_1(t) + (1-p)\lambda_0(t)} = \frac{p\,[RR(t) - 1]}{p\,[RR(t) - 1] + 1}$$

Epidemiologic Measures of Association with Binary Data
Identifying Risk Factors and Evaluating Preventive and Therapeutic Agents

Epidemiologic Study Design I
Cross-sectional surveys
- Population measured at a single point in time
  - May be calendar time, event time, age
- Can study prevalence of exposures and prevalence of disease
  - Can estimate $\Pr[D \mid E]$, $\Pr[E \mid D]$, $\Pr[E]$, $\Pr[D]$
- When interested in evaluating risk factors, such designs confound
  - Disease incidence
  - Disease case fatality rate
  - Mortality from competing risks
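The derived measures (RR, RD, RRD, AR, PAR) are simple functions of the two incidence rates and the exposure prevalence at a single time $t$. A minimal Python sketch; `derived_measures` is an illustrative name, and the rates and prevalence in the example are made-up numbers.

```python
def derived_measures(lam1: float, lam0: float, p: float) -> dict:
    """Measures of association at one time t.

    lam1, lam0: incidence rates in exposed and unexposed (both assumed > 0)
    p: prevalence of exposure in the population (0 <= p <= 1)
    """
    rr = lam1 / lam0
    rd = lam1 - lam0
    lam_pop = p * lam1 + (1.0 - p) * lam0  # incidence rate in the whole population
    return {
        "RR": rr,
        "RD": rd,
        "RRD": rd / lam0,                   # equals RR - 1
        "AR": rd / lam1,                    # equals (RR - 1) / RR
        "PAR": (lam_pop - lam0) / lam_pop,  # equals p(RR - 1) / (p(RR - 1) + 1)
    }


if __name__ == "__main__":
    # Hypothetical rates: 30 vs 10 cases per 1000 person-years, 20% exposed.
    for name, value in derived_measures(0.03, 0.01, 0.2).items():
        print(f"{name}: {value:.4f}")
```

With these made-up numbers RR is 3 and PAR is 2/7 (about 0.286): tripling the rate in a fifth of the population accounts for roughly 29% of all cases.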

Epidemiologic Study Design II
Cohort study
- Populations identified by levels of exposure at time 0
- Followed (prospectively or retrospectively) over specified time
- Can study incidence rates and cumulative incidence of disease
  - Can estimate $\Pr[D \mid E]$
- Often, the time of follow-up is sufficiently short that the incidence rate is (nearly) constant
  - Cumulative incidence is then (approximately) the incidence rate times the years of follow-up per person
- Other times, the use of identical follow-up periods avoids confounding the determination of cumulative incidence by a time-varying incidence rate
  - Each group's cumulative incidence will be averaged over the same part of the incidence rate curve

Measures of Association
- Quantification of equality between two numbers
  - Difference is 0
  - Ratio is 1
- Comparing distributions for binary measures (e.g., disease)
  - Difference in proportions
  - Ratio of proportions

Measures of Association: Cohort Designs
- Simple setting (or within groups defined by covariates)
  - Exposed ($E$) vs nonexposed ($\bar{E}$)
  - Diseased ($D$) vs nondiseased ($\bar{D}$)
- Estimated incidence of disease within each exposure group
  - Might be viewed as an average incidence rate if divided by person-years of exposure
- Risk difference: $RD = \Pr[D \mid E] - \Pr[D \mid \bar{E}]$
  - Interpretable as excess proportion of disease
- Relative risk: $RR = \Pr[D \mid E] \,/\, \Pr[D \mid \bar{E}]$
  - Of particular use when establishing a risk factor

Parameter Space (figure): graph of all possible values for disease incidences
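In a cohort study, $\Pr[D \mid E]$ and $\Pr[D \mid \bar{E}]$ are estimated by the observed proportions in each exposure group. A minimal sketch with hypothetical counts; `cohort_estimates` is an illustrative name, not a library function, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def cohort_estimates(cases_exposed: int, n_exposed: int,
                     cases_unexposed: int, n_unexposed: int) -> tuple:
    """Estimate Pr[D | E], Pr[D | not E], the risk difference, and the relative risk."""
    p1 = Fraction(cases_exposed, n_exposed)      # estimate of Pr[D | E]
    p0 = Fraction(cases_unexposed, n_unexposed)  # estimate of Pr[D | not E]; must be > 0 for RR
    return p1, p0, p1 - p0, p1 / p0


if __name__ == "__main__":
    # Hypothetical cohort: 30 of 200 exposed and 10 of 200 unexposed develop disease.
    p1, p0, rd, rr = cohort_estimates(30, 200, 10, 200)
    print(f"Pr[D|E]={float(p1):.3f}  Pr[D|not E]={float(p0):.3f}  "
          f"RD={float(rd):.3f}  RR={float(rr):.2f}")
    # Dividing cases by person-years instead gives an average incidence rate,
    # e.g. 30 cases over a hypothetical 950 person-years:
    print(f"average incidence rate = {30 / 950:.4f} per person-year")
```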

Contours: Equal Risk Difference (RD) (figure): graph of all possible values for disease incidences
Contours: Equal Relative Risk (RR) (figure): graph of all possible values for disease incidences

Looking Ahead: Odds Ratios
Limitations on the range of values for RD and RR
- RD can be between −1 and 1
  - But if the unexposed group already has high incidence, say .75, then RD can only be between −.75 and .25
- RR can be between 0 and infinity
  - But if the unexposed group already has high incidence, say .75, then RR can only be between 0 and 1.33
- These facts argue that effect modification might be regularly seen when using RD or RR to summarize comparisons of risk
- The odds ratio has no such limitation
  - The OR can always be any nonnegative number, no matter the incidence in the unexposed population

Odds Ratio
$$OR = \frac{\text{odds of disease in exposed}}{\text{odds of disease in unexposed}} = \frac{\Pr[D \mid E] \,/\, \Pr[\bar{D} \mid E]}{\Pr[D \mid \bar{E}] \,/\, \Pr[\bar{D} \mid \bar{E}]} = \frac{\Pr[D, E]\,\Pr[\bar{D}, \bar{E}]}{\Pr[\bar{D}, E]\,\Pr[D, \bar{E}]}$$
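The range limits and the behavior of the odds ratio can be checked directly. The sketch below (illustrative helper names; baseline incidence .75 as in the example above) computes the attainable ranges of RD and RR for a given incidence in the unexposed group, and shows that any positive odds ratio still maps to a valid incidence in the exposed group.

```python
def rd_range(p0: float) -> tuple:
    """Attainable RD = p1 - p0 when p1 can be anywhere in [0, 1]."""
    return (-p0, 1.0 - p0)


def rr_range(p0: float) -> tuple:
    """Attainable RR = p1 / p0 when p1 can be anywhere in [0, 1] (p0 > 0)."""
    return (0.0, 1.0 / p0)


def odds_ratio(p1: float, p0: float) -> float:
    """Odds of disease in exposed divided by odds of disease in unexposed."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def p1_from_odds_ratio(or_: float, p0: float) -> float:
    """Incidence in the exposed implied by a given OR and baseline incidence p0."""
    odds1 = or_ * p0 / (1.0 - p0)
    return odds1 / (1.0 + odds1)


if __name__ == "__main__":
    p0 = 0.75
    print("RD range:", rd_range(p0))   # (-0.75, 0.25)
    print("RR range:", rr_range(p0))   # upper limit 1/.75, about 1.33
    for or_ in (0.1, 1.0, 10.0, 100.0):
        p1 = p1_from_odds_ratio(or_, p0)
        print(f"OR={or_:>6}: p1={p1:.4f}, check OR={odds_ratio(p1, p0):.4f}")
```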

Contours: Equal Odds Ratio (OR) (figure): graph of all possible values for disease incidences
Contours: Equal RR vs Equal OR (figure): graph of all possible values for disease incidences; legend: Relative Risk, Odds Ratio

Epidemiologic Study Design III
Case-control study
- Populations identified by disease status (according to eligibility criteria)
- Retrospective follow-up to determine exposure status
- Can study distribution of exposures within disease groups
  - Can estimate $\Pr[E \mid D]$
- Matching of controls to the same time / space catchment of cases may avoid some confounding
- Use of odds ratio as a good approximation for relative risk for rare disease

Odds Ratio from Case-Control
$$OR_{E \mid D} = \frac{\text{odds of exposure in diseased}}{\text{odds of exposure in healthy}} = \frac{\Pr[E \mid D] \,/\, \Pr[\bar{E} \mid D]}{\Pr[E \mid \bar{D}] \,/\, \Pr[\bar{E} \mid \bar{D}]} = \frac{\Pr[D, E]\,\Pr[\bar{D}, \bar{E}]}{\Pr[D, \bar{E}]\,\Pr[\bar{D}, E]} = OR$$
Rare disease: because $OR = RR \cdot \Pr[\bar{D} \mid \bar{E}] \,/\, \Pr[\bar{D} \mid E]$ and both probabilities of remaining disease free are close to 1, $OR \approx RR$.
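The identity between the exposure odds ratio (estimable from a case-control study) and the disease odds ratio, and the rare disease approximation to RR, can be verified from any joint distribution of $D$ and $E$. A minimal sketch; the exposure prevalence and risks are made-up values, and both function names are illustrative.

```python
def joint_from_cohort(p_exposed: float, risk_exposed: float, risk_unexposed: float) -> dict:
    """Joint probabilities Pr[D, E] etc. from Pr[E], Pr[D | E], Pr[D | not E]."""
    return {
        ("D", "E"): p_exposed * risk_exposed,
        ("D", "notE"): (1 - p_exposed) * risk_unexposed,
        ("notD", "E"): p_exposed * (1 - risk_exposed),
        ("notD", "notE"): (1 - p_exposed) * (1 - risk_unexposed),
    }


def association(joint: dict) -> tuple:
    """Return (RR, OR from disease odds, OR from exposure odds)."""
    d_e, d_ne = joint[("D", "E")], joint[("D", "notE")]
    nd_e, nd_ne = joint[("notD", "E")], joint[("notD", "notE")]
    rr = (d_e / (d_e + nd_e)) / (d_ne / (d_ne + nd_ne))
    # Cohort view: odds of disease in exposed vs unexposed
    or_disease = (d_e / nd_e) / (d_ne / nd_ne)
    # Case-control view: odds of exposure in diseased vs healthy
    or_exposure = (d_e / d_ne) / (nd_e / nd_ne)
    return rr, or_disease, or_exposure


if __name__ == "__main__":
    print("rare disease:  ", association(joint_from_cohort(0.3, 0.003, 0.001)))
    print("common disease:", association(joint_from_cohort(0.3, 0.6, 0.2)))
```

With a true RR of 3 in both hypothetical scenarios, the OR is about 3.006 when the disease is rare but 6 when it is common; the two OR computations agree in every case.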

Summary Measures
Comparing distributions of binary variables across two groups
- Difference of proportions
  - Most common
  - Inference based on difference of estimated proportions
- Ratio of proportions
  - Of most relevance with low probabilities
  - Often actually use odds ratio
- Odds ratio

Two Sample Inference with Binary Data
Large Sample Inference (Uncensored Data)

Large Sample Distribution
- With totally independent data, we use the Central Limit Theorem
  - Proportions are means
  - Sample proportions are sample means
- Standard error estimates for each group's estimated proportion are based on the mean variance relationship

Asymptotic Sampling Distribution
Comparing two binomial proportions: suppose independent $X_i \sim B(1, p_1)$, $i = 1, \ldots, n_1$, and $Y_i \sim B(1, p_0)$, $i = 1, \ldots, n_0$. We want to make inference about $\theta = p_1 - p_0$.
$$\hat{p}_1 = \bar{X} \,\dot\sim\, N\!\left(p_1, \frac{p_1(1-p_1)}{n_1}\right) \qquad \hat{p}_0 = \bar{Y} \,\dot\sim\, N\!\left(p_0, \frac{p_0(1-p_0)}{n_0}\right)$$
$$\hat{\theta} = \hat{p}_1 - \hat{p}_0 \,\dot\sim\, N\!\left(p_1 - p_0, \frac{p_1(1-p_1)}{n_1} + \frac{p_0(1-p_0)}{n_0}\right)$$
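A quick simulation illustrates the large sample result: the variance of $\hat{\theta}$ matches the formula, and its distribution is close to normal. The sample sizes, proportions, number of replications, and seed below are arbitrary illustrative choices.

```python
import random
import statistics

# Hypothetical design: n1 = 200 with p1 = 0.3, n0 = 150 with p0 = 0.2.
N1, P1, N0, P0 = 200, 0.3, 150, 0.2
REPS = 5000


def simulate_theta_hat(rng: random.Random) -> float:
    """One draw of theta_hat = p1_hat - p0_hat from independent Bernoulli samples."""
    x = sum(rng.random() < P1 for _ in range(N1))
    y = sum(rng.random() < P0 for _ in range(N0))
    return x / N1 - y / N0


rng = random.Random(536)
draws = [simulate_theta_hat(rng) for _ in range(REPS)]
theory_var = P1 * (1 - P1) / N1 + P0 * (1 - P0) / N0
theory_sd = theory_var ** 0.5
# Fraction of draws within 1.96 theoretical SDs of p1 - p0 (about 0.95 if normal).
inside = sum(abs(d - (P1 - P0)) <= 1.96 * theory_sd for d in draws) / REPS

print(f"mean {statistics.fmean(draws):.4f} vs {P1 - P0:.4f}")
print(f"variance {statistics.variance(draws):.6f} vs {theory_var:.6f}")
print(f"fraction within 1.96 SD {inside:.3f} (normal approximation: 0.95)")
```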

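Asymptotic (Wald) Confidence Intervals
Confidence interval for the difference between two binomial proportions
- We want to make inference about $\theta = p_1 - p_0$
- A $100(1-\alpha)\%$ confidence interval is
$$\left(\hat{\theta} - z_{1-\alpha/2}\,\widehat{se}(\hat{\theta}),\; \hat{\theta} + z_{1-\alpha/2}\,\widehat{se}(\hat{\theta})\right), \qquad \hat{\theta} = \hat{p}_1 - \hat{p}_0, \qquad \widehat{se}(\hat{\theta}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_0}}$$

Asymptotic (Score) Hypothesis Tests
Test statistic for the difference between two binomial proportions
- Compute the variance using a plug-in estimate computed under the null
- Suppose we want to test $H_0: p_1 = p_0$
- Under the null hypothesis the entire distributions are equal, so
$$\widehat{se}_0(\hat{\theta}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_0}}, \qquad \hat{p} = \frac{\sum X_i + \sum Y_i}{n_1 + n_0}$$
- Under $H_0$, $Z = \hat{\theta} \,/\, \widehat{se}_0(\hat{\theta}) \,\dot\sim\, N(0, 1)$

Elevator Statistics
Well, nearly elevator statistics. For the 2 x 2 table

             Group 1   Group 0   Total
Response 1      a         b       m_1
Response 0      c         d       m_0
Total          n_1       n_0       N

$$\chi^2 = \frac{N\,(ad - bc)^2}{n_1 n_0 m_1 m_0}, \qquad Z = \frac{\sqrt{N}\,(ad - bc)}{\sqrt{n_1 n_0 m_1 m_0}}$$

Need for Large Samples
- Because the chi squared goodness of fit test relies on asymptotic properties, it is only valid in large samples
- A commonly used rule of thumb is that the expected counts be greater than 5 in the vast majority of the cells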
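The Wald interval, the score test, and the elevator chi squared formula translate directly into code. A minimal sketch using only the Python standard library; the function names and the example counts are illustrative. It also checks that the squared score Z equals the Pearson chi squared statistic for the same table.

```python
import math
from statistics import NormalDist


def wald_ci(x1: int, n1: int, x0: int, n0: int, level: float = 0.95) -> tuple:
    """Wald CI for p1 - p0, using the unpooled (separate) variance estimates."""
    p1, p0 = x1 / n1, x0 / n0
    est = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return est, est - z * se, est + z * se


def score_test(x1: int, n1: int, x0: int, n0: int) -> tuple:
    """Score Z and two-sided P value for H0: p1 = p0 (pooled variance under the null).

    Undefined if the pooled proportion is 0 or 1.
    """
    p_pool = (x1 + x0) / (n1 + n0)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n0))
    z = (x1 / n1 - x0 / n0) / se0
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def elevator_chi2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi squared for the table [[a, b], [c, d]] (rows: response 1, 0)."""
    m1, m0, n1, n0 = a + b, c + d, a + c, b + d
    n = m1 + m0
    return n * (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0)


if __name__ == "__main__":
    # Hypothetical data: 30 of 100 respond in group 1, 18 of 100 in group 0.
    print(wald_ci(30, 100, 18, 100))
    z, p = score_test(30, 100, 18, 100)
    print(z, p, z ** 2, elevator_chi2(30, 18, 70, 82))
```

For these made-up data $Z^2$ and the elevator statistic both come to about 3.95.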

Other Large Sample Tests
Likelihood Ratio Test
- Also has a chi squared distribution in large samples
  - But not the same statistic as the chi squared statistic
- Good large sample properties
  - Often most powerful
- Less commonly used in 2 x 2 tables
  - We will use it more often in logistic regression

Stata Commands
CI: cs respvar groupvar, level(#)
- Both variables must be coded as 0 and 1
  - Response 1 will be called cases and 0 noncases
- CI can be found under Risk difference
- Chi squared statistic and two-sided P value
tabulate respvar groupvar, row col chi2 lr
- Row and column percentages
- Chi squared and likelihood ratio P values

Caveat: Sad Fact of Life
- Different variance estimates are typically used for CI and hypothesis tests
- We can see disagreement between the conclusions reached by the CI and the P value
  - The P value might be less than .05, but the CI contains 0
  - The P value might be greater than .05, but the CI excludes 0
- Inverting a score or LR test would alleviate some of this problem
  - I know of no software that does this

Yates Correction
- Historically, a continuity correction to the chi squared test to try to avoid its anticonservatism in small samples
- All that was achieved was getting a test that behaves as poorly as Fisher's exact test
- I heartily disrecommend use of the continuity correction when comparing two samples
  - (There is a continuity correction used in one sample Z tests that is useful, but exact distributions are even better)
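The sad fact of life is easy to reproduce. In the deliberately tiny hypothetical table below (expected counts well under 5, so none of the large sample approximations should be trusted), the Wald CI for $p_1 - p_0$ excludes 0 while the score (Pearson chi squared) P value exceeds .05. The sketch also computes the Yates-corrected statistic and the likelihood ratio statistic $G^2$ for comparison. Function names are illustrative, and the Yates formula is the textbook $N(|ad - bc| - N/2)^2/(m_1 m_0 n_1 n_0)$ form, not any particular package's implementation.

```python
import math
from statistics import NormalDist


def wald_ci(x1, n1, x0, n0, level=0.95):
    """Wald CI for p1 - p0 using separate (unpooled) variance estimates."""
    p1, p0 = x1 / n1, x0 / n0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p1 - p0 - z * se, p1 - p0 + z * se


def margins(a, b, c, d):
    """Row totals m1, m0 and column totals n1, n0 of [[a, b], [c, d]]."""
    return a + b, c + d, a + c, b + d


def pearson_chi2(a, b, c, d, yates=False):
    """Pearson chi squared (the squared score Z); optional Yates correction."""
    m1, m0, n1, n0 = margins(a, b, c, d)
    n = m1 + m0
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # textbook continuity correction
    return n * diff ** 2 / (m1 * m0 * n1 * n0)


def lr_chi2(a, b, c, d):
    """Likelihood ratio statistic G^2 = 2 * sum O * log(O / E); empty cells contribute 0."""
    m1, m0, n1, n0 = margins(a, b, c, d)
    n = m1 + m0
    observed = (a, b, c, d)
    expected = (m1 * n1 / n, m1 * n0 / n, m0 * n1 / n, m0 * n0 / n)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi2_1df_pvalue(x):
    """Upper tail of a chi squared with 1 df: Pr[|Z| > sqrt(x)]."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(x)))


if __name__ == "__main__":
    # Hypothetical small table: 3 of 10 respond in group 1, 0 of 10 in group 0.
    a, b, c, d = 3, 0, 7, 10
    print("Wald 95% CI:", wald_ci(3, 10, 0, 10))
    print("score / Pearson P:", chi2_1df_pvalue(pearson_chi2(a, b, c, d)))
    print("Yates P:", chi2_1df_pvalue(pearson_chi2(a, b, c, d, yates=True)))
    print("LR P:", chi2_1df_pvalue(lr_chi2(a, b, c, d)))
```

For this made-up table the Wald CI and the LR test point to a difference, the Pearson test does not, and the Yates correction pushes the P value higher still.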

Two Sample Inference with Binary Data
Small Sample Inference (Uncensored)

Small Sample Distribution
- The exact distribution for the difference in two proportions cannot be determined in general, because of the mean variance relationship
  - We need to know the values of the two proportions being compared in order to find the exact distribution of the difference

Small Sample CI
- We have no way of obtaining an exact CI for the difference in proportions
  - We could consider all possible values of the two proportions, and see whether a test would reject each combination
  - But the resulting joint confidence interval would not always give the same decision for equal differences
  - E.g., it might reject .1 and .2, but not .4 and .5

Small Sample Tests
- We can, however, describe the exact distribution of the data under the null hypothesis conditional on all the margins of a contingency table
  - A permutation distribution
  - We imagine randomly assigning observations between the groups
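The dependence of the exact distribution on both proportions can be seen by enumerating it. The sketch below computes the exact distribution of $\hat{p}_1 - \hat{p}_0$ for two hypothetical pairs of proportions with the same true difference (−0.1) and $n = 20$ per group; the means agree but the variances and tail probabilities do not. Function names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def diff_distribution(n1: int, p1: float, n0: int, p0: float) -> dict:
    """Exact distribution of p1_hat - p0_hat for independent binomial samples."""
    dist = {}
    for x1 in range(n1 + 1):
        for x0 in range(n0 + 1):
            d = round(x1 / n1 - x0 / n0, 12)  # rounding merges floating point ties
            dist[d] = dist.get(d, 0.0) + binom_pmf(x1, n1, p1) * binom_pmf(x0, n0, p0)
    return dist


def mean_and_variance(dist: dict) -> tuple:
    mean = sum(d * pr for d, pr in dist.items())
    var = sum((d - mean) ** 2 * pr for d, pr in dist.items())
    return mean, var


if __name__ == "__main__":
    # Same true difference (-0.1), different pairs of proportions, n = 20 per group.
    for p1, p0 in ((0.1, 0.2), (0.4, 0.5)):
        dist = diff_distribution(20, p1, 20, p0)
        mean, var = mean_and_variance(dist)
        tail = sum(pr for d, pr in dist.items() if d >= 0)
        print(f"p1={p1}, p0={p0}: mean {mean:.4f}, variance {var:.5f}, Pr[diff >= 0] = {tail:.4f}")
```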

Permutation Idea
- Randomly permute $m_1$ positives and $m_0$ negatives
  - Call the first $n_1$ group 1 and the last $n_0$ group 0
- Repeat many times and see how often group 1 has $a$ or more positives
- Condition on the values of $m_1$, $m_0$, $n_1$, $n_0$

             Group 1   Group 0   Total
Response 1      a         b       m_1
Response 0      c         d       m_0
Total          n_1       n_0       N

Permutation Tests
- I usually object to permutation distributions except as a last resort
  - They test equality of distributions, not just equality of the population parameter
  - Usually they are not, however, guaranteed to detect arbitrary differences between distributions even in infinite samples

Permutation with Binary Data
- However, with binary data, distributions are different if and only if the proportions are different
  - Hence permutation tests are okay for testing
- But still, we have no confidence intervals because we have not quantified alternatives

Small Sample Tests
Conditioning on the margins
- Often one margin is fixed by design
  - Cohort studies sample by exposure
  - Case-control studies sample by disease
- In any case, it can be shown that none of the margin totals contribute information about the difference in proportions
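The permutation idea can be simulated literally: shuffle the $m_1$ positives and $m_0$ negatives, call the first $n_1$ group 1, and count how often group 1 has $a$ or more positives. A minimal Monte Carlo sketch; the table, number of replications, and seed are hypothetical choices.

```python
import random


def permutation_pvalue(a: int, b: int, c: int, d: int, reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo one-sided permutation P value for the 2 x 2 table [[a, b], [c, d]].

    Columns are groups 1 and 0; rows are response 1 and 0. The margins
    m1 = a + b, m0 = c + d, n1 = a + c, n0 = b + d are held fixed.
    """
    n1 = a + c
    labels = [1] * (a + b) + [0] * (c + d)  # m1 positives, m0 negatives
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        rng.shuffle(labels)
        # Call the first n1 observations group 1 and count its positives.
        if sum(labels[:n1]) >= a:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical table: 7 of 10 positive in group 1, 2 of 10 in group 0.
    print(permutation_pvalue(7, 2, 3, 8))
```

For this table the exact conditional (hypergeometric) tail probability is 6446/184756, about 0.035, and the simulated value should land close to it.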

Fisher's Exact Test
- Probability of more extreme contingency tables with the same marginal totals
- Probabilities by the hypergeometric distribution (use a computer): $\Pr[a = k] = \binom{m_1}{k}\binom{m_0}{n_1 - k} \Big/ \binom{N}{n_1}$
- Consider all possible values of k

             Group 1        Group 0          Total
Response 1      k           m_1 − k           m_1
Response 0   n_1 − k     m_0 − n_1 + k        m_0
Total          n_1            n_0               N

Stata Commands
The Fisher's exact test P values are given by several commands
- cs respvar groupvar, exact
- tabulate respvar groupvar, exact
- One-sided and two-sided P values are provided

Comments
Problem: Fisher's exact test does not turn out to be an exact test in practice
- A problem is posed by the discrete nature of the data
- To achieve the desired level .05 two-sided test, we would sometimes have to reject the null when both groups had 0 successes
- With some results, flip a biased coin to decide whether the result is significant
- Few people are willing to do this

We then face a dilemma
- The chi squared test (Z test for proportions) may be anticonservative in small samples
  - Generally, so long as all cell counts in the contingency table are expected to be greater than 5 under the null hypothesis, we are OK
- The Fisher's exact test is too conservative
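Fisher's exact test sums hypergeometric probabilities over tables with the same margins. A minimal sketch using `math.comb`; the function name and example table are illustrative. The two-sided P value here sums the probabilities of all tables no more likely than the observed one, which is one common convention; software may differ in how it defines the two-sided value.

```python
from math import comb


def fisher_exact(a: int, b: int, c: int, d: int) -> tuple:
    """One-sided (upper tail) and two-sided Fisher's exact P values for [[a, b], [c, d]].

    Columns are groups 1 and 0, rows are response 1 and 0. Conditional on the
    margins, a is hypergeometric: Pr[a = k] = C(m1, k) C(m0, n1 - k) / C(N, n1).
    """
    m1, m0, n1 = a + b, c + d, a + c
    total = m1 + m0
    lo, hi = max(0, n1 - m0), min(m1, n1)
    probs = {k: comb(m1, k) * comb(m0, n1 - k) / comb(total, n1) for k in range(lo, hi + 1)}
    one_sided = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: tables no more probable than the observed one
    # (the small tolerance guards against floating point ties).
    p_obs = probs[a]
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return one_sided, min(1.0, two_sided)


if __name__ == "__main__":
    # Hypothetical table: 7 of 10 positive in group 1, 2 of 10 in group 0.
    print(fisher_exact(7, 2, 3, 8))
```

For this table the one-sided value is 6446/184756 (about 0.035); because the margins happen to make the hypergeometric distribution symmetric, the two-sided value is exactly twice that.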

Regions of Valid Inference (figure): use of chi square and LR tests when n = 30 in each group; panels: Valid Use of the Unadjusted Chi Square, Valid Use of the Unadjusted LR (each marked Nominal Type I Error, with $p_1 = p_0 = p$)

Alternatives
- Great improvements in statistical power are obtained by modifying either of those tests to come as close as possible to the nominal type I error without exceeding it
- Several statistical packages provide such modified tests (e.g., StatXact)
- Stata provides exact logistic regression for this setting
- I have such functions in R

Modifications: Basic Idea
Critical region modifications
- Use the statistic
  - Don't presume the classical distribution
  - Don't assume the chi squared statistic has a chi square distribution
  - Don't assume the Fisher's exact P value has a uniform distribution
- Consider all possible values of p common to both groups, and use the exact distribution
  - Then take the worst case
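The critical region modification can be sketched directly: keep the score Z statistic, compute its exact distribution at each common value $p_1 = p_0 = p$, and choose the smallest critical value whose worst-case type I error does not exceed the nominal level. This is a minimal illustration of the basic idea, not the author's R functions. It assumes a one-sided level of .025, $n = 10$ per group, and a finite grid of $p$ values as a stand-in for "all possible values"; all names are illustrative.

```python
import math
from statistics import NormalDist


def z_stat(x1: int, n1: int, x0: int, n0: int) -> float:
    """Score Z for p1 - p0 with pooled variance; defined as 0 when all or none respond."""
    p = (x1 + x0) / (n1 + n0)
    if p == 0.0 or p == 1.0:
        return 0.0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n0))
    return (x1 / n1 - x0 / n0) / se


def binom_pmfs(n: int, p: float) -> list:
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def worst_case_size(crit: float, n1: int, n0: int, grid: list) -> float:
    """Largest exact Pr[Z >= crit] over common values p1 = p0 = p in the grid."""
    region = [(x1, x0) for x1 in range(n1 + 1) for x0 in range(n0 + 1)
              if z_stat(x1, n1, x0, n0) >= crit]
    worst = 0.0
    for p in grid:
        f1, f0 = binom_pmfs(n1, p), binom_pmfs(n0, p)
        worst = max(worst, sum(f1[x1] * f0[x0] for x1, x0 in region))
    return worst


def attainable_z(n1: int, n0: int) -> list:
    return sorted({z_stat(x1, n1, x0, n0) for x1 in range(n1 + 1) for x0 in range(n0 + 1)})


def adjusted_critical_value(n1: int, n0: int, alpha: float, grid: list) -> float:
    """Smallest attainable positive critical value whose worst-case size is <= alpha.

    The worst-case size can only shrink as the critical value grows, so the
    first acceptable value in ascending order is the smallest one.
    """
    for crit in attainable_z(n1, n0):
        if crit > 0 and worst_case_size(crit, n1, n0, grid) <= alpha:
            return crit
    return math.inf


if __name__ == "__main__":
    n1 = n0 = 10
    grid = [i / 100 for i in range(1, 100)]  # approximation to "all possible p"
    alpha = 0.025                            # one-sided level
    unadjusted = NormalDist().inv_cdf(1 - alpha)
    adjusted = adjusted_critical_value(n1, n0, alpha, grid)
    print(f"unadjusted critical value {unadjusted:.3f}, worst-case size "
          f"{worst_case_size(unadjusted, n1, n0, grid):.4f}")
    print(f"adjusted critical value {adjusted:.3f}, worst-case size "
          f"{worst_case_size(adjusted, n1, n0, grid):.4f}")
```

Swapping `z_stat` for the likelihood ratio statistic or the Fisher's exact P value would follow the same recipe, since only the ordering of tables enters the critical region.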

True Type I Error by Common p (figure): use of adjusted and unadjusted tests for n = 30/arm and n = 50/arm; nominal level .025; y-axis: actual type I error; x-axis: $p_1 = p_0 = p$; legend: Adj Chi Square, Unadj Chi Square, Adj Fisher's, Unadj Fisher's, Adj Lklhd Ratio, Unadj Lklhd Ratio

Gains in Power (figure): power of unadjusted, adjusted level .05 tests

General Comments
- It is generally immaterial whether the Fisher's exact test P value, the chi square statistic, or the likelihood ratio statistic is used as the basis for the exact test
- In any case, the critical value is dependent upon the sample sizes
- Using this approach, substantial improvement in power is obtained in low sample sizes
- I strongly recommend its use when confronted with small samples in real life
