Stat 642, Lecture notes for 01/27/05

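Rate Standardization Continued: Note that if $T_i = n_i t$, where $T_i$ is the cumulative follow-up time, $n_i$ is the number of subjects at risk at the midpoint of interval $i$ (each interval having length $t$), and $d_i$ is the observed number of cases during interval $i$, so that $\hat\lambda_i = d_i / T_i$, then letting $w_i = n_i / \sum_j n_j$,

$$\sum_i \hat\lambda_i w_i = \sum_i \frac{d_i}{n_i t} \cdot \frac{n_i}{\sum_j n_j} = \frac{1}{t \sum_j n_j} \sum_i d_i,$$

which is the number of cases per person year of exposure for the entire cohort. If $w_i$ represents the proportion of the standard population in each age category, $\sum_i \hat\lambda_i w_i$ would represent the number of cases per person year of exposure if the cohort had the same distribution as the standard population.

The primary difficulty with the CMF is that if, for some $i$, $\mathrm{var}(\hat\lambda_i)$ is large (e.g. $T_i$ is small), then $\mathrm{var}(\mathrm{CMF})$ can be large. Writing $\lambda_i^*$ for the standard rates,

$$\mathrm{var}(\mathrm{CMF}) = \frac{\sum_i w_i^2 \, \mathrm{var}(\hat\lambda_i)}{\left(\sum_i w_i \lambda_i^*\right)^2} \approx \frac{\sum_i w_i^2 \, d_i / T_i^2}{\left(\sum_i w_i \lambda_i^*\right)^2}.$$

Standardized Mortality Ratio: The CMF imposes the standard population distribution on the observed hazard rates. Conversely, one could impose the standard rates on the observed population.

$$\text{Standardized Mortality Ratio (SMR)} = \frac{\sum_i d_i}{\sum_i T_i \lambda_i^*} = \frac{\text{Observed Events}}{\text{Expected Events under Standardized rates}}$$

The variance of the SMR is stable. If the rate ratios $\lambda_i / \lambda_i^*$ are independent of $i$, then the SMR is the minimum variance estimate of the common ratio.

$$\mathrm{var}(\mathrm{SMR}) \approx \frac{\sum_i d_i}{\left(\sum_i T_i \lambda_i^*\right)^2}$$

and is usually smaller than $\mathrm{var}(\mathrm{CMF})$. In the above example, SMR $= 5.4/3.9 = 1.38$ (assuming that the $w_i$ represent observed proportions in group B). While the SMR has smaller variance than the CMF, the SMR is potentially biased.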
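The contrast between the two variance formulas can be seen in a small numerical sketch. The data below are hypothetical (they are not the example from the notes), the function names are illustrative, and the CMF is taken to be $\sum_i w_i \hat\lambda_i / \sum_i w_i \lambda_i^*$, which is the form implied by the variance formula above.

```python
def smr(d, T, std_rates):
    """Return (SMR, approximate variance) = (sum d_i / E, sum d_i / E^2),
    where E = sum T_i * lambda*_i is the expected number of events."""
    observed = sum(d)
    expected = sum(Ti * li for Ti, li in zip(T, std_rates))
    return observed / expected, observed / expected ** 2


def cmf(d, T, std_rates, w):
    """Return (CMF, approximate variance), assuming
    CMF = sum w_i * (d_i / T_i) / sum w_i * lambda*_i."""
    denom = sum(wi * li for wi, li in zip(w, std_rates))
    ratio = sum(wi * di / Ti for wi, di, Ti in zip(w, d, T)) / denom
    var = sum(wi ** 2 * di / Ti ** 2 for wi, di, Ti in zip(w, d, T)) / denom ** 2
    return ratio, var


# Hypothetical two-stratum cohort: the first stratum has very little follow-up.
d = [2, 30]                # observed cases per stratum
T = [50.0, 3000.0]         # person-years per stratum
std_rates = [0.02, 0.008]  # standard rates lambda*_i
w = [0.5, 0.5]             # standard population proportions

smr_value, smr_var = smr(d, T, std_rates)
cmf_value, cmf_var = cmf(d, T, std_rates, w)
print("SMR:", smr_value, "var:", smr_var)
print("CMF:", cmf_value, "var:", cmf_var)
```

With these made-up numbers the stratum with only 50 person-years dominates the CMF variance, which is the instability described above, while the SMR variance depends only on the total observed and expected counts.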

Example: (Breslow and Day II, Table 2.13) (An example of Simpson's Paradox) Suppose that we have two subject cohorts.

                        Age Groups
                        20-44    45-64    total
    Cohort I
      d (observed)        100     1600     1700
      expected            200      800     1000
      SMR (%)             50%     200%     170%
    Cohort II
      d (observed)         80      180      260
      expected            120       60      180
      SMR (%)             67%     300%     144%

    SMR_I / SMR_II        .75      .67     1.18

Even though, in each age group, the SMR for cohort I is less than the SMR for cohort II, the relationship for the aggregate SMR is reversed. Since the SMRs are quite different in older and younger subjects, the overall SMR, which is an average of the two, is not a very meaningful summary measure. Note that in Cohort I, 200/1000 = 20% of expected events are in the low SMR group. In Cohort II, 120/180 = 67% of expected events are in the low SMR group.

We will discuss SMR and CMF in more detail in the second half of the course.

Measures of Association. Given two groups, exposed and not exposed, and two hazard rates $\lambda_1$ and $\lambda_0$ for the two groups, respectively, we want to summarize the difference in risk. There are two immediate ways to do this.

    Excess Risk:  $b = \lambda_1 - \lambda_0$    (additive model)
    Risk Ratio:   $r = \lambda_1 / \lambda_0$    (multiplicative model)

With only two groups, it makes very little difference which of these measures is used for modeling purposes (one is just a re-parameterization of the other). As a descriptive measure, excess risk is only useful if the baseline (un-exposed) rate is also given. Risk ratio is typically a more easily interpreted measure. In most cases, risk ratio makes more biological sense, although if, for example, the mechanism by which exposure causes disease is independent of that for non-exposed cases, then the additive model might make more sense.
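Returning to the two-cohort SMR example, the reversal can be reproduced directly from the observed and expected counts in the table. The short sketch below (function and variable names are illustrative) also checks that each aggregate SMR is the average of the age-specific SMRs weighted by each age group's share of expected events, which is why the differing shares (20% versus 67%) drive the reversal.

```python
# Observed and expected deaths for age groups 20-44 and 45-64, taken from the table.
cohorts = {
    "I": {"observed": [100, 1600], "expected": [200, 800]},
    "II": {"observed": [80, 180], "expected": [120, 60]},
}


def smr_by_group(cohort):
    """Age-specific SMRs: observed / expected within each age group."""
    return [o / e for o, e in zip(cohort["observed"], cohort["expected"])]


def smr_overall(cohort):
    """Aggregate SMR: total observed / total expected."""
    return sum(cohort["observed"]) / sum(cohort["expected"])


def expected_shares(cohort):
    """Share of the cohort's expected events falling in each age group."""
    total = sum(cohort["expected"])
    return [e / total for e in cohort["expected"]]


def weighted_smr(cohort):
    """Average of the age-specific SMRs, weighted by expected-event shares."""
    return sum(s * w for s, w in zip(smr_by_group(cohort), expected_shares(cohort)))


for name, cohort in cohorts.items():
    print(name, smr_by_group(cohort), smr_overall(cohort), expected_shares(cohort))
```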

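Attributable Risk: It is sometimes useful to consider the proportion of risk which can be attributed to exposure.

$$\mathrm{AR} = \text{attributable risk for an exposed person} = \frac{\text{excess risk}}{\text{total risk given exposure}} = \frac{\lambda_1 - \lambda_0}{\lambda_1} = \frac{r\lambda_0 - \lambda_0}{r\lambda_0} = \frac{r - 1}{r}$$

If the proportion of subjects in the population with exposure is $p$, then

$$\text{population average risk} = p\lambda_1 + (1 - p)\lambda_0 = \lambda_0 + \underbrace{p(\lambda_1 - \lambda_0)}_{\text{population excess risk}}$$

$$\mathrm{PAR} = \text{population attributable risk} = \frac{p(\lambda_1 - \lambda_0)}{\lambda_0 + p(\lambda_1 - \lambda_0)} = \frac{p(r - 1)}{1 + p(r - 1)}$$

In the accompanying figure, the area of the L shaped region is the population average hazard rate. The area of the rectangle on the right hand side which lies between $\lambda_0$ and $\lambda_1$ represents the population average excess risk. The ratio of this area and the total area is the population attributable risk.

[Figure: L shaped region over the horizontal axis from 0 to 1, with vertical levels $\lambda_0$ and $\lambda_1$. The lower rectangle, of height $\lambda_0$ over [0, 1], has area $\lambda_0$ (Risk With No Exposure). The upper rectangle, over [1-p, 1] between $\lambda_0$ and $\lambda_1$, has area $p(\lambda_1 - \lambda_0)$ (Excess Risk).]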
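The two forms of the attributable-risk formulas (in terms of the rates and in terms of the risk ratio $r$) can be checked against each other numerically. This is a minimal sketch; the rates and exposure proportion are hypothetical values chosen only for illustration, and the function names are not from the notes.

```python
def attributable_risk(r):
    """AR for an exposed person: (r - 1) / r, with r = lambda_1 / lambda_0."""
    return (r - 1) / r


def population_attributable_risk(p, r):
    """PAR in terms of the exposure proportion p and the risk ratio r."""
    return p * (r - 1) / (1 + p * (r - 1))


def par_from_rates(p, lam1, lam0):
    """PAR as (population excess risk) / (population average risk)."""
    excess = p * (lam1 - lam0)
    return excess / (lam0 + excess)


# Hypothetical inputs: unexposed rate, exposed rate, proportion exposed.
lam0, lam1, p = 0.01, 0.03, 0.25
r = lam1 / lam0

ar = attributable_risk(r)
par = population_attributable_risk(p, r)
par_check = par_from_rates(p, lam1, lam0)
print(ar, par, par_check)
```

In terms of the figure, `par_from_rates` divides the area of the excess-risk rectangle by the total area of the L shaped region.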

Analysis of crude data: 2 x 2 Tables

Suppose that we have the following probabilities for exposure and disease in a population:

            D+          D-
    E+    $p_{11}$    $p_{10}$
    E-    $p_{01}$    $p_{00}$

where $\sum_{ij} p_{ij} = 1$, and let $\psi = \dfrac{p_{11} p_{00}}{p_{10} p_{01}}$.

If we take a random sample from the population of size $N$, we have expected values

            D+           D-
    E+    $Np_{11}$    $Np_{10}$
    E-    $Np_{01}$    $Np_{00}$

and the OR is

$$\frac{Np_{11} \, Np_{00}}{Np_{10} \, Np_{01}} = \psi.$$

On the other hand, if we sample $n_1$ and $n_0$ from the rows, we have expected values

            D+                                       D-
    E+    $\dfrac{n_1 p_{11}}{p_{11} + p_{10}}$    $\dfrac{n_1 p_{10}}{p_{11} + p_{10}}$
    E-    $\dfrac{n_0 p_{01}}{p_{01} + p_{00}}$    $\dfrac{n_0 p_{00}}{p_{01} + p_{00}}$

and the OR is

$$\frac{n_1 p_{11} \, n_0 p_{00} \,/\, \big[(p_{11} + p_{10})(p_{01} + p_{00})\big]}{n_1 p_{10} \, n_0 p_{01} \,/\, \big[(p_{11} + p_{10})(p_{01} + p_{00})\big]} = \psi.$$

Clearly, if we sample from the columns, we obtain the same odds-ratio.

The only sampling model we have not considered is one in which we fix both row and column totals, but it is difficult to construct an observational setting in which one can fix both margins and sample from the interior of the table. (If we choose a prescribed number of diseased and non-diseased subjects, and randomly assign a prescribed number of labels of exposed and not exposed to these subjects, we can generate a random 2 x 2 table with prescribed margins and $\psi = 1$. Also see Fisher's tea tasting experiment.)

Now suppose that we have the following population distribution:

            D+         D-
    E+    $pP_1$     $pQ_1$     $p$
    E-    $qP_0$     $qQ_0$     $q$

where

    $p$ = Probability of Exposure, $q = 1 - p$
    $P_1$ = Probability of Disease Given Exposure, $Q_1 = 1 - P_1$
    $P_0$ = Probability of Disease Given No Exposure, $Q_0 = 1 - P_0$

and suppose that we observe

            D+       D-
    E+    $a$      $b$      $n_1$
    E-    $c$      $d$      $n_0$
          $m_1$    $m_0$    $N$

We have two goals:

    - estimate $\psi$, and a confidence interval or standard error.
    - test hypotheses, say, $H_0: \psi = \psi_0$ (usually $\psi_0 = 1$).

For this problem, $P_1$, $P_0$ and $p$ are nuisance parameters. There are three primary approaches to this problem.

    1. Exact conditional: test (or estimate) $\psi$ conditional on the sufficient statistics for the nuisance parameters using exact distribution.
    2. Exact unconditional: consider worst case values of nuisance parameters using exact distribution.
    3. Asymptotic: estimate nuisance parameters, and appeal to asymptotic theory for sampling distribution.

We will also consider the sampling distribution of $X = (a, b, c, d)$ under three sampling models:

    (I) random sample from population - $N$ fixed
    (II) cohort study, $n_1$, $n_0$ fixed
    (III) case-control study, $m_1$, $m_0$ fixed

1. Exact conditional

I - random sample.

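Let $X = (a, b, c, d)$, then $X$ is multinomial with probabilities $(pP_1, pQ_1, qP_0, qQ_0)$. The distribution function for $X$ is

$$\begin{aligned}
\Pr\{X; P_0, P_1, p\} &= \binom{N}{a \; b \; c \; d} (pP_1)^a (pQ_1)^b (qP_0)^c (qQ_0)^d \\
&= \binom{N}{n_1} \binom{n_1}{a} \binom{n_0}{c} (qQ_0)^N \left(\frac{pQ_1}{qQ_0}\right)^{n_1} \left(\frac{P_0}{Q_0}\right)^{m_1} \left(\frac{P_1 Q_0}{Q_1 P_0}\right)^a \\
&= h_I(n_0, m_1, P_0, P_1, p) \binom{n_1}{a} \binom{n_0}{c} \psi^a
\end{aligned}$$

Therefore, conditional on all margins (so that $c = m_1 - a$), the distribution for $a$ becomes

$$\Pr\{a \mid \psi, n_1, m_1\} = \frac{\binom{n_1}{a} \binom{n_0}{m_1 - a} \psi^a}{\sum_u \binom{n_1}{u} \binom{n_0}{m_1 - u} \psi^u}$$

This is the (noncentral) hypergeometric distribution.

II - $n_0$, $n_1$ fixed.

$a$ and $c$ are independent binomial RV's with probabilities $P_1$ and $P_0$, and sample sizes $n_1$ and $n_0$. The distribution for $X = (a, c)$ is

$$\begin{aligned}
\Pr\{X; P_0, P_1\} &= \binom{n_1}{a} \binom{n_0}{c} (P_1)^a (Q_1)^b (P_0)^c (Q_0)^d \\
&= \binom{n_1}{a} \binom{n_0}{c} Q_0^{n_0} Q_1^{n_1} \left(\frac{P_0}{Q_0}\right)^{m_1} \left(\frac{P_1 Q_0}{Q_1 P_0}\right)^a \\
&= h_{II}(n_0, m_1, P_0, P_1, p) \binom{n_1}{a} \binom{n_0}{c} \psi^a
\end{aligned}$$

The factor which depends on $a$ and $\psi$ is precisely the same as case I.

III - $m_0$, $m_1$ fixed.

By interchanging the roles of the rows and columns and using the fact that

$$\binom{n_1}{a} \binom{n_0}{c} = \frac{n_1! \, n_0!}{m_1! \, m_0!} \binom{m_1}{a} \binom{m_0}{b}$$

we have that

$$\Pr\{X; P_0, P_1\} = h_{III}(n_0, m_1, P_0, P_1, p) \binom{n_1}{a} \binom{n_0}{c} \psi^a$$

Again, the factor which depends on $a$ and $\psi$ is precisely the same as case I.

In all cases, the conditional distribution of $a$ is the same non-central hypergeometric distribution.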
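The conditional distribution above is straightforward to evaluate for small tables. The following sketch computes the noncentral hypergeometric probabilities and, as a sanity check on the derivation for sampling model I, compares them with the multinomial probabilities conditioned on the margins by brute force. The margins and the values of $p$, $P_1$, $P_0$ are hypothetical, and the function names are illustrative rather than taken from the notes.

```python
from math import comb, factorial


def support(n1, n0, m1):
    """Values of a compatible with row totals n1, n0 and first-column total m1."""
    return range(max(0, m1 - n0), min(n1, m1) + 1)


def nc_hypergeom_pmf(a, n1, n0, m1, psi):
    """Pr{a | psi, margins}: weights C(n1,u) * C(n0,m1-u) * psi^u, normalized."""
    def weight(u):
        return comb(n1, u) * comb(n0, m1 - u) * psi ** u

    if a not in support(n1, n0, m1):
        return 0.0
    return weight(a) / sum(weight(u) for u in support(n1, n0, m1))


def multinomial_conditional(a, n1, n0, m1, p, P1, P0):
    """Sampling model I: multinomial probability of the table, renormalized
    over all tables sharing the same margins."""
    q, Q1, Q0 = 1 - p, 1 - P1, 1 - P0
    N = n1 + n0

    def prob(u):
        b, c = n1 - u, m1 - u
        d = n0 - c
        coef = factorial(N) // (
            factorial(u) * factorial(b) * factorial(c) * factorial(d)
        )
        return coef * (p * P1) ** u * (p * Q1) ** b * (q * P0) ** c * (q * Q0) ** d

    return prob(a) / sum(prob(u) for u in support(n1, n0, m1))


# Hypothetical margins and nuisance parameters.
n1, n0, m1 = 10, 12, 8
p, P1, P0 = 0.4, 0.5, 0.2
psi = (P1 * (1 - P0)) / ((1 - P1) * P0)  # odds ratio P1*Q0 / (Q1*P0)

max_gap = max(
    abs(nc_hypergeom_pmf(a, n1, n0, m1, psi)
        - multinomial_conditional(a, n1, n0, m1, p, P1, P0))
    for a in support(n1, n0, m1)
)
print("largest discrepancy:", max_gap)
```

The nuisance parameters enter `multinomial_conditional` but cancel on normalization, so changing `p` while holding `psi` fixed should leave the conditional probabilities unchanged up to rounding. Setting `psi = 1` gives the ordinary (central) hypergeometric distribution.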