Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.

yes to (3) two-sample problem? no to (4) underlying distribution normal or can central-limit theorem be assumed to hold? and yes to (5) underlying distribution binomial? We now refer to the flowchart at the end of this chapter (p. 409). We answer yes to (1) are samples independent? (2) are all expected values ≥ 5? and (3) 2 × 2 contingency table? This leads us to the box labeled "Use the two-sample test for binomial proportions or 2 × 2 contingency-table methods if no confounding is present, or Mantel-Haenszel test if confounding is present." In brief, a confounder is another variable that is potentially related to both the row and column classification variables, and it must be controlled for. We discuss methods for controlling for confounding in Chapter 13. In this chapter, we assume no confounding is present. Thus we use either the two-sample test for binomial proportions (Equation 10.3) or the equivalent chi-square test for 2 × 2 contingency tables (Equation 10.5).

In Section 10.2, we discussed methods for comparing two binomial proportions using either normal-theory or contingency-table methods. Both methods yield identical p-values. However, they require that the normal approximation to the binomial distribution be valid, which is not always the case, especially for small samples.

Suppose we want to investigate the relationship between high salt intake and death from cardiovascular disease (CVD). Groups of high- and low-salt users could be identified and followed over a long time to compare the relative frequency of death from CVD in the two groups. In contrast, a much less expensive study would involve looking at death records, separating CVD deaths from non-CVD deaths, asking a close relative (such as a spouse) about the dietary habits of the deceased, and then comparing salt intake between people who died of CVD vs. people who died of other causes. The latter type of study, a retrospective study, may be impossible to perform for a number of reasons. But if it is possible, it is almost always less expensive than the former type, a prospective study.
Suppose a retrospective study is done among men ages 50–54 in a specific county who died over a 1-month period. The investigators try to include approximately an equal number of men who died from CVD (the cases) and men who died from other causes (the controls). Of 35 people who died from CVD, 5 were on a high-salt diet before they died, whereas of 25 people who died from other causes, 2 were on such a diet. These data, presented in Table 10.9, are in the form of a 2 × 2 contingency table, so the methods of Section 10.2 may be applicable. However, the expected values of this table are too small for such methods to be valid. Indeed,

$$E_{11} = \frac{25 \times 7}{60} = 2.92, \qquad E_{21} = \frac{35 \times 7}{60} = 4.08$$

thus two of the four cells have expected values less than 5. How should the possible association between cause of death and type of diet be assessed?

In this case, Fisher's exact test can be used. This procedure gives exact levels of significance for any 2 × 2 table but is only necessary for tables with small expected values, tables in which the standard chi-square test as given in Equation 10.5 is not applicable. For tables in which use of the chi-square test is appropriate, the two tests give very similar results.

Suppose the probability that a man was on a high-salt diet given that his cause of death was noncardiovascular (non-CVD) is $p_1$ and the probability that a man was on a high-salt diet given that his cause of death was cardiovascular (CVD) is $p_2$. We wish to test the hypothesis $H_0: p_1 = p_2 = p$ vs. $H_1: p_1 \ne p_2$. Table 10.10 gives the general layout of the data.

For mathematical convenience, we assume the margins of this table are fixed; that is, the numbers of non-CVD deaths and CVD deaths are fixed at $a + b$ and $c + d$, respectively, and the numbers of people on high- and low-salt diets are fixed at $a + c$ and $b + d$, respectively. Indeed, it is difficult to compute exact probabilities unless one assumes fixed margins. The exact probability of observing the table with cells a, b, c, d is as follows.

$$\Pr(a, b, c, d) = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}, \qquad n = a + b + c + d \tag{10.7}$$

The formula in Equation 10.7 is easy to remember because the numerator is the product of the factorials of each of the row and column margins, and the denominator is the product of the factorial of the grand total and the factorials of the individual cells.

Suppose we have the table shown in Table 10.11. Compute the exact probability of obtaining this table assuming the margins are fixed.

$$\Pr(2, 5, 3, 1) = \frac{7!\,4!\,5!\,6!}{11!\,2!\,5!\,3!\,1!} = \frac{5040 \times 24 \times 120 \times 720}{39{,}916{,}800 \times 2 \times 120 \times 6 \times 1} = \frac{1.0450944 \times 10^{10}}{5.7480192 \times 10^{10}} = .182$$

Suppose we consider all possible tables with fixed row margins denoted by $N_1$ and $N_2$ and fixed column margins denoted by $M_1$ and $M_2$. We assume the rows and columns have been rearranged so that $M_1 \le M_2$ and $N_1 \le N_2$. We refer to each table by its (1, 1) cell because all other cells are then determined from the fixed row and column margins. Let the random variable $X$ denote the cell count in the (1, 1) cell. The probability distribution of $X$ is given by

$$\Pr(X = a) = \frac{N_1!\,N_2!\,M_1!\,M_2!}{N!\,a!\,(N_1 - a)!\,(M_1 - a)!\,(M_2 - N_1 + a)!}, \qquad a = 0, \ldots, \min(M_1, N_1) \tag{10.8}$$

and $N = N_1 + N_2 = M_1 + M_2$. This probability distribution is called the hypergeometric distribution.

It will be useful for our subsequent work on combining evidence from more than one 2 × 2 table in Chapter 13 to refer to the expected value and variance of the hypergeometric distribution. These are as follows.

Suppose we consider all possible 2 × 2 tables with fixed row margins $N_1, N_2$ and fixed column margins $M_1, M_2$, where $N_1 \le N_2$, $M_1 \le M_2$, and $N = N_1 + N_2 = M_1 + M_2$. Let the random variable $X$ denote the cell count in the (1, 1) cell. The expected value and variance of $X$ are
$$E(X) = \frac{M_1 N_1}{N}, \qquad \mathrm{Var}(X) = \frac{M_1 M_2 N_1 N_2}{N^2 (N - 1)} \tag{10.9}$$

Thus the exact probability of obtaining a table with cells a, b, c, d in Equation 10.7 is a special case of the hypergeometric distribution, where $N_1 = a + b$, $N_2 = c + d$, $M_1 = a + c$, $M_2 = b + d$, and $N = a + b + c + d$. We can evaluate this probability by calculator using Equation 10.7, or we can use the HYPGEOMDIST function of Excel. In the latter case, to evaluate Pr(a, b, c, d), we specify HYPGEOMDIST(a, a + b, a + c, N). In words, the hypergeometric distribution evaluates the probability of obtaining a successes out of a sample of a + b observations, given that the total population (in this case, the two samples combined) is of size N, of which a + c observations are successes. Thus, to evaluate the exact probability in Table 10.11, we specify HYPGEOMDIST(2, 7, 5, 11) = .182, which is the probability of obtaining two successes in a sample of 7 observations given that the total population consists of 11 observations, of which 5 are successes.

The hypergeometric distribution differs from the binomial distribution because in the latter case, we simply evaluate the probability of obtaining a successes out of a + b observations, assuming that each outcome is independent. For the hypergeometric distribution, the outcomes are not independent because once a success occurs it is less likely that another observation will be a success, as the total number of successes is fixed (at a + c). If N is large, the two distributions are very similar because there is only a slight deviation from independence for the hypergeometric.

The basic strategy in testing the hypothesis $H_0: p_1 = p_2$ vs. $H_1: p_1 \ne p_2$ will be to enumerate all possible tables with the same margins as the observed table and to compute the exact probability for each such table based on the hypergeometric distribution. A method for accomplishing this is as follows.
(1) Rearrange the rows and columns of the observed table so the smaller row total is in the first row and the smaller column total is in the first column. Suppose that after the rearrangement, the cells in the observed table are a, b, c, d, as shown in Table 10.10.
(2) Start with the table with 0 in the (1, 1) cell. The other cells in this table are then determined from the row and column margins. Indeed, to maintain the same row and column margins as the observed table, the (1, 2) element must be a + b, the (2, 1) cell must be a + c, and the (2, 2) element must be (c + d) − (a + c) = d − a.
(3) Construct the next table by increasing the (1, 1) cell by 1 (i.e., from 0 to 1), decreasing the (1, 2) and (2, 1) cells by 1, and increasing the (2, 2) cell by 1.
(4) Continue increasing and decreasing the cells by 1, as in step 3, until one of the decreasing cells, (1, 2) or (2, 1), reaches 0, at which point all possible tables with the given row and column margins have been enumerated.

Each table in the sequence of tables is referred to by its (1, 1) element. Thus, the first table is the "0" table, the next table is the "1" table, and so on.

Enumerate all possible tables with the same row and column margins as the observed data in Table 10.9.
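As a cross-check on Equations 10.7–10.9 and on steps (1)–(4), the Python sketch below (not from the text; `table_prob`, `rearrange`, `enumerate_tables`, and `hypergeom_moments` are hypothetical names) computes exact table probabilities with fractions and lists every table that shares the margins of a given table.

```python
from fractions import Fraction
from math import factorial


def table_prob(a, b, c, d):
    """Exact probability of the 2 x 2 table [[a, b], [c, d]] with all margins
    fixed (Equation 10.7), returned as an exact Fraction."""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(num, den)


def rearrange(a, b, c, d):
    """Step 1: put the smaller row total in row 1 and the smaller column
    total in column 1."""
    if a + b > c + d:      # swap the two rows
        a, b, c, d = c, d, a, b
    if a + c > b + d:      # swap the two columns
        a, b, c, d = b, a, d, c
    return a, b, c, d


def enumerate_tables(a, b, c, d):
    """Steps 2-4: every table with the same margins, ordered by its (1, 1) cell."""
    a, b, c, d = rearrange(a, b, c, d)
    # Step 2: the "0" table; its (2, 2) cell is (c + d) - (a + c) = d - a
    cells = (0, a + b, a + c, d - a)
    tables = [cells]
    # Steps 3-4: shift one unit along the diagonals until the (1, 2) or
    # (2, 1) cell, the two cells being decreased, reaches 0
    while cells[1] > 0 and cells[2] > 0:
        x11, x12, x21, x22 = cells
        cells = (x11 + 1, x12 - 1, x21 - 1, x22 + 1)
        tables.append(cells)
    return tables


def hypergeom_moments(n1, n2, m1, m2):
    """Expected value and variance of the (1, 1) cell X (Equation 10.9)."""
    n = n1 + n2
    return Fraction(m1 * n1, n), Fraction(m1 * m2 * n1 * n2, n * n * (n - 1))


if __name__ == "__main__":
    print(float(table_prob(2, 5, 3, 1)))           # Table 10.11
    for t in enumerate_tables(2, 23, 5, 30):        # Table 10.9
        print(t, round(float(table_prob(*t)), 3))
```

Run on Table 10.9, it lists the tables from (0, 25, 7, 28) to (7, 18, 0, 35), with probabilities that round to .017, .105, .252, .312, .214, .082, .016, and .001; `table_prob(2, 5, 3, 1)` returns 2/11 ≈ .182, as in the Table 10.11 calculation. Exact fractions are used so the probabilities sum to exactly 1 and the moments can be compared with Equation 10.9 without rounding error.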

The observed table has a = 2, b = 23, c = 5, d = 30. The rows or columns do not need to be rearranged because the first row total is smaller than the second row total, and the first column total is smaller than the second column total. Start with the 0 table, which has 0 in the (1, 1) cell, 25 in the (1, 2) cell, 7 in the (2, 1) cell, and 30 − 2, or 28, in the (2, 2) cell. The 1 table then has 1 in the (1, 1) cell, 25 − 1 = 24 in the (1, 2) cell, 7 − 1 = 6 in the (2, 1) cell, and 28 + 1 = 29 in the (2, 2) cell. Continue in this fashion until the 7 table is reached, which has 0 in the (2, 1) cell, at which point all possible tables with the given row and column margins have been enumerated.

The set of hypergeometric probabilities in Table 10.12 can be easily evaluated using the recursive properties of Excel by (1) setting up a column with consecutive values from 0 to 7 (say from B1 to B8), (2) using the function HYPGEOMDIST to compute Pr(0) = HYPGEOMDIST(B1, 25, 7, 60) and placing it in C1, and then (3) dragging the cursor down column C to compute the remaining hypergeometric probabilities. See the Companion Website for more details on the use of the HYPGEOMDIST function.

The collection of tables and their associated probabilities based on the hypergeometric distribution in Equation 10.8 are given in Table 10.12. The question now is: What should be done with these probabilities to evaluate the significance of the results? The answer depends on whether a one-sided or a two-sided alternative is being used. In general, the following method can be used.

To test the hypothesis $H_0: p_1 = p_2$ vs. $H_1: p_1 \ne p_2$, where the expected value of at least one cell is < 5 when the data are analyzed in the form of a 2 × 2 contingency table, use the following procedure:
(1) Enumerate all possible tables with the same row and column margins as the observed table, as shown in Equation 10.10.
(2) Compute the exact probability of each table enumerated in step 1, using either the computer or the formula in Equation 10.7.
(3) Suppose the observed table is the a table and the last table enumerated is the k table.
(a) To test the hypothesis $H_0: p_1 = p_2$ vs. $H_1: p_1 \ne p_2$, the p-value is
$$p = 2 \times \min\left[\Pr(0) + \Pr(1) + \cdots + \Pr(a),\ \Pr(a) + \Pr(a+1) + \cdots + \Pr(k),\ .5\right]$$
(b) To test the hypothesis $H_0: p_1 = p_2$ vs. $H_1: p_1 < p_2$, the p-value is $p = \Pr(0) + \Pr(1) + \cdots + \Pr(a)$.
(c) To test the hypothesis $H_0: p_1 = p_2$ vs. $H_1: p_1 > p_2$, the p-value is $p = \Pr(a) + \Pr(a+1) + \cdots + \Pr(k)$.

For each of these three alternative hypotheses, the p-value can be interpreted as the probability of obtaining a table as extreme as or more extreme than the observed table.

Evaluate the statistical significance of the data in Example 10.17 using a two-sided alternative.

We want to test the hypothesis $H_0: p_1 = p_2$ vs. $H_1: p_1 \ne p_2$. Our table is the 2 table, whose probability is .252 in Table 10.12. Thus, to compute the p-value, the smaller of the tail probabilities corresponding to the 2 table is computed and doubled. This strategy corresponds to the procedures for the various normal-theory tests studied in Chapters 7 and 8. Here and below, totals are computed from unrounded probabilities, so they may differ in the last digit from the sum of the rounded terms. First compute the left-hand tail area,

$$\Pr(0) + \Pr(1) + \Pr(2) = .017 + .105 + .252 = .375$$

and the right-hand tail area,

$$\Pr(2) + \Pr(3) + \cdots + \Pr(7) = .252 + .312 + .214 + .082 + .016 + .001 = .878$$

Then

$$p = 2 \times \min(.375, .878, .5) = 2(.375) = .749$$

If a one-sided alternative of the form $H_0: p_1 = p_2$ vs. $H_1: p_1 < p_2$ is used, then the p-value equals

$$\Pr(0) + \Pr(1) + \Pr(2) = .017 + .105 + .252 = .375$$

Thus the two proportions in this example are not significantly different with either a one-sided or two-sided test, and we cannot say, on the basis of this limited amount of data, that there is a significant association between salt intake and cause of death.

In most instances, computer programs are used to implement Fisher's exact test using statistical packages such as SAS. There are other possible approaches to significance testing in the two-sided case. For example, the approach used by SAS is to compute

$$p\text{-value (two-tailed)} = \sum_{\{X:\ \Pr(X) \le \Pr(a)\}} \Pr(X)$$

In other words, the two-tailed p-value using SAS is the sum of the probabilities of all tables whose probabilities are ≤ the probability of the observed table. Using this approach, the two-tailed p-value would be

$$p\text{-value (two-tailed)} = \Pr(0) + \Pr(1) + \Pr(2) + \Pr(4) + \Pr(5) + \Pr(6) + \Pr(7) = .017 + .105 + .252 + .214 + .082 + .016 + .001 = .688$$

In this section, we learned about Fisher's exact test, which is used for comparing binomial proportions from two independent samples in 2 × 2 tables with small expected counts (< 5).
This is the two-sample analog to the exact one-sample binomial test given in Equation 7.44. If we refer to the flowchart at the end of this chapter (Figure 10.16, p. 409), we answer yes to (1) are samples independent? and no to (2) are all expected values ≥ 5? This leads us to the box labeled "Use Fisher's exact test."
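The p-value rules in step 3 of the Fisher's exact test procedure and the SAS-style two-sided rule can be compared directly in a short Python sketch. It is an illustration under stated assumptions, not SAS's implementation: `hypergeom_pmf` and `fisher_exact_2x2` are hypothetical names, and the table is assumed to be oriented already so that $p_1$ is the proportion of row-1 units falling in column 1.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, n1, m1, n):
    """Pr(X = x) for the (1, 1) cell (Equation 10.8), given row-1 total n1,
    column-1 total m1, and grand total n; returned as an exact Fraction."""
    return Fraction(comb(m1, x) * comb(n - m1, n1 - x), comb(n, n1))


def fisher_exact_2x2(a, b, c, d):
    """p-values for Fisher's exact test on [[a, b], [c, d]]; the observed
    table is the "a" table."""
    n1, m1, n = a + b, a + c, a + b + c + d
    support = range(max(0, n1 + m1 - n), min(n1, m1) + 1)
    probs = {x: hypergeom_pmf(x, n1, m1, n) for x in support}
    lower = sum(p for x, p in probs.items() if x <= a)   # step 3(b): H1 p1 < p2
    upper = sum(p for x, p in probs.items() if x >= a)   # step 3(c): H1 p1 > p2
    # Step 3(a): double the smaller tail; the .5 cap keeps p <= 1
    doubled = 2 * min(lower, upper, Fraction(1, 2))
    # SAS-style rule: add every table no more probable than the observed one;
    # exact Fractions keep the observed table's tie with itself exact
    sas = sum(p for p in probs.values() if p <= probs[a])
    return {"lower": float(lower), "upper": float(upper),
            "two_sided_doubled": float(doubled), "two_sided_sas": float(sas)}


if __name__ == "__main__":
    print(fisher_exact_2x2(2, 23, 5, 30))   # Table 10.9
```

For Table 10.9 it gives a left-hand tail of about .375, a right-hand tail of about .878, a doubled two-sided p of about .749, and a SAS-style two-sided p of about .688, as in the text. Library routines also exist (for example, SciPy's `scipy.stats.fisher_exact`); check their documentation for which two-sided convention they use.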

disease–exposure relationships in a hypothesis-testing framework using the Mantel-Haenszel test. Finally, standardization can be based on stratification by factors other than age. For example, standardization by both age and sex is common. Similar methods can be used to obtain age–sex standardized risks and standardized RRs as given in Definition 13.15.

In this section, we have introduced the concept of a confounding variable (C), a variable related to both the disease (D) and exposure (E) variables. Furthermore, we classified confounding variables as positive confounders if the associations between C and D and between C and E, respectively, are in the same direction and as negative confounders if the associations between C and D and between C and E are in opposite directions. We also discussed when it is or is not appropriate to control for a confounder, according to whether C is or is not in the causal pathway between E and D. Finally, because age is often an important confounding variable, it is reasonable to consider descriptive measures of proportions and relative risks that control for age. Age-standardized proportions and RRs are such measures.

A 1985 study identified a group of 518 cancer cases ages 15–59 and a group of 518 age- and sex-matched controls by mail questionnaire [4]. The main purpose of the study was to look at the effect of passive smoking on cancer risk. The study defined passive smoking as exposure to the cigarette smoke of a spouse who smoked at least one cigarette per day for at least 6 months. One potential confounding variable was smoking by the participants themselves (i.e., personal smoking) because personal smoking is related to both cancer risk and spouse smoking. Therefore, it was important to control for personal smoking before looking at the relationship between passive smoking and cancer risk.

To display the data, a 2 × 2 table relating case–control status to passive smoking can be constructed for both nonsmokers and smokers. The data are given in Table 13.11 for nonsmokers and Table 13.12 for smokers.

The passive-smoking effect can be assessed separately for nonsmokers and smokers. Indeed, we notice from Tables 13.11 and 13.12 that the OR in favor of a case being exposed to cigarette smoke from a spouse who smokes vs. a control is (120 × 155)/(80 × 111) = 2.1 for nonsmokers, whereas the corresponding OR for smokers is (161 × 124)/(130 × 117) = 1.3. Thus for both subgroups the trend is in the direction of more passive smoking among cases than among controls. The key question is how to combine the results from the two tables to obtain an overall estimated OR and test of significance for the passive-smoking effect.

In general, the data are stratified into k subgroups according to one or more confounding variables to make the units within a stratum as homogeneous as possible. The data for each stratum consist of a 2 × 2 contingency table relating exposure to disease, as shown in Table 13.13 for the ith stratum. Based on our work on Fisher's exact test, the distribution of $a_i$ follows a hypergeometric distribution. The test procedure is based on a comparison of the observed number of units in the (1, 1) cell of each stratum (denoted by $O_i = a_i$) with the
expected number of units in that cell (denoted by $E_i$). The test procedure is the same regardless of the order of the rows and columns; that is, which row (or column) is designated as the first row (or column) is arbitrary. Based on the hypergeometric distribution (Equation 10.9), the expected number of units in the (1, 1) cell of the ith stratum is given by

$$E_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$$

The observed and expected numbers of units in the (1, 1) cell are then summed over all strata, yielding $O = \sum_{i=1}^{k} O_i$ and $E = \sum_{i=1}^{k} E_i$, and the test is based on $O - E$. Based on the hypergeometric distribution (Equation 10.9), the variance of $O_i$ is given by

$$V_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$$

Furthermore, the variance of $O$ is denoted by $V = \sum_{i=1}^{k} V_i$. The test statistic is given by $X^2_{MH} = (|O - E| - .5)^2 / V$, which should follow a chi-square distribution with 1 degree of freedom (df) under the null hypothesis of no association between disease and exposure. $H_0$ is rejected if $X^2_{MH}$ is large. The abbreviation MH refers to Mantel-Haenszel; this procedure is known as the Mantel-Haenszel test and is summarized as follows.

To assess the association between a dichotomous disease and a dichotomous exposure variable after controlling for one or more confounding variables, use the following procedure:
(1) Form k strata, based on the level of the confounding variable(s), and construct a 2 × 2 table relating disease and exposure within each stratum, as shown in Table 13.13.
(2) Compute the total observed number of units (O) in the (1, 1) cell over all strata, where
$$O = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} a_i$$
(3) Compute the total expected number of units (E) in the (1, 1) cell over all strata, where
$$E = \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(a_i + c_i)}{n_i}$$
(4) Compute the variance (V) of O under $H_0$, where
$$V = \sum_{i=1}^{k} V_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$$
(5) The test statistic is then given by
$$X^2_{MH} = \frac{(|O - E| - .5)^2}{V}$$
which under $H_0$ follows a chi-square distribution with 1 df.
(6) For a two-sided test with significance level α, if $X^2_{MH} > \chi^2_{1,1-\alpha}$, then reject $H_0$; if $X^2_{MH} \le \chi^2_{1,1-\alpha}$, then accept $H_0$.
(7) The exact p-value for this test is given by $p = \Pr(\chi^2_1 > X^2_{MH})$.
(8) Use this test only if the variance V is ≥ 5.
(9) Which row or column is designated as first is arbitrary. The test statistic $X^2_{MH}$ and the assessment of significance are the same regardless of the order of the rows and columns.

The acceptance and rejection regions for the Mantel-Haenszel test are shown in Figure 13.1. The computation of the p-value for the Mantel-Haenszel test is illustrated in Figure 13.2.

[Figure 13.1: Acceptance and rejection regions for the Mantel-Haenszel test. Frequency of the $\chi^2_1$ distribution of $X^2_{MH} = (|O - E| - .5)^2/V$; acceptance region $X^2_{MH} \le \chi^2_{1,1-\alpha}$, rejection region $X^2_{MH} > \chi^2_{1,1-\alpha}$.]
[Figure 13.2: Computation of the p-value for the Mantel-Haenszel test; p is the area under the $\chi^2_1$ distribution to the right of $X^2_{MH}$.]
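Steps (2)–(8) translate almost line for line into code. The Python sketch below (hypothetical function names; not from the text) applies them to the passive-smoking strata. The cell counts are read off the odds-ratio and expected-value calculations for Tables 13.11 and 13.12, assuming rows are case/control and columns are passive smoker yes/no. The chi-square tail area with 1 df uses the identity $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, so no statistics library is needed.

```python
import math


def mantel_haenszel_test(strata):
    """Mantel-Haenszel test, steps 2-8. strata is a list of (a, b, c, d)
    tuples, one 2 x 2 table per stratum in the layout of Table 13.13."""
    O = E = V = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        O += a                                          # step 2: observed (1, 1) count
        E += (a + b) * (a + c) / n                      # step 3: expected (1, 1) count
        V += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # step 4
    if V < 5:                                           # step 8: validity rule
        raise ValueError(f"V = {V:.2f} < 5; the test is not recommended")
    x2 = (abs(O - E) - 0.5) ** 2 / V                    # step 5: continuity-corrected
    # Step 7: Pr(chi-square_1 > x2) = Pr(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))
    p = math.erfc(math.sqrt(x2 / 2))
    return O, E, V, x2, p


def odds_ratio(a, b, c, d):
    """Single-stratum odds ratio ad / bc."""
    return (a * d) / (b * c)


if __name__ == "__main__":
    nonsmokers = (120, 111, 80, 155)   # assumed layout of Table 13.11
    smokers = (161, 117, 130, 124)     # assumed layout of Table 13.12
    print(odds_ratio(*nonsmokers), odds_ratio(*smokers))
    print(mantel_haenszel_test([nonsmokers, smokers]))
```

It returns O = 281, E ≈ 251.2, V ≈ 61.55, and $X^2_{MH}$ ≈ 13.94 with p < .001, agreeing with the worked example that follows; the per-stratum odds ratios are about 2.1 and 1.3.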

Assess the relationship between passive smoking and cancer risk using the data stratified by personal smoking status in Tables 13.11 and 13.12.

Denote the nonsmokers as stratum 1 and the smokers as stratum 2.
$O_1$ = observed number of nonsmoking cases who are passive smokers = 120
$O_2$ = observed number of smoking cases who are passive smokers = 161
Furthermore,

$$E_1 = \frac{231 \times 200}{466} = 99.1, \qquad E_2 = \frac{278 \times 291}{532} = 152.1$$

Thus the total observed and expected numbers of cases who are passive smokers are, respectively,

$$O = O_1 + O_2 = 120 + 161 = 281, \qquad E = E_1 + E_2 = 99.1 + 152.1 = 251.2$$

Therefore, more cases are passive smokers than would be expected based on their personal smoking habits. Now compute the variance to assess whether this difference is statistically significant.

$$V_1 = \frac{231 \times 235 \times 200 \times 266}{466^2 \times 465} = 28.60, \qquad V_2 = \frac{278 \times 254 \times 291 \times 241}{532^2 \times 531} = 32.95$$

Therefore $V = V_1 + V_2 = 28.60 + 32.95 = 61.55$. Thus the test statistic $X^2_{MH}$ is given by (the numerator uses the unrounded value of E)

$$X^2_{MH} = \frac{(|281 - 251.2| - .5)^2}{61.55} = \frac{858.17}{61.55} = 13.94 \sim \chi^2_1 \text{ under } H_0$$

Because $\chi^2_{1,.999} = 10.83 < 13.94 = X^2_{MH}$, it follows that p < .001. Thus there is a highly significant positive association between case–control status and passive-smoking exposure, even after controlling for personal cigarette-smoking habit.

The Mantel-Haenszel method tests the significance of the relationship between disease and exposure. However, it does not measure the strength of the association. Ideally, we would like a measure similar to the OR presented for a single 2 × 2 contingency table in Definition 13.6. Assuming that the underlying OR is the same for each stratum, an estimate of the common underlying OR is provided by the Mantel-Haenszel estimator as follows.

In a collection of k 2 × 2 contingency tables, where the table corresponding to the ith stratum is denoted as in Table 13.13, the Mantel-Haenszel estimator of the common OR is given by