Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.


yes to (3) two-sample problem? no to (4) underlying distribution normal or can central-limit theorem be assumed to hold? and yes to (5) underlying distribution binomial? We now refer to the flowchart at the end of this chapter (p. 409). We answer yes to (1) are samples independent? (2) are all expected values ≥ 5? and (3) 2 × 2 contingency table? This leads us to the box labeled "Use the two-sample test for binomial proportions or 2 × 2 contingency-table methods if no confounding is present, or Mantel-Haenszel test if confounding is present." In brief, a confounder is another variable that is potentially related to both the row and column classification variables, and it must be controlled for. We discuss methods for controlling for confounding in Chapter 13. In this chapter, we assume no confounding is present. Thus we use either the two-sample test for binomial proportions (Equation 10.3) or the equivalent chi-square test for 2 × 2 contingency tables (Equation 10.5).

In Section 10.2, we discussed methods for comparing two binomial proportions using either normal-theory or contingency-table methods. Both methods yield identical p-values. However, they require that the normal approximation to the binomial distribution be valid, which is not always the case, especially for small samples.

Suppose we want to investigate the relationship between high salt intake and death from cardiovascular disease (CVD). Groups of high- and low-salt users could be identified and followed over a long time to compare relative frequency of death from CVD in the two groups. In contrast, a much less expensive study would involve looking at death records, separating CVD deaths from non-CVD deaths, asking a close relative (such as a spouse) about the dietary habits of the deceased, and then comparing salt intake between people who died of CVD vs. people who died of other causes. The latter type of study, a retrospective study, may be impossible to perform for a number of reasons. But if it is possible, it is almost always less expensive than the former type, a prospective study.
Suppose a retrospective study is done among men ages 50–54 in a specific county who died over a 1-month period. The investigators try to include approximately an equal number of men who died from CVD (the cases) and men who died from other causes (the controls). Of 35 people who died from CVD, 5 were on a high-salt diet before they died, whereas of 25 people who died from other causes, 2 were on such a diet. These data, presented in Table 10.9, are in the form of a 2 × 2 contingency table, so the methods of Section 10.2 may be applicable. However, the expected values of this table are too small for such methods to be valid. Indeed,

$$E_{11} = \frac{7(25)}{60} = 2.92 \qquad E_{21} = \frac{7(35)}{60} = 4.08$$

thus two of the four cells have expected values less than 5. How should the possible association between cause of death and type of diet be assessed?

In this case, Fisher's exact test can be used. This procedure gives exact levels of significance for any 2 × 2 table but is only necessary for tables with small expected values, tables in which the standard chi-square test as given in Equation 10.5 is not applicable. For tables in which use of the chi-square test is appropriate, the two tests give very similar results.

Suppose the probability that a man was on a high-salt diet given that his cause of death was noncardiovascular (non-CVD) = $p_1$ and the probability that a man was on a high-salt diet given that his cause of death was cardiovascular (CVD) = $p_2$. We wish to test the hypothesis $H_0$: $p_1 = p_2 = p$ vs. $H_1$: $p_1 \neq p_2$. Table 10.10 gives the general layout of the data. For mathematical convenience, we assume the margins of this table are fixed; that is, the numbers of non-CVD deaths and CVD deaths are fixed at $a + b$ and $c + d$, respectively, and the numbers of people on high- and low-salt diets are fixed at $a + c$ and $b + d$, respectively. Indeed, it is difficult to compute exact probabilities unless one assumes fixed margins. The exact probability of observing the table with cells a, b, c, d is as follows.

$$\Pr(a, b, c, d) = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$

The formula in Equation 10.7 is easy to remember because the numerator is the product of the factorials of each of the row and column margins, and the denominator is the product of the factorial of the grand total and the factorials of the individual cells.

Suppose we have the 2 × 2 table shown in Table 10.11. Compute the exact probability of obtaining this table assuming the margins are fixed.

$$\Pr(2, 5, 3, 1) = \frac{7!\,4!\,5!\,6!}{11!\,2!\,5!\,3!\,1!} = \frac{5040(24)(120)(720)}{39{,}916{,}800(2)(120)(6)} = \frac{1.0450944 \times 10^{10}}{5.748019 \times 10^{10}} = .182$$

Suppose we consider all possible tables with fixed row margins denoted by $N_1$ and $N_2$ and fixed column margins denoted by $M_1$ and $M_2$. We assume the rows and columns have been rearranged so that $M_1 \le M_2$ and $N_1 \le N_2$. We refer to each table by its (1, 1) cell because all other cells are then determined from the fixed row and column margins. Let the random variable X denote the cell count in the (1, 1) cell. The probability distribution of X is given by

$$\Pr(X = a) = \frac{N_1!\,N_2!\,M_1!\,M_2!}{N!\,a!\,(N_1 - a)!\,(M_1 - a)!\,(M_2 - N_1 + a)!}, \qquad a = 0, 1, \ldots, \min(M_1, N_1)$$

and $N = N_1 + N_2 = M_1 + M_2$. This probability distribution is called the hypergeometric distribution.

It will be useful for our subsequent work on combining evidence from more than one 2 × 2 table in Chapter 13 to refer to the expected value and variance of the hypergeometric distribution. These are as follows.

Suppose we consider all possible tables with fixed row margins $N_1$, $N_2$ and fixed column margins $M_1$, $M_2$, where $N_1 \le N_2$, $M_1 \le M_2$, and $N = N_1 + N_2 = M_1 + M_2$. Let the random variable X denote the cell count in the (1, 1) cell. The expected value and variance of X are

$$E(X) = \frac{M_1 N_1}{N} \qquad Var(X) = \frac{M_1 M_2 N_1 N_2}{N^2 (N - 1)}$$

Thus the exact probability of obtaining a table with cells a, b, c, d in Equation 10.7 is a special case of the hypergeometric distribution, where $N_1 = a + b$, $N_2 = c + d$, $M_1 = a + c$, $M_2 = b + d$, and $N = a + b + c + d$. We can evaluate this probability by calculator using Equation 10.7, or we can use the HYPGEOMDIST function of Excel. In the latter case, to evaluate Pr(a, b, c, d), we specify HYPGEOMDIST(a, a + b, a + c, N). In words, the hypergeometric distribution evaluates the probability of obtaining a successes out of a sample of a + b observations, given that the total population (in this case, the two samples combined) is of size N, of which a + c observations are successes. Thus, to evaluate the exact probability in Table 10.11, we specify HYPGEOMDIST(2, 7, 5, 11) = .182, which is the probability of obtaining two successes in a sample of 7 observations given that the total population consists of 11 observations, of which 5 are successes.

The hypergeometric distribution differs from the binomial distribution, because in the latter case, we simply evaluate the probability of obtaining a successes out of a + b observations, assuming that each outcome is independent. For the hypergeometric distribution, the outcomes are not independent because once a success occurs it is less likely that another observation will be a success, as the total number of successes is fixed (at a + c). If N is large, the two distributions are very similar because there is only a slight deviation from independence for the hypergeometric.

The basic strategy in testing the hypothesis $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 \neq p_2$ will be to enumerate all possible tables with the same margins as the observed table and to compute the exact probability for each such table based on the hypergeometric distribution. A method for accomplishing this is as follows.
(1) Rearrange the rows and columns of the observed table so the smaller row total is in the first row and the smaller column total is in the first column. Suppose that after the rearrangement, the cells in the observed table are a, b, c, d, as shown in Table 10.10.
(2) Start with the table with 0 in the (1, 1) cell. The other cells in this table are then determined from the row and column margins. Indeed, to maintain the same row and column margins as the observed table, the (1, 2) element must be $a + b$, the (2, 1) cell must be $a + c$, and the (2, 2) element must be $(c + d) - (a + c) = d - a$.
(3) Construct the next table by increasing the (1, 1) cell by 1 (i.e., from 0 to 1), decreasing the (1, 2) and (2, 1) cells by 1, and increasing the (2, 2) cell by 1.
(4) Continue increasing and decreasing the cells by 1, as in step 3, until one of the cells is 0, at which point all possible tables with the given row and column margins have been enumerated. Each table in the sequence of tables is referred to by its (1, 1) element. Thus, the first table is the 0 table, the next table is the 1 table, and so on.

Enumerate all possible tables with the same row and column margins as the observed data in Table 10.9.
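The enumeration in steps (1) through (4), together with the factorial formula of Equation 10.7, can be sketched in Python. This is a minimal illustration, not part of the original text: the function names `table_probability` and `enumerate_tables` are hypothetical, and the code assumes the rows and columns have already been rearranged so that the smaller row total and smaller column total come first.

```python
from math import factorial


def table_probability(a, b, c, d):
    """Exact probability of a 2x2 table with fixed margins (Equation 10.7)."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


def enumerate_tables(row1, row2, col1):
    """List every (a, b, c, d) table with the given margins.

    Assumes row1 <= row2 and col1 <= col2 (step 1), so the (1, 1) cell
    runs from 0 up to min(row1, col1).
    """
    tables = []
    for a in range(min(row1, col1) + 1):
        b = row1 - a   # (1, 2) cell: first-row total minus a
        c = col1 - a   # (2, 1) cell: first-column total minus a
        d = row2 - c   # (2, 2) cell: second-row total minus c
        tables.append((a, b, c, d))
    return tables


# Table 10.11 has cells 2, 5, 3, 1.
print(round(table_probability(2, 5, 3, 1), 3))  # 0.182

# Margins of the salt data: row totals 25 and 35, first-column total 7.
for table in enumerate_tables(25, 35, 7):
    print(table, round(table_probability(*table), 3))
```

Python integers have arbitrary precision, so the factorials of the grand total do not overflow; for very large tables a log-factorial formulation would be a reasonable alternative.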

The observed table has a = 2, b = 23, c = 5, d = 30. The rows or columns do not need to be rearranged because the first row total is smaller than the second row total, and the first column total is smaller than the second column total. Start with the 0 table, which has 0 in the (1, 1) cell, 25 in the (1, 2) cell, 7 in the (2, 1) cell, and 30 − 2, or 28, in the (2, 2) cell. The 1 table then has 1 in the (1, 1) cell, 25 − 1 = 24 in the (1, 2) cell, 7 − 1 = 6 in the (2, 1) cell, and 28 + 1 = 29 in the (2, 2) cell. Continue in this fashion until the 7 table is reached, which has 0 in the (2, 1) cell, at which point all possible tables with the given row and column margins have been enumerated.

The set of hypergeometric probabilities in Table 10.12 can be easily evaluated using the recursive properties of Excel by (1) setting up a column with consecutive values from 0 to 7 (say from B1 to B8), (2) using the function HYPGEOMDIST to compute Pr(0) = HYPGEOMDIST(B1, 25, 7, 60) and placing it in C1, and then (3) dragging the cursor down column C to compute the remaining hypergeometric probabilities. See the Companion Website for more details on the use of the HYPGEOMDIST function. The collection of tables and their associated probabilities based on the hypergeometric distribution in Equation 10.8 are given in Table 10.12.

The question now is: What should be done with these probabilities to evaluate the significance of the results? The answer depends on whether a one-sided or a two-sided alternative is being used. In general, the following method can be used.

To test the hypothesis $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 \neq p_2$, where the expected value of at least one cell is < 5 when the data are analyzed in the form of a 2 × 2 contingency table, use the following procedure:
(1) Enumerate all possible tables with the same row and column margins as the observed table, as shown in Equation 10.10.
(2) Compute the exact probability of each table enumerated in step 1, using either the computer or the formula in Equation 10.7.
(3) Suppose the observed table is the a table and the last table enumerated is the k table.
(a) To test the hypothesis $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 \neq p_2$, the p-value $= 2 \times \min[\Pr(0) + \Pr(1) + \cdots + \Pr(a),\ \Pr(a) + \Pr(a+1) + \cdots + \Pr(k),\ .5]$.
(b) To test the hypothesis $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 < p_2$, the p-value $= \Pr(0) + \Pr(1) + \cdots + \Pr(a)$.

(c) To test the hypothesis $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 > p_2$, the p-value $= \Pr(a) + \Pr(a+1) + \cdots + \Pr(k)$.
For each of these three alternative hypotheses, the p-value can be interpreted as the probability of obtaining a table as extreme as or more extreme than the observed table.

Evaluate the statistical significance of the data in Example 10.17 using a two-sided alternative.

We want to test the hypothesis $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 \neq p_2$. Our table is the 2 table whose probability is .252 in Table 10.12. Thus, to compute the p-value, the smaller of the tail probabilities corresponding to the 2 table is computed and doubled. This strategy corresponds to the procedures for the various normal-theory tests studied in Chapters 7 and 8. First compute the left-hand tail area,

$$\Pr(0) + \Pr(1) + \Pr(2) = .017 + .105 + .252 = .375$$

and the right-hand tail area,

$$\Pr(2) + \Pr(3) + \cdots + \Pr(7) = .252 + .312 + .214 + .082 + .016 + .001 = .878$$

Then $p = 2 \times \min(.375, .878, .5) = 2(.375) = .749$.

If a one-sided alternative of the form $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 < p_2$ is used, then the p-value equals

$$\Pr(0) + \Pr(1) + \Pr(2) = .017 + .105 + .252 = .375$$

Thus the two proportions in this example are not significantly different with either a one-sided or two-sided test, and we cannot say, on the basis of this limited amount of data, that there is a significant association between salt intake and cause of death.

In most instances, computer programs are used to implement Fisher's exact test using statistical packages such as SAS. There are other possible approaches to significance testing in the two-sided case. For example, the approach used by SAS is to compute

$$p\text{-value (two-tailed)} = \sum_{\{i:\ \Pr(i) \le \Pr(a)\}} \Pr(i)$$

In other words, the two-tailed p-value using SAS is the sum of the probabilities of all tables whose probabilities are ≤ the probability of the observed table. Using this approach, the two-tailed p-value would be

$$p\text{-value (two-tailed)} = \Pr(0) + \Pr(1) + \Pr(2) + \Pr(4) + \Pr(5) + \Pr(6) + \Pr(7) = .017 + .105 + .252 + .214 + .082 + .016 + .001 = .688$$

In this section, we learned about Fisher's exact test, which is used for comparing binomial proportions from two independent samples in 2 × 2 tables with small expected counts (< 5).
This is the two-sample analog to the exact one-sample binomial test given in Equation 7.44. If we refer to the flowchart at the end of this chapter (Figure 10.16, p. 409), we answer yes to (1) are samples independent? and no to (2) are all expected values ≥ 5? This leads us to the box labeled "Use Fisher's exact test."
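The p-value rules of this section can be checked with a short Python sketch. It is an illustration under stated assumptions, not the textbook's or SAS's implementation: the function names and dictionary keys are hypothetical, `math.comb` requires Python 3.8 or later, and the observed table is assumed to be arranged with the smaller row and column totals first. The argument order of `hypergeom_pmf` mirrors the HYPGEOMDIST call described in the text.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Pr(X = a) for the (1, 1) cell, given first-row total row1,
    first-column total col1, and grand total n."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact(a, b, c, d):
    """One- and two-sided p-values for an observed 2x2 table (a, b, c, d)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    k = min(row1, col1)  # largest possible (1, 1) cell
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(k + 1)]

    left = sum(probs[: a + 1])   # Pr(0) + ... + Pr(a)
    right = sum(probs[a:])       # Pr(a) + ... + Pr(k)
    # Alternative two-sided rule: add up every table that is no more
    # probable than the observed one (small tolerance guards against ties).
    cutoff = probs[a] * (1 + 1e-7)
    by_probability = sum(p for p in probs if p <= cutoff)

    return {
        "less": left,                                # H1: p1 < p2
        "greater": right,                            # H1: p1 > p2
        "two_sided_doubled": 2 * min(left, right, 0.5),
        "two_sided_by_probability": by_probability,
    }


# Salt-intake data: a = 2, b = 23, c = 5, d = 30.
result = fisher_exact(2, 23, 5, 30)
print({name: round(value, 3) for name, value in result.items()})
# {'less': 0.375, 'greater': 0.878, 'two_sided_doubled': 0.749,
#  'two_sided_by_probability': 0.688}
```

The tail sums are computed from unrounded probabilities, which is why the left-hand tail prints as .375 even though the three rounded terms .017, .105, and .252 add to .374.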

disease-exposure relationships in a hypothesis-testing framework using the Mantel-Haenszel test. Finally, standardization can be based on stratification by factors other than age. For example, standardization by both age and sex is common. Similar methods can be used to obtain age-sex standardized risks and standardized RRs as given in Definition 13.15.

In this section, we have introduced the concept of a confounding variable (C), a variable related to both the disease (D) and exposure (E) variables. Furthermore, we classified confounding variables as positive confounders if the associations between C and D and C and E, respectively, are in the same direction and as negative confounders if the associations between C and D and C and E are in opposite directions. We also discussed when it is or is not appropriate to control for a confounder, according to whether C is or is not in the causal pathway between E and D. Finally, because age is often an important confounding variable, it is reasonable to consider descriptive measures of proportions and relative risk that control for age. Age-standardized proportions and RRs are such measures.

A 1985 study identified a group of 518 cancer cases ages 15–59 and a group of 518 age- and sex-matched controls by mail questionnaire [4]. The main purpose of the study was to look at the effect of passive smoking on cancer risk. The study defined passive smoking as exposure to the cigarette smoke of a spouse who smoked at least one cigarette per day for at least 6 months. One potential confounding variable was smoking by the participants themselves (i.e., personal smoking) because personal smoking is related to both cancer risk and spouse smoking. Therefore, it was important to control for personal smoking before looking at the relationship between passive smoking and cancer risk.

To display the data, a 2 × 2 table relating case-control status to passive smoking can be constructed for both nonsmokers and smokers. The data are given in Table 13.11 for nonsmokers and Table 13.12 for smokers. The passive-smoking effect can be assessed separately for nonsmokers and smokers. Indeed, we notice from Tables 13.11 and 13.12 that the OR in favor of a case being exposed to cigarette smoke from a spouse who smokes vs. a control is (120 × 155)/(80 × 111) = 2.1 for nonsmokers, whereas the corresponding OR for smokers is (161 × 124)/(130 × 117) = 1.3. Thus for both subgroups the trend is in the direction of more passive smoking among cases than among controls. The key question is how to combine the results from the two tables to obtain an overall estimated OR and test of significance for the passive-smoking effect.

In general, the data are stratified into k subgroups according to one or more confounding variables to make the units within a stratum as homogeneous as possible. The data for each stratum consist of a 2 × 2 contingency table relating exposure to disease, as shown in Table 13.13 for the ith stratum. Based on our work on Fisher's exact test, the distribution of $a_i$ follows a hypergeometric distribution. The test procedure is based on a comparison of the observed number of units in the (1, 1) cell of each stratum (denoted by $O_i = a_i$) with the expected number of units in that cell (denoted by $E_i$). The test procedure is the same regardless of order of the rows and columns; that is, which row (or column) is designated as the first row (or column) is arbitrary. Based on the hypergeometric distribution (Equation 10.9), the expected number of units in the (1, 1) cell of the ith stratum is given by

$$E_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$$

The observed and expected numbers of units in the (1, 1) cell are then summed over all strata, yielding $O = \sum_{i=1}^{k} O_i$, $E = \sum_{i=1}^{k} E_i$, and the test is based on $O - E$. Based on the hypergeometric distribution (Equation 10.9), the variance of $O_i$ is given by

$$V_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$$

Furthermore, the variance of O is denoted by $V = \sum_{i=1}^{k} V_i$. The test statistic is given by $X^2_{MH} = (|O - E| - .5)^2 / V$, which should follow a chi-square distribution with 1 degree of freedom (df) under the null hypothesis of no association between disease and exposure. $H_0$ is rejected if $X^2_{MH}$ is large. The abbreviation MH refers to Mantel-Haenszel; this procedure is known as the Mantel-Haenszel test and is summarized as follows.

To assess the association between a dichotomous disease and a dichotomous exposure variable after controlling for one or more confounding variables, use the following procedure:
(1) Form k strata, based on the level of the confounding variable(s), and construct a 2 × 2 table relating disease and exposure within each stratum, as shown in Table 13.13.
(2) Compute the total observed number of units (O) in the (1, 1) cell over all strata, where $O = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} a_i$
(3) Compute the total expected number of units (E) in the (1, 1) cell over all strata, where $E = \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
(4) Compute the variance (V) of O under $H_0$, where $V = \sum_{i=1}^{k} V_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
(5) The test statistic is then given by $X^2_{MH} = \frac{(|O - E| - .5)^2}{V}$, which under $H_0$ follows a chi-square distribution with 1 df.

(6) For a two-sided test with significance level α, if $X^2_{MH} > \chi^2_{1,1-\alpha}$, then reject $H_0$. If $X^2_{MH} \le \chi^2_{1,1-\alpha}$, then accept $H_0$.
(7) The exact p-value for this test is given by $p = \Pr(\chi^2_1 > X^2_{MH})$.
(8) Use this test only if the variance V is ≥ 5.
(9) Which row or column is designated as first is arbitrary. The test statistic $X^2_{MH}$ and the assessment of significance are the same regardless of the order of the rows and columns.

The acceptance and rejection regions for the Mantel-Haenszel test are shown in Figure 13.1. The computation of the p-value for the Mantel-Haenszel test is illustrated in Figure 13.2.

[Figure 13.1: $\chi^2_1$ distribution (frequency vs. value) for $X^2_{MH} = (|O - E| - .5)^2 / V$. Acceptance region: $X^2_{MH} \le \chi^2_{1,1-\alpha}$. Rejection region: $X^2_{MH} > \chi^2_{1,1-\alpha}$.]

[Figure 13.2: $\chi^2_1$ distribution (frequency vs. value) for $X^2_{MH} = (|O - E| - .5)^2 / V$. The p-value is the area to the right of $X^2_{MH}$.]
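Steps (1) through (8) of the Mantel-Haenszel procedure can be sketched in Python as follows. This is a hedged illustration rather than a reference implementation: the function name `mantel_haenszel_test` is hypothetical, and the cell counts passed to it are reconstructed from the odds-ratio and margin arithmetic quoted in the text for Tables 13.11 and 13.12, so they are an assumption rather than a copy of those tables. The upper tail of the chi-square distribution with 1 df is computed through the standard normal identity $\Pr(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
from math import erfc, sqrt


def mantel_haenszel_test(strata):
    """Continuity-corrected Mantel-Haenszel chi-square test.

    strata: list of (a, b, c, d) cell counts, one 2x2 table per stratum.
    Returns (O, E, V, X2, p).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a                                   # O_i = a_i
        expected += (a + b) * (a + c) / n               # E_i
        variance += ((a + b) * (c + d) * (a + c) * (b + d)
                     / (n ** 2 * (n - 1)))              # V_i
    if variance < 5:
        # Step (8): the chi-square approximation is not recommended here.
        raise ValueError("variance V is below 5; the test should not be used")
    x2 = (abs(observed - expected) - 0.5) ** 2 / variance
    p_value = erfc(sqrt(x2 / 2))  # Pr(chi-square with 1 df > x2)
    return observed, expected, variance, x2, p_value


# Assumed cell counts (a, b, c, d): stratum 1 = nonsmokers, stratum 2 = smokers.
strata = [(120, 111, 80, 155), (161, 130, 117, 124)]
O, E, V, x2, p = mantel_haenszel_test(strata)
print(O, round(E, 1), round(V, 2), round(x2, 2), p < 0.001)
# 281.0 251.2 61.55 13.94 True
```

Raising an error when V < 5 is one possible design choice; a caller who prefers a warning could return the statistics together with a flag instead.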

Assess the relationship between passive smoking and cancer risk using the data stratified by personal smoking status in Tables 13.11 and 13.12.

Denote the nonsmokers as stratum 1 and the smokers as stratum 2.

$O_1$ = observed number of nonsmoking cases who are passive smokers = 120
$O_2$ = observed number of smoking cases who are passive smokers = 161

Furthermore,

$$E_1 = \frac{231 \times 200}{466} = 99.1 \qquad E_2 = \frac{278 \times 291}{532} = 152.1$$

Thus the total observed and expected numbers of cases who are passive smokers are, respectively,

$$O = O_1 + O_2 = 120 + 161 = 281$$
$$E = E_1 + E_2 = 99.1 + 152.1 = 251.2$$

Therefore, more cases are passive smokers than would be expected based on their personal smoking habits. Now compute the variance to assess whether this difference is statistically significant.

$$V_1 = \frac{231 \times 235 \times 200 \times 266}{466^2 \times 465} = 28.60$$
$$V_2 = \frac{278 \times 254 \times 291 \times 241}{532^2 \times 531} = 32.95$$

Therefore $V = V_1 + V_2 = 28.60 + 32.95 = 61.55$. Thus the test statistic $X^2_{MH}$ is given by

$$X^2_{MH} = \frac{(|281 - 251.2| - .5)^2}{61.55} = \frac{858.17}{61.55} = 13.94 \sim \chi^2_1 \text{ under } H_0$$

Because $\chi^2_{1,.999} = 10.83 < 13.94 = X^2_{MH}$, it follows that p < .001. Thus there is a highly significant positive association between case-control status and passive-smoking exposure, even after controlling for personal cigarette-smoking habit.

The Mantel-Haenszel method tests significance of the relationship between disease and exposure. However, it does not measure the strength of the association. Ideally, we would like a measure similar to the OR presented for a single 2 × 2 contingency table in Definition 13.6. Assuming that the underlying OR is the same for each stratum, an estimate of the common underlying OR is provided by the Mantel-Haenszel estimator as follows.

In a collection of k 2 × 2 contingency tables, where the table corresponding to the ith stratum is denoted as in Table 13.13, the Mantel-Haenszel estimator of the common OR is given by