Describing Contingency tables

Similar documents
STAC51: Categorical data Analysis

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

2 Describing Contingency Tables

Chapter 2: Describing Contingency Tables - II

Lecture 8: Summary Measures

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

Chapter 2: Describing Contingency Tables - I

Contingency Tables Part One 1

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Discrete Multivariate Statistics

Analysis of Categorical Data Three-Way Contingency Table

Optimal exact tests for complex alternative hypotheses on cross tabulated data

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann

Statistics 3858 : Contingency Tables

3 Way Tables Edpsy/Psych/Soc 589

STAT 705: Analysis of Contingency Tables

CDA Chapter 3 part II

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Advanced topics from statistics

Computational Systems Biology: Biology X

Relate Attributes and Counts

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

11-2 Multinomial Experiment

Department of Large Animal Sciences. Outline. Slide 2. Department of Large Animal Sciences. Slide 4. Department of Large Animal Sciences

Measures of Association and Variance Estimation

Poisson regression: Further topics

Correspondence Analysis

Sleep data, two drugs Ch13.xls

Log-linear Models for Contingency Tables

Categorical data analysis Chapter 5

Chapter 1. Modeling Basics

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

1 Comparing two binomials

Naïve Bayes classification

Minimal basis for connected Markov chain over 3 3 K contingency tables with fixed two-dimensional marginals. Satoshi AOKI and Akimichi TAKEMURA

Testing Independence

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Lecture 01: Introduction

Directional Control Schemes for Multivariate Categorical Processes

Ling 289 Contingency Table Statistics

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Test of Association between Two Ordinal Variables while Adjusting for Covariates

Solution to Tutorial 7

Describing Stratified Multiple Responses for Sparse Data

Module 10: Analysis of Categorical Data Statistics (OA3102)

Categorical Data Analysis Chapter 3

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Lab 8. Matched Case Control Studies

INTRODUCTION TO LOG-LINEAR MODELING

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

An introduction to biostatistics: part 1

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Probability Experiments, Trials, Outcomes, Sample Spaces Example 1 Example 2

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

Review of Statistics 101

8 Nominal and Ordinal Logistic Regression

Hypothesis Testing hypothesis testing approach

Categorical Variables and Contingency Tables: Description and Inference

Bayes methods for categorical data. April 25, 2017

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

Statistical Process Control for Multivariate Categorical Processes

Estimating the Number of Tables via Sequential Importance Sampling

An Overview of Methods in the Analysis of Dependent Ordered Categorical Data: Assumptions and Implications

Generalized logit models for nominal multinomial responses. Local odds ratios

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Binary response data

STAT 526 Spring Final Exam. Thursday May 5, 2011

Stat 642, Lecture notes for 04/12/05 96

Statistics in medicine

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Three-Way Contingency Tables

Chapter 5. Chapter 5 sections

Lecture Discussion. Confounding, Non-Collapsibility, Precision, and Power Statistics Statistical Methods II. Presented February 27, 2018

Glossary for the Triola Statistics Series

Logistic Regression Models for Multinomial and Ordinal Outcomes

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Stat 315c: Transposable Data Rasch model and friends

Comparison of Two Samples

Missing Covariate Data in Matched Case-Control Studies

Logistic regression modeling the probability of success

Ordinal Variables in 2 way Tables

Review of One-way Tables and SAS

36-720: The Rasch Model

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Statistics in medicine

Various Issues in Fitting Contingency Tables

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 5: Bivariate Correspondence Analysis

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

2 Inference for Multinomial Distribution

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

Generalized Linear Models

Sample Spaces, Random Variables

Transcription:

Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds ratios). 3. 2 2-tables with covariates (marginal and conditional independence). 4. I J tables (summary measures of association) Sections skipped: none Describing Contingency tables 2.1

Probability structure Let X and Y be two categorical variables, X with I categories and Y with J categories. Classifications of subjects on both variables have IJ combinations. The responses (X, Y ) of a random subject have a probability distribution. A rectangular table having I rows and J columns displays the distribution. The cells of the table represent the IJ outcomes. When the cells contain frequency counts, the table is called a contingency table or cross-classification table. Example: Solution Accuracy Correct Error Total Number of One 80 251 331 Classes Multiple 84 66 150 Total 164 317 481 Describing Contingency tables 2.2

Probability structure- cont. Let π ij denote the probability that (X, Y ) occurs in the cell in row i and column j. The probability distribution {π ij } is the joint distribution of X and Y. The marginal distribution are the row and column totals that result from summing the joint probabilities, denoted by {π i+ }. Probabilities sum to one: π i+ = i j π +j = i π ij = 1 j When X is a explanatory variable (a fixed variable) and Y a response, the notion of a joint distribution is no longer meaningful. Given that a subject is in row i of X, π j i denotes the probability of classification in column j of Y. The probabilities {π 1 i,..., π J i } form the conditional distribution of Y at category i of X. Note that π j i = 1 j Describing Contingency tables 2.3

Sensitivity & Specificity Example: Diagnosis/Test Positive Negative Total Disease Yes 0.82 0.18 1.0 No 0.01 0.99 1.0 Sensitivity of a test is the conditional probability that a diagnostic test is positive given that the subject has the disease. Specificity of a test is the conditional probability that a diagnostic test is negative given that the subject does not have the disease. Describing Contingency tables 2.4

Independence When both variables are response variables, descriptions of the association between the two can use the joint as well as the conditional distribution. In this case π j i = π ij /π i+. Two categorical variables are called independent if the joint probabilities equal the product of marginal distribution π ij = π i+ π +j or when the conditional distributions are equal for different rows π j 1 = π j 2 =..., π j I, for j = 1,..., J often called homogeneity of the conditional distributions. Describing Contingency tables 2.5

Sampling distributions The probability distributions introduced last week extend to contingency tables. 1. Poisson distribution treats cell counts as independent Poisson random variables with parameters µ ij. The joint probability mass function is then the product of the Poisson probabilities P (Y ij = n ij ). 2. When the total sample size is fixed, but not the row or column totals, the multinomial sampling model applies. 3. When the row (or column) totals are fixed, or when the X variable is explanatory and Y response each row follows a multinomial model. When the different rows are independent, the joint probability distribution for the entire data set is the product multinomial sampling or independent multinomial sampling. Describing Contingency tables 2.6

Types of study 1. Case-control studies use a retrospective design (look-into-the-past design). In this case Y is fixed and we should condition on π +j. 2. Clinical trials: Subjects are assigned to treatments. X is fixed. This is an experimental study (the others are observational studies) 3. Cohort studies: Subjects choose there own treatment and are followed over time 4. Cross-sectional design: subjects are classified simultaneously on both variables. Only the total sample size is fixed. Describing Contingency tables 2.7

Comparing two proportions Comparison of groups on a binary response variable, like success/failure, dead/alive,... We confine ourselves to two groups, i.e. the table is of size 2 2. The conditional probability on success π 1 i equals 1 π 2 i and π i (the probability on success in group i) will be used instead of π 1 i. 1 Difference of proportions: π = π 1 π 2 is a basic comparison of the rows. Possible values of π are 1 π 1 and when π = 0 the response is statistically independent of the rows. 2 Relative risk π 1 /π 2 can be any positive number. A relative risk of 1 indicates independence. This measure is useful in case a fixed difference of proportions is relatively more important for low or high π. Describing Contingency tables 2.8

Comparing two proportions- cont. 3 Odds ratio. For a probability of success π i, the odds are defined by Ω = π i /(1 π i ). The odds are nonnegative with a Ω > 1 indicating that success is more likely. The ratio of odds θ = Ω 1 Ω 2 = π 11/π 12 π 21 /π 22 sometimes called cross-product ratio. = π 11 π 22 π 12 π 21 Describing Contingency tables 2.9

Properties of the odds ratio 1. The condition Ω 1 = Ω 2 and thus θ = 1 refers to independence. When θ > 1 subjects in row 1 are more likely to have success than are subjects in row 2. 2. The odds ratio is multiplicatively symmetric around 1, i.e. a value of 4 indicates the same amount of association as a value of 1/4. The log of the odds ratio is symmetric around 0, i.e. a value of -4 refers to the same amount of association as a value of 4. The log odds ratio is also convenient for inference (next week). 3. The odds ratio does not change when an entire row is multiplied by a constant. 4. The odds ratio is equally valid for prospective, retrospective or crosssectional designs. 5. For cell counts n ij the sample odds ratio is ˆθ = (n 11 n 22 )/(n 12 n 21 ) Describing Contingency tables 2.10

Example Frequencies & Conditional proportions: Solution Accuracy Correct Error Total Number of One 80 251 331.24.76 1.0 Classes Multiple 84 66 150.56.44 1.0 Total 164 317 481 1. Difference of proportions π =.24.56 =.32 2. Relative risk.24/.56 =.43. The probability of success for those who did multiple classes is about two times as high as for those who followed one class. 3. Odds ratio 80 66 ˆθ = 84 251 =.25 The odds are 4 times as high in the multiple class group compared to single class group. Describing Contingency tables 2.11

Multiple 2 2-tables An important part of most studies is the choice of control variables. In studying the relationship between X and Y one should control for any covariate that can influence this relationship. (think about suppression and spurious correlation) We will study the relationship between X and Y while controlling for Z. This can be done by studying the relationship at fixed levels of Z. This is a single slice (two-way table) from a three-way table, called a partial table. The table obtained by summing the partial tables is called the marginal table. In the marginal table Z is ignored. The associations in partial tables is called conditional association, since it is the association for fixed values of Z. Conditional association can have a different direction than marginal association. This phenomenon is called Simpson s paradox. Describing Contingency tables 2.12

Conditional and Marginal odds ratios Let µ ijk denote expected frequencies for some sampling model. We assume both X and Y are binary. Within a fixed level k of variable Z the odds ratio is called the conditional odds ratio. θ XY (k) = µ 11kµ 22k µ 12k µ 21k Define µ ij+ = k µ ijk, then is called the marginal odds ratio. θ XY = µ 11+µ 22+ µ 12+ µ 21+ Describing Contingency tables 2.13

Conditional and Marginal Independence If X and Y are independent in the partial table k, then X and Y are called conditionally independent at level k of Z. When Y is a response: P (Y = j X = i, Z = k) = P (Y = j Z = k), for all i, j When this is true for all k, X and Y are called conditionally independent given Z. Suppose a multinomial sampling scheme applies with joint probabilities π ijk = P (X = i, Y = j, Z = k). Conditional independence is equal to Marginal independence is equal to π ijk = π i+k π +jk /π ++k π ij+ = π i++ π +j+ Conditional independence does not imply marginal independence, nor the other way around! Describing Contingency tables 2.14

Homogeneous Association A 2 2 K table has homogeneous association when θ XY (1) = θ XY (2) = = θ XY (K) (Conditional independence is a special case.) If then also θ XY (1) = θ XY (2) = = θ XY (K) θ XZ(1) = θ XZ(2) = = θ XZ(J) Homogeneous association is a symmetric property. When it occurs it is said that there is no interaction between two variables in their effects on the other variable. If an interaction exists, the conditional odds ratio is different for each level of the covariate and the covariate is said to be an effect modifier. Describing Contingency tables 2.15

I J tables For 2 2 tables a single odds ratio can summarize the association. For I J tables this is not possible, however a set of odds ratios or another summary index can describe features of the association. For rows a and b and columns c and d the odds ratio π ac π bd /π ad π bc uses four cells in a rectangular pattern. For I J tables there are many of such odds ratios. However, this set contains much redundant information. Basic sets The local odds ratios defined by θ (l) ij = π ijπ i+1j+1 π ij+1 π i+1j, 1 i (I 1), 1 j (J 1) The spanning cell odds ratios (here with the cell i = 1, j = 1 as spanning cell) θ (s) ij = π 11π ij π 1j π i1, 2 i I, 2 j J. Describing Contingency tables 2.16

Summary measures These measures summarize the association in a single number. 1. Summary measures of association. Various forms exist but in general it is difficult to understand them. 2. Ordinal trends: Concordant and discordant pairs. A pair of subjects is concordant if the subject ranked higher on X also ranks higher on Y. The pair is discordant if the subject ranked higher on X ranks lower on Y. Such measures are only valid for ordinal data 3. Ordinal trends: gamma Describing Contingency tables 2.17