STAC51: Categorical data Analysis

STAC51: Categorical Data Analysis. Mahinda Samarakoon, January 26, 2016.

Table of contents

1. Contingency Tables

Contingency Tables

The two-way table below shows the number of members of a fitness club, classified by two variables.

                  Women   Men
Vegetarian          9      3
Non-vegetarian      8     10

Does an association exist between gender and food habits (being a vegetarian or a non-vegetarian)? Are women in this club more likely to be vegetarians than men?

Joint distribution

The three probability distributions below are often used when analyzing data from a contingency table.

Joint distribution: The joint distribution of X and Y is defined by the collection of probabilities $\pi_{ij} = P(X = i, Y = j)$ for $i = 1, \dots, I$ and $j = 1, \dots, J$. If $n_{ij}$ denotes the observed count in the $i$th row and $j$th column, $\pi_{ij}$ is estimated by $p_{ij} = n_{ij}/n$, where $n = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}$.

Marginal distribution

Marginal distribution: The probability distribution of X alone is called the marginal distribution of X. It is identified by the probability mass function $f_X(i) = P(X = i)$ for $i = 1, \dots, I$, which is estimated by $\hat{f}_X(i) = n_{i+}/n$, where $n_{i+} = \sum_{j=1}^{J} n_{ij}$.

Similarly, the marginal distribution of Y is identified by the probability mass function $f_Y(j) = P(Y = j)$ for $j = 1, \dots, J$, which is estimated by $\hat{f}_Y(j) = n_{+j}/n$, where $n_{+j} = \sum_{i=1}^{I} n_{ij}$.

Conditional distribution

Conditional distribution: A conditional distribution refers to the probability distribution of Y at a fixed level of X. The conditional distribution of Y given $X = i$ is estimated by $n_{ij}/n_{i+}$, $j = 1, \dots, J$. Here Y is treated as the dependent (response) variable and X as the independent (explanatory) variable.

Conditional distribution: example

In the fitness club data above, Y is food habits (being a vegetarian or a non-vegetarian) and X is gender.

The estimated conditional distribution (i.e. the conditional proportions) for women (say X = 1) is:

Y                 Probability
Vegetarian        9/(9 + 8) = 0.53
Non-vegetarian    8/(9 + 8) = 0.47

The estimated conditional distribution for men (say X = 2) is:

Y                 Probability
Vegetarian        3/(3 + 10) = 0.23
Non-vegetarian    10/(3 + 10) = 0.77

The proportion of vegetarians among women is more than twice that among men.
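As a rough illustration, the joint and conditional proportions for the fitness club table can be computed as in the sketch below. The variable names (`counts`, `joint`, `conditional`) are hypothetical and only the four cell counts come from the slides.

```python
from fractions import Fraction

# Fitness club counts from the slides, keyed by (food habit, gender).
counts = {
    ("Vegetarian", "Women"): 9,
    ("Vegetarian", "Men"): 3,
    ("Non-vegetarian", "Women"): 8,
    ("Non-vegetarian", "Men"): 10,
}

n = sum(counts.values())  # total sample size

# Estimated joint distribution: p_ij = n_ij / n.
joint = {cell: Fraction(c, n) for cell, c in counts.items()}

# Totals for each level of X (gender).
gender_totals = {}
for (habit, gender), c in counts.items():
    gender_totals[gender] = gender_totals.get(gender, 0) + c

# Estimated conditional distribution of Y (food habit) given X (gender):
# cell count divided by the total for that gender.
conditional = {
    (habit, gender): Fraction(c, gender_totals[gender])
    for (habit, gender), c in counts.items()
}

for (habit, gender), p in sorted(conditional.items()):
    print(f"P({habit} | {gender}) = {float(p):.2f}")
```

Exact fractions are used so that the proportions can be compared without rounding error; converting to `float` is only needed for display.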

Sensitivity and Specificity in Diagnostic Tests

Diagnostic testing is used to detect many medical conditions. For example, a test can detect cancer in a population. The result of a diagnostic test is said to be positive if it states that the disease is present and negative if it states that the disease is absent.

The accuracy of diagnostic tests is often assessed with two conditional probabilities:

Given that a subject has the disease, the probability that the diagnostic test is positive is called the sensitivity.

Given that a subject does not have the disease, the probability that the test is negative is called the specificity.

Sensitivity and Specificity in Diagnostic Tests

Let X denote the true state of a person, with categories 1 = has the disease and 0 = does not have the disease, and let Y denote the outcome of the diagnostic test, with categories 1 = positive and 0 = negative. Then

sensitivity = $P(Y = 1 \mid X = 1)$, specificity = $P(Y = 0 \mid X = 0)$.

The higher the sensitivity and specificity, the better the diagnostic test. If you get a positive result on a diagnostic test, you may want to know the probability that you really have the disease, i.e. $P(X = 1 \mid Y = 1)$. This may be low even if sensitivity and specificity are both high.

Sensitivity and Specificity in Diagnostic Tests: Example (Agresti)

The data are from a screening test for HIV that was performed on a group of 100,000 people. Note that the prevalence of HIV in this group was very low.

                      Test result
HIV status    Positive    Negative     Total
Positive           475          25       500
Negative         4,975      94,525    99,500
Total            5,450      94,550   100,000

In this study the estimated sensitivity is 475/500 = 0.95 and the estimated specificity is 94,525/99,500 = 0.95.
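The sketch below shows one way to compute the estimates from the four counts quoted in the example (475 of 500 and 94,525 of 99,500). It also computes the proportion of positive tests that come from truly infected people, which is not stated on the slide; the variable names are hypothetical.

```python
# Counts quoted in the example: 475 of the 500 HIV-positive people test
# positive, and 94,525 of the 99,500 HIV-negative people test negative.
diseased_total, true_pos = 500, 475
healthy_total, true_neg = 99_500, 94_525

# The remaining cells follow from the row totals.
false_neg = diseased_total - true_pos
false_pos = healthy_total - true_neg

sensitivity = true_pos / diseased_total   # estimate of P(Y = 1 | X = 1)
specificity = true_neg / healthy_total    # estimate of P(Y = 0 | X = 0)

# Estimate of P(X = 1 | Y = 1): infected people among all positive tests.
ppv = true_pos / (true_pos + false_pos)

print(f"sensitivity = {sensitivity:.2f}")
print(f"specificity = {specificity:.2f}")
print(f"P(disease | positive test) = {ppv:.4f}")
```

With a prevalence of only 0.5%, most positive results come from the much larger uninfected group, which is why the last quantity is small even though both accuracy measures are 0.95.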

Sensitivity and Specificity in Diagnostic Tests: Example (Agresti)

Breast cancer is the most common form of cancer in women. Of women who get mammograms at any given time, it has been estimated that 1% truly have breast cancer. Typical values reported for mammograms are sensitivity = 0.86 and specificity = 0.88. If these values are correct, then given that a mammogram has a positive result, what is the probability that this person truly has breast cancer?

Solution: Let C denote the event that the woman has breast cancer, and let + and − denote a positive and a negative mammogram. We have $P(C) = 0.01$, $P(+ \mid C) = 0.86$ and $P(- \mid C^c) = 0.88$. Then

$P(C \cap +) = P(+ \mid C) P(C) = 0.86 \times 0.01 = 0.0086$,

$P(C^c \cap -) = P(- \mid C^c) P(C^c) = 0.88 \times (1 - 0.01) = 0.8712$.

Since $P(C^c \cap +) + P(C^c \cap -) = P(C^c) = 0.99$, we get $P(C^c \cap +) = 0.99 - 0.8712 = 0.1188$, and

$P(+) = P(C \cap +) + P(C^c \cap +) = 0.0086 + 0.1188 = 0.1274$.

Therefore

$P(C \mid +) = \dfrac{P(C \cap +)}{P(+)} = \dfrac{0.0086}{0.1274} \approx 0.0675$.
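The same Bayes' theorem calculation can be written as a small function, sketched below. The function name `positive_predictive_value` is hypothetical; the inputs are the prevalence 0.01, the sensitivity 0.86 and the specificity 0.88 that appear in the solution's arithmetic.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) via Bayes' theorem."""
    # P(disease and positive) = P(+ | disease) * P(disease)
    p_disease_and_pos = sensitivity * prevalence
    # P(no disease and positive) = P(+ | no disease) * P(no disease)
    p_healthy_and_pos = (1 - specificity) * (1 - prevalence)
    # P(+) is the sum of the two ways of getting a positive result.
    return p_disease_and_pos / (p_disease_and_pos + p_healthy_and_pos)


ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.86, specificity=0.88)
print(f"P(C | +) = {ppv:.4f}")
```

Calling the function with a larger prevalence shows how strongly the answer depends on how common the disease is in the tested population.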

Independence of Two Categorical Variables

Definition: The random variables X and Y are said to be related if the conditional distribution of Y given that X = x changes as x changes.

Definition: The random variables X and Y are said to be statistically independent if the conditional distribution of Y given that X = x is identical at each level of x.

Note that X and Y are statistically independent if and only if $\pi_{ij} = \pi_{i+} \pi_{+j}$ for all $i$ and $j$.
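The factorization criterion can be checked mechanically, as in the sketch below. The function `is_independent` and both joint distributions are hypothetical illustrations and are not taken from the slides.

```python
from fractions import Fraction as F


def is_independent(joint):
    """Check whether pi_ij == pi_i+ * pi_+j holds in every cell."""
    row_totals = [sum(row) for row in joint]        # pi_i+
    col_totals = [sum(col) for col in zip(*joint)]  # pi_+j
    return all(
        joint[i][j] == row_totals[i] * col_totals[j]
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )


# Hypothetical joint distributions (rows: levels of X, columns: levels of Y).
independent_example = [[F(1, 10), F(3, 10)],
                       [F(3, 20), F(9, 20)]]
dependent_example = [[F(2, 10), F(2, 10)],
                     [F(1, 10), F(5, 10)]]

print(is_independent(independent_example))
print(is_independent(dependent_example))
```

Exact fractions are used because the check relies on equality, which is unreliable with floating-point probabilities.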

Independence of Two Categorical Variables: Example

The joint distribution of the two random variables X and Y is given in a table with rows x and columns y (table entries not preserved). Are X and Y independent? Why?

Poisson, binomial, and multinomial sampling

What is the distribution of counts in a contingency table? There are four possible cases.

Poisson sampling: In some cases we can treat the cells of an $I \times J$ contingency table as independent Poisson random variables; i.e., the numbers of observations in the cells, $N_{ij}$, are independent Poisson($\mu_{ij}$). Thus,

$P(N_{ij} = n_{ij}) = \dfrac{e^{-\mu_{ij}} \mu_{ij}^{n_{ij}}}{n_{ij}!}, \quad n_{ij} = 0, 1, 2, \dots$

Independent Poisson sampling is appropriate when the total sample size n is not fixed.
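A direct transcription of the Poisson probability formula is sketched below. The function name `poisson_pmf` and the mean of 2.0 are hypothetical choices for illustration.

```python
import math


def poisson_pmf(n, mu):
    """P(N = n) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu**n / math.factorial(n)


mu = 2.0  # hypothetical expected cell count
for n in range(5):
    print(f"P(N = {n}) = {poisson_pmf(n, mu):.4f}")
```

Because the cells are independent under Poisson sampling, the probability of a whole table is the product of such terms, one per cell.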

Poisson sampling: Example

These data are from records of accidents in 1988 compiled by the Department of Highway Safety and Motor Vehicles in Florida (Agresti, 1990, 1996).

                      Injury
Seat belt use    Fatal    Nonfatal
No               ...      ...,527
Yes              ...      ...,368

Multinomial sampling

When n is fixed (or when we condition on the sample size), multinomial sampling occurs over all of the cells of the contingency table; i.e.,

$(N_{11}, N_{12}, \dots, N_{IJ}) \sim \text{Multinomial}(n, \pi_{11}, \pi_{12}, \dots, \pi_{IJ})$.

(Agresti, 3rd ed., p. 41) Suppose the researchers randomly sample 200 police records of accidents and classify each according to seat-belt use and the outcome of the accident (fatal or non-fatal). For this study the total sample size n is fixed (n = 200). They might treat the numbers of observations at the four combinations of seat-belt use and outcome of accident as a multinomial random vector with unknown joint probabilities $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$.

Independent multinomial sampling

Sometimes the row totals $n_{1+}, n_{2+}, \dots, n_{I+}$ are fixed by the sampling design. For example, in a clinical trial there may be only 10 people available for the placebo group and 12 people available for the drug group. This type of sampling is appropriate for case-control studies and cohort studies.

If there are only two possible outcomes for the trial, cured and not cured, we have binomial sampling within each row of the contingency table. This is often called independent binomial sampling, since the random variables are independent across the rows.

Independent multinomial sampling

When more than two outcomes are possible, say cured, partially cured, and not cured, independent multinomial sampling occurs within each row of the contingency table:

$(N_{11}, N_{12}, \dots, N_{1J}) \sim \text{Multinomial}(n_{1+}, \pi_{1|1}, \pi_{2|1}, \dots, \pi_{J|1})$,

$(N_{21}, N_{22}, \dots, N_{2J}) \sim \text{Multinomial}(n_{2+}, \pi_{1|2}, \pi_{2|2}, \dots, \pi_{J|2})$,

and so on, where $\pi_{j|i} = P(Y = j \mid X = i)$.

Sometimes both the row and the column totals are fixed. In that case hypergeometric sampling is appropriate.
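The difference between fixing the total n and fixing the row totals can be seen in a small simulation, sketched below using only the standard library. All probabilities, the seed, and the function names are hypothetical; only the sample sizes (200 records, and rows of 10 and 12 people) echo the examples above.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is reproducible

cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
joint_probs = [0.1, 0.4, 0.2, 0.3]  # hypothetical pi_11, pi_12, pi_21, pi_22


def multinomial_sample(n):
    """Multinomial sampling: only the grand total n is fixed."""
    draws = Counter(rng.choices(cells, weights=joint_probs, k=n))
    return {cell: draws.get(cell, 0) for cell in cells}


def independent_binomial_sample(row_totals, success_probs):
    """Independent binomial sampling: each row total n_i+ is fixed."""
    table = {}
    for i, n_i in row_totals.items():
        successes = sum(rng.random() < success_probs[i] for _ in range(n_i))
        table[(i, 1)] = successes
        table[(i, 2)] = n_i - successes
    return table


print(multinomial_sample(200))
print(independent_binomial_sample({1: 10, 2: 12}, {1: 0.3, 2: 0.6}))
```

In the first table the row totals vary from sample to sample, while in the second they are always 10 and 12 by construction.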

Conditional Association in Stratified 2 × 2 Tables

More than two categorical variables may be of interest. Let X, Y and Z be three categorical variables. Let us assume that X and Y each have two levels and Z has k levels. The table showing the counts in each cell is now a three-dimensional table, but we can also display it as k two-dimensional tables.

Conditional Association in Stratified 2 × 2 Tables

Z = 1:
                     Y
X          1            2            Total
1          $n_{111}$    $n_{121}$    $n_{1+1}$
2          $n_{211}$    $n_{221}$    $n_{2+1}$
Total      $n_{+11}$    $n_{+21}$    $n_{++1}$

Z = 2:
                     Y
X          1            2            Total
1          $n_{112}$    $n_{122}$    $n_{1+2}$
2          $n_{212}$    $n_{222}$    $n_{2+2}$
Total      $n_{+12}$    $n_{+22}$    $n_{++2}$

and so on, up to Z = k:
                     Y
X          1            2            Total
1          $n_{11k}$    $n_{12k}$    $n_{1+k}$
2          $n_{21k}$    $n_{22k}$    $n_{2+k}$
Total      $n_{+1k}$    $n_{+2k}$    $n_{++k}$

Conditional Association in Stratified 2 × 2 Tables

This idea can be extended to $I \times J \times K$ tables.

Notation: $P(X = i, Y = j, Z = k) = \pi_{ijk}$ and $\mu_{ijk} = E(n_{ijk})$.

Note 1: $\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \pi_{ijk} = 1$.

Note 2: An unbiased estimator of $\pi_{ijk}$ is $\hat{\pi}_{ijk} = n_{ijk}/n$, i.e. $E(\hat{\pi}_{ijk}) = \pi_{ijk}$.

Note 3: A table showing the counts based on X and Y for a particular level of the third variable Z is called a partial table. These tables can be used to study the relationship between X and Y at a particular level of Z. The associations in partial tables are called conditional associations, because they refer to the effect of X on Y conditional on fixing Z at some level.

Conditional Association in Stratified 2 × 2 Tables

Note 4: The two-way XY contingency table obtained by combining the partial tables over all levels of Z is called the XY marginal table. The marginal table contains no information about Z. It is simply a two-way table relating X and Y, but it may reflect the effects of Z on X and Y.

Z as a control variable

Sometimes Z plays the role of a control variable. In this case, the purpose is to understand the relationship between X and Y while controlling for Z. Z is then sometimes called a layer variable or a stratification variable.

Example

Let us assume that a university consists of only two professional schools, a law school and a business school, and that we are interested in studying the association between the two variables X = gender and Y = admission decision. Z = school is another variable that might influence the admission decision.

X has two levels: 1 = male, 2 = female.
Y has two levels: 1 = accepted, 2 = rejected.
Z has two levels: 1 = law school, 2 = business school.

Example: partial tables

Z = 1 (law school):
                         Y = decision
X = gender    Accepted             Rejected             Total
Male          $n_{111} = 10$       $n_{121} = 90$       $n_{1+1} = 100$
Female        $n_{211} = 100$      $n_{221} = 200$      $n_{2+1} = 300$
Total         $n_{+11} = 110$      $n_{+21} = 290$      $n_{++1} = 400$

Z = 2 (business school):
                         Y = decision
X = gender    Accepted             Rejected             Total
Male          $n_{112} = 480$      $n_{122} = 120$      $n_{1+2} = 600$
Female        $n_{212} = 180$      $n_{222} = 20$       $n_{2+2} = 200$
Total         $n_{+12} = 660$      $n_{+22} = 140$      $n_{++2} = 800$

Marginal XY table (both schools):
                         Y = decision
X = gender    Accepted             Rejected             Total
Male          $n_{11+} = 490$      $n_{12+} = 210$      $n_{1++} = 700$
Female        $n_{21+} = 280$      $n_{22+} = 220$      $n_{2++} = 500$
Total         $n_{+1+} = 770$      $n_{+2+} = 430$      $n_{+++} = 1200$
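The marginal table is just the cell-by-cell sum of the partial tables. A minimal sketch, with hypothetical variable names and the counts from the admissions example:

```python
# Partial tables: rows are gender (male, female), columns are the decision
# (accepted, rejected).
law = [[10, 90],
       [100, 200]]
business = [[480, 120],
            [180, 20]]

# Marginal XY table: add the partial tables cell by cell, n_ij+ = sum_k n_ijk.
marginal = [
    [a + b for a, b in zip(law_row, business_row)]
    for law_row, business_row in zip(law, business)
]

print(marginal)
```

Summing over the schools discards the information about Z, which is why the marginal table alone cannot reveal how admission rates differ between the two schools.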

Conditional and Marginal Odds Ratios

Odds ratios can describe marginal and conditional associations. Let $\{\mu_{ijk}\}$ denote the cell expected frequencies for some sampling model, such as binomial, multinomial, or Poisson sampling. Then the conditional odds ratio for category k of Z is

$\theta_{XY(k)} = \dfrac{\mu_{11k} \mu_{22k}}{\mu_{21k} \mu_{12k}}$.

The marginal odds ratio is

$\theta_{XY} = \dfrac{\mu_{11+} \mu_{22+}}{\mu_{21+} \mu_{12+}}$.

Conditional and Marginal Odds Ratios

The conditional odds ratios are estimated by

$\hat{\theta}_{XY(k)} = \dfrac{n_{11k} n_{22k}}{n_{21k} n_{12k}}$.

The estimate of the marginal odds ratio is

$\hat{\theta}_{XY} = \dfrac{n_{11+} n_{22+}}{n_{21+} n_{12+}}$.

Conditional and Marginal Odds Ratios: Example

In the admissions example above,

$\hat{\theta}_{XY(1)} = \dfrac{10 \times 200}{100 \times 90} = \dfrac{2}{9} \approx 0.22$.

This means that, for females, the odds of being accepted by the law school are 4.5 times those for males.

$\hat{\theta}_{XY(2)} = \dfrac{480 \times 20}{180 \times 120} = \dfrac{4}{9} \approx 0.44$.

This means that, for females, the odds of being accepted by the business school are 2.25 times those for males.

The marginal odds ratio is

$\hat{\theta}_{XY} = \dfrac{n_{11+} n_{22+}}{n_{21+} n_{12+}} = \dfrac{490 \times 220}{280 \times 210} = \dfrac{11}{6} \approx 1.83$.

This means that, for males, the odds of being accepted by this university are about 1.83 times those for females.
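The sample odds ratios can be computed directly from the counts in the admissions tables, as sketched below. The helper `odds_ratio` is a hypothetical name; it implements the estimate $n_{11} n_{22} / (n_{21} n_{12})$ for a single 2 × 2 table.

```python
from fractions import Fraction


def odds_ratio(table):
    """Sample odds ratio n11 * n22 / (n21 * n12) of a 2 x 2 table."""
    (n11, n12), (n21, n22) = table
    return Fraction(n11 * n22, n21 * n12)


# Rows: male, female. Columns: accepted, rejected.
law = [[10, 90], [100, 200]]
business = [[480, 120], [180, 20]]
marginal = [[490, 210], [280, 220]]

for name, table in [("law", law), ("business", business), ("marginal", marginal)]:
    theta = odds_ratio(table)
    print(f"{name}: {theta} = {float(theta):.2f}")
```

Both conditional odds ratios are below 1 while the marginal odds ratio is above 1, so the direction of the association reverses when the schools are pooled.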

Conditional and Marginal Odds Ratios: Example

The result that a marginal association can have a different direction from each conditional association is called Simpson's paradox.

Marginal Independence versus Conditional Independence

Definition: If X and Y are independent in partial table k, then X and Y are called conditionally independent at level k of Z. X and Y are said to be conditionally independent given Z when they are conditionally independent at every level of Z.

Marginal Independence versus Conditional Independence: Example

The data shown below are from Agresti, Table 2.7, p. 52.

                          Response
Clinic    Treatment    Success    Failure
1         A            ...        ...
          B            ...        ...
2         A            2          8
          B            8          32
Total     A            ...        ...
          B            ...        ...

Marginal Independence versus Conditional Independence: Example

$\hat{\theta}_{XY(1)} = \dfrac{n_{111} n_{221}}{n_{211} n_{121}} = 1$, and so X and Y are conditionally independent at Z = 1.

$\hat{\theta}_{XY(2)} = \dfrac{n_{112} n_{222}}{n_{212} n_{122}} = \dfrac{2 \times 32}{8 \times 8} = 1$, and so X and Y are conditionally independent at Z = 2.

The marginal odds ratio is $\hat{\theta}_{XY} = \dfrac{n_{11+} n_{22+}}{n_{21+} n_{12+}} = 2$, and so X and Y are not marginally independent.
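The sketch below illustrates how conditional independence in every partial table can coexist with marginal dependence. The clinic 2 counts (2, 8, 8, 32) are those in the table; the clinic 1 counts used here (18, 12, 12, 8) are an assumption, chosen to be consistent with the odds ratios of 1 and 2 reported in the example. The helper name `odds_ratio` is hypothetical.

```python
from fractions import Fraction


def odds_ratio(table):
    """Sample odds ratio n11 * n22 / (n21 * n12) of a 2 x 2 table."""
    (n11, n12), (n21, n22) = table
    return Fraction(n11 * n22, n21 * n12)


# Rows: treatment A, treatment B. Columns: success, failure.
clinic_1 = [[18, 12], [12, 8]]  # ASSUMED counts, not taken from the slides
clinic_2 = [[2, 8], [8, 32]]    # counts from the table

# Marginal table: sum the two partial tables cell by cell.
marginal = [
    [a + b for a, b in zip(row_1, row_2)]
    for row_1, row_2 in zip(clinic_1, clinic_2)
]

print("clinic 1:", odds_ratio(clinic_1))
print("clinic 2:", odds_ratio(clinic_2))
print("marginal:", odds_ratio(marginal))
```

Under these counts the success rate is the same for both treatments within each clinic, but the clinics differ in both success rate and treatment mix, which is what produces the marginal association.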


More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Lecture 2: Probability and Distributions

Lecture 2: Probability and Distributions Lecture 2: Probability and Distributions Ani Manichaikul amanicha@jhsph.edu 17 April 2007 1 / 65 Probability: Why do we care? Probability helps us by: Allowing us to translate scientific questions info

More information

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance Practice Final Exam Statistical Methods and Models - Math 410, Fall 2011 December 4, 2011 You may use a calculator, and you may bring in one sheet (8.5 by 11 or A4) of notes. Otherwise closed book. The

More information

Solution to Tutorial 7

Solution to Tutorial 7 1. (a) We first fit the independence model ST3241 Categorical Data Analysis I Semester II, 2012-2013 Solution to Tutorial 7 log µ ij = λ + λ X i + λ Y j, i = 1, 2, j = 1, 2. The parameter estimates are

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

Probability. Chapter 1 Probability. A Simple Example. Sample Space and Probability. Sample Space and Event. Sample Space (Two Dice) Probability

Probability. Chapter 1 Probability. A Simple Example. Sample Space and Probability. Sample Space and Event. Sample Space (Two Dice) Probability Probability Chapter 1 Probability 1.1 asic Concepts researcher claims that 10% of a large population have disease H. random sample of 100 people is taken from this population and examined. If 20 people

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University Introduction Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 56 Course logistics Let Y be a discrete

More information

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability? Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical

More information

Chapter 10: Chi-Square and F Distributions

Chapter 10: Chi-Square and F Distributions Chapter 10: Chi-Square and F Distributions Chapter Notes 1 Chi-Square: Tests of Independence 2 4 & of Homogeneity 2 Chi-Square: Goodness of Fit 5 6 3 Testing & Estimating a Single Variance 7 10 or Standard

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu 1 / 35 Tip + Paper Tip Meet with seminar speakers. When you go on

More information

2.3 Analysis of Categorical Data

2.3 Analysis of Categorical Data 90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results

More information

Describing Stratified Multiple Responses for Sparse Data

Describing Stratified Multiple Responses for Sparse Data Describing Stratified Multiple Responses for Sparse Data Ivy Liu School of Mathematical and Computing Sciences Victoria University Wellington, New Zealand June 28, 2004 SUMMARY Surveys often contain qualitative

More information

What is Probability? Probability. Sample Spaces and Events. Simple Event

What is Probability? Probability. Sample Spaces and Events. Simple Event What is Probability? Probability Peter Lo Probability is the numerical measure of likelihood that the event will occur. Simple Event Joint Event Compound Event Lies between 0 & 1 Sum of events is 1 1.5

More information

Lecture 4. Selected material from: Ch. 6 Probability

Lecture 4. Selected material from: Ch. 6 Probability Lecture 4 Selected material from: Ch. 6 Probability Example: Music preferences F M Suppose you want to know what types of CD s males and females are more likely to buy. The CD s are classified as Classical,

More information

STA Outlines of Solutions to Selected Homework Problems

STA Outlines of Solutions to Selected Homework Problems 1 STA 6505 CATEGORICAL DATA ANALYSIS Outlines of Solutions to Selected Homework Problems Alan Agresti January 5, 2004, c Alan Agresti 2004 This handout contains solutions and hints to solutions for many

More information

STAT/MA 416 Answers Homework 6 November 15, 2007 Solutions by Mark Daniel Ward PROBLEMS

STAT/MA 416 Answers Homework 6 November 15, 2007 Solutions by Mark Daniel Ward PROBLEMS STAT/MA 4 Answers Homework November 5, 27 Solutions by Mark Daniel Ward PROBLEMS Chapter Problems 2a. The mass p, corresponds to neither of the first two balls being white, so p, 8 7 4/39. The mass p,

More information

Unit 2 Introduction to Probability

Unit 2 Introduction to Probability PubHlth 540 Fall 2013 2. Introduction to Probability Page 1 of 55 Unit 2 Introduction to Probability Chance favours only those who know how to court her - Charles Nicolle A weather report statement such

More information

Conditional Probability

Conditional Probability Conditional Probability Idea have performed a chance experiment but don t know the outcome (ω), but have some partial information (event A) about ω. Question: given this partial information what s the

More information

Correspondence Analysis

Correspondence Analysis Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,

More information

Ling 289 Contingency Table Statistics

Ling 289 Contingency Table Statistics Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,

More information

Relative Risks (RR) and Odds Ratios (OR) 20

Relative Risks (RR) and Odds Ratios (OR) 20 BSTT523: Pagano & Gavreau, Chapter 6 1 Chapter 6: Probability slide: Definitions (6.1 in P&G) 2 Experiments; trials; probabilities Event operations 4 Intersection; Union; Complement Venn diagrams Conditional

More information

(x t. x t +1. TIME SERIES (Chapter 8 of Wilks)

(x t. x t +1. TIME SERIES (Chapter 8 of Wilks) 45 TIME SERIES (Chapter 8 of Wilks) In meteorology, the order of a time series matters! We will assume stationarity of the statistics of the time series. If there is non-stationarity (e.g., there is a

More information

Optimal exact tests for complex alternative hypotheses on cross tabulated data

Optimal exact tests for complex alternative hypotheses on cross tabulated data Optimal exact tests for complex alternative hypotheses on cross tabulated data Daniel Yekutieli Statistics and OR Tel Aviv University CDA course 29 July 2017 Yekutieli (TAU) Optimal exact tests for complex

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance Chapter 8 Student Lecture Notes 8-1 Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X.

4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X. Math 10B with Professor Stankova Worksheet, Midterm #2; Wednesday, 3/21/2018 GSI name: Roy Zhao 1 Problems 1.1 Bayes Theorem 1. Suppose a test is 99% accurate and 1% of people have a disease. What is the

More information

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence Chi-Square Tests for Goodness of Fit and Independence Chi-Square Tests In this course, we use chi-square tests in two different ways The chi-square test for goodness-of-fit is used to determine whether

More information

Unit 3. Discrete Distributions

Unit 3. Discrete Distributions PubHlth 640 3. Discrete Distributions Page 1 of 39 Unit 3. Discrete Distributions Topic 1. Proportions and Rates in Epidemiological Research.... 2. Review - Bernoulli Distribution. 3. Review - Binomial

More information

Probability Theory The Binomial and Poisson Distributions. Sections 5.2 and 5.3

Probability Theory The Binomial and Poisson Distributions. Sections 5.2 and 5.3 Probability Theory The Binomial and Poisson Distributions Sections 5.2 and 5.3 Models for count data The binomial distributions provide a theoretical model for count data having a fixed maximum Examples:

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 6 (MWF) Conditional probabilities and associations Suhasini Subba Rao Review of previous lecture

More information

Statistics for Business and Economics

Statistics for Business and Economics Statistics for Business and Economics Basic Probability Learning Objectives In this lecture(s), you learn: Basic probability concepts Conditional probability To use Bayes Theorem to revise probabilities

More information

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book. NAME (Please Print): HONOR PLEDGE (Please Sign): statistics 101 Practice Final Key This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables

More information

Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence

More information

MA : Introductory Probability

MA : Introductory Probability MA 320-001: Introductory Probability David Murrugarra Department of Mathematics, University of Kentucky http://www.math.uky.edu/~dmu228/ma320/ Spring 2017 David Murrugarra (University of Kentucky) MA 320:

More information

Measures of Association and Variance Estimation

Measures of Association and Variance Estimation Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35

More information

Statistics for Managers Using Microsoft Excel (3 rd Edition)

Statistics for Managers Using Microsoft Excel (3 rd Edition) Statistics for Managers Using Microsoft Excel (3 rd Edition) Chapter 4 Basic Probability and Discrete Probability Distributions 2002 Prentice-Hall, Inc. Chap 4-1 Chapter Topics Basic probability concepts

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

Tests for Population Proportion(s)

Tests for Population Proportion(s) Tests for Population Proportion(s) Esra Akdeniz April 6th, 2016 Motivation We are interested in estimating the prevalence rate of breast cancer among 50- to 54-year-old women whose mothers have had breast

More information

Chapter Six: Two Independent Samples Methods 1/51

Chapter Six: Two Independent Samples Methods 1/51 Chapter Six: Two Independent Samples Methods 1/51 6.3 Methods Related To Differences Between Proportions 2/51 Test For A Difference Between Proportions:Introduction Suppose a sampling distribution were

More information

Logistic regression modeling the probability of success

Logistic regression modeling the probability of success Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might

More information

Topics on Statistics 3

Topics on Statistics 3 Topics on Statistics 3 Pejman Mahboubi April 24, 2018 1 Contingency Tables Assume we ask a sample of 1127 Americans if they believe in an afterlife world. The table below cross classifies the sample based

More information

Module 10: Analysis of Categorical Data Statistics (OA3102)

Module 10: Analysis of Categorical Data Statistics (OA3102) Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this

More information

Statistics. Nicodème Paul Faculté de médecine, Université de Strasbourg

Statistics. Nicodème Paul Faculté de médecine, Université de Strasbourg Statistics Nicodème Paul Faculté de médecine, Université de Strasbourg Course logistics Statistics & Experimental plani cation Course website: http://statnipa.appspot.com/ (http://statnipa.appspot.com/)

More information

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)

More information