STAC51: Categorical data Analysis
Mahinda Samarakoon
January 26, 2016

Table of contents
1. Contingency Tables

Contingency Tables

The two-way table below shows the distribution of the number of members in a fitness club classified by two variables.

                  Women   Men
Vegetarian          9      3
Non-vegetarian      8     10

Does an association exist between gender and food habits (being a vegetarian or a non-vegetarian)? Are women in this club more likely to be vegetarians than men?

Joint distribution

The three probability distributions below (joint, marginal, and conditional) are often used when analyzing data from a contingency table.

Joint distribution: The joint distribution of $X$ and $Y$ is defined by the collection of probabilities $\pi_{ij} = P(X = i, Y = j)$ for $i = 1, \dots, I$ and $j = 1, \dots, J$. If $n_{ij}$ denotes the observed number of events in the $i$th row and $j$th column, $\pi_{ij}$ is estimated by $p_{ij} = n_{ij}/n$, where $n = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}$.

Marginal distribution

Marginal distribution: The probability distribution of $X$ is called the marginal distribution of $X$. The probability mass function of $X$, i.e. $f_X(i) = P(X = i)$ for $i = 1, \dots, I$, identifies the marginal distribution of $X$. It is estimated by $\hat{f}_X(i) = n_{i+}/n$ for $i = 1, \dots, I$, where $n_{i+} = \sum_{j=1}^{J} n_{ij}$.

Similarly, the probability mass function of $Y$, i.e. $f_Y(j) = P(Y = j)$ for $j = 1, \dots, J$, identifies the marginal distribution of $Y$. It is estimated by $\hat{f}_Y(j) = n_{+j}/n$ for $j = 1, \dots, J$, where $n_{+j} = \sum_{i=1}^{I} n_{ij}$.

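The estimators above can be sketched in a few lines of Python. This is only an illustration using the fitness club counts; the variable names (`counts`, `joint`, `row_marginal`, `col_marginal`) are assumptions made for the example, and rows and columns follow the layout of the table (rows are food habits, columns are gender).

```python
# Fitness club counts, laid out as in the two-way table:
# rows = (Vegetarian, Non-vegetarian), columns = (Women, Men).
counts = [[9, 3],
          [8, 10]]

# Total sample size n = sum of all cell counts.
n = sum(sum(row) for row in counts)

# Estimated joint distribution: p_ij = n_ij / n.
joint = [[n_ij / n for n_ij in row] for row in counts]

# Estimated marginal distributions: n_i+ / n (rows) and n_+j / n (columns).
row_marginal = [sum(row) / n for row in counts]
col_marginal = [sum(col) / n for col in zip(*counts)]

print("n =", n)
print("joint:", joint)
print("row marginal:", row_marginal)
print("column marginal:", col_marginal)
```

The estimated joint probabilities sum to 1, and each marginal is obtained by summing the joint table over the other index.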
Conditional distribution

Conditional distribution: A conditional distribution refers to the probability distribution of $Y$ at a fixed level of $X$. The conditional distribution of $Y$ given $X = i$ can be estimated by $n_{ij}/n_{i+}$, $j = 1, \dots, J$. Here $Y$ is taken to be the dependent variable and $X$ the independent variable.

Conditional distribution

In the fitness club data above, $Y$ is food habits (being a vegetarian or a non-vegetarian) and $X$ is gender.

The estimated conditional distribution (i.e. conditional proportions) for women (say $X = 1$) is given below:

Y                 Probability
Vegetarian        9/(9 + 8) = 0.53
Non-vegetarian    8/(9 + 8) = 0.47

The estimated conditional distribution (i.e. conditional proportions) for men (say $X = 2$) is given below:

Y                 Probability
Vegetarian        3/(3 + 10) = 0.23
Non-vegetarian    10/(3 + 10) = 0.77

The proportion of vegetarians among women is more than twice that among men.

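The conditional proportions can be reproduced with a short sketch. The dictionary layout and the helper name `conditional_distribution` are assumptions made for illustration, not part of the lecture.

```python
# counts[x][y]: x = gender, y = food habit (fitness club data).
counts = {
    "Women": {"Vegetarian": 9, "Non-vegetarian": 8},
    "Men": {"Vegetarian": 3, "Non-vegetarian": 10},
}

def conditional_distribution(row):
    """Estimate P(Y = j | X = i) by n_ij / n_i+ for one level of X."""
    total = sum(row.values())
    return {y: n_ij / total for y, n_ij in row.items()}

cond = {x: conditional_distribution(row) for x, row in counts.items()}

for x, dist in cond.items():
    print(x, {y: round(p, 2) for y, p in dist.items()})

# Ratio of the vegetarian proportions, women relative to men.
ratio = cond["Women"]["Vegetarian"] / cond["Men"]["Vegetarian"]
print("ratio:", round(ratio, 2))
```

Each conditional distribution sums to 1 within its own row, which is what distinguishes it from the joint distribution.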
Sensitivity and Specificity in Diagnostic Tests

Diagnostic testing is used to detect many medical conditions. For example, a test can detect cancer in a population. The result of a diagnostic test is said to be positive if it states that the disease is present and negative if it states that the disease is absent. The accuracy of diagnostic tests is often assessed with two conditional probabilities:

Given that a subject has the disease, the probability the diagnostic test is positive is called the sensitivity.
Given that the subject does not have the disease, the probability the test is negative is called the specificity.

Sensitivity and Specificity in Diagnostic Tests

If $X$ denotes the true state of a person, with categories 1 = the person has the disease and 0 = the person does not have the disease, and if $Y$ denotes the outcome of the diagnostic test, with categories 1 = positive and 0 = negative, then

sensitivity $= P(Y = 1 \mid X = 1)$, specificity $= P(Y = 0 \mid X = 0)$.

The higher the sensitivity and specificity, the better the diagnostic test. If you get a positive result on a diagnostic test, you might be interested in the probability that you really have the disease, i.e. $P(X = 1 \mid Y = 1)$. This may be low even if sensitivity and specificity are both high.

Sensitivity and Specificity in Diagnostic Tests: Example (Agresti)

The data are from a screening test for HIV that was performed on a group of 100,000 people. Note that the prevalence rate of HIV in this group was very low.

                     Test result
HIV status     Positive   Negative     Total
Positive            475         25       500
Negative          4,975     94,525    99,500
Total             5,450     94,550   100,000

In this study the estimated sensitivity = 475/500 = 0.95 and the estimated specificity = 94525/99500 = 0.95.

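As a rough sketch, the two estimates can be computed from the counts quoted in the example (475 of 500 infected people test positive; 94,525 of 99,500 uninfected people test negative). The variable names are assumptions for illustration. The last line also estimates $P(X = 1 \mid Y = 1)$ from the same counts, to show how low it can be when the disease is rare.

```python
# Counts quoted in the HIV screening example.
diseased_total = 500        # people who truly have HIV
true_positives = 475        # of those, how many test positive
healthy_total = 99_500      # people who do not have HIV
true_negatives = 94_525     # of those, how many test negative

# Remaining cells follow by subtraction from the row totals.
false_negatives = diseased_total - true_positives
false_positives = healthy_total - true_negatives

sensitivity = true_positives / diseased_total    # P(Y = 1 | X = 1)
specificity = true_negatives / healthy_total     # P(Y = 0 | X = 0)

# Estimated probability of disease given a positive test, P(X = 1 | Y = 1).
prob_disease_given_positive = true_positives / (true_positives + false_positives)

print("sensitivity:", sensitivity)
print("specificity:", specificity)
print("P(disease | positive):", round(prob_disease_given_positive, 4))
```

With these counts fewer than one in ten positive results corresponds to a true infection, even though sensitivity and specificity are both 0.95.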
Sensitivity and Specificity in Diagnostic Tests: Example (Agresti)

Breast cancer is the most common form of cancer in women. Of women who get mammograms at any given time, it has been estimated that 1% truly have breast cancer. Typical values reported for mammograms are sensitivity = 0.86 and specificity = 0.88. If these values are correct, then given that a mammogram has a positive result, what is the probability this person truly has breast cancer?

Solution: Let $C$ denote having breast cancer, $+$ a positive result and $-$ a negative result. We have $P(C) = 0.01$, $P(+ \mid C) = 0.86$ and $P(- \mid C^c) = 0.88$. Then

$$P(C \cap +) = P(+ \mid C)\,P(C) = 0.86 \times 0.01 = 0.0086,$$
$$P(C^c \cap -) = P(- \mid C^c)\,P(C^c) = 0.88 \times (1 - 0.01) = 0.8712.$$

Since $P(C^c \cap +) + P(C^c \cap -) = P(C^c) = 1 - 0.01 = 0.99$, we get

$$P(C^c \cap +) = 0.99 - 0.8712 = 0.1188$$

and

$$P(+) = P(C \cap +) + P(C^c \cap +) = 0.0086 + 0.1188 = 0.1274.$$

Therefore

$$P(C \mid +) = \frac{P(C \cap +)}{P(+)} = \frac{0.0086}{0.1274} = 0.0675.$$

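The same calculation can be written as a small function, which makes it easy to see how the answer changes with prevalence. This is a sketch; the function name `prob_disease_given_positive` and its parameter names are assumptions for illustration.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' theorem: P(C | +) from P(C), P(+ | C) and P(- | C^c)."""
    # P(C and +) = P(+ | C) P(C)
    p_disease_and_positive = sensitivity * prevalence
    # P(C^c and +) = (1 - P(- | C^c)) P(C^c)
    p_healthy_and_positive = (1 - specificity) * (1 - prevalence)
    # P(+) is the sum of the two disjoint ways to test positive.
    p_positive = p_disease_and_positive + p_healthy_and_positive
    return p_disease_and_positive / p_positive

# Mammogram example: P(C) = 0.01, sensitivity = 0.86, specificity = 0.88.
answer = prob_disease_given_positive(0.01, 0.86, 0.88)
print(round(answer, 4))
```

Calling the function with a larger prevalence (for example 0.10, a hypothetical value) shows that the probability of disease given a positive test rises quickly as the condition becomes more common.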
Independence of Two Categorical Variables

Definition: The random variables $X$ and $Y$ are said to be related if the conditional distribution of $Y$ given that $X = x$ changes as $x$ changes.

Definition: The random variables $X$ and $Y$ are said to be statistically independent if the conditional distribution of $Y$ given that $X = x$ is identical at each level of $x$.

Note that $X$ and $Y$ are statistically independent if and only if $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all $i$ and $j$.

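The factorization condition $\pi_{ij} = \pi_{i+}\pi_{+j}$ can be checked mechanically. The sketch below uses two made-up joint distributions (both tables are hypothetical and chosen only for illustration), and the helper name `is_independent` is likewise an assumption.

```python
def is_independent(joint, tol=1e-9):
    """Check pi_ij == pi_i+ * pi_+j for every cell of a joint probability table."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    return all(
        abs(joint[i][j] - row_marginal[i] * col_marginal[j]) <= tol
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )

# Hypothetical table built as an outer product of (0.4, 0.6) and (0.3, 0.7).
independent_table = [[0.12, 0.28],
                     [0.18, 0.42]]

# Hypothetical table with the same row marginals but a different cell pattern.
dependent_table = [[0.30, 0.10],
                   [0.20, 0.40]]

print(is_independent(independent_table))
print(is_independent(dependent_table))
```

A tolerance is used because the probabilities are stored as floating-point numbers.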
Independence of Two Categorical Variables: Example

The joint distribution of the two random variables $X$ and $Y$ is given below:

[Table: joint probabilities, rows indexed by x and columns by y]

Are $X$ and $Y$ independent? Why?

Poisson, binomial, and multinomial sampling

What is the distribution of counts in a contingency table? There are four possible cases.

Poisson sampling: In some cases we can treat the cells of an $I \times J$ contingency table as independent Poisson random variables; i.e., the number of observations in each cell satisfies $N_{ij} \sim$ independent Poisson$(\mu_{ij})$. Thus,

$$P(N_{ij} = n_{ij}) = \frac{e^{-\mu_{ij}}\,\mu_{ij}^{n_{ij}}}{n_{ij}!}, \qquad n_{ij} = 0, 1, 2, \dots$$

Independent Poisson sampling is appropriate when the total sample size $n$ is not fixed.

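For concreteness, the Poisson probability mass function above can be evaluated directly. This is a minimal sketch; the function name `poisson_pmf` and the mean used in the example call are assumptions for illustration.

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) = exp(-mu) * mu**n / n! for a Poisson(mu) cell count."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

# Hypothetical cell mean mu = 3: probabilities of observing 0..5 events.
mu = 3.0
for n in range(6):
    print(n, round(poisson_pmf(n, mu), 4))

# Under independent Poisson sampling, the probability of a whole table is
# the product of the cell probabilities.
```

Summing `poisson_pmf(n, mu)` over a long enough range of `n` returns a value very close to 1, as expected for a probability distribution.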
Poisson sampling: Example

These data are from records of accidents in 1988 compiled by the Department of Highway Safety and Motor Vehicles in Florida (Agresti, 1990, 1996).

                       Injury
Seat belt use     Fatal     Nonfatal
No                …         …,527
Yes               …         …,368

Multinomial sampling

When $n$ is fixed (or we condition on the sample size), multinomial sampling occurs over all of the cells of the contingency table; i.e.,

$$(N_{11}, N_{12}, \dots, N_{IJ}) \sim \text{Multinomial}(n, \pi_{11}, \pi_{12}, \dots, \pi_{IJ}).$$

(Agresti 3rd ed, p41) Suppose the researchers randomly sample 200 police records of accidents and classify each according to seat-belt use and outcome of the accident (fatal or non-fatal). For this study the total sample size $n$ is fixed ($n = 200$). They might treat the numbers of observations at the four combinations of seat-belt use and outcome of accident as a multinomial random vector with unknown joint probabilities $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$.

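To make the sampling scheme concrete, the sketch below simulates one multinomial sample of $n = 200$ records over the four cells and evaluates the multinomial probability of a given table. The cell probabilities are hypothetical values chosen only for illustration (the true $\pi_{ij}$ are unknown), and the names `multinomial_pmf`, `cells` and `probs` are assumptions.

```python
import math
import random
from collections import Counter

def multinomial_pmf(counts, probs):
    """P(N_1 = n_1, ..., N_c = n_c) for a multinomial with n = sum(counts)."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)   # exact integer at every step
    return coefficient * math.prod(p ** c for p, c in zip(probs, counts))

# Cells of the 2 x 2 table: (seat-belt use, outcome).
cells = [("No", "Fatal"), ("No", "Nonfatal"), ("Yes", "Fatal"), ("Yes", "Nonfatal")]
probs = [0.01, 0.29, 0.02, 0.68]            # hypothetical joint probabilities

n = 200                                      # total sample size fixed by design
rng = random.Random(0)                       # seeded for reproducibility
draws = rng.choices(cells, weights=probs, k=n)
tally = Counter(draws)
table = {cell: tally.get(cell, 0) for cell in cells}

print(table)
print("total:", sum(table.values()))
```

Whatever the individual cell counts turn out to be, they always add up to the fixed total of 200, which is the defining feature of multinomial sampling.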
Independent multinomial sampling

Sometimes the row totals $n_{1+}, n_{2+}, \dots, n_{I+}$ are fixed by the sampling design. For example, in a clinical trial there may be only 10 people available for the placebo group and 12 people available for the drug group. This type of sampling is appropriate for case-control studies and cohort studies.

If there are only two possible outcomes for the trial, cured and not cured, we have binomial sampling within each row of the contingency table. This is often called independent binomial sampling, since the random variables are independent across the rows.

Independent multinomial sampling

When more than two outcomes are possible, say cured, partially cured, and not cured, then independent multinomial sampling occurs within each row of the contingency table:

$$(N_{11}, N_{12}, \dots, N_{1J}) \sim \text{Multinomial}(n_{1+}, \pi_{1|1}, \pi_{2|1}, \dots, \pi_{J|1}),$$
$$(N_{21}, N_{22}, \dots, N_{2J}) \sim \text{Multinomial}(n_{2+}, \pi_{1|2}, \pi_{2|2}, \dots, \pi_{J|2}),$$

and so on.

Sometimes both row and column totals are fixed. In that case hypergeometric sampling is appropriate.

Conditional Association in Stratified 2 × 2 tables

More than two categorical variables may be of interest. Let $X$, $Y$ and $Z$ be three categorical variables. Let's assume that $X$ and $Y$ each have two levels and $Z$ has $k$ levels. The table showing the counts in each cell is now a three-dimensional table, but we can also display it as $k$ two-dimensional tables.

Conditional Association in Stratified 2 × 2 tables

Z = 1          Y = 1      Y = 2      Total
X = 1          n_111      n_121      n_1+1
X = 2          n_211      n_221      n_2+1
Total          n_+11      n_+21      n_++1

Z = 2          Y = 1      Y = 2      Total
X = 1          n_112      n_122      n_1+2
X = 2          n_212      n_222      n_2+2
Total          n_+12      n_+22      n_++2

and for Z = k,

Z = k          Y = 1      Y = 2      Total
X = 1          n_11k      n_12k      n_1+k
X = 2          n_21k      n_22k      n_2+k
Total          n_+1k      n_+2k      n_++k

Conditional Association in Stratified 2 × 2 tables

This idea can be extended to $I \times J \times K$ tables.

Notation: $P(X = i, Y = j, Z = k) = \pi_{ijk}$ and $\mu_{ijk} = E(n_{ijk})$.

Note 1: $\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \pi_{ijk} = 1$.

Note 2: An unbiased estimator of $\pi_{ijk}$ is $\hat{\pi}_{ijk} = n_{ijk}/n$, i.e. $E(\hat{\pi}_{ijk}) = \pi_{ijk}$.

Note 3: A table showing the counts based on $X$ and $Y$ for a particular level of the third variable $Z$ is called a partial table. These tables can be used to study the relationship between $X$ and $Y$ for a particular level of the variable $Z$. The associations in partial tables are called conditional associations, because they refer to the effect of $X$ on $Y$ conditional on fixing $Z$ at some level.

Conditional Association in Stratified 2 × 2 tables

Note 4: The two-way XY contingency table obtained by combining the partial tables for all levels of the variable $Z$ is called the XY marginal table. The marginal table contains no information about $Z$. It is simply a two-way table relating $X$ and $Y$, but it may reflect the effects of $Z$ on $X$ and $Y$.

Z as a control variable

Sometimes $Z$ plays the role of a control variable. In this case, the purpose is to understand the relationship between $X$ and $Y$ while controlling for $Z$. Here $Z$ is sometimes called a layer variable, and sometimes a stratification variable.

Example

Let's assume that a university consists of only two professional schools, a law school and a business school, and that we are interested in studying the association between the two variables $X$ = gender and $Y$ = admission decision. $Z$ = school is another variable that might influence the admission decision.

X has two levels: 1 = male, 2 = female.
Y has two levels: 1 = accepted, 2 = rejected.
Z has two levels: 1 = law school, 2 = business school.

Example: Partial tables

Z = 1 = Law school
                         Y = Decision
X = gender     Accepted          Rejected          Total
Male           n_111 = 10        n_121 = 90        n_1+1 = 100
Female         n_211 = 100       n_221 = 200       n_2+1 = 300
Total          n_+11 = 110       n_+21 = 290       n_++1 = 400

Z = 2 = Business school
                         Y = Decision
X = gender     Accepted          Rejected          Total
Male           n_112 = 480       n_122 = 120       n_1+2 = 600
Female         n_212 = 180       n_222 = 20        n_2+2 = 200
Total          n_+12 = 660       n_+22 = 140       n_++2 = 800

Marginal XY table (both schools)
                         Y = Decision
X = gender     Accepted          Rejected          Total
Male           n_11+ = 490       n_12+ = 210       n_1++ = 700
Female         n_21+ = 280       n_22+ = 220       n_2++ = 500
Total          n_+1+ = 770       n_+2+ = 430       n_+++ = 1200

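As a quick check, the marginal XY table can be rebuilt by summing the two partial tables cell by cell, and the acceptance rates can be compared within each school and overall. This is a sketch only; the list layout and the helper name `acceptance_rates` are assumptions for illustration.

```python
# Partial tables: rows = (Male, Female), columns = (Accepted, Rejected).
law = [[10, 90],
       [100, 200]]
business = [[480, 120],
            [180, 20]]
partial_tables = [law, business]

# Marginal table: n_ij+ = sum over k of n_ijk.
marginal = [
    [sum(table[i][j] for table in partial_tables) for j in range(2)]
    for i in range(2)
]

def acceptance_rates(table):
    """Proportion accepted in each row (Male, Female)."""
    return [row[0] / sum(row) for row in table]

print("marginal:", marginal)
print("law:", acceptance_rates(law))
print("business:", acceptance_rates(business))
print("both schools:", acceptance_rates(marginal))
```

Within each school the acceptance rate is higher for females, yet in the combined table it is higher for males, because most males applied to the business school, where acceptance rates are high for everyone.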
Conditional and Marginal Odds Ratios

Odds ratios can describe marginal and conditional associations. Let $\{\mu_{ijk}\}$ denote cell expected frequencies for some sampling model, such as binomial, multinomial, or Poisson sampling. Then the conditional odds ratio for category $k$ of $Z$ is given by

$$\theta_{XY(k)} = \frac{\mu_{11k}\,\mu_{22k}}{\mu_{21k}\,\mu_{12k}}.$$

The marginal odds ratio is given by

$$\theta_{XY} = \frac{\mu_{11+}\,\mu_{22+}}{\mu_{21+}\,\mu_{12+}}.$$

Conditional and Marginal Odds Ratios

The estimate of the conditional odds ratio is given by

$$\hat{\theta}_{XY(k)} = \frac{n_{11k}\,n_{22k}}{n_{21k}\,n_{12k}}.$$

The estimate of the marginal odds ratio is given by

$$\hat{\theta}_{XY} = \frac{n_{11+}\,n_{22+}}{n_{21+}\,n_{12+}}.$$

Conditional and Marginal Odds Ratios: Example

In the admissions example above,

$$\hat{\theta}_{XY(1)} = \frac{10 \times 200}{100 \times 90} = \frac{2}{9}.$$

This means that, for the law school, the odds of being accepted for females are 4.5 times the odds for males.

$$\hat{\theta}_{XY(2)} = \frac{480 \times 20}{180 \times 120} = \frac{4}{9}.$$

This means that, for the business school, the odds of being accepted for females are 2.25 times the odds for males.

The marginal odds ratio is given by

$$\hat{\theta}_{XY} = \frac{n_{11+}\,n_{22+}}{n_{21+}\,n_{12+}} = \frac{490 \times 220}{280 \times 210} = 1.83.$$

This means that, ignoring school, the odds of being accepted for males are 1.83 times the odds for females.

Conditional and Marginal Odds Ratios: Example

The result that a marginal association can have a different direction from each conditional association is called Simpson's paradox.

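The reversal can be verified directly from the counts in the admissions tables. The sketch below uses exact fractions to avoid rounding; the helper name `odds_ratio` and the table layout (rows Male, Female; columns Accepted, Rejected) are assumptions for illustration.

```python
from fractions import Fraction

def odds_ratio(table):
    """Sample odds ratio n11 * n22 / (n21 * n12) of a 2 x 2 table."""
    (n11, n12), (n21, n22) = table
    return Fraction(n11 * n22, n21 * n12)

# Rows = (Male, Female), columns = (Accepted, Rejected).
law = [[10, 90], [100, 200]]
business = [[480, 120], [180, 20]]
marginal = [[490, 210], [280, 220]]

theta_law = odds_ratio(law)
theta_business = odds_ratio(business)
theta_marginal = odds_ratio(marginal)

print("law:", theta_law)
print("business:", theta_business)
print("marginal:", theta_marginal, "=", round(float(theta_marginal), 2))

# Simpson's paradox: both conditional odds ratios are below 1,
# while the marginal odds ratio is above 1.
paradox = theta_law < 1 and theta_business < 1 and theta_marginal > 1
print("Simpson's paradox:", paradox)
```

An odds ratio below 1 favours females in this layout and one above 1 favours males, so the two conditional associations and the marginal association point in opposite directions.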
Marginal Independence versus Conditional Independence

Definition: If $X$ and $Y$ are independent in partial table $k$, then $X$ and $Y$ are called conditionally independent at level $k$ of $Z$. $X$ and $Y$ are said to be conditionally independent given $Z$ when they are conditionally independent at every level of $Z$.

Marginal Independence versus Conditional Independence: Example

The data shown below are from Agresti Table 2.7, p52.

                             Response
Clinic     Treatment     Success     Failure
1          A             …           …
           B             …           …
2          A             2           8
           B             8           32
Total      A             …           …
           B             …           …

Marginal Independence versus Conditional Independence: Example

$$\hat{\theta}_{XY(1)} = \frac{n_{111}\,n_{221}}{n_{211}\,n_{121}} = 1,$$

and so $X$ and $Y$ are conditionally independent at $Z = 1$.

$$\hat{\theta}_{XY(2)} = \frac{n_{112}\,n_{222}}{n_{212}\,n_{122}} = \frac{2 \times 32}{8 \times 8} = 1,$$

and so $X$ and $Y$ are conditionally independent at $Z = 2$.

The marginal odds ratio is

$$\hat{\theta}_{XY} = \frac{n_{11+}\,n_{22+}}{n_{21+}\,n_{12+}} = 2,$$

and so $X$ and $Y$ are not marginally independent.
More informationWhat is Probability? Probability. Sample Spaces and Events. Simple Event
What is Probability? Probability Peter Lo Probability is the numerical measure of likelihood that the event will occur. Simple Event Joint Event Compound Event Lies between 0 & 1 Sum of events is 1 1.5
More informationLecture 4. Selected material from: Ch. 6 Probability
Lecture 4 Selected material from: Ch. 6 Probability Example: Music preferences F M Suppose you want to know what types of CD s males and females are more likely to buy. The CD s are classified as Classical,
More informationSTA Outlines of Solutions to Selected Homework Problems
1 STA 6505 CATEGORICAL DATA ANALYSIS Outlines of Solutions to Selected Homework Problems Alan Agresti January 5, 2004, c Alan Agresti 2004 This handout contains solutions and hints to solutions for many
More informationSTAT/MA 416 Answers Homework 6 November 15, 2007 Solutions by Mark Daniel Ward PROBLEMS
STAT/MA 4 Answers Homework November 5, 27 Solutions by Mark Daniel Ward PROBLEMS Chapter Problems 2a. The mass p, corresponds to neither of the first two balls being white, so p, 8 7 4/39. The mass p,
More informationUnit 2 Introduction to Probability
PubHlth 540 Fall 2013 2. Introduction to Probability Page 1 of 55 Unit 2 Introduction to Probability Chance favours only those who know how to court her - Charles Nicolle A weather report statement such
More informationConditional Probability
Conditional Probability Idea have performed a chance experiment but don t know the outcome (ω), but have some partial information (event A) about ω. Question: given this partial information what s the
More informationCorrespondence Analysis
Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,
More informationLing 289 Contingency Table Statistics
Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,
More informationRelative Risks (RR) and Odds Ratios (OR) 20
BSTT523: Pagano & Gavreau, Chapter 6 1 Chapter 6: Probability slide: Definitions (6.1 in P&G) 2 Experiments; trials; probabilities Event operations 4 Intersection; Union; Complement Venn diagrams Conditional
More information(x t. x t +1. TIME SERIES (Chapter 8 of Wilks)
45 TIME SERIES (Chapter 8 of Wilks) In meteorology, the order of a time series matters! We will assume stationarity of the statistics of the time series. If there is non-stationarity (e.g., there is a
More informationOptimal exact tests for complex alternative hypotheses on cross tabulated data
Optimal exact tests for complex alternative hypotheses on cross tabulated data Daniel Yekutieli Statistics and OR Tel Aviv University CDA course 29 July 2017 Yekutieli (TAU) Optimal exact tests for complex
More informationReview of One-way Tables and SAS
Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409
More informationChapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance
Chapter 8 Student Lecture Notes 8-1 Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More information4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X.
Math 10B with Professor Stankova Worksheet, Midterm #2; Wednesday, 3/21/2018 GSI name: Roy Zhao 1 Problems 1.1 Bayes Theorem 1. Suppose a test is 99% accurate and 1% of people have a disease. What is the
More informationInference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence
Chi-Square Tests for Goodness of Fit and Independence Chi-Square Tests In this course, we use chi-square tests in two different ways The chi-square test for goodness-of-fit is used to determine whether
More informationUnit 3. Discrete Distributions
PubHlth 640 3. Discrete Distributions Page 1 of 39 Unit 3. Discrete Distributions Topic 1. Proportions and Rates in Epidemiological Research.... 2. Review - Bernoulli Distribution. 3. Review - Binomial
More informationProbability Theory The Binomial and Poisson Distributions. Sections 5.2 and 5.3
Probability Theory The Binomial and Poisson Distributions Sections 5.2 and 5.3 Models for count data The binomial distributions provide a theoretical model for count data having a fixed maximum Examples:
More informationData Analysis and Statistical Methods Statistics 651
Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 6 (MWF) Conditional probabilities and associations Suhasini Subba Rao Review of previous lecture
More informationStatistics for Business and Economics
Statistics for Business and Economics Basic Probability Learning Objectives In this lecture(s), you learn: Basic probability concepts Conditional probability To use Bayes Theorem to revise probabilities
More informationThis is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.
NAME (Please Print): HONOR PLEDGE (Please Sign): statistics 101 Practice Final Key This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables
More informationGoodness of Fit Goodness of fit - 2 classes
Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence
More informationMA : Introductory Probability
MA 320-001: Introductory Probability David Murrugarra Department of Mathematics, University of Kentucky http://www.math.uky.edu/~dmu228/ma320/ Spring 2017 David Murrugarra (University of Kentucky) MA 320:
More informationMeasures of Association and Variance Estimation
Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35
More informationStatistics for Managers Using Microsoft Excel (3 rd Edition)
Statistics for Managers Using Microsoft Excel (3 rd Edition) Chapter 4 Basic Probability and Discrete Probability Distributions 2002 Prentice-Hall, Inc. Chap 4-1 Chapter Topics Basic probability concepts
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationClinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,
More informationTests for Population Proportion(s)
Tests for Population Proportion(s) Esra Akdeniz April 6th, 2016 Motivation We are interested in estimating the prevalence rate of breast cancer among 50- to 54-year-old women whose mothers have had breast
More informationChapter Six: Two Independent Samples Methods 1/51
Chapter Six: Two Independent Samples Methods 1/51 6.3 Methods Related To Differences Between Proportions 2/51 Test For A Difference Between Proportions:Introduction Suppose a sampling distribution were
More informationLogistic regression modeling the probability of success
Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might
More informationTopics on Statistics 3
Topics on Statistics 3 Pejman Mahboubi April 24, 2018 1 Contingency Tables Assume we ask a sample of 1127 Americans if they believe in an afterlife world. The table below cross classifies the sample based
More informationModule 10: Analysis of Categorical Data Statistics (OA3102)
Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this
More informationStatistics. Nicodème Paul Faculté de médecine, Université de Strasbourg
Statistics Nicodème Paul Faculté de médecine, Université de Strasbourg Course logistics Statistics & Experimental plani cation Course website: http://statnipa.appspot.com/ (http://statnipa.appspot.com/)
More informationRecall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem
Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)
More information