STAC51H3: Categorical Data Analysis, Assignment 2. Due: Thu Feb 9, 2016 in class. All relevant work must be shown for credit.


Note: In any question, if you are using R, all R code and R output must be included in your answers. You should assume that the reader is not familiar with R output, so explain all your findings, quoting the necessary values from your output.

Please note that academic integrity is fundamental to learning and scholarship. You may discuss questions with other students. However, the work you submit should be your own. If I feel suspicious of any assignment (e.g. if your work doesn't appear to be consistent with what we have discussed in class), I will not mark the assignment. Instead, I will ask you to present your work in my office and your grade will be assigned based on your presentation.

1. (Agresti) In the United States, the estimated annual probability that a woman over the age of 35 dies of lung cancer equals for current smokers and for nonsmokers [M. Pagano and K. Gauvreau, Principles of Biostatistics, Belmont, CA: Duxbury Press (1993), p. 134].

(a) (3 points) Calculate and interpret the difference of proportions and the relative risk.

Denoting smokers by 1 and non-smokers by 2, the difference of proportions is $\hat{\pi}_1 - \hat{\pi}_2$. For women over 35 years of age, the annual probability of dying of lung cancer is greater by this amount for smokers than for non-smokers.

The relative risk is $\hat{\pi}_1 / \hat{\pi}_2$. For women over 35 years of age, the probability of dying of lung cancer for smokers is that many times the probability for non-smokers.

(b) (3 points) Calculate and interpret the odds ratio. Explain why the relative risk and odds ratio take similar values. Is this always the case or only in some cases? Explain.

The odds ratio is $\hat{\theta} = \dfrac{\hat{\pi}_1/(1-\hat{\pi}_1)}{\hat{\pi}_2/(1-\hat{\pi}_2)} \approx 10.8$. For women over 35 years of age, the odds of dying from lung cancer for smokers are about 10.8 times the odds for non-smokers.

The relative risk and the odds ratio are very similar in this example. This is usually the case for rare diseases, i.e. when the probabilities are small: then $1-\hat{\pi}_1$ and $1-\hat{\pi}_2$ are both close to 1, so each odds is close to the corresponding probability and the odds ratio is close to the relative risk. It is not always the case; when the probabilities are not close to 0, the two measures can differ substantially.

> # part a
> p1 <-
> p2 <-
> diff <- p1-p2 # Risk difference
> diff
[1]
> rr <- p1/p2 # Relative risk
> rr
[1]
> # part b
> odds1 <- p1/(1-p1)
> odds2 <- p2/(1-p2)
> thetahat <- odds1/odds2
> odds1
[1]
> odds2
[1]
> thetahat
[1]
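The connection between the two measures in Question 1(b) can be checked numerically. The sketch below is illustrative only: the two probabilities are assumed inputs (0.001304 and 0.000121, the figures usually quoted for this textbook exercise; they do not appear in the text above), and the variable names are hypothetical. It uses the identity OR = RR × (1 − π2)/(1 − π1), which shows why the two agree when both probabilities are small.

```r
# Assumed inputs (not taken from the text above): annual death probabilities
p_smoker    <- 0.001304   # assumption: current smokers
p_nonsmoker <- 0.000121   # assumption: nonsmokers

risk_difference <- p_smoker - p_nonsmoker
relative_risk   <- p_smoker / p_nonsmoker
odds_ratio      <- (p_smoker / (1 - p_smoker)) / (p_nonsmoker / (1 - p_nonsmoker))

# OR = RR * (1 - p2) / (1 - p1); the factor is close to 1 for rare events
correction <- (1 - p_nonsmoker) / (1 - p_smoker)

# The same relative risk with a common outcome gives a very different odds ratio
p1_common <- 0.6
p2_common <- 0.3
rr_common <- p1_common / p2_common
or_common <- (p1_common / (1 - p1_common)) / (p2_common / (1 - p2_common))

print(c(diff = risk_difference, rr = relative_risk, or = odds_ratio))
print(c(rr_common = rr_common, or_common = or_common))
```

With the rare-event inputs the correction factor is almost exactly 1, whereas for the common-outcome pair the odds ratio is well above the relative risk.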

2. Drivers at an intersection are classified by gender (Female or Male) and seat-belt usage (Yes or No). After one hour's observation, the following table was collected:

         Seat-belt use
Gender   Yes   No
F
M

(a) (4 points) Compute and interpret the odds ratio (odds of not wearing a seat belt) for this example.

$\hat{\theta}$ = (60/ )/(45/65) ≈ 0.79. The odds of not wearing a seat belt among female drivers are about 0.79 times the odds among male drivers. Equivalently, the odds of not wearing a seat belt among male drivers are about 1/0.79 ≈ 1.27 times the odds among female drivers.

(b) (3 points) Which sampling model (Poisson, Binomial, Multinomial, Product Multinomial) seems most appropriate here? Give reasons for your answer.

The design didn't have a fixed sample size. The number of drivers passing the crossing in a one-hour period is a random variable, typically modelled by a Poisson distribution. Under this model the cell counts of the contingency table are independent Poisson random variables.

(c) (2 points) Is one of the variables a response variable? Which one? Explain.

Seat-belt use can depend on gender, but not the other way round. So seat-belt use is the response variable and gender is the explanatory variable.

3. A survey estimated that 20% of all Americans aged 16 to 20 drove under the influence of drugs or alcohol. A similar survey is planned for Canada. They want a 95% confidence interval to have a margin of error of 0.04 (for the Wald confidence interval).

(a) (4 points) Find the necessary sample size if they expect to find results similar to those in the United States.

We want $1.96\sqrt{\dfrac{0.2(1-0.2)}{n}} \le 0.04$, i.e. $n \ge \dfrac{1.96^2 (0.2)(0.8)}{0.04^2} = 384.16$, and so n = 385.

(b) (2 points) Suppose instead they used the conservative formula based on $\hat{p} = 0.5$. What is now the required sample size?

We want $1.96\sqrt{\dfrac{0.5(1-0.5)}{n}} \le 0.04$, i.e. $n \ge \dfrac{1.96^2 (0.5)(0.5)}{0.04^2} = 600.25$, and so n ≈ 600 (n = 601 if rounded up as in part (a)).

4. In this question we will do a simulation study of the confidence intervals for odds ratios for contingency tables based on multinomial sampling.

(a) (8 points) Use R to generate n = 10 contingency tables with total count (i.e. grand total) N = 100 and known cell probabilities $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}) = (0.2, 0.3, 0.3, 0.2)$ from a multinomial distribution, i.e. from multinomial$(N, \pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$. For each of these generated tables, calculate the odds ratio and a 95 percent large-sample confidence interval. What is the true odds ratio θ (i.e. the population odds ratio) for these tables? How many of the 10 intervals you calculated contain θ? Note: in this part please print all your table cell counts (i.e. for the 10 tables), estimated odds ratios (i.e. $\hat{\theta}$) and the confidence intervals.

> #R code Q4 Assign 2
> N <- 100 # the grand total for each table
> n <- 10 # number of tables
> pi11 <- 0.2
> pi12 <- 0.3
> pi21 <- 0.3
> pi22 <- 0.2
> alpha <- 0.05
> #
> table <- rmultinom(n, size = N, prob = c(pi11, pi12, pi21, pi22))
> theta <- (pi11*pi22)/(pi12*pi21)
> table <- t(table)
> a <- table[,1]
> b <- table[,2]
> c <- table[,3]
> d <- table[,4]
> # add 0.5 if any cell count is 0 to avoid division by zero
> a <- (a == 0)*(a+0.5)+(a > 0)*a
> b <- (b == 0)*(b+0.5)+(b > 0)*b
> c <- (c == 0)*(c+0.5)+(c > 0)*c
> d <- (d == 0)*(d+0.5)+(d > 0)*d
> thetahat <- (a*d)/(b*c)
> logthetahat <- log(thetahat)
> SE <- sqrt(1/a+1/b+1/c+1/d)
> LLlog <- logthetahat - qnorm(1-alpha/2)*SE
> ULlog <- logthetahat + qnorm(1-alpha/2)*SE
> LL <- exp(LLlog)
> UL <- exp(ULlog)
> results <- cbind(a, b, c, d, thetahat, LL, UL, theta)
> results
      a b c d thetahat LL UL theta
[1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [9,] [10,]

The true odds ratio is $\theta = \dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = \dfrac{(0.2)(0.2)}{(0.3)(0.3)} = \dfrac{4}{9} \approx 0.444$, which is the value in the theta column.

(b) (7 points) Repeat part (a) but this time with n = 1,000,000. Do not print the tables etc. this time, but instead calculate the proportion of the intervals (i.e. the million intervals) containing θ. Comment on your value. Note: Any table with a zero cell count has odds ratio equal to 0 or ∞. Replace any zero cell counts by 0.5 (this is often done when dealing with zero cell counts).

> #Now repeat the above simulation for a larger number of tables
> # and calculate the coverage probability
> N <- 100 # the grand total for each table
> n <- 1000000 # number of tables
> pi11 <- 0.2
> pi12 <- 0.3
> pi21 <- 0.3
> pi22 <- 0.2
> alpha <- 0.05
> #
> table <- rmultinom(n, size = N, prob = c(pi11, pi12, pi21, pi22))
> theta <- (pi11*pi22)/(pi12*pi21)
> table <- t(table)
> a <- table[,1]
> b <- table[,2]
> c <- table[,3]
> d <- table[,4]
> # add 0.5 if any cell count is 0 to avoid division by zero
> a <- (a == 0)*(a+0.5)+(a > 0)*a
> b <- (b == 0)*(b+0.5)+(b > 0)*b
> c <- (c == 0)*(c+0.5)+(c > 0)*c
> d <- (d == 0)*(d+0.5)+(d > 0)*d
> thetahat <- (a*d)/(b*c)
> logthetahat <- log(thetahat)
> SE <- sqrt(1/a+1/b+1/c+1/d)
> LLlog <- logthetahat - qnorm(1-alpha/2)*SE
> ULlog <- logthetahat + qnorm(1-alpha/2)*SE
> LL <- exp(LLlog)
> UL <- exp(ULlog)
> thetainci <- (LL < theta)*(UL > theta)
> observed_cof_level <- mean(thetainci)
> observed_cof_level
[1]

(c) (3 points) Repeat part (b), but this time with N = 20 (i.e. still a million tables, but each table with grand total 20).

> N <- 20 # the grand total for each table
> n <- 1000000 # number of tables
> pi11 <- 0.2
> pi12 <- 0.3
> pi21 <- 0.3
> pi22 <- 0.2
> alpha <- 0.05
> #
> table <- rmultinom(n, size = N, prob = c(pi11, pi12, pi21, pi22))
> theta <- (pi11*pi22)/(pi12*pi21)
> table <- t(table)
> a <- table[,1]
> b <- table[,2]
> c <- table[,3]
> d <- table[,4]
> # add 0.5 if any cell count is 0 to avoid division by zero
> a <- (a == 0)*(a+0.5)+(a > 0)*a
> b <- (b == 0)*(b+0.5)+(b > 0)*b
> c <- (c == 0)*(c+0.5)+(c > 0)*c
> d <- (d == 0)*(d+0.5)+(d > 0)*d
> thetahat <- (a*d)/(b*c)
> logthetahat <- log(thetahat)
> SE <- sqrt(1/a+1/b+1/c+1/d)
> LLlog <- logthetahat - qnorm(1-alpha/2)*SE
> ULlog <- logthetahat + qnorm(1-alpha/2)*SE
> LL <- exp(LLlog)
> UL <- exp(ULlog)
> thetainci <- (LL < theta)*(UL > theta)
> observed_cof_level <- mean(thetainci)
> observed_cof_level
[1]
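The runs in Question 4 differ only in N and n, so the calculation can be wrapped in a single function. The following is a minimal sketch rather than the assignment's own code: the function name `wald_or_coverage` and its arguments are hypothetical, a smaller number of tables is used to keep the run short, and coverage is checked on the log scale, which is equivalent to comparing exp(LLlog) and exp(ULlog) with θ.

```r
# Hypothetical helper: estimated coverage of the large-sample (Wald) interval
# for the odds ratio under multinomial sampling of 2x2 tables.
wald_or_coverage <- function(N, n_tables,
                             probs = c(0.2, 0.3, 0.3, 0.2), alpha = 0.05) {
  theta <- (probs[1] * probs[4]) / (probs[2] * probs[3])  # true odds ratio

  # One simulated table per row; columns are n11, n12, n21, n22
  counts <- t(rmultinom(n_tables, size = N, prob = probs))
  counts[counts == 0] <- 0.5          # replace zero cells only, as in the text

  log_or <- log(counts[, 1] * counts[, 4] / (counts[, 2] * counts[, 3]))
  se     <- sqrt(rowSums(1 / counts)) # sqrt(1/a + 1/b + 1/c + 1/d)
  z      <- qnorm(1 - alpha / 2)

  # Proportion of intervals that contain the true value
  mean(log_or - z * se < log(theta) & log(theta) < log_or + z * se)
}

set.seed(2016)
coverage_N100 <- wald_or_coverage(N = 100, n_tables = 20000)
coverage_N20  <- wald_or_coverage(N = 20,  n_tables = 20000)
print(c(N100 = coverage_N100, N20 = coverage_N20))
```

Comparing the two proportions with the nominal 0.95 is one way to frame the comment asked for in parts (b) and (c).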

5. Suppose that we would like to know whether there is an association between voter gender and candidate choice in an election, say between candidate A and candidate B. An investigator has decided to conduct an exit poll with 50 voters. He classified the results by gender and the candidate they voted for. The counts are given in the table below:

         Candidate A   Candidate B
Female   10            15
Male     5             20

(a) (2 points) What is the appropriate sampling model for the counts in this table? Give reasons for your answer.

In this design the grand total is fixed (50 voters), and so multinomial sampling is the appropriate sampling model.

(b) (4 points) What are the estimated cell probabilities under the assumption of independence of gender and the preference for the candidate?

> a <- 10
> b <- 15
> c <- 5
> d <- 20
> n <- a+b+c+d
> n
[1] 50
> pi11 <- ((a+b)/n)*((a+c)/n)
> pi12 <- ((a+b)/n)*((b+d)/n)
> pi21 <- ((c+d)/n)*((a+c)/n)
> pi22 <- ((c+d)/n)*((b+d)/n)
> prob <- c(pi11, pi12, pi21, pi22)
> prob
[1] 0.15 0.35 0.15 0.35

(c) (4 points) Using the estimated cell probabilities as the actual values of the probabilities (i.e. $\pi_{ij}$), calculate the probability of observing the counts in the table above.

> pobstable <- dmultinom(c(a, b, c, d), size = n, prob = prob)
> pobstable
[1]

(d) (5 points) Calculate the probability of observing table counts as surprising as or more surprising than the counts in the table above.

> a <- 10
> b <- 15
> c <- 5
> d <- 20
> n <- a+b+c+d
> n
[1] 50
> # enumerate every table with grand total n: one table per column of X
> X <- t(as.matrix(expand.grid(0:n, 0:n, 0:n)))
> X <- X[, colSums(X) <= n]
> X <- rbind(X, n - colSums(X))
> # Let's use the estimated proportions under independence as the probabilities and calculate
> # the probability of observing a table as surprising as or more surprising than the
> # table observed
> pi11 <- ((a+b)/n)*((a+c)/n)
> pi12 <- ((a+b)/n)*((b+d)/n)
> pi21 <- ((c+d)/n)*((a+c)/n)
> pi22 <- ((c+d)/n)*((b+d)/n)
> prob <- c(pi11, pi12, pi21, pi22)
> prob
[1] 0.15 0.35 0.15 0.35
> sum(prob)
[1] 1
> #p <- round(apply(X, 2, function(x) dmultinom(x, size = n, prob = prob)), 3)
> p <- apply(X, 2, function(x) dmultinom(x, size = n, prob = prob))
> pobstable <- dmultinom(c(a, b, c, d), size = n, prob = prob)
> pextreme <- subset(p, p <= pobstable)
> pvalue <- sum(pextreme)
> pobstable
[1]
> pvalue
[1]
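The enumeration in Question 5(d) sums the probabilities of all tables that are no more likely than the observed one. As a cross-check, the same quantity can be approximated by simulation. This is a sketch under assumptions, not part of the original solution: the number of simulated tables (20000), the seed, and the variable names are arbitrary illustrative choices.

```r
# Monte Carlo approximation of P(table at least as surprising as the observed one),
# treating the independence estimates as the true cell probabilities.
set.seed(51)

observed <- c(10, 15, 5, 20)            # n11, n12, n21, n22 from the table
n <- sum(observed)

row_p <- c(observed[1] + observed[2], observed[3] + observed[4]) / n
col_p <- c(observed[1] + observed[3], observed[2] + observed[4]) / n
prob  <- c(row_p[1] * col_p[1], row_p[1] * col_p[2],
           row_p[2] * col_p[1], row_p[2] * col_p[2])

p_observed <- dmultinom(observed, size = n, prob = prob)

# Simulate tables (one per column) and evaluate the probability of each
simulated <- rmultinom(20000, size = n, prob = prob)
p_sim     <- apply(simulated, 2, dmultinom, size = n, prob = prob)

# Share of simulated tables that are no more likely than the observed table
mc_pvalue <- mean(p_sim <= p_observed)
print(mc_pvalue)
```

The simulated proportion should agree with the exact sum up to Monte Carlo error, which shrinks as the number of simulated tables grows.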

6. Consider the 2 × 2 contingency table with the cell probabilities as shown below:

         Y = 1    Y = 2            Total
X = 1    x        a - x            a
X = 2    b - x    1 - a - b + x    1 - a
Total    b        1 - b            1

Let
$$\theta = \frac{P(Y=1 \mid X=1)/P(Y=2 \mid X=1)}{P(Y=1 \mid X=2)/P(Y=2 \mid X=2)}.$$

Note: Do not use any statistical ideas in parts (a) and (b) of this question. Just treat a, b and x as real numbers between 0 and 1 and use only simple algebra (nothing more than high school algebra).

(a) (5 points) Show that if θ = 1, then x = ab.

$$\theta = \frac{\dfrac{x}{a} \Big/ \dfrac{a-x}{a}}{\dfrac{b-x}{1-a} \Big/ \dfrac{1-a-b+x}{1-a}} = \frac{x(1-a-b+x)}{(a-x)(b-x)} = \frac{x - ax - bx + x^2}{ab - ax - bx + x^2}.$$

Hence
$$\theta = 1 \;\Rightarrow\; x - ax - bx + x^2 = ab - ax - bx + x^2 \;\Rightarrow\; x = ab.$$

(b) (5 points) Show that if x = ab, then θ = 1.

If x = ab, then
$$\theta = \frac{\dfrac{ab}{a} \Big/ \dfrac{a-ab}{a}}{\dfrac{b-ab}{1-a} \Big/ \dfrac{1-a-b+ab}{1-a}} = \frac{b/(1-b)}{\dfrac{b(1-a)}{1-a} \Big/ \dfrac{(1-a)(1-b)}{1-a}} = \frac{b/(1-b)}{b/(1-b)} = 1.$$

(c) (2 points) What is the meaning of the above result (from a statistical point of view)?

Note that $x = \pi_{11}$, $a = \pi_{1+}$ and $b = \pi_{+1}$, and so x = ab means $\pi_{11} = \pi_{1+}\pi_{+1}$. Also, when x = ab,

$\pi_{12} = a - x = a - ab = a(1-b) = \pi_{1+}\pi_{+2}$,
$\pi_{21} = b - x = b - ab = (1-a)b = \pi_{2+}\pi_{+1}$,
$\pi_{22} = 1 - a - b + x = 1 - a - b + ab = (1-a)(1-b) = \pi_{2+}\pi_{+2}$.

In other words, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all (i, j). That is, x = ab is the same as saying $P(X = i, Y = j) = P(X = i)P(Y = j)$ for all (i, j), i.e. X and Y are independent. In parts (a) and (b) above we proved that θ = 1 if and only if x = ab. So what has been shown here is that θ = 1 if and only if X and Y are independent.
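The equivalence in Question 6 can also be checked numerically for particular values. This is a small sketch with arbitrary illustrative inputs (a = 0.4, b = 0.3); the function name `theta_from` is hypothetical and it is a spot check, not a substitute for the algebra.

```r
# Odds ratio of the 2x2 table with row margin a, column margin b, and P(X=1, Y=1) = x
theta_from <- function(a, b, x) {
  cells <- c(x, a - x, b - x, 1 - a - b + x)   # pi11, pi12, pi21, pi22
  stopifnot(all(cells > 0))                    # need a valid table with no zero cell
  (cells[1] / cells[2]) / (cells[3] / cells[4])
}

a <- 0.4
b <- 0.3

theta_independent <- theta_from(a, b, a * b)   # x = ab: independence
theta_dependent   <- theta_from(a, b, 0.2)     # x > ab: positive association

print(c(independent = theta_independent, dependent = theta_dependent))
```

With x = ab the function returns 1 up to floating-point rounding, while any other admissible x moves θ away from 1.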

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon

More information

Chapter 2: Describing Contingency Tables - I

Chapter 2: Describing Contingency Tables - I : Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories

More information

Problems Pages 1-4 Answers Page 5 Solutions Pages 6-11

Problems Pages 1-4 Answers Page 5 Solutions Pages 6-11 Part III Practice Problems Problems Pages 1-4 Answers Page 5 Solutions Pages 6-11 1. In estimating population mean or proportion what is the width of an interval? 2. If 25 college students out of 80 graduate

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

Contingency Tables Part One 1

Contingency Tables Part One 1 Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview

More information

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1 MAT 2379, Introduction to Biostatistics Sample Calculator Problems for the Final Exam Note: The exam will also contain some problems

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

Inferences About Two Proportions

Inferences About Two Proportions Inferences About Two Proportions Quantitative Methods II Plan for Today Sampling two populations Confidence intervals for differences of two proportions Testing the difference of proportions Examples 1

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Exercise 1. Exercise 2. Lesson 2 Theoretical Foundations Probabilities Solutions You ip a coin three times.

Exercise 1. Exercise 2. Lesson 2 Theoretical Foundations Probabilities Solutions You ip a coin three times. Lesson 2 Theoretical Foundations Probabilities Solutions monia.ranalli@uniroma3.it Exercise 1 You ip a coin three times. 1. Use a tree diagram to show the possible outcome patterns. How many outcomes are

More information

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 1 Odds ratios for retrospective studies 2 Odds ratios approximating the

More information

Lecture 8: Summary Measures

Lecture 8: Summary Measures Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

STA 291 Lecture 8. Probability. Probability Rules. Joint and Marginal Probability. STA Lecture 8 1

STA 291 Lecture 8. Probability. Probability Rules. Joint and Marginal Probability. STA Lecture 8 1 STA 291 Lecture 8 Probability Probability Rules Joint and Marginal Probability STA 291 - Lecture 8 1 Union and Intersection Let A and B denote two events. The union of two events: A B The intersection

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Discrete Distributions

Discrete Distributions Discrete Distributions STA 281 Fall 2011 1 Introduction Previously we defined a random variable to be an experiment with numerical outcomes. Often different random variables are related in that they have

More information

2 Describing Contingency Tables

2 Describing Contingency Tables 2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests Lecture 9 Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests Univariate categorical data Univariate categorical data are best summarized in a one way frequency table.

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence ST3241 Categorical Data Analysis I Two-way Contingency Tables Odds Ratio and Tests of Independence 1 Inference For Odds Ratio (p. 24) For small to moderate sample size, the distribution of sample odds

More information

Three-Way Contingency Tables

Three-Way Contingency Tables Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

We know from STAT.1030 that the relevant test statistic for equality of proportions is: 2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which

More information

Epidemiology Principle of Biostatistics Chapter 11 - Inference about probability in a single population. John Koval

Epidemiology Principle of Biostatistics Chapter 11 - Inference about probability in a single population. John Koval Epidemiology 9509 Principle of Biostatistics Chapter 11 - Inference about probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Estimation and Confidence Intervals

Estimation and Confidence Intervals Estimation and Confidence Intervals Sections 7.1-7.3 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 17-3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Module 10: Analysis of Categorical Data Statistics (OA3102)

Module 10: Analysis of Categorical Data Statistics (OA3102) Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Goodness of Fit Tests

Goodness of Fit Tests Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of

More information

Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2

Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2 Chapters 4-6: Inference with two samples Read sections 45, 5, 53, 6 COMPARING TWO POPULATION MEANS When presented with two samples that you wish to compare, there are two possibilities: I independent samples

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Tests for Population Proportion(s)

Tests for Population Proportion(s) Tests for Population Proportion(s) Esra Akdeniz April 6th, 2016 Motivation We are interested in estimating the prevalence rate of breast cancer among 50- to 54-year-old women whose mothers have had breast

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d

Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d BIOS 4120: Introduction to Biostatistics Breheny Lab #11 We will explore observational studies in today s lab and review how to make inferences on contingency tables. We will only use 2x2 tables for today

More information

Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success

Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success When the experiment consists of a series of n independent trials, and each trial may end in either success or failure,

More information

Tests for Two Correlated Proportions in a Matched Case- Control Design

Tests for Two Correlated Proportions in a Matched Case- Control Design Chapter 155 Tests for Two Correlated Proportions in a Matched Case- Control Design Introduction A 2-by-M case-control study investigates a risk factor relevant to the development of a disease. A population

More information

Topics on Statistics 3

Topics on Statistics 3 Topics on Statistics 3 Pejman Mahboubi April 24, 2018 1 Contingency Tables Assume we ask a sample of 1127 Americans if they believe in an afterlife world. The table below cross classifies the sample based

More information
