STAC51H3: Categorical Data Analysis, Assignment 2. Due: Thu Feb 9, 2016, in class. All relevant work must be shown for credit.
- Oswin Morgan
Note: In any question, if you are using R, all R code and R output must be included in your answers. You should assume that the reader is not familiar with R output, so explain all your findings, quoting the necessary values from your output.

Please note that academic integrity is fundamental to learning and scholarship. You may discuss questions with other students. However, the work you submit should be your own. If I feel suspicious of any assignment (e.g. if your work doesn't appear to be consistent with what we have discussed in class), I will not mark the assignment. Instead, I will ask you to present your work in my office and your grade will be assigned based on your presentation.

1. (Agresti) In the United States, the estimated annual probability that a woman over the age of 35 dies of lung cancer equals for current smokers and for nonsmokers [M. Pagano and K. Gauvreau, Principles of Biostatistics, Belmont, CA: Duxbury Press (1993), p. 134].

(a) (3 points) Calculate and interpret the difference of proportions and the relative risk.

Denoting smokers by 1 and non-smokers by 2, the difference of proportions is $\hat\pi_1 - \hat\pi_2$: for women over 35 years of age, the probability of dying of lung cancer is greater for smokers than for non-smokers by this amount. The relative risk is $\hat\pi_1/\hat\pi_2$: for women over 35 years of age, the probability of dying of lung cancer for smokers is $\hat\pi_1/\hat\pi_2$ times that of non-smokers.

(b) (3 points) Calculate and interpret the odds ratio. Explain why the relative risk and odds ratio take similar values. Is this always the case or only in some cases? Explain.

The odds ratio is $\hat\theta = \dfrac{\hat\pi_1/(1-\hat\pi_1)}{\hat\pi_2/(1-\hat\pi_2)} \approx 10.8$. For women over 35 years of age, the odds of dying from lung cancer are about 10.8 times higher for smokers than for non-smokers.

The relative risk and the odds ratio are very similar in this example. This is not always the case: since $\hat\theta = \dfrac{\hat\pi_1}{\hat\pi_2} \times \dfrac{1-\hat\pi_2}{1-\hat\pi_1}$, the two are close only when both probabilities are small, as for rare diseases such as this one.

> # part a
> p1 <-
> p2 <-
> diff <- p1 - p2   # risk difference
> diff
[1]
> rr <- p1/p2       # relative risk
> rr
[1]
> # part b
> odds1 <- p1/(1 - p1)
> odds2 <- p2/(1 - p2)
> thetahat <- odds1/odds2
> odds1
[1]
> odds2
[1]
> thetahat
[1]
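To see numerically why the odds ratio and the relative risk agree only when the probabilities are small, here is a minimal R sketch. The helper name `compare_measures` and the input probabilities are hypothetical illustrations, not the Pagano and Gauvreau values. The last entry returned is the factor $(1-\pi_2)/(1-\pi_1)$ that converts the relative risk into the odds ratio.

```r
# Hypothetical helper: difference of proportions, relative risk and odds ratio
# for two groups with outcome probabilities p1 (group 1) and p2 (group 2).
compare_measures <- function(p1, p2) {
  diff <- p1 - p2                          # difference of proportions
  rr <- p1 / p2                            # relative risk
  or <- (p1 / (1 - p1)) / (p2 / (1 - p2))  # odds ratio
  c(diff = diff, rr = rr, or = or,
    ratio_or_rr = or / rr)                 # equals (1 - p2) / (1 - p1)
}

# Illustrative (made-up) probabilities, not data from the assignment:
rare   <- compare_measures(0.002, 0.0002)  # rare outcome: OR is close to RR
common <- compare_measures(0.5, 0.2)       # common outcome: RR = 2.5 but OR = 4
print(rare)
print(common)
```

For the rare outcome the factor $(1-\pi_2)/(1-\pi_1)$ is essentially 1, so the two measures nearly coincide; for the common outcome it is $0.8/0.5 = 1.6$.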
2. Drivers at an intersection are classified by gender (Female or Male) and seat-belt usage (Yes or No). After one hour of observation, a table of counts cross-classifying Gender (F, M) by Seat-belt use (Yes, No) was collected.

(a) (4 points) Compute and interpret the odds ratio (based on the odds of not wearing a seat belt) for this example.

The estimated odds ratio (female odds over male odds of not wearing a seat belt, with the male odds equal to $45/65$) is $\hat\theta \approx 0.79$. The odds of not wearing a seat belt among female drivers are about 0.79 times those among male drivers. Equivalently, the odds of not wearing a seat belt among male drivers are about $1/0.79 \approx 1.27$ times those among female drivers.

(b) (3 points) Which sampling model (Poisson, Binomial, Multinomial, Product Multinomial) seems most appropriate here? Give reasons for your answer.

Poisson sampling. The design did not fix the sample size: the number of drivers passing the intersection in a one-hour period is a random variable, typically modelled by a Poisson distribution. Under Poisson sampling, the cell counts of the contingency table are independent Poisson random variables.

(c) (2 points) Is one of the variables a response variable? Which one? Explain.

Yes. Whether or not a driver wears a seat belt may depend on gender (not the other way around), so seat-belt use is the response variable and gender is the explanatory variable.

3. A survey estimated that 20% of all Americans aged 16 to 20 drove under the influence of drugs or alcohol. A similar survey is planned for Canada. They want a 95% confidence interval to have a margin of error of 0.04 (for the Wald confidence interval).

(a) (4 points) Find the necessary sample size if they expect to find results similar to those in the United States.

We want $1.96\sqrt{\dfrac{0.2(1-0.2)}{n}} = 0.04$, so $n = \dfrac{1.96^2(0.2)(0.8)}{0.04^2} = 384.16$, and hence $n = 385$.

(b) (2 points) Suppose instead they used the conservative formula based on $\hat p = 0.5$. What is now the required sample size?
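Both parts of Question 3 use the same formula, $n = z^2 p(1-p)/m^2$, rounded up to a whole number. In this minimal R sketch the function name `wald_sample_size` is hypothetical, and it uses the exact normal quantile `qnorm(0.975)` (about 1.959964) rather than 1.96. With either value, $p = 0.2$ gives 385, and the conservative choice $p = 0.5$ gives about 600.2, which rounds up to 601.

```r
# Sample size for a Wald interval for a proportion with margin of error m:
# solve z * sqrt(p * (1 - p) / n) = m for n, then round up.
wald_sample_size <- function(p, m, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # about 1.959964 for 95% confidence
  ceiling(z^2 * p * (1 - p) / m^2)
}

wald_sample_size(0.2, 0.04)  # planning value from the US survey: 385
wald_sample_size(0.5, 0.04)  # conservative choice p = 0.5: 601
```

Rounding up (rather than to the nearest integer) guarantees that the margin of error does not exceed 0.04.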
We want $1.96\sqrt{\dfrac{0.5(1-0.5)}{n}} = 0.04$, so $n = \dfrac{1.96^2(0.5)(0.5)}{0.04^2} = 600.25$, and hence $n = 601$ (about 600).

4. In this question we will do a simulation study of the confidence intervals for odds ratios for contingency tables based on multinomial sampling.

(a) (8 points) Use R to generate n = 10 contingency tables, each with total count (i.e. grand total) N = 100, from a multinomial distribution with known cell probabilities $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}) = (0.2, 0.3, 0.3, 0.2)$, i.e. from multinomial$(N; \pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$. For each of these generated tables, calculate the odds ratio and a 95 percent large-sample confidence interval. What is the true odds ratio θ (i.e. the population odds ratio) for these tables? How many of the 10 intervals you calculated contain θ? Note: in this part, please print all your table cell counts (i.e. for the 10 tables), estimated odds ratios (i.e. $\hat\theta$) and the confidence intervals.

> # R code Q4 Assign 2
> N <- 100   # the grand total for each table
> n <- 10    # number of tables
> pi11 <- 0.2
> pi12 <- 0.3
> pi21 <- 0.3
> pi22 <- 0.2
> alpha <- 0.05
> #
> table <- rmultinom(n, size = N, prob = c(pi11, pi12, pi21, pi22))
> theta <- (pi11*pi22)/(pi12*pi21)
> table <- t(table)
> a <- table[,1]
> b <- table[,2]
> c <- table[,3]
> d <- table[,4]
> # replace any zero cell count by 0.5 to avoid division by zero
> a <- (a == 0)*(a + 0.5) + (a > 0)*a
> b <- (b == 0)*(b + 0.5) + (b > 0)*b
> c <- (c == 0)*(c + 0.5) + (c > 0)*c
> d <- (d == 0)*(d + 0.5) + (d > 0)*d
> thetahat <- (a*d)/(b*c)
> logthetahat <- log(thetahat)
> SE <- sqrt(1/a + 1/b + 1/c + 1/d)
> LLlog <- logthetahat - qnorm(1-alpha/2)*SE
> ULlog <- logthetahat + qnorm(1-alpha/2)*SE
> LL <- exp(LLlog)
> UL <- exp(ULlog)
> results <- cbind(a, b, c, d, thetahat, LL, UL, theta)
> results
      a b c d thetahat LL UL theta
 [1,]
 [2,]
 [3,]
 [4,]
 [5,]
 [6,]
 [7,]
 [8,]
 [9,]
[10,]

(b) (7 points) Repeat part (a), but this time with n = 1,000,000. Do not print the tables etc. this time; instead, calculate the proportion of the intervals (i.e. of the million intervals) containing θ. Comment on your value.

Note: Any table with a zero cell count has an odds ratio equal to 0 or ∞. Replace any zero cell counts by 0.5 (this is often done when dealing with zero cell counts).

> # Now repeat the above simulation for a larger number of tables
> # and calculate the coverage probability
> N <- 100       # the grand total for each table
> n <- 1000000   # number of tables
> pi11 <- 0.2
> pi12 <- 0.3
> pi21 <- 0.3
> pi22 <- 0.2
> alpha <- 0.05
> #
> table <- rmultinom(n, size = N, prob = c(pi11, pi12, pi21, pi22))
> theta <- (pi11*pi22)/(pi12*pi21)
> table <- t(table)
> a <- table[,1]
> b <- table[,2]
> c <- table[,3]
> d <- table[,4]
> # replace any zero cell count by 0.5 to avoid division by zero
> a <- (a == 0)*(a + 0.5) + (a > 0)*a
> b <- (b == 0)*(b + 0.5) + (b > 0)*b
> c <- (c == 0)*(c + 0.5) + (c > 0)*c
> d <- (d == 0)*(d + 0.5) + (d > 0)*d
> thetahat <- (a*d)/(b*c)
> logthetahat <- log(thetahat)
> SE <- sqrt(1/a + 1/b + 1/c + 1/d)
> LLlog <- logthetahat - qnorm(1-alpha/2)*SE
> ULlog <- logthetahat + qnorm(1-alpha/2)*SE
> LL <- exp(LLlog)
> UL <- exp(ULlog)
> # indicator that the interval (LL, UL) contains the true odds ratio theta
> thetainci <- (LL < theta)*(UL > theta)
> observed_cof_level <- mean(thetainci)
> observed_cof_level
[1]

(c) (3 points) Repeat part (b), but this time with N = 20 (i.e. still a million tables, but each table with grand total 20).

> N <- 20        # the grand total for each table
> n <- 1000000   # number of tables
> pi11 <- 0.2
> pi12 <- 0.3
> pi21 <- 0.3
> pi22 <- 0.2
> alpha <- 0.05
> #
> table <- rmultinom(n, size = N, prob = c(pi11, pi12, pi21, pi22))
> theta <- (pi11*pi22)/(pi12*pi21)
> table <- t(table)
> a <- table[,1]
> b <- table[,2]
> c <- table[,3]
> d <- table[,4]
> # replace any zero cell count by 0.5 to avoid division by zero
> a <- (a == 0)*(a + 0.5) + (a > 0)*a
> b <- (b == 0)*(b + 0.5) + (b > 0)*b
> c <- (c == 0)*(c + 0.5) + (c > 0)*c
> d <- (d == 0)*(d + 0.5) + (d > 0)*d
> thetahat <- (a*d)/(b*c)
> logthetahat <- log(thetahat)
> SE <- sqrt(1/a + 1/b + 1/c + 1/d)
> LLlog <- logthetahat - qnorm(1-alpha/2)*SE
> ULlog <- logthetahat + qnorm(1-alpha/2)*SE
> LL <- exp(LLlog)
> UL <- exp(ULlog)
> # indicator that the interval (LL, UL) contains the true odds ratio theta
> thetainci <- (LL < theta)*(UL > theta)
> observed_cof_level <- mean(thetainci)
> observed_cof_level
[1]
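The three parts of Question 4 repeat the same steps, so they can be packaged as one vectorised R function, which makes it easy to compare N = 100 with N = 20. This is a minimal sketch: the name `or_ci_coverage` and its arguments are hypothetical, zero counts are replaced by 0.5 through logical indexing instead of the arithmetic trick above, and the example uses 100,000 tables rather than a million only to keep the run short. For these cell probabilities the true odds ratio is $\theta = (0.2 \times 0.2)/(0.3 \times 0.3) = 4/9 \approx 0.444$.

```r
# Hypothetical helper: simulate n_tables 2x2 tables from a multinomial with
# cell probabilities probs = (pi11, pi12, pi21, pi22) and grand total N, and
# return the observed coverage of the Wald interval for the odds ratio.
or_ci_coverage <- function(N, n_tables, probs = c(0.2, 0.3, 0.3, 0.2),
                           alpha = 0.05) {
  theta <- (probs[1] * probs[4]) / (probs[2] * probs[3])  # true odds ratio
  tabs <- t(rmultinom(n_tables, size = N, prob = probs))  # one table per row
  tabs[tabs == 0] <- 0.5                  # replace zero counts by 0.5
  log_or <- log(tabs[, 1] * tabs[, 4] / (tabs[, 2] * tabs[, 3]))
  se <- sqrt(rowSums(1 / tabs))           # sqrt(1/a + 1/b + 1/c + 1/d)
  z <- qnorm(1 - alpha / 2)
  lower <- exp(log_or - z * se)
  upper <- exp(log_or + z * se)
  mean(lower < theta & theta < upper)     # proportion of intervals covering theta
}

set.seed(1)   # arbitrary seed, for reproducibility only
or_ci_coverage(N = 100, n_tables = 1e5)
or_ci_coverage(N = 20,  n_tables = 1e5)
```

Working on the log scale and exponentiating the limits keeps every interval inside $(0, \infty)$, which is why this large-sample interval is built for $\log\hat\theta$ rather than for $\hat\theta$ directly.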
5. Suppose that we would like to know whether there is an association between voter gender and candidate choice in an election between, say, candidate A and candidate B. An investigator has decided to conduct an exit poll with 50 voters. He classified the results by gender and by the candidate they voted for. The counts are given in the table below:

            Candidate A   Candidate B
Female          10            15
Male             5            20

(a) (2 points) What sampling model is appropriate for the counts in this table? Give reasons for your answer.

In this design the grand total (50 voters) is fixed, so multinomial sampling is the appropriate sampling model.

(b) (4 points) What are the estimated cell probabilities under the assumption of independence of gender and candidate preference?

Under independence, $\hat\pi_{ij} = \hat\pi_{i+}\hat\pi_{+j} = (n_{i+}/n)(n_{+j}/n)$:

> a <- 10
> b <- 15
> c <- 5
> d <- 20
> n <- a+b+c+d
> n
[1] 50
> pi11 <- ((a+b)/n)*((a+c)/n)
> pi12 <- ((a+b)/n)*((b+d)/n)
> pi21 <- ((c+d)/n)*((a+c)/n)
> pi22 <- ((c+d)/n)*((b+d)/n)
> prob <- c(pi11, pi12, pi21, pi22)
> prob
[1] 0.15 0.35 0.15 0.35

(c) (4 points) Using the estimated cell probabilities as the actual values of the probabilities (i.e. $\pi_{ij}$), calculate the probability of observing the counts in the table above.

> pobstable <- dmultinom(c(a, b, c, d), size = n, prob = prob)
> pobstable
[1]
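The independence estimates in part (b) are the outer product of the row and column proportions. A compact R sketch of parts (b) and (c) follows; the matrix layout and the names `obs` and `pi_hat` are assumptions for illustration, while the counts are the ones in the table above.

```r
# Observed 2x2 table from the exit poll (rows: Female, Male;
# columns: Candidate A, Candidate B).
obs <- matrix(c(10, 15,
                 5, 20), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Candidate = c("A", "B")))
n <- sum(obs)

# Under independence, pi_ij is estimated by (row total / n) * (column total / n).
pi_hat <- outer(rowSums(obs) / n, colSums(obs) / n)

# Same cell order as the assignment code: pi11, pi12, pi21, pi22.
prob <- as.vector(t(pi_hat))
pobstable <- dmultinom(as.vector(t(obs)), size = n, prob = prob)
prob
pobstable
```

Transposing before `as.vector` reads the matrix row by row, which matches the order $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$ used with `dmultinom` above.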
(d) (5 points) Calculate the probability of observing table counts as surprising as or more surprising than the counts in the table above.

> a <- 10
> b <- 15
> c <- 5
> d <- 20
> n <- a+b+c+d
> n
[1] 50
> # every table (x11, x12, x21, x22) of non-negative counts with total n, one per column
> X <- t(as.matrix(expand.grid(0:n, 0:n, 0:n)))
> X <- X[, colSums(X) <= n]
> X <- rbind(X, n - colSums(X))
> # Let's use the estimated proportions under independence as the probabilities and calculate
> # the probability of observing a table as surprising as or more surprising than the
> # table observed
> pi11 <- ((a+b)/n)*((a+c)/n)
> pi12 <- ((a+b)/n)*((b+d)/n)
> pi21 <- ((c+d)/n)*((a+c)/n)
> pi22 <- ((c+d)/n)*((b+d)/n)
> prob <- c(pi11, pi12, pi21, pi22)
> prob
[1] 0.15 0.35 0.15 0.35
> sum(prob)
[1] 1
> #p <- round(apply(X, 2, function(x) dmultinom(x, size = n, prob = prob)), 3)
> p <- apply(X, 2, function(x) dmultinom(x, size = n, prob = prob))
> pobstable <- dmultinom(c(a, b, c, d), size = n, prob = prob)
> # "as surprising or more surprising" = probability no larger than the observed one;
> # the small relative tolerance keeps tables that tie with it in exact arithmetic
> pextreme <- subset(p, p <= pobstable*(1 + 1e-7))
> pvalue <- sum(pextreme)
> pobstable
[1]
> pvalue
[1]
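As a cross-check of the exact enumeration in part (d), the same probability can be approximated by Monte Carlo: draw many tables from the multinomial with the independence probabilities and record how often a simulated table is at most as probable as the observed one. This is an alternative sketch, not the method the assignment asks for; the seed and the number of draws (20,000) are arbitrary choices.

```r
# Monte Carlo approximation of the exact multinomial p-value in part (d).
set.seed(2016)                      # arbitrary seed, for reproducibility
counts <- c(10, 15, 5, 20)          # observed cells x11, x12, x21, x22
n <- sum(counts)

# Independence probabilities, as in part (b): row proportion times column proportion.
row1 <- (counts[1] + counts[2]) / n
col1 <- (counts[1] + counts[3]) / n
prob <- c(row1 * col1, row1 * (1 - col1),
          (1 - row1) * col1, (1 - row1) * (1 - col1))
pobs <- dmultinom(counts, size = n, prob = prob)

sims <- rmultinom(20000, size = n, prob = prob)           # one simulated table per column
psim <- apply(sims, 2, dmultinom, size = n, prob = prob)  # probability of each simulated table
p_mc <- mean(psim <= pobs * (1 + 1e-7))   # share at most as probable as the observed table
p_mc
```

With 20,000 draws the Monte Carlo standard error is at most about 0.0035, so the estimate should agree with the enumerated value to roughly two decimal places.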
6. Consider the 2 × 2 contingency table with the cell probabilities shown below. Let
$$\theta = \frac{P(Y=1\mid X=1)/P(Y=2\mid X=1)}{P(Y=1\mid X=2)/P(Y=2\mid X=2)}.$$

          Y = 1     Y = 2            Total
X = 1     x         a - x            a
X = 2     b - x     1 - a - b + x    1 - a
Total     b         1 - b            1

Note: Do not use any statistical ideas in parts (a) and (b) of this question. Just treat a, b and x as real numbers between 0 and 1 and use only simple algebra (nothing more than high school algebra).

(a) (5 points) Show that if θ = 1, then x = ab.

$$\theta = \frac{\dfrac{x/a}{(a-x)/a}}{\dfrac{(b-x)/(1-a)}{(1-a-b+x)/(1-a)}} = \frac{x(1-a-b+x)}{(a-x)(b-x)} = \frac{x - ax - bx + x^2}{ab - ax - bx + x^2}.$$

If θ = 1, then $x - ax - bx + x^2 = ab - ax - bx + x^2$, and so $x = ab$.

(b) (5 points) Show that if x = ab, then θ = 1.
If $x = ab$, then
$$\theta = \frac{\dfrac{ab/a}{(a-ab)/a}}{\dfrac{(b-ab)/(1-a)}{(1-a-b+ab)/(1-a)}} = \frac{b/(1-b)}{\dfrac{b(1-a)/(1-a)}{(1-a)(1-b)/(1-a)}} = \frac{b/(1-b)}{b/(1-b)} = 1.$$

(c) (2 points) What is the meaning of the above result (from a statistical point of view)?

Note that $x = \pi_{11}$, $a = \pi_{1+}$ and $b = \pi_{+1}$, so $x = ab$ means $\pi_{11} = \pi_{1+}\pi_{+1}$. Also, when $x = ab$,

$\pi_{12} = a - x = a - ab = a(1-b) = \pi_{1+}\pi_{+2}$,
$\pi_{21} = b - x = b - ab = (1-a)b = \pi_{2+}\pi_{+1}$, and
$\pi_{22} = 1 - a - b + x = 1 - a - b + ab = (1-a)(1-b) = \pi_{2+}\pi_{+2}$.

In other words, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $(i, j)$; that is, $x = ab$ is the same as saying $P(X=i, Y=j) = P(X=i)P(Y=j)$ for all $(i, j)$, i.e. X and Y are independent. In parts (a) and (b) we proved that θ = 1 if and only if x = ab. So what is shown here is that θ = 1 if and only if X and Y are independent.
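The algebra in parts (a) and (b) can be checked numerically. In this small R sketch the function name `theta_q6` and the values a = 0.2, b = 0.3 are hypothetical choices; any a, b and x that keep all four cells in (0, 1) would do.

```r
# Odds ratio of the 2x2 table in Question 6 as a function of x = pi11,
# a = pi1+ and b = pi+1 (cells: x, a - x, b - x, 1 - a - b + x).
theta_q6 <- function(x, a, b) {
  (x * (1 - a - b + x)) / ((a - x) * (b - x))
}

a <- 0.2
b <- 0.3
theta_q6(a * b, a, b)   # x = ab (independence): theta = 1
theta_q6(0.10, a, b)    # x > ab: theta = 3 > 1
theta_q6(0.02, a, b)    # x < ab: theta < 1
```

The pattern is general: subtracting the denominator from the numerator in part (a) leaves exactly $x - ab$, so $\theta - 1 = (x - ab)/[(a-x)(b-x)]$ and, whenever all cells are positive, θ is above or below 1 according as $x$ is above or below $ab$.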
More information3 PROBABILITY TOPICS
Chapter 3 Probability Topics 135 3 PROBABILITY TOPICS Figure 3.1 Meteor showers are rare, but the probability of them occurring can be calculated. (credit: Navicore/flickr) Introduction It is often necessary
More informationLast few slides from last time
Last few slides from last time Example 3: What is the probability that p will fall in a certain range, given p? Flip a coin 50 times. If the coin is fair (p=0.5), what is the probability of getting an
More informationChapter 5 : Probability. Exercise Sheet. SHilal. 1 P a g e
1 P a g e experiment ( observing / measuring ) outcomes = results sample space = set of all outcomes events = subset of outcomes If we collect all outcomes we are forming a sample space If we collect some
More informationMTH135/STA104: Probability
MTH35/STA04: Probability Homework # 3 Due: Tuesday, Sep 0, 005 Prof. Robert Wolpert. from prob 7 p. 9 You roll a fair, six-sided die and I roll a die. You win if the number showing on your die is strictly
More informationExam 1 Solutions. Problem Points Score Total 145
Exam Solutions Read each question carefully and answer all to the best of your ability. Show work to receive as much credit as possible. At the end of the exam, please sign the box below. Problem Points
More informationMath 10 - Compilation of Sample Exam Questions + Answers
Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the
More informationEpidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval
Epidemiology 9509 Wonders of Biostatistics Chapter 13 - Effect Measures John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being covered 1. risk factors 2. risk
More information1 Comparing two binomials
BST 140.652 Review notes 1 Comparing two binomials 1. Let X Binomial(n 1,p 1 ) and ˆp 1 = X/n 1 2. Let Y Binomial(n 2,p 2 ) and ˆp 2 = Y/n 2 3. We also use the following notation: n 11 = X n 12 = n 1 X
More informationChapters 3.2 Discrete distributions
Chapters 3.2 Discrete distributions In this section we study several discrete distributions and their properties. Here are a few, classified by their support S X. There are of course many, many more. For
More informationSolution: First note that the power function of the test is given as follows,
Problem 4.5.8: Assume the life of a tire given by X is distributed N(θ, 5000 ) Past experience indicates that θ = 30000. The manufacturere claims the tires made by a new process have mean θ > 30000. Is
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationChapter 8: Confidence Intervals
Chapter 8: Confidence Intervals Introduction Suppose you are trying to determine the mean rent of a two-bedroom apartment in your town. You might look in the classified section of the newspaper, write
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationThis does not cover everything on the final. Look at the posted practice problems for other topics.
Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry
More informationMSH3 Generalized linear model
Contents MSH3 Generalized linear model 7 Log-Linear Model 231 7.1 Equivalence between GOF measures........... 231 7.2 Sampling distribution................... 234 7.3 Interpreting Log-Linear models..............
More informationToday we ll discuss ways to learn how to think about events that are influenced by chance.
Overview Today we ll discuss ways to learn how to think about events that are influenced by chance. Basic probability: cards, coins and dice Definitions and rules: mutually exclusive events and independent
More information15: CHI SQUARED TESTS
15: CHI SQUARED ESS MULIPLE CHOICE QUESIONS In the following multiple choice questions, please circle the correct answer. 1. Which statistical technique is appropriate when we describe a single population
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationTwo Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests
Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationLecture 23. November 15, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationInferences About Two Population Proportions
Inferences About Two Population Proportions MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2018 Background Recall: for a single population the sampling proportion
More informationProbability deals with modeling of random phenomena (phenomena or experiments whose outcomes may vary)
Chapter 14 From Randomness to Probability How to measure a likelihood of an event? How likely is it to answer correctly one out of two true-false questions on a quiz? Is it more, less, or equally likely
More information