Goodness of Fit Tests: Homogeneity


Mathematics 47: Lecture 35
Dan Sloughter
Furman University
May 11, 2006

Testing for homogeneity

Suppose we have $c$ random samples from discrete distributions, each having the same $r$ possible outcomes. Let $p_{ij}$ be the probability of outcome $i$ for the $j$th distribution, where $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$. Let $p_j = (p_{1j}, p_{2j}, \ldots, p_{rj})$ for $j = 1, 2, \ldots, c$. We want to test

$$H_0: p_1 = p_2 = \cdots = p_c$$
$$H_A: p_j \neq p_k \text{ for some } j \neq k.$$

Let

$n_{ij}$ = number of observations of outcome $i$ in sample $j$,
$n_{i+} = n_{i1} + n_{i2} + \cdots + n_{ic}$ = number of observations of outcome $i$,
$n_{+j} = n_{1j} + n_{2j} + \cdots + n_{rj}$ = size of sample $j$,
$n = n_{1+} + n_{2+} + \cdots + n_{r+} = n_{+1} + n_{+2} + \cdots + n_{+c}$ = total number of observations.

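Testing for homogeneity (cont'd)

We may summarize this information in a contingency table as follows (rows are outcomes, columns are samples).

$$\begin{array}{c|cccc|c}
 & 1 & 2 & \cdots & c & \text{Total} \\ \hline
1 & n_{11} & n_{12} & \cdots & n_{1c} & n_{1+} \\
2 & n_{21} & n_{22} & \cdots & n_{2c} & n_{2+} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
r & n_{r1} & n_{r2} & \cdots & n_{rc} & n_{r+} \\ \hline
\text{Total} & n_{+1} & n_{+2} & \cdots & n_{+c} & n
\end{array}$$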

Testing for homogeneity (cont'd)

Under $H_0$, the maximum likelihood estimator of the probability of outcome $i$ is $\frac{n_{i+}}{n}$. And so the expected number of observations of outcome $i$ in sample $j$ is

$$e_{ij} = n_{+j} \frac{n_{i+}}{n} = \frac{n_{i+} n_{+j}}{n}.$$

We may now evaluate either

$$-2\log(\lambda) = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \log\left(\frac{n_{ij}}{e_{ij}}\right)$$

or

$$Q = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}.$$
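The two statistics above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `homogeneity_statistics` and the small 2 x 2 table are made up for the example, and the code assumes every row and column total is positive. A cell with zero count is taken to contribute nothing to the logarithmic sum.

```python
import math

def homogeneity_statistics(table):
    """Return (expected, g, q) for an r x c table of counts.

    table[i][j] is n_ij, the count of outcome i in sample j.
    g is -2 log(lambda) and q is Pearson's statistic.
    (Hypothetical helper, written only for this illustration.)
    """
    row_totals = [sum(row) for row in table]        # n_{i+}
    col_totals = [sum(col) for col in zip(*table)]  # n_{+j}
    n = sum(row_totals)                             # total observations

    # e_ij = n_{i+} n_{+j} / n, the expected count under H_0
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

    g = 0.0
    q = 0.0
    for obs_row, exp_row in zip(table, expected):
        for n_ij, e_ij in zip(obs_row, exp_row):
            if n_ij > 0:  # a zero count adds nothing to the log term
                g += 2.0 * n_ij * math.log(n_ij / e_ij)
            q += (n_ij - e_ij) ** 2 / e_ij
    return expected, g, q

# Made-up counts: 2 outcomes (rows), 2 samples (columns).
expected, g, q = homogeneity_statistics([[10, 20], [30, 40]])
print(expected, round(g, 4), round(q, 4))
```

Both statistics are zero exactly when every observed count equals its expected count, which is a quick sanity check on any implementation.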

Testing for homogeneity (cont'd)

Note: We initially have $c(r-1)$ degrees of freedom (adding together $r-1$ degrees of freedom for each of the $c$ samples) and have estimated $r-1$ parameters. Hence we have

$$c(r-1) - (r-1) = (r-1)(c-1)$$

degrees of freedom. That is, under $H_0$, both $-2\log(\lambda)$ and $Q$ are approximately $\chi^2((r-1)(c-1))$.
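To turn an observed statistic into a p-value, one needs the upper tail of the chi-square distribution with $(r-1)(c-1)$ degrees of freedom. The sketch below uses only Python's standard library and the closed-form tail available for integer degrees of freedom; the names `degrees_of_freedom` and `chi2_sf` are assumptions introduced for this example, and a statistics package would normally be used instead.

```python
import math

def degrees_of_freedom(r, c):
    """Degrees of freedom for an r x c homogeneity test."""
    return (r - 1) * (c - 1)

def chi2_sf(x, df):
    """P(U >= x) for U ~ chi-square(df), with df a positive integer.

    (Hypothetical helper based on the closed-form tail for integer df.)
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    m = (df - 1) // 2
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

# Example: a table with 6 outcomes and 2 samples has 5 degrees of freedom.
df = degrees_of_freedom(6, 2)
print(df, chi2_sf(11.0705, df))  # 11.0705 is roughly the 5% critical value for 5 df
```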

Example

When Jane Austen died in 1817, she left the novel Sanditon unfinished, but with a summary of the rest. The novel was completed by an admirer and then published.

In 1978, A. Q. Morton published some statistical studies comparing the writings of Austen and the person who completed Sanditon. Morton counted the occurrences of a, an, this, that, with, and without in chapters 1 and 3 of Sense and Sensibility; chapters 1, 2, and 3 of Emma; and chapters 1 and 6 of Sanditon (written by Austen), and also the occurrences of these words in chapters 12 and 24 of Sanditon (not written by Austen).


The results:

Word       Austen   Imitator   Total
a                              517
an                             91
this
that
with
without
Total      1017     196        1213


The expected frequencies are

$$e_{11} = \frac{(1017)(517)}{1213} = 433.46, \quad e_{12} = \frac{(196)(517)}{1213} = 83.54,$$

$$e_{21} = \frac{(1017)(91)}{1213} = 76.30, \quad e_{22} = \frac{(196)(91)}{1213} = 14.70,$$

and so on, giving us the following table of expected frequencies:

Word       Austen   Imitator   Total
a          433.46   83.54      517
an         76.30    14.70      91
this
that
with
without
Total      1017     196        1213

Evaluating our test statistics, we find either $-2\log(\lambda) = 31.75$ or $q = 32.83$. If $U$ is $\chi^2(5)$, we have either

p-value $= P(U \geq 31.75)$

or

p-value $= P(U \geq 32.83)$,

each of which is less than 0.0001. Hence we may conclude that the imitator has not been successful in imitating this aspect of Austen's style.
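The calculation for this example can be reproduced with a short script. The individual cell counts are an assumption here: they are the figures commonly reproduced for Morton's study, not values taken from this text, and they were chosen because they agree with the totals 1017, 196, 517, 91, and 1213 used above. The script works with unrounded expected counts, so small differences in the second decimal place from the statistics quoted above would be consistent with rounding the expected counts before summing.

```python
import math

# ASSUMPTION: cell counts as commonly reproduced for Morton's study.
# They are not given in this text; they match the quoted row and column totals.
words = ["a", "an", "this", "that", "with", "without"]
austen = [434, 62, 86, 236, 161, 38]
imitator = [83, 29, 15, 22, 43, 4]

table = [list(pair) for pair in zip(austen, imitator)]  # rows: words
row_totals = [sum(row) for row in table]                # n_{i+}
col_totals = [sum(col) for col in zip(*table)]          # n_{+j}
n = sum(row_totals)

# e_ij = n_{i+} n_{+j} / n
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

cells = [(o, e) for orow, erow in zip(table, expected) for o, e in zip(orow, erow)]
g = 2.0 * sum(o * math.log(o / e) for o, e in cells)  # -2 log(lambda)
q = sum((o - e) ** 2 / e for o, e in cells)           # Pearson's statistic

def chi2_sf_5(x):
    """P(U >= x) for U ~ chi-square(5), using the closed form for 5 df."""
    h = x / 2.0
    return math.erfc(math.sqrt(h)) + math.exp(-h) * (
        h ** 0.5 / math.gamma(1.5) + h ** 1.5 / math.gamma(2.5)
    )

for word, erow in zip(words, expected):
    print(f"{word:8s} {erow[0]:8.2f} {erow[1]:8.2f}")
print(f"-2 log(lambda) = {g:.2f}, p-value = {chi2_sf_5(g):.2e}")
print(f"q = {q:.2f}, p-value = {chi2_sf_5(q):.2e}")
```

The rows for an and that contribute most of the total, which suggests where the imitator's usage departs from Austen's under these assumed counts.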

Example (Doll and Hill Cancer Study)

In a study of patients in London hospitals in 1948 and 1949, Doll and Hill categorized each of 709 lung cancer patients and 709 control patients (that is, patients who did not have lung cancer) as either a smoker or a non-smoker.

Results of the study:

              Cancer   Control   Total
Non-smoker
Smoker
Total         709      709       1418

The data raise the following question: Are the 38 additional non-smokers in the control group due to randomness, or to a higher rate of smoking among people with lung cancer than among those without lung cancer?

The expected frequencies are:

              Cancer   Control   Total
Non-smoker
Smoker
Total         709      709       1418

Because the two samples have the same size, the expected count in each row is the same for both groups: $e_{i1} = e_{i2} = n_{i+}/2$. Evaluating $-2\log(\lambda)$ and $q$ and referring each to the $\chi^2(1)$ distribution, since $(r-1)(c-1) = 1$, gives the two p-values. Hence we have very strong evidence for rejecting the hypothesis that the rate of smoking among the two groups is the same.
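A sketch of this calculation follows. The group sizes of 709 come from the text, but the non-smoker counts (21 among the cancer patients, 59 among the controls) are an assumption: they are the figures usually quoted for Doll and Hill's report and agree with the 38 additional non-smokers among controls mentioned above, but they are not given in this text.

```python
import math

# ASSUMPTION: non-smoker counts as usually quoted for Doll and Hill's study.
# Only the group sizes (709 each) and the difference of 38 appear in the text.
non_smokers = [21, 59]    # cancer, control
group_sizes = [709, 709]  # n_{+1}, n_{+2}
smokers = [size - k for size, k in zip(group_sizes, non_smokers)]

table = [non_smokers, smokers]            # rows: non-smoker, smoker
row_totals = [sum(row) for row in table]  # n_{1+}, n_{2+}
n = sum(group_sizes)

# e_ij = n_{i+} n_{+j} / n; equal group sizes give equal columns
expected = [[ri * cj / n for cj in group_sizes] for ri in row_totals]

cells = [(o, e) for orow, erow in zip(table, expected) for o, e in zip(orow, erow)]
g = 2.0 * sum(o * math.log(o / e) for o, e in cells)  # -2 log(lambda)
q = sum((o - e) ** 2 / e for o, e in cells)           # Pearson's statistic

def chi2_sf_1(x):
    """P(U >= x) for U ~ chi-square(1): U is the square of a standard normal."""
    return math.erfc(math.sqrt(x / 2.0))

p_g = chi2_sf_1(g)
p_q = chi2_sf_1(q)
print(expected)
print(f"-2 log(lambda) = {g:.3f}, p-value = {p_g:.2e}")
print(f"q = {q:.3f}, p-value = {p_q:.2e}")
```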

Note: We could also perform this test as a two-sample test for the equality of the probability of success in two independent Bernoulli populations. That is, let $p_X$ be the proportion of non-smokers in the cancer population and let $p_Y$ be the proportion of non-smokers in the control population. We want to test

$$H_0: p_X = p_Y$$
$$H_A: p_X \neq p_Y.$$

Now, in the notation of the contingency table, with non-smokers in row 1,

$$\hat{p}_X = \frac{n_{11}}{709}, \quad \hat{p}_Y = \frac{n_{12}}{709}, \quad \hat{p} = \frac{n_{1+}}{1418}.$$

Hence

$$z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{709} + \frac{1}{709}\right)}}.$$

This yields the same p-value as for $q$ above. Indeed, $z^2 = q$.
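The identity $z^2 = q$ is not special to these data. The following derivation, a sketch written in the notation of the contingency table with general sample sizes $n_{+1}$ and $n_{+2}$ (so $\hat{p}_X = n_{11}/n_{+1}$, $\hat{p}_Y = n_{12}/n_{+2}$, $\hat{p} = n_{1+}/n$), shows why it holds for any $2 \times 2$ table. The shorthand $\hat{p}_1 = \hat{p}_X$, $\hat{p}_2 = \hat{p}_Y$ is introduced here only for the summation.

```latex
% Deviations in column j: the two rows differ only in sign.
n_{1j} - e_{1j} = n_{+j}(\hat{p}_j - \hat{p}), \qquad
n_{2j} - e_{2j} = -\,n_{+j}(\hat{p}_j - \hat{p}),
\qquad e_{1j} = n_{+j}\hat{p}, \quad e_{2j} = n_{+j}(1 - \hat{p}).

% Sum the two rows within each column.
q = \sum_{j=1}^{2} n_{+j}^2 (\hat{p}_j - \hat{p})^2
    \left( \frac{1}{n_{+j}\hat{p}} + \frac{1}{n_{+j}(1 - \hat{p})} \right)
  = \sum_{j=1}^{2} \frac{n_{+j} (\hat{p}_j - \hat{p})^2}{\hat{p}(1 - \hat{p})}.

% Since \hat{p} = (n_{+1}\hat{p}_X + n_{+2}\hat{p}_Y)/n:
\hat{p}_X - \hat{p} = \frac{n_{+2}}{n}(\hat{p}_X - \hat{p}_Y), \qquad
\hat{p}_Y - \hat{p} = -\frac{n_{+1}}{n}(\hat{p}_X - \hat{p}_Y).

% Substitute and use n_{+1} + n_{+2} = n.
q = \frac{(\hat{p}_X - \hat{p}_Y)^2}{\hat{p}(1 - \hat{p})}
    \cdot \frac{n_{+1} n_{+2}^2 + n_{+2} n_{+1}^2}{n^2}
  = \frac{(\hat{p}_X - \hat{p}_Y)^2}{\hat{p}(1 - \hat{p})}
    \cdot \frac{n_{+1} n_{+2}}{n}
  = \frac{(\hat{p}_X - \hat{p}_Y)^2}
         {\hat{p}(1 - \hat{p}) \left( \frac{1}{n_{+1}} + \frac{1}{n_{+2}} \right)}
  = z^2.
```

Since the square of a standard normal random variable is $\chi^2(1)$, the two-sided p-value from $z$ and the p-value from $q$ agree.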


ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Testing a Claim about the Difference in 2 Population Means Independent Samples. (there is no difference in Population Means µ 1 µ 2 = 0) against

Testing a Claim about the Difference in 2 Population Means Independent Samples. (there is no difference in Population Means µ 1 µ 2 = 0) against Section 9 2A Lecture Testing a Claim about the Difference i Population Means Independent Samples Test H 0 : µ 1 = µ 2 (there is no difference in Population Means µ 1 µ 2 = 0) against H 1 : µ 1 > µ 2 or

More information

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

More information

2 and F Distributions. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006

2 and F Distributions. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006 and F Distributions Lecture 9 Distribution The distribution is used to: construct confidence intervals for a variance compare a set of actual frequencies with expected frequencies test for association

More information

Analysis of data in square contingency tables

Analysis of data in square contingency tables Analysis of data in square contingency tables Iva Pecáková Let s suppose two dependent samples: the response of the nth subject in the second sample relates to the response of the nth subject in the first

More information

Introduction to logistic regression

Introduction to logistic regression Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn

More information

Review. More Review. Things to know about Probability: Let Ω be the sample space for a probability measure P.

Review. More Review. Things to know about Probability: Let Ω be the sample space for a probability measure P. 1 2 Review Data for assessing the sensitivity and specificity of a test are usually of the form disease category test result diseased (+) nondiseased ( ) + A B C D Sensitivity: is the proportion of diseased

More information

2.6.3 Generalized likelihood ratio tests

2.6.3 Generalized likelihood ratio tests 26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used

More information

POLI 443 Applied Political Research

POLI 443 Applied Political Research POLI 443 Applied Political Research Session 6: Tests of Hypotheses Contingency Analysis Lecturer: Prof. A. Essuman-Johnson, Dept. of Political Science Contact Information: aessuman-johnson@ug.edu.gh College

More information

Hypothesis Testing: Chi-Square Test 1

Hypothesis Testing: Chi-Square Test 1 Hypothesis Testing: Chi-Square Test 1 November 9, 2017 1 HMS, 2017, v1.0 Chapter References Diez: Chapter 6.3 Navidi, Chapter 6.10 Chapter References 2 Chi-square Distributions Let X 1, X 2,... X n be

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Probability. Chapter 1 Probability. A Simple Example. Sample Space and Probability. Sample Space and Event. Sample Space (Two Dice) Probability

Probability. Chapter 1 Probability. A Simple Example. Sample Space and Probability. Sample Space and Event. Sample Space (Two Dice) Probability Probability Chapter 1 Probability 1.1 asic Concepts researcher claims that 10% of a large population have disease H. random sample of 100 people is taken from this population and examined. If 20 people

More information

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Conditional Probability (cont'd)

Conditional Probability (cont'd) Conditional Probability (cont'd) April 26, 2006 Conditional Probability (cont'd) Midterm Problems In a ten-question true-false exam, nd the probability that a student get a grade of 70 percent or better

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti Lecture 2: Categorical Variable A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti 1 Categorical Variable Categorical variable is qualitative

More information

Two-sample Categorical data: Testing

Two-sample Categorical data: Testing Two-sample Categorical data: Testing Patrick Breheny April 1 Patrick Breheny Introduction to Biostatistics (171:161) 1/28 Separate vs. paired samples Despite the fact that paired samples usually offer

More information

3.1 Events, Sample Spaces, and Probability

3.1 Events, Sample Spaces, and Probability Chapter 3 Probability Probability is the tool that allows the statistician to use sample information to make inferences about or to describe the population from which the sample was drawn. 3.1 Events,

More information

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary Patrick Breheny October 13 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction Introduction What s wrong with z-tests? So far we ve (thoroughly!) discussed how to carry out hypothesis

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Question. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?

Question. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not? Hypothesis testing Question Very frequently: what is the possible value of μ? Sample: we know only the average! μ average. Random deviation or not? Standard error: the measure of the random deviation.

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

Hypothesis testing for µ:

Hypothesis testing for µ: University of California, Los Angeles Department of Statistics Statistics 10 Elements of a hypothesis test: Hypothesis testing Instructor: Nicolas Christou 1. Null hypothesis, H 0 (always =). 2. Alternative

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1) Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ

More information

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University Introduction Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 56 Course logistics Let Y be a discrete

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

hypothesis testing 1

hypothesis testing 1 hypothesis testing 1 Does smoking cause cancer? competing hypotheses (a) No; we don t know what causes cancer, but smokers are no more likely to get it than nonsmokers (b) Yes; a much greater % of smokers

More information

Categorical Variables and Contingency Tables: Description and Inference

Categorical Variables and Contingency Tables: Description and Inference Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information