Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

MATH-252-01: Probability and Statistics II, Spring 2019

Contents

1 Chi-Squared Tests with Known Probabilities
  1.1 Chi-Squared Testing
    1.1.1 Example: Twins
  1.2 Chi-Squared Test for a Specified Distribution
2 Chi-Squared Tests with Estimated Parameters
  2.1 Chi-Squared for a Parametrized Distribution
3 Two-Way Contingency Tables
  3.1 Perspective 1: Test for homogeneity
  3.2 Perspective 2: Test for independence
    3.2.1 Example

Copyright 2019, John T. Whelan, and all that

Tuesday 16 April 2019

1 Chi-Squared Tests with Known Probabilities

1.1 Chi-Squared Testing

The next sort of inference we consider is known as categorical data analysis. Consider an experiment with a set of $k$ possible discrete outcomes. If we do $n$ independent repetitions of that experiment, we find $n_1$ that end in outcome #1, $n_2$ that end in outcome #2, up to $n_k$ ending in outcome #$k$, where each $n_i$ is a non-negative integer, and $n_1 + n_2 + \cdots + n_k = n$. There are $k$ different categories into which each observation can fall, and we count up the numbers of each. We have a model which tells us the expected probabilities $p_1, p_2, \ldots, p_k$ of the different outcomes, where $0 \le p_i \le 1$ and $p_1 + p_2 + \cdots + p_k = 1$. The expected numbers of observations in each category according to the model are $np_1, np_2, \ldots, np_k$. (Note that these will in general not be integers.) If we consider this model to be the null hypothesis, we're interested in a test which rejects the model if the observed number counts in the categories are very different from their expected values.

To see the test statistic, we take a slight diversion and recall that if $X_1, \ldots, X_k$ are independent normal random variables such that $X_i \sim N(\mu_i, \sigma_i^2)$, then

$$U = \sum_{i=1}^{k} \frac{(X_i - \mu_i)^2}{\sigma_i^2} = \sum_{i=1}^{k} Z_i^2 \tag{1.1}$$

is a chi-square random variable with $k$ degrees of freedom, also known as $\chi^2(k)$. Similarly, one of the consequences of Student's theorem is that if $\{X_i\}$ is a sample of size $k$ from $N(\mu, \sigma^2)$, then

$$V = \sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(k-1) \tag{1.2}$$

As we saw when we considered confidence intervals for the population variance, if we have a statistic $W \sim \chi^2(\nu)$, then

$$P(W \ge \chi^2_{\alpha,\nu}) = \alpha \tag{1.3}$$

so comparing the statistic value to $\chi^2_{\alpha,\nu}$ gives a test at significance level $\alpha$ of the original model.

To connect this to the categorical data analysis problem, consider the $k = 2$ case, where there are two possible outcomes. Then $N_1 \sim \mathrm{Bin}(n, p_1)$ and $N_2 = n - N_1$. If $np_1$ and $np_2 = n(1 - p_1)$ are both more than about 5, we can treat the binomial random variable $N_1$ as approximately $N(np_1, np_1p_2)$, so

$$W = \frac{(N_1 - np_1)^2}{np_1p_2} \tag{1.4}$$

is approximately $\chi^2(1)$-distributed. If we use a little algebra, specifically $\frac{1}{p_1p_2} = \frac{1}{p_1} + \frac{1}{p_2}$ and $N_1 - np_1 = -(N_2 - np_2)$, we can show

$$W = \frac{(N_1 - np_1)^2}{np_1} + \frac{(N_2 - np_2)^2}{np_2} \tag{1.5}$$

which is a form of the chi-squared statistic (with $k - 1 = 1$ degree of freedom) which treats outcomes 1 and 2 on equal footing.

Now consider the case where $k > 2$. The random variables $N_1, N_2, \ldots, N_k$ are not independent, but obey what's called a multinomial distribution, whose pmf (for every possible combination of non-negative integers with $n_1 + n_2 + \cdots + n_k = n$) is

$$p(n_1, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \tag{1.6}$$

In fact, the last random variable $N_k = n - N_1 - N_2 - \cdots - N_{k-1}$ is redundant; all the information is carried in the first $k - 1$ variables. It turns out that, as long as all the expected number counts $\{np_i\}$ (including $np_k$) are at least 5, the statistic

$$W = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} \tag{1.7}$$

written in analogy with (1.5) is approximately a chi-squared random variable with $k - 1$ degrees of freedom.
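
As a quick numerical check of the algebra relating (1.4) and (1.5), here is a minimal Python sketch. The function names and the trial numbers ($n = 40$, $p_1 = 0.3$) are made up purely for illustration; they are not part of the notes or of Devore.

```python
def pearson_statistic(counts, probs):
    """Statistic of eq. (1.7): sum over categories of (N_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def binomial_form(n1, n, p1):
    """Two-category statistic in the form of eq. (1.4): (N_1 - n p_1)^2 / (n p_1 p_2)."""
    p2 = 1.0 - p1
    return (n1 - n * p1) ** 2 / (n * p1 * p2)


# Hypothetical numbers chosen only for illustration: n = 40 trials, p_1 = 0.3.
n, p1 = 40, 0.3
for n1 in (5, 12, 20):
    w_14 = binomial_form(n1, n, p1)                          # form (1.4)
    w_15 = pearson_statistic([n1, n - n1], [p1, 1.0 - p1])   # form (1.5)
    print(n1, round(w_14, 6), round(w_15, 6))
```

The two columns printed for each $n_1$ should agree up to floating-point rounding, which is all that the identity $\frac{1}{p_1p_2} = \frac{1}{p_1} + \frac{1}{p_2}$ claims.
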
So the statistic value (to be compared to $\chi^2_{\alpha,k-1}$) for a specific set of number counts is

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i} = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \tag{1.8}$$

1.1.1 Example: Twins

Consider 50 sets of twins. If they are all fraternal, and boys and girls are equally likely, we'd expect on average 12.5 boy-boy pairs, 25 mixed pairs, and 12.5 girl-girl pairs.

        boy-boy   mixed   girl-girl
obs        8       23        19
exp      12.5      25       12.5
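
The twin counts in the table above can be plugged into (1.8) directly. The following is a minimal sketch in plain Python; the variable names are my own, and the P-value line relies on the fact that a chi-squared variable with exactly 2 degrees of freedom has upper-tail probability $e^{-x/2}$, so it would not carry over to other numbers of categories.

```python
import math

observed = [8, 23, 19]          # boy-boy, mixed, girl-girl pairs
probs = [0.25, 0.50, 0.25]      # model: fraternal twins, boys and girls equally likely

n = sum(observed)
expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 2 degrees of freedom only, the chi-squared upper tail is exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(expected)
print(round(chi2, 2))
print(round(p_value, 3))
```

The P-value comes out between 5% and 10%, which is the same information as comparing the statistic to the tabulated 10% and 5% percentiles of $\chi^2(2)$.
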

We can calculate

$$\chi^2 = \frac{(8 - 12.5)^2}{12.5} + \frac{(23 - 25)^2}{25} + \frac{(19 - 12.5)^2}{12.5} = 5.16 \tag{1.9}$$

and note that $\chi^2_{.10,2} = 4.605$ while $\chi^2_{.05,2} = 5.992$, so this hypothesis would be rejected at the 10% level but not the 5% level.

1.2 Chi-Squared Test for a Specified Distribution

We can also apply the chi-squared test to a histogram of data hypothesized to be a sample drawn from a specified distribution. Then the number counts are the number of observations which fall into each bin. For example, suppose we want to test the hypothesis that a set of 80 observations is a sample from an exponential distribution with rate parameter $\lambda = \ln 2$. The cdf is

$$F(x) = 1 - e^{-x \ln 2} = 1 - \frac{1}{2^x} \tag{1.10}$$

so the expected number of observations lying between $x_1$ and $x_2$ is $n\left(\frac{1}{2^{x_1}} - \frac{1}{2^{x_2}}\right)$. Suppose we divide into four bins, and obtain the following observed and expected histogram.

        x < 1   1 <= x < 2   2 <= x < 3   3 <= x
obs      40        21           14           5
exp      40        20           10          10

The chi-squared statistic will be

$$\chi^2 = \frac{(40 - 40)^2}{40} + \frac{(21 - 20)^2}{20} + \frac{(14 - 10)^2}{10} + \frac{(5 - 10)^2}{10} = \frac{0}{40} + \frac{1}{20} + \frac{16}{10} + \frac{25}{10} = \frac{1 + 32 + 50}{20} = \frac{83}{20} = 4.15 \tag{1.11}$$

We can compare this to the percentiles of the $\chi^2(3)$ distribution. Since $\chi^2_{.10,3} = 6.251$, we see that the $P$-value is greater than 10%, and we would not reject this model at any reasonable significance level. Note that we could choose different bins than the ones we did; in practice it's good to choose the bins to have similar numbers of expected observations.

Practice Problems 14.1, 14.3

Thursday 18 April 2019

2 Chi-Squared Tests with Estimated Parameters

In both of the examples from last time, we assumed that the expected numbers for each category were known, but that required an assumption in each case. Considering the problem of twin distribution, we assumed that half of all children were male, leading to expected numbers of

   n/4     n/2     n/4

If the fraction of male children is $\theta$, this becomes

   $n\theta^2$     $2n\theta(1-\theta)$     $n(1-\theta)^2$

We've replaced the fraction expected in category $i$, which was previously a fixed number $p_i$, with a function $\pi_i(\theta)$.
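
To make the idea of category probabilities that depend on a parameter concrete, here is a small Python sketch of the twin probabilities as functions of $\theta$. The function names are hypothetical, and the $\theta$ values in the loop are arbitrary illustrations rather than estimates from any data.

```python
def twin_probabilities(theta):
    """Category probabilities pi_i(theta) for boy-boy, mixed and girl-girl pairs."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def expected_twin_counts(n, theta):
    """Expected numbers n * pi_i(theta) in each of the three categories."""
    return [n * p for p in twin_probabilities(theta)]


# theta = 1/2 recovers the fixed-probability model n/4, n/2, n/4.
print(expected_twin_counts(50, 0.5))

# Whatever theta is, the three probabilities still add up to 1.
for theta in (0.3, 0.5, 0.7):   # arbitrary illustrative values
    print(theta, sum(twin_probabilities(theta)))
```
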
And in principle, we can imagine that there would be not just one parameter $\theta$, but a set of parameters $\boldsymbol{\theta} \equiv \{\theta_1, \theta_2, \ldots, \theta_m\}$, where $m < k - 1$. The chi-square statistic would then be

$$\chi^2(\boldsymbol{\theta}) = \sum_{i=1}^{k} \frac{[n_i - n\pi_i(\boldsymbol{\theta})]^2}{n\pi_i(\boldsymbol{\theta})} \tag{2.1}$$

To actually evaluate this for a test, we need to put in an estimate $\hat{\boldsymbol{\theta}}$ for the parameters. One choice would be to find the one which minimizes $\chi^2(\boldsymbol{\theta})$ for the observed $\{n_i\}$, but instead, we will use the maximum likelihood estimate, i.e., the one which maximizes

$$f(n_1, \ldots, n_k; \boldsymbol{\theta}) = \frac{n!}{n_1! \cdots n_k!}\, [\pi_1(\boldsymbol{\theta})]^{n_1} [\pi_2(\boldsymbol{\theta})]^{n_2} \cdots [\pi_k(\boldsymbol{\theta})]^{n_k} \tag{2.2}$$

or equivalently

$$\ln f(n_1, \ldots, n_k; \boldsymbol{\theta}) = \sum_{i=1}^{k} n_i \ln \pi_i(\boldsymbol{\theta}) + \ln \frac{n!}{n_1! \cdots n_k!} \tag{2.3}$$

where the $n_1, \ldots, n_k$ we use are the actual observed values. For example, in the twin example, the log-likelihood is

$$\ln f(n_1, n_2, n_3; \theta) = 2n_1 \ln\theta + n_2[\ln\theta + \ln(1-\theta)] + 2n_3 \ln(1-\theta) + \text{const} = (2n_1 + n_2)\ln\theta + (n_2 + 2n_3)\ln(1-\theta) + \text{const} \tag{2.4}$$

where the constant depends on $n_1$, $n_2$ and $n_3$, but not $\theta$. Differentiating with respect to $\theta$ gives the maximum likelihood equation

$$\frac{2n_1 + n_2}{\hat{\theta}} - \frac{n_2 + 2n_3}{1 - \hat{\theta}} = 0 \tag{2.5}$$

which has the solution

$$\hat{\theta} = \frac{2n_1 + n_2}{2n_1 + 2n_2 + 2n_3} = \frac{16 + 23}{100} = .39 \tag{2.6}$$

We can plug that into the formulas for the expected numbers of twins in each category (for $n = 50$) and get

                          boy-boy         mixed                   girl-girl
obs                          8             23                        19
exp                       $n\theta^2$   $2n\theta(1-\theta)$   $n(1-\theta)^2$
exp for $\theta = .39$     7.605         23.790                    18.605

The chi-squared is then

$$\chi^2 = \frac{(8 - 7.605)^2}{7.605} + \frac{(23 - 23.790)^2}{23.790} + \frac{(19 - 18.605)^2}{18.605} = .0551 \tag{2.7}$$

However, since we've used the data to fit one parameter, the percentiles we should compare it to are not $\chi^2(2)$ but $\chi^2(2 - 1) = \chi^2(1)$. In general, if we have $k$ categories and $m$ functionally independent parameters (meaning no one parameter can be completely determined from the other $m - 1$), the resulting statistic will be approximately $\chi^2(k - 1 - m)$.

2.1 Chi-Squared for a Parametrized Distribution

We can also return to the application where we apply the chi-squared test to a binned histogram to check a hypothesized distribution. Last time we gave an example where we observed the following number counts for a data sample of size 80:

   x < 1   1 <= x < 2   2 <= x < 3   3 <= x
    40        21           14           5

Last time we hypothesized an exponential distribution with $\lambda = \ln 2$, but suppose we allowed $\lambda$ to be an unknown parameter. Then the cdf of the distribution would be

$$F(x) = 1 - e^{-\lambda x} \tag{2.8}$$

and the expected number counts in the bins would be

   x < 1                 1 <= x < 2                       2 <= x < 3                        3 <= x
   $n(1 - e^{-\lambda})$   $n(e^{-\lambda} - e^{-2\lambda})$   $n(e^{-2\lambda} - e^{-3\lambda})$   $n e^{-3\lambda}$

If we wanted to follow the maximum likelihood prescription from the previous section, we would need to find the $\lambda$ value which maximizes

$$(1 - e^{-\lambda})^{n_1} (e^{-\lambda} - e^{-2\lambda})^{n_2} (e^{-2\lambda} - e^{-3\lambda})^{n_3} (e^{-3\lambda})^{n_4} \tag{2.9}$$

This seems like a pretty complicated function, although it turns out that it is maximized by setting $\lambda$ to

$$\hat{\lambda} = \ln \frac{n_1 + 2n_2 + 3n_3 + 3n_4}{n_2 + 2n_3 + 3n_4} \tag{2.10}$$

and, since we have $k = 4$ bins and $m = 1$ fitted parameter, the resulting statistic is approximately $\chi^2(4 - 1 - 1) = \chi^2(2)$-distributed. But in general it may be difficult and/or require a numerical method to find the parameters $\hat{\boldsymbol{\theta}}$ which maximize

$$\sum_{i=1}^{k} n_i \ln[F(b_i; \boldsymbol{\theta}) - F(a_i; \boldsymbol{\theta})] \tag{2.11}$$

where the $i$th histogram bin covers $a_i < x \le b_i$, which is what we'd need to get a statistic which was $\chi^2$ distributed with $k - 1 - m$ degrees of freedom. Instead, it is often convenient to use some other point estimate constructed not just from the binned histogram, but from the full sample of data values. For instance, for the exponential model we could use $\hat{\lambda} = 1/\bar{x}$. If we do this, though, the $\chi^2$ value will generally not be as low as if we had really minimized over $\boldsymbol{\theta}$. This means that the threshold $c_\alpha$ for a test with false alarm probability $\alpha$ will be between the value $\chi^2_{\alpha,k-1}$ we'd use if we had fixed the parameters arbitrarily and the (lower) value $\chi^2_{\alpha,k-1-m}$ we'd use if we had minimized the $\chi^2$ over the $m$ parameters, i.e.,

$$\chi^2_{\alpha,k-1-m} \le c_\alpha \le \chi^2_{\alpha,k-1} \tag{2.12}$$

Practice Problems 14.15, 14.17

Tuesday 23 April 2019

3 Two-Way Contingency Tables

We now consider another categorical data analysis problem, in which we have two sets of categories, and each observation falls into one category from the first set and one from the second set. We'd like to know if the data indicate any statistical relationship between which of the first set of categories an observation falls into and which of the second.
For instance, suppose we survey students majoring in four disciplines about their food choices:

                Vegan   Vegetarian   Non-Veg   Total
Math & Stat        9        23          49       81
Physics            6        15          30       51
Chemistry         12        28          62      102
Biology           25        50          98      173
Total             52       116         239      407

We'd like to know if the data indicate any significant tendencies for students in one major to have one diet or another. There are two ways to pose the question:

1. Is there any difference in the tendencies of students in one major or another to have a vegan, vegetarian, or non-vegetarian diet? (homogeneity)
2. Is there any correlation between the major chosen by a student and their dietary choices? (independence)

The two questions have different interpretations in the framework of classical statistics, but they turn out to lead to identical data analysis procedures.

As a matter of notation, we'll consider the table to have $I$ rows and $J$ columns, labelled by $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$, respectively. (In the table above, $I$ is 4 and $J$ is 3.) We'll call the observed number in each cell $n(i, j)$. (Devore calls this $n_{ij}$.) We'll write the total of the numbers in row $i$ as $n(i, \bullet) = \sum_{j=1}^{J} n(i, j)$ (Devore: $n_{i\bullet}$) and in column $j$ as $n(\bullet, j) = \sum_{i=1}^{I} n(i, j)$ (Devore: $n_{\bullet j}$). The total number of observations is

$$n = \sum_{i=1}^{I} n(i, \bullet) = \sum_{i=1}^{I} \sum_{j=1}^{J} n(i, j) = \sum_{j=1}^{J} n(\bullet, j) \tag{3.1}$$

3.1 Perspective 1: Test for homogeneity

In the first way of looking at things, the number $n(i, \bullet)$ in each row is a given, and for each $i$ we consider $\{N(i, j) \mid j = 1, \ldots, J\}$ to be a multinomial random vector with probabilities $p(j|i)$. (Devore calls this $p_{ij}$.) Note that this is not a conditional probability according to the set theory definition of Devore Chapter Two, but it's the probability for an observation which is in row $i$ to fall into column $j$. These probabilities obey $\sum_{j=1}^{J} p(j|i) = 1$ for each $i$. The most important property of the multinomial for us is that

$$E(N(i, j)) = n(i, \bullet)\, p(j|i) \tag{3.2}$$

The null hypothesis of homogeneity is that each of these $I$ sets of $J$ probabilities is the same:

$$H_0: p(j|i) = p(\bullet, j) \text{ for all } i \tag{3.3}$$

We can estimate this common probability as $\hat{p}(\bullet, j) = \frac{n(\bullet, j)}{n}$ so that the estimated expected number of items in each cell is

$$\hat{e}(i, j) = n(i, \bullet)\, \hat{p}(\bullet, j) = \frac{n(i, \bullet)\, n(\bullet, j)}{n} \tag{3.4}$$

We then use these estimated expected numbers to make a chi-squared statistic for the whole table's divergence from homogeneity:

$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{[n(i, j) - \hat{e}(i, j)]^2}{\hat{e}(i, j)} \tag{3.5}$$

We have $I$ multinomial random variables with $J$ categories each, which means we've observed $I(J - 1)$ independent numbers. We've estimated $J$ probabilities, but only $J - 1$ of them were independent because they had to add to 1. Thus the number of degrees of freedom for the chi-squared should be

$$I(J - 1) - (J - 1) = (I - 1)(J - 1) \tag{3.6}$$

Note that although the formalism treats the rows and columns rather differently, the final data analysis prescription treats them symmetrically.
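
The prescription in (3.4) through (3.6) is short enough to sketch in a few lines of Python. The function names are my own, and the 2 x 3 table is invented purely for illustration; transposing it is a quick way to see the row/column symmetry of the final statistic.

```python
def expected_counts(table):
    """Estimated expected counts e(i,j) = n(i,.) * n(.,j) / n, as in eq. (3.4)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Statistic of eq. (3.5), summed over all cells of the table."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(I - 1)(J - 1), as in eq. (3.6)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 3 table, invented purely for illustration.
table = [[10, 20, 30],
         [20, 20, 20]]
transposed = [list(col) for col in zip(*table)]

print(expected_counts(table))
print(chi_squared(table), chi_squared(transposed))
print(degrees_of_freedom(table), degrees_of_freedom(transposed))
```

Swapping rows and columns changes which totals the formalism treats as given, but the statistic and the degrees of freedom printed for the two orientations are the same.
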

3.2 Perspective 2: Test for independence

In the second way of looking at things, the only thing we treat as given is the total number $n$ of observations, and the observation $\{N(i, j) \mid i = 1, \ldots, I;\ j = 1, \ldots, J\}$ is an ($IJ$-dimensional) multinomial random vector with probabilities $p(i, j)$. (Devore also calls this $p_{ij}$, which is one reason I wanted to make the notation more descriptive.) These probabilities obey $\sum_{i=1}^{I} \sum_{j=1}^{J} p(i, j) = 1$. Now the multinomial says

$$E(N(i, j)) = n\, p(i, j) \tag{3.7}$$

The null hypothesis of independence is that the probabilities can be decomposed assuming independence of the two sets of categories:

$$H_0: p(i, j) = p(i, \bullet)\, p(\bullet, j) \tag{3.8}$$

We can estimate the probabilities as $\hat{p}(i, \bullet) = \frac{n(i, \bullet)}{n}$ and $\hat{p}(\bullet, j) = \frac{n(\bullet, j)}{n}$ so that the estimated expected number of items in each cell is

$$\hat{e}(i, j) = n\, \hat{p}(i, \bullet)\, \hat{p}(\bullet, j) = n\, \frac{n(i, \bullet)\, n(\bullet, j)}{n^2} = \frac{n(i, \bullet)\, n(\bullet, j)}{n} \tag{3.9}$$

which is exactly what we saw above. [1] We then make the chi-squared as before:

$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{[n(i, j) - \hat{e}(i, j)]^2}{\hat{e}(i, j)} \tag{3.10}$$

We have a single multinomial with $IJ$ categories now, so there are $IJ - 1$ independent observations. We've estimated $I$ probabilities $\{\hat{p}(i, \bullet)\}$, $I - 1$ of which are independent, and $J$ probabilities $\{\hat{p}(\bullet, j)\}$, $J - 1$ of which are independent, so we have

$$IJ - 1 - (I - 1) - (J - 1) = IJ - I - J + 1 = (I - 1)(J - 1) \tag{3.11}$$

which, again, is the same number of degrees of freedom as the homogeneity test told us.

3.2.1 Example

We return to the example above. The estimates according to the model are $\frac{(81)(52)}{407} = 10.35$, $\frac{(81)(116)}{407} = 23.09$, etc.:

                Vegan   Vegetarian   Non-Veg   Total
Math & Stat     10.35     23.09       47.57      81
Physics          6.52     14.54       29.95      51
Chemistry       13.03     29.07       59.90     102
Biology         22.10     49.31      101.59     173
Total           52       116         239        407

Note that it's really easy to do this in a simple spreadsheet program. If I construct the chi-squared statistic I get 0.99. I need to compare this to a chi-squared distribution with $(4 - 1)(3 - 1) = (3)(2) = 6$ degrees of freedom, so 0.99 is rather low indeed and the data show no significant correlation.

Practice Problems 14.27, 14.34, 14.35, 14.41

[1] Note that viewed as a statistic, the estimate from independence is $\hat{e}(i, j) = \frac{N(i, \bullet)\, N(\bullet, j)}{n}$ while that from homogeneity is $\hat{e}(i, j) = \frac{n(i, \bullet)\, N(\bullet, j)}{n}$, but this distinction has no effect on the data analysis prescription.
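
The same calculation can be done in a few lines of Python instead of a spreadsheet. This is only a sketch with variable names of my own choosing; the P-value uses the closed-form upper tail of a chi-squared distribution with an even number of degrees of freedom, so that line applies here (6 degrees of freedom) but is not a general-purpose routine.

```python
import math

majors = ["Math & Stat", "Physics", "Chemistry", "Biology"]
observed = [
    [9, 23, 49],     # vegan, vegetarian, non-veg
    [6, 15, 30],
    [12, 28, 62],
    [25, 50, 98],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Estimated expected counts: row total times column total over grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper tail for an even number of degrees of freedom:
# exp(-x/2) * sum over j < dof/2 of (x/2)^j / j!
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))

for major, row in zip(majors, expected):
    print(major, [round(e, 2) for e in row])
print(round(chi2, 2), dof, round(p_value, 2))
```

The printed expected counts should match the table above to two decimal places, and the P-value close to 1 is another way of saying that a statistic of 0.99 is very low for 6 degrees of freedom.
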