Chapter 11. Hypothesis Testing (II)


11.1 Likelihood Ratio Tests

The likelihood ratio test is one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let $X \sim f(\cdot, \theta)$. Consider the hypotheses $H_0: \theta \in \Theta_0$ vs $H_1: \theta \in \Theta \setminus \Theta_0$. The likelihood ratio test rejects $H_0$ for large values of the statistic
$$ \mathrm{LR} = \mathrm{LR}(X) = \frac{\sup_{\theta \in \Theta} f(X, \theta)}{\sup_{\theta \in \Theta_0} f(X, \theta)} = f(X, \hat\theta) \big/ f(X, \tilde\theta), $$
where $\hat\theta$ is the (unconstrained) MLE, and $\tilde\theta$ is the constrained MLE under the hypothesis $H_0$.

Remark. (i) It is easy to see that $\mathrm{LR} \ge 1$. (ii) The exact sampling distribution of LR is usually unknown, except in a few special cases.

Example 1. (One-sample t-test) Let $X = (X_1, \ldots, X_n)^\tau$ be a random sample from $N(\mu, \sigma^2)$. We are interested in testing the hypotheses $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$, where $\mu_0$ is given, and $\sigma^2$ is unknown and is a nuisance parameter. Now both $H_0$ and $H_1$ are composite. The likelihood function is
$$ L(\mu, \sigma^2) = C \sigma^{-n} \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu)^2 \Big\}. $$

The unconstrained MLEs are
$$ \hat\mu = \bar X, \qquad \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2, $$
and the constrained MLE is
$$ \tilde\sigma^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu_0)^2. $$
The LR statistic is then
$$ \mathrm{LR} = \frac{L(\hat\mu, \hat\sigma^2)}{L(\mu_0, \tilde\sigma^2)} = (\tilde\sigma^2 / \hat\sigma^2)^{n/2}. $$
Since $n \tilde\sigma^2 = n \hat\sigma^2 + n(\bar X - \mu_0)^2$, it holds that $\tilde\sigma^2 / \hat\sigma^2 = 1 + T^2/(n-1)$, where
$$ T = \sqrt{n}\,(\bar X - \mu_0) \Big/ \Big\{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2 \Big\}^{1/2}. $$
Note that $T \sim t_{n-1}$ under $H_0$. Since LR is a monotonically increasing function of $|T|$, the LRT rejects $H_0$ iff $|T| > t_{n-1, \alpha/2}$, where $t_{k,\alpha}$ is the upper $\alpha$-point of the t-distribution with $k$ degrees of freedom.
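A quick numerical check of this equivalence in R; the simulated data and the choice $\mu_0 = 0$ below are our own illustration, not from the notes:

set.seed(1)
n <- 20; mu0 <- 0
x <- rnorm(n, mean = 0.3)                     # simulated sample
T <- sqrt(n) * (mean(x) - mu0) / sd(x)        # sd() uses the (n-1) divisor, matching T above
LR <- (1 + T^2/(n - 1))^(n/2)                 # equals (sigma.tilde^2 / sigma.hat^2)^(n/2)
c(T, unname(t.test(x, mu = mu0)$statistic))   # the two statistics coincide
abs(T) > qt(1 - 0.05/2, n - 1)                # reject H0 at the 5% level?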

Asymptotic distribution of the likelihood ratio test statistic. Let $X = (X_1, \ldots, X_n)^\tau$, and assume certain regularity conditions. Then as $n \to \infty$, the distribution of $2\log(\mathrm{LR})$ under $H_0$ converges to the $\chi^2$-distribution with $d - d_0$ degrees of freedom, where $d$ is the dimension of $\Theta$ and $d_0$ is the dimension of $\Theta_0$.

To make the computation of dimensions easy, a reparametrisation is often adopted. Suppose that the parameter $\theta$ may be written in two parts, $\theta = (\psi, \lambda)$, where $\psi$ is a $k \times 1$ parameter of interest, and $\lambda$ is of little interest and is called a nuisance parameter. The hypotheses to be tested may then be expressed as $H_0: \psi = \psi_0$ vs $H_1: \psi \ne \psi_0$.

Now the LR statistic is of the form
$$ \mathrm{LR} = \frac{L(\hat\psi, \hat\lambda; X)}{L(\psi_0, \tilde\lambda; X)}, $$
where $(\hat\psi, \hat\lambda)$ is the unconstrained MLE while $\tilde\lambda$ is the constrained MLE of $\lambda$ subject to $\psi = \psi_0$. Then as $n \to \infty$, $2\log(\mathrm{LR}) \xrightarrow{D} \chi^2_k$ under $H_0$.

Example 2. Let $X_1, \ldots, X_n$ be independent, and $X_j \sim N(\mu_j, 1)$. Consider the null hypothesis $H_0: \mu_1 = \cdots = \mu_n$.

The likelihood function is
$$ L(\mu_1, \ldots, \mu_n) = C \exp\Big\{ -\frac{1}{2} \sum_{j=1}^n (X_j - \mu_j)^2 \Big\}, $$
where $C > 0$ is a constant independent of the $\mu_j$. The unconstrained MLEs are $\hat\mu_j = X_j$ and the constrained MLE is $\tilde\mu = \bar X$. Hence
$$ \mathrm{LR} = \frac{L(\hat\mu_1, \ldots, \hat\mu_n)}{L(\tilde\mu, \ldots, \tilde\mu)} = \exp\Big\{ \frac{1}{2} \sum_{j=1}^n (X_j - \bar X)^2 \Big\}, $$
and therefore
$$ 2\log(\mathrm{LR}) = \sum_{j=1}^n (X_j - \bar X)^2 \sim \chi^2_{n-1} \quad \text{under } H_0, $$
which in this example is in fact true for any finite $n$ as well. How do we calculate the degrees of freedom?

Since $d = n$ and $d_0 = 1$, the d.f. is $d - d_0 = n - 1$. Alternatively we may adopt the following reparametrisation: $\mu_j = \mu_1 + \psi_j$ for $2 \le j \le n$. Then the null hypothesis can be expressed as $H_0: \psi_2 = \cdots = \psi_n = 0$. Therefore $\psi = (\psi_2, \ldots, \psi_n)^\tau$ has $n - 1$ components, i.e. $k = n - 1$.

11.2 The permutation test

The permutation test is a nonparametric method for testing whether two distributions are the same. It is particularly appealing when sample sizes are small, as it does not rely on any asymptotic theory.

Let $X_1, \ldots, X_m$ be a sample from a distribution $F_x$ and $Y_1, \ldots, Y_n$ be a sample from a distribution $F_y$. We are interested in testing $H_0: F_x = F_y$ versus $H_1: F_x \ne F_y$.

Key idea: under $H_0$, $\{X_1, \ldots, X_m, Y_1, \ldots, Y_n\}$ form a sample of size $m + n$ from a single distribution. Choose a test statistic $T = T(X_1, \ldots, X_m, Y_1, \ldots, Y_n)$ which is capable of detecting the difference between the two distributions, e.g. $T = |\bar X - \bar Y|$, or $T = |\bar X - \bar Y|^2 + |S_x^2 - S_y^2|$. Consider all $(m+n)!$ permutations of $(X_1, \ldots, X_m, Y_1, \ldots, Y_n)$, and compute the test statistic $T$ for each permutation, yielding the values $T_1, \ldots, T_{(m+n)!}$.

The p-value of the test is defined as
$$ p = \frac{1}{(m+n)!} \sum_{j=1}^{(m+n)!} I(T_j > t_{\mathrm{obs}}), $$
where $t_{\mathrm{obs}} = T(X_1, \ldots, X_m, Y_1, \ldots, Y_n)$ is the value computed from the original data. We reject $H_0$ at the significance level $\alpha$ if $p \le \alpha$.

Note. When $H_0$ holds, all the $(m+n)!$ values $T_j$ are on an equal footing, and $t_{\mathrm{obs}}$ is one of them. Therefore, under $H_0$, $t_{\mathrm{obs}}$ is unlikely to be an extreme value among the $T_j$'s.

In practice $(m+n)!$ is far too large to enumerate, so the p-value is approximated from $B$ random permutations.

Algorithm for permutation tests:
1. Compute $t_{\mathrm{obs}} = T(X_1, \ldots, X_m, Y_1, \ldots, Y_n)$.
2. Randomly permute the data. Compute $T$ again using the permuted data.
3. Repeat Step 2 $B$ times, and let $T_1, \ldots, T_B$ denote the resulting values.
4. The approximate p-value is $\frac{1}{B} \sum_{1 \le j \le B} I(T_j > t_{\mathrm{obs}})$.

Remark. Let $Z = (X_1, \ldots, X_m, Y_1, \ldots, Y_n)$ (Z <- c(x,y)). A permutation of Z may be obtained in R as
Zp <- sample(Z, n+m)
You may also use the R function sample.int:
k <- sample.int(n+m, n+m)
Now k is a permutation of {1, 2, ..., n+m}.
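Putting the four steps together, here is a minimal sketch of the algorithm as an R function; the name perm.test, the statistic $T = |\bar X - \bar Y|$ and the default B = 5000 are our own choices:

perm.test <- function(x, y, B = 5000) {
    m <- length(x); n <- length(y)
    z <- c(x, y)                            # pooled sample
    tobs <- abs(mean(x) - mean(y))          # Step 1: observed statistic
    Tj <- replicate(B, {                    # Steps 2-3: B random permutations
        zp <- sample(z, m + n)
        abs(mean(zp[1:m]) - mean(zp[(m + 1):(m + n)]))
    })
    mean(Tj > tobs)                         # Step 4: approximate p-value
}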

Example 3. Class A was taught using detailed PowerPoint slides. The marks in the final exam are 45, 55, 39, 60, 64, 85, 80, 64, 48, 62, 75, 77, 50. Students in Class B were required to read books and answer questions in class discussions. The marks in the final exam are 45, 59, 48, 74, 73, 78, 66, 69, 79, 81, 60, 52. Can we infer that the marks from the two classes are significantly different?

We conduct the permutation test using the test statistic $T = |\bar X - \bar Y|$ in R:

> x <- c(45, 55, 39, 60, 64, 85, 80, 64, 48, 62, 75, 77, 50)
> y <- c(45, 59, 48, 74, 73, 78, 66, 69, 79, 81, 60, 52)
> length(x); length(y)
[1] 13
[1] 12
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  39.00   50.00   62.00   61.85   75.00   85.00
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  45.00   57.25   67.50   65.33   75.00   81.00
> Tobs <- abs(mean(x) - mean(y))
> z <- c(x, y)
> k <- 0
> for(i in 1:5000) {
+     zp <- sample(z, 25)    # zp is a permutation of z
+     Tp <- abs(mean(zp[1:13]) - mean(zp[14:25]))
+     if(Tp > Tobs) k <- k + 1
+ }
> cat("p-value:", k/5000, "\n")
p-value: 0.5194

Since the p-value is 0.5194, we cannot reject the null hypothesis that the mark distributions of the two classes are the same.

We also apply the two-sample t-test, which gives a similar result:

> t.test(x, y, var.equal=TRUE)    # mu=0 is the default

        Two Sample t-test

data:  x and y
t = -0.6472, df = 23, p-value = 0.5239
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.632967   7.658608

11.3 $\chi^2$-tests

11.3.1 Goodness-of-fit tests

Goodness-of-fit tests are used to test whether a given distribution fits the data. Let $\{X_1, \ldots, X_n\}$ be a random sample from a discrete distribution with $k$ categories denoted by $1, \ldots, k$. Denote the probability function by
$$ p_j = P(X_i = j), \qquad j = 1, \ldots, k. $$
Then $p_j \ge 0$ and $\sum_{j=1}^k p_j = 1$. Typically $n \gg k$. Therefore the data are often compressed into a table:

Category     1     2     ...   k
Frequency    Z_1   Z_2   ...   Z_k

where
$$ Z_j = \text{no. of } X_i\text{'s equal to } j, \qquad j = 1, \ldots, k. $$
Obviously $\sum_{j=1}^k Z_j = n$.

We wish to test the null hypothesis
$$ H_0: p_i = p_i(\theta), \qquad i = 1, \ldots, k, $$
where the functional forms of the $p_i(\theta)$ are known but the parameter $\theta$ is unknown. For example, $p_i(\theta) = \theta^{i-1} e^{-\theta}/(i-1)!$ (i.e. a Poisson distribution). We first estimate $\theta$ by, for example, its MLE $\hat\theta$. The expected frequencies under $H_0$ are
$$ E_i = n p_i(\hat\theta), \qquad i = 1, \ldots, k. $$

Listing them together with the observed frequencies, we have

Category             1     2     ...   k
Frequency            Z_1   Z_2   ...   Z_k
Expected frequency   E_1   E_2   ...   E_k

If $H_0$ is true, we expect $Z_j \approx E_j = n p_j(\hat\theta)$ when $n$ is large since, by the LLN,
$$ \frac{Z_j}{n} = \frac{1}{n} \sum_{i=1}^n I(X_i = j) \to E\{I(X_i = j)\} = P(X_i = j) = p_j(\theta). $$

Test statistic: $T = \sum_{j=1}^k (Z_j - E_j)^2 / E_j$.

Theorem. Under $H_0$, $T \xrightarrow{D} \chi^2_{k-1-d}$ as $n \to \infty$, where $d$ is the number of components in $\theta$.

Remark. (i) It is important that $E_i \ge 5$, at least. This may be achieved by combining together the categories with smaller expected frequencies. (ii) When the $p_j$ are completely specified (i.e. known) under $H_0$, $d = 0$.

Example 4. A supermarket recorded the numbers of arrivals over 100 one-minute intervals. The data were summarized as follows:

No. of arrivals   0    1    2    3    4   5   7
Frequency         13   29   32   20   4   1   1

Do the data match a Poisson distribution? The null hypothesis is $H_0: p_i = \lambda^i e^{-\lambda}/i!$ for $i = 0, 1, \ldots$. We find the MLE for $\lambda$ first.

The likelihood function:
$$ L(\lambda) = \prod_{i=1}^{100} \frac{\lambda^{X_i}}{X_i!} e^{-\lambda} \propto \lambda^{\sum_{i=1}^{100} X_i} e^{-100\lambda}. $$
The log-likelihood function (up to an additive constant):
$$ l(\lambda) = \log(\lambda) \sum_{i=1}^{100} X_i - 100\lambda. $$
Setting $\frac{d}{d\lambda} l(\lambda) = 0$ leads to $\hat\lambda = \frac{1}{100} \sum_{i=1}^{100} X_i = \bar X$.

Since we are only given the counts $Z_j$ instead of the $X_i$, we need to compute $\bar X$ from the $Z_j$. Recall $Z_j$ = no. of $X_i$ equal to $j$. Hence
$$ \bar X = \frac{1}{n} \sum_{i=1}^n X_i = \frac{1}{n} \sum_j j Z_j = \frac{1}{100} (0 \times 13 + 1 \times 29 + 2 \times 32 + 3 \times 20 + 4 \times 4 + 5 \times 1 + 7 \times 1) = 1.81. $$

With $\hat\lambda = 1.81$, the expected frequencies are
$$ E_i = n p_i(\hat\lambda) = 100 \times \frac{(1.81)^i e^{-1.81}}{i!}, \qquad i = 0, 1, \ldots. $$
We combine the last three categories to make sure that $E_i \ge 5$; the probability for the combined cell "4 or more arrivals" is taken as the tail probability $1 - \sum_{i=0}^3 p_i(\hat\lambda)$.

No. of arrivals            0       1       2       3       4+      Total
Frequency Z_i              13      29      32      20      6       100
p_i(lambda.hat)            0.164   0.296   0.268   0.162   0.110   1
Expected frequency E_i     16.4    29.6    26.8    16.2    11.0    100
Difference Z_i - E_i       -3.4    -0.6    5.2     3.8     -5.0    0
(Z_i - E_i)^2 / E_i        0.705   0.012   1.010   0.891   2.273   4.891

Note that under $H_0$, $T = \sum_{i=0}^4 (Z_i - E_i)^2 / E_i \sim \chi^2_{5-1-1} = \chi^2_3$ approximately. Since $T = 4.891 < \chi^2_{0.10, 3} = 6.25$, we cannot reject the assumption that the data follow a Poisson distribution.
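The computations of Example 4 can be reproduced in R; a minimal sketch:

z <- c(13, 29, 32, 20, 4, 1, 1)      # observed counts for 0, 1, 2, 3, 4, 5, 7 arrivals
arrivals <- c(0, 1, 2, 3, 4, 5, 7)
n <- sum(z)                          # 100
lambda <- sum(arrivals * z) / n      # MLE: 1.81
p <- dpois(0:3, lambda)
p <- c(p, 1 - sum(p))                # last cell: 4 or more arrivals
E <- n * p                           # approx 16.4 29.6 26.8 16.2 11.0
Z <- c(13, 29, 32, 20, 6)            # last three categories combined
T <- sum((Z - E)^2 / E)              # approx 4.89
qchisq(0.90, df = 5 - 1 - 1)         # upper 10% point: 6.251 > T, so we cannot reject H0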

Remark. (i) The goodness-of-fit test has been widely used in practice. However, we should bear in mind that when $H_0$ cannot be rejected, we are not in a position to conclude that the assumed distribution is true, as "not reject" ≠ "accept". (ii) The above test may be used to assess the goodness of fit of a continuous distribution via discretization. However, there exist more appropriate methods, such as the Kolmogorov-Smirnov test and the Cramér-von Mises test, which deal with the goodness of fit for continuous distributions directly.

11.3.2 Tests for contingency tables

Tests for independence of two discrete random variables. Let $(X, Y)$ be two discrete random variables, where $X$ has $r$ categories and $Y$ has $c$ categories. Let
$$ p_{ij} = P(X = i, Y = j), \qquad i = 1, \ldots, r, \; j = 1, \ldots, c. $$
Then $p_{ij} \ge 0$ and $\sum_{i,j} p_{ij} = \sum_{i=1}^r \sum_{j=1}^c p_{ij} = 1$. Let $p_{i\cdot} = P(X = i)$ and $p_{\cdot j} = P(Y = j)$. It is easy to see that
$$ p_{i\cdot} = \sum_{j=1}^c P(X = i, Y = j) = \sum_{j=1}^c p_{ij}, \qquad \text{and similarly} \qquad p_{\cdot j} = \sum_{i=1}^r p_{ij}. $$

$X$ and $Y$ are independent iff $p_{ij} = p_{i\cdot} p_{\cdot j}$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$. Suppose we have $n$ pairs of observations from $(X, Y)$. The data are presented in the contingency table below:

                      Y
             1      2      ...   c
      1      Z_11   Z_12   ...   Z_1c
 X    2      Z_21   Z_22   ...   Z_2c
      ...    ...    ...          ...
      r      Z_r1   Z_r2   ...   Z_rc

where $Z_{ij}$ = no. of the pairs equal to $(i, j)$.

It is often useful to add the marginal totals to the table:
$$ Z_{i\cdot} = \sum_{j=1}^c Z_{ij}, \qquad Z_{\cdot j} = \sum_{i=1}^r Z_{ij}, \qquad Z_{\cdot\cdot} = \sum_{i=1}^r Z_{i\cdot} = \sum_{j=1}^c Z_{\cdot j} = n. $$

                      Y
             1      2      ...   c
      1      Z_11   Z_12   ...   Z_1c   Z_1.
 X    2      Z_21   Z_22   ...   Z_2c   Z_2.
      ...    ...    ...          ...    ...
      r      Z_r1   Z_r2   ...   Z_rc   Z_r.
             Z_.1   Z_.2   ...   Z_.c   Z_.. = n

We are interested in testing for independence:
$$ H_0: p_{ij} = p_{i\cdot} p_{\cdot j}, \qquad i = 1, \ldots, r, \; j = 1, \ldots, c. $$
Under $H_0$, a natural estimator for $p_{ij}$ is
$$ \hat p_{ij} = \hat p_{i\cdot} \hat p_{\cdot j} = \frac{Z_{i\cdot}}{n} \cdot \frac{Z_{\cdot j}}{n}. $$
Hence the expected frequency in the $(i, j)$-th cell is
$$ E_{ij} = n \hat p_{ij} = Z_{i\cdot} Z_{\cdot j} / n = Z_{i\cdot} Z_{\cdot j} / Z_{\cdot\cdot}, \qquad i = 1, \ldots, r, \; j = 1, \ldots, c. $$
If $H_0$ is true, we expect $Z_{ij} \approx E_{ij}$. The goodness-of-fit test statistic is defined as
$$ T = \sum_{i=1}^r \sum_{j=1}^c (Z_{ij} - E_{ij})^2 \big/ E_{ij}. $$
We reject $H_0$ for large values of $T$.

Under $H_0$, $T \sim \chi^2_{p-d}$ approximately, where
$$ p = \text{no. of free cells} = rc - 1, \qquad d = \text{no. of estimated free parameters} = r + c - 2. $$

Note. 1. The sum of the $rc$ counts $Z_{ij}$ is fixed at $n$. So knowing $rc - 1$ of them, the remaining one is also known. This is why $p = rc - 1$.
2. The estimated parameters are the $p_{i\cdot}$ and $p_{\cdot j}$. But $\sum_{i=1}^r p_{i\cdot} = 1$ and $\sum_{j=1}^c p_{\cdot j} = 1$. Hence $d = (r-1) + (c-1) = r + c - 2$.
3. For testing independence, it always holds that $Z_{i\cdot} - E_{i\cdot} = 0$ and $Z_{\cdot j} - E_{\cdot j} = 0$. These are useful facts for checking for computational errors. The proofs are simple; for example,
$$ Z_{i\cdot} - E_{i\cdot} = Z_{i\cdot} - \sum_j E_{ij} = Z_{i\cdot} - \sum_j \frac{Z_{i\cdot} Z_{\cdot j}}{Z_{\cdot\cdot}} = Z_{i\cdot} - Z_{i\cdot} \frac{Z_{\cdot\cdot}}{Z_{\cdot\cdot}} = 0. $$

Theorem. Under $H_0$, the limiting distribution of $T$ is the $\chi^2$-distribution with $(r-1)(c-1)$ degrees of freedom, as $n \to \infty$. Note that $p - d = rc - 1 - (r + c - 2) = (r-1)(c-1)$, in agreement with the counting above.

Example. The table below lists the counts on the beer preference and gender of beer drinkers from 150 randomly selected individuals. Test at the 5% significance level the hypothesis that the preference is independent of the gender.

                      Beer preference
            Light ale   Lager   Bitter   Total
Gender
  Male      20          40      20       80
  Female    30          30      10       70
  Total     50          70      30       150

The expected frequencies are:
$$ E_{11} = \frac{80 \times 50}{150} = 26.67, \quad E_{12} = \frac{80 \times 70}{150} = 37.33, \quad E_{13} = \frac{80 \times 30}{150} = 16, $$
$$ E_{21} = \frac{70 \times 50}{150} = 23.33, \quad E_{22} = \frac{70 \times 70}{150} = 32.67, \quad E_{23} = \frac{70 \times 30}{150} = 14. $$

Expected frequencies E_ij:
            Light ale   Lager   Bitter   Total
  Male      26.67       37.33   16       80
  Female    23.33       32.67   14       70
  Total     50          70      30       150

Differences Z_ij - E_ij:
  Male      -6.67       2.67    4        0
  Female    6.67        -2.67   -4       0
  Total     0           0       0        0

Contributions (Z_ij - E_ij)^2 / E_ij:
  Male      1.668       0.191   1.000    2.859
  Female    1.907       0.218   1.142    3.267
  Total                                  6.126

Under the null hypothesis of independence, $T = \sum_{i,j} (Z_{ij} - E_{ij})^2 / E_{ij} \sim \chi^2_2$ approximately. Note that the degrees of freedom are $(2-1)(3-1) = 2$. Since $T = 6.126 > \chi^2_{0.05, 2} = 5.991$, we reject the null hypothesis, i.e. there is significant evidence in the data indicating that the beer preference and the gender of the beer drinker are not independent.
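The computation can be reproduced in R, either directly from the margins or with the built-in function chisq.test (see the remark at the end of this chapter); the small discrepancy with the hand computation above comes from rounding the E_ij:

Z <- matrix(c(20, 40, 20,
              30, 30, 10), nrow = 2, byrow = TRUE,
            dimnames = list(c("Male", "Female"),
                            c("Light ale", "Lager", "Bitter")))
E <- outer(rowSums(Z), colSums(Z)) / sum(Z)   # E_ij = Z_i. x Z_.j / n
sum((Z - E)^2 / E)                            # T = 6.1224
chisq.test(Z)                                 # X-squared = 6.1224, df = 2, p-value = 0.0468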

Tests for several binomial distributions. Consider a real example: three independent samples of sizes 80, 120 and 100 are taken from single, married, and widowed or divorced persons, respectively. Each individual was asked whether friends and social life or job and primary activity contributes most to their general well-being. The counts from the three samples are summarized in the table below.

                             Single   Married   Widowed or divorced
Friends and social life      47       59        56
Job or primary activity      33       61        44
Total                        80       120       100

Conditional inference: sometimes we conduct inference under the assumption that all the row (or column) margins are fixed. Unlike the tables in the independence tests, now
$$ Z_{1j} \sim \mathrm{Bin}(Z_{\cdot j}, p_{1j}), \qquad j = 1, 2, 3, $$
where the $Z_{\cdot j}$ are fixed constants (the sample sizes). Furthermore, $p_{2j} = 1 - p_{1j}$. We are interested in testing the hypothesis
$$ H_0: p_{11} = p_{12} = p_{13}. $$
Under $H_0$, the three independent samples may be viewed as drawn from the same population. Furthermore,
$$ Z_{11} + Z_{12} + Z_{13} \sim \mathrm{Bin}(Z_{\cdot 1} + Z_{\cdot 2} + Z_{\cdot 3}, \, p), $$

where $p$ denotes the common value of $p_{11}$, $p_{12}$ and $p_{13}$. Therefore the MLE is
$$ \hat p = \frac{Z_{11} + Z_{12} + Z_{13}}{Z_{\cdot 1} + Z_{\cdot 2} + Z_{\cdot 3}} = \frac{47 + 59 + 56}{80 + 120 + 100} = 0.54. $$
The expected frequencies are $E_{1j} = \hat p Z_{\cdot j}$ and $E_{2j} = Z_{\cdot j} - E_{1j}$, $j = 1, 2, 3$.

Expected frequencies E_ij:
                             Single   Married   Widowed or divorced
Friends and social life      43.2     64.8      54.0
Job or primary activity      36.8     55.2      46.0
Total                        80       120       100

Differences Z_ij - E_ij:
Friends and social life      3.8      -5.8      2.0
Job or primary activity      -3.8     5.8       -2.0
Total                        0        0         0

Contributions (Z_ij - E_ij)^2 / E_ij:
Friends and social life      0.334    0.519     0.074
Job or primary activity      0.392    0.609     0.087
Total                        0.726    1.128     0.161    T = 2.015

Under $H_0$, $T = \sum_{i,j} (Z_{ij} - E_{ij})^2 / E_{ij} \sim \chi^2_{p-d} = \chi^2_2$ approximately, where
$$ p = \text{no. of free counts } Z_{ij} = 3, \qquad d = \text{no. of estimated free parameters} = 1. $$
Since $T = 2.015 < \chi^2_{0.10, 2} = 4.605$, we cannot reject $H_0$, i.e. there is no significant difference among the three populations in terms of choosing between friends and social life and job and primary activity as the more important factor towards their general well-being.

Remark. Similarly to the independence tests, it holds that $Z_{i\cdot} - E_{i\cdot} = 0$ and $Z_{\cdot j} - E_{\cdot j} = 0$.
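This test, too, can be reproduced in R, either via chisq.test on the 2 x 3 table or via prop.test for comparing several proportions; a minimal sketch (the slight difference from T = 2.015 above is due to rounding in the hand computation):

wellbeing <- matrix(c(47, 59, 56,
                      33, 61, 44), nrow = 2, byrow = TRUE)
chisq.test(wellbeing)                        # X-squared = 2.0162, df = 2, p-value = 0.365
prop.test(c(47, 59, 56), c(80, 120, 100))    # the same chi-square test of equal proportions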

Tests for r x c tables: a general description. In general, we may test for different types of structure in an $r \times c$ table, for example, symmetry ($p_{ij} = p_{ji}$). The key is to compute the expected frequencies $E_{ij}$ under the null hypothesis $H_0$. Under $H_0$, the test statistic
$$ T = \sum_{i=1}^r \sum_{j=1}^c \frac{(Z_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{p-d} \quad \text{approximately}, $$
where $p$ = no. of free counts among the $Z_{ij}$, and $d$ = no. of estimated free parameters. We reject $H_0$ if $T > \chi^2_{\alpha, p-d}$.

Remark. The R function chisq.test performs both the goodness-of-fit test and the contingency table test.
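For instance, chisq.test handles the goodness-of-fit case directly when the cell probabilities are fully specified under $H_0$ (so $d = 0$); the die-roll counts below are our own hypothetical illustration:

counts <- c(18, 24, 17, 21, 22, 18)    # hypothetical counts of faces 1-6 in 120 rolls
chisq.test(counts, p = rep(1/6, 6))    # tests fairness of the die with df = k - 1 = 5
# When parameters are estimated under H0 (as in Example 4), the default
# df of k - 1 is no longer appropriate: compute T directly and refer it
# to the chi-squared distribution with k - 1 - d degrees of freedom.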