Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests


Throughout this chapter we consider a sample $X$ taken from a population indexed by $\theta \in \Theta \subset \mathbb{R}^k$. Instead of estimating the unknown parameter, we want to make a decision on whether the unknown $\theta$ is in $\Theta_0$, a subset of $\Theta$.

Definition 8.1.1. A hypothesis is a statement about the population parameter $\theta$. It is specified as $H: \theta \in \Theta_0$ for a $\Theta_0 \subset \Theta$, where $H$ stands for a hypothesis.

Definition 8.1.2. For any hypothesis $H_0: \theta \in \Theta_0$, its complementary hypothesis is $H_1: \theta \in \Theta_1 = \Theta_0^c$. $H_0$ is called the null hypothesis and $H_1$ is called the alternative hypothesis. Based on a sample from the population, we want to decide which of the two complementary hypotheses is true, i.e., to test
\[ H_0: \theta \in \Theta_0 \quad \text{versus} \quad H_1: \theta \in \Theta_1 = \Theta_0^c. \]

Definition 8.1.3. A hypothesis test (or simply a test) is a rule that specifies for which sample values $H_0$ is accepted or rejected ($H_1$ is accepted). The subset of the sample space for which $H_0$ is rejected is called the rejection region or critical region; its complement is called the acceptance region.

In this course, rejecting $H_0$ results in accepting $H_1$, and accepting $H_0$ results in rejecting $H_1$. Also, "not rejecting" is the same as "accepting".

Typically, a test is specified in terms of a test statistic $T(X) = T(X_1, \ldots, X_n)$, a function of the sample $X$. For example, a test might specify that $H_0$ is to be rejected if the sample mean $\bar{X}$ is greater than 3.

We introduce a method of using the likelihood function to construct tests, which is applicable as long as a likelihood is available. Properties and optimality of these tests will be studied later.

Definition 8.2.1. The likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$ is
\[ \lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid x)}{\sup_{\theta \in \Theta} L(\theta \mid x)}, \]
where $L(\theta \mid x)$ is the likelihood function based on $X = x$. A likelihood ratio test (LRT) is any test that has a rejection region of the form $\{x : \lambda(x) \le c\}$, where $c$ is a constant satisfying $0 \le c \le 1$.

The rationale behind LRTs is that $\lambda(x)$ is likely to be small if there are parameter points in $\Theta_0^c$ for which $x$ is much more likely than for any parameter in $\Theta_0$. Note that in the denominator of the likelihood ratio, the sup is taken over $\Theta$, not $\Theta_0^c$. The method of determining $c$ is discussed later.
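As a concrete illustration of Definition 8.2.1, the following sketch computes $\lambda(x)$ numerically by maximizing the log-likelihood over $\Theta_0$ and over $\Theta$. The Poisson model, the data, and the null set chosen here are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Hypothetical setting: X_1,...,X_n iid Poisson(theta),
# H_0: theta <= 2 versus H_1: theta > 2 (illustrative choice).
x = np.array([3, 5, 2, 4, 6, 3])
theta0 = 2.0

def neg_loglik(theta):
    return -poisson.logpmf(x, theta).sum()

# sup over Theta_0 = (0, theta0]: maximize on a bounded interval
restricted = minimize_scalar(neg_loglik, bounds=(1e-8, theta0), method="bounded")
# sup over Theta = (0, inf): for the Poisson model the MLE is the sample mean
unrestricted = neg_loglik(x.mean())

lam = np.exp(-(restricted.fun - unrestricted))  # likelihood ratio in (0, 1]
print(f"lambda(x) = {lam:.4f}")  # an LRT rejects H_0 when lambda(x) <= c
```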

The likelihood ratio method is related to the MLE discussed in Section 7.2.2. Suppose that $\hat{\theta}$ is the MLE of $\theta$ and $\hat{\theta}_0$ is the MLE of $\theta$ when $\Theta_0$ is treated as the parameter space (the so-called restricted MLE). Then
\[ \lambda(x) = \frac{L(\hat{\theta}_0 \mid x)}{L(\hat{\theta} \mid x)}. \]

Example 8.2.2. Let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$ with unknown $\mu \in \mathbb{R}$. Let $\mu_0$ be a constant and consider
\[ H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0. \]
First, assume $\sigma^2 = 1$. $\Theta_0$ has only one element, $\mu_0$, and the numerator of $\lambda(x)$ is $L(\mu_0 \mid x)$. Since the MLE of $\mu$ is $\bar{X}$,
\[ \lambda(x) = \frac{(2\pi)^{-n/2} \exp\left(-\sum_{i=1}^n (x_i - \mu_0)^2 / 2\right)}{(2\pi)^{-n/2} \exp\left(-\sum_{i=1}^n (x_i - \bar{x})^2 / 2\right)} \]

\[ = \exp\left(-\frac{1}{2}\sum_{i=1}^n (x_i - \mu_0)^2 + \frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2\right) = \exp\left(-\frac{n}{2}(\bar{x} - \mu_0)^2\right). \]
Hence,
\[ \{x : \lambda(x) \le c\} = \left\{x : |\bar{x} - \mu_0| \ge \sqrt{-2(\log c)/n}\right\}. \]

Next, we consider the situation where $\sigma^2$ is unknown, i.e., $\theta = (\mu, \sigma^2) \in \mathbb{R} \times (0, \infty)$ and $\Theta_0 = \{\mu_0\} \times (0, \infty)$. Under $H_0$, the MLE of $\theta$ is $(\mu_0, \hat{\sigma}_0^2)$, and the MLE without restriction is $\hat{\theta} = (\bar{X}, \hat{\sigma}^2)$, where
\[ \hat{\sigma}_0^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2 \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2. \]
Then
\[ \lambda(x) = \frac{(2\pi\hat{\sigma}_0^2)^{-n/2} \exp\left(-\sum_{i=1}^n (x_i - \mu_0)^2 / (2\hat{\sigma}_0^2)\right)}{(2\pi\hat{\sigma}^2)^{-n/2} \exp\left(-\sum_{i=1}^n (x_i - \bar{x})^2 / (2\hat{\sigma}^2)\right)} = \frac{\hat{\sigma}^n e^{-n/2}}{\hat{\sigma}_0^n e^{-n/2}} = \left[\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2}\right]^{n/2} = \left[1 + \frac{n(\bar{x} - \mu_0)^2}{(n-1)s^2}\right]^{-n/2}, \]

where $s^2$ is the observed value of the sample variance $S^2$. Hence,
\[ \{x : \lambda(x) \le c\} = \left\{x : \frac{\sqrt{n}\,|\bar{x} - \mu_0|}{\sqrt{n-1}\,s} \ge \sqrt{c^{-2/n} - 1}\right\}. \]

Example 8.2.3. Let $X_1, \ldots, X_n$ be iid from exponential$(\theta, 1)$ with unknown $\theta \in \mathbb{R}$ and pdf $f_\theta(x) = e^{-(x-\theta)}$, $x > \theta$. Let $\theta_0$ be a fixed constant and the hypotheses be
\[ H_0: \theta \le \theta_0 \quad \text{versus} \quad H_1: \theta > \theta_0. \]
The likelihood function with $X = x = (x_1, \ldots, x_n)$ is
\[ L(\theta \mid x) = \begin{cases} e^{-(x_1 - \theta)} \cdots e^{-(x_n - \theta)} & \theta \le x_i,\ i = 1, \ldots, n, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \exp\left(-\sum_{i=1}^n x_i + n\theta\right) & \theta \le x_{(1)}, \\ 0 & \theta > x_{(1)}. \end{cases} \]
Clearly, $L(\theta \mid x)$ is an increasing function of $\theta$ on $(-\infty, x_{(1)}]$.
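A quick numerical check of the unknown-$\sigma^2$ case of Example 8.2.2: the sketch below evaluates $\lambda(x)$ directly from the restricted and unrestricted MLEs and confirms that it is a monotone function of the usual t-statistic. The data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=20)  # illustrative sample
n, mu0 = x.size, 0.0

# Restricted and unrestricted MLEs of sigma^2 (Example 8.2.2)
sigma2_0 = np.mean((x - mu0) ** 2)          # MLE under H_0: mu = mu0
sigma2_hat = np.mean((x - x.mean()) ** 2)   # unrestricted MLE

lam = (sigma2_hat / sigma2_0) ** (n / 2)    # lambda(x) = (sigma2_hat / sigma2_0)^(n/2)

# Equivalent monotone statistic: lambda(x) <= c iff |t| is large
s = x.std(ddof=1)
t = np.sqrt(n) * (x.mean() - mu0) / s
lam_from_t = (1 + t ** 2 / (n - 1)) ** (-n / 2)
print(lam, lam_from_t)  # the two expressions agree
```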

Thus, the unrestricted MLE of $\theta$ is $X_{(1)}$ and
\[ \sup_{\theta \in \mathbb{R}} L(\theta \mid x) = \exp\left(-\sum_{i=1}^n x_i + n x_{(1)}\right). \]
The restricted MLE of $\theta$ depends on whether $x_{(1)} \le \theta_0$: if yes, the restricted MLE is still $x_{(1)}$; if $x_{(1)} > \theta_0$, the restricted MLE is $\theta_0$ since $L(\theta \mid x)$ increases for $\theta \le x_{(1)}$. Hence,
\[ \sup_{\theta \le \theta_0} L(\theta \mid x) = \begin{cases} \exp\left(-\sum_{i=1}^n x_i + n x_{(1)}\right) & x_{(1)} \le \theta_0, \\ \exp\left(-\sum_{i=1}^n x_i + n\theta_0\right) & x_{(1)} > \theta_0, \end{cases} \]
\[ \lambda(x) = \begin{cases} 1 & x_{(1)} \le \theta_0, \\ \exp\left(-n(x_{(1)} - \theta_0)\right) & x_{(1)} > \theta_0, \end{cases} \]
\[ \{x : \lambda(x) \le c\} = \left\{x : \exp\left(-n(x_{(1)} - \theta_0)\right) \le c\right\} = \left\{x : x_{(1)} \ge \theta_0 - n^{-1}\log c\right\}. \]
In both examples, $\lambda(x)$ is a function of sufficient statistics, which is true in general.
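The LRT of Example 8.2.3 depends on the data only through $x_{(1)}$, so it takes one line to compute; a short sketch (with made-up data and cut-off) follows.

```python
import numpy as np

x = np.array([1.8, 2.3, 1.6, 2.9, 2.1])  # illustrative sample from exponential(theta, 1)
theta0 = 1.5
n, x1 = x.size, x.min()

# lambda(x) = 1 if x_(1) <= theta0, exp(-n(x_(1) - theta0)) otherwise
lam = 1.0 if x1 <= theta0 else np.exp(-n * (x1 - theta0))

# Rejection region {x : x_(1) >= theta0 - log(c)/n}
c = 0.05
reject = x1 >= theta0 - np.log(c) / n
print(f"lambda = {lam:.4f}, reject H0: {reject}")
```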

Theorem 8.2.4. If $T(X)$ is a sufficient statistic for $\theta$ and $\lambda^*(T)$ and $\lambda(x)$ are the likelihood ratios based on $T$ and $X$, respectively, then $\lambda^*(T(x)) = \lambda(x)$ for every $x \in \mathcal{X}$ (the sample space).

Proof. From the factorization theorem, $L(\theta \mid x) = g_\theta(T(x))h(x)$ for some functions $g_\theta$ and $h$. Then
\[ \lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid x)}{\sup_{\theta \in \Theta} L(\theta \mid x)} = \frac{\sup_{\theta \in \Theta_0} g_\theta(T(x))h(x)}{\sup_{\theta \in \Theta} g_\theta(T(x))h(x)} = \frac{\sup_{\theta \in \Theta_0} g_\theta(T(x))}{\sup_{\theta \in \Theta} g_\theta(T(x))} = \frac{\sup_{\theta \in \Theta_0} L^*(\theta \mid T(x))}{\sup_{\theta \in \Theta} L^*(\theta \mid T(x))} = \lambda^*(T(x)), \]
where the 3rd equality holds because $h(x)$ does not depend on $\theta$ and the 4th equality holds because it can be shown that $g_\theta(T(x))$ is proportional to the pdf or pmf of $T(X)$.
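Theorem 8.2.4 can be checked numerically. For $N(\mu, 1)$ with $H_0: \mu = \mu_0$, the sufficient statistic is $T(X) = \bar{X} \sim N(\mu, 1/n)$, and the ratio built from $T$'s density alone reproduces $\lambda(x) = \exp(-n(\bar{x} - \mu_0)^2/2)$ from Example 8.2.2. The sketch below (simulated data, illustrative only) is one such check.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, mu0 = 15, 1.0
x = rng.normal(loc=1.4, scale=1.0, size=n)  # illustrative N(mu, 1) sample
t = x.mean()                                # sufficient statistic T(X) = Xbar

# lambda based on the full sample (Example 8.2.2 with sigma^2 = 1)
lam_x = np.exp(-n * (t - mu0) ** 2 / 2)

# lambda* based on T alone: T ~ N(mu, 1/n); restricted sup at mu0, unrestricted at mu = t
lam_t = norm.pdf(t, loc=mu0, scale=1 / np.sqrt(n)) / norm.pdf(t, loc=t, scale=1 / np.sqrt(n))

print(lam_x, lam_t)  # agree, as Theorem 8.2.4 asserts
```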

One-way analysis of variance (ANOVA)

Consider independent data $Y_{i1}, \ldots, Y_{in_i} \sim N(\mu_i, \sigma^2)$, $i = 1, \ldots, k$. We want to test whether the populations have a common mean,
\[ H_0: \mu_i = \mu \text{ for all } i \quad \text{versus} \quad H_1: \text{the } \mu_i\text{'s are not all equal}. \]
We now derive the LRT for testing $H_0$. Under $H_0$, the $Y_{ij}$'s are iid $N(\mu, \sigma^2)$, and the result about normal populations tells us that the MLE of $(\mu, \sigma^2)$ under $H_0$ is $\hat{\theta}_0 = (\bar{Y}, \mathrm{SST}/n)$, where
\[ \bar{Y} = \frac{1}{n}\sum_{i=1}^k \sum_{j=1}^{n_i} Y_{ij}, \qquad \mathrm{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 \]
are respectively the sample mean of the total sample and the sample variance of the total sample multiplied by $n - 1$, and $n = \sum_{i=1}^k n_i$ is the total sample size. In general, the MLE of $(\mu_1, \ldots, \mu_k, \sigma^2)$ is $(\bar{Y}_1, \ldots, \bar{Y}_k, \mathrm{SSW}/n)$, where
\[ \bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad \mathrm{SSW} = \sum_{i=1}^k \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \]

are respectively the sample mean of the $i$th sample (from population $i$) and the so-called within-population (group) sum of squares. Then, the likelihood ratio based on $X = (Y_{ij},\ j = 1, \ldots, n_i,\ i = 1, \ldots, k)$ is
\[ \lambda(X) = \frac{(2\pi\,\mathrm{SST}/n)^{-n/2} e^{-n/2}}{(2\pi\,\mathrm{SSW}/n)^{-n/2} e^{-n/2}} = \left(\frac{\mathrm{SSW}}{\mathrm{SST}}\right)^{n/2}. \]
To derive the cut-off value $c$ for the LRT that rejects $H_0$ iff $\lambda(X) < c$, we need to find the distribution of a monotone function of $\lambda(X)$. Note that
\[ \mathrm{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^k n_i (\bar{Y}_i - \bar{Y})^2 = \mathrm{SSW} + \mathrm{SSB}, \]
where SSB is the sum of squares between populations (groups). This identity is a special case of the so-called ANOVA decomposition. Then
\[ \lambda(X) = \left(1 + \frac{(k-1)F}{n-k}\right)^{-n/2}, \qquad F = \frac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(n-k)}, \]
i.e., $\lambda(X)$ is a strictly decreasing function of $F$.

We may show that $F$ has an F-distribution with degrees of freedom $k - 1$ and $n - k$, and this F-distribution is central under $H_0$, so that we reject $H_0$ if $F >$ the upper $\alpha$ quantile of the central F-distribution. But this result follows from the general result we will establish next. In applications, it is common to use the following ANOVA table.

ANOVA table
Source    df     SS     MS                  F
between   k-1    SSB    MSB = SSB/(k-1)     MSB/MSW
within    n-k    SSW    MSW = SSW/(n-k)
total     n-1    SST
df = degrees of freedom

Example 11.2.12
Source    df    SS        MS       F
between   3     995.90    331.97   26.09
within    15    190.83    12.72
total     18    1186.73
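A sketch of the one-way ANOVA computation follows, building SSB, SSW, and the F ratio from group means and checking the result against SciPy's implementation. The samples are invented for illustration and are not the data of Example 11.2.12.

```python
import numpy as np
from scipy import stats

# Illustrative samples from k = 3 populations
groups = [np.array([18.2, 20.1, 17.6, 19.3]),
          np.array([22.5, 24.0, 23.1, 21.8, 22.9]),
          np.array([19.9, 18.7, 20.4, 19.1])]

n = sum(g.size for g in groups)
k = len(groups)
grand = np.concatenate(groups).mean()

ssb = sum(g.size * (g.mean() - grand) ** 2 for g in groups)   # between groups
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within groups

F = (ssb / (k - 1)) / (ssw / (n - k))
p = stats.f.sf(F, k - 1, n - k)   # upper tail of the central F distribution
print(F, p)

# Cross-check with SciPy's one-way ANOVA
print(stats.f_oneway(*groups))
```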

Hypotheses concerning linear functions of $\beta$ in a linear model

Under the linear model with $Y \sim N(X\beta, \sigma^2 I_n)$, we consider
\[ H_0: L\beta = 0 \quad \text{versus} \quad H_1: L\beta \neq 0, \]
where $L$ is an $s \times p$ matrix of rank $s \le p$. For the special case of one-way ANOVA with $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, we have $p = k$, $s = k - 1$, and
\[ L = \begin{pmatrix} 1 & 0 & \cdots & 0 & -1 \\ 0 & 1 & \cdots & 0 & -1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -1 \end{pmatrix}. \]
We consider the likelihood ratio test. The likelihood function is
\[ L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{\|Y - X\beta\|^2}{2\sigma^2}\right\} \le \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{\|Y - X\hat{\beta}\|^2}{2\sigma^2}\right\} \]

since $\|Y - X\beta\|^2 \ge \|Y - X\hat{\beta}\|^2$ for any $\beta$ and the LSE $\hat{\beta}$. Treating the right-hand side of this expression as a function of $\sigma^2$, it is easy to show that it has a maximum at $\sigma^2 = \hat{\sigma}^2 = \|Y - X\hat{\beta}\|^2/n$ and
\[ \sup_{\beta, \sigma^2} L(\beta, \sigma^2) = (2\pi\hat{\sigma}^2)^{-n/2} e^{-n/2}. \]
Similarly, let $\hat{\beta}_{H_0}$ be the LSE under $H_0$ and $\hat{\sigma}^2_{H_0} = \|Y - X\hat{\beta}_{H_0}\|^2/n$. Then
\[ \sup_{L\beta = 0,\ \sigma^2} L(\beta, \sigma^2) = (2\pi\hat{\sigma}^2_{H_0})^{-n/2} e^{-n/2}. \]
Thus, the LR is
\[ \lambda = \left(\hat{\sigma}^2 / \hat{\sigma}^2_{H_0}\right)^{n/2} = \frac{\|Y - X\hat{\beta}\|^n}{\|Y - X\hat{\beta}_{H_0}\|^n}. \]
Next, we derive the distribution of a monotone function of $\lambda$ under $L\beta = 0$ so that we can obtain a size $\alpha$ test. For the given $L$, there exists a $(p - s) \times p$ matrix $M$ such that $G = \begin{pmatrix} M \\ L \end{pmatrix}$ is nonsingular. Let $\gamma = G\beta$, whose last $s$ components are 0's when $L\beta = 0$.

Note that
\[ X\beta = XG^{-1}G\beta = (XG^{-1})\gamma = W\gamma, \qquad W = XG^{-1}. \]
Then the problem reduces to testing whether the last $s$ components of $\gamma$ are 0's, with $Y \sim N(W\gamma, \sigma^2 I_n)$, $\hat{\gamma} = G\hat{\beta}$, and $X\hat{\beta} = W\hat{\gamma}$. Under $H_0$, $X\hat{\beta}_{H_0} = W_1(W_1'W_1)^{-1}W_1'Y$, where $W_1$ is the first $p - s$ columns of $W$, i.e., $W = (W_1\ W_2)$. Then
\[ \|Y - X\hat{\beta}\|^2 = \|Y - W(W'W)^{-1}W'Y\|^2 = Y'(I_n - H_W)Y, \]
where $H_W = W(W'W)^{-1}W'$, a projection matrix of rank $p$, and
\[ \|Y - X\hat{\beta}_{H_0}\|^2 = \|Y - W_1(W_1'W_1)^{-1}W_1'Y\|^2 = Y'(I_n - H_{W_1})Y, \]
where $H_{W_1} = W_1(W_1'W_1)^{-1}W_1'$, a projection matrix of rank $p - s$. Since
\[ Y'Y = Y'H_{W_1}Y + Y'(H_W - H_{W_1})Y + Y'(I_n - H_W)Y \]
and the sum of the ranks of $H_{W_1}$, $H_W - H_{W_1}$, and $I_n - H_W$ is $(p - s) + s + (n - p) = n$, by Cochran's theorem, $Y'(H_W - H_{W_1})Y$ and $Y'(I_n - H_W)Y$ are independently chi-square distributed.

Hence
\[ F = \frac{Y'(H_W - H_{W_1})Y / s}{Y'(I_n - H_W)Y / (n - p)} = \frac{\left[\|Y - X\hat{\beta}_{H_0}\|^2 - \|Y - X\hat{\beta}\|^2\right]/s}{\|Y - X\hat{\beta}\|^2/(n - p)}, \]
which is a decreasing function of the LR $\lambda$, has the noncentral F-distribution with degrees of freedom $s$ and $n - p$ and noncentrality parameter $\delta = \gamma'W'(H_W - H_{W_1})W\gamma/\sigma^2$. Under $H_0$, $\delta = 0$ (exercise) and, hence, the LR test of size $\alpha$ rejects $H_0$ when $F > F_{s, n-p, \alpha}$, and the power of this test is related to the noncentral F-distribution.

Examples. For the one-way ANOVA with $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, the result holds with $s = k - 1$ and $p = k$. Consider the two-way balanced ANOVA with $H_0: \alpha_1 = \cdots = \alpha_{a-1} = 0$. The statistic $F$ can be calculated using the SSR under the two-way balanced ANOVA model and the SSR under the reduced model with the $\alpha_i$'s equal to 0.
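The F statistic above can be computed from the residual sums of squares of the full and reduced fits, since $\|Y - X\hat{\beta}_{H_0}\|^2$ and $\|Y - X\hat{\beta}\|^2$ are exactly those quantities. The sketch below uses a small simulated regression and an illustrative choice of hypothesis, $H_0: \beta_1 = \beta_2 = 0$ (so the reduced model keeps only the intercept); it is a sketch under these assumptions, not a general routine.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, s = 50, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
beta_true = np.array([1.0, 0.0, 0.0])                       # H_0 holds here
Y = X @ beta_true + rng.normal(size=n)

# H_0: beta_1 = beta_2 = 0, i.e. L = [[0,1,0],[0,0,1]] with rank s = 2;
# the reduced model keeps only the intercept column.
def rss(design):
    resid = Y - design @ np.linalg.lstsq(design, Y, rcond=None)[0]
    return resid @ resid

rss_full = rss(X)          # ||Y - X beta_hat||^2
rss_null = rss(X[:, :1])   # ||Y - X beta_hat_{H0}||^2

F = ((rss_null - rss_full) / s) / (rss_full / (n - p))
p_value = stats.f.sf(F, s, n - p)   # central F under H_0
print(F, p_value)
```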

Alternatively, we can use the following decomposition:
\[ \sum_{i,j,k} (Y_{ijk} - \bar{Y})^2 = \sum_{i,j,k} \left(Y_{ijk} - \bar{Y}_{ij} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij}\right)^2 = bc\sum_i \hat{\alpha}_i^2 + ac\sum_j \hat{\beta}_j^2 + c\sum_{i,j} \hat{\gamma}_{ij}^2 + \sum_{i,j,k} \left(Y_{ijk} - \bar{Y}_{ij}\right)^2 = \mathrm{SSA} + \mathrm{SSB} + \mathrm{SSAB} + \mathrm{SSR}. \]
The LR tests of size $\alpha$ can be obtained using the following table.

ANOVA table
Source      df            SS      MS                          F
A           a-1           SSA     MSA = SSA/(a-1)             MSA/MSR
B           b-1           SSB     MSB = SSB/(b-1)             MSB/MSR
AB          (a-1)(b-1)    SSAB    MSAB = SSAB/[(a-1)(b-1)]    MSAB/MSR
Residual    ab(c-1)       SSR     MSR = SSR/[ab(c-1)]
total       abc-1         SST
df = degrees of freedom
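A sketch of this two-way balanced ANOVA table on simulated data (illustrative sizes $a = 2$, $b = 3$, $c = 4$): it computes SSA, SSB, SSAB, and SSR from the decomposition above and prints each F ratio with its central F upper-tail probability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, b, c = 2, 3, 4
# Simulated balanced data with a B main effect added for illustration
Y = rng.normal(size=(a, b, c)) + np.arange(b)[None, :, None] * 0.8

ybar = Y.mean()
yi = Y.mean(axis=(1, 2))      # level means of factor A
yj = Y.mean(axis=(0, 2))      # level means of factor B
yij = Y.mean(axis=2)          # cell means

ssa = b * c * ((yi - ybar) ** 2).sum()
ssb = a * c * ((yj - ybar) ** 2).sum()
ssab = c * ((yij - yi[:, None] - yj[None, :] + ybar) ** 2).sum()
ssr = ((Y - yij[:, :, None]) ** 2).sum()

msr = ssr / (a * b * (c - 1))
for name, ss, df in [("A", ssa, a - 1), ("B", ssb, b - 1),
                     ("AB", ssab, (a - 1) * (b - 1))]:
    F = (ss / df) / msr
    print(name, F, stats.f.sf(F, df, a * b * (c - 1)))
```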