Lecture 26: Likelihood ratio tests

Likelihood ratio

When both $H_0$ and $H_1$ are simple (i.e., $\Theta_0 = \{\theta_0\}$ and $\Theta_1 = \{\theta_1\}$), Theorem 6.1 applies and a UMP test rejects $H_0$ when $f_{\theta_1}(X)/f_{\theta_0}(X) > c_0$ for some $c_0 > 0$. The following definition is a natural extension of this idea.

Definition 6.2

Let $l(\theta) = f_\theta(X)$ be the likelihood function. For testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$, a likelihood ratio (LR) test is any test that rejects $H_0$ if and only if $\lambda(x) < c$, where $c \in [0,1]$ and $\lambda(x)$ is the likelihood ratio defined by
$$\lambda(x) = \sup_{\theta \in \Theta_0} l(\theta) \Big/ \sup_{\theta \in \Theta} l(\theta).$$
Discussions

If $\lambda(x)$ is well defined, then $\lambda(x) \le 1$. The rationale behind LR tests is that when $H_0$ is true, $\lambda(x)$ tends to be close to 1, whereas when $H_1$ is true, $\lambda(x)$ tends to be away from 1 (i.e., closer to 0).

If there is a sufficient statistic, then $\lambda(x)$ depends only on the sufficient statistic.

LR tests are as widely applicable as MLEs in §4.4 and, in fact, they are closely related to MLEs. If $\hat\theta$ is an MLE of $\theta$ and $\hat\theta_0$ is an MLE of $\theta$ subject to $\theta \in \Theta_0$ (i.e., $\Theta_0$ is treated as the parameter space), then
$$\lambda(x) = l(\hat\theta_0)\big/ l(\hat\theta).$$

For a given $\alpha \in (0,1)$, if there exists a $c_\alpha \in [0,1]$ such that
$$\sup_{\theta \in \Theta_0} P_\theta\big(\lambda(X) < c_\alpha\big) = \alpha,$$
then an LR test of size $\alpha$ can be obtained. Even when the c.d.f. of $\lambda(X)$ is continuous or randomized LR tests are introduced, it is still possible that such a $c_\alpha$ does not exist.
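As a quick numerical illustration of the relation $\lambda(x) = l(\hat\theta_0)/l(\hat\theta)$, the following sketch computes the LR statistic for a normal mean with known variance and $H_0: \theta \le 0$. The data, sample size, and use of numpy/scipy are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy import stats

def lr_statistic(x, theta0_upper=0.0):
    """LR statistic lambda(x) = l(theta0_hat) / l(theta_hat) for
    N(theta, 1) data and H0: theta <= theta0_upper (illustrative example)."""
    xbar = x.mean()
    theta_hat = xbar                       # unrestricted MLE
    theta0_hat = min(xbar, theta0_upper)   # MLE restricted to Theta_0
    loglik = lambda th: stats.norm.logpdf(x, loc=th, scale=1.0).sum()
    return np.exp(loglik(theta0_hat) - loglik(theta_hat))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=20)   # hypothetical data
print(lr_statistic(x))   # < 1 when xbar > 0; equals 1 when xbar <= 0
```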
Optimality

When a UMP or UMPU test exists, an LR test is often the same as this optimal test.

Proposition 6.5

Suppose that $X$ has a p.d.f. in a one-parameter exponential family:
$$f_\theta(x) = \exp\{\eta(\theta) Y(x) - \xi(\theta)\} h(x)$$
w.r.t. a $\sigma$-finite measure $\nu$, where $\eta$ is a strictly increasing and differentiable function of $\theta$.

(i) For testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, there is an LR test whose rejection region is the same as that of the UMP test $T$ given in Theorem 6.2.

(ii) For testing $H_0: \theta \le \theta_1$ or $\theta \ge \theta_2$ versus $H_1: \theta_1 < \theta < \theta_2$, there is an LR test whose rejection region is the same as that of the UMP test $T$ given in Theorem 6.3.

(iii) For testing the other two-sided hypotheses, there is an LR test whose rejection region is equivalent to $Y(X) < c_1$ or $Y(X) > c_2$ for some constants $c_1$ and $c_2$.
Proof

We prove (i) only. Let $\hat\theta$ be the MLE of $\theta$. Note that $l(\theta)$ is increasing when $\theta \le \hat\theta$ and decreasing when $\theta > \hat\theta$. Thus,
$$\lambda(x) = \begin{cases} 1 & \hat\theta \le \theta_0 \\ l(\theta_0)/l(\hat\theta) & \hat\theta > \theta_0. \end{cases}$$
Then $\lambda(x) < c$ is the same as $\hat\theta > \theta_0$ and $l(\theta_0)/l(\hat\theta) < c$.

From the property of exponential families, $\hat\theta$ is a solution of the likelihood equation
$$\frac{\partial \log l(\theta)}{\partial \theta} = \eta'(\theta) Y(x) - \xi'(\theta) = 0,$$
and $\psi(\theta) = \xi'(\theta)/\eta'(\theta)$ has a positive derivative $\psi'(\theta)$. Since $\eta'(\hat\theta) Y - \xi'(\hat\theta) = 0$, $\hat\theta$ is an increasing function of $Y$ and $d\hat\theta/dY > 0$.
Proof (continued)

Consequently, for any $\theta_0 \in \Theta$,
$$\begin{aligned}
\frac{d}{dY}\left[\log l(\hat\theta) - \log l(\theta_0)\right]
&= \frac{d}{dY}\left[\eta(\hat\theta) Y - \xi(\hat\theta) - \eta(\theta_0) Y + \xi(\theta_0)\right] \\
&= \frac{d\hat\theta}{dY}\,\eta'(\hat\theta) Y + \eta(\hat\theta) - \frac{d\hat\theta}{dY}\,\xi'(\hat\theta) - \eta(\theta_0) \\
&= \frac{d\hat\theta}{dY}\left[\eta'(\hat\theta) Y - \xi'(\hat\theta)\right] + \eta(\hat\theta) - \eta(\theta_0) \\
&= \eta(\hat\theta) - \eta(\theta_0),
\end{aligned}$$
which is positive (or negative) if $\hat\theta > \theta_0$ (or $\hat\theta < \theta_0$), i.e., $\log l(\hat\theta) - \log l(\theta_0)$ is strictly increasing in $Y$ when $\hat\theta > \theta_0$ and strictly decreasing in $Y$ when $\hat\theta < \theta_0$. Hence, for any $c \in (0,1)$, the event that $\hat\theta > \theta_0$ and $l(\theta_0)/l(\hat\theta) < c$ is equivalent to $Y > d$ for some $d \in \mathbb{R}$.
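A small numerical check of part (i), under an assumed concrete family (this is an illustration, not part of the proof): for i.i.d. $N(\theta,1)$ data, $Y = \sum_i X_i$, and on the set $\{\hat\theta > \theta_0\}$ one has $\lambda(x) = \exp\{-n(\bar x - \theta_0)^2/2\}$, which is strictly decreasing in $Y$, so $\lambda(x) < c$ reduces to $Y > d$.

```python
import numpy as np

# Illustrative check for N(theta, 1), H0: theta <= theta0 (assumed example).
# Here Y = sum(x), theta_hat = mean(x), and for theta_hat > theta0
#   lambda(x) = exp(-n * (xbar - theta0)**2 / 2),
# which is decreasing in Y on that set, so {lambda < c} = {Y > d}.
n, theta0 = 25, 0.0
xbar = np.linspace(theta0 + 0.01, theta0 + 1.0, 200)  # grid with theta_hat > theta0
Y = n * xbar
lam = np.exp(-n * (xbar - theta0) ** 2 / 2)
assert np.all(np.diff(lam) < 0)    # lambda strictly decreases as Y increases
c = 0.1
d = Y[lam < c].min()               # smallest Y in the rejection region
print(f"lambda(x) < {c}  <=>  Y > {d:.3f} (approximately, on this grid)")
```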
Example 6.20

Consider the testing problem $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ based on i.i.d. $X_1, \ldots, X_n$ from the uniform distribution $U(0, \theta)$. We now show that the UMP test with rejection region $X_{(n)} > \theta_0$ or $X_{(n)} \le \theta_0 \alpha^{1/n}$ given in Exercise 19(c) is an LR test.

Note that $l(\theta) = \theta^{-n} I_{(X_{(n)}, \infty)}(\theta)$. Hence
$$\lambda(x) = \begin{cases} (X_{(n)}/\theta_0)^n & X_{(n)} \le \theta_0 \\ 0 & X_{(n)} > \theta_0, \end{cases}$$
and $\lambda(x) < c$ is equivalent to $X_{(n)} > \theta_0$ or $X_{(n)}/\theta_0 < c^{1/n}$. Taking $c = \alpha$ ensures that the LR test has size $\alpha$.
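The size claim in Example 6.20 is easy to check by simulation; the sketch below (with an arbitrary choice of $n$, $\theta_0$, and $\alpha$) estimates $P_{\theta_0}(\lambda(X) < \alpha)$ by Monte Carlo.

```python
import numpy as np

# Monte Carlo check (illustrative settings) that the LR test of Example 6.20
# with c = alpha has size alpha: under theta = theta0 we reject iff
# X_(n) > theta0 (probability 0) or X_(n) < theta0 * alpha**(1/n).
rng = np.random.default_rng(1)
n, theta0, alpha, reps = 10, 2.0, 0.05, 200_000
x = rng.uniform(0.0, theta0, size=(reps, n))
x_max = x.max(axis=1)
reject = (x_max > theta0) | (x_max < theta0 * alpha ** (1 / n))
print(reject.mean())   # should be close to 0.05
```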
Example 6.21

Consider the normal linear model $X \sim N_n(Z\beta, \sigma^2 I_n)$ and the hypotheses $H_0: L\beta = 0$ versus $H_1: L\beta \ne 0$, where $L$ is an $s \times p$ matrix of rank $s \le r$ and all rows of $L$ are in $R(Z)$.
Example 6.21 (continued)

The likelihood function in this problem is
$$l(\theta) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}\|X - Z\beta\|^2\right\}, \qquad \theta = (\beta, \sigma^2).$$
Since $\|X - Z\beta\|^2 \ge \|X - Z\hat\beta\|^2$ for any $\beta$ and the LSE $\hat\beta$,
$$l(\theta) \le \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}\|X - Z\hat\beta\|^2\right\}.$$
Treating the right-hand side of this expression as a function of $\sigma^2$, it is easy to show that it has a maximum at $\sigma^2 = \hat\sigma^2 = \|X - Z\hat\beta\|^2/n$ and
$$\sup_{\theta \in \Theta} l(\theta) = (2\pi\hat\sigma^2)^{-n/2} e^{-n/2}.$$
Similarly, let $\hat\beta_{H_0}$ be the LSE under $H_0$ and $\hat\sigma^2_{H_0} = \|X - Z\hat\beta_{H_0}\|^2/n$. Then
$$\sup_{\theta \in \Theta_0} l(\theta) = (2\pi\hat\sigma^2_{H_0})^{-n/2} e^{-n/2}.$$
Thus,
$$\lambda(X) = \left(\hat\sigma^2/\hat\sigma^2_{H_0}\right)^{n/2} = \left(\frac{\|X - Z\hat\beta\|^2}{\|X - Z\hat\beta_{H_0}\|^2}\right)^{n/2}.$$
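A minimal numerical sketch of this computation, under assumptions not in the lecture (a small simulated design, and the restricted LSE obtained by reparameterizing the null model through a basis of the null space of $L$): it computes $\hat\sigma^2$, $\hat\sigma^2_{H_0}$, and $\lambda(X) = (\hat\sigma^2/\hat\sigma^2_{H_0})^{n/2}$.

```python
import numpy as np
from scipy.linalg import null_space

def lr_normal_linear(X, Z, L):
    """lambda(X) = (sigma2_hat / sigma2_hat_H0)**(n/2) for H0: L beta = 0.
    Restricted LSE obtained by writing beta = N gamma, with the columns of N
    spanning the null space of L (illustrative sketch)."""
    n = len(X)
    beta_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)        # unrestricted LSE
    sigma2_hat = np.sum((X - Z @ beta_hat) ** 2) / n
    N = null_space(L)                                        # basis of {beta : L beta = 0}
    gamma_hat, *_ = np.linalg.lstsq(Z @ N, X, rcond=None)    # restricted LSE
    sigma2_hat_H0 = np.sum((X - Z @ N @ gamma_hat) ** 2) / n
    return (sigma2_hat / sigma2_hat_H0) ** (n / 2)

# Hypothetical example: p = 3 covariates, H0: beta_1 = beta_2.
rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 3))
X = Z @ np.array([1.0, 1.0, -0.5]) + rng.normal(size=50)
L = np.array([[1.0, -1.0, 0.0]])
print(lr_normal_linear(X, Z, L))
```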
Example 6.21 (continued)

For a two-sample problem, we let $n = n_1 + n_2$, $\beta = (\mu_1, \mu_2)$, and
$$Z = \begin{pmatrix} J_{n_1} & 0 \\ 0 & J_{n_2} \end{pmatrix}.$$
Testing $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$ is the same as testing $H_0: L\beta = 0$ versus $H_1: L\beta \ne 0$ with $L = (1 \;\; {-1})$.

Since $\hat\beta_{H_0} = \bar X$ and $\hat\beta = (\bar X_1, \bar X_2)$, where $\bar X_1$ and $\bar X_2$ are the sample means based on $X_1, \ldots, X_{n_1}$ and $X_{n_1+1}, \ldots, X_n$, respectively, we have
$$n\hat\sigma^2 = \sum_{i=1}^{n_1}(X_i - \bar X_1)^2 + \sum_{i=n_1+1}^{n}(X_i - \bar X_2)^2 = (n_1 - 1)S_1^2 + (n_2 - 1)S_2^2$$
and
$$n\hat\sigma^2_{H_0} = (n-1)S^2 = \frac{n_1 n_2}{n}(\bar X_1 - \bar X_2)^2 + (n_1 - 1)S_1^2 + (n_2 - 1)S_2^2.$$
Example 6.21 (continued)

Therefore, $\lambda(X) < c$ is equivalent to $|t(X)| > c_0$, where
$$t(X) = \frac{\bar X_2 - \bar X_1}{\sqrt{\left(n_1^{-1} + n_2^{-1}\right)\left[(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right]/(n_1 + n_2 - 2)}},$$
and LR tests are the same as the two-sample two-sided t-tests in §6.2.3.
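The exact monotone relationship behind this equivalence is $\lambda(X) = \left[1 + t(X)^2/(n_1+n_2-2)\right]^{-n/2}$, which follows from the two displays above. The sketch below (with made-up data) computes $\lambda$ and $t$ directly and checks this identity numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, size=12)    # hypothetical sample 1
x2 = rng.normal(0.8, 1.0, size=15)    # hypothetical sample 2
n1, n2 = len(x1), len(x2)
n = n1 + n2

# Direct computation of lambda(X) = (sigma2_hat / sigma2_hat_H0)**(n/2)
ss_within = (n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)
pooled = np.concatenate([x1, x2])
ss_total = ((pooled - pooled.mean()) ** 2).sum()   # = (n-1) S^2
lam = (ss_within / ss_total) ** (n / 2)

# Two-sample t statistic with pooled variance
sp2 = ss_within / (n - 2)
t = (x2.mean() - x1.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Identity linking the two: lambda = (1 + t**2 / (n - 2)) ** (-n / 2)
print(lam, (1 + t ** 2 / (n - 2)) ** (-n / 2))   # the two numbers agree
```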
Asymptotic tests

It is often difficult to construct tests (such as LR tests) with exact size $\alpha$ or level $\alpha$; in such cases asymptotic approximations can be used. Statistical inference based on asymptotic criteria and approximations is called asymptotic statistical inference or, simply, asymptotic inference. We now focus on asymptotic hypothesis tests.
Definition 2.13

Let $X = (X_1, \ldots, X_n)$ be a sample from $P \in \mathcal{P}$ and $T_n(X)$ be a test for $H_0: P \in \mathcal{P}_0$ versus $H_1: P \in \mathcal{P}_1$.

(i) If $\limsup_n \alpha_{T_n}(P) \le \alpha$ for any $P \in \mathcal{P}_0$, then $\alpha$ is an asymptotic significance level of $T_n$.

(ii) If $\lim_n \sup_{P \in \mathcal{P}_0} \alpha_{T_n}(P)$ exists, it is called the limiting size of $T_n$.

(iii) $T_n$ is consistent iff the type II error probability converges to 0, i.e., $\lim_n [1 - \alpha_{T_n}(P)] = 0$ for any $P \in \mathcal{P}_1$.

Discussion

If $\mathcal{P}_0$ is not a parametric family, it is likely that the limiting size of $T_n$ is 1 (see, e.g., Example 2.37). This is the reason why we consider the weaker requirement in Definition 2.13(i).

If $\alpha \in (0,1)$ is a pre-assigned level of significance for the problem, then a consistent test $T_n$ having asymptotic significance level $\alpha$ is called asymptotically correct, and a consistent test having limiting size $\alpha$ is called strongly asymptotically correct.
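To make Definition 2.13(i) concrete, the following sketch (an illustration with assumed settings, not part of the lecture) estimates the rejection probability $\alpha_{T_n}(P)$ of the one-sample t-test under a non-normal $P \in \mathcal{P}_0$ for increasing $n$; the estimates approaching the nominal $\alpha$ illustrate an asymptotic significance level.

```python
import numpy as np
from scipy import stats

# Monte Carlo estimate of the rejection probability alpha_{T_n}(P) of the
# one-sample t-test for H0: mean <= 0 when P is a centered exponential
# distribution (non-normal, mean 0). Illustrative settings throughout.
rng = np.random.default_rng(4)
alpha, reps = 0.05, 20_000
for n in (10, 50, 200, 500):
    x = rng.exponential(scale=1.0, size=(reps, n)) - 1.0     # mean 0 under P
    tstat = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    reject = tstat > stats.t.ppf(1 - alpha, df=n - 1)
    print(n, reject.mean())   # approaches alpha = 0.05 as n grows
```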