Lecture 13: p-values and union-intersection tests

p-values

After a hypothesis test is done, one method of reporting the result is to report the size α of the test used to reject H0 or accept H0. If α is small, rejecting H0 is convincing, but if α is large, rejecting H0 is not very convincing. Another way of reporting the result of a hypothesis test is to report the value of a certain kind of test statistic called a p-value.

Definition 8.3.26. A p-value p(X) is a test statistic satisfying 0 ≤ p(x) ≤ 1 for every sample value x. Small values of p(x) give evidence that H1 is true. A p-value is valid iff for every θ ∈ Θ0 and every α ∈ [0,1],

Pθ(p(X) ≤ α) ≤ α.

If p(X) is a valid p-value, then the test that rejects H0 iff p(X) ≤ α is a level α test.

UW-Madison (Statistics), Stat 610, Lecture 13, 2016
An advantage of reporting a test result via a p-value is that each person can choose the α he or she considers appropriate, compare the p-value to α, and know whether the data lead to acceptance or rejection of H0. The smaller the p-value, the stronger the evidence for rejecting H0. A p-value reports the result of a test on a more continuous scale, rather than just the dichotomous decision "accept" or "reject" H0.

The most common way to define a p-value is as the probability, under the null hypothesis, that a test statistic is at least as extreme as its observed value. The following result shows the validity of this p-value.

Theorem 8.3.27. Let W(X) be a statistic such that large values of W give evidence that H1 is true. For each observed sample value x, define

p(x) = sup_{θ∈Θ0} Pθ(W(X) ≥ W(x)).

Then p(x) is a valid p-value.
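As a quick Monte Carlo sanity check of this claim, the following sketch simulates a two-sided one-sample t statistic under H0: µ = µ0 and verifies empirically that Pθ(p(X) ≤ α) ≈ α (the t statistic is continuous, so the bound holds with equality). The sample size, seed, and replication count are illustrative choices, not part of the lecture.

```python
# Monte Carlo sanity check of Theorem 8.3.27: under theta in Theta_0,
# P_theta(p(X) <= alpha) <= alpha (here with equality, since the t statistic
# is continuous). Sample size, seed, and replication count are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, n_sim, mu0 = 10, 20000, 0.0
pvals = np.empty(n_sim)
for i in range(n_sim):
    x = rng.normal(loc=mu0, scale=1.0, size=n)        # H0: mu = mu0 holds
    t = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
    pvals[i] = 2 * stats.t.sf(abs(t), df=n - 1)       # W = |t|, p = sup P(W >= w)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, np.mean(pvals <= alpha))  # empirical P(p(X) <= alpha)
```

The empirical frequencies track each nominal α closely, as the theorem predicts.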
Proof. For a fixed θ ∈ Θ0, let Fθ(w) denote the cdf of −W(X), and let

pθ(x) = Pθ(W(X) ≥ W(x)) = Pθ(−W(X) ≤ −W(x)) = Fθ(−W(x)).

For the random variable pθ(X) = Fθ(−W(X)), the probability integral transformation (Exercise 2.10) gives

Pθ(pθ(X) ≤ α) ≤ α for every α ∈ [0,1].

Since p(x) = sup_{θ'∈Θ0} pθ'(x) ≥ pθ(x) for every x,

Pθ(p(X) ≤ α) ≤ Pθ(pθ(X) ≤ α) ≤ α.

This is true for every θ ∈ Θ0 and every α ∈ [0,1]; hence p(X) is a valid p-value.

Another common way to define a p-value involves a class of tests Tα indexed by α such that Tα is a level α test. We can then define the p-value to be the smallest level α at which H0 would be rejected for the observed data x, i.e.,

p(x) = inf{α ∈ (0,1) : Tα(x) rejects H0}.
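For the two-sided t test, this infimum can be approximated numerically on a grid of α values and compared with the tail-probability p-value of Theorem 8.3.27; the two coincide. A sketch, with an illustrative observed statistic t_obs and sample size n:

```python
# Sketch: approximate p(x) = inf{ alpha : the size-alpha test rejects H0 }
# on a grid of alpha values and compare with 2 P(T_{n-1} > |t(x)|).
# The observed statistic t_obs and sample size n are illustrative.
import numpy as np
from scipy import stats

n, t_obs = 20, 2.3
alphas = np.linspace(1e-4, 1 - 1e-4, 100_000)
# the size-alpha test rejects iff |t_obs| > t_{n-1, alpha/2}
rejects = abs(t_obs) > stats.t.isf(alphas / 2, df=n - 1)
p_inf = alphas[rejects].min()                   # infimum over the grid
p_tail = 2 * stats.t.sf(abs(t_obs), df=n - 1)   # Theorem 8.3.27 p-value
print(p_inf, p_tail)                            # the two nearly coincide
```

The grid infimum agrees with the tail probability up to the grid spacing, illustrating that the two definitions match here.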
Thus, if p(x) is observed, we reject H0 if α ≥ p(x) and accept H0 if α < p(x), based on the observed data x. We now show that this p-value p(x) is valid. Note that inf{t ∈ (0,1) : Tt(x) rejects H0} ≤ α implies that Tα(x) rejects H0 (assuming the tests are nested, so that rejection at level t implies rejection at any level ≥ t). Hence, for any θ ∈ Θ0,

Pθ(p(X) ≤ α) ≤ Pθ(Tα(X) rejects H0) ≤ α,

because Tα has level α.

Example 8.3.28 (two-sided normal p-value). Let X1,...,Xn be iid from N(µ,σ²) with unknown µ ∈ R and σ² > 0. Consider testing the two-sided hypotheses H0: µ = µ0 versus H1: µ ≠ µ0 with a constant µ0. From the last lecture, a UMPU test rejects H0 for large values of |t(X)|, where t(X) = √n(X̄ − µ0)/S. If µ = µ0 (H0 holds), regardless of the value of σ, t(X) has the t-distribution with n − 1 degrees of freedom and, hence,
sup_{θ∈Θ0} Pθ( |t(X)| > |t(x)| ) = P_{µ=µ0}( |t(X)| > |t(x)| ) = P( |T_{n−1}| > |t(x)| ) = 2P( T_{n−1} > |t(x)| ),

where T_{n−1} denotes a random variable with the t-distribution with n − 1 degrees of freedom, which is symmetric about 0. Hence, the p(x) defined in Theorem 8.3.27 is 2P(T_{n−1} > |t(x)|).

On the other hand, the UMPU test of size α rejects H0 iff |t(x)| > t_{n−1,α/2}, where P(T_{n−1} > t_{n−1,α/2}) = α/2. Since t_{n−1,α/2} is a continuous function of α, we obtain

inf{t ∈ (0,1) : the UMPU test of size t rejects H0} = 2P(T_{n−1} > |t(x)|).

Thus, the two definitions of the p-value agree.

Example 8.3.29 (one-sided normal p-value). Let X1,...,Xn be iid from N(µ,σ²) with unknown µ ∈ R and σ² > 0. Consider testing the one-sided hypotheses H0: µ ≤ µ0 versus H1: µ > µ0 with a constant µ0.
From the last lecture, the UMPU test of size α rejects H0 for large values of t(X) = √n(X̄ − µ0)/S. The p-value defined in Theorem 8.3.27 is

sup_{θ∈Θ0} Pθ( t(X) ≥ t(x) )
  = sup_{µ≤µ0} Pµ( √n(X̄ − µ0)/S ≥ t(x) )
  = sup_{µ≤µ0} Pµ( √n(X̄ − µ)/S ≥ t(x) + √n(µ0 − µ)/S )
  = sup_{µ≤µ0} Pµ( T_{n−1} ≥ t(x) + √n(µ0 − µ)/S )
  = P( T_{n−1} ≥ t(x) ),

since √n(µ0 − µ)/S ≥ 0 when µ ≤ µ0, so the supremum is attained at µ = µ0. Similar to the two-sided case,

inf{t ∈ (0,1) : the UMPU test of size t rejects H0} = P(T_{n−1} ≥ t(x)).
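Both normal-theory p-values can be computed directly from the t tail probabilities and checked against SciPy's one-sample t test. A sketch on simulated data (the data and µ0 are illustrative; the alternative='greater' keyword assumes SciPy ≥ 1.6):

```python
# Two-sided p-value 2 P(T_{n-1} > |t(x)|) (Example 8.3.28) and one-sided
# p-value P(T_{n-1} >= t(x)) (Example 8.3.29), compared with
# scipy.stats.ttest_1samp. Data and mu0 are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=20)
mu0, n = 0.0, len(x)

t_obs = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
p_two = 2 * stats.t.sf(abs(t_obs), df=n - 1)   # two-sided p-value
p_one = stats.t.sf(t_obs, df=n - 1)            # one-sided p-value

res_two = stats.ttest_1samp(x, popmean=mu0)
res_one = stats.ttest_1samp(x, popmean=mu0, alternative='greater')
print(p_two, res_two.pvalue)   # these agree
print(p_one, res_one.pvalue)   # these agree
```

The hand-computed tail probabilities match SciPy's p-values, confirming that the library implements exactly these formulas for the one-sample t test.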
Union-intersection tests

In some problems, a test for a complicated null hypothesis can be built from tests for simpler null hypotheses. The union-intersection method deals with hypotheses of the form

H0: θ ∈ ∩_{γ∈Γ} Θγ versus H1: θ ∈ ∪_{γ∈Γ} Θγᶜ,

where Θγ ⊂ Θ for each γ ∈ Γ and Γ is an index set that may be finite or infinite. Suppose that, for each γ ∈ Γ, we have a test of H0γ: θ ∈ Θγ versus H1γ: θ ∈ Θγᶜ with rejection region Rγ. Then the rejection region for the union-intersection test (UIT) is

R = ∪_{γ∈Γ} Rγ.

This is because H0 is true iff H0γ holds for every γ ∈ Γ, so if any one H0γ is rejected, H0 must be rejected; only if every H0γ is accepted is H0 accepted.
If each rejection region has the form Rγ = {x : Tγ(x) > c}, where Tγ is a statistic and c does not depend on γ, then the rejection region for the UIT becomes

∪_{γ∈Γ} {x : Tγ(x) > c} = {x : sup_{γ∈Γ} Tγ(x) > c}.

The form of sup_{γ∈Γ} Tγ(x) may be derived, as the following simple example indicates. Other examples can be found in Chapter 11.

Example 8.2.8. Let X1,...,Xn be iid from N(µ,σ²) with unknown µ ∈ R and σ² > 0. Consider Γ = {L, U} (two elements), H0L: µ ≤ µ0 and H0U: µ ≥ µ0, where µ0 is a constant. Then

H0: µ ∈ {µ ≤ µ0} ∩ {µ ≥ µ0} = {µ0},

i.e., the intersection of two one-sided null hypotheses is a two-sided null hypothesis.
The LRT (as well as the UMPU test) of size α for testing H0L: µ ≤ µ0 versus H1L: µ > µ0 rejects H0L iff √n(X̄ − µ0)/S ≥ t_{n−1,α}. The LRT (as well as the UMPU test) of size α for testing H0U: µ ≥ µ0 versus H1U: µ < µ0 rejects H0U iff √n(X̄ − µ0)/S ≤ −t_{n−1,α}, which is the same as −√n(X̄ − µ0)/S ≥ t_{n−1,α}. Then

sup_{γ=L,U} Tγ(X) = max{ √n(X̄ − µ0)/S, −√n(X̄ − µ0)/S } = √n |X̄ − µ0| / S,

and the UIT for testing H0: µ = µ0 versus H1: µ ≠ µ0 rejects H0 iff √n |X̄ − µ0| / S ≥ t_{n−1,α}, which is the LRT or UMPU test of size 2α.

Intersection-union tests

The intersection-union method may be useful when the null hypothesis is conveniently expressed as a union:

H0: θ ∈ ∪_{γ∈Γ} Θγ versus H1: θ ∈ ∩_{γ∈Γ} Θγᶜ.
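The size-2α claim in Example 8.2.8 can be checked by simulation: under H0: µ = µ0, combining the two size-α one-sided t tests by union-intersection rejects with probability 2α. A sketch with illustrative constants:

```python
# Example 8.2.8 by simulation: the UIT rejects iff either size-alpha
# one-sided t test rejects, i.e. iff sqrt(n)|Xbar - mu0|/S >= t_{n-1,alpha};
# its size under H0: mu = mu0 is 2*alpha. Constants and seed are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_sim, alpha, mu0 = 12, 20000, 0.05, 0.0
crit = stats.t.isf(alpha, df=n - 1)             # t_{n-1,alpha}

n_reject = 0
for _ in range(n_sim):
    x = rng.normal(loc=mu0, scale=1.0, size=n)  # H0 holds
    t = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
    n_reject += (t >= crit) or (-t >= crit)     # union of the two rejections

print(n_reject / n_sim)  # close to 2 * alpha = 0.10
```

The empirical rejection frequency is near 0.10, twice the nominal level of each component test, matching the size-2α conclusion.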
Suppose that, for each γ ∈ Γ, we have a test of H0γ: θ ∈ Θγ versus H1γ: θ ∈ Θγᶜ with rejection region Rγ. Then the rejection region for the intersection-union test (IUT) is

R = ∩_{γ∈Γ} Rγ.

H0 is false iff all of the H0γ are false, so H0 can be rejected iff each H0γ can be rejected. Again, the test simplifies if each Rγ has the form {x : Tγ(x) > c} with a c not depending on γ, in which case the rejection region for the IUT is

R = ∩_{γ∈Γ} {x : Tγ(x) > c} = {x : inf_{γ∈Γ} Tγ(x) > c}.

Example 8.2.9. Let X1,...,Xn be iid from N(µ,σ²) with unknown µ ∈ R and σ² > 0, and let Y1,...,Ym be iid Bernoulli variables with unknown p ∈ (0,1). For quality control, we want µ > µ0 and p > p0, where µ0 and p0 are constants.
Then we would like to test

H0: {µ ≤ µ0} ∪ {p ≤ p0} versus H1: {µ > µ0} ∩ {p > p0}.

The LRT or UMPU test of size α rejects H01: µ ≤ µ0 iff √n(X̄ − µ0)/S > t_{n−1,α}, and the LRT or UMP test rejects H02: p ≤ p0 iff Σ_{i=1}^m Yi > b for a constant b. The rejection region for the IUT is

{ √n(X̄ − µ0)/S > t_{n−1,α} and Σ_{i=1}^m Yi > b }.

The level and size of a UIT or IUT

Although UITs and IUTs are easy to construct, their properties may not be easy to obtain. The type I error probabilities of UITs and IUTs can often be bounded above by the level or size of another test. Such bounds are useful for deriving the level of a UIT or IUT, but deriving the size of a UIT or IUT may be difficult, because the bounds may not be sharp.
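The IUT of Example 8.2.9 can be sketched in code: H0 is rejected only if both component tests reject. Here µ0, p0, and the simulated data are illustrative, and the binomial critical value b is chosen so the second test has level α (an assumption, since the lecture leaves b generic):

```python
# IUT of Example 8.2.9: reject H0 only if BOTH component tests reject.
# mu0, p0, alpha, and the simulated data are illustrative; b is chosen so
# that P(Binomial(m, p0) > b) <= alpha, giving a level-alpha binomial test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu0, p0, alpha = 0.0, 0.5, 0.05
x = rng.normal(loc=0.8, scale=1.0, size=25)  # measurements
y = rng.binomial(1, 0.9, size=30)            # Bernoulli quality indicators
n, m = len(x), len(y)

t = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
reject_mu = t > stats.t.isf(alpha, df=n - 1)       # test of H01: mu <= mu0

b = stats.binom.isf(alpha, m, p0)                  # binomial critical value
reject_p = y.sum() > b                             # test of H02: p <= p0

reject_H0 = reject_mu and reject_p                 # intersection of regions
print(reject_H0)
```

Requiring both rejections is exactly the intersection of the two rejection regions, which is why the IUT never rejects more often than either component test.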
Theorem 8.3.21. Let λγ(X) be the LRT statistic for testing H0γ: θ ∈ Θγ versus H1γ: θ ∈ Θγᶜ, γ ∈ Γ, and let λ(X) be the LRT statistic for testing H0: θ ∈ Θ0 versus H1: θ ∈ Θ0ᶜ, where Θ0 = ∩_{γ∈Γ} Θγ. Define T(X) = inf_{γ∈Γ} λγ(X) and form the UIT with rejection region

{x : λγ(x) < c for some γ ∈ Γ} = {x : T(x) < c}.

Then
a. T(x) ≥ λ(x) for every x;
b. if βT(θ) and βλ(θ) are the power functions of the tests based on T and λ, respectively, then βT(θ) ≤ βλ(θ) for every θ ∈ Θ;
c. if the LRT is a level α test, then the UIT is a level α test.

Proof. Since Θ0 = ∩_{γ∈Γ} Θγ ⊂ Θγ for any γ ∈ Γ, by the definition of the LRT,

λγ(x) ≥ λ(x) for every x and every γ ∈ Γ,

which implies that

T(x) = inf_{γ∈Γ} λγ(x) ≥ λ(x) for every x.

This proves a.
By a, {x : T(x) < c} ⊂ {x : λ(x) < c}, so b follows because

βT(θ) = Pθ(T(X) < c) ≤ Pθ(λ(X) < c) = βλ(θ), θ ∈ Θ.

From b, if the LRT has level α, then

sup_{θ∈Θ0} βT(θ) ≤ sup_{θ∈Θ0} βλ(θ) ≤ α.

Theorem 8.3.21 shows that the LRT is more powerful than the UIT, and hence the LRT should be used when it can be derived. However, the LRT may not be easy to construct, and that is the reason we consider UITs.

Theorem 8.3.21 also shows that the UIT may have smaller type I error probabilities than the LRT, although we may be satisfied as long as all type I error probabilities are ≤ α; the size of the UIT may be smaller than the size of the LRT. If H0 is rejected, the UIT may provide more information about which H0γ is rejected. In some special cases, the UIT is the same as the LRT, i.e., T(x) = λ(x) for every x (e.g., Example 8.2.8).
Theorem 8.3.23. For every γ ∈ Γ, let αγ be the size of the test with rejection region Rγ for testing H0γ: θ ∈ Θγ versus H1γ: θ ∈ Θγᶜ. For testing H0: θ ∈ Θ0 versus H1: θ ∈ Θ0ᶜ, where Θ0 = ∪_{γ∈Γ} Θγ, the IUT with rejection region R = ∩_{γ∈Γ} Rγ has level sup_{γ∈Γ} αγ.

Proof. Let θ ∈ Θ0. Then θ ∈ Θγ for some γ ∈ Γ, and

Pθ(X ∈ R) ≤ Pθ(X ∈ Rγ) ≤ αγ ≤ sup_{γ∈Γ} αγ.

The result follows since θ ∈ Θ0 is arbitrary.

Typically, each αγ = α, so that the IUT is of level α. In Theorem 8.3.21 the tests are LRTs, but in Theorem 8.3.23 the tests can be arbitrary; note that an LRT of a given size or level may not be easy to obtain. Theorem 8.3.23 gives a level for the IUT, not its size. The following result gives conditions under which the size of the IUT is exactly α.
Theorem 8.3.24. Let Θ0 = ∪_{j=1}^k Θj, where k is a fixed positive integer and Θj ⊂ Θ. For each j = 1,...,k, let Rj be the rejection region of a level α test of H0j: θ ∈ Θj versus H1j: θ ∈ Θjᶜ. Suppose that for some integer i, 1 ≤ i ≤ k, there exists a sequence of parameter values θl ∈ Θi, l = 1,2,..., such that
(i) lim_{l→∞} Pθl(X ∈ Ri) = α;
(ii) for every integer j ≠ i, 1 ≤ j ≤ k, lim_{l→∞} Pθl(X ∈ Rj) = 1.
Then the IUT with rejection region ∩_{j=1}^k Rj is a size α test for testing H0: θ ∈ Θ0 versus H1: θ ∈ Θ0ᶜ.

Proof. By Theorem 8.3.23, the size of the IUT is ≤ α. Since all θl ∈ Θi ⊂ Θ0,

IUT's size = sup_{θ∈Θ0} Pθ(X ∈ R) ≥ lim_{l→∞} Pθl(X ∈ R) = lim_{l→∞} Pθl(X ∈ ∩_{j=1}^k Rj).

By Bonferroni's inequality, the last quantity is no smaller than

lim_{l→∞} [ Σ_{j=1}^k Pθl(X ∈ Rj) − (k − 1) ] = (k − 1) + α − (k − 1) = α,

by conditions (i) and (ii). Thus, the size must be α.

Example 8.3.25. In Example 8.2.9 (k = 2), let X = (X1,...,Xn), Y = (Y1,...,Ym), θ = (µ,σ²,p), Θ1 = {µ ≤ µ0}, and Θ2 = {p ≤ p0}. For θl = (µ0, 1, 1 − l⁻¹), l = 1,2,..., we have θl ∈ Θ0 = Θ1 ∪ Θ2 for all l ≥ 2, and

lim_{l→∞} Pθl( √n(X̄ − µ0)/S > t_{n−1,α} ) = α,
lim_{l→∞} Pθl( Σ_{i=1}^m Yi > b ) = 1.

Hence, Theorem 8.3.24 applies and the size of the IUT is α.

The size of test j in Theorem 8.3.24 does not have to be α. In Example 8.3.25, only the marginal distributions of the Xi's and Yi's are needed, not the joint distribution of (X,Y).
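The limiting argument of Example 8.3.25 can be checked by simulation: at a parameter value θl = (µ0, 1, p) with p near 1, the IUT's rejection probability is close to α. A sketch with illustrative constants (p = 0.999 plays the role of 1 − l⁻¹ for large l, and b is a level-α binomial critical value, an assumed concrete choice):

```python
# Example 8.3.25 by simulation: at theta_l = (mu0, 1, p) with p near 1,
# the IUT's rejection probability approaches alpha, so its size is alpha.
# Constants, seed, and replication count are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, m, alpha, mu0, p0 = 20, 30, 0.05, 0.0, 0.5
crit_t = stats.t.isf(alpha, df=n - 1)
b = stats.binom.isf(alpha, m, p0)     # level-alpha binomial critical value

n_sim, p_near_1 = 20000, 0.999        # p = 1 - 1/l with l = 1000
n_reject = 0
for _ in range(n_sim):
    x = rng.normal(mu0, 1.0, size=n)
    y = rng.binomial(1, p_near_1, size=m)
    t = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
    n_reject += (t > crit_t) and (y.sum() > b)

print(n_reject / n_sim)  # close to alpha = 0.05
```

With p this close to 1 the binomial test rejects almost surely, so the IUT's rejection probability is driven by the t test alone, which is exactly the mechanism behind conditions (i) and (ii).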