8: Hypothesis Testing


Pearl Sharp

1 Some definitions

1.1 Simple, compound, null and alternative hypotheses

In test theory one distinguishes between simple hypotheses and compound hypotheses.

A simple hypothesis is a hypothesis that completely specifies the probability distribution. Examples:
- The parameter of this binomial distribution is p = 0.6.
- This distribution is a normal one of mean µ = 4.5 and standard deviation σ = .23.
- The new treatment gives identical results to the previous one.

A compound hypothesis does not completely specify the distribution. Examples:
- The parameter p of this binomial distribution is greater than 0.6.
- These two distributions of common variance have the same mean.
- The new treatment gives better results than the previous one.

Often one has to consider the alternative to the proposed hypothesis. For example, if the parameter of this binomial law is not 0.6, does this just mean that the data are not distributed according to p = 0.6, or more specifically that p = 0.7, or p < 0.6, or that the distribution is not binomial at all?

There is an asymmetry between the hypotheses one aims at checking:
- The default hypothesis, traditionally denoted $H_0$, is called the null hypothesis.
- The alternative hypothesis is denoted $H_1$.

1.2 Type I and Type II Errors

Two types of errors can be made: a Type I error happens when the null hypothesis $H_0$ is rejected though it should have been accepted, and a Type II error occurs when the alternative hypothesis $H_1$ is rejected though it should have been accepted or, equivalently, when the null hypothesis $H_0$ is accepted though it should have been rejected.

An example is a criminal trial in a democratic state, where the rule is "innocent until proven guilty". The null hypothesis is thus "he is innocent" and the alternative one is "he is guilty"; a Type I error consists of condemning an innocent person, and a Type II error consists of letting a guilty person go free.
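A test is a procedure which divides the space of observations into two regions: the rejection region $R$, where $H_0$ is rejected, and the acceptance region $A$, where it is accepted. The two important characteristics of a test are called significance and power, which refer to errors of Type I and Type II respectively:

$$\text{Significance} = \alpha = \mathrm{Prob}(x \in R \mid H_0) = \int_R \mathrm{Prob}(x \mid H_0)\,dx = 1 - \int_A \mathrm{Prob}(x \mid H_0)\,dx$$

$$\text{Power} = 1 - \beta, \qquad \beta = \mathrm{Prob}(x \in A \mid H_1) = \int_A \mathrm{Prob}(x \mid H_1)\,dx = 1 - \int_R \mathrm{Prob}(x \mid H_1)\,dx$$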
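As a minimal numerical sketch of these two definitions, assume a single measurement x which is Gaussian with unit variance, centred at 0 under the null hypothesis and at 2 under the alternative, and a rejection region of the form x > x_c. The two distributions, the cut values and the function name `significance_and_power` are illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist


def significance_and_power(h0, h1, x_c):
    """Return (alpha, power) for the test that rejects H0 when x > x_c."""
    alpha = 1.0 - h0.cdf(x_c)  # Prob(x in R | H0), with R = {x > x_c}
    beta = h1.cdf(x_c)         # Prob(x in A | H1), with A = {x <= x_c}
    return alpha, 1.0 - beta


# Assumed distributions for the sketch: N(0, 1) under H0, N(2, 1) under H1.
h0 = NormalDist(mu=0.0, sigma=1.0)
h1 = NormalDist(mu=2.0, sigma=1.0)

for x_c in (1.0, 1.645, 2.0):
    alpha, power = significance_and_power(h0, h1, x_c)
    print(f"x_c = {x_c:5.3f}   alpha = {alpha:.4f}   power = {power:.4f}")
```

Moving the cut x_c to larger values lowers the significance α but also lowers the power: the two quantities cannot be improved at the same time with this family of rejection regions.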

The determination of a test is usually a trade-off between α and β. One commonly encountered procedure is to set a priori the significance to a fixed value (α = 0.01, 0.05, ...) and find the most powerful test. To make β as small as possible for a given α, the integral over the chosen rejection region

$$\int_R \mathrm{Prob}(x \mid H_1)\,dx = 1 - \beta$$

must be as large as possible, for a given

$$\int_R \mathrm{Prob}(x \mid H_0)\,dx = \alpha.$$

In the case where the data consist of one measurement, say x, the choice of α sets β, through the test $x < x_c$. In other cases, different tests correspond to the same given α.

It should be noted that in the following nothing is known about the a priori probability, if such a thing exists, of the hypothesis $H_0$ with respect to that of $H_1$. For example, if we are dealing with the one-by-one identification of two types of cell in a test-tube, the formalism makes no use of their relative concentration.

Back to our example of a trial, the procedure is: given a priori a (low) risk of condemning an innocent person, what is the most powerful method to convict guilty people? However, in a non-democratic state, where the null hypothesis is "he is guilty", the procedure would be: given a priori a low risk of releasing a guilty person, what is the most powerful method to prove one's innocence? This shows how asymmetric $H_0$ and $H_1$ are.

2 The Neyman-Pearson Test

The Neyman-Pearson test applies to the case of a simple null hypothesis against a simple alternative hypothesis. The rejection region is determined by the following theorem: for a given α, the most powerful test rejects $H_0$ in a region such that

$$\frac{\mathrm{Prob}(x \mid H_1)}{\mathrm{Prob}(x \mid H_0)} > k.$$

Let's first give a rigorous proof, then a more intuitive one.

Rigorous proof. Let $R$ be the rejection region, defined by $\mathrm{Prob}(x \mid H_1) / \mathrm{Prob}(x \mid H_0) > k$. By definition of the significance α, we have $\mathrm{Prob}(x \in R \mid H_0) = \alpha$. Consider another test of significance $\alpha'$ and rejection region $S$, with $\alpha' \le \alpha$. We want to show that this new test is less powerful, i.e.
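that
$$\mathrm{Prob}(x \in S \mid H_1) < \mathrm{Prob}(x \in R \mid H_1).$$
One has, writing $\bar{R}$ and $\bar{S}$ for the complements of $R$ and $S$:
$$\alpha = \mathrm{Prob}(x \in R \cap S \mid H_0) + \mathrm{Prob}(x \in R \cap \bar{S} \mid H_0)$$
$$\alpha' = \mathrm{Prob}(x \in R \cap S \mid H_0) + \mathrm{Prob}(x \in \bar{R} \cap S \mid H_0)$$
$$\alpha' \le \alpha \;\Rightarrow\; \mathrm{Prob}(x \in \bar{R} \cap S \mid H_0) \le \mathrm{Prob}(x \in R \cap \bar{S} \mid H_0)$$
$R$ is defined by
$$\frac{\mathrm{Prob}(x \mid H_1)}{\mathrm{Prob}(x \mid H_0)} > k.$$
For any region $I$ inside $R$,
$$\mathrm{Prob}(x \in I \mid H_1) > k\,\mathrm{Prob}(x \in I \mid H_0)$$
and for any region $O$ outside $R$,
$$\mathrm{Prob}(x \in O \mid H_1) \le k\,\mathrm{Prob}(x \in O \mid H_0).$$
Thus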

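$$\mathrm{Prob}(x \in R \cap \bar{S} \mid H_0) < \frac{1}{k}\,\mathrm{Prob}(x \in R \cap \bar{S} \mid H_1)$$
$$\mathrm{Prob}(x \in \bar{R} \cap S \mid H_0) \ge \frac{1}{k}\,\mathrm{Prob}(x \in \bar{R} \cap S \mid H_1)$$
$$\frac{1}{k}\,\mathrm{Prob}(x \in \bar{R} \cap S \mid H_1) \le \mathrm{Prob}(x \in \bar{R} \cap S \mid H_0) \le \mathrm{Prob}(x \in R \cap \bar{S} \mid H_0) < \frac{1}{k}\,\mathrm{Prob}(x \in R \cap \bar{S} \mid H_1)$$
$$\mathrm{Prob}(x \in \bar{R} \cap S \mid H_1) < \mathrm{Prob}(x \in R \cap \bar{S} \mid H_1)$$
Adding to both sides of the last inequality the quantity $\mathrm{Prob}(x \in R \cap S \mid H_1)$, one gets the final result:
$$\mathrm{Prob}(x \in S \mid H_1) < \mathrm{Prob}(x \in R \mid H_1).$$
The requirement that $H_0$ and $H_1$ be simple is essential to be able to write expressions involving probabilities like $\mathrm{Prob}(x \in S \mid H_1)$.

A more intuitive proof can be given as follows: assume we have defined the rejection region $R$ as above. Let's further assume we want to slightly modify it, keeping the same significance α; this is achieved by adding a small region $\Delta_2$ and removing a small region $\Delta_1$, of equal weight in terms of $H_0$: $\mathrm{Prob}(x \in \Delta_1 \mid H_0) = \mathrm{Prob}(x \in \Delta_2 \mid H_0)$. If we want to increase the power of the test, we must fulfill the condition
$$\mathrm{Prob}(x \in \Delta_2 \mid H_1) > \mathrm{Prob}(x \in \Delta_1 \mid H_1)$$
which is, by construction of $R$, impossible: since $\Delta_1$ lies inside $R$ and $\Delta_2$ outside, $\mathrm{Prob}(x \in \Delta_2 \mid H_1) \le k\,\mathrm{Prob}(x \in \Delta_2 \mid H_0) = k\,\mathrm{Prob}(x \in \Delta_1 \mid H_0) < \mathrm{Prob}(x \in \Delta_1 \mid H_1)$.

3 The Neyman-Pearson Theorem at work

3.1 First Example: Gaussian distributions

Consider n observations from a Gaussian distribution of unknown mean µ but of known variance $\sigma^2$. µ can be either $\mu_0$ (null hypothesis) or $\mu_1$ (alternative hypothesis). Note that both hypotheses are simple ones. Let's assume that $\mu_1 > \mu_0$. The likelihood of the observations is
$$\mathrm{Prob}(x \mid \mu) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} = \frac{1}{(\sqrt{2\pi})^n \sigma^n}\, e^{-\frac{\sum_i (x_i - \mu)^2}{2\sigma^2}} = \frac{1}{(\sqrt{2\pi})^n \sigma^n}\, e^{-\frac{n\mu^2}{2\sigma^2}}\, e^{\frac{n\mu\bar{x}}{\sigma^2}}\, e^{-\frac{\sum_i x_i^2}{2\sigma^2}}$$
where $\bar{x} = \frac{1}{n}\sum_i x_i$ is the sample mean. In the framework of the Neyman-Pearson test, the ratio of the likelihoods reads:
$$\frac{\mathrm{Prob}(x \mid \mu_1)}{\mathrm{Prob}(x \mid \mu_0)} = K\, e^{\frac{n(\mu_1 - \mu_0)\bar{x}}{\sigma^2}}$$
where $K$ is a constant which does not depend on the observations. The Neyman-Pearson theorem tells us to reject $H_0$ if
$$e^{\frac{n(\mu_1 - \mu_0)\bar{x}}{\sigma^2}} > k$$
where $k$ is here the generic term for a constant. Since $\mu_1 - \mu_0 > 0$, the test will reject the hypothesis $\mu = \mu_0$ if $\bar{x} > \mu_c$. The value of $\mu_c$ is determined by the equation
$$\mathrm{Prob}(\bar{x} > \mu_c \mid \mu = \mu_0) = \alpha.$$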
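The equation for the critical value can be solved numerically, using the fact that the mean of n Gaussian observations is itself Gaussian, with standard deviation σ/√n. The sketch below is only an illustration: the values µ0 = 0, µ1 = 1, σ = 2, n = 25 and the function names `critical_mean` and `power_of_test` are assumptions, not part of the text.

```python
from math import sqrt
from statistics import NormalDist


def critical_mean(mu0, sigma, n, alpha):
    """Cut mu_c such that Prob(xbar > mu_c | mu = mu0) = alpha."""
    xbar_under_h0 = NormalDist(mu0, sigma / sqrt(n))
    return xbar_under_h0.inv_cdf(1.0 - alpha)


def power_of_test(mu1, sigma, n, mu_c):
    """Prob(xbar > mu_c | mu = mu1), i.e. 1 - beta."""
    xbar_under_h1 = NormalDist(mu1, sigma / sqrt(n))
    return 1.0 - xbar_under_h1.cdf(mu_c)


# Assumed illustrative values.
mu0, mu1, sigma, n, alpha = 0.0, 1.0, 2.0, 25, 0.05

mu_c = critical_mean(mu0, sigma, n, alpha)
z = (mu_c - mu0) / (sigma / sqrt(n))  # cut in units of sigma / sqrt(n)
power = power_of_test(mu1, sigma, n, mu_c)

print(f"mu_c = {mu_c:.4f}  (mu0 + {z:.3f} sigma/sqrt(n))")
print(f"power against mu1 = {mu1}: {power:.4f}")
```

The cut depends on the null hypothesis and on α only; the value of µ1 enters the power but not the position of the cut, as long as µ1 > µ0.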

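As an example, if we set α = 0.05, this corresponds to $\mu_c = \mu_0 + 1.645\,\sigma/\sqrt{n}$. If we now test $H_1$ against $H_0$, we get the result that one has to reject $H_1$ if $\bar{x} < \mu_d$, with
$$\mathrm{Prob}(\bar{x} < \mu_d \mid \mu = \mu_1) = \alpha, \qquad \mu_d = \mu_1 - 1.645\,\sigma/\sqrt{n} \ \text{ for } \alpha = 0.05.$$
These striking results show how asymmetric the roles played by the null and the alternative hypotheses are.

3.2 Second Example: Binomial distributions

We have performed n observations from a binomial law, and got r successes. The null hypothesis is "the parameter of this distribution is $p = p_0$", and the alternative one is "$p = p_1$", with $p_1 > p_0$. In the Neyman-Pearson formalism, the ratio of likelihoods is
$$\frac{\mathrm{Prob}(r \mid H_1)}{\mathrm{Prob}(r \mid H_0)} = \left(\frac{p_1}{p_0}\right)^r \left(\frac{1 - p_1}{1 - p_0}\right)^{n - r} = \left(\frac{1 - p_1}{1 - p_0}\right)^n \left(\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right)^r$$
and the test rejects $H_0$ if
$$\left(\frac{1 - p_1}{1 - p_0}\right)^n \left(\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right)^r > k \;\Leftrightarrow\; r \log\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)} > k \;\Leftrightarrow\; r > r_c$$
where k again stands for a generic constant and the last step uses $p_1 > p_0$, which makes the logarithm positive.

A numerical example will introduce the concept of randomized tests: assume n = 10, $p_0 = 0.5$, $p_1 = 0.6$. Let's set the significance to α = 0.05. We are looking for $r_c$ such that $\mathrm{Prob}(r > r_c \mid p = 0.5) = 0.05$. Looking at the tables shows that
$$\mathrm{Prob}(r > 7 \mid p = 0.5) = 56/1024 \approx 0.0547, \qquad \mathrm{Prob}(r > 8 \mid p = 0.5) = 11/1024 \approx 0.0107.$$
Two options are open:
- Change the significance of the test to α ≈ 0.0547 by rejecting if r > 7;
- Decide that in the case of r = 8, $H_0$ is rejected with a probability γ such that
$$\gamma\,\mathrm{Prob}(r = 8 \mid p = 0.5) + \mathrm{Prob}(r > 8 \mid p = 0.5) = 0.05.$$
One finds γ ≈ 0.89. In other words, chance will help in deciding.

Let's now consider a slightly modified test: we set a priori r to a given value, and perform experiments until we get r successes, which requires n experiments. The formulae read:
$$\mathrm{Prob}(n) = \binom{n - 1}{r - 1}\, p^r (1 - p)^{n - r}$$
$$\frac{\mathrm{Prob}(n \mid p = p_1)}{\mathrm{Prob}(n \mid p = p_0)} = \left(\frac{p_1}{p_0}\right)^r \left(\frac{1 - p_0}{1 - p_1}\right)^r \left(\frac{1 - p_1}{1 - p_0}\right)^n$$
$$\frac{\mathrm{Prob}(n \mid p = p_1)}{\mathrm{Prob}(n \mid p = p_0)} > k \;\Leftrightarrow\; \left(\frac{1 - p_1}{1 - p_0}\right)^n > k$$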

where the last inequality comes from the fact that this is the only term which depends on n. This leads to rejecting the hypothesis $p = p_0$ if $n < n_c$, since $\frac{1 - p_1}{1 - p_0} < 1$. One can see, comparing with the first method described at the beginning of this section, that the same pair of results (n, r) can lead to a different conclusion, depending on the way it was obtained. This is the classical criticism addressed by Bayesian statisticians to the Neyman-Pearson theory.

4 Uniformly Powerful Tests

4.1 Introduction

In the previous two sections we have discussed test theory in the case of a simple null hypothesis against a simple alternative hypothesis. We have seen that the Neyman-Pearson theorem gives the framework for finding the most powerful test, given a priori the significance. In this section we will extend these results to compound hypotheses, restricting ourselves to a specific class of distributions: the exponential family.

4.2 Distributions of the Exponential family

The exponential family comprises all distributions which can be generically written:
$$\mathrm{Prob}(x, \theta) = C(\theta)\, h(x)\, e^{\theta T(x)}$$
Let's start with the particular case of the exponential distribution
$$\mathrm{Prob}(x, \mu) = \frac{1}{\mu}\, e^{-x/\mu}$$
and let's again test the null hypothesis $\mu = \mu_0$ against the alternative one $\mu = \mu_1$, with $\mu_1 > \mu_0$. The ratio of likelihoods can be written
$$\frac{\mathrm{Prob}(x \mid \mu = \mu_1)}{\mathrm{Prob}(x \mid \mu = \mu_0)} = \left(\frac{\mu_0}{\mu_1}\right)^n e^{\frac{\sum x_i}{\mu_0} - \frac{\sum x_i}{\mu_1}}$$
$$\frac{\mathrm{Prob}(x \mid \mu = \mu_1)}{\mathrm{Prob}(x \mid \mu = \mu_0)} > k \;\Leftrightarrow\; \frac{\sum x_i}{\mu_0} - \frac{\sum x_i}{\mu_1} > k \;\Leftrightarrow\; \frac{1}{n}\sum x_i > x_c$$
where $x_c$ is defined by
$$\mathrm{Prob}\!\left(\frac{1}{n}\sum x_i > x_c \,\Big|\, \mu = \mu_0\right) = \alpha.$$
In the general case, the result is basically the same:
$$\frac{\mathrm{Prob}(x \mid \theta = \theta_1)}{\mathrm{Prob}(x \mid \theta = \theta_0)} = \left(\frac{C(\theta_1)}{C(\theta_0)}\right)^n e^{\theta_1 \sum T(x_i)}\, e^{-\theta_0 \sum T(x_i)}$$
$$\frac{\mathrm{Prob}(x \mid \theta = \theta_1)}{\mathrm{Prob}(x \mid \theta = \theta_0)} > k \;\Leftrightarrow\; (\theta_1 - \theta_0) \sum T(x_i) > k$$
As a consequence, for $\theta_1 > \theta_0$, the rejection region is
$$\frac{1}{n}\sum T(x_i) > T_c.$$
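For the exponential distribution discussed above, the cut on the sample mean can be computed explicitly. The sketch below relies on a standard result which is not derived in the text: the sum of n independent exponential variables of mean µ follows an Erlang distribution, whose tail probability is a finite sum. The values n = 10, µ0 = 1, µ1 = 2 and the function names `erlang_sf` and `critical_mean` are illustrative assumptions.

```python
from math import exp


def erlang_sf(s, n, mu):
    """Prob(sum of n independent exponential(mean mu) variables > s)."""
    t = s / mu
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= t / j      # term is now t**j / j!
        total += term
    return exp(-t) * total


def critical_mean(n, mu0, alpha, tol=1e-10):
    """Cut x_c with Prob(mean > x_c | mu = mu0) = alpha, by bisection."""
    lo, hi = 0.0, mu0
    while erlang_sf(n * hi, n, mu0) > alpha:  # enlarge until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_sf(n * mid, n, mu0) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed illustrative values.
n, mu0, mu1, alpha = 10, 1.0, 2.0, 0.05

x_c = critical_mean(n, mu0, alpha)
power = erlang_sf(n * x_c, n, mu1)  # Prob(mean > x_c | mu = mu1)

print(f"x_c = {x_c:.4f}, power against mu1 = {mu1}: {power:.4f}")
```

As in the Gaussian example, the cut is fixed by µ0 and α alone; µ1 only enters the power.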

4.3 Simple null hypothesis against compound unilateral alternative hypothesis

Let's now consider the test of the simple hypothesis $H_0: \theta = \theta_0$ against the compound unilateral alternative $H_1: \theta > \theta_0$. We just saw that the test of $H_0$ against any simple alternative $H_1: \theta = \theta_1 > \theta_0$ does not depend at all on the value of $\theta_1$. This test is therefore the most powerful one for each of the simple hypotheses composing $H_1$. It is said to be uniformly most powerful.

4.4 Simple null hypothesis against compound bilateral alternative hypothesis

We now want to test the simple hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta \ne \theta_0$. The situation becomes more complex. One possibility is to envisage a reasonable solution: reject $H_0$ if $\sum T(x_i) < k_1$ or if $\sum T(x_i) > k_2$, sharing the risk of Type I errors:
$$\mathrm{Prob}\left(\sum T(x_i) < k_1 \mid H_0\right) = \frac{\alpha}{2}, \qquad \mathrm{Prob}\left(\sum T(x_i) > k_2 \mid H_0\right) = \frac{\alpha}{2}.$$

5 Null compound hypothesis against simple alternative

In this section we will show a generalization of the Neyman-Pearson theorem to the case of a compound null hypothesis against a simple alternative one. $H_0$ is the union of a (possibly infinite) number of simple hypotheses. Any test of $H_0$ against $H_1$ will always split the set of observations into two areas, the acceptance region $A$ and the rejection region $R$. In such a situation one can generally not set a priori the risk of a Type I error to a given value α. At best, one can set the conditions
$$\mathrm{Prob}(x \in R \mid H_0') \le \alpha \ \text{ for every simple hypothesis } H_0' \text{ composing } H_0,$$
$$\mathrm{Prob}(x \in R \mid H_1) \ \text{maximum}.$$
The generalization of the Neyman-Pearson theorem reads: given $H_0$ composed of $H_0', H_0'', H_0^{(3)}, \ldots$, the most powerful test of $H_0$ against the simple hypothesis $H_1$ rejects if
$$\mathrm{Prob}(x \mid H_1) > k_0'\,\mathrm{Prob}(x \mid H_0') + k_0''\,\mathrm{Prob}(x \mid H_0'') + \cdots + k_0^{(n)}\,\mathrm{Prob}(x \mid H_0^{(n)}),$$
the $k_0^{(n)}$ being chosen to fulfill the condition
$$\mathrm{Prob}(x \in R \mid H_0^{(i)}) \le \alpha \ \text{ for each of } H_0', H_0'', \ldots, H_0^{(n)}.$$

Example: Let x be a sample from a Gaussian distribution $N(\mu, 1)$; one wishes to test $H_0: \mu = \mu_0'$ or $\mu_0''$ against $H_1: \mu = \mu_1$. Let's assume $\mu_0' < \mu_0'' < \mu_1$.
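The extended Neyman-Pearson theorem yields (factors which do not depend on the observations being absorbed into the constants):
$$e^{n\mu_1\bar{x}} > k_0'\, e^{n\mu_0'\bar{x}} + k_0''\, e^{n\mu_0''\bar{x}}$$
$$e^{n(\mu_1 - \mu_0'')\bar{x}} > k_0'' + k_0'\, e^{n(\mu_0' - \mu_0'')\bar{x}}$$
The problem is now to determine $k_0'$ and $k_0''$. They can't both be negative, otherwise we would always reject $H_0$. Let's assume that $k_0'$ is positive and $k_0''$ either positive or negative. In such a case the left-hand side increases with $\bar{x}$ while the right-hand side decreases, so the test rejects if $\bar{x} > x_c$, which was a predictable result. Since $\mu_0' < \mu_0''$, the properties of the Gaussian distribution give
$$\mathrm{Prob}(\bar{x} > x_c \mid \mu_0'') > \mathrm{Prob}(\bar{x} > x_c \mid \mu_0').$$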

In order to limit the risk of a Type I error to α, one has to choose $x_c$ so that $\mathrm{Prob}(\bar{x} > x_c \mid \mu_0'') = \alpha$. There remains the case of $k_0'$ negative and $k_0''$ positive. Depending on the value of $k_0''$, this would lead either to always rejecting $H_0$, or to accepting it only inside a given interval of $\bar{x}$, which is impossible. Conclusion: $k_0', k_0'' > 0$; the test is: reject $H_0$ if $\bar{x} > x_c$, with $x_c$ such that $\mathrm{Prob}(\bar{x} > x_c \mid \mu_0'') = \alpha$.

6 A last word

Absence of evidence is not evidence of absence.
