Lecture Notes 15 Hypothesis Testing (Chapter 10)

1 Itroductio Lecture Notes 15 Hypothesis Testig Chapter 10) Let X 1,..., X p θ x). Suppose we we wat to kow if θ = θ 0 or ot, where θ 0 is a specific value of θ. For example, if we are flippig a coi, we may wat to kow if the coi is fair; this correspods to p = 1/2. If we are testig the effect of two drugs whose meas effects are θ 1 ad θ 2 we may be iterested to kow if there is o differece, which correspods to θ 1 θ 2 = 0. We formalize this by statig a ull hypothesis H 0 ad a alterative hypothesis H 1. For example: H 0 : θ = θ 0 versus θ θ 0. More geerally, cosider a parameter space Θ. We cosider H 0 : θ Θ 0 versus H 1 : θ Θ 1 where Θ 0 Θ 1 =. If Θ 0 cosists of a sigle poit, we call this a simple ull hypothesis. If Θ 0 cosists of more tha oe poit, we call this a composite ull hypothesis. Example 1 X 1,..., X Beroullip). H 0 : p = 1 2 H 1 : p 1 2. The questio is ot whether H 0 is true or false. The questio is whether there is sufficiet evidece to reject H 0, much like a court case. Our possible actios are: reject H 0 or retai do t reject) H 0. H 0 true H 1 true Decisio Retai H 0 Reject H 0 Type I error false positive) Type II error false egative) Warig: Hypothesis testig should oly be used whe it is appropriate. Ofte times, people use hypothesis testig whe it would be much more appropriate to use cofidece itervals. 1

Notatio: Let Φ be the cdf of a stadard Normal radom variable Z. For 0 < α < 1, let z α = Φ 1 1 α). Hece, P Z > z α ) = α. Also, P Z < z α ) = α. I these otes we sometimes write px; θ) istead of p θ x). 2 Costructig Tests Hypothesis testig ivolves the followig steps: 1. Choose a test statistic T = T X 1,..., X ). 2. Choose a rejectio regio R. 3. If T R we reject H 0 otherwise we retai H 0. Example 2 Let X 1,..., X Beroullip). Suppose we test H 0 : p = 1 2 H 1 : p 1 2. Let T = 1 i=1 X i ad R = {x 1,..., x : T x 1,..., x ) 1/2 > δ}. So we reject H 0 if T 1/2 > δ. We eed to choose T ad R so that the test has good statistical properties. We will cosider the followig tests: 1. The Neyma-Pearso Test 2. The Wald test 3. The Likelihood Ratio Test LRT) 4. The permutatio test. Before we discuss these methods, we first eed to talk about how we evaluate tests. 3 Error Rates ad Power Suppose we reject H 0 whe X 1,..., X ) R. Defie the power fuctio by βθ) = P θ X 1,..., X R). We wat βθ) to be small whe θ Θ 0 ad we wat βθ) to be large whe θ Θ 1. The geeral strategy is: 1. Fix α [0, 1]. 2

2. Now try to maximize βθ) for θ Θ 1 subject to βθ) α for θ Θ 0. We eed the followig defiitios. A test is size α if sup βθ) α. θ Θ 0 Example 3 X 1,..., X Nθ, σ 2 ) with σ 2 kow. Suppose we test H 0 : θ = θ 0, H 1 : θ > θ 0. This is called a oe-sided alterative. Suppose we reject H 0 if T > c where The βθ) = P θ X θ 0 = P T = X θ 0. ) > c Z > c + θ where Φ is the cdf of a stadard Normal ad Z Φ. Now To get a size α test, set 1 Φc) = α so that X θ = P θ > c + θ ) ) 1 Φ c + θ ) sup βθ) = βθ 0 ) = 1 Φc). θ Θ 0 c = z α where z α = Φ 1 1 α). Our test is: reject H 0 whe T = X θ 0 > z α. Example 4 X 1,..., X Nθ, σ 2 ) with σ 2 kow. Suppose H 0 : θ = θ 0, H 1 : θ θ 0. This is called a two-sided alterative. We will reject H 0 if T > c where T is defied as before. Now βθ) = P θ T < c) + P θ T > c) ) ) X θ 0 = P θ < c X θ 0 + P θ > c = P Z < c + θ ) σ/ + P Z > c + θ ) = Φ c + θ ) σ/ + 1 Φ c + θ ) = Φ c + θ ) σ/ + Φ c θ ) 3

sice Φ x) = 1 Φx). The size is βθ 0 ) = 2Φ c). To get a size α test we set 2Φ c) = α so that c = Φ 1 α/2) = Φ 1 1 α/2) = z α/2. The test is: reject H 0 whe T = X θ 0 > z α/2. 4 The Neyma-Pearso Test Not i the book.) Let C α deote all level α tests. A test i C α with power fuctio β is uiformly most powerful UMP) if the followig holds: if β is the power fuctio of ay other test i C α the βθ) β θ) for all θ Θ 1. Cosider testig H 0 : θ = θ 0 versus H 1 : θ = θ 1. Simple ull ad simple alterative.) Theorem 5 Let Lθ) = px 1,..., X ; θ) ad T = Lθ 1) Lθ 0 ). Suppose we reject H 0 if T > k where k is chose so that This test is a UMP level α test. P θ0 X R) = α. The Neyma-Pearso test is quite limited because it ca be used oly for testig a simple ull versus a simple alterative. So it does ot get used i practice very ofte. But it is importat from a coceptual poit of view. 5 The Wald Test Let T = θ θ 0 se where θ is a asymptotically Normal estimator ad se is the estimated stadard error of θ or the stadard error uder H 0 ). Uder H 0, T N0, 1). Hece, a asymptotic level α test is to reject whe T > z α/2. That is P θ0 T > z α ) α. For example, with Beroulli data, to test H 0 : p = p 0, T = p p 0 4 p1 p).

You ca also use T = p p 0 p 0 1 p 0 ) I other words, to compute the stadard error, you ca replace θ with a estimate θ or by the ull value θ 0. 6 The Likelihood Ratio Test LRT) This test is simple: reject H 0 if λx 1,..., x ) c where where θ 0 maximizes Lθ) subject to θ Θ 0. Example 6 X 1,..., X Nθ, 1). Suppose After some algebra, So λx 1,..., x ) = sup θ Θ 0 Lθ) sup θ Θ Lθ) = L θ 0 ) L θ) H 0 : θ = θ 0, H 1 : θ θ 0. λ = exp { } 2 X θ 0 ) 2. R = {x : λ c} = {x : X θ 0 c } where c = 2 log c/. Choosig c to make this level α gives: reject if T > z α/2 where T = X θ 0 ) which is the test we costructed before. Example 7 X 1,..., X Nθ, σ 2 ). Suppose The H 0 : θ = θ 0, H 1 : θ θ 0. λx 1,..., x ) = Lθ 0, σ 0 ) L θ, σ) where σ 0 maximizes the likelihood subject to θ = θ 0. Exercise: Show that λx 1,..., x ) < c correspods to rejectig whe T > k for some costat k where T = X θ 0 S/. Uder H 0, T has a t-distributio with 1 degrees of freedom. So the fial test is: reject H 0 if T > t 1,α/2. This is called Studet s t-test. It was iveted by William Gosset workig at Guiess Breweries ad writig uder the pseudoym Studet. 5.

We ca simplify the LRT by usig a asymptotic approximatio. First, some otatio: Notatio: Let W χ 2 p. Defie χ 2 p,α by P W > χ 2 p,α) = α. Theorem 8 Cosider testig H 0 : θ = θ 0 versus H 1 : θ θ 0 where θ R. Uder H 0, 2 log λx 1,..., X ) χ 2 1. Hece, if we let T = 2 log λx ) the P θ0 T > χ 2 1,α) α as. Proof. Usig a Taylor expasio: ad so lθ) l θ) + l θ)θ θ) + l 2 θ θ) θ) 2 = l θ) + l 2 θ θ) θ) 2 P 2 log λx 1,..., x ) = 2l θ) 2lθ 0 ) 2l θ) 2l θ) l θ)θ θ) 2 = l θ)θ θ) 2 = l θ) I θ 0 ) I θ 0 ) θ θ 0 )) 2 = A B. Now A 1 by the WLLN ad B N0, 1). The result follows by Slutsky s theorem. Example 9 X 1,..., X Poissoλ). We wat to test H 0 : λ = λ 0 versus H 1 : λ λ 0. The 2 log λx ) = 2[λ 0 λ) λ logλ 0 / λ)]. We reject H 0 whe 2 log λx ) > χ 2 1,α. Now suppose that θ = θ 1,..., θ k ). Suppose that H 0 : θ Θ 0 fixes some of the parameters. The, uder coditios, where T = 2 log λx 1,..., X ) χ 2 ν ν = dimθ) dimθ 0 ). Therefore, a asymptotic level α test is: reject H 0 whe T > χ 2 ν,α. 6

Example 10 Cosider a multiomial with θ = p 1,..., p 5 ). So Suppose we wat to test Lθ) = p y 1 1 p y 5 5. H 0 : p 1 = p 2 = p 3 ad p 4 = p 5 versus the alterative that H 0 is false. I this case ν = 4 1 = 3. The LRT test statistic is 5 i=1 λx 1,..., x ) = j 0j 5 i=1 py j j where p j = Y j /, p 10 = p 20 = p 30 = Y 1 + Y 2 + Y 3 )/, p 40 = p 50 = 1 3 p 10 )/2. These calculatios are o p 491. Make sure you uderstad them. Now we reject H 0 if 2λX 1,..., X ) > χ 2 3,α. 7 p-values Whe we test at a give level α we will reject or ot reject. It is useful to summarize what levels we would reject at ad what levels we woud ot reject at. The p-value is the smallest α at which we would reject H 0. I other words, we reject at all α p. So, if the pvalue is 0.03, the we would reject at α = 0.05 but ot at α = 0.01. Hece, to test at level α whe p < α. Theorem 11 Suppose we have a test of the form: reject whe T X 1,..., X ) > c. The the p-value is p = sup θ Θ 0 P θ T X 1,..., X ) T x 1,..., x )) where x 1,..., x are the observed data ad X 1,..., X p θ0. Example 12 X 1,..., X Nθ, 1). Test that H 0 : θ = θ 0 versus H 1 : θ θ 0. We reject whe T is large, where T = X θ 0 ). Let t be the obsrved value of T. Let Z N0, 1). The, p = P θ0 X θ 0 ) > t ) = P Z > t ) = 2Φ t ). Theorem 13 Uder H 0, p Uif0, 1). Importat. Note that p is NOT equal to P H 0 X 1,..., X ). The latter is a Bayesia quatity which we will discuss later. 7

8 The Permutatio Test This is a very cool test. approximatios. Suppose we have data ad We wat to test: Let Create labels It is distributio free ad it does ot ivolve ay asymptotic X 1,..., X F Y 1,..., Y m G. H 0 : F = G versus H 1 : F G. Z = X 1,..., X, Y 1,..., Y m ). L = 1,..., 1, 2,..., 2). }{{}}{{} values m values A test statistic ca be writte as a fuctio of Z ad L. For example, if T = X Y m the we ca write N i=1 T = Z iil i = 1) N i=1 N i=1 IL Z iil i = 2) i = 1) N i=1 IL i = 2) where N = + m. So we write T = gl, Z). Defie p = 1 IgL π, Z) > gl, Z)) N! π where L π is a permutatio of the labels ad the sum is over all permutatios. Uder H 0, permutig the labels does ot chage the distributio. I other words, gl, Z) has a equal chace of havig ay rak amog all the permuted values. That is, uder H 0, Uif0, 1) ad if we reject whe p < α, the we have a level α test. Summig over all permutatios is ifeasible. But it suffices to use a radom sample of permutatios. So we do this: 1. Compute a radom permutatio of the labels ad compute W. Do this K times givig values T 1),..., T K). 2. Compute the p-value 1 K K IT j) > T ). j=1 8