Neyman-Pearson paradigm. Suppose that a researcher is interested in whether a new drug works. The process of determining whether the outcome of the experiment points to yes or no is called a hypothesis test. A widely used formalization of this process is due to Neyman and Pearson. Here we begin with the null hypothesis that the new drug has no effect, denoted by $H_0$. The null hypothesis is often the reverse of what we actually believe. Why? Because the researcher hopes to reject the hypothesis and announce that the new drug leads to significant improvements. If the hypothesis is not rejected, the researcher can announce nothing and move on to a new trial.

Hypothesis test of population mean. Hospital workers are subject to radiation exposure emanating from the skin of the patient. A researcher is interested in the plausibility of the statement that the population mean $\mu$ of the radiation level is $\mu_0$ (the researcher's hypothesis). Then the null hypothesis is $H_0: \mu = \mu_0$. The opposite of the null hypothesis, called an alternative hypothesis, becomes $H_A: \mu \neq \mu_0$. Thus, the hypothesis test problem "$H_0$ versus $H_A$" is formed. The problem here is whether or not to reject $H_0$ in favor of $H_A$.

Assessment of null hypothesis. To assess the null hypothesis, the radiation levels $X_1, \ldots, X_n$ are measured from $n$ patients who had been injected with a radioactive tracer, and are assumed to be independent and normally distributed with mean $\mu$. Under the null hypothesis, the random variable
\[ T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \]
has the t-distribution with $(n-1)$ degrees of freedom, and it is called a test statistic. Thus, we obtain the exact probability $P(|T| \ge t_{\alpha/2,\,n-1}) = \alpha$. When $\alpha$ is chosen to be a small value (0.05 or 0.01, for example), it is unlikely that the absolute value $|T|$ is larger than the critical point $t_{\alpha/2,\,n-1}$.

Assessment of null hypothesis, continued. We say that the null hypothesis $H_0$ is rejected with significance level $\alpha$ (or, size $\alpha$) when the observed value $t$ of $T$ satisfies $|t| > t_{\alpha/2,\,n-1}$.

Example 1.
We have $\mu_0 = 5.4$ for the hypothesis, and decide to give a test with significance level $\alpha = 0.05$. Suppose that we have obtained $\bar{X} = 5.145$ and $S = 0.7524$ from the actual data with $n = 28$.

Solution. We can compute
\[ T = \frac{5.145 - 5.4}{0.7524/\sqrt{28}} \approx -1.79. \]
Since $|T| = 1.79 \le t_{0.025,27} = 2.052$, the null hypothesis cannot be rejected. Thus, the evidence against the null hypothesis is not persuasive.

Page 1 Mathematical Statistics/October 30, 2018
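As a numerical check (this sketch is not part of the original notes), the arithmetic of Example 1 can be reproduced in Python with scipy:

```python
from math import sqrt
from scipy.stats import t

# Summary statistics from Example 1 (n = 28 patients).
n, xbar, s, mu0, alpha = 28, 5.145, 0.7524, 5.4, 0.05

T = (xbar - mu0) / (s / sqrt(n))       # observed test statistic, about -1.79
crit = t.ppf(1 - alpha / 2, df=n - 1)  # critical point t_{0.025, 27}, about 2.052

reject = abs(T) > crit                 # False: H0 cannot be rejected
print(f"T = {T:.3f}, critical point = {crit:.3f}, reject H0: {reject}")
```

Running this reports $T \approx -1.79$ against the critical point $2.052$, matching the conclusion above.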
One-sided hypothesis test. In the same case of hospital workers subject to radiation exposure, this time the researcher is interested in the plausibility of the statement that the population mean $\mu$ is less than $\mu_0$. Then the hypothesis test problem is $H_0: \mu \ge \mu_0$ versus $H_A: \mu < \mu_0$. Here we use the same test statistic $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, and we reject $H_0$ with significance level $\alpha$ when we find that $t < -t_{\alpha,\,n-1}$ for the observed value $t$ of $T$.

Example 2. We use the same $\mu_0 = 5.4$ for the hypotheses and the same significance level $\alpha = 0.05$, but use the one-sided test.

Solution. Recall that $\bar{X} = 5.145$ and $S = 0.7524$ were obtained from the data with $n = 28$. Since $T = -1.79 < -t_{0.05,27} = -1.703$, the null hypothesis $H_0$ is rejected. Thus, the outcome is statistically significant, so that the population mean $\mu$ is smaller than 5.4.

Simple and composite hypotheses. Let $\theta$ be a parameter of an underlying probability density function $f(x; \theta)$ for a certain population. The hypothesis $H_0: \theta = \theta_0$ is called a simple hypothesis, since it completely specifies the underlying distribution. In contrast, the hypothesis $H_0: \theta \in \Theta_0$ with a set $\Theta_0$ of parameters is called a composite hypothesis if the set $\Theta_0$ contains more than one element. The opposite of the null hypothesis is called an alternative hypothesis, and is similarly expressed as $H_A: \theta \in \Theta_1$, where $\Theta_1$ is another set of parameters satisfying $\Theta_0 \cap \Theta_1 = \emptyset$. The set $\Theta_1$ is typically (but not necessarily) chosen to be the complement of $\Theta_0$. Thus, the hypothesis test problem can be formed as
\[ H_0: \theta \in \Theta_0 \text{ versus } H_A: \theta \in \Theta_1 \tag{4.1} \]
in order to determine whether or not to reject $H_0$ in favor of $H_A$.

Power function. Given a random sample $X = (X_1, \ldots, X_n)$, a function
\[ \delta(x) = \begin{cases} 1 & \text{if $H_0$ is rejected;} \\ 0 & \text{otherwise} \end{cases} \tag{4.2} \]
is called a test function. Given the test (4.2), we can define the power function by $K(\theta_0) = P(\text{Reject } H_0 \mid \theta = \theta_0) = E(\delta(X) \mid \theta = \theta_0)$.

Test statistic. A typical test, however, is presented in the form: $H_0$ is rejected if $T(X) \ge c$.
Here $T(X)$ is called a test statistic, and $c$ is called a critical value. Then the test function can be expressed as
\[ \delta(x) = \begin{cases} 1 & \text{if } T(x) \ge c; \\ 0 & \text{otherwise.} \end{cases} \tag{4.3} \]
Thus, we obtain $K(\theta_0) = P(T(X) \ge c \mid \theta = \theta_0)$.
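To make the definitions of test function and power function concrete, here is a small Monte Carlo sketch (my own illustration, not from the notes) that estimates $K(\theta) = E(\delta(X) \mid \theta)$ by simulation, for a test of the form (4.3) with $T(x) = \sum x_i$ on $N(\theta, 1)$ data; the sample size $n = 10$ and critical value $c = 5.2$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Test of the form (4.3): reject H0 when T(x) = sum(x) >= c, for X_i ~ N(theta, 1).
n, c = 10, 5.2  # illustrative choices, not taken from the notes

def delta(x):
    """Test function (4.3): 1 if T(x) >= c, else 0."""
    return 1 if x.sum() >= c else 0

def K(theta, reps=200_000):
    """Monte Carlo estimate of the power function K(theta) = E(delta(X) | theta)."""
    samples = rng.normal(theta, 1.0, size=(reps, n))
    return (samples.sum(axis=1) >= c).mean()

# K(theta) increases in theta; here K(0) is roughly 0.05, the size of the test.
for theta in (0.0, 0.5, 1.0):
    print(f"K({theta}) ~= {K(theta):.3f}")
```

The increasing estimates illustrate that this one-sided test is more likely to reject as $\theta$ grows.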
Type I error and significance level. The probability of type I error (i.e., $H_0$ is incorrectly rejected when $H_0$ is true) is defined by $\alpha = \sup_{\theta_0 \in \Theta_0} K(\theta_0)$, which is also known as the size of the test. Having calculated the size $\alpha$ of the test, (4.2) or (4.3) is said to be a level $\alpha$ test, or a test with significance level $\alpha$.

Type II error and power of test. What is the probability that we incorrectly accept $H_0$ when it is actually false? Such a probability $\beta$ is called the probability of type II error. Then the value $(1 - \beta)$ is known as the power of the test, indicating how reliably we can reject $H_0$ when it is actually false. Suppose that $H_0$ is in fact false, say $\theta = \theta_1$ for some $\theta_1 \in \Theta_1$. Then the power of the test is calculated by $K(\theta_1)$.

Example 3. Suppose that the true population mean is $\mu = 5.1$ (versus the value $\mu_0 = 5.4$ in our hypotheses). Then calculate the power of the test with significance level $\alpha = 0.05$.

Solution. (a) In the two-sided hypothesis testing, we reject $H_0$ when $|T| > t_{0.025,27} = 2.052$. Therefore, the power of the test is $K(5.1) = P(|T| > 2.052 \mid \mu = 5.1) \approx 0.523$. (b) In the one-sided hypothesis testing, we reject $H_0$ when $T < -t_{0.05,27} = -1.703$. Therefore, the power of the test is $K(5.1) = P(T < -1.703 \mid \mu = 5.1) \approx 0.658$. This explains why we could not reject $H_0$ in the two-sided hypothesis testing: our chance of detecting the falsehood of $H_0$ is only about 52%, while we have about a 66% chance in the one-sided hypothesis testing.

Uniformly most powerful test. Suppose that the test (4.2) has size $\alpha$. This test is said to be uniformly most powerful (UMP) if it satisfies $K(\theta_1) \ge K^*(\theta_1)$ for all $\theta_1 \in \Theta_1$, where $K^*$ is the power function of any other level $\alpha$ test. Furthermore, if this test is given in the form (4.3) with test statistic $T(X)$, then the test statistic $T(X)$ is said to be optimal.

Neyman-Pearson lemma. Consider the testing problem with simple (null and alternative) hypotheses: $H_0: \theta = \theta_0$ versus $H_A: \theta = \theta_1$.
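The power values in Example 3 can be reproduced with the noncentral t-distribution. The sketch below (my own, not from the notes) treats the sample value $S = 0.7524$ as if it were the population standard deviation when forming the noncentrality parameter, which is presumably how the printed values 0.523 and 0.658 were obtained; small discrepancies are expected:

```python
from math import sqrt
from scipy.stats import nct

# Setup of Example 3: n = 28, mu0 = 5.4, true mean mu = 5.1.
# As in the notes, the sample value S = 0.7524 stands in for sigma, so the
# noncentrality parameter of T is delta = (mu - mu0) / (S / sqrt(n)) ~ -2.11.
n, mu0, mu, S = 28, 5.4, 5.1, 0.7524
delta = (mu - mu0) / (S / sqrt(n))

# (a) two-sided test rejects when |T| > 2.052
power_two = nct.cdf(-2.052, df=n - 1, nc=delta) + nct.sf(2.052, df=n - 1, nc=delta)
# (b) one-sided test rejects when T < -1.703
power_one = nct.cdf(-1.703, df=n - 1, nc=delta)

print(f"two-sided power ~= {power_two:.3f}, one-sided power ~= {power_one:.3f}")
```

In either case the one-sided power exceeds the two-sided power, as discussed in the solution.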
Then we can construct the likelihood ratio by
\[ L(\theta_0, \theta_1; x) = \frac{L(\theta_1; x)}{L(\theta_0; x)}, \]
where $L(\theta; x) = \prod_{i=1}^n f(x_i; \theta)$ is the likelihood function.
Lemma 4. The test
\[ \delta(x) = \begin{cases} 1 & \text{if } L(\theta_0, \theta_1; x) \ge c; \\ 0 & \text{otherwise} \end{cases} \tag{4.4} \]
is uniformly most powerful, and is called the Neyman-Pearson test.

Proof. For any test function $\psi(x)$ satisfying $0 \le \psi(x) \le 1$, observe that $(\psi(x) - \delta(x))(L(\theta_0, \theta_1; x) - c) \le 0$ for every $x$, since $\delta(x) = 1$ exactly when $L(\theta_0, \theta_1; x) \ge c$. Therefore we obtain
\[ E(\psi(X) - \delta(X) \mid \theta = \theta_1) = E\big[ (\psi(X) - \delta(X))\, L(\theta_0, \theta_1; X) \mid \theta = \theta_0 \big] \le c\, E(\psi(X) - \delta(X) \mid \theta = \theta_0). \]
Observe that the right-hand side vanishes when the two test functions $\psi(x)$ and $\delta(x)$ share the same size $\alpha$.

Monotone likelihood ratio family. Let $f(x; \theta)$ be a joint density function with parameter $\theta$, and let $L(\theta_0, \theta_1; x)$ be the likelihood ratio. Suppose that $T(X)$ is a statistic and does not depend on the parameter $\theta$. Then $f(x; \theta)$ is called a monotone likelihood ratio family in $T(X)$ if (a) $f(x; \theta_0)$ and $f(x; \theta_1)$ are distinct for $\theta_0 \neq \theta_1$; and (b) $L(\theta_0, \theta_1; x)$ is a strictly increasing function of $T(x)$ whenever $\theta_0 < \theta_1$.

Monotone likelihood ratio family, continued. Let $\theta_0 < \theta_1$, and consider the following test problem: $H_0: \theta = \theta_0$ versus $H_A: \theta = \theta_1$. Suppose that $f(x; \theta)$ is a monotone likelihood ratio family in $T(X)$. Then we can express the UMP test (4.4) by
\[ \delta(x) = \begin{cases} 1 & \text{if } T(x) \ge c; \\ 0 & \text{otherwise.} \end{cases} \]
By setting $\psi(x) \equiv K(\theta_0)$ in the proof of the Neyman-Pearson lemma, we can observe that $K(\theta_0) = E(\psi(X) \mid \theta = \theta_1) \le K(\theta_1)$.

Optimal tests. Consider the following test problem:
\[ H_0: \theta \le \theta_0 \ (\text{or } H_0: \theta = \theta_0) \text{ versus } H_A: \theta > \theta_0. \tag{4.5} \]
If $f(x; \theta)$ is a monotone likelihood ratio family in $T(X)$, then the test functions (4.3) and (4.4) are equivalent whenever $\theta_0 < \theta_1$, and the power function $K(\theta)$ for these tests becomes an increasing function. Furthermore, $T(X)$ is an optimal test statistic, and the size of the test is simply given by $\alpha = K(\theta_0)$.

Optimal tests, continued.
(a) Essentially, uniformly most powerful tests exist only for the test problem (4.5).

(b) Suppose that $f(x; \theta)$ is of the exponential family
\[ f(x; \theta) = \exp[c(\theta) u(x) + h(x) + d(\theta)], \quad x \in A, \]
and that $c(\theta)$ is a strictly increasing function. Then $f(x; \theta)$ is a monotone likelihood ratio family in $u(x)$, and the natural sufficient statistic $u(X)$ becomes an optimal test statistic.

Likelihood ratio test procedure. The Neyman-Pearson test (4.4) can be generalized for the composite hypotheses in (4.1): (i) obtain the maximum likelihood estimate (MLE) $\hat\theta$ of $\theta$; (ii) calculate also the MLE $\hat\theta_0$ restricted to $\theta \in \Theta_0$; and (iii) construct the likelihood ratio
\[ \lambda(X) = \frac{L(\hat\theta; X)}{L(\hat\theta_0; X)} = \frac{\sup_{\theta} L(\theta; X)}{\sup_{\theta \in \Theta_0} L(\theta; X)} = \frac{\max\left\{ \sup_{\theta \in \Theta_0} L(\theta; X),\ \sup_{\theta \in \Theta_1} L(\theta; X) \right\}}{\sup_{\theta \in \Theta_0} L(\theta; X)} \ge 1. \]
The test statistic $\lambda(X)$ yields an excellent test procedure in many practical applications, though it is not an optimal test in general.
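As a concrete illustration of steps (i)-(iii) (my own sketch, not from the notes), consider $X_i \sim N(\theta, 1)$ with $H_0: \theta = \theta_0$: the unrestricted MLE is $\bar{X}$, the restricted MLE is $\theta_0$ itself, and $2\ln\lambda(X)$ reduces to $n(\bar{X} - \theta_0)^2$, which by Wilks' theorem (not proved in these notes) is approximately $\chi^2$ with 1 degree of freedom under $H_0$:

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(1)

# Illustrative case: X_i ~ N(theta, 1), H0: theta = theta0 (values chosen arbitrarily).
theta0, n = 0.0, 50
x = rng.normal(0.3, 1.0, size=n)   # data generated under theta = 0.3, so H0 is false

def loglik(theta):
    """Log-likelihood ln L(theta; x) for the N(theta, 1) model."""
    return norm.logpdf(x, loc=theta, scale=1.0).sum()

xbar = x.mean()                                   # unrestricted MLE
two_log_lambda = 2 * (loglik(xbar) - loglik(theta0))

# The closed form n * (xbar - theta0)^2 agrees with the numeric value.
assert np.isclose(two_log_lambda, n * (xbar - theta0) ** 2)

reject = two_log_lambda >= chi2.ppf(0.95, df=1)   # approximate level-0.05 cutoff
print(f"2 ln(lambda) = {two_log_lambda:.3f}, reject H0: {reject}")
```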
Problem 1. Suppose that $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ are two independent random samples, respectively from $N(\mu_1, 400)$ and $N(\mu_2, 225)$. Let $\theta = \mu_1 - \mu_2$, and let $K(\theta)$ be the power function for the test
\[ \delta(\bar{X}, \bar{Y}) = \begin{cases} 1 & \text{if } \bar{X} - \bar{Y} \ge c; \\ 0 & \text{otherwise,} \end{cases} \]
where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$. Calculate $n$ and $c$ so that $K(0) = 0.05$ and $K(10) = 0.9$.

Solution. Observe that $\bar{X} \sim N(\mu_1, \frac{400}{n})$ and $\bar{Y} \sim N(\mu_2, \frac{225}{n})$, and therefore, that $\bar{X} - \bar{Y} \sim N(\theta, \frac{625}{n})$. In order for $K(\theta)$ to achieve $K(0) = P(\bar{X} - \bar{Y} \ge c \mid \theta = 0) = 0.05$ and $K(10) = 0.9$, we must find $c$ and $n$ satisfying $\frac{c}{\sqrt{625/n}} = z_{0.05}$ and $\frac{c - 10}{\sqrt{625/n}} = z_{0.9} = -z_{0.1}$. Therefore, we obtain $c \approx 5.62$ and $n \approx 53.55$ (rounded up to 54).

Problem 2. Suppose that $(X_1, \ldots, X_n)$ is a random sample from $N(0, \theta)$ with parameter $0 < \theta < \infty$. Then show that the joint density $f(x; \theta)$ is a monotone likelihood ratio family in $T(X) = \sum_{i=1}^n X_i^2$. Find a UMP test for $H_0: \theta \le \theta_0$ versus $H_A: \theta > \theta_0$.

Solution. The joint density $f(x; \theta)$ is of the exponential family $f(x; \theta) = \exp[c(\theta) T(x) + h(x) + d(\theta)]$ with $c(\theta) = -\frac{1}{2\theta}$, $h(x) \equiv 0$ and $d(\theta) = -\frac{n}{2}\ln(2\pi\theta)$. Thus, $c(\theta)$ is an increasing function of $\theta$, and therefore,
\[ \delta(x) = \begin{cases} 1 & \text{if } T(x) \ge c; \\ 0 & \text{otherwise} \end{cases} \]
is the UMP test.

Problem 3. Suppose that $(X_1, \ldots, X_{25})$ is a random sample of size $n = 25$ from $N(\theta, 100)$. Find the UMP test of size $\alpha = 0.1$ for testing $H_0: \theta \le 75$ versus $H_A: \theta > 75$.

Solution. When $X_1, \ldots, X_n \stackrel{iid}{\sim} N(\theta, \sigma^2)$ with known $\sigma^2 = 100$, the joint density
\[ f(x; \theta) = \exp\left[ \frac{\theta}{\sigma^2}\sum_{i=1}^n x_i - \frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2 - \frac{n\theta^2}{2\sigma^2} - \frac{n}{2}\ln(2\pi\sigma^2) \right] \]
becomes a monotone likelihood ratio family in $T(X) = \sum_{i=1}^n X_i$, and
\[ \delta(x) = \begin{cases} 1 & \text{if } T(x) \ge c; \\ 0 & \text{otherwise} \end{cases} \]
is the UMP test for $H_0: \theta \le \theta_0$ versus $H_A: \theta > \theta_0$. Since $P(T(X) \ge c \mid \theta = 75) = 0.1$ with $n = 25$, we can find $c = 50 z_{0.1} + (75)(25) \approx 1939.1$, where $z_{0.1} \approx 1.282$ is the critical point with level $\alpha = 0.1$.
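The two conditions of Problem 1 can be solved numerically; a quick sketch (not part of the original notes):

```python
from scipy.stats import norm

# Problem 1: Xbar - Ybar ~ N(theta, 625/n); want K(0) = 0.05 and K(10) = 0.9.
z05 = norm.ppf(0.95)  # z_{0.05} ~ 1.645
z10 = norm.ppf(0.90)  # z_{0.1}  ~ 1.282

# The two conditions read c/sigma = z05 and (c - 10)/sigma = -z10,
# where sigma = sqrt(625/n); solving the pair gives sigma, then c and n.
sigma = 10 / (z05 + z10)
c = z05 * sigma
n = (25 / sigma) ** 2

print(f"c ~= {c:.2f}, n ~= {n:.2f} (round up to {int(n) + 1})")
```

This recovers $c \approx 5.62$ and $n \approx 53.5$, so $n = 54$ suffices.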
Problem 4. Suppose that $(X_1, \ldots, X_n)$ is a random sample from $N(\theta, 16)$. Find the UMP test of $H_0: \theta \ge 25$ versus $H_A: \theta < 25$ so that the power function $K(\theta)$ achieves $K(25) = 0.1$ and $K(23) = 0.9$.

Solution. We can re-parametrize the density function by setting $\lambda = -\theta$. Then the hypotheses are restated in the form (4.5) as $H_0: \lambda \le -25$ versus $H_A: \lambda > -25$, and
\[ \delta(x) = \begin{cases} 1 & \text{if } \bar{X} \le c; \\ 0 & \text{otherwise} \end{cases} \]
becomes a UMP test with $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$. Here we want to achieve $P(\bar{X} \le c \mid \theta = 25) = 0.1$ and $P(\bar{X} \le c \mid \theta = 23) = 0.9$ by choosing appropriate $n$ and $c$. Since $c = 25 - z_{0.1}\sqrt{16/n}$ and $c = 23 + z_{0.1}\sqrt{16/n}$, we obtain $c = 24$ and $n \approx 26.3$.

Problem 5. Suppose that $(X_1, \ldots, X_n)$ is a random sample from the pdf $f(x; \theta) = \theta x^{\theta-1}$, $0 < x < 1$, with parameter $0 < \theta < \infty$. Then show that the joint density $f(x; \theta)$ is a monotone likelihood ratio family in $T(X) = \sum_{i=1}^n \ln X_i$. Find a UMP test for $H_0: \theta \le \theta_0$ versus $H_A: \theta > \theta_0$.

Solution. The joint density $f(x; \theta)$ is of the exponential family $f(x; \theta) = \exp[c(\theta) T(x) + h(x) + d(\theta)]$ with $c(\theta) = \theta - 1$, $h(x) \equiv 0$ and $d(\theta) = n\ln\theta$. Since $c(\theta)$ is increasing, $f(x; \theta)$ is a monotone likelihood ratio family in $T(X)$, and
\[ \delta(x) = \begin{cases} 1 & \text{if } T(x) \ge c; \\ 0 & \text{otherwise} \end{cases} \]
is the UMP test.

Problem 6. Suppose that $(X_1, \ldots, X_5)$ is a random sample of five Bernoulli trials having the frequency function $f(x; \theta) = \theta^x (1-\theta)^{1-x}$, $x = 0, 1$, with parameter $0 < \theta < 1$. (a) Show that the joint frequency $f(x; \theta)$ is a monotone likelihood ratio family in $T(X) = \sum_{i=1}^5 X_i$, and that
\[ \delta(x) = \begin{cases} 1 & \text{if } T(x) \ge c; \\ 0 & \text{otherwise} \end{cases} \]
is a UMP test for $H_0: \theta \le \frac{1}{2}$ versus $H_A: \theta > \frac{1}{2}$. (b) Find the size of the test (i.e., significance level) when $c = 4$. (c) Find the size of the test (i.e., significance level) when $c = 5$.

Solution. The joint frequency $f(x; \theta)$ is of the exponential family $f(x; \theta) = \exp[c(\theta) T(x) + h(x) + d(\theta)]$ with $c(\theta) = \ln\left(\frac{\theta}{1-\theta}\right)$, $h(x) \equiv 0$ and $d(\theta) = 5\ln(1-\theta)$.
(a) Since $c(\theta)$ is an increasing function, $f(x; \theta)$ is a monotone likelihood ratio family in $T(X)$, and therefore, $\delta(x)$ is the UMP test.

(b) $P(T(X) \ge 4 \mid \theta = \frac{1}{2}) = \binom{5}{4}\left(\frac{1}{2}\right)^5 + \binom{5}{5}\left(\frac{1}{2}\right)^5 = \frac{6}{32}$.

(c) $P(T(X) \ge 5 \mid \theta = \frac{1}{2}) = \binom{5}{5}\left(\frac{1}{2}\right)^5 = \frac{1}{32}$.

Problem 7. Suppose that $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_m)$ are two random samples from groups 1 and 2, respectively distributed as $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_3)$. (a) Under $H_0: \theta_1 = \theta_2 = \theta_0$, calculate the MLEs $\hat\theta_0$ and $\hat\theta_3^{(0)}$ (the superscript distinguishing the restricted variance estimate), and simplify $L(\hat\theta_0, \hat\theta_3^{(0)})$. (b) Under $H_A: \theta_1 \neq \theta_2$, calculate the MLEs $\hat\theta_1$, $\hat\theta_2$, and $\hat\theta_3$. Then obtain $L(\hat\theta_1, \hat\theta_2, \hat\theta_3)$. (c) Now suppose that $n = m = 8$, $\bar{X} = 75.2$, $\bar{Y} = 78.6$, $\sum_{i=1}^n (X_i - \bar{X})^2 = 71.2$, and $\sum_{j=1}^m (Y_j - \bar{Y})^2 = 54.8$. Then construct a likelihood ratio test procedure for $H_0: \theta_1 = \theta_2$. Test it at the significance level 0.05, obtain the p-value, and write a conclusion of the test.

Solution. (a) Under $H_0$ we obtain the log-likelihood function $\ln L(\theta_0, \theta_3)$ of $\theta_0$ and $\theta_3$ by
\[ \ln L(\theta_0, \theta_3) = -\frac{n+m}{2}\ln(2\pi\theta_3) - \frac{1}{2\theta_3}\left[ \sum_{i=1}^n (X_i - \theta_0)^2 + \sum_{j=1}^m (Y_j - \theta_0)^2 \right]. \]
Then we can find the MLEs by solving
\[ \frac{\partial}{\partial\theta_0} \ln L(\theta_0, \theta_3) = \frac{1}{\theta_3}\left[ \sum_{i=1}^n (X_i - \theta_0) + \sum_{j=1}^m (Y_j - \theta_0) \right] = 0; \]
\[ \frac{\partial}{\partial\theta_3} \ln L(\theta_0, \theta_3) = -\frac{n+m}{2\theta_3} + \frac{1}{2\theta_3^2}\left[ \sum_{i=1}^n (X_i - \theta_0)^2 + \sum_{j=1}^m (Y_j - \theta_0)^2 \right] = 0. \]
Thus, we can simplify
\[ L(\hat\theta_0, \hat\theta_3^{(0)}) = \left( \frac{1}{2\pi\hat\theta_3^{(0)}} \right)^{(n+m)/2} \exp\left( -\frac{n+m}{2} \right) \]
by applying the solution
\[ \hat\theta_0 = \frac{1}{n+m}\left[ \sum_{i=1}^n X_i + \sum_{j=1}^m Y_j \right], \qquad \hat\theta_3^{(0)} = \frac{1}{n+m}\left[ \sum_{i=1}^n (X_i - \hat\theta_0)^2 + \sum_{j=1}^m (Y_j - \hat\theta_0)^2 \right]. \]
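The closing numbers in Problems 4 and 6 are easy to verify numerically; a short sketch (my own, not from the notes):

```python
from math import sqrt
from scipy.stats import binom, norm

# Problem 4: equate c = 25 - z_{0.1} * 4/sqrt(n) with c = 23 + z_{0.1} * 4/sqrt(n).
z10 = norm.ppf(0.90)          # z_{0.1} ~ 1.282
n = (4 * z10) ** 2            # subtracting the two equations gives sqrt(n) = 4 z_{0.1}
c = 25 - z10 * 4 / sqrt(n)    # = 24

# Problem 6: sizes of the Bernoulli test at theta = 1/2 for c = 4 and c = 5.
size4 = binom.sf(3, 5, 0.5)   # P(T >= 4) = 6/32
size5 = binom.sf(4, 5, 0.5)   # P(T >= 5) = 1/32

print(f"Problem 4: n ~= {n:.1f}, c = {c:.0f}; Problem 6 sizes: {size4}, {size5}")
```

This recovers $n \approx 26.3$, $c = 24$, and the sizes $6/32$ and $1/32$.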
(b) Under $H_A$ we obtain the log-likelihood function
\[ \ln L(\theta_1, \theta_2, \theta_3) = -\frac{n+m}{2}\ln(2\pi\theta_3) - \frac{1}{2\theta_3}\left[ \sum_{i=1}^n (X_i - \theta_1)^2 + \sum_{j=1}^m (Y_j - \theta_2)^2 \right]. \]
Then we can find the MLEs by solving
\[ \frac{\partial}{\partial\theta_1} \ln L = \frac{1}{\theta_3}\sum_{i=1}^n (X_i - \theta_1) = 0; \qquad \frac{\partial}{\partial\theta_2} \ln L = \frac{1}{\theta_3}\sum_{j=1}^m (Y_j - \theta_2) = 0; \]
\[ \frac{\partial}{\partial\theta_3} \ln L = -\frac{n+m}{2\theta_3} + \frac{1}{2\theta_3^2}\left[ \sum_{i=1}^n (X_i - \theta_1)^2 + \sum_{j=1}^m (Y_j - \theta_2)^2 \right] = 0. \]
We can simplify
\[ L(\hat\theta_1, \hat\theta_2, \hat\theta_3) = \left( \frac{1}{2\pi\hat\theta_3} \right)^{(n+m)/2} \exp\left( -\frac{n+m}{2} \right) \]
by applying the solution
\[ \hat\theta_1 = \bar{X} := \frac{1}{n}\sum_{i=1}^n X_i, \qquad \hat\theta_2 = \bar{Y} := \frac{1}{m}\sum_{j=1}^m Y_j, \qquad \hat\theta_3 = \frac{1}{n+m}\left[ \sum_{i=1}^n (X_i - \hat\theta_1)^2 + \sum_{j=1}^m (Y_j - \hat\theta_2)^2 \right]. \]

(c) We can construct the likelihood ratio test statistic $\lambda(X, Y)$ from the random samples $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_m)$ of groups 1 and 2 by
\[ \lambda(X, Y) = \frac{L(\hat\theta_1, \hat\theta_2, \hat\theta_3)}{L(\hat\theta_0, \hat\theta_3^{(0)})} = \left( \frac{\hat\theta_3^{(0)}}{\hat\theta_3} \right)^{(n+m)/2}. \]
Here the test statistic can be expressed as
\[ \lambda(X, Y) = \left( \frac{(n+m-2) + T^2}{n+m-2} \right)^{(n+m)/2}, \]
where $T = \dfrac{\bar{X} - \bar{Y}}{S\sqrt{\frac{1}{n} + \frac{1}{m}}}$ with the pooled variance
\[ S^2 = \frac{1}{n+m-2}\left[ \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{j=1}^m (Y_j - \bar{Y})^2 \right]. \]
The test function is equivalently constructed by
\[ \delta(X, Y) = \begin{cases} 1 & \text{if } |T| \ge c; \\ 0 & \text{otherwise.} \end{cases} \]
Note that $T$ has a t-distribution with $(n+m-2)$ degrees of freedom under $H_0$. Thus, by choosing the critical value $c = t_{\alpha/2,\,n+m-2}$ we can achieve the significance level
\[ P(|T| \ge t_{\alpha/2,\,n+m-2} \mid \theta_1 = \theta_2) = \alpha \]
for type I error.

Finally, suppose that $n = m = 8$, $\bar{X} = 75.2$, $\bar{Y} = 78.6$, $\sum_{i=1}^n (X_i - \bar{X})^2 = 71.2$, and $\sum_{j=1}^m (Y_j - \bar{Y})^2 = 54.8$. Then we obtain $S = 3$ and $T = -2.267$. By comparing $|T| = 2.267$ with the critical value $t_{0.025,14} = 2.1448$, we can reject $H_0$. The same conclusion is obtained by calculating the p-value $\approx 0.04$, which is less than $\alpha = 0.05$.
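Problem 7(c) is exactly the pooled two-sample t-test, so the final numbers can be verified from the summary statistics; a quick sketch (not part of the original notes):

```python
from math import sqrt
from scipy.stats import t

# Summary statistics from Problem 7(c).
n = m = 8
xbar, ybar = 75.2, 78.6
ssx, ssy = 71.2, 54.8

S2 = (ssx + ssy) / (n + m - 2)                    # pooled variance = 9, so S = 3
T = (xbar - ybar) / (sqrt(S2) * sqrt(1/n + 1/m))  # about -2.267
crit = t.ppf(0.975, df=n + m - 2)                 # t_{0.025,14} ~ 2.1448
pval = 2 * t.sf(abs(T), df=n + m - 2)             # about 0.04

print(f"T = {T:.3f}, critical value = {crit:.4f}, p-value = {pval:.3f}")
```

Since $|T| > 2.1448$ and the p-value is below 0.05, the test rejects $H_0$, in agreement with the conclusion above.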