PHP2510: Principles of Biostatistics & Data Analysis
Lecture 10: Hypothesis testing
PHP 2510 Lec 10: Hypothesis testing 1
In previous lectures we encountered the problem of estimating an unknown population mean and constructing a confidence interval for it. Our example was cholesterol levels for people who go on a new diet that may help lower cholesterol. If we observe the cholesterol levels of a sample of such people after they have stayed on the diet for some time, we can estimate the expected cholesterol level for people on this diet. Assuming X_1, X_2, ..., X_n ~ N(µ, σ²), we estimate µ with X̄ and construct a (1 − α)100% confidence interval X̄ ± t_{α/2, df=n−1} · S/√n.
The numerical example was a sample of size 10: 174 178 196 181 181 197 185 167 173 176, with X̄ = 180.8 and S² = 93. A 95% confidence interval for µ is X̄ ± t_{α/2, df=n−1} · S/√n = 180.8 ± 2.26 · √(93/10) = (173.9, 187.7). Suppose we know that the mean cholesterol level in the general population is 188 (for example, you looked it up on the CDC website). Another question we may ask is: is the sample mean we observe (180.8) compatible with the hypothesis that people on the diet actually have the same mean cholesterol level as the general population? What do we mean by compatible? In other words, if we had sampled 10 people randomly from the general population, instead of the diet subpopulation, could we have observed 180.8 as well?
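These numbers can be checked with a few lines of plain Python (standard library only; the t quantile 2.26 is the table value t_{.025, df=9} quoted above):

```python
from math import sqrt

x = [174, 178, 196, 181, 181, 197, 185, 167, 173, 176]
n = len(x)
xbar = sum(x) / n                                   # sample mean
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # sample variance
t = 2.26                                            # t_{.025, df=9} from the t-table
half = t * sqrt(s2 / n)                             # half-width of the 95% CI
ci = (xbar - half, xbar + half)
print(round(xbar, 1), round(s2), (round(ci[0], 1), round(ci[1], 1)))
# 180.8 93 (173.9, 187.7)
```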
To illustrate the logic behind hypothesis testing, we will study a side example on whether data are compatible with a hypothesis. Suppose we have a coin, and the hypothesis is that it is a regular coin: heads and tails are equally likely on each flip. We refer to this hypothesis as the null hypothesis and denote it by H_0. If I now flip the coin 10 times and see all heads, you may start doubting H_0, because it seems unlikely to observe 10 heads in 10 flips if H_0 were true. How unlikely is it? The probability of observing 10 heads out of 10 flips under H_0 (a fair coin) is C(10,10) (.5)^10 (1 − .5)^0 = .5^10 = 0.00098. This is a very small probability: not impossible, but rather unlikely. The data do not appear to be compatible with the hypothesis.
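The binomial computation above is a one-liner with `math.comb`:

```python
from math import comb

# P(10 heads in 10 flips of a fair coin), computed under H0
p10 = comb(10, 10) * 0.5**10 * (1 - 0.5)**0
print(p10)  # 0.0009765625, i.e. about 0.00098
```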
There are two possible explanations: either something really unlikely happened (unlikely ≠ impossible), or the hypothesis is wrong. Action: reject H_0.
Does this suggest that we should simply compute the probability of observing the data under the null hypothesis H_0? Suppose we flip a coin 200 times and observe 100 heads and 100 tails. There seems to be no reason to doubt that this coin is fair, P(H) = P(T) = .5. However, under the model X ~ Binomial(200, .5), P(X = 100) = C(200,100) (.5)^100 (1 − .5)^100 = 0.056, which is not a large probability in itself. (The probability of getting exactly 500 heads in 1000 flips is only .025.) What should we do? We certainly are not willing to conclude that the observation is incompatible with the hypothesis in this case.
We need an alternative hypothesis, which we denote by H_1 or H_a. For example, if the coin is not fair, the alternative could be p ≠ .5, though we do not know whether p > .5 or p < .5. Under the H_0 model, which values are as or more extreme than the one observed? By more extreme, we mean data that would make you lean towards the alternative more than the data you actually observed.

# of heads:   0        1       2      3     4     5     6     7     8      9       10
Probability:  .000977  .00977  .0439  .117  .205  .246  .205  .117  .0439  .00977  .000977
Since our alternative is p ≠ .5, extreme observations are either small or large numbers of heads: values far from E[X] = 10 × .5 = 5. As extreme as the observed 2 heads is 8 heads; more extreme than 2 heads is 0, 1, 9, or 10 heads. Now we can ask the question: what is the probability of observing something as or more extreme than the actual data, under H_0? P_0(X = 0, 1, 2, 8, 9, or 10) = P_0(X = 0) + P_0(X = 1) + P_0(X = 2) + P_0(X = 8) + P_0(X = 9) + P_0(X = 10) = .11. (The subscript 0 in P_0 indicates that the probability is calculated under the H_0 model.)
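Summing the two tails of the Binomial(10, .5) table gives the two-sided p-value directly:

```python
from math import comb

def binom_pmf(k, n=10, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# values as or more extreme than X = 2 under the alternative p != .5
extreme = [0, 1, 2, 8, 9, 10]
p_value = sum(binom_pmf(k) for k in extreme)
print(round(p_value, 2))  # 0.11
```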
This probability says that if many people performed the same experiment of flipping a fair coin 10 times, about 11% of them would see values as or more extreme than 2 heads. If you don't consider 11% a very rare probability, then you may not be surprised by the observation of 2 heads.
We have actually done hypothesis testing already! Given the observed data (2 heads out of 10 flips of a coin), we tested the hypothesis H_0 that the coin is fair against the alternative hypothesis H_1 that the coin is not fair. We computed the probability of observing results as or more extreme than the data, under H_0. This probability is referred to as the p-value. If the p-value is small, it means either something improbable has happened, or H_0 is problematic. We reject H_0 when the p-value is small. How small is small? Traditionally people have used .05 and .01. This number is called the significance level.
What if the alternative hypothesis is p < .5 instead of p ≠ .5? In hypothesis testing, we regard either H_0 or H_1 as true. H_1 determines which values count as "as or more extreme" under H_0.

# of heads:   0        1       2      3     4     5     6     7     8      9       10
Probability:  .000977  .00977  .0439  .117  .205  .246  .205  .117  .0439  .00977  .000977

Which values, compared to the actual data X = 2, would make you lean more towards the alternative p < .5? These are X = 0 and X = 1; 8, 9, and 10 are no longer extreme values when the alternative is p < .5 instead of p ≠ .5. Now the p-value becomes P_0(X = 0, 1, or 2) = P_0(X = 0) + P_0(X = 1) + P_0(X = 2) = 0.055. When we use H_1: p ≠ .5, we call it a two-sided test. When we use H_1: p < .5 or H_1: p > .5, we call it a one-sided test.
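With the one-sided alternative, only the lower tail enters the sum:

```python
from math import comb

def binom_pmf(k, n=10, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# one-sided alternative p < .5: only small head counts are extreme
p_one_sided = sum(binom_pmf(k) for k in [0, 1, 2])
print(round(p_one_sided, 3))  # 0.055
```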
Now let's get back to our original example. From a random sample of 10 people who are on a new diet, we observed cholesterol levels 174 178 196 181 181 197 185 167 173 176. Can we test the hypothesis that people on the diet actually have the same mean cholesterol level as the general population? First, let's do a two-sided test: H_0: µ = 188 versus H_1: µ ≠ 188.
We start with the simplest case, as we did for confidence intervals. Let's assume that we know the standard deviation is 9.8. Under H_0, X̄_10 ~ N(188, 9.8²/10).
[Figure: density of X̄_10 under H_0]
Now the question is: which values are as or more extreme?
[Figure: density of X̄_10 ~ N(188, 9.8²/10)]
Can you compute the probability of values as or more extreme than 180.8?
P(X̄_10 ≤ 180.8) = P( (X̄ − 188)/√(9.8²/10) ≤ (180.8 − 188)/√(9.8²/10) ) = P(Z ≤ −2.32) = .01.
p-value = 2 × .01 = .02 < .05, so we would reject H_0 at significance level 0.05.
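The normal tail probability can be computed with `statistics.NormalDist` (Python 3.8+), avoiding the Z-table:

```python
from math import sqrt
from statistics import NormalDist

z = (180.8 - 188) / sqrt(9.8**2 / 10)   # Z-statistic under H0
p_one = NormalDist().cdf(z)             # P(Z < -2.32), lower tail
p_two = 2 * p_one                       # two-sided p-value
print(round(z, 2), round(p_two, 2))     # -2.32 0.02
```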
For the observed sample mean 180.8, we have rejected H_0 at significance level 0.05. What if the observed mean were 182? What about 183? More generally, for which values of X̄_10 would you just reject H_0 at significance level .05? What we know under H_0: (X̄_10 − 188)/√(9.8²/10) follows the standard normal distribution Z.
If (X̄_10 − 188)/√(9.8²/10) < −1.96 or (X̄_10 − 188)/√(9.8²/10) > 1.96, we would reject H_0 at significance level .05.
If (X̄_10 − 188)/√(9.8²/10) < −z_{.01/2} = −2.58 or (X̄_10 − 188)/√(9.8²/10) > z_{.01/2} = 2.58, we would reject H_0 at significance level .01.
[Figure: standard normal density with two-sided critical regions]
We call (X̄_10 − 188)/√(9.8²/10) the test statistic, and the regions (−∞, −1.96) ∪ (1.96, ∞) or (−∞, −2.58) ∪ (2.58, ∞) critical regions. When the test statistic is inside the critical region, we reject H_0. We say (X̄_10 − 188)/√(9.8²/10) is a Z-statistic, since it follows a standard normal distribution under the H_0 model. In our example X̄_10 = 180.8, so (X̄_10 − 188)/√(9.8²/10) = (180.8 − 188)/√(9.8²/10) = −2.32. −2.32 is within (−∞, −1.96), but not within (−∞, −2.58). Thus we reject H_0 at the .05 level, but not at the .01 level.
What if we have a one-sided alternative? H_0: µ = 188 versus H_1: µ < 188. Now which values are as or more extreme?
[Figure: density of X̄_10 under H_0 with the lower tail shaded]
(X̄_10 − 188)/√(9.8²/10) = (180.8 − 188)/√(9.8²/10) = −2.32
p-value: P(Z < −2.32) = .01
Critical value for α = .05: P(Z < q) = .05 gives q = −1.64
Critical value for α = .01: P(Z < q) = .01 gives q ≈ −2.33
For the one-sided test, we reject H_0 at significance level .05; with a p-value of about .01, the result sits right at the boundary of the .01 level.
Summary of the steps in hypothesis testing, A:
In general:
1. Select the probability model
2. Set up the null and alternative hypotheses
3. Determine a test statistic
4. Determine the significance level and critical region
5. Reject H_0 if the test statistic is in the critical region
In the previous example:
1. Normal probability model with known variance
2. H_0: µ = 188, H_1: µ < 188
3. Z statistic (X̄ − µ_0)/(σ/√n) = −2.32
4. α = .05, critical region (−∞, −1.64)
5. −2.32 is in the critical region; reject H_0
Summary of the steps in hypothesis testing, B:
In general:
1. Select the probability model
2. Set up the null and alternative hypotheses
3. Determine a test statistic
4. Compute the p-value
5. Reject H_0 if the p-value is less than the significance level
In the previous example:
1. Normal probability model with known variance
2. H_0: µ = 188, H_1: µ < 188
3. Z statistic (X̄ − µ_0)/(σ/√n) = −2.32
4. P(Z < −2.32) = .01
5. 0.01 < α = .05; reject H_0
Now, what if we do not know the standard deviation? Can we still do hypothesis testing?
1. Normal probability model with unknown variance
2. H_0: µ = 188, H_1: µ < 188
We can no longer form the Z-statistic, but we can estimate σ² and form a T statistic: (X̄ − µ_0)/(S/√n) ~ t_{df=n−1}. From our data we have S² = 93, thus T = (X̄ − µ_0)/(S/√n) = (180.8 − 188)/√(93/10) = −2.36.
Method A: find the critical region from the t-distribution (df = 9). t_{.05, df=9} = 1.83, so the critical region is (−∞, −1.83). The test statistic is in the critical region; reject H_0 at significance level .05.
Method B: p-value P(T < −2.36) = 0.02 < .05; reject H_0 at significance level .05.
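A sketch of Method A in plain Python: since the standard library has no t distribution, the tabled critical value −1.83 is hard-coded and the code just forms the statistic and compares.

```python
from math import sqrt

x = [174, 178, 196, 181, 181, 197, 185, 167, 173, 176]
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
T = (xbar - 188) / sqrt(s2 / n)     # one-sample t statistic, df = n - 1
t_crit = -1.83                      # -t_{.05, df=9} from the t-table
print(round(T, 2), T < t_crit)      # -2.36 True -> reject H0 at the .05 level
```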
Hypothesis testing for comparing two means, I:
X_1, ..., X_{n_1} ~ N(µ_X, σ_X²) (σ_X² known)
Y_1, ..., Y_{n_2} ~ N(µ_Y, σ_Y²) (σ_Y² known)
H_0: µ_X = µ_Y, i.e., µ_X − µ_Y = 0
T = (X̄ − Ȳ − (µ_X − µ_Y)) / √(σ_X²/n_1 + σ_Y²/n_2) ~ N(0, 1) under H_0
Example: we had the cholesterol data from the last lecture on confidence intervals.
X: 174 178 196 181 181 197 185 167 173 176
Y: 212 204 204 201 194 218 205 180 207 195 189 198 190 193 194 183 208 202 189 213
X̄ = 180.8, Ȳ = 199. Suppose we know σ_X² = σ_Y² = 9.8².
T = (180.8 − 199) / √(9.8²/10 + 9.8²/20) = −4.79
One-sided critical region for α = .05: (−∞, −1.64); for α = .01: (−∞, −2.33). Reject H_0.
Or, compute the p-value P(Z < −4.79) ≈ 0; reject H_0.
Hypothesis testing for comparing two means, II:
X_1, ..., X_{n_1} ~ N(µ_X, σ_X²), Y_1, ..., Y_{n_2} ~ N(µ_Y, σ_Y²) (σ_X², σ_Y² unknown but equal)
T = (X̄ − Ȳ − (µ_X − µ_Y)) / √(S_p²/n_1 + S_p²/n_2) ~ t_{df=n_1+n_2−2} under H_0
Estimate the common variance by the pooled sample variance S_p² = 100.4 (if you forgot how this is done, review lecture 13). Form the test statistic
T = (180.8 − 199) / √(100.4 (1/10 + 1/20)) = −4.69
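The pooled two-sample statistic can be computed from the raw data. Note that using the exact sample means (Ȳ = 198.95 rather than the rounded 199) shifts the statistic slightly in the second decimal:

```python
from math import sqrt

x = [174, 178, 196, 181, 181, 197, 185, 167, 173, 176]
y = [212, 204, 204, 201, 194, 218, 205, 180, 207, 195,
     189, 198, 190, 193, 194, 183, 208, 202, 189, 213]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

n1, n2 = len(x), len(y)
# pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
T = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(sp2, 1), round(T, 2))
```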
One-sided critical region for α = .05: (−∞, −t_{.05, df=28}) = (−∞, −1.70); for α = .01: (−∞, −t_{.01, df=28}) = (−∞, −2.47). Reject H_0. (Exercise: what would the critical regions be if we were doing two-sided tests?)
Or compute the p-value P(t_{df=28} < −4.69) ≈ 0; reject H_0.
Hypothesis testing for comparing two means, III:
X_1, ..., X_{n_1} ~ N(µ_X, σ_X²), Y_1, ..., Y_{n_2} ~ N(µ_Y, σ_Y²) (σ_X², σ_Y² unknown and unequal)
T = (X̄ − Ȳ − (µ_X − µ_Y)) / √(S_X²/n_1 + S_Y²/n_2) ~ Welch t under H_0
As we learned in previous lectures, the degrees of freedom for this distribution are not simple. For large samples the statistic converges to N(0, 1); for small samples we can be conservative and use df = min(n_1 − 1, n_2 − 1) if no computer is available.
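A sketch of the Welch statistic on the cholesterol data, together with the Welch–Satterthwaite degrees-of-freedom approximation that statistical software uses (the df formula is standard, not given on the slide):

```python
from math import sqrt

x = [174, 178, 196, 181, 181, 197, 185, 167, 173, 176]
y = [212, 204, 204, 201, 194, 218, 205, 180, 207, 195,
     189, 198, 190, 193, 194, 183, 208, 202, 189, 213]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

a, b = var(x) / len(x), var(y) / len(y)         # the two variance terms
T = (mean(x) - mean(y)) / sqrt(a + b)           # Welch t statistic
# Welch-Satterthwaite approximation to the degrees of freedom
df = (a + b) ** 2 / (a**2 / (len(x) - 1) + b**2 / (len(y) - 1))
print(round(T, 2), round(df, 1))
```

Here df comes out near 19, above the conservative min(n_1 − 1, n_2 − 1) = 9 the slide suggests for hand computation.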
What happens when the two samples are not independent? What happens when we do not start with normal distributions?
Difference of means of paired observations:
X ~ N(µ_X, σ_X²), Y ~ N(µ_Y, σ_Y²), where X_k is paired with Y_k (before/after, left hand/right hand, paired treated/control, ...)
H_0: µ_X = µ_Y, H_1: µ_X ≠ µ_Y
But since X and Y are not independent, X̄ and Ȳ are not independent. We no longer have the simple result X̄ − Ȳ ~ N(µ_X − µ_Y, σ_X²/n + σ_Y²/n). (Why not?)
Solution: for each pair, we form the difference D_k = X_k − Y_k. Then D_1, ..., D_n ~ N(µ_X − µ_Y, σ_D²), and we estimate σ_D² with the sample variance S_D². The test statistic for testing µ_D = d_0 is then (D̄ − d_0)/(S_D/√n), which follows a Student t distribution with n − 1 degrees of freedom. The most common test is for H_0: µ_D = 0.
Example: suppose you wish to test the effect of Prozac on the well-being of depressed individuals, using a standardised well-being scale. Higher scores indicate greater well-being (that is, Prozac is having a positive effect). We assume that the scores are approximately normally distributed.

ID  Pre  Post
1   0    1
2   3    5
3   6    5
4   7    7
5   4    10
6   3    9
7   2    7
8   1    11
9   4    8
ID  Pre  Post  Difference (post − pre)
1   0    1      1
2   3    5      2
3   6    5     −1
4   7    7      0
5   4    10     6
6   3    9      6
7   2    7      5
8   1    11    10
9   4    8      4

d̄ = 3.67, S_d² = 12.25. H_0: µ_D = 0, H_1: µ_D ≠ 0.
t = (D̄ − 0)/(S_d/√9) = 3.143, df = 8
Rejection region at α = .05: (2.306, ∞) and (−∞, −2.306)
Rejection region at α = .01: (3.355, ∞) and (−∞, −3.355)
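The paired t statistic above can be reproduced from the raw scores:

```python
from math import sqrt

pre  = [0, 3, 6, 7, 4, 3, 2, 1, 4]
post = [1, 5, 5, 7, 10, 9, 7, 11, 8]
d = [b - a for a, b in zip(pre, post)]            # post - pre differences
n = len(d)
dbar = sum(d) / n
s2 = sum((di - dbar) ** 2 for di in d) / (n - 1)  # sample variance of the differences
t = dbar / sqrt(s2 / n)                           # paired t statistic, df = n - 1
print(round(dbar, 2), round(s2, 2), round(t, 3))  # 3.67 12.25 3.143
```

Since 3.143 exceeds 2.306 but not 3.355, H_0 is rejected at the .05 level but not at the .01 level.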
When we do not start with normal distributions, the Central Limit Theorem is our friend again, as long as we have large samples.
Example (Bernoulli): suppose the incidence rate of disease W for children at age 5 is .0137 (137 per 10,000) in 2007. We want to know whether the incidence rate in Providence is the same as the national rate. A sample of 2000 children was randomly selected and their medical records queried to see if they had caught the disease in 2007; 30 of them had the disease.
1. Select a probability model: Bernoulli(p)
2. H_0: p = .0137, H_1: p ≠ .0137
3. Determine a test statistic: 2000 is a large sample, so by the CLT X̄ ≈ N(p, p(1 − p)/n), thus (X̄ − p)/√(p(1 − p)/n) ≈ N(0, 1)
4. Under H_0, (X̄ − .0137)/√(.0137(1 − .0137)/2000) ≈ N(0, 1). We observe X̄ = 30/2000 = .015, thus the test statistic is (.015 − .0137)/√(.0137(1 − .0137)/2000) = 0.50
[Figure: standard normal density with two-sided critical regions at ±1.96 and the observed statistic marked]
For the two-sided test, the critical regions for α = .05 are (−∞, −1.96) and (1.96, ∞). The statistic does not fall in them, so we do not reject H_0 at significance level 0.05 (and thus certainly not at any more stringent level, such as .01).
Or, we compute the p-value 2 P(Z > .50) = 2 × .31 = .62. Since .62 > .05, we do not reject H_0 at significance level .05.
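The proportion test in a few lines, with the two-sided p-value from `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.0137, 2000
phat = 30 / n                                  # observed proportion, 0.015
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)      # CLT-based z statistic under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
print(round(z, 2), round(p_value, 2), abs(z) < 1.96)
```

The statistic is well inside (−1.96, 1.96), so H_0 is not rejected.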
Example: suppose we want to compare the average daily visits for two emergency rooms, A and B. For each we record the daily visit count for a year. On average there are 15.4 visits per day to ER A and 14.8 visits per day to ER B. Do these two ERs have the same daily visit rates?
1. Probability model: Poisson, for events that happen randomly over time
2. H_0: λ_A = λ_B, H_1: λ_A ≠ λ_B
3. Test statistic: for either ER we observe n = 365 days, so by the CLT X̄_A ≈ N(λ_A, λ_A/n), X̄_B ≈ N(λ_B, λ_B/n), and X̄_A − X̄_B ≈ N(λ_A − λ_B, λ_A/n + λ_B/n)
(X̄_A − X̄_B − (λ_A − λ_B)) / √(λ_A/n + λ_B/n) ≈ N(0, 1)
4. Under H_0, λ_A − λ_B = 0, so (X̄_A − X̄_B)/√(2λ/n) ≈ N(0, 1). We can estimate λ by pooling the two samples: λ̂ = (15.4 × 365 + 14.8 × 365)/(365 + 365) = 15.1.
Test statistic: (15.4 − 14.8)/√(2 × 15.1/365) = 2.09 > z_{.025} = 1.96
Or, p-value = 2 P(Z > 2.09) = 2 × .018 = .036 < .05. Reject H_0.
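The Poisson comparison reduces to the same few lines once the rate is pooled:

```python
from math import sqrt
from statistics import NormalDist

xa, xb, n = 15.4, 14.8, 365
lam = (xa * n + xb * n) / (2 * n)              # pooled rate estimate, 15.1
z = (xa - xb) / sqrt(2 * lam / n)              # approximate z statistic under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
print(round(z, 2), round(p_value, 3))
```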
Review: the logic of hypothesis testing. If the null hypothesis, rather than the alternative hypothesis, is true, should I be surprised by the data? I am surprised if the probability of observing such a result, or a more extreme one, is small under H_0. This probability is called the p-value. By convention, most people reject the null hypothesis if the p-value is smaller than 0.05. The smaller the p-value, the stronger my doubt about H_0, and thus the more significant the result is against H_0.
Review: the procedure for hypothesis testing.
1. Select the probability model
2. Set up the null and alternative hypotheses
3. Determine a test statistic
4. Compute the p-value
5. Reject H_0 if the p-value is less than the significance level
Possible results from hypothesis testing:

Decision        | H_0 true          | H_1 true
Reject H_0      | Type I error (α)  | correct
Not reject H_0  | correct           | Type II error (β)

Type I error: rejecting H_0 when H_0 is true.
Type II error: failing to reject H_0 when H_0 is false (H_1 is true).
Power: the ability to reject H_0 when H_0 is false.
P_0(Reject H_0) = P(Reject H_0 | H_0 true) = α: this probability is the significance level (Type I error rate).
P_1(Reject H_0) = P(Reject H_0 | H_1 true) = 1 − β: this probability is the power of the hypothesis test.
Consider a one-sided test first: H_0: µ = µ_0 = 5 versus H_1: µ = µ_1 = 8 > µ_0. Suppose we have a normal model and know the variance is 10². For a sample size of 100, we form the test statistic T = (X̄ − 5)/(10/√100) = X̄ − 5. Under H_0, we know T follows the Z distribution.
[Figure: N(0,1) density with the critical region "reject H_0" to the right of a cutoff c]
For any decision rule "reject H_0 if the test statistic is greater than c", the type I error is the area of the red shaded region.
Demo 1: type I error and choice of critical region
What about the type II error? Under H_1, (X̄ − 8)/(10/√100) = X̄ − 8 ~ N(0, 1), so under H_1, X̄ − 5 = (X̄ − 8) + 3 ~ N(3, 1).
[Figure: H_0 and H_1 densities with α and β shaded around the cutoff]
We can try to reduce the type I error by using a larger cutoff, but this would increase the type II error (reducing power). We can try to increase power by using a smaller cutoff, but this would increase the type I error.
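The trade-off can be tabulated directly under the slide's setup (statistic ~ N(0,1) under H_0 and ~ N(3,1) under H_1); the three cutoffs are illustrative choices:

```python
from statistics import NormalDist

h0 = NormalDist(0, 1)   # distribution of the statistic under H0
h1 = NormalDist(3, 1)   # distribution under H1 (mean shifted by 3)

for c in (1.0, 1.645, 2.326):
    alpha = 1 - h0.cdf(c)   # type I error: reject when H0 is true
    beta = h1.cdf(c)        # type II error: fail to reject when H1 is true
    print(f"c={c}: alpha={alpha:.3f}, beta={beta:.3f}, power={1 - beta:.3f}")
```

Raising c shrinks α but inflates β, exactly the tension described above.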
Demo 2: the trade-off between type I and type II error
Two-sided test:
[Figure: N(0,1) density with critical regions in both tails]
We have seen that for a hypothesis test there is a trade-off between type I and type II errors: for the same study design, we cannot reduce both simultaneously. The common practice is to fix the type I error at a small level, such as .05 or .01, so that we know we are at least not rejecting H_0 too often when we should not. What other factors affect power, for a given type I error rate?
(1. Type I error rate) 2. Effect size: if H_1 is true, the larger the difference between µ_0 (null value) and µ_1 (alternative), the higher the power.
[Figure: H_0 density and two H_1 densities; the more distant alternative puts more mass in the critical region]
3. Sample size: we know that X̄ ≈ N(µ, σ²/n). This means that as the sample size increases, the sample mean gets more concentrated near the true mean, so the null and alternative hypotheses become easier to separate.
[Figure: H_0 and H_1 sampling densities at a small and a large sample size]
Demo 3
Computation of power:
1. Write down the two hypotheses H_0 and H_1
2. Write down the probability model under each hypothesis
3. Determine the test statistic
4. For the given type I error rate (α), effect size, and sample size, determine the critical (rejection) region
5. Compute the power 1 − β
Computation of power: one-sided test. Example: suppose we want to test whether the mean of a population is 12 or less than 12. We assume a normal distribution with known variance 36. What is the power of this test if the true mean is 10, the sample size is 25, and the significance level is .05?
1. H_0: µ = 12; H_1: µ < 12
2. Under H_0: X ~ N(12, 36), so X̄ ~ N(12, 36/n). Truth: X ~ N(10, 36), so X̄ ~ N(10, 36/n)
3. Test statistic: (X̄ − 12)/(6/√25) = (X̄ − 12)/(6/5)
4. Since we have a one-sided test with H_1: µ < 12, we will reject H_0 when the test statistic is less than a cutoff C: α = 0.05 = P_0(Reject H_0) = P_0( (X̄ − 12)/(6/5) < C ) = P(Z < C). From the Z-table we know C = −1.645
5. Under H_1:
POWER = P_1( (X̄ − 12)/(6/5) < −1.645 )
= P_1( (X̄ − 10)/(6/5) + (10 − 12)/(6/5) < −1.645 )
= P_1( (X̄ − 10)/(6/5) < −1.645 − (10 − 12)/(6/5) )
= P(Z < −1.645 + 1.667) = P(Z < .022) ≈ .51
Example: suppose we want to test whether the mean of a population is 12 or greater than 12. We assume a normal distribution with known variance 36. What is the power of this test if the true mean is 10, the sample size is 25, and the significance level is .05?
1. H_0: µ = 12; H_1: µ > 12
2. Under H_0: X ~ N(12, 36), so X̄ ~ N(12, 36/n). Truth: X ~ N(10, 36), so X̄ ~ N(10, 36/n)
3. Test statistic: (X̄ − 12)/(6/√25)
4. Since we have a one-sided test with H_1: µ > 12, we will reject H_0 when the test statistic is greater than a cutoff C: α = 0.05 = P_0(Reject H_0) = P_0( (X̄ − 12)/(6/5) > C ) = P(Z > C). From the Z-table we know C = 1.645
5. Under H_1:
POWER = P_1( (X̄ − 12)/(6/5) > 1.645 )
= P_1( (X̄ − 10)/(6/5) + (10 − 12)/(6/5) > 1.645 )
= P_1( (X̄ − 10)/(6/5) > 1.645 − (10 − 12)/(6/5) )
= P(Z > 3.31) ≈ 0
When the truth is µ = 10, the probability that you will be able to reject H_0 in a test of µ = 12 versus µ > 12 is nearly 0.
Example: suppose we want to test whether the mean of a population is 12 or not 12. We assume a normal distribution with known variance 36. What is the power of this test if the true mean is 10, the sample size is 25, and the significance level is .05?
1. H_0: µ = 12; H_1: µ ≠ 12
2. Under H_0: X ~ N(12, 36), so X̄ ~ N(12, 36/n). Truth: X ~ N(10, 36), so X̄ ~ N(10, 36/n)
3. Test statistic: (X̄ − 12)/(6/√25)
4. Since we have a two-sided test, we will reject H_0 when the absolute value of the test statistic is greater than a cutoff C: α = 0.05 = P_0(Reject H_0) = P_0( |(X̄ − 12)/(6/5)| > C ) = P(|Z| > C). From the Z-table we know C = z_{.025} = 1.96
5. Under H_1:
POWER = P_1( |(X̄ − 12)/(6/5)| > 1.96 )
= P_1( (X̄ − 12)/(6/5) > 1.96 ) + P_1( (X̄ − 12)/(6/5) < −1.96 )
= P_1( (X̄ − 10)/(6/5) > 1.96 − (10 − 12)/(6/5) ) + P_1( (X̄ − 10)/(6/5) < −1.96 − (10 − 12)/(6/5) )
= P(Z > 1.96 + 1.667) + P(Z < −1.96 + 1.667)
= P(Z > 3.63) + P(Z < −.29) ≈ 0 + .386 = .386
In general, for X_1, X_2, ..., X_n ~ N(µ, σ²) with true mean µ_1:
the power of the test of H_0: µ = µ_0 versus H_1: µ < µ_0 is P(Z < (µ_0 − µ_1)/(σ/√n) − z_α);
the power of the test of H_0: µ = µ_0 versus H_1: µ > µ_0 is P(Z > (µ_0 − µ_1)/(σ/√n) + z_α);
the power of the test of H_0: µ = µ_0 versus H_1: µ ≠ µ_0 is P(Z < (µ_0 − µ_1)/(σ/√n) − z_{α/2}) + P(Z > (µ_0 − µ_1)/(σ/√n) + z_{α/2}).
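These three formulas can be packed into one function and checked against the worked examples (µ_0 = 12, µ_1 = 10, σ = 6, n = 25, α = .05):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power(mu0, mu1, sigma, n, alpha=0.05, alternative="two-sided"):
    """Power of the z-test of H0: mu = mu0 when the true mean is mu1."""
    shift = (mu0 - mu1) / (sigma / sqrt(n))
    if alternative == "less":        # H1: mu < mu0
        return Z.cdf(shift - Z.inv_cdf(1 - alpha))
    if alternative == "greater":     # H1: mu > mu0
        return 1 - Z.cdf(shift + Z.inv_cdf(1 - alpha))
    za2 = Z.inv_cdf(1 - alpha / 2)   # two-sided
    return Z.cdf(shift - za2) + (1 - Z.cdf(shift + za2))

print(round(power(12, 10, 6, 25, alternative="less"), 2))     # about .51
print(round(power(12, 10, 6, 25, alternative="greater"), 4))  # essentially 0
print(round(power(12, 10, 6, 25), 2))                         # about .38
```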
Exercise: now with the same setup but different numbers: a different H_0 mean (200 instead of 12), a different truth (180 instead of 10), a different variance (64 instead of 36), a different significance level (.01 instead of .05), and a different sample size (49 instead of 25), can you compute the power for the test of H_0: µ = 200 versus H_1: µ > 200?
Next topic: we now know that the type I error, the type II error (1 − power), the effect size, and the sample size are all connected. Can we determine the necessary sample size to meet given requirements on the error rates?
For a one-sample two-sided test, H_0: µ = µ_0 versus H_1: µ ≠ µ_0:
n = (z_{α/2} + z_β)² σ² / (µ_1 − µ_0)²
For a one-sample one-sided test, H_0: µ = µ_0 versus H_1: µ > µ_0 (or µ < µ_0):
n = (z_α + z_β)² σ² / (µ_1 − µ_0)²
The smaller the error rates, the larger the required sample size; the larger the variance, the larger the sample size; the larger the effect size, the smaller the sample size.
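The sample-size formulas translate directly into code. The example numbers below (detecting a shift from 12 to 14 with σ = 6, α = .05, power .80) are chosen purely for illustration:

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

def n_required(mu0, mu1, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Smallest n so the z-test of H0: mu = mu0 has the given power at mu = mu1."""
    za = Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)
    zb = Z.inv_cdf(power)        # z_beta, since power = 1 - beta
    return ceil((za + zb) ** 2 * sigma**2 / (mu1 - mu0) ** 2)

# e.g. detect a shift from 12 to 14 with sigma = 6 at alpha = .05, power = .80
print(n_required(12, 14, 6))                    # 71
print(n_required(12, 14, 6, two_sided=False))   # 56
```

As the formulas predict, the one-sided design needs fewer subjects for the same α and power.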