Hypothesis Testing

Definitions

Hypothesis: a claim about something.
Null hypothesis ($H_0$): the hypothesis that suggests no change from previous experience.
Alternative hypothesis ($H_1$): the hypothesis that will be accepted if the null hypothesis is rejected.
Test statistic: the statistical measure used to decide whether to accept or reject the null hypothesis.
Critical (rejection) region: the values of the test statistic for which the null hypothesis will be rejected (and hence the alternative hypothesis accepted).
Critical value(s): the boundary value(s) of the critical region.
Acceptance region: the values of the test statistic for which the null hypothesis is accepted (and hence the alternative hypothesis rejected).
Significance level: the probability that the test statistic will fall in the critical region when $H_0$ is true.

Type of Alternative Hypothesis for a test of the mean of a normal distribution

$H_0: \mu = \mu_0$
$H_1: \mu < \mu_0$ when testing for a decrease in the mean (a one-tailed test)
$H_1: \mu > \mu_0$ when testing for an increase in the mean (a one-tailed test)
$H_1: \mu \ne \mu_0$ when testing for a change in the mean (a two-tailed test)

Significance Levels and Critical z-values

Significance level   10%     5%      2.5%    2%      1%      0.5%    0.25%
One-tailed           1.282   1.645   1.960   -       2.326   2.576   2.807
Two-tailed           1.645   1.960   -       2.326   2.576   2.807   -

Procedure for a Hypothesis Test

It is believed that the masses of bars of chocolate are normally distributed with mean 200 g and standard deviation 5 g. A bar of chocolate is selected at random and found to have a mass of 192 g. Test at the 5% significance level whether the mean mass of the bars has decreased.

Method 1a: calculating the critical region

$H_0: \mu = 200$
$H_1: \mu < 200$

The test is one-tailed. The critical region will be

$X < 200 - 1.645 \times 5$, i.e. $X < 191.775$

Since 192 does not fall in the critical region we accept $H_0$ and conclude that there is no significant evidence to support the assertion that the mean mass of the bars of chocolate has decreased. Note that we cannot conclude that $H_0$ is definitely true, simply that there is insufficient evidence to conclude that it is untrue.

Method 1b: calculating the critical region in terms of z-values

This is a modification of the first approach. The critical region for z-values is $Z < -1.645$. For our value of 192 g we have a z-value of

$\frac{192 - 200}{5} = -1.6$

Since this is not in the critical region we accept $H_0$.

This is marginally quicker than Method 1a. However, if you are testing lots of different individual values at the 5% significance level, Method 1a is much more efficient because the critical region only has to be found once.
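
Methods 1a and 1b can be checked numerically. The following is a minimal sketch using Python's standard library `statistics.NormalDist`; the variable names are illustrative only, and the critical z-value is computed rather than read from the table, so it differs from -1.645 in the later decimal places.

```python
from statistics import NormalDist

# Values taken from the chocolate-bar example above.
mu0, sigma, alpha, x_obs = 200.0, 5.0, 0.05, 192.0

z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical z (about -1.645)
x_crit = mu0 + z_crit * sigma          # Method 1a: boundary of the critical region
z_obs = (x_obs - mu0) / sigma          # Method 1b: z-value of the observed mass

reject_1a = x_obs < x_crit             # is 192 inside the region X < x_crit?
reject_1b = z_obs < z_crit             # is z_obs inside the region Z < z_crit?
print(f"x_crit={x_crit:.3f}, z_obs={z_obs:.2f}, reject: {reject_1a}, {reject_1b}")
```

Both comparisons necessarily give the same decision, since one is just a rescaling of the other.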

Method 2: calculating probabilities

This time we calculate

$P(X \le 192) = P\left(Z \le \frac{192 - 200}{5}\right) = P(Z \le -1.6) = 1 - P(Z \le 1.6) = 1 - 0.9452 = 0.0548 = 5.48\%$

Since 5.48% > 5% we again accept $H_0$.

Note: if we were testing a number of individual values these calculations would need to be completed many times, which would be quite inefficient as a method.

Both Methods 1b and 2 are very efficient when testing the same thing at different significance levels. For example, in Method 1b we could compare our test z-value of -1.6 against the critical values for other significance levels very quickly. At the 10% level, with a critical value of -1.282, we could quickly see that we would reject $H_0$ and conclude that there was significant evidence at the 10% significance level of a reduction in the mean.

Note that if you accept a null hypothesis at a particular significance level in a one-tailed test then you would definitely accept it in the two-tailed test at the same significance level. This is because the critical value for the two-tailed test is larger than the critical value for the one-tailed test. This is illustrated in the diagram below by the line A.

Note that if you accept a null hypothesis in a two-tailed test at a particular significance level it may be that you would reject it in a one-tailed test of the same hypothesis at the same significance level, as shown in the diagram below by the line B.

[Figure: standard normal curves for the two-tailed 5% test (2.5% in each tail, critical values ±1.96) and the one-tailed 5% test (5% in one tail, critical value 1.645), with vertical lines A and B marking two test z-values.]
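
As a rough illustration of Method 2, the tail probability can be computed once and then compared against several significance levels. This sketch assumes Python's standard library `statistics.NormalDist`; the names `p_value` and `decisions` are illustrative.

```python
from statistics import NormalDist

# P(X <= 192) when X ~ N(200, 5^2), i.e. under H0 in the chocolate-bar example.
p_value = NormalDist(mu=200, sigma=5).cdf(192)

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    # Reject H0 when the tail probability is smaller than the significance level.
    decisions[alpha] = "reject H0" if p_value < alpha else "accept H0"
    print(f"{alpha:.0%} level: {decisions[alpha]}")
```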

Testing the Sample Mean

It is believed that the masses of bars of chocolate are normally distributed with mean 200 g and standard deviation 5 g. Three bars of chocolate are selected at random and found to have a mean mass of 192 g. Test at the 5% significance level whether the mean mass of the bars has decreased.

$H_0: \mu = 200$
$H_1: \mu < 200$

The mean mass of three bars has distribution $\bar{X} \sim N\left(200, \frac{5^2}{3}\right)$.

We again have a critical region for z-values of $Z < -1.645$. The z-value corresponding to the value 192 is

$\frac{192 - 200}{\sqrt{\frac{5^2}{3}}} = \frac{192 - 200}{\frac{5}{\sqrt{3}}} = -2.771$

Because -2.771 lies in the critical region we reject $H_0$ and conclude that there is significant evidence at the 5% significance level of a decrease in the mean.

Notes

The outcome has changed because a difference of 8 g was not significant for a single bar but is significant for the mean mass of 3 bars. We expect variation between individual bars, but the more bars we include in our sample the less variation we expect in the sample mean. This is in part because we would expect bars that are lighter than they should be to be compensated for by bars that are heavier than they should be.

Suggested questions to try
Ex 5B p 105 Q 3, 4, 5, 6
Ex 5C p 108 Q 1, 2, 4, 7
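
The effect of sample size on the z-value can be seen by running the same calculation for one bar and for three bars. This is a small sketch only; the helper name `z_for_mean` is hypothetical and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean(xbar, mu0, sigma, n):
    """z-value of a sample mean xbar from n observations, assuming H0: mean = mu0."""
    return (xbar - mu0) / (sigma / sqrt(n))

z_crit = NormalDist().inv_cdf(0.05)    # one-tailed 5% critical value (lower tail)
results = {}
for n in (1, 3):
    z = z_for_mean(192, 200, 5, n)
    results[n] = (z, z < z_crit)       # (z-value, reject H0?)
    print(n, round(z, 3), "reject H0" if z < z_crit else "accept H0")
```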

Tests involving sample means of non-normal distributions

If the distribution we are sampling from is not normal we still know by the central limit theorem that

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately,

where $\mu$ and $\sigma$ are the mean and standard deviation respectively of the original distribution and n is large (a rough rule of thumb is that n is at least 30).

Note

The approximation can always be made; the issue is whether it is any good! The value of n suggested above is quite a good rule of thumb, although there may be distributions for which it will work with n smaller than 30 or for which you would be advised to go for an even larger value of n. If the original distribution has, or approximately has, many of the properties of a normal distribution (e.g. symmetry) it might be possible to get away with a lower value of n. If the original distribution is highly skewed then a larger value of n is advisable.

Example

It is claimed that the times of telephone calls made from a busy office have mean 4.2 minutes and variance 1.44 minutes². The times of 70 telephone calls were recorded and found to have a mean of 3.9 minutes. Is there significant evidence at the 5% significance level to suggest that the claim is incorrect?

Solution

$H_0: \mu = 4.2$
$H_1: \mu \ne 4.2$

The critical values for Z are ±1.960.

The distribution is not normal, but because n is large (70) we may use the central limit theorem and conclude that $\bar{X} \sim N\left(4.2, \frac{1.44}{70}\right)$ approximately.

$Z_{test} = \frac{3.9 - 4.2}{\sqrt{\frac{1.44}{70}}} = -2.091\ldots$

Therefore we reject $H_0$, since this value is in the critical region, and conclude that there is significant evidence that the mean length of telephone calls is not 4.2 minutes.
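
The telephone-call example can be reproduced as follows. This is a sketch under the same central limit theorem assumption as the worked solution; the two-tailed probability `p_value` is an extra quantity not computed in the notes.

```python
from math import sqrt
from statistics import NormalDist

mu0, var, n, xbar, alpha = 4.2, 1.44, 70, 3.9, 0.05

z = (xbar - mu0) / sqrt(var / n)               # test statistic under the CLT approximation
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed test: alpha/2 in each tail
p_value = 2 * NormalDist().cdf(-abs(z))        # probability of a value at least this extreme
reject = abs(z) > z_crit
print(round(z, 3), round(z_crit, 3), reject)
```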

Tests involving unknown variances

If we do not know the value of the population variance (which is not an unreasonable occurrence in the real world) then we have an added issue to consider. From earlier work, however, we know how to form an unbiased estimate of the population variance from a sample using the formula

$s^2 = \frac{n}{n-1}\left(\frac{\sum x_i^2}{n} - \bar{x}^2\right)$

or, for a larger sample with frequencies f,

$s^2 = \frac{\sum f}{\sum f - 1}\left(\frac{\sum f x_i^2}{\sum f} - \bar{x}^2\right)$

Example

The breaking stress of rubber bands is claimed to have mean 46.50 N. A sample of 100 bands is tested with the following summary statistics:

$\sum x = 4715 \qquad \sum x^2 = 222910$

Is there evidence at the 1% significance level that the breaking stress of the rubber bands has increased?

Solution

$s^2 = \frac{100}{99}\left(\frac{222910}{100} - \left(\frac{4715}{100}\right)^2\right) = 6.037\ldots$

$H_0: \mu = 46.50$
$H_1: \mu > 46.50$

The critical value at the 1% level is 2.326.

Since n is 100 (large) we may use the central limit theorem and assume that $\bar{X}$ is normally distributed with mean $\mu$ and variance $\frac{s^2}{100}$, using our estimate obtained earlier.

$Z_{test} = \frac{\frac{4715}{100} - 46.50}{\sqrt{\frac{6.037\ldots}{100}}} = 2.645\ldots$

Since our test statistic is in the critical region we reject $H_0$ and conclude that there is significant evidence at the 1% level to believe that the mean breaking stress has increased.

Note

$H_0$ would also be rejected at the 0.5% level (but not at the 0.25% level). We clearly have very strong evidence of an increase in mean breaking stress here.

Suggested questions to try
Ex 5D p 111 Q 2, 3, 4, 5
Ex 5E p 114 Q 3, 6 (NB by a p-value they mean what I call Method 2!)
Miscellaneous Exercise 5 p 115 Q 2, 3, 4, 5, 6, 7 (these questions are particularly recommended!)
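
The rubber-band calculation can be checked from the summary statistics alone. The sketch below uses the unbiased variance estimate and the one-tailed critical values quoted in the table of critical z-values; the variable names are illustrative.

```python
from math import sqrt

n, sum_x, sum_x2, mu0 = 100, 4715, 222910, 46.50

xbar = sum_x / n
s2 = n / (n - 1) * (sum_x2 / n - xbar**2)   # unbiased estimate of the population variance
z = (xbar - mu0) / sqrt(s2 / n)             # test statistic using the estimated variance

# One-tailed critical values for the 1%, 0.5% and 0.25% levels.
for level, z_crit in ((0.01, 2.326), (0.005, 2.576), (0.0025, 2.807)):
    print(f"{level:.2%}: {'reject' if z > z_crit else 'accept'} H0")
```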

Errors in Hypothesis Testing

In hypothesis testing we wish to reach one of the following two conclusions:

Accept a true hypothesis
Reject a false hypothesis

We could, however, reach the wrong conclusion. That is, we could make an error and either

reject a true hypothesis (known as a Type I error), or
accept a false hypothesis (known as a Type II error).

Which of these errors would be worse very much depends on the situation. Consider a couple of examples.

Courtroom

Hypothesis: Max is a psychopathic axe murderer who is a danger to society.

If this is true but we reject it (believing that it is false), Max will walk free from the courtroom and society is at risk!
If this is false but we accept it as true, Max could end up languishing in prison for a crime he did not commit.

Production line

Hypothesis: The batch of sugar delivered to make yummy fudge is of a high quality.

If this is true but we reject it (believing that it is false), we would return it to the suppliers, causing unnecessary delay and expense.
If this is false and we accept it as true, we could end up with a batch of fudge that is far from yummy.

Type I and Type II errors for the normal distribution

$P(\text{Type I error}) = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$
$P(\text{Type II error}) = P(\text{accepting } H_0 \mid H_0 \text{ is false}) = P(\text{accepting } H_0 \mid H_1 \text{ is true})$

Returning to our first example, namely:

It is believed that the masses of bars of chocolate are normally distributed with mean 200 g and standard deviation 5 g. A bar of chocolate is selected at random and found to have a mass of 192 g. Test at the 5% significance level whether the mean mass of the bars has decreased.

$H_0: \mu = 200$
$H_1: \mu < 200$

The probability of a Type I error is simply the significance level of the test, since the probability of rejecting a true hypothesis is the probability of the test statistic being in the critical region.

To calculate the probability of a Type II error we need to assume $H_1$ is true. Since this only tells us that $\mu$ is some value less than 200, we cannot make further progress without some further information about the particular value of $\mu$.

Suppose the mean mass of the chocolate bars was in fact 190 g.

$P(\text{Type II error}) = P(\text{accepting } H_0 \mid \mu = 190) = P(X > 191.775 \mid \mu = 190)$
$= P\left(Z > \frac{191.775 - 190}{5}\right) = P(Z > 0.355) = 1 - P(Z < 0.355) = 1 - 0.6387 = 0.3613$

So we have quite a large risk of accepting a false hypothesis.

The left-hand shaded tail of the right-hand graph shows the Type I error. The right-hand shaded tail of the left-hand graph shows the Type II error.

[Figure: two normal density curves, centred at 190 (left) and 200 (right), plotted for masses from 160 to 220, with the Type I and Type II error regions shaded.]

It is quite clear that the smaller the Type I error the larger the Type II error, and vice versa. This just emphasises the fact that you need to consider which of the two types of error is the worse to have in a given scenario.
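
The trade-off between the two types of error can be illustrated by recomputing the Type II error probability for several significance levels. This is a sketch for the single-bar test with a true mean of 190 g; the function name `type_ii_prob` is hypothetical, and because it uses an unrounded critical value the 5% result differs from 0.3613 only in the fourth decimal place.

```python
from statistics import NormalDist

def type_ii_prob(alpha, mu_true, mu0=200.0, sigma=5.0):
    """P(accept H0 | mean = mu_true) for the lower-tailed test on a single bar."""
    x_crit = mu0 + NormalDist().inv_cdf(alpha) * sigma   # reject H0 when X < x_crit
    return 1 - NormalDist(mu_true, sigma).cdf(x_crit)    # P(X > x_crit | mu_true)

betas = {alpha: type_ii_prob(alpha, mu_true=190) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in betas.items():
    print(f"P(Type I) = {alpha:.2f}  ->  P(Type II) = {beta:.4f}")
```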

Example

It is believed that the mean mark on a particular examination paper is 52% with a variance of 100 (%)². Determine the critical region for a two-tailed test for the sample mean of a sample of size 9 with a 1% significance level.

Given that the mean mark is actually 58%, find the probability of making a Type II error.

Explain what Type I and Type II errors mean in the context of the question.

Solution

$H_0: \mu = 52$
$H_1: \mu \ne 52$

The critical region is given by $Z < -2.576$ and $Z > 2.576$. So

$\bar{X} < 52 - 2.576\sqrt{\frac{100}{9}}$, i.e. $\bar{X} < 43.41\%$, or
$\bar{X} > 52 + 2.576\sqrt{\frac{100}{9}}$, i.e. $\bar{X} > 60.59\%$.

If the mean is 58% then

$P(\text{Type II error}) = P(43.41 < \bar{X} < 60.59 \mid \mu = 58)$
$= P\left(\frac{43.41 - 58}{\sqrt{\frac{100}{9}}} < Z < \frac{60.59 - 58}{\sqrt{\frac{100}{9}}}\right)$
$= P(-4.377 < Z < 0.777) = P(Z < 0.777) - P(Z < -4.377)$
$\approx P(Z < 0.777) = 0.7815$

A Type I error occurs if $H_0$ is rejected when in fact the mean mark on the examination paper is 52%. A Type II error occurs if $H_0$ is not rejected but the mean mark on the examination paper is not 52%.

Suggested questions to try
Ex 7A p 135 Q 1, 3, 4, 5

Hypothesis Tests with the Binomial Distribution

One-tailed test

Example 1

Consider the hypotheses

$H_0: p = 0.3$
$H_1: p < 0.3$

for a sample of size 30, so that $X \sim B(30, 0.3)$.

Because we are looking for a reduction in p we look for a critical region of the form $X \le c$.

$P(X \le 3) = 0.0093$
$P(X \le 4) = 0.0302$
$P(X \le 5) = 0.0766$
$P(X \le 6) = 0.1595$

Suppose we are looking for a critical region for a nominal significance level of 5%. The largest region which gives a probability of less than 5% is $X \le 4$, so this is our critical region.

For a nominal significance level of 10% we would choose a critical region of $X \le 5$, for this is the largest region that gives a probability of less than 10%.

The actual significance levels (i.e. the probabilities of a Type I error) are 3.02% and 7.66% respectively.

Example 2

Consider the hypotheses

$H_0: p = 0.8$
$H_1: p > 0.8$

for a sample of size 30, so that $X \sim B(30, 0.8)$.

This time we are looking for a critical region of the form $X \ge c$.

$P(X \ge 25) = 1 - P(X \le 24) = 1 - 0.5725 = 0.4275$
$P(X \ge 26) = 1 - P(X \le 25) = 1 - 0.7448 = 0.2552$
$P(X \ge 27) = 1 - P(X \le 26) = 1 - 0.8773 = 0.1227$
$P(X \ge 28) = 1 - P(X \le 27) = 1 - 0.9558 = 0.0442$

Suppose we are looking for a critical region for a nominal significance level of 5%. The largest region which gives a probability of less than 5% is $X \ge 28$, so this is our critical region.

For a nominal significance level of 20% we would choose a critical region of $X \ge 27$, for this is the largest region that gives a probability of less than 20%.

The actual significance levels (i.e. the probabilities of a Type I error) are 4.42% and 12.27% respectively.

Notice that if we were looking for a nominal significance level of 10% we would (for this particular case) have to settle for an actual significance level of 4.42%, with the same critical region $X \ge 28$ as for the nominal 5% test. Similar arguments apply for other nominal significance levels.
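
The search for a one-tailed critical region in Examples 1 and 2 can be automated. The following is a minimal sketch using only `math.comb`; the helper names `binom_cdf`, `lower_critical` and `upper_critical` are hypothetical, and the rule used is the one in the notes (tail probability strictly less than the nominal level).

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p); returns 0 when c < 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, c + 1))

def lower_critical(n, p, alpha):
    """Largest c with P(X <= c) < alpha, or None if there is no such c."""
    c = None
    for k in range(n + 1):
        if binom_cdf(k, n, p) < alpha:
            c = k
        else:
            break
    return c

def upper_critical(n, p, alpha):
    """Smallest c with P(X >= c) < alpha, or None if there is no such c."""
    for k in range(n + 1):
        if 1 - binom_cdf(k - 1, n, p) < alpha:
            return k
    return None

print(lower_critical(30, 0.3, 0.05), lower_critical(30, 0.3, 0.10))
print(upper_critical(30, 0.8, 0.05), upper_critical(30, 0.8, 0.20))
```

The actual significance level is then `binom_cdf(c, n, p)` for a lower-tail region and `1 - binom_cdf(c - 1, n, p)` for an upper-tail region.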

Two-tailed test

Example 3

Consider the hypotheses

$H_0: p = 1/6$
$H_1: p \ne 1/6$

again for a sample size of 30, so that $X \sim B(30, 1/6)$.

Suppose we are looking for a critical region for a nominal significance level of 5%. We look for 2.5% in each tail.

$P(X \le 0) = 0.0042$
$P(X \le 1) = 0.0295$
$P(X \le 2) = 0.1028$
$P(X \ge 8) = 1 - P(X \le 7) = 1 - 0.8863 = 0.1137$
$P(X \ge 9) = 1 - P(X \le 8) = 1 - 0.9494 = 0.0506$
$P(X \ge 10) = 1 - P(X \le 9) = 1 - 0.9803 = 0.0197$

The two parts of the critical region would be $X \le 0$ and $X \ge 10$. This gives a total significance level (Type I error) of 0.0042 + 0.0197 = 0.0239.

For a nominal significance level of 10% we would look for 5% in each tail. The two parts of the critical region would be $X \le 1$ and $X \ge 10$. This gives a total significance level (Type I error) of 0.0295 + 0.0197 = 0.0492.

Note

It is sometimes possible to drop the requirement for symmetric probabilities in each tail and instead look for a critical region that gives an actual significance level as close as possible to the nominal one (without exceeding it). The usual practice on this course, unless directed otherwise, is to look for a region giving at most half the total significance level in each tail.

In the examples above, dropping the symmetry requirement would lead to a critical region at the 5% level of $X \le 1$ and $X \ge 10$, giving a total significance level (Type I error) of 0.0295 + 0.0197 = 0.0492 = 4.92%, and a critical region at the 10% level of $X \le 1$ and $X \ge 9$, giving a total significance level (Type I error) of 0.0295 + 0.0506 = 0.0801 = 8.01%.
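
The usual symmetric rule (at most half the nominal level in each tail) can be sketched as below. The function name `two_tailed_region` is hypothetical, and the sketch does not implement the alternative, non-symmetric choice described in the note.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p); returns 0 when c < 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, c + 1))

def two_tailed_region(n, p, alpha):
    """Return (lower, upper, actual level) with each tail probability below alpha/2."""
    half = alpha / 2
    lower = max((k for k in range(n + 1) if binom_cdf(k, n, p) < half), default=None)
    upper = min((k for k in range(n + 1) if 1 - binom_cdf(k - 1, n, p) < half), default=None)
    actual = 0.0
    if lower is not None:
        actual += binom_cdf(lower, n, p)          # P(X <= lower)
    if upper is not None:
        actual += 1 - binom_cdf(upper - 1, n, p)  # P(X >= upper)
    return lower, upper, actual

for alpha in (0.05, 0.10):
    lower, upper, actual = two_tailed_region(30, 1 / 6, alpha)
    print(f"nominal {alpha:.0%}: X <= {lower} or X >= {upper}, actual {actual:.4f}")
```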

Calculating Type II Errors

Example 1 revisited

We return to Example 1 and calculate the probability of a Type II error if p was in fact 0.1.

Suppose we had been looking for a nominal significance level of 5%. This gave a critical region of $X \le 4$ with a Type I error of 0.0302 or 3.02%. We were looking at a sample size of 30.

$P(\text{Type II error}) = P(X > 4 \mid p = 0.1) = 1 - P(X \le 4 \mid p = 0.1) = 1 - 0.8245 = 0.1755$

Example 3 revisited

We return to Example 3 and calculate the probability of a Type II error if p was in fact 1/3.

Suppose we had been looking for a nominal significance level of 10%. The two parts of the critical region were $X \le 1$ and $X \ge 10$. This gave a total significance level (Type I error) of 0.0295 + 0.0197 = 0.0492.

$P(\text{Type II error}) = P(2 \le X \le 9 \mid p = 1/3) = P(X \le 9 \mid p = 1/3) - P(X \le 1 \mid p = 1/3) = 0.4317 - 0.0001 = 0.4316$

Note

When conducting hypothesis tests of this type it is of course possible to emulate Method 2 for the normal distribution: calculate the probability of a value less than or equal to (or greater than or equal to, as appropriate) the observed value of the test statistic and compare this with the significance level.
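
Both Type II error probabilities are simply the probability of landing in the acceptance region under the true value of p. A short check, assuming the critical regions found earlier ($X \le 4$ for Example 1; $X \le 1$ or $X \ge 10$ for Example 3), might look like this:

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, c + 1))

# Example 1: acceptance region X >= 5, true p = 0.1.
beta_ex1 = 1 - binom_cdf(4, 30, 0.1)

# Example 3: acceptance region 2 <= X <= 9, true p = 1/3.
beta_ex3 = binom_cdf(9, 30, 1 / 3) - binom_cdf(1, 30, 1 / 3)

print(round(beta_ex1, 4), round(beta_ex3, 4))
```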

Example 4

A coin is spun 30 times and only 8 heads come up. It is suggested that the coin is biased in favour of tails. Is there evidence to support this claim at the 1% significance level?

Solution

This is a one-tailed test since we are looking for a bias in a particular direction.

$H_0: p = 0.5$, i.e. the coin is fair
$H_1: p < 0.5$, i.e. the probability of obtaining a head is less than 0.5

where p is the probability of obtaining a head on a single spin of the coin.

$P(X \le 8) = 0.0081 = 0.81\% < 1\%$

We therefore reject $H_0$ and conclude that there is evidence at the 1% significance level that the coin is biased in favour of tails.

Example 5

A die is rolled 18 times and only one 1 comes up. It is claimed that the die is fair. Is this claim reasonable based on the number of 1s occurring?

Solution

A two-tailed test is appropriate since we are considering whether the number of 1s is different from what is expected, not specifically whether there is a lower number of 1s than expected.

$H_0: p = 1/6$ (we are looking at it from the point of view of the 1s only)
$H_1: p \ne 1/6$

$P(X \le 1) = 0.1728 = 17.28\%$ using a $B(18, 1/6)$ distribution.

So with a significance level of 5% we would accept $H_0$, since 17.28% > 2.5%, and conclude that there is insufficient evidence to suggest that the die is not fair. Note that we are using 2.5% in one tail because it is a two-tailed 5% test.
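
The two tail probabilities used in Examples 4 and 5 can be computed directly. This is a sketch only; the comparison with half the significance level in the two-tailed case follows the convention stated in the notes.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, c + 1))

p_coin = binom_cdf(8, 30, 0.5)      # Example 4: P(X <= 8) for 30 spins of a fair coin
p_die = binom_cdf(1, 18, 1 / 6)     # Example 5: P(X <= 1) for 18 rolls of a fair die

reject_coin = p_coin < 0.01         # one-tailed test at the 1% level
reject_die = p_die < 0.05 / 2       # two-tailed 5% test: compare with 2.5%
print(round(p_coin, 4), reject_coin, round(p_die, 4), reject_die)
```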

Using the Normal Approximation in Calculating Errors for Binomial Tests

Example 6

Suppose we are considering the hypotheses

$H_0: p = 0.3$
$H_1: p < 0.3$

for a sample of size 100, so that $X \sim B(100, 0.3)$.

We know that we can approximate X as $X \sim N(100 \times 0.3,\ 100 \times 0.3 \times 0.7)$, i.e. $N(30, 21)$.

Consider a test at a nominal 5% significance level. With a continuity correction, the critical region would be

$X + 0.5 < 30 - 1.645\sqrt{21}$, i.e. $X < 21.96166298\ldots$, or $X \le 21$ since X is discrete.

This would give an actual significance level of

$P\left(Z < \frac{21.5 - 30}{\sqrt{21}}\right) = P(Z < -1.855) = 1 - P(Z < 1.855) = 1 - 0.9682 = 0.0318$

If we wanted to calculate the probability of a Type II error if p was in fact 0.2 we would proceed as follows.

Now $X \sim B(100, 0.2)$, which we can approximate as $X \sim N(100 \times 0.2,\ 100 \times 0.2 \times 0.8)$, i.e. $N(20, 16)$, so

$P(\text{Type II error}) = P\left(Z > \frac{21.5 - 20}{\sqrt{16}}\right) = P(Z > 0.375) = 1 - P(Z < 0.375) = 1 - 0.6461 = 0.3539$

As before, increasing the size of the Type I error would reduce the size of the Type II error.
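
The steps of Example 6 (critical region with a continuity correction, actual significance level, then Type II error) can be collected into one function. This is a sketch for the lower-tailed case only; the function name is hypothetical, and because the critical z-value is computed rather than rounded to 1.645 the results agree with the worked values only to about three decimal places.

```python
from math import ceil, sqrt
from statistics import NormalDist

def lower_tail_binomial_test(n, p0, alpha, p_true):
    """Normal approximation with continuity correction for H0: p = p0 vs H1: p < p0."""
    mu0, sd0 = n * p0, sqrt(n * p0 * (1 - p0))
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * sd0 - 0.5   # reject when X < cutoff
    c = ceil(cutoff) - 1                                      # critical region X <= c
    actual = NormalDist(mu0, sd0).cdf(c + 0.5)                # approximate P(Type I error)
    mu1, sd1 = n * p_true, sqrt(n * p_true * (1 - p_true))
    beta = 1 - NormalDist(mu1, sd1).cdf(c + 0.5)              # approximate P(Type II error)
    return c, actual, beta

c, actual, beta = lower_tail_binomial_test(100, 0.3, 0.05, 0.2)
print(c, round(actual, 4), round(beta, 4))
```

An upper-tailed or two-tailed version follows the same pattern, with the 0.5 correction applied in the opposite direction for the upper boundary.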

Example 7

Suppose we are considering the hypotheses

$H_0: p = 0.3$
$H_1: p > 0.3$

for a sample of size 100, so that $X \sim B(100, 0.3)$.

We know that we can approximate X as $X \sim N(100 \times 0.3,\ 100 \times 0.3 \times 0.7)$, i.e. $N(30, 21)$.

Consider a test at a nominal 5% significance level. The critical region would be

$X - 0.5 > 30 + 1.645\sqrt{21}$, i.e. $X > 38.03833702\ldots$, or $X \ge 39$.

This too would give an actual significance level of

$P\left(Z > \frac{38.5 - 30}{\sqrt{21}}\right) = P(Z > 1.855) = 1 - P(Z < 1.855) = 1 - 0.9682 = 0.0318$

If we wanted to calculate the probability of a Type II error if p was in fact 0.5 we would proceed as follows.

Now $X \sim B(100, 0.5)$, which we can approximate as $X \sim N(100 \times 0.5,\ 100 \times 0.5 \times 0.5)$, i.e. $N(50, 25)$, so

$P(\text{Type II error}) = P\left(Z < \frac{38.5 - 50}{\sqrt{25}}\right) = P(Z < -2.3) = 1 - P(Z < 2.3) = 1 - 0.9893 = 0.0107$

Problems involving two-tailed hypotheses can be handled in a similar way.

Example 8

Suppose we are considering the hypotheses

$H_0: p = 0.4$
$H_1: p \ne 0.4$

for a sample of size 200, so that $X \sim B(200, 0.4)$.

We know that we can approximate X as $X \sim N(200 \times 0.4,\ 200 \times 0.4 \times 0.6)$, i.e. $N(80, 48)$.

Consider a test at a nominal 1% significance level. The critical region would be

$X + 0.5 < 80 - 2.576\sqrt{48}$ or $X - 0.5 > 80 + 2.576\sqrt{48}$,
i.e. $X < 61.652948\ldots$ or $X > 98.34705\ldots$,
i.e. $X \le 61$ or $X \ge 99$.

This would give an actual significance level of

$P\left(Z > \frac{98.5 - 80}{\sqrt{48}}\right) + P\left(Z < \frac{61.5 - 80}{\sqrt{48}}\right) = P(Z > 2.670) + P(Z < -2.670)$
$= 2P(Z > 2.670) = 2(1 - P(Z < 2.670)) = 2(1 - 0.9962) = 0.76\%$

If we wanted to calculate the probability of a Type II error if p was in fact 0.5 we would proceed as follows.

Now $X \sim B(200, 0.5)$, which we can approximate as $X \sim N(200 \times 0.5,\ 200 \times 0.5 \times 0.5)$, i.e. $N(100, 50)$, so the probability of a Type II error is given by

$P\left(\frac{61.5 - 100}{\sqrt{50}} < Z < \frac{98.5 - 100}{\sqrt{50}}\right) = P(-5.445 < Z < -0.212) = P(0.212 < Z < 5.445)$
$= P(Z < 5.445) - P(Z < 0.212) = 1 - 0.5840 = 0.4160$

Poisson Distribution

Example 1

The number of telephone calls arriving at a busy office each week requesting help of a particular nature has been assumed to have a Poisson distribution with mean 7. The office staff feel that the number of telephone calls asking for this kind of help has increased.

(a) Find a suitable critical region at a 5% significance level.
(b) State the actual significance level of the test.
(c) If the mean has actually increased to 9, calculate the probability of a Type II error.

Solution

(a) $H_0: \lambda = 7$, $H_1: \lambda > 7$

Under $H_0$, $X \sim Po(7)$.

$P(X \ge 11) = 1 - P(X \le 10) = 1 - 0.9015 = 0.0985$
$P(X \ge 12) = 1 - P(X \le 11) = 1 - 0.9467 = 0.0533$
$P(X \ge 13) = 1 - P(X \le 12) = 1 - 0.9730 = 0.0270$

So the critical region is $X \ge 13$.

(b) The actual significance level is 2.7%.

(c) $P(\text{Type II error}) = P(X < 13 \mid \lambda = 9) = P(X \le 12 \mid \lambda = 9) = 0.8758$
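
The same search works for a Poisson test. The sketch below builds the cumulative probabilities term by term; `poisson_cdf` and `upper_critical` are hypothetical helper names, and `kmax` is an arbitrary cap on the search added for illustration.

```python
from math import exp

def poisson_cdf(c, lam):
    """P(X <= c) for X ~ Po(lam); returns 0 when c < 0."""
    term, total = exp(-lam), 0.0
    for k in range(c + 1):
        total += term               # add P(X = k)
        term *= lam / (k + 1)       # turn P(X = k) into P(X = k + 1)
    return total

def upper_critical(lam, alpha, kmax=200):
    """Smallest c with P(X >= c) < alpha, searching c = 0, 1, ..., kmax - 1."""
    for k in range(kmax):
        if 1 - poisson_cdf(k - 1, lam) < alpha:
            return k
    return None

c = upper_critical(7, 0.05)               # part (a)
actual = 1 - poisson_cdf(c - 1, 7)        # part (b): actual significance level
beta = poisson_cdf(c - 1, 9)              # part (c): P(X < c | lambda = 9)
print(c, round(actual, 4), round(beta, 4))
```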

Example 2

The average number of cars passing a particular point on a local road during a 5-minute period at rush hour has been found to be 19. Some road improvements have been made. A count of the number of cars passing that point after the road improvements gave 16 cars passing the point during a particular 5-minute period.

(a) Determine a suitable critical region for a two-tailed test at a nominal 10% significance level.
(b) Determine the actual significance level.
(c) Make a conclusion based on the available data.
(d) If the average number of cars is in fact 14, calculate the probability of a Type II error.
(e) State one way in which the probability of a Type II error could be reduced.
(f) What will happen to the probability of a Type II error if the true average was lower than 14?

Solution

(a) $H_0: \lambda = 19$ and $H_1: \lambda \ne 19$

Under $H_0$, $X \sim Po(19)$.

$P(X \le 11) = 0.0347$
$P(X \le 12) = 0.0606$
$P(X \ge 26) = 1 - P(X \le 25) = 1 - 0.9269 = 0.0731$
$P(X \ge 27) = 1 - P(X \le 26) = 1 - 0.9514 = 0.0486$

A suitable critical region would be $X \le 11$ or $X \ge 27$.

(b) $P(X \le 11 \text{ or } X \ge 27) = 0.0347 + 0.0486 = 0.0833$

So the actual significance level is 8.33%.

(c) 16 is in the acceptance region, so we conclude that there is no significant evidence that the mean number of cars has changed from 19.

(d) $P(\text{Type II error}) = P(12 \le X \le 26 \mid \lambda = 14) = P(X \le 26 \mid \lambda = 14) - P(X \le 11 \mid \lambda = 14) = 0.9987 - 0.2600 = 0.7387$

(e) Increase the significance level of the test.

(f) It will decrease, since the true average is moving further away from the average under the null hypothesis.

Example 3

The number of times per week that a particular machine breaks down is thought to follow a Poisson distribution with mean 3.1. Some modifications are made to the machine to make it more reliable. Over the next 6 weeks the machine breaks down 15 times.

(a) Determine a suitable critical region at a nominal 10% significance level.
(b) Determine the actual significance level of this test.
(c) Given that the mean number of breakdowns has in fact decreased to 2.9, find the probability of a Type II error.
(d) What type of error will have been made in this case?

Solution

(a) Over the six-week period we have $H_0: \lambda = 18.6$ and $H_1: \lambda < 18.6$.

Under $H_0$, $X \sim Po(18.6)$.

Because $\lambda > 15$ (and since 18.6 is not in the table) we may use a normal approximation for X. This gives $X \sim N(18.6, 18.6)$ approximately.

The critical region will be given by

$X + 0.5 < 18.6 - 1.282\sqrt{18.6}$, i.e. $X < 12.57102664$

Because X is a discrete random variable we make this $X \le 12$.

(b) To calculate the actual significance level we evaluate

$P\left(Z < \frac{12.5 - 18.6}{\sqrt{18.6}}\right) = P(Z < -1.414) = 1 - P(Z < 1.414) = 1 - 0.9213 = 0.0787$

So the actual significance level is 7.87%.

(c) If the mean number of weekly breakdowns is 2.9 then over a six-week period the mean will be 6 times as great, i.e. 17.4.

$P(\text{Type II error}) = P\left(Z > \frac{12.5 - 17.4}{\sqrt{17.4}}\right) = P(Z > -1.175) = P(Z < 1.175) = 0.8800$

(d) Since 15 is in the acceptance region but the mean has in fact reduced, a Type II error will have been made.
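
Parts (a) to (c) of the machine-breakdown example can be reproduced with the normal approximation to the Poisson distribution. This is a sketch under the same approximation as the worked solution; the critical z-value is computed rather than rounded to 1.282, so the results match the worked values to about three decimal places.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Six-week means under H0 and under the assumed true weekly rate of 2.9.
lam0, lam_true, alpha = 6 * 3.1, 6 * 2.9, 0.10

cutoff = lam0 + NormalDist().inv_cdf(alpha) * sqrt(lam0) - 0.5   # reject when X < cutoff
c = ceil(cutoff) - 1                                             # critical region X <= c
actual = NormalDist(lam0, sqrt(lam0)).cdf(c + 0.5)               # approximate P(Type I error)
beta = 1 - NormalDist(lam_true, sqrt(lam_true)).cdf(c + 0.5)     # approximate P(Type II error)

observed = 15
print(c, round(actual, 4), round(beta, 4), "reject H0" if observed <= c else "accept H0")
```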

Observations

Remember that if you are simply carrying out a hypothesis test without needing to find the critical region, it is often most efficient to work out a probability and compare it with the significance level for a one-tailed test, or with half the significance level for a two-tailed test.

Suggested questions to try
Ex 6A p 119 Q 1, 2, 3, 5, 6
Ex 6B p 122 Q 1, 2, 4
Ex 6C p 124 Q 1, 4
Miscellaneous Ex 6 p 125 Q 3, 7, 8, 10, 14, 18
Ex 7B p 141 Q 1, 2, 3, 4, 6
Ex 7C p 145 Q 1, 3, 4, 6
Misc Ex 7 p 147 Q 1, 2, 3, 4, 5, 7