The one-sample t test for a population mean

Shannon Baldwin
6 years ago
1 Objectives
Constructing and assessing hypotheses
The t-statistic and the p-value
Statistical significance
The one-sample t test for a population mean
One-sided versus two-sided tests
Further reading: OS3, Section 4.3 (using known population standard deviation) and the section using the estimated population standard deviation. ISP: Sections 6.2 and 7.1.

2 Topics: Understanding hypothesis tests
Learning objectives
Be able to construct the appropriate null and alternative hypotheses based on what one wants to learn (the null hypothesis will always have the equal sign embedded in it).
Understand that a statistical test is based on assessing the likelihood/plausibility of the data being generated when the null hypothesis is true.
If the probability is large, then the null is plausible and we cannot reject the null hypothesis. This does not prove the null; it simply states that the null is possible.
If the probability is small, the null seems implausible, and we reject the null hypothesis and determine the alternative to be true.

3 Examples of hypotheses
We call $H_0$ the null hypothesis and $H_A$ the alternative hypothesis. Examples include:
Comparing product reviews. $H_0$: The reviews of two products are the same. $H_A$: The reviews of two products are different.
Wine consumption and health. $H_0$: Regular consumption of wine has no effect on polyphenol levels in the blood. $H_A$: Regular consumption of wine increases polyphenol levels in the blood.
Birds and flight. $H_0$: Bird species A cannot fly. $H_A$: Bird species A can fly.

4 Motivation

5 Example 1: Flying Birds
Based on empirical observations we want to answer the following question: Can Bird species A fly?
We write this as a hypothesis (conjecture).
Null hypothesis: $H_0$: Bird species A cannot fly.
Alternative hypothesis: $H_A$: Bird species A can fly.
Based on what we observe, is the null plausible?
Scenario 1: You see one bird fly. You have immediately disproven the null (that this species of birds cannot fly), and this proves the alternative.
Scenario 2: None of the birds are flying (maybe too much food to even attempt it). All this is consistent with the null being true; however, it does not prove the null (they may fly later on). In this situation we say there is no evidence in the data to prove the alternative.

6 Setting up the hypothesis

7 Example 2: Comparing product reviews.
- The Coffea (left) has an average review of 4.4 (over 261 customers).
- The Smart Lintelek (right) has an average review of 4.8 (over 58 customers).
- Over these customers Smart Lintelek scored highly. However, just comparing the sampled customers does not take into account sampling variability.

8 $H_0$: The reviews of these two products are the same. $H_A$: The reviews of these two products are different.
In an ideal world all tracker customers would rate both devices and we would be able to compare the mean ratings over both. Our hypothesis should be based on the mean ratings of all customers (which in reality can never be observed).
Let $\mu_C$ denote the mean rating of the Coffea over all tracker customers.
Let $\mu_{SM}$ denote the mean rating of the Smart Lintelek over all tracker customers.
The hypothesis we want to investigate is $H_0: \mu_C - \mu_{SM} = 0$ vs $H_A: \mu_C - \mu_{SM} \neq 0$.
The null is that globally they would get the same mean ratings. We write this as $\mu_C - \mu_{SM} = 0$, which is the same as $H_0: \mu_C = \mu_{SM}$.
The alternative is that globally they would get different mean ratings. We write this as $\mu_C - \mu_{SM} \neq 0$, which is the same as $H_A: \mu_C \neq \mu_{SM}$.

9 Example 3: Buying a product
- The Smart Lintelek has an average rating of 4.8 (over 58 customers). This is great, but the sample size is small.
- I only buy products if I am confident that the population mean is not 4 or below.
$H_0: \mu \le 4.0$ vs $H_A: \mu > 4.0$ (I'll buy)

10 Looking at the rating 4.8, it is clearly greater than 4.0. This is great. But there is also a doubt in my mind. Could it be that a sample mean of 4.8 can arise when the population mean is 4.0? I want to calculate the chance of this happening. We will calculate this chance both by hand and using Statcrunch. If the chance turns out to be small, I can be sure that the population mean is not 4.0 and must be greater. I will then go on to buy the product (since I have rejected the null).

11 The null hypothesis is a very specific statement about parameter(s) of the population(s). It is labeled $H_0$. This is the hypothesis we assess. The null should always have an equal sign in it: either $=$, $\le$ or $\ge$.
The alternative hypothesis is a more general statement about the parameter(s) that is exclusive of the null hypothesis. It is labeled $H_A$.
In all hypothesis tests the focus is only on the null hypothesis and assessing its plausibility based on the data.

12 Setting up the hypothesis and understanding when it is immediately clear that the null is plausible and we cannot reject it.

13 Example 4: Buying a product
- The Teslasz has an average (sample mean) rating of 3.3.
- I only buy products if I can be sure that the population mean of the reviews is over 4.0.
$H_0: \mu \le 4.0$ vs $H_A: \mu > 4.0$ (I'll buy)
- With a sample mean of 3.3, I am certainly not going to buy this one. There is no evidence in the sample that the mean is greater than 4.0. I cannot reject the null.

14 $H_0: \mu \le 4.0$ vs $H_A: \mu > 4.0$ (I'll buy)
- The sample mean will usually be close (in some sense) to the population mean. When the sample mean is 3.3, the true mean could easily lie around 3.3, which is less than 4.0.
- This tells us that the null hypothesis is plausible and explains why I cannot reject the null.

15 Question Time
It is known that freshman biology has a mean score of 75%. A professor thinks that students who attend early morning classes have a higher mean score. Her early morning class this year can be considered as a sample of all students who take an early morning class. What is the hypothesis of interest?
(A) $H_0: \mu \ge 75\%$ against $H_A: \mu < 75\%$.
(B) $H_0: \mu \le 75\%$ against $H_A: \mu > 75\%$.
(C) $H_0: \mu = 75\%$ against $H_A: \mu \neq 75\%$.
(D) $H_0: \mu < 75\%$ against $H_A: \mu \ge 75\%$.

16 Question Time
It is known that freshman biology has a mean score of 75%. A professor thinks that students who attend early morning classes have a higher mean score. Her early morning class this year can be considered as a sample of all students who take an early morning class. The sample mean (average grade) in her class is 78%. What is the hypothesis of interest?
(A) $H_0: \mu \ge 78\%$ against $H_A: \mu < 78\%$.
(B) $H_0: \mu \le 78\%$ against $H_A: \mu > 78\%$.
(C) $H_0: \mu = 75\%$ against $H_A: \mu \neq 75\%$.
(D) $H_0: \mu \le 75\%$ against $H_A: \mu > 75\%$.
The stated hypothesis should never be based on the data.

17 Question Time
- The price of gasoline has changed. Previously the mean yearly mileage of a vehicle was 4000 miles. I want to see whether the mean yearly mileage has changed after the price change. What is the hypothesis of interest?
- (A) $H_0: \mu \neq 4000$ against $H_A: \mu = 4000$
- (B) $H_0: \mu = 4000$ against $H_A: \mu \neq 4000$
- (C) $H_0: \mu \le 4000$ against $H_A: \mu > 4000$

18 Visually checking plausibility of the null

19 The null hypothesis is a very specific statement about parameter(s) of the population(s). It is labeled $H_0$. This is the hypothesis that we assess. Only if the null seems unlikely do we reject it. Rejecting the null and accepting the alternative are the same thing.
A hypothesis test always checks the validity of the null. In the following Example 5, we are going to ask whether the numbers (observations) in each sample could arise if the null were true. If it seems unlikely, we reject the null and accept the alternative. If it seems plausible, then we do not reject the null (though we do not say the null is true; recall Example 1 with birds flying).

20 Example 5: Benefits of wine?
Wine consumption and health. $H_0$: Regular consumption of wine has no effect on polyphenol levels in the blood. $H_A$: Regular consumption of wine increases polyphenol levels in the blood.
Of course different people react in different ways. So we should not focus on the individual, nor should we focus on only the participants who took part in the study. We should focus on the population of interest (young males, say) and in particular the mean change in polyphenol levels over this entire population.
Let $\mu$ denote the mean change (over the entire population) in polyphenol levels after consuming a small amount of red wine on a regular basis.
$H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)

21 Whenever we are given the data and the null hypothesis, we must always ask ourselves: could we have obtained that data set if the null hypothesis were true? If it seems that we could, then we cannot reject the null (we cannot say the alternative is true). In the following examples, look at the data and ask yourself whether the data could have been generated if the null were true. Later we will put probabilities (called p-values) to these notions.

22 White wine: Situation 1
9 males are given white wine. The difference in polyphenol levels before and after the study is shown in the plot (data values not recovered).
We plot the changes in the polyphenol levels for each individual on the line (each blue spot corresponds to one observation). We see that there were some negative readings. These are persons who observed a decrease in polyphenol levels. Some readings are positive. But every person is different.
The sample mean change is the green vertical line, $\bar{x} = -0.7$.

23 $H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)
The aim of the study is to see whether consumption of white wine increases polyphenol levels. But you find that for these 9 participants the sample mean has dropped. It is clear that for these participants we did not see an increase. We could easily have observed such a situation under the scenario that white wine has no effect on polyphenol levels. Though we cannot say the null is true, we cannot make any positive claims about the alternative.
Formally: Since $\bar{x} = -0.7$ is consistent with the null hypothesis $\mu \le 0$, there is no evidence to disprove the null. There is no evidence in the data that polyphenol levels increase with moderate consumption of white wine. In conclusion, we cannot reject the null based on this data set.

24 Red wine: Situation 2
9 males are given red wine. The difference in polyphenol levels before and after the study is shown in the plot (data values not recovered).
We plot the change in polyphenol levels on the line. Every participant observed an increase in polyphenol levels; all are over 8.0. The sample mean is $\bar{x} = 9.86$. So for this group of participants there is a clear increase.

25 $H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)
Of course, the increases could just be by chance. The concentration of polyphenol in a person's blood will always change. But it does seem highly unlikely that 9 people will all observe a substantial increase in polyphenol under the scenario that the red wine did not have an effect. This sample does not appear to be a fluke.
Formally we say: It seems very unlikely we could have obtained this data under the null (population mean $\mu \le 0$), and it strongly suggests that the alternative is true. The data strongly suggests drinking red wine increases polyphenol levels.

26 Red wine: Situation 3
The difference in polyphenol levels before and after the study is: 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53.
We plot the change in polyphenol levels on the line. Some observed an increase, others observed a decrease in polyphenol. The sample mean is $\bar{x} = 0.56$.

27 $H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)
For these participants there is a small overall increase. Could this data have been observed if wine had no influence on polyphenol levels (in other words, if the null were true)? A formal statistical test (which we do later) will help us answer this question.
Visual conclusion: Unsure.

28 Red wine: Situation 4
The difference in polyphenol levels before and after the study is: -0.43, …, 26.11, 4.32, 25.02, 9.40, 11.54, … (some values not recovered).
We plot the change in polyphenol levels on the line. Some observed an increase, others observed a decrease in polyphenol. There is a lot of variability in the data. But the majority are positive. The sample mean is $\bar{x} = 6.66$.

29 $H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)
In other words, to prove the alternative we need to show that the data is unlikely to have been observed if the null were true. Looking at the changes in polyphenol levels for the participants, do you think they could have been observed when red wine has no influence on polyphenol? A formal statistical test (which we do later) will help us answer this question.
Visual conclusion: Unsure.
Statistical tests and tools allow us to systematically navigate these different scenarios.

30 Question Time
The hypothesis is $H_0: \mu \le 1$, $H_A: \mu > 1$. The green line is the sample mean = 4.8. What is the conclusion of the test?
(A) Reject Null ($H_A$ is true)
(B) Cannot Reject Null (Cannot say $H_A$ is true)
(C) Do not know.

31 Question Time
The hypothesis is $H_0: \mu = 1$, $H_A: \mu \neq 1$. The green line is the sample mean = 4.8. What is the conclusion of the test?
(A) Reject Null ($H_A$ is true)
(B) Cannot Reject Null (Cannot say $H_A$ is true)
(C) Do not know.

32 Question Time
The hypothesis is $H_0: \mu \le 4$, $H_A: \mu > 4$. The green line is the sample mean = 2.4. These are the ratings of a product. What is the conclusion of the test?
(A) Reject Null ($H_A$ is true, I buy)
(B) Cannot Reject Null (Cannot say if $H_A$ is true, but I won't buy)

33 Question Time
The hypothesis is $H_0: \mu \ge 1$, $H_A: \mu < 1$. The green line is the sample mean = 4.8. What is the conclusion of the test?
(A) Reject Null ($H_A$ is true)
(B) Cannot Reject Null (Cannot say $H_A$ is true, but I won't buy)

34 Question Time
The hypothesis is $H_0: \mu \le 4$, $H_A: \mu > 4$. The green line is the sample mean = 5. These are the ratings of a product. What is the conclusion of the test?
(A) Reject Null ($H_A$ is true, I buy)
(B) Cannot Reject Null (Cannot say if $H_A$ is true, I won't buy)
(C) Do not know.

35 Question Time
The hypothesis is $H_0: \mu \le 4$, $H_A: \mu > 4$. The green line is the sample mean = 4.12. These are the ratings of a product. What is the conclusion of the test?
(A) Reject Null ($H_A$ is true, I buy)
(B) Cannot Reject Null (Cannot say if $H_A$ is true, I won't buy)
(C) Do not know.

36 Discussion
It was pretty clear what the answer should have been for most of the previous questions. But the solution to the last question was unclear. This is where we require statistical tools. These tools will give us the chance of obtaining a sample mean of 4.12 when the population mean rating (amongst everyone who could have rated the product) was 4.0 or less. Interpreting probabilities is very important.

37 Example 6: Does the lady take milk?
Recall the tea story in Chapter 1: In the 1930s a lady in Cambridge insisted that tea tasted different depending on whether the milk was poured into the cup first and then the tea, or the tea was poured first and then the milk. Fisher suggested that this can be statistically tested by giving her tea where some cups are made with tea first and other cups are made with milk first, and asking her to identify the cups.
The competing hypotheses are: $H_0$: The lady has no idea and just guesses. $H_A$: The lady is able to select the correct cup.
They collect the data and find that she identifies all 8 cups of tea correctly. This is the observed information from which we have to draw a statistical conclusion.
The chance of her identifying all cups correctly is 1/72 = 1.39% under the scenario that she is guessing (this is the null hypothesis).

38 Motivation 2 (cont.)
Assessing the probability: If the probability is over a threshold, then the null is deemed plausible and we cannot reject the null. If the probability is below the threshold, then the null is deemed implausible and we reject the null. Typically, the α = 5% significance level is used as the threshold.
Since 1/72 = 1.39% is less than 5%, we believe the null is implausible (at the 5% level) and thus reject it (saying that there is evidence to suggest the alternative, that she knows her tea, is true).
However, we will never know the truth! If she did the experiment 100 times and was simply guessing, then about 1.39 times out of 100 she would identify all cups correctly. Recall that 5% is the proportion of times we are willing to reject the null when in fact the null is true.
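The probability under the null can be checked by counting. The sketch below assumes the classical design of Fisher's experiment (4 milk-first and 4 tea-first cups, with the lady asked to pick the 4 milk-first cups); the slides do not state the design, so this is an assumption. Under that design there are 70 equally likely selections, giving about 1.43% rather than the 1/72 = 1.39% quoted above; the conclusion at the 5% level is the same either way.

```python
from math import comb

# Assumed design (not stated on the slides): 8 cups, 4 milk-first, 4 tea-first.
n_cups = 8
n_milk_first = 4

# Under the null (pure guessing) every selection of 4 cups is equally likely,
# and exactly one selection is completely correct.
n_possible_selections = comb(n_cups, n_milk_first)
p_all_correct = 1 / n_possible_selections

alpha = 0.05  # significance level used as the threshold
reject_null = p_all_correct < alpha

print(f"possible selections: {n_possible_selections}")
print(f"P(all correct | guessing) = {p_all_correct:.4f}")
print(f"reject the null at the 5% level: {reject_null}")
```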

39 To summarize
In order to prove the alternative, we have to calculate how plausible it is (this is a probability) to identify all the cups of tea correctly under the null that she was simply guessing. How likely is one to collect the data that is observed under the scenario of the null being true?
If this probability is small, then it suggests that the null is an implausible scenario. If the null is an implausible scenario, then this implies the alternative is the plausible scenario (we say: there is evidence to suggest the alternative is true).

40 Topic: How to do a hypothesis test
Learning objectives:
Evaluating a probability.
Understand how to do a one-sided (both left and right) and two-sided test.
Be able to connect the p-values of a one-sided test with those of a two-sided test.
Be able to construct the correct test based on the summary statistics table.
Be able to do the test in Statcrunch and interpret the output.
Most tests use a t-distribution, but you should understand that a normal distribution is used when the population standard deviation is known.
You should be able to check for normality of the sample mean based on a QQ-plot of the data set and using the sampling distribution applet in Statcrunch. This will tell us whether the p-values are correct or not.

41 The underlying principle in a test
A hypothesis test always checks the validity of the null; in other words, could the numbers in front of you arise if the null were true?
In a hypothesis test we calculate the probability of observing the data under the scenario that the null is true. Does the data disprove the null?
The underlying idea of a hypothesis test is that events with small probabilities are unlikely to happen. If this probability turns out to be small, it suggests that the null assumption made in the calculation is not true and the alternative is a more logical explanation for the data.

42 The underlying principle in a test
Most statistical tests we encounter will be based on the population mean. This may seem very simple, but it will allow us to test a wide range of useful hypotheses.
Most calculations assume that the sample mean is normal. We therefore always need to check this assumption, otherwise the probability we calculate will be incorrect.
In the next few slides we will explain how to calculate these probabilities using one-sided and two-sided tests.

43 One-sided tests

44 Example 1 (one-sided test)
A person will only buy a product if they are sure the population mean rating is over 4. These are the hypotheses: $H_0: \mu \le 4$, $H_A: \mu > 4$. This is an example of a one-sided test.
Here is the data that was collected. The sample mean and sample standard deviation are 5 and 0. The sample size is 31.

45 A person will only buy a product if they are sure the population mean rating is over 4. These are the hypotheses: $H_0: \mu \le 4$, $H_A: \mu > 4$.
The sample mean and sample standard deviation are 5 and 0. The sample size is 31.
Since the sample standard deviation is 0, the sample standard error is $0/\sqrt{31} = 0$.
To understand if the null is viable, we evaluate the number of standard errors the sample mean is from the mean under the null. This is the t-transform: $t = \frac{5 - 4}{0} = \infty$.
If the null is true, we compare the above with a t-distribution with 30 df.

46 The t-value
The t-value is $t = \frac{5 - 4}{0} = \infty$.

47 The white region gives possible t-values if the null ($\mu \le 4$) were true.
Suppose $\mu = 4$; then the t-value is likely to be less than 1.697.
Suppose $\mu < 4$; then the t-value is likely to be a lot less than 1.697. For example, if the population mean is 1, then the sample mean will be close to 1, so the t-transform as defined on the previous page will be negative.

48 The t-value from the data is $t = \frac{5 - 4}{0} = \infty$.
The t-value is infinity; it is at the very right of the blue tail. The area to the right of this is called the p-value and is zero. This tells us it is impossible to obtain the sample mean 5 when the population mean is 4.
Conclusion: The null is implausible and we reject the null.
We do, however, have to be careful. This data is not normal; it is integer valued. This means the sample mean is not normal, so we have to be careful when interpreting the p-value obtained from a t-distribution.

49 Returning to red wine: Situation 3
The difference in polyphenol levels before and after the study is: 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53.
We plot the change in polyphenol levels on the line. Some observed an increase, others observed a decrease in polyphenol.
$H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)
The sample mean is $\bar{x} = 0.56$ and the sample standard deviation is $s = 1.14$.

50 Reminder: This is a one-sided test
$H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen)
This is an example of a one-sided test. A one-sided test is one where the alternative hypothesis has a greater than or less than sign. Later we consider examples of two-sided tests. The ways we approach these two tests are slightly different.

51 In order to prove the alternative, we have to calculate the likelihood of observing the sample mean under the scenario that the null is true.
The null is $\mu = 0$ and the sample mean is estimating zero (red wine exerts no influence).
Here we use the CLT. If the data comes from the normal distribution or the sample size is sufficiently large, the average will be close to normal. Thus, if the null is true, the t-ratio
$t = \frac{\text{sample mean} - 0}{\text{standard error}}$
follows a t-distribution with 8 df.
For this data set $\bar{x} = 0.56$ and the standard error is $1.14/\sqrt{9} = 1.14/3 = 0.38$, so
$t = \frac{0.56 - 0}{1.14/\sqrt{9}} = 1.47$.
This is a measure of distance between the sample mean and the population mean under the scenario that the null is true.

52 For this data set the t-ratio is $t = \frac{0.56 - 0}{1.14/\sqrt{9}} = 1.47$.
The chance of this happening when the population mean is zero is the area to the right of 1.47 under the t-distribution with 8 df. This is what the distribution of the t-value will look like if the sample mean is normal and the population mean is 0 (null is true).
The p-value = 8.9%.

53
- This tells us that there is an 8.9% chance of observing differences at least as extreme as 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53 in 9 individuals when, over the entire population of males who consume red wine, there is no mean change.
- Since 8.9% is relatively large, the null is plausible (but this does not prove the null).
- We cannot reject the null. There is no evidence in the data to back the claim that red wine consumption increases the mean polyphenol levels.
Usually 5% is used as the decision rule. If the p-value is less than 5%, we deem the chance small and reject the null at the 5% level. Since 8.9% > 5%, for this data set we cannot reject the null at the 5% level.
Warning: 5% is the proportion of times we will reject the null when it is true.
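The calculation for Situation 3 can be reproduced with a short sketch. It uses only the Python standard library and approximates the area under the t-density with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are illustrative and not from the slides. In practice a statistics package (Statcrunch, or a routine such as `scipy.stats.t.sf`) would give the tail area directly.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t: 0.5 minus the Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Changes in polyphenol level for the 9 participants (red wine, Situation 3).
changes = [0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53]
mu_null = 0.0  # population mean at the boundary of H0: mu <= 0

n = len(changes)
x_bar = mean(changes)
s = stdev(changes)                  # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)               # standard error of the sample mean
t_stat = (x_bar - mu_null) / se     # number of standard errors from the null mean
p_value = t_upper_tail(t_stat, df=n - 1)  # one-sided: area to the right

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, se = {se:.2f}")
print(f"t = {t_stat:.2f}, one-sided p-value = {p_value:.3f}")
print("reject null at 5%" if p_value < 0.05 else "cannot reject null at 5%")
```

Because the raw data are used rather than the rounded summary values, the intermediate figures can differ from the slide in the second decimal place, but the t-value still rounds to 1.47.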

54 Recap: The P-value (for one-sided test)
Definition: We want to quantify the proportion of random samples that are at least as unusual as our actual result, if the null hypothesis were true. This quantity is called the p-value.
The p-value (for a one-sided test, which this is) is the area greater than the t-value (since the alternative contains a greater than sign).
Red wine example: p-value = area greater than t-value = 8.9%.
Since 8.9% > 5% we deem this probability large. For this data set, there is no evidence in the data to suggest that regular consumption of red wine increases polyphenol levels.

55 Statcrunch
Load data into Statcrunch.
Go to Stat -> T Stat -> One Sample (a drop-down menu).
Select column (choose the data set).
Perform (choose the hypothesis).
Press compute.

56 Understanding p-values from the perspective of rejection regions and boundary of decisions
This part is only necessary to understand what statistical power means.

57 Red wine: One-sided boundary of decision
If the p-value > 5% we cannot reject the null. However, if the p-value < 5% then we reject the null. α = 5% is the boundary of the decision.
This means the area on the right (since this is a one-sided test) should be less than 5%. This corresponds to rejecting the null for any t-transform that is larger than 1.86.
Remember that the t-transform = number of standard errors the sample mean is from the population mean under the null. If the null is true the t-transform is small, and 1.86 is considered too large (based on the 5% decision rule).

58 1.86 standard errors from the null corresponds to the sample mean $\bar{X} = 0 + 1.86 \times 0.38 = 0.71$.
Therefore t-values greater than 1.86 correspond to sample means greater than 0.71.
If the null is true, sample means less than the blue bar are plausible (at the 5% level), and we reject the null if the sample mean is greater than 0.71 (over the blue bar).

59 Summary
We reject the null if the sample mean is greater than $1.86 \times 0.38 = 0.71$.
We do not reject the null if the sample mean is less than 0.71.
In this example, since the sample mean 0.56 is less than 0.71, we cannot reject the null (it is on the left of the blue bar).
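The boundary of decision is simple arithmetic and can be sketched as follows. The variable names are illustrative; the critical value 1.86 (8 df, 5% one-sided) and the standard error 0.38 are the figures quoted on the slides.

```python
# Values quoted on the slides for the red wine example (Situation 3).
t_critical = 1.86   # 5% one-sided critical value for a t-distribution with 8 df
se = 0.38           # standard error, 1.14 / sqrt(9)
mu_null = 0.0       # population mean under the null
x_bar = 0.56        # observed sample mean

# Sample means above this boundary lead to rejecting the null at the 5% level.
boundary = mu_null + t_critical * se
reject_null = x_bar > boundary

print(f"boundary of decision = {boundary:.2f}")
print(f"sample mean {x_bar} exceeds boundary: {reject_null}")
```

Comparing the sample mean with the boundary gives the same decision as comparing the t-value 1.47 with 1.86, or the p-value 8.9% with 5%.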

60 Red wine: All 4 situations
Here the data is plotted for each of the situations. The green line is the sample mean. Focus on the spread of each data set.

61 Matching p-values to the red wine examples (Example 5)

62 For each situation, superimpose a bell-shaped curve centered at zero (see the next slide). Focus only on the right-hand side of zero, because we are only looking for evidence of the red wine causing an increase in polyphenol levels.

64 Recall the hypothesis is $H_0: \mu \le 0$ (mean levels of polyphenols have stayed the same or reduced) vs $H_A: \mu > 0$ (mean levels of polyphenols have risen).
[Figure: the four situations, labeled in order: cannot reject null; reject null; cannot reject null; cannot reject null.]

65 We test $H_0: \mu \le 0$ vs $H_A: \mu > 0$. For each case we assess the likelihood of observing the data under the null hypothesis $\mu \le 0$. We see:
Situation 1: p-value = 98%. It is highly likely to see data like this if red wine did not increase polyphenol levels. Conclusion: cannot reject the null.
Situation 2: p-value < 0.01%. It is highly unlikely to see data like this if red wine did not increase polyphenol levels. Conclusion: reject the null, strong evidence for the alternative.
Situation 3: p-value = 8.9%. Data like this can be seen when $\mu \le 0$. Conclusion: cannot reject the null.
Situation 4: p-value = 7.67%. Data like this can be seen when $\mu \le 0$. Conclusion: cannot reject the null.
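The 5% decision rule applied to the four situations can be written out as a small sketch. The p-values are those quoted above; Situation 2 is reported only as "less than 0.01%", so the value 0.0001 used below is an upper bound standing in for it, not the exact p-value.

```python
alpha = 0.05  # 5% significance level

# p-values quoted for the four situations (Situation 2 is an upper bound).
p_values = {
    "Situation 1": 0.98,
    "Situation 2": 0.0001,
    "Situation 3": 0.089,
    "Situation 4": 0.0767,
}

# Reject the null only when the p-value falls below the significance level.
decisions = {
    name: "reject null" if p < alpha else "cannot reject null"
    for name, p in p_values.items()
}

for name, decision in decisions.items():
    print(f"{name}: p-value = {p_values[name]:.4f} -> {decision}")
```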

66 Example: Buying a product
- I only buy products if I can be certain that their population mean reviews are over 4.0.
$H_0: \mu \le 4.0$ vs $H_A: \mu > 4.0$ (I'll buy)

67 [Table: summary statistics for the tracker; values not recovered.]
The sample mean 4.25 is 2.4 standard errors to the right of the null: $t = \frac{4.25 - 4.0}{\text{s.e.}} = 2.4$.
The p-value is the area to the right of 2.4, which is 0.8%. Since 0.8% < 5% we reject the null. If the reviews are representative of all the people who bought the product, then there is evidence from the sample that the population mean review is greater than 4.

68 Question Time
Below are the summary statistics for the Coffea watch [table not recovered]. Suppose we will only buy the watch if we can be sure the mean review is over 4.0. What is the hypothesis of interest, the t-value, the p-value and the conclusion of the test (use the 5% level)? Use the critical values of the t-distribution with 260 df [table not recovered].
(A) $H_0: \mu \le 4.0$, $H_A: \mu > 4.0$. The t-value is 2.2 and the p-value is between 1% and 2.5%. We reject the null and I buy the product.
(B) $H_0: \mu \le 4.0$, $H_A: \mu > 4.0$. The t-value is 0.13 and the p-value is greater than 30%. I will not buy the product.

69 Question Time
Entomologists want to understand the number of chirps a minute a cricket makes. They conjecture that it is less than 17 chirps per minute. They collect the data on 15 crickets. The data is summarized above [table not recovered]. What is the hypothesis of interest, the t-value, the p-value and the result of the test (use a t-distribution with 14 df)?
(A) $H_0: \mu \ge 16.6$, $H_A: \mu < 16.6$. The t-value is [not recovered]. The p-value is more than 15%; we cannot reject the null. We cannot say the population mean is less than 17.
(B) $H_0: \mu \ge 17$, $H_A: \mu < 17$. The t-value is [not recovered]. The p-value is more than 15%; we cannot reject the null. We cannot say the population mean is less than 17.
(C) $H_0: \mu \le 17$, $H_A: \mu > 17$. The t-value is [not recovered]. The p-value is more than 50%; we cannot reject the null. We cannot say the population mean is less than 17.

70 Two-sided tests

71 Example: Tomatoes 1
You are in charge of quality control in your food company. You randomly sample fourteen packs of cherry tomatoes, each labeled 224 grams. The average weight from your fourteen boxes is $\bar{x} = 226.1$g.
Obviously, we cannot expect boxes filled with whole tomatoes to all weigh exactly 224 grams. Is the somewhat larger sample mean simply due to chance variation? Or is it evidence that the machine that sorts the cherry tomatoes into packages needs to be recalibrated?
The hypothesis:
$H_0: \mu = 224$g ($\mu$ is equal to the value claimed by the produce company)
$H_A: \mu \neq 224$g ($\mu$ is either larger or smaller than the value claimed)

72 This is a two-sided test
$H_0: \mu = 224$ vs $H_A: \mu \neq 224$
This is a two-sided test because there is a not-equal sign in the alternative hypothesis.

73 $H_0: \mu = 224$ vs $H_A: \mu \neq 224$
This is the data superimposed with a normal distribution centered about the mean 224g. We see the data seems to be slightly shifted to the right. We will test if this is statistically significant.

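74 $H_0: \mu = 224$ vs $H_A: \mu \neq 224$
After collecting the data, the basic prescription is to make a z/t-transform:
$t = \frac{\bar{X} - \mu}{\text{s.e.}}$, where $\mu$ is the mean under the null.
For this data set, $t = \frac{226.1 - 224}{\text{s.e.}} = 2.07$.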

75 How unusual is this data, assuming the machine is properly calibrated (null is true)?
We calculated that the sample mean is t = 2.07 standard errors from the mean under the null. The area to the right of 2.07 is 2.9%.
Samples that come from a properly calibrated machine and are at least as unusual as this have a t-value that is either greater than 2.07 or less than -2.07.
The chance of this is the area to the right of 2.07 plus the area to the left of -2.07, which is $2 \times 2.9\% = 5.8\%$.

76 Definition: The P-value (for two-sided test)
Definition: We want to quantify the proportion of random samples that are at least as unusual as our actual result, if the null hypothesis were true. This quantity is called the p-value.
The p-value (for a two-sided test, which this is) is 2 × the smaller tail area.
Tomato example: p-value = 2 × smaller tail area = $2 \times 2.9\% = 5.8\%$.
Since 5.8% > 5% we deem this probability large. For this data set, we cannot reject the null. We will not investigate the tomato packing machine. There is always the possibility that the conclusion is incorrect.
Further reading: http://onlinestatbook.com/2/tests_of_means/single_mean.html

77 Always try to match the calculation you have made with the Statcrunch output for the same problem.
We calculated t = 2.07. Using a t-distribution with 13 df, the p-value is between 5% and 10% using tables, or exactly 5.8% with a computer. We cannot reject the null.
Statcrunch gives the same result. It is important to map the calculation to the Statcrunch output.

78 Example: Tomatoes 2
You are (again) in charge of quality control in your food company. You randomly sample fourteen packs of cherry tomatoes, each labeled 224 grams. The average weight from your fourteen boxes is 221.7g.
Obviously, we cannot expect boxes filled with whole tomatoes to all weigh exactly 224 grams. Is the somewhat smaller weight simply due to chance variation? Or is it evidence that the machine that sorts the cherry tomatoes into packages needs to be recalibrated?
The hypothesis:
$H_0: \mu = 224$g ($\mu$ is equal to the value claimed by the produce company)
$H_A: \mu \neq 224$g ($\mu$ is either larger or smaller than the value claimed)

79 The tomato machine data (2) Next we observe a new data set. [Figure: data and summary statistics] The sample mean is 221.7g (average of 14 boxes).

80 H 0 : µ = 224 vs H A : µ ≠ 224 This is the data superimposed with a normal distribution centered about the mean 224g. We see the data seems to be shifted to the left. We will test if this is statistically significant.

81 The basic prescription H 0 : µ = 224 vs H A : µ ≠ 224 After collecting the data, the basic prescription is to make a z/t-transform: t = (X̄ - µ)/s.e., where µ is the mean under the null and s.e. is the standard error of the sample mean. From the summary statistics: t = -9.3

82 How unusual is this data, assuming the machine is properly calibrated (null is true)? We calculated that the sample mean is t = -9.3 standard errors from the mean under the null. The area to the left of -9.3 is almost 0%. Samples that are properly calibrated and are at least as unusual as this have a t-value that is either greater than 9.3 or less than -9.3. The chance of this is the area to the right of 9.3 plus the area to the left of -9.3, which is approximately 2 × 0% = 0%.

83 Since 0% < 5% we deem this probability very small. It is very, very hard to get this type of data under the scenario that the machine is properly calibrated and working. Thus there is strong evidence that the tomato packing machine is not packing correctly and the machine will have to be recalibrated.

84 Connecting two-sided tests and confidence intervals The results of a test at a certain significance level and confidence intervals are closely related. We use the two tomato examples to illustrate the connection. We recall the hypothesis is H 0 : µ = 224 vs H A : µ ≠ 224 Tomato 1: From the summary statistics, the 95% confidence interval for the mean is [226.1 ± 2.2] = [223.9, 228.3]

85 H 0 : µ = 224 vs H A : µ ≠ 224 Tomato 1: The 95% confidence interval for the mean is [226.1 ± 2.2] = [223.9, 228.3]. The confidence interval gives plausible values for the mean. Since 224 lies inside the interval, 224 is a plausible mean. We cannot discount the null hypothesis. If the mean under the null is inside the 95% confidence interval, then for a two-sided test the p-value is greater than 5% and we cannot reject the null. Similarly, if the mean under the null is inside a 99% confidence interval for the mean, then the p-value for a two-sided test is greater than 1%.

86 H 0 : µ = 224 vs H A : µ ≠ 224 Tomato 2: Based on the summary statistics, the 95% confidence interval for the mean is approximately 221.8 ± 0.7, that is [221.13, 222.5]. The interval tells us where the population mean is likely to lie. 224g is not in this interval. This suggests that 224g is not a plausible mean. Since 224g is not in the 95% confidence interval for the mean, the p-value for the two-sided test is less than 5%. If 224g is not in the 99% confidence interval for the mean, the p-value for the two-sided test will be less than 1%.
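The link between the confidence interval and the two-sided test can be written as a small check. This is a sketch only: it takes the two 95% intervals quoted on the slides as given, and the function name null_in_interval is a hypothetical helper.

```python
def null_in_interval(mu0, lower, upper):
    """True if the mean under the null lies inside the confidence interval."""
    return lower <= mu0 <= upper

# 95% confidence intervals quoted on the slides for the two tomato samples
intervals = {"Tomato 1": (223.9, 228.3), "Tomato 2": (221.13, 222.5)}
mu0 = 224.0  # mean under the null

for name, (lower, upper) in intervals.items():
    if null_in_interval(mu0, lower, upper):
        verdict = "two-sided p-value > 5%, cannot reject H0"
    else:
        verdict = "two-sided p-value < 5%, reject H0"
    print(f"{name}: [{lower}, {upper}] -> {verdict}")
```

The same check with a 99% interval corresponds to comparing the two-sided p-value with 1%.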

87 The above arguments do not hold for one-sided tests. The relationship between one-sided tests and confidence intervals is more complicated and will not be covered in this class.

88 Question Time The Windchill factor in a certain area is measured over a period of 216 days. The summary statistics and the critical values for the t-distribution with 215 degrees of freedom are given below.

89 Linking the different sided tests We recall that for a given data set and population mean we can do three different tests. However, the results of all the tests are closely related. Situation 1: The results for H 0 : μ ≤ 0 against H A : μ > 0 are: [Table: test output] Suppose we want to test the hypothesis that red wine decreases polyphenol levels. Then our hypothesis of interest is H 0 : μ ≥ 0 against H A : μ < 0. The p-value for this test can easily be deduced from the above table. The t-value is the same. The p-value is different. The p-value is the area to the LEFT of -2.45, which is 2%.

90 Testing H 0 : μ ≥ 0 against H A : μ < 0: since the p-value 2% < 5%, there is some evidence based on this data set that red wine decreases polyphenol levels. If we test H 0 : μ = 0 against H A : μ ≠ 0, the p-value is 2 × 2% = 4% and there is evidence to suggest the mean is not zero.
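Since all three tests share the same t-value, their p-values follow from a single tail area. A minimal Python sketch, assuming the 2% area to the left of t = -2.45 quoted above; the function name p_values_from_left_area is hypothetical.

```python
def p_values_from_left_area(left_area):
    """p-values of the three tests, given the area to the LEFT of the t-value."""
    right_area = 1 - left_area
    return {
        "H_A: mu < mu0 (pointing left)": left_area,
        "H_A: mu > mu0 (pointing right)": right_area,
        "H_A: mu != mu0 (two-sided)": 2 * min(left_area, right_area),
    }

p = p_values_from_left_area(0.02)  # area to the left of t = -2.45
for alternative, value in p.items():
    print(f"{alternative}: p-value = {value:.0%}")
```

The left-pointing and two-sided tests give 2% and 4%, as on the slide, while the right-pointing test gives 98%, so the same data offer no evidence that the mean is greater than zero.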

91 Question Time Experts conjecture that the waiting time between eruptions of Old Faithful is more than 68 minutes. What is the hypothesis of interest and the p-value (using the above output)? (A) H 0 : µ ≤ 68, H A : µ > 68; the p-value is 0.05%. Reject the null. (B) H 0 : µ ≤ 70.9, H A : µ > 70.9; the p-value is 0.1%. Reject the null. (C) H 0 : µ ≤ 68, H A : µ > 68; the p-value is 0.025%. Reject the null.

92 Question Time Experts conjecture that the waiting time between eruptions of Old Faithful is less than 68 minutes. What is the hypothesis of interest and the p-value (using the above output)? (A) H 0 : µ ≥ 68, H A : µ < 68; the p-value is 99.95%. Reject the null. (B) H 0 : µ ≥ 70.9, H A : µ < 70.9; the p-value is 0.05%. Cannot reject the null. (C) H 0 : µ ≥ 68, H A : µ < 68; the p-value is 99.75%. Cannot reject the null.

93 Question Time (one-sided) Let µ denote the (population) mean level of glucose in an expectant mother. If µ > 140 gestational diabetes is diagnosed. The hypothesis we want to test is H 0 : µ ≤ 140 against H A : µ > 140. 6 blood samples are taken. The results are summarized above. What is the result of the test at the 5% level (use the t-distribution with 5 df)? (A) The t-value is 1.64 and the p-value is between 5-10%. We can reject the null and diagnose diabetes. (B) The t-value is 1.64 and the p-value is between 5-10%. We cannot reject the null. The data does not suggest she has gestational diabetes; she could have obtained a sample mean of 142 even if she were well.

94 When to use the normal distribution instead of a t-distribution in a statistical test

95 Example: Using the normal distribution: Low Potassium Hypokalemia is diagnosed when the blood potassium level is below 3.5mEq/dl. The potassium in a blood sample varies from sample to sample and follows a normal distribution with unknown mean. However, several years of data mean that the standard deviation (the variation between samples) is known to be 0.2. Since the standard deviation is known and not estimated from a sample we use a normal distribution instead of a t-distribution (look back at Chapter 6). As we are looking for evidence of low potassium, the hypothesis of interest is H 0 : μ ≥ 3.5 against H A : μ < 3.5. This is a one-sided test.

96 Example: Using the normal distribution: Low Potassium We test H 0 : μ ≥ 3.5 against H A : μ < 3.5. A patient has 9 blood samples taken, and their sample mean is 3.4. Is there evidence to suggest low potassium (use the 5% significance level)? The standard error is 0.2/√9 = 0.067. Below we plot the distribution of the sample mean if the null were true. Left: Distribution of the sample mean under the null. The p-value is in red. The p-value is about 6.7%.

97 To calculate the p-value using the z-tables, we make the z-transform, which is identical to a t-transform. We simply use different tables to get the p-values: z = (3.4 - 3.5)/s.e. = -0.1/(0.2/√9) = -1.5. Looking up the z-tables (remember the standard deviation is known) gives the p-value 6.68%. As this is greater than 5% we cannot reject the null. Despite the person having a sample mean below 3.5, such a sample can be collected when their true mean is 3.5. Thus there is not enough evidence that the person has low potassium. Consequence: We do not subject the person to more medical checks.
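The z-table lookup can be checked with Python's standard library. This is a minimal sketch, assuming the numbers of the low potassium example (sample mean 3.4, known σ = 0.2, n = 9); the function name z_test and its alternative argument are hypothetical, not part of any course software.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alternative):
    """z-transform and p-value when the population standard deviation is known.
    alternative is 'less', 'greater' or 'two-sided'."""
    se = sigma / sqrt(n)                 # standard error of the sample mean
    z = (sample_mean - mu0) / se
    left = NormalDist().cdf(z)           # area to the left of z
    if alternative == "less":
        p = left
    elif alternative == "greater":
        p = 1 - left
    else:
        p = 2 * min(left, 1 - left)
    return z, p

z, p = z_test(3.4, 3.5, sigma=0.2, n=9, alternative="less")
print(f"z = {z:.2f}, p-value = {p:.2%}")
print("reject H0" if p <= 0.05 else "cannot reject H0 at the 5% level")
```

Calling z_test(145, 140, sigma=4, n=4, alternative="greater") handles a right-pointing alternative in the same way.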

98 Example: Gestational diabetes A patient has gestational diabetes if the mean glucose level of the patient is over 140. We are looking for evidence of gestational diabetes. The test is H 0 : μ ≤ 140 against H A : μ > 140. μ is never known. All we have are the results from a few blood samples. However, it is known that the amount of glucose in blood is normally distributed with known standard deviation σ=4. A patient goes to the doctor. We do not know if she has gestational diabetes (μ is unknown). The glucose level in her blood samples is assumed to be normally distributed with σ=4. After taking 4 blood samples her sample mean is 145. Is there evidence that she has gestational diabetes?

99 Example: Gestational diabetes We want to test H 0 : μ ≤ 140 against the alternative H A : μ > 140. Based on the data, can we disprove that she is healthy? To do this we need to know the variability in the sample mean; this is quantified by the standard error = 4/√4 = 2. Next we have to calculate how far her sample mean is from the mean if she were healthy: z-transform = (145 - 140)/2 = 2.5 (we call it a z-transform rather than a t-transform because we know the standard deviation). Since the alternative is pointing to the right, we need to calculate the probability to the right of 2.5. From the z-tables this is 0.6%. 0.6% is quite small. It says the chance of getting a sample mean of 145 or higher, when the patient does not have gestational diabetes, is 6 in 1000. Since 0.6% < 5% (it is very small), we reject the null. There is strong evidence from her blood samples that she has gestational diabetes.

100 Question Time Low potassium is diagnosed if the mean level in a person is less than 3.5. The standard deviation of a given blood sample is known to be 0.3 (this means use a normal distribution). The hypothesis of interest is H 0 : µ ≥ 3.5 against H A : µ < 3.5. A person has 4 blood samples taken. The sample mean is 3.0. Is there any evidence they have low potassium (use the 5% level)? (A) The z-value is z = -3.33. The p-value is 0.04%; this is so small, there is strong evidence to suggest they have low potassium (reject null). (B) The z-value is z = -3.33. The p-value is 0.04%. This is so small, we cannot reject the null. (C) The z-value is z = -1.67. The p-value is 4.7%. There is some evidence to reject the null and determine they have low potassium.

101 Choice of level

102 Deciding the conclusion with α A very small P-value indicates that results like ours would rarely occur if the null hypothesis were true, and therefore H 0 is implausible. It should be rejected. In this case we say the evidence is significant. The smaller the P-value the stronger the evidence against H 0. The significance level α is the largest P-value for which we are willing to reject the null hypothesis. The value of α is decided before conducting the test. If the P-value is equal to or less than α then we reject H 0. This is when we accept H A as the truth. If the P-value is greater than α then we fail to reject H 0. Whatever evidence there is, it is not sufficient to accept H A. Typically we set α=5%.

103 Comments on the decision rule The objective of a test is to make a decision between the plausibility of two competing hypotheses. The p-value is the probability of observing data at least as unusual as ours under the assumption that the null hypothesis is true. If the p-value is less than the significance level (often set at 5%), the decision is to reject the null and go for the alternative instead. If the p-value is greater than 5% then the data is consistent with the null being true and we cannot reject the null. The point is there is a chance we made the wrong decision. We could have wrongly rejected the null when actually the null is true. The chance of this happening is the significance level. In other words, if we set the significance level at 5%, then whenever the null is actually true there is a 5% chance that we wrongly reject it.

104 The value at which we set the significance level determines how willing we are to wrongly reject the null hypothesis. Example: Suppose we are in a tomato packing plant. Our aim is to ensure that the mean weight of a tomato box is 227g. Every few hours we randomly sample 14 boxes of tomatoes and do a hypothesis test. Each test is done at the 5% level. If we do the test 100 times and the null hypothesis is true, then on average we would falsely reject the null 5 times. Each time we falsely reject the null, it is called a type I error or, in medical terms, a false positive. Suppose we reduce the significance level to 1%; in this case, if the null were true, we would falsely reject the null 1 time out of a hundred.
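The false positive rate described above can be illustrated by simulation. The sketch below makes several assumptions that are not in the slides: box weights are normally distributed, the standard deviation is 4g (a hypothetical value), and the random seed and number of repetitions are arbitrary. The critical value 2.160 is the two-sided 5% point of the t-distribution with 13 degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)                      # arbitrary seed, for reproducibility
MU0, SIGMA, N = 227.0, 4.0, 14      # SIGMA is a hypothetical value
T_CRIT = 2.160                      # two-sided 5% critical value, 13 df

def rejects_null():
    """Sample 14 boxes from a correctly calibrated machine and test at 5%."""
    boxes = [random.gauss(MU0, SIGMA) for _ in range(N)]
    t = (mean(boxes) - MU0) / (stdev(boxes) / sqrt(N))
    return abs(t) > T_CRIT

trials = 20000
false_rate = sum(rejects_null() for _ in range(trials)) / trials
print(f"proportion of false rejections: {false_rate:.3f}")
```

Every rejection in this simulation is a type I error, because the null is true by construction; the proportion should be close to the 5% significance level.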

105 We will show in Chapter 8 that by increasing the significance level (from, say, 5% to 10%) we increase the number of false positives, but we are more likely to detect the alternative (if it is true). Decreasing the significance level will have the opposite effect. The p-value measures the level of evidence against the null. The smaller the p-value, the more evidence against it.
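The trade-off can be made concrete for a one-sided z-test. This is only a sketch under assumptions not stated in the slides: the standard deviation is known, the alternative points to the right, and the true mean sits 2 standard errors above the mean under the null (a hypothetical shift). The function name power_one_sided is also hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_one_sided(alpha, shift_in_se):
    """Chance of rejecting H0 in a right-pointing z-test when the true mean
    is shift_in_se standard errors above the mean under the null."""
    z_crit = Z.inv_cdf(1 - alpha)        # reject when z > z_crit
    return 1 - Z.cdf(z_crit - shift_in_se)

powers = {alpha: power_one_sided(alpha, 2.0) for alpha in (0.01, 0.05, 0.10)}
for alpha, power in powers.items():
    print(f"significance level {alpha:.0%}: "
          f"false positive rate {alpha:.0%}, chance of detecting H_A {power:.0%}")
```

With a shift of zero the function returns the significance level itself, which is the false positive rate.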

106 The Significance level How to choose the significance level? There is a trade-off between not wanting to falsely reject the null and wanting to detect the alternative. The lower the significance level, the less likely we are to falsely reject the null, but this makes detecting the alternative much harder! Example: Consider the court case H 0 : Innocent, H A : Guilty. The p-value is the probability of observing evidence at least this strong given the null is actually true. Suppose we set the significance level at 5%. Then a person is determined guilty if the p-value is less than 5%. This means 5% of all innocent people who are put on trial will be determined guilty. This is too much! To avoid convicting such a large proportion of innocent people we need to reduce the significance level.

107 If the significance level is put to zero, this means that no one who is innocent is put into jail. However, it also means that all guilty people go free. In other words, no amount of evidence is enough to convict a person. What significance level seems reasonable in this case? 0.01%? The choice of significance level depends on the application. 5% is reasonable for a tomato packing plant (we can afford to check a machine several times), but too large for a conviction.

108 Checking reliability of the p-value

109 How reliable are these p-values? Remember, to calculate the p-values we have used the normal or t-distribution (depending on whether the population standard deviation is known or not). Underlying these calculations is the assumption that the sample mean is normally distributed (remember we always make a plot of the normal distribution and center it about the mean under the null). If the sample size is not large enough, the central limit theorem will not have 'kicked in'. Then the sample mean won't be normally distributed. This means the probabilities we have calculated won't be reliable, just as the 95% CI for the mean won't really be a 95% confidence interval. In this case we must be cautious in interpreting the results of the test.

110 Nevertheless: If the p-value is extremely small, it would be small even if the correct distribution of the sample mean were used. On the other hand, if the p-value is close to the 5% significance level we need to be careful about its statistical significance (since the correct distribution may mean the true p-value is greater than 5%).

111 Example: Siblings The university is interested in the (population) mean number of younger siblings a student at the university has (in the hope that they will attend the university). They believe that the mean is greater than 0.25. To test this hypothesis, H 0 : μ ≤ 0.25 against H A : μ > 0.25, they randomly sample 3 students and ask them how many siblings they have; they answer 0, 1, 3. The sample mean is 1.33 and the sample standard deviation is 1.53. Question: What are the conclusions of the test at the 10% level? Comment on the reliability of the result. Answer: The t-transform is t = (1.33 - 0.25)/(1.53/√3) = 1.22. Using the t-tables (with 2 df) we see the area to the right of this lies somewhere between 15% and 20%. Since the alternative hypothesis is pointing RIGHT this means the p-value is between 15% and 20%, which is greater than 10%, so we cannot reject the null. Now we comment on the reliability of this p-value. In HW9, Q1 we made a plot of the sample mean (based on size 3) for younger sibling numbers.
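The siblings calculation can be reproduced in a few lines of Python. This is a sketch: the data 0, 1, 3 and the null value 0.25 come from the slide, while the helper name t_cdf_2df is hypothetical. It uses the closed-form CDF of the t-distribution with 2 degrees of freedom, F(x) = 1/2 + x / (2√(x² + 2)), in place of the t-tables.

```python
from math import sqrt
from statistics import mean, stdev

siblings = [0, 1, 3]
mu0 = 0.25                      # mean under the null

n = len(siblings)
xbar = mean(siblings)           # sample mean
s = stdev(siblings)             # sample standard deviation (divides by n - 1)
t = (xbar - mu0) / (s / sqrt(n))

def t_cdf_2df(x):
    """Closed-form CDF of the t-distribution with 2 degrees of freedom."""
    return 0.5 + x / (2 * sqrt(x * x + 2))

p_right = 1 - t_cdf_2df(t)      # alternative points RIGHT
print(f"mean = {xbar:.2f}, sd = {s:.2f}, t = {t:.2f}, p-value = {p_right:.1%}")
print("reject H0" if p_right <= 0.10 else "cannot reject H0 at the 10% level")
```

The t-value from unrounded inputs differs slightly in the second decimal from the hand calculation with 1.33 and 1.53, but the p-value still falls between 15% and 20%.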

112 The distribution of the sample mean is the lowest plot on the left; this is clearly not normal (see also the corresponding QQ-plot). This means that the p-value is not correct: it is based on normality when the sample mean is not normal. This means we have to be very careful when we interpret this p-value. We recall that if the sample size is larger (in Q2, Quiz 9 we looked at sample size n = 150), then the sample mean is close to normal and the corresponding p-value will be closer to the truth (as if it came from the true distribution of the sample mean).
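The effect of sample size on the shape of the sample mean can be seen in a small simulation. Everything about the population here is an assumption made for illustration: the sibling counts and their probabilities are hypothetical, as are the seed and the number of repetitions. Skewness is used as a rough measure of departure from normality (a normal distribution has skewness 0).

```python
import random
from statistics import mean, pstdev

random.seed(1)                                # arbitrary seed
VALUES = [0, 1, 2, 3, 4]                      # hypothetical sibling counts
WEIGHTS = [0.55, 0.25, 0.12, 0.05, 0.03]      # hypothetical probabilities

def sample_mean(n):
    """Mean of n sibling counts drawn from the assumed population."""
    return mean(random.choices(VALUES, weights=WEIGHTS, k=n))

def skewness(xs):
    """Average cubed standardized deviation; 0 for a symmetric distribution."""
    m, s = mean(xs), pstdev(xs)
    return mean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (3, 150):
    means = [sample_mean(n) for _ in range(2000)]
    results[n] = skewness(means)
    print(f"n = {n:3d}: skewness of the sample mean = {results[n]:.2f}")
```

Under these assumptions the sample mean for n = 3 stays clearly right-skewed, while for n = 150 the skewness is close to zero, which is why the p-value is more trustworthy for the larger sample.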

113 Lab practice Our aim is to make inference about the mean weight of a newborn calf based on the sample mean of 44 calves. We first make a histogram of the data, to see if there are any major deviations from normality.

114 The distribution of weights at birth does not have an obvious skew or thick tail. This means that the distribution of the sample mean based on a sample of 44 will be very close to normal. So we can rest assured that using the t-distribution (since the standard deviation is unknown) will be reliable. Question: Based on the data, is there evidence to suggest the mean weight of calves is greater than 90 pounds? We test H 0 : μ ≤ 90 against H A : μ > 90.

115 We deduce the p-value in Statcrunch. The p-value is 0.44%. Since 0.44% < 1%, we reject the null at the 1% level (and accept the alternative). This means there is strong evidence in the data to suggest the mean weight of calves (of that breed) is greater than 90 pounds.

116 Connecting confidence intervals and statistical tests. This is old material and will not be tested or covered.

117 Two-sided tests and confidence intervals There is a close connection between confidence intervals and two-sided tests. Let us return to the one-bedroom apartment in Dallas example. 10 apartments are randomly sampled. The sample mean and the sample standard deviation based on this sample are 980 dollars and 250 dollars (both are estimators based on a sample of size ten). The 95% confidence interval for the mean is [980 ± 178.7] = [801, 1159]. Suppose we want to know whether the price of apartments has changed since last year, when the mean price was 850 dollars. Based on this interval we see that 850 dollars is contained in this interval. This means the mean could be 850 dollars. Therefore, given the sample, it is unclear whether the mean price of apartments is the same as last year or not. We can rewrite the above as a statistical test H 0 : μ = 850 against H A : μ ≠ 850. The t-transform is t = (980 - 850)/79 = 1.64. Looking at the t-distribution, we see that 1.64 < 2.262 (this is the t-value corresponding to 9 df at 2.5%). Therefore, the p-value is greater than 5%. Thus we cannot reject the null at the 5% level. Further reading: http://onlinestatbook.com/2/logic_of_hypothesis_testing/sign_conf.html

118 Summarizing these two observations we see that: 850 lies inside the 95% confidence interval [801, 1159]. We are unable to reject the null at the 5% level. If the mean under the null lies in the 95% confidence interval, then this implies the corresponding p-value will be greater than 5%. On the other hand, if the mean under the null does not lie in the 95% confidence interval its p-value will be less than 5%. This is easily seen with an illustration (see later slides). If 850 is in an interval centered about 980 (where each side has length 178.7), then 980 must be in the interval centered about 850 with sides of length 178.7. A few slides earlier we showed that this interval [850 ± 178.7] = [671, 1028] corresponds to the points where we make a decision to reject the null or not at the 5% level. In general, if the mean under the null lies in a (1-α) × 100% confidence interval, then the p-value for a two-sided test will be greater than α.
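The equivalence between the interval and the test can be checked numerically for the Dallas apartment example. A minimal Python sketch, using the slide's summary statistics (n = 10, mean 980, standard deviation 250) and the tabulated critical value 2.262 for 9 degrees of freedom; the variable names are illustrative only.

```python
from math import sqrt

n, xbar, s, mu0 = 10, 980.0, 250.0, 850.0
T_CRIT_9DF = 2.262                    # t-value for 9 df with 2.5% in each tail

se = s / sqrt(n)                      # standard error, about 79
t = (xbar - mu0) / se                 # t-transform
margin = T_CRIT_9DF * se              # half-width of the 95% interval
ci = (xbar - margin, xbar + margin)

null_inside_ci = ci[0] <= mu0 <= ci[1]
fail_to_reject = abs(t) < T_CRIT_9DF  # two-sided test at the 5% level

print(f"s.e. = {se:.1f}, t = {t:.2f}, 95% CI = [{ci[0]:.0f}, {ci[1]:.0f}]")
print(f"850 inside CI: {null_inside_ci}; cannot reject H0: {fail_to_reject}")
```

Both conditions compare the distance |980 - 850| with the same quantity, 2.262 × s.e., which is why they always agree.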

119 Confidence intervals and one-sided tests Consider the polyphenol and red wine example considered in an earlier chapter. 15 randomly sampled men were asked to drink red wine every day for two weeks. Their change in polyphenol levels was measured: 0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4, 3.2, 0.8, 4.3, -0.2, -0.6, 7.5. The average change is 4.3 and the sample standard deviation is about 3.07. Review: Two-sided tests and confidence intervals The 95% confidence interval for the change in polyphenol levels is [2.6, 5.99]. This means if I am testing the hypothesis H 0 : μ = 0 against the alternative H A : μ ≠ 0, since 0 is not in the interval the p-value is less than (100 - 95)% = 5%. The 99% confidence interval for the change in polyphenol levels is [1.94, 6.66]. This means if I am testing the hypothesis H 0 : μ = 0 against the alternative H A : μ ≠ 0, since 0 is not in the interval the p-value is less than (100 - 99)% = 1%.
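The two confidence intervals can be recomputed from the listed measurements. This Python sketch assumes the entry printed as "7,0" is 7.0 and uses the tabulated critical values of the t-distribution with 14 degrees of freedom (2.145 for 95%, 2.977 for 99%); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Changes in polyphenol levels for the 15 men ("7,0" read as 7.0)
changes = [0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4,
           3.2, 0.8, 4.3, -0.2, -0.6, 7.5]
T_CRIT = {95: 2.145, 99: 2.977}       # critical values for 14 df

xbar = mean(changes)
se = stdev(changes) / sqrt(len(changes))

intervals = {}
for level, crit in T_CRIT.items():
    intervals[level] = (xbar - crit * se, xbar + crit * se)
    lower, upper = intervals[level]
    outside = not (lower <= 0 <= upper)
    print(f"{level}% CI: [{lower:.2f}, {upper:.2f}]; "
          f"0 outside interval: {outside}")
```

Zero lies outside both intervals, so the two-sided p-value is below 1% as well as below 5%.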

120 One-sided test (pointing RIGHT) Suppose we are testing that polyphenol levels increase. This means testing the hypothesis H 0 : μ ≤ 0 against the alternative H A : μ > 0. The p-value is the area to the right of the t-transform of the sample mean 4.3 (see that the alternative is pointing to the right). From above we have deduced that in the two-sided test the p-value is less than 5%, so for the one-sided test the p-value is less than 2.5%. Why? Recall the p-value for two-sided tests is the smallest area to the left/right of the t-transform times 2. In this case it is the area to the right times 2. For the two-sided test we have deduced that the p-value is less than 5%; this implies that the area to the RIGHT is less than 5/2 = 2.5%. The p-value for the one-sided test pointing to the RIGHT is exactly this area to the right. Thus the p-value for the one-sided test pointing to the RIGHT is less than 2.5%.

121 One-sided test (pointing LEFT) Suppose we are testing that polyphenol levels decrease. This means testing the hypothesis H 0 : μ ≥ 0 against the alternative H A : μ < 0. Since 0 lies below the 95% confidence interval, the p-value is greater than 97.5% (there is no evidence to reject the null, which is clear since the sample mean 4.3 lies within the null hypothesis). Why? On the previous slide we showed that the p-value for the hypothesis pointing to the RIGHT is less than 2.5%, that is, the area to the RIGHT of the t-transform is less than 2.5%. The p-value for the test pointing to the LEFT is the area to the LEFT of the t-transform, which has to be greater than 97.5% (since the area to the left plus the area to the right is 100%). But this is obvious. The point of a test is to see how plausible the data is under the null. If the sample mean is 4.3 and the null is that the true mean is greater than or equal to 0, this is highly plausible! If this is highly plausible we cannot reject the null.
