Data Analysis and Statistical Methods
Statistics 651
http://www.stat.tamu.edu/~suhasini/teaching.html
Suhasini Subba Rao

Review

In the previous lecture we considered the following tests:

- The independent sample t-test. This is when we have two completely independent samples drawn from two populations, and we want to compare their population means. If the sample sizes are relatively large and there are few outliers, we can test equality of the means (or one-sided versions) using the independent sample t-test.
- The Wilcoxon rank sum test. This is when we have two completely independent samples drawn from two populations, and we want to compare their population means (or distributions). If the sample sizes are small and there appear to be outliers, we can test equality of their distributions (means) using the Wilcoxon rank sum test (we do not have standard errors in this test).
- The paired t-test. This is when we have paired observations, each member of the pair coming from a different population, e.g. the running time of a runner at high and low altitudes. In this case the pairs are dependent, so we cannot use either of the above tests. We can check for dependence by plotting the pairs against each other (e.g. high altitude against low altitude). If the sample size is relatively large and there are not many outliers, use a paired t-test.
- The Wilcoxon signed-rank test. Today's class.

A quick review on the idea of testing

Suppose a bomb was exploded at the center 0. The bomb spreads debris everywhere. The closer you are to the bomb, the more likely the debris will hit you.

The standard deviation of the bomb spread distance (remember it is a measure of the amount of spread) is 3 miles (this means the variance is 9). Supposing the spread of the debris has a normal distribution, this means that about 95% of the debris will be spread within a radius of $1.96 \times 3$ miles (either side) of 0. The other 5% will fall outside this radius.

Suppose I am standing 4 miles from the center (0).
Do you think the bomb will hit me?

Suppose I am standing 9 miles to the right of the center; do you think I will get hit? What's the probability I will get hit? It is $P(Z > 9/3) = P(Z > 3) = 0.0013$. This is very small.
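The tail probability in the story can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `prob_debris_beyond` is hypothetical (introduced only for illustration), and it assumes, as in the story, that the debris is normally distributed about the center.

```python
from statistics import NormalDist

def prob_debris_beyond(distance, center=0.0, sd=3.0):
    """P(debris lands further right than `distance`), assuming debris ~ N(center, sd^2).

    Hypothetical helper for illustration only.
    """
    z = (distance - center) / sd           # distance from the center in units of sd
    return 1.0 - NormalDist().cdf(z)       # upper tail of the standard normal

# Standing 9 miles to the right of a bomb centered at 0 with sd = 3: P(Z > 3)
p = prob_debris_beyond(9, center=0, sd=3)
print(round(p, 4))
```

The only inputs are the distance, the assumed center and the standard deviation, so the same call with a different `sd` shows how the answer depends on the ratio distance/(standard deviation).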
Suppose I am standing 9 miles to the right of the center and I did get hit. Everyone is telling me the bomb was exploded at 0. But I know that the probability of me being hit when the bomb is located at center 0 is about 0.1%. I am suspicious; this number is really small, though I don't know the exact location of the bomb. But I am pretty sure that it's not at 0 or any location less than 0.

This value 0.0013 is a p-value. It's the probability of being hit when the center is zero. If it's small, it's unlikely the center is zero. This means rejecting the null that the bomb is located at zero.

Another story

A more scattering bomb is being exploded at center 0. The standard deviation (measure of spread) of the debris is 6 miles (this means the variance is 36). This means that 95% of the debris will be spread within $1.96 \times 6$ miles of the center 0.

I am standing 9 miles away from the bomb. The chance of me being hit is $P(Z > 9/6) = P(Z > 1.5) = 0.067$. This means roughly a 7% chance of me being hit when the bomb is at zero. Suppose I get hit; then I cannot say that the bomb is at zero, but I cannot say that it's not at zero.

0.067 is the p-value for me being hit when the bomb was exploded at zero. This means we cannot reject the null that the bomb is located at zero.

What have you learnt from my stories

- I am an idiot for standing so close to the bombs.
- The distance that is safe to stay away from the bomb depends on the standard deviation (the variance).
- The smaller the ratio distance/(standard deviation), the more likely I am to be hit when the center is zero.
- The larger the ratio distance/(standard deviation), the less likely I am to be hit when the center is zero.

A quick review of JMP output

Typically the output in JMP looks like:

Mean (difference)   Std. Error   Upper 95%   Lower 95%   N   t ratio   DF   Prob > |t|

- Mean is the sample mean, or the difference in the sample means.
- Std. Error refers to the estimated standard deviation of the estimator (sample mean).
- N refers to the sample size.
- t ratio refers to the z-transform of the estimator. Usually t = Mean/Std.Error.
- DF refers to the degrees of freedom of the t-distribution used.
- Prob > |t| refers to the two-sided p-value when testing the hypothesis $H_0$: true mean = 0 (or difference in means is equal to zero) against the alternative $H_A$: true mean not equal to zero (or difference in means is not equal to zero).

JMP output and what you should be able to do with it

Using this output you should be able to:

- Construct 90% CIs and 99% CIs etc.
- Test the hypothesis $H_0$: true mean = 5, for example (or difference in means is equal to five), against the alternative $H_A$: true mean not equal to 5, for example (or difference in means is not equal to five).

Tests we have done so far

                              Unstandardised Coefficients
Example                       B           Std. Error                    t                                         Sig
one sample mean               $\bar{X}$   $\sqrt{s^2/n}$                $(\bar{X}-\mu_0)/\sqrt{s^2/n}$            $P(|t_{n-1}| > |(\bar{X}-\mu_0)/\sqrt{s^2/n}|)$
independent samples t-test    $\bar{D}$   $\sqrt{s_p^2(1/n + 1/m)}$     $\bar{D}/\sqrt{s_p^2(1/n + 1/m)}$         $P(|t_{n+m-2}| > |\bar{D}/\sqrt{s_p^2(1/n + 1/m)}|)$
paired t-test                 $\bar{D}$   $\sqrt{s_d^2/n}$              $\bar{D}/\sqrt{s_d^2/n}$                  $P(|t_{n-1}| > |\bar{D}/\sqrt{s_d^2/n}|)$

- Remember $|\cdot|$ means taking the positive value of a number, e.g. $|-3| = 3$.
- Remember standard deviation is a measure of spread. If we have normality then a 95% CI for the true parameter is $[B - 1.96 \times \text{Std.Error},\ B + 1.96 \times \text{Std.Error}]$. Roughly, this means it is highly likely the true mean lies in this interval.
- In the above table the p-value is evaluated when testing whether the true parameter is zero. That is:

Example           B           Hypothesis (p-value evaluated under the null)
sample mean       $\bar{X}$   $H_0: \mu = \mu_0$,  $H_A: \mu \neq \mu_0$
t-test            $\bar{D}$   $H_0: \mu_1 - \mu_2 = 0$,  $H_A: \mu_1 - \mu_2 \neq 0$
paired t-test     $\bar{D}$   $H_0: \mu_1 - \mu_2 = 0$,  $H_A: \mu_1 - \mu_2 \neq 0$
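As a rough illustration of how a confidence interval and a test against a non-zero null value can be built from an estimate B and its Std. Error, here is a minimal sketch. It assumes normality and uses the normal quantile (1.96 for 95%) rather than a t quantile, which is only reasonable when DF is large. The function names and the numbers B = 5.8, Std. Error = 0.4 are hypothetical and do not come from any JMP output.

```python
from statistics import NormalDist

def normal_ci(estimate, std_error, level=0.95):
    """CI of the form [B - z * SE, B + z * SE], assuming normality (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # e.g. about 1.96 when level = 0.95
    return estimate - z * std_error, estimate + z * std_error

def two_sided_p_value(estimate, std_error, null_value=0.0):
    """Two-sided p-value for H0: true mean = null_value, using the normal approximation."""
    z = (estimate - null_value) / std_error      # re-centre the ratio at the null value
    return 2 * (1 - NormalDist().cdf(abs(z)))

B, SE = 5.8, 0.4                                 # hypothetical values, for illustration only
for level in (0.90, 0.95, 0.99):
    lo, hi = normal_ci(B, SE, level)
    print(f"{level:.0%} CI: [{lo:.3f}, {hi:.3f}]")
print("p-value for H0: mean = 5:", round(two_sided_p_value(B, SE, null_value=5), 4))
```

With few degrees of freedom, the value of `z` would be replaced by the appropriate t quantile from the tables.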
Example I: Runners at altitude

Runners were compared at a high and low altitude. For each runner, the running time was measured at a high altitude and then again at a low altitude. 12 runners were used.

Runner   1     2     3     4      5     6     7     8     9     10    11     12
High     9.4   9.8   9.9   10.3   8.9   8.8   9.8   8.2   9.4   9.9   12.2   9.3
Low      8.7   7.9   8.3   8.4    9.2   9.1   8.2   8.1   8.9   8.2   8.9    7.5

Do you think altitude has an effect on running time? Let $\mu_Y$ denote the mean time at a low altitude and $\mu_X$ denote the mean time at a high altitude. Use $\alpha = 0.05$. We want to test $H_0: \mu_Y - \mu_X = \mu_d \geq 0$ against the alternative $H_A: \mu_Y - \mu_X = \mu_d < 0$.

Solution I: Using the paired t-test

Runner       1      2      3      4      5     6     7      8      9      10     11     12
High         9.4    9.8    9.9    10.3   8.9   8.8   9.8    8.2    9.4    9.9    12.2   9.3
Low          8.7    7.9    8.3    8.4    9.2   9.1   8.2    8.1    8.9    8.2    8.9    7.5
Low - High   -0.7   -1.9   -1.6   -1.9   0.3   0.3   -1.6   -0.1   -0.5   -1.7   -3.3   -1.8

Using the sample differences, calculate the sample mean and sample variance: $\bar{D} = -1.2$ and $s_d^2 = \frac{1}{11}\{(-0.7 + 1.2)^2 + \dots + (-1.8 + 1.2)^2\} = 1.16$.

Since $\bar{D} = -1.2$, which is on the side of the alternative (noting it is a one-sided test), we can do the test.

Under the null hypothesis
$$\frac{\bar{D}}{\sqrt{s_d^2/12}} \sim t(12 - 1).$$

Now we construct a rejection region. The rejection region is towards the left hand side (we see this from the alternative). Hence we reject the null if $\bar{D} = \bar{Y} - \bar{X} = -1.2$ is less than
$$0 - t_{0.05}(11)\sqrt{\frac{s_d^2}{12}} = -1.79 \times \sqrt{\frac{1.16}{12}} = -0.556.$$
Since $\bar{D} = -1.2$ is less than $-0.556$ we can reject the null.

Equivalently, using JMP we can get the p-value: we see that $P(t \leq -3.89) = 0.00126$. The p-value is so small that there is enough evidence to reject the null.

Example II: The use of cell phones

There has been a lot of speculation that the use of cell phones while driving has increased the number of accidents. Scientists wanted to test whether talking on a cell phone increased a driver's reaction time. To test the hypothesis they randomly sampled 30 people and placed each of them in a car simulator.
For each driver, the reaction time to the sudden appearance of a colour stimulus was recorded both when the driver was not on the phone and when the driver was on the phone.

Based on this sample the average reaction time when not on the phone was $\bar{X} = 0.5$ seconds. The average response time when on the phone was $\bar{Y} = 0.7$ seconds. The pooled sample variance is $s_p^2 = 0.4^2$ ($s_p^2 = 0.16$) and the sample variance of the differences is $s_d^2 = 0.1^2$ ($s_d^2 = 0.01$).

Is there evidence to suggest that using a cell phone increases the reaction
time while driving (state the test you would use, the hypothesis and do the test at the 5% level)?

Solutions II

Because the same person is being used in both experiments (and it is highly likely that reaction time depends on the individual), it would be wise to do a paired test (since there is likely to be dependence between the pairs).

Let $\mu_X$ be the mean reaction time when not on the phone and $\mu_Y$ be the mean reaction time when on the phone. Let $\mu_d = \mu_Y - \mu_X$; this is the mean reaction time on the phone minus the mean reaction time off the phone. We do not observe $\mu_X$, $\mu_Y$ and $\mu_d$. However, it is conjectured that reaction time increases with cell phone use. If this is true then $\mu_d = \mu_Y - \mu_X > 0$ (i.e. average reaction time on the phone minus average reaction time without the phone is greater than zero). Hence we want to test $H_0: \mu_d \leq 0$ against $H_A: \mu_d > 0$.

Since the pairs in the data are dependent, we use the paired t-test. This means that to do the test we use the sample variance of the differences, which is $s_d^2 = 0.01$, and as an estimate of $\mu_d$ we use $\bar{D} = \bar{Y} - \bar{X} = 0.7 - 0.5 = 0.2$.

Using $\bar{D} = 0.2$, $s_d^2 = 0.01$ and $n = 30$ (there are $n$ pairs) we can do the test just as if it were a one-sample test, but using the differences instead (hence we use a t-test, since we estimate the variance). We note that the standard error is $\sqrt{s_d^2/n} = \sqrt{0.01/30} = 0.018$.

We construct a 5% rejection region (RR); it is on the right hand side (since $H_A$ is pointing right). Remember that to construct the RR we need to center it about the mean in the null, which is 0. Hence the RR is any value of $\bar{D}$ greater than
$$0 + t_{0.05}(29)\sqrt{\frac{s_d^2}{n}} = 1.699 \times 0.018 = 0.031.$$
Since $\bar{D} = 0.2 > 0.031$, there is evidence to reject the null.

Question

Suppose we want to test that the average reaction time increases by more than 0.1 second. What would the conclusion of the test be (using $\alpha = 5\%$)?
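The arithmetic of the paired test can be laid out step by step. The sketch below is only an illustration: the function name `paired_t_upper_test` and its argument `mu0` (the mean in the null) are hypothetical, and the critical value 1.699 for a t-distribution with 29 degrees of freedom is taken from the tables rather than computed.

```python
import math

def paired_t_upper_test(mean_diff, var_diff, n, t_crit, mu0=0.0):
    """One-sided (right-hand) paired t-test from summary statistics.

    Hypothetical helper: t_crit must be looked up in the t tables
    for n - 1 degrees of freedom.
    """
    std_error = math.sqrt(var_diff / n)          # sqrt(s_d^2 / n)
    threshold = mu0 + t_crit * std_error         # start of the rejection region
    t_ratio = (mean_diff - mu0) / std_error      # standardised difference
    return std_error, threshold, t_ratio, mean_diff > threshold

# Cell phone example: D = 0.2, s_d^2 = 0.01, n = 30 pairs, t_0.05(29) = 1.699
se, threshold, t_ratio, reject = paired_t_upper_test(0.2, 0.01, 30, t_crit=1.699)
print(f"std error = {se:.3f}, reject if D > {threshold:.3f}, t = {t_ratio:.2f}, reject = {reject}")
```

Changing `mu0` re-centers the rejection region about a different mean in the null, which is all that is needed for a test of the form "increases by more than some amount".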
What to do when the number of pairs is small?

The sample size $n = 6$ is small, and we have used a t-distribution with 5 degrees of freedom (which means it is not very sensitive at detecting effects, since $t_{0.025}(5)$ is quite large). [But despite this we have still rejected the null hypothesis.]

Important

For obvious reasons a paired t-test is only possible when the two samples are of the same size. But to do the paired t-test we require the usual assumptions. If $n$ is small, then the observations must come from a normal distribution (difficult to check in practice). If $n$ is large it does not matter, because $\bar{D}$ will be almost normal.

The small sample size raises the question: is there a nonparametric version of this test, which does not require normality of $D_i$?

A nonparametric alternative: The Wilcoxon signed-rank test (uses Table 6)

This is to test $H_0: \mu_d = 0$ against $H_A: \mu_d \neq 0$. We do not require normality of $D_i$, but the distribution of the differences $D_i$ must be symmetric about the median (one of the main assumptions in this test). Hence the test is equivalent to checking whether the median of the difference is zero, against the alternative that it is not zero.

Recipe:

- Calculate the difference between the pairs of samples, $D_i = X_i - Y_i$.
- Delete all zero values and let $n$ be the number of non-zero values.
- List the absolute values of the differences and rank them from smallest to largest (tied values each get the average of their ranks).
- To each rank give a sign (negative or positive), depending on whether the difference is negative or positive.
- Add all the positive ranks together, call this $T_+$. Add all the negative ranks together (ignoring the sign), call this $T_-$.
- Find the smaller of $T_-$ and $T_+$, and label this $T$ ($T = \min(T_-, T_+)$).

Wilcoxon signed-rank test for the Friday 13th data

Sam. 1   Sam. 2   Difference   Sign   Abs. rank   -              +
9        13       -4           -      4           4
6        12       -6           -      5           5
11       14       -3           -      3           3
11       10       1            +      1.5                        1.5
3        4        -1           -      1.5         1.5
5        12       -7           -      6           6
Total                                             $T_- = 19.5$   $T_+ = 1.5$

Look up Table 6. The columns are the sample size of the pairs; depending on whether you are doing a two-sided or one-sided test, and on $\alpha$, select the appropriate value.
If $T$ is less than this value, there is enough evidence to reject the null.

The test is $H_0$: the mean numbers of accidents which happen on Friday 13th and Friday 6th are the same ($\mu_d = 0$), against the alternative $H_A$: the mean numbers of accidents which happen on Friday 13th and Friday 6th are different.
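The ranking steps of the signed-rank recipe can be sketched in a few lines. This is a minimal illustration, not a library routine: the function name `signed_rank_sums` is hypothetical, the six pairs are those of the Friday 13th example (differences -4, -6, -3, 1, -1, -7), and ties are detected by exact equality, which is fine for whole-number counts like these.

```python
def signed_rank_sums(x, y):
    """Return (T_plus, T_minus, T) for paired samples x and y (hypothetical helper)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]            # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1                             # average of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)     # sum of positive ranks
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)    # sum of negative ranks
    return t_plus, t_minus, min(t_plus, t_minus)

sample_1 = [9, 6, 11, 11, 3, 5]
sample_2 = [13, 12, 14, 10, 4, 12]
t_plus, t_minus, T = signed_rank_sums(sample_1, sample_2)
print(t_plus, t_minus, T)
```

For these six pairs the sketch gives $T_+ = 1.5$ and $T_- = 19.5$; a useful check is that the two sums always add up to $n(n+1)/2$, here 21.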
The Wilcoxon signed-rank test and Table 6

- $T_+$ = sum of positive ranks ($T_+ = 1.5$).
- $T_-$ = sum of negative ranks ($T_- = 19.5$).
- $T$ = smaller of $T_+$ and $T_-$ (in our example $T = 1.5$).
- Look up Table 6 for a particular $p = \alpha$ and $n$ (so in our case we use a two-sided test with $p = 0.05$ and $n = 6$).
- Reject $H_0$ if $T <$ value in the table.
- Looking up the tables with $p = 0.05$ and $n = 6$ we see that Value = 0. Since $1.5 > 0$, using the Wilcoxon signed-rank test we do not have enough evidence to reject the null hypothesis.

Though I think this is just because the sample size is too small and the power of the test is zero! If we increased the level to $\alpha = 0.2$, then Value = 2 and we would be able to reject the null. Based on this test, what are your conclusions about Friday 13th???

Reminder: What test to do when...

If we want to test whether two samples come from the same population (or whether one distribution is a shift of another) and both samples are of the same size and

- pairs of observations are independent and
  - the sample size is large, or small and the data normal, then use the t-test.
  - the sample size is small and the data not normal, then use the Wilcoxon rank sum (Mann-Whitney U) test.
- pairs of observations are dependent and
  - the sample size is large, or small and the data normal, then use the paired t-test.
  - the sample size is small and the data not normal, then use the Wilcoxon signed-rank test.

We should use the test which suits the data, because loss of power can occur if we use the wrong test.

Aside

What happens when we use the wrong test:

- For the t-test comparing two samples with the same size, the non-rejection region is
  $$\left[-t_{\alpha/2}(2n-2)\, s_p\sqrt{\frac{2}{n}},\ \ t_{\alpha/2}(2n-2)\, s_p\sqrt{\frac{2}{n}}\right],$$
  where $s_p^2$ is the pooled variance.
- For the paired t-test we use a t-distribution, and the non-rejection region is
  $$\left[-t_{\alpha/2}(n-1)\,\frac{s_d}{\sqrt{n}},\ \ t_{\alpha/2}(n-1)\,\frac{s_d}{\sqrt{n}}\right], \qquad s_d^2 = \frac{1}{n-1}\sum_i (d_i - \bar{d})^2.$$
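To get a feel for how the two standard deviations in these regions relate, a small simulation can help. When the pairs are independent, the variance of a difference is the sum of the two variances, so $s_d^2$ should come out close to $2 s_p^2$. The sketch below uses simulated normal data, which is purely hypothetical (mean 10, standard deviation 1, 5000 pairs, fixed seed), not data from the lecture.

```python
import random
import statistics

random.seed(0)                                         # fixed seed so the run is repeatable
n = 5000
x = [random.gauss(10.0, 1.0) for _ in range(n)]        # hypothetical first sample
y = [random.gauss(10.0, 1.0) for _ in range(n)]        # second sample, drawn independently of x

# Pooled variance for two samples of equal size: the average of the two sample variances
s_p2 = (statistics.variance(x) + statistics.variance(y)) / 2
# Sample variance of the paired differences
s_d2 = statistics.variance([a - b for a, b in zip(x, y)])

ratio = s_d2 / s_p2
print(f"s_p^2 = {s_p2:.3f}, s_d^2 = {s_d2:.3f}, ratio = {ratio:.3f}")
```

With a ratio near 2, the half-widths $s_p\sqrt{2/n}$ and $s_d/\sqrt{n}$ are nearly the same, so for independent pairs the two regions differ mainly through the t quantile.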
Remember $s_d$ and $s_p$ are different. In the case of independent pairs $s_d \approx \sqrt{2}\, s_p$.

By comparing the non-rejection regions we see:

- If there is large dependence between pairs and we mistakenly use the t-test, we lose power because $s_p$ is large.
- If there is only a small dependence between pairs and we mistakenly use the paired t-test, we lose power because we are using a $t(n-1)$ rather than a $t(2n-2)$.

Example: Runners at altitude

Runners were compared at a high and low altitude. For each runner, the running time was measured at a high altitude and then again at a low altitude. 12 runners were used.

Runner   1     2     3     4      5     6     7     8     9     10    11     12
High     9.4   9.8   9.9   10.3   8.9   8.8   9.8   8.2   9.4   9.9   12.2   9.3
Low      8.7   7.9   8.3   8.4    9.2   9.1   8.2   8.1   8.9   8.2   8.9    7.5

Use the Wilcoxon signed-rank test to test the hypothesis that the median difference is zero against the alternative that it is different from zero.

Aside: The normal approximation of the signed-rank test

When $n > 50$ we use a normal approximation. Let
$$\mu_T = \frac{n(n+1)}{4}, \qquad \sigma_T^2 = \frac{n(n+1)(2n+1)}{24}.$$
Under the null
$$Z = \frac{T - \mu_T}{\sqrt{\sigma_T^2}} \sim N(0,1).$$
Calculate $P(Z < z)$, where $z$ is the observed value of $Z$. If this is smaller than $\alpha/2$, reject $H_0$.
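The normal approximation is easy to put into code. The sketch below follows the formulas for $\mu_T$ and $\sigma_T^2$ given above and uses `statistics.NormalDist` from the Python standard library. The function name and the values $T = 400$, $n = 60$ are hypothetical, chosen only to show the calculation.

```python
import math
from statistics import NormalDist

def signed_rank_normal_approx(T, n):
    """Return (z, P(Z < z)) for the signed-rank statistic T with n non-zero pairs.

    Hypothetical helper; intended for large n, where the normal approximation is used.
    """
    mu_T = n * (n + 1) / 4                        # mean of T under the null
    var_T = n * (n + 1) * (2 * n + 1) / 24        # variance of T under the null
    z = (T - mu_T) / math.sqrt(var_T)             # standardise T
    return z, NormalDist().cdf(z)                 # lower tail, since T is the smaller rank sum

alpha = 0.05
z, p_lower = signed_rank_normal_approx(T=400, n=60)    # hypothetical values
print(f"z = {z:.3f}, P(Z < z) = {p_lower:.5f}, reject H0: {p_lower < alpha / 2}")
```

Because $T$ is the smaller of the two rank sums, $z$ is never positive, which is why only the lower tail is compared with $\alpha/2$.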