

Non-parametric tests

Paired t-test

A paediatrician measured the blood cholesterol of her patients and was worried to note that some had levels over 00mg/100ml. To investigate whether dietary regulation would lower blood cholesterol, the paediatrician selected 10 patients at random. Blood tests were conducted on these patients both before and after they had undertaken a two-month nutritional program. The results are shown below.

Patient | (X) Before | (Y) After | (D) Difference

For example, (x, y) = (, 13) and d = x − y = 13 = . We observe $\bar{x}$ = 117, $s_x$ = 3, $\bar{y}$ = 087 and $s_y$ = 389. There is a lot of variability, so it is difficult to tell from these summaries whether there is a statistically significant difference.

The question of interest is whether dietary regulation lowers blood cholesterol. The paediatrician tests the hypotheses

$$H_0: \mu_D = 0 \quad \text{versus} \quad H_1: \mu_D > 0$$

where $D_1, \ldots, D_{10}$ are assumed to be iid $N(\mu_D, \sigma_D^2)$. The observed estimates of $\mu_D$ and $\sigma_D^2$ are $\bar{d}$ = 3 and $s_d^2$ = 3 respectively. Under $H_0$,

$$\frac{\bar{D} - 0}{S_D/\sqrt{n}} \sim t_{n-1}.$$

The paediatrician observes t = 3 3/10 = 5356. From tables, $t_{9,0.025} = 2.262$ and $t_{9,0.01} = 2.82$. Hence, we reject $H_0$ at the 2.5% level but not at the 1% level. The p-value is thus between these values and is about 0019 (using R: 1 - pt(5356,9)). There is fairly strong evidence to suggest that this dietary program is effective in lowering blood cholesterol level in these patients.
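The paired t statistic above can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own computation (the notes use R): the function name `paired_t_statistic` and the sample readings are hypothetical and are not the paediatrician's data.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, df) for H0: mu_D = 0, where D = before - after."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)      # estimate of mu_D
    s_d = statistics.stdev(diffs)       # sample standard deviation, divisor n - 1
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Hypothetical readings for illustration only, NOT the data from the notes.
before = [210, 205, 198, 221, 230, 215, 202, 219, 225, 208]
after = [200, 207, 190, 215, 222, 216, 195, 210, 220, 204]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.4f} on {df} degrees of freedom")
```

The resulting t would then be compared with the tabulated $t_{n-1}$ critical value for the chosen one-sided significance level.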

Signed-rank test aka Wilcoxon matched-pairs test

The signed-rank test, also known as the Wilcoxon matched-pairs test, is a non-parametric test based on ranks for paired samples. We shall use it as the non-parametric version of the t-test for testing equality of means for paired normal data.

Suppose we have $n$ pairs and let $(X_i, Y_i)$ denote the $i$th pair. We shall assume that the $X_i$s are iid with mean $\mu_X$ and variance $\sigma_X^2$ and that the $Y_i$s are iid with mean $\mu_Y$ and variance $\sigma_Y^2$, but we shall not make any assumption of normality. Let

$$D_i = X_i - Y_i, \quad i = 1, \ldots, n$$

denote the differences. The $D_i$ are iid with $E(D_i) = \mu_D$ and $Var(D_i) = \sigma_D^2$. We shall assume that the distribution of the $D_i$ is symmetric so that $m_D = \mu_D$, where $m_D$ is the median difference.

The idea behind the test is that if there is no difference between the pairs, then about half the $D_i$ should be negative and half should be positive. If we rank the absolute values of the differences, then $\sum$ positive ranks $\approx$ $\sum$ negative ranks. We construct a test statistic $T$ as follows.

1. Rank, in ascending order, the absolute values of the differences. Let $R_i$ be the rank of $|D_i|$ for $i = 1, \ldots, n$. If there are ties, each $|D_i|$ is assigned the average value of the ranks for which it is tied.
2. Restore the signs of the $D_i$ to the ranks, obtaining signed ranks. If some of the differences are equal to zero, the most common technique is to discard these observations.
3. Calculate $T = \sum_{i: D_i > 0} R_i = \sum$ positive ranks.

We wish to test the hypothesis

$H_0$: the distribution of the $D_i$ is symmetric about zero,

which is the same as $m_D = \mu_D = 0$. If $H_0$ is true, what is the sampling distribution of $T$?

In its simplest form (assuming no ties), the difference with rank $k$ is equally likely to be positive or negative, and any assignment of signs to the ranks $1, \ldots, n$ is equally likely. There are $2^n$ possible assignments, each occurring with probability $2^{-n}$, and for each assignment we may calculate the value of $T$ and thus compute the sampling distribution of $T$. Note that we can write

$$T = \sum_{k=1}^{n} k I_k \quad \text{where} \quad I_k = \begin{cases} 1 & \text{if the difference with rank } k \text{ has } D_i > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Under $H_0$, the $I_k$s are independent Bernoulli trials with $p = P(D_i > 0) = \frac{1}{2}$ and so $E(I_k) = \frac{1}{2}$ and $Var(I_k) = \frac{1}{4}$. Thus,

$$E(T) = E(I_k) \sum_{k=1}^{n} k = \frac{n(n+1)}{4}, \qquad Var(T) = Var(I_k) \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{24}.$$

So, if $H_0$ is true, we'd expect $T \approx n(n+1)/4$. Values much smaller or larger than this are evidence against $H_0$. If there are only a few ties, then the distribution of $T$ is approximately that of the case with no ties.

0.1 $H_0: m_D = 0$ versus $H_1: m_D > 0$

Under the assumption of symmetry, this is equivalent to testing $H_0: \mu_D = 0$ versus $H_1: \mu_D > 0$. If $H_1$ is true then most of the $D_i > 0$. In the most extreme case, all the $D_i$ are positive so that $T$ attains its maximum value, $T_{max} = \frac{n(n+1)}{2}$. We will reject $H_0$ if our observed value $T_{obs}$ is sufficiently close to $T_{max}$. Tables give us a critical value as to how close $T_{obs}$ must be to $T_{max}$ to reject $H_0$. The critical region is

$$C = \left\{ T_{obs} \geq \frac{n(n+1)}{2} - \text{critical value in tables} \right\}.$$

This is a one-sided test, and the critical values for a test of significance $\alpha$ correspond to those for a two-sided test of significance $2\alpha$. For example, if $n = 10$ we have a critical value of 10, 8, 5, 3 for a test of significance 5%, 2.5%, 1%, 0.5% respectively.

0.2 $H_0: m_D = 0$ versus $H_1: m_D < 0$

Under the assumption of symmetry, this is equivalent to testing $H_0: \mu_D = 0$ versus $H_1: \mu_D < 0$. If $H_1$ is true then most of the $D_i < 0$. In the most extreme case, all the $D_i$ are negative so that $T$ attains its minimum value, $T_{min} = 0$. We will reject $H_0$ if our observed value $T_{obs}$ is sufficiently close to 0. Tables give us a critical value as to how close $T_{obs}$ must be to 0 to reject $H_0$. The critical region is

$$C = \{ T_{obs} \leq \text{critical value in tables} \}.$$

This is a one-sided test, and the critical values for a test of significance $\alpha$ correspond to those for a two-sided test of significance $2\alpha$. For example, if $n = 20$ we have a critical value of 60, 52, 43, 37 for a test of significance 5%, 2.5%, 1%, 0.5% respectively.
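The sampling distribution of $T$ described above (every assignment of signs to the ranks $1, \ldots, n$ equally likely, no ties) can be tabulated exactly by counting. The following Python sketch is one way to do that; the function names are illustrative and are not part of the notes.

```python
from fractions import Fraction

def signed_rank_null_counts(n):
    """counts[t] = number of sign assignments to ranks 1..n giving T = t (no ties)."""
    t_max = n * (n + 1) // 2
    counts = [0] * (t_max + 1)
    counts[0] = 1                        # all signs negative gives T = 0
    for k in range(1, n + 1):
        # Rank k either contributes nothing (I_k = 0) or adds k to T (I_k = 1).
        for t in range(t_max, k - 1, -1):
            counts[t] += counts[t - k]
    return counts

def signed_rank_lower_tail(n, c):
    """Exact P(T <= c) under H0."""
    counts = signed_rank_null_counts(n)
    return Fraction(sum(counts[: c + 1]), 2 ** n)

n = 10
counts = signed_rank_null_counts(n)
mean = Fraction(sum(t * c for t, c in enumerate(counts)), 2 ** n)
var = Fraction(sum(t * t * c for t, c in enumerate(counts)), 2 ** n) - mean ** 2
print(mean, var)                         # compare with n(n+1)/4 and n(n+1)(2n+1)/24
print(float(signed_rank_lower_tail(n, 8)))
```

By the symmetry of the null distribution, $P(T \geq T_{max} - c) = P(T \leq c)$, so the same lower-tail probability serves both one-sided tests.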

0.3 $H_0: m_D = 0$ versus $H_1: m_D \neq 0$

Under the assumption of symmetry, this is equivalent to testing $H_0: \mu_D = 0$ versus $H_1: \mu_D \neq 0$. We will reject $H_0$ if the observed value of $T$, $T_{obs}$, is sufficiently close to either $T_{min} = 0$ or $T_{max} = \frac{n(n+1)}{2}$. The critical region is

$$C = \left\{ T_{obs} \leq \text{critical value in tables}, \; T_{obs} \geq \frac{n(n+1)}{2} - \text{critical value in tables} \right\}.$$

Once again, critical values are given in the tables. For example, if $n = 20$ we have a critical value of 60, 52, 43, 37 for a test of significance 10%, 5%, 2%, 1% respectively.

0.4 Worked example

We consider the blood cholesterol data set, in which the blood cholesterol of ten patients was measured before and after a two-month nutritional program. We no longer assume normality for the distribution of the before and after levels but we do assume symmetry and test the hypothesis

$$H_0: \mu_D = 0 \quad \text{versus} \quad H_1: \mu_D > 0.$$

The following table shows the data, differences, ranks and signed ranks.

Patient | (X) Before | (Y) After | (D) Difference | Rank of |D| | Signed Rank

Our observed value of $T$ is $T_{obs} = 48$. For a test of significance 2.5%, the critical value is 8 and

$$C = \left\{ T_{obs} \geq \frac{10(11)}{2} - 8 = 47 \right\}.$$

Similarly, a test of significance 1% has a critical value of 5 and

$$C = \left\{ T_{obs} \geq \frac{10(11)}{2} - 5 = 50 \right\}.$$

We reject $H_0$ at the 2.5% level but not at the 1% level. This concurs with the t-test analysis.
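The three-step construction of $T$ (rank the absolute differences with tied values sharing an average rank, restore the signs, sum the positive ranks, discarding zero differences) might be coded as follows. This is a sketch under those stated rules; the helper names and the small data set are hypothetical and are not the cholesterol data.

```python
def average_ranks(values):
    """Ascending ranks 1..len(values); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def signed_rank_statistic(x, y):
    """Return (T, n_used): sum of the ranks of positive differences, zeros discarded."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t = sum(r for d, r in zip(diffs, ranks) if d > 0)
    return t, len(diffs)

# Hypothetical paired readings for illustration only.
x = [10, 12, 9, 15, 11]
y = [8, 12, 10, 11, 8]
t_obs, n_used = signed_rank_statistic(x, y)
print(t_obs, n_used)
```

Note that discarding zero differences reduces the effective sample size, so the critical value should be looked up for `n_used` rather than for the original number of pairs.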

Mann-Whitney U-test

The Mann-Whitney U-test is the non-parametric test of whether two distributions with the same shape have the same median. We assume that $X_1, \ldots, X_n$ are independent realisations from a distribution with median $m_X$, that $Y_1, \ldots, Y_m$ are independent realisations from a distribution with median $m_Y$, and that the two underlying distributions have the same shape. We calculate a test statistic $T$ in the following way.

1. Group all $n + m$ observations together and rank them in order of increasing size. If there are ties, then an observation is assigned the average value of the ranks for which it is tied.
2. Calculate $T = \sum$ smaller sample ranks, e.g. if $n < m$ then there are fewer $X$ observations than $Y$ and $T = \sum X$ ranks.

If the two sample sizes agree, so $n = m$, it doesn't matter whether we choose $T = \sum X$ ranks or $T = \sum Y$ ranks. Without loss of generality, we shall assume that $n \leq m$ so that there are at least as many observations from $Y$ as from $X$ and $T = \sum X$ ranks.

If $H_0$ is true, what is the sampling distribution of $T$? For simplicity, assume that there are no ties. If there are only a few ties then significance levels are not greatly affected. Under $H_0$, every assignment of ranks to the $n + m$ observations is equally likely. There are $\binom{n+m}{n}$ possible assignments of ranks to the $n$ observations from $X$, each of which is equally likely. By computing the sum of the ranks for each of these assignments, we may construct the probability distribution of $T$.

0.5 $H_0: m_X = m_Y$ versus $H_1: m_X > m_Y$ ($n \leq m$)

The test statistic is $T = \sum X$ ranks. If $H_1$ is true, then we would expect the observations from $X$ to be larger than those from $Y$ and thus have the largest ranks. The most extreme case is when the $n$ $X$ values are the largest $n$ values and $T$ attains its maximum,

$$T_{max} = (m + 1) + (m + 2) + \cdots + (m + n) = nm + \frac{n(n+1)}{2}.$$

We will reject $H_0$ if our observed value $T_{obs}$ is sufficiently close to $T_{max}$. Tables give us a critical value, $U$, as to how close $T_{obs}$ must be to $T_{max}$ to reject $H_0$. The critical region is

$$C = \{ T_{obs} \geq T_{max} - U \}.$$

This is a one-sided test, and the critical values $U$ for a test of significance $\alpha$ are given in tables for $\alpha$ = 0.05, 0.025 and 0.01. For example, with $n = 10$ and $m = 15$, then $T_{max} = 205$ and for $\alpha$ = 0.05, 0.025 and 0.01 we have $U$ = 44, 39 and 33 respectively.

0.6 $H_0: m_X = m_Y$ versus $H_1: m_X < m_Y$ ($n \leq m$)

The test statistic is $T = \sum X$ ranks. If $H_1$ is true, then we would expect the observations from $X$ to be smaller than those from $Y$ and thus have the smallest ranks. The most extreme case is when the $n$ $X$ values are the smallest $n$ values and $T$ attains its minimum,

$$T_{min} = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$

We will reject $H_0$ if our observed value $T_{obs}$ is sufficiently close to $T_{min}$. Tables give us a critical value, $U$, as to how close $T_{obs}$ must be to $T_{min}$ to reject $H_0$. The critical region is

$$C = \{ T_{obs} \leq T_{min} + U \}.$$

This is a one-sided test, and the critical values $U$ for a test of significance $\alpha$ are given in tables for $\alpha$ = 0.05, 0.025 and 0.01. For example, with $n = 6$ and $m = 9$, then $T_{min} = 21$ and for $\alpha$ = 0.05, 0.025 and 0.01 we have $U$ = 12, 10 and 7 respectively.

0.7 $H_0: m_X = m_Y$ versus $H_1: m_X \neq m_Y$

We will reject $H_0$ if the observed value of $T$, $T_{obs}$, is sufficiently close to either $T_{min} = \frac{n(n+1)}{2}$ or $T_{max} = nm + \frac{n(n+1)}{2}$. Tables give us a critical value $U$ as to how far in from the ends of the interval $(T_{min}, T_{max})$ the critical region should be for a test of significance $\alpha$. The critical region is

$$C = \{ T_{obs} \leq T_{min} + U, \; T_{obs} \geq T_{max} - U \}.$$

This is a two-sided test, and the critical values $U$ for a test of significance $\alpha$ are given in tables for $\alpha$ = 0.10, 0.05 and 0.02. For example, with $n = 7$ and $m = 11$, then $T_{min} = 28$ and $T_{max} = 105$ and for $\alpha$ = 0.10, 0.05 and 0.02 we have $U$ = 19, 16 and 12 respectively.
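The probability distribution of $T$ under $H_0$ (every choice of $n$ ranks out of $n + m$ equally likely, no ties) can be built by counting, which gives a way to check a tabulated critical value. The sketch below is illustrative only; the function names are not from the notes.

```python
from math import comb

def rank_sum_null_counts(n, m):
    """Map each value t of T = sum of X ranks to the number of rank choices giving it."""
    total = n + m
    t_max = n * m + n * (n + 1) // 2
    # ways[j][t] = number of ways to pick j of the ranks seen so far with sum t
    ways = [[0] * (t_max + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for rank in range(1, total + 1):
        # Go through j in decreasing order so each rank is used at most once.
        for j in range(min(rank, n), 0, -1):
            for t in range(t_max, rank - 1, -1):
                ways[j][t] += ways[j - 1][t - rank]
    return {t: c for t, c in enumerate(ways[n]) if c}

def rank_sum_lower_tail(n, m, u):
    """P(T <= T_min + u) under H0, assuming no ties."""
    t_min = n * (n + 1) // 2
    counts = rank_sum_null_counts(n, m)
    hits = sum(c for t, c in counts.items() if t <= t_min + u)
    return hits / comb(n + m, n)

print(rank_sum_null_counts(2, 2))
print(rank_sum_lower_tail(6, 9, 12))     # compare with the one-sided level 0.05
```

Because the distribution is symmetric about its centre, the upper-tail probability $P(T \geq T_{max} - u)$ equals the lower-tail probability returned here.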

0.8 Worked example

Do public hospitals in the United States carry out fewer heart surgeries than private hospitals? Independent random samples give the number of surgeries:

Public
Private

We test the hypothesis

$$H_0: med_{pub} = med_{pri} \quad \text{versus} \quad H_1: med_{pub} < med_{pri}.$$

We rank the observed data:

Data
Sample: pub pub pri pub pri pub pri pri pub pub pri pub pri
Rank

The smaller data set corresponds to the private hospitals, so we follow Subsection 0.5 and reject $H_0$ if our observed value of $T = \sum$ pri ranks is sufficiently close to $T_{max}$, where

$$T_{max} = 6(7) + \frac{6(7)}{2} = 63$$

and our observed value of $T$ is $T_{obs} = 47$.

For a test of $\alpha = 0.05$, the critical value is $U = 8$ and the critical region is $C = \{ T_{obs} \geq 63 - 8 = 55 \}$. For a test of $\alpha = 0.025$, the critical value is $U = 6$ and the critical region is $C = \{ T_{obs} \geq 63 - 6 = 57 \}$. Finally, a test of $\alpha = 0.01$ gives $U = 4$ and critical region $C = \{ T_{obs} \geq 63 - 4 = 59 \}$. We have insufficient evidence to reject $H_0$ at even the 5% level.
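The decision in this worked example can be retraced from the sample labels alone. The sketch below assumes that the labels are listed in ascending order of the data and that there are no tied observations, so the label in position $i$ has rank $i$; neither assumption can be confirmed from the labels themselves. The value $U = 4$ for the 1% test is inferred from the critical region $T_{obs} \geq 59$.

```python
# Sample labels as listed in the ranking table of the worked example.
labels = "pub pub pri pub pri pub pri pri pub pub pri pub pri".split()

# Assumption: labels are in ascending data order with no ties, so rank = position.
ranks = range(1, len(labels) + 1)

n = labels.count("pri")                  # smaller sample (private hospitals)
m = labels.count("pub")                  # larger sample (public hospitals)
t_obs = sum(r for r, lab in zip(ranks, labels) if lab == "pri")
t_max = n * m + n * (n + 1) // 2

# Critical values U for the one-sided test with n = 6 and m = 7.
for alpha, u in [(0.05, 8), (0.025, 6), (0.01, 4)]:
    decision = "reject H0" if t_obs >= t_max - u else "do not reject H0"
    print(f"alpha = {alpha}: region T >= {t_max - u}, T_obs = {t_obs}, {decision}")
```

If the data did contain ties, the tied observations would share average ranks and the rank sum would need to be recomputed from the raw values.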
