Nonparametric Methods
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven

Nonparametric methods, also called distribution-free methods, are used to test hypotheses about a population without assuming a particular form for the population's distribution.

(University of New Haven) Nonparametric Methods 1 / 44
Table of Contents
1 Sign Test for Median of Ordinal Data
2 Wilcoxon Signed Ranks Test for Matched Pairs
3 Wilcoxon Rank Sum Test for Two Independent Samples
4 Kruskal-Wallis Test for Multiple Independent Samples
5 Spearman Rank Correlation Test for Bivariate Data
6 Chapter #12 R Assignment
Advantages and Disadvantages of Nonparametric Methods

Advantages of Nonparametric Methods
- Fewer assumptions are required, so nonparametric methods can be used in more situations.
- Computations are often simpler and easier to understand.
- Results are less sensitive to outliers that are actually incorrect observations.

Disadvantages of Nonparametric Methods
- Information is wasted when numerical data are converted into rank data, so conclusions tend to be weaker.
- The effect of correct outliers is underestimated.
Definition
Let $T_1$ and $T_2$ be two test statistics for testing $H_0$ versus $H_1$. Let $n_1$ and $n_2$ be the minimum sample sizes needed to achieve a test of power $\beta$ and size $\alpha$ using $T_1$ or $T_2$, respectively. The Pitman asymptotic relative efficiency, $\mathrm{ARE}(T_1, T_2)$, of test statistic $T_1$ to test statistic $T_2$ is the limit, if it exists, of the ratio $n_2/n_1$ as $n_1 \to \infty$.

The ARE of a nonparametric test to its parametric counterpart is generally less than one when the parametric assumptions hold, because one must give up some efficiency in return for fewer assumptions.
Sign Test for Median of Ordinal Data
A nonparametric version of the one sample t test, with $\mathrm{ARE} = 2/\pi \approx 0.64$.
Situations for using the Sign Test

Test of Magnitude between Matched Pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
For any pair $(x_j, y_j)$ such that $x_j = y_j$, delete $(x_j, y_j)$ from the sample and readjust $n$, the sample size. Let
$H_0 : P(X > Y) = P(Y > X)$
$d_j \overset{\text{def}}{=} y_j - x_j$
$x \overset{\text{def}}{=} \min(\#\text{ of positive } d_j\text{'s},\ \#\text{ of negative } d_j\text{'s})$.

Test for Median of a sample $x_1, x_2, \ldots, x_n$, with $M$ fixed.
For any $x_j$ such that $x_j = M$, delete $x_j$ from the sample and readjust $n$, the sample size. Let
$H_0 : \text{median} = M$
$d_j \overset{\text{def}}{=} x_j - M$
$x \overset{\text{def}}{=} \min(\#\text{ of positive } d_j\text{'s},\ \#\text{ of negative } d_j\text{'s})$.
Situations for using the Sign Test

Test of Distribution of Nominal Data taking two values, $x_1, \ldots, x_n$. Let
$H_0 : P(X = \text{first value}) = P(X = \text{second value})$
$$d_j \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x_j = \text{first value} \\ 0 & \text{if } x_j = \text{second value} \end{cases}$$
$$x \overset{\text{def}}{=} \min\Big(\sum_{j=1}^{n} d_j,\ \sum_{j=1}^{n} (1 - d_j)\Big) = \min(\#\text{ first values},\ \#\text{ second values}).$$
Nominal data need not be numbers. The two situations for using the Sign Test on the previous slide are just special cases of the situation above, where what is being counted is +'s and -'s.
Theorem (Sign Test for $n \le 25$)
The test statistic is $x$ and the critical values are found in Table A-7.

Theorem (Sign Test for $n > 25$)
The test statistic is
$$z = \frac{(x + 0.5) - n/2}{\sqrt{n}/2},$$
which is approximately standard normal.
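The large-sample statistic above is simple arithmetic, and a minimal R sketch may help. The function name sign_test_z and the counts used are hypothetical illustrations, not part of the slides.

```r
# Large-sample sign test statistic: z = ((x + 0.5) - n/2) / (sqrt(n)/2)
sign_test_z <- function(n_pos, n_neg) {
  n <- n_pos + n_neg          # sample size after ties have been deleted
  x <- min(n_pos, n_neg)      # sign test statistic
  ((x + 0.5) - n / 2) / (sqrt(n) / 2)
}

# Hypothetical counts: 12 positive and 28 negative differences (n = 40 > 25)
z <- sign_test_z(12, 28)

# x is the smaller count, so z normally falls in the lower tail;
# cap at 1 because doubling a tail probability can overshoot
p_two_sided <- min(1, 2 * pnorm(z))
```

The 0.5 in the numerator is a continuity correction, since the discrete count x is being approximated by a continuous normal distribution.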
Sign Test Example
You're a marketing analyst for Chefs-R-Us. You've asked 8 people to rate a new ravioli on a 5-point Likert scale (1 = terrible to 5 = excellent). The ratings are:
2 4 1 2 1 1 2 1
At the 0.05 level, is there evidence that the median rating is less than 3?
$H_0$ : median = 3 versus $H_A$ : median < 3
One rating lies above 3 and seven lie below, so $n = 8$ and $x = 1$. Table A-7 says to reject $H_0$ at the 0.05 significance level since $x = 1$.
Example (example solution using R)
> install.packages("BSDA")
> library(BSDA)
> dat=c(2,4,1,2,1,1,2,1)
> SIGN.test(dat,md=3,alternative="less")

        One-sample Sign-Test

data:  dat
s = 1, p-value = 0.03516
alternative hypothesis: true median is less than 3
95 percent confidence interval:
 -Inf    2
sample estimates:
median of x
        1.5

                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.8555   -Inf      2
Interpolated CI       0.9500   -Inf      2
Upper Achieved CI     0.9648   -Inf      2
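The same p value can be reproduced in base R without an add-on package. The sketch below assumes the one-sided alternative (median < 3) from the ravioli example; the variable names are illustrative.

```r
# Sign test for H0: median = 3 versus HA: median < 3, using only base R
dat <- c(2, 4, 1, 2, 1, 1, 2, 1)
M <- 3

d <- dat[dat != M] - M        # drop observations equal to M, then take differences
n <- length(d)                # adjusted sample size
n_pos <- sum(d > 0)           # number of positive signs
n_neg <- sum(d < 0)           # number of negative signs
x <- min(n_pos, n_neg)        # sign test statistic

# Under H0 the number of positive signs is Binomial(n, 1/2);
# few positive signs is evidence that the median is below M
p_value <- pbinom(n_pos, size = n, prob = 0.5)

# Cross-check with the built-in exact binomial test
p_check <- binom.test(n_pos, n, p = 0.5, alternative = "less")$p.value
```

Here p_value is 9/256, which rounds to the 0.03516 reported above.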
Wilcoxon Signed Ranks Test for Matched Pairs
A nonparametric version of the matched pair one sample t test, with $\mathrm{ARE} = 3/\pi \approx 0.955$.
Wilcoxon Signed Ranks Test
Given $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, define $d_j \overset{\text{def}}{=} y_j - x_j$. Discard all bivariate data with $d_j = 0$ (and readjust the sample size $n$). Next rank the magnitudes $|d_j|$ from smallest to largest. For ties, assign each tied value the average of their would-be ranks. (For ties, there is a better way than averaging the ranks, but it is more complicated.)

Definition
The Wilcoxon Signed Rank Statistics are
$t^+(\omega) \overset{\text{def}}{=}$ sum of ranks of the positive $d_j$'s,
$t^-(\omega) \overset{\text{def}}{=}$ sum of ranks of the negative $d_j$'s, and
$t \overset{\text{def}}{=} \min(t^-, t^+)$.

Frank Wilcoxon (1892-1965), American

Since the Wilcoxon Signed-Ranks Test takes into account both the signs and the ranks of the $d_j$'s, it obtains a higher ARE than the Sign Test.
Example
Let
(7.4, 5.5), (5.5, 3.2), (5.6, 6.0), (7.9, 4.5), (6.7, 6.7), (6.5, 3.0), (7.8, 4.8), (3.5, 6.3)
be a matched pair random sample. Calculate the Wilcoxon Signed Rank Statistics.

Solution: The differences here are taken as $d_j = x_j - y_j$ (reversing the sign convention swaps $t^+$ and $t^-$ but leaves $t$ unchanged). After eliminating the zero difference of the fifth pair, reducing the sample size to 7, and renumbering,
$d_1 = 1.9$, $d_2 = 2.3$, $d_3 = -0.4$, $d_4 = 3.4$, $d_5 = 3.5$, $d_6 = 3.0$, $d_7 = -2.8$
so one has

rank        1      2      3      4      5      6      7
magnitude   |d_3|  |d_1|  |d_2|  |d_7|  |d_6|  |d_4|  |d_5|
sign        -      +      +      -      +      +      +

Thus
$t^+ = 2 + 3 + 5 + 6 + 7 = 23$
$t^- = 1 + 4 = 5$,
$t = 5$.
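The steps of this example can be sketched in R as follows. The sketch assumes the sign convention d = x - y, which is also the convention R's wilcox.test uses for paired data; the variable names are illustrative.

```r
# Matched pairs from the example
x <- c(7.4, 5.5, 5.6, 7.9, 6.7, 6.5, 7.8, 3.5)
y <- c(5.5, 3.2, 6.0, 4.5, 6.7, 3.0, 4.8, 6.3)

d <- x - y                    # assumed sign convention: d = x - y
d <- d[d != 0]                # discard zero differences
n <- length(d)                # adjusted sample size

r <- rank(abs(d))             # rank the magnitudes; ties would get average ranks

t_plus  <- sum(r[d > 0])      # sum of ranks of the positive differences
t_minus <- sum(r[d < 0])      # sum of ranks of the negative differences
t_stat  <- min(t_plus, t_minus)
```

As a sanity check, t_plus + t_minus always equals n(n+1)/2, here 28.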
Theorem (Wilcoxon Signed Rank Test)
Given a random bivariate sample, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, such that the distribution of the differences, $d_j \overset{\text{def}}{=} y_j - x_j$, is approximately symmetric about its median $M$, consider
$H_0 : M = 0$ versus $H_A$ : not $H_0$.
Then a conservative test of $H_0$ versus $H_A$ is to use
$$\begin{cases} t & \text{if } n \le 30 \\[1ex] z = \dfrac{t - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} & \text{if } n > 30 \end{cases}$$
as the test statistic. Use Table A-8 if $n \le 30$ and assume that $Z \sim N(0, 1)$ if $n > 30$.
Example (continued)
Using data from the above Example, find the p value of
$H_0$ : population differences have a median of 0 versus $H_A$ : not $H_0$.
Solution: Here $n = 7$. The approximate p value is the smallest $\alpha$ such that $t = 5 < t_\alpha$. From Table A-8 one has that the p value > 0.1. Using R (the pair with zero difference is omitted):
> xdat=c(7.4,5.5,5.6,7.9,6.5,7.8,3.5)
> ydat=c(5.5,3.2,6.0,4.5,3.0,4.8,6.3)
> wilcox.test(xdat,ydat,paired=TRUE)

        Wilcoxon signed rank test

data:  xdat and ydat
V = 23, p-value = 0.1563
alternative hypothesis: true location shift is not equal to 0
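For small samples without ties, the exact p value can also be obtained from the null distribution of the signed rank statistic, which base R exposes as psignrank. The following is a minimal sketch using t = 5 and n = 7 from the example.

```r
# Exact two-sided p value for the signed rank statistic t = 5 with n = 7 (no ties)
n <- 7
t_stat <- 5

# psignrank(q, n) gives P(T <= q) under H0;
# cap at 1 because doubling a tail probability can overshoot
p_value <- min(1, 2 * psignrank(t_stat, n))
```

This gives 20/128 = 0.15625, in agreement with the 0.1563 printed by wilcox.test.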
Wilcoxon Rank Sum Test for Two Independent Samples
A nonparametric version of the two sample t test, with $\mathrm{ARE} = 3/\pi \approx 0.955$.
This test is equivalent to the Mann-Whitney Test. With a high ARE it is very efficient.
Assume one has a random sample from one population, namely
$x_1, \ldots, x_n$,
and an independent random sample from a second population,
$y_1, \ldots, y_m$.
One wishes to test
$H_0$ : the two populations have equal medians versus $H_A$ : not $H_0$.
To obtain the test statistics, combine the two random samples and order the combined sample by increasing size. One frequently assumes the distributions, $F_X(\cdot)$ and $F_Y(\cdot)$, to be continuous to prevent the possibility of ties when ordering by size (if there are ties, average their ranks).

Example (Drug Example)
Consider two independent samples of 12 observations each; the data are listed as xdat and ydat in the R continuation of this example.
Example (Five x's and Four y's Example)
Let a random sample of $X$ be
1.2  0.8  0.39  0.7  0.1
and a random sample of $Y$ be
-0.2  1.3  0.5  0.6.
Then the combined sample is

rank    1     2    3     4    5    6    7    8    9
value   -0.2  0.1  0.39  0.5  0.6  0.7  0.8  1.2  1.3
x/y     y     x    x     y    y    x    x    x    y

Note that in general there exist $\binom{n+m}{n}$ possible rankings, and under $H_0$ they are all equally likely.
Definition
After creating the combined, ranked sample, let
$W_x^{n,m} \overset{\text{def}}{=}$ sum of the ranks of the x's and
$W_y^{n,m} \overset{\text{def}}{=}$ sum of the ranks of the y's.
The Wilcoxon statistics are $W_x^{n,m}$ and $W_y^{n,m}$.

Example (Five x's and Four y's Example, continued)
$w_x^{5,4} = 2 + 3 + 6 + 7 + 8 = 26$,
$w_y^{5,4} = 1 + 4 + 5 + 9 = 19$.
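The rank sums in this example can be reproduced with a few lines of R. The sketch assumes that the first y value is -0.2, which the displayed ranking requires (it is ranked below 0.1); the variable names are illustrative.

```r
# Five x's and four y's from the example
x <- c(1.2, 0.8, 0.39, 0.7, 0.1)
y <- c(-0.2, 1.3, 0.5, 0.6)   # assumed: first value is negative so that it ranks lowest
n <- length(x)
m <- length(y)

r <- rank(c(x, y))            # ranks in the combined sample (ties would be averaged)

w_x <- sum(r[seq_len(n)])     # sum of the ranks of the x's
w_y <- sum(r[n + seq_len(m)]) # sum of the ranks of the y's

# Number of ways to choose which n of the n + m ranks belong to the x's
n_rankings <- choose(n + m, n)
```

The two rank sums always add up to (n+m)(n+m+1)/2, here 45, so either one determines the other.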
Theorem
One has
$$\mu_{W_x^{n,m}} = \frac{n(n + m + 1)}{2} \quad\text{and}\quad \sigma_{W_x^{n,m}} = \sqrt{\frac{nm(n + m + 1)}{12}}.$$

Theorem (Wilcoxon Rank Sum Test)
To test
$H_0$ : the two populations have equal medians versus $H_A$ : not $H_0$,
let
$$Z \overset{\text{def}}{=} \frac{W_x^{n,m} - \mu_{W_x^{n,m}}}{\sigma_{W_x^{n,m}}}$$
be the test statistic. If $n, m > 10$, then $Z$ is approximately $N(0, 1)$.
Example (Drug Example, continued)
With $n = m = 12$ and observed rank sum $w_x^{12,12} = 119.5$,
$$\mu_{W_x^{12,12}} = \frac{12(12 + 12 + 1)}{2} = 150$$
$$\sigma_{W_x^{12,12}} = \sqrt{\frac{12 \cdot 12(12 + 12 + 1)}{12}} = 10\sqrt{3}$$
$$Z = \frac{119.5 - 150}{10\sqrt{3}} = -1.760918.$$
Using a two sided test,
p value $= 2P(Z \le -1.760918) = 0.078$.
Example (Drug Example continued with R)
> xdat=c(11,15, 9, 4,34,17,18,14,12,13,26,31)
> ydat=c(34,31,35,29,28,12,18,30,14,22,10,29)
> sum(rank(c(xdat,ydat))[1:12])
[1] 119.5
> wilcox.test(xdat,ydat,correct=FALSE)

        Wilcoxon rank sum test

data:  xdat and ydat
W = 41.5, p-value = 0.07786
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(xdat, ydat, correct = FALSE) :
  cannot compute exact p-value with ties
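The hand calculation for the Drug Example can be retraced in R as a rough check. The sketch below applies the mean and standard deviation formulas from the theorem directly, with no adjustment for ties, and also shows how the W reported by wilcox.test relates to the rank sum; the variable names are illustrative.

```r
xdat <- c(11, 15, 9, 4, 34, 17, 18, 14, 12, 13, 26, 31)
ydat <- c(34, 31, 35, 29, 28, 12, 18, 30, 14, 22, 10, 29)
n <- length(xdat)
m <- length(ydat)

# Rank sum of the x's in the combined sample (tied values get average ranks)
w_x <- sum(rank(c(xdat, ydat))[seq_len(n)])

mu    <- n * (n + m + 1) / 2             # mean of W_x under H0
sigma <- sqrt(n * m * (n + m + 1) / 12)  # standard deviation of W_x under H0 (no tie adjustment)
z     <- (w_x - mu) / sigma
p_value <- 2 * pnorm(-abs(z))            # two sided p value from N(0, 1)

# wilcox.test reports the rank sum shifted by its smallest possible value
W <- w_x - n * (n + 1) / 2
```

The p value from this sketch (about 0.078) differs slightly from the 0.07786 printed by wilcox.test, which, as far as I know, shrinks the variance to account for the tied observations.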
Kruskal-Wallis Test for Multiple Independent Samples
A nonparametric version of the ANOVA F test, with $\mathrm{ARE} = 3/\pi \approx 0.955$.
Kruskal-Wallis Test
This nonparametric test is for a one factor ANOVA design without any assumption of normal errors. Furthermore, there is no assumption that the one way layout is balanced.

Definition
The Kruskal-Wallis Test (also called the H test) is a nonparametric ANOVA-like test that uses the ranks of sample data from $k$ independent populations ($k \ge 2$). One tests
$H_0$ : the populations have the same median versus $H_A$ : not $H_0$.

The test is similar to the Wilcoxon Rank Sum Test in that it takes rankings into consideration. In fact, if $k = 2$, the Kruskal-Wallis test is equivalent to the Wilcoxon Rank Sum Test!
Conditions for Kruskal-Wallis Test
1 There are $k$ independent random samples, where $k \ge 2$.
2 The sample size of the $j$th sample is $n_j \ge 5$.
3 The total number of observations is $N = \sum_{j=1}^{k} n_j$.

Procedure for Finding Value of Test Statistic H
1 Combine all the samples into one big sample, sort from lowest to highest, and assign a rank to each sample value. In the case of ties, assign to each observation the average of the ranks involved.
2 For each sample, let $R_j \overset{\text{def}}{=}$ the sum of ranks for sample $j$.
Definition
The Kruskal-Wallis Test Statistic is
$$H \overset{\text{def}}{=} \frac{12}{N(N + 1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N + 1).$$

Theorem (Kruskal-Wallis)
For testing
$H_0$ : the $k$ populations have equal medians versus $H_A$ : not $H_0$,
use a right-tailed test on the test statistic $H$. The test statistic is approximately $\chi^2(k - 1)$.
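The procedure and formula for H can be sketched in R as follows. The three samples are hypothetical values chosen to contain no ties, so that H can be compared directly with the statistic reported by kruskal.test; the variable names are illustrative.

```r
# Hypothetical data: three independent samples with no tied values
samples <- list(a = c(2.1, 3.4, 1.9, 4.0, 2.8),
                b = c(3.9, 5.2, 4.4, 6.1, 5.0),
                c = c(6.3, 7.7, 5.8, 8.2, 7.0))

values <- unlist(samples)
group  <- rep(names(samples), lengths(samples))

r <- rank(values)                 # ranks in the combined sample
N <- length(values)               # total number of observations
k <- length(samples)              # number of samples

R_j <- tapply(r, group, sum)      # rank sum of each sample
n_j <- tapply(r, group, length)   # size of each sample

H <- 12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)

# Right-tailed test against the chi-square distribution with k - 1 degrees of freedom
p_value <- pchisq(H, df = k - 1, lower.tail = FALSE)

# Cross-check against the built-in test
H_builtin <- unname(kruskal.test(samples)$statistic)
```

For these made-up samples the rank sums are 16, 40 and 64, giving H = 11.52.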
Kruskal-Wallis Test with Ties
From Nonparametric Statistical Methods, by Myles Hollander, Douglas A. Wolfe and Eric Chicken, third edition, 2014, John Wiley & Sons:
$$H' \overset{\text{def}}{=} \frac{H}{1 - \sum_{j=1}^{g} (t_j^3 - t_j)/(N^3 - N)},$$
where
$g \overset{\text{def}}{=}$ number of tied groups
$t_j \overset{\text{def}}{=}$ size of tied group $j$.
One can even use this formula for groups of one, since a group of size one contributes $t_j^3 - t_j = 0$. If there are no ties, one just has $H' = H$. When there are ties it is best to use $H'$ instead of $H$ in the Kruskal-Wallis Test. Our textbook, Biostatistics for the Biological and Health Sciences, by Marc M. Triola and Mario F. Triola, does not define or use $H'$ when there are ties, but R does.
Example
The built-in R dataset airquality contains daily air quality measurements in New York, May to September 1973. One wants to test if the medians of the five monthly ozone counts are equal:
$H_0$ : the five months' daily ozone readings have equal medians versus $H_A$ : not $H_0$.
Using R:
> kruskal.test(Ozone~Month, data=airquality)

        Kruskal-Wallis rank sum test

data:  Ozone by Month
Kruskal-Wallis chi-squared = 29.267, df = 4, p-value = 6.901e-06

With such a low p value, one rejects $H_0$.
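The ozone readings contain many tied values, so this example is a natural place to see the tie correction at work. The sketch below computes H and the tie-corrected statistic (called H_prime here) by hand and compares the latter with kruskal.test. It assumes that rows with a missing Ozone value are simply dropped, which is what the formula interface does by default.

```r
# Keep only the days with an ozone reading
aq <- airquality[!is.na(airquality$Ozone), ]

r <- rank(aq$Ozone)                   # average ranks for tied readings
N <- nrow(aq)
k <- length(unique(aq$Month))

R_j <- tapply(r, aq$Month, sum)       # rank sum for each month
n_j <- tapply(r, aq$Month, length)    # number of readings in each month

# Uncorrected Kruskal-Wallis statistic
H <- 12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)

# Tie correction: t_j is the size of each group of equal readings
# (groups of size one contribute zero)
t_j <- as.numeric(table(aq$Ozone))
H_prime <- H / (1 - sum(t_j^3 - t_j) / (N^3 - N))

p_value <- pchisq(H_prime, df = k - 1, lower.tail = FALSE)

# Cross-check against the built-in test
kt <- kruskal.test(Ozone ~ Month, data = airquality)
```

Because the correction divides by a number slightly below one, H_prime is never smaller than H.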
Spearman Rank Correlation Test for Bivariate Data
A nonparametric version of the Pearson Correlation test, with an ARE of $(3/\pi)^2 \approx 0.91$.
Rank Correlation
Given a paired random sample, $(X_1, Y_1), \ldots, (X_n, Y_n)$, from a continuous joint distribution, $F_{XY}(\cdot, \cdot)$, one desires to test
$H_0$ : There is no monotonic association between the two variables versus $H_A$ : not $H_0$.
A monotonic association between $X$ and $Y$ means that there is either a positive association between $X$ and $Y$ or a negative association between $X$ and $Y$.
Rank Correlation
Suppose the paired random sample, $(x_1, y_1), \ldots, (x_n, y_n)$, has been transformed into a paired random sample of ranks, $(u_1, v_1), \ldots, (u_n, v_n)$ (if there are ties, average their ranks), where
$u_j \overset{\text{def}}{=}$ the rank order of $x_j$ among all the $x_i$'s and
$v_j \overset{\text{def}}{=}$ the rank order of $y_j$ among all the $y_i$'s.

Definition
The Spearman's Rank Correlation Coefficient, $r_s$, of a matched paired random sample, $(x_1, y_1), \ldots, (x_n, y_n)$, is the Pearson Product Moment Correlation Coefficient, $r_p$, of the matched paired random sample, $(u_1, v_1), \ldots, (u_n, v_n)$, formed from the ranks of the original matched paired random sample.

Charles Spearman (1863-1945), English psychologist
Rank Correlation
Advantages of Spearman over Pearson
1 One can use Spearman's Rank Correlation Coefficient for matched pair ordinal data.
2 Spearman's Rank Correlation Coefficient can detect monotonic nonlinear tendencies between two variables.
Disadvantages
1 Less efficient than Pearson's Product Moment Correlation Coefficient when the assumptions of the Pearson test hold.
Theorem (Spearman Rank Correlation Test)
Given a paired random sample, $(X_1, Y_1), \ldots, (X_n, Y_n)$, let
$H_0$ : There is no monotonic association between the two variables versus $H_A$ : not $H_0$.
Then let
$$r_s = \begin{cases} 1 - \dfrac{6 \sum_{j=1}^{n} d_j^2}{n(n^2 - 1)} & \text{if there are no ties} \\[3ex] \dfrac{n \sum_{j=1}^{n} x_j y_j - \big(\sum_{j=1}^{n} x_j\big)\big(\sum_{j=1}^{n} y_j\big)}{\sqrt{n \sum_{j=1}^{n} x_j^2 - \big(\sum_{j=1}^{n} x_j\big)^2}\ \sqrt{n \sum_{j=1}^{n} y_j^2 - \big(\sum_{j=1}^{n} y_j\big)^2}} & \text{if there are ties} \end{cases}$$
be the test statistic, where $d_j$ is the difference between the two ranks of pair $j$ and, in the second formula, $x_j$ and $y_j$ denote the ranks. For critical values, use Table A-9 if $n \le 30$. If $n > 30$, the critical values are
$$r_s = \frac{\pm z}{\sqrt{n - 1}},$$
where $\pm z$ are the corresponding critical values from $N(0, 1)$.
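For n > 30 the critical values come straight from the standard normal quantiles. A minimal R sketch, in which the function name spearman_critical and the sample size 50 are hypothetical illustrations:

```r
# Large-sample two-sided critical value for Spearman's r_s: z / sqrt(n - 1)
spearman_critical <- function(n, alpha = 0.05) {
  stopifnot(n > 30)                     # the normal approximation is meant for n > 30
  qnorm(1 - alpha / 2) / sqrt(n - 1)
}

# Hypothetical sample size
crit <- spearman_critical(50)

# One would reject H0 at level alpha when |r_s| exceeds crit
```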
Example (Job Salary and Stress Rankings)

Job                             Salary Rank   Stress Rank   d_j   d_j^2
Stockbroker                          2             2          0     0
Zoologist                            6             7          1     1
Electrical Engineer                  3             6          3     9
School Principal                     5             4         -1     1
Hotel Manager                        7             5         -2     4
Bank Officer                        10             8         -2     4
Occupational Safety Inspector        9             9          0     0
Home Economist                       8            10          2     4
Psychologist                         4             3         -1     1
Airline Pilot                        1             1          0     0
TOTAL                                                               24

$$r_s = 1 - \frac{6 \sum_{j=1}^{n} d_j^2}{n(n^2 - 1)} = 1 - \frac{6(24)}{10(10^2 - 1)} = 0.855.$$
Referring to Table A-9 for $n = 10$, one sees that $\pm 0.648$ corresponds to a significance level of 0.05. Since $0.855 > 0.648$, one rejects $H_0$ (no monotonic association between stress and salary) with a significance level of 0.05.
Example (Job Salary and Stress Rankings, using R)
> xdat=c(2,6,3,5,7,10,9,8,4,1)
> ydat=c(2,7,6,4,5,8,9,10,3,1)
> cor.test(xdat,ydat,method="spearman")

        Spearman's rank correlation rho

data:  xdat and ydat
S = 24, p-value = 0.003505
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8545455
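The agreement between the shortcut formula and the definition of $r_s$ as a Pearson correlation of ranks can be checked directly. The sketch below uses the salary and stress ranks from this example, which contain no ties; the variable names are illustrative.

```r
salary <- c(2, 6, 3, 5, 7, 10, 9, 8, 4, 1)
stress <- c(2, 7, 6, 4, 5, 8, 9, 10, 3, 1)
n <- length(salary)

# Differences between the two ranks of each pair (the data are already ranks)
d <- rank(stress) - rank(salary)
sum_d2 <- sum(d^2)

# Shortcut formula, valid when there are no ties
rs_formula <- 1 - 6 * sum_d2 / (n * (n^2 - 1))

# Definition: Pearson correlation of the ranks
rs_pearson <- cor(rank(salary), rank(stress))
```

Note that sum_d2 equals the S = 24 reported by cor.test.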
Example (Air Quality)
> cor(airquality$Wind,airquality$Temp)
[1] -0.4579879
> cor(airquality$Wind,airquality$Temp,method="spearman")
[1] -0.4465408
> cor.test(airquality$Wind,airquality$Temp,method="spearman")

        Spearman's rank correlation rho

data:  airquality$Wind and airquality$Temp
S = 863450, p-value = 7.229e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho
-0.4465408

Warning message:
In cor.test.default(airquality$Wind, airquality$Temp, method = "spearman") :
  Cannot compute exact p-value with ties

With such a small p value, one rejects the hypothesis that there is no monotonic association between temperature and wind.
Chapter #12 R Assignment
1 Compute the p-value for a two sided sign test for the null hypothesis that the population median for x is 6.5. The alternative hypothesis is that the median is not 6.5.
x = c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)

2 In order to investigate whether adults report verbally presented material more accurately from their right ear than from their left ear, a dichotic listening task was carried out.
LEar=c(15,29,10,31,27,24,26,29,30,32,20,5)
REar=c(32,30,8,32,20,32,27,30,32,32,30,32)
Use the Wilcoxon Signed Rank Test for Matched Pairs to find the p value of
$H_0$ : the left and right ear data have the same median, versus $H_A$ : not $H_0$.

3 Consider the following sets of data on the latent heat of the fusion of ice (cal/gm). Enter the following three lines into R:
AA=c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
80.05, 80.03, 80.02, 80.00, 80.02)
BB=c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)
Using the Wilcoxon Rank Sum Test, test whether the populations that AA and BB were sampled from have equal medians. Give the p value of the test.
4 Using the built-in R dataset iris, do a Kruskal-Wallis Test of Sepal.Width according to Species.
5 Enter in R the data from the following frigatebird example (paste the following block into R):
Input = ("
 Volume Pitch
 1760 529
 2040 566
 2440 473
 2550 461
 2730 465
 2740 532
 3010 484
 3080 527
 3370 488
 3740 485
 4910 478
 5090 434
 5090 468
 5380 449
 5850 425
 6730 389
 6990 421
 7960 416
")
Data = read.table(textConnection(Input),header=TRUE)
Using the Spearman Rank Correlation test, find the p value of
$H_0$ : there isn't a monotonic association between Volume and Pitch versus $H_A$ : not $H_0$.