Carapace Example: Confidence Intervals in the Wilcoxon Sign-rank Test


Math 3080. Treibergs. April 19, 2014.

In this discussion, we look at confidence intervals in the Wilcoxon Sign-Rank Test for one sample. The data come from an article by Jeffries, Voris and Yang, "Diversity and Distribution of the Pedunculate Barnacles... on the Scillarid Lobster," Crustaceana, 1984, as quoted by Mendenhall, Beaver & Beaver, Introduction to Probability and Statistics, 14th ed., Brooks Cole. The carapace lengths (in mm) of randomly selected lobsters caught in the seas near Singapore are simulated consistent with the given data.

We assume that the sample $X_1, \dots, X_N$ comes from a continuous distribution that is symmetric about its mean $\mu$. The Wilcoxon signed-rank statistic is computed as follows. To test

$$H_0 : \mu = \mu_0; \qquad H_a : \mu \neq \mu_0,$$

we order the values $|X_i - \mu_0|$ from lowest to highest. After throwing out any zero values, we have $n$ nonzero terms. We rank them 1 to $n$. $S_+$ is the sum of the ranks corresponding to terms for which $X_i - \mu_0$ is positive. The null hypothesis is rejected at the significance level $\alpha$ if

$$S_+ \ge c \quad\text{or}\quad S_+ \le \frac{n(n+1)}{2} - c, \qquad\text{where } 2P(S_+ \ge c) = \alpha.$$

Theorem 1. Fix $\mu_0$. Assume that there are no ties and no zeros among the values $|X_i - \mu_0|$ for $i = 1, \dots, n$. Then $S_+$ is equal to the number of pairs $(i, j)$ such that $i \le j$ and $X_i + X_j \ge 2\mu_0$.

Proof. Observe that the number of such pairs is unchanged if the observations are rearranged. Thus we may suppose that the observations are arranged from smallest to largest, $X_{(1)} < \dots < X_{(n)}$. Let $r_i$ denote the rank of $|X_{(i)} - \mu_0|$. Let $k$ be the index where the $X_{(i)} - \mu_0$ change sign, that is,

$$X_{(k)} - \mu_0 < 0 < X_{(k+1)} - \mu_0.$$

Then the ranks decrease up to $k$ and increase from $k + 1$:

$$r_1 > r_2 > \dots > r_k, \qquad r_{k+1} < r_{k+2} < \dots < r_n.$$
For example, if the sample is 12.2, 10.0, 11.1, 17.7, 14.4, then $X_{(1)} = 10.0$, $X_{(2)} = 11.1$, $X_{(3)} = 12.2$, $X_{(4)} = 14.4$ and $X_{(5)} = 17.7$. If $\mu_0 = 13.9$, then $X_{(1)} - \mu_0 = -3.9$, $X_{(2)} - \mu_0 = -2.8$, $X_{(3)} - \mu_0 = -1.7$, $X_{(4)} - \mu_0 = 0.5$ and $X_{(5)} - \mu_0 = 3.8$. Thus $r_1 = 5$, $r_2 = 3$, $r_3 = 2$, $r_4 = 1$ and $r_5 = 4$, and finally $k = 3$ and $S_+ = r_{k+1} + \dots + r_n = 1 + 4 = 5$.

Let $\chi\{P\}$ be the indicator function, which equals 1 if $P$ is true and 0 otherwise. Then the number of pairs satisfying the condition is

$$\begin{aligned}
\sum_{i \le j} \chi\{X_i + X_j \ge 2\mu_0\}
&= \sum_{i \le j} \chi\{X_{(i)} + X_{(j)} \ge 2\mu_0\} && (1)\\
&= \sum_{j=k+1}^{n} \sum_{i=1}^{j} \chi\{X_{(j)} - \mu_0 \ge \mu_0 - X_{(i)}\} && (2)\\
&= \sum_{j=k+1}^{n} \sum_{i=1}^{n} \chi\{r_i \le r_j\} && (3)\\
&= \sum_{j=k+1}^{n} r_j && (4)\\
&= S_+ && (5)
\end{aligned}$$
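Theorem 1 can be checked numerically on this example. The following is a minimal sketch in R, assuming the ordered values $X_{(1)}, \dots, X_{(5)}$ and $\mu_0 = 13.9$ quoted above; the variable names are illustrative and do not come from the original session.

```r
# Ordered values X_(1), ..., X_(5) and mu_0 from the worked example
x   <- c(10.0, 11.1, 12.2, 14.4, 17.7)
mu0 <- 13.9

# S+ from the definition: rank |x - mu0|, then add the ranks of the
# observations whose difference from mu0 is positive
d      <- x - mu0
r      <- rank(abs(d))
s_plus <- sum(r[d > 0])

# S+ from Theorem 1: count the pairs i <= j with x[i] + x[j] >= 2 * mu0
pair_sums <- outer(x, x, "+")
n_pairs   <- sum(pair_sums[upper.tri(pair_sums, diag = TRUE)] >= 2 * mu0)

c(s_plus, n_pairs)   # both equal 5
```

The five qualifying pairs are $(X_{(4)}, X_{(4)})$, $(X_{(5)}, X_{(5)})$, $(X_{(4)}, X_{(5)})$, $(X_{(3)}, X_{(5)})$ and $(X_{(2)}, X_{(5)})$.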

(1) holds because the number of pairs does not depend on the order of the variables. (2) holds because if $i \le j \le k$ then $X_{(i)} - \mu_0 < X_{(j)} - \mu_0 < 0$, so that $X_{(i)} + X_{(j)} < 2\mu_0$ and the pair does not add to the sum.

For (3), fix $j \ge k + 1$. If $k + 1 \le i$, then $X_{(i)} + X_{(j)} > 2\mu_0$ automatically, and $i \le j$ if and only if $r_i \le r_j$, while $j < i$ if and only if $r_j < r_i$. On the other hand, if $i \le k$, then $i \le j$ automatically, and $r_i \le r_j$ if and only if $\mu_0 - X_{(i)} = |X_{(i)} - \mu_0| \le |X_{(j)} - \mu_0| = X_{(j)} - \mu_0$, so $X_{(i)} + X_{(j)} \ge 2\mu_0$. Similarly, $r_i > r_j$ if and only if $\mu_0 - X_{(i)} = |X_{(i)} - \mu_0| > |X_{(j)} - \mu_0| = X_{(j)} - \mu_0$, so $X_{(i)} + X_{(j)} < 2\mu_0$. It means that for a given $j \ge k + 1$ and any $i$, we have $i \le j$ and $X_{(i)} + X_{(j)} \ge 2\mu_0$ if and only if $r_i \le r_j$, so we may replace the sum as in (3). Now there are exactly $r_j$ indices $i$ such that $r_i \le r_j$, giving (4). But (5) is the definition of $S_+$.

The confidence interval associated to the Sign-Rank test of $H_0$ above is the following. We consider the set of values

$$A = \left\{ \frac{X_i + X_j}{2} : 1 \le i \le j \le n \right\}.$$

If there are no ties, then $S_+$ is the number of elements of $A$ that are not smaller than $\mu_0$. Sort $A$ into $A_{(1)} \le A_{(2)} \le \dots \le A_{(m)}$, where $m = \frac{n(n+1)}{2}$.

The one- and two-sided critical values for $S_+$ may be computed as follows. Assuming $H_0$, the sign of the $i$th difference $X_i - \mu_0$ is equally likely to be positive or negative. Thus the distribution of $S_+$ is obtained from the $2^n$ sign patterns $\pm 1 \pm 2 \pm 3 \pm \dots \pm n$, each of which has an equal chance $2^{-n}$ of occurring: $S_+$ is the sum of the ranks that carry a plus sign. Its values range from 0 to $m$. The pmf $p(x)$ of $S_+$ is symmetric about the mean $\mu_{S_+} = \frac{n(n+1)}{4}$. The one-sided upper critical value $c_1$ satisfies

$$P(S_+ \ge c_1) = \sum_{i \ge c_1} p(i) = \alpha.$$

The two-sided critical value $c$ satisfies

$$P(S_+ \ge c) = P(S_+ \le m - c) = \frac{\alpha}{2}.$$

These numbers are tabulated in Tables A13 and A15 of the text. We show how to compute the values of these tables in the code, albeit inefficiently.

Then the two-sided confidence interval for $\mu$ with confidence level $1 - \alpha$, where $c$ is the two-sided critical value that satisfies $P(S_+ \ge c) = \frac{\alpha}{2}$, is given by

$$\left( A_{(m+1-c)},\; A_{(c)} \right).$$
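The null distribution described above can also be built without listing all $2^n$ sign patterns. The sketch below is one possible approach and not the method of the original session: it adds the ranks one at a time, each rank being either included in $S_+$ or not. The helper name `signrank_pmf` is hypothetical; `dsignrank` is the signed-rank pmf from base R's stats package and is used only as a cross-check.

```r
# Null pmf of S+ for n untied, nonzero differences.
# counts[s + 1] holds the number of sign patterns with S+ = s.
signrank_pmf <- function(n) {
  m <- n * (n + 1) / 2
  counts <- c(1, rep(0, m))            # before any rank is added, S+ = 0
  for (k in 1:n) {
    # rank k either keeps a minus sign (counts unchanged)
    # or gets a plus sign (counts shifted right by k)
    counts <- counts + c(rep(0, k), counts[1:(m + 1 - k)])
  }
  counts / 2^n                         # each pattern has probability 2^(-n)
}

p5 <- signrank_pmf(5)                  # pmf on 0, 1, ..., 15
sum(p5)                                # total probability is 1
sum(0:15 * p5)                         # mean is n(n+1)/4 = 7.5
all.equal(p5, rev(p5))                 # symmetric about the mean
all.equal(p5, dsignrank(0:15, 5))      # agrees with base R
```

The loop does on the order of $n \cdot m$ additions, so it stays cheap for $n = 18$, where direct enumeration needs $2^{18}$ columns.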

Data Set Used in this Analysis:

# Math 3080 Carapace Data April 19, 2014
# Treibergs
#
# From an article by Jeffries, Voris and Yang, "Diversity and Distribution
# of the Pedunculate Barnacles on the Scillarid Lobster," Crustaceana,
# 1984 as quoted by Mendenhall, Beaver & Beaver, Introduction to
# Probability and Statistics, 14th ed., Brooks Cole,
#
# The carapace lengths (in mm) of randomly selected lobsters caught in
# the seas near Singapore are simulated consistent with the given data

R Session:

R version ( )
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type license() or licence() for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type contributors() for more information and
citation() on how to cite R or R packages in publications.

Type demo() for some demos, help() for on-line help, or
help.start() for an HTML browser interface to help.

Type q() to quit R.

[R.app GUI 1.41 (5874) i386-apple-darwin9.8.0]

[History restored from /Users/andrejstreibergs/.Rapp.history]

> tt=read.table("m3082datacarapace.txt",header=T)
> attach(tt)
> tt
> t2=read.table("m3082datacarapace.txt",header=T)
> attach(t2)
> table()

> ############### CRITICAL VALUES FOR SIGNED-RANK TEST #################
> ############### ILLUSTRATE DETAILS FOR N=5 ###########################
> di=5
> #### MATRIX WHOSE COLS ARE ALL POSSIBLE POSITIVE RANK SUMS ############
> M=matrix(1:(di*2^di),ncol=2^di)
> for(i in 1:di){M[i,]=rep(i*0:1,each=2^(i-1),length=2^di)}
> M
> ################ PDF OF S+ ########################################
> p=table(margin.table(M,2))/2^di;sum(p)
[1] 1
> p
> ############# WORK OUT TAIL PROBABILITIES ##########################
> cump=cumsum(p)
> np=length(p);np;di*(di+1)/2+1
[1] 16
[1] 16
> names(cump)=np-1:np
> #### N=5 ONE-TAILED CRITICAL FOR SIGNED-RANK TEST (TABLE A13) ####
> #### c1 AND ALPHA TABLE #########################################
> cump
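The N=5 enumeration can be sketched in a self-contained way as follows. This is an illustrative alternative to the matrix loop in the session, assuming only base R; the names `signs`, `s_plus`, `counts` and `upper_tail` are hypothetical. It lists every 0/1 pattern for "rank i carries a plus sign", sums the selected ranks, and tabulates the upper tail probabilities $P(S_+ \ge c_1)$ in the spirit of the one-tailed table.

```r
n <- 5
m <- n * (n + 1) / 2

# All 2^n patterns: entry 1 means the rank carries a plus sign
signs  <- as.matrix(expand.grid(rep(list(0:1), n)))
s_plus <- as.vector(signs %*% (1:n))        # S+ for each pattern

# Number of patterns giving each value 0, 1, ..., m
counts <- tabulate(s_plus + 1, nbins = m + 1)

# Upper tail P(S+ >= c1), labelled by c1
upper_tail <- rev(cumsum(rev(counts))) / 2^n
names(upper_tail) <- 0:m

upper_tail[c("15", "14", "13")]             # 1/32, 2/32, 3/32
```

With $n = 5$ the smallest attainable one-sided level is $1/32 = 0.03125$, reached at $c_1 = 15$.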

> cum2p=1-2*cumsum(p)
> names(cum2p)=np-1:np
> #### N=5 TWO-TAILED CRITICAL FOR SIGNED-RANK TEST (TABLE A15) ####
> #### c AND ALPHA TABLE ##########################################
> cum2p[1:(np/2)]
>
> #### COMPUTE TWO-SIDED CRITICAL VALUES FOR CARAPACE DATA ###########
> di=length(); di
[1] 18
> M=matrix(1:(di*2^di),ncol=2^di)
> for(i in 1:di){M[i,]=rep(i*0:1,each=2^(i-1),length=2^di)}
> p=table(margin.table(M,2))/2^di;sum(p)
[1] 1
> cum2p=1-2*cumsum(p)
> np=length(p);np;di*(di+1)/2+1
[1] 172
[1] 172
> names(cum2p)=np-1:np
> #### N=18 TWO-TAILED CRITICAL FOR SIGNED-RANK TEST (TABLE A15) ###
> #### c AND ALPHA TABLE ##########################################
> cum2p[1:(np/2)]
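For $n = 18$ the two-sided critical value can also be read off the exact distribution functions that ship with R, without building the $18 \times 2^{18}$ matrix. The sketch below assumes the rule "smallest $c$ with $P(S_+ \ge c) \le \alpha/2$"; that rule is an assumption here, and a printed table that picks the tail probability closest to the nominal level may list a neighbouring value. The helper name `two_sided_crit` is hypothetical; `psignrank` and `qsignrank` are base R functions.

```r
# Smallest c with P(S+ >= c) <= alpha / 2 under H0 (assumed rule)
two_sided_crit <- function(n, alpha) {
  m  <- n * (n + 1) / 2
  cs <- 0:m
  # P(S+ >= c) = P(S+ > c - 1)
  upper_tail <- psignrank(cs - 1, n, lower.tail = FALSE)
  min(cs[upper_tail <= alpha / 2])
}

c05 <- two_sided_crit(18, 0.05)
c05                                   # 131, the value used for the 95% interval

# Same value via the quantile function and the symmetry of S+
18 * 19 / 2 - qsignrank(0.025, 18) + 1
```

The second expression uses the symmetry $P(S_+ \ge c) = P(S_+ \le m - c)$, so the upper critical value follows from a lower quantile.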

> ######### GENERATE ALL (X[I]+X[J])/2 FOR I \le J #####################
> n=length();n
[1] 18
> xPx=1:(n*(n+1)/2)
> for(j in 1:n){for(i in 1:j){xPx[j*(j-1)/2+i]=([i]+[j])/2}}
> sxpx=sort(xPx)
> ############## SORTED VALUES (X[I]+X[J])/2 FOR I \le J ##############
> sxpx
> ############## TWO-SIDED CI ON MU FOR CARAPACE DATA ##################
> ############## ALPHA =.05 #############################################
> c05=131;n1=di*(di+1)/2-c05+1;c(sxpx[n1],sxpx[c05])
[1]
> ############## CANNED CI FOR MU IN SIGNED-RANK TEST #################
> wilcox.test(,conf.int=T)

        Wilcoxon signed rank test

data:
V = 171, p-value = 7.629e-06
alternative hypothesis: true location is not equal to 0
95 percent confidence interval:

sample estimates:
(pseudo)median
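The interval construction can be sketched end to end on stand-in data. The values below are simulated and hypothetical; they are not the carapace measurements, which are not reproduced here. The sketch forms the pairwise averages with `outer()` instead of a double loop, picks the critical value by the assumed rule "smallest $c$ with $P(S_+ \ge c) \le \alpha/2$", and compares the result with `wilcox.test`.

```r
set.seed(1)
x <- rnorm(18, mean = 60, sd = 8)     # hypothetical sample, NOT the carapace data
n <- length(x)
m <- n * (n + 1) / 2
alpha <- 0.05

# Sorted pairwise averages (X[i] + X[j]) / 2 for i <= j
w <- outer(x, x, "+") / 2
w <- sort(w[upper.tri(w, diag = TRUE)])

# Two-sided critical value from the exact null distribution
cs     <- 0:m
c_crit <- min(cs[psignrank(cs - 1, n, lower.tail = FALSE) <= alpha / 2])

# Interval (A_(m+1-c), A_(c))
ci <- c(w[m + 1 - c_crit], w[c_crit])
ci

# Canned interval for comparison
as.numeric(wilcox.test(x, conf.int = TRUE)$conf.int)
```

With 18 untied observations `wilcox.test` uses the exact distribution, so the two intervals should coincide.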

> ############## ALPHA =.01 #############################################
> c01=143;n1=di*(di+1)/2-c01+1;c(sxpx[n1],sxpx[c01])
[1]
> wilcox.test(,conf.int=T,conf.level=.99)

        Wilcoxon signed rank test

data:
V = 171, p-value = 7.629e-06
alternative hypothesis: true location is not equal to 0
99 percent confidence interval:

sample estimates:
(pseudo)median
           66.1

> ### CANNED CI DOES NOT USE C=143 OF TABLE A15 FOR 99.0 % TEST #####
> c01=144;n1=di*(di+1)/2-c01+1;c(sxpx[n1],sxpx[c01])
[1]
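One way to see why the two choices of $c$ differ is to compute the exact coverage $1 - 2P(S_+ \ge c)$ that each one implies for $n = 18$. This is a hedged sketch using base R's `psignrank`; the helper name `conf_level` is hypothetical, and the reading that the canned routine insists on coverage of at least the requested level is an interpretation, not something stated in the session.

```r
n <- 18

# Exact coverage of the interval (A_(m+1-c), A_(c)) for a given c
conf_level <- function(c_val) {
  1 - 2 * psignrank(c_val - 1, n, lower.tail = FALSE)
}

lev <- conf_level(c(143, 144))
names(lev) <- c("c=143", "c=144")
lev
lev >= 0.99    # which choice guarantees at least 99% coverage
```

Under this reading, $c = 143$ gives coverage just under 99% and $c = 144$ gives coverage slightly above it, which would account for the canned interval matching $c = 144$.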

[Figure: histogram of the data; vertical axis: Frequency.]
