16. Nonparametric Methods. Analysis of ordinal data

Norah Patrick
6 years ago

1 16. Nonparametric Methods: Analysis of ordinal data

2 Data:
  Non-interval data: nominal data, ordinal data
  Interval data that are not normally distributed
Nonparametric tests (parametric counterpart in parentheses):
  Two dependent samples (paired t-test):
    1. Sign test: only the information of + or − is used. It can also be used to test the median of a population.
    2. Wilcoxon signed-rank test: uses the ranks of the data.
  Two independent samples (t/z test): Wilcoxon rank-sum test
  More than 2 independent samples (ANOVA F-test): Kruskal-Wallis test
  Correlation (r): Spearman's coefficient of rank correlation r_s; test of the significance of r_s

3 Sign test
Data: binary, + or −
Problem: H0: P(+) = P(−) = 0.5
In contrast, if + is preferred, H1: P(+) > P(−)
Or, if − is preferred, H1: P(+) < P(−)

4 Sign test, Application 1
1. Is there a change after an intervention?
Before/after experiments: two dependent samples
Define increase = +; decrease = −
If there is no change after the event, P(+) = P(−) = 1/2, so H0: P(+) = P(−) = 0.5
In contrast, if there is an increase, P(+) > 1/2 > P(−).
Data: X = + if after > before; X = − if after < before
Recall: for continuous data, the paired t-test is used under the normal population assumption.

5 Sign test, Application 2
2. Is A more popular than B? (Multiple choice: is A or B preferred?)
Single population: one sample of data
Define prefer A = +, prefer B = −
No difference: H0: P(prefer A) = P(prefer B) = 0.5, i.e. H0: P(+) = P(−)
In contrast, if A is preferred, H1: P(+) > P(−)

6 Sign test, Application 3
3. Is the population median equal to m0?
Single population: one sample of data
Define X < m0 as −; X > m0 as +
Is P(X > m0) = P(X < m0) = 0.5?
Example: is the median equal to 80?
Recall: for continuous data, Ch. 9~10 focus only on the population mean.

7 Exact Sign Test
Let P(+) = π
Step 1. Null hypothesis: H0: π = 0.5
Step 2. Significance level α
Step 3. Test statistic: B = number of +

8 Step 4. Rejection region:
Direction: consistent with H1
  H1: π > 0.5: B ≥ c (large B rejects H0)
  H1: π < 0.5: B ≤ c (small B rejects H0)
  H1: π ≠ 0.5: B ≥ c2 or B ≤ c1 (large or small B rejects H0)
Critical value(s): c? c1? c2? What is the distribution of B? Which table should be checked?

9 Step 4. Rejection region: critical value(s)
Tie = no change, or x = m0
Let n' = number of untied observations = (total +) + (total −)
Then under the null hypothesis H0: π = 0.5, B ~ Binomial(n', 0.5); check the Binomial(n', 0.5) table.

10 Under H0, B ~ Binomial(n', π = 0.5), and the critical value is chosen such that the type I error rate ≤ α.
For example, if H1: π > 0.5, H0 is rejected if B is too large.
Type I error rate = tail probability ≤ α = 0.1
Thus, the rejection region is {B ≥ 10}.

11 Step 4. Rejection region, critical value:
  If H1: π > 0.5, find c in the Binomial(n', 0.5) table such that P(B ≥ c) ≤ α
  If H1: π < 0.5, find c in the Binomial(n', 0.5) table such that P(B ≤ c) ≤ α
  If H1: π ≠ 0.5, find c1, c2 in the Binomial(n', 0.5) table such that P(B ≤ c1) ≤ α/2 and P(B ≥ c2) ≤ α/2
Step 5. Draw conclusion
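The table lookup for an upper-tailed exact sign test can be sketched in a few lines of Python. This is only an illustration: the function names `upper_tail` and `upper_critical_value` are made up here, and the inputs (14 untied observations, significance level 0.1) are the values used in the Samuelson Chemicals example.

```python
from math import comb

def upper_tail(n, c):
    """P(B >= c) when B ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(c, n + 1)) / 2 ** n

def upper_critical_value(n, alpha):
    """Smallest c with P(B >= c) <= alpha (None if no such c exists)."""
    for c in range(n + 1):
        if upper_tail(n, c) <= alpha:
            return c
    return None

# Assumed inputs: 14 untied pairs, significance level 0.1
c = upper_critical_value(14, 0.10)
print(c, round(upper_tail(14, c), 4))  # 10 0.0898
```

The lower-tailed case is symmetric under H0, because Binomial(n, 0.5) is symmetric about n/2.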

12 Example. P548 Samuelson Chemicals.
Intervention: a computer training program
Sample: n = 15 managers were selected at random
Experiment: competence and understanding with respect to computers, before/after the intervention, for these managers
Ordinal outcome: {poor, fair, good, excellent, outstanding}
Q: Are the managers more competent after the program?
Define the sign of the difference: + = improved; − = worse. Table 16-1
Q: P(improved) = P(+) = π > 0.5?
Note: for a continuous outcome, one may use a paired t-test.

13 [Table 16-1, with a tie marked]

14 Step 1. H0: π = 0.5, H1: π > 0.5
Step 2. α = 0.1
Step 3. Test statistic: B = number of + in the sample
Step 4. Rejection region: the 5th pair of observations is a tie. Thus, n' = n − 1 = 15 − 1 = 14
  Direction: B ≥ c
  Critical value: find c such that P(B ≥ c) ≤ α = 0.1 (Table 16-2)

15 Table: Bin(14, 0.5)
Exact critical value: c = 10
Exact rejection region: H0 is rejected at α = 0.1 if B ≥ 10

16 [Chart: Bin(14, 0.5) distribution under H0, with the type I error rate]

17 Step 5. There are 11 + signs among the 14 managers, so B = 11 ≥ 10 and H0 is rejected at α = 0.1.
Conclusion: the computer course was effective.
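As a cross-check on the table-based decision, the exact one-sided p-value for B = 11 out of 14 untied pairs can be computed directly from the Binomial(14, 0.5) distribution. The helper name `sign_test_p_upper` is an illustrative choice, not something defined in the slides.

```python
from math import comb

def sign_test_p_upper(b, n):
    """Exact upper-tail p-value P(B >= b) for B ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n

# 11 improvements among the 14 untied managers
p = sign_test_p_upper(11, 14)
print(round(p, 4))  # 0.0287
```

The p-value is below 0.1, which agrees with rejecting H0 through the rejection region B ≥ 10.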

18 Normal Approximation
Under H0, B ~ Binomial(n', 0.5), and B is approximately N(0.5n', n' × 0.5 × (1 − 0.5)).
Validity: n'π0 = 0.5n' ≥ 5, or equivalently, n' ≥ 10
Continuity correction: ±0.5
Step 3. Test statistic:
$$Z = \begin{cases} \dfrac{(B - 0.5) - 0.5n'}{0.5\sqrt{n'}}, & B > 0.5n' \\[6pt] \dfrac{(B + 0.5) - 0.5n'}{0.5\sqrt{n'}}, & B < 0.5n' \end{cases}$$

19 Normal Approximation
Step 4. Rejection region: Z ~ N(0,1); check the N(0,1) table.
  If H1: π > 0.5, reject H0 if Z ≥ Z_α
  If H1: π < 0.5, reject H0 if Z ≤ −Z_α
  If H1: π ≠ 0.5, reject H0 if Z ≤ −Z_{α/2} or Z ≥ Z_{α/2}

20 Example. P553 Cola, Inc.
Two versions of the drink are considered: A and B
Experiment: a sample of 64 consumers was selected. Each tastes both A and B and indicates a preference.
Data: 42 preferred A. Which one is preferred?
Let A = +, B = −

21 Step 1. Hypotheses
H0: no difference in consumer preference
Or, letting π = P(A) = P(+): H0: π = 0.5, H1: π ≠ 0.5
Step 2. α = 0.05
Step 3. Test statistic: B = number of + (A) in the sample
No ties, so n' = n = 64. Since n' = 64 > 10, the normal approximation Z-test is used.
$$Z = \begin{cases} \dfrac{(B - 0.5) - 0.5n'}{0.5\sqrt{n'}}, & B > 0.5n' \\[6pt] \dfrac{(B + 0.5) - 0.5n'}{0.5\sqrt{n'}}, & B < 0.5n' \end{cases}$$

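22 Step 4. Rejection region: Z_{0.025} = 1.96; H0 is rejected if Z ≥ 1.96 or Z ≤ −1.96
Step 5. Since B = 42 > 0.5(64) = 32,
$$Z = \frac{(B - 0.5) - 0.5n'}{0.5\sqrt{n'}} = \frac{(42 - 0.5) - 0.5(64)}{0.5\sqrt{64}} = 2.38 > 1.96$$
Reject H0 at level 0.05.
Note that: p-value = 2P(Z > 2.38) = 2(0.0087) = 0.0174 < 0.05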

23 Example. The Customer Research Department of Superior Grocers found that the median weekly amount spent on groceries by young couples was $123. The CEO would like to repeat the research to determine whether the median has changed.
Data: in a random sample of n = 102 young couples, 60 spent more than $123, 40 spent less, and 2 spent exactly $123.
At the 0.10 significance level, is it reasonable to conclude that the median has changed?
Let >123 = +, <123 = −

24 Step 1. Hypotheses
H0: median = 123, H1: median ≠ 123
Let π = P(spent > 123) = P(+); then H0: π = 0.5, H1: π ≠ 0.5
Step 2. α = 0.1
Step 3. Test statistic (sign test): B = number of + (>123) in the sample
Ignore the 2 ties (exactly x = 123). Since n' = 100 > 10, the normal approximation Z-test is used.
$$Z = \begin{cases} \dfrac{(B - 0.5) - 0.5n'}{0.5\sqrt{n'}}, & B > 0.5n' \\[6pt] \dfrac{(B + 0.5) - 0.5n'}{0.5\sqrt{n'}}, & B < 0.5n' \end{cases}$$

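25 Step 4. Rejection region: Z_{0.05} = 1.65; H0 is rejected if Z ≥ 1.65 or Z ≤ −1.65
Step 5. Since B = 60 > 0.5(100) = 50,
$$Z = \frac{(B - 0.5) - 0.5n'}{0.5\sqrt{n'}} = \frac{(60 - 0.5) - 0.5(100)}{0.5\sqrt{100}} = 1.9 > 1.65$$
Reject H0 at level 0.10.
Note that: p-value = 2P(Z > 1.9) = 2(0.0287) = 0.0574 < 0.10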
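The continuity-corrected Z statistic used in the Cola, Inc. and Superior Grocers examples can be reproduced with a short Python sketch. The function names are illustrative, the normal tail is computed with `math.erfc` instead of a table, and returning 0 when B equals exactly half of the untied count is an assumption, since the slides do not define that case.

```python
from math import sqrt, erfc

def sign_test_z(b, n):
    """Continuity-corrected Z for the sign test; n = number of untied observations."""
    if b > 0.5 * n:
        return (b - 0.5 - 0.5 * n) / (0.5 * sqrt(n))
    if b < 0.5 * n:
        return (b + 0.5 - 0.5 * n) / (0.5 * sqrt(n))
    return 0.0  # assumption: B exactly at its null mean is not covered by the slides

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

# (B, untied n) taken from the two worked examples
for name, b, n in [("Cola, Inc.", 42, 64), ("Superior Grocers", 60, 100)]:
    z = sign_test_z(b, n)
    p_two_sided = 2 * normal_upper_tail(abs(z))
    print(name, round(z, 3), round(p_two_sided, 4))
```

The computed p-values can differ slightly in the last digits from the table-based ones, because the table is entered with z rounded to two decimals.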

26 Wilcoxon signed-rank test
In comparing two dependent samples, the sign test uses only the information of progression (+) or regression (−). The information about magnitudes is lost.
Example. Data: A: 0 → 1 (+), B: 0 → 1 (+), C: 0 → −20 (−), D: 0 → −20 (−)
There are 2 +'s and 2 −'s, so the sign test gives an insignificant result. However, the decrements are much more severe than the increments.
To incorporate the information about magnitudes, one may consider a Wilcoxon signed-rank test.

27 Wilcoxon signed-rank test
Problem: comparing two populations
Data: two dependent samples (before/after, matched pairs)
Measurement: ordinal or continuous
Nonparametric tests: sign test, Wilcoxon signed-rank test
Sign test vs. Wilcoxon signed-rank test:
  Sign test: categorizes the data into binary outcomes, either + or −.
  Wilcoxon signed-rank test: makes full use of the ranks of the data.

28 Wilcoxon signed-rank test statistic: the T statistic
1. Compute the difference (DIFF) between each pair.
2. If DIFF = 0, this is a tie. Drop ties from the sample.
3. Classify the data into two groups: Group +: DIFF > 0; Group −: DIFF < 0
4. Calculate the absolute DIFF, |DIFF|, and pool the two groups.
5. Order the |DIFF|'s from smallest to largest and assign a rank to each.
6. R+ = rank sum for group +
7. R− = rank sum for group −
8. The test statistic T = the smaller of R+ and R−
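The eight steps above can be sketched in Python as follows. The names `average_ranks` and `signed_rank_T` are illustrative, tied absolute differences share the average of their ranks, and the input reuses the four-subject illustration from the earlier slide, read here as before = 0 and after = 1, 1, −20, −20 (an interpretation of the extracted text).

```python
def average_ranks(values):
    """Rank of each value (1 = smallest); tied values share the average rank."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def signed_rank_T(before, after):
    """Return (R_plus, R_minus, T) for paired samples."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # steps 1-2: drop ties
    ranks = average_ranks([abs(d) for d in diffs])            # steps 4-5
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)    # step 6
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)   # step 7
    return r_plus, r_minus, min(r_plus, r_minus)              # step 8

# Assumed reading of the four-subject illustration: A, B rise by 1; C, D fall by 20
print(signed_rank_T([0, 0, 0, 0], [1, 1, -20, -20]))  # (3.0, 7.0, 3.0)
```

Unlike the sign test, which sees two + and two − signs, the rank sums 3 and 7 reflect that the decreases are larger than the increases.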

29 Rationale for using T
H0: No difference between the two populations
Recall for the sign test: under H0, P(+) = P(−) = 0.5, so the proportions of the two groups are about the same.

30 Moreover, H0: No difference between the two populations
For the Wilcoxon signed-rank test:
If H0 is true, the rank sums should be about the same, R+ ≈ R−.
  The total rank sum is S = 1 + 2 + ... + n = n(n+1)/2.
  R+ and R− are expected to be about equal and close to S/2.
If H0 is false, one of R+ and R− is extremely large, so T = min(R+, R−) should be sufficiently small.

31 Wilcoxon signed-rank test
Step 1. Hypotheses. H0: No difference between the two populations
Step 2. Significance level α
Step 3. Wilcoxon signed-rank test statistic, T
Step 4. Decision rule: H0 is rejected if T is too small
  Critical value(s): Appendix H. The α row is used for one-tailed tests and the 2α row for two-tailed tests.
Step 5. Draw conclusion

32 Example. P557 Mr. Frick, the owner of a family restaurant, developed a new recipe. He wants to conduct some tests to find out whether the new recipe is better.
Data: a random sample of n = 15 customers. Each one rates the new and old flavors on a scale of 1 to 20.
At α = 0.05.

33 Step 1. Hypotheses
H0: no difference in the ratings of the two flavors
H1: the new recipe has higher ratings
Step 2. α = 0.05
Step 3. Test statistic (see the table):
1. Compute the DIFF between each pair.
2. If DIFF = 0, this is a tie. Drop ties from the sample.
3. Classify the data into two groups: Group +: DIFF > 0; Group −: DIFF < 0
4. Calculate |DIFF| and pool the two groups.
5. Order the |DIFF|'s from smallest to largest and assign a rank to each.
6. R+ = rank sum for group +
7. R− = rank sum for group −
8. The test statistic T = the smaller of R+ and R−

34 [Table: DIFF for each pair, |DIFF| (sign removed), and pooled rank]

35 What if more than one observation has the same value?
In the example, the pooled |DIFF| data set is 2, 3, 4, 7, 8, 8, 8, 9, 9, 9, 10, 12, 14, 16: there are three 8's and three 9's. How should ranks be assigned to these values?
These are also ties (same value = a tie). When assigning ranks, break the ties by giving each tied observation the average rank.
For example, the three 8's occupy ranks 5, 6, 7. The average rank is (5+6+7)/3 = 6, so each of them is given rank 6.
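The average-rank rule can be checked on the pooled |DIFF| values listed above. The sketch below uses an illustrative helper, `average_ranks`, which counts how many values are smaller and splits the tied positions evenly.

```python
def average_ranks(values):
    """Rank of each value (1 = smallest); tied values share the average rank."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

pooled = [2, 3, 4, 7, 8, 8, 8, 9, 9, 9, 10, 12, 14, 16]
ranks = average_ranks(pooled)
print(ranks)
# The three 8's get rank 6 and the three 9's get rank 9.
print(sum(ranks))  # 105.0, the same as 14 * 15 / 2
```

Averaging tied ranks leaves the total rank sum unchanged at n(n+1)/2.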

36 Step 4. Decision rule: Appendix H. After removing one tie, n = 14. At α = 0.05, H0 is rejected if T ≤ 25
Step 5. Since R− = 30 < R+ = 75, T = 30 > 25; thus H0 of no difference is not rejected at level 0.05.

37 Wilcoxon Rank-Sum Test
Problem: comparing two populations
Data: two independent samples
Measurement: ordinal or continuous
Without the normality assumption: Wilcoxon rank-sum test.
Recall: for two continuous independent samples, see Ch. 11
  Normal + known variances: Z-test
  Large sample sizes: Z-test
  Normal + unknown equal variances: t-test

38 Wilcoxon Rank-Sum Test
Wilcoxon rank-sum test statistic, W:
1. Pool the two samples from the different populations.
2. Order the pooled sample from smallest to largest and assign ranks.
3. Calculate the rank sum of each sample.
4. W = rank sum of the first sample
The critical value(s) of an exact Wilcoxon rank-sum test can be found in other books. Here, an approximate test is used.

39 Wilcoxon Rank-Sum Test
If H0 is true, W has
$$\text{Mean} = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \text{Variance} = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
and is approximately normal if n1, n2 are large.

40 The mean of the W statistic
The total rank sum = 1 + 2 + ... + (n1+n2) = (n1+n2)(n1+n2+1)/2
If H0 is true, there is no difference, so W is expected to be close to
$$\frac{n_1}{n_1 + n_2} \cdot \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} = \frac{n_1(n_1 + n_2 + 1)}{2}$$

41 Wilcoxon Rank-Sum Test
Step 1. Hypotheses. H0: No difference between the two populations
Step 2. Significance level α
Step 3. Wilcoxon rank-sum test statistic:
$$Z = \frac{W - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
where W = sum of the ranks of the 1st sample (from the 1st population), n1 = sample size of the 1st sample, n2 = sample size of the 2nd sample

42 Step 4. Decision rule: under the null hypothesis, the Wilcoxon rank-sum test statistic Z is approximately N(0,1). As for a Z-test, find the critical value(s) in the N(0,1) table.
Step 5. Draw conclusion
Note: for small sample sizes, some books provide the exact critical values of the Wilcoxon rank-sum test.

43 Example. P561 The president of CEO airlines recently noted an increase in the number of no-shows for flights out of Atlanta. Determine whether there are more no-shows for flights from Atlanta compared with flights from Chicago.
Data: Table 16-4
  Sample 1: number of no-shows on 9 flights from Atlanta
  Sample 2: number of no-shows on 8 flights from Chicago
At the 0.05 significance level.
[Table 16-4: number of no-shows by departure city (Atlanta, Chicago)]

44 Wilcoxon Rank-Sum Test
Step 1. Hypotheses
H0: The distribution of no-shows is the same for Atlanta and Chicago
H1: The distribution of no-shows is larger for Atlanta than for Chicago
Step 2. α = 0.05 significance level
Step 3. Wilcoxon rank-sum test statistic:
$$Z = \frac{W - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
where W = sum of the ranks of the 1st sample (from Atlanta), n1 = sample size of the Atlanta sample = 9, n2 = sample size of the Chicago sample = 8

45 Step 4. Decision rule: one-tailed z-test; H0 is rejected if Z > Z_{0.05} = 1.65
Step 5. Draw conclusion: W = 96.5
$$Z = \frac{W - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{96.5 - \dfrac{9(9 + 8 + 1)}{2}}{\sqrt{\dfrac{9 \cdot 8\,(9 + 8 + 1)}{12}}} = 1.49 < 1.65$$
Since z = 1.49 < 1.65, the null hypothesis is not rejected at α = 0.05.
Also, the p-value = P(Z > 1.49) = 0.0681 > 0.05, so the same conclusion is obtained.
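The arithmetic of the normal approximation can be verified with a small Python sketch. The function name `rank_sum_z` is illustrative; W = 96.5, n1 = 9 and n2 = 8 are the values from the example, and the normal tail is computed with `math.erfc` rather than read from a table.

```python
from math import sqrt, erfc

def rank_sum_z(w, n1, n2):
    """Normal-approximation Z for the Wilcoxon rank-sum statistic W."""
    mean = n1 * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / sqrt(variance)

z = rank_sum_z(96.5, 9, 8)        # null mean 81, null variance 108
p_upper = 0.5 * erfc(z / sqrt(2))  # one-tailed p-value P(Z > z)
print(round(z, 2), round(p_upper, 3))
```

The one-tailed p-value comes out just under 0.07, in line with the table value obtained with z rounded to 1.49.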

46 [Table: pooled no-shows with ranks, listed by departure city (Atlanta, Chicago), followed by the no-shows and ranks of each city with the Atlanta and Chicago rank sums; W = 96.5]

47 Kruskal-Wallis Test: ANOVA by ranks
Recall: to test the equality of several independent, continuous population means under the normal + equal variance assumption, ANOVA is used. See Ch. 12
For ordinal/continuous data, without any assumption about the populations, the Kruskal-Wallis one-way analysis of variance by ranks is proposed.

48 Kruskal-Wallis Test
Step 1. Hypotheses
H0: The several population distributions are the same
H1: The distributions are not all equal
Step 2. Significance level α
Step 3. Kruskal-Wallis test statistic:
$$H = \frac{12}{n(n+1)}\left[\frac{(\sum R_1)^2}{n_1} + \frac{(\sum R_2)^2}{n_2} + \cdots + \frac{(\sum R_k)^2}{n_k}\right] - 3(n+1)$$
where (ΣR_1), ..., (ΣR_k) = sums of the ranks of samples 1, ..., k; n_1, ..., n_k = sample sizes of samples 1, ..., k; n = total sample size = Σ n_i

49 Step 4. Decision rule: under the null hypothesis, when n1, ..., nk ≥ 5, H has approximately a chi-square distribution with df = (k − 1).
H0 is rejected if H ≥ χ²(k − 1, α)
Step 5. Draw conclusion

50 Example. P565 A management seminar consists of executives from manufacturing, finance, and the trades. Before the seminar, the seminar leader is interested in whether the three groups are equally knowledgeable about management principles.
Data: 3 independent samples of scores on a test.
  Manufacturing executives group: 56, 39, 48, 38, 73, 50, 62
  Finance executives group: 102, 87, 51, 95, 68, 42, 107, 89
  Trade executives group: 42, 38, 89, 75, 35, 61

51 Kruskal-Wallis Test
Step 1. Hypotheses
H0: The three populations of scores have the same distribution
H1: The distributions are not all equal
Step 2. α = 0.05 significance level
Step 3. Kruskal-Wallis test statistic:
$$H = \frac{12}{n(n+1)}\left[\frac{(\sum R_1)^2}{n_1} + \frac{(\sum R_2)^2}{n_2} + \frac{(\sum R_3)^2}{n_3}\right] - 3(n+1)$$
where (ΣR_1), (ΣR_2), (ΣR_3) = rank sums of samples 1, 2, 3; n_1 = 7, n_2 = 8, n_3 = 6, n = 7 + 8 + 6 = 21

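52 Step 4. Decision rule: since n1, n2, n3 ≥ 5, df = (k − 1) = (3 − 1) = 2 and α = 0.05; H0 is rejected if H ≥ χ²(2, 0.05) = 5.991
Step 5. Draw conclusion:
$$H = \frac{12}{n(n+1)}\left[\frac{(\sum R_1)^2}{n_1} + \frac{(\sum R_2)^2}{n_2} + \frac{(\sum R_3)^2}{n_3}\right] - 3(n+1) = \frac{12}{21(21+1)}\left[\frac{(57.5)^2}{7} + \frac{(121)^2}{8} + \frac{(52.5)^2}{6}\right] - 3(21+1) = 5.736 < 5.991$$
Since H = 5.736 < 5.991, the null hypothesis of no difference is not rejected at level 0.05.
However, the p-value = 0.057 is close to 0.05.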
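The rank sums and the H statistic can be recomputed from the raw scores of the example with a short Python sketch. The helper names are illustrative. Tied scores receive average ranks, no tie correction is applied (matching the formula on the slides), and the p-value line uses the closed form exp(−H/2) of the chi-square upper tail, which holds only for df = 2.

```python
from math import exp

def average_ranks(values):
    """Rank of each value (1 = smallest); tied values share the average rank."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def kruskal_wallis(groups):
    """Return (H, rank sums) using pooled average ranks, without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    total = sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1), rank_sums

manufacturing = [56, 39, 48, 38, 73, 50, 62]
finance = [102, 87, 51, 95, 68, 42, 107, 89]
trade = [42, 38, 89, 75, 35, 61]

H, rank_sums = kruskal_wallis([manufacturing, finance, trade])
p_value = exp(-H / 2)  # chi-square upper tail, valid for df = 2 only
print(rank_sums, round(H, 3), round(p_value, 3))  # [57.5, 121.0, 52.5] 5.736 0.057
```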

53 [Table: pooled test scores by division (Manufacturing, Finance, Trade) with ranks, rank sums, and sample sizes]

54 Note. P567 Comparison with the usual ANOVA
The F-test in ANOVA can be used if:
  the populations are normal
  the variances/standard deviations are equal
  the samples are independent
Testing H0: µ_E = µ_F = µ_T

55 異數 EXCEL : ANOVA 數異數度臨 Since p-value=0.034<0.05, H0 is rejected at =0.05 different conclusion! However, the difference in p-values = =0.023, not large. 55

56 Spearman's coefficient of rank correlation
Recall: between continuous variables, the Pearson correlation coefficient is considered. See Ch. 13
For ordinal-level data, Spearman's coefficient of rank correlation is used to describe the relationship:
$$r_s = 1 - \frac{6\sum (R_1 - R_2)^2}{n(n^2 - 1)}$$
where R_1, R_2 = the ranks of each pair of observations

57 Rank-Order Correlation
−1 ≤ r_s ≤ 1
  −1: perfect negative correlation
  1: perfect positive correlation
  0: no strong association among the ranks

58 Example. P570 A rating is given by executives to each college graduate joining a plastics manufacturing firm. The rating is an expression of future potential. Another rating, from a training program, is also given. Find the correlation between these two ratings.

59 Table

60 Then
$$r_s = 1 - \frac{6\sum (R_E - R_T)^2}{n(n^2 - 1)} = 1 - \frac{6(78.5)}{12(144 - 1)} = 0.726$$
There is a strong positive association between these two ratings.
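The computation of r_s can be sketched in Python. Both function names are illustrative. Only the sum of squared rank differences (78.5) and n = 12 come from the example; the small rank lists in the last line are hypothetical data used to show the version that starts from paired ranks.

```python
def spearman_from_sum_d2(sum_d2, n):
    """r_s from the sum of squared rank differences and the number of pairs."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def spearman_from_ranks(ranks_1, ranks_2):
    """r_s from two lists of paired ranks."""
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return spearman_from_sum_d2(sum_d2, len(ranks_1))

r_s = spearman_from_sum_d2(78.5, 12)
print(round(r_s, 3))  # 0.726

# Hypothetical ranks: perfectly reversed orderings give r_s = -1
print(spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```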

61 Test of the significance of rank correlation (n ≥ 10)
Step 1. Hypotheses. H0: The population rank correlation = 0
Step 2. Significance level α
Step 3. A t-test statistic:
$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}}$$
Step 4. Decision rule: under the null hypothesis, t has a t-distribution with df = n − 2. This is a t-test; find the critical values in Appendix F.
Step 5. Draw conclusion.

62 Example. P570
Step 1. Hypotheses
H0: The population rank correlation = 0
H1: The population rank correlation > 0
Step 2. α = 0.05 significance level
Step 3. A t-test statistic:
$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}}$$

63 Example. P570, continued
Step 4. Decision rule: a one-tailed t-test. Since df = n − 2 = 12 − 2 = 10 and α = 0.05, H0 is rejected if t > 1.812
Step 5. Draw conclusion.
$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}} = 0.726\sqrt{\frac{12 - 2}{1 - (0.726)^2}} = 3.338 > 1.812$$
Since t = 3.338 > 1.812, the null hypothesis of zero correlation is rejected at significance level 0.05.
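The t statistic for the rank correlation can be evaluated with a few lines of Python. The function name `spearman_t` is illustrative; r_s = 0.726 and n = 12 are the values from the example, and the critical value 1.812 is the one quoted in the decision rule.

```python
from math import sqrt

def spearman_t(r_s, n):
    """t statistic for testing zero rank correlation, with df = n - 2."""
    return r_s * sqrt((n - 2) / (1 - r_s ** 2))

t = spearman_t(0.726, 12)
print(round(t, 3), t > 1.812)  # 3.338 True
```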

64 Exercise
Sign test: Preference: 31; Before/after: 34; Median: 32
Wilcoxon signed-rank test: 34
Wilcoxon rank-sum test: 36
Kruskal-Wallis test: 35, 37
Spearman's coefficient of rank correlation: 39
