Wilcoxon Test and Calculating Sample Sizes

Dan Spencer, UC Santa Cruz

Differences in the Means of Two Independent Groups
- When using the $t$, $t'$, or $t_p$ test statistics, we assume that the responses in both groups are normally distributed
- What if they are not normally distributed?
- If $n_1$ and $n_2$ are large enough, it is still okay to use the t-distribution
- However, if $n_1$ and $n_2$ are small, this is a problem
- This non-normality sometimes occurs in animal studies

Wilcoxon Rank-Sum Test
- Sometimes called the Mann-Whitney-Wilcoxon test, the Mann-Whitney U test, or the Wilcoxon-Mann-Whitney test
- Tests whether the location of the responses differs between the groups
- Interpreted as a test for a difference in medians
- An example of a nonparametric test, as it does not test parameters of an assumed distribution

Wilcoxon Rank-Sum: Assumptions
- Responses are either continuous or ordinal
- Observations from both groups are independent
- The shape and spread of the response are the same in the two populations, but not necessarily normal

t-test Group Density Assumption
[Figure: Density Assumption for t Tests. Density versus Values for Group 1 and Group 2.]

Wilcoxon Group Density Assumption
[Figure: Wilcoxon Density Assumption. Density versus Values for Group1 and Group2.]

Wilcoxon Rank-Sum: Hypotheses
- Null Hypothesis ($H_0$): The probability that a randomly selected response from the first population exceeds a randomly selected response from the second population is equal to 0.5
  - A slightly stronger hypothesis is that the distributions are equal in terms of location
  - This hypothesis implies the above null hypothesis
- Alternative Hypothesis ($H_1$): The probability that a randomly selected response from the first population exceeds a randomly selected response from the second population is
  - Not equal to 0.5
  - Greater than 0.5
  - Less than 0.5

Case Study: Chick Weights
- Newly hatched chicks were separated into two groups
  - Sunflower seed diet
  - Horsebean seed diet
- After six weeks, the weights of the chicks were measured in grams

Case Study: Chick Weights
[Figure: Boxplots of Chick Weights by Feed Type. Weight (grams) versus Feed Type (horsebean, sunflower).]

Case Study: Chick Weights
- Both distributions look somewhat right-skewed: each has either a long tail or an outlier (shown as a solitary point)
- Sample sizes are small (8 and 10, respectively), so $t$ and $t'$ are not appropriate here
- Hypotheses:
  - $H_0$: The distributions of chick weights in the two groups are equal
  - $H_1$: The distribution of chick weights is shifted lower for the horsebean group

Wilcoxon Rank-Sum Test Statistic
- Combine groups, and rank all responses from smallest to largest
- The ranks run from 1 to $n$, where $n = n_1 + n_2$
- If there are ties, the ranks should be averaged
  - Values: 7, 5, 6, 6
  - Their ranks would be 4, 1, 2.5, 2.5
- The test statistic $T$ is the sum of the ranks for the group with the smallest sample size
- If $n_1 = n_2$, T falls between the two rank sums
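The tie-averaging rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `average_ranks` is hypothetical and only the standard library is used.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end hold ranks start+1..end+1; ties get their mean.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


print(average_ranks([7, 5, 6, 6]))  # [4.0, 1.0, 2.5, 2.5]
```

The two tied values of 6 occupy ranks 2 and 3, so each receives 2.5, matching the example on the slide.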

Rank Sums

Horsebean
Weight (g)   Rank
266.84       14
264.07        6
263.82        4
263.47        2
264.33        8
264.25        7
263.22        1
263.92        5
Sum of ranks = 47

Sunflower
Weight (g)   Rank
267.75       15
266.02       12
266.29       13
264.89       10
269.24       17
271.63       18
264.74        9
268.36       16
264.99       11
263.69        3
Sum of ranks = 124
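The rank sums can be checked with a short Python sketch. The variable names are illustrative, the weights are copied from the slide, and the shortcut of ranking by counting smaller values assumes there are no ties, which holds for these data.

```python
horsebean = [266.84, 264.07, 263.82, 263.47, 264.33, 264.25, 263.22, 263.92]
sunflower = [267.75, 266.02, 266.29, 264.89, 269.24,
             271.63, 264.74, 268.36, 264.99, 263.69]

combined = horsebean + sunflower
n = len(combined)

# With no ties, a value's rank is 1 plus the number of smaller values.
rank_of = {w: 1 + sum(other < w for other in combined) for w in combined}

rank_sum_horsebean = sum(rank_of[w] for w in horsebean)
rank_sum_sunflower = sum(rank_of[w] for w in sunflower)

# Sanity check: all ranks together must sum to n(n + 1)/2.
assert rank_sum_horsebean + rank_sum_sunflower == n * (n + 1) // 2

# T is the rank sum of the smaller group (horsebean, with 8 chicks).
T = rank_sum_horsebean
print(T, rank_sum_sunflower)  # 47 124
```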

Case Study: Chick Weights
- $T = 47$
- Wilcoxon Rank-Sum rejection region values can be found in a table at https://metxstats.soe.ucsc.edu/node/5
- Since the research hypothesis is that the horsebean group has a lower-shifted distribution than the sunflower group, reject $H_0$ if $T$ is less than the value in the table for $n_1 = 8$ and $n_2 = 10$
- $T$ is larger than the critical value for $\alpha$ = 0.025, 0.05, and 0.10
- Fail to reject $H_0$ and conclude that the distributions are not significantly shifted from one another

Normal Approximation
- When both treatment groups are larger than 10, the normal distribution approximates the distribution of the Wilcoxon Rank-Sum test statistic rather well

$$z = \frac{T - \mu_T}{\sigma_T}$$

$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2}$$

$$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

Normal Approximation: Our Example

$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{8(8 + 10 + 1)}{2} = 76$$

$$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(8)(10)(8 + 10 + 1)}{12}} = 11.25463$$

Normal Approximation: Our Example

$$z = \frac{47 - 76}{11.25463} = -2.576717$$

- This z-score certainly does fall in the rejection region
- P-value of 0.00499
- This is a contradictory conclusion!
- Use this approximation only when samples are large enough!
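The arithmetic of the normal approximation can be reproduced with Python's standard library. This is a sketch under the assumption that the one-sided (lower-tail) p-value is the one of interest, matching the research hypothesis in the case study; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 8, 10   # horsebean and sunflower sample sizes
T = 47           # rank sum of the smaller (horsebean) group

mu_T = n1 * (n1 + n2 + 1) / 2                 # mean of T under H0
sigma_T = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation of T under H0
z = (T - mu_T) / sigma_T

# Lower-tail p-value, since H1 says the horsebean weights are shifted lower.
p_lower = NormalDist().cdf(z)

print(f"mu_T = {mu_T:.0f}, sigma_T = {sigma_T:.5f}")
print(f"z = {z:.4f}, one-sided p = {p_lower:.5f}")
```

Here both group sizes are at or below 10, so the approximation is being used outside the range recommended above.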

Wilcoxon Rank-Sum Test in JMP
- Analyze > Fit Y by X
- Drag your variables to the appropriate Response and Factor boxes and click OK
- Select Nonparametric > Wilcoxon Test

Wilcoxon Rank-Sum Test in JMP
- JMP calls the test statistic S instead of T
- Only the two-sided p-value for the normal approximation is given
- For the one-sided p-value, divide by 2

Sample Size
- Researchers aim to present evidence to support their hypotheses about how the world works
- Most of the time, this hypothesis aims to show that treatments are significantly different from one another
- Usually, the aim is to reject $H_0$
- Ideally, sample sizes would be as big as possible
- However, time and money often limit sample sizes

Power
- We want to minimize the chance of failing to reject a false $H_0$
  - This chance is often represented by $\beta$
- An experiment's power is the chance that a false $H_0$ is correctly rejected
  - $1 - \beta$
- When the chance of incorrectly rejecting $H_0$ is fixed at some value $\alpha$, the power of a test can be estimated for different sample sizes

Power: t Distributions
- When $H_0$ is true, the test statistic is centered around 0
- When $H_1$ is true, the test statistic is proportionally centered at

$$\frac{\mu_1 - \mu_2 - D_0}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

- For simplicity, the quantity $\mu_1 - \mu_2 - D_0$ is represented as $\Delta$

Calculating Power
- An experiment where $n_1 = n_2 = 5$, $\sigma = 10$, and $\Delta = 25$
- $\alpha$ is fixed at 0.05 for the hypotheses
  - $H_0: \mu_1 - \mu_2 = 0$
  - $H_1: \mu_1 - \mu_2 > 0$
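As a worked step for this setup, assuming $D_0 = 0$ (taken from $H_0$) and that the test statistic under $H_1$ is centered at $\Delta / \left(\sigma\sqrt{1/n_1 + 1/n_2}\right)$ as described on the previous slide, the center would be approximately:

```latex
\[
\frac{\Delta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
  = \frac{25}{10\sqrt{\frac{1}{5} + \frac{1}{5}}}
  = \frac{25}{6.3246}
  \approx 3.95
\]
```

So the $H_1$ curve sits well to the right of the $H_0$ curve, which is centered at 0.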

Power Illustrated
[Figure: β, α, and t. Densities of the test statistic under $H_0$ and $H_1$, with the critical value t* marked on the t axis.]

Changing σ
[Figure: Two panels, σ = 10 and σ = 8. Each shows the densities of the test statistic under $H_0$ and $H_1$, with the critical value t* marked on the t axis.]

Changing n
[Figure: Two panels, $n_1 = n_2 = 5$ and $n_1 = n_2 = 10$. Each shows the densities of the test statistic under $H_0$ and $H_1$, with the critical value t* marked on the t axis.]

Maximizing Power
- Increase $n_1$ and $n_2$ and decrease experimental error as much as possible
- We have previously discussed reducing experimental error by standardizing measurement practices
- How do we choose the smallest possible sample size while achieving a fixed $\alpha$ and $\beta$?

Calculating n
- Fix or estimate
  - $\alpha$: Chance of incorrectly rejecting $H_0$
  - $\beta$: Chance of incorrectly failing to reject $H_0$
  - $\sigma$: Estimated population standard deviation
  - $\Delta$: The size of difference that is desirable to detect
- One-sided tests for $\mu_1 - \mu_2$:

$$n_1 = n_2 = \frac{2\sigma^2 (z_\alpha + z_\beta)^2}{\Delta^2}$$

- Two-sided tests for $\mu_1 - \mu_2$:

$$n_1 = n_2 = \frac{2\sigma^2 (z_{\alpha/2} + z_\beta)^2}{\Delta^2}$$

Calculating n
- If $\mu_1 - \mu_2 - D_0 \ge \Delta$, the type II error probability is at most $\beta$
- Typically, $\beta$ is chosen to be 0.2
- $\sigma$ is estimated as $s$ calculated from previous experiments
- $\Delta$ is set as the minimum difference that is desirable to detect
  - A treatment is only preferable if it increases CD4 cell count by 100 or more, so $\Delta = 100$

Calculating n: Tooth Growth
- In a previous lesson, we examined the effects of the source of vitamin C on tooth growth in guinea pigs
- Let's say we want to conduct another study, but this time, we want to be able to detect a true difference of 3 millimeters in tooth length
- We'll estimate that $\sigma = 7.5$, which was our estimate $s_p$
- Fix $\alpha = 0.05$
- Fix $\beta = 0.20$
- We'll assume a two-sided test

Calculating n: Tooth Growth

$$n_1 = n_2 = \frac{2(7.5^2)(z_{0.05/2} + z_{0.20})^2}{3^2} = \frac{2(7.5^2)(1.959964 + 0.8416212)^2}{3^2} = 98.111$$

- In order to have power = 1 - 0.2 = 0.8, the minimum sample size for each group is 99 guinea pigs
- In the case where a non-integer sample size is found, round up to the nearest whole number
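The same calculation can be sketched in Python using only the standard library. The function name `sample_size_per_group` and its parameter names are hypothetical; the formula is the one from the "Calculating n" slide, with $z$ values taken from the standard normal distribution.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, beta=0.20, two_sided=True):
    """Per-group n from n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2."""
    # A two-sided test splits alpha across both tails.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # upper-tail critical value
    z_beta = NormalDist().inv_cdf(1 - beta)
    return 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2


# Tooth growth example: sigma = 7.5, difference to detect = 3 mm.
n_exact = sample_size_per_group(sigma=7.5, delta=3)
n_needed = ceil(n_exact)  # always round up to a whole animal

print(round(n_exact, 3), n_needed)
```

Rounding up rather than to the nearest integer keeps the power at or above the target of 0.8.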

Calculating Sample Size in JMP
- DOE > Sample Size and Power > Two Sample Means
- Enter
  - $\alpha$
  - $\sigma$ (Std Dev)
  - Difference to detect ($\Delta$)
  - Power ($1 - \beta$)
- Continue
- Note: small differences may exist due to rounding errors

[Figure: JMP Output]

Notes on JMP
- Note that this tool can also be used to evaluate the power of a proposed study
- A plot of power versus sample size can also be useful in determining sample size
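Outside JMP, the relationship between power and sample size can be tabulated with a rough Python sketch. This assumes the same normal approximation that underlies the two-sided sample size formula (it ignores the negligible far tail), so values may differ slightly from JMP's output; the function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n, sigma, delta, alpha=0.05):
    """Approximate power of a two-sided two-sample test with n per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Center of the test statistic under H1, in standard-error units.
    shift = delta / (sigma * sqrt(2 / n))
    return NormalDist().cdf(shift - z_crit)


# Tooth growth settings: sigma = 7.5, difference to detect = 3 mm.
for n in (25, 50, 75, 98, 99, 125):
    print(n, round(approx_power(n, sigma=7.5, delta=3), 3))
```

Under this approximation the power first reaches 0.8 at 99 per group, consistent with the sample size found for the tooth growth example.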