Dr. Maddah ENMG 617 EM Statistics 10/12/12. Nonparametric Statistics (Chapter 16, Hines)

Introduction

Most of the hypothesis tests presented so far assume normally distributed data. These approaches (based on t- and F-tests) are not too sensitive to slight departures from normality. They are called parametric approaches, as they assume a parametric family of distributions, the normal.

Here, we describe nonparametric, or distribution-free, methods that make no assumption about the form of the distribution. Nonparametric procedures also have the advantage of working with categorical or rank data, and of being quick and easy to apply.

One disadvantage of nonparametric procedures is that they do not utilize all the information in the sample and, as a result, have a higher Type II error than parametric procedures when the normality assumption holds. The main advantage of nonparametric methods arises when the underlying data are not normal.

Inference about a single population location: Sign test

This test works with $n$ independent observations from a continuous distribution, $X_1, X_2, \ldots, X_n$. It tests whether the median, $\tilde{\mu}$, is equal to a certain value $\tilde{\mu}_0$,

$$H_0: \tilde{\mu} = \tilde{\mu}_0, \qquad H_1: \tilde{\mu} \neq \tilde{\mu}_0.$$

The test statistic is based on the signs of the differences $X_i - \tilde{\mu}_0$. Under the null hypothesis, each difference is equally likely to be positive or negative. Then, letting $R^+$ and $R^-$ be the numbers of positive and negative differences, both $R^+$ and $R^-$ have a binomial distribution with parameters $n$ and $p = 0.5$ under $H_0$.

The test statistic is chosen as $R = \min(R^+, R^-)$. Noting that $R^+ + R^- = n$, $H_0$ is rejected when $R$ is too small, that is, when the number of positive or negative signs falls too far below its expected value of half the observations, $n/2$. Specifically, $H_0$ is rejected at significance level $\alpha$ if $R \le R^*_\alpha$. The critical values $R^*_\alpha$ presented in Table X of the appendix satisfy $P\{X \le R^*_\alpha\} \le \alpha/2$, where $X \sim \mathrm{Bin}(n, 1/2)$.

The p-value of the test can be found from the binomial distribution as

$$\text{p-value} = P\{X \le R\} = \sum_{r=0}^{R} \binom{n}{r} (0.5)^r (0.5)^{n-r}.$$

Alternatively, $H_0$ is rejected if the p-value is at most $\alpha/2$, in line with the convention $P\{X \le R^*_\alpha\} \le \alpha/2$.

Example of the sign test (Ex. 16.1)

The shear strength between two rocket propellants (in psi) is measured on $n = 20$ specimens. We would like to test the hypothesis that the median shear strength is 2000 psi.

Then, $R^+ = 14$, $R^- = 6$, and $R = \min(14, 6) = 6$. The critical value at the 5% significance level is given by Table X as $R^*_{0.05} = 5$. Since $R > R^*_{0.05}$, we cannot reject $H_0$. That is, we cannot conclude that the median shear strength is different from 2000 psi. Alternatively, the p-value is

$$\text{p-value} = P\{X \le 6\} = \sum_{r=0}^{6} \binom{20}{r} (0.5)^r (0.5)^{20-r} = (0.5)^{20} \sum_{r=0}^{6} \binom{20}{r} = 0.0577.$$

Since the p-value exceeds $\alpha/2 = 0.025$, we cannot reject $H_0$.

Issues with the sign test

The actual significance level is typically smaller than the stated one because of the discreteness of the binomial distribution. Specifically, $R^*_\alpha$ is chosen as the largest integer $r$ such that $P\{X \le r\} \le \alpha/2$.

Ties occur when $X_i = \tilde{\mu}_0$. In these cases, the tied observations should be ignored and the test should be limited to the non-tied observations.

One-sided alternatives, $\tilde{\mu} > \tilde{\mu}_0$ and $\tilde{\mu} < \tilde{\mu}_0$, can be tested by using $R^-$ and $R^+$ as the test statistic, respectively, and $R^*_{2\alpha}$ (from Table X) as the critical value. (Note that $P\{X \le R^*_{2\alpha}\} \le \alpha$.)

For sufficiently large $n$, the normal approximation to the binomial distribution can be used. The sampling distribution of $R$ is then taken to be normal with mean $np = n/2$ and variance $np(1-p) = n/4$. The test statistic is then

$$Z_0 = \frac{R - n/2}{\sqrt{n}/2}.$$

The critical value for the two-sided test is $Z_{\alpha/2}$; equivalently, $H_0$ is rejected if $|Z_0| > Z_{\alpha/2}$. For the one-sided tests, $\tilde{\mu} > \tilde{\mu}_0$ and $\tilde{\mu} < \tilde{\mu}_0$, the critical value is $Z_\alpha$.

Comparison of the sign test with the t-test

For symmetric populations, the mean and median are equal. With normal and nonnormal symmetric populations, the t-test can be shown to have a smaller type II error ($\beta$), unless the tails of the distribution are too heavy (high variability). Therefore, the t-test is preferred for symmetric populations except under high variability. For nonsymmetric populations, the sign test is appropriate for testing medians.

Inference about a single population location: The Wilcoxon signed rank test

This test applies to symmetric and continuous populations. Because the mean and median of a symmetric population coincide, it can be seen as a test on the mean, $\mu$. For nonsymmetric populations, the sign test is preferred.

The Wilcoxon test is a variation of the sign test that attempts to utilize more information from the data, leading to more efficiency (lower $\beta$) than the sign test. In addition to the signs of the deviations from the hypothesized mean, $X_i - \mu_0$, their ranks are used.

Specifically, the test statistic and procedure are the same as for the sign test, $R = \min(R^+, R^-)$, except that $R^+$ and $R^-$ are the sums of the ranks of the positive and negative deviations. The ranks are based on the absolute deviations $|X_i - \mu_0|$.

The critical value $R^*_\alpha$ is also different. It is no longer based on the binomial distribution. Critical values are presented in Table XI of the appendix. The same convention as before applies, with $P\{X \le R^*_\alpha\} \le \alpha/2$, where $X$ is the sum of the ranks of a given sign.

Ties (deviations of equal absolute value) are broken by assigning an average rank. E.g., if the three smallest deviations are equal, one assigns to each a rank of $(1+2+3)/3 = 2$.

A large-sample approximation can be used when $n > 20$, with the sampling distribution of $R$ considered normal with mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$.

Example of the Wilcoxon signed rank test (Ex. 16-3)

Consider again the data on the shear strength between two rocket propellants. Assume the underlying distribution is symmetric. Symmetry can be checked by looking at the histogram of the data and verifying that the mean and the median are close.
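The ranking step, including the average-rank rule for ties, is easy to get wrong by hand, so a short Python sketch may help. The function names `average_ranks` and `signed_rank` and the sample values are hypothetical, chosen only for illustration; the normal-approximation statistic is returned for completeness even though the text recommends it only for n > 20.

```python
from math import sqrt

def average_ranks(values):
    """Ranks 1..n in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank(xs, mu0):
    """Wilcoxon signed rank sketch: returns (r_plus, r_minus, r, z0)."""
    diffs = [x - mu0 for x in xs if x != mu0]  # assumption: zero deviations dropped
    ranks = average_ranks([abs(d) for d in diffs])
    r_plus = sum(rk for d, rk in zip(diffs, ranks) if d > 0)
    r_minus = sum(rk for d, rk in zip(diffs, ranks) if d < 0)
    r = min(r_plus, r_minus)
    n = len(diffs)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z0 = (r - mean) / sqrt(var)  # only meaningful for large n
    return r_plus, r_minus, r, z0

# The tie rule from the text: three equal smallest values each get rank 2.
print(average_ranks([3, 1, 1, 1]))  # [4.0, 2.0, 2.0, 2.0]

# Hypothetical data, for illustration only.
sample = [10.5, 9.0, 12.0, 11.0, 8.5, 13.5, 10.5, 12.5]
r_plus, r_minus, r, z0 = signed_rank(sample, 10.0)
print(r_plus, r_minus, r)  # 27.5 8.5 8.5
```

A quick sanity check on any such computation is that the two rank sums add up to n(n+1)/2, here 36.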

Find the absolute deviations $|X_i - 2000|$, rank them in increasing order, and attach to each rank the sign of its deviation. Then, the sums of the positive and negative ranks are $R^+ = 150$ and $R^- = 60$. The test statistic is then $R = 60$. At the 5% significance level, the critical value from Table XI is $R^*_{0.05} = 52$. Since $R > R^*_{0.05}$, we cannot reject $H_0$. That is, we cannot conclude that the mean (assumed equal to the median) is different from 2000 psi.

Comparison of the Wilcoxon signed rank test with the t-test

For symmetric populations, the Wilcoxon test can be seen as a competitor to the t-test for testing means.

Research indicates that the efficiency of the Wilcoxon test, in terms of the sample size required to produce a given type II error, is quite good in comparison with that of the t-test. For large samples, the Wilcoxon test requires a sample size equal to at most 1/0.86, i.e. 116%, of that of the t-test to produce the same $\beta$. In many cases, with non-normal distributions, the Wilcoxon test requires less data.

In conclusion, the Wilcoxon test should be seen as a useful alternative to the t-test for symmetric populations, especially when normality is not guaranteed.

Inference about paired samples from two populations

Both the sign test and the Wilcoxon signed rank test can be used to test whether the medians of two populations are different based on paired observations. Specifically, the tests work with paired observations, $(X_{11}, X_{21}), (X_{12}, X_{22}), \ldots, (X_{1n}, X_{2n})$.

Letting $D_j = X_{1j} - X_{2j}$, testing that the medians are equal, $\tilde{\mu}_1 = \tilde{\mu}_2$, is equivalent to testing $\tilde{\mu}_d = 0$, where $\tilde{\mu}_d$ is the median of the differences between paired observations, with sample values $D_j$.

The assumption of continuity of both populations is needed. For the Wilcoxon test, the distribution of the paired differences is also assumed to be symmetric.

Example of paired samples test (Ex. 16-2)

Consider fuel mileage data on two metering devices for a car fuel injection system. Each metering device was installed on 12 different cars, and a test was run with each metering system on each car.

Applying the sign test to the differences in fuel mileage performance between the two devices, we see that $R^+ = 8$ and $R^- = 4$, so the test statistic is $R = 4$. At the 5% significance level, the critical value from Table X is $R^*_{0.05} = 2$. Since $R > R^*_{0.05}$, we cannot conclude that the two systems have different fuel mileage performance.
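The figures in this example can be checked directly from the binomial distribution rather than from a table. The sketch below assumes the critical value is the largest integer r with P{X <= r} <= alpha/2 for X ~ Bin(n, 1/2); the helper names `binom_cdf_half` and `sign_critical_value` are illustrative and do not come from the text.

```python
from math import comb

def binom_cdf_half(r, n):
    """P{X <= r} for X ~ Bin(n, 1/2)."""
    return sum(comb(n, k) for k in range(r + 1)) / 2 ** n

def sign_critical_value(n, alpha):
    """Largest r with P{X <= r} <= alpha/2, or None if no such r exists."""
    crit = None
    for r in range(n + 1):
        if binom_cdf_half(r, n) <= alpha / 2:
            crit = r
        else:
            break  # the CDF is nondecreasing, so no larger r can qualify
    return crit

# Counts from the paired example: 12 cars, 8 positive and 4 negative differences.
n, r_plus, r_minus = 12, 8, 4
r = min(r_plus, r_minus)
crit = sign_critical_value(n, 0.05)
tail_prob = binom_cdf_half(r, n)
reject = crit is not None and r <= crit
print(crit, round(tail_prob, 4), reject)  # 2 0.1938 False
```

The same helper gives a critical value of 5 for n = 20 at the 5% level, which is the setting of the shear strength example.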

Inference about independent samples from two populations: The Wilcoxon rank-sum test

This test considers two independent continuous populations $X_1$ and $X_2$. The two populations are assumed to have the same probability distribution (same shape and spread), but with possibly different means, $\mu_1$ and $\mu_2$. That is, this is a test of whether the mean (location) has shifted while the shape and spread are maintained,

$$H_0: \mu_1 = \mu_2, \qquad H_1: \mu_1 \neq \mu_2.$$

Data for the test are two independent samples of sizes $n_1$ and $n_2$ from the two populations, $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$, where $n_1 \le n_2$.

The test works by arranging all $n_1 + n_2$ observations in increasing order and assigning ranks. Ties are broken using an average rank, as discussed above.

The idea of the test is that when the mean is not shifted, i.e. $H_0$ holds, all rankings are equally likely. The test statistic is based on the sum of the ranks in the $X_1$ sample, $R_1$, which should be neither too small nor too large when $H_0$ holds. Specifically, the test statistic is $R = \min(R_1, R_2)$, where

$$R_2 = n_1(n_1 + n_2 + 1) - R_1.$$

Note that $R$ would be small if $R_1$ is too large or too small. Therefore, $H_0$ is rejected at significance level $\alpha$ if $R \le R^*_\alpha$. The critical values $R^*_\alpha$ are given in Table IX of the appendix, where, as before, $R^*_\alpha$ is computed for two-sided tests. (For the related one-sided tests, see the text; Table IX gives $R^*_{2\alpha}$.)

For large samples, $n_1 > 8$ and $n_2 > 8$, a normal approximation can be used. Specifically, $R_1$ is assumed to be normally distributed with the following mean and variance,

$$\mu_{R_1} = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma^2_{R_1} = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}.$$

The test statistic is

$$Z_0 = \frac{R_1 - \mu_{R_1}}{\sigma_{R_1}}.$$

Then, $H_0$ is rejected if $|Z_0| > Z_{\alpha/2}$.

Example of the Wilcoxon rank-sum test (Ex. 16-5)

The mean axial stress in tensile members of an aircraft is being investigated. Two alloys are being considered. Ten specimens of each alloy type are tested. The data are arranged in increasing order and ranked.

Then, $R_1 = 99$, $R_2 = n_1(n_1 + n_2 + 1) - R_1 = 10(10 + 10 + 1) - 99 = 111$, and $R = \min(99, 111) = 99$. The critical value from Table IX is $R^*_{0.05} = 78$, so $R > R^*_{0.05}$. At the 5% significance level, we cannot reject the hypothesis $H_0$ that the two alloys exhibit the same mean axial stress.
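Since both samples have more than 8 observations, the normal approximation can also be applied to this example. The following Python sketch does so, assuming $R_2 = n_1(n_1 + n_2 + 1) - R_1$ and the mean $n_1(n_1+n_2+1)/2$ and variance $n_1 n_2 (n_1+n_2+1)/12$ for $R_1$; the function name `rank_sum_summary` is hypothetical.

```python
from math import sqrt

def rank_sum_summary(r1, n1, n2):
    """Rank-sum sketch: returns (r2, r, z0) from the rank sum of sample 1."""
    r2 = n1 * (n1 + n2 + 1) - r1            # companion statistic, as assumed above
    r = min(r1, r2)                         # table-based test statistic
    mean = n1 * (n1 + n2 + 1) / 2           # mean of R1 under H0
    var = n1 * n2 * (n1 + n2 + 1) / 12      # variance of R1 under H0
    z0 = (r1 - mean) / sqrt(var)            # normal-approximation statistic
    return r2, r, z0

# Values from the alloy example: R1 = 99 with ten specimens per alloy.
r2, r, z0 = rank_sum_summary(99, 10, 10)
print(r2, r, round(z0, 3))  # 111 99 -0.454
```

Here $|Z_0|$ is well below $Z_{0.025} = 1.96$, so the approximation leads to the same decision as the table lookup.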

Inference about three or more populations: Analysis of variance with the Kruskal-Wallis test

Consider the single-factor analysis of variance under a completely randomized design (Chapter 12). This setting considers $a$ treatments with $n_i$ observations for treatment $i$, and adopts the following model for the data,

$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i,$$

where the error terms $\epsilon_{ij}$ are assumed to be independent and identically normally distributed. Analysis of variance was carried out to test the equality of the treatment means, $\mu_i = \mu + \tau_i$,

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a, \qquad H_1: \text{not all means are equal}.$$

When the error terms $\epsilon_{ij}$ have a common distribution that is not necessarily normal, the following Kruskal-Wallis nonparametric test can be used.

The idea of the test is similar to that of the Wilcoxon rank-sum test: under $H_0$, all rankings of the observations from all treatments are equally likely. Specifically, under $H_0$, the expected average rank of treatment $i$ is $(N+1)/2$, where $N$ is the total number of observations,

$$N = \sum_{i=1}^{a} n_i.$$

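The test works by arranging all the data in increasing order and assigning a rank $R_{ij}$ to observation $j$ from treatment $i$. The test statistic is based on the sum of squared deviations of the average ranks of the treatments from this expected average,

$$K = \frac{12}{N(N+1)} \sum_{i=1}^{a} n_i \left( \bar{R}_{i.} - \frac{N+1}{2} \right)^2 = \frac{12}{N(N+1)} \sum_{i=1}^{a} \frac{R_{i.}^2}{n_i} - 3(N+1),$$

where

$$R_{i.} = \sum_{j=1}^{n_i} R_{ij}, \qquad \bar{R}_{i.} = \frac{R_{i.}}{n_i}, \qquad i = 1, \ldots, a.$$

The null hypothesis $H_0$ is rejected if $K$ is large. Specifically, if the sample sizes $n_i$ are large enough ($n_i \ge 6$ when $a = 3$, or $n_i \ge 5$ when $a > 3$), a chi-square distribution with $a - 1$ degrees of freedom gives the critical value of $K$. That is, at significance level $\alpha$, $H_0$ is rejected if $K \ge \chi^2_{\alpha, a-1}$.

A correction to the test statistic $K$ can be applied if too many ties in the ranks occur. Specifically, the corrected $K$ is

$$K = \frac{1}{S^2} \left[ \sum_{i=1}^{a} \frac{R_{i.}^2}{n_i} - \frac{N(N+1)^2}{4} \right],$$

where $S^2$ is the variance of the ranks,

$$S^2 = \frac{1}{N-1} \left[ \sum_{i=1}^{a} \sum_{j=1}^{n_i} R_{ij}^2 - \frac{N(N+1)^2}{4} \right].$$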

Example of the Kruskal-Wallis test (Ex. 16-6)

The data are tensile strength measurements of a fabric for five different percentages of cotton content. It is desired to test whether cotton content affects tensile strength. Each observation is assigned its rank $R_{ij}$.

Many ties occur here, so the corrected test statistic is used, with the variance of the ranks computed as $S^2 = 53.03$. Since the resulting $K$ exceeds $\chi^2_{0.01, 4} = 13.28$, we reject $H_0$ and conclude that the treatments differ. That is, cotton content affects tensile strength.
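The Kruskal-Wallis computation, with and without the tie correction, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the data are hypothetical (the example's own data table is not reproduced here), and the corrected statistic uses the standard form $K = \frac{1}{S^2}\left[\sum_i R_{i.}^2/n_i - N(N+1)^2/4\right]$ with $S^2$ the variance of the ranks.

```python
def average_ranks(values):
    """Ranks 1..n in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Returns (k, k_corrected) for a list of treatment samples."""
    flat = [y for g in groups for y in g]
    ranks = average_ranks(flat)
    big_n = len(flat)
    # Sum of R_i.^2 / n_i over treatments, walking the flat rank list.
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])
        total += r_i ** 2 / len(g)
        start += len(g)
    k = 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)
    centre = big_n * (big_n + 1) ** 2 / 4
    s2 = (sum(r ** 2 for r in ranks) - centre) / (big_n - 1)  # variance of ranks
    k_corrected = (total - centre) / s2
    return k, k_corrected

# Hypothetical data without ties: both forms agree.
k, k_corrected = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(k, 4), round(k_corrected, 4))  # 7.2 7.2
```

When there are no ties, $S^2$ reduces to $N(N+1)/12$ and the two statistics coincide, which is a convenient check on an implementation. The resulting value would then be compared with the chi-square critical value with $a - 1$ degrees of freedom.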
