
Contents

1 Non-parametric Tests
  1.1  Introduction
  1.2  Advantages of Non-parametric Tests
  1.3  Disadvantages of Non-parametric Tests
  1.4  Some Terms Associated with Non-parametric Tests
  1.5  Chi-Square Test for Goodness of Fit
  1.6  Chi-Square Test for Independence of Attributes
  1.7  Other Non-parametric Tests for Goodness of Fit
       1.7.1  Kolmogorov-Smirnov Test for One Sample
       1.7.2  Comparison between the Chi-square Test and the Kolmogorov-Smirnov Test
  1.8  Sign Tests
       1.8.1  One-sample Sign Test
       1.8.2  Paired Sign Test
  1.9  Run Test for Randomness
  1.10 Wilcoxon One Sample Signed Rank Test
  1.11 Wilcoxon Matched Pair Signed Rank Test
  1.12 Wald-Wolfowitz Run Test
       Kolmogorov-Smirnov Test for Two Samples
       Mann-Whitney's U Test
       Median Test
       Spearman's Rank-Correlation Test
       Kendall's Rank Correlation Test
       Difference between Spearman's Rank Correlation and Kendall's Rank Correlation
       Kruskal-Wallis Test
       Friedman's Two-way Analysis of Variance by Ranks


Chapter 1
Non-parametric Tests

1.1 Introduction

Methods of statistical inference can broadly be divided into two categories: parametric and non-parametric. In parametric inference we assume a specific form for the distribution, and the problem consists of estimating its parameters and/or testing hypotheses about them. Non-parametric procedures, by contrast, do not require knowledge of the distribution of the random variable under study, and non-parametric inference is therefore also called distribution-free inference. In a non-parametric test we are concerned with the form of the population distribution rather than with the values of its parameters; hence this type of test is called non-parametric, a term due to Jacob Wolfowitz. In one of his papers published in the Annals of Mathematical Statistics (now the Annals of Statistics) he explained that parametric procedures are those in which one assumes that the distributions have a known form, whereas non-parametric procedures are those that do not require such an assumption. During the 1940s non-parametric methods were widely regarded as mere shortcuts for well-established parametric methods, and in the 1950s as quick but inefficient methods that are wasteful of information. By the 1960s, however, these tests had outgrown such criticism and seemed hardly distinguishable from parametric statistics at all, and by the 1970s the field was recognized as a science of statistical inference procedures resting on weaker assumptions about the underlying distribution. Thus, non-parametric statistics is the subfield of statistics that provides inference procedures relying on weaker assumptions about the underlying population distribution than parametric procedures do. Since such procedures assume less about the underlying distribution, an incorrect assessment of the nature of that distribution usually has less effect on non-parametric procedures than on parametric procedures, which rely more heavily on that assessment being correct. However, the more information we have about the underlying distribution, the better the inference. So, for a given situation, a non-parametric procedure will usually have greater variance in point estimation, less power in hypothesis testing, wider intervals in confidence interval estimation and higher risk in decision theory than the corresponding parametric procedure, provided the assumptions of the latter are not violated.

Since no assumption is made about the form of the probability distribution of the population from which the sample is drawn, these methods are also termed distribution-free methods. Strictly speaking, the terms non-parametric and distribution-free do not mean the same thing: the former indicates that the inference does not concern the parameters of the distribution, while the latter indicates that the procedure does not require knowledge of the distribution of the variate under study. If the method used to solve a statistical problem depends neither on the form of the parent distribution nor on its parameters, the procedure is said to be distribution-free. There are also procedures that depend on the form of the parent distribution but not on the values of its parameters; such procedures may be termed parameter-free. Thus both parametric and non-parametric methods may or may not be distribution-free, but since distribution-free procedures are widely used in non-parametric problems the two terms are often used interchangeably.

According to Gibbons, a statistical technique is said to be non-parametric if it satisfies at least one of the following five criteria:
(i) The data are count data of the number of observations in each category.
(ii) The data are measured on a nominal scale.
(iii) The data are measured on an ordinal scale.
(iv) The inference does not concern a parameter.
(v) The assumptions are general rather than specific.

1.2 Advantages of Non-parametric Tests

Non-parametric tests have certain advantages over parametric methods, some of which are as follows:
1. They are simple to understand.
2. The calculations involved are relatively simple compared with parametric tests. These tests can also be used when the actual measurements are not available but only the ranks of the observations are given, and they can be applied to data measured on a nominal or ordinal scale.
3. Non-parametric tests rest on very mild assumptions compared with parametric tests and can therefore be applied easily. Frequently it is assumed only that the variables come from a continuous distribution. Parametric tests rest on stronger assumptions and may not give proper results when those assumptions are violated.
4. There is no restriction on the minimum sample size required for valid and reliable results; even with a small sample, non-parametric methods are quite powerful.

1.3 Disadvantages of Non-parametric Tests

Some of the disadvantages commonly encountered with non-parametric methods are as follows:
1. Although the assumptions of non-parametric tests are less restrictive than those of parametric tests, the assumption of independence is just as important in non-parametric tests as in parametric ones.
2. Although non-parametric tests are often claimed to be computationally simple, this is not always true; some non-parametric tests demand a great deal of calculation.
3. In estimation, parametric methods are more robust than non-parametric methods, in the sense that parametric estimates remain unbiased even when the underlying assumption of normality is violated.
4. Parametric tests are more efficient than non-parametric tests: a parametric test requires a smaller sample size than a non-parametric test to achieve the same power, as the following table shows.

   Test situation           Parametric test       Non-parametric test          Efficiency*
   Single mean              t-test                Sign test                    0.63
   Two independent means    Two-sample t-test     Mann-Whitney U-test          0.95
   Two dependent means      Paired t-test         Wilcoxon signed rank test    --

   * Efficiency is the ratio of the sample size of the best parametric test to the sample size of the best non-parametric test of equal power.

5. Non-parametric tests cannot handle complicated designs the way parametric tests can. Friedman's two-way analysis of variance by ranks is the most complex analysis that can be managed by a non-parametric procedure; non-parametric procedures for the ANOVA of split-plot, strip-plot, nested and similar designs are yet to be developed.

Note:
1. Some of the assumptions on which non-parametric tests may be based are:
(i) The sample observations are independent.
(ii) The variable under study is continuous.
(iii) The probability density function of the random variable is continuous.
(iv) Lower order moments exist.
Clearly these assumptions are fewer and much weaker than those associated with parametric inference.
2. In deciding whether to solve a statistical problem by a parametric or a non-parametric method, one should consider whether the parent distribution is to depend on a finite number of parameters or only on some general assumptions, such as continuity of the distribution. It is usually advisable to use procedures that eliminate assumptions about the underlying distribution whenever the validity of those assumptions is seriously in doubt. In this way we capture the greatest gain from both the parametric and the non-parametric approaches to a problem.

1.4 Some Terms Associated with Non-parametric Tests

1. Run: A run is a sequence of identical symbols followed or preceded by symbols of another type, or by no symbols at all. For example, consider the sequence MMMFFFFMFMMF. It contains 6 runs in all, 3 runs of M and 3 runs of F. The number of runs in a sequence is taken as an indicator of randomness.
2. Ties: While ranking the data in non-parametric tests we often find two or more observations with the same value. In such a case a tie is said to have occurred.
3. Nominal scale of measurement: The most elementary scale of measurement is one which merely identifies the categories into which the subject under measurement can be classified. The categories are mutually exclusive.
4. Ordinal scale of measurement: This scale incorporates the classifying and labelling of the nominal scale but, in addition, arranges the categories in a proper order; in other words, it also indicates ranks.
5. Contingency table: A contingency table is a two-way table in which the columns are classified according to one criterion or attribute and the rows according to another. This produces a number of cells, and the entry in a particular cell is the number of observations at one level of the first attribute cross-classified with a level of the second attribute.

1.5 Chi-Square Test for Goodness of Fit

Purpose

This test measures the discrepancy between the observed frequencies and the theoretical frequencies determined from an assumed distribution for the same event. In a parametric test of hypothesis we assume the form of the parent distribution and then test some aspect of the population; such tests are generally based on the assumption that the population is normally distributed. The suitability of the normal distribution, or of any other distribution, can itself be verified by means of a goodness-of-fit test. Let a random sample of size n be drawn from a population with unknown c.d.f. F. We want to test the null hypothesis H_0: F(x) = F_0(x) for all x against the alternative hypothesis H_1: F(x) ≠ F_0(x) for some x. The test is due to Karl Pearson and is the oldest non-parametric method.

Assumptions

1. The data are at the nominal level of measurement and grouped into several categories.
2. For the chi-square test to apply, the expected frequencies in the various categories should be reasonably large, i.e. at least 5.
3. The sum of the observed frequencies equals the sum of the expected frequencies, i.e. Σ e_i = Σ o_i.

The Test Statistic

The test is performed in the following manner:

Step I: The sample observations are classified into several categories if they are not already so arranged.

Step II: Assuming that H_0 completely specifies F_0, one can obtain the probability p_i that the random variable X falls in the i-th category (i = 1, 2, ..., k). These probabilities multiplied by the sample size n give the expected numbers in the categories, i.e. e_i = np_i for i = 1, 2, ..., k.

Step III: These e_i are then compared with the observed frequencies o_i of the different categories. Pearson proposed the test statistic

\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},

which under H_0 follows a \chi^2 distribution with k - 1 degrees of freedom. If the o_i and e_i of each category are close to each other, the calculated value of the χ² statistic will be small; otherwise it will be large. The larger the value of χ², the more likely it is that the o_i do not come from F_0.

For the χ² goodness-of-fit test it is essential that each e_i be at least 5. If some e_i < 5, two or more categories are combined until every expected frequency is at least 5; this is called pooling. The same pooling is applied to the observed frequencies, which reduces the number of categories and hence the degrees of freedom. The calculated value of χ² is then compared with the critical value from the table, and H_0 is rejected if the calculated value exceeds the critical value. If the parameters of F_0 are not known, they are estimated; the estimates are substituted into F_0 and the e_i obtained accordingly. In such a case the degrees of freedom are further reduced by the number of parameters estimated. This is probably the most commonly used non-parametric test as well as the oldest, and its computational simplicity is probably the reason for its popularity.

1.6 Chi-Square Test for Independence of Attributes

Purpose

This test is used to check whether two attributes under consideration are independent of each other. Let A and B be two attributes, where A is divided into r classes A_1, A_2, ..., A_r and B is divided into s classes B_1, B_2, ..., B_s.

The categories under the two attributes can be cross-classified into an r × s two-way table, commonly called the contingency table:

            A_1        A_2       ...   A_i        ...   A_r        Total
   B_1    (A_1B_1)   (A_2B_1)   ...  (A_iB_1)   ...  (A_rB_1)    (B_1)
   B_2    (A_1B_2)   (A_2B_2)   ...  (A_iB_2)   ...  (A_rB_2)    (B_2)
   ...
   B_j    (A_1B_j)   (A_2B_j)   ...  (A_iB_j)   ...  (A_rB_j)    (B_j)
   ...
   B_s    (A_1B_s)   (A_2B_s)   ...  (A_iB_s)   ...  (A_rB_s)    (B_s)
   Total   (A_1)      (A_2)     ...   (A_i)     ...   (A_r)        N

Here (A_iB_j) denotes the number of cases possessing both attribute A_i (i = 1, 2, ..., r) and attribute B_j (j = 1, 2, ..., s), and N is the grand total. We want to test the null hypothesis

H_0: the two attributes A and B are independent of each other

against the alternative hypothesis

H_1: the attributes A and B are dependent on each other.

Assumptions

1. The data are at the nominal level of measurement and grouped into several categories.
2. The subjects in each group are randomly and independently selected.
3. For the chi-square test to apply, the expected frequencies in the various cells should be reasonably large, i.e. at least 5.

The Test Statistic

Under the null hypothesis that the attributes are independent, the theoretical cell frequencies are calculated as follows:

P[A_i] = probability that a subject possesses attribute A_i = (A_i)/N,  i = 1, 2, ..., r
P[B_j] = probability that a subject possesses attribute B_j = (B_j)/N,  j = 1, 2, ..., s
P[A_iB_j] = probability that a subject possesses both attributes A_i and B_j
          = P[A_i] P[B_j] = (A_i)/N · (B_j)/N,  i = 1, 2, ..., r;  j = 1, 2, ..., s
E[A_iB_j] = expected number of subjects possessing both the attributes A_i and B_j

          = N · P[A_iB_j] = N · (A_i)/N · (B_j)/N = (A_i)(B_j)/N.

Using this formula we can find the expected frequency for each cell (A_iB_j), i = 1, 2, ..., r and j = 1, 2, ..., s. Under the hypothesis of independence the test statistic is

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\{(A_iB_j) - E(A_iB_j)\}^2}{E(A_iB_j)},

which follows a \chi^2 distribution with (r - 1)(s - 1) degrees of freedom. The calculated value of χ² is then compared with the critical value at the desired level of significance; H_0 is rejected if the calculated value exceeds the critical value obtained from the table, and otherwise the decision is taken in favour of H_0.

Note:

1. A particular case is the test of independence in a 2 × 2 contingency table, obtained when each of the two attributes has two levels. Putting r = 2 and s = 2, the contingency table is:

           A_1      A_2     Total
   B_1      a        b      a + b
   B_2      c        d      c + d
   Total  a + c    b + d    a + b + c + d = n

The direct formula for the test statistic is

\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)},

which follows a \chi^2 distribution with 1 degree of freedom. However, if an expected frequency in a 2 × 2 contingency table is less than 5, the test does not hold good. Yates suggested that in such a case 0.5 be added to the small frequency whose expected frequency is less than 5, with the other cell frequencies adjusted by adding or subtracting 0.5 so that the marginal totals remain the same; the calculations are then done afresh. A direct formula incorporating this correction is

\chi^2 = \frac{n(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)},

again with 1 degree of freedom. This correction is valid for 2 × 2 contingency tables only.

2. If the calculation results in rejection of the null hypothesis, one may wish to know which cell or cells are responsible for disturbing the independence. For this we calculate the Pearson residual for each cell, which follows a standard normal distribution and is given by

Z_{ij} = \frac{o_{ij} - e_{ij}}{\sqrt{e_{ij}}} \sim N(0, 1).

Thus a cell contributes significantly at the 5% level if the calculated value of |Z_{ij}| exceeds 1.96.
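As a quick sketch of the machinery just described, the following Python snippet computes χ² for a small hypothetical 2 × 2 table, both from the Yates-corrected direct formula above and with scipy.stats.chi2_contingency (which applies the Yates correction to 2 × 2 tables by default). The cell counts are made up purely for illustration; this is not data from the text.

```python
# A minimal sketch, assuming numpy and scipy are available.
# The 2x2 cell counts below are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency, chi2

table = np.array([[12, 28],    # a, b
                  [18, 22]])   # c, d
a, b = table[0]
c, d = table[1]
n = table.sum()

# Yates-corrected direct formula for a 2x2 table
chi2_direct = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

# scipy applies the Yates correction for 2x2 tables when correction=True (the default)
chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=True)

print("direct (Yates):", chi2_direct)
print("scipy chi2:", chi2_stat, "p-value:", p_value, "d.f.:", dof)
print("critical value at 5%:", chi2.ppf(0.95, dof))
```

The two χ² values agree, and comparing the statistic with the 5% critical value (or the p-value with 0.05) gives the same decision rule as described above.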

Illustration 1: Test the goodness of fit, using the χ² test, of a Poisson distribution fitted to data giving the number of cells per square (X) and the number of squares (f) with each count. [Frequency table not preserved in this copy.]

Solution: We are to test H_0: the data come from a Poisson distribution. Since the parameter of the distribution is not known, we estimate it and then use the estimate to fit the distribution. The mean is the parameter λ of a Poisson distribution, so we first compute the mean from the frequency table of X and f. The surviving totals are

\sum f_i = 400, \qquad \sum f_i x_i = 529,

so that

\lambda = \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{529}{400} = 1.3225.

The fitted probability mass function can therefore be written as

P(X = x) = \frac{e^{-1.3225}(1.3225)^x}{x!}, \qquad x = 0, 1, 2, \ldots,

and the expected frequencies are

e_i = N \, P(X = x) = 400 \, \frac{e^{-1.3225}(1.3225)^x}{x!}.

To calculate the expected frequencies and then the χ² statistic, we construct a table with columns X, f_i = o_i, P(X = x), e_i = N P(X = x) and (o_i − e_i)²/e_i.

Summing the last column of the table gives the calculated value of χ², which is then compared with the table value of χ² at the 5% level of significance for 6 − 1 − 1 = 4 degrees of freedom.(2) Since the calculated value of χ² is less than the tabulated value, the null hypothesis is accepted and we conclude that the Poisson distribution fits the data quite well.

(2) After pooling there are 6 cells, and the parameter λ of the distribution was estimated from the data, so the degrees of freedom for the test are (6 − 1) − 1 = 4.

Illustration 2: A sample of 200 retired men was classified according to education and number of children. Test the hypothesis that the size of the family is independent of the level of education attained by the father. The observed counts form a 3 × 3 contingency table with rows Elementary, Secondary and College education and columns for the number of children, the last column being "over 3". [Cell counts not preserved in this copy.]

Solution: We are to test the hypothesis H_0: the size of the family is independent of the level of education of the father. To perform the test we first obtain the row and column totals and then calculate the expected frequencies. Let e_{ij} denote the expected frequency of the cell in the i-th row and j-th column; each e_{ij} is the product of the corresponding row and column totals divided by the grand total 200. This gives

e_11 = 18.7, e_12 = 39.8, e_21 = 17.6, e_22 = 37.4,

e_31 = 8.8, e_32 = 18.7 and e_33 = 11.5, the remaining expected frequencies being obtained in the same way. To calculate the value of the test statistic

\chi^2 = \sum_{i,j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},

we construct a table of the observed frequencies o_{ij}, the expected frequencies e_{ij} and the quantities (o_{ij} − e_{ij})²/e_{ij}; the total of the last column is 7.44. Thus the calculated value of χ² is 7.44, while the tabulated value of χ² for (3 − 1)(3 − 1) = 4 degrees of freedom at the 5% level of significance is 9.49. Since the calculated value is less than the tabulated value, we accept the null hypothesis and conclude that the two attributes, size of the family and education attained by the father, are independent.
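The fit-and-test recipe of Illustration 1 (estimate λ, compute Poisson expected frequencies, pool small cells, then reduce the degrees of freedom by one for the estimated parameter) can be sketched in Python. Since the original frequency table is not reproduced above, the observed counts used here are hypothetical, chosen only to be consistent with the surviving totals Σf = 400 and Σf·x = 529; the point is the procedure, not the numbers.

```python
# A minimal sketch of the chi-square goodness-of-fit test for a fitted Poisson,
# assuming numpy and scipy are available. The observed frequencies are hypothetical,
# chosen to match the surviving totals of Illustration 1 (sum f = 400, sum f*x = 529).
import numpy as np
from scipy.stats import poisson, chi2

x = np.arange(7)                                  # counts 0, 1, ..., 6
observed = np.array([103, 143, 98, 42, 8, 4, 2])  # hypothetical frequencies
n = observed.sum()

lam = (x * observed).sum() / n                    # lambda estimated by the sample mean

# Expected frequencies; the last cell collects the upper tail so they sum to n
probs = poisson.pmf(x, lam)
probs[-1] += poisson.sf(x[-1], lam)
expected = n * probs

# Pool cells from the right until every (tail) expected frequency is at least 5
while expected[-1] < 5 and len(expected) > 2:
    expected[-2] += expected[-1]
    observed[-2] += observed[-1]
    expected = expected[:-1]
    observed = observed[:-1]

stat = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1 - 1                        # one extra d.f. lost for the estimated lambda
print("chi-square:", stat, "d.f.:", df,
      "critical value at 5%:", chi2.ppf(0.95, df))
```

The degrees-of-freedom adjustment in the last lines is the step that is easy to forget when a parameter has been estimated from the data.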

1.7 Other Non-parametric Tests for Goodness of Fit

In 1933 the two Russian statisticians Kolmogorov and Smirnov developed distribution-free techniques based on empirical distribution functions. These procedures use the maximum vertical distance between distribution functions, either to decide whether a random sample comes from a pre-specified distribution or to test whether two separate data sets have the same distribution. In the first case the maximum vertical distance is measured between the empirical distribution function and the hypothesized distribution function, and in the second case between two empirical distribution functions.

1.7.1 Kolmogorov-Smirnov Test for One Sample

Purpose

This test is used to check whether the random sample under consideration is drawn from a population with a specified cumulative distribution function F_0(x), i.e. to test the hypothesis H_0: F(x) = F_0(x) against the alternative H_1: F(x) ≠ F_0(x).

Assumptions

X_1, X_2, ..., X_n is a random sample of size n drawn from a continuous population.

Derivation of the Test Statistic

The test, due to Kolmogorov and Smirnov (1933), is based on the empirical distribution function of a continuously distributed random variable. Let X_1, X_2, ..., X_n be a random sample from an unknown continuous population with cumulative distribution function F(x), and let x_(1), x_(2), ..., x_(n) be the corresponding order statistics, x_(i) being the i-th value in this arrangement. The empirical distribution function is defined by

F_n(x) = \begin{cases} 0, & x < x_{(1)}, \\ i/n, & x_{(i)} \le x < x_{(i+1)}, \quad i = 1, 2, \ldots, n-1, \\ 1, & x \ge x_{(n)}. \end{cases}

Thus nF_n(x) is the number of sample observations that are less than or equal to x. If F is the distribution function of X, then

P\!\left[F_n(x) = \tfrac{k}{n}\right] = \binom{n}{k}[F(x)]^k[1 - F(x)]^{n-k}, \qquad k = 0, 1, 2, \ldots, n,

so that, for a fixed value of x, nF_n(x) \sim B(n, F(x)). This implies

E(nF_n(x)) = nF(x) \;\Rightarrow\; E(F_n(x)) = F(x),

Var(nF_n(x)) = nF(x)\{1 - F(x)\} \;\Rightarrow\; Var(F_n(x)) = \frac{F(x)\{1 - F(x)\}}{n} \to 0 \text{ as } n \to \infty.

Thus F_n(x) is an unbiased and consistent estimator of F(x). Under the null hypothesis H_0: F(x) = F_0(x), F_n(x) is therefore an unbiased and consistent estimator of F_0(x), and Kolmogorov and Smirnov argued that under the null hypothesis the empirical distribution function F_n(x) approaches the true distribution F_0(x) specified by the null hypothesis. They defined the test statistic as

D_n = \sup_x |F_n(x) - F_0(x)|.

Under the null hypothesis one would expect the value of D_n to be small, while a large value of D_n indicates that the actual distribution is not F_0(x), i.e. a violation of the null hypothesis. Thus H_0 is rejected if and only if the observed value of D_n for the given sample size exceeds the critical value of D_n, at the chosen level of significance, in the table "Critical Values of the Kolmogorov-Smirnov Statistic". For large samples the critical values at the 1%, 5% and 10% levels of significance can be obtained from the following table:

   Level of significance (α)     1%          5%          10%
   D_{n,α}                      1.63/√n     1.36/√n     1.22/√n

Limitations

1. It applies only to continuous distributions.
2. It tends to be more sensitive near the centre of the distribution than at the tails. With this issue in mind, Doksum (1977) improved the K-S statistic by dividing it by the factor F(x)(1 − F(x)). This factor acts as a variance equalizer, and the resulting band is slightly wider in the middle and much narrower at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified: if location, scale or shape parameters are estimated from the data, the critical region of the K-S test is no longer valid.

Illustration 3: A data set is claimed to have come from a uniform distribution on [0, 1]. The random sample is 0.44, 0.76, 0.13, 0.27, 0.97, 0.45, 0.1, 0.94, 0.39, 0.53, 0.85, 0.45, 0.23, 0.98, 0.5. Apply an appropriate test to check the validity of the claim.

Solution: The null hypothesis of interest is H_0: the data come from U(0, 1). If X ~ U(0, 1), then

F_0(x) = \int_0^x 1 \, dt = x, \qquad 0 \le x \le 1.

The empirical distribution function is

F_n(x) = \begin{cases} 0, & x < x_{(1)}, \\ i/n, & x_{(i)} \le x < x_{(i+1)}, \quad i = 1, 2, \ldots, n-1, \\ 1, & x \ge x_{(n)}. \end{cases}

To find the values of F_0(x), F_n(x) and |F_n(x) − F_0(x)|, we construct a table with columns x_i, x_(i), F_n(x), F_0(x) and |F_n(x) − F_0(x)|.

The largest entry in the last column gives the Kolmogorov-Smirnov statistic, D_n = sup_x |F_n(x) − F_0(x)|. Comparing this with the critical value of D_n at the 5% level of significance for n = 15, the calculated value is less than the tabulated value, so we accept the null hypothesis and conclude that the sample is drawn from U(0, 1).

Illustration 4: A frequency table gives the values of a random variable X and their corresponding frequencies f. [Values not preserved in this copy.] Using the Kolmogorov-Smirnov statistic, check whether the random numbers come from a Poisson distribution with mean 7.6.

Solution: The null hypothesis of interest is H_0: the random numbers follow a Poisson distribution with mean 7.6. If X ~ Poisson(7.6), then

F_0(z) = \sum_{x=0}^{z} \frac{e^{-\lambda}\lambda^x}{x!}, \qquad \lambda = 7.6.

The empirical distribution function is given by

F_n(z) = \begin{cases} 0, & z < 0, \\ cu(z)/N, & 0 \le z < \max(X), \\ 1, & z \ge \max(X), \end{cases}

where cu(z) is the cumulative frequency of X at the point X = z and N is the total frequency. To find the values of F_0(x), F_n(x) and |F_n(x) − F_0(x)|, we construct a table with columns x_i, f_i, cu(x_i), F_n(x_i), F_0(x_i) and |F_n(x_i) − F_0(x_i)|.

The largest entry in the last column gives the calculated value of the Kolmogorov-Smirnov statistic, D_n = sup_x |F_n(x) − F_0(x)|. The tabulated value of the statistic at the 5% level of significance for a large sample is D_{n,0.05} = 1.36/√N. Here the calculated value exceeds the tabulated value, so we reject the null hypothesis and conclude that the sample is not from Poisson(7.6).

1.7.2 Comparison between the Chi-square Test and the Kolmogorov-Smirnov Test

Both the K-S test and the chi-square test are distribution-free tests in the sense that the sampling distribution of the test statistic does not depend on the distribution of the variable under consideration. There are, however, several points on which the two tests differ:

1. The K-S test is more powerful than the chi-square test when the sample size n is small or when the expected frequencies e_i are small.
2. The chi-square test is specifically meant for categorical data, whereas the K-S test is for random samples from continuous populations.
3. The K-S test uses each of the n observations individually, which the chi-square test cannot do, since it requires the data to be arranged into categories. The K-S statistic therefore makes better use of the available information.
4. The chi-square test can be used for both discrete and continuous distributions, but the K-S test is restricted to continuous distributions only.
5. For the K-S test the hypothesized distribution must be completely specified, whereas the chi-square test can be applied when only the form of the distribution is known; the parameters can then be estimated and the test performed with an adjustment to the degrees of freedom.
6. The computation involved in the K-S test is simpler than that of the chi-square test.
7. The K-S test is more flexible than the chi-square test, as it can also be used to determine the minimum sample size and to construct a confidence band.
8. The K-S test can be used as a one-sided test, which is not possible for a chi-square test.
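The one-sample K-S calculation of Illustration 3 can be reproduced with scipy. The sketch below, which assumes scipy is installed, computes D_n for the fifteen uniform observations together with the p-value, and prints the large-sample 5% critical value quoted in the table above for comparison.

```python
# A minimal sketch of the one-sample Kolmogorov-Smirnov test of Illustration 3,
# assuming scipy is available.
from scipy.stats import kstest

sample = [0.44, 0.76, 0.13, 0.27, 0.97, 0.45, 0.10, 0.94,
          0.39, 0.53, 0.85, 0.45, 0.23, 0.98, 0.50]

# H0: the data come from U(0, 1); 'uniform' defaults to loc=0, scale=1
result = kstest(sample, "uniform")
print("D_n =", result.statistic, "p-value =", result.pvalue)

# Large-sample 5% critical value quoted in the text: 1.36 / sqrt(n)
n = len(sample)
print("approximate 5% critical value:", 1.36 / n ** 0.5)
```

A D_n below the critical value (equivalently, a p-value above 0.05) leads to the same acceptance of H_0 reached in the illustration.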

1.8 Sign Tests

1.8.1 One-sample Sign Test

Purpose

This test is used to check whether the median θ of a distribution differs significantly from a specified value θ_0. Thus we test the null hypothesis H_0: θ = θ_0 against one of the alternatives H_1: θ ≠ θ_0, H_1: θ > θ_0 or H_1: θ < θ_0.

Assumptions

The distribution is continuous in the vicinity of the median θ, so that P(X < θ) = P(X > θ) = 1/2.

Derivation of the Test Statistic

Let a sample X_1, X_2, ..., X_n be drawn from a population with median θ. If the sample comes from a distribution with median θ = θ_0, then about half of the observations will be greater than θ_0 and half less than θ_0. Each observation greater than θ_0 is replaced by a plus sign and each observation less than θ_0 by a minus sign; any value equal to θ_0 is ignored. (Since the distribution is assumed continuous about the median, the probability that any value of the random variable equals the median is zero.) We then count the number of plus signs (r) and the number of minus signs (s), so that r + s ≤ n. For this test we consider only the value of r, which follows a binomial distribution; under the null hypothesis p = 1/2, so r ~ Bin(n, 1/2). The null hypothesis is therefore equivalent to H_0: p = 1/2, tested against H_1: p ≠ 1/2, H_1: p > 1/2 or H_1: p < 1/2, as the case may be.

Case I: Small samples

For a two-sided alternative the test criterion is to reject H_0 if the number of plus signs satisfies r ≥ r_{α/2} or r ≤ r'_{α/2}, where r_{α/2} is the smallest integer such that

\sum_{r=r_{\alpha/2}}^{n} \binom{n}{r}\left(\tfrac{1}{2}\right)^r\left(\tfrac{1}{2}\right)^{n-r} \le \alpha/2,

and r'_{α/2} is the largest integer such that

\sum_{r=0}^{r'_{\alpha/2}} \binom{n}{r}\left(\tfrac{1}{2}\right)^r\left(\tfrac{1}{2}\right)^{n-r} \le \alpha/2.

Case II: Large samples

If the sample size is large, i.e. n ≥ 25, the normal test can be used to decide about H_0.

The Z statistic is given by

Z = \frac{(r + 0.5) - n/2}{\sqrt{n/4}} \quad \text{if } r < n/2, \qquad Z = \frac{(r - 0.5) - n/2}{\sqrt{n/4}} \quad \text{if } r > n/2.

At the 5% level of significance the null hypothesis is accepted if the calculated value of the Z statistic is less than 1.96 in absolute value.

Illustration 5: Suppose we want to test the hypothesis that the median body length θ of frogs of a particular variety is θ_0 = 6.9 cm against the alternative hypothesis θ ≠ 6.9 cm, with α = 0.05, on the basis of the following measurements:

6.3, 5.8, 7.7, 8.5, 5.2, 6.7, 7.3, 5.6, 8.3, 7.7, 8.2, 6.0, 6.8, 6.9, 6.3, 7.3, 7.0, 7.1, 6.6, 7.4

Solution: We set up the null hypothesis H_0: θ = 6.9 to be tested against the alternative hypothesis H_1: θ ≠ 6.9. Writing + for values greater than 6.9, − for values less than 6.9 and 0 for values equal to 6.9, we get

−, −, +, +, −, −, +, −, +, +, +, −, −, 0, −, +, +, +, −, +.

Thus the number of positive signs is 10, the number of negative signs is 9, and so n = 19. We then construct a table with columns x_i, d_i = x_i − 6.9, |d_i| and the rank R_i of |d_i|, the observation equal to 6.9 being ignored.

The sum of the ranks with positive d_i is T_+ = 99.5 and the sum of the ranks with negative d_i is T_− = 90.5, so T = min(T_+, T_−) = min(99.5, 90.5) = 90.5. Since one observation is ignored we have n = 19, and for a two-sided alternative the table value is T_α = 46 for n = 19 at the α = 0.05 level of significance. Since T_α < T, we accept the null hypothesis and conclude that H_0: θ = 6.9 is true.

Illustration 6: A drug was injected into a fresh group of 10 rats every day. The scientist in charge of the experiment claimed that, on average, not more than 3 rats showed an increase in blood pressure. The increase in blood pressure was noticed in the following numbers of rats over the last 10 days after the drug was administered: 2, 4, 5, 1, 6, 3, 2, 1, 7 and 8.

Solution: The null hypothesis is H_0: µ = 3, tested against H_1: µ > 3. We record the sign of d_i = x_i − 3 for each observation, ignoring the value equal to 3. Thus n = 9 and the number of plus signs is x = 5. Under the null hypothesis X ~ B(9, 1/2), so

P(X \ge 5) = 1 - P(X < 5) = 1 - \sum_{x=0}^{4}\binom{9}{x}\left(\tfrac{1}{2}\right)^x\left(\tfrac{1}{2}\right)^{9-x}
           = 1 - \left(\tfrac{1}{2}\right)^9 (1 + 9 + 36 + 84 + 126) = 1 - \tfrac{256}{512} = 0.5.

Thus P(X ≥ 5) > 0.05 = α, so H_0 is accepted and the claim made by the scientist is upheld.
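The binomial calculation in Illustration 6 is exactly what a one-sided exact sign test does. The following sketch, assuming a recent scipy (version 1.7 or later, which provides scipy.stats.binomtest), reproduces it.

```python
# A minimal sketch of the one-sample sign test of Illustration 6,
# assuming scipy >= 1.7 so that scipy.stats.binomtest is available.
from scipy.stats import binomtest

data = [2, 4, 5, 1, 6, 3, 2, 1, 7, 8]
theta0 = 3

plus = sum(1 for x in data if x > theta0)   # observations above the hypothesized median
minus = sum(1 for x in data if x < theta0)  # observations below it; ties are dropped
n = plus + minus

# H1: median > 3, so the relevant tail is P(#plus >= observed) under Bin(n, 1/2)
result = binomtest(plus, n, p=0.5, alternative="greater")
print("plus signs:", plus, "of", n, "p-value:", result.pvalue)  # p-value = 0.5 here
```

Since the p-value of 0.5 exceeds α = 0.05, the code reaches the same acceptance of H_0 as the hand calculation.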

1.8.2 Paired Sign Test

Purpose

Based on two random samples X_1, X_2, ..., X_n and Y_1, Y_2, ..., Y_n of the same size, this test is used to check the hypothesis that the two samples come from populations with identical density functions, i.e. H_0: f_1(x) = f_2(y), against a two-sided alternative.

Assumptions

1. The data are available in pairs of observations on the two things being compared, i.e. in the form (X_i, Y_i).
2. For any given pair, the two observations are made under similar conditions.
3. Different pairs are observed under identical conditions.

Derivation of the Test Statistic

Here we have two population p.d.f.s f_1(x) and f_2(y), and based on the two random samples a decision is to be taken about the null hypothesis H_0: f_1(x) = f_2(y). The observations are arranged in pairs (x_i, y_i), i = 1, 2, ..., n, and each pair is assumed to be observed under identical conditions. The difference d_i = x_i − y_i is measured and only its sign (+ or −) is noted in place of the actual deviation. Under the null hypothesis the probability that the first observation of a pair exceeds the second equals the probability that the second exceeds the first, and the probability of a tie is zero. So H_0 can be written as

H_0: P[X − Y > 0] = 1/2 and P[X − Y < 0] = 1/2.

Define

u_i = 1 if x_i − y_i > 0, and u_i = 0 if x_i − y_i < 0,

so that u_i is a Bernoulli variate with p = P(x_i − y_i > 0) = 1/2 under H_0. Then u = Σ u_i gives the total number of positive deviations, and u is a binomial variate with parameters n and p = 1/2 (under H_0). Let k be the observed number of positive deviations; then

P(U \le k) = \sum_{r=0}^{k}\binom{n}{r} p^r q^{n-r} = \left(\tfrac{1}{2}\right)^n \sum_{r=0}^{k}\binom{n}{r} = p^* \text{ (say)}.

If p* ≤ 0.05 we reject H_0 at the 5% level of significance; if p* > 0.05 we conclude that the data do not go against the null hypothesis, and H_0 is accepted.

1.9 Run Test for Randomness

Purpose

The theory of runs can be used to check the randomness of a set of observations, i.e. to check whether X_1, X_2, ..., X_n can be considered a random sample from a continuous distribution.

Here the null hypothesis is H_0: the observations are random in nature, tested against the alternative that the observations are non-random.

Assumptions

The observations in the sample are obtained under similar conditions.

The Test Statistic

From the sample X_1, X_2, ..., X_n the median is calculated. The sample is then rewritten so that each observation above the sample median is replaced by + and each observation below it by −. If n is odd, the observation equal to the median is ignored; the effective sample size is therefore n − 1 for an odd number of observations and n for an even number. We thus get a series of + and − signs, and the number of runs in this series is counted. Let the total number of runs be k, let n_1 be the number of + signs and n_2 the number of − signs. Since the number of sample values above the median equals the number below it, n_1 = n_2 = m (say), and the effective sample size is 2m. If K is the random variable representing the number of runs, then K takes values 2, 3, ..., 2m with the discrete distribution

P(K = k) = \frac{2\left[\binom{m-1}{k/2-1}\right]^2}{\binom{2m}{m}}, \qquad k = 2, 4, 6, \ldots, 2m,

P(K = k) = \frac{2\binom{m-1}{(k-1)/2}\binom{m-1}{(k-3)/2}}{\binom{2m}{m}}, \qquad k = 3, 5, 7, \ldots, 2m-1.

If n is the initial sample size, then for 5 ≤ n ≤ 40 the critical values of K can be obtained from the table. For n > 40, K can be assumed to follow a normal distribution with mean (2n − 1)/3 and variance (16n − 29)/90. In both cases the null hypothesis is rejected if the test statistic lies in the critical region.

Illustration 7: A coin is tossed 14 times and the following sequence of heads (H) and tails (T) is obtained:

H T T H H H T H T T H H T H

Test, using the run test, whether the heads and tails occur in random order. (Given α = 0.05, K_L = 3, K_U = 12.)

Solution: Let H_0: heads and tails occur in random order. Here the runs can be read off directly from the sequence, so we have n = 14 and K = number of runs = 9. Since the observed total number of runs lies between the critical values 3 and 12, we accept H_0 and conclude that the heads and tails occur in random order.
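The run count in Illustration 7 is easy to obtain programmatically. The plain-Python sketch below counts the runs in the H/T sequence and applies the decision rule with the critical values quoted in the illustration (K_L = 3, K_U = 12 at α = 0.05).

```python
# A minimal sketch of the run test of Illustration 7. The critical values
# K_L = 3 and K_U = 12 are those quoted in the illustration (alpha = 0.05).
def count_runs(sequence):
    """Number of maximal blocks of identical consecutive symbols."""
    if not sequence:
        return 0
    runs = 1
    for previous, current in zip(sequence, sequence[1:]):
        if current != previous:
            runs += 1
    return runs

tosses = list("HTTHHHTHTTHHTH")
k = count_runs(tosses)                     # 9 runs for this sequence
k_lower, k_upper = 3, 12                   # critical values given in the text

if k_lower < k < k_upper:
    print(f"K = {k}: accept H0, the sequence appears random")
else:
    print(f"K = {k}: reject H0, the sequence appears non-random")
```

With K = 9 lying between the two critical values, the code reaches the same acceptance of H_0 as the illustration.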

1.10 Wilcoxon One Sample Signed Rank Test

Purpose

Let a sample X_1, X_2, ..., X_n be drawn from a population with cumulative distribution function F(x). This test is used to check the hypothesis H_0: F(m) = 1/2 for a specified value m (i.e. that m is the population median) against the alternatives H_1: F(m) ≠ 1/2, H_1: F(m) > 1/2 or H_1: F(m) < 1/2.

Assumptions

1. F(x), the cumulative distribution function of the random variable X, is absolutely continuous.
2. The density function f(x) is symmetric about the median, i.e. F(m − x) = 1 − F(x + m) and f(m − x) = f(x + m).
3. No two observations in the random sample are equal.

Derivation of the Test Statistic

Case I

Assume first that m = 0, so that the distribution is symmetric about the origin and F(−x) = 1 − F(x), f(−x) = f(x). To test the null hypothesis H_0: m = 0, or equivalently H_0: F(0) = 1/2, we proceed as follows. We order the absolute values |X_i| and denote the rank of |X_i| by R_i. Define the random variable

ξ_i = −1 if X_i < 0, and ξ_i = +1 if X_i > 0,

and the statistic W = Σ ξ_i R_i, which is called the Wilcoxon statistic. Let V_1, V_2, ..., V_n be random variables with P(V_i = i) = P(V_i = −i) = 1/2, i = 1, 2, ..., n, and let V = Σ_{i=1}^n V_i. Then W = Σ_{i=1}^n ξ_i R_i has the same distribution as V. The mean and variance of W under H_0 are

E(W) = E(V) = \sum_{i=1}^{n} E(V_i) = \sum_{i=1}^{n}\left[i \cdot \tfrac{1}{2} + (-i) \cdot \tfrac{1}{2}\right] = 0,

Var(W) = \sum_{i=1}^{n} Var(V_i) = \sum_{i=1}^{n}\left[E(V_i^2) - \{E(V_i)\}^2\right] = \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.

For sample size n > 25, W is approximately N(0, n(n+1)(2n+1)/6), so

T = \frac{W}{\sqrt{n(n+1)(2n+1)/6}} \sim N(0, 1)

is an approximate statistic for testing H_0: m = 0.

Case II

In general, if the test is of H_0: m = m_0, i.e. H_0: F(m_0) = 1/2, it can be shown that for sample size n > 25

W \sim N\!\left(\frac{n(n+1)}{4}, \; \frac{n(n+1)(2n+1)}{24}\right),

so in this case the test statistic is

T = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1),

an appropriate statistic for testing H_0: m = m_0.

Case III

For testing H_0: m = m_0 when the sample size is small (n < 25), we first find the differences d_i = x_i − m_0. Under the null hypothesis the d_i are independent and come from a population symmetric about 0. We then find the absolute differences |d_i|, arrange them in ascending order and rank them accordingly; absolute differences equal to 0 are ignored. Let the number of remaining observations be n_1 ≤ n after the zero differences are eliminated. In case of tied ranks, each of the tied values is given the average of the ranks they would have received had there been no ties. Let T_+ be the sum of the ranks of the positive d_i and T_− the sum of the ranks of the negative d_i. Then

T_+ + T_- = \frac{n_1(n_1 + 1)}{2}.

The null distributions of T_+ and T_− are identical and symmetric about n_1(n_1 + 1)/4. The smaller of the two values T_+ and T_− is then compared with the value in the table "Critical Values for T in the Wilcoxon Signed Rank Test" for the given level of significance and n_1 observations, and the decision about the null hypothesis is taken accordingly. If the alternative hypothesis is H_1: m > m_0, we reject H_0 if T_− ≤ T_α; if the alternative is H_1: m < m_0, we reject H_0 if T_+ ≤ T_α; and if the alternative is H_1: m ≠ m_0, we reject H_0 if T_− ≤ T_α or T_+ ≤ T_α.

Illustration 8: A medical representative visited 12 doctors in a town. In order to meet the doctors he had to wait 25, 10, 15, 20, 17, 11, 30, 27, 36, 40, 5 and 26 minutes respectively. The senior sales representative had earlier claimed that the doctors kept him waiting for more than 20 minutes on average. Using the Wilcoxon signed rank test, verify the claim made by the senior sales representative at the 5% level of significance.

Solution: The null hypothesis is H_0: µ = 20 minutes, tested against the alternative hypothesis H_1: µ > 20 minutes. To calculate the test statistic we construct a table with columns x_i, x_i − 20, |d_i| = |x_i − 20| and the corresponding ranks.

Here the sum of the positive ranks is T_+ = 40 and the sum of the negative ranks is T_− = 26, so T = min(T_+, T_−) = 26; the effective sample size is n = 11. Since the alternative hypothesis is of the form H_1: µ > µ_0, the relevant test statistic is T_−. Here T_− = 26 > T_{0.05} = 14, so H_0 is accepted, and it is concluded that the average waiting time of the sales representative for the doctors is 20 minutes.

1.11 Wilcoxon Matched Pair Signed Rank Test

Purpose

This test is used to study whether there is a significant difference between two samples consisting of observations in matched pairs. Matched pairs are generally two observations taken on the same item in two different situations. Consider two samples X_1, X_2, ..., X_n and Y_1, Y_2, ..., Y_n obtained from the same items in two different situations. The test checks the hypothesis H_0: there is no significant difference between the two samples, against the alternative H_1: the two samples differ significantly.

Assumptions

1. The distribution functions of both random samples are absolutely continuous.
2. The random samples are independent of each other.

Derivation of the Test Statistic

The test is performed in the following manner:

Step I: For each paired observation obtain the difference in scores, d_i = X_i − Y_i.

Step II: Rank these differences, ignoring the plus and minus signs of the d_i; differences d_i = 0 are ignored. When ranks are tied, assign the average of the tied ranks.

Step III:

Assign to each rank the + or − sign of the difference it represents.

Step IV: Calculate T_+, the sum of the positive ranks, and T_−, the sum of the negative ranks, and hence obtain T = min(T_+, T_−).

Step V:

Case I: If the number of pairs n is at most 25, compare the calculated value of T with the critical value of T obtained from the table for the given sample size and the required level of significance. If the calculated value of T is less than or equal to the critical value, the null hypothesis is rejected.

Case II: If the number of pairs n is greater than 25, we can use the normal approximation to the T statistic:

Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1).

In the case of ties a correction factor u(u² − 1)/48 is introduced into the test statistic as follows:

Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24 - u(u^2 - 1)/48}} \sim N(0, 1).

Note:

1. This test addresses the same problem as the paired sign test, but it is more powerful than the sign test because it takes into account both the sign and the magnitude of the difference in each pair.
2. This test is the non-parametric counterpart of the paired t-test used for correlated data.
3. The calculations involved are similar to those of the Wilcoxon one sample signed rank test.

Illustration 9: Two computer keyboards manufactured by two different companies are put to the test. Twenty typists were chosen at random and asked to type on both keyboards, one after the other, and their speeds were recorded as the number of words typed per minute. [The individual speeds for Keyboard I and Keyboard II are not preserved in this copy.] Using the Wilcoxon signed rank test, verify the claim that the second keyboard is better than the first at the 5% level of significance.

Solution: Let µ_1 be the average number of words per minute typed on Keyboard I and µ_2 the average number of words per minute typed on Keyboard II.

We are to test H_0: µ_1 − µ_2 = 0 against the one-sided alternative hypothesis H_1: µ_1 − µ_2 < 0. To perform the Wilcoxon signed rank test for the paired samples we construct a table with columns x_i, y_i, x_i − y_i, |d_i| = |x_i − y_i| and the corresponding ranks, ignoring the pair with zero difference. Here the sum of the negative ranks is T_− = 61.5, so the sum of the positive ranks is T_+ = 190 − 61.5 = 128.5, T = min(T_+, T_−) = 61.5, and the effective sample size is n = 19. Since the alternative hypothesis is of the form H_1: µ_1 − µ_2 < 0, the relevant test statistic is T_+. Here T_+ = 128.5 > T_{0.05} = 54 for n = 19, so H_0 is accepted, and it can be concluded that the second keyboard is not better than the first at the 5% level of significance.
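In practice the matched-pair test is usually run with scipy.stats.wilcoxon, which implements the T statistic described above. The sketch below assumes a recent scipy is available; since the typing speeds of Illustration 9 are not reproduced here, the paired scores are hypothetical and serve only to show the call.

```python
# A minimal sketch of the Wilcoxon matched-pair signed rank test,
# assuming a recent scipy is available. The paired scores below are hypothetical.
from scipy.stats import wilcoxon

keyboard1 = [42, 55, 38, 47, 60, 51, 44, 39, 58, 49]   # words per minute, hypothetical
keyboard2 = [45, 54, 44, 50, 63, 55, 43, 46, 61, 52]

# H1: Keyboard I is slower than Keyboard II, i.e. the differences x - y tend to be negative
stat, p_value = wilcoxon(keyboard1, keyboard2, alternative="less")
print("signed rank statistic =", stat, "p-value =", p_value)
```

A p-value below 0.05 would support the claim that the second keyboard is better; otherwise H_0 is retained, as in the illustration.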

1.12 Wald-Wolfowitz Run Test

Purpose

The theory of runs can also be used to check whether two random samples are drawn from the same distribution, against the alternative that they are not. It is used to check whether two random samples X_1, X_2, ..., X_{n_1} and Y_1, Y_2, ..., Y_{n_2} can be considered to have come from identical populations.(3) Thus, if F_1 and F_2 are the distribution functions from which the two samples are drawn, the hypothesis to be tested is

H_0: F_1(·) = F_2(·) against H_1: F_1(·) ≠ F_2(·).

Assumptions

1. The observations in the two samples are obtained under similar conditions.
2. The observations are independent of each other.
3. The distributions from which the samples are drawn are continuous.

The Test Statistic

To perform this test we first combine the two samples and arrange the combined observations in ascending order of magnitude. We then rewrite the combined sample in terms of the labels X and Y, where X denotes a member of the first sample and Y a member of the second. This yields runs of X's and Y's; let K be the total number of runs. If both samples come from the same population there will be a thorough mixing of X's and Y's and the number of runs will be large; if the number of runs is small, the distributions are not considered identical and H_0 will be rejected. There are n_1 X's and n_2 Y's, which can arrange themselves in \binom{n_1+n_2}{n_1} ways, each equally likely under H_0. The number of runs K can be either even or odd, and two cases arise.

Case I: K even. Let K = 2m. Then there are m runs of X's and m runs of Y's. For there to be m runs of X's, the X's must be divided into m blocks, which can happen in \binom{n_1-1}{m-1} ways, and similarly for the Y's; the sequence may begin with either an X-run or a Y-run. Thus

P(K = 2m) = \frac{2\binom{n_1-1}{m-1}\binom{n_2-1}{m-1}}{\binom{n_1+n_2}{n_1}}.

Case II: K odd. Let K = 2m + 1. Then there are either (i) m runs of X's and m + 1 runs of Y's, or (ii) m + 1 runs of X's and m runs of Y's. Case (i) can occur in \binom{n_1-1}{m-1}\binom{n_2-1}{m} ways and case (ii) in \binom{n_1-1}{m}\binom{n_2-1}{m-1} ways. Thus the required probability is

P(K = 2m + 1) = \frac{\binom{n_1-1}{m-1}\binom{n_2-1}{m} + \binom{n_1-1}{m}\binom{n_2-1}{m-1}}{\binom{n_1+n_2}{n_1}}.

If the probability of a Type I error is α, the critical value of K (= k_0, say) can be determined from the condition P(K ≤ k_0) ≤ α, using the probabilities above.

(3) This test can be used to study any type of difference between the samples, such as differences in median, variability or skewness.
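The pooling-and-labelling step described above is straightforward to code. The following plain-Python sketch, using two small hypothetical samples, combines the samples, labels each pooled observation by its sample of origin, and counts the runs K on which the test is based.

```python
# A minimal sketch of the run-counting step of the Wald-Wolfowitz test.
# The two samples below are hypothetical; only the mechanics are illustrated.
x = [12.1, 14.3, 9.8, 15.2, 11.7, 13.4]   # sample 1
y = [10.5, 16.8, 17.2, 15.9, 18.1]        # sample 2

# Pool the samples, keep the label of origin, and sort by value
pooled = sorted([(value, "X") for value in x] + [(value, "Y") for value in y])
labels = [label for _, label in pooled]

# K = number of runs of identical labels in the sorted arrangement
k = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

print("pooled labels:", "".join(labels))
print("number of runs K =", k, "with n1 =", len(x), "and n2 =", len(y))
# A small K relative to its null distribution (given above) is evidence against H0.
```

The observed K would then be compared with the critical value k_0 obtained from the distribution of K derived above.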


More information

ANOVA - analysis of variance - used to compare the means of several populations.

ANOVA - analysis of variance - used to compare the means of several populations. 12.1 One-Way Analysis of Variance ANOVA - analysis of variance - used to compare the means of several populations. Assumptions for One-Way ANOVA: 1. Independent samples are taken using a randomized design.

More information

Formulas and Tables by Mario F. Triola

Formulas and Tables by Mario F. Triola Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x f # x x f Mean 1x - x s - 1 n 1 x - 1 x s 1n - 1 s B variance s Ch. 4: Probability Mean (frequency table) Standard deviation P1A or

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

Chapter 18 Resampling and Nonparametric Approaches To Data

Chapter 18 Resampling and Nonparametric Approaches To Data Chapter 18 Resampling and Nonparametric Approaches To Data 18.1 Inferences in children s story summaries (McConaughy, 1980): a. Analysis using Wilcoxon s rank-sum test: Younger Children Older Children

More information

Non-parametric Statistics

Non-parametric Statistics 45 Contents Non-parametric Statistics 45.1 Non-parametric Tests for a Single Sample 45. Non-parametric Tests for Two Samples 4 Learning outcomes You will learn about some significance tests which may be

More information

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Chapte The McGraw-Hill Companies, Inc. All rights reserved. er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Statistics: revision

Statistics: revision NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers

More information

SBAOD Statistical Methods & their Applications - II. Unit : I - V

SBAOD Statistical Methods & their Applications - II. Unit : I - V SBAOD Statistical Methods & their Applications - II Unit : I - V SBAOD Statistical Methods & their applications -II 2 Unit I - Syllabus Random Variable Mathematical Expectation Moments Moment generating

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

Comparison of Two Samples

Comparison of Two Samples 2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation

More information

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity

More information

Non-parametric (Distribution-free) approaches p188 CN

Non-parametric (Distribution-free) approaches p188 CN Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman s rank correlation; the Bootstrap. (ch14

More information

Non-parametric tests, part A:

Non-parametric tests, part A: Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are

More information

Non-Parametric Statistics: When Normal Isn t Good Enough"

Non-Parametric Statistics: When Normal Isn t Good Enough Non-Parametric Statistics: When Normal Isn t Good Enough" Professor Ron Fricker" Naval Postgraduate School" Monterey, California" 1/28/13 1 A Bit About Me" Academic credentials" Ph.D. and M.A. in Statistics,

More information

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions Part 1: Probability Distributions Purposes of Data Analysis True Distributions or Relationships in the Earths System Probability Distribution Normal Distribution Student-t Distribution Chi Square Distribution

More information

CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics

CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics OUTLINES Descriptive Statistics Introduction of central tendency Classification Characteristics Different measures

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Physics 509: Non-Parametric Statistics and Correlation Testing

Physics 509: Non-Parametric Statistics and Correlation Testing Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests

More information

Statistical Inference Theory Lesson 46 Non-parametric Statistics

Statistical Inference Theory Lesson 46 Non-parametric Statistics 46.1-The Sign Test Statistical Inference Theory Lesson 46 Non-parametric Statistics 46.1 - Problem 1: (a). Let p equal the proportion of supermarkets that charge less than $2.15 a pound. H o : p 0.50 H

More information

Design of the Fuzzy Rank Tests Package

Design of the Fuzzy Rank Tests Package Design of the Fuzzy Rank Tests Package Charles J. Geyer July 15, 2013 1 Introduction We do fuzzy P -values and confidence intervals following Geyer and Meeden (2005) and Thompson and Geyer (2007) for three

More information

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878 Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each

More information

Introduction to Statistical Data Analysis III

Introduction to Statistical Data Analysis III Introduction to Statistical Data Analysis III JULY 2011 Afsaneh Yazdani Preface Major branches of Statistics: - Descriptive Statistics - Inferential Statistics Preface What is Inferential Statistics? The

More information

Module 9: Nonparametric Statistics Statistics (OA3102)

Module 9: Nonparametric Statistics Statistics (OA3102) Module 9: Nonparametric Statistics Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 15.1-15.6 Revision: 3-12 1 Goals for this Lecture

More information

1; (f) H 0 : = 55 db, H 1 : < 55.

1; (f) H 0 : = 55 db, H 1 : < 55. Reference: Chapter 8 of J. L. Devore s 8 th Edition By S. Maghsoodloo TESTING a STATISTICAL HYPOTHESIS A statistical hypothesis is an assumption about the frequency function(s) (i.e., pmf or pdf) of one

More information

What is a Hypothesis?

What is a Hypothesis? What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill in this city is μ = $42 population proportion Example:

More information

4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures

4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures Non-parametric Test Stephen Opiyo Overview Distinguish Parametric and Nonparametric Test Procedures Explain commonly used Nonparametric Test Procedures Perform Hypothesis Tests Using Nonparametric Procedures

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

NAG Library Chapter Introduction. G08 Nonparametric Statistics

NAG Library Chapter Introduction. G08 Nonparametric Statistics NAG Library Chapter Introduction G08 Nonparametric Statistics Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric Hypothesis Testing... 2 2.2 Types

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 38 Goodness - of fit tests Hello and welcome to this

More information

Introduction to Biostatistics: Part 5, Statistical Inference Techniques for Hypothesis Testing With Nonparametric Data

Introduction to Biostatistics: Part 5, Statistical Inference Techniques for Hypothesis Testing With Nonparametric Data SPECIAL CONTRIBUTION biostatistics Introduction to Biostatistics: Part 5, Statistical Inference Techniques for Hypothesis Testing With Nonparametric Data Specific statistical tests are used when the null

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Non-parametric Hypothesis Testing

Non-parametric Hypothesis Testing Non-parametric Hypothesis Testing Procedures Hypothesis Testing General Procedure for Hypothesis Tests 1. Identify the parameter of interest.. Formulate the null hypothesis, H 0. 3. Specify an appropriate

More information

Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data?

Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data? Agonistic Display in Betta splendens: Data Analysis By Joanna Weremjiwicz, Simeon Yurek, and Dana Krempels Once you have collected data with your ethogram, you are ready to analyze that data to see whether

More information

Everything is not normal

Everything is not normal Everything is not normal According to the dictionary, one thing is considered normal when it s in its natural state or conforms to standards set in advance. And this is its normal meaning. But, like many

More information

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p.

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p. Nonparametric s Mark Muldoon School of Mathematics, University of Manchester Mark Muldoon, November 8, 2005 Nonparametric s - p. 1/31 Overview The sign, motivation The Mann-Whitney Larger Larger, in pictures

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Formulas and Tables. for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. ˆp E p ˆp E Proportion

Formulas and Tables. for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. ˆp E p ˆp E Proportion Formulas and Tables for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. Ch. 3: Descriptive Statistics x Sf. x x Sf Mean S(x 2 x) 2 s Å n 2 1 n(sx 2 ) 2 (Sx)

More information

Background to Statistics

Background to Statistics FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

Intro to Parametric & Nonparametric Statistics

Intro to Parametric & Nonparametric Statistics Kinds of variable The classics & some others Intro to Parametric & Nonparametric Statistics Kinds of variables & why we care Kinds & definitions of nonparametric statistics Where parametric stats come

More information

Dealing with the assumption of independence between samples - introducing the paired design.

Dealing with the assumption of independence between samples - introducing the paired design. Dealing with the assumption of independence between samples - introducing the paired design. a) Suppose you deliberately collect one sample and measure something. Then you collect another sample in such

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

Inferential statistics

Inferential statistics Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare

More information

Analysis of Variance and Co-variance. By Manza Ramesh

Analysis of Variance and Co-variance. By Manza Ramesh Analysis of Variance and Co-variance By Manza Ramesh Contents Analysis of Variance (ANOVA) What is ANOVA? The Basic Principle of ANOVA ANOVA Technique Setting up Analysis of Variance Table Short-cut Method

More information

= 1 i. normal approximation to χ 2 df > df

= 1 i. normal approximation to χ 2 df > df χ tests 1) 1 categorical variable χ test for goodness-of-fit ) categorical variables χ test for independence (association, contingency) 3) categorical variables McNemar's test for change χ df k (O i 1

More information

Nonparametric Location Tests: k-sample

Nonparametric Location Tests: k-sample Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

INTRODUCTION TO ANALYSIS OF VARIANCE

INTRODUCTION TO ANALYSIS OF VARIANCE CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two

More information

QT (Al Jamia Arts and Science College, Poopalam)

QT (Al Jamia Arts and Science College, Poopalam) QUANTITATIVE TECHNIQUES Quantitative techniques may be defined as those techniques which provide the decision makes a systematic and powerful means of analysis, based on quantitative data. It is a scientific

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Tribhuvan University Institute of Science and Technology 2065

Tribhuvan University Institute of Science and Technology 2065 1CSc. Stat. 108-2065 Tribhuvan University Institute of Science and Technology 2065 Bachelor Level/First Year/ First Semester/ Science Full Marks: 60 Computer Science and Information Technology (Stat. 108)

More information

1 ONE SAMPLE TEST FOR MEDIAN: THE SIGN TEST

1 ONE SAMPLE TEST FOR MEDIAN: THE SIGN TEST NON-PARAMETRIC STATISTICS ONE AND TWO SAMPLE TESTS Non-parametric tests are normally based on ranks of the data samples, and test hypotheses relating to quantiles of the probability distribution representing

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics.

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Study Session 1 1. Random Variable A random variable is a variable that assumes numerical

More information

Data analysis and Geostatistics - lecture VII

Data analysis and Geostatistics - lecture VII Data analysis and Geostatistics - lecture VII t-tests, ANOVA and goodness-of-fit Statistical testing - significance of r Testing the significance of the correlation coefficient: t = r n - 2 1 - r 2 with

More information

STAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data.

STAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data. STAT 518 --- Section 3.4: The Sign Test The sign test, as we will typically use it, is a method for analyzing paired data. Examples of Paired Data: Similar subjects are paired off and one of two treatments

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

CDA Chapter 3 part II

CDA Chapter 3 part II CDA Chapter 3 part II Two-way tables with ordered classfications Let u 1 u 2... u I denote scores for the row variable X, and let ν 1 ν 2... ν J denote column Y scores. Consider the hypothesis H 0 : X

More information

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

More information

LOOKING FOR RELATIONSHIPS

LOOKING FOR RELATIONSHIPS LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an

More information