
Contents

1 Non-parametric Tests
  1.1  Introduction
  1.2  Advantages of Non-parametric Tests
  1.3  Disadvantages of Non-parametric Tests
  1.4  Some Terms Associated with Non-parametric Tests
  1.5  Chi-Square Test for Goodness of Fit
  1.6  Chi-Square Test for Independence of Attributes
  1.7  Other Non-parametric Tests for Goodness of Fit
       1.7.1  Kolmogorov-Smirnov Test for One Sample
       1.7.2  Comparison between the Chi-square Test and the Kolmogorov-Smirnov Test
  1.8  Sign Tests
       1.8.1  One-sample Sign Test
       1.8.2  Paired Sign Test
  1.9  Run Test for Randomness
  1.10 Wilcoxon One Sample Signed Rank Test
  1.11 Wilcoxon Matched Pair Signed Rank Test
  1.12 Wald-Wolfowitz Run Test
       Kolmogorov-Smirnov Test for Two Samples
       Mann-Whitney's U Test
       Median Test
       Spearman's Rank-Correlation Test
       Kendall's Rank Correlation Test
       Difference between Spearman's Rank Correlation and Kendall's Rank Correlation
       Kruskal-Wallis Test
       Friedman's Two-way Analysis of Variance by Ranks


Chapter 1
Non-parametric Tests

1.1 Introduction

Methods of statistical inference can broadly be divided into two categories: parametric and non-parametric. In parametric inference we assume a specific form for the distribution, and the problem consists of estimating its parameters and/or testing hypotheses about them. Non-parametric procedures, by contrast, do not require knowledge of the distribution of the random variable under study, and non-parametric inference is therefore also called distribution-free inference. In a non-parametric test we are concerned with the form of the population distribution rather than with the values of its parameters; hence this type of test is called non-parametric, a term due to Jacob Wolfowitz. In one of his papers published in the Annals of Mathematical Statistics (now the Annals of Statistics) he explained that parametric procedures are those in which one assumes that the distributions have a known form, whereas non-parametric procedures are those that do not require such an assumption. During the 1940s non-parametric methods were widely regarded as mere shortcuts for well-established parametric methods, and in the 1950s as quick but inefficient methods that are wasteful of information. By the 1960s, however, these tests had outgrown such criticism and seemed hardly distinguishable from parametric statistics at all, and by the 1970s the field was recognized as a science of statistical inference procedures resting on weaker assumptions about the underlying distribution. Thus, non-parametric statistics is the subfield of statistics that provides inference procedures relying on weaker assumptions about the underlying population distribution than parametric procedures do. Since such procedures assume less about the underlying distribution, an incorrect assessment of the nature of that distribution usually has less effect on non-parametric procedures than on parametric procedures, which rely more heavily on that assessment being correct. However, the more information we have about the underlying distribution, the better the inference. So, for a given situation, a non-parametric procedure will usually have greater variance in point estimation, less power in hypothesis testing, wider intervals in confidence interval estimation and higher risk in decision theory than the corresponding parametric procedure, provided the assumptions of the latter are not violated.

Since no assumption is made about the form of the probability distribution of the population from which the sample is drawn, these methods are also termed distribution-free methods. Strictly speaking, the terms non-parametric and distribution-free do not mean the same thing: the former indicates that the inference does not concern the parameters of the distribution, while the latter indicates that the procedure does not require knowledge of the distribution of the variate under study. If the method used to solve a statistical problem depends neither on the form of the parent distribution nor on its parameters, the procedure is said to be distribution-free. There are also procedures that depend on the form of the parent distribution but not on the values of its parameters; such procedures may be termed parameter-free. Thus both parametric and non-parametric methods may or may not be distribution-free, but since distribution-free procedures are widely used in non-parametric problems the two terms are often used interchangeably.

According to Gibbons, a statistical technique is said to be non-parametric if it satisfies at least one of the following five criteria:
(i) The data are count data of the number of observations in each category.
(ii) The data are measured on a nominal scale.
(iii) The data are measured on an ordinal scale.
(iv) The inference does not concern a parameter.
(v) The assumptions are general rather than specific.

1.2 Advantages of Non-parametric Tests

Non-parametric tests have certain advantages over parametric methods, some of which are as follows:
1. They are simple to understand.
2. The calculations involved are relatively simple compared with parametric tests. These tests can also be used when the actual measurements are not available but only the ranks of the observations are given, and they can be applied to data measured on a nominal or ordinal scale.
3. Non-parametric tests rest on very mild assumptions compared with parametric tests and can therefore be applied easily. Frequently it is assumed only that the variables come from a continuous distribution. Parametric tests rest on stronger assumptions and may not give proper results when those assumptions are violated.
4. There is no restriction on the minimum sample size required for valid and reliable results; even with a small sample, non-parametric methods are quite powerful.

1.3 Disadvantages of Non-parametric Tests

Some of the disadvantages commonly encountered with non-parametric methods are as follows:
1. Although the assumptions of non-parametric tests are less restrictive than those of parametric tests, the assumption of independence is just as important in non-parametric tests as in parametric ones.
2. Although non-parametric tests are often claimed to be computationally simple, this is not always true; some non-parametric tests demand a great deal of calculation.
3. In estimation, parametric methods are more robust than non-parametric methods, in the sense that parametric estimates remain unbiased even when the underlying assumption of normality is violated.
4. Parametric tests are more efficient than non-parametric tests: a parametric test requires a smaller sample size than a non-parametric test to achieve the same power, as the following table shows.

   Test situation           Parametric test       Non-parametric test          Efficiency*
   Single mean              t-test                Sign test                    0.63
   Two independent means    Two-sample t-test     Mann-Whitney U-test          0.95
   Two dependent means      Paired t-test         Wilcoxon signed rank test    --

   * Efficiency is the ratio of the sample size of the best parametric test to the sample size of the best non-parametric test of equal power.

5. Non-parametric tests cannot handle complicated designs the way parametric tests can. Friedman's two-way analysis of variance by ranks is the most complex analysis that can be managed by a non-parametric procedure; non-parametric procedures for the ANOVA of split-plot, strip-plot, nested and similar designs are yet to be developed.

Note:
1. Some of the assumptions on which non-parametric tests may be based are:
(i) The sample observations are independent.
(ii) The variable under study is continuous.
(iii) The probability density function of the random variable is continuous.
(iv) Lower order moments exist.
Clearly these assumptions are fewer and much weaker than those associated with parametric inference.
2. In deciding whether to solve a statistical problem by a parametric or a non-parametric method, one should consider whether the parent distribution is to depend on a finite number of parameters or only on some general assumptions, such as continuity of the distribution. It is usually advisable to use procedures that eliminate assumptions about the underlying distribution whenever the validity of those assumptions is seriously in doubt. In this way we capture the greatest gain from both the parametric and the non-parametric approaches to a problem.

1.4 Some Terms Associated with Non-parametric Tests

1. Run: A run is a sequence of identical symbols followed or preceded by symbols of another type, or by no symbols at all. For example, consider the sequence MMMFFFFMFMMF. It contains 6 runs in all, 3 runs of M and 3 runs of F. The number of runs in a sequence is taken as an indicator of randomness.
2. Ties: While ranking the data in non-parametric tests we often find two or more observations with the same value. In such a case a tie is said to have occurred.
3. Nominal scale of measurement: The most elementary scale of measurement is one which merely identifies the categories into which the subject under measurement can be classified. The categories are mutually exclusive.
4. Ordinal scale of measurement: This scale incorporates the classifying and labelling of the nominal scale but, in addition, arranges the categories in a proper order; in other words, it also indicates ranks.
5. Contingency table: A contingency table is a two-way table in which the columns are classified according to one criterion or attribute and the rows according to another. This produces a number of cells, and the entry in a particular cell is the number of observations at one level of the first attribute cross-classified with a level of the second attribute.

1.5 Chi-Square Test for Goodness of Fit

Purpose

This test measures the discrepancy between the observed frequencies and the theoretical frequencies determined from an assumed distribution for the same event. In a parametric test of hypothesis we assume the form of the parent distribution and then test some aspect of the population; such tests are generally based on the assumption that the population is normally distributed. The suitability of the normal distribution, or of any other distribution, can itself be verified by means of a goodness-of-fit test. Let a random sample of size n be drawn from a population with unknown c.d.f. F. We want to test the null hypothesis H_0: F(x) = F_0(x) for all x against the alternative hypothesis H_1: F(x) ≠ F_0(x) for some x. The test is due to Karl Pearson and is the oldest non-parametric method.

Assumptions

1. The data are at the nominal level of measurement and grouped into several categories.
2. For the chi-square test to apply, the expected frequencies in the various categories should be reasonably large, i.e. at least 5.
3. The sum of the observed frequencies equals the sum of the expected frequencies, i.e. Σ e_i = Σ o_i.

The Test Statistic

The test is performed in the following manner:

Step I: The sample observations are classified into several categories if they are not already so arranged.

Step II: Assuming that H_0 completely specifies F_0, one can obtain the probability p_i that the random variable X falls in the i-th category (i = 1, 2, ..., k). These probabilities multiplied by the sample size n give the expected numbers in the categories, i.e. e_i = np_i for i = 1, 2, ..., k.

Step III: These e_i are then compared with the observed frequencies o_i of the different categories. Pearson proposed the test statistic

\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},

which under H_0 follows a \chi^2 distribution with k - 1 degrees of freedom. If the o_i and e_i of each category are close to each other, the calculated value of the χ² statistic will be small; otherwise it will be large. The larger the value of χ², the more likely it is that the o_i do not come from F_0.

For the χ² goodness-of-fit test it is essential that each e_i be at least 5. If some e_i < 5, two or more categories are combined until every expected frequency is at least 5; this is called pooling. The same pooling is applied to the observed frequencies, which reduces the number of categories and hence the degrees of freedom. The calculated value of χ² is then compared with the critical value from the table, and H_0 is rejected if the calculated value exceeds the critical value. If the parameters of F_0 are not known, they are estimated; the estimates are substituted into F_0 and the e_i obtained accordingly. In such a case the degrees of freedom are further reduced by the number of parameters estimated. This is probably the most commonly used non-parametric test as well as the oldest, and its computational simplicity is probably the reason for its popularity.

1.6 Chi-Square Test for Independence of Attributes

Purpose

This test is used to check whether two attributes under consideration are independent of each other. Let A and B be two attributes, where A is divided into r classes A_1, A_2, ..., A_r and B is divided into s classes B_1, B_2, ..., B_s.

The categories under the two attributes can be cross-classified into an r × s two-way table, commonly called the contingency table:

            A_1        A_2       ...   A_i        ...   A_r        Total
   B_1    (A_1B_1)   (A_2B_1)   ...  (A_iB_1)   ...  (A_rB_1)    (B_1)
   B_2    (A_1B_2)   (A_2B_2)   ...  (A_iB_2)   ...  (A_rB_2)    (B_2)
   ...
   B_j    (A_1B_j)   (A_2B_j)   ...  (A_iB_j)   ...  (A_rB_j)    (B_j)
   ...
   B_s    (A_1B_s)   (A_2B_s)   ...  (A_iB_s)   ...  (A_rB_s)    (B_s)
   Total   (A_1)      (A_2)     ...   (A_i)     ...   (A_r)        N

Here (A_iB_j) denotes the number of cases possessing both attribute A_i (i = 1, 2, ..., r) and attribute B_j (j = 1, 2, ..., s), and N is the grand total. We want to test the null hypothesis

H_0: the two attributes A and B are independent of each other

against the alternative hypothesis

H_1: the attributes A and B are dependent on each other.

Assumptions

1. The data are at the nominal level of measurement and grouped into several categories.
2. The subjects in each group are randomly and independently selected.
3. For the chi-square test to apply, the expected frequencies in the various cells should be reasonably large, i.e. at least 5.

The Test Statistic

Under the null hypothesis that the attributes are independent, the theoretical cell frequencies are calculated as follows:

P[A_i] = probability that a subject possesses attribute A_i = (A_i)/N,  i = 1, 2, ..., r
P[B_j] = probability that a subject possesses attribute B_j = (B_j)/N,  j = 1, 2, ..., s
P[A_iB_j] = probability that a subject possesses both attributes A_i and B_j
          = P[A_i] P[B_j] = (A_i)/N · (B_j)/N,  i = 1, 2, ..., r;  j = 1, 2, ..., s
E[A_iB_j] = expected number of subjects possessing both the attributes A_i and B_j

          = N · P[A_iB_j] = N · (A_i)/N · (B_j)/N = (A_i)(B_j)/N.

Using this formula we can find the expected frequency for each cell (A_iB_j), i = 1, 2, ..., r and j = 1, 2, ..., s. Under the hypothesis of independence the test statistic is

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\{(A_iB_j) - E(A_iB_j)\}^2}{E(A_iB_j)},

which follows a \chi^2 distribution with (r - 1)(s - 1) degrees of freedom. The calculated value of χ² is then compared with the critical value at the desired level of significance; H_0 is rejected if the calculated value exceeds the critical value obtained from the table, and otherwise the decision is taken in favour of H_0.

Note:

1. A particular case is the test of independence in a 2 × 2 contingency table, obtained when each of the two attributes has two levels. Putting r = 2 and s = 2, the contingency table is:

           A_1      A_2     Total
   B_1      a        b      a + b
   B_2      c        d      c + d
   Total  a + c    b + d    a + b + c + d = n

The direct formula for the test statistic is

\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)},

which follows a \chi^2 distribution with 1 degree of freedom. However, if an expected frequency in a 2 × 2 contingency table is less than 5, the test does not hold good. Yates suggested that in such a case 0.5 be added to the small frequency whose expected frequency is less than 5, with the other cell frequencies adjusted by adding or subtracting 0.5 so that the marginal totals remain the same; the calculations are then done afresh. A direct formula incorporating this correction is

\chi^2 = \frac{n(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)},

again with 1 degree of freedom. This correction is valid for 2 × 2 contingency tables only.

2. If the calculation results in rejection of the null hypothesis, one may wish to know which cell or cells are responsible for disturbing the independence. For this we calculate the Pearson residual for each cell, which follows a standard normal distribution and is given by

Z_{ij} = \frac{o_{ij} - e_{ij}}{\sqrt{e_{ij}}} \sim N(0, 1).

Thus a cell contributes significantly at the 5% level if the calculated value of |Z_{ij}| exceeds 1.96.
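As a quick sketch of the machinery just described, the following Python snippet computes χ² for a small hypothetical 2 × 2 table, both from the Yates-corrected direct formula above and with scipy.stats.chi2_contingency (which applies the Yates correction to 2 × 2 tables by default). The cell counts are made up purely for illustration; this is not data from the text.

```python
# A minimal sketch, assuming numpy and scipy are available.
# The 2x2 cell counts below are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency, chi2

table = np.array([[12, 28],    # a, b
                  [18, 22]])   # c, d
a, b = table[0]
c, d = table[1]
n = table.sum()

# Yates-corrected direct formula for a 2x2 table
chi2_direct = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

# scipy applies the Yates correction for 2x2 tables when correction=True (the default)
chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=True)

print("direct (Yates):", chi2_direct)
print("scipy chi2:", chi2_stat, "p-value:", p_value, "d.f.:", dof)
print("critical value at 5%:", chi2.ppf(0.95, dof))
```

The two χ² values agree, and comparing the statistic with the 5% critical value (or the p-value with 0.05) gives the same decision rule as described above.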

Illustration 1: Test the goodness of fit, using the χ² test, of a Poisson distribution fitted to data giving the number of cells per square (X) and the number of squares (f) with each count. [Frequency table not preserved in this copy.]

Solution: We are to test H_0: the data come from a Poisson distribution. Since the parameter of the distribution is not known, we estimate it and then use the estimate to fit the distribution. The mean is the parameter λ of a Poisson distribution, so we first compute the mean from the frequency table of X and f. The surviving totals are

\sum f_i = 400, \qquad \sum f_i x_i = 529,

so that

\lambda = \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{529}{400} = 1.3225.

The fitted probability mass function can therefore be written as

P(X = x) = \frac{e^{-1.3225}(1.3225)^x}{x!}, \qquad x = 0, 1, 2, \ldots,

and the expected frequencies are

e_i = N \, P(X = x) = 400 \, \frac{e^{-1.3225}(1.3225)^x}{x!}.

To calculate the expected frequencies and then the χ² statistic, we construct a table with columns X, f_i = o_i, P(X = x), e_i = N P(X = x) and (o_i − e_i)²/e_i.

Summing the last column of the table gives the calculated value of χ², which is then compared with the table value of χ² at the 5% level of significance for 6 − 1 − 1 = 4 degrees of freedom.(2) Since the calculated value of χ² is less than the tabulated value, the null hypothesis is accepted and we conclude that the Poisson distribution fits the data quite well.

(2) After pooling there are 6 cells, and the parameter λ of the distribution was estimated from the data, so the degrees of freedom for the test are (6 − 1) − 1 = 4.

Illustration 2: A sample of 200 retired men was classified according to education and number of children. Test the hypothesis that the size of the family is independent of the level of education attained by the father. The observed counts form a 3 × 3 contingency table with rows Elementary, Secondary and College education and columns for the number of children, the last column being "over 3". [Cell counts not preserved in this copy.]

Solution: We are to test the hypothesis H_0: the size of the family is independent of the level of education of the father. To perform the test we first obtain the row and column totals and then calculate the expected frequencies. Let e_{ij} denote the expected frequency of the cell in the i-th row and j-th column; each e_{ij} is the product of the corresponding row and column totals divided by the grand total 200. This gives

e_11 = 18.7, e_12 = 39.8, e_21 = 17.6, e_22 = 37.4,

e_31 = 8.8, e_32 = 18.7 and e_33 = 11.5, the remaining expected frequencies being obtained in the same way. To calculate the value of the test statistic

\chi^2 = \sum_{i,j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},

we construct a table of the observed frequencies o_{ij}, the expected frequencies e_{ij} and the quantities (o_{ij} − e_{ij})²/e_{ij}; the total of the last column is 7.44. Thus the calculated value of χ² is 7.44, while the tabulated value of χ² for (3 − 1)(3 − 1) = 4 degrees of freedom at the 5% level of significance is 9.49. Since the calculated value is less than the tabulated value, we accept the null hypothesis and conclude that the two attributes, size of the family and education attained by the father, are independent.
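The fit-and-test recipe of Illustration 1 (estimate λ, compute Poisson expected frequencies, pool small cells, then reduce the degrees of freedom by one for the estimated parameter) can be sketched in Python. Since the original frequency table is not reproduced above, the observed counts used here are hypothetical, chosen only to be consistent with the surviving totals Σf = 400 and Σf·x = 529; the point is the procedure, not the numbers.

```python
# A minimal sketch of the chi-square goodness-of-fit test for a fitted Poisson,
# assuming numpy and scipy are available. The observed frequencies are hypothetical,
# chosen to match the surviving totals of Illustration 1 (sum f = 400, sum f*x = 529).
import numpy as np
from scipy.stats import poisson, chi2

x = np.arange(7)                                  # counts 0, 1, ..., 6
observed = np.array([103, 143, 98, 42, 8, 4, 2])  # hypothetical frequencies
n = observed.sum()

lam = (x * observed).sum() / n                    # lambda estimated by the sample mean

# Expected frequencies; the last cell collects the upper tail so they sum to n
probs = poisson.pmf(x, lam)
probs[-1] += poisson.sf(x[-1], lam)
expected = n * probs

# Pool cells from the right until every (tail) expected frequency is at least 5
while expected[-1] < 5 and len(expected) > 2:
    expected[-2] += expected[-1]
    observed[-2] += observed[-1]
    expected = expected[:-1]
    observed = observed[:-1]

stat = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1 - 1                        # one extra d.f. lost for the estimated lambda
print("chi-square:", stat, "d.f.:", df,
      "critical value at 5%:", chi2.ppf(0.95, df))
```

The degrees-of-freedom adjustment in the last lines is the step that is easy to forget when a parameter has been estimated from the data.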

1.7 Other Non-parametric Tests for Goodness of Fit

In 1933 the two Russian statisticians Kolmogorov and Smirnov developed distribution-free techniques based on empirical distribution functions. These procedures use the maximum vertical distance between distribution functions, either to decide whether a random sample comes from a pre-specified distribution or to test whether two separate data sets have the same distribution. In the first case the maximum vertical distance is measured between the empirical distribution function and the hypothesized distribution function, and in the second case between two empirical distribution functions.

1.7.1 Kolmogorov-Smirnov Test for One Sample

Purpose

This test is used to check whether the random sample under consideration is drawn from a population with a specified cumulative distribution function F_0(x), i.e. to test the hypothesis H_0: F(x) = F_0(x) against the alternative H_1: F(x) ≠ F_0(x).

Assumptions

X_1, X_2, ..., X_n is a random sample of size n drawn from a continuous population.

Derivation of the Test Statistic

The test, due to Kolmogorov and Smirnov (1933), is based on the empirical distribution function of a continuously distributed random variable. Let X_1, X_2, ..., X_n be a random sample from an unknown continuous population with cumulative distribution function F(x), and let x_(1), x_(2), ..., x_(n) be the corresponding order statistics, x_(i) being the i-th value in this arrangement. The empirical distribution function is defined by

F_n(x) = \begin{cases} 0, & x < x_{(1)}, \\ i/n, & x_{(i)} \le x < x_{(i+1)}, \quad i = 1, 2, \ldots, n-1, \\ 1, & x \ge x_{(n)}. \end{cases}

Thus nF_n(x) is the number of sample observations that are less than or equal to x. If F is the distribution function of X, then

P\!\left[F_n(x) = \tfrac{k}{n}\right] = \binom{n}{k}[F(x)]^k[1 - F(x)]^{n-k}, \qquad k = 0, 1, 2, \ldots, n,

so that, for a fixed value of x, nF_n(x) \sim B(n, F(x)). This implies

E(nF_n(x)) = nF(x) \;\Rightarrow\; E(F_n(x)) = F(x),

Var(nF_n(x)) = nF(x)\{1 - F(x)\} \;\Rightarrow\; Var(F_n(x)) = \frac{F(x)\{1 - F(x)\}}{n} \to 0 \text{ as } n \to \infty.

Thus F_n(x) is an unbiased and consistent estimator of F(x). Under the null hypothesis H_0: F(x) = F_0(x), F_n(x) is therefore an unbiased and consistent estimator of F_0(x), and Kolmogorov and Smirnov argued that under the null hypothesis the empirical distribution function F_n(x) approaches the true distribution F_0(x) specified by the null hypothesis. They defined the test statistic as

D_n = \sup_x |F_n(x) - F_0(x)|.

Under the null hypothesis one would expect the value of D_n to be small, while a large value of D_n indicates that the actual distribution is not F_0(x), i.e. a violation of the null hypothesis. Thus H_0 is rejected if and only if the observed value of D_n for the given sample size exceeds the critical value of D_n, at the chosen level of significance, in the table "Critical Values of the Kolmogorov-Smirnov Statistic". For large samples the critical values at the 1%, 5% and 10% levels of significance can be obtained from the following table:

   Level of significance (α)     1%          5%          10%
   D_{n,α}                      1.63/√n     1.36/√n     1.22/√n

Limitations

1. It applies only to continuous distributions.
2. It tends to be more sensitive near the centre of the distribution than at the tails. With this issue in mind, Doksum (1977) improved the K-S statistic by dividing it by the factor F(x)(1 − F(x)). This factor acts as a variance equalizer, and the resulting band is slightly wider in the middle and much narrower at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified: if location, scale or shape parameters are estimated from the data, the critical region of the K-S test is no longer valid.

Illustration 3: A data set is claimed to have come from a uniform distribution on [0, 1]. The random sample is 0.44, 0.76, 0.13, 0.27, 0.97, 0.45, 0.1, 0.94, 0.39, 0.53, 0.85, 0.45, 0.23, 0.98, 0.5. Apply an appropriate test to check the validity of the claim.

Solution: The null hypothesis of interest is H_0: the data come from U(0, 1). If X ~ U(0, 1), then

F_0(x) = \int_0^x 1 \, dt = x, \qquad 0 \le x \le 1.

The empirical distribution function is

F_n(x) = \begin{cases} 0, & x < x_{(1)}, \\ i/n, & x_{(i)} \le x < x_{(i+1)}, \quad i = 1, 2, \ldots, n-1, \\ 1, & x \ge x_{(n)}. \end{cases}

To find the values of F_0(x), F_n(x) and |F_n(x) − F_0(x)|, we construct a table with columns x_i, x_(i), F_n(x), F_0(x) and |F_n(x) − F_0(x)|.

The largest entry in the last column gives the Kolmogorov-Smirnov statistic, D_n = sup_x |F_n(x) − F_0(x)|. Comparing this with the critical value of D_n at the 5% level of significance for n = 15, the calculated value is less than the tabulated value, so we accept the null hypothesis and conclude that the sample is drawn from U(0, 1).

Illustration 4: A frequency table gives the values of a random variable X and their corresponding frequencies f. [Values not preserved in this copy.] Using the Kolmogorov-Smirnov statistic, check whether the random numbers come from a Poisson distribution with mean 7.6.

Solution: The null hypothesis of interest is H_0: the random numbers follow a Poisson distribution with mean 7.6. If X ~ Poisson(7.6), then

F_0(z) = \sum_{x=0}^{z} \frac{e^{-\lambda}\lambda^x}{x!}, \qquad \lambda = 7.6.

The empirical distribution function is given by

F_n(z) = \begin{cases} 0, & z < 0, \\ cu(z)/N, & 0 \le z < \max(X), \\ 1, & z \ge \max(X), \end{cases}

where cu(z) is the cumulative frequency of X at the point X = z and N is the total frequency. To find the values of F_0(x), F_n(x) and |F_n(x) − F_0(x)|, we construct a table with columns x_i, f_i, cu(x_i), F_n(x_i), F_0(x_i) and |F_n(x_i) − F_0(x_i)|.

The largest entry in the last column gives the calculated value of the Kolmogorov-Smirnov statistic, D_n = sup_x |F_n(x) − F_0(x)|. The tabulated value of the statistic at the 5% level of significance for a large sample is D_{n,0.05} = 1.36/√N. Here the calculated value exceeds the tabulated value, so we reject the null hypothesis and conclude that the sample is not from Poisson(7.6).

1.7.2 Comparison between the Chi-square Test and the Kolmogorov-Smirnov Test

Both the K-S test and the chi-square test are distribution-free tests in the sense that the sampling distribution of the test statistic does not depend on the distribution of the variable under consideration. There are, however, several points on which the two tests differ:

1. The K-S test is more powerful than the chi-square test when the sample size n is small or when the expected frequencies e_i are small.
2. The chi-square test is specifically meant for categorical data, whereas the K-S test is for random samples from continuous populations.
3. The K-S test uses each of the n observations individually, which the chi-square test cannot do, since it requires the data to be arranged into categories. The K-S statistic therefore makes better use of the available information.
4. The chi-square test can be used for both discrete and continuous distributions, but the K-S test is restricted to continuous distributions only.
5. For the K-S test the hypothesized distribution must be completely specified, whereas the chi-square test can be applied when only the form of the distribution is known; the parameters can then be estimated and the test performed with an adjustment to the degrees of freedom.
6. The computation involved in the K-S test is simpler than that of the chi-square test.
7. The K-S test is more flexible than the chi-square test, as it can also be used to determine the minimum sample size and to construct a confidence band.
8. The K-S test can be used as a one-sided test, which is not possible for a chi-square test.
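The one-sample K-S calculation of Illustration 3 can be reproduced with scipy. The sketch below, which assumes scipy is installed, computes D_n for the fifteen uniform observations together with the p-value, and prints the large-sample 5% critical value quoted in the table above for comparison.

```python
# A minimal sketch of the one-sample Kolmogorov-Smirnov test of Illustration 3,
# assuming scipy is available.
from scipy.stats import kstest

sample = [0.44, 0.76, 0.13, 0.27, 0.97, 0.45, 0.10, 0.94,
          0.39, 0.53, 0.85, 0.45, 0.23, 0.98, 0.50]

# H0: the data come from U(0, 1); 'uniform' defaults to loc=0, scale=1
result = kstest(sample, "uniform")
print("D_n =", result.statistic, "p-value =", result.pvalue)

# Large-sample 5% critical value quoted in the text: 1.36 / sqrt(n)
n = len(sample)
print("approximate 5% critical value:", 1.36 / n ** 0.5)
```

A D_n below the critical value (equivalently, a p-value above 0.05) leads to the same acceptance of H_0 reached in the illustration.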

1.8 Sign Tests

1.8.1 One-sample Sign Test

Purpose

This test is used to check whether the median θ of a distribution differs significantly from a specified value θ_0. Thus we test the null hypothesis H_0: θ = θ_0 against one of the alternatives H_1: θ ≠ θ_0, H_1: θ > θ_0 or H_1: θ < θ_0.

Assumptions

The distribution is continuous in the vicinity of the median θ, so that P(X < θ) = P(X > θ) = 1/2.

Derivation of the Test Statistic

Let a sample X_1, X_2, ..., X_n be drawn from a population with median θ. If the sample comes from a distribution with median θ = θ_0, then about half of the observations will be greater than θ_0 and half less than θ_0. Each observation greater than θ_0 is replaced by a plus sign and each observation less than θ_0 by a minus sign; any value equal to θ_0 is ignored. (Since the distribution is assumed continuous about the median, the probability that any value of the random variable equals the median is zero.) We then count the number of plus signs (r) and the number of minus signs (s), so that r + s ≤ n. For this test we consider only the value of r, which follows a binomial distribution; under the null hypothesis p = 1/2, so r ~ Bin(n, 1/2). The null hypothesis is therefore equivalent to H_0: p = 1/2, tested against H_1: p ≠ 1/2, H_1: p > 1/2 or H_1: p < 1/2, as the case may be.

Case I: Small samples

For a two-sided alternative the test criterion is to reject H_0 if the number of plus signs satisfies r ≥ r_{α/2} or r ≤ r'_{α/2}, where r_{α/2} is the smallest integer such that

\sum_{r=r_{\alpha/2}}^{n} \binom{n}{r}\left(\tfrac{1}{2}\right)^r\left(\tfrac{1}{2}\right)^{n-r} \le \alpha/2,

and r'_{α/2} is the largest integer such that

\sum_{r=0}^{r'_{\alpha/2}} \binom{n}{r}\left(\tfrac{1}{2}\right)^r\left(\tfrac{1}{2}\right)^{n-r} \le \alpha/2.

Case II: Large samples

If the sample size is large, i.e. n ≥ 25, the normal test can be used to decide about H_0.

The Z statistic is given by

Z = \frac{(r + 0.5) - n/2}{\sqrt{n/4}} \quad \text{if } r < n/2, \qquad Z = \frac{(r - 0.5) - n/2}{\sqrt{n/4}} \quad \text{if } r > n/2.

At the 5% level of significance the null hypothesis is accepted if the calculated value of the Z statistic is less than 1.96 in absolute value.

Illustration 5: Suppose we want to test the hypothesis that the median body length θ of frogs of a particular variety is θ_0 = 6.9 cm against the alternative hypothesis θ ≠ 6.9 cm, with α = 0.05, on the basis of the following measurements:

6.3, 5.8, 7.7, 8.5, 5.2, 6.7, 7.3, 5.6, 8.3, 7.7, 8.2, 6.0, 6.8, 6.9, 6.3, 7.3, 7.0, 7.1, 6.6, 7.4

Solution: We set up the null hypothesis H_0: θ = 6.9 to be tested against the alternative hypothesis H_1: θ ≠ 6.9. Writing + for values greater than 6.9, − for values less than 6.9 and 0 for values equal to 6.9, we get

−, −, +, +, −, −, +, −, +, +, +, −, −, 0, −, +, +, +, −, +.

Thus the number of positive signs is 10, the number of negative signs is 9, and so n = 19. We then construct a table with columns x_i, d_i = x_i − 6.9, |d_i| and the rank R_i of |d_i|, the observation equal to 6.9 being ignored.

The sum of the ranks with positive d_i is T_+ = 99.5 and the sum of the ranks with negative d_i is T_− = 90.5, so T = min(T_+, T_−) = min(99.5, 90.5) = 90.5. Since one observation is ignored we have n = 19, and for a two-sided alternative the table value is T_α = 46 for n = 19 at the α = 0.05 level of significance. Since T_α < T, we accept the null hypothesis and conclude that H_0: θ = 6.9 is true.

Illustration 6: A drug was injected into a fresh group of 10 rats every day. The scientist in charge of the experiment claimed that, on average, not more than 3 rats showed an increase in blood pressure. The increase in blood pressure was noticed in the following numbers of rats over the last 10 days after the drug was administered: 2, 4, 5, 1, 6, 3, 2, 1, 7 and 8.

Solution: The null hypothesis is H_0: µ = 3, tested against H_1: µ > 3. We record the sign of d_i = x_i − 3 for each observation, ignoring the value equal to 3. Thus n = 9 and the number of plus signs is x = 5. Under the null hypothesis X ~ B(9, 1/2), so

P(X \ge 5) = 1 - P(X < 5) = 1 - \sum_{x=0}^{4}\binom{9}{x}\left(\tfrac{1}{2}\right)^x\left(\tfrac{1}{2}\right)^{9-x}
           = 1 - \left(\tfrac{1}{2}\right)^9 (1 + 9 + 36 + 84 + 126) = 1 - \tfrac{256}{512} = 0.5.

Thus P(X ≥ 5) > 0.05 = α, so H_0 is accepted and the claim made by the scientist is upheld.
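The binomial calculation in Illustration 6 is exactly what a one-sided exact sign test does. The following sketch, assuming a recent scipy (version 1.7 or later, which provides scipy.stats.binomtest), reproduces it.

```python
# A minimal sketch of the one-sample sign test of Illustration 6,
# assuming scipy >= 1.7 so that scipy.stats.binomtest is available.
from scipy.stats import binomtest

data = [2, 4, 5, 1, 6, 3, 2, 1, 7, 8]
theta0 = 3

plus = sum(1 for x in data if x > theta0)   # observations above the hypothesized median
minus = sum(1 for x in data if x < theta0)  # observations below it; ties are dropped
n = plus + minus

# H1: median > 3, so the relevant tail is P(#plus >= observed) under Bin(n, 1/2)
result = binomtest(plus, n, p=0.5, alternative="greater")
print("plus signs:", plus, "of", n, "p-value:", result.pvalue)  # p-value = 0.5 here
```

Since the p-value of 0.5 exceeds α = 0.05, the code reaches the same acceptance of H_0 as the hand calculation.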

1.8.2 Paired Sign Test

Purpose

Based on two random samples X_1, X_2, ..., X_n and Y_1, Y_2, ..., Y_n of the same size, this test is used to check the hypothesis that the two samples come from populations with identical density functions, i.e. H_0: f_1(x) = f_2(y), against a two-sided alternative.

Assumptions

1. The data are available in pairs of observations on the two things being compared, i.e. in the form (X_i, Y_i).
2. For any given pair, the two observations are made under similar conditions.
3. Different pairs are observed under identical conditions.

Derivation of the Test Statistic

Here we have two population p.d.f.s f_1(x) and f_2(y), and based on the two random samples a decision is to be taken about the null hypothesis H_0: f_1(x) = f_2(y). The observations are arranged in pairs (x_i, y_i), i = 1, 2, ..., n, and each pair is assumed to be observed under identical conditions. The difference d_i = x_i − y_i is measured and only its sign (+ or −) is noted in place of the actual deviation. Under the null hypothesis the probability that the first observation of a pair exceeds the second equals the probability that the second exceeds the first, and the probability of a tie is zero. So H_0 can be written as

H_0: P[X − Y > 0] = 1/2 and P[X − Y < 0] = 1/2.

Define

u_i = 1 if x_i − y_i > 0, and u_i = 0 if x_i − y_i < 0,

so that u_i is a Bernoulli variate with p = P(x_i − y_i > 0) = 1/2 under H_0. Then u = Σ u_i gives the total number of positive deviations, and u is a binomial variate with parameters n and p = 1/2 (under H_0). Let k be the observed number of positive deviations; then

P(U \le k) = \sum_{r=0}^{k}\binom{n}{r} p^r q^{n-r} = \left(\tfrac{1}{2}\right)^n \sum_{r=0}^{k}\binom{n}{r} = p^* \text{ (say)}.

If p* ≤ 0.05 we reject H_0 at the 5% level of significance; if p* > 0.05 we conclude that the data do not go against the null hypothesis, and H_0 is accepted.

1.9 Run Test for Randomness

Purpose

The theory of runs can be used to check the randomness of a set of observations, i.e. to check whether X_1, X_2, ..., X_n can be considered a random sample from a continuous distribution.

Here the null hypothesis is H_0: the observations are random in nature, tested against the alternative that the observations are non-random.

Assumptions

The observations in the sample are obtained under similar conditions.

The Test Statistic

From the sample X_1, X_2, ..., X_n the median is calculated. The sample is then rewritten so that each observation above the sample median is replaced by + and each observation below it by −. If n is odd, the observation equal to the median is ignored; the effective sample size is therefore n − 1 for an odd number of observations and n for an even number. We thus get a series of + and − signs, and the number of runs in this series is counted. Let the total number of runs be k, let n_1 be the number of + signs and n_2 the number of − signs. Since the number of sample values above the median equals the number below it, n_1 = n_2 = m (say), and the effective sample size is 2m. If K is the random variable representing the number of runs, then K takes values 2, 3, ..., 2m with the discrete distribution

P(K = k) = \frac{2\left[\binom{m-1}{k/2-1}\right]^2}{\binom{2m}{m}}, \qquad k = 2, 4, 6, \ldots, 2m,

P(K = k) = \frac{2\binom{m-1}{(k-1)/2}\binom{m-1}{(k-3)/2}}{\binom{2m}{m}}, \qquad k = 3, 5, 7, \ldots, 2m-1.

If n is the initial sample size, then for 5 ≤ n ≤ 40 the critical values of K can be obtained from the table. For n > 40, K can be assumed to follow a normal distribution with mean (2n − 1)/3 and variance (16n − 29)/90. In both cases the null hypothesis is rejected if the test statistic lies in the critical region.

Illustration 7: A coin is tossed 14 times and the following sequence of heads (H) and tails (T) is obtained:

H T T H H H T H T T H H T H

Test, using the run test, whether the heads and tails occur in random order. (Given α = 0.05, K_L = 3, K_U = 12.)

Solution: Let H_0: heads and tails occur in random order. Here the runs can be read off directly from the sequence, so we have n = 14 and K = number of runs = 9. Since the observed total number of runs lies between the critical values 3 and 12, we accept H_0 and conclude that the heads and tails occur in random order.
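The run count in Illustration 7 is easy to obtain programmatically. The plain-Python sketch below counts the runs in the H/T sequence and applies the decision rule with the critical values quoted in the illustration (K_L = 3, K_U = 12 at α = 0.05).

```python
# A minimal sketch of the run test of Illustration 7. The critical values
# K_L = 3 and K_U = 12 are those quoted in the illustration (alpha = 0.05).
def count_runs(sequence):
    """Number of maximal blocks of identical consecutive symbols."""
    if not sequence:
        return 0
    runs = 1
    for previous, current in zip(sequence, sequence[1:]):
        if current != previous:
            runs += 1
    return runs

tosses = list("HTTHHHTHTTHHTH")
k = count_runs(tosses)                     # 9 runs for this sequence
k_lower, k_upper = 3, 12                   # critical values given in the text

if k_lower < k < k_upper:
    print(f"K = {k}: accept H0, the sequence appears random")
else:
    print(f"K = {k}: reject H0, the sequence appears non-random")
```

With K = 9 lying between the two critical values, the code reaches the same acceptance of H_0 as the illustration.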

1.10 Wilcoxon One Sample Signed Rank Test

Purpose

Let a sample X_1, X_2, ..., X_n be drawn from a population with cumulative distribution function F(x). This test is used to check the hypothesis H_0: F(m) = 1/2 for a specified value m (i.e. that m is the population median) against the alternatives H_1: F(m) ≠ 1/2, H_1: F(m) > 1/2 or H_1: F(m) < 1/2.

Assumptions

1. F(x), the cumulative distribution function of the random variable X, is absolutely continuous.
2. The density function f(x) is symmetric about the median, i.e. F(m − x) = 1 − F(x + m) and f(m − x) = f(x + m).
3. No two observations in the random sample are equal.

Derivation of the Test Statistic

Case I

Assume first that m = 0, so that the distribution is symmetric about the origin and F(−x) = 1 − F(x), f(−x) = f(x). To test the null hypothesis H_0: m = 0, or equivalently H_0: F(0) = 1/2, we proceed as follows. We order the absolute values |X_i| and denote the rank of |X_i| by R_i. Define the random variable

ξ_i = −1 if X_i < 0, and ξ_i = +1 if X_i > 0,

and the statistic W = Σ ξ_i R_i, which is called the Wilcoxon statistic. Let V_1, V_2, ..., V_n be random variables with P(V_i = i) = P(V_i = −i) = 1/2, i = 1, 2, ..., n, and let V = Σ_{i=1}^n V_i. Then W = Σ_{i=1}^n ξ_i R_i has the same distribution as V. The mean and variance of W under H_0 are

E(W) = E(V) = \sum_{i=1}^{n} E(V_i) = \sum_{i=1}^{n}\left[i \cdot \tfrac{1}{2} + (-i) \cdot \tfrac{1}{2}\right] = 0,

Var(W) = \sum_{i=1}^{n} Var(V_i) = \sum_{i=1}^{n}\left[E(V_i^2) - \{E(V_i)\}^2\right] = \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.

For sample size n > 25, W is approximately N(0, n(n+1)(2n+1)/6), so

T = \frac{W}{\sqrt{n(n+1)(2n+1)/6}} \sim N(0, 1)

is an approximate statistic for testing H_0: m = 0.

Case II

In general, if the test is of H_0: m = m_0, i.e. H_0: F(m_0) = 1/2, it can be shown that for sample size n > 25

W \sim N\!\left(\frac{n(n+1)}{4}, \; \frac{n(n+1)(2n+1)}{24}\right),

so in this case the test statistic is

T = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1),

an appropriate statistic for testing H_0: m = m_0.

Case III

For testing H_0: m = m_0 when the sample size is small (n < 25), we first find the differences d_i = x_i − m_0. Under the null hypothesis the d_i are independent and come from a population symmetric about 0. We then find the absolute differences |d_i|, arrange them in ascending order and rank them accordingly; absolute differences equal to 0 are ignored. Let the number of remaining observations be n_1 ≤ n after the zero differences are eliminated. In case of tied ranks, each of the tied values is given the average of the ranks they would have received had there been no ties. Let T_+ be the sum of the ranks of the positive d_i and T_− the sum of the ranks of the negative d_i. Then

T_+ + T_- = \frac{n_1(n_1 + 1)}{2}.

The null distributions of T_+ and T_− are identical and symmetric about n_1(n_1 + 1)/4. The smaller of the two values T_+ and T_− is then compared with the value in the table "Critical Values for T in the Wilcoxon Signed Rank Test" for the given level of significance and n_1 observations, and the decision about the null hypothesis is taken accordingly. If the alternative hypothesis is H_1: m > m_0, we reject H_0 if T_− ≤ T_α; if the alternative is H_1: m < m_0, we reject H_0 if T_+ ≤ T_α; and if the alternative is H_1: m ≠ m_0, we reject H_0 if T_− ≤ T_α or T_+ ≤ T_α.

Illustration 8: A medical representative visited 12 doctors in a town. In order to meet the doctors he had to wait 25, 10, 15, 20, 17, 11, 30, 27, 36, 40, 5 and 26 minutes respectively. The senior sales representative had earlier claimed that the doctors kept him waiting for more than 20 minutes on average. Using the Wilcoxon signed rank test, verify the claim made by the senior sales representative at the 5% level of significance.

Solution: The null hypothesis is H_0: µ = 20 minutes, tested against the alternative hypothesis H_1: µ > 20 minutes. To calculate the test statistic we construct a table with columns x_i, x_i − 20, |d_i| = |x_i − 20| and the corresponding ranks.

Here the sum of the positive ranks is T_+ = 40 and the sum of the negative ranks is T_− = 26, so T = min(T_+, T_−) = 26; the effective sample size is n = 11. Since the alternative hypothesis is of the form H_1: µ > µ_0, the relevant test statistic is T_−. Here T_− = 26 > T_{0.05} = 14, so H_0 is accepted, and it is concluded that the average waiting time of the sales representative for the doctors is 20 minutes.

1.11 Wilcoxon Matched Pair Signed Rank Test

Purpose

This test is used to study whether there is a significant difference between two samples consisting of observations in matched pairs. Matched pairs are generally two observations taken on the same item in two different situations. Consider two samples X_1, X_2, ..., X_n and Y_1, Y_2, ..., Y_n obtained from the same items in two different situations. The test checks the hypothesis H_0: there is no significant difference between the two samples, against the alternative H_1: the two samples differ significantly.

Assumptions

1. The distribution functions of both random samples are absolutely continuous.
2. The random samples are independent of each other.

Derivation of the Test Statistic

The test is performed in the following manner:

Step I: For each paired observation obtain the difference in scores, d_i = X_i − Y_i.

Step II: Rank these differences, ignoring the plus and minus signs of the d_i; differences d_i = 0 are ignored. When ranks are tied, assign the average of the tied ranks.

Step III:

Assign to each rank the + or − sign of the difference it represents.

Step IV: Calculate T_+, the sum of the positive ranks, and T_−, the sum of the negative ranks, and hence obtain T = min(T_+, T_−).

Step V:

Case I: If the number of pairs n is at most 25, compare the calculated value of T with the critical value of T obtained from the table for the given sample size and the required level of significance. If the calculated value of T is less than or equal to the critical value, the null hypothesis is rejected.

Case II: If the number of pairs n is greater than 25, we can use the normal approximation to the T statistic:

Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1).

In the case of ties a correction factor u(u² − 1)/48 is introduced into the test statistic as follows:

Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24 - u(u^2 - 1)/48}} \sim N(0, 1).

Note:

1. This test addresses the same problem as the paired sign test, but it is more powerful than the sign test because it takes into account both the sign and the magnitude of the difference in each pair.
2. This test is the non-parametric counterpart of the paired t-test used for correlated data.
3. The calculations involved are similar to those of the Wilcoxon one sample signed rank test.

Illustration 9: Two computer keyboards manufactured by two different companies are put to the test. Twenty typists were chosen at random and asked to type on both keyboards, one after the other, and their speeds were recorded as the number of words typed per minute. [The individual speeds for Keyboard I and Keyboard II are not preserved in this copy.] Using the Wilcoxon signed rank test, verify the claim that the second keyboard is better than the first at the 5% level of significance.

Solution: Let µ_1 be the average number of words per minute typed on Keyboard I and µ_2 the average number of words per minute typed on Keyboard II.

We are to test H_0: µ_1 − µ_2 = 0 against the one-sided alternative hypothesis H_1: µ_1 − µ_2 < 0. To perform the Wilcoxon signed rank test for the paired samples we construct a table with columns x_i, y_i, x_i − y_i, |d_i| = |x_i − y_i| and the corresponding ranks, ignoring the pair with zero difference. Here the sum of the negative ranks is T_− = 61.5, so the sum of the positive ranks is T_+ = 190 − 61.5 = 128.5, T = min(T_+, T_−) = 61.5, and the effective sample size is n = 19. Since the alternative hypothesis is of the form H_1: µ_1 − µ_2 < 0, the relevant test statistic is T_+. Here T_+ = 128.5 > T_{0.05} = 54 for n = 19, so H_0 is accepted, and it can be concluded that the second keyboard is not better than the first at the 5% level of significance.
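In practice the matched-pair test is usually run with scipy.stats.wilcoxon, which implements the T statistic described above. The sketch below assumes a recent scipy is available; since the typing speeds of Illustration 9 are not reproduced here, the paired scores are hypothetical and serve only to show the call.

```python
# A minimal sketch of the Wilcoxon matched-pair signed rank test,
# assuming a recent scipy is available. The paired scores below are hypothetical.
from scipy.stats import wilcoxon

keyboard1 = [42, 55, 38, 47, 60, 51, 44, 39, 58, 49]   # words per minute, hypothetical
keyboard2 = [45, 54, 44, 50, 63, 55, 43, 46, 61, 52]

# H1: Keyboard I is slower than Keyboard II, i.e. the differences x - y tend to be negative
stat, p_value = wilcoxon(keyboard1, keyboard2, alternative="less")
print("signed rank statistic =", stat, "p-value =", p_value)
```

A p-value below 0.05 would support the claim that the second keyboard is better; otherwise H_0 is retained, as in the illustration.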

1.12 Wald-Wolfowitz Run Test

Purpose

The theory of runs can also be used to check whether two random samples are drawn from the same distribution, against the alternative that they are not. It is used to check whether two random samples X_1, X_2, ..., X_{n_1} and Y_1, Y_2, ..., Y_{n_2} can be considered to have come from identical populations.(3) Thus, if F_1 and F_2 are the distribution functions from which the two samples are drawn, the hypothesis to be tested is

H_0: F_1(·) = F_2(·) against H_1: F_1(·) ≠ F_2(·).

Assumptions

1. The observations in the two samples are obtained under similar conditions.
2. The observations are independent of each other.
3. The distributions from which the samples are drawn are continuous.

The Test Statistic

To perform this test we first combine the two samples and arrange the combined observations in ascending order of magnitude. We then rewrite the combined sample in terms of the labels X and Y, where X denotes a member of the first sample and Y a member of the second. This yields runs of X's and Y's; let K be the total number of runs. If both samples come from the same population there will be a thorough mixing of X's and Y's and the number of runs will be large; if the number of runs is small, the distributions are not considered identical and H_0 will be rejected. There are n_1 X's and n_2 Y's, which can arrange themselves in \binom{n_1+n_2}{n_1} ways, each equally likely under H_0. The number of runs K can be either even or odd, and two cases arise.

Case I: K even. Let K = 2m. Then there are m runs of X's and m runs of Y's. For there to be m runs of X's, the X's must be divided into m blocks, which can happen in \binom{n_1-1}{m-1} ways, and similarly for the Y's; the sequence may begin with either an X-run or a Y-run. Thus

P(K = 2m) = \frac{2\binom{n_1-1}{m-1}\binom{n_2-1}{m-1}}{\binom{n_1+n_2}{n_1}}.

Case II: K odd. Let K = 2m + 1. Then there are either (i) m runs of X's and m + 1 runs of Y's, or (ii) m + 1 runs of X's and m runs of Y's. Case (i) can occur in \binom{n_1-1}{m-1}\binom{n_2-1}{m} ways and case (ii) in \binom{n_1-1}{m}\binom{n_2-1}{m-1} ways. Thus the required probability is

P(K = 2m + 1) = \frac{\binom{n_1-1}{m-1}\binom{n_2-1}{m} + \binom{n_1-1}{m}\binom{n_2-1}{m-1}}{\binom{n_1+n_2}{n_1}}.

If the probability of a Type I error is α, the critical value of K (= k_0, say) can be determined from the condition P(K ≤ k_0) ≤ α, using the probabilities above.

(3) This test can be used to study any type of difference between the samples, such as differences in median, variability or skewness.
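The pooling-and-labelling step described above is straightforward to code. The following plain-Python sketch, using two small hypothetical samples, combines the samples, labels each pooled observation by its sample of origin, and counts the runs K on which the test is based.

```python
# A minimal sketch of the run-counting step of the Wald-Wolfowitz test.
# The two samples below are hypothetical; only the mechanics are illustrated.
x = [12.1, 14.3, 9.8, 15.2, 11.7, 13.4]   # sample 1
y = [10.5, 16.8, 17.2, 15.9, 18.1]        # sample 2

# Pool the samples, keep the label of origin, and sort by value
pooled = sorted([(value, "X") for value in x] + [(value, "Y") for value in y])
labels = [label for _, label in pooled]

# K = number of runs of identical labels in the sorted arrangement
k = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

print("pooled labels:", "".join(labels))
print("number of runs K =", k, "with n1 =", len(x), "and n2 =", len(y))
# A small K relative to its null distribution (given above) is evidence against H0.
```

The observed K would then be compared with the critical value k_0 obtained from the distribution of K derived above.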


More information

ANOVA - analysis of variance - used to compare the means of several populations.

ANOVA - analysis of variance - used to compare the means of several populations. 12.1 One-Way Analysis of Variance ANOVA - analysis of variance - used to compare the means of several populations. Assumptions for One-Way ANOVA: 1. Independent samples are taken using a randomized design.

More information

Formulas and Tables by Mario F. Triola

Formulas and Tables by Mario F. Triola Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x f # x x f Mean 1x - x s - 1 n 1 x - 1 x s 1n - 1 s B variance s Ch. 4: Probability Mean (frequency table) Standard deviation P1A or

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

Chapter 18 Resampling and Nonparametric Approaches To Data

Chapter 18 Resampling and Nonparametric Approaches To Data Chapter 18 Resampling and Nonparametric Approaches To Data 18.1 Inferences in children s story summaries (McConaughy, 1980): a. Analysis using Wilcoxon s rank-sum test: Younger Children Older Children

More information

Non-parametric Statistics

Non-parametric Statistics 45 Contents Non-parametric Statistics 45.1 Non-parametric Tests for a Single Sample 45. Non-parametric Tests for Two Samples 4 Learning outcomes You will learn about some significance tests which may be

More information

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Chapte The McGraw-Hill Companies, Inc. All rights reserved. er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Statistics: revision

Statistics: revision NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers

More information

SBAOD Statistical Methods & their Applications - II. Unit : I - V

SBAOD Statistical Methods & their Applications - II. Unit : I - V SBAOD Statistical Methods & their Applications - II Unit : I - V SBAOD Statistical Methods & their applications -II 2 Unit I - Syllabus Random Variable Mathematical Expectation Moments Moment generating

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

Comparison of Two Samples

Comparison of Two Samples 2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation

More information

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity

More information

Non-parametric (Distribution-free) approaches p188 CN

Non-parametric (Distribution-free) approaches p188 CN Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman s rank correlation; the Bootstrap. (ch14

More information

Non-parametric tests, part A:

Non-parametric tests, part A: Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are

More information

Non-Parametric Statistics: When Normal Isn t Good Enough"

Non-Parametric Statistics: When Normal Isn t Good Enough Non-Parametric Statistics: When Normal Isn t Good Enough" Professor Ron Fricker" Naval Postgraduate School" Monterey, California" 1/28/13 1 A Bit About Me" Academic credentials" Ph.D. and M.A. in Statistics,

More information

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions Part 1: Probability Distributions Purposes of Data Analysis True Distributions or Relationships in the Earths System Probability Distribution Normal Distribution Student-t Distribution Chi Square Distribution

More information

CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics

CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics OUTLINES Descriptive Statistics Introduction of central tendency Classification Characteristics Different measures

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Physics 509: Non-Parametric Statistics and Correlation Testing

Physics 509: Non-Parametric Statistics and Correlation Testing Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests

More information

Statistical Inference Theory Lesson 46 Non-parametric Statistics

Statistical Inference Theory Lesson 46 Non-parametric Statistics 46.1-The Sign Test Statistical Inference Theory Lesson 46 Non-parametric Statistics 46.1 - Problem 1: (a). Let p equal the proportion of supermarkets that charge less than $2.15 a pound. H o : p 0.50 H

More information

Design of the Fuzzy Rank Tests Package

Design of the Fuzzy Rank Tests Package Design of the Fuzzy Rank Tests Package Charles J. Geyer July 15, 2013 1 Introduction We do fuzzy P -values and confidence intervals following Geyer and Meeden (2005) and Thompson and Geyer (2007) for three

More information

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878 Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each

More information

Introduction to Statistical Data Analysis III

Introduction to Statistical Data Analysis III Introduction to Statistical Data Analysis III JULY 2011 Afsaneh Yazdani Preface Major branches of Statistics: - Descriptive Statistics - Inferential Statistics Preface What is Inferential Statistics? The

More information

Module 9: Nonparametric Statistics Statistics (OA3102)

Module 9: Nonparametric Statistics Statistics (OA3102) Module 9: Nonparametric Statistics Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 15.1-15.6 Revision: 3-12 1 Goals for this Lecture

More information

1; (f) H 0 : = 55 db, H 1 : < 55.

1; (f) H 0 : = 55 db, H 1 : < 55. Reference: Chapter 8 of J. L. Devore s 8 th Edition By S. Maghsoodloo TESTING a STATISTICAL HYPOTHESIS A statistical hypothesis is an assumption about the frequency function(s) (i.e., pmf or pdf) of one

More information

What is a Hypothesis?

What is a Hypothesis? What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill in this city is μ = $42 population proportion Example:

More information

4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures

4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures Non-parametric Test Stephen Opiyo Overview Distinguish Parametric and Nonparametric Test Procedures Explain commonly used Nonparametric Test Procedures Perform Hypothesis Tests Using Nonparametric Procedures

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

NAG Library Chapter Introduction. G08 Nonparametric Statistics

NAG Library Chapter Introduction. G08 Nonparametric Statistics NAG Library Chapter Introduction G08 Nonparametric Statistics Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric Hypothesis Testing... 2 2.2 Types

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 38 Goodness - of fit tests Hello and welcome to this

More information

Introduction to Biostatistics: Part 5, Statistical Inference Techniques for Hypothesis Testing With Nonparametric Data

Introduction to Biostatistics: Part 5, Statistical Inference Techniques for Hypothesis Testing With Nonparametric Data SPECIAL CONTRIBUTION biostatistics Introduction to Biostatistics: Part 5, Statistical Inference Techniques for Hypothesis Testing With Nonparametric Data Specific statistical tests are used when the null

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Non-parametric Hypothesis Testing

Non-parametric Hypothesis Testing Non-parametric Hypothesis Testing Procedures Hypothesis Testing General Procedure for Hypothesis Tests 1. Identify the parameter of interest.. Formulate the null hypothesis, H 0. 3. Specify an appropriate

More information

Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data?

Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data? Agonistic Display in Betta splendens: Data Analysis By Joanna Weremjiwicz, Simeon Yurek, and Dana Krempels Once you have collected data with your ethogram, you are ready to analyze that data to see whether

More information

Everything is not normal

Everything is not normal Everything is not normal According to the dictionary, one thing is considered normal when it s in its natural state or conforms to standards set in advance. And this is its normal meaning. But, like many

More information

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p.

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p. Nonparametric s Mark Muldoon School of Mathematics, University of Manchester Mark Muldoon, November 8, 2005 Nonparametric s - p. 1/31 Overview The sign, motivation The Mann-Whitney Larger Larger, in pictures

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Formulas and Tables. for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. ˆp E p ˆp E Proportion

Formulas and Tables. for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. ˆp E p ˆp E Proportion Formulas and Tables for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. Ch. 3: Descriptive Statistics x Sf. x x Sf Mean S(x 2 x) 2 s Å n 2 1 n(sx 2 ) 2 (Sx)

More information

Background to Statistics

Background to Statistics FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

Intro to Parametric & Nonparametric Statistics

Intro to Parametric & Nonparametric Statistics Kinds of variable The classics & some others Intro to Parametric & Nonparametric Statistics Kinds of variables & why we care Kinds & definitions of nonparametric statistics Where parametric stats come

More information

Dealing with the assumption of independence between samples - introducing the paired design.

Dealing with the assumption of independence between samples - introducing the paired design. Dealing with the assumption of independence between samples - introducing the paired design. a) Suppose you deliberately collect one sample and measure something. Then you collect another sample in such

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

Inferential statistics

Inferential statistics Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare

More information

Analysis of Variance and Co-variance. By Manza Ramesh

Analysis of Variance and Co-variance. By Manza Ramesh Analysis of Variance and Co-variance By Manza Ramesh Contents Analysis of Variance (ANOVA) What is ANOVA? The Basic Principle of ANOVA ANOVA Technique Setting up Analysis of Variance Table Short-cut Method

More information

= 1 i. normal approximation to χ 2 df > df

= 1 i. normal approximation to χ 2 df > df χ tests 1) 1 categorical variable χ test for goodness-of-fit ) categorical variables χ test for independence (association, contingency) 3) categorical variables McNemar's test for change χ df k (O i 1

More information

Nonparametric Location Tests: k-sample

Nonparametric Location Tests: k-sample Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

INTRODUCTION TO ANALYSIS OF VARIANCE

INTRODUCTION TO ANALYSIS OF VARIANCE CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two

More information

QT (Al Jamia Arts and Science College, Poopalam)

QT (Al Jamia Arts and Science College, Poopalam) QUANTITATIVE TECHNIQUES Quantitative techniques may be defined as those techniques which provide the decision makes a systematic and powerful means of analysis, based on quantitative data. It is a scientific

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Tribhuvan University Institute of Science and Technology 2065

Tribhuvan University Institute of Science and Technology 2065 1CSc. Stat. 108-2065 Tribhuvan University Institute of Science and Technology 2065 Bachelor Level/First Year/ First Semester/ Science Full Marks: 60 Computer Science and Information Technology (Stat. 108)

More information

1 ONE SAMPLE TEST FOR MEDIAN: THE SIGN TEST

1 ONE SAMPLE TEST FOR MEDIAN: THE SIGN TEST NON-PARAMETRIC STATISTICS ONE AND TWO SAMPLE TESTS Non-parametric tests are normally based on ranks of the data samples, and test hypotheses relating to quantiles of the probability distribution representing

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics.

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Study Session 1 1. Random Variable A random variable is a variable that assumes numerical

More information

Data analysis and Geostatistics - lecture VII

Data analysis and Geostatistics - lecture VII Data analysis and Geostatistics - lecture VII t-tests, ANOVA and goodness-of-fit Statistical testing - significance of r Testing the significance of the correlation coefficient: t = r n - 2 1 - r 2 with

More information

STAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data.

STAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data. STAT 518 --- Section 3.4: The Sign Test The sign test, as we will typically use it, is a method for analyzing paired data. Examples of Paired Data: Similar subjects are paired off and one of two treatments

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

CDA Chapter 3 part II

CDA Chapter 3 part II CDA Chapter 3 part II Two-way tables with ordered classfications Let u 1 u 2... u I denote scores for the row variable X, and let ν 1 ν 2... ν J denote column Y scores. Consider the hypothesis H 0 : X

More information

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

More information

LOOKING FOR RELATIONSHIPS

LOOKING FOR RELATIONSHIPS LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an

More information