Introduction to Nonparametric Statistics
James Bernhard
Spring 2012
Parameters            Parametric method    Nonparametric method
µ[X2 − X1]            paired t-test        Wilcoxon signed rank test
µ[X1], µ[X2]          2-sample t-test      Wilcoxon rank sum test
µ[X1], ..., µ[Xk]     1-way ANOVA          Kruskal-Wallis test

All of the above nonparametric tests are particular types of permutation tests. Also, any parameter can be estimated or tested (with varying degrees of success) with a bootstrap. In all of the following tests, assume that we have collected an n-sample of the relevant variable(s).
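To make the bootstrap idea concrete, here is a minimal percentile-bootstrap sketch in Python (the course uses R, but the logic is the same): resample the data with replacement many times, recompute the statistic each time, and take quantiles of the results. The function name bootstrap_ci and the data values are made up for illustration.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for any statistic:
    resample with replacement, recompute the statistic, take quantiles."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(stat([rng.choice(sample) for _ in range(n)])
                   for _ in range(reps))
    alpha = (1 - level) / 2
    return stats[int(alpha * reps)], stats[int((1 - alpha) * reps) - 1]

# Hypothetical data values, for illustration only
data = [4.1, 5.3, 2.8, 6.0, 4.7, 5.5, 3.9, 4.4]
lo, hi = bootstrap_ci(data)
```

The same resampling loop works for a median, a variance, or any other parameter, which is why the bootstrap applies "with varying degrees of success" so broadly.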
In a parametric method, the test statistic:
1. Is assumed to come from a known family of distributions (such as normal or t)
2. Varies over all n-samples of the random variable in question

In a permutation test, the test statistic:
1. Is not assumed to come from a known family of distributions, although it may be required to have certain characteristics (such as symmetry)
2. Varies over all permutations of the values in our particular n-sample (or equivalently, of the group labels in our particular n-sample)
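The second point can be sketched directly in code. Below is an exact two-sample permutation test in Python (a sketch, not course code), using the difference in means as the test statistic: every relabeling of the pooled values into groups of the observed sizes is enumerated, and the p-value is the fraction of relabelings whose statistic is at least as extreme as the observed one. The sample values are made up for illustration.

```python
from itertools import combinations
from statistics import mean

def perm_test_2sample(x1, x2):
    """Exact permutation test for a difference in means: the statistic
    varies over all relabelings of the pooled values into groups of
    the observed sizes."""
    pooled = x1 + x2
    n, n1 = len(pooled), len(x1)
    observed = abs(mean(x2) - mean(x1))
    count = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(mean(g2) - mean(g1)) >= observed - 1e-12:
            count += 1
    return count / total  # the permutation p-value

# Hypothetical small samples, for illustration only
p = perm_test_2sample([1.2, 2.3, 1.9], [3.1, 4.0, 3.6, 4.4])
```

With 3 and 4 observations there are only C(7,3) = 35 relabelings, so full enumeration is easy; for larger samples one typically samples random relabelings instead.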
A rank-based test is a particular type of permutation test that uses not the values in our n-sample, but only their ranks (from smallest to largest). Rank-based tests are common among nonparametric techniques, largely because:
- They do not require as much computational power as more general permutation tests
- They are not very sensitive to outliers

With the computing power available nowadays, more general permutation tests are often quite feasible and may be preferable.
The null hypothesis of a paired t-test is H0: µ[X2 − X1] = 0. The test statistic for a paired t-test under H0 is

T = (mean(X2 − X1) − 0) / se[mean(X2 − X1)],

where se[mean(X2 − X1)] = s/√n and s is the sample standard deviation of our sample of X2 − X1.
Under the assumption that X2 − X1 is normally distributed, the distribution of T under the null hypothesis is t[n − 1]. The Central Limit Theorem implies that mean(X2 − X1) will be approximately normally distributed for large n. If X1, X2 are normally distributed, then this is exactly true even when n = 1, but if X1 or X2 is far from normally distributed, then n may need to be much larger for mean(X2 − X1) to be approximately normally distributed.

In R, you can use the function t.test(X1, X2, paired=TRUE) or t.test(Y ~ G, paired=TRUE) (with G indicating which group an observation of Y belongs to), depending on the format of your data.
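The statistic itself is a one-liner. A Python sketch of the computation (made-up paired measurements, purely illustrative):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x1, x2):
    """Paired t statistic: the mean of the differences X2 - X1
    divided by its standard error s/sqrt(n)."""
    d = [b - a for a, b in zip(x1, x2)]
    n = len(d)
    return (mean(d) - 0) / (stdev(d) / sqrt(n))  # stdev(d) is s

# Hypothetical before/after measurements, for illustration only
t = paired_t([10.0, 12.0, 9.0, 11.0], [11.0, 13.5, 9.5, 12.0])
```

Here n = 4, so under the null hypothesis T would be compared against the t[3] distribution.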
The null hypothesis of a Wilcoxon signed rank test is that X1, X2 have the same distribution. The alternative hypothesis could be that they don't (2-sided), or it could be that one of their distributions is a shifted version of the other (1-sided, to the left or right).

To compute the test statistic W+ of this test, suppose that the observations of X2 − X1 in our n-sample are x1, x2, ..., xn, ordered from smallest absolute value to largest absolute value. Then W+ is the sum of the ranks of the xi that are positive.

In R, you can use the function wilcox.test(X1, X2, paired=TRUE) to conduct a Wilcoxon signed rank test (and see the help page for specifying a 1-sided alternative hypothesis).
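The recipe for W+ translates directly into code. A Python sketch with made-up data (it assumes no zero differences and no ties in absolute value, which in practice require dropping zeros and using midranks):

```python
def signed_rank_w_plus(x1, x2):
    """Wilcoxon signed rank statistic W+: order the differences
    X2 - X1 by absolute value, then sum the ranks of the positive
    ones. (Assumes no zero differences and no ties in |difference|.)"""
    d = sorted((b - a for a, b in zip(x1, x2)), key=abs)
    return sum(rank for rank, di in enumerate(d, start=1) if di > 0)

# Hypothetical paired measurements, for illustration only
w = signed_rank_w_plus([10.0, 12.0, 9.0, 11.0, 14.0],
                       [11.0, 11.6, 9.2, 12.5, 13.1])
```

The differences here are 1.0, −0.4, 0.2, 1.5, −0.9; ordered by absolute value the positive ones get ranks 1, 4, and 5, so W+ = 10.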
The null hypothesis of a 2-sample t-test is H0: µ[X1] = µ[X2]. The alternative hypothesis can be two-sided (≠) or one-sided (either > or <). The test statistic for a 2-sample t-test under H0 is

T = (mean(X2) − mean(X1) − 0) / se[mean(X2) − mean(X1)],

where se[mean(X2) − mean(X1)] = √(s1²/n1 + s2²/n2) and s1, s2 are the sample standard deviations of our samples of X1 and X2.
Under the assumption that X1, X2 have the same variance and that X1 and X2 are approximately normally distributed, the distribution of T under the null hypothesis is approximately t[min(n1, n2) − 1]. The Central Limit Theorem implies that mean(X1) and mean(X2) will be approximately normally distributed for large n1, n2. If X1, X2 are normally distributed, then this is exactly true for any n1, n2, but if X1 or X2 is far from normally distributed, then n1 and n2 may need to be much larger for mean(X2) − mean(X1) to be approximately normally distributed.

In R, you can use the function t.test(X1, X2, var.equal=TRUE) or t.test(Y ~ G, var.equal=TRUE), depending on the format of your data. There is a (less powerful) variant of the 2-sample t-test that does not assume equal variance; you can use it by not specifying var.equal=TRUE, or by specifying var.equal=FALSE.
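A Python sketch of the 2-sample statistic as defined above, with made-up samples (here variance() is the sample variance, i.e. s² with divisor n − 1):

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x1, x2):
    """2-sample t statistic with se = sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x2) - mean(x1) - 0) / se

# Hypothetical samples, for illustration only
t = two_sample_t([5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0])
```

With n1 = 3 and n2 = 4, the slide's conservative reference distribution is t[min(3, 4) − 1] = t[2].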
The null hypothesis of a Wilcoxon rank sum test is that X1 and X2 have the same distribution. The alternative hypothesis could be that they don't (2-sided), or it could be that one of their distributions is a shifted version of the other (1-sided, to the left or right).

To compute the test statistic W of this test, suppose that the observations of the pooled sample of both X1 and X2 are x1, x2, ..., xn, ordered from smallest to largest. Then W is the sum of the ranks of the xi that come from the sample of X1.

In R, you can use the function wilcox.test(X1, X2) to conduct a Wilcoxon rank sum test (and see the help page for specifying a 1-sided alternative hypothesis).
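The rank sum statistic can likewise be sketched in a few lines of Python (made-up samples; ties, which would need midranks, are assumed absent):

```python
def rank_sum_w(x1, x2):
    """Wilcoxon rank sum statistic W: rank the pooled sample from
    smallest to largest, then sum the ranks of the observations that
    came from the first sample. (Assumes no ties.)"""
    pooled = sorted((v, label) for label, sample in ((1, x1), (2, x2))
                    for v in sample)
    return sum(rank for rank, (v, label) in enumerate(pooled, start=1)
               if label == 1)

# Hypothetical samples, for illustration only
w = rank_sum_w([1.2, 2.3, 1.9], [3.1, 4.0, 3.6, 4.4])
```

In this example every X1 observation is smaller than every X2 observation, so W takes its minimum possible value 1 + 2 + 3 = 6.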
The null hypothesis in a 1-way ANOVA is H0: µ[X1] = µ[X2] = ... = µ[Xk]. You have seen this in the context of linear models as an F-test of the null hypothesis that Y ~ 1 is adequate, as compared to Y ~ G, where G is the grouping categorical variable specifying which Xi that Y should take its value from.

The sampling variability assumptions for such a test are that the conditional error terms are independently normally distributed with equal variances; this is equivalent to requiring that the observations be independent, that each Xi is normally distributed, and that the variances of all of the Xi's are the same.

In R, you can conduct such a test with anova(lm(Y ~ 1), lm(Y ~ G)).
The null hypothesis of a Kruskal-Wallis test is that X1, X2, ..., Xk all have the same distribution; the alternative hypothesis is that they don't, which means that at least one distribution is different from another distribution.

To compute the test statistic K of this test, suppose that the observations of the pooled sample of X1, ..., Xk are x1, x2, ..., xn, ordered from smallest to largest, so that the index is the rank of the observation. Then let R̄ be the mean rank (which is (n + 1)/2), and let R̄i be the mean rank within the group that xi belongs to. The test statistic is:

K = (12 / (n(n + 1))) · Σ_{i=1}^{n} (R̄i − R̄)².
Under H0, the test statistic K has approximately a χ²[k − 1] distribution. In R, you can use the function kruskal.test(Y, G) to conduct a Kruskal-Wallis test, where Y is the pooled sample of X1, ..., Xk, and G is a vector specifying which Xi each observation belongs to.
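A Python sketch of K with made-up data (summing (R̄i − R̄)² over the n observations is the same as weighting each group's squared deviation by its size, which is what the code below does; ties, which would need midranks, are assumed absent):

```python
def kruskal_wallis_k(groups):
    """Kruskal-Wallis statistic K = 12/(n(n+1)) * sum_i (Rbar_i - Rbar)^2,
    where Rbar_i is the mean rank of the group that observation i
    belongs to and Rbar = (n+1)/2. (Assumes no ties.)"""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    ranks = {}  # group index -> list of pooled-sample ranks
    for rank, (v, g) in enumerate(pooled, start=1):
        ranks.setdefault(g, []).append(rank)
    n = len(pooled)
    grand = (n + 1) / 2  # the overall mean rank
    total = sum(len(rs) * (sum(rs) / len(rs) - grand) ** 2
                for rs in ranks.values())
    return 12 / (n * (n + 1)) * total

# Hypothetical samples of X1, X2, X3, for illustration only
k_stat = kruskal_wallis_k([[1.2, 2.3, 1.9], [3.1, 4.0, 3.6], [0.5, 0.9, 5.0]])
```

With k = 3 groups, this value would be compared against the χ²[2] distribution.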
Chapter 4

Recall that a confidence interval of confidence level c for a parameter θ has the property that the procedure used to construct it has probability c of producing an interval that contains θ. More specifically, if θ is a parameter of X, and if we consider the endpoints of a confidence interval of confidence level c to be random variables C_low and C_high that vary over n-samples of X, then

Prob(C_low ≤ θ ≤ C_high) = c.

Note that this probability does not refer to individual values of C_low and C_high, but rather to the random variables C_low and C_high as they vary over n-samples of X.
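A quick simulation makes this interpretation concrete: repeatedly draw an n-sample, build the interval, and record how often it contains θ. This Python sketch uses a normal population with known standard deviation 1 (a simplifying assumption, so the interval is the z interval rather than a t interval):

```python
import random
import statistics
from math import sqrt

def ci_covers(rng, mu=0.0, n=30, z=1.96):
    """Draw one n-sample from Normal(mu, 1) and report whether the
    usual 95% z interval for the mean contains mu (sigma = 1 is
    assumed known, to keep the sketch simple)."""
    sample = [rng.gauss(mu, 1.0) for _ in range(n)]
    xbar = statistics.mean(sample)
    half = z / sqrt(n)
    return xbar - half <= mu <= xbar + half

rng = random.Random(1)
coverage = sum(ci_covers(rng) for _ in range(2000)) / 2000
# coverage should come out close to 0.95
```

The fraction of intervals that cover µ approximates the coverage probability c = 0.95; any one interval either contains µ or it doesn't.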
The main point in Section 4.3 is that a confidence interval with confidence level 1 − α consists of the set of real numbers x such that H0: θ = x will not be rejected in a 2-sided hypothesis test of significance level α. For example, if a 95% confidence interval for the mean µ of a random variable is from 10.1mm to 13.3mm, then in a 2-sided hypothesis test with significance level 0.05, there would not be statistically significant evidence against the null hypothesis H0: µ = 11mm, but there would be statistically significant evidence against H0: µ = 14mm. This can go slightly wrong if approximations are used (as is often the case) in computing the confidence interval in question or in conducting the hypothesis test.
One other issue raised in Section 4.3 is that of multiple testing. If we conduct 100 independent tests, each having significance level 0.05, and all of the null hypotheses are true, then the probability that none of these tests will result in a Type I error is (0.95)^100 ≈ 0.006. Therefore the probability that at least one of these tests will result in a Type I error is 1 − (0.95)^100 ≈ 0.994, which is a far cry from the 0.05 that we would prefer. Although we won't go into the details of how people address this difficulty here, you should be aware of this problem with multiple testing and address it appropriately if you are conducting a large number of hypothesis tests.
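The arithmetic behind these two probabilities (under the assumption that the tests are independent) is just:

```python
alpha, m = 0.05, 100

# Probability that none of the m independent tests makes a Type I
# error when every null hypothesis is true
p_no_error = (1 - alpha) ** m

# Probability that at least one test makes a Type I error
p_at_least_one = 1 - p_no_error
```

As m grows, p_at_least_one climbs toward 1, which is why some correction for multiple testing is needed when many hypotheses are tested at once.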