Inferential Statistics

Eva Riccomagno, Maria Piera Rogantin
DIMA, Università di Genova

Part G. Distribution-free hypothesis tests

1. Classical and distribution-free tests
2. Distribution-free statistics and tests
3. Aside of probability: two distribution-free statistics
4. The sign test
5. The Wilcoxon-Mann-Whitney test
6. The goodness-of-fit tests
   a) Chi-square test
   b) Kolmogorov-Smirnov tests (one and two samples)
7. Final remarks

1. Classical and distribution-free tests

Differences between independent groups
- Classical: t-test (or Welch test) to compare the means of two groups; ANOVA for more groups.
- Distribution-free: Mann-Whitney U test and Kolmogorov-Smirnov two-sample test; Kruskal-Wallis and Median test for more groups.

Differences between variables
- Classical: t-test for paired samples; repeated measures ANOVA for more than two variables.
- Distribution-free: Sign test and Wilcoxon's matched pairs test.

Relationships between variables
- Classical: correlation coefficient.
- Distribution-free: Spearman R, ...
- For binary variables: Chi-square test, Phi coefficient, and Fisher exact test.

2. Distribution-free statistics and tests

Let $X_1, \dots, X_n \sim F$ be i.i.d. sample variables. A statistic $T = T(X_1, \dots, X_n)$ is distribution-free if its distribution is the same whatever the distribution $F$ of the sample variables.

An example: the Wald test (using the CLT approximation for the distribution of $\bar X_n$). Under $H_0: \mu = \mu_0$, for large $n$,

$$\frac{\bar X_n - \mu_0}{S/\sqrt{n}} \approx N(0, 1)$$

This is a particular case of a general fact: statistics whose asymptotic (limit) distribution does not depend on the sample distribution are distribution-free, at least asymptotically.

A test is distribution-free if its test statistic is distribution-free.

3. Aside of probability. Two distribution-free statistics

Sign statistic

Consider any i.i.d. random sample $X_1, \dots, X_n$ with median equal to 0. Assume $P(X_i = 0) = 0$ for $i = 1, \dots, n$ (e.g. $X_i$ continuous). Define

$$Z_i = \begin{cases} 1 & \text{if } X_i > 0 \\ 0 & \text{if } X_i < 0 \end{cases}$$

and note that $Z_i \sim B(1, 1/2)$. The statistic

$$B = \sum_{i=1}^{n} Z_i \sim B(n, 1/2)$$

is distribution-free. Furthermore, for large $n$,

$$\frac{B - n/2}{\sqrt{n}/2} \approx N(0, 1)$$
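As an illustration of the sign statistic, here is a minimal R sketch. The helper name `sign_statistic`, the simulated samples and the values `n = 25`, `b = 17` are assumptions made for the example, not part of the slides. It computes B for two samples with median 0 drawn from very different distributions, then compares the exact binomial right-tail probability with the normal approximation above.

```r
# Sign statistic: number of strictly positive observations.
# (sign_statistic is a hypothetical helper name.)
sign_statistic <- function(x) {
  stopifnot(all(x != 0))  # the slides assume P(X_i = 0) = 0
  sum(x > 0)
}

# Two simulated samples with median 0 but different distributions.
set.seed(1)
b_normal <- sign_statistic(rnorm(25))
b_cauchy <- sign_statistic(rcauchy(25))
# Both are draws from B(25, 1/2), whatever the sampling distribution.

# Exact and approximate right-tail probabilities for an illustrative b.
n <- 25
b <- 17
p_exact  <- 1 - pbinom(b - 1, n, 0.5)   # P(B >= b), exact
z        <- (b - n / 2) / (sqrt(n) / 2) # standardized sign statistic
p_approx <- 1 - pnorm(z)                # normal approximation, no continuity correction
```

The approximation is only expected to be close for large n; no continuity correction is applied in this sketch.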

Rank statistics

Consider the sample variables $X_1, \dots, X_n$ and the corresponding rank variables $R_1, \dots, R_n$, where $R_i$ represents the position of $X_i$ in the ordered sample.

Note. In lecture 2 we saw that an observed sample can be ordered, e.g. by the R command sort. Random variables can also be sorted, returning the ordered random vector $(X_{(1)}, \dots, X_{(n)})$. Thus each rank $R_i$ is a random variable; for instance $R_i = 1$ exactly when $X_i$ is the minimum of the random sample.

The joint distribution of $(R_1, \dots, R_n)$ does not depend on the distribution of the sample variables. We do not give the details here (the proof is based on combinatorial computation).

If the data contain ties, assign to the tied values the average of the ranks they would have received had they not been tied.
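The tie rule can be checked with the base R function `rank`, whose default `ties.method = "average"` implements it. The data vector below is made up for illustration.

```r
# Illustrative data with one tie (the two 5s).
x <- c(3, 5, 5, 8)

# Average ranks: the tied values would have had ranks 2 and 3,
# so each receives (2 + 3) / 2 = 2.5.
r <- rank(x)

# The ranks still sum to n (n + 1) / 2.
n <- length(x)
total <- sum(r)
```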

4. The simplest distribution-free test: the sign test

a) Test for the median of a random variable
b) Test for the equality of two medians (paired samples)

a) Test for the median of a random variable

Consider an i.i.d. random sample $X_1, \dots, X_n$ and the test with hypotheses

$$H_0: Q_2 = \lambda_0 \quad \text{against} \quad H_1: Q_2 \neq \lambda_0$$

($H_1$ could also be $Q_2 < \lambda_0$ or $Q_2 > \lambda_0$.) Consider

$$Z_i = \begin{cases} 1 & \text{if } X_i > \lambda_0 \\ 0 & \text{if } X_i < \lambda_0 \end{cases}$$

Then, under $H_0$, $Z_i \sim B(1, 1/2)$ and the test statistic is

$$B = \sum_{i=1}^{n} Z_i, \qquad B \sim B(n, 1/2) \text{ under } H_0$$

The test is carried out as usual.

b) Test for the equality of two medians (paired samples)

Let $X$ and $Y$ be two continuous random variables modeling some characteristic of the same population, with medians $Q_2^X$ and $Q_2^Y$ respectively. Consider a test with hypotheses

$$H_0: Q_2^X = Q_2^Y \quad \text{against} \quad H_1: Q_2^X > Q_2^Y$$

($H_1$ could also be $Q_2^X \neq Q_2^Y$ or $Q_2^X < Q_2^Y$.)

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be the paired random sample of size $n$ and define $(D_1, \dots, D_n)$ with $D_i = X_i - Y_i$. The test hypotheses become

$$H_0: Q_2^D = 0 \quad \text{against} \quad H_1: Q_2^D > 0$$

and we fall into the set-up of case a).

Remark. A more powerful alternative for both a) and b) is the Wilcoxon signed-rank test. We do not give the details here.
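The paired sign test can be sketched in a few lines of R on top of `binom.test`. The function name `sign_test` and the data are hypothetical. Dropping zero differences is a common convention and an assumption here, since the slides assume continuous variables for which zeros have probability 0.

```r
# Paired sign test: H0 says the median of D = X - Y is 0.
# (sign_test is a hypothetical helper, not a base R function.)
sign_test <- function(x, y, alternative = "greater") {
  stopifnot(length(x) == length(y))
  d <- x - y
  d <- d[d != 0]            # assumption: discard zero differences
  b <- sum(d > 0)           # sample value of the sign statistic B
  binom.test(b, length(d), p = 0.5, alternative = alternative)
}

# Made-up paired measurements.
x <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.1)
y <- c(4.9, 5.8, 5.0, 6.2, 5.5, 5.7)

res <- sign_test(x, y)
b   <- unname(res$statistic)  # number of positive differences
p   <- res$p.value            # P(B >= b) under B(n, 1/2)
```

Here five of the six differences are positive, so the p-value is P(B >= 5) for B ~ B(6, 1/2).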

Example. Deer legs

Source: Zar, Jerold H. (1999), Chapter 24: More on Dichotomous Variables, Biostatistical Analysis (Fourth ed.), Prentice-Hall.

The null hypothesis is that there is no difference between the hind leg and foreleg length in deer. The alternative hypothesis is that the hind leg is longer than the foreleg. Thus:

$$H_0: Q_2^D = 0 \quad \text{against} \quad H_1: Q_2^D > 0$$

Table: for each of the 10 deer, hind leg length, foreleg length and sign of the difference.

Under $H_0$ the test statistic is $B = \sum_{i=1}^{10} Z_i \sim B(10, 1/2)$. Its sample value is $b = 8$. The test is one-sided right. The p-value of $b$ is $P(B \geq 8) = 56/1024 \approx 0.0547$ (in R: 1-pbinom(7,10,0.5)).

Direct computation in R

> binom.test(8, 10, alternative = "greater")

        Exact binomial test

data:  8 and 10
number of successes = 8, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.4930987 1.0000000
sample estimates:
probability of success
                   0.8

There is weak evidence against $H_0$. When the sample size is small, the tails of the test statistic distribution are heavy, so $H_0$ is rarely rejected (the test has low power). In practice, to compensate, one may choose a higher $\alpha$. In our example, at $\alpha = 0.05$ the p-value 0.0547 narrowly leads to retaining $H_0$.

5. The Mann-Whitney U test or Wilcoxon rank-sum test (equality of two distributions, unpaired samples)

The null hypothesis can be expressed as follows: the probability that an observation from population $X$ exceeds an observation from population $Y$ equals the probability that an observation from $Y$ exceeds an observation from $X$:

$$H_0: P(X > Y) = P(X < Y) = 0.5$$

The alternative hypothesis can be stated as a one-sided (left or right) or two-sided test.

Here $X$ and $Y$ are two continuous independent random variables, and to test $H_0$ we consider two independent random samples $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$, possibly of different sizes. The variables could also be discrete or ordinal, provided $P(X = Y) = 0$.

Put the two samples together, so that there are $n = n_1 + n_2$ observations in total. Let $R_1, \dots, R_{n_1}$ be the rank variables assigned to $X_1, \dots, X_{n_1}$ and $R_{n_1+1}, \dots, R_n$ the rank variables assigned to $Y_1, \dots, Y_{n_2}$.

The statistics

$$W_1 = \sum_{i=1}^{n_1} R_i \qquad \text{and} \qquad U_1 = W_1 - \frac{n_1 (n_1 + 1)}{2}$$

are distribution-free and are used as test statistics. $U_1$ takes integer values between 0 and $n_1 n_2$.

The statistics $W_2$ and $U_2$ (based on the ranks of the $Y$'s) are defined analogously. Moreover $W_1 + W_2 = n(n + 1)/2$.

Which of $W_1$ and $W_2$ (or $U_1$ and $U_2$) should be used? Usually the statistic with the lower sample value.
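The definitions of $W_1$, $U_1$, $W_2$ and $U_2$ translate directly into R. The sketch below uses a hypothetical helper `rank_sum_stats` and made-up data; it is meant only to make the formulas concrete.

```r
# Rank-sum statistics for two unpaired samples.
# (rank_sum_stats is a hypothetical helper name.)
rank_sum_stats <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  r  <- rank(c(x, y))               # ranks in the pooled sample
  w1 <- sum(r[seq_len(n1)])         # sum of the ranks of the x's
  w2 <- sum(r[n1 + seq_len(n2)])    # sum of the ranks of the y's
  list(W1 = w1, U1 = w1 - n1 * (n1 + 1) / 2,
       W2 = w2, U2 = w2 - n2 * (n2 + 1) / 2)
}

# Made-up data: n1 = 3, n2 = 4.
s <- rank_sum_stats(c(2.1, 3.4, 1.7), c(4.0, 2.9, 5.2, 3.8))
```

With these data the x's receive ranks 2, 4 and 1, so W1 = 7 and U1 = 1; the identities W1 + W2 = n(n + 1)/2 = 28 and U1 + U2 = n1 n2 = 12 can be read off the returned list.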

A small example

Does treatment A produce lower values of a variable than treatment B?

Denote by $X$ and $Y$ the variables modeling the results of treatments A and B respectively.

$$H_0: P(X < Y) = P(X > Y) \qquad H_1: P(X < Y) > P(X > Y)$$

Seven elements are drawn from the population at random. Three, randomly chosen, are assigned to treatment A; the other four to treatment B: $n_1 = 3$ and $n_2 = 4$.

The sample values and the corresponding sample ranks are

    x_i   r(x_i)      y_i   r(y_i)
    12    1           17    5
    16    4           15    3
    13    2           18    6
                      20    7

The sample value of $W_1$ is $w = 1 + 4 + 2 = 7$.

Computation of the distribution of $W_1$ under $H_0$ ($n_1 = 3$, $n_2 = 4$)

$W_1$ is the sum of 3 different numbers chosen among $\{1, \dots, 7\}$. It takes values between 6 and 18 and is symmetric with respect to 12.

How many ways are there to form $w$?
- 6: one way, 1+2+3;
- 7: one way, 1+2+4;
- 8: two ways, 1+2+5 and 1+3+4;
- ...

Under $H_0$, the three ranks of $X$ are randomly chosen among $\{1, \dots, 7\}$: $\binom{7}{3} = 35$ cases. Then the distribution of $W_1$ for $n_1 = 3$ and $n_2 = 4$ is

    w    associated ranks                                f_W1(w)
    6    (1,2,3)                                         1/35
    7    (1,2,4)                                         1/35
    8    (1,2,5); (1,3,4)                                2/35
    9    (1,2,6); (1,3,5); (2,3,4)                       3/35
    10   (1,2,7); (1,3,6); (1,4,5); (2,3,5)              4/35
    11   (1,3,7); (1,4,6); (2,4,5); (2,3,6)              4/35
    12   (1,4,7); (1,5,6); (2,3,7); (2,4,6); (3,4,5)     5/35

(the values 13 to 18 follow by symmetry).

The distribution of $W_1$ depends only on $n_1$ and $n_2$: $W_1$ is a distribution-free statistic.
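The enumeration above can be reproduced in R with `combn`, which lists all subsets of a given size. This is a small sketch for $n_1 = 3$, $n_2 = 4$; the variable names are illustrative.

```r
n1 <- 3
n2 <- 4

# Rank sums of all choose(7, 3) = 35 equally likely rank triples under H0.
sums <- combn(n1 + n2, n1, FUN = sum)

# Exact null distribution of W1 as relative frequencies.
w1_dist <- table(sums) / length(sums)

# Left-tail probability P(W1 <= 7), the two cases (1,2,3) and (1,2,4).
p_left <- mean(sums <= 7)
```

For larger samples full enumeration becomes expensive; base R's `pwilcox(u, n1, n2)` gives the same tail probabilities in terms of $U_1 = W_1 - n_1(n_1+1)/2$.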

Some properties of $W_1$ and $U_1$ under $H_0$

Minimum value (all the ranks of the $X_i$'s are smaller than the ranks of the $Y_i$'s):

$$\min(W_1) = \sum_{i=1}^{n_1} i = \frac{n_1 (n_1 + 1)}{2} \qquad \min(U_1) = 0$$

Maximum value (all the ranks of the $Y_i$'s are smaller than the ranks of the $X_i$'s):

$$\max(W_1) = \sum_{i=n_2+1}^{n} i = \frac{n_1 (n + n_2 + 1)}{2} \qquad \max(U_1) = n_1 n_2$$

Mean value:

$$E(W_1) = \frac{n_1 (n + 1)}{2} \qquad E(U_1) = \frac{n_1 n_2}{2}$$

Variance:

$$V(W_1) = V(U_1) = \frac{n_1 n_2 (n + 1)}{12}$$

$W_1$ and $U_1$ are symmetric with respect to their mean values. Moreover, for $n_1$ and $n_2$ greater than 10:

$$\frac{U_1 - E(U_1)}{\mathrm{std}(U_1)} \approx N(0, 1)$$
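The mean and variance formulas give the normal approximation directly. The sketch below (hypothetical helper `u_normal_approx`, no continuity correction) standardizes $U_1$ and returns the left-tail probability; it is applied to the small example with $n_1 = 3$, $n_2 = 4$ and $u_1 = 7 - 6 = 1$ purely for illustration, since the approximation is meant for larger samples.

```r
# Normal approximation for U1 under H0 (no continuity correction).
# (u_normal_approx is a hypothetical helper name.)
u_normal_approx <- function(u1, n1, n2) {
  mu    <- n1 * n2 / 2                        # E(U1)
  sigma <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12) # std(U1)
  z     <- (u1 - mu) / sigma
  list(z = z, p_left = pnorm(z))              # left-tail p-value
}

a <- u_normal_approx(u1 = 1, n1 = 3, n2 = 4)
# a$z equals (1 - 6) / sqrt(8)
```

Note that `wilcox.test` with `exact = FALSE` applies a continuity correction by default, so its p-value would differ slightly from this sketch.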

Back to the test

The test is one-sided left; the sample value is 7 and its p-value is $P(W_1 \leq 7) = 2/35 \approx 0.0571$.

In such a case, with a low sample size, we can say that the evidence is against $H_0$.

Direct computation in R

> x = c(12, 16, 13); y = c(17, 15, 18, 20)
> wilcox.test(x, y, alternative = "less")

        Wilcoxon rank sum test

data:  x and y
W = 1, p-value = 0.05714
alternative hypothesis: true location shift is less than 0

Note that the statistic R reports as W is $U_1 = 7 - 6 = 1$.

The normal approximation for the standardized $W_1$ is not appropriate for small sample sizes. In this case, however, the exact computation and the normal approximation give similar results:

$$z = \frac{7 - 12}{\sqrt{8}} = -1.77 \qquad \text{p-value}(-1.77) = 0.0384$$

6. Goodness-of-fit tests

Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under a known probability model. Such measures can also be used to test whether two samples are drawn from identical distributions.

We consider here two goodness-of-fit tests:
a) Chi-square test (discrete variables)
b) Kolmogorov-Smirnov test

6. a) Chi-square goodness-of-fit tests

Let $X$ be a discrete random variable with finite support and $P(X = x_i) = \pi_i$, $i = 1, \dots, r$.

The test hypotheses are:

$$H_0: \pi_i = \pi_{i0} \text{ for all } i \qquad \text{and} \qquad H_1: \pi_i \neq \pi_{i0} \text{ for at least one } i$$

Let
- $X_1, \dots, X_n$ be a random sample;
- $F_1, \dots, F_r$ be the sample variables denoting the sample frequencies of the values $x_1, \dots, x_r$;
- $N_1, \dots, N_r$ be the corresponding count variables, $N_i = n F_i$, $i = 1, \dots, r$.

Often the $N_i$'s are called observed (counts) while the $n \pi_{i0}$'s are called expected (counts); they are denoted by $O_i$ and $E_i$ respectively.

The test statistic is

$$Q = n \sum_{i=1}^{r} \frac{(F_i - \pi_{i0})^2}{\pi_{i0}} = \sum_{i=1}^{r} \frac{(N_i - n\pi_{i0})^2}{n\pi_{i0}} = \text{(simply)} \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$$

Its asymptotic distribution is a chi-square with $r - 1$ degrees of freedom:

$$Q \approx \chi^2_{[r-1]}$$

The test is one-sided right because large sample values of $Q$ indicate a large difference between observed and expected frequencies.
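A minimal R sketch of the statistic $Q$ and its one-sided right p-value follows. The helper name `chisq_gof` and the die-roll counts are assumptions for illustration; the result is cross-checked against base R's `chisq.test`.

```r
# Chi-square goodness-of-fit statistic for fully specified pi_i0.
# (chisq_gof is a hypothetical helper name.)
chisq_gof <- function(observed, prob0) {
  stopifnot(length(observed) == length(prob0),
            abs(sum(prob0) - 1) < 1e-8)
  expected <- sum(observed) * prob0             # E_i = n * pi_i0
  q  <- sum((observed - expected)^2 / expected) # sample value of Q
  df <- length(observed) - 1                    # r - 1 degrees of freedom
  list(Q = q, df = df, p_value = 1 - pchisq(q, df))
}

# Made-up counts from 120 rolls of a die, tested against a fair die.
obs <- c(18, 22, 20, 25, 15, 20)
res <- chisq_gof(obs, rep(1 / 6, 6))

# Cross-check with the built-in test.
ref <- chisq.test(obs, p = rep(1 / 6, 6))
```

Here every expected count is 20, so Q = (4 + 4 + 0 + 25 + 25 + 0) / 20 = 2.9 with 5 degrees of freedom.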

Dependence on a parameter

Often the $\pi_i$'s depend on an unknown parameter $\theta$.

Examples: $X \sim B(n, \theta)$ (binomial), $X \sim U\{0, \theta\}$ (discrete uniform between 0 and $\theta$), $X \sim P(\theta)$ (truncated Poisson, neglecting the probability of large integers).

We can write $\pi_i = \pi_i(\theta)$ and the test hypotheses become:

$$H_0: \pi_i = \pi_{i0}(\theta) \qquad \text{and} \qquad H_1: \pi_i \neq \pi_{i0}(\theta)$$

If $\Theta_n$ is a consistent estimator of $\theta$ with normal asymptotic distribution $N(\theta, V(\Theta_n))$ (e.g. the maximum likelihood estimator), then the test can be conducted with the statistic

$$Q = n \sum_{i=1}^{r} \frac{(F_i - \pi_{i0}(\Theta_n))^2}{\pi_{i0}(\Theta_n)}$$

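Example. Sons among the first 7 children (Edwards and Fraccaro 1960)

Consider the number of males among the first seven children of 1334 Swedish ministers.

    n. sons   0    1    2     3     4     5     6    7
    counts    6    57   206   362   365   256   69   13

We want to test whether they are sample values of a random variable $X \sim B(7, \theta)$.

The point estimator of $\theta$ is $\bar X / 7$, the maximum likelihood estimator. The estimate of $\theta$ is

> x = c(0,1,2,3,4,5,6,7); o = c(6,57,206,362,365,256,69,13)
> t = sum(x*o)/sum(o)/7; t
[1] 0.5140287

The expected counts under $H_0$ are computed by:

> e = sum(o)*dbinom(0:7, 7, t); round(e, 1)

The sample value of $Q$ is 5.98; its p-value is computed by:

> q = sum((o-e)^2/e); 1 - pchisq(q, 7)

Then there is no evidence to reject $H_0$. Note that the code uses 7 degrees of freedom; because $\theta$ is estimated from the data, $r - 2 = 6$ degrees of freedom is the usual choice, and the conclusion is the same.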

Effects of small sample size

Recall that

$$Q = n \sum_{i=1}^{r} \frac{(F_i - \pi_{i0})^2}{\pi_{i0}} = \sum_{i=1}^{r} \frac{(N_i - n\pi_{i0})^2}{n\pi_{i0}} \approx \chi^2_{[r-1]}$$

The chi-square approximation is valid when the sample size is large and the expected counts $n\pi_{i0}$ are not too small (at least 5 for all $i = 1, \dots, r$). In fact:

(1) small $n$ ⇒ small $q$ ⇒ risk of type II error
(2) small $n\pi_{i0}$ ⇒ large $q$ ⇒ risk of type I error.

Examples. Case (1): small $n$ ⇒ small $q$ ⇒ risk of type II error

Consider the expected and observed frequencies below, where the relative differences between them are greater than 40%.

    value      1      2
    expected   0.40   0.60
    observed   0.15   0.85

In such a case:

$$\frac{(f_1 - \pi_{10})^2}{\pi_{10}} + \frac{(f_2 - \pi_{20})^2}{\pi_{20}} = \frac{0.0625}{0.4} + \frac{0.0625}{0.6} = 0.2604$$

If $n = 10$, then $q = 10 \times 0.2604 = 2.604$ with p-value ≈ 0.107: retain $H_0$.
If $n = 30$, then $q = 30 \times 0.2604 = 7.8125$ with p-value ≈ 0.005: reject $H_0$.

> e = c(0.4, 0.6); o = c(0.15, 0.85); cf = sum((o-e)^2/e); cf
> n = 10; cbind(cf*n, 1 - pchisq(cf*n, 1))
> n = 30; cbind(cf*n, 1 - pchisq(cf*n, 1))

Case (2): small $n\pi_{i0}$ ⇒ large $q$ ⇒ risk of type I error

Consider the expected and observed counts below. In (A) two of the expected counts are small.

    (A)        1    2    3          (B)        1    2    3
    expected   10   2    2          expected   10   12   12
    observed   12   3    6          observed   12   13   16

In (A): $q = 8.9$ with p-value ≈ 0.012: reject $H_0$.
In (B): $q = 1.817$ with p-value ≈ 0.403: retain $H_0$.

> e = c(10, 2, 2); o = c(12, 3, 6); cf = sum((o-e)^2/e)
> cbind(cf, 1 - pchisq(cf, 2))
> e = c(10, 12, 12); o = c(12, 13, 16); cf = sum((o-e)^2/e)
> cbind(cf, 1 - pchisq(cf, 2))

6. b1) Kolmogorov-Smirnov goodness-of-fit tests

Let $X_1, \dots, X_n$ be i.i.d. sample variables from a continuous random variable $X$ with cumulative distribution function $F$. Consider the test hypotheses:

$$H_0: F(x) = F_0(x) \text{ for all } x \in \mathbb{R} \qquad H_1: F(x) \neq F_0(x) \text{ for at least one } x \in \mathbb{R}$$

Let $\hat F$ be the empirical cumulative distribution function:

$$\hat F(x) = \sum_{i=1}^{n} \frac{i}{n} \, \mathbb{1}\left( X_{(i)} \leq x < X_{(i+1)} \right)$$

where $(X_{(1)}, \dots, X_{(n)})$ is the sorted random sample, $X_{(n+1)} = +\infty$, and $\mathbb{1}(\cdot)$ denotes the indicator function (equal to 1 if the condition is satisfied and equal to 0 otherwise).

$\hat F$ is a step function. The sample values of $\hat F(x)$ are discussed in the slides Exploratory Data Analysis.

The Kolmogorov test statistic is

$$D = \sup_{x \in \mathbb{R}} \left| \hat F(x) - F_0(x) \right| = \max_{1 \leq i \leq n} \max\left\{ \left| \frac{i}{n} - F_0(X_{(i)}) \right|, \left| \frac{i-1}{n} - F_0(X_{(i)}) \right| \right\}$$

$D$ is a distribution-free statistic.

The test is one-sided right because a large sample value of $D$ corresponds to a large difference between the empirical and the tested cumulative distribution functions.
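The max-over-order-statistics form of $D$ is easy to code. The sketch below uses a hypothetical helper `ks_statistic` and made-up data tested against U(0, 2); the value is cross-checked against base R's `ks.test`.

```r
# One-sample Kolmogorov statistic via the order statistics.
# (ks_statistic is a hypothetical helper; cdf0 is the tested CDF,
#  extra arguments in ... are passed on to it.)
ks_statistic <- function(x, cdf0, ...) {
  x  <- sort(x)
  n  <- length(x)
  f0 <- cdf0(x, ...)          # F_0 evaluated at X_(1), ..., X_(n)
  i  <- seq_len(n)
  max(pmax(abs(i / n - f0), abs((i - 1) / n - f0)))
}

# Made-up sample, tested against a uniform on (0, 2).
s <- c(0.2, 0.9, 1.1, 1.6)
d <- ks_statistic(s, punif, 0, 2)

# Cross-check with the built-in test statistic.
d_ref <- unname(ks.test(s, "punif", 0, 2)$statistic)
```

Both comparisons (with i/n and with (i-1)/n) are needed because the empirical CDF jumps at each order statistic, so the largest gap can occur just before or just after a jump.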

Example. Goodness-of-fit of a uniform random variable $X \sim U(0, 2)$

We want to test whether a uniform random variable $X \sim U(0, 2)$ fits the following (sorted) data:

    0.03  0.12  0.25  0.41  0.49  1.18  1.21  1.56  1.57  1.69

A random variable $X \sim U(0, 2)$ has cumulative distribution function

$$F_0(x) = \begin{cases} 0 & \text{if } x < 0 \\ x/2 & \text{if } 0 \leq x < 2 \\ 1 & \text{if } x \geq 2 \end{cases}$$

Figure: the empirical cumulative distribution function (red) and $F_0$ (black).

The maximum distance between the two plots is achieved at $x = 0.49$ (the fifth sorted value) and $d = 5/10 - 0.49/2 = 0.255$.

Direct computation in R

> s = c(0.03, 0.12, 0.25, 0.41, 0.49, 1.18, 1.21, 1.56, 1.57, 1.69)
> ks.test(s, "punif", 0, 2)

        One-sample Kolmogorov-Smirnov test

data:  s
D = 0.255, p-value = ...
alternative hypothesis: two-sided

There is no evidence to reject $H_0$.

Example. Approximate distribution of $\bar X_n$ (see the slides on the Central limit theorem)

Consider a simulation of 1000 samples, of size $n$ each, from an exponential random variable $X \sim E(\lambda)$ with $\lambda = 2$. The simulated distribution of $\bar X_n$ is compared with
- a Normal variable with the sample mean and standard deviation;
- a Normal variable with the theoretical mean and standard deviation, which are known: $1/\lambda$ and $1/(\lambda \sqrt{n})$ respectively.

n = 10

> lambda = 2; x = c(1:1000); n = 10
> for (i in 1:1000) x[i] = mean(rexp(n, lambda))
> ######### empirical mean and standard deviation
> ks.test(x, "pnorm", mean(x), sd(x))

        One-sample Kolmogorov-Smirnov test

data:  x
D = ..., p-value = ...
alternative hypothesis: two-sided

> ######### theoretical mean and standard deviation
> ks.test(x, "pnorm", (1/lambda), (1/lambda/sqrt(n)))

        One-sample Kolmogorov-Smirnov test

data:  x
D = ..., p-value = ...
alternative hypothesis: two-sided

In the first case there is evidence to reject the hypothesis that the simulated distribution of $\bar X_{10}$ is Normal. In the second one the evidence is weak.

n = 30

> lambda = 2; x = c(1:1000); n = 30
> for (i in 1:1000) x[i] = mean(rexp(n, lambda))
> ######### empirical mean and standard deviation
> ks.test(x, "pnorm", mean(x), sd(x))

        One-sample Kolmogorov-Smirnov test

data:  x
D = ..., p-value = ...
alternative hypothesis: two-sided

> ######### theoretical mean and standard deviation
> ks.test(x, "pnorm", (1/lambda), (1/lambda/sqrt(n)))

        One-sample Kolmogorov-Smirnov test

data:  x
D = ..., p-value = ...
alternative hypothesis: two-sided

In both cases there is no evidence to reject the hypothesis that the simulated distribution of $\bar X_{30}$ is Normal.

6. b2) Two-sample Kolmogorov-Smirnov goodness-of-fit tests

Let $X$ and $Y$ be two continuous independent random variables with cumulative distribution functions $F_X$ and $F_Y$ respectively. The test hypotheses are:

$$H_0: F_X(t) = F_Y(t) \text{ for all } t \in \mathbb{R} \qquad H_1: F_X(t) \neq F_Y(t) \text{ for at least one } t \in \mathbb{R}$$

Let $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ be two independent random samples with empirical cumulative distribution functions $\hat F_X$ and $\hat F_Y$ respectively.

The Kolmogorov-Smirnov test statistic is

$$D_{n_1, n_2} = \sup_{x \in \mathbb{R}} \left| \hat F_X(x) - \hat F_Y(x) \right|$$

$D_{n_1, n_2}$ is a distribution-free statistic.
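Because both empirical distribution functions are step functions that change only at observed values, the supremum can be computed by evaluating them at the pooled sample points. The sketch below uses a hypothetical helper `ks2_statistic` and made-up data, with a cross-check against `ks.test`.

```r
# Two-sample Kolmogorov-Smirnov statistic.
# (ks2_statistic is a hypothetical helper name.)
ks2_statistic <- function(x, y) {
  pts <- sort(unique(c(x, y)))            # the only points where a jump occurs
  max(abs(ecdf(x)(pts) - ecdf(y)(pts)))   # largest gap between the two ECDFs
}

# Made-up samples of sizes 4 and 5.
x <- c(1, 3, 5, 7)
y <- c(2, 4, 6, 8, 10)
d <- ks2_statistic(x, y)

# Cross-check with the built-in test statistic.
d_ref <- unname(ks.test(x, y)$statistic)
```

With these data the largest gap occurs at 7, where the two empirical distribution functions equal 1 and 0.6.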

Example. Juniper trees

We want to test whether the biomass of male and female Juniper trees has the same distribution. The two samples have size 6 each.

> m = c(71, 72, 74, 76, 77, 78); f = c(73, 79, 80, 82, 83, 84)
>
> Fm_Ff = rbind(cumsum(table(factor(m, levels = 71:84)))/6,
+               cumsum(table(factor(f, levels = 71:84)))/6)
> round(Fm_Ff, 2)

> plot(ecdf(m), col = "blue", xlim = c(70, 85), xlab = "", ylab = "", main = "")
> plot(ecdf(f), add = TRUE, col = "red", xlim = c(70, 85), xlab = "", ylab = "", main = "")

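The absolute values of the difference between $\hat F_M$ and $\hat F_F$ are listed below; their maximum is reached at a biomass of 78.

> D = abs(Fm_Ff[1,] - Fm_Ff[2,])
> round(rbind(Fm_Ff, D), 2)

    biomass  71    72    73    74    75    76    77    78    79    80    81    82    83    84
    F_M      0.17  0.33  0.33  0.50  0.50  0.67  0.83  1.00  1.00  1.00  1.00  1.00  1.00  1.00
    F_F      0.00  0.00  0.17  0.17  0.17  0.17  0.17  0.17  0.33  0.50  0.50  0.67  0.83  1.00
    D        0.17  0.33  0.17  0.33  0.33  0.50  0.67  0.83  0.67  0.50  0.50  0.33  0.17  0.00

> max(D)
[1] 0.8333333

Direct computation in R

> ks.test(m, f)

        Two-sample Kolmogorov-Smirnov test

data:  m and f
D = 0.83333, p-value = 0.02597
alternative hypothesis: two-sided

There is evidence to reject $H_0$.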

7. Final remarks

From the book by T. Hill and P. Lewicki (2006), Statistics: Methods and Applications, StatSoft, p. 385.

It is not easy to give simple advice concerning the use of nonparametric procedures. Each nonparametric procedure has its peculiar sensitivities and blind spots. For example, the Kolmogorov-Smirnov two-sample test is not only sensitive to differences in the location of distributions (for example, differences in means) but is also greatly affected by differences in their shapes. The Wilcoxon matched pairs test assumes that one can rank order the magnitude of differences in matched observations in a meaningful manner. If this is not the case, one should rather use the Sign test.

In general, if the result of a study is important (e.g., does a very expensive and painful drug therapy help people get better?), then it is always advisable to run different nonparametric tests; should discrepancies in the results occur contingent upon which test is used, one should try to understand why some tests give different results.

On the other hand, nonparametric statistics are less statistically powerful (sensitive) than their parametric counterparts, and if it is important to detect even small effects (e.g., is this food additive harmful to people?) one should be very careful in the choice of a test statistic.

Nonparametric methods are most appropriate when the sample sizes are small.

NON-PARAMETRIC STATISTICS * (http://www.statsoft.com) 1. GENERAL PURPOSE 1.1 Brief review of the idea of significance testing To understand the idea of non-parametric statistics (the term non-parametric