Week 13: Comparing Two Populations, Part II
Outline: The Rank-Sum Test Procedure; Paired Data; Comparing Two Variances; Lab 8: Hypothesis Testing with R


Florence Cameron

1 Week 13 Comparing Two Populations, Part II

2 Week 13 Objectives
Coverage of comparing two populations continues with new procedures and a new sampling design. The week concludes with a lab session. In particular:
1 The rank-sum test is presented.
2 The concept of paired data is introduced, and the paired-data T-test, the signed-rank test, and McNemar's test are described.
3 Levene's test and the F-test for comparing two variances are presented.
4 The lab session demonstrates the R implementation of test procedures for one sample, two samples, and regression, including checking the validity of assumptions.

4 Motivation
If the sample sizes are small and the populations are non-normal, the T test is not valid. The Mann-Whitney-Wilcoxon rank-sum test (rank-sum test for short), described next, can be used with both small and large sample sizes. If the two populations are continuous, the null distribution of the test statistic (TS) is known even for very small sample sizes. For discrete populations, the null distribution of the TS can be well approximated with much smaller sample sizes than those required by contrast-based procedures. The rank-sum test has high power, especially if the two population distributions are heavy-tailed or skewed.

5 The Null Hypothesis and the TS
The rank-sum procedure tests $H_0^F : F_1 = F_2$. Let $R_{ij}$ denote the (mid-)rank of observation $X_{ij}$ in the combined set of $N = n_1 + n_2$ observations, and set
$$W_1 = \sum_{j=1}^{n_1} R_{1j}, \quad \bar R_1 = \frac{W_1}{n_1}, \quad W_2 = \sum_{j=1}^{n_2} R_{2j}, \quad \bar R_2 = \frac{W_2}{n_2}.$$
Then the Mann-Whitney-Wilcoxon TS is
$$\bar R_1 - \bar R_2 = \frac{N}{n_1 n_2}\left(W_1 - n_1\,\frac{N+1}{2}\right),$$
or simply $W_1$.
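The identity between $\bar R_1 - \bar R_2$ and $W_1$ can be checked in a few lines. The sketch below assumes only that the (mid-)ranks of the combined sample sum to $N(N+1)/2$, so that $W_1 + W_2 = N(N+1)/2$.

```latex
\begin{aligned}
\bar R_1 - \bar R_2
  &= \frac{W_1}{n_1} - \frac{W_2}{n_2}
   = \frac{W_1}{n_1} - \frac{1}{n_2}\left(\frac{N(N+1)}{2} - W_1\right) \\
  &= W_1\left(\frac{1}{n_1} + \frac{1}{n_2}\right) - \frac{N(N+1)}{2 n_2}
   = \frac{N}{n_1 n_2}\,W_1 - \frac{N(N+1)}{2 n_2} \\
  &= \frac{N}{n_1 n_2}\left(W_1 - n_1\,\frac{N+1}{2}\right).
\end{aligned}
```

Because $\bar R_1 - \bar R_2$ is an increasing linear function of $W_1$, the two statistics lead to the same test.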

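6 The Standardized Rank-Sum TS and RR
If there are no ties,
$$Z_{H_0} = \frac{W_1 - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}.$$
If $H_0^F$ holds, $Z_{H_0}$ is approximately $N(0, 1)$ for $n_1, n_2 > 8$. The rejection regions (RR) at level $\alpha$ are:
H_a                      RR at level α
$\mu_1 - \mu_2 > 0$      $Z_{H_0} \ge z_\alpha$
$\mu_1 - \mu_2 < 0$      $Z_{H_0} \le -z_\alpha$
$\mu_1 - \mu_2 \ne 0$    $|Z_{H_0}| \ge z_{\alpha/2}$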

7 Example
Data on sputum histamine levels from 9 allergic and 13 non-allergic individuals are given in edu/acq/401/data/histamindata.txt. Is there a difference between the two populations? Test at α = .01.
Solution. Here $R_{11} = 18$, $R_{12} = 11$, $R_{13} = 22$, $R_{14} = 19$, $R_{15} = 17$, $R_{16} = 21$, $R_{17} = 7$, $R_{18} = 20$, $R_{19} = 16$. Thus $W_1 = \sum_j R_{1j} = 151$ and
$$Z_{H_0} = \frac{151 - 9(23)/2}{\sqrt{9(13)(23)/12}} = 3.17.$$
Since $n_1, n_2 > 8$, p-value $= 2[1 - \Phi(3.17)] \approx 0.0015 < 0.01$. Thus the difference is significant.
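The arithmetic of this example can be reproduced in R directly from the nine ranks listed in the solution. This is only a sketch of the no-ties formula; the variable names are illustrative and not part of the slides.

```r
# Ranks of the 9 allergic observations in the combined sample (from the example)
r1 <- c(18, 11, 22, 19, 17, 21, 7, 20, 16)
n1 <- length(r1)   # 9 allergic individuals
n2 <- 13           # 13 non-allergic individuals
N  <- n1 + n2

W1 <- sum(r1)      # rank sum of the first sample
# Standardized rank-sum statistic (no-ties formula)
z  <- (W1 - n1 * (N + 1) / 2) / sqrt(n1 * n2 * (N + 1) / 12)
# Two-sided p-value from the normal approximation
p_value <- 2 * (1 - pnorm(abs(z)))

print(c(W1 = W1, z = z, p_value = p_value))
```

With the raw data in hand, wilcox.test would be the usual route; the code above only mirrors the hand calculation.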

8 Effect of Outliers
In the above example, the t test does not reject at the level α = 0.01 used there. With the data in data frame hi, t.test(hi$level ~ hi$sample) gives a p-value of 0.13, and the pooled-variance version, t.test(hi$level ~ hi$sample, var.equal=TRUE), does not reject either. In general, using a procedure when its underlying assumptions are violated will give misleading results.

9 Introduction, Motivation
Paired data arise when each experimental unit receives each of the two treatments that are being compared. Examples:
1 Compare the durability of two types of tires.
2 Compare two labs for the analysis of mercury content.
3 Two acne treatments, two cataract treatments, etc.
Paired data are of the form $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$. CIs and the TS are again based on $\bar X_1 - \bar X_2$, but now $\bar X_1$ and $\bar X_2$ are not independent, so the previous formulas do not apply. For example,
$$\sigma^2_{\bar X_1 - \bar X_2} = \sigma^2_{\bar X_1} + \sigma^2_{\bar X_2} - 2\,\mathrm{Cov}(\bar X_1, \bar X_2).$$
Similarly, the rank-sum test is not valid now.

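10 The paired data T-test
While $\mathrm{Cov}(\bar X_1, \bar X_2)$ can be estimated, it is easier to use the differences
$$D_1 = X_{11} - X_{21}, \ \ldots, \ D_n = X_{1n} - X_{2n}.$$
$D_1, \ldots, D_n$ are independent, and $\bar D = \bar X_1 - \bar X_2$. Thus $\sigma^2_{\bar X_1 - \bar X_2} = \sigma^2_{\bar D}$ can be estimated by $\hat\sigma^2_{\bar D} = S_D^2/n$, where
$$S_D^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} D_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} D_i\Big)^2\right].$$
CIs and testing are based on the fact that
$$\frac{\bar D - \mu_D}{S_D/\sqrt{n}} \sim T_{n-1}$$
if normality holds, or approximately if $n \ge 30$.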
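As an illustration of these formulas, the following sketch computes $S_D^2$ and the paired T statistic by hand and compares them with t.test(..., paired = TRUE). The eight pairs are made-up numbers used only for the demonstration, not data from the course.

```r
# Hypothetical paired measurements (illustration only, not course data)
x1 <- c(10.2, 9.8, 11.5, 10.9, 9.4, 10.1, 11.0, 10.6)
x2 <- c(10.6, 10.1, 11.4, 11.6, 9.9, 10.4, 11.7, 10.8)

d <- x1 - x2                 # differences D_i
n <- length(d)

# S_D^2 via the computational formula on the slide
s2_d <- (sum(d^2) - sum(d)^2 / n) / (n - 1)

# T statistic for H0: mu_D = 0 and p-value for Ha: mu_D < 0
t_stat <- mean(d) / sqrt(s2_d / n)
p_less <- pt(t_stat, df = n - 1)

# The same test through the built-in command
check <- t.test(x1, x2, paired = TRUE, alternative = "less")
print(c(by_hand = t_stat, t.test = unname(check$statistic)))
```

The computational formula for $S_D^2$ agrees with var(d), which is what t.test uses internally for the differences.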

11 Example
A total of 12 water samples are analyzed for mercury content by labs A and B, and the paired data are summarized by $\bar D = \bar X_1 - \bar X_2$ and $S_D$. Does lab B give, on average, higher concentration results than lab A? Test at α = 0.05.
Solution. Here $H_0 : \mu_1 - \mu_2 = 0$, $H_a : \mu_1 - \mu_2 < 0$. Because $n < 30$, we must assume normality. Doing so, the test statistic is
$$T_{H_0} = \frac{\bar D}{S_D/\sqrt{12}}.$$
Since $T_{H_0} < -t_{.05,11} = -1.796$, $H_0$ is rejected.

12 It is important to be able to recognize paired data. For example, a study was conducted to see whether two cars, A and B, having very different wheel bases and turning radii, took the same time to parallel park. Seven drivers were randomly selected, and the time required for each of them to parallel park each of the two cars was measured. The results are tabulated by driver, with one row of times for car A and one for car B. Because each driver parks both cars, the two samples are paired.

13 The Signed-Rank test
1 Rank the absolute differences $|D_1|, \ldots, |D_n|$ from smallest to largest. Let $R_i$ denote the rank of $|D_i|$.
2 Assign to $R_i$ the sign of $D_i$, thus forming signed ranks.
3 Let $S_+$ be the sum of the ranks $R_i$ with positive sign, i.e. the sum of the positive signed ranks.
If $H_0$ holds, $\mu_{S_+} = \frac{n(n+1)}{4}$ and $\sigma^2_{S_+} = \frac{n(n+1)(2n+1)}{24}$. If $H_0$ holds and $n > 10$, $S_+$ is approximately $N(\mu_{S_+}, \sigma^2_{S_+})$. The TS for testing $H_0 : \mu_D = 0$ is
$$Z_{H_0} = \left(S_+ - \frac{n(n+1)}{4}\right) \Big/ \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$
The RRs are the usual RRs of a Z-test.

14 Example (Mercury concentrations from Labs A and B)
The 12 differences $D_i$ and the ranks $R_i$ of their absolute values are given in a table. Test $H_0 : \mu_1 - \mu_2 = 0$ vs $H_a : \mu_1 - \mu_2 < 0$ at α = 0.05.
Solution: Here $S_+ = 13$, the sum of the ranks with positive sign. Thus
$$Z_{H_0} = \frac{13 - 12(13)/4}{\sqrt{12(13)(25)/24}} = \frac{13 - 39}{\sqrt{162.5}} = -2.04,$$
with p-value $= \Phi(-2.04) \approx 0.021$. Setting the 12 differences in the object d with d=c(...), the command wilcox.test(d, alternative="less") returns the corresponding p-value.
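The standardized signed-rank statistic of this example can be reproduced from $S_+ = 13$ and $n = 12$. The sketch below also includes a small helper for computing $S_+$ from raw differences; both function names are illustrative choices, and the four differences fed to the helper are made up.

```r
# Z statistic of the signed-rank test from S+ and n (normal approximation)
signed_rank_z <- function(s_plus, n) {
  mu    <- n * (n + 1) / 4
  sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
  (s_plus - mu) / sigma
}

# S+ from raw differences: rank |D_i|, then add the ranks of the positive D_i
s_plus_from <- function(d) {
  r <- rank(abs(d))
  sum(r[d > 0])
}

z <- signed_rank_z(13, 12)   # values from the mercury example
p_value <- pnorm(z)          # lower-tailed, for Ha: mu_1 - mu_2 < 0
print(c(z = z, p_value = p_value))

# Helper applied to made-up differences (illustration only)
print(s_plus_from(c(-1.5, 2, -3, 0.5)))
```

For small samples without ties, wilcox.test uses the exact null distribution, so its p-value can differ somewhat from this normal approximation.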

15 Two Proportions with paired data
Here each pair $(X_{1j}, X_{2j})$ can be either (1, 1), (1, 0), (0, 1), or (0, 0). As an example, if n voters are asked, both before and after a presidential speech, whether or not they support a certain policy, then $X_{1j} = 1$ or 0 according to whether the jth voter supports the policy or not before the speech, and $X_{2j} = 1$ or 0 according to whether the same voter supports it or not after the speech. Typically, however, the pairs $(X_{1j}, X_{2j})$ are not given. Instead the data are presented in the following table format.

16
                After
                1      0
Before   1      Y_1    Y_2
         0      Y_3    Y_4
$Y_1$ is the number of (1, 1) pairs, $Y_2$ is the number of (1, 0) pairs, $Y_3$ is the number of (0, 1) pairs, $Y_4$ is the number of (0, 0) pairs, and $Y_1 + Y_2 + Y_3 + Y_4 = n$.

17 A variation of the T statistic, used only for testing $H_0 : p_1 - p_2 = 0$, is the McNemar test statistic:
$$MN = \frac{Y_2 - Y_3}{\sqrt{Y_2 + Y_3}}.$$
This is referred to N(0,1), so the RR for $H_a : p_1 > p_2$ is $MN > z_\alpha$, and similarly for the other $H_a$. R uses the square of MN and refers it to a $\chi^2_1$ distribution. In this form only $H_a : p_1 \ne p_2$ can be tested, with p-value 1-pchisq(MN^2, 1).

18 Example (McNemar's test)
Data on approval of the President's performance in office in two surveys, one month apart, of 1600 voting-age Americans, give $Y_1 = 794$, $Y_2 = 150$, $Y_3 = 86$, $Y_4 = 570$. Is there evidence, at α = 0.05, of a shift in public opinion? Report the p-value.
Solution. Here $MN = (150 - 86)/\sqrt{150 + 86} = 4.166$. Since $|MN| > z_{0.025} = 1.96$, we conclude that there is evidence of a shift in public opinion. The R command 2*(1-pnorm(4.166)) returns a p-value of 3.10e-05.
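The numbers of this example can be checked against R's built-in mcnemar.test, which works with $MN^2$. The sketch below assumes the table layout of slide 16 (rows for before, columns for after) and switches off the continuity correction so that the statistic matches $MN^2$ exactly.

```r
# 2x2 table in the layout of slide 16: rows = before, columns = after
y <- matrix(c(794, 150,
               86, 570),
            nrow = 2, byrow = TRUE,
            dimnames = list(Before = c("1", "0"), After = c("1", "0")))

# McNemar statistic in its Z form and the two-sided normal p-value
mn     <- (y[1, 2] - y[2, 1]) / sqrt(y[1, 2] + y[2, 1])
p_norm <- 2 * (1 - pnorm(abs(mn)))

# Built-in chi-square form; correct = FALSE drops the continuity correction
res <- mcnemar.test(y, correct = FALSE)

print(c(MN = mn, MN_squared = mn^2, chisq = unname(res$statistic)))
print(c(p_norm = p_norm, p_chisq = res$p.value))
```

By default mcnemar.test applies a continuity correction, so calling it without correct = FALSE gives a slightly smaller statistic.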

19 Levene's Test
It is based on the idea that if the variances are equal, then
$$V_{1j} = |X_{1j} - \tilde X_1|, \ j = 1, \ldots, n_1, \quad \text{and} \quad V_{2j} = |X_{2j} - \tilde X_2|, \ j = 1, \ldots, n_2,$$
where $\tilde X_i$, $i = 1, 2$, is the median of the ith sample, correspond to populations with equal means and variances. Thus, equality of variances can be tested by testing the hypothesis $H_0 : \mu_{V_1} = \mu_{V_2}$ vs $H_a : \mu_{V_1} \ne \mu_{V_2}$ using the two-sample t-test with pooled variance.

20 Example
The plasma vitamin C concentrations (µmol/l) of five randomly selected smokers and five nonsmokers are recorded, with sample standard deviations $s_1$ (nonsmokers) and $s_2$ (smokers). Test $H_0 : \sigma_1^2 = \sigma_2^2$ vs $H_a : \sigma_1^2 \ne \sigma_2^2$ at level α.
Solution. Here the sample medians are $\tilde X_1 = 41.68$ and $\tilde X_2$, which give the $V_1$ values for nonsmokers and the $V_2$ values for smokers that are entered in x and y below. The R commands
x=c(0.20,0.03,0.30,0.00,0.50); y=c(0.26,0.00,0.17,0.05,0.23); t.test(x, y, var.equal=TRUE)
give a p-value above α. Thus, $H_0$ is not rejected.
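A compact way to run this procedure on raw data is to wrap the two steps (absolute deviations from the median, then a pooled-variance t-test) in a function. The sketch below does that and also runs the t-test on the V values quoted in the example; the function name levene_t and the raw samples in the last call are hypothetical.

```r
# Levene's test as described on slide 19: pooled t-test on |X - median|
levene_t <- function(s1, s2) {
  v1 <- abs(s1 - median(s1))
  v2 <- abs(s2 - median(s2))
  t.test(v1, v2, var.equal = TRUE)
}

# V values quoted in the example (nonsmokers in x, smokers in y)
x <- c(0.20, 0.03, 0.30, 0.00, 0.50)
y <- c(0.26, 0.00, 0.17, 0.05, 0.23)
res <- t.test(x, y, var.equal = TRUE)
print(res$p.value)

# The wrapper applied to made-up raw samples (illustration only)
demo <- levene_t(c(1, 2, 3, 4, 10), c(2, 3, 4, 5, 6))
print(demo$p.value)
```

Other implementations of Levene's test exist (for instance ANOVA-based versions in add-on packages); this sketch follows only the t-test formulation given here.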

21 The F Test Under Normality
When the two samples have been drawn from normal populations, the exact distribution of $S_1^2/S_2^2$ is a multiple of an F distribution.
Theorem. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from a normal distribution with variance $\sigma_1^2$, let $X_{21}, \ldots, X_{2n_2}$ be another sample from a normal distribution with variance $\sigma_2^2$, and let $S_1^2$ and $S_2^2$ denote the two sample variances. Then the rv
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

22 The test statistic for $H_0 : \sigma_1^2 = \sigma_2^2$ is
$$F_{H_0} = \frac{S_1^2}{S_2^2}.$$
If the ratio differs sufficiently from 1, the null hypothesis is rejected. In particular, the RRs for testing $H_0 : \sigma_1^2 = \sigma_2^2$ are:
H_a                              RR at level α
$\sigma_1^2 > \sigma_2^2$        $F_{H_0} > F_{n_1-1,\,n_2-1;\,\alpha}$
$\sigma_1^2 < \sigma_2^2$        $1/F_{H_0} > F_{n_2-1,\,n_1-1;\,\alpha}$
$\sigma_1^2 \ne \sigma_2^2$      either $F_{H_0} > F_{n_1-1,\,n_2-1;\,\alpha/2}$ or $1/F_{H_0} > F_{n_2-1,\,n_1-1;\,\alpha/2}$

23 Example
Consider the data in the previous example, and assume the underlying populations are normal. The test statistic is $F_{H_0} = S_1^2/S_2^2$. By the formula for the p-value on p. 333, the p-value, found with 2*(1-pf(2.4, 3, 3)), is 0.49.
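The F test can also be run with the built-in var.test command. The sketch below compares a by-hand calculation with var.test on two made-up samples (the original measurements are not reproduced here), and evaluates the pf command quoted in the example.

```r
# Hypothetical samples (illustration only, not the vitamin C data)
s1 <- c(41.2, 43.9, 38.5, 45.1, 40.3)
s2 <- c(40.8, 41.5, 42.1, 40.2, 41.9)

# By hand: ratio of sample variances and two-sided p-value
f   <- var(s1) / var(s2)
df1 <- length(s1) - 1
df2 <- length(s2) - 1
p_two <- 2 * min(pf(f, df1, df2), 1 - pf(f, df1, df2))

# Built-in F test for equality of two variances
res <- var.test(s1, s2)
print(c(F_by_hand = f, F_var.test = unname(res$statistic)))
print(c(p_by_hand = p_two, p_var.test = res$p.value))

# The command quoted in the example
p_slide <- 2 * (1 - pf(2.4, 3, 3))
print(round(p_slide, 2))
```

var.test also reports a confidence interval for the ratio $\sigma_1^2/\sigma_2^2$, which is not discussed here.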

25 The R commands
If y and x contain the values of the response and the predictor, the basic commands for testing in regression are:
out=lm(y~x); summary(out); summary(aov(out))
summary(out) gives the estimated regression coefficients and their standard errors, the p-values for testing that each coefficient is zero, $R^2$, and also the F-test statistic and p-value for the model utility test. summary(aov(out)) gives the ANOVA table.

26 Illustration with Simulated Data
e=rnorm(50,0,5); x=runif(50,0,10); y=25-3.4*x+e; out=lm(y~x)
For the data generated, the summary(out) output includes:
Coefficients: Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept) and x, with Pr(>|t|) <2e-16 in both rows
Residual standard error on 48 degrees of freedom
Multiple R-squared and Adjusted R-squared
F-statistic on 1 and 48 DF, p-value: < 2.2e-16

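27 Moreover, the summary(aov(out)) output includes the ANOVA table, with columns Df, Sum Sq, Mean Sq, F value, and Pr(>F), a row for x (with Pr(>F) <2e-16), and a row for Residuals.
The standard errors of the coefficients in the summary(out) output can be used for computing T statistics for other hypotheses regarding them. For example, the T statistic for testing $H_0 : \beta_1 = -3.4$ vs $H_a : \beta_1 \ne -3.4$ is $T_{H_0} = (\hat\beta_1 + 3.4)/S_{\hat\beta_1}$, which for the data generated has absolute value 1.253, with corresponding p-value $2(1 - G_{48}(1.253))$, where $G_{48}$ is the cdf of the $T_{48}$ distribution.
qqnorm(resid(out)); qqline(resid(out), col=2) can be used to check the normality assumption.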
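The T statistic for a non-zero null value of the slope can be assembled from the coefficient table. The sketch below regenerates simulated data as on slide 26; the seed is an arbitrary choice made here for reproducibility, so the numbers will not match those on the slides.

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible
e <- rnorm(50, 0, 5); x <- runif(50, 0, 10); y <- 25 - 3.4 * x + e
out <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(out)$coefficients
b1  <- coefs["x", "Estimate"]
se1 <- coefs["x", "Std. Error"]

# T statistic and two-sided p-value for H0: beta_1 = -3.4
t_stat  <- (b1 - (-3.4)) / se1
p_value <- 2 * (1 - pt(abs(t_stat), df = out$df.residual))
print(c(estimate = b1, t_stat = t_stat, p_value = p_value))
```

Here pt(., df = 48) plays the role of $G_{48}$, and out$df.residual supplies the 48 degrees of freedom.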

29 T-tests and T-intervals for one mean
Let x contain the data set. By default, the command t.test(x), which is equivalent to
t.test(x, mu=0, alternative="two.sided", conf.level=0.95)
gives the t-statistic, the df, the p-value for testing $H_0 : \mu = 0$ against the two-sided alternative, the 95% CI for µ, and $\bar X$. To test $H_0 : \mu = 8.5$, replace mu=0 by mu=8.5. For one-sided alternatives, use alternative="less" or alternative="greater". Note, however, that the CIs are then one-sided.
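The effect of the alternative argument on the reported CI can be seen in a short sketch. The ten data values are made up for the illustration.

```r
# Hypothetical sample (illustration only)
x <- c(8.9, 9.4, 8.1, 8.7, 9.0, 8.3, 9.6, 8.8, 8.5, 9.1)

# Two-sided test of H0: mu = 8.5 with a two-sided 95% CI
two <- t.test(x, mu = 8.5)

# One-sided test (Ha: mu > 8.5); the CI becomes a lower confidence bound
one <- t.test(x, mu = 8.5, alternative = "greater")

print(two$conf.int)   # finite lower and upper endpoints
print(one$conf.int)   # upper endpoint is Inf
print(c(two_sided_p = two$p.value, one_sided_p = one$p.value))
```

Because the sample mean exceeds 8.5 here, the one-sided p-value is half the two-sided one.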

30 Example
Is there evidence that the average level of radiation is higher than the federal health standard of 10 W/cm²? Use the data in ExRadiationTestData.txt to test at α = 0.05. Also, report the p-value, and construct a 95% CI.
Solution. Reading the data set into the R object x, the command t.test(x, mu=10, alternative="greater") returns a p-value larger than α. Thus, $H_0 : \mu = 10$ cannot be rejected in favor of $H_a : \mu > 10$ at α = 0.05. Next use t.test(x, mu=10, alternative="two.sided") to get a two-sided 95% CI, whose lower endpoint is 9.773.

31 Power and sample size calculations for $H_0 : \mu = \mu_0$
First one needs to install the package pwr using the command install.packages("pwr"). Then issue the command library(pwr) to load the package in the current R session. The command for computing the power at a given $\mu_a$ with given n, α and S values, for $H_a : \mu > \mu_0$, is
pwr.t.test(n, (µa − µ0)/S, α, power=NULL, "one.sample", "greater")
where n, (µa − µ0)/S, and α are replaced by their numerical values. For $H_a : \mu < \mu_0$ and $H_a : \mu \ne \mu_0$ replace "greater" by "less" and "two.sided", respectively.

32 Example
For the testing problem $H_0 : \mu = 10$ vs $H_a : \mu > 10$ with the ExRadiationTestData.txt data set, find the power at $\mu_a = 11$.
Solution. The commands length(x); sd(x) return n = 25 and S = 2.00 for this data set. The R command pwr.t.test(25, (11-10)/2.00, 0.05, power=NULL, "one.sample", "greater") returns the power.
NOTE: Treating S as the true σ, the command 1-pnorm((10-11)/(2.00/sqrt(25)) + qnorm(0.95)) returns a power of 0.80, according to the formula in the teaching slides.

33 The command for computing the sample size needed to achieve a certain level of power at $\mu_a$ with given α and S values, for $H_a : \mu > \mu_0$, is
pwr.t.test(n=NULL, (µa − µ0)/S, α, power(µa), "one.sample", "greater")
where (µa − µ0)/S, α, and the desired power at µa are replaced by their numerical values. For $H_a : \mu < \mu_0$ and $H_a : \mu \ne \mu_0$ replace "greater" by "less" and "two.sided", respectively.

34 Example
For the testing problem $H_0 : \mu = 10$ vs $H_a : \mu > 10$ with the ExRadiationTestData.txt data set, find the sample size needed to achieve power of 0.9 at $\mu_a = 11$.
Solution. The R command pwr.t.test(n=NULL, (11-10)/2.00, 0.05, 0.9, "one.sample", "greater") returns a sample size of 35.65, which is rounded up to 36.
NOTE: Treating S as the true σ, the command (2.00*(qnorm(.95)+qnorm(.9))/(10-11))^2 returns a sample size of 34.26, which is rounded up to 35, according to the formula in the teaching slides.
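The power and sample-size calculations of the last two examples can be reproduced without installing pwr. The sketch below evaluates the normal-approximation formulas quoted in the NOTEs and, as a swapped-in alternative to pwr.t.test, uses power.t.test from base R's stats package, which is also based on the noncentral t distribution. The variable names are illustrative.

```r
n <- 25; s <- 2.00; mu0 <- 10; mua <- 11; alpha <- 0.05

# Normal-approximation formulas from the NOTEs (S treated as the true sigma)
power_z <- 1 - pnorm((mu0 - mua) / (s / sqrt(n)) + qnorm(1 - alpha))
n_z     <- (s * (qnorm(1 - alpha) + qnorm(0.9)) / (mu0 - mua))^2

# t-based calculations with base R's power.t.test (used here instead of pwr)
power_t <- power.t.test(n = n, delta = mua - mu0, sd = s, sig.level = alpha,
                        type = "one.sample", alternative = "one.sided")$power
n_t     <- power.t.test(delta = mua - mu0, sd = s, sig.level = alpha,
                        power = 0.9, type = "one.sample",
                        alternative = "one.sided")$n

print(c(power_z = power_z, power_t = power_t))
print(c(n_z = ceiling(n_z), n_t = ceiling(n_t)))
```

The t-based power is somewhat lower, and the t-based sample size somewhat larger, than the normal-approximation values because the t calculation accounts for estimating σ.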

35 Two independent samples
The t.test command can also be used for comparing two means, both with independent and with paired data. The two samples can be in two separate columns (i.e., x and y), or combined in one column, say y, with a separate column, say x, indicating the sample membership of each observation. The default is to treat the two samples as independent, construct a 95% CI, and give the p-value for $H_a : \mu_1 - \mu_2 \ne 0$, without assuming $\sigma_1 = \sigma_2$. The commands with these default options are:
t.test(x, y) # One sample in x, the other in y
t.test(y~x) # For values in y and sample index in x

36 For the pooled variance T test and a 99% CI do:
t.test(y~x, var.equal = TRUE, conf.level = 0.99)
and similarly if the two samples are in separate columns. To test a different null hypothesis, e.g., $H_0 : \mu_1 - \mu_2 = 1.8$ vs $H_a : \mu_1 - \mu_2 < 1.8$, do:
t.test(y~x, mu=1.8, alternative = "less")
and similarly if the two samples are in separate columns. Other options are alternative = "greater", or the default "two.sided".

37 Example
Use the R data set airquality to compare the ozone levels in May and August. Report the p-value, test at 0.05, and construct a 95% CI for $\mu_1 - \mu_2$, with and without the assumption of equal variances. [NOTE: Normality is violated; check with boxplot(Ozone~Month, data = airquality).]
Solution: Use:
y1=airquality$Ozone; x1=airquality$Month
x=y1[which(x1==5)]; y=y1[which(x1==8)]; t.test(x, y); t.test(x, y, var.equal = TRUE)
More advanced application:
t.test(Ozone~Month, data = airquality, subset = Month %in% c(5, 8))
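A runnable version of this comparison is sketched below. Note that R is case-sensitive, so the columns must be referred to as Ozone and Month; the object names may, aug, welch, pooled and by_formula are illustrative.

```r
# Ozone readings for May (Month == 5) and August (Month == 8);
# missing values (NA) are dropped automatically by t.test
may <- airquality$Ozone[airquality$Month == 5]
aug <- airquality$Ozone[airquality$Month == 8]

welch  <- t.test(may, aug)                     # no equal-variance assumption
pooled <- t.test(may, aug, var.equal = TRUE)   # pooled-variance T test

# The same Welch test through the formula interface
by_formula <- t.test(Ozone ~ Month, data = airquality,
                     subset = Month %in% c(5, 8))

print(c(welch_p = welch$p.value, pooled_p = pooled$p.value))
print(welch$conf.int)    # 95% CI for mu_1 - mu_2 (May minus August)
print(pooled$conf.int)
```

The two forms of the call give the same Welch test; the formula version simply selects the two months through the subset argument.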

38 The basic command for testing and CI construction with paired data is
t.test(y~x, paired = TRUE)
and similarly if the two samples are in different columns. Other options can be added as before. For example,
t.test(y~x, alternative = "less", mu = 1.8, paired = TRUE, conf.level = 0.9)
where alternative can be any one of "two.sided", "less", or "greater". With paired data, equality of the two marginal variances is a non-issue, so you never need to use var.equal=TRUE.

39 Example
Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in motorcycletireslifetimes.txt are in km. Use the paired T-test procedure to test the hypothesis of equal average durability at level α = 0.05, and to construct a 90% CI for $\mu_1 - \mu_2$.
Solution: Read the data in tl and use:
x=tl$brand1; y=tl$brand2; t.test(x, y, paired=TRUE, conf.level=0.9) # set x and y and construct the test and CIs

41 The Rank Sum Test
The wilcox.test command can be used to conduct both the rank-sum test and the signed-rank test. Again, the two samples can be in two separate columns, or combined in one column with a separate column indicating the sample membership of each observation. The default is to treat the two samples as independent, and give the p-value for testing equality of the two populations against the two-sided alternative, without constructing a CI:
wilcox.test(x, y) # One sample in x, the other in y
wilcox.test(y~x) # For values in y and sample index in x

42 To get a CI for the location difference use:
wilcox.test(y~x, conf.int = TRUE, conf.level = 0.9)
[The description of this CI is not in the book.] To test different null and alternative hypotheses use:
wilcox.test(y~x, mu=1.8, alternative = "less")
with alternative = "greater" for the opposite one-sided alternative. Similarly if the two samples are in different columns.

43 Example
Use the R data set airquality to compare the ozone levels in May and August. [Check the data set with boxplot(Ozone~Month, data = airquality).]
Solution: Use
y1=airquality$Ozone; x1=airquality$Month
x=y1[which(x1==5)]; y=y1[which(x1==8)]; wilcox.test(x, y, conf.int = TRUE)
More advanced application:
wilcox.test(Ozone~Month, data = airquality, subset = Month %in% c(5, 8))
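The rank-sum comparison of the two months can be run as sketched below. Because the ozone readings contain ties, the exact null distribution is not available; exact = FALSE is passed here only to request the normal approximation explicitly and avoid the corresponding warnings. Object names are illustrative.

```r
# Ozone readings for May and August; NAs are removed by wilcox.test
may <- airquality$Ozone[airquality$Month == 5]
aug <- airquality$Ozone[airquality$Month == 8]

# Rank-sum test with a CI for the location difference (May minus August)
rs <- wilcox.test(may, aug, conf.int = TRUE, exact = FALSE)

# The same test through the formula interface
rs_f <- wilcox.test(Ozone ~ Month, data = airquality,
                    subset = Month %in% c(5, 8), exact = FALSE)

print(c(p_value = rs$p.value, p_value_formula = rs_f$p.value))
print(rs$conf.int)
```

Unlike the t-test of the previous example, this procedure does not rely on normality, which the boxplots suggest is violated for these data.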

44 Rank sum for paired data (Signed-Rank Test)
The basic command for the signed-rank test with paired data (without constructing a CI) is:
wilcox.test(x, y, paired = TRUE) # One sample in x, the other in y
wilcox.test(y~x, paired = TRUE) # For values in y and sample index in x
Other options can be added as before. For example,
wilcox.test(x, y, alternative = "less", mu = 1.8, paired = TRUE, conf.int = TRUE, conf.level = 0.9)
with alternative = "greater" for the opposite one-sided alternative.

45 Example
Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in 401/Data/motorcycleTiresLifetimes.txt are in km. Use the signed-rank test procedure to test the hypothesis of equal durability at level α = 0.05, and to construct a 90% CI for the location difference.
Solution: Read the data in tl and use:
x=tl$brand1; y=tl$brand2; wilcox.test(x, y, paired=TRUE, conf.int = TRUE, conf.level=0.9) # set x and y and construct the test and CIs

47 Set the number of successes and the number of trials in x and n. For example, use x=c(16,14); n=c(200,400) if $X_1 = 16$, $X_2 = 14$, $n_1 = 200$, $n_2 = 400$. To test $H_0 : p_1 - p_2 = 0$ vs the two-sided alternative, and construct a 95% CI for $p_1 - p_2$, use prop.test(x, n), or, equivalently:
prop.test(x, n, alternative = "two.sided", conf.level = 0.95)
Other alternative options are "less" or "greater". There is no option for testing other null hypotheses, e.g., $H_0 : p_1 - p_2 = 0.1$.

48 Example
An article in Knee Surgery, Sports Traumatology, Arthroscopy (2005), Vol. 13, reported results of arthroscopic meniscal repair with an absorbable screw. For tears greater than 25 millimeters, 10 of 18 repairs were successful, while for tears less than 25 millimeters, 22 of 30 were successful. Is there evidence that the success rates for the two types of tears are different? Test at α = 0.1, report the p-value, and construct a 90% confidence interval for $p_1 - p_2$.
Solution: Use
x=c(10,22); n=c(18,30); prop.test(x, n, conf.level = 0.9)
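The solution command can be run as is. The sketch below also computes the pooled two-proportion Z statistic by hand and compares its square with the chi-square statistic that prop.test reports when the continuity correction is switched off (by default prop.test applies the correction). The object names are illustrative.

```r
x <- c(10, 22); n <- c(18, 30)

# Default call from the solution (with continuity correction)
res <- prop.test(x, n, conf.level = 0.9)

# Pooled two-proportion Z statistic by hand
p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z      <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Without the continuity correction, the chi-square statistic equals z^2
res_nc <- prop.test(x, n, conf.level = 0.9, correct = FALSE)

print(c(z = z, z_squared = z^2, chisq_nc = unname(res_nc$statistic)))
print(c(p_corrected = res$p.value, p_uncorrected = res_nc$p.value))
print(res$conf.int)   # 90% CI for p_1 - p_2
```

Both p-values exceed 0.1, so with either version the hypothesis of equal success rates is not rejected at α = 0.1.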
