8 Comparing Two Independent Populations

Joy Mary Bishop
6 years ago

We'll study these methods for comparing two independent populations:

1. The Two-Sample T-Test (Normal with Equal Variances)
2. The Welch T-Test (Normal)
3. Bootstrap for Two Samples
4. The Wilcoxon Rank Sum or Mann-Whitney Test
5. Comparing Two Population Proportions

8.1 The Two-Sample T-Test (Normal with Equal but Unknown Variances)

e.g. The horned lizard has spikes on its head that may protect against its primary predator, the loggerhead shrike. Researchers wanted to compare dead lizards killed by shrikes with live lizards from the same area. An SRS was taken from each population. The longest spike was measured, in mm. Is there a difference in longest spike length across the two populations? Here are the data:

Dead: 7.65, 20.83, 24.59, 8.52, 2.40, 23.78, 20.36, 8.83, 2.83,
Live: 23.76, 2.7, 26.3, 20.8, 23.0, 24.84, 9.34, 24.94, 27.4, 25.87, 8.95, 22.6

Start with graphical and numerical summaries:

[Figure: density histograms of Longest Spike Length (mm), one panel each for Dead and Live]

Group   n      $\bar{x}$   s
Dead    ____   ____        ____
Live    ____   ____        ____

The summaries show ____. Is it ____, or just the result of ____? The shift of sample means matters only ____ of the sample data.

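Compare two population means, $\mu_1$ and $\mu_2$, by studying their ____.

Notation:

                  Population 1    Population 2
Variable          $X_1$           $X_2$
Mean              $\mu_1$         $\mu_2$
Variance          $\sigma_1^2$    $\sigma_2^2$
Sample size       $n_1$           $n_2$
Sample mean       $\bar{X}_1$     $\bar{X}_2$
Sample variance   $s_1^2$         $s_2^2$

For inference about $\mu_1 - \mu_2$, use the statistic $\bar{X}_1 - \bar{X}_2$, and then test $H_0: \mu_1 - \mu_2 = \delta_0$ ($\delta_0 = 0 \implies \mu_1 = \mu_2$) or find a confidence interval for $\mu_1 - \mu_2$. To do this, we need the distribution of $\bar{X}_1 - \bar{X}_2$.

Recall, for independent $X$ and $Y$:

$$E(X - Y) = E(X) - E(Y), \qquad VAR(X - Y) = VAR(X) + VAR(Y)$$
$$E(\bar{X}) = \mu, \qquad VAR(\bar{X}) = \sigma^2 / n$$

If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, then $X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$. For normal $X$, $\bar{X} \sim N(\mu, \sigma^2/n)$. It follows that, for normal populations 1 and 2,

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$

But we don't know $\sigma_1^2$ or $\sigma_2^2$. Here we assume they are equal and calculate a pooled variance estimate:

$$s_p^2 = \frac{\sum_{i=1}^{n_1} (X_{1,i} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{2,i} - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

Now we can state a test and a confidence interval. Many hypothesis tests use test statistics of the form

$$\frac{(\text{point estimate}) - (\text{parameter value})}{(\text{estimated or true}) \text{ standard deviation of point estimate}}$$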

This ratio tells how far the estimate is from the parameter, in standard deviations. For our difference of two means, this is

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} \sim t_{n_1 + n_2 - 2}$$

Recall that many confidence intervals have the form

(point estimate) ± (margin of error)
= (point estimate) ± (table value for confidence) [standard deviation of point estimate]
= $\hat{\theta} \pm (\text{table value for confidence})\, \sigma_{\hat{\theta}}$

Our $100\%(1 - \alpha)$ confidence interval for $\mu_1 - \mu_2$, assuming normal populations and equal population variances, is

$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2,\ \alpha/2} \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

e.g. Test whether the lizard populations have the same population mean spike length and find a confidence interval for the difference in mean lengths.

Check normality:

[Figure: normal QQ plots, "QQ Plot of Dead" and "QQ Plot of Live" (Sample Quantiles vs. Theoretical Quantiles)]

Check equal variances from the original plots, above, or from a rule of thumb: it's plausible that the population variances are equal if the larger sample variance is less than twice the smaller.

e.g. Our sample variances are ____ and ____, so we can ____.

e.g. Calculate a test and interval for the lizard spikes: ____

Summary: Suppose we have independent simple random samples from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, where $\sigma_1^2 = \sigma_2^2$. To test $H_0: \mu_1 - \mu_2 = \delta_0$,

1. State null and alternative hypotheses, $H_0$ and $H_A$
2. Check assumptions (rule of thumb: $\sigma_1^2 = \sigma_2^2$ is plausible if $s_1^2$ and $s_2^2$ are within a factor of 2)
3. Find the pooled variance estimate $s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
4. Find the test statistic $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$
5. Find the p-value, which is an area under the $t_{n_1 + n_2 - 2}$ curve depending on $H_A$:
   $H_A: \mu_1 - \mu_2 > \delta_0 \implies$ p-value $= P(T > t)$, the area right of $t$
   $H_A: \mu_1 - \mu_2 < \delta_0 \implies$ p-value $= P(T < t)$, the area left of $t$
   $H_A: \mu_1 - \mu_2 \neq \delta_0 \implies$ p-value $= P(|T| > |t|)$, the sum of the two tail areas
6. Draw a conclusion

A $100\%(1 - \alpha)$ confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2,\ \alpha/2} \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}.$$

Note: I recommend Welch's t-test, below, instead of the two-sample t-test, above. We introduced the equal-variances two-sample t-test to see its $s^2_{pooled}$, which has the form

$$\frac{\text{sum of squared deviations from respective sample means}}{\text{degrees of freedom}},$$

which we'll see again in §10 on ANOVA.
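The summary steps for the equal-variances test can be sketched in R as follows. This is a minimal illustration, not the notes' own code: the function name `pooled_t` and the two small data vectors are hypothetical, chosen only so the block runs on its own. It covers the two-sided alternative and compares the hand calculation with R's built-in `t.test(..., var.equal = TRUE)`.

```r
# Hypothetical data for illustration only (not from the notes).
x1 <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7)
x2 <- c(6.2, 6.8, 5.9, 7.1, 6.5)

# pooled_t is a hypothetical helper implementing steps 3-5 and the interval
# for the two-sided alternative H_A: mu1 - mu2 != delta0.
pooled_t <- function(x1, x2, delta0 = 0, conf = 0.95) {
  n1 <- length(x1)
  n2 <- length(x2)
  df <- n1 + n2 - 2
  # Rule of thumb for step 2: larger sample variance under twice the smaller
  var_ratio <- max(var(x1), var(x2)) / min(var(x1), var(x2))
  # Step 3: pooled variance estimate
  sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / df
  # Step 4: test statistic
  se <- sqrt(sp2 / n1 + sp2 / n2)
  t_stat <- ((mean(x1) - mean(x2)) - delta0) / se
  # Step 5: two-sided p-value, the sum of the two tail areas
  p_value <- 2 * pt(-abs(t_stat), df)
  # Confidence interval for mu1 - mu2
  ci <- (mean(x1) - mean(x2)) + c(-1, 1) * qt(1 - (1 - conf) / 2, df) * se
  list(var_ratio = var_ratio, sp2 = sp2, t = t_stat, df = df,
       p.value = p_value, conf.int = ci)
}

res <- pooled_t(x1, x2)
check <- t.test(x1, x2, var.equal = TRUE)  # built-in equal-variances test
print(res)
print(check)
```

The hand-computed `t`, `df`, p-value, and interval should agree with the `t.test` output; for a one-sided alternative, replace the p-value line with the appropriate single tail area.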

8.2 The Welch T-Test (Normal without Assuming Equal Variances)

e.g. Concrete is often reinforced with steel rebar ("reinforcing bar"). Steel is strong, but tends to corrode over time. An experiment tested two corrosion-resistant materials, one fiberglass and the other carbon. Eight concrete beams with fiberglass reinforcement, and 11 with carbon reinforcement, were poured. Each was subjected to a load test, with the breaking force measured in kN (kilonewtons):

Fiberglass: 37.3, 29.6, 33.4, 33.6, 30.7, 32.7, 34.6, 32.3
Carbon: 48.8, 38.0, 42.2, 45.1, 33.8, 47.2, 50.6, 44.0, 43.9, 40.4, 45.8

Is there a difference in the (population) mean strengths of the two types of beams? We test:

$H_0: \mu_{carbon} - \mu_{fiber} =$ ____ vs. $H_A: \mu_{carbon} - \mu_{fiber}$ ____

First, make graphical and numerical summaries.

Beam Type   Sample Size   Mean   SD
Fiber       ____          ____   ____
Carbon      ____          ____   ____

[Figure: density histograms of concrete breaking force (kN), one panel each for fiberglass and carbon]

These summaries suggest ____. Let's test this. Is it plausible the two populations are ____? Here are QQ plots:

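[Figure: two normal QQ plots of the beam samples (Sample Quantiles vs. Theoretical Quantiles)]

We'll assume ____ populations. The first graph, and our rule of thumb (____), suggest ____ in the Carbon group, so we ____ assume equal variances.

Suppose, then, that we have independent simple random samples from two normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, which ____ be equal. Recall (from §8.1, above) that

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

We could standardize $\bar{X}_1 - \bar{X}_2$ as

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}},$$

but we don't know $\sigma_1^2$ or $\sigma_2^2$. We approximate them with $s_1^2$ and $s_2^2$, and then get a $t$ distribution instead of a normal. (Recall that the $t_\nu$ distributions look like ____, but are ____ with ____.) Experts say

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_\nu \ (\text{____}), \quad \text{where } \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}, \text{ rounded down.}$$

Now we can state a test and interval if we recall the common test statistic form,

$$\frac{(\text{point estimate}) - (\text{parameter value})}{(\text{estimated or true}) \text{ standard deviation of point estimate}},$$

which tells how far the estimate is from the parameter, in standard deviations, and the common confidence interval form,

(point estimate) ± (margin of error) = $\hat{\theta} \pm (\text{table value for confidence})\, \sigma_{\hat{\theta}}$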

Suppose we have independent simple random samples from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. To test $H_0: \mu_1 - \mu_2 = \delta_0$,

1. State null and alternative hypotheses, $H_0$ and $H_A$
2. Check assumptions
3. Find the test statistic $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
4. Find the degrees of freedom, $\nu = \dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$, rounded down
5. Find the p-value, which is an area under the $t_\nu$ curve depending on $H_A$:
   $H_A: \mu_1 - \mu_2 > \delta_0 \implies$ p-value $= P(T_\nu > t)$, the area right of $t$
   $H_A: \mu_1 - \mu_2 < \delta_0 \implies$ p-value $= P(T_\nu < t)$, the area left of $t$
   $H_A: \mu_1 - \mu_2 \neq \delta_0 \implies$ p-value $= P(|T_\nu| > |t|)$, the sum of the two tail areas
6. Draw a conclusion

The interval $(\bar{X}_1 - \bar{X}_2) \pm t_{\nu,\ \alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ contains $\mu_1 - \mu_2$ for a proportion $1 - \alpha$ of samples.

Note that these formulas are like the §8.1 formulas, except that the estimated ____ and ____ changed.

e.g. Test $H_0: \mu_{carbon} - \mu_{fiber} =$ ____ vs. $H_A: \mu_{carbon} - \mu_{fiber}$ ____.

t = ____
ν = ____
p-value = ____
conclusion: ____
95% interval for $\mu_{carbon} - \mu_{fiber}$: ____
Compare two-sided test and interval: ____
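The Welch steps above can be carried out in R for the beam example. This is a sketch under two stated assumptions: the alternative is taken to be two-sided (matching the question "Is there a difference"), and the carbon value printed as "45." in the data listing is read as 45.1. The variable names are illustrative. R's built-in `t.test` performs Welch's test by default, but with the unrounded degrees of freedom, so its p-value and interval can differ slightly from the rounded-down version in the steps.

```r
fiber  <- c(37.3, 29.6, 33.4, 33.6, 30.7, 32.7, 34.6, 32.3)
# 45.1 is an assumed reading of the value printed as "45." in the listing
carbon <- c(48.8, 38.0, 42.2, 45.1, 33.8, 47.2, 50.6, 44.0, 43.9, 40.4, 45.8)

n1 <- length(carbon)
n2 <- length(fiber)
v1 <- var(carbon) / n1   # s1^2 / n1
v2 <- var(fiber) / n2    # s2^2 / n2

# Step 3: test statistic for H0: mu_carbon - mu_fiber = 0
t_stat <- (mean(carbon) - mean(fiber)) / sqrt(v1 + v2)

# Step 4: degrees of freedom, then rounded down as in the notes
nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
nu_down <- floor(nu)

# Step 5: two-sided p-value under t with nu_down degrees of freedom
p_value <- 2 * pt(-abs(t_stat), nu_down)

# 95% confidence interval for mu_carbon - mu_fiber
ci <- (mean(carbon) - mean(fiber)) + c(-1, 1) * qt(0.975, nu_down) * sqrt(v1 + v2)

welch <- t.test(carbon, fiber)  # built-in Welch test, unrounded nu
print(c(t = t_stat, nu = nu, nu_down = nu_down, p.value = p_value))
print(ci)
print(welch)
```

Rounding ν down makes the hand-computed p-value and interval slightly more conservative than the `t.test` output.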

To decide between the §8.1 two-sample t-test and this §8.2 Welch's t-test, consider ____:

If population variances are equal, but are not assumed to be equal (so Welch's test is used), the test loses a little ____, but is still a good test.

If population variances are different, but are assumed equal (so the two-sample t-test is used), the test can make ____ conclusions.

8.3 Bootstrap for Two Samples

For populations that may not be ____, we can do a bootstrap test or interval for a difference of two means. It uses the Welch's $t_{obs}$ and resamples from the two samples many times, each time finding

$$\hat{t} = \frac{(\bar{x}_1^* - \bar{x}_2^*) - (\bar{x}_1 - \bar{x}_2)}{\sqrt{\frac{s_1^{*2}}{n_1} + \frac{s_2^{*2}}{n_2}}},$$

to estimate the population distribution of $t$.

e.g. When sage crickets mate, the male allows the female to eat part of his hind wings. Does female hunger influence desire to mate? An experiment randomly assigned 24 females to two groups. One group of 11 was starved for two days, while the other group of 13 was fed normally. Each female was presented with a male and the time to mating (in hours) was recorded. Do starved females have a different mean time to mating than normally fed females? Here are the data:

Starved: 1.9, 2.1, 3.8, 9.0, 9.6, 13.0, 14.7, 17.9, 21.7, 29.0, 72.3
Fed: 1.5, 1.7, 2.4, 3.6, 5.7, 22.6, 22.8, 39.0, 54.4, 72.1, 73.6, 79.5, 88.9

We test: $H_0: \mu_{starved} - \mu_{fed} =$ ____ vs. $H_A: \mu_{starved} - \mu_{fed}$ ____

Start with summaries:

Group     Sample Size   Mean   SD
Starved   ____          ____   ____
Fed       ____          ____   ____

(So far, we might consider a ____, as the variances seem ____.)

[Figure: dot plots and histograms (Percent of Total) of Time (hrs), one panel each for Fed and Starved]

Note that the fed times ____, while the starved times include ____.

[Figure: normal QQ plots, "QQ Plot of Starved" and "QQ Plot of Fed" (Sample Quantiles vs. Theoretical Quantiles)]

We cannot use the Welch's T-test because ____. We can again use a ____ method.

To do a bootstrap test for $H_0: \mu_1 - \mu_2 = 0$,

1. Draw simple random samples $x_{1,1}, x_{1,2}, \ldots, x_{1,n_1}$ of size $n_1$ from the first population and $x_{2,1}, x_{2,2}, \ldots, x_{2,n_2}$ of size $n_2$ from the second. Compute $\bar{x}_1$, $s_1^2$, $\bar{x}_2$, and $s_2^2$. Find $t_{obs} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$.
2. Draw simple random samples with replacement, $x_{1,1}^*, \ldots, x_{1,n_1}^*$ from the first sample and $x_{2,1}^*, \ldots, x_{2,n_2}^*$ from the second.
3. Compute the means and variances of the resampled data for each group. Call these $\bar{x}_1^*$ and $s_1^{*2}$, and $\bar{x}_2^*$ and $s_2^{*2}$.
4. Compute the statistic $\hat{t} = \dfrac{(\bar{x}_1^* - \bar{x}_2^*) - (\bar{x}_1 - \bar{x}_2)}{\sqrt{\frac{s_1^{*2}}{n_1} + \frac{s_2^{*2}}{n_2}}}$.
5. Repeat steps 2-4 a large number, $B$, times to get a collection of $\hat{t}$ values that approximate the sampling distribution of $t$.
6. Find the p-value, an area under the approximate sampling distribution given by $m / B$, where $m$ depends on $H_A$:
   $H_A: \mu_1 - \mu_2 > 0 \implies m$ is the number of values of $\hat{t}$ for which $\hat{t} > t_{obs}$
   $H_A: \mu_1 - \mu_2 < 0 \implies m$ is the number of values of $\hat{t}$ for which $\hat{t} < t_{obs}$
   $H_A: \mu_1 - \mu_2 \neq 0 \implies m$ is the number of values of $\hat{t}$ for which $|\hat{t}| > |t_{obs}|$
7. Draw a conclusion:
   p-value $\leq \alpha$ (where $\alpha$ is the level, .05 by default) $\implies$ reject $H_0$
   p-value $> \alpha \implies$ retain $H_0$ as plausible

# Here's one way to do the bootstrap for a difference of two means in R:
starved = c(1.9, 2.1, 3.8, 9.0, 9.6, 13.0, 14.7, 17.9, 21.7, 29.0, 72.3)
fed = c(1.5, 1.7, 2.4, 3.6, 5.7, 22.6, 22.8, 39.0, 54.4, 72.1, 73.6, 79.5, 88.9)

summary(starved) # numerical summaries
sd(starved)
summary(fed)
sd(fed)

# install.packages("lattice") # Run once to download R code to your computer.
require("lattice")
all = c(starved, fed) # graphs
group = c(rep("starved", 11), rep("fed", 13))
dotplot(~all | group, layout = c(1,2), as.table = TRUE, xlab = 'Time (hrs)')
histogram(~all | group, layout = c(1,2), as.table = TRUE, xlab = "Time (hrs)")
qqnorm(starved, main = "QQ Plot of Starved")
qqnorm(fed, main = "QQ Plot of Fed")

# dat1 and dat2 are data from the two groups. nboot is the number of resamples.
boottwo = function(dat1, dat2, nboot) {
  bootstat = numeric(nboot)
  truediff = mean(dat1) - mean(dat2)
  n1 = length(dat1)
  n2 = length(dat2)
  for(i in 1:nboot) {
    samp1 = sample(dat1, size = n1, replace = TRUE) # resample each group with replacement
    samp2 = sample(dat2, size = n2, replace = TRUE)
    bootmean1 = mean(samp1)
    bootmean2 = mean(samp2)
    bootvar1 = var(samp1)
    bootvar2 = var(samp2)
    bootstat[i] = ((bootmean1 - bootmean2) - truediff)/sqrt((bootvar1/n1) + (bootvar2/n2))
  }
  return(bootstat)
}

B = 5000
cricketboot = boottwo(starved, fed, B)
t.obs = (mean(starved) - mean(fed)) / sqrt(var(starved) / length(starved) + var(fed) / length(fed))
low = sum(cricketboot < -abs(t.obs)) # two-sided: count both tails
high = sum(cricketboot > abs(t.obs))
p.val = (low + high) / B

e.g. For the starved/fed cricket data, we find $t_{obs} =$ ____. I used R to run $B = 5000$ resamples and found ____ $\hat{t}$ values less than $-|t_{obs}|$ and ____ greater than $|t_{obs}|$, for a p-value of ____. We conclude ____.

8.4 The Wilcoxon Rank Sum or Mann-Whitney Test

One more test of location for two populations that may not be normal is the Wilcoxon Rank Sum Test or Mann-Whitney Test.

e.g. Consider again the cricket data:

starved: 1.9, 2.1, 3.8, 9.0, 9.6, 13.0, 14.7, 17.9, 21.7, 29.0, 72.3 ($n_{starved} =$ ____)
fed: 1.5, 1.7, 2.4, 3.6, 5.7, 22.6, 22.8, 39.0, 54.4, 72.1, 73.6, 79.5, 88.9 ($n_{fed} =$ ____)

For the Wilcoxon Rank Sum test, we must assume independence of sample data between and within groups and that the distributions of the two groups ____.

Our hypotheses are in terms of the two population ____ (not ____):

$H_0$: The distributions of the two groups are identical vs.
$H_A$: The distributions of the two groups ____ but one is ____ relative to the other.

The test statistic is related to ____ of the samples, so we rank the data without regard for sample, while retaining sample labels. Then we find:

$R_1$ = sum of sample 1 ranks,
$R_{min} = \dfrac{n_1 (n_1 + 1)}{2}$ = minimum possible sum, and
$U = R_1 - R_{min}$

p-value:
$H_A$: population 1 is shifted left of 2 $\implies$ p-value $= P(U \leq U_{obs})$
$H_A$: population 1 is shifted right of 2 $\implies$ p-value $= P(U \geq U_{obs})$
$H_A$: population 1 is shifted from 2 $\implies$ p-value $= 2 \min[P(U \leq U_{obs}),\ P(U \geq U_{obs}),\ \tfrac{1}{2}]$

e.g. Here are the cricket data again:

rank   time   sample    starved ranks
1      1.5    fed
2      1.7    fed
3      ____   starved   ____
4      ____   starved   ____
5      ____   fed
6      ____   fed
7      ____   starved   ____
8      ____   fed
9      ____   starved   ____
10     ____   starved   ____
11     ____   starved   ____
12     ____   starved   ____
13     ____   starved   ____
14     ____   starved   ____
15     ____   fed
16     ____   fed
17     ____   starved   ____
18     ____   fed
19     ____   fed
20     ____   fed
21     ____   starved   ____
22     ____   fed
23     ____   fed
24     ____   fed

$R_1 =$ ____   $R_{min} =$ ____   $U =$ ____

(For ____ observations, use ____ ranks. e.g. If samples had two 1.2s, they'd be #1 and #2 or #2 and #1, so each would get rank ____.)

How many possible rankings must we consider to find the p-value? ____

Here's one way to do this with R:

starved = c(1.9, 2.1, 3.8, 9.0, 9.6, 13.0, 14.7, 17.9, 21.7, 29.0, 72.3)
fed = c(1.5, 1.7, 2.4, 3.6, 5.7, 22.6, 22.8, 39.0, 54.4, 72.1, 73.6, 79.5, 88.9)
wilcox.test(starved, fed)

For the cricket data, R gives p-value ____, so we ____.
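The rank-sum quantities for the cricket data can also be computed directly in R, as a check on the hand ranking. This is a sketch: the variable names are illustrative, the data values are my reading of the listing in these notes (an assumption where the printed digits are incomplete), and the starved group is treated as sample 1. It also counts the ways the 24 ranks could be split between the groups, and compares $U$ with the statistic `W` reported by `wilcox.test`, which uses the same definition (rank sum of the first sample minus its minimum possible value).

```r
# Values as read from the listing in these notes (assumed where digits are unclear)
starved <- c(1.9, 2.1, 3.8, 9.0, 9.6, 13.0, 14.7, 17.9, 21.7, 29.0, 72.3)
fed <- c(1.5, 1.7, 2.4, 3.6, 5.7, 22.6, 22.8, 39.0, 54.4, 72.1, 73.6, 79.5, 88.9)

n1 <- length(starved)
ranks <- rank(c(starved, fed))       # rank() assigns average ranks to ties
starved_ranks <- ranks[seq_len(n1)]  # the first n1 entries belong to starved

R1 <- sum(starved_ranks)             # sum of sample 1 ranks
Rmin <- n1 * (n1 + 1) / 2            # minimum possible sum
U <- R1 - Rmin

# Number of ways to choose which n1 of the ranks go to the starved group
n_rankings <- choose(length(ranks), n1)

W <- wilcox.test(starved, fed)$statistic  # R's version of the same statistic
print(sort(starved_ranks))
print(c(R1 = R1, Rmin = Rmin, U = U, n_rankings = n_rankings))
print(W)
```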

e.g. Here's a simpler example for which it is not hard to calculate the p-value by hand. Suppose sample A is 4.8, 2.2 and sample B is 3.0, 1.5, 3.5. Sample A's ranks are ____ and ____, $R_1 =$ ____, $R_{min} =$ ____, and $U =$ ____.

Under $H_0$, ranks are randomly assigned to the two samples from {1, 2, 3, 4, 5}. Here are the possible sample A ranks and the statistics we get from them:

Sample A ranks   $R_1$   $R_{min}$   $U$
1, 2             ____    ____        ____
1, 3             ____    ____        ____
1, 4             ____    ____        ____
1, 5             ____    ____        ____
2, 3             ____    ____        ____
2, 4             ____    ____        ____
2, 5 (observed)  ____    ____        ____
3, 4             ____    ____        ____
3, 5             ____    ____        ____
4, 5             ____    ____        ____

The p-value is ____.

Conclusion: ____
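The enumeration in this small example can be reproduced in R. The sketch below lists every possible pair of sample A ranks with `combn`, computes $U$ for each, and applies the two-sided p-value formula from §8.4. The variable names are illustrative, the second sample B value is read as 1.5 (an assumption), and a two-sided alternative is assumed.

```r
A <- c(4.8, 2.2)
B <- c(3.0, 1.5, 3.5)   # 1.5 is an assumed reading of the second value

n1 <- length(A)
N <- n1 + length(B)
Rmin <- n1 * (n1 + 1) / 2

# Observed statistic: rank all data together, then sum sample A's ranks
ranks <- rank(c(A, B))
U_obs <- sum(ranks[seq_len(n1)]) - Rmin

# Under H0 every choice of n1 ranks from 1..N is equally likely for sample A
rank_sets <- combn(N, n1)            # one column per possible set of A ranks
U_all <- colSums(rank_sets) - Rmin   # U for each possible set

p_left <- mean(U_all <= U_obs)       # P(U <= U_obs)
p_right <- mean(U_all >= U_obs)      # P(U >= U_obs)
p_two_sided <- 2 * min(p_left, p_right, 1 / 2)

print(rbind(rank_sets, U = U_all))
print(c(U_obs = U_obs, p_left = p_left, p_right = p_right, p = p_two_sided))
print(wilcox.test(A, B)$p.value)     # exact p-value from R for comparison
```

With samples this small, the enumeration and the built-in exact test should give the same two-sided p-value.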

8.5 Comparing Two Population Proportions

e.g. Does handedness differ by sex? An SRS of $n_M$ = 54 males and $n_F$ = 2 females was taken. Each person indicated his or her dominant hand:

Female: 2 left, 9 right
Male: 23 left, 3 right

Let $\pi_{FL}$ = proportion of left-handed females and $\pi_{ML}$ = proportion of left-handed males in the population. We test

$H_0: \pi_{FL} - \pi_{ML} = 0$
$H_A: \pi_{FL} - \pi_{ML} \neq 0$

A natural point estimate for the population difference of proportions is $\hat{\pi}_{FL} - \hat{\pi}_{ML}$. If $\pi_{FL} n_F$, $(1 - \pi_{FL}) n_F$, $\pi_{ML} n_M$, and $(1 - \pi_{ML}) n_M$ are all greater than ____, we can use the CLT to say

$$\hat{\pi}_{FL} - \hat{\pi}_{ML} \sim N\left(\pi_{FL} - \pi_{ML},\ \frac{\pi_{FL}(1 - \pi_{FL})}{n_F} + \frac{\pi_{ML}(1 - \pi_{ML})}{n_M}\right)$$

But we don't know $\pi_{FL}$ and $\pi_{ML}$. Under $H_0$, they are equal: $\pi_{FL} = \pi_{ML} = \pi_L$, and the distribution becomes:

$$\hat{\pi}_{FL} - \hat{\pi}_{ML} \sim N\left(0,\ \pi_L (1 - \pi_L) \left(\frac{1}{n_F} + \frac{1}{n_M}\right)\right)$$

We don't know the common proportion, but we estimate it with a weighted average of the sample proportions:

$$\hat{\pi}_L = \frac{\hat{\pi}_{FL} n_F + \hat{\pi}_{ML} n_M}{n_F + n_M} = \frac{\text{number of left-handers in both samples combined}}{n_F + n_M}$$

Standardize our point estimate to get a test statistic:

$$Z = \frac{\hat{\pi}_{FL} - \hat{\pi}_{ML}}{\sqrt{\hat{\pi}_L (1 - \hat{\pi}_L) \left(\frac{1}{n_F} + \frac{1}{n_M}\right)}} \sim N(0, 1)$$

e.g. For the handedness data, we have:

$\hat{\pi}_{FL} =$ ____
$\hat{\pi}_{ML} =$ ____
$\hat{\pi}_L =$ ____

The (approximate) expected numbers of successes and failures are ____.

z = ____
p-value = ____
conclusion: ____

We can also make a CI. It does not come with a ____, so we use the more general form of the variance. An approximate $100(1 - \alpha)\%$ CI for $\pi_{FL} - \pi_{ML}$ is:

$$\hat{\pi}_{FL} - \hat{\pi}_{ML} \pm z_{\alpha/2} \sqrt{\frac{\hat{\pi}_{FL}(1 - \hat{\pi}_{FL})}{n_F} + \frac{\hat{\pi}_{ML}(1 - \hat{\pi}_{ML})}{n_M}}$$

For our data, a 95% interval works out to: ____

Summary: Suppose $X \sim Bin(n_X, \pi_X)$ and $Y \sim Bin(n_Y, \pi_Y)$ are independent, with $n_X \pi_X$, $n_X (1 - \pi_X)$, $n_Y \pi_Y$, and $n_Y (1 - \pi_Y)$ all > 5. To test $H_0: \pi_X - \pi_Y = 0$:

1. State null and alternative hypotheses, $H_0$ and $H_A$
2. Check assumptions
3. Find $\hat{\pi}_X = \dfrac{X}{n_X}$, $\hat{\pi}_Y = \dfrac{Y}{n_Y}$, and pooled $\hat{\pi} = \dfrac{X + Y}{n_X + n_Y}$
4. Find the test statistic, $z = \dfrac{(\hat{\pi}_X - \hat{\pi}_Y) - 0}{\sqrt{\hat{\pi}(1 - \hat{\pi})(1/n_X + 1/n_Y)}}$
5. Find the p-value, which is an area under the $N(0, 1)$ curve depending on $H_A$:
   $H_A: \pi_X - \pi_Y > 0 \implies$ p-value $= P(Z > z)$, the area right of $z$
   $H_A: \pi_X - \pi_Y < 0 \implies$ p-value $= P(Z < z)$, the area left of $z$
   $H_A: \pi_X - \pi_Y \neq 0 \implies$ p-value $= P(|Z| > |z|)$, the sum of the two tail areas
6. Draw a conclusion

A $100\%(1 - \alpha)$ confidence interval for $\pi_X - \pi_Y$ is

$$(\hat{\pi}_X - \hat{\pi}_Y) \pm z_{\alpha/2} \sqrt{\frac{\hat{\pi}_X (1 - \hat{\pi}_X)}{n_X} + \frac{\hat{\pi}_Y (1 - \hat{\pi}_Y)}{n_Y}}.$$

In the next section, we compare two means when the samples are ____.
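The two-proportion summary above can be sketched in R as follows. The helper name `two_prop_z` and the counts passed to it are hypothetical (they are not the handedness data), and the sketch covers the two-sided alternative only. For comparison, base R's `prop.test` with `correct = FALSE` reports the chi-square statistic, which equals $z^2$ for this test, along with the same unpooled confidence interval.

```r
# two_prop_z is a hypothetical helper implementing steps 3-5 and the interval
# for the two-sided alternative H_A: pi_X - pi_Y != 0.
two_prop_z <- function(x, n_x, y, n_y, conf = 0.95) {
  p_x <- x / n_x
  p_y <- y / n_y
  p_pool <- (x + y) / (n_x + n_y)           # pooled estimate under H0
  # Smallest expected count of successes/failures, for the assumption check
  min_expected <- min(c(n_x, n_y) * p_pool, c(n_x, n_y) * (1 - p_pool))
  # Test statistic uses the pooled standard deviation
  z <- (p_x - p_y) / sqrt(p_pool * (1 - p_pool) * (1 / n_x + 1 / n_y))
  p_value <- 2 * pnorm(-abs(z))             # sum of the two tail areas
  # Confidence interval uses the unpooled (more general) standard deviation
  se_ci <- sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
  ci <- (p_x - p_y) + c(-1, 1) * qnorm(1 - (1 - conf) / 2) * se_ci
  list(min_expected = min_expected, z = z, p.value = p_value, conf.int = ci)
}

# Hypothetical counts for illustration only: 30 of 100 vs. 45 of 120 successes
res <- two_prop_z(x = 30, n_x = 100, y = 45, n_y = 120)
chk <- prop.test(c(30, 45), c(100, 120), correct = FALSE)
print(res)
print(chk)
```

The continuity correction is turned off in `prop.test` so that its output matches the formulas in the summary; with the default `correct = TRUE` the statistic and interval differ slightly.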
