Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2

Size: px

Start display at page:

Download "Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2"

Bennett Marsh
5 years ago
Views:

1 Chapters 4-6: Inference with two samples Read sections 45, 5, 53, 6 COMPARING TWO POPULATION MEANS When presented with two samples that you wish to compare, there are two possibilities: I independent samples and II paired samples I INDEPENDENT SAMPLES: Select a SRS of size from populatio which has mean µ and standard deviation σ Independently from the first sample, select a SRS of size from populatio which has mean µ and standard deviation σ If sampling a finite population without replacement, then 05N and 05N The difference between the two population means is the parameter of interest, µ µ The statistic X X is a point estimator of the parameter µ µ Note about Study Design: Completely Randomized Design Experiment Two treatment groups from a CRD are independent In a CRD, it is possible to claim that the treatment caused the difference in means If the individuals in the CRD were chosen from a SRS, then conclusions about µ µ can be extended to the populations from which the individuals were drawn However, if the individuals were not a SRS, then conclusions to larger populations are dubious Observational Study Two samples to be compared in an observational study are considered to be independent if individuals were randomly chosen from each respective population Do not claim that an explanatory variable caused the difference in means Facts about the sampling distribution of X X : µ X X = µ µ, so X X is an unbiased point estimator of µ µ σ = σ X X + σ σ and σ X = X + σ IfthesamplingdistributionsofX andx areapproximately ( normal,thenthesamplingdistribution σ of X X is approximately normal, ie, X X N µ µ, + σ )

2 Un-pooled Approach to estimating and testing µ µ (53 and 53): A Confidence Interval for µ µ is X X ± t α,df S + S df = (V +V ) V + V and V = S and V = S A conservative degree of freedom estimate is df = min(, ) Check the assumptions that the SRS s are independent and that X and X are normal! A Hypothesis Test for µ µ is: Hypotheses: H 0 : µ µ = 0 where 0 is a specific value (usually 0) H a : µ µ > 0 Choose one: H a : µ µ < 0 H a : µ µ 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: (a) SRS s from each population (b) The two SRS s are independent of each other (c) If sampling finite populations without replacement, then 05N and 05N (d) The sampling distributions of X and X must be at least approximately normal (so the data is normal OR, 5 for symmetric non-normal data OR, 30 for skewed data) If H 0 is true, then ( X X ) N ( 0, σ + σ ) 3 Test Statistic: T = X X 0 S + S When H 0 is true, T t(df) where df = (V +V ) V + V A conservative degree of freedom estimate is df = min(, ) 4 The p-value is the probability of obtaining a sample statistic that is as extreme or more extreme than what was actually observed assuming that H 0 is true Alternative Hypothesis p-value H a : µ µ > 0 P(T > t) H a : µ µ < 0 P(T < t) H a : µ µ 0 P(T > t ) 5 Decision: Given a predetermined significance level α: Reject H 0 if p-value α Fail to Reject H 0 if p-value > α 6 Conclusion: a statement of how much evidence there is for the alternative hypothesis If you reject H 0, then you conclude The evidence suggests H a

3 If you fail to reject H 0, then you conclude The evidence fails to suggest H a In each case, specify H a in terms of the problem EXAMPLE: To compare the mean cholesterol content (in milligrams) of grilled chicken sandwiches from Arby s and McDonald s, you randomly select several sandwiches from each restaurant and measure their cholesterol contents From Arby s, = 5 sandwiches are randomly selected, x = 60 mg and s = 359 mg From McDonald s, = sandwiches are randomly selected, x = 70 mg and s = 4 mg Construct an 90% confidence interval for the difference in mean cholesterol content of grilled chicken sandwiches Is there sufficient evidence to conclude the McDonald s grilled chicken sandwiches have a higher cholesterol content, on average, compared to Arby s? Hypotheses: Check assumptions: 3 Test statistic: 4 p-value: 5 Decision: 6 Conclusion: 3

4 Pooled Approach to testing and estimating µ µ (536): Assume that σ = σ, called the homogeneity of variance assumption The pooled approach is more powerful than the un-pooled approach (ie Power= β is larger) if σ and σ are truly equal since the degrees of freedom for the pooled approach is larger than the degrees of freedom for the un-pooled approach If σ p = σ = σ, then σ X X = σ p + where σ p is the pooled population variance Confidence Interval for µ µ is X X ± t α,df S p + The pooled sample variance is S p = ( )S +( )S +, an unbiased estimator of σ p df = + Check the assumptions that the SRS s are independent and that X and X are normal! Hypothesis Testing for µ µ : Hypotheses: H 0 : µ µ = 0 where 0 is a specific value (usually 0) H a : µ µ > 0 Choose one: H a : µ µ < 0 H a : µ µ 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: (a) SRS s from each population (b) The two SRS s are independent of each other (c) If sampling finite populations without replacement, then 05N and 05N (d) The sampling distributions of X and X must be at least approximately normal (so the data is normal OR, 5 for symmetric non-normal data OR, 30 for skewed data) If H 0 is true, then ( X X ) N ( 0,σ p + ) (e) Homogeneity of variance: σ = σ If one sample standard deviation is more than twice the other, then this assumption is suspect 3 Test Statistic: T = X X 0 S p + If H 0 is true, T t(df) where df = + 4, 5 and 6 Find the p-value, make a Decision, and give a Conclusion as in the un-pooled case EXAMPLE: Redo the grilled chicken problem using a pooled variance this time 4

5 II PAIRED SAMPLES (5 and 5): Samples can be paired when: From a SRS of size n, two measurements (X i,y i ) are recorded for each individual, such as when there are before-treatment and after-treatment measurements From two SRS s, each of size n, two measurements (X i,y i ) are recorded for two similar individuals, such as when there are blocks of two individuals, and each individual gets a different treatment The paired measurements are subtracted, d i = X i Y i The differences d,d,,d n form a new set sample of data, which is analyzed using the one sample procedures from Chapter 0 The sample mean difference, d, is an unbiased point estimator of the parameter, µd, the mean of the population of differences Your book uses the notation x d instead of d The sampling distribution of d has mean µ d = µ d and sampling variability σ d = σ d n A Confidence Interval for µ d is d ± t α n Check the Assumption that d is normal!,n S d A Hypothesis Test for µ d : Hypotheses: H 0 : µ d = δ 0 where δ 0 is a specific value (usually 0) H a : µ d > δ 0 Choose one: H a : µ d < δ 0 H a : µ d δ 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: Same as for hypothesis testing using one sample: (a) The data must be a SRS (b) If sampling a finite population without replacement, then 05N n (c) The sampling distribution of d must be at least approximately normal (so the differences are normal OR n 5 for symmetric non-normal differences OR n 30 for skewed differences) If H 0 is true, then d ( N δ 0, σ d n ) 3 The Test Statistic is T = d δ 0 S d n 4 p-value where T t(n ) when H 0 is true Alternative Hypothesis p-value H a : µ d > δ 0 P(T > t) H a : µ d < δ 0 P(T < t) H a : µ d δ 0 P(T > t ) 5 and 6 Make a Decision and give a Conclusion 5

6 EXAMPLE: An herbal medicine is tested o4 randomly selected patients with sleeping disorders The table shows the hours of sleep patients got during one night without using the herbal medicine and during another night after the herbal medicine was administered Patient Without medicine With medicine Difference The mean difference in sleep is d = and the sample standard deviation of the differences is s d = 0677 Construct an 95% confidence interval for µ d Is there sufficient evidence to suggest the herbal medicine provides a longer night s sleep, on average? Hypotheses: Check assumptions: 3 Test statistic: 4 p-value: 5 Decision: 6 Conclusion: 6

7 COMPARING POPULATION PROPORTIONS FROM INDEPENDENT SAMPLES Collect binary data from a SRS of size from populatio, which has mean p Collect binary data from a SRS of size from populatio, which has mean p If sampling a finite population without replacement, then 05N and 05N When comparing two population proportions, the difference between the two population proportions is the parameter of interest, p p The statistic ˆp ˆp is a point estimator of the parameter p p Note about Study Design: See notes for comparing two population means Facts about the sampling distribution of ˆp ˆp (6): µˆp ˆp = p p so ˆp ˆp is an unbiased point estimator of p p σ ˆp = p ( p ) ˆp + p ( p ) and = p ( p ) σˆp ˆp + p ( p ) If p 0, ( p ) 0, p 0, and ( p ) 0 then p ( p ) (ˆp ˆp ) N p p, + p ( p ) Estimating and Testing p p (6 and 63): ˆp ( ˆp AConfidence Interval for p p is ˆp ˆp ± z ) α + ˆp ( ˆp ) that the SRS s are independent and that ˆp and ˆp are normal! Checktheassumptions A Hypothesis Test for p p : Hypotheses: H 0 : p p = 0 where 0 is a specific value (usually 0) H a : p p > 0 Choose one: H a : p p < 0 H a : p p 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: Under H 0, if 0 = 0, then p = p = p, and so an estimate of p is ˆp = total successes total sample size = count +count + (a) SRS s from each population (b) The two SRS s are independent of each other (c) If sampling a finite population without replacement, then 05N and 05N (d) The sampling distributions for ˆp and ˆp must be approximately normal 7

8 If 0 0, then check the sample size like this: ˆp 0, ( ˆp ) 0, ˆp 0, and ( ˆp ) 0 If 0 = 0, use ˆp to check the sample size: ˆp 0, ( ˆp) 0, ˆp 0, and ( ˆp) 0 For large sample sizes, if H 0 is true, p ( p ) (ˆp ˆp ) N 0, + p ( p ) For small sample sizes, consider Fisher s exact test (65 and 653) 3 Test Statistic: If 0 0, then Z = ˆp ˆp 0 ˆp ( ˆp ) + ˆp ( ˆp ) If 0 = 0, then p = p = p and now Z = ˆp ˆp 0 ˆp( ˆp) + ˆp( ˆp) Z N(0, ) if H 0 is true 4 p-value: Alternative Hypothesis p-value H a : p p > 0 P(Z > z) H a : p p < 0 P(Z < z) H a : p p 0 P(Z > z ) 5 and 6 Make a Decision and give a Conclusion EXAMPLE: Among 00 randomly selected male car occupants over the age of 8, 7% wear seat belts Among 380 randomly selected female car occupants over the age of 8, 84% wear seat belts Test the claim that both genders have the same rate of seat belt use Use α = 005 Does there appear to be a gender gap? Hypotheses: Check assumptions: 3 Test statistic: 4 p-value: 8

9 5 Decision: 6 Conclusion: Construct a 90% CI to estimate the true difference in the proportions of seat belt users in the male and female populations R code # Comparing Two Population Means using UNPOOLED Variance: # Arby s and McDonald s chicken sandwiches Example > ttest(arby,mcd,conflevel=090,varequal=false,alternative="twosided") Two Sample t-test data: Arby and McD t = , df = 437, p-value = e-09 alternative hypothesis: true difference in means is not equal to 0 90 percent confidence interval: sample estimates: mean of x mean of y # Comparing Two Population Means using POOLED Variance: # Arby s and McDonald s chicken sandwiches Example > ttest(arby,mcd,conflevel=090,varequal=true,alternative="twosided") Two Sample t-test data: Arby and McD t = , df = 5, p-value = 786e-09 alternative hypothesis: true difference in means is not equal to 0 90 percent confidence interval: sample estimates: mean of x mean of y # Comparing Two Population Means using Matched Pairs t-test: 9

10 # Herbal Medicine Example # Here s the data given as a table: > nomed=c(,4,34,37,5,5,5,3,55,58,4,48,9,45) > med=c(9,33,35,44,5,5,5,3,6,65,44,47,3,47) > rbind(nomed,med,nomed-med) [,] [,] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,0] [,] [,] [,3] [,4] nomed med > ttest(nomed,med,conflevel=095,paired=true,alternative="less") Paired t-test data: nomed and med t = -4094, df = 3, p-value = alternative hypothesis: true difference in means is less than 0 95 percent confidence interval: -Inf sample estimates: mean of the differences # Comparing Two Population Proportions: # Seat Belt Example > proptest(c(7*00,84*380),c(00,380),conflevel=95,alternative="twosided", correct=f) -sample test for equality of proportions without continuity correction data: c(07 * 00, 084 * 380) out of c(00, 380) X-squared = 96686, df =, p-value < e-6 alternative hypothesis: twosided 95 percent confidence interval: sample estimates: prop prop Exercises Two independent means on p63: 55, 56 (answers: Yes because n is large; test statistic is -578, p-value is tiny (< 0 6 ), so the evidence suggests that the age difference is not due to chance), odd, 538 (answers: T (if first sample is symmetric and the second is skewed) TF) Paired means on p59: 55, 56 (answers: TTTF), 57, 58 (answers: YYN), odd, 54 (answer: F), 57, 59 Two independent proportions on p37: odd, 637 0

Inferences About Two Population Proportions

Inferences About Two Population Proportions MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2018 Background Recall: for a single population the sampling proportion