Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2

Similar documents
Inferences About Two Population Proportions

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

Midterm 1 and 2 results

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Two-Sample Inferential Statistics

Chapter 7 Comparison of two independent samples

Chapter. Hypothesis Testing with Two Samples. Copyright 2015, 2012, and 2009 Pearson Education, Inc. 1

5 Basic Steps in Any Hypothesis Test

Ch. 7. One sample hypothesis tests for µ and σ

Lecture 10: Comparing two populations: proportions

Problem Set 4 - Solutions

Psychology 282 Lecture #4 Outline Inferences in SLR

Hypothesis testing: Steps

7 Estimation. 7.1 Population and Sample (P.91-92)

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Chapter 7: Statistical Inference (Two Samples)

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Exam 2 (KEY) July 20, 2009

ME3620. Theory of Engineering Experimentation. Spring Chapter IV. Decision Making for a Single Sample. Chapter IV

Confidence Intervals, Testing and ANOVA Summary

What is Experimental Design?

Chapter 9. Inferences from Two Samples. Objective. Notation. Section 9.2. Definition. Notation. q = 1 p. Inferences About Two Proportions

Lecture 26 Section 8.4. Wed, Oct 14, 2009

For tests of hypotheses made about population, we consider. only tests having a hypothesis of. Note that under the assumption of

Statistics 251: Statistical Methods

Inference for Distributions Inference for the Mean of a Population. Section 7.1

Analysis of Variance (ANOVA)

Hypothesis testing: Steps

Testing a Claim about the Difference in 2 Population Means Independent Samples. (there is no difference in Population Means µ 1 µ 2 = 0) against

Difference between means - t-test /25

10.1. Comparing Two Proportions. Section 10.1

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

STAT Chapter 8: Hypothesis Tests

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

Lecture 5: ANOVA and Correlation

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Discrete Multivariate Statistics

Lecture 28 Chi-Square Analysis

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES. Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups

Sampling Distributions: Central Limit Theorem

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Hypothesis testing. Data to decisions

Difference Between Pair Differences v. 2 Samples

Sleep data, two drugs Ch13.xls

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

STATISTICS 141 Final Review

CHAPTER 9, 10. Similar to a courtroom trial. In trying a person for a crime, the jury needs to decide between one of two possibilities:

hypotheses. P-value Test for a 2 Sample z-test (Large Independent Samples) n > 30 P-value Test for a 2 Sample t-test (Small Samples) n < 30 Identify α

Chapter 26: Comparing Counts (Chi Square)

Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

Population Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8

Independent Samples t tests. Background for Independent Samples t test

CHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication

Testing Independence

Hypothesis Testing Problem. TMS-062: Lecture 5 Hypotheses Testing. Alternative Hypotheses. Test Statistic

Inference for Distributions Inference for the Mean of a Population

Performance Evaluation and Comparison

Two sided, two sample t-tests. a) IQ = 100 b) Average height for men = c) Average number of white blood cells per cubic millimeter is 7,000.

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

An inferential procedure to use sample data to understand a population Procedures

Hypothesis testing for µ:

Comparison of Two Population Means

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Inferential statistics

Sampling Distribution of a Sample Proportion

Inferences About Two Proportions

The t-statistic. Student s t Test

Statistics for IT Managers

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Ch18 links / ch18 pdf links Ch18 image t-dist table

df=degrees of freedom = n - 1

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Solution: First note that the power function of the test is given as follows,

23. Inference for regression

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017

Chapter 9 Inferences from Two Samples

STA 101 Final Review

DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions.

Chapters 4-6: Estimation

1 Statistical inference for a population mean

Probability and Statistics

The Empirical Rule, z-scores, and the Rare Event Approach

Inference for Regression Inference about the Regression Model and Using the Regression Line

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

FinalExamReview. Sta Fall Provided: Z, t and χ 2 tables

HYPOTHESIS TESTING. Hypothesis Testing

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

The Difference in Proportions Test

MATH Notebook 3 Spring 2018

MATH Chapter 21 Notes Two Sample Problems

One sample problem. sample mean: ȳ = . sample variance: s 2 = sample standard deviation: s = s 2. y i n. i=1. i=1 (y i ȳ) 2 n 1

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

Introduction to Statistical Data Analysis III

Sampling Distribution of a Sample Proportion

Transcription:

Chapters 4-6: Inference with two samples Read sections 45, 5, 53, 6 COMPARING TWO POPULATION MEANS When presented with two samples that you wish to compare, there are two possibilities: I independent samples and II paired samples I INDEPENDENT SAMPLES: Select a SRS of size from populatio which has mean µ and standard deviation σ Independently from the first sample, select a SRS of size from populatio which has mean µ and standard deviation σ If sampling a finite population without replacement, then 05N and 05N The difference between the two population means is the parameter of interest, µ µ The statistic X X is a point estimator of the parameter µ µ Note about Study Design: Completely Randomized Design Experiment Two treatment groups from a CRD are independent In a CRD, it is possible to claim that the treatment caused the difference in means If the individuals in the CRD were chosen from a SRS, then conclusions about µ µ can be extended to the populations from which the individuals were drawn However, if the individuals were not a SRS, then conclusions to larger populations are dubious Observational Study Two samples to be compared in an observational study are considered to be independent if individuals were randomly chosen from each respective population Do not claim that an explanatory variable caused the difference in means Facts about the sampling distribution of X X : µ X X = µ µ, so X X is an unbiased point estimator of µ µ σ = σ X X + σ σ and σ X = X + σ IfthesamplingdistributionsofX andx areapproximately ( normal,thenthesamplingdistribution σ of X X is approximately normal, ie, X X N µ µ, + σ )

Un-pooled Approach to estimating and testing µ µ (53 and 53): A Confidence Interval for µ µ is X X ± t α,df S + S df = (V +V ) V + V and V = S and V = S A conservative degree of freedom estimate is df = min(, ) Check the assumptions that the SRS s are independent and that X and X are normal! A Hypothesis Test for µ µ is: Hypotheses: H 0 : µ µ = 0 where 0 is a specific value (usually 0) H a : µ µ > 0 Choose one: H a : µ µ < 0 H a : µ µ 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: (a) SRS s from each population (b) The two SRS s are independent of each other (c) If sampling finite populations without replacement, then 05N and 05N (d) The sampling distributions of X and X must be at least approximately normal (so the data is normal OR, 5 for symmetric non-normal data OR, 30 for skewed data) If H 0 is true, then ( X X ) N ( 0, σ + σ ) 3 Test Statistic: T = X X 0 S + S When H 0 is true, T t(df) where df = (V +V ) V + V A conservative degree of freedom estimate is df = min(, ) 4 The p-value is the probability of obtaining a sample statistic that is as extreme or more extreme than what was actually observed assuming that H 0 is true Alternative Hypothesis p-value H a : µ µ > 0 P(T > t) H a : µ µ < 0 P(T < t) H a : µ µ 0 P(T > t ) 5 Decision: Given a predetermined significance level α: Reject H 0 if p-value α Fail to Reject H 0 if p-value > α 6 Conclusion: a statement of how much evidence there is for the alternative hypothesis If you reject H 0, then you conclude The evidence suggests H a

If you fail to reject H 0, then you conclude The evidence fails to suggest H a In each case, specify H a in terms of the problem EXAMPLE: To compare the mean cholesterol content (in milligrams) of grilled chicken sandwiches from Arby s and McDonald s, you randomly select several sandwiches from each restaurant and measure their cholesterol contents From Arby s, = 5 sandwiches are randomly selected, x = 60 mg and s = 359 mg From McDonald s, = sandwiches are randomly selected, x = 70 mg and s = 4 mg Construct an 90% confidence interval for the difference in mean cholesterol content of grilled chicken sandwiches Is there sufficient evidence to conclude the McDonald s grilled chicken sandwiches have a higher cholesterol content, on average, compared to Arby s? Hypotheses: Check assumptions: 3 Test statistic: 4 p-value: 5 Decision: 6 Conclusion: 3

Pooled Approach to testing and estimating µ µ (536): Assume that σ = σ, called the homogeneity of variance assumption The pooled approach is more powerful than the un-pooled approach (ie Power= β is larger) if σ and σ are truly equal since the degrees of freedom for the pooled approach is larger than the degrees of freedom for the un-pooled approach If σ p = σ = σ, then σ X X = σ p + where σ p is the pooled population variance Confidence Interval for µ µ is X X ± t α,df S p + The pooled sample variance is S p = ( )S +( )S +, an unbiased estimator of σ p df = + Check the assumptions that the SRS s are independent and that X and X are normal! Hypothesis Testing for µ µ : Hypotheses: H 0 : µ µ = 0 where 0 is a specific value (usually 0) H a : µ µ > 0 Choose one: H a : µ µ < 0 H a : µ µ 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: (a) SRS s from each population (b) The two SRS s are independent of each other (c) If sampling finite populations without replacement, then 05N and 05N (d) The sampling distributions of X and X must be at least approximately normal (so the data is normal OR, 5 for symmetric non-normal data OR, 30 for skewed data) If H 0 is true, then ( X X ) N ( 0,σ p + ) (e) Homogeneity of variance: σ = σ If one sample standard deviation is more than twice the other, then this assumption is suspect 3 Test Statistic: T = X X 0 S p + If H 0 is true, T t(df) where df = + 4, 5 and 6 Find the p-value, make a Decision, and give a Conclusion as in the un-pooled case EXAMPLE: Redo the grilled chicken problem using a pooled variance this time 4

II PAIRED SAMPLES (5 and 5): Samples can be paired when: From a SRS of size n, two measurements (X i,y i ) are recorded for each individual, such as when there are before-treatment and after-treatment measurements From two SRS s, each of size n, two measurements (X i,y i ) are recorded for two similar individuals, such as when there are blocks of two individuals, and each individual gets a different treatment The paired measurements are subtracted, d i = X i Y i The differences d,d,,d n form a new set sample of data, which is analyzed using the one sample procedures from Chapter 0 The sample mean difference, d, is an unbiased point estimator of the parameter, µd, the mean of the population of differences Your book uses the notation x d instead of d The sampling distribution of d has mean µ d = µ d and sampling variability σ d = σ d n A Confidence Interval for µ d is d ± t α n Check the Assumption that d is normal!,n S d A Hypothesis Test for µ d : Hypotheses: H 0 : µ d = δ 0 where δ 0 is a specific value (usually 0) H a : µ d > δ 0 Choose one: H a : µ d < δ 0 H a : µ d δ 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: Same as for hypothesis testing using one sample: (a) The data must be a SRS (b) If sampling a finite population without replacement, then 05N n (c) The sampling distribution of d must be at least approximately normal (so the differences are normal OR n 5 for symmetric non-normal differences OR n 30 for skewed differences) If H 0 is true, then d ( N δ 0, σ d n ) 3 The Test Statistic is T = d δ 0 S d n 4 p-value where T t(n ) when H 0 is true Alternative Hypothesis p-value H a : µ d > δ 0 P(T > t) H a : µ d < δ 0 P(T < t) H a : µ d δ 0 P(T > t ) 5 and 6 Make a Decision and give a Conclusion 5

EXAMPLE: An herbal medicine is tested o4 randomly selected patients with sleeping disorders The table shows the hours of sleep patients got during one night without using the herbal medicine and during another night after the herbal medicine was administered Patient 3 4 5 6 7 8 9 0 3 4 Without medicine 0 4 34 37 5 5 5 3 55 58 4 48 9 45 With medicine 9 33 35 44 50 50 5 53 60 65 44 47 3 47 Difference -9-9 -0-07 0 0 0 0-05 -07-0 0-0 -0 The mean difference in sleep is d = -0436 and the sample standard deviation of the differences is s d = 0677 Construct an 95% confidence interval for µ d Is there sufficient evidence to suggest the herbal medicine provides a longer night s sleep, on average? Hypotheses: Check assumptions: 3 Test statistic: 4 p-value: 5 Decision: 6 Conclusion: 6

COMPARING POPULATION PROPORTIONS FROM INDEPENDENT SAMPLES Collect binary data from a SRS of size from populatio, which has mean p Collect binary data from a SRS of size from populatio, which has mean p If sampling a finite population without replacement, then 05N and 05N When comparing two population proportions, the difference between the two population proportions is the parameter of interest, p p The statistic ˆp ˆp is a point estimator of the parameter p p Note about Study Design: See notes for comparing two population means Facts about the sampling distribution of ˆp ˆp (6): µˆp ˆp = p p so ˆp ˆp is an unbiased point estimator of p p σ ˆp = p ( p ) ˆp + p ( p ) and = p ( p ) σˆp ˆp + p ( p ) If p 0, ( p ) 0, p 0, and ( p ) 0 then p ( p ) (ˆp ˆp ) N p p, + p ( p ) Estimating and Testing p p (6 and 63): ˆp ( ˆp AConfidence Interval for p p is ˆp ˆp ± z ) α + ˆp ( ˆp ) that the SRS s are independent and that ˆp and ˆp are normal! Checktheassumptions A Hypothesis Test for p p : Hypotheses: H 0 : p p = 0 where 0 is a specific value (usually 0) H a : p p > 0 Choose one: H a : p p < 0 H a : p p 0 NOTE: Perform steps -4 assuming that H 0 is true! Assumptions: Under H 0, if 0 = 0, then p = p = p, and so an estimate of p is ˆp = total successes total sample size = count +count + (a) SRS s from each population (b) The two SRS s are independent of each other (c) If sampling a finite population without replacement, then 05N and 05N (d) The sampling distributions for ˆp and ˆp must be approximately normal 7

If 0 0, then check the sample size like this: ˆp 0, ( ˆp ) 0, ˆp 0, and ( ˆp ) 0 If 0 = 0, use ˆp to check the sample size: ˆp 0, ( ˆp) 0, ˆp 0, and ( ˆp) 0 For large sample sizes, if H 0 is true, p ( p ) (ˆp ˆp ) N 0, + p ( p ) For small sample sizes, consider Fisher s exact test (65 and 653) 3 Test Statistic: If 0 0, then Z = ˆp ˆp 0 ˆp ( ˆp ) + ˆp ( ˆp ) If 0 = 0, then p = p = p and now Z = ˆp ˆp 0 ˆp( ˆp) + ˆp( ˆp) Z N(0, ) if H 0 is true 4 p-value: Alternative Hypothesis p-value H a : p p > 0 P(Z > z) H a : p p < 0 P(Z < z) H a : p p 0 P(Z > z ) 5 and 6 Make a Decision and give a Conclusion EXAMPLE: Among 00 randomly selected male car occupants over the age of 8, 7% wear seat belts Among 380 randomly selected female car occupants over the age of 8, 84% wear seat belts Test the claim that both genders have the same rate of seat belt use Use α = 005 Does there appear to be a gender gap? Hypotheses: Check assumptions: 3 Test statistic: 4 p-value: 8

5 Decision: 6 Conclusion: Construct a 90% CI to estimate the true difference in the proportions of seat belt users in the male and female populations R code # Comparing Two Population Means using UNPOOLED Variance: # Arby s and McDonald s chicken sandwiches Example > ttest(arby,mcd,conflevel=090,varequal=false,alternative="twosided") Two Sample t-test data: Arby and McD t = -86834, df = 437, p-value = 355646e-09 alternative hypothesis: true difference in means is not equal to 0 90 percent confidence interval: -9884-80859 sample estimates: mean of x mean of y 60 70 # Comparing Two Population Means using POOLED Variance: # Arby s and McDonald s chicken sandwiches Example > ttest(arby,mcd,conflevel=090,varequal=true,alternative="twosided") Two Sample t-test data: Arby and McD t = -864354, df = 5, p-value = 786e-09 alternative hypothesis: true difference in means is not equal to 0 90 percent confidence interval: -97605-803795 sample estimates: mean of x mean of y 60 70 # Comparing Two Population Means using Matched Pairs t-test: 9

# Herbal Medicine Example # Here s the data given as a table: > nomed=c(,4,34,37,5,5,5,3,55,58,4,48,9,45) > med=c(9,33,35,44,5,5,5,3,6,65,44,47,3,47) > rbind(nomed,med,nomed-med) [,] [,] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,0] [,] [,] [,3] [,4] nomed 0 4 34 37 5 5 5 3 55 58 4 48 9 45 med 9 33 35 44 50 50 5 3 60 65 44 47 3 47-9 -9-0 -07 0 0 00 00-05 -07-0 0-0 -0 > ttest(nomed,med,conflevel=095,paired=true,alternative="less") Paired t-test data: nomed and med t = -4094, df = 3, p-value = 00576 alternative hypothesis: true difference in means is less than 0 95 percent confidence interval: -Inf -054539 sample estimates: mean of the differences -0435743 # Comparing Two Population Proportions: # Seat Belt Example > proptest(c(7*00,84*380),c(00,380),conflevel=95,alternative="twosided", correct=f) -sample test for equality of proportions without continuity correction data: c(07 * 00, 084 * 380) out of c(00, 380) X-squared = 96686, df =, p-value < e-6 alternative hypothesis: twosided 95 percent confidence interval: -043856-0096474 sample estimates: prop prop 07 084 Exercises Two independent means on p63: 55, 56 (answers: Yes because n is large; test statistic is -578, p-value is tiny (< 0 6 ), so the evidence suggests that the age difference is not due to chance), 53-535 odd, 538 (answers: T (if first sample is symmetric and the second is skewed) TF) Paired means on p59: 55, 56 (answers: TTTF), 57, 58 (answers: YYN), 59-53 odd, 54 (answer: F), 57, 59 Two independent proportions on p37: 63-633 odd, 637 0