Lecture 15: Inference Based on Two Samples


Lecture 15: Inference Based on Two Samples MSU-STT 351-Sum17B (P. Vellaisamy: STT 351-Sum17B) Probability & Statistics for Engineers 1 / 26

9.1 Z-tests and CIs for $\mu_1 - \mu_2$

The assumptions:
(i) $X = \{X_1, \dots, X_m\}$ is a random sample from $N(\mu_1, \sigma_1^2)$.
(ii) $Y = \{Y_1, \dots, Y_n\}$ is a random sample from $N(\mu_2, \sigma_2^2)$.
(iii) The samples $X$ and $Y$ are independent.

Case I: $\sigma_1^2$ and $\sigma_2^2$ are known. Note that

$$E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2, \qquad V(\bar{X} - \bar{Y}) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \sigma_{\bar{X}-\bar{Y}}^2.$$

Hence

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \sim N(0, 1),$$

and $Z$ is used to test hypotheses concerning $\mu_1 - \mu_2$.

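Case I: $\sigma_1^2$ and $\sigma_2^2$ known. Suppose $H_0: \mu_1 - \mu_2 = \Delta_0$. Then the test statistic is

$$Z = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$

and the test can be carried out in the usual way.

When $\mu_1 - \mu_2 = \Delta_1$, the probability of a type II error for the upper-tailed alternative $H_1: \mu_1 - \mu_2 > \Delta_0$ is

$$\beta(\Delta_1) = \Phi\left(z_\alpha - \frac{\Delta_1 - \Delta_0}{\sigma_{\bar{X}-\bar{Y}}}\right), \qquad \sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.$$

Also, the sample sizes $m$ and $n$ that satisfy specified $\alpha$ and $\beta$ (when $\mu_1 - \mu_2 = \Delta_1$) are given by

$$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta_1 - \Delta_0)^2}{(z_\alpha + z_\beta)^2}.$$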

Example 1 (Ex 6): An experiment to compare the tension bond strength of polymer latex modified mortar to that of unmodified mortar resulted in $\bar{x} = 18.12$ kgf/cm² for the modified mortar ($m = 40$) and $\bar{y} = 16.87$ kgf/cm² for the unmodified mortar ($n = 32$). Let $\mu_1$ and $\mu_2$ be the true average tension bond strengths for the modified and unmodified mortars, respectively. Assume that the bond strength distributions are both normal.

(a) Assuming that $\sigma_1 = 1.6$ and $\sigma_2 = 1.4$, test $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 > 0$ at level $\alpha = 0.01$.
(b) Compute the probability of a type II error for the test of Part (a) when $\mu_1 - \mu_2 = 1$.
(c) Suppose the investigator decided to use a level $\alpha = 0.05$ test and wished $\beta = 0.10$ when $\mu_1 - \mu_2 = 1$. If $m = 40$, what value of $n$ is necessary?
(d) How would the analysis and conclusion of Part (a) change if $\sigma_1$ and $\sigma_2$ were unknown but $s_1 = 1.6$ and $s_2 = 1.4$?

Solution:

(a) $H_0$ should be rejected if $z \ge z_{0.01} = 2.33$. Here

$$z = \frac{18.12 - 16.87}{\sqrt{\dfrac{2.56}{40} + \dfrac{1.96}{32}}} = \frac{1.25}{0.3539} = 3.53 \ge 2.33.$$

Hence, $H_0$ should be rejected at level 0.01.

(b) $$\beta(1) = \Phi\left(2.33 - \frac{1 - 0}{0.3539}\right) = \Phi(-0.50) = 0.3085.$$

(c) $$\frac{2.56}{40} + \frac{1.96}{n} = \frac{1}{(1.645 + 1.28)^2} = 0.1169 \;\Rightarrow\; \frac{1.96}{n} = 0.0529 \;\Rightarrow\; n = 37.06.$$

So use $n = 38$.

(d) Since $n = 32$ is a small sample, a small-sample t-procedure should be used and the appropriate conclusion would follow. Note, however, that the test statistic value 3.53 would not change, and thus we would still reject $H_0$ at the 0.01 significance level.
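The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the lecture. It uses unrounded normal quantiles, so the results differ slightly from the hand calculation (for instance, the required $n$ comes out near 37.1 rather than 37.06), but the conclusions are the same.

```python
from math import sqrt, ceil
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Summary data from Example 1 (sigmas treated as known)
xbar, ybar = 18.12, 16.87
sigma1, sigma2 = 1.6, 1.4
m, n = 40, 32

# (a) upper-tailed z-test of H0: mu1 - mu2 = 0 at alpha = 0.01
se = sqrt(sigma1**2 / m + sigma2**2 / n)
z = (xbar - ybar - 0) / se
z_crit = std_normal.inv_cdf(1 - 0.01)
reject = z >= z_crit

# (b) type II error probability when mu1 - mu2 = 1
beta = std_normal.cdf(z_crit - (1 - 0) / se)

# (c) n needed for alpha = 0.05 and beta = 0.10 when m = 40
z_alpha = std_normal.inv_cdf(0.95)
z_beta = std_normal.inv_cdf(0.90)
target_var = (1 - 0) ** 2 / (z_alpha + z_beta) ** 2
n_exact = sigma2**2 / (target_var - sigma1**2 / m)
n_required = ceil(n_exact) + 1 if ceil(n_exact) == 37 else ceil(n_exact)

print(round(z, 2), reject, round(beta, 3), round(n_exact, 2), ceil(n_exact))
```

Rounding $n$ up rather than to the nearest integer guarantees that the achieved $\beta$ does not exceed the target.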

Case II: Large-sample z-tests (unknown variances $\sigma_1^2$ and $\sigma_2^2$; any population)

Assume:
(i) $X_1, \dots, X_m$ is a random sample from any population with mean $\mu_1$ and variance $\sigma_1^2$.
(ii) $Y_1, \dots, Y_n$ is a random sample from any population with mean $\mu_2$ and variance $\sigma_2^2$.
(iii) The samples $X = (X_1, \dots, X_m)$ and $Y = (Y_1, \dots, Y_n)$ are independent.

Our interest is in $\mu_1 - \mu_2$, where both $\sigma_1^2$ and $\sigma_2^2$ are unknown. Assume also that both $m$ and $n$ are large.

Test statistic

The Z-test statistic is

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}} \approx N(0, 1)$$

for large $m$ and $n$. With $\mu_1 - \mu_2$ set to its value under $H_0$, this statistic can be used for testing hypotheses about $\mu_1 - \mu_2$. Also, the $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$

The P-Value Approach

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$.

Alternative hypothesis: $H_1: \mu_1 - \mu_2 \ne \Delta_0$; or $H_1: \mu_1 - \mu_2 < \Delta_0$; or $H_1: \mu_1 - \mu_2 > \Delta_0$.

Test statistic value:

$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}.$$

The p-value is $2P(Z > |z|)$, or $P(Z < z)$, or $P(Z > z)$, respectively, for the alternative hypotheses given above.
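The three p-value rules can be written as one small helper. This is an illustrative sketch only: the function name `z_p_value` and the labels `"two-sided"`, `"less"`, and `"greater"` are assumptions chosen here, not notation from the lecture.

```python
from statistics import NormalDist


def z_p_value(z, alternative):
    """Return the p-value of an observed z statistic.

    alternative: "two-sided" (H1: != Delta0), "less" (H1: < Delta0),
    or "greater" (H1: > Delta0). These labels are hypothetical.
    """
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))
    if alternative == "less":
        return phi(z)
    if alternative == "greater":
        return 1 - phi(z)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


# Example: the upper-tailed test of Example 1 had z = 3.53
p_upper = z_p_value(3.53, "greater")
print(p_upper < 0.01)
```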

Example 2 (Ex 8): Tensile strength tests were carried out on two different grades of wire rod, resulting in the following data.

Grade       Sample size   Sample mean (kg/mm²)   Sample SD
AISI 1064   m = 129       107.6                  1.3
AISI 1078   n = 129       123.6                  2.0

(a) Does the data suggest that the true average strength for the 1078 grade exceeds that for the 1064 grade by more than 10 kg/mm²? Test the appropriate hypotheses using the p-value approach.
(b) Estimate the difference between true average strengths for the two grades in a way that provides information about precision and reliability.

Solution: (a)

1. Parameter of interest: $\mu_1 - \mu_2$ = the true difference of mean tensile strength between the 1064 grade and the 1078 grade wire rod, where $\mu_1$ = 1064 grade average and $\mu_2$ = 1078 grade average.
2. Test $H_0: \mu_1 - \mu_2 = -10$ vs $H_1: \mu_1 - \mu_2 < -10$.
3. The test statistic is

$$Z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}} = \frac{\bar{x} - \bar{y} - (-10)}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}.$$

4. Reject $H_0$ if p-value $< \alpha = 0.05$.

5. The observed value of $Z$ is

$$z = \frac{(107.6 - 123.6) - (-10)}{\sqrt{\dfrac{1.3^2}{129} + \dfrac{2.0^2}{129}}} = \frac{-6}{0.210} = -28.57.$$

6. For a lower-tailed test, the p-value is $\Phi(-28.57) \approx 0 < \alpha$ (for any reasonable value of $\alpha$), so reject $H_0$. The data suggests that the mean tensile strength of the 1078 grade exceeds that of the 1064 grade by more than 10 kg/mm².

(b) The requested information can be provided by a 95% confidence interval for $\mu_1 - \mu_2$:

$$(\bar{x} - \bar{y}) \pm 1.96 \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}} = (-16) \pm 1.96(0.210) = (-16.412, -15.588).$$
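As a check on Example 2, the following sketch recomputes the test statistic, the lower-tailed p-value, and the 95% interval from the summary data. It uses only the standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 2
m, xbar, s1 = 129, 107.6, 1.3   # AISI 1064
n, ybar, s2 = 129, 123.6, 2.0   # AISI 1078
delta0 = -10.0                  # null value of mu1 - mu2

# Estimated standard error of xbar - ybar
se = sqrt(s1**2 / m + s2**2 / n)

# Lower-tailed large-sample z-test of H0: mu1 - mu2 = -10
z = (xbar - ybar - delta0) / se
p_value = NormalDist().cdf(z)

# 95% confidence interval for mu1 - mu2, using z_{0.025} = 1.96
half_width = 1.96 * se
ci = (xbar - ybar - half_width, xbar - ybar + half_width)

print(round(se, 3), round(z, 2), p_value, [round(c, 3) for c in ci])
```

The p-value underflows to zero in floating point, which matches the statement that it is essentially 0.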

Example 3 (Ex 12): The accompanying table gives summary data on cube compressive strength (N/mm²) for concrete specimens made with a pulverized fuel-ash mix.

Age (days)   Sample size   Sample mean   Sample SD
7            68            26.99         4.89
28           74            35.76         6.43

Calculate and interpret a 99% CI for the difference between true average 7-day strength and true average 28-day strength.

Solution: The normal confidence interval is (note $z_{0.005} = 2.58$)

$$\bar{x} - \bar{y} \pm 2.58 \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}} = (-8.77) \pm 2.58\sqrt{0.9104} = -8.77 \pm 2.46 = (-11.23, -6.31).$$

With 99% confidence, we may say that the true difference between the average 7-day and 28-day strengths is between $-11.23$ and $-6.31$ N/mm². That is, the true average 28-day strength exceeds the 7-day strength by between 6.31 and 11.23 N/mm².

Example 4: Use the accompanying data to estimate, with a 95% confidence interval, the difference between true average compressive strength (N/mm²) for 7-day-old concrete specimens and true average strength for 28-day-old specimens.

7-day old: $n_1 = 68$, $\bar{x}_1 = 26.99$, $s_1 = 4.89$
28-day old: $n_2 = 74$, $\bar{x}_2 = 35.76$, $s_2 = 6.43$

Solution: A 95% confidence interval for the difference between the true average compressive strength for 7-day-old concrete specimens and the true average strength for 28-day-old concrete specimens is

$$\bar{x}_1 - \bar{x}_2 \pm 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (26.99 - 35.76) \pm 1.96 \sqrt{\frac{(4.89)^2}{68} + \frac{(6.43)^2}{74}} = -8.77 \pm 1.87 = (-10.64, -6.90).$$
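The 99% and 95% intervals for the concrete data use the same formula with different critical values, which a small helper makes explicit. This is a sketch under the large-sample assumptions above; `large_sample_ci` is a hypothetical name, and it uses exact normal quantiles instead of the rounded 2.58 and 1.96.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar, s1, m, ybar, s2, n, conf_level):
    """Large-sample z interval for mu1 - mu2 from summary statistics."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(s1**2 / m + s2**2 / n)
    diff = xbar - ybar
    return diff - z_crit * se, diff + z_crit * se


# 7-day versus 28-day compressive strength (N/mm^2)
ci99 = large_sample_ci(26.99, 4.89, 68, 35.76, 6.43, 74, 0.99)
ci95 = large_sample_ci(26.99, 4.89, 68, 35.76, 6.43, 74, 0.95)

print([round(c, 2) for c in ci99])
print([round(c, 2) for c in ci95])
```

Raising the confidence level from 95% to 99% widens the interval around the same point estimate of $-8.77$.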

9.2 Two-Sample t-test and Confidence Intervals (Small-Sample Situation)

Assumptions:
(i) The two samples are independent.
(ii) Both samples are simple random samples from normal populations.
(iii) The variances are unknown and unequal.

The test statistic is

$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}} \approx t_\nu,$$

a t-distribution with df $\nu$, which is estimated from the data as

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}}.$$

The two-sample t-test:

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$.

Alternative hypotheses: $H_1: \mu_1 - \mu_2 \ne \Delta_0$; $H_1: \mu_1 - \mu_2 < \Delta_0$; or $H_1: \mu_1 - \mu_2 > \Delta_0$.

The test statistic is

$$T_\nu = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}.$$

The p-value is $2P(T_\nu > |t|)$; $P(T_\nu < t)$; or $P(T_\nu > t)$, respectively, for the alternatives $H_1$ defined above.

The $100(1 - \alpha)\%$ CI based on the two-sample t-test is

$$\bar{x} - \bar{y} \pm t_{\nu, \alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$

Note:
(i) The two-sample T-statistic does not have an exact t-distribution; its exact distribution involves $\sigma_1$ and $\sigma_2$. The approximation used here is quite accurate when both sample sizes are at least 5, and it is used in most statistical software.
(ii) Sometimes the t-distribution with $\nu = \min\{m - 1, n - 1\}$ is used instead, for simplicity.

Example 4 (Ex 18): Let $\mu_1$ and $\mu_2$ denote true average densities for two different types of brick. Assuming normality of the two density distributions, test $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$ using the following data: $m = 6$, $\bar{x} = 22.73$, $s_1 = 0.164$; $n = 5$, $\bar{y} = 21.95$, and $s_2 = 0.240$.

Solution: With $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \ne 0$, we will reject $H_0$ if p-value $< \alpha$. Now

$$\nu = \frac{\left(\dfrac{(0.164)^2}{6} + \dfrac{(0.240)^2}{5}\right)^2}{\dfrac{\left((0.164)^2/6\right)^2}{5} + \dfrac{\left((0.240)^2/5\right)^2}{4}} = 6.8 \approx 6$$

(rounding down). The test statistic value is

$$t = \frac{22.73 - 21.95}{\sqrt{\dfrac{(0.164)^2}{6} + \dfrac{(0.240)^2}{5}}} = \frac{0.78}{0.1265} = 6.17,$$

which leads to a p-value of $2P(T_6 > 6.17) \approx 2(0.0005) = 0.001 < \alpha$. We reject $H_0$ and conclude that there is a difference in the densities of the two brick types.
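The degrees-of-freedom formula is easy to get wrong by hand, so a short check is useful. The sketch below computes the two-sample t statistic and the estimated $\nu$ for the brick data; `welch_t_and_df` is a hypothetical helper name, and the p-value itself is left to a t table or statistical software.

```python
from math import sqrt, floor


def welch_t_and_df(xbar, s1, m, ybar, s2, n, delta0=0.0):
    """Two-sample t statistic and estimated df for unequal variances."""
    v1 = s1**2 / m  # estimated variance of xbar
    v2 = s2**2 / n  # estimated variance of ybar
    t = (xbar - ybar - delta0) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))
    return t, nu


# Brick density data: m = 6, n = 5
t_stat, nu = welch_t_and_df(22.73, 0.164, 6, 21.95, 0.240, 5)
df = floor(nu)  # round down, as in the solution above

print(round(t_stat, 2), round(nu, 2), df)
```

Rounding $\nu$ down is the conservative choice, since fewer degrees of freedom give a heavier-tailed reference distribution.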

Example 5 (Ex 32): An article gave the following summary data on proportional stress limits (MPa) for specimens constructed using two different types of wood:

Type of wood   Sample size   Sample mean   Sample SD
Red oak        14            8.48          0.79
Douglas fir    10            6.65          1.28

Assuming that both samples were selected from normal distributions, carry out a test of hypotheses to decide whether the true average proportional stress limit for red oak joints exceeds that for Douglas fir joints by more than 1 MPa.

Solution: Let $\mu_1$ = the true average proportional stress limit for red oak and let $\mu_2$ = the true average proportional stress limit for Douglas fir. We test $H_0: \mu_1 - \mu_2 = 1$ vs $H_1: \mu_1 - \mu_2 > 1$. The test statistic's value is

$$t = \frac{(8.48 - 6.65) - 1}{\sqrt{\dfrac{0.79^2}{14} + \dfrac{1.28^2}{10}}} = \frac{0.83}{\sqrt{0.2084}} = 1.818,$$

and it has degrees of freedom

$$\nu = \frac{(0.2084)^2}{\dfrac{(0.79^2/14)^2}{13} + \dfrac{(1.28^2/10)^2}{9}} = 13.85 \approx 13.$$

The p-value is approximately $P(T_{13} > 1.8) = 0.048$. We would reject $H_0$ at significance levels greater than 0.048 (e.g., the standard 5% significance level). At $\alpha = 0.05$, the data suggests that the true average proportional stress limit for red oak exceeds that of Douglas fir by more than 1 MPa.
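The tabled tail probability $P(T_{13} > 1.8)$ can be approximated without a t table by integrating the t density numerically. The sketch below is one way to do this with the standard library; the helper names `t_pdf` and `t_upper_tail` are hypothetical, and Simpson's rule is a choice made here rather than a method from the lecture.

```python
from math import gamma, sqrt, pi, floor


def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(t, nu, steps=2000):
    """P(T_nu > t) for t >= 0, via Simpson's rule on [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    # By symmetry P(T > 0) = 0.5; subtract the area between 0 and t
    return 0.5 - total * h / 3


# Example 5 summary data: red oak (m = 14), Douglas fir (n = 10)
v1, v2 = 0.79**2 / 14, 1.28**2 / 10
t_stat = (8.48 - 6.65 - 1) / sqrt(v1 + v2)
nu = (v1 + v2) ** 2 / (v1**2 / 13 + v2**2 / 9)
df = floor(nu)

p_table = t_upper_tail(1.8, df)  # t rounded to 1.8, as in a t table
print(round(t_stat, 3), round(nu, 2), round(p_table, 3))
```

Because the p-value sits so close to 0.05, the decision here is sensitive to how $t$ and $\nu$ are rounded.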

Pooled t-test (unknown but equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$)

When the normal population variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$), the pooled estimator of the common $\sigma^2$ is

$$S_p^2 = \frac{m - 1}{m + n - 2} S_1^2 + \frac{n - 1}{m + n - 2} S_2^2,$$

and the pooled t-statistic is

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{m} + \dfrac{1}{n}}},$$

which follows a t distribution with $\nu = m + n - 2$ df. For testing $H_0: \mu_1 = \mu_2$, set $\mu_1 - \mu_2 = 0$.

Example 6 (Ex 34): Consider the pooled T-variable

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \sim t_{m+n-2}$$

when both population distributions are normal with $\sigma_1 = \sigma_2$.

(a) Using the T variable, obtain a pooled t confidence interval for $\mu_1 - \mu_2$.
(b) A sample of maximum output of moisture (oz) in a controlled chamber for an ultrasonic humidifier (Brand 1) gave the values 14.0, 14.3, 12.2, and 15.1. A sample of the second brand (Brand 2) gave output values 12.1, 13.6, 11.9, and 11.2. Use the pooled t formula from Part (a) to estimate the difference between true average outputs for the two brands with a 95% confidence interval.
(c) Estimate the difference between the two $\mu$'s using the two-sample t interval, and compare it to the interval of Part (b).

Solution:

(a) Following the usual format for most confidence intervals, a pooled-variance confidence interval for the difference between two means is

$$(\bar{x} - \bar{y}) \pm t_{\alpha/2,\, m+n-2} \cdot s_p \sqrt{\frac{1}{m} + \frac{1}{n}}.$$

(b) The sample means and standard deviations of the two samples are $\bar{x} = 13.90$, $s_1 = 1.225$, $\bar{y} = 12.20$, $s_2 = 1.010$. The pooled variance estimate is

$$s_p^2 = \left(\frac{m - 1}{m + n - 2}\right) s_1^2 + \left(\frac{n - 1}{m + n - 2}\right) s_2^2 = \left(\frac{4 - 1}{4 + 4 - 2}\right)(1.225)^2 + \left(\frac{4 - 1}{4 + 4 - 2}\right)(1.010)^2 = 1.260.$$

Hence, $s_p = 1.1227$. With df $= m + n - 2 = 6$, $t_{0.025,6} = 2.447$. Therefore, the desired interval is

$$(13.90 - 12.20) \pm (2.447)(1.1227)\sqrt{\frac{1}{4} + \frac{1}{4}} = 1.7 \pm 1.943 = (-0.24, 3.64).$$

This interval contains 0, so it does not support the conclusion that the two population means are different.

(c) Using the two-sample t interval discussed earlier, we find the CI as follows. First, we need to calculate the degrees of freedom:

$$\nu = \frac{\left(\dfrac{1.225^2}{4} + \dfrac{1.01^2}{4}\right)^2}{\dfrac{\left(1.225^2/4\right)^2}{3} + \dfrac{\left(1.01^2/4\right)^2}{3}} = \frac{0.3971}{0.0686} = 5.78 \approx 5.$$

So $t_{0.025,5} = 2.571$. Then the interval is

$$(13.9 - 12.2) \pm 2.571 \sqrt{\frac{1.225^2}{4} + \frac{1.01^2}{4}} = 1.7 \pm 2.571(0.7938) = (-0.34, 3.74).$$

This interval is slightly wider, but it still supports the same conclusion.
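Both intervals in Example 6 can be recomputed from the raw humidifier data. The sketch below uses the standard library for the sample statistics and takes the critical values 2.447 and 2.571 from the solution text as given constants; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

brand1 = [14.0, 14.3, 12.2, 15.1]
brand2 = [12.1, 13.6, 11.9, 11.2]
m, n = len(brand1), len(brand2)
xbar, ybar = mean(brand1), mean(brand2)
s1, s2 = stdev(brand1), stdev(brand2)  # sample SDs (divisor n - 1)
diff = xbar - ybar

# (b) pooled t interval, df = m + n - 2 = 6, t_{.025,6} = 2.447 (from the text)
sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
pooled_hw = 2.447 * sqrt(sp2) * sqrt(1 / m + 1 / n)
pooled_ci = (diff - pooled_hw, diff + pooled_hw)

# (c) two-sample t interval with estimated df, t_{.025,5} = 2.571 (from the text)
v1, v2 = s1**2 / m, s2**2 / n
nu = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))
welch_hw = 2.571 * sqrt(v1 + v2)
welch_ci = (diff - welch_hw, diff + welch_hw)

print([round(c, 2) for c in pooled_ci], round(nu, 2),
      [round(c, 2) for c in welch_ci])
```

With equal sample sizes the two standard errors coincide, so the unpooled interval is wider here only because its critical value uses fewer degrees of freedom.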

Homework
Sec 9.1: 5, 7, 11
Sec 9.2: 19, 22, 30