Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Lecture 9 Two-Sample Test Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Computer exam 1: histogram of scores (frequency per score bin). Mean 89.90, standard deviation 6.02, median 90.

Midterm 2 covers: confidence intervals (one-sided and two-sided) and hypothesis testing (two approaches: fixed significance level and p-value). You can bring a one-page, one-sided cheat sheet. Make-up lecture on Friday, Nov. 8: tentatively noon to 1:20 pm in the area in front of my office, Groseclose #339.

Outline: (1) Test difference in the mean, with known variance and with unknown variance. (2) Test difference in sample proportion. (3) Test difference in variance.

Motivating example: safety of drinking water (Arizona Republic, May 27, 2001). Water was sampled from 10 communities in Phoenix and from 10 communities in rural Arizona. Arsenic concentration (AC) determines water quality and ranges from 3 ppb to 48 ppb. Is there a difference in AC between these two areas, and is that difference large enough to be significant?

Formulating the question statistically: let $\mu_1$ be the mean AC level in Phoenix and $\mu_2$ the mean AC level in rural Arizona. Asking whether the mean AC level differs between the two areas is equivalent to testing whether $\mu_1 - \mu_2$ is different from 0.

In general: comparing two populations. Comparing two population means is a common way to show that one population is different from, or better than, another. Examples: competing companies or products; treatment vs. no treatment; new method vs. old method.

Test difference in the mean

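Test difference in mean, variance known. Solve the following hypothesis test: $H_0: \mu_1 - \mu_2 = \Delta$ versus $H_1: \mu_1 - \mu_2 \neq \Delta$. Assumptions for two-sample inference: the two samples are independent random samples from normal populations with known variances $\sigma_1^2$ and $\sigma_2^2$.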

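Test statistic. A reasonable estimator for $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$. Under $H_0$ its mean is $\Delta$, and its variance is $\sigma_1^2/n_1 + \sigma_2^2/n_2$. Detection statistic: $$Z = \frac{\bar{X}_1 - \bar{X}_2 - \Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$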

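Detection for two-sample difference. For a given significance level, reject $H_0$ when $|Z| > b$, where $$Z = \frac{\bar{X}_1 - \bar{X}_2 - \Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$ and the threshold $b$ is chosen for that significance level ($b = z_{\alpha/2}$ for the two-sided alternative).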
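The detection rule above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `two_sample_z` and the input numbers are hypothetical and do not come from the slides, and the threshold 1.96 is the usual $z_{0.025}$ for a two-sided test at $\alpha = 0.05$.

```python
import math

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta=0.0):
    """Z statistic for H0: mu1 - mu2 = delta when both standard deviations are known."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (xbar1 - xbar2 - delta) / standard_error

# Hypothetical inputs, chosen only for illustration (not data from the lecture).
z = two_sample_z(xbar1=10.0, xbar2=9.0, sigma1=2.0, sigma2=2.0, n1=16, n2=16)

b = 1.96  # z_{alpha/2} for alpha = 0.05, two-sided alternative
reject_h0 = abs(z) > b
print(f"z = {z:.4f}, reject H0: {reject_h0}")
```

For a one-sided alternative the comparison would be `z > z_alpha` (or `z < -z_alpha`) instead of the absolute-value check.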

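p-value: the probability, under $H_0$, of observing a sample difference at least as extreme as the one obtained. For an upper-tailed test with observed value $z_0$: $P(Z > z_0) = 1 - \Phi(z_0)$. For a two-sided test: $P(|Z| > |z_0|) = 2[1 - \Phi(|z_0|)]$.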

Example: paint drying time

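Solution: test the difference in mean drying time. The general form is $H_0: \mu_1 - \mu_2 = \Delta$ versus $H_1: \mu_1 - \mu_2 > \Delta$. Here $\Delta = 0$, so the test is $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$. Form the test statistic $$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

Fixed significance level approach. With $\alpha = 0.05$, $z_{0.05} = 1.65$. Reject $H_0$ when $$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} > z_\alpha$$ Calculate: the observed value $(\bar{x}_1 - \bar{x}_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ exceeds 1.65, so reject $H_0$.

Calculate p-value. The value of the statistic from the data is $z_0 = 2.52$, so the p-value is $P(Z > z_0) = 1 - \Phi(z_0) = 1 - \Phi(2.52) = 0.0059$. Reject $H_0$, since the p-value is less than 0.01.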
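As a quick check of the upper-tail probability for the observed statistic 2.52, here is a minimal sketch using only the Python standard library. The helper name `upper_tail_p` is an assumption made for illustration; it relies on the identity $1 - \Phi(z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$.

```python
import math

def upper_tail_p(z0):
    """P(Z > z0) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z0 / math.sqrt(2.0))

p_value = upper_tail_p(2.52)  # 2.52 is the observed statistic from the slide
print(f"p-value = {p_value:.4f}")
```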

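Case 2: test difference in mean, variance unknown, true variances equal. Solve the following hypothesis test: $H_0: \mu_1 - \mu_2 = \Delta$ versus $H_1: \mu_1 - \mu_2 \neq \Delta$. The variances are equal but unknown, so we pool the samples to estimate the common variance $\sigma^2$: $$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$ where $S_1^2$ and $S_2^2$ are the sample variances, and $$\frac{(n_1 + n_2 - 2)\,S_p^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$$

Use the following as the test statistic: $$T = \frac{\bar{X}_1 - \bar{X}_2 - \Delta}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$$ For the hypothesis test $H_0: \mu_1 - \mu_2 = \Delta$ versus $H_1: \mu_1 - \mu_2 \neq \Delta$, reject $H_0$ when $$\left|\frac{\bar{X}_1 - \bar{X}_2 - \Delta}{S_p\sqrt{1/n_1 + 1/n_2}}\right| > t_{\alpha/2,\, n_1 + n_2 - 2}$$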

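Example: $\alpha = 0.05$; $n_1 = 10$, $\bar{x}_1 = 28$, $S_1^2 = 4$; $n_2 = 10$, $\bar{x}_2 = 26$, $S_2^2 = 5$. Assume the true variances are equal. Test statistic: $$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{1/n_1 + 1/n_2}}, \qquad S_p^2 = \frac{S_1^2(n_1 - 1) + S_2^2(n_2 - 1)}{n_1 + n_2 - 2} = \frac{4(9) + 5(9)}{18} = 4.5$$

With $S_p^2 = 4.5$, recall that the degrees of freedom here are $n_1 + n_2 - 2 = 18$. Threshold: $t_{0.025,18} = 2.101$. $$t = \frac{28 - 26}{\sqrt{4.5}\,\sqrt{1/10 + 1/10}} = 2.11 > 2.101$$ Weakly reject $H_0$. Calculate the p-value: $P(|T| > 2.11) = 2P(T > 2.11) \approx 2 \times 0.0245 = 0.049$, which is just below $\alpha = 0.05$, consistent with the weak rejection.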
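The arithmetic of this example can be reproduced with a short standard-library sketch. The function name `pooled_t` is hypothetical; the inputs and the critical value 2.101 are the ones quoted on the slide (the critical value is hard-coded here rather than computed, since the standard library has no t distribution).

```python
import math

def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = delta (equal variances assumed)."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled variance estimate
    t = (xbar1 - xbar2 - delta) / (math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2))
    return sp_sq, t, df

sp_sq, t_stat, df = pooled_t(28.0, 26.0, 4.0, 5.0, 10, 10)

T_CRIT = 2.101  # t_{0.025, 18}, taken from the slide
reject_h0 = abs(t_stat) > T_CRIT
print(f"S_p^2 = {sp_sq}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic lands only slightly above the threshold, which is why the slide calls this a weak rejection.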

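Formulation (test difference in sample proportion). Two binomial parameters are of interest. Two independent random samples are taken from two populations: $X \sim \mathrm{Bin}(n_1, p_1)$, $Y \sim \mathrm{Bin}(n_2, p_2)$. Estimates of the sample proportions: $\hat{p}_1 = X/n_1$, $\hat{p}_2 = Y/n_2$. Hypotheses: $H_0: p_1 = p_2$ versus $H_1: p_1 \neq p_2$.

Test statistic: $$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$$ Since $p_1$ and $p_2$ are unknown, use the pooled estimate $$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$ where $X_1 = X$ and $X_2 = Y$ are the two success counts, and estimate the test statistic by $$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Two-sided test. In general, $$Z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$$ For the two-sided test $H_0: p_1 = p_2$ versus $H_1: p_1 \neq p_2$, reject $H_0$ when $$\left|\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\right| > z_{\alpha/2}$$

Test statistic and one-sided tests. For $H_0: p_1 = p_2$ versus $H_1: p_1 < p_2$, reject $H_0$ when $$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} < -z_\alpha$$ For $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$, reject $H_0$ when $$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} > z_\alpha$$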

Comparing two population proportions: example. A new drug is compared to a standard drug in a clinical trial with 200 patients (100 patients in each group). For the new drug, 83 of 100 patients improved. For the standard, 72 of 100 improved. Is the new drug statistically superior? Standard drug: $X \sim \mathrm{Bin}(100, p_1)$. New drug: $Y \sim \mathrm{Bin}(100, p_2)$.

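Fixed significance level approach. $H_0: p_1 = p_2$ versus $H_1: p_1 < p_2$. Data: $X_1 = 72$, $X_2 = 83$, $n_1 = n_2 = 100$, so $\hat{p}_1 = 0.72$, $\hat{p}_2 = 0.83$, and the pooled estimate is $\hat{p} = 155/200 = 0.775$. With $z_{0.05} = 1.65$: $$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.72 - 0.83}{\sqrt{0.775(0.225)(0.02)}} = -1.86 < -1.65$$ Reject $H_0$.

p-value: $P(Z < -1.86) \approx 0.031$. This is less than $\alpha = 0.05$, so reject $H_0$, with p-value 0.031.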
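The drug example can be checked with a small sketch of the pooled two-proportion statistic defined earlier. The function names are hypothetical, only the standard library is used, and the lower-tail p-value comes from $\Phi(z) = \tfrac{1}{2}\,\mathrm{erfc}(-z/\sqrt{2})$.

```python
import math

def pooled_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2 using the pooled estimate (x1 + x2) / (n1 + n2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Standard drug: 72 of 100 improved; new drug: 83 of 100 improved.
z_stat = pooled_proportion_z(72, 100, 83, 100)
p_value = normal_cdf(z_stat)   # lower tail, for H1: p1 < p2
reject_h0 = z_stat < -1.65     # z_{0.05} = 1.65, as on the slide
print(f"z = {z_stat:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The key design point is that the standard error uses the pooled proportion rather than either group's own proportion, because under $H_0$ the two groups share one success probability.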

Test difference in variance. Two independent normal populations; the means and variances of the two normals are unknown. Test whether or not the two variances are the same: $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$.

Test based on sample variance ratio. Test statistic: the ratio of the two sample variances, $F = S_1^2 / S_2^2$. We need to introduce the F distribution. Let $W$ and $Y$ be independent chi-square random variables with $u$ and $v$ degrees of freedom, respectively. Then the ratio $$F = \frac{W/u}{Y/v} \qquad (10\text{-}28)$$ is said to follow the F distribution with $u$ degrees of freedom in the numerator and $v$ degrees of freedom in the denominator. It is usually abbreviated as $F_{u,v}$.

F distribution. A continuous distribution; its mean is $v/(v - 2)$ for $v > 2$. We should reject $H_0$ when the statistic is large (for a two-sided alternative, also when it is very small).

Sampling distribution. Under $H_0$ ($\sigma_1^2 = \sigma_2^2$), the detection statistic $$F = \frac{S_1^2}{S_2^2} = \frac{\left[(n_1 - 1)S_1^2/\sigma_1^2\right]/(n_1 - 1)}{\left[(n_2 - 1)S_2^2/\sigma_2^2\right]/(n_2 - 1)}$$ has an $F_{n_1 - 1,\, n_2 - 1}$ distribution, because the numerator and denominator are independent $\chi^2_{n_1 - 1}$ and $\chi^2_{n_2 - 1}$ random variables, each divided by its degrees of freedom.

Form of test. Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$. Test statistic: $$F_0 = \frac{S_1^2}{S_2^2} \qquad (10\text{-}31)$$ Alternative hypotheses and rejection criteria:
(a) $H_1: \sigma_1^2 \neq \sigma_2^2$: reject if $f_0 > f_{\alpha/2,\, n_1 - 1,\, n_2 - 1}$ or $f_0 < f_{1 - \alpha/2,\, n_1 - 1,\, n_2 - 1}$.
(b) $H_1: \sigma_1^2 > \sigma_2^2$: reject if $f_0 > f_{\alpha,\, n_1 - 1,\, n_2 - 1}$.
(c) $H_1: \sigma_1^2 < \sigma_2^2$: reject if $f_0 < f_{1 - \alpha,\, n_1 - 1,\, n_2 - 1}$.
Figure 10-6: The F distribution for the test of $H_0: \sigma_1^2 = \sigma_2^2$ with critical region values for (a) $H_1: \sigma_1^2 \neq \sigma_2^2$, (b) $H_1: \sigma_1^2 > \sigma_2^2$, and (c) $H_1: \sigma_1^2 < \sigma_2^2$.

Example: semiconductor etch variability. Variability in the oxide layer of a semiconductor is a critical characteristic of the semiconductor. For two kinds of semiconductors, the sample standard deviations are $s_1 = 1.96$ and $s_2 = 2.13$, with $n_1 = n_2 = 16$ and $\alpha = 0.05$. Test whether or not their variances are the same.

1. Parameter of interest: The parameters of interest are the variances of oxide thickness, $\sigma_1^2$ and $\sigma_2^2$. We will assume that oxide thickness is a normal random variable for both gas mixtures.
2. Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
3. Alternative hypothesis: $H_1: \sigma_1^2 \neq \sigma_2^2$
4. Test statistic: The test statistic is given by equation 10-31: $f_0 = s_1^2 / s_2^2$
6. Reject $H_0$ if: Because $n_1 = n_2 = 16$ and $\alpha = 0.05$, we will reject $H_0: \sigma_1^2 = \sigma_2^2$ if $f_0 > f_{0.025,15,15} = 2.86$ or if $f_0 < f_{0.975,15,15} = 1/f_{0.025,15,15} = 1/2.86 = 0.35$.

7. Computations: Because $s_1^2 = (1.96)^2 = 3.84$ and $s_2^2 = (2.13)^2 = 4.54$, the test statistic is $$f_0 = \frac{s_1^2}{s_2^2} = \frac{3.84}{4.54} = 0.85$$
8. Conclusions: Because $f_{0.975,15,15} = 0.35 < 0.85 < f_{0.025,15,15} = 2.86$, we cannot reject the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ at the 0.05 level of significance.
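The computation and decision in this example can be sketched as follows. The function name `variance_ratio` is hypothetical; the critical value 2.86 is the one quoted on the slide and is hard-coded, and the lower critical value is obtained from the reciprocal relation used on the slide.

```python
def variance_ratio(s1, s2):
    """F statistic: ratio of the two sample variances, given sample standard deviations."""
    return s1**2 / s2**2

f0 = variance_ratio(1.96, 2.13)

F_UPPER = 2.86           # f_{0.025,15,15}, taken from the slide
F_LOWER = 1.0 / F_UPPER  # f_{0.975,15,15} = 1 / f_{0.025,15,15}

reject_h0 = f0 > F_UPPER or f0 < F_LOWER
print(f"f0 = {f0:.2f}, acceptance region = ({F_LOWER:.2f}, {F_UPPER}), reject H0: {reject_h0}")
```

The reciprocal shortcut for the lower critical value works here because both samples have the same size; in general the two degrees of freedom swap places when taking the reciprocal.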

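p-value: the probability, under $H_0$, of observing a test statistic more extreme than the one we got. Alternative hypotheses and p-values:
$H_1: \sigma_1^2 \neq \sigma_2^2$: p-value $= 2P(F > f_0)$ or $2P(F < f_0)$, depending on whether $f_0$ falls in the upper or lower tail.
$H_1: \sigma_1^2 > \sigma_2^2$: p-value $= P(F > f_0)$.
$H_1: \sigma_1^2 < \sigma_2^2$: p-value $= P(F < f_0)$.
Calculate using the R command p <- pf(x,d1,d2), which returns the lower-tail probability $P(F \le x)$.

Back to the semiconductor example. The computed value of the test statistic in this example is $f_0 = 0.85$. Since $P(F_{15,15} \le 0.85) = 0.3785$, the p-value is $2(0.3785) = 0.7570$. This can be calculated using the R command pf.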
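For readers without R at hand, the lower-tail F probability can be approximated with the standard library alone. This is a rough numerical sketch, not a replacement for `pf`: it uses the identity $P(F_{u,v} \le x) = I_{ux/(ux+v)}(u/2, v/2)$ (the regularized incomplete beta function) and evaluates the integral with Simpson's rule. The function name `f_cdf` and the step count are assumptions, and the sketch assumes both degrees of freedom exceed 2 so that the integrand is finite at 0.

```python
import math

def f_cdf(x, d1, d2, steps=2000):
    """Approximate P(F <= x) for an F(d1, d2) variable; assumes d1 > 2 and d2 > 2."""
    a, b = d1 / 2.0, d2 / 2.0
    upper = d1 * x / (d1 * x + d2)  # map x to the incomplete-beta argument

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Composite Simpson's rule on [0, upper]; steps must be even.
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta

p_lower = f_cdf(0.85, 15, 15)
p_two_sided = 2 * min(p_lower, 1 - p_lower)
print(f"P(F <= 0.85) = {p_lower:.4f}, two-sided p-value = {p_two_sided:.4f}")
```

The result should land close to the 0.3785 and 0.7570 reported on the slide; small differences are expected because the slide rounds $f_0$ to 0.85 and this sketch integrates numerically.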