Comparing two independent samples


In many applications it is necessary to compare two competing methods (for example, to compare the treatment effects of a standard drug and an experimental drug). To compare two methods from a statistical point of view:

- Select a statistical model for each method.
- Collect a sample of data for each method.
- Estimate the parameters of the two models.
- Quantify the difference between the two models by contrasting the estimated parameters.

Standard techniques for hypothesis testing can be adapted to compare data for two samples.

Assume that we have two samples, A and B: $Y_1^A, \dots, Y_{n_A}^A$ i.i.d. $f_Y(\cdot \mid \theta_A)$ and $Y_1^B, \dots, Y_{n_B}^B$ i.i.d. $f_Y(\cdot \mid \theta_B)$. To compare these two samples, we can define $\delta_{A,B} = \theta_A - \theta_B$. Using estimators $\hat\theta_A$ and $\hat\theta_B$ of $\theta_A$ and $\theta_B$, we define $\hat\delta_{A,B} = \hat\theta_A - \hat\theta_B$. The variance of $\hat\delta_{A,B}$ is
$$\mathrm{Var}(\hat\delta_{A,B}) = \mathrm{Var}(\hat\theta_A) + \mathrm{Var}(\hat\theta_B) - 2\,\mathrm{Cov}(\hat\theta_A, \hat\theta_B).$$

If the two samples are independent of each other, then $\mathrm{Cov}(\hat\theta_A, \hat\theta_B) = 0$ and $\mathrm{Var}(\hat\delta_{A,B}) = \mathrm{Var}(\hat\theta_A) + \mathrm{Var}(\hat\theta_B)$. In this case, $\hat\theta_A$ and $\hat\theta_B$ can be the maximum likelihood estimators of $\theta_A$ and $\theta_B$, and the asymptotic normality of $\hat\theta_A$ and $\hat\theta_B$ implies the asymptotic normality of $\hat\delta_{A,B}$. If there is dependence across samples, the covariance term must be included: for example, when comparing the treatment effects of two drugs A and B, the same patient may be administered drug A for the first month and then drug B for the second month.

Example 1 (faults on data lines): Assume we collect data on the number of faults for lines of length 22 km (sample A) and for lines of length 170 km (sample B). The sample sizes are $n_A = 40$ and $n_B = 17$ lines, and the total numbers of faults are $\sum_{i=1}^{40} y_i^A = 10$ and $\sum_{i=1}^{17} y_i^B = 41$.

Probability model: $Y_i^A \sim \mathrm{Pois}(\lambda_A)$ and $Y_i^B \sim \mathrm{Pois}(\lambda_B)$.

ML estimators in the Poisson model: $\hat\lambda_A = \frac{1}{n_A}\sum_{i=1}^{n_A} Y_i^A$, $\hat\lambda_B = \frac{1}{n_B}\sum_{i=1}^{n_B} Y_i^B$.

ML estimates (for the observed data): $\hat\lambda_A = \frac{1}{40}\sum_{i=1}^{40} y_i^A = 0.25$, $\hat\lambda_B = \frac{1}{17}\sum_{i=1}^{17} y_i^B = 2.41$.

Estimates of the variance of the MLEs: $\widehat{\mathrm{Var}}(\hat\lambda_A) = \hat\lambda_A/40 = 0.00625$, $\widehat{\mathrm{Var}}(\hat\lambda_B) = \hat\lambda_B/17 = 0.14176$.
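The MLEs and their variance estimates can be reproduced in a few lines of Python (a minimal sketch using the fault totals and sample sizes from the example; variable names are illustrative):

```python
# Poisson model for fault counts: lambda_hat is the sample mean,
# and Var(lambda_hat) is estimated by lambda_hat / n.
n_A, total_A = 40, 10   # 22 km lines: 40 lines, 10 faults in total
n_B, total_B = 17, 41   # 170 km lines: 17 lines, 41 faults in total

lam_A = total_A / n_A   # ML estimate for sample A
lam_B = total_B / n_B   # ML estimate for sample B
var_A = lam_A / n_A     # estimated Var(lambda_hat_A)
var_B = lam_B / n_B     # estimated Var(lambda_hat_B)

print(lam_A, lam_B)     # 0.25 and about 2.41
print(var_A, var_B)     # 0.00625 and about 0.1419
```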

Example 1 (faults on data lines, contd): Properties of $\hat\delta_{A,B} = \hat\theta_A - \hat\theta_B$:
$$E(\hat\delta_{A,B}) = E(\hat\theta_A) - E(\hat\theta_B) = \theta_A - \theta_B = \delta_{A,B},$$
$$\mathrm{Var}(\hat\delta_{A,B}) = \mathrm{Var}(\hat\theta_A) + \mathrm{Var}(\hat\theta_B),$$
$$\widehat{\mathrm{Var}}(\hat\delta_{A,B}) = \widehat{\mathrm{Var}}(\hat\lambda_A) + \widehat{\mathrm{Var}}(\hat\lambda_B) = 0.00625 + 0.14176 = 0.14801.$$
Exact distribution of $\hat\delta_{A,B}$: $\hat\delta_{A,B} \sim \frac{P_A}{n_A} - \frac{P_B}{n_B}$, where $P_A \sim \mathrm{Pois}(n_A\theta_A)$ and $P_B \sim \mathrm{Pois}(n_B\theta_B)$.

Example 1 (faults on data lines, contd): The approximate distribution of $\hat\delta_{A,B}$ is normal: $\hat\delta_{A,B} \approx N(\delta_{A,B}, \mathrm{Var}(\hat\delta_{A,B}))$. The approximate $100(1-\alpha)\%$ CI for $\delta_{A,B}$ is
$$\left(\hat\delta_{A,B} - z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat\delta_{A,B})},\;\; \hat\delta_{A,B} + z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat\delta_{A,B})}\right).$$
With $\alpha = 0.05$, $z_{1-\alpha/2} = 1.96$, and the estimated difference $\hat\lambda_B - \hat\lambda_A = 2.16$, the 95% CI is $2.16 \pm 1.96 \times 0.385 = (1.41, 2.91)$. This interval does not contain zero, and therefore we are 95% confident that 170 km lines have more faults per line than 22 km lines.
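The Wald interval above can be checked directly (a sketch; 1.96 is the normal 0.975 quantile, and small differences from the slide's numbers are due to rounding of intermediate values):

```python
import math

# fault data from Example 1
n_A, n_B = 40, 17
lam_A, lam_B = 10 / n_A, 41 / n_B        # 0.25 and ~2.41

delta = lam_B - lam_A                    # estimated difference in faults per line
se = math.sqrt(lam_A / n_A + lam_B / n_B)  # estimated standard error of delta

lo, hi = delta - 1.96 * se, delta + 1.96 * se
print(lo, hi)   # about 1.41 and 2.92; the slide's (1.41, 2.91) uses rounded inputs
```

Since the whole interval lies above zero, the conclusion matches the slide: the 170 km lines have more faults per line.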

CLICKER QUESTION 1. Example 1 (faults on data lines, contd): Assume now we are interested in comparing the number of faults per km for the two samples. What quantity do we need to estimate?
A. $\lambda_B/\lambda_A$
B. $\lambda_B - \lambda_A$
C. $\lambda_B/170 + \lambda_A/22$
D. $\lambda_B/170 - \lambda_A/22$
E. None of the above

Example 1 (faults on data lines, contd): To compare the number of faults per km for the two samples, we consider $\delta_{A,B} = \lambda_B/170 - \lambda_A/22$:
$$\hat\delta_{A,B} = \frac{2.41}{170} - \frac{0.25}{22} = 0.01419 - 0.01136 = 0.00283 \text{ (faults/km)},$$
$$\widehat{\mathrm{Var}}(\hat\delta_{A,B}) = \frac{1}{170^2}\widehat{\mathrm{Var}}(\hat\lambda_B) + \frac{1}{22^2}\widehat{\mathrm{Var}}(\hat\lambda_A) = 1.7818 \times 10^{-5}.$$
The 95% CI for $\delta_{A,B}$ is $\hat\delta_{A,B} \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat\delta_{A,B})} = (-0.0054, 0.0111)$ (faults/km). Since this interval contains zero, we cannot reject $H_0: \delta_{A,B} = 0$ in favor of $H_a: \delta_{A,B} \neq 0$.
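The per-km comparison follows the same pattern; a sketch (the variances of the per-km rates pick up the factors $1/170^2$ and $1/22^2$):

```python
import math

# per-line rates and their estimated variances, from Example 1
lam_A, lam_B = 10 / 40, 41 / 17
var_A, var_B = lam_A / 40, lam_B / 17

# faults per km: divide the per-line rates by the line lengths
delta = lam_B / 170 - lam_A / 22
var_delta = var_B / 170**2 + var_A / 22**2

lo = delta - 1.96 * math.sqrt(var_delta)
hi = delta + 1.96 * math.sqrt(var_delta)
print(lo, hi)   # about -0.0054 and 0.0111: zero is inside, so H0 is not rejected
```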

Comparing the means of two normal distributions. Assume that we have two independent normal samples $Z_1^A, \dots, Z_{n_A}^A$ i.i.d. $N(\mu_A, \sigma_A^2)$ and $Z_1^B, \dots, Z_{n_B}^B$ i.i.d. $N(\mu_B, \sigma_B^2)$. Let $\delta_{A,B} = \mu_A - \mu_B$ and define $\hat\delta_{A,B} = \bar Z^A - \bar Z^B$. It follows that
$$\hat\delta_{A,B} \sim N\!\left(\delta_{A,B},\, \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}\right), \qquad \frac{\hat\delta_{A,B} - \delta_{A,B}}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}} \sim N(0, 1).$$
Assuming $\sigma_A^2$ and $\sigma_B^2$ are known, the $100(1-\alpha)\%$ CI for $\delta_{A,B}$ is
$$\hat\delta_{A,B} - z_{1-\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} \;<\; \delta_{A,B} \;<\; \hat\delta_{A,B} + z_{1-\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}.$$

Comparing the means of two normal distributions (contd). Assume now that $\sigma_A^2 = \sigma_B^2 = \sigma^2$ with $\sigma^2$ unknown:
$$\hat\delta_{A,B} \sim N\!\left(\delta_{A,B},\, \frac{\sigma^2}{n_A} + \frac{\sigma^2}{n_B}\right), \qquad \frac{\hat\delta_{A,B} - \delta_{A,B}}{\sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} \sim N(0, 1).$$
To estimate $\sigma^2$, we use the pooled sample variance
$$S_p^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A - 1 + n_B - 1},$$
where $S_A^2 = \frac{1}{n_A - 1}\sum_{i=1}^{n_A}(Z_i^A - \bar Z^A)^2$ and $S_B^2 = \frac{1}{n_B - 1}\sum_{i=1}^{n_B}(Z_i^B - \bar Z^B)^2$.

Comparing the means of two normal distributions (contd). We know that if the two normal samples are independent, then $(n_A - 1)S_A^2/\sigma^2 \sim \chi^2_{n_A - 1}$, $(n_B - 1)S_B^2/\sigma^2 \sim \chi^2_{n_B - 1}$, and
$$\left[(n_A - 1)S_A^2 + (n_B - 1)S_B^2\right]/\sigma^2 \sim \chi^2_{n_A + n_B - 2}.$$
This implies that
$$\frac{\hat\delta_{A,B} - \delta_{A,B}}{S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} \sim t_{n_A + n_B - 2},$$
and (with $n = n_A + n_B - 2$) the $100(1-\alpha)\%$ CI for $\delta_{A,B}$ is
$$\hat\delta_{A,B} - t_{1-\alpha/2,\,n}\, s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}} \;<\; \delta_{A,B} \;<\; \hat\delta_{A,B} + t_{1-\alpha/2,\,n}\, s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}.$$

Comparing the means of two normal distributions (contd). If $\sigma_A^2 \neq \sigma_B^2$ and the variances are unknown, then
$$\frac{\hat\delta_{A,B} - \delta_{A,B}}{\sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}} \approx t_{\nu_{A,B}} \quad \text{(approximate t-distribution)},$$
where the degrees of freedom $\nu_{A,B}$ can be obtained from a messy formula. If the difference between $\sigma_A^2$ and $\sigma_B^2$ is not too large, one can assume $\sigma_A^2 = \sigma_B^2 = \sigma^2$ and use the pooled sample variance to estimate $\sigma^2$ as shown before.
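The "messy formula" is commonly given as the Welch–Satterthwaite approximation (an assumption here, since the slide does not name it); a minimal sketch:

```python
def welch_df(s2_a, n_a, s2_b, n_b):
    """Welch-Satterthwaite approximate degrees of freedom for the
    two-sample t statistic with unequal variances."""
    va, vb = s2_a / n_a, s2_b / n_b   # estimated variances of the two means
    return (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))

# Sanity check: with equal variances and equal sample sizes the formula
# recovers the pooled degrees of freedom n_a + n_b - 2.
print(welch_df(1.0, 10, 1.0, 10))   # 18.0
```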

Example 2: Assume that $\sigma_A^2 = \sigma_B^2 = 1$ and the observed data are 1.87, 0.95, 0.36, 0.84 (sample A) and 2.72, 1.52, 3.81 (sample B). We find $\bar z^A = 1.01$, $\bar z^B = 2.68$, and $\hat\delta_{A,B} = -1.67$. If the variance is known, then the 95% CI for $\delta_{A,B}$ is
$$-1.67 \pm z_{0.975}\sqrt{\tfrac{1}{4} + \tfrac{1}{3}} = (-3.17, -0.17).$$
If the variance is unknown, we calculate $s_A^2 = 0.40$ and $s_B^2 = 1.31$, and the pooled sample variance is
$$s_p^2 = \frac{(4-1)s_A^2 + (3-1)s_B^2}{4 - 1 + 3 - 1} = 0.76, \qquad s_p = 0.87.$$
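The known-variance interval can be verified directly (a sketch; $\sigma = 1$ as assumed in the example, and $z_{0.975} \approx 1.96$):

```python
import math
from statistics import mean

za = [1.87, 0.95, 0.36, 0.84]   # sample A
zb = [2.72, 1.52, 3.81]         # sample B

d = mean(za) - mean(zb)         # about -1.68
# sigma = 1 is known, so the standard error is sqrt(1/n_A + 1/n_B)
half = 1.96 * math.sqrt(1 / len(za) + 1 / len(zb))
print(d - half, d + half)       # about -3.18 and -0.18; zero is excluded
```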

CLICKER QUESTION 2. Example 2 (contd): Quantiles of which distribution should we use to construct the CI for $\delta_{A,B}$ if the variance is unknown?
A. $t_6$
B. $t_4$
C. $t_2$
D. $t_1$
E. None of the above

Example 2 (contd): We use the t-distribution with 5 degrees of freedom to construct the 95% CI for $\delta_{A,B}$:
$$-1.67 \pm t_{0.975,5}\, s_p\sqrt{\tfrac{1}{4} + \tfrac{1}{3}} = (-3.38, 0.04).$$
This CI is wider because the variance is unknown and has to be estimated from the data.
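The unknown-variance interval follows the same pattern with the pooled standard deviation and a t quantile (a sketch; $t_{0.975,5} \approx 2.571$ is hardcoded from a t-table):

```python
import math
from statistics import mean, variance  # statistics.variance uses the n-1 denominator

za = [1.87, 0.95, 0.36, 0.84]   # sample A
zb = [2.72, 1.52, 3.81]         # sample B
na, nb = len(za), len(zb)

# pooled sample variance over n_A + n_B - 2 = 5 degrees of freedom
sp2 = ((na - 1) * variance(za) + (nb - 1) * variance(zb)) / (na + nb - 2)
d = mean(za) - mean(zb)

t975_5 = 2.571   # 0.975 quantile of t with 5 df (from a t-table)
half = t975_5 * math.sqrt(sp2) * math.sqrt(1 / na + 1 / nb)
print(d - half, d + half)   # about -3.39 and 0.04: wider than the known-variance CI
```

Note that the interval now barely includes zero, which is why the slide emphasizes that estimating the variance widens the CI.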