Data Analysis and Statistical Methods Statistics 651

Similar documents
Data Analysis and Statistical Methods Statistics 651

Lecture 11 - Tests of Proportions

Chapter Six: Two Independent Samples Methods 1/51

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651

DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions.

Difference Between Pair Differences v. 2 Samples

10.1. Comparing Two Proportions. Section 10.1

Data Analysis and Statistical Methods Statistics 651

Math 124: Modules Overall Goal. Point Estimations. Interval Estimation. Math 124: Modules Overall Goal.

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

Section Comparing Two Proportions

Introduction to the Analysis of Variance (ANOVA)

Chapter 22. Comparing Two Proportions 1 /29

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

Sampling Distributions: Central Limit Theorem

Chapter 22. Comparing Two Proportions 1 /30

Bayesian Statistics. Debdeep Pati Florida State University. February 11, 2016

One-sample categorical data: approximate inference

Data Analysis and Statistical Methods Statistics 651

Chapter 9. Hypothesis testing. 9.1 Introduction

Inference for Proportions

Inference for Proportions

STA Module 10 Comparing Two Proportions

Lecture 26 Section 8.4. Wed, Oct 14, 2009

Inferences About Two Population Proportions

Statistics in medicine

Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success

Data Analysis and Statistical Methods Statistics 651

Chapter 10: Comparing Two Populations or Groups

MALLOY PSYCH 3000 Hypothesis Testing PAGE 1. HYPOTHESIS TESTING Psychology 3000 Tom Malloy

Chapter 20 Comparing Groups

LECTURE 5. Introduction to Econometrics. Hypothesis testing

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

Chapter 9. Inferences from Two Samples. Objective. Notation. Section 9.2. Definition. Notation. q = 1 p. Inferences About Two Proportions

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Chapter 10: Comparing Two Populations or Groups

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Data Analysis and Statistical Methods Statistics 651

Chapter 10: Comparing Two Populations or Groups

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

Data Analysis and Statistical Methods Statistics 651

Dealing with the assumption of independence between samples - introducing the paired design.

Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

STAT 201 Assignment 6

Data Analysis and Statistical Methods Statistics 651

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

Introduction to Statistics

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

Testing adaptive hypotheses What is (an) adaptation? Testing adaptive hypotheses What is (an) adaptation?

The t-statistic. Student s t Test

Two-sample inference: Continuous data

Two-Sample Inferential Statistics

7 Estimation. 7.1 Population and Sample (P.91-92)

Section Inference for a Single Proportion

Practice Questions: Statistics W1111, Fall Solutions

Announcements. Unit 3: Foundations for inference Lecture 3: Decision errors, significance levels, sample size, and power.

Why should you care?? Intellectual curiosity. Gambling. Mathematically the same as the ESP decision problem we discussed in Week 4.

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Two sample Hypothesis tests in R.

CS 160: Lecture 16. Quantitative Studies. Outline. Random variables and trials. Random variables. Qualitative vs. Quantitative Studies

Hypothesis testing for µ:

Confidence Intervals and Hypothesis Tests

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

1 Statistical inference for a population mean

10.4 Hypothesis Testing: Two Independent Samples Proportion

Bias Variance Trade-off

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

Lecture 10: Generalized likelihood ratio test

AP Statistics Ch 12 Inference for Proportions

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

First we look at some terms to be used in this section.

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Psych 10 / Stats 60, Practice Problem Set 5 (Week 5 Material) Part 1: Power (and building blocks of power)

CBA4 is live in practice mode this week exam mode from Saturday!

Hypothesis testing. Data to decisions

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

Statistics 100A Homework 5 Solutions

Chapter 26: Comparing Counts (Chi Square)

Chapter 24. Comparing Means

Data Analysis and Statistical Methods Statistics 651

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Comparison of Two Population Means

1 and 2 proportions starts here

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Section 5.2. Section Carol performs 100 dichotomous trials and obtains the following data. Extra Exercises; Statistics 301; Professor Wardrop

Introduction to Survey Analysis!

MA : Introductory Probability

Hypothesis tests

CHAPTER 10 Comparing Two Populations or Groups

Hypothesis Testing with Z and T

Harvard University. Rigorous Research in Engineering Education

Two-sample inference: Continuous data

Chapter 9 Inferences from Two Samples

Rama Nada. -Ensherah Mokheemer. 1 P a g e

ECO220Y Hypothesis Testing: Type I and Type II Errors and Power Readings: Chapter 12,

Introducing Statistics for Bioscientists

χ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.

a Sample By:Dr.Hoseyn Falahzadeh 1

Transcription:

Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 26 (MWF) Tests and CI based on two proportions Suhasini Subba Rao

Comparing proportions in two populations Just as it is interesting to compare the means in two different populations to see whether and effect has an influence or to see whether the overall behaviour is the same, it is also extremely useful to compare the proportions in two different populations. We will go through some examples and give the analysis. Later we will explain how we did the analysis. 1

The Thai HIV Vaccine Trials In 2006, drug trials in Thailand were done for the vaccine against HIV. number of people infected No affected Sample size HIV vaccine 51 7949 8000 Placebo 72 7928 8000 We want to test the hypothesis that the vaccine gave some protection. This would mean that the proportion of people in the entire population who take the vaccine and go on to develop HIV is less than the proportion of the entire population who do not take the vaccine and go on to develop HIV. Hence we want to test H 0 : p V p P 0 against H A : p V p P < 0 (ie. the people who took the vaccine are at less risk of infection). The data estimates ˆp V = 0.0065 and ˆp P = 0.009 and ˆp V ˆp P = 0.002625, is this slight difference (of 0.26%) statistically significant? 2

The analysis The main point is that the standard error is relatively small, 0.000138 (we calculate this later). This z-value z = 0.002625 0.00138 = 1.90. The area to the LEFT of -1.9 on the z-tables is 2.87%. If we use the significance level of 5% this result is statistically significant, however, if we use 1% as the significance level it s not. 3

Therefore, there is some evidence to suggest that the vaccination may offer some protection against HIV. If want to measure the degree of protection we can construct a CI for the difference 4

The SA HIV Vaccine Trials In 2005, drug trials in SA (amonst males) were done for the vaccine against HIV. number of people infected No affected Sample size HIV vaccine 49 865 914 Placebo 33 889 922 We want to test the hypothesis that the vaccine gave some protection. This would mean that the proportion of people in the entire population who take the vaccine and go on to develop HIV is less than the proportion of the entire population who do not take the vaccine and go on to develop HIV. Hence we want to test H 0 : p V p P 0 against H A : p V p P < 0 (ie. the people who took the vaccine are at less risk of infection). The data estimates ˆp V = 0.053 and ˆp P = 0.035 and ˆp V ˆp P = 0.0178. It is immediately clear that there is no evidence in the data that the vaccine works (in this cohort). 5

The analysis The main point ˆp V ˆp P = 0.0178 is positive and we are looking for a significant negative difference. We wanted to do the test. The z-value is z = 0.0178 0.0096 = 1.848. The area to the LEFT of 1.848 on the z-tables is 96.7%. There is no evidence to against the null. We cannot reject the null. 6

If want to measure the degree of difference we can construct a CI for the difference 7

Hair remedies The FDA approved the drug Minodixil as a remedy for male pattern baldness. They did a study and this is what they found: new hair growth no hair growth Sample size Minodixil 99 211 310 Placebo 60 242 302 Let π M be the probability a person has new hair growth and uses Minodixil and π P be the probability a person has new hair growth and and doesn t use Minodixil. Our estimates are ˆp M = 0.32 and ˆp P = 0.2. Use this data to test H 0 : p M p P 0 against the alternative H A : p M p P > 0. The data gives an estimated difference ˆp M ˆp P = 0.12, this this 12% difference seen in the data statistically significant? 8

The analysis The main point ˆp M ˆp P = 0.12 is positive and we are looking for a significant positive difference. We do a two sampe test on proportions. The z-value is z = 0.12 0.035 = 3.40. The area to the RIGHT of 3.40 on the z-tables is 0.03%. As this is less than 5%, there IS evidence that minidoxil reduces hair loss. 9

If want to measure how much better Minodoxil is over the placebo we can construct the CI Compare the standard errors of the test statistic with those use in constructing the confidence interval and we see that they are different. This is because like the one sample proportion procedures, they are constructed under different conditions. We explain why below. 10

The normality result The p-values and confidence intervals constructed above were derived under the assumption that the estimate for the difference in proportions is normally distributed (ˆp 1 ˆp 2 ) D N ( p 1 p 2, p 1(1 p 1 ) n + p ) 2(1 p 2 ). m The above result holds roughly true if the number of successes and failures in both groups is greater than 5 (this basically ensures the distributions are not too asymmetric about the mean of the distribution - just as in the one sample case). We then can almost plug and chug. There is however, one problem p 1 and p 2 are unknown. 11

These need to be estimated - but the eagle eyed may have noticed that the standard errors are different for testing and the confidence intervals. Importantly, we do not use the t-distribution in any of the calculations, we only use the normal distribution. 12

The standard error for constructing the test Let us return to the hair remedy example. We want to test H 0 : p M p P 0 against H A : p M p P > 0. new hair growth no hair growth Sample size Minodixil 99 211 310 Placebo 60 242 302 Remember we want to calculate the chance of the data giving a difference of 12% (0.12) when placebo and Minodixil have exactly the same effect of hair. The standard error is pm (1 p M ) + p P (1 p P ) 310 302 13

but the placebo and Minidoxil have same effect (p M = p P = p), then pm (1 p M ) + p ( P (1 p P ) 1 = p(1 p) 310 302 310 + 1 ). 302 Now we need to find the best estimator of p. The larger the sample size the the more reliable (and better) and estimator. Since under the null there is NO difference between the placebo or Minodixil we can pool the data. new hair growth no hair growth Sample size Minodixil 99 211 310 Placebo 60 242 302 Pooled 159 453 612 14

If there is no gain from using Minodixil, the proportion of of people who notice a difference by simply massaging the head would be ˆp = 159/612 = 0.260. We use this as our best estimator of p (under the null). The standard error under the null: ( 1 s.e = 0.26 0.74 310 + 1 ) = 0.0354. 302 This is quite important. If we have more information about the data we need to pool it to obtain a better estimator - as we have done above. 15

The standard error for constructing confidence intervals We return to the normality result: ( (ˆp 1 ˆp 2 ) D N p 1 p 2, ( p 1 (1 p 1 ) n + p 2(1 p 2 )) ). m Using this result the 95% CI for the difference in proportions is [ ] p1 (1 p 1 ) p 1 p 2 ± 1.96 + p 2(1 p 2 ). m n Application to hair remedy data: [ pm (1 p M ) 0.12 ± 1.96 310 ] + p P (1 p P ). 302 16

But we do not know p M or p P. Unlike testing we do not make any assumptions about it under the null. Therefore we simply replace p M and p P with their estimators p M = 0.32 and p P = 0.2 to give [ 0.12 ± 1.96 0.32 0.68 310 + ] 0.2 0.8 302 = [0.051, 0.189] = [5.1, 18.9]%. 17

Relative Risk In medical data and other applications the relative risk is often considered. We illustrate this through the HIV example: number of people infected Sample size HIV vaccine 51 8000 0.006375 Placebo 72 8000 0.009 We may ask how much more risk is there is taking the Placebo over the vaccine, this is best measured by the ratio RR = 0.009 0.006375 = 1.4. 18

Ie. The data suggests that you are 1.4 times more likely to develop HIV if you don t take the vaccine than if you do. However, caution needs to be used when interpreting 1.4. 1.4 has been calculated from the sample. We want to relative risk based on the population. Therefore confidence intervals need to obtained for the population RR. If these were constructed, one found that the interval is wide with the left hand side end overlapping 1. This would be mean we have to be very cautious about interpreting the factor 1.4 as a gain in using a vaccine. 19