Lecture 11 - Tests of Proportions

Similar documents
Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Occupy movement - Duke edition. Lecture 14: Large sample inference for proportions. Exploratory analysis. Another poll on the movement

STA 101 Final Review

Comparing Means from Two-Sample

Difference Between Pair Differences v. 2 Samples

Inference for Single Proportions and Means T.Scofield

DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions.

AP Statistics Ch 12 Inference for Proportions

CHAPTER 10 Comparing Two Populations or Groups

Unit5: Inferenceforcategoricaldata. 4. MT2 Review. Sta Fall Duke University, Department of Statistical Science

One-sample categorical data: approximate inference

Do students sleep the recommended 8 hours a night on average?

10.1. Comparing Two Proportions. Section 10.1

Lecture #16 Thursday, October 13, 2016 Textbook: Sections 9.3, 9.4, 10.1, 10.2

Inference for Proportions

Inference for Proportions

Introduction to Survey Analysis!

Chapter 22. Comparing Two Proportions. Bin Zou STAT 141 University of Alberta Winter / 15

Harvard University. Rigorous Research in Engineering Education

Annoucements. MT2 - Review. one variable. two variables

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Data Analysis and Statistical Methods Statistics 651

1 Statistical inference for a population mean

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing

Chapter 10: Comparing Two Populations or Groups

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

Chapter 9 Inferences from Two Samples

Announcements. Unit 3: Foundations for inference Lecture 3: Decision errors, significance levels, sample size, and power.

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

Two-Sample Inference for Proportions and Inference for Linear Regression

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

Chapter 10: Comparing Two Populations or Groups

Chapter 8: Sampling Variability and Sampling Distributions

Announcements. Final Review: Units 1-7

Chapter 10: Comparing Two Populations or Groups

Lecture 10: Generalized likelihood ratio test

Hypothesis Testing and Confidence Intervals (Part 2): Cohen s d, Logic of Testing, and Confidence Intervals

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Chapter 6. Estimates and Sample Sizes

Student s t-distribution. The t-distribution, t-tests, & Measures of Effect Size

10.4 Hypothesis Testing: Two Independent Samples Proportion

Math 124: Modules Overall Goal. Point Estimations. Interval Estimation. Math 124: Modules Overall Goal.

Data Analysis and Statistical Methods Statistics 651

Inference for Proportions, Variance and Standard Deviation

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

Business Statistics. Lecture 10: Course Review

STA Module 10 Comparing Two Proportions

Midterm 1 and 2 results

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

Two-sample inference: Continuous data

Example - Alfalfa (11.6.1) Lecture 16 - ANOVA cont. Alfalfa Hypotheses. Treatment Effect

Announcements. Lecture 5: Probability. Dangling threads from last week: Mean vs. median. Dangling threads from last week: Sampling bias

BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

Two-sample inference: Continuous data

Lecture 1: Probability Fundamentals

Statistical Inference for Means

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018

Lecture 10A: Chapter 8, Section 1 Sampling Distributions: Proportions

Statistical inference provides methods for drawing conclusions about a population from sample data.

Extra Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences , July 2, 2015

Confidence Intervals and Hypothesis Tests

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

Section Inference for a Single Proportion

STAT Chapter 8: Hypothesis Tests

Inferences About Two Proportions

appstats27.notebook April 06, 2017

Tests for Population Proportion(s)

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

Math 141. Lecture 10: Confidence Intervals. Albyn Jones 1. jones/courses/ Library 304. Albyn Jones Math 141

Confidence Intervals for Population Mean

Hypotheses. Poll. (a) H 0 : µ 6th = µ 13th H A : µ 6th µ 13th (b) H 0 : p 6th = p 13th H A : p 6th p 13th (c) H 0 : µ diff = 0

Hypothesis Testing Problem. TMS-062: Lecture 5 Hypotheses Testing. Alternative Hypotheses. Test Statistic

Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Chapter 26: Comparing Counts (Chi Square)

Chapter 27 Summary Inferences for Regression

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

Statistics 251: Statistical Methods

Random processes. Lecture 17: Probability, Part 1. Probability. Law of large numbers

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Announcements. Unit 5: Inference for Categorical Data Lecture 1: Inference for a single proportion

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

MAT2377. Rafa l Kulik. Version 2015/November/23. Rafa l Kulik

6.4 Type I and Type II Errors

1 Binomial Probability [15 points]

Hypothesis tests

Stat 231 Exam 2 Fall 2013

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2

Lecture 15 - ANOVA cont.


Chapter 24. Comparing Means

CHAPTER 18 SAMPLING DISTRIBUTION MODELS STAT 203

Business Statistics. Lecture 5: Confidence Intervals

Project proposal feedback. Unit 4: Inference for numerical variables Lecture 3: t-distribution. Statistics 104

Two-sample inference: Continuous Data

Transcription:

Lecture 11 - Tests of Proportions Statistics 102 Colin Rundel February 27, 2013

Research Project Research Project Proposal - Due Friday March 29th at 5 pm Introduction, Data Plan Data Project - Due Friday, April 26th at 5 pm Introduction Analysis Conclusion Data Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 2 / 32

Single population proportion Example - Experimental Design Two scientists want to know if a certain drug is effective against high blood pressure. The first scientist wants to give the drug to 1000 people with high blood pressure and see how many of them experience lower blood pressure levels. The second scientist wants to give the drug to 500 people with high blood pressure, and not give the drug to another 500 people with high blood pressure, and see how many in both groups experience lower blood pressure levels. Which is the better way to test this drug? (a) All 1000 get the drug (b) 500 get the drug, 500 don t Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 5 / 32

Single population proportion Results from the GSS The GSS asks the same question, below is the distribution of responses from the 2010 survey: All 1000 get the drug 99 500 get the drug 500 don t 571 Total 670 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 6 / 32

Single population proportion Parameter and point estimate We would like to estimate the proportion of all Americans who have a good intuition about experimental design, i.e. would answer 500 get the drug 500 don t. What are the parameter of interest and the point estimate? Parameter of interest: Proportion of all Americans who have a good intuition about experimental design. p (a population proportion) Point estimate: Proportion of sampled Americans who have a good intuition about experimental design. ˆp (a sample proportion) Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 7 / 32

Single population proportion Inference on a proportion What percent of all Americans have a good intuition about experimental design, i.e. would answer 500 get the drug 500 don t? We can answer this research question using a confidence interval, which we know is always of the form point estimate ± ME And we also know that ME = critical value standard error of the point estimate. SEˆp =? Standard error of a sample proportion p (1 p) SEˆp = n Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 8 / 32

Single population proportion Identifying when a sample proportion is nearly normal Sample proportions are also nearly normally distributed Central limit theorem for proportions Sample proportions will be nearly normally distributed with mean equal to the population mean, p, and standard error equal to ˆp N ( mean = p, SE = ) p (1 p) n p (1 p) n. But of course this is true only under certain conditions... any guesses? independent observations, at least 10 successes and 10 failures Note: If p is unknown (most cases), we use ˆp in the calculation of the standard error, this is equivalent to using s instead of σ when performign inference on means. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 9 / 32

Single population proportion Identifying when a sample proportion is nearly normal Sample proportions are also nearly normally distributed Central limit theorem for proportions Sample proportions will be nearly normally distributed with mean equal to the population mean, p, and standard error equal to ˆp N ( mean = p, SE = ) p (1 p) n p (1 p) n. Assumptions/conditions: 1. Independence: Random sample 10% condition: If sampling without replacement, n < 10% of the population. 2. Normality: At least 10 successes [np 10] and 10 failures [n(1 p) 10]. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 10 / 32

Single population proportion Confidence intervals for a proportion Back to experimental design... The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Estimate (using a 95% confidence interval) the proportion of all Americans who have a good intuition about experimental design? Given: n = 670, ˆp = 571 670 = 0.85. Are CLT conditions met? 1. Independence: The sample is random, and 670 < 10% of all Americans, therefore we can assume that one respondent s response is independent of another. 2. Success-failure: 571 people answered correctly (successes) and 99 answered incorrectly (failures), both are greater than 10. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 11 / 32

Single population proportion Confidence intervals for a proportion Calculating the Confidence Interval We are given that n = 670, ˆp = 0.85, we also just learned that the p(1 p) standard error of the sample proportion is SE = n. What is the 95% confidence interval for this proportion? CI = point estimate ± margin of error = point estimate ± critical value SE = ˆp ± z SE 0.85 0.15 = 0.85 ± 1.96 = (0.82, 0.88) 670 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 12 / 32

Single population proportion Choosing a sample size when estimating a proportion Choosing a sample size How many people should you sample in order to reduce the margin of error of a 95% confidence interval down to 1%. ME = z SE p (1 p) 0.01 1.96 n 0.85 0.15 0.01 1.96 Using ˆp from previous study n 0.01 2 1.96 2 0.85 0.15 n n 1.962 0.85 0.15 0.01 2 n 4898.04 n should be at least 4,899 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 13 / 32

Single population proportion Choosing a sample size when estimating a proportion What if there isn t a previous study?... use ˆp = 0.5. Why? if you don t know any better, 50-50 is a good guess ˆp = 0.5 gives the most conservative estimate largest standard error and thus the largest possible sample size. p * (1 - p) 0.00 0.10 0.20 0.0 0.2 0.4 0.6 0.8 1.0 p Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 14 / 32

Single population proportion Hypothesis testing for a proportion CI vs. HT for proportions Success-failure condition: CI: At least 10 observed successes and failures HT: At least 10 expected successes and failures, calculated using the null value, p 0 Standard error: CI: calculate using observed sample proportion: SE = HT: calculate using the null value: SE = p 0(1 p 0) n p(1 p) n With means SE only depended on s or σ and n, so it was not determined in any way by H 0. With proportions the mean and SE both depend on p, therefore H 0 affects the SE. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 15 / 32

Single population proportion Hypothesis testing for a proportion Back to the GSS The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design? H 0 : p = 0.80 H A : p > 0.80 SE = 0.80 0.20 = 0.0154 670 Z = 0.85 0.80 = 3.25 0.0154 p value = 1 0.9994 = 0.0006 sample proportions 0.8 0.85 Since p-value is small we reject H 0. The data provide convincing evidence that more than 80% of Americans have a good intuition on experimental design. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 16 / 32

Single population proportion Hypothesis testing for a proportion Common Misinterpretations 11% of 1,001 Americans responding to a 2006 Gallup survey stated that they have objections to celebrating Halloween on religious grounds. At 95% confidence level, the margin of error for this survey a is ±3%. A news piece on this study s findings states: More than 10% of all Americans have objections on religious grounds to celebrating Halloween. At 95% confidence level, is this news piece s statement justified? Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 17 / 32

Difference of two proportions Example - Melting ice cap survey Scientists predict that global warming may have big effects on the polar regions within the next 100 years. One of the possible effects is that the northern ice cap may completely melt. Would this bother you a great deal, some, a little, or not at all if it actually happened? (a) A great deal (b) Some (c) A little (d) Not at all Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 18 / 32

Difference of two proportions Results from the GSS & Duke The GSS asks this question, below is the distribution of responses from the 2010 survey: A great deal 454 Some 124 A little 52 Not at all 50 Total 680 The same question was asked of 88 Duke students, of which 56 said it would bother them a great deal. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 19 / 32

Difference of two proportions Parameter and point estimate Parameter of interest: Difference between the proportions of all Duke students and all Americans who would be bothered a great deal by the northern ice cap completely melting. p Duke p US Point estimate: Difference between the proportions of sampled Duke students and sampled Americans who would be bothered a great deal by the northern ice cap completely melting. ˆp Duke ˆp US Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 20 / 32

Difference of two proportions Inference for comparing proportions The details are the same as before... CI: point estimate ± margin of error HT: Use Z = point estimate null value SE to find appropriate p-value. We just need the appropriate standard error of the point estimate (SE pduke p US ), Standard error of the difference between two sample proportions p 1 (1 p 1 ) SE (p1 p 2 ) = + p 2(1 p 2 ) n 1 n 2 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 21 / 32

Difference of two proportions Confidence intervals for difference of proportions Conditions for inference on the difference of proportions 1 Independence within groups: The US group is sampled randomly and we re assuming that the Duke group represents a random sample as well. n Duke < 10% of all Duke students and 680 < 10% of all Americans. We can assume that the attitudes of Duke students in the sample are independent of each other, and attitudes of US residents in the sample are independent of each other as well. 2 Independence between groups: The sampled Duke students and the US residents are independent of each other. 3 Success-failure: At least 10 observed successes and 10 observed failures in the two groups. Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 22 / 32

Difference of two proportions Confidence intervals for difference of proportions CI for difference of proportions Construct a 95% confidence interval for the difference between the proportions of Duke students and Americans who would be bothered a great deal by the melting of the northern ice cap (p Duke p US ). Duke US A great deal 56 454 Not a great deal 32 226 Total 88 680 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 23 / 32

Difference of two proportions HT for comparing proportions Hypotheses for testing the difference of two proportions Which of the following is the correct set of hypotheses for testing if the proportion of all Duke students who would be bothered a great deal by the melting of the northern ice cap differs from the proportion of all Americans who do? H 0 : p Duke = p US H 0 : p Duke p US = 0 H A : p Duke p US H A : p Duke p US 0 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 24 / 32

Difference of two proportions HT for comparing proportions Flashback to working with one proportion When constructing a confidence interval for a population proportion, we check if the observed number of successes and failures are at least 10. nˆp 10 n(1 ˆp) 10 When conducting a hypothesis test for a population proportion, we check if the expected number of successes and failures are at least 10. np 0 10 n(1 p 0 ) 10 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 25 / 32

Difference of two proportions HT for comparing proportions A slight wrinkle... In the case of comparing two proportions where H 0 : p 1 = p 2, there isn t a null value we can use to calculated the expected number of successes and failures in each sample or the SE. Therefore, we need to first find a common (pooled) proportion for the two groups, and use that in our analysis. This involves finding the proportion of total successes among all observations. Pooled estimate of a proportion ˆp = # of successes in 1 + # of successes in 2 n 1 + n 2 = n 1ˆp 1 + n 2ˆp 2 n 1 + n 2 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 26 / 32

Difference of two proportions HT for comparing proportions Pooled estimate of a proportion Calculate the estimated pooled proportion of Duke students and Americans who would be bothered a great deal by the melting of the northern ice cap. Duke US A great deal 56 454 Not a great deal 32 226 Total 88 680 ˆp pooled = 56 + 454 88 + 680 = 0.664 Which sample proportion (ˆp Duke or ˆp US ) is closer to the pooled estimate? Why? Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 27 / 32

Difference of two proportions HT for comparing proportions Implications for the SE Under the null hypothesis we are stating that p 1 = p 2 which does not uniquely identify either p 1 or p 2. Therefore we are using the pooled proportion (ˆp) as our best guess for p 1 and p 2 under the null hypothesis. For a confidence interval we have seen that p1(1 p 1) p2(1 p2) ˆp1(1 ˆp 1) SE = + + n 1 n 2 n 1 ˆp2(1 ˆp2) n 2 Therefore, for a hypothesis test we will use ˆp as our approximation for p 1 and p 2 p1(1 p 1) p2(1 p2) ˆp(1 ˆp) ˆp(1 ˆp) SE = + + n 1 n 2 n 1 n 2 ( 1 ˆp(1 ˆp) + 1 ) n1ˆp1 + n2ˆp2, where ˆp = n 1 n 2 n 1 + n 2 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 28 / 32

Difference of two proportions HT for comparing proportions HT for comparing proportions Do these data suggest that the proportion of all Duke students who would be bothered a great deal by the melting of the northern ice cap differs from the proportion of all Americans who do? ˆp pooled = 0.664, n 1 = 88, n 2 = 680 Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 29 / 32

Recap Recap - inference for one proportion Population parameter: p, point estimate: ˆp Conditions: independence - random sample and 10% condition at least 10 successes and failures - observed for CI - expected for HT Standard error: SE = for CI: use ˆp for HT: use p 0 p(1 p) n Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 30 / 32

Recap Recap - comparing two proportions Population parameter: (p 1 p 2 ), point estimate: (ˆp 1 ˆp 2 ) Conditions: independence within groups - random sample and 10% condition met for both groups independence between groups at least 10 successes and failures in each group - observed for CI - expected for HT SE (ˆp1 ˆp 2 ) = p1 (1 p 1 ) n 1 + p 2(1 p 2 ) n 2 for CI: use ˆp 1 and ˆp 2 for HT: when H 0 : p 1 = p 2: use ˆp pool = #suc 1+#suc 2 n 1 +n 2 when H 0 : p 1 p 2 = (some value other than 0): use ˆp 1 and ˆp 2 - this is pretty rare Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 31 / 32

Recap Reference - standard error calculations one sample two samples mean SE = σ n SE = σ 2 1 n 1 + σ2 2 n 2 proportion SE = p(1 p) n SE = p1 (1 p 1 ) n 1 + p 2(1 p 2 ) n 2 When working with means, it s very rare that σ is known, so we usually use s as an approximation. When working with proportions, we will not know p therefore if doing a hypothesis test, p comes from the null hypothesis if constructing a confidence interval, use ˆp instead Statistics 102 (Colin Rundel) Lec 11 February 27, 2013 32 / 32