Inference for Proportions


Marc H. Mehlman (marcmehlman@yahoo.com), University of New Haven

Based on the Rare Event Rule: rare events happen, but not to me.

Table of Contents
1 Inference for a Single Proportion
2 Comparing Two Proportions
3 Odds Ratios

Inference for a Single Proportion

Let $X_1, \dots, X_n$ be a random sample from $\text{BIN}(1, p)$. Then $X = \sum_{j=1}^{n} X_j \sim \text{BIN}(n, p)$.

Definition
The sample population proportion is
$$\hat p \overset{\text{def}}{=} \bar X = \frac{X}{n}.$$
The standard error of $\hat p$ is
$$SE_{\hat p} \overset{\text{def}}{=} \sqrt{\frac{\hat p(1-\hat p)}{n}}.$$

By the CLT, $\bar X$ is approximately $N\left(p, \sqrt{p(1-p)/n}\right)$ for big $n$, and $\hat p$ is approximately $p$ for big $n$. Thus, for big $n$, $\bar X$ is approximately $N\left(\hat p, \sqrt{\hat p(1-\hat p)/n}\right)$.

Theorem (Large Sample Confidence Interval for $p$)
$$\text{margin of error} = m = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}} = z^* \, SE_{\hat p}$$
and the confidence interval is $\hat p \pm m$. Use this interval for confidence levels of 90% or more and when the numbers of successes and failures are both at least 15.

We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion $\hat p$?
$$\hat p = \frac{23}{440} \approx 0.052$$

For a 90% confidence level, $z^* = 1.645$.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054

Using the large sample method:
$$m = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}} = 1.645 \sqrt{\frac{0.052(1-0.052)}{440}} = 1.645 \times 0.0106 \approx 0.017$$
90% CI for $p$: $\hat p \pm m = 0.052 \pm 0.017$

With 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.
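The arithmetic above can be checked with a short script. This is a minimal sketch and not part of the original slides: the function name `proportion_ci` is made up for illustration, and $z^*$ is taken from Python's standard-library `statistics.NormalDist` instead of the table, so the last digit may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.90):
    """Large-sample confidence interval p_hat +/- z* * SE for one proportion."""
    p_hat = successes / n
    # Two-sided critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Arthritis example: 23 of 440 patients reported adverse symptoms.
p_hat, margin, (low, high) = proportion_ci(23, 440, confidence=0.90)
print(f"p_hat = {p_hat:.3f}, m = {margin:.3f}, CI = ({low:.3f}, {high:.3f})")
```

Because the script carries unrounded values of $\hat p$ and $z^*$, its upper endpoint is about 0.0697, which a reader may round to 7.0% instead of the 6.9% obtained from the rounded figures.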

Theorem (Sample Size)
Given a desired margin of error, $m$, one should choose the following sample size, $n$, to obtain the confidence interval, $\hat p \pm m$ (or p ± m) of $p$.
$$n = \begin{cases} \left(\dfrac{z^*}{m}\right)^2 p^*(1-p^*) & \text{when } p^* \text{ is an educated guess of what } p \text{ is} \\[2ex] \left(\dfrac{z^*}{2m}\right)^2 & \text{with no educated guess of } p. \end{cases}$$

Note:
1 round up $n$ to ensure it is a positive integer.
2 the closer one's educated guess, $p^*$, of $p$ is to 1/2, the safer one is.
3 $n = \dfrac{(z^*)^2}{4m^2}$ (i.e., when $p^* = 1/2$) is the most conservative estimate of $n$.


What sample size would we need in order to achieve a margin of error no more than 0.01 (1 percentage point) with a 90% confidence level?

We could use 0.5 for our guessed $p^*$. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer adverse symptoms (a better guess than 50%).

For a 90% confidence level, $z^* = 1.645$.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054

$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$$

To obtain a margin of error no more than 0.01 we need a sample size $n$ of at least 2436 arthritis patients.
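The sample-size rule can be sketched in a few lines of Python. The helper `sample_size` and its parameter names are hypothetical and are not from the slides. The sketch also shows that the result sits close to an integer boundary here: the table value $z^* = 1.645$ and the unrounded critical value from `statistics.NormalDist` land on opposite sides of 2435, so the rounded-up answer depends on how many digits of $z^*$ are carried.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, confidence, p_guess=0.5, z_star=None):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin for the guessed p.

    p_guess=0.5 is the conservative choice when no educated guess exists.
    Pass z_star to use a table value instead of the computed one.
    """
    if z_star is None:
        z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


n_table = sample_size(0.01, 0.90, p_guess=0.1, z_star=1.645)  # table z*
n_exact = sample_size(0.01, 0.90, p_guess=0.1)                # unrounded z*
n_conservative = sample_size(0.01, 0.90)                      # p* = 1/2
print(n_table, n_exact, n_conservative)
```

With the table value the formula evaluates to about 2435.4 (rounded up to 2436), while the unrounded critical value gives about 2434.99 (rounded up to 2435). The conservative choice $p^* = 1/2$ asks for far more patients.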

Theorem (Large Sample z Test for a Population Proportion)
Let $X_1, \dots, X_n$ be a random sample where $X_j \sim \text{BIN}(1, p)$ and such that $np \ge 10$ and $n(1-p) \ge 10$. Let
$$H_0: p = p_0$$
where $p$ is unknown. Then
$$z = \frac{\hat p - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \sim N(0, 1)$$
is a test statistic for $H_0$.

Example
A potato-chip producer has just received a truckload of potatoes from its main supplier. If the producer determines that more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the α = 0.10 significance level. What should the producer conclude?

We want to perform a test at the α = 0.10 significance level of
$$H_0: p = 0.08 \qquad H_a: p > 0.08$$
where $p$ is the actual proportion of potatoes in this shipment with blemishes.

If conditions are met, we should do a one-sample z test for the population proportion $p$.

Random: The supervisor took a random sample of 500 potatoes from the shipment.

Normal: Assuming $H_0: p = 0.08$ is true, the expected numbers of blemished and unblemished potatoes are $np_0 = 500(0.08) = 40$ and $n(1-p_0) = 500(0.92) = 460$, respectively. Because both of these values are at least 10, we should be safe doing Normal calculations.

Example
The sample proportion of blemished potatoes is $\hat p = 47/500 = 0.094$.

Test statistic:
$$z = \frac{\hat p - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.094 - 0.08}{\sqrt{\dfrac{0.08(0.92)}{500}}} = 1.15$$

P-value: The desired P-value is
$$P(z \ge 1.15) = 1 - 0.8749 = 0.1251$$

Since our P-value, 0.1251, is greater than the chosen significance level of α = 0.10, we fail to reject $H_0$. There is not sufficient evidence to conclude that the shipment contains more than 8% blemished potatoes. The producer will use this truckload of potatoes to make potato chips.
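The potato test can be reproduced with the standard library. This is an illustrative sketch only: `one_proportion_z_test` is a made-up helper name, and it covers just the upper-tailed alternative $H_a: p > p_0$ used in this example.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Upper-tailed large-sample z test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # The standard deviation under H0 uses p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = one_proportion_z_test(47, 500, 0.08)
alpha = 0.10
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The script does not round $z$ to 1.15 before looking up the tail area, so its P-value is slightly smaller than the table-based 0.1251. The decision at α = 0.10 is the same.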

Comparing Two Proportions

Comparing 2 independent samples
We often need to compare 2 treatments with 2 independent samples.

For large enough samples, the sampling distribution of $(\hat p_1 - \hat p_2)$ is approximately Normal.

However, neither $p_1$ nor $p_2$ is known.

Given two independent random samples, $X_1, \dots, X_{n_X}$ and $Y_1, \dots, Y_{n_Y}$, where $X_i \sim \text{BIN}(1, p_X)$ and $Y_j \sim \text{BIN}(1, p_Y)$, define
$$D \overset{\text{def}}{=} \hat p_X - \hat p_Y.$$
Notice that
1 $D$ is approximately normal for large $n_X$ and $n_Y$.
2 $\mu_D = \mu_{\hat p_X} - \mu_{\hat p_Y} = p_X - p_Y$.
3 $\sigma_D^2 = \sigma_{\hat p_X}^2 + \sigma_{\hat p_Y}^2 = \dfrac{p_X(1-p_X)}{n_X} + \dfrac{p_Y(1-p_Y)}{n_Y}$, since the samples are independent.

Definition
One can approximate
$$\sigma_D = \sqrt{\frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}}$$
with the standard error of $D$,
$$SE_D \overset{\text{def}}{=} \sqrt{\frac{\hat p_X(1-\hat p_X)}{n_X} + \frac{\hat p_Y(1-\hat p_Y)}{n_Y}}.$$

Thus for large $n_X$ and $n_Y$, $D$ is approximately
$$N(p_X - p_Y, SE_D).$$
This gives

Theorem (Large Sample CI for Difference Between Two Proportions)
A $(1-\alpha)100\%$ CI for $p_X - p_Y$ is
$$(\hat p_X - \hat p_Y) \pm z^* \, SE_D.$$

Warning
Use this method only when the number of heads and tails (successes and failures) is at least 10 for each sample.

Example
Lyme disease is spread by infected ticks. Ticks feed mainly on mice. Mice feed on acorns. An experiment compared two similar forest areas in a year with low acorn amounts. One area was supplied with large amounts of acorns, and the other was left untouched. The next spring, the mice populations were compared:

                          trapped mice   breeding mice
Area 1: high in acorns         72              54
Area 2: low in acorns          17              10

Find a large sample 95% confidence interval for the difference in proportion of breeders in high acorn and low acorn areas.

Solution for the large sample 95% confidence interval:
$$(\hat p_X - \hat p_Y) \pm z^* SE_D = \left(\frac{54}{72} - \frac{10}{17}\right) \pm 1.96 \sqrt{\frac{\frac{54}{72}\left(1 - \frac{54}{72}\right)}{72} + \frac{\frac{10}{17}\left(1 - \frac{10}{17}\right)}{17}} = 0.1617647 \pm 0.2544338.$$

Thus the answer is (-0.09, 0.42) (don't imply more accuracy than there is).
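A quick numerical check of the mice interval is sketched below. The function `two_proportion_ci` is a hypothetical helper and not something the slides define. It uses the unpooled standard error $SE_D$ and the unrounded 95% critical value from `statistics.NormalDist`. The point estimate it computes, 54/72 - 10/17, is about 0.1618.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error SE_D."""
    p1, p2 = x1 / n1, x2 / n2
    se_d = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    margin = z_star * se_d
    return diff, margin, (diff - margin, diff + margin)


# Breeding mice: 54 of 72 (high acorns) versus 10 of 17 (low acorns).
diff, margin, (low, high) = two_proportion_ci(54, 72, 10, 17)
print(f"{diff:.4f} +/- {margin:.4f} -> ({low:.2f}, {high:.2f})")
```

Rounded to two decimals, the endpoints agree with the reported interval (-0.09, 0.42).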

Theorem (Difference Between Two Proportions)
Let $X_1, \dots, X_{n_X}$ and $Y_1, \dots, Y_{n_Y}$ be independent random samples where $X_j \sim \text{BIN}(1, p_X)$ and $Y_k \sim \text{BIN}(1, p_Y)$. Let
$$H_0: p_X = p_Y = p$$
where $p$ is unknown. Define the pooled estimate, $\hat p$, and the pooled standard error of $\hat p_X$ and $\hat p_Y$ to be
$$\hat p \overset{\text{def}}{=} \frac{n_X \hat p_X + n_Y \hat p_Y}{n_X + n_Y} \quad\text{and}\quad SE_{D_p} \overset{\text{def}}{=} \sqrt{\frac{\hat p(1-\hat p)}{n_X} + \frac{\hat p(1-\hat p)}{n_Y}} = \sqrt{\hat p(1-\hat p)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}$$
and the test statistic for $H_0$ to be
$$z = \frac{\hat p_X - \hat p_Y}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_X} + \dfrac{1}{n_Y}\right)}} = \frac{\hat p_X - \hat p_Y}{SE_{D_p}} \sim N(0, 1).$$

Warning
Use this method only when the number of heads and tails (successes and failures) in each sample is at least 5.

Example: Gastric Freezing
Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes to cool the stomach for an hour in the hope of reducing acid production and relieving ulcer pain. The treatment was shown to be safe and to significantly reduce ulcer pain, and it was widely used for years.

A randomized comparative experiment later compared the outcome of gastric freezing with that of a placebo: 28 of the 82 patients subjected to gastric freezing improved, while 30 of the 78 in the control group improved.

$$H_0: p_{gf} = p_{placebo} \qquad H_a: p_{gf} > p_{placebo}$$

Example (cont.)
Results:
28 of the 82 patients subjected to gastric freezing improved.
30 of the 78 patients in the control group improved.

$$H_0: p_{gf} = p_{placebo} \qquad H_a: p_{gf} > p_{placebo}$$
$$\hat p_{pooled} = \frac{28 + 30}{82 + 78} = 0.3625$$
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.342 - 0.385}{\sqrt{0.3625 \times 0.6375 \left(\dfrac{1}{82} + \dfrac{1}{78}\right)}} = \frac{-0.043}{0.076} = -0.57$$

The P-value is greater than 50%...

[Figure: horizontal axis $\hat p_{gf} - \hat p_{placebo}$ with ticks at -0.3, 0.0, 0.3]

Gastric freezing was not significantly better than a placebo (P-value > 0.1), and this treatment was abandoned. ALWAYS USE A CONTROL!!!
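The pooled test statistic for the gastric freezing data can be recomputed as follows. This is a sketch under the assumption of the upper-tailed alternative $H_a: p_{gf} > p_{placebo}$, and `two_proportion_z_test` is an illustrative name, not one used in the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Upper-tailed pooled z test of H0: p1 = p2 against Ha: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both samples estimate one common p, so pool the counts.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pool
    p_value = 1 - NormalDist().cdf(z)
    return p_pool, z, p_value


# Gastric freezing: 28 of 82 improved; placebo: 30 of 78 improved.
p_pool, z, p_value = two_proportion_z_test(28, 82, 30, 78)
print(f"pooled p = {p_pool:.4f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

The statistic is negative because the placebo group improved at a higher rate, which is why the upper-tail P-value exceeds 50%.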

Odds Ratios

Consider

             Disease   No Disease
Treatment       a          b
Placebo         c          d

Definition
$$OR = \text{odds ratio} = \frac{\text{odds of disease for treatment group}}{\text{odds of disease for placebo group}} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$

Notice
1 OR is a point estimator.
2 OR > 1 ⇒ better to be in the placebo group.
3 OR < 1 ⇒ better to be in the treatment group.
4 smaller OR is better.

Theorem ($(1-\alpha)100\%$ CI for OR)
$$\left( OR \cdot e^{-z_{\alpha/2}\sqrt{1/a + 1/b + 1/c + 1/d}},\; OR \cdot e^{z_{\alpha/2}\sqrt{1/a + 1/b + 1/c + 1/d}} \right)$$
1 $1 \in$ CI ⇒ treatment has no effect.
2 $1 \notin$ CI ⇒ treatment has an effect.

Example
Consider

             Disease   No Disease
Treatment      45         34
Placebo        56         52

Find a 95% confidence interval for the odds ratio.

Example (cont.)
Note that
$$OR = \frac{45 \cdot 52}{34 \cdot 56} = \frac{585}{476}$$
so the 95% confidence interval for the odds ratio is
$$\left( \frac{585}{476}\, e^{-1.96\sqrt{1/45 + 1/34 + 1/56 + 1/52}},\; \frac{585}{476}\, e^{1.96\sqrt{1/45 + 1/34 + 1/56 + 1/52}} \right) \approx (0.68549, 2.20341).$$

Since 1 lies in the interval, one can't be 95% confident that the treatment has an effect. Because OR > 1 means the odds of disease are higher in the treatment group, if one had to guess, one would guess that the treatment does not help.
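The odds ratio interval can be verified numerically. The sketch below is illustrative only: `odds_ratio_ci` is a made-up function name, and the cell labels a, b, c, d follow the 2×2 layout used on these slides (rows Treatment and Placebo, columns Disease and No Disease). Evaluating the formula for this table gives an interval of roughly (0.685, 2.203).

```python
from math import exp, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio ad/(bc) with a CI built on the log-odds-ratio scale."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table of counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return odds_ratio, (odds_ratio * exp(-z * se_log_or),
                        odds_ratio * exp(z * se_log_or))


# Treatment: 45 disease, 34 no disease. Placebo: 56 disease, 52 no disease.
odds_ratio, (low, high) = odds_ratio_ci(45, 34, 56, 52)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print("1 inside CI:", low < 1 < high)
```

The interval is symmetric on the log scale, so the two endpoints multiply to $OR^2$. That property gives a convenient sanity check on hand-computed endpoints.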