Section Inference for a Single Proportion

Similar documents
Section Comparing Two Proportions

Inference for Binomial Parameters

Two Sample Problems. Two sample problems

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

Common Discrete Distributions

Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success

Loglikelihood and Confidence Intervals

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

SplineLinear.doc 1 # 9 Last save: Saturday, 9. December 2006

One-sample categorical data: approximate inference

Inference for Proportions

Lecture 11 - Tests of Proportions

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

Medical statistics part I, autumn 2010: One sample test of hypothesis

Inference for Proportions

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

1 Statistical inference for a population mean

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Lecture 7: Confidence interval and Normal approximation

Confidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t

ACMS Statistics for Life Sciences. Chapter 13: Sampling Distributions

Discrete Multivariate Statistics

Advanced Herd Management Probabilities and distributions

Lecture #16 Thursday, October 13, 2016 Textbook: Sections 9.3, 9.4, 10.1, 10.2

Nonlinear Regression Functions

Data Analysis and Statistical Methods Statistics 651

Dealing with the assumption of independence between samples - introducing the paired design.

One-stage dose-response meta-analysis

Sociology 6Z03 Review II

1 Independent Practice: Hypothesis tests for one parameter:

Violating the normal distribution assumption. So what do you do if the data are not normal and you still need to perform a test?

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

FinalExamReview. Sta Fall Provided: Z, t and χ 2 tables

Using R in Undergraduate and Graduate Probability and Mathematical Statistics Courses*

Lecture 10: Introduction to Logistic Regression

An overview of applied econometrics

REVIEW: Midterm Exam. Spring 2012

Chapter 1. The data we first collected was the diameter of all the different colored M&Ms we were given. The diameter is in cm.

Tests for Population Proportion(s)

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

Chapter 8: Sampling Variability and Sampling Distributions

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

MATH Chapter 21 Notes Two Sample Problems

Inference for Single Proportions and Means T.Scofield

Mathematical statistics

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

L6: Regression II. JJ Chen. July 2, 2015

Chapter 6. Chapter A random sample of size n = 452 yields 113 successes. Calculate the 95% confidence interval

Categorical Data Analysis Chapter 3

Bootstrap. ADA1 November 27, / 38

Chapter 18 Sampling Distribution Models

ANOVA: Analysis of Variation

Section Least Squares Regression

Introduction to Statistical Analysis. Cancer Research UK 12 th of February 2018 D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core]

CS 543 Page 1 John E. Boon, Jr.

Some Review Problems for Exam 2: Solutions

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Two-sample Categorical data: Testing

Applied Statistics and Econometrics

Ch18 links / ch18 pdf links Ch18 image t-dist table

Introduction to Crossover Trials

Math 143: Introduction to Biostatistics

STAT 705: Analysis of Contingency Tables

Lecture 10: Comparing two populations: proportions

Inference for Distributions Inference for the Mean of a Population

E509A: Principle of Biostatistics. GY Zou

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

Likelihood Construction, Inference for Parametric Survival Distributions

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression

The Difference in Proportions Test

Empirical Evaluation (Ch 5)

Business Statistics. Lecture 5: Confidence Intervals

Model Checking and Improvement

Review of Statistics 101

Correlation and Simple Linear Regression

Confidence intervals for kernel density estimation

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

An interval estimator of a parameter θ is of the form θl < θ < θu at a

13. Sampling distributions

Chapter 3. Estimation of p. 3.1 Point and Interval Estimates of p

Expected Value - Revisited

Chapter 11. Regression with a Binary Dependent Variable

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Test Strategies for Experiments with a Binary Response and Single Stress Factor Best Practice

Many natural processes can be fit to a Poisson distribution

STA Module 10 Comparing Two Proportions

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

Sample Size Determination

Weizhen Wang & Zhongzhan Zhang

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics.

Math 141. Lecture 10: Confidence Intervals. Albyn Jones 1. jones/courses/ Library 304. Albyn Jones Math 141

Introduction to Econometrics

Transcription:

Section 8.1 - Inference for a Single Proportion Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin

Inference for a Single Proportion For most of what follows, we will be making two assumptions 1. The response variable is based on X Bin(n, p), i.e. we observe the number of successes X or the proportions of successes ˆp. 2. The sample size n is large, so the normal approximation to the binomial is valid. e.g. ˆp approx. N ( ) p(1 p) p, n Two problems of interest: 1. Confidence interval for p 2. Tests on the hypothesis H 0 : p = p 0 Section 8.1 - Inference for a Single Proportion 1

If the normal approximation for ˆp is reasonable, we can use procedures similar to the ones for the mean of a single population. Confidence Interval for p The confidence interval is based on the Wilson estimate for p p = X + 2 n + 4 This is similar to the sample proportion but it adds 2 successes and two failures to the data. This pulls the estimate of p towards 1 2 slightly. The standard error of p is SE p = p(1 p) n + 4 Section 8.1 - Inference for a Single Proportion 2

Then an approximate level C confidence interval for p is where z is a normal critical value. p ± z SE p = p ± z p(1 p) n + 4 Note: This Wilson estimator is only used for calculating the confidence interval. You still want to use ˆp as your best guess for p. Example: Use of Aspirin to prevent strokes 155 patients who have had a previous stroke 78 received a daily aspirin dose 77 received a placebo The response of interest was no stroke in a 6 month period after starting treatment. Section 8.1 - Inference for a Single Proportion 3

Aspirin group: 63 or 78 with no strokes Placebo group: 43 of 77 with no strokes What are the 95% CI for have no addition strokes in the two groups. Group ˆp p 63 Aspirin 78 = 0.808 63+2 78+4 = 0.793 43 Placebo 77 = 0.558 43+2 77+4 = 0.556 Group Aspirin Placebo SE p 0.793 0.207 78+4 = 0.0447 0.556 0.444 77+4 = 0.0552 95% confidence z = 1.96 Section 8.1 - Inference for a Single Proportion 4

Aspirin: CI = 0.793 ± 1.96 0.0447 = 0.793 ± 0.088 = (0.705, 0.881) Placebo: CI = 0.556 ± 1.96 0.0552 = 0.556 ± 0.108 = (0.448, 0.664) Since the confidence intervals don t overlap, it appears that giving an aspirin helps prevent strokes. (A hypothesis test for comparing two binomial proportions will be discussed later.) Section 8.1 - Inference for a Single Proportion 5

Stata output:. prtesti 78 63 0.5,count One-sample test of proportion x: Number of obs = 78 ------------------------------------------------------------- Variable Mean Std. Err. [95% Conf. Interval] -------------+----------------------------------------------- x.8076923.0446246.7202298.8951548 ------------------------------------------------------------- So the output doesn t match the suggested interval. However we can make it match by adding 2 successes and 2 failures to the data.. prtesti 82 65 0.5,count One-sample test of proportion x: Number of obs = 82 ------------------------------------------------------------- Variable Mean Std. Err. [95% Conf. Interval] -------------+----------------------------------------------- x.7926829.0447672.7049407.8804251 ------------------------------------------------------------- Section 8.1 - Inference for a Single Proportion 6

This Wilson based interval (also known as the Agresti-Coull Add Two Success and Two Failures interval) is something that has started to become more popular recently since it has been shown to have better properties that the traditional interval, such as those calculated by Stata s prtest and prtesti. The traditional approach for the binomial confidence interval is ˆp ± z SEˆp = ˆp ± z ˆp(1 ˆp) n The traditional approach gives intervals of (0.720,895) and (0.448, 0.669). Note that the Wilson CI for p is not symmetric around ˆp, the standard estimate of p. Instead it is pulled towards 1 2. This is desirable, since the binomial distribution is not symmetric about its mean, unless p = 1 2. The asymmetry matches the skewness of the binomial distribution. Section 8.1 - Inference for a Single Proportion 7

What happens if n is small? Lets assume we did a study where there was 1 success in n = 10 trials. 95% Wilson CI for p in this case is (-0.00065, 0.429). The traditional interval is (-0.086, 0.286). Both these intervals give values below 0, which is impossible for a probability. Stata (and other packages) have routines for calculating binomial confidence intervals for p that don t depend on the normal approximation. The Stata functions ci and cii base the confidence intervals with exact calculations with the binomial distribution.. cii 10 1 -- Binomial Exact -- Variable Obs Mean Std. Err. [95% Conf. Interval] -------------+----------------------------------------------------- 10.1.0948683.0025286.4450161 This interval avoids the problem of having values less than 0 or greater than 1. Section 8.1 - Inference for a Single Proportion 8

This interval approach can be used for larger samples as well. For the aspirin group Exact interval: (0.7027, 0.8882) Wilson interval: (0.7049, 0.8804) Traditional interval: (0.7202, 0.8952) This illustrates the point suggesting the the Wilson interval has better coverage properties. Section 8.1 - Inference for a Single Proportion 9

Significance Test for p The test statistic for examining H 0 : p = p 0 is z = ˆp p 0 p 0 (1 p 0 ) n If n is large, z is approximately normal. A rule thumb for n being large is np 0 10 and n(1 p 0 ) 10. To examine how this test works, lets look at whether H 0 : p = 1 2 either the aspirin or the placebo group. holds for Aspirin: z = 0.808 0.5 0.5(1 0.5) 78 = 5.435 Section 8.1 - Inference for a Single Proportion 10

Placebo: z = 0.558 0.5 0.5(1 0.5) 77 = 1.026 The asymptotic p-values for the z test are given by H A : p > p 0 p-value = P [Z z obs ] H A : p < p 0 p-value = P [Z z obs ] H A : p p 0 p-value = 2 P [Z z obs ] So the two-sided p-values for these tests are Aspirin: 5.48e-08 Placebo: 0.305 Section 8.1 - Inference for a Single Proportion 11

Stata output (Placebo group). prtesti 77 43 0.5,count One-sample test of proportion x: Number of obs = 77 ---------------------------------------------------------------- Variable Mean Std. Err. [95% Conf. Interval] -------------+-------------------------------------------------- x.5584416.0565897.4475277.6693554 ---------------------------------------------------------------- Ho: proportion(x) =.5 Ha: x <.5 Ha: x!=.5 Ha: x >.5 z = 1.026 z = 1.026 z = 1.026 P < z = 0.8475 P > z = 0.3051 P > z = 0.1525 Section 8.1 - Inference for a Single Proportion 12

Exact test When n is small, this asymptotic test may give poor answers as the normal approximation to the binomial breaks down. Instead of using the normal distribution to get p-values, we can use the binomial distribution to calculate them exactly. Exact p-values: H A : p > p 0 H A : p < p 0 H A : p p 0 p-value = P [X x obs ] = p u p-value = P [X x obs ] = p l p-value = complicated but approximately 2 min(p u, p l ) This exact test can be done in Stata with bitest and bitesti Section 8.1 - Inference for a Single Proportion 13

. bitesti 77 43 0.5 N Observed k Expected k Assumed p Observed p ------------------------------------------------------------ 77 43 38.5 0.50000 0.55844 Pr(k >= 43) = 0.181016 (one-sided test) Pr(k <= 43) = 0.872848 (one-sided test) Pr(k <= 34 or k >= 43) = 0.362032 (two-sided test) As with the one-sample t test, testing a single proportion usually isn t particularly interesting. Usually of more interest is a test comparing whether two proportions are the same. For example, is the stroke rate in the aspirin group the same as the rate in the placebo group. (To come next class) However, this binomial test can be useful as a non-parametric test in the paired sample comparison when the normality assumption is not reasonable. Instead of looking at A B, you can look at whether A is greater or less than B. Section 8.1 - Inference for a Single Proportion 14

If the distribution of A and B is the same, then P [A > B] = P [A < B] = 1 2. So instead of doing a t test on A B, you can do a test on whether the sample proportion of times that A > B differs from 1 2. (For this test, you usually toss out the observations where A = B. This test is often referred to as the Sign test. Example: Shoe soles In the 10 pairs of shoes, material A has greater wear than material B 8 times. H 0 : p = 1 2 vs H A : p 1 2 p value = 2 P [ˆp ˆp obs ] = 2 P [X x obs ] = 2 P [X x obs ] = 0.109 Section 8.1 - Inference for a Single Proportion 15

. bitesti 10 8 0.5 N Observed k Expected k Assumed p Observed p ------------------------------------------------------------ 10 8 5 0.50000 0.80000 Pr(k >= 8) = 0.054688 (one-sided test) Pr(k <= 8) = 0.989258 (one-sided test) Pr(k <= 2 or k >= 8) = 0.109375 (two-sided test) The Sign test is usually less powerful than the paired t test. You usually one want to use it if the assumptions for the t test are strongly violated since the t test is fairly robust. Section 8.1 - Inference for a Single Proportion 16