Inference for Proportions Marc H. Mehlman marcmehlman@yahoo.com University of New Haven Based on Rare Event Rule: rare events happen but not to me. (University of New Haven) Inference for Proportions 1 / 22
Table of Contents
1 Inference for a Single Proportion
2 Comparing Two Proportions
3 Odds Ratios
Inference for a Single Proportion
Let $X_1, \dots, X_n$ be a random sample from $\mathrm{BIN}(1, p)$. Then $X = \sum_{j=1}^{n} X_j \sim \mathrm{BIN}(n, p)$.

Definition
The sample proportion is $\hat p \overset{\text{def}}{=} \frac{X}{n} = \bar X$. The standard error of $\hat p$ is $SE_{\hat p} \overset{\text{def}}{=} \sqrt{\frac{\hat p(1-\hat p)}{n}}$.

By the CLT, $\bar X$ is approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$ for big $n$, and also $\hat p$ is approximately $p$ for big $n$. Thus for big $n$, $\bar X$ is approximately $N\left(p, \sqrt{\frac{\hat p(1-\hat p)}{n}}\right) = N(p, SE_{\hat p})$.

Theorem (Large Sample Confidence Interval for p)
$$\text{margin of error} = m = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}} = z^* \, SE_{\hat p}$$
and the confidence interval is $\hat p \pm m$. Use this interval for confidence levels of 90% or more and when the numbers of successes and failures are both at least 15.
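A minimal Python sketch of the large-sample interval above, assuming the standard library's `statistics.NormalDist` to obtain $z^*$ instead of a printed table. The function name `prop_ci` is hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prop_ci(successes: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Large-sample interval p_hat +/- z* * SE for a single proportion."""
    failures = n - successes
    # Rule of thumb from the theorem: at least 15 successes and 15 failures.
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)             # SE of p_hat
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # leaves (1 - conf)/2 in the upper tail
    m = z_star * se                                # margin of error
    return p_hat - m, p_hat + m


# Arthritis data from the next example: 23 of 440 patients, 90% confidence.
lo, hi = prop_ci(23, 440, conf=0.90)
print(f"({lo:.3f}, {hi:.3f})")
```

This prints `(0.035, 0.070)`; an upper limit of 6.9% instead results from rounding $\hat p$ to 0.052 before adding the margin.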
We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion $\hat p$?
$$\hat p = \frac{23}{440} \approx 0.052$$

For a 90% confidence level, $z^* = 1.645$.

| Confidence level C | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 0.95 | 0.96 |
|---|---|---|---|---|---|---|---|
| $z^*$ | 0.674 | 0.841 | 1.036 | 1.282 | 1.645 | 1.960 | 2.054 |

Using the large-sample method:
$$m = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}} = 1.645 \sqrt{\frac{0.052(1-0.052)}{440}} \approx 1.645 \times 0.0106 \approx 0.017$$

90% CI for $p$: $\hat p \pm m \approx 0.052 \pm 0.017$.

With 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.
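The $z^*$ row of the table can be reproduced from the standard normal quantile function, since $z^*$ is the value with $P(-z^* < Z < z^*) = C$. This sketch assumes Python's `statistics.NormalDist`; the dictionary name `z_star` is illustrative.

```python
from statistics import NormalDist

# Confidence levels C from the table; z* satisfies P(-z* < Z < z*) = C.
levels = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.96]
z_star = {c: NormalDist().inv_cdf((1 + c) / 2) for c in levels}

for c, z in z_star.items():
    print(f"C = {c:.2f}   z* = {z:.4f}")
```

The computed values agree with the table to within 0.001; for example, the exact $z^*$ for C = 0.60 is about 0.8416, which the table lists as 0.841.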
Theorem (Sample Size)
Given a desired margin of error $m$, one should choose the following sample size $n$ to obtain a confidence interval $\hat p \pm m$ for $p$:
$$n = \begin{cases} \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) & \text{when } p^* \text{ is an educated guess of what } p \text{ is,} \\ \left(\frac{z^*}{2m}\right)^2 & \text{with no educated guess of } p. \end{cases}$$
Note:
1. Round $n$ up to ensure it is a positive integer.
2. The closer one's educated guess $p^*$ is to 1/2, the safer one is, since $p^*(1-p^*)$ is largest at $p^* = 1/2$ and therefore gives the largest $n$.
3. $n = \frac{(z^*)^2}{4m^2}$ (i.e., when $p^* = 1/2$) is the most conservative estimate of $n$.
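A small Python sketch of the sample-size rule, assuming `statistics.NormalDist` for $z^*$ when no rounded table value is supplied. The function `sample_size` and its parameters are hypothetical names; leaving `p_guess` unset falls back to $p^* = 1/2$, which reproduces the conservative $(z^*/2m)^2$ formula.

```python
from math import ceil
from statistics import NormalDist
from typing import Optional


def sample_size(m: float, conf: float = 0.95, p_guess: Optional[float] = None,
                z_star: Optional[float] = None) -> int:
    """Smallest n whose margin of error is at most m for a single proportion."""
    if z_star is None:
        z_star = NormalDist().inv_cdf((1 + conf) / 2)
    if p_guess is None:
        # No educated guess: p* = 1/2 gives n = (z*/(2m))^2, the conservative choice.
        p_guess = 0.5
    n = (z_star / m) ** 2 * p_guess * (1 - p_guess)
    return ceil(n)  # always round up


print(sample_size(0.01, z_star=1.645, p_guess=0.1))  # table z*, guess p* = 0.1
print(sample_size(0.01, z_star=1.645))               # no guess, p* = 1/2
```

With the table value $z^* = 1.645$ the first call gives 2436 and the second 6766. Passing `conf=0.90` instead uses the unrounded $z^* \approx 1.6449$, which happens to give 2435 for $p^* = 0.1$.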
What sample size would we need in order to achieve a margin of error no more than 0.01 (1 percentage point) with a 90% confidence level?

We could use 0.5 for our guessed $p^*$. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer adverse symptoms (a better guess than 50%).

For a 90% confidence level, $z^* = 1.645$.
$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$$

To obtain a margin of error no more than 0.01, we need a sample size $n$ of at least 2436 arthritis patients (rounding 2435.4 up).
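A quick arithmetic check of why $n$ is rounded up rather than to the nearest integer, assuming $z^* = 1.645$ and $p^* = 0.1$ as in this example and that $\hat p$ turns out equal to $p^*$. The helper `margin` is a hypothetical name.

```python
from math import sqrt

z_star, p_star = 1.645, 0.1


def margin(n: int) -> float:
    """Margin of error z* * sqrt(p*(1 - p*)/n), taking p_hat equal to the guess p*."""
    return z_star * sqrt(p_star * (1 - p_star) / n)


for n in (2435, 2436):
    print(n, round(margin(n), 6), margin(n) <= 0.01)
```

With $n = 2435$ the margin is just above 0.01, while $n = 2436$ brings it just below.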
Theorem (Large Sample z Test for a Population Proportion)
Let $X_1, \dots, X_n$ be a random sample where $X_j \sim \mathrm{BIN}(1, p)$ and $p$ is unknown. Let
$$H_0: p = p_0,$$
and suppose $np_0 \ge 10$ and $n(1-p_0) \ge 10$. Then
$$z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \approx N(0, 1) \text{ when } H_0 \text{ is true,}$$
so $z$ is a test statistic for $H_0$.
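A minimal Python sketch of this z test, assuming `statistics.NormalDist` for normal tail areas. The function `one_prop_z_test` and its `alternative` argument are hypothetical names. Note that the standard error uses $p_0$, not $\hat p$, because it is computed under $H_0$.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes: int, n: int, p0: float,
                    alternative: str = "greater") -> tuple[float, float]:
    """Large-sample z test of H0: p = p0; returns (z, P-value)."""
    # Conditions from the theorem: n*p0 >= 10 and n*(1 - p0) >= 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE computed under H0
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Potato data from the example that follows: 47 blemished out of 500, H0: p = 0.08.
z, p = one_prop_z_test(47, 500, 0.08, alternative="greater")
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

Here $z \approx 1.15$, and the P-value is about 0.124, slightly below a table lookup at $z = 1.15$ because $z$ is not rounded first.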
Example
A potato-chip producer has just received a truckload of potatoes from its main supplier. If the producer determines that more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the α = 0.10 significance level. What should the producer conclude?

We want to perform a test at the α = 0.10 significance level of
$$H_0: p = 0.08 \qquad H_a: p > 0.08$$
where $p$ is the actual proportion of potatoes in this shipment with blemishes.

If conditions are met, we should do a one-sample z test for the population proportion $p$.
Random: The supervisor took a random sample of 500 potatoes from the shipment.
Normal: Assuming $H_0: p = 0.08$ is true, the expected numbers of blemished and unblemished potatoes are $np_0 = 500(0.08) = 40$ and $n(1-p_0) = 500(0.92) = 460$, respectively. Because both of these values are at least 10, we should be safe doing Normal calculations.
Example
The sample proportion of blemished potatoes is $\hat p = 47/500 = 0.094$.

Test statistic:
$$z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.094 - 0.08}{\sqrt{\frac{0.08(0.92)}{500}}} \approx 1.15$$

P-value: The desired P-value is
$$P(Z \ge 1.15) = 1 - 0.8749 = 0.1251.$$

Since our P-value, 0.1251, is greater than the chosen significance level of α = 0.10, we fail to reject $H_0$. There is not sufficient evidence to conclude that the shipment contains more than 8% blemished potatoes. The producer will use this truckload of potatoes to make potato chips.
Comparing Two Proportions
Comparing 2 independent samples
We often need to compare 2 treatments with 2 independent samples. For large enough samples, the sampling distribution of $(\hat p_1 - \hat p_2)$ is approximately Normal. However, neither $p_1$ nor $p_2$ is known.
Given two independent random samples, $X_1, \dots, X_{n_X}$ and $Y_1, \dots, Y_{n_Y}$, where $X_i \sim \mathrm{BIN}(1, p_X)$ and $Y_j \sim \mathrm{BIN}(1, p_Y)$, define $D \overset{\text{def}}{=} \hat p_X - \hat p_Y$. Notice that
1. $D$ is approximately normal for large $n_X$ and $n_Y$.
2. $\mu_D = \mu_{\hat p_X} - \mu_{\hat p_Y} = p_X - p_Y$.
3. $\sigma_D^2 = \sigma_{\hat p_X}^2 + \sigma_{\hat p_Y}^2 = \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}$ (the variances add because the two samples are independent).

Definition
One can approximate $\sigma_D = \sqrt{\frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}}$ with the standard error of $D$,
$$SE_D \overset{\text{def}}{=} \sqrt{\frac{\hat p_X(1-\hat p_X)}{n_X} + \frac{\hat p_Y(1-\hat p_Y)}{n_Y}}.$$
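Facts 2 and 3 can be checked by simulation. The sketch below assumes hypothetical parameter values ($p_X = 0.75$, $n_X = 72$, $p_Y = 0.60$, $n_Y = 17$) chosen only for illustration, draws many independent pairs of samples with Python's `random` module, and compares the empirical mean and standard deviation of $D$ with $p_X - p_Y$ and $\sigma_D$. It does not test the normal approximation in fact 1.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical parameters, for illustration only.
p_x, n_x = 0.75, 72
p_y, n_y = 0.60, 17
reps = 20_000
rng = random.Random(2024)  # fixed seed so the run is reproducible


def sample_prop(p: float, n: int) -> float:
    """Sample proportion of n independent BIN(1, p) trials."""
    return sum(rng.random() < p for _ in range(n)) / n


# Each replicate draws independent X and Y samples and records D = p_hat_X - p_hat_Y.
diffs = [sample_prop(p_x, n_x) - sample_prop(p_y, n_y) for _ in range(reps)]

sigma_d = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
print(f"mean of D: {mean(diffs):.4f}   theory p_X - p_Y = {p_x - p_y:.4f}")
print(f"sd of D:   {stdev(diffs):.4f}   theory sigma_D   = {sigma_d:.4f}")
```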
Thus for large $n_X$ and $n_Y$, $D$ is approximately $N(p_X - p_Y, SE_D)$. This gives

Theorem (Large Sample CI for Difference Between Two Proportions)
A $(1-\alpha)100\%$ CI for $p_X - p_Y$ is
$$(\hat p_X - \hat p_Y) \pm z^* \, SE_D,$$
where $z^* = z_{\alpha/2}$.

Warning
Use this method only when the numbers of heads and tails are at least 10 for each sample.
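A minimal Python sketch of this interval, assuming `statistics.NormalDist` for $z^*$. The function `two_prop_ci` is a hypothetical name; it emits a Python warning rather than refusing to compute when the at-least-10 condition fails.

```python
import warnings
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Large-sample CI (p1_hat - p2_hat) +/- z* * SE_D; returns (estimate, low, high)."""
    # Warning from the theorem: need at least 10 successes and 10 failures per sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        warnings.warn("fewer than 10 successes or failures in a sample")
    p1, p2 = x1 / n1, x2 / n2
    se_d = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE_D
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    d = p1 - p2
    return d, d - z_star * se_d, d + z_star * se_d


# Lyme disease data from the next example: 54 of 72 vs 10 of 17 breeding mice.
d, lo, hi = two_prop_ci(54, 72, 10, 17)
print(f"D = {d:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

For these data the function warns (Area 2 has only 7 non-breeding mice) and returns $D \approx 0.1618$ with interval $(-0.09, 0.42)$.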
Example
Lyme disease is spread by infected ticks. Ticks feed mainly on mice. Mice feed on acorns. An experiment compared two similar forest areas in a year with low acorn amounts. One area was supplied with large amounts of acorns, and the other was left untouched. The next spring, mouse populations were compared:

| | trapped mice | breeding mice |
|---|---|---|
| Area 1: high in acorns | 72 | 54 |
| Area 2: low in acorns | 17 | 10 |

Find a large sample 95% confidence interval for the difference in proportion of breeders in high acorn and low acorn areas.

Solution for Large Sample 95% confidence interval:
$$(\hat p_X - \hat p_Y) \pm z^* SE_D = \left(\frac{54}{72} - \frac{10}{17}\right) \pm 1.96 \sqrt{\frac{\frac{54}{72}\left(1 - \frac{54}{72}\right)}{72} + \frac{\frac{10}{17}\left(1 - \frac{10}{17}\right)}{17}} \approx 0.1618 \pm 0.2544.$$
Thus the answer is $(-0.09, 0.42)$ (don't imply more accuracy than there is). Note that Area 2 has only $17 - 10 = 7$ non-breeding mice, fewer than the 10 required by the warning above, so this interval is only a rough guide.
Theorem (Difference Between Two Proportions)
Let $X_1, \dots, X_{n_X}$ and $Y_1, \dots, Y_{n_Y}$ be independent random samples where $X_j \sim \mathrm{BIN}(1, p_X)$ and $Y_k \sim \mathrm{BIN}(1, p_Y)$. Let
$$H_0: p_X = p_Y = p$$
where $p$ is unknown. Define the pooled estimate $\hat p$ of $p$ and the pooled standard error of $D = \hat p_X - \hat p_Y$ to be
$$\hat p \overset{\text{def}}{=} \frac{n_X \hat p_X + n_Y \hat p_Y}{n_X + n_Y} \quad\text{and}\quad SE_{D_p} \overset{\text{def}}{=} \sqrt{\frac{\hat p(1-\hat p)}{n_X} + \frac{\hat p(1-\hat p)}{n_Y}} = \sqrt{\hat p(1-\hat p)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)},$$
and let the test statistic for $H_0$ be
$$z = \frac{\hat p_X - \hat p_Y}{SE_{D_p}} \approx N(0, 1).$$

Warning
Use this method only when the numbers of heads and tails in each sample are at least 5.
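A minimal Python sketch of the pooled test, assuming `statistics.NormalDist` for tail areas. The function `two_prop_z_test` and its `alternative` argument are hypothetical names. The pooled estimate is computed from raw counts, which equals $(n_X \hat p_X + n_Y \hat p_Y)/(n_X + n_Y)$.

```python
import warnings
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1: int, n1: int, x2: int, n2: int, alternative: str = "greater"):
    """Pooled z test of H0: p1 = p2; returns (z, P-value)."""
    # Warning from the theorem: need at least 5 successes and 5 failures per sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        warnings.warn("fewer than 5 successes or failures in a sample")
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # equals (n1*p1 + n2*p2) / (n1 + n2)
    se_dp = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_dp
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Gastric freezing data from the example that follows: 28/82 vs 30/78, Ha: p_gf > p_placebo.
z, p = two_prop_z_test(28, 82, 30, 78, alternative="greater")
print(f"z = {z:.2f}, P-value = {p:.3f}")
```

For the gastric freezing data this gives $z \approx -0.57$ and a one-sided P-value of about 0.715.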
Example: Gastric Freezing
Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes to cool the stomach for an hour in the hope of reducing acid production and relieving ulcer pain. The treatment was shown to be safe and to significantly reduce ulcer pain, and it was widely used for years.

A randomized comparative experiment later compared the outcome of gastric freezing with that of a placebo: 28 of the 82 patients subjected to gastric freezing improved, while 30 of the 78 in the control group improved.
$$H_0: p_{gf} = p_{placebo} \qquad H_a: p_{gf} > p_{placebo}$$
Example (cont.)
Results: 28 of the 82 patients subjected to gastric freezing improved; 30 of the 78 patients in the control group improved.
$$H_0: p_{gf} = p_{placebo} \qquad H_a: p_{gf} > p_{placebo}$$
$$\hat p_{pooled} = \frac{28 + 30}{82 + 78} = 0.3625$$
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.342 - 0.385}{\sqrt{0.3625 \times 0.6375\left(\frac{1}{82} + \frac{1}{78}\right)}} = \frac{-0.043}{0.076} \approx -0.57$$
The P-value is greater than 50%: $P(Z \ge -0.57) \approx 0.72$.
[Figure: sampling distribution of $\hat p_{gf} - \hat p_{placebo}$ under $H_0$, horizontal axis from -0.3 to 0.3]
Gastric freezing was not significantly better than a placebo (P-value > 0.1), and this treatment was abandoned. ALWAYS USE A CONTROL!!!
Odds Ratios
Consider

| | Disease | No Disease |
|---|---|---|
| Treatment | a | b |
| Placebo | c | d |

Definition
$$OR = \text{odds ratio} = \frac{\text{odds of disease for treatment group}}{\text{odds of disease for placebo group}} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$
Notice
1. $OR$ is a point estimator of the population odds ratio.
2. $OR > 1 \Rightarrow$ better to be in the placebo group.
3. $OR < 1 \Rightarrow$ better to be in the treatment group.
4. Smaller $OR$ is better.
Theorem ($(1-\alpha)100\%$ CI for OR)
$$\left(OR \, e^{-z_{\alpha/2}\sqrt{1/a + 1/b + 1/c + 1/d}}, \; OR \, e^{z_{\alpha/2}\sqrt{1/a + 1/b + 1/c + 1/d}}\right)$$
1. $1 \in$ CI $\Rightarrow$ the data do not show that the treatment has an effect.
2. $1 \notin$ CI $\Rightarrow$ the treatment has an effect.

Example
Consider

| | Disease | No Disease |
|---|---|---|
| Treatment | 45 | 34 |
| Placebo | 56 | 52 |

Find a 95% confidence interval for the odds ratio.
Example (Cont.)
Note that
$$OR = \frac{45 \cdot 52}{34 \cdot 56} = \frac{585}{476} \approx 1.229,$$
so the 95% confidence interval for the odds ratio is
$$\left(\frac{585}{476} e^{-1.96\sqrt{1/45 + 1/34 + 1/56 + 1/52}}, \; \frac{585}{476} e^{1.96\sqrt{1/45 + 1/34 + 1/56 + 1/52}}\right) \approx (0.6855, 2.2034).$$
Since 1 lies in this interval, one can't be 95% confident that the treatment has any effect; but since $OR > 1$, if one had to guess, one would guess that the treatment does not help (the treatment group has the higher odds of disease).
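A minimal Python sketch of the odds-ratio interval, assuming `statistics.NormalDist` for $z_{\alpha/2}$. The function `odds_ratio_ci` is a hypothetical name; it works on the log scale, where $\sqrt{1/a + 1/b + 1/c + 1/d}$ is the large-sample standard error of $\ln(OR)$, and then exponentiates back.

```python
from math import exp, sqrt
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, conf: float = 0.95):
    """Return OR = ad/(bc) and the interval OR * exp(-/+ z * sqrt(1/a + 1/b + 1/c + 1/d))."""
    or_hat = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # large-sample SE of ln(OR)
    z = NormalDist().inv_cdf((1 + conf) / 2)          # z_{alpha/2}
    factor = exp(z * se_log_or)
    return or_hat, or_hat / factor, or_hat * factor


# Example table: treatment (45 disease, 34 none), placebo (56 disease, 52 none).
or_hat, lo, hi = odds_ratio_ci(45, 34, 56, 52)
print(f"OR = {or_hat:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
# 1 inside the interval means the data do not show a treatment effect.
print("1 in CI:", lo < 1 < hi)
```

With the unrounded $z_{0.025} \approx 1.95996$ this gives $OR \approx 1.2290$ and an interval of approximately $(0.6855, 2.2034)$, which contains 1.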