Comparing p's

Dr. Don Edwards' notes (slightly edited and augmented)

The Odds for Success

When an experiment consists of a series of n independent trials, each trial ends in either success or failure, and the probability of success p on any given trial stays constant from trial to trial, we have learned that a good approximate confidence interval for p is available (the recommended method is due to Agresti and Coull, though we applied a different one this time).

It has become popular to re-express the success probability p in terms of the odds for success:

    θ = p / (1 - p)

Note that θ carries the same information as p: if you know p you can compute θ, and vice versa, since p = θ / (1 + θ). As a result of this equivalence, a hypothesis test or confidence interval for the probability of success p can be re-expressed as a hypothesis test or confidence interval on the odds of success θ.

To help you become more comfortable with the notion of odds, here is a table:

    p = prob. of success    θ = odds of success    log10(odds)
    0.0001                  0.00010001             -4.00
    0.001                   0.001001               -3.00
    0.010                   0.0101                 -2.00
    0.100                   0.111                  -0.95
    0.500                   1.00                    0.00
    0.900                   9.0                     0.95
    0.990                   99.0                    2.00
    0.999                   999.0                   3.00
    0.9999                  9999.0                  4.00

Notice that as the probability of success p ranges from 0 to 1, the corresponding odds for success θ ranges from 0 to infinity, and the log(odds) ranges from -infinity to +infinity.
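As a quick check on the table above, the conversion between p and the odds θ can be sketched in a few lines of Python (this code is an addition to these notes, not part of the original handout):

```python
import math

def odds(p):
    """Odds of success for a success probability p (0 < p < 1)."""
    return p / (1.0 - p)

def prob(theta):
    """Inverse conversion: recover p from the odds theta."""
    return theta / (1.0 + theta)

# Reproduce a few rows of the table above
for p in [0.0001, 0.10, 0.50, 0.90, 0.9999]:
    th = odds(p)
    print(f"p = {p:<8} odds = {th:<12.6g} log10(odds) = {math.log10(th):+.2f}")
```

Running the loop reproduces the table rows, including the sign change in log10(odds) at p = 0.5.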
OK, so if we reject H0: p1 = p2, how different are p1 and p2?

In addition to inferences on a single success probability p, if we have two treatments/populations and collect two independent samples of sizes n1 and n2, we will learn that we can test H0: p1 = p2 via a contingency-table chi-square test if the n's are large, and that a conservative test (Fisher's exact test) is available when the n's are small. With these methods, however, we have no good way to carefully answer the important question: how different are p1 and p2? How should we compare these two values? There are at least three strategies, labeled A, B, C here.

A) The difference p1 - p2: If the p's are not too close to 0 or 1, a confidence interval for p1 - p2 may be an appropriate way to compare them. There are a number of ways to construct an approximate confidence interval for p1 - p2 using independent samples. The recommended construction is due to Agresti and Caffo (discussed without citation in section 10.7 of the text; the formula is omitted here). However, we apply a different method this semester (covered in the last set of lecture notes).

Example: for the treatment-of-angina data used in our first 2x2 contingency table, we had trt 1 = Timolol and trt 2 = placebo. After the data are collected and analyzed, suppose the final approximate 95% interval for p1 - p2 is (0.058, 0.234).

Interpretation:

B) Relative risk p1/p2: If the p's are close to 0, comparing them via their difference is not always meaningful. For example, in both of the following situations the difference p1 - p2 is 0.01. In the first situation most people would say that the p's are nearly equal for practical purposes, but in the second situation the p's are dramatically unequal:

    situation 1: p1 = 0.400, p2 = 0.390
    situation 2: p1 = 0.011, p2 = 0.001

In the second situation it makes more sense to compare the p's via their ratio, p1/p2.
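The Agresti-Caffo formula is omitted from these notes, but its published form is simple: add one success and one failure to each sample, then apply the usual Wald interval to the adjusted proportions. Here is a Python sketch; the counts used below are hypothetical illustration values, since the actual angina-study counts are not reproduced in this handout:

```python
import math

def agresti_caffo_ci(x1, n1, x2, n2, z=1.96):
    """Approximate CI for p1 - p2 (Agresti & Caffo adjustment):
    add one success and one failure to each sample, then use the
    Wald form on the adjusted proportions."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical counts (NOT the actual angina-study data):
lo, hi = agresti_caffo_ci(x1=120, n1=160, x2=90, n2=150)
print(f"approximate 95% CI for p1 - p2: ({lo:.3f}, {hi:.3f})")
```

Because the interval excludes 0 for these illustration counts, the corresponding test of H0: p1 = p2 would reject at the 5% level.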
If the event we call "success" is actually a bad thing (like a radioactive exposure, a toxic side effect, or death), then the p's measure the risk of the bad thing occurring for each treatment, so p1/p2 = the relative risk of the event for treatment 1, relative to treatment 2. When the data are collected in two independent samples, an approximate confidence interval for p1/p2 is given by a formula (not shown here) in Agresti (1996, An Introduction to Categorical Data Analysis), p. 47.

Difference in p's, Relative Risk and the Odds Ratio, Page 2
Example: In an analysis of all 577,006 reported automobile crash injuries in Florida in 1988, we assume (that is, we model) the crashes where seat belts were used and those where seat belts were not used as independent samples from the populations of all seat-belt and non-seat-belt injury crashes. The incidences of fatalities in these samples were:

    1. No seat belts: 1601 of 164,527 (p̂1 = 0.009755)
    2. Seat belts: 510 of 412,878 (p̂2 = 0.001235)

and the 95% confidence interval for the relative risk p1/p2 using the Agresti formula is (7.15, 8.72).

Interpretation:

How do we compare the p's if p1 and p2 are both very close to 1? One strategy is to reverse the definitions of success and failure and compare (1 - p1) to (1 - p2), values which will be close to 0, via their ratio. Strategy C below handles both extreme situations for the p's (close to 0 or close to 1) somewhat automatically.

C) Odds ratio θ1/θ2: Since knowing the odds θ1 = p1/(1 - p1) is equivalent to knowing p1, and likewise knowing θ2 is equivalent to knowing p2, comparing θ1 to θ2 is really another way to compare the p's. Note: if the p's are very close to 0, then the odds ratio θ1/θ2 is essentially the relative risk p1/p2. If the p's are close to 1, then the odds ratio θ1/θ2 is essentially (1 - p2)/(1 - p1). When the data are collected in two independent samples, an approximate confidence interval for θ1/θ2 is given by a formula in Agresti (1996), p. 24 (not shown here).

Example: When we apply this formula to the seat belt data above, the 95% confidence interval for θ1/θ2 is the same as what we found for p1/p2: (7.15, 8.72). (Why is this not surprising?)

Interpret this interval:
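Neither the relative-risk formula (Agresti 1996, p. 47) nor the odds-ratio formula (p. 24) is reproduced in these notes, but both intervals are conventionally built on the log scale with a z-multiplier. The sketch below applies that standard log-scale construction to the seat belt counts; the resulting intervals agree with the reported (7.15, 8.72) up to rounding and minor variations in the formula, and it also shows concretely why the RR and OR intervals nearly coincide when the p's are tiny:

```python
import math

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """Point estimate and approximate CI for p1/p2,
    built on the log scale (standard large-sample formula)."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    se = math.sqrt(1/x1 - 1/n1 + 1/x2 - 1/n2)
    return rr, math.exp(math.log(rr) - z*se), math.exp(math.log(rr) + z*se)

def odds_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Point estimate and approximate CI for the odds ratio theta1/theta2,
    also built on the log scale."""
    a, b = x1, n1 - x1   # successes / failures, sample 1
    c, d = x2, n2 - x2   # successes / failures, sample 2
    orat = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return orat, math.exp(math.log(orat) - z*se), math.exp(math.log(orat) + z*se)

# Florida 1988 crash counts from the notes
rr, rlo, rhi = relative_risk_ci(1601, 164527, 510, 412878)
orat, olo, ohi = odds_ratio_ci(1601, 164527, 510, 412878)
print(f"RR = {rr:.2f}, 95% CI ({rlo:.2f}, {rhi:.2f})")
print(f"OR = {orat:.2f}, 95% CI ({olo:.2f}, {ohi:.2f})")
```

Both fatality proportions are under 1%, so 1 - p is close to 1 in each group and the odds ratio is nearly identical to the relative risk, which is why the two intervals match.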
Limitations with Odds Ratios and Relative Risk: The Case-Control Study -- Odds Ratio or Relative Risk?

In a case-control design, subjects who have the disease under study (the "cases") are compared to subjects who are otherwise similar but do not have the disease (the "controls"), and the subjects' prior health habits are related to current disease status. This design is used when it is not practical or ethical to conduct a randomized controlled experiment (like the Timolol vs. placebo study of example 10.11) or a cross-sectional study (a snapshot at one point in time of the subjects' current disease status and their past or current exposure status). A cross-sectional study gives us information about the prevalence of the disease.

Consider this example studying hand-held cellular telephone use and the risk of brain cancer (Muscat JE et al. Handheld cellular telephone use and risk of brain cancer. JAMA 2000; 284: 3001-7). In this study, 469 individuals aged 18 to 80 with primary brain cancer and 422 closely matched individuals without brain cancer were compared according to cell phone use. Here is a summary of the data:

                                              Primary Brain Cancer    No Brain Cancer
    Regular past or current cell phone use            66                    76
    No cell phone use                                403                   346

What type of study is this?   Case-control   Cross-sectional   Randomized controlled

Why was this design chosen rather than one of the other two designs discussed above?

Relative comparison (a): Can we compute the relative risk for disease? That is, the relative risk for primary brain cancer with regular past or current use of cell phones, relative to no use of cell phones? Why or why not?

Relative comparison (b): Can we (technically) compute the relative risk for exposure? That is, the relative risk for past or current cell phone use with primary brain cancer, relative to no brain cancer?
Which one do we care about?

Two definitions for the odds ratio:

    The disease odds ratio is:

    The exposure odds ratio is:

Let's examine the odds ratio in reference to (a) and (b). The odds ratio computed in this paper for brain cancer and regular handheld cellular telephone use was 0.74 (95% CI, 0.50 to 1.10) relative to nonusers.

What is the consequence of 1 being included in the interval? How do we interpret this 95% confidence interval?

Can the odds ratio of 0.74 be used as an estimate of the relative risk here? Why or why not?
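For reference, the crude exposure odds ratio can be computed directly from the 2x2 table summarized earlier (a Python sketch added to these notes; the paper's reported 0.74 with CI 0.50 to 1.10 was adjusted for the matching variables, so the crude value differs slightly):

```python
import math

# 2x2 table from the Muscat et al. study as summarized in these notes:
#                          brain cancer   no brain cancer
# regular cell phone use        66              76
# no cell phone use            403             346
a, b, c, d = 66, 76, 403, 346

or_hat = (a * d) / (b * c)                    # crude (exposure) odds ratio
se = math.sqrt(1/a + 1/b + 1/c + 1/d)         # SE of log odds ratio
lo = math.exp(math.log(or_hat) - 1.96 * se)
hi = math.exp(math.log(or_hat) + 1.96 * se)
print(f"crude OR = {or_hat:.2f}, approximate 95% CI ({lo:.2f}, {hi:.2f})")
```

The crude interval, roughly (0.52, 1.07), contains 1, matching the qualitative conclusion of the adjusted analysis: the data are consistent with no association between cell phone use and brain cancer.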