William C.L. Stewart 1,2,3 and Susan E. Hodge 1,2


A Targeted Investigation into Clopper-Pearson Confidence Intervals

William C.L. Stewart 1,2,3 and Susan E. Hodge 1,2

1 Battelle Center for Mathematical Medicine, The Research Institute, Nationwide Children's Hospital, Columbus, Ohio; 2 Department of Pediatrics, 3 Department of Statistics, The Ohio State University, Columbus, Ohio

Corresponding author: William C.L. Stewart, Ph.D., Research Building III, 575 Children's Crossroad, Columbus, OH; william.stewart@nationwidechildrens.org

Abstract

In this special report, we consider Clopper-Pearson (CP) confidence intervals (CIs) for the binomial probability distribution. We review their direct construction from coverage probabilities, and review how this construction corresponds to the more common derivation based on hypothesis testing. We then explore some features of CP CIs: We use the direct construction to elucidate their bizarre patterns of coverage and to understand the origin of these patterns. We present an argument for constructing closed rather than open CIs; and we show how poorly the uncorrected normal approximation performs relative to CP CIs, even when n is large (e.g., n = 500). We briefly discuss the difference between CP and fiducial CIs. These insights should provide users with a different perspective on and appreciation of Clopper-Pearson CIs.

Keywords: confidence intervals, Clopper-Pearson, fiducial intervals, and coverage

1. Introduction

In addition to parameter estimation, one is often concerned with the question: How close is my estimate to the truth? Although confidence intervals (CIs) do not answer this question, they do provide random intervals that contain the truth with some minimum probability (e.g., 95%). As such, they are widely used and favored (by many) over hypothesis tests [e.g., Poole, 2001; Rothman et al., 2012; Rotondi and Donner, 2012; Cocks and Torgerson, 2013]. In this paper, we highlight some intriguing features of Clopper-Pearson (CP) CIs (Clopper and Pearson, 1934), which are both exact¹ and applicable to discrete distributions that are not amenable to the most common method of CI construction (Peña et al., 1992).

CP CIs are usually derived from the acceptance regions of hypothesis tests, but they can be derived directly from the definition of coverage as shown in Appendix A; see also Schilling and Doi (2014). The direct derivation sheds light on the seemingly bizarre coverage pattern of CP CIs (Section 3), provides yet another rationale for reporting closed rather than open CIs (Section 4), and illustrates an important difference between fiducial intervals and CP CIs (Section 6). In Section 5, we demonstrate the surprisingly poor coverage of the normal approximation, even when CIs are constructed from fairly large numbers of independent Bernoulli trials.

¹ Exact refers to the exact probabilities used to calculate the CIs, which are conservative, with coverage $\ge 1 - \alpha$.

2. Clopper-Pearson (CP) Confidence Intervals (CIs)

Consider the binomial probability mass function

$$p(x; n, \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x}$$

with success probability $\theta$. To construct an exact $(1 - \alpha)$ CI for $\theta$, we seek a lower confidence limit $U(X)$ and an upper confidence limit $V(X)$ defined on $[0, 1]$ such that, for any fixed $\alpha$ in $(0, 1)$,

$$P_\theta\big(\theta \in (U, V)\big) \ge 1 - \alpha, \quad \forall\, \theta \in (0, 1). \tag{1}$$

For CP CIs, the coverage of the two outer intervals is also constrained, so that

$$P_\theta\big(\theta \in [0, U]\big) \le \frac{\alpha}{2} \quad \text{and} \quad P_\theta\big(\theta \in [V, 1]\big) \le \frac{\alpha}{2}. \tag{2}$$

We refer to the probabilities in (2) as the left interval (LI) and right interval (RI) coverage, respectively, and we refer to their sum as the outer coverage; then one minus the outer coverage yields the CI coverage in (1). Given n trials and X = x observed successes, the lower and upper limits for the CP CI are defined as

$$u_x = \begin{cases} 0 & \text{for } x = 0 \\ \theta_L & \text{for } 0 < x \le n \end{cases} \quad \text{and} \quad v_x = \begin{cases} \theta_U & \text{for } 0 \le x < n \\ 1 & \text{for } x = n, \end{cases}$$

respectively, where $\theta_L$ and $\theta_U$ are the implicit solutions to

$$\sum_{j=x}^{n} \binom{n}{j} \theta^{j} (1 - \theta)^{n - j} = \frac{\alpha}{2} \quad \text{and} \quad \sum_{j=0}^{x} \binom{n}{j} \theta^{j} (1 - \theta)^{n - j} = \frac{\alpha}{2}, \tag{3}$$

respectively. The equations in (3) are the standard ones for constructing CP CIs, and they yield the following left interval (LI) and right interval (RI) coverages, respectively:

$$P_\theta(U \ge \theta) \begin{cases} \in (0, \alpha/2] & \text{for } 0 < \theta \le u_n \\ = 0 & \text{for } u_n < \theta \le 1, \end{cases} \qquad P_\theta(V \le \theta) \begin{cases} = 0 & \text{for } 0 \le \theta < v_0 \\ \in (0, \alpha/2] & \text{for } v_0 \le \theta < v_n. \end{cases} \tag{4}$$

Accordingly, the CP CI coverage is 1 minus the sum of LI and RI coverage probabilities in (4):

$$P_\theta(U < \theta < V) \in \begin{cases} [1 - \alpha/2,\ 1) & \text{for } 0 < \theta < v_0 \\ [1 - \alpha,\ 1) & \text{for } v_0 \le \theta \le u_n \\ [1 - \alpha/2,\ 1) & \text{for } u_n < \theta < 1. \end{cases} \tag{5}$$

The special cases in which $\theta = 0$ and $\theta = 1$ are discussed in Section 4. (Pastor (2005) has similar equations but breaks down the cases differently, depending on the value of n.)

3. Shapes of coverage curves

Figure 1 graphs LI and RI coverages as separate functions of $\theta$, for n = 10 and 20. The LI (RI) coverage function has discontinuities whenever $\theta$ happens to equal $u_j$ ($v_j$) for j in {0, …, n}. For example, if $\theta$ happens to equal $u_j$ for j in {1, …, n}, then the LI coverage equals $\alpha/2$. However, because the LI coverage decreases smoothly and monotonically in the direction of $u_{j-1}$, there are discontinuities at $u_{j-1}$ where the coverage jumps to $\alpha/2$.² Also, because the LI coverage is zero whenever $\theta$ exceeds $u_n$, the function is discontinuous at $u_n$ as well. A similar line of reasoning explains the discontinuities in the RI coverage function. This leads to the forward and backward J-shaped segments of Figure 1. The forward segments correspond to the LI coverage function, while the backward segments correspond to the RI coverage function. Moreover, as n increases, the total number of segments increases, and the LI segments spread to the right, while the RI segments spread to the left.
Because the coverage of the CP CI is simply 1 minus the outer coverage, the interplay between forward and backward segments explains why the coverage functions for CP CIs are increasingly strange and discontinuous as n increases (Figure 2). For n > 5, the forward and backward segments overlap symmetrically about $\theta = .5$, and the amount of overlap also increases with increasing n.

² LI coverage jumps to 1 when $\theta = 0$, and RI coverage jumps to 1 when $\theta = 1$; see Section 4.

Figure 1. LI (left interval) coverage $P_\theta(U \ge \theta)$, in red, and RI (right interval) coverage $P_\theta(V \le \theta)$, in blue, as functions of $\theta$, (a) for n = 10 and (b) for n = 20 (with $\alpha = .05$). Black marks indicate those points ($\theta = u_j$ for the LIs, $\theta = v_j$ for the RIs) at which coverage is exactly .025. Probabilities are plotted for a dense set of discrete values along (0, 1), so as not to obscure the discontinuities.
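Equation (3) and the LI and RI coverage functions discussed in Section 3 can be explored numerically. The following is a minimal Python sketch, not the authors' code: it assumes plain bisection is an acceptable way to solve Equation (3), and the function names (`pmf`, `bisect`, `cp_limits`, `li_coverage`, `ri_coverage`) are illustrative choices that do not appear in the paper.

```python
from math import comb

def pmf(x, n, theta):
    """Binomial probability mass p(x; n, theta)."""
    return comb(n, x) * theta**x * (1.0 - theta)**(n - x)

def bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi] by plain bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cp_limits(n, alpha=0.05):
    """Lower limits u_0..u_n and upper limits v_0..v_n from Equation (3)."""
    def upper_tail(x, t):          # P(X >= x), increasing in t for x >= 1
        return sum(pmf(j, n, t) for j in range(x, n + 1))
    def lower_tail(x, t):          # P(X <= x), decreasing in t for x < n
        return sum(pmf(j, n, t) for j in range(x + 1))
    u = [0.0] + [bisect(lambda t: upper_tail(x, t) - alpha / 2)
                 for x in range(1, n + 1)]
    v = [bisect(lambda t: alpha / 2 - lower_tail(x, t))
         for x in range(n)] + [1.0]
    return u, v

def li_coverage(theta, n, u):
    """P(U >= theta): sum of P(X = j) over those j with u_j >= theta."""
    return sum(pmf(j, n, theta) for j in range(n + 1) if u[j] >= theta)

def ri_coverage(theta, n, v):
    """P(V <= theta): sum of P(X = j) over those j with v_j <= theta."""
    return sum(pmf(j, n, theta) for j in range(n + 1) if v[j] <= theta)

n, alpha = 10, 0.05
u, v = cp_limits(n, alpha)
print(round(u[4], 4), round(v[4], 4))   # compare with (.1216, .7376) in Section 4
print(li_coverage(u[4], n, u))          # LI coverage at theta = u_4
print(li_coverage(u[4] + 1e-9, n, u))   # just to the right of u_4
```

Evaluating `li_coverage` and `ri_coverage` on a dense grid of $\theta$ values should trace out the forward and backward J-shaped segments described above, and one minus their sum gives the CI coverage.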

Figure 2. CI coverage $P_\theta(U < \theta < V)$, as a function of $\theta$, (a) for n = 10 and (b) for n = 20. These curves result from subtracting the sum of Figure 1's red and blue curves from 1 at each value of $\theta$.

4. Open versus closed CIs

Some authors have advocated for open CP CIs, whereas others suggest that the intervals should be closed. Casella (1986) states that CIs are always defined as closed, but points out that, in practice, there is no real difference between considering the intervals open or closed, and has advocated for half-open CIs, i.e., in our notation, $(u_x, v_x]$. Some authors, e.g., Zwillinger and Kokoska (2000), Fleiss et al. (2003), and Pastor (2005), report open CIs, whereas others, e.g., Lehmann and Romano (1959) and Casella and Berger (2002), imply the CI is closed. Stuart and Ord (1991) imply that both the CI and the outer intervals are closed, which seems logically inconsistent. Moreover, the entire issue of whether one should report open or closed CIs is rarely discussed.

Nevertheless, there are good reasons for reporting closed CP CIs. First, when $\theta$ happens to equal one of the confidence limits, the open CP CI does not include the true value. For example, if n = 10 and $\theta = .1216$, the event X = 4 (which occurs with probability .0201) yields the open CI (.1216, .7376), which does not include the true value. In general, when $\theta$ equals $u_j$ or $v_j$ there is a small but non-negligible probability (.0201 in this example) that the endpoint of the interval will equal $\theta$. Whenever this happens, $[u_x, v_x]$, which is only infinitesimally larger than $(u_x, v_x)$, will have greater coverage. This is especially important when $\theta$ equals 0, since the open CP CI, $(0, v_0)$, never includes $\theta$, and its coverage is zero. In contrast, the closed CP CI, $[0, v_0]$, always includes $\theta$, so its coverage is one. A similar pathology exists when $\theta$ equals 1: $(u_n, 1)$ has coverage zero, whereas $[u_n, 1]$ has coverage one.

Second, the special relationship between hypothesis testing and confidence intervals seems to suggest that CP CIs should be closed. Briefly, the validity of an exact CI constructed from an inverted hypothesis test is rooted in the following tautology:

$$\theta \in C(x_1, \ldots, x_n) \iff (x_1, \ldots, x_n) \in A(\theta),$$

where C is a confidence interval that depends on the observed data, and A is the acceptance region of a level $\alpha$ hypothesis test (Lehmann and Romano, 1959; Casella and Berger, 2002).
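The numerical example in the first argument above (n = 10, $\theta = .1216$, X = 4) can be recomputed directly from the binomial probability mass function. The sketch below is an illustration under stated assumptions rather than the authors' calculation: it treats the rounded value .1216 as if it were exactly $u_4$, and the names `pmf`, `tail`, `gap`, and `p_x0_at_zero` are hypothetical.

```python
from math import comb

def pmf(x, n, theta):
    """Binomial probability mass p(x; n, theta)."""
    return comb(n, x) * theta**x * (1.0 - theta)**(n - x)

n, theta = 10, 0.1216   # example from the text; theta is u_4 rounded to 4 places

# theta is (to rounding) the lower CP limit for X = 4, so P(X >= 4) should be
# close to alpha/2 = .025, as required by Equation (3).
tail = sum(pmf(j, n, theta) for j in range(4, n + 1))

# If X = 4 is observed, the open interval (u_4, v_4) excludes theta while the
# closed interval [u_4, v_4] includes it. At this theta the closed CI's
# coverage therefore exceeds the open CI's coverage by P(X = 4).
gap = pmf(4, n, theta)

# Boundary case theta = 0: X = 0 occurs with probability 1, so [0, v_0] always
# covers theta and (0, v_0) never does.
p_x0_at_zero = pmf(0, n, 0.0)

print(tail, gap, p_x0_at_zero)
```

The quantity `gap` is the amount of coverage gained at $\theta = u_4$ by closing the interval; the same calculation applies at any other limit $u_j$ or $v_j$.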
Moreover, because the equations in (3) provide unique solutions, $\theta_L$ and $\theta_U$, and because the binomial distribution permits a stochastic ordering, the confidence interval C is implicitly closed. Hence, from a set-theoretic perspective, and from the vantage point of robust coverage, it is both legitimate and logically more consistent to report closed CIs rather than open ones.

5. Behavior of the normal approximation for large n

The normal approximation arises from the central limit theorem applied to $\hat{\theta} = \bar{x} = x/n$, and as a general rule of thumb, the approximation is presumed to be reasonably close for $n \ge 30$. We briefly examine CI coverage when the normal approximation is used for large samples.³ We compare the normal approximation confidence limits with the exact CP CI limits (Figure 3). For n = 30 and $\alpha = .05$, the two sets of limits appear to be quite close, with the normal-approximation limits a little low for observations less than n/2 and a little high for observations greater than n/2. However, the coverage probabilities from the normal approximation deviate considerably from the exact coverage probabilities of the CP CIs, with coverage well below 95% for most of the parameter space when n = 30 and $\alpha = .05$ (Figure 4). We define Q as the proportion of the parameter space that has coverage of at least 95%. Figure 5 shows that even increasing the sample size to n = 500 yields a Q of only .258. Moreover, even when the coverage criterion is relaxed (requiring that CI coverage only exceed 94%, for example), it takes nearly 100 independent samples to attain a Q-value of 50% (details not shown).

³ We considered only the uncorrected normal approximation, where the 95% CI's lower limit is $\bar{x} - 1.96\,\mathrm{SEP}$ and the upper limit is $\bar{x} + 1.96\,\mathrm{SEP}$, with $\mathrm{SEP} = \sqrt{\bar{x}(1 - \bar{x})/n}$.

Figure 3. Lower and upper 95% confidence limits, for n = 30, calculated by both the Clopper-Pearson method and the normal approximation.

Figure 4. CI coverage, as a function of $\theta$, compared between (a) Clopper-Pearson intervals and (b) normal approximations, for n = 30, $\alpha = .05$. The vertical axis is shown from 0.9 to 1. CP coverage is $\ge .95$ for all values of $\theta$, whereas normal approximation coverage is $< .95$ for most values of $\theta$.

Figure 5. Graph of Q (proportion of the parameter space with at least 95% coverage) versus sample size n, when the normal approximation (without continuity correction) is used, for sample sizes from 20 to 500.

6. Fiducial intervals

One type of fiducial interval is defined by the set of [parameter] values that could have given rise to the observed value with the specified probability [greater than or equal to] $1 - \alpha$ (Wang, 2000; Hannig et al., 2016). This contrasts with the focus of the CP CI on coverage probabilities (Equation 1). The two approaches, CP CIs and fiducial intervals, yield identical intervals for all outcomes except X = 0 and X = n. When X = n, one finds from Equation (3) that $\theta_L$ equals $(\alpha/2)^{1/n}$. However, the fiducial interval would yield $\theta_L = \alpha^{1/n}$, since this set $\{\theta : \alpha^{1/n} \le \theta \le 1\}$ satisfies the fiducial definition above. (Similarly, when X = 0 the fiducial upper limit is $\theta_U = 1 - \alpha^{1/n}$ instead of $\theta_U = 1 - (\alpha/2)^{1/n}$, which is the corresponding CP CI upper limit.) Thus, the fiducial approach does not ensure coverage of at least $1 - \alpha$ for all $\theta$. Specifically, for values of $\theta$ between $(\alpha/2)^{1/n}$ and $\alpha^{1/n}$, LI (left interval) coverage will be greater than $\alpha/2$; it will increase from $\alpha/2$ to $\alpha$ as $\theta$ moves from $(\alpha/2)^{1/n}$ to $\alpha^{1/n}$; and similarly for RI coverage for $\theta$ between $1 - \alpha^{1/n}$ and $1 - (\alpha/2)^{1/n}$. This removes the guarantee that CI coverage will never dip below $1 - \alpha$ for all $\theta$. (Whether the CI coverage is actually $< 1 - \alpha$ for a given $\theta$ depends on the interplay between the RI and LI coverages at $\theta$.) Figure 6 shows the differences in coverage for n = 10.
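The two lower limits for X = n, and the resulting LI coverage between them, follow from the closed forms above. The sketch below is a hedged illustration for n = 10 and $\alpha = .05$; the helper `li_coverage_top` is a hypothetical name, and it assumes $\theta$ lies above $u_{n-1}$, so that X = n is the only outcome whose lower limit can reach $\theta$.

```python
n, alpha = 10, 0.05

# Lower limit assigned to X = n under each construction (Section 6).
cp_lower = (alpha / 2) ** (1.0 / n)   # CP: solves theta**n = alpha/2
fid_lower = alpha ** (1.0 / n)        # fiducial: solves theta**n = alpha

def li_coverage_top(theta, u_n):
    """LI coverage P(U >= theta), assuming theta > u_{n-1}.

    Under that assumption only X = n can give a lower limit >= theta, so the
    coverage is P(X = n) = theta**n when theta <= u_n and 0 otherwise.
    """
    return theta ** n if theta <= u_n else 0.0

print(cp_lower, fid_lower)
for theta in (0.70, 0.72, 0.74):
    # CP coverage is 0 here (theta is above the CP limit); the fiducial
    # coverage is theta**n, which lies between alpha/2 and alpha.
    print(theta, li_coverage_top(theta, cp_lower), li_coverage_top(theta, fid_lower))
```

The two printed limits can be compared with the approximate range ~.692 to ~.741 quoted for panel (c) of Figure 6.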

Figure 6. LI and RI coverages and CI coverage, compared between CP vs. fiducial CIs, for n = 10 and $\alpha = .05$. (a) LI and RI coverages from CP CIs; same as Figure 1(a), but on different vertical scale. (b) CI coverage from CP CIs; same as Figure 2(a), but on different vertical scale. (c) LI and RI coverages from fiducial intervals. LI coverage exceeds .025 for $\theta$ between ~.692 and ~.741; RI coverage exceeds .025 for $\theta$ between ~.259 and ~.308. (d) CI coverages from fiducial intervals. Coverage differs from that in (b) for $\theta$ between ~.259 and ~.308, and between ~.692 and ~.741.

Conclusions

We examine Clopper-Pearson binomial confidence intervals through the lens of coverage probabilities (see Appendix A for details). The bizarre and discontinuous patterns of CI coverage, as a function of $\theta$, are more clearly understood as arising from the overlaps between the forward and backward J-shaped LI (left interval) and RI (right interval) coverage functions. After considering set-theoretic arguments, and after carefully considering the impact of infinitesimally larger CP CIs, a clear and logically consistent rationale for reporting closed CIs (i.e., CIs that include the endpoints) emerges. Comparing CP intervals with those obtained from the (uncorrected) normal approximation reveals the surprising result that even at large sample sizes, e.g., n = 500, the normal approximation yields conservative coverage for only a small proportion of the parameter space. While it is well known that the uncorrected normal approximation does not behave well for small n (e.g., Brown et al. 2001, 2002), it is perhaps less widely appreciated how poorly this approximation continues to perform for larger sample sizes as well. Finally, the distinction between one type of fiducial interval and CP CIs is demonstrated by the differences in coverage when X = 0 or X = n.

Acknowledgments

This project was supported by The Research Institute, Nationwide Children's Hospital, Columbus, Ohio.

References

Brown, L.D., Cai, T.T., DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Sci. 16,
Brown, L.D., Cai, T.T., DasGupta, A. (2002). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Stat. 30,
Casella, G. (1986). Refining binomial confidence intervals. Can. J. Stat. 14,
Casella, G., Berger, R.L. (2002). Statistical Inference. Duxbury, Wadsworth Group, Boston.
Clopper, C., Pearson, E.S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26,
Cocks, K., Torgerson, D.J. (2013). J. Clin. Epidemiol. 66,
Fleiss, J.L., Levin, B., Paik, M.C. (2003). Statistical Methods for Rates and Proportions, 3rd ed. Wiley Interscience, John Wiley & Sons, Hoboken NJ.
Hannig, J., Iyer, H.K., Lai, R.C.S., Lee, T.C.M. (2016). Generalized fiducial inference: A review and new results. J Am Stat Assoc 111:
Lehmann, E.L., Romano, J.P. (1959). Testing Statistical Hypotheses, 3rd ed., Springer, New York, NY.
Pastor, D. (2005). On the coverage probability of the Clopper-Pearson confidence interval. Technical report, ENST Bretagne.
Peña, E.A., Rohatgi, V.K., Székely, G.J. (1992). On the non-existence of ancillary statistics. Stat. Prob. Letters 15,
Poole, C. (2001). Low P-values or narrow confidence intervals: which are more durable? Epidemiology 12:

Rothman, K.J., Greenland, S., Lash, T. (2012). Modern Epidemiology, 3rd edition. Wolters Kluwer / Lippincott Williams & Wilkins.
Rotondi, M.A., Donner, A. (2012). A confidence interval approach to sample size estimation for interobserver agreement studies with multiple raters and outcomes. J Clin Epidemiol 65:
Schilling, M.F., Doi, J.A. (2014). A coverage probability approach to finding an optimal binomial confidence procedure. Am. Statist. 68,
Stuart, A., Ord, J.K. (1991). Kendall's Advanced Theory of Statistics. Vol 2. Classical Inference and Relationship, 5th ed. Oxford U Press, New York.
Wang, Y.H. (2000). Fiducial intervals: what are they? Am Statist 54,
Zwillinger, D., Kokoska, S. (2000). Standard Probability and Statistics Tables and Formulae. Chapman & Hall/CRC, New York.

Appendix A: Direct construction of CP CIs

Let $\theta$ denote the true but unknown value of the binomial parameter in the probability mass function

$$p(x; n, \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x}.$$

To construct an exact $(1 - \alpha)$ confidence interval (CI) for the binomial success probability, we seek random variables $U < V$, with U and V defined on $[0, 1]$, such that the open interval $(U, V)$ has probability of at least $1 - \alpha$ of covering $\theta$:

$$P_\theta\big(\theta \in (U, V)\big) \ge 1 - \alpha.$$

There is an additional constraint of symmetry on the two outer intervals:

$$P_\theta\big(\theta \in [0, U]\big) \le \frac{\alpha}{2} \quad \text{and} \quad P_\theta\big(\theta \in [V, 1]\big) \le \frac{\alpha}{2}. \tag{A.1}$$

We refer to the probabilities in (A.1) as the left interval (LI) and right interval (RI) coverage, respectively; the random variables U and V are the lower and upper confidence limits. The CI coverage equals 1 minus the sum of the two probabilities in (A.1):

$$P_\theta\big(\theta \in (U, V)\big) = 1 - P_\theta(U \ge \theta) - P_\theta(V \le \theta). \tag{A.2}$$

Note that U is a discrete random variable, defined on a set of n + 1 values:

$$U : \{0 = u_0 < u_1 < \cdots < u_n < 1\}. \tag{A.3}$$

Each value of u corresponds to exactly one value of x, where the random variable X is defined on $X : \{0, 1, \ldots, n\}$. There is a one-to-one relationship between $u_j$ and $x_j$ for $j = 0, \ldots, n$. Therefore, $P_\theta(U = u_j) = P_\theta(X = j)$. Similarly, V is a random variable, defined on $V : \{0 < v_0 < v_1 < \cdots < v_n = 1\}$. By convention, $u_0$ is set to 0 and $v_n$ is set to 1.

The LI coverage in (A.1) can be expressed as an expectation:

$$P_\theta(U \ge \theta) = \sum_{j=0}^{n} I\{u_j \ge \theta\}\, P_\theta(X = j), \tag{A.4}$$

where $I\{u_j \ge \theta\}$ takes the value 1 if $u_j \ge \theta$, and 0 otherwise. Thus, only the values of $u_j$ that are greater than or equal to $\theta$ contribute to the sum in (A.4). Although we do not know the $u_j$ values, we do know that they are discrete points in $[0, 1]$, and that they are ordered as in (A.3). Therefore,

$$P_\theta(U \ge \theta) = P_\theta(X \ge 1) \quad \text{for } 0 < \theta \le u_1. \tag{A.5}$$

Equation (A.5) converts a probability concerning U to one concerning X. Moreover, if we set $u_1$ to $\theta^*$, where $\theta^*$ is the unique solution to $P_\theta(X \ge 1) = \alpha/2$, then for $0 < \theta \le u_1$ the LI coverage in (A.5) is less than or equal to $\alpha/2$. In general, if we define $u_x$ implicitly as the solution to

$$\sum_{j=x}^{n} \binom{n}{j} \theta^{j} (1 - \theta)^{n - j} = \frac{\alpha}{2},$$

then the LI coverage will be less than or equal to $\alpha/2$ for $0 < \theta \le 1$. Thus, we have found the lower confidence limit for the CP CI directly from the LI coverage criterion in (A.1).

For upper limits V, follow the same reasoning, but with reversed inequalities:

$$P[V \le \theta_0 \mid \theta = \theta_0] = P[X \le j \mid \theta = \theta_0], \quad \text{for } v_j \le \theta_0 < v_{j+1},\ j = 0, \ldots, n - 1,$$

where each $v_x$ is defined implicitly as the solution to

$$\sum_{j=0}^{x} \binom{n}{j} \theta^{j} (1 - \theta)^{n - j} = \frac{\alpha}{2}.$$

In summary, this approach arrives at the standard equations used to construct the CP CI, yet without using the standard test inversion reasoning. The corresponding left and right interval coverage probabilities respectively are:

$$P_\theta(U \ge \theta) \begin{cases} \in (0, \alpha/2] & \text{for } 0 < \theta \le u_n \\ = 0 & \text{for } u_n < \theta \le 1 \end{cases} \quad \text{and} \quad P_\theta(V \le \theta) \begin{cases} = 0 & \text{for } 0 \le \theta < v_0 \\ \in (0, \alpha/2] & \text{for } v_0 \le \theta < 1. \end{cases} \tag{A.6}$$

Accordingly, as in (A.2), the CI coverage is 1 minus the sum of LI and RI coverage probabilities in (A.6):

$$P_\theta(U < \theta < V) \in \begin{cases} [1 - \alpha/2,\ 1) & \text{for } 0 < \theta < v_0 \\ [1 - \alpha,\ 1) & \text{for } v_0 \le \theta \le u_n \\ [1 - \alpha/2,\ 1) & \text{for } u_n < \theta < 1, \end{cases}$$

which is the same as Equation (5) in the text.
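The first step of the direct construction, Equation (A.5), admits a closed form that makes a convenient worked check. Since $P_\theta(X \ge 1) = 1 - (1 - \theta)^n$, setting this equal to $\alpha/2$ gives $u_1 = 1 - (1 - \alpha/2)^{1/n}$. The sketch below assumes n = 10 and $\alpha = .05$ purely for illustration; the name `li_coverage_first_segment` is hypothetical.

```python
n, alpha = 10, 0.05

# Solve P(X >= 1) = 1 - (1 - theta)**n = alpha/2 for theta to obtain u_1.
u1 = 1.0 - (1.0 - alpha / 2) ** (1.0 / n)

def li_coverage_first_segment(theta):
    """LI coverage on 0 < theta <= u_1, where it equals P(X >= 1) by (A.5)."""
    return 1.0 - (1.0 - theta) ** n

# Evaluate on a grid of points in (0, u_1]; the coverage rises toward alpha/2.
grid = [u1 * k / 1000 for k in range(1, 1001)]
values = [li_coverage_first_segment(t) for t in grid]

print(u1, values[0], values[-1])
```

On this first segment the LI coverage increases monotonically in $\theta$ and reaches $\alpha/2$ only at $\theta = u_1$, which is the pattern the general argument extends to each $u_x$.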
