Efficient and Exact Tests of the Risk Ratio in a Correlated 2x2 Table with Structural Zero


Melbourne Business School
From the SelectedWorks of Chris J. Lloyd
Summer 2007

Computational Statistics & Data Analysis 51 (2007) 3765-3775
www.elsevier.com/locate/csda


Chris J. Lloyd
Melbourne Business School, Carlton 3053, Australia
E-mail address: c.lloyd@mbs.edu

Received 7 July 2006; received in revised form 19 December 2006; accepted 19 December 2006
Available online 30 December 2006

Abstract

For a correlated 2 × 2 table where the (01) cell is empty by design, the parameter of interest is typically the ratio of the probability of secondary response conditional on primary response to the probability of primary response, also known as a risk ratio. It is common to test whether or not the risk ratio equals one. One method of obtaining an exact P-value is to maximise the tail probability of the test statistic over the nuisance parameter. It is argued that better results are obtained by first replacing the nuisance parameter by its profile estimate in the calculation of the exact significance and then maximising; the result is termed an E + M P-value. We consider four standard approximate test statistics with and without the common correction of adding 1/2 to each count. From a complete enumeration of the distributions of these P-values (for sample sizes 50 and 100), we recommend E + M P-values based on the uncorrected Wald statistic for testing the greater than alternative and on the corrected Wald statistic on the log-scale for testing the less than alternative. A good compromise statistic for both kinds of alternatives is the likelihood ratio statistic.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Nuisance parameters; Exact test; Correlated proportions; Discordant pairs; Maximised P-value

1. Introduction

A sample of n individuals have a binary response measured. For reasons of design, only those who give a certain response on the first occasion are measured a second time. Such designs arise, for instance, in both treating and testing for disease, see Johnson and May (1995). An often quoted example is Toyota et al. (1999), who study the detection rates of a screening test for tuberculosis. For those who test negative on the first occasion the test is applied a second time 1-3 weeks later, whereas those who test positive on the first occasion do not need to be retested. It is suspected that application of the first test, even if negative, makes infected individuals more sensitive to subsequent tests. This booster phenomenon can be measured by the extent to which the probability of a negative response decreases from the first to the second occasion, given the first.

Another example, which we will study in some detail, was given in Agresti (1990, p. 45) (Table 1). A sample of 156 calves were tested for pneumonia during the first 60 days of life and a total of T = 93 were positive. Of these 93 calves with primary infection, $n_{11} = 30$ suffered a secondary infection in the following two weeks. There was interest in comparing the rate of primary infection, estimated to be 93/156 = 59.6%, with the rate of secondary infection, estimated to be 30/93 = 32.3%. The ratio of these two probabilities, known as the risk ratio (RR), represents the factor by which the chance of infection changes after first infection, and is here estimated to be $30 \times 156/93^2 = 0.541$. An RR less than 1.0 suggests that primary infection has an immunising effect.

Table 1
Example of Agresti

                Secondary
Primary      Yes     No    Total
Yes           30     63       93
No             -     63       63
Total         30    126      156

Agresti defines a kind of $\chi^2$ statistic for testing whether the RR = 1. For this example, the value turns out to be about 19.7. The signed version of this statistic is −4.44 and the approximate one-sided P-value is of the order of $10^{-6}$. Certainly the evidence for a protective effect of first infection seems to be overwhelming.

Lui (1998, 2000) studied confidence intervals for the ratio and difference in response probabilities, respectively. This work was further developed by Tang and Tang (2002) for the ratio and Tang and Tang (2003) for the difference of probabilities. Lloyd and Moldovan (2007a,b) have recently applied the exact method of Buehler (1957) to confidence limits for the RR. There has been less work on the testing problem, though obviously the confidence intervals can be used to define two-sided tests.

This paper is motivated by several considerations. First, in this problem it is quite computationally feasible to calculate a P-value with exact statistical properties. This is achieved by maximising over the nuisance parameter. Within the frequentist paradigm of inference it is essential to account for the worst possible parameter values if the statistical properties are to be guaranteed. While this may seem conservative, maximisation is the most efficient method possible of achieving this guarantee.
Tests which are not maximised over the nuisance parameter are either systematically conservative or explicitly violate their stated properties, as explained in Lloyd (2005). Second, standard asymptotic tests have statistical properties that are far from ideal, even for large samples. The issue of exactness is a practical one. For instance, in the above example of Agresti, the exact P-value obtained by maximising over the nuisance parameter is 0.00361 which, while still small, corresponds to an equivalent Z-statistic of 2.91 rather than 4.44 in magnitude. Such behaviour is not at all uncommon. Third, such behaviour can be largely eliminated by replacing the nuisance parameter with a null estimate and then maximising, as described in Section 3. This results in a P-value that is less sensitive to the nuisance parameter and consequently the maximised versions tend to be smaller. This will be seen to translate into superior power for guaranteed size. Lastly, we look at the performance when RR > $\theta_0$ and RR < $\theta_0$ separately and discover quite different behaviour.

2. Model notation and approximate test statistics

The possible responses of an individual are {11, 10, 00}, where 00 denotes a negative response on occasion 1, in which case the second response is negative by convention. Let $n_{ij}$ be the number of individuals with response ij and $p_{ij}$ the probability of this response. The count $n_{01} = 0$ is absent by design. The probability of a positive response on the first occasion is $\phi = p_{11} + p_{10}$. The probability of a second positive response given a first positive response is $p_{11}/\phi$. The ratio of these two probabilities is $p_{11}/\phi^2$. This RR is the parameter of interest and is denoted by $\theta$. The parametrisation of the model in terms of $(\theta, \phi)$ is summarised in Table 2. In order for all these probabilities to be less than 1, there is a restriction $\phi \le \min(1, \theta^{-1})$, so the parameter space is $\Omega = \{0 \le \phi \le 1,\ \theta < \phi^{-1}\}$.
The choice of nuisance parameter has no effect on the approximate or exact test statistics to be defined below. The data $(n_{11}, n_{10}, n_{00})$ are multinomial but it is convenient to take the data to be $(X, T, n)$, where $T = n_{11} + n_{10}$ is the number that test positive on occasion one and $X = n_{11}$ is the number of these that test positive a second time. Denoting the binomial probability with parameters $(n, p)$ by $B(x; n, p)$, the joint distribution of $(X, T)$ is

$$\Pr(X = x, T = t) = \Pr(T = t)\,\Pr(X = x \mid T = t) = B(t; n, \phi)\,B(x; t, \theta\phi),$$

and so the log-likelihood function is

$$l(\theta, \phi; x, t) = x\log\theta + (t + x)\log\phi + (n - t)\log(1 - \phi) + (t - x)\log(1 - \theta\phi). \qquad (1)$$

Table 2
Parametrisation of 2 × 2 table with structural zero

                     Second response
First response   +                          -                                     Total
+                $p_{11} = \theta\phi^2$    $p_{10} = \phi(1 - \theta\phi)$       $\phi$
-                -                          $p_{00} = 1 - \phi$                   $1 - \phi$
Total                                                                             1

We now describe four standard test statistics that may be used to test the null hypothesis $\theta = \theta_0$. While these statistics are asymptotically equivalent, they can behave quite differently for moderate samples.

2.1. Wald-type statistics

The maximum likelihood (ML) estimator is $\hat\theta = \hat p_{11}/\hat\phi^2$, where $\hat p_{11} = x/n$ and $\hat\phi = t/n$ are empirical estimators. The asymptotic variance of $\hat\theta = nx/t^2$ is $p_{11}(1 - p_{11})/(n\phi^4)$, as given in Lui (1998), who went on to suggest two Wald-type test statistics for testing $\theta = \theta_0$:

$$W_1 = \frac{\hat\theta - \theta_0}{\mathrm{SE}(\hat\theta)} = \frac{xn - \theta_0 t^2}{\sqrt{nx(n - x)}}, \qquad W_2 = \frac{\log\hat\theta - \log\theta_0}{\mathrm{SE}(\log\hat\theta)} = (\log(xn) - 2\log t - \log\theta_0)\sqrt{\frac{nx}{n - x}}.$$

2.2. LR and score statistics

The likelihood ratio (LR) and score tests require the restricted ML estimator $\hat\phi_0$ of $\phi$ for fixed $\theta = \theta_0$, which is obtained by solving a quadratic equation. In the special case $\theta_0 = 1$, which is of primary interest in this paper, the solutions are $\hat\phi_0 = (x + t)/(n + t)$ and 1. The first, smaller solution corresponds to the maximum and is always within the range $(0, \min(1, \theta_0^{-1}))$, see Appendix A. The signed root LR statistic is

$$G = \mathrm{sign}(\hat\theta - \theta_0)\sqrt{2\{l(\hat\theta, \hat\phi) - l(\theta_0, \hat\phi_0)\}}.$$

The score statistic can be shown to be given by

$$S = n(\hat p_{11} - \hat\phi\,\hat\phi_0\,\theta_0)\sqrt{\frac{1 + \hat\phi_0}{n\,\hat\phi_0^2\,(1 - \hat\phi_0)}}.$$

Agresti suggested a Pearson $\chi^2$ statistic based on the expected values obtained by substituting $(\theta, \phi) = (\theta_0, \hat\phi_0)$ into Table 2. It can be shown that for testing the null value $\theta_0 = 1$, Agresti's statistic is identical to the score statistic.

2.3. Logical properties

These statistics are infinite or undefined when certain counts are zero.
Since there is no possible disadvantage in breaking ties in test statistics, all statistics are modified by replacing $(x, t, n)$ by $(x + \varepsilon, t + 2\varepsilon, n + 3\varepsilon)$ with $\varepsilon$ extremely small. This will be the standard form of the statistic. It is more common practice to add 1/2 to all counts to deal with these problems and we refer to such statistics as modified. We will also investigate the four statistics with this modification, denoting them with a prime, $W_1'$, etc. Finally, all test statistics are defined to equal zero when x = t = 0, since there is no evidence against the null when no successes are observed.

The parameter $\theta$ is a ratio of the conditional and marginal probability of success. For fixed t any reasonable test statistic should be increasing in x, and for fixed x decreasing in t. All statistics satisfy these properties apart from $W_2$, which can be non-increasing in x for fixed t. Wald-type test statistics commonly violate monotonicity conditions, essentially because standard error estimates break down near the boundary of the sample space. It has been previously noted by Tang and Tang (2002) that $W_1$ and $W_2$ are not monotone in x for fixed t − x.

A final point of interest is that the properties of one-sided tests of $\theta > \theta_0$ and $\theta < \theta_0$ are quite different. For instance, it is clear that the data set (x, t) = (0, n) points most strongly towards $\theta < \theta_0$; however, it is not at all clear which data set points most strongly towards $\theta > \theta_0$. We will study test properties under deviations in both directions and find quite different behaviour of the test statistics.

3. Exact tests and P-values

Tang and Tang (2002) have shown numerically that confidence intervals based on $W_1$ and $W_2$ can have poor coverage properties even for moderate sample sizes. This leads to poor performance of the implied two-sided test, at least for some null values. In this section we give a brief overview of methods for constructing so-called exact tests from a given, possibly approximate, test statistic. We have data Y and parameter $(\theta, \phi)$ and want to test the null hypothesis $\theta = \theta_0$ against either one- or two-sided alternatives.
The test statistic generates an approximate P-value, denoted P(Y), from tail probabilities of an approximating null distribution. The exact significance level

$$\pi(y, \phi) := \Pr(P(Y) \le P(y);\ \theta_0, \phi)$$

depends on the unknown value of $\phi$. The function $\pi$ is called the profile of the test statistic in Lloyd (2005). The classical solution is to maximise over the nuisance parameter, see Bickel and Doksum (1977, p. 168), giving the P-value $P^*(Y)$ where

$$P^*(y) := \sup_{\phi} \pi(y, \phi).$$

This is sometimes called Basu's (1977) maximisation principle. The transformation from P(Y) to the new statistic $P^*(Y)$ is called the M-step in Lloyd (2005), who shows that $P^*(Y)$ satisfies the defining property of a P-value:

$$\sup_{(\theta_0, \phi) \in \Omega} \Pr(P^*(Y) \le P^*(y);\ \theta_0, \phi) = P^*(y) \qquad (2)$$

and is as small as possible amongst valid P-values that are non-decreasing functions of the original statistic P(Y). The maximised P-value (which we will call the M P-value) depends only on the ordering that the test statistic induces on the sample space. Further details are in Lloyd (2005). Viewed this way, maximisation is an essential step in test construction, and so any inadequacies in the generated test require a rethink of the basic test statistic, not the maximisation itself.

An alternative to maximising over $\phi$ is to replace it with an estimate $\hat\phi_0$ under the null, admittedly not a new idea (see for instance Storer and Kim, 1990). This generates a new P-value $\hat P(y) = \pi(y, \hat\phi_0)$, and the transformed statistic $\hat P(Y)$ is called the E P-value. The main reason for the E-step is to obtain a P-value whose profile depends less on $\phi$. Heuristically, we expect that the estimated P-value imposes a more reasonable ordering on the sample space, because it is not based on an asymptotic approximation which may break down near the boundaries. Of course, the estimated P-value $\hat P(Y)$ is not exactly valid but can be made so by the M-step, resulting in what we call the E + M P-value. Computational issues are described in Appendix B.
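The raw, M, E and E + M P-values described above can be sketched in a few lines for a small sample size. The following is only an illustrative sketch under stated assumptions, not the author's R implementation: it writes phi for the nuisance parameter $p_{11} + p_{10}$ (a label chosen here), fixes $\theta_0 = 1$, uses the score statistic with the tiny-epsilon shift of Section 2, takes the left tail of the normal distribution as the approximate P-value (the less than alternative), and replaces the supremum by a maximum over a fixed grid. All function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k, m, p):
    """Binomial probability B(k; m, p)."""
    return math.comb(m, k) * p**k * (1.0 - p)**(m - k)

def score_stat(x, t, n, eps=1e-8):
    """Score statistic for theta0 = 1 after the shift (x+eps, t+2eps, n+3eps)."""
    if x == 0 and t == 0:
        return 0.0                      # defined as zero when no successes occur
    x, t, n = x + eps, t + 2 * eps, n + 3 * eps
    phi0 = (x + t) / (n + t)            # restricted ML estimate when theta0 = 1
    # n * (p11_hat - phi_hat * phi0) simplifies to x - t * phi0
    return (x - t * phi0) * math.sqrt((1 + phi0) / (n * phi0**2 * (1 - phi0)))

def sample_space(n):
    """All data sets (x, t) with 0 <= x <= t <= n."""
    return [(x, t) for t in range(n + 1) for x in range(t + 1)]

def profile(pvals, p_obs, n, phi, tol=1e-12):
    """pi(y, phi) = Pr(P(Y) <= P(y); theta0 = 1, phi)."""
    return sum(binom_pmf(t, n, phi) * binom_pmf(x, t, phi)
               for (x, t), p in pvals.items() if p <= p_obs + tol)

def m_step(pvals, y, n, grid):
    """Maximise the profile over a grid (an assumed stand-in for the supremum)."""
    return max(profile(pvals, pvals[y], n, phi) for phi in grid)

def e_step(pvals, n):
    """Evaluate each profile at the null estimate phi0 = (x + t) / (n + t)."""
    return {(x, t): profile(pvals, p, n, (x + t) / (n + t))
            for (x, t), p in pvals.items()}

n = 20
grid = [i / 200 for i in range(201)]
raw = {(x, t): norm_cdf(score_stat(x, t, n)) for (x, t) in sample_space(n)}
est = e_step(raw, n)                    # E P-values for every data set
y = (2, 12)                             # an arbitrary illustrative data set
p_raw, p_m = raw[y], m_step(raw, y, n, grid)
p_e, p_em = est[y], m_step(est, y, n, grid)
print(p_raw, p_m, p_e, p_em)
```

A fixed grid can miss narrow spikes in the profile, so in serious use the grid maximum would need to be refined by a local optimiser, as Appendix B discusses.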
We briefly illustrate these ideas on Agresti's data. The first column of Table 3 lists the values of the four test statistics. All these generate an approximate one-sided P-value based on the normal distribution. All P-values derived from these will be converted into an equivalent normal quantile to help the reader appreciate the patterns (since the P-values in this example are all rather small). Maximising the profile function typically gives a larger, i.e. less significant, P-value. Column 2 lists the normal quantile equivalent to this M P-value. Looking first at Agresti's statistic, the P-value increases from its asymptotic value to 0.00361, the equivalent normal quantile changing from 4.44 to 2.91. A graphical explanation is in the left panel of Fig. 1, where we see a spike in the profile. Much worse behaviour occurs for the standard Wald statistic $W_1$.
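As a check on the formulas of Section 2, the raw statistics for Agresti's data (x, t, n) = (30, 93, 156) can be evaluated directly. The sketch below assumes $\theta_0 = 1$, interior counts only (no zero cells, so no epsilon shift is needed), and uses phi for the nuisance parameter and G, S for the signed root LR and score statistics; these labels and the function names are chosen here for illustration. It also evaluates Agresti's Pearson statistic from the fitted cells of Table 2 to confirm numerically that it coincides with the squared score statistic.

```python
import math

def loglik(theta, phi, x, t, n):
    """Log-likelihood (1); valid for 0 < x < t < n, 0 < phi < 1, theta * phi < 1."""
    return (x * math.log(theta) + (t + x) * math.log(phi)
            + (n - t) * math.log(1 - phi) + (t - x) * math.log(1 - theta * phi))

def raw_statistics(x, t, n):
    """W1, W2, signed root LR (G) and score (S) statistics for theta0 = 1."""
    theta0 = 1.0
    theta_hat = n * x / t**2            # unrestricted ML estimate of the risk ratio
    phi_hat = t / n
    phi0 = (x + t) / (n + t)            # restricted ML estimate when theta0 = 1
    w1 = (x * n - theta0 * t**2) / math.sqrt(n * x * (n - x))
    w2 = ((math.log(x * n) - 2 * math.log(t) - math.log(theta0))
          * math.sqrt(n * x / (n - x)))
    lr = 2 * (loglik(theta_hat, phi_hat, x, t, n) - loglik(theta0, phi0, x, t, n))
    g = math.copysign(math.sqrt(max(lr, 0.0)), theta_hat - theta0)
    s = (n * (x / n - phi_hat * phi0 * theta0)
         * math.sqrt((1 + phi0) / (n * phi0**2 * (1 - phi0))))
    return {"W1": w1, "W2": w2, "G": g, "S": s}

def pearson_chi2(x, t, n):
    """Agresti's Pearson statistic with cells fitted under theta0 = 1."""
    phi0 = (x + t) / (n + t)
    observed = [x, t - x, n - t]
    expected = [n * phi0**2, n * phi0 * (1 - phi0), n * (1 - phi0)]
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

stats = raw_statistics(30, 93, 156)
chi2 = pearson_chi2(30, 93, 156)
print(stats, chi2)
```

The score statistic evaluates to about −4.44, in agreement with the signed value quoted in the Introduction, and its square agrees with the Pearson statistic to rounding error.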

Table 3
Two-sided significance values for example of Agresti (1990, p. 45), expressed in terms of an equivalent Z-statistic. Columns: generating statistic, Raw, M, E, E + M. Rows: $W_1$, $W_2$, G, S/A.

Fig. 1. Plot of significance profile $\pi(y, \phi)$ against $\phi$ with data y = (x, t, n) = (30, 93, 156) of Agresti (1990). In each case, the horizontal dashed line is the asymptotic P-value and the vertical dashed line is the profile ML estimate of $\phi$. Left: Agresti's score statistic. Right: Agresti's statistic after E-step.

Columns 3 and 4 give equivalent normal quantiles for the E and E + M P-values. The very minor changes from column 3 to column 4 suggest that the M-step after the E-step is practically unnecessary for this data set. For instance, the equivalent normal quantile based on Agresti's statistic is 4.31 after the E-step and barely changes after the M-step. The profile for the estimated P-value based on Agresti's statistic is in the right panel of Fig. 1, and the spike in the original profile is no longer present. In summary, across these four statistics the E + M P-value provides more evidence of significance than the M P-value, and the E + M and E P-values are almost identical.

Of course, the spikes in the profiles can be traced to the high significance these statistics attach to some data sets where x and t are small. While it is well known that asymptotic test statistics of this kind do not perform adequately when counts are small, it is sometimes forgotten that in order to assess the significance of our observed data (x, t, n) = (30, 93, 156) in a frequentist framework we must specify what is to be done in the counterfactual case of small counts. The E-step gives a better ranking of data sets near the boundary in terms of their hostility to the null. The proof of the pudding is in the flatter profile.

One cannot conclude much from a single example. For a start, we have no idea if the null is actually true or not.
The next section presents results of a complete numerical investigation of the performance of the various P-values that we have described.

4. Numerical study

We have described four basic test statistics $W_1$, $W_2$, G, S and their modified versions. Each of these eight basic statistics generates four P-values, namely the approximate P-value based on the normal asymptotic distribution, the M P-value, the E P-value and the E + M P-value. Only the M and E + M P-values can be guaranteed as valid. For all possible data sets when n = 50 and 100, all 32 P-values for testing the null hypothesis $\theta = 1$ versus $\theta \ne 1$ were computed. This allows a full investigation of the performance of the implied tests, without the uncertainty of simulation.
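The size of the complete enumeration is easy to make concrete. The short sketch below (function name hypothetical) lists every data set (x, t) with 0 ≤ x ≤ t ≤ n and counts them; for n = 100 this gives the 5151 data sets mentioned in the Discussion, each of which carries 8 × 4 = 32 P-values.

```python
def sample_space(n):
    """All possible data sets (x, t): t positives first time, x of them positive again."""
    return [(x, t) for t in range(n + 1) for x in range(t + 1)]

sizes = {n: len(sample_space(n)) for n in (50, 100)}
n_pvalues = 8 * 4   # eight basic statistics, four P-values each
print(sizes, n_pvalues)
```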

Fig. 2. For n = 100, the left plot compares M and E + M P-values for the modified log-Wald statistic $W_2'$ and the right plot for the modified Agresti/score statistic (axes: M P-value and E + M P-value). The E + M P-values are smaller and less discrete than the M P-values.

4.1. M versus E + M P-values

We are firstly interested in comparing M P-values and E + M P-values, since only these are guaranteed to lead to valid tests. Fig. 2 presents a plot of the E + M P-values versus the M P-values, restricted to the more interesting cases where the P-values are in the range (0, 0.2). The plots are for n = 100 and for the modified log-Wald and score statistics, though many similar plots are available in Lloyd (2006). Apparently, the E + M P-values tend to be systematically smaller than the M P-values. For instance, in the right plot when the M P-value is around 0.11, the E + M P-value is typically smaller. This does not by itself imply higher power, but it certainly anticipates a power advantage for the E + M P-values. Explicit power comparisons in Lloyd (2006) show that for all eight basic test statistics, E + M P-values are to be preferred to M P-values.

4.2. Power of E + M P-values from different test statistics

We next look at the powers of the tests based on the E + M P-values. Denote by $\beta_\alpha(\theta, \phi)$ the power of the size $\alpha$ test generated by rejecting the null when the P-value is less than or equal to $\alpha$, defined over the region $\Omega = \{0 \le \phi \le 1,\ \theta < \phi^{-1}\}$. For much of the parameter space the powers of all tests will be extreme and of no practical interest. Firstly, as $\phi \to 0$ we observe (x, t) = (0, 0) with probability one, all test statistics equal zero and so the power is zero. On the other hand, as $\phi$ increases and $\theta$ deviates from $\theta_0 = 1$, the power will tend to increase. This is reflected in the formula for the standard deviation of $\hat\theta$ given in Section 2.
Rather than giving contour plots of power over the entire parameter space $\Omega$, we focus on a one-dimensional subset where power is in an interesting and moderate range, by systematically moving $\theta$ closer to $\theta_0 = 1$ as $\phi$ increases. We achieve this by choosing $\theta$ such that

$$E\{2(l(\theta, \phi) - l(\theta_0, \phi))\} = K^2, \qquad (3)$$

where K is a quantile of the normal distribution, say ±1 or ±2. An expression for $l(\theta, \phi)$ was given in (1), which with $\theta_0 = 1$ leads to the equation

$$\theta\phi^2\log\theta + \phi(1 - \theta\phi)\log\left(\frac{1 - \theta\phi}{1 - \phi}\right) = \frac{K^2}{2n}.$$

If we were investigating normal data, the analogous parameter values would be $\mu = \mu_0 \pm K\sigma/\sqrt{n}$. There will typically be two solutions, one greater and one smaller than $\theta_0 = 1$, though when $\phi$ is sufficiently small there will not be a solution less than $\theta_0 = 1$. Calling the solution $\theta(\phi, K, n)$, this results in the power curves

$$\Pi(\phi; n, K, \alpha) := \beta_\alpha(\theta(\phi, K, n), \phi).$$

The reader should again note that all powers are calculated numerically, not simulated. Figs. 3 and 4 display power profiles for E + M P-values based on the four modified statistics for n = 50, 100, K = ±1, ±2 and the chosen values of $\alpha$. The plots suggest quite different behaviour for detecting alternatives $\theta > 1$ or $\theta < 1$. For less than alternatives (K = −1, −2 in top plots), $W_1'$ is among the best performers, being clearly preferred when $\phi < 0.5$ and slightly preferred when $\phi > 0.5$. For greater than alternatives (K = 1, 2 in bottom plots), $W_2'$ is clearly superior while the score statistic and $W_1'$ perform relatively poorly. The LR statistic performs almost as well as $W_2'$. If a single compromise statistic is to be recommended, E + M P-values derived from the modified LR statistic seem to perform close to best for both kinds of alternatives.

Fig. 3. Power profiles $\Pi(\phi; 50, K, \alpha)$ of four E + M based tests. Top left: K = −1. Top right: K = −2. Bottom left: K = 1. Bottom right: K = 2.

4.3. Modification or no modification?

In order to try and further summarise the general patterns, we have calculated the averages of these power profiles to give a single power measure. While this ignores some of the parameter space, it seems more sensible to take a power average along the curve defined by (3) than to take an average over the entire parameter space. The results are in Table 4. The main new insight from this table is that E + M P-values based on the unmodified log-Wald statistic $W_2$ (i.e. without adding 1/2 to all counts) perform even better than those based on the modified log-Wald statistic $W_2'$ for greater than alternatives. A plot of the corresponding power profiles indicates that the power is almost uniformly superior for the test based on unmodified log-Wald. It may also be noted that modification of the score statistic is contra-indicated for positive alternatives, though neither version of the score statistic is recommended for such alternatives in any case.

4.4. Is the M-step necessary after the E-step?

While it is true that E P-values are not guaranteed to be valid, it may be that for practical purposes the computationally intensive M-step is not required. The simplest way to investigate this is to plot the E P-values against the E + M P-values over the range (0, 0.2) of practical interest. In Fig.
5 we present plots for the unmodified log-Wald statistic and the modified Wald statistic for n = 100, these being the two best statistics arising from the power analysis. It seems clear from these plots that when the E P-value turns out to be small, the M-step is hardly necessary. However, for larger P-values the M-step has a non-negligible effect.

Fig. 4. Power profiles $\Pi(\phi; 100, K, 0.05)$ of four E + M based tests. Top left: K = −1. Top right: K = −2. Bottom left: K = 1. Bottom right: K = 2.

Table 4
Average power profiles of eight alternative E + M P-values, tabulated by $\alpha$, n and K for the four basic statistics and their modified versions. Highest power is in bold font.
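The alternative values $\theta(\phi, K, n)$ along the curve defined by (3) have no closed form but are easily found numerically. The sketch below solves the displayed equation for $\theta_0 = 1$ by bisection, writing phi for the nuisance parameter. It relies on two assumptions made here rather than stated in the paper: that negative K indexes the solution below 1 and positive K the solution above 1, and that the left-hand side (a Kullback-Leibler divergence per observation, convex in theta with minimum zero at theta = 1) is monotone on either side of 1. Function names are hypothetical.

```python
import math

def expected_lr(theta, phi):
    """theta*phi^2*log(theta) + phi*(1 - theta*phi)*log((1 - theta*phi)/(1 - phi))."""
    a = theta * phi**2
    b = phi * (1.0 - theta * phi)
    term1 = a * math.log(theta) if theta > 0 else 0.0                    # 0*log(0) := 0
    term2 = b * math.log((1.0 - theta * phi) / (1.0 - phi)) if b > 0 else 0.0
    return term1 + term2

def theta_on_curve(phi, K, n, iters=200):
    """Solve expected_lr(theta, phi) = K^2 / (2n); return None if no solution exists."""
    target = K * K / (2.0 * n)
    far = 1.0 / phi if K > 0 else 0.0    # boundary of the parameter space on that side
    if expected_lr(far, phi) < target:
        return None                      # the curve does not reach this phi
    near = 1.0                           # expected_lr is zero at theta = 1
    for _ in range(iters):
        mid = 0.5 * (near + far)
        if expected_lr(mid, phi) < target:
            near = mid
        else:
            far = mid
    return 0.5 * (near + far)

theta_up = theta_on_curve(0.5, 1, 100)       # greater than alternative
theta_down = theta_on_curve(0.5, -1, 100)    # less than alternative
no_solution = theta_on_curve(0.005, -2, 50)  # phi too small for a solution below 1
print(theta_up, theta_down, no_solution)
```

The last call illustrates the remark that no solution below $\theta_0 = 1$ exists when the nuisance parameter is sufficiently small.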

Fig. 5. For n = 100, the left plot compares E and E + M P-values for the modified Wald statistic $W_1'$ and the right plot for the unmodified log-Wald statistic $W_2$ (axes: E P-value and E + M P-value). The effect of the theoretically essential M-step is to slightly increase the P-value, but the difference is negligible when the E P-value is small.

5. Discussion

Another method of comparing secondary and primary probability of success is by the simple difference rather than the ratio. Approximate confidence intervals which generate two-sided tests are given by Lui (2000) and Tang and Tang (2003). When only a proportion of individuals have a structural zero, inference has been studied by Tang and Tang (2004).

The power study in Lloyd (2005) has considered some other basic generating statistics, including one based on the conditional distribution of X given T. This statistic was found to be uncompetitive. Berger and Boos (1994) suggested a quite different method of accounting for nuisance parameters, which involves maximising $\pi(y, \phi)$ over a $(1 - \gamma)$ confidence region for $\phi$ and adding a penalty $\gamma$. Dependence of results on the choice of $\gamma$ can be extreme, notwithstanding the general recommendation by Berger and Sidik (2003) that $\gamma$ be small. It is quite unclear how to extend their ideas to multi-dimensional nuisance parameters. For this and other reasons, it is argued in Lloyd (2005) that the E + M approach is to be preferred.

We have not investigated other null values besides $\theta_0 = 1$ in this paper, though we have in all cases given the P-values in terms of a general null value $\theta_0$. Unlike clinical trials, where testing non-null values is of interest for establishing non-inferiority or bio-equivalence, it is not clear that such hypotheses are of interest in the contexts where structural zero matched pairs arise.

There remain unresolved computational issues, especially in the maximisation step.
A recent paper by Fang and Chen (2003) describes the use of the EM algorithm for this purpose but it is not clear how reliable this methodology is. Some computational issues are described in Appendix B. Computation times are largely dependent on the number of individuals n. For n = 100 we computed all 5151 possible E + M P-values in roughly an hour. However, computing a single P-value involves some overhead computations, and when n = 100 a single P-value takes roughly 30 s using the current unoptimised algorithm. Computation time for this naive algorithm increases with sample size at rate $n^4$ but can be reduced to $O(n^2 \log n)$ by using the known monotonicity of the test statistics. Development of efficient algorithms for E + M P-values, both in the present context and in general, is an area of future research. R functions are available from the author.

Appendix A. Roots of quadratic for profile MLE

Firstly, the restricted ML estimator $\hat\phi_0$ is an estimator of $\phi$ under the restriction that $p_{11} = \theta_0\phi^2$. Not surprisingly then, it can be shown that $\hat\phi_0$ is an increasing function of $\hat\phi = t/n$ and also an increasing function of $\hat p_{11} = x/n$ for fixed t. We will show that the smaller of the two roots is always within $[0, \min(1, \theta_0^{-1})]$ by considering the data set x = t = 0, associated with the smallest roots, and x = t = n, associated with the largest roots.
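The quadratic itself is not written out in the text. The derivation below is a reconstruction from the log-likelihood (1), with $\phi$ denoting the nuisance parameter, and should be read as a sketch: the coefficient names a, b, c follow the usage in this appendix, and the general expressions are checked only against the two boundary data sets treated here and the $\theta_0 = 1$ solutions quoted in Section 2.

```latex
% Score equation for \phi at fixed \theta = \theta_0, from (1):
\frac{\partial l}{\partial \phi}
  = \frac{t + x}{\phi} - \frac{n - t}{1 - \phi}
    - \frac{(t - x)\,\theta_0}{1 - \theta_0 \phi} = 0 .
% Multiplying through by \phi (1 - \phi)(1 - \theta_0 \phi) and collecting terms:
a \phi^2 - b \phi + c = 0, \qquad
a = \theta_0 (n + t), \quad
b = n + x + 2 \theta_0 t, \quad
c = t + x ,
% so the smaller root is
\hat\phi_0 = \frac{b - \sqrt{b^2 - 4ac}}{2a} .
% Checks:
%   x = t = 0:      a = n\theta_0,  b = n,               c = 0
%   x = t = n:      a = 2n\theta_0, b = 2n(1+\theta_0),  c = 2n
%   \theta_0 = 1:   roots (x + t)/(n + t) and 1
```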

When x = t = 0, we have $a = n\theta_0$, $b = n$, $c = 0$ and so the roots are

$$\frac{n \pm \sqrt{n^2}}{2n\theta_0},$$

giving solutions 0 and $\theta_0^{-1}$. In this case the log-likelihood reduces to $n\log(1 - \phi)$, so it is easy to check that the smaller root $\hat\phi_0 = 0$ is the maximiser. When x = t = n we have $a = 2n\theta_0$, $b = 2n(1 + \theta_0)$, $c = 2n$ and so the roots are

$$\frac{2n(1 + \theta_0) \pm \sqrt{4n^2(1 + \theta_0)^2 - 16n^2\theta_0}}{4n\theta_0} = \frac{1 + \theta_0 \pm |1 - \theta_0|}{2\theta_0},$$

giving solutions $\theta_0^{-1}$ or 1. It follows that across all possible data sets, the smaller of the two solutions ranges from 0, when x = t = 0, to $\min(1, \theta_0^{-1})$, when x = t = n.

Appendix B. Computational aspects of the estimation and maximisation steps

Both the E and M P-values require calculation of the profile $\pi(y, \phi) := \Pr(P(Y) \le P(y);\ \theta_0, \phi)$, which in turn requires defining the set $\{P(Y) \le P(y)\}$. In the worst case this would take N operations, where N is the cardinality of the sample space. In our example, Y = (X, T) and the cardinality of the sample space is N = (n + 1)(n + 2)/2, so there are $O(n^2)$ operations required. However, since our test statistics are known to be non-decreasing in x for fixed t, the set can be computed by bisection in x for fixed t, which requires $O(n \log n)$ operations. The profile, which is the probability of this set, can also be efficiently computed since it is a weighted sum of binomial tail probabilities for each fixed t.

The maximisation step requires finding the supremum of $\pi(y, \phi)$. This function does not seem to have any special properties, except that it is a polynomial of degree n. It can contain quite extreme spikes. Little attention in the literature seems to have been paid to the possibility of missing such spikes. In our computations we have used a local optimiser in the R computing environment, applied separately to 10 equal-sized intervals of [0, 1]. All P-values in this paper can be computed within a few seconds but could be computed much faster using monotonicity. A user-friendly function is available from the author.
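The two computational devices described in this appendix, bisection in x for fixed t and piecewise local optimisation over [0, 1], can be sketched as follows. This is a minimal illustration and not the author's R code: it assumes $\theta_0 = 1$, the score statistic with the tiny-epsilon shift, a left-tail rejection set (statistic no larger than the observed value), and a golden-section search standing in for the unspecified local optimiser. The symbol phi and all function names are chosen here.

```python
import math

def binom_pmf(k, m, p):
    return math.comb(m, k) * p**k * (1.0 - p)**(m - k)

def score_stat(x, t, n, eps=1e-8):
    """Score statistic for theta0 = 1; non-decreasing in x for fixed t."""
    if x == 0 and t == 0:
        return 0.0
    x, t, n = x + eps, t + 2 * eps, n + 3 * eps
    phi0 = (x + t) / (n + t)
    return (x - t * phi0) * math.sqrt((1 + phi0) / (n * phi0**2 * (1 - phi0)))

def boundaries(stat, s_obs, n, tol=1e-12):
    """For each t, the largest x with stat(x, t, n) <= s_obs, or -1 if none."""
    cut = []
    for t in range(n + 1):
        lo, hi = -1, t                      # the answer always lies in [lo, hi]
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if stat(mid, t, n) <= s_obs + tol:
                lo = mid                    # mid is in the set, so is everything below
            else:
                hi = mid - 1
        cut.append(lo)
    return cut

def profile(cut, n, phi):
    """Weighted sum over t of binomial lower-tail probabilities (theta0 = 1)."""
    total = 0.0
    for t, c in enumerate(cut):
        if c >= 0:
            tail = sum(binom_pmf(x, t, phi) for x in range(c + 1))
            total += binom_pmf(t, n, phi) * tail
    return total

def golden_max(f, a, b, iters=60):
    """Golden-section search for a local maximum of f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
        else:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
    return max(fc, fd)

def m_p_value(cut, n, pieces=10):
    """Best value found by searching each of `pieces` equal intervals of [0, 1]."""
    f = lambda phi: profile(cut, n, phi)
    ends = [f(i / pieces) for i in range(pieces + 1)]
    inner = [golden_max(f, i / pieces, (i + 1) / pieces) for i in range(pieces)]
    return max(ends + inner)

n, y = 12, (1, 9)                           # small illustrative data set
s_obs = score_stat(y[0], y[1], n)
cut = boundaries(score_stat, s_obs, n)
p_max = m_p_value(cut, n)
print(cut, p_max)
```

Like any local method, the piecewise search can still miss a spike narrower than the intervals it explores, which is the concern raised above; increasing the number of intervals trades time for safety.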
References

Agresti, A., 1990. Categorical Data Analysis, first ed. Wiley, New York.
Basu, D., 1977. On the elimination of nuisance parameters. J. Amer. Statist. Assoc. 72.
Berger, R.L., Boos, D.D., 1994. P values maximised over a confidence set for the nuisance parameter. J. Amer. Statist. Assoc. 89.
Berger, R.L., Sidik, K., 2003. Exact unconditional tests for a 2 × 2 matched pairs design. Statist. Methods Med. Res. 12.
Bickel, P.J., Doksum, K.A., 1977. Mathematical Statistics. Holden-Day, Oakland.
Buehler, R.J., 1957. Confidence intervals for the product of two binomial parameters. J. Amer. Statist. Assoc. 52.
Fang, X.Z., Chen, J., 2003. EM algorithm and its application to testing hypotheses. Sci. China A 46.
Johnson, W.D., May, W.L., 1995. Combining 2 × 2 tables that contain structural zero. Statist. Med. 14.
Lloyd, C.J., 2005. E + M P-values. Austral. NZ. J. Statist., submitted for publication and available as Working Paper.
Lloyd, C.J., 2006. Efficient and exact tests in a correlated 2 × 2 table with structural zero. Working Paper.
Lloyd, C.J., Moldovan, M., 2007a. Exact confidence bounds for the risk ratio in 2 × 2 tables with structural zero. Biometrical J., to appear.
Lloyd, C.J., Moldovan, M., 2007b. Unconditional efficient upper limits for the odds ratio based on conditional likelihood. Statist. Med., to appear.
Lui, K.J., 1998. Interval estimation of the risk ratio between secondary infection, given a primary infection, and the primary infection. Biometrics 54.
Lui, K.J., 2000. Confidence intervals of the simple difference between the proportions of a primary infection and a secondary infection, given the primary infection. Biometrical J. 42.
Storer, B.E., Kim, C., 1990. Exact properties of some exact test statistics for comparing two binomial proportions. J. Amer. Statist. Assoc. 85.
Tang, N.S., Tang, M.L., 2002. Exact unconditional inference for the risk ratio in a correlated 2 × 2 table with structural zero. Biometrics 58.
Tang, N.S., Tang, M.L., 2003. Statistical inference for risk difference in an incomplete correlated 2 × 2 table. Biometrical J. 45.
Tang, M.L., Tang, N.S., 2004. Exact tests for comparing two paired proportions with incomplete data. Biometrical J. 46.
Toyota, M., Kudo, K., Kobori, O., 1999. High frequency of individuals with strong reactions to tuberculosis among clinical trainees. Japan J. Infectious Disease 52.
