Biometrika (1991), 78, 4, pp. 891-902
Printed in Great Britain

Approximations of marginal tail probabilities for a class of smooth functions with applications to Bayesian and conditional inference

BY THOMAS J. DiCICCIO AND MICHAEL A. MARTIN
Department of Statistics, Stanford University, Stanford, California 94305, U.S.A.

SUMMARY
This paper presents an asymptotic approximation of marginal tail probabilities for a real-valued function of a random vector, where the function has a continuous gradient that does not vanish at the mode of the joint density of the random vector. This approximation has error O(n^{-3/2}) and improves upon a related standard normal approximation, which has error O(n^{-1}). The derivation involves applying a tail probability formula given by DiCiccio, Field & Fraser (1990) to an approximation of a marginal density derived by Tierney, Kass & Kadane (1989). The approximation can be applied for Bayesian and conditional inference, as well as for approximating sampling distributions, and its accuracy is illustrated through several numerical examples related to such applications. In the context of conditional inference, we develop refinements of the standard normal approximation to the distributions of two different signed root likelihood ratio statistics for a component of the natural parameter in exponential families.

Some key words: Asymptotic expansion; Conditional likelihood; Confidence limit; Exponential family; Exponential regression model; Marginal posterior distribution function; Natural parameter; Normal approximation; Signed root likelihood ratio statistic.

1. INTRODUCTION
Consider a continuous random vector X = (X^1, ..., X^p) having probability density function of the form

    f_X(x) = c b(x) exp{l(x)},   x = (x^1, ..., x^p).   (1)

Suppose that the function l attains its maximum value at x̂ = (x̂^1, ..., x̂^p) and that X − x̂ is O_p(n^{−1/2}) as some parameter n, usually sample size, increases indefinitely.
For each fixed x, assume that l(x) and its partial derivatives are O(n) and that b(x) is O(1). Now consider a real-valued variable Y = g(X), where the function g has a continuous gradient that is nonzero at x̂. In this paper, we present an accurate approximation for marginal tail probabilities of Y that is easy to compute and does not involve numerical integration in high dimensions.
To calculate an initial approximation of the marginal tail probability pr(Y ≤ y), let x̂(y) be the value of x that maximizes l(x) subject to the constraint g(x) = y. Moreover, let ŷ = g(x̂), so that Y − ŷ is O_p(n^{−1/2}) and x̂(ŷ) = x̂. Consider the function

    r(y) = sgn(y − ŷ)(2[l(x̂) − l{x̂(y)}])^{1/2},   (2)
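The constrained maximization defining x̂(y), and hence the signed root (2), is routine to carry out numerically. The sketch below (our construction; the function names are not from the paper) uses an illustrative bivariate quadratic l with g(x) = x^1 + x^2, so that Y is exactly normal and the accuracy of the resulting approximation can be checked directly.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

n = 20

def l(x):                      # illustrative l, O(n), maximized at x_hat = (0, 0)
    return -0.5 * n * (x[0] ** 2 + x[1] ** 2)

def g(x):                      # smooth g with nonvanishing gradient at x_hat
    return x[0] + x[1]

x_hat = np.zeros(2)
y_hat, l_hat = g(x_hat), l(x_hat)

def signed_root(y):
    # x_hat(y): maximize l(x) subject to g(x) = y, as in (2)
    res = minimize(lambda x: -l(x), [y / 2.0, y / 2.0],
                   constraints=[{"type": "eq", "fun": lambda x: g(x) - y}])
    return np.sign(y - y_hat) * np.sqrt(2.0 * (l_hat - l(res.x)))

# for this l, Y = X^1 + X^2 is exactly N(0, 2/n), so Phi{r(y)} recovers the
# exact distribution function and the normal approximation below has no error
y = 0.3
r = signed_root(y)
approx3 = norm.cdf(r)
exact = norm.cdf(y / np.sqrt(2.0 / n))
```

For a quadratic l and linear g the signed root equals the standardized deviation exactly; the interest of (2) is that the same recipe applies unchanged to non-quadratic l.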
which is assumed to be monotonically increasing. Approximations to the distribution function of Y can be based on normal approximations to the distribution of R = r(Y). In particular, provided y − ŷ is O(n^{−1/2}),

    pr(Y ≤ y) = pr(R ≤ r) = Φ(r) + O(n^{−1}),   (3)

where r = r(y) and Φ is the standard normal distribution function.
The standard normal approximation to the distribution of R can be improved. Additional notation is necessary to formulate a more accurate approximation. Let l_i(x) = ∂l(x)/∂x^i, l_{ij}(x) = ∂²l(x)/∂x^i∂x^j, g_i(x) = ∂g(x)/∂x^i and g_{ij}(x) = ∂²g(x)/∂x^i∂x^j, etc. (i, j = 1, ..., p). Put

    J_{ij}(y) = −l_{ij}{x̂(y)} + [l_k{x̂(y)}/g_k{x̂(y)}] g_{ij}{x̂(y)},

where k is any index such that g_k{x̂(y)} does not vanish. Such an index k always exists by virtue of the assumptions about g. Define J(y) = {J_{ij}(y)} and J(y)^{−1} = {J^{ij}(y)}. Thus J(y) is a p×p matrix and J(ŷ) = {−l_{ij}(x̂)}. Finally, let

    Q(y) = J^{ij}(y) g_i{x̂(y)} g_j{x̂(y)},   D(y) = [Q(y)|J(y)|/|J(ŷ)|]^{−1/2}.

In this expression for Q(y) and in subsequent expressions, the summation convention is used. The improved approximation is

    pr(Y ≤ y) = Φ(r) + φ(r)[1/r + b{x̂(y)} g_j{x̂(y)} D(y)/{b(x̂) l_j{x̂(y)}}] + O(n^{−3/2}),   (4)

where r = r(y), φ is the standard normal probability density function and j is any index such that g_j{x̂(y)} is nonzero. For the univariate case p = 1, when g is the identity function, approximation (4) reduces to

    pr(X ≤ x) = Φ(r) + φ(r)[1/r + b(x){−l^{(2)}(x̂)}^{1/2}/{b(x̂) l^{(1)}(x)}] + O(n^{−3/2}),   (5)

where r = r(x) = sgn(x − x̂)[2{l(x̂) − l(x)}]^{1/2} and l^{(k)}(x) = d^k l(x)/dx^k (k = 1, 2).
Formula (4) is especially useful in Bayesian situations, where it provides accurate approximations to marginal posterior distribution functions. For such applications, it is convenient to take l to be the log likelihood function and b the prior density. An example of this type is considered in §3. In §2, we present a derivation of (4); we apply a tail probability approximation given by DiCiccio et al. (1990) to the approximation of a marginal density developed by Tierney et al. (1989).
Section 3 contains several numerical examples which illustrate the accuracy of approximation (4) in a variety of situations. Applications of (4) to exponential families are discussed in §4. In particular, approximations to marginal tail probabilities for scalar functions of the sufficient statistic are given. Approximate conditional inference for the natural parameters of the family is also examined.

2. DERIVATION OF TAIL PROBABILITY APPROXIMATIONS
DiCiccio et al. (1990) have considered tail probability approximations for (1) in the univariate case p = 1 with b(x) = 1. They showed that, provided x − x̂ is O(n^{−1/2}),

    pr(X ≤ x) = Φ(r) + φ(r)[1/r + {−l^{(2)}(x̂)}^{1/2}/l^{(1)}(x)] + O(n^{−3/2}),   (6)
where r is defined as for (5). This approximation applies even if the density of X is not completely known except for a normalizing constant. In particular, it is valid if f_X(x) = c exp{l(x)}{1 + O(n^{−3/2})} when x − x̂ is O(n^{−1/2}), where c is a normalizing constant such that c exp{l(x)} integrates to 1 + O(n^{−3/2}).
Approximation (4) can be derived by applying (6) to an approximation of an appropriate marginal density. Tierney et al. (1989, 1991) have given an asymptotic approximation to the marginal density of Y = g(X) for p ≥ 1. The renormalized version of their approximation to the true density f_Y(y) is

    f*_Y(y) = c D(y)[b{x̂(y)}/b(x̂)] exp[l{x̂(y)} − l(x̂)],   (7)

where c is a normalizing constant such that f*_Y(y) integrates to 1 + O(n^{−3/2}). Provided y − ŷ is O(n^{−1/2}), this renormalized approximation has relative error of order n^{−3/2}; that is, f_Y(y) = f*_Y(y){1 + O(n^{−3/2})}. Leonard, Hsu & Tsui (1989) also discuss the saddlepoint accuracy of (7).
Now consider a change of variable W = h(Y), where the function h(y) is chosen to satisfy dh(y)/dy = n^{−1/2} D(y) b{x̂(y)}. Then h(y) is a monotonically increasing function. The Tierney et al. approximation to the density of W is f*_W(w) ∝ exp{T(w)}, where T{h(y)} = l{x̂(y)}. Note that T(w) is maximized at ŵ = h(ŷ) and that T(ŵ) = l(x̂). Application of (6) to this approximate marginal density of W yields

    pr(W ≤ w) = Φ(r) + φ(r)[1/r + {−T^{(2)}(ŵ)}^{1/2}/T^{(1)}(w)] + O(n^{−3/2}),   (8)

where w = h(y), r = sgn(w − ŵ)[2{T(ŵ) − T(w)}]^{1/2} and T^{(k)}(w) = d^k T(w)/dw^k (k = 1, 2).
Explicit knowledge of the function h(y) is not required to calculate approximation (8). Since w = h(y) in (8), it follows that

    r = sgn{h(y) − h(ŷ)}(2[T{h(ŷ)} − T{h(y)}])^{1/2} = sgn(y − ŷ)(2[l(x̂) − l{x̂(y)}])^{1/2},   (9)

which coincides with (2). To find an expression for T^{(1)}(w) = T^{(1)}{h(y)} in (8), note that differentiation of T{h(y)} = l{x̂(y)} with respect to y yields

    T^{(1)}{h(y)} h^{(1)}(y) = l_i{x̂(y)} x̂^i_{(1)}(y),   (10)

where h^{(1)}(y) = dh(y)/dy and x̂^i_{(1)}(y) = dx̂^i(y)/dy (i = 1, ..., p).
A simple formula for the right-hand side of (10) is available from a Lagrange multiplier argument for maximizing l(x) subject to the constraint g(x) = y. By using such an argument, it may be shown that

    g_i{x̂(y)} x̂^i_{(1)}(y) = 1,   l_i{x̂(y)} = [l_j{x̂(y)}/g_j{x̂(y)}] g_i{x̂(y)}   (i = 1, ..., p),   (11)

for any index j having g_j{x̂(y)} ≠ 0. At least one such index always exists by assumption. Hence

    T^{(1)}(w) = n^{1/2} l_j{x̂(y)}/[g_j{x̂(y)} D(y) b{x̂(y)}],   (12)

where j is any index for which g_j{x̂(y)} is nonzero. To find an expression for −T^{(2)}(ŵ) in (8), note that differentiation of (10) with respect to y yields

    T^{(2)}{h(y)}{h^{(1)}(y)}² + T^{(1)}{h(y)} h^{(2)}(y) = −J_{ij}(y) x̂^i_{(1)}(y) x̂^j_{(1)}(y).   (13)
It follows from differentiation of (11) with respect to y that

    x̂^i_{(1)}(y) = J^{ij}(y) g_j{x̂(y)}/Q(y)   (i = 1, ..., p).   (14)

Substitution of (14) into (13), upon evaluation at y = ŷ, produces

    {−T^{(2)}(ŵ)}^{1/2} = n^{1/2}{b(x̂)}^{−1}.   (15)

Finally, by substitution of (9), (12) and (15) into (8), we obtain approximation (4).
A desirable feature of (4) is its equivariance under invertible transformations of Y. For example, if Z = γ(X) is related to Y = g(X) by Y = ζ(Z), where ζ is a real-valued, differentiable and increasing transformation, then the approximation to pr(Z ≤ z) obtained by applying (4) to Z = γ(X) directly coincides with the approximation to pr{Y ≤ ζ(z)} obtained by applying (4) to Y. Similarly, if ζ is decreasing, then the approximation to pr(Z ≤ z) coincides with that to 1 − pr{Y ≤ ζ(z)}. Note, however, that (4) is not invariant under nonlinear transformations of the joint density (1). We discuss this issue further in §3.
In the case where b(x) = 1 and g is a coordinate function, say g(x) = x^1, approximation (4) reduces to

    pr(X^1 ≤ x^1) = Φ(r) + φ(r)[1/r + D(x^1)/l_1{x̂(x^1)}] + O(n^{−3/2}),

where r = r(x^1), D(x^1) = [J^{11}(x^1)|J(x^1)|/|J(x̂^1)|]^{−1/2} and the components of J(y) have the particularly simple form J_{ij}(x^1) = −l_{ij}{x̂(x^1)}. This formula was derived by DiCiccio et al. (1990).
The conditions imposed on g place moderate limitations on the types of statistics for which tail probabilities may be approximated using (4). Leonard et al. (1989) present examples in which the Tierney et al. (1989) approximation for marginal densities produces inaccurate results. Approximation (4) also performs poorly in these instances. The examples of Leonard et al. focus on situations where the function g is many-to-one. For instance, if Y = g(X) = (X^1)² + ... + (X^p)² and x̂ = (0, ..., 0), then approximation (4) cannot be applied. On the other hand, if x̂ is close to zero, then although approximation (4) is formally applicable, it can be expected to yield poor results in small to moderately sized samples.
An alternative approximation to (4) can be derived by applying (6) directly to the density (7) written as f*_Y(y) ∝ exp{l*(y)}, where

    l*(y) = l{x̂(y)} + log b{x̂(y)} + log D(y).

In general, a closed-form expression for x̂(y) is unavailable, and hence D(y) cannot be written explicitly. Since l*(y) depends on D(y), the maximizing point and derivatives of l*(y) required for application of (6) can be difficult to calculate. Numerical methods are available, however, that facilitate this application of (6).
One drawback of (4) is that it can yield approximations which exceed one or are negative. Such problems can be avoided by using an alternative approximation. It is easily shown by Taylor expansion of the right-hand side of (4) that

    pr(Y ≤ y) = Φ(r + r^{−1} log c) + O(n^{−3/2}),

where

    c = c(y) = −b(x̂) l_j{x̂(y)}/[r b{x̂(y)} g_j{x̂(y)} D(y)].

This alternative approximation was suggested to us by Luke Tierney and a referee.
Approximation (4) may be interpreted both algebraically and numerically. In certain circumstances, the approximation produces a convenient, closed-form expression estimating pr(Y ≤ y); see §3.2. In many other cases the approximation does not result in a closed-form expression, and it is then most effectively viewed as a useful computational tool.

3. APPLICATIONS
3.1. Exponential regression model
Feigl & Zelen (1965) investigated the relationship between survival time for leukaemia patients and a concomitant variable, patient white blood cell count. The sample they used consisted of 17 patients with acute myelogenous leukaemia. We study an exponential regression model for survival time T, which has density function conditional on x, the base 10 logarithm of white blood cell count,

    f(t | x) = θ_x^{−1} exp(−t/θ_x)   (t > 0),

where θ_x = exp(β_0 + β_1 x). Inference about θ_x for a specified value of x is important. The density function of Y = log T, conditional on x, is

    exp{y − β_0 − β_1 x − exp(y − β_0 − β_1 x)}   (−∞ < y < ∞);

alternatively, we may write Y = β_0 + β_1 x + ε, where ε has an extreme value distribution with density exp(z − e^z) for −∞ < z < ∞ (Lawless, 1982).
Let β̂ = (β̂_0, β̂_1) be the maximum likelihood estimator of β = (β_0, β_1). When censoring is absent, the residuals A_i = Y_i − β̂_0 − β̂_1 x_i (i = 1, ..., n) are ancillary statistics. Let A = (A_1, ..., A_n) and (Z_0, Z_1) = β̂ − β. Inference about β, and about θ_x for some specified x, say x_0, can be based on the conditional density of Z = (Z_0, Z_1) given A. This conditional density is f_{Z|A}(z_0, z_1 | a) ∝ exp{l(z_0, z_1)}, where

    l(z_0, z_1) = Σ_{i=1}^n {a_i − z_0 − z_1 x_i − exp(a_i − z_0 − z_1 x_i)};   (16)

see Lawless (1982, p. 290), who develops exact conditional procedures based on f_{Z|A}(z_0, z_1 | a). We focus on inference for θ_{x_0} by considering the pivotal quantity Z_2 = Z_0 + Z_1 x_0 = log θ̂_{x_0} − log θ_{x_0}. Lawless derives an exact formula for pr(Z_2 ≤ y | A = a).
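The whole pipeline for this model can be sketched numerically. The code below is our illustration, using a small synthetic configuration of residuals and covariates rather than the Feigl-Zelen data (all values and function names are ours): it applies approximation (4) to the conditional log density (16), with g(z) = z_0 + z_1 x_0 linear so that g_{ij} = 0 and J(y) is simply minus the Hessian of l, and checks the result against two-dimensional numerical integration of exp{l}.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import norm

# synthetic ancillary configuration (ours, for illustration only)
a = np.array([-1.2, -0.3, 0.4, 0.1, -0.8, 0.9, -0.1, 0.5])
xs = np.array([3.2, 3.6, 3.9, 4.1, 4.4, 4.7, 5.0, 5.3])
x0 = 4.7
gvec = np.array([1.0, x0])       # gradient of g(z) = z0 + z1*x0 (linear)

def l(z):
    t = a - z[0] - z[1] * xs
    return np.sum(t - np.exp(t))

def neg_hess(z):                 # J = -Hessian of l; g linear, so no g_ij term
    w = np.exp(a - z[0] - z[1] * xs)
    return np.array([[w.sum(), (w * xs).sum()],
                     [(w * xs).sum(), (w * xs ** 2).sum()]])

res = minimize(lambda z: -l(z), np.zeros(2))
z_hat, l_hat = res.x, l(res.x)
J_hat = neg_hess(z_hat)
y_hat = z_hat @ gvec

def approx4(y):
    # constrained maximum of l on the line z0 + z1*x0 = y
    prof = minimize_scalar(lambda z1: -l([y - z1 * x0, z1]),
                           bounds=(z_hat[1] - 5.0, z_hat[1] + 5.0),
                           method="bounded")
    z_t = np.array([y - prof.x * x0, prof.x])
    r = np.sign(y - y_hat) * np.sqrt(2.0 * (l_hat - l(z_t)))
    J = neg_hess(z_t)
    Q = gvec @ np.linalg.solve(J, gvec)
    D = (Q * np.linalg.det(J) / np.linalg.det(J_hat)) ** -0.5
    l_0 = np.sum(np.exp(a - z_t[0] - z_t[1] * xs) - 1.0)  # l_j for j = 0, g_0 = 1
    return norm.cdf(r) + norm.pdf(r) * (1.0 / r + D / l_0)

# "exact" pr(Z2 <= y | A = a) by integration in the variables (u, z1), u = z0 + z1*x0
s_u = np.sqrt(gvec @ np.linalg.solve(J_hat, gvec))
s_1 = np.sqrt(np.linalg.inv(J_hat)[1, 1])
lo, hi = z_hat[1] - 8.0 * s_1, z_hat[1] + 8.0 * s_1
dens = lambda z1, u: np.exp(l([u - z1 * x0, z1]) - l_hat)
y = y_hat - 1.5 * s_u
num = dblquad(dens, y_hat - 10.0 * s_u, y, lambda u: lo, lambda u: hi)[0]
den = dblquad(dens, y_hat - 10.0 * s_u, y_hat + 10.0 * s_u,
              lambda u: lo, lambda u: hi)[0]
exact = num / den
approx = approx4(y)
```

Even with only eight observations the agreement is close, mirroring the behaviour reported for the Feigl-Zelen data in Table 1.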
Unfortunately, Lawless's technique does not extend easily beyond the case of a single regressor variable, as it requires numerical integration of the density of Z given A. As an alternative to the exact conditional procedure, we could use the large-sample normal approximation β̂ ~ N(β, I^{−1}), where I is the observed information matrix. Then Z_2 has an approximate normal distribution on which tests and confidence intervals for θ_{x_0} can be based.
Table 1 contains exact and approximate values of pr(Z_2 ≤ y | A = a) for various values of y in the case when the white blood cell count is 50 000. Exact tail probabilities were computed using equation (6.3.14) of Lawless (1982, p. 292) by numerical integration. For approximations (3) and

Table 1. Approximations to tail probabilities of Z_2

    y      Exact    Approximation (3)   Approximation (4)   Large sample
  −1.05    0.6840        0.4993              0.6886             0.1274
  −0.95    1.2222        0.9086              1.2290             0.3166
  −0.85    2.1223        1.6079              2.1414             0.7288
  −0.65    5.8273        4.5957              5.8684             3.0883
  −0.55    9.1763        7.3915              9.2132             5.6986
   0.35   11.5163*      14.7927*            11.4800*           15.7249*
   0.55    3.0101*       4.0157*             2.9958*            5.6986*
   0.75    0.4468*       0.6524*             0.4448*            1.5567*

* Denotes tail probability taken to the right. Table entries are percentages.
(4), we chose b in equation (1) to be 1 and l to be given by (16). For all values of y considered, approximation (4) gives results very close to the exact tail probabilities. Approximation (3) and the large-sample normal approximation give relatively inaccurate estimates.
We now compare 95% confidence intervals for θ_{x_0} obtained by the methods discussed above, again for a white blood cell count of 50 000. Upper and lower 2.5% percentage points for the distribution of Z_2, and 95% confidence intervals for log θ_{x_0} and θ_{x_0} for each of the techniques, are presented in Table 2. The intervals corresponding to approximations (3) and (4) were computed by numerical inversion of those formulae. The intervals obtained using approximation (4) are very close to the exact intervals, while those obtained using (3) are less accurate but still reasonable. The intervals derived from the large-sample normal approximation are quite inaccurate in comparison with the other intervals, which suggests that larger samples might be needed to obtain high accuracy with this method.

Table 2. 95% confidence intervals for mean survival time of patients with white blood cell count of 50 000

                      Lower     Upper    95% c.i. for log θ_x   95% c.i. for θ_x
  Exact             −0.8192    0.5727    (2.6879, 4.0799)     (14.7008, 59.1371)
  Approximation (3) −0.7689    0.6092    (2.6514, 4.0295)     (14.1741, 56.2324)
  Approximation (4) −0.8188    0.5723    (2.6883, 4.0795)     (14.7071, 59.1143)
  Large sample      −0.6820    0.6820    (2.5786, 3.9427)     (13.1788, 51.5554)

Tierney et al. (1989) consider Bayesian inference for this model based on an improper uniform prior density on θ_1 = log β_0 and θ_2 = β_1. Approximation (4) could be applied to produce approximate marginal posterior tail probabilities in this Bayesian context by choosing b, the prior density, equal to 1 and l equal to the log likelihood.
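The numerical inversion used to produce the intervals of Table 2 can be sketched in the univariate setting, where everything is explicit. The toy example below is ours (not the Feigl-Zelen data): it inverts approximation (5), with b = 1 and l(x) = n(log x − x) so that the exact distribution is Gamma(n + 1, rate n), to obtain an approximate upper 97.5% limit, and compares it with the exact quantile.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma, norm

n = 10
lfun = lambda x: n * (np.log(x) - x)       # b(x) = 1, mode at xhat = 1

def approx_cdf(x):
    # approximation (5) with b = 1, i.e. formula (6)
    r = np.sign(x - 1.0) * np.sqrt(2.0 * (lfun(1.0) - lfun(x)))
    q = n * (1.0 / x - 1.0) / np.sqrt(n)   # l'(x)/{-l''(xhat)}^{1/2}
    return norm.cdf(r) + norm.pdf(r) * (1.0 / r + 1.0 / q)

# upper 97.5% point: solve approx_cdf(x) = 0.975 by numerical inversion,
# bracketing away from the mode x = 1 where 1/r is singular
upper = brentq(lambda x: approx_cdf(x) - 0.975, 1.05, 3.0)
exact_upper = gamma.ppf(0.975, a=n + 1, scale=1.0 / n)
```

The inverted limit agrees with the exact quantile to a few parts in a thousand; inverting Φ(r) alone, by contrast, reproduces the larger discrepancies seen in the approximation (3) rows of Table 2.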
Approximate posterior quantiles for linear functions of β obtained in this way coincide with the approximate conditional confidence limits for those parameters obtained using (4) with our previous choice of b and l. This correspondence is natural because of the connection between Bayesian and conditional inference for location models under the assumption of uniform priors. Tierney et al. consider in particular the construction of an approximate marginal posterior density for the two-year survival probability of patients with a white blood cell count of 50 000. This probability is an increasing function of θ_{x_0}. Consequently, use of (4) to derive approximate posterior quantiles or confidence limits for this probability produces the same results as transforming in the natural way the approximate quantiles or limits derived for θ_{x_0}.

3.2. Noncentral t distribution
Let X_1, ..., X_n be independent and identically distributed observations from a normal N(μ, σ²) population, and let X̄ = n^{−1}ΣX_i and S² = (n − 1)^{−1}Σ(X_i − X̄)² denote the sample mean and sample variance, respectively. Given a value x_0, the quantity T' = n^{1/2}(x_0 − X̄)/S has a noncentral t distribution with n − 1 degrees of freedom and noncentrality parameter n^{1/2}(x_0 − μ)/σ. Computation of tail probabilities for the noncentral t distribution is difficult, since it requires numerical integration of the noncentral t density, which is typically written in integral or infinite series form. In contrast, computation of approximation (4) to tail probabilities is relatively easy.
Without loss of generality, suppose that the normal population has zero mean and unit variance. For each of the four choices of variables (U, V) = (X̄, S), (X̄, log S), (X̄, S^{−1}) and (X̄, S²), we compute approximation (4), taking b in equation (1) to be a constant and l to be the logarithm of the joint density of U and V. Put Y = (x_0 − X̄)/S = n^{−1/2}T'. We compute tail probability approximations for Y based on equation (4) for each of the four choices of variables.
For the variables (X̄, S), we have for n ≥ 3

    pr(Y ≤ y) ≈ Φ(r) + φ(r)( 1/r + {ṽ/(nũ)}[2n(n−1)/{2(n−2) + nyx_0ṽ}]^{1/2} ),   (17)

where

    r = sgn(y − ŷ){nx_0ũ − 2(n−2) log(ṽ/v̂)}^{1/2},   ũ = x_0 − yṽ,
    ṽ = {2(n−1+ny²)}^{−1}[nyx_0 + {(nyx_0)² + 4(n−2)(n−1+ny²)}^{1/2}],

v̂ = {(n−2)/(n−1)}^{1/2} and ŷ = x_0/v̂. For the variables (X̄, log S), the approximation is for n ≥ 2

    pr(Y ≤ y) ≈ Φ(r) + φ(r)( 1/r + (nũe^{ṽ})^{−1}[2n(n−1)/{2(n−1) + ny² − nyũe^{−ṽ}}]^{1/2} ),   (18)

where

    r = sgn(y − x_0){nx_0ũ − 2(n−1)ṽ}^{1/2},   ũ = x_0 − ye^{ṽ},
    e^{ṽ} = {2(n−1+ny²)}^{−1}[nyx_0 + {(nyx_0)² + 4(n−1)(n−1+ny²)}^{1/2}].

For the variables (X̄, S^{−1}), we have for n ≥ 2

    pr(Y ≤ y) ≈ Φ(r) + φ(r)( 1/r + (ṽ/ũ)[2ṽ/{(n−1)(2ṽ + yx_0)}]^{1/2} ),   (19)

where

    r = sgn(y − ŷ){nx_0ũ + 2n log(ṽ/v̂)}^{1/2},   ũ = x_0 − y/ṽ,
    ṽ = (2n)^{−1}[−nyx_0 + {(nyx_0)² + 4n(n−1+ny²)}^{1/2}],

v̂ = {(n−1)/n}^{1/2} and ŷ = x_0v̂. Finally, for the variables (X̄, S²) we have for n ≥ 4 the approximation

    pr(Y ≤ y) ≈ Φ(r) + φ(r)( 1/r + {2(n−1)ṽ/(nũ)}[n/{2(n−3)(2(n−3) + nyx_0ṽ^{1/2})}]^{1/2} ),   (20)

where

    r = sgn(y − ŷ){nx_0ũ − (n−3) log(ṽ/v̂)}^{1/2},   ũ = x_0 − yṽ^{1/2},
    ṽ^{1/2} = {2(n−1+ny²)}^{−1}[nyx_0 + {(nyx_0)² + 4(n−3)(n−1+ny²)}^{1/2}],

v̂ = (n−3)/(n−1) and ŷ = x_0/v̂^{1/2}.
The results of a numerical study of tail probabilities for Y with n = 5, x_0 = 0, corresponding to a central t distribution, and n = 5, x_0 = 3.29053, corresponding to a noncentral t distribution, are given in Table 3. Approximations (17)-(20) all appear to perform reasonably well; in each case substantial improvement is gained over approximation (3).
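Formula (18) is simple enough to check directly against library routines for the noncentral t distribution. The sketch below (the function name is ours) implements (18) and compares it with scipy.stats.nct, which plays the role of the "Exact" column of Table 3.

```python
import numpy as np
from scipy.stats import nct, norm

def approx18(y, x0, n):
    """Approximation (18), variables (Xbar, log S), for pr(Y <= y), Y = (x0 - Xbar)/S."""
    s = (n * y * x0 + np.sqrt((n * y * x0) ** 2
         + 4.0 * (n - 1.0) * (n - 1.0 + n * y * y))) / (2.0 * (n - 1.0 + n * y * y))
    v = np.log(s)                      # constrained maximizer of log S
    u = x0 - y * s                     # constrained maximizer of Xbar
    r = np.sign(y - x0) * np.sqrt(n * x0 * u - 2.0 * (n - 1.0) * v)
    corr = np.sqrt(2.0 * n * (n - 1.0)
                   / (2.0 * (n - 1.0) + n * y * y - n * y * u / s)) / (n * u * s)
    return norm.cdf(r) + norm.pdf(r) * (1.0 / r + corr)

n = 5
# central case (x0 = 0) and noncentral case (x0 = 3.29053), as in Table 3;
# exact values use T' = sqrt(n)*Y ~ noncentral t(n-1, sqrt(n)*x0)
for x0, y in [(0.0, -1.0), (3.29053, 2.0)]:
    exact = nct.cdf(np.sqrt(n) * y, df=n - 1, nc=np.sqrt(n) * x0)
    print(x0, y, approx18(y, x0, n), exact)
```

At these two points the computed values reproduce the Table 3 entries for (18) (4.496% and 4.872%) and lie within about 0.05 of a percentage point of the exact probabilities.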
Table 3. Tail probability approximations related to the noncentral t distribution

                 (X̄, S)           (X̄, log S)       (X̄, S⁻¹)         (X̄, S²)
   y    Exact    (3)    (17)      (3)    (18)      (3)    (19)      (3)    (20)

x_0 = 0
 −2.1   0.467   0.887   0.465    0.309   0.477    0.110   0.447    2.644   0.203
 −1.7   0.954   1.611   0.954    0.670   0.971    0.285   0.917    4.018   0.570
 −1.3   2.191   3.247   2.197    1.654   2.221    0.859   2.115    6.591   1.620
 −1.0   4.451   5.941   4.467    3.585   4.496    2.203   4.322   10.142   3.695
 −0.6  12.541  14.553  12.577   11.140  12.606    8.644  12.315   19.433  11.601

x_0 = 3.29053
  1.5   0.451   0.357   0.444    0.722   0.454    1.311   0.475    0.151   0.485
  1.6   0.836   0.665   0.822    1.311   0.841    2.316   0.876    0.290   0.898
  1.8   2.272   1.832   2.237    3.421   2.282    5.727   2.364    0.841   2.436
  2.0   4.855   3.966   4.787    7.031   4.872   11.181   5.025    1.918   5.197
  2.3  11.057   9.203  10.920   15.178  11.083   22.493  11.366    4.789  11.799
  6.5   9.964  13.874  10.128    6.381  10.003    2.746   9.746   27.843   7.594
  8.0   4.929   7.608   5.019    2.897   4.959    1.029   4.809   18.359   3.141
  9.7   2.471   4.237   2.518    1.347   2.492    0.399   2.432   12.208   1.181
 12.4   0.990   1.963   1.006    0.494   1.001    0.115   0.963    7.130   0.190

Table entries are percentages. Values for y ≥ 6.5 are for the right-hand tail. In each pair of columns, (3) denotes approximation (3) computed with the indicated choice of variables.

The choice of variables seems important for the accuracy of the approximation. Approximations (17) and (18), involving the variables (X̄, S) and (X̄, log S), are clearly best, while approximation (20), involving (X̄, S²), is the most inaccurate. An interesting point is that for x_0 = 0 approximation (17) is most accurate, while for the case x_0 = 3.29053 approximation (18) is preferable.
Tierney et al. (1989) consider the problem of estimating the proportion of a normal N(μ, σ²) population that falls below a point x_0. They study the estimator P̂ = Φ{(x_0 − X̄)/S} of this quantity. Since Φ(·) is a monotone function, approximation (4) to the tail probability pr(P̂ ≤ p) for each of the four choices of variables considered above can be obtained by replacing y by Φ^{−1}(p) in formulae (17)-(20).

4. APPLICATIONS TO EXPONENTIAL FAMILIES
4.1.
Marginal tail probabilities for a function of the sufficient statistic
Suppose T_1, ..., T_n is a sample of size n from the exponential family having density

    f_T(t; τ) = exp{−t^i τ_i − β(τ) + γ(t)}   (t = (t^1, ..., t^p)).

Let β^i(τ) = ∂β(τ)/∂τ_i and β^{ij}(τ) = ∂²β(τ)/∂τ_i∂τ_j (i, j = 1, ..., p). Put B(τ) = {β^{ij}(τ)} and B(τ)^{−1} = {β_{ij}(τ)}, so that B(τ) is a p×p matrix and |B(τ)| is of order O(1). The density of the sufficient statistic X = n^{−1}ΣT_i satisfies

    f_X(x; τ) = c|B(τ̂)|^{−1/2} exp[n{β(τ̂) − β(τ) + x^i(τ̂_i − τ_i)}]{1 + O(n^{−3/2})},   (21)

provided x − E(X; τ) is O(n^{−1/2}), where τ̂ is given by x^i = −β^i(τ̂) (i = 1, ..., p) and c is a normalizing constant such that the approximation on the right-hand side of (21) integrates to 1 + O(n^{−3/2}). Here τ̂ is the maximum likelihood estimator of τ based on the observation X = x. Formula (21) was derived by Barndorff-Nielsen & Cox (1979); see also Daniels (1958), Durbin (1980) and Reid (1988).
Marginal tail probabilities for a real-valued function Y = g(X) can be approximated by applying (4) to (21). It is convenient to make the choice

    l(x) = n{β(τ̂) − β(τ) + x^i(τ̂_i − τ_i)},   b(x) = |B(τ̂)|^{−1/2},

where τ̂ = τ̂(x). Note that l(x) attains its maximum value at the point x̂ = (x̂^1, ..., x̂^p) given by x̂^i = −β^i(τ). For fixed y, suppose x̃ = x̂(y) maximizes l(x) subject to the constraint g(x) = y, and let τ̃ = τ̃(y) satisfy x̃^i = −β^i(τ̃) (i = 1, ..., p). Observe that τ̃(ŷ) = τ for ŷ = g(x̂). Then approximation (4) to the marginal tail probability pr(Y ≤ y) is

    pr(Y ≤ y) = Φ(r) + φ(r)[1/r + {|B(τ)|/|B(τ̃)|}^{1/2} g_j(x̃) D(y)/{n(τ̃_j − τ_j)}] + O(n^{−3/2}),

where r = sgn(y − ŷ)[2n{β(τ) − β(τ̃) − x̃^i(τ̃_i − τ_i)}]^{1/2}, j is any index for which g_j(x̃) is nonzero, k is any index for which g_k(x̃) does not vanish and

    J_{ij}(y) = n[β_{ij}(τ̃) + {(τ̃_k − τ_k)/g_k(x̃)} g_{ij}(x̃)].   (22)

For the case of the coordinate function g(x) = x^p, it is convenient to partition τ into (λ, ψ), where ψ = τ_p and λ = (λ_1, ..., λ_{p−1}) has λ_a = τ_a (a = 1, ..., p−1). Then τ̃(x^p) = (λ, ψ̃), with ψ̃ given by x^p = −β^p(λ, ψ̃), and (22) reduces to

    pr(X^p ≤ x^p) = Φ(r) + φ(r)( 1/r + [n^{1/2}(ψ̃ − ψ){β^{pp}(λ, ψ̃)}^{1/2}]^{−1} ) + O(n^{−3/2}),   (23)

where r = sgn(x^p − x̂^p)[2n{β(λ, ψ) − β(λ, ψ̃) − x^p(ψ̃ − ψ)}]^{1/2}.

4.2. Conditional inference for a component of the natural parameter
Now suppose τ = (λ, ψ) and ψ is the parameter of interest, with λ being a nuisance parameter. Unfortunately, the marginal distribution of X^p is not particularly useful for inference about ψ, since that distribution depends on the nuisance parameter; indeed, approximation (23) involves λ. However, the conditional distribution of X^p given Y = (Y^1, ..., Y^{p−1}) = (X^1, ..., X^{p−1}) depends on τ only through ψ. Barndorff-Nielsen & Cox (1979) have shown that the conditional density of X^p given Y satisfies

    f_{X^p|Y}(x^p | y; ψ) = c{|B_1(λ*, ψ)|/|B(λ̂, ψ̂)|}^{1/2} exp[n{β(λ̂, ψ̂) − β(λ*, ψ) + y^a(λ̂_a − λ*_a) + x^p(ψ̂ − ψ)}] × {1 + O(n^{−3/2})},   (24)

provided x^p − x̃^p is O(n^{−1/2}), where (λ̂, ψ̂) = τ̂, where λ* satisfies y^a = −β^a(λ*, ψ) (a = 1, ..., p−1), B_1(λ, ψ) is the (p−1)×(p−1) submatrix of B(λ, ψ) corresponding to λ, and c is a normalizing constant such that the approximation on the right-hand side of (24) integrates to 1 + O(n^{−3/2}). Here λ* is the constrained maximum likelihood estimator of λ under the fixed value of ψ having observed X = x, and x̃^p is defined below.
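Approximation (23) can be checked in the simplest case p = 1, where there is no nuisance parameter and ψ = τ. As an illustration (our sketch; function names are ours), take T_i exponential with rate τ, so that under the convention used here β(τ) = −log τ, the constrained estimate is ψ̃ = 1/x, and the sample mean X̄ has an exact Gamma(n, rate nτ) distribution to compare against.

```python
import numpy as np
from scipy.stats import gamma, norm

n, tau = 10, 1.0
beta = lambda t: -np.log(t)     # exponential family: f(t) = exp{-t*tau - beta(tau)}

def approx23(x):
    psi_t = 1.0 / x             # solves x = -beta'(psi_t)
    bpp = 1.0 / psi_t ** 2      # beta''(psi_t)
    r = np.sign(x - 1.0 / tau) * np.sqrt(
        2.0 * n * (beta(tau) - beta(psi_t) - x * (psi_t - tau)))
    q = np.sqrt(n) * (psi_t - tau) * np.sqrt(bpp)
    return norm.cdf(r) + norm.pdf(r) * (1.0 / r + 1.0 / q)

x = 0.7
approx = approx23(x)
exact = gamma.cdf(x, a=n, scale=1.0 / (n * tau))   # Xbar ~ Gamma(n, rate n*tau)
```

With n = 10 and x = 0.7 the two values agree to several decimal places, reflecting the near-exactness of the renormalized saddlepoint approximation (21) for the gamma family, while Φ(r) alone is off by more than two percentage points.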
Approximations to conditional tail probabilities for X^p can be obtained by applying (5) to (24). In this case it is natural to choose

    l(x^p) = n{β(λ̂, ψ̂) − β(λ*, ψ) + y^a(λ̂_a − λ*_a) + x^p(ψ̂ − ψ)},   b(x^p) = |B(λ̂, ψ̂)|^{−1/2}.

Since y^a = x^a (a = 1, ..., p−1) are taken as fixed and x^i = −β^i(λ̂, ψ̂) (i = 1, ..., p), it is possible to regard (λ̂, ψ̂) as a function of x^p alone. Straightforward calculations give l^{(1)}(x^p) = n(ψ̂ − ψ) and −l^{(2)}(x^p) = nβ_{pp}(λ̂, ψ̂), where {β_{ij}(λ, ψ)} = B(λ, ψ)^{−1}. Thus, l(x^p) is maximized at x̃^p = −β^p(λ*, ψ), and since B(λ, ψ) is positive definite for all (λ, ψ), it follows that x̃^p is a decreasing function of ψ. Formula (5) yields the approximation

    pr(X^p ≤ x^p | Y = y) = Φ(r) + φ(r)[1/r + {|B_1(λ*, ψ)|/|B(λ̂, ψ̂)|}^{1/2}/{n^{1/2}(ψ̂ − ψ)}] + O(n^{−3/2}),   (25)

where

    r = sgn(x^p − x̃^p)[2n{β(λ*, ψ) − β(λ̂, ψ̂) − y^a(λ̂_a − λ*_a) − x^p(ψ̂ − ψ)}]^{1/2}.   (26)

Having observed X = x, an exact upper 1 − α conditional confidence limit for the parameter of interest can be computed as the value of ψ such that pr(X^p ≤ x^p | Y = y) = 1 − α. Similarly, an approximate conditional limit can be computed as the value of ψ for which the right-hand side of (25) equals 1 − α. This approximate limit differs from the exact limit by terms of order O_p(n^{−2}), and the corresponding conditional confidence interval has coverage error of order O(n^{−3/2}). In contrast, approximation (3) yields pr(X^p ≤ x^p | Y = y) ≈ Φ(r). The approximate confidence limit calculated as the value of ψ for which Φ(r) = 1 − α differs from the exact limit by terms of order O_p(n^{−1}), and the resulting interval has coverage error of order O(n^{−1}).
This approach to constructing approximate confidence limits is closely related to a method given by Barndorff-Nielsen (1986). Since sgn(x^p − x̃^p) = sgn(ψ − ψ̂), it follows that r defined at (26) is simply the signed root of the likelihood ratio statistic for ψ.
Barndorff-Nielsen shows that the marginal distribution of r − r^{−1} log K is standard normal to error of order O(n^{−3/2}), where K is a variable admitting the expansion

    K = 1 + Q_1(λ*) r + Q_2(λ*) r² + O_p(n^{−3/2}),

and Q_1 = Q_1(λ*) and Q_2 = Q_2(λ*) are O_p(n^{−1/2}) and O_p(n^{−1}), respectively. An approximate upper 1 − α confidence limit for the parameter of interest can be calculated as the value of ψ for which Φ(r − r^{−1} log K) = 1 − α. Note that, to error of order O_p(n^{−3/2}),

    Φ(r − r^{−1} log K) = Φ{r − Q_1 − (Q_2 − ½Q_1²) r}.

For the present problem, it follows from Barndorff-Nielsen's formula (3.11) that

    K = [r/{n^{1/2}(ψ − ψ̂)}]{|B_1(λ*, ψ)|/|B(λ̂, ψ̂)|}^{1/2}.

The approximate upper 1 − α limits obtained through Φ(r − r^{−1} log K) and (25) therefore differ by terms of order O_p(n^{−2}), and the corresponding conditional confidence intervals both have coverage error of order O(n^{−3/2}). The primary advantage in considering (25) is that it shows these limits to have conditional validity. Approximation (25) is also given by Skovgaard (1987) and is discussed by Davison (1988).
Using (24), Barndorff-Nielsen & Cox (1979, formula (6.1)) have approximated the conditional log likelihood function for ψ based on the observation X = x by

    l̃(ψ; x^p | y) = ½ log|B_1(λ*, ψ)| − n{β(λ*, ψ) + y^a λ*_a + x^p ψ}.   (27)

Let ψ̃ be the value of ψ for which (27) is maximized, and put λ̃ = λ*(ψ̃). Since y^a = x^a (a = 1, ..., p−1) are taken as fixed, it is possible to regard (λ̃, ψ̃) as a function of x^p alone. The difference between the approximate conditional maximum likelihood estimator ψ̃ and the unconditional maximum likelihood estimator ψ̂ is of order O_p(n^{−1}); see Barndorff-Nielsen & Cox (1979, formula (6.2)).
By a variation of the argument given by Barndorff-Nielsen & Cox that leads to their formulae (3.15) and (4.8), it follows that

    f_{X^p|Y}(x^p | y; ψ) = c|B_1(λ̃, ψ̃)|^{−1/2} exp[l̃(ψ; x^p | y) − l̃(ψ̃; x^p | y)] × {1 + O(n^{−3/2})},   (28)

where c is a normalizing constant such that the approximation on the right-hand side of (28) integrates to 1 + O(n^{−3/2}). As for (24), approximations to conditional tail probabilities for X^p can be obtained by applying (5) to (28). In this case, the choice

    l(x^p) = −l̃(ψ̃; x^p | y) − n x^p ψ,   b(x^p) = |B_1(λ̃, ψ̃)|^{−1/2}

is convenient. Since l(x^p) = −l̃(ψ̃; x^p | y) − n x^p ψ, we have l^{(1)}(x^p) = n(ψ̃ − ψ). Hence the maximizing point x̃^p of l(x^p) is the value of x^p whose corresponding ψ̃ equals ψ, and moreover −l^{(2)}(x^p) = nβ_{pp}(λ̃, ψ̃){1 + O(n^{−1})}. Formula (5) yields the approximation

    pr(X^p ≤ x^p | Y = y) = Φ(r) + φ(r)[1/r + {|B_1(λ*, ψ)| β_{pp}(λ*, ψ)/|B_1(λ̃, ψ̃)|}^{1/2}/{n^{1/2}(ψ̃ − ψ)}] + O(n^{−3/2}),   (29)

where

    r = sgn(x^p − x̃^p)[2{l̃(ψ̃; x^p | y) − l̃(ψ; x^p | y)}]^{1/2}.   (30)

Since the ½ log|B_1(λ, ψ)| term in l̃(ψ; x^p | y) is O(1), it can be shown that

    {|B_1(λ*, ψ)| β_{pp}(λ*, ψ)/|B_1(λ̃, ψ̃)|}^{1/2} = {β_{pp}(λ̃, ψ̃)}^{1/2}{1 + O(n^{−1})}.

Consequently, an alternative approximation to (29) is

    pr(X^p ≤ x^p | Y = y) = Φ(r) + φ(r)[1/r + {β_{pp}(λ̃, ψ̃)}^{1/2}/{n^{1/2}(ψ̃ − ψ)}] + O(n^{−3/2}).   (31)

In an as yet unpublished technical report, D. A. S. Fraser and N. Reid have derived approximation (31) using different techniques.
Note that r defined at (30) is the signed root of the approximate conditional likelihood ratio statistic for ψ having observed X = x. As in the case of (25), an approximate conditional upper 1 − α confidence limit for ψ can be computed as the value of ψ for which the right-hand side of (29) equals 1 − α. These approximate limits have the same asymptotic properties as those derived from (25), and they improve upon the usual limits derived from the uncorrected standard normal approximation Φ(r).
For more general parametric models, corrections that improve the accuracy of the standard normal approximation to distributions of signed roots of likelihood ratio statistics can be derived from formula (4). Welch & Peers (1963) and Peers (1965) have described how a prior density function for a vector parameter should be chosen so that the posterior quantiles for a component of the vector are approximate confidence limits in the repeated sampling sense, having coverage error of order O(n^{−1}). Using such prior densities, modifications to signed roots of likelihood ratio statistics can be derived by applying formula (4) to the joint posterior density of the vector parameter. We develop these corrections in an as yet unpublished paper and relate them to modifications proposed by Barndorff-Nielsen (1990a, b, c). Barndorff-Nielsen's modifications arise from integration of his formula for approximating the conditional density of the maximum likelihood estimator given an ancillary statistic.

ACKNOWLEDGEMENT
We are grateful to Luke Tierney for helpful discussions.

REFERENCES
BARNDORFF-NIELSEN, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73, 307-22.
BARNDORFF-NIELSEN, O. E. (1990a). Discussion of paper by D. A. Sprott. Can. J. Statist. 18, 12-4.
BARNDORFF-NIELSEN, O. E. (1990b). A note on the standardized signed log likelihood ratio. Scand. J. Statist.
17, 157-60.
BARNDORFF-NIELSEN, O. E. (1990c). Approximate interval probabilities. J. R. Statist. Soc. B 52, 485-96.
BARNDORFF-NIELSEN, O. E. & COX, D. R. (1979). Edgeworth and saddle-point approximations with statistical applications (with discussion). J. R. Statist. Soc. B 41, 279-312.
DANIELS, H. E. (1958). Discussion of paper by D. R. Cox. J. R. Statist. Soc. B 20, 236-8.
DAVISON, A. C. (1988). Approximate conditional inference in generalized linear models. J. R. Statist. Soc. B 50, 445-61.
DICICCIO, T. J., FIELD, C. A. & FRASER, D. A. S. (1990). Approximations of marginal tail probabilities and inference for scalar parameters. Biometrika 77, 77-95.
DURBIN, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-33.
FEIGL, P. & ZELEN, M. (1965). Estimation of exponential survival probabilities with concomitant information. Biometrics 21, 826-37.
LAWLESS, J. F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.
LEONARD, T., HSU, J. S. J. & TSUI, K.-W. (1989). Bayesian marginal inference. J. Am. Statist. Assoc. 84, 1051-8.
PEERS, H. W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. R. Statist. Soc. B 27, 9-16.
REID, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 3, 213-38.
SKOVGAARD, I. M. (1987). Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875-87.
TIERNEY, L., KASS, R. E. & KADANE, J. B. (1989). Approximate marginal densities of nonlinear functions. Biometrika 76, 425-33. Amendment (1991), 78, 233-4.
WELCH, B. L. & PEERS, H. W. (1963). On formulae for confidence points based on intervals of weighted likelihoods. J. R. Statist. Soc. B 25, 318-29.

[Received April 1990. Revised March 1991]