
Biometrika (1991), 78, 4, pp. 891-902
Printed in Great Britain

Approximations of marginal tail probabilities for a class of smooth functions with applications to Bayesian and conditional inference

BY THOMAS J. DiCICCIO AND MICHAEL A. MARTIN
Department of Statistics, Stanford University, Stanford, California 94305, U.S.A.

SUMMARY

This paper presents an asymptotic approximation of marginal tail probabilities for a real-valued function of a random vector, where the function has continuous gradient that does not vanish at the mode of the joint density of the random vector. This approximation has error O(n^{-3/2}) and improves upon a related standard normal approximation which has error O(n^{-1/2}). Derivation involves the application of a tail probability formula given by DiCiccio, Field & Fraser (1990) to an approximation of a marginal density derived by Tierney, Kass & Kadane (1989). The approximation can be applied for Bayesian and conditional inference as well as for approximating sampling distributions, and the accuracy of the approximation is illustrated through several numerical examples related to such applications. In the context of conditional inference, we develop refinements of the standard normal approximation to the distribution of two different signed root likelihood ratio statistics for a component of the natural parameter in exponential families.

Some key words: Asymptotic expansion; Conditional likelihood; Confidence limit; Exponential family; Exponential regression model; Marginal posterior distribution function; Natural parameter; Normal approximation; Signed root likelihood ratio statistic.

1. INTRODUCTION

Consider a continuous random vector X = (X^1, ..., X^p) having probability density function of the form

    f_X(x) = c b(x) exp{l(x)},   x = (x^1, ..., x^p).   (1)

Suppose that the function l attains its maximum value at x̂ = (x̂^1, ..., x̂^p) and that X − x̂ is O_p(n^{-1/2}) as some parameter n, usually sample size, increases indefinitely. For each fixed x, assume that l(x) and its partial derivatives are O(n) and that b(x) is O(1). Now consider a real-valued variable Y = g(X), where the function g has continuous gradient that is nonzero at x̂. In this paper, we present an accurate approximation for marginal tail probabilities of Y that is easy to compute and does not involve numerical integration in high dimensions.

To calculate an initial approximation of the marginal tail probability pr(Y ≤ y), let x̃ = x̃(y) be the value of x that maximizes l(x) subject to the constraint g(x) = y. Moreover, let ŷ = g(x̂), so that Y − ŷ is O_p(n^{-1/2}) and x̃(ŷ) = x̂. Consider the function

    r(y) = sgn(y − ŷ)(2[l(x̂) − l{x̃(y)}])^{1/2},   (2)

which is assumed to be monotonically increasing.
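When l and g are available only numerically, the constrained maximizer x̃(y) and the signed root (2) can be computed directly. The following sketch is our illustration, not an example from the paper: the quadratic l and the function g below are hypothetical choices, and any equality-constrained optimizer could be substituted.

```python
import numpy as np
from scipy.optimize import minimize

n = 20  # the asymptotic parameter; here it simply scales the log density

def l(x):
    # hypothetical log density kernel l(x), maximized at x_hat = (0, 0)
    return -0.5 * n * (x[0] ** 2 + x[1] ** 2)

def g(x):
    # hypothetical smooth function with nonvanishing gradient at the mode
    return x[0] + x[1] ** 2

x_hat = np.zeros(2)
y_hat = g(x_hat)

def x_tilde(y):
    # constrained maximizer of l(x) subject to g(x) = y
    res = minimize(lambda x: -l(x), x_hat,
                   constraints=[{"type": "eq", "fun": lambda x: g(x) - y}])
    return res.x

def r(y):
    # signed root (2)
    return np.sign(y - y_hat) * np.sqrt(2.0 * (l(x_hat) - l(x_tilde(y))))

print(r(0.3), r(-0.3))
```

Only l, g and the mode x̂ are needed; the same routine is reused for the examples of Sections 3 and 4 below.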

Approximations to the distribution function of Y can be based on normal approximations to the distribution of R = r(Y). In particular, provided y − ŷ is O(n^{-1/2}),

    pr(Y ≤ y) = pr(R ≤ r) = Φ(r) + O(n^{-1/2}),   (3)

where r = r(y) and Φ is the standard normal distribution function.

The standard normal approximation to the distribution of R can be improved. Additional notation is necessary to formulate a more accurate approximation. Let l_i(x) = ∂l(x)/∂x^i, l_{ij}(x) = ∂²l(x)/∂x^i∂x^j, g_i(x) = ∂g(x)/∂x^i and g_{ij}(x) = ∂²g(x)/∂x^i∂x^j, etc. (i, j = 1, ..., p). Put

    J_{ij}(y) = −l_{ij}{x̃(y)} + [l_k{x̃(y)}/g_k{x̃(y)}] g_{ij}{x̃(y)},

where k is any index such that g_k{x̃(y)} does not vanish. Such an index k always exists by virtue of the assumptions about g. Define J(y) = {J_{ij}(y)} and J(y)^{-1} = {J^{ij}(y)}. Thus J(y) is a p×p matrix, and Ĵ = J(ŷ) = {−l_{ij}(x̂)}. Finally, let

    Q(y) = J^{ij}(y) g_i{x̃(y)} g_j{x̃(y)},   D(y) = {Q(y)|J(y)|/|Ĵ|}^{-1/2}.

In this expression for Q(y) and in subsequent expressions, the summation convention is used. The improved approximation is

    pr(Y ≤ y) = Φ(r) + φ(r)[1/r + D(y) b{x̃(y)} g_j{x̃(y)} / {b(x̂) l_j{x̃(y)}}] + O(n^{-3/2}),   (4)

where r = r(y), φ is the standard normal probability density function and j is any index such that g_j{x̃(y)} is nonzero. For the univariate case p = 1, when g is the identity function, approximation (4) reduces to

    pr(X ≤ x) = Φ(r) + φ(r)[1/r + {−l^{(2)}(x̂)}^{1/2} b(x) / {l^{(1)}(x) b(x̂)}] + O(n^{-3/2}),   (5)

where r = r(x) = sgn(x − x̂)[2{l(x̂) − l(x)}]^{1/2} and l^{(k)}(x) = d^k l(x)/dx^k (k = 1, 2).

Formula (4) is especially useful in Bayesian situations, where it provides accurate approximations to marginal posterior distribution functions. For such applications, it is convenient to take l to be the log likelihood function and b the prior density. An example of this type is considered in §3.

In §2, we present a derivation of (4); we apply a tail probability approximation given by DiCiccio et al. (1990) to the approximation of a marginal density developed by Tierney et al. (1989). Section 3 contains several numerical examples which illustrate the accuracy of approximation (4) in a variety of situations. Applications of (4) to exponential families are discussed in §4. In particular, approximations to marginal tail probabilities for scalar functions of the sufficient statistic are given. Approximate conditional inference for the natural parameters of the family is also examined.

2. DERIVATION OF TAIL PROBABILITY APPROXIMATIONS

DiCiccio et al. (1990) have considered tail probability approximations for (1) in the univariate case p = 1 with b(x) = 1. They showed that, provided x − x̂ is O(n^{-1/2}),

    pr(X ≤ x) = Φ(r) + φ(r)[1/r + {−l^{(2)}(x̂)}^{1/2}/l^{(1)}(x)] + O(n^{-3/2}),   (6)

where r is defined as for (5).
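For a concrete check of (6), take l(x) = n(log x − x) and b(x) = 1, so that exp{l(x)} is proportional to a Gamma(n+1, n) density and exact tail probabilities are available for comparison. The sketch below is our own numerical verification, not an example from the paper; it shows the improvement of (6) over the plain normal approximation Φ(r).

```python
import numpy as np
from scipy.stats import norm, gamma

n = 10.0
x_hat = 1.0                       # mode of l(x) = n*(log x - x)

def l(x):  return n * (np.log(x) - x)
def l1(x): return n * (1.0 / x - 1.0)   # l'(x)
l2_hat = -n                              # l''(x) = -n/x^2, evaluated at the mode

for x in (0.7, 1.4):
    r = np.sign(x - x_hat) * np.sqrt(2.0 * (l(x_hat) - l(x)))
    approx6 = norm.cdf(r) + norm.pdf(r) * (1.0 / r + np.sqrt(-l2_hat) / l1(x))
    exact = gamma.cdf(x, a=n + 1.0, scale=1.0 / n)   # Gamma(n+1, rate n)
    print(x, exact, norm.cdf(r), approx6)
```

With n = 10, the corrected values agree with the exact gamma probabilities to three or four decimals, while Φ(r) alone errs in the second decimal.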

This approximation applies even if the density of X is not completely known except for a normalizing constant. In particular, it is valid if f_X(x) = c exp{l(x)}{1 + O(n^{-3/2})} when x − x̂ is O(n^{-1/2}), where c is a normalizing constant such that c exp{l(x)} integrates to 1 + O(n^{-3/2}).

Approximation (4) can be derived by applying (6) to an approximation of an appropriate marginal density. Tierney et al. (1989, 1991) have given an asymptotic approximation to the marginal density of Y = g(X) for p ≥ 1. The renormalized version of their approximation to the true density f_Y(y) is

    f*_Y(y) = c D(y)[b{x̃(y)}/b(x̂)] exp[l{x̃(y)} − l(x̂)],   (7)

where c is a normalizing constant such that f*_Y(y) integrates to 1 + O(n^{-3/2}). Provided y − ŷ is O(n^{-1/2}), this renormalized approximation has relative error of order n^{-3/2}; that is, f_Y(y) = f*_Y(y){1 + O(n^{-3/2})}. Leonard, Hsu & Tsui (1989) also discuss the saddlepoint accuracy of (7).

Now consider a change of variable W = h(Y), where the function h(y) is chosen to satisfy

    dh(y)/dy = n^{-1/2} D(y) b{x̃(y)}.

Then h(y) is a monotonically increasing function. The Tierney et al. approximation to the density of W is f*_W(w) ∝ exp{T(w)}, where T{h(y)} = l{x̃(y)}. Note that T(w) is maximized at ŵ = h(ŷ) and that T(ŵ) = l(x̂). Application of (6) to this approximate marginal density of W yields

    pr(W ≤ w) = Φ(r) + φ(r)[1/r + {−T^{(2)}(ŵ)}^{1/2}/T^{(1)}(w)] + O(n^{-3/2}),   (8)

where w = h(y), r = sgn(w − ŵ)[2{T(ŵ) − T(w)}]^{1/2} and T^{(k)}(w) = d^k T(w)/dw^k (k = 1, 2). Explicit knowledge of the function h(y) is not required to calculate approximation (8). Since w = h(y) in (8), it follows that

    r = sgn{h(y) − h(ŷ)}(2[T{h(ŷ)} − T{h(y)}])^{1/2} = sgn(y − ŷ)(2[l(x̂) − l{x̃(y)}])^{1/2},   (9)

which coincides with (2). To find an expression for T^{(1)}(w) = T^{(1)}{h(y)} in (8), note that differentiation of T{h(y)} = l{x̃(y)} with respect to y yields

    T^{(1)}{h(y)} h^{(1)}(y) = l_i{x̃(y)} x̃^i_{(1)}(y),   (10)

where h^{(1)}(y) = dh(y)/dy and x̃^i_{(1)}(y) = dx̃^i(y)/dy (i = 1, ..., p). A simple formula for the right-hand side of (10) is available by a Lagrange multiplier argument for maximizing l(x) subject to the constraint g(x) = y. By using such an argument, it may be shown that

    g_i{x̃(y)} x̃^i_{(1)}(y) = 1,   l_i{x̃(y)} = [l_j{x̃(y)}/g_j{x̃(y)}] g_i{x̃(y)}   (i = 1, ..., p),   (11)

for any index j having g_j{x̃(y)} ≠ 0. At least one such index always exists by assumption. Hence

    T^{(1)}{h(y)} = n^{1/2} l_j{x̃(y)} / [g_j{x̃(y)} D(y) b{x̃(y)}],   (12)

where j is any index for which g_j{x̃(y)} is nonzero. To find an expression for −T^{(2)}(ŵ) in (8), note that differentiation of (10) with respect to y, followed by evaluation at y = ŷ, where T^{(1)}(ŵ) = 0 and l_i(x̂) = 0, yields

    T^{(2)}(ŵ){h^{(1)}(ŷ)}² = −J_{ij}(ŷ) x̃^i_{(1)}(ŷ) x̃^j_{(1)}(ŷ).   (13)

It follows from differentiation of (11) with respect to y that

    x̃^i_{(1)}(y) = J^{ij}(y) g_j{x̃(y)}/Q(y)   (i = 1, ..., p).   (14)

Substitution of (14) into (13) produces

    {−T^{(2)}(ŵ)}^{1/2} = n^{1/2}{b(x̂)}^{-1},   (15)

since J_{ij}(ŷ)x̃^i_{(1)}(ŷ)x̃^j_{(1)}(ŷ) = 1/Q(ŷ) and Q(ŷ)^{1/2}D(ŷ) = 1. Finally, by substitution of (9), (12) and (15) into (8), we obtain approximation (4).

A desirable feature of (4) is its equivariance under invertible transformations of Y. For example, if Z = γ(X) is related to Y = g(X) by Y = ζ(Z), where ζ is a real-valued, differentiable and increasing transformation, then the approximation to pr(Z ≤ z) obtained by applying (4) to Z = γ(X) directly coincides with the approximation to pr{Y ≤ ζ(z)} obtained by applying (4) to Y. Similarly, if ζ is decreasing, then the approximation to pr(Z ≤ z) coincides with that to 1 − pr{Y ≤ ζ(z)}. Note, however, that (4) is not invariant under nonlinear transformations of the joint density (1). We discuss this issue further in §3.

In the case where b(x) = 1 and g is a coordinate function, say g(x) = x^1, approximation (4) reduces to

    pr(X^1 ≤ x^1) = Φ(r) + φ(r)[1/r + D(x^1)/l_1{x̃(x^1)}] + O(n^{-3/2}),

where r = r(x^1), D(x^1) = {J^{11}(x^1)|J(x^1)|/|Ĵ|}^{-1/2} and the components of J(x^1) have the particularly simple form J_{ij}(x^1) = −l_{ij}{x̃(x^1)}. This formula was derived by DiCiccio et al. (1990).

The conditions imposed on g place moderate limitations on the types of statistics for which tail probabilities may be approximated using (4). Leonard et al. (1989) present examples in which the Tierney et al. (1989) approximation for marginal densities produces inaccurate results. Approximation (4) also performs poorly in these instances. The examples of Leonard et al. focus on situations where the function g is many-to-one. For instance, if Y = g(X) = (X^1)² + ... + (X^p)² and x̂ = (0, ..., 0), then approximation (4) cannot be applied, since the gradient of g vanishes at x̂. On the other hand, if x̂ is close to zero, then although approximation (4) is formally applicable, it can be expected to yield poor results in small to moderately sized samples.

An alternative approximation to (4) can be derived by applying (6) directly to the density (7) written as f*_Y(y) ∝ exp{l*(y)}, where

    l*(y) = l{x̃(y)} + log D(y) + log b{x̃(y)}.

In general, a closed-form expression for x̃(y) is unavailable, and hence D(y) cannot be written explicitly. Since l*(y) depends on D(y), the maximizing point and derivatives of l*(y) required for application of (6) can be difficult to calculate. Numerical methods are available, however, that facilitate this application of (6).

One drawback of (4) is that it can yield approximations which exceed one or are negative. Such problems can be avoided by using an alternative approximation. It is easily shown by Taylor expansion of the right-hand side of (4) that

    pr(Y ≤ y) = Φ{r + r^{-1} log c(y)} + O(n^{-3/2}),

where

    c(y) = −b(x̂) l_j{x̃(y)} / [r D(y) b{x̃(y)} g_j{x̃(y)}].

This alternative approximation was suggested to us by Luke Tierney and a referee.
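In the univariate case with b = 1, the recentred form reads Φ{r + r^{-1} log(q/r)} with q = −l^{(1)}(x)/{−l^{(2)}(x̂)}^{1/2}; this q is our explicit spelling-out of c(y)r for that case, stated here as an assumption rather than a display from the paper. Continuing the Gamma(n+1, n) illustration used in §2 (again our own check), the additive and recentred forms agree closely, and the latter always lies in [0, 1]:

```python
import numpy as np
from scipy.stats import norm, gamma

n = 10.0
def l(x):  return n * (np.log(x) - x)
def l1(x): return n * (1.0 / x - 1.0)

for x in (0.55, 0.7, 1.4, 1.8):
    r = np.sign(x - 1.0) * np.sqrt(2.0 * (l(1.0) - l(x)))
    q = -l1(x) / np.sqrt(n)          # q = -l'(x)/sqrt(-l''(x_hat)); q/r > 0
    additive  = norm.cdf(r) + norm.pdf(r) * (1.0 / r - 1.0 / q)
    recentred = norm.cdf(r + np.log(q / r) / r)
    print(x, gamma.cdf(x, a=n + 1.0, scale=1.0 / n), additive, recentred)
```

In this example both corrected forms match the exact gamma probabilities to about four decimals; the recentred version is preferable far in the tails, where the additive form can leave [0, 1].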

Approximation (4) may be interpreted both algebraically and numerically. In certain circumstances, the approximation produces a convenient, closed-form expression estimating pr(Y ≤ y); see §3.2. However, in many cases the approximation does not result in a closed-form expression, and it is then most effectively viewed as a useful computational tool.

3. APPLICATIONS

3.1. Exponential regression model

Feigl & Zelen (1965) investigated the relationship between survival time for leukaemia patients and a concomitant variable, patient white blood cell count. The sample they used consisted of 17 patients with acute myelogenous leukaemia. We study an exponential regression model for survival time T, which has density function conditional on x, the base 10 logarithm of white blood cell count,

    f(t|x) = θ_x^{-1} exp(−t/θ_x)   (t > 0),

where θ_x = exp(β₀ + β₁x). Inference about θ_x for a specified value of x is important. The density function of Y = log T, conditional on x, is

    exp{y − β₀ − β₁x − exp(y − β₀ − β₁x)}   (−∞ < y < ∞);

alternatively, we may write Y = β₀ + β₁x + ε, where ε has an extreme value distribution with density exp(z − e^z) for −∞ < z < ∞ (Lawless, 1982).

Let β̂ = (β̂₀, β̂₁) be the maximum likelihood estimator of β = (β₀, β₁). When censoring is absent, the residuals A_i = Y_i − β̂₀ − β̂₁x_i (i = 1, ..., n) are ancillary statistics. Let A = (A₁, ..., A_n) and (Z₀, Z₁) = β̂ − β. Inference about β, and about θ_x for some specified x, say x₀, can be based on the conditional density of Z = (Z₀, Z₁) given A. This conditional density is f_{Z|A}(z₀, z₁|a) ∝ exp{l(z₀, z₁)}, where

    l(z₀, z₁) = Σ{a_i − z₀ − z₁x_i − exp(a_i − z₀ − z₁x_i)};   (16)

see Lawless (1982, p. 290), who develops exact conditional procedures based on f_{Z|A}(z₀, z₁|a). We focus on inference for θ_{x₀} by considering the pivotal quantity Z₂ = Z₀ + Z₁x₀ = log θ̂_{x₀} − log θ_{x₀}. Lawless derives an exact formula for pr(Z₂ ≤ y | A = a). Unfortunately, Lawless's technique does not extend easily beyond the case of a single regressor variable, as it requires numerical integration of the density of Z given A. As an alternative to the exact conditional procedure, we could use the large-sample normal approximation β̂ ~ N(β, I^{-1}), where I is the observed information matrix. Then Z₂ has an approximate normal distribution on which tests and confidence intervals for θ_{x₀} can be based.

Table 1 contains exact and approximate values of pr(Z₂ ≤ y | A = a) for various values of y in the case where x₀ = log₁₀ 50 000, corresponding to a white blood cell count of 50 000. Exact tail probabilities were computed using equation (6.3.14) of Lawless (1982, p. 292) by numerical integration. For approximations (3) and (4), we chose b in equation (1) to be 1 and l to be given by (16).

Table 1. Approximations to tail probabilities of Z₂ (entries are percentages)

    y        Exact      Approx. (3)   Approx. (4)   Large sample
    −1.05     0.6840     0.4993        0.6886        0.1274
    −0.95     1.2222     0.9086        1.2290        0.3166
    −0.85     2.1223     1.6079        2.1414        0.7288
    −0.65     5.8273     4.5957        5.8684        3.0883
    −0.55     9.1763     7.3915        9.2132        5.6986
     0.35    11.5163*   14.7927*      11.4800*      15.7249*
     0.55     3.0101*    4.0157*       2.9958*       5.6986*
     0.75     0.4468*    0.6524*       0.4448*       1.5567*

    * Denotes tail probability taken to the right.
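The computation behind Table 1 can be reproduced in outline as follows. Since the Feigl & Zelen data are not reproduced here, the sketch uses hypothetical covariates and ancillaries (drawn from the extreme value distribution of the residuals), so its output illustrates the calculation rather than the table entries. Here l is (16), b = 1, and g(z) = z₀ + z₁x₀ is linear, so J(y) is simply the negative Hessian of l at the constrained maximizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
xs = rng.uniform(3.0, 5.0, size=17)       # hypothetical log10 white-cell counts
a = np.log(rng.exponential(size=17))      # hypothetical ancillaries a_i
x0 = np.log10(50000.0)
gvec = np.array([1.0, x0])                # g(z) = z0 + z1*x0 is linear

def l(z):
    e = a - z[0] - z[1] * xs
    return np.sum(e - np.exp(e))

def grad_l(z):
    w = np.exp(a - z[0] - z[1] * xs)
    return np.array([np.sum(w - 1.0), np.sum(xs * (w - 1.0))])

def neg_hess_l(z):
    w = np.exp(a - z[0] - z[1] * xs)
    return np.array([[np.sum(w), np.sum(xs * w)],
                     [np.sum(xs * w), np.sum(xs ** 2 * w)]])

z_hat = minimize(lambda z: -l(z), np.zeros(2)).x
y_hat = gvec @ z_hat
detJ_hat = np.linalg.det(neg_hess_l(z_hat))

def tail(y):
    # approximation (4) for pr(Z2 <= y | A = a), with b = 1
    res = minimize(lambda z: -l(z), z_hat,
                   constraints=[{"type": "eq", "fun": lambda z: gvec @ z - y}])
    zt = res.x
    r = np.sign(y - y_hat) * np.sqrt(2.0 * (l(z_hat) - l(zt)))
    J = neg_hess_l(zt)                    # J_ij = -l_ij, since g is linear
    Q = gvec @ np.linalg.solve(J, gvec)
    D = (Q * np.linalg.det(J) / detJ_hat) ** -0.5
    term = D * gvec[0] / grad_l(zt)[0]    # index j = 1: g_1 = 1 never vanishes
    return norm.cdf(r) + norm.pdf(r) * (1.0 / r + term)

for dy in (-0.9, -0.5, 0.5, 0.9):
    print(dy, tail(y_hat + dy))
```

Inverting tail(·) numerically in y, for instance by bisection, gives the approximate conditional confidence limits reported in Table 2.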

For all values of y considered, approximation (4) gives results very close to the exact tail probabilities. Approximation (3) and the large-sample normal approximation give relatively inaccurate estimates.

We now compare 95% confidence intervals for θ_{x₀} when the white blood cell count is 50 000, obtained by the methods discussed above. Upper and lower 2.5% percentage points for the distribution of Z₂ and 95% confidence intervals for log θ_{x₀} and θ_{x₀} for each of the techniques are presented in Table 2. The intervals corresponding to approximations (3) and (4) were computed by numerical inversion of those formulae. The intervals obtained using approximation (4) are very close to the exact intervals, while those obtained using (3) are less accurate but still reasonable. The intervals derived from the large-sample normal approximation are quite inaccurate in comparison with the other intervals, which suggests that larger samples might be needed to obtain high accuracy with this method.

Table 2. 95% confidence intervals for mean survival time of patients with white blood cell count of 50 000

    Method         Lower      Upper     95% c.i. for log θ_x   95% c.i. for θ_x
    Exact          −0.8192    0.5727    (2.6879, 4.0799)       (14.7008, 59.1371)
    Approx. (3)    −0.7689    0.6092    (2.6514, 4.0295)       (14.1741, 56.2324)
    Approx. (4)    −0.8188    0.5723    (2.6883, 4.0795)       (14.7071, 59.1143)
    Large sample   −0.6820    0.6820    (2.5786, 3.9427)       (13.1788, 51.5554)

Tierney et al. (1989) consider Bayesian inference for this model based on an improper uniform prior density on θ₁ = log β₀ and θ₂ = β₁. Approximation (4) could be applied to produce approximate marginal posterior tail probabilities in this Bayesian context by choosing b, the prior density, equal to 1 and l to be the log likelihood. Approximate posterior quantiles for linear functions of β obtained in this way coincide with the approximate conditional confidence limits for those parameters obtained using (4) with our previous choice of b and l. This correspondence is natural because of the connection between Bayesian and conditional inference for location models under the assumption of uniform priors.

Tierney et al. consider in particular construction of an approximate marginal posterior density for the two year survival probability of patients at a white blood cell count of 50 000. This probability is an increasing function of θ_{x₀}. Consequently, use of (4) to derive approximate posterior quantiles or confidence limits for this probability produces the same results as transforming in the natural way the approximate quantiles or limits derived for θ_{x₀}.

3.2. Noncentral t distribution

Let X₁, ..., X_n be independent and identically distributed observations from a normal N(μ, σ²) population, and let X̄ = n^{-1}ΣX_i and S² = (n−1)^{-1}Σ(X_i − X̄)² denote the sample mean and sample variance respectively. Given a value x₀, the quantity T′ = n^{1/2}(x₀ − X̄)/S has a noncentral t distribution with n−1 degrees of freedom and noncentrality parameter n^{1/2}(x₀ − μ)/σ. Computation of tail probabilities for the noncentral t distribution is difficult since it requires numerical integration of the noncentral t density, which is typically written in integral or infinite series form. In contrast, computation of approximation (4) to tail probabilities is relatively easy.

Without loss of generality, suppose that the normal population has zero mean and unit variance. For each of the four choices of variables (U, V) = (X̄, S), (X̄, log S), (X̄, S^{-1}) and (X̄, S²), we compute approximation (4), taking b in equation (1) to be a constant and l to be the logarithm of the joint density of U and V. Put Y = (x₀ − X̄)/S = n^{-1/2}T′. For the variables (X̄, S), we have for n > 3

    pr(Y ≤ y) ≈ Φ(r) + φ(r)[1/r + {2n(n−1)}^{1/2} v / (nu{2(n−2) + nyx₀v}^{1/2})],   (17)

where

    r = sgn(y − ŷ)[nx₀u + (n−2) log{(n−2)/((n−1)v²)}]^{1/2},   ŷ = x₀{(n−1)/(n−2)}^{1/2},
    u = x₀ − yv,   v = ½(n−1+ny²)^{-1}[nyx₀ + {(nyx₀)² + 4(n−2)(n−1+ny²)}^{1/2}].

For the variables (X̄, log S) the approximation is, for n > 1,

    pr(Y ≤ y) ≈ Φ(r) + φ(r)[1/r + (nue^v)^{-1}{2n(n−1)}^{1/2}{2(n−1) + ny² − nyu e^{-v}}^{-1/2}],   (18)

where

    r = sgn(y − x₀){nx₀u − 2(n−1)v}^{1/2},   u = x₀ − ye^v,
    v = log(½(n−1+ny²)^{-1}[nyx₀ + {(nyx₀)² + 4(n−1)(n−1+ny²)}^{1/2}]).

For the variables (X̄, S^{-1}), we have for n ≥ 2

    pr(Y ≤ y) ≈ Φ(r) + φ(r)[1/r + {2n³/(n−1)}^{1/2} w² / (nu{2(n−1+ny²) − nyx₀w}^{1/2})],   (19)

where

    r = sgn(y − ŷ)[nx₀u + n log{nw²/(n−1)}]^{1/2},   ŷ = x₀{(n−1)/n}^{1/2},
    u = x₀ − y/w,   w = (2n)^{-1}[−nyx₀ + {(nyx₀)² + 4n(n−1+ny²)}^{1/2}].

Finally, for the variables (X̄, S²) we have the approximation for n ≥ 4

    pr(Y ≤ y) ≈ Φ(r) + φ(r)[1/r + 2(n−1){n/(2(n−3))}^{1/2} v² / (nu{2(n−3) + nyx₀v}^{1/2})],   (20)

where

    r = sgn(y − ŷ)[nx₀u + (n−3) log{(n−3)/((n−1)v²)}]^{1/2},   ŷ = x₀{(n−1)/(n−3)}^{1/2},
    u = x₀ − yv,   v = ½(n−1+ny²)^{-1}[nyx₀ + {(nyx₀)² + 4(n−3)(n−1+ny²)}^{1/2}].

The results of a numerical study of tail probabilities for Y with n = 5, x₀ = 0, corresponding to a central t distribution, and n = 5, x₀ = 3.29053, corresponding to a noncentral t distribution, are given in Table 3. Approximations (17)-(20) all appear to perform reasonably well; in each case substantial improvement is gained over approximation (3).
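Each of (17)-(20) is elementary to program. As a check that also previews Table 3, the following sketch (our verification, not part of the paper) evaluates (18) and compares it with noncentral t tail probabilities computed by a standard library routine, which here plays the role of the exact value.

```python
import numpy as np
from scipy.stats import norm, nct

n, x0 = 5, 3.29053

def approx18(y):
    # approximation (18), variables (Xbar, log S); t = exp(v)
    nyx0 = n * y * x0
    t = (nyx0 + np.sqrt(nyx0 ** 2 + 4 * (n - 1) * (n - 1 + n * y ** 2))) \
        / (2 * (n - 1 + n * y ** 2))
    v = np.log(t)
    u = x0 - y * t
    r = np.sign(y - x0) * np.sqrt(n * x0 * u - 2 * (n - 1) * v)
    term = np.sqrt(2 * n * (n - 1)) / (
        n * u * t * np.sqrt(2 * (n - 1) + n * y ** 2 - n * y * u / t))
    return norm.cdf(r) + norm.pdf(r) * (1 / r + term)

for y in (1.5, 2.0, 2.3):
    exact = nct.cdf(np.sqrt(n) * y, df=n - 1, nc=np.sqrt(n) * x0)
    print(y, exact, approx18(y))
```

For n = 5 and x₀ = 3.29053 the output reproduces the (X̄, log S) entries of Table 3 to the precision shown there.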

Table 3. Tail probability approximations related to the noncentral t distribution (entries are percentages; values for y ≥ 6.5 are for the right-hand tail)

                        (X̄, S)           (X̄, log S)       (X̄, S^{-1})      (X̄, S²)
    y       Exact     (3)     (17)     (3)     (18)     (3)     (19)     (3)     (20)

    x₀ = 0
    −2.1    0.467    0.887   0.465   0.309   0.477   0.110   0.447   2.644   0.203
    −1.7    0.954    1.611   0.954   0.670   0.971   0.285   0.917   4.018   0.570
    −1.3    2.191    3.247   2.197   1.654   2.221   0.859   2.115   6.591   1.620
    −1.0    4.451    5.941   4.467   3.585   4.496   2.203   4.322  10.142   3.695
    −0.6   12.541   14.553  12.577  11.140  12.606   8.644  12.315  19.433  11.601

    x₀ = 3.29053
     1.5    0.451    0.357   0.444   0.722   0.454   1.311   0.475   0.151   0.485
     1.6    0.836    0.665   0.822   1.311   0.841   2.316   0.876   0.290   0.898
     1.8    2.272    1.832   2.237   3.421   2.282   5.727   2.364   0.841   2.436
     2.0    4.855    3.966   4.787   7.031   4.872  11.181   5.025   1.918   5.197
     2.3   11.057    9.203  10.920  15.178  11.083  22.493  11.366   4.789  11.799
     6.5    9.964   13.874  10.128   6.381  10.003   2.746   9.746  27.843   7.594
     8.0    4.929    7.608   5.019   2.897   4.959   1.029   4.809  18.359   3.141
     9.7    2.471    4.237   2.518   1.347   2.492   0.399   2.432  12.208   1.181
    12.4    0.990    1.963   1.006   0.494   1.001   0.115   0.963   7.130   0.190

The choice of variables seems important for the accuracy of the approximation. Approximations (17) and (18), involving the variables (X̄, S) and (X̄, log S), are clearly best, while approximation (20), involving (X̄, S²), is the most inaccurate. An interesting point is that for x₀ = 0, approximation (17) is most accurate, while for the case x₀ = 3.29053, approximation (18) is preferable.

Tierney et al. (1989) consider the problem of estimating the proportion of a normal N(μ, σ²) population that falls below a point x₀. They study the estimator P̂ = Φ{(x₀ − X̄)/S} of this quantity. Since Φ(·) is a monotone function, approximation (4) to the tail probability pr(P̂ ≤ p) for each of the four choices of variables considered above can be obtained by replacing y by Φ^{-1}(p) in formulae (17)-(20).

4. APPLICATIONS TO EXPONENTIAL FAMILIES

4.1. Marginal tail probabilities for a function of the sufficient statistic

Suppose T₁, ..., T_n is a sample of size n from the exponential family having density

    f(t; τ) = h(t) exp{−t^i τ_i − β(τ)},   τ = (τ₁, ..., τ_p).

Let β^i(τ) = ∂β(τ)/∂τ_i and β^{ij}(τ) = ∂²β(τ)/∂τ_i∂τ_j (i, j = 1, ..., p). Put B(τ) = {β^{ij}(τ)} and B(τ)^{-1} = {β_{ij}(τ)}, so that B(τ) is a p×p matrix and |B(τ)| is of order O(1). The density of the sufficient statistic X = n^{-1}ΣT_i satisfies

    f_X(x; τ) = c|B(τ̂)|^{-1/2} exp[n{β(τ̂) − β(τ) + x^i(τ̂_i − τ_i)}]{1 + O(n^{-3/2})},   (21)

provided x − x̂ is O(n^{-1/2}), where τ̂ is given by x^i = −β^i(τ̂) (i = 1, ..., p) and c is a normalizing constant such that the approximation on the right-hand side of (21) integrates to 1 + O(n^{-3/2}). Here, τ̂ is the maximum likelihood estimator of τ based on the observation X = x. Formula (21) was derived by Barndorff-Nielsen & Cox (1979); see also Daniels (1958), Durbin (1980) and Reid (1988).

Marginal tail probabilities for a real-valued function Y = g(X) can be approximated by applying (4) to (21). It is convenient to make the choice

    l(x) = n{β(τ̂) − β(τ) + x^i(τ̂_i − τ_i)},   b(x) = |B(τ̂)|^{-1/2},

where τ̂ = τ̂(x). Note that l(x) attains its maximum value at the point x̂ = (x̂^1, ..., x̂^p) given by x̂^i = −β^i(τ). For fixed y, suppose x̃ = x̃(y) maximizes l(x) subject to the constraint g(x) = y, and let τ̃ = τ̃(y) satisfy x̃^i = −β^i(τ̃) (i = 1, ..., p). Observe that τ̃(y) = τ for y = g(x̂). Then approximation (4) to the marginal tail probability pr(Y ≤ y) is

    pr(Y ≤ y) = Φ(r) + φ(r)[1/r + D(y){|B(τ)|/|B(τ̃)|}^{1/2} g_k{x̃(y)} / {n(τ̃_k − τ_k)}] + O(n^{-3/2}),

where k is any index for which g_k{x̃(y)} does not vanish and

    r = r(y) = sgn(y − ŷ)(2n[β(τ) − β(τ̃) − x̃^i(y)(τ̃_i − τ_i)])^{1/2},   l_k{x̃(y)} = n(τ̃_k − τ_k).   (22)

For the case of the coordinate function g(x) = x^p, it is convenient to partition τ into (λ, ψ), where ψ = τ_p and λ = (λ₁, ..., λ_{p−1}) has λ_a = τ_a (a = 1, ..., p−1). Then τ̃(x^p) = (λ, ψ̄), with ψ̄ given by x^p = −β^p(λ, ψ̄), and (22) reduces to

    pr(X^p ≤ x^p) = Φ(r) + φ(r)(1/r + [n^{1/2}(ψ̄ − ψ){β^{pp}(λ, ψ̄)}^{1/2}]^{-1}) + O(n^{-3/2}),   (23)

where

    r = sgn(x^p − x̂^p)(2n[β(λ, ψ) − β(λ, ψ̄) − x^p(ψ̄ − ψ)])^{1/2}.

4.2. Conditional inference for a component of the natural parameter

Now suppose τ = (λ, ψ) and ψ is the parameter of interest, with λ being a nuisance parameter. Unfortunately, the marginal distribution of X^p is not particularly useful for inference about ψ, since that distribution depends on the nuisance parameter; indeed, approximation (23) involves λ. However, the conditional distribution of X^p given Y = (Y^1, ..., Y^{p−1}) = (X^1, ..., X^{p−1}) depends on τ only through ψ. Barndorff-Nielsen & Cox (1979) have shown that the conditional density of X^p given Y satisfies

    f_{X^p|Y}(x^p|y; ψ) = c{|B₁(λ*, ψ)|/|B(λ̂, ψ̂)|}^{1/2} exp[n{β(λ̂, ψ̂) − β(λ*, ψ) + y^a(λ̂_a − λ*_a) + x^p(ψ̂ − ψ)}] × {1 + O(n^{-3/2})},   (24)

provided x^p − x̂^p is O(n^{-1/2}), where (λ̂, ψ̂) = τ̂, where λ* satisfies y^a = −β^a(λ*, ψ) (a = 1, ..., p−1), B₁(λ, ψ) is the (p−1)×(p−1) submatrix of B(λ, ψ) corresponding to λ, and where c is a normalizing constant such that the approximation on the right-hand side of (24) integrates to 1 + O(n^{-3/2}). Here λ* is the constrained maximum likelihood estimator of λ under the fixed value of ψ having observed X = x, and x̂^p is defined below.
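A compact test case for (24), of our own construction rather than an example from the paper: take per-observation pairs t = (t^1, t^2) = (u + v, v), where u and v are independent exponential variables with rates λ and λ + ψ, so that f(t; λ, ψ) = exp{−λt^1 − ψt^2 − β(λ, ψ)} on t^1 > t^2 > 0, with β(λ, ψ) = −log{λ(λ + ψ)}. In this family the exact conditional density of X^2 given X^1 = x^1 is proportional to (x^1 − x^2)^{n−1}(x^2)^{n−1}exp(−nψx^2), so (24) can be checked directly:

```python
import numpy as np

n, psi, x1 = 10, 1.5, 1.6     # sample size, interest parameter, observed X^1

beta = lambda lam, ps: -np.log(lam * (lam + ps))
B1   = lambda lam, ps: 1.0 / lam ** 2 + 1.0 / (lam + ps) ** 2   # |B_1|
detB = lambda lam, ps: 1.0 / (lam ** 2 * (lam + ps) ** 2)       # |B|

# constrained MLE lambda*: solves x1 = 1/lam + 1/(lam + psi)
lam_s = ((2.0 - x1 * psi)
         + np.sqrt((x1 * psi - 2.0) ** 2 + 4.0 * x1 * psi)) / (2.0 * x1)

x2 = np.linspace(0.02, x1 - 0.02, 400)
lam_h = 1.0 / (x1 - x2)                  # unconstrained MLEs as x2 varies
psi_h = 1.0 / x2 - lam_h

expo = n * (beta(lam_h, psi_h) - beta(lam_s, psi)
            + x1 * (lam_h - lam_s) + x2 * (psi_h - psi))
f24 = np.sqrt(B1(lam_s, psi) / detB(lam_h, psi_h)) * np.exp(expo)
f24 /= np.trapz(f24, x2)                 # renormalized version of (24)

f_exact = (x1 - x2) ** (n - 1) * x2 ** (n - 1) * np.exp(-n * psi * x2)
f_exact /= np.trapz(f_exact, x2)

for i in range(0, 400, 80):
    print(round(x2[i], 3), round(f_exact[i], 4), round(f24[i], 4))
```

After renormalization the two densities agree to several decimals across the range, consistent with the relative error O(n^{-3/2}) of (24).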

Approximations to conditional tail probabilities for X^p can be obtained by applying (5) to (24). In this case it is natural to choose

    l(x^p) = n{β(λ̂, ψ̂) − β(λ*, ψ) + y^a(λ̂_a − λ*_a) + x^p(ψ̂ − ψ)},   b(x^p) = |B(λ̂, ψ̂)|^{-1/2}.

Since y^a = x^a (a = 1, ..., p−1) are taken as fixed and x^i = −β^i(λ̂, ψ̂) (i = 1, ..., p), it is possible to regard (λ̂, ψ̂) as a function of x^p alone. Straightforward calculations give l^{(1)}(x^p) = n(ψ̂ − ψ) and −l^{(2)}(x^p) = nβ_{pp}(λ̂, ψ̂), where {β_{ij}(λ, ψ)} = B(λ, ψ)^{-1}. Thus l(x^p) is maximized at x̂^p = −β^p(λ*, ψ), and since B(λ, ψ) is positive definite for all (λ, ψ), it follows that x̂^p is a decreasing function of ψ. Formula (5) yields the approximation

    pr(X^p ≤ x^p | Y = y) = Φ(r) + φ(r)[1/r + |B₁(λ*, ψ)|^{1/2} / {n^{1/2}(ψ̂ − ψ)|B(λ̂, ψ̂)|^{1/2}}] + O(n^{-3/2}),   (25)

where

    r = sgn(x^p − x̂^p)(2n[β(λ*, ψ) − β(λ̂, ψ̂) − y^a(λ̂_a − λ*_a) − x^p(ψ̂ − ψ)])^{1/2}.   (26)

Having observed X = x, an exact upper 1−α conditional confidence limit for the parameter of interest can be computed as the value of ψ such that pr(X^p ≤ x^p | Y = y) = 1−α. Similarly, an approximate conditional limit can be computed as the value of ψ for which the right-hand side of (25) equals 1−α. This approximate limit differs from the exact limit by terms of order O_p(n^{-2}), and the corresponding conditional confidence interval has coverage error of order O(n^{-3/2}). In contrast, approximation (3) yields pr(X^p ≤ x^p | Y = y) ≈ Φ(r). The approximate confidence limit calculated as the value of ψ for which Φ(r) = 1−α differs from the exact limit by terms of order O_p(n^{-1}), and the resulting interval has coverage error of order O(n^{-1/2}).

This approach to constructing approximate confidence limits is closely related to a method given by Barndorff-Nielsen (1986). Since sgn(x^p − x̂^p) = sgn(ψ − ψ̂), it follows that r defined at (26) is simply the signed root of the likelihood ratio statistic for ψ. Barndorff-Nielsen shows that the marginal distribution of r − r^{-1} log K is standard normal to error of order O(n^{-3/2}), where K is a variable admitting the expansion

    K = 1 + Q₁(λ*)r + Q₂(λ*)r² + O_p(n^{-3/2}),

and Q₁ = Q₁(λ*) and Q₂ = Q₂(λ*) are O_p(n^{-1/2}) and O_p(n^{-1}), respectively. An approximate upper 1−α confidence limit for the parameter of interest can be calculated as the value of ψ for which Φ(r − r^{-1} log K) = 1−α. Note that, to error of order O_p(n^{-3/2}),

    Φ(r − r^{-1} log K) = Φ{r − Q₁ − (Q₂ − ½Q₁²)r}.

For the present problem, it follows from Barndorff-Nielsen's formula (3.11) that

    K = [r/{n^{1/2}(ψ − ψ̂)}] {|B₁(λ*, ψ)|/|B(λ̂, ψ̂)|}^{1/2}.

The approximate upper 1−α limits obtained through Φ(r − r^{-1} log K) and (25) therefore differ by terms of order O_p(n^{-2}), and the corresponding conditional confidence intervals both have coverage error of order O(n^{-3/2}). The primary advantage in considering (25) is that it shows these limits to have conditional validity. Approximation (25) is also given by Skovgaard (1987) and is discussed by Davison (1988).
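Continuing the two-rate exponential test case introduced after (24), the sketch below evaluates (25) and (26) and compares the result with the exact conditional distribution function obtained by numerical integration. This is again our own check on an assumed configuration, not a computation from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

n, psi, x1 = 10, 1.5, 1.6

beta = lambda lam, ps: -np.log(lam * (lam + ps))
B1   = lambda lam, ps: 1.0 / lam ** 2 + 1.0 / (lam + ps) ** 2
detB = lambda lam, ps: 1.0 / (lam ** 2 * (lam + ps) ** 2)

lam_s = ((2.0 - x1 * psi)
         + np.sqrt((x1 * psi - 2.0) ** 2 + 4.0 * x1 * psi)) / (2.0 * x1)
x2_hat = 1.0 / (lam_s + psi)             # maximizer of l(x^p)

def approx(x2):
    lam_h = 1.0 / (x1 - x2)              # unconstrained MLEs given (x1, x2)
    psi_h = 1.0 / x2 - lam_h
    r = np.sign(x2 - x2_hat) * np.sqrt(
        2.0 * n * (beta(lam_s, psi) - beta(lam_h, psi_h)
                   - x1 * (lam_h - lam_s) - x2 * (psi_h - psi)))    # (26)
    term = np.sqrt(B1(lam_s, psi) / detB(lam_h, psi_h)) \
        / (np.sqrt(n) * (psi_h - psi))
    return norm.cdf(r), norm.cdf(r) + norm.pdf(r) * (1.0 / r + term)  # (3), (25)

def exact(x2):
    dens = lambda t: (x1 - t) ** (n - 1) * t ** (n - 1) * np.exp(-n * psi * t)
    return quad(dens, 0.0, x2)[0] / quad(dens, 0.0, x1)[0]

for x2 in (0.25, 0.35, 0.55, 0.65):
    phi_r, full = approx(x2)
    print(x2, round(exact(x2), 5), round(phi_r, 5), round(full, 5))
```

Solving the equation approx(x2)[1] = 1 − α in ψ, with x2 fixed at its observed value, gives the approximate conditional confidence limits discussed above.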

Using (24), Barndorff-Nielsen & Cox (1979, formula (6.1)) have approximated the conditional log likelihood function for ψ based on the observation X = x by

    l̄(ψ; x^p|y) = ½ log|B₁(λ*, ψ)| − n{β(λ*, ψ) + y^a λ*_a + x^p ψ}.   (27)

Let ψ̃ be the value of ψ for which (27) is maximized, and put λ̃ = λ*(ψ̃). Since y^a = x^a (a = 1, ..., p−1) are taken as fixed, it is possible to regard (λ̃, ψ̃) as a function of x^p alone. The difference between the approximate conditional maximum likelihood estimator ψ̃ and the unconditional maximum likelihood estimator ψ̂ is of order O_p(n^{-1}); see Barndorff-Nielsen & Cox (1979, formula (6.2)). By a variation of the argument given by Barndorff-Nielsen & Cox that leads to their formulae (3.15) and (4.8), it follows that

    f_{X^p|Y}(x^p|y) = c{β_{pp}(λ̃, ψ̃)}^{1/2} exp[l̄(ψ; x^p|y) − l̄(ψ̃; x^p|y)] × {1 + O(n^{-3/2})},   (28)

where c is a normalizing constant such that the approximation on the right-hand side of (28) integrates to 1 + O(n^{-3/2}).

As for (24), approximations to conditional tail probabilities for X^p can be obtained by applying (5) to (28). In this case, the choice

    l(x^p) = −l̄(ψ̃; x^p|y) − n x^p ψ,   b(x^p) = {β_{pp}(λ̃, ψ̃)}^{1/2}

is convenient; since l̄(ψ; x^p|y) = −n x^p ψ plus terms not involving x^p, the exponent in (28) equals l(x^p) up to an additive constant. We have l^{(1)}(x^p) = n(ψ̃ − ψ). Hence x̂^p is the value of x^p whose corresponding ψ̃ equals ψ, and l(x^p) is maximized there. Formula (5) yields the approximation

    pr(X^p ≤ x^p | Y = y) = Φ(r) + φ(r)[1/r + {β_{pp}(λ̃, ψ̃)}^{1/2} / {n^{1/2}(ψ̃ − ψ) β_{pp}(λ*, ψ)}] + O(n^{-3/2}),   (29)

where

    r = sgn(x^p − x̂^p)[2{l̄(ψ̃; x^p|y) − l̄(ψ; x^p|y)}]^{1/2}.   (30)

Since the ½ log|B₁(λ*, ψ)| term in l(x^p) is O(1), it can be shown that the correction term in (29) may be replaced by [n^{1/2}(ψ̃ − ψ){β_{pp}(λ̃, ψ̃)}^{1/2}]^{-1} without affecting the order of the error. Consequently, an alternative approximation to (29) is

    pr(X^p ≤ x^p | Y = y) = Φ(r) + φ(r)(1/r + [n^{1/2}(ψ̃ − ψ){β_{pp}(λ̃, ψ̃)}^{1/2}]^{-1}) + O(n^{-3/2}).   (31)

In an as yet unpublished technical report, D. A. S. Fraser and N. Reid have derived approximation (31) using different techniques.

Note that r defined at (30) is the signed root of the approximate conditional likelihood ratio statistic for ψ having observed X = x. As in the case of (25), an approximate conditional upper 1−α confidence limit for ψ can be computed as the value of ψ for which the right-hand side of (29) equals 1−α. These approximate limits have the same asymptotic properties as those derived from (25), and they improve upon the usual limits derived from the uncorrected standard normal approximation Φ(r).

For more general parametric models, corrections that improve the accuracy of the standard normal approximation to distributions of signed roots of likelihood ratio statistics can be derived from formula (4). Welch & Peers (1963) and Peers (1965) have described how a prior density function for a vector parameter should be chosen so that the posterior quantiles for a component of the vector are approximate confidence limits in the repeated sampling sense, having coverage error of order O(n^{-1}). Using such prior densities, modifications to signed roots of likelihood ratio statistics can be derived by applying formula (4) to the joint posterior density of the vector parameter. We develop these corrections in an as yet unpublished paper and relate them to modifications proposed by Barndorff-Nielsen (1990a, b, c). Barndorff-Nielsen's modifications arise from integration of his formula for approximating the conditional density of the maximum likelihood estimator given an ancillary statistic.

ACKNOWLEDGEMENT

We are grateful to Luke Tierney for helpful discussions.

REFERENCES

BARNDORFF-NIELSEN, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73, 307-22.
BARNDORFF-NIELSEN, O. E. (1990a). Discussion of paper by D. A. Sprott. Can. J. Statist. 18, 12-4.
BARNDORFF-NIELSEN, O. E. (1990b). A note on the standardized signed log likelihood ratio. Scand. J. Statist. 17, 157-60.
BARNDORFF-NIELSEN, O. E. (1990c). Approximate interval probabilities. J. R. Statist. Soc. B 52, 485-96.
BARNDORFF-NIELSEN, O. E. & COX, D. R. (1979). Edgeworth and saddle-point approximations with statistical applications (with discussion). J. R. Statist. Soc. B 41, 279-312.
DANIELS, H. E. (1958). Discussion of paper by D. R. Cox. J. R. Statist. Soc. B 20, 236-8.
DAVISON, A. C. (1988). Approximate conditional inference in generalized linear models. J. R. Statist. Soc. B 50, 445-61.
DICICCIO, T. J., FIELD, C. A. & FRASER, D. A. S. (1990). Approximations of marginal tail probabilities and inference for scalar parameters. Biometrika 77, 77-95.
DURBIN, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-33.
FEIGL, P. & ZELEN, M. (1965). Estimation of exponential survival probabilities with concomitant information. Biometrics 21, 826-37.
LAWLESS, J. F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.
LEONARD, T., HSU, J. S. J. & TSUI, K.-W. (1989). Bayesian marginal inference. J. Am. Statist. Assoc. 84, 1051-8.
PEERS, H. W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. R. Statist. Soc. B 27, 9-16.
REID, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 3, 213-38.
SKOVGAARD, I. M. (1987). Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875-87.
TIERNEY, L., KASS, R. E. & KADANE, J. B. (1989). Approximate marginal densities of nonlinear functions. Biometrika 76, 425-33. Amendment (1991), 78, 233-4.
WELCH, B. L. & PEERS, H. W. (1963). On formulae for confidence points based on intervals of weighted likelihoods. J. R. Statist. Soc. B 25, 318-29.

[Received April 1990. Revised March 1991]