OBJECTIVE PRIORS FOR THE BIVARIATE NORMAL MODEL. BY JAMES O. BERGER AND DONGCHU SUN. Duke University and University of Missouri-Columbia
The Annals of Statistics, 2008, Vol. 36, No. 2. DOI: 10.1214/07-AOS501. © Institute of Mathematical Statistics, 2008.

OBJECTIVE PRIORS FOR THE BIVARIATE NORMAL MODEL
BY JAMES O. BERGER AND DONGCHU SUN
Duke University and University of Missouri-Columbia

Study of the bivariate normal distribution raises the full range of issues involving objective Bayesian inference, including the different types of objective priors (e.g., Jeffreys, invariant, reference, matching), the different modes of inference (e.g., Bayesian, frequentist, fiducial) and the criteria involved in deciding on optimal objective priors (e.g., ease of computation, frequentist performance, marginalization paradoxes). Summary recommendations as to optimal objective priors are made for a variety of inferences involving the bivariate normal distribution. In the course of the investigation, a variety of surprising results were found, including the availability of objective priors that yield exact frequentist inferences for many functions of the bivariate normal parameters, including the correlation coefficient.

1. Introduction and prior distributions.

1.1. Notation and problem statement. The bivariate normal distribution of (x1, x2)' has mean parameters μ = (μ1, μ2)' and covariance matrix

(1)  Σ = ( σ1², ρσ1σ2 ; ρσ1σ2, σ2² ),

where ρ is the correlation between x1 and x2. The density is

[1/(2πσ1σ2√(1−ρ²))] exp{ −[ (x1−μ1)²/σ1² + (x2−μ2)²/σ2² − 2ρ(x1−μ1)(x2−μ2)/(σ1σ2) ] / [2(1−ρ²)] }.

The data consist of an independent random sample X = (x_k)' = (x_{1k}, x_{2k}), k = 1, ..., n, of size n ≥ 3, for which the sufficient statistics are

x̄ = (x̄1, x̄2)'  and  S = Σ_{k=1}^n (x_k − x̄)(x_k − x̄)' = ( s11, r√(s11 s22) ; r√(s11 s22), s22 ).

1 Supported in part by NSF Grant DMS.
2 Supported in part by NSF Grant SES and NIH Grant R0-MH0748.
AMS 2000 subject classifications. Primary 62F10, 62F15, 62F25; secondary 62A01, 62E15, 62H10, 62H20.
Key words and phrases. Reference priors, matching priors, Jeffreys priors, right-Haar prior, fiducial inference, frequentist coverage, marginalization paradox, rejection sampling, constructive posterior distributions.
where, for i, j = 1, 2,

x̄_i = (1/n) Σ_{k=1}^n x_{ik},  s_{ij} = Σ_{k=1}^n (x_{ik} − x̄_i)(x_{jk} − x̄_j)  and  r = s12/√(s11 s22).

We will denote prior densities as π(μ1, μ2, σ1, σ2, ρ), and the corresponding posterior densities as π(μ1, μ2, σ1, σ2, ρ | X) (all with respect to dμ1 dμ2 dσ1 dσ2 dρ).

We consider objective inference for parameters of the bivariate normal distribution and functions of these parameters, with special focus on the development of objective confidence or credible sets. Section 1.2 introduces many of the key issues to be covered, through a summary of some of the most interesting results involving priors yielding exact frequentist procedures; this section also raises interesting historical and philosophical issues. For easy access, Section 1.3 presents our summary recommendations as to which priors to utilize. Often, the posteriors for the recommended priors are essentially available in computational closed form, allowing direct Monte Carlo simulation. Section 2 provides simple accept-reject schemes for computing with the recommended priors in other cases. Sections 3 and 4 develop the needed theory, concerning what are called reference priors and matching priors, respectively, and also present various simulations that were conducted to enable summary recommendations to be made.

Notation: In addition to (μ1, μ2, σ1, σ2, ρ), the following parameters will be considered:

(2)  η1 = 1/σ1²,  η2 = 1/[σ2²(1−ρ²)],  η3 = −ρ/[σ1√(1−ρ²)],
(3)  θ1 = ρσ2/σ1,  θ2 = σ2²(1−ρ²),
(4)  θ3 = |Σ| = σ1²σ2²(1−ρ²),  θ4 = σ2²(1−ρ²)/σ1²,
(5)  θ5 = μ1/σ1,  θ6 = σ1σ2,  θ7 = σ1/σ2,  θ8 = μ2/σ2,
(6)  θ9 = σ12 = ρσ1σ2,  θ10 = σ1² + σ2² − 2ρσ1σ2,
(7)  θ11 = d'Σd  [d = (d1, d2)' not proportional to (0, 1)'],  λ1 = ch_max(Σ),  λ2 = ch_min(Σ).

Some of these parameters have straightforward statistical interpretations. Since (x2 | x1, μ, Σ) ~ N(μ2 + θ1(x1 − μ1), θ2), it is clear that θ1 is a regression coefficient, θ2 is a conditional variance, and η2 is the corresponding precision. For the marginal distribution of x1, η1 is the precision and θ5 is the reciprocal of the
coefficient of variation. θ3 is usually called the generalized variance. (η1, η2, η3) gives a type of Cholesky decomposition of the precision matrix [see (13) in Section 2.2]. θ10 is the variance of x1 − x2, and θ11 is the variance of d1 x1 + d2 x2. Finally, λ1 and λ2 are the largest and smallest eigenvalues of Σ.

Technical issue. We will assume that |ρ| < 1 and |r| < 1 in virtually all expressions and results that follow. This is because, if either equals 1 in absolute value, then ρ = {sign of r} with probability 1 (either frequentist or Bayesian posterior, as relevant). Indeed, the situation then essentially collapses to the univariate version of the problem, which is standard.

1.2. Matching, constructive posteriors and fiducial distributions. The bivariate normal distribution has been extensively studied from frequentist, fiducial and objective Bayesian perspectives. Table 1 summarizes a number of interesting results. For a variety of parameters, it presents objective priors (discussed below) for which the resulting Bayesian posterior credible sets of level α are also exact frequentist confidence sets at the same level; in this case, the priors are said to be exact frequentist matching. This is a very desirable situation: see [23] and [2] for general discussion and the many earlier references. For μ1, μ2, σ1², σ2² and ρ, the constructive posterior distributions are also the fiducial distributions for the parameters, as found in Fisher [14, 15] and [1].

Posterior distributions are presented as constructive random distributions, that is, by a description of how to simulate from them. Thus, to simulate from the posterior distribution of σ1² given the data (actually, only s11 is needed), one draws independent χ²_{n−1} random variables and simply computes the corresponding s11/χ²_{n−1}; this yields an independent sample from the fiducial/posterior distribution of σ1². Table 1 also lists the objective prior distributions that yield the indicated objective posterior.
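The constructive simulation just described is easy to express in code. The following is a minimal sketch (the function name and inputs are illustrative, not from the paper), assuming the s11/χ²_{n−1} form given above:

```python
import numpy as np

def sigma1_sq_posterior_draws(s11, n, size, seed=0):
    """Constructive (fiducial) posterior draws of sigma_1^2:
    each draw is s11 / chi2_{n-1}, per the construction in the text."""
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(n - 1, size=size)  # independent chi^2_{n-1} draws
    return s11 / chi2
```

Each returned value is then an independent draw from the fiducial/posterior distribution of σ1².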
The notation π_{ab} in the table stands for the important class of prior densities (a subclass of the generalized Wishart distributions of [8])

(8)  π_{ab}(μ1, μ2, σ1, σ2, ρ) = 1/[ σ1^{3−a} σ2^{2−b} (1−ρ²)^{(4−b)/2} ].

Special cases of this class are the Jeffreys-rule prior π_J = π_{10}, the right-Haar prior π_H = π_{12}, the independence Jeffreys prior π_IJ = π_{21} = 1/[σ1σ2(1−ρ²)^{3/2}], and π_RO, which has a = b = 1. The independence Jeffreys prior follows from using a constant prior for the means, and then the Jeffreys prior for the covariance matrix with means given.

We highlight the results about ρ in Table 1 because they are interesting from practical, historical and philosophical perspectives. First, it does not seem to be
TABLE 1
Parameters with exact matching priors of the form π_{ab}, and associated constructive posteriors. Here Z is a standard normal random variable, and χ²_{n−1} and χ²_{n−2} are chi-squared random variables with the indicated degrees of freedom, all random variables being independent. For μ1, μ2, σ1², σ2² and ρ, the indicated posteriors are also fiducial distributions.

Parameter | Prior | Posterior
μ1 | π_{1b}, any b (including π_J and π_H) | x̄1 + (Z/√(χ²_{n−1})) √(s11/n)
μ2 | π_J = π_{10} | x̄2 + (Z/√(χ²_{n−1})) √(s22/n)
d'(μ1, μ2)', d ∈ R² | π_J = π_{10} and π~_H (see Table 4) | d'(x̄1, x̄2)' + (Z/√(χ²_{n−1})) √(d'Sd/n)
σ1² | π_{1b}, any b (including π_J and π_H) | s11/χ²_{n−1}
ρ | π_H = π_{12} | ψ( [Z + √(χ²_{n−2}) r/√(1−r²)] / √(χ²_{n−1}) ),  ψ(y) = y/√(1+y²)
η3 = −ρ/[σ1√(1−ρ²)] | π_{a2}, any a (including π_H) | [Z − √(χ²_{n−2}) r/√(1−r²)] / √s11
θ1 = ρσ2/σ1 | π_{a2}, any a (including π_H) | r√(s22/s11) − (Z/√(χ²_{n−2})) √(s22(1−r²)/s11)
θ2 = σ2²(1−ρ²) | π_{a2}, any a (including π_H) | s22(1−r²)/χ²_{n−2}
θ3 = |Σ| | π_H = π_{12} and π_IJ = π_{21} | |S|/(χ²_{n−1} χ²_{n−2})
θ4 = σ2²(1−ρ²)/σ1² | π_H = π_{12} | [s22(1−r²)/s11] (χ²_{n−1}/χ²_{n−2})
θ5 = μ1/σ1 | π_{1b}, any b (including π_J and π_H) | Z/√n + x̄1 √(χ²_{n−1}/s11)
θ11 = d'Σd | π_J = π_{10} and π~_H (see Table 4) | d'Sd/χ²_{n−1}

known that the indicated prior for ρ is exact frequentist matching (proved here in Theorem 2). Indeed, standard statistical software utilizes various approximations to arrive at frequentist confidence sets for ρ, missing the fact that a simple exact confidence set exists, even for n = 3. It was, of course, known that exact frequentist confidence procedures could be constructed (cf. Exercise 54, Chapter 6 of [18]), but explicit expressions do not seem to be available.

The historically interesting aspect of this posterior for ρ is that it is also the fiducial distribution of ρ. Geisser and Cornfield [16] studied the question of whether the fiducial distribution of ρ could be reproduced as an objective Bayesian posterior, and they concluded that this was most likely not possible. The strongest evidence for this arose from Brillinger [7], which used results from [19] and a difficult analytic argument to show that there does not exist a prior π(ρ) such that
the fiducial density of ρ equals f(r | ρ)π(ρ), where f(r | ρ) is the density of r given ρ. Since the fiducial distribution of ρ only depends on r, it was certainly reasonable to speculate that, if it were not possible to derive this distribution from the density of r and a prior, then it would not be possible to do so in general. The above result, of course, shows that this speculation was incorrect.

The philosophically interesting aspect of this situation is that Brillinger's result does show that the fiducial/posterior distribution for ρ provides another example of the marginalization paradox ([13]). This leads to an interesting philosophical conundrum of a type that we have not previously seen: a complete fiducial/objective Bayesian/frequentist unification can be obtained for inference about ρ, but only if violation of the marginalization paradox is accepted. We will shortly introduce a prior distribution that avoids the marginalization paradox for ρ, but which is not exactly frequentist matching. We know of no way to adjudicate between the competing goals of exact frequentist matching and avoidance of the marginalization paradox, and so will simply present both as possible objective Bayesian approaches. (Note that the same conundrum also arises for θ5 = μ1/σ1; the exact frequentist matching prior results in a marginalization paradox, as shown in [24].) Some interesting examples of improper priors resulting in marginalization paradoxes can be found in Ghosh and Yang [17] and Datta and Ghosh [10, 11].

1.3. Recommended priors. It is actually rare to have exact matching priors for parameters of interest. Also, one is often interested in very complex functions of the parameters (e.g., predictive distributions) and/or joint distributions of parameters. For such problems it is important to have a general objective prior that seems to perform reasonably well for all quantities of interest.
Furthermore, it is unappealing to many Bayesians to change the prior according to which parameter is declared to be of interest, and an objective prior that performs well overall is often sought. The five priors we recommend for various purposes are π_J, π_H,

(9)  π_Rρ ∝ 1/[σ1σ2(1−ρ²)],  π_Rσ2 ∝ (1+ρ²)/[σ1σ2(1−ρ²)]

and

(10)  π_Rλ ∝ 1/[ σ1σ2(1−ρ²) √((σ1/σ2 − σ2/σ1)² + 4ρ²) ].

The first prior in (9) was developed in [20] and was studied extensively in [1], where it was shown to be a one-at-a-time reference prior (see Section 3). The second prior in (9) is new and is derived in Section 3. π_Rλ was developed as a one-at-a-time reference prior in [25].

With these definitions, we can make our summary recommendations. Table 2 gives the four objective priors that are recommended for use, and indicates for
6 968 J. O. BERGER AND D. SUN TABLE Recommendations of objective priors for various parameters in the bivariate normal model: indicates that the posterior will not be exact frequentist matching.(for μ and parameters with σ replaced by σ, use the right-haar prior with the variances interchanged. Prior Parameter σ π Rρ ρ, σ, general use π H μ, σ, ρ, η 3, ρσ σ, σ ( ρ,, σ σ ρ, μ σ π H (see Table 4 d (μ,μ, d d π Rλ ch max ( π Rσ σ = ρσ σ which parameters (or functions thereof they are recommended. These recommendations are based on three criteria: (i the degree of frequentist matching, discussed in Section 4; (ii being a one-at-a-time reference prior, discussed in Section 3;and (iii ease of computation. The rationale for each of the entries in the table, based on these criteria, is given in Section 4.5. Another commonly used prior is the scale prior, π S (σ σ. The motivation that is often given for this prior is that it is standard to use σi as the prior for a standard deviation σ i, while <ρ< is on a bounded set and so one can use a constant prior in ρ. We do not recommend this prior, but do consider its performance in Section Computation. In this paper, a constant prior is always used for (μ,μ, so that (( μ, (( x ( X N μ,n. x Generation from this conditional posterior distribution is standard, so the challenge of simulation from the posterior distribution requires only sampling from (σ,σ,ρ X. The marginal likelihood of (σ,σ,ρsatisfies ( L (σ,σ,ρ (n / exp ( trace(s. It is immediate that, under the priors π J and π IJ, the marginal posteriors of are Inverse Wishart (S,nand Inverse Wishart (S,n, respectively. Berger, Strawderman and Tang [4] gave a Metropolis Hastings algorithm to generate from (σ,σ,ρ X based on the prior π Rλ. The following sections deal with the other priors we consider.
TABLE 3
Ratio π/π_IJ, upper bound M and rejection step when π = π_Rρ, π_Rσ1, π_Rσ2 and π_S

Prior | Ratio π/π_IJ | Bound M | Rejection step
π_Rρ | (1−ρ²)^{1/2} | 1 | u ≤ (1−ρ²)^{1/2}
π_Rσ1 | (1−ρ⁴)^{1/2} | 1 | u ≤ (1−ρ⁴)^{1/2}
π_Rσ2 | (1+ρ²)(1−ρ²)^{1/2} | 4√6/9 | u ≤ [9/(4√6)] (1+ρ²)(1−ρ²)^{1/2}
π_S | (1−ρ²)^{3/2} | 1 | u ≤ (1−ρ²)^{3/2}

2.1. Marginal posteriors of (σ1, σ2, ρ) under π_Rρ, π_Rσ1, π_Rσ2 and π_S. For these priors, an independent sample from π(σ1, σ2, ρ | X) can be obtained by the following acceptance-rejection algorithm:

Simulation step. Generate (σ1, σ2, ρ) from the independence Jeffreys posterior π_IJ(σ1, σ2, ρ | X) [the Inverse Wishart(S, n−1) distribution] and, independently, sample u ~ Uniform(0, 1).

Rejection step. Suppose M ≡ sup_{(σ1,σ2,ρ)} π(σ1, σ2, ρ)/π_IJ(σ1, σ2, ρ) < ∞. If u ≤ π(σ1, σ2, ρ)/[M π_IJ(σ1, σ2, ρ)], accept (σ1, σ2, ρ); else, return to the Simulation step.

For each of the priors listed in Table 3, the key ratio π/π_IJ is listed in the table, along with the upper bound M and the resulting Rejection step. The rejection algorithm is quite efficient for sampling these posteriors. Indeed, for ρ near 0, the algorithms accept with probability near one and, even for large |ρ|, the acceptance probabilities are very reasonable for the priors π_Rρ, π_Rσ1 and π_Rσ2. For large |ρ|, the algorithm is less efficient for the posterior under the prior π_S, but even these acceptance rates may well be fine in practice, given the simplicity of the algorithm.

2.2. Computation under π_{ab}. The most interesting prior of this form (besides the Jeffreys and independence Jeffreys priors) is the right-Haar prior π_H, although other priors such as π_{11} arise as reference priors, and hence are potentially of interest. While Table 1 gave an explicit form for the most important marginal posteriors arising from priors of this form, it is of considerable interest that essentially closed form generation from the full posterior of any prior of this form is possible (see, e.g., [8]).
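The Rejection step of Section 2.1 is simple to implement once draws from the π_IJ posterior are available. The following minimal sketch (names illustrative) applies it for π_Rρ, assuming the ratio π_Rρ/π_IJ ∝ √(1−ρ²) with bound M = 1, as given in Table 3; only the ρ component of each (σ1, σ2, ρ) draw enters the ratio:

```python
import numpy as np

def reject_step_pi_rrho(rho_draws, seed=1):
    """Accept-reject thinning of pi_IJ posterior draws toward the pi_Rrho
    posterior: a draw with correlation rho is accepted when
    u <= sqrt(1 - rho^2), with u ~ Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    rho_draws = np.asarray(rho_draws)
    u = rng.uniform(size=rho_draws.shape)
    return rho_draws[u <= np.sqrt(1.0 - rho_draws**2)]
```

Draws with ρ near 0 are almost always accepted, while draws with |ρ| near 1 are thinned heavily, matching the efficiency discussion above.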
This is briefly reviewed in this section, since the expressions for the resulting constructive posteriors are needed for later results on frequentist coverage. It is most convenient to work with the parameters (η1, η2, η3) given in (2). This parameterization gives a type of Cholesky decomposition of the precision
8 970 J. O. BERGER AND D. SUN matrix, ( ( η η = 3 η 0 (3, 0 η η 3 η which accounts for the simplicity of ensuing computations. Note that ( is equivalent to σ = η, σ = + η 3 η 3 (4, ρ =. η η η η + η 3 The prior π ab of (8 for(μ,μ,σ,σ,ρ transforms to the extended conjugate class of priors for (μ,μ,η,η,η 3,givenbyπ ab (μ,μ,η,η,η 3 = η a η b. LEMMA. Consider the prior π ab. (a The marginal posterior of η 3 given (η,η ; X is N( η r s /s, /s. (b The marginal posterior distributions of η and η are independent and (η X Gamma( (n a, s ; (η X Gamma( (n b, s ( r. See [5] for a proof of this result. We next present the constructive posteriors of (η,η,η 3, and from these derive the constructive posteriors of (μ,μ,σ,σ,ρ and other parameters. All results follow directly from Lemma and (4. In presenting the constructive posteriors, we will use a star to represent a random draw from the implied distribution; thus μ will represent a random draw from its posterior distribution, Z,Z,Z 3 will be independent draws from the standard normal distribution, and and χ n b will be independent draws from chi-squared distributions with the indicated degrees of freedom. Recall that these constructive posteriors are not only useful for simulation, but will be the key to proving exact frequentist matching results. FACT. (a The constructive posterior of (η,η,η 3 given X can be expressed as (5 η =, η s = η 3 = Z 3 s χn b s χ n b s ( r, r r.
(b) The constructive posterior of (σ1², σ2², ρ) given X can be expressed as

(16)  σ1*² = s11/χ²*_{n−a},  σ2*² = [s22(1−r²)/χ²*_{n−b}] (1 + Y*²),
(17)  ρ* = ψ(Y*),
(18)  Y* = [ −Z3* + √(χ²*_{n−b}) r/√(1−r²) ] / √(χ²*_{n−a}),  where ψ(y) = y/√(1+y²).

(c) The constructive posteriors for μ1 and μ2 can be written

(19)  μ1* = x̄1 + (Z1*/√(χ²*_{n−a})) √(s11/n),
(20)  μ2* = x̄2 + θ1*(μ1* − x̄1) + (Z2*/√(χ²*_{n−b})) √(s22(1−r²)/n),

where θ1* = r√(s22/s11) − (Z3*/√(χ²*_{n−b})) √(s22(1−r²)/s11).

3. Reference priors. This paper began with an effort to derive and catalogue the possible reference priors for the bivariate normal distribution. Reference prior theory (cf. Bernardo [6] and Berger and Bernardo [3]) has arguably been the most successful technique for deriving objective priors. Reference priors depend on (i) specification of a parameter of interest; (ii) specification of nuisance parameters; (iii) specification of a grouping of parameters; and (iv) ordering of the groupings. These are all conveyed by the shorthand notation used in Table 4. Thus, {(μ1, μ2), (σ1, σ2, ρ)} indicates that (μ1, μ2) is the parameter of interest, with the others being nuisance parameters, and there are two groupings with the indicated ordering. (The resulting reference prior is the independence Jeffreys prior, π_IJ.) As another example, {λ1, λ2, ϑ, μ1, μ2} introduces the eigenvalues λ1 > λ2 of Σ as being primarily of interest, with ϑ (the angle defining the orthogonal matrix that diagonalizes Σ), μ1 and μ2 being the nuisance parameters.

Based on experience with numerous examples, the reference priors that are typically judged to be best are one-at-a-time reference priors, in which each parameter is listed separately as its own group. Hence we will focus on these priors. It turns out to be the case that, for the one-at-a-time reference priors, the ordering of μ1 and μ2 among the variables is irrelevant. Hence, if μ1 and μ2 are omitted from a listing in Table 4, the resulting reference prior is to be viewed as any one-at-a-time reference prior with the indicated ordering of the other variables, with the μ_i being inserted anywhere in the ordering.
10 97 J. O. BERGER AND D. SUN TABLE 4 Reference priors for the bivariate normal model (where μ = d (μ,μ, ( σ = θ 7, ρ = d (0, /(σ θ7, θ = σ [ ( ρ ] and θ = ρσ / σ ; {{ }} indicates that any ordering of the parameters yields the same reference prior Prior π(μ,μ,σ,σ,ρ For parameter ordering Has form (8with π J σ σ ( ρ {(μ,μ,σ,σ,ρ} (a, b = (, 0 π IJ σ σ ( ρ 3/ {(μ,μ, (σ,σ,ρ} (a, b = (, π Rρ σ σ ( ρ {ρ,σ,σ }, {θ 7,θ 6,ρ} π Rσ +ρ σ σ ( ρ {σ,σ,ρ} π Rσ {σ σ σ ( ρ ρ,ρ,σ } {σ,η 3,θ } π RO σ σ ( ρ 3/ {σ,θ,η 3 } (a, b = (, π Rλ [((σ /σ (σ /σ +4ρ ] / σ σ ( ρ π H σ ( ρ π H d μ dμ d σ dσ d ρ ( σ [ ( ρ ] {λ,λ,ϑ} {{σ,θ,θ }}, {{θ,θ 3,θ 4 }} (a, b = (, {{η,η,θ }}, {{η,θ,θ }} {{d (μ,μ,μ,θ, θ, θ }} We are interested in finding one-at-a-time reference priors for the parameters μ,μ,σ,σ,ρ, η 3, θ,...,θ 9 and λ. This is done in [5], with the results summarized in Table 4, for all these parameters (i.e., the parameter appears as the first entry in the parameter ordering except η 3, σ,andμ i /σ i ; finding one-at-a-time reference priors for these parameters is technically challenging. (We do not explicitly list the reference priors for σ in the table, since they can be found by simply switching with σ in the various expressions. 4. Comparisons of priors via frequentist matching. 4.. Frequentist coverage probabilities and exact matching. Suppose a posterior distribution is used to create one-sided credible intervals (θ L,θ α (X, where θ L is the lower limit in the relevant parameter space and θ α (X is the posterior quantile of the parameter θ of interest, defined by P(θ < θ α (X X = α. (Here θ is the random variable. Of interest is the frequentist coverage of the corresponding confidence interval, that is, C(μ,μ,σ,σ,ρ = P(θ <θ α (X μ,μ,σ,σ,ρ.(herex is the random variable. The closer C(μ,μ,σ,σ,ρ is to the nominal α, the better the procedure (and corresponding objective prior is judged to be. The main results about exact matching are given in Theorems through 8. 
The proofs of Theorems 1, 2 and 8 are given in Section 5; the rest can be found in [5].
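The coverage C(μ1, μ2, σ1, σ2, ρ) can also be estimated by direct Monte Carlo. The sketch below (names and settings illustrative) checks the exact-matching claim for ρ under the right-Haar prior, assuming the constructive posterior of ρ given in Table 1:

```python
import numpy as np

def coverage_rho_right_haar(rho_true, n=10, alpha=0.9,
                            n_rep=400, n_post=2000, seed=3):
    """Monte Carlo estimate of P(rho < rho*_alpha | rho_true): simulate
    data sets, compute the one-sided alpha posterior quantile of rho
    under pi_H, and record how often it exceeds the true rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho_true], [rho_true, 1.0]])
    hits = 0
    for _ in range(n_rep):
        x = rng.multivariate_normal(np.zeros(2), cov, size=n)
        r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
        z = rng.standard_normal(n_post)
        chi_a = rng.chisquare(n - 1, n_post)  # a = 1
        chi_b = rng.chisquare(n - 2, n_post)  # b = 2
        y = (z + np.sqrt(chi_b) * r / np.sqrt(1.0 - r**2)) / np.sqrt(chi_a)
        rho_star = y / np.sqrt(1.0 + y**2)
        hits += rho_true < np.quantile(rho_star, alpha)
    return hits / n_rep
```

Exact matching predicts a value near α for every true ρ; for example, coverage_rho_right_haar(0.5) should be close to 0.9, up to Monte Carlo error.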
The following technical lemmas will be repeatedly utilized. The first lemma is from (3d.2.8) in [22]. Lemma 3 is easy.

LEMMA 2. For n ≥ 3 and given (σ1, σ2, ρ), the following three random variables are independent and have the indicated distributions:

(21)  T1 = [ s11/(σ2²(1−ρ²)) ]^{1/2} [ r√(s22/s11) − ρσ2/σ1 ] ≡ Z3 (standard normal),
(22)  T3 = s22(1−r²)/[σ2²(1−ρ²)] ~ χ²_{n−2},
(23)  T5 = s11/σ1² ~ χ²_{n−1}.

LEMMA 3. Let Y_α denote the α quantile of any random variable Y.
(a) If g(·) is a monotonically increasing function, [g(Y)]_α = g(Y_α) for any α ∈ (0, 1).
(b) If W is a positive random variable, (WY)_α ≥ 0 if and only if Y_α ≥ 0.

We will reserve quantile notation for posterior quantiles, that is, quantiles with respect to the starred distributions. Thus a quantile such as [ (−Z3* + √(χ²*_{n−b}) r/√(1−r²)) / √(χ²*_{n−a}) ]_α would be computed based on the joint distribution of (Z3*, χ²*_{n−a}, χ²*_{n−b}), while holding (σ1, σ2, ρ, r, s11, s22, Z3, χ²_{n−1}, χ²_{n−2}) fixed.

4.2. Credible intervals for a class of functions of (σ1, σ2, ρ). We consider the one-sided credible intervals of σ1, σ2 and ρ and some functions of the form

(24)  θ = σ1^{d1} σ2^{d2} g(ρ),  for d1, d2 ∈ R and some function g(·).

We also consider a class of scale-invariant priors

(25)  π(μ1, μ2, σ1, σ2, ρ) ∝ h(ρ)/(σ1^{c1} σ2^{c2}),  for some c1, c2 ∈ R and a positive function h.

THEOREM 1. Denote the α posterior quantile of θ by θ_α(X) under the prior (25). For any fixed (μ1, μ2, σ1, σ2, ρ), the frequentist coverage of the credible interval (θ_L, θ_α(X)) depends only on ρ. Here θ_L is the lower boundary of the parameter space for θ.

Note that the parameters ρ, η1, η2, η3, θ1, ..., θ4 are all functions of the form (24). From Theorem 1, under any of the priors π_J, π_IJ, π_Rσ2, π_Rρ, π_RO, π_H, π_S, the
frequentist coverage probabilities of credible intervals for any of these parameters will depend only on ρ. We will show that the frequentist coverage probabilities can be exact under the prior π_{ab}. Since η1 (η2) is a monotone function of σ1² (θ2), we consider only ρ and the last 5 parameters.

4.3. Coverage probabilities under π_{ab}.

THEOREM 2. (a) For ψ defined in (18), the posterior α quantile of ρ is ρ*_α = ψ(Y*_α).
(b) For any α ∈ (0, 1), ξ = (μ1, μ2, σ1, σ2) and ρ ∈ (−1, 1),

(26)  P(ρ < ρ*_α | ξ, ρ) = P( ( [ −Z3* + √(χ²*_{n−b}) (√(1−ρ²) Z3 + ρ√(χ²_{n−1})) / (√(1−ρ²)√(χ²_{n−2})) ] / √(χ²*_{n−a}) )_α > ρ/√(1−ρ²) ).

(c) (26) equals α if and only if the right-Haar prior is used, that is, (a, b) = (1, 2).

THEOREM 3. (a) For any α ∈ (0, 1), ξ = (μ1, μ2, σ1, σ2) and ρ ∈ (−1, 1),

(27)  P(η3 < (η3*)_α | ξ, ρ) = P( ( Z3* − √(χ²*_{n−b}) (√(1−ρ²) Z3 + ρ√(χ²_{n−1})) / (√(1−ρ²)√(χ²_{n−2})) )_α > −ρ√(χ²_{n−1})/√(1−ρ²) ).

(b) (27) equals α for any −1 < ρ < 1 if and only if b = 2.

THEOREM 4. (a) The constructive posterior of θ1 = ρσ2/σ1 has the expression

(28)  θ1* = r√(s22/s11) − (Z3*/√(χ²*_{n−b})) √( s22(1−r²)/s11 ).

(b) For any α ∈ (0, 1), ξ = (μ1, μ2, σ1, σ2) and ρ ∈ (−1, 1),

P( θ1 < (θ1*)_α | ξ, ρ ) = P( t_{n−2} < √((n−2)/(n−b)) (t_{n−b})_α ),

which does not depend on ρ. Furthermore, it equals α if and only if b = 2.

THEOREM 5. (a) The constructive posterior of θ2 = σ2²(1−ρ²) is θ2* = s22(1−r²)/χ²*_{n−b}.
(b) For any α ∈ (0, 1), ξ = (μ1, μ2, σ1, σ2) and ρ ∈ (−1, 1),

(29)  P( θ2 < (θ2*)_α | ξ, ρ ) = P( χ²_{n−2} > (χ²_{n−b})_{1−α} ),

which does not depend on ρ. Furthermore, (29) equals α if and only if b = 2.
THEOREM 6. (a) The constructive posterior of θ3 = |Σ| is θ3* = |S|/(χ²*_{n−a} χ²*_{n−b}).
(b) For any ξ = (μ1, μ2, σ1, σ2) and ρ ∈ (−1, 1),

(30)  P( θ3 < (θ3*)_α | ξ, ρ ) = P( χ²_{n−1} χ²_{n−2} > (χ²_{n−a} χ²_{n−b})_{1−α} ),

which does not depend on ρ. Furthermore, (30) equals α iff (a, b) is (1, 2) or (2, 1).

THEOREM 7. (a) The constructive posterior of θ4 = σ2²(1−ρ²)/σ1² is θ4* = [s22(1−r²)/s11] (χ²*_{n−a}/χ²*_{n−b}).
(b) For any ξ = (μ1, μ2, σ1, σ2) and ρ ∈ (−1, 1),

(31)  P( θ4 < (θ4*)_α | ξ, ρ ) = P( χ²_{n−1}/χ²_{n−2} < (χ²_{n−a}/χ²_{n−b})_α ),

which does not depend on ρ. Furthermore, (31) equals α iff (a, b) = (1, 2).

An interesting function of (μ1, μ2, σ1, σ2, ρ) not of the form (24) is θ5 = μ1/σ1.

THEOREM 8. (a) The constructive posterior of θ5 = μ1/σ1 is θ5* = Z1*/√n + x̄1 √(χ²*_{n−a}/s11).
(b) For any α ∈ (0, 1), the frequentist coverage of the credible interval (−∞, (θ5*)_α) is

(32)  P( θ5 < (θ5*)_α | μ1, μ2, σ1, σ2, ρ ) = P( (Z1 − θ5√n)/√(χ²_{n−1}) < ( (Z2 − θ5√n)/√(χ²_{n−a}) )_α ),

which depends only on θ5 and equals α if and only if a = 1.

4.4. First order asymptotic matching. Datta and Mukerjee [9] and Datta and Ghosh [12] discuss how to determine first-order matching priors for functions of parameters; these are priors such that the frequentist coverage of a one-sided credible interval is equal to the Bayesian coverage up to a term of order n^{−1}. For each of the nine objective priors π_J, π_IJ, π_Rρ, π_Rσ1, π_RO, π_Rλ, π_H, π_S and π_Rσ2, [5] determines if it is a first-order matching prior for each of the parameters μ1, μ2, σ1, σ2, ρ, η3, θ1, ..., θ10. The results are listed in Table 5. For example, π_J is a first-order matching prior for μ1, μ2, σ1, σ2, θ1, θ5, θ7, θ8 and θ10, but not for η3, θ2, θ3 and θ9.
TABLE 5
The first-order asymptotic matching of objective priors for μ1, μ2, σ1, σ2, ρ, μ2 − μ1, η3, θ_j, j = 1, ..., 10. Here a starred entry indicates exact matching

Prior | π(μ1, μ2, σ1, σ2, ρ) | Matching: Yes | Matching: No
π_J | 1/[σ1²σ2²(1−ρ²)²] | μ1*, μ2*, σ1*, σ2*, μ2−μ1*, θ1, θ5*, θ7, θ8, θ10 | ρ, η3, θ2, θ3, θ9
π_IJ | 1/[σ1σ2(1−ρ²)^{3/2}] | μ1, μ2 | σ1, σ2, ρ
π_Rρ | 1/[σ1σ2(1−ρ²)] | μ1, μ2, ρ, μ2−μ1, θ1, θ3, θ7 | σ1, σ2, η3, θ2, θ5, θ8, θ9, θ10
π_Rσ1 | (1−ρ⁴)^{1/2}/[σ1σ2(1−ρ²)] | μ1, μ2, μ2−μ1, θ3, θ7 | σ1, σ2, ρ, η3, θ1, θ2, θ5, θ8, θ9, θ10
π_RO | 1/[σ1²σ2(1−ρ²)^{3/2}] | μ1, μ2, μ2−μ1, η3, θ3, θ7 | σ1, σ2, ρ, θ1, θ2, θ5, θ8, θ9, θ10
π_Rλ | 1/[σ1σ2(1−ρ²)√((σ1/σ2 − σ2/σ1)² + 4ρ²)] | μ1, μ2, μ2−μ1, θ3 | σ1, σ2, ρ, η3, θ1, θ2, θ5, θ7, θ8, θ9, θ10
π_H | 1/[σ1²(1−ρ²)] | μ1*, μ2, σ1*, ρ*, μ2−μ1, η3*, θ1*, θ2*, θ3*, θ4*, θ5* | σ2, θ7, θ8, θ9, θ10
π_S | 1/(σ1σ2) | μ1, μ2, μ2−μ1, θ3, θ7 | σ1, σ2, ρ, η3, θ1, θ2, θ5, θ8, θ9, θ10
π_Rσ2 | (1+ρ²)/[σ1σ2(1−ρ²)] | μ1, μ2, μ2−μ1, θ3, θ7, θ9 | σ1, σ2, ρ, θ1, θ2, η3, θ5, θ8, θ10

4.5. Numerically computed coverage and recommendations. First-order matching is only an asymptotic property, and finite sample performance is also crucial. We thus also implemented a modest numerical study, comparing the numerical values of frequentist coverages of the one-sided credible sets P(θ > q_{0.05}) and P(θ < q_{0.95}), for the parameters θ listed in Table 6 and for the eight objective priors π_J, π_IJ, π_Rρ, π_Rσ2, π_RO, π_Rλ, π_H and π_S. As usual, q_α = q_α(X) is the posterior α-quantile of θ, and the coverage probability is computed based on the sampling distribution of q_α(X) for the fixed parameters (μ1, μ2, σ1, σ2) and ρ. Many of the coverage probabilities depend only on ρ, which was thus chosen to be the x-axis in the graphs. We considered the case n = 3 (the minimal possible sample size and hence the most challenging in terms of obtaining good coverage) and the two scenarios Case a: (μ1, μ2, σ1, σ2) = (0, 0, 1, 1), and Case b: (μ1, μ2, σ1, σ2) = (0, 0, 1, 2).
TABLE 6
Performance of objective priors for each of the parameters

Parameter | Bad | Medium | Good
μ1 | | rest | π_RO, π_H, π_J
μ2 − μ1 | | rest | π_J, π_RO
σ1² | π_IJ | rest | π_H, π_Rλ, π_MS
σ2² | π_H, π_RO, π_IJ | rest | π_J
ρ | π_J, π_IJ, π_S, π_RO | | π_Rρ, π_Rσ2, π_Rλ, π_H, π_MS
λ1 | rest | π_J, π_Rλ, π_RO |
θ3 = |Σ| | π_RO, π_J | rest | π_IJ, π_H
θ7 = σ1/σ2 | π_H, π_J, π_RO, π_Rλ | rest |
θ9 = σ12 | π_J, π_IJ (due to size) | rest | π_H, π_Rρ, π_Rσ2

Here we present the numerical results concerning coverage for only two of the parameters: ρ in Figure 1 and θ7 = σ1/σ2 in Figure 2. Table 6 summarizes the results from the entire numerical study, the details of which can be found in [5]. The recommendations made in Table 2 for the boxed parameters are justified from these numerical results as follows.

FIG. 1. Frequentist coverages for ρ, where Case a: (μ1, μ2, σ1, σ2) = (0, 0, 1, 1), and Case b: (μ1, μ2, σ1, σ2) = (0, 0, 1, 2). The x-axis is ρ ∈ (−1, 1).
FIG. 2. Frequentist coverages for θ7 = σ1/σ2, where Case a: (μ1, μ2, σ1, σ2) = (0, 0, 1, 1), and Case b: (μ1, μ2, σ1, σ2) = (0, 0, 1, 2). The x-axis is ρ ∈ (−1, 1).

The inferences involving the nonboxed parameters in Table 2 are given in closed form in Table 1 (and so are computationally simple), and are exact frequentist matching. Furthermore, with the exception of μ1/σ1 and η3, the nonboxed parameters have the indicated priors as one-at-a-time reference priors, so all three criteria point to the indicated recommendation.

For ρ, we recommend using π_Rρ, since this prior is a one-at-a-time reference prior for ρ, first-order matching (as shown in Table 5), and has excellent numerical coverage as shown in Figure 1. Note that some might prefer to use the right-Haar prior because of its exact matching for ρ (even though it exhibits a marginalization paradox).

For σ1/σ2, the one-at-a-time reference prior was also π_Rρ. As this was first-order frequentist matching and among the best in terms of numerical coverage (see Figure 2), we also recommend it for this parameter.

For λ1, the situation is unclear. The one-at-a-time reference prior is π_Rλ and is hence our recommendation, but first-order matching results for this parameter are not known, and the numerical coverages of all priors were rather bad.

For σ12, the only first-order matching prior among our candidates is π_Rσ2. It also had the best numerical coverages, and so is a clear recommendation. Note, however, that we were not able to determine if it is a one-at-a-time reference prior for σ12, so the recommendation should be considered tentative.

The most interesting question is what to recommend for general use, as an all-purpose prior. Looking at Table 2, it might seem that π_H or even π_J would be good choices, since they are optimal for so many parameters. However, both these priors
can also give quite bad coverages, as indicated in Figure 2 for π_H and in Figures 1 and 2 for π_J. Indeed, from Table 6, the only priors that did not have significantly poor performance for at least one parameter (other than λ1, for which no prior gave good coverages) were π_Rρ and π_Rσ2. The numerical coverages for π_Rρ and π_Rσ2 are virtually identical for all the parameters, so there is no principled way to choose between them. π_Rρ is a commonly used prior and somewhat simpler, so it becomes our recommended choice for a general prior.

5. Proofs. Due to space limitations, we give only the proofs of Theorems 1, 2 and 8, because their proofs are quite different. The proofs of the other theorems in Section 4 are relatively easy consequences of Fact 1 and Lemmas 2-3. For details of these other proofs, see [5].

5.1. Proof of Theorem 1. With the constant prior for (μ1, μ2), the marginal likelihood of (σ1, σ2, ρ) depends on S and is proportional to

|Σ|^{−(n−1)/2} exp{ −½ trace(Σ^{−1} S) }.

Define

D = { (σ1', σ2', ρ') : σ1'^{d1} σ2'^{d2} g(ρ') < σ1^{d1} σ2^{d2} g(ρ) },
G(X, σ1, σ2, ρ) = ∫_D π(σ1', σ2', ρ' | S) dσ1' dσ2' dρ'.

Clearly, the frequentist coverage probability is

P{ θ < θ_α(X) | μ1, μ2, σ1, σ2, ρ } = P{ G(S, σ1, σ2, ρ) < α | σ1, σ2, ρ }.

Under the prior (25),

G(X, σ1, σ2, ρ) = ∫_D h(ρ') exp(−0.5 trace(Σ'^{−1}S)) / [σ1'^{n−1+c1} σ2'^{n−1+c2} (1−ρ'²)^{(n−1)/2}] dσ1' dσ2' dρ' ÷ ∫ h(ρ') exp(−0.5 trace(Σ'^{−1}S)) / [σ1'^{n−1+c1} σ2'^{n−1+c2} (1−ρ'²)^{(n−1)/2}] dσ1' dσ2' dρ',

where Σ' is the symmetric matrix whose diagonal elements are σ1'² and σ2'², and whose off-diagonal element is σ1'σ2'ρ'. Denote Δ = diag(1/σ1, 1/σ2) and make the transformations

T = (t_ij) = ΔSΔ  and  Ω = (ω_ij) = ΔΣ'Δ = ( ω1², ρ'ω1ω2 ; ρ'ω1ω2, ω2² ),

with ω1 = σ1'/σ1 and ω2 = σ2'/σ2. Clearly trace(Σ'^{−1}S) = trace(Ω^{−1}T), and then

G(X, σ1, σ2, ρ) = ∫_{D~} h(ρ') exp(−0.5 trace(Ω^{−1}T)) / [ω1^{n−1+c1} ω2^{n−1+c2} (1−ρ'²)^{(n−1)/2}] dω1 dω2 dρ' ÷ ∫ h(ρ') exp(−0.5 trace(Ω^{−1}T)) / [ω1^{n−1+c1} ω2^{n−1+c2} (1−ρ'²)^{(n−1)/2}] dω1 dω2 dρ',
where D~ = { (ω1, ω2, ρ') : ω1^{d1} ω2^{d2} g(ρ') < g(ρ) }. Since the sampling distribution of T depends only on ρ, so does the sampling distribution of G(X, σ1, σ2, ρ). Also, D~ depends on ρ only. The result thus holds.

5.2. Proof of Theorem 2. It follows from (17), (18) and Lemma 3(a) that

P(ρ < ρ*_α | ξ, ρ) = P( ψ( [ (−Z3* + √(χ²*_{n−b}) r/√(1−r²)) / √(χ²*_{n−a}) ]_α ) > ρ | ρ ).

Note that ψ, defined in (18), is invertible, and ψ^{−1}(ρ) = ρ/√(1−ρ²) for |ρ| < 1. It follows from Lemma 3(a) and (b) that

P(ρ < ρ*_α | ξ, ρ) = P( [ (−Z3* + √(χ²*_{n−b}) r/√(1−r²)) / √(χ²*_{n−a}) ]_α > ρ/√(1−ρ²) ).

It follows from (21)-(23) that

r/√(1−r²) = r√(s22/s11) · √s11 / √(s22(1−r²))
          = [ ρσ2/σ1 + Z3 σ2√(1−ρ²)/√s11 ] √s11 / [ σ2√(1−ρ²) √(χ²_{n−2}) ]
          = [ Z3 + ρ√(χ²_{n−1})/√(1−ρ²) ] / √(χ²_{n−2}).

Consequently,

P(ρ < ρ*_α | ξ, ρ) = P( ( [ −Z3* + √(χ²*_{n−b}) (Z3 + ρ√(χ²_{n−1})/√(1−ρ²)) / √(χ²_{n−2}) ] / √(χ²*_{n−a}) )_α > ρ/√(1−ρ²) ).

This completes the proof of part (b). For part (c), if (26) equals α for any −1 < ρ < 1, choose ρ = 0 and get

P( Z3/√(χ²_{n−2}) < ( Z3*/√(χ²*_{n−b}) )_α ) = α,

which implies that b = 2. Substituting b = 2 into (26) shows that a = 1.
5.3. Proof of Theorem 8. Part (a) is obvious. For part (b), since x̄1 = μ1 + Z2σ1/√n, and Z2 and χ²_{n−1} = s11/σ1² are independent, we have

P( θ5 < (θ5*)_α ) = P( ( [ Z1*/√n + (θ5 + Z2/√n) √(χ²*_{n−a}/χ²_{n−1}) ] − θ5 )_α > 0 ).

It follows from Lemma 3(a) and (b) that

P( θ5 < (θ5*)_α ) = P( (−Z2 − θ5√n)/√(χ²_{n−1}) < ( (Z1* − θ5√n)/√(χ²*_{n−a}) )_α ).

Because Z1 and −Z2 have the same distribution, and Z2 and χ²_{n−1} are independent, (32) holds. If (32) equals α for any θ5, choose θ5 = 0:

P( Z1/√(χ²_{n−1}) < ( Z2/√(χ²_{n−a}) )_α ) = α,

which implies that a = 1. The result holds.

Acknowledgments. The authors are grateful to Fei Liu for performing the numerical frequentist coverage computations, to Xiaoyan Lin for computing the matching priors in Table 5, and to Susie Bayarri for helpful discussions. The authors gratefully acknowledge the very constructive comments of the editor, an associate editor and two referees.

REFERENCES

[1] BAYARRI, M. J. (1981). Inferencia bayesiana sobre el coeficiente de correlación de una población normal bivariante. Trabajos de Estadistica e Investigacion Operativa.
[2] BAYARRI, M. J. and BERGER, J. (2004). The interplay between Bayesian and frequentist analysis. Statist. Sci.
[3] BERGER, J. O. and BERNARDO, J. M. (1992). On the development of reference priors (with discussion). In Bayesian Statistics 4. Oxford Univ. Press.
[4] BERGER, J. O., STRAWDERMAN, W. and TANG, D. (2005). Posterior propriety and admissibility of hyperpriors in normal hierarchical models. Ann. Statist.
[5] BERGER, J. O. and SUN, D. (2006). Objective priors for a bivariate normal model with multivariate generalizations. Technical Report 07-06, ISDS, Duke Univ.
[6] BERNARDO, J. M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. Ser. B.
[7] BRILLINGER, D. R. (1962). Examples bearing on the definition of fiducial probability with a bibliography. Ann. Math. Statist.
[8] BROWN, P., LE, N. and ZIDEK, J. (1994). Inference for a covariance matrix. In Aspects of Uncertainty: A Tribute to D. V. Lindley (P. R. Freeman and A. F. M. Smith, eds.). Wiley, Chichester.
[9] DATTA, G. and MUKERJEE, R. (2004). Probability Matching Priors: Higher Order Asymptotics. Springer, New York.
[10] DATTA, G. S. and GHOSH, J. K. (1995a). On priors providing frequentist validity for Bayesian inference. Biometrika.
[11] DATTA, G. S. and GHOSH, J. K. (1995b). Noninformative priors for maximal invariant parameter in group models. Test.
[12] DATTA, G. S. and GHOSH, M. (1995c). Some remarks on noninformative priors. J. Amer. Statist. Assoc.
[13] DAWID, A. P., STONE, M. and ZIDEK, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B.
[14] FISHER, R. A. (1930). Inverse probability. Proc. Cambridge Philos. Soc.
[15] FISHER, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh.
[16] GEISSER, S. and CORNFIELD, J. (1963). Posterior distributions for multivariate normal parameters. J. Roy. Statist. Soc. Ser. B.
[17] GHOSH, M. and YANG, M.-C. (1996). Noninformative priors for the two sample normal problem. Test.
[18] LEHMANN, E. L. (1986). Testing Statistical Hypotheses, 2nd ed. Wiley, New York.
[19] LINDLEY, D. V. (1961). The use of prior probability distributions in statistical inference and decisions. In Proc. 4th Berkeley Sympos. Math. Statist. Probab. (J. Neyman and E. L. Scott, eds.). Univ. California Press, Berkeley.
[20] LINDLEY, D. V. (1965). Introduction to Probability and Statistics from a Bayesian Viewpoint. Cambridge Univ. Press.
[21] PRATT, J. W. (1963). Shorter confidence intervals for the mean of a normal distribution with known variance. Ann. Math. Statist.
[22] RAO, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York.
[23] SEVERINI, T. A., MUKERJEE, R. and GHOSH, M. (2002). On an exact probability matching property of right-invariant priors. Biometrika.
[24] STONE, M. and DAWID, A. P. (1972). Un-Bayesian implications of improper Bayes inference in routine statistical problems. Biometrika.
[25] YANG, R. and BERGER, J. (1994). Estimation of a covariance matrix using the reference prior. Ann. Statist.

ISDS
DUKE UNIVERSITY
BOX 90251
DURHAM, NORTH CAROLINA
USA
E-MAIL: berger@stat.duke.edu

DEPARTMENT OF STATISTICS
UNIVERSITY OF MISSOURI-COLUMBIA
146 MIDDLEBUSH HALL
COLUMBIA, MISSOURI
USA
E-MAIL: sund@missouri.edu