An Encompassing Prior Generalization of the Savage-Dickey Density Ratio Test


Ruud Wetzels, Raoul P. P. P. Grasman, Eric-Jan Wagenmakers
University of Amsterdam, Psychological Methods Group, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands

Abstract

Hoijtink, Klugkist, and colleagues proposed an encompassing prior (EP) approach to facilitate Bayesian model selection in nested models with inequality constraints. In this approach, samples are drawn from the prior and posterior distributions for an encompassing model that contains an inequality restricted version as a special case. The evidence in favor of the inequality restriction (i.e., the Bayes factor) simplifies to the ratio of the proportions of posterior and prior samples consistent with the inequality restriction. Up to now, this elegant formalism has been applied almost exclusively to models with inequality or about-equality constraints. Here we show that the EP approach naturally extends to exact equality constraints by considering the ratio of the heights of the posterior and prior distributions at the point that is subject to test (i.e., the Savage-Dickey density ratio test). Therefore, the EP approach generalizes the Savage-Dickey test and can account for both inequality and equality constraints. The general EP approach is an elegant and computationally efficient procedure to calculate Bayes factors for nested models.

Key words: Bayesian model selection, Hypothesis testing, Inequality constraints, Equality constraints, Borel-Kolmogorov paradox

This research was supported by Veni and Vidi grants from the Dutch Organization for Scientific Research (NWO). Preprint submitted to Elsevier, August 13, 2009.

1. Introduction

In this article we focus on Bayesian model selection for nested models. Consider, for instance, a parameter vector θ = (ψ, φ) ∈ Θ = Ψ × Φ, and suppose we want to compare an encompassing model M_e to a restricted version M_1: ψ = ψ_0. Then, after observing the data D, the Bayes factor in favor of M_1 is

BF_{1e} = \frac{p(D \mid M_1)}{p(D \mid M_e)} = \frac{\int p(D \mid \psi = \psi_0, \phi)\, p(\phi \mid \psi = \psi_0)\, d\phi}{\int\!\!\int p(D \mid \psi, \phi)\, p(\psi, \phi)\, d\psi\, d\phi}.

Thus, the Bayes factor is the ratio of the integrated or marginal likelihoods of the two competing models; alternatively, the Bayes factor can be conceptualized as the change from prior model odds p(M_1)/p(M_e) to posterior model odds p(M_1 | D)/p(M_e | D) (Kass and Raftery, 1995). The Bayes factor quantifies the evidence that the data provide for one model versus another, and it represents "the standard Bayesian solution to the hypothesis testing and model selection problems" (Lewis and Raftery, 1997, p. 648). Unfortunately, for most models, the Bayes factor cannot be obtained in analytic form. Several methods have been proposed to estimate the Bayes factor numerically (see, e.g., Gamerman and Lopes, 2006, Chap. 7, for a description of 11 such methods). Nevertheless, calculation of the Bayes factor often remains a computationally complicated task. Here we discuss an encompassing prior (EP) approach proposed by Hoijtink, Klugkist, and colleagues (Klugkist et al., 2005a,b; Hoijtink et al., 2008). This approach applies to nested models, and it virtually eliminates the computational complications inherent in most other methods.

2. Bayes Factors from the Encompassing Prior Approach

For concreteness, consider two normally distributed random variables with means µ_1 and µ_2, and common standard deviation σ. We focus on the following hypotheses:

M_e: µ_1, µ_2; σ,

M_1: µ_1 > µ_2; σ,
M_2: µ_1 ≈ µ_2; σ,
M_3: µ_1 = µ_2; σ.

In the encompassing model M_e, all parameters are free to vary. Models M_1, M_2, and M_3 are nested in M_e and enforce particular restrictions on the means; specifically, M_1 features an inequality constraint, M_2 features an about-equality constraint, and M_3 features an exact equality constraint. We now deal with these in turn.

2.1. Computing Bayes Factors for Inequality Constraints

Suppose we compare two models, an encompassing model M_e and an inequality constrained model M_1. We denote the prior distribution under M_e by p(ψ, φ | M_e), where ψ is the parameter vector of interest (i.e., µ_1 and µ_2 in the earlier example) and φ is the vector of nuisance parameters (i.e., σ in the earlier example). Then, the prior distribution of the parameters under model M_1 can be obtained from p(ψ, φ | M_e) by restricting the parameter space of ψ:

p(\psi, \phi \mid M_1) = \frac{p(\psi, \phi \mid M_e)\, I_{M_1}(\psi, \phi)}{\int\!\!\int p(\psi, \phi \mid M_e)\, I_{M_1}(\psi, \phi)\, d\psi\, d\phi}. \quad (1)

In Equation 1, I_{M_1}(ψ, φ) is the indicator function for model M_1: I_{M_1}(ψ, φ) = 1 if the parameter values are in accordance with the constraints imposed by model M_1, and I_{M_1}(ψ, φ) = 0 otherwise. Under the above specification of priors, Klugkist and Hoijtink (2007) showed that the Bayes factor BF_{1e} can be easily obtained by drawing values from the posterior and prior distributions for M_e:

BF_{1e} = \frac{\frac{1}{m} \sum_{i=1}^{m} I_{M_1}(\psi_i, \phi_i \mid D, M_e)}{\frac{1}{m} \sum_{i=1}^{m} I_{M_1}(\psi_i, \phi_i \mid M_e)}, \quad (2)

where m represents the total number of samples. The numerator is the proportion of M_e's posterior samples for ψ that obey the constraint imposed by M_1,

and the denominator is the proportion of M_e's prior samples for ψ that obey the constraint imposed by M_1. To illustrate, consider again our initial example in which M_e: µ_1, µ_2; σ and M_1: µ_1 > µ_2; σ. Figure 1a shows the joint parameter space for µ_1 and µ_2; for illustrative purposes, we assume that the joint prior is uniform across the parameter space. In Figure 1a, half of the prior samples are in accordance with the constraint imposed by M_1. Figure 1a also shows three possible encompassing posterior distributions: A, B, and C. In case A, half of the posterior samples are in accordance with the constraint, and this yields BF_{1e} = 1. In case B, very few samples are in accordance with the constraint, and this yields a Bayes factor BF_{1e} that is close to zero (i.e., very strong support against M_1). In case C, almost all samples are in accordance with the constraint, and this yields a Bayes factor BF_{1e} that is close to 2.

Figure 1: The encompassing prior approach for inequality, about-equality, and exact equality constraints; panels: (a) M_1: µ_1 > µ_2, (b) M_2: µ_1 ≈ µ_2, (c) M_3: µ_1 = µ_2. For illustrative purposes, we assume that the encompassing prior is uniform over the parameter space. The gray area represents the part of the encompassing parameter space that is in accordance with the constraints imposed by the nested model. The circles A, B, and C represent three different encompassing posterior distributions.

2.2. Bayes Factors for About-Equality Constraints

In the EP approach, the Bayes factor for about-equality constraints can be calculated in the same manner as for inequality constraints. To illustrate, consider our example in which M_e: µ_1, µ_2; σ and M_2: µ_1 ≈ µ_2; σ. Figure 1b shows as a gray area the proportion of prior samples that are in accordance with

the constraints imposed by M_2, which in this case equals about .20. Note that µ_1 ≈ µ_2 means |µ_1 − µ_2| < ε; the choice of ε defines the size of the parameter space that is allowed by the constraint. Now consider the three possible encompassing posterior distributions shown in Figure 1b. In case A, about 80% of the posterior samples are in accordance with the constraint, and this yields a Bayes factor BF_{2e} = .8/.2 = 4. In cases B and C, slightly less than half of the samples, about 40%, are in accordance with the constraint, and this yields a Bayes factor BF_{2e} = .4/.2 = 2. As before, the Bayes factors are calculated with relative ease: all that is required are prior and posterior samples from the encompassing model M_e.

2.3. Bayes Factors for Exact Equality Constraints

In some situations, any difference between µ_1 and µ_2 is deemed relevant, and this requires a test for exact equality. For instance, one may wish to test whether a chemical compound adds to the effectiveness of a particular medicine. For experimental studies such as this, an exact null effect is a priori plausible. However, it may appear that the EP approach does not extend to exact equality constraints in a straightforward fashion. To illustrate, consider our example in which M_e: µ_1, µ_2; σ and now M_3: µ_1 = µ_2; σ. Figure 1c shows that the only values allowed by the constrained model M_3 are those that fall exactly on the diagonal. As µ_1 and µ_2 are continuous variables, the proportion of prior and posterior samples that obey this constraint is zero. Therefore, the EP Bayes factor is 0/0, which has led several researchers to conclude that the EP Bayes factor is not defined for exact equality constraints (Rossell et al., 2008, pp. 111-112; Myung et al., 2008, p. 317; Klugkist, 2008, p. 71). The next two sections show that the EP Bayes factor is defined for exact equality constraints, and that it can be computed with relative ease.

2.3.1. Bayes factors for exact equality constraints: An iterative method

In order to estimate the EP Bayes factor for exact equality constrained models, Laudy (2006, p. 115) and Klugkist (2008) proposed an iterative procedure.
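Since the iterative procedure repeatedly applies Equation 2, it helps to first sketch that computation in code. The following is a minimal illustration (not the authors' implementation) of the inequality case M_1: µ_1 > µ_2, under assumed N(0, 1) priors on the means and a known σ = 1, so that the posterior under M_e is conjugate; all numerical settings are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
y1 = rng.normal(0.5, 1.0, n)   # group 1 data (simulated for illustration)
y2 = rng.normal(0.0, 1.0, n)   # group 2 data (simulated for illustration)

# Encompassing model Me with assumed priors mu_j ~ N(0, 1) and sigma = 1
# known (a simplification; the paper treats sigma as a nuisance parameter).
# The posterior is then conjugate: mu_j | data ~ N(n*ybar_j/(n+1), 1/(n+1)).
m = 200_000
prior_mu1 = rng.normal(0.0, 1.0, m)
prior_mu2 = rng.normal(0.0, 1.0, m)
sd_post = np.sqrt(1.0 / (n + 1))
post_mu1 = rng.normal(n * y1.mean() / (n + 1), sd_post, m)
post_mu2 = rng.normal(n * y2.mean() / (n + 1), sd_post, m)

# Equation 2: proportion of posterior samples obeying M1: mu1 > mu2,
# divided by the corresponding proportion of prior samples (about 1/2).
BF_1e = np.mean(post_mu1 > post_mu2) / np.mean(prior_mu1 > prior_mu2)
print(BF_1e)
```

Because the prior proportion for µ_1 > µ_2 is 1/2 by symmetry, BF_{1e} in this sketch can never exceed 2, matching the discussion of case C in Section 2.1.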

In the context of a test between M_e: µ_1, µ_2; σ and M_3: µ_1 = µ_2; σ, the procedure comprises the following steps:

Step 1: Choose a small value ε_1 and define M_{3.1}: |µ_1 − µ_2| < ε_1;
Step 2: Compute the Bayes factor BF_{(3.1)e} using Equation 2;
Step 3: Define ε_2 < ε_1 and M_{3.2}: |µ_1 − µ_2| < ε_2;
Step 4: Sample from the constrained (|µ_1 − µ_2| < ε_1) prior and posterior and compute the Bayes factor BF_{(3.2)(3.1)}.

Repeat Steps 3 and 4, with each ε_{n+1} < ε_n, until BF_{(3.n+1)(3.n)} ≈ 1. Then the required Bayes factor BF_{3e} can be calculated by multiplication:

BF_{3e} = BF_{(3.1)e} \times BF_{(3.2)(3.1)} \times \cdots \times BF_{(3.n)(3.n-1)}. \quad (3)

In the limit (i.e., when ε_n → 0), this method yields the Bayes factor for the exact equality model M_3 versus the encompassing model M_e. Although this iterative method solves the problem of having no samples that obey an exact equality constraint, the method is only approximate and potentially time consuming.

2.3.2. Bayes factors for exact equality constraints: An analytical method, equivalent to the Savage-Dickey test

Here we show that the EP test for exact equality is identical to the Savage-Dickey density ratio test, a one-step test that is both principled and fast. In order to understand this intuitively, Figure 2 shows a fictitious prior and posterior distribution for µ_1 − µ_2, obtained under the encompassing model M_e. The surface of the dashed areas equals the proportion of the prior and posterior distribution that is consistent with the constraint |µ_1 − µ_2| < ε. According to the EP approach, the Bayes factor is obtained by integrating the posterior and prior distributions over the area defined by the constraint. However, it is clear that when ε → 0, both areas have surface 0.
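To make the iterative scheme of Section 2.3.1 concrete before resolving the limit analytically, here is a sketch using hypothetical normal prior and posterior samples for δ = µ_1 − µ_2; the distributions, sample sizes, and tolerances are assumed for illustration, and in practice the samples would come from the encompassing model M_e:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical marginal prior and posterior samples for delta = mu1 - mu2
d_prior = rng.normal(0.0, 1.0, 2_000_000)   # prior: delta ~ N(0, 1)
d_post = rng.normal(0.3, 0.25, 2_000_000)   # posterior: delta ~ N(0.3, 0.25)

BF_3e, eps = 1.0, 0.8
for _ in range(6):
    # Steps 2/4: Bayes factor for |delta| < eps given the current constraint
    step_bf = np.mean(np.abs(d_post) < eps) / np.mean(np.abs(d_prior) < eps)
    BF_3e *= step_bf          # Equation 3: multiply the step-wise factors
    # condition both sample sets on the constraint just tested (Step 4)
    d_post = d_post[np.abs(d_post) < eps]
    d_prior = d_prior[np.abs(d_prior) < eps]
    eps /= 2                  # Step 3: shrink epsilon

print(BF_3e)                                                       # iterative estimate
print(stats.norm.pdf(0, 0.3, 0.25) / stats.norm.pdf(0, 0.0, 1.0))  # density ratio at 0
```

Each pass multiplies in one step-wise Bayes factor as in Equation 3; the second printed number, the ratio of the two densities at δ = 0, is the value the product approaches as ε shrinks, which is exactly the equivalence established in the remainder of this section.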

Figure 2: The encompassing prior approach for exact equality constraints is the Savage-Dickey density ratio. The top dot represents the value of the posterior distribution at µ_1 = µ_2 and the bottom dot represents the value of the prior distribution at µ_1 = µ_2. The ratio of the heights of both densities equals the Bayes factor.

But even though both integrals go to zero as ε → 0, the Bayes factor is given by the ratio of the two integrals, and this ratio is calculated easily. The Bayes factor for the equality constraint in the EP approach is the limit

BF_{3e} = \lim_{\epsilon \to 0} \frac{\int_{-\epsilon/2}^{\epsilon/2} p(\psi_0 + \psi \mid D, M_e)\, d\psi}{\int_{-\epsilon/2}^{\epsilon/2} p(\psi_0 + \psi \mid M_e)\, d\psi}.

Clearly this limit approaches the form 0/0, and so l'Hôpital's rule for 0/0 forms can be employed to obtain

BF_{3e} = \lim_{\epsilon \to 0} \frac{p(\psi_0 + \epsilon/2 \mid D, M_e)/2 + p(\psi_0 - \epsilon/2 \mid D, M_e)/2}{p(\psi_0 + \epsilon/2 \mid M_e)/2 + p(\psi_0 - \epsilon/2 \mid M_e)/2} = \frac{p(\psi_0 \mid D, M_e)}{p(\psi_0 \mid M_e)}, \quad (4)

where ψ_0 represents the point of exact equality specified by the constrained model; in our example, M_3: ψ = ψ_0 means µ_1 − µ_2 = 0. Equation 4 shows that the

Bayes factor BF_{3e} simplifies to the ratio of the height of the marginal posterior and the height of the marginal prior at the point of interest, provided the limiting processes in the numerator and the denominator are chosen to be equal (see Appendix A for a proof). This is known as the Savage-Dickey density ratio test (Dickey and Lientz, 1970; Dickey, 1971; O'Hagan and Forster, 2004; Verdinelli and Wasserman, 1995). For the example shown in Figure 2, the Bayes factor in favor of the exact equality model, BF_{3e}, is approximately 2.

2.3.3. A Cautionary Note: The Borel-Kolmogorov Paradox

The main drawback of the EP approach to the test of exact equalities is that it is subject to the Borel-Kolmogorov paradox (DeGroot and Schervish, 2002; Jaynes, 2003; Lindley, 1997; Proschan and Presnell, 1998; Rao, 1988; Singpurwalla and Swift, 2001). This paradox arises when one conditions on events of probability zero. In the case of exact equality constraints, priors for the constrained model are constructed by conditioning on a null set, and this invokes the Borel-Kolmogorov paradox (see Appendix B for a more detailed explanation). For the EP approach to exact equalities, the foregoing implies that the resulting Bayes factor may depend on the choice of parameterization, a feature that is clearly undesirable (Dawid and Lauritzen, 2001; Schweder and Hjort, 1996; Wolpert, 1995). Note that the Borel-Kolmogorov paradox does not occur for inequality constraints, where one conditions not on a single point but on an interval. Several procedures are available to avoid the Borel-Kolmogorov paradox. In the example used throughout this paper, for instance, we specified δ = µ_1 − µ_2, defined priors for µ_1 and µ_2, and tested whether or not δ = 0; this specification leads to the Borel-Kolmogorov paradox.
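Returning to Equation 4: in a conjugate normal example the Savage-Dickey ratio can be checked against a direct marginal-likelihood computation. The sketch below assumes a single mean δ with a N(0, 1) prior and known unit variance; all settings are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 25
x = rng.normal(0.1, 1.0, n)    # data; sigma = 1 treated as known (assumed)
xbar = x.mean()

# Conjugate setup: delta ~ N(0, 1) under Me, x_i | delta ~ N(delta, 1);
# M3 fixes delta = 0. The marginal posterior of delta is then
# N(n * xbar / (n + 1), 1 / (n + 1)).
post_mean = n * xbar / (n + 1)
post_sd = np.sqrt(1.0 / (n + 1))

# Equation 4: ratio of posterior to prior density at the test point delta = 0
BF_sd = stats.norm.pdf(0.0, post_mean, post_sd) / stats.norm.pdf(0.0, 0.0, 1.0)

# The same Bayes factor from the ratio of marginal likelihoods, with
# p(D | Me) obtained by grid integration over the prior (log scale for safety)
log_m3 = stats.norm.logpdf(x, 0.0, 1.0).sum()          # log p(D | M3)
grid = np.linspace(-6, 6, 4001)
dx = grid[1] - grid[0]
rel_lik = np.exp(stats.norm.logpdf(x[:, None], grid, 1.0).sum(axis=0) - log_m3)
BF_direct = 1.0 / float(np.sum(rel_lik * stats.norm.pdf(grid)) * dx)

print(BF_sd, BF_direct)
```

The two printed numbers agree up to the accuracy of the grid integration, illustrating that the height ratio of Equation 4 is the Bayes factor itself.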
Alternatively, we could specify µ_2 = µ_1 + δ, define priors for µ_1 and δ, and test whether or not δ = 0; this specification is not sensitive to the Borel-Kolmogorov paradox, in the sense that reparameterization of only δ does not change the Bayes factor (see Appendix C for an explanation and proof). A more principled procedure to avoid the Borel-Kolmogorov paradox is to

construct priors for the exact equality constrained model not by the usual conditioning, but by other means (e.g., marginalization, Kass and Raftery, 1995; Jeffreys conditioning, Dawid and Lauritzen, 2001; reference conditioning, Roverato and Consonni, 2004; Kullback-Leibler projection, Dawid and Lauritzen, 2001, Consonni and Veronese, 2008; and Hausdorff integrals, Kleibergen, 2004). Unfortunately, each of these alternative procedures invokes paradoxes and problems of its own (see Consonni and Veronese, 2008, for a review and comparison). Presently, there does not appear to be a universally agreed-upon method for specifying priors in nested models that is clearly superior to the conditioning procedure inherent in the Hoijtink and Klugkist EP approach.

3. Concluding Remarks

Here we have shown that the Savage-Dickey density ratio test is a special case of the encompassing prior (EP) approach proposed by Hoijtink, Klugkist, and colleagues. The EP approach was developed to account for models with inequality constraints; as it turns out, the approach naturally extends to models with exact equality constraints. Consequently, the EP approach offers a unified, elegant, and simple method to compute Bayes factors in nested models.

References

Billingsley, P., 2008. Probability and Measure. Wiley, New York.

Consonni, G., Veronese, P., 2008. Compatibility of prior specifications across linear models. Statistical Science 23, 332-353.

Dawid, A. P., Lauritzen, S. L., 2001. Compatible prior distributions. In: George, E. (Ed.), Bayesian Methods with Applications to Science, Policy, and Official Statistics. Monographs of Official Statistics, Luxembourg, pp. 109-118.

DeGroot, M., Schervish, M., 2002. Probability and Statistics. Addison-Wesley, Boston.

Dickey, J. M., 1971. The weighted likelihood ratio, linear hypotheses on normal location parameters. The Annals of Mathematical Statistics 42, 204-223.

Dickey, J. M., Lientz, B. P., 1970. The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics 41, 214-226.

Gamerman, D., Lopes, H. F., 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall/CRC, Boca Raton, FL.

Hoijtink, H., Klugkist, I., Boelen, P., 2008. Bayesian Evaluation of Informative Hypotheses. Springer, New York.

Jaynes, E. T., 2003. Probability Theory: The Logic of Science. Cambridge University Press, Cambridge, UK.

Kass, R. E., Raftery, A. E., 1995. Bayes factors. Journal of the American Statistical Association 90, 377-395.

Kleibergen, F., 2004. Invariant Bayesian inference in regression models that is robust against the Jeffreys-Lindley paradox. Journal of Econometrics 123, 227-258.

Klugkist, I., 2008. Encompassing prior based model selection for inequality constrained analysis of variance. In: Hoijtink, H., Klugkist, I., Boelen, P. (Eds.), Bayesian Evaluation of Informative Hypotheses. Springer, New York, pp. 53-83.

Klugkist, I., Hoijtink, H., 2007. The Bayes factor for inequality and about equality constrained models. Computational Statistics and Data Analysis 51, 6367-6379.

Klugkist, I., Kato, B., Hoijtink, H., 2005a. Bayesian model selection using encompassing priors. Statistica Neerlandica 59, 57-69.

Klugkist, I., Laudy, O., Hoijtink, H., 2005b. Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods 10, 477-493.

Kolmogorov, A., 1956. Foundations of the Theory of Probability. Chelsea Publishing Company, New York.

Laudy, O., 2006. Bayesian Inequality Constrained Models for Categorical Data. Ph.D. thesis, Utrecht University.

Lewis, S. M., Raftery, A. E., 1997. Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association 92, 648-655.

Lindley, D., 1997. Some comments on Bayes factors. Journal of Statistical Planning and Inference 61, 181-189.

Myung, J. I., Karabatsos, G., Iverson, G. J., 2008. A statistician's view on Bayesian evaluation of informative hypotheses. In: Hoijtink, H., Klugkist, I., Boelen, P. (Eds.), Bayesian Evaluation of Informative Hypotheses. Springer, New York, pp. 309-327.

O'Hagan, A., Forster, J., 2004. Kendall's Advanced Theory of Statistics Vol. 2B: Bayesian Inference (2nd ed.). Arnold, London.

Proschan, M. A., Presnell, B., 1998. Expect the unexpected from conditional expectation. The American Statistician 52, 248-252.

Rao, M., 1988. Paradoxes in conditional probability. Journal of Multivariate Analysis 27, 434-446.

Rossell, D., Baladandayuthapani, V., Johnson, V. E., 2008. Bayes factors based on test statistics under order restrictions. In: Hoijtink, H., Klugkist, I., Boelen, P. A. (Eds.), Bayesian Evaluation of Informative Hypotheses. Springer, New York, pp. 111-129.

Roverato, A., Consonni, G., 2004. Compatible prior distributions for DAG models. Journal of the Royal Statistical Society B 66, 47-61.

Schweder, T., Hjort, N. L., 1996. Bayesian synthesis or likelihood synthesis: what does Borel's paradox say? Report of the International Whaling Commission 46, 475-479.

Singpurwalla, N. D., Swift, A., 2001. Network reliability and Borel's paradox. The American Statistician 55, 213-218.

Verdinelli, I., Wasserman, L., 1995. Computing Bayes factors using a generalization of the Savage-Dickey density ratio. Journal of the American Statistical Association 90, 614-618.

Wolpert, R., 1995. Comment on "Inference from a deterministic population dynamics model for bowhead whales". Journal of the American Statistical Association 90, 426-427.

A. Proof that the EP approach converges to the Savage-Dickey density ratio if and only if the limiting processes for prior and posterior are the same

Equation 4 shows that the EP Bayes factor for exact equality constraints is equal to the Savage-Dickey ratio. However, the rectangular regions of integration and the use of the same limiting process in both the numerator and the denominator are arbitrary choices. Different choices of limiting process can lead to different Bayes factors, as shown next. To this end, let γ_i(ε) ≥ 0 and δ_i(ε) > 0, differentiable in a neighborhood (0, ε), be such that lim_{ε→0} γ_i(ε) = lim_{ε→0} δ_i(ε) = 0 and γ_i'(0) + δ_i'(0) ≠ 0, for i = 1, 2, where the prime indicates a derivative. Then, without loss of generality, these functions can be chosen to suit any form of the limiting process in the EP approach:

BF_{3e} = \lim_{\epsilon \to 0} \frac{\int_{-\gamma_1(\epsilon)}^{\delta_1(\epsilon)} p(\psi_0 + \psi \mid D, M_e)\, d\psi}{\int_{-\gamma_2(\epsilon)}^{\delta_2(\epsilon)} p(\psi_0 + \psi \mid M_e)\, d\psi}.

Intuitively this goes to the same limit as above, but in fact it does not, as l'Hôpital's rule shows:

BF_{3e} = \lim_{\epsilon \to 0} \frac{p(\psi_0 + \delta_1(\epsilon) \mid D, M_e)\, \delta_1'(\epsilon) + p(\psi_0 - \gamma_1(\epsilon) \mid D, M_e)\, \gamma_1'(\epsilon)}{p(\psi_0 + \delta_2(\epsilon) \mid M_e)\, \delta_2'(\epsilon) + p(\psi_0 - \gamma_2(\epsilon) \mid M_e)\, \gamma_2'(\epsilon)} = \frac{p(\psi_0 \mid D, M_e)\, [\delta_1'(0) + \gamma_1'(0)]}{p(\psi_0 \mid M_e)\, [\delta_2'(0) + \gamma_2'(0)]}.

This equals the above Savage-Dickey ratio if and only if δ_1'(0) + γ_1'(0) = δ_2'(0) + γ_2'(0). As δ_1'(0) + γ_1'(0) and δ_2'(0) + γ_2'(0) measure the rates at which the numerator and denominator approach zero, the limit of the EP approach equals the Savage-Dickey ratio if and only if both numerator and denominator approach 0 at the same rate. In practice, of course, few would consider choosing δ_1 and δ_2, or γ_1 and γ_2, to differ, and so, for all practical purposes, the limit of the EP approach will generally equal the Savage-Dickey ratio.

B. The Borel-Kolmogorov Paradox: An Example

Consider the following situation, inspired by an example from Lindley (1997). Suppose that a point P is described by its Cartesian coordinates X and Y. Furthermore, suppose that 0 ≤ X ≤ 1 and 0 ≤ Y ≤ 1, and that P has a uniform distribution on the unit square. Suppose you are told that P lies on the diagonal through the origin, event B. What is your probability that X, associated with that P, and hence also Y, is less than 1/2 (i.e., event A)? The paradox lies in the fact that the answer to this question depends on how we parameterize the diagonal. We examine two situations: Z_1 = X − Y = 0 (see Figure 3a) and Z_2 = X/Y = 1 (see Figure 3b). Note that because X and Y are continuous, the probability that X = Y is zero. Because conditioning on an event of probability zero is problematic, we consider values of X and Y that lie in the proximity of the line X = Y. From the geometry of the problem, the associated probability in the first case is

P(Y - \epsilon \le X \le Y + \epsilon) = 2\left(\tfrac{1}{2} - \tfrac{1}{2}(1 - \epsilon)^2\right) = (2 - \epsilon)\epsilon,

Figure 3: Example of the Borel-Kolmogorov paradox; panels: (a) X − Y = 0, (b) X/Y = 1. The shaded areas indicate the acceptable values of X and Y for the two parameterizations. In panel (a), all values of X are equally likely, while in panel (b), the wedge-like shape suggests that values close to 1 are more likely than those close to 0.

while the associated probability for the second case is

P(Y - \epsilon Y \le X \le Y + \epsilon Y) = 2\left(\tfrac{1}{2} - \tfrac{1}{2}(1)(1 - \epsilon)\right) = \epsilon.

Now consider the probabilities that (X, Y) lies in the lower left quadrant of the square (X, Y ≤ 1/2) and either that |X − Y| < ε or that |X/Y − 1| < ε. Again from geometry, the probability in the first case is

P(|X - Y| < \epsilon,\; X, Y \le 1/2) = (1 - \epsilon)\epsilon,

and in the second case

P(|X/Y - 1| < \epsilon,\; X, Y \le 1/2) = \tfrac{1}{4}\epsilon.

Hence, the corresponding conditional probabilities of the event that (X, Y) lies in the lower left quadrant of the square, given that either the event |X − Y| < ε or the event |X/Y − 1| < ε occurred, are, respectively,

P(X, Y \le 1/2 \mid |X - Y| < \epsilon) = \frac{1 - \epsilon}{2 - \epsilon}

and

P(X, Y \le 1/2 \mid |X/Y - 1| < \epsilon) = \tfrac{1}{4}.
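These closed-form probabilities are easy to check by simulation; a small Monte Carlo sketch (the sample size and ε are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
m, eps = 5_000_000, 0.01
X = rng.uniform(size=m)
Y = rng.uniform(size=m)

band = np.abs(X - Y) < eps                              # near-diagonal via Z1 = X - Y
wedge = np.abs(X / np.maximum(Y, 1e-12) - 1.0) < eps    # near-diagonal via Z2 = X / Y
quadrant = (X <= 0.5) & (Y <= 0.5)

p_band = quadrant[band].mean()     # approaches (1 - eps) / (2 - eps), i.e. ~1/2
p_wedge = quadrant[wedge].mean()   # approaches 1/4
print(p_band, p_wedge)
```

With ε small, the first proportion is near 1/2 and the second near 1/4, reproducing the two conflicting answers for the same limiting event.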

If we now take the limit ε → 0, both the event |X − Y| ≤ ε and the event |X/Y − 1| ≤ ε coincide with the lower left half of the diagonal of the square. However, although the first probability coincides with our intuition that P(X, Y ≤ 1/2 | X = Y) = 1/2, the second has it that the probability of this seemingly equal event should be 1/4! The example shows that the probability of an event conditioned on a limiting event of zero probability depends on the way in which the limiting event was generated, i.e., on the parameterization that was chosen to generate the zero-probability event. In effect, conditional probability is not invariant under coordinate transformations of the conditioning variable. This paradox is resolved if one accepts that conditional probability cannot be unambiguously defined with respect to events of zero probability without specifying the limiting process from which they result (Jaynes, 2003). It is random variables on which conditioning is unambiguous, not singular events (see Kolmogorov, 1956; Billingsley, 2008; Wolpert, 1995).

C. When the Savage-Dickey Ratio is Invariant Under Smooth Transformations

In light of the Borel-Kolmogorov paradox, it is important to understand when the Savage-Dickey density ratio is invariant under smooth transformations of the chosen parameterization. Suppose the chosen set of (absolutely continuous) parameters is θ, with prior and posterior densities p(θ | M_t) and p(θ | D, M_t). Let g be a differentiable invertible transform (a diffeomorphism) with inverse h, so that χ = g(θ) and h(χ) = θ. The implied prior and posterior densities are denoted p(χ | M_t) and p(χ | D, M_t), respectively. In general, the parameter vector can be partitioned as θ = (ψ, φ), where φ are nuisance parameters that are not involved in the evaluation of the null hypothesis. We are interested in evaluating the evidence for the simple hypothesis

H_0: \psi = \psi_0,

which, in terms of χ = (g_1(θ), g_2(θ)) = (ν, ξ), can often be cast equivalently as

H_0: \nu = \nu_0.

We wish to know under what circumstances the two Savage-Dickey ratios are equal. That is, we want to determine conditions on g under which the desired equality

BF_{0t} = \frac{p(\psi_0 \mid D, M_t)}{p(\psi_0 \mid M_t)} = \frac{p(\nu_0 \mid D, M_t)}{p(\nu_0 \mid M_t)} \quad (5)

is true. It turns out that this equality holds as long as the transformation g does not depend on the data D, and as long as the parameters on which H_0 imposes a simple hypothesis transform independently from the nuisance parameters. This follows from the following considerations. By the change-of-variables rule,

p(\chi \mid D, M_t) = p(h(\chi) \mid D, M_t)\, |h'(\chi)|_{+}, \qquad p(\chi \mid M_t) = p(h(\chi) \mid M_t)\, |h'(\chi)|_{+},

where |h'(χ)|_+ denotes the absolute value of the determinant of the Jacobian matrix h'(χ) = ∂h(χ)/∂χ of the transformation h. Consequently, the evidence for H_0 is obtained from the Savage-Dickey ratio

BF_{0t} = \frac{p(\psi_0 \mid D, M_t)}{p(\psi_0 \mid M_t)} = \frac{\int p(\psi_0, \phi \mid D, M_t)\, d\phi}{\int p(\psi_0, \phi \mid M_t)\, d\phi} = \frac{\int p(\nu_0, \xi \mid D, M_t)\, |h'(\nu_0, \xi)|_{+}^{-1}\, d\xi}{\int p(\nu_0, \xi \mid M_t)\, |h'(\nu_0, \xi)|_{+}^{-1}\, d\xi}.

The desired equality (5) critically depends on the Jacobian |h'(ν_0, ξ)|_+ = |h'(χ)|_+: if g_1 does not depend on the nuisance parameters φ, i.e., g_1(θ) = g_1(ψ), then it is straightforward to show that |h'(ν_0, ξ)|_+ = |h_1'(ν_0)|_+ |h_2'(ν_0, ξ)|_+, so that the density ratio reduces to

\frac{p(\psi_0 \mid D, M_t)}{p(\psi_0 \mid M_t)} = \frac{\int p(\nu_0, \xi \mid D, M_t)\, |h_2'(\nu_0, \xi)|_{+}^{-1}\, d\xi}{\int p(\nu_0, \xi \mid M_t)\, |h_2'(\nu_0, \xi)|_{+}^{-1}\, d\xi} = \frac{\int p(\nu_0, \phi \mid D, M_t)\, d\phi}{\int p(\nu_0, \phi \mid M_t)\, d\phi} = \frac{p(\nu_0 \mid D, M_t)}{p(\nu_0 \mid M_t)},

which is (5).

In the previous example, the first parameterization was given by θ = (ψ, φ) = (µ_1 − µ_2, µ_1), with ψ_0 = 0. The alternative parameterization was given by χ = (ν, ξ) = (µ_2/µ_1, µ_1), with ν_0 = 1. Therefore, g is given by g(ψ, φ) = ((φ − ψ)/φ, φ),

and hence g_1 = (φ − ψ)/φ depends on the nuisance parameter φ. Thus, in general, we should not expect p(ψ_0 | D, M_t)/p(ψ_0 | M_t) and p(ν_0 | D, M_t)/p(ν_0 | M_t) to be equal. On the other hand, if we were to reparameterize to χ = (ν, ξ) = (1/(1 + e^ψ), φ/ψ), with ν_0 = 1/2, then g_1 = 1/(1 + e^ψ) is independent of the nuisance parameter, and the above implies that p(ψ_0 | D, M_t)/p(ψ_0 | M_t) and p(ν_0 | D, M_t)/p(ν_0 | M_t) are equal.
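The invariance result can be verified numerically. The sketch below uses hypothetical prior and posterior densities for ψ and the logistic map ν = 1/(1 + e^{−ψ}) (a transform of ψ alone, a sign-flipped variant of the example above, also with ν_0 = 1/2); the density of ν is obtained from the CDF by finite differences, so no Jacobian is entered by hand:

```python
import numpy as np
from scipy import stats

# Hypothetical marginal prior and posterior for psi (no nuisance mixing)
prior = stats.norm(0.0, 1.0)
post = stats.norm(0.3, 0.2)
psi0, nu0, h = 0.0, 0.5, 1e-6

# Savage-Dickey ratio in the original parameterization
BF_psi = post.pdf(psi0) / prior.pdf(psi0)

def dens_nu(dist, nu):
    # density of nu = 1 / (1 + exp(-psi)) computed numerically from the
    # CDF of psi via the inverse map psi = logit(nu)
    logit = lambda u: np.log(u / (1.0 - u))
    return (dist.cdf(logit(nu + h)) - dist.cdf(logit(nu - h))) / (2.0 * h)

# Savage-Dickey ratio after the smooth reparameterization
BF_nu = dens_nu(post, nu0) / dens_nu(prior, nu0)
print(BF_psi, BF_nu)
```

The two ratios agree, as expected when g_1 depends on ψ only; a transform that mixes ψ with the nuisance parameter would generally break this equality.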