ESTIMATING BAYES FACTORS VIA POSTERIOR SIMULATION WITH THE LAPLACE-METROPOLIS ESTIMATOR

by

Steven M. Lewis
Adrian E. Raftery

TECHNICAL REPORT No. 279
November 1994

Department of Statistics, GN-22
University of Washington
Seattle, Washington USA
Estimating Bayes Factors via Posterior Simulation with the Laplace-Metropolis Estimator

Steven M. Lewis and Adrian E. Raftery
University of Washington
November 30, 1994

Abstract

The key quantity needed for Bayesian hypothesis testing and model selection is the marginal likelihood for a model, also known as the integrated likelihood, or the marginal probability of the data. In this paper we describe a way to use posterior simulation output to estimate marginal likelihoods. We describe the basic Laplace-Metropolis estimator for models without random effects. For models with random effects the compound Laplace-Metropolis estimator is introduced. This estimator is applied to data from the World Fertility Survey and shown to give accurate results. Batching of simulation output is used to assess the uncertainty involved in using the compound Laplace-Metropolis estimator. The method allows us to test for the effects of independent variables in a random effects model, and also to test for the presence of random effects.

KEY WORDS: Laplace-Metropolis estimator; Random effects models; Marginal likelihoods; Posterior simulation; World Fertility Survey.

1 Introduction

The Bayesian solution to model selection problems is
the Bayes factor, B_01, for comparing model M_0 against model M_1: the ratio of posterior to prior odds given the observed data, Y. It is the ratio of the integrated likelihoods of the two models being compared. Hence the calculation of Bayes factors boils down to calculating marginal likelihoods,

f(Y | M_m) = ∫ f(Y | θ_m, M_m) f(θ_m | M_m) dθ_m,

where θ_m is the vector of parameters in model M_m and f(θ_m | M_m) is its prior density. Dropping the notational dependence on the model, this can be rewritten as

f(Y) = ∫ f(Y | θ) f(θ) dθ.    (1)

Historically the integration required for calculating marginal likelihoods has been done by taking advantage of conjugacy or by assuming approximate posterior normality. In other cases the requisite integrals have been approximated using such methods as Gaussian quadrature (Naylor and Smith 1982), the Laplace approximation (de Bruijn 1970; Tierney and Kadane 1986) or Monte Carlo methods. With the availability of increasing computer power, Markov chain Monte Carlo (MCMC) has become a reasonable alternative. A number of ways have been suggested for using posterior simulation output, such as that produced by MCMC, to estimate marginal likelihoods. These are reviewed by Kass and Raftery (1995) and include importance sampling methods, such as calculating the harmonic mean of the output likelihoods, and various ways of combining prior and posterior simulation output.
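As an aside, the simplest of these importance sampling estimators, the harmonic mean of the output likelihoods (Newton and Raftery 1994), is easy to state. The following is a minimal illustrative sketch in Python; it is not from the paper, and the function name and interface are ours:

```python
import numpy as np

def log_marginal_harmonic_mean(log_liks):
    """Harmonic mean estimate of log f(Y).

    log_liks: log f(Y | theta^(j)) evaluated at each draw theta^(j)
    of the posterior simulation output.
    Implements log of [ (1/J) * sum_j 1 / f(Y | theta^(j)) ]^(-1),
    with the average done stably on the log scale.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    m = np.max(-log_liks)
    log_mean_inverse_lik = m + np.log(np.mean(np.exp(-log_liks - m)))
    return -log_mean_inverse_lik
```

This estimator is known to be unstable (its variance can be infinite), which is part of the motivation for the estimator developed below.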
In this paper we describe an alternative way to use posterior simulation output. For hierarchical models the Laplace method is applied at two different levels: one application yields the Laplace-Metropolis estimator for the fixed effect parameters, and the Laplace method is applied at a second level to integrate out each of the random effect parameters. In Section 6 the compound Laplace-Metropolis estimator is used to calculate log marginal likelihoods for a number of different models fit to data collected in Iran as part of the World Fertility Survey. By taking the difference between the log marginal likelihoods of two competing models we are able to estimate the Bayes factor for the comparison of the two models. How accurate are the log marginal likelihood estimates produced by the compound Laplace-Metropolis estimator? In Section 7 we demonstrate a method that adapts one way of assessing the variability of Monte Carlo estimators, namely the batching of simulation output, to the purpose of assessing the variability of the compound Laplace-Metropolis estimator. We close by comparing the estimates of the log marginal likelihoods provided by the compound Laplace-Metropolis estimator against a "gold standard" determined on the basis of extremely large samples from the joint prior distribution of all parameters. The compound Laplace-Metropolis estimates are shown to be remarkably accurate, particularly in light of the substantially reduced computing time required.

2 Calculating the Marginal Likelihood Using the Prior Distribution

Before describing the Laplace-Metropolis estimator, we first describe a method for obtaining marginal likelihoods against which the compound Laplace-Metropolis approximations can be compared. Equation (1) expresses the marginal likelihood as the expectation of the likelihood with respect to the prior, so it can be estimated by simple Monte Carlo:

f(Y) ≈ (1/J) Σ_{j=1}^{J} f(Y | θ^{(j)}),    (2)

where the θ^{(j)} are sampled from the prior distribution.
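The following minimal Python sketch illustrates equation (2); `sample_prior` and `log_lik` are placeholders for model-specific code and are not from the paper:

```python
import numpy as np

def log_marginal_from_prior(sample_prior, log_lik, n_draws, rng):
    """Monte Carlo estimate of log f(Y) by equation (2).

    sample_prior(rng) -> one draw theta ~ f(theta)
    log_lik(theta)    -> log f(Y | theta)
    Averages the likelihood over prior draws, on the log scale.
    """
    log_liks = np.array([log_lik(sample_prior(rng)) for _ in range(n_draws)])
    m = log_liks.max()
    return m + np.log(np.mean(np.exp(log_liks - m)))

# usage: rng = np.random.default_rng(0)
#        log_ml = log_marginal_from_prior(sample_prior, log_lik, 10_000_000, rng)
```

As the next paragraph notes, the number of prior draws needed for an acceptable Monte Carlo error can run to tens of millions, which is what makes this estimator a gold standard rather than a practical tool.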
As the number of prior draws J grows, this average becomes a good approximation to the integral in equation (1). In order to reduce the Monte Carlo error to an acceptable level it was necessary to use a sample of roughly 50 million draws from the prior. The computing effort needed to obtain such a sample can be enormous, taking days of computer time on most workstations currently available. In this paper we will describe an estimator which may be used to estimate the logarithm of the integral in equation (1) in substantially less computer time. However, as a way to check the accuracy of the estimator which we describe, we have also calculated the Monte Carlo estimate using equation (2) for some of the models demonstrated in this paper; we will refer to this Monte Carlo estimate as the "actual" log marginal likelihood.

3 The Laplace-Metropolis Estimator of the Marginal Likelihood

Raftery (1995) has recently proposed a method for estimating the marginal likelihood by combining the Laplace approximation with MCMC. He suggests using the Metropolis algorithm (Metropolis et al. 1953) as a means of estimating the quantities needed for the Laplace approximation. By applying the Laplace method we can derive the following approximation for the marginal likelihood:

f(Y) ≈ (2π)^{P/2} |H*|^{1/2} f(θ*) f(Y | θ*),    (3)

where θ* is the value of θ at which h(θ) ≡ log{f(θ) f(Y | θ)} attains its maximum, H* is minus the inverse Hessian of h evaluated at θ*, and P is the dimension of the parameter vector. For numerical reasons it is better to work with log likelihoods, rewriting (3) as

log f(Y) ≈ (P/2) log{2π} + (1/2) log{|H*|} + log{f(θ*)} + log{f(Y | θ*)}.    (4)
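A minimal sketch of equation (4) computed from posterior simulation output follows; it uses the simplest choices for θ* (the draw maximizing h) and H* (the sample covariance), with the alternatives discussed in the next paragraphs noted in comments. The interface is illustrative, not the paper's Fortran implementation:

```python
import numpy as np

def laplace_metropolis(draws, log_prior, log_lik):
    """Laplace-Metropolis estimate of log f(Y), equation (4).

    draws: (J, P) array of posterior simulation output.
    log_prior(theta) + log_lik(theta) = h(theta).
    """
    h = np.array([log_prior(t) + log_lik(t) for t in draws])
    theta_star = draws[np.argmax(h)]   # the L1 center is an alternative (see below)
    P = draws.shape[1]
    # asymptotically the posterior variance; a robust version is preferable (see below)
    H_star = np.atleast_2d(np.cov(draws, rowvar=False))
    _, logdet = np.linalg.slogdet(H_star)
    return (P / 2.0) * np.log(2.0 * np.pi) + 0.5 * logdet \
        + log_prior(theta_star) + log_lik(theta_star)
```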
One way to estimate θ* would be to compute h(θ) at each θ in the posterior simulation output and take the value at which h(θ) is largest. Alternatively, we can consider the draw which minimizes the total distance from the other draws; this can save a great deal of computing effort. This is the multivariate median, or L1 center, defined as the draw θ^{(j)} minimizing

Σ_{l=1}^{J} |θ^{(l)} − θ^{(j)}|,

where |·| denotes L1 distance. We use this as an estimator of the posterior mode. The other quantity needed for the Laplace-Metropolis estimator is H*. This is asymptotically equal to the posterior variance matrix. We could use the sample covariance matrix of the simulation output for H*. However, since MCMC trajectories are known to take occasional distant excursions, it would probably be a good idea to use a robust estimator of the posterior variance matrix. One such estimator is the weighted variance matrix estimate with weights based on the minimum volume ellipsoid estimate of Rousseeuw and van Zomeren (1990).
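A direct sketch of the L1 center follows (quadratic in the number of draws, so long runs would be thinned first); as with the other sketches, this is illustrative code, not the paper's:

```python
import numpy as np

def l1_center(draws):
    """Multivariate median (L1 center) of posterior simulation output.

    Returns the draw theta^(j) minimizing
    sum over l of || theta^(l) - theta^(j) ||_1.
    """
    draws = np.asarray(draws, dtype=float)
    # (J, J) matrix of pairwise L1 distances between draws
    pairwise = np.abs(draws[:, None, :] - draws[None, :, :]).sum(axis=2)
    return draws[pairwise.sum(axis=1).argmin()]
```

For the robust variance matrix, a modern substitute for the minimum volume ellipsoid weighting is the minimum covariance determinant estimate (available, for example, as `sklearn.covariance.MinCovDet`); either guards H* against the occasional distant excursions of the chain.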
4 The Compound Laplace-Metropolis Estimator for Hierarchical Models

In recent work, to be described shortly, we have found need of marginal likelihoods for a number of alternative hierarchical, i.e. random effects, models. Here we use the term hierarchical model in the sense used by Raudenbush (1988), for a model in which several observations are made on subjects which are themselves grouped into higher-level entities, in order to explicitly take into account aggregation in the sampling design. It is typically the case that hierarchical models involve a very large number of nuisance parameters, the random effects, and as a result calculating marginal likelihoods becomes difficult: the dimension of the parameter vector is so large that the posterior normality assumption underlying the Laplace approximation is questionable.

We partition the parameter vector into the random effect parameters, α, and the rest of θ, (β, Σ). The term in the Laplace-Metropolis estimator, equation (4), requiring additional attention in a random effects model is the log-likelihood, log{f(Y | θ*)}. Calculation of the other terms in equation (4) can be accomplished precisely as already described in Section 3. We still need to find the posterior mode of θ; to do so we locate the L1 center of (β, Σ). The posterior variance of the fixed effects, β, can still be used for H*. The term log{f(θ*)} is the logarithm of the joint prior density of the fixed effect parameters and the variance hyperparameter. In many random effects models the random effects are assumed to be conditionally independent given the other parameters in the model, such as the fixed effect parameters and the hyperparameters. We can take advantage of this assumption to calculate the log-likelihood as simply the sum of the log-likelihoods for each of the random effects. In other words,

log{f(Y | β, Σ)} = Σ_{i=1}^{I} log{f(Y_i | β, Σ)},    (5)

where

f(Y_i | β, Σ) = ∫ f(Y_i | α_i, β, Σ) f(α_i | Σ) dα_i,    (6)

and α_i is the random effect parameter for the ith context. Equation (5) holds for any (β, Σ); in particular, it holds for θ* = (β*, Σ*), the joint mode of the posterior distribution. As a result we can evaluate each of the log-likelihood terms in equation (5), conditioning on θ*, as integrals of the form (6).
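Because equation (6) is a one-dimensional integral when the random effect is scalar, it can also be evaluated by brute-force quadrature, which is useful for checking the Laplace version derived in the next section. A sketch under that scalar assumption, with an illustrative interface:

```python
import numpy as np

def log_f_Yi_quadrature(log_cond_lik_i, sigma2, half_width=10.0, n_grid=801):
    """Equation (6) for one context by trapezoidal quadrature.

    log_cond_lik_i(a) -> log f(Y_i | alpha_i = a, beta, Sigma)
    sigma2            -> the random effect variance Sigma
    Integrates f(Y_i | a) * N(a; 0, sigma2) over a grid spanning
    +/- half_width prior standard deviations.
    """
    sd = np.sqrt(sigma2)
    a_grid = np.linspace(-half_width * sd, half_width * sd, n_grid)
    log_norm = -0.5 * np.log(2.0 * np.pi * sigma2) - a_grid**2 / (2.0 * sigma2)
    log_integrand = np.array([log_cond_lik_i(a) for a in a_grid]) + log_norm
    m = log_integrand.max()
    return m + np.log(np.trapz(np.exp(log_integrand - m), a_grid))
```

Summing such terms over the I contexts gives the log-likelihood of equation (5).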
Applying the Laplace method to each of the integrals in equation (6) and substituting the results into equation (4) yields what we refer to as the compound Laplace-Metropolis estimator. It is

log f(Y) ≈ (P/2) log{2π} + (1/2) log{|H*|} + log{f(θ*)} + Σ_{i=1}^{I} L_i,    (7)

where L_i is the Laplace estimate of the ith log conditional likelihood, log{f(Y_i | β*, Σ*)}.

5 The Laplace-Metropolis Estimator for the Logistic Hierarchical Model

In this section we derive the Laplace-Metropolis estimator for one of the most frequently used types of hierarchical model, namely the logistic hierarchical model. Here we assume the data are produced by a mixed logistic model. That is, we assume

logit(π) = Xβ + α,    (8)

where the data may take on only the values 0 or 1, π = {π_it}, π_it is the probability that the tth observation within the ith random effect is a 1, and X is a matrix of covariate information. The likelihood of the vector of observations for the ith random effect, Y_i, is

f(Y_i | β, α_i) = Π_t π_it^{Y_it} (1 − π_it)^{1 − Y_it}.

If we also assume that the prior distribution of each of the random effect parameters is Gaussian with mean 0 and variance Σ, and independent of the fixed effect parameters and of the other random effect parameters, it is straightforward to write the conditional likelihood of the random effect's vector of observations as

f(Y_i | β, Σ) = (2πΣ)^{−1/2} ∫ exp{h_i(α_i)} dα_i,

where h_i(α_i) = Σ_t [ Y_it (x_it'β + α_i) − log{1 + exp(x_it'β + α_i)} ] − α_i²/(2Σ).
Conditional on θ* = (β*, Σ*), we can locate the mode α_i* of h_i by a few iterations of a standard one-dimensional maximization; second, we compute the square root of the determinant of minus the inverse of the Hessian of h_i, conditional on θ* and evaluated at α_i*, which we denote |H_i*|^{1/2}. We now have all the quantities needed to use the Laplace method to approximate the integrals on the right-hand side of equation (6). Doing so, we find that the Laplace estimate of the ith random effect's log conditional likelihood is

L_i = h_i(α_i*) + (1/2) log{H_i*} − (1/2) log{Σ*},

the factors of 2π from the Laplace approximation and from the Gaussian prior cancelling. These Laplace estimates of the log conditional likelihoods may then be used in equation (7) to calculate a compound Laplace-Metropolis estimate for hierarchical models.
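Putting the pieces of this section together for a single context, a minimal sketch follows, using our reconstruction of h_i above and Newton-Raphson for the mode; the names and interface are illustrative:

```python
import numpy as np

def laplace_Li(y_i, X_i, beta, sigma2, n_newton=25):
    """Laplace estimate L_i of log f(Y_i | beta, Sigma) for the
    mixed logistic model with scalar alpha_i ~ N(0, sigma2).

    y_i: (T,) 0/1 outcomes for context i; X_i: (T, P) covariates.
    """
    xb = X_i @ beta

    def h(a):  # log integrand, including the Gaussian prior kernel
        eta = xb + a
        return np.sum(y_i * eta - np.logaddexp(0.0, eta)) - a * a / (2.0 * sigma2)

    a = 0.0
    for _ in range(n_newton):  # Newton-Raphson for the mode of h
        p = 1.0 / (1.0 + np.exp(-(xb + a)))            # logistic probabilities
        grad = np.sum(y_i - p) - a / sigma2
        hess = -np.sum(p * (1.0 - p)) - 1.0 / sigma2   # always negative
        a -= grad / hess
    H_i = -1.0 / hess  # minus inverse Hessian at the mode
    # L_i = h(a*) + (1/2) log(H_i* / Sigma): the 2*pi factors from the
    # Laplace approximation and the Gaussian prior cancel.
    return h(a) + 0.5 * np.log(H_i / sigma2)
```

The compound estimate (7) then adds Σ_i L_i to the fixed effect terms computed as in Section 3.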
In the next section we show how this works for an example using data collected in Iran as part of the World Fertility Survey.

6 Example Using Data from the World Fertility Survey

The Iran Fertility Survey (IFS) was a part of the World Fertility Survey (WFS). There is an extensive literature describing the details of the WFS; an edited volume serves as the primary summary publication on the WFS. The outcome modeled was whether or not a woman experienced a birth in each year during which she was exposed to the risk of childbearing; we refer to these woman-years as exposure-years. The model consisted of 4 fixed effect parameters in addition to the intercept. The 4 fixed effects were the age of the woman during each exposure-year (centered at the average in the IFS data), an indicator variable which is 1 for the first exposure-year of the interval and 0 otherwise, the woman's parity (number of previous children born) during each exposure-year, and the woman's completed education level (a 6-level categorical variable). In this example we also included a random effect parameter for each woman in the sample. We implemented a Metropolis algorithm for estimating the parameters of a mixed logistic model, in a Fortran program written specifically to handle event history data sets (Lewis 1993, 1994; Raftery, Lewis and Aghajanian 1994). We obtained the results shown in Table 1. The Metropolis algorithm was run for a total of 5,500 iterations, of which the first 500 were discarded as "burn-in".

Table 1: Regression parameter estimates for a sample from the IFS event history data set using the 4 fixed effect model. [Rows include Intercept, Centered age, and First exposure-year of interval; the numerical entries are illegible in this scan.]
We would also like to know how much these log marginal likelihood estimates vary. In the next section we demonstrate one way of assessing this.

7 Assessing the variance of the estimator using batching

To assess the variability of the Laplace-Metropolis estimator we adapted one of the most widely used methods for assessing the uncertainty involved in Monte Carlo estimation, the method of batch means, to the situation at hand. The use of this method for calculating the Monte Carlo variance of the mean of a functional of posterior simulation output has been discussed by numerous authors, including Hastings (1970), Schmeiser (1982) and Geyer (1992). The underlying idea is to divide an entire MCMC sample into a fairly small number of batches, say B, of equal size. The mean of the simulations within each batch is found, producing B estimates of the mean. Under relatively mild conditions these B estimates will be essentially independent. The sample of B estimates can then be used to provide a compound estimate of the mean along with an estimate of its variance. To apply this idea to assessing the uncertainty of the Laplace-Metropolis estimator we calculate separate Laplace-Metropolis estimates within each of B batches, take the mean of the B estimates as a compound Laplace-Metropolis estimate, and take the variance of the mean as an estimate of the variance of the compound Laplace-Metropolis estimate. In other words, within batch b we find the Laplace-Metropolis estimate of the log marginal likelihood using

M_b = (P/2) log{2π} + (1/2) log{|H_b*|} + log{f(θ_b*)} + Σ_{i=1}^{I} L_{b,i},    (9)

and then use the B estimates to calculate an overall Laplace-Metropolis estimate, M̄ = (1/B) Σ_{b=1}^{B} M_b.
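A sketch of the batching recipe, with `estimator` standing in for any function that maps a block of simulation output to a scalar estimate such as M_b (illustrative interface):

```python
import numpy as np

def batch_means(draws, estimator, B=10):
    """Batch-means assessment of a simulation-based estimator.

    draws: (J, P) post-burn-in simulation output, split into B
    consecutive (approximately) equal-size batches.
    Returns (compound estimate, estimated variance of that estimate).
    """
    batch_estimates = np.array([estimator(b) for b in np.array_split(draws, B)])
    M_bar = batch_estimates.mean()
    var_M_bar = batch_estimates.var(ddof=1) / B   # variance of the mean of B batches
    return M_bar, var_M_bar
```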
Before relying on these estimates one should check the behavior of the Markov chain. If the chain is not mixing well at some point, or is not sampling from its stationary distribution, this is usually apparent in sequential plots of one or more fixed effect realizations. The sequential plot of the intercept realizations is the plot which most often exhibits difficulties in the Markov chain. Figure 1 shows the sequential realizations of the intercept parameter for the 4 fixed effects model. The sequential plots of the other fixed effects were similar to the intercept plot. In this case the Markov chain seems to be mixing well and is likely to be sampling from its stationary distribution.

Figure 1: Sequential realizations of the intercept parameter.
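A two-line version of such a diagnostic plot (assuming matplotlib; purely illustrative):

```python
import matplotlib.pyplot as plt

def trace_plot(intercept_draws):
    """Sequential plot of one parameter's realizations (cf. Figure 1)."""
    plt.plot(intercept_draws, linewidth=0.5)
    plt.xlabel("iteration")
    plt.ylabel("intercept realization")
    plt.show()
```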
It is interesting to look at the estimated marginal densities for some of the parameters; relying on an erroneous distributional assumption, such as posterior normality here, is in practice quite likely to lead to misleading results.

Figure 2: Estimated marginal distribution of the intercept parameter.

Table 2 shows the estimated log marginal likelihood within each batch. The first batch consisted of the 501st through 5,500th iterations, the second batch of iterations 5,501 through 10,500, and so forth. Shown are the contribution of the within-batch maximized log-likelihood, the contribution of the other three terms of equation (9) to each within-batch log marginal likelihood, and, in the rightmost column, the within-batch log marginal likelihood; the predominant contribution of the maximized log-likelihood is evident.
Table 2: Example of calculating Laplace-Metropolis estimates for separate batches using the simulation output for the 4 fixed effects model. [Columns: Batch; within-batch log-likelihood Σ_{i=1}^{I} L_{b,i}; the other terms of equation (9); within-batch estimate M_b. The numerical entries are illegible in this scan.]

8 Discussion

Raftery (1995) originally proposed the Laplace-Metropolis estimator to get around limitations encountered when trying to use the Laplace method. He also proposed the use of the L1 center as an approximation to the posterior mode in situations where it is impractical to evaluate h(θ) at every simulated parameter vector.
The approximations obtained with the compound Laplace-Metropolis estimator are remarkably accurate. The standard Bayesian solution for comparing two models is to compute the Bayes factor, which is the ratio of the two models' marginal likelihoods. In this paper we demonstrated how we were able to obtain good approximations to marginal likelihoods in hierarchical models. We found a point estimate of the log marginal likelihood for an example model with 4 fixed effects. We have also fit a number of other models to the IFS event history data set. For example, when we add the husband's level of completed education to the model, the approximate log marginal likelihood falls by about 3.0. Hence the Bayes factor for comparing the model without husband's education against the model with husband's education is approximately e^{3.0} ≈ 20, providing evidence for the smaller model. Models containing other potential covariates may be similarly compared.

Bayes factors can be used not only to compare models containing different fixed effect parameters but also to assess whether the data provide evidence for or against the presence of random effects. In Section 7 we found a compound Laplace-Metropolis estimate of the log marginal likelihood for the 4 fixed effects model, which included a random effect for each woman in the sample. Should we have included random effects in the model? If we can calculate the log marginal likelihood of a model without random effects, we will be able to determine a Bayes factor for comparing the model with random effects to the model without random effects. The marginal likelihood for the model without random effects can be approximated using the Laplace method (Raftery 1993). For the 4 fixed effects model this gave a slightly smaller log marginal likelihood, so the Bayes factor for comparing the model with random effects against the model without random effects, the exponential of the difference of the two log marginal likelihoods, provides some weak evidence for the model incorporating random effects.
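The arithmetic of such comparisons is simply a difference of estimated log marginal likelihoods, exponentiated; a one-line illustrative helper:

```python
import numpy as np

def bayes_factor(log_ml_0, log_ml_1):
    """Bayes factor B_01 for model M_0 against model M_1 from their
    (estimated) log marginal likelihoods; e.g. a difference of 3.0
    gives exp(3.0) ~ 20, as in the husband's-education comparison."""
    return np.exp(log_ml_0 - log_ml_1)
```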
References

Geyer, C.J. (1992). "Practical Markov chain Monte Carlo (with discussion)." Statistical Science 7, 473–511.

Hastings, W.K. (1970). "Monte Carlo sampling methods using Markov chains and their applications." Biometrika 57, 97–109.

Kass, R.E. and Raftery, A.E. (1995). "Bayes factors." Journal of the American Statistical Association, to appear.

Lewis, S.M. (1993). "Contribution to the discussion of three papers on Gibbs sampling and related Markov chain Monte Carlo methods." Journal of the Royal Statistical Society, Series B 55, 79–81.

Lewis, S.M. (1994). Multilevel Modeling of Discrete Event History Data Using Markov Chain Monte Carlo Methods. Unpublished doctoral dissertation, Department of Statistics, University of Washington, Seattle, WA.

Meng, X.L. and Wong, W.H. (1993). "Simulating ratios of normalizing constants via a simple identity." Technical Report no. 365, Department of Statistics, University of Chicago.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953). "Equations of state calculations by fast computing machines." Journal of Chemical Physics 21, 1087–1092.

Naylor, J.C. and Smith, A.F.M. (1982). "Applications of a method for the efficient computation of posterior distributions." Applied Statistics 31, 214–225.

Newton, M.A. and Raftery, A.E. (1994). "Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion)." Journal of the Royal Statistical Society, Series B 56, 3–48.

Raftery, A.E. (1993). "Approximate Bayes factors and accounting for model uncertainty in generalized linear models." Technical Report no. 255, Department of Statistics, University of Washington.

Raftery, A.E. (1995). "Hypothesis testing and model selection via posterior simulation." To appear in Markov Chain Monte Carlo in Practice (W.R. Gilks, S. Richardson and D.J. Spiegelhalter, eds.). London: Chapman and Hall.
Raftery, A.E., Lewis, S.M., Aghajanian, A. and Kahn, M.J. (1994). "Event history modeling of World Fertility Survey data." Working Paper, Center for Studies in Demography and Ecology, University of Washington, Seattle, WA.

Raudenbush, S.W. (1988). "Educational applications of hierarchical linear models: A review." Journal of Educational Statistics 13, 85–116.

Rousseeuw, P.J. and van Zomeren, B.C. (1990). "Unmasking multivariate outliers and leverage points (with discussion)." Journal of the American Statistical Association 85, 633–651.

Schmeiser, B. (1982). "Batch size effects in the analysis of simulation output." Operations Research 30, 556–568.

Terrell, G.R. (1990). "The maximal smoothing principle in density estimation." Journal of the American Statistical Association 85, 470–477.

Tierney, L. and Kadane, J.B. (1986). "Accurate approximations for posterior moments and marginal densities." Journal of the American Statistical Association 81, 82–86.