# Estimation of Large Families of Bayes Factors from Markov Chain Output


Hani Doss
University of Florida

Abstract

We consider situations in Bayesian analysis where the prior is indexed by a hyperparameter taking on a continuum of values. We distinguish some arbitrary value of the hyperparameter, and consider the problem of estimating the Bayes factor for the model indexed by the hyperparameter vs. the model specified by the distinguished point, as the hyperparameter varies. We assume that we have Markov chain output from the posterior for a finite number of the priors, and develop a method for efficiently computing estimates of the entire family of Bayes factors. As an application of the ideas, we consider some commonly used hierarchical Bayesian models and show that the parametric assumptions in these models can be recast as assumptions regarding the prior. Therefore, our method can be used as a model selection criterion in a Bayesian framework. We illustrate our methodology through a detailed example involving Bayesian model selection.

Key words and phrases: Bayes factors, control variates, ergodicity, importance sampling, Markov chain Monte Carlo

independent and identically distributed since they form a stratified sample: we have exactly $n_s$ draws from $\nu_{h_s,y}$, $s = 1, \ldots, k$, a fact which causes no problems. We wish to estimate the integral

$$I_h = \int l_y(\theta)\,\nu_h(\theta)/m_{h_1}\, d\theta = B(h, h_1).$$

Define the functions

$$H_j(\theta) = l_y(\theta)\nu_{h_j}(\theta)/m_{h_j} - l_y(\theta)\nu_{h_1}(\theta)/m_{h_1}, \qquad j = 2, \ldots, k.$$

We have $\int H_j(\theta)\, d\theta = 0$, or equivalently $E_{p_a}\bigl(H_j(\theta)/p_a(\theta)\bigr) = 0$, where the subscript indicates that the expectation is taken with respect to the mixture distribution $p_a$. Therefore, for every $\beta = (\beta_2, \ldots, \beta_k)$ the estimate

$$\hat{I}_{h,\beta} = \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{l_y(\theta_i^{(l)})\,\nu_h(\theta_i^{(l)})/m_{h_1} - \sum_{j=2}^{k} \beta_j \bigl[ l_y(\theta_i^{(l)}) \bigl( \nu_{h_j}(\theta_i^{(l)})/m_{h_j} - \nu_{h_1}(\theta_i^{(l)})/m_{h_1} \bigr) \bigr]}{\sum_{s=1}^{k} a_s \nu_{h_s,y}(\theta_i^{(l)})}$$

is unbiased. As written, this estimate is not computable, because it involves the normalizing constants $m_{h_j}$, which are unknown, and also the likelihood $l_y(\theta)$, which may not be available. We rewrite it in computable form as

$$\hat{I}_{h,\beta} = \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{\nu_h(\theta_i^{(l)}) - \sum_{j=2}^{k} \beta_j \bigl[ \nu_{h_j}(\theta_i^{(l)})/d_j - \nu_{h_1}(\theta_i^{(l)}) \bigr]}{\sum_{s=1}^{k} a_s \nu_{h_s}(\theta_i^{(l)})/d_s}. \tag{2.9}$$

We would like to use the value of $\beta$, call it $\beta_{\mathrm{opt}}$, that minimizes the variance of $\hat{I}_{h,\beta}$, but this $\beta_{\mathrm{opt}}$ is generally unknown. As in Owen and Zhou (2000), we can do ordinary linear regression of the responses $Y^{(h)}_{i,l}$ on the predictors $Z^{(j)}_{i,l}$, where

$$Y^{(h)}_{i,l} = \frac{\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s \nu_{h_s}(\theta_i^{(l)})/d_s}, \qquad Z^{(j)}_{i,l} = \frac{\nu_{h_j}(\theta_i^{(l)})/d_j - \nu_{h_1}(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s \nu_{h_s}(\theta_i^{(l)})/d_s}, \quad j = 2, \ldots, k, \tag{2.10}$$

and all required quantities are available. We then use the least squares estimate $\hat{\beta}$, i.e. the estimate of $I_h$ is $\hat{I}_{h,\hat{\beta}}$. It is easy to see that $\hat{I}_{h,\hat{\beta}}$ is simply $\hat{\beta}_0$, the estimate of the intercept term in the bigger regression problem where we include the intercept term, i.e.

$$\hat{I}_{h,\hat{\beta}} = \hat{\beta}_0. \tag{2.11}$$

One can show that if the $k$ sequences are all iid sequences, then $\hat{\beta}$ converges to $\beta_{\mathrm{opt}}$, and $\hat{I}_{h,\hat{\beta}}$ is guaranteed to be at least as efficient as the naive estimator. But when we have Markov chains this is not the case, especially if the chains mix at different rates. In Section 2.3 we consider the estimates $\hat{\beta}$ and $\hat{I}_{h,\hat{\beta}}$ directly. In particular, we give a precise definition of the nonrandom value $\beta$ that $\hat{\beta}$ is estimating (it is $\beta^{(h)}_{\lim}$ in equation (A.3)), and show that the effect of using $\hat{\beta}$ instead of $\beta$ is asymptotically negligible.
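As a concrete illustration of the computable estimator (2.9) and its regression form, here is a minimal sketch in a toy conjugate-normal setting of our own devising (not from the paper): the prior $\nu_h$ is $N(h, 1)$ for a single observation $y \sim N(\theta, 1)$, so the marginals $m_h$, the ratios $d_s = B(h_s, h_1)$, and the exact Bayes factor are all available in closed form for checking, and iid posterior draws stand in for the Markov chain output.

```python
import numpy as np

rng = np.random.default_rng(0)

def npdf(x, mu, sd):
    # normal density, used for the priors nu_h and the marginals m_h
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Toy model (our assumption, not from the paper): one observation
# y ~ N(theta, 1) with prior nu_h = N(h, 1), so m_h = N(y; h, sqrt(2)),
# the posterior under h is N((y + h)/2, 1/2), and the exact Bayes factor
# B(h, h_1) = m_h / m_{h_1} is available for checking.
y = 1.3
def nu(theta, h):
    return npdf(theta, h, 1.0)
def m(h):
    return npdf(y, h, np.sqrt(2.0))

h_grid = [0.0, 1.0]                      # skeleton points h_1, h_2
n_l = [5000, 5000]
n = sum(n_l)
a = np.array(n_l) / n                    # a_s = n_s / n
d = np.array([m(h) / m(h_grid[0]) for h in h_grid])   # d_s = B(h_s, h_1), "known"

# stand-in for Markov chain output: iid draws from each posterior
theta = np.concatenate([rng.normal((y + h) / 2.0, np.sqrt(0.5), size=nl)
                        for h, nl in zip(h_grid, n_l)])

denom = sum(a_s * nu(theta, h_s) / d_s for a_s, h_s, d_s in zip(a, h_grid, d))

h_new = 0.5                              # hyperparameter at which B(h, h_1) is wanted
Y = nu(theta, h_new) / denom                              # responses Y^(h)
Z = np.column_stack([(nu(theta, h_grid[j]) / d[j] - nu(theta, h_grid[0])) / denom
                     for j in range(1, len(h_grid))])     # predictors Z^(j)

# OLS of Y on the Z's with an intercept; the fitted intercept is the
# estimate of B(h, h_1)
X = np.column_stack([np.ones(n), Z])
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0][0]
print(B_hat, m(h_new) / m(h_grid[0]))    # estimate vs. exact Bayes factor
```

With $h = 0.5$ lying between the skeleton points $h_1 = 0$ and $h_2 = 1$, the control-variate structure makes the intercept estimate track the exact Bayes factor closely.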

It is natural to consider the problem of estimating $\beta_{\mathrm{opt}}$ in the Markov chain setting. Actually, before thinking about minimizing the variance of (2.9) with respect to $\beta$, one should first note the following. The constants $a_s = n_s/n$, $s = 1, \ldots, k$, used in forming the values $Y^{(h)}_{i,l}$ are sensible in the iid setting, but when dealing with Markov chains one would want to replace $n_s$ with an effective sample size, as discussed by Meng and Wong (1996). Therefore, the real problem is two-fold:

1. How do we find optimal (or good) values to use in place of the $a_s$'s in the $Y^{(h)}_{i,l}$'s?
2. Using the $Y^{(h)}_{i,l}$'s based on these values, how do we estimate the value of $\beta$ that minimizes the variance of (2.9)?

Both problems appear to be very difficult. Intuitively at least, the method described here should perform well if the mixing rates of the Markov chains are not very different. But in any case, the results in Section 2.3 show that, whether or not $\hat{I}_{h,\hat{\beta}}$ is optimal, it is a consistent and asymptotically normal estimator whose variance can be estimated consistently. Note that if we do not use control variates, our estimate is just

$$\frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s \nu_{h_s}(\theta_i^{(l)})/d_s},$$

which is exactly (2.1).

## Reduction in Variance from Using the Control Variates

Consider the linear combination of the responses $Y^{(h)}$ and predictors $Z^{(j)}$ given by

$$L_1 = \sum_{j=2}^{k} a_j Z^{(j)} + Y^{(h)}.$$

(We are dropping the subscripts $i, l$.) A calculation shows that if $h = h_1$ then $L_1 = 1$, meaning that we have an estimate with zero variance. Similarly, for $t = 2, \ldots, k$, let $L_t$ be the linear combination given by

$$L_t = \sum_{j=2}^{k} a_j Z^{(j)} + (1/d_t)\, Y^{(h)} - Z^{(t)}.$$

If $h = h_t$, then $L_t = 1$. Thus if $h \in \{h_1, \ldots, h_k\}$, our estimate of the Bayes factor $B(h, h_1)$ has zero variance. This is not surprising since, after all, we are assuming that we know $B(h_j, h_1)$ for $j = 1, \ldots, k$; however, this does indicate that if we use these control variates, our estimate will be very precise as long as $h$ is close to at least one of the $h_j$'s. This advantage does not exist if we use the plain estimate (2.1).
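The identity $L_1 = 1$ at $h = h_1$ is purely algebraic, so it holds at every point $\theta$ regardless of the sampling distribution. The sketch below checks it numerically with toy normal densities, weights $a_s$ summing to one, and arbitrary $d_j$ with $d_1 = 1$ (all choices ours, for illustration only):

```python
import numpy as np

# Numerical check that L_1 = sum_{j>=2} a_j Z^(j) + Y^(h) is identically 1
# when h = h_1, whatever the d_j's (with d_1 = 1) and the points theta.
def npdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
theta = rng.normal(size=1000)        # arbitrary points: the identity is pointwise
h_grid = [0.0, 1.0, -0.5]            # skeleton points h_1, h_2, h_3 (toy choices)
a = np.array([0.5, 0.3, 0.2])        # weights a_s, summing to one
d = np.array([1.0, 1.4, 0.8])        # d_1 = 1; the other d_j are arbitrary here

nus = np.stack([npdf(theta, h, 1.0) for h in h_grid])   # nu_{h_s}(theta)
denom = (a[:, None] * nus / d[:, None]).sum(axis=0)     # sum_s a_s nu_{h_s}/d_s

Y_h1 = nus[0] / denom                                   # Y^(h) at h = h_1
Z = (nus[1:] / d[1:, None] - nus[0]) / denom            # Z^(j), j = 2,...,k
L1 = (a[1:, None] * Z).sum(axis=0) + Y_h1
print(np.max(np.abs(L1 - 1.0)))      # zero up to floating-point rounding
```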
The intercept term in the regression of the $Y^{(h)}_{i,l}$'s on the $Z^{(j)}_{i,l}$'s is simply a linear combination of the form

$$\hat{\beta}_0 = \sum_{l=1}^{k} \sum_{i=1}^{n_l} w_{i,l}\, Y^{(h)}_{i,l}. \tag{2.12}$$
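The weights in a representation of this kind can be written explicitly from the normal equations: with design matrix $X = [\mathbf{1}, Z]$, the intercept is $e_1'(X'X)^{-1}X'Y$, so $w$ is the first row of $(X'X)^{-1}X'$. A short sketch with arbitrary toy data (our construction) confirms this, and also that the weights sum to one because the first column of $X$ is constant:

```python
import numpy as np

# Sketch (toy data of our own): the OLS intercept is a fixed linear
# combination of the responses, beta0_hat = sum_i w_i Y_i, with w the
# first row of (X'X)^{-1} X'.
rng = np.random.default_rng(2)
n, k = 200, 3
Z = rng.normal(size=(n, k - 1))          # stand-ins for the predictors Z^(j)
Y = rng.normal(size=n)                   # stand-ins for the responses Y^(h)

X = np.column_stack([np.ones(n), Z])     # design matrix with intercept column
w = np.linalg.solve(X.T @ X, X.T)[0]     # first row of (X'X)^{-1} X'
beta0_hat = np.linalg.lstsq(X, Y, rcond=None)[0][0]

print(np.isclose(w @ Y, beta0_hat))      # the weights reproduce the intercept
print(np.isclose(w.sum(), 1.0))          # and they sum to one
```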