Bayesian Inference and the Parametric Bootstrap
Bradley Efron, Stanford University
Importance Sampling for Bayes Posterior Distributions

Newton and Raftery (1994, JRSS-B): nonparametric bootstrap a good choice (actually used a smoothed nonparametric bootstrap)

Today: the parametric bootstrap
- Good computational properties when applicable
- Connection between Bayes and frequentist inference

Bayesian Inference 1
Student Score Data (Mardia, Kent and Bibby)

n = 22 students' scores on two tests: mechanics, vectors
Data y = (y₁, y₂, ..., y₂₂) with yᵢ = (mecᵢ, vecᵢ)
Parameter of interest: θ = corr(mec, vec)
Sample correlation coefficient: θ̂ = 0.498 ± ??
[Figure: scatterplot of the 22 students' scores, mechanics score (x-axis) vs. vectors score (y-axis); sample correlation coefficient .498 ± ??. Sampled from 88 students, Mardia, Kent and Bibby.]
R. A. Fisher (1915)

Density:
f_θ(θ̂) = ((n − 2)/π) · (1 − θ²)^((n−1)/2) · (1 − θ̂²)^((n−4)/2) · ∫₀^∞ [cosh(w) − θθ̂]^(−(n−1)) dw

Bivariate normal: yᵢ ~ind N₂(μ, Σ), i = 1, 2, ..., n
95% confidence limits (from z transform): θ ∈ (0.084, 0.754)
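Fisher's density can be checked numerically. A minimal sketch with SciPy, using n = 22 and θ = .498 from the student score data (the choice of `quad` for both integrals is an implementation detail, not from the slides):

```python
import numpy as np
from scipy.integrate import quad

def fisher_corr_density(r, theta, n):
    """Fisher's (1915) density of the sample correlation r for a
    bivariate normal sample of size n with true correlation theta."""
    const = (n - 2) / np.pi
    body = (1 - theta**2) ** ((n - 1) / 2) * (1 - r**2) ** ((n - 4) / 2)
    integral, _ = quad(lambda w: (np.cosh(w) - theta * r) ** (-(n - 1)),
                       0, np.inf)
    return const * body * integral

# Sanity check: the density integrates to 1 over r in (-1, 1)
total, _ = quad(lambda r: fisher_corr_density(r, 0.498, 22), -1, 1)
```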
[Figure: Fisher density for the correlation coefficient when corr = .498, overlaid on a histogram of 10,000 parametric bootstrap replications; red triangles mark the Fisher 95% confidence limits .084 and .754.]
Parametric Bootstrap Computations

Assume yᵢ ~ind N₂(μ, Σ); estimate μ̂ and Σ̂ by MLE
Bootstrap sample: yᵢ* ~ind N₂(μ̂, Σ̂), i = 1, 2, ..., 22
Bootstrap replication: θ̂* = corr coeff for y* = (y₁*, y₂*, ..., y₂₂*)
Bootstrap standard deviation: sd(θ̂*) = 0.168 (B = 10,000)
Percentile 95% confidence limits: (0.109, 0.761)
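The computation above can be sketched in a few lines. The slides do not reproduce the fitted μ̂ and Σ̂, so stand-in MLEs with unit variances and the observed correlation .498 are assumed here (the correlation coefficient is invariant to means and variances, so the bootstrap distribution of θ̂* is unaffected):

```python
import numpy as np

rng = np.random.default_rng(0)
n, B, theta_hat = 22, 10_000, 0.498

# Stand-in MLEs (assumption: the actual muhat, Sigmahat are not on the slide)
mu_hat = np.zeros(2)
Sigma_hat = np.array([[1.0, theta_hat],
                      [theta_hat, 1.0]])

# B parametric bootstrap replications of the correlation coefficient
theta_star = np.empty(B)
for b in range(B):
    y_star = rng.multivariate_normal(mu_hat, Sigma_hat, size=n)
    theta_star[b] = np.corrcoef(y_star.T)[0, 1]

sd_hat = theta_star.std()                        # slide reports 0.168
lo, hi = np.percentile(theta_star, [2.5, 97.5])  # slide reports (0.109, 0.761)
```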
Better Bootstrap Confidence Limits (BCa)

Idea: reweighting the B bootstrap replications improves coverage
Weights involve z₀ (bias correction) and a (acceleration):

W_BCa(θ) = φ(z_θ/(1 + a·z_θ) − z₀) / [(1 + a·z_θ)² φ(z_θ + z₀)]
where z_θ = Φ⁻¹(Ĝ(θ)) − z₀, Ĝ = bootstrap cdf

Reweighting the B = 10,000 bootstraps gave weighted percentiles (0.075, 0.748); Fisher: (0.084, 0.754)
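A sketch of the weight function as written above, with the empirical bootstrap cdf standing in for Ĝ (the grid clipping is an implementation detail, added so Φ⁻¹ stays finite). A useful sanity check: with z₀ = a = 0 there is no correction and every weight equals 1.

```python
import numpy as np
from scipy.stats import norm

def bca_weights(theta_star, theta_grid, z0, a):
    """BCa reweighting factor W(theta) from the slide's formula,
    with the empirical cdf of the replications as Ghat."""
    G = np.searchsorted(np.sort(theta_star), theta_grid) / len(theta_star)
    G = np.clip(G, 1e-6, 1 - 1e-6)          # keep Phi^-1 finite
    z = norm.ppf(G) - z0                    # z_theta
    num = norm.pdf(z / (1 + a * z) - z0)
    den = (1 + a * z) ** 2 * norm.pdf(z + z0)
    return num / den

# z0 = a = 0: no bias correction, no acceleration -> unit weights
rng = np.random.default_rng(1)
ts = rng.normal(0.5, 0.17, 10_000)
w = bca_weights(ts, np.array([0.2, 0.5, 0.8]), 0.0, 0.0)
```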
Bayes Posterior Intervals

Prior π(θ); posterior expectation for t(θ):
E{t(θ) | θ̂} = ∫ t(θ) π(θ) f_θ(θ̂) dθ / ∫ π(θ) f_θ(θ̂) dθ

Conversion factor: ratio of likelihood to bootstrap density,
R(θ) = f_θ(θ̂) / f_θ̂(θ)    (so R = 1 at θ = θ̂)

Bootstrap integrals:
E{t(θ) | θ̂} = ∫ t(θ) π(θ) R(θ) f_θ̂(θ) dθ / ∫ π(θ) R(θ) f_θ̂(θ) dθ
Bootstrap Estimation of Bayes Expectation E{t(θ) | θ̂}

Parametric bootstrap replications: θ₁, θ₂, ..., θᵢ, ..., θ_B ~ f_θ̂(·)
With tᵢ = t(θᵢ), πᵢ = π(θᵢ), Rᵢ = R(θᵢ):
Ê{t(θ) | θ̂} = Σᵢ₌₁^B tᵢ πᵢ Rᵢ / Σᵢ₌₁^B πᵢ Rᵢ

Reweighting: weight πᵢ Rᵢ on θᵢ
Importance sampling estimate: Ê{t(θ) | θ̂} → E{t(θ) | θ̂} as B → ∞
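The estimator is a one-liner. The toy check below uses a normal translation family θ̂ ~ N(θ, 1), where by symmetry R(θ) = f_θ(θ̂)/f_θ̂(θ) = 1; with a flat prior the importance-sampling estimate must then reduce to the plain bootstrap average (the family and prior are chosen here for the check, not taken from the slides):

```python
import numpy as np

def bayes_estimate(theta_star, t, prior, conv):
    """Importance-sampling estimate Ehat{t(theta) | theta_hat}:
    each bootstrap replication theta_i gets weight pi_i * R_i."""
    w = prior(theta_star) * conv(theta_star)
    return np.sum(t(theta_star) * w) / np.sum(w)

# Toy check: flat prior, R = 1 -> plain bootstrap mean (here ~0.5)
rng = np.random.default_rng(2)
theta_star = rng.normal(0.5, 1.0, 50_000)   # reps under theta_hat = 0.5
est = bayes_estimate(theta_star, t=lambda th: th,
                     prior=lambda th: np.ones_like(th),
                     conv=lambda th: np.ones_like(th))
```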
Jeffreys Priors

Harold Jeffreys (1930s): theory of uninformative prior distributions ("invariant", "objective", "reference", ...)
For density family f_θ(y), θ ∈ R^p:
π(θ) = c |I(θ)|^(1/2),   I(θ) = Fisher information matrix
Normal correlation: π(θ) = 1/(1 − θ²)
[Figure: importance-sampling posterior density for the correlation (black) using Jeffreys prior π(θ) = 1/(1 − θ²); BCa weighted density (green); raw bootstrap density (red). Triangles show 95% posterior limits.]
Multiparameter Exponential Families

β̂ ∈ R^p: p-dimensional sufficient statistic
β = E{β̂}: p-dimensional parameter vector
(Poisson: e^(−λ) λ^x / x! has β̂ = x, β = λ)

Hoeffding: f_β(β̂) = f_β̂(β̂) e^(−D(β̂, β)/2)
Deviance: D(β₁, β₂) = 2 E_β₁ log{ f_β₁(β̂) / f_β₂(β̂) }
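In the Poisson case Hoeffding's formula is an exact identity, which makes a compact numerical check (x = 7, λ = 4 are arbitrary illustrative values):

```python
import numpy as np
from scipy.stats import poisson

def poisson_deviance(lam1, lam2):
    """D(lam1, lam2) = 2 * E_lam1 log{f_lam1(x) / f_lam2(x)} for Poisson."""
    return 2.0 * (lam1 * np.log(lam1 / lam2) - (lam1 - lam2))

# Hoeffding: f_beta(betahat) = f_betahat(betahat) * exp(-D(betahat, beta)/2),
# with betahat = x (the MLE) and beta = lambda
x, lam = 7, 4.0
direct = poisson.pmf(x, lam)
hoeffding = poisson.pmf(x, x) * np.exp(-poisson_deviance(x, lam) / 2.0)
```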
Conversion Factor in Exponential Families

Conversion factor R(β) = f_β(β̂) / f_β̂(β) (with β̂ fixed):
R(β) = ν(β) e^Δ(β)
where Δ(β) = [D(β, β̂) − D(β̂, β)] / 2
and ν(β) = f_β̂(β̂) / f_β(β) ≈ 1/Jeffreys prior (Laplace)

Jeffreys prior: π(β) R(β) ≈ c e^Δ(β)
Example: Multivariate Normal

yᵢ ~ind N_d(μ, Σ), i = 1, 2, ..., n

Δ = n [ ½ (μ − μ̂)′(Σ̂⁻¹ − Σ⁻¹)(μ − μ̂) + ½ tr(ΣΣ̂⁻¹ − Σ̂Σ⁻¹) + log(|Σ̂|/|Σ|) ]

Eigenratio (student score data): θ = λ₁/(λ₁ + λ₂), θ̂ = λ̂₁/(λ̂₁ + λ̂₂) = 0.793 ± ??
Bootstrap: yᵢ* ~ind N_d(μ̂, Σ̂), i = 1, 2, ..., n → θ₁*, θ₂*, ..., θ_B*
[Figure: density estimates for the student score eigenratio (B = 10,000); raw bootstrap (red), Jeffreys 5-dimensional prior (black), BCa (green; z₀ = .182, a = 0). Triangles show 95% limits: BCa (.598, .890), Bayes (.650, .908).]
[Figure: reweighting the parametric bootstrap points; replications near β̂ are reweighted proportionally to e^Δ.]
Prostate Cancer Study (Singh et al., 2002)

Microarray study: 102 men, 52 prostate cancer patients vs. 50 healthy controls; 6033 genes
zᵢ: test statistic for H₀ᵢ: no difference; under H₀ᵢ, zᵢ ~ N(0, 1)
Goal: identify genes involved in prostate cancer
[Figure: histogram of prostate cancer z-values for 6033 genes, 52 patients vs. 50 healthy controls; N(0,1) curve in black, log-polynomial degree-4 Poisson fit in red.]
Poisson Regression Models

Histogram: y_j = #{zᵢ ∈ bin j}, x_j = midpoint of bin j, j = 1, 2, ..., J
Poisson model: y_j ~ind Poi(μ_j) with log(μ_j) = poly(x_j, degree = m)
Exponential family of degree p = m + 1

With η_j = log(μ_j):
Δ = (η − η̂)′(μ + μ̂) − 2 Σ_j (μ_j − μ̂_j)
Parametric False Discovery Rate Estimates

glm(y ~ poly(x, 4), poisson) → {μ̂_j}   ("log poly 4")
θ = Fdr(3) = [1 − Φ(3)] / [1 − F̂(3)] ≈ Pr{null | z ≥ 3}
(F̂ is the smoothed CDF estimate)
Model 4: Fdr(3) = 0.192 ± ??

Parametric bootstrap: y_j* ~ind Poi(μ̂_j) → {μ̂_j*} → Fdr*(3)
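A simplified sketch of the Fdr(3) ratio. The slides smooth F̂ with the Poisson-regression fit; here the raw empirical survival of the z-values stands in for 1 − F̂, and the z-values themselves are simulated from an assumed 95%/5% null/signal mixture (not the actual prostate data):

```python
import numpy as np
from scipy.stats import norm

def fdr_hat(z0, z):
    """Estimated Pr{null | z >= z0} = [1 - Phi(z0)] / [1 - Fhat(z0)].
    Fhat here is the raw empirical cdf; the slides instead use a
    smoothed glm(y ~ poly(x, 4), poisson) estimate."""
    return norm.sf(z0) / np.mean(z >= z0)

# Simulated stand-in for the 6033 z-values (assumed mixture for illustration)
rng = np.random.default_rng(3)
null = rng.standard_normal(5733)        # 95% true nulls, N(0,1)
signal = rng.normal(3.0, 1.0, 300)      # 5% non-nulls, shifted
z = np.concatenate([null, signal])
est = fdr_hat(3.0, z)   # small: most z >= 3 come from the signal genes
```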
[Figure: Fdr(3) estimation for Model 4, prostate data, from 4000 parametric bootstraps; Jeffreys Bayes posterior density (black), raw bootstrap (red), BCa (green). Triangles show 95% Jeffreys/BCa limits (.154, .241).]
[Figure: comparison of Model 4 (line histogram) with Model 8 (solid); 4000 parametric bootstrap replications of Fdr(3). Triangles are 95% BCa limits: (.141, .239) vs. (.154, .241).]
Model Selection: AIC = Deviance + 2·df

Model    AIC    boot counts   Bayes counts
M2     142.6        0            0.0
M3     143.1        0            0.0
M4      73.3     1266         1458 (36%)
M5      74.3      415          462 (12%)
M6      75.8      215          197 (5%)
M7      77.8       54           67 (2%)
M8      75.6     2050         1816 (45%)

Model 8, Jeffreys prior
Internal Accuracy of Bayes Estimates

How large a B for computing Ê{t(β) | β̂} = Σᵢ tᵢπᵢRᵢ / Σᵢ πᵢRᵢ?
Define rᵢ = πᵢRᵢ and sᵢ = tᵢπᵢRᵢ, with ĉ_ss = Σ(sᵢ − s̄)²/B², etc.:
CV{Ê}² = ĉ_ss/s̄² − 2 ĉ_sr/(s̄ r̄) + ĉ_rr/r̄²

Example: t = Fdr(3), 4000 parametric bootreps, Jeffreys Bayes
Model 4: Ê = 0.193, ĈV = 0.0019
Model 8: Ê = 0.179, ĈV = 0.0025
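The delta-method CV above is cheap to compute from the bootstrap output. A sketch, with the sanity check that unit weights (πᵢRᵢ ≡ 1) collapse the formula to the familiar sd(t)/(mean(t)·√B):

```python
import numpy as np

def internal_cv(t, pi, R):
    """Delta-method coefficient of variation of the ratio estimate
    Ehat = sum(t_i*pi_i*R_i) / sum(pi_i*R_i)."""
    B = len(t)
    r = pi * R
    s = t * r
    rbar, sbar = r.mean(), s.mean()
    c_ss = np.mean((s - sbar) ** 2) / B     # chat_ss = sum(s_i - sbar)^2 / B^2
    c_rr = np.mean((r - rbar) ** 2) / B
    c_sr = np.mean((s - sbar) * (r - rbar)) / B
    cv2 = c_ss / sbar**2 - 2 * c_sr / (sbar * rbar) + c_rr / rbar**2
    return np.sqrt(max(cv2, 0.0))

# Unit weights: CV reduces to sd(t) / (mean(t) * sqrt(B))
rng = np.random.default_rng(4)
t = rng.normal(0.19, 0.02, 4000)
cv = internal_cv(t, np.ones(4000), np.ones(4000))
```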
Sampling Variability of Bayes Estimates

How much would Ê{t(β) | β̂} vary for new β̂'s? (i.e., frequentist properties of Bayes estimates)
Need to bootstrap the parametric bootstrap calculations for Ê{t(β) | β̂}
Shortcut: bootstrap-after-bootstrap
Bootstrap-after-Bootstrap Calculations

β₁, β₂, ..., βᵢ, ..., β_B ~ f_β̂(·): the original bootstrap reps under β̂
Want bootreps under some other value γ̂
Idea: reweight the originals
Weight Q_γ̂(β) = f_β(γ̂) / f_β(β̂):
Ê{t | γ̂} = Σᵢ₌₁^B tᵢ πᵢ Rᵢ Q_γ̂(βᵢ) / Σᵢ₌₁^B πᵢ Rᵢ Q_γ̂(βᵢ)
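A sketch of the reweighting step. The toy check again uses a normal translation family β̂ ~ N(β, 1) (an assumption for illustration): reps drawn under β̂ = 0, reweighted toward γ̂ = 0.5 with flat prior and R ≡ 1, should recover posterior mean ≈ γ̂, since the exponential tilting shifts the N(0, 1) sample mean by exactly γ̂ − β̂:

```python
import numpy as np

def bab_estimate(t, pi_R, logQ):
    """Bootstrap-after-bootstrap: reuse replications beta_i drawn under
    betahat to estimate Ehat{t | gammahat}, adding the extra weight
    Q_gamma(beta_i) = f_beta_i(gammahat) / f_beta_i(betahat)
    (passed on the log scale for numerical stability)."""
    w = pi_R * np.exp(logQ - logQ.max())
    return np.sum(t * w) / np.sum(w)

# Toy check: betahat = 0, gammahat = 0.5, flat prior, R = 1.
# log Q_gamma(beta) = -(gamma - beta)^2/2 + (betahat - beta)^2/2
rng = np.random.default_rng(5)
beta = rng.normal(0.0, 1.0, 100_000)    # original reps under betahat = 0
gamma = 0.5
logQ = -0.5 * (gamma - beta) ** 2 + 0.5 * (0.0 - beta) ** 2
est = bab_estimate(beta, np.ones_like(beta), logQ)
```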
Bootstrap-after-Bootstrap Standard Errors (BAB)

(1) γ₁, γ₂, ..., γ_K ~ f_β̂(·)
(2) Ê_k = Ê{t | γ_k}
(3) se{Ê} = [Σ_k (Ê_k − Ê)² / K]^(1/2)

Example: Model 4, 97.5% credible limit for Fdr(3) = 0.241 ± ??
[Figure: 400 BAB replications of the 97.5% credible upper limit (.240) for Fdr(3); line histogram: the same for the BCa upper limit. BAB standard errors: .030 (Bayes), .034 (BCa); correlation .98.]
BAB SDs for Bayes Model Selection Proportions

Model   Bayes post. prob.   BAB SD
M4           36%            ± 20%
M5           12%            ± 14%
M6            5%            ±  8%
M7            2%            ±  6%
M8           45%            ± 27%

Model averaging?
A Few Conclusions

- The parametric bootstrap is closely related to objective Bayes. (That's why it's a good importance-sampling choice.)
- When it applies, the parboot approach has both computational and interpretational advantages over MCMC/Gibbs.
- Objective Bayes analyses should be checked frequentistically.
References

Bootstrap: DiCiccio & Efron (1996), Statist. Sci., 189–228
Bayes and Bootstrap: Newton & Raftery (1994), JRSS-B, 3–48 (see Discussion!)
Objective Priors: Berger & Bernardo (1989), JASA, 200–207; Ghosh (2011), Statist. Sci., 187–202
Exponential Families: Efron (2010), Large-Scale Inference, Appendix 1
MCMC and Bootstrap: Efron (2011), J. Biopharm. Statist., 1052–1062