Bayesian Inference and the Parametric Bootstrap
Bradley Efron, Stanford University
Importance Sampling for Bayes Posterior Distributions

Newton and Raftery (1994, JRSS-B): nonparametric bootstrap a good choice (actually used a smoothed nonparametric bootstrap)

Today: the parametric bootstrap
- Good computational properties when applicable
- Connection between Bayes and frequentist inference

Bayesian Inference 1
Student Score Data (Mardia, Kent and Bibby)

n = 22 students' scores on two tests: mechanics, vectors
Data y = (y₁, y₂, ..., y₂₂) with yᵢ = (mecᵢ, vecᵢ)
Parameter of interest: θ = corr(mec, vec)
Sample correlation coefficient: θ̂ = 0.498 ± ??
[Figure: scatterplot of the 22 students' scores, mechanics score (x-axis) vs. vectors score (y-axis); sample correlation coefficient .498 ± ??. Sampled from 88 students, Mardia, Kent and Bibby.]
R. A. Fisher (1915)

Density:
f_θ(θ̂) = ((n − 2)/π) · (1 − θ²)^((n−1)/2) · (1 − θ̂²)^((n−4)/2) · ∫₀^∞ [cosh(w) − θθ̂]^(−(n−1)) dw

Bivariate normal: yᵢ ~ind N₂(μ, Σ), i = 1, 2, ..., n
95% confidence limits (from z transform): θ ∈ (0.084, 0.754)
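Fisher's density can be checked numerically. A minimal sketch with SciPy, using n = 22 and θ = .498 from the student score data (the choice of `quad` for both integrals is an implementation detail, not from the slides):

```python
import numpy as np
from scipy.integrate import quad

def fisher_corr_density(r, theta, n):
    """Fisher's (1915) density of the sample correlation r for a
    bivariate normal sample of size n with true correlation theta."""
    const = (n - 2) / np.pi
    body = (1 - theta**2) ** ((n - 1) / 2) * (1 - r**2) ** ((n - 4) / 2)
    integral, _ = quad(lambda w: (np.cosh(w) - theta * r) ** (-(n - 1)),
                       0, np.inf)
    return const * body * integral

# Sanity check: the density integrates to 1 over r in (-1, 1)
total, _ = quad(lambda r: fisher_corr_density(r, 0.498, 22), -1, 1)
```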
[Figure: Fisher density for the correlation coefficient when corr = .498, overlaid on a histogram of 10,000 parametric bootstrap replications; red triangles mark the Fisher 95% confidence limits .084 and .754.]
Parametric Bootstrap Computations

Assume yᵢ ~ind N₂(μ, Σ); estimate μ̂ and Σ̂ by MLE
Bootstrap sample: yᵢ* ~ind N₂(μ̂, Σ̂), i = 1, 2, ..., 22
Bootstrap replication: θ̂* = corr coeff for y* = (y₁*, y₂*, ..., y₂₂*)
Bootstrap standard deviation: sd(θ̂*) = 0.168 (B = 10,000)
Percentile 95% confidence limits: (0.109, 0.761)
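The computation above can be sketched in a few lines. The slides do not reproduce the fitted μ̂ and Σ̂, so stand-in MLEs with unit variances and the observed correlation .498 are assumed here (the correlation coefficient is invariant to means and variances, so the bootstrap distribution of θ̂* is unaffected):

```python
import numpy as np

rng = np.random.default_rng(0)
n, B, theta_hat = 22, 10_000, 0.498

# Stand-in MLEs (assumption: the actual muhat, Sigmahat are not on the slide)
mu_hat = np.zeros(2)
Sigma_hat = np.array([[1.0, theta_hat],
                      [theta_hat, 1.0]])

# B parametric bootstrap replications of the correlation coefficient
theta_star = np.empty(B)
for b in range(B):
    y_star = rng.multivariate_normal(mu_hat, Sigma_hat, size=n)
    theta_star[b] = np.corrcoef(y_star.T)[0, 1]

sd_hat = theta_star.std()                        # slide reports 0.168
lo, hi = np.percentile(theta_star, [2.5, 97.5])  # slide reports (0.109, 0.761)
```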
Better Bootstrap Confidence Limits (BCa)

Idea: reweighting the B bootstrap replications improves coverage
Weights involve z₀ (bias correction) and a (acceleration):

W_BCa(θ) = φ(z_θ/(1 + a·z_θ) − z₀) / [(1 + a·z_θ)² φ(z_θ + z₀)]
where z_θ = Φ⁻¹(Ĝ(θ)) − z₀, Ĝ = bootstrap cdf

Reweighting the B = 10,000 bootstraps gave weighted percentiles (0.075, 0.748); Fisher: (0.084, 0.754)
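A sketch of the weight function as written above, with the empirical bootstrap cdf standing in for Ĝ (the grid clipping is an implementation detail, added so Φ⁻¹ stays finite). A useful sanity check: with z₀ = a = 0 there is no correction and every weight equals 1.

```python
import numpy as np
from scipy.stats import norm

def bca_weights(theta_star, theta_grid, z0, a):
    """BCa reweighting factor W(theta) from the slide's formula,
    with the empirical cdf of the replications as Ghat."""
    G = np.searchsorted(np.sort(theta_star), theta_grid) / len(theta_star)
    G = np.clip(G, 1e-6, 1 - 1e-6)          # keep Phi^-1 finite
    z = norm.ppf(G) - z0                    # z_theta
    num = norm.pdf(z / (1 + a * z) - z0)
    den = (1 + a * z) ** 2 * norm.pdf(z + z0)
    return num / den

# z0 = a = 0: no bias correction, no acceleration -> unit weights
rng = np.random.default_rng(1)
ts = rng.normal(0.5, 0.17, 10_000)
w = bca_weights(ts, np.array([0.2, 0.5, 0.8]), 0.0, 0.0)
```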
Bayes Posterior Intervals

Prior π(θ); posterior expectation for t(θ):
E{t(θ) | θ̂} = ∫ t(θ) π(θ) f_θ(θ̂) dθ / ∫ π(θ) f_θ(θ̂) dθ

Conversion factor: ratio of likelihood to bootstrap density,
R(θ) = f_θ(θ̂) / f_θ̂(θ)    (so R = 1 at θ = θ̂)

Bootstrap integrals:
E{t(θ) | θ̂} = ∫ t(θ) π(θ) R(θ) f_θ̂(θ) dθ / ∫ π(θ) R(θ) f_θ̂(θ) dθ
Bootstrap Estimation of Bayes Expectation E{t(θ) | θ̂}

Parametric bootstrap replications: θ₁, θ₂, ..., θᵢ, ..., θ_B ~ f_θ̂(·)
With tᵢ = t(θᵢ), πᵢ = π(θᵢ), Rᵢ = R(θᵢ):
Ê{t(θ) | θ̂} = Σᵢ₌₁^B tᵢ πᵢ Rᵢ / Σᵢ₌₁^B πᵢ Rᵢ

Reweighting: weight πᵢ Rᵢ on θᵢ
Importance sampling estimate: Ê{t(θ) | θ̂} → E{t(θ) | θ̂} as B → ∞
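The estimator is a one-liner. The toy check below uses a normal translation family θ̂ ~ N(θ, 1), where by symmetry R(θ) = f_θ(θ̂)/f_θ̂(θ) = 1; with a flat prior the importance-sampling estimate must then reduce to the plain bootstrap average (the family and prior are chosen here for the check, not taken from the slides):

```python
import numpy as np

def bayes_estimate(theta_star, t, prior, conv):
    """Importance-sampling estimate Ehat{t(theta) | theta_hat}:
    each bootstrap replication theta_i gets weight pi_i * R_i."""
    w = prior(theta_star) * conv(theta_star)
    return np.sum(t(theta_star) * w) / np.sum(w)

# Toy check: flat prior, R = 1 -> plain bootstrap mean (here ~0.5)
rng = np.random.default_rng(2)
theta_star = rng.normal(0.5, 1.0, 50_000)   # reps under theta_hat = 0.5
est = bayes_estimate(theta_star, t=lambda th: th,
                     prior=lambda th: np.ones_like(th),
                     conv=lambda th: np.ones_like(th))
```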
Jeffreys Priors

Harold Jeffreys (1930s): theory of uninformative prior distributions ("invariant", "objective", "reference", ...)
For density family f_θ(y), θ ∈ R^p:
π(θ) = c |I(θ)|^(1/2),   I(θ) = Fisher information matrix
Normal correlation: π(θ) = 1/(1 − θ²)
[Figure: importance-sampling posterior density for the correlation (black) using Jeffreys prior π(θ) = 1/(1 − θ²); BCa weighted density (green); raw bootstrap density (red). Triangles show 95% posterior limits.]
Multiparameter Exponential Families

β̂ ∈ R^p: p-dimensional sufficient statistic
β = E{β̂}: p-dimensional parameter vector
(Poisson: e^(−λ) λ^x / x! has β̂ = x, β = λ)

Hoeffding: f_β(β̂) = f_β̂(β̂) e^(−D(β̂, β)/2)
Deviance: D(β₁, β₂) = 2 E_β₁ log{ f_β₁(β̂) / f_β₂(β̂) }
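In the Poisson case Hoeffding's formula is an exact identity, which makes a compact numerical check (x = 7, λ = 4 are arbitrary illustrative values):

```python
import numpy as np
from scipy.stats import poisson

def poisson_deviance(lam1, lam2):
    """D(lam1, lam2) = 2 * E_lam1 log{f_lam1(x) / f_lam2(x)} for Poisson."""
    return 2.0 * (lam1 * np.log(lam1 / lam2) - (lam1 - lam2))

# Hoeffding: f_beta(betahat) = f_betahat(betahat) * exp(-D(betahat, beta)/2),
# with betahat = x (the MLE) and beta = lambda
x, lam = 7, 4.0
direct = poisson.pmf(x, lam)
hoeffding = poisson.pmf(x, x) * np.exp(-poisson_deviance(x, lam) / 2.0)
```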
Conversion Factor in Exponential Families

Conversion factor R(β) = f_β(β̂) / f_β̂(β) (with β̂ fixed):
R(β) = ν(β) e^Δ(β)
where Δ(β) = [D(β, β̂) − D(β̂, β)] / 2
and ν(β) = f_β̂(β̂) / f_β(β) ≈ 1/Jeffreys prior (Laplace)

Jeffreys prior: π(β) R(β) ≈ c e^Δ(β)
Example: Multivariate Normal

yᵢ ~ind N_d(μ, Σ), i = 1, 2, ..., n

Δ = n [ ½ (μ − μ̂)′(Σ̂⁻¹ − Σ⁻¹)(μ − μ̂) + ½ tr(ΣΣ̂⁻¹ − Σ̂Σ⁻¹) + log(|Σ̂|/|Σ|) ]

Eigenratio (student score data): θ = λ₁/(λ₁ + λ₂), θ̂ = λ̂₁/(λ̂₁ + λ̂₂) = 0.793 ± ??
Bootstrap: yᵢ* ~ind N_d(μ̂, Σ̂), i = 1, 2, ..., n → θ₁*, θ₂*, ..., θ_B*
[Figure: density estimates for the student score eigenratio (B = 10,000); raw bootstrap (red), Jeffreys 5-dimensional prior (black), BCa (green; z₀ = .182, a = 0). Triangles show 95% limits: BCa (.598, .890), Bayes (.650, .908).]
[Figure: reweighting the parametric bootstrap points; replications near β̂ are reweighted proportionally to e^Δ.]
Prostate Cancer Study (Singh et al., 2002)

Microarray study: 102 men, 52 prostate cancer patients vs. 50 healthy controls; 6033 genes
zᵢ: test statistic for H₀ᵢ: no difference; under H₀ᵢ, zᵢ ~ N(0, 1)
Goal: identify genes involved in prostate cancer
[Figure: histogram of prostate cancer z-values for 6033 genes, 52 patients vs. 50 healthy controls; N(0,1) curve in black, log-polynomial degree-4 Poisson fit in red.]
Poisson Regression Models

Histogram: y_j = #{zᵢ ∈ bin j}, x_j = midpoint of bin j, j = 1, 2, ..., J
Poisson model: y_j ~ind Poi(μ_j) with log(μ_j) = poly(x_j, degree = m)
Exponential family of degree p = m + 1

With η_j = log(μ_j):
Δ = (η − η̂)′(μ + μ̂) − 2 Σ_j (μ_j − μ̂_j)
Parametric False Discovery Rate Estimates

glm(y ~ poly(x, 4), poisson) → {μ̂_j}   ("log poly 4")
θ = Fdr(3) = [1 − Φ(3)] / [1 − F̂(3)] ≈ Pr{null | z ≥ 3}
(F̂ is the smoothed CDF estimate)
Model 4: Fdr(3) = 0.192 ± ??

Parametric bootstrap: y_j* ~ind Poi(μ̂_j) → {μ̂_j*} → Fdr*(3)
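A simplified sketch of the Fdr(3) ratio. The slides smooth F̂ with the Poisson-regression fit; here the raw empirical survival of the z-values stands in for 1 − F̂, and the z-values themselves are simulated from an assumed 95%/5% null/signal mixture (not the actual prostate data):

```python
import numpy as np
from scipy.stats import norm

def fdr_hat(z0, z):
    """Estimated Pr{null | z >= z0} = [1 - Phi(z0)] / [1 - Fhat(z0)].
    Fhat here is the raw empirical cdf; the slides instead use a
    smoothed glm(y ~ poly(x, 4), poisson) estimate."""
    return norm.sf(z0) / np.mean(z >= z0)

# Simulated stand-in for the 6033 z-values (assumed mixture for illustration)
rng = np.random.default_rng(3)
null = rng.standard_normal(5733)        # 95% true nulls, N(0,1)
signal = rng.normal(3.0, 1.0, 300)      # 5% non-nulls, shifted
z = np.concatenate([null, signal])
est = fdr_hat(3.0, z)   # small: most z >= 3 come from the signal genes
```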
[Figure: Fdr(3) estimation for Model 4, prostate data, from 4000 parametric bootstraps; Jeffreys Bayes posterior density (black), raw bootstrap (red), BCa (green). Triangles show 95% Jeffreys/BCa limits (.154, .241).]
[Figure: comparison of Model 4 (line histogram) with Model 8 (solid); 4000 parametric bootstrap replications of Fdr(3). Triangles are 95% BCa limits: (.141, .239) vs. (.154, .241).]
Model Selection: AIC = Deviance + 2·df

Model    AIC    boot counts   Bayes counts
M2     142.6        0            0.0
M3     143.1        0            0.0
M4      73.3     1266         1458 (36%)
M5      74.3      415          462 (12%)
M6      75.8      215          197 (5%)
M7      77.8       54           67 (2%)
M8      75.6     2050         1816 (45%)

Model 8, Jeffreys prior
Internal Accuracy of Bayes Estimates

How large a B for computing Ê{t(β) | β̂} = Σᵢ tᵢπᵢRᵢ / Σᵢ πᵢRᵢ?
Define rᵢ = πᵢRᵢ and sᵢ = tᵢπᵢRᵢ, with ĉ_ss = Σ(sᵢ − s̄)²/B², etc.:
CV{Ê}² = ĉ_ss/s̄² − 2 ĉ_sr/(s̄ r̄) + ĉ_rr/r̄²

Example: t = Fdr(3), 4000 parametric bootreps, Jeffreys Bayes
Model 4: Ê = 0.193, ĈV = 0.0019
Model 8: Ê = 0.179, ĈV = 0.0025
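The delta-method CV above is cheap to compute from the bootstrap output. A sketch, with the sanity check that unit weights (πᵢRᵢ ≡ 1) collapse the formula to the familiar sd(t)/(mean(t)·√B):

```python
import numpy as np

def internal_cv(t, pi, R):
    """Delta-method coefficient of variation of the ratio estimate
    Ehat = sum(t_i*pi_i*R_i) / sum(pi_i*R_i)."""
    B = len(t)
    r = pi * R
    s = t * r
    rbar, sbar = r.mean(), s.mean()
    c_ss = np.mean((s - sbar) ** 2) / B     # chat_ss = sum(s_i - sbar)^2 / B^2
    c_rr = np.mean((r - rbar) ** 2) / B
    c_sr = np.mean((s - sbar) * (r - rbar)) / B
    cv2 = c_ss / sbar**2 - 2 * c_sr / (sbar * rbar) + c_rr / rbar**2
    return np.sqrt(max(cv2, 0.0))

# Unit weights: CV reduces to sd(t) / (mean(t) * sqrt(B))
rng = np.random.default_rng(4)
t = rng.normal(0.19, 0.02, 4000)
cv = internal_cv(t, np.ones(4000), np.ones(4000))
```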
Sampling Variability of Bayes Estimates

How much would Ê{t(β) | β̂} vary for new β̂'s? (i.e., frequentist properties of Bayes estimates)
Need to bootstrap the parametric bootstrap calculations for Ê{t(β) | β̂}
Shortcut: bootstrap-after-bootstrap
Bootstrap-after-Bootstrap Calculations

β₁, β₂, ..., βᵢ, ..., β_B ~ f_β̂(·): the original bootstrap reps under β̂
Want bootreps under some other value γ̂
Idea: reweight the originals
Weight Q_γ̂(β) = f_β(γ̂) / f_β(β̂):
Ê{t | γ̂} = Σᵢ₌₁^B tᵢ πᵢ Rᵢ Q_γ̂(βᵢ) / Σᵢ₌₁^B πᵢ Rᵢ Q_γ̂(βᵢ)
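A sketch of the reweighting step. The toy check again uses a normal translation family β̂ ~ N(β, 1) (an assumption for illustration): reps drawn under β̂ = 0, reweighted toward γ̂ = 0.5 with flat prior and R ≡ 1, should recover posterior mean ≈ γ̂, since the exponential tilting shifts the N(0, 1) sample mean by exactly γ̂ − β̂:

```python
import numpy as np

def bab_estimate(t, pi_R, logQ):
    """Bootstrap-after-bootstrap: reuse replications beta_i drawn under
    betahat to estimate Ehat{t | gammahat}, adding the extra weight
    Q_gamma(beta_i) = f_beta_i(gammahat) / f_beta_i(betahat)
    (passed on the log scale for numerical stability)."""
    w = pi_R * np.exp(logQ - logQ.max())
    return np.sum(t * w) / np.sum(w)

# Toy check: betahat = 0, gammahat = 0.5, flat prior, R = 1.
# log Q_gamma(beta) = -(gamma - beta)^2/2 + (betahat - beta)^2/2
rng = np.random.default_rng(5)
beta = rng.normal(0.0, 1.0, 100_000)    # original reps under betahat = 0
gamma = 0.5
logQ = -0.5 * (gamma - beta) ** 2 + 0.5 * (0.0 - beta) ** 2
est = bab_estimate(beta, np.ones_like(beta), logQ)
```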
Bootstrap-after-Bootstrap Standard Errors (BAB)

(1) γ₁, γ₂, ..., γ_K ~ f_β̂(·)
(2) Ê_k = Ê{t | γ_k}
(3) se{Ê} = [Σ_k (Ê_k − Ê)² / K]^(1/2)

Example: Model 4, 97.5% credible limit for Fdr(3) = 0.241 ± ??
[Figure: 400 BAB replications of the 97.5% credible upper limit (.240) for Fdr(3); line histogram: the same for the BCa upper limit. BAB standard errors: .030 (Bayes), .034 (BCa); correlation .98.]
BAB SDs for Bayes Model Selection Proportions

Model   Bayes post. prob.   BAB SD
M4           36%            ± 20%
M5           12%            ± 14%
M6            5%            ±  8%
M7            2%            ±  6%
M8           45%            ± 27%

Model averaging?
A Few Conclusions

- The parametric bootstrap is closely related to objective Bayes. (That's why it's a good importance-sampling choice.)
- When it applies, the parboot approach has both computational and interpretational advantages over MCMC/Gibbs.
- Objective Bayes analyses should be checked frequentistically.
References

Bootstrap: DiCiccio & Efron (1996), Statist. Sci., 189–228
Bayes and Bootstrap: Newton & Raftery (1994), JRSS-B, 3–48 (see Discussion!)
Objective Priors: Berger & Bernardo (1989), JASA, 200–207; Ghosh (2011), Statist. Sci., 187–202
Exponential Families: Efron (2010), Large-Scale Inference, Appendix 1
MCMC and Bootstrap: Efron (2011), J. Biopharm. Statist., 1052–1062