Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University
Bayesian Inference Parameter: µ Ω Observed data: x Prior: π(µ) Probability distributions: Parameter of interest: { fµ (x), µ Ω } θ = t(µ) E{θ x} = t(µ)f µ (x)π(µ) dµ Ω What if we don t know g? / f µ (x)π(µ) dµ Ω Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 2 / 30
Jeffreysonian Bayes Inference Uninformative Priors Jeffreys: π(µ) = I(µ) 1/2 where I(µ) = cov { µ log f µ (x) } (the Fisher information matrix) Can still use Bayes theorem but how accurate are the estimates? Today: frequentist variability of E { t(µ) x } Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 3 / 30
General Accuracy Formula µ and x R p x (µ, V µ ) α x (µ) = x log f µ (x) = ( ) T..., log f µ(x) x i,... Lemma Ê = E { t(µ) x } has gradient x E = cov { t(µ), α x (µ) x }. Theorem The delta-method standard deviation of E is sd ( Ê ) = [ cov { t(µ), α x (µ) x }T V x cov { t(µ), α x (µ) x }] 1/2. Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 4 / 30
Implementation Posterior Sample µ 1, µ 2,..., µ B (MCMC) ˆθ i = t(µ i ) = t i and α i = α x (µ i ) ŝd ( Ê ) = ĉov = [ ] 1/2 ĉov T V x ĉov B i=1 ( α i ᾱ ) ( t i t )/ B Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 5 / 30
Diabetes Data Efron et al. (2004), LARS n = 442 subjects p = 10 predictors: age, sex, bmi, glu,... Response: y = disease progression at one year Model: y = X α + e [e N n (0, I)] n 1 n p p 1 n 1 Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 6 / 30
Diabetes data. Lasso: min{rss/2 + gamma*l1(beta)} Cp minimized at Step 7, with gamma =.37 beta vals 15 10 5 0 5 10 15 Step 7 tc gamma: 16.44 8.37 5.84 2.41 1.64 1.27 0.37 0.1 0.09 0.04 0.02 0 ltg bmi ldl map tch hdl glu age sex 0 2 4 6 8 10 12 step Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 7 / 30
Bayesian Lasso Park and Casella (2008) Model: y N n (Xα, I) µ = α and x = y Prior: π(α) = e γl 1(α) [γ = 0.37] Then posterior mode at Lasso ˆα γ Subject 125: θ 125 = x T 125 α How accurate are Bayes posterior inferences for θ 125? Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 8 / 30
Bayesian Analysis MCMC: posterior sample { α i for i = 1, 2,..., 10, 000 } Gives { θ = x T 125,i 125 α, i = 1, 2,..., 10, 000} i θ 125,i 0.248 ± 0.072 General accuracy formula: frequentist sd 0.071 for Ê = 0.248 Other subjects freq sd < Bayes sd Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 9 / 30
Figure 2. 10000 MCMC values for Theta =Subject 125 estimate; mean=.248, stdev=.0721; Frequentist Sd for E{theta data} is.0708, giving Coefficient of Variation 29% Frequency 0 100 200 300 400 500 600 700 ^ [ ].248 ^ 0.0 0.1 0.2 0.3 0.4 0.5 MCMC mu125 values Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 10 / 30
Posterior CDF of θ 125 Apply GAC to s i = I { t i < c } so E{s data} = Pr{θ 125 c data} c = 0.3 : Ê = 0.762 ± 0.304 Bayes GAC sd estimate Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 11 / 30
Figure 3. MCMC posterior cdf of mu125, Diabetes data, Prior exp{.37*l1(alpha)}; verts are + One Frequentist Standard Dev Prob{mu125 < c data} 0.0 0.2 0.4 0.6 0.8 1.0 [ ].336 0.0 0.1 0.2 0.3 0.4 c value Upper 95% credible limit is.336 +.071 Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 12 / 30
Exponential Families f α (ˆβ ) = e αt ˆβ ψ(α) f 0 (ˆβ ) x = ˆβ and µ = α General Accuracy Formula α = α x (µ) For Ê = E { t(α) ˆβ } ŝd = [cov ( t, α ) T ˆβ Vˆα cov ( t, α )] 1/2 ˆβ with Vˆα = cov α=ˆα {ˆβ } = ψ(ˆα). Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 13 / 30
Bayesian Estimation Using the Parametric Bootstrap Efron (2012) Parametric bootstrap: fˆα ( ) {ˆβ 1, ˆβ 2,..., ˆβ B } Want to calculate: Ê = E { t(α) ˆβ } for prior π(α) Importance sampling: Ê B / B t i π i R i 1 1 π i R i t i = t (ˆα i ), πi = π (ˆα i ), and Ri = conversion factor (Easy in exponential families.) R i = fˆα (ˆβ ) / ) fˆα (ˆβ i i Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 14 / 30
Bootstrap for GAC p i = π i R i / B 1 π jr j is weight on ith Bootstrap Replication Ê = B i=1 p it i ĉov = B i=1 p i ˆα i ( t i Ê) estimates cov ( α, t ˆβ ) ŝd = [ ] 1/2 ĉov T Vˆα ĉov Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 15 / 30
Prostate Cancer Study Singh et al. (2002) Microarray study: 102 men 52 prostate cancer, 50 healthy controls 6033 genes z i test statistic for H 0i : no difference H 0i : z i N(0, 1) Goal: identify genes involved in prostate cancer Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 16 / 30
Figure 4. Prostate study: 6033 z values and matching N(0,1) density Frequency 0 100 200 300 400 3 4 2 0 2 4 z value Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 17 / 30
Poisson GLM Histogram: 49 bins, c j midpoint of bin j y j = #{z i in bin j} Poisson GLM: y j ind Poi(µ j ) log(µ) = poly(c, degree=8) [MLE: glm(y poly(c,8), Poisson)] Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 18 / 30
Bayesian Estimation for the Poisson Model Model: y Poi(µ), µ = e Xα [X = poly(c,8)] Prior: Jeffreys prior for α / cdf: α µ cdf: F α (z) = µ i µ i Fdr parameter: [MLE: Fdrˆα (3) = 0.183] c i z t(α) = 1 Φ(3) 1 F(3) = Fdr(3) Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 19 / 30
Figure 5. Prostate study: posterior density for Fdr(3) based on 4000 parametric bootstraps from Poisson poly(8) gave posterior (m,sd) =(.183,.025); Frequentist Sd for.183 equalled.026; CV=14% density 0 5 10 15 E{t x}=.183 0.15 0.20 0.25 0.30 Fdr(3) values Internal coefficient of variation =.0023 Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 20 / 30
Model Selection Calculations Full model: y Poi(µ), µ = e Xα [ α R 9, µ R 49] Submodel M m : {µ : only 1st m + 1 coordinates of β 0} Bayesian model selection: prior probabilities on and within each M m Poor man s model estimates: partition R 49 into preference regions R m = {µ closest to M m } Calculate posterior probabilities Pr{R m y} Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 21 / 30
Posterior Model Probabilities Distance: minimum AIC from µ to point in R m Preferred model: is one with smallest distance R m 4 5 6 7 8 posterior prob.36.12.05.02.45 sd.32.16.08.03.40 Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 22 / 30
Empirical Bayes Accuracy z k = observed value for gene k θ k = true effect size z k N(θ k, 1) Unknown prior g( ) θ 1, θ 2,..., θ N [N = 6033] From observations z 1, z 2,..., z N we wish to estimate E z = E { t(θ) / } z = t(θ)f θ (z)g(θ) dθ f θ (z)g(θ) dθ [if t(θ) = δ 0 (θ) then E z = Pr{θ = 0 z}]. Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 23 / 30
Parametric Families of Priors p-parameter exponential family: log g α (θ) = q(θ) T 1 p α the unknown parameter α + constant p 1 Marginal: f α (z) = f θ (z)g α (θ) dθ e.g., q(θ) = ( δ 0 (θ), θ, θ 2, θ 3, θ 4, θ 5) T [ spike and slab ] MLE: α g α f α (z 1, z 2,..., z N ) marginal MLE ˆα Ê z = t(θ)f θ (z)gˆα (θ) dθ / f θ (z)gˆα (θ) dθ ±?? Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 24 / 30
Delta-Method Standard Deviation of Êz sd ( { ) (dez ) T ( ) } 1/2 Ê z = I 1 dez α dα dα Fisher information matrix: q α = q(θ)g α (θ) dθ h α (z) = f θ (z)g α (θ) [q(θ) q] dθ hα (z)h α (z) T I α = N f α (z) dz de z dα = E z w(θ)gα (θ) [q(θ) q] dθ where w(θ) = t(θ)f θ (z)g α (θ) / tfg α f θ (z)g α (θ) / fg α Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 25 / 30
Figure 6. Estimated prob{abs(theta)>2 z value} Prostate study; Bars show + one freq stdev. Using g model {0,ns(5)} prob{ theta >2 z} 0.0 0.2 0.4 0.6 0.8 1.0 3 4 2 0 2 4 6 z value Prob{abs(theta)>2 z=3} =.434 +.071 Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 26 / 30
Estimated False Discovery Rate Local false discovery rate: fdr(z) = Pr {θ = 0 z} = E { t(θ) z } where t(θ) = δ 0 (θ) Next: applied to prostate study Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 27 / 30
Figure 7. Estimated prob{theta=0 z value} Prostate study; Bars show + one freq stdev. Using g model {0,ns(5)} prob{theta=0 z} 0.0 0.2 0.4 0.6 0.8 1.0 3 4 2 0 2 4 6 z value Prob{theta=0 z=3} =.322 +.068, coeff of var =.21 Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 28 / 30
For z = 3: Pr { θ > 2 z = 3} = 0.43 ± 0.07 fdr(3) = 0.32 ± 0.07 locfdr : exfam modeling of z s, not θ s, gave fdr(3) = 0.35 ± 0.03 but doesn t work for θ > 2, etc. Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 29 / 30
References Park and Casella (2008) The Bayesian Lasso JASA 681 686 Singh et al. (2002) Gene expression correlates of clinical prostate cancer behavior Cancer Cell 203 209 Efron, Hastie, Johnstone and Tibshirani (2004) Least angle regression Ann. Statist. 407 499 Efron (2012) Bayesian inference and the parametric bootstrap Ann. Appl. Statist. 1971 1997 Efron (2010) Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction IMS Monographs, Cambridge University Press Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 30 / 30