Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University


Importance Sampling for the Bayes Posterior Distribution

Newton and Raftery (1994, JRSS-B): the nonparametric bootstrap is a good importance-sampling choice (they actually used a smoothed nonparametric bootstrap).

Today: the parametric bootstrap. It has good computational properties when applicable, and it exposes a close connection between Bayesian and frequentist inference.

Student Score Data (Mardia, Kent and Bibby)

n = 22 students' scores on two tests: mechanics and vectors.
Data $y = (y_1, y_2, \ldots, y_{22})$ with $y_i = (\mathrm{mec}_i, \mathrm{vec}_i)$.
Parameter of interest: $\theta = \mathrm{corr}(\mathrm{mec}, \mathrm{vec})$.
Sample correlation coefficient $\hat\theta = 0.498 \pm\;??$

[Figure: scatterplot of the 22 students' vectors scores against mechanics scores; sample correlation coefficient 0.498 ± ??. Sampled from 88 students; Mardia, Kent and Bibby.]

R. A. Fisher (1915)

Bivariate normal model: $y_i \stackrel{\mathrm{ind}}{\sim} N_2(\mu, \Sigma)$, $i = 1, 2, \ldots, n$.
Density of the sample correlation coefficient:
$$f_\theta\bigl(\hat\theta\bigr) = \frac{n-2}{\pi}\,(1-\theta^2)^{(n-1)/2}\,\bigl(1-\hat\theta^2\bigr)^{(n-4)/2} \int_0^\infty \bigl[\cosh(w) - \theta\hat\theta\bigr]^{-(n-1)}\,dw$$
95% confidence limits (from the z transform): $\theta \in (0.084,\, 0.754)$.

[Figure: Fisher's density for the correlation coefficient given corr = 0.498, overlaid on a histogram of 10,000 parametric bootstrap replications; x-axis: correlation. Red triangles mark the Fisher 95% confidence limits, 0.084 and 0.754.]

Parametric Bootstrap Computations

Assume $y_i \stackrel{\mathrm{ind}}{\sim} N_2(\mu, \Sigma)$; estimate $\hat\mu$ and $\hat\Sigma$ by maximum likelihood.
Bootstrap sample: $y_i^* \stackrel{\mathrm{ind}}{\sim} N_2(\hat\mu, \hat\Sigma)$ for $i = 1, 2, \ldots, 22$.
Bootstrap replication: $\hat\theta^* =$ correlation coefficient of $y^* = (y_1^*, y_2^*, \ldots, y_{22}^*)$.
Bootstrap standard deviation $\mathrm{sd}(\hat\theta^*) = 0.168$ (B = 10,000).
Percentile 95% confidence limits: (0.109, 0.761).
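
A minimal R sketch of these computations, assuming `scores` holds the 22 x 2 matrix of (mechanics, vectors) scores; all variable names here are illustrative, not from the slides:

set.seed(1)
B <- 10000
muhat    <- colMeans(scores)
Sigmahat <- cov(scores) * (nrow(scores) - 1) / nrow(scores)  # MLE scaling
thetastar <- replicate(B, {
  ystar <- MASS::mvrnorm(nrow(scores), muhat, Sigmahat)  # parametric resample
  cor(ystar[, 1], ystar[, 2])                            # bootstrap replication
})
sd(thetastar)                         # bootstrap standard deviation, approx 0.168
quantile(thetastar, c(0.025, 0.975))  # percentile 95% limits, approx (0.109, 0.761)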

Better Bootstrap Confidence Limits (BCa)

Idea: reweighting the B bootstrap replications improves coverage. The weights involve $z_0$ (bias correction) and $a$ (acceleration):
$$W_{\mathrm{BCa}}(\theta) = \frac{\varphi\bigl(z_\theta/(1+az_\theta) - z_0\bigr)}{(1+az_\theta)^2\,\varphi(z_\theta + z_0)} \qquad \text{where } z_\theta = \Phi^{-1}\hat G(\theta) - z_0,$$
$\hat G$ the bootstrap cdf.
Reweighting the B = 10,000 bootstraps gave weighted percentiles (0.075, 0.748); Fisher: (0.084, 0.754).
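
A hedged R sketch of the reweighting, continuing from `thetastar` above. The $z_0$ and $a$ values are placeholders (the slides give, e.g., $z_0 = 0.182$, $a = 0$ for the later eigenratio example); the smoothed cdf values avoid evaluating $\Phi^{-1}$ at 0 or 1:

z0 <- 0.0; a <- 0.0                       # illustrative values only
Ghat   <- (rank(thetastar) - 0.5) / B     # bootstrap cdf at the replications
ztheta <- qnorm(Ghat) - z0
W <- dnorm(ztheta / (1 + a * ztheta) - z0) /
     ((1 + a * ztheta)^2 * dnorm(ztheta + z0))   # BCa weights
ord <- order(thetastar)
cw  <- cumsum(W[ord]) / sum(W)            # weighted bootstrap cdf
c(thetastar[ord][min(which(cw >= 0.025))],       # weighted percentile
  thetastar[ord][min(which(cw >= 0.975))])       # 95% limits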

Bayes Posterior Intervals

Prior $\pi(\theta)$. Posterior expectation for $t(\theta)$:
$$E\bigl\{t(\theta) \mid \hat\theta\bigr\} = \int t(\theta)\,\pi(\theta)\, f_\theta\bigl(\hat\theta\bigr)\,d\theta \Big/ \int \pi(\theta)\, f_\theta\bigl(\hat\theta\bigr)\,d\theta$$
Conversion factor: the ratio of the likelihood to the bootstrap density,
$$R(\theta) = f_\theta\bigl(\hat\theta\bigr) \big/ f_{\hat\theta}(\theta) \qquad (\hat\theta \text{ fixed at its observed value})$$
Bootstrap integrals:
$$E\bigl\{t(\theta) \mid \hat\theta\bigr\} = \int t(\theta)\,\pi(\theta)\,R(\theta)\, f_{\hat\theta}(\theta)\,d\theta \Big/ \int \pi(\theta)\,R(\theta)\, f_{\hat\theta}(\theta)\,d\theta$$

Bootstrap Estimation of the Bayes Expectation $E\{t(\theta) \mid \hat\theta\}$

Parametric bootstrap replications $f_{\hat\theta}(\cdot) \to \theta^*_1, \theta^*_2, \ldots, \theta^*_i, \ldots, \theta^*_B$.
With $t_i = t(\theta^*_i)$, $\pi_i = \pi(\theta^*_i)$, $R_i = R(\theta^*_i)$:
$$\hat E\bigl\{t(\theta) \mid \hat\theta\bigr\} = \sum_{i=1}^B t_i\,\pi_i R_i \Big/ \sum_{i=1}^B \pi_i R_i$$
Reweighting: weight $\pi_i R_i$ on $\theta^*_i$. This is an importance sampling estimate: $\hat E\{t(\theta) \mid \hat\theta\} \to E\{t(\theta) \mid \hat\theta\}$ as $B \to \infty$.
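
A hedged R sketch of the importance-sampling estimate for the correlation example, reusing `thetastar` from the earlier sketch, evaluating Fisher's density numerically, and taking the Jeffreys prior $\pi(\theta) = 1/(1-\theta^2)$ from the next slide (names illustrative):

fisher.dens <- function(that, th, n = 22) {   # density of thetahat at `that`, parameter `th`
  I <- integrate(function(w) (cosh(w) - th * that)^(-(n - 1)), 0, Inf)$value
  (n - 2) / pi * (1 - th^2)^((n - 1) / 2) * (1 - that^2)^((n - 4) / 2) * I
}
thetahat <- 0.498
Ri <- sapply(thetastar, function(th)
  fisher.dens(thetahat, th) / fisher.dens(th, thetahat))  # conversion factor R(theta)
pii <- 1 / (1 - thetastar^2)                              # Jeffreys prior
w <- pii * Ri
sum(thetastar * w) / sum(w)     # Ehat{theta | thetahat}: importance-sampled posterior mean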

Jeffreys Priors

Harold Jeffreys (1930s): a theory of uninformative prior distributions ("invariant", "objective", "reference", ...).
For a density family $f_\theta(y)$, $\theta \in R^p$:
$$\pi(\theta) = c\,\bigl|I(\theta)\bigr|^{1/2}, \qquad I(\theta) = \text{Fisher information matrix}$$
Normal correlation: $\pi(\theta) = 1/(1-\theta^2)$.

[Figure: importance-sampling posterior density for the correlation (black) using the Jeffreys prior $\pi(\theta) = 1/(1-\theta^2)$, the BCa-weighted density (green), and the raw bootstrap density (red); triangles show 95% posterior limits.]

Multiparameter Exponential Families

$\hat\beta \in R^p$: p-dimensional sufficient statistic; $\beta = E\{\hat\beta\}$: p-dimensional parameter vector.
(Poisson: $e^{-\lambda}\lambda^x/x!$ has $\hat\beta = x$, $\beta = \lambda$.)
Hoeffding:
$$f_\beta\bigl(\hat\beta\bigr) = f_{\hat\beta}\bigl(\hat\beta\bigr)\, e^{-D(\hat\beta,\,\beta)/2}$$
Deviance:
$$D(\beta_1, \beta_2) = 2\,E_{\beta_1} \log\Bigl\{f_{\beta_1}\bigl(\hat\beta\bigr) \big/ f_{\beta_2}\bigl(\hat\beta\bigr)\Bigr\}$$

Conversion Factor in Exponential Families

Conversion factor $R(\beta) = f_\beta\bigl(\hat\beta\bigr) \big/ f_{\hat\beta}(\beta)$ (with $\hat\beta$ fixed):
$$R(\beta) = \nu(\beta)\, e^{\Delta(\beta)}$$
where
$$\Delta(\beta) = \bigl[D\bigl(\beta, \hat\beta\bigr) - D\bigl(\hat\beta, \beta\bigr)\bigr] \big/ 2 \quad\text{and}\quad \nu(\beta) = f_{\hat\beta}\bigl(\hat\beta\bigr) \big/ f_\beta(\beta) \approx 1\big/\text{Jeffreys prior (Laplace)}.$$
Jeffreys prior $\Longrightarrow$ $\pi(\beta)\,R(\beta) \approx e^{\Delta(\beta)}$.
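
A heuristic step, not on the slide, for why $\nu(\beta)$ acts like a reciprocal Jeffreys prior: in the mean parametrization $\hat\beta \mathrel{\dot\sim} N_p(\beta, V_\beta)$ with $I(\beta) = V_\beta^{-1}$, so the Laplace (normal) approximation gives
$$f_\beta(\beta) \approx (2\pi)^{-p/2}\,\bigl|I(\beta)\bigr|^{1/2} \propto \pi_{\mathrm{Jeff}}(\beta),$$
hence $\nu(\beta) = f_{\hat\beta}(\hat\beta)/f_\beta(\beta) \approx \pi_{\mathrm{Jeff}}(\hat\beta)/\pi_{\mathrm{Jeff}}(\beta) \propto 1/\pi_{\mathrm{Jeff}}(\beta)$, and the Jeffreys prior cancels it, leaving $\pi(\beta)R(\beta) \approx c\,e^{\Delta(\beta)}$.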

Example: Multivariate Normal

$y_i \stackrel{\mathrm{ind}}{\sim} N_d(\mu, \Sigma)$ for $i = 1, 2, \ldots, n$:
$$\Delta = n\Bigl[\tfrac{1}{2}(\mu-\hat\mu)^\top\bigl(\hat\Sigma^{-1} - \Sigma^{-1}\bigr)(\mu-\hat\mu) + \tfrac{1}{2}\operatorname{tr}\bigl(\Sigma\hat\Sigma^{-1} - \hat\Sigma\Sigma^{-1}\bigr) + \log\bigl(|\hat\Sigma|/|\Sigma|\bigr)\Bigr]$$
Eigenratio (student score data):
$$\theta = \frac{\lambda_1}{\lambda_1+\lambda_2} \qquad \hat\theta = \frac{\hat\lambda_1}{\hat\lambda_1+\hat\lambda_2} = 0.793 \pm\;??$$
Bootstrap: $y^*_i \stackrel{\mathrm{ind}}{\sim} N_d\bigl(\hat\mu, \hat\Sigma\bigr)$, $i = 1, 2, \ldots, n$ $\;\to\;$ $\theta^*_1, \theta^*_2, \ldots, \theta^*_B$.
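
A hedged R sketch of the eigenratio bootstrap, reusing `scores`, `muhat`, and `Sigmahat` from the correlation sketch (illustrative names):

eigenratio <- function(S) {
  lam <- eigen(S, symmetric = TRUE)$values   # eigenvalues in decreasing order
  lam[1] / (lam[1] + lam[2])
}
thetastar <- replicate(10000, {
  ystar <- MASS::mvrnorm(nrow(scores), muhat, Sigmahat)
  eigenratio(cov(ystar))   # scale factors cancel in the ratio
})
quantile(thetastar, c(0.025, 0.975))   # raw percentile 95% limits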

[Figure: density estimates for the student score eigenratio (B = 10,000): raw bootstrap (red), Jeffreys with the 5-dimensional prior (black), and BCa with $z_0 = 0.182$, $a = 0$ (green). Triangles show 95% limits: BCa (0.598, 0.890), Bayes (0.650, 0.908).]

[Figure: reweighting of the parametric bootstrap points around $\hat\beta$, with point weights proportional to $e^{\Delta}$.]

Prostate Cancer Study (Singh et al., 2002)

Microarray study of 102 men: 52 prostate cancer patients and 50 healthy controls; 6033 genes.
$z_i$: test statistic for $H_{0i}$: no difference; $H_{0i}: z_i \sim N(0,1)$.
Goal: identify genes involved in prostate cancer.

[Figure: histogram of the prostate cancer z-values for the 6033 genes, 52 patients vs. 50 healthy controls, with the N(0,1) density (black) and the log-poly4 fit (red).]

Poisson Regression Models

Histogram: $y_j = \#\{z_i \in \text{bin}_j\}$, $x_j$ = midpoint of bin $j$, $j = 1, 2, \ldots, J$.
Poisson model: $y_j \stackrel{\mathrm{ind}}{\sim} \mathrm{Poi}(\mu_j)$ with $\log(\mu_j) = \mathrm{poly}(x_j, \text{degree} = m)$.
This is an exponential family of dimension $p = m + 1$. With $\eta_j = \log(\mu_j)$:
$$\Delta = (\eta - \hat\eta)^\top(\mu + \hat\mu) - 2\sum_j\bigl(\mu_j - \hat\mu_j\bigr)$$

Parametric False Discovery Rate Estimates

glm(y ~ poly(x,4), poisson) $\to$ $\{\hat\mu_j\}$ (the "log poly 4" fit)
$$\theta = \mathrm{Fdr}(3) = \bigl[1-\Phi(3)\bigr] \big/ \bigl[1-\hat F(3)\bigr] \approx \Pr\{\text{null} \mid z \geq 3\}$$
($\hat F$ is the smoothed cdf estimate.)
Model 4: $\mathrm{Fdr}(3) = 0.192 \pm\;??$
Parametric bootstrap: $y^*_j \stackrel{\mathrm{ind}}{\sim} \mathrm{Poi}\bigl(\hat\mu_j\bigr)$ $\to$ $\{\hat\mu^*_j\}$ $\to$ $\mathrm{Fdr}^*(3)$.
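
A hedged R sketch of the Fdr(3) bootstrap; `z` is assumed to hold the 6033 z-values, and the bin grid is illustrative (it must cover the range of `z`):

brk <- seq(-4.4, 5.2, by = 0.2)                 # illustrative bin boundaries
x   <- (brk[-1] + brk[-length(brk)]) / 2        # bin midpoints
y   <- hist(z, breaks = brk, plot = FALSE)$counts
fit <- glm(y ~ poly(x, 4), family = poisson)    # the "log poly 4" fit
muhat <- fitted(fit)
Fdr3 <- function(mu) {                          # approximate smoothed-cdf Fdr
  Fhat <- cumsum(mu) / sum(mu)
  (1 - pnorm(3)) / (1 - approx(x, Fhat, xout = 3)$y)
}
Fdrstar <- replicate(4000, {
  ystar <- rpois(length(muhat), muhat)          # parametric bootstrap counts
  Fdr3(fitted(glm(ystar ~ poly(x, 4), family = poisson)))
})
quantile(Fdrstar, c(0.025, 0.975))              # raw percentile limits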

[Figure: posterior densities for Fdr(3), Model 4, prostate data, from 4000 parametric bootstraps: Jeffreys Bayes (black), raw bootstrap (red), BCa (green). Triangles show the 95% Jeffreys/BCa limits (0.154, 0.241).]

[Figure: Model 4 (line) vs. Model 8 (solid): 4000 parametric bootstrap replications of Fdr(3). Triangles show the 95% BCa limits, (0.141, 0.239) vs. (0.154, 0.241).]

Model Selection: AIC = Deviance + 2(df)

  Model    AIC    boot counts   Bayes counts
  M2     142.6        0             0.0
  M3     143.1        0             0.0
  M4      73.3     1266          1458  (36%)
  M5      74.3      415           462  (12%)
  M6      75.8      215           197   (5%)
  M7      77.8       54            67   (2%)
  M8      75.6     2050          1816  (45%)

Model 8, Jeffreys prior.

Internal Accuracy of Bayes Estimates

How large a B for computing $\hat E\{t(\beta) \mid \hat\beta\} = \sum_1^B t_i\pi_iR_i \big/ \sum_1^B \pi_iR_i$?
Define $r_i = \pi_iR_i$ and $s_i = t_i\pi_iR_i$, with $\hat c_{ss} = \sum_1^B (s_i-\bar s)^2/B$, etc. Then
$$\mathrm{CV}\bigl\{\hat E\bigr\}^2 = \frac{1}{B}\left(\frac{\hat c_{ss}}{\bar s^2} - 2\,\frac{\hat c_{sr}}{\bar s\,\bar r} + \frac{\hat c_{rr}}{\bar r^2}\right)$$
Example: $t = \mathrm{Fdr}(3)$, 4000 parametric bootreps, Jeffreys Bayes.
Model 4: $\hat E = 0.193$, $\widehat{\mathrm{CV}} = 0.0019$. Model 8: $\hat E = 0.179$, $\widehat{\mathrm{CV}} = 0.0025$.
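
In R, the internal coefficient of variation is a direct transcription of the formula above, given the per-replication vectors (names illustrative):

internal.cv <- function(t, pri, R) {
  B <- length(t)
  r <- pri * R                   # r_i = pi_i R_i
  s <- t * pri * R               # s_i = t_i pi_i R_i
  css <- mean((s - mean(s))^2)
  crr <- mean((r - mean(r))^2)
  csr <- mean((s - mean(s)) * (r - mean(r)))
  cv2 <- (css / mean(s)^2 - 2 * csr / (mean(s) * mean(r)) + crr / mean(r)^2) / B
  c(Ehat = sum(s) / sum(r), CV = sqrt(cv2))
}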

Sampling Variability of Bayes Estimates

How much would $\hat E\{t(\beta) \mid \hat\beta\}$ vary for new $\hat\beta$'s? (That is, what are the frequentist properties of the Bayes estimates?)
We need to bootstrap the parametric bootstrap calculations for $\hat E\{t(\beta) \mid \hat\beta\}$.
Shortcut: bootstrap-after-bootstrap.

Bootstrap-after-Bootstrap Calculations

$f_{\hat\beta}(\cdot) \to \beta^*_1, \beta^*_2, \ldots, \beta^*_i, \ldots, \beta^*_B$: the original bootstrap reps under $\hat\beta$.
We want bootreps under some other value $\hat\gamma$.
Idea: reweight the originals, with weight $Q_{\hat\gamma}(\beta) = f_\beta\bigl(\hat\gamma\bigr) \big/ f_\beta\bigl(\hat\beta\bigr)$:
$$\hat E\bigl\{t \mid \hat\gamma\bigr\} = \sum_{1}^{B} t_i\,\pi_iR_i\,Q_{\hat\gamma}(\beta^*_i) \Big/ \sum_{1}^{B} \pi_iR_i\,Q_{\hat\gamma}(\beta^*_i)$$

Bootstrap-after-Bootstrap Standard Errors

BAB: (1) draw $f_{\hat\beta}(\cdot) \to \hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_K$; (2) compute $\hat E_k = \hat E\{t \mid \hat\gamma_k\}$; (3)
$$\widehat{\mathrm{se}}\bigl\{\hat E\bigr\} = \Bigl[\sum_{k=1}^{K} \bigl(\hat E_k - \hat E_\cdot\bigr)^2 \big/ K\Bigr]^{1/2}$$
Example: Model 4, 97.5% credible limit for $\mathrm{Fdr}(3) = 0.241 \pm\;??$
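
A hedged generic skeleton for the BAB standard error. Here `Q` is a user-supplied function evaluating $Q_{\hat\gamma}(\beta) = f_\beta(\hat\gamma)/f_\beta(\hat\beta)$ for the family at hand, `betastar` is the list of original replications, and `gammas` holds the K fresh draws from $f_{\hat\beta}$; all names are illustrative:

bab.se <- function(t, pri, R, betastar, gammas, Q) {
  Ek <- sapply(gammas, function(g) {
    w <- pri * R * sapply(betastar, function(b) Q(b, g))  # reweight the originals
    sum(t * w) / sum(w)                                   # Ehat{t | gamma_k}
  })
  sqrt(mean((Ek - mean(Ek))^2))   # BAB standard error over the K reweightings
}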

[Figure: histogram of 400 BAB replications of the 97.5% credible upper limit for Fdr(3), centered near 0.240; line histogram shows the same for the BCa upper limit. BAB standard errors: 0.030 (Bayes), 0.034 (BCa); correlation 0.98.]

BAB SDs for Bayes Model Selection Proportions

  Model   Bayes post. prob.   BAB SD
  M4            36%           ± 20%
  M5            12%           ± 14%
  M6             5%           ±  8%
  M7             2%           ±  6%
  M8            45%           ± 27%

Model averaging?

A Few Conclusions...

• The parametric bootstrap is closely related to objective Bayes. (That's why it's a good importance-sampling choice.)
• When it applies, the parboot approach has both computational and interpretational advantages over MCMC/Gibbs.
• Objective Bayes analyses should be checked frequentistically.

References

Bootstrap: DiCiccio & Efron (1996), Statist. Sci. 11, 189-228.
Bayes and bootstrap: Newton & Raftery (1994), JRSS-B 56, 3-48 (see the Discussion!).
Objective priors: Berger & Bernardo (1989), JASA 84, 200-207; Ghosh (2011), Statist. Sci. 26, 187-202.
Exponential families: Efron (2010), Large-Scale Inference, Appendix 1.
MCMC and bootstrap: Efron (2011), J. Biopharm. Statist. 21, 1052-1062.