Introduction to Bayesian Computation

Dr. Jarad Niemi
STAT 544 - Iowa State University
March 20, 2018

Bayesian computation

Goals:
- Posterior expectations: E[h(θ) | y] = ∫ h(θ) p(θ | y) dθ
- Marginal likelihood: p(y) = ∫ p(y | θ) p(θ) dθ = E_θ[p(y | θ)]

Approaches:
- Deterministic approximation
- Monte Carlo approximation
  - Theoretical justification
  - Gridding
  - Inverse CDF
  - Accept-reject

Numerical integration

Deterministic methods, where

  E[h(θ) | y] = ∫ h(θ) p(θ | y) dθ ≈ Σ_{s=1}^S w_s h(θ^(s)) p(θ^(s) | y),

the θ^(s) are selected points, w_s is the weight given to the point θ^(s), and the error can be bounded.

Monte Carlo (simulation) methods, where

  E[h(θ) | y] = ∫ h(θ) p(θ | y) dθ ≈ (1/S) Σ_{s=1}^S w_s h(θ^(s)),

the θ^(s) are iid draws from g(θ) (for some proposal distribution g), w_s = p(θ^(s) | y) / g(θ^(s)), and we have an SLLN and CLT.
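As a concrete illustration of the two approximations above, here is a minimal R sketch (not from the slides) that estimates E[θ² | y] when the target is taken to be a standard normal, so the true value 1 is known. The grid rule uses equally spaced points with weight equal to the spacing; the Monte Carlo rule uses importance sampling with a t_3 proposal. The target, proposal, and sample sizes are illustrative assumptions.

# Illustrative only: target taken to be N(0,1), so the true E[theta^2 | y] = 1
h = function(theta) theta^2

# Deterministic: grid sum with equal weights w_s (spacing w)
w    = 0.01
grid = seq(-10, 10, by = w)
sum(w * h(grid) * dnorm(grid))            # approximately 1

# Monte Carlo: importance sampling with a heavier-tailed t_3 proposal g
set.seed(1)
S     = 1e5
theta = rt(S, df = 3)                     # theta^(s) ~ g
ws    = dnorm(theta) / dt(theta, df = 3)  # w_s = p(theta^(s)|y) / g(theta^(s))
mean(ws * h(theta))                       # approximately 1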

Deterministic methods: Normal-Cauchy model

Let Y ~ N(θ, 1) with θ ~ Ca(0, 1). The posterior is

  p(θ | y) ∝ p(y | θ) p(θ) ∝ exp(−(y − θ)²/2) / (1 + θ²) = q(θ | y),

which is not a known distribution. We might be interested in
1. normalizing this posterior, i.e. calculating c(y) = ∫ q(θ | y) dθ, or
2. calculating the posterior mean, i.e. E[θ | y] = ∫ θ p(θ | y) dθ = ∫ θ q(θ | y) / c(y) dθ.

Deterministic methods: Normal-Cauchy marginal likelihood

y = 1  # Data

q = function(theta, y, log = FALSE) {
  out = -(y - theta)^2 / 2 - log(1 + theta^2)
  if (log) return(out)
  return(exp(out))
}

# Find normalizing constant for q(theta | y)
w = 0.1
theta = seq(-5, 5, by = w) + y
(cy = sum(q(theta, y) * w))  # gridding-based approach
[1] 1.305608

integrate(function(x) q(x, y), -Inf, Inf)  # numerical integration
1.305609 with absolute error < 0.00013

Deterministic methods: Normal-Cauchy posterior density

curve(q(x, y) / cy, -5, 5, n = 1001)
points(theta, rep(0, length(theta)), cex = 0.5, pch = 19)
segments(theta, 0, theta, q(theta, y) / cy)

[Figure: normalized posterior density q(θ | y)/c(y) over θ ∈ (−5, 5) with the grid points marked along the horizontal axis.]

Deterministic methods: posterior expectation

  E[h(θ) | y] ≈ Σ_{s=1}^S w_s h(θ^(s)) p(θ^(s) | y) = Σ_{s=1}^S w_s h(θ^(s)) q(θ^(s) | y) / c(y)

h = function(theta) theta
sum(w * h(theta) * q(theta, y) / cy)
[1] 0.5542021

Deterministic methods: posterior expectation as a function of observed data

[Figure: posterior expectation E[θ | y] plotted against y ∈ (−5, 5) for three priors: Cauchy, normal, and improper uniform.]

Convergence review

Three main notions of convergence of a sequence of random variables X_1, X_2, ... to a random variable X:

- Convergence in distribution (X_n →d X): lim_{n→∞} F_n(x) = F(x) at every continuity point x of F.
- Convergence in probability (WLLN, X_n →p X): lim_{n→∞} P(|X_n − X| ≥ ε) = 0 for every ε > 0.
- Almost sure convergence (SLLN, X_n →a.s. X): P(lim_{n→∞} X_n = X) = 1.

Implications: almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution.

Here, X_n will be our approximation to an integral and X the true (constant) value of that integral, or X_n will be a standardized approximation and X will be N(0, 1).

Monte Carlo integration

Consider evaluating the integral

  E[h(θ)] = ∫_Θ h(θ) p(θ) dθ

using the Monte Carlo estimate

  ĥ_J = (1/J) Σ_{j=1}^J h(θ^(j)),

where the θ^(j) are independent draws from p(θ). We know:

- SLLN: ĥ_J converges almost surely to E[h(θ)].
- CLT: if h² has finite expectation, then (ĥ_J − E[h(θ)]) / sqrt(v_J / J) →d N(0, 1), where v_J = (1/J) Σ_{j=1}^J [h(θ^(j)) − ĥ_J]² or any other consistent estimator of Var[h(θ)].

Definite integral

Suppose you are interested in evaluating

  I = ∫_0^1 e^(−θ²/2) dθ.

Then set h(θ) = e^(−θ²/2) and p(θ) = 1, i.e. θ ~ Unif(0, 1), and approximate by a Monte Carlo estimate via:

1. For j = 1, ..., J,
   a. sample θ^(j) ~ Unif(0, 1) and
   b. calculate h(θ^(j)).
2. Calculate I ≈ (1/J) Σ_{j=1}^J h(θ^(j)).
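A minimal R sketch of this recipe (the sample size J and the seed are illustrative choices, not from the slides), with a CLT-based 95% interval to quantify the Monte Carlo error; the exact value is sqrt(2π)[Φ(1) − Φ(0)] ≈ 0.8556.

set.seed(544)                      # illustrative seed
J     = 1e4                        # illustrative Monte Carlo sample size
h     = function(theta) exp(-theta^2 / 2)
theta = runif(J)                   # theta^(j) ~ Unif(0, 1)
hj    = h(theta)
(I_hat = mean(hj))                 # Monte Carlo estimate of I
I_hat + c(-1, 1) * qnorm(0.975) * sd(hj) / sqrt(J)  # CLT-based 95% interval
sqrt(2 * pi) * (pnorm(1) - 0.5)    # exact value, approximately 0.8556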

Definite integral: Monte Carlo sampling randomly infills

[Figure: h(θ) = e^(−θ²/2) on (0, 1) evaluated at 20 (left panel) and 200 (right panel) uniform draws, showing how random sampling fills in the interval.]

Definite integral: strong law of large numbers

[Figure: running Monte Carlo estimate of I as the number of samples grows from 1 to 1000; the estimate settles near the true value, illustrating the SLLN.]

Definite integral: central limit theorem

[Figure: running Monte Carlo estimate with a CLT-based 95% confidence band, compared with the true value of I, for 1 to 1000 samples.]
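The two running-estimate figures above can be approximately reproduced with the short R sketch below (seed, sample path, and plotting details are illustrative assumptions, not the original code):

set.seed(544)
J       = 1000
hj      = exp(-runif(J)^2 / 2)
running = cumsum(hj) / (1:J)                                   # running MC estimate
se      = c(NA, sapply(2:J, function(j) sd(hj[1:j]) / sqrt(j)))  # running standard error
truth   = sqrt(2 * pi) * (pnorm(1) - 0.5)

plot(running, type = "l", xlab = "Number of samples", ylab = "Estimate",
     ylim = c(0.75, 0.95))
lines(running + qnorm(0.975) * se, lty = 2)   # upper 95% limit
lines(running - qnorm(0.975) * se, lty = 2)   # lower 95% limit
abline(h = truth, col = "red")                # truth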

Infinite bounds

Suppose θ ~ N(0, 1) and you are interested in evaluating

  E[θ] = ∫_{−∞}^{∞} θ (1/sqrt(2π)) e^(−θ²/2) dθ.

Then set h(θ) = θ and g(θ) = φ(θ), i.e. θ ~ N(0, 1), and approximate by a Monte Carlo estimate via:

1. For j = 1, ..., J,
   a. sample θ^(j) ~ N(0, 1) and
   b. calculate h(θ^(j)).
2. Calculate E[θ] ≈ (1/J) Σ_{j=1}^J h(θ^(j)).
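In R this is a few lines; the true value is 0 (the sample size and seed below are illustrative):

set.seed(544)
J     = 1e4
theta = rnorm(J)                   # theta^(j) ~ N(0, 1)
mean(theta)                        # Monte Carlo estimate of E[theta], near 0
mean(theta) + c(-1, 1) * qnorm(0.975) * sd(theta) / sqrt(J)  # 95% interval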

Infinite bounds: non-uniform sampling

[Figure: h(θ) = θ evaluated at n = 20 (left) and n = 100 (right) standard normal draws; the sampling locations concentrate where the N(0, 1) density is high.]

Infinite bounds: Monte Carlo estimate

[Figure: running Monte Carlo estimate of E[θ] with a 95% confidence band, compared with the truth (0), for J = 1 to 1000.]

Gridding: Monte Carlo approximation via gridding

Rather than determining c(y) and then E[θ | y] via deterministic gridding (all w_i equal), we can use the grid as a discrete approximation to the posterior, i.e.

  p(θ | y) ≈ Σ_{i=1}^N p_i δ_{θ_i}(θ),   p_i = q(θ_i | y) / Σ_{j=1}^N q(θ_j | y),

where δ_{θ_i}(θ) is the Dirac delta function, i.e. δ_{θ_i}(θ) = 0 for θ ≠ θ_i and ∫ δ_{θ_i}(θ) dθ = 1.

This discrete approximation to p(θ | y) can be used to approximate the expectation E[h(θ) | y] deterministically or via simulation, i.e.

  E[h(θ) | y] ≈ Σ_{i=1}^N p_i h(θ_i)   or   E[h(θ) | y] ≈ (1/S) Σ_{s=1}^S h(θ^(s)),

where θ^(s) ~ Σ_{i=1}^N p_i δ_{θ_i}(θ), sampled with replacement.

Gridding: Normal-Cauchy example

y = 1  # Data

# Small number of grid locations
theta = seq(-5, 5, length = 1e2 + 1) + y
p     = q(theta, y) / sum(q(theta, y))
sum(p * theta)
[1] 0.5542021
mean(sample(theta, prob = p, replace = TRUE))
[1] 0.6118812

# Large number of grid locations
theta = seq(-5, 5, length = 1e6 + 1) + y
p     = q(theta, y) / sum(q(theta, y))
sum(p * theta)
[1] 0.5542021
mean(sample(theta, 1e2, prob = p, replace = TRUE))  # But small MC sample
[1] 0.598394

# Truth
post_expectation(1)
[1] 0.5542021

Inverse CDF method: inverse cumulative distribution function

Definition. The cumulative distribution function (cdf) of a random variable X is defined by F_X(x) = P(X ≤ x) for all x.

Lemma. Let X be a random variable with cdf F(x), and suppose you have access to the inverse cdf of X, i.e. u = F(x) implies x = F⁻¹(u). If U ~ Unif(0, 1), then X = F⁻¹(U) is a draw from the distribution of X.
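A quick R sketch of the lemma for the standard normal cdf shown in the next figure (qnorm is the inverse of pnorm; the sample size and seed are illustrative):

set.seed(544)
u = runif(1e5)          # U ~ Unif(0, 1)
x = qnorm(u)            # X = F^{-1}(U) should be N(0, 1)
c(mean(x), sd(x))       # close to 0 and 1
ks.test(x, "pnorm")     # compare the draws to the standard normal cdf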

Inverse CDF method: standard normal CDF

[Figure: the standard normal cdf, pnorm(x), for x ∈ (−3, 3).]

Inverse CDF method: exponential example

For example, to sample X ~ Exp(1):
1. Sample U ~ Unif(0, 1).
2. Set X = −log(1 − U), or equivalently X = −log(U).

[Figure: density of draws generated this way, matching the Exp(1) density on (0, 7.5).]
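A short check in R (sample size and seed illustrative); the empirical quantiles of the inverse-cdf draws should match qexp:

set.seed(544)
u = runif(1e5)
x = -log(u)                                    # inverse-cdf draws from Exp(1)
rbind(empirical = quantile(x, c(0.25, 0.5, 0.75)),
      exact     = qexp(c(0.25, 0.5, 0.75)))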

Inverse CDF method: sampling from a univariate truncated distribution

Suppose you wish to sample from X ~ N(µ, σ²) I(a < X < b), i.e. a normal random variable with untruncated mean µ and variance σ², truncated to the interval (a, b). Let F be the untruncated cdf and F⁻¹ the untruncated inverse cdf.

1. Calculate the endpoints p_a = F(a) and p_b = F(b).
2. Sample U ~ Unif(p_a, p_b).
3. Set X = F⁻¹(U).

This avoids having to recalculate the normalizing constant of the truncated pdf, i.e. 1/(F(b) − F(a)).
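A minimal R implementation of these three steps for the truncated normal in the following slides (the function name rtnorm_inv is mine, not from the slides; N(5, 9) is read here as variance 9, i.e. sd 3, per the N(µ, σ²) notation above, though the figure label N(5, 9²) suggests sd 9, in which case set sigma = 9):

# Sample n draws from N(mu, sigma^2) truncated to (a, b) via the inverse cdf
rtnorm_inv = function(n, mu, sigma, a, b) {
  pa = pnorm(a, mu, sigma)            # F(a)
  pb = pnorm(b, mu, sigma)            # F(b)
  u  = runif(n, pa, pb)               # U ~ Unif(p_a, p_b)
  qnorm(u, mu, sigma)                 # X = F^{-1}(U)
}

set.seed(544)
x = rtnorm_inv(1e4, mu = 5, sigma = 3, a = 1, b = 6)
range(x)   # all draws fall inside (1, 6)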

Inverse CDF method: truncated normal X ~ N(5, 9) I(1 ≤ X ≤ 6)

[Figure: the untruncated cdf, labelled N(5, 9²) and plotted as pnorm(x, mean, sd) for x ∈ (−10, 20).]

Inverse CDF method: truncated normal X ~ N(5, 9) I(1 ≤ X ≤ 6)

[Figure: density of the truncated normal draws, supported on the interval (1, 6).]

Rejection sampling

Suppose you wish to obtain samples θ ~ p(θ | y). Rejection sampling performs the following:
1. Sample a proposal θ* ~ g(θ) and U ~ Unif(0, 1).
2. Accept θ = θ* as a draw from p(θ | y) if U ≤ p(θ* | y) / [M g(θ*)]; otherwise return to step 1.

Here M satisfies M g(θ) ≥ p(θ | y) for all θ. For a given proposal distribution g(θ), the optimal M is M = sup_θ p(θ | y)/g(θ). The probability of acceptance is 1/M. The accept-reject idea is to create an envelope, M g(θ), above p(θ | y).

Rejection sampling with an unnormalized density

Suppose you wish to obtain samples θ ~ p(θ | y) ∝ q(θ | y). Rejection sampling performs the following:
1. Sample a proposal θ* ~ g(θ) and U ~ Unif(0, 1).
2. Accept θ = θ* as a draw from p(θ | y) if U ≤ q(θ* | y) / [M* g(θ*)]; otherwise return to step 1.

Here M* satisfies M* g(θ) ≥ q(θ | y) for all θ. For a given proposal distribution g(θ), the optimal M* is M* = sup_θ q(θ | y)/g(θ). The acceptance probability is 1/M = c(y)/M*. The accept-reject idea is to create an envelope, M* g(θ), above q(θ | y).

Rejection sampling: Normal-Cauchy example

If Y ~ N(θ, 1) and θ ~ Ca(0, 1), then

  p(θ | y) ∝ e^(−(y−θ)²/2) / (1 + θ²) = q(θ | y)   for θ ∈ R.

Choose a N(y, 1) proposal distribution, i.e. g(θ) = (1/sqrt(2π)) e^(−(θ−y)²/2), with

  M* = sup_θ q(θ | y)/g(θ) = sup_θ [e^(−(y−θ)²/2) / (1 + θ²)] / [(1/sqrt(2π)) e^(−(θ−y)²/2)] = sup_θ sqrt(2π)/(1 + θ²) = sqrt(2π).

The acceptance rate is 1/M = c(y)/M* = 1.3056085/sqrt(2π) = 0.5208624.
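A runnable R sketch of this accept-reject scheme, reusing the q() function and y = 1 defined earlier (the batch size and seed are illustrative; this is a vectorized sketch, not the slide's original code):

set.seed(544)
Mstar  = sqrt(2 * pi)                    # envelope constant M*
n      = 1e4
prop   = rnorm(n, mean = y, sd = 1)      # proposals theta* ~ N(y, 1)
u      = runif(n)
accept = u <= q(prop, y) / (Mstar * dnorm(prop, mean = y, sd = 1))
draws  = prop[accept]                    # accepted draws ~ p(theta | y)
mean(accept)                             # observed acceptance rate, near 0.52
mean(draws)                              # approximates E[theta | y], near 0.554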

Rejection sampling: Normal-Cauchy example

[Figure: proposals θ* on the horizontal axis against u · M* g(θ*) on the vertical axis, colored by accept (TRUE/FALSE); accepted points fall below q(θ* | y).]

Observed acceptance rate was 0.52.

Rejection sampling: heavy-tailed proposals

Suppose our target is a standard Cauchy and our (proposed) proposal is a standard normal, so

  p(θ | y) = 1/[π(1 + θ²)]   and   g(θ) = (1/sqrt(2π)) e^(−θ²/2).

Then

  p(θ | y)/g(θ) = [1/(π(1 + θ²))] / [(1/sqrt(2π)) e^(−θ²/2)] → ∞ as |θ| → ∞,

since e^(−a) converges to zero faster than 1/(1 + a). Thus, there is no value M such that M g(θ) ≥ p(θ | y) for all θ.

Bottom line: the condition M g(θ) ≥ p(θ | y) requires the proposal to have tails at least as thick (heavy) as the target.
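A quick numerical check of the diverging ratio in R (the grid of θ values is illustrative):

theta = c(1, 2, 5, 10, 20)
dcauchy(theta) / dnorm(theta)   # grows without bound as |theta| increases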