I. Bayesian econometrics


I. Bayesian econometrics
A. Introduction
B. Bayesian inference in the univariate regression model
C. Statistical decision theory
D. Large sample results
E. Diffuse priors
F. Numerical Bayesian methods

1. Importance sampling

Generic Bayesian problem:
p(Y|θ) likelihood (known)
p(θ) prior (known)
goal: calculate p(θ|Y) = p(Y|θ) p(θ) / G for G = ∫ p(Y|θ) p(θ) dθ

Analytical approach: choose p(θ) from a family such that G can be found with clever algebra.

Numerical approach: be satisfied to generate draws θ^(1), θ^(2), ..., θ^(D) from the distribution p(θ|Y) without ever knowing the distribution (i.e., without calculating G).

Importance sampling:
Step (1): Generate θ^(j) from an (essentially arbitrary) importance density g(θ).
Step (2): Calculate ω^(j) = p(Y|θ^(j)) p(θ^(j)) / g(θ^(j)).
Step (3): Weight the draw θ^(j) by ω^(j) to simulate the distribution of p(θ|Y).

Examples:
E(θ|Y) = ∫ θ p(θ|Y) dθ ≈ θ̄_D = Σ_{j=1}^D ω^(j) θ^(j) / Σ_{j=1}^D ω^(j)
Var(θ|Y) ≈ Σ_{j=1}^D ω^(j) (θ^(j) − θ̄_D)(θ^(j) − θ̄_D)′ / Σ_{j=1}^D ω^(j)
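A minimal sketch of the three steps in Python. The target here is my own illustrative choice, not from the notes: a posterior kernel proportional to θ²(1 − θ) on (0, 1), i.e. a Beta(3, 2) shape whose normalizing constant G we pretend not to know.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target (illustration only): p(Y|theta) p(theta) proportional to
# theta^2 (1 - theta), a Beta(3, 2) kernel; G is treated as unknown.
def kernel(theta):
    return theta**2 * (1.0 - theta)

D = 100_000
# Step (1): draws from the importance density g = U(0, 1)
theta = rng.uniform(0.0, 1.0, size=D)
# Step (2): weights omega_j = p(Y|theta_j) p(theta_j) / g(theta_j), with g = 1
omega = kernel(theta) / 1.0
# Step (3): weighted averages approximate posterior moments
post_mean = np.sum(omega * theta) / np.sum(omega)   # Beta(3, 2) mean is 3/5
post_var = np.sum(omega * (theta - post_mean)**2) / np.sum(omega)
print(post_mean, post_var)
```

With these weights the estimates should be close to the exact Beta(3, 2) moments, mean 0.6 and variance 0.04, even though G was never computed.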

Prob(−2 ≤ θ ≤ 2 | Y) ≈ Σ_{j=1}^D ω^(j) 1{−2 ≤ θ^(j) ≤ 2} / Σ_{j=1}^D ω^(j)

How does this work?
Σ_{j=1}^D ω^(j) θ^(j) / Σ_{j=1}^D ω^(j) = [D^(−1) Σ_{j=1}^D ω^(j) θ^(j)] / [D^(−1) Σ_{j=1}^D ω^(j)]

Numerator:
D^(−1) Σ_{j=1}^D ω^(j) θ^(j) →p E(ωθ) = ∫ ω(θ) θ g(θ) dθ = ∫ [p(Y|θ) p(θ) / g(θ)] θ g(θ) dθ = ∫ θ p(Y|θ) p(θ) dθ

Denominator:
D^(−1) Σ_{j=1}^D ω^(j) →p E(ω) = ∫ ω(θ) g(θ) dθ = ∫ [p(Y|θ) p(θ) / g(θ)] g(θ) dθ = ∫ p(Y|θ) p(θ) dθ = p(Y)

Conclusion:
Σ_{j=1}^D ω^(j) θ^(j) / Σ_{j=1}^D ω^(j) →p ∫ θ p(Y|θ) p(θ) dθ / p(Y) = ∫ θ p(θ|Y) dθ = E(θ|Y)

Example: θ ∈ (0, 1); importance density g: U(0, 1)

[Figure: draws from the uniform importance density (left) and the reweighted draws (right)]

The algorithm will converge faster the more the importance density resembles the target.

[Figure: draws from the importance density g(x) = 3x² (left) and the reweighted draws (right)]

What's required of g(·)?
ω^(j) = p(Y|θ^(j)) p(θ^(j)) / g(θ^(j)) should satisfy the Law of Large Numbers.
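The same calculation can be sketched with a non-uniform importance density such as g(x) = 3x², again against a hypothetical Beta(3, 2)-shaped posterior kernel (my illustrative choice, not from the notes); the variability of the weights is one gauge of how well g resembles the target.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target as before: kernel proportional to x^2 (1 - x) on (0, 1)
def kernel(x):
    return x**2 * (1.0 - x)

D = 100_000
# Draw from g(x) = 3x^2 on (0, 1) by inverse CDF: G(x) = x^3, so x = u^(1/3)
u = rng.uniform(0.0, 1.0, size=D)
x = u ** (1.0 / 3.0)
omega = kernel(x) / (3.0 * x**2)    # importance weights under g(x) = 3x^2

post_mean = np.sum(omega * x) / np.sum(omega)   # should again be near 0.6
cv = np.std(omega) / np.mean(omega)             # weight coefficient of variation
print(post_mean, cv)
```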

Khintchine's Theorem: If {x_j}_{j=1}^D is i.i.d. with finite mean μ, then D^(−1) Σ_{j=1}^D x_j →p μ.

Note: does not require x_j to have finite variance.

The θ^(j) are drawn i.i.d. from g by construction. So we only need:
E(θ|Y) = ∫ θ p(θ|Y) dθ exists
p(θ|Y) = k p(Y|θ) p(θ)
support of g includes the support of p(θ|Y)

However, convergence may be very slow if the variance of ω^(j) = p(Y|θ^(j)) p(θ^(j)) / g(θ^(j)) is infinite.

Practical observations:
works best if g has fatter tails than p(Y|θ) p(θ)
works best when g is a good approximation to p(θ|Y)

The method always produces an answer, so it is a good idea to check it:
(1) Try special cases where the result is known analytically.
(2) Try different g(·) to see if you get the same result.
(3) Use analytic results for components of θ in order to keep the dimension that must be importance-sampled small.

I. Bayesian econometrics
F. Numerical Bayesian methods
1. Importance sampling
2. The Gibbs sampler

Suppose the parameter vector can be partitioned as θ = (θ₁′, θ₂′, θ₃′)′ with the property that p(θ|Y) is of unknown form but
p(θ₁|Y, θ₂, θ₃)
p(θ₂|Y, θ₁, θ₃)
p(θ₃|Y, θ₁, θ₂)
are of known form (the same idea works for 2, 4, or n blocks).

(1) Start with arbitrary initial guesses θ₁^(j), θ₂^(j), θ₃^(j) for j = 1.
(2) Generate:
θ₁^(j+1) from p(θ₁|Y, θ₂^(j), θ₃^(j))
θ₂^(j+1) from p(θ₂|Y, θ₁^(j+1), θ₃^(j))
θ₃^(j+1) from p(θ₃|Y, θ₁^(j+1), θ₂^(j+1))
(3) Repeat step (2) for j = 1, 2, ..., D.

Notice the sequence {θ^(j)}_{j=1}^D is a Markov chain with transition kernel
k(θ^(j+1)|θ^(j)) = p(θ₃^(j+1)|Y, θ₁^(j+1), θ₂^(j+1)) p(θ₂^(j+1)|Y, θ₁^(j+1), θ₃^(j)) p(θ₁^(j+1)|Y, θ₂^(j), θ₃^(j))

Under quite general conditions, the realizations from a Markov chain for D → ∞ converge to draws from the ergodic distribution of the chain satisfying
π(θ^(j+1)) = ∫ k(θ^(j+1)|θ^(j)) π(θ^(j)) dθ^(j)
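As a minimal sketch of these steps (my own two-block example, not from the notes), consider a bivariate normal posterior with correlation ρ, where each conditional is a known normal: θ₁|θ₂ ~ N(ρθ₂, 1 − ρ²) and symmetrically for θ₂.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-block Gibbs sampler: target is a bivariate normal with zero
# means, unit variances, and correlation rho; conditionals are known normals.
rho = 0.8
D, D0 = 50_000, 1_000            # total draws and burn-in

theta1, theta2 = 5.0, -5.0       # arbitrary initial guesses (step 1)
draws = np.empty((D, 2))
for j in range(D):
    # step (2): draw each block from its conditional given the latest values
    theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho**2))
    theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho**2))
    draws[j] = theta1, theta2

kept = draws[D0:]                      # discard the first D0 draws
print(kept.mean(axis=0))               # both means near 0
print(np.corrcoef(kept.T)[0, 1])       # near rho
```

Even with deliberately bad starting values, the chain forgets them after the burn-in and the retained draws reproduce the target's moments.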

Claim: the ergodic distribution of this chain corresponds to the posterior distribution: π(θ) = p(θ|Y).

Proof:
∫ k(θ^(j+1)|θ^(j)) p(θ^(j)|Y) dθ^(j)
= ∫ p(θ₃^(j+1)|Y, θ₁^(j+1), θ₂^(j+1)) p(θ₂^(j+1)|Y, θ₁^(j+1), θ₃^(j)) p(θ₁^(j+1)|Y, θ₂^(j), θ₃^(j)) p(θ^(j)|Y) dθ^(j)
= ∫ p(θ^(j+1), θ^(j)|Y) dθ^(j)
= p(θ^(j+1)|Y)

Implication: if we throw out the first D₀ draws (for D₀ large), then θ^(D₀+1), θ^(D₀+2), ..., θ^(D) represent draws from the posterior distribution p(θ|Y).

Checks:
(1) Change θ^(1) — same answer?
(2) Change D₀, D — same answer?
(3) Plot elements of θ^(j) as a function of j to see if it looks the same across blocks.

[Figure: trace plots giving an example of bad mixing]

Checks:
(4) Calculate autocorrelations of elements of θ^(j).
Note: throwing out observations does not "cure" the problem.
(5) Do formal statistical tests for stability.

Geweke's diagnostic: Test whether the mean of θ_i^(j) for the first 10% of draws is the same as for the last 50%. Repeat for each parameter i.
(1) Calculate the mean of parameter i over the first subsample:
q̄₁ = N₁^(−1) Σ_{j=1}^{N₁} θ_i^(j) for, say, N₁ = 0.1D
(2) Estimate ŝ₁ = 2π times the spectrum at frequency 0 over this subsample.

(3) Do the same for the second subsample:
q̄₂ = N₂^(−1) Σ_{j=D−N₂+1}^{D} θ_i^(j) for N₂ = 0.5D
(4) Calculate
(q̄₁ − q̄₂) / √(ŝ₁/N₁ + ŝ₂/N₂) →d N(0, 1)
e.g., reject stability if this exceeds 1.96 in absolute value.

Popular approach to estimate ŝ (e.g., Dynare):
(a) Divide the subsample into 100 blocks (i.e., block 1 = first 1% of draws).
(b) Calculate the mean over each block and the autocovariances of these means.
(c) Use Newey-West with 4, 8, or 15 lags (= 4%, 8%, or 15% of the sample) to get ŝ₁.

I. Bayesian econometrics
F. Numerical Bayesian methods
1. Importance sampling
2. The Gibbs sampler
3. Metropolis-Hastings algorithm
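A sketch of Geweke's diagnostic for a single parameter's chain of draws. For simplicity this uses a plain Newey-West long-run variance estimate on the raw draws rather than the blocked Dynare recipe; the subsample fractions and lag count are choices, not prescriptions.

```python
import numpy as np

def newey_west_lrv(x, lags):
    """Newey-West estimate of the long-run variance of a scalar series
    (2*pi times the spectrum at frequency 0)."""
    x = x - x.mean()
    n = len(x)
    s = np.dot(x, x) / n
    for k in range(1, lags + 1):
        gamma = np.dot(x[k:], x[:-k]) / n         # autocovariance at lag k
        s += 2.0 * (1.0 - k / (lags + 1)) * gamma  # Bartlett weight
    return s

def geweke_z(draws, first=0.1, last=0.5, lags=20):
    """Compare the mean of the first 10% of draws with the last 50%."""
    n = len(draws)
    a = draws[: int(first * n)]       # first subsample, N1 = 0.1 D
    b = draws[-int(last * n):]        # second subsample, N2 = 0.5 D
    s1, s2 = newey_west_lrv(a, lags), newey_west_lrv(b, lags)
    return (a.mean() - b.mean()) / np.sqrt(s1 / len(a) + s2 / len(b))

# For an i.i.d. chain the statistic should look like a draw from N(0, 1),
# so |z| > 1.96 would reject stability at the 5% level.
rng = np.random.default_rng(0)
z = geweke_z(rng.normal(size=20_000))
print(z)
```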

Suppose {s_t}_{t=1}^T is an ergodic K-state Markov chain, s_t ∈ {1, 2, ..., K}, with transition probabilities
p_ij = Pr(s_t = j | s_{t−1} = i)
Σ_{j=1}^K p_ij = 1 for i = 1, ..., K
p_ij ≥ 0 for i, j = 1, ..., K

The ergodic or unconditional probabilities satisfy
π_j = Pr(s_t = j) = Σ_{i=1}^K Pr(s_t = j, s_{t−1} = i)
π_j = Σ_{i=1}^K p_ij π_i

Proposition: Suppose we can find a set of numbers f₁, f₂, ..., f_K such that
f_j ≥ 0 for j = 1, ..., K
Σ_{j=1}^K f_j = 1
f_i p_ij = f_j p_ji
Then f_j = π_j.

Proof: We're given that f_i p_ij = f_j p_ji; sum over i:
Σ_{i=1}^K f_i p_ij = f_j Σ_{i=1}^K p_ji = f_j
which satisfies the definition of π_j: Σ_{i=1}^K π_i p_ij = π_j.

This works also for continuous-valued Markov chains. If x_t ∈ ℝ^k is Markov with transition kernel p(x, y), meaning that
Pr(x_t ∈ A | x_{t−1} = x) = ∫_A p(x, y) dy,

then the ergodic density π(y), which signifies that
Pr(x_t ∈ A) = ∫_A π(y) dy,
satisfies
π(y) = ∫_{ℝ^k} p(x, y) π(x) dx

Proposition: if
f(x) ≥ 0 for all x
∫_{ℝ^k} f(x) dx = 1
f(x) p(x, y) = f(y) p(y, x) for all x, y
then π(x) = f(x).

Goal in Metropolis-Hastings: We know how to calculate h(x) = kπ(x) (where k may be an unknown constant) and want to sample from π(x).
Solution: generate a sample {x_t} from a Markov chain whose ergodic density is π(x).
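The discrete-state version of the detailed-balance proposition can be checked numerically. The 3-state chain below is an illustrative choice of mine, constructed so that f_i p_ij = f_j p_ji holds exactly.

```python
import numpy as np

# A 3-state transition matrix P and candidate ergodic probabilities f,
# chosen so that detailed balance f_i p_ij = f_j p_ji holds for all i, j.
f = np.array([0.20, 0.30, 0.50])
P = np.array([
    [0.50, 0.30, 0.20],
    [0.20, 0.60, 0.20],
    [0.08, 0.12, 0.80],
])

# Each row of P sums to 1 (valid transition probabilities)
assert np.allclose(P.sum(axis=1), 1.0)
# Detailed balance: the matrix f_i p_ij is symmetric
assert np.allclose(f[:, None] * P, (f[:, None] * P).T)
# Therefore f is the ergodic distribution: sum_i f_i p_ij = f_j, i.e. f P = f
print(np.allclose(f @ P, f))   # prints True
```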

How MH works:
We previously generated x_{t−1} = x.
We now generate a candidate y from some known density q(x, y).
We'll then set x_t = y if π(y)/π(x) is big and otherwise keep x_t = x.
Let α(x, y) be the probability that we set x_t = y.
If π(x) q(x, y) > 0, then
α(x, y) = min{ π(y) q(y, x) / [π(x) q(x, y)], 1 }
otherwise α(x, y) = 1.

When x ≠ y, the transition kernel of this chain is q(x, y) α(x, y). To show that π(y) is the ergodic density of this chain, we must show that
π(x) α(x, y) q(x, y) = π(y) α(y, x) q(y, x)
But
π(x) α(x, y) q(x, y) = π(x) q(x, y) min{ π(y) q(y, x) / [π(x) q(x, y)], 1 }
= min{ π(y) q(y, x), π(x) q(x, y) }
= π(y) α(y, x) q(y, x)

Options for candidate density:
(1) independent: q(y, x) = q(y), e.g., y ~ N(μ, Σ) where μ is our guess of the mean of y

Options for candidate density:
(2) random walk: q(y, x) = q(y − x), e.g.,
q(y, x) = (2π)^(−n/2) |Σ|^(−1/2) exp{ −(1/2)(y − x)′ Σ^(−1) (y − x) }
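A minimal sketch of the random-walk variant. The target is my own illustrative choice: a standard normal known only up to its constant, h(x) = kπ(x). Because the random-walk candidate density is symmetric, q(y, x) = q(x, y) cancels in the acceptance probability, leaving α(x, y) = min{h(y)/h(x), 1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target known only up to an unknown constant: h(x) = k * pi(x)
def h(x):
    return np.exp(-0.5 * x**2)   # standard normal kernel (illustration)

D, D0 = 50_000, 1_000            # total draws and burn-in
x = 3.0                          # arbitrary starting value
draws = np.empty(D)
for t in range(D):
    y = x + rng.normal(0.0, 1.0)        # candidate from q(y, x) = q(y - x)
    alpha = min(h(y) / h(x), 1.0)       # acceptance probability alpha(x, y)
    if rng.uniform() < alpha:
        x = y                           # accept: set x_t = y
    draws[t] = x                        # otherwise keep x_t = x

kept = draws[D0:]
print(kept.mean(), kept.var())   # should be near 0 and 1
```

Note the chain only ever evaluates the ratio h(y)/h(x), so the unknown constant k never matters; this is exactly what makes MH usable when the normalizing constant of the posterior is intractable.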