Markov Chain Monte Carlo, Numerical Integration


Markov Chain Monte Carlo, Numerical Integration (See Statistics). Trevor Gallen. Fall 2015.

Agenda

- Numerical integration: MCMC methods
- Estimating Markov chains
- Estimating latent variables

Numerical Integration: Part II

- Quadrature is feasible for one to a few dimensions, for well-behaved distributions.
- For many-dimensional integrals, we typically use Markov chain Monte Carlo (MCMC).
- There are many different methods; I discuss a simple one (Gibbs sampling) and a more complex one (Metropolis-Hastings).
- This is a prominent problem in Bayesian analysis.

The Problem

- Say your data are summarized by a two-dimensional problem, height and weight: f(x_1, x_2).
- You want to draw a population: for i in (1, ..., n), (x_1^i, x_2^i).
- You have access to the conditional distributions. That is, while f(x_1, x_2) is ugly, f(x_1 | x_2) and f(x_2 | x_1) are easy to sample from.

Gibbs Sampling

1. Start with (x_1^0, x_2^0).
2. Sample x_1^1 ~ f(x_1 | x_2^0).
3. Sample x_2^1 ~ f(x_2 | x_1^1).
4. Repeat, always conditioning on the most recent draw of the other coordinate.
5. We have a LLN and CLT stating that:

   (1/N) * sum_{i=1}^{N} g(x^i)  ->  integral of g(x) f(x) dx

Gibbs Sampling: Example

Assume a bivariate normal distribution:

x ~ N(0, Sigma),   Sigma = [1  rho; rho  1]

Then we can get the conditional distributions:

x_1 | x_2 ~ N(rho * x_2, 1 - rho^2)
x_2 | x_1 ~ N(rho * x_1, 1 - rho^2)

Iteratively sample from these. See Gibbs.m.
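The two-step iteration above can be sketched in a few lines. This is a minimal Python sketch, not the Gibbs.m file the slides reference; the function name and defaults are illustrative choices.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_draws, x2_init=0.0, seed=0):
    """Gibbs sampler for x ~ N(0, [[1, rho], [rho, 1]]).

    Alternates draws from the two conditionals:
      x1 | x2 ~ N(rho * x2, 1 - rho**2)
      x2 | x1 ~ N(rho * x1, 1 - rho**2)
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)          # conditional standard deviation
    draws = np.empty((n_draws, 2))
    x2 = x2_init
    for i in range(n_draws):
        x1 = rng.normal(rho * x2, sd)   # step 2: draw x1 | x2
        x2 = rng.normal(rho * x1, sd)   # step 3: draw x2 | x1
        draws[i] = (x1, x2)
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_draws=50_000)
print(np.corrcoef(draws.T)[0, 1])       # close to 0.8, by the LLN above
```

Note that the draws are serially dependent, so the effective sample size is smaller than 50,000; the LLN still applies.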

Gibbs Sampling: Example (slides 7-26: figures of successive Gibbs draws, not transcribed)

Metropolis-Hastings

What if we can only evaluate the likelihood at a given point?

1. Start with some (multi-dimensional) value x_i and a proposal distribution g(x | x_i).
2. Draw a new sample from the proposal distribution: x* ~ g(x | x_i).
3. Calculate the acceptance probability:

   pr(x_i, x*) = min{ 1, [f(x*) g(x_i | x*)] / [f(x_i) g(x* | x_i)] }

4. Accept the new value with probability pr(x_i, x*); otherwise, stay at x_i.
5. This again converges in distribution to the true distribution.

A Convenient Proposal Density

If our proposal density is symmetric, g(x_i | x*) = g(x* | x_i), this is called random-walk Metropolis-Hastings. Our acceptance probability is easy:

pr(x_i, x*) = min{ 1, f(x*) / f(x_i) }

Metropolis-Hastings: Example

Let's say our distribution is one-dimensional:

x ~ 0.5 U(0, 1) + 0.25 U(-1, 2) + 0.25 U(0.5, 0.75)

Choose a sampling distribution centered around the current point:

g(x* | x) ~ N(x, 0.1)
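A random-walk Metropolis-Hastings sampler for this mixture can be sketched as follows. This is an illustrative Python version (the slides use MATLAB); it assumes the 0.1 in N(x, 0.1) is the proposal's standard deviation, and because the proposal is symmetric the acceptance ratio is just f(x*)/f(x).

```python
import numpy as np

def target_pdf(x):
    """Density of 0.5*U(0,1) + 0.25*U(-1,2) + 0.25*U(0.5,0.75)."""
    p = 0.0
    if 0.0 < x < 1.0:
        p += 0.5 * 1.0       # U(0,1) has height 1
    if -1.0 < x < 2.0:
        p += 0.25 / 3.0      # U(-1,2) has height 1/3
    if 0.5 < x < 0.75:
        p += 0.25 / 0.25     # U(0.5,0.75) has height 4
    return p

def rw_metropolis(n_draws, x0=0.5, step_sd=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    draws = np.empty(n_draws)
    for i in range(n_draws):
        x_star = rng.normal(x, step_sd)   # symmetric proposal
        # accept with prob min(1, f(x*)/f(x)); proposal terms cancel
        if rng.uniform() < target_pdf(x_star) / target_pdf(x):
            x = x_star
        draws[i] = x
    return draws

draws = rw_metropolis(100_000)
```

Proposals landing outside (-1, 2) have density zero and are always rejected, so the chain stays inside the mixture's support.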

Sampling PDF (slide 30: plot of the target mixture density and proposal from the previous slide, not transcribed)

Metropolis-Hastings Random Walk: Example (slides 31-60: figures of the random-walk chain exploring the target density, not transcribed)


Why Learn MH & Numerical Integration?

Many-dimensional problems arise in Bayesian estimation:

- Write down a model.
- Write down a prior distribution of parameters, f(theta).
- Simulate many models to get the model distribution of the data, f(x | theta).
- Update your beliefs: f(theta | x) is proportional to f(theta) f(x | theta).
- You typically need to draw from the posterior distribution without an analytical calculation.
- Use M-H.
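The update rule f(theta | x) proportional to f(theta) f(x | theta) is all M-H needs, since the unknown normalizing constant cancels in the acceptance ratio. A minimal sketch, assuming a N(0,1) prior on a normal mean theta and N(theta, 1) data (a conjugate toy case, chosen so the sampler can be checked against the known posterior); all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=50)          # simulated data, true mean 2

def log_post(theta):
    """log f(theta | x) up to a constant: N(0,1) prior + N(theta,1) likelihood."""
    log_prior = -0.5 * theta**2
    log_lik = -0.5 * np.sum((data - theta)**2)
    return log_prior + log_lik

def rw_mh(log_target, x0, n_draws, step_sd=0.3, seed=2):
    rng = np.random.default_rng(seed)
    x, lp = x0, log_target(x0)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        x_star = rng.normal(x, step_sd)
        lp_star = log_target(x_star)
        if np.log(rng.uniform()) < lp_star - lp:   # min(1, f*/f) on log scale
            x, lp = x_star, lp_star
        draws[i] = x
    return draws

draws = rw_mh(log_post, x0=0.0, n_draws=20_000)[5_000:]   # drop burn-in
# In this toy case the posterior is known: N(sum(x)/(n+1), 1/(n+1)),
# which gives a check on the chain's mean and spread.
```

Working on the log scale avoids numerical underflow when the likelihood multiplies many terms, which is the usual practice for nontrivial models.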

Aside: Multiple Hypothesis Testing and Maximum F-Statistics

Data is tortured. When you see...

- ...an experiment with multiple test groups or without very strong theoretical justification, be skeptical!
- ...a regression that could have been run differently, or with many potential controls, weighting options, and unit-of-observation choices, be skeptical!

Not all bad: t-statistics might just be heuristics. When I see a p-value of 0.01 in a regression where I could imagine 10 other setups, I know the real p-value is around 0.1. But we might want to take statistics seriously, or data-mine honestly: we can simulate the distribution of the maximum F-statistic, or the maximum t-statistic. See MonteCarlo.do and similar exercises. (Note: Stata!)
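The simulation idea can be sketched directly: draw many sets of null t-statistics, take the maximum absolute value in each set, and read off the adjusted critical value from that distribution. This is an illustrative Python sketch, not the MonteCarlo.do file the slides reference, and it approximates null t-statistics as independent standard normals.

```python
import numpy as np

def max_abs_t_critical(k, n_sims=20_000, alpha=0.05, seed=0):
    """Simulate the null distribution of max_j |t_j| over k independent
    tests (standard-normal t-stats) and return the alpha critical value."""
    rng = np.random.default_rng(seed)
    t = rng.standard_normal((n_sims, k))     # k null t-stats per simulated "study"
    max_abs = np.abs(t).max(axis=1)          # the statistic a specification-searcher reports
    return np.quantile(max_abs, 1 - alpha)

c1 = max_abs_t_critical(1)    # close to 1.96: the usual single-test threshold
c10 = max_abs_t_critical(10)  # noticeably larger: free specifications inflate it
print(c1, c10)
```

With 10 implicit specifications the honest 5% threshold is well above 1.96, which is the quantitative version of "the real p-value is around 0.1."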