Bayesian Inference and MCMC

Aryan Arbabi
Partly based on MCMC slides from CSC412, Fall 2018

Bayesian Inference - Motivation

Suppose we have a data set D = {x_1, ..., x_n}; for example, each x_i can be the outcome of a coin-flip trial. We are interested in learning the dynamics of the world that explain how this data was generated, i.e. the likelihood P(D | θ). In our example, θ is the probability of observing heads in a single trial. Learning θ also enables us to predict future outcomes, P(x | θ).

Bayesian Inference - Motivation

The primary question is how to infer θ. Observing the sample set D gives us some information about θ, but some uncertainty remains (especially when we have very few samples). Furthermore, we might have prior knowledge about θ that we want to take into account. In the Bayesian approach we embrace this uncertainty by computing the posterior p(θ | D).

Bayes rule

By Bayes' rule:

P(θ | D) = P(D | θ) P(θ) / P(D) ∝ P(D | θ) P(θ)

where P(D | θ) is the data likelihood, P(θ) is the prior, and P(D) is called the evidence.

In maximum likelihood estimation (MLE) we find a θ that maximizes the likelihood:

θ_MLE = arg max_θ P(D | θ)

In maximum a posteriori (MAP) estimation, the prior is also incorporated:

θ_MAP = arg max_θ P(D | θ) P(θ)
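As a concrete illustration of MLE vs. MAP on the coin example, here is a minimal sketch; the sample size, the true bias of 0.7, and the Beta(2, 2) prior are all hypothetical choices made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.random(100) < 0.7          # 100 coin flips with (hypothetical) true P(heads) = 0.7
n_heads, n = int(D.sum()), len(D)

# MLE: the theta maximizing P(D | theta) = theta^heads * (1 - theta)^tails
theta_mle = n_heads / n

# MAP with a Beta(a, b) prior: the posterior is Beta(a + heads, b + tails),
# whose mode is (a + heads - 1) / (a + b + n - 2)
a, b = 2.0, 2.0                    # hypothetical prior, mildly favoring theta = 0.5
theta_map = (a + n_heads - 1) / (a + b + n - 2)

print(theta_mle, theta_map)
```

Note how the prior pulls the MAP estimate slightly toward 0.5 relative to the MLE, with the effect shrinking as n grows.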

Bayesian Inference

Alternatively, instead of learning a fixed point value for θ, we can keep the uncertainty around θ. We predict the probability of observing a new sample x by marginalizing over θ:

P(x | D) = ∫ P(θ | D) P(x | θ) dθ

When the model is simple and conjugate priors are used, the posterior and the above integral can be computed analytically. In many practical cases, however, the integral has no closed form.

Monte Carlo methods

Even when the previous integral is difficult to solve analytically, if we can draw samples from the posterior distribution it can be approximated as

∫ P(θ | D) P(x | θ) dθ ≈ (1/n) Σ_{i=1}^n P(x | θ^(i))

where the θ^(i) are samples from the posterior, θ^(i) ~ P(θ | D). This kind of estimate is called a Monte Carlo method.

Monte Carlo methods

In its general form, a Monte Carlo method estimates an expectation:

∫ P(x) f(x) dx ≈ (1/S) Σ_{s=1}^S f(x^(s)),   x^(s) ~ P(x)

It is useful wherever we need to compute difficult integrals:
- Posterior marginals
- Moments (expectations)
- Predictive distributions
- Model comparison
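The general estimator above can be sketched in a few lines. As a toy choice of P and f, we estimate E[x^2] for x ~ N(0, 1), whose true value is 1 (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
S = 100_000
x = rng.standard_normal(S)         # x^(s) ~ P(x) = N(0, 1)
estimate = np.mean(x**2)           # (1/S) sum_s f(x^(s)) approximates E[x^2] = 1
print(estimate)
```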

Bias and variance of Monte Carlo

The Monte Carlo estimate is unbiased:

E[(1/S) Σ_{s=1}^S f(x^(s))] = (1/S) Σ_{s=1}^S E[f(x^(s))] = E[f(x)]

Its variance shrinks in proportion to 1/S:

Var((1/S) Σ_{s=1}^S f(x^(s))) = (1/S²) Σ_{s=1}^S Var(f(x^(s))) = Var(f(x)) / S
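A quick empirical check of the Var(f(x))/S scaling (the sample sizes and number of repeats are arbitrary): quadrupling S should cut the estimator's variance by a factor of about 4. Here f(x) = x with x ~ N(0, 1), so Var(f(x)) = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
repeats = 20_000

# Empirical variance of the Monte Carlo estimator of E[x], x ~ N(0, 1),
# at two sample sizes; theory predicts Var = Var(f(x)) / S = 1 / S.
var_S50 = rng.standard_normal((repeats, 50)).mean(axis=1).var()
var_S200 = rng.standard_normal((repeats, 200)).mean(axis=1).var()

print(var_S50, var_S200, var_S50 / var_S200)   # ratio should be close to 4
```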

How to sample from P(x)?

One way is to first draw from a Uniform[0, 1] generator:

u ~ Uniform[0, 1]

and then transform the sample through the inverse of the cumulative distribution function:

h(x) = ∫_{-∞}^x p(x') dx',   x(u) = h^{-1}(u)

This assumes we can easily compute h^{-1}(u), which is not always true.
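For example, for the Exponential(λ) distribution the inverse CDF is available in closed form: h(x) = 1 - e^{-λx}, so h^{-1}(u) = -ln(1 - u)/λ. A minimal sketch (λ = 2 and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0

u = rng.random(200_000)            # u ~ Uniform[0, 1]
x = -np.log1p(-u) / lam            # x = h^{-1}(u) ~ Exponential(lam)

print(x.mean())                    # should be close to 1/lam = 0.5
```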

Rejection Sampling

Another approach is to define a simple proposal distribution q(z) and find a constant k such that k q(z) ≥ p(z) for all z. Then:

- Draw z_0 ~ q(z)
- Draw u ~ Uniform[0, k q(z_0)]
- Discard z_0 if u > p(z_0); otherwise accept it as a sample from p
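A minimal sketch of these steps, with a hypothetical target p(z) = 6 z (1 - z) (a Beta(2, 2) density on [0, 1]) and proposal q = Uniform[0, 1], for which k = 1.5 suffices since max p(z) = 1.5:

```python
import numpy as np

rng = np.random.default_rng(4)

def p(z):                          # target density: Beta(2, 2), p(z) = 6 z (1 - z)
    return 6.0 * z * (1.0 - z)

k = 1.5                            # k q(z) = 1.5 >= p(z) everywhere, with q = Uniform[0, 1]
z0 = rng.random(100_000)           # draw z0 ~ q(z)
u = rng.uniform(0.0, k, size=z0.shape)  # draw u ~ Uniform[0, k q(z0)]
samples = z0[u <= p(z0)]           # discard z0 whenever u > p(z0)

print(samples.mean(), len(samples) / len(z0))  # mean near 0.5, acceptance near 1/k
```

The acceptance rate is 1/k (here 2/3), which is exactly what degrades in high dimensions.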

Rejection sampling in high dimensions

The curse of dimensionality makes rejection sampling inefficient: it is difficult to find a good q(x) in high dimensions, and the discard rate can get very high. For example, consider P(x) = N(0, I), where x is D-dimensional. For the proposal q(x) = N(0, σ²I) (with σ > 1), the acceptance rate is σ^{-D}, which decays exponentially in D.
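Plugging in a hypothetical σ = 1.1 shows how quickly σ^{-D} collapses:

```python
# Acceptance rate sigma^{-D} for target p = N(0, I) and proposal q = N(0, sigma^2 I),
# with an illustrative sigma = 1.1, at increasing dimension D
sigma = 1.1
rates = {D: sigma ** -D for D in (1, 10, 100, 1000)}
print(rates)                       # by D = 100 almost every proposal is rejected
```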

Markov Chains

A Markov chain is a stochastic model for a sequence of random variables that satisfies the Markov property: each state depends only on the previous state. This is also called the memoryless property. For the sequence x^(1), ..., x^(n) we have:

P(x^(i) | x^(1), ..., x^(i-1)) = P(x^(i) | x^(i-1))
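A small two-state chain illustrates the property (the transition probabilities are made up). Each step samples the next state from a distribution that depends only on the current state; the long-run state frequencies converge to the chain's stationary distribution, here (5/6, 1/6).

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-state Markov chain: the next state depends only on the current state.
# T[i, j] = P(next = j | current = i)
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

state, visits = 0, np.zeros(2)
for _ in range(200_000):
    state = int(rng.random() > T[state, 0])  # sample the next state from row T[state]
    visits[state] += 1

print(visits / visits.sum())       # long-run occupancy approaches (5/6, 1/6)
```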

Markov Chain Monte Carlo (MCMC)

An alternative to rejection sampling is to generate dependent samples. As before, we define and sample from a proposal distribution, but now we keep track of the current state, and the proposal distribution depends on it. In this setting the samples form a Markov chain.

Markov Chain Monte Carlo (MCMC)

Several variations of MCMC have been introduced; popular ones include Metropolis-Hastings, slice sampling, and Gibbs sampling. They differ in aspects such as how the proposal distribution is defined. Common motivations are reducing the correlation between successive samples in the Markov chain and increasing the acceptance rate.
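As one concrete instance, here is a minimal Metropolis-Hastings sketch with a symmetric Gaussian random-walk proposal targeting a standard normal; the step size, chain length, and burn-in are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(6)

def log_p(x):                      # unnormalized log target: standard normal
    return -0.5 * x * x

x, samples = 0.0, []
for _ in range(50_000):
    x_new = x + rng.normal(0.0, 1.0)        # symmetric random-walk proposal
    # Metropolis acceptance: accept with probability min(1, p(x_new) / p(x))
    if np.log(rng.random()) < log_p(x_new) - log_p(x):
        x = x_new
    samples.append(x)              # on rejection, the current state is recorded again

samples = np.array(samples[5_000:])          # drop burn-in
print(samples.mean(), samples.std())         # near 0 and 1
```

Because the proposal is symmetric, the Hastings correction q(x | x_new)/q(x_new | x) cancels and only the target ratio remains.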

Gibbs Sampling

A simple, general MCMC algorithm:

- Initialize x to some value
- Select an ordering for the variables x_1, ..., x_d (can be random or fixed)
- Visit each variable x_i in that order and resample it from P(x_i | x_{-i}), its conditional given all the other variables

No sample is ever rejected.

Gibbs Sampling

For example, suppose we have three variables with joint distribution P(x_1, x_2, x_3). At each round t, we sample from the full conditionals:

x_1^(t) ~ P(x_1 | x_2^(t-1), x_3^(t-1))
x_2^(t) ~ P(x_2 | x_1^(t), x_3^(t-1))
x_3^(t) ~ P(x_3 | x_1^(t), x_2^(t))
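The scheme above can be sketched for a bivariate normal target, whose full conditionals are themselves normal: for N(0, [[1, ρ], [ρ, 1]]), x_1 | x_2 ~ N(ρ x_2, 1 - ρ²) and symmetrically for x_2. The choice ρ = 0.8 and the chain length are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
rho = 0.8                          # correlation of the target bivariate normal

x1, x2, samples = 0.0, 0.0, []
for _ in range(50_000):
    # Resample each variable from its full conditional, using the latest values:
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))   # x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))   # x2 | x1
    samples.append((x1, x2))

samples = np.array(samples[5_000:])          # drop burn-in
corr = np.corrcoef(samples.T)[0, 1]
print(corr)                                  # near rho = 0.8
```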

Monte Carlo methods summary

- Useful when we need approximate methods to compute sums and integrals
- Monte Carlo does not explicitly depend on dimension, although the simple methods work only in low dimensions
- Markov chain Monte Carlo (MCMC) makes local moves; by assuming less, it remains applicable in higher dimensions
- It produces approximate, correlated samples
- The computations are simple and easy to implement

Probabilistic programming languages

In probabilistic programming languages, such as Stan, we can describe Bayesian models and perform inference on them. A model is described by defining the random variables, the model parameters, and their distributions. Given a model description and a data set, Stan can then perform Bayesian inference using methods such as MCMC.