Markov Chain Monte Carlo (MCMC)

1 School of Computer Science. Probabilistic Graphical Models: Markov Chain Monte Carlo (MCMC). Readings: MacKay Ch. 29; Jordan Ch. 21. Matt Gormley, Lecture 16, March 14, 2016.

2 Housekeeping. Homework 2 due March 16, 12:00 noon (extended). Midway Project Report due March 23, 12:00 noon.

3 Overview. 1. Data: D = \{x^{(n)}\}_{n=1}^N, e.g. tagged sentences such as "n v p d n / time flies like an arrow". 2. Model: p(x | \theta) = \frac{1}{Z(\theta)} \prod_{C \in \mathcal{C}} \psi_C(x_C). 3. Objective: \ell(\theta; D) = \sum_{n=1}^N \log p(x^{(n)} | \theta). 4. Learning: \theta^* = \text{argmax}_\theta \ell(\theta; D). 5. Inference: (a) marginal inference, p(x_C) = \sum_{x': x'_C = x_C} p(x' | \theta); (b) partition function, Z(\theta) = \sum_x \prod_{C \in \mathcal{C}} \psi_C(x_C); (c) MAP inference, \hat{x} = \text{argmax}_x p(x | \theta).

4 It's Pi Day 2016, so let's compute π.

5 (Slide from Iain Murray) Properties of the Monte Carlo estimator: \int f(x) P(x)\, dx \approx \hat{f} = \frac{1}{S} \sum_{s=1}^S f(x^{(s)}), with x^{(s)} \sim P(x). The estimator is unbiased: E_{P(\{x^{(s)}\})}[\hat{f}] = \frac{1}{S} \sum_{s=1}^S E_{P(x)}[f(x)] = E_{P(x)}[f(x)]. The variance shrinks as 1/S: \text{var}_{P(\{x^{(s)}\})}[\hat{f}] = \frac{1}{S^2} \sum_{s=1}^S \text{var}_{P(x)}[f(x)] = \text{var}_{P(x)}[f(x)] / S. Error bars shrink like 1/\sqrt{S}.

6 (Slide from Iain Murray) A dumb approximation of π. Let P(x, y) = 1 if 0 < x < 1 and 0 < y < 1, and 0 otherwise. Then \pi = 4 \int I((x^2 + y^2) < 1) P(x, y)\, dx\, dy. octave:1> S=12; a=rand(S,2); 4*mean(sum(a.*a,2)<1); octave:2> S=1e7; a=rand(S,2); 4*mean(sum(a.*a,2)<1).
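The Octave one-liner above ports directly to Python. This is a sketch of the same estimator, with `estimate_pi` a hypothetical helper name; the seed is fixed only for reproducibility:

```python
import random

def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform points in the
    unit square that land inside the quarter circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(12))        # tiny S: a very noisy estimate
print(estimate_pi(1_000_000)) # large S: close to pi
```

As the previous slide predicts, the error shrinks like 1/\sqrt{S}, so each extra decimal digit of π costs roughly 100x more samples.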

7 (Slide from Iain Murray) Aside: don't always sample! "Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse." — Alan Sokal, 1996. Example: numerical solutions to (nice) 1D integrals are fast: octave:1> quadl(@(x) 4*sqrt(1-x.^2), 0, 1, tolerance). Gives π to 6 decimal places in 108 evaluations, and machine precision with a modest number more. (NB Matlab's quadl fails at zero tolerance.)

8 (Slide from Iain Murray) Sampling from distributions. Draw points uniformly under the curve P(x); their x-coordinates x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)} are samples from P(x). The probability mass to the left of a point is distributed Uniform[0,1].

9 (Slide from Iain Murray) Sampling from distributions. How to convert samples from a Uniform[0,1] generator: define the CDF h(y) = \int_{-\infty}^{y} p(y')\, dy'. Draw the mass to the left of the point, u \sim Uniform[0,1], and return the sample y(u) = h^{-1}(u). (Figure from PRML, Bishop, 2006.) Although we can't always compute and invert h(y).
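For a concrete case where h(y) is invertible in closed form, take the exponential distribution: h(y) = 1 - e^{-\lambda y}, so h^{-1}(u) = -\ln(1-u)/\lambda. A sketch (the exponential target is an illustrative choice, not from the slide):

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse-CDF sampling: h(y) = 1 - exp(-rate*y), so
    y = h^{-1}(u) = -log(1 - u) / rate for u ~ Uniform[0,1]."""
    u = rng.random()
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(200_000)]
print(sum(samples) / len(samples))  # should be close to 1/rate = 0.5
```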

10 (Slide from Iain Murray) Rejection sampling. Sampling underneath a \tilde{P}(x) \propto P(x) curve is also valid. Draw underneath a simple curve k Q(x) \ge \tilde{P}(x): draw x \sim Q(x) and a height u \sim Uniform[0, k Q(x)]; discard the point if it lies above \tilde{P}, i.e. if u > \tilde{P}(x).
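The two-step draw-then-discard loop above can be sketched as follows. The unnormalised standard normal target and the Uniform(-5, 5) proposal are illustrative choices; k = 10 makes k Q(x) = 1 \ge \tilde{P}(x) everywhere on the support:

```python
import math
import random

def rejection_sample(p_tilde, q_sample, q_pdf, k, rng):
    """Draw x ~ Q, then u ~ Uniform[0, k*Q(x)]; accept x only if u <= p_tilde(x)."""
    while True:
        x = q_sample(rng)
        u = rng.uniform(0.0, k * q_pdf(x))
        if u <= p_tilde(x):
            return x

# Target: unnormalised standard normal; proposal: Uniform(-5, 5).
p_tilde = lambda x: math.exp(-0.5 * x * x)   # max value 1 at x = 0
q_pdf = lambda x: 0.1                        # 1 / (5 - (-5))
q_samp = lambda rng: rng.uniform(-5.0, 5.0)
k = 10.0                                     # so k*Q(x) = 1 >= p_tilde(x)

rng = random.Random(0)
xs = [rejection_sample(p_tilde, q_samp, q_pdf, k, rng) for _ in range(50_000)]
print(sum(xs) / len(xs))  # near 0 for a standard normal
```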

11 (Slide from Iain Murray) Importance sampling. Computing \tilde{P}(x) and Q(x), then throwing x away, seems wasteful. Instead rewrite the integral as an expectation under Q: \int f(x) P(x)\, dx = \int f(x) \frac{P(x)}{Q(x)} Q(x)\, dx (requires Q(x) > 0 wherever P(x) > 0), \approx \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}) \frac{P(x^{(s)})}{Q(x^{(s)})}, with x^{(s)} \sim Q(x). This is just simple Monte Carlo again, so it is unbiased. Importance sampling applies even when the integral is not an expectation: divide and multiply any integrand by a convenient distribution.

12 (Slide from Iain Murray) Importance sampling (2). The previous slide assumed we could evaluate P(x) = \tilde{P}(x)/Z_P. More generally, \int f(x) P(x)\, dx \approx \frac{Z_Q}{Z_P} \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}) \frac{\tilde{P}(x^{(s)})}{\tilde{Q}(x^{(s)})}, x^{(s)} \sim Q(x). Writing r^{(s)} = \tilde{P}(x^{(s)}) / \tilde{Q}(x^{(s)}) and w^{(s)} = r^{(s)} / \sum_{s'} r^{(s')}, this becomes \approx \sum_{s=1}^{S} f(x^{(s)}) w^{(s)}. This estimator is consistent but biased. Exercise: prove that Z_P / Z_Q \approx \frac{1}{S} \sum_s r^{(s)}.
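The self-normalised estimator above can be sketched in a few lines. Here the unnormalised N(0,1) target, the N(0, 2^2) proposal, and f(x) = x^2 (whose true expectation is 1) are illustrative choices:

```python
import math
import random

def snis_expectation(f, p_tilde, q_pdf, q_sample, S, rng):
    """Self-normalised importance sampling: r_s = p_tilde(x_s)/Q(x_s),
    w_s = r_s / sum(r), estimate = sum_s w_s * f(x_s)."""
    xs = [q_sample(rng) for _ in range(S)]
    rs = [p_tilde(x) / q_pdf(x) for x in xs]
    total = sum(rs)
    return sum((r / total) * f(x) for r, x in zip(rs, xs))

sigma_q = 2.0
p_tilde = lambda x: math.exp(-0.5 * x * x)  # unnormalised N(0, 1)
q_pdf = lambda x: math.exp(-0.5 * (x / sigma_q) ** 2) / (sigma_q * math.sqrt(2 * math.pi))
q_samp = lambda rng: rng.gauss(0.0, sigma_q)

rng = random.Random(0)
est = snis_expectation(lambda x: x * x, p_tilde, q_pdf, q_samp, 100_000, rng)
print(est)  # E[x^2] under N(0,1) is 1
```

Note that neither the target's normalizer Z_P nor the proposal's is needed: the weights are normalised by their own sum, which is exactly what makes the estimator biased but consistent.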

13 (Slide from Iain Murray) Summary so far. Sums and integrals, often expectations, occur frequently in statistics. Monte Carlo approximates expectations with a sample average. Rejection sampling draws samples from complex distributions. Importance sampling applies Monte Carlo to any sum/integral.

14 (Slide from Iain Murray) Application to large problems: pitfalls of Monte Carlo. Rejection and importance sampling scale badly with dimensionality. Example: P(x) = \mathcal{N}(0, I), Q(x) = \mathcal{N}(0, \sigma^2 I). Rejection sampling requires \sigma \ge 1, and the fraction of proposals accepted is \sigma^{-D}. Importance sampling: the variance of the importance weights is \left( \frac{\sigma^2}{2 - 1/\sigma^2} \right)^{D/2} - 1, which is infinite/undefined if \sigma \le 1/\sqrt{2}.
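The \sigma^{-D} acceptance rate can be checked empirically. With the tightest envelope k = \sigma^D (valid for \sigma > 1, since P(x)/Q(x) = \sigma^D e^{-\frac{1}{2}(1 - 1/\sigma^2)\|x\|^2} peaks at x = 0), the acceptance probability for a proposal x \sim Q is P(x)/(k Q(x)). A sketch, with `rejection_acceptance_rate` a hypothetical helper and \sigma = 1.1 an illustrative value:

```python
import math
import random

def rejection_acceptance_rate(D, sigma, trials, rng):
    """Empirical acceptance rate for rejection-sampling P = N(0, I) from
    Q = N(0, sigma^2 I) with envelope k = sigma^D (requires sigma > 1).
    Accept prob for x ~ Q is exp(-0.5*(1 - 1/sigma^2)*||x||^2), so the
    expected fraction accepted is 1/k = sigma^(-D)."""
    accepted = 0
    for _ in range(trials):
        sq = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(D))
        accept_prob = math.exp(-0.5 * (1.0 - 1.0 / sigma ** 2) * sq)
        if rng.random() < accept_prob:
            accepted += 1
    return accepted / trials

rng = random.Random(0)
for D in (1, 10, 50):
    print(D, rejection_acceptance_rate(D, 1.1, 100_000, rng), 1.1 ** -D)
```

Even with a nearly perfect proposal (\sigma = 1.1), the acceptance rate decays geometrically in D — the curse of dimensionality the slide describes.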

15 Outline. Review: Monte Carlo. MCMC (basic methods): Metropolis algorithm, Metropolis-Hastings (M-H) algorithm, Gibbs sampling. Markov chains: transition probabilities, invariant distribution, equilibrium distribution, Markov chain as a WFSM, constructing Markov chains, why does M-H work? MCMC (auxiliary variable methods): slice sampling, Hamiltonian Monte Carlo.

16 MCMC (Basic Methods): Metropolis, Metropolis-Hastings, Gibbs Sampling.

17 MCMC. Goal: draw approximate, correlated samples from a target distribution p(x). MCMC performs a biased random walk to explore the distribution.

18 Simulations of MCMC. Visualization of Metropolis-Hastings, Gibbs sampling, and Hamiltonian MCMC: mcmc/

19 Whiteboard: Metropolis algorithm, Metropolis-Hastings algorithm, Gibbs sampling.

20 Gibbs Sampling. (Figure: contours of p(x) in the (x_1, x_2) plane; from the state x^{(t)}, sample x_1 \sim p(x_1 | x_2^{(t)}) to obtain x^{(t+1)}.)

21 Gibbs Sampling. (Figure: from x^{(t+1)}, sample x_2 \sim p(x_2 | x_1^{(t+1)}) to obtain x^{(t+2)}.)

22 Gibbs Sampling. (Figure: alternating these one-dimensional conditional updates traces an axis-aligned path x^{(t)}, x^{(t+1)}, ..., x^{(t+4)} through the distribution.)
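The three figures above walk through Gibbs sampling on a two-dimensional target. A minimal Python sketch, assuming a zero-mean bivariate Gaussian with correlation rho as the target, so that both full conditionals are the Gaussians x_1 | x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2) and symmetrically for x_2:

```python
import math
import random

def gibbs_bivariate_gaussian(rho, num_steps, rng):
    """Gibbs sampling for a zero-mean, unit-variance bivariate Gaussian
    with correlation rho, alternating the two full conditionals."""
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0
    samples = []
    for _ in range(num_steps):
        x1 = rng.gauss(rho * x2, sd)  # resample x1 from p(x1 | x2)
        x2 = rng.gauss(rho * x1, sd)  # resample x2 from p(x2 | x1)
        samples.append((x1, x2))
    return samples

rng = random.Random(0)
chain = gibbs_bivariate_gaussian(0.8, 200_000, rng)
corr = sum(a * b for a, b in chain) / len(chain)  # E[x1*x2] = rho
print(corr)
```

The stronger the correlation, the smaller the axis-aligned steps relative to the long diagonal of the distribution, which is exactly the slow-mixing behavior the figures hint at.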

23 (Figure from Jordan Ch. 21) Gibbs Sampling. Full conditionals only need to condition on the Markov blanket (shown for an MRF and a Bayes net). It must be easy to sample from the conditionals; many conditionals are log-concave and are amenable to adaptive rejection sampling.

24 Whiteboard: Gibbs sampling as M-H.

25 (Figure from Bishop, 2006) Random Walk Behavior of M-H. For Metropolis-Hastings, a generic proposal distribution is q(x' | x^{(t)}) = \mathcal{N}(x' | x^{(t)}, \epsilon^2). If \epsilon is large, many rejections; if \epsilon is small, slow mixing. (Figure: an elongated target with length scales \sigma_{max} and \sigma_{min}, with the proposal q(x' | x^{(t)}) shown at scale \epsilon.)

26 (Figure from Bishop, 2006) Random Walk Behavior of M-H. For rejection sampling, the accepted samples are independent; but for Metropolis-Hastings, the samples are correlated. Question: how long must we wait to get effectively independent samples? A: independent states in the M-H random walk are separated by roughly (\sigma_{max} / \sigma_{min})^2 steps.
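A random-walk Metropolis sampler of the kind discussed above can be sketched in a few lines. The one-dimensional unnormalised N(0,1) target and the step size 2.4 are illustrative choices (for a symmetric proposal the Hastings correction cancels):

```python
import math
import random

def random_walk_metropolis(log_p_tilde, x0, step, num_steps, rng):
    """Random-walk Metropolis: propose x' ~ N(x, step^2) and accept with
    probability min(1, p_tilde(x') / p_tilde(x))."""
    x = x0
    samples, accepted = [], 0
    for _ in range(num_steps):
        x_prop = rng.gauss(x, step)
        log_accept = min(0.0, log_p_tilde(x_prop) - log_p_tilde(x))
        if rng.random() < math.exp(log_accept):
            x, accepted = x_prop, accepted + 1
        samples.append(x)  # on rejection the chain repeats the old state
    return samples, accepted / num_steps

log_p = lambda x: -0.5 * x * x  # unnormalised log-density of N(0, 1)
rng = random.Random(0)
chain, acc = random_walk_metropolis(log_p, 0.0, 2.4, 100_000, rng)
print(sum(chain) / len(chain), acc)
```

Shrinking `step` drives the acceptance rate toward 1 but makes consecutive samples nearly identical; growing it causes frequent rejections — the trade-off on the previous slide.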

27 Markov Chains: Definitions and Theoretical Justification for MCMC.

28 Whiteboard: Markov chains, transition probabilities, invariant distribution, equilibrium distribution, sufficient conditions for MCMC, Markov chain as a WFSM.

29 Detailed Balance. S(x' | x) p(x) = S(x | x') p(x'). Detailed balance means that, for each pair of states x and x', being at x and moving to x' is exactly as probable as being at x' and moving to x.
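As a sanity check on the definition, a Metropolis kernel can be verified to satisfy detailed balance numerically. The sketch below builds the transition matrix S for a hypothetical three-state target with a uniform proposal over the other states; the state probabilities are illustrative, not from the lecture:

```python
# Detailed balance check for a Metropolis chain on 3 states:
# S(x'|x) = q(x'|x) * min(1, p(x')/p(x)) for x' != x, and the
# leftover probability mass is the "stay put" self-transition.
p = [0.2, 0.3, 0.5]      # hypothetical target distribution
n = len(p)
q = 1.0 / (n - 1)        # uniform proposal over the other states

S = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            S[i][j] = q * min(1.0, p[j] / p[i])
    S[i][i] = 1.0 - sum(S[i])  # remaining mass: rejection keeps the state

# Detailed balance: p(x) S(x'|x) == p(x') S(x|x') for every pair.
for i in range(n):
    for j in range(n):
        assert abs(p[i] * S[i][j] - p[j] * S[j][i]) < 1e-12
print("detailed balance holds")
```

Summing detailed balance over x also shows p is invariant: \sum_x S(x' | x) p(x) = p(x') \sum_x S(x | x') = p(x').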

30 Whiteboard: simple Markov chain example, constructing Markov chains, transition probabilities for MCMC.

31 Practical Issues. Question: is it better to move along one dimension or many? Answer: for Metropolis-Hastings, it is sometimes better to sample one dimension at a time. (Q: given a sequence of 1D proposals, compare the rate of movement for one-at-a-time vs. concatenation.) For Gibbs sampling, it is sometimes better to sample a block of variables at a time. (Q: when is it tractable to sample a block of variables?)

32 Practical Issues. Question: how do we assess convergence of the Markov chain? Answer: it's not easy! Compare statistics of multiple independent chains. Ex: compare the log-likelihoods of Chain 1 and Chain 2 as a function of the number of MCMC steps.

34 Practical Issues. Question: is one long Markov chain better than many short ones? Note: it is typical to discard initial samples (aka "burn-in"), since the chain might not yet have mixed. Answer: often a balance is best. Compared to one long chain: more independent samples. Compared to many small chains: fewer samples discarded for burn-in; we can still parallelize; and we can assess mixing by comparing chains.

35 Whiteboard: blocked Gibbs sampling.

36 MCMC (Auxiliary Variable Methods): Slice Sampling, Hamiltonian Monte Carlo.

37 (Slide from Iain Murray) Auxiliary variables. The point of MCMC is to marginalize out variables, but one can also introduce more variables: \int f(x) P(x)\, dx = \int f(x) P(x, v)\, dx\, dv \approx \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}), with (x, v)^{(s)} \sim P(x, v). We might want to do this if P(x | v) and P(v | x) are simple, or if P(x, v) is otherwise easier to navigate.

38 Slice Sampling. Motivation: we want samples from p(x) but don't know the normalizer Z, and choosing a proposal at the correct scale is difficult. Properties: similar to Gibbs sampling (one-dimensional transitions in the state space); similar to rejection sampling ((asymptotically) draws samples from the region under the curve \tilde{p}(x)); an MCMC method with an adaptive proposal.

39 (Slide from Iain Murray) Slice sampling idea. Sample a point (x, u) uniformly under the curve \tilde{P}(x). This is just an auxiliary-variable Gibbs sampler: p(u | x) = Uniform[0, \tilde{P}(x)], and p(x | u) \propto 1 if \tilde{P}(x) \ge u and 0 otherwise, i.e. uniform on the "slice". Problem: sampling from the conditional p(x | u) might be infeasible.

40-42 (Figures adapted from MacKay Ch. 29) Slice Sampling: step-by-step illustrations of the stepping-out and shrinking procedure.

43 Slice Sampling. Algorithm. Goal: sample (x, u) given (u^{(t)}, x^{(t)}).
u \sim Uniform(0, \tilde{p}(x^{(t)}))
Part 1: Stepping Out. Sample an interval (x_l, x_r) enclosing x^{(t)}:
  r \sim Uniform(0, w)
  (x_l, x_r) = (x^{(t)} - r, x^{(t)} + w - r)
Expand until both endpoints are outside the region under the curve:
  while \tilde{p}(x_l) > u: x_l = x_l - w
  while \tilde{p}(x_r) > u: x_r = x_r + w
Part 2: Sample x (Shrinking). Draw x from within the interval (x_l, x_r), then accept or shrink:
  while true:
    x \sim Uniform(x_l, x_r)
    if \tilde{p}(x) > u: break
    else if x > x^{(t)}: x_r = x
    else: x_l = x
x^{(t+1)} = x, u^{(t+1)} = u

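The stepping-out and shrinking algorithm above translates directly to code. A Python sketch for a one-dimensional unnormalised target; the width w = 1.0 and the standard normal test target are illustrative choices:

```python
import math
import random

def slice_sample_step(p_tilde, x, w, rng):
    """One slice-sampling transition: stepping out, then shrinking."""
    u = rng.uniform(0.0, p_tilde(x))   # height under the curve at x
    r = rng.uniform(0.0, w)            # randomly position the initial interval
    x_l, x_r = x - r, x + (w - r)
    while p_tilde(x_l) > u:            # step out to the left
        x_l -= w
    while p_tilde(x_r) > u:            # step out to the right
        x_r += w
    while True:                        # shrink until a point is accepted
        x_new = rng.uniform(x_l, x_r)
        if p_tilde(x_new) > u:
            return x_new
        elif x_new > x:
            x_r = x_new
        else:
            x_l = x_new

p_tilde = lambda x: math.exp(-0.5 * x * x)  # unnormalised N(0, 1)
rng = random.Random(0)
x, samples = 0.0, []
for _ in range(100_000):
    x = slice_sample_step(p_tilde, x, 1.0, rng)
    samples.append(x)
print(sum(samples) / len(samples))  # near 0
```

Unlike M-H, there is no step size to reject with: a badly chosen w only costs extra evaluations of \tilde{p}, since the interval adapts by stepping out and shrinking.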

46 Slice Sampling: Multivariate Distributions. Resample each variable x_i one at a time (just like Gibbs sampling). This does not require sampling from p(x_i | \{x_j\}_{j \ne i}); we only need to evaluate a quantity proportional to the conditional, \tilde{p}(x_i | \{x_j\}_{j \ne i}) \propto p(x_i | \{x_j\}_{j \ne i}).

47 Hamiltonian Monte Carlo. Suppose we have a distribution of the form p(x) = \exp\{-E(x)\}/Z, where x \in \mathbb{R}^N. We could use random-walk M-H to draw samples, but it seems a shame to discard the gradient information \nabla_x E(x). If we can evaluate it, the gradient tells us where to look for high-probability regions!

48 Background: Hamiltonian Dynamics. Applications: following the motion of atoms in a fluid through time (molecular dynamics); integrating the motion of a solar system over time; considering the evolution of a galaxy, i.e. the motion of its stars (N-body simulations). Properties: the total energy of the system H(x, p) stays constant, and the dynamics are reversible — important for detailed balance.

49 Background: Hamiltonian Dynamics. Let x \in \mathbb{R}^N be a position and p \in \mathbb{R}^N be a momentum. Potential energy: E(x). Kinetic energy: K(p) = p^T p / 2. Total energy (the Hamiltonian function): H(x, p) = E(x) + K(p). Given a starting position x^{(1)} and a starting momentum p^{(1)}, we can simulate the Hamiltonian dynamics of the system via: 1. Euler's method; 2. the leapfrog method; 3. etc.

50 Background: Hamiltonian Dynamics. Parameters to tune: 1. step size \epsilon; 2. number of iterations L. Leapfrog algorithm: for \tau in 1...L: p = p - \frac{\epsilon}{2} \nabla_x E(x); x = x + \epsilon p; p = p - \frac{\epsilon}{2} \nabla_x E(x).
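The leapfrog update above can be sketched directly (fusing the back-to-back half momentum steps of adjacent iterations). The harmonic oscillator E(x) = x^2/2 is an illustrative test case, not from the slides; it lets us check the two properties just listed: approximate energy conservation and exact reversibility:

```python
def leapfrog(grad_E, x, p, eps, L):
    """Leapfrog integration of Hamiltonian dynamics: half momentum step,
    then alternating full position/momentum steps, then a final half
    momentum step."""
    p = p - 0.5 * eps * grad_E(x)
    for _ in range(L - 1):
        x = x + eps * p
        p = p - eps * grad_E(x)
    x = x + eps * p
    p = p - 0.5 * eps * grad_E(x)
    return x, p

grad_E = lambda x: x  # E(x) = x^2 / 2, so grad E(x) = x
H = lambda x, p: 0.5 * x * x + 0.5 * p * p

x0, p0 = 1.0, 0.0
x1, p1 = leapfrog(grad_E, x0, p0, eps=0.1, L=100)
print(H(x0, p0), H(x1, p1))  # total energy approximately conserved

# Reversibility: negate the momentum and integrate back to the start.
xb, pb = leapfrog(grad_E, x1, -p1, eps=0.1, L=100)
print(xb, -pb)  # recovers (x0, p0) up to floating-point rounding
```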

51 (Figure from Neal, 2011) Background: Hamiltonian Dynamics. (Figure: position vs. momentum trajectories for (a) Euler's method, stepsize 0.3; (b) modified Euler's method; (c) the leapfrog method, stepsize 0.3; (d) the leapfrog method at a larger stepsize.)

52 (Figure from Neal, 2011) Hamiltonian Monte Carlo: Preliminaries. Goal: sample from p(x) = \exp\{-E(x)\}/Z, where x \in \mathbb{R}^N. Define K(p) = p^T p / 2, H(x, p) = E(x) + K(p), and the joint p(x, p) = \exp\{-H(x, p)\}/Z_H = \exp\{-E(x)\} \exp\{-K(p)\}/Z_H. Note: since p(x, p) is separable, \sum_p p(x, p) \propto \exp\{-E(x)\} (the target distribution) and \sum_x p(x, p) \propto \exp\{-K(p)\} (a Gaussian).

53 Whiteboard: Hamiltonian Monte Carlo algorithm (aka. Hybrid Monte Carlo).
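The whiteboard algorithm itself is not transcribed, but a standard HMC transition under the slide's definitions (K(p) = p^T p / 2, leapfrog dynamics, then a Metropolis accept/reject on the change in H) can be sketched as follows. The one-dimensional N(0,1) target and the settings eps = 0.2, L = 10 are illustrative choices:

```python
import math
import random

def hmc_step(E, grad_E, x, eps, L, rng):
    """One HMC transition: draw a fresh Gaussian momentum, simulate the
    dynamics with leapfrog, then accept/reject on the change in
    H(x, p) = E(x) + p^2/2 to correct for discretization error."""
    p = rng.gauss(0.0, 1.0)
    x_new, p_new = x, p
    p_new -= 0.5 * eps * grad_E(x_new)     # leapfrog: half momentum step
    for _ in range(L - 1):
        x_new += eps * p_new
        p_new -= eps * grad_E(x_new)
    x_new += eps * p_new
    p_new -= 0.5 * eps * grad_E(x_new)     # final half momentum step
    h_old = E(x) + 0.5 * p * p
    h_new = E(x_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new                        # accept
    return x                                # reject: stay put

E = lambda x: 0.5 * x * x                   # target N(0,1): p(x) ∝ exp(-E(x))
grad_E = lambda x: x
rng = random.Random(0)
x, samples = 0.0, []
for _ in range(20_000):
    x = hmc_step(E, grad_E, x, eps=0.2, L=10, rng=rng)
    samples.append(x)
print(sum(samples) / len(samples))
```

Because the leapfrog nearly conserves H, the acceptance rate stays high even for long trajectories, which is what lets HMC make the large, informed moves shown in the figures that follow.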

54 (Figure from Neal, 2011) Hamiltonian Monte Carlo. (Figure: traces of the position coordinates, the momentum coordinates, and the value of the Hamiltonian over an HMC trajectory.)

55 (Figure from Neal, 2011) M-H vs. HMC. (Figure: sample paths of random-walk Metropolis vs. Hamiltonian Monte Carlo on the same target.)

56 Simulations of MCMC. Visualization of Metropolis-Hastings, Gibbs sampling, and Hamiltonian MCMC: mcmc/

57 (Slide adapted from Daphne Koller) MCMC Summary. Pros: very general purpose; often easy to implement; good theoretical guarantees as t \to \infty. Cons: lots of tunable parameters / design choices; can be quite slow to converge; difficult to tell whether it's working.


More information

Approximate Slice Sampling for Bayesian Posterior Inference

Approximate Slice Sampling for Bayesian Posterior Inference Approximate Slice Sampling for Bayesian Posterior Inference Anonymous Author 1 Anonymous Author 2 Anonymous Author 3 Unknown Institution 1 Unknown Institution 2 Unknown Institution 3 Abstract In this paper,

More information

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software SAMSI Astrostatistics Tutorial More Markov chain Monte Carlo & Demo of Mathematica software Phil Gregory University of British Columbia 26 Bayesian Logical Data Analysis for the Physical Sciences Contents:

More information

Advanced Sampling Algorithms

Advanced Sampling Algorithms + Advanced Sampling Algorithms + Mobashir Mohammad Hirak Sarkar Parvathy Sudhir Yamilet Serrano Llerena Advanced Sampling Algorithms Aditya Kulkarni Tobias Bertelsen Nirandika Wanigasekara Malay Singh

More information

Convergence Rate of Markov Chains

Convergence Rate of Markov Chains Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution

More information

Numerical Analysis for Statisticians

Numerical Analysis for Statisticians Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method

More information

Auxiliary Particle Methods

Auxiliary Particle Methods Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley

More information

Elliptical slice sampling

Elliptical slice sampling Iain Murray Ryan Prescott Adams David J.C. MacKay University of Toronto University of Toronto University of Cambridge Abstract Many probabilistic models introduce strong dependencies between variables

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

An Brief Overview of Particle Filtering

An Brief Overview of Particle Filtering 1 An Brief Overview of Particle Filtering Adam M. Johansen a.m.johansen@warwick.ac.uk www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/ May 11th, 2010 Warwick University Centre for Systems

More information

Monte Carlo integration

Monte Carlo integration Monte Carlo integration Eample of a Monte Carlo sampler in D: imagine a circle radius L/ within a square of LL. If points are randoml generated over the square, what s the probabilit to hit within circle?

More information

Bayesian model selection in graphs by using BDgraph package

Bayesian model selection in graphs by using BDgraph package Bayesian model selection in graphs by using BDgraph package A. Mohammadi and E. Wit March 26, 2013 MOTIVATION Flow cytometry data with 11 proteins from Sachs et al. (2005) RESULT FOR CELL SIGNALING DATA

More information

Lecture 6: Monte-Carlo methods

Lecture 6: Monte-Carlo methods Miranda Holmes-Cerfon Applied Stochastic Analysis, Spring 2015 Lecture 6: Monte-Carlo methods Readings Recommended: handout on Classes site, from notes by Weinan E, Tiejun Li, and Eric Vanden-Eijnden Optional:

More information

Various lecture notes for

Various lecture notes for Various lecture notes for 18311. R. R. Rosales (MIT, Math. Dept., 2-337) April 12, 2013 Abstract Notes, both complete and/or incomplete, for MIT s 18.311 (Principles of Applied Mathematics). These notes

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Numerical integration and importance sampling

Numerical integration and importance sampling 2 Numerical integration and importance sampling 2.1 Quadrature Consider the numerical evaluation of the integral I(a, b) = b a dx f(x) Rectangle rule: on small interval, construct interpolating function

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Estimation theory and information geometry based on denoising

Estimation theory and information geometry based on denoising Estimation theory and information geometry based on denoising Aapo Hyvärinen Dept of Computer Science & HIIT Dept of Mathematics and Statistics University of Helsinki Finland 1 Abstract What is the best

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Introduction: exponential family, conjugacy, and sufficiency (9/2/13)

Introduction: exponential family, conjugacy, and sufficiency (9/2/13) STA56: Probabilistic machine learning Introduction: exponential family, conjugacy, and sufficiency 9/2/3 Lecturer: Barbara Engelhardt Scribes: Melissa Dalis, Abhinandan Nath, Abhishek Dubey, Xin Zhou Review

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions

A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2014 A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions Shane T. Jensen University of Pennsylvania Dean

More information

Expectation Maximization Algorithm

Expectation Maximization Algorithm Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters

More information

Review and Motivation

Review and Motivation Review and Motivation We can model and visualize multimodal datasets by using multiple unimodal (Gaussian-like) clusters. K-means gives us a way of partitioning points into N clusters. Once we know which

More information

Notes on pseudo-marginal methods, variational Bayes and ABC

Notes on pseudo-marginal methods, variational Bayes and ABC Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution

More information

Approximate Bayesian Computation: a simulation based approach to inference

Approximate Bayesian Computation: a simulation based approach to inference Approximate Bayesian Computation: a simulation based approach to inference Richard Wilkinson Simon Tavaré 2 Department of Probability and Statistics University of Sheffield 2 Department of Applied Mathematics

More information

A Review of Mathematical Techniques in Machine Learning. Owen Lewis

A Review of Mathematical Techniques in Machine Learning. Owen Lewis A Review of Mathematical Techniques in Machine Learning Owen Lewis May 26, 2010 Introduction This thesis is a review of an array topics in machine learning. The topics chosen are somewhat miscellaneous;

More information

Machine Learning, Fall 2012 Homework 2

Machine Learning, Fall 2012 Homework 2 0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

More information

Infinite Mixtures of Gaussian Process Experts

Infinite Mixtures of Gaussian Process Experts in Advances in Neural Information Processing Systems 14, MIT Press (22). Infinite Mixtures of Gaussian Process Experts Carl Edward Rasmussen and Zoubin Ghahramani Gatsby Computational Neuroscience Unit

More information

Probabilistic Graphical Models Lecture Notes Fall 2009

Probabilistic Graphical Models Lecture Notes Fall 2009 Probabilistic Graphical Models Lecture Notes Fall 2009 October 28, 2009 Byoung-Tak Zhang School of omputer Science and Engineering & ognitive Science, Brain Science, and Bioinformatics Seoul National University

More information

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo The No-U-Turn Sampler The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo Matthew D. Hoffman Department of Statistics Columbia University New York, NY 10027, USA Andrew Gelman

More information

dt 2 The Order of a differential equation is the order of the highest derivative that occurs in the equation. Example The differential equation

dt 2 The Order of a differential equation is the order of the highest derivative that occurs in the equation. Example The differential equation Lecture 18 : Direction Fields and Euler s Method A Differential Equation is an equation relating an unknown function and one or more of its derivatives. Examples Population growth : dp dp = kp, or = kp

More information

MLE/MAP + Naïve Bayes

MLE/MAP + Naïve Bayes 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders

More information

Reduction of Variance. Importance Sampling

Reduction of Variance. Importance Sampling Reduction of Variance As we discussed earlier, the statistical error goes as: error = sqrt(variance/computer time). DEFINE: Efficiency = = 1/vT v = error of mean and T = total CPU time How can you make

More information

Methods of Data Analysis Random numbers, Monte Carlo integration, and Stochastic Simulation Algorithm (SSA / Gillespie)

Methods of Data Analysis Random numbers, Monte Carlo integration, and Stochastic Simulation Algorithm (SSA / Gillespie) Methods of Data Analysis Random numbers, Monte Carlo integration, and Stochastic Simulation Algorithm (SSA / Gillespie) Week 1 1 Motivation Random numbers (RNs) are of course only pseudo-random when generated

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences

More information

Monte Carlo Methods for Computation and Optimization (048715)

Monte Carlo Methods for Computation and Optimization (048715) Technion Department of Electrical Engineering Monte Carlo Methods for Computation and Optimization (048715) Lecture Notes Prof. Nahum Shimkin Spring 2015 i PREFACE These lecture notes are intended for

More information