Sampling Methods (11/30/04)

CS281A/Stat241A: Statistical Learning Theory
Lecturer: Michael I. Jordan. Scribe: Jaspal S. Sandhu.

1 Gibbs Sampling

Figure 1: Undirected and directed graphs, respectively, with the colored nodes representing the minimum set of nodes necessary to render node x_i conditionally independent of all other nodes in the graph.

A Markov blanket is the minimum set of nodes that renders node x_i conditionally independent of all other nodes in the graph. In the undirected case, simple graph separation suffices (as shown): the Markov blanket of x_i is its set of neighbors. In the directed case, the Markov blanket (also shown) consists of the parents of x_i, the children of x_i, and the parents of the children of x_i (the co-parents of x_i). If you are interested in this topic, see the BUGS (Bayesian inference Using Gibbs Sampling) software project (http://www.mrcbsu.cam.ac.uk/bugs/welcome.shtml).

When performing Gibbs sampling, we construct, for each hidden node x_i, the conditional probability p(x_i | x_{-i}), where x_{-i} denotes the set of all nodes in the graph other than x_i. The conditional independence arguments above imply that it suffices to condition on the graph separators of x_i in the undirected case, and on the Markov blanket of x_i in the directed case. Gibbs sampling then proceeds by sampling each hidden variable in turn from the appropriate conditional distribution, given the current values of the other variables in the graph. Marginal probabilities can be estimated by averaging over the samples.
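As a concrete illustration (not part of the original notes), here is a minimal Gibbs sampler for a toy case where both full conditionals are available in closed form: a zero-mean, unit-variance bivariate Gaussian with correlation rho, for which x1 | x2 ~ N(rho * x2, 1 - rho^2). The function name and parameters are illustrative.

```python
import math
import random

random.seed(0)

def gibbs_bivariate_gaussian(rho, num_samples=20000, burn_in=500):
    """Gibbs sampler for a zero-mean, unit-variance bivariate Gaussian
    with correlation rho.  Each full conditional is Gaussian:
    x1 | x2 ~ N(rho * x2, 1 - rho**2), and symmetrically for x2 | x1.
    """
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1, x2 = 0.0, 0.0
    samples = []
    for t in range(num_samples + burn_in):
        # Resample each variable from its conditional given the other.
        x1 = random.gauss(rho * x2, sd)
        x2 = random.gauss(rho * x1, sd)
        if t >= burn_in:
            samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_gaussian(rho=0.8)
# Estimate E[x1 * x2] (equal to rho here) by averaging over the samples.
est_rho = sum(a * b for a, b in samples) / len(samples)
```

The same loop structure applies to a general graphical model: each hidden variable is resampled from its conditional given its Markov blanket (directed case) or graph separators (undirected case).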

2 MCMC (Markov Chain Monte Carlo)

State: x_t, the state of the entire graph at time t of the Markov chain.

Transition matrix: T(x, x'). The chain is homogeneous if T is independent of t.

Invariant distribution: p*(x) = Σ_{x'} T(x', x) p*(x'). (A trivial example is the identity transition matrix, under which you stay in the same state no matter the distribution, so every distribution is invariant.)

Detailed balance: p*(x) T(x, x') = p*(x') T(x', x). Detailed balance is a sufficient (but not necessary) condition for invariance. In this context the term "detailed", which comes from statistical physics, means "local".

Proof that detailed balance implies invariance:

Σ_{x'} p*(x') T(x', x) = Σ_{x'} p*(x) T(x, x') = p*(x) Σ_{x'} T(x, x') = p*(x).

Ergodicity: (1) there is a non-zero probability of getting from any state to any other state, and (2) there are no (deterministic) cycles. Refer to the book for a more rigorous definition. Ergodicity implies that the chain has a unique, invariant equilibrium distribution.

Book recommendation: Norris, James R. Markov Chains, Cambridge University Press, 1998. (Neither too elementary nor too advanced.)

2.1 Metropolis-Hastings (M-H)

From the current state x, propose a move to x' according to the proposal distribution q_t(x, x'), and accept the proposal with probability

A_t(x, x') = min(1, [p(x') q_t(x', x)] / [p(x) q_t(x, x')]).

M-H loves to go uphill, and is willing to go downhill: uphill steps are accepted with probability 1, while downhill steps are accepted with probability A_t(x, x') < 1. Evaluating the ratio is easy: in general the normalizer terms for p(x) and p(x') cancel, and q_t is a simple distribution (e.g., a Gaussian), so it is easy to compute by definition.
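The M-H update can be sketched in a few lines. The target below is a hypothetical unnormalized N(3, 1) density (chosen only so correctness is easy to check), and a symmetric Gaussian random walk serves as q_t; all names are illustrative, not from the notes.

```python
import math
import random

random.seed(1)

def unnorm_p(x):
    # Target known only up to its normalizer: proportional to N(3, 1).
    return math.exp(-0.5 * (x - 3.0) ** 2)

def metropolis(num_samples=20000, step=1.0):
    x, chain = 0.0, []
    for _ in range(num_samples):
        # Symmetric Gaussian random-walk proposal, so the q_t terms cancel.
        x_prop = random.gauss(x, step)
        # Acceptance probability: the normalizers of p cancel in the ratio.
        a = min(1.0, unnorm_p(x_prop) / unnorm_p(x))
        if random.random() < a:  # uphill moves (a = 1) are always accepted
            x = x_prop
        chain.append(x)
    return chain

chain = metropolis()
post_mean = sum(chain[5000:]) / len(chain[5000:])  # discard burn-in
```

With a symmetric proposal this is exactly the Metropolis special case discussed below; a non-symmetric q_t would require keeping the q_t(x', x) / q_t(x, x') factor in the acceptance ratio.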

Figure 2: Sketch of a Metropolis-Hastings proposal.

If the proposal is symmetric, the q_t terms cancel. This special case is the Metropolis algorithm: Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087-1092.

Note: for the Ising model with a proposal q that flips one bit, M-H reduces to Gibbs sampling.

M-H satisfies detailed balance. Proof:

p(x) q_t(x, x') A_t(x, x') = min(p(x) q_t(x, x'), p(x') q_t(x', x))
                           = min(p(x') q_t(x', x), p(x) q_t(x, x'))
                           = p(x') q_t(x', x) A_t(x', x).

Ergodicity rests on a reasonable choice of proposal distribution (e.g., a Gaussian), which ensures that every state can be reached from every other state with non-zero probability.

2.2 Gibbs as M-H

Fix x_{-i} and consider updating node x_i (that is, x' agrees with x on x_{-i} and may differ at x_i), using the proposal distribution q(x, x') = p(x'_i | x_{-i}). By Bayes' theorem:

[p(x') q(x', x)] / [p(x) q(x, x')]
  = [p(x'_i | x_{-i}) p(x_{-i}) p(x_i | x_{-i})] / [p(x_i | x_{-i}) p(x_{-i}) p(x'_i | x_{-i})]
  = 1,

i.e., the proposal is always accepted.

Book recommendations:

Robert, Christian P. and George Casella. Monte Carlo Statistical Methods, Springer-Verlag, 2004.

Liu, Jun S. Monte Carlo Strategies in Scientific Computing, Springer-Verlag, 2001.

Gilks, W. R. et al. Markov Chain Monte Carlo in Practice, Chapman & Hall, 1995.

2.3 Particle Filtering

Figure 3: Graphical model (a state-space model) on which particle filtering is often used.

The interest is in p(x_t | y^{(t)}), where y^{(t)} = (y_1, ..., y_t); i.e., filtering. Consider the expectation of some function f(x_t):

E[f(x_t) | y^{(t)}] = ∫ f(x_t) p(x_t | y^{(t)}) dx_t
  = ∫ f(x_t) p(x_t | y_t, y^{(t-1)}) dx_t
  = [∫ f(x_t) p(y_t | x_t) p(x_t | y^{(t-1)}) dx_t] / [∫ p(y_t | x_t) p(x_t | y^{(t-1)}) dx_t]
  ≈ Σ_{m=1}^{M} w_t^{(m)} f(x_t^{(m)}),

where the {x_t^{(m)}} are samples drawn from the one-step predictive distribution p(x_t | y^{(t-1)}), and the normalized importance weights are

w_t^{(m)} = p(y_t | x_t^{(m)}) / Σ_{m'} p(y_t | x_t^{(m')}).

Figure 4: Graphical illustration of particle filter updates. This schematic shows the posterior represented as a mixture model at time t; M samples are drawn from this distribution, and p(y_{t+1} | x_{t+1}^{(m)}) is used to determine the new weights w_{t+1}^{(m)}.

The predictive distribution itself propagates forward as:

p(x_{t+1} | y^{(t)}) = ∫ p(x_{t+1} | x_t, y^{(t)}) p(x_t | y^{(t)}) dx_t
  = ∫ p(x_{t+1} | x_t) p(x_t | y_t, y^{(t-1)}) dx_t
  = [∫ p(x_{t+1} | x_t) p(y_t | x_t) p(x_t | y^{(t-1)}) dx_t] / [∫ p(y_t | x_t) p(x_t | y^{(t-1)}) dx_t]
  ≈ Σ_{m=1}^{M} w_t^{(m)} p(x_{t+1} | x_t^{(m)}).
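These updates can be sketched as a bootstrap particle filter for a toy linear-Gaussian state-space model, x_{t+1} = 0.9 x_t + N(0, 1) with y_t = x_t + N(0, 0.5^2). The model, function names, and parameters are illustrative assumptions, not from the notes; the loop follows the propagate / weight / resample pattern described above.

```python
import math
import random

random.seed(2)

def bootstrap_particle_filter(ys, num_particles=2000):
    """Bootstrap particle filter for the toy model
         x_t = 0.9 * x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5**2).
    Returns the filtered mean E[x_t | y^(t)] at each step."""
    particles = [random.gauss(0.0, 1.0) for _ in range(num_particles)]
    means = []
    for y in ys:
        # Propagate: sample from p(x_t | x_{t-1}) to get draws from p(x_t | y^(t-1)).
        particles = [random.gauss(0.9 * x, 1.0) for x in particles]
        # Weight: w^(m) proportional to the likelihood p(y_t | x_t^(m)).
        weights = [math.exp(-0.5 * ((y - x) / 0.5) ** 2) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: draw M particles from the weighted mixture (Figure 4).
        particles = random.choices(particles, weights=weights, k=num_particles)
    return means

# Simulate observations from the same model, then run the filter.
x, ys = 0.0, []
for _ in range(20):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    ys.append(x + random.gauss(0.0, 0.5))
means = bootstrap_particle_filter(ys)
```

Resampling at every step keeps the particle set from degenerating to a few high-weight particles; in practice one often resamples only when the effective sample size drops below a threshold.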