Machine Learning for Data Science (CS4786) Lecture 24

Graphical Models: Approximate Inference
Course Webpage: http://www.cs.cornell.edu/courses/cs4786/2016sp/

BELIEF PROPAGATION OR MESSAGE PASSING
Each node passes messages to its parents and children.
Guaranteed to work on a tree.
To get the marginal of node X_v (given evidence): combine the messages from Parents(X_v) to X_v with the conditional P(X_v | Parents(X_v)).
For general graphs, belief propagation need not work, and inference can be computationally hard.
Can we perform inference approximately?

WHAT IS APPROXIMATE INFERENCE?
Obtain a P̂(X_v | Observation) that is close to P(X_v | Observation).
Additive approximation: |P̂(X_v | Observation) − P(X_v | Observation)| ≤ ε
Multiplicative approximation: (1 − ε) P(X_v | Observation) ≤ P̂(X_v | Observation) ≤ (1 + ε) P(X_v | Observation)

APPROXIMATE INFERENCE
Two approaches:
1. Inference via sampling: generate instances from the model, compute empirical marginals.
2. Use exact inference, but on a close-enough simplified model.

INFERENCE VIA SAMPLING
Law of large numbers: the empirical distribution over a large sample approximates the true distribution.
Some approaches:
Rejection sampling: sample all the variables; retain only the samples that match the evidence.
Importance sampling: sample from a different distribution, then apply a correction when computing the empirical marginals.
Gibbs sampling: iteratively sample from distributions that get closer and closer to the true one.

REJECTION SAMPLING
Example (network over X_1, X_2, X_3, X_4):
|P̂(X_v = 1) − P(X_v = 1)| ≈ 1 / √(# of samples)

REJECTION SAMPLING
Algorithm:
Topologically sort the variables (parents first, children later).
For t = 1 to n:
    For i = 1 to N:
        Sample x_i^t ~ P(X_i | X_1 = x_1^t, ..., X_{i−1} = x_{i−1}^t)
    End For
End For
Throw away the samples x^t that do not match the observations.
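The algorithm above can be sketched on a toy two-variable network X_1 → X_2. The CPT numbers here are made up for illustration (they are not from the lecture); the estimate is compared against the exact Bayes-rule answer.

```python
import random

# Hypothetical CPTs for a two-node net X1 -> X2:
# P(X1 = 1) = 0.3, and P(X2 = 1 | X1) depends on X1's value.
P_X1 = 0.3
P_X2_GIVEN_X1 = {0: 0.2, 1: 0.9}

def rejection_sample(n, evidence_x2=1, seed=0):
    """Estimate P(X1 = 1 | X2 = evidence_x2) by rejection sampling."""
    rng = random.Random(seed)
    kept, hits = 0, 0
    for _ in range(n):
        # Sample in topological order: parent first, then child.
        x1 = 1 if rng.random() < P_X1 else 0
        x2 = 1 if rng.random() < P_X2_GIVEN_X1[x1] else 0
        if x2 != evidence_x2:
            continue  # throw away samples that do not match the observation
        kept += 1
        hits += x1
    return hits / kept if kept else float("nan")

# Exact answer by Bayes' rule: 0.3*0.9 / (0.3*0.9 + 0.7*0.2) = 0.27/0.41
est = rejection_sample(100_000)
```

Note that roughly 59% of the samples are rejected here; the rarer the evidence, the more samples are wasted, which motivates likelihood weighting on the next slides.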

REJECTION SAMPLING
Example (network over X_1, X_2, X_3, X_4):
What about |P̂(X_v = 1 | observation) − P(X_v = 1 | observation)|?

IMPORTANCE SAMPLING
Example: likelihood weighting (network over X_1, X_2).

IMPORTANCE SAMPLING
Likelihood weighting:
Topologically sort the variables (parents first, children later).
For t = 1 to n:
    Set w_t = 1
    For i = 1 to N:
        If X_i is observed, set w_t ← w_t · P(X_i = x_i | X_1 = x_1^t, ..., X_{i−1} = x_{i−1}^t)
        Else, sample x_i^t ~ P(X_i | X_1 = x_1^t, ..., X_{i−1} = x_{i−1}^t)
    End For
End For
To compute P(Variable | Observation), set
P̂(Variable = value | Observation) = ( Σ_{t=1}^n w_t · 1{Variable = value in sample t} ) / ( Σ_{t=1}^n w_t )
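A minimal sketch of likelihood weighting on the same kind of toy network X_1 → X_2 (hypothetical CPTs, not from the lecture): X_2 is observed, so it is never sampled; each sample is instead weighted by the probability of the evidence.

```python
import random

# Hypothetical CPTs: P(X1 = 1) = 0.3, P(X2 = 1 | X1 = 0) = 0.2, P(X2 = 1 | X1 = 1) = 0.9.
P_X1 = 0.3
P_X2_GIVEN_X1 = {0: 0.2, 1: 0.9}

def likelihood_weighting(n, evidence_x2=1, seed=0):
    """Estimate P(X1 = 1 | X2 = evidence_x2) by likelihood weighting."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x1 = 1 if rng.random() < P_X1 else 0       # X1 is unobserved: sample it
        p = P_X2_GIVEN_X1[x1]
        w = p if evidence_x2 == 1 else 1.0 - p     # X2 is observed: weight, don't sample
        num += w * (x1 == 1)                       # weighted indicator 1{X1 = 1}
        den += w
    return num / den

# Exact answer, as for rejection sampling: 0.27 / 0.41
est = likelihood_weighting(100_000)
```

Unlike rejection sampling, no sample is ever thrown away; low-probability evidence shows up as small weights rather than as wasted draws.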

IMPORTANCE SAMPLING
More generally, importance sampling is given by:
Draw x^1, ..., x^n ~ Q.
Notice that E_{X~P}[f(X)] = E_{X~Q}[ (P(X)/Q(X)) f(X) ].
Hence, P̂(X = x) = (1/n) Σ_{t=1}^n (P(X = x)/Q(X = x)) · 1{x^t = x}.
Idea: draw samples from Q, but re-weight them.

GIBBS SAMPLING
Fix each observed variable X_v to its observed value (x_v^1 = x_v).
Randomly initialize every other variable X_u by sampling x_u^1.
For t = 2 to n:
    For i = 1 to N:
        If X_i is observed, set x_i^t = x_i^{t−1}
        Else, sample x_i^t ~ P(X_i | X_1 = x_1^t, ..., X_{i−1} = x_{i−1}^t, X_{i+1} = x_{i+1}^{t−1}, ..., X_N = x_N^{t−1})
    End For
End For
Take (x_1^n, ..., x_N^n) as one sample, and repeat.
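A sketch of the Gibbs sweep for two binary variables whose joint is given as an unnormalized table (the table values are hypothetical). Each step resamples one variable from its conditional given the current value of the other; early iterations are discarded as burn-in.

```python
import random

# Hypothetical unnormalized joint W(x1, x2) over two binary variables.
W = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}

def cond_p1(x2):
    """P(X1 = 1 | X2 = x2), read directly off the joint table."""
    return W[(1, x2)] / (W[(0, x2)] + W[(1, x2)])

def cond_p2(x1):
    """P(X2 = 1 | X1 = x1)."""
    return W[(x1, 1)] / (W[(x1, 0)] + W[(x1, 1)])

def gibbs_marginal_x1(n, burn_in=1000, seed=0):
    """Estimate P(X1 = 1) by sweeping the two conditionals."""
    rng = random.Random(seed)
    x1, x2 = 0, 0                                       # arbitrary initialization
    hits = 0
    for t in range(n + burn_in):
        x1 = 1 if rng.random() < cond_p1(x2) else 0     # resample X1 given current X2
        x2 = 1 if rng.random() < cond_p2(x1) else 0     # resample X2 given new X1
        if t >= burn_in:
            hits += x1
    return hits / n

# Exact marginal from the table: P(X1 = 1) = (3 + 4) / (1 + 2 + 3 + 4) = 0.7
est = gibbs_marginal_x1(100_000)
```

In a Bayes net one never builds this joint table; each conditional P(X_i | rest) depends only on X_i's Markov blanket, which is what makes the sweep cheap.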

MCMC SAMPLING IN GENERAL
Gibbs sampling belongs to a class of methods called Markov chain Monte Carlo (MCMC) methods.
We start by sampling from some simple distribution.
We set up a Markov chain whose stationary distribution is the target distribution.
That is, based on the previous sample (state) we transition to the next state, then to the next, and so on.
If the transition probabilities are set up right, then after enough transitions our sample looks like one from the target distribution.
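As a minimal illustration of "a Markov chain whose stationary distribution is the target", here is a random-walk Metropolis sampler (another standard MCMC method, named here for illustration; the slide does not cover it) over a made-up four-state target. After many transitions, the visit frequencies approach the normalized target.

```python
import random

# Hypothetical unnormalized target over states {0, 1, 2, 3}.
WEIGHTS = [1.0, 2.0, 4.0, 3.0]

def metropolis(n, seed=0):
    """Random-walk Metropolis on a 4-state ring: propose a neighboring state,
    accept it with probability min(1, target(proposal) / target(current))."""
    rng = random.Random(seed)
    x = 0
    counts = [0, 0, 0, 0]
    for _ in range(n):
        prop = (x + rng.choice([-1, 1])) % 4            # symmetric proposal on a ring
        if rng.random() < min(1.0, WEIGHTS[prop] / WEIGHTS[x]):
            x = prop                                     # accept: move to the new state
        counts[x] += 1                                   # a rejection repeats the old state
    return [c / n for c in counts]

# Stationary distribution = WEIGHTS / sum(WEIGHTS) = [0.1, 0.2, 0.4, 0.3]
freqs = metropolis(200_000)
```

Only the ratio of target weights is ever needed, so the normalizing constant of the target distribution never has to be computed; this is exactly why MCMC is useful for posteriors in graphical models.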