CS 6347 Lecture 4: Markov Random Fields


CS 6347 Lecture 4 Markov Random Fields

Recap

Announcements: the first homework is available on eLearning. Reminder: office hours are Tuesday from 10am to 11am.
Last time: Bayesian networks.
Today: Markov random fields.

D-separation

Let I(p) be the set of all independence relationships in the joint distribution p, and let I(G) be the set of all independence relationships implied by the graph G. We say that G is an I-map for I(p) if I(G) ⊆ I(p).

Theorem: the joint probability distribution p factorizes with respect to the DAG G = (V, E) if and only if G is an I-map for I(p).

An I-map is perfect if I(G) = I(p). It is not always possible to perfectly represent all of the independence relations of a distribution with a graph.

Limits of Bayesian Networks

Not all sets of independence relations can be captured by a Bayesian network. For example, consider A ⊥ C | {B, D} and B ⊥ D | {A, C}. Are there DAGs that represent exactly these independence relationships? [Candidate DAGs over A, B, C, D shown on slide.]

I-Maps

[DAG over A, B, C, D shown on slide.] What independence relations does this model imply?

I-Maps

For the DAG on the previous slide, I(G) = ∅, so it is an I-map for any joint distribution on four variables!

Naïve Bayes

[Graph with Y as the parent of X_1, X_2, …, X_n shown on slide.]

p(y, x_1, …, x_n) = p(y) p(x_1 | y) ⋯ p(x_n | y)

In practice, we often have variables that we observe directly and those that can only be observed indirectly.

Naïve Bayes

p(y, x_1, …, x_n) = p(y) p(x_1 | y) ⋯ p(x_n | y)

This model assumes that X_1, …, X_n are independent given Y; this is sometimes called the naïve Bayes assumption.

Example: Naïve Bayes

Let Y be a binary random variable indicating whether or not an email is spam. For each word in the dictionary, create a binary random variable X_i indicating whether or not word i appears in the email. For simplicity, we assume that X_1, …, X_n are independent given Y. How do we compute the probability that an email is spam?
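A minimal sketch of this computation, assuming the prior p(y) and the conditionals p(x_i | y) have already been estimated; the word list and probability values below are made up for illustration:

```python
import math

# Hypothetical model parameters (illustrative only).
p_spam = 0.3                                   # prior p(Y = spam)
p_word_given_spam = {"free": 0.6, "meeting": 0.1, "winner": 0.4}
p_word_given_ham = {"free": 0.05, "meeting": 0.3, "winner": 0.01}

def spam_probability(words_present):
    """Compute p(Y = spam | x) using the naive Bayes factorization.

    p(y | x) is proportional to p(y) * prod_i p(x_i | y), where x_i indicates
    whether word i appears in the email.  Log probabilities avoid underflow.
    """
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for word in p_word_given_spam:
        present = word in words_present
        ps = p_word_given_spam[word]
        ph = p_word_given_ham[word]
        log_spam += math.log(ps if present else 1 - ps)
        log_ham += math.log(ph if present else 1 - ph)
    # Normalize over the two values of Y.
    m = max(log_spam, log_ham)
    num = math.exp(log_spam - m)
    den = num + math.exp(log_ham - m)
    return num / den

print(spam_probability({"free", "winner"}))    # high spam probability
print(spam_probability({"meeting"}))           # low spam probability
```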

Hidden Markov Models

[Chain graph Y_1 → Y_2 → ⋯ → Y_T with an observation X_t attached to each Y_t shown on slide.]

p(x_1, …, x_T, y_1, …, y_T) = p(y_1) p(x_1 | y_1) ∏_{t=2}^{T} p(y_t | y_{t-1}) p(x_t | y_t)

Used in coding, speech recognition, etc. Independence assertions?
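As a sketch, the joint probability of a particular state/observation sequence under this factorization can be computed directly; the initial, transition, and emission distributions below are placeholders:

```python
import numpy as np

# Hypothetical HMM with 2 hidden states and 2 observation symbols.
p_init = np.array([0.6, 0.4])                  # p(y_1)
p_trans = np.array([[0.7, 0.3],                # p(y_t | y_{t-1})
                    [0.2, 0.8]])
p_emit = np.array([[0.9, 0.1],                 # p(x_t | y_t)
                   [0.3, 0.7]])

def hmm_joint(y, x):
    """p(x_1..T, y_1..T) = p(y_1) p(x_1|y_1) * prod_{t>=2} p(y_t|y_{t-1}) p(x_t|y_t)."""
    prob = p_init[y[0]] * p_emit[y[0], x[0]]
    for t in range(1, len(y)):
        prob *= p_trans[y[t - 1], y[t]] * p_emit[y[t], x[t]]
    return prob

print(hmm_joint(y=[0, 0, 1], x=[0, 1, 1]))
```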

Markov Random Fields (MRFs)

A Markov random field is an undirected graphical model: an undirected graph G = (V, E) with one node for each random variable, and a potential function or "factor" associated with each clique c in a set of cliques C of the graph. The nonnegative potential functions represent interactions; they need not correspond to conditional probabilities and may not even sum to one.

Markov Random Fields (MRFs)

A Markov random field corresponds to a factorization of the joint distribution:

p(x_1, …, x_n) = (1/Z) ∏_{c∈C} ψ_c(x_c),   where   Z = Σ_{x_1,…,x_n} ∏_{c∈C} ψ_c(x_c)

Markov Random Fields (MRFs)

p(x_1, …, x_n) = (1/Z) ∏_{c∈C} ψ_c(x_c),   Z = Σ_{x_1,…,x_n} ∏_{c∈C} ψ_c(x_c)

The normalizing constant Z is often called the partition function.
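A minimal sketch of this factorization on a toy graph, computing Z by brute-force enumeration; the graph and potential values below are made up, and this only works for a handful of binary variables:

```python
from itertools import product

# Hypothetical pairwise MRF on a triangle: the cliques are the three edges.
edges = [(0, 1), (1, 2), (0, 2)]

def psi(xi, xj):
    """Pairwise potential: favors agreement between neighboring variables."""
    return 2.0 if xi == xj else 1.0

def unnormalized(x):
    """Product of clique potentials for a full assignment x."""
    p = 1.0
    for i, j in edges:
        p *= psi(x[i], x[j])
    return p

# Partition function: sum the unnormalized measure over all assignments.
Z = sum(unnormalized(x) for x in product([0, 1], repeat=3))

def prob(x):
    return unnormalized(x) / Z

print(Z)                 # 2*8 + 6*2 = 28
print(prob((0, 0, 0)))   # 8/28
```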

Independence Assertions

[Chain graph A — B — C shown on slide.]

p(x_A, x_B, x_C) = (1/Z) ψ_AB(x_A, x_B) ψ_BC(x_B, x_C)

How does separation imply independence? We showed that A ⊥ C | B on the board last lecture.

Independence Assertions

If X ⊆ V is graph separated from Y ⊆ V by Z ⊆ V (i.e., all paths from X to Y go through Z), then X ⊥ Y | Z.

What independence assertions follow from this MRF? [Graph over A, B, C, D shown on slide.]
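A quick sketch of checking graph separation: remove the nodes in Z and test whether any node of X can still reach a node of Y. The example graph below is a made-up 4-cycle on A, B, C, D, not necessarily the one on the slide:

```python
from collections import deque

def separated(adj, X, Y, Z):
    """Return True if Z separates X from Y in the undirected graph `adj`.

    Equivalent check: delete the nodes in Z, then no path may remain
    from any node in X to any node in Y.
    """
    blocked = set(Z)
    frontier = deque(x for x in X if x not in blocked)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in Y:
            return False            # found a path that avoids Z
        for v in adj[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                frontier.append(v)
    return True

# Hypothetical 4-cycle A - B - C - D - A.
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(separated(adj, {"A"}, {"C"}, {"B", "D"}))   # True:  A is cut off from C
print(separated(adj, {"A"}, {"C"}, {"B"}))        # False: path A - D - C remains
```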

Independence Assertions

Each variable is independent of all of its non-neighbors given its neighbors: all paths leaving a single variable must pass through some neighbor.

If the joint probability distribution p factorizes with respect to the graph G, then G is an I-map for p. Conversely, if G is an I-map of a strictly positive distribution p, then p factorizes with respect to G (Hammersley-Clifford theorem).

MRF Examples

Given a graph G = (V, E), express the following as probability distributions that factorize over G: the uniform distribution over independent sets, and the uniform distribution over vertex covers (done on the board).
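One way the independent-set case works out, sketched in code: put a potential on every edge that is 1 unless both endpoints are selected, so the unnormalized measure is 1 on independent sets and 0 otherwise. The toy graph below is made up:

```python
from itertools import product

# Hypothetical graph: a path on 4 vertices, 0 - 1 - 2 - 3.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]

def psi_edge(xi, xj):
    """Edge potential: forbid selecting both endpoints of an edge."""
    return 0.0 if (xi == 1 and xj == 1) else 1.0

def unnormalized(x):
    """1 if x encodes an independent set, 0 otherwise."""
    p = 1.0
    for i, j in edges:
        p *= psi_edge(x[i], x[j])
    return p

Z = sum(unnormalized(x) for x in product([0, 1], repeat=n))
print(Z)   # 8: the path on 4 vertices has 8 independent sets, each with probability 1/Z
```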

BNs vs. MRFs

Property            | Bayesian Networks                     | Markov Random Fields
--------------------|---------------------------------------|----------------------------------
Factorization       | Conditional distributions             | Potential functions
Distribution        | Product of conditional distributions  | Normalized product of potentials
Directed cycles     | Not allowed                           | Allowed
Partition function  | 1                                     | Potentially NP-hard to compute
Independence test   | d-separation                          | Graph separation

Moralization

Every Bayesian network can be converted into an MRF, with some possible loss of independence information: remove the direction of all arrows in the network, and if A and B are parents of C in the Bayesian network, add an edge between A and B in the MRF. This procedure is called "moralization" because it "marries" the parents of every node.

[Example Bayesian network over A, B, C, D and its moralized MRF shown on slides.]
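A small sketch of the moralization procedure on an adjacency-list representation; the example DAG below is made up and is not necessarily the one pictured on the slide:

```python
from itertools import combinations

def moralize(parents):
    """Moralize a DAG given as {node: list of parents}.

    Returns an undirected graph as a set of frozenset edges: every directed
    edge is made undirected, and all parents of each node are connected.
    """
    edges = set()
    for child, ps in parents.items():
        # Drop edge directions.
        for p in ps:
            edges.add(frozenset((p, child)))
        # "Marry" the parents: connect every pair of parents of this child.
        for p1, p2 in combinations(ps, 2):
            edges.add(frozenset((p1, p2)))
    return edges

# Hypothetical DAG: A -> C, B -> C, C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(moralize(dag))
# Expected edges: {A,C}, {B,C}, {C,D}, plus the moralizing edge {A,B}.
```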

Moralization

[Two graphs over A, B, C shown on slide: a Bayesian network and its moralization.] What independence information is lost?

Factorizations

Many factorizations over the same graph may represent the same joint distribution. Some are better than others (e.g., they represent the distribution more compactly). Simply looking at the graph is not enough to understand which specific factorization is being assumed. [Example graph over A, B, C, D shown on slide.]

Factor Graphs

Factor graphs are used to explicitly represent a given factorization over a given graph. They are not a different model, but rather a different way to visualize an MRF: an undirected bipartite graph with two types of nodes, variable nodes (circles) and factor nodes (squares). Each factor node is connected to the variable nodes on which it depends.

Factor Graphs

p(x_A, x_B, x_C) = (1/Z) ψ_AB(x_A, x_B) ψ_BC(x_B, x_C) ψ_AC(x_A, x_C)

[Factor graph shown on slide: variable nodes A, B, C and factor nodes ψ_AB, ψ_BC, ψ_AC, each connected to the two variables it depends on.]

MRF Examples

Given a graph G = (V, E), express the uniform distribution over matchings (i.e., subsets of edges such that no two edges in the set have a common endpoint) as a factor graph (done on the board).
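One standard construction, sketched here under the assumption that it matches what was done on the board: introduce a binary variable per edge and a factor per vertex that forbids selecting two incident edges. The toy graph is made up:

```python
from itertools import product

# Hypothetical graph: a triangle with vertices 0, 1, 2.
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]

def vertex_factor(v, x):
    """Factor for vertex v: 1 if at most one incident edge is selected."""
    incident = sum(x[i] for i, e in enumerate(edges) if v in e)
    return 1.0 if incident <= 1 else 0.0

def unnormalized(x):
    """Product over vertex factors; 1 exactly when x encodes a matching."""
    p = 1.0
    for v in vertices:
        p *= vertex_factor(v, x)
    return p

Z = sum(unnormalized(x) for x in product([0, 1], repeat=len(edges)))
print(Z)   # 4 matchings of a triangle: the empty set and each single edge
```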

Conditional Random Fields (CRFs)

Undirected graphical models that represent conditional probability distributions p(Y | X). The potentials can depend on both X and Y:

p(y | x) = (1/Z(x)) ∏_{c∈C} ψ_c(x_c, y_c),   Z(x) = Σ_y ∏_{c∈C} ψ_c(x_c, y_c)

Log-Linear Models

CRFs often assume that the potentials are log-linear functions:

ψ_c(x_c, y_c) = exp(w · f_c(x_c, y_c))

Here f_c is referred to as a feature vector and w is a vector of feature weights, which are typically learned from data. CRFs don't require us to model the full joint distribution (which may not be possible anyhow).
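A minimal sketch of a log-linear potential for a pairwise clique in an image-segmentation-style model; the feature choices and weights are illustrative, not from the lecture:

```python
import numpy as np

# Hypothetical feature weights; in a real CRF these are learned from data.
w = np.array([1.5, -2.0])

def features(x_i, x_j, y_i, y_j):
    """Feature vector f_c for a pairwise clique.

    Feature 0: do the two labels agree?
    Feature 1: label disagreement weighted by how similar the observations are.
    """
    agree = 1.0 if y_i == y_j else 0.0
    similarity = np.exp(-abs(x_i - x_j))   # x values are observed pixel intensities
    return np.array([agree, (1.0 - agree) * similarity])

def psi(x_i, x_j, y_i, y_j):
    """Log-linear potential: psi_c = exp(w . f_c(x_c, y_c))."""
    return np.exp(w @ features(x_i, x_j, y_i, y_j))

# Similar pixels strongly prefer matching labels...
print(psi(0.50, 0.52, +1, +1), psi(0.50, 0.52, +1, -1))
# ...while dissimilar pixels are penalized less for disagreeing.
print(psi(0.10, 0.90, +1, -1))
```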

Conditional Random Fields (CRFs)

Binary image segmentation: label the pixels of an image as belonging to the foreground or the background, with +/- corresponding to foreground/background. The interaction between neighboring pixels depends on how similar the pixels are: similar pixels should prefer having the same spin (i.e., being in the same part of the image).

Conditional Random Fields (CRFs)

Binary image segmentation can be modeled as a CRF where the image information (e.g., pixel colors) is observed, but the segmentation is unobserved. Because the model is conditional, we don't need to describe the joint probability distribution of (natural) images and their foreground/background segmentations. CRFs will be particularly important when we want to learn graphical models from observed data.

Low-Density Parity-Check Codes

We want to send a message across a noisy channel in which bits can be flipped with some probability, so we use error correcting codes.

[Factor graph shown on slide: parity check factors ψ_A, ψ_B, ψ_C connected to codeword bits y_1, …, y_6, and factors φ_1, …, φ_6 connecting each y_i to the corresponding observed bit x_i.]

ψ_A, ψ_B, ψ_C are parity check constraints: they equal one if their input contains an even number of ones and zero otherwise. φ_i(x_i, y_i) = p(y_i | x_i), the probability that the ith bit was flipped during transmission.

Low-Density Parity-Check Codes

The parity check constraints enforce that the y's can only be one of a few possible codewords: 000000, 001011, 010101, 011110, 100110, 101101, 110011, 111000. Decoding the message that was sent is equivalent to computing the most likely codeword under the joint probability distribution.

Low-Density Parity-Check Codes

The most likely codeword is given by MAP inference: arg max_y p(y | x). Do we need to compute the partition function for MAP inference?
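As a sketch, MAP decoding on this small code can be done by brute force. The parity check structure below is one assumed choice that reproduces the eight codewords listed above, and the bit-flip probability is made up; note that the argmax never requires the partition function, since Z(x) does not depend on y:

```python
from itertools import product

# One parity-check assignment consistent with the codewords listed above
# (assumed for illustration): each check covers three of the six bits.
checks = [(0, 1, 3), (0, 2, 4), (1, 2, 5)]
flip_prob = 0.1                     # assumed channel bit-flip probability

def is_codeword(y):
    """All parity checks must contain an even number of ones."""
    return all(sum(y[i] for i in chk) % 2 == 0 for chk in checks)

def unnormalized_posterior(y, x):
    """prod_i p(y_i | x_i) times the parity check indicators (up to Z(x))."""
    if not is_codeword(y):
        return 0.0
    p = 1.0
    for yi, xi in zip(y, x):
        p *= (1 - flip_prob) if yi == xi else flip_prob
    return p

def map_decode(x):
    # Brute-force argmax over y; the partition function Z(x) is never needed
    # because it is the same for every y.
    return max(product([0, 1], repeat=6),
               key=lambda y: unnormalized_posterior(y, x))

received = (1, 0, 1, 1, 0, 1)          # 101101 received with no errors
print(map_decode(received))            # -> (1, 0, 1, 1, 0, 1)
print(map_decode((1, 0, 1, 1, 0, 0)))  # one flipped bit; decodes back to 101101
```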