ECE 6504: Advanced Topics in Machine Learning
Probabilistic Graphical Models and Large-Scale Learning

Topics:
- Markov Random Fields: Representation
- Conditional Random Fields
- Log-Linear Models

Readings: KF 4.1-4.3; Barber 4.1-4.2

Dhruv Batra, Virginia Tech

Administrivia
- No class next week (Tue, Thu)
- Project proposal due: Mar 5, 11:59pm; <= 2 pages, NIPS format
- HW2 out later today; due: Mar 12, 11:59pm
  - Implementation: variable elimination in BNs

Recap of Last Time

Markov Nets
- A set of random variables
- An undirected graph that encodes independence assumptions
- Unnormalized factor tables
- Joint distribution: a product of factors
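
To make the recap concrete, here is a minimal sketch (not code from the lecture) of a Markov net over three binary variables: a list of unnormalized factor tables, and the joint as their product. The variable indices and the 10-vs-1 "agreement" values are made up for illustration; later examples below reuse this sketch.

```python
import numpy as np

# Each factor is (scope, table): scope is a tuple of variable indices,
# table[values of the scoped variables] is a non-negative number.
factors = [
    ((0, 1), np.array([[10.0, 1.0], [1.0, 10.0]])),  # phi(X0, X1) favors agreement
    ((1, 2), np.array([[10.0, 1.0], [1.0, 10.0]])),  # phi(X1, X2) favors agreement
]

def unnormalized_p(x, factors):
    """Product of all factor values at the full assignment x."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(x[i] for i in scope)]
    return p

print(unnormalized_p((0, 0, 0), factors))  # 100.0: both factors agree
print(unnormalized_p((0, 1, 0), factors))  # 1.0: both factors disagree
```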

Pairwise MRFs
- Pairwise factors: each factor is a function of 2 variables
- Often unary terms are also allowed (although strictly speaking unnecessary, since a unary term can be absorbed into an adjacent pairwise factor)
- [Worked on board]

Pairwise MRF: Example

Computing probabilities in Markov networks vs. BNs
- In a BN, you can compute the probability of an instantiation by multiplying CPTs
- In a Markov network, you can only compute ratios of probabilities directly (the unknown normalization constant cancels in a ratio)
(Slide credit: Carlos Guestrin)

Normalization for computing probabilities
- To compute actual probabilities, you must compute the normalization constant Z (also called the partition function)
- Computing the partition function is hard! It requires summing the product of factors over all possible assignments: Z = Σ_x ∏_i φ_i(d_i)
(Slide credit: Carlos Guestrin)
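
Continuing the sketch above, a brute-force partition function makes the difficulty tangible: the sum ranges over all 2^n assignments, so this only runs for tiny models.

```python
import itertools

def partition_function(n_vars, factors):
    """Z = sum over every joint assignment of the product of factors.
    Exponential in n_vars -- exactly why computing Z is hard in general."""
    return sum(unnormalized_p(x, factors)
               for x in itertools.product((0, 1), repeat=n_vars))

Z = partition_function(3, factors)
print(unnormalized_p((0, 0, 0), factors) / Z)  # now an actual probability
```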

Nearest-Neighbor Grids in Low-Level Vision
- Image denoising
- Stereo
- Optical flow
- Shape from shading
- Super-resolution
- Segmentation
[Figure: grid-structured MRF; each unobserved (hidden) variable is tied to a local observation]
(Slide credit: Erik Sudderth)

General Gibbs Distribution
- Arbitrary factors, not just pairwise ones
- Induced MRF graph: connect every pair of variables that appear together in some factor

Factorization in Markov networks
Given an undirected graph H over variables X = {X_1, ..., X_n}, a distribution P factorizes over H if there exist:
- subsets of variables D_1 ⊆ X, ..., D_m ⊆ X such that each D_i is fully connected in H
- non-negative potentials (or factors) φ_1(D_1), ..., φ_m(D_m), also known as clique potentials
such that P(X_1, ..., X_n) = (1/Z) φ_1(D_1) · φ_2(D_2) · ... · φ_m(D_m)
P is then also called a Markov random field over H, or a Gibbs distribution over H
(Slide credit: Carlos Guestrin)

MRFs
- Given a graph H, are the factors unique? (No: many different sets of factors induce the same graph, and even the same distribution can be split across factors in different ways)

Active Trails and Separation
- A path X_1 - ... - X_k is active, given a set of observed variables Z, if none of the X_i ∈ {X_1, ..., X_k} are observed (i.e., none are in Z)
- Variables X are separated from Y given Z in the graph if there is no active path between any X ∈ X and any Y ∈ Y given Z
[Figure: 3x3 grid of variables T_1, ..., T_9]
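
Separation is easy to check mechanically: remove the observed nodes and see whether X can still reach Y. A minimal sketch (the graph encoding and names are my own, not the lecture's):

```python
from collections import deque

def separated(graph, X, Y, Z):
    """True iff every path from X to Y passes through an observed node in Z.
    graph: dict mapping each node to its set of neighbors."""
    frontier, seen = deque(X - Z), set(X - Z)
    while frontier:
        u = frontier.popleft()
        if u in Y:
            return False                  # found an active, Z-avoiding path
        for v in graph[u] - Z - seen:
            seen.add(v)
            frontier.append(v)
    return True

# The 3x3 grid T1..T9 from the figure, 4-connected:
grid = {f"T{i}": set() for i in range(1, 10)}
for i in range(1, 10):
    r, c = divmod(i - 1, 3)
    if c + 1 < 3:                         # right neighbor
        grid[f"T{i}"].add(f"T{i + 1}"); grid[f"T{i + 1}"].add(f"T{i}")
    if r + 1 < 3:                         # down neighbor
        grid[f"T{i}"].add(f"T{i + 3}"); grid[f"T{i + 3}"].add(f"T{i}")

print(separated(grid, {"T1"}, {"T9"}, {"T2", "T4"}))  # True: Z cuts off T1
```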

Markov Networks Representation: Theorem 1
- If a joint distribution P factorizes over H (i.e., P can be written as a normalized product of factors over fully connected subsets of H), then H is an I-map for P
- In other words: if you can write the distribution as a normalized product of factors, then you can read independencies from the graph
(Slide credit: Carlos Guestrin)

What about the other direction for Markov networks?
- If H is an I-map for P, does P factorize over H? Not necessarily.
- Counter-example: X_1, ..., X_4 are binary variables on a four-cycle, and only eight of the sixteen assignments have positive probability (uniform over those eight)
- This distribution satisfies the independencies read off the cycle, e.g. X_1 ⊥ X_3 | X_2, X_4 (check conditionals such as P(X_1 = 0 | X_2 = 0, X_4 = 0))
- But the distribution doesn't factorize over the graph!
(Slide credit: Carlos Guestrin)

Representation Theorem for Markov Networks (Hammersley-Clifford theorem)
- If a joint distribution P factorizes over H, then H is an I-map for P
- Conversely, if H is an I-map for P and P is a positive distribution (P(x) > 0 for every assignment x), then P factorizes over H
(Slide credit: Carlos Guestrin)
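
For reference, a compact statement of the theorem in the notation of the factorization slide (a standard paraphrase, not the slide's own wording):

```latex
% Hammersley-Clifford: for a positive distribution P and an undirected
% graph H, H is an I-map for P if and only if P is a Gibbs distribution
% over H, i.e. it factorizes into clique potentials:
P(X_1,\dots,X_n) \;=\; \frac{1}{Z}\prod_{i=1}^{m} \varphi_i(D_i),
\qquad
Z \;=\; \sum_{x_1,\dots,x_n} \prod_{i=1}^{m} \varphi_i(D_i),
% where each D_i is a fully connected subset (clique) of H.
```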

Markov Blanket
[Figure: the Markov blanket of variable x_8]
- In an undirected model, the Markov blanket of a variable is simply its set of neighbors
- (Contrast with a Bayes net, where the Markov blanket is the parents, children, and parents of children)
(Slide credit: Simon J.D. Prince)

Independence Assumptions in MNs
- Separation defines global independencies
- Pairwise Markov independence: any pair of non-adjacent variables A, B is independent given all other variables
- Markov blanket: a variable A is independent of the rest given its neighbors
[Figure: 3x3 grid of variables T_1, ..., T_9]
(Slide credit: Carlos Guestrin)

P-map
- Perfect map: G is a P-map for P if I(P) = I(G)
- Question: does every distribution P have a P-map? (Answer: no)

Structure in cliques
- Possible potentials for this graph (a triangle over A, B, C): pairwise potentials φ(A,B), φ(B,C), φ(A,C), or a single potential φ(A,B,C) over the whole clique
- The graph alone does not reveal which parameterization generated it

Factor graphs
- A bipartite graph:
  - variable nodes (ovals) for X_1, ..., X_n
  - factor nodes (squares) for φ_1, ..., φ_m
  - an edge X_i - φ_j if X_i ∈ Scope[φ_j]
- Makes the factor dependency structure explicit
- Very useful for approximate inference
[Figure: factor graph over A, B, C]
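
The bipartite structure falls straight out of the factor scopes. Continuing the running sketch (with its hypothetical `factors` list):

```python
def factor_graph_edges(factors):
    """One edge (variable, factor) for every variable in every factor's scope."""
    return [(f"X{i}", f"phi{j}")
            for j, (scope, _) in enumerate(factors)
            for i in scope]

print(factor_graph_edges(factors))
# [('X0', 'phi0'), ('X1', 'phi0'), ('X1', 'phi1'), ('X2', 'phi1')]
```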

Types of Graphical Models
[Figure: the same model drawn three ways: directed graph, factor graph, undirected graph]
(Slide credit: Erik Sudderth)

Factor Graphs Show Fine-grained Factorization
p(x) = (1/Z) ∏_{f ∈ F} ψ_f(x_f)
(Slide credit: Erik Sudderth)

Plan for Today
- Undirected graphical models: representation
  - Conditional Random Fields
  - Log-linear models
- Undirected graphical models: inference
  - Variable elimination

Conditional Random Fields
- What's the difference between Naïve Bayes and logistic regression? (The generative vs. discriminative distinction: CRFs relate to MRFs the way logistic regression relates to Naïve Bayes)

Nearest-Neighbor Grids in Low-Level Vision
(Repeated from the recap: image denoising, stereo, optical flow, shape from shading, super-resolution, segmentation; hidden grid variables tied to local observations)


Logarithmic representation
- Standard model: P(X) = (1/Z) ∏_i φ_i(D_i)
- Log representation of a potential (assuming a positive potential): φ_i(D_i) = exp(-ε_i(D_i)), where ε_i(D_i) = -ln φ_i(D_i) is also called the energy function
- Log representation of the Markov net: P(X) = (1/Z) exp(-Σ_i ε_i(D_i))
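
In the log domain the product of potentials becomes a sum of (negative) energies. A quick check against the running sketch (assumes the earlier `factors` and `unnormalized_p`):

```python
import numpy as np

def to_energy(table):
    """Energy eps = -log(phi), well defined because the potential is positive."""
    return -np.log(table)

def unnormalized_logp(x, factors):
    """Negative sum of energies == log of the product of potentials."""
    return -sum(to_energy(table)[tuple(x[i] for i in scope)]
                for scope, table in factors)

x = (0, 0, 0)
print(np.isclose(np.exp(unnormalized_logp(x, factors)),
                 unnormalized_p(x, factors)))  # True
```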

Log-linear Markov network (most common representation)
- A feature (or sufficient statistic) is some function f[D] of a subset of variables D, e.g., an indicator function
- A log-linear model over a Markov network H consists of:
  - a set of features f_1[D_1], ..., f_k[D_k], where each D_i is a subset of a clique in H (two f's can be over the same variables)
  - a set of weights w_1, ..., w_k, usually learned from data
- The distribution is then P(X) = (1/Z) exp(Σ_i w_i f_i[D_i])
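
A toy rendering of the log-linear parameterization; the feature functions and weights below are hypothetical, chosen so the model reproduces the 10-vs-1 tables from the first sketch:

```python
import math

def loglinear_unnormalized_p(x, feats, weights):
    """exp(sum_i w_i * f_i(x restricted to D_i)).
    Each feature is (scope, fn); fn maps the scoped values to a number."""
    score = sum(w * fn(tuple(x[i] for i in scope))
                for (scope, fn), w in zip(feats, weights))
    return math.exp(score)

# Indicator features: "the two variables on this edge agree".
feats = [((0, 1), lambda v: float(v[0] == v[1])),
         ((1, 2), lambda v: float(v[0] == v[1]))]
weights = [math.log(10.0), math.log(10.0)]

print(loglinear_unnormalized_p((0, 0, 0), feats, weights))  # 100.0, as before
print(loglinear_unnormalized_p((0, 1, 0), feats, weights))  # 1.0
```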

CRFs
[Figure: example from Felzenszwalb & Huttenlocher, IJCV 2004]

CRFs
- Node features: f_i(Y_i)
- Edge features: f_ij(Y_i, Y_j)
[Figure: neighboring nodes Y_i, Y_j with node features f_i(Y_i), f_j(Y_j) and an edge feature f_ij(Y_i, Y_j)]
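
To show where the "conditional" in CRF enters, here is a minimal chain-CRF sketch: node and edge features may look at the observations x, but the partition function normalizes over label sequences y only. All names and feature choices are my own illustration, not the lecture's model.

```python
import itertools
import math

def crf_score(y, x, node_f, edge_f, w_node, w_edge):
    """Unnormalized log-score: weighted node features f_i(y_i, x)
    plus weighted edge features f_ij(y_i, y_{i+1}, x) along the chain."""
    s = sum(w_node * node_f(i, y[i], x) for i in range(len(y)))
    s += sum(w_edge * edge_f(y[i], y[i + 1], x) for i in range(len(y) - 1))
    return s

def conditional_p(y, x, **kw):
    """P(y | x): Z(x) sums over label sequences y only, never over x --
    the defining trait of a CRF vs. a generative model of (y, x)."""
    Z_x = sum(math.exp(crf_score(yy, x, **kw))
              for yy in itertools.product((0, 1), repeat=len(y)))
    return math.exp(crf_score(y, x, **kw)) / Z_x

node_f = lambda i, yi, x: float(yi == x[i])   # label matches noisy observation
edge_f = lambda yi, yj, x: float(yi == yj)    # smoothness between neighbors
print(conditional_p((1, 1, 0), (1, 1, 0),
                    node_f=node_f, edge_f=edge_f, w_node=2.0, w_edge=1.0))
```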

Node Feature: Color
[Figure: superpixel feature extraction, step 1. Per-superpixel color features: R, G, B means; H, S, V means; a 5-dim histogram on H; a 3-dim histogram on S]
(Hoiem, Efros, Hebert, IJCV 2007)

Node Feature: Color Clustering
[Figure: the color feature space is clustered into a dictionary of N clusters (µ_1, Σ_1), (µ_2, Σ_2), ..., (µ_N, Σ_N)]
- K-means / X-means (Pelleg & Moore; Auton Lab implementation)

Node Feature: Color
[Figure: superpixel feature extraction, step 2. The step-1 color features of superpixel i are mapped to cluster posteriors Pr(Cluster_1 | f_i), ..., Pr(Cluster_N | f_i)]
(Hoiem, Efros, Hebert, IJCV 2007)

Conditional Random Fields

Summary of types of Markov nets
- Pairwise Markov networks: very common; potentials over nodes and edges
- General MRFs: arbitrary potentials over cliques
- Factor graphs: explicit representation of factors, so you know exactly what factors you have; very useful for approximate inference
- Log-linear models: log representation of potentials; linear coefficients learned from data; most common for learning MNs