Probabilistic Graphical Models
Lecture 10: Undirected Models
CS/CNS/EE 155, Andreas Krause

Announcements
Homework 2 due this Wednesday (Nov 4) in class.
Project milestones due next Monday (Nov 9); about half the work should be done. 4 pages of writeup, NIPS format: http://nips.cc/paperinformation/stylefiles

Markov Networks (a.k.a. Markov Random Fields, Gibbs Distributions)
A Markov Network consists of
an undirected graph, where each node represents a random variable (RV), and
a collection of factors defined over cliques in the graph.
(Figure: grid-structured graph over nodes X_1, ..., X_9.)
A distribution P factorizes over the undirected graph G if it can be written as a normalized product of factors over cliques of G:
P(X_1, ..., X_n) = (1/Z) ∏_c φ_c(X_c), where the partition function Z sums this product over all joint assignments.
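
A minimal brute-force sketch of this factorization (my own illustration, not from the slides), for a hypothetical three-node chain X_1 - X_2 - X_3 with binary variables and made-up factor tables:

```python
import itertools

# Hypothetical pairwise factors for a chain X1 - X2 - X3 (binary variables).
# phi[(i, j)][(xi, xj)] is the factor value for edge (Xi, Xj).
phi = {
    (1, 2): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    (2, 3): {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0},
}

def unnormalized(x1, x2, x3):
    """Product of the clique factors for one joint assignment."""
    return phi[(1, 2)][(x1, x2)] * phi[(2, 3)][(x2, x3)]

# Partition function Z: sum the unnormalized product over all assignments.
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=3))

# Normalized joint probability of one assignment.
p_000 = unnormalized(0, 0, 0) / Z
print(f"Z = {Z}, P(X1=0, X2=0, X3=0) = {p_000:.4f}")
```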

Computing Joint Probabilities
Computing joint probabilities in BNs: multiply the relevant CPD entries, one per variable.
Computing joint probabilities in Markov Nets: multiply the clique factors, then normalize by the partition function Z.
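
The two computations the slide contrasts can be written as follows (standard forms, filled in here because the slide's formulas did not survive transcription):

P_{BN}(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i))

i.e., a product of locally normalized CPD entries, so a single joint probability is cheap to evaluate, whereas

P_{MN}(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c} \phi_c(x_c), \qquad Z = \sum_{x'} \prod_{c} \phi_c(x'_c)

requires the partition function Z, which sums over all joint assignments.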

Local Markov Assumption for MNs
(Figure: grid of nodes X_1, ..., X_9.)
The Markov blanket MB(X) of a node X is the set of neighbors of X.
Local Markov Assumption: X ⊥ EverythingElse | MB(X).
I_loc(G) = set of all local independencies.
G is called an I-map of distribution P if I_loc(G) ⊆ I(P).

Factorization Theorem for Markov Nets
(Figure: example graph over nodes s_1, ..., s_12.)
If the true distribution P can be represented exactly as a Markov net (G, P), then I_loc(G) ⊆ I(P), i.e., G is an I-map of P (independence map).

Factorization Theorem for Markov Nets: Hammersley-Clifford Theorem
(Figure: example graph over nodes s_1, ..., s_12.)
Conversely, if I_loc(G) ⊆ I(P), i.e., G is an I-map of P (independence map), and P > 0, then the true distribution P can be represented exactly as a Markov net (G, P).

Global independencies
(Figure: grid of nodes X_1, ..., X_9.)
A trail X - X_1 - ... - X_m - Y is called active for evidence E if none of X_1, ..., X_m is in E.
Variables X and Y are called separated by E if there is no active trail for E connecting X and Y. Write sep(X, Y | E).
I(G) = {X ⊥ Y | E : sep(X, Y | E)}
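
A minimal sketch (mine, not from the lecture) of checking sep(X, Y | E): X and Y are separated by E exactly when Y is unreachable from X once the evidence nodes are removed from the graph.

```python
from collections import deque

def separated(adj, x, y, evidence):
    """True if x and y are separated by the evidence set in the
    undirected graph given as an adjacency dict {node: [neighbors]}."""
    evidence = set(evidence)
    visited, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr in evidence or nbr in visited:
                continue  # evidence nodes block every trail through them
            if nbr == y:
                return False  # found an active trail connecting x and y
            visited.add(nbr)
            queue.append(nbr)
    return True

# Chain X1 - X2 - X3: X1 and X3 are separated by {X2}, but not by the empty set.
adj = {1: [2], 2: [1, 3], 3: [2]}
print(separated(adj, 1, 3, {2}))    # True
print(separated(adj, 1, 3, set()))  # False
```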

Soundness of separation
Know: for positive distributions P > 0, I_loc(G) ⊆ I(P) if and only if P factorizes over G.
Theorem (soundness of separation): for positive distributions P > 0, if I_loc(G) ⊆ I(P), then I(G) ⊆ I(P).
Hence, separation captures only true independencies.
How about I(G) = I(P)?

Completeness of separation
Theorem (completeness of separation): I(G) = I(P) for almost all distributions P that factorize over G.
"Almost all": except for a set of potential parameterizations of measure 0 (assuming no finite set has positive measure).

Minimal I-maps
For BNs: the minimal I-map is not unique.
(Figure: three different Bayes nets over the variables E, B, A, J, M, each a minimal I-map for the same distribution.)
For MNs: for positive P, the minimal I-map is unique!

Do P-maps always exist?
For BNs: no, P-maps do not always exist.
How about Markov nets?

Exact inference in MNs
Variable elimination and junction tree inference work exactly the same way!
Need to construct junction trees by obtaining a chordal graph through triangulation.

Pairwise MNs
A pairwise MN is an MN where all factors are defined over single variables or pairs of variables.
Can reduce any MN to a pairwise MN!
(Figure: example graph over nodes X_1, ..., X_5.)

Logarithmic representation
Can represent any positive distribution in the log domain.
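
Spelled out (standard form; the slide's equation was lost in transcription): since P > 0, every factor is strictly positive and can be written as an exponential,

\phi_c(x_c) = \exp\big(\epsilon_c(x_c)\big), \qquad P(x) = \frac{1}{Z} \exp\Big(\sum_{c} \epsilon_c(x_c)\Big),

so the product of factors becomes a sum of terms in the log domain.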

Log-linear models
Feature functions φ_i(D_i) defined over cliques.
Log-linear model over undirected graph G:
feature functions φ_1(D_1), ..., φ_k(D_k); domains D_i can overlap;
set of weights w_i learnt from data.
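
Putting these pieces together, the log-linear model over G has the standard form (reconstructed here since the slide's formula is missing):

P(x \mid w) = \frac{1}{Z(w)} \exp\Big(\sum_{i=1}^{k} w_i \, \phi_i(x_{D_i})\Big), \qquad Z(w) = \sum_{x'} \exp\Big(\sum_{i=1}^{k} w_i \, \phi_i(x'_{D_i})\Big).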

Converting BNs to MNs
(Figure: student Bayes net over variables C, D, I, G, S, L, H, J and its moralized graph.)
Theorem: the moralized Bayes net is a minimal Markov I-map.

Converting MNs to BNs
(Figure: example Markov net over nodes X_1, ..., X_9.)
Theorem: a minimal Bayes I-map for an MN must be chordal.

So far
Markov network representation: local/global Markov assumptions; separation; soundness and completeness of separation.
Markov network inference: variable elimination and junction tree inference work exactly as in Bayes nets.
How about learning Markov nets?

Parameter Learning for Bayes Nets
MLE decomposes over the network: each CPT P(X_i | Pa(X_i)) can be estimated separately from counts in the data.

Algorithm for BN MLE
Estimate each conditional probability by the relative frequency of the child value given each parent assignment.
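
The algorithm itself did not survive transcription; the following is a minimal sketch of the standard counting-based MLE for a single CPT P(X | Pa(X)), using made-up variable names.

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by relative frequencies.
    data: list of dicts mapping variable name -> value."""
    joint = Counter()   # counts of (parent assignment, child value)
    parent = Counter()  # counts of parent assignment alone
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        parent[pa] += 1
    return {key: cnt / parent[key[0]] for key, cnt in joint.items()}

# Hypothetical dataset over binary variables A (parent of B).
data = [{"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1}, {"A": 0, "B": 0}]
print(mle_cpt(data, child="B", parents=["A"]))
# -> {((1,), 1): 0.667, ((1,), 0): 0.333, ((0,), 0): 1.0}  (approximately)
```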

MLE for Markov Nets
Log-likelihood of the data.
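
Written out (standard form, reconstructed), for data D = {x^(1), ..., x^(N)} and a Markov net with factors φ_c parameterized by θ:

\ell(\theta; \mathcal{D}) = \sum_{j=1}^{N} \sum_{c} \log \phi_c\big(x_c^{(j)}; \theta\big) \;-\; N \log Z(\theta).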

Log-likelihood doesn't decompose
The log-likelihood l(θ; D) is a concave function!
But the log partition function log Z(θ) doesn't decompose.

Derivative of log-likelihood

Derivative of log-likelihood (cont'd)
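
In the log-linear parameterization, the derivative these two slides work out takes the standard "empirical counts minus expected counts" form:

\frac{\partial \ell}{\partial w_i} = \sum_{j=1}^{N} \phi_i\big(x^{(j)}\big) - N \, \mathbb{E}_{P(\cdot \mid w)}\big[\phi_i(X)\big] = N \Big( \hat{\mathbb{E}}[\phi_i] - \mathbb{E}_{P(\cdot \mid w)}[\phi_i] \Big).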

Computing the derivative
(Figure: student network over variables C, D, I, G, S, L, H, J.)
Computing P(c_i | θ) requires inference!
Can optimize using conjugate gradient, etc.

Alternative approach: Iterative Proportional Fitting (IPF)
At the optimum, the model's clique marginals must match the empirical clique marginals.
Solve the resulting fixed-point equation by iteration; the marginals (and hence parameters) must be recomputed every iteration.
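
The fixed-point condition and update can be written as follows (standard IPF, reconstructed): at the optimum, each clique marginal of the model matches the empirical marginal,

P_\theta(x_c) = \hat{P}(x_c) \quad \text{for every clique } c,

and IPF repeatedly rescales one clique potential at a time,

\phi_c^{(t+1)}(x_c) = \phi_c^{(t)}(x_c) \, \frac{\hat{P}(x_c)}{P_{\theta^{(t)}}(x_c)},

where computing P_{\theta^{(t)}}(x_c) requires inference in the current model.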

Parameter learning for log-linear models
Feature functions φ_i(C_i) defined over cliques; domains C_i can overlap.
Log-linear model over undirected graph G with feature functions φ_1(C_1), ..., φ_k(C_k).
Joint distribution: P(x | w) = (1/Z(w)) exp(Σ_i w_i φ_i(C_i)).
How do we get the weights w_i?

Derivative of Log-likelihood 1

Derivative of Log-likelihood 2

Optimizing parameters
Gradient of the log-likelihood: empirical feature expectations minus model feature expectations.
Thus, w is the MLE exactly when the model's expected feature counts match the empirical feature counts.
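
A minimal sketch (mine, not from the course) of this gradient-ascent MLE for a tiny log-linear model; the expectations are computed by brute-force enumeration so no inference machinery is needed, and the features and data are made up.

```python
import itertools
import math

# Hypothetical log-linear model over three binary variables with two features.
features = [
    lambda x: float(x[0] == x[1]),  # phi_1: X1 and X2 agree
    lambda x: float(x[1] == x[2]),  # phi_2: X2 and X3 agree
]
data = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 0, 1)]

def model_expectations(w):
    """E_{P(.|w)}[phi_i], by enumeration (only feasible for tiny models)."""
    states = list(itertools.product([0, 1], repeat=3))
    scores = [math.exp(sum(wi * f(x) for wi, f in zip(w, features))) for x in states]
    Z = sum(scores)
    return [sum(s * f(x) for s, x in zip(scores, states)) / Z for f in features]

def fit(w, lr=0.5, iters=200):
    emp = [sum(f(x) for x in data) / len(data) for f in features]  # empirical E[phi_i]
    for _ in range(iters):
        mod = model_expectations(w)
        # Gradient of the average log-likelihood: empirical minus model expectations.
        w = [wi + lr * (e - m) for wi, e, m in zip(w, emp, mod)]
    return w

print(fit([0.0, 0.0]))  # converges to roughly [log 3, 0] = [1.10, 0.0]
```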

Regularization of parameters
Put a prior on the parameters w (e.g., a Gaussian prior) and do MAP estimation.
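
With a zero-mean Gaussian prior on w (the usual choice), MAP estimation simply adds an L2 penalty to the log-likelihood:

\max_{w} \; \ell(w; \mathcal{D}) - \frac{1}{2\sigma^2} \sum_{i} w_i^2.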

Summary: parameter learning in MNs
MLE in BNs is easy (the score decomposes).
MLE in MNs requires inference (the score doesn't decompose).
Can optimize using gradient ascent or IPF.

Tasks
Read Koller & Friedman, Chapters 20.1-20.3 and 4.6.1.