Directed and Undirected Graphical Models

Directed and Undirected Graphical Models
Davide Bacciu
Dipartimento di Informatica, Università di Pisa
bacciu@di.unipi.it
Machine Learning: Neural Networks and Advanced Models (AA2)

Directed Graphical Models (Bayesian Networks)
A directed acyclic graph (DAG) G = (V, E):
- Nodes v ∈ V represent random variables (shaded if observed, empty if unobserved)
- Edges e ∈ E describe the conditional independence relationships
- Conditional Probability Tables (CPTs), local to each node, describe the probability distribution of that node given its parents
P(Y_1, ..., Y_N) = ∏_{i=1}^N P(Y_i | pa(Y_i))

Plate Notation
A compact representation of replication in graphical models:
- Boxes (plates) denote replication, repeated the number of times indicated by the letter in the corner
- Shaded nodes are observed variables
- Empty nodes denote unobserved (latent) variables
- Black seeds (optional) identify model parameters

Graphical Models
A graph whose nodes (vertices) are random variables and whose edges (links) represent probabilistic relationships between the variables. Different classes of graphs:
- Directed models: directed edges express causal relationships
- Undirected models: undirected edges express soft constraints
- Dynamic models: the structure changes to reflect dynamic processes

Directed Models - Local Markov Property
A variable Y_v is independent of its non-descendants given its parents and only its parents, i.e. Y_v ⊥ Y_{nd(v)} | Y_{pa(v)}, where nd(v) denotes the non-descendants of v.
- Party and Study are marginally independent: Party ⊥ Study
- However, the local Markov property does not support Party ⊥ Study | Headache
- But Party and Tabs are independent given Headache: Tabs ⊥ Party | Headache

Joint Probability Factorization
An application of the chain rule and the local Markov property:
1. Pick a topological ordering of the nodes
2. Apply the chain rule following that order
3. Use the conditional independence assumptions
P(PA, S, H, T, C) = P(PA) P(S | PA) P(H | S, PA) P(T | H, S, PA) P(C | T, H, S, PA)
                  = P(PA) P(S) P(H | S, PA) P(T | H) P(C | H)

Sampling from a Bayesian Network
A BN describes a generative process for observations:
1. Pick a topological ordering of the nodes
2. Generate data by sampling from the local conditional probabilities following this order
To generate the i-th sample of the variables PA, S, H, T, C:
1. pa_i ~ P(PA)
2. s_i ~ P(S)
3. h_i ~ P(H | S = s_i, PA = pa_i)
4. t_i ~ P(T | H = h_i)
5. c_i ~ P(C | H = h_i)
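A minimal Python sketch of this ancestral sampling scheme, assuming binary variables and illustrative CPT values (the probabilities below are made up, not taken from the lecture):

```python
import random

# Hypothetical CPTs for binary variables; the numbers are purely illustrative.
P_PA = 0.3                                    # P(PA = 1)
P_S  = 0.6                                    # P(S = 1)
P_H  = {(0, 0): 0.1, (0, 1): 0.3,             # P(H = 1 | S, PA), keyed by (s, pa)
        (1, 0): 0.5, (1, 1): 0.8}
P_T  = {0: 0.05, 1: 0.7}                      # P(T = 1 | H)
P_C  = {0: 0.01, 1: 0.2}                      # P(C = 1 | H)

def bernoulli(p):
    return 1 if random.random() < p else 0

def sample_one():
    """Ancestral sampling: follow the topological order PA, S, H, T, C."""
    pa = bernoulli(P_PA)
    s  = bernoulli(P_S)
    h  = bernoulli(P_H[(s, pa)])
    t  = bernoulli(P_T[h])
    c  = bernoulli(P_C[h])
    return {"PA": pa, "S": s, "H": h, "T": t, "C": c}

def joint_probability(x):
    """P(PA, S, H, T, C) from the factorization P(PA) P(S) P(H|S,PA) P(T|H) P(C|H)."""
    def bern(p, v):
        return p if v == 1 else 1 - p
    return (bern(P_PA, x["PA"]) * bern(P_S, x["S"]) *
            bern(P_H[(x["S"], x["PA"])], x["H"]) *
            bern(P_T[x["H"]], x["T"]) * bern(P_C[x["H"]], x["C"]))

sample = sample_one()
print(sample, joint_probability(sample))
```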

Basic Structures of a Bayesian Network
There are 3 basic substructures that determine the conditional independence relationships in a Bayesian network:
- Tail-to-tail (common cause): Y_1 ← Y_2 → Y_3
- Head-to-tail (causal effect): Y_1 → Y_2 → Y_3
- Head-to-head (common effect): Y_1 → Y_2 ← Y_3

Tail-to-Tail Connections (Y_1 ← Y_2 → Y_3)
Corresponds to P(Y_1, Y_3 | Y_2) = P(Y_1 | Y_2) P(Y_3 | Y_2)
- If Y_2 is unobserved, then Y_1 and Y_3 are marginally dependent
- If Y_2 is observed, then Y_1 and Y_3 are conditionally independent: Y_1 ⊥ Y_3 | Y_2
- When observed, Y_2 is said to block the path from Y_1 to Y_3
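The blocking behaviour follows from a one-line manipulation (a standard derivation, added here for completeness):
P(Y_1, Y_3 | Y_2) = P(Y_1, Y_2, Y_3) / P(Y_2) = P(Y_2) P(Y_1 | Y_2) P(Y_3 | Y_2) / P(Y_2) = P(Y_1 | Y_2) P(Y_3 | Y_2)
whereas marginally P(Y_1, Y_3) = ∑_{Y_2} P(Y_2) P(Y_1 | Y_2) P(Y_3 | Y_2), which in general does not factorize as P(Y_1) P(Y_3).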

Head-to-Tail Connections (Y_1 → Y_2 → Y_3)
Corresponds to P(Y_1, Y_3 | Y_2) = P(Y_1) P(Y_2 | Y_1) P(Y_3 | Y_2) / P(Y_2) = P(Y_1 | Y_2) P(Y_3 | Y_2)
- If Y_2 is unobserved, then Y_1 and Y_3 are marginally dependent
- If Y_2 is observed, then Y_1 and Y_3 are conditionally independent: Y_1 ⊥ Y_3 | Y_2
- An observed Y_2 blocks the path from Y_1 to Y_3

Head-to-Head Connections (Y_1 → Y_2 ← Y_3)
Corresponds to P(Y_1, Y_2, Y_3) = P(Y_1) P(Y_3) P(Y_2 | Y_1, Y_3)
- If Y_2 is observed, then Y_1 and Y_3 are conditionally dependent (explaining away)
- If Y_2 is unobserved, then Y_1 and Y_3 are marginally independent: Y_1 ⊥ Y_3
- If any descendant of Y_2 is observed, it unblocks the path
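Marginal independence in the unobserved case can be verified by summing out Y_2 (again a standard check):
P(Y_1, Y_3) = ∑_{y_2} P(Y_1) P(Y_3) P(Y_2 = y_2 | Y_1, Y_3) = P(Y_1) P(Y_3) ∑_{y_2} P(Y_2 = y_2 | Y_1, Y_3) = P(Y_1) P(Y_3)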

Derived Conditional Independence Relationships
A Bayesian Network represents the local relationships encoded by the 3 basic structures plus the derived relationships.
Consider the chain Y_1 → Y_2 → Y_3 → Y_4:
- Local Markov relationships: Y_1 ⊥ Y_3 | Y_2 and Y_4 ⊥ Y_1, Y_2 | Y_3
- Derived relationship: Y_1 ⊥ Y_4 | Y_2

d-separation
Definition (d-separation). Let r = Y_1, ..., Y_2 be an undirected path between Y_1 and Y_2; then r is d-separated (blocked) by Z if there exists at least one node Y_c on r at which the path is blocked. In other words, d-separation holds if at least one of the following holds:
- r contains a head-to-tail structure Y_i → Y_c → Y_j (or Y_i ← Y_c ← Y_j) and Y_c ∈ Z
- r contains a tail-to-tail structure Y_i ← Y_c → Y_j and Y_c ∈ Z
- r contains a head-to-head structure Y_i → Y_c ← Y_j and neither Y_c nor its descendants are in Z

Markov Blanket and d-separation
Definition (Node d-separation). Two nodes Y_i and Y_j in a BN G are said to be d-separated by Z ⊆ V (denoted Dsep_G(Y_i, Y_j | Z)) if and only if all undirected paths between Y_i and Y_j are d-separated by Z.
Definition (Markov Blanket). The Markov blanket Mb(Y) is the minimal set of nodes which d-separates a node Y from all other nodes (i.e. it makes Y conditionally independent of all other nodes in the BN):
Mb(Y) = pa(Y) ∪ ch(Y) ∪ pa(ch(Y))
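A small sketch of how the Markov blanket can be read off a DAG stored as a child-to-parents map (the graph and function names below are illustrative, reusing the Party/Study example):

```python
# DAG as a map from each node to the set of its parents (illustrative structure)
parents = {
    "PA": set(), "S": set(),
    "H": {"PA", "S"},
    "T": {"H"}, "C": {"H"},
}

def children(node):
    return {v for v, ps in parents.items() if node in ps}

def markov_blanket(node):
    """Mb(Y) = parents of Y, children of Y, and the other parents of Y's children."""
    mb = set(parents[node]) | children(node)
    for child in children(node):
        mb |= parents[child]
    mb.discard(node)
    return mb

print(markov_blanket("H"))   # {'PA', 'S', 'T', 'C'}
print(markov_blanket("PA"))  # {'H', 'S'}: S is a co-parent of PA's child H
```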

Are Directed Models Enough?
Bayesian Networks are used to model asymmetric dependencies (e.g. causal). What if we want to model symmetric dependencies (bidirectional effects, e.g. spatial dependencies)? We need undirected approaches: directed models cannot represent some (bidirectional) dependencies in the distributions.
- What if we want to represent Y_1 ⊥ Y_3 | Y_2, Y_4?
- What if we also want Y_2 ⊥ Y_4 | Y_1, Y_3?
Both cannot hold simultaneously in a BN, while they are captured by the 4-cycle Y_1 - Y_2 - Y_3 - Y_4 in an undirected model.

Markov Random Fields (a.k.a. Markov Networks)
An undirected graph G = (V, E):
- Nodes v ∈ V represent random variables X_v (shaded if observed, empty if unobserved)
- Edges e ∈ E describe bidirectional dependencies between variables (constraints)
- Often arranged in a structure that is coherent with the data/constraints we want to model

Image Processing
MRFs are often used in image processing to impose spatial constraints (e.g. smoothness). Image de-noising example: a lattice Markov network (Ising model) where
- Y_i is the observed value of the noisy pixel
- X_i is the unknown (unobserved) noise-free pixel value
More expressive structures can be used, but the complexity of inference and learning can become significant.
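A compact sketch of the idea, using an Ising-style energy minimized by greedy ICM updates; the energy coefficients, the 4-neighbour lattice, and the update rule are illustrative choices rather than the lecture's exact formulation:

```python
import numpy as np

def ising_denoise_icm(Y, beta=1.0, eta=2.0, n_sweeps=5):
    """Greedy ICM denoising of a +/-1 image Y.

    Energy: E(X, Y) = -beta * sum_{i~j} x_i x_j - eta * sum_i x_i y_i
    (pairwise smoothness potentials on the 4-neighbour lattice plus a data term).
    """
    X = Y.copy()
    H, W = Y.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of the 4-neighbourhood values of pixel (i, j)
                nb = sum(X[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # Choose the value of x_ij with lower local energy
                X[i, j] = 1 if beta * nb + eta * Y[i, j] >= 0 else -1
    return X

# Toy usage: a noisy version of an all-ones image
rng = np.random.default_rng(0)
clean = np.ones((8, 8), dtype=int)
noisy = np.where(rng.random((8, 8)) < 0.15, -clean, clean)
print(ising_denoise_icm(noisy))
```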

Conditional Independence
What is the undirected equivalent of d-separation in directed models? Again it is based on node separation, although it is much simpler: node subsets A, B ⊆ V are conditionally independent given C ⊆ V \ {A, B} if all paths between nodes in A and nodes in B pass through at least one node in C. The Markov blanket of a node includes all and only its neighbors.
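Separation can be checked directly by removing the conditioning nodes and testing reachability; a minimal sketch (the adjacency structure below is illustrative, using the 4-cycle from the directed-vs-undirected example):

```python
from collections import deque

def separated(adj, A, B, C):
    """True if every path between A and B in the undirected graph `adj`
    passes through C, i.e. A and B are disconnected once C is removed."""
    blocked = set(C)
    frontier = deque(a for a in A if a not in blocked)
    reached = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in B:
            return False          # found a path from A to B avoiding C
        for nxt in adj[node]:
            if nxt not in blocked and nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return True

# 4-cycle Y1 - Y2 - Y3 - Y4 - Y1
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(separated(adj, {1}, {3}, {2, 4}))  # True:  Y1 independent of Y3 given Y2, Y4
print(separated(adj, {1}, {3}, {2}))     # False: removing only Y2 leaves the path 1-4-3
```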

Joint Probability Factorization
What is the undirected equivalent of the conditional probability factorization in directed models? We seek a product of functions defined over sets of nodes associated with some local property of the graph. The Markov blanket tells us that nodes that are not neighbors are conditionally independent given the remaining nodes:
P(X_v, X_i | X_{V \ {v,i}}) = P(X_v | X_{V \ {v,i}}) P(X_i | X_{V \ {v,i}})
The factorization should therefore be chosen so that non-adjacent nodes X_v and X_i never appear in the same factor. What is a well-known graph structure that contains only nodes that are pairwise connected?

Cliques
Definition (Clique). A subset of nodes C in graph G such that G contains an edge between every pair of nodes in C.
Definition (Maximal Clique). A clique C that cannot include any further node of the graph without ceasing to be a clique.
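As a concrete illustration, maximal cliques can be enumerated with networkx (assuming the library is available; the small graph below is made up):

```python
import networkx as nx

# Undirected graph: a triangle {1, 2, 3} plus a pendant edge 3 - 4
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])

# nx.find_cliques enumerates the *maximal* cliques
print(sorted(sorted(c) for c in nx.find_cliques(G)))  # [[1, 2, 3], [3, 4]]
```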

Maximal Clique Factorization
Define X = X_1, ..., X_N as the RVs associated with the N nodes of the undirected graph G:
P(X) = (1/Z) ∏_C ψ(X_C)
- X_C: RVs associated with the nodes in the maximal clique C
- ψ(X_C): potential function over the maximal clique C
- Z: partition function ensuring normalization, Z = ∑_X ∏_C ψ(X_C)
The partition function is the computational bottleneck of undirected models: e.g. O(K^N) for N discrete RVs with K distinct values.
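A brute-force sketch of the clique factorization and the partition function for a tiny binary model (the cliques and potential values are illustrative):

```python
from itertools import product

# Maximal cliques of a small chain X1 - X2 - X3 over binary variables
cliques = [("X1", "X2"), ("X2", "X3")]
variables = ["X1", "X2", "X3"]

def psi(clique, assignment):
    """Illustrative strictly positive potential: prefer equal values within a clique."""
    values = [assignment[v] for v in clique]
    return 2.0 if len(set(values)) == 1 else 1.0

def unnormalized(assignment):
    p = 1.0
    for c in cliques:
        p *= psi(c, assignment)
    return p

# Partition function: sum over all K^N joint configurations (exponential in N)
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in product([0, 1], repeat=len(variables)))

x = {"X1": 1, "X2": 1, "X3": 0}
print(unnormalized(x) / Z)   # P(X1=1, X2=1, X3=0)
```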

Potential Functions
Potential functions ψ(X_C) are not probabilities! They express which configurations of the local variables are preferred. If we restrict ourselves to strictly positive potential functions, the Hammersley-Clifford theorem provides guarantees on the distributions that can be represented by the clique factorization.
Definition (Boltzmann distribution). A convenient and widely used strictly positive representation of the potential functions is
ψ(X_C) = exp{-E(X_C)}
where E(X_C) is called the energy function.
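Substituting this representation into the clique factorization gives the usual energy-based (Gibbs/Boltzmann) form of the joint, a standard consequence rather than an extra assumption:
P(X) = (1/Z) ∏_C exp{-E(X_C)} = (1/Z) exp{-∑_C E(X_C)},   with   Z = ∑_X exp{-∑_C E(X_C)}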

Directed vs Undirected Models: Long Story Short

From Directed To Undirected
- Straightforward in some cases
- Requires a little more thought for v-structures: moralization, a.k.a. marrying of the parents (see the sketch below)
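A minimal moralization sketch under the same child-to-parents representation used above (names illustrative): connect ("marry") all parents of each node and drop edge directions.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected (moral) graph of a DAG given as {node: set_of_parents}."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        # keep every original edge, ignoring direction
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        # marry the parents: connect every pair of parents of the same child
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj

# v-structure PA -> H <- S plus H -> T: moralization adds the edge PA - S
parents = {"PA": set(), "S": set(), "H": {"PA", "S"}, "T": {"H"}}
print(moralize(parents))
```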

The BN Structure Learning Problem
(Example: a table of observations over variables Y_1, ..., Y_6 together with a candidate graph over the same variables.)
Observations are given for a set of fixed random variables, but the structure of the Bayesian Network is not specified. How do we determine which arcs exist in the network (causal relationships)? Determining causal relationships between variables entails:
- Deciding on arc presence
- Directing edges

Structure Finding Approaches
- Search and score: a model selection approach; search in the space of graphs
- Constraint-based: use tests of conditional independence to constrain the network
- Hybrid: model selection over constrained structures

Constraint-based Models
- Tests of conditional independence I(X_i, X_j | Z) determine edge presence (the network skeleton)
- Estimate the mutual information MI(X_i, X_j | Z) and assume conditional independence if it falls below a threshold, e.g. I(X_i, X_j | Z) = [MI(X_i, X_j | Z) < α_cut]
- The testing order is the fundamental choice for avoiding super-exponential complexity
- Level-wise testing: tests I(X_i, X_j | Z) are performed in order of increasing size of the conditioning set Z (PC algorithm by Spirtes, 1995); the nodes that enter Z are chosen in the neighborhood of X_i and X_j
- Markovian dependencies determine edge orientation (DAG), using deterministic rules based on the 3 basic substructures seen previously
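A rough sketch of such an independence test using a plug-in estimate of conditional mutual information from counts; the threshold value is an arbitrary illustration (in practice a calibrated statistical test, e.g. a G-test, would be used):

```python
import numpy as np
from collections import Counter

def conditional_mi(xs, ys, zs):
    """Plug-in estimate of MI(X; Y | Z) in nats from three aligned sequences
    of discrete values (zs may hold tuples when conditioning on several variables)."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    mi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        mi += (n_xyz / n) * np.log(n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return mi

def ci_test(xs, ys, zs, alpha_cut=0.01):
    """Declare X independent of Y given Z when the estimated MI falls below the cut."""
    return conditional_mi(xs, ys, zs) < alpha_cut

# Toy check: X and Y both generated from Z are (roughly) independent given Z
rng = np.random.default_rng(0)
z = rng.integers(0, 2, 2000)
x = (z + (rng.random(2000) < 0.1)) % 2
y = (z + (rng.random(2000) < 0.1)) % 2
print(ci_test(x.tolist(), y.tolist(), z.tolist()))  # expected: True
```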

PC Algorithm - Skeleton Identification
1. Initialize a fully connected graph G = (V, E)
2. For each edge (Y_i, Y_j) ∈ E: if I(Y_i, Y_j), then prune (Y_i, Y_j)
3. K ← 1
4. For each test of order K = |Z|:
   - For each edge (Y_i, Y_j) ∈ E:
     - Let Z be the set of conditioning sets of order K for Y_i, Y_j
     - If I(Y_i, Y_j | z) for any z ∈ Z, then prune (Y_i, Y_j)
   - K ← K + 1
5. Return G
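A condensed sketch of this skeleton phase, written against a generic ci_test(i, j, Z) oracle (a stand-in here; in practice the conditional-independence test sketched earlier), with conditioning sets drawn from the current adjacencies of the tested pair as in PC:

```python
from itertools import combinations

def pc_skeleton(variables, ci_test, max_order=3):
    """Level-wise skeleton search: prune edge (i, j) as soon as some conditioning
    set Z of the current order, drawn from the adjacencies of i or j, separates them."""
    adj = {v: set(variables) - {v} for v in variables}          # start fully connected
    for order in range(max_order + 1):
        for i in variables:
            for j in sorted(adj[i]):
                candidates = (adj[i] | adj[j]) - {i, j}
                for Z in combinations(sorted(candidates), order):
                    if ci_test(i, j, set(Z)):                   # I(Y_i, Y_j | Z) holds
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
    return adj

# Stand-in oracle for the chain Y1 -> Y2 -> Y3: Y1 and Y3 become independent given Y2
def oracle(i, j, Z):
    return {i, j} == {"Y1", "Y3"} and "Y2" in Z

print(pc_skeleton(["Y1", "Y2", "Y3"], oracle))
# {'Y1': {'Y2'}, 'Y2': {'Y1', 'Y3'}, 'Y3': {'Y2'}}
```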

PC Algorithm - Order 0 Tests
(Figure: the candidate graph over Y_1, ..., Y_6 and the observation table.)
Step 1: Initialize
Step 2: Check unconditional independence I(Y_i, Y_j)
Step 3: Repeat the unconditional tests for all edges

PC Algorithm - Order 1 Tests
(Figure: the partially pruned graph over Y_1, ..., Y_6 and the observation table.)
Step 4: Select an edge (Y_i, Y_j)
Step 5: Add the neighbors to the conditioning set Z
Step 6: Check independence for each z ∈ Z
Step 7: Iterate until convergence

Take Home Messages
Directed graphical models
- Represent asymmetric (causal) relationships between variables and provide a compact representation of conditional probabilities
- Difficult to assess conditional independence relationships (v-structures)
- Straightforward to incorporate prior knowledge and to interpret
Undirected graphical models
- Represent bidirectional relationships between variables (e.g. constraints)
- Factorization in terms of generic potential functions which, however, are typically not probabilities
- Easy to assess conditional independence, but difficult to interpret the encoded knowledge
- Serious computational issues associated with computing the normalization factor

Next Lecture
Inference in graphical models:
- Exact inference: inference on a chain, inference in tree-structured models, the sum-product algorithm
- Elements of approximate inference: variational algorithms, sampling-based methods