Probabilistic Graphical Models (I)


Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31

Probabilistic Graphical Models Modeling many real-world problems involves a large number of random variables. Dependencies among the variables may be exploited to reduce the size of the encoded model (cf. PCA), or they may be the goal in themselves, i.e., the aim is to understand the correlations among the variables.

Modeling the domain Discrete random variables. Take 5 random binary variables (A, B, C, D, E) and i.i.d. data from a multinomial distribution:

A  B  C   D   E
a  ~b ~c  ~d  ~e
a  b  ~c  d   ~e
a  ~b c   d   ~e

Goals (Parameter) Learning: using training data, estimate the joint distribution. What are the values of P(A, B, C, D, E)? ... and what if there were one hundred binary variables? (The full table would need 2^100 - 1, roughly 10^30, values!) Inference: given the distribution P(A, B, C, D, E): Belief updating: compute the probability of an event. What is the probability of A=a given E=e? Maximum a posteriori: compute the state of the variables that maximizes their probability. Which state of A maximizes P(A | E=e)? Is it a or ~a?

The unstructured approach To specify the joint distribution explicitly, an exponential number of values is required: 2^K - 1 independent probabilities for K binary variables. We can then compute the probability of any event by summing table entries, e.g. P(A=a) = sum over b, c, d, e of P(a, b, c, d, e), but there are exponentially many terms in these summations...
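As a concrete illustration of the cost, here is a minimal brute-force sketch with an invented joint table over the five binary variables (everything below is illustrative, not from the lecture): answering P(A=1 | E=1) means summing over all settings of the remaining variables, and the table itself already has 2^5 entries (2^100 for one hundred variables).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Full joint table over 5 binary variables (A, B, C, D, E): 2**5 = 32 entries.
# For K = 100 binary variables this table would need 2**100 entries -- hopeless.
joint = rng.random((2, 2, 2, 2, 2))
joint /= joint.sum()

# Belief updating by brute force: P(A=1 | E=1) = P(A=1, E=1) / P(E=1),
# where each term sums over all settings of the remaining variables B, C, D.
num = sum(joint[1, b, c, d, 1] for b, c, d in itertools.product([0, 1], repeat=3))
den = sum(joint[a, b, c, d, 1] for a, b, c, d in itertools.product([0, 1], repeat=4))
print("P(A=1 | E=1) =", num / den)
```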

The naïve Bayesian approach Assume the variables are independent, so the joint factorizes, e.g. p(a, b) = p(a) p(b). Application: email spam filtering.
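A minimal sketch of the idea applied to spam filtering, with invented word lists and probabilities (none of the numbers are from the lecture): under the naïve independence assumption, the class-conditional likelihood of a message is just a product of per-word terms.

```python
import numpy as np

# Toy naive Bayes spam filter: features are binary "word appears in message" indicators.
# All probabilities below are invented for illustration.
p_spam = 0.4                                   # prior P(spam)
words = ["offer", "meeting", "winner"]
p_word_given_spam = np.array([0.7, 0.1, 0.6])  # P(word present | spam)
p_word_given_ham = np.array([0.1, 0.5, 0.05])  # P(word present | ham)

def posterior_spam(x):
    """x[i] = 1 if words[i] occurs. Returns P(spam | x) under the naive Bayes assumption."""
    lik_spam = np.prod(np.where(x == 1, p_word_given_spam, 1 - p_word_given_spam))
    lik_ham = np.prod(np.where(x == 1, p_word_given_ham, 1 - p_word_given_ham))
    joint_spam = lik_spam * p_spam
    joint_ham = lik_ham * (1 - p_spam)
    return joint_spam / (joint_spam + joint_ham)

print(posterior_spam(np.array([1, 0, 1])))   # "offer" and "winner" present
print(posterior_spam(np.array([0, 1, 0])))   # only "meeting" present
```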

Bayesian Networks Consider an arbitrary joint distribution p(a, b, c) over three variables a, b, and c. By the product rule of probability: p(a, b, c) = p(c | a, b) p(a, b) = p(c | a, b) p(b | a) p(a). General case: p(x_1, x_2, ..., x_K) = p(x_K | x_1, ..., x_{K-1}) ... p(x_2 | x_1) p(x_1).

Not fully connected graph Joint distribution over seven variables when some links are absent: p(x_1, x_2, ..., x_7) = p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5).

General form For a graph with K nodes, the joint distribution is given by p(x) = prod_{k=1}^{K} p(x_k | pa_k), where pa_k denotes the set of parents of x_k, and x = {x_1, ..., x_K}.
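A minimal sketch of this factorization for five binary variables, with an invented DAG structure and conditional probability tables (not the lecture's): the probability of a complete assignment is the product of one local term per node, and the factorized model needs far fewer parameters than the full joint table.

```python
# Hypothetical CPTs for the DAG A -> B, A -> C, B -> D, C -> D, C -> E
# (the structure and numbers are illustrative only).
p_A = {1: 0.3, 0: 0.7}
p_B_given_A = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # p_B_given_A[a][b]
p_C_given_A = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.2, 0: 0.8}}
p_D_given_BC = {(b, c): {1: p, 0: 1 - p}
                for (b, c), p in {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.5, (0, 0): 0.1}.items()}
p_E_given_C = {1: {1: 0.75, 0: 0.25}, 0: {1: 0.05, 0: 0.95}}

def joint(a, b, c, d, e):
    """p(a,b,c,d,e) = p(a) p(b|a) p(c|a) p(d|b,c) p(e|c)."""
    return (p_A[a] * p_B_given_A[a][b] * p_C_given_A[a][c]
            * p_D_given_BC[(b, c)][d] * p_E_given_C[c][e])

# The factorized model needs 1 + 2 + 2 + 4 + 2 = 11 parameters
# instead of 2**5 - 1 = 31 for the unstructured joint table.
print(joint(1, 0, 1, 1, 0))
```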

Definitions A set of variables associated with the nodes of a Directed Acyclic Graph (DAG). Markov condition (w.r.t. the DAG): each variable is independent of its non-descendants given its parents. For each variable (node), local probability distributions are specified: P(A), P(B | A=a), P(B | A=~a), P(C | A=a), P(C | A=~a), P(D | b, c), P(D | b, ~c), P(D | ~b, c), P(D | ~b, ~c), P(E | c), P(E | ~c). All these values are precise.

Regression revisit: Polynomial Curve Fitting Basis functions h(x) = (h_0(x), h_1(x), ..., h_M(x))^T, weights w = (w_0, w_1, ..., w_M)^T, targets t = (t_1, t_2, ..., t_n)^T. Model: t(x, w) = w_0 + w_1 h_1(x) + w_2 h_2(x) + ... + w_M h_M(x) = sum_{j=0}^{M} w_j h_j(x) = w^T h(x). Normal equation: w = (H^T H)^{-1} H^T t, where H is the design matrix with entries H_ij = h_j(x_i).
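A minimal numpy sketch of the normal-equation solution, assuming polynomial basis functions h_j(x) = x^j and synthetic noisy data (all of it illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic curve-fitting data: t = sin(2*pi*x) + Gaussian noise.
N, M = 20, 3                                  # number of points, polynomial degree
x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=N)

# Design matrix H with basis functions h_j(x) = x**j, j = 0..M.
H = np.vander(x, M + 1, increasing=True)

# Normal equation: w = (H^T H)^{-1} H^T t  (solve() is numerically safer than an explicit inverse).
w = np.linalg.solve(H.T @ H, H.T @ t)
print("fitted weights:", w)
```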

Example: Polynomial regression The graphical model represents the joint distribution of the targets t_n and the weights w, p(t, w) = p(w) prod_{n=1}^{N} p(t_n | w), with prediction t(x, w) = sum_{j=0}^{M} w_j h_j(x).

Example: Polynomial regression Making the deterministic quantities explicit: the inputs {x_n}, the noise variance σ², and the hyperparameter α representing the precision of the Gaussian prior over w.

Linear-Gaussian models Consider an arbitrary DAG over D variables in which node i represents a single continuous random variable x_i having a Gaussian distribution. The mean of this distribution is taken to be a linear combination of the states of the parent nodes pa_i of node i: p(x_i | pa_i) = N(x_i | sum_{j in pa_i} w_ij x_j + b_i, v_i).

Linear-Gaussian models The logarithm of the joint distribution is a quadratic function of the x_i, so the joint p(x) is a multivariate Gaussian.

Linear-Gaussian models The mean and covariance of the joint can be found recursively, working through the nodes in order: E[x_i] = sum_{j in pa_i} w_ij E[x_j] + b_i, and cov[x_i, x_j] = sum_{k in pa_j} w_jk cov[x_i, x_k] + I_ij v_j.

Linear-Gaussian models Case 1: no links in the graph. The joint distribution has 2D parameters and represents D independent univariate Gaussian distributions. Case 2: fully connected graph. The covariance has D(D-1)/2 + D = D(D+1)/2 independent parameters, i.e. a general multivariate Gaussian. Case 3: graphs of intermediate complexity correspond to Gaussians with partially constrained covariance.
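A minimal sketch, assuming a three-node linear-Gaussian chain x1 -> x2 -> x3 with invented weights, biases, and variances: ancestral sampling draws each node given its already-sampled parents, and the empirical covariance of the samples is a full (non-diagonal) 3x3 matrix, illustrating that the joint is a single multivariate Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear-Gaussian chain x1 -> x2 -> x3 with illustrative parameters:
# p(x_i | pa_i) = N(x_i | sum_j w_ij x_j + b_i, v_i)
b = np.array([0.0, 1.0, -1.0])     # biases b_i
v = np.array([1.0, 0.5, 0.2])      # conditional variances v_i
w21, w32 = 0.8, -0.5               # edge weights

# Ancestral sampling: sample each node given its (already sampled) parents.
n = 200_000
x1 = rng.normal(b[0], np.sqrt(v[0]), size=n)
x2 = rng.normal(w21 * x1 + b[1], np.sqrt(v[1]), size=n)
x3 = rng.normal(w32 * x2 + b[2], np.sqrt(v[2]), size=n)

# The joint is one multivariate Gaussian; its covariance couples all three variables.
print(np.cov(np.stack([x1, x2, x3])).round(3))
```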

Conditional independence Three random variables a, b and c. a is conditionally independent of b given c iff P(a | b, c) = P(a | c). This can be re-written in the following way: a ⊥⊥ b | c iff P(a, b | c) = P(a | b, c) P(b | c) = P(a | c) P(b | c). That is, conditioned on c, the joint distribution of a and b factorizes into the product of the conditional of a and the conditional of b (each given c).

Simple example (1): tail-to-tail node c (a <- c -> b). Joint distribution: P(a, b, c) = P(a | c) P(b | c) P(c). Conditioning on c: P(a, b | c) = P(a, b, c) / P(c) = P(a | c) P(b | c), hence a ⊥⊥ b | c.

Simple example (2): head-to-tail node c (a -> c -> b). Joint distribution: P(a, b, c) = P(a) P(c | a) P(b | c). Marginalizing over c: P(a, b) = sum_c P(a) P(c | a) P(b | c) = P(a) P(b | a), which does not in general factorize as P(a) P(b). Conditioning on c: P(a, b | c) = P(a, b, c) / P(c) = P(a) P(c | a) P(b | c) / P(c) = P(a | c) P(b | c) (Bayes' theorem), hence a ⊥⊥ b | c.

Simple example (3): head-to-head node c (a -> c <- b). Joint distribution: P(a, b, c) = P(a) P(b) P(c | a, b). Marginalizing over c: P(a, b) = sum_c P(a) P(b) P(c | a, b) = P(a) P(b), hence a ⊥⊥ b when c is unobserved. Conditioning on c: P(a, b | c) = P(a, b, c) / P(c) = P(a) P(b) P(c | a, b) / P(c), which in general does not factorize as P(a | c) P(b | c); observing c makes a and b dependent.

Conditional independence Tail-to-tail node (a <- c -> b): conditioning on c blocks the path, so a ⊥⊥ b | c. Head-to-tail node (a -> c -> b): conditioning on c blocks the path, so a ⊥⊥ b | c. Head-to-head node (a -> c <- b): conditioning on c unblocks the path, so a and b are not conditionally independent given c.
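A brute-force numeric check of the head-to-head case with invented CPTs (the numbers are illustrative): a and b come out exactly independent when c is unobserved, and clearly dependent once c = 1 is observed (explaining away).

```python
import itertools

# Head-to-head structure a -> c <- b with illustrative CPTs.
p_a = {1: 0.3, 0: 0.7}
p_b = {1: 0.6, 0: 0.4}
p_c1_given_ab = {(1, 1): 0.99, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}  # P(c=1 | a, b)

def joint(a, b, c):
    pc1 = p_c1_given_ab[(a, b)]
    return p_a[a] * p_b[b] * (pc1 if c == 1 else 1 - pc1)

# Marginally: P(a=1, b=1) equals P(a=1) P(b=1)  -> independent.
p_ab = sum(joint(1, 1, c) for c in (0, 1))
print(p_ab, p_a[1] * p_b[1])

# Conditioned on c=1: P(a=1, b=1 | c=1) != P(a=1 | c=1) P(b=1 | c=1)  -> dependent.
pc1 = sum(joint(a, b, 1) for a, b in itertools.product((0, 1), repeat=2))
p_ab_c = joint(1, 1, 1) / pc1
p_a_c = sum(joint(1, b, 1) for b in (0, 1)) / pc1
p_b_c = sum(joint(a, 1, 1) for a in (0, 1)) / pc1
print(p_ab_c, p_a_c * p_b_c)
```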

Markov condition We say that node y is a descendant of node x if there is a path from x to y in which each step of the path follows the directions of the arrows. If each variable is independent of its non-descendants given its parents, then the joint distribution factorizes as p(x) = prod_k p(x_k | pa_k).

D-separation Consider all possible paths from any node in A to any node in B. Any such path is said to be blocked if it includes a node such that either (a) the arrows on the path meet head-to-tail or tail-to-tail at the node, and the node is in the set C, or (b) the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C. If all paths are blocked, then A is said to be d-separated from B by C, and A ⊥⊥ B | C holds in every distribution that factorizes according to the graph.
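A small sketch of a d-separation test, assuming NetworkX is available (the function name d_separated and the example graphs are ours, not from the lecture). Rather than enumerating paths, it uses the equivalent criterion: A and B are d-separated by C iff removing C disconnects A from B in the moralized graph of the subgraph induced by A, B, C and all their ancestors.

```python
import networkx as nx

def d_separated(G, A, B, C):
    """Check whether node sets A and B are d-separated by C in the DAG G,
    via the ancestral moral graph criterion."""
    relevant = set(A) | set(B) | set(C)
    for node in list(relevant):
        relevant |= nx.ancestors(G, node)
    H = G.subgraph(relevant).copy()
    for node in list(H.nodes):                # "marry" the parents of every node
        parents = list(H.predecessors(node))
        H.add_edges_from((p, q) for i, p in enumerate(parents) for q in parents[i + 1:])
    U = H.to_undirected()
    U.remove_nodes_from(C)                    # conditioning removes the observed nodes
    return not any(nx.has_path(U, a, b) for a in A for b in B if a in U and b in U)

# The three canonical structures:
chain = nx.DiGraph([("a", "c"), ("c", "b")])        # head-to-tail
fork = nx.DiGraph([("c", "a"), ("c", "b")])         # tail-to-tail
collider = nx.DiGraph([("a", "c"), ("b", "c")])     # head-to-head
print(d_separated(chain, {"a"}, {"b"}, {"c"}))      # True
print(d_separated(fork, {"a"}, {"b"}, {"c"}))       # True
print(d_separated(collider, {"a"}, {"b"}, set()))   # True: blocked when c is unobserved
print(d_separated(collider, {"a"}, {"b"}, {"c"}))   # False: conditioning on c unblocks
```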

D-separation In graph (a), conditioning on c, the path from a to b is not blocked: f is tail-to-tail but unobserved, and e is head-to-head with an observed descendant (c). In graph (b), conditioning on f, the path from a to b is blocked both by f (tail-to-tail and observed) and by e (head-to-head with no observed descendants), so a ⊥⊥ b | f.

D-separation A particular directed graph represents a specific decomposition of a joint probability distribution into a product of conditional probabilities. A directed graph can therefore be viewed as a filter: it lets a distribution through only if that distribution can be expressed in terms of the factorization implied by the graph.

Markov blanket Consider a joint distribution p(x_1, ..., x_D) represented by a directed graph having D nodes, and the conditional distribution of a node x_i given all the remaining variables. This conditional depends only on the parents, the children and the co-parents (the other parents of the children) of x_i; this set of nodes is called the Markov blanket of x_i.

Markov Random Fields Also known as a Markov network or an undirected graphical model. Conditional independence properties: to test whether A ⊥⊥ B | C, consider all paths that connect any node in A to any node in B. If every such path passes through one or more nodes in C, the path is blocked and the conditional independence property holds; if there exists a path that avoids C, the property does not necessarily hold.

Clique A subset of the nodes in a graph such that there exists a link between every pair of nodes in the subset; in other words, the set of nodes in a clique is fully connected. A maximal clique is a clique that cannot be extended by adding any further node of the graph. Example: a four-node undirected graph showing a clique (outlined in green) and a maximal clique (outlined in blue).

Potential function Let x_C denote the set of variables in clique C. The joint distribution is written as a product of potential functions ψ_C(x_C) over the maximal cliques of the graph: p(x) = (1/Z) prod_C ψ_C(x_C). The quantity Z, called the partition function, is the normalization constant Z = sum_x prod_C ψ_C(x_C). The potential functions ψ_C(x_C) are required to be strictly positive. A possible choice is the Boltzmann form ψ_C(x_C) = exp(-E(x_C)), where E(x_C) is an energy function.
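A brute-force sketch for a tiny MRF, assuming a single maximal clique of three binary variables and an invented Boltzmann-style potential: Z is obtained by summing the unnormalized product over all 2^3 configurations.

```python
import itertools
import numpy as np

# Single maximal clique {x1, x2, x3} of binary variables with an illustrative
# Boltzmann-style potential psi(x_C) = exp(-E(x_C)), E = -(number of agreeing pairs).
def psi(x1, x2, x3):
    energy = -((x1 == x2) + (x2 == x3) + (x1 == x3))
    return np.exp(-energy)

# Partition function Z = sum over all configurations of the product of clique potentials.
states = list(itertools.product([0, 1], repeat=3))
Z = sum(psi(*s) for s in states)

# Normalized joint p(x) = psi(x_C) / Z.
for s in states:
    print(s, psi(*s) / Z)
```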

Image de-noising

Relation to directed graphs Joint distribution of a chain of nodes. Directed: p(x) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) ... p(x_N | x_{N-1}). Undirected: p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ... ψ_{N-1,N}(x_{N-1}, x_N).

Relation to directed graphs

Relation to directed graphs To convert a general directed graph into an undirected graph, we add extra links between all pairs of parents of each node and then drop the arrows. This process of marrying the parents has become known as moralization, and the resulting undirected graph is called the moral graph.
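A minimal sketch of moralization with plain dictionaries (the example DAG is ours): for each node we link all pairs of its parents and then drop the arrow directions.

```python
from itertools import combinations

def moral_graph(parents):
    """parents: dict mapping each node to the list of its parents in the DAG.
    Returns the moral graph as a dict of undirected neighbour sets."""
    nbrs = {v: set() for v in parents}
    for child, pa in parents.items():
        for p in pa:                         # drop arrows: parent-child links become undirected
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(pa, 2):     # "marry" every pair of parents
            nbrs[p].add(q)
            nbrs[q].add(p)
    return nbrs

# Illustrative DAG: x4 has three parents x1, x2, x3.
dag = {"x1": [], "x2": [], "x3": [], "x4": ["x1", "x2", "x3"]}
print(moral_graph(dag))   # x1, x2, x3 become pairwise connected
```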

Inference in Graphical Models

Inference on a chain For a chain of N discrete nodes, each with K states, joined by potentials ψ_{n-1,n}(x_{n-1}, x_n), the marginal of a node is p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n).

Inference on a chain Passing of local messages around on the graph: the forward message is computed recursively as μ_α(x_n) = sum_{x_{n-1}} ψ_{n-1,n}(x_{n-1}, x_n) μ_α(x_{n-1}), starting from node x_1, and the backward message as μ_β(x_n) = sum_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) μ_β(x_{n+1}), starting from node x_N. Each message update costs O(K²), so all marginals can be obtained in O(N K²) operations instead of the O(K^N) cost of naive summation.
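A minimal numpy sketch of this message-passing scheme on a short chain with random potentials (chain length, state count and potentials are all illustrative), checked against the brute-force marginal:

```python
import numpy as np

rng = np.random.default_rng(3)

N, K = 6, 3                                     # chain length, states per node
psi = rng.random((N - 1, K, K))                 # psi[n] couples node n and node n+1 (0-based)

# Forward messages: mu_alpha[n] is the message arriving at node n from the left.
mu_alpha = [np.ones(K)]
for n in range(N - 1):
    mu_alpha.append(psi[n].T @ mu_alpha[-1])    # sum over the previous node's states

# Backward messages, built from the right end of the chain.
mu_beta = [np.ones(K)]
for n in reversed(range(N - 1)):
    mu_beta.append(psi[n] @ mu_beta[-1])
mu_beta = mu_beta[::-1]

# Marginal of node m: p(x_m) = mu_alpha(x_m) * mu_beta(x_m) / Z.
m = 2
unnorm = mu_alpha[m] * mu_beta[m]
print("message passing:", unnorm / unnorm.sum())

# Brute-force check: build the full joint over K**N configurations.
joint = np.ones([K] * N)
for n in range(N - 1):
    shape = [1] * N
    shape[n], shape[n + 1] = K, K
    joint = joint * psi[n].reshape(shape)
joint /= joint.sum()
print("brute force:    ", joint.sum(axis=tuple(i for i in range(N) if i != m)))
```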

Tree The same local message-passing idea extends from chains to trees: undirected trees, directed trees, and polytrees.

Factor graph A factor graph writes the joint distribution over a set of variables in the form of a product of factors, p(x) = prod_s f_s(x_s), where x_s denotes a subset of the variables.

Factor graph

Factor graph Converting an undirected graph into a factor graph: create variable nodes corresponding to the nodes in the original undirected graph, then create additional factor nodes corresponding to the maximal cliques x_C, with the factors f_s(x_s) set equal to the clique potentials. There may be multiple factor graphs that correspond to the same undirected graph.

Factor graph (a) An undirected graph with a single clique potential ψ(x1, x2, x3). (b) A factor graph with factor f(x1, x2, x3) = ψ(x1, x2, x3) representing the same distribution as the undirected graph. (c) A different factor graph representing the same distribution, whose factors satisfy fa(x1, x2, x3)fb(x1, x2) = ψ(x1, x2, x3).

The sum-product algorithm The problem: find the marginal p(x) = sum_{x \ x} p(x) for a particular variable node x in a tree-structured factor graph. The marginal is the product of the incoming messages from all neighbouring factor nodes: p(x) ∝ prod_{s in ne(x)} μ_{f_s→x}(x). Messages are of two kinds: from factor to variable, μ_{f→x}(x) = sum_{x_1} ... sum_{x_M} f(x, x_1, ..., x_M) prod_{m in ne(f) \ x} μ_{x_m→f}(x_m), and from variable to factor, μ_{x→f}(x) = prod_{l in ne(x) \ f} μ_{f_l→x}(x). Messages are started at the leaves and passed inwards until every node has received a message from all of its neighbours.

Junction tree algorithm Exact inference for graphs having loops. Algorithm: convert the directed graph into an undirected graph (moralization); triangulate the graph; build a join tree (junction tree) whose nodes are the maximal cliques; run a two-stage message-passing algorithm, essentially equivalent to the sum-product algorithm.

Graph inference example Computer-Generated Residential Building Layouts [SIGGRAPH Asia 2010]

The End Sina Weibo: @浙大张宏鑫 WeChat official account: