Probabilistic Graphical Models


Probabilistic Graphical Models Lecture Notes, Fall 2009
November 2009
Byoung-Tak Zhang
School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics
Seoul National University
http://bi.snu.ac.kr/~btzhang/

Chapter 7. Latent Variable Models

7.1 Factor Graphs

Directed vs. Undirected Graphs
Both are graphical models that
- specify a factorization (how to express the joint distribution), and
- define a set of conditional independence properties.

Fig. 7-1. Parent-child local conditional distribution

Fig. 7-2. Maximal clique potential function

Relation of Directed and Undirected Graphs
Converting a directed graph to an undirected graph:
- Case 1: a chain (straight line). In this case the partition function Z = 1.
- Case 2: the general case. Moralization = "marrying the parents":
  - Add additional undirected links between all pairs of parents of each node.
  - Drop the arrows.
  - The result is the moral graph.
- Simply making the graph fully connected would discard all conditional independence properties of the original directed graph; instead, we should add the fewest extra links needed, so as to retain the maximum number of independence properties.
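As an illustration of the moralization procedure just described (not part of the original notes), here is a minimal Python sketch; the representation of the directed graph as a child-to-parents dictionary and the node names are assumptions made for the example.

```python
# Minimal sketch of moralization (directed graph -> moral graph).
# Assumes a directed graph given as {child: [parents]}; names are illustrative.

from itertools import combinations

def moralize(parents):
    """Return the edge set of the moral graph as a set of frozensets."""
    edges = set()
    for child, pa in parents.items():
        # Drop arrow directions: keep each parent-child link as an undirected edge.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry the parents": connect every pair of parents of the same child.
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))
    return edges

# Example: x1, x2, x3 are all parents of x4, so they get "married".
print(moralize({"x4": ["x1", "x2", "x3"]}))
```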

Factor Graphs

A factor graph is a bipartite graph representing a joint distribution in the form of a product of factors.
- Factors in directed/undirected graphs: introduce additional nodes for the factors themselves.
- This makes the decomposition/factorization explicit: each clique potential ψ_C(x_C) of an undirected graph (or each conditional distribution of a directed graph) becomes a factor node, so that p(x) = ∏_s f_s(x_s).
- (Factor graphs are bipartite.)

Definition. Given a factorization of a function g(X_1, ..., X_n) = ∏_{j=1}^m f_j(S_j), where S_j ⊆ {X_1, ..., X_n}, the corresponding factor graph G = (X, F, E) consists of variable vertices X = {X_1, ..., X_n}, factor vertices F = {f_1, ..., f_m}, and edges E. The edges depend on the factorization as follows: there is an undirected edge between factor vertex f_j and variable vertex X_k whenever X_k ∈ S_j. The function g is assumed to be real-valued. Factor graphs can be combined with message passing algorithms to efficiently compute certain characteristics of the function, such as the marginals.

Example. Consider a function that factorizes as follows:
g(X_1, X_2, X_3) = f_1(X_1) f_2(X_1, X_2) f_3(X_1, X_2) f_4(X_2, X_3),
with a corresponding factor graph.
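To make the bipartite structure of this example concrete, the following minimal sketch (my own illustration, not from the notes) stores the factor graph of g(X_1, X_2, X_3) as variable vertices, factor vertices, and an edge list:

```python
# Sketch: the factor graph of g(X1,X2,X3) = f1(X1) f2(X1,X2) f3(X1,X2) f4(X2,X3)
# stored as a bipartite graph (variable vertices, factor vertices, edges).

factors = {          # factor vertex -> variables in its scope S_j
    "f1": ["X1"],
    "f2": ["X1", "X2"],
    "f3": ["X1", "X2"],
    "f4": ["X2", "X3"],
}
variables = sorted({v for scope in factors.values() for v in scope})
# One undirected edge (f_j, X_k) whenever X_k appears in the scope of f_j.
edges = [(f, v) for f, scope in factors.items() for v in scope]

print(variables)   # ['X1', 'X2', 'X3']
print(edges)       # bipartite edge list
```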

This factor graph has a cycle. If we merge f_2(X_1, X_2) f_3(X_1, X_2) into a single factor, the resulting factor graph will be a tree. This is an important distinction, as message passing algorithms are usually exact for trees but only approximate for graphs with cycles.

Inference on factor graphs

Sum-product algorithm: evaluating local marginals over nodes or subsets of nodes.
p(x) = Σ_{X\x} p(X), where X denotes the set of all variables,
p(X) = ∏_{s ∈ ne(x)} F_s(x, X_s), where F_s(x, X_s) is the product of all factors in the group associated with factor f_s,
p(x) = Σ_{X\x} ∏_{s ∈ ne(x)} F_s(x, X_s) = ∏_{s ∈ ne(x)} [ Σ_{X_s} F_s(x, X_s) ] = ∏_{s ∈ ne(x)} μ_{f_s→x}(x).
The messages μ_{f_s→x}(x) from the factor node f_s to the variable node x are computed at the factor vertices and passed along the edges.

Max-sum algorithm: finding the most probable state.

The Hammersley-Clifford theorem shows that other probabilistic models, such as Markov networks and Bayesian networks, can be represented as factor graphs. The factor graph representation is frequently used when performing inference over such networks using belief propagation. On the other hand, Bayesian networks are more naturally suited for generative models, as they can directly represent the causalities of the model.
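The following sketch (illustrative only; the factor tables are made-up toy numbers) computes by brute-force enumeration the same marginal that the sum-product algorithm obtains by message passing, using the tree obtained above by merging f_2 and f_3 into a single factor f_23:

```python
import itertools
import numpy as np

# Brute-force evaluation of the marginal that the sum-product algorithm
# computes by message passing. Variables X1, X2, X3 are binary; factors
# follow the merged-tree example: g = f1(X1) * f23(X1,X2) * f4(X2,X3).

f1 = np.array([1.0, 2.0])                       # f1[x1]
f23 = np.array([[3.0, 1.0], [1.0, 2.0]])        # f23[x1, x2] = f2 * f3
f4 = np.array([[1.0, 4.0], [2.0, 1.0]])         # f4[x2, x3]

# Unnormalized marginal of X2: sum the product of all factors over X1 and X3.
marg = np.zeros(2)
for x1, x2, x3 in itertools.product(range(2), repeat=3):
    marg[x2] += f1[x1] * f23[x1, x2] * f4[x2, x3]

print(marg / marg.sum())   # normalized p(X2)
```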

Properties of Factor Graphs

Converting directed and undirected graphs into factor graphs

Undirected graph → factor graph:
- For a given fully connected undirected graph, two (or more) different factor graphs are possible.
- Factor graphs are therefore more specific than undirected graphs.

Directed graph → factor graph:
- For the same directed graph, two or more factor graphs are possible.
- There can be multiple factor graphs all of which correspond to the same undirected/directed graph.

Converting a directed/undirected tree to a factor graph:
- The result is again a tree (no loops; one and only one path connects any two nodes).

Converting a directed polytree to a factor graph:
- The result is a tree. Cf. converting a directed polytree into an undirected graph results in loops due to the moralization step.

(Figure: a directed polytree, its undirected moral graph, and a factor graph representation.)
- Local cycles in a directed graph can be removed on conversion to a factor graph.
- Factor graphs are more specific about the precise form of the factorization.
- For a fully connected undirected graph, two (or more) factor graphs are possible.
- Directed and undirected graphs can express different conditional independence properties. (Figure: a directed graph and an undirected graph illustrating this.)

7.2 Probabilistic Latent Semantic Analysis

Latent Variable Models
- Latent variables: variables that are not directly observed but are instead inferred from other variables that are observed and directly measured.
- Latent variable models: explain the statistical properties of the observed variables in terms of the latent variables.
- General formulation: p(x) = ∫ p(x, z) dz = ∫ p(x | z) p(z) dz

PLSA

7.3 Gaussian Mixture Models

Graphical representation of a mixture model: a K-dimensional binary random variable z having a 1-of-K representation.

The Gaussian mixture distribution can be written as a linear superposition of Gaussians:
p(x) = Σ_{k=1}^K π_k N(x | μ_k, Σ_k)

An equivalent formulation of the Gaussian mixture involves an explicit latent variable:
p(z) = ∏_{k=1}^K π_k^{z_k}
p(x | z_k = 1) = N(x | μ_k, Σ_k)
p(x | z) = ∏_{k=1}^K N(x | μ_k, Σ_k)^{z_k}
A numerical comparison of the two forms is sketched below.
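A small numerical check (toy 1-D parameter values chosen arbitrarily, not from the notes) that the direct mixture form and the explicit latent-variable form give the same density:

```python
import numpy as np
from scipy.stats import norm

# Direct form: sum_k pi_k N(x | mu_k, sigma_k).
# Latent form: sum_z p(z) p(x | z), where z ranges over the K one-hot vectors.
pis = np.array([0.2, 0.5, 0.3])
mus = np.array([-1.0, 0.0, 2.0])
sigmas = np.array([1.0, 0.5, 1.5])
x = 0.7

direct = np.sum(pis * norm.pdf(x, mus, sigmas))

latent = 0.0
for z in np.eye(3, dtype=int):                  # the K one-hot settings of z
    p_z = np.prod(pis ** z)                     # p(z) = prod_k pi_k^{z_k}
    p_x_given_z = np.prod(norm.pdf(x, mus, sigmas) ** z)
    latent += p_z * p_x_given_z

print(np.isclose(direct, latent))               # True
```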

The marginal distribution of x is then
p(x) = Σ_z p(z) p(x | z) = Σ_{k=1}^K π_k N(x | μ_k, Σ_k),
where p(z_k = 1) = π_k, with 0 ≤ π_k ≤ 1 and Σ_{k=1}^K π_k = 1.

The marginal distribution of x is thus a Gaussian mixture of the form above, and for every observed data point x_n there is a corresponding latent variable z_n.

The posterior probability of component k is
γ(z_k) ≡ p(z_k = 1 | x) = p(z_k = 1) p(x | z_k = 1) / Σ_{j=1}^K p(z_j = 1) p(x | z_j = 1)
                        = π_k N(x | μ_k, Σ_k) / Σ_{j=1}^K π_j N(x | μ_j, Σ_j).
γ(z_k) can also be viewed as the responsibility that component k takes for explaining the observation x.

Generating random samples distributed according to the Gaussian mixture model (a minimal sketch follows below): first generate a value for z, denoted ẑ, from the marginal distribution p(z), and then generate a value for x from the conditional distribution p(x | ẑ).
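A minimal sketch of this ancestral sampling procedure (assuming NumPy; the parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ancestral sampling from a 3-component, 2-D Gaussian mixture:
# draw z from p(z), then x from p(x | z).
pis = np.array([0.3, 0.5, 0.2])
mus = np.array([[-2.0, 0.0], [0.0, 0.0], [3.0, 2.0]])
Sigmas = np.array([0.3 * np.eye(2), np.eye(2), 0.5 * np.eye(2)])

N = 500
zs = rng.choice(3, size=N, p=pis)                 # component labels (1-of-K)
xs = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in zs])
# xs are samples from the marginal p(x); zs are the latent "colors"
# of panel (a) in the figure described below.
```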

Figure: (a) Samples of z from p(z); the three states of z, corresponding to the three components of the mixture, are depicted in red, green, and blue. (b) The corresponding samples from the marginal distribution p(x). (c) The same samples, with the colors representing the responsibilities γ(z_nk) associated with each data point x_n, obtained by evaluating the posterior probability for each component of the mixture distribution from which this data set was generated.

Graphical representation of a Gaussian mixture model for a set of N i.i.d. data points {x_n}, with corresponding latent points {z_n}:
- Data set: X (an N × D matrix) with n-th row x_n^T.
- Latent variables: Z (an N × K matrix) with rows z_n^T.

The log of the likelihood function is
ln p(X | π, μ, Σ) = Σ_{n=1}^N ln { Σ_{k=1}^K π_k N(x_n | μ_k, Σ_k) }.

7.4 Learning Gaussian Mixtures by EM

Gaussian mixture models can be learned by the expectation-maximization (EM) algorithm. Repeat:
- Expectation step: calculate the posteriors (responsibilities) using the current parameters.
- Maximization step: re-estimate the parameters based on the responsibilities.

Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the parameters.

1. Initialize the means μ_k, covariances Σ_k, and mixing coefficients π_k.
2. E-step: evaluate the posterior probabilities (responsibilities) using the current values of the parameters:
γ(z_nk) = π_k N(x_n | μ_k, Σ_k) / Σ_{j=1}^K π_j N(x_n | μ_j, Σ_j)
3. M-step: re-estimate the means, covariances, and mixing coefficients using the result of the E-step:

μ_k^new = (1 / N_k) Σ_{n=1}^N γ(z_nk) x_n
Σ_k^new = (1 / N_k) Σ_{n=1}^N γ(z_nk) (x_n − μ_k^new)(x_n − μ_k^new)^T
π_k^new = N_k / N,   where   N_k = Σ_{n=1}^N γ(z_nk)

4. Evaluate the log likelihood
ln p(X | μ, Σ, π) = Σ_{n=1}^N ln { Σ_{k=1}^K π_k N(x_n | μ_k, Σ_k) }.
If converged, terminate; otherwise, go to Step 2. (A minimal end-to-end sketch of these updates follows below.)
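Below is a minimal end-to-end sketch of these E-step/M-step updates (assuming NumPy/SciPy; the small regularization added to the covariances is a numerical safeguard and not part of the update equations above):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """Minimal EM for a Gaussian mixture, following the E/M updates above."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # 1. Initialize means, covariances and mixing coefficients.
    mus = X[rng.choice(N, K, replace=False)]
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # 2. E-step: responsibilities gamma[n, k].
        dens = np.column_stack([multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                for k in range(K)])
        weighted = dens * pis
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # 3. M-step: re-estimate parameters from the responsibilities.
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
        # 4. Log likelihood (under the previous parameters; usable as a convergence check).
        ll = np.log((dens * pis).sum(axis=1)).sum()
    return pis, mus, Sigmas, ll

# Example usage on synthetic 2-D data with two clusters:
X = np.vstack([np.random.default_rng(1).normal(m, 0.5, size=(100, 2)) for m in (0.0, 3.0)])
pis, mus, Sigmas, ll = em_gmm(X, K=2)
```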

The General EM Algorithm

In maximizing the log likelihood function
ln p(X | θ) = ln { Σ_Z p(X, Z | θ) },
the summation inside the logarithm prevents the logarithm from acting directly on the joint distribution. In contrast, the log likelihood function for the complete data set {X, Z} is straightforward to maximize. In practice, since we are not given the complete data set, we consider instead its expected value Q under the posterior distribution p(Z | X, Θ) of the latent variables.

General EM Algorithm
1. Choose an initial setting for the parameters Θ^old.
2. E step: evaluate p(Z | X, Θ^old).
3. M step: evaluate Θ^new given by
   Θ^new = argmax_Θ Q(Θ, Θ^old),
   where Q(Θ, Θ^old) = Σ_Z p(Z | X, Θ^old) ln p(X, Z | Θ).
4. If the convergence criterion is not satisfied, let Θ^old ← Θ^new and return to Step 2.

The EM algorithm can also be used for finding MAP (maximum a posteriori) estimates by maximizing the modified quantity Q(θ, θ^old) + ln p(θ) in the M-step.
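As one concrete instance of this modified M-step (an illustration, not from the notes): placing a Dirichlet prior p(π) ∝ ∏_k π_k^{α_k − 1} on the mixing coefficients of a Gaussian mixture and maximizing Q(θ, θ^old) + ln p(θ) leaves the μ_k and Σ_k updates unchanged and modifies only the mixing-coefficient update to π_k^new = (N_k + α_k − 1) / (N + Σ_j α_j − K); choosing all α_k = 1 recovers the maximum-likelihood update π_k^new = N_k / N.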