
Probabilistic Graphical Models
COMP 790-90 Seminar, Spring 2011
The University of North Carolina at Chapel Hill

Outline
Introduction
Representation
Bayesian network: conditional independence, inference (variable elimination), learning
Markov Random Field: clique, pair-wise MRF, inference (belief propagation)
Conclusion

Introduction
Graphical model = probability theory + graph theory.
Probability theory ensures consistency and provides a way to interface models to data.
Graph theory provides an intuitively appealing interface for humans and efficient general-purpose algorithms.

Introduction
Modularity: a complex system is built by combining simpler parts.
Graphical models provide a natural tool for two problems: uncertainty and complexity.
They play an important role in the design and analysis of machine learning algorithms.

Introduction
Many classical multivariate probabilistic systems are special cases of the general graphical model formalism: mixture models, factor analysis, hidden Markov models, Kalman filters.
The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism, so techniques developed in one field can be transferred to other fields, and it serves as a framework for the design of new systems.

Representation
A graphical model represents probabilistic relationships between a set of random variables.
Variables are represented by nodes: binary events, discrete variables, continuous variables.
Conditional (in)dependence is represented by the (absence of) edges.
Directed graphical model: Bayesian network. Undirected graphical model: Markov random field.

Outline (recap)
Introduction; Representation; Bayesian network (conditional independence, inference by variable elimination, learning); Markov Random Field (clique, pair-wise MRF, inference by belief propagation); Conclusion.

Bayesian Network
A directed acyclic graph (DAG). Directed edges give causal relationships between variables.
For each variable X with parents pa(X), there exists a conditional probability P(X | pa(X)).
For discrete variables this is a conditional probability table (CPT), a description of a noisy causal process.

An Example: What Causes the Grass to Be Wet?

A More Complex Example: Diagnosing the engine start problem.

A More Complex Example
The Computer-based Patient Case Simulation system (CPCS-PM), developed by Parker and Miller, has 422 nodes and 867 arcs: 14 nodes describe diseases, 33 nodes describe history and risk factors, and the remaining 375 nodes describe various findings related to the diseases.

Joint Distribution P(X_1, ..., X_n)
If the variables are binary, we need O(2^n) parameters to describe P. For the wet grass example, we need 2^4 - 1 = 15 parameters. Can we do better? Key idea: use properties of independence.

Independent Random Variables
X is independent of Y iff P(X = x | Y = y) = P(X = x) for all values x, y.
If X and Y are independent, then P(X, Y) = P(X | Y) P(Y) = P(X) P(Y), and more generally P(X_1, ..., X_n) = P(X_1) P(X_2) ... P(X_n).
Unfortunately, most random variables of interest are not independent of each other (e.g., the wet grass example).

Conditional Independence
A more suitable notion is that of conditional independence. X and Y are conditionally independent given Z if P(X | Z, Y) = P(X | Z), equivalently P(X, Y | Z) = P(X | Z) P(Y | Z). Notation: I(X, Y | Z).
The conditional independence structure in the wet grass example (C → S, C → R, S → W, R → W): I(S, R | C) and I(C, W | S, R).
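To make the definition concrete, here is a minimal Python sketch that builds P(C, S, R) = P(C) P(S|C) P(R|C) and numerically checks I(S, R | C), i.e. that P(S | R, C) = P(S | C). The CPT values are the ones used in the inference slides further below; note that with this factorization the independence holds by construction, so the sketch only illustrates how the definition is checked.

```python
# Minimal numeric check of conditional independence I(S, R | C) in the
# wet-grass network. CPT values are those used later in the inference tables.
from itertools import product

p_c = {True: 0.5, False: 0.5}
p_s_given_c = {True: 0.1, False: 0.5}          # P(S=T | C)
p_r_given_c = {True: 0.8, False: 0.2}          # P(R=T | C)

def joint(c, s, r):
    """P(C=c, S=s, R=r) = P(c) P(s|c) P(r|c)."""
    ps = p_s_given_c[c] if s else 1 - p_s_given_c[c]
    pr = p_r_given_c[c] if r else 1 - p_r_given_c[c]
    return p_c[c] * ps * pr

for c, s, r in product([True, False], repeat=3):
    # P(s | r, c) computed from the joint ...
    p_s_rc = joint(c, s, r) / sum(joint(c, s2, r) for s2 in (True, False))
    # ... versus P(s | c).
    p_s_c = sum(joint(c, s, r2) for r2 in (True, False)) / sum(
        joint(c, s2, r2) for s2 in (True, False) for r2 in (True, False))
    assert abs(p_s_rc - p_s_c) < 1e-12, (c, s, r)

print("I(S, R | C) holds for the wet-grass CPTs")
```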

Conditional Independence
Directed Markov property: each random variable X is conditionally independent of its non-descendants, given its parents Pa(X). Formally, P(X | NonDesc(X), Pa(X)) = P(X | Pa(X)). Notation: I(X, NonDesc(X) | Pa(X)).

Factorized Representation
The full joint distribution is defined in terms of local conditional distributions (obtained via the chain rule): P(x_1, ..., x_n) = ∏_i p(x_i | pa(x_i)).
The graphical structure encodes conditional independences among the random variables and represents the full joint distribution more compactly.
Complexity reduction: the joint probability of n binary variables needs O(2^n) parameters, whereas the factorized form needs O(n * 2^k), where k is the maximal number of parents of a node.

Factorized Representation
The wet grass example: P(C, S, R, W) = P(W | S, R) P(R | C) P(S | C) P(C). Only 1 + 2 + 2 + 4 = 9 parameters are needed (a short code sketch of this factorization follows the inference overview below).

Inference
Computation of the conditional probability distribution of one set of nodes, given a model and another set of nodes.
Bottom-up: given an observation (at the leaves), the probabilities of the causes can be calculated accordingly; diagnosis from effects to causes.
Top-down: knowledge of the causes influences the probability of the outcome; predict the effects.
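The sketch below encodes the wet-grass factorization, assuming the CPT values that appear in the inference tables further down (P(C=T)=0.5, P(S=T|C), P(R=T|C), P(W=T|S,R)); the point is only that 9 numbers determine all 2^4 = 16 joint probabilities.

```python
# The wet-grass joint built from its factorization
#   P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R).
# The 9 CPT entries below are the values used in the slides' inference tables.
from itertools import product

p_c = 0.5                                            # P(C = T)
p_s = {True: 0.1, False: 0.5}                        # P(S = T | C)
p_r = {True: 0.8, False: 0.2}                        # P(R = T | C)
p_w = {(True, True): 0.99, (True, False): 0.9,       # P(W = T | S, R)
       (False, True): 0.9, (False, False): 0.0}

def bern(p, value):
    """P(X = value) for a binary variable with P(X = T) = p."""
    return p if value else 1 - p

def joint(c, s, r, w):
    return (bern(p_c, c) * bern(p_s[c], s) *
            bern(p_r[c], r) * bern(p_w[(s, r)], w))

# 9 parameters are enough to specify all 2^4 = 16 joint probabilities.
table = {assign: joint(*assign) for assign in product([True, False], repeat=4)}
assert abs(sum(table.values()) - 1.0) < 1e-12
print(table[(True, True, True, True)])               # 0.99*0.8*0.1*0.5 = 0.0396
```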

Basic Computation
The value of x depends on y. Dependency: conditional probability P(x | y). Knowledge about y: prior probability P(y).
Product rule: P(x, y) = P(x | y) P(y).
Sum rule (marginalization): P(x) = ∑_y P(x, y) and P(y) = ∑_x P(x, y).
Bayes' rule: P(y | x) = P(x | y) P(y) / P(x), i.e., posterior = conditional (likelihood) × prior / evidence.

Inference: Bottom-Up
Observe wet grass (denoted by W = T). Two possible causes: rain or sprinkler. Which is more likely? Apply Bayes' rule.
P(W = T) = ∑_{c,s,r} P(C = c, S = s, R = r, W = T) = 0.0396 + 0.009 + 0.324 + 0 + 0.0495 + 0.18 + 0.045 + 0 = 0.6471.

Inference: Bottom-Up
C  S  R  W   P(C, S, R, W = T)
T  T  T  T   0.99 * 0.8 * 0.1 * 0.5 = 0.0396
T  T  F  T   0.9  * 0.2 * 0.1 * 0.5 = 0.009
T  F  T  T   0.9  * 0.8 * 0.9 * 0.5 = 0.324
T  F  F  T   0    * 0.2 * 0.9 * 0.5 = 0
F  T  T  T   0.99 * 0.2 * 0.5 * 0.5 = 0.0495
F  T  F  T   0.9  * 0.8 * 0.5 * 0.5 = 0.18
F  F  T  T   0.9  * 0.2 * 0.5 * 0.5 = 0.045
F  F  F  T   0    * 0.8 * 0.5 * 0.5 = 0
Each product is P(W = T | S, R) * P(R | C) * P(S | C) * P(C).

Inference: Bottom-Up
Observe wet grass (W = T). Two possible causes: rain or sprinkler. Which is more likely? Apply Bayes' rule:
P(S = T | W = T) = P(S = T, W = T) / P(W = T) = ∑_{c,r} P(C = c, S = T, R = r, W = T) / P(W = T) = (0.0396 + 0.009 + 0.0495 + 0.18) / 0.6471 = 0.2781 / 0.6471 ≈ 0.43.

Inference: Bottom-Up
Similarly, P(R = T | W = T) = P(R = T, W = T) / P(W = T) = ∑_{c,s} P(C = c, S = s, R = T, W = T) / P(W = T) = (0.0396 + 0.324 + 0.0495 + 0.045) / 0.6471 = 0.4581 / 0.6471 ≈ 0.708.
So rain is the more likely explanation of the wet grass.

Inference: Top-Down
The probability that the grass will be wet given that it is cloudy:
P(W = T | C = T) = P(W = T, C = T) / P(C = T) = ∑_{s,r} P(C = T, S = s, R = r, W = T) / ∑_{s,r,w} P(C = T, S = s, R = r, W = w).
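A small sketch of these bottom-up and top-down queries by brute-force enumeration over the factorized joint (same assumed CPT values as before); it reproduces P(W=T) ≈ 0.6471, P(S=T|W=T) ≈ 0.43 and P(R=T|W=T) ≈ 0.708 from the slides.

```python
# Inference by enumeration over the wet-grass joint.
from itertools import product

p_c, p_s = 0.5, {True: 0.1, False: 0.5}
p_r = {True: 0.8, False: 0.2}
p_w = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.9, (False, False): 0.0}
bern = lambda p, v: p if v else 1 - p
joint = lambda c, s, r, w: (bern(p_c, c) * bern(p_s[c], s) *
                            bern(p_r[c], r) * bern(p_w[(s, r)], w))

def prob(query, evidence):
    """P(query | evidence); query and evidence are dicts over 'C','S','R','W'."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        assign = dict(C=c, S=s, R=r, W=w)
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(c, s, r, w)
        den += p
        if all(assign[k] == v for k, v in query.items()):
            num += p
    return num / den

print(prob({'W': True}, {}))                     # P(W=T)        ~ 0.6471
print(prob({'S': True}, {'W': True}))            # P(S=T | W=T)  ~ 0.43
print(prob({'R': True}, {'W': True}))            # P(R=T | W=T)  ~ 0.708
print(prob({'W': True}, {'C': True}))            # P(W=T | C=T), the top-down query
```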

Inference Algorithms
The exact inference problem in a general graphical model is NP-hard.
Exact inference: variable elimination; message passing algorithms; clustering and junction tree approaches.
Approximate inference: loopy belief propagation; sampling (Monte Carlo) methods; variational methods.

Variable Elimination
Computing P(W = T).
Approach 1, the blind approach: sum out all un-instantiated variables from the full joint. Computation cost is O(2^n). For the wet grass example: number of additions 14, number of products ? Solution: explore the graph structure.

Variable Elimination
Approach 2: interleave sums and products. The key idea is to push sums in as far as possible. In the computation, first compute one inner sum as an intermediate factor, then the next, and so on. Computation cost is O(n * 2^k). For the wet grass example: number of additions ?, number of products ? (see the sketch below).
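A sketch of the "push sums in" idea for P(W = T) on the wet-grass network, with the same assumed CPTs as before: the sum over R is done first for each (C, S), producing a small intermediate factor, then the sum over S, then over C, instead of summing an O(2^n) table.

```python
# Variable elimination for P(W = T) in the wet-grass network:
#   P(W=T) = sum_c P(c) sum_s P(s|c) sum_r P(r|c) P(W=T | s, r)
# Sums are pushed inwards and cached as small intermediate factors.
p_c, p_s = 0.5, {True: 0.1, False: 0.5}
p_r = {True: 0.8, False: 0.2}
p_w = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.9, (False, False): 0.0}
bern = lambda p, v: p if v else 1 - p

# First compute f_R(c, s) = sum_r P(r | c) P(W=T | s, r): eliminates R.
f_r = {(c, s): sum(bern(p_r[c], r) * p_w[(s, r)] for r in (True, False))
       for c in (True, False) for s in (True, False)}

# Then compute f_S(c) = sum_s P(s | c) f_R(c, s): eliminates S.
f_s = {c: sum(bern(p_s[c], s) * f_r[(c, s)] for s in (True, False))
       for c in (True, False)}

# Finally sum out C.
p_w_true = sum(bern(p_c, c) * f_s[c] for c in (True, False))
print(p_w_true)    # ~ 0.6471, matching the enumeration result
```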

Learning
Learn parameters or structure from data.
Structure learning: find the correct connectivity between existing nodes.
Parameter learning: find maximum likelihood estimates of the parameters of each conditional probability distribution.
In practice, a lot of knowledge (structures and probabilities) comes from domain experts.

Learning
Structure   Observation   Method
Known       Full          Maximum likelihood (ML) estimation
Known       Partial       Expectation Maximization (EM) algorithm
Unknown     Full          Model selection
Unknown     Partial       EM + model selection

Model Selection Method
Select a 'good' model from all possible models and use it as if it were the correct model. Having defined a scoring function, a search algorithm is then used to find a network structure that receives the highest score, fitting the prior knowledge and data. Unfortunately, the number of DAGs on n variables is super-exponential in n, so the usual approach is to use local search algorithms (e.g., greedy hill climbing) to search through the space of graphs.

EM Algorithm
Expectation (E) step: use the current parameters to estimate the unobserved data.
Maximization (M) step: use the estimated data to do ML/MAP estimation of the parameters.
Repeat the E and M steps until convergence.
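A minimal EM sketch in the spirit of this slide: parameter learning for the wet-grass network when Cloudy (C) is never observed. The "true" CPTs are only used to simulate a dataset of (S, R, W) triples, and the starting parameter values are arbitrary guesses; both are assumptions for illustration, not part of the original slides.

```python
# EM for the wet-grass network with C hidden and (S, R, W) observed.
import random
random.seed(0)

TRUE_C, TRUE_S, TRUE_R = 0.5, {True: 0.1, False: 0.5}, {True: 0.8, False: 0.2}
TRUE_W = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.9, (False, False): 0.0}

def sample():
    c = random.random() < TRUE_C
    s = random.random() < TRUE_S[c]
    r = random.random() < TRUE_R[c]
    w = random.random() < TRUE_W[(s, r)]
    return s, r, w                                  # C stays hidden

data = [sample() for _ in range(5000)]

# Parameters involving the hidden variable; P(W | S, R) is fully observed and
# could be estimated directly, so it is omitted here. Initial guesses are arbitrary.
pC, pS, pR = 0.6, {True: 0.3, False: 0.7}, {True: 0.6, False: 0.4}

for _ in range(100):
    # E-step: responsibility g = P(C = T | s, r) under the current parameters
    # (W drops out because W is independent of C given S and R).
    n_c = {True: 1e-9, False: 1e-9}                 # expected counts
    n_s = {True: 0.0, False: 0.0}
    n_r = {True: 0.0, False: 0.0}
    for s, r, _w in data:
        lik = {c: pc * (pS[c] if s else 1 - pS[c]) * (pR[c] if r else 1 - pR[c])
               for c, pc in ((True, pC), (False, 1 - pC))}
        g = lik[True] / (lik[True] + lik[False])
        for c, wgt in ((True, g), (False, 1 - g)):
            n_c[c] += wgt
            n_s[c] += wgt * s
            n_r[c] += wgt * r
    # M-step: maximum-likelihood re-estimates from the expected counts.
    pC = n_c[True] / len(data)
    pS = {c: n_s[c] / n_c[c] for c in (True, False)}
    pR = {c: n_r[c] / n_c[c] for c in (True, False)}

print(pC, pS, pR)   # may recover the true CPTs or a label-swapped local optimum
```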

Outline (recap)
Introduction; Representation; Bayesian network (conditional independence, inference, learning); Markov Random Field (clique, pair-wise MRF, inference by belief propagation); Conclusion.

Markov Random Fields
Undirected edges simply give correlations between variables. The joint distribution is a product of local functions over the cliques of the graph:
P(x) = (1/Z) ∏_C P_C(x_C),
where the P_C(x_C) are the clique potentials and Z is a normalization constant.
Example: P(x, y, z, w) = (1/Z) P_A(x, y, w) P_B(x, y, z).

The Clique
A clique: a set of variables which are the arguments of a local function. The order of a clique: the number of variables in the clique.
Example: P(x_1, ..., x_5) = (1/Z) P_A(x_1) P_B(x_2) P_C(x_1, x_2, x_3) P_D(x_3, x_4) P_E(x_3, x_5), with first-order cliques {x_1} and {x_2}, a third-order clique {x_1, x_2, x_3}, and second-order cliques {x_3, x_4} and {x_3, x_5}.

Regular and Arbitrary Graphs (figure slide).
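A short sketch of the clique-potential factorization in the example above, with binary variables and made-up potential tables (the values are illustrative assumptions, not from the slides); Z is computed by brute-force enumeration, which is exactly the expensive step that inference algorithms try to avoid.

```python
# Unnormalized MRF over binary x1..x5 with the clique structure from the slide:
#   P(x1..x5) = (1/Z) PA(x1) PB(x2) PC(x1,x2,x3) PD(x3,x4) PE(x3,x5)
from itertools import product

PA = {0: 1.0, 1: 2.0}                               # first-order clique {x1}
PB = {0: 3.0, 1: 1.0}                               # first-order clique {x2}
PC = {k: 1.0 + (k[0] == k[1] == k[2]) for k in product((0, 1), repeat=3)}  # {x1,x2,x3}
PD = {k: 2.0 if k[0] == k[1] else 0.5 for k in product((0, 1), repeat=2)}  # {x3,x4}
PE = {k: 2.0 if k[0] == k[1] else 0.5 for k in product((0, 1), repeat=2)}  # {x3,x5}

def unnorm(x1, x2, x3, x4, x5):
    return PA[x1] * PB[x2] * PC[(x1, x2, x3)] * PD[(x3, x4)] * PE[(x3, x5)]

# Normalization constant Z: sum over all 2^5 configurations.
Z = sum(unnorm(*x) for x in product((0, 1), repeat=5))

def p(*x):
    return unnorm(*x) / Z

print(Z, p(1, 0, 1, 1, 1))
```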

Pair-wise MRF
The order of the cliques is at most two. Pair-wise MRFs are commonly used in computer vision applications: infer the underlying unknown variables through local observations and a smoothness prior.
(Figure: a 3x3 grid of hidden nodes i_1, ..., i_9 representing the underlying truth; each hidden node i_k is linked to an observed image node o_k through a node potential φ_k(i_k), and to its grid neighbors through pair-wise compatibility potentials ψ_xy(i_x, i_y).)

Pair-wise MRF
ψ_xy(i_x, i_y) is an n_x * n_y matrix, and φ_x(i_x) is a vector of length n_x, where n_x is the number of states of i_x.

Pair-wise MRF
Given all the evidence nodes o_k, we want to find the most likely state of all the hidden nodes i_k, which is equivalent to maximizing
P({i}) = (1/Z) ∏_{(x,y)} ψ_xy(i_x, i_y) ∏_x φ_x(i_x).

Belief Propagation
Beliefs are used to approximate this probability:
b_x(i_x) ∝ φ_x(i_x) ∏_{z ∈ N(x)} m_{z→x}(i_x),
where the message from x to a neighbor y is
m_{x→y}(i_y) = ∑_{i_x} φ_x(i_x) ψ_xy(i_x, i_y) ∏_{z ∈ N(x)\{y}} m_{z→x}(i_x).

Belief Propagation
Example at node 5 of the grid: the belief combines the local potential with the four incoming messages,
b_5(i_5) ∝ φ_5(i_5) m_{2→5}(i_5) m_{4→5}(i_5) m_{6→5}(i_5) m_{8→5}(i_5).

Belief Propagation
Each incoming message is itself built from the sender's potential and its other incoming messages, e.g.
m_{4→5}(i_5) = ∑_{i_4} φ_4(i_4) ψ_45(i_4, i_5) m_{1→4}(i_4) m_{7→4}(i_4).
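A compact loopy belief propagation sketch for the 3x3 grid above, using NumPy; the node potentials φ and the pairwise smoothness potential ψ below are made-up values, and only the message and belief updates follow the formulas in the slides.

```python
# Loopy belief propagation on a 3x3 pairwise MRF with S states per node.
import numpy as np

rng = np.random.default_rng(0)
H, W, S = 3, 3, 2                                    # grid height, width, states

nodes = [(r, c) for r in range(H) for c in range(W)]
def neighbors(n):
    r, c = n
    return [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < H and 0 <= c + dc < W]

phi = {n: rng.random(S) + 0.1 for n in nodes}        # node potentials (made up)
psi = np.array([[2.0, 1.0], [1.0, 2.0]])             # smoothness prior (made up)

# msg[(x, y)] is the message from node x to its neighbor y, initialized uniform.
msg = {(x, y): np.ones(S) / S for x in nodes for y in neighbors(x)}

for _ in range(30):
    new = {}
    for x in nodes:
        for y in neighbors(x):
            # m_{x->y}(i_y) = sum_{i_x} phi_x(i_x) psi(i_x,i_y) prod_{z!=y} m_{z->x}(i_x)
            prod = phi[x].copy()
            for z in neighbors(x):
                if z != y:
                    prod *= msg[(z, x)]
            m = psi.T @ prod
            new[(x, y)] = m / m.sum()                # normalize for stability
    msg = new

# Beliefs: b_x(i_x) proportional to phi_x(i_x) prod_{z in N(x)} m_{z->x}(i_x)
for x in nodes:
    b = phi[x].copy()
    for z in neighbors(x):
        b *= msg[(z, x)]
    b /= b.sum()
    print(x, b, "most likely state:", int(b.argmax()))
```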

Belief Propagation
The algorithm (as a flowchart): start from the potentials φ_x(i_x) and ψ_xy(i_x, i_y); for every node i_x, compute the message m_{x→z}(i_z) for each neighbor z ∈ N(x); if the beliefs b_x(i_x) have not converged, repeat; once they converge, compute b_x(i_x) and output the most likely state for every node i_x.

Application: Learning-Based Image Super Resolution
Extrapolate higher-resolution images from low-resolution inputs. The basic assumption: there are correlations between low-frequency and high-frequency information. A node corresponds to an image patch; φ_x(x_p) is the probability of the high-frequency content given the observed low-frequency content, and ψ_xy(x_p, x_q) is the smoothness prior between neighboring patches.

Image Super Resolution
(Figure: (a) images from a "generic" example set; (b) input, magnified 4x; (c) cubic spline interpolation; (d) super-resolution result; (e) actual full-resolution image.)

Conclusion
A graphical model is a graphical representation of the probabilistic structure of a set of random variables, along with functions that can be used to derive the joint probability distribution. It provides an intuitive interface for modeling, it is modular (a useful tool for managing complexity), and it is a common formalism for many models.

References
Kevin Murphy, An Introduction to Graphical Models, Technical Report, May 2001.
M. I. Jordan, Learning in Graphical Models, MIT Press, 1999.
Yijuan Lu, Introduction to Graphical Models, http://www.cs.utsa.edu/~danlo/teaching/cs7123/fall2005/lyijuan.ppt.
Milos Hauskrecht, Probabilistic Graphical Models, http://www.cs.pitt.edu/~milos/courses/cs3710/lectures/class3.pdf.
P. Smyth, Belief networks, hidden Markov models, and Markov random fields: a unifying view, Pattern Recognition Letters, 1998.
F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, Factor graphs and the sum-product algorithm, IEEE Transactions on Information Theory, February 2001.
J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding Belief Propagation and Its Generalizations, IJCAI 2001 Distinguished Lecture track.
William T. Freeman, Thouis R. Jones, and Egon C. Pasztor, Example-based super-resolution, IEEE Computer Graphics and Applications, March/April 2002.
W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, Learning Low-Level Vision, International Journal of Computer Vision, 40(1), pp. 25-47, 2000.