Undirected Graphical Models

1 Undirected Graphical Models Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012)

2 Outline 1. Introduction 2. Properties 3. Conditional Random Fields (Generative vs. Conditional Models; MEMM and Label Bias)

3 Graphical Models: A Few Definitions Nodes (vertices) + links (arcs, edges). Node: a random variable. Link: a probabilistic relationship. Directed graphical models, or Bayesian networks, are useful for expressing causal relationships between variables. Undirected graphical models, or Markov random fields, are useful for expressing soft constraints between variables. Factor graphs are convenient for solving inference problems.

4 The Family of Graphical Models

5 Main References and Readings Section 8.3 in Pattern Recognition and Machine Learning. J. Lafferty, A. McCallum and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML 2001 (Test-of-Time Award, 2011). A. Ng and M. Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. NIPS 2001. B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. NIPS 2003. D. McAllester, T. Hazan and J. Keshet. Direct Loss Minimization for Structured Output Learning. NIPS 2010.

6 Undirected Graphical Models Also called Markov Random Fields or Markov Networks. Consist of nodes which correspond to variables or groups of variables. Links within the graph do not carry arrows. Conditional independence is determined by simple graph separation.

7 Conditional Independence Conditional independence properties simplify both the model structure and the computations needed to perform inference and learning under that model. Let a, b and c be three variables. If p(a | b, c) = p(a | c), then we say that a is conditionally independent of b given c, denoted as a ⊥ b | c.

8 We identify three sets of nodes A, B, and C. To test whether the conditional independence property A ⊥ B | C holds, we consider all possible paths that connect nodes in set A to nodes in set B: if all such paths pass through one or more nodes in set C, then the conditional independence property holds. Testing for conditional independence in undirected graphs is simpler than in directed graphs.

9 Alternative View An alternative way to view the conditional independence test is to imagine removing all nodes in set C from the graph together with any links that connect to those nodes. We then ask if there exists a path that connects any node in A to any node in B. If there are no such paths, then the conditional independence property must hold.
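
As a concrete illustration of this test, here is a minimal Python sketch (not from the slides; the graph, set names and function name `separated` are illustrative) that removes the nodes in C and then checks by breadth-first search whether any node in A can still reach a node in B:

```python
from collections import deque

def separated(adj, A, B, C):
    """Return True if A is separated from B by C: remove the nodes
    in C and check that no path from A to B survives."""
    blocked = set(C)
    frontier = deque(set(A) - blocked)
    visited = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in B:
            return False              # found an A-to-B path avoiding C
        for nbr in adj[node]:
            if nbr not in blocked and nbr not in visited:
                visited.add(nbr)
                frontier.append(nbr)
    return True                       # every A-to-B path passes through C

# Example: the chain a - c - b, where a ⊥ b | {c} holds.
adj = {'a': {'c'}, 'c': {'a', 'b'}, 'b': {'c'}}
print(separated(adj, {'a'}, {'b'}, {'c'}))   # True
print(separated(adj, {'a'}, {'b'}, set()))   # False
```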

10 Markov Blanket The Markov blanket for an undirected graph takes a particularly simple form, because a node will be conditionally independent of all other nodes conditioned only on the neighboring nodes. Thus, the Markov blanket of a node simply consists of the set of all neighboring nodes.

11 Consideration If we consider two nodes (variables) x_i and x_j that are not connected by a link, then they must be conditionally independent given all other nodes in the graph. Corresponding conditional independence property: p(x_i, x_j | x_\{i,j}) = p(x_i | x_\{i,j}) p(x_j | x_\{i,j}), where x_\{i,j} denotes the set x of all variables with x_i and x_j removed. The factorization of the joint distribution must be such that x_i and x_j do not appear in the same factor, in order for the conditional independence property to hold for all possible distributions belonging to the graph.

12 Cliques A clique is a subset of the nodes in a graph such that there exists a link between all pairs of nodes in the subset, i.e., it is a fully connected or complete subgraph. A maximal clique is a clique such that it is not possible to include in the set any other nodes from the graph without it ceasing to be a clique. Figure: A four-node undirected graph (nodes x_1, x_2, x_3, x_4) showing a clique (outlined in green) and a maximal clique (outlined in blue).

13 Factorization Based on Cliques We can define the factors in the decomposition of the joint distribution to be functions of the variables in the maximal cliques. Let C denote a maximal clique and x_C the set of variables in C. Joint distribution: p(x) = (1/Z) ∏_C ψ_C(x_C) (1) which is a product of potential functions ψ_C(x_C) over the maximal cliques of the graph. The partition function Z is a normalization constant given by Z = ∑_x ∏_C ψ_C(x_C) to ensure that p(x) is a probability distribution.
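
To make the factorization concrete, here is a small brute-force sketch (assumptions: binary ±1 variables and hand-picked potentials on the two maximal cliques {x_1, x_2, x_3} and {x_2, x_3, x_4} of the four-node graph above; the numbers are illustrative only):

```python
import itertools
import math

def psi_123(x1, x2, x3):        # potential on maximal clique {x1, x2, x3}
    return math.exp(0.5 * (x1 * x2 + x2 * x3 + x1 * x3))

def psi_234(x2, x3, x4):        # potential on maximal clique {x2, x3, x4}
    return math.exp(0.3 * (x2 * x3 + x3 * x4 + x2 * x4))

def unnorm(x):                  # product of clique potentials
    x1, x2, x3, x4 = x
    return psi_123(x1, x2, x3) * psi_234(x2, x3, x4)

states = list(itertools.product([-1, +1], repeat=4))
Z = sum(unnorm(x) for x in states)          # partition function
p = {x: unnorm(x) / Z for x in states}      # normalized joint p(x)
assert abs(sum(p.values()) - 1.0) < 1e-12
```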

14 Potential Functions To ensure that p(x) ≥ 0, we consider only potential functions s.t. ψ_C(x_C) ≥ 0. Unlike directed graphs, in which each factor represents the conditional distribution of the corresponding variable conditioned on the state of its parents, here we do not restrict the choice of potential functions to those that have a specific probabilistic interpretation as marginal or conditional distributions. Due to the generality of the potential functions, their product will in general not be correctly normalized, and hence the partition function has to be introduced as a normalization factor.

15 Partition Function The need for the partition function as a normalization constant is one of the major limitations of undirected graphs: if a graph has M discrete nodes each with K states, then evaluation of the partition function involves summing over K^M states, which is exponential in the size of the graph. The partition function is needed for parameter learning because it is a function of any parameters that govern the potential functions ψ_C(x_C). However, there are situations when evaluation of the partition function is not needed, e.g., it is not needed for evaluating local conditional distributions, because a conditional distribution is the ratio of two marginal distributions and the partition function cancels between numerator and denominator when evaluating the ratio.
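
A short sketch of this cancellation, reusing the illustrative potentials above (assumptions: binary variables; the function names are hypothetical): the local conditional of x_1 involves only the clique potential that contains x_1, and Z is never evaluated.

```python
import math

def psi_123(x1, x2, x3):        # the only potential involving x1
    return math.exp(0.5 * (x1 * x2 + x2 * x3 + x1 * x3))

def conditional_x1(x2, x3):
    # psi_234 does not contain x1, so it cancels in the ratio of
    # marginals; likewise Z cancels and is never computed.
    scores = {v: psi_123(v, x2, x3) for v in (-1, +1)}
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

print(conditional_x1(+1, +1))   # p(x1 | x2 = +1, x3 = +1)
```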

16 Conditional Independence and Factorization We need to restrict attention to potential functions ψ_C(x_C) that are strictly positive, i.e., ψ_C(x_C) > 0. Given an undirected graph whose nodes correspond to a fixed set of variables, the Hammersley-Clifford theorem states that the following two sets of distributions are identical: The set of distributions that are consistent with the set of conditional independence statements that can be read from the graph using graph separation. The set of distributions that can be expressed as a factorization of the form (1) w.r.t. the maximal cliques of the graph.

17 Exponential Representation of Potential Functions Since ψ_C(x_C) > 0, it is convenient to express them as exponentials: ψ_C(x_C) = exp{−E(x_C)}, where E(x_C) is called an energy function and the exponential representation is called the Boltzmann distribution. The joint distribution is defined as the product of potentials, and so the total energy is the sum of the energies of the maximal cliques.
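
A tiny sketch of this correspondence (toy pairwise energy; the values are illustrative): the product of potentials exp{−E(x_C)} over cliques equals the exponential of minus the summed energies.

```python
import math

def energy(xc):                 # toy energy on a pairwise clique
    xi, xj = xc
    return -1.0 * xi * xj       # low energy when xi and xj agree

def psi(xc):                    # Boltzmann form of the potential
    return math.exp(-energy(xc))

cliques = [(+1, +1), (+1, -1)]
total_energy = sum(energy(c) for c in cliques)
product_of_potentials = math.prod(psi(c) for c in cliques)
assert abs(product_of_potentials - math.exp(-total_energy)) < 1e-12
```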

18 Illustrative Example Original binary image (left) and corrupted image after randomly changing 10% of the pixels (right). Restored images obtained using iterated conditional modes (left), which gives a locally optimal solution, and using the graph cut algorithm (right), which guarantees a globally optimal solution.

19 Illustrative Example (2) An undirected graphical model representing a Markov random field for image denoising, with latent nodes x_i and observed nodes y_i. x_i ∈ {−1, +1} denotes the state of pixel i in the unknown noise-free image and y_i ∈ {−1, +1} denotes the corresponding value of pixel i in the observed noisy image. Two types of strong correlation give two types of cliques: Cliques of the form {x_i, y_i} with energy function −η x_i y_i for some constant η > 0. Cliques of the form {x_i, x_j} for neighboring pixels i and j with energy function −β x_i x_j for some constant β > 0.

20 Illustrative Example (3) Because a potential function is an arbitrary, nonnegative function over a maximal clique, we can multiply it by any nonnegative function of subsets of the clique, or equivalently add the corresponding energies. This corresponds to adding an extra term h x_i for each pixel i in the noise-free image. Such a term has the effect of biasing the model towards pixel values that have one particular sign in preference to the other. (If h = 0, the probabilities of the two states of x_i are equal.) Complete energy function for the model: E(x, y) = h ∑_i x_i − β ∑_{i,j} x_i x_j − η ∑_i x_i y_i Corresponding joint distribution over x and y: p(x, y) = (1/Z) exp{−E(x, y)}
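
The energy above translates directly into code. A minimal sketch (assumptions: NumPy arrays of ±1 pixels; the parameter values h = 0, β = 1.0, η = 2.1 are illustrative choices, with the pairwise term summed over horizontally and vertically adjacent pixels):

```python
import numpy as np

def energy(x, y, h=0.0, beta=1.0, eta=2.1):
    """E(x, y) = h*sum_i x_i - beta*sum_{i,j} x_i x_j - eta*sum_i x_i y_i,
    with {i, j} ranging over 4-neighbor pixel pairs."""
    pair = (x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum()
    return h * x.sum() - beta * pair - eta * (x * y).sum()

y = np.ones((8, 8), dtype=int)   # toy "observed" image, all +1
print(energy(y, y))              # energy of the guess x = y
```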

21 Illustrative Example (4) Image denoising corresponds to finding an image x that maximizes the conditional distribution p(x | y), with y fixed to the observed noisy image. In practice it may not be possible to find the x with maximum probability, but one with a sufficiently high probability can often be found. Different iterative optimization algorithms may be used, starting from some initial value of x, e.g., setting x to y.
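
As one concrete option, here is a minimal sketch of iterated conditional modes (ICM) for this model (assumptions: h = 0 and the illustrative β, η from the previous sketch; this is the locally optimal method mentioned earlier, not the graph-cut algorithm): each pixel is repeatedly set to whichever of ±1 gives the lower energy with all other pixels held fixed.

```python
import numpy as np

def icm_denoise(y, beta=1.0, eta=2.1, sweeps=10):
    x = y.copy()                                  # initialize x to y
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                nbr = 0                           # sum over the 4-neighborhood
                if i > 0:     nbr += x[i - 1, j]
                if i < H - 1: nbr += x[i + 1, j]
                if j > 0:     nbr += x[i, j - 1]
                if j < W - 1: nbr += x[i, j + 1]
                # The terms of E involving x_ij are -(beta*nbr + eta*y_ij)*x_ij,
                # so the energy-minimizing state is the sign of beta*nbr + eta*y_ij.
                x[i, j] = 1 if beta * nbr + eta * y[i, j] >= 0 else -1
    return x

rng = np.random.default_rng(0)
clean = np.ones((16, 16), dtype=int)
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)  # flip 10%
print((icm_denoise(noisy) == clean).mean())       # fraction of pixels restored
```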

22 Directed-to-Undirected Graph Conversion: Simple Case Given a directed graph, we convert it into an undirected graph, e.g., a directed chain x_1 → x_2 → … → x_{N−1} → x_N into the corresponding undirected chain x_1 − x_2 − … − x_{N−1} − x_N. Joint distribution of the directed graph: p(x) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) … p(x_N | x_{N−1}) Joint distribution of the undirected graph: p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) … ψ_{N−1,N}(x_{N−1}, x_N)

23 Directed-to-Undirected Graph Conversion: Simple Case (2) From the two joint distributions we can identify the following equivalence relationships: ψ_{1,2}(x_1, x_2) = p(x_1) p(x_2 | x_1), ψ_{2,3}(x_2, x_3) = p(x_3 | x_2), …, ψ_{N−1,N}(x_{N−1}, x_N) = p(x_N | x_{N−1}). Note that the partition function Z = ∑_x ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) … ψ_{N−1,N}(x_{N−1}, x_N) = ∑_x p(x_1) p(x_2 | x_1) p(x_3 | x_2) … p(x_N | x_{N−1}) = 1

24 Directed-to-Undirected Graph Conversion: General Case Conversion can be achieved if the clique potentials of the undirected graph are given by the conditional distributions of the directed graph. This requires that the set of variables that appears in each of the conditional distributions be a member of at least one clique of the undirected graph. If a node in the directed graph has: One parent: this can be achieved simply by replacing the directed link with an undirected link. Multiple parents: we also need to add extra links between all pairs of parents (thus discarding some conditional independence properties).

25 Directed-to-Undirected Graph Conversion: General Case (2) Consider a directed graph in which node x_4 has parents x_1, x_2 and x_3, and its moralized undirected version. The factor p(x_4 | x_1, x_2, x_3) in the directed graph involves the four variables x_1, x_2, x_3, and x_4, so they must all belong to a single clique in the undirected graph if this conditional distribution is to be absorbed into a clique potential. The process of adding extra links between the parents ("marrying the parents") is known as moralization.

26 Chain Graph The graphical framework can be extended in a consistent way to graphs that include both directed and undirected links, called chain graphs. Directed graphs and undirected graphs can be considered as special cases of chain graphs.

27 Conditional Random Fields (CRFs) Like a Markov random field, a conditional random field (CRF) is an undirected graphical model. Unlike a Markov random field, the distribution of each discrete variable Y in the graph is conditioned on an input sequence X. A CRF is a type of discriminative probabilistic model often used for labeling or segmenting sequential data, such as natural language text or biological sequences. A CRF is a generalization of a hidden Markov model (HMM) that makes the constant state transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.

28 Generative Models Hidden Markov models (HMMs) and stochastic grammars: Assign a joint probability to paired observation and label sequences. The parameters are typically trained to maximize the joint likelihood of training examples.

29 Generative Models (2) Difficulties and disadvantages: Need to enumerate all possible observation sequences. Not practical to represent multiple interacting features or long-range dependencies of the observations. Very strict independence assumptions on the observations.

30 Conditional Models Conditional probability P(label sequence y | observation sequence x) rather than joint probability P(y, x): Specify the probability of possible label sequences given an observation sequence. Allow arbitrary, non-independent features on the observation sequence x; the probability of a transition between labels may depend on past and future observations. Relax the strong independence assumptions made in generative models.

31 Maximum Entropy Markov Models (MEMMs) Given a training set X with label sequence Y: Train a model θ that maximizes P(Y | X, θ). For a new data sequence x, the predicted label sequence y maximizes P(y | x, θ). Notice the per-state normalization.

32 MEMMs (2) MEMMs have all the advantages of conditional models. Per-state normalization: all the mass that arrives at a state must be distributed among the possible successor states ("conservation of score mass"). Subject to the label bias problem: bias toward states with fewer outgoing transitions, due to per-state normalization.

33 Label Bias Problem of MEMMs Since P(2 | 1, x) = 1 and P(5 | 4, x) = 1 for all x (per-state normalization), we have P(1, 2 | r, i) = P(1 | r) P(2 | 1, i) = P(1 | r) and P(4, 5 | r, i) = P(4 | r) P(5 | 4, i) = P(4 | r). The probability does not depend on the second observation. If one path occurs slightly more often in training, it always wins in testing.
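
A toy numeric demonstration of this effect (made-up scores; the helper name is hypothetical): under per-state normalization, a state with a single successor assigns it probability 1 no matter what the observation contributes.

```python
import math

def per_state_normalize(raw_scores):
    """Normalize the raw transition scores out of one state into probabilities."""
    z = sum(math.exp(s) for s in raw_scores.values())
    return {k: math.exp(s) / z for k, s in raw_scores.items()}

# State 1 has the single successor 2: whatever score the observation
# induces, normalization forces P(2 | 1, obs) = 1.
for obs_score in (-3.0, 0.0, 5.0):
    print(per_state_normalize({2: obs_score}))    # always {2: 1.0}
```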

34 Solving the Label Bias Problem Change the state-transition structure of the model: not always practical to change the set of states. Start with a fully-connected model and let the training procedure figure out a good structure: precludes the use of prior structural knowledge, which is very valuable (e.g., in information extraction).

35 Conditional Random Fields (CRFs) CRFs have all the advantages of MEMMs without the label bias problem: An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state. A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence. Undirected graph. Allows some transitions to vote more strongly than others, depending on the corresponding observations.

36 Definition of CRFs X: random variable over data sequences to be labeled. Y: random variable over corresponding label sequences. Definition: Let G = (V, E) be a graph such that Y = (Y_v)_{v ∈ V}, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field in case, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ∼ v), where w ∼ v means that w and v are neighbors in G.

37 Example of CRFs Suppose p(Y_v | X, all other Y) = p(Y_v | X, neighbors of Y_v); then X with Y is a conditional random field. For instance, p(Y_3 | X, all other Y) = p(Y_3 | X, Y_2, Y_4). Think of X as observations and Y as labels.

38 Graphical Comparison Among HMMs, MEMMs and CRFs Graphical structures of a simple HMM (left), an MEMM (middle) and a chain-structured CRF (right). Open circles indicate that the variables are not generated by the model.

39 Conditional Distribution If the graph G = (V, E) of Y is a chain, the conditional distribution over the label sequence y given x is: p_θ(y | x) = (1/Z(x)) exp( ∑_{e ∈ E, k} λ_k f_k(e, y|_e, x) + ∑_{v ∈ V, k} μ_k g_k(v, y|_v, x) ) f_k and g_k are given and fixed; g_k is a Boolean vertex feature, while f_k is a Boolean edge feature. k: index over the features. θ = (λ_1, …; μ_1, …): λ_k and μ_k are parameters to be estimated. y|_e: the set of components of y defined by edge e. y|_v: the set of components of y defined by vertex v. Z(x): normalization over the data sequence x.
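
To make the chain case concrete, here is a minimal sketch (illustrative stand-ins, not Lafferty et al.'s features: the vertex terms ∑_k μ_k g_k and edge terms ∑_k λ_k f_k are folded into random `emit` and `trans` score tables) that evaluates the unnormalized log score of a label sequence and computes log Z(x) with the forward recursion in log space:

```python
import numpy as np
from scipy.special import logsumexp

K, T = 3, 5                             # number of labels, sequence length
rng = np.random.default_rng(0)
emit = rng.normal(size=(T, K))          # vertex scores: sum_k mu_k g_k(v, y_v, x)
trans = rng.normal(size=(K, K))         # edge scores: sum_k lambda_k f_k(e, y_e, x)

def log_score(y):
    """Unnormalized log score of label sequence y given x."""
    s = emit[0, y[0]]
    for t in range(1, T):
        s += trans[y[t - 1], y[t]] + emit[t, y[t]]
    return s

def log_Z():
    """log Z(x) via the forward algorithm, O(T * K^2)."""
    alpha = emit[0].copy()
    for t in range(1, T):
        alpha = emit[t] + logsumexp(alpha[:, None] + trans, axis=0)
    return logsumexp(alpha)

y = [0, 1, 2, 1, 0]
print(np.exp(log_score(y) - log_Z()))   # p_theta(y | x)
```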

40 Parameter Estimation for CRFs Lafferty et al. presented iterative scaling algorithms, but these are very inefficient. The log-likelihood of a labeled example is: log p_θ(y | x) = ∑_{e ∈ E, k} λ_k f_k(e, y|_e, x) + ∑_{v ∈ V, k} μ_k g_k(v, y|_v, x) − log Z(x) More efficient learning algorithms, e.g., L-BFGS with an approximate Hessian, use the gradient: ∂/∂θ log p_θ(y | x) = ∂/∂θ [ ∑_{e ∈ E, k} λ_k f_k(e, y|_e, x) + ∑_{v ∈ V, k} μ_k g_k(v, y|_v, x) ] − ∂/∂θ log Z(x) Depending on the graph structure, log Z(x) and its derivative can be hard to compute. Other optimization algorithms also apply. Note: standard MCLE over-fits; 2-norm regularization saves a lot!
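
The gradient that such optimizers need has the familiar "observed minus expected feature counts" form: ∂/∂λ_k log p_θ(y | x) = f_k(y) − E_{y′ ∼ p_θ}[f_k(y′)]. Here is a brute-force sketch for a tiny chain (assumption: the expectation is computed by enumerating all label sequences, feasible only at toy sizes; real implementations use forward-backward instead):

```python
import itertools
import numpy as np
from scipy.special import logsumexp

K, T = 2, 3
rng = np.random.default_rng(1)
emit = rng.normal(size=(T, K))          # vertex scores, as in the previous sketch
trans = rng.normal(size=(K, K))         # edge scores (one weight per label pair)

def log_score(y):
    s = emit[0, y[0]]
    for t in range(1, T):
        s += trans[y[t - 1], y[t]] + emit[t, y[t]]
    return s

def transition_counts(y):               # feature counts for the edge weights
    c = np.zeros((K, K))
    for t in range(1, T):
        c[y[t - 1], y[t]] += 1
    return c

all_y = list(itertools.product(range(K), repeat=T))
logZ = logsumexp([log_score(y) for y in all_y])

y_obs = (0, 1, 0)
expected = sum(np.exp(log_score(y) - logZ) * transition_counts(y) for y in all_y)
grad = transition_counts(y_obs) - expected   # d log p(y_obs | x) / d trans
print(grad)
```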

41 Summary Per-state normalized discriminative models such as MEMMs are prone to the label bias problem. CRFs provide the benefits of discriminative models. CRFs solve the label bias problem well and demonstrate good performance.
