Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI


1 Generative and Discriminative Approaches to Graphical Models. CMSC Topics in AI, Lecture 2. Yasemin Altun, January 26, 2007.

2 Review of Inference on Graphical Models. The elimination algorithm finds a single marginal probability p(x_f) by choosing an elimination ordering I such that f is the last node and keeping track of a list of active potentials. To eliminate a node i: remove all active potentials referencing i, take their product, sum over x_i, and put the resulting message on the active list. For trees with multiple queries this becomes a special case of the Junction Tree (Sum-Product) algorithm: choose a root node arbitrarily; a natural elimination order is then a depth-first traversal of the tree.

3 Inference on Trees for a Single Query p(x_r) at the root. [Figure: elimination of node j, which sends the message m_ji(x_i) to its neighbour i after collecting the messages m_kj(x_j) and m_lj(x_j) from its children k and l.] If j is an evidence node, φ^E(x_j) = 1[x_j = x̄_j]; otherwise φ^E(x_j) = 1. Eliminating node j produces the message

m_ji(x_i) = Σ_{x_j} φ^E(x_j) φ(x_i, x_j) Π_{k ∈ c(j)} m_kj(x_j)

The marginal of the root x_r is proportional to the product of its incoming messages:

p(x_r | x̄_E) ∝ φ^E(x_r) Π_{k ∈ c(r)} m_kr(x_r)
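As a concrete illustration (not part of the original slides), this message recursion can be written for tabular potentials on a small tree. It is a minimal sketch of sum-product collection toward a root; the function and variable names, the toy chain, and its potential tables are all assumptions made for the example.

import numpy as np

def collect_message(j, i, children, node_pot, edge_pot):
    # m_{ji}(x_i) = sum_{x_j} phi^E(x_j) phi(x_i, x_j) prod_{k in c(j)} m_{kj}(x_j)
    # children[j]: neighbours of j other than i (tree rooted at the query node)
    # node_pot[j]: length-K vector (evidence indicator, or all ones if unobserved)
    # edge_pot[(i, j)]: K x K table, rows indexed by x_i, columns by x_j
    incoming = np.ones_like(node_pot[j])
    for k in children[j]:
        incoming *= collect_message(k, j, children, node_pot, edge_pot)
    return edge_pot[(i, j)] @ (node_pot[j] * incoming)

# Toy chain 1 - 2 - 3 over binary variables; node 3 is observed to take value 0.
node_pot = {1: np.ones(2), 2: np.ones(2), 3: np.array([1.0, 0.0])}
edge_pot = {(1, 2): np.array([[0.9, 0.1], [0.1, 0.9]]),
            (2, 3): np.array([[0.8, 0.2], [0.2, 0.8]])}
children = {2: [3], 3: []}
belief = node_pot[1] * collect_message(2, 1, children, node_pot, edge_pot)
p_root = belief / belief.sum()   # p(x_1 | evidence), after normalization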

4 Inference on Trees for Multiple Queries. [Figure: messages m_12, m_32, m_42, m_23, m_24 on a four-node tree with nodes X_1, X_2, X_3, X_4.]
(a) Choosing node 1 as the root, node 2 has all the messages from the nodes below it.
(b) To compute p(x_2), the message m_12(x_2) is also needed.
(c), (d) One pass from the leaves to the root and one backward pass from the root to the leaves give all the messages required to compute every node marginal.

5 Inference on HMMs. [Figure: the HMM graphical model, with hidden states y_{i-1}, y_i, y_{i+1} and observations x_{i-1}, x_i, x_{i+1}.] Hidden Markov Models are directed sequence models. Let x = (x_1, ..., x_l) and y = (y_1, ..., y_l) be sequences, where x_i ∈ O, the set of possible observations, and y_i ∈ Y, the set of possible labels.

π_σ = p(y_1 = σ; θ): initial state probabilities
T_{σ,σ'} = p(y_{i+1} = σ | y_i = σ'; θ): transition probabilities, for σ, σ' ∈ Y
O_{σ,u} = p(x_i = u | y_i = σ; θ): observation probabilities, for σ ∈ Y, u ∈ O

P(x, y; θ) = π_{y_1} Π_{i=1}^{l} O_{y_i, x_i} Π_{i=1}^{l-1} T_{y_{i+1}, y_i}

6 Forward-Backward Algorithm for HMMs. Set y_l as the root; the x are evidence.

Forward pass: α_i(σ) is the message m_{(i-1)i}(y_i = σ),
α_i(σ) = p(x̄_{1:i}, y_i = σ) = ( Σ_{σ'∈Y} α_{i-1}(σ') T_{σ,σ'} ) O_{σ, x̄_i}

Backward pass:
β_i(σ) = p(x̄_{i+1:l} | y_i = σ) = Σ_{σ'∈Y} T_{σ',σ} O_{σ', x̄_{i+1}} β_{i+1}(σ')

Combining messages:
p(y_i = σ | x̄_{1:l}) = p(x̄_{1:l} | y_i = σ) p(y_i = σ) / p(x̄_{1:l}) ∝ p(x̄_{1:i} | y_i = σ) p(x̄_{i+1:l} | y_i = σ) p(y_i = σ) = α_i(σ) β_i(σ)
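A minimal sketch of these recursions for a discrete HMM (not from the lecture; the array layout, with T[s, s'] = p(y_{i+1} = s | y_i = s') and O[s, u] = p(x_i = u | y_i = s), and the function name are assumptions):

import numpy as np

def forward_backward(pi, T, O, x):
    # Unnormalized forward/backward messages; x is a sequence of symbol indices.
    # Returns the posterior marginals p(y_i | x_{1:l}).
    l, K = len(x), len(pi)
    alpha = np.zeros((l, K))
    beta = np.ones((l, K))
    alpha[0] = pi * O[:, x[0]]                          # alpha_1(s) = pi_s O_{s, x_1}
    for i in range(1, l):
        alpha[i] = (T @ alpha[i - 1]) * O[:, x[i]]      # sum over previous states s'
    for i in range(l - 2, -1, -1):
        beta[i] = T.T @ (O[:, x[i + 1]] * beta[i + 1])  # sum over next states s'
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)     # p(y_i = s | x), row-normalized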

7 Generalization to the Junction Tree Algorithm. Data structure: build a tree whose nodes are the maximal cliques. Then we get the clique marginals p(x_C) (conditioned on any evidence) using sum-product, and we can marginalize within a clique to get individual marginal probabilities. Marginalizing over different cliques containing node i should yield the same marginal for x_i. There are multiple possible clique trees, so we need a data structure that guarantees global consistency via local consistency. A junction tree is a clique tree in which, for each pair of cliques C_i, C_j sharing some variables C_s, C_s is contained in every clique on the path between C_i and C_j. A graph has a junction tree iff the graph is triangulated; a triangulation can be obtained from an elimination ordering. Message-Passing Protocol: a clique C_i can send a message to a neighbour C_j once it has received messages from all its other neighbours.

8 Junction Tree Algorithm.
Moralize the graph if it is directed.
Triangulate the graph G (e.g., using the variable elimination algorithm).
Construct a junction tree J from the triangulated graph via a maximum-weight spanning tree over the cliques.
Initialize the potential of each clique C of J as the product of all factors of the model assigned to C.
Propagate local messages on the junction tree.

9 Maximizing instead of Summing: finding maximum-probability configurations. [Figure: the six-node example graph X_1, ..., X_6.] Elimination and Belief Propagation both summed over all possible values of the marginal (non-query, non-evidence) nodes to get a marginal probability. What if we instead wanted to maximize over the non-query, non-evidence nodes, i.e. find the probability of the single best setting consistent with any query and evidence? Multiplication distributes over max just as it does over sum: max(ab, ac) = a max(b, c). So we can do the MAP computation with exactly the same algorithm, replacing sums with max:

max_x p(x) = max_{x_1} max_{x_2} max_{x_3} max_{x_4} max_{x_5} p(x_1) p(x_2|x_1) p(x_3|x_1) p(x_4|x_2) p(x_5|x_3) p(x_6|x_2, x_5)
           = max_{x_1} p(x_1) max_{x_2} p(x_2|x_1) max_{x_3} p(x_3|x_1) max_{x_4} p(x_4|x_2) max_{x_5} p(x_5|x_3) p(x_6|x_2, x_5)

This is known as the maximum a-posteriori (MAP) configuration. It turns out that, on trees, we can use an algorithm exactly like belief propagation to solve this problem.
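For the HMM of slide 5, replacing sums with maxes in the forward pass gives the familiar Viterbi recursion. The sketch below is my own illustration, with the same assumed (pi, T, O) layout as in the earlier sketch; it works in log space to avoid underflow and assumes all entries of pi, T, O are nonzero.

import numpy as np

def viterbi(pi, T, O, x):
    # delta_i(s) = max over y_{1:i-1} of log p(x_{1:i}, y_{1:i-1}, y_i = s)
    l, K = len(x), len(pi)
    delta = np.zeros((l, K))
    back = np.zeros((l, K), dtype=int)        # argmax bookkeeping for the traceback
    delta[0] = np.log(pi) + np.log(O[:, x[0]])
    for i in range(1, l):
        scores = np.log(T) + delta[i - 1]     # scores[s_new, s_old]
        back[i] = scores.argmax(axis=1)
        delta[i] = scores.max(axis=1) + np.log(O[:, x[i]])
    y = [int(delta[-1].argmax())]
    for i in range(l - 1, 0, -1):
        y.append(int(back[i, y[-1]]))
    return y[::-1]                            # MAP label sequence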

10 Learning. Parameter Learning: given the graph structure, how do we get the conditional probability distributions P_θ(X_i | X_{π_i})? The parameters θ are unknown constants. Given fully observed training data D, find their estimates by maximizing the (penalized) log-likelihood:

θ̂ = argmax_θ log P(D | θ) (− λ r(θ))

Topics: parameter estimation with fully observed data; parameter estimation with latent variables, via Expectation Maximization (EM). Structure Learning: given fully observed training data D, how do we get the graph G and its parameters θ? (Covered after studying the discriminative framework.)

11 Parameter Estimation with Complete Observations. Random variable X = (X_1, ..., X_n); the data D = (x^1, ..., x^m) is a set of m IID observations. The maximum likelihood estimate (MLE) is found by maximizing the log probability of the data. In Bayes nets,

l(θ; D) = log p(D | θ) = log Π_{j=1}^{m} p(x^j | θ) = Σ_{j=1}^{m} Σ_{i=1}^{n} log p(x^j_i | x^j_{π_i}, θ_i)

Consider X_i to be a discrete rv with K possible values. Then p(x_i | x_{π_i}, θ_i) = θ_i(x_i, x_{π_i}) is a multinomial distribution such that Σ_{x_i} θ_i(x_i, x_{π_i}) = 1, and

l(θ; D) = Σ_i Σ_{x_i, x_{π_i}} N(x_i, x_{π_i}) log θ_i(x_i, x_{π_i})

where N(x_i, x_{π_i}) = Σ_{j=1}^{m} 1[x^j_{i,π_i} = (x_i, x_{π_i})] is the observed count of the assignment (x_i, x_{π_i}) in the data.

12 Parameter Estimation with Complete Observations (continued).

l(θ; D) = Σ_i Σ_{x_i, x_{π_i}} N(x_i, x_{π_i}) log θ_i(x_i, x_{π_i})

The estimation of θ_i is independent of θ_k for k ≠ i. To estimate θ_i, ignore the data associated with nodes other than i and π_i, and maximize Σ_{x_i, x_{π_i}} N(x_i, x_{π_i}) log θ_i(x_i, x_{π_i}) with respect to θ_i. Adding a Lagrangian term for the normalization constraint and solving for θ_i shows that the ML estimate is the relative frequency:

θ̂_i(x_i, x_{π_i}) = N(x_i, x_{π_i}) / N(x_{π_i})
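A small sketch of this relative-frequency estimate for one conditional table (my own illustration; the data format, a list of dicts mapping node names to values, and the function name are assumptions):

from collections import Counter

def mle_cpt(data, child, parents):
    # theta_hat(x_child, x_parents) = N(x_child, x_parents) / N(x_parents)
    joint = Counter((tuple(x[p] for p in parents), x[child]) for x in data)
    parent_count = Counter(tuple(x[p] for p in parents) for x in data)
    return {(pa, c): n / parent_count[pa] for (pa, c), n in joint.items()}

# Example: estimate p(X2 | X1) from three fully observed samples.
data = [{"X1": 0, "X2": 1}, {"X1": 0, "X2": 0}, {"X1": 1, "X2": 1}]
cpt = mle_cpt(data, child="X2", parents=["X1"])
# {((0,), 1): 0.5, ((0,), 0): 0.5, ((1,), 1): 1.0}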

13 Parameter Estimation in HMMs. [Figure: the HMM graphical model, as on slide 5.] Recall the HMM: x = (x_1, ..., x_l) and y = (y_1, ..., y_l), with x_i ∈ O, y_i ∈ Y, initial probabilities π_σ = p(y_1 = σ; θ), transition probabilities T_{σ,σ'} = p(y_{i+1} = σ | y_i = σ'; θ), and observation probabilities O_{σ,u} = p(x_i = u | y_i = σ; θ), so that

P(x, y; θ) = π_{y_1} Π_{i=1}^{l} O_{y_i, x_i} Π_{i=1}^{l-1} T_{y_{i+1}, y_i}

14 Parameter Estimation in HMMs. Given m fully labelled sequences (x^j, y^j) of lengths l_j, the MLE estimates are relative frequencies:

π_σ = Σ_{j=1}^{m} 1[y^j_1 = σ] / m

T_{σ,σ'} = Σ_{j=1}^{m} Σ_{i=1}^{l_j − 1} 1[y^j_{i+1} = σ, y^j_i = σ'] / Σ_{j=1}^{m} Σ_{i=1}^{l_j − 1} 1[y^j_i = σ']

O_{σ,u} = Σ_{j=1}^{m} Σ_{i=1}^{l_j} 1[y^j_i = σ, x^j_i = u] / Σ_{j=1}^{m} Σ_{i=1}^{l_j} 1[y^j_i = σ]
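As a sketch (again my illustration, not the lecture's code), these counts can be accumulated in one pass over the labelled sequences; the input format, a list of (x, y) list pairs, and all names are assumptions, and smoothing of unseen events is omitted.

from collections import Counter

def hmm_mle(sequences):
    # Relative-frequency MLE of (pi, T, O) from fully labelled (x, y) sequence pairs.
    init, trans, emit, state = Counter(), Counter(), Counter(), Counter()
    for x, y in sequences:
        init[y[0]] += 1
        for i, (u, s) in enumerate(zip(x, y)):
            emit[(s, u)] += 1                 # count of (y_i = s, x_i = u)
            state[s] += 1                     # count of y_i = s
            if i + 1 < len(y):
                trans[(y[i + 1], s)] += 1     # count of (y_{i+1}, y_i) transitions
    out_of = Counter()                        # number of transitions out of each state
    for (_, s), c in trans.items():
        out_of[s] += c
    pi = {s: c / len(sequences) for s, c in init.items()}
    T = {(s2, s1): c / out_of[s1] for (s2, s1), c in trans.items()}
    O = {(s, u): c / state[s] for (s, u), c in emit.items()}
    return pi, T, O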

15 MLE with Complete Observations for MRFs.

p(x | θ) = Π_C φ_C(x_C) / Z_θ,   where Z_θ = Σ_{x ∈ Y^n} Π_C φ_C(x_C)

In MRFs, Z_θ couples the parameters, so there is no closed-form solution for θ. For exponential-family potential functions φ_C(x_C) = exp(⟨f(x_C), θ_C⟩),

l(θ; D) = Σ_{j=1}^{m} log p(x^j | θ) = Σ_j ( Σ_C ⟨f(x^j_C), θ_C⟩ − log Z_θ ) = Σ_C Σ_{x_C} N(x_C) ⟨f(x_C), θ_C⟩ − m log Z_θ

Take the derivative of l with respect to θ, set it to 0, and solve for θ. The derivative of log Z_θ with respect to θ_C is the expectation E_{x ∼ p_θ}[f(x_C)].
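The gradient is therefore the empirical feature count minus m times the model expectation of the features. The sketch below evaluates that gradient by brute-force enumeration of a tiny model; it is an illustration only, and the feature function, the three-binary-variable toy model, and all names are assumptions (real models need approximate inference instead of enumeration).

import numpy as np
from itertools import product

def mrf_loglik_grad(theta, feats, data):
    # grad l(theta; D) = sum_j f(x^j) - m * E_{x ~ p_theta}[f(x)]
    configs = list(product([0, 1], repeat=3))             # all assignments of 3 binary variables
    scores = np.array([theta @ feats(x) for x in configs])
    p = np.exp(scores - scores.max())
    p /= p.sum()                                          # p_theta(x), normalized explicitly
    model_exp = sum(pi * feats(x) for pi, x in zip(p, configs))
    empirical = sum(feats(x) for x in data)
    return empirical - len(data) * model_exp

# Pairwise features on cliques {1,2} and {2,3}, plus node features.
feats = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0], x[1], x[2]], dtype=float)
grad = mrf_loglik_grad(np.zeros(5), feats, [(1, 1, 0), (0, 1, 1)])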

16 Parameter Estimation with Latent Variables. When there are latent variables, we need to marginalize over them, which couples the parameters. Let X be the observed variables and Z the latent variables:

l(θ; D) = log Σ_z p(x, z | θ)

The Expectation Maximization (EM) algorithm is a general approach to maximum likelihood parameter estimation with latent (hidden) variables. EM can be viewed as a coordinate descent algorithm minimizing a Kullback-Leibler (KL) divergence (equivalently, coordinate ascent on a lower bound of the likelihood).

17 EM Algorithm. Since Z is not observed, the log-likelihood of the data is a marginal over Z, which does not decompose:

l(θ; x) := log p(x | θ) = log Σ_z p(x, z | θ)

Replace this with the expected complete log-likelihood with respect to an averaging distribution q(z | x); this is linear in l_c(θ; x, z) := log p(x, z | θ):

⟨l_c(θ; x, z)⟩_q := Σ_z q(z | x) log p(x, z | θ)

This yields a lower bound on l(θ; x). Using Jensen's inequality,

l(θ; x) = log Σ_z p(x, z | θ) = log Σ_z q(z | x) p(x, z | θ) / q(z | x) ≥ Σ_z q(z | x) log [ p(x, z | θ) / q(z | x) ] =: L(q, θ)

18 EM Algorithm. Until convergence:
E step, maximize with respect to q: q^{t+1} = argmax_q L(q, θ^t)
M step, maximize with respect to θ: θ^{t+1} = argmax_θ L(q^{t+1}, θ)

19 EM Algorithm (continued). Recall the E step q^{t+1} = argmax_q L(q, θ^t) and the M step θ^{t+1} = argmax_θ L(q^{t+1}, θ). For the M step, since the entropy term Σ_z q(z | x) log q(z | x) does not depend on θ,

argmax_θ L(q, θ) = argmax_θ [ Σ_z q(z | x) log p(x, z | θ) − Σ_z q(z | x) log q(z | x) ] = argmax_θ ⟨l_c(θ; x, z)⟩_q

so the M step maximizes the expected complete log-likelihood.

20 EM Algorithm (continued). For the E step, q^{t+1} = argmax_q L(q, θ^t). The gap between the log-likelihood and the bound is a KL divergence:

l(θ; x) − L(q, θ) = Σ_z q(z | x) log p(x | θ) − Σ_z q(z | x) log [ p(x, z | θ) / q(z | x) ] = Σ_z q(z | x) log [ q(z | x) / p(z | x, θ) ] = D( q(z | x) ‖ p(z | x, θ) )

Since l(θ; x) does not depend on q, minimizing l(θ; x) − L(q, θ) over q is equivalent to maximizing L(q, θ). The KL divergence is minimized (it is zero) when q(z | x) = p(z | x, θ^t). Intuitively, given p(x, z | θ), the posterior p(z | x, θ^t) is the best guess for the latent variables conditioned on x. For the M step, θ^{t+1} = argmax_θ L(q^{t+1}, θ). The M step increases a lower bound on the likelihood; the E step closes the gap, so that l(θ^t; x) = L(q^{t+1}, θ^t).

21 EM for HMMs.
E step: get expected counts using the Forward-Backward algorithm, i.e. q^t = p(z | x, θ^{t-1}).
M step: find the MLE estimate with respect to the expected counts (relative frequencies), θ^t = argmax_θ ⟨l_c(θ; x, z)⟩_{q^t}.
A sketch of one such iteration appears below.
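A minimal sketch of one EM (Baum-Welch) iteration, reusing the forward-backward recursions of slide 6 to form the expected counts. The (pi, T, O) array layout and the function name are the same assumptions as in the earlier sketches, and numerical scaling of the messages is omitted for brevity.

import numpy as np

def baum_welch_step(pi, T, O, sequences):
    # One EM iteration for an HMM whose labels are unobserved.
    # E step: posterior marginals via forward-backward; M step: relative frequencies.
    K, V = O.shape
    init_c, trans_c, emit_c = np.zeros(K), np.zeros((K, K)), np.zeros((K, V))
    for x in sequences:
        l = len(x)
        alpha, beta = np.zeros((l, K)), np.ones((l, K))
        alpha[0] = pi * O[:, x[0]]
        for i in range(1, l):
            alpha[i] = (T @ alpha[i - 1]) * O[:, x[i]]
        for i in range(l - 2, -1, -1):
            beta[i] = T.T @ (O[:, x[i + 1]] * beta[i + 1])
        Z = alpha[-1].sum()                                     # p(x_{1:l})
        gamma = alpha * beta / Z                                # gamma[i, s] = p(y_i = s | x)
        init_c += gamma[0]
        for i in range(l):
            emit_c[:, x[i]] += gamma[i]
        for i in range(l - 1):
            # xi[s2, s1] = p(y_{i+1} = s2, y_i = s1 | x)
            xi = T * np.outer(O[:, x[i + 1]] * beta[i + 1], alpha[i]) / Z
            trans_c += xi
    pi_new = init_c / init_c.sum()
    T_new = trans_c / trans_c.sum(axis=0, keepdims=True)        # columns normalized: p(s2 | s1)
    O_new = emit_c / emit_c.sum(axis=1, keepdims=True)
    return pi_new, T_new, O_new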

22 Discriminative Training of HMMs: Maximum Entropy Markov Models (MEMMs) [McCFrePer00]. (The generalization to Bayes Nets is trivial.) [Figure 1: (a) the dependency graph for a traditional HMM; (b) the dependency graph for the conditional maximum entropy Markov model, where each state s_t depends on the previous state s_{t-1} and the observation o_t.]

a) HMM: P(x, y; θ) = π_{y_1} Π_{i=1}^{l} O_{y_i, x_i} Π_{i=1}^{l-1} T_{y_{i+1}, y_i}
b) MEMM: P(y | x; θ) = π_{y_1, x_1} Π_{i=1}^{l-1} T_{y_{i+1}, y_i, x_{i+1}}

23 Inference and Parameter Estimation in MEMMs. Forward-Backward algorithm:

α_i(σ) = Σ_{σ'∈Y} α_{i-1}(σ') T_{σ,σ',x̄_i}
β_i(σ) = Σ_{σ'∈Y} T_{σ',σ,x̄_{i+1}} β_{i+1}(σ')

Representation, for σ, σ' ∈ Y and ū ∈ O:

T_{σ,σ',ū} = (1 / Z(ū, σ')) exp( ⟨θ, f(ū, σ', σ)⟩ ),   Z(ū, σ') = Σ_σ exp( ⟨θ, f(ū, σ', σ)⟩ )

The coupling between the components of θ through Z means there is no closed-form solution; use a convex optimization technique. Note the difference from undirected models, where Z is a sum over all possible joint assignments (Y^n); here it is a sum over Y only.
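A short sketch of this local conditional model (my illustration; the feature function f(u, prev, s) and all names are assumptions). Training would maximize the conditional log-likelihood of these local distributions with a gradient-based convex optimizer.

import numpy as np

def memm_transition(theta, f, u, prev, states):
    # p(y_{i+1} = s | y_i = prev, x_{i+1} = u) proportional to exp(<theta, f(u, prev, s)>)
    scores = np.array([theta @ f(u, prev, s) for s in states])
    scores -= scores.max()              # stabilize before exponentiating
    p = np.exp(scores)
    return p / p.sum()                  # normalizer Z(u, prev) sums over the |Y| states only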

24 Generative vs Discriminative Approaches. Generative framework: HMMs model p(x, y).
Advantages:
Efficient learning algorithm: relative frequencies.
Can handle missing observations (they are simply latent variables).
Naturally incorporates prior knowledge.
Disadvantages:
Solves a harder problem: modeling p(x | y) p(y) rather than just p(y | x).
Questionable independence assumption: x_i ⊥ x_j | y_i for j ≠ i.
Limited representation: overlapping features are problematic. Assuming independence, p(x_i | y_i) = Π_k p(f_k(x_i) | y_i), violates the model when features overlap, e.g. f_1(u) = "is u the word Brown?", f_2(u) = "is u capitalized?".
Conditioning instead increases the number of parameters: p(x_i | y_i) = Π_k p(f_k(x_i) | y_i, f_1(x_i), ..., f_{k-1}(x_i)).
Capturing the dependency between x_i and y_{i-1} increases complexity: p(x_i | y_i, y_{i-1}).

25 Generative vs Discriminative Approaches. Discriminative learning: MEMMs model p(y | x).
Advantages:
Solves the more direct problem.
Richer representation via kernels or feature selection.
Conditioning on x makes overlapping features trivial; there is no independence assumption on the observations.
Lower asymptotic error.
Disadvantages:
Expensive learning: there is no closed-form solution as in HMMs, so training requires iterative convex optimization (with the Forward-Backward algorithm for inference).
Missing values in the observations are problematic, because the model conditions on x.
Requires more data, O(d) examples, to reach its asymptotic error, whereas generative models require O(log d), where d is the number of parameters [NgJor01].

