Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI
|
|
- Clarissa Moore
- 6 years ago
- Views:
Transcription
1 Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI Lecture 2 Yasemin Altun January 26, 2007
2 Review of Inference on Graphical Models Elimination algorithm finds single marginal probability p(f ) by Choosing an elimination ordering I such that f is the last node Keep track of active potentials Eliminate a node i by removing all active potentials referencing i, take product and sum over x i, put this message on the active list. Consider trees with multiple queries. This is a special case of Junction Tree Algorithm, Sum-Product algorithm. Choose a root node arbitrarily A natural elimination order is depth-first traversal of the tree.
3 Inference on Trees for Single Query p(r) to root i i j m ji ( x i ) j m kj ( x j ) m lj ( x j ) k l (a) (b) Elimination of j node. If j is an evidence node φ E (x j ) = 1[x j = x j ] else φ E (x j ) = 1 m ji (x i ) = φ E (x j )φ(x i, x j ) m kj (x j ) x j k c(j) The marginal of root x r is proportional to its messages p(x r x e ) = φ E (x r ) m kr (x r ) k c(r)
4 m 32 ( Inference on Trees for x ) 2 Multiple Query m ( x ) 42 2 X 3 X4 m 32 ( x 2 ) m 42 ( x 2 ) X 3 X4 (a) (b) X1 X 1 X1 X 1 m 12 ( x 1 ) m 12 ( x ) 12 ( x 2 ) 2 m 12 ( x 2 ) m 12 ( x 1 ) X2 X X2 2 X 2 m 32 ( x 2 ) m 42 ( x 2 ) m 32 ( x 2 ) m 42 ( x 2 ) m24 ( x 4 ) m 32 ( x 2 ) m 42 ( x 2 ) X 3 X4 X 3 X4X 4 X 3 m 23 ( x 3 ) m 24 ( x 4 ) X 4 (a) (b) (c) (d) X1 X1 Fig a) Choosing 1 as root, 2 has all the messages from nodes m 12 ( x ) below it 2 m 12 ( x 2 ) m 12 ( x 1 ) Fig b) To X2 compute p(x 2 ), m 12 X2 (x 2 ) is needed m m 32 ( x 2 ) 32 ( x ) m 2 42 ( ) Fig c) One pass from leaves to root x 2 and one pass m24 ( x 4 ) m 23 ( x ) m 3 24 ( x ) 4 X 3 X4 X 3 X4 node (c) marginals (d) backward gives the messages required to compute all
5 Inference on HMMs '"#!%& '"#& '"#$%& '"#!%& '"#& '"#$%&!"#!%&!"#&!"#$%&!"#!%&!"#&!"#$%& (&)*++,&)-./)(01)2(34(#4506 Hidden Markov Models are directed sequence models. Let x = (x 1,..., x l ) and y = (y 1,..., y l ) be sequences. x i O, where O is a set of possible observations. y i Y, where Y is a set of possible labels. π σ = p(y 1 = σ; θ) initial transition. T σ,σ = p(y i+1 = σ y i = σ ; θ) transition probabilities for σ, σ Y O σ,u = p(x i = u y i = σ; θ) observation probabilities for σ Y, u O N N 1 P(x, y; θ) = π y1 O yi,x i T yi+1,y i i=1 i=1
6 Forward-Backward Algorithm for HMMs Set y l as the root, x are evidence. Forward pass α i (σ) = m (i 1)i (x i = σ) ( ) p( x 1:i, y i = σ) = α i (σ) = α i 1 (σ )T σ,σ σ Y O σ, xi Backward pass p( x i+1:l y i = σ) = β i (σ) = T σ,σo σ, x i+1 β i+1 (σ ) σ Y Combining messages P(y i = σ x 1:l ) = p( x 1:l y i )p(y i ) p( x i:l ) p( x 1:i y i )p( x i+1:l y i )p(y i ) = α i (σ)β i (σ)
7 Generalization to Junction Tree Algorithm Data structure: Generate a tree whose nodes are the maximal cliques. Then, we get p(x C x V \C ) using the sum-product. We can marginalize over this to get individual marginal probabilities. Marginalization over different cliques of node i should yield to same p(x i x V \i ). There are multiple clique tree. We need a data structure that preserves global consistency via local consistency. A junction tree is a clique tree for each clique pair C i, C j, sharing some variables C s, C s is included in all the nodes of the clique tree on the path from C i and C j. There exists a junction tree of a graph iff the graph is triangulated. Elimination ordering Message-Passing Protocol: A clique C i can send a message to neighbor C j if it has received messages from all its other neighbors.
8 Junction Tree Algorithm Moralize if the graph is directed Triangulate the graph G (using variable elimination algorithm) Construct a junction tree J from the triangulated graph via maximal weight spanning tree. Initialize the potential of cliques C of J as the product of potentials from G all assigned factors from the model. Propagate local messages on the junction tree
9 Maximizing instead of Summing Finding maximum probability configurations ion and Belief Propagation both summed over Ch values of the marginal Maximizing (non-query, non-evidence) instead ofnodes Summing If j rginal probability. Same formulation can be used exactly for finding MAP Elimination and Belief Propagation both summed over Multiplication distributes over max as well as sum Pa wanted to allmaximize possible values over the ofnon-query, the marginal non-evidence (non-query, non-evidence) nodes d the probabilty to get aof marginal the single probability. best setting consistent max(ab, ac) = a max(b, c) ery and evidence? What if we wanted to maximize over the non-query, non-evidence ax max max max nodes We max p(x can to 1)p(xfind 2 x do 1)p(x MAP the 3 x 1)p(x probabilty computation 4 x 2)p(x 5 x 3)p(x of 6 xthe 2, using x 5) single exactly best setting the same consistent x1 x2 x3 x4 x5 ax p(x 1) max p(x with algorithm 2 x 1) max anyp(x query 3 x 1) replacing maxand p(x 4 x 2) evidence? max sums p(x 5 x 3)p(x with 6 x 2, max x 5) Re x1 x2 x3 x4 x5 wn as the maximum a-posteriori or MAP configuration. Giv max x p(x) = max x1 max x2 max x3 x1 x2 gation to solve this problem. max x4 max x5 p(x 1 )p(x 2 x 1 )p(x 3 x 1 )p(x 4 x 2 )p(x 5 x 3 )p(x 6 x 2, x 5 ) that (on trees), we can use an algorithm exactly like = max p(x 1 ) max p(x 2 x 1 ) max p(x 3 x 1 ) max p(x 4 x 2 ) max p(x 5 x 3 )p(x 6 x 2, x 5 ) x3 X 4 This is known as the maximum a-posteriori or MAP configuration. X 2 X It turns out that (on trees), we can use 6 X1 an algorithm exactly like belief-propagation to solve this problem. X 4 X 3 X 5 x4 x5 X1 X 2 X 6 Re ma X 3 X 5
10 Learning Parameter Learning: Given the graph structure, how to get the conditional probability distributions P θ (X i X Πi )? Parameters θ are unknown constants Given fully observed training data D Find their estimates maximizing the (penalized) log-likelihood ˆθ = argmax log P(D θ)( λr(θ)) θ Parameter estimation with fully observed data Parameter estimation with latent variables, Expectation Maximization (EM) Structure Learning: Given fully observed training data D, how to get the graph G and its parameters θ? After studying the discriminative framework
11 Parameter Estimation with Complete Observations Random variable X = (X 1,..., X n ). The data D = (x 1,..., x m ) is a set of m IID observations. Maximum likelihood estimate (MLE) can be found by maximizing the log probability of the data. In Bayes nets, l(θ; D) = log p(d θ) = log m p(x j θ) = j=1 m n j=1 i=1 log p(x j i x j, θ π j i ) i Consider X i be discrete rv with K possible values. Then, p(x i X πi, θ i ) = θ i (x i, x πi ) is a multinomial distribution st x i θ i (x i, x πi ) = 1. l(θ; D) = N(x i,πi ) log θ i (x i, x πi ) i x i,πi N(x i,πi ) = m assignment in data j=1 1[x j i,π i = x i,πi ] observed count of x i,πi
12 Parameter Estimation with Complete Observations l(θ; D) = i x i,πi N(x i,πi ) log θ i (x i, x πi ) Estimation of θ i is independent of θ k for k i. Estimating of θ i, ignore data associated with nodes other than i and π i and maximize N(x i,πi ) log θ i (x i, x πi ) wrt θ i. Add a Lagrangian term for normalization and solve for θ i ML estimate is the relative frequency. ˆθ i (x i, x πi ) = N(x i, x πi )/N(π i )
13 Parameter Estimation in HMMs '"#!%& '"#& '"#$%& '"#!%& '"#& '"#$%&!"#!%&!"#&!"#$%&!"#!%&!"#&!"#$%& (&)*++,&)-./)(01)2(34(#4506 Hidden Markov Models are directed sequence models. Let x = (x 1,..., x l ) and y = (y 1,..., y l ) be sequences. x i O, where O is a set of possible observations. y i Y, where Y is a set of possible labels. π σ = p(y 1 = σ; θ) initial transition. T σ,σ = p(y i+1 = σ y i = σ ; θ) transition probabilities for σ, σ Y O σ,u = p(x i = u y i = σ; θ) observation probabilities for σ Y, u O N N 1 P(x, y; θ) = π y1 O yi,x i T yi+1,y i i=1 i=1
14 Parameter Estimation in HMMs MLE estimate π σ = P(x, y; θ) = π y1 N i=1 P m j=1 1[y j 1 =σ] m Pl j 1 i=1 1[y j i+1 =σ y j i =σ ] P m Pl j 1 j=1 i=1 y j i =σ P m Pl j j=1 i=1 1[y j i =σ x j i =u] P m Pl j j=1 i=1 y j i =σ T σ,σ = P m j=1 O σ,u = N 1 O yi,x i i=1 T yi+1,y i
15 MLE with Complete Observations for MRF p(x θ) = C φ(x C) x Y n c φ(x c) In MRFs, Z couples the parameters. There is no closed formed solution of θ. For potential functions φ C (x C ) = exp( f (x C ), θ C ) m l(θ; D) = log p(x j θ) j=1 = j ( ) f (x C ), θ C log Z θ C = C N(x C ) f (x C ), θ C m log Z θ Take the derivative of l wrt θ, equate to 0 and solve for θ. The derivative of log Z wrt θ C is the expectation E x pθ [f (x C )].
16 Parameter Estimation with Latent Variables When there are latent variables, we need to marginalize over the latent variables, which leads to coupling between parameters. Let X be observed variables, Z latent variables. l(θ; D) = log z p(x, z θ) Expectation Maximization (EM) Algorithm is a general approach to maximum likelihood parameter estimation (MLE) with latent(hidden) variables. EM is a coordinate-descent algorithm to minimize Kullback-Leibler (KL) divergence
17 EM Algorithm Since Z is not observed, the log-likelihood of data is a marginal over Z which does not decompose. l(θ; x) := log p(x θ) = log p(x, z θ) z Replace this with expected log-likelihood wrt the averaging distribution q(z x). This is linear in l c (θ; x, z) := log p(x, z θ). l c (θ; x, z) q := z q(z x, θ) log p(x, z θ) This optimization is a lower bound on l(θ; x). Using Jensen s inequality l(θ; x) = log p(x, z θ) = log p(x, z θ) q(z x) q(z x) z z p(x, z θ) q(z x) log = L(q, θ) q(z x) z
18 EM Algorithm Until convergence Maximize wrt q (E step): Maximize wrt θ (M step): q t+1 = argmax L(q, θ t ) q θ t+1 = argmax L(q t+1, θ) θ
19 EM Algorithm Until convergence Maximize wrt q (E step): q t+1 = argmax q L(q, θ t ) Maximize wrt θ (M step): θ t+1 = argmax θ L(q t+1, θ) argmax θ L(q, θ t ) = argmax q(z x) log p(x, z θ) z θ z q(z x) log q(z x) = argmax l c (θ; x, z) q θ
20 EM Algorithm Until convergence Maximize wrt q (E step): q t+1 = argmax q L(q, θ t ) l(θ; x) L(q, θ) = q(z x) log p(x θ) z z = D(q(z x) p(z x, θ)) q(z x) log p(x, z θ) q(z x) Minimizing l(θ; x) L(q, θ) is equivalent to maximizing L(q, θ). KL divergence is minimizes when q(z x) = p(z x, θ t ) Intuitively, given p(x, z θ), p(z x, θ t ) is the best guess for latent variables conditioned on x. Maximize wrt θ (M step): θ t+1 = argmax θ L(q t+1, θ) M step increases a lower bound on likelihood E step closes the gap to yield l(θ t ; x) = L(q t+1, θ t )
21 EM for HMMs E step: Get expected counts using Forward-Backward Algorithm. q t = p(z x, θ t 1 ) M step: Find MLE estimate wrt the expected counts (relative frequency). θ t = argmax l c (θ; x, z) q t θ
22 Discriminative Training of HMMs st-1 s t st-1 s t o t o t (a). Maximum Entropy Markov Models (MEMM) [McCFrePer00] (Generalization to Bayes Nets is trivial.) Figure 1. (a) The dependency graph for a traditional HMM; (b) a) HMM: P(x, y; θ) = π N for our conditional maximum y1 i=1 entropy O N 1 y i,x Markov i i=1 T y model. i+1,y i b) MEMM: P(y x; θ) = π N 1 y1,x 1 i=1 T y i+1,y i,x i+1 (b) distribution, for, and an initial state
23 Inference and Parameter Estimation in MEMMs Forward-Backward algorithm α i (σ) = σ Y α i 1 (σ )T σ,σ, x i β i (σ) = σ Y T σ,σ, x i+1 β i+1 (σ ) Representation for σ, σ Y, u O T σ,σ,ū = 1 Z (ū, σ ) exp ( θ, f (ū, σ, σ) ) Z (ū, σ ) = exp ( θ, f (ū, σ, σ ) ) σ Coupling between θ due to Z results in no-closed-form solution. Use some convex optimization technique. Note the difference between undirected models where Z is a sum over all possible values of all nodes (Y n ). Here, it is a sum over Y.
24 Generative vs Discriminative Approaches Generative framework: HMMs model p(x, y) Advantages: Efficient learning algorithm. Relative frequency. Can handle missing observation (simply latent variable) Naturally incorporates prior knowledge Disadvantages: Harder problem p(y x)p(y) vs p(y x) Questionable independence assumption x i x j y i, j i Limited representation. Overlapping features are problematic. Assuming independence violates the model. p(x i y i ) = Y k p(f k (x i ) y i ) f 1 (u) = Is u the word "Brown"?, f 2 (u) = Is u capitalized? Conditioning increase the number of parameters. p(x i y i ) = Y k p(f k (x i ) y i, f 1 (x i ),..., f k 1 (x i )) Increased complexity to capture dependency between y i and x i 1, p(x i y i, y i 1 ).
25 Generative vs Discriminative Approaches Discriminative learning: MEMM model p(y x) Advantages: Solves more direct problem. Richer representation via kernels or feature selection. Conditioning on x, overlapping features are trivial. No independence assumption on observations. Lower asymptotic error Disadvantages: Expensive learning. There is no closed form solution as in HMMs. Therefore, we need Forward-Backward algorithm Missing values in observation is problematic due to conditioning. Requires more data (O(d)) to reach its asymptotic error, whereas generative models require O(log d), for d number of parameters [NgJor01].
26
Lecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationInference in Graphical Models Variable Elimination and Message Passing Algorithm
Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption
More information9 Forward-backward algorithm, sum-product on factor graphs
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationReview. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Review Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 What is Machine Learning (ML) Study of algorithms that improve their performance at some task with experience 2 Graphical Models
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15
More information10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging
10-708: Probabilistic Graphical Models 10-708, Spring 2018 10 : HMM and CRF Lecturer: Kayhan Batmanghelich Scribes: Ben Lengerich, Michael Kleyman 1 Case Study: Supervised Part-of-Speech Tagging We will
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationLearning MN Parameters with Approximation. Sargur Srihari
Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief
More informationLecture 4 October 18th
Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations
More informationCRF for human beings
CRF for human beings Arne Skjærholt LNS seminar CRF for human beings LNS seminar 1 / 29 Let G = (V, E) be a graph such that Y = (Y v ) v V, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional
More informationExact Inference I. Mark Peot. In this lecture we will look at issues associated with exact inference. = =
Exact Inference I Mark Peot In this lecture we will look at issues associated with exact inference 10 Queries The objective of probabilistic inference is to compute a joint distribution of a set of query
More informationStatistical Approaches to Learning and Discovery
Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More information6.867 Machine learning, lecture 23 (Jaakkola)
Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationMessage Passing Algorithms and Junction Tree Algorithms
Message Passing lgorithms and Junction Tree lgorithms Le Song Machine Learning II: dvanced Topics S 8803ML, Spring 2012 Inference in raphical Models eneral form of the inference problem P X 1,, X n Ψ(
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationExpectation Maximization Algorithm
Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationProbability Propagation
Graphical Models, Lectures 9 and 10, Michaelmas Term 2009 November 13, 2009 Characterizing chordal graphs The following are equivalent for any undirected graph G. (i) G is chordal; (ii) G is decomposable;
More informationProbabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationCS281A/Stat241A Lecture 19
CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723
More informationConditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013
Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationLecture 8: Graphical models for Text
Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/
More informationCours 7 12th November 2014
Sum Product Algorithm and Hidden Markov Model 2014/2015 Cours 7 12th November 2014 Enseignant: Francis Bach Scribe: Pauline Luc, Mathieu Andreux 7.1 Sum Product Algorithm 7.1.1 Motivations Inference, along
More informationLecture 6: Graphical Models
Lecture 6: Graphical Models Kai-Wei Chang CS @ Uniersity of Virginia kw@kwchang.net Some slides are adapted from Viek Skirmar s course on Structured Prediction 1 So far We discussed sequence labeling tasks:
More informationUNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS
UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems
More informationMachine Learning Lecture 14
Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More informationA brief introduction to Conditional Random Fields
A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationDirected Probabilistic Graphical Models CMSC 678 UMBC
Directed Probabilistic Graphical Models CMSC 678 UMBC Announcement 1: Assignment 3 Due Wednesday April 11 th, 11:59 AM Any questions? Announcement 2: Progress Report on Project Due Monday April 16 th,
More informationLearning Parameters of Undirected Models. Sargur Srihari
Learning Parameters of Undirected Models Sargur srihari@cedar.buffalo.edu 1 Topics Log-linear Parameterization Likelihood Function Maximum Likelihood Parameter Estimation Simple and Conjugate Gradient
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Summary of Class Advanced Topics Dhruv Batra Virginia Tech HW1 Grades Mean: 28.5/38 ~= 74.9%
More informationHidden Markov Models
CS769 Spring 2010 Advanced Natural Language Processing Hidden Markov Models Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Part-of-Speech Tagging The goal of Part-of-Speech (POS) tagging is to label each
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationStatistical Learning
Statistical Learning Lecture 5: Bayesian Networks and Graphical Models Mário A. T. Figueiredo Instituto Superior Técnico & Instituto de Telecomunicações University of Lisbon, Portugal May 2018 Mário A.
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More information5. Sum-product algorithm
Sum-product algorithm 5-1 5. Sum-product algorithm Elimination algorithm Sum-product algorithm on a line Sum-product algorithm on a tree Sum-product algorithm 5-2 Inference tasks on graphical models consider
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 10 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due this Wednesday (Nov 4) in class Project milestones due next Monday (Nov 9) About half
More informationCS 591, Lecture 2 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.
More informationGraphical models: parameter learning
Graphical models: parameter learning Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London London WC1N 3AR, England http://www.gatsby.ucl.ac.uk/ zoubin/ zoubin@gatsby.ucl.ac.uk
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due next Wednesday (Nov 4) in class Start early!!! Project milestones due Monday (Nov 9)
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationAn Introduction to Expectation-Maximization
An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationAn Introduction to Probabilistic Graphical Models
An Introduction to Probabilistic Graphical Models Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Reference material D. Koller
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 6 Jan-Willem van de Meent (credit: Yijun Zhao, Chris Bishop, Andrew Moore, Hastie et al.) Project Project Deadlines 3 Feb: Form teams of
More informationClique trees & Belief Propagation. Siamak Ravanbakhsh Winter 2018
Graphical Models Clique trees & Belief Propagation Siamak Ravanbakhsh Winter 2018 Learning objectives message passing on clique trees its relation to variable elimination two different forms of belief
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationLearning Parameters of Undirected Models. Sargur Srihari
Learning Parameters of Undirected Models Sargur srihari@cedar.buffalo.edu 1 Topics Difficulties due to Global Normalization Likelihood Function Maximum Likelihood Parameter Estimation Simple and Conjugate
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationExpectation maximization tutorial
Expectation maximization tutorial Octavian Ganea November 18, 2016 1/1 Today Expectation - maximization algorithm Topic modelling 2/1 ML & MAP Observed data: X = {x 1, x 2... x N } 3/1 ML & MAP Observed
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationExpectation Maximization
Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More information