13 : Variational Inference: Loopy Belief Propagation and Mean Field


10-708: Probabilistic Graphical Models, Spring
13: Variational Inference: Loopy Belief Propagation and Mean Field
Lecturer: Eric P. Xing    Scribes: Peter Schulam and William Wang

1 Introduction

Inference problems involve answering queries about the likelihood of observed data. For example, to answer a query for a marginal $p(x_A)$, we marginalize out the remaining variables, $p(x_A) = \sum_{x_{\mathcal{C} \setminus A}} p(x)$. For queries about conditionals, such as $p(x_A \mid x_B)$, we can first compute the joint and then divide by the marginal $p(x_B)$. Sometimes, answering a query also requires computing the mode of the density, $\hat{x} = \arg\max_{x \in \mathcal{X}^m} p(x)$.

So far in the class we have covered exact inference. Because brute-force enumeration is too inefficient for large graphs with complex structure, a family of message-passing algorithms was introduced, including forward-backward, sum-product, max-product, and the junction tree algorithm. Although these message-passing exact inference algorithms work well for tree-structured graphical models, it was also shown in class that on loopy graphs they may not yield consistent results, and their convergence is not guaranteed. Moreover, for complex graphical models such as the Ising model, exact inference algorithms such as the junction tree algorithm are computationally intractable.

In this lecture, we look at two variational inference algorithms: loopy belief propagation (yww) and mean field approximation (pschulam).

2 Loopy Belief Propagation

The general idea of loopy belief propagation is that, even though the graph contains loops and messages may circulate indefinitely, we run the message-passing updates anyway and hope for the best. In this section, we first review the basic belief propagation algorithm. We then discuss an experimental study by Murphy et al. (1999) and show some empirical observations about loopy belief propagation. Most importantly, starting from the notion of KL divergence, we show how the LBP algorithm can be explained from the perspective of minimizing the Bethe free energy.

2.1 Belief Propagation: a Quick Review

The basic idea of belief propagation is very simple: to update the belief at a node, we combine the incoming messages and doubleton potentials from its neighboring nodes with the target node's singleton potential. For a concrete example, consider Figure 1. In part (a) of the figure, in order to compute the message $M_{i \to j}(x_j)$ we need the messages from all neighboring nodes $x_k$ to $x_i$, which we then multiply with the singleton and doubleton potentials involving $x_i$ and $x_j$:

$$ M_{i \to j}(x_j) \propto \sum_{x_i} \Phi_{ij}(x_i, x_j)\, \Phi_i(x_i) \prod_{k \in N(i) \setminus j} M_{k \to i}(x_i) \qquad (1) $$
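To make equation (1) concrete, here is a minimal numpy sketch of a single pairwise message update for binary variables. The potential tables and incoming messages are made-up values chosen only for illustration, not numbers from the lecture.

```python
import numpy as np

# Minimal sketch of the pairwise message update in equation (1) for binary
# variables x_i, x_j. All potentials and incoming messages are illustrative
# made-up values.

Phi_ij = np.array([[1.0, 0.5],     # doubleton potential Phi_ij(x_i, x_j)
                   [0.5, 1.0]])
Phi_i = np.array([0.7, 0.3])       # singleton potential Phi_i(x_i)

# Messages M_{k -> i}(x_i) from the other neighbours k of node i (k != j).
incoming = [np.array([0.6, 0.4]),
            np.array([0.55, 0.45])]

# M_{i -> j}(x_j)  propto  sum_{x_i} Phi_ij(x_i, x_j) Phi_i(x_i) prod_k M_{k -> i}(x_i)
prod_in = Phi_i * np.prod(incoming, axis=0)   # a function of x_i
M_i_to_j = Phi_ij.T @ prod_in                 # sum over x_i
M_i_to_j /= M_i_to_j.sum()                    # messages are only defined up to scale
print("M_{i->j}:", M_i_to_j)
```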

[Figure 1: Belief propagation: an example.]

Here the doubleton potential $\Phi_{ij}(x_i, x_j)$ is also called the compatibility, and models the interaction between the two nodes, whereas the singleton potential $\Phi_i(x_i)$ is also called the external evidence. On the right-hand side (part b), we can simply update the belief of $x_i$ with a similar formulation:

$$ b_i(x_i) \propto \Phi_i(x_i) \prod_{k \in N(i)} M_{k \to i}(x_i) \qquad (2) $$

Similarly, for factor graphs we also have the notion of messages, and we update the belief of node $x_i$ by multiplying its factor with the messages coming from neighboring factor nodes:

$$ b_i(x_i) \propto f_i(x_i) \prod_{a \in N(i)} m_{a \to i}(x_i) \qquad (3) $$

To calculate the message from factor node $X_a$ to variable node $x_i$, we sum the products over the remaining variables of the factor:

$$ m_{a \to i}(x_i) = \sum_{X_a \setminus x_i} f_a(X_a) \prod_{j \in N(a) \setminus i} m_{j \to a}(x_j) \qquad (4) $$

From class, we know that running BP on a tree always converges to the exact solution. This is not always the case for loopy graphs: when a message is sent into a loop, it may circulate indefinitely, so convergence is not guaranteed, and the algorithm may converge to the wrong solution.

2.2 Loopy Belief Propagation Algorithm

The loopy belief propagation algorithm uses a fixed-point iteration procedure to minimize the Bethe free energy. As long as the convergence criterion is not met, we keep updating the beliefs and the messages:

$$ b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i) \qquad (5) $$

$$ b_a(X_a) \propto f_a(X_a) \prod_{i \in N(a)} m_{i \to a}(x_i) \qquad (6) $$

$$ m^{\mathrm{new}}_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i) \qquad (7) $$

$$ m^{\mathrm{new}}_{a \to i}(x_i) = \sum_{X_a \setminus x_i} f_a(X_a) \prod_{j \in N(a) \setminus i} m_{j \to a}(x_j) \qquad (8) $$
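To see updates (5)-(8) in action, here is a minimal sketch of parallel loopy BP on a small made-up factor graph: four binary variables arranged in a single cycle, with one pairwise factor per edge. The random factor values, the uniform initialization, and the stopping rule are our own illustrative choices, not something prescribed by the lecture; because there are no singleton factors, the beliefs in (5) are just normalized products of incoming factor messages.

```python
import numpy as np

# Parallel loopy BP (equations (5)-(8)) on a 4-cycle of binary variables
# with one pairwise factor per edge. All numbers are illustrative.

rng = np.random.default_rng(0)
n_vars, K = 4, 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]                 # variables in each factor
factors = [rng.uniform(0.5, 1.5, size=(K, K)) for _ in edges]

# Messages, all initialized uniformly.
m_f2v = {(a, i): np.ones(K) / K for a, e in enumerate(edges) for i in e}
m_v2f = {(i, a): np.ones(K) / K for a, e in enumerate(edges) for i in e}

def neighbors_of_var(i):
    return [a for a, e in enumerate(edges) if i in e]

for it in range(100):
    # (7): variable-to-factor messages = product of the other incoming factor messages
    new_v2f = {}
    for (i, a) in m_v2f:
        msg = np.ones(K)
        for c in neighbors_of_var(i):
            if c != a:
                msg *= m_f2v[(c, i)]
        new_v2f[(i, a)] = msg / msg.sum()

    # (8): factor-to-variable messages = sum out the other variable of the factor
    new_f2v = {}
    for (a, i) in m_f2v:
        j = edges[a][1] if edges[a][0] == i else edges[a][0]   # the other variable
        f = factors[a] if edges[a][0] == i else factors[a].T   # orient so axis 0 <-> x_i
        msg = f @ new_v2f[(j, a)]                              # sum over x_j
        new_f2v[(a, i)] = msg / msg.sum()

    delta = max(np.abs(new_f2v[k] - m_f2v[k]).max() for k in m_f2v)
    m_v2f, m_f2v = new_v2f, new_f2v
    if delta < 1e-8:              # stop when the messages no longer change
        break

print(f"stopped after {it + 1} sweeps (max message change {delta:.2e})")
# (5): node beliefs are normalized products of incoming factor messages
for i in range(n_vars):
    b = np.ones(K)
    for a in neighbors_of_var(i):
        b *= m_f2v[(a, i)]
    print(f"b_{i} =", b / b.sum())
```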

When the iteration converges, stationarity of the Bethe free energy is guaranteed. The big problem, however, is that convergence itself is not guaranteed, and the reason is intuitive: when BP runs on a graph that contains loops, the messages may keep circulating around those loops forever. Interestingly, Murphy et al. (UAI 1999) studied the empirical behavior of the loopy belief propagation algorithm and found that LBP can still achieve good approximations:

- The program is stopped after a fixed number of iterations.
- Alternatively, stop when there is no significant difference in the belief updates.
- When the algorithm converges, the solution is usually a good approximation.

This is probably the reason why LBP remains a very popular inference algorithm, even though its convergence is not guaranteed. It was also mentioned in class that, in order to test the empirical performance of an approximate inference algorithm on large intractable problems, one can start simple by testing on a small instance of the problem (e.g., a 20 x 20 Ising model).

2.3 Understanding LBP: an F_Bethe Minimization Perspective

To understand the LBP algorithm, let us first define the true distribution $P$ as

$$ P(X) = \frac{1}{Z} \prod_{f_a \in F} f_a(X_a) \qquad (9) $$

where $Z$ is the partition function and the product runs over the factors. Since working with $P$ directly is often intractable, we approximate it with a distribution $Q$, using the KL divergence to measure the quality of the approximation:

$$ \mathrm{KL}(Q \,\|\, P) = \sum_X Q(X) \log \frac{Q(X)}{P(X)} \qquad (10) $$

Note that the KL divergence is asymmetric. It is non-negative and attains its minimum value of zero exactly when $P = Q$, which is why it is a useful objective for fitting $Q$ to $P$. We would like to work with this objective without actually performing inference in $P$; expanding the definition shows how far we can get:

$$ \mathrm{KL}(Q \,\|\, P) = \sum_X Q(X) \log \frac{Q(X)}{P(X)} \qquad (11) $$
$$ = \sum_X Q(X) \log Q(X) - \sum_X Q(X) \log P(X) \qquad (12) $$
$$ = -H_Q(X) - \mathbb{E}_Q[\log P(X)] \qquad (13) $$

If we replace $P(X)$ with our earlier definition of the true distribution, we get

$$ \mathrm{KL}(Q \,\|\, P) = -H_Q(X) - \mathbb{E}_Q\Big[\log \frac{1}{Z} \prod_{f_a \in F} f_a(X_a)\Big] \qquad (14) $$
$$ = -H_Q(X) - \log \frac{1}{Z} - \mathbb{E}_Q\Big[\sum_{f_a \in F} \log f_a(X_a)\Big] \qquad (15) $$

Rearranging the terms on the right-hand side gives

$$ \mathrm{KL}(Q \,\|\, P) = -H_Q(X) - \mathbb{E}_Q\Big[\sum_{f_a \in F} \log f_a(X_a)\Big] + \log Z \qquad (16) $$
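Here is a small brute-force check of identity (16), on a toy model of our own choosing (three binary variables, two pairwise factors, all values arbitrary): it verifies that $\mathrm{KL}(Q \,\|\, P)$ equals the first two terms (the Gibbs free energy defined next) plus $\log Z$, and that the free energy attains its minimum, $-\log Z$, at $Q = P$.

```python
import numpy as np
from itertools import product

# Brute-force check of KL(Q||P) = F(P,Q) + log Z, where
# F(P,Q) = -H_Q(X) - E_Q[ sum_a log f_a(X_a) ] (the Gibbs free energy).
# The model is a made-up one so that everything can be enumerated.

rng = np.random.default_rng(1)
f01 = rng.uniform(0.5, 2.0, size=(2, 2))   # factor on (x0, x1)
f12 = rng.uniform(0.5, 2.0, size=(2, 2))   # factor on (x1, x2)

states = list(product([0, 1], repeat=3))
unnorm = np.array([f01[x0, x1] * f12[x1, x2] for (x0, x1, x2) in states])
Z = unnorm.sum()
P = unnorm / Z

def gibbs_free_energy(Q):
    H_Q = -np.sum(Q * np.log(Q))
    E_log_factors = sum(Q[s] * (np.log(f01[x0, x1]) + np.log(f12[x1, x2]))
                        for s, (x0, x1, x2) in enumerate(states))
    return -H_Q - E_log_factors

def kl(Q, P):
    return np.sum(Q * np.log(Q / P))

Q = rng.dirichlet(np.ones(len(states)))     # an arbitrary trial distribution
print("KL(Q||P)               :", kl(Q, P))
print("F(P,Q) + log Z         :", gibbs_free_energy(Q) + np.log(Z))
print("F at the minimum (Q=P) :", gibbs_free_energy(P), "= -log Z =", -np.log(Z))
```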

Physicists define the first two terms on the right-hand side of equation (16) as the (Gibbs) free energy $F(P, Q)$. Our goal therefore boils down to computing $F(P, Q)$. The term $\mathbb{E}_Q\big[\sum_{f_a \in F} \log f_a(X_a)\big]$ can be computed by summing over the (factor) marginals, whereas computing $H_Q(X)$ is much harder: it requires summing over all possible configurations, which is very expensive. However, we can approximate $F(P, Q)$ by a tractable surrogate $\hat{F}(P, Q)$.

Before showing how to approximate the Gibbs free energy, let us first consider the case of a tree-structured graphical model, as in Fig. 2.

[Figure 2: Calculating the tree energy: an example.]

Here we know the distribution can be written as

$$ b(x) = \prod_a b_a(x_a) \prod_i b_i(x_i)^{1 - d_i} $$

where $d_i$ denotes the degree of node $i$, i.e., the number of factors in which it participates. The quantities $H_{\mathrm{tree}}$ and $F_{\mathrm{tree}}$ can then be written as

$$ H_{\mathrm{tree}} = -\sum_a \sum_{x_a} b_a(x_a) \log b_a(x_a) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i) \qquad (17) $$

$$ F_{\mathrm{tree}} = \sum_a \sum_{x_a} b_a(x_a) \log \frac{b_a(x_a)}{f_a(x_a)} + \sum_i (1 - d_i) \sum_{x_i} b_i(x_i) \log b_i(x_i) \qquad (18) $$
$$ = F_{12} + \cdots + F_{67} + F_{78} - F_1 - F_5 - F_2 - F_6 - F_3 - F_7 \qquad (19) $$

where the last line refers to the edges and nodes of the tree in Fig. 2. From this derivation we see that we only need to sum over the singletons and doubletons, which is easy to compute. We can use the same idea to approximate the Gibbs free energy on a general (loopy) graph, such as the one in Fig. 3:

[Figure 3: Calculating the Bethe energy of a loopy graph: an example.]

$$ H_{\mathrm{Bethe}} = -\sum_a \sum_{x_a} b_a(x_a) \log b_a(x_a) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i) \qquad (20) $$

$$ F_{\mathrm{Bethe}} = \sum_a \sum_{x_a} b_a(x_a) \log \frac{b_a(x_a)}{f_a(x_a)} + \sum_i (1 - d_i) \sum_{x_i} b_i(x_i) \log b_i(x_i) = -\sum_a \sum_{x_a} b_a(x_a) \log f_a(x_a) - H_{\mathrm{Bethe}} \qquad (21) $$
$$ = F_{12} + \cdots + F_{67} + F_{78} - F_1 - F_5 - 2F_2 - 2F_6 - F_8 \qquad (22) $$

where, again, the last line refers to the specific graph in Fig. 3.
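As a sanity check on the tree formulas above, the following sketch builds the exact singleton and pairwise beliefs on a made-up three-node chain (our own toy example) and verifies that the entropy assembled as in (17) matches the exact entropy; on a tree the two are equal.

```python
import numpy as np
from itertools import product

# Check that the Bethe/tree entropy built from singleton and pairwise
# beliefs (equation (17)) equals the exact entropy on a tree.
# The model is a made-up chain x0 - x1 - x2 of binary variables.

rng = np.random.default_rng(2)
f01 = rng.uniform(0.5, 2.0, size=(2, 2))
f12 = rng.uniform(0.5, 2.0, size=(2, 2))

states = list(product([0, 1], repeat=3))
joint = np.array([f01[a, b] * f12[b, c] for (a, b, c) in states]).reshape(2, 2, 2)
joint /= joint.sum()

# Exact singleton and pairwise marginals (the "beliefs" on a tree).
b0, b1, b2 = joint.sum((1, 2)), joint.sum((0, 2)), joint.sum((0, 1))
b01, b12 = joint.sum(2), joint.sum(0)

def H(p):
    p = p.ravel()
    return -np.sum(p * np.log(p))

degrees = {0: 1, 1: 2, 2: 1}               # d_i = number of factors touching x_i
H_exact = H(joint)
H_bethe = (H(b01) + H(b12)                  # -sum_a sum_{x_a} b_a log b_a
           - (degrees[0] - 1) * H(b0)       # +sum_i (d_i - 1) sum_{x_i} b_i log b_i
           - (degrees[1] - 1) * H(b1)
           - (degrees[2] - 1) * H(b2))
print("exact entropy :", H_exact)
print("Bethe entropy :", H_bethe)           # equal on a tree
```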

This is called the Bethe approximation of the Gibbs free energy. The idea is simple: we only need to sum over the singletons and the doubletons to obtain the entropy term. Note, however, that on a loopy graph this approximation need not be close to the true Gibbs free energy; it is exact only on trees.

Now, to minimize the Bethe free energy subject to normalization and marginalization constraints, we write out the Lagrangian:

$$ L = F_{\mathrm{Bethe}} + \sum_i \gamma_i \Big\{ 1 - \sum_{x_i} b_i(x_i) \Big\} + \sum_a \sum_{i \in N(a)} \sum_{x_i} \lambda_{ai}(x_i) \Big\{ b_i(x_i) - \sum_{X_a \setminus x_i} b_a(X_a) \Big\} \qquad (23) $$

To solve this, we take the partial derivatives and set them to zero ($\partial L / \partial b_i(x_i) = 0$ and $\partial L / \partial b_a(X_a) = 0$). Then we have

$$ b_i(x_i) \propto \exp\Big( \frac{1}{d_i - 1} \sum_{a \in N(i)} \lambda_{ai}(x_i) \Big) \qquad (24) $$

$$ b_a(X_a) \propto \exp\Big( -E_a(X_a) + \sum_{i \in N(a)} \lambda_{ai}(x_i) \Big) \qquad (25) $$

where $E_a(X_a) = -\log f_a(X_a)$. Interestingly, if we set $\lambda_{ai}(x_i) = \log m_{i \to a}(x_i) = \log \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)$ and use the marginalization constraint $b_i(x_i) = \sum_{X_a \setminus x_i} b_a(X_a)$, we obtain exactly the BP formulations in equations (3) and (4). This is very attractive, because it shows how to derive the message-passing algorithm from the perspective of minimizing the Bethe free energy. In general, variational methods can be summarized as

$$ q^* = \arg\min_{q \in S} F_{\mathrm{Bethe}}(p, q) \qquad (26) $$

where the optimization over $q$ is now tractable. Note that here we do not want to optimize $q(x)$ directly. Instead, we focus on a relaxed feasible set and an approximate objective:

$$ b^* = \arg\min_{b \in M_o} \hat{F}(b) \qquad (27) $$

where $b$ collects the edge (doubleton) and node (singleton) pseudo-marginals. To solve for $b$, we typically use a fixed-point iteration algorithm.

3 Mean Field Approximation

Recall that the purpose of approximate inference methods is to allow us to compute the posterior distribution over a model's latent variables even when the posterior involves an intractable integral or summation. As a motivating example, we will look at a Bayesian mixture of Gaussians with known observation variance $b^2$. To review, a mixture of $K$ Gaussians has the following generative story:

- $\theta \sim \mathrm{Dir}(\alpha)$
- $\mu_k \sim \mathcal{N}(0, a^2)$ for $k \in \{1, \ldots, K\}$
- For $i \in \{1, \ldots, n\}$:
  - $z_i \sim \mathrm{Mult}(\theta)$
  - $x_i \sim \mathcal{N}(\mu_{z_i}, b^2)$
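The generative story translates directly into code. The sketch below samples from it; the hyperparameter values ($K$, $n$, $\alpha$, $a^2$, $b^2$) are arbitrary choices of ours for illustration, since the lecture only fixes the model structure.

```python
import numpy as np

# Sampling from the Bayesian Gaussian mixture generative story above.
# All hyperparameter values are illustrative.

rng = np.random.default_rng(3)
K, n = 3, 10
alpha = np.ones(K)          # Dirichlet hyperparameter
a2, b2 = 25.0, 1.0          # prior variance of the means, known observation variance

theta = rng.dirichlet(alpha)                    # theta ~ Dir(alpha)
mu = rng.normal(0.0, np.sqrt(a2), size=K)       # mu_k ~ N(0, a^2)
z = rng.choice(K, size=n, p=theta)              # z_i ~ Mult(theta)
x = rng.normal(mu[z], np.sqrt(b2))              # x_i ~ N(mu_{z_i}, b^2)

print("theta:", np.round(theta, 3))
print("mu   :", np.round(mu, 2))
print("z    :", z)
print("x    :", np.round(x, 2))
```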

Suppose we wanted to compute a posterior distribution over the cluster assignments $z_i$ and the cluster means $\mu_k$. This requires the following quantity, where $\mu = \{\mu_1, \ldots, \mu_K\}$, $z = \{z_1, \ldots, z_n\}$, and $x = \{x_1, \ldots, x_n\}$:

$$ p(\mu, z \mid x) = \frac{\prod_{k=1}^K p(\mu_k) \prod_{i=1}^n p(z_i)\, p(x_i \mid z_i, \mu)}{\int_\mu \sum_z \prod_{k=1}^K p(\mu_k) \prod_{i=1}^n p(z_i)\, p(x_i \mid z_i, \mu)} \qquad (28) $$

We can easily compute the numerator, but the denominator is intractable because it involves a summation over all configurations of the latent cluster variables $z$. If there are $K$ clusters, the number of configurations we would need to sum over is $K^n$. This is difficult by itself, and, in order to compute the denominator, we also need to compute the integral over all mean vectors $\mu$.

In the above posterior distribution, the denominator is difficult to compute because the latent variables do not factorize easily. Note, in particular, that they are coupled in the conditional density of each data point. In general, when the latent variables are coupled, we must sum an exponentially large number of terms in order to compute the normalizing quantity of a posterior distribution. If, however, the latent variables could be factored easily, then we might be able to compute the normalizing term much more easily.

Broadly, mean field variational inference is a technique for designing a new family of distributions over the latent variables that does factorize well, which can then be used to approximate the posterior distribution over the Gaussian mixture model parameters and latent variables shown above. In symbols, the mean field approximation assumes that the variational distribution over the latent variables factorizes:

$$ q(z_1, \ldots, z_m) = \prod_{i=1}^m q(z_i; \nu_i) \qquad (29) $$

More generally, we do not need to assume that the joint distribution over the latent variables factorizes into a separate term for each variable. We can include broader families of variational distributions by instead assuming that the joint factorizes into independent distributions over clusters of the latent variables:

$$ q(z_1, \ldots, z_m) = \prod_{C_i \in C} q(C_i; \nu_i) \qquad (30) $$

where $C$ is some set of disjoint subsets of the latent variables.

3.1 Variational Inference Objective Functions

Since we are approximating a distribution with our variational distribution $q$, a natural way to measure the quality of our approximation is the Kullback-Leibler (KL) divergence between the true density $p$ and our approximation $q$:

$$ \mathrm{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \qquad (31) $$

This objective, however, is problematic since it requires pointwise evaluation of $p(x)$, which is exactly the problem we are trying to solve in the first place. An alternative is to reverse the direction of the KL divergence:

$$ \mathrm{KL}(q \,\|\, p) = \sum_x q(x) \log \frac{q(x)}{p(x)} \qquad (32) $$

Assuming that our approximation $q(x)$ is tractable to compute, this is a slight improvement, but it still involves evaluating $p(x)$ inside the logarithm. Note, however, that the unnormalized measure $\tilde{p}(x)$ can be written as $p(x) \cdot Z$, where $Z$ is the normalizing constant of the distribution. When $p(x)$ is a posterior $p(x \mid D)$, the normalizing constant $Z$ is $p(D)$. Using this fact, we can define a new objective function $J(q)$:

$$ J(q) = \mathrm{KL}(q \,\|\, \tilde{p}) \qquad (33) $$
$$ = \sum_x q(x) \log \frac{q(x)}{\tilde{p}(x)} \qquad (34) $$
$$ = \sum_x q(x) \log \frac{q(x)}{p(x) Z} \qquad (35) $$
$$ = \sum_x q(x) \log \frac{q(x)}{p(x)} - \sum_x q(x) \log Z \qquad (36) $$
$$ = \sum_x q(x) \log \frac{q(x)}{p(x)} - \log Z \qquad (37) $$
$$ = \mathrm{KL}(q \,\|\, p) - \log Z \qquad (38) $$

Since $\log Z = \log p(D)$ is a constant, minimizing $J(q) = \mathrm{KL}(q \,\|\, \tilde{p})$ is equivalent to minimizing the KL divergence between our approximation and the true distribution $p(x)$; and because $\mathrm{KL}(q \,\|\, p) \geq 0$, $J(q)$ is an upper bound on the negative log likelihood of the evidence, $-\log p(D)$. An alternative is to maximize $-J(q)$, which is known as the energy functional.

3.2 Interpretations of the Objective Function

We can rewrite our objective function $J(q)$ as

$$ J(q) = \mathbb{E}_q[\log q(x)] + \mathbb{E}_q[-\log \tilde{p}(x)] \qquad (39) $$
$$ = -H(q) + \mathbb{E}_q[E(x)] \qquad (40) $$

where $E(x) = -\log \tilde{p}(x)$ is the energy. Intuitively, breaking the objective down this way shows that minimizing $J(q)$ attempts to do two things. First, we want to minimize the negative entropy (equivalently, increase the entropy): as we know from the maximum entropy principle, we should not make unwarranted assumptions about the distribution, and should, all else equal, prefer the distribution that maximizes entropy. Second, we want to minimize the expected energy $\mathbb{E}_q[-\log \tilde{p}(x)]$.
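Before moving on, here is a quick numerical check of identity (38) on a toy discrete distribution; the unnormalized weights and the trial distribution $q$ are arbitrary values of our own choosing, used only to illustrate that $J(q)$, computed from the unnormalized measure alone, equals $\mathrm{KL}(q \,\|\, p) - \log Z$ and upper-bounds $-\log Z$.

```python
import numpy as np

# Check of J(q) = KL(q || p~) = KL(q || p) - log Z on a toy discrete
# distribution. All numbers are illustrative.

rng = np.random.default_rng(4)
p_tilde = rng.uniform(0.1, 2.0, size=6)     # unnormalized p~(x) = p(x) * Z
Z = p_tilde.sum()
p = p_tilde / Z
q = rng.dirichlet(np.ones(6))               # a trial variational distribution

J = np.sum(q * np.log(q / p_tilde))         # J(q) = sum_x q(x) log( q(x) / p~(x) )
KL_qp = np.sum(q * np.log(q / p))

print("J(q)                      :", J)
print("KL(q||p) - log Z          :", KL_qp - np.log(Z))
print("upper bound J(q) >= -log Z:", J, ">=", -np.log(Z))
```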

Recall that the energy is lower when the probability is higher, so we would like to minimize the energy if we want to maximize likelihood. At an intuitive level, the second term makes sure that our approximate distribution puts more mass on $x$ with low energy (high probability under $\tilde{p}$) and less mass on $x$ with high energy. Another interpretation of the objective function $J(q)$, writing $\tilde{p}(x) = p(x)\, p(D \mid x)$ with $p(x)$ now denoting the prior, is

$$ J(q) = \mathbb{E}_q[\log q(x) - \log p(x)\, p(D \mid x)] \qquad (41) $$
$$ = \mathbb{E}_q[\log q(x) - \log p(x) - \log p(D \mid x)] \qquad (42) $$
$$ = \mathbb{E}_q[-\log p(D \mid x)] + \mathrm{KL}(q \,\|\, p) \qquad (43) $$

Breaking the objective down this way again gives an intuitive feel for what we are minimizing. We are minimizing the expected negative log likelihood of the data conditioned on $x$, which prefers distributions that put more probability mass on $x$ that increases the likelihood of our observed data. In addition, there is a term that penalizes distributions $q$ that are too far from the prior $p$.

3.3 Optimizing the Variational Distribution

Now that we understand the objective functions that we want to minimize (or maximize, in the case of the energy functional), we can address the issue of actually finding the variational distribution that optimizes them. In what follows we use the energy functional which, overloading notation slightly, we again write as $J(q)$:

$$ J(q) = \sum_x q(x) \log \frac{\tilde{p}(x)}{q(x)} \qquad (44) $$

Recall that we wish to maximize this function. We will also use the simplest approximating distribution, which assumes that the joint density over all hidden variables $x$ factorizes completely. That is,

$$ q(x_1, \ldots, x_m) = \prod_{i=1}^m q_i(x_i; \nu_i) \qquad (45) $$

Our strategy will be to use coordinate ascent to maximize $J(q)$ with respect to each $q_i$. Once we have derived the update that optimizes the energy functional with respect to $q_i$, it is relatively straightforward to extend the coordinate ascent updates to optimize the parameters $\nu_i$ of each local distribution $q_i$. Let us first view $J(q)$ as a function of one of the local distributions $q_i$, which we write as $J(q_i)$. We can then rewrite the objective function

$$ J(q_i) = \sum_x q(x) \log \frac{\tilde{p}(x)}{q(x)} \qquad (46) $$
$$ = \sum_x \Big( \prod_{l=1}^m q_l(x_l) \Big) \Big[ \log \tilde{p}(x) - \sum_{j=1}^m \log q_j(x_j) \Big] \qquad (47) $$
$$ = \sum_{x_i} q_i(x_i) \sum_{x_{-i}} \prod_{k \neq i} q_k(x_k) \Big[ \log \tilde{p}(x) - \sum_{j \neq i} \log q_j(x_j) - \log q_i(x_i) \Big] \qquad (48) $$
$$ = \sum_{x_i} q_i(x_i) \sum_{x_{-i}} \prod_{k \neq i} q_k(x_k) \log \tilde{p}(x) - \sum_{x_i} q_i(x_i) \sum_{x_{-i}} \prod_{k \neq i} q_k(x_k) \Big[ \sum_{j \neq i} \log q_j(x_j) + \log q_i(x_i) \Big] \qquad (49) $$
$$ \doteq \sum_{x_i} q_i(x_i) \sum_{x_{-i}} \prod_{k \neq i} q_k(x_k) \log \tilde{p}(x) - \sum_{x_i} q_i(x_i) \sum_{x_{-i}} \prod_{k \neq i} q_k(x_k) \log q_i(x_i) \qquad (50) $$
$$ = \sum_{x_i} q_i(x_i) \sum_{x_{-i}} \prod_{k \neq i} q_k(x_k) \log \tilde{p}(x) - \sum_{x_i} q_i(x_i) \log q_i(x_i) \qquad (51) $$
$$ = \sum_{x_i} q_i(x_i)\, \mathbb{E}_{q_{-i}}[\log \tilde{p}(x)] - \sum_{x_i} q_i(x_i) \log q_i(x_i) \qquad (52) $$

where in (50) we drop terms that are constant with respect to $q_i$, and $\mathbb{E}_{q_{-i}}[\cdot]$ denotes an expectation with respect to $\prod_{k \neq i} q_k(x_k)$. With this modified form, we can now define $\mathbb{E}_{q_{-i}}[\log \tilde{p}(x)]$ to be the log of some function of $x_i$:

$$ \log f_i(x_i) = \mathbb{E}_{q_{-i}}[\log \tilde{p}(x)] \qquad (53) $$

which allows us to rewrite the final expression in the derivation above as

$$ \sum_{x_i} q_i(x_i) \log f_i(x_i) - \sum_{x_i} q_i(x_i) \log q_i(x_i) = -\mathrm{KL}(q_i \,\|\, f_i) \qquad (54) $$

(up to the additive constant needed to normalize $f_i$). We then maximize our objective $J(q_i)$ by minimizing $\mathrm{KL}(q_i \,\|\, f_i)$, which is clearly done by setting $q_i(x_i) \propto f_i(x_i)$. Thus

$$ \log f_i(x_i) = \mathbb{E}_{q_{-i}}[\log \tilde{p}(x)] \qquad (55) $$
$$ f_i(x_i) = \exp\big( \mathbb{E}_{q_{-i}}[\log \tilde{p}(x)] \big) \qquad (56) $$

Therefore, the distribution for each $q_i$ that maximizes our objective function is

$$ q_i(x_i) = \frac{1}{Z_i} \exp\big( \mathbb{E}_{q_{-i}}[\log \tilde{p}(x)] \big) \qquad (57) $$

where $Z_i$ is a normalizing constant that ensures $q_i$ is a proper distribution. From this we can see that the approximate distribution over a particular hidden variable depends on the mean values of the rest of the hidden variables.
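Update (57) can be run directly by enumeration on a tiny example. The following sketch (three binary variables with a made-up joint table, entirely our own toy choice) performs coordinate updates $q_i(x_i) \propto \exp(\mathbb{E}_{q_{-i}}[\log p(x)])$ and compares the resulting mean-field marginals to the exact ones; in general they agree only approximately.

```python
import numpy as np
from itertools import product

# Coordinate-ascent mean field (equation (57)) on a tiny made-up joint
# over three binary variables, with all expectations taken by enumeration.

rng = np.random.default_rng(5)
p = rng.uniform(0.1, 1.0, size=(2, 2, 2))
p /= p.sum()                                # a tiny "true" joint p(x0, x1, x2)
log_p = np.log(p)

q = [np.full(2, 0.5) for _ in range(3)]     # initialize each q_i uniformly

for sweep in range(50):
    for i in range(3):
        others = [j for j in range(3) if j != i]
        # E_{q_{-i}}[ log p(x) ] as a function of x_i
        expect = np.zeros(2)
        for xi in range(2):
            for xo in product(range(2), repeat=2):
                idx = [0, 0, 0]
                idx[i] = xi
                idx[others[0]], idx[others[1]] = xo
                weight = q[others[0]][xo[0]] * q[others[1]][xo[1]]
                expect[xi] += weight * log_p[tuple(idx)]
        q[i] = np.exp(expect - expect.max())    # exponentiate (stabilized) ...
        q[i] /= q[i].sum()                      # ... and normalize: (1/Z_i) exp(E[log p])

for i in range(3):
    exact = p.sum(axis=tuple(j for j in range(3) if j != i))
    print(f"q_{i} = {np.round(q[i], 3)}   exact marginal = {np.round(exact, 3)}")
```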

In the expectation, we can drop all terms that do not involve $x_i$, which removes the means of all variables that are not neighbors of $x_i$ in the graphical model. We see, then, that the distribution for a variable depends on the mean values of its neighbors. This is known as the mean field, which is where the name "mean field approximation" comes from.
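To see this interpretation concretely, consider the classical naive mean-field update for an Ising model $p(x) \propto \exp\big(J \sum_{\langle ij \rangle} x_i x_j + \sum_i h_i x_i\big)$ with $x_i \in \{-1, +1\}$: applying (57) gives the fixed-point equations $\mu_i = \tanh\big(J \sum_{j \in N(i)} \mu_j + h_i\big)$, so each variable's approximate distribution is driven only by the current means of its neighbors. The sketch below iterates these updates on a small grid; the grid size, coupling $J$, random fields $h$, periodic boundary, and stopping rule are our own illustrative choices.

```python
import numpy as np

# Naive mean-field updates for an Ising model on an L x L grid with
# periodic (toroidal) boundary conditions. Each mean mu_i is updated
# from its neighbours' current means: mu_i = tanh(J * sum_nbr mu_j + h_i).
# All parameter values are illustrative.

rng = np.random.default_rng(6)
L, J = 20, 0.3
h = 0.1 * rng.standard_normal((L, L))
mu = np.zeros((L, L))                      # mean parameters E_q[x_i]

for sweep in range(200):
    nbr = (np.roll(mu, 1, 0) + np.roll(mu, -1, 0) +
           np.roll(mu, 1, 1) + np.roll(mu, -1, 1))   # sum of neighbouring means
    new_mu = np.tanh(J * nbr + h)                    # mean-field fixed-point update
    if np.abs(new_mu - mu).max() < 1e-6:
        mu = new_mu
        break
    mu = new_mu

print("stopped after", sweep + 1, "sweeps; mean magnetization:", mu.mean())
```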
