Expectation maximization tutorial

1 Expectation maximization tutorial
Octavian Ganea, November 18

2 Today
Expectation-maximization algorithm
Topic modelling

3-7 ML & MAP
Observed data: $X = \{x_1, x_2, \ldots, x_N\}$
Probabilistic model of the data: $p(X \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$
Estimate parameters:
Maximum likelihood: $\hat{\theta}_{ML} = \arg\max_\theta p(X \mid \theta)$
Maximum a-posteriori: $\hat{\theta}_{MAP} = \arg\max_\theta p(\theta \mid X) = \arg\max_\theta \left[ p(\theta)\, p(X \mid \theta) \right]$

8-14 Maximizing the log-likelihood
Observed data: $X = \{x_1, x_2, \ldots, x_N\}$
Log-likelihood: $l(\theta) = \log p(X \mid \theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)$
Latent variables: $\log p(X \mid \theta) = \log \sum_Z p(X, Z \mid \theta)$
Hard to maximize $l(\theta)$ directly (no closed-form solution in most of the interesting cases).
One solution: use a gradient method (e.g. gradient ascent, Newton). But sometimes the gradient is hard to compute or hard to implement, or we do not want a black-box optimization routine with no guarantees.

15-16 Expectation-maximization algorithm
Used in models with latent variables.
Iterative algorithm that guarantees convergence to a stationary point of $l(\theta)$ (i.e. a point with zero gradient, either a local optimum or a saddle point). No global-optimum guarantees: EM reaches either a local maximum or a saddle point. Convergence speed might be slow.
Idea: build the sequence $l(\theta^{(0)}) \le l(\theta^{(1)}) \le \ldots \le l(\theta^{(t)}) \le \ldots$
At each step, using Jensen's inequality, find a lower bound $g$ such that $l(\theta^{(t)}) \le g(\theta^{(t+1)}, q) \le l(\theta^{(t+1)})$.

17 Expectation-maximization algorithm
For any probability distribution $q(Z)$ (s.t. $\sum_Z q(Z) = 1$), Jensen's inequality gives a lower bound $F(q, \theta)$ on the true log-likelihood:
$$ l(\theta) = \log \sum_Z p(X, Z \mid \theta) = \log \sum_Z q(Z)\, \frac{p(X, Z \mid \theta)}{q(Z)} \;\ge\; \sum_Z q(Z) \log \frac{p(X, Z \mid \theta)}{q(Z)} =: F(q, \theta) $$
Reason: $\log(\cdot)$ is concave.
Equality case: $q(Z) = p(Z \mid X, \theta)$.

18 Expectation-maximization algorithm
Update rule: $\theta^{(t+1)} = \arg\max_\theta g_t(\theta)$, where
$$ g_t(\theta) := F\big(p(Z \mid X, \theta^{(t)}),\, \theta\big) = \sum_Z p(Z \mid X, \theta^{(t)}) \log \frac{p(X, Z \mid \theta)}{p(Z \mid X, \theta^{(t)})} $$
From above, $g_t(\theta) \le l(\theta)$ for all $\theta$; in particular, $g_t(\theta^{(t+1)}) \le l(\theta^{(t+1)})$.
Equality in Jensen: $g_t(\theta^{(t)}) = l(\theta^{(t)})$.
So: $l(\theta^{(t)}) = g_t(\theta^{(t)}) \le g_t(\theta^{(t+1)}) \le l(\theta^{(t+1)})$.

19 Expectation-maximization algorithm
EM algorithm:
E-step: $q^{(t+1)} = \arg\max_q F(q, \theta^{(t)})$, i.e. $q^{(t+1)} = p(Z \mid X, \theta^{(t)})$
M-step: $\theta^{(t+1)} = \arg\max_\theta F(q^{(t+1)}, \theta)$
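
To make the alternation concrete, here is a minimal generic sketch in Python (not from the slides; `e_step` and `m_step` are hypothetical user-supplied callables that implement the two maximizations above for a specific model):

```python
from typing import Any, Callable

def em(x: Any,
       theta: Any,
       e_step: Callable[[Any, Any], Any],
       m_step: Callable[[Any, Any], Any],
       n_iter: int = 100) -> Any:
    """Generic EM loop: alternate E- and M-steps for a fixed number of iterations.
    e_step(x, theta) should return q = p(Z | X, theta);
    m_step(x, q) should return argmax_theta F(q, theta)."""
    for _ in range(n_iter):
        q = e_step(x, theta)    # E-step: posterior over the latent variables
        theta = m_step(x, q)    # M-step: maximize the lower bound in theta
    return theta
```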

20 EM algorithm - convergence
We proved so far that $l(\theta^{(0)}) \le l(\theta^{(1)}) \le \ldots \le l(\theta^{(t)}) \le \ldots$
But why does it converge to a stationary point? (Who guarantees no early stopping?)
Proof: Let $\theta^*$ be the limit of the sequence defined by the EM algorithm. Then $\theta^* = \arg\max_\theta g^*(\theta)$, where $g^*(\theta) = F(p(Z \mid X, \theta^*), \theta)$. This implies $\nabla_\theta g^*(\theta^*) = 0$.
Let $h^*(\theta) := l(\theta) - g^*(\theta) = \sum_Z p(Z \mid X, \theta^*) \log \frac{p(Z \mid X, \theta^*)}{p(Z \mid X, \theta)}$.
Then $h^*(\theta) \ge 0$ for all $\theta$ (since $g^*$ is a lower bound of $l$) and $h^*(\theta^*) = 0$ (Jensen equality case).
So $\theta^* = \arg\min_\theta h^*(\theta)$, which gives $\nabla_\theta h^*(\theta^*) = 0$.
So $\nabla_\theta l(\theta^*) = \nabla_\theta h^*(\theta^*) + \nabla_\theta g^*(\theta^*) = 0$, q.e.d.

21 EM Applications
Tired of too much math? :) Let's look at some cool applications of EM.

22 Application 1: Coin Flipping
There are two coins, A and B, with $\theta_A$ and $\theta_B$ being their probabilities of landing heads when tossed.
Do 5 rounds. In each round, select one coin uniformly at random, toss it 10 times, and record the results.
The observed data consist of 50 coin tosses. However, we don't know which coin was selected in a particular round.
Estimate $\theta_A$ and $\theta_B$.

23 Application 1: Coin Flipping
Let's start simple: one coin A with $P(Y = H) = \theta_A$.
10 tosses: $\#H = x \in \{0, \ldots, 10\}$, $\#T = 10 - x$.
How to estimate $\theta_A$? Maximize what we see! Mathematically, maximize the data (log-)likelihood:
$\theta_A^* = \arg\max_{\theta_A} l(\theta_A)$, where $l(\theta_A) := \log P(X = x \mid \theta_A)$
$P(X = x \mid \theta_A) = \theta_A^x (1 - \theta_A)^{10 - x}$ (note: fixed order of tosses)
$l(\theta_A) = x \log(\theta_A) + (10 - x) \log(1 - \theta_A)$
Set the derivative to 0: $\frac{\partial l}{\partial \theta_A}(\theta_A^*) = 0 \Rightarrow \theta_A^* = \frac{x}{10}$
The best ML distribution is the empirical distribution.
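
For instance, a minimal sketch with a made-up toss sequence (not data from the slides): the ML estimate is simply the empirical frequency of heads.

```python
tosses = "HTHHHTHTHH"          # hypothetical sequence: 7 heads out of 10 tosses
x = tosses.count("H")          # number of heads
theta_A_ml = x / len(tosses)   # ML estimate = empirical head frequency
print(theta_A_ml)              # 0.7
```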

24-28 Application 1: Coin Flipping
Back to our original problem. Parameters $\theta = \{\theta_A, \theta_B\}$.
Latent r.v. $Z_r$: the coin selected in round $r \in \{1, \ldots, 5\}$, with $p(Z_r = A) = p(Z_r = B) = 0.5$.
In each round $r$, the number of heads is $x_r$, with associated r.v. $X_r$.
$p(X_r = x_r \mid Z_r = A; \theta) = \theta_A^{x_r} (1 - \theta_A)^{10 - x_r}$
Bayes' rule: $p(Z_r = A \mid x_r; \theta) = \dfrac{\theta_A^{x_r} (1 - \theta_A)^{10 - x_r}}{\theta_A^{x_r} (1 - \theta_A)^{10 - x_r} + \theta_B^{x_r} (1 - \theta_B)^{10 - x_r}}$
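
A small illustration of this Bayes-rule computation, as a sketch with assumed parameter values (the function name and the numbers are illustrative, not from the slides):

```python
def coin_posterior(x_r: int, theta_A: float, theta_B: float, n: int = 10) -> float:
    """Return p(Z_r = A | x_r; theta) for x_r heads out of n tosses,
    assuming a uniform prior over the two coins."""
    like_A = theta_A ** x_r * (1 - theta_A) ** (n - x_r)
    like_B = theta_B ** x_r * (1 - theta_B) ** (n - x_r)
    return like_A / (like_A + like_B)

print(coin_posterior(8, theta_A=0.8, theta_B=0.4))  # ~0.97: 8 heads strongly favors coin A
```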

29-32 Application 1: Coin Flipping
Data likelihood (per one round):
$$ p(x_r; \theta) = p(x_r \mid Z_r = A; \theta)\, p(Z_r = A) + p(x_r \mid Z_r = B; \theta)\, p(Z_r = B) = 0.5 \left( \theta_A^{x_r} (1 - \theta_A)^{10 - x_r} + \theta_B^{x_r} (1 - \theta_B)^{10 - x_r} \right) $$
Data log-likelihood (all rounds): $l(\theta) = \log p(X; \theta) = \sum_{r=1}^{5} \log p(x_r; \theta)$
Cannot maximize the log-likelihood directly (i.e. by setting its gradient to zero). Instead, maximize the EM lower bound on $l(\theta)$ (formalized last time).

33-35 Application 1: Coin Flipping
EM lower bound per round (Jensen's inequality):
$$ \log p(x_r; \theta) \;\ge\; \sum_{c = A, B} q_r(Z_r = c) \log \frac{p(x_r, Z_r = c; \theta)}{q_r(Z_r = c)} =: F_r(q_r, \theta) $$
Expectation step: $q_r(Z_r = c) = p(Z_r = c \mid x_r; \theta^{(t)})$, for $r \in \{1, \ldots, 5\}$
Maximization step: $\theta^{(t+1)} = \arg\max_\theta g_t(\theta)$, where $g_t(\theta) = \sum_{r=1}^{5} F_r\big(p(Z_r \mid x_r, \theta^{(t)}),\, \theta\big)$

36-38 Application 1: Coin Flipping
Maximization step:
$$ \theta^{(t+1)} = \arg\max_\theta \sum_{r=1}^{5} \sum_{c = A, B} p(Z_r = c \mid x_r, \theta^{(t)}) \log p(x_r, Z_r = c; \theta) $$
Gradient:
$$ \frac{\partial g_t(\theta)}{\partial \theta_A} = \sum_{r=1}^{5} p(Z_r = A \mid x_r, \theta^{(t)})\, \frac{\partial}{\partial \theta_A} \log p(x_r, Z_r = A; \theta) = \sum_{r=1}^{5} p(Z_r = A \mid x_r, \theta^{(t)}) \left( \frac{x_r}{\theta_A} - \frac{10 - x_r}{1 - \theta_A} \right) $$
Setting the gradient to 0 gives:
$$ \theta_A^{(t+1)} = \frac{\alpha_A^{(t)}}{\alpha_A^{(t)} + \beta_A^{(t)}}, \quad \text{where} \quad \alpha_A^{(t)} = \sum_{r=1}^{5} p(Z_r = A \mid x_r, \theta^{(t)})\, x_r, \quad \beta_A^{(t)} = \sum_{r=1}^{5} p(Z_r = A \mid x_r, \theta^{(t)})\, (10 - x_r) $$

39 Application 1: Coin Flipping
Final algorithm:
Iteration: $t \leftarrow 0$
Initialize parameters randomly: $\theta_A^{(0)}, \theta_B^{(0)} \in (0, 1)$
Do until convergence:
$\theta_A^{(t+1)} = \frac{\alpha_A^{(t)}}{\alpha_A^{(t)} + \beta_A^{(t)}}$, $\theta_B^{(t+1)} = \frac{\alpha_B^{(t)}}{\alpha_B^{(t)} + \beta_B^{(t)}}$
$t \leftarrow t + 1$
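
A minimal sketch of this loop in Python. The head counts passed in at the bottom are made-up example data, and the convergence test (a parameter-change threshold) is an added assumption; neither comes from the slides.

```python
import random

def coin_em(x, n_tosses=10, tol=1e-8, max_iter=1000, seed=0):
    """EM for two coins, with a uniform prior over which coin is tossed each round.
    x[r] is the number of heads observed in round r.
    Note: the labels A/B may come out swapped (the model is symmetric)."""
    rng = random.Random(seed)
    theta_A, theta_B = rng.uniform(0.01, 0.99), rng.uniform(0.01, 0.99)
    for _ in range(max_iter):
        # E-step: responsibilities p(Z_r = A | x_r; theta) via Bayes' rule
        resp_A = []
        for xr in x:
            like_A = theta_A ** xr * (1 - theta_A) ** (n_tosses - xr)
            like_B = theta_B ** xr * (1 - theta_B) ** (n_tosses - xr)
            resp_A.append(like_A / (like_A + like_B))
        # M-step: expected head/tail counts give the new parameters
        alpha_A = sum(p * xr for p, xr in zip(resp_A, x))
        beta_A = sum(p * (n_tosses - xr) for p, xr in zip(resp_A, x))
        alpha_B = sum((1 - p) * xr for p, xr in zip(resp_A, x))
        beta_B = sum((1 - p) * (n_tosses - xr) for p, xr in zip(resp_A, x))
        new_A, new_B = alpha_A / (alpha_A + beta_A), alpha_B / (alpha_B + beta_B)
        converged = abs(new_A - theta_A) < tol and abs(new_B - theta_B) < tol
        theta_A, theta_B = new_A, new_B
        if converged:
            break
    return theta_A, theta_B

# Hypothetical observations: heads out of 10 tosses in each of the 5 rounds
print(coin_em([9, 8, 4, 3, 7]))
```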

40 Application 2: Topic Modelling
Document representations: used for classification, query retrieval, document similarity, etc.
A document can be seen as a multi-set of words: $d = \{(w_i, \mathrm{tf}(w_i; d))\}_{i=1,\ldots,|V|} \in \mathbb{R}^{|V|}$
Issues: high dimensionality, sparsity issues, potentially many infrequent words (with noisily estimated parameters).
Alternative (compressed topic representation): topic distributions $d = \{(t, p(t \mid d))\}_{t=1,\ldots,K} \in \mathbb{R}^{K}$, where $K$ = number of topics and $K \ll |V|$.
How to choose the number of topics $K$? It is a hyper-parameter: pick the one that gives the best performance on a validation set for the task at hand, or minimize the perplexity of seen words.
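
As a small illustration of the raw term-frequency representation above (a sketch; the example text and the use of `collections.Counter` are illustrative, not from the slides):

```python
from collections import Counter

doc = "the cat sat on the mat and the cat slept"
tf = Counter(doc.split())   # sparse term-frequency representation of d
print(tf)                   # Counter({'the': 3, 'cat': 2, 'sat': 1, ...})
```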

41-45 Application 2: Topic Modelling
Model parameters (to be learned): $\pi_t := p(t \mid d)$, $a_{nt} := p(w_n \mid t)$
Log-likelihood (one document):
$$ l(\pi) = \sum_{n=1}^{N} \log p(w_n \mid d) = \sum_{n=1}^{N} \log \sum_{t=1}^{T} \pi_t a_{nt} $$
Iterative algorithm: keep $a_{nt}$ fixed and learn $\pi_t$; and reverse. We do here just the update of $\pi_t$; the update of $a_{nt}$ is similar.
Log-likelihood with Lagrange multipliers:
$$ L(\pi, \lambda) = \sum_{n=1}^{N} \log \sum_{t=1}^{T} \pi_t a_{nt} - \lambda \left( \sum_{t=1}^{T} \pi_t - 1 \right) $$

46-50 Application 2: Topic Modelling
Iterative update algorithm. The latent variables $Z$ are now the topics $t$.
EM lower bound using Jensen's inequality:
$$ L(\pi, \lambda) \;\ge\; F(q, \pi, \lambda) = \sum_{n=1}^{N} \sum_{t=1}^{T} q_{nt} \left[ \log \frac{\pi_t}{q_{nt}} + \log a_{nt} \right] - \lambda \left( \sum_{t=1}^{T} \pi_t - 1 \right), \quad \text{where } \sum_t q_{nt} = 1 \ \forall n $$
E-step, iteration $k$: $q_{nt}^{(k)} = \dfrac{\pi_t^{(k)} a_{nt}}{\sum_{t'} \pi_{t'}^{(k)} a_{nt'}}$
M-step, iteration $k$: $\pi_t^{(k+1)} = \dfrac{\pi_t^{(k)}}{N} \sum_{n=1}^{N} \dfrac{a_{nt}}{\sum_{t'} \pi_{t'}^{(k)} a_{nt'}}$
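
The M-step update above follows from setting $\partial F / \partial \pi_t = 0$; a short derivation filling in the steps the slides skip:
$$ \frac{\partial F}{\partial \pi_t} = \sum_{n=1}^{N} \frac{q_{nt}}{\pi_t} - \lambda = 0 \;\Rightarrow\; \pi_t = \frac{1}{\lambda} \sum_{n=1}^{N} q_{nt}. $$
Summing over $t$ and using $\sum_t \pi_t = 1$ and $\sum_t q_{nt} = 1$ gives $\lambda = N$, hence $\pi_t^{(k+1)} = \frac{1}{N} \sum_{n=1}^{N} q_{nt}^{(k)}$, which is the stated update once $q_{nt}^{(k)}$ is expanded.

A minimal sketch of these E/M updates in Python (the toy word-topic probabilities `a[n][t]`, standing for $a_{nt}$, are made up, not from the slides):

```python
def update_pi(a, n_iter=50):
    """EM updates for the topic proportions pi of one document,
    with the word-topic probabilities a[n][t] = p(w_n | t) kept fixed."""
    N, T = len(a), len(a[0])
    pi = [1.0 / T] * T                                   # uniform initialization
    for _ in range(n_iter):
        # E-step: q[n][t] = pi_t * a_nt / sum_t' pi_t' * a_nt'
        q = []
        for n in range(N):
            denom = sum(pi[t] * a[n][t] for t in range(T))
            q.append([pi[t] * a[n][t] / denom for t in range(T)])
        # M-step: pi_t = (1/N) * sum_n q[n][t]
        pi = [sum(q[n][t] for n in range(N)) / N for t in range(T)]
    return pi

# Toy document with N = 4 word occurrences and T = 2 topics
a = [[0.6, 0.1], [0.5, 0.2], [0.1, 0.7], [0.4, 0.3]]
print(update_pi(a))
```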

51 Questions?
