Latent Variable Models and EM Algorithm


1 SC4/SM8 Advanced Topics in Statistical Machine Learning
Latent Variable Models and EM Algorithm
Dino Sejdinovic, Department of Statistics, Oxford
Slides and other materials available from the course website.

2 Probabilistic Unsupervised Learning

3 Probabilistic Methods
Algorithmic approach: Data → Algorithm → Analysis / Interpretation.
Probabilistic modelling approach: an unobserved process, described by a generative model, produces the Data, which is then used for Analysis / Interpretation.

4 Mixture Models
Mixture models suppose that our dataset $\mathcal{X}$ was created by sampling i.i.d. from $K$ distinct populations (called mixture components). Samples in population $k$ can be modelled using a distribution $F_{\mu_k}$ with density $f(x \mid \mu_k)$, where $\mu_k$ is the model parameter for the $k$-th component. For a concrete example, consider a Gaussian with unknown mean $\mu_k$ and known diagonal covariance $\sigma^2 I$,
$f(x \mid \mu_k) = (2\pi\sigma^2)^{-p/2} \exp\left( -\tfrac{1}{2\sigma^2} \| x - \mu_k \|_2^2 \right).$
Generative model: for $i = 1, 2, \ldots, n$:
First determine the assignment variable independently for each data item $i$: $Z_i \sim \mathrm{Discrete}(\pi_1, \ldots, \pi_K)$, i.e. $P(Z_i = k) = \pi_k$, where the mixing proportions satisfy $\pi_k \geq 0$ for each $k$ and $\sum_{k=1}^K \pi_k = 1$.
Given the assignment $Z_i = k$, then $X_i = (X_i^{(1)}, \ldots, X_i^{(p)})$ is sampled (independently) from the corresponding $k$-th component: $X_i \mid Z_i = k \sim f(x \mid \mu_k)$.
We observe $X_i = x_i$ for each $i$ but not the $Z_i$'s (latent variables), and would like to infer the parameters $\{\mu_k\}_{k=1}^K$ and $\{\pi_k\}_{k=1}^K$ ($\sigma^2$ can also be estimated).
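To make the generative process concrete, here is a minimal sampling sketch in Python/NumPy, assuming spherical Gaussian components $f(x \mid \mu_k) = \mathcal{N}(\mu_k, \sigma^2 I)$; the particular values of $K$, $\pi$, the means and $\sigma$ are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

K, p, n = 3, 2, 500
pi = np.array([0.5, 0.3, 0.2])                 # mixing proportions, sum to 1
mu = np.array([[0.0, 0.0],
               [4.0, 4.0],
               [-4.0, 3.0]])                   # component means, one row per component
sigma = 1.0                                    # known common standard deviation

z = rng.choice(K, size=n, p=pi)                # Z_i ~ Discrete(pi_1, ..., pi_K)
x = mu[z] + sigma * rng.standard_normal((n, p))   # X_i | Z_i = k ~ N(mu_k, sigma^2 I_p)
```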

5 Mixture Models
Unknowns to learn given data are
Parameters: $\theta = (\pi_k, \mu_k)_{k=1}^K$, where $\pi_1, \ldots, \pi_K \in [0,1]$ and $\mu_1, \ldots, \mu_K \in \mathbb{R}^p$, and
Latent variables: $z_1, \ldots, z_n$.
The joint probability over all cluster indicator variables $\{Z_i\}$ is:
$p_Z((z_i)_{i=1}^n) = \prod_{i=1}^n \pi_{z_i} = \prod_{i=1}^n \prod_{k=1}^K \pi_k^{\mathbf{1}(z_i = k)}$
The joint density at the observations $X_i = x_i$ given $Z_i = z_i$ is:
$p_X((x_i)_{i=1}^n \mid (Z_i = z_i)_{i=1}^n) = \prod_{i=1}^n f(x_i \mid \mu_{z_i}) = \prod_{i=1}^n \prod_{k=1}^K f(x_i \mid \mu_k)^{\mathbf{1}(z_i = k)}$

6 Mixture Models: Joint pmf/pdf of observed and latent variables
Unknowns to learn given data are
Parameters: $\theta = (\pi_k, \mu_k)_{k=1}^K$, where $\pi_1, \ldots, \pi_K \in [0,1]$ and $\mu_1, \ldots, \mu_K \in \mathbb{R}^p$, and
Latent variables: $z_1, \ldots, z_n$.
The joint probability mass function/density is:
$p_{X,Z}((x_i, z_i)_{i=1}^n) = p_Z((z_i)_{i=1}^n)\, p_X((x_i)_{i=1}^n \mid (Z_i = z_i)_{i=1}^n) = \prod_{i=1}^n \prod_{k=1}^K \left( \pi_k f(x_i \mid \mu_k) \right)^{\mathbf{1}(z_i = k)}$
And the marginal density of $x_i$ (the resulting model on the observed data) is:
$p(x_i) = \sum_{j=1}^K p(Z_i = j, x_i) = \sum_{j=1}^K \pi_j f(x_i \mid \mu_j).$

7 Mixture Models: Gaussian Mixtures with Unequal Covariances
The most widely used mixture model is the mixture of Gaussians (MoG), also called a Gaussian mixture model (GMM), in which each component is a multivariate Gaussian with its own mean $\mu_k$ and covariance matrix $\Sigma_k$. Here $\theta = (\pi_k, \mu_k, \Sigma_k)_{k=1}^K$ are all the model parameters and
$f(x \mid (\mu_k, \Sigma_k)) = (2\pi)^{-p/2} |\Sigma_k|^{-1/2} \exp\left( -\tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right),$
$p(x) = \sum_{k=1}^K \pi_k f(x \mid (\mu_k, \Sigma_k)).$
[Figure from Murphy, 2012, Ch. 11: a mixture of 3 Gaussians in 2d; (a) contours of constant probability for each component, (b) a surface plot of the overall density.]
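As a sketch of evaluating this mixture density in practice, the snippet below uses scipy.stats.multivariate_normal for the component densities; the two-component parameter values are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.6, 0.4])                                     # mixing proportions
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]             # component means
Sigma = [np.array([[1.0, 0.5], [0.5, 2.0]]),                  # unequal covariance matrices
         np.array([[2.0, -0.3], [-0.3, 0.5]])]

def gmm_density(x):
    """Mixture density p(x) = sum_k pi_k N(x | mu_k, Sigma_k) at a single point x."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=S)
               for w, m, S in zip(pi, mu, Sigma))

print(gmm_density(np.array([1.0, 1.0])))
```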

8 Mixture Models: Responsibility
Suppose we know the parameters $\theta = (\pi_k, \mu_k)_{k=1}^K$. $Z_i$ is a random variable and its conditional distribution given the data set $\mathcal{X}$ is:
$Q_{ik} := p(Z_i = k \mid x_i) = \dfrac{p(Z_i = k, x_i)}{p(x_i)} = \dfrac{\pi_k f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)}$
The conditional probability $Q_{ik}$ is called the responsibility of mixture component $k$ for data point $x_i$. These conditionals softly partition the dataset among the $K$ components: $\sum_{k=1}^K Q_{ik} = 1$.
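A possible implementation of the responsibilities for spherical Gaussian components, working in log-space (via scipy.special.logsumexp) for numerical stability; the function name and array layout are choices made here, not from the slides.

```python
import numpy as np
from scipy.special import logsumexp

def responsibilities(x, pi, mu, sigma2):
    """Q_ik = pi_k f(x_i | mu_k) / sum_j pi_j f(x_i | mu_j) for spherical Gaussians.

    x: (n, p) data, pi: (K,) mixing proportions, mu: (K, p) means, sigma2: scalar variance."""
    n, p = x.shape
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)        # ||x_i - mu_k||^2, shape (n, K)
    log_joint = np.log(pi)[None, :] - 0.5 * p * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))   # rows sum to 1
```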

9 Mixture Models: Maximum Likelihood
How can we learn about the parameters $\theta = (\pi_k, \mu_k)_{k=1}^K$ from data? Standard statistical methodology asks for the maximum likelihood estimator (MLE). The goal is to maximise the marginal probability of the data over the parameters:
$\hat\theta_{ML} = \operatorname*{argmax}_\theta p(\mathcal{X} \mid \theta) = \operatorname*{argmax}_{(\pi_k, \mu_k)_{k=1}^K} \prod_{i=1}^n p(x_i \mid (\pi_k, \mu_k)_{k=1}^K) = \operatorname*{argmax}_{(\pi_k, \mu_k)_{k=1}^K} \prod_{i=1}^n \sum_{k=1}^K \pi_k f(x_i \mid \mu_k) = \operatorname*{argmax}_{(\pi_k, \mu_k)_{k=1}^K} \underbrace{\sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k)}_{=: \ell((\pi_k, \mu_k)_{k=1}^K)}.$

10 Mixture Models: Maximum Likelihood
Marginal log-likelihood:
$\ell((\pi_k, \mu_k)_{k=1}^K) := \log p(\mathcal{X} \mid (\pi_k, \mu_k)_{k=1}^K) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k)$
The gradient w.r.t. $\mu_k$:
$\nabla_{\mu_k} \ell((\pi_k, \mu_k)_{k=1}^K) = \sum_{i=1}^n \dfrac{\pi_k f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)} \nabla_{\mu_k} \log f(x_i \mid \mu_k) = \sum_{i=1}^n Q_{ik} \nabla_{\mu_k} \log f(x_i \mid \mu_k).$
Difficult to solve, as $Q_{ik}$ depends implicitly on $\mu_k$.
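As a sanity check of this identity, the sketch below compares the analytic gradient $\sum_i Q_{ik}(x_i - \mu_k)/\sigma^2$ (the spherical-Gaussian case, where $\nabla_{\mu_k}\log f(x_i \mid \mu_k) = (x_i - \mu_k)/\sigma^2$) with a central finite difference of $\ell$; all data and parameter values are synthetic and chosen only for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def log_joint(x, pi, mu, sigma2):
    """log(pi_k f(x_i | mu_k)) as an (n, K) array, for spherical Gaussian components."""
    p = x.shape[1]
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    return np.log(pi)[None, :] - 0.5 * p * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)

def log_lik(x, pi, mu, sigma2):
    """Marginal log-likelihood l(theta) = sum_i log sum_k pi_k f(x_i | mu_k)."""
    return logsumexp(log_joint(x, pi, mu, sigma2), axis=1).sum()

rng = np.random.default_rng(1)
x = rng.standard_normal((50, 2))                       # toy data
pi, mu, sigma2 = np.array([0.4, 0.6]), rng.standard_normal((2, 2)), 1.0

lj = log_joint(x, pi, mu, sigma2)
Q = np.exp(lj - logsumexp(lj, axis=1, keepdims=True))  # responsibilities Q_ik
analytic = (Q[:, [0]] * (x - mu[0]) / sigma2).sum(axis=0)   # sum_i Q_i1 grad log f(x_i | mu_1)

eps = 1e-5
numeric = np.zeros(2)
for d in range(2):                                     # central differences in mu_1
    mu_p, mu_m = mu.copy(), mu.copy()
    mu_p[0, d] += eps
    mu_m[0, d] -= eps
    numeric[d] = (log_lik(x, pi, mu_p, sigma2) - log_lik(x, pi, mu_m, sigma2)) / (2 * eps)

print(np.allclose(analytic, numeric, rtol=1e-4, atol=1e-4))   # expect True
```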

11 Mixture Models: Likelihood Surface for a Simple Example
If the latent variables $z_i$ were all observed, we would have a unimodal likelihood surface, but when we marginalise out the latents, the likelihood surface becomes multimodal: there is no unique MLE. Mixture models are not identifiable: many settings of the parameters give the same distribution.
[Figure from Murphy, 2012, Fig. 11.6: (left) $n = 200$ data points from a mixture of two 1D Gaussians with $\pi_1 = \pi_2 = 0.5$, $\sigma = 5$, $\mu_1 = -10$ and $\mu_2 = 10$; (right) the observed-data log-likelihood surface $\ell(\mu_1, \mu_2)$, all other parameters set to their true values, showing two symmetric modes that reflect the unidentifiability of the parameters.]

12 Mixture Models: Maximum Likelihood
Recall we would like to solve:
$\nabla_{\mu_k} \ell((\pi_k, \mu_k)_{k=1}^K) = \sum_{i=1}^n Q_{ik} \nabla_{\mu_k} \log f(x_i \mid \mu_k) = 0$
What if we ignore the dependence of $Q_{ik}$ on the parameters? Taking the mixture of Gaussians with covariance $\sigma^2 I$ as an example,
$\sum_{i=1}^n Q_{ik} \nabla_{\mu_k} \left( -\tfrac{p}{2} \log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2} \| x_i - \mu_k \|_2^2 \right) = \tfrac{1}{\sigma^2} \sum_{i=1}^n Q_{ik} (x_i - \mu_k) = \tfrac{1}{\sigma^2} \left( \sum_{i=1}^n Q_{ik} x_i - \mu_k \sum_{i=1}^n Q_{ik} \right) = 0$
$\Rightarrow \quad \mu_k^{ML?} = \dfrac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}$

13 Mixture Models: Maximum Likelihood
The estimate is a weighted average of data points, where the estimated mean of cluster $k$ uses its responsibilities to data points as weights:
$\mu_k^{ML?} = \dfrac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}.$
Makes sense: suppose we knew that data point $x_i$ came from population $z_i$. Then $Q_{i z_i} = 1$ and $Q_{ik} = 0$ for $k \neq z_i$, and:
$\mu_k^{ML?} = \dfrac{\sum_{i: z_i = k} x_i}{\sum_{i: z_i = k} 1} = \operatorname{avg}\{ x_i : z_i = k \}.$
Our best guess of the originating population is given by $Q_{ik}$. A soft K-means algorithm?

14 Mixture Models: Maximum Likelihood
Gradient w.r.t. the mixing proportion $\pi_k$ (including a Lagrange multiplier term $\lambda \left( \sum_k \pi_k - 1 \right)$ to enforce the constraint $\sum_k \pi_k = 1$):
$\dfrac{\partial}{\partial \pi_k} \left( \ell((\pi_k, \mu_k)_{k=1}^K) - \lambda \left( \sum_{k=1}^K \pi_k - 1 \right) \right) = \sum_{i=1}^n \dfrac{f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)} - \lambda = \sum_{i=1}^n \dfrac{Q_{ik}}{\pi_k} - \lambda = 0 \quad\Rightarrow\quad \pi_k \propto \sum_{i=1}^n Q_{ik}.$
Note:
$\sum_{k=1}^K \sum_{i=1}^n Q_{ik} = \sum_{i=1}^n \underbrace{\sum_{k=1}^K Q_{ik}}_{=1} = n \quad\Rightarrow\quad \pi_k^{ML?} = \dfrac{\sum_{i=1}^n Q_{ik}}{n}.$
Again makes sense: the estimate is simply (our best guess of) the proportion of data points coming from population $k$.

15 Mixture Models: The EM Algorithm
Putting all the derivations together, we get an iterative algorithm for learning about the unknowns in the mixture model.
Start with some initial parameters $(\pi_k^{(0)}, \mu_k^{(0)})_{k=1}^K$. Iterate for $t = 1, 2, \ldots$:
Expectation step:
$Q_{ik}^{(t)} := \dfrac{\pi_k^{(t-1)} f(x_i \mid \mu_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i \mid \mu_j^{(t-1)})}$
Maximization step:
$\pi_k^{(t)} = \dfrac{\sum_{i=1}^n Q_{ik}^{(t)}}{n}, \qquad \mu_k^{(t)} = \dfrac{\sum_{i=1}^n Q_{ik}^{(t)} x_i}{\sum_{i=1}^n Q_{ik}^{(t)}}$
Will the algorithm converge? What does it converge to?
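A minimal EM sketch for this spherical-Gaussian mixture (with $\sigma^2$ treated as known and fixed), following the E and M steps above; the initialisation at randomly chosen data points and the fixed iteration count are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def em_gmm(x, K, sigma2=1.0, n_iter=50, seed=0):
    """EM for a mixture of spherical Gaussians with known, shared variance sigma2."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    pi = np.full(K, 1.0 / K)                               # uniform mixing proportions
    mu = x[rng.choice(n, size=K, replace=False)]           # initialise means at random data points
    for _ in range(n_iter):
        # E step: responsibilities Q_ik, computed in log-space for stability
        sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        log_joint = np.log(pi)[None, :] - 0.5 * p * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)
        Q = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))
        # M step: re-estimate mixing proportions and component means
        Nk = Q.sum(axis=0)                                 # effective number of points per component
        pi = Nk / n
        mu = (Q.T @ x) / Nk[:, None]
    return pi, mu, Q
```

On data like the three-cluster example shown next, a handful of iterations is typically enough for the responsibilities to separate the components.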

16 Example: Mixture of 3 Gaussians
An example with 3 clusters. [Figure: scatter plot of the data, X1 vs X2.]

17 Example: Mixture of 3 Gaussians
After 1st E and M step. [Figure: Iteration 1.]

18 Example: Mixture of 3 Gaussians
After 2nd E and M step. [Figure: Iteration 2.]

19 Example: Mixture of 3 Gaussians
After 3rd E and M step. [Figure: Iteration 3.]

20 Example: Mixture of 3 Gaussians
After 4th E and M step. [Figure: Iteration 4.]

21 Example: Mixture of 3 Gaussians
After 5th E and M step. [Figure: Iteration 5.]

22 EM Algorithm
In a maximum likelihood framework, the objective function is the log likelihood,
$\ell(\theta) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k).$
Direct maximisation is not feasible. Consider another objective function $F(\theta, q)$, where $q$ is any probability distribution on the latent variables $z$, such that:
$F(\theta, q) \leq \ell(\theta)$ for all $\theta, q$, and $\max_q F(\theta, q) = \ell(\theta)$.
$F(\theta, q)$ is a lower bound on the log likelihood. We can construct an alternating maximisation algorithm as follows. For $t = 1, 2, \ldots$ until convergence:
$q^{(t)} := \operatorname*{argmax}_q F(\theta^{(t-1)}, q)$
$\theta^{(t)} := \operatorname*{argmax}_\theta F(\theta, q^{(t)})$

23 EM Algorithm
The lower bound we use is called the variational free energy. Here $q$ is a probability mass function for a distribution over $z := (z_i)_{i=1}^n$:
$F(\theta, q) = \mathbb{E}_q\left[ \log p(x, z \mid \theta) - \log q(z) \right] = \mathbb{E}_q\left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k) \left( \log \pi_k + \log f(x_i \mid \mu_k) \right) \right) - \log q(z) \right] = \sum_z q(z) \left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k) \left( \log \pi_k + \log f(x_i \mid \mu_k) \right) \right) - \log q(z) \right]$
Lemma: $F(\theta, q) \leq \ell(\theta)$ for all $q$ and for all $\theta$.

24 EM Algorithm: Solving for q
Lemma: $F(\theta, q) = \ell(\theta)$ for $q(z) = p(z \mid x, \theta)$.
In combination with the previous Lemma, this implies that $q(z) = p(z \mid x, \theta)$ maximizes $F(\theta, q)$ for fixed $\theta$, i.e., the optimal $q$ is simply the conditional distribution of the latents given the data and that fixed $\theta$.
In the mixture model,
$q^\star(z) = \dfrac{p(z, x \mid \theta)}{p(x \mid \theta)} = \dfrac{\prod_{i=1}^n \pi_{z_i} f(x_i \mid \mu_{z_i})}{\sum_{z'} \prod_{i=1}^n \pi_{z'_i} f(x_i \mid \mu_{z'_i})} = \prod_{i=1}^n \dfrac{\pi_{z_i} f(x_i \mid \mu_{z_i})}{\sum_k \pi_k f(x_i \mid \mu_k)} = \prod_{i=1}^n p(z_i \mid x_i, \theta).$
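The two Lemmas can be checked numerically. The sketch below restricts $q$ to distributions that factorise over the $z_i$ (an assumption made here for simplicity, but one that includes the optimal $q^\star$ above) and verifies that $F(\theta, q) \leq \ell(\theta)$ for an arbitrary such $q$, with equality when each factor is the posterior $p(z_i \mid x_i, \theta)$; spherical Gaussian components and toy parameter values are assumed.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
x = rng.standard_normal((100, 2))                         # toy data
pi, mu, sigma2 = np.array([0.3, 0.7]), np.array([[0.0, 0.0], [1.0, 1.0]]), 1.0

p = x.shape[1]
sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
log_joint = np.log(pi)[None, :] - 0.5 * p * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)

def free_energy(q):
    """F(theta, q) = E_q[log p(x, z | theta) - log q(z)] for factorised q, q[i, k] = q_i(z_i = k)."""
    return (q * (log_joint - np.log(q))).sum()

loglik = logsumexp(log_joint, axis=1).sum()               # l(theta)

q_arbitrary = rng.dirichlet(np.ones(2), size=len(x))      # some other factorised q
q_posterior = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

print(free_energy(q_arbitrary) <= loglik + 1e-9)          # expect True: F is a lower bound
print(np.isclose(free_energy(q_posterior), loglik))       # expect True: the bound is tight at the posterior
```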

25 EM Algorithm: Solving for θ
Setting the derivative with respect to $\mu_k$ to 0,
$\nabla_{\mu_k} F(\theta, q) = \sum_z q(z) \sum_{i=1}^n \mathbf{1}(z_i = k) \nabla_{\mu_k} \log f(x_i \mid \mu_k) = \sum_{i=1}^n q(z_i = k) \nabla_{\mu_k} \log f(x_i \mid \mu_k) = 0$
This equation can be solved quite easily. E.g., for a mixture of Gaussians,
$\mu_k = \dfrac{\sum_{i=1}^n q(z_i = k) x_i}{\sum_{i=1}^n q(z_i = k)}$
If it cannot be solved exactly, we can use a gradient ascent step instead (generalized EM):
$\mu_k \leftarrow \mu_k + \alpha \sum_{i=1}^n q(z_i = k) \nabla_{\mu_k} \log f(x_i \mid \mu_k).$
A similar derivation gives the optimal $\pi_k$ as before.
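A sketch of the generalized-EM update for a single component mean, again assuming spherical Gaussian components so that $\nabla_{\mu_k} \log f(x_i \mid \mu_k) = (x_i - \mu_k)/\sigma^2$; the step size $\alpha$ and the function name are illustrative choices, not from the slides.

```python
import numpy as np

def generalised_em_step_mu(x, q_k, mu_k, sigma2, alpha=0.1):
    """One gradient-ascent update of a single component mean mu_k.

    x: (n, p) data; q_k: (n,) responsibilities q(z_i = k); alpha: step size.
    Useful when the exact M-step has no closed form."""
    grad = (q_k[:, None] * (x - mu_k) / sigma2).sum(axis=0)   # sum_i q(z_i = k) grad log f(x_i | mu_k)
    return mu_k + alpha * grad
```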

26 EM Algorithm
Start with some initial parameters $(\pi_k^{(0)}, \mu_k^{(0)})_{k=1}^K$. Iterate for $t = 1, 2, \ldots$:
Expectation step:
$q^{(t)}(z_i = k) := p(z_i = k \mid x_i, \theta^{(t-1)}) = \dfrac{\pi_k^{(t-1)} f(x_i \mid \mu_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i \mid \mu_j^{(t-1)})}$
Maximization step:
$\pi_k^{(t)} = \dfrac{\sum_{i=1}^n q^{(t)}(z_i = k)}{n}, \qquad \mu_k^{(t)} = \dfrac{\sum_{i=1}^n q^{(t)}(z_i = k) x_i}{\sum_{i=1}^n q^{(t)}(z_i = k)}$
Theorem: The EM algorithm does not decrease the log likelihood.
Proof: $\ell(\theta^{(t-1)}) = F(\theta^{(t-1)}, q^{(t)}) \leq F(\theta^{(t)}, q^{(t)}) \leq F(\theta^{(t)}, q^{(t+1)}) = \ell(\theta^{(t)}).$
Under the additional assumption that the Hessians $\nabla^2_\theta F(\theta^{(t)}, q^{(t)})$ are negative definite with eigenvalues bounded away from zero (at most $-\epsilon < 0$), it follows that $\theta^{(t)} \to \theta^\star$, where $\theta^\star$ is a local MLE.

27 Notes on the Probabilistic Approach and the EM Algorithm
Some good things:
Guaranteed convergence to locally optimal parameters.
Formal reasoning about uncertainties, using both Bayes' Theorem and maximum likelihood theory.
The rich language of probability theory to express a wide range of generative models, and straightforward derivation of algorithms for ML estimation.
Some bad things:
Can get stuck in local optima, so multiple restarts are recommended.
Slower and more expensive than K-means.
Choice of K is still problematic, but a rich array of methods for model selection comes to the rescue.

28 Flexible Gaussian Mixture Models
We can allow each cluster to have its own mean and covariance structure to enable greater flexibility in the model:
Different covariances.
Different, but diagonal, covariances.
Identical covariances.
Identical and spherical covariances.
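If scikit-learn is available, its GaussianMixture estimator exposes similar covariance structures through the covariance_type argument: 'full' and 'diag' match the first two options, 'tied' gives a covariance shared across components, while scikit-learn's 'spherical' means a separate scalar variance per component rather than one shared spherical covariance. A sketch on synthetic two-cluster data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),      # synthetic two-cluster data
               rng.normal(4.0, 0.5, size=(200, 2))])

for cov_type in ["full", "diag", "tied", "spherical"]:
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type, random_state=0).fit(x)
    print(cov_type, gmm.bic(x))                           # compare the fitted structures, e.g. via BIC
```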

29 Probabilistic PCA
A probabilistic model related to PCA (also known as sensible PCA) has the following generative model. Let $k < n, p$ be given. For $i = 1, 2, \ldots, n$:
Let $Y_i$ be a (latent) $k$-dimensional normally distributed random variable with zero mean and identity covariance:
$Y_i \sim \mathcal{N}(0, I_k)$
We model the distribution of the $i$-th data point given $Y_i$ as a $p$-dimensional normal:
$X_i \mid Y_i = y_i \sim \mathcal{N}(\mu + L y_i, \sigma^2 I)$
where the parameters are a vector $\mu \in \mathbb{R}^p$, a matrix $L \in \mathbb{R}^{p \times k}$ and $\sigma^2 > 0$.
Tipping and Bishop, 1999.
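A minimal sketch of sampling from this generative model; the dimensions and parameter values are illustrative, and the loading matrix is drawn at random rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 500, 10, 2
mu = rng.standard_normal(p)                     # mean vector in R^p
L = rng.standard_normal((p, k))                 # loading matrix in R^{p x k}
sigma = 0.5

Y = rng.standard_normal((n, k))                 # Y_i ~ N(0, I_k)
X = mu + Y @ L.T + sigma * rng.standard_normal((n, p))   # X_i | Y_i ~ N(mu + L Y_i, sigma^2 I_p)
```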

30 Probabilistic PCA: EM vs MLE
The EM algorithm can be used for ML estimation (see lecture notes), but PPCA also admits a direct MLE (which is not unique). Let $\lambda_1 \geq \cdots \geq \lambda_p$ be the eigenvalues of the sample covariance and $V_{1:k} \in \mathbb{R}^{p \times k}$ the top $k$ eigenvectors as before. Let $Q \in \mathbb{R}^{k \times k}$ be any orthogonal matrix. Then an MLE is given by:
$\mu^{MLE} = \bar{x}, \qquad (\sigma^2)^{MLE} = \dfrac{1}{p - k} \sum_{j=k+1}^p \lambda_j, \qquad L^{MLE} = V_{1:k}\, \mathrm{diag}\!\left( (\lambda_1 - (\sigma^2)^{MLE})^{1/2}, \ldots, (\lambda_k - (\sigma^2)^{MLE})^{1/2} \right) Q.$
However, EM can be faster, can be implemented online, can handle missing data and can be extended to more complicated models!
Tipping and Bishop, 1999.
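A sketch of this closed-form MLE (taking $Q = I$), computed from an eigendecomposition of the sample covariance; the function name and the use of the biased (1/n) covariance estimate are choices made here for illustration.

```python
import numpy as np

def ppca_mle(X, k):
    """Closed-form PPCA MLE (Tipping & Bishop) with the rotation Q set to the identity."""
    n, p = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)                 # sample covariance (1/n convention)
    lam, V = np.linalg.eigh(S)                             # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]                         # reorder so lambda_1 >= ... >= lambda_p
    sigma2 = lam[k:].mean()                                # (1/(p-k)) sum_{j>k} lambda_j
    L = V[:, :k] * np.sqrt(lam[:k] - sigma2)               # V_{1:k} diag((lambda_j - sigma2)^{1/2})
    return mu, L, sigma2
```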

31 Probabilistic PCA
[Figure: PPCA latents and the principal subspace; figures from M. Sahani's UCL course on Unsupervised Learning.]

32 Probabilistic PCA
[Figure: PPCA latents, the PCA projection and the principal subspace.]

33 Probabilistic PCA
[Figure: PPCA latents, the PPCA latent prior, the PPCA noise and the principal subspace.]

34 Probabilistic PCA
[Figure: PPCA latents, latent prior, noise, posterior, the PPCA projection and the principal subspace.]

35 Mixture of Probabilistic PCAs
We have learnt two types of unsupervised learning techniques:
Dimensionality reduction, e.g. PCA, MDS, Isomap.
Clustering, e.g. K-means, linkage and mixture models.
Probabilistic models allow us to construct more complex models from simpler pieces. A mixture of probabilistic PCAs allows both clustering and dimensionality reduction at the same time:
$Z_i \sim \mathrm{Discrete}(\pi_1, \ldots, \pi_K)$
$Y_i \sim \mathcal{N}(0, I_d)$
$X_i \mid Z_i = k, Y_i = y_i \sim \mathcal{N}(\mu_k + L y_i, \sigma^2 I_p)$
This allows flexible modelling of covariance structure without using too many parameters.
Ghahramani and Hinton, 1996.
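Combining the earlier sampling sketches gives a generative sampler for this model. As written on the slide, a single loading matrix $L$ is shared across components (component-specific loadings $L_k$ are also common in the literature); all values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d, K = 300, 5, 2, 3
pi = np.array([0.3, 0.3, 0.4])                 # mixing proportions
mu = rng.standard_normal((K, p))               # component means mu_k
L = rng.standard_normal((p, d))                # shared loading matrix
sigma = 0.3

Z = rng.choice(K, size=n, p=pi)                # Z_i ~ Discrete(pi_1, ..., pi_K)
Y = rng.standard_normal((n, d))                # Y_i ~ N(0, I_d)
X = mu[Z] + Y @ L.T + sigma * rng.standard_normal((n, p))   # X_i | Z_i=k, Y_i=y_i ~ N(mu_k + L y_i, sigma^2 I_p)
```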

36 Further Reading
Hastie et al., Section 8.5.
Bishop, Chapter 9.
Roweis and Ghahramani, A unifying review of linear Gaussian models.

More information