Lecture 6: Gaussian Mixture Models (GMM)

Helsinki Institute for Information Technology. Lecture 6: Gaussian Mixture Models (GMM). Pedram Daee, 3.11.2015.

Outline
- Gaussian Mixture Models (GMM)
- Models
  - Model families and parameters
  - Parameter learning for simple models (maximum likelihood)
- Gaussian
  - Central limit theorem (CLT)
- Mixtures
  - Multi-modal data
  - Parameter learning for GMM: Expectation Maximization
- Applications

A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities... Let's start from the very, very beginning.

Goal: describe a set of data with a model, and then use the model as the representative of the data.
What we have: a set of N d-dimensional data points, $X = \{x_1, \ldots, x_N\} = \{x_t\}_{t=1}^{N}$, $x_t \in \mathbb{R}^d$.
Example: two dimensions, d = 2, N = 16.

Assumptions:
- There exists a model that describes the data
- The designer knows the family type of the model, but not its parameters
To do: given the model family, find the best model that fits the data.
Meaning: given the distribution family, find the best parameters that fit the distribution to the observed data.

Model families and their parameters: Chi-square, Beta, Poisson, Gaussian (1-d), Gaussian (2-d), Binomial. (The slide shows an example density plot for each family.)

Example: given that we know the data are Gaussian, what kind of Gaussian distribution should we use to represent them?
Problem definition: $x_1, \ldots, x_N \sim N(\mu, \sigma^2)$. What is the best value for $\mu$?
One answer: the $\mu$ that maximizes the likelihood function, i.e. the Maximum Likelihood (ML) estimate.

Definitions:
Likelihood function: $L(X; \theta) = P(X; \theta)$
ML estimate: $\theta_{ML} = \arg\max_\theta L(X; \theta)$
Example: $x_1, \ldots, x_N \sim N(\mu, \sigma^2)$; find $\mu_{ML}$.

Solution for $\mu_{ML} = \arg\max_\mu L(X; \mu, \sigma^2)$. (Picture will be added.)
Side notes:
- If $x_1$ and $x_2$ are independent, then $P(x_1, x_2) = P(x_1) P(x_2)$.
- We can maximize the log of a (positive) function instead of the function itself, and adding a constant does not change the maximizer:
  $\arg\max_\theta [f(\theta) + \text{const}] = \arg\max_\theta f(\theta) = \arg\max_\theta \log f(\theta)$
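
Since the slide leaves the derivation to a picture, here is a short worked version of this standard result (not taken from the slide): for i.i.d. Gaussian data, the log-likelihood is

\[
\log L(X; \mu, \sigma^2) = \sum_{t=1}^{N} \log N(x_t; \mu, \sigma^2)
= -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{N}(x_t - \mu)^2,
\]

and setting the derivative with respect to $\mu$ to zero gives

\[
\frac{\partial}{\partial \mu}\log L = \frac{1}{\sigma^2}\sum_{t=1}^{N}(x_t - \mu) = 0
\quad\Rightarrow\quad
\mu_{ML} = \frac{1}{N}\sum_{t=1}^{N} x_t,
\]

i.e. the ML estimate of the mean is simply the sample mean.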

By using ML we can estimate the parameters of any model, given that we know the model family and we have observed a set of data.

Gaussian (normal) distributions are preferred models for continuous random variables. Central limit theorem: the mean (or sum) of a large number of independent random variables has an approximately Gaussian distribution. Example: many phenomena in nature, like noise, are sums of different random factors.

Sometimes it is hard to describe the data with just one model. The next reasonable step is to use a weighted sum of several models:
$x_1, \ldots, x_N \sim w_1 f_1(\theta_1) + w_2 f_2(\theta_2) + w_3 f_3(\theta_3)$, where $w_1 + w_2 + w_3 = 1$.
Each model has its own family and parameters. We have to estimate both the parameters and the weights. We don't know which data point comes from which model, so estimating the parameters is very difficult.
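
To make the multi-modality concrete, here is a minimal Matlab sketch (not from the slides; the weights, means and variances are made-up illustration values) that draws samples from a three-component 1-d Gaussian mixture and plots their histogram:

```matlab
% Draw N samples from a 1-d mixture of three Gaussians (illustrative values).
w  = [0.5 0.3 0.2];            % mixture weights, sum to 1
mu = [-4 0 5];                 % component means
sd = [1 0.7 1.5];              % component standard deviations
N  = 1000;
x  = zeros(N, 1);
cw = cumsum(w);
for t = 1:N
    z    = find(rand <= cw, 1);        % latent variable: which component generated x(t)
    x(t) = mu(z) + sd(z) * randn;      % sample from that component
end
histogram(x, 50)                       % the histogram shows three modes
```

The histogram has three bumps because the data really come from three different Gaussians; no single Gaussian fits them well.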

- Different operating regimes
- Continuous nonlinearities

A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities:
$x_1, \ldots, x_N \sim w_1 N(\mu_1, \Sigma_1) + \cdots + w_M N(\mu_M, \Sigma_M)$, where $\sum_{i=1}^{M} w_i = 1$.
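
As a small illustration (not from the slides; it assumes the Statistics Toolbox for mvnpdf, and all numbers are made up), the mixture density $p(x) = \sum_{i=1}^{M} w_i N(x; \mu_i, \Sigma_i)$ can be evaluated in Matlab as:

```matlab
% Evaluate a 2-d, 2-component GMM density at a few query points.
w     = [0.6 0.4];                          % mixture weights
mu    = [0 0; 3 3];                         % one component mean per row
Sigma = cat(3, eye(2), [1 0.5; 0.5 1]);     % one covariance matrix per component
X     = [0 0; 3 3; 1.5 1.5];                % query points, one per row

p = zeros(size(X, 1), 1);
for i = 1:numel(w)
    p = p + w(i) * mvnpdf(X, mu(i, :), Sigma(:, :, i));   % weighted sum of component densities
end
disp(p)   % mixture density at each query point
```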

Problem statement: we need to estimate $\lambda = \{w_i, \mu_i, \Sigma_i\}$, $i = 1, \ldots, M$.
GMM: each data point comes from one of the mixture components, which is unknown to us. For each data point $x_t$ there is a latent variable $z_t$ that indicates which component $x_t$ comes from.
$X$: observed data; $Z$: unobserved latent variables; $\{X, Z\}$: complete data; $X$ alone: incomplete data.

This time there are two sets of unknowns:
- the parameters of each component model
- the latent variables $Z$, specifying the mixture component that each data point belongs to
It is often intractable to compute the maximum likelihood directly, since the likelihood is a product of sums:
$\lambda_{ML} = \arg\max_\lambda L(X; \lambda) = \arg\max_\lambda P(X; \lambda) = \arg\max_\lambda \left[ \sum_Z P(X, Z; \lambda) \right]$
One approach to this ML estimation problem is the Expectation Maximization (EM) algorithm.
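
To see concretely why this is hard for the GMM (a standard expansion, not spelled out on the slide): the incomplete-data log-likelihood is

\[
\log P(X; \lambda) = \log \prod_{t=1}^{N} p(x_t; \lambda)
= \sum_{t=1}^{N} \log \left( \sum_{i=1}^{M} w_i\, N(x_t; \mu_i, \Sigma_i) \right),
\]

and the log cannot be moved inside the inner sum, so setting the derivatives with respect to $w_i$, $\mu_i$, $\Sigma_i$ to zero does not yield closed-form solutions.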

Expectation Maximization. The EM algorithm is used to find (locally) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. The EM algorithm proceeds from the observation that the two sets of unknowns can be solved for numerically by alternation: pick arbitrary values for one of the two sets, use them to estimate the second set, then use these new values to find a better estimate of the first set, and keep alternating between the two until the resulting values both converge to fixed points.

Expectation Maximization. EM algorithm:
Goal: find $\lambda_{ML} = \arg\max_\lambda \left[ \log \sum_Z P(X, Z; \lambda) \right]$
1. E-step: take the expectation over the latent variables, assuming that the unknown parameters are fixed.
2. M-step: assume that the latent variables are fixed and solve the maximum likelihood problem for the unknown parameters.
EM in one equation: $\lambda^{(t+1)} = \arg\max_\lambda \, E_{Z \mid X, \lambda^{(t)}} \left[ \log P(X, Z \mid \lambda) \right]$
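
For the GMM, the E-step reduces to computing component responsibilities and the M-step to weighted ML updates. Below is a compact Matlab sketch of these two steps (not the lecture's code; it assumes the Statistics Toolbox for mvnpdf, implicit expansion from R2016b onward, and that the file is saved as em_gmm_sketch.m; the small diagonal term added to Sigma is just a guard against singular covariances):

```matlab
function [w, mu, Sigma] = em_gmm_sketch(X, w, mu, Sigma, nIter)
% EM for an M-component GMM. X is N-by-d; w (1-by-M), mu (M-by-d) and
% Sigma (d-by-d-by-M) are initial guesses, e.g. taken from k-means.
[N, d] = size(X);
M = numel(w);
for it = 1:nIter
    % E-step: responsibilities r(t,i) = P(z_t = i | x_t, lambda)
    r = zeros(N, M);
    for i = 1:M
        r(:, i) = w(i) * mvnpdf(X, mu(i, :), Sigma(:, :, i));
    end
    r = r ./ sum(r, 2);                       % normalize each row

    % M-step: weighted maximum-likelihood updates of the parameters
    Nk = sum(r, 1);                           % effective number of points per component
    w  = Nk / N;
    for i = 1:M
        mu(i, :) = (r(:, i)' * X) / Nk(i);
        Xc = X - mu(i, :);                    % centered data
        Sigma(:, :, i) = (Xc' * (Xc .* r(:, i))) / Nk(i) + 1e-6 * eye(d);
    end
end
end
```

Each iteration never decreases the incomplete-data log-likelihood, but the result is only a local maximum, so in practice the algorithm is run from several initializations.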

Clustering with GMM

GMM in Matlab: doc gmdistribution.fit. Example: see GMM_test.m.
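
GMM_test.m itself is not included in this transcript; the following is a minimal sketch of the kind of call it presumably makes (gmdistribution.fit is the older name of this functionality, newer Matlab versions expose it as fitgmdist; the data below are made up):

```matlab
% Fit a 2-component GMM to 2-d data and use it for clustering (Statistics Toolbox).
X   = [randn(200, 2); randn(200, 2) + 4];    % made-up data with two clusters
gmm = gmdistribution.fit(X, 2);              % or: fitgmdist(X, 2) in newer versions

gmm.mu                                       % estimated component means
gmm.Sigma                                    % estimated covariances
gmm.PComponents                              % estimated mixture weights (ComponentProportion in newer versions)

idx  = cluster(gmm, X);                      % hard assignment of each point to a component
post = posterior(gmm, X);                    % soft responsibilities P(z = i | x)
```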

GMM in Matlab

Thank You