Finite Singular Multivariate Gaussian Mixture


21/06/2016

Plan
1. Basic definitions: Singular Multivariate Normal Distribution
2. Finite mixture of singular Gaussians and EM estimation
3. Work in progress


Multivariate Normal Distribution: Gaussian Vector Properties

Let $X = (X_1, X_2, \dots, X_d)^T \sim N_d(\mu, \Sigma)$.

For all $\beta \in \mathbb{R}^d$, $\beta^T X \sim N(\beta^T \mu,\ \beta^T \Sigma \beta)$.

Laplace transform: $L(\theta) = E(e^{\theta^T X}) = e^{\theta^T \mu + \frac{1}{2}\theta^T \Sigma \theta}$.

Density function if $\Sigma$ is positive definite ($\Sigma > 0$): for $x \in \mathbb{R}^d$,
$$f(x) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right).$$
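As a quick numerical check of this density formula (not part of the talk; it assumes NumPy and SciPy are available, and the helper name gaussian_pdf is mine), the closed form can be compared against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Closed-form density of N_d(mu, Sigma) when Sigma is positive definite.
def gaussian_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * quad)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, -1.5])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree
```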

Singular Multivariate Normal Distribution

Definition: Let $X = (X_1, X_2, \dots, X_d)^T \sim N_d(\mu, \Sigma)$. If $r = \mathrm{rank}(\Sigma) < d$, then $X$ has a singular multivariate normal distribution.

Laplace transform: $L(\theta) = E(e^{\theta^T X}) = e^{\theta^T \mu + \frac{1}{2}\theta^T \Sigma \theta}$.

Spectral decomposition:
$$\Sigma = (P_1, P_2) \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix} (P_1, P_2)^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_r).$$

The density function is defined on an affine subspace of $\mathbb{R}^d$:
$$f(x) = (2\pi)^{-r/2}\, |\Lambda|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{+} (x-\mu)\right) \quad \text{on } \{x : P_2^T x = P_2^T \mu\},$$
where $\Sigma^{+}$ is the Moore-Penrose pseudo-inverse of $\Sigma$.
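A minimal sketch of this singular density, assuming NumPy; the function name singular_gaussian_logpdf and the tolerance choice are mine, not from the talk. It builds $P_1$, $P_2$, $\Lambda$ from an eigendecomposition, checks the support condition $P_2^T x = P_2^T \mu$, and evaluates the quadratic form with the pseudo-inverse:

```python
import numpy as np

def singular_gaussian_logpdf(x, mu, Sigma, tol=1e-10):
    """Log-density of a (possibly) singular N_d(mu, Sigma) on its supporting
    affine subspace, via the spectral decomposition Sigma = P1 Lambda P1^T."""
    eigval, eigvec = np.linalg.eigh(Sigma)
    keep = eigval > tol                       # columns of P1 (range of Sigma)
    Lam, P1 = eigval[keep], eigvec[:, keep]
    P2 = eigvec[:, ~keep]                     # columns of P2 (null space of Sigma)
    r = keep.sum()
    diff = x - mu
    # Support condition: P2^T x = P2^T mu, i.e. diff is orthogonal to the null space.
    if P2.shape[1] and not np.allclose(P2.T @ diff, 0.0, atol=1e-8):
        return -np.inf                        # x lies off the affine support
    quad = diff @ (P1 / Lam) @ (P1.T @ diff)  # (x-mu)^T Sigma^+ (x-mu)
    return -0.5 * (r * np.log(2 * np.pi) + np.sum(np.log(Lam)) + quad)

# Rank-1 covariance in R^2: all mass lives on the line mu + span(v).
v = np.array([1.0, 2.0])
Sigma = np.outer(v, v)
mu = np.zeros(2)
print(singular_gaussian_logpdf(mu + 0.7 * v, mu, Sigma))          # finite value
print(singular_gaussian_logpdf(np.array([1.0, 0.0]), mu, Sigma))  # -inf, off support
```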


Mixture Model

Definition: Consider a mixture of $K$ $d$-variate singular normal distributions with density
$$f(x \mid \Theta) = \sum_{k=1}^{K} \pi_k\, f_k(x \mid \mu_k, \Sigma_k), \qquad (1)$$
where
$\pi_1, \pi_2, \dots, \pi_K$ are the mixing weights ($\sum_{k=1}^{K} \pi_k = 1$);
$f_k(x \mid \mu_k, \Sigma_k)$ is the density of the degenerate (singular) multivariate Gaussian distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$;
$\Theta = \{\pi_k, \mu_k, \Sigma_k;\ k = 1, \dots, K\}$ is the parameter vector.
All components are concentrated on an affine subspace $E$.
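A hedged sketch of how the mixture density (1) can be evaluated numerically, assuming SciPy is available; allow_singular=True makes scipy.stats.multivariate_normal use the pseudo-inverse and pseudo-determinant, matching the degenerate component densities above. The helper name mixture_pdf and the toy parameters are mine:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_pdf(x, weights, means, covs):
    """Density of a K-component (possibly singular) Gaussian mixture at x."""
    return sum(
        pi_k * multivariate_normal(mean=mu_k, cov=S_k, allow_singular=True).pdf(x)
        for pi_k, mu_k, S_k in zip(weights, means, covs)
    )

# Two rank-deficient components in R^3, each concentrated on a plane.
weights = [0.4, 0.6]
means = [np.zeros(3), np.array([1.0, 1.0, 1.0])]
covs = [np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 2.0, 2.0])]
print(mixture_pdf(np.array([1.0, 0.5, 0.0]), weights, means, covs))
```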

Concrete Situation: Clustering Data

$Y$ is drawn from a population which consists of $K$ groups $G_1, G_2, \dots, G_K$ in proportions $\pi_1, \dots, \pi_K$.

$Z_i = (Z_{i1}, \dots, Z_{iK})$ is the $K$-dimensional component-label vector:
$$Z_{ik} = (Z_i)_k = \begin{cases} 1 & \text{if } Y_i \in G_k \\ 0 & \text{if } Y_i \notin G_k \end{cases}$$

$Z_i$ follows a multinomial distribution with probability vector $\pi = (\pi_1, \dots, \pi_K)$:
$$Z_i \sim \mathrm{Mult}_K(1, \pi) \quad \text{and} \quad P(Z_i = z_i) = \prod_{k=1}^{K} \pi_k^{z_{ik}}.$$

Data in the EM Framework (Dempster et al., 1977)

$y = (y_1, \dots, y_n)$: the observed data.
$z = (z_1, \dots, z_n)$: the hidden data.
The complete data is $(y^T, z^T)^T$.

Likelihood function: the complete-data likelihood is
$$l(y_1, \dots, y_n, z_1, \dots, z_n \mid \Theta) = \prod_{i=1}^{n} \prod_{k=1}^{K} \left[\pi_k\, f_k(y_i \mid \mu_k, \Sigma_k)\right]^{z_{ik}},$$
and the complete-data log-likelihood can be written as
$$L(y_1, \dots, y_n, z_1, \dots, z_n \mid \Theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik} \left[\log \pi_k + \log f_k(y_i \mid \mu_k, \Sigma_k)\right].$$
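A small sketch of the complete-data log-likelihood (the helper name, toy data, and the use of full-rank components for the check are mine; it assumes NumPy and SciPy):

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_log_likelihood(y, z, weights, means, covs):
    """Complete-data log-likelihood sum_i sum_k z_ik [log pi_k + log f_k(y_i)].
    y: (n, d) observations, z: (n, K) one-hot component labels."""
    K = z.shape[1]
    log_fk = np.column_stack([
        multivariate_normal(mean=means[k], cov=covs[k], allow_singular=True).logpdf(y)
        for k in range(K)
    ])                                                # (n, K) matrix of log f_k(y_i)
    return np.sum(z * (np.log(weights) + log_fk))

# Toy check with K = 2 full-rank components in R^2.
rng = np.random.default_rng(0)
y = rng.normal(size=(5, 2))
z = np.eye(2)[rng.integers(0, 2, size=5)]             # one-hot labels
weights = np.array([0.3, 0.7])
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), 2 * np.eye(2)]
print(complete_log_likelihood(y, z, weights, means, covs))
```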

EM Algorithm: 3 Steps

Initialization:
$\pi_k^{(0)} = \frac{1}{K}$, $\mu_k^{(0)} \in E$, and $\Sigma_k^{(0)}$ positive semi-definite satisfying $\varphi(\Sigma_k^{(0)}) = \bar{E}$, where $\varphi(A)$ denotes the column space of a matrix $A$ and $\bar{E}$ is the vector subspace associated with $E$.

Expectation Step (E-step):
After $l$ iterations, the conditional expectation of the complete-data log-likelihood given the observed data, using the current fit, is
$$Q(\Theta \mid \Theta^{(l)}) = E_{\Theta^{(l)}}\!\left(L(y, z \mid \Theta) \mid y\right).$$

EM Algorithm: 3 Steps

Expectation Step (E-step):
The posterior probability that $y_i$ belongs to the $k$-th component of the mixture is
$$\tau_{ik}^{(l)} = E_{\Theta^{(l)}}(Z_{ik} \mid y) = P_{\Theta^{(l)}}(Z_{ik} = 1 \mid y) = \frac{\pi_k^{(l)} f_k(y_i \mid \mu_k^{(l)}, \Sigma_k^{(l)})}{\sum_{j=1}^{K} \pi_j^{(l)} f_j(y_i \mid \mu_j^{(l)}, \Sigma_j^{(l)})}.$$

The conditional expectation of the complete-data log-likelihood given $y$ is
$$Q(\Theta \mid \Theta^{(l)}) = \sum_{i=1}^{n} \sum_{k=1}^{K} \tau_{ik}^{(l)} \left[\log \pi_k + \log f_k(y_i \mid \mu_k, \Sigma_k)\right]$$
$$\qquad\qquad = \sum_{i=1}^{n} \sum_{k=1}^{K} \tau_{ik}^{(l)} \left[\log \pi_k - \tfrac{r}{2}\log(2\pi) - \tfrac{1}{2}\log|\Lambda_k| - \tfrac{1}{2}(y_i - \mu_k)^T \Sigma_k^{+} (y_i - \mu_k)\right].$$
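A sketch of the E-step responsibilities $\tau_{ik}$, assuming NumPy and SciPy; the log-space normalisation is an implementation choice of mine, not something stated in the talk. The resulting matrix feeds the M-step updates sketched after the next slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(y, weights, means, covs):
    """Posterior probabilities tau_ik that observation y_i belongs to component k."""
    K = len(weights)
    log_resp = np.column_stack([
        np.log(weights[k])
        + multivariate_normal(mean=means[k], cov=covs[k], allow_singular=True).logpdf(y)
        for k in range(K)
    ])
    # Normalise in log space for numerical stability.
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    return resp / resp.sum(axis=1, keepdims=True)     # (n, K), rows sum to 1
```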

EM Algorithm: 3 Steps

Maximization Step (M-step):
Global maximization of $Q(\Theta \mid \Theta^{(l)})$ with respect to $\Theta$ over the parameter space gives the updated estimate $\Theta^{(l+1)}$:
$$\Theta^{(l+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(l)}),$$
$$\pi_k^{(l+1)} = \frac{1}{n}\sum_{i=1}^{n} \tau_{ik}^{(l)}, \qquad \mu_k^{(l+1)} = \frac{\sum_{i=1}^{n} \tau_{ik}^{(l)} y_i}{n_k}, \qquad n_k = \sum_{i=1}^{n} \tau_{ik}^{(l)}.$$
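A sketch of these weight and mean updates (the function name is mine), taking the responsibility matrix produced by the E-step sketch above:

```python
import numpy as np

def m_step_weights_means(y, resp):
    """Update mixing weights and means from the responsibilities.
    y: (n, d) data, resp: (n, K) posterior probabilities tau_ik."""
    n_k = resp.sum(axis=0)                  # effective number of points per component
    weights = n_k / len(y)                  # pi_k^{(l+1)} = n_k / n
    means = (resp.T @ y) / n_k[:, None]     # mu_k^{(l+1)} = sum_i tau_ik y_i / n_k
    return weights, means, n_k
```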

EM Algorithm: 3 Steps

Maximization Step: Covariance Matrices
Let
$$S_k^{(l+1)} = \sum_{i=1}^{n} \tau_{ik}^{(l)} \,(y_i - \mu_k^{(l+1)})(y_i - \mu_k^{(l+1)})^T \qquad (2)$$
and let $H_k$ be the $d \times r$ matrix of eigenvectors corresponding to the $r$ largest eigenvalues of $S_k^{(l+1)}$, denoted by $L_k = \mathrm{diag}(l_1, \dots, l_r)$. Then the maximum of $Q$ with respect to $\Sigma_k$ is achieved for
$$\Sigma_k^{(l+1)} = \frac{H_k L_k H_k^T}{n_k}, \qquad \text{where } n_k = \sum_{i=1}^{n} \tau_{ik}^{(l)}.$$
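A sketch of the covariance update (2) for one component, assuming NumPy; the rank $r$ is passed in explicitly, and the function name is mine. Here resp_k is one column of the responsibility matrix and n_k its sum:

```python
import numpy as np

def m_step_covariance(y, resp_k, mu_k, n_k, r):
    """Rank-r covariance update for one component: keep the r leading
    eigenpairs of the weighted scatter matrix S_k and rescale by n_k."""
    diff = y - mu_k                                    # (n, d)
    S_k = (resp_k[:, None] * diff).T @ diff            # sum_i tau_ik (y_i-mu)(y_i-mu)^T
    eigval, eigvec = np.linalg.eigh(S_k)               # eigenvalues in ascending order
    H = eigvec[:, -r:]                                 # d x r leading eigenvectors
    L = np.diag(eigval[-r:])                           # r largest eigenvalues
    return H @ L @ H.T / n_k                           # Sigma_k^{(l+1)} = H L H^T / n_k
```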


Work in Progress
1. Simulation study (in progress).
2. Clustering real data.
3. Convergence problems: choosing good starting values.

Thank you!

Moore-Penrose Pseudo-Inverse

Definition: Let $A \in M_{n,m}$. Then there exists a unique $A^{+} \in M_{m,n}$ that satisfies:
1. $A A^{+} A = A$
2. $A^{+} A A^{+} = A^{+}$
3. $A^{+} A = (A^{+} A)^{*}$ (Hermitian)
4. $A A^{+} = (A A^{+})^{*}$ (Hermitian)
where $M^{*}$ is the conjugate transpose of the matrix $M$.

Remark: If $A$ is square and non-singular, it is clear that $A^{+} = A^{-1}$. For $\Sigma$ one can check that $\Sigma^{+} = P_1 \Lambda^{-1} P_1^T$.
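The four Penrose conditions and the formula $\Sigma^{+} = P_1 \Lambda^{-1} P_1^T$ can be checked numerically with np.linalg.pinv (a quick sketch of mine, not from the talk):

```python
import numpy as np

# Check the four Penrose conditions for np.linalg.pinv on a rank-deficient matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))   # 4x3 matrix of rank 2
A_plus = np.linalg.pinv(A)                              # 3x4 Moore-Penrose pseudo-inverse

print(np.allclose(A @ A_plus @ A, A))                   # A A+ A = A
print(np.allclose(A_plus @ A @ A_plus, A_plus))         # A+ A A+ = A+
print(np.allclose((A_plus @ A).T.conj(), A_plus @ A))   # A+ A Hermitian
print(np.allclose((A @ A_plus).T.conj(), A @ A_plus))   # A A+ Hermitian

# For a rank-1 covariance Sigma = v v^T, pinv gives v v^T / ||v||^4,
# i.e. P1 Lambda^{-1} P1^T with P1 = v/||v|| and Lambda = ||v||^2.
v = np.array([1.0, 2.0, 2.0])
Sigma = np.outer(v, v)
print(np.allclose(np.linalg.pinv(Sigma), np.outer(v, v) / np.dot(v, v) ** 2))
```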

Maximum Likelihood Estimation

Theorem (M.S. Srivastava & D. von Rosen)
Let $Y = (y_1 : y_2 : \dots : y_n)$ be a set of i.i.d. data drawn from $N_d(\mu, \Sigma)$, where $\Sigma$ has rank $r(\Sigma) = r$ and $\Sigma = P_1 \Lambda P_1^T$. Let $S = Y\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right)Y^T$, and let $H$ be the $d \times r$ matrix of eigenvectors corresponding to the $r$ largest eigenvalues of $S$, denoted by $L = \mathrm{diag}(l_1, \dots, l_r)$; here $\mathbf{1}$ stands for the $n \times 1$ vector of ones. Then the MLEs of $\Lambda$, $P_1$, and $\Sigma$ are given by
$$\hat{\Lambda} = \frac{1}{n} L, \qquad \hat{P}_1 = H, \qquad \hat{\Sigma} = \frac{1}{n} H L H^T.$$
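A sketch of this MLE on simulated rank-deficient data, assuming NumPy; the function name singular_gaussian_mle and the simulation setup are mine:

```python
import numpy as np

def singular_gaussian_mle(Y, r):
    """MLE of (Lambda, P1, Sigma) for i.i.d. columns of Y ~ N_d(mu, Sigma), rank(Sigma) = r.
    Y: (d, n) data matrix with observations as columns."""
    d, n = Y.shape
    centering = np.eye(n) - np.ones((n, n)) / n       # I - (1/n) 1 1^T
    S = Y @ centering @ Y.T                           # d x d centered scatter matrix
    eigval, eigvec = np.linalg.eigh(S)                # ascending order
    L = eigval[-r:]                                   # r largest eigenvalues l_1, ..., l_r
    H = eigvec[:, -r:]                                # corresponding eigenvectors (d x r)
    Lambda_hat = np.diag(L) / n                       # Lambda_hat = L / n
    P1_hat = H
    Sigma_hat = H @ np.diag(L) @ H.T / n              # Sigma_hat = (1/n) H L H^T
    return Lambda_hat, P1_hat, Sigma_hat

# Simulate rank-2 data in R^4 and recover a rank-2 estimate of Sigma.
rng = np.random.default_rng(2)
B = rng.normal(size=(4, 2))
Y = B @ rng.normal(size=(2, 500)) + rng.normal(size=(4, 1))   # mean offset, rank-2 covariance
Lam_hat, P1_hat, Sig_hat = singular_gaussian_mle(Y, r=2)
print(np.linalg.matrix_rank(Sig_hat), np.allclose(Sig_hat, Sig_hat.T))
```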