Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm by Korbinian Schwinger

Overview: Exponential Family, Maximum Likelihood, The EM Algorithm, Gaussian Mixture Models

Exponential Family A density function of a random variable x is a member of the exponential family if it can be written as:
f(x) = h(x) \exp\!\left( b + \sum_{i=1}^{n} w_i\, t_i(x) \right)
where b and the weights w_i depend on the distribution's parameters and the t_i(x) are the sufficient statistics.
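
As a worked illustration (added here, not on the original slide), the one-dimensional Gaussian fits this form; expanding the square in its exponent gives
\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
= \underbrace{\frac{1}{\sqrt{2\pi}}}_{h(x)}
  \exp\!\left(\underbrace{-\frac{\mu^2}{2\sigma^2}-\ln\sigma}_{b}
  + \underbrace{\frac{\mu}{\sigma^2}}_{w_1}\,\underbrace{x}_{t_1(x)}
  + \underbrace{\left(-\frac{1}{2\sigma^2}\right)}_{w_2}\,\underbrace{x^2}_{t_2(x)}\right)
with sufficient statistics t_1(x) = x and t_2(x) = x^2.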

Gauss in 1 dimension Density function with the parameters μ = 0 and σ² = 1:
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}

Gauss in 1 dimension Distribution function with the parameters μ = 0 and σ² = 1:
F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt
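
For reference (my own numerical check, not part of the slides), both functions are available in SciPy for the standard normal:

from scipy.stats import norm

# Standard normal: mu = 0, sigma^2 = 1
print(norm.pdf(0.0))   # density f(0) = 1/sqrt(2*pi), about 0.3989
print(norm.cdf(0.0))   # distribution function F(0) = 0.5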

Gauss in 2 dimensions μ becomes a vector and the variance becomes the covariance matrix, which describes the relationship between the two coordinates:
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}, \quad
\sigma_{ij} = E\!\left[ (x_i - \mu_i)(x_j - \mu_j) \right]

Gauss in 2 dimensions Density with the parameters:
\mu = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} 9 & 1 \\ 1 & 2 \end{pmatrix}

Gauss in n dimensions The parameters become:
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & & \ddots & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{nn} \end{pmatrix}, \quad
\sigma_{ij} = E\!\left[ (x_i - \mu_i)(x_j - \mu_j) \right]
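
To make the n-dimensional case concrete, here is a minimal sketch of mine (the parameter values are just illustrative, not from the slides) that evaluates the multivariate Gaussian density p(x) = (2π)^{-n/2} |Σ|^{-1/2} exp(-½ (x-μ)ᵀ Σ⁻¹ (x-μ)) with SciPy and directly from the formula:

import numpy as np
from scipy.stats import multivariate_normal

# Example parameters for the 2-dimensional case
mu = np.array([1.0, 0.0])
Sigma = np.array([[9.0, 1.0],
                  [1.0, 2.0]])

x = np.array([0.0, 0.0])
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))

# Same value computed directly from the density formula
n = len(mu)
d = x - mu
print(np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
      / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma)))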

Maximum Likelihood We have a density function p(x|Θ) which depends on the parameters in Θ. We want to find out which parameters in Θ are the most likely given the data. First we need a set of data X of size N. With this set we can create a new density function, the likelihood:
p(X|\Theta) = \prod_{i=1}^{N} p(x_i|\Theta) = L(\Theta|X)

Maximum Likelihood The best set of parameters maximizes the likelihood function:
\Theta^{*} = \operatorname{argmax}_{\Theta} L(\Theta|X)
Often \log L(\Theta|X) is maximized instead. If the distribution is Gaussian, the maximum can be found analytically by setting the derivative to zero.
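
When no closed form is available, the log-likelihood can also be maximized numerically. A minimal sketch of mine (not part of the slides) for a one-dimensional Gaussian, minimizing the negative log-likelihood with SciPy:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=200)        # synthetic data

def neg_log_likelihood(params):
    mu, log_sigma = params                          # optimize log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(X, loc=mu, scale=np.exp(log_sigma)))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x[0], np.exp(result.x[1]))             # close to the true mu = 2.0 and sigma = 1.5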

Maximum Likelihood - example We have a random variable with a Gaussian distribution with parameters μ and σ². The likelihood function is:
L(x; \mu, \sigma^2) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n} \prod_{i=1}^{n} \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

Maximum Likelihood - example Now we have to calculate the logarithm of the likelihood function:
\ln L(x; \mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}

Maximum Likelihood - example The next step is to take the derivatives of the log-likelihood and set them to zero:
\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2} \stackrel{!}{=} 0, \qquad
\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^3} \stackrel{!}{=} 0

Maximum Likelihood - example At the end we get:
\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2
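
These closed-form estimates are easy to check in code; a small sketch of mine (not from the slides) on synthetic data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)       # synthetic data, true mu = 2.0, sigma^2 = 2.25

mu_hat = x.mean()                                   # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)             # MLE of sigma^2: divides by n, not n - 1
print(mu_hat, sigma2_hat)                           # both close to the true values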

Basic EM Used to find the maximum-likelihood estimate when features are missing, e.g. because of limitations of the observation process, or when the calculation can be simplified by assuming hidden values. First, we assume the observed data X are generated by some distribution. Second, we assume that a complete data set Z = (X, Y) exists, where Y are the missing (hidden) values.

Basic EM Then we can create a density function:
p(z|\Theta) = p(x, y|\Theta) = p(y|x, \Theta)\, p(x|\Theta)
With this density function the complete-data likelihood p(X, Y|\Theta) can be formed. The algorithm then calculates the expected value of the complete-data log-likelihood:
Q(\Theta, \Theta^{(i-1)}) = E\!\left[ \log p(X, Y|\Theta) \,\middle|\, X, \Theta^{(i-1)} \right] = \int_{y \in \Upsilon} \log p(X, y|\Theta)\, f(y|X, \Theta^{(i-1)})\, dy

Basic EM The evaluation of the expectation is called the E-step. The M-step is to maximize the expectation we computed in the first step:
\Theta^{(i)} = \operatorname{argmax}_{\Theta}\, Q(\Theta, \Theta^{(i-1)})
These steps are repeated as necessary; each iteration is guaranteed to increase the log-likelihood.

Basic EM - example We have 4 points in 2D coordinates:
\left\{ \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \begin{pmatrix} ? \\ 4 \end{pmatrix} \right\}
where ? marks a missing value. We assume the model is a Gaussian with diagonal covariance, so the parameters are
\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)^T

Basic EM - example As the first estimate we assume a standard Gaussian model, i.e. zero mean and unit variances:
\theta^0 = (0, 0, 1, 1)^T

Basic EM - example 3 = k=1 Now we calculate the expected value : = 3 [ k=1 Q, 0 =E [ln p x g, x b ; 0 ; D g ] ln p x k ln p x 4 ] p x 41 0 ; x 42 =4 dx 41 [ln p x k ] ln p x 41 4 p x 41 4 0 p x ' 0 41 4 dx 41 ' =K dx 41

Basic EM - example K is constant and can be brought out of the integral:
Q(\theta; \theta^0) = \sum_{k=1}^{3} \ln p(x_k|\theta) + \frac{1}{K} \int_{-\infty}^{\infty} \ln p\!\left(\begin{pmatrix} x_{41} \\ 4 \end{pmatrix} \middle|\, \theta\right) \frac{1}{2\pi\,\sigma_1^0 \sigma_2^0} \exp\!\left[ -\frac{1}{2}\left( \frac{x_{41}^2}{(\sigma_1^0)^2} + \frac{4^2}{(\sigma_2^0)^2} \right) \right] dx_{41}

Basic EM - example At the end we get (E-step completed):
Q(\theta; \theta^0) = \sum_{k=1}^{3} \ln p(x_k|\theta) - \frac{1 + \mu_1^2}{2\sigma_1^2} - \frac{(4 - \mu_2)^2}{2\sigma_2^2} - \ln(2\pi\,\sigma_1 \sigma_2)
Then we can calculate the parameters which maximize Q (the M-step):
\theta^1 = (0.75,\; 2.0,\; 0.938,\; 2.0)^T
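
This iteration can be reproduced numerically. The sketch below is my own illustration of the E/M updates for this missing-coordinate example (assuming the diagonal-Gaussian model above; it is not code from the presentation): starting from θ⁰ = (0, 0, 1, 1)ᵀ, the first pass gives θ¹ ≈ (0.75, 2.0, 0.938, 2.0)ᵀ.

import numpy as np

# Observed data: the first coordinate of the fourth point is missing.
x1_obs = np.array([0.0, 1.0, 2.0])        # observed first coordinates
x2_obs = np.array([2.0, 0.0, 2.0, 4.0])   # second coordinates (all observed)

mu1, mu2, var1, var2 = 0.0, 0.0, 1.0, 1.0  # theta^0

for i in range(10):
    # E-step: expected sufficient statistics of the missing value x41.
    # With a diagonal covariance, x41 is independent of x42, so its
    # conditional distribution given the current parameters is N(mu1, var1).
    e_x41 = mu1
    e_x41_sq = var1 + mu1 ** 2

    # M-step: the usual Gaussian ML updates with the missing value
    # replaced by its expected sufficient statistics.
    mu1 = (x1_obs.sum() + e_x41) / 4
    mu2 = x2_obs.mean()
    var1 = (np.sum((x1_obs - mu1) ** 2) + (e_x41_sq - 2 * mu1 * e_x41 + mu1 ** 2)) / 4
    var2 = np.mean((x2_obs - mu2) ** 2)
    print(i + 1, round(mu1, 3), round(mu2, 3), round(var1, 3), round(var2, 3))
    # mu1 converges towards 1.0 and var1 towards 2/3; mu2 and var2 stay at 2.0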

Basic EM - example

Gaussian Mixture Models We assume the following probabilistic model:
p(x|\Theta) = \sum_{i=1}^{M} \alpha_i\, p_i(x|\theta_i)
The parameters are \Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M) with \sum_{i=1}^{M} \alpha_i = 1. With a lot of calculation we get the function
Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l|x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log\!\left( p_l(x_i|\theta_l) \right) p(l|x_i, \Theta^g)

Gaussian Mixture Models After some more pages of calculation you get a new estimate of the parameters:
\alpha_l^{new} = \frac{1}{N} \sum_{i=1}^{N} p(l|x_i, \Theta^g)
\mu_l^{new} = \frac{\sum_{i=1}^{N} x_i\, p(l|x_i, \Theta^g)}{\sum_{i=1}^{N} p(l|x_i, \Theta^g)}
\Sigma_l^{new} = \frac{\sum_{i=1}^{N} p(l|x_i, \Theta^g)\, (x_i - \mu_l^{new})(x_i - \mu_l^{new})^T}{\sum_{i=1}^{N} p(l|x_i, \Theta^g)}
These values can be used to repeat the iteration as often as needed.
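
These three updates translate directly into code. Below is a minimal sketch of my own (not code from the presentation) of the resulting EM loop; the responsibility p(l|x_i, Θ^g) is obtained by Bayes' rule as α_l p_l(x_i|θ_l) / Σ_k α_k p_k(x_i|θ_k), and the initialization heuristic and synthetic data are illustrative assumptions:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, M, n_iter=50):
    """EM for a mixture of M Gaussians, following the update equations above.
    X has shape (N, d)."""
    N, d = X.shape
    alpha = np.full(M, 1.0 / M)                        # mixing weights alpha_l
    # Initial means: data points spread along the first coordinate (a simple heuristic).
    mu = X[np.argsort(X[:, 0])][np.linspace(0, N - 1, M).astype(int)].copy()
    Sigma = np.array([np.cov(X.T) for _ in range(M)])  # initial covariances

    for _ in range(n_iter):
        # E-step: responsibilities p(l | x_i, Theta^g) via Bayes' rule.
        dens = np.column_stack([
            alpha[l] * multivariate_normal.pdf(X, mean=mu[l], cov=Sigma[l])
            for l in range(M)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: the three update equations from the slide.
        Nl = resp.sum(axis=0)                          # sum_i p(l | x_i, Theta^g)
        alpha = Nl / N
        mu = (resp.T @ X) / Nl[:, None]
        for l in range(M):
            diff = X - mu[l]
            Sigma[l] = (resp[:, l, None] * diff).T @ diff / Nl[l]
    return alpha, mu, Sigma

# Usage: two well-separated synthetic clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
alpha, mu, Sigma = em_gmm(X, M=2)
print(alpha)   # roughly [0.5, 0.5]
print(mu)      # one mean near (0, 0), the other near (5, 5)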

Gaussian Mixture Models - example Java Demo Applet