Mixture of Gaussians Expectation Maximization (EM) Part 2


Mixture of Gaussians Expectation Maximization (EM), Part 2. Most of the slides are due to Christopher Bishop, BCS Summer School, Exeter, 2003. The rest of the slides are based on lecture notes by A. Ng.

Limitations of K-means. Hard assignments of data points to clusters: a small shift of a data point can flip it to a different cluster. Not clear how to choose the value of K. Solution: replace the hard clustering of K-means with soft, probabilistic assignments. Represent the probability distribution of the data as a Gaussian mixture model.

Gaussian Mixtures. Linear superposition of Gaussians: $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Normalization and positivity require $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. Can interpret the mixing coefficients $\pi_k$ as prior probabilities of the components.
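To make the superposition concrete, here is a minimal NumPy/SciPy sketch that evaluates the mixture density $p(x) = \sum_k \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$; the two-component parameter values are invented purely for illustration and are not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component mixture in 2-D; parameter values chosen only for illustration.
pi = np.array([0.4, 0.6])                         # mixing coefficients, sum to 1
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 2.0 * np.eye(2)]

def gmm_density(x):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k): linear superposition of Gaussians."""
    return sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))

print(gmm_density(np.array([1.0, 1.0])))
```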

Maximum Likelihood for the GMM. The log likelihood function takes the form $\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$. Note: the sum over components appears inside the log, so there is no closed-form solution for maximum likelihood. The log likelihood is instead maximized by the expectation-maximization (EM) algorithm.
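A small sketch of this log likelihood, with the sum over components sitting inside the log as noted above; the function signature and the parameter layout (arrays `pi`, `mu`, `Sigma`) are my own choices for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pi, mu, Sigma):
    """ln p(X) = sum_n ln( sum_k pi_k N(x_n | mu_k, Sigma_k) ); the sum over k is inside the log."""
    ll = 0.0
    for x_n in X:
        ll += np.log(sum(pi[k] * multivariate_normal.pdf(x_n, mean=mu[k], cov=Sigma[k])
                         for k in range(len(pi))))
    return ll
```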

EM Algorithm, Informal Derivation. The solutions are not closed form since they are coupled. This suggests an iterative scheme for solving them: make initial guesses for the parameters, then alternate between the following two stages: 1. E-step: evaluate responsibilities. 2. M-step: update parameters using the ML results.

General View of the EM Algorithm. Jensen's inequality: let $f$ be a convex function ($f''(x) \ge 0$ for all $x$) and let $X$ be a random variable. Then $E[f(X)] \ge f(E[X])$. Example: $f$ is convex, $X = a$ with probability 0.5 and $X = b$ with probability 0.5. Then $E[X]$ is the midpoint between $a$ and $b$, $E[f(X)]$ is the midpoint between $f(a)$ and $f(b)$, and $E[f(X)] \ge f(E[X])$.

Jensen's inequality (cont.). Further, if $f$ is a strictly convex function ($f''(x) > 0$), then $E[f(X)] = f(E[X])$ holds true if and only if $X = E[X]$ with probability 1, i.e. if $X$ is a constant. Jensen's inequality also holds for concave functions ($f''(x) \le 0$), but with the direction of all the inequalities reversed: $E[f(X)] \le f(E[X])$, etc.
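The two-point example above is easy to check numerically; in the sketch below the convex function $f(x) = x^2$ and the values of $a$ and $b$ are chosen only for illustration.

```python
import numpy as np

# Two-point random variable: X = a or X = b, each with probability 0.5 (illustrative values).
a, b = 1.0, 5.0
f = lambda x: x ** 2                   # convex: f''(x) = 2 > 0

E_X  = 0.5 * a + 0.5 * b               # E[X] is the midpoint between a and b
E_fX = 0.5 * f(a) + 0.5 * f(b)         # E[f(X)] is the midpoint between f(a) and f(b)

print(E_fX, f(E_X))                    # 13.0 >= 9.0, i.e. E[f(X)] >= f(E[X])
assert E_fX >= f(E_X)

# For a concave function such as log, the inequality reverses: E[log X] <= log E[X].
E_logX = 0.5 * np.log(a) + 0.5 * np.log(b)
assert E_logX <= np.log(E_X)
```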

Problem Definition. Suppose we have an estimation problem in which we have a training set $\{x^{(1)}, \ldots, x^{(m)}\}$ of $m$ i.i.d. samples. We wish to fit the parameters $\theta$ of a model $p(x, z)$ to the data, i.e. to maximize the likelihood, $\ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$, where $x$ is observed, $z$ is hidden, and $\theta$ are the parameters. Maximizing this explicitly may be hard, since the $z^{(i)}$ are not observed. If the $z^{(i)}$ were observed, then maximum likelihood estimation would often be easy.

EM at a glance. Our strategy will be to repeatedly construct a lower bound on $\ell$ (E-step), then optimize that lower bound (M-step). [Figure: successive lower bounds on $\ell(\theta)$ evaluated at $\theta_0, \theta_1, \theta_2, \theta_3, \ldots, \theta^*$.]

EM algorithm derivation. For each $i$, let $Q_i$ be some distribution over the $z$'s, with $\sum_z Q_i(z) = 1$ and $Q_i(z) \ge 0$. Then
$\ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \, \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$,
where the last step follows from Jensen's inequality for the concave function $\log$: $\log E[X] \ge E[\log X]$, with the expectation taken over $z^{(i)}$ drawn according to the distribution given by $Q_i$. This is a lower bound on $\ell(\theta)$.

EM algorithm derivation (cont.). We want the lower bound $\sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ to be equal to $\ell(\theta)$ at the previous value of $\theta$.

EM algorithm derivation (cont.). To ensure the bound is tight, we should choose $Q_i$ such that the Jensen inequality in our derivation above holds with equality. This requires the ratio $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ to be a constant not depending on $z^{(i)}$, i.e. $Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta)$. Since we know that $\sum_z Q_i(z) = 1$, this gives $Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)$, the posterior of $z^{(i)}$ given $x^{(i)}$.

General EM Algorithm. Repeat until convergence { E-step: for each $i$, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$. M-step: set $\theta := \arg\max_\theta \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ (the lower bound on $\ell$). }
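A schematic rendering of this loop, assuming user-supplied `e_step`, `m_step`, and `log_likelihood` callables (the GMM versions are spelled out on the following slides); this is a structural sketch of the general algorithm, not the slides' own code.

```python
import numpy as np

def run_em(X, theta, e_step, m_step, log_likelihood, max_iters=100, tol=1e-6):
    """Generic EM loop: alternate E- and M-steps until the likelihood stops improving.

    e_step(X, theta)  -> Q     : per-point distributions over the hidden variable z
    m_step(X, Q)      -> theta : argmax_theta of sum_i sum_z Q_i(z) log[p(x_i, z; theta)/Q_i(z)]
    log_likelihood(X, theta)   : monitored for convergence (EM never decreases it)
    """
    prev = -np.inf
    for _ in range(max_iters):
        Q = e_step(X, theta)          # E-step: set Q_i(z) = p(z | x_i; theta)
        theta = m_step(X, Q)          # M-step: maximize the lower bound over theta
        ll = log_likelihood(X, theta)
        if ll - prev < tol:
            break
        prev = ll
    return theta
```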

EM for MoG revisited. For $1 \le i \le N$, $1 \le j \le K$, define hidden variables $z_j^{(i)} = 1$ if sample $x^{(i)}$ was generated by component $j$, and 0 otherwise. The $z_j^{(i)}$ are indicator random variables: they indicate which Gaussian component generated sample $x^{(i)}$. Let the indicator r.v. $z^{(i)}$ correspond to sample $x^{(i)}$; we say $z^{(i)} = j$ when its $j$-th coordinate is 1 and the rest are 0. Conditioned on $z^{(i)} = j$, the distribution of $x^{(i)}$ is Gaussian: $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$.

EM for MoG revisited. E-step: $\gamma_j^{(i)} = Q_i(z^{(i)} = j) = \frac{\pi_j \, \mathcal{N}(x^{(i)} \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \pi_k \, \mathcal{N}(x^{(i)} \mid \mu_k, \Sigma_k)}$.

EM for MoG revisited. M-step: maximize the lower bound $\sum_{i=1}^{N} \sum_{j=1}^{K} \gamma_j^{(i)} \log \frac{\pi_j \, \mathcal{N}(x^{(i)} \mid \mu_j, \Sigma_j)}{\gamma_j^{(i)}}$ with respect to the parameters. Setting the derivatives to zero gives $\mu_j = \frac{\sum_{i=1}^{N} \gamma_j^{(i)} x^{(i)}}{\sum_{i=1}^{N} \gamma_j^{(i)}}$, and similarly $\Sigma_j = \frac{\sum_{i=1}^{N} \gamma_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{N} \gamma_j^{(i)}}$ and $\pi_j = \frac{1}{N} \sum_{i=1}^{N} \gamma_j^{(i)}$.
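Putting the E-step and M-step formulas together, the following NumPy/SciPy sketch implements EM for a Gaussian mixture as described above; the function name, the random initialization, and the small regularization added to the covariances are my own choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a K-component Gaussian mixture on data X of shape (N, d)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)                          # mixing coefficients
    mu = X[rng.choice(N, K, replace=False)]           # means initialized to random samples
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])

    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, j] = pi_j N(x_i | mu_j, Sigma_j) / sum_k pi_k N(x_i | mu_k, Sigma_k)
        gamma = np.column_stack([
            pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted updates of pi_j, mu_j, Sigma_j
        Nj = gamma.sum(axis=0)                        # effective number of points per component
        pi = Nj / N
        mu = (gamma.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (gamma[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return pi, mu, Sigma

# Example usage: pi, mu, Sigma = em_gmm(X, K=3)
```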

K-means Algorithm. Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype $\mu_k$. Initialize prototypes, then iterate between two phases: E-step: assign each data point to the nearest prototype. M-step: update prototypes to be the cluster means.

Responsibilities. Responsibilities $r_{nk} \in \{0, 1\}$ assign data points to clusters such that $\sum_k r_{nk} = 1$ for each data point. Example: 5 data points and 3 clusters.

K-means Cost Function. $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2$, where $x_n$ are the data, $r_{nk}$ the responsibilities, and $\mu_k$ the prototypes.

Minimizing the Cost Function. E-step: minimize $J$ w.r.t. $r_{nk}$; this assigns each data point to the nearest prototype. M-step: minimize $J$ w.r.t. $\mu_k$; this sets each prototype to the mean of the points in that cluster. Convergence is guaranteed since there is a finite number of possible settings for the responsibilities.
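For comparison with the soft EM updates, here is a minimal K-means sketch implementing the two phases above (hard assignment to the nearest prototype, then moving each prototype to its cluster mean); the initialization and convergence test are my own choices for the example.

```python
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    """Plain K-means: E-step = hard assignment, M-step = set prototypes to cluster means."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]        # prototypes initialized to random points
    for _ in range(n_iters):
        # E-step: assign each point to the nearest prototype
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # squared distances, shape (N, K)
        r = d2.argmin(axis=1)
        # M-step: move each prototype to the mean of its assigned points
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):                      # converged: assignments can no longer change
            break
        mu = new_mu
    return mu, r
```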

EM Example. Example from R. Gutierrez-Osuna. Training set of 900 examples forming an annulus. A mixture model with m = 30 Gaussian components of unknown mean and variance is used. Training: Initialization: means set to 30 random examples; covariance matrices initialized to be diagonal, with large variances on the diagonal compared to the training data variance. During EM training, components with small mixing coefficients were trimmed. This is a trick to get a more compact model with fewer than 30 Gaussian components.
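The trimming trick can be sketched as a simple post-processing step that drops components whose mixing coefficient falls below a threshold and renormalizes the rest; the threshold value here is arbitrary and only illustrates the idea.

```python
import numpy as np

def trim_components(pi, mu, Sigma, threshold=1e-3):
    """Drop mixture components with negligible mixing coefficient and renormalize the rest."""
    keep = pi > threshold                   # threshold chosen arbitrarily for illustration
    pi, mu, Sigma = pi[keep], mu[keep], Sigma[keep]
    return pi / pi.sum(), mu, Sigma
```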

EM Example (from R. Gutierrez-Osuna).

EM Motion Segmentation Example. Three frames from the MPEG flower garden sequence. Figure from "Representing Images with Layers" by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.