Expectation Maximization, Mixture Models, HMMs


11-755 Machine Learning for Signal Processing: Mixture Models and HMMs (Class 9)

Learning Distributions for Data
Problem: Given a collection of examples from some data, estimate its distribution. The basic ideas of Maximum Likelihood and MAP estimation can be found in the Aarti/Paris slides pointed to in a previous class. Solution: Assign a model to the distribution and learn the parameters of the model from the data. Models can be arbitrarily complex: mixture densities, hierarchical models. Learning must be done using EM. The following slides give an intuitive explanation using a simple example of multinomials.

A Thought Experiment
A person shoots a loaded die repeatedly. You observe the series of outcomes (e.g. 6 3 5 4 2 4 ...). You can form a good idea of how the die is loaded: figure out what the probabilities of the various numbers are, P(number) = count(number) / total number of rolls. This is a maximum likelihood estimate: the estimate that makes the observed sequence of numbers most probable.

The Multinomial Distribution
A probability distribution over a discrete collection of items is a multinomial: P(X), where X belongs to a discrete set. E.g. the roll of a die, X in {1, 2, 3, 4, 5, 6}, or the toss of a coin, X in {heads, tails}.

Maximum Likelihood Estimation
P(n_1, n_2, n_3, n_4, n_5, n_6) is proportional to p_1^{n_1} p_2^{n_2} p_3^{n_3} p_4^{n_4} p_5^{n_5} p_6^{n_6}. Basic principle: assign a form to the distribution (e.g. a multinomial, or a Gaussian) and find the distribution that best fits the histogram of the data.

Defining "Best Fit"
The data are generated by draws from the distribution, i.e. the generating process draws from the distribution. Assumption: the distribution has a high probability of generating the observed data (not necessarily true). Select the distribution that has the highest probability of generating the data; it should assign lower probability to less frequent observations and vice versa.
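In code, the ML estimate of a multinomial really is just counting. A minimal sketch (the rolls below are a made-up sample, not actual lecture data):

    import numpy as np

    # ML estimate of a loaded die's multinomial parameters: just counting.
    rolls = np.array([6, 3, 5, 4, 2, 4, 4, 4, 6, 3, 2, 2])   # made-up sample
    counts = np.bincount(rolls, minlength=7)[1:]   # counts for faces 1..6
    p_ml = counts / counts.sum()                   # P(number) = count / total rolls
    for face, p in enumerate(p_ml, start=1):
        print(face, round(float(p), 3))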

Maximum Likelihood Estimation: Multinomial
Probability of generating (n_1, n_2, n_3, n_4, n_5, n_6): P(n_1, ..., n_6) = Const * prod_i p_i^{n_i}. Find p_1, ..., p_6 so that this is maximized. Equivalently, maximize log P(n_1, ..., n_6) = log(Const) + sum_i n_i log p_i; log is a monotonic function, so argmax f = argmax log f. Solving for the probabilities (this requires constrained optimization to ensure the probabilities sum to 1) gives us p_i = n_i / sum_j n_j. EVENTUALLY IT'S JUST COUNTING!

Segue: Gaussians
N(x; mu, Theta) = 1/sqrt((2 pi)^d |Theta|) * exp(-0.5 (x - mu)^T Theta^{-1} (x - mu)). Parameters of a Gaussian: mean mu, covariance Theta.

Maximum Likelihood: Gaussian
Given a collection of observations x_1, x_2, ..., estimate the mean mu and covariance Theta: log P(x_1, x_2, ...) = C - 0.5 sum_i [log|Theta| + (x_i - mu)^T Theta^{-1} (x_i - mu)]. Maximizing w.r.t. mu and Theta gives us mu = (1/N) sum_i x_i and Theta = (1/N) sum_i (x_i - mu)(x_i - mu)^T. IT'S STILL JUST COUNTING!

Laplacian
L(x; mu, b) = (1/(2b)) exp(-|x - mu|/b). Parameters: mean mu, scale b (b > 0).

Maximum Likelihood: Laplacian
Given a collection of observations x_1, x_2, ..., estimate mu and b: log P(x_1, x_2, ...) = C - N log b - sum_i |x_i - mu| / b. Maximizing w.r.t. mu and b gives us mu = median(x_1, x_2, ...) and b = (1/N) sum_i |x_i - mu|.

Dirichlet
(Figures, K = 3: densities for alpha = (6,2,2), (3,7,5), (6,2,6), (2,3,4), clockwise from top left, from Wikipedia; and the log of the density as alpha changes from (0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual alpha's equal to each other.) The parameters are the alpha's; they determine the mode and curvature. Defined only over probability vectors X = [x_1 x_2 ... x_K], sum_i x_i = 1, x_i >= 0 for all i: D(X; alpha) = (Gamma(sum_i alpha_i) / prod_i Gamma(alpha_i)) * prod_i x_i^{alpha_i - 1}.
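Both "counting" estimates are one-liners. A small sketch on synthetic data (all numbers here are made up):

    import numpy as np

    # Gaussian ML: mean = sample mean, covariance = (1/N) sum (x-mu)(x-mu)^T.
    X = np.random.default_rng(0).normal([3.0, -1.0], [1.0, 2.0], size=(500, 2))
    mu = X.mean(axis=0)
    Theta = (X - mu).T @ (X - mu) / len(X)

    # Laplacian ML: mean = median, scale b = mean absolute deviation from it.
    x = np.random.default_rng(1).laplace(2.0, 0.5, size=500)
    mu_lap = np.median(x)
    b_lap = np.abs(x - mu_lap).mean()
    print(mu, Theta, mu_lap, b_lap)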

Maximum Likelihood: Dirichlet
Given a collection of observations X_1, X_2, ..., estimate alpha: log P(X_1, X_2, ...; alpha) = sum_i sum_j (alpha_j - 1) log x_{i,j} - N sum_j log Gamma(alpha_j) + N log Gamma(sum_j alpha_j). There is no closed-form solution for the alpha's; this needs gradient ascent. Several distributions have this property: the ML estimates of their parameters have no closed-form solution.

Continuing the Thought Experiment
Two persons shoot loaded dice repeatedly. The dice are differently loaded for the two of them. We observe the series of outcomes for both persons. How do we determine the probability distributions of the two dice?

Estimating Probabilities
Observation: the sequence of numbers from the two dice, e.g. 6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6. As indicated by the colors, we know who rolled what number. Segregation: separate the blue observations (6 5 2 4 2 3 6 ...) from the red (4 3 5 2 4 4 2 6 ...). From each set, compute probabilities for each of the 6 possible outcomes: P(number) = (no. of times the number was rolled) / (total number of observed rolls). (Two bar charts over faces 1-6, y-axis 0 to 0.3: the estimated distributions of the two dice.)

A Thought Experiment (continued)
Now imagine that you cannot observe the dice yourself. Instead, there is a caller who randomly calls out the outcomes: 40% of the time he calls out the number from the left shooter, and 60% of the time the one from the right, and you know this. At any time, you do not know which of the two he is calling out. How do you determine the probability distributions for the two dice?
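For contrast with the closed-form cases, here is a bare-bones gradient-ascent sketch for the Dirichlet ML estimate. The step size and iteration count are arbitrary choices, and a practical implementation would use Minka's fixed-point or Newton updates instead:

    import numpy as np
    from scipy.special import digamma

    def dirichlet_ml(X, iters=5000, lr=0.05):
        N, K = X.shape
        alpha = np.ones(K)                       # crude initialization
        mean_log_x = np.log(X).mean(axis=0)      # sufficient statistics
        for _ in range(iters):
            # gradient of the per-observation average log-likelihood
            grad = digamma(alpha.sum()) - digamma(alpha) + mean_log_x
            alpha = np.maximum(alpha + lr * grad, 1e-6)   # keep alpha > 0
        return alpha

    X = np.random.default_rng(0).dirichlet([3.0, 7.0, 5.0], size=1000)
    print(dirichlet_ml(X))   # should land near (3, 7, 5)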

A Mixture Multinomial
The caller will call out a number x in any given callout IF he selects RED and the red die rolls the number x, OR he selects BLUE and the blue die rolls the number x: P(x) = P(Red) P(x|Red) + P(Blue) P(x|Blue). E.g. P(6) = P(Red) P(6|Red) + P(Blue) P(6|Blue). How do you now determine the probability distributions for the two sets of dice, if you do not even know what fraction of the time the blue numbers are called, and what fraction are red? A distribution that combines (or mixes) multiple multinomials is a mixture multinomial: P(x) = sum_z P(z) P(x|z), with mixture weights P(z) and component multinomials P(x|z).

Mixture Distributions
P(x) = sum_z P(z) P(x|z): mixture weights and component distributions. A mixture Gaussian: P(x) = sum_z P(z) N(x; mu_z, Theta_z). A mixture of Gaussians and Laplacians: P(x) = sum_{z in Gaussians} P(z) N(x; mu_z, Theta_z) + sum_{z in Laplacians} P(z) L(x; mu_z, b_z). Mixture distributions mix several component distributions; the component distributions may be of varied types. The mixing weights must sum to 1.0, and the component distributions each integrate to 1.0, so the mixture distribution integrates to 1.0.

Maximum Likelihood Estimation
For our problem, z = the color of the dice: P(n_1, n_2, n_3, n_4, n_5, n_6) = Const * prod_x (sum_z P(z) P(x|z))^{n_x}. The maximum likelihood solution maximizes log P(n_1, ..., n_6) = log(Const) + sum_x n_x log(sum_z P(z) P(x|z)). There is no closed-form solution: a summation sits inside the log! In general, ML estimates for mixtures do not have a closed form. USE EM!

The Expectation Maximization Algorithm
It is possible to estimate all parameters in this setup using the Expectation Maximization (or EM) algorithm, first described in a landmark paper by Dempster, Laird and Rubin, "Maximum Likelihood Estimation from Incomplete Data, via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 1977. There has been much work on the algorithm since then; the principles behind it existed for several years prior to the landmark paper, however.

EM is an iterative solution. Get some initial estimates for all parameters; in the dice-shooter example, this includes the probability distributions for the dice AND the probability with which the caller selects the dice. Two steps are then iterated. Expectation step: statistically estimate the values of the unseen variables. Maximization step: using the estimated values of the unseen variables as truth, obtain estimates of the model parameters.
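To make the "summation inside the log" point concrete, here is a sketch that merely evaluates the mixture-multinomial log-likelihood. The dice probabilities are made up; the face counts are those of the 18-call example coming up:

    import numpy as np

    # A sum over z sits INSIDE the log, so no closed-form maximizer exists.
    counts = np.array([3, 4, 2, 4, 2, 3])        # n_x for faces 1..6
    P_z = np.array([0.5, 0.5])                   # P(z): caller picks red / blue
    P_x_z = np.array([[0.05, 0.10, 0.10, 0.05, 0.10, 0.60],   # P(x|red), made up
                      [0.15, 0.30, 0.10, 0.25, 0.10, 0.10]])  # P(x|blue), made up
    mix = P_z @ P_x_z                            # P(x) = sum_z P(z) P(x|z)
    log_lik = counts @ np.log(mix)               # up to the multinomial constant
    print(log_lik)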

EM: The Auxiliary Function
EM iteratively optimizes the following auxiliary function: Q(theta, theta') = sum_z P(z | x, theta') log P(z, x | theta), where z are the unseen variables (assumed discrete here, though they may not be), theta' are the parameter estimates from the previous iteration, and theta are the estimates to be obtained in the current iteration.

EM as Counting
An instance may come from the blue die or from the red die; for each called number the die is unknown. The hidden variable z is the identity of the die whose number has been called out. If we knew z for every observation, we could estimate all terms simply by adding each observation to the right bin (the collection of blue numbers or the collection of red numbers). Unfortunately, we do not know z; it is hidden from us! Solution: FRAGMENT THE OBSERVATION.

Fragmenting the Observation
EM is an iterative algorithm; at each step there is a current estimate of the parameters. The size of each fragment is proportional to the a posteriori probability of the component distribution. The a posteriori probabilities of the various values of z are computed using Bayes' rule: P(z|x) = C P(x|z) P(z). Every die gets a fragment of size P(die | number).

Hypothetical Dice Shooter Example
We obtain an initial estimate for the probability distributions of the two sets of dice somehow (bar charts over faces 1-6 for blue and red; in particular P(4|blue) = 0.1 and P(4|red) = 0.05), and an initial estimate of the probability with which the caller calls out the two shooters: P(red) = P(blue) = 0.5. Suppose the caller has just called out 4. The posterior probabilities of the colors are P(red|4) = C * P(4|red) P(red) = C * 0.05 * 0.5 = 0.025 C, and P(blue|4) = C * P(4|blue) P(blue) = C * 0.1 * 0.5 = 0.05 C. Normalizing: P(red|4) = 0.33, P(blue|4) = 0.67.
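The same Bayes-rule computation in a few lines of Python, reproducing the 0.33/0.67 split:

    # Fragment sizes: P(z|4) is proportional to P(4|z) P(z), then normalize.
    P_z = {"red": 0.5, "blue": 0.5}
    P_4_given_z = {"red": 0.05, "blue": 0.10}
    unnorm = {z: P_4_given_z[z] * P_z[z] for z in P_z}   # {red: 0.025, blue: 0.05}
    total = sum(unnorm.values())
    post = {z: v / total for z, v in unnorm.items()}
    print(post)   # {'red': 0.333..., 'blue': 0.666...}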

Every observed roll of the die contributes to both Red and Blue. For the called sequence 6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6, with the initial estimates above, the fragments are 6 -> (red 0.8, blue 0.2), 4 -> (0.33, 0.67), 5 -> (0.33, 0.67), 1 -> (0.57, 0.43), 2 -> (0.14, 0.86), 3 -> (0.33, 0.67), and so on for every call:

    Called   red    blue
      6      .8     .2
      4      .33    .67
      5      .33    .67
      1      .57    .43
      2      .14    .86
      3      .33    .67
      4      .33    .67
      5      .33    .67
      2      .14    .86
      2      .14    .86
      1      .57    .43
      4      .33    .67
      3      .33    .67
      4      .33    .67
      6      .8     .2
      2      .14    .86
      1      .57    .43
      6      .8     .2
    Total    7.31   10.69

The total count for Red is the sum of all the posterior probabilities in the red column: 7.31. The total count for Blue is the sum of all the posterior probabilities in the blue column: 10.69. Note: 7.31 + 10.69 = 18, the total number of instances.
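A short sketch that accumulates the two columns of this table and reproduces the totals, the per-face sums listed next, and the updated probabilities (sequence and posteriors are the values from the table above):

    import numpy as np

    seq = [6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6]
    post_red = {1: 0.57, 2: 0.14, 3: 0.33, 4: 0.33, 5: 0.33, 6: 0.80}

    red = np.zeros(7)                     # per-face fragment mass, index 1..6
    blue = np.zeros(7)
    for x in seq:
        red[x] += post_red[x]
        blue[x] += 1.0 - post_red[x]

    print(red[1:], red.sum())             # per-face red counts; total 7.31
    print(blue[1:], blue.sum())           # per-face blue counts; total 10.69
    print(red[1:] / red.sum())            # updated P(number|Red)
    print(red.sum() / len(seq))           # updated P(Z=Red), about 0.41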

Summing the red column by face gives the per-number counts for Red: total count for 1: 1.71; for 2: 0.56; for 3: 0.66; for 4: 1.32; for 5: 0.66; for 6: 2.4. (These sum to the total count for Red, 7.31.)

Updated probability of the Red die: P(1|Red) = 1.71/7.31 = 0.234; P(2|Red) = 0.56/7.31 = 0.077; P(3|Red) = 0.66/7.31 = 0.090; P(4|Red) = 1.32/7.31 = 0.181; P(5|Red) = 0.66/7.31 = 0.090; P(6|Red) = 2.4/7.31 = 0.328.

Similarly, summing the blue column by face: total count for 1: 1.29; for 2: 3.44; for 3: 1.34; for 4: 2.68; for 5: 1.34; for 6: 0.6. (These sum to the total count for Blue, 10.69.)

Updated probability of the Blue die: P(1|Blue) = 1.29/10.69 = 0.121; P(2|Blue) = 3.44/10.69 = 0.322; P(3|Blue) = 1.34/10.69 = 0.125; P(4|Blue) = 2.68/10.69 = 0.251; P(5|Blue) = 1.34/10.69 = 0.125; P(6|Blue) = 0.6/10.69 = 0.056.

Total count for Red: 7.31; total count for Blue: 10.69; total instances = 18 (note 7.31 + 10.69 = 18). We also revise our estimate of the probability that the caller calls out Red or Blue, i.e. the fraction of times he calls each: P(Z=Red) = 7.31/18 = 0.41 and P(Z=Blue) = 10.69/18 = 0.59.

The updated values: P(number|Red), P(number|Blue), P(Z=Red) and P(Z=Blue) as computed above. THE UPDATED VALUES CAN BE USED TO REPEAT THE PROCESS: ESTIMATION IS AN ITERATIVE PROCESS.

The Dice Shooter Example
1. Initialize P(Z) and P(number|Z).
2. Estimate P(Z|number) for each Z, for each called-out number; associate the number with each value of Z, with weight P(Z|number).
3. Re-estimate P(number|Z) for every value of number and Z.
4. Re-estimate P(Z).
5. If not converged, return to 2.

In Squiggles
Given a sequence of observations O_1, O_2, ..., let N_O be the number of observations of number O. Initialize P(Z) and P(O|Z) for dice Z and numbers O. Then iterate. For each number O: P(Z|O) = P(O|Z) P(Z) / sum_{Z'} P(O|Z') P(Z'). Update: P(O|Z) = N_O P(Z|O) / sum_{O'} N_{O'} P(Z|O'), and P(Z) = sum_O N_O P(Z|O) / sum_{Z'} sum_O N_O P(Z'|O).
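Putting the whole loop together: a minimal sketch of these update equations for the dice-and-caller problem. Initialization is random, so, as the next slide notes, different runs may land on different but equally valid solutions:

    import numpy as np

    def em_dice(counts, n_dice=2, iters=200, seed=0):
        counts = np.asarray(counts, dtype=float)      # N_O for each face O
        rng = np.random.default_rng(seed)
        P_z = np.full(n_dice, 1.0 / n_dice)           # P(Z), uniform init
        P_o_z = rng.dirichlet(np.ones(len(counts)), size=n_dice)  # P(O|Z)
        for _ in range(iters):
            # E-step: P(Z|O) via Bayes' rule, one column per face
            joint = P_z[:, None] * P_o_z              # shape (n_dice, faces)
            P_z_o = joint / joint.sum(axis=0, keepdims=True)
            # M-step: fragment the N_O observations of each face
            frag = P_z_o * counts                     # fragment mass per (Z, O)
            P_o_z = frag / frag.sum(axis=1, keepdims=True)
            P_z = frag.sum(axis=1) / counts.sum()
        return P_z, P_o_z

    counts = [3, 4, 2, 4, 2, 3]                       # the 18 calls of the example
    print(em_dice(counts))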

Solutions May Not Be Unique
The EM algorithm will give us one of many solutions, all equally valid! For instance, the probability of 6 being called out is P(6) = alpha P_r(6) + (1 - alpha) P_b(6), which assigns P_r(6) as the probability of 6 for the red die and P_b(6) as the probability of 6 for the blue die. But the following is also a valid solution: P(6) = 1.0 * (alpha P_r(6) + (1 - alpha) P_b(6)) + 0.0 * anything, which assigns 1.0 as the a priori probability of the red die and 0.0 as the probability of the blue die. The solution is NOT unique.

A More Complex Model: Gaussian Mixtures
P(x) = sum_k P(k) N(x; mu_k, Theta_k), with N(x; mu, Theta) = 1/sqrt((2 pi)^d |Theta|) * exp(-0.5 (x - mu)^T Theta^{-1} (x - mu)). Gaussian mixtures are often good models for the distribution of multivariate data. Problem: estimating the parameters, given a collection of data.

Gaussian Mixtures: Generating Model
The caller now has two Gaussians. At each draw he randomly selects a Gaussian according to the mixture weight distribution, then draws an observation from that Gaussian. This is much like the dice problem, only the outcomes are now real-valued and can be anything, e.g. the sequence 6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5.

Estimating a GMM with Complete Information
Observation: a collection of numbers drawn from a mixture of 2 Gaussians. As indicated by the colors, we know which Gaussian generated what number. Segregation: separate the red observations (e.g. 6.1, 5.3, 4.2, 4.9, ...) from the blue (e.g. 1.4, 1.9, 2.2, 0.5, ...). From each set, compute the parameters for that Gaussian: mu_red = (1/N_red) sum_{i in red} x_i and Theta_red = (1/N_red) sum_{i in red} (x_i - mu_red)(x_i - mu_red)^T, and likewise for blue.

Fragmenting the Observation
Here the identity of the Gaussian is not known! Solution: fragment the observation (e.g. the call 4.2 is split between the two Gaussians), with fragment size proportional to the a posteriori probability P(k|x) = P(k) P(x; mu_k, Theta_k) / sum_{k'} P(k') P(x; mu_{k'}, Theta_{k'}). Initialize P(k), mu_k and Theta_k for both Gaussians; how we do this is important. A typical solution: initialize the means randomly, set Theta_k to the global covariance of the data, and set P(k) uniformly. Then compute the fragment sizes for each Gaussian, for each observation:

    Number   red    blue
     6.1     .81    .19
     1.4     .33    .67
     5.3     .75    .25
     1.9     .41    .59
     4.2     .64    .36
     2.2     .43    .57
     4.9     .66    .34
     0.5     .05    .95
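A sketch of this fragment-size computation for the one-dimensional example. The initial means and variances below are stand-ins (the slides only say to initialize "somehow"), so the posteriors will not exactly match the table:

    import numpy as np
    from scipy.stats import norm

    x = np.array([6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5])
    mu = np.array([5.0, 1.5])               # assumed initial means
    sd = np.full(2, x.std())                # global std for both components
    w = np.array([0.5, 0.5])                # uniform mixture weights

    like = w * norm.pdf(x[:, None], mu, sd)          # shape (8, 2)
    post = like / like.sum(axis=1, keepdims=True)    # P(k|x): fragment sizes
    print(np.round(post, 2))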

Each observation contributes only as much as its fragment size to each statistic:
mean(red) = (6.1*0.81 + 1.4*0.33 + 5.3*0.75 + 1.9*0.41 + 4.2*0.64 + 2.2*0.43 + 4.9*0.66 + 0.5*0.05) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05) = 17.05 / 4.08 = 4.18
var(red) = ((6.1-4.18)^2*0.81 + (1.4-4.18)^2*0.33 + (5.3-4.18)^2*0.75 + (1.9-4.18)^2*0.41 + (4.2-4.18)^2*0.64 + (2.2-4.18)^2*0.43 + (4.9-4.18)^2*0.66 + (0.5-4.18)^2*0.05) / 4.08
(The red column of the fragment table sums to 4.08, the blue column to 3.92.)

EM for Gaussian Mixtures
1. Initialize P(k), mu_k and Theta_k for all Gaussians.
2. For each observation x, compute the a posteriori probabilities for all Gaussians: P(k|x) = P(x; mu_k, Theta_k) P(k) / sum_{k'} P(x; mu_{k'}, Theta_{k'}) P(k').
3. Update the mixture weights, means and variances for all Gaussians: P(k) = (1/N) sum_x P(k|x); mu_k = sum_x P(k|x) x / sum_x P(k|x); Theta_k = sum_x P(k|x) (x - mu_k)(x - mu_k)^T / sum_x P(k|x).
4. If not converged, return to 2.

EM Estimation of Gaussian Mixtures: An Example
(Figures: a histogram of 4000 instances of randomly generated data; the individual parameters of a two-Gaussian mixture estimated by EM; the two-Gaussian mixture estimated by EM.)

The same principle can be extended to mixtures of other distributions. E.g. for a mixture of Laplacians, the Laplacian parameters become mu_k = the weighted median of the data (with weights P(k|x)) and b_k = sum_x P(k|x) |x - mu_k| / sum_x P(k|x). In a mixture of Gaussians and Laplacians, the Gaussians use the Gaussian update rules and the Laplacians use the Laplacian rules.

The EM algorithm is used whenever proper statistical analysis of a phenomenon requires the knowledge of a hidden or missing variable, or a set of hidden/missing variables; the hidden variable is often called a latent variable. Some examples: estimating mixtures of distributions, where only the data are observed and the individual distributions and mixing proportions must both be learnt; estimating the distribution of data when some attributes are missing; and estimating the dynamics of a system based only on observations that may be a complex function of the system state.

Solve this problem: A caller rolls a die and flips a coin. He calls out the number rolled if the coin shows heads; otherwise he calls the number + 1. Determine P(heads) and P(number) for the die from a collection of outputs. Another: a caller rolls two dice and calls out the sum. Determine the dice distributions from a collection of outputs.

The Dice and the Coin
Unknown: whether each call was a head or a tail. Each called number fragments into a heads count and a tails count; e.g. a called 4 contributes P(heads|4) to the heads count of 4 and P(tails|4) to the tails count of 3, since a called 4 could be a rolled 4 with heads or a rolled 3 with tails.

The Two Dice
Unknown: how to partition the called number. E.g. a called 4 could arise as (1,3), (2,2) or (3,1), so for the first die: Count(3) += P((3,1)|4); Count(2) += P((2,2)|4); Count(1) += P((1,3)|4); and similarly for the other die.

Fragmentation Can Be Hierarchical
P(x) = sum_k P(k) sum_j P(j|k) P(x|k,j), e.g. a mixture of mixtures: fragments are further fragmented. Work this out.

More Later
We will see a couple of other instances of the use of EM. Work out HMM training: assume the state output distributions are multinomials; assume they are Gaussians; assume they are Gaussian mixtures.
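Finally, steps 1-4 of the GMM recipe as runnable code for a one-dimensional mixture, on synthetic data. This is a sketch only: no convergence test, and only a tiny epsilon to guard against collapsing variances:

    import numpy as np
    from scipy.stats import norm

    def em_gmm_1d(x, K=2, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        mu = rng.choice(x, size=K, replace=False)   # random means
        var = np.full(K, x.var())                   # global variance
        w = np.full(K, 1.0 / K)                     # uniform weights
        for _ in range(iters):
            # E-step: a posteriori probabilities (fragment sizes)
            like = w * norm.pdf(x[:, None], mu, np.sqrt(var))
            post = like / like.sum(axis=1, keepdims=True)
            # M-step: weighted counting
            Nk = post.sum(axis=0)
            w = Nk / len(x)
            mu = (post * x[:, None]).sum(axis=0) / Nk
            var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-9
        return w, mu, var

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(4.5, 1.0, 2000), rng.normal(1.5, 0.7, 2000)])
    print(em_gmm_1d(x))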

The dce and the con Heads or tal? The two dce 4 Tals count Heads count 4 3.. 4. 3 3, 4 2,2,3 Unnown: Whether t was head or tals Unnown: How to partton the number Count blue 3 += 3, 4 Count blue 2 += 2,2 4 Count blue +=,3 4 67 68 Fragmentaton can be herarchcal P, 2 2 3 4 More later Wll see a couple of other nstances of the use of EM Wor out HMM tranng Assume state output dstrbutons are multnomals Assume they are Gaussan Assume Gaussan mtures E.g. mture of mtures Fragments are further fragmented.. Wor ths out 69 70 2