Expectation Maximization Mixture Models

11-755 Machine Learning for Signal Processing: Mixture Models. Class 10, 2 Oct 2012.

Understanding (and Predicting) Data
- Many different data streams around us. We process, understand and respond.
- What is the response based on? The data we observed, and underlying characteristics that we inferred, modeled using latent variables.

Examples: Stock Market
- (Chart from Yahoo! Finance.) Market sentiment as a latent variable?

Examples: Sports
- What skills in players should be valued?
- Sidenote, for anyone interested: "Baseball as a Markov Chain".

Examples: Audio
- Many audio applications use latent variables: signal separation, voice modification, music analysis, music and speech generation.

A Strange Observation
- [Figure: pitch (Hz) of female Indian playback singers vs. year (1949, 1966, 2003). Shamshad Begum, "Patanga": peak 310 Hz, mean 278 Hz. Lata Mangeshkar, "Anupama": peak 570 Hz, mean 410 Hz. Alka Yagnik, "Dil Ka Rishta": peak 740 Hz, mean 580 Hz. A reference line marks the average female talking pitch.]
- The pitch of female Indian playback singers is on an ever-increasing trajectory.

Comments on the high-pitched singing
- Sarah McDonald (Holy Cow): "... shrieking ..."
- Khazana.com: "... female Indian movie playback singers who can produce ultra high frequencies which only dogs can hear clearly ..."
- www.roadjunky.com: "... high pitched female singers doing their best to sound like they were seven years old ..."

A Disturbing Observation
- Glass shatters. The pitch is unpleasant, but the melody isn't bad.

Let's Fix the Song
- Modify the pitch, but retain the melody.
- Problem: we cannot just shift the pitch; that would destroy the music. The music is fine, leave it alone: modify the singing pitch without affecting the music.

Personalizing the Song
- Separate the vocals from the background music; modify the separated vocals, keep the music unchanged.
- The separation need not be perfect. It must only be sufficient to enable pitch modification of the vocals: pitch modification is tolerant of low-level artifacts, and for octave-level pitch modification the artifacts can be undetectable.
- Separation example: "Dayya Dayya" original (only vocalized regions), separated music, separated vocals.

Some examples
- Example 1: vocals shifted down by 4 semitones.
- Example 2: gender of the singer partially modified.

Techniques Employed
- Signal separation: a simple latent-variable based separation method.
- Voice modification: equally simple techniques.
- We will consider the underlying methods over the next few lectures, with extensive use of expectation maximization (EM).

Learning Distributions for Data
- Problem: given a collection of examples of some data, estimate its distribution.
- The basic ideas of maximum likelihood and MAP estimation can be found in the Aarti/Paris slides pointed to in a previous class.
- Solution: assign a model to the distribution, and learn the parameters of the model from the data.
- Models can be arbitrarily complex: mixture densities, hierarchical models, etc.
- Learning can be done using maximum likelihood estimation.

A Thought Experiment
- A person shoots a loaded dice repeatedly, and you observe the series of outcomes.
- You can form a good idea of how the dice is loaded: figure out what the probabilities of the various numbers are, as
  P(number) = count(number) / sum(rolls)
- This is a maximum likelihood estimate: the estimate that makes the observed sequence of numbers most probable.

Generative Model
- The data are generated by draws from the distribution, i.e. the generating process draws from the distribution.
- Assumption: the distribution has a high probability of generating the observed data. Not necessarily true.
- Select the distribution that has the highest probability of generating the data. It should assign lower probability to less frequent observations, and vice versa.

The Multinomial Distribution
- A probability distribution over a discrete collection of items is a multinomial: P(X : X belongs to a discrete set).
- E.g. the roll of a dice: X \in (1, 2, 3, 4, 5, 6); or the toss of a coin: X \in (heads, tails).

Maximum Likelihood Estimation: Multinomial
- The probability of generating the counts (n_1, n_2, n_3, n_4, n_5, n_6) is
  P(n_1, \ldots, n_6) = \text{Const} \prod_i p_i^{n_i}
- Find p_1, \ldots, p_6 so that the above is maximized. Alternately, maximize the log; log is a monotonic function, so \arg\max_x f(x) = \arg\max_x \log f(x):
  \log P(n_1, \ldots, n_6) = \log(\text{Const}) + \sum_i n_i \log p_i
- Solving for the probabilities (this requires constrained optimization, to ensure the probabilities sum to 1) gives us
  p_i = \frac{n_i}{\sum_j n_j}
- EVENTUALLY IT'S JUST COUNTING!

Segue: Gaussians
- P(X; \mu, \Theta) = \frac{1}{\sqrt{(2\pi)^d |\Theta|}} \exp\left(-\tfrac{1}{2}(X-\mu)^T \Theta^{-1} (X-\mu)\right)
- Parameters of a Gaussian: mean \mu, covariance \Theta.

Maximum Likelihood: Gaussian
- Given a collection of observations (X_1, X_2, \ldots), estimate the mean \mu and covariance \Theta:
  \log P(X_1, X_2, \ldots) = C - \tfrac{1}{2} \sum_i \left( \log|\Theta| + (X_i-\mu)^T \Theta^{-1} (X_i-\mu) \right)
- Maximizing w.r.t. \mu and \Theta gives us
  \mu = \frac{1}{N} \sum_i X_i, \qquad \Theta = \frac{1}{N} \sum_i (X_i-\mu)(X_i-\mu)^T
- IT'S STILL JUST COUNTING!
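To make the "just counting" point concrete, here is a minimal Python sketch of both ML estimates (not from the lecture; the data are invented):

```python
import numpy as np

# ML estimate of a multinomial: normalized counts, p_i = n_i / sum_j n_j.
rolls = np.array([6, 3, 1, 5, 4, 1, 2, 4, 4, 4, 6, 3, 2, 2])  # hypothetical die rolls
p_ml = np.bincount(rolls, minlength=7)[1:] / len(rolls)
print({face: round(p, 3) for face, p in zip(range(1, 7), p_ml)})

# ML estimate of a Gaussian: sample mean and (biased) sample covariance.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2)) @ np.array([[2.0, 0.3], [0.3, 1.0]])
mu = X.mean(axis=0)                      # mu = (1/N) sum_i X_i
Theta = (X - mu).T @ (X - mu) / len(X)   # Theta = (1/N) sum_i (X_i - mu)(X_i - mu)^T
print(mu, Theta)
```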

Laplacian
- L(x; \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x-\mu|}{b}\right)
- Parameters: mean \mu, scale b (b > 0).

Maximum Likelihood: Laplacian
- Given a collection of observations (x_1, x_2, \ldots), estimate the mean \mu and scale b:
  \log P(x_1, x_2, \ldots) = C - N \log b - \sum_i \frac{|x_i - \mu|}{b}
- Maximizing w.r.t. \mu and b gives us the median and the mean absolute deviation:
  \mu = \text{median}(\{x_i\}), \qquad b = \frac{1}{N} \sum_i |x_i - \mu|

Dirichlet
- [Figures from Wikipedia: K = 3. Clockwise from top left: \alpha = (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4). Also, the log of the density as \alpha changes from (0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual \alpha's equal to each other.]
- The parameters \alpha determine the mode and curvature.
- Defined only on probability vectors X = [x_1, x_2, \ldots, x_K] with \sum_i x_i = 1 and x_i \geq 0 for all i:
  D(X; \alpha) = \frac{\Gamma\left(\sum_i \alpha_i\right)}{\prod_i \Gamma(\alpha_i)} \prod_i x_i^{\alpha_i - 1}

Maximum Likelihood: Dirichlet
- Given a collection of observations (X_1, X_2, \ldots), estimate \alpha:
  \log P(X_1, X_2, \ldots) = N \log \Gamma\left(\sum_j \alpha_j\right) - N \sum_j \log \Gamma(\alpha_j) + \sum_j (\alpha_j - 1) \sum_i \log x_{i,j}
- No closed form solution for the \alpha's; needs gradient ascent.
- Several distributions have this property: the ML estimates of their parameters have no closed form.

Continuing the Thought Experiment
- Two persons shoot loaded dice repeatedly. The dice are differently loaded for the two of them.
- We observe the series of outcomes for both persons. How to determine the probability distributions of the two dice?

Estimating Probabilities
- Observation: the sequence of numbers from the two dice. As indicated by the colors, we know who rolled what number.
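A quick numerical check of the Laplacian result on synthetic data (a sketch; the Dirichlet, by contrast, would need an iterative optimizer, as noted above):

```python
import numpy as np

# ML for a Laplacian L(x; mu, b): mu is the sample median,
# b the mean absolute deviation about it.
rng = np.random.default_rng(0)
x = rng.laplace(loc=3.0, scale=1.5, size=10_000)
mu_hat = np.median(x)
b_hat = np.mean(np.abs(x - mu_hat))
print(mu_hat, b_hat)  # should land close to (3.0, 1.5)
```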

Estimating Probabilities (continued)
- Segregation: separate the blue observations from the red.
- From each set, compute probabilities for each of the 6 possible outcomes:
  P(number) = \frac{\text{no. of times the number was rolled}}{\text{total number of observed rolls}}
- [Figure: the two estimated histograms over faces 1-6, one per die.]

A Thought Experiment, continued
- Now imagine that you cannot observe the dice yourself. Instead, there is a caller who randomly calls out the outcomes: 40% of the time he calls out the number from the left shooter, and 60% of the time the one from the right (and you know this).
- At any time, you do not know which of the two he is calling out. How do you determine the probability distributions for the two dice?
- And how do you determine the probability distributions if you do not even know what fraction of the time the blue numbers are called, and what fraction are red?

A Mixture Multinomial
- The caller will call out a number X in any given callout IF he selects RED and the red die rolls the number X, OR he selects BLUE and the blue die rolls the number X:
  P(X) = P(\text{Red}) P(X \mid \text{Red}) + P(\text{Blue}) P(X \mid \text{Blue})
  e.g. P(6) = P(\text{Red}) P(6 \mid \text{Red}) + P(\text{Blue}) P(6 \mid \text{Blue})
- A distribution that combines (or mixes) multiple multinomials is a mixture multinomial:
  P(X) = \sum_Z P(Z) P(X \mid Z)
  with mixture weights P(Z) and component multinomials P(X \mid Z).

Mixture Distributions
- General form: P(X) = \sum_Z P(Z) P(X \mid Z); e.g. a mixture Gaussian:
  P(X) = \sum_Z P(Z) N(X; \mu_Z, \Theta_Z)
- Mixture distributions mix several component distributions, which may be of varied type.
- The mixing weights must sum to 1.0 and each component distribution integrates to 1.0, so the mixture distribution integrates to 1.0.
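A small sketch of the mixture multinomial for the caller (the dice probabilities are invented placeholders, not the slides' values):

```python
import numpy as np

P_Z = np.array([0.4, 0.6])                 # caller picks red 40%, blue 60% of the time
P_X_given_Z = np.array([
    [0.05, 0.05, 0.10, 0.05, 0.15, 0.60],  # hypothetical red die, faces 1..6
    [0.30, 0.30, 0.10, 0.10, 0.10, 0.10],  # hypothetical blue die
])
P_X = P_Z @ P_X_given_Z                    # P(X) = sum_Z P(Z) P(X|Z), faces 1..6
assert np.isclose(P_X.sum(), 1.0)          # weights and components sum to 1, so P(X) does
print(P_X)
```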

Maximum Likelihood Estimation
- For our problem, Z = the color of the dice:
  P(X) = \sum_Z P(Z) P(X \mid Z)
- Maximum likelihood solution: maximize
  \log P(n_1, \ldots, n_6) = \log(\text{Const}) + \sum_X n_X \log\left( \sum_Z P(Z) P(X \mid Z) \right)
- No closed form solution (summation inside the log)! In general, ML estimates for mixtures do not have a closed form. USE EM!

The Expectation Maximization Algorithm
- It is possible to estimate all parameters in this setup using the Expectation Maximization (EM) algorithm.
- First described in a landmark paper by Dempster, Laird and Rubin: "Maximum Likelihood Estimation from incomplete data, via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 1977.
- Much work on the algorithm since then; the principles behind the algorithm existed for several years prior to the landmark paper, however.

Iterative solution
- Get some initial estimates for all parameters. In the dice shooter example this includes the probability distributions for the dice AND the probability with which the caller selects the dice.
- Two steps that are iterated:
  - Expectation step: estimate, statistically, the values of the unseen variables.
  - Maximization step: using the estimated values of the unseen variables as truth, obtain estimates of the model parameters.

EM: The auxiliary function
- EM iteratively optimizes the following auxiliary function:
  Q(\theta, \theta') = \sum_Z P(Z \mid X, \theta') \log P(Z, X \mid \theta)
- Z are the unseen variables (assumed discrete here, though they may not be).
- \theta' are the parameter estimates from the previous iteration; \theta are the estimates to be obtained in the current iteration.

Expectation Maximization as counting
- Hidden variable Z, the dice: the identity of the dice whose number has been called out.
- If we knew Z for every observation, we could estimate all terms by adding each observation to the right bin (the collection of blue numbers or the collection of red numbers).
- Unfortunately, we do not know Z: it is hidden from us! Solution: FRAGMENT THE OBSERVATION.

Fragmenting the Observation
- EM is an iterative algorithm: at each step there is a current estimate of the parameters.
- The size of the fragments is proportional to the a posteriori probability of the component distributions. The a posteriori probabilities of the various values of Z are computed using Bayes' rule:
  P(Z \mid X) = C \cdot P(X \mid Z) P(Z)
- Every dice gets a fragment of size P(dice | number).
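The fragmenting step is just Bayes' rule. A sketch that reproduces the 4 -> (0.33 red, 0.67 blue) split used in the example below; the full face distributions are hypothetical, chosen only so that P(4 | red) = 0.05 and P(4 | blue) = 0.1:

```python
import numpy as np

def fragment(x, P_Z, P_X_given_Z):
    """Posterior fragment sizes P(Z | X=x), proportional to P(X=x | Z) P(Z);
    the normalization plays the role of the constant C."""
    unnorm = P_Z * P_X_given_Z[:, x - 1]
    return unnorm / unnorm.sum()

P_Z = np.array([0.5, 0.5])                  # initial caller probabilities
P_X_given_Z = np.array([
    [0.25, 0.20, 0.15, 0.05, 0.15, 0.20],   # hypothetical red die: P(4|red) = 0.05
    [0.20, 0.20, 0.15, 0.10, 0.15, 0.20],   # hypothetical blue die: P(4|blue) = 0.10
])
print(fragment(4, P_Z, P_X_given_Z))        # -> [0.333..., 0.666...]
```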

Hypothetical Dice Shooter Example
- We obtain an initial estimate for the probability distributions of the two sets of dice (somehow); in particular P(4 | red) = 0.05 and P(4 | blue) = 0.1. We also obtain an initial estimate for the probability with which the caller calls out the two shooters (somehow): P(Z = red) = P(Z = blue) = 0.5.
- The caller has just called out 4. The posterior probabilities of the colors:
  P(red | 4) = C · P(4 | Z = red) P(Z = red) = C · 0.05 · 0.5 = C · 0.025
  P(blue | 4) = C · P(4 | Z = blue) P(Z = blue) = C · 0.1 · 0.5 = C · 0.05
  Normalizing: P(red | 4) = 0.33, P(blue | 4) = 0.67.
- Every observed roll of the dice contributes to both red and blue. For the called sequence
  6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6
  each call fragments according to the posterior for its face:

  Face  | Times called | P(red | face) | P(blue | face) | Red count | Blue count
  1     | 3            | 0.57          | 0.43           | 1.71      | 1.29
  2     | 4            | 0.14          | 0.86           | 0.56      | 3.44
  3     | 2            | 0.33          | 0.67           | 0.66      | 1.34
  4     | 4            | 0.33          | 0.67           | 1.32      | 2.68
  5     | 2            | 0.33          | 0.67           | 0.66      | 1.34
  6     | 3            | 0.80          | 0.20           | 2.40      | 0.60

- The total count for red is the sum of all the posterior probabilities in the red column: 7.31. The total count for blue is the sum of the blue column: 10.69. Note: 7.31 + 10.69 = 18, the total number of instances.
- Updated probability of the red dice:
  P(1 | red) = 1.71/7.31 = 0.234; P(2 | red) = 0.56/7.31 = 0.077; P(3 | red) = 0.66/7.31 = 0.090;
  P(4 | red) = 1.32/7.31 = 0.181; P(5 | red) = 0.66/7.31 = 0.090; P(6 | red) = 2.40/7.31 = 0.328.
- Updated probability of the blue dice:
  P(1 | blue) = 1.29/10.69 = 0.121; P(2 | blue) = 3.44/10.69 = 0.322; P(3 | blue) = 1.34/10.69 = 0.125;
  P(4 | blue) = 2.68/10.69 = 0.251; P(5 | blue) = 1.34/10.69 = 0.125; P(6 | blue) = 0.60/10.69 = 0.056.
- We also revise our estimate for the probability that the caller calls out red or blue, i.e. the fraction of times he calls red and the fraction of times he calls blue:
  P(Z = red) = 7.31/18 = 0.41, P(Z = blue) = 10.69/18 = 0.59.

The updated values
- THE UPDATED VALUES (the dice and caller probabilities above) CAN BE USED TO REPEAT THE PROCESS. ESTIMATION IS AN ITERATIVE PROCESS.

The Dice Shooter Example, summarized:
1. Initialize P(Z) and P(X | Z).
2. Estimate P(Z | X) for each Z, for each called out number X; associate X with each value of Z, with weight P(Z | X).
3. Re-estimate P(X | Z) for every value of X and Z.
4. Re-estimate P(Z).
5. If not converged, return to 2.

In Squiggles
- Given a sequence of observations O_1, O_2, \ldots, where N_X is the number of observations of number X:
- Initialize P(Z), P(X \mid Z) for dice Z and numbers X.
- Iterate. For each number X, compute
  P(Z \mid X) = \frac{P(Z) P(X \mid Z)}{\sum_{Z'} P(Z') P(X \mid Z')}
- Update
  P(X \mid Z) = \frac{N_X P(Z \mid X)}{\sum_{X'} N_{X'} P(Z \mid X')}, \qquad P(Z) = \frac{\sum_X N_X P(Z \mid X)}{\sum_{Z'} \sum_X N_X P(Z' \mid X)}

Solutions may not be unique
- The EM algorithm will give us one of many solutions, all equally valid! E.g. the probability of 6 being called out,
  P(6) = \alpha P_r + (1 - \alpha) P_b,
  assigns P_r as the probability of 6 for the red die and P_b as the probability of 6 for the blue die, with \alpha the a priori probability of the red die. The following too is a valid solution:
  \alpha = 1.0, \quad P_r = P(6), \quad P_b = \text{anything}
  This assigns 1.0 as the a priori probability of the red die and 0.0 to the blue die. The solution is NOT unique.
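The whole procedure fits in a few lines. A sketch of EM for the two-dice caller; the initial dice here are random (the slides' starting histograms are not fully specified), and since the solution is not unique, different random starts may land on different, equally valid answers:

```python
import numpy as np

def em_two_dice(calls, p_z, p_faces, n_iters=100):
    """EM for a mixture of two multinomials (the dice-caller model).
    calls: observed numbers 1..6; p_z: mixture weights, shape (2,);
    p_faces: per-die face probabilities, shape (2, 6)."""
    X = np.eye(6)[np.asarray(calls) - 1]          # one-hot calls, shape (N, 6)
    for _ in range(n_iters):
        # E-step: fragment each call between the dice by Bayes' rule.
        post = X @ (p_z[:, None] * p_faces).T     # (N, 2): P(Z) P(x | Z) per call
        post /= post.sum(axis=1, keepdims=True)
        # M-step: the fragments act as fractional counts.
        counts = post.T @ X                       # (2, 6): soft counts per face
        p_faces = counts / counts.sum(axis=1, keepdims=True)
        p_z = post.mean(axis=0)                   # revised caller probabilities
    return p_z, p_faces

calls = [6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6]  # the 18 calls above
rng = np.random.default_rng(0)
p_z, p_faces = em_two_dice(calls, np.array([0.5, 0.5]),
                           rng.dirichlet(np.ones(6), size=2))
print(p_z.round(2), p_faces.round(3))
```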

A More Complex Model: Gaussian Mixtures
- P(X) = \sum_k P(k) N(X; \mu_k, \Theta_k), \qquad N(X; \mu, \Theta) = \frac{1}{\sqrt{(2\pi)^d |\Theta|}} \exp\left(-\tfrac{1}{2}(X-\mu)^T \Theta^{-1} (X-\mu)\right)
- Gaussian mixtures are often good models for the distribution of multivariate data.
- Problem: estimating the parameters, given a collection of data.

Gaussian Mixtures: Generating model
- The caller now has two Gaussians. At each draw he randomly selects a Gaussian, by the mixture weight distribution, and then draws an observation from that Gaussian.
- Much like the dice problem, only the outcomes are now real-valued and can be anything. E.g.: 6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5.

Estimating a GMM with complete information
- Observation: a collection of numbers drawn from a mixture of 2 Gaussians. As indicated by the colors, we know which Gaussian generated what number.
- Segregation: separate the blue observations from the red. From each set compute the parameters of that Gaussian, e.g.
  \mu_{red} = \frac{1}{N_{red}} \sum_{i \in red} X_i, \qquad \Theta_{red} = \frac{1}{N_{red}} \sum_{i \in red} (X_i - \mu_{red})(X_i - \mu_{red})^T

Fragmenting the observation
- When the identity of the Gaussian is not known, fragment the observation, with fragment size proportional to the a posteriori probability:
  P(k \mid x) = \frac{P(k) P(x \mid k)}{\sum_{k'} P(k') P(x \mid k')}
- Initialize P(k), \mu_k and \Theta_k for both Gaussians. It is important how we do this. Typical solution: initialize the means randomly, \Theta_k as the global covariance of the data, and P(k) uniformly.
- Compute fragment sizes for each Gaussian, for each observation:

  Number | P(red | x) | P(blue | x)
  6.1    | .81        | .19
  1.4    | .33        | .67
  5.3    | .75        | .25
  1.9    | .41        | .59
  4.2    | .64        | .36
  2.2    | .43        | .57
  4.9    | .66        | .34
  0.5    | .05        | .95
  total  | 4.08       | 3.92

- Each observation contributes only as much as its fragment size to each statistic:
  Mean(red) = (6.1·0.81 + 1.4·0.33 + 5.3·0.75 + 1.9·0.41 + 4.2·0.64 + 2.2·0.43 + 4.9·0.66 + 0.5·0.05) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05) = 17.05 / 4.08 = 4.18
  Var(red) = ((6.1−4.18)²·0.81 + (1.4−4.18)²·0.33 + (5.3−4.18)²·0.75 + (1.9−4.18)²·0.41 + (4.2−4.18)²·0.64 + (2.2−4.18)²·0.43 + (4.9−4.18)²·0.66 + (0.5−4.18)²·0.05) / 4.08
  P(red) = 4.08 / 8

EM for Gaussian Mixtures
1. Initialize P(k), \mu_k and \Theta_k for all Gaussians.
2. For each observation x, compute the a posteriori probabilities for all Gaussians:
   P(k \mid x) = \frac{P(k) P(x \mid k)}{\sum_{k'} P(k') P(x \mid k')}
3. Update the mixture weights, means and variances for all Gaussians:
   P(k) = \frac{1}{N} \sum_x P(k \mid x), \qquad \mu_k = \frac{\sum_x P(k \mid x)\, x}{\sum_x P(k \mid x)}, \qquad \Theta_k = \frac{\sum_x P(k \mid x) (x - \mu_k)(x - \mu_k)^T}{\sum_x P(k \mid x)}
4. If not converged, return to 2.

EM estimation of Gaussian Mixtures: An Example
- [Figure: histogram of 4000 instances of randomly generated data; the individual Gaussians of a two-Gaussian mixture estimated by EM; and the resulting two-Gaussian mixture.]
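A numpy sketch of the same loop for a 1-D two-Gaussian mixture (the initial means are arbitrary guesses; the variances follow the slide's tip of starting from the global variance):

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, pi, mu, var, n_iters=50):
    """EM for a 1-D Gaussian mixture; pi, mu, var hold one entry per component."""
    x = np.asarray(x, dtype=float)
    for _ in range(n_iters):
        # E-step: fragment each point, proportional to pi_k * N(x; mu_k, var_k).
        resp = pi[:, None] * gauss(x[None, :], mu[:, None], var[:, None])  # (K, N)
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: soft-count weighted means, variances and mixture weights.
        nk = resp.sum(axis=1)
        mu = (resp @ x) / nk
        var = (resp * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk
        pi = nk / len(x)
    return pi, mu, var

x = [6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5]      # the slide's eight observations
pi, mu, var = em_gmm_1d(x, np.array([0.5, 0.5]),
                        np.array([4.5, 1.5]),      # arbitrary initial means
                        np.full(2, np.var(x)))     # global variance as the start
print(pi.round(2), mu.round(2), var.round(2))
```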

Mixtures of other distributions
- The same principle can be extended to mixtures of other distributions, e.g. a mixture of Laplacians. The Laplacian parameters become the weighted median and the weighted mean absolute deviation:
  \mu_k = \text{weighted median of } \{x\} \text{ with weights } P(k \mid x), \qquad b_k = \frac{\sum_x P(k \mid x) |x - \mu_k|}{\sum_x P(k \mid x)}
- In a mixture of Gaussians and Laplacians, the Gaussians use the Gaussian update rules and the Laplacians use the Laplacian rules.

When is EM used?
- The EM algorithm is used whenever proper statistical analysis of a phenomenon requires the knowledge of a hidden or missing variable (or a set of hidden/missing variables). The hidden variable is often called a latent variable.
- Some examples:
  - Estimating mixtures of distributions: only the data are observed; the individual distributions and mixing proportions must both be learnt.
  - Estimating the distribution of the data when some attributes are missing.
  - Estimating the dynamics of a system, based only on observations that may be a complex function of the system state.

Solve this problem: the dice and the coin
- The caller rolls a dice and flips a coin. He calls out the number rolled if the coin shows heads; otherwise he calls the number + 1.
- Determine P(heads) and P(number) for the dice from a collection of outputs.
- Unknown: whether each call came from a head or a tail. E.g. a called-out 4 fragments between a heads count for face 4 and a tails count for face 3. (A sketch follows below.)

Solve this problem: the two dice
- The caller rolls two dice and calls out the sum. Determine the dice from a collection of outputs.
- Unknown: how to partition the called number between the two dice. E.g. a called-out 4 fragments three ways:
  Count_blue(3) += P(3, 1 | 4); Count_blue(2) += P(2, 2 | 4); Count_blue(1) += P(1, 3 | 4)

Fragmentation can be hierarchical
- E.g. a mixture of mixtures: P(X) = \sum_Z P(Z) \sum_{Z_2} P(Z_2 \mid Z) P(X \mid Z, Z_2)
- Fragments are further fragmented. Work this out.
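A hedged sketch for the first exercise, the dice and the coin (the observed calls here are invented): the hidden variable is the coin, and each call c fragments between "heads, die rolled c" and "tails, die rolled c − 1":

```python
import numpy as np

def em_dice_coin(calls, p_heads, p_die, n_iters=200):
    """EM sketch for the dice-and-coin puzzle: the caller announces the
    roll on heads and roll+1 on tails.  calls take values 1..7; p_die
    holds the probabilities of faces 1..6."""
    calls = np.asarray(calls)
    face_h = np.minimum(calls, 6)        # face if heads (the call itself; 7 impossible)
    face_t = np.maximum(calls - 1, 1)    # face if tails (call - 1; 0 impossible)
    for _ in range(n_iters):
        # E-step: fragment each call between heads and tails, prior * likelihood.
        w_h = np.where(calls <= 6, p_heads * p_die[face_h - 1], 0.0)
        w_t = np.where(calls >= 2, (1 - p_heads) * p_die[face_t - 1], 0.0)
        total = w_h + w_t
        w_h, w_t = w_h / total, w_t / total
        # M-step: the fragments are fractional counts for the coin and the faces.
        p_heads = w_h.mean()
        counts = np.zeros(6)
        np.add.at(counts, face_h - 1, w_h)   # heads fragments count toward face = call
        np.add.at(counts, face_t - 1, w_t)   # tails fragments count toward face = call - 1
        p_die = counts / counts.sum()
    return p_heads, p_die

print(em_dice_coin([4, 1, 3, 6, 2, 7, 3, 4], 0.5, np.full(6, 1 / 6)))
```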

More later
- We will see a couple of other instances of the use of EM.
- Work out HMM training: assume the state output distributions are multinomials; then assume they are Gaussians; then Gaussian mixtures.