Gaussian Mixture Models

Lab Objective: Understand the formulation of Gaussian Mixture Models (GMMs) and how to estimate GMM parameters.

You've already seen GMMs as the observation distribution in certain continuous density HMMs. Here, we will discuss them further and learn how to estimate their parameters, given data.

The main idea behind a mixture model is contained in the name, i.e. it is a mixture of different models. What do we mean by a mixture? A mixture model is composed of K components, each component being responsible for a portion of the data. The responsibilities of these components are represented by mixture weights w_i, for i = 1, ..., K. As you may have guessed, these weights are nonnegative and sum to 1. Thus component j is responsible for 100 w_j percent of the data generated by the model.

Each component is itself a probability distribution. In a GMM, each component is specifically a Gaussian (multivariate normal) distribution. Thus we additionally have parameters μ_i, Σ_i for i = 1, ..., K, i.e. a mean and covariance for each component in the GMM. It is important here to keep in mind that a GMM does not arise from adding weighted multivariate normal random variables, but rather from weighting the responsibility of each multivariate normal random variable. In the first case, we would simply have a different multivariate normal distribution, whereas in the second case we have a mixture. Refer to Figure 1.1 for a visualization of this.

Figure 1.1: (a) Sum of weighted multivariate normal random variables. (b) Weighted mixture of multivariate normal random variables.

Thus, a fully defined GMM has parameters λ = (w, μ, Σ). The density of a GMM is given by

P(x | λ) = ∑_{i=1}^{K} w_i N(x; μ_i, Σ_i)

where

N(x; μ_i, Σ_i) = (2π)^{-d/2} |Σ_i|^{-1/2} e^{-(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i)}

and d is the dimension of x.

Problem 1. Write a function to evaluate the density of a normal distribution at a point x, given parameters μ and Σ. Include the option to return the log of this probability, but be sure to do it intelligently! Also write a function that computes the density of a GMM at a point x, given the parameters λ, along with the log option.
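One possible implementation is sketched below; the function names, argument conventions, and the Cholesky-based computation of the log density are choices of this sketch, not requirements of the lab. Inside the GMM class of Problem 2 below, these would naturally become methods that read the stored parameters (as the plotting code later in the lab, which calls model.dgmm, assumes).

import numpy as np
from scipy import linalg as la

def dnormal(x, mu, sigma, log=False):
    """Density of a multivariate normal N(mu, sigma) at the point x.
    With log=True, return the log density, computed without ever
    exponentiating (the 'intelligent' part of Problem 1)."""
    d = len(x)
    diff = x - mu
    # Factor sigma = L L^T and solve L z = diff rather than inverting sigma.
    L = la.cholesky(sigma, lower=True)
    z = la.solve_triangular(L, diff, lower=True)
    logdet = 2 * np.sum(np.log(np.diag(L)))
    logdensity = -0.5 * (d * np.log(2 * np.pi) + logdet + np.dot(z, z))
    return logdensity if log else np.exp(logdensity)

def dgmm(x, weights, means, covars, log=False):
    """Density of a GMM with parameters lambda = (weights, means, covars) at x,
    computed with the log-sum-exp trick to avoid underflow."""
    logs = np.array([np.log(w) + dnormal(x, mu, sigma, log=True)
                     for w, mu, sigma in zip(weights, means, covars)])
    m = logs.max()
    logdensity = m + np.log(np.sum(np.exp(logs - m)))
    return logdensity if log else np.exp(logdensity)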

Throughout this lab, we will build a GMM class with various methods. We will outline this now.

Problem 2. Write the skeleton of a GMM class. In the __init__ method, it should accept the non-null parameter n_components, as well as parameters for the weights, means, and covariance matrices which define the GMM. Include a function to generate data from a fully defined GMM (you may use your code from the CDHMM lab for this), as well as the density function you recently defined.

The main focus of this lab will be to estimate the parameters of a GMM, given observed multivariate data Y = y_1, y_2, ..., y_T. This can be done via Gibbs sampling, as well as with EM (Expectation Maximization). We choose the latter approach for this lab. To do this, we must compute the probability of an observation being from each component of a GMM with parameters λ^{(n)} = (w^{(n)}, μ^{(n)}, Σ^{(n)}). This is simply

P(x_t = i | y_t, λ) ∝ w_i^{(n)} N(y_t; μ_i^{(n)}, Σ_i^{(n)})

Just as with HMMs, we refer to these probabilities as γ_t(i), and this is the E-step in the algorithm. This might seem straightforward, except this direct computation will likely lead to numerical issues. Instead, we work in the log space, which means we have to be a bit more careful. It is feasible (and occurs quite often) that each term w_i^{(n)} N(y_t; μ_i^{(n)}, Σ_i^{(n)}) is 0, because of underflow in the computation of the multivariate normal density. Letting l_i^{(n)} = ln w_i^{(n)} + ln N(y_t; μ_i^{(n)}, Σ_i^{(n)}), we can compute these probabilities more carefully, as follows:

P(x_t = i | y_t, λ) = e^{l_i} / ∑_{j=1}^{K} e^{l_j}
                    = (e^{l_i} e^{-max_k l_k}) / (∑_{j=1}^{K} e^{l_j} e^{-max_k l_k})
                    = e^{l_i - max_k l_k} / ∑_{j=1}^{K} e^{l_j - max_k l_k}

which will effectively avoid underflow problems.

Problem 3. Add a method to your class to compute γ_t(i) for t = 1, ..., T and i = 1, ..., K. Don't forget to do this intelligently to avoid underflow!

Given our matrix γ, we can re-estimate our weights, means, and covariance matrices as follows:

w_i^{(n+1)} = (1/T) ∑_{t=1}^{T} γ_t(i)

μ_i^{(n+1)} = ∑_{t=1}^{T} γ_t(i) y_t / ∑_{t=1}^{T} γ_t(i)

Σ_i^{(n+1)} = ∑_{t=1}^{T} γ_t(i) (y_t − μ_i^{(n+1)})(y_t − μ_i^{(n+1)})^T / ∑_{t=1}^{T} γ_t(i)

for i = 1, ..., K. These updates are the M-step in the algorithm.

Problem 4. Add methods to your class to update w, μ, and Σ as described above.

With the above work, we are almost ready to complete our class. To train, we will randomly initialize our parameters λ, and then iteratively update them as above.

Problem 5. Add a method to initialize λ. Do this intelligently, i.e. your means should not be far from your actual data used for training, and your covariances should neither be too big nor too small. Your weights should roughly be equal, and still sum to 1. Also add a method to train your model, as described previously, iterating until convergence within some tolerance.
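For concreteness, here is one possible shape for the E-step and M-step methods of Problems 3 and 4. This is only a sketch: it assumes the class stores its parameters as self.n_components, self.weights, self.means, and self.covars, takes the data Y as a T x d NumPy array, and reuses the dnormal helper from the earlier sketch; none of these names are dictated by the lab.

import numpy as np

class GMM(object):
    # ... __init__, sample generation, and density methods omitted ...

    def _gammas(self, Y):
        """E-step: return the T x K matrix gamma, where gamma[t, i] is the
        probability that observation Y[t] came from component i."""
        T, K = len(Y), self.n_components
        logs = np.zeros((T, K))
        for t in range(T):
            for i in range(K):
                logs[t, i] = (np.log(self.weights[i])
                              + dnormal(Y[t], self.means[i], self.covars[i], log=True))
        # Normalize each row in log space: subtract the row max before exponentiating.
        logs -= logs.max(axis=1)[:, np.newaxis]
        gammas = np.exp(logs)
        return gammas / gammas.sum(axis=1)[:, np.newaxis]

    def _update(self, Y, gammas):
        """M-step: re-estimate the weights, means, and covariances from gamma."""
        T, K = gammas.shape
        totals = gammas.sum(axis=0)            # sum over t of gamma_t(i), one entry per component
        self.weights = totals / T
        for i in range(K):
            self.means[i] = gammas[:, i].dot(Y) / totals[i]
            diffs = Y - self.means[i]
            self.covars[i] = (gammas[:, i] * diffs.T).dot(diffs) / totals[i]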

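Continuing the sketch, one way to handle initialization and training (Problem 5) is shown below. Again, the particular random initialization and the log-likelihood convergence test are choices made here, and dgmm is assumed to be a method wrapping the density function from the first sketch.

import numpy as np

class GMM(object):
    # ... methods from the previous sketches omitted ...

    def _initialize(self, Y):
        """Randomly initialize lambda = (weights, means, covars) from the data Y."""
        T, d = Y.shape
        K = self.n_components
        w = 1.0 + 0.1 * np.random.rand(K)                  # roughly equal weights...
        self.weights = w / w.sum()                         # ...that still sum to 1
        idx = np.random.choice(T, K, replace=False)
        self.means = Y[idx] + 0.1 * np.random.randn(K, d)  # means near actual data points
        self.covars = np.array([np.cov(Y.T) for _ in range(K)])  # data-scaled covariances

    def train(self, Y, tol=1e-4, maxiter=200):
        """Run EM until the log-likelihood changes by less than tol."""
        self._initialize(Y)
        old = -np.inf
        for _ in range(maxiter):
            gammas = self._gammas(Y)                       # E-step
            self._update(Y, gammas)                        # M-step
            loglike = sum(self.dgmm(y, log=True) for y in Y)
            if abs(loglike - old) < tol:
                break
            old = loglike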
We will use our work to train the Mickey Mouse GMM, which has parameters

w = [0.7  0.15  0.15]

μ_1 = [0.0  0.0]    μ_2 = [−1.5  2.0]    μ_3 = [1.5  2.0]

Σ_1 = I    Σ_2 = 0.25 I    Σ_3 = 0.25 I

where I denotes the 2 × 2 identity matrix. To look at this GMM, we will evaluate the density at each point on a grid, as follows:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> x = np.arange(-3, 3, 0.1)
>>> y = np.arange(-2, 3, 0.1)
>>> X, Y = np.meshgrid(x, y)
>>> N, M = X.shape
>>> mmat = np.array([[model.dgmm(np.array([X[i,j], Y[i,j]])) for j in xrange(M)] for i in xrange(N)])
>>> plt.imshow(mmat, origin='lower')
>>> plt.show()

See Figure 1.2 for this plot.

Figure 1.2: Density of the true Mickey Mouse GMM.

Problem 6. Generate 750 samples from the above mixture model. Using just the drawn samples, retrain your model. Evaluate and plot your density on the grid used above. How similar is your density to the original?

How close is our trained model to the original one? We can use the symmetric Kullback-Leibler divergence to measure the distance between two probability distributions with densities p(x) and p′(x):

SKL(p, p′) = (1/2) ∫ p(x) ln(p(x)/p′(x)) dx + (1/2) ∫ p′(x) ln(p′(x)/p(x)) dx

We cannot compute this analytically, so we use a Monte Carlo approximation, which uses the fact that

(1/N) ∑_{i=1}^{N} f(x_i) → ∫ f(x) p(x) dx

as N → ∞, assuming that each x_i ∼ p. Then we have the following approximation of the symmetric KL divergence:

SKL(p, p′) ≈ (1/(2N)) [ ∑_{i=1}^{N} ln(p(x_i)/p′(x_i)) + ∑_{i=1}^{N} ln(p′(x′_i)/p(x′_i)) ]

where x_i ∼ p and x′_i ∼ p′, for large N.
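To make the approximation concrete, here is one possible Monte Carlo implementation. It is a sketch only: sample is assumed to be the data-generation method from Problem 2, and dgmm the density method from the earlier sketches; neither name is fixed by the lab. Calling it as skl(trained_model, true_model) then gives the quantity asked for in Problem 7 below.

import numpy as np

def skl(gmm1, gmm2, N=10**5):
    """Monte Carlo approximation of the symmetric KL divergence between two GMMs."""
    X1 = gmm1.sample(N)      # draws x_i ~ p
    X2 = gmm2.sample(N)      # draws x_i' ~ p'
    term1 = np.mean([gmm1.dgmm(x, log=True) - gmm2.dgmm(x, log=True) for x in X1])
    term2 = np.mean([gmm2.dgmm(x, log=True) - gmm1.dgmm(x, log=True) for x in X2])
    return 0.5 * (term1 + term2)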

Problem 7. Write a function to compute the approximate SKL of two GMMs. Compute the SKL between a randomly initialized GMM and the known GMM. Compute the SKL between the trained GMM and the known GMM. Is our trained model a good fit?