Mixture of Gaussian clustering, Nov 11 2009


Soft vs hard clustering

K-means performs hard clustering: each data point is deterministically assigned to one and only one cluster. But in reality clusters may overlap. Soft clustering: data points are assigned to clusters with certain probabilities.

How can we extend K-means to do soft clustering?

Given a set of cluster centers μ_1, μ_2, ..., μ_k, instead of directly assigning all data points to their closest clusters, we can assign them partially (probabilistically) based on the distances, as in the sketch below. If each point only partially belongs to a particular cluster, when computing the centroid, should we still use it as if it were fully there?
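
As a minimal illustration of such partial assignment, here is a sketch (numpy assumed; the function name and the temperature parameter beta are illustrative inventions, not part of the lecture) that turns distances to the k centers into weights with a softmax. This is just one possible soft-assignment rule; the mixture model developed below uses Gaussian densities instead.

```python
import numpy as np

def soft_assign(X, centers, beta=1.0):
    """Assign each point partially to every center.

    X: (n, d) data points; centers: (k, d) cluster centers.
    Returns an (n, k) matrix of weights that sum to 1 per row.
    """
    # squared Euclidean distance from every point to every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # softmax over negative distances: closer centers get larger weight
    w = np.exp(-beta * d2)
    return w / w.sum(axis=1, keepdims=True)
```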

A Gaussian for representing a cluster

What exactly is a cluster? Intuitively, it is a tightly packed, ball-shaped collection of points. We can use a Gaussian (normal) distribution to describe it. Let's first review what a Gaussian distribution is.

Side track: the Gaussian distribution

Univariate Gaussian distribution: N(μ, σ²), where μ is the mean (center of the mass) and σ² is the variance (spread of the mass).

Multivariate Gaussian distribution: N(μ, Σ), where μ = (μ_1, μ_2) is the mean vector and Σ is the covariance matrix, for example

Σ = [ σ_1²   σ_12 ]
    [ σ_12   σ_2² ]
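
As a quick concrete reference, a small sketch (scipy assumed; the parameter values are made up) evaluating both densities at a point:

```python
from scipy.stats import norm, multivariate_normal

# univariate N(mu, sigma^2): note scipy's scale argument is the standard deviation sigma, not sigma^2
print(norm.pdf(1.0, loc=0.0, scale=2.0))           # density of N(0, 4) at x = 1

# bivariate N(mu, Sigma) with a full covariance matrix
mu = [0.0, 0.0]
Sigma = [[2.0, 0.9],                                # [ σ_1²  σ_12 ]
         [0.9, 0.5]]                                # [ σ_12  σ_2² ]
print(multivariate_normal.pdf([1.0, 1.0], mean=mu, cov=Sigma))
```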

Different covariance matrices

Spherical: Σ = [[σ², 0], [0, σ²]], the same variance in every direction.
Diagonal:  Σ = [[σ_1², 0], [0, σ_2²]], axis-aligned ellipses.
Full:      Σ = [[σ_1², σ_12], [σ_12, σ_2²]], arbitrarily oriented ellipses.

Using different forms of the covariance matrix allows for clusters of different shapes.
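
A small sketch (numpy assumed; the specific σ values are illustrative) that samples from each covariance form, showing round, axis-aligned, and tilted clusters respectively:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])

covs = {
    "spherical": np.array([[1.0, 0.0], [0.0, 1.0]]),   # σ² on both axes
    "diagonal":  np.array([[2.0, 0.0], [0.0, 0.5]]),   # different σ_1², σ_2²
    "full":      np.array([[2.0, 0.9], [0.9, 0.5]]),   # nonzero σ_12 tilts the ellipse
}

for name, cov in covs.items():
    pts = rng.multivariate_normal(mu, cov, size=500)
    print(name, "empirical covariance:\n", np.cov(pts.T).round(2))
```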

Mixture of Gaussians

Assume that we have k clusters in our data, and that each cluster contains data generated from a Gaussian distribution. The overall process of generating data:
1. Randomly select one of the clusters according to a prior distribution over the clusters.
2. Draw a random sample from the Gaussian distribution of that particular cluster.

This is similar to the generative model we learned for the Bayes classifier. What is the difference? Here we do not know the cluster membership of each data point, so the problem is unsupervised.
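
A minimal sketch of this generative process (numpy assumed; the mixture parameters are made-up illustrations, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# assumed example mixture: k = 2 clusters in 2 dimensions
priors = np.array([0.3, 0.7])                        # prior distribution over the clusters
means  = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs   = [np.eye(2), np.array([[1.0, 0.6], [0.6, 1.0]])]

def sample_mixture(n):
    """Generate n points: pick a cluster by its prior, then sample that cluster's Gaussian."""
    z = rng.choice(len(priors), size=n, p=priors)    # hidden cluster memberships
    X = np.stack([rng.multivariate_normal(means[c], covs[c]) for c in z])
    return X, z

X, z = sample_mixture(500)
print(X.shape, np.bincount(z))                       # roughly 30% / 70% of the points
```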

Clustering using mixture of Gaussian models

Given a set of data points, and assuming that we know there are k clusters in the data, we need to:
- Assign the data points to the k clusters (soft assignment).
- Learn the Gaussian distribution parameters for each cluster: μ and Σ.

A simpler problem

If we know the parameters of each Gaussian, (μ_1, Σ_1), (μ_2, Σ_2), ..., (μ_K, Σ_K), we can compute the probability of each data point x belonging to each cluster j:

P(j | x) ∝ α_j · (2π)^(−d/2) |Σ_j|^(−1/2) exp[ −(1/2)(x − μ_j)ᵀ Σ_j⁻¹ (x − μ_j) ]

where α_j is the prior probability of cluster j, d is the dimension of x, and the probabilities are normalized over the K clusters. This is the same computation as making a prediction with the Bayes classifier.
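
A sketch of this computation (scipy assumed; "responsibilities" is simply a common name for these membership probabilities, and the function signature here is an illustrative choice):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, priors, means, covs):
    """P(cluster j | x_i) for every point and cluster, given known parameters.

    X: (n, d); priors: (k,); means: k mean vectors; covs: k covariance matrices.
    Returns an (n, k) matrix whose rows sum to 1.
    """
    k = len(priors)
    # unnormalized: prior times Gaussian density of each point under each cluster
    p = np.column_stack([
        priors[j] * multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
        for j in range(k)
    ])
    return p / p.sum(axis=1, keepdims=True)
```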

Another simpler problem

If we know which points belong to cluster j, we can estimate the Gaussian parameters easily:

Cluster prior:      α̂_j = n_j / n
Cluster mean:       μ̂_j = (1/n_j) Σ_{i in cluster j} x_i
Cluster covariance: Σ̂_j = (1/n_j) Σ_{i in cluster j} (x_i − μ̂_j)(x_i − μ̂_j)ᵀ

What we have is slightly different: for each data point x_i, we only have the probabilities P(j | x_i) for j = 1, 2, ..., K.
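
A sketch of these per-cluster estimates when hard labels z are known (numpy assumed; the function name is an illustrative choice):

```python
import numpy as np

def fit_known_labels(X, z, k):
    """Estimate prior, mean, and covariance of each cluster from hard labels z in {0..k-1}."""
    n = len(X)
    priors, means, covs = [], [], []
    for j in range(k):
        Xj = X[z == j]                         # points assigned to cluster j
        priors.append(len(Xj) / n)             # α̂_j = n_j / n
        means.append(Xj.mean(axis=0))          # μ̂_j
        diff = Xj - means[-1]
        covs.append(diff.T @ diff / len(Xj))   # Σ̂_j (maximum likelihood, divide by n_j)
    return np.array(priors), means, covs
```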

Modifications

With only soft memberships P(j | x_i), each point contributes fractionally to every cluster:

Cluster prior:      α̂_j = (1/n) Σ_{i=1..n} P(j | x_i)
Cluster mean:       μ̂_j = Σ_{i=1..n} P(j | x_i) x_i / Σ_{i=1..n} P(j | x_i)
Cluster covariance: Σ̂_j = Σ_{i=1..n} P(j | x_i) (x_i − μ̂_j)(x_i − μ̂_j)ᵀ / Σ_{i=1..n} P(j | x_i)
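
A sketch of these weighted updates (numpy assumed; R denotes the (n, k) matrix of P(j | x_i) values from the previous step, and the function name is illustrative):

```python
import numpy as np

def m_step(X, R):
    """Re-estimate priors, means, and covariances from soft memberships R (n x k)."""
    n, d = X.shape
    nk = R.sum(axis=0)                          # effective number of points per cluster
    priors = nk / n                             # α̂_j
    means = (R.T @ X) / nk[:, None]             # μ̂_j, weighted averages
    covs = []
    for j in range(R.shape[1]):
        diff = X - means[j]
        covs.append((R[:, j, None] * diff).T @ diff / nk[j])   # Σ̂_j, weighted covariance
    return priors, means, covs
```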

A procedure similar to K-means

Randomly initialize the Gaussian parameters, then repeat until convergence:
1. Compute P(j | x_i) for all data points and all clusters. This is called the E-step, because it computes the expected values of the cluster memberships for each data point.
2. Re-compute the parameters of each Gaussian. This is called the M-step, because it performs maximum likelihood estimation of the parameters.
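
Putting the two steps together, a compact EM loop sketch (numpy assumed; it reuses the hypothetical responsibilities and m_step helpers sketched above, and a fixed iteration count stands in for a real convergence test):

```python
import numpy as np

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture: alternate the E-step and the M-step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # random initialization: k data points as means, unit covariances, uniform priors
    means = X[rng.choice(n, size=k, replace=False)]
    covs = [np.eye(d) for _ in range(k)]
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        R = responsibilities(X, priors, means, covs)   # E-step: expected memberships
        priors, means, covs = m_step(X, R)             # M-step: maximum likelihood update
    return priors, means, covs, R
```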

Q: Why are these two points red when they appear to be closer to blue? (This refers to a figure not reproduced here.) Presumably because assignment is not based on Euclidean distance alone: it depends on each cluster's prior and covariance, so a cluster with a larger prior, or a covariance stretched toward a point, can claim that point even though its center is farther away.

K-means is a special case

We get K-means if we make the following restrictions:
- All Gaussians have the identity covariance matrix (i.e., spherical Gaussians).
- Use hard assignment in the E-step: assign each data point entirely to its most likely cluster.
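
A sketch of the restricted E-step (numpy assumed; it reuses the hypothetical responsibilities helper above and turns its output into one-hot, K-means-style assignments):

```python
import numpy as np

def hard_e_step(X, means):
    """K-means-style E-step: identity covariances, uniform priors, one-hot assignment."""
    k = len(means)
    covs = [np.eye(X.shape[1]) for _ in range(k)]       # spherical (identity) Gaussians
    priors = np.full(k, 1.0 / k)                         # uniform priors: argmax = nearest center
    R = responsibilities(X, priors, means, covs)
    hard = np.zeros_like(R)
    hard[np.arange(len(X)), R.argmax(axis=1)] = 1.0      # each point goes fully to its best cluster
    return hard
```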

Behavior of EM

EM is guaranteed to converge. In practice it may converge slowly; one can stop early if the change in log-likelihood is smaller than a threshold. Like K-means, it converges to a local optimum, so multiple restarts are recommended.
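
A sketch of that stopping criterion (scipy assumed; the threshold value is an illustrative choice), which could replace the fixed iteration count in the loop sketched earlier:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, priors, means, covs):
    """Total log-likelihood of the data under the current mixture."""
    dens = np.column_stack([
        priors[j] * multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
        for j in range(len(priors))
    ])
    return np.log(dens.sum(axis=1)).sum()

# inside the EM loop: stop early once the improvement falls below a threshold
# prev = -np.inf
# for _ in range(max_iter):
#     ...E-step, M-step...
#     ll = log_likelihood(X, priors, means, covs)
#     if ll - prev < 1e-6:      # illustrative tolerance
#         break
#     prev = ll
```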