Mixture of Gaussian Clustering (Nov 11, 2009)
Soft vs. hard clustering. K-means performs hard clustering: each data point is deterministically assigned to one and only one cluster. But in reality clusters may overlap. Soft clustering: data points are assigned to clusters with certain probabilities.
How can we extend K-means to do soft clustering? Given a set of cluster centers μ1, μ2, ..., μk, instead of directly assigning each data point to its closest cluster, we can assign it partially (probabilistically) to each cluster based on the distances. If a point only partially belongs to a particular cluster, should we still count it as if it were fully there when computing the centroid?
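One way to sketch this distance-based soft assignment is a softmax over negative squared distances; the function name `soft_assign` and the softness parameter `beta` are illustrative choices, not part of the lecture:

```python
import numpy as np

def soft_assign(X, centers, beta=1.0):
    """Assign each point to every cluster with a probability that
    decays with squared distance (softmax over negative distances).
    Larger beta makes assignments harder, approaching K-means."""
    # Squared Euclidean distances, shape (n_points, k)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)      # each row sums to 1
```

A point sitting exactly on one center gets nearly all of its weight there; a point midway between two centers gets roughly half from each.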
Gaussian for representing a cluster. What exactly is a cluster? Intuitively, it is a tightly packed, ball-shaped thing. We can use a Gaussian (normal) distribution to describe it. Let's first review what a Gaussian distribution is.
Side track: the Gaussian distribution. Univariate Gaussian: N(μ, σ²), where μ is the mean (center of the mass) and σ² is the variance (spread of the mass). Multivariate Gaussian: N(μ, Σ), where μ = (μ1, μ2) is the mean vector and Σ is the covariance matrix:

Σ = [ σ1²  σ12 ]
    [ σ12  σ2² ]
Different covariance matrices. Covariance matrix Σ = [σ² 0; 0 σ²] gives a spherical cluster; Σ = [σ1² 0; 0 σ2²] gives an axis-aligned ellipse; Σ = [σ1² σ12; σ12 σ2²] gives a general (rotated) ellipse. Using different forms of the covariance matrix allows for clusters of different shapes.
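Evaluating the multivariate Gaussian density makes the role of Σ concrete; this is a minimal sketch (the name `gaussian_pdf` is mine), not an optimized implementation:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) evaluated at point x."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(Sigma)
    # Normalizing constant: 1 / sqrt((2*pi)^d * |Sigma|)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ inv @ diff)
```

With the identity covariance (spherical), the density at the mean is 1/(2π) in 2-D; stretching one axis via a diagonal Σ lowers the peak because the same mass is spread over a larger ellipse.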
Mixture of Gaussians. Assume that we have k clusters in our data, and each cluster contains data generated from a Gaussian distribution. Overall process of generating data: first, randomly select one of the clusters according to a prior distribution over the clusters; then, draw a random sample from the Gaussian distribution of that particular cluster. This is similar to the generative model we learned for the Bayes classifier. The difference? Here we don't know the cluster membership of each data point, so the problem is unsupervised.
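The two-stage generative process can be sketched as follows; `sample_mixture` is an illustrative name, not from the lecture:

```python
import numpy as np

def sample_mixture(n, priors, mus, Sigmas, rng=None):
    """Generate n points from a mixture of Gaussians:
    pick a cluster by its prior, then draw from that cluster's
    Gaussian. Also return the (normally hidden) cluster labels."""
    rng = np.random.default_rng(rng)
    ks = rng.choice(len(priors), size=n, p=priors)
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks
```

In the clustering setting we observe only `X`; the labels `ks` are exactly what soft clustering tries to recover (probabilistically).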
Clustering using mixture of Gaussian models. Given a set of data points, and assuming we know there are k clusters in the data, we need to: assign the data points to the k clusters (soft assignment); and learn the Gaussian distribution parameters for each cluster: μ and Σ.
A simpler problem. If we know the parameters of each Gaussian, μ1, Σ1; μ2, Σ2; ...; μK, ΣK, we can compute the probability of each data point x belonging to each cluster i:

P(i | x) ∝ α_i · 1 / ( (2π)^(d/2) |Σ_i|^(1/2) ) · exp[ −(1/2) (x − μ_i)^T Σ_i^(−1) (x − μ_i) ]

where α_i is the prior probability of cluster i. This is the same as making a prediction with the Bayes classifier.
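This posterior computation (the E-step, in EM terminology) can be sketched as follows; the name `e_step` and the array layout are my choices:

```python
import numpy as np

def e_step(X, priors, mus, Sigmas):
    """P(cluster i | x) for every point, via Bayes' rule:
    posterior proportional to prior * Gaussian likelihood."""
    n, d = X.shape
    K = len(priors)
    resp = np.empty((n, K))
    for i in range(K):
        diff = X - mus[i]
        inv = np.linalg.inv(Sigmas[i])
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[i]))
        # Mahalanobis distance of every point to cluster i, vectorized
        mahal = np.einsum('nd,de,ne->n', diff, inv, diff)
        resp[:, i] = priors[i] / norm * np.exp(-0.5 * mahal)
    return resp / resp.sum(axis=1, keepdims=True)  # normalize per point
```

A point lying on one cluster's mean, far from the others, gets a posterior near 1 for that cluster.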
Another simpler problem. If we know which points belong to cluster i, we can estimate the Gaussian parameters easily (with n_i points in cluster i out of n total):

cluster prior:      α̂_i = n_i / n
cluster mean:       μ̂_i = (1/n_i) Σ_{x in cluster i} x
cluster covariance: Σ̂_i = (1/n_i) Σ_{x in cluster i} (x − μ̂_i)(x − μ̂_i)^T

What we have is slightly different: for each data point x_j, we have P(i | x_j) for i = 1, 2, ..., K.
Modifications. Replace the hard counts with the soft membership weights P(i | x_j):

cluster prior:      α̂_i = (1/n) Σ_{j=1..n} P(i | x_j)
cluster mean:       μ̂_i = Σ_{j=1..n} P(i | x_j) x_j / Σ_{j=1..n} P(i | x_j)
cluster covariance: Σ̂_i = Σ_{j=1..n} P(i | x_j) (x_j − μ̂_i)(x_j − μ̂_i)^T / Σ_{j=1..n} P(i | x_j)
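The weighted updates above (the M-step) can be sketched as follows; `m_step` and `resp` (the n-by-K matrix of P(i | x_j) values) are illustrative names:

```python
import numpy as np

def m_step(X, resp):
    """Weighted maximum-likelihood updates from soft memberships.
    resp[j, i] holds P(cluster i | x_j); rows sum to 1."""
    n, d = X.shape
    Nk = resp.sum(axis=0)                  # effective cluster sizes
    priors = Nk / n                        # soft version of n_i / n
    mus = (resp.T @ X) / Nk[:, None]       # weighted means
    Sigmas = []
    for i in range(resp.shape[1]):
        diff = X - mus[i]
        Sigmas.append((resp[:, i, None] * diff).T @ diff / Nk[i])
    return priors, mus, np.array(Sigmas)
```

When `resp` contains only 0s and 1s (hard assignment), these formulas reduce exactly to the simple per-cluster estimates above.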
A procedure similar to K-means. Randomly initialize the Gaussian parameters, then repeat until convergence: 1. Compute P(i | x_j) for all data points and all clusters. This is called the E-step, for it computes the expected values of the cluster memberships for each data point. 2. Re-compute the parameters of each Gaussian. This is called the M-step, for it performs maximum likelihood estimation of the parameters.
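Putting the two steps together gives the full EM loop; this is a minimal sketch where the name `fit_gmm`, the defaults, and the small covariance regularizer are my choices, not part of the lecture:

```python
import numpy as np

def fit_gmm(X, K, iters=100, tol=1e-6, seed=0):
    """EM for a mixture of Gaussians: alternate the E-step (posteriors)
    and M-step (weighted ML updates) until the log-likelihood gain
    falls below tol."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mus = X[rng.choice(n, size=K, replace=False)].copy()  # random init
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    priors = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: joint densities alpha_i * N(x | mu_i, Sigma_i)
        dens = np.empty((n, K))
        for i in range(K):
            diff = X - mus[i]
            inv = np.linalg.inv(Sigmas[i])
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[i]))
            dens[:, i] = priors[i] / norm * np.exp(
                -0.5 * np.einsum('nd,de,ne->n', diff, inv, diff))
        ll = np.log(dens.sum(axis=1)).sum()     # data log-likelihood
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        Nk = resp.sum(axis=0)
        priors = Nk / n
        mus = (resp.T @ X) / Nk[:, None]
        for i in range(K):
            diff = X - mus[i]
            Sigmas[i] = ((resp[:, i, None] * diff).T @ diff / Nk[i]
                         + 1e-6 * np.eye(d))   # keep Sigma invertible
        if ll - prev_ll < tol:                 # early stopping
            break
        prev_ll = ll
    return priors, mus, Sigmas, ll
```

Running it from a few random seeds and keeping the run with the highest log-likelihood is the "multiple restart" recipe mentioned at the end of these notes.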
[Figure omitted.] Q: Why are these two points red when they appear to be closer to blue? (Because the assignment depends on each Gaussian's covariance and prior, i.e., Mahalanobis rather than Euclidean distance, not on the plain distance to the center.)
K-means is a special case. We get K-means if we make the following restrictions: all Gaussians have the identity covariance matrix (i.e., spherical Gaussians); and we use hard assignment in the E-step, assigning each data point to its most likely cluster.
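The hard-assignment restriction can be sketched as a one-line change to the E-step output; `hard_e_step` is an illustrative name:

```python
import numpy as np

def hard_e_step(resp):
    """Replace soft responsibilities with one-hot (hard) assignments:
    each point keeps only its most likely cluster. Combined with
    identity covariances, the EM loop then behaves like K-means."""
    hard = np.zeros_like(resp)
    hard[np.arange(len(resp)), resp.argmax(axis=1)] = 1.0
    return hard
```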
Behavior of EM. It is guaranteed to converge. In practice it may converge slowly; one can stop early if the change in log-likelihood is smaller than a threshold. Like K-means, it converges to a local optimum, so multiple restarts are recommended.