Unsupervised Learning: K-Means, Gaussian Mixture Models

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Robot Image Credit: Viktoriya Sukhanova
Unsupervised Learning
Unsupervised Learning
- Supervised learning used labeled data pairs (x, y) to learn a function f : X → Y
- But what if we don't have labels? Most data doesn't!
- No labels = unsupervised learning
- Only some points are labeled = semi-supervised learning
  - Labels may be expensive to obtain, so we only get a few
Unsupervised Learning: Major Categories (and their uses)
- Clustering
- Density estimation
- Dimensionality reduction (feature selection)
Kernel Density Estimation Some material adapted from slides by Le Song, GT.
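Kernel density estimation places a kernel (e.g., a Gaussian of bandwidth h) on every data point and averages them to obtain a smooth, nonparametric estimate of the density. A minimal 1-D NumPy sketch; the bandwidth and the synthetic bimodal sample are illustrative assumptions, not values from the slides:

```python
import numpy as np

def gaussian_kde_1d(x_query, data, h):
    """Estimate p(x) at each query point by averaging Gaussian kernels
    of bandwidth h centered on the observed data points."""
    data = np.asarray(data, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Pairwise standardized differences between query points and data points
    diffs = (x_query[:, None] - data[None, :]) / h
    kernels = np.exp(-0.5 * diffs ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h

# Illustrative usage: density estimate for a small bimodal sample
data = np.concatenate([np.random.normal(-2, 0.5, 100),
                       np.random.normal(1, 0.8, 100)])
xs = np.linspace(-4, 3, 50)
density = gaussian_kde_1d(xs, data, h=0.3)
```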
K-Means Clustering
Some material adapted from slides by Andrew Moore, CMU. Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.
Clustering Data
K-Means Clustering
K-Means(k, X):
- Randomly choose k cluster center locations (centroids)
- Loop until convergence:
  - Assign each point to the cluster of the closest centroid
  - Re-estimate the cluster centroids based on the data assigned to each cluster
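A minimal NumPy sketch of the loop above; the convergence test (centroids stop moving) and the random-point initialization are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, rng=None):
    """Plain K-means: random initial centroids, then alternate
    assignment and centroid re-estimation until convergence."""
    rng = np.random.default_rng(rng)
    # Randomly choose k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels
```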
K-Means Animation
Example generated by Andrew Moore using Dan Pelleg's superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.
K-Means Objective Function
K-means finds a local optimum of the following objective function:
$J(C, \mu) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2$
where $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its centroid.
Problems with K-Means
- Very sensitive to the initial points
  - Do many runs of K-Means, each with different initial centroids
  - Seed the centroids using a better method than choosing them at random, e.g., farthest-first sampling (a sketch follows below)
- Must manually choose k
  - Learn the optimal k for the clustering (note that this requires a performance measure)
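A small sketch of the farthest-first seeding mentioned above: pick one centroid at random, then repeatedly add the point farthest from its nearest already-chosen centroid. The random first pick is an illustrative choice:

```python
import numpy as np

def farthest_first_init(X, k, rng=None):
    """Farthest-first seeding: start from a random point, then greedily
    add the point with the largest distance to its closest chosen centroid."""
    rng = np.random.default_rng(rng)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Distance from every point to its closest chosen centroid
        chosen = np.array(centroids)
        dists = np.min(
            np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2), axis=1)
        centroids.append(X[dists.argmax()])
    return np.array(centroids)
```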
Choosing the key parameter k
1. The Elbow Method
   - $SSE = \sum_{i=1}^{K} \sum_{x \in c_i} \mathrm{dist}(x, c_i)^2$
2. X-means clustering
   - repeatedly attempting subdivision
3. Others:
   - an Information Criterion Approach
   - an Information Theoretic Approach
   - the Silhouette Method
   - cross-validation
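A small sketch of the SSE computation behind the elbow method: run the clustering for a range of k and look for the value where the SSE curve bends (stops dropping sharply). The `kmeans` call in the comment refers to the earlier K-means sketch, and the range of k values is an illustrative choice:

```python
import numpy as np

def sse(X, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))

# Elbow method (illustrative usage, assuming the kmeans sketch above):
# sse_by_k = {k: sse(X, *kmeans(X, k)) for k in range(1, 11)}
```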
Problems with K-Means
- How do you tell it which clustering you want? (e.g., k = 2)
- Constrained clustering techniques (semi-supervised); a constraint-checking sketch follows below:
  - Same-cluster constraint (must-link)
  - Different-cluster constraint (cannot-link)
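For intuition, a minimal sketch of the constraint check used by constrained K-means variants (in the spirit of COP-KMeans, not an algorithm from these slides): before assigning a point to a cluster, verify that no must-link or cannot-link constraint would be violated. The data layout (pairs of point indices, -1 meaning "not yet assigned") is an assumption for illustration:

```python
def violates_constraints(point_idx, cluster, labels, must_link, cannot_link):
    """Return True if assigning point_idx to cluster would break a
    must-link or cannot-link constraint, given the current assignments."""
    for i, j in must_link:
        other = j if i == point_idx else (i if j == point_idx else None)
        # Must-link: the partner, if already assigned, must be in the same cluster
        if other is not None and labels[other] != -1 and labels[other] != cluster:
            return True
    for i, j in cannot_link:
        other = j if i == point_idx else (i if j == point_idx else None)
        # Cannot-link: the partner must not be in this cluster
        if other is not None and labels[other] == cluster:
            return True
    return False
```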
Gaussian Mixture Models and Expectation Maximization
Gaussian Mixture Models
Recall the Gaussian distribution:
$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)$
The GMM assumption
- There are k components; the i-th component is called ω_i
- Component ω_i has an associated mean vector μ_i
- Each component generates data from a Gaussian with mean μ_i and covariance matrix σ²I
- Each datapoint is assumed to be generated by the following recipe:
  1. Pick a component at random: choose component i with probability P(ω_i)
  2. Draw the datapoint from N(μ_i, σ²I)
The General GMM assumption
- There are k components; the i-th component is called ω_i
- Component ω_i has an associated mean vector μ_i
- Each component generates data from a Gaussian with mean μ_i and covariance matrix Σ_i
- Each datapoint is assumed to be generated by the following recipe:
  1. Pick a component at random: choose component i with probability P(ω_i)
  2. Draw the datapoint from N(μ_i, Σ_i)
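The recipe above can be written directly as a sampler: choose a component according to its prior P(ω_i), then draw from that component's Gaussian. A minimal NumPy sketch with illustrative mixture parameters (the weights, means, and covariances are assumptions for the example, not from the slides):

```python
import numpy as np

def sample_gmm(n, weights, means, covs, rng=None):
    """Draw n points from a Gaussian mixture: choose a component by its
    mixing weight, then sample from that component's Gaussian."""
    rng = np.random.default_rng(rng)
    components = rng.choice(len(weights), size=n, p=weights)
    points = np.array([rng.multivariate_normal(means[c], covs[c])
                       for c in components])
    return points, components

# Illustrative 2-D mixture with three components
weights = [0.5, 0.3, 0.2]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.3])]
X, z = sample_gmm(500, weights, means, covs, rng=0)
```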
Expectation-Maximization for GMMs
Iterate until convergence. On the t-th iteration, let our estimates be λ_t = { μ_1(t), μ_2(t), …, μ_c(t) }.
E-step: Compute the expected class of each datapoint, for each class (just evaluate a Gaussian at x_k):
$P(\omega_i \mid x_k, \lambda_t) = \frac{p(x_k \mid \omega_i, \lambda_t)\, P(\omega_i)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \lambda_t)\, P(\omega_j)}$
M-step: Estimate μ given our data's class membership distributions:
$\mu_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(\omega_i \mid x_k, \lambda_t)}$
E.M. for General GMMs
Iterate. On the t-th iteration, let our estimates be λ_t = { μ_1(t), …, μ_c(t), Σ_1(t), …, Σ_c(t), p_1(t), …, p_c(t) }, where p_i(t) is shorthand for the estimate of P(ω_i) on the t-th iteration.
E-step: Compute the expected cluster of each datapoint (just evaluate a Gaussian at x_k):
$P(\omega_i \mid x_k, \lambda_t) = \frac{p_i(t)\, N(x_k \mid \mu_i(t), \Sigma_i(t))}{\sum_{j=1}^{c} p_j(t)\, N(x_k \mid \mu_j(t), \Sigma_j(t))}$
M-step: Estimate μ, Σ, and p given our data's class membership distributions (R = # records):
$\mu_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(\omega_i \mid x_k, \lambda_t)}$
$\Sigma_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, (x_k - \mu_i(t+1))(x_k - \mu_i(t+1))^{\top}}{\sum_k P(\omega_i \mid x_k, \lambda_t)}$
$p_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)}{R}$
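A compact NumPy sketch of the E- and M-steps above for the general (full-covariance) case; the initialization scheme and the small ridge added to each covariance for numerical stability are illustrative choices, not part of the slides:

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(Sigma))
    return np.exp(-0.5 * np.einsum('nd,de,ne->n', diff, inv, diff)) / norm

def em_gmm(X, k, n_iters=100, rng=None):
    """EM for a GMM with full covariances: alternate responsibilities (E-step)
    and parameter re-estimation (M-step)."""
    rng = np.random.default_rng(rng)
    R, d = X.shape
    # Initialize: random data points as means, identity covariances, uniform priors
    mu = X[rng.choice(R, size=k, replace=False)].copy()
    Sigma = np.array([np.eye(d) for _ in range(k)])
    p = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: responsibilities P(w_i | x_k), normalized over components
        resp = np.stack([p[i] * gaussian_pdf(X, mu[i], Sigma[i])
                         for i in range(k)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, covariances, and mixing weights
        Nk = resp.sum(axis=0)                      # effective counts per component
        mu = (resp.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - mu[i]
            Sigma[i] = (resp[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
        p = Nk / R
    return mu, Sigma, p, resp
```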
Gaussian Mixture Example: Start, then after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th EM iterations. [Figures not reproduced; as the original slides apologize, in black and white this example is incomprehensible.]
Some Bio Assay data [figure not reproduced]
GMM clustering of the assay data [figure not reproduced]
Resulting Density Estimator [figure not reproduced]
Other Methods and Reading
- Hidden Markov Models: Andrew Ng's Notes
- Mean Shift: Mean Shift Tutorial
- Deep Methods: Tutorial on Variational Autoencoders