Unsupervised Learning: K-Means, Gaussian Mixture Models

Size: px

Start display at page:

Download "Unsupervised Learning: K-Means, Gaussian Mixture Models"

Aileen Andrews
5 years ago
Views:

Unsupervised Learning: K-Means, Gaussian Mixture Models These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available

1 Unsupervised Learning: K-Means, Gaussian Mixture Models These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric. Robot Image Credit: Viktoriya Sukhanova

2 Unsupervised Learning

3 Unsupervised Learning

4 Unsupervised Learning

5 Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a function f : X Y But, what if we don t have labels? Most data doesn t! No labels = unsupervised learning Only some points are labeled = semi-supervised learning Labels may be expensive to obtain, so we only get a few

6 Unsupervised Learning Major Categories (and their uses) Clustering Density-estimation Dimensionality reduction (feature selection)

7 Kernel Density Estimation Some material adapted from slides by Le Song, GT.

8 Kernel Density Estimation Clustering with Gaussian Mixtures: Slide 8

9 Kernel Density Estimation Clustering with Gaussian Mixtures: Slide 9

10 Kernel Density Estimation Clustering with Gaussian Mixtures: Slide 10

11 Kernel Density Estimation Clustering with Gaussian Mixtures: Slide 11

12 Kernel Density Estimation Clustering with Gaussian Mixtures: Slide 12

13 Kernel Density Estimation Clustering with Gaussian Mixtures: Slide 13

14 K-Means Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for Andrew s repository of Data Mining tutorials.

15 Clustering Data

16 K-Means Clustering K-Means ( k, X ) Randomly choose k cluster center locations (centroids) Loop until convergence Assign each point to the cluster of the closest centroid Re-estimate the cluster centroids based on the data assigned to each cluster

17 K-Means Clustering K-Means ( k, X ) Randomly choose k cluster center locations (centroids) Loop until convergence Assign each point to the cluster of the closest centroid Re-estimate the cluster centroids based on the data assigned to each cluster

18 K-Means Clustering K-Means ( k, X ) Randomly choose k cluster center locations (centroids) Loop until convergence Assign each point to the cluster of the closest centroid Re-estimate the cluster centroids based on the data assigned to each cluster

19 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

20 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

21 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

22 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

23 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

24 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

25 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

26 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

27 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

28 K-Means Animation Example generated by Andrew Moore using Dan Pelleg s superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

29 K-Means Objective Function K-means finds a local optimum of the following objective function:

30 Problems with K-Means Very sensitive to the initial points Do many runs of K-Means, each with different initial centroids Seed the centroids using a better method than randomly choosing the centroids e.g., Farthest-first sampling Must manually choose k Learn the optimal k for the clustering Note that this requires a performance measure

31 Choosing the key parameter 1.The Elbow Method a. SSE= Ki=1 x cidist(x,ci)2 2.x-means clustering a. repeatedly attempting subdivision 3.Others: a.information Criterion Approach b.an Information Theoretic Approach c.the Silhouette Method d.cross-validation

32 Choosing the key parameter

33 Problems with K-Means How do you tell it which clustering you want? k = 2 Constrained clustering techniques (semi-supervised) Same-cluster constraint (must-link) Different-cluster constraint (cannot-link)

34 Gaussian Mixture Models and Expectation Maximization

35 Gaussian Mixture Models Recall the Gaussian distribution:

36 The GMM assumption There are k components. The i th component is called ω i Component ω i has an associated mean vector µ i µ1 µ 2 µ 3 Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 36

37 The GMM assumption There are k components. The i th component is called ω i Component ω i has an associated mean vector µ i Each component generates data from a Gaussian with mean µ i and covariance matrix σ 2 I Assume that each datapoint is generated according to the following recipe: µ 1 µ 2 µ 3 Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 37

38 The GMM assumption There are k components. The i th component is called ω i Component ω i has an associated mean vector µ i Each component generates data from a Gaussian with mean µ i and covariance matrix σ 2 I Assume that each datapoint is generated according to the following recipe: Pick a component at random. Choose component i with probability P(ω i ). µ 2 Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 38

39 The GMM assumption There are k components. The i th component is called ω i Component ω i has an associated mean vector µ i Each component generates data from a Gaussian with mean µ i and covariance matrix σ 2 I Assume that each datapoint is generated according to the following recipe: Pick a component at random. Choose component i with probability P(ω i ). Datapoint ~ N(µ i, σ 2 I ) µ 2 x Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 39

40 The General GMM assumption There are k components. The i th component is called ω i Component ω i has an associated mean vector µ i Each component generates data from a Gaussian with mean µ i and covariance matrix Σ i Assume that each datapoint is generated according to the following recipe: Pick a component at random. Choose component i with probability P(ω i ). Datapoint ~ N(µ i, Σ i ) µ 1 µ 2 µ 3 Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 40

Expectation-Maximization for GMMs Iterate until convergence: On the t th iteration let our estimates be λ t = { μ 1 (t), μ 2 (t) μ c (t) } Just evaluate a Gaussian at x k E-step: Compute

41 Expectation-Maximization for GMMs Iterate until convergence: On the t th iteration let our estimates be λ t = { μ 1 (t), μ 2 (t) μ c (t) } Just evaluate a Gaussian at x k E-step: Compute expected classes of all datapoints for each class M-step: Estimate μ given our data s class membership distributions Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 41

all datapoints Just evaluate a Gaussian at x k M-step: Estimate μ, Σ given our

42 E.M. for General GMMs Iterate. On the t th iteration let our estimates be p i (t) is shorthand for estimate of P(ω i ) on t th iteration λ t = { μ 1 (t), μ 2 (t) μ c (t), Σ 1 (t), Σ 2 (t) Σ c (t), p 1 (t), p 2 (t) p c (t) } E-step: Compute expected clusters of all datapoints Just evaluate a Gaussian at x k M-step: Estimate μ, Σ given our data s class membership distributions R = #records Copyright 2001, 2004, Andrew W. Moore Clustering with Gaussian Mixtures: Slide 42

54 Other Methods and Reading Hidden Markov Models: Andrew Ng s Notes Mean Shift Mean Shift Tutorial Deep Methods Tutorial on Variational Autoencoders Clustering with Gaussian Mixtures: Slide 54

Unsupervised Learning: K- Means & PCA

Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning