Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in

Supervised vs Unsupervised Learning. Supervised learning: given $\{(x_i, y_i)\}_{i=1}^{N}$, learn a function $f: X \to Y$; a categorical output gives classification, a continuous output gives regression. Unsupervised learning: given only $\{x_i\}_{i=1}^{N}$, can we infer the structure of the data? Learning without a teacher.

Why Unsupervised Learning? Unlabeled data is cheap, while labeled data is expensive and cumbersome to collect. Uses include exploratory data analysis, a preprocessing step for supervised learning algorithms, and analysis of data in high-dimensional spaces.

Cluster Analysis. Discover groups such that samples within a group are more similar to each other than samples across groups.

Applications of Clustering: unsupervised image segmentation, image compression, social network clustering, and recommendation systems.

Components of Clustering: a dissimilarity (similarity) function that measures the distance/dissimilarity between examples, a loss function that evaluates the clusters, and an algorithm that optimizes this loss function.

Proximity Matrices. Data is directly represented in terms of proximities between pairs of objects. Subjectively judged dissimilarities are seldom distances in the strict sense (they do not necessarily satisfy the properties of a distance measure); in that case, replace the proximity matrix $D$ by the symmetrized matrix $(D + D^{T})/2$.

Dissimilarity Based on Attributes (1). Data point $x_i$ has $D$ features, and the attributes are real-valued. Use the (squared) Euclidean distance between the data points:
$$D(x_i, x_j) = \sum_{l=1}^{D} (x_{il} - x_{jl})^2.$$
The resulting clusters are invariant to rotation and translation, but not to scaling; if features have different scales, standardize the data.

Dissimilarity Based on Attributes (2). Data point $x_i$ has $D$ features, and the attributes are real-valued. Any $L_p$ norm can be used:
$$D(x_i, x_j) = \sum_{l=1}^{D} |x_{il} - x_{jl}|^{p}.$$
Cosine similarity between the data points (cosine distance is one minus this):
$$\frac{\sum_{l=1}^{D} x_{il} x_{jl}}{\sqrt{\sum_{l=1}^{D} x_{il}^{2}}\,\sqrt{\sum_{l=1}^{D} x_{jl}^{2}}}.$$
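
As a concrete illustration (my own sketch, not from the slides; the function names are mine), a minimal NumPy version of these dissimilarity measures:

```python
import numpy as np

def lp_dissimilarity(x_i, x_j, p=2):
    """Sum of |x_il - x_jl|^p over the D features (p=2 gives squared Euclidean)."""
    return np.sum(np.abs(x_i - x_j) ** p)

def cosine_similarity(x_i, x_j):
    """Cosine of the angle between x_i and x_j; cosine distance is 1 minus this value."""
    return x_i @ x_j / (np.linalg.norm(x_i) * np.linalg.norm(x_j))

x_i = np.array([1.0, 2.0, 3.0])
x_j = np.array([2.0, 2.0, 1.0])
print(lp_dissimilarity(x_i, x_j))         # 5.0 (squared Euclidean)
print(1.0 - cosine_similarity(x_i, x_j))  # cosine distance
```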

Dissimilarity Based on Attributes (3). Data point $x_i$ has $D$ features, and the attributes are ordinal, e.g. grades A, B, C, D, or answers to a survey question (strongly agree, agree, neutral, disagree). Replace the ordinal values by the quantitative representations $\frac{m - 1/2}{M}$, for $m = 1, \dots, M$.

Dissimilarity Based on Attributes (4). Data point $x_i$ has $D$ features, and the attributes are categorical: the values of an attribute are unordered. Define an explicit difference $d_{mm'}$ between every pair of values. Often $d_{mm'} = 0$ for identical values ($m = m'$) and $d_{mm'} = 1$ for different values ($m \neq m'$).

Loss Function for Clustering (1). Assign each observation to a cluster without regard to a probability model describing the data. Let $K$ be the number of clusters and $k$ index the clusters. Each observation is assigned to one and only one cluster; view the assignment as a function $C(i) = k$. The loss function
$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'})$$
characterizes the extent to which observations assigned to the same cluster tend to be close to one another; it is the within-cluster distance/scatter.

Loss Function for Clustering (2). Consider the total point scatter
$$T = \frac{1}{2} \sum_{i=1}^{N} \sum_{i'=1}^{N} d_{ii'}.$$
This can be decomposed as
$$T = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \Big( \sum_{C(i')=k} d_{ii'} + \sum_{C(i') \neq k} d_{ii'} \Big) = W(C) + B(C).$$

Loss Function for Clustering (3). The function
$$B(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i') \neq k} d_{ii'}$$
is the between-cluster distance/scatter. Since the total scatter $T$ does not depend on the assignment, minimizing $W(C)$ is equivalent to maximizing $B(C)$.
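
To make the decomposition concrete, here is a small NumPy check (my own illustration, not from the slides) that $T = W(C) + B(C)$ for a random assignment and squared Euclidean dissimilarity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                  # N = 20 points in 2-D
C = rng.integers(0, 3, size=20)               # random assignment to K = 3 clusters

# pairwise squared Euclidean dissimilarities d_ii'
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

same = (C[:, None] == C[None, :])             # indicator: same cluster
T = 0.5 * D.sum()                             # total point scatter
W = 0.5 * D[same].sum()                       # within-cluster scatter
B = 0.5 * D[~same].sum()                      # between-cluster scatter
assert np.isclose(T, W + B)
```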

Combinatorial Clustering. Minimize $W$ over all possible assignments of $N$ data points to $K$ clusters. Unfortunately this is feasible only for very small data sets: the number of distinct assignments is
$$S(N, K) = \frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^{N}.$$
For example, $S(10, 4) = 34{,}105$ and $S(19, 4) \approx 10^{10}$, so this is not a practical clustering algorithm.
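
A quick numerical check of this count (my own sketch; the function name is mine):

```python
from math import comb, factorial

def num_assignments(N, K):
    """Stirling-number-of-the-second-kind count of distinct K-cluster assignments."""
    return sum((-1) ** (K - k) * comb(K, k) * k ** N for k in range(1, K + 1)) // factorial(K)

print(num_assignments(10, 4))   # 34105
print(num_assignments(19, 4))   # about 1.1e10, i.e. S(19, 4) is roughly 10^10
```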

K-Means Clustering (1). The most popular iterative descent clustering method. Suppose all variables/features are real-valued and we use squared Euclidean distance as the dissimilarity measure, $d(x_i, x_{i'}) = \lVert x_i - x_{i'} \rVert^{2}$. The within-cluster scatter can then be written as
$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \lVert x_i - x_{i'} \rVert^{2} = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \lVert x_i - \bar{x}_k \rVert^{2},$$
where $N_k$ is the number of points in cluster $k$ and $\bar{x}_k$ is the cluster mean.

K-Means Clustering (2). Find
$$C^{*} = \arg\min_{C} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \lVert x_i - \bar{x}_k \rVert^{2}.$$
Note that for any set $S$ of points, the mean minimizes the total squared distance: $\bar{x}_S = \arg\min_{m} \sum_{i \in S} \lVert x_i - m \rVert^{2}$. So instead solve the enlarged problem
$$\min_{C,\,\{m_k\}_{k=1}^{K}} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \lVert x_i - m_k \rVert^{2}.$$

K-Means Clustering (3). Find the optimal solution using Expectation Maximization, an iterative procedure consisting of two steps. Expectation step (E step): fix the mean vectors $\{m_k\}_{k=1}^{K}$ and find the optimal assignment $C$. Maximization step (M step): fix the cluster assignments $C$ and find the optimal mean vectors $\{m_k\}_{k=1}^{K}$. Each step of this procedure reduces the loss function value.
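
A minimal NumPy sketch of this alternating procedure (my own illustration; the random initialization and the stopping rule are simple choices, not prescribed by the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)]   # init: K random data points
    for _ in range(n_iters):
        # E step: assign each point to the nearest mean
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        C = dists.argmin(axis=1)
        # M step: recompute each mean from its assigned points
        new_means = np.array([X[C == k].mean(axis=0) if np.any(C == k) else means[k]
                              for k in range(K)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return C, means

# usage: C, means = kmeans(np.random.default_rng(1).normal(size=(100, 2)), K=3)
```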

K-Means Clustering Illustration (1)-(10): [figures stepping through successive iterations on a 2-D dataset; blue points mark the Expectation step, red points the Maximization step.]

How to Choose K? Similar to choosing $k$ in $k$-NN. The loss function generally decreases with $K$, so it cannot simply be minimized over $K$; a common heuristic is to look for a kink (elbow) in the loss-versus-$K$ curve.
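
A small sketch of that heuristic, assuming the `kmeans` function from the earlier sketch (the elbow still has to be judged by eye; nothing here is prescribed by the slides):

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(300, 2))
for K in range(1, 8):
    C, means = kmeans(X, K)
    W = sum(((X[C == k] - means[k]) ** 2).sum() for k in range(K))
    print(K, round(W, 1))   # W(C) decreases with K; look for the kink in this curve
```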

Limitations of K-Means Clustering. Hard assignments are susceptible to noise/outliers. It assumes spherical (convex) clusters with a uniform prior on the clusters. Clusters can change arbitrarily for different values of $K$ and different initializations.

K-Medoids. K-means is suitable only when using (squared) Euclidean distance, is susceptible to outliers, and the centroid of a cluster need not be a valid data point. K-medoids generalizes K-means to arbitrary distance measures: replace the mean computation by a medoid computation, i.e. choose the cluster member with the smallest total dissimilarity to the other members. This ensures the cluster center is a medoid, always a valid data point, but increases computation since we now have to search for the medoid.
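
A minimal sketch of the medoid update for one cluster, given an arbitrary dissimilarity matrix (my own illustration; `medoid` is a hypothetical helper):

```python
import numpy as np

def medoid(D_cluster):
    """Index (within the cluster) of the member with the smallest total dissimilarity
    to the other members; D_cluster is the cluster's dissimilarity submatrix."""
    return int(D_cluster.sum(axis=1).argmin())

# usage: for cluster member indices idx and full dissimilarity matrix D,
# the new center is X[idx[medoid(D[np.ix_(idx, idx)])]]
```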

Soft K-Means as Gaussian Mixture Models (1). Probabilistic clusters: each cluster is associated with a Gaussian distribution $\mathcal{N}(\mu_k, \Sigma_k)$, and each cluster also has a prior probability $\pi_k$, with $\sum_{k=1}^{K} \pi_k = 1$. The likelihood of a data point drawn from the $K$ clusters is then
$$P(x) = \sum_{k=1}^{K} \pi_k\, P(x \mid \mu_k, \Sigma_k).$$

Soft K-Means as Gaussian Mixture Models (2). Given $N$ i.i.d. data points, the likelihood function is $P(x_1, \dots, x_N) = \prod_{i=1}^{N} P(x_i)$.

Soft K-Means as Gaussian Mixture Models (3). Given $N$ i.i.d. data points, the likelihood function is
$$P(x_1, \dots, x_N) = \prod_{i=1}^{N} P(x_i) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k\, P(x_i \mid \mu_k, \Sigma_k).$$
Let us take the logarithm of this likelihood.

Soft K-Means as Gaussian Mixture Models (4). Given $N$ i.i.d. data points, the log-likelihood is
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, P(x_i \mid \mu_k, \Sigma_k).$$

Soft K-Means as Gaussian Mixture Models (5). Problem with maximum likelihood: the sum over the components appears inside the log, coupling all the parameters, so there is no closed-form solution:
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, P(x_i \mid \mu_k, \Sigma_k).$$
Maximization can also lead to singularities (e.g. a component collapsing onto a single data point).

Soft K-Means as Gaussian Mixture Models (6). Latent variables: each data point $x_i$ is associated with a latent variable $z_i = (z_{i1}, \dots, z_{iK})$, where $z_{ik} \in \{0, 1\}$, $\sum_{k=1}^{K} z_{ik} = 1$, and $P(z_{ik} = 1) = \pi_k$. Given the complete data $(X, Z)$, we look at maximizing $P(X, Z \mid \pi_k, \mu_k, \Sigma_k)$.

Soft K-Means as Gaussian Mixture Models (7). Latent variables: each data point $x_i$ is associated with a latent variable $z_i = (z_{i1}, \dots, z_{iK})$, where $z_{ik} \in \{0, 1\}$, $\sum_{k=1}^{K} z_{ik} = 1$, and $P(z_{ik} = 1) = \pi_k$. Let the probability $P(z_{ik} = 1 \mid x_i)$ be denoted $\gamma(z_{ik})$. From Bayes' theorem,
$$\gamma(z_{ik}) = P(z_{ik} = 1 \mid x_i) = \frac{P(z_{ik} = 1)\, P(x_i \mid z_{ik} = 1)}{P(x_i)},$$
where the marginal distribution is
$$P(x_i) = \sum_{z_i} P(x_i, z_i) = \sum_{k=1}^{K} P(z_{ik} = 1)\, P(x_i \mid z_{ik} = 1).$$

Soft K-Means as Gaussian Mixture Models (8). Now, $P(z_{ik} = 1) = \pi_k$ and $P(x_i \mid z_{ik} = 1) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$. Therefore
$$\gamma(z_{ik}) = P(z_{ik} = 1 \mid x_i) = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}.$$

Estimating the mean $\mu_k$ (1). Begin with the log-likelihood function
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, P(x_i \mid \mu_k, \Sigma_k),$$
take the derivative with respect to $\mu_k$, and equate it to 0.

Estimating the mean $\mu_k$ (2). This gives
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, x_i, \qquad N_k = \sum_{i=1}^{N} \gamma(z_{ik}),$$
where $N_k$ is the effective number of points assigned to cluster $k$. So the mean of the $k$-th Gaussian component is a weighted mean of all the points in the dataset, where the weight of the $i$-th data point is the posterior probability that component $k$ was responsible for generating $x_i$.

Estimating the Covariance $\Sigma_k$. Begin with the log-likelihood function
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, P(x_i \mid \mu_k, \Sigma_k).$$
Taking the derivative with respect to $\Sigma_k$ and equating it to 0 gives
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, (x_i - \mu_k)(x_i - \mu_k)^{T}.$$
This is similar to the result for a single Gaussian fitted to the dataset, but each data point is weighted by the corresponding posterior probability.

Estimating the mixing coefficients $\pi_k$. Begin with the log-likelihood function and maximize it with respect to $\pi_k$, subject to the constraint $\sum_{k=1}^{K} \pi_k = 1$. Use a Lagrange multiplier $\lambda$ and maximize
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, P(x_i \mid \mu_k, \Sigma_k) + \lambda \Big( \sum_{k=1}^{K} \pi_k - 1 \Big).$$
Solving this results in $\pi_k = \frac{N_k}{N}$.
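
The solving step, sketched here for completeness (this is the standard Lagrange-multiplier derivation rather than text from the slides): setting the derivative with respect to $\pi_k$ to zero gives
$$\sum_{i=1}^{N} \frac{P(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, P(x_i \mid \mu_j, \Sigma_j)} + \lambda = 0.$$
Multiplying both sides by $\pi_k$ and summing over $k$ yields $\lambda = -N$ (since the responsibilities sum to one over $k$ for each point), and substituting back gives $\pi_k = N_k / N$.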

Soft K-Means as Gaussian Mixture Models (8). In summary:
$$\pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, (x_i - \mu_k)(x_i - \mu_k)^{T}.$$
But these updates require the posteriors $\gamma(z_{ik})$, and the latent $z_{ik}$ are unknown. Use the EM algorithm!

EM for GMM. First choose initial values for $\pi_k, \mu_k, \Sigma_k$, then alternate between the Expectation and Maximization steps. Expectation step (E): given the current parameters, compute the posterior probabilities $\gamma(z_{ik})$. Maximization step (M): given the posterior probabilities, update $\pi_k, \mu_k, \Sigma_k$.
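
A minimal NumPy sketch of these updates (my own illustration; it uses SciPy's multivariate normal density, a random-point initialization, a small ridge on the covariances, and a fixed number of iterations rather than a convergence check):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iters=50, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E step: responsibilities gamma(z_ik)
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: update pi, mu, Sigma from the responsibilities
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, gamma
```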

EM for GMM Illustration (1)-(6): [figures showing the fitted mixture over successive EM iterations.]

Practical Issues with EM for GMM. It takes many more iterations than K-means, and each iteration requires more computation. A common strategy is to run K-means first and then EM for the GMM; the covariances can be initialized to the covariances of the clusters obtained from K-means. EM is not guaranteed to find the global maximum of the log-likelihood function. Check for convergence: stop when the log-likelihood does not change significantly between two iterations.
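
A short sketch of initializing the GMM parameters from a K-means run, assuming the `kmeans` function from the earlier sketch (hypothetical helper names, and it assumes every cluster has at least two points):

```python
import numpy as np

def init_gmm_from_kmeans(X, K):
    C, means = kmeans(X, K)                                    # hard assignments and centroids
    pi = np.array([(C == k).mean() for k in range(K)])         # cluster fractions
    Sigma = np.stack([np.cov(X[C == k].T) for k in range(K)])  # per-cluster covariances
    return pi, means, Sigma
```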

Hierarchical Clustering (1). Organize clusters in a hierarchical fashion; this produces a rooted binary tree (dendrogram).

Hierarchical Clustering (2). Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity. Top-down (divisive): recursively split the least coherent cluster. Users can choose a cut through the hierarchy to represent the most natural division into clusters.

Hierarchical Clustering (3). Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity. Top-down (divisive): recursively split the least coherent cluster. Both share a monotonicity property: the dissimilarity between merged clusters increases monotonically with the level of the merger. Cophenetic correlation coefficient: the correlation between the $N(N-1)/2$ pairwise observation dissimilarities and the cophenetic dissimilarities derived from the dendrogram, where the cophenetic dissimilarity of two observations is the inter-group dissimilarity at which they are first joined together in the same cluster.

Agglomerative Clustering (1). Single linkage: the distance between the two most similar points in $G$ and $H$,
$$D_{SL}(G, H) = \min_{i \in G,\, j \in H} D(i, j).$$
Also referred to as nearest-neighbor linkage. It can produce extended clusters through chaining and may violate the compactness property (clusters with large diameter).

Agglomerative Clustering (2). Complete linkage: the distance between the two most dissimilar points in $G$ and $H$,
$$D_{CL}(G, H) = \max_{i \in G,\, j \in H} D(i, j).$$
Also called the furthest-neighbor technique. It forces spherical clusters with consistent diameter and may violate the closeness property.

Agglomerative Clustering (3). Average linkage (group average): the average dissimilarity between the groups,
$$D_{GA}(G, H) = \frac{1}{N_G N_H} \sum_{i \in G} \sum_{j \in H} d(i, j).$$
Less affected by outliers.

Agglomerative Clustering (4): [figure comparing average, complete, and single linkage dendrograms.]
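
All three linkages are available in SciPy; a minimal usage sketch on my own example data (not from the slides) is:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist

X = np.random.default_rng(3).normal(size=(30, 2))
d = pdist(X)                                          # pairwise dissimilarities
for method in ("single", "complete", "average"):
    Z = linkage(d, method=method)                     # dendrogram as an (N-1) x 4 merge table
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the hierarchy into 3 clusters
    c, _ = cophenet(Z, d)                             # cophenetic correlation coefficient
    print(method, round(float(c), 3))
```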

Density-Based Clustering (1) (Extra Topic). DBSCAN: Density-Based Spatial Clustering of Applications with Noise, proposed by Ester, Kriegel, Sander and Xu (KDD '96); KDD 2014 Test of Time Award winner. Basic idea: clusters are dense regions in the data space, separated by regions of lower object density. DBSCAN discovers clusters of arbitrary shape in spatial databases with noise.

Density-Based Clustering (2). Why density-based clustering? [figure: results of a k-medoids algorithm for k = 4.]

Density-Based Clustering (3). Principle: for any point in a cluster, the local point density around that point has to exceed some threshold, and the set of points from one cluster is spatially connected. DBSCAN defines two parameters: $\varepsilon$, the radius for the neighborhood of a point $p$, $N_\varepsilon(p) = \{q \in X : d(p, q) \le \varepsilon\}$; and MinPts, the minimum number of points required in the neighborhood $N_\varepsilon(p)$.

ε-Neighborhood. The $\varepsilon$-neighborhood of an object $p$ is the set of objects within a radius $\varepsilon$ of $p$: $N_\varepsilon(p) = \{q \in X : d(p, q) \le \varepsilon\}$. High density: the $\varepsilon$-neighborhood of an object contains at least MinPts objects.

Core, Border and Outlier Points. Given $\varepsilon$ and MinPts, categorize objects into three exclusive groups. Core point: has at least MinPts points within its $\varepsilon$-neighborhood (the interior points of a cluster). Border point: has fewer than MinPts points within its $\varepsilon$-neighborhood, but lies in the neighborhood of a core point. Noise/outlier: any point that is neither a core nor a border point. [figure: example with $\varepsilon = 1$, MinPts = 5 marking core, border, and outlier points.]

Density Reachability (1). Directly density-reachable: an object $q$ is directly density-reachable from an object $p$ if $p$ is a core object and $q$ is in $p$'s $\varepsilon$-neighborhood. Direct density-reachability is asymmetric. [figure: example with MinPts = 4.]

Density Reachability (2). Density-reachable (directly and indirectly): a point $p$ is directly density-reachable from $p_2$, $p_2$ is directly density-reachable from $p_1$, and $p_1$ is directly density-reachable from $q$; thus $p, p_2, p_1, q$ form a chain and $p$ is indirectly density-reachable from $q$.

Density-Connectivity. Density-reachability is not symmetric, so it is not good enough on its own to describe clusters. Density-connected: a pair of points $p$ and $q$ are density-connected if they are both density-reachable from a common point $o$. Density-connectivity is symmetric.

Cluster in DBSCAN. Given a dataset $X$, a parameter $\varepsilon$ and a threshold MinPts, a cluster $C$ is a subset of objects satisfying two criteria. Connected: for all $p, q \in C$, $p$ and $q$ are density-connected. Maximal: for all $p, q \in X$, if $p \in C$ and $q$ is density-reachable from $p$, then $q \in C$.

DBSCAN Algorithm. Input: dataset $X$, parameters $\varepsilon$ and MinPts. For each object $p \in X$: if $p$ is a core object and not yet processed, then let $C$ be the set of all objects density-reachable from $p$, mark all objects in $C$ as processed, and report $C$ as a cluster; else mark $p$ as an outlier. If $p$ is a border point, no points are density-reachable from $p$ and the algorithm simply visits the next point in $X$.
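
A compact Python sketch of this procedure (my own simplified implementation with a brute-force neighborhood search; label -1 marks noise):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    N = len(X)
    labels = np.full(N, -1)                 # -1 = noise / not yet assigned
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(N)]
    cluster, visited = 0, np.zeros(N, dtype=bool)
    for p in range(N):
        if visited[p] or len(neighbors[p]) < min_pts:
            continue                        # skip processed points and non-core points
        # p is an unprocessed core point: grow a new cluster from it
        stack, visited[p] = [p], True
        while stack:
            q = stack.pop()
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:            # expand only from core points
                for r in neighbors[q]:
                    if not visited[r]:
                        visited[r] = True
                        stack.append(r)
        cluster += 1
    return labels

# usage: labels = dbscan(X, eps=2.0, min_pts=3)
```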

DBSCAN Algorithm Illustration (1)-(3): [figures stepping through the algorithm on an example with $\varepsilon = 2$, MinPts = 3.]

DBSCAN Example (1): where it works. [figure: original points and the resulting clusters.]

DBSCAN Example (2): where it does not work, e.g. data with varying densities. [figure: original points.]

Summary. Unsupervised learning: K-means clustering (an EM-style alternation for discovering the clusters), K-medoids clustering, Gaussian mixture models (Expectation Maximization for estimating the parameters of the Gaussian mixtures), hierarchical clustering (agglomerative clustering), and density-based clustering (DBSCAN).