Hierarchical Clustering

Alexander Holland

1 Hierarchical Clustering

2 Example for merging hierarchically

3 Merging Apples

4 Merging Oranges

5 Merging Strawberries

6 All together

7 Hierarchical Clustering In hierarchical clustering the data are not partitioned into a particular cluster in a single step. Instead, a series of partitions takes place, which may run from a single cluster containing all objects to n clusters each containing a single object.

8 Subdivisions Hierarchical clustering is subdivided into agglomerative methods, which proceed by a series of fusions of the n objects into groups, and divisive methods, which separate the n objects successively into finer groupings. Agglomerative techniques are more commonly used.

9 Dendrogram Hierarchical clustering may be represented by a two-dimensional diagram known as a dendrogram, which illustrates the fusions or divisions made at each successive stage of the analysis.
[Figure: example dendrogram, labeled with the divisive and agglomerative directions]

10 Strengths of Hierarchical Clustering
- No need to assume any particular number of clusters
- Any desired number of clusters can be obtained by cutting the dendrogram at the proper level
- The clusters may correspond to meaningful taxonomies
- Traditional hierarchical algorithms use a similarity or distance matrix to merge or split one cluster at a time
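Cutting the dendrogram at a level can be sketched as replaying only the merges whose height does not exceed that level. The function name `cut`, the toy labels, and the merge log below are hypothetical; each log entry is assumed to name one member of each of the two clusters being merged, together with the merge height.

```python
def cut(labels, merges, threshold):
    """Return the clusters obtained by cutting a dendrogram at `threshold`.

    `merges` is a list of (member_of_left, member_of_right, height) tuples.
    This log format is an assumption made for this sketch.
    """
    parent = {x: x for x in labels}  # each label starts as its own root

    def find(x):
        # Follow parent links up to the representative of x's cluster.
        while parent[x] != x:
            x = parent[x]
        return x

    for left, right, height in merges:
        if height <= threshold:  # keep only merges below the cut
            parent[find(left)] = find(right)

    groups = {}
    for x in labels:
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical merge log for five objects a..e.
toy_merges = [("a", "b", 1.0), ("d", "e", 1.5), ("c", "d", 2.0), ("a", "c", 5.0)]
print(cut("abcde", toy_merges, 1.8))  # three clusters
print(cut("abcde", toy_merges, 3.0))  # two clusters
```

Raising the threshold yields fewer, larger clusters, so no number of clusters has to be fixed in advance.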

11 Agglomerative Clustering Algorithm
The more popular hierarchical clustering technique. The basic algorithm is straightforward:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
The key operation is the computation of the proximity of two clusters. Different approaches to defining the distance between clusters distinguish the different algorithms.
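The steps above can be sketched in a few lines of Python. This is a minimal illustration, not an efficient implementation: it recomputes cluster distances on demand instead of maintaining a proximity matrix, and it assumes Euclidean distance with single link as the default cluster distance. The names `agglomerate` and `single_link` and the toy 1-D points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b):
    """Smallest point-to-point distance between clusters a and b."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, cluster_distance=single_link):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [[p] for p in points]  # step 2: every point is its own cluster
    merges = []
    while len(clusters) > 1:  # steps 3 and 6: repeat until one cluster is left
        # Step 4: find the indices of the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        height = cluster_distance(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        # Step 5 stand-in: distances are recomputed on the next pass
        # rather than updated in a stored proximity matrix.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = agglomerate([(0,), (1,), (3,), (7,)])
for left, right, height in merges:
    print(left, "+", right, "at", height)
```

Passing a different `cluster_distance` function changes which pair counts as closest, which is the point at which the agglomerative variants differ.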


12 Cluster Distance Measures
- Single link (min): the smallest distance between an element in one cluster and an element in the other, i.e., $\mathrm{dist}(C_i, C_j) = \min\{d(x_{ip}, x_{jq})\}$
- Complete link (max): the largest distance between an element in one cluster and an element in the other, i.e., $\mathrm{dist}(C_i, C_j) = \max\{d(x_{ip}, x_{jq})\}$
- Average: the average distance between an element in one cluster and an element in the other, i.e., $\mathrm{dist}(C_i, C_j) = \mathrm{avg}\{d(x_{ip}, x_{jq})\}$
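The three measures differ only in how they aggregate the same set of pairwise distances. A small sketch, assuming Euclidean distance for d and clusters given as lists of coordinate tuples (the function names and sample clusters are illustrative):

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def pairwise(ci, cj):
    """All distances d(x_ip, x_jq) with x_ip in ci and x_jq in cj."""
    return [dist(p, q) for p, q in product(ci, cj)]


def single_link(ci, cj):
    return min(pairwise(ci, cj))


def complete_link(ci, cj):
    return max(pairwise(ci, cj))


def average_link(ci, cj):
    d = pairwise(ci, cj)
    return sum(d) / len(d)


# Two hypothetical 1-D clusters; the pairwise distances are 3, 5, 2 and 4.
ci = [(0,), (1,)]
cj = [(3,), (5,)]
print(single_link(ci, cj), complete_link(ci, cj), average_link(ci, cj))
```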

13 Working Example
Given a data matrix, cluster the points using the agglomerative algorithm.

Point  X1   X2
A      1    1
B      1.5  1.5
C      5    5
D      3    4
E      4    4
F      3    3.5

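14 Working Example
The distance matrix (Euclidean distances, rounded to two decimals) is:

      A     B     C     D     E     F
A   0.00
B   0.71  0.00
C   5.66  4.95  0.00
D   3.61  2.92  2.24  0.00
E   4.24  3.54  1.41  1.00  0.00
F   3.20  2.50  2.50  0.50  1.12  0.00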
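The distance matrix can be recomputed from the point coordinates. The sketch below assumes Euclidean distance. It also takes B = (1.5, 1.5), an assumption chosen because it is consistent with the distances quoted in the merge steps of this example (d(B,C) = 4.95, d(B,D) = 2.92, d(B,E) = 3.54, d(B,F) = 2.50).

```python
from math import dist  # Euclidean distance, Python 3.8+

points = {
    "A": (1, 1),
    "B": (1.5, 1.5),  # assumed; consistent with the distances quoted in the example
    "C": (5, 5),
    "D": (3, 4),
    "E": (4, 4),
    "F": (3, 3.5),
}
labels = sorted(points)

# Full symmetric matrix, keyed by (row label, column label).
matrix = {(p, q): dist(points[p], points[q]) for p in labels for q in labels}

print("  " + " ".join(f"{q:>5}" for q in labels))
for p in labels:
    print(p, " ".join(f"{matrix[p, q]:5.2f}" for q in labels))

# The first pair to merge is the off-diagonal entry with the smallest value.
closest = min(((p, q) for p in labels for q in labels if p < q), key=matrix.get)
print("closest pair:", closest)
```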

15 Working Example
Merge the two closest clusters, D and F, and update the distance matrix. Using the single-linkage metric, we get:
d((D,F), A) = min(d(D,A), d(F,A)) = min(3.61, 3.20) = 3.20
d((D,F), B) = min(d(D,B), d(F,B)) = min(2.92, 2.50) = 2.50
d((D,F), C) = min(d(D,C), d(F,C)) = min(2.24, 2.50) = 2.24
d((D,F), E) = min(d(D,E), d(F,E)) = min(1.00, 1.12) = 1.00
The updated distance matrix is:

        A     B     C    D,F    E
A     0.00
B     0.71  0.00
C     5.66  4.95  0.00
D,F   3.20  2.50  2.24  0.00
E     4.24  3.54  1.41  1.00  0.00
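The single-linkage update used in this step can be written as a small function over a distance table. This is a sketch under stated assumptions: the table is keyed by unordered pairs of cluster labels, the helper name `merge_single_link` is hypothetical, and the entries for A-B (0.71) and D-F (0.50) are computed from the coordinates (with B assumed to be (1.5, 1.5)) rather than quoted from the merge steps.

```python
def merge_single_link(dist_matrix, a, b):
    """Return a new table in which clusters a and b are replaced by (a, b).

    Single linkage: the distance from the merged cluster to any other
    cluster c is min(d(a, c), d(b, c)).
    """
    merged = (a, b)
    others = {c for pair in dist_matrix for c in pair} - {a, b}
    # Keep every entry that involves neither a nor b.
    updated = {
        pair: d for pair, d in dist_matrix.items()
        if a not in pair and b not in pair
    }
    for c in others:
        updated[frozenset((merged, c))] = min(
            dist_matrix[frozenset((a, c))], dist_matrix[frozenset((b, c))]
        )
    return updated


rounded = {
    ("A", "B"): 0.71, ("A", "C"): 5.66, ("A", "D"): 3.61, ("A", "E"): 4.24,
    ("A", "F"): 3.20, ("B", "C"): 4.95, ("B", "D"): 2.92, ("B", "E"): 3.54,
    ("B", "F"): 2.50, ("C", "D"): 2.24, ("C", "E"): 1.41, ("C", "F"): 2.50,
    ("D", "E"): 1.00, ("D", "F"): 0.50, ("E", "F"): 1.12,
}
dist_matrix = {frozenset(pair): d for pair, d in rounded.items()}

step1 = merge_single_link(dist_matrix, "D", "F")
for other in "ABCE":
    print(f"d((D,F), {other}) =", step1[frozenset((("D", "F"), other))])
```

Applying the same function again to the pair with the smallest remaining entry continues the procedure one merge at a time.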

16 Working Example
The next clusters to merge are A and B, since they now have the smallest distance. Update the distance matrix. Using the single-linkage metric, we get:
d((A,B), C) = min(d(A,C), d(B,C)) = min(5.66, 4.95) = 4.95
d((A,B), (D,F)) = min(d(A,D), d(A,F), d(B,D), d(B,F)) = min(3.61, 3.20, 2.92, 2.50) = 2.50
d((A,B), E) = min(d(A,E), d(B,E)) = min(4.24, 3.54) = 3.54
The updated distance matrix is:

       A,B    C    D,F    E
A,B   0.00
C     4.95  0.00
D,F   2.50  2.24  0.00
E     3.54  1.41  1.00  0.00

17 Working Example
The next clusters to merge are (D,F) and E, since they have the smallest distance. Update the distance matrix. Using the single-linkage metric, we get:
d(((D,F),E), (A,B)) = min(2.50, 3.54) = 2.50
d(((D,F),E), C) = min(2.24, 1.41) = 1.41
The updated distance matrix is:

             A,B    C    ((D,F),E)
A,B         0.00
C           4.95  0.00
((D,F),E)   2.50  1.41  0.00

18 Working Example
The next clusters to merge are ((D,F),E) and C, since they have the smallest distance. Update the distance matrix. Using the single-linkage metric, we get:
d((((D,F),E),C), (A,B)) = min(2.50, 4.95) = 2.50
The updated distance matrix is:

                  A,B   (((D,F),E),C)
A,B              0.00
(((D,F),E),C)    2.50  0.00

19 Working Example
Only two clusters remain, so they are merged at distance 2.50 into a single cluster containing all points, and the algorithm terminates. The final dendrogram has the leaf order: D F E C A B
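The whole worked example can be replayed with a short script that records each merge as a nested label, in the spirit of the (((D, F),E),C) notation used above. It is a sketch assuming Euclidean distance, single linkage, and B = (1.5, 1.5), a value chosen to agree with the distances quoted in the merge steps. Variable names are illustrative, and the left/right order inside a nested label depends on iteration order.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

points = {
    "A": (1, 1),
    "B": (1.5, 1.5),  # assumed; consistent with the distances in the example
    "C": (5, 5),
    "D": (3, 4),
    "E": (4, 4),
    "F": (3, 3.5),
}


def single_link(members_i, members_j):
    """Smallest distance between a point in one cluster and one in the other."""
    return min(dist(points[p], points[q]) for p in members_i for q in members_j)


# Each cluster is (nested label, tuple of member names).
clusters = [(name, (name,)) for name in points]
history = []
while len(clusters) > 1:
    left, right = min(
        combinations(clusters, 2),
        key=lambda pair: single_link(pair[0][1], pair[1][1]),
    )
    height = single_link(left[1], right[1])
    history.append((left[0], right[0], round(height, 2)))
    merged = ((left[0], right[0]), left[1] + right[1])
    clusters = [c for c in clusters if c is not left and c is not right]
    clusters.append(merged)

for left_label, right_label, height in history:
    print(f"merge {left_label} with {right_label} at {height:.2f}")
print("final tree:", clusters[0][0])
```

The recorded heights are non-decreasing, and they are the levels at which the dendrogram's branches join.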

20 AGNES (Agglomerative Nesting)
- Introduced in Kaufman and Rousseeuw (1990)
- Implemented in statistical analysis packages, e.g., S-PLUS
- Uses the single-link method and the dissimilarity matrix
- Merges the nodes that have the least dissimilarity
- Continues in a non-descending fashion
- Eventually all nodes belong to the same cluster

21 DIANA (Divisive Analysis)
- Introduced in Kaufman and Rousseeuw (1990)
- Implemented in statistical analysis packages, e.g., S-PLUS
- Proceeds in the inverse order of AGNES
- Eventually each node forms a cluster on its own

22 Weaknesses
Major weaknesses of agglomerative clustering methods:
- They do not scale well: time complexity is at least O(n^2), where n is the total number of objects
- They can never undo what was done previously
- They are sensitive to the choice of cluster distance measure
