CSCE 478/878 Lecture 8: Clustering. Stephen Scott.

Stephen Scott, sscott@cse.unl.edu

Introduction
If no label information is available, we can still perform unsupervised learning: instead of a label prediction function, we look for structural information about the instance space. Approaches include density estimation, clustering, and dimensionality reduction. Clustering algorithms group similar instances together based on a similarity measure. [Figure: instances plotted in the $(x_1, x_2)$ plane, before and after a clustering algorithm is applied.]

Outline
Background; similarity/dissimilarity measures; k-means clustering; hierarchical clustering.

Background
Goal: place patterns into sensible clusters that reveal similarities and differences. The definition of "sensible" depends on the application. [Figure: the same set of animals clustered by (a) how they bear young, (b) the existence of lungs, (c) environment, and (d) both (a) and (b); each criterion yields a different clustering.]

Background (cont'd)
Types of clustering problems:
Hard (crisp): partition the data into non-overlapping clusters; each instance belongs to exactly one cluster.
Fuzzy: each instance can be a member of multiple clusters, with a real-valued function indicating its degree of membership in each.
Hierarchical: partition instances into numerous small clusters, then group the clusters into larger ones, and so on (applicable to phylogeny); the end result is a tree with instances at the leaves.

Background: (Dis-)similarity Measures Between Instances
Dissimilarity measure: weighted $L_p$ norm
$$L_p(x, y) = \left( \sum_{i=1}^n w_i \, |x_i - y_i|^p \right)^{1/p}$$
Special cases include the weighted Euclidean distance ($p = 2$), the weighted Manhattan distance
$$L_1(x, y) = \sum_{i=1}^n w_i \, |x_i - y_i|,$$
and the weighted $L_\infty$ norm
$$L_\infty(x, y) = \max_{1 \le i \le n} \{ w_i \, |x_i - y_i| \}.$$
Similarity measure: dot product between two vectors (kernel).
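As an illustration, here is a minimal Python sketch of the weighted $L_p$ dissimilarity and its special cases; the function name and example values are mine, not from the lecture.

```python
import numpy as np

def weighted_lp(x, y, w, p=2):
    """Weighted L_p dissimilarity: (sum_i w_i |x_i - y_i|^p)^(1/p)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if np.isinf(p):                          # weighted L_infinity norm
        return float(np.max(w * np.abs(x - y)))
    return float(np.sum(w * np.abs(x - y) ** p) ** (1.0 / p))

x, y, w = [1.0, 1.0], [5.0, 4.0], [1.0, 1.0]
print(weighted_lp(x, y, w, p=2))        # weighted Euclidean distance: 5.0
print(weighted_lp(x, y, w, p=1))        # weighted Manhattan distance: 7.0
print(weighted_lp(x, y, w, p=np.inf))   # weighted L_infinity norm: 4.0
```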

Background: (Dis-)similarity Measures Between Instances (cont'd)
If attributes come from $\{0, \ldots, k-1\}$, we can use the measures for real-valued attributes, plus:
Hamming distance: a dissimilarity measure (DM) counting the number of places where $x$ and $y$ differ.
Tanimoto measure: a similarity measure (SM) counting the number of places where $x$ and $y$ are the same, divided by the total number of places, ignoring places $i$ where $x_i = y_i = 0$. Useful for ordinal features where $x_i$ is the degree to which $x$ possesses the $i$th feature.
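A hedged sketch of these two measures for discrete attributes (the helper names and example vectors are mine):

```python
import numpy as np

def hamming(x, y):
    """Dissimilarity: number of places where x and y differ."""
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum(x != y))

def tanimoto(x, y):
    """Similarity: fraction of places where x and y agree,
    ignoring places i where x_i = y_i = 0."""
    x, y = np.asarray(x), np.asarray(y)
    keep = ~((x == 0) & (y == 0))            # drop places with x_i = y_i = 0
    return float(np.sum((x == y) & keep)) / float(np.sum(keep))

x, y = [0, 1, 2, 2, 0], [0, 1, 0, 2, 1]
print(hamming(x, y))    # 2 places differ
print(tanimoto(x, y))   # 2 agreements over 4 non-(0,0) places = 0.5
```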

Background: (Dis-)similarity Measures Between an Instance and a Set
We might want to measure the proximity of a point $x$ to an existing cluster $C$. The proximity $\alpha$ can be measured using all points of $C$ or using a representative of $C$. If all points of $C$ are used, common choices are
$$\alpha_{ps}^{max}(x, C) = \max_{y \in C} \{\alpha(x, y)\},$$
$$\alpha_{ps}^{min}(x, C) = \min_{y \in C} \{\alpha(x, y)\},$$
$$\alpha_{ps}^{avg}(x, C) = \frac{1}{|C|} \sum_{y \in C} \alpha(x, y),$$
where $\alpha(x, y)$ is any measure between $x$ and $y$.
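A short sketch of the three point-to-set proximities, parameterized by an arbitrary point-point measure $\alpha$ (the function names are mine; Euclidean distance is used only as an example):

```python
import numpy as np

def prox_max(x, C, alpha):
    return max(alpha(x, y) for y in C)

def prox_min(x, C, alpha):
    return min(alpha(x, y) for y in C)

def prox_avg(x, C, alpha):
    return sum(alpha(x, y) for y in C) / len(C)

# Example: proximity of x to a two-point cluster under Euclidean distance
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
C = [[1.0, 1.0], [2.0, 1.0]]
x = [5.0, 4.0]
print(prox_min(x, C, euclid), prox_max(x, C, euclid), prox_avg(x, C, euclid))
# approx. 4.24, 5.0, 4.62
```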

Background: (Dis-)similarity Measures Between an Instance and a Set (cont'd)
Alternative: measure the distance between the point $x$ and a representative of the cluster $C$.
Mean vector: $m_p = \frac{1}{|C|} \sum_{y \in C} y$
Mean center $m_c \in C$: $\sum_{y \in C} d(m_c, y) \le \sum_{y \in C} d(z, y) \;\; \forall z \in C$, where $d(\cdot, \cdot)$ is a DM (if an SM is used, reverse the inequality).
Median center $m_{med} \in C$: for each point $y \in C$, find the median dissimilarity from $y$ to all other points of $C$, then take the minimum; i.e., $\mathrm{med}_{y \in C}\{d(m_{med}, y)\} \le \mathrm{med}_{y \in C}\{d(z, y)\} \;\; \forall z \in C$.
Now the proximity between $C$'s representative and $x$ can be measured with standard point-point measures.
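A sketch computing the three representatives for a small cluster, assuming Euclidean distance as the DM (the helper name is mine):

```python
import numpy as np

def representatives(C):
    """Return (mean vector, mean center, median center) of cluster C."""
    C = [np.asarray(y, dtype=float) for y in C]
    d = lambda a, b: float(np.linalg.norm(a - b))
    # m_p: mean vector, need not be a member of C
    mean_vector = np.mean(C, axis=0)
    # m_c in C: minimizes the sum of distances to all points of C
    mean_center = min(C, key=lambda z: sum(d(z, y) for y in C))
    # m_med in C: minimizes the median distance to the *other* points of C
    median_center = min(C, key=lambda z: np.median(
        [d(z, y) for y in C if y is not z]))
    return mean_vector, mean_center, median_center

print(representatives([[1, 1], [2, 1], [5, 4]]))
# mean vector approx. [2.67, 2.0]; mean center [2, 1]; median center [2, 1]
```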

Background: (Dis-)similarity Measures Between Sets
Given sets of instances $C_i$ and $C_j$ and a proximity measure $\alpha(\cdot, \cdot)$:
Max: $\alpha_{ss}^{max}(C_i, C_j) = \max_{x \in C_i, y \in C_j} \{\alpha(x, y)\}$
Min: $\alpha_{ss}^{min}(C_i, C_j) = \min_{x \in C_i, y \in C_j} \{\alpha(x, y)\}$
Average: $\alpha_{ss}^{avg}(C_i, C_j) = \frac{1}{|C_i| \, |C_j|} \sum_{x \in C_i} \sum_{y \in C_j} \alpha(x, y)$
Representative (mean): $\alpha_{ss}^{mean}(C_i, C_j) = \alpha(m_{C_i}, m_{C_j})$
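A sketch of the set-to-set proximities (names are mine); $\alpha_{ss}^{min}$ and $\alpha_{ss}^{max}$ correspond to what is often called single and complete linkage:

```python
import numpy as np

def ss_max(Ci, Cj, alpha):
    return max(alpha(x, y) for x in Ci for y in Cj)

def ss_min(Ci, Cj, alpha):
    return min(alpha(x, y) for x in Ci for y in Cj)

def ss_avg(Ci, Cj, alpha):
    return sum(alpha(x, y) for x in Ci for y in Cj) / (len(Ci) * len(Cj))

def ss_mean(Ci, Cj, alpha):
    # proximity between the mean vectors (representatives) of the two sets
    return alpha(np.asarray(Ci, float).mean(axis=0),
                 np.asarray(Cj, float).mean(axis=0))

euclid = lambda a, b: float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
Ci, Cj = [[1, 1], [2, 1]], [[5, 4], [6, 5]]
print(ss_min(Ci, Cj, euclid), ss_max(Ci, Cj, euclid))   # approx. 4.24 and 6.40
```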

k-means Clustering
A very popular clustering algorithm. It represents cluster $i$ (out of $k$ total) by specifying its representative $m_i$ (not necessarily part of the original set of instances $X$). Each instance $x \in X$ is assigned to the cluster with the nearest representative. The goal is to find a set of $k$ representatives such that the sum of distances between instances and their representatives is minimized; this is NP-hard in general. We will use an algorithm that alternates between determining representatives and assigning clusters until convergence (in the style of the EM algorithm).

k-means Algorithm
Choose a value for the parameter $k$ and initialize $k$ arbitrary representatives $m_1, \ldots, m_k$, e.g., $k$ randomly selected instances from $X$. Then repeat until the representatives $m_1, \ldots, m_k$ don't change:
1. For all $x \in X$, assign $x$ to the cluster $C_j$ such that $\|x - m_j\|$ (or another measure) is minimized, i.e., to the nearest representative.
2. For each $j \in \{1, \ldots, k\}$, set $m_j = \frac{1}{|C_j|} \sum_{y \in C_j} y$.
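A minimal Python sketch of this loop, assuming Euclidean distance and initialization by sampling $k$ instances from $X$ (no handling of empty clusters or other edge cases):

```python
import numpy as np

def kmeans(X, k, seed=0):
    """Alternate between assigning instances to the nearest representative
    and moving each representative to the mean of its cluster."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]   # initial representatives
    while True:
        # Step 1: assign each x to the cluster C_j with the nearest representative m_j
        labels = np.argmin(np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2), axis=1)
        # Step 2: recompute each representative as the mean of its cluster
        new_m = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_m, m):                      # representatives unchanged
            return labels, m
        m = new_m

X = [[1, 1], [2, 1], [5, 4], [6, 5], [6.5, 6]]
labels, reps = kmeans(X, k=2)
print(labels)   # e.g. [0 0 1 1 1] (cluster indices may be permuted)
```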

[Figure: k-means with $k = 2$; panels show the initial representatives and the clusterings after 1, 2, and 3 iterations in the $(x_1, x_2)$ plane.]

Hierarchical Clustering
Useful for capturing hierarchical relationships, e.g., the evolutionary tree of biological sequences. The end result is a sequence (hierarchy) of clusterings. Two types of algorithms:
Agglomerative: repeatedly merge two clusters into one.
Divisive: repeatedly divide one cluster into two.

Definitions
Let $\mathcal{C}_t = \{C_1, \ldots, C_{m_t}\}$ be a level-$t$ clustering of $X = \{x_1, \ldots, x_N\}$, where $\mathcal{C}_t$ meets the definition of a hard clustering.
$\mathcal{C}_t$ is nested in $\mathcal{C}_{t'}$ (written $\mathcal{C}_t \sqsubset \mathcal{C}_{t'}$) if each cluster in $\mathcal{C}_t$ is a subset of a cluster in $\mathcal{C}_{t'}$ and at least one cluster in $\mathcal{C}_t$ is a proper subset of some cluster in $\mathcal{C}_{t'}$.
Example: $\mathcal{C}_1 = \{\{x_1, x_3\}, \{x_4\}, \{x_2, x_5\}\} \sqsubset \{\{x_1, x_3, x_4\}, \{x_2, x_5\}\}$, but $\mathcal{C}_1$ is not nested in $\{\{x_1, x_4\}, \{x_3\}, \{x_2, x_5\}\}$.

Definitions (cont'd)
Agglomerative algorithms start with $\mathcal{C}_0 = \{\{x_1\}, \ldots, \{x_N\}\}$ and at each step $t$ merge two clusters into one, yielding $|\mathcal{C}_{t+1}| = |\mathcal{C}_t| - 1$ and $\mathcal{C}_t \sqsubset \mathcal{C}_{t+1}$. At the final step (step $N-1$) we have the hierarchy
$$\mathcal{C}_0 = \{\{x_1\}, \ldots, \{x_N\}\} \sqsubset \mathcal{C}_1 \sqsubset \cdots \sqsubset \mathcal{C}_{N-1} = \{\{x_1, \ldots, x_N\}\}.$$
Divisive algorithms start with $\mathcal{C}_0 = \{\{x_1, \ldots, x_N\}\}$ and at each step $t$ split one cluster into two, yielding $|\mathcal{C}_{t+1}| = |\mathcal{C}_t| + 1$ and $\mathcal{C}_{t+1} \sqsubset \mathcal{C}_t$. At step $N-1$ we have the hierarchy
$$\mathcal{C}_{N-1} = \{\{x_1\}, \ldots, \{x_N\}\} \sqsubset \cdots \sqsubset \mathcal{C}_0 = \{\{x_1, \ldots, x_N\}\}.$$

Pseudocode (agglomerative)
1. Initialize $\mathcal{C}_0 = \{\{x_1\}, \ldots, \{x_N\}\}$, $t = 0$.
2. For $t = 1$ to $N - 1$:
   Find the closest pair of clusters: $(C_i, C_j) = \operatorname{argmin}_{C_s, C_r \in \mathcal{C}_{t-1},\, r \ne s} \{d_{ss}(C_s, C_r)\}$.
   Set $\mathcal{C}_t = (\mathcal{C}_{t-1} \setminus \{C_i, C_j\}) \cup \{C_i \cup C_j\}$ and update representatives if necessary.
If an SM is used, replace argmin with argmax. The number of calls to $d_{ss}(C_s, C_r)$ is $\Theta(N^3)$.
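A sketch of the agglomerative scheme above, using $\alpha_{ss}^{min}$ (single linkage) with Euclidean distance as the default set-set dissimilarity; the function and variable names are mine:

```python
import numpy as np

def agglomerative(X, d_ss=None):
    """Return the hierarchy C_0, C_1, ..., C_{N-1}; clusters are frozensets of indices."""
    X = [np.asarray(x, dtype=float) for x in X]
    if d_ss is None:
        # alpha_ss_min: smallest Euclidean distance between members of the two clusters
        d_ss = lambda Ci, Cj: min(float(np.linalg.norm(X[a] - X[b]))
                                  for a in Ci for b in Cj)
    clustering = [frozenset([i]) for i in range(len(X))]      # C_0: singletons
    hierarchy = [list(clustering)]
    while len(clustering) > 1:
        # find the closest pair of clusters under d_ss
        i, j = min(((a, b) for a in range(len(clustering))
                           for b in range(a + 1, len(clustering))),
                   key=lambda p: d_ss(clustering[p[0]], clustering[p[1]]))
        merged = clustering[i] | clustering[j]
        clustering = [C for t, C in enumerate(clustering) if t not in (i, j)] + [merged]
        hierarchy.append(list(clustering))
    return hierarchy

X = [[1, 1], [2, 1], [5, 4], [6, 5], [6.5, 6]]
for level, clusters in enumerate(agglomerative(X)):
    print(level, [sorted(C) for C in clusters])
```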

Example: $x_1 = [1, 1]^T$, $x_2 = [2, 1]^T$, $x_3 = [5, 4]^T$, $x_4 = [6, 5]^T$, $x_5 = [6.5, 6]^T$; DM = Euclidean distance, set-set measure $\alpha_{ss}^{min}$.
An $(N - t) \times (N - t)$ proximity matrix $P_t$ gives the proximity between all pairs of clusters at level (iteration) $t$:
$$P_0 = \begin{bmatrix} 0 & 1 & 5 & 6.4 & 7.4 \\ 1 & 0 & 4.2 & 5.7 & 6.7 \\ 5 & 4.2 & 0 & 1.4 & 2.5 \\ 6.4 & 5.7 & 1.4 & 0 & 1.1 \\ 7.4 & 6.7 & 2.5 & 1.1 & 0 \end{bmatrix}$$
At each iteration, find the minimum off-diagonal element $(i, j)$ in $P_{t-1}$, merge clusters $i$ and $j$, remove rows/columns $i$ and $j$ from $P_{t-1}$, and add a new row/column for the new cluster to get $P_t$.
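The initial proximity matrix of this example can be reproduced directly (a small verification sketch, not part of the slides):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [5, 4], [6, 5], [6.5, 6]], dtype=float)
P0 = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise Euclidean distances
print(np.round(P0, 1))          # matches P_0 above (rounded to one decimal)

# Minimum off-diagonal entry -> the first single-linkage merge joins {x_1} and {x_2}
masked = P0 + np.diag(np.full(len(X), np.inf))
i, j = np.unravel_index(np.argmin(masked), masked.shape)
print(i, j, masked[i, j])       # 0 1 1.0
```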

Pseudocode (cont'd)
A proximity dendrogram is a tree that indicates the hierarchy of clusterings, including the proximity between two clusters when they are merged. Cutting the dendrogram at any level yields a single clustering.
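For instance, SciPy's hierarchical-clustering utilities can draw the proximity dendrogram for the example above and cut it into a chosen number of clusters (a sketch using single linkage, matching $\alpha_{ss}^{min}$; assumes SciPy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1, 1], [2, 1], [5, 4], [6, 5], [6.5, 6]], dtype=float)
Z = linkage(pdist(X), method='single')          # merge history with merge proximities
dendrogram(Z, labels=['x1', 'x2', 'x3', 'x4', 'x5'])
plt.ylabel('proximity at merge')
plt.show()

# Cutting the dendrogram to obtain, e.g., two clusters:
print(fcluster(Z, t=2, criterion='maxclust'))   # e.g. [1 1 2 2 2]
```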