
Outline

Administrivia and Introduction
  Course Structure
  Syllabus
  Introduction to Data Mining

Dimensionality Reduction
  Introduction
  Principal Components Analysis
  Singular Value Decomposition
  Multidimensional Scaling
  Isomap

Clustering
  Introduction
  Hierarchical Clustering
  K-means
  Vector Quantisation
  Probabilistic Methods

Multidimensional Scaling (MDS)

MDS is a class of methods for representing high-dimensional data in a lower-dimensional space so that inter-point distances are preserved as well as possible. MDS effectively squeezes a high-dimensional cloud of points into a smaller number of dimensions, typically 2 or 3.

Given $x_1, \dots, x_n \in \mathbb{R}^p$, we can obtain a matrix of pairwise distances $D$ with entries $d_{ij} = d(x_i, x_j)$ using some measure of dissimilarity $d$, for example the Euclidean distance $d_{ij} = \|x_i - x_j\|_2$. In most applications, only $D$ is available.

MDS finds representations $z_1, \dots, z_n \in \mathbb{R}^k$ such that

$$d(x_i, x_j) \approx \tilde d(z_i, z_j),$$

where $d$ represents dissimilarity in the original $p$-dimensional space and $\tilde d$ represents dissimilarity in the reduced $k$-dimensional space. The best values of $z_i$ are chosen to minimise some stress function.
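As a quick illustration of this setup, a minimal R sketch (with toy data of our own, not from the slides) builds the pairwise Euclidean distance matrix D from a point configuration:

X <- matrix(rnorm(20 * 3), nrow = 20)  # 20 toy points in R^3
D <- dist(X)                           # pairwise distances d_ij = ||x_i - x_j||_2
as.matrix(D)[1:3, 1:3]                 # D as a symmetric matrix with zero diagonal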

Metric vs Non-Metric Stress Functions

Metric: Where closeness is considered geometrically and the Euclidean distance $d_{ij} = \|x_i - x_j\|_2$ is used, fit is commonly measured with the classical metric stress function

$$S_{\text{metric}}(d_{ij}, \tilde d_{ij}) = \left( \sum_{i \neq j} \left( d_{ij} - \tilde d_{ij} \right)^2 \right)^{1/2}.$$

Non-Metric: Sometimes it is more important to preserve the ordering of the $d_{ij}$ as well as possible, rather than their actual values. Non-metric stress functions have been developed for ordered distances:

$$S_{\text{non-metric}}(d_{ij}, \tilde d_{ij}) = \min_{g \text{ monotone}} \left( \frac{\sum_{i \neq j} \left( g(d_{ij}) - \tilde d_{ij} \right)^2}{\sum_{i \neq j} \tilde d_{ij}^2} \right)^{1/2}.$$
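As a concrete reading of the metric stress formula, here is a minimal R sketch (the function name and arguments are our own) that evaluates $S_{\text{metric}}$ for a candidate low-dimensional configuration Z against a given dissimilarity matrix D:

stress_metric <- function(D, Z) {
  Dhat <- as.matrix(dist(Z))  # dissimilarities in the reduced space
  D <- as.matrix(D)
  sqrt(sum((D - Dhat)^2))     # sum over i != j; diagonal terms are zero
}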

Solving the Metric MDS Problem

Suppose we only have an $n \times n$ matrix of Euclidean distances $D = (d_{ij})$ but not the points $X$ themselves. The classical MDS problem is to find a configuration of $n$ points in $p$-dimensional space that yields the same Euclidean distance matrix as $X$. Infinitely many solutions exist, as the distance matrix is invariant to rigid motions (rotations, reflections and translations).

As the distances are Euclidean, we can write $d_{ij} = \|x_i - x_j\|_2$ for some points $x_1, \dots, x_n \in \mathbb{R}^p$, where

$$d_{ij}^2 = \|x_i - x_j\|_2^2 = (x_i - x_j)^\top (x_i - x_j) = x_i^\top x_i + x_j^\top x_j - 2 x_i^\top x_j. \quad (1)$$

Solving the Metric MDS Problem

Define the matrix $B$ with entries $b_{ij} = x_i^\top x_j$; we can compute $D$ from $B$, but also $B$ from $D$. From this, it is possible to recover a configuration which solves the problem. Writing (1) in terms of $b_{ij}$, we have

$$d_{ij}^2 = b_{ii} + b_{jj} - 2 b_{ij}. \quad (2)$$

If two configurations of $n$ objects in $p$-dimensional space have the same matrix $B = XX^\top$, then they also share the same distance matrix $D$. We can also compute $b_{ij}$ in terms of $d_{ij}$, assuming $\sum_i x_i = 0$ (problem sheet).

Solving the Metric MDS Problem

If two configurations of $n$ objects in $p$-dimensional space have the same matrix $B = XX^\top$, then they also share the same distance matrix $D$. Considering the eigendecomposition of $B$, we see that

$$B = XX^\top = U L U^\top$$

for some orthogonal matrix $U$ with columns $U = (u_1, \dots, u_n)$ and diagonal matrix $L$ with entries $\lambda_1 \geq \dots \geq \lambda_n$.¹ So if $n > p$, we can write

$$\tilde X = [\sqrt{\lambda_1}\, u_1, \dots, \sqrt{\lambda_p}\, u_p],$$

i.e. we have found a $p$-dimensional configuration of $n$ points $\tilde X$ with the same distance matrix $D$ as $X$.

¹ If $X = U D V^\top$ is again the SVD of $X$, then $XX^\top = U D D^\top U^\top$. The matrix $U$ is thus the same in the EVD of $XX^\top$, and the $n \times n$ matrix $L = D D^\top$ has the same non-zero diagonal entries as the $p \times p$ matrix $\Lambda = D^\top D$ in the EVD of $X^\top X$.
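A minimal R sketch of this recovery (the function name is ours; the double-centring step that computes B from D is the result deferred to the problem sheet above):

classical_mds <- function(D, k = 2) {
  D <- as.matrix(D)
  n <- nrow(D)
  J <- diag(n) - matrix(1/n, n, n)   # centring matrix, enforces sum_i x_i = 0
  B <- -0.5 * J %*% D^2 %*% J        # b_ij = x_i' x_j, recovered via (2)
  e <- eigen(B, symmetric = TRUE)    # eigenvalues in decreasing order
  lambda <- pmax(e$values[1:k], 0)   # guard against tiny negative eigenvalues
  e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(lambda), k)
}

Up to reflections of the axes, the output agrees with R's built-in cmdscale.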

MDS Example: US City Flight Distances

We present a table of flying mileages between 10 American cities, distances calculated from our 2-dimensional world. Using D as the starting point, metric MDS finds a configuration with the same distance matrix.

      ATLA  CHIG  DENV  HOUS    LA  MIAM    NY    SF  SEAT    DC
ATLA     0   587  1212   701  1936   604   748  2139  2182   543
CHIG   587     0   920   940  1745  1188   713  1858  1737   597
DENV  1212   920     0   879   831  1726  1631   949  1021  1494
HOUS   701   940   879     0  1374   968  1420  1645  1891  1220
LA    1936  1745   831  1374     0  2339  2451   347   959  2300
MIAM   604  1188  1726   968  2339     0  1092  2594  2734   923
NY     748   713  1631  1420  2451  1092     0  2571  2408   205
SF    2139  1858   949  1645   347  2594  2571     0   678  2442
SEAT  2182  1737  1021  1891   959  2734  2408   678     0  2329
DC     543   597  1494  1220  2300   923   205  2442  2329     0

MDS Example: US City Flight Distances

library(MASS)
us <- read.csv("http://www.stats.ox.ac.uk/~teh/teaching/datamining/data/uscities.csv")

## use the classical stress function
## to find lower dimensional views of the data
## recover X in 2 dimensions
us.classical <- cmdscale(d = us, k = 2)

plot(us.classical)
text(us.classical, labels = names(us))

MDS Example: US City Flight Distances

[Figure: scatter plot of the 2-dimensional metric MDS configuration, with points labelled by city name.]

Lower-dimensional Reconstructions

Having managed to reconstruct a set of $p$-dimensional points with the same distance matrix $D$, we would like to find lower-dimensional representations which minimise the stress function $S_{\text{metric}}$.

If the SVD of $X$ is given by $X = U D V^\top$, then $B = XX^\top = U D D^\top U^\top = U L U^\top$. Generally, the representation $\tilde X$ (chosen so that $X$ and $\tilde X$ have the same distance matrix) can be written as

$$\tilde X = [\sqrt{\lambda_1}\, u_1, \dots, \sqrt{\lambda_r}\, u_r],$$

where $r$ is the rank of $B$. Setting all but the $k$ largest eigenvalues to zero yields the best $k$-dimensional view of the data, minimising the stress function (proof not given).

This is analogous to PCA, where the smallest eigenvalues of $X^\top X$ are effectively suppressed. Indeed, PCA and MDS under Euclidean distance are dual and yield effectively the same result (yet MDS can also be applied to distance matrices that were not generated by a Euclidean distance measure).

MDS Example: Virus Data A data set on 39 viruses with rod-shaped particles affecting various crops (tobacco, tomato, cucumber and others), described by Fauquet et al. (1988). These are Tobamoviruses with monopartite genomes spread by contact. There are 18 measurements on each virus, the number of amino acid residues per molecule of coat protein; the data come from a total of 26 sources. We want to investigate whether there are subgroups within this group of viruses.

MDS Example: Virus Data

[Figure: two panels, "Metric Scaling" and "Kruskal's MDS", showing distance-based representations of the Tobamovirus group of viruses, with points labelled by virus number (the variables were scaled before Euclidean distance was used).]

MDS Example: Virus Data

MDS reveals some clear subgroups within the Tobamoviruses. Viruses 7 (cucumber green mottle mosaic virus) and 21 (pepper mild mottle virus) are clearly separated from the other viruses in the non-metric MDS plot, which is not the case in the metric version. Ripley (1996) states that the non-metric MDS plot shows interpretable groupings: the upper right is the cucumber green mottle virus, the upper left is the ribgrass mosaic virus, and the group of viruses at the bottom (8, 9, 19, 20, 23, 24) comprises the tobacco mild green mosaic and odontoglossum ringspot viruses.

Example: Crabs data

library(MASS)
Crabs <- crabs[, 4:8]
Crabs.class <- factor(paste(crabs[, 1], crabs[, 2], sep = ""))
crabsmds <- cmdscale(d = dist(Crabs), k = 2)
plot(crabsmds, pch = 20, cex = 2)

Example: Crabs data

First two MDS components.

[Figure: scatter plot of the first two MDS components (MDS 1 vs MDS 2).]

Example: Crabs data

With grouping information.

[Figure: the same MDS scatter plot, with points coloured by the four species/sex groups.]

Example: Crabs data

Compare with the previous PCA analysis. The MDS solution corresponds to the first 2 PCs, as metric scaling was used.

[Figure: pairs plot of the principal components Comp.1 to Comp.5 of the crabs data.]
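A minimal sketch verifying this correspondence numerically (our own check, not from the slides): the metric MDS coordinates agree with the first two principal component scores up to sign flips of the axes.

library(MASS)
Crabs <- crabs[, 4:8]
mds <- cmdscale(dist(Crabs), k = 2)  # metric MDS on Euclidean distances
pcs <- prcomp(Crabs)$x[, 1:2]        # first two principal component scores
max(abs(abs(mds) - abs(pcs)))        # ~0: identical up to sign flips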

Example: Crabs data

Use Kruskal's non-metric multidimensional scaling instead.

crabsmds <- isoMDS(d = dist(Crabs), k = 2)
plot(crabsmds$points, pch = 20, cex = 2)

[Figure: scatter plot of the two non-metric MDS components.]

Example: Crabs data

With grouping information.

[Figure: the non-metric MDS scatter plot, with points coloured by the four species/sex groups.]

Example: Language data

Presence or absence of 2867 homologous traits in 87 Indo-European languages.

> X[1:15,1:16]
                V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
Irish_A          0  0  0  0  1  0  0  0  0   0   0   0   0
Irish_B          0  0  0  0  1  0  0  0  0   0   0   0   0
Welsh_N          0  0  0  1  0  0  0  0  0   0   0   0   0
Welsh_C          0  0  0  1  0  0  0  0  0   0   0   0   0
Breton_List      0  0  0  0  1  0  0  0  0   0   0   0   0
Breton_SE        0  0  0  0  1  0  0  0  0   0   0   0   0
Breton_ST        0  0  0  0  1  0  0  0  0   0   0   0   0
Romanian_List    0  1  0  0  0  0  0  0  0   0   0   0   0
Vlach            0  1  0  0  0  0  0  0  0   0   0   0   0
Italian          0  1  0  0  0  0  0  0  0   0   0   0   0
Ladin            0  1  0  0  0  0  0  0  0   0   0   0   0
Provencal        0  1  0  0  0  0  0  0  0   0   0   0   0
French           0  1  0  0  0  0  0  0  0   0   0   0   0
Walloon          0  1  0  0  0  0  0  0  0   0   0   0   0
French_Creole_C  0  1  0  0  0  0  0  0  0   0   0   0   0
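The slides do not show the code for this example; a minimal sketch under our own assumptions (Jaccard-style binary dissimilarities via dist(..., method = "binary"), which may differ from the dissimilarity actually used, and X loaded as the 87 x 2867 binary trait matrix above) would be:

library(MASS)
D <- dist(X, method = "binary")  # dissimilarity between binary trait vectors
lang.mds <- isoMDS(D, k = 2)     # Kruskal's non-metric MDS
plot(lang.mds$points, type = "n")
text(lang.mds$points, labels = rownames(X), cex = 0.6)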

Example: Language data

Using MDS with non-metric scaling.

[Figure: two-dimensional non-metric MDS configuration of the 87 Indo-European languages, with points labelled by language name; related languages appear close together.]