Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences
Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences. Martijn Wieling, Department of Computational Linguistics, University of Groningen. Seminar in Methodology and Statistics, May 20, 2009.
Goal: Making the title of this presentation understandable! Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences
Overview: Why co-clustering? Method: introduction to eigenvalues and eigenvectors; simple clustering; co-clustering. Complete dataset. Results. Conclusions.
Why co-clustering? Research interest: language and dialectal variation. Important method: clustering similar (dialectal) varieties together. Problem: clustering varieties alone does not yield a linguistic basis. Previous solution: investigate sound correspondences post hoc (e.g., Heeringa, 2004). Co-clustering: clusters varieties and sound correspondences simultaneously. Eigenvalues and eigenvectors are central in this approach.
Graphs and matrices. A graph is a set of vertices connected by edges. A graph can also be represented by its adjacency matrix A. [Example graph and its adjacency matrix on the vertices A, B, C, D]
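As a small illustration of this correspondence (with a hypothetical four-vertex graph, since the slide's own matrix entries are not reproduced here), an adjacency matrix can be built directly from an edge list:

```python
# Undirected graph on vertices A, B, C, D with (hypothetical) edges
# A-B, A-C, B-C, C-D.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Build the adjacency matrix as a list of lists.
idx = {v: i for i, v in enumerate(vertices)}
A = [[0] * len(vertices) for _ in vertices]
for u, v in edges:
    A[idx[u]][idx[v]] = 1
    A[idx[v]][idx[u]] = 1  # undirected graph: the matrix is symmetric

for row in A:
    print(row)
```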
Eigenvalues and eigenvectors. The eigenvalues λ and the eigenvectors x of a square matrix A are defined as follows: Ax = λx [ ⇔ (A - λI)x = 0 ]. In matrix form (2 × 2 case):
[ a11 - λ   a12     ] [ x1 ]   [ 0 ]
[ a21       a22 - λ ] [ x2 ] = [ 0 ]
This is solved when:
(a11 - λ) x1 + a12 x2 = 0
a21 x1 + (a22 - λ) x2 = 0
Example of calculating eigenvalues and eigenvectors. Consider the following example:
A = [ 1  2 ]
    [ 2  1 ]
Using (A - λI)x = 0 we get:
[ 1 - λ   2     ] [ x1 ]   [ 0 ]
[ 2       1 - λ ] [ x2 ] = [ 0 ]
Solved when det(A - λI) = 0: (1 - λ)² - 4 = 0
Using λ1 = 3 and λ2 = -1 we obtain x = [ 1  1 ]^T and x = [ 1  -1 ]^T
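The worked example can be checked numerically; a minimal sketch using NumPy's general eigensolver:

```python
import numpy as np

# The slide's example matrix.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# det(A - lambda*I) = (1 - lambda)^2 - 4 = 0  =>  lambda = 3 or lambda = -1
eigvals, eigvecs = np.linalg.eig(A)
order = np.argsort(eigvals)[::-1]   # sort descending: 3 first, then -1
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

print(eigvals)   # approximately [3, -1]
# Check the defining property A x = lambda x for each pair
# (NumPy returns the eigenvectors normalized to unit length,
# i.e. scalar multiples of [1, 1] and [1, -1]):
for lam, x in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ x, lam * x)
```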
Spectrum of a graph. The spectrum of a graph is the set of eigenvalues of the adjacency matrix A of the graph. The spectrum is considered to capture important structural properties of a graph (Chung, 1997). Some interesting applications of eigenvalues and eigenvectors: Principal Component Analysis (PCA; Duda et al., 2001), PageRank (Google; Brin and Page, 1998), partitioning (i.e. clustering; Von Luxburg, 2007).
Example of spectral graph clustering (1/8). Consider the matrix A with sound correspondences, and the same data in graph form. [Matrix and graph on the sound-correspondence vertices [a]/[i], [2]/[i], [r]/[x], [k]/[x], [r]/[ö], [r]/[k]]
Example of spectral graph clustering (2/8). To partition this graph, we have to determine the optimal cut. The optimal cut yielding balanced clusters is obtained by finding the eigenvectors of the normalized Laplacian: L_n = D^-1 L, with L = D - A and D the degree matrix of A (Shi and Malik, 2000; Von Luxburg, 2007).
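The Laplacian and its normalized form can be sketched in a few lines (with a hypothetical small adjacency matrix, since the slide's own matrices appear only as figures):

```python
import numpy as np

# Hypothetical adjacency matrix of a small undirected graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))      # degree matrix
L = D - A                       # Laplacian L = D - A
L_n = np.linalg.inv(D) @ L      # normalized Laplacian L_n = D^-1 L

# Every row of L sums to zero, so the constant vector is an
# eigenvector of L_n with eigenvalue 0 (the "uninformative" one).
print(L_n @ np.ones(4))         # approximately the zero vector
```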
Example of spectral graph clustering (3/8). The adjacency matrix A. [Entries over the six sound-correspondence vertices]
Example of spectral graph clustering (4/8). The Laplacian matrix L. [Entries over the six sound-correspondence vertices]
Example of spectral graph clustering (5/8). The normalized Laplacian matrix L_n. [Entries over the six sound-correspondence vertices]
Example of spectral graph clustering (6/8). The eigenvalues λ and eigenvectors x of L_n (i.e. L_n x = λx):
λ1 = 0 with x = [ ... ]^T
λ2 = 0.21 with x = [ ... ]^T
λ3 = 1.17 with x = [ ... ]^T
...
The first (smallest) eigenvector does not yield clustering information. Does the second? Yes!
Example of spectral graph clustering (7/8). If we use the k-means algorithm (i.e. minimizing the within-cluster sum of squares; Lloyd, 1982) to cluster the second eigenvector into two groups, we obtain a two-way partitioning. To cluster into k > 2 groups we use the second up to the k-th (smallest) eigenvectors.
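The whole pipeline so far can be sketched end to end. The graph below is a hypothetical one (two triangles joined by a single edge, not the slides' sound-correspondence graph), and for k = 2 the k-means step is replaced by the common shortcut of splitting the second eigenvector on its sign:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3:
# spectral clustering should recover the two triangles.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))
L_n = np.linalg.inv(D) @ (D - A)     # normalized Laplacian D^-1 (D - A)

eigvals, eigvecs = np.linalg.eig(L_n)
order = np.argsort(eigvals.real)
fiedler = eigvecs[:, order[1]].real  # second-smallest eigenvector

# The slides cluster this vector with k-means; for k = 2 a sign
# split of the same vector gives the same two groups.
clusters = (fiedler > 0).astype(int)
print(clusters)
```

Whether the triangles come out labeled 0 or 1 depends on the arbitrary sign of the eigenvector; only the grouping is meaningful.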
Example of spectral graph clustering (8/8). To cluster into k = 3 groups, we use:
λ2 = 0.21 with x = [ ... ]^T
λ3 = 1.17 with x = [ ... ]^T
We obtain the corresponding three-way clustering.
Bipartite graphs. A bipartite graph is a graph whose vertices can be divided into two disjoint sets such that every edge connects a vertex from one set to a vertex in the other set; vertices within a set are not connected. A matrix representation of a bipartite graph: [matrix with the sound correspondences [a]/[i], [2]/[i], [r]/[x], [k]/[x], [r]/[ö], [r]/[k] on one side and the varieties Appelscha, Oudega, Zoutkamp, Kerkrade, Vaals on the other]
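The bipartite structure shows up directly in the adjacency matrix: the two diagonal blocks (edges within a side) are zero, and an n × m incidence matrix B fills the off-diagonal blocks. A sketch with a hypothetical 0/1 incidence matrix:

```python
import numpy as np

# Hypothetical bipartite incidence: rows = varieties,
# columns = sound correspondences (0/1 entries).
B = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
n, m = B.shape

# Full (n+m) x (n+m) adjacency matrix of the bipartite graph:
# zero blocks on the diagonal (no edges within a side),
# B and B^T connecting the two sides.
A = np.block([[np.zeros((n, n)), B],
              [B.T, np.zeros((m, m))]])

print(A.shape)   # (7, 7)
```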
Example of co-clustering a bipartite graph (1/6). The (naive) co-clustering procedure is the same as clustering in one dimension (i.e. cluster the eigenvector(s) of the normalized Laplacian). Consider the following graph: [bipartite graph of varieties and sound correspondences]
Example of co-clustering a bipartite graph (2/6). The adjacency matrix A. [Entries over the varieties Appelscha, Oudega, Zoutkamp, Kerkrade, Vaals and the sound correspondences [a]/[i], [2]/[i], [r]/[x], [k]/[x], [r]/[ö], [r]/[k]]
Example of co-clustering a bipartite graph (3/6). The Laplacian matrix L. [Same vertex sets]
Example of co-clustering a bipartite graph (4/6). The normalized Laplacian matrix L_n. [Same vertex sets]
Example of co-clustering a bipartite graph (5/6). To cluster into k = 2 groups, we use: λ2 = 0.057 with x = [ ... ]^T. We obtain the corresponding co-clustering.
Example of co-clustering a bipartite graph (6/6). To cluster into k = 3 groups, we use: λ2 = 0.057 with x = [ ... ]^T and λ3 = 0.53 with x = [ ... ]^T. We obtain the corresponding co-clustering. [Scatter plot of all vertices in the plane spanned by the second and third eigenvectors]
A faster method. The previous method is relatively slow due to the use of the large (sparse) adjacency matrix A of size (n + m) × (n + m). The n × m variety-by-correspondence matrix contains the same information, but is denser. The singular value decomposition (SVD) of its normalized form A_n yields equivalent clustering information and is quicker to compute (Dhillon, 2001).
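A minimal sketch of this SVD route, in the style of Dhillon (2001): normalize the n × m matrix by row and column degrees, take the second singular vectors, and cluster rows (varieties) and columns (sound correspondences) in the same space. The data below are toy 0/1 values, and k-means is again replaced by a sign split for k = 2:

```python
import numpy as np

# Toy 0/1 variety-by-correspondence matrix with two blocks
# joined by one extra entry.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

D1 = np.diag(1.0 / np.sqrt(A.sum(axis=1)))   # row (variety) degrees
D2 = np.diag(1.0 / np.sqrt(A.sum(axis=0)))   # column (correspondence) degrees
A_n = D1 @ A @ D2                            # A_n = D1^-1/2 A D2^-1/2

U, s, Vt = np.linalg.svd(A_n)
# The second left/right singular vectors play the role of the second
# Laplacian eigenvector; split rows and columns on their sign
# (the slides use k-means here).
row_cluster = (U[:, 1] > 0).astype(int)      # co-cluster of varieties
col_cluster = (Vt[1, :] > 0).astype(int)     # co-cluster of correspondences
print(row_cluster, col_cluster)
```

As before, only the grouping is meaningful; the 0/1 labels depend on the arbitrary sign of the singular vectors.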
Complete dataset. Alignments of pronunciations of 562 words for 424 varieties in the Netherlands against a reference pronunciation. Pronunciations originate from the GTRP (Goeman and Taeldeman, 1996; Van den Berg, 2003; Wieling et al., 2007). The pronunciations of Delft were used as the reference. Alignments were obtained with the Levenshtein algorithm, using a Pointwise Mutual Information procedure (Wieling et al., 2009). We generated a bipartite graph of varieties v and sound correspondences s; there is an edge between v_i and s_j iff freq(s_j in v_i) > 0. All the following co-clustering results are obtained with the fast SVD method.
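The edge rule can be stated in one line: threshold the frequency matrix at zero to get the 0/1 incidence matrix of the bipartite graph (the counts below are hypothetical, not the GTRP data):

```python
import numpy as np

# Hypothetical counts: freq[i, j] = how often sound correspondence j
# occurs in the alignments of variety i.
freq = np.array([[12, 0, 3],
                 [ 0, 7, 1],
                 [ 5, 0, 0]])

# Edge between v_i and s_j iff freq(s_j in v_i) > 0.
B = (freq > 0).astype(int)
print(B)
```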
Distribution of sites. [Map of the 424 varieties]
Results: {2,3,4} co-clusters in the data. [Figures]
Results: {2,3,4} clusters of varieties. [Maps]
Results: {2,3,4} clusters of sound correspondences.
Sound correspondences specific to the Frisian area:
  Reference: [2] [2] [a] [o] [u] [x] [x] [r]
  Frisian:   [I] [i] [i] [E] [E] [j] [z] [x]
Sound correspondences specific to the Limburg area:
  Reference: [r] [r] [k] [n] [n] [w]
  Limburg:   [ö] [K] [x] [ö] [K] [f]
Sound correspondences specific to the Low Saxon area:
  Reference: [-] [a]
  Low Saxon: [m] [N] [ð] [P] [e]
To conclude. Bipartite spectral graph partitioning is a very useful method to simultaneously cluster: varieties and sound correspondences; words and documents; genes and conditions; and more. Do you now understand the title? Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences.
Any questions? Thank you!
References
Van den Berg, Boudewijn (2003). Phonology & Morphology of Dutch & Frisian Dialects in 1.1 Million Transcriptions. Goeman-Taeldeman-Van Reenen project. Meertens Instituut Electronic Publications in Linguistics 3. Meertens Instituut (CD-ROM), Amsterdam.
Brin, Sergey and Lawrence Page (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7).
Chung, Fan (1997). Spectral Graph Theory. American Mathematical Society.
Dhillon, Inderjit (2001). Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York.
Duda, Richard, Peter Hart, and David Stork (2001). Pattern Classification. Wiley, New York.
Goeman, Ton and Johan Taeldeman (1996). Fonologie en morfologie van de Nederlandse dialecten. Een nieuwe materiaalverzameling en twee nieuwe atlasprojecten. Taal en Tongval, 48.
Heeringa, Wilbert (2004). Measuring Dialect Pronunciation Differences using Levenshtein Distance. Ph.D. thesis, Rijksuniversiteit Groningen.
Lloyd, Stuart (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2).
Von Luxburg, Ulrike (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4).
Shi, Jianbo and Jitendra Malik (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8).
Wieling, Martijn, Wilbert Heeringa, and John Nerbonne (2007). An aggregate analysis of pronunciation in the Goeman-Taeldeman-Van Reenen-Project data. Taal en Tongval, 59(1).
Wieling, Martijn, Jelena Prokić, and John Nerbonne (2009). Evaluating the pairwise string alignment of pronunciations. In: Lars Borin and Piroska Lendvai (eds.), Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH-SHELT&R 2009), Workshop at the 12th Meeting of the European Chapter of the Association for Computational Linguistics. Athens, 30 March 2009.
More informationGraph Clustering Algorithms
PhD Course on Graph Mining Algorithms, Università di Pisa February, 2018 Clustering: Intuition to Formalization Task Partition a graph into natural groups so that the nodes in the same cluster are more
More information1 Matrix notation and preliminaries from spectral graph theory
Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.
More informationk-means Approach to the Karhunen-Loéve Transform
Jagiellonian University Faculty of Mathematics and Computer Science k-means Approach to the Karhunen-Loéve Transform IM UJ preprint 22/ K. Misztal, P. Spurek, J. Tabor k-means Approach to the Karhunen-Loéve
More informationMore Linear Algebra. Edps/Soc 584, Psych 594. Carolyn J. Anderson
More Linear Algebra Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois
More informationData Mining and Matrices
Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages
More informationPROBABILISTIC LATENT SEMANTIC ANALYSIS
PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications
More informationSpace-Variant Computer Vision: A Graph Theoretic Approach
p.1/65 Space-Variant Computer Vision: A Graph Theoretic Approach Leo Grady Cognitive and Neural Systems Boston University p.2/65 Outline of talk Space-variant vision - Why and how of graph theory Anisotropic
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationEE16B Designing Information Devices and Systems II
EE6B Designing Information Devices and Systems II Lecture 9B Geometry of SVD, PCA Uniqueness of the SVD Find SVD of A 0 A 0 AA T 0 ) ) 0 0 ~u ~u 0 ~u ~u ~u ~u Uniqueness of the SVD Find SVD of A 0 A 0
More informationEVD & SVD EVD & SVD. Durgesh Kumar. June 10, 2016
EVD & SVD Durgesh Kumar June 10, 2016 Table of contents 1 Why bother so much for SVD and EVD? 2 Pre-requisite Fundamental of Matrix Multiplication Eigen Value and Eigen Vector 3 EVD Defintion of EVD Use
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve
More informationGraph Metrics and Dimension Reduction
Graph Metrics and Dimension Reduction Minh Tang 1 Michael Trosset 2 1 Applied Mathematics and Statistics The Johns Hopkins University 2 Department of Statistics Indiana University, Bloomington November
More informationEcon Slides from Lecture 7
Econ 205 Sobel Econ 205 - Slides from Lecture 7 Joel Sobel August 31, 2010 Linear Algebra: Main Theory A linear combination of a collection of vectors {x 1,..., x k } is a vector of the form k λ ix i for
More informationDimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas
Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx
More informationTwo-Dimensional Linear Discriminant Analysis
Two-Dimensional Linear Discriminant Analysis Jieping Ye Department of CSE University of Minnesota jieping@cs.umn.edu Ravi Janardan Department of CSE University of Minnesota janardan@cs.umn.edu Qi Li Department
More informationCertifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering
Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint
More informationMatrix Approximations
Matrix pproximations Sanjiv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 00 Sanjiv Kumar 9/4/00 EECS6898 Large Scale Machine Learning Latent Semantic Indexing (LSI) Given n documents
More informationSingular Value Decomposition and Principal Component Analysis (PCA) I
Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression
More informationIntrinsic Structure Study on Whale Vocalizations
1 2015 DCLDE Conference Intrinsic Structure Study on Whale Vocalizations Yin Xian 1, Xiaobai Sun 2, Yuan Zhang 3, Wenjing Liao 3 Doug Nowacek 1,4, Loren Nolte 1, Robert Calderbank 1,2,3 1 Department of
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More informationParallel Singular Value Decomposition. Jiaxing Tan
Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector
More informationCOMPSCI 514: Algorithms for Data Science
COMPSCI 514: Algorithms for Data Science Arya Mazumdar University of Massachusetts at Amherst Fall 2018 Lecture 8 Spectral Clustering Spectral clustering Curse of dimensionality Dimensionality Reduction
More informationEUSIPCO
EUSIPCO 2013 1569741067 CLUSERING BY NON-NEGAIVE MARIX FACORIZAION WIH INDEPENDEN PRINCIPAL COMPONEN INIIALIZAION Liyun Gong 1, Asoke K. Nandi 2,3 1 Department of Electrical Engineering and Electronics,
More information