Spectral Clustering using Multilinear SVD


Spectral Clustering using Multilinear SVD: Analysis, Approximations and Applications. Debarghya Ghoshdastidar, Ambedkar Dukkipati. Dept. of Computer Science & Automation, Indian Institute of Science.

Clustering: The Elegant Way
Input: data points.
Step 1: construct a graph.
Step 2: find the best cut.

Clustering: The Elegant And Simple Way
Construct graph: compute the normalized affinity matrix.
Get the best cut: find the leading eigenvectors and run k-means on their rows.
(A minimal code sketch follows below.)
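A minimal sketch of this pipeline in Python, assuming a Gaussian affinity on point data; the bandwidth sigma, the symmetric normalization, and the helper names are illustrative choices, not the speakers' exact recipe:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Affinity matrix from pairwise distances (Gaussian kernel).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))
    np.fill_diagonal(W, 0)
    # Symmetric normalization: D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Leading k eigenvectors (eigh returns ascending eigenvalues).
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]
    # Normalize rows, then run k-means on them.
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```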

Spectral Clustering
The Good:
- Well-defined formulation based on graph partitioning: minimize the normalized cut / maximize the normalized associativity [Shi & Malik '00]
- Solved by matrix eigendecomposition, with guarantees from perturbation theory [Ng, Jordan & Weiss '02]
- Scales via matrix sampling techniques [Fowlkes et al. '04]
The Bad:
- The best cut need not be the correct cut
- Cannot use higher-order relations (e.g., when the clusters are circles)

Higher-order Clustering
And The Solution: use multi-way relations
- One needs m (≥ 4) points to decide whether points lie on a circle
- Construct an m-uniform hypergraph instead of a graph: each edge connects m nodes
- Relations are encoded in an m-way tensor instead of a matrix
And The Algorithms:
- Approximate the tensor by a matrix [Govindu '05]
- Reduce the hypergraph to a graph [Agarwal et al. '05]
- Decompose a joint probability tensor [Shashua, Zass & Hazan '06]
- Construct an evolutionary game [Rota Bulo & Pelillo '13]
- Use other optimization criteria [Liu et al. '10; Ochs & Brox '11]
- ... and many more

Matrix → Tensor / Graph → Hypergraph
And Finally... The Ugly:
- NO notion of cut / associativity in terms of the affinity tensor
- NO motivation for using eigenvectors
- NO idea about the best way to sample
Our contribution: bridge the gap by
- defining the squared associativity of a hypergraph,
- using the multilinear singular value decomposition, and
- generalizing matrix sampling methods to tensors.

Squared Associativity
An m-uniform hypergraph (V, E, w):
- set of vertices V = {1, 2, ..., n}
- set of edges E, where each edge e = {i_1, ..., i_m} has weight w(e)
- m-way affinity tensor
  A_{i_1 i_2 ... i_m} = w(e) if e = {i_1, i_2, ..., i_m} ∈ E, and 0 otherwise

Squared associativity of a partition (code sketch below):
- For C ⊆ V, Assoc(C) = Σ_{i_1, ..., i_m ∈ C} A_{i_1 i_2 ... i_m}
- For a partition C_1, C_2, ..., C_k of V,
  SqAssoc(C_1, C_2, ..., C_k) = Σ_{j=1}^{k} Assoc(C_j)^2 / |C_j|^m
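The definitions translate directly into code; a brute-force sketch for a small dense affinity tensor A (feasible only for small n and m, since A has n^m entries):

```python
import itertools
import numpy as np

def assoc(A, C):
    # Assoc(C): sum of tensor entries whose indices all lie in C.
    m = A.ndim
    return sum(A[idx] for idx in itertools.product(C, repeat=m))

def sq_assoc(A, partition):
    # SqAssoc = sum_j Assoc(C_j)^2 / |C_j|^m, for a list of clusters.
    m = A.ndim
    return sum(assoc(A, C) ** 2 / len(C) ** m for C in partition)
```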

Maximize SqAssoc: A Multilinear SVD Problem
Our objective: find k non-overlapping cluster-assignment vectors that maximize SqAssoc.
Result: a relaxation of the above objective is equivalent to finding the k leading left singular vectors of the flattened affinity tensor (sketch below).
- Result based on the multilinear SVD of tensors [De Lathauwer, De Moor & Vandewalle '00; Chen & Saad '09]
- A similar approach was also used in [Govindu '05]
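One standard way to see the connection, sketched in matrix form (this is the usual orthonormal relaxation argument, not necessarily the paper's exact derivation): collect the k normalized assignment vectors as the columns of X and relax X to any matrix with orthonormal columns, which gives

```latex
\max_{X \in \mathbb{R}^{n \times k},\; X^\top X = I_k} \;
\bigl\lVert X^\top A_{(1)} \bigr\rVert_F^2
```

By the Ky Fan argument, the maximum equals the sum of the k largest squared singular values of the mode-1 flattening A_(1) and is attained when the columns of X are its k leading left singular vectors.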

Higher-order Clustering: The Elegant And Simple Way
m-uniform hypergraph → compute the m-way affinity tensor → flatten the tensor (mode-1 matricization, an n × n^(m−1) matrix) → find the leading left singular vectors → run k-means on the rows. (A code sketch follows below.)
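A minimal end-to-end sketch, with a user-supplied m-way similarity function `affinity` (hypothetical) and the dense tensor built explicitly; this is illustrative only, since the tensor has n^m entries:

```python
import itertools
import numpy as np
from sklearn.cluster import KMeans

def multilinear_svd_clustering(affinity, n, m, k):
    # Dense m-way affinity tensor (illustrative only: O(n^m) entries).
    A = np.zeros((n,) * m)
    for e in itertools.combinations(range(n), m):
        w = affinity(e)                # user-supplied m-way similarity
        for p in itertools.permutations(e):
            A[p] = w                   # symmetric tensor
    # Mode-1 flattening: n x n^(m-1) matrix.
    A1 = A.reshape(n, -1)
    # k leading left singular vectors of the flattening.
    U, _, _ = np.linalg.svd(A1, full_matrices=False)
    # k-means on the rows of the embedding.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U[:, :k])
```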

Perturbation Result
The ideal case: the clusters C_1, ..., C_k are known a priori, and the affinity tensor is
  A_{i_1 i_2 ... i_m} = 1 if i_1, i_2, ..., i_m ∈ C_j for some j, and 0 otherwise.
Result: if n exceeds some threshold, each cluster is not too small, and the perturbed tensor Â satisfies
  ||Â − A||_2 = O(k^m n^(m−α)) for some α > 0,
then the number of misclustered nodes is O(k n^(2α)).
This bound improves upon [Chen & Lerman '09].

Higher-order Clustering: Computation Too High, Approximations Needed
The same pipeline (m-uniform hypergraph → affinity tensor → flattening → leading left singular vectors → k-means on rows) is impractical as stated: the m-way affinity tensor has n^m entries, so computing, storing, and factorizing it is prohibitively expensive.

Random Sampling
- The flattened tensor is a long sparse matrix: sample entries, then take the SVD
- Random sampling has been used for tensor completion [Jain & Oh '14]
- It performs poorly when used for clustering
We focus on:
- Column sampling [Drineas, Kannan & Mahoney '06]
- Nyström approximation [Fowlkes et al. '04]
and generalize these sampling schemes to tensors.

Column Sampling
- Sample columns of the flattened tensor, then take the SVD
- Equivalent view: a column of the flattening is a fiber of the tensor, so sampling one column amounts to fixing (m − 1) data points
- Well-studied approach [Drineas, Kannan & Mahoney '06]: sample column a_i with probability p_i
- Uniform sampling is not optimal; the optimal sampling is p_i ∝ ||a_i||_2^2, which is costly to compute
Improved sampling (sketch below):
- Observation: sampling one column ⇔ fixing (m − 1) data points
- Run an initial clustering (k-means / k-subspaces)
- Each time, choose the (m − 1) fixed points from a single cluster
- (optional) Reject a column a if ||a||_2 is too small
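A sketch of the improved column sampling without materializing the flattened tensor; the m-way similarity `affinity` and the choice of k-means for the initial clustering are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_columns(X, affinity, m, k, n_cols, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Initial clustering guides which (m-1)-tuples index the sampled columns.
    init_labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    cols = np.empty((n, n_cols))
    for t in range(n_cols):
        # Fix (m-1) points drawn from a single initial cluster.
        c = rng.integers(k)
        members = np.flatnonzero(init_labels == c)
        fixed = rng.choice(members, size=m - 1, replace=False)
        # The corresponding column of the flattening: vary the first index.
        cols[:, t] = [affinity((i, *fixed)) for i in range(n)]
    return cols

# Next step: left singular vectors of the sampled columns, k-means on rows.
```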

Nyström Approximation
Matrix case (sketch below):
- Compute the eigenvectors of a sub-matrix and extend them to the remaining rows
- The extension minimizes the reconstruction error of some of the entries
Generalization to tensors:
- Sample a sub-tensor and take the SVD of its flattening
- Extend the singular vectors by minimizing the reconstruction error over further sampled entries
- Choose the initial sub-tensor wisely (run an initial clustering)
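For reference, a sketch of the matrix-case Nyström extension (assuming the sampled block is positive semidefinite); the tensor generalization replaces this eigendecomposition with the SVD of a flattened sub-tensor:

```python
import numpy as np

def nystrom_extension(W_ss, W_rs, k):
    # W_ss: (s x s) affinities among sampled points;
    # W_rs: (r x s) affinities between remaining and sampled points.
    vals, vecs = np.linalg.eigh(W_ss)
    vals, vecs = vals[-k:], vecs[:, -k:]      # leading k eigenpairs
    # Extend eigenvectors to the remaining points:
    # u_i(x) = (1/lambda_i) * sum_j W(x, x_j) v_i(j).
    U_r = W_rs @ vecs / np.maximum(vals, 1e-12)  # guard tiny eigenvalues
    return np.vstack([vecs, U_r])             # embedding for all points
```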

Numerical Results
Line clustering: plots compare column sampling (black) against the Nyström method (red); the time taken is comparable.

Motion segmentation: mean % error on the Hopkins 155 dataset.

Method                          2-motion   3-motion   All
LSA                             4.23       7.02       4.86
SCC                             2.89       8.25       4.10
LRR                             4.10       9.89       5.41
LRR-H                           2.13       4.03       2.56
LRSC                            3.69       7.69       4.59
SSC                             1.52       4.40       2.18
SGC                             1.03       5.53       2.05
Multilinear SVD with column sampling:
  Uniform sampling              1.83       9.31       3.52
  With initial k-means          1.05       5.72       2.11

Observations:
- Column sampling performs better than the Nyström approximation
- There is a significant improvement when an initial clustering is used

This work was supported under the Google Ph.D. Fellowship Program.