Scalable Subspace Clustering
René Vidal
Center for Imaging Science, Laboratory for Computational Sensing and Robotics, Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University
High-Dimensional Data
In many areas, we deal with high-dimensional data:
- Computer vision
- Medical imaging
- Medical robotics
- Signal processing
- Bioinformatics
Low-Dimensional Manifolds
High-dimensional data often lie near low-dimensional manifolds. Example applications:
- Face clustering and classification
- Lossy image representation
- Motion segmentation
- Dynamic texture (DT) segmentation
- Video segmentation
[Figure: example datasets, with two subspaces S1 and S2 illustrated]
Subspace Clustering Problem
Given a set of points lying in multiple subspaces, identify:
- The number of subspaces and their dimensions
- A basis for each subspace
- The segmentation of the data points
Challenges:
- Model selection
- Nonconvex
- Combinatorial
More challenges:
- Noise
- Outliers
- Missing entries
Subspace Clustering Problem: Challenges
Even more challenges:
- Angles between subspaces can be small
- Nearby points can lie in different subspaces
[Figure: two subspaces S1, S2 meeting at a small angle; histograms for Hopkins 155 and Extended YaleB showing (left) percentage of subspace pairs vs. subspace angle (degrees) and (right) percentage of data points vs. number of nearest neighbors]
Prior Work: Sparse and Low-Rank Methods
Approach:
- Data are self-expressive
- Global affinity by convex optimization
Representative methods:
- Sparse Subspace Clustering (SSC) (Elhamifar-Vidal '09, '10, '13; Candès-Soltanolkotabi '12, '13; Wang-Xu '13)
- Low-Rank Subspace Clustering (LRR and LRSC) (Costeira-Kanade '98; Kanatani '01; Vidal '08; Liu et al. '10, '13; Wei-Lin '10; Favaro-Vidal '11, '13)
- Least Squares Regression (LSR) (Lu '12)
- Sparse + Low-Rank (Luo '11; Wang '13)
- Sparse + Frobenius: Elastic Net (EnSC) (Dyer '13; You '16)
Prior Work: Sparse and Low-Rank Methods
General formulation:
$$\min_{C,E}\; f(C) + \lambda\, g(E) \quad \text{s.t.} \quad X = XC + E$$

  Method                                    f                    g
  Sparse Subspace Clustering (SSC)          $\ell_1$             $\ell_1$, $\ell_2^2$
  Least Squares Regression (LSR)            $\ell_2^2$           $\ell_2^2$
  Elastic Net Subspace Clustering (EnSC)    $\ell_1 + \ell_2^2$  $\ell_2^2$
  Low Rank Representation (LRR)             nuclear              $\ell_{2,1}$
  Low Rank Subspace Clustering (LRSC)       nuclear              $\ell_1$, $\ell_2^2$

Advantages:
- Convex optimization
- Broad theoretical results
- Robust to noise/corruptions
Disadvantages / open problems:
- Low-dimensional subspaces
- Missing entries
- Scalability: can handle only ~10,000 data points
Talk Outline
- Sparse Subspace Clustering by Basis Pursuit (SSC-BP)
  - Theoretical guarantees for noiseless, noisy, and corrupted data
  - Great performance for applications with ~1,000 data points
- Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit (SSC-OMP)
  - Theoretical guarantees for noiseless data
  - Scalable to 600,000 data points
- Scalable Elastic Net Subspace Clustering (EnSC)
  - Theoretical guarantees for noiseless data
  - New active-set algorithm that is scalable to 600,000 data points

References:
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering. CVPR 2009.
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory and Applications. TPAMI 2013.
- C. You, D. Robinson, R. Vidal. Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit. CVPR 2016.
- C. You, C. Li, D. Robinson, R. Vidal. Scalable Elastic Net Subspace Clustering. CVPR 2016.
Sparse Subspace Clustering by Basis Pursuit
Ehsan Elhamifar and René Vidal
Computer Science, Northeastern University
Center for Imaging Science, Johns Hopkins University
Sparse Subspace Clustering: Spectral Clustering
Spectral clustering:
- Represent data points as nodes in a graph G
- Connect nodes i and j with weight $c_{ij}$
- Infer clusters from the Laplacian of G
How to define a subspace-preserving affinity matrix C?
- $c_{ij} \neq 0$ for points in the same subspace
- $c_{ij} = 0$ for points in different subspaces
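To make the graph-based pipeline above concrete, here is a minimal sketch (not from the slides) of the final step shared by all the methods in this talk: given self-expressive coefficients C with zero diagonal, symmetrize |C| into an affinity and apply off-the-shelf spectral clustering. The use of scikit-learn is a choice of this sketch, not part of the original method description.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def clusters_from_coefficients(C, n_clusters):
    """Turn a self-expressive coefficient matrix C (zero diagonal) into
    cluster labels: symmetrize |C| into an affinity and run spectral
    clustering on the resulting graph Laplacian."""
    W = np.abs(C) + np.abs(C).T          # symmetric, nonnegative affinity
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return sc.fit_predict(W)
```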
Sparse Subspace Clustering: Intuition
Data in a union of subspaces (UoS) are self-expressive:
$$x_j = \sum_{i \neq j} c_{ij}\, x_i \;\Longrightarrow\; x_j = X c_j \;\Longrightarrow\; X = XC$$
Data in a UoS admit a subspace-preserving representation:
$$c_{ij} \neq 0 \;\Longrightarrow\; x_i \text{ and } x_j \text{ belong to the same subspace}$$
Under what conditions is the solution to P0 subspace-preserving?
$$P_0:\quad \min_{c_j} \|c_j\|_0 \;\;\text{s.t.}\;\; x_j = X c_j,\; c_{jj} = 0$$
[Figure: three subspaces S1, S2, S3 and the block structure of X]
References:
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering. CVPR 2009.
- E. Elhamifar and R. Vidal. Clustering Disjoint Subspaces via Sparse Representation. ICASSP 2010.
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory and Applications. TPAMI 2013.
Sparse Subspace Clustering: Noiseless Data
Under what conditions on the subspaces and the data is the solution to P1 subspace-preserving?
Point by point:
$$\min_{c_j} \|c_j\|_1 \;\;\text{s.t.}\;\; x_j = X c_j,\; c_{jj} = 0$$
All points:
$$\min_{C} \|C\|_1 \;\;\text{s.t.}\;\; X = XC,\; \mathrm{diag}(C) = 0$$
Theorem 1: P1 gives a subspace-preserving representation if the subspaces are independent, i.e.,
$$\dim\Big(\bigoplus_{i=1}^{n} S_i\Big) = \sum_{i=1}^{n} \dim(S_i)$$
[Figure: two independent subspaces S1 and S2]
References:
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering. CVPR 2009.
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory and Applications. TPAMI 2013.
Sparse Subspace Clustering: Noiseless Data
Independence may be too restrictive: e.g., articulated motions.
Theorem 2: P1 gives a subspace-preserving representation if the subspaces are sufficiently separated and the data are well distributed inside the subspaces, i.e., if for all i = 1, ..., n,
$$\mu_i = \max_{j \neq i} \cos(\theta_{ij}) < r_i$$
where $\mu_i$ is the incoherence and $r_i$ the inradius.
Theorem 3: For n d-dimensional subspaces drawn independently and uniformly at random, with $\rho d + 1$ points per subspace drawn independently and uniformly at random, P1 gives a subspace-preserving representation with high probability if the subspace dimension d is small relative to the ambient dimension D:
$$d < \frac{c^2(\rho)\,\log \rho}{12\,\log N}\; D$$
References:
- E. Elhamifar and R. Vidal. Clustering Disjoint Subspaces via Sparse Representation. ICASSP 2010.
- E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory and Applications. TPAMI 2013.
- M. Soltanolkotabi and E. Candès. A geometric analysis of subspace clustering with outliers. Annals of Statistics, 40(4):2195-2238, 2012.
Sparse Subspace Clustering: Noisy Data
Under what conditions on the subspaces and the data is the solution to LASSO subspace-preserving?
Noiseless (P1):
$$\min_{C} \|C\|_1 \;\;\text{s.t.}\;\; X = XC,\; \mathrm{diag}(C) = 0$$
Noisy (LASSO):
$$\min_{C} \|C\|_1 + \frac{\lambda}{2}\|X - XC\|_F^2 \;\;\text{s.t.}\;\; \mathrm{diag}(C) = 0$$
Theorem 4: LASSO gives a subspace-preserving representation if the subspaces are sufficiently separated, the data are well distributed, the noise is small enough, and the LASSO parameter is well chosen, i.e.,
$$\mu_i < r_i, \qquad \delta < \frac{r_i (r_i - \mu_i)}{3 r_i^2 + 8 r_i + 2}, \qquad \lambda_{\min} < \lambda < \lambda_{\max}$$
where $\delta$ is the noise level (see the paper for the exact range of $\lambda$).
Reference: Y.-X. Wang and H. Xu. Noisy sparse subspace clustering. ICML 2013.
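For illustration only, the per-point LASSO above can be prototyped with scikit-learn's Lasso solver. A caveat of this sketch: sklearn scales the quadratic term by 1/(2D), so its alpha corresponds to 1/(λD) in the slide's notation, and alpha=0.01 below is a placeholder rather than a tuned value from the papers.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_coefficients(X, alpha=0.01):
    """Columns of X (D x N) are data points. Regress each point on all
    the others to obtain self-expressive coefficients with c_jj = 0."""
    D, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        idx = np.delete(np.arange(N), j)           # exclude x_j itself
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(X[:, idx], X[:, j])
        C[idx, j] = model.coef_
    return C
```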
Experiments on Face Clustering
- Faces under varying illumination lie near a 9D subspace
- Extended Yale B dataset: 38 subjects, 64 images per subject, D = 2,016-dimensional data
- Clustering error: SSC < 2.0% for 2 subjects, SSC < 11.0% for 10 subjects
[Figure: clustering error (%) vs. number of subjects (2 to 10) for SSC, LRSC, LRR, LRR-H, SCC, and LSA]
Reference: E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory, and Applications. TPAMI 2013.
Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit
Chong You, Daniel Robinson, and René Vidal
Center for Imaging Science, Johns Hopkins University
Applied Mathematics and Statistics, Johns Hopkins University
Prior Work: Overview
[1] E. Elhamifar and R. Vidal. Sparse Subspace Clustering. CVPR 2009.
[2] G. Liu, Z. Lin, Y. Yu. Robust Subspace Segmentation by Low-Rank Representation. ICML 2010.
[3] C. Lu et al. Robust and Efficient Subspace Segmentation via Least Squares Regression. ECCV 2012.
[4] X. Chen and D. Cai. Large Scale Spectral Clustering with Landmark-based Representation. AAAI 2011.
[5] X. Peng, L. Zhang, Z. Yi. Scalable Sparse Subspace Clustering. CVPR 2013.
[6] A. Adler, M. Elad, Y. Hel-Or. Linear-Time Subspace Clustering via Bipartite Graph Modeling. TNNLS 2015.
[Figure: methods plotted by scale (1K to 1M data points) vs. generality (independent to arbitrary subspaces); prior methods handle either arbitrary subspaces at small scale or large scale under restrictive assumptions, while this work targets arbitrary subspaces at ~1M data points]
Sparse Subspace Clustering (SSC)
[1] E. Elhamifar and R. Vidal. Sparse Subspace Clustering. CVPR 2009.
[2] E. Dyer et al. Greedy Feature Selection for Subspace Clustering. JMLR 2014.
SSC by Orthogonal Matching Pursuit (SSC-OMP)
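SSC-OMP replaces the ℓ1 program with greedy orthogonal matching pursuit: each point is approximated by at most k other points, which costs only matrix-vector products. The following is a minimal sketch of that idea, not the authors' code; the sparsity level k_max, the unit-normalization step, and the use of scikit-learn's OMP solver are all assumptions of this sketch. The affinity and spectral clustering steps are then identical to SSC-BP (see the earlier sketch).

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.preprocessing import normalize

def omp_coefficients(X, k_max=10):
    """Columns of X (D x N) are data points, normalized to unit norm.
    OMP greedily selects at most k_max other points to represent each x_j."""
    X = normalize(X, axis=0)                       # unit-norm columns
    D, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        idx = np.delete(np.arange(N), j)           # enforce c_jj = 0
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k_max,
                                        fit_intercept=False)
        omp.fit(X[:, idx], X[:, j])
        C[idx, j] = omp.coef_
    return C
```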
Guaranteed Correct Connections: Deterministic Model
[3] M. Soltanolkotabi and E. Candès. A geometric analysis of subspace clustering with outliers. Annals of Statistics, 40(4):2195-2238, 2012.
Guaranteed Correct Connections: Deterministic Model
SSC-OMP gives correct connections if the subspaces are sufficiently separated and the data are well distributed within each subspace (the precise condition, omitted here, parallels the incoherence-vs.-inradius condition for SSC-BP).
Guaranteed Correct Connections: Random Model
Under the random model of [3], SSC-OMP gives correct connections with high probability when the subspace dimension is small relative to the ambient dimension.
[3] M. Soltanolkotabi and E. Candès. A geometric analysis of subspace clustering with outliers. Annals of Statistics, 40(4):2195-2238, 2012.
Synthetic Experiments
Experiment on Extended Yale B
Experiment on MNIST
Conclusion
Scalable Elastic Net Subspace Clustering
Chong You, Chun-Guang Li*, Daniel Robinson, and René Vidal
Center for Imaging Science, Johns Hopkins University
*SICE, Beijing University of Posts and Telecommunications
Applied Mathematics and Statistics, Johns Hopkins University
Motivation
Elastic Net Subspace Clustering (EnSC)
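As a reference point, EnSC combines an ℓ1 and a squared ℓ2 penalty on the coefficients, $\min_c \lambda\|c\|_1 + \frac{1-\lambda}{2}\|c\|_2^2 + \frac{\gamma}{2}\|x - Xc\|_2^2$, matching the f = ℓ1 + ℓ2², g = ℓ2² entry in the earlier table. A prototype with scikit-learn's ElasticNet is sketched below; the mapping (alpha, l1_ratio) = (1/(γD), λ) is derived from sklearn's 1/(2D)-scaled least-squares term and is an assumption of this sketch, as are the example values of λ and γ.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def ensc_coefficients(X, lam=0.9, gamma=50.0):
    """Columns of X (D x N) are data points. Solve the elastic-net
    self-expression problem for each point, with c_jj = 0 enforced
    by excluding x_j from the dictionary."""
    D, N = X.shape
    C = np.zeros((N, N))
    # map the (lam, gamma) objective onto sklearn's parameterization
    model = ElasticNet(alpha=1.0 / (gamma * D), l1_ratio=lam,
                       fit_intercept=False, max_iter=5000)
    for j in range(N):
        idx = np.delete(np.arange(N), j)
        model.fit(X[:, idx], X[:, j])
        C[idx, j] = model.coef_
    return C
```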
Scalable Elastic Net Subspace Clustering
Prior methods:
- ADMM
- Interior point
- Solution path
- Proximal gradient method
- etc.
Geometry of the Elastic Net Solution
Correct Connections vs. Connectivity
Guaranteed Correct Connections
Oracle Guided Active Set (ORGEN) Algorithm
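Below is a schematic sketch of the oracle-guided active-set idea as presented in the talk: solve the elastic net restricted to a small active set, compute an "oracle point" δ = γ(x − Xc) from the residual, and either certify the optimality (KKT) conditions on all inactive columns or add the violators and repeat. All names, parameter values, and the correlation-based initialization are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def orgen_column(X, j, lam=0.9, gamma=50.0, init_size=100, max_rounds=10):
    """Active-set elastic-net solve for one column x_j (schematic).
    Only a small subproblem is ever solved, so each round is cheap."""
    D, N = X.shape
    x = X[:, j]
    others = np.delete(np.arange(N), j)
    # initialize with the columns most correlated with x_j (a heuristic)
    corr = np.abs(X[:, others].T @ x)
    active = others[np.argsort(-corr)[:init_size]]
    enet = ElasticNet(alpha=1.0 / (gamma * D), l1_ratio=lam,
                      fit_intercept=False, max_iter=5000)
    c = np.zeros(N)
    for _ in range(max_rounds):
        enet.fit(X[:, active], x)                 # small restricted problem
        c[:] = 0.0
        c[active] = enet.coef_
        delta = gamma * (x - X[:, active] @ enet.coef_)   # "oracle point"
        outside = np.setdiff1d(others, active)
        # KKT: every inactive column must satisfy |x_i' delta| <= lam
        viol = outside[np.abs(X[:, outside].T @ delta) > lam]
        if viol.size == 0:
            break     # the restricted solution solves the full problem
        active = np.union1d(active[np.abs(c[active]) > 0], viol)
    return c
```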
Experiments

  database   # data    ambient dim.   # clusters
  Coil-100   7,200     1,024          100
  PIE        11,554    1,024          68
  MNIST      70,000    500            10
  CovType    581,012   54             7
Experiments: clustering accuracy

  database   # data    SSC-BP   SSC-OMP   EnSC
  Coil-100   7,200     57.10%   42.93%    69.24%
  PIE        11,554    41.94%   24.06%    52.98%
  MNIST      70,000    -        93.07%    93.79%
  CovType    581,012   -        48.76%    53.52%

(A dash indicates the method was not run at that scale.)
Experiments: running time

  database   # data    SSC-BP     SSC-OMP   EnSC
  Coil-100   7,200     127 min    3 min     3 min
  PIE        11,554    412 min    5 min     13 min
  MNIST      70,000    -          6 min     28 min
  CovType    581,012   -          783 min   1452 min
Conclusion
Conclusions
Many problems in computer vision can be posed as subspace clustering problems:
- Spatial and temporal video segmentation
- Face clustering under varying illumination
These problems can be solved using:
- Sparse Subspace Clustering by Basis Pursuit (SSC-BP)
- Sparse Subspace Clustering by Orthogonal Matching Pursuit (SSC-OMP)
- Elastic Net Subspace Clustering (EnSC)
These algorithms are provably correct when:
- Subspaces are sufficiently separated
- Data are well distributed within each subspace
- The subspace dimension is small relative to the ambient dimension
SSC-OMP and EnSC are scalable to ~1M data points.
Acknowledgements
Vision Lab @ Johns Hopkins University
http://www.vision.jhu.edu
Thank You!