Nonlinear Eigenproblems in Data Analysis
Nonlinear Eigenproblems in Data Analysis
Matthias Hein
Joint work with: Thomas Bühler, Leonardo Jost, Shyam Rangapuram, Simon Setzer
Department of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany
Workshop on Unsupervised Learning and High-dimensional Statistics, École Normale Supérieure, Paris
Hein (Saarland University), Nonlinear Eigenproblems in Data Analysis
From Linear to Nonlinear Eigenproblems

Linear eigenproblem (LEP): $Ax = \lambda Bx$ holds if and only if $x$ is a critical point of $F(x) = \frac{\langle x, Ax\rangle}{\langle x, Bx\rangle}$.

Generalization: $0 \in \partial R(x) - \lambda\,\partial S(x)$ holds if and only if $x$ is a critical point of $F(x) = \frac{R(x)}{S(x)}$. This is the nonlinear eigenproblem (NLEP).

Why do we need nonlinear eigenproblems? The freedom in the choice of $R$ and $S$ in NLEPs allows us to address problems that were not accessible with LEPs.
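As a minimal sanity check of the LEP side of this equivalence, the sketch below verifies numerically that an eigenvector of $Ax = \lambda Bx$ is a critical point of the Rayleigh quotient $F(x) = \langle x, Ax\rangle / \langle x, Bx\rangle$. The matrices are illustrative choices, not from the slides.

```python
import numpy as np

# Small symmetric generalized eigenproblem A x = lambda B x (here B = I).
# Eigenvectors are exactly the critical points of the Rayleigh quotient
# F(x) = <x, Ax> / <x, Bx>, so the gradient of F must vanish at them.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.eye(2)

eigvals, eigvecs = np.linalg.eigh(A)
x = eigvecs[:, 0]                      # eigenvector of the smallest eigenvalue

# gradient of F at x: (2 / <x,Bx>) * (A x - F(x) B x)
Fx = (x @ A @ x) / (x @ B @ x)
grad = 2.0 / (x @ B @ x) * (A @ x - Fx * B @ x)
print(np.allclose(grad, 0.0))  # True: the eigenvector is a critical point
```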
Benefits of Nonlinear Eigenproblems in Data Analysis

                                      Linear EP    Nonlinear EP
Modeling power                        low          high
Relaxation of combinatorial problems  loose        tight
Robust PCA

Sources of outliers: noisy data, adversarial manipulation.

              Type of eigenproblem   Ratio                                                                              Robustness
PCA           linear                 $\frac{\sum_{i=1}^n \langle w,\, X_i - \frac{1}{n}\sum_{j=1}^n X_j\rangle^2}{\|w\|_2^2}$   no
Robust PCA    nonlinear              $\frac{R(w)}{\|w\|_2}$                                                                     yes
Robust PCA (continued)

PCA uses the non-robust variance:
$\mathrm{Var}(w) = \frac{1}{n}\sum_{i=1}^n \langle w,\, X_i - \frac{1}{n}\sum_{j=1}^n X_j\rangle^2$.

Robust scale measure (Rousseeuw and Croux (1993)):
$R(w) = \mathrm{Quartile}\big(\{\, |\langle w,\, X_i - X_j\rangle| \;:\; i, j = 1, \dots, n \,\}\big)$
- breakdown point 50%
- requires no location measure
- can be written as a difference of convex functions in $w$
- projection pursuit type technique; current algorithms are based on a search over a finite set or a two-dimensional grid search
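The sketch below contrasts the two scale measures on synthetic data: a few gross outliers inflate the classical standard deviation of the projected data, while a quartile-based pairwise scale barely moves. This is a simplified version of the Rousseeuw and Croux idea (lower quartile of all pairwise projected distances, without finite-sample correction factors); the data and direction $w$ are illustrative choices.

```python
import numpy as np

def robust_scale(w, X):
    # Lower quartile of the projected pairwise distances |<w, X_i - X_j>|,
    # i < j (simplified Rousseeuw-Croux-style scale, no correction factor).
    proj = X @ w
    diffs = np.abs(proj[:, None] - proj[None, :])
    pairwise = diffs[np.triu_indices(len(proj), k=1)]
    return np.quantile(pairwise, 0.25)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:5] += 50.0                      # a few gross outliers
w = np.array([1.0, 0.0])

# The classical standard deviation explodes under the outliers,
# while the quartile-based scale stays near the inlier scale.
print(robust_scale(w, X), np.std(X @ w))
```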
Challenges of Nonlinear Eigenproblems

                        Linear EP                     Nonlinear EP
Numerical algorithms    symmetric EP almost solved    nothing
(Inverse) Power Method for Nonlinear Eigenproblems

Inverse power method for linear eigenproblems:
$Af^{k+1} = Bf^k \iff f^{k+1} = \arg\min_{u \in \mathbb{R}^n} \big\{ \tfrac{1}{2}\langle u, Au\rangle - \langle u, Bf^k\rangle \big\}$
The sequence $f^k$ converges to the smallest eigenvector of the generalized eigenproblem.

Inverse power method for nonlinear eigenproblems (Hein, Bühler (2010)):

Case $p > 1$:
$g^{k+1} = \arg\min_{u \in \mathbb{R}^n} \big\{ R(u) - \langle u, s(f^k)\rangle \big\}$, $\quad f^{k+1} = g^{k+1} / S(g^{k+1})^{1/p}$

Case $p = 1$:
$f^{k+1} = \arg\min_{u \in \mathbb{R}^n,\, \|u\|_2 \le 1} \big\{ R(u) - \lambda^k \langle u, s(f^k)\rangle \big\}$

In both cases $s(f^{k+1}) \in \partial S(f^{k+1})$ and $\lambda^{k+1} = \frac{R(f^{k+1})}{S(f^{k+1})}$.
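For the linear case on the slide, each step $Af^{k+1} = Bf^k$ is a linear solve. The sketch below runs this iteration on a small positive definite example (matrices are illustrative choices) and checks that the Rayleigh quotient converges to the smallest eigenvalue.

```python
import numpy as np

# Linear inverse power method: solve A f^{k+1} = B f^k repeatedly.
# For positive definite A this converges to the eigenvector of
# A f = lambda B f with the smallest eigenvalue.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.eye(2)

f = np.array([1.0, 0.3])
for _ in range(100):
    f = np.linalg.solve(A, B @ f)
    f /= np.linalg.norm(f)          # normalize to avoid under/overflow

lam = (f @ A @ f) / (f @ B @ f)     # Rayleigh quotient at the iterate
print(lam, np.linalg.eigvalsh(A)[0])  # the two values agree
```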
Properties of the Nonlinear Inverse Power Method

Theorem (Hein, Bühler (NIPS 2010)): Either $\lambda^{k+1} < \lambda^k$, or $\lambda^{k+1} = \lambda^k$ and the sequence terminates. Moreover, for every cluster point $f^*$ of the sequence $f^k$ one has $0 \in \partial R(f^*) - \lambda^* \partial S(f^*)$, where $\lambda^* = \frac{R(f^*)}{S(f^*)}$.

Guarantees:
- monotonic descent method
- subconvergence guaranteed to some nonlinear eigenvector, but not necessarily the one associated with the smallest eigenvalue
- additional inclusion of a proximal term allows one to show $\|f^{k+1} - f^k\| \to 0$ as $k \to \infty$
Balanced Graph Cuts: Applications
- clustering / community detection
- image segmentation
- parallel computing (matrix reordering)
The Cheeger Cut Problem

Cheeger cut: for a partition $(C, \overline{C})$ of the weighted, undirected graph,
$\phi(C) = \frac{\mathrm{cut}(C, \overline{C})}{\min\{|C|, |\overline{C}|\}}$, where $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$.
Computing the optimal Cheeger cut, $\phi^* = \min_C \phi(C)$, is NP-hard.
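To make the definition concrete, the sketch below computes the optimal Cheeger cut by exhaustive search on a tiny example graph (two triangles joined by a single bridge edge; the graph is an illustrative choice). The exponential enumeration also illustrates why the exact problem is intractable at scale.

```python
import itertools
import numpy as np

def cheeger_cut(W):
    # Exhaustive search over all bipartitions (C, complement of C).
    # Exponential in |V|, so feasible only for tiny graphs.
    n = W.shape[0]
    best = np.inf
    for r in range(1, n // 2 + 1):
        for C in itertools.combinations(range(n), r):
            Cbar = [i for i in range(n) if i not in C]
            cut = W[np.ix_(list(C), Cbar)].sum()
            best = min(best, cut / min(len(C), len(Cbar)))
    return best

# Two triangles joined by one edge: the optimal Cheeger cut removes
# the bridge edge (cut weight 1) with balance min{3, 3} = 3.
W = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0
print(cheeger_cut(W))  # approximately 0.3333 (= 1/3)
```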
Relaxation of the Cheeger Cut Problem

Relaxation into a semidefinite program with $|V|^3$ constraints: best known (worst case) approximation guarantee $O(\sqrt{\log |V|})$.

Spectral relaxation based on the graph Laplacian $L = D - W$. Isoperimetric inequality (Alon, Milman (1984)):
$\frac{(\phi^*)^2}{2 \max_i d_i} \le \lambda_2(L) \le 2\phi^*$.
A bipartition is obtained by optimal thresholding of the second eigenvector:
$\phi^* \le \phi_{\mathrm{SPECTRAL}} \le 2\sqrt{\max_i d_i}\,\sqrt{\phi^*}$.
There are graphs known which realize the upper bound.

Extension for the graph $p$-Laplacian (Amghibech (2003); Bühler, Hein (2009)):
$\phi^* \le \phi^p_{\mathrm{SPECTRAL}} \le p\,(\max_i d_i)^{\frac{p-1}{p}}\,(\phi^*)^{\frac{1}{p}}$.
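The isoperimetric inequality can be checked numerically. The sketch below does so on the same two-triangle example graph, whose optimal Cheeger cut is $\phi^* = 1/3$ (single bridge edge, balance 3); the graph is an illustrative choice.

```python
import numpy as np

# Numerical check of (phi*)^2 / (2 max_i d_i) <= lambda_2(L) <= 2 phi*
# on the two-triangle example graph.
W = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0

d = W.sum(axis=1)
L = np.diag(d) - W                       # unnormalized graph Laplacian
lam2 = np.linalg.eigvalsh(L)[1]          # second smallest eigenvalue

phi_star = 1.0 / 3.0                     # optimal Cheeger cut of this graph
lower = phi_star**2 / (2 * d.max())
print(lower <= lam2 <= 2 * phi_star)     # True
```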
1-Spectral Clustering

1-graph Laplacian: the nonlinear graph 1-Laplacian $\Delta_1$ induces the functional
$F_1(f) := \frac{\langle f, \Delta_1 f\rangle}{\|f\|_1} = \frac{\frac{1}{2}\sum_{i,j=1}^n w_{ij}\,|f_i - f_j|}{\|f\|_1}$.

Theorem (Hein, Bühler (NIPS 2010)): Let $G$ be connected. Then
$\min_C \frac{\mathrm{cut}(C, \overline{C})}{\min\{|C|, |\overline{C}|\}} = \min_{f\ \text{nonconstant},\ \mathrm{median}(f) = 0} F_1(f) = \lambda_2(\Delta_1)$,
where $\lambda_2(\Delta_1)$ is the second smallest eigenvalue of $\Delta_1$. The second eigenvector of $\Delta_1$ is the indicator vector of the optimal partition.

Tight relaxation of the optimal Cheeger cut!
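Tightness can be seen directly at indicator vectors: evaluating $F_1$ at the (median-shifted) indicator of a partition recovers exactly the Cheeger cut value of that partition. The sketch below does this on the two-triangle example graph, whose optimal partition has Cheeger cut $1/3$; the graph is an illustrative choice.

```python
import numpy as np

def F1(f, W):
    # The functional from the slide: total variation over the l1 norm.
    tv = 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :]))
    return tv / np.sum(np.abs(f))

# Two triangles joined by one bridge edge.
W = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0

f = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # indicator of one triangle
f = f - np.median(f)                           # enforce median(f) = 0
print(F1(f, W))  # approximately 0.3333, the Cheeger cut of the partition
```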
Cheeger Cut: 1-Laplacian (NLEP) vs. 2-Laplacian (LEP)

(Figure slide: comparison of the cuts obtained from the 1-Laplacian and from the standard 2-Laplacian spectral relaxation.)
NIPM for 1-Spectral Clustering

Computing a nonconstant eigenvector of the graph 1-Laplacian $\Delta_1$:
1: Input: weight matrix $W$
2: Initialization: nonconstant $f^0$ with $\mathrm{median}(f^0) = 0$ and $\|f^0\|_1 = 1$; accuracy $\epsilon$
3: repeat
4:   $g^{k+1} = \arg\min_{\|f\|_2 \le 1} \big\{ \frac{1}{2}\sum_{i,j=1}^n w_{ij}\,|f_i - f_j| - \lambda^k \langle f, v^k\rangle \big\}$
5:   $f^{k+1} = g^{k+1} - \mathrm{median}(g^{k+1})$
6:   $v^{k+1} \in \partial \|f^{k+1}\|_1$
7:   $\lambda^{k+1} = F_1(f^{k+1})$
8: until $\frac{|\lambda^{k+1} - \lambda^k|}{\lambda^k} < \epsilon$
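The sketch below runs a single outer step of this loop on the two-triangle example graph, starting from a deliberately bad partition. The inner convex problem (total variation minus a linear term over the $\ell_2$ ball) is solved only approximately here, by projected subgradient descent; that inner solver and the graph are illustrative choices, not the method of the paper.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0

def F1(f):
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :])) / np.sum(np.abs(f))

def tv_subgrad(u):
    # subgradient of (1/2) sum_ij w_ij |u_i - u_j|
    return np.sum(W * np.sign(u[:, None] - u[None, :]), axis=1)

f = np.array([1.0, 1.0, -1.0, -1.0, 1.0, -1.0]) / 6.0  # median 0, ||f||_1 = 1
lam0 = F1(f)                       # = 4/3 for this bad partition
v = np.sign(f)                     # element of the subdifferential of ||f||_1

u = f / np.linalg.norm(f)
for t in range(1, 2001):           # inner problem: min TV(u) - lam0 * <u, v>
    u = u - (0.5 / t) * (tv_subgrad(u) - lam0 * v)
    norm = np.linalg.norm(u)
    if norm > 1.0:
        u = u / norm               # projection onto the unit l2 ball
g = u - np.median(u)
lam1 = min(lam0, F1(g))            # monotonic descent, as in the theorem
print(lam1 <= lam0)                # True
```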
Balanced Graph Cuts and Nonlinear Eigenproblems

$\min_{A \subset V} \frac{\mathrm{cut}(A, \overline{A})}{\hat{S}(A)} \;\Longrightarrow\; \min_{f \in \mathbb{R}^n} \frac{\frac{1}{2}\sum_{i,j=1}^n w_{ij}\,|f_i - f_j|}{S(f)}$

Balancing set functions $\hat{S}$ and their Lovász extensions $S$:

                     $\hat{S}(A)$                                            $S(f)$
Cheeger cut          $\min\{|A|, |\overline{A}|\}$                           $\|f - \mathrm{median}(f)\,\mathbf{1}\|_1$
Ratio cut            $|A|\,|\overline{A}|$                                   $\frac{1}{2}\sum_{i,j=1}^n |f_i - f_j|$
Hard balanced cut    $1$ if $\min\{|A|, |\overline{A}|\} \ge K$, else $0$    ...

Modeling of different bias towards balanced partitions via the choice of $\hat{S}$. Tight relaxations hold for all balancing set functions (Hein, Setzer (NIPS 2011)).

Earlier work: Riemannian manifolds: Rothaus (1985); graphs: Tillich (2000), Houdré (2001).
Graph Partitioning Benchmark Results

The graph partitioning benchmark of Chris Walshaw (since 2000) contains results of all major methods (linear, spectral and SDP relaxations, METIS and variants, evolutionary and multi-agent methods, ...). The objective is the cut subject to a strict balancing constraint: for $\alpha \in \{0, 1, 3, 5\}$,
$\min \big\{ \mathrm{cut}(C, \overline{C}) \;\big|\; \max\{|C|, |\overline{C}|\} \le \big(1 + \frac{\alpha}{100}\big)\frac{|V|}{2} \big\}$.

Bipartitioning: on graphs with $|V| \ge 2395$ and $|E| \ge 6837$, 1-spectral clustering yields better cuts in 16 out of 136 possible problems (16 better than, 89 equal to, and 31 worse than the best reported cut).

Multipartitioning: results on 34 graphs; better cuts in 274 out of 816 problems.
Constrained Fractional Set Programs

$\min_{\emptyset \ne C \subseteq V} \frac{\hat{R}(C)}{\hat{S}(C)}$ subject to $\hat{M}_i(C) \ge k_i$, $i = 1, \dots, K$

has a tight relaxation into a nonlinear eigenproblem if
- $\hat{R}, \hat{S}$ are non-negative set functions with $\hat{R}(\emptyset) = \hat{S}(\emptyset) = 0$
- the constraint functions $\hat{M}_i$ underlie no restrictions.

This includes a large class of graph-based clustering / community detection problems.
Problem Classes

Unconstrained problems:
- balanced graph cuts (total variation (TV) on graphs: $\frac{1}{2}\sum_{i,j=1}^n w_{ij}\,|f_i - f_j|$)
- balanced cuts for directed graphs and hypergraphs
  - TV on directed graphs: $\sum_{i,j=1}^n w_{ij} \max\{f_i - f_j,\, 0\}$ (in preparation)
  - TV on hypergraphs: $\sum_{e \in E} w(e) \max_{i,j \in e} |f_i - f_j|$ (NIPS 2013)
- maximum density subgraph problem

Constrained problems:
- cardinality constraints (ICML 2013, WWW 2013)
- seed constraints (ICML 2013, WWW 2013)
- link constraints (AISTATS 2012)
Definition (Lovász extension): Let $f \in \mathbb{R}^V$ be ordered increasingly, $f_1 \le f_2 \le \dots \le f_n$, and define $C_i = \{j \in V \mid f_j > f_i\}$. Then $S : \mathbb{R}^V \to \mathbb{R}$ given by
$S(f) = \sum_{i=1}^n f_i\big(\hat{S}(C_{i-1}) - \hat{S}(C_i)\big) = \sum_{i=1}^{n-1} \hat{S}(C_i)(f_{i+1} - f_i) + f_1\,\hat{S}(V)$
is the Lovász extension of $\hat{S} : 2^V \to \mathbb{R}$. One has $S(\mathbf{1}_A) = \hat{S}(A)$ for all $A \subseteq V$.

Definition (submodularity): A set function $\hat{F} : 2^V \to \mathbb{R}$ is submodular if for all $A, B \subseteq V$,
$\hat{F}(A \cup B) + \hat{F}(A \cap B) \le \hat{F}(A) + \hat{F}(B)$.

Lemma: The Lovász extension of a set function is convex if and only if the set function is submodular.

Proposition: Every set function can be written as a difference of two submodular set functions.
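The second form of the definition translates directly into code. The sketch below computes the Lovász extension of the Cheeger balancing function $\hat{S}(A) = \min\{|A|, |\overline{A}|\}$ on a four-element ground set (an illustrative choice) and checks the extension property $S(\mathbf{1}_A) = \hat{S}(A)$.

```python
import numpy as np

def lovasz(Shat, f):
    # Lovász extension via S(f) = sum_i Shat(C_i)(f_{i+1} - f_i) + f_1 Shat(V),
    # with thresholding sets C_i = {j : f_j > f_i}. Ties are harmless because
    # tied consecutive gaps f_{i+1} - f_i are zero.
    order = np.argsort(f)
    fs = f[order]
    n = len(f)
    total = fs[0] * Shat(frozenset(range(n)))
    for i in range(n - 1):
        Ci = frozenset(int(j) for j in order[i + 1:])  # entries above fs[i]
        total += Shat(Ci) * (fs[i + 1] - fs[i])
    return total

# Cheeger balancing function on |V| = 4.
n = 4
Shat = lambda A: min(len(A), n - len(A))

# Extension property from the slide: S(1_A) = Shat(A).
A = frozenset({1, 3})
ind = np.array([1.0 if i in A else 0.0 for i in range(n)])
print(lovasz(Shat, ind), Shat(A))  # both equal 2
```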
Tight Relaxation of General Ratios

Theorem: Let $\hat{R}, \hat{S} : 2^V \to \mathbb{R}$ be non-negative set functions and $R, S : \mathbb{R}^n \to \mathbb{R}$ their Lovász extensions. Then
$\min_{C \subseteq V} \frac{\hat{R}(C)}{\hat{S}(C)} = \min_{f \in \mathbb{R}^n_+} \frac{R(f)}{S(f)}$.
If in addition $\hat{R}(V) = \hat{S}(V) = 0$, then
$\min_{C \subseteq V} \frac{\hat{R}(C)}{\hat{S}(C)} = \min_{f \in \mathbb{R}^n} \frac{R(f)}{S(f)}$.
Moreover,
$\min_{i=1,\dots,n} \frac{\hat{R}(C_i)}{\hat{S}(C_i)} \le \frac{R(f)}{S(f)}$, where $C_i = \{j \in V \mid f_j > f_i\}$;
if $\hat{R}(V) = \hat{S}(V) = 0$, the inequality holds for all $f \in \mathbb{R}^n$.
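The final inequality (optimal thresholding) can be checked by brute force: sweep all level sets of an arbitrary $f$ and compare the best thresholded Cheeger ratio with $R(f)/S(f) = F_1(f)$. The sketch below does this for the Cheeger cut instance, where $R$ is the total variation and $S(f) = \|f - \mathrm{median}(f)\mathbf{1}\|_1$, on the two-triangle example graph with a random vector (graph and vector are illustrative choices).

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0

def F1(f, W):
    # R(f)/S(f) for the Cheeger cut, with f already median-shifted.
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :])) / np.sum(np.abs(f))

def best_threshold(f, W):
    # Sweep the level sets C = {j : f_j > t} and keep the best Cheeger ratio.
    n = len(f)
    best = np.inf
    for t in np.unique(f)[:-1]:               # thresholds between levels
        C = f > t
        cut = W[np.ix_(np.where(C)[0], np.where(~C)[0])].sum()
        best = min(best, cut / min(C.sum(), n - C.sum()))
    return best

rng = np.random.default_rng(1)
f = rng.normal(size=6)
f -= np.median(f)                             # so that sum|f_i| = S(f)
print(best_threshold(f, W) <= F1(f, W) + 1e-12)  # True, as the theorem states
```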
Inclusion of Constraints

Idea of the proof:
- turn the constrained combinatorial ratio problem into an unconstrained ratio problem via an exact penalty approach
- use the previous theorem to get the tight relaxation.

Penalty function: the penalty set function for a constraint $\hat{M}_i(C) \ge k_i$ is defined as
$T_i(C) = \begin{cases} \max\{0,\; k_i - \hat{M}_i(C)\}, & C \ne \emptyset, \\ 0, & C = \emptyset. \end{cases}$

Equivalence of constrained fractional set programs:
$\min_{\hat{M}_i(C) \ge k_i,\ i=1,\dots,K} \frac{\hat{R}(C)}{\hat{S}(C)} = \min_{C \subseteq V} \frac{\hat{R}(C) + \gamma \sum_{i=1}^K T_i(C)}{\hat{S}(C)}$

The constant $\gamma$ can be computed using a feasible set.
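A toy check of this exact-penalty construction: on a four-vertex path graph, a cardinality constraint $|C| \ge 2$ is folded into the ratio via $T(C) = \max\{0, 2 - |C|\}$ with a large enough $\gamma$, and both sides of the equivalence are evaluated by brute force. The graph, constraint, and $\gamma$ are illustrative choices.

```python
import itertools

# Path graph 0 - 1 - 2 - 3 with unit weights.
V = range(4)
edges = [(0, 1), (1, 2), (2, 3)]

def cut(C):
    return sum(1 for i, j in edges if (i in C) != (j in C))

def balance(C):
    return min(len(C), 4 - len(C))      # Cheeger balancing function

# All nonempty proper subsets (empty and full sets have balance 0).
subsets = [set(s) for r in range(1, 4) for s in itertools.combinations(V, r)]

# Constrained problem: minimize the ratio subject to |C| >= 2.
constrained = min(cut(C) / balance(C) for C in subsets if len(C) >= 2)

# Penalized unconstrained problem with T(C) = max(0, 2 - |C|).
gamma = 10.0
penalized = min((cut(C) + gamma * max(0, 2 - len(C))) / balance(C)
                for C in subsets)
print(constrained == penalized)  # True: the penalty is exact
```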
Optimization: RatioDCA

Optimization problem: minimize a non-negative ratio of one-homogeneous d.c. functions over $\mathbb{R}^n_+$:
$\min_{f \in \mathbb{R}^n_+} \frac{R(f)}{S(f)} =: \frac{R_1(f) - R_2(f)}{S_1(f) - S_2(f)}$.

RatioDCA iteration:
$f^{l+1} = \arg\min_{u \in \mathbb{R}^n_+,\, \|u\|_2 \le 1} \big\{ R_1(u) - \langle u, r_2(f^l)\rangle + \lambda^l \big( S_2(u) - \langle u, s_1(f^l)\rangle \big) \big\}$,
where $r_2(f^l) \in \partial R_2(f^l)$, $s_1(f^l) \in \partial S_1(f^l)$, and $\lambda^{l+1} = \frac{R(f^{l+1})}{S(f^{l+1})}$.

- sequence of convex optimization problems
- same convergence properties as the nonlinear inverse power method
- finite convergence for combinatorial problems
Quality Guarantee

Tight relaxation of the combinatorial fractional program: minimization of the continuous relaxation is as hard as the original combinatorial problem, i.e. non-convex and non-smooth. There is no guarantee that one obtains the optimal solution by RatioDCA! But:

Theorem: Let $A \subseteq V$ be feasible and $\gamma$ chosen such that the equivalence holds. If one uses $f^0 = \mathbf{1}_A$ as initialization of RatioDCA, then either it terminates after one step, or it yields an $f^1$ which after optimal thresholding gives a partition $(B, \overline{B})$ which is feasible and satisfies
$\frac{\hat{R}(B)}{\hat{S}(B)} < \frac{\hat{R}(A)}{\hat{S}(A)}$.
Clustering in Practice

Major goal: integration of prior knowledge into clustering / community detection problems via constraints!

Prior work:
- link constraints in k-means (Wagstaff et al., ICML 2001)
- link constraints in spectral clustering (Eriksson et al., ICCV 2007; Wang and Davidson, KDD 2010)
- communities from seed sets (Andersen and Lang, WWW 2006)
Constrained Normalized Cut

Clustering with link constraints (Rangapuram, Hein (AISTATS 2012)):
- must-link and cannot-link constraints
- a partition is called consistent if all constraints are satisfied.

Constrained normalized cut problem:
$\min_{(C, \overline{C})\ \text{consistent}} \frac{\mathrm{cut}(C, \overline{C})}{\mathrm{vol}(C)} + \frac{\mathrm{cut}(C, \overline{C})}{\mathrm{vol}(\overline{C})}$

First method that can guarantee that the constraints are satisfied!
Constrained Normalized Cut II

(Figure slide: must-link and cannot-link constraints; unconstrained normalized cut vs. constrained normalized cut.)
Constrained Clustering in Image Segmentation

(Figure slide: original image with labels; constrained spectral clustering (Eriksson et al., ICCV 2007) vs. 1-spectral clustering with linear equality constraints (Rangapuram, Hein, AISTATS 2012), each shown with its normalized cut value.)
Constrained Normalized Cut: Results III

Binary-partitioning problem (Spam dataset, $|V| = 4207$); multi-partitioning problem (extended MNIST dataset).

Compared methods:
- SCLC: Eriksson, Olsson, Kahl (ICCV 2007); Xu, Li, Schuurmans (CVPR 2009)
- CCSR: Li, Liu, Tang (CVPR 2009)
- CSP: Wang, Davidson (KDD 2010)
- SL: Kamvar, Klein, Manning (IJCAI 2003)
- COSC: Rangapuram, Hein (AISTATS 2012)
Local Clustering

Task of local clustering: find a set $C$ including the seed set $J$ which is well-clustered and small.

Optimization problem:
$\min_{C \subseteq V} \frac{\mathrm{cut}(C, \overline{C})}{\hat{S}(C)}$ subject to $\mathrm{vol}_g(C) \le k$ and $J \subseteq C$.

Prior work:
- Andersen, Chung, Lang (FOCS 2006): local PageRank (local random walk)
- Maji, Vishnoi, Malik (CVPR 2011); Mahoney, Orecchia, Vishnoi (JMLR 2012): biased normalized cuts (local spectral method)
Local Clustering: Results

(Table slide.) Datasets: Amazon0505, Cit-HepPh, and CA-GrQC (the latter with 4158 vertices).

Compared methods:
- LRW: local random walk (Andersen, Lang (2006))
- LS: local spectral (Mahoney, Orecchia, Vishnoi (2012))
- CFSP: our method (Bühler, Rangapuram, Setzer, Hein (ICML 2013))
Local Minima and Critical Points

Combinatorial versus continuous problem: let $\hat{R}, \hat{S} : 2^V \to \mathbb{R}$ be non-negative set functions and $R, S : \mathbb{R}^n \to \mathbb{R}$ their Lovász extensions. Then
$\min_{C \subseteq V} \frac{\hat{R}(C)}{\hat{S}(C)} = \min_{f \in \mathbb{R}^n_+} \frac{R(f)}{S(f)}$.
The global minimum is the same, but what about local minima?

Motivation: characterization of the local minima of the Cheeger cut by Bresson et al. (2012).
Local Minima: Continuous versus Discrete Problem

Theorem: Let $R, S$ be the Lovász extensions of $\hat{R}, \hat{S}$. Then $f^* = \mathbf{1}_{C^*}$ is a local minimum of $\frac{R(f)}{S(f)}$ with value $\lambda^* = \frac{R(f^*)}{S(f^*)} = \frac{\hat{R}(C^*)}{\hat{S}(C^*)}$ if and only if for any $C$ with $C \subseteq C^*$ or $C^* \subseteq C$ it holds that
$\lambda^* = \frac{\hat{R}(C^*)}{\hat{S}(C^*)} \le \frac{\hat{R}(C)}{\hat{S}(C)}$.

The structure of the local minima of the continuous and the combinatorial problem are (almost) the same.
Guarantees, Local Minima and the Cheeger Cut

Cheeger cut problem:
$\hat{R}(C) = \mathrm{cut}(C, \overline{C}) \;\Longrightarrow\; R(f) = \frac{1}{2}\sum_{i,j=1}^n w_{ij}\,|f_i - f_j|$
$\hat{S}(C) = \min\{|C|, |\overline{C}|\} \;\Longrightarrow\; S(f) = \|f - \mathrm{median}(f)\,\mathbf{1}\|_1$

Proposition:
- If NIPM terminates at $f^*$, then $f^* = \sum_{i=1}^n \alpha_i \mathbf{1}_{C_i}$ with $C_i = \{j \in V \mid f^*_j > f^*_i\}$ and $\frac{R(f^*)}{S(f^*)} = \min_i \frac{\hat{R}(C_i)}{\hat{S}(C_i)}$.
- If $f^* = \mathbf{1}_{C^*}$, where w.l.o.g. $|C^*| \le \frac{|V|}{2}$, then every $C \subseteq C^*$ fulfills $\frac{\hat{R}(C^*)}{\hat{S}(C^*)} \le \frac{\hat{R}(C)}{\hat{S}(C)}$. Moreover, if $|C^*| < |\overline{C^*}|$, then $f^*$ is a local minimum.

Ongoing work: global approximation guarantees.
Conclusion and Outlook

- tight relaxation of nonlinear eigenproblems and of combinatorial fractional programs
- integration of prior knowledge into clustering tasks
- RatioDCA makes the computation of nonlinear eigenvectors feasible
- locally optimal solutions of tight relaxations are better than globally optimal solutions of loose relaxations!

Open problems in nonlinear eigenproblems:
- What is a suitable min-max principle for nonlinear eigenvectors?
- computation of higher-order eigenvectors
- higher-order cuts via a min-max principle
- approximation guarantees for tight relaxations of combinatorial problems
More informationGlobal vs. Multiscale Approaches
Harmonic Analysis on Graphs Global vs. Multiscale Approaches Weizmann Institute of Science, Rehovot, Israel July 2011 Joint work with Matan Gavish (WIS/Stanford), Ronald Coifman (Yale), ICML 10' Challenge:
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 23 1 / 27 Overview
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016
U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest
More informationNetworks and Their Spectra
Networks and Their Spectra Victor Amelkin University of California, Santa Barbara Department of Computer Science victor@cs.ucsb.edu December 4, 2017 1 / 18 Introduction Networks (= graphs) are everywhere.
More informationConvex optimization COMS 4771
Convex optimization COMS 4771 1. Recap: learning via optimization Soft-margin SVMs Soft-margin SVM optimization problem defined by training data: w R d λ 2 w 2 2 + 1 n n [ ] 1 y ix T i w. + 1 / 15 Soft-margin
More informationLinear Spectral Hashing
Linear Spectral Hashing Zalán Bodó and Lehel Csató Babeş Bolyai University - Faculty of Mathematics and Computer Science Kogălniceanu 1., 484 Cluj-Napoca - Romania Abstract. assigns binary hash keys to
More informationUsing Local Spectral Methods in Theory and in Practice
Using Local Spectral Methods in Theory and in Practice Michael W. Mahoney ICSI and Dept of Statistics, UC Berkeley ( For more info, see: http: // www. stat. berkeley. edu/ ~ mmahoney or Google on Michael
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationSpace-Variant Computer Vision: A Graph Theoretic Approach
p.1/65 Space-Variant Computer Vision: A Graph Theoretic Approach Leo Grady Cognitive and Neural Systems Boston University p.2/65 Outline of talk Space-variant vision - Why and how of graph theory Anisotropic
More informationLaplacian Eigenmaps for Dimensionality Reduction and Data Representation
Introduction and Data Representation Mikhail Belkin & Partha Niyogi Department of Electrical Engieering University of Minnesota Mar 21, 2017 1/22 Outline Introduction 1 Introduction 2 3 4 Connections to
More informationScalable Subspace Clustering
Scalable Subspace Clustering René Vidal Center for Imaging Science, Laboratory for Computational Sensing and Robotics, Institute for Computational Medicine, Department of Biomedical Engineering, Johns
More informationA convex relaxation for weakly supervised classifiers
A convex relaxation for weakly supervised classifiers Armand Joulin and Francis Bach SIERRA group INRIA -Ecole Normale Supérieure ICML 2012 Weakly supervised classification We adress the problem of weakly
More informationGraph-Laplacian PCA: Closed-form Solution and Robustness
2013 IEEE Conference on Computer Vision and Pattern Recognition Graph-Laplacian PCA: Closed-form Solution and Robustness Bo Jiang a, Chris Ding b,a, Bin Luo a, Jin Tang a a School of Computer Science and
More informationCommunities, Spectral Clustering, and Random Walks
Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12
More informationQuadratic reformulation techniques for 0-1 quadratic programs
OSE SEMINAR 2014 Quadratic reformulation techniques for 0-1 quadratic programs Ray Pörn CENTER OF EXCELLENCE IN OPTIMIZATION AND SYSTEMS ENGINEERING ÅBO AKADEMI UNIVERSITY ÅBO NOVEMBER 14th 2014 2 Structure
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large
More informationLocal Lanczos Spectral Approximation for Community Detection
Local Lanczos Spectral Approximation for Community Detection Pan Shi 1, Kun He 1( ), David Bindel 2, John E. Hopcroft 2 1 Huazhong University of Science and Technology, Wuhan, China {panshi,brooklet60}@hust.edu.cn
More informationFacebook Friends! and Matrix Functions
Facebook Friends! and Matrix Functions! Graduate Research Day Joint with David F. Gleich, (Purdue), supported by" NSF CAREER 1149756-CCF Kyle Kloster! Purdue University! Network Analysis Use linear algebra
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko Inria Lille - Nord Europe, France TA: Pierre Perrault Partially based on material by: Mikhail Belkin, Jerry Zhu, Olivier Chapelle, Branislav Kveton October 30, 2017
More information18.S096: Spectral Clustering and Cheeger s Inequality
18.S096: Spectral Clustering and Cheeger s Inequality Topics in Mathematics of Data Science (Fall 015) Afonso S. Bandeira bandeira@mit.edu http://math.mit.edu/~bandeira October 6, 015 These are lecture
More informationSpectral Clustering on Handwritten Digits Database Mid-Year Pr
Spectral Clustering on Handwritten Digits Database Mid-Year Presentation Danielle dmiddle1@math.umd.edu Advisor: Kasso Okoudjou kasso@umd.edu Department of Mathematics University of Maryland- College Park
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Introduction and a quick repetition of analysis/linear algebra First lecture, 12.04.2010 Jun.-Prof. Matthias Hein Organization of the lecture Advanced course, 2+2 hours,
More informationSpectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min
Spectral Graph Theory Lecture 2 The Laplacian Daniel A. Spielman September 4, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The notes written before
More informationDiffusion and random walks on graphs
Diffusion and random walks on graphs Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural
More informationConvex Methods for Transduction
Convex Methods for Transduction Tijl De Bie ESAT-SCD/SISTA, K.U.Leuven Kasteelpark Arenberg 10 3001 Leuven, Belgium tijl.debie@esat.kuleuven.ac.be Nello Cristianini Department of Statistics, U.C.Davis
More informationStatistical Learning. Dong Liu. Dept. EEIS, USTC
Statistical Learning Dong Liu Dept. EEIS, USTC Chapter 6. Unsupervised and Semi-Supervised Learning 1. Unsupervised learning 2. k-means 3. Gaussian mixture model 4. Other approaches to clustering 5. Principle
More informationLecture Introduction. 2 Brief Recap of Lecture 10. CS-621 Theory Gems October 24, 2012
CS-62 Theory Gems October 24, 202 Lecture Lecturer: Aleksander Mądry Scribes: Carsten Moldenhauer and Robin Scheibler Introduction In Lecture 0, we introduced a fundamental object of spectral graph theory:
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationData Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings
Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline
More informationCommunities, Spectral Clustering, and Random Walks
Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 3 Jul 202 Spectral clustering recipe Ingredients:. A subspace basis with useful information
More informationLearning Spectral Clustering
Learning Spectral Clustering Francis R. Bach Computer Science University of California Berkeley, CA 94720 fbach@cs.berkeley.edu Michael I. Jordan Computer Science and Statistics University of California
More informationUnsupervised dimensionality reduction
Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional
More informationSemi-Supervised Learning by Multi-Manifold Separation
Semi-Supervised Learning by Multi-Manifold Separation Xiaojin (Jerry) Zhu Department of Computer Sciences University of Wisconsin Madison Joint work with Andrew Goldberg, Zhiting Xu, Aarti Singh, and Rob
More informationAn indicator for the number of clusters using a linear map to simplex structure
An indicator for the number of clusters using a linear map to simplex structure Marcus Weber, Wasinee Rungsarityotin, and Alexander Schliep Zuse Institute Berlin ZIB Takustraße 7, D-495 Berlin, Germany
More informationStatistical and Computational Analysis of Locality Preserving Projection
Statistical and Computational Analysis of Locality Preserving Projection Xiaofei He xiaofei@cs.uchicago.edu Department of Computer Science, University of Chicago, 00 East 58th Street, Chicago, IL 60637
More informationSemi-Markov/Graph Cuts
Semi-Markov/Graph Cuts Alireza Shafaei University of British Columbia August, 2015 1 / 30 A Quick Review For a general chain-structured UGM we have: n n p(x 1, x 2,..., x n ) φ i (x i ) φ i,i 1 (x i, x
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More information1 Matrix notation and preliminaries from spectral graph theory
Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.
More informationLecture 13: Spectral Graph Theory
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved
More information