Robust Asymmetric Nonnegative Matrix Factorization
1 Robust Asymmetric Nonnegative Matrix Factorization Hyenkyun Woo, Korea Institute for Advanced Study, Feb. Joint work with Haesun Park, Georgia Institute of Technology. H. Woo (KIAS) RANMF / 54
2 Outline: 1 Robust Principal Component Analysis. 2 Asymmetric Nonnegative Matrix Factorization (Asymmetric Nonnegative Nuclear Norm; Denseness and Stability; Asymmetric Incoherence Condition). 3 Asymmetric Soft Regularization. 4 Numerical Results.
3 Outline: Robust Principal Component Analysis
4 Motivation: Robust Principal Component Analysis. Given data: a matrix A ≥ 0. Separation problem: A = L_0 (low rank) + O_0 (grouped outliers). Related works: Candès, Wright, Ma, Chandrasekaran, Parrilo, Tao, Recht, Osher, Yin, etc.
5 Goal: Asymmetric NMF for (low rank_+ + outliers). Background modeling by ANMF: min_{L,O} { ‖O‖_{l_1} + β‖A − L − O‖_F^2 : rank_+(L) ≤ r, 0 ≤ L ≤ R }. Figure: Proposed model (ANMF for low rank_+ + outliers). Choose a relatively large r > rank_+(L_0); then rank_+(L_0) ≈ r_eff ≤ r is automatically recovered by soft regularization (or algorithmic regularization).
6 Review: Robust Principal Component Analysis. Robust PCA: min_L ‖A − L‖_{l_1} s.t. rank(L) ≤ r, where ‖X‖_{l_1} = Σ_{ij} |X_{ij}|. This is an NP-hard, highly nonconvex problem. How do we find the low-rank matrix L? We have relaxations: the nuclear norm, the γ_2-norm, etc.
7 Robust PCA with the nuclear norm: min_L λ‖A − L‖_{l_1} + ‖L‖_*, or Principal Component Pursuit (Candès et al. 2011): (L̂, Ô) = argmin_{L,O} λ‖O‖_{l_1} + ‖L‖_* s.t. O = A − L, where ‖L‖_* = Σ_i σ_i(L) (the sum of the singular values of L). Note that e_i is a standard basis vector (0, ..., 0, 1, 0, ..., 0). What happens if L̂ = e_i e_j^T and Ô = e_k e_l^T? Can we fix λ for random sparse outliers Ô?
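As a sketch of how Principal Component Pursuit is typically solved (not the method proposed in this talk), the following runs a basic inexact augmented-Lagrangian iteration alternating singular-value thresholding and soft thresholding; the function names `svt`, `soft`, `pcp` and the step-size heuristic are assumptions borrowed from the standard RPCA literature.

```python
import numpy as np

def svt(X, tau):
    # singular value thresholding: proximal operator of tau * ||.||_*
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    # elementwise soft thresholding: proximal operator of tau * ||.||_1
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def pcp(A, lam=None, n_iter=300):
    # inexact augmented-Lagrangian iteration for min lam*||O||_1 + ||L||_*
    # subject to O = A - L
    m, n = A.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(A).sum()      # common step-size heuristic
    L = np.zeros_like(A); O = np.zeros_like(A); Y = np.zeros_like(A)
    for _ in range(n_iter):
        L = svt(A - O + Y / mu, 1.0 / mu)    # nuclear-norm step
        O = soft(A - L + Y / mu, lam / mu)   # l1 step
        Y += mu * (A - L - O)                # dual ascent on the constraint
        mu = min(mu * 1.05, 1e7)             # gradual penalty increase
    return L, O
```

On a well-conditioned low-rank-plus-sparse matrix this recovers both components without tuning λ beyond the default 1/√max(m, n), which is exactly the convenience the slide questions for grouped outliers.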
8 Incoherence parameter µ(U). Definition: let U ⊆ R^n with dim(U) = r, and let P_U be the orthogonal projection onto U. The coherence of U is defined as µ(U) = (n/r) max_{1≤i≤n} ‖P_U e_i‖_2^2, where e_i is a standard basis vector. Here P_U = U(U^T U)^{-1} U^T (U full rank), ‖P_U e_i‖_2^2 = ‖U^T e_i‖_2^2 when U^T U = I, and 1 ≤ µ(U) ≤ n/r.
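The definition above translates directly into a few lines of NumPy (a minimal sketch; the helper name `coherence` is an assumption): the squared row norms of an orthonormal basis of span(U) are exactly the ‖P_U e_i‖_2^2 terms.

```python
import numpy as np

def coherence(U):
    # mu(U) = (n / r) * max_i ||P_U e_i||_2^2 for the subspace spanned
    # by the columns of U
    n, r = U.shape
    Q, _ = np.linalg.qr(U)             # orthonormal basis of span(U)
    leverage = np.sum(Q**2, axis=1)    # ||P_U e_i||_2^2 = ||Q^T e_i||_2^2
    return (n / r) * leverage.max()
```

A coordinate-aligned subspace attains the maximal value n/r, while the all-ones direction attains the minimal value 1, matching the bounds 1 ≤ µ(U) ≤ n/r.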
9 Incoherence condition [1]. Let L_0 ∈ R^{n×n} = UΣV^T = Σ_{i=1}^r σ_i u_i v_i^T, where r = rank(L_0). The incoherence condition requires max_i ‖U^T e_i‖_2^2 ≤ µr/n and max_i ‖V^T e_i‖_2^2 ≤ µr/n, where e_i = (0, 0, ..., 1, ..., 0) is a standard coordinate basis vector. Note that 1 ≤ µ ≤ n/r and µ should be sufficiently small. [1] Candès and Recht 09, Candès and Tao 10
10 When A = L_0 + O_0. Theorem: suppose L_0 satisfies rank(L_0) ≤ ρ_r n µ^{-1} (log n)^{-2} and O_0 has a random sparsity pattern of cardinality m ≤ ρ_s n^2. Then with probability 1 − O(n^{-10}), RPCA with λ = 1/√n separates them exactly: L̂ = L_0, Ô = O_0. But we do not know the incoherence parameter µ (1 ≤ µ ≤ n/r)! How do we control µ to be small? In general, the SVD generates very dense bases. And if O_0 has grouped outliers, what then? We need to tune the λ parameter based on the size of the outliers [2]. [2] Ramirez and Sapiro 12
11 Goal: Column Outliers + Nonnegative Rank. Definition (Column Outliers): let O ∈ R^{m×n} be grouped outliers with limited row sparsity, i.e., max_{1≤i≤m} ‖row_i(O)‖_0 < ζn; then we call O column outliers. Here 0 < ζ < 1/2 decides the sparsity level of O in the row direction.
min_{L,O} { Φ(O) + (α/2)‖A − L − O‖_F^2 : R(L) ≤ τ and 0 ≤ L ≤ B_L } (1)
Φ is a sparsity-enforcing function (elementwise separable, between l_0 and l_{2,0}). How to proceed? Change the rank constraint R(L) to handle column outliers O.
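The row-sparsity condition in the definition is easy to check programmatically; this small helper (name `is_column_outlier` is an assumption, not from the talk) verifies that a candidate outlier matrix has limited row support.

```python
import numpy as np

def is_column_outlier(O, zeta):
    # True when every row of O has fewer than zeta * n nonzero entries,
    # i.e. max_i ||row_i(O)||_0 < zeta * n
    n = O.shape[1]
    return bool(np.count_nonzero(O, axis=1).max() < zeta * n)
```

A matrix whose nonzeros are concentrated in a few full columns passes the test, while a dense corruption matrix fails it, which is the distinction the model is built around.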
12 Outline: Asymmetric Nonnegative Matrix Factorization
13 Nonnegative Rank. The nonnegative rank of L ∈ R^{m×n}_+ is rank_+(L) = min { r : L = WH^T, W ∈ R^{m×r}_+, H ∈ R^{n×r}_+ }, and 0 ≤ rank(L) ≤ rank_+(L) ≤ min{m, n}. Computing it is NP-hard [3], so we need a relaxation for rank_+(L). And how do we control the denseness of L so that it is separable? [3] Vavasis 09
14 Review: nuclear norm and γ_2-norm for rank. Nuclear norm: ‖L‖_* = min { Σ_i λ_i : L = Σ_i λ_i w_i h_i^T, w_i ∈ B_2(m), h_i ∈ B_2(n) }, where B_2(d) = {x ∈ R^d : ‖x‖_2 = 1}. γ_2-norm [4]: ‖L‖_{γ_2} = min { Σ_i λ_i : L = Σ_i λ_i w_i h_i^T, w_i ∈ B_∞(m), h_i ∈ B_∞(n) }, where B_∞(d) = {x ∈ R^d : ‖x‖_∞ ≤ 1}. [4] Lee et al. 10
15 Review: nuclear norm and γ_2-norm for rank_+. Nonnegative nuclear norm [5]: ‖L‖_{*,+} = min { Σ_i λ_i : L = Σ_i λ_i w_i h_i^T, w_i ∈ B_2^+(m), h_i ∈ B_2^+(n) }, where B_2^+(d) = {x ∈ R^d_+ : ‖x‖_2 ≤ 1}. Nonnegative γ_2-norm: ‖L‖_{γ_2,+} = min { Σ_i λ_i : L = Σ_i λ_i w_i h_i^T, w_i ∈ B_∞^+(m), h_i ∈ B_∞^+(n) }, where B_∞^+(d) = {x ∈ R^d_+ : ‖x‖_∞ ≤ 1}. [5] Fawzi & Parrilo 12
16 Possible candidate for rank_+ + column outliers: min_{W≥0, H≥0, O} ‖O‖_{l_1} + (α/2)‖A − WH^T − O‖_F^2 + γΨ(WH^T), with Ψ(WH^T) = ‖WH^T‖_{*,+} (nuclear norm) or Ψ(WH^T) = ‖WH^T‖_{γ_2,+} (γ_2-norm). But we don't want to tune the γ parameter! And are W and H sufficiently dense for the separation of column outliers?
17 Relaxation: Asymmetric Nonnegative Nuclear Norm. ‖L‖ = min { Σ_i λ_i : L = Σ_i λ_i w_i h_i^T, w_i ∈ Z_{η_{w_i}}(m), h_i ∈ Z_{η_{h_i}}(n) }, where Z_η(d) = { v ∈ R^d_+ : v ∈ B_2^+(d) ∩ ηB_∞^+(d) }, B_2^+(d) = {v ∈ R^d_+ : ‖v‖_2 = 1}, B_∞^+(d) = {v ∈ R^d_+ : ‖v‖_∞ ≤ 1}, and λ_i ≥ λ_j if i < j. We call L = Σ_i λ_i w_i h_i^T an Asymmetric NMF. We have many parameters, η_{w_i} and η_{h_i}. Why? To control denseness for the separation of column outliers.
18 Toy model for low rank_+ + column outliers. Matrix decompositions (low rank_+ + column outliers) by the proposed Asymmetric NMF model (top) and Robust PCA (bottom). Note that A ∈ [0, 255]. Bottom: the basis of RPCA is orthogonal and dense. Top: the basis_+ of ANMF is linearly independent and sparse.
21 Why Z_η(d): denseness. (1/√d) B_∞^+(d) ⊆ B_2^+(d) ⊆ 1·B_∞^+(d), where B_2^+(d) = {x ∈ R^d_+ : ‖x‖_2 = 1} and B_∞^+(d) = {x ∈ R^d_+ : ‖x‖_∞ ≤ 1}. v ∈ (1/√d)B_∞^+(d) ∩ B_2^+(d): the unique densest unit vector; dense vectors become unstable. v ∈ 1·B_∞^+(d) ∩ B_2^+(d): can be a standard coordinate vector, so this set includes the sparsest unit vectors (i.e., ‖v‖_0 = 1).
22 Why Z_η(d): lower bound on sparsity. Lemma: let w ∈ Z_η(d) = { v ∈ R^d_+ : v ∈ B_2^+(d) ∩ ηB_∞^+(d) }. Then ‖w‖_0 ≥ 1/‖w‖_{l_∞}^2 ≥ 1/η^2, where ‖w‖_0 = #{i : w_i ≠ 0}. Proof: since ‖w‖_2 = 1 and every entry satisfies 0 ≤ w_i ≤ ‖w‖_{l_∞} ≤ η, we get 1 = Σ_i w_i^2 ≤ ‖w‖_{l_∞}^2 ‖w‖_0 ≤ η^2 ‖w‖_0.
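The lemma is simple enough to verify numerically; this sketch (helper names `in_Z_eta` and `sparsity_lower_bound` are assumptions) checks membership in Z_η(d) and the resulting nonzero count.

```python
import numpy as np

def in_Z_eta(v, eta, tol=1e-9):
    # membership in Z_eta(d): nonnegative, unit l2 norm, entries at most eta
    return bool((v >= -tol).all()
                and abs(np.linalg.norm(v) - 1.0) <= tol
                and v.max() <= eta + tol)

def sparsity_lower_bound(eta):
    # the lemma: any w in Z_eta(d) has at least 1/eta^2 nonzero entries
    return 1.0 / eta**2
```

The bound is tight for flat vectors: k entries equal to 1/√k form a unit vector with max entry η = 1/√k and exactly 1/η² = k nonzeros.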
23 Singular values of an Asymmetric NMF (L = Σ_i λ_i w_i h_i^T). Theorem: let L = Σ_i L_i with ‖L‖ = min { Σ_i λ_i : L_i = λ_i w_i h_i^T, w_i ∈ Z_{η_{w_i}}(m), h_i ∈ Z_{η_{h_i}}(n) }. Then E_0(L_i) √‖L_i‖_0 ≤ λ_i ≤ R √‖L_i‖_0, where E_0(X) = ‖X‖_1/‖X‖_0 and R = ‖L‖_{l_∞}. Approximately, we can say λ_i = O(√‖L_i‖_0).
24 Lower bounds on rank_+(L). Theorem: let L = Σ_i L_i have an Asymmetric NMF (L_i = λ_i w_i h_i^T) with ‖L‖ = min { Σ_i λ_i : w_i ∈ Z_{η_{w_i}}(m), h_i ∈ Z_{η_{h_i}}(n) }. Then rank_+(L) ≥ ‖L‖ / max_i λ_i and rank_+(L) ≥ ‖L‖_A / max_i √‖L_i‖_{l_0}, where ‖L‖_A = Σ_i √‖L_i‖_{l_0}. These are norm-based and combinatorial bounds for rank_+(L).
25 Stability of the basis matrix W. Z^{m×r}_{η_W} = { W = [w_1, ..., w_r] : w_i ∈ Z_{η_{w_i}}(m), det(W^T W) ≠ 0 }, where η_W = max_i η_{w_i}. Is W stable? Two candidate measures: S(W) = ‖W^T W − I‖_{l_∞} = max_{i≠j} w_i^T w_j, stable when 0 ≤ S(W) < 1, unstable otherwise; and ‖W‖ = √ρ_max(W^T W), stable when 1 ≤ ‖W‖ ≤ √(1 + (r − 1)S(W)) < √r, unstable otherwise. Which one do you prefer as a measure of the stability of a matrix W?
26 Stability of the basis matrix: ‖W‖. Since W^T W is nonnegative, √(min_i c_i) ≤ ‖W‖ ≤ √(max_i c_i), where c_i is the sum of the elements of the i-th column of W^T W; the bounds sharpen by a correction term in ε_W = min_{i≠j} w_i^T w_j > 0. When ε_W = S(W), we get ‖W‖ = √(1 + (r − 1)S(W)). ‖W‖ is the more robust measure of the stability of W, since it depends on all elements of W^T W.
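The two measures from the previous slide, and the relation ‖W‖² ≤ 1 + (r − 1)S(W) between them, can be checked directly (a sketch; the function name `stability_measures` is an assumption). For a basis whose columns overlap by the same amount, the relation holds with equality.

```python
import numpy as np

def stability_measures(W):
    # S(W) = max_{i != j} w_i^T w_j   (mutual coherence of unit columns)
    # ||W|| = largest singular value; for unit columns,
    # ||W||^2 <= 1 + (r - 1) * S(W)
    G = W.T @ W
    r = G.shape[0]
    S = G[~np.eye(r, dtype=bool)].max()
    return S, np.linalg.norm(W, 2)
```

The test uses three k-sparse columns (k = 3 over 4 rows) whose supports pairwise share exactly two indices, so every off-diagonal Gram entry equals 2/3 and equality is attained.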
27 Stability of the basis matrix: δ-distinguishability. Z^{m×r}_{η_W} = { W = [w_1, ..., w_r] : w_i ∈ Z_{η_{w_i}}(m), det(W^T W) ≠ 0 }, where η_W = max_i η_{w_i}. Is W (or Z_{η_W}(m)) δ-distinguishable? That is, for any distinct w_i, w_j ∈ W (or Z_{η_W}(m)), do we have ‖w_i − w_j‖_2 ≥ δ? Equivalently, S(W) = max_{i≠j} w_i^T w_j ≤ 1 − δ^2/2. Example: for all δ > 0, Z^{m×r}_{1/√m} is not δ-distinguishable (it contains only the single densest unit vector, so columns coincide).
28 Stability vs. denseness. Theorem: for 1 ≤ k ≤ m and 0 < δ ≤ √(2/k), we get N(δ/2, Z_{1/√k}(m)) ≥ (m/k)^k, (2) where N(ε, Z_{η_W}(m)) is the cardinality of a packing of Z_{η_W}(m) by non-intersecting balls of minimum radius ε (an ε-packing). If we relax denseness (k small), then we can obtain a more stable basis (δ big).
29 Example. Let W = [w_1, ..., w_r] ∈ Z^{m×r}_{1/√k} with w_i ∈ V_k for all i = 1, ..., r, where V_k = { x = (x_1, ..., x_m)^T ≥ 0 : ‖x‖_2 = 1, ‖x‖_0 = k, and x_i ∈ {0, 1/√k} }. Assume ι_{w_i}^T ι_{w_j} = k − 1 for all i ≠ j, where ι_{w_i} is the indicator of the support of w_i. Then the condition number of W becomes cond(W) = √( ρ_max(W^T W) / ρ_min(W^T W) ) = √(rk − r + 1) = O(√(rk)). That is, the stability of W depends on the rank parameter r and the sparsity k of each column vector w_i.
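The example can be reproduced numerically by building such a basis explicitly; this sketch (the constructor name `overlapping_indicator_basis` is an assumption) uses supports of the form {0, ..., k−2} ∪ {k−1+i}, which pairwise share exactly k − 1 indices.

```python
import numpy as np

def overlapping_indicator_basis(m, k, r):
    # columns from V_k whose supports pairwise share exactly k - 1
    # indices: support_i = {0, ..., k-2} union {k-1+i}
    # (requires m >= k - 1 + r)
    W = np.zeros((m, r))
    for i in range(r):
        W[list(range(k - 1)) + [k - 1 + i], i] = 1 / np.sqrt(k)
    return W
```

For r = 4, k = 3 the predicted condition number is √(rk − r + 1) = √9 = 3, which `np.linalg.cond` confirms.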
30 Asymmetric NMF for low rank_+ + column outliers: min_{L,O} { ‖O‖_{l_1} + β‖A − O − L‖_F^2 : L = WSH^T, W ∈ Z^{m×r}_1, H ∈ Z^{n×r}_{η_H} }, where S = diag(λ_1, λ_2, ..., λ_r) with λ_i ≥ λ_j if i < j. ANMF for low rank_+ + column outliers. Assumption: column outliers are tall (they mainly stay in the column direction). W: stable (and sparse); H: dense (and unstable [7]). Good news: ‖w_i h_i^T‖_0 = O(λ_i^2). How do we find a solution of the model? Soft regularization. [7] Studer et al. (2014), Democratic Representations (equal selection of the basis in W)
31 The stability of the ANMF depends on the stability of W (i.e., on ‖W‖ or S(W)): for i ≠ j, ⟨w_i h_i^T, w_j h_j^T⟩ = (w_i^T w_j)(h_i^T h_j) ≤ w_i^T w_j. The denseness of w_i h_i^T depends on the denseness of H (i.e., on η_H): ‖w_i h_i^T‖_{l_0} = ‖w_i‖_0 ‖h_i‖_0 ≥ ‖w_i‖_0 / η_H^2. Since w_i ∈ Z_1(m), a rank-one matrix w_i h_i^T can have a thin structure in the row direction (i.e., in the h_i direction). Column outliers O then need to satisfy max_i ‖row_i(O)‖_0 ≤ ζn < 1/η_H^2 ≤ min_i ‖h_i‖_0.
32 Asymmetric Incoherence Condition. Definition: for H ∈ Z^{n×r}_{η_H}, let Ξ(H) = ‖H‖_{l_∞} / ‖H‖; (3) then Ξ(H) ∈ (1/√(rn), 1]. It measures the stability and denseness of the matrix H: if Ξ(H) is large (≈ 1), then H is stable but sparse; if Ξ(H) is small (≈ 1/√(rn)), then H is dense but unstable.
33 Asymmetric Incoherence Condition. Definition: let L = Σ_{i=1}^r λ_i w_i h_i^T be an ANMF with basis matrix W = [w_1, ..., w_r] and coefficient matrix H = [h_1, ..., h_r]. We define the asymmetric incoherence criterion of L as 1/√(rn) < ainc(L) = Ξ(H)/Ξ(W) < √(rm). It measures the stability and denseness of the matrix L = WSH^T: ainc(L) ≈ √(rm) means W is dense and H is stable; ainc(L) ≈ 1/√(rn) means W is stable and H is dense.
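Reading ‖H‖_{l_∞} as the largest entry and ‖H‖ as the spectral norm (an assumption inferred from the stated range (1/√(rn), 1]), the criterion is two lines of NumPy; the names `xi` and `ainc` are likewise assumptions.

```python
import numpy as np

def xi(M):
    # Xi(M) = ||M||_{l_inf} / ||M||: largest entry over spectral norm
    return np.abs(M).max() / np.linalg.norm(M, 2)

def ainc(W, H):
    # asymmetric incoherence criterion of L = W S H^T
    return xi(H) / xi(W)
```

At one extreme, a maximally dense W (all entries 1/√m) paired with a perfectly stable sparse H (coordinate columns) gives ainc = √(rm), matching the upper end of the range.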
34 Asymmetric Incoherence Condition: Example. Let L = WSH^T = Σ_{i=1}^r λ_i w_i h_i^T with ideal (1/√k)-dense vectors, i.e., W ∈ Z^{m×r}_{1/√k_W}, H ∈ Z^{n×r}_{1/√k_H}, ‖w_i‖_0 = k_W, ‖h_i‖_0 = k_H. Assume ι_{w_i}^T ι_{w_j} = k_W − 1 and ι_{h_i}^T ι_{h_j} = k_H − 1, the worst case for separability. Then Ξ(W) = 1/√(rk_W − r + 1) = 1/cond(W) and Ξ(H) = 1/√(rk_H − r + 1) = 1/cond(H), so ainc(L) = cond(W)/cond(H) = √(rk_W − r + 1)/√(rk_H − r + 1) ≈ √(k_W/k_H) = η_H/η_W.
35 Outline: Asymmetric Soft Regularization
36 ASR: Asymmetric Soft Regularization.
(W̃^{k+1/2}, H̃^{k+1/2}) = argmin_{W̃≥0, H̃≥0} { ‖Ã − Σ_{i=1}^r w̃_i h̃_i^T‖_F^2 : h̃_i ∈ cB_∞^+(n) }
(W^{k+1}, H^{k+1}) = BASIS(W̃^{k+1/2}, H̃^{k+1/2}),
where (W, HS) = BASIS(W̃, H̃) satisfies the following conditions, with an r × r diagonal matrix S: WSH^T = W̃H̃^T; W = [w_1, w_2, ..., w_r] with w_i ∈ Z_1(m); H = [h_1, h_2, ..., h_r] with h_i ∈ Z_{η_{h_i}}(n); and diag_i(S) ≥ diag_j(S) if i < j.
37 Optimization framework for low rank_+ + outliers.
O^{k+1} = argmin_O ‖O‖_1 + β‖A − O − Σ_{i=1}^r w_i^k (h_i^k)^T‖_F^2
(W̃^{k+1/2}, H̃^{k+1/2}) = argmin_{W̃≥0, H̃} { ‖A − O^{k+1} − Σ_{i=1}^r w̃_i h̃_i^T‖_F^2 : h̃_i ∈ cB_∞^+(n) }
(W^{k+1}, H^{k+1}) = BASIS(W̃^{k+1/2}, H̃^{k+1/2}),
where (W, HS) = BASIS(W̃, H̃), W ∈ Z^{m×r}_1, and H ∈ Z^{n×r}_{η_H}. Plug the outlier detector into ASR (asymmetric soft regularization). Now we only need to solve an L2-NMF with a box constraint on H! Asymmetric NMF: L = WSH^T.
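A minimal sketch of one outer iteration of this framework, under stated assumptions: the outlier step is the exact soft-thresholding prox of the l_1 term; the inner NMF solve is replaced by a single multiplicative pass (a stand-in for the box-constrained L2-NMF solver on the next slide); and the rebalancing step only normalizes the columns of W, absorbing the scale S into H. The function names `soft` and `asr_step` are assumptions.

```python
import numpy as np

def soft(X, tau):
    # elementwise soft thresholding (prox of tau * ||.||_1)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def asr_step(A, W, H, beta, c):
    # one outer iteration: outlier update, one multiplicative pass of
    # box-constrained L2-NMF, then a BASIS-style rebalancing that leaves
    # W with unit l2 columns
    O = soft(A - W @ H.T, 1.0 / (2.0 * beta))   # exact prox for the l1 term
    R = A - O
    W = np.maximum(W * (R @ H) / (W @ (H.T @ H) + 1e-12), 0.0)
    H = np.clip(H * (R.T @ W) / (H @ (W.T @ W) + 1e-12), 0.0, c)
    scale = np.linalg.norm(W, axis=0) + 1e-12
    return O, W / scale, H * scale              # scale absorbed into H (= HS)
```

After each step W has unit-norm nonnegative columns and H stays nonnegative, which is the structural invariant the BASIS step is there to maintain.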
38 L2-NMF solver: Hierarchical ALS [8].
min_{W,H} { F(h_1, ..., h_r; w_1, ..., w_r) : w_i ∈ R^m_+, h_i ∈ cB_∞^+(n) },
F(h_1, ..., h_r; w_1, ..., w_r) = ‖Ã − Σ_{i=1}^r w_i h_i^T‖_F^2.
BCD framework. Update H (i = 1, ..., r): min_{h_i ∈ cB_∞^+(n)} F(h_1^{k+1}, ..., h_i, ..., h_r^k; w_1^k, ..., w_r^k). Update W (i = 1, ..., r): min_{w_i ∈ R^m_+} F(h_1^{k+1}, ..., h_r^{k+1}; w_1^{k+1}, ..., w_i, ..., w_r^k).
[8] Cichocki et al. 07
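Each of these column subproblems is a separable quadratic with a closed-form projected minimizer, so one HALS sweep never increases the objective. A minimal sketch (the function name `hals_sweep` is an assumption; the slide's per-block H-then-W ordering is kept per rank-one factor):

```python
import numpy as np

def hals_sweep(A, W, H, c):
    # one BCD sweep of Hierarchical ALS: each h_i (box-constrained to
    # [0, c]^n) and each w_i (nonnegative) is updated by its exact
    # coordinate-wise minimizer, so the objective is non-increasing
    R = A - W @ H.T
    for i in range(W.shape[1]):
        Ri = R + np.outer(W[:, i], H[:, i])   # residual excluding factor i
        h = np.clip(Ri.T @ W[:, i] / max(W[:, i] @ W[:, i], 1e-12), 0.0, c)
        w = np.maximum(Ri @ h / max(h @ h, 1e-12), 0.0)
        W[:, i], H[:, i] = w, h
        R = Ri - np.outer(w, h)
    return W, H
```

Clipping to [0, c] is exact here precisely because the h_i subproblem decouples over coordinates, which is what makes the box constraint on H essentially free inside HALS.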
39 Outline: Numerical Results
40 Numerical results: synthetic image (rank_+(L) = 2). There is a strong correlation between r_eff and the box-constraint level c in cB_∞^+(d): a small c drives r_eff toward rank_+(L).
41 Numerical results (r = 20): Yale B dataset. We have 64 different illumination directions; A is the resulting data matrix.
42 20 basis vectors of ANMF (top) vs. RPCA (bottom)
43 20 basis vectors of Lp-RANMF (top) vs. L1-NMF (bottom)
44 Asymmetric incoherence condition for face images
45 RPCA (λ tuned for each image) vs. RANMF (r = 20)
46 Numerical Results : ainc(l)
56 Thank you!
More informationCompressive Sensing with Random Matrices
Compressive Sensing with Random Matrices Lucas Connell University of Georgia 9 November 017 Lucas Connell (University of Georgia) Compressive Sensing with Random Matrices 9 November 017 1 / 18 Overview
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 9: Dimension Reduction/Word2vec Cho-Jui Hsieh UC Davis May 15, 2018 Principal Component Analysis Principal Component Analysis (PCA) Data
More informationIntroduction to Compressed Sensing
Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral
More informationLecture 8: February 9
0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we
More informationSpace-time signal processing for distributed pattern detection in sensor networks
JOURNAL O SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 6, NO. 1, JANUARY 13 1 Space-time signal processing for distributed pattern detection in sensor networks *Randy Paffenroth, Philip du Toit, Ryan Nong,
More informationarxiv: v1 [cs.it] 26 Oct 2018
Outlier Detection using Generative Models with Theoretical Performance Guarantees arxiv:1810.11335v1 [cs.it] 6 Oct 018 Jirong Yi Anh Duc Le Tianming Wang Xiaodong Wu Weiyu Xu October 9, 018 Abstract This
More informationOn Optimal Frame Conditioners
On Optimal Frame Conditioners Chae A. Clark Department of Mathematics University of Maryland, College Park Email: cclark18@math.umd.edu Kasso A. Okoudjou Department of Mathematics University of Maryland,
More informationFast Algorithms for Structured Robust Principal Component Analysis
Fast Algorithms for Structured Robust Principal Component Analysis Mustafa Ayazoglu, Mario Sznaier and Octavia I. Camps Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA
More informationFrom Compressed Sensing to Matrix Completion and Beyond. Benjamin Recht Department of Computer Sciences University of Wisconsin-Madison
From Compressed Sensing to Matrix Completion and Beyond Benjamin Recht Department of Computer Sciences University of Wisconsin-Madison Netflix Prize One million big ones! Given 100 million ratings on a
More informationA tutorial on sparse modeling. Outline:
A tutorial on sparse modeling. Outline: 1. Why? 2. What? 3. How. 4. no really, why? Sparse modeling is a component in many state of the art signal processing and machine learning tasks. image processing
More informationAutomatic Subspace Learning via Principal Coefficients Embedding
IEEE TRANSACTIONS ON CYBERNETICS 1 Automatic Subspace Learning via Principal Coefficients Embedding Xi Peng, Jiwen Lu, Senior Member, IEEE, Zhang Yi, Fellow, IEEE and Rui Yan, Member, IEEE, arxiv:1411.4419v5
More information1 Non-negative Matrix Factorization (NMF)
2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,
More informationRobust PCA via Outlier Pursuit
Robust PCA via Outlier Pursuit Huan Xu Electrical and Computer Engineering University of Texas at Austin huan.xu@mail.utexas.edu Constantine Caramanis Electrical and Computer Engineering University of
More informationSparse Optimization Lecture: Basic Sparse Optimization Models
Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm
More informationarxiv: v2 [cs.it] 19 Sep 2016
Fast Algorithms for Robust PCA via Gradient Descent Xinyang Yi Dohyung Park Yudong Chen Constantine Caramanis The University of Texas at Austin Cornell University {yixy,dhpark,constantine}@utexas.edu yudong.chen@cornell.edu
More informationEE 381V: Large Scale Optimization Fall Lecture 24 April 11
EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that
More informationLearning representations
Learning representations Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 4/11/2016 General problem For a dataset of n signals X := [ x 1 x
More informationBlock Coordinate Descent for Regularized Multi-convex Optimization
Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationRobust PCA by Manifold Optimization
Journal of Machine Learning Research 19 (2018) 1-39 Submitted 8/17; Revised 10/18; Published 11/18 Robust PCA by Manifold Optimization Teng Zhang Department of Mathematics University of Central Florida
More informationConstructing Explicit RIP Matrices and the Square-Root Bottleneck
Constructing Explicit RIP Matrices and the Square-Root Bottleneck Ryan Cinoman July 18, 2018 Ryan Cinoman Constructing Explicit RIP Matrices July 18, 2018 1 / 36 Outline 1 Introduction 2 Restricted Isometry
More informationAutomatic Rank Determination in Projective Nonnegative Matrix Factorization
Automatic Rank Determination in Projective Nonnegative Matrix Factorization Zhirong Yang, Zhanxing Zhu, and Erkki Oja Department of Information and Computer Science Aalto University School of Science and
More informationCS 229r: Algorithms for Big Data Fall Lecture 19 Nov 5
CS 229r: Algorithms for Big Data Fall 215 Prof. Jelani Nelson Lecture 19 Nov 5 Scribe: Abdul Wasay 1 Overview In the last lecture, we started discussing the problem of compressed sensing where we are given
More informationCompressive Sensing and Beyond
Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered
More informationLecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016
Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen
More informationMatrix Completion for Structured Observations
Matrix Completion for Structured Observations Denali Molitor Department of Mathematics University of California, Los ngeles Los ngeles, C 90095, US Email: dmolitor@math.ucla.edu Deanna Needell Department
More informationSparse & Redundant Signal Representation, and its Role in Image Processing
Sparse & Redundant Signal Representation, and its Role in Michael Elad The CS Department The Technion Israel Institute of technology Haifa 3000, Israel Wave 006 Wavelet and Applications Ecole Polytechnique
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 PCA, NMF Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Summary: PCA PCA is SVD
More information