Supplementary Materials: Bilinear Factor Matrix Norm Minimization for Robust PCA: Algorithms and Applications


Fanhua Shang, Member, IEEE, James Cheng, Yuanyuan Liu, Zhi-Quan Luo, Fellow, IEEE, and Zhouchen Lin, Senior Member, IEEE

In this supplementary material, we give the detailed proofs of some theorems, lemmas and properties. We also provide the stopping criterion of our algorithms and the details of Algorithm 2. In addition, we present two new ADMM algorithms for the image recovery application together with their pseudo-codes, and some additional experimental results on both synthetic and real-world datasets.

NOTATIONS

$\mathbb{R}^{l}$ denotes the $l$-dimensional Euclidean space, and the set of all $m\times n$ matrices with real entries is denoted by $\mathbb{R}^{m\times n}$ (without loss of generality, we assume $m\geq n$ in this paper). $\langle X,Y\rangle=\mathrm{Tr}(X^{T}Y)=\sum_{ij}X_{ij}Y_{ij}$, where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. We assume that the singular values of $X\in\mathbb{R}^{m\times n}$ are ordered as $\sigma_{1}(X)\geq\sigma_{2}(X)\geq\cdots\geq\sigma_{r}(X)>\sigma_{r+1}(X)=\cdots=\sigma_{n}(X)=0$, where $r=\mathrm{rank}(X)$. Then the SVD of $X$ is denoted by $X=U\Sigma V^{T}$, where $\Sigma=\mathrm{diag}(\sigma_{1},\ldots,\sigma_{n})$. $I_{n}$ denotes an identity matrix of size $n\times n$.

Definition 5. For any vector $x\in\mathbb{R}^{l}$, its $l_{p}$-norm for $0<p<\infty$ is defined as

    $\|x\|_{l_{p}}=\Big(\sum_{i=1}^{l}|x_{i}|^{p}\Big)^{1/p},$

where $x_{i}$ is the $i$-th element of $x$. When $p=1$, the $l_{1}$-norm of $x$ is $\|x\|_{l_{1}}=\sum_{i}|x_{i}|$, which is convex, while the $l_{p}$-norm of $x$ is a quasi-norm when $0<p<1$, which is non-convex and violates the triangle inequality. In addition, the $l_{2}$-norm of $x$ is $\|x\|_{l_{2}}=\sqrt{\sum_{i}x_{i}^{2}}$. The above definition can be naturally extended from vectors to matrices in the following form:

    $\|S\|_{l_{p}}=\Big(\sum_{i,j}|s_{i,j}|^{p}\Big)^{1/p}.$

Definition 6. The Schatten-$p$ norm ($0<p<\infty$) of a matrix $X\in\mathbb{R}^{m\times n}$ is defined as follows:

    $\|X\|_{S_{p}}=\Big(\sum_{i=1}^{n}\sigma_{i}^{p}(X)\Big)^{1/p},$

where $\sigma_{i}(X)$ denotes the $i$-th largest singular value of $X$.

F. Shang, J. Cheng and Y. Liu are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong ({fhshang, jcheng, yyliu}@cse.cuhk.edu.hk). Z.-Q. Luo is with the Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China, and the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA (luozq@cuhk.edu.cn and luozq@umn.edu). Z. Lin is with the Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing 100871, P.R. China, and the Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai 200240, P.R. China (zlin@pku.edu.cn).
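To make Definitions 5 and 6 concrete, the following minimal NumPy sketch (our illustration, not part of the authors' code release; the helper names lp_norm and schatten_p_norm are ours) evaluates the $l_p$ (quasi-)norm of a vector and the Schatten-$p$ norm of a matrix via its singular values. For $p=1$ and $p=2$ the latter reduces to the nuclear and Frobenius norms, respectively:

    import numpy as np

    def lp_norm(x, p):
        # l_p (quasi-)norm of a vector (Definition 5); a quasi-norm when 0 < p < 1.
        return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

    def schatten_p_norm(X, p):
        # Schatten-p norm (Definition 6): the l_p norm of the singular value vector.
        return lp_norm(np.linalg.svd(X, compute_uv=False), p)

    X = np.random.randn(5, 4)
    assert np.isclose(schatten_p_norm(X, 1), np.linalg.norm(X, 'nuc'))  # p = 1: nuclear norm
    assert np.isclose(schatten_p_norm(X, 2), np.linalg.norm(X, 'fro'))  # p = 2: Frobenius norm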

In the following, we list some special cases of the Schatten-$p$ norm ($0<p<\infty$). When $0<p<1$, the Schatten-$p$ norm is a quasi-norm, and it is non-convex and violates the triangle inequality. When $p=1$, the Schatten-1 norm, also known as the nuclear norm or trace norm of $X$, is defined as

    $\|X\|_{*}=\sum_{i=1}^{n}\sigma_{i}(X).$

When $p=2$, the Schatten-2 norm is more commonly called the Frobenius norm, defined as

    $\|X\|_{F}=\sqrt{\sum_{i,j}X_{i,j}^{2}}=\sqrt{\sum_{i=1}^{n}\sigma_{i}^{2}(X)}.$

Note that the Frobenius norm is the induced norm of the $l_{2}$-norm on matrices.

APPENDIX A: PROOF OF LEMMA 2

To prove Lemma 2, we first define the doubly stochastic matrix, and give the following lemma.

Definition 7. A square matrix is doubly stochastic if its elements are non-negative real numbers, and the sum of the elements of each row or column is equal to 1.

Lemma 7. Let $P\in\mathbb{R}^{n\times n}$ be a doubly stochastic matrix, and suppose that

    $0\leq x_{1}\leq x_{2}\leq\cdots\leq x_{n},\qquad y_{1}\geq y_{2}\geq\cdots\geq y_{n}\geq 0;$    (26)

then

    $\sum_{i,j=1}^{n}p_{ij}x_{i}y_{j}\geq\sum_{k=1}^{n}x_{k}y_{k}.$

The proof of Lemma 7 is essentially similar to that of the lemma in [1]; thus we only give the following proof sketch for this lemma.

Proof: Using (26), there exist non-negative numbers $\alpha_{i}$ and $\beta_{j}$ for all $1\leq i,j\leq n$ such that

    $x_{k}=\sum_{1\leq i\leq k}\alpha_{i},\qquad y_{k}=\sum_{k\leq j\leq n}\beta_{j},\qquad \text{for all } k=1,\ldots,n.$

Let $\delta_{ij}$ denote the Kronecker delta (i.e., $\delta_{ij}=1$ if $i=j$, and $\delta_{ij}=0$ otherwise); then we have

    $\sum_{k=1}^{n}x_{k}y_{k}-\sum_{i,j=1}^{n}p_{ij}x_{i}y_{j}=\sum_{1\leq i,j\leq n}(\delta_{ij}-p_{ij})x_{i}y_{j}=\sum_{1\leq r,s\leq n}\alpha_{r}\beta_{s}\sum_{r\leq i\leq n,\,1\leq j\leq s}(\delta_{ij}-p_{ij}).$

If $r\leq s$, by the lemma in [1] we know that

    $\sum_{r\leq i\leq n,\,1\leq j\leq s}(\delta_{ij}-p_{ij})\leq 0.$

The same result can be obtained in a similar way for $r>s$. Therefore, we have $\sum_{k=1}^{n}x_{k}y_{k}\leq\sum_{i,j=1}^{n}p_{ij}x_{i}y_{j}$.

Proof of Lemma 2:
Proof: Using the properties of the trace, we know that

    $\mathrm{Tr}(X^{T}Y)=\sum_{i,j=1}^{n}(U_{X})_{ij}^{2}\,\lambda_{i}\tau_{j}.$

Note that $U_{X}$ is a unitary matrix, i.e., $U_{X}^{T}U_{X}=U_{X}U_{X}^{T}=I_{n}$, which implies that $((U_{X})_{ij}^{2})$ is a doubly stochastic matrix. By Lemma 7, we have $\mathrm{Tr}(X^{T}Y)\geq\sum_{i=1}^{n}\lambda_{i}\tau_{i}$.
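As a quick sanity check on Lemma 7 (an illustration of ours, under the ordering of the sequences as reconstructed above), one can sample a doubly stochastic matrix as a convex combination of permutation matrices (Birkhoff's theorem) and verify the inequality numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    # A convex combination of permutation matrices is doubly stochastic (Birkhoff).
    perms = [np.eye(n)[rng.permutation(n)] for _ in range(10)]
    w = rng.random(10)
    w /= w.sum()
    P = sum(wi * Pi for wi, Pi in zip(w, perms))

    x = np.sort(rng.random(n))           # 0 <= x_1 <= ... <= x_n
    y = np.sort(rng.random(n))[::-1]     # y_1 >= ... >= y_n >= 0
    assert x @ P @ y >= x @ y - 1e-12    # sum_ij p_ij x_i y_j >= sum_k x_k y_k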

APPENDIX B: PROOFS OF THEOREMS 1 AND 2

Proof of Theorem 1:
Proof: Using Lemma 3, for any factor matrices $U\in\mathbb{R}^{m\times d}$ and $V\in\mathbb{R}^{n\times d}$ with the constraint $X=UV^{T}$, we have

    $\frac{(\|U\|_{*}+\|V\|_{*})^{2}}{4}\geq\|X\|_{S_{1/2}}.$

On the other hand, let $U^{\star}=L_{X}\Sigma_{X}^{1/2}$ and $V^{\star}=R_{X}\Sigma_{X}^{1/2}$, where $X=L_{X}\Sigma_{X}R_{X}^{T}$ is the SVD of $X$ as in Lemma 3; then we have $X=U^{\star}(V^{\star})^{T}$ and $\|X\|_{S_{1/2}}=(\|U^{\star}\|_{*}+\|V^{\star}\|_{*})^{2}/4$. Therefore, we have

    $\min_{U,V:\,X=UV^{T}}\frac{(\|U\|_{*}+\|V\|_{*})^{2}}{4}=\|X\|_{S_{1/2}}.$

This completes the proof.

Proof of Theorem 2:
Proof: Using Lemma 4, for any $U\in\mathbb{R}^{m\times d}$ and $V\in\mathbb{R}^{n\times d}$ with the constraint $X=UV^{T}$, we have

    $\frac{\|U\|_{F}^{2}+2\|V\|_{*}}{3}\geq\|X\|_{S_{2/3}}^{2/3}.$

On the other hand, let $U^{\star}=L_{X}\Sigma_{X}^{1/3}$ and $V^{\star}=R_{X}\Sigma_{X}^{2/3}$, where $X=L_{X}\Sigma_{X}R_{X}^{T}$ is the SVD of $X$ as in Lemma 4; then we have $X=U^{\star}(V^{\star})^{T}$ and $\|X\|_{S_{2/3}}=\big[(\|U^{\star}\|_{F}^{2}+2\|V^{\star}\|_{*})/3\big]^{3/2}$. Thus, we have

    $\min_{U,V:\,X=UV^{T}}\frac{\|U\|_{F}^{2}+2\|V\|_{*}}{3}=\|X\|_{S_{2/3}}^{2/3}.$

This completes the proof.

APPENDIX C: PROOF OF PROPERTY 4

Proof: The proof of Property 4 involves some properties of the $l_{p}$-norm, which we recall as follows. For any vector $x\in\mathbb{R}^{n}$ and $0<p_{2}\leq p_{1}\leq 1$, the following inequalities hold:

    $\|x\|_{l_{1}}\leq\|x\|_{l_{p_{1}}},\qquad \|x\|_{l_{p_{1}}}\leq\|x\|_{l_{p_{2}}}\leq n^{\frac{1}{p_{2}}-\frac{1}{p_{1}}}\|x\|_{l_{p_{1}}}.$

Let $X$ be an $m\times n$ matrix of rank $r$, and denote its compact SVD by $X=U_{m\times r}\Sigma_{r\times r}V_{n\times r}^{T}$. By Theorems 1 and 2, and the properties of the $l_{p}$-norm mentioned above, we have

    $\|X\|_{*}=\|\mathrm{diag}(\Sigma_{r\times r})\|_{l_{1}}\leq\|\mathrm{diag}(\Sigma_{r\times r})\|_{l_{1/2}}=\|X\|_{\text{D-N}}\leq\mathrm{rank}(X)\,\|X\|_{*},$
    $\|X\|_{*}=\|\mathrm{diag}(\Sigma_{r\times r})\|_{l_{1}}\leq\|\mathrm{diag}(\Sigma_{r\times r})\|_{l_{2/3}}=\|X\|_{\text{F-N}}\leq\sqrt{\mathrm{rank}(X)}\,\|X\|_{*}.$

In addition,

    $\|X\|_{\text{F-N}}=\|\mathrm{diag}(\Sigma_{r\times r})\|_{l_{2/3}}\leq\|\mathrm{diag}(\Sigma_{r\times r})\|_{l_{1/2}}=\|X\|_{\text{D-N}}.$

Therefore, we have

    $\|X\|_{*}\leq\|X\|_{\text{F-N}}\leq\|X\|_{\text{D-N}}\leq\mathrm{rank}(X)\,\|X\|_{*}.$

This completes the proof.
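The closed-form factorizations used in the proofs of Theorems 1 and 2 are easy to check numerically. The sketch below (our illustration, with the $S_{1/2}$ and $S_{2/3}$ expressions as reconstructed above) builds $U^{\star}$ and $V^{\star}$ from the SVD and confirms that they attain the Schatten quasi-norm values:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 5)) @ rng.standard_normal((5, 6))  # rank <= 5
    L, s, Rt = np.linalg.svd(X, full_matrices=False)
    R = Rt.T
    nuc = lambda A: np.linalg.norm(A, 'nuc')

    # Theorem 1: U* = L Sigma^{1/2}, V* = R Sigma^{1/2} attains ||X||_{S_{1/2}}.
    U1, V1 = L * s**0.5, R * s**0.5
    assert np.allclose(U1 @ V1.T, X)
    assert np.isclose((nuc(U1) + nuc(V1))**2 / 4, np.sum(s**0.5)**2)

    # Theorem 2: U* = L Sigma^{1/3}, V* = R Sigma^{2/3} attains ||X||_{S_{2/3}}.
    U2, V2 = L * s**(1/3), R * s**(2/3)
    assert np.allclose(U2 @ V2.T, X)
    assert np.isclose((np.linalg.norm(U2, 'fro')**2 + 2 * nuc(V2)) / 3, np.sum(s**(2/3)))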

APPENDIX D: SOLVING (18) VIA ADMM

Similar to Algorithm 1, we also propose an efficient algorithm based on the alternating direction method of multipliers (ADMM) to solve (18), whose augmented Lagrangian function is given by

    $\mathcal{L}_{\mu}(U,V,L,S,\widehat{V},Y_{1},Y_{2},Y_{3})=\frac{\lambda(\|U\|_{F}^{2}+2\|\widehat{V}\|_{*})}{3}+\|\mathcal{P}_{\Omega}(S)\|_{l_{2/3}}^{2/3}+\langle Y_{1},\widehat{V}-V\rangle+\langle Y_{2},UV^{T}-L\rangle+\langle Y_{3},L+S-D\rangle+\frac{\mu}{2}\big(\|UV^{T}-L\|_{F}^{2}+\|\widehat{V}-V\|_{F}^{2}+\|L+S-D\|_{F}^{2}\big),$

where $Y_{1}\in\mathbb{R}^{n\times d}$, $Y_{2}\in\mathbb{R}^{m\times n}$ and $Y_{3}\in\mathbb{R}^{m\times n}$ are the matrices of Lagrange multipliers.

Algorithm 2: ADMM for solving the S+L$_{2/3}$ problem (18)
Input: $D\in\mathbb{R}^{m\times n}$, the given rank $d$ and $\lambda$.
Initialize: $\mu_{0}$, $\rho>1$, $k=0$, and $\epsilon$.
1: while not converged do
2:   while not converged do
3:     Update $U_{k+1}$ and $V_{k+1}$ by (39) and (40), respectively.
4:     Update $\widehat{V}_{k+1}$ by $\widehat{V}_{k+1}=\mathcal{D}_{2\lambda/(3\mu_{k})}(V_{k+1}-Y_{1}^{k}/\mu_{k})$.
5:     Update $L_{k+1}$ by (43) and $S_{k+1}$ by the corresponding update in the main paper, respectively.
6:   end while // Inner loop
7:   Update the multipliers $Y_{1}^{k+1}$, $Y_{2}^{k+1}$ and $Y_{3}^{k+1}$ by $Y_{1}^{k+1}=Y_{1}^{k}+\mu_{k}(\widehat{V}_{k+1}-V_{k+1})$, $Y_{2}^{k+1}=Y_{2}^{k}+\mu_{k}(U_{k+1}V_{k+1}^{T}-L_{k+1})$, and $Y_{3}^{k+1}=Y_{3}^{k}+\mu_{k}(L_{k+1}+S_{k+1}-D)$.
8:   Update $\mu_{k+1}$ by $\mu_{k+1}=\rho\mu_{k}$.
9:   $k\leftarrow k+1$.
10: end while // Outer loop
Output: $U_{k+1}$ and $V_{k+1}$.

Update of $U_{k+1}$ and $V_{k+1}$: For updating $U_{k+1}$ and $V_{k+1}$, we consider the following optimization problems:

    $\min_{U}\ \frac{\lambda\|U\|_{F}^{2}}{3}+\frac{\mu_{k}}{2}\|UV_{k}^{T}-L_{k}+\mu_{k}^{-1}Y_{2}^{k}\|_{F}^{2},$    (37)
    $\min_{V}\ \frac{\mu_{k}}{2}\|\widehat{V}_{k}-V+\mu_{k}^{-1}Y_{1}^{k}\|_{F}^{2}+\frac{\mu_{k}}{2}\|U_{k+1}V^{T}-L_{k}+\mu_{k}^{-1}Y_{2}^{k}\|_{F}^{2},$    (38)

and their optimal solutions can be given by

    $U_{k+1}=\mu_{k}P_{k}V_{k}\Big(\frac{2\lambda}{3}I_{d}+\mu_{k}V_{k}^{T}V_{k}\Big)^{-1},$    (39)
    $V_{k+1}=\big[\widehat{V}_{k}+\mu_{k}^{-1}Y_{1}^{k}+P_{k}^{T}U_{k+1}\big]\big(I_{d}+U_{k+1}^{T}U_{k+1}\big)^{-1},$    (40)

where $P_{k}=L_{k}-\mu_{k}^{-1}Y_{2}^{k}$.

Update of $\widehat{V}_{k+1}$: To update $\widehat{V}_{k+1}$, we fix the other variables and solve the following optimization problem:

    $\min_{\widehat{V}}\ \frac{2\lambda}{3}\|\widehat{V}\|_{*}+\frac{\mu_{k}}{2}\|\widehat{V}-V_{k+1}+Y_{1}^{k}/\mu_{k}\|_{F}^{2}.$    (41)

Similar to (23) and (24) in this paper, the closed-form solution of (41) can also be obtained by the SVT operator [2], defined as follows.

Definition 8. Let $Y$ be a matrix of size $m\times n$ ($m\geq n$), and $U_{Y}\Sigma_{Y}V_{Y}^{T}$ be its SVD. Then the singular value thresholding (SVT) operator $\mathcal{D}_{\tau}$ is defined as [2]:

    $\mathcal{D}_{\tau}(Y)=U_{Y}\mathcal{S}_{\tau}(\Sigma_{Y})V_{Y}^{T},$

where $\mathcal{S}_{\tau}(x)=\max(|x|-\tau,0)\,\mathrm{sgn}(x)$ is the soft shrinkage operator [3], [4], [5].

Update of $L_{k+1}$: For updating $L_{k+1}$, we consider the following optimization problem:

    $\min_{L}\ \|U_{k+1}V_{k+1}^{T}-L+\mu_{k}^{-1}Y_{2}^{k}\|_{F}^{2}+\|L+S_{k}-D+\mu_{k}^{-1}Y_{3}^{k}\|_{F}^{2}.$    (42)

Since (42) is a least squares problem, its closed-form solution is given by

    $L_{k+1}=\frac{1}{2}\big(U_{k+1}V_{k+1}^{T}+\mu_{k}^{-1}Y_{2}^{k}-S_{k}+D-\mu_{k}^{-1}Y_{3}^{k}\big).$    (43)

Together with the update scheme of $S_{k+1}$, as stated in the main paper, we develop an efficient ADMM algorithm to solve the Frobenius/nuclear hybrid norm penalized RPCA problem (18), as outlined in Algorithm 2.
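For reference, a minimal NumPy implementation of the SVT operator $\mathcal{D}_{\tau}$ of Definition 8 (our sketch, not the authors' released code) is:

    import numpy as np

    def soft_shrink(x, tau):
        # Soft shrinkage operator: S_tau(x) = max(|x| - tau, 0) * sgn(x).
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def svt(Y, tau):
        # Singular value thresholding D_tau(Y): soft-threshold the singular values.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return U @ np.diag(soft_shrink(s, tau)) @ Vt

In Algorithm 2 this operator is applied with threshold $2\lambda/(3\mu_{k})$ in the $\widehat{V}$-update, i.e., svt(V - Y1/mu, 2*lam/(3*mu)) under the reconstruction above.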

APPENDIX E: PROOF OF THEOREM 3

In this part, we first prove the boundedness of the multipliers and of some variables of Algorithm 1, and then we analyze the convergence of Algorithm 1. To prove the boundedness, we first give the following lemma.

Lemma 8 ([6]). Let $\mathcal{H}$ be a real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ and a corresponding norm $\|\cdot\|$, and let $y\in\partial\|x\|$, where $\partial f(x)$ denotes the subgradient of $f(x)$. Then $\|y\|^{*}=1$ if $x\neq 0$, and $\|y\|^{*}\leq 1$ if $x=0$, where $\|\cdot\|^{*}$ is the dual norm of $\|\cdot\|$. For instance, the dual norm of the nuclear norm is the spectral norm $\|\cdot\|_{2}$, i.e., the largest singular value.

Lemma 9 (Boundedness). Let $Y_{1}^{k+1}=Y_{1}^{k}+\mu_{k}(\widehat{U}_{k+1}-U_{k+1})$, $Y_{2}^{k+1}=Y_{2}^{k}+\mu_{k}(\widehat{V}_{k+1}-V_{k+1})$, $Y_{3}^{k+1}=Y_{3}^{k}+\mu_{k}(U_{k+1}V_{k+1}^{T}-L_{k+1})$ and $Y_{4}^{k+1}=Y_{4}^{k}+\mu_{k}(L_{k+1}+S_{k+1}-D)$. Suppose that $\mu_{k}$ is non-decreasing and $\sum_{k=0}^{\infty}\mu_{k+1}/\mu_{k}^{4/3}<\infty$; then the sequences $\{U_{k},V_{k}\}$, $\{\widehat{U}_{k},\widehat{V}_{k}\}$, $\{Y_{1}^{k},Y_{2}^{k},Y_{3}^{k}\}$, $\{Y_{4}^{k}/\mu_{k-1}^{1/3}\}$, $\{L_{k}\}$ and $\{S_{k}\}$ produced by Algorithm 1 are all bounded.

Proof: Let $\mathbf{U}_{k}:=(U_{k},V_{k})$, $\widehat{\mathbf{V}}_{k}:=(\widehat{U}_{k},\widehat{V}_{k})$ and $\mathbf{Y}^{k}:=(Y_{1}^{k},Y_{2}^{k},Y_{3}^{k},Y_{4}^{k})$. By the first-order optimality conditions of the augmented Lagrangian function of (17) with respect to $\widehat{U}$ and $\widehat{V}$ (i.e., problems (23) and (24)), we have $0\in\partial_{\widehat{U}}\mathcal{L}_{\mu_{k}}(\mathbf{U}_{k+1},\widehat{\mathbf{V}}_{k+1},L_{k+1},S_{k+1},\mathbf{Y}^{k})$ and $0\in\partial_{\widehat{V}}\mathcal{L}_{\mu_{k}}(\mathbf{U}_{k+1},\widehat{\mathbf{V}}_{k+1},L_{k+1},S_{k+1},\mathbf{Y}^{k})$, i.e.,

    $-Y_{1}^{k+1}\in\lambda\,\partial\|\widehat{U}_{k+1}\|_{*}\quad\text{and}\quad -Y_{2}^{k+1}\in\lambda\,\partial\|\widehat{V}_{k+1}\|_{*},$

where $Y_{1}^{k+1}=\mu_{k}(\widehat{U}_{k+1}-U_{k+1}+Y_{1}^{k}/\mu_{k})$ and $Y_{2}^{k+1}=\mu_{k}(\widehat{V}_{k+1}-V_{k+1}+Y_{2}^{k}/\mu_{k})$. By Lemma 8, we obtain $\|Y_{1}^{k+1}\|_{2}\leq\lambda$ and $\|Y_{2}^{k+1}\|_{2}\leq\lambda$, which implies that the sequences $\{Y_{1}^{k}\}$ and $\{Y_{2}^{k}\}$ are bounded.

Next we prove the boundedness of $\{Y_{4}^{k}/\mu_{k-1}^{1/3}\}$. Using (27), it is easy to show that $\mathcal{P}_{\Omega^{c}}(Y_{4}^{k+1})=0$, and hence $\|Y_{4}^{k+1}\|_{F}=\|\mathcal{P}_{\Omega}(Y_{4}^{k+1})\|_{F}$. Let $A=D-L_{k+1}-\mu_{k}^{-1}Y_{4}^{k}$ and $\Phi(S):=\|\mathcal{P}_{\Omega}(S)\|_{l_{1/2}}^{1/2}$. To bound $\{\mathcal{P}_{\Omega}(Y_{4}^{k})/\mu_{k-1}^{1/3}\}$, we consider the following two cases.

Case 1: $|a_{ij}|$ exceeds the thresholding value of the half-thresholding operator in Proposition 1, $(i,j)\in\Omega$. Then, by Proposition 1, $[S_{k+1}]_{ij}\neq 0$. The first-order optimality condition of (27) implies that $[\nabla\Phi(S_{k+1})]_{ij}+[Y_{4}^{k+1}]_{ij}=0$, where $[\nabla\Phi(S_{k+1})]_{ij}$ denotes the gradient of the penalty $\Phi(S)$ at $[S_{k+1}]_{ij}$. Since $\partial|s_{ij}|^{1/2}/\partial s_{ij}=\mathrm{sign}(s_{ij})/(2\sqrt{|s_{ij}|})$, then $|[Y_{4}^{k+1}]_{ij}|=1/(2\sqrt{|[S_{k+1}]_{ij}|})$. Using Proposition 1, we have

    $\frac{|[Y_{4}^{k+1}]_{ij}|}{\mu_{k}^{1/3}}=\frac{1}{2\mu_{k}^{1/3}\sqrt{|[S_{k+1}]_{ij}|}}=\frac{1}{2\mu_{k}^{1/3}\sqrt{\frac{2|a_{ij}|}{3}\big(1+\cos\big(\frac{2\pi}{3}-\frac{2}{3}\phi_{\gamma}(a_{ij})\big)\big)}},$

which is bounded, where $\gamma=2/\mu_{k}$.

Case 2: otherwise, using Proposition 1, we have $[S_{k+1}]_{ij}=0$. Since

    $[Y_{4}^{k+1}/\mu_{k}]_{ij}=[L_{k+1}+S_{k+1}-D+\mu_{k}^{-1}Y_{4}^{k}]_{ij}=[L_{k+1}-D+\mu_{k}^{-1}Y_{4}^{k}]_{ij}=-a_{ij},$

then $|[Y_{4}^{k+1}]_{ij}|=\mu_{k}|a_{ij}|$ is of order at most $\mu_{k}^{1/3}$, and $\|Y_{4}^{k+1}\|_{F}/\mu_{k}^{1/3}=\|\mathcal{P}_{\Omega}(Y_{4}^{k+1})\|_{F}/\mu_{k}^{1/3}=O(\sqrt{|\Omega|})$. Therefore, $\{Y_{4}^{k}/\mu_{k-1}^{1/3}\}$ is bounded.

By the iterative scheme of Algorithm 1, we have

    $\mathcal{L}_{\mu_{k}}(\mathbf{U}_{k+1},\widehat{\mathbf{V}}_{k+1},L_{k+1},S_{k+1},\mathbf{Y}^{k})\leq\mathcal{L}_{\mu_{k}}(\mathbf{U}_{k},\widehat{\mathbf{V}}_{k},L_{k},S_{k},\mathbf{Y}^{k})=\mathcal{L}_{\mu_{k-1}}(\mathbf{U}_{k},\widehat{\mathbf{V}}_{k},L_{k},S_{k},\mathbf{Y}^{k-1})+\alpha_{k}\sum_{j=1}^{3}\|Y_{j}^{k}-Y_{j}^{k-1}\|_{F}^{2}+\beta_{k}\frac{\|Y_{4}^{k}-Y_{4}^{k-1}\|_{F}^{2}}{\mu_{k-1}^{2/3}},$

where $\alpha_{k}=\frac{\mu_{k-1}+\mu_{k}}{2\mu_{k-1}^{2}}$ and $\beta_{k}=\frac{\mu_{k-1}+\mu_{k}}{2\mu_{k-1}^{4/3}}$, and the above equality holds due to the definition of the augmented Lagrangian function $\mathcal{L}_{\mu}(U,V,L,S,\widehat{U},\widehat{V},\{Y_{i}\})$. Since $\mu_{k}$ is non-decreasing and $\sum_{k=1}^{\infty}\mu_{k}/\mu_{k-1}^{4/3}<\infty$, then

    $\sum_{k=1}^{\infty}\alpha_{k}=\sum_{k=1}^{\infty}\frac{\mu_{k-1}+\mu_{k}}{2\mu_{k-1}^{2}}\leq\sum_{k=1}^{\infty}\frac{\mu_{k}}{\mu_{k-1}^{2}}<\infty,\qquad \sum_{k=1}^{\infty}\beta_{k}=\sum_{k=1}^{\infty}\frac{\mu_{k-1}+\mu_{k}}{2\mu_{k-1}^{4/3}}\leq\sum_{k=1}^{\infty}\frac{\mu_{k}}{\mu_{k-1}^{4/3}}<\infty.$

Since $\{\|Y_{4}^{k}\|_{F}/\mu_{k-1}^{1/3}\}$ is bounded, and $\mu_{k}=\rho\mu_{k-1}$ with $\rho>1$, then $\{\|Y_{4}^{k-1}\|_{F}/\mu_{k-1}^{1/3}\}$ is also bounded, which implies that $\{\|Y_{4}^{k}-Y_{4}^{k-1}\|_{F}^{2}/\mu_{k-1}^{2/3}\}$ is bounded. Then $\{\mathcal{L}_{\mu_{k}}(\mathbf{U}_{k+1},\widehat{\mathbf{V}}_{k+1},L_{k+1},S_{k+1},\mathbf{Y}^{k})\}$ is upper-bounded due to the boundedness of the sequences of all the Lagrange multiplier terms, i.e., $\{Y_{1}^{k}\}$, $\{Y_{2}^{k}\}$, $\{Y_{3}^{k}\}$ and $\{\|Y_{4}^{k}-Y_{4}^{k-1}\|_{F}^{2}/\mu_{k-1}^{2/3}\}$. Consequently,

    $\lambda\big(\|\widehat{U}_{k}\|_{*}+\|\widehat{V}_{k}\|_{*}\big)+\|\mathcal{P}_{\Omega}(S_{k})\|_{l_{1/2}}^{1/2}=\mathcal{L}_{\mu_{k-1}}(\mathbf{U}_{k},\widehat{\mathbf{V}}_{k},L_{k},S_{k},\mathbf{Y}^{k-1})-\sum_{i=1}^{4}\frac{\|Y_{i}^{k}\|_{F}^{2}-\|Y_{i}^{k-1}\|_{F}^{2}}{2\mu_{k-1}}$

is upper-bounded (note that the above equality holds due to the definition of $\mathcal{L}_{\mu}(U,V,L,S,\widehat{U},\widehat{V},\{Y_{i}\})$); thus $\{S_{k}\}$, $\{\widehat{U}_{k}\}$ and $\{\widehat{V}_{k}\}$ are all bounded. Similarly, by $U_{k}=\widehat{U}_{k}-[Y_{1}^{k}-Y_{1}^{k-1}]/\mu_{k-1}$, $V_{k}=\widehat{V}_{k}-[Y_{2}^{k}-Y_{2}^{k-1}]/\mu_{k-1}$, $L_{k}=U_{k}V_{k}^{T}-[Y_{3}^{k}-Y_{3}^{k-1}]/\mu_{k-1}$ and the boundedness of $\{\widehat{U}_{k}\}$, $\{\widehat{V}_{k}\}$, $\{Y_{i}^{k}\}_{i=1,2,3}$ and $\{Y_{4}^{k}/\mu_{k-1}^{1/3}\}$, the sequences $\{U_{k}\}$, $\{V_{k}\}$ and $\{L_{k}\}$ are also bounded. This means that each bounded sequence must have a convergent subsequence, due to the Bolzano-Weierstrass theorem.

Proof of Theorem 3:
Proof: (I) Note that $\widehat{U}_{k+1}-U_{k+1}=[Y_{1}^{k+1}-Y_{1}^{k}]/\mu_{k}$, $\widehat{V}_{k+1}-V_{k+1}=[Y_{2}^{k+1}-Y_{2}^{k}]/\mu_{k}$, $U_{k+1}V_{k+1}^{T}-L_{k+1}=[Y_{3}^{k+1}-Y_{3}^{k}]/\mu_{k}$ and $L_{k+1}+S_{k+1}-D=[Y_{4}^{k+1}-Y_{4}^{k}]/\mu_{k}$. Due to the boundedness of $\{Y_{1}^{k}\}$, $\{Y_{2}^{k}\}$, $\{Y_{3}^{k}\}$ and $\{Y_{4}^{k}/\mu_{k-1}^{1/3}\}$, the non-decreasing property of $\{\mu_{k}\}$, and $\sum_{k=0}^{\infty}\mu_{k+1}/\mu_{k}^{4/3}<\infty$, we have

    $\sum_{k=0}^{\infty}\|\widehat{U}_{k+1}-U_{k+1}\|_{F}=\sum_{k=0}^{\infty}\frac{\|Y_{1}^{k+1}-Y_{1}^{k}\|_{F}}{\mu_{k}}<\infty,\qquad \sum_{k=0}^{\infty}\|\widehat{V}_{k+1}-V_{k+1}\|_{F}=\sum_{k=0}^{\infty}\frac{\|Y_{2}^{k+1}-Y_{2}^{k}\|_{F}}{\mu_{k}}<\infty,$
    $\sum_{k=0}^{\infty}\|L_{k+1}-U_{k+1}V_{k+1}^{T}\|_{F}=\sum_{k=0}^{\infty}\frac{\|Y_{3}^{k+1}-Y_{3}^{k}\|_{F}}{\mu_{k}}<\infty,\qquad \sum_{k=0}^{\infty}\|L_{k+1}+S_{k+1}-D\|_{F}=\sum_{k=0}^{\infty}\frac{\|Y_{4}^{k+1}-Y_{4}^{k}\|_{F}}{\mu_{k}}\leq\sum_{k=0}^{\infty}\frac{C}{\mu_{k}^{2/3}}<\infty,$

which implies that

    $\lim_{k\to\infty}\|\widehat{U}_{k+1}-U_{k+1}\|_{F}=0,\quad \lim_{k\to\infty}\|\widehat{V}_{k+1}-V_{k+1}\|_{F}=0,\quad \lim_{k\to\infty}\|L_{k+1}-U_{k+1}V_{k+1}^{T}\|_{F}=0,\quad \lim_{k\to\infty}\|L_{k+1}+S_{k+1}-D\|_{F}=0.$

Hence, $\{(U_{k},V_{k},\widehat{U}_{k},\widehat{V}_{k},L_{k},S_{k})\}$ approaches a feasible solution. In the following, we prove that the sequences $\{U_{k}\}$ and $\{V_{k}\}$ are Cauchy sequences.

Using $Y_{1}^{k}=Y_{1}^{k-1}+\mu_{k-1}(\widehat{U}_{k}-U_{k})$, $Y_{2}^{k}=Y_{2}^{k-1}+\mu_{k-1}(\widehat{V}_{k}-V_{k})$ and $Y_{3}^{k}=Y_{3}^{k-1}+\mu_{k-1}(U_{k}V_{k}^{T}-L_{k})$, the first-order optimality condition of (19) with respect to $U$ gives $U_{k+1}(V_{k}^{T}V_{k}+I_{d})=\widehat{U}_{k}+\mu_{k}^{-1}Y_{1}^{k}+(L_{k}-\mu_{k}^{-1}Y_{3}^{k})V_{k}$, and hence

    $U_{k+1}-U_{k}=\Big[\frac{Y_{1}^{k}-Y_{1}^{k-1}}{\mu_{k-1}}+\frac{Y_{1}^{k}}{\mu_{k}}+\Big(\frac{Y_{3}^{k-1}-Y_{3}^{k}}{\mu_{k-1}}-\frac{Y_{3}^{k}}{\mu_{k}}\Big)V_{k}\Big]\big(V_{k}^{T}V_{k}+I_{d}\big)^{-1}.$    (44)

Similarly, the first-order optimality condition of (20) with respect to $V$ gives $V_{k+1}(U_{k+1}^{T}U_{k+1}+I_{d})=\widehat{V}_{k}+\mu_{k}^{-1}Y_{2}^{k}+(L_{k}^{T}-\mu_{k}^{-1}(Y_{3}^{k})^{T})U_{k+1}$, and hence

    $V_{k+1}-V_{k}=\Big[V_{k}(U_{k}-U_{k+1})^{T}U_{k+1}+\frac{Y_{2}^{k}-Y_{2}^{k-1}}{\mu_{k-1}}+\frac{Y_{2}^{k}}{\mu_{k}}+\Big(\frac{(Y_{3}^{k-1}-Y_{3}^{k})^{T}}{\mu_{k-1}}-\frac{(Y_{3}^{k})^{T}}{\mu_{k}}\Big)U_{k+1}\Big]\big(U_{k+1}^{T}U_{k+1}+I_{d}\big)^{-1}.$    (45)

Recall that $\mu_{k}=\rho\mu_{k-1}$. By (44), we have

    $\sum_{k=0}^{\infty}\|U_{k+1}-U_{k}\|_{F}\leq\sum_{k=0}^{\infty}\frac{\vartheta_{1}}{\mu_{k}}<\infty,$

where the constant $\vartheta_{1}$ is defined as

    $\vartheta_{1}=\max_{k}\Big\{\big[\rho\|Y_{1}^{k}-Y_{1}^{k-1}\|_{F}+\|Y_{1}^{k}\|_{F}+\big(\rho\|Y_{3}^{k}-Y_{3}^{k-1}\|_{F}+\|Y_{3}^{k}\|_{F}\big)\|V_{k}\|_{F}\big]\,\|(V_{k}^{T}V_{k}+I_{d})^{-1}\|_{F},\ k=1,2,\ldots\Big\}.$

In addition, by (45),

    $\sum_{k=0}^{\infty}\|V_{k+1}-V_{k}\|_{F}\leq\sum_{k=0}^{\infty}\Big(\vartheta_{2}\|U_{k+1}-U_{k}\|_{F}+\frac{\vartheta_{3}}{\mu_{k}}\Big)<\infty,$

where the constants $\vartheta_{2}$ and $\vartheta_{3}$ are defined as

    $\vartheta_{2}=\max_{k}\big\{\|V_{k}\|_{F}\|U_{k+1}\|_{F}\,\|(U_{k+1}^{T}U_{k+1}+I_{d})^{-1}\|_{F},\ k=1,2,\ldots\big\},$
    $\vartheta_{3}=\max_{k}\Big\{\big[\rho\|Y_{2}^{k}-Y_{2}^{k-1}\|_{F}+\|Y_{2}^{k}\|_{F}+\big(\rho\|Y_{3}^{k}-Y_{3}^{k-1}\|_{F}+\|Y_{3}^{k}\|_{F}\big)\|U_{k+1}\|_{F}\big]\,\|(U_{k+1}^{T}U_{k+1}+I_{d})^{-1}\|_{F},\ k=1,2,\ldots\Big\}.$

Consequently, both $\{U_{k}\}$ and $\{V_{k}\}$ are convergent sequences. Moreover, it is not difficult to verify that $\{U_{k}\}$ and $\{V_{k}\}$ are both Cauchy sequences. Similarly, $\{\widehat{U}_{k}\}$, $\{\widehat{V}_{k}\}$, $\{S_{k}\}$ and $\{L_{k}\}$ are also Cauchy sequences. Practically, the stopping criterion of Algorithm 1 is satisfied within a finite number of iterations.

(II) Let $(U^{\star},V^{\star},\widehat{U}^{\star},\widehat{V}^{\star},L^{\star},S^{\star})$ be a critical point of (17), and $\Phi(S)=\|\mathcal{P}_{\Omega}(S)\|_{l_{1/2}}^{1/2}$. Applying Fermat's rule as in [7], we then obtain

    $0\in\lambda\,\partial\|\widehat{U}^{\star}\|_{*}+Y_{1}^{\star}\quad\text{and}\quad 0\in\lambda\,\partial\|\widehat{V}^{\star}\|_{*}+Y_{2}^{\star},$
    $0\in\partial\Phi(S^{\star})+\mathcal{P}_{\Omega}(Y_{4}^{\star})\quad\text{and}\quad \mathcal{P}_{\Omega^{c}}(Y_{4}^{\star})=0,$
    $L^{\star}=U^{\star}(V^{\star})^{T},\quad \widehat{U}^{\star}=U^{\star},\quad \widehat{V}^{\star}=V^{\star},\quad L^{\star}+S^{\star}=D.$

Applying Fermat's rule to (27), we have

    $0\in\partial\Phi(S_{k+1})+\mathcal{P}_{\Omega}(Y_{4}^{k+1}),\qquad \mathcal{P}_{\Omega^{c}}(Y_{4}^{k+1})=0.$    (46)

In addition, the first-order optimality conditions for (23) and (24) are given by

    $0\in\lambda\,\partial\|\widehat{U}_{k+1}\|_{*}+Y_{1}^{k+1},\qquad 0\in\lambda\,\partial\|\widehat{V}_{k+1}\|_{*}+Y_{2}^{k+1}.$    (47)

Since $\{U_{k}\}$, $\{V_{k}\}$, $\{\widehat{U}_{k}\}$, $\{\widehat{V}_{k}\}$, $\{L_{k}\}$ and $\{S_{k}\}$ are Cauchy sequences, then

    $\lim_{k\to\infty}\|U_{k+1}-U_{k}\|=0,\quad \lim_{k\to\infty}\|V_{k+1}-V_{k}\|=0,\quad \lim_{k\to\infty}\|\widehat{V}_{k+1}-\widehat{V}_{k}\|=0,\quad \lim_{k\to\infty}\|L_{k+1}-L_{k}\|=0,\quad \lim_{k\to\infty}\|\widehat{U}_{k+1}-\widehat{U}_{k}\|=0,\quad \lim_{k\to\infty}\|S_{k+1}-S_{k}\|=0.$

Let $U^{\star}$, $V^{\star}$, $\widehat{U}^{\star}$, $\widehat{V}^{\star}$, $S^{\star}$ and $L^{\star}$ be their limit points, respectively. Together with the results in (I), we then have $\widehat{U}^{\star}=U^{\star}$, $\widehat{V}^{\star}=V^{\star}$, $L^{\star}=U^{\star}(V^{\star})^{T}$ and $L^{\star}+S^{\star}=D$. Using (46) and (47), the following holds:

    $0\in\lambda\,\partial\|\widehat{U}^{\star}\|_{*}+Y_{1}^{\star}\quad\text{and}\quad 0\in\lambda\,\partial\|\widehat{V}^{\star}\|_{*}+Y_{2}^{\star},$
    $0\in\partial\Phi(S^{\star})+\mathcal{P}_{\Omega}(Y_{4}^{\star})\quad\text{and}\quad \mathcal{P}_{\Omega^{c}}(Y_{4}^{\star})=0,$
    $L^{\star}=U^{\star}(V^{\star})^{T},\quad \widehat{U}^{\star}=U^{\star},\quad \widehat{V}^{\star}=V^{\star},\quad L^{\star}+S^{\star}=D.$

Therefore, any accumulation point $(U^{\star},V^{\star},\widehat{U}^{\star},\widehat{V}^{\star},L^{\star},S^{\star})$ of the sequence $\{(U_{k},V_{k},\widehat{U}_{k},\widehat{V}_{k},L_{k},S_{k})\}$ generated by Algorithm 1 satisfies the KKT conditions of problem (17). That is, the sequence asymptotically satisfies the KKT conditions of (17). In particular, whenever the sequence $\{(U_{k},V_{k},L_{k},S_{k})\}$ converges, it converges to a critical point of (15). This completes the proof.

APPENDIX F: STOPPING CRITERION

For problem (15), the KKT conditions are

    $0\in\lambda\,\partial\|U^{\star}\|_{*}+\widehat{Y}^{\star}V^{\star}\quad\text{and}\quad 0\in\lambda\,\partial\|V^{\star}\|_{*}+(\widehat{Y}^{\star})^{T}U^{\star},$
    $0\in\partial\Phi(S^{\star})+\mathcal{P}_{\Omega}(\widehat{Y}^{\star})\quad\text{and}\quad \mathcal{P}_{\Omega^{c}}(\widehat{Y}^{\star})=0,$    (48)
    $L^{\star}=U^{\star}(V^{\star})^{T},\quad \mathcal{P}_{\Omega}(L^{\star})+\mathcal{P}_{\Omega}(S^{\star})=\mathcal{P}_{\Omega}(D).$

Using (48), we have

    $-\widehat{Y}^{\star}\in\lambda\,\partial\|U^{\star}\|_{*}\,(V^{\star})^{\dagger}\quad\text{and}\quad -\widehat{Y}^{\star}\in\big[(U^{\star})^{T}\big]^{\dagger}\,\lambda\big(\partial\|V^{\star}\|_{*}\big)^{T},$    (49)

where $(V^{\star})^{\dagger}$ is the pseudo-inverse of $V^{\star}$. The two conditions hold if and only if

    $\lambda\,\partial\|U^{\star}\|_{*}\,(V^{\star})^{\dagger}\cap\big[(U^{\star})^{T}\big]^{\dagger}\,\lambda\big(\partial\|V^{\star}\|_{*}\big)^{T}\neq\emptyset.$

Recalling the equivalence relationship between (15) and (17), the KKT conditions of (17) are given by

    $0\in\lambda\,\partial\|\widehat{U}^{\star}\|_{*}+Y_{1}^{\star}\quad\text{and}\quad 0\in\lambda\,\partial\|\widehat{V}^{\star}\|_{*}+Y_{2}^{\star},$
    $0\in\partial\Phi(S^{\star})+\mathcal{P}_{\Omega}(\widehat{Y}^{\star})\quad\text{and}\quad \mathcal{P}_{\Omega^{c}}(\widehat{Y}^{\star})=0,$    (50)
    $L^{\star}=U^{\star}(V^{\star})^{T},\quad \widehat{U}^{\star}=U^{\star},\quad \widehat{V}^{\star}=V^{\star},\quad L^{\star}+S^{\star}=D.$

Hence, we use the following conditions as the stopping criteria for Algorithm 1:

    $\max\{\epsilon_{1}/\|D\|_{F},\ \epsilon_{2}\}<\epsilon,$

where $\epsilon_{1}=\max\{\|U_{k}V_{k}^{T}-L_{k}\|_{F},\ \|L_{k}+S_{k}-D\|_{F}\}$ and $\epsilon_{2}=\max\{\|\widehat{U}_{k}-U_{k}\|_{F}/\|U_{k}\|_{F},\ \|\widehat{V}_{k}-V_{k}\|_{F}/\|V_{k}\|_{F}\}$.
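A direct NumPy transcription of this stopping test (our sketch; the residual list follows the reconstruction above, and the helper name stopping_criterion is ours) could look as follows:

    import numpy as np

    def stopping_criterion(U, V, U_hat, V_hat, L, S, D, eps=1e-5):
        # Relative feasibility/consistency residuals used as the stopping test.
        eps1 = max(np.linalg.norm(U @ V.T - L), np.linalg.norm(L + S - D))
        eps2 = max(np.linalg.norm(U_hat - U) / np.linalg.norm(U),
                   np.linalg.norm(V_hat - V) / np.linalg.norm(V))
        return max(eps1 / np.linalg.norm(D), eps2) < eps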

Algorithm 3: Solving image recovery problem (51) via ADMM
Input: $\mathcal{P}_{\Omega}(D)$, the given rank $d$, and $\lambda$.
Initialize: $\mu_{0}$, $\mu_{\max}$, $\rho>1$, $k=0$, and $\epsilon$.
1: while not converged do
2:   $U_{k+1}=\big[(L_{k}+Y_{3}^{k}/\mu_{k})V_{k}+\widehat{U}_{k}-Y_{1}^{k}/\mu_{k}\big](V_{k}^{T}V_{k}+I_{d})^{-1}$
3:   $V_{k+1}=\big[(L_{k}+Y_{3}^{k}/\mu_{k})^{T}U_{k+1}+\widehat{V}_{k}-Y_{2}^{k}/\mu_{k}\big](U_{k+1}^{T}U_{k+1}+I_{d})^{-1}$
4:   $\widehat{U}_{k+1}=\mathcal{D}_{\lambda/(2\mu_{k})}(U_{k+1}+Y_{1}^{k}/\mu_{k})$, and $\widehat{V}_{k+1}=\mathcal{D}_{\lambda/(2\mu_{k})}(V_{k+1}+Y_{2}^{k}/\mu_{k})$
5:   $L_{k+1}=\mathcal{P}_{\Omega}\big(\frac{D+\mu_{k}U_{k+1}V_{k+1}^{T}-Y_{3}^{k}}{1+\mu_{k}}\big)+\mathcal{P}_{\Omega^{c}}\big(U_{k+1}V_{k+1}^{T}-Y_{3}^{k}/\mu_{k}\big)$
6:   $Y_{1}^{k+1}=Y_{1}^{k}+\mu_{k}(U_{k+1}-\widehat{U}_{k+1})$, $Y_{2}^{k+1}=Y_{2}^{k}+\mu_{k}(V_{k+1}-\widehat{V}_{k+1})$, and $Y_{3}^{k+1}=Y_{3}^{k}+\mu_{k}(L_{k+1}-U_{k+1}V_{k+1}^{T})$
7:   $\mu_{k+1}=\min(\rho\mu_{k},\mu_{\max})$
8:   $k\leftarrow k+1$
9: end while
Output: $U_{k+1}$ and $V_{k+1}$.

APPENDIX G: ALGORITHMS FOR IMAGE RECOVERY

In this part, we propose two efficient ADMM algorithms (as outlined in Algorithms 3 and 4) to solve the following D-N and F-N penalty regularized least squares problems for matrix completion:

    $\min_{U,V,L}\ \frac{\lambda}{2}\big(\|U\|_{*}+\|V\|_{*}\big)+\frac{1}{2}\|\mathcal{P}_{\Omega}(L)-\mathcal{P}_{\Omega}(D)\|_{F}^{2},\quad \mathrm{s.t.}\ L=UV^{T},$    (51)
    $\min_{U,V,L}\ \frac{\lambda}{3}\big(\|U\|_{F}^{2}+2\|V\|_{*}\big)+\frac{1}{2}\|\mathcal{P}_{\Omega}(L)-\mathcal{P}_{\Omega}(D)\|_{F}^{2},\quad \mathrm{s.t.}\ L=UV^{T}.$    (52)

Similar to (17) and (18), we also introduce the matrices $\widehat{U}$ and $\widehat{V}$ as auxiliary variables in (51) and (52), and obtain the following equivalent formulation of (51):

    $\min_{U,V,\widehat{U},\widehat{V},L}\ \frac{\lambda}{2}\big(\|\widehat{U}\|_{*}+\|\widehat{V}\|_{*}\big)+\frac{1}{2}\|\mathcal{P}_{\Omega}(L)-\mathcal{P}_{\Omega}(D)\|_{F}^{2},\quad \mathrm{s.t.}\ L=UV^{T},\ U=\widehat{U},\ V=\widehat{V}.$    (53)

The augmented Lagrangian function of (53) is

    $\mathcal{L}_{\mu}=\frac{\lambda}{2}\big(\|\widehat{U}\|_{*}+\|\widehat{V}\|_{*}\big)+\frac{1}{2}\|\mathcal{P}_{\Omega}(L)-\mathcal{P}_{\Omega}(D)\|_{F}^{2}+\langle Y_{1},U-\widehat{U}\rangle+\langle Y_{2},V-\widehat{V}\rangle+\langle Y_{3},L-UV^{T}\rangle+\frac{\mu}{2}\big(\|U-\widehat{U}\|_{F}^{2}+\|V-\widehat{V}\|_{F}^{2}+\|L-UV^{T}\|_{F}^{2}\big),$

where $Y_{i}$ ($i=1,2,3$) are the matrices of Lagrange multipliers.

Updating $U_{k+1}$ and $V_{k+1}$: By fixing the other variables at their latest values, and removing the terms that do not depend on $U$ and $V$ (and adding some proper terms that do not depend on $U$ and $V$), the optimization problems with respect to $U$ and $V$ are formulated as follows:

    $\min_{U}\ \|U-\widehat{U}_{k}+Y_{1}^{k}/\mu_{k}\|_{F}^{2}+\|L_{k}-UV_{k}^{T}+Y_{3}^{k}/\mu_{k}\|_{F}^{2},$    (54)
    $\min_{V}\ \|V-\widehat{V}_{k}+Y_{2}^{k}/\mu_{k}\|_{F}^{2}+\|L_{k}-U_{k+1}V^{T}+Y_{3}^{k}/\mu_{k}\|_{F}^{2}.$    (55)

Since both (54) and (55) are smooth convex optimization problems, their closed-form solutions are given by

    $U_{k+1}=\big[(L_{k}+Y_{3}^{k}/\mu_{k})V_{k}+\widehat{U}_{k}-Y_{1}^{k}/\mu_{k}\big]\big(V_{k}^{T}V_{k}+I_{d}\big)^{-1},$    (56)
    $V_{k+1}=\big[(L_{k}+Y_{3}^{k}/\mu_{k})^{T}U_{k+1}+\widehat{V}_{k}-Y_{2}^{k}/\mu_{k}\big]\big(U_{k+1}^{T}U_{k+1}+I_{d}\big)^{-1}.$    (57)
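In code, the closed-form updates (56) and (57) amount to two $d\times d$ linear solves; a minimal NumPy sketch (ours, with all inputs assumed given and the helper name update_U_V hypothetical) is:

    import numpy as np

    def update_U_V(L, U_hat, V_hat, V, Y1, Y2, Y3, mu):
        # Closed-form solutions (56) and (57), written as solves against (G^T G + I_d).
        I = np.eye(V.shape[1])
        P = L + Y3 / mu
        # (56): U_{k+1} = [P V + U_hat - Y1/mu] (V^T V + I)^{-1}
        U_new = np.linalg.solve(V.T @ V + I, (P @ V + U_hat - Y1 / mu).T).T
        # (57): V_{k+1} = [P^T U_{k+1} + V_hat - Y2/mu] (U^T U + I)^{-1}
        V_new = np.linalg.solve(U_new.T @ U_new + I, (P.T @ U_new + V_hat - Y2 / mu).T).T
        return U_new, V_new

Solving against the symmetric positive-definite matrix $(G^{T}G+I_{d})$ instead of forming its explicit inverse is the standard numerically stable choice here.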

Algorithm 4: Solving image recovery problem (52) via ADMM
Input: $\mathcal{P}_{\Omega}(D)$, the given rank $d$, and $\lambda$.
Initialize: $\mu_{0}$, $\mu_{\max}$, $\rho>1$, $k=0$, and $\epsilon$.
1: while not converged do
2:   $U_{k+1}=\big(\mu_{k}L_{k}+Y_{2}^{k}\big)V_{k}\big(\mu_{k}V_{k}^{T}V_{k}+\frac{2\lambda}{3}I_{d}\big)^{-1}$
3:   $V_{k+1}=\big[(L_{k}+Y_{2}^{k}/\mu_{k})^{T}U_{k+1}+\widehat{V}_{k}-Y_{1}^{k}/\mu_{k}\big](U_{k+1}^{T}U_{k+1}+I_{d})^{-1}$
4:   $\widehat{V}_{k+1}=\mathcal{D}_{2\lambda/(3\mu_{k})}(V_{k+1}+Y_{1}^{k}/\mu_{k})$
5:   $L_{k+1}=\mathcal{P}_{\Omega}\big(\frac{D+\mu_{k}U_{k+1}V_{k+1}^{T}-Y_{2}^{k}}{1+\mu_{k}}\big)+\mathcal{P}_{\Omega^{c}}\big(U_{k+1}V_{k+1}^{T}-Y_{2}^{k}/\mu_{k}\big)$
6:   $Y_{1}^{k+1}=Y_{1}^{k}+\mu_{k}(V_{k+1}-\widehat{V}_{k+1})$ and $Y_{2}^{k+1}=Y_{2}^{k}+\mu_{k}(L_{k+1}-U_{k+1}V_{k+1}^{T})$
7:   $\mu_{k+1}=\min(\rho\mu_{k},\mu_{\max})$
8:   $k\leftarrow k+1$
9: end while
Output: $U_{k+1}$ and $V_{k+1}$.

Updating $\widehat{U}_{k+1}$ and $\widehat{V}_{k+1}$: By keeping all other variables fixed, $\widehat{U}_{k+1}$ is updated by solving the following problem:

    $\min_{\widehat{U}}\ \frac{\lambda}{2}\|\widehat{U}\|_{*}+\frac{\mu_{k}}{2}\|U_{k+1}-\widehat{U}+Y_{1}^{k}/\mu_{k}\|_{F}^{2}.$    (58)

To solve (58), the SVT operator [2] is applied as follows:

    $\widehat{U}_{k+1}=\mathcal{D}_{\lambda/(2\mu_{k})}\big(U_{k+1}+Y_{1}^{k}/\mu_{k}\big).$    (59)

Similarly, $\widehat{V}_{k+1}$ is given by

    $\widehat{V}_{k+1}=\mathcal{D}_{\lambda/(2\mu_{k})}\big(V_{k+1}+Y_{2}^{k}/\mu_{k}\big).$    (60)

Updating $L_{k+1}$: By fixing all other variables, the optimal $L_{k+1}$ is the solution of the following problem:

    $\min_{L}\ \frac{1}{2}\|\mathcal{P}_{\Omega}(L)-\mathcal{P}_{\Omega}(D)\|_{F}^{2}+\frac{\mu_{k}}{2}\Big\|L-U_{k+1}V_{k+1}^{T}+\frac{Y_{3}^{k}}{\mu_{k}}\Big\|_{F}^{2}.$    (61)

Since (61) is a smooth convex optimization problem, it is easy to show that its optimal solution is

    $L_{k+1}=\mathcal{P}_{\Omega}\Big(\frac{D+\mu_{k}U_{k+1}V_{k+1}^{T}-Y_{3}^{k}}{1+\mu_{k}}\Big)+\mathcal{P}_{\Omega^{c}}\big(U_{k+1}V_{k+1}^{T}-Y_{3}^{k}/\mu_{k}\big),$    (62)

where $\mathcal{P}_{\Omega^{c}}$ is the complementary operator of $\mathcal{P}_{\Omega}$, i.e., $\mathcal{P}_{\Omega^{c}}(D)_{ij}=0$ if $(i,j)\in\Omega$, and $\mathcal{P}_{\Omega^{c}}(D)_{ij}=D_{ij}$ otherwise.

Based on the description above, we develop an efficient ADMM algorithm for solving the double nuclear norm minimization problem (51), as outlined in Algorithm 3. Similarly, we also present an efficient ADMM algorithm to solve (52), as outlined in Algorithm 4.
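The $L$-update (62) mixes a shrunken average on the observed entries with the plain low-rank reconstruction elsewhere. A small NumPy sketch (our illustration; the boolean array mask is assumed to indicate the observed set $\Omega$) is:

    import numpy as np

    def update_L(D, UVt, Y, mu, mask):
        # Closed-form solution (62); mask selects the observed set Omega.
        L_obs = (D + mu * UVt - Y) / (1.0 + mu)   # P_Omega part
        L_unobs = UVt - Y / mu                    # P_Omega^c part
        return np.where(mask, L_obs, L_unobs)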

APPENDIX H: MORE EXPERIMENTAL RESULTS

In this paper, we compared both of our methods with several state-of-the-art methods, such as LMaFit [8], RegL1 [9], Unifying [10], factEN [11], RPCA [6], PSVT [12], WNNM [13], and LpSq [14]. The Matlab code of the proposed methods is available online.

Convergence Behavior: Fig. 1 illustrates the evolution of the relative squared error (RSE), i.e., $\|\widehat{L}-L\|_{F}/\|L\|_{F}$, and of the stopping criterion over the iterations on corrupted matrices of size 1,000 x 1,000 with outlier ratio 5%. From the results, it is clear that both the stopping tolerance and the RSE values of our two methods decrease quickly, and they converge within only a small number of iterations, usually within 50 iterations.

Fig. 1. Convergence behavior of our S+L$_{1/2}$ and S+L$_{2/3}$ methods for the cases of matrix rank 5, 10 and 20: (a) stopping criterion, S+L$_{1/2}$ (left) and S+L$_{2/3}$ (right); (b) RSE, S+L$_{1/2}$ (left) and S+L$_{2/3}$ (right).

Robustness: Like the other non-convex methods such as PSVT and Unifying, the most important parameter of our methods is the rank parameter $d$. To verify the robustness of our methods with respect to $d$, we report the RSE results of PSVT, Unifying and our methods on corrupted matrices with outlier ratio 10% in Fig. 2(a), in which we also present the results of the baseline method, LpSq [14]. It is clear that both our methods perform much more robustly than PSVT and Unifying, and consistently yield much better solutions than the other methods in all settings. To verify the robustness of both our methods with respect to another important parameter (i.e., the regularization parameter $\lambda$), we also report the RSE results of our methods on corrupted matrices with outlier ratio 10% in Fig. 2(b). Note that the rank parameter of both our methods is computed by our rank estimation procedure. From the results, one can see that both our methods demonstrate very robust performance over a wide range of the regularization parameter.

Fig. 2. Comparison of the RSE results of LpSq [14], PSVT [12], Unifying [10] and our methods on corrupted matrices with different rank parameters (a) and regularization parameters (b).

Text Removal: We report the text removal results of our methods with the rank parameter varying from 10 to 40, as shown in Fig. 3, where the rank of the original image is 10. We also present the results of the baseline method, LpSq [14]. The results show that our methods significantly outperform the other methods in terms of RSE and F-measure, and they perform much more robustly than Unifying with respect to the rank parameter.

Moving Object Detection: We present the detailed descriptions of five surveillance video sequences, the Bootstrap, Hall, Lobby, Mall and WaterSurface data sets, in Table 1. Moreover, Fig. 4 illustrates the foreground and background separation results on the Hall, Mall, Lobby and WaterSurface data sets.

TABLE 1: Information of the surveillance video sequences used in our experiments.

    Dataset       Size (frames)        Description
    Bootstrap     [120, 160] (1000)    Crowded scene
    Hall          [144, 176] (1000)    Crowded scene
    Lobby         [128, 160] (1000)    Dynamic foreground
    Mall          [256, 320] (600)     Crowded scene
    WaterSurface  [128, 160] (500)     Dynamic background

Fig. 3. The text removal results (including RSE and F-measure) of LpSq [14], Unifying [10] and our methods with different rank parameters.

Image Alignment: We also conducted some image alignment experiments on the Dummy images used in [15]. Figs. 5(b) and 5(c) show the results of the alignment, low-rank and sparse estimations by both our methods on the input images, some of which are shown in Fig. 5(a). We can observe that both our methods are able to align the facial images nicely despite illumination variations and occlusions, and correctly detect and remove them.

Image Inpainting: In this part, we first report the average PSNR results of the two proposed methods (i.e., D-N and F-N) with different ratios of randomly missing pixels, from 95% down to 80%, as shown in Fig. 6. Since the methods in [16], [17] are very slow, we only present the average inpainting results of APG [18] and WNNM [13], both of which use the fast SVD strategy and need to compute only a partial SVD instead of the full one; thus, they are usually much faster than the methods in [16], [17]. Considering that only a small fraction of pixels is randomly selected, we conducted 50 independent runs and report the average PSNR and standard deviation (std). The results show that both our methods consistently outperform APG [18] and WNNM [13] in all the settings. This experiment actually shows that both our methods have an even greater advantage over existing methods in cases where the number of observed pixels is very limited, e.g., 5% observed pixels.

As suggested in [17], we set $d=9$ for our two methods and for TNNR [17]. To evaluate the robustness of our two methods with respect to their rank parameter, we report the average PSNR and standard deviation of the two proposed methods with the rank parameter $d$ varying from 7 to 15, as shown in Fig. 7. Moreover, we also present the average inpainting results of TNNR [17] over 50 independent runs. It is clear that the two proposed methods perform much more robustly than TNNR with respect to the rank parameter.

REFERENCES

[1] L. Mirsky, "A trace inequality of John von Neumann," Monatsh. Math., vol. 79, pp. 303-306, 1975.
[2] J. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. Optim., vol. 20, no. 4, pp. 1956-1982, 2010.
[3] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Commun. Pur. Appl. Math., vol. 57, no. 11, pp. 1413-1457, 2004.
[4] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, no. 3, pp. 613-627, May 1995.

Fig. 4. Background and foreground separation results of different algorithms on the Hall, Mall, Lobby and WaterSurface data sets. One frame with missing data from each sequence (top) and its manual segmentation (bottom) are shown in (a). The results of the different algorithms are presented from (b) to (f), respectively: (b) RegL1 [9]; (c) Unifying [10]; (d) factEN [11]; (e) S+L$_{1/2}$; (f) S+L$_{2/3}$. The top panel is the recovered background and the bottom panel is the segmentation.

[5] E. T. Hale, W. Yin, and Y. Zhang, "Fixed-point continuation for l1-minimization: Methodology and convergence," SIAM J. Optim., vol. 19, no. 3, pp. 1107-1130, 2008.
[6] Z. Lin, M. Chen, and L. Wu, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," Univ. Illinois, Urbana-Champaign, Tech. Rep., Mar. 2009.
[7] F. Wang, W. Cao, and Z. Xu, "Convergence of multi-block Bregman ADMM for nonconvex composite problems," arXiv preprint, 2015.
[8] Y. Shen, Z. Wen, and Y. Zhang, "Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization," Optim. Method Softw., vol. 29, no. 2, pp. 239-263, 2014.
[9] Y. Zheng, G. Liu, S. Sugimoto, S. Yan, and M. Okutomi, "Practical low-rank matrix approximation under robust L1-norm," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2012.

Fig. 5. Aligning the Dummy images used in [15]. (a) Ten out of the 100 original occluded images. (b) and (c): From top to bottom, the aligned, low-rank and sparse results of our S+L$_{1/2}$ and S+L$_{2/3}$ methods, respectively.

[10] R. Cabral, F. De la Torre, J. Costeira, and A. Bernardino, "Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition," in Proc. IEEE Int. Conf. Comput. Vis., 2013.
[11] E. Kim, M. Lee, and S. Oh, "Elastic-net regularization of singular values for robust subspace learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015.
[12] T. Oh, Y. Tai, J. Bazin, H. Kim, and I. Kweon, "Partial sum minimization of singular values in robust PCA: Algorithm and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 4, pp. 744-758, Apr. 2016.
[13] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, "Weighted nuclear norm minimization and its applications to low level vision," Int. J. Comput. Vis., vol. 121, no. 2, pp. 183-208, 2017.
[14] F. Nie, H. Wang, X. Cai, H. Huang, and C. Ding, "Robust matrix completion via joint Schatten p-norm and Lp-norm minimization," in Proc. IEEE Int. Conf. Data Mining, 2012.
[15] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, "RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2233-2246, Nov. 2012.
[16] C. Lu, J. Tang, S. Yan, and Z. Lin, "Generalized nonconvex nonsmooth low-rank minimization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014.
[17] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He, "Fast and accurate matrix completion via truncated nuclear norm regularization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 9, pp. 2117-2130, Sep. 2013.
[18] K.-C. Toh and S. Yun, "An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems," Pac. J. Optim., vol. 6, pp. 615-640, 2010.

Fig. 6. The average PSNR and standard deviation of APG [18], WNNM [13] and both our methods for image inpainting vs. the fraction of observed pixels, on Images 1 to 8 (best viewed in colors).

Fig. 7. The average PSNR and standard deviation of TNNR [17] and both our methods for image inpainting vs. the rank parameter, on Images 1 to 8 (best viewed in colors).


More information

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Low Rank Matrix Completion Formulation and Algorithm

Low Rank Matrix Completion Formulation and Algorithm 1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9

More information

arxiv: v2 [cs.na] 24 Mar 2015

arxiv: v2 [cs.na] 24 Mar 2015 Volume X, No. 0X, 200X, X XX doi:1934/xx.xx.xx.xx PARALLEL MATRIX FACTORIZATION FOR LOW-RANK TENSOR COMPLETION arxiv:1312.1254v2 [cs.na] 24 Mar 2015 Yangyang Xu Department of Computational and Applied

More information

Variational Image Restoration

Variational Image Restoration Variational Image Restoration Yuling Jiao yljiaostatistics@znufe.edu.cn School of and Statistics and Mathematics ZNUFE Dec 30, 2014 Outline 1 1 Classical Variational Restoration Models and Algorithms 1.1

More information

Multiple Similarities Based Kernel Subspace Learning for Image Classification

Multiple Similarities Based Kernel Subspace Learning for Image Classification Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Multisets mixture learning-based ellipse detection

Multisets mixture learning-based ellipse detection Pattern Recognition 39 (6) 731 735 Rapid and brief communication Multisets mixture learning-based ellipse detection Zhi-Yong Liu a,b, Hong Qiao a, Lei Xu b, www.elsevier.com/locate/patcog a Key Lab of

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

Robust Tensor Factorization Using R 1 Norm

Robust Tensor Factorization Using R 1 Norm Robust Tensor Factorization Using R Norm Heng Huang Computer Science and Engineering University of Texas at Arlington heng@uta.edu Chris Ding Computer Science and Engineering University of Texas at Arlington

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems Kim-Chuan Toh Sangwoon Yun March 27, 2009; Revised, Nov 11, 2009 Abstract The affine rank minimization

More information

Nonnegative Matrix Factorization Clustering on Multiple Manifolds

Nonnegative Matrix Factorization Clustering on Multiple Manifolds Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Nonnegative Matrix Factorization Clustering on Multiple Manifolds Bin Shen, Luo Si Department of Computer Science,

More information

An Effective Tensor Completion Method Based on Multi-linear Tensor Ring Decomposition

An Effective Tensor Completion Method Based on Multi-linear Tensor Ring Decomposition An Effective Tensor Completion Method Based on Multi-linear Tensor Ring Decomposition Jinshi Yu, Guoxu Zhou, Qibin Zhao and Kan Xie School of Automation, Guangdong University of Technology, Guangzhou,

More information

Reweighted Nuclear Norm Minimization with Application to System Identification

Reweighted Nuclear Norm Minimization with Application to System Identification Reweighted Nuclear Norm Minimization with Application to System Identification Karthi Mohan and Maryam Fazel Abstract The matrix ran minimization problem consists of finding a matrix of minimum ran that

More information