Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng


Robust PCA CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Robust PCA 1 / 52

Previously... Not robust against outliers: linear least squares, PCA. Robust against outliers: trimmed least squares, trimmed PCA. Various ways to robustify PCA: trimming: remove outliers; covariance matrix with 0-1 weights [Xu95]: similar to trimming; weighted SVD [Gabriel79]: weighting; robust error function [Torre2001]: winsorizing. Strength: simple concepts. Weakness: no guarantee of optimality. Leow Wee Kheng (NUS) Robust PCA 2 / 52

Robust PCA. PCA can be formulated as follows: Given a data matrix D, recover a low-rank matrix A from D such that the error E = D − A is minimized: min_{A,E} ‖E‖_F, subject to rank(A) ≤ r, D = A + E. (1) Here r ≤ min(m,n) is the target rank of A, and ‖·‖_F is the Frobenius norm. Notes: This definition of PCA includes dimensionality reduction. PCA is severely affected by large-amplitude noise; it is not robust. Leow Wee Kheng (NUS) Robust PCA 3 / 52
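As a concrete aside (my own illustration, not part of the slides), Problem 1 is solved by truncating the SVD of D to its r largest singular values (Eckart-Young); the numpy sketch below uses a made-up random matrix and a hypothetical function name.

```python
import numpy as np

def pca_low_rank(D, r):
    """Best rank-r approximation of D in Frobenius norm, i.e. a solution A
    of Problem (1); the residual E = D - A is what PCA discards."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    A = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    return A, D - A

# Toy usage: a rank-2 matrix plus small noise.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30)) \
    + 0.01 * rng.standard_normal((50, 30))
A, E = pca_low_rank(D, r=2)
print(np.linalg.matrix_rank(A), np.linalg.norm(E, "fro"))
```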

[Wright2009] formulated the robust PCA problem as follows: Given a data matrix D = A + E where A and E are unknown, but A is low-rank and E is sparse, recover A. An obvious way to state the robust PCA problem in math is: Given a data matrix D, find A and E that solve the problem min_{A,E} rank(A) + λ‖E‖_0, subject to A + E = D. (2) λ is a Lagrange multiplier. ‖E‖_0: ℓ0-norm, the number of non-zero elements of E. E is sparse if ‖E‖_0 is small. Leow Wee Kheng (NUS) Robust PCA 4 / 52

min_{A,E} rank(A) + λ‖E‖_0, subject to A + E = D. (2) [Figure: the data matrix D decomposed into a low-rank matrix A plus a sparse error matrix E.] Problem 2 is a matrix recovery problem. rank(A) and ‖E‖_0 are not continuous and not convex; the problem is very hard to solve, and no efficient algorithm exists. Leow Wee Kheng (NUS) Robust PCA 5 / 52

[Candès2011, Wright2009] reformulated Problem 2 as follows: Given an m × n data matrix D, find A and E that solve min_{A,E} ‖A‖_* + λ‖E‖_1, subject to A + E = D. (3) ‖A‖_*: nuclear norm, the sum of the singular values of A; a surrogate for rank(A). ‖E‖_1: ℓ1-norm, the sum of the absolute values of the elements of E; a surrogate for ‖E‖_0. The solution of Problem 3 recovers A and E exactly if A is sufficiently low-rank but not sparse, and E is sufficiently sparse but not low-rank, with optimal λ = 1/√max(m,n). ‖A‖_* and ‖E‖_1 are convex; we can apply convex optimization. Leow Wee Kheng (NUS) Robust PCA 6 / 52
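A minimal numpy sketch (my own illustration, not from the slides) of the two convex surrogates and the suggested weight λ; the function name and dimensions are made up.

```python
import numpy as np

def rpca_objective(A, E, lam):
    """Objective of Problem (3): nuclear norm of A plus lam * l1-norm of E."""
    nuclear = np.linalg.svd(A, compute_uv=False).sum()  # sum of singular values
    l1 = np.abs(E).sum()                                # sum of |entries|
    return nuclear + lam * l1

m, n = 120, 80
lam = 1.0 / np.sqrt(max(m, n))  # the lambda suggested for exact recovery
```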

Introduction to Convex Optimization. For a differentiable function f(x), its minimizer x̂ is given by df(x̂)/dx = [∂f(x̂)/∂x_1 ⋯ ∂f(x̂)/∂x_m]^T = 0. (4) [Figure: a smooth curve f(x) with its minimum at the point x̂ where df/dx = 0.] Leow Wee Kheng (NUS) Robust PCA 7 / 52

‖E‖_1 is not differentiable when any of its elements is zero! [Figure: surface plot of f(x) = |x_1| + |x_2|, which has non-differentiable creases along the coordinate axes.] We cannot write the following because they don't exist: d‖E‖_1/dE, ∂‖E‖_1/∂e_ij, d|e_ij|/de_ij. WRONG! (5) Fortunately, ‖E‖_1 is convex. Leow Wee Kheng (NUS) Robust PCA 8 / 52

A function f(x) is convex if and only if for all x_1, x_2 and all α ∈ [0,1], f(αx_1 + (1−α)x_2) ≤ αf(x_1) + (1−α)f(x_2). (6) [Figure: a convex curve f(x); the chord joining (x_1, f(x_1)) and (x_2, f(x_2)) lies on or above the curve.] Leow Wee Kheng (NUS) Robust PCA 9 / 52

A vector g(x) is a subgradient of a convex function f at x if f(y) − f(x) ≥ g(x)^T (y − x) for all y. (7) [Figure: a convex curve with supporting lines l_1 at the differentiable point x_1 and l_2, l_3 at the non-differentiable point x_2.] At a differentiable point x_1: one unique subgradient, equal to the gradient. At a non-differentiable point x_2: multiple subgradients. Leow Wee Kheng (NUS) Robust PCA 10 / 52

The subdifferential ∂f(x) is the set of all subgradients of f at x: ∂f(x) = { g(x) : g(x)^T (y − x) ≤ f(y) − f(x), ∀y }. (8) Caution: the subdifferential ∂f(x) is not the partial derivative ∂f(x)/∂x; don't mix them up. Partial differentiation is defined for differentiable functions and is a single vector. The subdifferential is defined for convex functions, which may be non-differentiable, and is a set of vectors. Leow Wee Kheng (NUS) Robust PCA 11 / 52

Example: absolute value. |x| is not differentiable at x = 0 but is subdifferentiable at x = 0: |x| = x if x > 0, 0 if x = 0, −x if x < 0; ∂|x| = {+1} if x > 0, [−1,+1] if x = 0, {−1} if x < 0. (9) [Figures: the absolute-value function |x|, and its subdifferential, which is −1 for x < 0, +1 for x > 0, and the whole interval [−1,+1] at x = 0.] Leow Wee Kheng (NUS) Robust PCA 12 / 52

Example: ℓ1-norm of an m-dimensional vector x. ‖x‖_1 = Σ_{i=1}^m |x_i|, ∂‖x‖_1 = Σ_{i=1}^m ∂|x_i|. (10) Caution: the right-hand side of Eq. 10 is a set sum. It gives a product of m sets, one for each element x_i: ∂‖x‖_1 = J_1 × ⋯ × J_m, where J_i = {+1} if x_i > 0, [−1,+1] if x_i = 0, {−1} if x_i < 0. (11) Alternatively, we can write ∂‖x‖_1 = {g} such that g_i = +1 if x_i > 0, g_i ∈ [−1,+1] if x_i = 0, g_i = −1 if x_i < 0. (12) Leow Wee Kheng (NUS) Robust PCA 13 / 52

Another alternative: ∂‖x‖_1 = {g} such that g_i = sgn(x_i) if x_i ≠ 0, and |g_i| ≤ 1 if x_i = 0. (13) Here are some examples of subdifferentials of the 2-D ℓ1-norm: ∂f(0,0) = [−1,1] × [−1,1], ∂f(1,0) = {1} × [−1,1], ∂f(1,1) = {(1,1)}. Leow Wee Kheng (NUS) Robust PCA 14 / 52

Optimality Condition. x̂ is a minimum of f if 0 ∈ ∂f(x̂), i.e., 0 is a subgradient of f at x̂. [Figure: a convex curve with a horizontal supporting line g(x̂) = 0 at the minimum x̂.] For more details on convex functions and convex optimization, refer to [Bertsekas2003, Boyd2004, Rockafellar70, Shor85]. Leow Wee Kheng (NUS) Robust PCA 15 / 52

Back to Robust PCA. Shrinkage or soft-threshold operator: T_ε(x) = x − ε if x > ε, x + ε if x < −ε, 0 otherwise. (14) [Figure: T_ε(x) is zero on [−ε, ε] and linear with slope 1 outside that interval.] Leow Wee Kheng (NUS) Robust PCA 16 / 52
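A one-function numpy sketch of the shrinkage operator in Eq. 14 (my own; it assumes, as in the later slides, that T_ε is applied elementwise to a matrix):

```python
import numpy as np

def soft_threshold(X, eps):
    """Shrinkage operator T_eps of Eq. (14), applied elementwise:
    shift each entry toward zero by eps and clamp the small ones to zero."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)
```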

Using convex optimization, the following minimizers can be shown [Cai2010, Hale2007]: For an m × n matrix M with SVD M = USV^T, U T_ε(S) V^T = argmin_X ε‖X‖_* + (1/2)‖X − M‖_F^2, (15) T_ε(M) = argmin_X ε‖X‖_1 + (1/2)‖X − M‖_F^2. (16) Leow Wee Kheng (NUS) Robust PCA 17 / 52
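Eq. 15 is the singular value thresholding operator of [Cai2010]; a minimal numpy sketch (mine, with a hypothetical function name) that soft-thresholds the singular values and reassembles the matrix:

```python
import numpy as np

def singular_value_threshold(M, eps):
    """Minimizer of Eq. (15): soft-threshold the singular values of M,
    then rebuild the matrix from the thresholded SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt
```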

Solving Robust PCA. There are several ways to solve robust PCA (Problem 3) [Lin2009, Wright2009]: principal component pursuit, iterative thresholding, accelerated proximal gradient, and augmented Lagrange multipliers. Leow Wee Kheng (NUS) Robust PCA 18 / 52

Augmented Lagrange Multipliers. Consider the problem min_x f(x) subject to c_j(x) = 0, j = 1, ..., m. (17) This is a constrained optimization problem. The Lagrange multiplier method reformulates Problem 17 as min_x f(x) + Σ_{j=1}^m λ_j c_j(x) (18) with some constants λ_j. Leow Wee Kheng (NUS) Robust PCA 19 / 52

The penalty method reformulates Problem 17 as min_x f(x) + µ Σ_{j=1}^m c_j^2(x). (19) The parameter µ increases over the iterations, e.g., by a factor of 10; the penalty method needs µ → ∞ to get a good solution. The augmented Lagrange multiplier method combines Lagrange multipliers and the penalty: min_x f(x) + Σ_{j=1}^m λ_j c_j(x) + (µ/2) Σ_{j=1}^m c_j^2(x). (20) Denote λ = [λ_1 ⋯ λ_m]^T and c(x) = [c_1(x) ⋯ c_m(x)]^T. Then Problem 20 becomes min_x f(x) + λ^T c(x) + (µ/2)‖c(x)‖^2. (21) Leow Wee Kheng (NUS) Robust PCA 20 / 52

If the constraints form a matrix C = [c_jk], then define Λ = [λ_jk]. Problem 20 then becomes min_x f(x) + ⟨Λ, C(x)⟩ + (µ/2)‖C(x)‖_F^2. (22) ⟨Λ, C⟩ is the sum of products of corresponding elements: ⟨Λ, C⟩ = Σ_j Σ_k λ_jk c_jk. ‖C‖_F is the Frobenius norm: ‖C‖_F^2 = Σ_j Σ_k c_jk^2. Leow Wee Kheng (NUS) Robust PCA 21 / 52

Augmented Lagrange Multipliers (ALM) Method
1. Initialize Λ, µ > 0, ρ ≥ 1.
2. Repeat until convergence:
2.1 Compute x = argmin_x L(x), where L(x) = f(x) + ⟨Λ, C(x)⟩ + (µ/2)‖C(x)‖_F^2.
2.2 Update Λ = Λ + µC(x).
2.3 Update µ = ρµ.
What kind of optimization algorithm is this? ALM does not need µ → ∞ to get a good solution; that means it can converge faster. Leow Wee Kheng (NUS) Robust PCA 22 / 52
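To make the loop concrete, here is a tiny self-contained sketch of ALM (my own toy example, not from the slides) for the problem min ½‖x − b‖² subject to aᵀx = 0, where step 2.1 has a closed-form solution:

```python
import numpy as np

def alm_demo(a, b, mu=1.0, rho=1.5, iters=30):
    """ALM for: minimize 0.5*||x - b||^2  subject to  a^T x = 0.
    Step 2.1 is solved in closed form: (I + mu*a*a^T) x = b - lam*a."""
    lam = 0.0
    x = b.copy()
    for _ in range(iters):
        x = np.linalg.solve(np.eye(b.size) + mu * np.outer(a, a), b - lam * a)
        lam += mu * (a @ x)   # step 2.2: update the multiplier
        mu *= rho             # step 2.3: increase the penalty
    return x

a = np.array([1.0, 2.0, -1.0])
b = np.array([3.0, 1.0, 2.0])
x = alm_demo(a, b)
print(x, a @ x)  # a @ x is (numerically) close to 0
```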

With ALM, robust PCA (Problem 3) is reformulated as min_{A,E} ‖A‖_* + λ‖E‖_1 + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖_F^2, (23) where Y is the matrix of Lagrange multipliers and the constraint matrix is C = D − A − E. To implement Step 2.1 of ALM, we need to minimize over both A and E; adopt alternating optimization. Leow Wee Kheng (NUS) Robust PCA 23 / 52

Trace of a Matrix. The trace of a matrix A is the sum of its diagonal elements a_ii: tr(A) = Σ_{i=1}^n a_ii. (24) Strictly speaking, a scalar c and a 1 × 1 matrix [c] are not the same thing. Nevertheless, since tr([c]) = c, we often write tr(c) = c for simplicity. Properties: tr(A) = Σ_{i=1}^n λ_i, where the λ_i are the eigenvalues of A. tr(A + B) = tr(A) + tr(B). tr(cA) = c tr(A). tr(A) = tr(A^T). Leow Wee Kheng (NUS) Robust PCA 24 / 52

tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC). tr(X^T Y) = tr(XY^T) = tr(Y^T X) = tr(YX^T) = Σ_i Σ_j x_ij y_ij. Leow Wee Kheng (NUS) Robust PCA 25 / 52
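A quick numerical sanity check of the last identity (my own aside; the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
# tr(X^T Y) equals the sum of elementwise products, up to round-off.
print(np.trace(X.T @ Y), np.sum(X * Y))
```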

From Problem 23, the minimizing E with the other variables fixed is given by
min_E λ‖E‖_1 + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖_F^2
min_E λ‖E‖_1 + tr(Y^T(D − A − E)) + (µ/2)‖D − A − E‖_F^2
min_E λ‖E‖_1 + tr(−Y^T E) + (µ/2) tr((D − A − E)^T (D − A − E)) (Homework)
min_E λ‖E‖_1 + (µ/2) tr((E − (D − A + Y/µ))^T (E − (D − A + Y/µ)))
min_E λ‖E‖_1 + (µ/2)‖E − (D − A + Y/µ)‖_F^2
min_E (λ/µ)‖E‖_1 + (1/2)‖E − (D − A + Y/µ)‖_F^2. (25)
These problems are equivalent: terms that do not depend on E are dropped, and the last line rescales the objective by 1/µ. Leow Wee Kheng (NUS) Robust PCA 26 / 52

Compare Eq. 16 and Problem 25: T_ε(M) = argmin_X ε‖X‖_1 + (1/2)‖X − M‖_F^2, versus min_E (λ/µ)‖E‖_1 + (1/2)‖E − (D − A + Y/µ)‖_F^2. Set X = E, M = D − A + Y/µ and ε = λ/µ. Then E = T_{λ/µ}(D − A + Y/µ). (26) Leow Wee Kheng (NUS) Robust PCA 27 / 52

From Problem 23, the minimizing A with the other variables fixed is given by
min_A ‖A‖_* + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖_F^2
min_A ‖A‖_* + tr(Y^T(D − A − E)) + (µ/2)‖D − A − E‖_F^2 (Homework)
min_A (1/µ)‖A‖_* + (1/2)‖A − (D − E + Y/µ)‖_F^2. (27)
Compare Problem 27 and Eq. 15: U T_ε(S) V^T = argmin_X ε‖X‖_* + (1/2)‖X − M‖_F^2. Set X = A, M = D − E + Y/µ and ε = 1/µ, where USV^T is the SVD of D − E + Y/µ. Then A = U T_{1/µ}(S) V^T. (28) Leow Wee Kheng (NUS) Robust PCA 28 / 52

Robust PCA for Matrix Recovery via ALM
Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.   Repeat until convergence:
5.     U, S, V = SVD(D − E + Y/µ).
6.     A = U T_{1/µ}(S) V^T.
7.     E = T_{λ/µ}(D − A + Y/µ).
8.   Update Y = Y + µ(D − A − E).
9.   Update µ = ρµ.
Outputs: A, E. Leow Wee Kheng (NUS) Robust PCA 29 / 52

Typical initialization [Lin2009]: Y = sgn(D)/J(sgn(D)), where sgn(D) gives the sign of each element of D, and J(·) gives a scaling factor: J(X) = max(‖X‖_2, λ⁻¹‖X‖_∞). ‖X‖_2 is the spectral norm, the largest singular value of X; ‖X‖_∞ is the largest absolute value of the elements of X. µ = 1.25/‖D‖_2, ρ = 1.5, and λ = 1/√max(m,n) for an m × n matrix D. Leow Wee Kheng (NUS) Robust PCA 30 / 52
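Putting the pieces together, a compact numpy sketch of the matrix-recovery algorithm above with this initialization (my own rendering; for brevity it runs one A/E pass per outer iteration, i.e. an inexact variant of the nested loops, and the stopping tolerance and iteration cap are arbitrary choices):

```python
import numpy as np

def soft_threshold(X, eps):
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)

def rpca_alm(D, rho=1.5, tol=1e-7, max_iter=500):
    """Robust PCA by ALM (sketch of slides 29-30): split D into a
    low-rank A and a sparse E with A + E ~= D."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    S = np.sign(D)
    Y = S / max(np.linalg.norm(S, 2), np.abs(S).max() / lam)  # Y = sgn(D)/J(sgn(D))
    mu = 1.25 / np.linalg.norm(D, 2)
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # A-update (Eq. 28): singular value thresholding of D - E + Y/mu.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-update (Eq. 26): elementwise shrinkage of D - A + Y/mu.
        E = soft_threshold(D - A + Y / mu, lam / mu)
        R = D - A - E
        Y = Y + mu * R        # step 8
        mu = rho * mu         # step 9
        if np.linalg.norm(R, "fro") <= tol * np.linalg.norm(D, "fro"):
            break
    return A, E
```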

Example: recovery of video background. [Figure: video frames are vectorized and stacked as the columns of the data matrix D.] Leow Wee Kheng (NUS) Robust PCA 31 / 52

Sample results: [Figure: columns of the data matrix with the corresponding low-rank component and error component, shown for two examples.] Leow Wee Kheng (NUS) Robust PCA 32 / 52

Example: removal of specular reflection and shadow. [Figure: input images D, recovered low-rank component A, and error component E.] Leow Wee Kheng (NUS) Robust PCA 33 / 52

Fixed-Rank Robust PCA. In reflection removal, the reflection may be global. [Figure: ground truth and input image with a global reflection.] Then E is not sparse: this violates the RPCA condition! But the rank of A is 1. Fix the rank of A to deal with non-sparse E [Leow2013]. Leow Wee Kheng (NUS) Robust PCA 34 / 52

Fixed-Rank Robust PCA via ALM
Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.   Repeat until convergence:
5.     U, S, V = SVD(D − E + Y/µ).
6.     If rank(T_{1/µ}(S)) < r, A = U T_{1/µ}(S) V^T; else A = U S_r V^T.
7.     E = T_{λ/µ}(D − A + Y/µ).
8.   Update Y = Y + µ(D − A − E).
9.   Update µ = ρµ.
Outputs: A, E.
S_r is S with the last m − r singular values set to 0. Leow Wee Kheng (NUS) Robust PCA 35 / 52
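The only change from the matrix-recovery algorithm is step 6; a small numpy sketch of that branch (my own; the function name and argument layout are assumptions):

```python
import numpy as np

def fixed_rank_A_update(D, E, Y, mu, r):
    """Steps 5-6 of fixed-rank RPCA: soft-threshold the singular values,
    but if thresholding would leave rank >= r, keep exactly the r largest
    (unthresholded) singular values, i.e. use S_r."""
    U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
    s_thr = np.maximum(s - 1.0 / mu, 0.0)
    if np.count_nonzero(s_thr) < r:
        s_new = s_thr                                    # first branch of step 6
    else:
        s_new = np.where(np.arange(s.size) < r, s, 0.0)  # S_r
    return U @ np.diag(s_new) @ Vt
```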

Example: removal of local reflections. [Figure: ground truth, input, FRPCA result, RPCA result.] Leow Wee Kheng (NUS) Robust PCA 36 / 52

Example: removal of global reflections. [Figure: ground truth, input, FRPCA result, RPCA result.] Leow Wee Kheng (NUS) Robust PCA 37 / 52

Example: removal of global reflections (second example). [Figure: ground truth, input, FRPCA result, RPCA result.] Leow Wee Kheng (NUS) Robust PCA 38 / 52

Example: background recovery for traffic video with fast-moving vehicles. [Figure: input, FRPCA result, RPCA result.] Leow Wee Kheng (NUS) Robust PCA 39 / 52

Example: background recovery for traffic video with slow-moving vehicles. [Figure: input, FRPCA result, RPCA result.] Leow Wee Kheng (NUS) Robust PCA 40 / 52

Example: background recovery for traffic video with a temporary stop. [Figure: input, FRPCA result, RPCA result.] Leow Wee Kheng (NUS) Robust PCA 41 / 52

Matrix Completion. Customers are asked to rate movies from 1 (poor) to 5 (excellent). [Table: customer-by-movie rating matrix with missing entries; e.g., customer A gave ratings 5, 5, 3, 2; B gave 3, 4, 3; C gave 3, 4, 5, 4; D gave 4, 5, 5, 3; each left some movies unrated.] Customers rate only some movies, so some data are missing. How do we estimate the missing data? Matrix completion. Leow Wee Kheng (NUS) Robust PCA 42 / 52

Let D denote the data matrix with missing elements set to 0, and let M = {(i,j)} denote the indices of the missing elements of D. Then the matrix completion problem can be formulated as: Given D and M, find the matrix A that solves min_A ‖A‖_* subject to A + E = D, E_ij = 0 ∀(i,j) ∉ M. (29) For (i,j) ∉ M, constraining E_ij = 0 forces A_ij = D_ij: no change. For (i,j) ∈ M, D_ij = 0, i.e., A_ij = −E_ij: the recovered value. Reformulating Problem 29 using augmented Lagrange multipliers gives min_{A,E} ‖A‖_* + ⟨Y, D − A − E⟩ + (µ/2)‖D − A − E‖_F^2 (30) such that E_ij = 0 ∀(i,j) ∉ M. Leow Wee Kheng (NUS) Robust PCA 43 / 52

Robust PCA for Matrix Completion
Inputs: D.
1. Initialize A = 0, E = 0.
2. Initialize Y, µ > 0, ρ > 1.
3. Repeat until convergence:
4.   Repeat until convergence:
5.     U, S, V = SVD(D − E + Y/µ).
6.     A = U T_{1/µ}(S) V^T.
7.     E = Γ_M(D − A + Y/µ), where Γ_M(X)_ij = X_ij for (i,j) ∈ M and 0 for (i,j) ∉ M.
8.   Update Y = Y + µ(D − A − E).
9.   Update µ = ρµ.
Outputs: A, E. Leow Wee Kheng (NUS) Robust PCA 44 / 52
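Compared with matrix recovery, only the E-update changes; a small numpy sketch of step 7 (my own; `missing_mask` is an assumed boolean array that is True on the indices in M):

```python
import numpy as np

def completion_E_update(D, A, Y, mu, missing_mask):
    """Step 7 of matrix completion: the projection Gamma_M keeps the
    residual only on missing entries and forces E = 0 on observed ones."""
    X = D - A + Y / mu
    return np.where(missing_mask, X, 0.0)
```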

Example: recovery of occluded parts in face images. [Figure: input, ground truth, matrix completion result, matrix recovery result.] Leow Wee Kheng (NUS) Robust PCA 45 / 52

Summary. Robustification of PCA: trimming → trimmed PCA; 0-1 covariance weights → trimmed PCA; weighted SVD → weighted PCA; robust error function → winsorized PCA. Leow Wee Kheng (NUS) Robust PCA 46 / 52

Summary. Low-rank + sparse matrix decomposition: PCA → robust PCA (matrix recovery); fixing the rank of A → fixed-rank robust PCA; fixing elements of E → robust PCA (matrix completion). Leow Wee Kheng (NUS) Robust PCA 47 / 52

Probing Questions. If the data matrix of a problem is composed of a low-rank matrix, a sparse matrix and something else, can you still use robust PCA methods? If yes, how? If no, why not? In applying robust PCA to high-resolution colour image processing, the data matrix contains three times as many rows as the number of pixels in the images, which can lead to a very large data matrix that takes a long time to process. Suggest a way to overcome this problem. In applying robust PCA to video processing, the data matrix contains as many columns as the number of video frames, which can lead to a very large data matrix that requires more memory than is available to store it. Suggest a way to overcome this problem. Leow Wee Kheng (NUS) Robust PCA 48 / 52

Homework
1. Show that tr(X^T Y) = Σ_i Σ_j x_ij y_ij.
2. Show that the following two optimization problems are equivalent:
min_E λ‖E‖_1 + tr(−Y^T E) + (µ/2) tr((D − A − E)^T (D − A − E))
min_E λ‖E‖_1 + (µ/2) tr((E − (D − A + Y/µ))^T (E − (D − A + Y/µ)))
3. Show that the minimizing A of Problem 23 with the other variables fixed is given by min_A (1/µ)‖A‖_* + (1/2)‖A − (D − E + Y/µ)‖_F^2.
Leow Wee Kheng (NUS) Robust PCA 49 / 52

References I References 1. D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003. 2. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. 3. J. F. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2010. 4. E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011. 5. F. De la Torre and M. J. Black. Robust principal component analysis for computer vision. In Proc. Int. Conf. on Computer Vision, 2001. 6. R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970. Leow Wee Kheng (NUS) Robust PCA 50 / 52

References II References 7. K. Gabriel and S. Zamir. Lower rank approximation of matrices by least squares with any choice of weights. Technometrics, 21:489-498, 1979. 8. E. T. Hale, W. Yin, and Y. Zhang. A fixed-point continuation method for ℓ1-regularized minimization with applications to compressed sensing. Tech. Report TR07-07, Dept. of Computational and Applied Mathematics, Rice University, 2007. 9. W. K. Leow, Y. Cheng, L. Zhang, T. Sim, and L. Foo. Background recovery by fixed-rank robust principal component analysis. In Proc. Int. Conf. on Computer Analysis of Images and Patterns, 2013. 10. Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Tech. Report UILU-ENG-09-2215, UIUC, 2009. 11. N. Z. Shor. Minimization Methods for Non-Differentiable Functions. Springer-Verlag, 1985. Leow Wee Kheng (NUS) Robust PCA 51 / 52

References III References 12. J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. In Proc. Conf. on Neural Information Processing Systems, 2009. 13. L. Xu and A. Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Trans. Neural Networks, 6(1):131-143, 1995. Leow Wee Kheng (NUS) Robust PCA 52 / 52