Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds


1 Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds
Tao Wu, Institute for Mathematics and Scientific Computing, Karl-Franzens-University of Graz
Joint work with Prof. Michael Hintermüller

2 Low-rank paradigm.
Low-rank matrices arise in one way or another:
- low-degree statistical processes, e.g. collaborative filtering, latent semantic indexing;
- regularization on complex objects, e.g. manifold learning, metric learning;
- approximation of compact operators, e.g. proper orthogonal decomposition.
Fig.: Collaborative filtering (courtesy of wikipedia.org).

3 Robust principal component pursuit.
The sparse component corresponds to pattern-irrelevant outliers:
- it robustifies classical principal component analysis;
- it carries important information in certain applications, e.g. moving objects in surveillance video.
Robust principal component pursuit decomposes the data into low-rank, sparse, and noise parts:
  $Z = A + B + N$  (data = low-rank + sparse + noise).
Introduced in [Candès, Li, Ma, and Wright, 11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, 11].
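To make the decomposition concrete, here is a small NumPy sketch (not from the slides) that builds synthetic data of exactly this form; the dimensions, rank, cardinality, and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, s = 100, 80, 5, 200   # dimensions, target rank, sparsity level (illustrative)

# Low-rank component A: product of two thin Gaussian factors.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Sparse component B: s entries at random positions with large magnitude.
B = np.zeros((m, n))
idx = rng.choice(m * n, size=s, replace=False)
B.flat[idx] = 10.0 * rng.standard_normal(s)

# Small dense noise N, and the observed data Z.
N = 0.01 * rng.standard_normal((m, n))
Z = A + B + N
```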

4 Convex-relaxation approach.
A popular (convex) variational model:
  $\min_{A,B}\ \|A\|_* + \lambda \|B\|_{\ell^1}$  s.t.  $\|A + B - Z\| \le \varepsilon$.
Considered in [Candès, Li, Ma, and Wright, 11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, 11], ...
- $\mathrm{rank}(A)$ is relaxed by the nuclear norm; $\|B\|_0$ is relaxed by the $\ell^1$-norm.
- Numerical solvers: proximal gradient method, augmented Lagrangian method, ...
- Efficiency is constrained by an SVD in full dimension at each iteration.
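For reference, the two proximal operators that drive such solvers are easy to state. The hedged NumPy sketch below shows singular value thresholding (the prox of the nuclear norm) and entrywise soft thresholding (the prox of the $\ell^1$ norm); function names are mine, not from the cited works. Note that `svt` calls a full dense SVD, which is precisely the per-iteration bottleneck mentioned above.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of tau * (nuclear norm).
    Requires a full dense SVD of X, the per-iteration bottleneck."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    sig = np.maximum(sig - tau, 0.0)
    return (U * sig) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: prox of tau * (l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```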

5 Manifold constrained least-squares model.
Our variational model:
  $\min\ \tfrac12 \|A + B - Z\|^2$  s.t.  $A \in \mathcal{M}(r) := \{A \in \mathbb{R}^{m \times n} : \mathrm{rank}(A) \le r\}$,  $B \in \mathcal{N}(s) := \{B \in \mathbb{R}^{m \times n} : \|B\|_0 \le s\}$.
Our goal is to develop an algorithm that:
- converges globally to a stationary point (often a local minimizer);
- provides an exact decomposition with high probability for noiseless data;
- outperforms solvers based on the convex-relaxation approach, especially at large scale.

6 Existence of solution and optimality condition.
A little quadratic regularization ($0 < \mu \ll 1$) is included for the (theoretical) sake of existence of a solution; i.e.
  $\min\ f(A, B) := \tfrac12 \|A + B - Z\|^2 + \tfrac{\mu}{2} \|A\|^2$  s.t.  $(A, B) \in \mathcal{M}(r) \times \mathcal{N}(s)$.
In numerics, choosing $\mu = 0$ seems fine.
Stationarity condition as variational inequalities:
  $\langle \Delta, (1+\mu)A^* + B^* - Z \rangle \ge 0$ for any $\Delta \in T_{\mathcal{M}(r)}(A^*)$,
  $\langle \Delta, A^* + B^* - Z \rangle \ge 0$ for any $\Delta \in T_{\mathcal{N}(s)}(B^*)$.
$T_{\mathcal{M}(r)}(A^*)$ and $T_{\mathcal{N}(s)}(B^*)$ refer to tangent cones.

7 Constraints as Riemannian manifolds.
$\mathcal{M}(r)$ is a Riemannian manifold around $A^*$ if $\mathrm{rank}(A^*) = r$; $\mathcal{N}(s)$ is a Riemannian manifold around $B^*$ if $\|B^*\|_0 = s$.
The optimality condition then reduces to:
  $P_{T_{\mathcal{M}(r)}(A^*)}\big((1+\mu)A^* + B^* - Z\big) = 0$,
  $P_{T_{\mathcal{N}(s)}(B^*)}\big(A^* + B^* - Z\big) = 0$.
$P_{T_{\mathcal{M}(r)}(A^*)}$ and $P_{T_{\mathcal{N}(s)}(B^*)}$ are orthogonal projections onto subspaces.
Tangent space formulae:
  $T_{\mathcal{M}(r)}(A^*) = \{U M V^\top + U_p V^\top + U V_p^\top : A^* = U \Sigma V^\top$ as compact SVD, $M \in \mathbb{R}^{r \times r}$, $U_p \in \mathbb{R}^{m \times r}$, $U_p^\top U = 0$, $V_p \in \mathbb{R}^{n \times r}$, $V_p^\top V = 0\}$,
  $T_{\mathcal{N}(s)}(B^*) = \{\Delta \in \mathbb{R}^{m \times n} : \mathrm{supp}(\Delta) \subseteq \mathrm{supp}(B^*)\}$.
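Both projections are inexpensive to apply. The sketch below (my notation, assuming U and V hold the orthonormal factors of a compact SVD of the fixed-rank point) implements the two tangent-space projections corresponding to the formulae above.

```python
import numpy as np

def proj_tangent_fixed_rank(Delta, U, V):
    """Orthogonal projection onto T_M(r)(A) for A = U @ Sigma @ V.T (compact SVD):
    P_T(Delta) = U U^T Delta + Delta V V^T - U U^T Delta V V^T."""
    UtD = U.T @ Delta          # r x n
    DV = Delta @ V             # m x r
    return U @ UtD + DV @ V.T - U @ (UtD @ V) @ V.T

def proj_tangent_sparse(Delta, B):
    """Orthogonal projection onto T_N(s)(B): keep entries on supp(B), zero elsewhere."""
    return np.where(B != 0, Delta, 0.0)
```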

8 A conceptual alternating minimization scheme.
Initialize $A^0 \in \mathcal{M}(r)$, $B^0 \in \mathcal{N}(s)$. Set $k := 0$ and iterate:
1. $A^{k+1} \in \arg\min_{A \in \mathcal{M}(r)}\ \tfrac12 \|A + B^k - Z\|^2 + \tfrac{\mu}{2} \|A\|^2$.
2. $B^{k+1} \in \arg\min_{B \in \mathcal{N}(s)}\ \tfrac12 \|A^{k+1} + B - Z\|^2$.
Theorem (sufficient decrease + stationarity imply convergence). Let $\{(A^k, B^k)\}$ be generated as above. Suppose that there exist $\delta > 0$, $\varepsilon_a^k \to 0$, and $\varepsilon_b^k \to 0$ such that for all $k$:
  $f(A^{k+1}, B^k) \le f(A^k, B^k) - \delta \|A^{k+1} - A^k\|^2$,
  $f(A^{k+1}, B^{k+1}) \le f(A^{k+1}, B^k) - \delta \|B^{k+1} - B^k\|^2$,
  $\langle \Delta, (1+\mu)A^{k+1} + B^k - Z \rangle \ge -\varepsilon_a^k \|\Delta\|$ for any $\Delta \in T_{\mathcal{M}(r)}(A^{k+1})$,
  $\langle \Delta, A^{k+1} + B^{k+1} - Z \rangle \ge -\varepsilon_b^k \|\Delta\|$ for any $\Delta \in T_{\mathcal{N}(s)}(B^{k+1})$.
Then any non-degenerate limit point $(A^*, B^*)$, i.e. $\mathrm{rank}(A^*) = r$ and $\|B^*\|_0 = s$, satisfies the first-order optimality condition.
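A minimal NumPy sketch of this conceptual loop, with both subproblems solved by exact metric projection (later slides replace the low-rank projection by a single Riemannian step), might look as follows; names and the iteration count are illustrative.

```python
import numpy as np

def proj_rank(X, r):
    """Metric projection onto M(r): truncated SVD (Eckart-Young)."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * sig[:r]) @ Vt[:r]

def proj_card(X, s):
    """Metric projection onto N(s): keep the s largest entries in magnitude."""
    B = np.zeros_like(X)
    idx = np.argpartition(np.abs(X).ravel(), -s)[-s:]
    B.flat[idx] = X.flat[idx]
    return B

def ams_conceptual(Z, r, s, mu=0.0, iters=50):
    """Conceptual alternating minimization: both subproblems solved exactly."""
    A = np.zeros_like(Z)
    B = np.zeros_like(Z)
    for _ in range(iters):
        A = proj_rank((Z - B) / (1.0 + mu), r)   # low-rank subproblem
        B = proj_card(Z - A, s)                  # sparse subproblem
    return A, B
```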

9 Sparse matrix subproblem.
The global solution $P_{\mathcal{N}(s)}(Z - A^{k+1})$ (as metric projection) can be efficiently calculated by sorting.
The global solution, however, may not fulfill the sufficient decrease condition. Whenever necessary, safeguard by a local solution:
  $B^{k+1}_{ij} = (Z - A^{k+1})_{ij}$ if $B^k_{ij} \neq 0$, and $B^{k+1}_{ij} = 0$ otherwise.
Given non-degeneracy of $B^{k+1}$, i.e. $\|B^{k+1}\|_0 = s$, exact stationarity holds.
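A possible implementation of this step, combining the global solution obtained by sorting with the support-preserving local safeguard, is sketched below; the objective handle `f` and the constant `delta` are assumed to be supplied by the outer scheme.

```python
import numpy as np

def sparse_step(Z, A_next, B_prev, s, f, delta):
    """Sparse subproblem: try the global solution (projection by sorting);
    fall back to the support-preserving local solution if sufficient decrease fails.
    f(A, B) evaluates the objective; delta > 0 is the sufficient-decrease constant."""
    R = Z - A_next
    # Global solution: keep the s largest residual entries in magnitude.
    B_glob = np.zeros_like(R)
    idx = np.argpartition(np.abs(R).ravel(), -s)[-s:]
    B_glob.flat[idx] = R.flat[idx]
    if f(A_next, B_prev) - f(A_next, B_glob) >= delta * np.linalg.norm(B_glob - B_prev)**2:
        return B_glob
    # Local safeguard: copy residual entries on the old support only.
    return np.where(B_prev != 0, R, 0.0)
```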

10 Low-rank matrix subproblem: a Riemannian perspective.
Global solution $P_{\mathcal{M}(r)}\big(\tfrac{1}{1+\mu}(Z - B^k)\big)$ as metric projection:
- available due to the Eckart-Young theorem; i.e.
  $\tfrac{1}{1+\mu}(Z - B^k) = \sum_{j=1}^{n} \sigma_j u_j v_j^\top \;\Longrightarrow\; P_{\mathcal{M}(r)}\big(\tfrac{1}{1+\mu}(Z - B^k)\big) = \sum_{j=1}^{r} \sigma_j u_j v_j^\top$;
- but it requires an SVD in full dimension, which is expensive for large-scale problems (e.g. $m, n \ge 2000$).
Alternatively resolved by a single Riemannian optimization step on the matrix manifold.
Riemannian optimization has been applied to low-rank matrix/tensor problems; see [Simonsson and Eldén, 10], [Savas and Lim, 10], [Vandereycken, 13], ...
Our goal: the subproblem solver should activate the convergence criteria, i.e. sufficient decrease + stationarity.
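The following small NumPy check illustrates the Eckart-Young step: the rank-r truncation of a full SVD is compared against an arbitrary rank-r matrix. The matrix `Y` stands in for $(Z - B^k)/(1+\mu)$; sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 60, 50, 4
Y = rng.standard_normal((m, n))            # stands in for (Z - B^k) / (1 + mu)

U, sig, Vt = np.linalg.svd(Y, full_matrices=False)   # full-dimension SVD (the bottleneck)
Y_r = (U[:, :r] * sig[:r]) @ Vt[:r]                  # best rank-r approximation

# Any other rank-r matrix is at least as far from Y (checked against a random one).
W = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
assert np.linalg.norm(Y - Y_r) <= np.linalg.norm(Y - W) + 1e-12
```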

11 Riemannian optimization: an overview.
Fig.: optimization on manifolds in one picture, a smooth function $f : \mathcal{M} \to \mathbb{R}$ on a manifold $\mathcal{M}$.
References: [Smith, 93], [Edelman, Arias, and Smith, 98], [Absil, Mahony, and Sepulchre, 08], ...
Why Riemannian optimization?
- Local homeomorphisms (charts) are computationally infeasible/expensive to work with.
- Intrinsically low dimensionality of the underlying manifold.
- Further dimension reduction via quotient manifolds.
Typical Riemannian manifolds in applications:
- finite-dimensional (matrix manifolds): Stiefel manifold, Grassmann manifold, fixed-rank matrix manifold, ...
- infinite-dimensional: shape/curve spaces, ...

12 Riemannian optimization: a conceptual algorithm.
Fig.: a tangential step $\xi^k \in T_{\mathcal{M}(r)}(A^k)$ followed by a retraction back onto $\mathcal{M}(r)$.
At the current iterate:
1. Build a quadratic model in the tangent space using the Riemannian gradient and the Riemannian Hessian.
2. Based on the quadratic model, build a tangential search path.
3. Perform a backtracking path search via retraction to determine the step size.
4. Generate the next iterate.

13 Riemannian gradient and Hessian.
$\mathcal{M}(r) := \{A : \mathrm{rank}(A) = r\}$;  $f_A^k : A \in \mathcal{M}(r) \mapsto f(A, B^k)$.
The Riemannian gradient, $\mathrm{grad} f_A^k(A) \in T_{\mathcal{M}(r)}(A)$, is defined s.t.
  $\langle \mathrm{grad} f_A^k(A), \Delta \rangle = D f_A^k(A)[\Delta]$ for all $\Delta \in T_{\mathcal{M}(r)}(A)$;
it is given by $\mathrm{grad} f_A^k(A) = P_{T_{\mathcal{M}(r)}(A)}\big(\nabla f_A^k(A)\big)$.
The Riemannian Hessian, $\mathrm{Hess} f_A^k(A) : T_{\mathcal{M}(r)}(A) \to T_{\mathcal{M}(r)}(A)$, is defined s.t.
  $\mathrm{Hess} f_A^k(A)[\Delta] = \widetilde{\nabla}_\Delta\, \mathrm{grad} f_A^k(A)$ for all $\Delta \in T_{\mathcal{M}(r)}(A)$,
with $\widetilde{\nabla}$ the Riemannian connection. For $A = U \Sigma V^\top$ it evaluates to
  $\mathrm{Hess} f_A^k(A)[\Delta] = (I - UU^\top)\, \nabla f_A^k(A)\, (I - VV^\top)\, \Delta^\top U \Sigma^{-1} V^\top + U \Sigma^{-1} V^\top \Delta^\top (I - UU^\top)\, \nabla f_A^k(A)\, (I - VV^\top) + (1+\mu)\, \Delta$.
See, e.g., [Vandereycken, 12].
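A NumPy sketch of these two objects is given below. The gradient follows directly from the projection formula; the Hessian application follows the reconstructed formula above and should be read as a sketch under that assumption rather than a verified reference implementation.

```python
import numpy as np

def riemannian_grad_and_hess(A_factors, B_k, Z, mu, Delta):
    """Riemannian gradient of f_A^k at A = U diag(sig) Vt, and the Hessian applied
    to a tangent vector Delta (formulas as reconstructed on this slide; a sketch)."""
    U, sig, Vt = A_factors                      # compact SVD factors of A
    A = (U * sig) @ Vt
    V = Vt.T
    G = (1.0 + mu) * A + B_k - Z                # Euclidean gradient of f_A^k

    def P_T(X):                                 # projection onto T_M(r)(A)
        UtX, XV = U.T @ X, X @ V
        return U @ UtX + XV @ V.T - U @ (UtX @ V) @ V.T

    grad = P_T(G)

    # Normal part of the Euclidean gradient and U Sigma^{-1} V^T.
    Gperp = G - U @ (U.T @ G) - (G @ V) @ V.T + U @ (U.T @ G @ V) @ V.T
    Apit = (U / sig) @ Vt
    hess_Delta = (1.0 + mu) * Delta + Gperp @ Delta.T @ Apit + Apit @ Delta.T @ Gperp
    return grad, hess_Delta
```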

14 Dogleg search path and projective retraction.
Fig.: left, the dogleg path in $T_{\mathcal{M}(r)}(A^k)$ approximating the optimal trajectory (unconstrained minimizer $p^U$ along $g$, full step $p^B$, trust-region boundary); right, projective retraction of the tangent step back onto $\mathcal{M}(r)$.
The dogleg path $\Delta^k(\tau^k)$ approximates the optimal trajectory of the tangential trust-region subproblem (left figure):
  $\min\ f_A^k(A^k) + \langle g^k, \Delta \rangle + \tfrac12 \langle \Delta, H^k[\Delta] \rangle$  s.t.  $\Delta \in T_{\mathcal{M}(r)}(A^k)$, $\|\Delta\| \le \sigma$.
Metric projection as retraction (right figure):
  $\mathrm{retract}_{\mathcal{M}(r)}(A^k, \Delta^k(\tau^k)) = P_{\mathcal{M}(r)}\big(A^k + \Delta^k(\tau^k)\big)$.
Computationally efficient: reduced SVD on a 2r-by-2r matrix!
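A sketch of how such a projective retraction is typically implemented (in the spirit of [Vandereycken, 13]) follows, assuming the tangent vector is available in the factored form $U M V^\top + U_p V^\top + U V_p^\top$; only thin QR factorizations and an SVD of a 2r-by-2r core are required.

```python
import numpy as np

def retract_fixed_rank(U, sig, V, M, Up, Vp, r):
    """Projective retraction P_M(r)(A + Delta) for A = U diag(sig) V^T and a tangent
    vector Delta = U M V^T + Up V^T + U Vp^T (with U^T Up = 0, V^T Vp = 0)."""
    Qu, Ru = np.linalg.qr(Up)                  # Up = Qu Ru, Qu m x r orthonormal
    Qv, Rv = np.linalg.qr(Vp)                  # Vp = Qv Rv, Qv n x r orthonormal

    # A + Delta = [U Qu] * core * [V Qv]^T with a small (2r x 2r) core matrix.
    core = np.block([[np.diag(sig) + M, Rv.T],
                     [Ru,               np.zeros((r, r))]])
    Uc, sc, Vct = np.linalg.svd(core)          # SVD in dimension 2r only

    U_new = np.hstack([U, Qu]) @ Uc[:, :r]
    V_new = np.hstack([V, Qv]) @ Vct[:r].T
    return U_new, sc[:r], V_new                # factors of the retracted rank-r matrix
```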

15 Low-rank matrix subproblem: projected dogleg step.
Given $A^k \in \mathcal{M}(r)$, $B^k \in \mathcal{N}(s)$:
1. Compute $g^k$, $H^k$, and build the dogleg search path $\Delta^k(\tau^k)$ in $T_{\mathcal{M}(r)}(A^k)$.
2. Whenever non-positive definiteness of $H^k$ is detected, replace the dogleg search path by the line search path along the steepest descent direction, i.e. $\Delta^k(\tau^k) = -\tau^k g^k$.
3. Perform a backtracking path/line search; i.e. find the largest step size $\tau^k \in \{2, 3/2, 1, 1/2, 1/4, 1/8, \dots\}$ s.t. the sufficient decrease condition is satisfied:
  $f_A^k(A^k) - f_A^k\big(P_{\mathcal{M}(r)}(A^k + \Delta^k(\tau^k))\big) \ge \delta\, \big\|A^k - P_{\mathcal{M}(r)}(A^k + \Delta^k(\tau^k))\big\|^2$.
4. Return $A^{k+1} = P_{\mathcal{M}(r)}\big(A^k + \Delta^k(\tau^k)\big)$.
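Step 3 can be coded almost verbatim; the sketch below truncates the list of trial step sizes for brevity (in the scheme above the halving continues as needed) and takes the path, the projection, and the objective as callables.

```python
import numpy as np

def backtracking_path_search(A_k, path, f_Ak, delta, proj_rank_r,
                             taus=(2.0, 1.5, 1.0, 0.5, 0.25, 0.125)):
    """Backtracking path search (step 3 of the projected dogleg step).
    path(tau) returns the tangent step Delta^k(tau); proj_rank_r is the metric
    projection onto M(r); f_Ak evaluates A -> f(A, B^k). Returns the largest tau
    in the trial list satisfying sufficient decrease, with the trial point."""
    fA = f_Ak(A_k)
    for tau in taus:
        A_trial = proj_rank_r(A_k + path(tau))
        if fA - f_Ak(A_trial) >= delta * np.linalg.norm(A_k - A_trial)**2:
            return tau, A_trial
    raise RuntimeError("no admissible step size found in the trial list")
```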

16 Low-rank matrix subproblem: convergence theory.
Backtracking path search: the sufficient decrease condition can always be fulfilled after finitely many trials on $\tau^k$.
Any accumulation point of $\{A^k\}$ is stationary.
Further assume $\mathrm{Hess} f(A^*, B^*) \succ 0$ (with $\mu = 0$) at a non-degenerate accumulation point $(A^*, B^*)$. Then:
- Tangent-space transversality holds, i.e. $T_{\mathcal{M}(r)}(A^*) \cap T_{\mathcal{N}(s)}(B^*) = \{0\}$.
- Contractivity of $P_{T_{\mathcal{M}(r)}(A^*)} P_{T_{\mathcal{N}(s)}(B^*)}$: there exists $\kappa \in [0, 1)$ s.t. $\big\|\big(P_{T_{\mathcal{M}(r)}(A^*)} P_{T_{\mathcal{N}(s)}(B^*)}\big)(\Delta)\big\| \le \kappa \|\Delta\|$.
- q-linear convergence of $\{A^k\}$ towards stationarity: $\limsup_{k \to \infty} \|A^{k+1} - A^*\| / \|A^k - A^*\| \le \kappa$.

17 Numerical implementation.
Trimming: adaptive tuning of the rank $r^{k+1}$ and the cardinality $s^{k+1}$ based on the current iterate $(A^k, B^k)$:
- k-means clustering on the (nonzero) singular values of $A^k$ in logarithmic scale;
- hard thresholding on the entries of $B^k$.
q-linear convergence confirmed numerically:
Fig.: (a) convergence of $\{A^k\}$, plotting $\|A^k - A^*\| / \|A^*\|$ versus iteration; (b) convergence of $\{B^k\}$, plotting $\|B^k - B^*\| / \|B^*\|$ versus iteration.
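A possible reading of the trimming rules is sketched below: a tiny 1-D 2-means on the log singular values chooses the trimmed rank, and the trimmed cardinality counts entries of $B^k$ above a user-chosen threshold (the slides do not specify the threshold rule, so it is left as a parameter).

```python
import numpy as np

def trim_rank(sigma, iters=20):
    """Adaptive rank: 2-means clustering of the nonzero singular values of A^k on a
    logarithmic scale; the size of the 'large' cluster is the trimmed rank."""
    x = np.log(sigma[sigma > 0])
    c = np.array([x.max(), x.min()])            # initial centers: largest / smallest
    for _ in range(iters):
        labels = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                c[j] = x[labels == j].mean()
    return int(np.sum(labels == 0))             # count of values near the large center

def trim_cardinality(B, threshold):
    """Adaptive cardinality: hard thresholding on the entries of B^k."""
    return int(np.sum(np.abs(B) > threshold))
```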

18 Comparison with the augmented Lagrangian method ($m = n = 2000$).
Fig.: AMS and AMS# versus fSVD-ALM and pSVD-ALM over CPU time; (a) relative error of $\{A^k\}$; (b) relative error of $\{B^k\}$; (c) phase transition of $\{A^k\}$, plotting $\mathrm{rank}(A^k)/n$; (d) phase transition of $\{B^k\}$, plotting $\|B^k\|_0 / n$.

19 Application to surveillance video.
Problem settings:
- A sequence of 200 frames taken from a surveillance video at an airport.
- Each frame is a grayscale image; the 3D array of frames is stacked into a matrix.
Results:
- CPU time: AMS 39.4s; ALM 124.4s.
- Visual comparison.
