Low-Rank Factorization Models for Matrix Completion and Matrix Separation
1 Low-Rank Factorization Models for Matrix Completion and Matrix Separation. Joint work with Wotao Yin, Yin Zhang and Yuan Shen. IPAM, UCLA, Oct. 5, 2010.
2 Low-rank minimization problems

Matrix completion: find a low-rank matrix W ∈ R^{m×n} so that W_ij = M_ij for the given entries M_ij with (i,j) in an index set Ω:

    min_{W ∈ R^{m×n}} rank(W), s.t. W_ij = M_ij, (i,j) ∈ Ω

Matrix separation: find a low-rank matrix Z ∈ R^{m×n} and a sparse matrix S ∈ R^{m×n} so that Z + S = D for a given D.
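For concreteness, here is a tiny synthetic completion instance; a minimal sketch, assuming a NumPy setup, with sizes and sampling ratio chosen only for illustration (none of these values come from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, sr = 200, 200, 5, 0.3                 # sr = sampling ratio
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target
mask = rng.random((m, n)) < sr                 # True where (i,j) is in Omega
```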
3 Convex approximation

Nuclear norm ||W||_*: the sum of the singular values of W.

Matrix completion, nuclear-norm relaxation:

    min_{W ∈ R^{m×n}} ||W||_*, s.t. W_ij = M_ij, (i,j) ∈ Ω

Nuclear-norm regularized linear least squares:

    min_{W ∈ R^{m×n}} µ||W||_* + (1/2)||P_Ω(W − M)||_F^2

Matrix separation:

    min_{Z,S ∈ R^{m×n}} ||Z||_* + µ||S||_1, s.t. Z + S = D

Singular value decomposition can be very expensive!
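A minimal sketch of why this is costly: evaluating the nuclear norm already needs a full SVD, which nuclear-norm solvers must repeat at every iteration (the function name is illustrative, not from any solver):

```python
import numpy as np

def nuclear_norm(W):
    # ||W||_* = sum of singular values; requires a full SVD, O(mn*min(m,n)) flops.
    return np.linalg.svd(W, compute_uv=False).sum()

W = np.random.randn(1000, 1000)
print(nuclear_norm(W))   # one such SVD per iteration in nuclear-norm methods
```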
4 Low-rank factorization model for matrix completion

Find a low-rank matrix W so that ||P_Ω(W − M)||_F^2, i.e., the distance between W and {Z ∈ R^{m×n} : Z_ij = M_ij, (i,j) ∈ Ω}, is minimized. Any matrix W ∈ R^{m×n} with rank(W) ≤ K can be expressed as W = XY, where X ∈ R^{m×K} and Y ∈ R^{K×n}.

New model:

    min_{X,Y,Z} (1/2)||XY − Z||_F^2, s.t. Z_ij = M_ij, (i,j) ∈ Ω

Advantage: SVD is no longer needed!
Related work: the solver OptSpace, based on optimization on manifolds.
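A minimal sketch of the model's two ingredients, assuming the boolean-mask representation of Ω from above (the helper names P_Omega and objective are illustrative):

```python
import numpy as np

def P_Omega(A, mask):
    # Project A onto the sampled index set Omega: keep entries where mask is
    # True, zero elsewhere.
    return np.where(mask, A, 0.0)

def objective(X, Y, Z):
    # 0.5 * ||XY - Z||_F^2, the quantity the alternating scheme decreases.
    return 0.5 * np.linalg.norm(X @ Y - Z, 'fro') ** 2
```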
5 Low-rank factorization model for matrix separation

Consider the model

    min_{Z,S} ||S||_1, s.t. Z + S = D, rank(Z) ≤ K

With the low-rank factorization Z = UV (and S = D − Z eliminated):

    min_{U,V,Z} ||Z − D||_1, s.t. UV − Z = 0

Only the entries D_ij, (i,j) ∈ Ω, are given; P_Ω(D) is the projection of D onto Ω.

New model:

    min_{U,V,Z} ||P_Ω(Z − D)||_1, s.t. UV − Z = 0

Advantage: SVD is no longer needed!
6 Nonlinear Gauss-Seidel scheme

Consider:

    min_{X,Y,Z} (1/2)||XY − Z||_F^2, s.t. Z_ij = M_ij, (i,j) ∈ Ω

Alternating minimization:

    X+ = ZY^T(YY^T)^{-1} = argmin_{X ∈ R^{m×K}} (1/2)||XY − Z||_F^2,

with analogous least-squares updates for Y and Z.
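A hedged sketch of this X-update (the function name is illustrative): using a least-squares solve avoids forming (YY^T)^{-1} explicitly when Y is ill-conditioned.

```python
import numpy as np

def update_X(Z, Y):
    # X+ = Z Y^T (Y Y^T)^{-1}, computed as the least-squares solution of
    # min_X ||XY - Z||_F^2, i.e. solve Y^T X^T ~= Z^T for X^T.
    return np.linalg.lstsq(Y.T, Z.T, rcond=None)[0].T
```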
7 Nonlinear Gauss-Seidel scheme

Consider:

    min_{X,Y,Z} (1/2)||XY − Z||_F^2, s.t. Z_ij = M_ij, (i,j) ∈ Ω

First variant of alternating minimization:

    X+ ← ZY^T(YY^T)†,
    Y+ ← (X+^T X+)†(X+^T Z),
    Z+ ← X+Y+ + P_Ω(M − X+Y+).

Let P_A be the orthogonal projection onto the range space R(A). Then

    X+Y+ = (X+(X+^T X+)† X+^T) Z = P_{X+} Z.

One can verify that R(X+) = R(ZY^T), so

    X+Y+ = P_{ZY^T} Z = ZY^T(YZ^T ZY^T)†(YZ^T)Z.

Idea: modify X+ or Y+ to obtain the same product X+Y+.
8 Nonlinear Gauss-Seidel scheme

Consider:

    min_{X,Y,Z} (1/2)||XY − Z||_F^2, s.t. Z_ij = M_ij, (i,j) ∈ Ω

Second variant of alternating minimization (dropping the inverse, since only the product X+Y+ matters):

    X+ ← ZY^T,
    Y+ ← (X+^T X+)†(X+^T Z),
    Z+ ← X+Y+ + P_Ω(M − X+Y+).
9 Nonlinear Gauss-Seidel scheme

Consider:

    min_{X,Y,Z} (1/2)||XY − Z||_F^2, s.t. Z_ij = M_ij, (i,j) ∈ Ω

Second variant of alternating minimization:

    X+ ← ZY^T,
    Y+ ← (X+^T X+)†(X+^T Z),
    Z+ ← X+Y+ + P_Ω(M − X+Y+).

Third variant of alternating minimization:

    V = orth(ZY^T),
    X+ ← V,
    Y+ ← V^T Z,
    Z+ ← X+Y+ + P_Ω(M − X+Y+).
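A minimal sketch of the third variant, assuming the P_Omega helper from above; orth() is realized with a thin QR factorization, so X+ has orthonormal columns and no explicit inverse appears:

```python
import numpy as np

def P_Omega(A, mask):
    return np.where(mask, A, 0.0)

def gs_step(Y, Z, M, mask):
    # V = orth(Z Y^T) via thin QR (default 'reduced' mode).
    V, _ = np.linalg.qr(Z @ Y.T)
    X_new = V                          # X+ <- V, orthonormal columns
    Y_new = V.T @ Z                    # Y+ <- V^T Z
    W = X_new @ Y_new
    Z_new = W + P_Omega(M - W, mask)   # Z+ restores the sampled entries of M
    return X_new, Y_new, Z_new
```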
10 Nonlinear SOR

The nonlinear GS scheme can be slow. Linear SOR applies extrapolation to the GS method to achieve faster convergence.

The first implementation:

    X+ ← ZY^T(YY^T)†,
    X+(ω) ← ωX+ + (1−ω)X,
    Y+ ← (X+(ω)^T X+(ω))†(X+(ω)^T Z),
    Y+(ω) ← ωY+ + (1−ω)Y,
    Z+(ω) ← X+(ω)Y+(ω) + P_Ω(M − X+(ω)Y+(ω)).
11 Nonlinear SOR

Let S = P_Ω(M − XY). Then Z = XY + S. Let

    Z_ω ← XY + ωS = ωZ + (1−ω)XY.

Assume Y has full row rank; then

    Z_ω Y^T(YY^T)† = ωZY^T(YY^T)† + (1−ω)XYY^T(YY^T)† = ωX+ + (1−ω)X.

Second implementation of our nonlinear SOR:

    X+(ω) ← Z_ω Y^T or Z_ω Y^T(YY^T)†,
    Y+(ω) ← (X+(ω)^T X+(ω))†(X+(ω)^T Z_ω),
    P_{Ω^c}(Z+(ω)) ← P_{Ω^c}(X+(ω)Y+(ω)),
    P_Ω(Z+(ω)) ← P_Ω(M).
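A hedged sketch of one step of this second implementation, combining it with the orth-based updates from the third GS variant (the combination and helper names are my assumption, not a literal transcription of the solver):

```python
import numpy as np

def P_Omega(A, mask):
    return np.where(mask, A, 0.0)

def sor_step(X, Y, M, mask, omega):
    S = P_Omega(M - X @ Y, mask)        # residual on Omega
    Z_w = X @ Y + omega * S             # Z_omega = omega*Z + (1-omega)*XY
    V, _ = np.linalg.qr(Z_w @ Y.T)      # orthonormal basis of R(Z_omega Y^T)
    X_new, Y_new = V, V.T @ Z_w         # X+(omega), Y+(omega)
    W = X_new @ Y_new
    Z_new = W + P_Omega(M - W, mask)    # Z+ = X+Y+ off Omega, = M on Omega
    return X_new, Y_new, Z_new
```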
12 Reduction of the residual ||S||_F^2 − ||S+(ω)||_F^2

Assume that rank(Z_ω) = rank(Z) for all ω ∈ [1, ω1], for some ω1 ≥ 1. Then there exists some ω2 ≥ 1 such that

    ||S||_F^2 − ||S+(ω)||_F^2 = ρ12(ω) + ρ3(ω) + ρ4(ω) > 0, for all ω ∈ [1, ω2],

where

    ρ12(ω) := ||SP||_F^2 + ||Q(ω)S(I − P)||_F^2 ≥ 0,
    ρ3(ω) := ||P_{Ω^c}(SP + Q(ω)S(I − P))||_F^2 ≥ 0,
    ρ4(ω) := (1/ω^2)||S+(ω) + (ω−1)S||_F^2 − ||S+(ω)||_F^2.

Whenever ρ3(1) > 0 (i.e., P_{Ω^c}(X+(1)Y+(1) − XY) ≠ 0) and ω1 > 1, then ω2 > 1 can be chosen so that ρ4(ω) > 0 for all ω ∈ (1, ω2].
13-14 Reduction of the residual ||S||_F^2 − ||S+(ω)||_F^2

Figures: the components ρ12(ω), ρ3(ω), ρ4(ω) and the values ||S||_F^2 and ||S+(ω)||_F^2 plotted against ω; panels (a) and (b).
15 Nonlinear SOR: convergence guarantee

Problem: how can we select a proper weight ω to ensure convergence for a nonlinear model?

Strategy: adjust ω dynamically according to the change of the objective function values.
- Calculate the residual ratio γ(ω) = ||S+(ω)||_F / ||S||_F. A small γ(ω) indicates that the current weight ω works well so far.
- If γ(ω) < 1, accept the new point; otherwise, reset ω to 1 and repeat the procedure.
- ω is increased only if the calculated point is acceptable but the residual ratio γ(ω) is considered too large; that is, γ(ω) ∈ [γ1, 1) for some γ1 ∈ (0, 1).
16 Nonlinear SOR: complete algorithm

Algorithm 1: A low-rank matrix fitting algorithm (LMaFit)
Input: index set Ω, data P_Ω(M), and a rank overestimate K ≥ r. Set Y^0, Z^0, ω = 1, ω̄ > 1, δ > 0, γ1 ∈ (0, 1), and k = 0.
while not convergent do
    Compute (X+(ω), Y+(ω), Z+(ω)).
    Compute the residual ratio γ(ω).
    If γ(ω) ≥ 1, set ω = 1 and recompute (X+(ω), Y+(ω), Z+(ω)).
    Update (X^{k+1}, Y^{k+1}, Z^{k+1}) and increment k.
    If γ(ω) ≥ γ1, set δ = max(δ, 0.25(ω − 1)) and ω = min(ω + δ, ω̄).
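A condensed, hedged sketch of Algorithm 1. The parameter names follow the slide (ω̄, γ1, δ); the initializations, default values, and stopping test are simplifying assumptions of mine, not the reference implementation:

```python
import numpy as np

def lmafit(M, mask, K, max_iter=500, tol=1e-4,
           omega_bar=2.0, gamma1=0.7, delta=1.0):
    m, n = M.shape
    P = lambda A: np.where(mask, A, 0.0)       # projection onto Omega
    X, Y = np.zeros((m, K)), np.eye(K, n)      # simple (assumed) initialization
    omega = 1.0
    nrmPM = np.linalg.norm(P(M), 'fro')
    res = nrmPM
    for _ in range(max_iter):
        S = P(M - X @ Y)
        Z_w = X @ Y + omega * S                # Z_omega
        V, _ = np.linalg.qr(Z_w @ Y.T)         # X+(omega) via thin QR
        Xp, Yp = V, V.T @ Z_w                  # Y+(omega)
        W = Xp @ Yp
        res_new = np.linalg.norm(P(M - W), 'fro')
        gamma = res_new / res                  # residual ratio gamma(omega)
        if gamma >= 1.0:
            omega = 1.0                        # reject: fall back to plain GS
            continue
        X, Y, res = Xp, Yp, res_new            # accept the new point
        if gamma >= gamma1:                    # accepted but slow: push omega up
            delta = max(delta, 0.25 * (omega - 1.0))
            omega = min(omega + delta, omega_bar)
        if res / nrmPM < tol:                  # relres stopping test
            break
    return X, Y
```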
17 Nonlinear GS vs. nonlinear SOR

Figure: normalized residual versus iteration for SOR and GS with k = 12 and k = 20; (a) n = 1000, r = 10; (b) n = 1000, r = 10, SR = 0.15.
18 Extension to problems with general linear constraints

Consider the problem:

    min_{W ∈ R^{m×n}} rank(W), s.t. A(W) = b

Nonconvex relaxation:

    min_{X,Y,Z} (1/2)||XY − Z||_F^2, s.t. A(Z) = b

Let S := A*(AA*)†(b − A(XY)) and Z_ω = XY + ωS.

Nonlinear SOR scheme:

    X+(ω) ← Z_ω Y^T or Z_ω Y^T(YY^T)†,
    Y+(ω) ← (X+(ω)^T X+(ω))†(X+(ω)^T Z_ω),
    Z+(ω) ← X+(ω)Y+(ω) + A*(AA*)†(b − A(X+(ω)Y+(ω))).
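A hedged sketch of the residual term S for the general case, assuming A is given as a dense matrix acting on vec(W) with full row rank (representation and function name are illustrative; a practical solver would exploit the structure of A):

```python
import numpy as np

def constraint_residual(A, b, W):
    # S = A^*(A A^*)^+(b - A(W)), reshaped back to matrix form.
    r = b - A @ W.ravel()
    s = A.T @ np.linalg.solve(A @ A.T, r)   # full row rank assumed
    return s.reshape(W.shape)
```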
19 Nonlinear SOR: convergence analysis

Main result: let {(X^k, Y^k, Z^k)} be generated by the nonlinear SOR method with {P_{Ω^c}(X^k Y^k)} bounded. Then at least one subsequence of {(X^k, Y^k, Z^k)} satisfies the first-order optimality conditions in the limit.

Technical lemmas:

    ω ⟨S, X+(ω)Y+(ω) − XY⟩ = ||X+(ω)Y+(ω) − XY||_F^2.

Residual reduction from ||S||_F^2 after the first two steps:

    (1/ω^2)||X+(ω)Y+(ω) − Z_ω||_F^2 = ||(I − Q(ω))S(I − P)||_F^2 = ||S||_F^2 − ρ12(ω),

    lim_{ω→1+} ρ4(ω)/(ω − 1) = 2||P_{Ω^c}(X+(1)Y+(1) − XY)||_F^2 ≥ 0.
20 Practical issues

- Variants to obtain X+(ω) and Y+(ω): linsolve vs. QR.
- Storage: full Z or partial S = P_Ω(M − XY).
- Stopping criteria:

    relres = ||P_Ω(M − X^k Y^k)||_F / ||P_Ω(M)||_F ≤ tol,
    reschg = |1 − ||P_Ω(M − X^k Y^k)||_F / ||P_Ω(M − X^{k−1} Y^{k−1})||_F| ≤ tol/2.

- Rank estimation:
  - Start from K ≥ r, then decrease K aggressively: compute a QR factorization with column pivoting, QR = XE (E a permutation), so that d := |diag(R)| is nonincreasing; compute the sequence d̃_i = d_i/d_{i+1}, i = 1, ..., K−1; examine the ratio τ = (K−1) d̃(p) / Σ_{i≠p} d̃_i, where d̃(p) is the maximal element of {d̃_i} and p is the corresponding index; reset K to p once τ > 10 (see the sketch below).
  - Alternatively, start from a small K and increase K to min(K + κ, rank_max) when the algorithm stagnates.
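A minimal sketch of the rank-decrease heuristic, assuming SciPy's pivoted QR is available (the function name and the tolerance guards are mine):

```python
import numpy as np
from scipy.linalg import qr

def estimate_rank(X, tau_thresh=10.0):
    # Pivoted QR of X: |diag(R)| is nonincreasing, and a dominant gap in the
    # consecutive ratios d_i / d_{i+1} reveals the numerical rank.
    _, R, _ = qr(X, mode='economic', pivoting=True)
    d = np.abs(np.diag(R))
    ratios = d[:-1] / np.maximum(d[1:], 1e-16)          # d_i / d_{i+1}
    p = int(np.argmax(ratios))
    tau = len(ratios) * ratios[p] / max(ratios.sum() - ratios[p], 1e-16)
    return p + 1 if tau > tau_thresh else X.shape[1]    # reset K to p if tau > 10
```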
21 The sensitivity with respect to the rank estimation K

Figure: iteration number and CPU time versus the rank estimate, for SR = 0.04, 0.08, 0.3; (a) iteration number; (b) CPU time in seconds.
22 Convergence behavior of the residual

Figure: normalized residual versus iteration; (a) m = n = 1000, r = 10, 50, 100; (b) m = n = 1000, r = 10, SR = 0.04, 0.08, 0.3.
23 Phase diagrams for matrix completion recoverability

Figure: sampling ratio versus rank; (a) model solved by FPCA; (b) model solved by LMaFit.
24 Small random problems: m = n = 1000

Table: comparison of APGL, FPCA, and LMaFit (with K = 1.25r and K = 1.5r) over problem settings (r, SR, FR, µ), reporting time, rel.err, and, for APGL and FPCA, the fraction of time spent on SVDs (tsvd).
25 Large random problems

Table: comparison of APGL with LMaFit (K = 1.25r) and LMaFit (K = 1.5r) over problem settings (n, r, SR, FR, µ), reporting iterations, #sv, time, and rel.err.
26 Random low-rank approximation problems

Figure: singular values σ_i versus index; (a) power-law decay; (b) exponential decay, σ_i = e^{−0.3i}.
27 Random low-rank approximation problems

Table: comparison of APGL, FPCA, LMaFit (est_rank = 1), and LMaFit (est_rank = 2) on the power-law and exponentially decaying spectra, over settings (SR, FR, µ), reporting #sv, time, and rel.err.
28 Video Denoising

Figure: original video; 50% masked original video.
29 Video Denoising

Figure: recovered video by LMaFit; recovered video by APGL.

Table: APGL vs. LMaFit on the 76800-row video matrix (columns: m/n, µ, iter, #sv, time, rel.err).
30 Contributions

- Explore matrix completion using the low-rank factorization model.
- Reduce the cost of the nonlinear Gauss-Seidel scheme by eliminating an unnecessary least-squares problem.
- Propose a nonlinear successive over-relaxation (SOR) algorithm with a convergence guarantee.
- Adjust the relaxation weight of SOR dynamically.
- Excellent computational performance on a wide range of test problems.
31 Matrix separation

Consider:

    min_{U,V,Z} ||P_Ω(Z − D)||_1, s.t. UV − Z = 0

Introduce the augmented Lagrangian function

    L_β(U, V, Z, Λ) = ||P_Ω(Z − D)||_1 + ⟨Λ, UV − Z⟩ + (β/2)||UV − Z||_F^2.

Alternating direction augmented Lagrangian framework:

    U^{j+1} := argmin_{U ∈ R^{m×k}} L_β(U, V^j, Z^j, Λ^j),
    V^{j+1} := argmin_{V ∈ R^{k×n}} L_β(U^{j+1}, V, Z^j, Λ^j),
    Z^{j+1} := argmin_{Z ∈ R^{m×n}} L_β(U^{j+1}, V^{j+1}, Z, Λ^j),
    Λ^{j+1} := Λ^j + γβ(U^{j+1}V^{j+1} − Z^{j+1}).
32 ADM subproblems

Let B = Z − Λ/β. Then

    U+ = BV^T(VV^T)† and V+ = (U+^T U+)† U+^T B.

Since U+V+ = U+(U+^T U+)† U+^T B = P_{U+} B, one can use instead:

    Q := orth(BV^T), U+ = Q, V+ = Q^T B.

Variable Z (with S(·, ν) the elementwise shrinkage/soft-thresholding operator):

    P_Ω(Z+) = P_Ω( S(U+V+ − D + Λ/β, 1/β) + D ),
    P_{Ω^c}(Z+) = P_{Ω^c}( U+V+ + Λ/β ).
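A hedged end-to-end sketch of the ADM scheme on the last two slides, fixing γ = 1 and a constant β and using random initialization; these choices, the defaults, and the function names are my assumptions, not the reference solver:

```python
import numpy as np

def shrink(A, nu):
    # S(A, nu) = sign(A) * max(|A| - nu, 0), elementwise soft-thresholding.
    return np.sign(A) * np.maximum(np.abs(A) - nu, 0.0)

def adm_separation(D, mask, k, beta=1.0, iters=200, seed=0):
    m, n = D.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((k, n))
    Z = np.where(mask, D, 0.0)
    Lam = np.zeros((m, n))
    for _ in range(iters):
        B = Z - Lam / beta
        Q, _ = np.linalg.qr(B @ V.T)          # U-update: Q = orth(B V^T)
        U, V = Q, Q.T @ B                     # V-update: V+ = Q^T B
        W = U @ V + Lam / beta
        # Z-update: shrinkage on Omega, plain assignment on Omega^c
        Z = np.where(mask, shrink(W - D, 1.0 / beta) + D, W)
        Lam = Lam + beta * (U @ V - Z)        # multiplier update (gamma = 1)
    return U, V, Z
```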
33 Theoretical results

Let X := (U, V, Z, Λ). The KKT conditions are

    ΛV^T = 0, U^T Λ = 0, Λ ∈ ∂_Z(||Z − D||_1), UV − Z = 0.

The condition Λ ∈ ∂_Z(||Z − D||_1) is equivalent to

    Z − D = S(Z − D + Λ/β, 1/β) = S(UV − D + Λ/β, 1/β).

Let {X^j} be a uniformly bounded sequence generated by the ADM with lim_{j→∞}(X^{j+1} − X^j) = 0. Then any accumulation point of {X^j} satisfies the KKT conditions.
34 CPU and rel. err vs. sparsity: m = 100 and k = 5

Figure: CPU time and relative error versus the percentage of nonzeros; (a) CPU seconds and (b) relative error for LMaFit and IALM; (c) CPU seconds and (d) relative error for LMaFit, IALM, and LRSD (with noise).
35 Recoverability test: phase plots

Figure: digits of accuracy over the (rank, sparsity %) plane; (a) IALM; (b) LMaFit.
36 Performance with respect to size

Figure: CPU time versus dimension for LRSD, LMaFit, and IALM; (a) k = 0.05m and ||S||_0 = 0.05m^2; (b) k = 0.15m and ||S||_0 = 0.15m^2.
37 Deterministic low-rank matrices

Figure: recovered results of "brickwall". Upper left: original image; upper right: corrupted image; lower left: IALM; lower right: LMaFit.
38 Video separation

Figure: original video and the separation results by IALM and LMaFit.
39 Video separation

Figure: original video and the separation results by IALM and LMaFit.
40 Upcoming talks

Oct. 1?th, Wotao Yin, "Fast Curvilinear Search Algorithms for Optimization with Constraints ||x||_2 = 1 or X^T X = I":
- p-harmonic flow into spheres
- polynomial optimization with normalized constraints
- max-cut SDP relaxation
- low-rank nearest correlation matrix estimation
- linear and nonlinear eigenvalue problems
- quadratic assignment problem