Low Rank Matrix Completion and Algorithms

Jian Zhang
Department of Computer Science, ETH Zurich
zhangjianthu@gmail.com
March 25, 2014
Motivation: Movie Rating

                Movie 1   Movie 2
    Critic A       5         5
    Critic B       6         5
    Jian           9         8
    Kind Guy B     9         9

Observation: clusters of opinion.
Challenge: missing entries.
Task: matrix formulation; recover the matrix with constraints.
Low Rank Matrix Completion

minimize  $\mathrm{Rank}(X)$
s.t.      $P_\Omega(X) = P_\Omega(M)$

where
$P_\Omega(S)_{i,j} = \begin{cases} S_{i,j} & \text{if } (i,j) \in \Omega \\ 0 & \text{otherwise} \end{cases}$

and $\Omega$ is the index set of observed entries in $M$.
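To make the projection concrete, here is a minimal numpy sketch of $P_\Omega$, assuming the observed set $\Omega$ is stored as a boolean mask of the same shape as $M$ (the function and variable names are illustrative, not from the slides):

```python
# A minimal sketch of the observation operator P_Omega, assuming the
# observed index set is given as a boolean mask (names are illustrative).
import numpy as np

def P_Omega(S, mask):
    """Keep S[i, j] where (i, j) is observed, zero elsewhere."""
    return np.where(mask, S, 0.0)

# Toy example: the rating matrix from the motivation slide, with one
# entry hidden.
M = np.array([[5., 5.], [6., 5.], [9., 8.], [9., 9.]])
mask = np.ones_like(M, dtype=bool)
mask[2, 1] = False          # pretend one rating is missing
print(P_Omega(M, mask))     # the constraint P_Omega(X) = P_Omega(M)
                            # only touches the visible entries
```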
Outline

- Linear Algebra
- Two Formulations
- Algorithms
- Performance Comparison
Preliminaries: Singular Value Decomposition

$X = U \Sigma V^T$, where $X \in \mathbb{R}^{m \times n}$, $U U^T = I_m$, $V V^T = I_n$.

$X = [u_1, \dots, u_m] \, \mathrm{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0) \, [v_1, \dots, v_n]^T$ with $\sigma_i \ge 0$.

Equivalently, $X = \sum_{i=1}^{r} \sigma_i u_i v_i^T$.
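A quick numerical check of the rank-$r$ expansion, using numpy's SVD on a small random low-rank matrix (the matrix, seed, and tolerance are arbitrary choices):

```python
# Verify the SVD identity X = sum_i sigma_i u_i v_i^T on a rank-2 example.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = np.sum(s > 1e-10)                       # numerical rank
X_r = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))

print(r)                                    # 2
print(np.allclose(X, X_r))                  # True: the rank-r sum recovers X
```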
Preliminaries: Nuclear Norm for Minimizing Rank

minimize $\mathrm{Rank}(X)$ s.t. $X \in C$
    (relaxation)
minimize $\|X\|_*$ s.t. $X \in C$

Positive semidefinite matrix case:
$X = U \, \mathrm{diag}(\lambda_1, \dots, \lambda_r, 0, \dots, 0) \, U^T$, and since the eigenvalues of a PSD matrix equal its singular values,
$\|X\|_* = \sum_{i=1}^{r} \lambda_i = \mathrm{Tr}(X)$.
Preliminaries: Nuclear Norm for Minimizing Rank

$\min \|X\|_* = \min \sum_{i=1}^{r} \lambda_i$

Minimizing an $L_1$ norm encourages zero entries; here the entries are eigenvalues, and $\mathrm{Rank}(X)$ = # of non-zero eigenvalues.
Preliminaries: Nuclear Norm for Minimizing Rank

General case:

minimize $\mathrm{Rank}(X)$ s.t. $X \in C$
is equivalent to
minimize $\mathrm{Rank}(\mathrm{diag}(Y, Z))$ s.t. $\begin{bmatrix} Y & X \\ X^T & Z \end{bmatrix} \succeq 0$, $X \in C$,
where $\mathrm{diag}(Y, Z)$ is PSD.

Correspondingly,
minimize $\|X\|_*$ s.t. $X \in C$
relaxes to
minimize $\mathrm{Tr}(\mathrm{diag}(Y, Z))$ s.t. $\begin{bmatrix} Y & X \\ X^T & Z \end{bmatrix} \succeq 0$, $X \in C$.
Preliminaries: Rank and Factorization

$X = LR$ has rank at most $r$ if $L \in \mathbb{R}^{m \times r}$ and $R \in \mathbb{R}^{r \times n}$:
the right null space of $R$ is a subspace of the right null space of $X$, and
$\dim(\ker(X)) + \mathrm{rank}(X) = n$, so $\mathrm{rank}(X) \le \mathrm{rank}(R) \le r$.

The factors need far fewer variables than $X$ if $r$ is much smaller than $m$ and $n$.
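A small numpy sanity check of the rank bound and of the variable savings (the sizes below are arbitrary):

```python
# Check that X = L R has rank at most r when L is m x r and R is r x n.
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 100, 80, 5
L = rng.standard_normal((m, r))
R = rng.standard_normal((r, n))

X = L @ R
print(np.linalg.matrix_rank(X))   # 5: never exceeds r
# L and R hold (m + n) * r = 900 numbers vs. m * n = 8000 in X.
```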
Preliminaries: Summary

Nuclear norm relaxation:
minimize $\mathrm{Rank}(X)$ s.t. $X \in C$  $\to$  minimize $\|X\|_*$ s.t. $X \in C$

Rank and factorization:
$X = LR$ has rank at most $r$ if $L \in \mathbb{R}^{m \times r}$ and $R \in \mathbb{R}^{r \times n}$.
Formulation: Low Rank Matrix Completion

minimize  $\mathrm{Rank}(X)$
s.t.      $P_\Omega(X) = P_\Omega(M)$

Challenge: $\mathrm{Rank}(X)$ is neither continuous nor differentiable, so the objective is difficult to optimize.
Formulation: Least Square

minimize  $\|P_\Omega(X) - P_\Omega(M)\|_2^2$
s.t.      $X = LR^T$

Essentially a least square problem. The constraint enforces $X$ to have rank at most $r$ if $L \in \mathbb{R}^{m \times r}$ and $R \in \mathbb{R}^{n \times r}$.
Formulation: Least Square

Overfitting degrades generalization.

Regularizer: control the parameter complexity. The $L_2$ norm of the parameter vector is a common regularizer, which here is
$\sum_{i=1}^{m} \|L_{i.}\|_2^2 + \sum_{j=1}^{n} \|R_{j.}\|_2^2 = \|L\|_F^2 + \|R\|_F^2$.

Regularized Least Square:
minimize $\sum_{(i,j) \in \Omega} ((LR^T)_{i,j} - M_{i,j})^2 + \frac{\mu}{2}\|L\|_F^2 + \frac{\mu}{2}\|R\|_F^2$
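As a sketch, the regularized objective transcribes directly into numpy; the variable names and the boolean-mask encoding of $\Omega$ are placeholders of mine:

```python
# The regularized least-squares objective as code: a direct transcription
# of the formula above (mask encodes Omega; mu is illustrative).
import numpy as np

def objective(L, R, M, mask, mu):
    """sum over observed (i,j) of ((L R^T)_ij - M_ij)^2
       + (mu/2) ||L||_F^2 + (mu/2) ||R||_F^2"""
    err = (L @ R.T - M)[mask]
    return np.sum(err ** 2) + 0.5 * mu * (np.sum(L ** 2) + np.sum(R ** 2))
```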
Formulation: Nuclear Norm Based

minimize $\sum_{(i,j) \in \Omega} (X_{i,j} - M_{i,j})^2 + \mu \|X\|_*$

- Soft squared error instead of hard constraints.
- Nuclear norm and squared error are both convex functions of $X$.
- $\mu$ keeps the balance between error and rank.
- $\|X\|_*$ still needs to be expressed as an easier explicit function of the entries of $X$.
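For illustration, this convex formulation can be handed to an off-the-shelf solver. The sketch below uses cvxpy, which is my choice rather than anything the slides prescribe; $M$, the mask, and $\mu$ are toy placeholders:

```python
# A sketch of the nuclear-norm formulation via cvxpy (an off-the-shelf
# convex solver, not something the slides prescribe).
import cvxpy as cp
import numpy as np

m, n, mu = 4, 2, 0.1
M = np.array([[5., 5.], [6., 5.], [9., 8.], [9., 9.]])
mask = np.ones((m, n), dtype=bool); mask[2, 1] = False
W = mask.astype(float)                           # 1 on observed entries

X = cp.Variable((m, n))
err = cp.sum_squares(cp.multiply(W, X - M))      # soft squared error
prob = cp.Problem(cp.Minimize(err + mu * cp.normNuc(X)))
prob.solve()
print(np.round(X.value, 2))   # completed matrix with small nuclear norm
```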
Formulation: Characterization of the Nuclear Norm

$\|X\|_* = \inf\{\frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2 : X = LR^T\}$, where $\|A\|_F = (\sum_i \sum_j A_{i,j}^2)^{1/2}$.

Verification with $L = U\Sigma^{1/2}$ and $R = V\Sigma^{1/2}$ from the SVD:
$\|L\|_F^2 = \mathrm{Tr}(LL^T) = \mathrm{Tr}(U\Sigma U^T) = \mathrm{Tr}(U^T U \Sigma) = \mathrm{Tr}(\Sigma) = \|X\|_*$

Thus $\frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2$, as an upper bound for $\|X\|_*$, can be used to approximate $\|X\|_*$.
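The verification above can be replayed numerically: build $L$ and $R$ from the SVD and compare the Frobenius bound with the nuclear norm (random test matrix, names illustrative):

```python
# With L = U Sigma^{1/2} and R = V Sigma^{1/2}, the two Frobenius terms
# add up exactly to the nuclear norm.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U @ np.diag(np.sqrt(s))
R = Vt.T @ np.diag(np.sqrt(s))

nuc = np.sum(s)                                     # ||X||_*
bound = 0.5 * np.sum(L ** 2) + 0.5 * np.sum(R ** 2)
print(np.allclose(X, L @ R.T), np.isclose(nuc, bound))  # True True
```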
Formulation: Approximated Nuclear Norm Based

minimize $\sum_{(i,j) \in \Omega} ((LR^T)_{i,j} - M_{i,j})^2 + \frac{\mu}{2}\|L\|_F^2 + \frac{\mu}{2}\|R\|_F^2$
Formulation: Converged

minimize $\sum_{(i,j) \in \Omega} ((LR^T)_{i,j} - M_{i,j})^2 + \frac{\mu}{2}\|L\|_F^2 + \frac{\mu}{2}\|R\|_F^2$

Story line 1: approximately minimize rank with the nuclear norm plus squared error.
Story line 2: least square estimation with an $L_2$ regularizer preventing overfitting.
A convergence of beautiful minds.
Formulation: Non-Convexity of the Converged Formulation

minimize $\sum_{(i,j) \in \Omega} ((LR^T)_{i,j} - M_{i,j})^2 + \frac{\lambda}{2}\|L\|_F^2 + \frac{\lambda}{2}\|R\|_F^2$

A non-convex problem with local minima. Example: $(xy - 1)^2 + 0.001x^2 + 0.001y^2$.

Under mild conditions, a local minimum here is globally optimal for
minimize $\sum_{(i,j) \in \Omega} (X_{i,j} - M_{i,j})^2 + \mu\|X\|_*$,
and the problem is convex if $\lambda$ is large enough. A local minimum is worth our efforts.
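A two-line check that the example function really is non-convex: it is small near (1, 1) and (-1, -1) but large at their midpoint (0, 0), which a convex function cannot be:

```python
# The example above has two separated low points; the midpoint between
# them has a higher value, so the function cannot be convex.
f = lambda x, y: (x * y - 1) ** 2 + 0.001 * x ** 2 + 0.001 * y ** 2
print(f(1, 1), f(-1, -1))   # 0.002 at both low points
print(f(0, 0))              # 1.0 at the midpoint between them
```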
Algorithm: Alternating Least Square

minimize $\sum_{(i,j) \in \Omega} ((LR^T)_{i,j} - M_{i,j})^2 + \frac{\mu}{2}\|L\|_F^2 + \frac{\mu}{2}\|R\|_F^2$

1. Fix $L$; for each $j$, solve
   $\min_{R_{j.}} \|\hat{L}_j R_{j.} - \hat{M}_{.j}\|_2^2 + \frac{\mu}{2}\|R_{j.}\|_2^2$,
   where $\hat{L}_j$ stacks the rows $L_{i.}$ with $(i,j) \in \Omega$ and $\hat{M}_{.j}$ stacks the entries $M_{i,j}$ with $(i,j) \in \Omega$.
2. Fix $R$; for each $i$, solve the symmetric problem
   $\min_{L_{i.}} \|\hat{R}_i L_{i.} - \hat{M}_{i.}\|_2^2 + \frac{\mu}{2}\|L_{i.}\|_2^2$.
3. Repeat 1 and 2 until convergence.
Algorithm: Alternating Least Square

Pros:
- Convex sub-problems with a closed-form solution in each step:
  $R_{j.} = \arg\min \|\hat{L}_j R_{j.} - \hat{M}_{.j}\|_2^2 + \frac{\mu}{2}\|R_{j.}\|_2^2 = (\hat{L}_j^T \hat{L}_j + \frac{\mu}{2} I)^{-1} \hat{L}_j^T \hat{M}_{.j}$

Cons:
- $O(\max(nr^3, mr^3))$ update operations in each step.
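A minimal ALS sketch under these assumptions, with $X = LR^T$, $L$ of size $m \times r$ and $R$ of size $n \times r$; all names and the unoptimized loop structure are mine, not from the slides:

```python
# A minimal ALS sketch: mask marks observed entries of M.
import numpy as np

def als_step(L, R, M, mask, mu):
    """One half-sweep: fix L, apply the closed-form update to every
    row R_j (the normal equations from the slide)."""
    r = L.shape[1]
    for j in range(M.shape[1]):
        obs = mask[:, j]                       # i with (i, j) observed
        Lj, Mj = L[obs], M[obs, j]
        R[j] = np.linalg.solve(Lj.T @ Lj + 0.5 * mu * np.eye(r),
                               Lj.T @ Mj)
    return R

def als(M, mask, r=2, mu=0.1, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    L = rng.standard_normal((m, r)); R = rng.standard_normal((n, r))
    for _ in range(epochs):
        R = als_step(L, R, M, mask, mu)        # fix L, update R
        L = als_step(R, L, M.T, mask.T, mu)    # fix R, update L
    return L, R
```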
Algorithm: Gradient Descent

$\nabla f(x)$ is the direction of most rapid increase, so
$x^{(k+1)} = x^{(k)} - \eta^{(k)} \nabla f(x^{(k)})$
gives a decreasing sequence if the step size is small enough.
Algorithm: Gradient Descent

minimize $\sum_{(i,j) \in \Omega} ((LR^T)_{i,j} - M_{i,j})^2 + \frac{\mu}{2}\|L\|_F^2 + \frac{\mu}{2}\|R\|_F^2$,
where $X = LR^T$, $L \in \mathbb{R}^{m \times r}$, $R \in \mathbb{R}^{n \times r}$.

1. For every $i$:
   $L_{i.}^{(k+1)} = L_{i.}^{(k)} - \eta^{(k+1)} \left[ \sum_{j:(i,j) \in \Omega} \left( L_{i.}^{(k)} R_{j.}^{(k)T} - M_{i,j} \right) R_{j.}^{(k)} + \mu L_{i.}^{(k)} \right]$,
   and similarly for $R_{j.}^{(k+1)}$.
2. Repeat 1 until convergence.
Algorithm: (Stochastic) Gradient Descent

Pros:
- $O((m+n)r^2)$ update operations in each step.
- $O((m+n)r)$ update operations in each step with Stochastic Gradient Descent: pick $(i,j) \in \Omega$ at random and only consider the error and $\|\cdot\|_F$ terms related to entry $(i,j)$:
  $L_{i.}^{(k+1)} = L_{i.}^{(k)} - \eta^{(k+1)} \left[ \left( L_{i.}^{(k)} R_{j.}^{(k)T} - M_{i,j} \right) R_{j.}^{(k)} + \frac{\mu}{|\Omega_{i.}|} L_{i.}^{(k)} \right]$,
  where $|\Omega_{i.}|$ is the number of observed entries in row $i$.

Cons:
- The iterative approach might need many more epochs.
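A sketch of one SGD epoch over the observed entries; the step size, sampling scheme, and helper names are illustrative choices of mine, with the per-row weighting $\mu / |\Omega_{i.}|$ taken from the slide:

```python
# One SGD epoch: visit the observed entries in random order and apply
# the per-entry update from the slide to one row of L and one row of R.
import numpy as np

def sgd_epoch(L, R, M, mask, mu=0.1, eta=0.01, rng=None):
    rng = rng or np.random.default_rng(0)
    rows, cols = np.nonzero(mask)              # observed (i, j) pairs
    n_i = mask.sum(axis=1)                     # |Omega_{i.}| per row
    n_j = mask.sum(axis=0)                     # |Omega_{.j}| per column
    for k in rng.permutation(len(rows)):
        i, j = rows[k], cols[k]
        e = L[i] @ R[j] - M[i, j]              # residual on this entry
        Li = L[i].copy()
        L[i] -= eta * (e * R[j] + mu / n_i[i] * L[i])
        R[j] -= eta * (e * Li + mu / n_j[j] * R[j])
    return L, R
```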
Algorithm: Comparison

Alternating Least Square: closed-form updates with high stepwise complexity; might need fewer epochs.
(Stochastic) Gradient Descent: iterative updates with low stepwise complexity; might need more epochs.
Performance Comparison: Efficiency

Netflix dataset: movie ratings from 480K reviewers for more than 18K movies.
Configuration: maximal possible rank $r$, $L \in \mathbb{R}^{m \times r}$.

method   # of epochs   time/min (r = 50)   time/min (r = 100)
ALS           6              66                  290
SGD          31              58                  52.8
More Applications and Qualitative Results

Video denoising:
- Observe reliable pixels.
- Vectorize and stack similar patches into a matrix.
- Employ the low rank structure within the matrix.
More Applications and Qualitative Results

Background subtraction:
- Vectorize and stack different video frames.
- The background is the low rank component.
- The foreground is sparse noise.
Conclusion

Technically:
- Matrix completion is a fundamental problem for many real world applications.
- The nuclear norm based and least square based formulations converge into one through different relaxations.
- Alternating Least Square and (Stochastic) Gradient Descent are widely used alternatives for solving matrix completion problems.

Methodologically:
- Relaxing the formulation and employing heuristics is a wise way to approximately solve difficult problems.
- There is a tradeoff between elegant closed form solutions and brute force operations.
Thank You! Q & A

Acknowledgement: thanks to Tobias for helpful discussions, suggestions and chips!
References

1. Low-rank Matrix Completion Using Alternating Minimization. Jain et al. STOC 2013.
2. Distributed Matrix Completion. Teflioudi et al. ICDM 2012.
3. Large-scale Parallel Collaborative Filtering for the Netflix Prize. Zhou et al. AAIM 2008.
4. Parallel Stochastic Gradient Algorithms for Large-Scale Matrix Completion. Recht et al. MPC 2013.
5. Matrix Rank Minimization with Applications. Fazel. PhD Thesis.
6. Robust Video Denoising Using Low Rank Matrix Completion. Ji et al. CVPR 2010.
7. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. Wright et al. NIPS 2009.