Low Rank Matrix Completion Formulation and Algorithm


Low Rank Matrix Completion and Algorithm
Jian Zhang, Department of Computer Science, ETH Zurich
zhangjianthu@gmail.com
March 25, 2014

Movie Ratings

             Movie 1   Movie 2
Critic A        5         5
Critic B        6         5
Jian            9         8
Kind Guy B      9         9

Observation: matrix formulation; clusters of opinion.
Challenge: missing entries.
Task: recover the matrix with constraints.

Low Rank Matrix Completion

minimize    Rank(X)
subject to  P_Ω(X) = P_Ω(M)

where [P_Ω(S)]_{i,j} = S_{i,j} if (i,j) ∈ Ω and 0 otherwise, and Ω is the index set of observed entries in M.
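As a quick illustration of the operator P_Ω, here is a minimal numpy sketch; the function name P_Omega and the boolean-mask representation of Ω are illustrative choices, not from the slides.

```python
import numpy as np

def P_Omega(S, mask):
    """Keep entries of S where mask is True, zero elsewhere (the operator P_Omega)."""
    return np.where(mask, S, 0.0)

# Toy example: the 4x2 rating matrix from the previous slide with two missing entries.
M = np.array([[5., 5.], [6., 5.], [9., 8.], [9., 9.]])
mask = np.array([[True, True], [True, False], [False, True], [True, True]])
print(P_Omega(M, mask))
```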

Outline

Linear Algebra
Two Formulations
Algorithms
Performance Comparison

Linear Algebra: Singular Value Decomposition

X = U Σ V^T, where X ∈ R^{m×n}, U U^T = I_m, V V^T = I_n.

X = [u_1, ..., u_m] diag(σ_1, ..., σ_r, 0, ..., 0) [v_1, ..., v_n]^T, with σ_i ≥ 0, so that

X = Σ_{i=1}^r σ_i u_i v_i^T.
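For concreteness, the decomposition and the rank-r sum above can be reproduced with numpy (an illustrative sketch, not part of the original slides):

```python
import numpy as np

X = np.random.randn(6, 4) @ np.random.randn(4, 5)   # a 6x5 matrix of rank at most 4

U, s, Vt = np.linalg.svd(X, full_matrices=False)     # X = U @ diag(s) @ Vt
r = np.sum(s > 1e-10)                                # numerical rank

# Rank-r reconstruction: X = sum_{i=1}^r sigma_i * u_i * v_i^T
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
print(r, np.allclose(X, X_r))
```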

Nuclear Norm for Minimizing Rank

minimize Rank(X)  s.t. X ∈ C      -- relaxation -->      minimize ||X||_*  s.t. X ∈ C

Positive semidefinite matrix case:

X = U diag(λ_1, ..., λ_r, 0, ..., 0) U^T,   ||X||_* = Σ_{i=1}^r λ_i = Tr(X).

Nuclear Norm for Minimizing Rank (cont.)

min ||X||_* = min Σ_{i=1}^r λ_i

Minimizing an L1 norm encourages zero entries; here the entries are the eigenvalues, and Rank(X) = # of non-zero eigenvalues.
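A small numerical check of the PSD case above, confirming that the nuclear norm equals the trace, i.e. the L1 norm of the eigenvalues (illustrative sketch):

```python
import numpy as np

A = np.random.randn(5, 3)
X = A @ A.T                                   # a PSD matrix of rank at most 3

nuclear = np.linalg.norm(X, ord='nuc')        # sum of singular values
trace = np.trace(X)
eig_l1 = np.sum(np.abs(np.linalg.eigvalsh(X)))
print(np.isclose(nuclear, trace), np.isclose(nuclear, eig_l1))
```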

Nuclear Norm for Minimizing Rank: General Case

minimize Rank(X)                         minimize Rank(diag(Y, Z))
s.t. X ∈ C                    <=>        s.t.  [ Y   X ; X^T  Z ] ⪰ 0 (PSD),  X ∈ C

minimize ||X||_*                         minimize Tr(diag(Y, Z))
s.t. X ∈ C                    <=>        s.t.  [ Y   X ; X^T  Z ] ⪰ 0,  X ∈ C

Rank and Factorization

X = LR is at most rank r if L ∈ R^{m×r} and R ∈ R^{r×n}:
the right null space of R is a subspace of the right null space of X,
dim(ker(X)) + rank(X) = n, and rank(X) ≤ rank(R) ≤ r.

Much fewer variables if r is much smaller than m and n.
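A tiny check, under the dimensions stated above, that X = LR has rank at most r (illustrative):

```python
import numpy as np

m, n, r = 8, 6, 2
L = np.random.randn(m, r)
R = np.random.randn(r, n)
X = L @ R
print(np.linalg.matrix_rank(X))   # <= r (equals 2 almost surely for random factors)
```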

Recap

minimize Rank(X)  s.t. X ∈ C      -- relaxation -->      minimize ||X||_*  s.t. X ∈ C

Rank and factorization: X = LR is at most rank r if L ∈ R^{m×r} and R ∈ R^{r×n}.

Low Rank Matrix Completion

minimize    Rank(X)
subject to  P_Ω(X) = P_Ω(M)

Difficulty: Rank(X) is not continuous and not differentiable, and is thus a difficult objective function to optimize.

Least Square Formulation

minimize    ||P_Ω(X) − P_Ω(M)||_2^2
subject to  X = LR^T

Essentially a least squares problem. The constraint enforces X to be at most rank r if L ∈ R^{m×r}.

Least Square Formulation: Overfitting

Overfitting degrades generalization. A regularizer controls the parameter complexity; the L2 norm of the parameter vectors is a common regularizer:

Σ_{i=1}^m ||L_{i·}||_2^2 + Σ_{j=1}^n ||R_{j·}||_2^2 = ||L||_F^2 + ||R||_F^2

Regularized Least Square:

minimize  Σ_{(i,j)∈Ω} ((LR^T)_{i,j} − M_{i,j})^2 + (µ/2)||L||_F^2 + (µ/2)||R||_F^2

Nuclear Norm Based Formulation

minimize  Σ_{(i,j)∈Ω} (X_{i,j} − M_{i,j})^2 + µ||X||_*

Soft squared error instead of hard constraints. Both the nuclear norm and the squared error are convex functions of X. µ keeps the balance between error and rank. ||X||_* still needs to be rewritten as an easier, explicit function of the entries of X.

Nuclear Norm Based Formulation: Characterization of ||X||_*

||X||_* = inf { (1/2)||L||_F^2 + (1/2)||R||_F^2 : X = LR^T },   where ||A||_F = (Σ_i Σ_j A_{i,j}^2)^{1/2}

Verification with L = UΣ^{1/2} and R = VΣ^{1/2} from the SVD:

||L||_F^2 = Tr(LL^T) = Tr(UΣU^T) = Tr(U^T U Σ) = Tr(Σ) = ||X||_*

Hence (1/2)||L||_F^2 + (1/2)||R||_F^2, as an upper bound for ||X||_*, can be used to approximate ||X||_*.
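The verification can also be reproduced numerically with L = UΣ^{1/2} and R = VΣ^{1/2} (a sketch assuming the SVD notation of the earlier slide):

```python
import numpy as np

X = np.random.randn(6, 4) @ np.random.randn(4, 5)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

L = U @ np.diag(np.sqrt(s))        # L = U Sigma^{1/2}
R = Vt.T @ np.diag(np.sqrt(s))     # R = V Sigma^{1/2}, so X = L @ R.T

bound = 0.5 * np.linalg.norm(L, 'fro')**2 + 0.5 * np.linalg.norm(R, 'fro')**2
print(np.allclose(X, L @ R.T), np.isclose(bound, np.linalg.norm(X, ord='nuc')))
```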

Approximated Nuclear Norm Based Formulation

minimize  Σ_{(i,j)∈Ω} ((LR^T)_{i,j} − M_{i,j})^2 + (µ/2)||L||_F^2 + (µ/2)||R||_F^2

Converged Formulation

minimize  Σ_{(i,j)∈Ω} ((LR^T)_{i,j} − M_{i,j})^2 + (µ/2)||L||_F^2 + (µ/2)||R||_F^2

Story line 1: approximately minimize the rank with the nuclear norm and squared error.
Story line 2: least squares estimation with an L2 regularizer preventing overfitting.
A convergence of beautiful minds.

Non-Convexity of the Converged Formulation

minimize  Σ_{(i,j)∈Ω} ((LR^T)_{i,j} − M_{i,j})^2 + (λ/2)||L||_F^2 + (λ/2)||R||_F^2

A non-convex problem with local minima. Example: (xy − 1)^2 + 0.001x^2 + 0.001y^2.

Under mild conditions, a local minimum here is globally optimal for

minimize  Σ_{(i,j)∈Ω} (X_{i,j} − M_{i,j})^2 + µ||X||_*

and the problem is convex if λ is large enough. A local minimum is worth our efforts.

Alternating Least Square

minimize  Σ_{(i,j)∈Ω} ((LR^T)_{i,j} − M_{i,j})^2 + (µ/2)||L||_F^2 + (µ/2)||R||_F^2

1. Fix L. For each j, solve

   min_{R_{j·}}  ||L̂_j R_{j·}^T − M̂_{·j}||_2^2 + (µ/2)||R_{j·}||_2^2

   where L̂_j stacks the rows L_{i·} for all i with (i,j) ∈ Ω, and M̂_{·j} stacks the corresponding entries M_{i,j}. (On blackboard.)

2. Fix R. For each i, solve

   min_{L_{i·}}  ||R̂_i L_{i·}^T − M̂_{i·}^T||_2^2 + (µ/2)||L_{i·}||_2^2

3. Repeat 1 and 2 until convergence.
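A compact numpy sketch of these alternating updates; the function name, the dense boolean mask for Ω, and the default parameters are illustrative choices, not from the slides.

```python
import numpy as np

def als_complete(M, mask, r=5, mu=0.1, n_iters=50, seed=0):
    """ALS for min sum_Omega ((L R^T)_ij - M_ij)^2 + mu/2 (||L||_F^2 + ||R||_F^2)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    L = rng.standard_normal((m, r))
    R = rng.standard_normal((n, r))
    for _ in range(n_iters):
        # Fix L: a ridge problem per column j, solved in closed form.
        for j in range(n):
            obs = mask[:, j]                       # rows i with (i, j) observed
            Lj = L[obs]                            # \hat L_j
            A = Lj.T @ Lj + (mu / 2) * np.eye(r)
            R[j] = np.linalg.solve(A, Lj.T @ M[obs, j])
        # Fix R: a ridge problem per row i.
        for i in range(m):
            obs = mask[i, :]
            Ri = R[obs]                            # \hat R_i
            A = Ri.T @ Ri + (mu / 2) * np.eye(r)
            L[i] = np.linalg.solve(A, Ri.T @ M[i, obs])
    return L, R
```

With a partially observed M and its mask, L @ R.T from als_complete(M, mask) gives the completed matrix.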

Alternating Least Square: Pros and Cons

Pros: convex sub-problems with a closed-form solution in each step:

R_{j·} = argmin ||L̂_j R_{j·}^T − M̂_{·j}||_2^2 + (µ/2)||R_{j·}||_2^2 = (L̂_j^T L̂_j + (µ/2) I)^{-1} L̂_j^T M̂_{·j}

Cons: O(max(n r^3, m r^3)) update operations in each step.

Gradient Descent

∇f(x) is the direction of most rapid increase, so

x^{(k+1)} = x^{(k)} − η^{(k)} ∇f(x^{(k)})

gives a decreasing sequence if the step size is small enough.

Gradient Descent Algorithm

minimize  Σ_{(i,j)∈Ω} ((LR^T)_{i,j} − M_{i,j})^2 + (µ/2)||L||_F^2 + (µ/2)||R||_F^2,   where X = LR^T, L ∈ R^{m×r}, R ∈ R^{n×r}

1. For all i,

   L_{i·}^{(k+1)} = L_{i·}^{(k)} − η^{(k+1)} [ Σ_{j:(i,j)∈Ω} (L_{i·}^{(k)} R_{j·}^{(k)T} − M_{i,j}) R_{j·}^{(k)} + µ L_{i·}^{(k)} ]

   and similarly for R_{j·}^{(k+1)}.

2. Repeat 1 until convergence.
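A minimal vectorized version of one batch gradient step on this objective (a sketch; following the slide, the constant factor of the squared-error gradient is absorbed into the step size):

```python
import numpy as np

def gd_step(L, R, M, mask, mu, eta):
    """One batch gradient step on sum_Omega ((L R^T)_ij - M_ij)^2 + mu/2 (||L||_F^2 + ||R||_F^2)."""
    E = np.where(mask, L @ R.T - M, 0.0)    # residuals on observed entries only
    L_new = L - eta * (E @ R + mu * L)      # row i: sum_j (X_ij - M_ij) R_j + mu L_i
    R_new = R - eta * (E.T @ L + mu * R)
    return L_new, R_new
```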

Gradient Descent: Pros and Cons

Pros:
O((m + n) r^2) update operations in each step.
O((m + n) r) update operations with Stochastic Gradient Descent: pick (i, j) ∈ Ω at random and only consider the error and Frobenius terms related to entry (i, j):

L_{i·}^{(k+1)} = L_{i·}^{(k)} − η^{(k+1)} [ (L_{i·}^{(k)} R_{j·}^{(k)T} − M_{i,j}) R_{j·}^{(k)} + (µ/|Ω_{i·}|) L_{i·}^{(k)} ]

Cons: the iterative approach might need many more epochs.
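The stochastic variant can be sketched as one pass over randomly ordered observed entries. The slide only shows the L update, so the symmetric R update below is an assumption, and the names (omega, n_obs_row, n_obs_col) are illustrative:

```python
import numpy as np

def sgd_epoch(L, R, M, omega, n_obs_row, n_obs_col, mu, eta, rng):
    """One SGD pass; omega is an array of observed (i, j) index pairs,
    n_obs_row[i] = |Omega_{i.}| and n_obs_col[j] = |Omega_{.j}|."""
    for i, j in rng.permutation(omega):
        err = L[i] @ R[j] - M[i, j]
        L_i = L[i].copy()                                   # keep old L_i for R's update
        L[i] -= eta * (err * R[j] + (mu / n_obs_row[i]) * L[i])
        R[j] -= eta * (err * L_i + (mu / n_obs_col[j]) * R[j])
    return L, R
```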

Comparison

Alternating Least Square: closed-form updates with high stepwise complexity; might need fewer epochs.
(Stochastic) Gradient Descent: iterative updates with low stepwise complexity; might need more epochs.

Efficiency

Netflix dataset: movie ratings from 480K reviewers for more than 18K movies. Configuration: maximal possible rank r, L ∈ R^{m×r}.

method   # of epochs   time/min (r = 50)   time/min (r = 100)
ALS      6             66                  290
SGD      31            58                  52.8

More Applications and Qualitative Results

Video denoising: observe the reliable pixels; vectorize and stack similar patches into a matrix; exploit the low rank structure within the matrix.

More Applications and Qualitative Results (cont.)

Background subtraction: vectorize and stack the different video frames; the background is the low rank component, the foreground is sparse noise.

Conclusion

Technically:
Matrix completion is a fundamental problem for many real world applications.
The nuclear norm based and least square based formulations converge into one through different relaxations.
Alternating Least Square and (Stochastic) Gradient Descent are widely used alternatives for solving matrix completion problems.

Methodologically:
Relaxing the formulation and employing heuristics is a wise way to approximately solve difficult problems.
There is a tradeoff between an elegant closed form solution and brute force operations.

Thank You! Q & A

Acknowledgement: thanks to Tobias for helpful discussions, suggestions and chips!

References

1. Low-rank Matrix Completion using Alternating Minimization. Jain et al. STOC 2013.
2. Distributed Matrix Completion. Teflioudi et al. ICDM 2012.
3. Large-scale Parallel Collaborative Filtering for the Netflix Prize. Zhou et al. AAIM 2008.
4. Parallel Stochastic Gradient Algorithms for Large-Scale Matrix Completion. Recht et al. MPC 2013.
5. Matrix Rank Minimization with Applications. Fazel. PhD thesis.
6. Robust Video Denoising using Low Rank Matrix Completion. Ji et al. CVPR 2010.
7. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. Wright et al. NIPS 2009.