Matrix Completion: Fundamental Limits and Efficient Algorithms

Matrix Completion: Fundamental Limits and Efficient Algorithms. Sewoong Oh, PhD Defense, Stanford University, July 23, 2010. 1 / 33

Matrix completion: find the missing entries in a huge data matrix. 2 / 33

Example 1. Recommendation systems. [Figure: a partially observed ratings matrix with entries in {1, ..., 5} and missing entries marked "?".] 5 × 10^5 users, 2 × 10^4 movies, 10^6 queries. Given less than 1% of the movie ratings. Goal: predict the missing ratings. 3 / 33

Example 2. Positioning. [Figure: partially observed distance matrix.] Only distances between close-by sensors are measured. Goal: find the sensor positions up to a rigid motion. 4 / 33

Matrix completion, more applications: computer vision (structure-from-motion), molecular biology (microarrays), numerical linear algebra (fast low-rank approximations), etc. 5 / 33

Outline: 1. Background. 2. Algorithm and main results. 3. Applications in positioning. 6 / 33

Background 7 / 33

The model. Rank-r matrix M = U Σ V^T, where M is n × αn, U is n × r, Σ is r × r, and V is αn × r. Random uniform sample set E. Sample matrix M^E with M^E_ij = M_ij if (i, j) ∈ E, and M^E_ij = 0 otherwise. 8 / 33
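
A minimal sketch of this sampling model in Python/NumPy (the names and sizes are illustrative, not from the talk): build a rank-r matrix from random factors and observe a uniformly random fraction of its entries.

```python
import numpy as np

def sample_low_rank(n=1000, alpha=1.0, r=8, frac=0.01, seed=0):
    """Rank-r matrix M (n x alpha*n) and its uniformly sampled version M^E."""
    rng = np.random.default_rng(seed)
    m = int(alpha * n)
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    M = U @ V.T                        # rank-r matrix
    mask = rng.random((n, m)) < frac   # uniform random sample set E
    ME = np.where(mask, M, 0.0)        # observed entries, zeros elsewhere
    return M, ME, mask

M, ME, mask = sample_low_rank()
print(mask.sum(), "observed entries out of", M.size)
```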

Which matrices? Pathological example: M = e_1 e_1^T, i.e. a single 1 in the top-left entry and 0 everywhere else; a small random sample of entries almost surely misses the one informative entry, so M cannot be completed.

[Candès, Recht 08] M = U Σ V^T has coherence µ if
A0. max_i Σ_{k=1}^r U_ik² ≤ µ r / n and max_j Σ_{k=1}^r V_jk² ≤ µ r / n,
A1. max_{i,j} |Σ_{k=1}^r U_ik V_jk| ≤ µ √r / n.
Intuition: µ is small if the singular vectors are well balanced. We need low coherence for matrix completion. 9 / 33
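
A small NumPy check of these coherence quantities for a given matrix; the function name, and reporting the smallest admissible µ with √(n1 n2) in place of n for rectangular matrices, are my choices rather than the talk's.

```python
import numpy as np

def coherence(M, r):
    """Smallest mu satisfying A0 and A1 for the rank-r part of M."""
    n1, n2 = M.shape
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    mu_U = (n1 / r) * np.max(np.sum(U**2, axis=1))          # A0 for U
    mu_V = (n2 / r) * np.max(np.sum(V**2, axis=1))          # A0 for V
    mu_E = np.sqrt(n1 * n2 / r) * np.max(np.abs(U @ V.T))   # A1
    return max(mu_U, mu_V, mu_E)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 500))
print("coherence of a random 500 x 500 rank-5 matrix:", coherence(X, 5))
```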

Previous work: rank minimization.
minimize rank(X) subject to X_ij = M_ij for all (i, j) ∈ E.
NP-hard. Convex relaxation, heuristic [Fazel 02]:
minimize ||X||_* subject to X_ij = M_ij for all (i, j) ∈ E,
where the nuclear norm is ||X||_* = Σ_{i=1}^n σ_i(X). Can be solved using semidefinite programming (SDP). 10 / 33
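
A hedged sketch of this relaxation using the CVXPY modeling library (my choice of tool, not mentioned in the talk), which hands the resulting SDP to a generic solver; practical only for small matrices.

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_complete(ME, mask):
    """Complete a partially observed matrix by nuclear norm minimization."""
    W = mask.astype(float)                        # 0/1 weights marking the sample set E
    X = cp.Variable(ME.shape)
    constraints = [cp.multiply(W, X) == W * ME]   # agree on the observed entries
    problem = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
    problem.solve()
    return X.value
```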

Previous work. [Candès, Recht 08] Nuclear norm minimization reconstructs M exactly, with high probability, if |E| ≥ C µ r n^{6/5} log n. Surprise? The number of degrees of freedom is only (1 + α) r n. Open questions. Optimality: do we need n^{6/5} log n samples? Complexity: SDP is computationally expensive. Noise: the guarantee cannot deal with noise. A new approach to matrix completion: OptSpace. 11 / 33

Example: 2000 × 2000 rank-8 random matrix. [Figures: the low-rank matrix M, the sampled matrix M^E, the OptSpace output M̂, and the squared error (M̂ − M)², as the fraction of sampled entries increases from 0.25% to 1.75% in steps of 0.25%.] 12 / 33

Algorithm 13 / 33

Naïve approach fails. Singular value decomposition (SVD): M^E = Σ_{i=1}^n σ_i x_i y_i^T. Compute the rank-r approximation M_SVD = (α n² / |E|) Σ_{i=1}^r σ_i x_i y_i^T. [Figure: the sample matrix M^E and the estimate M_SVD.] 14 / 33
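
A sketch of this rescaled rank-r projection in Python with SciPy's sparse SVD (the function name and the dense/sparse handling are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def rank_r_svd_estimate(ME, mask, r):
    """Rescaled rank-r SVD of the sparse sample matrix M^E."""
    n1, n2 = ME.shape
    E = mask.sum()                           # number of observed entries
    S = csr_matrix(np.where(mask, ME, 0.0))  # sparse M^E
    x, sig, yt = svds(S, k=r)                # top-r singular triplets
    # Rescale by (n1 * n2) / |E| to compensate for the unobserved entries.
    return (n1 * n2 / E) * (x * sig) @ yt
```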

Trimming: build the trimmed sample matrix M̃^E by setting
M̃^E_ij = 0 if deg(row_i) > 2|E| / αn,
M̃^E_ij = 0 if deg(col_j) > 2|E| / n,
M̃^E_ij = M^E_ij otherwise,
where deg(·) is the number of samples in that row or column. 15 / 33
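
A NumPy sketch of the trimming step; the two thresholds above amount to "twice the average degree", which is the cutoff used here (the helper name is mine).

```python
import numpy as np

def trim(ME, mask):
    """Zero out rows and columns with more than twice the average sample count."""
    n_rows, n_cols = mask.shape
    E = mask.sum()
    row_deg = mask.sum(axis=1)    # samples per row
    col_deg = mask.sum(axis=0)    # samples per column
    keep = ((row_deg <= 2 * E / n_rows)[:, None]
            & (col_deg <= 2 * E / n_cols)[None, :])
    return np.where(keep, ME, 0.0), mask & keep
```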

Algorithm OptSpace. Input: sample indices E, sample values M^E, target rank r. Output: estimate M̂. Step 1: trimming. Step 2: compute M_SVD using SVD. Step 3: greedy minimization of the residual error. M_SVD can be computed efficiently for sparse matrices. 16 / 33

Main results. Theorem: for any E, M_SVD achieves, with high probability,
RMSE ≤ C M_max √(n r / |E|),
where RMSE = ( (1 / (α n²)) Σ_{i,j} (M − M_SVD)²_ij )^{1/2} and M_max = max_{i,j} |M_ij|.
Compare [Achlioptas, McSherry 07]: if |E| ≥ n (8 log n)^4, then with high probability RMSE ≤ 4 M_max √(n r / |E|). For n = 10^5, (8 log n)^4 ≈ 7.2 × 10^7, so the required |E| exceeds the n² = 10^10 entries of the whole matrix. Netflix dataset: a single user has rated 17,000 movies, and Miss Congeniality has 200,000 ratings. Keshavan, Montanari, Oh, IEEE Trans. Information Theory, 2010. 17 / 33

Can we do better? 18 / 33

Greedy minimization of the residual error. Starting from (X_0, Y_0) given by M_SVD = X_0 S_0 Y_0^T, use gradient descent methods to solve
minimize F(X, Y) subject to X^T X = I, Y^T Y = I,
where F(X, Y) = min_{S ∈ R^{r×r}} Σ_{(i,j) ∈ E} (M^E_ij − (X S Y^T)_ij)².
[Figure: the factorization X (n × r) · S (r × r) · Y^T (r × αn).] Can be computed efficiently for sparse matrices. 19 / 33
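
A NumPy sketch of evaluating F(X, Y): for fixed X and Y, the inner minimization over S is an ordinary least-squares problem in the r² entries of S, assembled from the observed entries. The dense lstsq solve below is illustrative; an efficient implementation would exploit the sparsity of E.

```python
import numpy as np

def residual_F(X, Y, ME, mask):
    """F(X, Y) = min over S of the squared error on the observed entries."""
    r = X.shape[1]
    rows, cols = np.nonzero(mask)
    # (X S Y^T)_ij = sum_{k,l} X_ik S_kl Y_jl, which is linear in vec(S).
    A = np.einsum('ik,il->ikl', X[rows], Y[cols]).reshape(len(rows), r * r)
    b = ME[rows, cols]
    vecS = np.linalg.lstsq(A, b, rcond=None)[0]
    err = A @ vecS - b
    return float(err @ err), vecS.reshape(r, r)
```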

Algorithm OptSpace. Input: sample indices E, sample values M^E, target rank r. Output: estimate M̂. Step 1: trimming. Step 2: compute M_SVD using SVD. Step 3: greedy minimization of the residual error. 20 / 33

Main results. Theorem (trimming + SVD): M_SVD achieves, with high probability, RMSE ≤ C M_max √(n r / |E|). Theorem (trimming + SVD + greedy minimization): OptSpace reconstructs M exactly, with high probability, if |E| ≥ C µ r n max{µ r, log n}. Keshavan, Montanari, Oh, IEEE Trans. Information Theory, 2010. 21 / 33

OptSpace is order-optimal. Theorem: if µ and r are bounded, OptSpace reconstructs M exactly, with high probability, if |E| ≥ C n log n. Lower bound (coupon collector's problem): if |E| ≤ C n log n, then exact reconstruction is impossible. Nuclear norm minimization [Candès, Recht 08; Candès, Tao 09; Recht 09; Gross et al. 09]: if |E| ≥ C n (log n)², then exact reconstruction by SDP. 22 / 33
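
The coupon-collector lower bound can be sketched in one line (a standard argument for a square n × n matrix, not spelled out on the slide): any column that receives no sample is completely undetermined, and with |E| uniform samples the chance of missing some column only vanishes once |E| is of order n log n.

```latex
% Each of the |E| uniform samples lands in a fixed column with probability 1/n, so
\Pr[\text{some column receives no sample}]
  \;\approx\; n \left(1 - \tfrac{1}{n}\right)^{|E|}
  \;\approx\; n\, e^{-|E|/n},
% which stays bounded away from 0 unless |E| \gtrsim n \log n.
```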

Comparison: 1000 × 1000 rank-10 matrix M. [Figure: probability of success versus sampling rate (0 to 0.14) for OptSpace, FPCA, SVT, and ADMiRA, together with the fundamental limit.] Fundamental limit [Singer, Cucuringu 09], FPCA [Ma, Goldfarb, Chen 09], SVT [Cai, Candès, Shen 08], ADMiRA [Lee, Bresler 09]. 23 / 33

Story so far: OptSpace reconstructs M from a few sampled entries when M is exactly low-rank and the samples are exact. In reality, M is only approximately low-rank and the samples are corrupted by noise. 24 / 33

The model with noise. Rank-r matrix M = U Σ V^T (n × αn). Random sample set E. Sample noise Z^E. Sample matrix N^E = M^E + Z^E. 25 / 33

Main results. Theorem: for |E| ≥ C µ r n max{µ r, log n}, OptSpace achieves, with high probability, RMSE ≤ C (n √r / |E|) ||Z^E||_2, provided that the right-hand side is smaller than σ_r(M)/n. ||·||_2 is the spectral norm. Keshavan, Montanari, Oh, Journal of Machine Learning Research, 2010. 26 / 33

OptSpace is order-optimal when the noise is i.i.d. Gaussian. Theorem: for |E| ≥ C µ r n max{µ r, log n}, OptSpace achieves, with high probability, RMSE ≤ C σ_z √(r n / |E|), provided that the right-hand side is smaller than σ_r(M)/n. Lower bound [Candès, Plan 09]: RMSE ≥ σ_z √(2 r n / |E|). Trimming + SVD: RMSE ≤ C M_max √(r n / |E|) + C σ_z √(r n / |E|), the first term from the missing entries and the second from the sample noise. 27 / 33

Comparison: 500 × 500 rank-4 matrix M, Gaussian noise with σ_z = 1; example from [Candès, Plan 09]. [Figure: RMSE versus sampling rate (0.2 to 1) for Trim+SVD, OptSpace, FPCA, and ADMiRA, together with the lower bound.] FPCA [Ma, Goldfarb, Chen 09], ADMiRA [Lee, Bresler 09]. 28 / 33

Positioning 29 / 33

The model. [Figure: n sensors in a bounded region, radio range R, partially observed distance matrix D.] n wireless devices uniformly distributed in a bounded convex region; distance measurements only between devices within radio range R. Goal: find the locations up to a rigid motion. How is it related to matrix completion? We need to find the missing entries, and rank(D) = 4. How is it different? Non-uniform sampling, and rich information that is not used in matrix completion. MDS-MAP [Shang et al. 03]: 1. Fill in the missing entries with shortest-path distances. 2. Compute the rank-4 approximation. 30 / 33
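
A hedged Python sketch of the two MDS-MAP steps, plus the standard classical-MDS readout of 2-D positions (implied but not spelled out above); the function name, SciPy shortest-path routine, and eigendecomposition readout are my choices.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def mds_map(dist, connected):
    """MDS-MAP sketch: shortest-path fill, rank-4 truncation, classical MDS."""
    n = dist.shape[0]
    # 1. Fill in missing entries with shortest-path distances over the graph
    #    of measured links (zero entries are treated as absent edges).
    graph = np.where(connected, dist, 0.0)
    D = shortest_path(graph, directed=False)
    # 2. Rank-4 approximation of the squared-distance matrix.
    D2 = D ** 2
    U, s, Vt = np.linalg.svd(D2)
    D2 = (U[:, :4] * s[:4]) @ Vt[:4, :]
    # Classical MDS: double-center and take the top-2 eigenvectors as coordinates
    # (positions are recovered only up to a rigid motion and reflection).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:2]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```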

Main results. Theorem: for R > C √(log n / n), with high probability,
RMSE ≤ C √(log n) / (R √n) + o(1), where RMSE = ( (1/n²) Σ_{i,j} (D − D̂)²_ij )^{1/2}.
Lower bound: if R < √(log n / (π n)), then the graph is disconnected. The analysis generalizes to quantized measurements and distributed algorithms, and a greedy minimization step can be added. Oh, Karbasi, Montanari, Information Theory Workshop, 2010; Karbasi, Oh, ACM SIGMETRICS, 2010. 31 / 33

Numerical simulation. [Figure: two panels showing RMSE (0 to 0.25) versus the radio range R normalized by √(log n / (π n)) and by √(1 / (π n)), for n = 500, 1,000, and 2,000.] 32 / 33

Conclusion. Matrix completion is an important problem with many practical applications. OptSpace is an efficient algorithm for matrix completion. OptSpace achieves performance close to the fundamental limit. 33 / 33

Special thanks to: Officemates: Morteza, Yash, Jose, Raghu, Satish. Friends: Mohsen, Adel, Farshid, Fernando, Arian, Haim, Sachin, Ivana, Ahn, Cha, Choi, Rhee, Kang, Kim, Lee, Na, Park, Ra, Seok, Song. 33 / 33

Special thanks to: My family and Kyung Eun 33 / 33

Thank you! 33 / 33