Matrix Completion: Fundamental Limits and Efficient Algorithms
Sewoong Oh
PhD Defense, Stanford University
July 23, 2010
1 / 33
Matrix completion Find the missing entries in a huge data matrix 2 / 33
Example 1. Recommendation systems
[Figure: partially observed ratings matrix]
5 × 10^5 users, 2 × 10^4 movies, 10^6 queries
Given less than 1% of the movie ratings
Goal: Predict the missing ratings
3 / 33
Example 2. Positioning
[Figure: distance matrix D]
Only distances between close-by sensors are measured
Goal: Find the sensor positions up to a rigid motion
4 / 33
Matrix completion
More applications:
Computer vision: structure from motion
Molecular biology: microarray data
Numerical linear algebra: fast low-rank approximations
etc.
5 / 33
Outline
1 Background
2 Algorithm and main results
3 Applications in positioning
6 / 33
Background 7 / 33
The model
Low-rank factorization M = U Σ V^T, with U ∈ R^{αn×r}, Σ ∈ R^{r×r}, V ∈ R^{n×r}
Rank-r matrix M ∈ R^{αn×n}
Random uniform sample set E
Sample matrix M^E:
M^E_ij = M_ij if (i, j) ∈ E, and 0 otherwise
8 / 33
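A minimal NumPy sketch of this sampling model; the sizes and sampling rate below are illustrative assumptions, not values from the talk:

    import numpy as np

    rng = np.random.default_rng(0)
    n, r, alpha = 1000, 8, 1.0                  # hypothetical sizes
    m = int(alpha * n)                          # M is (alpha*n) x n

    # Rank-r matrix M = U Sigma V^T (Sigma absorbed into the factors)
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

    # Uniformly random sample set E, stored as a 0/1 mask
    p = 0.05                                    # assumed sampling rate
    mask = rng.random((m, n)) < p
    M_E = np.where(mask, M, 0.0)                # M^E: samples kept, zeros elsewhere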
Which matrices? Pathological example:
M = e_1 e_1^T, the matrix with a single 1 in the top-left corner and zeros elsewhere; a uniform random sample misses the one informative entry with high probability, so M cannot be completed.
[Candès, Recht 08] M = U Σ V^T has coherence μ if
A0. max_{1≤i≤αn} Σ_{k=1}^r U_ik² ≤ μr/n and max_{1≤j≤n} Σ_{k=1}^r V_jk² ≤ μr/n
A1. max_{i,j} |Σ_{k=1}^r U_ik V_jk| ≤ μ√r/n
Intuition: μ is small if the singular vectors are well balanced
We need low coherence for matrix completion
9 / 33
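A sketch of how one might check conditions A0 and A1 numerically (NumPy assumed; it forms a dense αn × n product, so it is for small matrices only):

    import numpy as np

    def coherence(M, r):
        """Smallest mu satisfying A0 and A1 for the rank-r SVD of M."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        U, V = U[:, :r], Vt[:r, :].T
        n = M.shape[1]
        mu0 = (n / r) * max((U**2).sum(axis=1).max(),
                            (V**2).sum(axis=1).max())    # condition A0
        mu1 = (n / np.sqrt(r)) * np.abs(U @ V.T).max()   # condition A1
        return max(mu0, mu1)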
Previous work: rank minimization
minimize rank(X) subject to X_ij = M_ij for (i, j) ∈ E — NP-hard
Convex relaxation, the nuclear norm heuristic [Fazel 02]:
minimize ||X||_* subject to X_ij = M_ij for (i, j) ∈ E
where the nuclear norm is ||X||_* = Σ_{i=1}^n σ_i(X)
Can be solved using semidefinite programming (SDP)
10 / 33
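A minimal sketch of the nuclear norm heuristic, written with the third-party CVXPY modeling library (an assumption for illustration, not the solver used in the talk):

    import cvxpy as cp
    import numpy as np

    def nuclear_norm_complete(M_E, mask):
        """min ||X||_* subject to X agreeing with the sampled entries.
        mask is a 0/1 array with 1 where an entry was sampled."""
        X = cp.Variable(M_E.shape)
        constraints = [cp.multiply(mask, X) == mask * M_E]
        cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
        return X.value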
Previous work
[Candès, Recht 08] Nuclear norm minimization reconstructs M exactly, with high probability, if
|E| ≥ C μ r n^{6/5} log n
Surprise? The degrees of freedom are only (1 + α) r n
Open questions
Optimality: do we need n^{6/5} log n samples?
Complexity: SDP is computationally expensive
Noise: cannot deal with noise
A new approach to matrix completion: OptSpace
11 / 33
Example: 2000 × 2000 rank-8 random matrix
[Figures: the low-rank matrix M, the sampled matrix M^E, the OptSpace output M̂, and the squared error (M − M̂)², shown as the sampling rate increases from 0.25% to 1.75%]
12 / 33
Algorithm 13 / 33
Naïve approach fails
Singular value decomposition (SVD): M^E = Σ_{i=1}^n σ_i x_i y_i^T
Compute the rescaled rank-r approximation:
M_SVD = (αn²/|E|) Σ_{i=1}^r σ_i x_i y_i^T
[Figures: M^E and M_SVD — rows and columns with unusually many samples dominate the top singular vectors, so the plain SVD estimate is poor]
14 / 33
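A sketch of this rescaled rank-r projection (NumPy; a sparse implementation would use a truncated SVD such as scipy.sparse.linalg.svds instead of the dense SVD below):

    import numpy as np

    def rank_r_estimate(M_E, mask, r, alpha=1.0):
        """Rescaled rank-r SVD of the zero-filled sample matrix M^E."""
        n = M_E.shape[1]
        num_samples = mask.sum()                    # |E|
        U, s, Vt = np.linalg.svd(M_E, full_matrices=False)
        scale = alpha * n**2 / num_samples          # undo the sampling rate
        return scale * (U[:, :r] * s[:r]) @ Vt[:r, :]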
Trimming
M̃^E_ij = 0 if deg(row_i) > 2|E|/(αn)
M̃^E_ij = 0 if deg(col_j) > 2|E|/n
M̃^E_ij = M^E_ij otherwise
deg(·) is the number of samples in that row/column
15 / 33
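A sketch of the trimming step, using the same 0/1 sample-mask convention as the earlier snippets:

    import numpy as np

    def trim(M_E, mask, alpha=1.0):
        """Zero out over-sampled rows and columns of M^E."""
        num_samples = mask.sum()                    # |E|
        n = M_E.shape[1]
        M = M_E.copy()
        M[mask.sum(axis=1) > 2 * num_samples / (alpha * n), :] = 0
        M[:, mask.sum(axis=0) > 2 * num_samples / n] = 0
        return M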
Algorithm: OptSpace
Input: sample indices E, sample values M^E, rank r
Output: estimate M̂
1: Trimming
2: Compute M_SVD using SVD
3: Greedy minimization of the residual error
M_SVD can be computed efficiently for sparse matrices
16 / 33
Main results
Theorem: For any E, M_SVD achieves, with high probability,
RMSE ≤ C M_max √(nr/|E|)
where RMSE = ((1/(αn²)) Σ_{i,j} (M − M_SVD)_ij²)^{1/2} and M_max = max_{i,j} |M_ij|
[Achlioptas, McSherry 07] If |E| ≥ n (8 log n)^4, with high probability, RMSE ≤ 4 M_max √(nr/|E|)
But for n = 10^5, (8 log n)^4 ≈ 7.2 × 10^7, so the condition requires more samples than the n² = 10^10 entries of the matrix
And real data is far from uniform: in the Netflix dataset a single user rated 17,000 movies, and Miss Congeniality has 200,000 ratings
Keshavan, Montanari, Oh, IEEE Trans. Information Theory, 2010
17 / 33
Can we do better? 18 / 33
Greedy minimization of residual error
Starting from (X_0, Y_0) given by M_SVD = X_0 S_0 Y_0^T, use gradient descent methods to solve
minimize F(X, Y) subject to X^T X = I, Y^T Y = I
where F(X, Y) = min_{S ∈ R^{r×r}} Σ_{(i,j)∈E} (M^E_ij − (X S Y^T)_ij)²
F can be computed efficiently for sparse matrices
19 / 33
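A sketch of evaluating F(X, Y): each sampled entry is linear in the r² entries of S, so the inner minimization over S is an ordinary least-squares problem (NumPy; fine for small r):

    import numpy as np

    def residual_F(X, Y, rows, cols, vals):
        """F(X, Y) = min_S sum_{(i,j) in E} (M^E_ij - (X S Y^T)_ij)^2."""
        # Row e of A holds the coefficients of S for sample (rows[e], cols[e]).
        A = np.einsum('ek,el->ekl', X[rows], Y[cols]).reshape(len(vals), -1)
        S_flat, *_ = np.linalg.lstsq(A, vals, rcond=None)
        F = ((vals - A @ S_flat) ** 2).sum()
        return F, S_flat.reshape(X.shape[1], Y.shape[1])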
Algorithm: OptSpace
Input: sample indices E, sample values M^E, rank r
Output: estimate M̂
1: Trimming
2: Compute M_SVD using SVD
3: Greedy minimization of the residual error
20 / 33
Main results
Theorem (Trimming + SVD): M_SVD achieves, with high probability,
RMSE ≤ C M_max √(nr/|E|)
Theorem (Trimming + SVD + greedy minimization): OptSpace reconstructs M exactly, with high probability, if
|E| ≥ C μ r n max{μr, log n}
Keshavan, Montanari, Oh, IEEE Trans. Information Theory, 2010
21 / 33
OptSpace is order-optimal
Theorem: If μ and r are bounded, OptSpace reconstructs M exactly, with high probability, if
|E| ≥ C n log n
Lower bound (coupon collector's problem): if |E| ≤ C' n log n, then exact reconstruction is impossible
Nuclear norm minimization [Candès, Recht 08; Candès, Tao 09; Recht 09; Gross et al. 09]: if |E| ≥ C n (log n)², then exact reconstruction by SDP
22 / 33
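The coupon-collector lower bound follows from a standard calculation; a sketch (taking α = 1 for simplicity):

    % Under uniform sampling, a fixed row receives no sample with probability
    \Pr[\text{row } i \text{ unsampled}] = \left(1 - \tfrac{1}{n}\right)^{|E|} \approx e^{-|E|/n}.
    % With |E| = (1-\epsilon)\, n \log n, the expected number of empty rows is
    n \, e^{-|E|/n} = n^{\epsilon} \to \infty,
    % and a row with no observed entries can never be reconstructed exactly.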
Comparison
1000 × 1000 rank-10 matrix M
[Figure: P_success vs. sampling rate (0 to 0.14) for the fundamental limit, OptSpace, FPCA, SVT, and ADMiRA]
Fundamental limit [Singer, Cucuringu 09], FPCA [Ma, Goldfarb, Chen 09], SVT [Cai, Candès, Shen 08], ADMiRA [Lee, Bresler 09]
23 / 33
Story so far
OptSpace reconstructs M from a few sampled entries when M is exactly low-rank and the samples are exact
In reality, M is only approximately low-rank and the samples are corrupted by noise
24 / 33
The model with noise
Low-rank factorization M = U Σ V^T; rank-r matrix M ∈ R^{αn×n}
Random sample set E
Sample noise Z^E
Sample matrix N^E = M^E + Z^E
25 / 33
Main results
Theorem: For |E| ≥ C μ r n max{μr, log n}, OptSpace achieves, with high probability,
RMSE ≤ C' (n√r / |E|) ||Z^E||₂
provided that the RHS is smaller than σ_r(M)/n (||·||₂ is the spectral norm)
Keshavan, Montanari, Oh, Journal of Machine Learning Research, 2010
26 / 33
OptSpace is order-optimal when the noise is i.i.d. Gaussian
Theorem: For |E| ≥ C μ r n max{μr, log n}, OptSpace achieves, with high probability,
RMSE ≤ C σ_z √(rn/|E|)
provided that the RHS is smaller than σ_r(M)/n
Lower bound [Candès, Plan 09]: RMSE ≥ σ_z √(2rn/|E|)
Trimming + SVD:
RMSE ≤ C M_max √(rn/|E|) + C' σ_z √(rn/|E|)
(the first term comes from the missing entries, the second from the sample noise)
27 / 33
Comparison
500 × 500 rank-4 matrix M, Gaussian noise with σ_z = 1; example from [Candès, Plan 09]
[Figure: RMSE vs. sampling rate (0.2 to 1) for Trim+SVD, OptSpace, FPCA, ADMiRA, and the lower bound; OptSpace tracks the lower bound closely]
FPCA [Ma, Goldfarb, Chen 09], ADMiRA [Lee, Bresler 09]
28 / 33
Positioning 29 / 33
The model
Distance matrix D, radio range R
n wireless devices uniformly distributed in a bounded convex region
Distance measurements only between devices within radio range R
Goal: Find the locations up to a rigid motion
How is it related to matrix completion? We need to find the missing entries, and the squared-distance matrix has rank(D) = 4 for positions in the plane
How is it different? Non-uniform sampling, and rich geometric information not used in matrix completion
MDS-MAP [Shang et al. 03], sketched below:
1. Fill in the missing entries with shortest paths
2. Compute the rank-4 approximation
30 / 33
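A sketch of MDS-MAP under stated assumptions: NumPy/SciPy, missing distances marked with np.inf, and a connected graph (R above the threshold), so that shortest paths are finite. Classical MDS then recovers 2-D positions up to a rigid motion:

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    def mds_map(D_partial):
        """D_partial[i, j]: measured distance, np.inf where unmeasured."""
        n = D_partial.shape[0]
        # Step 1: fill in missing entries with shortest-path distances
        D = shortest_path(D_partial, method='D', directed=False)
        # Step 2: classical MDS via the low-rank structure of squared distances
        J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
        G = -0.5 * J @ (D ** 2) @ J              # Gram matrix of positions
        w, V = np.linalg.eigh(G)
        top = np.argsort(w)[-2:]                 # two largest eigenvalues
        return V[:, top] * np.sqrt(np.maximum(w[top], 0))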
Main results
Theorem: For R ≥ C √(log n / n), MDS-MAP achieves, with high probability,
RMSE ≤ (C'/R) √(log n / n) + o(1)
where RMSE = ((1/n²) Σ_{i,j} (D − D̂)_ij²)^{1/2}
Lower bound: if R < √(log n/(πn)), then the graph is disconnected
Generalized to quantized measurements and distributed algorithms
A greedy minimization step can be added
Oh, Karbasi, Montanari, Information Theory Workshop, 2010
Karbasi, Oh, ACM SIGMETRICS, 2010
31 / 33
Numerical simulation
[Figure: two panels showing RMSE vs. the normalized radio range R/√(log n/(πn)) for n = 500, 1,000, and 2,000; the curves for different n nearly coincide, and RMSE decreases as R grows]
32 / 33
Conclusion
Matrix completion is an important problem with many practical applications
OptSpace is an efficient algorithm for matrix completion
OptSpace achieves performance close to the fundamental limit
33 / 33
Special thanks to:
Officemates: Morteza, Yash, Jose, Raghu, Satish
Friends: Mohsen, Adel, Farshid, Fernando, Arian, Haim, Sachin, Ivana, Ahn, Cha, Choi, Rhee, Kang, Kim, Lee, Na, Park, Ra, Seok, Song
33 / 33
Special thanks to: My family and Kyung Eun 33 / 33
Thank you! 33 / 33