EE 381V: Large Scale Learning                                         Spring 2013

Lecture 16: March 7

Lecturer: Caramanis & Sanghavi                              Scribe: Tianyang Bai

16.1 Topics Covered

In this lecture, we introduced a method for matrix completion via SVD-based PCA. Specifically, we covered the following topics.

1. Introduction: problem formulation and applications
2. A toy example: recovering a matrix of rank 1
3. The SVD-based algorithm
4. Proof of the effectiveness of the algorithm

16.2 Introduction

We motivate the matrix completion problem through the example of collaborative filtering. Imagine that each of $m$ customers watches and rates a subset of the $n$ movies available through a website. This yields a dataset of customer-movie pairs $(i,j) \in E \subseteq [m] \times [n]$, and for each such pair a rating $M_{ij} \in \mathbb{R}$ is known. The objective of collaborative filtering is to predict the ratings for the missing pairs, based on the observed ratings, so as to provide recommendations. Another application of matrix completion is localization from distance measurements, a topic we discussed previously.

16.2.1 Mathematical Model

We model the collaborative filtering problem as follows. Let $M$ be the $m \times n$ matrix whose entry $(i,j) \in [m] \times [n]$ is the rating given by user $i$ to movie $j$. Out of the $mn$ entries of $M$, only a subset $E \subseteq [m] \times [n]$ is observed. We let $M^E$ be the $m \times n$ matrix defined as

$$M^E_{i,j} = \begin{cases} M_{i,j} & \text{if } (i,j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$

The set $E$ is chosen uniformly at random from $[m] \times [n]$ given its size $|E|$. Furthermore, we assume $M$ has low rank $r \ll m, n$, for the following reasons.

1. Empirical datasets show that $M$ has low rank.
2. The clustered nature of the matrix $M$ suggests low rank: users with similar interests are more likely to give similar ratings to films with similar features.
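As a concrete illustration of this observation model, the following minimal NumPy sketch (the sizes $m$, $n$, the rank $r$, and $|E|$ are hypothetical choices, not values from the lecture) generates a rank-$r$ matrix $M$, samples $E$ uniformly at random given its size, and forms $M^E$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m users, n movies, rank r, and |E| revealed entries.
m, n, r = 200, 300, 5
E_size = 6000  # |E|

# A random rank-r matrix M.
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Choose E uniformly at random among all mn entries, given its size |E|.
flat = rng.choice(m * n, size=E_size, replace=False)
rows, cols = np.unravel_index(flat, (m, n))

# M^E keeps the revealed entries and is zero elsewhere.
M_E = np.zeros((m, n))
M_E[rows, cols] = M[rows, cols]

print("fraction of entries revealed:", E_size / (m * n))
```

The completion task is then to estimate $M$ from $M^E$ and the index set $E$ alone.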

16.2.2 A Toy Problem: Recovering a Matrix of Rank 1

In class, we first discussed how to complete a matrix of rank 1, in order to illustrate the key ideas of matrix completion. The toy problem is formulated as follows. Suppose $M = UV^T$, where $U$ is an unknown $m \times 1$ vector and $V$ is an unknown $n \times 1$ vector; hence $M$ has rank 1. We want to recover $M$, i.e., to find an appropriate pair of vectors $U$ and $V$, from $M^E$, the matrix of observed entries defined in the previous section.

First, note that the solution for $U$ and $V$ is not unique: if $U$ and $V$ satisfy the constraints, then for any constant $c \neq 0$, $cU$ and $\frac{1}{c}V$ are also a valid pair of solutions. Hence, without loss of generality, we assume $U_1 = 1$.

Next, for every $j \in [n]$ with $(1,j) \in E$, we can recover $V_j = M^E_{1,j} / U_1$. Similarly, once $V_j$ is known, we can recover $U_i = M^E_{i,j} / V_j$ for every $i$ with $(i,j) \in E$. Hence, if the matrix $M^E$ is connected, we can iteratively recover all the entries of $U$ and $V$.

We say the matrix is connected in this problem if the following bipartite graph is connected. Consider a bipartite graph on two disjoint vertex sets $X$ and $Y$, where $X$ has $m$ vertices (one per row) and $Y$ has $n$ vertices (one per column), and vertex $X_i$ is joined to $Y_j$ by an edge if $M^E_{i,j} \neq 0$, i.e., if entry $(i,j)$ is revealed. If this bipartite graph is connected, then we can recover $M$ through the iterative method above, as sketched below. Lastly, note that $M^E$ is connected with high probability if $O(n \log n)$ of its entries are revealed.
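The propagation argument above amounts to a breadth-first traversal of this bipartite graph. The following minimal sketch (hypothetical sizes, and a positive rank-1 $M$ so that every revealed entry is nonzero) fixes $U_1 = 1$ and propagates values of $V_j$ and $U_i$ along revealed entries; it assumes the revealed pattern is connected, as discussed above.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Hypothetical rank-1 ground truth M = U V^T with all entries nonzero.
m, n = 50, 80
U_true = rng.uniform(1.0, 2.0, size=m)
V_true = rng.uniform(1.0, 2.0, size=n)
M = np.outer(U_true, V_true)

# Reveal each entry independently with probability ~ (4 log n)/n,
# which gives enough revealed entries for connectivity w.h.p.
mask = rng.random((m, n)) < (4 * np.log(n) / n)
M_E = np.where(mask, M, 0.0)

# Recover U and V by propagating along revealed entries, with U_1 fixed to 1.
U = np.full(m, np.nan)
V = np.full(n, np.nan)
U[0] = 1.0
queue = deque([("row", 0)])
while queue:
    kind, k = queue.popleft()
    if kind == "row":                       # row k known: fill V_j for revealed (k, j)
        for j in np.nonzero(mask[k])[0]:
            if np.isnan(V[j]):
                V[j] = M_E[k, j] / U[k]     # V_j = M^E_{k,j} / U_k
                queue.append(("col", j))
    else:                                   # column k known: fill U_i for revealed (i, k)
        for i in np.nonzero(mask[:, k])[0]:
            if np.isnan(U[i]):
                U[i] = M_E[i, k] / V[k]     # U_i = M^E_{i,k} / V_k
                queue.append(("row", i))

# If the bipartite graph is connected, M is recovered exactly
# (up to the scaling fixed by U_1 = 1).
print("max reconstruction error:", np.abs(np.outer(U, V) - M).max())
```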

16.3 SVD-Based Matrix Completion Method

The algorithm is shown in Table 16.1. Details of the steps are discussed in the following subsections.

Table 16.1. Matrix Completion Algorithm
1: Trimming: Trim $M^E$, and let $\tilde{M}^E$ be the output.
2: SVD: Compute the SVD of $\tilde{M}^E$: $\tilde{M}^E = \sum_{i=1}^{\min(m,n)} \sigma_i x_i y_i^T$, where $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$.
3: Projection: Compute the recovered matrix $\hat{M} = \mathsf{P}_r(\tilde{M}^E) = \frac{mn}{|E|} \sum_{i=1}^{r} \sigma_i x_i y_i^T$.

16.3.1 Trimming

The operation of trimming is defined as follows.

Trimming: Set to zero all columns of $M^E$ with more than $2|E|/n$ nonzero entries, and set to zero all rows of $M^E$ with more than $2|E|/m$ nonzero entries.

Note that the average number of nonzero entries in each column (row) of $M^E$ is $|E|/n$ ($|E|/m$). So trimming simply discards all columns and rows that have more than twice the average number of nonzero entries.

In fact, if the number of revealed entries satisfies $|E| = O(nr \log n)$, trimming is not necessary. However, if $|E| = O(nr)$, the performance without the trimming step is poor. The reason is that when $|E| = O(nr)$, the maximum number of revealed (nonzero) entries in a row is of order $O(\log n / \log\log n)$, while the average number is constant. These over-represented rows (columns) artificially alter the spectrum of $M^E$, so we drop them to avoid poor performance.

16.3.2 Projection

If we ignore the scaling constant, $\mathsf{P}_r(\tilde{M}^E)$ is the orthogonal projection of $\tilde{M}^E$ onto the set of rank-$r$ matrices. The scaling constant $\frac{mn}{|E|}$ compensates for the smaller average magnitude of the entries of $M^E$ relative to $M$. Namely, the scaling constant is obtained from the following calculation: for $x \in \mathbb{R}^{m \times 1}$ and $y \in \mathbb{R}^{n \times 1}$,

$$\mathbb{E}\left[ x^T M^E y \right] = \mathbb{E}\left[ \sum_{i,j} x_i M_{i,j} y_j \mathbf{1}_{\{(i,j) \in E\}} \right] = \frac{|E|}{mn} \sum_{i,j} x_i M_{i,j} y_j = \frac{|E|}{mn}\, x^T M y.$$

16.4 Proof of Effectiveness

The effectiveness of the proposed algorithm is characterized by the following theorem. (The bounds are stated in terms of $n$ alone; throughout, think of $m$ and $n$ as being of the same order.)

Theorem 16.1. Assume $M$ is a rank-$r$ matrix of dimension $m \times n$ which satisfies $|M_{i,j}| \le M_{\max}$ for all $i,j$. Then with probability larger than $1 - 1/n^3$,

$$\frac{1}{n^2 M_{\max}^2} \left\| M - \mathsf{P}_r(\tilde{M}^E) \right\|_F^2 \le C\, \frac{rn}{|E|},$$

for some constant $C$.
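Before turning to the proof, here is a minimal end-to-end sketch of the estimator in Table 16.1 (the estimator analyzed in [1]): trimming, SVD, and the rescaled rank-$r$ projection, run on synthetic data so that the normalized error in Theorem 16.1 can be compared against $rn/|E|$. The dimensions, rank, and sample size are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic rank-r ground truth (hypothetical sizes).
m, n, r = 300, 300, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
M_max = np.abs(M).max()

# Reveal |E| entries chosen uniformly at random.
E_size = 20 * n * r
idx = rng.choice(m * n, size=E_size, replace=False)
rows, cols = np.unravel_index(idx, (m, n))
M_E = np.zeros((m, n))
M_E[rows, cols] = M[rows, cols]

# Step 1 (trimming): zero out columns with more than 2|E|/n nonzeros
# and rows with more than 2|E|/m nonzeros.
M_tilde = M_E.copy()
M_tilde[:, np.count_nonzero(M_E, axis=0) > 2 * E_size / n] = 0.0
M_tilde[np.count_nonzero(M_E, axis=1) > 2 * E_size / m, :] = 0.0

# Step 2 (SVD) and Step 3 (rescaled rank-r projection):
# M_hat = (mn/|E|) * sum_{i <= r} sigma_i x_i y_i^T.
X, s, Yt = np.linalg.svd(M_tilde, full_matrices=False)
M_hat = (m * n / E_size) * (X[:, :r] * s[:r]) @ Yt[:r, :]

# Normalized squared error from Theorem 16.1 versus the rate r*n/|E|.
err = np.linalg.norm(M - M_hat, "fro") ** 2 / (n ** 2 * M_max ** 2)
print(f"normalized squared error = {err:.3e},  r*n/|E| = {r * n / E_size:.3e}")
```

The point of the comparison is only the scaling: Theorem 16.1 guarantees that, with high probability, the printed error is at most a constant times $rn/|E|$.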

Theorem 16.1 can be proved using the following two lemmas.

Lemma 16.2. There exists a constant $C > 0$ such that, with probability larger than $1 - 1/n^3$,

$$\left| \frac{mn}{|E|}\,\sigma_q - \Sigma_q \right| \le C M_{\max}\, n \sqrt{\frac{n}{|E|}},$$

where $\sigma_q$ is the $q$-th largest singular value of $\tilde{M}^E$ and $\Sigma_q$ is the $q$-th largest singular value of $M$. Note that for $q > r$, $\Sigma_q = 0$; hence for $q > r$, $\sigma_q \le C M_{\max} \frac{|E|}{mn} \cdot n\sqrt{\frac{n}{|E|}} = C M_{\max} \frac{\sqrt{n|E|}}{m}$.

Lemma 16.3. There exists a constant $C > 0$ such that, with probability larger than $1 - 1/n^3$,

$$\left\| M - \frac{mn}{|E|}\,\tilde{M}^E \right\|_2 \le C M_{\max}\, n \sqrt{\frac{n}{|E|}}.$$

We prove Theorem 16.1 as follows.

Proof: (Theorem 16.1) Note that $M - \mathsf{P}_r(\tilde{M}^E)$ has rank at most $2r$, and any matrix $A$ of rank at most $2r$ satisfies $\|A\|_F \le \sqrt{2r}\,\|A\|_2$. Hence it follows that

$$\| M - \mathsf{P}_r(\tilde{M}^E) \|_F \le \sqrt{2r}\, \| M - \mathsf{P}_r(\tilde{M}^E) \|_2. \qquad (16.1)$$

Next, by the triangle inequality,

$$\begin{aligned}
\| M - \mathsf{P}_r(\tilde{M}^E) \|_2
&\le \left\| M - \frac{mn}{|E|}\tilde{M}^E \right\|_2 + \left\| \frac{mn}{|E|}\tilde{M}^E - \mathsf{P}_r(\tilde{M}^E) \right\|_2 \\
&\le C_1 M_{\max}\, \frac{n^{3/2}}{\sqrt{|E|}} + \frac{mn}{|E|}\,\sigma_{r+1} & (16.2) \\
&\le C_2 M_{\max}\, \frac{n^{3/2}}{\sqrt{|E|}} + C_3 M_{\max}\, \frac{n^{3/2}}{\sqrt{|E|}} & (16.3) \\
&\le C_4 M_{\max}\, \frac{n^{3/2}}{\sqrt{|E|}}, & (16.4)
\end{aligned}$$

where in (16.2) the first term is bounded using Lemma 16.3 and the second term equals $\frac{mn}{|E|}\sigma_{r+1}$ because $\frac{mn}{|E|}\tilde{M}^E - \mathsf{P}_r(\tilde{M}^E) = \frac{mn}{|E|}\sum_{i > r} \sigma_i x_i y_i^T$; (16.3) follows from Lemma 16.2 applied with $q = r+1$; and $C_1, \dots, C_4$ are positive constants. Finally, we complete the proof by substituting (16.4) into (16.1): squaring and dividing by $n^2 M_{\max}^2$ gives

$$\frac{1}{n^2 M_{\max}^2}\, \| M - \mathsf{P}_r(\tilde{M}^E) \|_F^2 \le 2r\, C_4^2\, \frac{n}{|E|} = C\, \frac{rn}{|E|}.$$
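As a quick numerical illustration of the spectral-norm bound in Lemma 16.3 (a synthetic sketch with hypothetical sizes, not a proof), the following code compares $\| M - \frac{mn}{|E|}\tilde{M}^E \|_2$ with $M_{\max}\, n\sqrt{n/|E|}$ for several values of $|E|$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic rank-r matrix (hypothetical sizes).
m, n, r = 300, 300, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
M_max = np.abs(M).max()

for E_size in (5 * n * r, 20 * n * r, 80 * n * r):
    # Reveal |E| entries uniformly at random and trim as in Section 16.3.1.
    idx = rng.choice(m * n, size=E_size, replace=False)
    rows, cols = np.unravel_index(idx, (m, n))
    M_E = np.zeros((m, n))
    M_E[rows, cols] = M[rows, cols]
    M_tilde = M_E.copy()
    M_tilde[:, np.count_nonzero(M_E, axis=0) > 2 * E_size / n] = 0.0
    M_tilde[np.count_nonzero(M_E, axis=1) > 2 * E_size / m, :] = 0.0

    # Left- and right-hand sides of Lemma 16.3 (up to the constant C).
    lhs = np.linalg.norm(M - (m * n / E_size) * M_tilde, 2)
    rhs = M_max * n * np.sqrt(n / E_size)
    print(f"|E| = {E_size:6d}:  spectral error = {lhs:8.1f},  M_max*n*sqrt(n/|E|) = {rhs:8.1f}")
```

The ratio of the two printed quantities should remain bounded as $|E|$ varies, which is the content of the lemma up to the constant $C$.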

Lastly, we provide the proof of Lemma 16.2.

Proof: (Lemma 16.2) By the Courant-Fischer characterization of singular values,

$$\sigma_q = \min_{H:\, \dim(H)=n-q+1}\ \max_{y \in H,\, \|y\|=1} \| \tilde{M}^E y \| \qquad (16.5)$$
$$\;\;\;\;= \max_{H:\, \dim(H)=q}\ \min_{y \in H,\, \|y\|=1} \| \tilde{M}^E y \|. \qquad (16.6)$$

Let $v_1, \dots, v_n$ denote the right singular vectors of $M$. By (16.5), taking $H$ to be the orthogonal complement of $\mathrm{span}(v_1, \dots, v_{q-1})$, it follows that

$$\frac{mn}{|E|}\,\sigma_q \le \max_{y \in H,\, \|y\|=1} \left\| \frac{mn}{|E|}\tilde{M}^E y \right\| \le \max_{y \in H,\, \|y\|=1} \| M y \| + \max_{y \in H,\, \|y\|=1} \left\| \left( M - \frac{mn}{|E|}\tilde{M}^E \right) y \right\| \le \Sigma_q + C M_{\max}\, n\sqrt{\frac{n}{|E|}},$$

where the last step uses $\|My\| \le \Sigma_q$ for unit vectors $y$ orthogonal to $v_1, \dots, v_{q-1}$, together with Lemma 16.3. Similarly, using (16.6) and letting $H = \mathrm{span}(v_1, \dots, v_q)$, the lower bound follows:

$$\frac{mn}{|E|}\,\sigma_q \ge \min_{y \in H,\, \|y\|=1} \left\| \frac{mn}{|E|}\tilde{M}^E y \right\| \ge \min_{y \in H,\, \|y\|=1} \| M y \| - \max_{y \in H,\, \|y\|=1} \left\| \left( M - \frac{mn}{|E|}\tilde{M}^E \right) y \right\| \ge \Sigma_q - C M_{\max}\, n\sqrt{\frac{n}{|E|}}.$$

Combining the two bounds completes the proof.

Reference

[1] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980-2998, Jun. 2010.