Rank minimization via the γ_2 norm


Rank minimization via the γ_2 norm
Troy Lee, Columbia University
Adi Shraibman, Weizmann Institute

Rank Minimization Problem
Consider the following problem:
  min_X rank(X)
  subject to ⟨A_i, X⟩ ≥ b_i for i = 1, ..., k
Arises in many contexts: complexity theory, recommendation systems, control theory. Known to be NP-hard; it is an optimization problem over a nonconvex function.

Communication complexity
Two parties, Alice and Bob, wish to evaluate a function f : X × Y → {−1, +1}, where Alice holds x ∈ X and Bob holds y ∈ Y. How much communication is needed?
Can consider deterministic D(f), randomized R_ε(f), and even quantum versions Q_ε(f).
Often convenient to work with the |X|-by-|Y| matrix A_f, known as the communication matrix, where A_f(x, y) = f(x, y). Allows tools from linear algebra to be applied.

How a protocol partitions the communication matrix
[Figure: a sequence of slides showing the X-by-Y communication matrix (Alice's inputs as rows, Bob's as columns) successively partitioned into combinatorial rectangles labeled by protocol transcripts 0/1, then 00/01/10/11, then 000 through 111.]

Log rank bound
As a successful protocol partitions the communication matrix into rank-one matrices, we find
  D(f) ≥ log rank(A_f)
One of the greatest open problems in communication complexity is the log rank conjecture [LS88], which states that
  D(f) ≤ (log rank(A_f))^k for some constant k.

Approximation rank
As rank lower bounds deterministic communication complexity, the relevant quantity for randomized (and quantum) models is approximation rank. Given a target matrix A, find a low rank matrix entrywise close to A:
  rank_ε(A) = min_X rank(X)
  subject to |X(i, j) − A(i, j)| ≤ ε for all i, j
Krause [Kra96] shows that R_ε(f) ≥ log rank_ε(A_f).

Approximation rank
We do not know if approximation rank remains NP-hard to compute, but it can be difficult to determine in practice. Consider the disjointness matrix A with rows and columns labeled by n-bit strings, where
  A(x, y) = 1 if x ∧ y = 0, and A(x, y) = −1 otherwise.
Its ε = 1/3 approximation rank is 2^{Θ(√n)} [Raz03, AA05].

Example 2: Matrix completion
Popularly known as the Netflix problem. Think of an M-by-N matrix where rows are labeled by users, columns are labeled by movies, and entries are ratings. From a partial filling of this matrix (ratings supplied by some users) we would like to make predictions for other users, i.e. fill out the rest of the matrix.
A useful assumption: a user's rating depends on only a few factors, thus the completed matrix should have low rank.

Example 2: Matrix completion
Given a set Ω ⊆ [M] × [N] of constraints X(i, j) = a_{i,j} for (i, j) ∈ Ω, find the lowest rank completion X:
  min_X rank(X)
  subject to X(i, j) = a_{i,j} for all (i, j) ∈ Ω
For applications, can think of a hidden low rank matrix A in the background. The goal is to exactly recover this matrix.
Basic observations: a rank r matrix has O((M + N)r) degrees of freedom. In general, we will need some assumptions for interesting results; think of a matrix with only one nonzero entry.

Convex relaxations
Part of the difficulty of the rank minimization problem is that it is an optimization problem over a nonconvex function. Much work has looked at substituting a convex function for the rank function. We will look at substituting different norms for the rank function.

Matrix norms
Define the i-th singular value as σ_i(X) = √(λ_i(XX^t)). Many useful matrix norms are expressed in terms of the vector of singular values σ(X) = (σ_1(X), ..., σ_n(X)):
  ‖X‖_1 = ℓ_1(σ(X))   trace norm
  ‖X‖_∞ = ℓ_∞(σ(X))   spectral norm
  ‖X‖_2 = ℓ_2(σ(X)) = √(Tr(XX^t))   Frobenius norm
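As a quick numerical illustration (a minimal numpy sketch, not from the talk), all three norms can be read off the vector of singular values:

import numpy as np

X = np.random.randn(5, 4)
sigma = np.linalg.svd(X, compute_uv=False)    # vector of singular values

trace_norm     = sigma.sum()                  # ‖X‖_1 = ℓ_1(σ(X))
spectral_norm  = sigma.max()                  # ‖X‖_∞ = ℓ_∞(σ(X))
frobenius_norm = np.sqrt((sigma ** 2).sum())  # ‖X‖_2 = ℓ_2(σ(X))

# cross-check against numpy's built-in matrix norms
assert np.isclose(trace_norm, np.linalg.norm(X, "nuc"))
assert np.isclose(spectral_norm, np.linalg.norm(X, 2))
assert np.isclose(frobenius_norm, np.linalg.norm(X, "fro"))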

Trace norm heuristic
A popular heuristic is to replace rank by the trace norm:
  min_X ‖X‖_1
  subject to ⟨A_i, X⟩ ≥ b_i for i = 1, ..., k
Motivation: rank is equal to the number of nonzero singular values, thus ‖X‖_1 ≤ ‖X‖_∞ · rank(X). Over matrices satisfying ‖X‖_∞ ≤ 1, the trace norm is the largest convex lower bound on rank [Faz02].

Trace norm heuristic for matrix completion
There has recently been a lot of work on the trace norm heuristic for the matrix completion problem [Faz02, RFP07, CR08, CT09]. These results are of the form: Say that X is generated by taking N-by-r random Gaussian matrices Y and Z and setting X = YZ^t. Let Ω, of size about Nr log^7(N), consist of entries of X sampled uniformly at random. Then with high probability the trace norm heuristic will exactly recover X.
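For concreteness, here is a minimal cvxpy sketch of the trace norm heuristic for completion. It is illustrative only: the matrix size, sampling rate, and solver defaults are assumptions, not values from the talk.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, r = 30, 2
A = rng.standard_normal((N, r)) @ rng.standard_normal((r, N))  # hidden low-rank matrix

# observe a random subset Omega of entries (encoded as a 0/1 mask)
mask = (rng.random((N, N)) < 0.5).astype(float)

X = cp.Variable((N, N))
constraints = [cp.multiply(mask, X) == mask * A]   # X(i,j) = a_ij for (i,j) in Omega

# minimize the trace (nuclear) norm subject to the completion constraints
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()
print("relative recovery error:", np.linalg.norm(X.value - A) / np.linalg.norm(A))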

Trace norm heuristic for approximation rank
In the context of approximation rank and communication complexity, we often work with sign matrices. Here the trace norm heuristic is better motivated by another simple inequality:
  ‖X‖_1 = Σ_i σ_i(X) ≤ √(rank(X)) · ‖X‖_2
(by Cauchy–Schwarz). For an M-by-N sign matrix A we have ‖A‖_2 = √(MN), so this simplifies nicely:
  rank(A) ≥ ‖A‖_1^2 / (MN)

Trace norm bound on rank (example)
Let H_N be an N-by-N Hadamard matrix (entries from {−1, +1}). Then ‖H_N‖_1 = N^{3/2}, since every singular value of H_N equals √N. The trace norm method gives a bound on the rank of
  N^3 / N^2 = N.
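A quick numerical check of this example (a small numpy sketch, assuming scipy is available for the Hadamard construction):

import numpy as np
from scipy.linalg import hadamard

N = 16
H = hadamard(N)                               # N-by-N, entries in {-1, +1}
trace_norm = np.linalg.norm(H, "nuc")         # equals N**1.5 since every singular value is sqrt(N)
print(trace_norm, N ** 1.5)                   # both 64.0 for N = 16
print("rank bound:", trace_norm ** 2 / N**2)  # ‖H‖_1^2 / (N*N) = N
print("actual rank:", np.linalg.matrix_rank(H))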

Trace norm bound (drawback)
As a complexity measure, the trace norm bound suffers one drawback: it is not monotone. Consider the 2N-by-2N matrix
  [ H_N  1_N ]
  [ 1_N  1_N ]
where 1_N is the N-by-N all-ones matrix. Its trace norm is at most N^{3/2} + 3N, so the trace norm method gives
  (N^{3/2} + 3N)^2 / (4N^2) = N/4 + O(√N),
a worse bound on the whole matrix than on the H_N submatrix!
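To see this non-monotonicity concretely, a small numpy/scipy sketch with an illustrative value of N:

import numpy as np
from scipy.linalg import hadamard

N = 64
H = hadamard(N)
J = np.ones((N, N))
B = np.block([[H, J], [J, J]])                          # 2N-by-2N matrix containing H_N as a submatrix

bound_H = np.linalg.norm(H, "nuc") ** 2 / N**2          # = N
bound_B = np.linalg.norm(B, "nuc") ** 2 / (2 * N) ** 2  # ≈ N/4 + O(sqrt(N))
print(bound_H, bound_B)                                 # the bound on the larger matrix is smaller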

Trace norm method (a fix)
We can fix this by considering
  max_{u,v: ‖u‖_2 = ‖v‖_2 = 1} ‖A ∘ uv^t‖_1
where ∘ denotes the entrywise (Hadamard) product. As rank(A ∘ uv^t) ≤ rank(A) we still have
  rank(A) ≥ ( ‖A ∘ uv^t‖_1 / ‖A ∘ uv^t‖_2 )^2

The γ_2 norm
This bound simplifies nicely for a sign matrix A, since then ‖A ∘ uv^t‖_2 = 1 for unit vectors u, v:
  rank(A) ≥ max_{u,v: ‖u‖_2 = ‖v‖_2 = 1} ( ‖A ∘ uv^t‖_1 / ‖A ∘ uv^t‖_2 )^2 = max_{u,v: ‖u‖_2 = ‖v‖_2 = 1} ‖A ∘ uv^t‖_1^2
We have arrived at the γ_2 norm, introduced to communication complexity by [LMSS07, LS07]:
  γ_2(A) = max_{u,v: ‖u‖_2 = ‖v‖_2 = 1} ‖A ∘ uv^t‖_1

γ_2 norm: Surprising usefulness
γ_2 is a norm, though not a matrix norm. In matrix analysis it is known as the Schur/Hadamard product operator norm with respect to the trace norm. Schur (1911) showed that γ_2(A) = max_i A_{ii} if A is positive semidefinite.
γ_2(A) can be written as a semidefinite program and so can be well approximated in time polynomial in the size of A.
The dual norm γ_2^*(A) = max_B ⟨A, B⟩ / γ_2(B) turns up in the semidefinite programming relaxation of MAX-CUT of Goemans and Williamson, and is closely related to the discrepancy method in communication complexity.
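The semidefinite program can be written down directly: γ_2(A) is the smallest possible largest diagonal entry of a PSD matrix whose upper-right block equals A. Here is a minimal cvxpy sketch of that standard formulation (illustrative, not the authors' implementation):

import cvxpy as cp
import numpy as np

def gamma2(A):
    """γ_2(A) via SDP: minimize the largest diagonal entry over PSD matrices
    whose upper-right m-by-n block equals A."""
    m, n = A.shape
    M = cp.Variable((m + n, m + n), PSD=True)
    t = cp.Variable()
    constraints = [M[:m, m:] == A, cp.diag(M) <= t]
    cp.Problem(cp.Minimize(t), constraints).solve()
    return t.value

A = np.array([[1.0, 1.0], [1.0, -1.0]])   # 2x2 Hadamard sign matrix
print(gamma2(A))                          # ≈ sqrt(2), matching γ_2(H_N) = sqrt(N)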

Cross section of the γ_2 unit ball of symmetric matrices [x, y; y, z]
[Figure: 3-D plot of a cross section of the γ_2 unit ball over symmetric 2x2 matrices [x, y; y, z].]

Approximation rank with γ_2
Substitute γ_2 for rank in the approximation rank optimization problem:
  γ_2^ε(A) = min_X γ_2(X)
  subject to |A(i, j) − X(i, j)| ≤ ε for all (i, j)
Main theorem: For any M-by-N sign matrix A and constant 0 < ε < 1/2,
  γ_2^ε(A)^2 / (1 + ε)^2 ≤ rank_ε(A) = O( (γ_2^ε(A)^2 log(MN))^3 )
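Since γ_2 has the SDP formulation above, γ_2^ε is also an SDP: relax the off-diagonal block to be entrywise ε-close to A. A minimal cvxpy sketch, in the same spirit as the gamma2 sketch earlier (illustrative, not the authors' code):

import cvxpy as cp
import numpy as np

def gamma2_eps(A, eps):
    """γ_2^ε(A): minimize the largest diagonal entry of a PSD matrix whose
    upper-right block is entrywise within eps of A."""
    m, n = A.shape
    M = cp.Variable((m + n, m + n), PSD=True)
    t = cp.Variable()
    constraints = [cp.abs(M[:m, m:] - A) <= eps, cp.diag(M) <= t]
    cp.Problem(cp.Minimize(t), constraints).solve()
    return t.value

A = np.sign(np.random.randn(8, 8))   # a random 8x8 sign matrix
print(gamma2_eps(A, 1/3))            # never larger than γ_2(A)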

Proof sketch
We introduced γ_2 as a maximization problem. For the proof, we use an alternative characterization of γ_2 in terms of a minimization problem. The trace norm can be written as
  ‖X‖_1 = min_{Y,Z: X = YZ^t} ‖Y‖_2 ‖Z‖_2
This follows from the singular value decomposition X = UΣV^t, where U, V are unitary (take Y = UΣ^{1/2} and Z = VΣ^{1/2}).

Min formulation of γ_2
  γ_2(X) = max_{u,v: ‖u‖_2 = ‖v‖_2 = 1} ‖X ∘ uv^t‖_1
         = max_{u,v} min_{Y,Z: X = YZ^t} ‖D(u)Y‖_2 ‖Z^t D(v)‖_2
         = min_{Y,Z: X = YZ^t} max_{u,v} ‖D(u)Y‖_2 ‖Z^t D(v)‖_2
         = min_{Y,Z: X = YZ^t} ‖Y‖_r ‖Z‖_r
where D(u) is the diagonal matrix with u on the diagonal and ‖Y‖_r is the largest ℓ_2 norm of a row of Y.

First step: dimension reduction
Thus γ_2 looks for a factorization X = YZ^t where Y, Z have short rows in terms of the ℓ_2 norm. Similarly, rank looks for a factorization X = YZ^t where Y, Z have short rows in terms of dimension.
Use the Johnson–Lindenstrauss lemma to project the rows of Y, Z to dimension about equal to γ_2(X)^2. Let R be a random K-by-k matrix with suitably scaled independent Gaussian entries, projecting from dimension k down to K. Then
  Pr_R[ |⟨Ru, Rv⟩ − ⟨u, v⟩| ≥ (δ/2)(‖u‖_2^2 + ‖v‖_2^2) ] ≤ 4e^{−δ^2 K / 8}
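A small numpy sketch of this dimension-reduction step (illustrative: the 1/√K scaling of the Gaussian entries and the test vectors are assumptions):

import numpy as np

rng = np.random.default_rng(1)
k, K = 2000, 200                                # original and reduced dimension
R = rng.standard_normal((K, k)) / np.sqrt(K)    # scaled so that E[⟨Ru, Rv⟩] = ⟨u, v⟩

u, v = rng.standard_normal(k), rng.standard_normal(k)
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)

print(abs((R @ u) @ (R @ v) - u @ v))           # small with high probability, per the tail bound above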

Second step: error reduction
After the first step, we obtain a new matrix X' of rank O(γ_2^ε(A)^2 log N), but the approximation factor has worsened: X' is only 2ε-close to A.
Trick going back to Krivine: apply a low degree polynomial entrywise to the matrix X',
  p(M) = a_0 J + a_1 M + ... + a_d M^{∘d}
where powers are taken entrywise and J is the all-ones matrix. Note that rank(p(M)) ≤ (d + 1) rank(M)^d. Taking p to be an approximation to the sign function reduces the error.

Polynomial for Error Reduction
[Figure: plot of the error-reduction polynomial p(x) = (7x − x^3)/6 on roughly the interval [−2.4, 2.4].]
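A small numpy sketch of the two facts behind this step: p pushes entries that are near ±1 closer to ±1, and applying a degree-3 polynomial entrywise increases the rank at most polynomially. The specific sizes and error values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)

def p(x):
    return (7 * x - x**3) / 6             # the error-reduction polynomial (degree d = 3)

# (1) entries that are within 0.4 of +1 get pushed closer to +1
t = np.linspace(0.6, 1.4, 9)
print(np.abs(p(t) - 1).max())             # ≈ 0.34 < 0.4: the error shrinks

# (2) entrywise application keeps the rank polynomially bounded
r = 3
M = rng.standard_normal((60, r)) @ rng.standard_normal((r, 60))   # rank-3 matrix
print(np.linalg.matrix_rank(p(M)))        # at most 1 + r + r**3, since each entrywise power M**j has rank at most r**j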

Final result
For any M-by-N sign matrix A and constant 0 < ε < 1/2,
  γ_2^ε(A)^2 / (1 + ε)^2 ≤ rank_ε(A) = O( (γ_2^ε(A)^2 log(MN))^3 )
The logarithmic factor is necessary, as the (sign version of the) identity matrix has approximation rank on the order of log(N) [Alo03] but constant γ_2.
[BES02] used dimension reduction to upper bound sign rank by γ_2(A)^2. Interestingly, here the lower bound fails.

Extension to the general rank minimization problem
Consider again the general rank minimization problem
  α(A, b) = min_X rank(X)
  subject to ⟨A_i, X⟩ ≥ b_i for i = 1, ..., k
Let C = {X : ⟨A_i, X⟩ ≥ b_i for all i} be the feasible set, and let ℓ_∞(C) = max_{X ∈ C} ℓ_∞(X), where ℓ_∞(X) denotes the largest magnitude of an entry of X.

Extension to the general rank minimization problem
As argued before, we have
  α(A, b) ≥ min_{X ∈ C} γ_2(X)^2 / ℓ_∞(C)^2
Say we solve, via semidefinite programming, the program min_{X ∈ C} γ_2(X). Then by doing dimension reduction on an optimal X* we obtain a matrix Y of rank about γ_2(X*)^2 log(N) which is entrywise ε-close to X*. This matrix satisfies the i-th constraint up to an additive error of ε · ℓ_1(A_i), where ℓ_1(A_i) is the sum of the absolute values of the entries of A_i.

Application to the matrix completion problem
In matrix completion there is often a natural bound on ℓ_∞(C). For example, Netflix uses ratings {1, 2, 3, 4, 5}, so the matrix will be bounded. Also, in the matrix completion problem each constraint matrix A_i consists of a single entry, so ℓ_1(A_i) = 1.
Thus if the lowest rank completion has rank d, via γ_2 we can find a rank O(d log(N)) matrix which is ε-close on the specified entries.
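Putting the pieces together for completion, here is a minimal cvxpy sketch of the γ_2 relaxation with entrywise completion constraints. The Johnson–Lindenstrauss rounding step that produces the low-rank matrix is omitted, and the sizes and data are illustrative assumptions.

import cvxpy as cp
import numpy as np

def complete_gamma2(A_obs, mask):
    """Minimize γ_2(X) subject to X agreeing with A_obs on the observed entries (mask == 1)."""
    m, n = A_obs.shape
    M = cp.Variable((m + n, m + n), PSD=True)
    t = cp.Variable()
    X = M[:m, m:]
    constraints = [cp.multiply(mask, X) == mask * A_obs, cp.diag(M) <= t]
    cp.Problem(cp.Minimize(t), constraints).solve()
    return X.value

rng = np.random.default_rng(3)
N, d = 20, 2
A = rng.standard_normal((N, d)) @ rng.standard_normal((d, N))   # hidden rank-d matrix
mask = (rng.random((N, N)) < 0.6).astype(float)                 # observed entries Ω
Xhat = complete_gamma2(A, mask)
print(np.linalg.norm(Xhat - A) / np.linalg.norm(A))             # error over the full matrix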

Open questions
An approximation algorithm for the limiting case of sign rank?
For matrix completion, can one show similar unconditional results for the trace norm heuristic?
Practical implementations of γ_2 for large matrices?