The convex algebraic geometry of rank minimization


The convex algebraic geometry of rank minimization
Pablo A. Parrilo
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
International Symposium on Mathematical Programming, August 2009, Chicago

First, my coauthors: Ben Recht (UW Madison), Maryam Fazel (U. Washington), Venkat Chandrasekaran (MIT), Sujay Sanghavi (UT Austin), Alan Willsky (MIT)

Rank minimization. PROBLEM: Find the lowest-rank matrix in a given convex set. E.g., given an affine subspace of matrices, find one of lowest possible rank. In general, NP-hard (e.g., by reduction from max-cut, sparsest vector, etc.). Many applications...

Application 1: Quadratic optimization. Boolean quadratic minimization (e.g., max-cut): minimize $x^{\mathsf T} Q x$ subject to $x_i \in \{-1, +1\}$. Relax using the substitution $X := xx^{\mathsf T}$: minimize $\langle Q, X \rangle$ subject to $X \succeq 0$, $X_{ii} = 1$, dropping the rank-1 constraint on $X$. If the solution $X$ has rank 1, then we solved the original problem!
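As a concrete illustration (not from the slides), here is a minimal cvxpy sketch of this SDP relaxation; the random cost matrix Q and the solver defaults are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# Hypothetical instance of boolean quadratic minimization: min x'Qx, x_i in {-1, +1}.
n = 6
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2  # symmetrize

# SDP relaxation: substitute X = x x', then drop rank(X) = 1,
# keeping only X >= 0 (PSD) and X_ii = 1.
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                  [X >> 0, cp.diag(X) == 1])
prob.solve()

# If the optimizer happens to have rank 1, X = x x' solves the original problem.
print("relaxation value:", prob.value)
print("rank of X:", np.linalg.matrix_rank(X.value, tol=1e-6))
```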

Application 2: Matrix completion. (Figure: a partially specified matrix M, with $M_{ij}$ known for black cells and unknown for white cells.) Partially specified matrix, known pattern; often, random sampling of entries. Applications: partially specified covariances (PSD case), collaborative prediction (e.g., Rennie–Srebro '05, the Netflix problem).
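This problem is directly expressible with the nuclear norm heuristic discussed later in the talk; a small hypothetical completion instance in cvxpy (dimensions, rank, and sampling rate are made up for illustration):

```python
import cvxpy as cp
import numpy as np

# Hypothetical partially observed matrix: rank 2, with ~70% of entries revealed.
rng = np.random.default_rng(1)
m, n = 8, 8
M = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))
known = [(i, j) for i in range(m) for j in range(n) if rng.random() < 0.7]

# Find the minimum nuclear norm matrix agreeing with the observed entries.
X = cp.Variable((m, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [X[i, j] == M[i, j] for (i, j) in known])
prob.solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```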

Application 3: Sum of squares. $p(x) = \sum_{i=1}^{r} q_i(x)^2$. The number of squares equals the rank of the Gram matrix. How to compute an SOS representation with the minimum number of squares? Rank minimization with SDP constraints.
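In more detail (the standard Gram-matrix formulation, stated here for completeness):

```latex
% p is SOS iff p(x) = z(x)^T Q z(x) for some PSD Gram matrix Q,
% where z(x) is a vector of monomials. Factoring Q = \sum_i q_i q_i^T gives
\[
p(x) \;=\; z(x)^{\mathsf T} Q\, z(x) \;=\; \sum_{i=1}^{r} \bigl(q_i^{\mathsf T} z(x)\bigr)^{2},
\qquad r = \operatorname{rank}(Q),
\]
% so minimizing the number of squares is rank minimization over the affine
% set of valid Gram matrices, intersected with the PSD cone.
```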

Application 4: System identification. (Figure: block diagram of an unknown plant P driven by a random input, with output measured in noise.) Measure the response at time T, with random input; the response at time T is linear in the impulse response h. Complexity of P = rank(Hank(h)), the rank of the Hankel matrix of h.
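For reference (standard in linear systems theory, not spelled out on the slide), the Hankel matrix of the impulse response h and the rank identity read:

```latex
\[
\mathrm{Hank}(h) =
\begin{pmatrix}
h_1 & h_2 & h_3 & \cdots \\
h_2 & h_3 & h_4 & \cdots \\
h_3 & h_4 & h_5 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix},
\qquad
\operatorname{rank}\bigl(\mathrm{Hank}(h)\bigr) = \text{McMillan degree of } P,
\]
% so fitting the measurements with the simplest system is rank minimization
% over an affine, Hankel-structured set.
```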

Application 5: Video inpainting (Ding–Sznaier–Camps, ICCV '07). Given video frames with missing portions, reconstruct / interpolate the missing data. Simple dynamics <-> low-rank (multi-)Hankel matrix.

Application 5: Video inpainting, continued (Ding–Sznaier–Camps, ICCV '07). (Slide shows example frames.)

Overview. Rank optimization is ubiquitous in optimization, communications, and control. Difficult, even under linear constraints. This talk: underlying convex geometry and the nuclear norm; rank minimization can be efficiently solved (under certain conditions!); links with sparsity and compressed sensing; combined sparsity + rank.

Outline: applications; mathematical formulation; rank vs. sparsity; what is the underlying geometry?; convex hulls of varieties; geometry and the nuclear norm; sparsity + rank; algorithms. (Figure: a matrix M on the variety of rank-r matrices.)

Rank minimization. PROBLEM: Find the matrix of smallest rank that satisfies the underdetermined linear system $\mathcal{A}(X) = b$. Given an affine subspace of matrices, find one of lowest rank.

Rank can be complicated... What does a rank constraint look like? (Figures: a section of the 3×3 matrices of rank 2, and a section of the 7×7 matrices of rank 6.)

How to solve this? Many methods have been proposed (after all, it's an NLP!): Newton-like local methods, manifold optimization, alternating projections, augmented Lagrangian, etc... These sometimes (often?) work very well. But it is very hard to prove results on global performance.

Geometry of rank varieties. What is the structure of the set of matrices of fixed rank? It is an algebraic variety (the solution set of polynomial equations), defined by the vanishing of all (k+1)×(k+1) minors of M. Its dimension is $k(2n-k)$, and it is nonsingular except at matrices of rank less than or equal to k−1. At smooth points there is a well-defined tangent space. (Figure: a matrix M on the variety of rank-k matrices, with its tangent space.)
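At a smooth point $M = U \Sigma V^{\mathsf T}$ (thin SVD, with k columns in U and V), the tangent space admits a standard explicit description (supplied here; the slide only showed the picture):

```latex
\[
T_M \;=\; \bigl\{\, U Y^{\mathsf T} + X V^{\mathsf T} \;:\; X,\, Y \in \mathbb{R}^{n \times k} \,\bigr\}.
\]
```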

Nuclear norm: $\|X\|_* = \sum_i \sigma_i(X)$, the sum of the singular values of a matrix. Unitarily invariant. Also known as the Schatten 1-norm, Ky Fan r-norm, trace norm, etc...

Why is the nuclear norm relevant? Bad nonconvex problem -> convexify! The nuclear norm is the best convex approximation of the rank. (Figure: graphs of the rank function and the nuclear norm.)

Comparison with sparsity. Consider sparsity (cardinality) minimization. Geometric interpretation: take the sparsity-1 variety, intersect with the unit ball, take the convex hull: the L1 ball! (the cross-polytope)

Nuclear norm: same idea! Take the rank-1 variety, intersect with the unit ball, take the convex hull: the nuclear norm ball! Both constructions are summarized below.
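The two hulls can each be written in one line (standard facts, added for completeness):

```latex
\[
\operatorname{conv}\{\, x \in \mathbb{R}^{n} : \operatorname{card}(x) \le 1,\ \|x\|_{2} \le 1 \,\}
  \;=\; \{\, x : \|x\|_{1} \le 1 \,\},
\]
\[
\operatorname{conv}\{\, X : \operatorname{rank}(X) \le 1,\ \|X\|_{F} \le 1 \,\}
  \;=\; \{\, X : \|X\|_{*} \le 1 \,\}.
\]
```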

Convex hulls of algebraic varieties. There are systematic methods to produce (exact or approximate) SDP representations of convex hulls, based on sums of squares (Shor, Nesterov, Lasserre, P., Nie, Helton), with parallels/generalizations from combinatorial optimization (theta bodies, e.g., Gouveia–Laurent–P.–Thomas '09).

Nuclear norm and SDP. The nuclear norm is SDP-representable! Semidefinite programming characterization:
$$\|X\|_* \;=\; \min_{W_1, W_2}\ \tfrac{1}{2}\bigl(\operatorname{tr} W_1 + \operatorname{tr} W_2\bigr)
\quad \text{s.t.} \quad
\begin{pmatrix} W_1 & X \\ X^{\mathsf T} & W_2 \end{pmatrix} \succeq 0.$$

A convex heuristic. PROBLEM: Find the matrix of lowest rank that satisfies the underdetermined linear system $\mathcal{A}(X) = b$. Convex optimization heuristic: minimize the nuclear norm (sum of singular values) of X. This is a convex function of the matrix X, and the problem is equivalent to an SDP.

Nuclear norm heuristic. Affine rank minimization: $\min\ \operatorname{rank}(X)$ s.t. $\mathcal{A}(X) = b$. Relaxation: $\min\ \|X\|_*$ s.t. $\mathcal{A}(X) = b$. Proposed in Maryam Fazel's PhD thesis (2002). The nuclear norm is the convex envelope of the rank; the relaxation is convex, can be solved efficiently, and seems to work well in practice.

Nice, but will it work? Affine rank minimization: $\min\ \operatorname{rank}(X)$ s.t. $\mathcal{A}(X) = b$. Relaxation: $\min\ \|X\|_*$ s.t. $\mathcal{A}(X) = b$. Let's see...

Numerical experiments. Test matrix arbitrarily chosen ;) A rank-5 matrix, 46×81 pixels. Generate random equations with Gaussian coefficients; nuclear norm minimization via SDP (SeDuMi). A scaled-down reproduction sketch follows.
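A smaller version of the same experiment in Python with cvxpy (the slides used SeDuMi on a 46×81 image; the dimensions and measurement count here are illustrative assumptions):

```python
import cvxpy as cp
import numpy as np

# Recover a low-rank matrix from random Gaussian linear measurements.
m, n, r = 12, 16, 2
rng = np.random.default_rng(2)
X0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

p = 4 * r * (m + n)                                # heuristic number of measurements
As = rng.standard_normal((p, m, n))                # measurement matrices A_k
b = np.tensordot(As, X0, axes=([1, 2], [0, 1]))    # b_k = <A_k, X0>

# Nuclear norm minimization subject to the affine measurement constraints.
X = cp.Variable((m, n))
constraints = [cp.sum(cp.multiply(As[k], X)) == b[k] for k in range(p)]
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()
print("relative error:", np.linalg.norm(X.value - X0) / np.linalg.norm(X0))
```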

Phase transition. (Figure: empirical recovery success as the number of measurements varies.)

How to explain this? Apparently, under certain conditions, nuclear norm minimization works. How to formalize this?

How to prove that the relaxation will work? Affine rank minimization: $\min\ \operatorname{rank}(X)$ s.t. $\mathcal{A}(X) = b$. Relaxation: $\min\ \|X\|_*$ s.t. $\mathcal{A}(X) = b$. General recipe: find a deterministic condition that ensures success. Sometimes the condition may be hard to check; if so, invoke randomness of the problem data to show the condition holds with high probability (concentration of measure).

Compressed sensing overview. {Compressed | compressive} {sensing | sampling}: a new paradigm for data acquisition/estimation. Influential recent work of Donoho/Tanner, Candès/Romberg/Tao, Baraniuk, ... Relies on sparsity (in some domain, e.g., Fourier, wavelet, etc.): few random measurements + smart decoding. Many applications, particularly MRI, geophysics, radar, etc.

Nice, but will it work? Affine rank minimization: $\min\ \operatorname{rank}(X)$ s.t. $\mathcal{A}(X) = b$. Relaxation: $\min\ \|X\|_*$ s.t. $\mathcal{A}(X) = b$. Use a restricted isometry property (RIP); then this heuristic provably works. For random operators, RIP holds with overwhelming probability.

Restricted Isometry Property (RIP). Let $\mathcal{A}: \mathbb{R}^{m \times n} \to \mathbb{R}^{p}$ be a linear map. For every positive integer $r \le m$, define the $r$-restricted isometry constant to be the smallest number $\delta_r(\mathcal{A})$ such that
$$(1 - \delta_r(\mathcal{A}))\,\|X\|_F \;\le\; \|\mathcal{A}(X)\| \;\le\; (1 + \delta_r(\mathcal{A}))\,\|X\|_F$$
holds for all matrices X of rank at most r. Similar to the RIP condition for sparsity studied by Candès and Tao (2004).

RIP: the heuristic succeeds. Theorem: Let $X_0$ be a matrix of rank r, and let $X^*$ be the solution of $\mathcal{A}(X) = \mathcal{A}(X_0)$ of smallest nuclear norm. Suppose that $r \ge 1$ is such that $\delta_{5r}(\mathcal{A}) < 1/10$. Then $X^* = X_0$. This is a deterministic condition on $\mathcal{A}$, independent of m, n, r, p; checking RIP can be hard. However, a random $\mathcal{A}$ will have RIP with high probability.

Random A: RIP holds. Theorem: Fix $0 < \delta < 1$. If $\mathcal{A}$ is random, then for every r there exists a constant $c_0$, depending only on $\delta$, such that $\delta_r(\mathcal{A}) \le \delta$ whenever $p \ge c_0\, r(2n - r) \log(n^2)$, with probability exponentially close to 1. Number of measurements: $c_0$ (constant) × $r(2n-r)$ (intrinsic dimension) × $\log(n^2)$ (log of the ambient dimension). Typical scaling for this type of result.

Comparison.
Compressed sensing: cardinality; disjoint support implies the cardinality of a sum is the sum of cardinalities; l1 norm; dual norm: l-infinity; best convex approximation of cardinality on the unit ball of the l-infinity norm; optimized via LP; RIP over vectors of cardinality at most s.
Low-rank recovery: rank; disjoint row and column spaces imply the rank of a sum is the sum of ranks; nuclear norm; dual norm: operator norm; best convex approximation of rank on the unit ball of the operator norm; optimized via SDP; RIP over matrices of rank at most r.

Secant varieties. What is common to the two cases? Can this be further extended? A natural notion: secant varieties, which generalize notions of rank to other objects (e.g., tensors, nonnegative matrices, etc.). However, there are technical difficulties: in general, these varieties may not be closed, and the associated norms are not polynomial-time computable.

Nuclear norm minimization: algorithms. Convex, nondifferentiable, special structure. Many possible approaches: interior-point methods for SDP; projected subgradient; low-rank parametrization (e.g., Burer–Monteiro). Much exciting recent work: Liu–Vandenberghe (interior point, exploits structure), Cai–Candès–Shen (singular value thresholding), Ma–Goldfarb–Chen (fixed point and Bregman), Toh–Yun (proximal gradient), Lee–Bresler (atomic decomposition), Liu–Sun–Toh (proximal point). A minimal thresholding sketch follows.
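As one example from this list, a minimal numpy sketch in the spirit of Cai–Candès–Shen's singular value thresholding for matrix completion (the step size and threshold values are illustrative, not the paper's tuned choices):

```python
import numpy as np

def shrink(Y, tau):
    """Soft-threshold the singular values: the prox operator of tau*||.||_*."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt_complete(M, mask, tau=5.0, delta=1.2, iters=500):
    """SVT-style iteration: X_k = shrink(Y_{k-1}); Y_k += delta * P_Omega(M - X_k).
    M holds the observed entries; anything where mask is False is ignored."""
    Y = np.zeros_like(M)
    X = np.zeros_like(M)
    for _ in range(iters):
        X = shrink(Y, tau)
        Y = Y + delta * mask * (M - X)   # dual step on the observed residual
    return X
```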

Low-rank completion problem. Recent work of Candès–Recht, Candès–Tao, Keshavan–Montanari–Oh, etc. These provide theoretical guarantees of recovery, either using the nuclear norm or alternative algorithms. E.g., Candès–Recht: $p \ge C\, n^{6/5}\, r \log n$. Additional considerations: noise robustness, etc.

What if the sparsity pattern is not known? $C = A^* + B^*$: C is the given composite matrix; $A^*$ is an unknown sparse matrix (unknown support and values); $B^*$ is an unknown low-rank matrix (unknown rank and eigenvectors). Task: given C, recover $A^*$ and $B^*$.

Application: Graphical models. A probabilistic model given by a graph: then the inverse covariance $\Sigma^{-1}$ is sparse. This is no longer true if the graphical model has hidden variables; but then it is the sum of a sparse and a low-rank matrix (Schur complement), as sketched below.
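Concretely (a standard computation, supplied here): partition the full precision matrix $K = \Sigma^{-1}$ over observed variables O and hidden variables H; marginalizing out H gives

```latex
\[
\bigl(\Sigma_{O}\bigr)^{-1} \;=\; K_{O} \;-\; K_{O,H}\, K_{H}^{-1}\, K_{H,O},
\]
% a sparse term (the graphical structure among the observed variables) minus
% a low-rank term (rank at most the number of hidden variables).
```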

Application: Matrix rigidity. The rigidity of a matrix is the smallest number of entries that must be changed to reduce its rank [Valiant '77]; it is NP-hard to compute. Rigidity bounds have a number of applications: complexity of linear transforms [Valiant '77], communication complexity [Lokam '95], cryptography.

Identifiability issues. The problem can be ill-posed: we need to ensure that the terms cannot be simultaneously sparse and low-rank. (Figure: a "bad" example and a "benign" example.) Define two geometric quantities, related to the tangent spaces to the sparse and low-rank varieties, as below.
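Presumably these are the two quantities of the Rank-Sparsity Incoherence paper (arXiv:0906.2220), i.e., with $\Omega$ the tangent space at $A^*$ to the sparse variety and $T$ the tangent space at $B^*$ to the low-rank variety:

```latex
\[
\mu(\Omega) \;=\; \max_{N \in \Omega,\ \|N\|_{\infty} \le 1} \|N\|,
\qquad
\xi(T) \;=\; \max_{N \in T,\ \|N\| \le 1} \|N\|_{\infty},
\]
% where ||.|| is the spectral norm and ||.||_infty the largest entry in
% absolute value.
```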

Tangent space identifiability. Necessary and sufficient condition for exact decomposition: the tangent spaces must intersect transversally. A sufficient condition for transverse intersection can be stated in terms of the product of the quantities $\mu$ and $\xi$ above.

Back to decomposition. If $C = A^* + B^*$, where $A^*$ is sparse and $B^*$ is low-rank, the natural formulation minimizes a combination of cardinality and rank subject to $A + B = C$. This is combinatorial, NP-hard in general; it is not known how to choose the trade-off parameter; and when does this exactly recover?

Natural convex relaxation. Propose: $\min_{A,B}\ \gamma \|A\|_1 + \|B\|_*$ subject to $A + B = C$. A convex program (in fact, an SDP); a solver sketch follows.
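A minimal cvxpy sketch of this program (the test instance and the value of gamma are illustrative assumptions, not tuned):

```python
import cvxpy as cp
import numpy as np

# Hypothetical sparse + low-rank instance.
rng = np.random.default_rng(3)
n = 15
B_true = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))    # low rank
A_true = (rng.random((n, n)) < 0.05) * rng.standard_normal((n, n))    # sparse
C = A_true + B_true

# min gamma*||A||_1 + ||B||_* subject to A + B = C.
gamma = 0.2
A = cp.Variable((n, n))
B = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(gamma * cp.sum(cp.abs(A)) + cp.normNuc(B)),
                  [A + B == C])
prob.solve()
print("sparse part error:", np.linalg.norm(A.value - A_true))
```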

Matrix decomposition. Theorem: For any $A^*$ and $B^*$ satisfying the incoherence condition, $(A^*, B^*)$ is the unique optimum of the convex program for a range of $\gamma$. Essentially a refinement of the tangent space transversality conditions: convex recovery requires a slightly stronger bound on the product $\mu \cdot \xi$ than transverse intersection does. Under natural random assumptions, the condition holds w.h.p.

Summary. Sparsity, rank, and beyond: many applications; a common geometric formulation via secant varieties; convex hulls of these varieties give good proxies for optimization; algebraic and geometric aspects. Theoretical challenges: efficient descriptions; sharper, verifiable conditions for recovery; other formulations (e.g., Ames–Vavasis on planted cliques); finite fields? Algorithmic issues: reliable, large-scale methods.

Thank you! Want to know more? Details in the references below, and in the references therein:
B. Recht, M. Fazel, P. A. Parrilo, "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization," arXiv:0706.4138, 2007.
V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, A. Willsky, "Rank-Sparsity Incoherence for Matrix Decomposition," arXiv:0906.2220, 2009.
Thanks for your attention!