Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning

Transcription:

Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning. David Steurer (Cornell). Approximation Algorithms and Hardness, Banff, August 2014.

For many problems (e.g., all UG-hard ones), better guarantees require stronger relaxations and more complex proxy solutions. But which ones? The original problem is hard to solve directly, so one works with a relaxation (a tractable problem) whose proxy solutions stand in for actual solutions: flows (multicommodity, expander, electrical), metrics (negative-type, simplex), or distributions (Gaussian, random spanning tree). Sum-of-squares allows us to avoid proxy solutions and focus on actual solutions, at a small price: we need to use the low-degree lens.

example: max cut

Combinatorial viewpoint: given an undirected graph G, bipartition the vertex set (x_i = 1 vs. x_i = −1) so as to cut as many edges as possible.

Polynomial viewpoint: given the polynomial L_G(x) = Σ_{ij ∈ E(G)} ¼ (x_i − x_j)², find its maximum over x ∈ {±1}^{V(G)} (the hypercube).

How to certify an upper bound c on the maximum? Decompose c − L_G as a sum of squares of polynomials plus a quadratic polynomial vanishing over the hypercube:

c − L_G = R_1² + ⋯ + R_t² + α_1·(x_1² − 1) + ⋯ + α_n·(x_n² − 1).

Searching for such a decomposition is an n²-size semidefinite program, because a polynomial is a sum of squares exactly when its coefficient matrix can be chosen positive semidefinite.

Goemans-Williamson bound: either this decomposition exists or the maximum is at least 0.878·c (best known approximation guarantee). Spectral methods correspond to adding the restriction α_1 = ⋯ = α_n, which recovers the largest-Laplacian-eigenvalue bound (see the sketch after this slide).

Degree-k version: allow the squares R_i and the multipliers α_i to have higher degree (total degree at most k), giving an n^{O(k)}-size semidefinite program. The degree-n bound is exact (interpolate c − L_G as a degree-n polynomial over the hypercube). Does the degree-n^{o(1)} bound improve over GW in the worst case? (It would refute the Unique Games Conjecture.) For every* candidate hard-instance construction, the degree-16 bound improves over GW [Brandao-Barak-Harrow-Kelner-S.-Zhou].
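To make the spectral special case concrete, here is a minimal sketch (my own toy code, not from the talk) that computes the eigenvalue bound max-cut(G) ≤ (n/4)·λ_max(L_G) and compares it to the true maximum cut of a small graph; the example graph and all names are illustrative.

```python
import itertools
import numpy as np

def laplacian(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_upper_bound(n, edges):
    # L_G(x) = (1/4) x^T L x <= (1/4) lambda_max(L) ||x||^2 = (n/4) lambda_max(L) on {+-1}^n
    L = laplacian(n, edges)
    return n * np.linalg.eigvalsh(L)[-1] / 4.0

def brute_force_max_cut(n, edges):
    return max(sum(1 for i, j in edges if s[i] != s[j])
               for s in itertools.product([-1, 1], repeat=n))

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]                 # 5-cycle
    print("spectral upper bound:", spectral_upper_bound(5, edges))   # about 4.52
    print("true maximum cut:    ", brute_force_max_cut(5, edges))    # 4
```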

SOS refutations. Given multivariate polynomials P_1, …, P_m ∈ R[x_1, …, x_n], consider the system of equations E = {P_1 = 0, …, P_m = 0}. When is E unsatisfiable over R^n?

A sum-of-squares (SOS) refutation of E derives the obviously unsatisfiable constraint −1 ≥ 0 from E: it is an identity

Q_1·P_1 + ⋯ + Q_m·P_m + R_1² + ⋯ + R_t² = −1.

The left-hand side is non-negative over solutions of E (a "derived constraint"), while the right-hand side is always negative, so E can have no solutions.

This is an intuitive proof system: many common inequalities have proofs of this form, e.g., Cauchy-Schwarz, Hölder, and ℓ_p triangle inequalities. The linear case corresponds to Gaussian elimination and Farkas' lemma.

Real Nullstellensatz: every polynomial system is either satisfiable over R^n or SOS-refutable [Artin, Krivine, Stengle].

SOS method: an n^{O(k)}-time algorithm finds an SOS refutation with degrees at most k if one exists (uses an SDP) [Shor, Nesterov, Parrilo, Lasserre].
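A minimal illustration (my own example, not from the slides): the single equation x² + 1 = 0 has no real solution, and this is certified by a degree-2 SOS refutation with Q_1 = −1 and R_1 = x.

```latex
% My example, not from the slides: a degree-2 SOS refutation of {x^2 + 1 = 0}.
\[
  (-1)\cdot\bigl(x^2 + 1\bigr) \;+\; x^2 \;=\; -1 .
\]
```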

What if no degree-d SOS refutation of E = {P_1 = 0, …, P_m = 0} exists?

A degree-d SOS implication, written E ⊢_d P ≥ 0, is an identity P = Q_1·P_1 + ⋯ + Q_m·P_m + R_1² + ⋯ + R_t² with deg(Q_i·P_i), deg(R_i²) ≤ d.

Consider a linear functional Ẽ on polynomials of degree at most d with Ẽ 1 = 1 and Ẽ P ≥ 0 whenever E ⊢_d P ≥ 0. To degree-d SOS reasoning there is no appreciable difference between such a functional and the degree-d moments of an actual distribution over solutions to E; it is called a degree-d pseudo-distribution for E.
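A standard reformulation (not stated on the slide, added here for context) explains why such functionals can be found by semidefinite programming: in the unconstrained case, the conditions Ẽ 1 = 1 and Ẽ R² ≥ 0 for all R of degree at most d/2 say exactly that the moment matrix of Ẽ is positive semidefinite.

```latex
% Standard fact, not from the slide: pseudo-expectations as a semidefinite program.
\[
  M_{\alpha\beta} \;=\; \widetilde{\mathbb{E}}\,\bigl[x^{\alpha} x^{\beta}\bigr],
  \qquad |\alpha|,|\beta| \le d/2,
  \qquad
  \widetilde{\mathbb{E}}\,[R^2] \;=\; r^{\top} M\, r \;\ge\; 0 \ \text{ for all } R
  \;\Longleftrightarrow\; M \succeq 0,
\]
% where r is the coefficient vector of R in the monomial basis. The matrix M has
% n^{O(d)} entries, which matches the n^{O(k)} running time mentioned above.
```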

SOS implication example: the triangle inequality on the hypercube.

Suppose E = {x² = 1, y² = 1, z² = 1} and P = (x − y)² + (y − z)² − (x − z)². Claim: E ⊢_4 P ≥ 0.

As a function on {±1}³, P equals 0 everywhere except when x = z ≠ y, where it equals 8. Therefore P can be written as

P = ½·(square polynomial) + Q_x·(x² − 1) + Q_y·(y² − 1) + Q_z·(z² − 1),

which is exactly a degree-4 SOS derivation of P ≥ 0 from E.
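One explicit choice of the square (my reconstruction; the exact expression on the slide did not survive extraction):

```latex
% My reconstruction of the decomposition, not the slide's exact expression.
\[
  (x-y)^2 + (y-z)^2 - (x-z)^2
  \;=\; \tfrac12\,(x-y)^2(x+z)^2
  \;+\; Q_x\,(x^2-1) + Q_y\,(y^2-1) + Q_z\,(z^2-1)
\]
% for suitable polynomials Q_x, Q_y, Q_z: both sides agree on all of {+-1}^3
% (each equals 8 when x = z != y and 0 otherwise), and every polynomial that
% vanishes on the hypercube lies in the ideal generated by x^2-1, y^2-1, z^2-1.
```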

SOS implications for univariate inequalities.

Suppose P is univariate and P ≥ 0 over R. Claim: ⊢_{deg P} P ≥ 0.

Proof by induction on deg P: choose a global minimizer α of P over R, so P(α) ≥ 0. Then P = P(α) + (x − α)²·P̃ for some polynomial P̃ with deg P̃ < deg P. Since P̃ is again non-negative over R, it is a sum of squares by the induction hypothesis, and hence so is P − P(α).

Useful consequence: ⊢_{deg P · deg Q} P(Q(x_1, …, x_n)) ≥ 0 for any polynomial Q ∈ R[x_1, …, x_n]. This gives a concrete infinite family of global constraints (unclear how to obtain with other methods).
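A concrete instance of the induction step (my example, not from the slides): P(x) = x⁴ − 2x² + 2 has global minimizer α = 1 with P(1) = 1, and

```latex
% My worked example of the univariate induction step.
\[
  P \;=\; P(1) + (x-1)^2\,\widetilde{P}
  \quad\text{with } \widetilde{P} = (x+1)^2,
  \qquad\text{so}\qquad
  P \;=\; 1 + \bigl(x^2 - 1\bigr)^2
\]
% is an explicit sum-of-squares representation of degree deg P = 4.
```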

Optimization (e.g., MAX CUT): maximize P_0 over P_1 = 0, …, P_m = 0.

v-vs-v′ approximation: given a satisfiable system {P_0 = v, P_1 = 0, …, P_m = 0}, find a solution to {P_0 ≥ v′, P_1 = 0, …, P_m = 0}.

Claim [Barak-Kelner-S. '14]: SOS reduces this approximation task, in time n^{O(k)}, to degree-k combining: given a subset / distribution {x} of solutions to {P_0 = v, P_1 = 0, …, P_m = 0}, represented only by all of its degree-k moments (e.g., Ẽ_{x} x_1 ⋯ x_k), output a single solution x to {P_0 ≥ v′, P_1 = 0, …, P_m = 0}, using only properties of the moments / solutions that have degree-k SOS proofs.

Proof: a degree-k combiner cannot distinguish between actual distributions and degree-k pseudo-distributions.

Degree-2 combiner for MAX CUT (from a distribution {x} of solutions to a single solution x*):
- sample a Gaussian vector ξ with the same degree-2 moments as {x};
- output x* = sign(ξ).

This uses only that Cov(x) is positive semidefinite, which has a degree-2 SOS proof: ⟨v, (Ẽ xxᵀ)·v⟩ = Ẽ ⟨v, x⟩² ≥ 0.

Analysis: show that if (ξ_i, ξ_j) and (x_i, x_j) have the same degree-2 moments, then P[x*_i ≠ x*_j] ≥ 0.878·P[x_i ≠ x_j]. This statement involves only the two variables x_i, x_j and has a low-degree SOS proof. (A toy instantiation appears below.)
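Here is a minimal sketch (my own toy instantiation, not code from the talk) of this degree-2 combiner: only the second-moment matrix M = E[xxᵀ] of a distribution over cuts is used; sampling a Gaussian with those moments and rounding to signs is exactly Goemans-Williamson rounding. The 5-cycle example and all names are illustrative.

```python
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # 5-cycle, maximum cut = 4
n = 5

def cut_value(x):
    return sum(1 for i, j in edges if x[i] != x[j])

# "Distribution over solutions": uniform over all maximum cuts (for illustration only;
# the combiner below never looks at the solutions themselves, only at M).
best = max(cut_value(x) for x in itertools.product([-1, 1], repeat=n))
solutions = [np.array(x) for x in itertools.product([-1, 1], repeat=n)
             if cut_value(x) == best]
M = sum(np.outer(x, x) for x in solutions) / len(solutions)   # degree-2 moments E[x x^T]

# Degree-2 combiner: sample Gaussians with the same degree-2 moments, output signs.
rng = np.random.default_rng(0)
xi = rng.multivariate_normal(np.zeros(n), M, size=10000)
rounded = np.sign(xi)
avg = np.mean([cut_value(x) for x in rounded])
print(f"max cut {best}, 0.878 * max cut = {0.878 * best:.2f}, rounded average = {avg:.2f}")
```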

Dictionary learning (aka sparse coding).

Applications: machine learning (feature extraction), neuroscience (a model for the visual cortex).

Model: data vectors y_1, …, y_T arise by applying a linear transformation (the dictionary) A = (a_1 | ⋯ | a_m) to sparse vectors x_1, …, x_T, i.e., y_t = A·x_t. Goal: given the data vectors y_1, …, y_T, reconstruct A. Example: dictionaries for natural images [Olshausen-Field '96].

Assumptions: a_1, …, a_m are unknown unit vectors in isotropic position; x_1, …, x_T are i.i.d. samples from an unknown "nice" distribution over sparse vectors (only small correlations between coordinates).

Previous works assume incoherence [Arora-Ge-Moitra, Agarwal-Anandkumar-Jain-Netrapalli-Tandon]; these methods (local search) handle only very sparse vectors, up to about √n non-zeros.

[Barak-Kelner-S. '14] sum-of-squares method: the full sparsity range, up to a constant fraction of non-zeros (quasipolynomial time for relative sparsity o(1); polynomial time for relative sparsity n^{−ε}).

Theorem [Barak-Kelner-S. '14]: suppose m = O(n) and the correlations between coordinates are small enough; then degree-O(log n) SOS can recover a set A ≈ {±a_1, …, ±a_m} in Hausdorff distance.

Proof approach for the theorem (connection to robust tensor decomposition):

1. Construct the polynomial P_0(u) = (1/T) Σ_t ⟨y_t, u⟩⁴ from the data vectors. One can show, with a low-degree SOS proof, that the global optima of P_0 correspond to ±a_1, …, ±a_m (but there is no control over the local optima of P_0). Equivalently, the 4-tensor M = (1/T) Σ_t y_t^{⊗4} is close to Σ_i a_i^{⊗4} in spectral norm.

2. Compute the global optima of P_0. In general this is an NP-hard problem (even approximately). Approach: use the SOS method and a degree-O(log m) combiner; this works because every solution set is clustered around the m points ±a_i. Claim: degree-O(log m) SOS finds the components {±a_i}. (A toy construction of M and P_0 appears below.)
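A minimal sketch of step 1 (my own toy model, assumptions mine: an orthonormal square dictionary and i.i.d. ±1 sparse coefficients, simpler than the general isotropic, overcomplete setting of the theorem): build the empirical 4-tensor M = (1/T) Σ_t y_t^{⊗4} and observe that P_0(u) = ⟨M, u^{⊗4}⟩ is noticeably larger at dictionary columns than at typical unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T, p = 16, 16, 20000, 0.1        # toy sizes; p = expected fraction of non-zeros

# Toy model: orthonormal dictionary (a simplification of "isotropic position"),
# i.i.d. sparse +-1 coefficient vectors, data y_t = A x_t.
A = np.linalg.qr(rng.standard_normal((n, m)))[0]
X = rng.choice([-1.0, 0.0, 1.0], size=(m, T), p=[p / 2, 1 - p, p / 2])
Y = A @ X

# Empirical 4-tensor M = (1/T) sum_t y_t (x) y_t (x) y_t (x) y_t.
Z = (Y[:, None, :] * Y[None, :, :]).reshape(n * n, T)
M = (Z @ Z.T / T).reshape(n, n, n, n)

def P0(u):
    """P_0(u) = <M, u^(x)4> = (1/T) sum_t <y_t, u>^4."""
    return np.einsum('abcd,a,b,c,d->', M, u, u, u, u)

u_rand = rng.standard_normal(n)
u_rand /= np.linalg.norm(u_rand)
print("P0 at a dictionary column:", P0(A[:, 0]))    # roughly E[x_1^4] = p
print("P0 at a random unit vector:", P0(u_rand))    # noticeably smaller
```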

Robust tensor decomposition.

Given: a 4-tensor M ∈ R^{n⁴} that is ε-close to Σ_i a_i^{⊗4} in spectral norm, for orthonormal vectors a_1, …, a_n ∈ R^n. Goal: find a_1, …, a_n.

Polynomial system: E = {⟨M, x^{⊗4}⟩ ≥ 1 − ε, ‖x‖₂² = 1}; its solutions {x} are clustered around ±a_1, …, ±a_n.

Degree-O(k) combiner for tensor decomposition (from the distribution {x} of solutions to E to a single solution):
- choose a random unit vector w;
- reweigh the distribution {x} by ⟨w, x⟩^{2k}, i.e., P′[x] ∝ ⟨w, x⟩^{2k}·P[x];
- output the top eigenvector of Ẽ′ xxᵀ.

Analysis (has a low-degree SOS proof): the solutions {x} are clustered around the ±a_i, and with probability 1/n^{O(1)} over w we have ⟨w, a_1⟩² ≥ 2·max_{i>1} ⟨w, a_i⟩². In that case the reweighing increases the probability of the a_1-cluster by a factor 2^k relative to the rest, so for k = O(log n) the reweighted distribution is concentrated along ±a_1 (a toy instantiation appears below).
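A minimal sketch of the reweighting step (my own toy instantiation, not code from the talk): instead of a pseudo-distribution, it uses actual samples clustered on the components ±a_i, reweighs them by ⟨w, x⟩^{2k} for a random unit vector w, and takes the top eigenvector of the reweighted second-moment matrix, which recovers the component that w happens to favor. All sizes and names are illustrative; the real algorithm only has access to low-degree (pseudo-)moments, not samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, T = 8, 12, 20000                                     # toy sizes

A = np.linalg.qr(rng.standard_normal((n, n)))[0]           # orthonormal components a_1..a_n
idx = rng.integers(n, size=T)
signs = rng.choice([-1.0, 1.0], size=T)
X = (A[:, idx] * signs).T                                  # samples x in {+-a_i}, shape (T, n)

w = rng.standard_normal(n)
w /= np.linalg.norm(w)                                     # random unit vector
weights = (X @ w) ** (2 * k)                               # reweigh by <w, x>^(2k)
M2 = (X * weights[:, None]).T @ X / weights.sum()          # reweighted second moments E'[x x^T]
v = np.linalg.eigh(M2)[1][:, -1]                           # top eigenvector of E'[x x^T]

favored = np.argmax((A.T @ w) ** 2)                        # the cluster the reweighing boosts
print("overlap with favored component:", abs(v @ A[:, favored]))   # close to 1
```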

Conclusions.

Polynomial optimization: often easy when the global optima are unique (this occurs naturally for recovery problems).

Unsupervised learning: higher-degree SOS gives better guarantees for recovering hidden structures.

Low-degree combiners: a general way to turn proofs into algorithms.

Thank you!