Lift me up but not too high: Fast algorithms to solve SDPs with block-diagonal constraints


Lift me up but not too high: Fast algorithms to solve SDPs with block-diagonal constraints. Nicolas Boumal, Université catholique de Louvain (Belgium). IDeAS seminar, May 13th, 2014, Princeton.

The Riemannian staircase: Because sometimes you're just going to the second floor.

This talk is about solving this, fast: min_X f(X) such that X = X^T ⪰ 0 and X_ii = I_d for i = 1, …, m. Let's see how it comes up in applications.

Synchronization of rotations. Orthogonal matrices to estimate: Q_1, Q_2, …, Q_m ∈ O(d). Measurements: C_ij = Q_i Q_j^T + ε_ij. (Figure: measurement graph with nodes Q_i, Q_j joined by an edge carrying C_ij.)

Synchronization of rotations. Measurements (white noise): C_ij = Q_i Q_j^T + ε_ij. Maximum likelihood: min over Q_i ∈ O(d) of Σ_{i,j} ‖C_ij − Q_i Q_j^T‖².

Synchronization of rotations. Measurements (white noise): C_ij = Q_i Q_j^T + ε_ij. Maximum likelihood (equivalently): max over Q_i ∈ O(d) of Σ_{i,j} Trace(C_ij^T Q_i Q_j^T).
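To make the measurement model and the trace form of the likelihood concrete, here is a minimal NumPy sketch (not from the talk; all function names are mine) that generates noisy measurements C_ij = Q_i Q_j^T + ε_ij and evaluates Σ_{i,j} Trace(C_ij^T Q_i Q_j^T):

```python
import numpy as np

def random_orthogonal(d, rng):
    # QR of a Gaussian matrix yields a random orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def make_measurements(m, d, sigma, rng):
    # Ground-truth rotations and noisy pairwise measurements C_ij = Q_i Q_j^T + eps_ij.
    Q = [random_orthogonal(d, rng) for _ in range(m)]
    C = {(i, j): Q[i] @ Q[j].T + sigma * rng.standard_normal((d, d))
         for i in range(m) for j in range(m) if i != j}
    return Q, C

def likelihood_trace(Q, C):
    # Trace form of the objective: sum_{i,j} Trace(C_ij^T Q_i Q_j^T).
    return sum(np.trace(Cij.T @ Q[i] @ Q[j].T) for (i, j), Cij in C.items())

rng = np.random.default_rng(0)
Q, C = make_measurements(m=5, d=3, sigma=0.1, rng=rng)
print(likelihood_trace(Q, C))
```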

Maximizing the likelihood is NP-hard: max Σ_{i,j} Trace(C_ij^T Q_i Q_j^T) such that Q_i Q_i^T = I_d for i = 1, …, m. Indeed: if d = 1, this includes Max-Cut.

The classic trick is to lift: replace quadratic terms by linear ones. max Σ_{i,j} Trace(C_ij^T Q_i Q_j^T) such that Q_i Q_i^T = I_d for i = 1, …, m. Introduce X_ij = Q_i Q_j^T.

The classic trick is to lift: replace quadratic terms by linear ones. max Σ_{i,j} Trace(C_ij^T X_ij) such that X_ii = I_d for i = 1, …, m, where X_ij = Q_i Q_j^T.

From Q to X, a block matrix such that X_ij = Q_i Q_j^T; thus X = [Q_1; …; Q_m] [Q_1^T ⋯ Q_m^T] ∈ R^{md×md}.

From Q to X, a block matrix such that X = [Q_1; …; Q_m] [Q_1^T ⋯ Q_m^T] ∈ R^{md×md}. In other words: X = X^T ⪰ 0, X_ii = I_d for i = 1, …, m, and rank X = d.

This new problem formulation is equivalent to the original one: max_X Trace(CX) such that X = X^T ⪰ 0, X_ii = I_d for i = 1, …, m, and rank X = d.
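As a sanity check on the lift (again a hedged NumPy sketch of my own, not the talk's code), the quadratic objective Σ_{i,j} Trace(C_ij^T Q_i Q_j^T) coincides with the linear one Trace(CX) once X = QQ^T and the C_ij are assembled into a block matrix C:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 4, 3
# Random orthogonal blocks Q_i and noisy measurements C_ij = Q_i Q_j^T + noise.
Q = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(m)]
C = {(i, j): Q[i] @ Q[j].T + 0.05 * rng.standard_normal((d, d))
     for i in range(m) for j in range(m) if i != j}

# Lifted variable X = QQ^T: rank d, with X_ii = I_d.
Qstack = np.vstack(Q)                       # shape (m*d, d)
X = Qstack @ Qstack.T

# Block cost matrix whose (i, j) block is C_ij (diagonal blocks left at zero).
Cmat = np.zeros((m * d, m * d))
for (i, j), Cij in C.items():
    Cmat[i*d:(i+1)*d, j*d:(j+1)*d] = Cij

quadratic = sum(np.trace(Cij.T @ Q[i] @ Q[j].T) for (i, j), Cij in C.items())
lifted = np.trace(Cmat @ X)
print(quadratic, lifted)                    # agree up to round-off
print(np.linalg.matrix_rank(X))             # d
```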

Dropping the rank constraint altogether yields an SDP relaxation: max_X Trace(CX) such that X = X^T ⪰ 0 and X_ii = I_d for i = 1, …, m. This is sometimes called the Orthogonal-Cut SDP. If C ⪰ 0, the value of this SDP approximates the value of the rank-constrained (hard) problem.

More generally, we address this problem (with f convex, smooth): min_X f(X) such that X = X^T ⪰ 0 and X_ii = I_d for i = 1, …, m. Control over the cost means we can aim for robustness.

A few different applications involve the same formulation:
The generalized Procrustes problem
Global registration (Chaudhury et al. '12)
Synchronization of rotations (Singer '11)
Common lines registration (LUD) (Wang et al. '13)
Orthogonal-Cut (Bandeira et al. '13), Phase-Cut (Waldspurger et al. '12), Max-Cut (Goemans et al. '95)

(Plot: computation time in minutes vs. number m of rotations to synchronize, using SeDuMi.) The computation time grows quickly, but the final answer is always of rank d.

We should expect low-rank solutions. There exists a solution of rank r with r(r + 1) ≤ n(d + 1) (Pataki '98, for the linear cost case: with n = md and m·d(d + 1)/2 scalar constraints, Pataki's bound r(r + 1)/2 ≤ (number of constraints) gives r on the order of √(n(d + 1))). Wishful thinking: the underlying problem calls for a low-rank solution (?)

Lifting is like playing hide and seek in a reverse-pyramid-shaped building: you know the guy you're looking for went by the stairs, but you take the elevator to the top floor and start searching down from there.

The SDPLR idea (Burer et al. '03, '04): factorize with a tall and skinny Y. min_Y Trace(C YY^T) such that X = YY^T is feasible, with Y ∈ R^{n×p}, so that rank X ≤ p. They handle the constraints via an augmented Lagrangian. If Y is rank deficient, X is optimal.

What if most local optimizers are full-rank? In practice, we don't see that. Burer and Monteiro ('04) explain why for linear cost functions (Theorem 3.4): Suppose Y is a local optimizer of the rank-p problem (with d + 1 ≤ p ≤ md). Then X = YY^T is contained in the relative interior of a face F of the SDP's feasible set over which the objective function is constant. Moreover, if F is just an extreme point, then X is a global optimizer of the SDP.

(Plot: computation time in minutes vs. number m of rotations to synchronize, comparing SeDuMi and SDPLR.) Much better. But SDPLR is generic software; our constraints have structure.

Acceptable Y's live on a manifold: X = X^T ⪰ 0, rank X ≤ p, X_ii = I_d ∀i ⟺ there exists Y ∈ R^{n×p} with X = YY^T, Y = [Y_1; …; Y_m], Y_i ∈ R^{d×p}, and Y_i Y_i^T = I_d ∀i.

Thus, the nonlinear program is a Riemannian optimization problem: min_Y f(YY^T) such that Y_i is d×p orthonormal for i = 1, …, m. We use Riemannian Trust-Regions to solve this. See Absil, Baker, Gallivan: Trust-Region Methods on Riemannian Manifolds (2007). Matlab toolbox: manopt.org

Thus, the nonlinear program is a Riemannian optimization problem: min_Y f(YY^T) such that Y ∈ Stiefel(d, p)^m. We use Riemannian Trust-Regions to solve this. See Absil, Baker, Gallivan: Trust-Region Methods on Riemannian Manifolds (2007). Matlab toolbox: manopt.org
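The talk relies on Manopt's Riemannian trust-region solver; as a self-contained illustration of how one can optimize over the constraint Y_i Y_i^T = I_d, here is a hedged NumPy sketch of plain Riemannian gradient descent (tangent projection plus polar retraction, block by block) for the linear cost Trace(C YY^T). This is my own simplified stand-in, not the staircase code:

```python
import numpy as np

def sym(A):
    return (A + A.T) / 2

def project_tangent(Y, Z, m, d):
    # Per-block projection onto the tangent space {Z_i : Z_i Y_i^T + Y_i Z_i^T = 0}.
    P = Z.copy()
    for i in range(m):
        Yi, Zi = Y[i*d:(i+1)*d], Z[i*d:(i+1)*d]
        P[i*d:(i+1)*d] = Zi - sym(Zi @ Yi.T) @ Yi
    return P

def retract(Y, m, d):
    # Per-block polar retraction back onto {Y_i : Y_i Y_i^T = I_d} (needs p >= d).
    R = Y.copy()
    for i in range(m):
        u, _, vt = np.linalg.svd(Y[i*d:(i+1)*d], full_matrices=False)
        R[i*d:(i+1)*d] = u @ vt
    return R

def riemannian_gd(C, m, d, p, steps=500, eta=1e-2, seed=0):
    # Minimize Trace(C Y Y^T) over Y with orthonormal d-by-p blocks.
    rng = np.random.default_rng(seed)
    Y = retract(rng.standard_normal((m * d, p)), m, d)
    for _ in range(steps):
        egrad = (C + C.T) @ Y                     # Euclidean gradient of Trace(C YY^T)
        rgrad = project_tangent(Y, egrad, m, d)   # Riemannian gradient
        Y = retract(Y - eta * rgrad, m, d)        # step and retract
    return Y
```

For the synchronization example above, one would pass C = −Cmat (minus the block measurement matrix), since the talk maximizes the trace; a fixed step size is used only to keep the sketch short.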

(Plot: computation time in minutes vs. number m of rotations to synchronize, comparing SeDuMi, SDPLR and the staircase method.) All solvers return solutions of rank d. The staircase method optimizes just once, with rank d + 1.

(Plot: computation time in seconds vs. number m of rotations to synchronize: SeDuMi, SDPLR at 68 s, staircase at 3 s.)

Pros and cons.
SDPLR: deals with any SDP; handles only linear costs; penalizes constraints in the cost; is mature C code.
Our method: is restricted to block-diagonal constraints; handles any smooth cost (guarantees if convex); satisfies the constraints at all iterates; is experimental Matlab code.

That's all very well in practice, but does it work in theory? min_X f(X) such that X ⪰ 0, X_ii = I_d. min_Y g(Y) = f(YY^T) such that Y ∈ Stiefel(d, p)^m. Theorem: Let Y be a local minimizer of the nonlinear program. If Y is rank deficient and/or if p = n (Y is square), then X = YY^T is a global minimizer of the convex program.

This suggests an algorithm:
1. Set p = d + 1.
2. Compute Y_p, a Riemannian local optimizer.
3. If Y_p is rank deficient, stop.
4. Otherwise, increase p and go to 2.
This is guaranteed to return a globally optimal X. (Worst case scenario: p increases all the way to n.) A sketch of this loop follows below.
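A hedged Python sketch of that loop (the names are mine; local_optimizer stands for any Riemannian solver for the rank-p problem, e.g. the gradient-descent sketch above or Manopt's trust-regions):

```python
import numpy as np

def riemannian_staircase(C, m, d, local_optimizer, tol=1e-6):
    # Climb the rank p one step at a time; stop as soon as the local optimizer
    # is rank deficient, in which case X = YY^T is a global optimum of the SDP.
    n = m * d
    Y = None
    for p in range(d + 1, n + 1):
        Y = local_optimizer(C, m, d, p)
        if np.linalg.matrix_rank(Y, tol=tol) < p:
            break                          # rank deficient: certified optimal
    return Y @ Y.T                         # worst case: p climbed all the way to n
```

In practice, as the slides report, p = d + 1 typically already suffices.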

From local opt Y to global opt X. min_X f(X) such that X ⪰ 0, X_ii = I_d. min_Y g(Y) = f(YY^T) such that Y ∈ Stiefel(d, p)^m. X is globally optimal iff there exists S such that (KKT): X ⪰ 0 and X_ii = I_d; S ⪰ 0 and SX = 0; ∇f(X) − S is block diagonal. If Y is locally optimal, then grad g(Y) = 0 and Hess g(Y) ⪰ 0; these are the Riemannian (projected) gradient and Hessian.
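One way to test optimality numerically, consistent with the KKT conditions on this slide (a hedged construction of my own; the talk's actual certificate may differ): take the symmetrized diagonal blocks of ∇f(X)·X as multipliers, form S = ∇f(X) − symblockdiag(∇f(X)·X), and check that S ⪰ 0 and SX ≈ 0; by construction ∇f(X) − S is block diagonal.

```python
import numpy as np

def symblockdiag(M, m, d):
    # Keep only the diagonal d-by-d blocks of M, symmetrized.
    B = np.zeros_like(M)
    for i in range(m):
        blk = M[i*d:(i+1)*d, i*d:(i+1)*d]
        B[i*d:(i+1)*d, i*d:(i+1)*d] = (blk + blk.T) / 2
    return B

def certificate_check(gradf_X, X, m, d):
    # Candidate dual certificate: S = grad f(X) - symblockdiag(grad f(X) @ X).
    # Returns the smallest eigenvalue of S (want >= 0) and ||S X|| (want ~ 0).
    S = gradf_X - symblockdiag(gradf_X @ X, m, d)
    min_eig = np.linalg.eigvalsh((S + S.T) / 2).min()
    return min_eig, np.linalg.norm(S @ X)
```

For the linear case where Trace(CX) is maximized, one minimizes −Trace(CX) and passes gradf_X = −C.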

(Plot: (d + 1)st singular value of Y vs. noise level on the measurements.) Synchronization of 100 orthogonal matrices in O(d), d = 3, optimizing with p = d + 1.

Phase transition for rank recovery? It appears that even at high levels of noise, the SDP admits a rank-d solution, which solves the hard problem. How can we understand this?

A partial answer: single-cycle synchronization. For synchronization on a cycle, with measurements C_12, C_23, C_34, …, C_m1 ∈ O(d): if the product of the measurements P = C_12 C_23 ⋯ C_m1 has no eigenvalue −1, then the SDP has a rank-d solution. Proof idea: write an explicit solution and intuit a dual certificate.

Three further ideas to think about:
Robust costs work too: minimize the sum of unsquared errors with Huber regularization, fast.
Fancy rounding technique: if rank(X) > d, project to rank d and re-optimize.
Additional constraints could be handled by convex penalties (research in progress).

Will you take the stairs next time? Code available on my webpage. Or e-mail me: nicolasboumal@gmail.com

Robust synchronization: the least-unsquared-deviation approach (LUD). min over R_1, …, R_m of Σ_{i≠j} ‖C_ij − R_i R_j^T‖_F such that R_i R_i^T = I_d and det R_i = 1. Wang & Singer 2013: Exact and stable recovery of rotations for robust synchronization.

Robust synchronization: the least-unsquared-deviation approach (LUD), lifted. min over X ⪰ 0 of Σ_{i≠j} ‖C_ij − X_ij‖_F such that X_ii = I_d (the det R_i = 1 constraint is not imposed in the relaxation). Wang & Singer 2013: Exact and stable recovery of rotations for robust synchronization. Nonlinear cost: SDPLR does not apply. The authors use an adapted ADMM.

Robust synchronization: smoothing the cost with a Huber loss ℓ_H. min over X ⪰ 0 of Σ_{i≠j} ℓ_H(‖C_ij − X_ij‖_F) such that X_ii = I_d. (Figure: plot of the Huber loss ℓ_H.)
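The exact smoothing used in the talk isn't spelled out here; as an illustration, here is one standard Huber loss and the corresponding smoothed LUD-style cost in NumPy (the names and the parameter delta are mine):

```python
import numpy as np

def huber(r, delta=0.1):
    # Huber smoothing of |r|: quadratic near zero, linear beyond delta.
    r = np.abs(r)
    return float(np.where(r <= delta, r**2 / (2 * delta), r - delta / 2))

def robust_cost(X, C, d, delta=0.1):
    # Sum over measured pairs of the Huber-smoothed residual ||C_ij - X_ij||_F.
    return sum(huber(np.linalg.norm(Cij - X[i*d:(i+1)*d, j*d:(j+1)*d]), delta)
               for (i, j), Cij in C.items())
```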

Dealing with additional constraints? For example, when searching for permutations, enforcing X_ii = I_d and X_ij doubly stochastic is useful, since if rank(X) = d, then the X_ij's are doubly stochastic and orthogonal, hence they are permutations.

There are ways to accommodate more constraints in our algorithm. For example, enforce ⟨A_i, X⟩ ≤ b_i by adding a smooth penalty such as ε log Σ_i exp((⟨A_i, X⟩ − b_i)/ε), which upper-bounds max_i (⟨A_i, X⟩ − b_i). But it adds parameters and it reduces the competitive edge of the Riemannian approach. (Research in progress.)
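A hedged sketch of such a penalty (one common choice; the constants and names are mine), computed in a numerically stable way so it can simply be added to the cost:

```python
import numpy as np

def logsumexp_penalty(X, A_list, b, eps=1e-2):
    # Smooth-max penalty for the constraints <A_i, X> <= b_i:
    # eps * log(sum_i exp((<A_i, X> - b_i) / eps)) >= max_i (<A_i, X> - b_i).
    s = np.array([np.sum(Ai * X) - bi for Ai, bi in zip(A_list, b)]) / eps
    smax = s.max()
    return eps * (smax + np.log(np.exp(s - smax).sum()))
```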