Dimensionality reduction of SDPs through sketching


Technische Universität München
Workshop on "Probabilistic techniques and Quantum Information Theory", Institut Henri Poincaré
Joint work with Andreas Bluhm, arXiv:1707.09863

Semidefinite Programs (SDPs)

Semidefinite programs are constrained optimization problems of the form:

maximize    $\mathrm{tr}(AX)$
subject to  $\mathrm{tr}(B_i X) \le \gamma_i, \quad i \in [m],$
            $X \succeq 0,$

where $A, B_1, \ldots, B_m \in M_D^{\mathrm{sym}}$ are symmetric matrices and $\gamma_1, \ldots, \gamma_m \in \mathbb{R}$. They can be seen as a generalization of linear programs and have many applications throughout QIT. They can be written in many equivalent forms; we will call this one the sketchable SDP. Any SDP can be formulated in this form.
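
As a concrete illustration (my own minimal sketch, not from the talk, assuming the CVXPY library and random placeholder data), a sketchable SDP can be set up and solved as follows; the constraint $\mathrm{tr}(X) \le 1$, i.e. $B_m = I$, $\gamma_m = 1$, is included only to keep the toy problem bounded:

```python
import numpy as np
import cvxpy as cp

D, m = 30, 5
rng = np.random.default_rng(0)

def rand_sym(D):
    M = rng.standard_normal((D, D))
    return (M + M.T) / 2

A = rand_sym(D)
# Include tr(X) <= 1 (B_m = I, gamma_m = 1) so the toy problem is bounded.
Bs = [rand_sym(D) for _ in range(m - 1)] + [np.eye(D)]
gammas = np.ones(m)

# maximize tr(AX)  s.t.  tr(B_i X) <= gamma_i,  X PSD.
X = cp.Variable((D, D), PSD=True)
cons = [cp.trace(B @ X) <= g for B, g in zip(Bs, gammas)]
prob = cp.Problem(cp.Maximize(cp.trace(A @ X)), cons)
prob.solve()
print("value of the sketchable SDP:", prob.value)
```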

Semidefinite Programs (SDPs)

Good news: in most cases they can be solved in polynomial time! Using the ellipsoid method we can solve them in $O(\max\{m, D^2\}\, D^6 \log(1/\zeta))$ time, where $\zeta$ is the error tolerance.

Bad news: the scaling is still prohibitive for high-dimensional problems, especially when it comes to memory. Try running an SDP with $D \approx 10^3$ on your laptop and you will already run out of memory. We need techniques to solve larger problems, ideally using available solvers.
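
To put numbers to this (an illustrative back-of-the-envelope calculation, not from the talk): a single dense $D \times D$ matrix at $D = 10^4$ already takes $8D^2$ bytes $= 800$ MB in double precision, and when $m \sim D^2$ an interior-point solver must additionally form an $m \times m$ Schur complement system, i.e. roughly $D^2 \times D^2$; at $D = 10^3$ that is $10^{12}$ entries, or about 8 TB.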

Sketch of the Idea

Apply a positive linear map $\Phi : M_D \to M_d$ to the constraints such that
$\mathrm{tr}(\Phi(B_i)\Phi(X^*)) \approx \mathrm{tr}(B_i X^*)$
holds with high probability, where $X^*$ is an optimal point of the SDP. Then solve the SDP defined by the $\Phi(B_i)$ and show that its value is not far from the value of the original problem. If $d \ll D$ and computing $\Phi(B_i)$ is cheap, this gives a computational advantage.

Not all SDPs can be sketched

Theorem (Not all SDPs can be sketched). Let $\Phi : M_{2D} \to \mathbb{R}^d$ be a random linear map such that, for all sketchable SDPs, there exists an algorithm which allows us to estimate the value of the SDP up to a constant factor $1 \le \tau < 2/\sqrt{3}$ given the sketch $\{\Phi(A), \Phi(B_1), \ldots, \Phi(B_m)\}$ with probability at least $9/10$. Then $d = \Omega(D^2)$.

Johnson-Lindenstrauss transforms

Definition (Johnson-Lindenstrauss transform). A random matrix $S \in M_{d,D}$ is a Johnson-Lindenstrauss transform (JLT) with parameters $(\epsilon, \delta, k)$ if, with probability at least $1 - \delta$, for any $k$-element subset $V \subset \mathbb{K}^D$ and all $v, w \in V$ it holds that
$|\langle Sv, Sw \rangle - \langle v, w \rangle| \le \epsilon \|v\|_2 \|w\|_2.$

Example: $S = \frac{1}{\sqrt{d}} R \in M_{d,D}$, where the entries of $R$ are i.i.d. standard Gaussian random variables. If $d = \Omega(\epsilon^{-2} \log(k \delta^{-1}))$, then $S$ is an $(\epsilon, \delta, k)$-JLT.
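
A quick numerical sanity check of the Gaussian example (my own illustration, assuming NumPy): the pairwise inner products of a few fixed vectors survive the projection up to a small additive error.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d, k = 2000, 200, 10

S = rng.standard_normal((d, D)) / np.sqrt(d)   # S = R / sqrt(d)
V = rng.standard_normal((k, D))                # k fixed vectors in R^D

# Compare <Sv, Sw> with <v, w> for all pairs; the error should be of
# order eps * ||v||_2 * ||w||_2 with eps roughly sqrt(log(k)/d).
G_true = V @ V.T
SV = V @ S.T                                   # rows are the sketched vectors Sv
G_sketch = SV @ SV.T
norms = np.linalg.norm(V, axis=1)
rel_err = np.abs(G_sketch - G_true) / np.outer(norms, norms)
print("max relative error:", rel_err.max())
```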

Sketching the HS scalar product

Lemma (Sketching the Hilbert-Schmidt scalar product). Let $B_1, \ldots, B_m \in M_D$ and let $S \in M_{d,D}$ be an $(\epsilon, \delta, k)$-JLT with $\epsilon \le 1$ and $k \ge \sum_{i=1}^m \mathrm{rank}(B_i)$. Then, with probability at least $1 - \delta$, for all $i, j \in [m]$:
$|\mathrm{tr}(S B_i S^T\, S B_j S^T) - \mathrm{tr}(B_i B_j)| \le 3\epsilon \|B_i\|_1 \|B_j\|_1.$
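
To see the lemma in action, here is a small numerical illustration of my own (assuming NumPy): we compare $\mathrm{tr}(S B_i S^T\, S B_j S^T)$ with $\mathrm{tr}(B_i B_j)$ for random low-rank symmetric matrices and report the scale of the error bound.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d, r = 500, 120, 3   # ambient dimension, sketch size, rank of each B_i

def rand_low_rank_sym(D, r):
    U = rng.standard_normal((D, r))
    return U @ U.T / r

B1, B2 = rand_low_rank_sym(D, r), rand_low_rank_sym(D, r)
S = rng.standard_normal((d, D)) / np.sqrt(d)    # Gaussian JLT

exact = np.trace(B1 @ B2)
SB1, SB2 = S @ B1 @ S.T, S @ B2 @ S.T
sketched = np.trace(SB1 @ SB2)

# The lemma bounds the error by 3 * eps * ||B1||_1 * ||B2||_1 (trace norms).
tnorm = lambda M: np.abs(np.linalg.eigvalsh(M)).sum()
print("exact:", exact, "sketched:", sketched,
      "bound scale ||B1||_1*||B2||_1:", tnorm(B1) * tnorm(B2))
```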

Bad scaling

$|\mathrm{tr}(S B_i S^T\, S B_j S^T) - \mathrm{tr}(B_i B_j)| \le 3\epsilon \|B_i\|_1 \|B_j\|_1$

Scaling with the trace norm $\|\cdot\|_1$ is undesirable; a normal JLT gives scaling with the Hilbert-Schmidt norm $\|\cdot\|_2$. The proof of the inequality is admittedly crude. Can we improve it?

No Johnson-Lindenstrauss with positive maps

Theorem (No Johnson-Lindenstrauss with positive maps). Let $\Phi : M_D \to M_d$ be a random positive map such that, with strictly positive probability, for all $Y_1, \ldots, Y_{D+1} \in M_D$ and $0 < \epsilon < \frac{1}{4}$ we have
$|\mathrm{tr}(\Phi(Y_i)^T \Phi(Y_j)) - \mathrm{tr}(Y_i^T Y_j)| \le \epsilon \|Y_i\|_2 \|Y_j\|_2.$
Then $d = \Omega(D)$.

The Algorithm

Assumptions: uniform bounds on $\|A\|_1, \|B_1\|_1, \ldots, \|B_m\|_1$ and $\|X^*\|_1$, where $X^*$ is an optimal point of the SDP, plus standard regularity assumptions on the SDP.

Consider the sketchable SDP of dimension $D$:

maximize    $\mathrm{tr}(AX)$
subject to  $\mathrm{tr}(B_i X) \le \gamma_i, \quad i \in [m],$
            $X \succeq 0.$

The Algorithm

Now pick an $(\epsilon, \delta, k)$-JL transform $S \in M_{d,D}$, where
$k \ge \mathrm{rank}(X^*) + \mathrm{rank}(A) + \sum_{i=1}^m \mathrm{rank}(B_i),$
and consider the SDP of dimension $d$:

maximize    $\mathrm{tr}(S A S^T Y)$
subject to  $\mathrm{tr}(S B_i S^T Y) \le \gamma_i, \quad i \in [m],$
            $Y \succeq 0.$

The Algorithm

Relax the constraints:

maximize    $\mathrm{tr}(S A S^T Y)$
subject to  $\mathrm{tr}(S B_i S^T Y) \le \gamma_i + 3\epsilon \|B_i\|_1 \|X^*\|_1, \quad i \in [m],$
            $Y \succeq 0.$

Call this SDP the sketched SDP, and solve it!

The Algorithm

The bound on the HS scalar product gives that $S X^* S^T$ is a feasible point of the relaxed problem with probability at least $1 - \delta$, and that
$|\mathrm{tr}(S A S^T\, S X^* S^T) - \mathrm{tr}(A X^*)| \le 3\epsilon \|X^*\|_1 \|A\|_1.$
But $\mathrm{tr}(A X^*) = \alpha$ is the value of the sketchable SDP! We therefore obtain
$\alpha_S + 3\epsilon \|X^*\|_1 \|A\|_1 \ge \alpha,$
where $\alpha_S$ is the value of the sketched SDP.
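
Putting the pieces together, here is a minimal end-to-end sketch of the algorithm (my own illustration, assuming CVXPY with its default SDP solver; the data are random low-rank matrices, and the bound $\|X^*\|_1 \le \eta = 1$ is enforced by including the sketchable constraint $\mathrm{tr}(X) \le 1$):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
D, d, m, eps = 100, 30, 4, 0.1
eta = 1.0                       # assumed bound on ||X*||_1, via tr(X) <= 1

def rand_low_rank_sym(D, r=3):
    U = rng.standard_normal((D, r))
    return U @ U.T / r

tnorm = lambda M: np.abs(np.linalg.eigvalsh(M)).sum()   # trace norm

A = rand_low_rank_sym(D)
# Last constraint, B_m = I with gamma_m = 1, enforces tr(X) <= 1.
Bs = [rand_low_rank_sym(D) for _ in range(m - 1)] + [np.eye(D)]
gammas = np.ones(m)

# Sketch the data with a Gaussian JLT and relax each constraint.
S = rng.standard_normal((d, D)) / np.sqrt(d)
A_s = S @ A @ S.T
Y = cp.Variable((d, d), PSD=True)
cons = [cp.trace((S @ B @ S.T) @ Y) <= g + 3 * eps * tnorm(B) * eta
        for B, g in zip(Bs, gammas)]
sketched = cp.Problem(cp.Maximize(cp.trace(A_s @ Y)), cons)
sketched.solve()
print("sketched value alpha_S:", sketched.value)
# Guarantee: alpha_S + 3*eps*eta*||A||_1 >= alpha  w.p. >= 1 - delta.
```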

Upper bound through the sketch

Theorem. Let $A, B_1, \ldots, B_m \in M_D^{\mathrm{sym}}$, $\eta, \gamma_1, \ldots, \gamma_m \in \mathbb{R}$ and $\epsilon > 0$. Denote by $\alpha$ the value of the sketchable SDP and assume it is attained at an optimal point $X^*$ which satisfies $\mathrm{tr}(X^*) \le \eta$. Moreover, let $S \in M_{d,D}$ be an $(\epsilon, \delta, k)$-JLT with $k \ge \mathrm{rank}(X^*) + \mathrm{rank}(A) + \sum_{i=1}^m \mathrm{rank}(B_i)$, and let $\alpha_S$ be the value of the sketched SDP defined by $A$, the $B_i$ and $S$. Then
$\alpha_S + 3\epsilon\eta \|A\|_1 \ge \alpha$
with probability at least $1 - \delta$.

Lower Bound

Can it be the case that $\alpha_S \gg \alpha$? That depends on how stable your SDP is!

Lower Bound

Let $Y^*$ be an optimal point of the sketched SDP, that is, a solution of:

maximize    $\mathrm{tr}(S A S^T Y)$
subject to  $\mathrm{tr}(S B_i S^T Y) \le \gamma_i + 3\epsilon \|B_i\|_1 \|X^*\|_1, \quad i \in [m],$
            $Y \succeq 0.$

Lower Bound

By the cyclicity of the trace, $S^T Y^* S$ is a feasible point of

maximize    $\mathrm{tr}(AX)$
subject to  $\mathrm{tr}(B_i X) \le \gamma_i + 3\epsilon \|B_i\|_1 \|X^*\|_1, \quad i \in [m],$
            $X \succeq 0$

with value $\alpha_S$. This is just a perturbed version of the original SDP!

Lower bound for positive $\gamma_i$

Theorem (Lower bound in terms of $\alpha_S$). For a sketchable SDP with $\gamma_i = 1$ and $\kappa = \max_{i \in [m]} \|B_i\|_1$, we have that
$\frac{\alpha_S}{1 + \nu} \le \alpha,$
where $\nu = 3\epsilon\eta\kappa$. Moreover, denoting by $X_S^*$ an optimal point of the sketched SDP, $\frac{1}{1+\nu}\, S^T X_S^* S$ is a feasible point of the sketchable SDP that attains this lower bound.
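
The rounding step of the theorem is equally short in code. This continues the end-to-end snippet above (reusing Y, S, A, Bs, eps, eta and tnorm from there; all $\gamma_i = 1$, as the theorem requires):

```python
# Continuing the sketching example above: round the sketched optimum
# back to a feasible point of the original D-dimensional SDP.
kappa = max(tnorm(B) for B in Bs)
nu = 3 * eps * eta * kappa
X_feas = (S.T @ Y.value @ S) / (1 + nu)

# X_feas is PSD, satisfies tr(B_i X_feas) <= 1 by construction, and its
# value equals alpha_S / (1 + nu), certifying the lower bound on alpha.
print("certified lower bound:", np.trace(A @ X_feas))
```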

Summary

Theorem. For a sketchable SDP with $\gamma_i = 1$ and $\kappa = \max_{i \in [m]} \|B_i\|_1$, we have
$\frac{\alpha_S}{1 + \nu} \le \alpha \le \alpha_S + 3\epsilon\eta\|A\|_1,$
where $\nu = 3\epsilon\eta\kappa$.

Complexity and Memory Considerations

Assuming $\|A\|_1, \|B_1\|_1, \ldots, \|B_m\|_1, \|X^*\|_1 = O(1)$ and $\epsilon, \delta, \zeta$ fixed, we obtain:

Theorem. Let $A, B_1, \ldots, B_m \in M_D^{\mathrm{sym}}$ of a sketchable SDP be given, and let $\mathrm{SDP}(m, d)$ be the complexity of solving a sketchable SDP of dimension $d$ with $m$ constraints up to some given precision. Then $O(D^2 m \log k) + \mathrm{SDP}(m, O(\log k))$ operations suffice to generate and solve the sketched SDP, where $k \le (m+2)D^2$ is defined as before.

Complexity and Memory Considerations

Assuming $\epsilon, \delta$ fixed, sketching gives a speedup as long as the complexity of solving the SDP directly is $\Omega(m D^{2+\mu})$ for some $\mu > 0$. Moreover, we only need to store $O(m \epsilon^{-4} \log(mk/\delta)^2)$ entries to solve the sketched problem.
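
For a feel of the savings (an illustrative computation of my own, not from the talk): with $\epsilon = 0.1$, $\delta = 0.01$ and $k = 10^6$, the sketch dimension is $d = O(\epsilon^{-2}\log(k/\delta)) \approx 100 \cdot 18 \approx 2 \cdot 10^3$, so the $m$ sketched constraint matrices occupy $O(m d^2)$ entries independently of $D$, whereas the original data occupies $m D^2$ entries, which at $D = 10^5$ is larger by several orders of magnitude.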

Uncertainty Relations

Given observables $A, B \in M_D^{\mathrm{sym}}$, consider uncertainty relations of the form
$\mathrm{tr}(A^2 \rho) + \mathrm{tr}(B^2 \rho) \ge c$
for all states $\rho$ such that $\mathrm{tr}(A\rho) \in (a - \epsilon, a + \epsilon)$ and $\mathrm{tr}(B\rho) \in (b - \epsilon, b + \epsilon)$. Finding the optimal $c$ can easily be cast as an SDP.

Uncertainty Relations

minimize    $\mathrm{tr}((A^2 + B^2) X)$
subject to  $\mathrm{tr}(AX) \in a \pm \epsilon,$
            $\mathrm{tr}(BX) \in b \pm \epsilon,$
            $\mathrm{tr}(X) = 1,$
            $X \succeq 0.$

We can't handle $\mathrm{tr}(X) = 1$ as a constraint, so we relax the problem and drop it.

Uncertainty Relations

minimize    $\mathrm{tr}((A^2 + B^2) X)$
subject to  $\mathrm{tr}(AX) \in a \pm \epsilon,$
            $\mathrm{tr}(BX) \in b \pm \epsilon,$
            $X \succeq 0.$

If $\|A\|_1, \|B\|_1 = O(1)$ and their nonzero spectra are flat, we can show $\|X^*\|_1 = O(1)$. Example: $A, B$ of fixed rank with nonzero spectrum contained in some compact interval.
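
As a concrete illustration, here is a minimal sketch of the relaxed uncertainty-relation SDP (my own example, assuming CVXPY; the helper rand_obs, the spectrum interval $[1, 2]$ and the targets a, b are placeholder choices satisfying the fixed-rank, flat-spectrum condition above):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
D, r = 60, 2
eps_c = 0.05                          # width of the expectation windows

def rand_obs(D, r):
    # Fixed-rank symmetric observable with spectrum in a compact interval.
    U, _ = np.linalg.qr(rng.standard_normal((D, r)))
    spec = 1 + rng.random(r)          # eigenvalues in [1, 2]
    return U @ np.diag(spec) @ U.T

A, B = rand_obs(D, r), rand_obs(D, r)
a, b = 0.3, 0.4                       # target expectation values

X = cp.Variable((D, D), PSD=True)
cons = [cp.trace(A @ X) >= a - eps_c, cp.trace(A @ X) <= a + eps_c,
        cp.trace(B @ X) >= b - eps_c, cp.trace(B @ X) <= b + eps_c]
# Relaxed problem: the constraint tr(X) = 1 has been dropped.
prob = cp.Problem(cp.Minimize(cp.trace((A @ A + B @ B) @ X)), cons)
prob.solve()
print("optimal c:", prob.value)
```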

Numerical Results

  D     d    Value    Error L.B.   M.R.T. sketchable [s]   M.R.T. sketch [s]
 200    50   0.0928   0.0429        6.73                    0.663
 200   100   0.0897   0.0401        6.51                    1.336
 500   100   0.0353   0.0181       96.5                     1.35
 500   200   0.0364   0.0152       96.4                     6.81

Table: For each combination of the sketchable dimension ($D$) and the dimension of the sketch ($d$), we generated 40 instances of the uncertainty relation SDP. M.R.T. stands for mean running time, L.B. for the lower bound obtained from the sketch, and Value for the optimal value of the sketchable SDP.

Thanks!