Dimensionality reduction of SDPs through sketching


Technische Universität München
Workshop on "Probabilistic techniques and Quantum Information Theory", Institut Henri Poincaré
Joint work with Andreas Bluhm, arXiv:1707.09863

Semidefinite Programs (SDPs)

Semidefinite programs are constrained optimization problems of the form:

maximize    $\mathrm{tr}(AX)$
subject to  $\mathrm{tr}(B_i X) \le \gamma_i, \quad i \in [m],$
            $X \succeq 0,$

where $A, B_1, \ldots, B_m \in M_D^{\mathrm{sym}}$ are symmetric matrices and $\gamma_1, \ldots, \gamma_m \in \mathbb{R}$. They can be seen as a generalization of linear programs and have many applications throughout QIT. They can be written in many equivalent forms; we will call this one the sketchable SDP. Any SDP can be formulated in this form.
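
As a concrete illustration (my own minimal sketch, not from the talk, assuming the CVXPY library and random placeholder data), a sketchable SDP can be set up and solved as follows; the constraint $\mathrm{tr}(X) \le 1$, i.e. $B_m = I$, $\gamma_m = 1$, is included only to keep the toy problem bounded:

```python
import numpy as np
import cvxpy as cp

D, m = 30, 5
rng = np.random.default_rng(0)

def rand_sym(D):
    M = rng.standard_normal((D, D))
    return (M + M.T) / 2

A = rand_sym(D)
# Include tr(X) <= 1 (B_m = I, gamma_m = 1) so the toy problem is bounded.
Bs = [rand_sym(D) for _ in range(m - 1)] + [np.eye(D)]
gammas = np.ones(m)

# maximize tr(AX)  s.t.  tr(B_i X) <= gamma_i,  X PSD.
X = cp.Variable((D, D), PSD=True)
cons = [cp.trace(B @ X) <= g for B, g in zip(Bs, gammas)]
prob = cp.Problem(cp.Maximize(cp.trace(A @ X)), cons)
prob.solve()
print("value of the sketchable SDP:", prob.value)
```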

Semidefinite Programs (SDPs)

Good news: in most cases they can be solved in polynomial time! Using the ellipsoid method we can solve them in $O(\max\{m, D^2\}\, D^6 \log(1/\zeta))$ time, where $\zeta$ is the error tolerance.

Bad news: the scaling is still prohibitive for high-dimensional problems, especially when it comes to memory. Try running an SDP with $D \approx 10^3$ on your laptop and you will already run out of memory. We need techniques to solve larger problems, ideally using available solvers.
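
To put numbers to this (an illustrative back-of-the-envelope calculation, not from the talk): a single dense $D \times D$ matrix at $D = 10^4$ already takes $8D^2$ bytes $= 800$ MB in double precision, and when $m \sim D^2$ an interior-point solver must additionally form an $m \times m$ Schur complement system, i.e. roughly $D^2 \times D^2$; at $D = 10^3$ that is $10^{12}$ entries, or about 8 TB.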

Sketch of the Idea

Apply a positive linear map $\Phi : M_D \to M_d$ to the constraints such that
$\mathrm{tr}(\Phi(B_i)\Phi(X^*)) \approx \mathrm{tr}(B_i X^*)$
holds with high probability, where $X^*$ is an optimal point of the SDP. Then solve the SDP defined by the $\Phi(B_i)$ and show that its value is not far from the value of the original problem. If $d \ll D$ and computing $\Phi(B_i)$ is cheap, this gives a computational advantage.

Not all SDPs can be sketched

Theorem (Not all SDPs can be sketched). Let $\Phi : M_{2D} \to \mathbb{R}^d$ be a random linear map such that, for all sketchable SDPs, there exists an algorithm which allows us to estimate the value of the SDP up to a constant factor $1 \le \tau < 2/\sqrt{3}$ given the sketch $\{\Phi(A), \Phi(B_1), \ldots, \Phi(B_m)\}$ with probability at least $9/10$. Then $d = \Omega(D^2)$.

Johnson-Lindenstrauss transforms

Definition (Johnson-Lindenstrauss transform). A random matrix $S \in M_{d,D}$ is a Johnson-Lindenstrauss transform (JLT) with parameters $(\epsilon, \delta, k)$ if, with probability at least $1 - \delta$, for any $k$-element subset $V \subset \mathbb{K}^D$ and all $v, w \in V$ it holds that
$|\langle Sv, Sw \rangle - \langle v, w \rangle| \le \epsilon \|v\|_2 \|w\|_2.$

Example: $S = \frac{1}{\sqrt{d}} R \in M_{d,D}$, where the entries of $R$ are i.i.d. standard Gaussian random variables. If $d = \Omega(\epsilon^{-2} \log(k \delta^{-1}))$, then $S$ is an $(\epsilon, \delta, k)$-JLT.
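
A quick numerical sanity check of the Gaussian example (my own illustration, assuming NumPy): the pairwise inner products of a few fixed vectors survive the projection up to a small additive error.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d, k = 2000, 200, 10

S = rng.standard_normal((d, D)) / np.sqrt(d)   # S = R / sqrt(d)
V = rng.standard_normal((k, D))                # k fixed vectors in R^D

# Compare <Sv, Sw> with <v, w> for all pairs; the error should be of
# order eps * ||v||_2 * ||w||_2 with eps roughly sqrt(log(k)/d).
G_true = V @ V.T
SV = V @ S.T                                   # rows are the sketched vectors Sv
G_sketch = SV @ SV.T
norms = np.linalg.norm(V, axis=1)
rel_err = np.abs(G_sketch - G_true) / np.outer(norms, norms)
print("max relative error:", rel_err.max())
```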

Sketching the HS scalar product

Lemma (Sketching the Hilbert-Schmidt scalar product). Let $B_1, \ldots, B_m \in M_D$ and let $S \in M_{d,D}$ be an $(\epsilon, \delta, k)$-JLT with $\epsilon \le 1$ and $k \ge \sum_{i=1}^m \mathrm{rank}(B_i)$. Then, with probability at least $1 - \delta$, for all $i, j \in [m]$:
$|\mathrm{tr}(S B_i S^T\, S B_j S^T) - \mathrm{tr}(B_i B_j)| \le 3\epsilon \|B_i\|_1 \|B_j\|_1.$
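
To see the lemma in action, here is a small numerical illustration of my own (assuming NumPy): we compare $\mathrm{tr}(S B_i S^T\, S B_j S^T)$ with $\mathrm{tr}(B_i B_j)$ for random low-rank symmetric matrices and report the scale of the error bound.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d, r = 500, 120, 3   # ambient dimension, sketch size, rank of each B_i

def rand_low_rank_sym(D, r):
    U = rng.standard_normal((D, r))
    return U @ U.T / r

B1, B2 = rand_low_rank_sym(D, r), rand_low_rank_sym(D, r)
S = rng.standard_normal((d, D)) / np.sqrt(d)    # Gaussian JLT

exact = np.trace(B1 @ B2)
SB1, SB2 = S @ B1 @ S.T, S @ B2 @ S.T
sketched = np.trace(SB1 @ SB2)

# The lemma bounds the error by 3 * eps * ||B1||_1 * ||B2||_1 (trace norms).
tnorm = lambda M: np.abs(np.linalg.eigvalsh(M)).sum()
print("exact:", exact, "sketched:", sketched,
      "bound scale ||B1||_1*||B2||_1:", tnorm(B1) * tnorm(B2))
```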

Bad scaling

$|\mathrm{tr}(S B_i S^T\, S B_j S^T) - \mathrm{tr}(B_i B_j)| \le 3\epsilon \|B_i\|_1 \|B_j\|_1$

Scaling with the trace norm $\|\cdot\|_1$ is undesirable; a normal JLT gives scaling with the Hilbert-Schmidt norm $\|\cdot\|_2$. The proof of the inequality is admittedly crude. Can we improve it?

No Johnson-Lindenstrauss with positive maps

Theorem (No Johnson-Lindenstrauss with positive maps). Let $\Phi : M_D \to M_d$ be a random positive map such that, with strictly positive probability, for all $Y_1, \ldots, Y_{D+1} \in M_D$ and $0 < \epsilon < \frac{1}{4}$ we have
$|\mathrm{tr}(\Phi(Y_i)^T \Phi(Y_j)) - \mathrm{tr}(Y_i^T Y_j)| \le \epsilon \|Y_i\|_2 \|Y_j\|_2.$
Then $d = \Omega(D)$.

The Algorithm

Assumptions: uniform bounds on $\|A\|_1, \|B_1\|_1, \ldots, \|B_m\|_1$ and $\|X^*\|_1$, where $X^*$ is an optimal point of the SDP, plus standard regularity assumptions on the SDP.

Consider the sketchable SDP of dimension $D$:

maximize    $\mathrm{tr}(AX)$
subject to  $\mathrm{tr}(B_i X) \le \gamma_i, \quad i \in [m],$
            $X \succeq 0.$

The Algorithm

Now pick an $(\epsilon, \delta, k)$-JL transform $S \in M_{d,D}$, where
$k \ge \mathrm{rank}(X^*) + \mathrm{rank}(A) + \sum_{i=1}^m \mathrm{rank}(B_i),$
and consider the SDP of dimension $d$:

maximize    $\mathrm{tr}(S A S^T Y)$
subject to  $\mathrm{tr}(S B_i S^T Y) \le \gamma_i, \quad i \in [m],$
            $Y \succeq 0.$

The Algorithm

Relax the constraints:

maximize    $\mathrm{tr}(S A S^T Y)$
subject to  $\mathrm{tr}(S B_i S^T Y) \le \gamma_i + 3\epsilon \|B_i\|_1 \|X^*\|_1, \quad i \in [m],$
            $Y \succeq 0.$

Call this SDP the sketched SDP, and solve it!

The Algorithm

The bound on the HS scalar product gives that $S X^* S^T$ is a feasible point of the relaxed problem with probability at least $1 - \delta$, and that
$|\mathrm{tr}(S A S^T\, S X^* S^T) - \mathrm{tr}(A X^*)| \le 3\epsilon \|X^*\|_1 \|A\|_1.$
But $\mathrm{tr}(A X^*) = \alpha$ is the value of the sketchable SDP! We therefore obtain
$\alpha_S + 3\epsilon \|X^*\|_1 \|A\|_1 \ge \alpha,$
where $\alpha_S$ is the value of the sketched SDP.
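
Putting the pieces together, here is a minimal end-to-end sketch of the algorithm (my own illustration, assuming CVXPY with its default SDP solver; the data are random low-rank matrices, and the bound $\|X^*\|_1 \le \eta = 1$ is enforced by including the sketchable constraint $\mathrm{tr}(X) \le 1$):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
D, d, m, eps = 100, 30, 4, 0.1
eta = 1.0                       # assumed bound on ||X*||_1, via tr(X) <= 1

def rand_low_rank_sym(D, r=3):
    U = rng.standard_normal((D, r))
    return U @ U.T / r

tnorm = lambda M: np.abs(np.linalg.eigvalsh(M)).sum()   # trace norm

A = rand_low_rank_sym(D)
# Last constraint, B_m = I with gamma_m = 1, enforces tr(X) <= 1.
Bs = [rand_low_rank_sym(D) for _ in range(m - 1)] + [np.eye(D)]
gammas = np.ones(m)

# Sketch the data with a Gaussian JLT and relax each constraint.
S = rng.standard_normal((d, D)) / np.sqrt(d)
A_s = S @ A @ S.T
Y = cp.Variable((d, d), PSD=True)
cons = [cp.trace((S @ B @ S.T) @ Y) <= g + 3 * eps * tnorm(B) * eta
        for B, g in zip(Bs, gammas)]
sketched = cp.Problem(cp.Maximize(cp.trace(A_s @ Y)), cons)
sketched.solve()
print("sketched value alpha_S:", sketched.value)
# Guarantee: alpha_S + 3*eps*eta*||A||_1 >= alpha  w.p. >= 1 - delta.
```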

Upper bound through the sketch

Theorem. Let $A, B_1, \ldots, B_m \in M_D^{\mathrm{sym}}$, $\eta, \gamma_1, \ldots, \gamma_m \in \mathbb{R}$ and $\epsilon > 0$. Denote by $\alpha$ the value of the sketchable SDP and assume it is attained at an optimal point $X^*$ which satisfies $\mathrm{tr}(X^*) \le \eta$. Moreover, let $S \in M_{d,D}$ be an $(\epsilon, \delta, k)$-JLT with $k \ge \mathrm{rank}(X^*) + \mathrm{rank}(A) + \sum_{i=1}^m \mathrm{rank}(B_i)$, and let $\alpha_S$ be the value of the sketched SDP defined by $A$, the $B_i$ and $S$. Then
$\alpha_S + 3\epsilon\eta \|A\|_1 \ge \alpha$
with probability at least $1 - \delta$.

Lower Bound

Can it be the case that $\alpha_S \gg \alpha$? That depends on how stable your SDP is!

Lower Bound

Let $Y^*$ be an optimal point of the sketched SDP, that is, a solution of:

maximize    $\mathrm{tr}(S A S^T Y)$
subject to  $\mathrm{tr}(S B_i S^T Y) \le \gamma_i + 3\epsilon \|B_i\|_1 \|X^*\|_1, \quad i \in [m],$
            $Y \succeq 0.$

Lower Bound

By the cyclicity of the trace, $S^T Y^* S$ is a feasible point of

maximize    $\mathrm{tr}(AX)$
subject to  $\mathrm{tr}(B_i X) \le \gamma_i + 3\epsilon \|B_i\|_1 \|X^*\|_1, \quad i \in [m],$
            $X \succeq 0$

with value $\alpha_S$. This is just a perturbed version of the original SDP!

Lower bound for positive $\gamma_i$

Theorem (Lower bound in terms of $\alpha_S$). For a sketchable SDP with $\gamma_i = 1$ and $\kappa = \max_{i \in [m]} \|B_i\|_1$, we have that
$\frac{\alpha_S}{1 + \nu} \le \alpha,$
where $\nu = 3\epsilon\eta\kappa$. Moreover, denoting by $X_S^*$ an optimal point of the sketched SDP, $\frac{1}{1+\nu}\, S^T X_S^* S$ is a feasible point of the sketchable SDP that attains this lower bound.
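
The rounding step of the theorem is equally short in code. This continues the end-to-end snippet above (reusing Y, S, A, Bs, eps, eta and tnorm from there; all $\gamma_i = 1$, as the theorem requires):

```python
# Continuing the sketching example above: round the sketched optimum
# back to a feasible point of the original D-dimensional SDP.
kappa = max(tnorm(B) for B in Bs)
nu = 3 * eps * eta * kappa
X_feas = (S.T @ Y.value @ S) / (1 + nu)

# X_feas is PSD, satisfies tr(B_i X_feas) <= 1 by construction, and its
# value equals alpha_S / (1 + nu), certifying the lower bound on alpha.
print("certified lower bound:", np.trace(A @ X_feas))
```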

Summary

Theorem. For a sketchable SDP with $\gamma_i = 1$ and $\kappa = \max_{i \in [m]} \|B_i\|_1$, we have
$\frac{\alpha_S}{1 + \nu} \le \alpha \le \alpha_S + 3\epsilon\eta\|A\|_1,$
where $\nu = 3\epsilon\eta\kappa$.

Complexity and Memory Considerations

Assuming $\|A\|_1, \|B_1\|_1, \ldots, \|B_m\|_1, \|X^*\|_1 = O(1)$ and $\epsilon, \delta, \zeta$ fixed, we obtain:

Theorem. Let $A, B_1, \ldots, B_m \in M_D^{\mathrm{sym}}$ of a sketchable SDP be given, and let $\mathrm{SDP}(m, d)$ be the complexity of solving a sketchable SDP of dimension $d$ with $m$ constraints up to some given precision. Then $O(D^2 m \log k) + \mathrm{SDP}(m, O(\log k))$ operations suffice to generate and solve the sketched SDP, where $k \le (m+2)D^2$ is defined as before.

Complexity and Memory Considerations

Assuming $\epsilon, \delta$ fixed, sketching gives a speedup as long as the complexity of solving the SDP directly is $\Omega(m D^{2+\mu})$ for some $\mu > 0$. Moreover, we only need to store $O(m \epsilon^{-4} \log(mk/\delta)^2)$ entries to solve the sketched problem.
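
For a feel of the savings (an illustrative computation of my own, not from the talk): with $\epsilon = 0.1$, $\delta = 0.01$ and $k = 10^6$, the sketch dimension is $d = O(\epsilon^{-2}\log(k/\delta)) \approx 100 \cdot 18 \approx 2 \cdot 10^3$, so the $m$ sketched constraint matrices occupy $O(m d^2)$ entries independently of $D$, whereas the original data occupies $m D^2$ entries, which at $D = 10^5$ is larger by several orders of magnitude.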

Uncertainty Relations

Given observables $A, B \in M_D^{\mathrm{sym}}$, consider uncertainty relations of the form
$\mathrm{tr}(A^2 \rho) + \mathrm{tr}(B^2 \rho) \ge c$
for all states $\rho$ such that $\mathrm{tr}(A\rho) \in (a - \epsilon, a + \epsilon)$ and $\mathrm{tr}(B\rho) \in (b - \epsilon, b + \epsilon)$. Finding the optimal $c$ can easily be cast as an SDP.

Uncertainty Relations

minimize    $\mathrm{tr}((A^2 + B^2) X)$
subject to  $\mathrm{tr}(AX) \in a \pm \epsilon,$
            $\mathrm{tr}(BX) \in b \pm \epsilon,$
            $\mathrm{tr}(X) = 1,$
            $X \succeq 0.$

We can't handle $\mathrm{tr}(X) = 1$ as a constraint, so we relax the problem and drop it.

Uncertainty Relations

minimize    $\mathrm{tr}((A^2 + B^2) X)$
subject to  $\mathrm{tr}(AX) \in a \pm \epsilon,$
            $\mathrm{tr}(BX) \in b \pm \epsilon,$
            $X \succeq 0.$

If $\|A\|_1, \|B\|_1 = O(1)$ and their nonzero spectra are flat, we can show $\|X^*\|_1 = O(1)$. Example: $A, B$ of fixed rank with nonzero spectrum contained in some compact interval.
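
As a concrete illustration, here is a minimal sketch of the relaxed uncertainty-relation SDP (my own example, assuming CVXPY; the helper rand_obs, the spectrum interval $[1, 2]$ and the targets a, b are placeholder choices satisfying the fixed-rank, flat-spectrum condition above):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
D, r = 60, 2
eps_c = 0.05                          # width of the expectation windows

def rand_obs(D, r):
    # Fixed-rank symmetric observable with spectrum in a compact interval.
    U, _ = np.linalg.qr(rng.standard_normal((D, r)))
    spec = 1 + rng.random(r)          # eigenvalues in [1, 2]
    return U @ np.diag(spec) @ U.T

A, B = rand_obs(D, r), rand_obs(D, r)
a, b = 0.3, 0.4                       # target expectation values

X = cp.Variable((D, D), PSD=True)
cons = [cp.trace(A @ X) >= a - eps_c, cp.trace(A @ X) <= a + eps_c,
        cp.trace(B @ X) >= b - eps_c, cp.trace(B @ X) <= b + eps_c]
# Relaxed problem: the constraint tr(X) = 1 has been dropped.
prob = cp.Problem(cp.Minimize(cp.trace((A @ A + B @ B) @ X)), cons)
prob.solve()
print("optimal c:", prob.value)
```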

Numerical Results

  D     d    Value    Error L.B.   M.R.T. sketchable [s]   M.R.T. sketch [s]
 200    50   0.0928   0.0429        6.73                    0.663
 200   100   0.0897   0.0401        6.51                    1.336
 500   100   0.0353   0.0181       96.5                     1.35
 500   200   0.0364   0.0152       96.4                     6.81

Table: For each combination of the sketchable dimension ($D$) and the dimension of the sketch ($d$), we generated 40 instances of the uncertainty relation SDP. M.R.T. stands for mean running time, L.B. for the lower bound obtained from the sketch, and Value for the optimal value of the sketchable SDP.

Thanks!