Agenda: Applications of semidefinite programming

1. Control and system theory
2. Combinatorial and nonconvex optimization
3. Spectral estimation & super-resolution
Control and system theory

SDP is in wide use in control theory. Example: differential inclusion ($\dot x(t)$ is the time derivative)

$\dot x(t) = A x(t) + B u(t)$   (S)
$y(t) = C x(t)$
$|u_i(t)| \le |y_i(t)|$

$x(t) \in \mathbb{R}^n$, $y(t), u(t) \in \mathbb{R}^p$

Problem: find an ellipsoid $E$ such that for any $x, u$ obeying (S),

$x(0) \in E \implies x(t) \in E$ for all $t \ge 0$

Implication: if such an $E$ exists, then all solutions of the differential inclusion are bounded.
Quadratic Lyapunov function

Ellipsoid $E = \{ x : x^T P x \le 1 \}$, $P \succ 0$

Quadratic Lyapunov function $V(t) = x(t)^T P x(t)$

Claim: $E$ invariant $\iff$ $V(t)$ nonincreasing

Proof: ($\Leftarrow$) obvious. ($\Rightarrow$) we must have $\dot V \le 0$ on $\partial E$ (otherwise we leave $E$). If $\dot V(t) > 0$ with $x(t) \in \lambda\, \partial E$, then starting at $x(0) = \lambda^{-1} x(t) \in \partial E$ we would leave $E$ (the dynamics and constraints are invariant under scaling).

Hence, existence of a Lyapunov function proves stability of (S).
(i) $\dot V(t) \le 0$ $\iff$

$\begin{bmatrix} x(t) \\ u(t) \end{bmatrix}^T \begin{bmatrix} A^T P + P A & P B \\ B^T P & 0 \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} \le 0$

(ii) $|u_i(t)| \le |y_i(t)|$ $\iff$ $u_i^2(t) - y_i^2(t) \le 0$. With $y_i(t) = c_i^T x(t)$ ($c_i^T$ the $i$th row of $C$), this can be expressed as

$\begin{bmatrix} x(t) \\ u(t) \end{bmatrix}^T \begin{bmatrix} -c_i c_i^T & 0 \\ 0 & E_{ii} \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} \le 0$

where $E_{ii}$ is the matrix with all zero entries except the $(i,i)$th equal to 1.

$E$ invariant $\Leftarrow$ [(ii) $\Rightarrow$ (i)]; that is, whenever the constraint quadratics hold, $\dot V(t) \le 0$. Formally, we want: for all $z \in \mathbb{R}^{n+p}$,

$z^T T_i z \le 0$, $i = 1, \ldots, p$ $\implies$ $z^T T_0 z \le 0$

where

$T_0 = \begin{bmatrix} A^T P + P A & P B \\ B^T P & 0 \end{bmatrix}$, $T_i = \begin{bmatrix} -c_i c_i^T & 0 \\ 0 & E_{ii} \end{bmatrix}$
Obvious sufficient condition: there exist $\lambda_1, \ldots, \lambda_p \ge 0$ such that

$T_0 \preceq \lambda_1 T_1 + \ldots + \lambda_p T_p$

called the S-procedure in control (analogous to the convex relaxations seen later). With $\Lambda = \operatorname{diag}(\lambda_i)$,

$\sum_i \lambda_i T_i = \begin{bmatrix} -C^T \Lambda C & 0 \\ 0 & \Lambda \end{bmatrix}$

so the condition reads

$\begin{bmatrix} A^T P + P A + C^T \Lambda C & P B \\ B^T P & -\Lambda \end{bmatrix} \preceq 0$

By solving an SDP feasibility problem in $(P, \Lambda)$, we can certify stability (find an invariant $E$).
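As a small numerical illustration (not part of the slides), the Lyapunov machinery can be exercised in the input-free special case $B = 0$: the certificate then reduces to finding $P \succ 0$ with $A^T P + PA \prec 0$. The sketch below, assuming a hypothetical stable matrix $A$, solves the Lyapunov equation $A^T P + PA = -I$ by vectorization and checks the resulting invariant-ellipsoid certificate; a general instance with inputs would instead call an SDP solver on the LMI above.

```python
import numpy as np

# Hypothetical stable system (no input, B = 0): xdot = A x
A = np.array([[-1.0, 0.5],
              [ 0.0, -2.0]])
n = A.shape[0]
I = np.eye(n)

# Solve the Lyapunov equation A^T P + P A = -I by vectorization:
# (I (x) A^T + A^T (x) I) vec(P) = -vec(I)
K = np.kron(I, A.T) + np.kron(A.T, I)
P = np.linalg.solve(K, -I.flatten()).reshape(n, n)
P = (P + P.T) / 2  # symmetrize against round-off

# P > 0 certifies that E = {x : x^T P x <= 1} is invariant:
# Vdot = x^T (A^T P + P A) x = -||x||^2 <= 0
eigP = np.linalg.eigvalsh(P)
print(eigP.min() > 0, np.allclose(A.T @ P + P @ A, -I))
```

The Kronecker identity used is $\operatorname{vec}(A^T P + P A) = (I \otimes A^T + A^T \otimes I)\operatorname{vec}(P)$; the solution is unique (and automatically symmetric) whenever no two eigenvalues of $A$ sum to zero, which holds for any stable $A$.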
Agenda: Applications of semidefinite programming

1. Control and system theory
2. Combinatorial and nonconvex optimization
3. Spectral estimation & super-resolution
Combinatorial and nonconvex optimization

min $f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1, \ldots, m$

$f_i(x) = x^T A_i x + 2 b_i^T x + c_i$

- $A_i \succeq 0$: cvx problem
- $A_i \in S^n$ indefinite: non-cvx, very hard
Examples: A. Boolean least-squares

min $\|Ax - b\|^2$ s.t. $x_i \in \{-1, 1\}$, $i = 1, \ldots, n$

Basic problem in digital communications: MLE for digital signals.

Boolean least squares can be cast as a nonconvex QCQP:

min $x^T A^T A x - 2 (A^T b)^T x + b^T b$ s.t. $x_i^2 - 1 = 0$, $i = 1, \ldots, n$

$x_i^2 - 1 = 0 \iff \{\, x_i^2 - 1 \le 0, \ x_i^2 - 1 \ge 0 \,\}$
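To see why relaxations are needed here, note that the exact problem can only be solved by search over all $2^n$ sign patterns. A minimal sketch, on a hypothetical random instance $(A, b)$:

```python
import itertools
import numpy as np

# Toy Boolean least-squares instance (A, b chosen at random for illustration)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)

# Exhaustive search over {-1,+1}^n: exponential in n, viable only for
# tiny problems, which is exactly the motivation for SDP relaxations
best_val, best_x = np.inf, None
for signs in itertools.product([-1.0, 1.0], repeat=A.shape[1]):
    x = np.array(signs)
    val = np.sum((A @ x - b) ** 2)
    if val < best_val:
        best_val, best_x = val, x

print(best_val, best_x)
```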
Examples: B. minimum cardinality problems

min $\operatorname{card}(x)$ s.t. $Ax \le b$

$\operatorname{card}(x) = \|x\|_{\ell_0} = \#\{i : x_i \ne 0\}$

Many applications in signal processing, statistics, finance; e.g. portfolio optimization with fixed transaction costs

$z_i = 1\{x_i \ne 0\} \iff (1 - z_i) x_i = 0$, $z_i \in \{0, 1\}$

The min cardinality problem can be cast as a nonconvex QCQP in $(x, z)$:

min $\sum_i z_i$ s.t. $Ax \le b$, $(1 - z_i) x_i = 0$, $z_i^2 - z_i = 0$
Examples: C. partitioning problems

min $x^T Q x$ s.t. $x_i^2 = 1$, $i = 1, \ldots, n$, with $Q \in S^n$, $x \in \mathbb{R}^n$

A feasible $x$ gives a partition $\{1, \ldots, n\} = \{i : x_i = 1\} \cup \{i : x_i = -1\}$

Interpretation:
- $Q_{ij}$ is the cost of having $i$ and $j$ in the same partition
- $-Q_{ij}$ is the cost of having $i$ and $j$ in different partitions
- $x^T Q x$ is the total cost

Problem: find the partition with least total cost (a non-cvx QCQP)
Examples: D. MAXCUT

Graph $G = (V, E)$ with weighted edges: weight $w_{ij}$ for $(i, j) \in E$, $0$ otherwise

MAXCUT: the cut of $G$ with largest possible weight, i.e. a partition $(V_1, V_2)$ s.t. the sum of weights of edges between $V_1$ and $V_2$ is maximized

- classical problem in network optimization
- special case of the partitioning problem
Weight of a particular cut

$f_0(x) = \frac{1}{2} \sum_{i,j : x_i x_j = -1} w_{ij} = \frac{1}{4} \sum_{i,j} w_{ij} (1 - x_i x_j)$

Set

$W_{ij} = \begin{cases} w_{ij} & i \ne j \\ 0 & i = j \end{cases}$   $D_{ij} = \begin{cases} 0 & i \ne j \\ \sum_{j : j \ne i} w_{ij} & i = j \end{cases}$

MAXCUT (up to the factor $1/4$):

max $x^T (D - W) x =: x^T A x$ s.t. $x_i^2 = 1$, $i = 1, \ldots, n$
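The identity above is easy to verify numerically: on a hypothetical random weighted graph, the cut weight counted edge by edge agrees with the quadratic form $\frac{1}{4} x^T (D - W) x$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Random symmetric nonnegative weights with zero diagonal (toy graph)
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
D = np.diag(W.sum(axis=1))

x = rng.choice([-1.0, 1.0], size=n)  # an arbitrary partition

# Cut weight counted directly: w_ij over unordered pairs on opposite sides
direct = sum(W[i, j] for i in range(n) for j in range(i + 1, n)
             if x[i] * x[j] == -1)
# Quadratic-form expression: (1/4) x^T (D - W) x
quadratic = 0.25 * x @ (D - W) @ x
print(np.isclose(direct, quadratic))
```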
Examples: E. polynomial problems

min $p_0(x)$ s.t. $p_i(x) \le 0$, $i = 1, \ldots, m$

More complex than QCQP? No! All polynomial problems can be cast as QCQPs. E.g.

min $x^3 - 2xyz + y + z$ s.t. $x^2 + y^2 + z^2 - 1 \le 0$

With new variables $u = x^2$, $v = yz$:

min $xu - 2xv + y + z$ s.t. $x^2 + y^2 + z^2 - 1 \le 0$, $u - x^2 = 0$, $v - yz = 0$
Two tricks:

(i) can reduce the maximum degree: a term $y^{2n}$ becomes

$\{\, u^n + (\ldots) \le 0, \ u = y^2 \,\}$ in place of $y^{2n} + (\ldots) \le 0$

(ii) can eliminate product terms: a term $xyz$ becomes

$\{\, ux + (\ldots) \le 0, \ u = yz \,\}$ in place of $xyz + (\ldots) \le 0$

Applying the tricks iteratively gives a reduction to a (non-cvx) QCQP
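The substitutions are exact on the feasible set, which a two-line numeric check makes concrete (using the sample cubic $x^3 - 2xyz + y + z$ and the substitutions $u = x^2$, $v = yz$):

```python
# With u = x^2 and v = y*z, the cubic objective x^3 - 2xyz + y + z
# equals the quadratic objective x*u - 2*x*v + y + z at any point
x, y, z = 0.3, -0.7, 0.5
u, v = x ** 2, y * z
original = x ** 3 - 2 * x * y * z + y + z
substituted = x * u - 2 * x * v + y + z
print(abs(original - substituted) < 1e-12)
```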
Convex relaxations

Q: How to get a lower bound on the optimal value?

(QCQP) min $x^T A_0 x + 2 b_0^T x + c_0$ s.t. $x^T A_i x + 2 b_i^T x + c_i \le 0$, $i = 1, \ldots, m$

Semidefinite relaxation: since $x^T A_i x = \operatorname{trace}(x^T A_i x) = \operatorname{trace}(A_i x x^T)$, (QCQP) is equivalent to

min $\operatorname{trace}(A_0 X) + 2 b_0^T x + c_0$ s.t. $\operatorname{trace}(A_i X) + 2 b_i^T x + c_i \le 0$, $i = 1, \ldots, m$, $X = x x^T$

Relax the non-cvx constraint $X = x x^T$ by considering $X \succeq x x^T$:

$X \succeq x x^T \iff \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0$ (Schur complement)
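The Schur-complement equivalence can be checked numerically: any $X$ of the form $xx^T$ plus a PSD slack satisfies $X \succeq xx^T$, and the corresponding block matrix is then PSD as well (a sketch on random data, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
x = rng.standard_normal(n)

# Build X = x x^T + (PSD slack), so that X >= x x^T by construction
S = rng.standard_normal((n, n))
X = np.outer(x, x) + S @ S.T

# Schur complement: X - x x^T >= 0  <=>  [[X, x], [x^T, 1]] >= 0
M = np.block([[X, x[:, None]],
              [x[None, :], np.ones((1, 1))]])
eig_schur = np.linalg.eigvalsh(X - np.outer(x, x))
eig_block = np.linalg.eigvalsh(M)
print(eig_schur.min() >= -1e-10, eig_block.min() >= -1e-10)
```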
Convex relaxations

SDP relaxation:

min $\operatorname{trace}(A_0 X) + 2 b_0^T x + c_0$ s.t. $\operatorname{trace}(A_i X) + 2 b_i^T x + c_i \le 0$, $i = 1, \ldots, m$, $\begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0$

Lagrangian relaxation of

min $x^T A_0 x + 2 b_0^T x + c_0$ s.t. $x^T A_i x + 2 b_i^T x + c_i \le 0$, $i = 1, \ldots, m$

Lagrangian: $L(x, \lambda) = x^T A(\lambda) x + 2 b(\lambda)^T x + c(\lambda)$ with

$A(\lambda) = A_0 + \sum_i \lambda_i A_i$, $b(\lambda) = b_0 + \sum_i \lambda_i b_i$, $c(\lambda) = c_0 + \sum_i \lambda_i c_i$
Using

$\min_x \; x^T A x + 2 b^T x + c = \begin{cases} c - b^T A^{\dagger} b & A \succeq 0, \ b \in \mathcal{R}(A) \\ -\infty & \text{otherwise} \end{cases}$

Dual function: $g(\lambda) = -b(\lambda)^T A(\lambda)^{\dagger} b(\lambda) + c(\lambda)$

Dual problem:

max $\gamma + c(\lambda)$ s.t. $\lambda_i \ge 0$, $A(\lambda) \succeq 0$, $-b(\lambda)^T A(\lambda)^{\dagger} b(\lambda) \ge \gamma$

This is an SDP!

max $\gamma + c(\lambda)$ s.t. $\lambda_i \ge 0$, $\begin{bmatrix} A(\lambda) & b(\lambda) \\ b(\lambda)^T & -\gamma \end{bmatrix} \succeq 0$
Question: Which relaxation is better?

- The two problems are dual to each other
- If strictly feasible, the bounds are the same
- Perfect duality: sometimes the cvx relaxation is exact, i.e. under some conditions OPT(D) = OPT(P)
Examples

Boolean LS: min $\|Ax - b\|^2$ s.t. $x_i^2 = 1$ relaxes to

min $\operatorname{trace}(A^T A\, X) - 2 b^T A x + b^T b$ s.t. $X_{ii} = 1$, $X \succeq x x^T$ or $\begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0$

Partitioning and MAXCUT: min $x^T W x$ s.t. $x_i^2 = 1$ relaxes to

min $\operatorname{trace}(W X)$ s.t. $X_{ii} = 1$, $X \succeq x x^T$ or $\begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0$
Delicate issue

Relaxations provide a bound on the optimal value but provide no hints on how to compute a good feasible point.

Frequently discussed approach: randomization

Case study: MAXCUT

max $\sum_{i,j} w_{ij} (1 - X_{ij})$ s.t. $X \succeq 0$, $X_{ii} = 1$

is the SDP relaxation of

(P) max $\sum_{i,j} w_{ij} (1 - x_i x_j) = x^T A x$ s.t. $x_i^2 = 1$
Goemans and Williamson (1995)

Theorem: $0.878 \cdot \text{SDP} \le \text{OPT} \le \text{SDP}$

Take $X$ feasible for the SDP:

$X \succeq 0$, $X_{ij} = v_i^T v_j$ ($X = V^T V$), $X_{ii} = 1 \Rightarrow \|v_i\| = 1$

$V$ can be obtained by Cholesky factorization.

Pick $v$ at random on the unit sphere; cut:

$V_+ = \{i : v^T v_i \ge 0\}$, $V_- = \{i : v^T v_i < 0\}$, i.e. $x_i = \operatorname{sgn}(v^T v_i)$
What is the expected value of this cut?

Expected weight of the random cut, term by term:

$\mathbb{E}\, w_{ij} (1 - x_i x_j) = 2 w_{ij}\, \mathbb{P}(v \text{ separates } i \text{ and } j) = 2 w_{ij}\, \mathbb{P}(\operatorname{sgn}(v_i^T v) \ne \operatorname{sgn}(v_j^T v)) = 2 w_{ij} \frac{2 \theta_{ij}}{2\pi} = \frac{2}{\pi} w_{ij} \cos^{-1}(v_i^T v_j)$

where $\theta_{ij} = \cos^{-1}(v_i^T v_j)$. Summing,

$\mathbb{E} \sum_{i,j} w_{ij} (1 - x_i x_j) = \frac{2}{\pi} \sum_{i,j} w_{ij} \cos^{-1}(X_{ij})$
and since $\frac{2}{\pi} \cos^{-1}(t) \ge \alpha (1 - t)$ with $\alpha = 0.87856\ldots$,

$\frac{2}{\pi} \sum_{i,j} w_{ij} \cos^{-1}(X_{ij}) \ge \alpha \sum_{i,j} w_{ij} (1 - X_{ij}) = \alpha \operatorname{trace}(AX)$

True for all feasible $X$, hence true for the optimal $X$: the expected weight of the random cut generated by $X_{\text{opt}}$ is at least $\alpha \cdot \text{SDP}$. This gives $\text{OPT} \ge \alpha \cdot \text{SDP}$.

Provides a (randomized) algorithm for finding a good cut which on average has weight at least 87.8% of OPT.
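The rounding step itself is a few lines of numpy. The sketch below stands in for the SDP solution with an arbitrary feasible $X = V^T V$ built from random unit vectors (a real implementation would take $X$ from an SDP solver), then applies the random-hyperplane rounding and keeps the best of several draws:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
# Hypothetical weighted graph
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
A = np.diag(W.sum(axis=1)) - W          # A = D - W (graph Laplacian)

def cut_value(x):
    return 0.25 * x @ A @ x             # cut weight of the partition sgn(x)

# Stand-in for the SDP solution: any feasible X (PSD, unit diagonal)
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=0)          # columns v_i with ||v_i|| = 1
X = V.T @ V                             # X_ij = v_i^T v_j, X_ii = 1

# Goemans-Williamson rounding: random hyperplane through the origin
best = -np.inf
for _ in range(100):
    v = rng.standard_normal(n)
    x = np.sign(V.T @ v)                # x_i = sgn(v^T v_i)
    best = max(best, cut_value(x))
print(best)
```

Since this $X$ is merely feasible rather than optimal, the $0.878$ guarantee relative to the SDP value does not apply here; the point is only the mechanics of the rounding.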
Expected weight of a random cut

Suppose the $x_i$ are iid with $\mathbb{P}(x_i = \pm 1) = 1/2$. Then

$\mathbb{E}\, x^T A x = \sum_{i,j} w_{ij} (1 - \mathbb{E}\, x_i x_j) = \sum_{i,j} w_{ij}$

so the expected weight of a random cut is half the total edge weight, hence at least 50% of OPT.

No polynomial approximation algorithm with a constant better than $16/17 \approx 0.9412$ exists unless P = NP [Håstad 97]
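The 50% baseline is easy to confirm by Monte Carlo on a hypothetical random graph: the average cut weight under uniform random signs concentrates around half of the total edge weight.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
A = np.diag(W.sum(axis=1)) - W
total_weight = W.sum() / 2              # each edge counted once

# Uniform random signs: E[(1/4) x^T A x] = (1/4) sum_ij w_ij,
# i.e. half the total edge weight
vals = []
for _ in range(20000):
    x = rng.choice([-1.0, 1.0], size=n)
    vals.append(0.25 * x @ A @ x)
avg_cut = np.mean(vals)
print(avg_cut / total_weight)           # close to 0.5
```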
Extension I: diagonal dominance

max $x^T A x$ s.t. $x_i^2 = 1$ relaxes to max $\operatorname{trace}(AX)$ s.t. $X_{ii} = 1$, $X \succeq 0$

If $A$ is diagonally dominant, then the same result holds.

Diagonal dominance: $a_{ii} \ge \sum_{j : j \ne i} |a_{ij}|$ for all $i$
If $A$ is diagonally dominant, then $x^T A x$ is a sum of terms of the form $x_i^2$ and $(x_i \pm x_j)^2$ with positive coefficients. In expectation,

$\frac{1}{2} \mathbb{E} (x_i \pm x_j)^2 = \mathbb{E} (1 \pm x_i x_j) = 1 \pm \frac{2}{\pi} \sin^{-1}(X_{ij}) \ge 0.878 (1 \pm X_{ij})$

The value of the GW randomized cut obeys

$0.878 \operatorname{trace}(AX) \le \mathbb{E}\, x^T A x \le p^\star \le \operatorname{trace}(AX)$

For the graph Laplacian $A = D - W$:

$x^T A x = \frac{1}{2} \sum_{i,j} w_{ij} (x_i - x_j)^2$
Extension: $A \succeq 0$

max $x^T A x$ s.t. $x_i^2 = 1$ relaxes to max $\operatorname{trace}(AX)$ s.t. $X_{ii} = 1$, $X \succeq 0$

Theorem (Nesterov): If $A \succeq 0$, then

$\frac{2}{\pi} \text{SDP} \le \mathbb{E}\, x^T A x \le \text{SDP}$

with the same randomized construction
$X \succeq 0 \implies \sin^{-1}(X) \succeq X$ (*)  (entrywise $\sin^{-1}$)

Hence

$\mathbb{E}\, x^T A x = \frac{2}{\pi} \operatorname{trace}(A \sin^{-1}(X)) \ge \frac{2}{\pi} \operatorname{trace}(AX)$

Proof of (*) relies on a fact: assume $f : \mathbb{R} \to \mathbb{R}$ has a Taylor series with nonnegative coefficients and set $Y = f(X)$ entrywise $[Y_{ij} = f(X_{ij})]$. Then

$X \succeq 0 \implies Y \succeq 0$

Apply this with $f(t) = \sin^{-1}(t) - t$ to get (*).

The fact is a direct consequence of the Schur product theorem: $A, B \succeq 0 \implies A \circ B \succeq 0$, where $(A \circ B)_{ij} = A_{ij} B_{ij}$ is the Hadamard product.
Extension: bipartite graphs

$A = \frac{1}{2} \begin{bmatrix} 0 & S \\ S^T & 0 \end{bmatrix}$

max $x^T A x$ s.t. $x_i^2 = 1$ becomes, with $x = (v, u)$,

max $v^T S u$ s.t. $u_i^2 = 1$, $v_i^2 = 1$

First analyzed by Grothendieck:

$\kappa_G = \sup_A \frac{\operatorname{trace}(AX)}{p^\star}$ (worst-case ratio of SDP value to optimum)

Theorem (Krivine): $\kappa_G \le 1.7822$
Lemma: let $f, g : \mathbb{R} \to \mathbb{R}$ be such that $f + g$ and $f - g$ have nonnegative Taylor coefficients, and let

$X = \begin{bmatrix} X_{11} & X_{12} \\ X_{12}^T & X_{22} \end{bmatrix}$, $Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{12}^T & Y_{22} \end{bmatrix}$ with $Y_{11} = f(X_{11})$, $Y_{22} = f(X_{22})$, $Y_{12} = g(X_{12})$ entrywise.

Then $X \succeq 0 \implies Y \succeq 0$.

Take $f(t) = \sinh(c_\kappa \pi t / 2)$ with $c_\kappa \approx 0.5611$ so that $f(1) = 1$, and $g(t) = \sin(c_\kappa \pi t / 2)$.

$f$ and $g$ are as in the Lemma since

$\sinh(t) = \sum_{k=0}^{\infty} \frac{t^{2k+1}}{(2k+1)!}$, $\sin(t) = \sum_{k=0}^{\infty} (-1)^k \frac{t^{2k+1}}{(2k+1)!}$
Let $X$ be the optimal solution and $Y$ as in the lemma, so $Y \succeq 0$ and $Y_{ii} = f(1) = 1$.

We can apply the rounding to the feasible $Y$ to get $y$:

$\mathbb{E}\, y^T A y = \frac{2}{\pi} \operatorname{trace}(A \sin^{-1}(Y)) = \frac{2}{\pi} \operatorname{trace}(S \sin^{-1}(Y_{12})) = c_\kappa \operatorname{trace}(S X_{12})$

At least $c_\kappa$ times the best possible value:

$c_\kappa \operatorname{trace}(S X_{12}) = \mathbb{E}\, y^T A y \le p^\star \le \operatorname{trace}(S X_{12})$
Generalized randomization approach

max $\operatorname{trace}(AX)$ s.t. $X \succeq 0$, $X_{ii} = 1$

(1) draw $v \sim N(0, X)$
(2) set $x_i = \operatorname{sgn}(v_i)$

Sometimes $\mathbb{E}_X f_0(x) \ge \alpha\, p^\star$
Agenda: Applications of semidefinite programming

1. Control and system theory
2. Combinatorial and nonconvex optimization
3. Spectral estimation & super-resolution
Spectral estimation

Sparse superposition of tones:

$s(t) = \sum_j c_j e^{i 2\pi \omega_j t}$ (+ noise), $\omega_j \in [0, 1]$

Observe samples $d = \{ s(t) : t \in T_n = \{0, 1, \ldots, n\} \}$

Problem: How do we find the frequencies and amplitudes?
Convex programming approach

If $\omega \in \Omega$ with $\Omega$ finite, a natural procedure is

min $\|c\|_1$ s.t. $Ac = d$, $A_{t\omega} = e^{i 2\pi \omega t}$, $(t, \omega) \in T_n \times \Omega$

But $\Omega = [0, 1]$...

Proposal: recover the signal by solving

min $\|c\|_{\text{TV}}$ subject to $Ac = d$

- total-variation norm: $\|c\|_{\text{TV}} = \sup \sum_j |c(B_j)|$, with the sup over all finite partitions $\{B_j\}$ of $[0, 1]$
- linear mapping: $(Ac)(t) = \int e^{i 2\pi \omega t}\, c(d\omega)$

A continuum of decision variables!
Super-resolution

Swap time and frequency:

$x = \sum_j c_j \delta_{\tau_j}$, $c_j \in \mathbb{C}$, $\tau_j \in [0, 1]$

Wish to recover $x$: spike locations and amplitudes. Only have low-frequency data $d$:

$d_k = \sum_j c_j e^{i 2\pi k \tau_j}$, $k = -n/2, -n/2 + 1, \ldots, n/2$

Recovery:

min $\|x\|_{\text{TV}}$ subject to $Ax = d$, where $(Ax)(k) = \int e^{i 2\pi k t}\, x(dt)$

A continuum of decision variables!
Formulation as a finite-dimensional problem

Primal problem: min $\|x\|_{\text{TV}}$ s.t. $Ax = d$
- infinite-dimensional variable $x$
- finitely many constraints

Dual problem: max $\operatorname{Re}\langle d, c \rangle$ s.t. $\|A^* c\|_\infty \le 1$
- finite-dimensional variable $c$
- infinitely many constraints: $(A^* c)(t) = \sum_{|k| \le n/2} c_k e^{i 2\pi k t}$

Semidefinite representability: $|(A^* c)(t)| \le 1$ for all $t \in [0, 1]$ is equivalent to

(1) there is a Hermitian $Q$ s.t. $\begin{bmatrix} Q & c \\ c^* & 1 \end{bmatrix} \succeq 0$
(2) $\operatorname{trace}(Q) = 1$
(3) the sums along superdiagonals vanish: $\sum_{i=1}^{n-j} Q_{i, i+j} = 0$ for $1 \le j \le n - 1$
Semidefinite representability

With $P(t) = \sum_{k=0}^{n-1} c_k e^{i 2\pi k t}$:

$|P(t)| \le 1$ for all $t$ $\iff$ there is a Hermitian $Q$ with

$\begin{bmatrix} Q & c \\ c^* & 1 \end{bmatrix} \succeq 0$, $\sum_{i=1}^{n-j} Q_{i, i+j} = \begin{cases} 1 & j = 0 \\ 0 & j = 1, 2, \ldots, n-1 \end{cases}$

$\Leftarrow$ (easy part):

$\begin{bmatrix} Q & c \\ c^* & 1 \end{bmatrix} \succeq 0 \iff Q \succeq c c^*$, hence $|c^* z|^2 = z^* c c^* z \le z^* Q z$ for all $z$.

Take $z = (z_0, \ldots, z_{n-1})$ with $z_k = e^{-i 2\pi k t}$: the diagonal-sum conditions give $z^* Q z = 1$, while $|c^* z|^2 = |P(t)|^2$. Hence $|P(t)|^2 \le 1$.
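A toy numerical illustration of the easy direction (an ad-hoc feasible certificate, not one produced by a solver): $Q = I/n$ satisfies the trace and superdiagonal-sum conditions, and any $c$ with $n \|c\|^2 \le 1$ then makes the block matrix PSD, so the bound $|P(t)| \le 1$ should hold on a fine grid.

```python
import numpy as np

n = 8
# A simple Q satisfying the diagonal-sum conditions:
# Q = I/n has trace(Q) = 1 and zero sums along every superdiagonal
Q = np.eye(n) / n

# Choose c with Q >= c c*, i.e. n ||c||^2 <= 1, so that
# the block matrix [[Q, c], [c*, 1]] is PSD
rng = np.random.default_rng(6)
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
c *= 1.0 / (np.sqrt(n) * np.linalg.norm(c))      # now n ||c||^2 = 1

# The certificate guarantees |P(t)| <= 1 for P(t) = sum_k c_k e^{i 2 pi k t}
t = np.linspace(0, 1, 2000, endpoint=False)
P = np.exp(2j * np.pi * np.outer(t, np.arange(n))) @ c
max_abs = np.abs(P).max()
print(max_abs <= 1 + 1e-9)
```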
How to compute primal solutions?

Use complementary slackness: the support of $\hat x$ is contained in $\{t : |\hat p(t)| = 1\}$, where $\hat p = A^* \hat c$ is the dual polynomial.

Find the support, then solve a least-squares problem for the amplitudes.
References

1. A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, MPS-SIAM Series on Optimization
2. S. Boyd, EE 364B, Stanford University
3. G. Blekherman, P. Parrilo and R. Thomas (eds.), Semidefinite Optimization and Convex Algebraic Geometry
4. E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math.