SDP Rank Reduction Yinyu Ye, EURO XXII

A Unified Theorem on SDP Rank Reduction

Yinyu Ye
Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering
Stanford University, Stanford, CA 94305, U.S.A.
http://www.stanford.edu/~yyye

Joint work with Anthony So and Jiawei Zhang
Outline

- Problem Statement
- Application
- New SDP Rank Reduction Theorem and Algorithm
- Sketch of Proof
- Extension and Question
Problem Statement

Consider the following system of semidefinite programming (SDP) constraints:

A_i • X = b_i, i = 1, ..., m,  X ⪰ 0,

where the given A_1, ..., A_m are n×n symmetric positive semidefinite matrices, b_1, ..., b_m ≥ 0, and A • X = Σ_{i,j} a_ij x_ij = Tr(A^T X).

Clearly, the feasibility of the above system can be decided by SDP interior-point algorithms (Alizadeh 91, Nesterov and Nemirovskii 93, etc.). More precisely, they find an ε-approximate solution in time linear in log(1/ε).
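As a quick aside (not from the slides), the inner product identity A • X = Σ_{i,j} a_ij x_ij = Tr(A^T X) is easy to spot-check numerically; a minimal sketch with numpy, where the matrix sizes and random seed are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def frob(A, X):
    # A • X = sum_{i,j} a_ij * x_ij (entrywise Frobenius inner product)
    return float(np.sum(A * X))

# Random symmetric PSD matrices: B B^T is always PSD.
B = rng.standard_normal((4, 4))
A = B @ B.T
C = rng.standard_normal((4, 4))
X = C @ C.T

# The entrywise sum agrees with the trace form Tr(A^T X).
assert np.isclose(frob(A, X), np.trace(A.T @ X))
# For PSD A and X the inner product is nonnegative, so b_i >= 0 is natural.
assert frob(A, X) >= 0.0
```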
Problem Statement (Cont'd)

However, we are interested in finding a low-rank solution to the above system. The low-rank problem arises in many applications, e.g.:

- sensor network localization (e.g., Biswas and Y 03, So and Y 04)
- metric embedding/dimension reduction (e.g., Johnson and Lindenstrauss 84, Matousek 90)
- approximating non-convex (complex, quaternion) quadratic optimization (e.g., Nemirovskii, Roos and Terlaky 99, Luo, Sidiropoulos, Tseng and Zhang 06, Faybusovich 07)
- graph rigidity/distance matrix (e.g., Alfakih, Khandani and Wolkowicz 99, etc.)
Graph Realization

Given a graph G = (V, E) and sets of nonnegative weights, say {d_ij : (i, j) ∈ E} and {θ_ilj : (i, l, j) ∈ Θ}, the goal is to compute a realization of G in the Euclidean space R^d for a given low dimension d, i.e., to place the vertices of G in R^d such that the Euclidean distance between every pair of adjacent vertices (i, j) equals (or is bounded by) the prescribed weight d_ij, and the angle between edges (i, l) and (j, l) equals (or is bounded by) the prescribed weight θ_ilj.
Figure 1: 50-node 2-D Sensor Localization
Figure 2: A 3-D Tensegrity Graph Realization; provided by Anstreicher
Figure 3: Tensegrity Graph: A Needle Tower; provided by Anstreicher
Figure 4: Molecular Conformation: 1F39 (1534 atoms) with 85% of distances below 6Å and 10% noise on upper and lower bounds
Math Programming: Rank-Constrained SDP

Given a_k ∈ R^d, the weights d_ij for (i, j) ∈ N_x, d̂_kj for (k, j) ∈ N_a, and v_ilj for (i, l, j) ∈ Θ, find x_i ∈ R^d such that

||x_i - x_j||²  (≤, =, or ≥)  d_ij²,   (i, j) ∈ N_x, i < j,
||a_k - x_j||²  (≤, =, or ≥)  d̂_kj²,  (k, j) ∈ N_a,
(x_i - x_l)^T (x_j - x_l)  (≤, =, or ≥)  v_ilj,  (i, l, j) ∈ Θ,

which leads to

A_i • X = b_i, i = 1, ..., m,  X ⪰ 0,  rank(X) ≤ d;

and is relaxed to

A_i • X = b_i, i = 1, ..., m,  X ⪰ 0.
Some Background

Barvinok 95 showed that if the system is feasible, then there exists a solution X whose rank is at most √(2m) (also see Carathéodory's theorem). Moreover, Pataki 98 showed how to construct such an X efficiently.

Unfortunately, for the applications mentioned above, this is not enough. We want a fixed-low-rank (say d) solution! However, there are some issues:

- Such a solution may not exist!
- Even if it does, one may not be able to find it efficiently.

So we consider an approximation of the problem.
Approximating the Problem

We consider the problem of finding an X̂ ⪰ 0 of rank at most d that satisfies the system approximately:

β(m, n, d) · b_i ≤ A_i • X̂ ≤ α(m, n, d) · b_i,  i = 1, ..., m.

Here, the distortion factors satisfy α ≥ 1 and β ∈ (0, 1]. Clearly, the closer both are to 1, the better.
Our Result

Theorem 1. Suppose that the original system is feasible. Let r = max_i {rank(A_i)}. Then, for any d ≥ 1, there exists an X̂ ⪰ 0 of rank at most d such that β(m, n, d)·b_i ≤ A_i • X̂ ≤ α(m, n, d)·b_i for i = 1, ..., m, where

α(m, n, d) =  1 + 12 log(4mr)/d       for 1 ≤ d ≤ 12 log(4mr)
              1 + √(12 log(4mr)/d)    for d > 12 log(4mr)

β(m, n, d) =  1/(5e · m^(2/d))              for 1 ≤ d ≤ 2 log m / log log(2m)
              1/(4e · log^(f(m)/d)(2m))     for 2 log m / log log(2m) < d ≤ 4 log(4mr)
              1 - √(4 log(4mr)/d)           for d > 4 log(4mr)

where f(m) = 3 log m / log log(2m).

Moreover, such an X̂ can be found in randomized polynomial time.
Some Remarks

In general, the data parameter r can be bounded by 2m, so that

α(m, n, d) = 1 + O(√(log m / d))

and

β(m, n, d) =  Ω(m^(-2/d))                              for d = O(log m / log log m)
              Ω((log m)^(-3 log m/(d log log m)))       otherwise.
Some Remarks (Cont'd)

- In the region 1 ≤ d ≤ 2 log m / log log(2m), the lower bound β is independent of the ranks of A_1, ..., A_m.
- f(m)/d ≤ 3/2 in the region 2 log m / log log(2m) < d ≤ 4 log(4mr).
- β is a constant in the region d > 4 log(4mr).
- Our result contains as special cases several well-known results in the literature.
Early Result: Metric Embedding

Given an n-point set V = {v_1, ..., v_n} in R^ℓ, we would like to embed it into a low-dimensional Euclidean space as faithfully as possible. Specifically, a map f : V → R^d is an α-embedding (where α ≥ 1) if

||u - v||₂ ≤ ||f(u) - f(v)||₂ ≤ α · ||u - v||₂  for all u, v ∈ V.

The goal is to find an f such that α is as small as possible. It is known that:

- for any ε > 0, a (1 + ε)-embedding into R^(O(ε^(-2) log n)) exists (Johnson–Lindenstrauss);
- for any fixed d ≥ 1, an O(n^(2/d) d^(-1/2) log^(1/2) n)-embedding into R^d exists (Matousek).
Early Result: Metric Embedding (Cont'd)

We can get these results from our Theorem. We focus on the fixed-d case.

Let {e_i}_{i=1}^n be the standard basis vectors, and set E_ij = (e_i - e_j)(e_i - e_j)^T. Let U be the ℓ×n matrix whose i-th column is v_i. Then, X = U^T U satisfies the system

E_ij • X = ||v_i - v_j||₂²  for 1 ≤ i < j ≤ n.

By our Theorem, we can find an X̂ ⪰ 0 of rank at most d such that:

Ω(n^(-4/d)) · ||v_i - v_j||₂² ≤ E_ij • X̂ ≤ O(log n/d) · ||v_i - v_j||₂².

Upon decomposing X̂ = Û^T Û, where Û is d×n, we recover points û_1, ..., û_n ∈ R^d such that:

Ω(n^(-2/d)) · ||v_i - v_j||₂ ≤ ||û_i - û_j||₂ ≤ O(√(log n/d)) · ||v_i - v_j||₂.

Note that the embedding results imply only a weaker version (r = 1) of our theorem.
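For intuition, the fixed-d behavior is easy to observe empirically. The sketch below is my own illustration (not the slides' construction): it uses a plain Gaussian random projection in the spirit of Johnson–Lindenstrauss and measures the spread of squared-distance ratios; the dimensions n, ℓ, d and the seed are arbitrary assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

n, l, d = 50, 30, 10
V = rng.standard_normal((l, n))  # columns v_1, ..., v_n in R^l

# d x l Gaussian projection with entry variance 1/d, so squared distances
# are preserved in expectation: E ||P(u - v)||^2 = ||u - v||^2.
P = rng.standard_normal((d, l)) / np.sqrt(d)
U_hat = P @ V  # columns u_hat_1, ..., u_hat_n in R^d

ratios = []
for i, j in itertools.combinations(range(n), 2):
    orig = float(np.sum((V[:, i] - V[:, j]) ** 2))
    proj = float(np.sum((U_hat[:, i] - U_hat[:, j]) ** 2))
    ratios.append(proj / orig)

# Each ratio is distributed as chi^2_d / d, so they concentrate around 1;
# the theorem quantifies how far the extremes can stray at rank d.
mean_ratio = sum(ratios) / len(ratios)
assert all(r > 0 for r in ratios)
assert 0.5 < mean_ratio < 2.0
```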
Early Result: Approximating QPs

Let A_1, ..., A_m be positive semidefinite. Consider the following QP:

v* = maximize  x^T A x
     subject to  x^T A_i x ≤ 1,  i = 1, ..., m

and its natural SDP relaxation:

v_sdp = maximize  A • X
        subject to  A_i • X ≤ 1,  i = 1, ..., m;  X ⪰ 0.

Let X* be an optimal solution to the SDP. Nemirovskii et al. showed that one can randomly extract a rank-1 matrix X̂ from X* such that it is feasible for the SDP and E[A • X̂] ≥ Ω(log^(-1) m) · v*.
Early Result: Approximating QPs (Cont'd)

We can obtain a similar result from our Theorem. The matrix X* satisfies the system:

A_i • X* = b_i ≤ 1,  i = 1, ..., m.

Our proof of the Theorem shows that one can find a rank-1 matrix X̂ ⪰ 0 such that:

E[A • X̂] = v_sdp,  A_i • X̂ ≤ O(log m) · b_i,  i = 1, ..., m.

By scaling down X̂ by a factor of O(log m), we obtain a feasible rank-1 matrix X̂′ that satisfies E[A • X̂′] ≥ Ω(log^(-1) m) · v*.
Early Result: Approximating QPs (Cont'd)

Luo et al. considered the following real (complex) QP:

minimize  x^T A x
subject to  x^T A_i x ≥ 1,  i = 1, ..., m

and its natural SDP relaxation:

minimize  A • X
subject to  A_i • X ≥ 1,  i = 1, ..., m;  X ⪰ 0.

They showed how to extract a solution x̂ from an optimal solution matrix to the SDP so that it is feasible for the QP and within a factor O(m²) (resp. O(m)) of the optimal.

Again, we can obtain the same results from our Theorem on both real (d = 1) and complex (d = 2) spaces.
How Sharp are the Bounds?

For metric embedding, it is known that:

- for any d ≥ 1, there exists an n-point set V ⊂ R^(d+1) such that any embedding of V into R^d requires distortion D = Ω(n^(1/⌊(d+1)/2⌋)) (Matousek);
- there exists an n-point set V ⊂ R^ℓ for some ℓ such that, for any ε ∈ (n^(-1/2), 1/2), say, a (1 + ε)-embedding of V into R^d requires d = Ω((ε² log(1/ε))^(-1) log n) (Alon 03).

Thus, from the metric embedding perspective, the ratio of our upper and lower bounds is almost tight for d ≥ 3.
How Sharp are the Bounds? (Cont'd)

For the QP:

v* = maximize  x^T A x
     subject to  x^T A_i x ≤ 1,  i = 1, ..., m

and its natural SDP relaxation:

v_sdp = maximize  A • X
        subject to  A_i • X ≤ 1,  i = 1, ..., m;  X ⪰ 0,

Nemirovskii et al. showed that the ratio between v* and v_sdp can be as large as Ω(log m).

For the minimization version, Luo et al. showed that the ratio can be as small as O(m^(-2)).

Thus, from the QP perspective, the ratio of our upper and lower bounds is almost tight for d = 1.
Sketch of Proof of the Theorem

Without loss of generality, we may assume that X̄ = I is feasible for the original system; that is, our system becomes

A_i • X = Tr(A_i),  i = 1, ..., m;  X ⪰ 0.

Thus, the Theorem becomes:

Theorem 2. Let A_1, ..., A_m be n×n positive semidefinite matrices. Then, for any d ≥ 1, there exists an X̂ ⪰ 0 with rank at most d such that:

β(m, n, d) · Tr(A_i) ≤ A_i • X̂ ≤ α(m, n, d) · Tr(A_i),  i = 1, ..., m.
Sketch of Proof of the Theorem (Cont'd)

The algorithm for generating X̂ is simple:

- generate i.i.d. Gaussian RVs ξ_i^j with mean 0 and variance 1/d, for 1 ≤ i ≤ n and 1 ≤ j ≤ d, and define the column vectors ξ^j = (ξ_1^j; ...; ξ_n^j);
- return X̂ = Σ_{j=1}^d ξ^j (ξ^j)^T.
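The two steps above transcribe directly into code. This is a sketch (the matrix sizes, seed, and trial count are my own choices), with sanity checks that X̂ is PSD of rank at most d and that A • X̂ ≈ Tr(A) on average, as the analysis requires:

```python
import numpy as np

rng = np.random.default_rng(2)

def round_to_rank_d(n, d, rng):
    # Columns of Xi are the vectors xi^1, ..., xi^d with i.i.d. N(0, 1/d) entries.
    Xi = rng.standard_normal((n, d)) / np.sqrt(d)
    # X_hat = sum_j xi^j (xi^j)^T, a PSD matrix of rank <= d.
    return Xi @ Xi.T

n, d = 40, 5
X_hat = round_to_rank_d(n, d, rng)
assert np.linalg.matrix_rank(X_hat) <= d
assert np.all(np.linalg.eigvalsh(X_hat) >= -1e-9)  # PSD up to roundoff

# Since E[X_hat] = I, we have E[A • X_hat] = Tr(A); check by averaging
# over many independent roundings.
B = rng.standard_normal((n, n))
A = B @ B.T
vals = [float(np.sum(A * round_to_rank_d(n, d, rng))) for _ in range(2000)]
avg = sum(vals) / len(vals)
assert abs(avg - np.trace(A)) / np.trace(A) < 0.2
```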
Sketch of Proof of the Theorem (Cont'd)

The analysis makes use of the following Markov-type inequality:

Lemma 1. Let ξ_1, ..., ξ_n be i.i.d. standard Gaussian RVs. Let α ∈ (1, ∞) and β ∈ (0, 1) be constants, and let U_n = Σ_{i=1}^n ξ_i² (a Chi-square RV). Then, the following hold:

Pr(U_n ≥ αn) ≤ exp[ (n/2)(1 - α + log α) ],
Pr(U_n ≤ βn) ≤ exp[ (n/2)(1 - β + log β) ].
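Lemma 1's tail bounds are easy to spot-check by simulation. The sketch below is my own sanity check (n, α, β, and the sample count are arbitrary assumptions); the empirical tail frequencies should sit below the analytic bounds:

```python
import math
import random

random.seed(3)

n, alpha, beta = 10, 2.0, 0.5
trials = 100_000

# Sample U_n = sum of squares of n standard Gaussians, and count tail events.
upper_hits = lower_hits = 0
for _ in range(trials):
    u = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
    if u >= alpha * n:
        upper_hits += 1
    if u <= beta * n:
        lower_hits += 1

# The Chernoff-style bounds from Lemma 1.
upper_bound = math.exp((n / 2) * (1 - alpha + math.log(alpha)))
lower_bound = math.exp((n / 2) * (1 - beta + math.log(beta)))

assert upper_hits / trials <= upper_bound
assert lower_hits / trials <= lower_bound
```

With these parameters the bounds are loose but valid: the empirical tails of a Chi-square with 10 degrees of freedom fall well inside exp[(n/2)(1 − α + log α)] and exp[(n/2)(1 − β + log β)].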
Sketch of Proof of the Theorem (Cont'd)

Lemma 2. Let H be an n×n positive semidefinite matrix and r = rank(H). Then, for any β ∈ (0, 1), we have:

Pr( H • X̂ ≤ β·Tr(H) ) ≤ r · exp[ (d/2)(1 - β + log β) ]   (1)

On the other hand, if β satisfies eβ·log r ≤ 1/5, then the above can be sharpened to:

Pr( H • X̂ ≤ β·Tr(H) ) ≤ (5eβ/2)^(d/2)   (2)

Note that (2) is independent of r!

For any α > 1, we have:

Pr( H • X̂ ≥ α·Tr(H) ) ≤ r · exp[ (d/2)(1 - α + log α) ]   (3)
Sketch of Proof of the Theorem (Cont'd)

It is easy to establish (1) and (3) using Lemma 1. By applying the bounds (1) and (3) of Lemma 2 to each of A_1, ..., A_m and taking the union bound, we can get the upper bound in the Theorem. However, the lower bound obtained this way is weaker.

To obtain a better lower bound (for the region 1 ≤ d ≤ 2 log m / log log(2m)), we use the bound (2) in Lemma 2. To prove it, consider the spectral decomposition H = Σ_{k=1}^r λ_k v_k v_k^T with λ_1 ≥ λ_2 ≥ ... ≥ λ_r > 0.
Sketch of Proof of the Theorem (Cont'd)

Recall that (2) says: if β ∈ (0, 1) satisfies eβ·log r ≤ 1/5, then

Pr( H • X̂ ≤ β·Tr(H) ) ≤ (5eβ/2)^(d/2).

First, by the spectral decomposition, we have

H • X̂ = ( Σ_{k=1}^r λ_k v_k v_k^T ) • ( Σ_{j=1}^d ξ^j (ξ^j)^T ) = Σ_{k=1}^r λ_k Σ_{j=1}^d (v_k^T ξ^j)².

Now, observe that the (v_k^T ξ^j)_{k,j} are N(0, 1/d) and mutually independent. Hence, H • X̂ has the same distribution as the weighted Chi-square Σ_{k=1}^r λ_k Σ_{j=1}^d ξ_kj², where the ξ_kj are i.i.d. Gaussian N(0, 1/d).
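This distributional identity can be checked by simulation. The sketch below is my own illustration (the choice of H, the eigenvalues, d, and the sample counts are arbitrary assumptions): both the direct samples of H • X̂ and the weighted Chi-square samples should have mean Tr(H).

```python
import numpy as np

rng = np.random.default_rng(4)

n, r, d, trials = 12, 4, 3, 20_000

# A rank-r PSD matrix H with chosen eigenvalues and orthonormal eigenvectors.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([4.0, 2.0, 1.0, 0.5])
H = (Q[:, :r] * lam) @ Q[:, :r].T

# Direct sampling of H • X_hat with X_hat = sum_j xi^j (xi^j)^T.
direct = []
for _ in range(trials):
    Xi = rng.standard_normal((n, d)) / np.sqrt(d)
    direct.append(float(np.sum(H * (Xi @ Xi.T))))

# The weighted Chi-square representation: sum_k lam_k * sum_j xi_kj^2,
# with xi_kj i.i.d. N(0, 1/d).
weighted = []
for _ in range(trials):
    xi = rng.standard_normal((r, d)) / np.sqrt(d)
    weighted.append(float(np.sum(lam * np.sum(xi ** 2, axis=1))))

tr = float(np.trace(H))  # = sum of the lam_k
assert abs(np.mean(direct) - tr) / tr < 0.1
assert abs(np.mean(weighted) - tr) / tr < 0.1
```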
Sketch of Proof of the Theorem (Cont'd)

Let λ̄_k = λ_k / Σ_{k=1}^r λ_k. It then follows that:

Pr( H • X̂ ≤ β·Tr(H) ) = Pr( Σ_{k=1}^r λ̄_k Σ_{j=1}^d ξ_kj² ≤ β ) ≡ p(r, λ̄, β).

We now bound p(r, λ̄, β). On the one hand, by replacing all λ̄_k by the smallest one, λ̄_r, and using the tail estimates of Lemma 1, we have:

p(r, λ̄, β) ≤ Pr( λ̄_r Σ_{k=1}^r Σ_{j=1}^d ξ_kj² ≤ β ) ≤ ( eβ / (r·λ̄_r) )^(rd/2).
On the other hand, by dropping the smallest term λ̄_r in the summation,

p(r, λ̄, β) ≤ Pr( Σ_{k=1}^{r-1} λ̄_k Σ_{j=1}^d ξ_kj² ≤ β )
           = Pr( Σ_{k=1}^{r-1} (λ̄_k / (1 - λ̄_r)) Σ_{j=1}^d ξ_kj² ≤ β / (1 - λ̄_r) )
           = p( r - 1, λ̄_{1:r-1} / (1 - λ̄_r), β / (1 - λ̄_r) ).
Sketch of Proof of the Theorem (Cont'd)

By unrolling the recursive formula, we have:

p(r, λ̄, β) ≤ min_{1 ≤ k ≤ r} { ( eβ / (k·λ̄_k) )^(kd/2) }.

Let γ = p(r, λ̄, β)^(2/d). Note that γ ∈ (0, 1). From the above, we have

λ̄_k ≤ ( k·γ^(1/k) )^(-1) · eβ  for k = 1, ..., r.

Upon summing over k and using the fact that Σ_{k=1}^r λ̄_k = 1, we obtain:

eβ · Σ_{k=1}^r 1/( k·γ^(1/k) ) ≥ 1.
Sketch of Proof of the Theorem (Cont'd)

Σ_{k=1}^r 1/( k·γ^(1/k) ) ≤ 1/γ + ∫_1^r 1/( t·γ^(1/t) ) dt = 1/γ + ∫_{log(1/γ)/r}^{log(1/γ)} (e^t / t) dt.

Then, one can show that the above implies that:

1/(eβ) ≤ 2/γ + log r.

Together with the assumption that eβ·log r ≤ 1/5, this gives (4/5)·(1/(eβ)) ≤ 2/γ, and we conclude that:

5eβ/2 ≥ γ = Pr( H • X̂ ≤ β·Tr(H) )^(2/d),

as desired.
SDP with an Objective Function

Our result can be used for solving SDPs with an objective function:

min C • X, subject to A_i • X = b_i for i = 1, ..., m; X ⪰ 0.

When X̄ is optimal, there must be a (S̄, ȳ) feasible for the dual such that S̄ • X̄ = 0 (under a mild condition). One can treat S̄ • X = 0 as an additional equality constraint. Thus, the rounding method will preserve S̄ • X̂ = 0; that is, the low-rank X̂ is optimal for a nearby problem to the original SDP with the identical objective.
Question

- Is there a deterministic algorithm? Choose the largest d eigenvalue components of X̄?
- In practical applications, we see much smaller distortion. Why?
- Add a regularization objective to find a low-rank SDP solution?