The Q Method for Second-Order Cone Programming
Yu Xia    Farid Alizadeh

July 5, 2005

Key words. Second-order cone programming, infeasible interior point method, the Q method

Abstract

We develop the Q method for the Second-Order Cone Programming (SOCP) problem. The algorithm is an adaptation of the Q method for semidefinite programming originally developed by Alizadeh, Haeberly and Overton [3] and []. We take advantage of the special algebraic structure associated with second-order cone programs to formulate the Q method. Furthermore, we prove that our algorithm is globally convergent. Finally, some numerical results are presented.

1 Introduction

The Second-Order Cone Programming (SOCP) problem is a family of convex optimization problems more general than linear programming; however, it is a special case of the semidefinite programming (SDP) problem. In [] a review of this problem and its analytic, algorithmic and numerical properties, as well as a number of its applications in various areas, is presented. We refer the reader to that paper and the references in it for further information. In this paper we present a new interior point algorithm which is an adaptation of the Q method for semidefinite programming introduced in [3] and []. However, we will not rely on knowledge of SDP algorithms and will develop the Q method for the SOCP problem from the ground up. We only mention that in [8] we analyze the Q method for optimization problems over any symmetric cone, which includes both the cone of positive semidefinite matrices and the second-order cone. In the following sections we first develop the algebraic foundations in 2. In 3 we develop the Q method and derive the associated Newton direction. The global convergence analysis of this method is given in Appendix A. In 4 numerical results for both randomly generated problems and problems arising from Steiner tree problems are presented. Finally, in 5 a modification of the Q method along with additional computational results is presented.
We also present global convergence of the modified algorithm in Appendix A.

Notation

For the most part we use the notation of []. In particular, boldface letters such as a, b, x, etc. represent column vectors, and capital letters such as A, X, etc. represent matrices. To refer to the i-th entry of the vector x we write x_i. For the transpose of a matrix or vector A we write A^T. The vector of all zeros is written as 0, and the vector of all ones as 1. If we need to join vectors and matrices, we adopt the syntax of MATLAB and similar programming languages: "," is used to join matrices and vectors in a row and ";" is used to join them in a column. For example, (x; y) stacks the vector x on top of the vector y, while (a, b; c, d) is the 2x2 matrix whose first row is (a, b) and whose second row is (c, d). If a vector is partitioned into several blocks we write x = (x_1; ...; x_r); thus x_i refers to the i-th block, which should be clear from context and not confused with the i-th entry. Also x_{ij} means the j-th entry of the i-th block. For norms we use ||.||. In particular, ||x||, ||x||_1 and ||x||_inf are the Euclidean, the l_1 and the infinity norms of x, respectively. For a vector x, Diag(x) is a diagonal matrix whose diagonal is x. In general, if X_1, ..., X_n are a sequence of matrices, then Diag(X_1, ..., X_n) is a block diagonal matrix made up of the X_i. The simple second-order cone Q_{n+1} is defined by

    Q_{n+1} = { (x_0; x̄) ∈ R^{n+1} : x_0 ≥ ||x̄|| }.

Vectors x in Q_{n+1} are indexed from zero, and the subvector consisting of x_1 through x_n is represented by x̄. We also write x̃ for the subvector formed from entries x_2 through x_n. A matrix multiplied from the left by x has its columns indexed from zero. The identity matrix is I, and the matrix

    R_{n+1} = (1, 0; 0, -I)

is indexed from zero in both rows and columns. The vector e_{n+1} = (1; 0), a zero-indexed (n+1)-dimensional vector, will play a fundamental role. For reasons to become clear shortly, e is called the identity element. More generally, we call any direct sum of simple second-order cones the second-order cone and write Q. Thus,

    Q = { (x_1; ...; x_r) : x_i ∈ Q_{n_i+1} for i = 1, ..., r }   for simple second-order cones Q_{n_i+1}.

In this context, e = (e_{n_1+1}; ...; e_{n_r+1}), R = Diag(R_{n_1+1}, ..., R_{n_r+1}), and when we say x ∈ Q it is understood that x is decomposed into blocks x = (x_1; ...; x_r) consistent with Q, with each x_i ∈ Q_{n_i+1}.

[Footnotes: The Institute of Statistical Mathematics, Minami-Azabu, Minato-ku, Tokyo, JAPAN, xiay@optlab.mcmaster.ca. Department of Management Science and Information Systems and Rutgers Center for Operations Research, Rutgers, the State University of New Jersey, U.S.A., alizadeh@rutcor.rutgers.edu. Research supported in part by the U.S. National Science Foundation Grant #NSF-CCR and Office of Naval Research through contract #N.]

2 Foundations

The standard second-order cone programming problem is defined as a pair of mutually dual optimization problems:

    Primal:  min   c_1^T x_1 + ... + c_r^T x_r
             s.t.  A_1 x_1 + ... + A_r x_r = b
                   x_i ∈ Q_{n_i+1},   i = 1, ..., r

    Dual:    max   b^T y
             s.t.  A_i^T y + z_i = c_i,   i = 1, ..., r
                   z_i ∈ Q_{n_i+1},   i = 1, ..., r

where the Q_{n_i+1} are simple second-order cones. This problem can also be written more succinctly as

    Primal:  min   c^T x          Dual:  max   b^T y
             s.t.  Ax = b                s.t.  A^T y + z = c
                   x ∈ Q                       z ∈ Q

where Q is the direct sum of the Q_{n_i+1}, and x, c and z can be decomposed into blocks x_i, c_i and z_i as above. In this paper we switch back and forth between these two forms.
2.1 Duality and complementarity

It can be shown that under certain constraint qualifications the optimal values of the Primal and Dual problems are equal. That is, at the optimal point (x; y; z) we have c^T x - b^T y = x^T z = 0. Furthermore, at the optimal point a series of complementary slackness conditions analogous to those in linear programming hold. To express these complementarity conditions we first define an algebraic binary operation ∘ as follows: for x, z ∈ R^{n_i+1}, both indexed from zero, define

    x ∘ z = ( x^T z ; x_0 z_1 + z_0 x_1 ; ... ; x_0 z_{n_i} + z_0 x_{n_i} ).

Note that the operation ∘ is bilinear in x and z and is commutative, that is, x ∘ z = z ∘ x. The bilinearity of ∘ implies that there is a matrix X such that x ∘ z = Xz for every z. This matrix is nothing other than the arrow-shaped matrix representation of the Lorentz transformation and is defined by

    Arw(x) = ( x_0, x̄^T ; x̄, x_0 I ).

Thus, x ∘ z = Arw(x) z = Arw(x) Arw(z) e, where e = (1; 0). Notice that e acts as the identity element with respect to ∘, that is, e ∘ x = x ∘ e = x. In the case of multiple blocks, where x = (x_1; ...; x_r) and z = (z_1; ...; z_r),

    x ∘ z = ( x_1 ∘ z_1 ; ... ; x_r ∘ z_r ).

Also, for a multiple-block x, we set Arw(x) = Diag(Arw(x_1), ..., Arw(x_r)). Now the complementary slackness theorem for SOCP can be formulated as follows:

Proposition (Complementary slackness). If x = (x_1; ...; x_r) is optimal for the Primal problem and (y; z) with z = (z_1; ...; z_r) is optimal for the Dual, then

    x_i ∘ z_i = 0   for i = 1, ..., r.

The significance of the complementary slackness theorem is that the equalities for primal and dual feasibility and complementarity

    Ax = b,   A^T y + z = c,   x ∘ z = 0

constitute a square system of equations which, in the absence of various degeneracies, determines the optimal solution uniquely; see [].

2.2 The traditional interior point methods in a nutshell

Let Q_{n_i+1} be a simple second-order cone and let x ∈ Int Q_{n_i+1}. Then the function b(x) = -ln(x_0^2 - ||x̄||^2) is a barrier function defined on the interior of Q_{n_i+1}.
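These definitions are easy to exercise numerically. The sketch below (NumPy, used purely for illustration; the experiments reported later in the paper are in MATLAB) checks that x ∘ z = Arw(x) z, that ∘ is commutative, and that e = (1; 0) is its identity element:

```python
import numpy as np

def arw(x):
    """Arrow-shaped matrix Arw(x) = (x_0, xbar^T; xbar, x_0 I) of a
    zero-indexed vector x = (x_0; xbar)."""
    A = x[0] * np.eye(x.size)
    A[0, 1:] = x[1:]
    A[1:, 0] = x[1:]
    return A

def jordan_mul(x, z):
    """Jordan product x o z = (x^T z; x_0 zbar + z_0 xbar)."""
    return np.concatenate(([x @ z], x[0] * z[1:] + z[0] * x[1:]))

x = np.array([2.0, 1.0, 0.5])
z = np.array([3.0, -1.0, 2.0])
e = np.array([1.0, 0.0, 0.0])
assert np.allclose(jordan_mul(x, z), arw(x) @ z)        # x o z = Arw(x) z
assert np.allclose(jordan_mul(x, z), jordan_mul(z, x))  # commutativity
assert np.allclose(jordan_mul(e, x), x)                 # e is the identity
```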
This means that b(x) is defined on Int Q_{n_i+1} and tends to infinity as x tends to any point on the boundary of Q_{n_i+1}. Also observe that b(x) is strictly and strongly convex on Int Q_{n_i+1}. When Q is the direct sum of simple second-order cones Q_{n_i+1}, define b(x) = Σ_i b(x_i).
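The boundary behavior of the barrier is a one-line numerical check (again a NumPy sketch, not the MATLAB implementation):

```python
import numpy as np

def barrier(x):
    """Log barrier b(x) = -ln(x_0^2 - ||xbar||^2), defined on Int Q."""
    return -np.log(x[0] ** 2 - x[1:] @ x[1:])

# b vanishes at the identity element e and blows up as x approaches the
# boundary x_0 = ||xbar||, in line with the text.
assert barrier(np.array([1.0, 0.0])) == 0.0
vals = [barrier(np.array([1.0, 1 - 10.0 ** (-k)])) for k in (1, 4, 8)]
assert vals[0] < vals[1] < vals[2]
```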
In traditional interior point methods, the Primal problem is replaced by

    min   c^T x + µ b(x)
    s.t.  Ax = b

with the understanding that x is restricted to the interior of Q. Applying the Karush-Kuhn-Tucker conditions and simplifying, we arrive at the following system of equations:

    Ax = b
    A^T y + z = c
    x ∘ z = µe

where e = (e_1; ...; e_r) and each e_i = (1; 0).

Definition. The set of points (x_µ; y_µ; z_µ) that satisfy this system as µ varies traverses a path called the central path.

Note that x_i(µ)^T z_i(µ) = µ and thus x(µ)^T z(µ) = µr; therefore, as µ tends to zero, the duality gap x(µ)^T z(µ) also tends to zero, and thus x(µ), y(µ) and z(µ) tend to the optimal solution. It can be shown that for every x(µ) and z(µ) on the central path the matrices Arw(x_µ) and Arw(z_µ) commute, but outside of the central path this need not hold.

The next step in solving SOCP problems in traditional interior point methods is to apply Newton's method to the nonlinear system above. This amounts to replacing (x; y; z) by (x + Δx; y + Δy; z + Δz), with (Δx; Δy; Δz) as our unknowns. In this approach, all nonlinear terms in the Δ's are removed and the Newton system, that is, the following linear system, is solved:

    A Δx = b - Ax
    A^T Δy + Δz = c - A^T y - z
    z ∘ Δx + x ∘ Δz = µe - x ∘ z

Most interior point algorithms apply some linear transformation to x and z before this step in order to simplify the analysis; refer to [] and the relevant references there for details. Note that the last equation can be written as

    Arw(z) Δx + Arw(x) Δz = µe - x ∘ z.

Thus in block matrix form the system can be written as

    ( A       0     0      ) ( Δx )   ( r_p )
    ( 0       A^T   I      ) ( Δy ) = ( r_d )
    ( Arw(z)  0     Arw(x) ) ( Δz )   ( r_c )

where r_p = b - Ax, r_d = c - A^T y - z, and r_c = µe - x ∘ z. The algorithms proceed as follows: at each iteration the system is solved for a given value of µ and a new solution x_new = x + α Δx, y_new = y + β Δy, z_new = z + β Δz is calculated. Then µ is reduced by some factor, and (x_new; y_new; z_new) is used in place of (x; y; z).
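Assembling and solving this block Newton system directly is straightforward; a minimal NumPy sketch on hypothetical toy data (this is only an illustration of the system's structure, not the authors' code):

```python
import numpy as np

def arw(x):
    """Arrow matrix Arw(x); Arw(x) z equals the Jordan product x o z."""
    A = x[0] * np.eye(x.size)
    A[0, 1:] = x[1:]
    A[1:, 0] = x[1:]
    return A

def newton_step(A, b, c, x, y, z, mu):
    """Assemble and solve the linearized system
         A dx                  = b - A x
             A^T dy +       dz = c - A^T y - z
         Arw(z) dx + Arw(x) dz = mu*e - Arw(x) z."""
    m, n = A.shape
    e = np.zeros(n); e[0] = 1.0
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A                      # primal feasibility row
    K[m:m + n, n:n + m] = A.T          # dual feasibility rows
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = arw(z)             # linearized complementarity
    K[m + n:, n + m:] = arw(x)
    rhs = np.concatenate([b - A @ x, c - A.T @ y - z, mu * e - arw(x) @ z])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

# Tiny hypothetical instance with one 3-dimensional cone block.
A = np.array([[1.0, 0.2, 0.1]])
x = np.array([2.0, 0.5, 0.3]); z = np.array([1.5, -0.2, 0.4])
y = np.array([0.7]); b = A @ x; c = A.T @ y + z     # feasible start
dx, dy, dz = newton_step(A, b, c, x, y, z, mu=0.1)
assert np.allclose(A @ dx, 0)                # direction keeps primal feasibility
assert np.allclose(A.T @ dy + dz, 0)         # direction keeps dual feasibility
```

As the text notes below, this property of the direction (feasibility is preserved from a feasible point) is exactly what the Q method's change of variables gives up.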
The new system is solved again, and the process is continued until we get sufficiently close to the optimal solution. With judicious choices of step lengths, reduction factor of µ, and scaling of x, y, z, it can be shown that the resulting algorithm (or class of algorithms) converges to the optimal solution in polynomial time; see the references in []. There are two issues with the class of algorithms just described. When solving the Newton system we end up solving equations with the Schur complement

    A Arw(z)^{-1} Arw(x) A^T.
Without any scaling of x and z, Arw(x) and Arw(z) do not commute and the Schur complement is not a symmetric matrix for points not on the central path. With certain scalings applied to x and z at each iteration, it is possible to transform the problem to an equivalent one where Arw(x) and Arw(z) in the transformed space commute, making the Schur complement symmetric and positive semidefinite; see [] for details and references. However, since Newton's method is then applied to a different system at each iteration, technically it is not Newton's method any more, and in particular the Newton system may converge to a singular system. The Q method is devised to ensure that we deal with a symmetric Schur complement at each iteration while keeping the Newton system nonsingular even at the optimum.

2.3 Further algebraic properties of second-order cones

Before describing the Q method we need to develop some machinery that simplifies the presentation. For the moment let us assume that we are dealing with a simple second-order cone Q_{n+1} and vectors have just one block. As we saw, the binary operation x ∘ z is commutative; however, the associative law does not hold in general. Instead a weaker condition holds. Writing x^2 = x ∘ x, it is straightforward to verify that

    x^2 ∘ (x ∘ y) = x ∘ (x^2 ∘ y).

This identity is equivalent to stating that Arw(x) and Arw(x^2) commute. An important fact is that the second-order cone Q is the cone of squares under the ∘ operation:

    Q_{n+1} = { x ∘ x : x ∈ R^{n+1} }.

There are two elementary but very important identities that are satisfied by all vectors. The first one is the quadratic identity

    x ∘ x - 2 x_0 x + (x_0^2 - ||x̄||^2) e = 0.

The polynomial λ^2 - 2 x_0 λ + (x_0^2 - ||x̄||^2) is called the characteristic polynomial of x and plays a role similar to that of the characteristic polynomial of symmetric matrices. The roots of this polynomial are x_0 ± ||x̄|| and are called the eigenvalues of x. Only when x̄ = 0, that is, when x is a scalar multiple of the identity element e, are the two eigenvalues equal.
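Both the quadratic identity and the eigenvalue formula can be verified mechanically (a NumPy sketch for illustration):

```python
import numpy as np

def jordan_mul(x, z):
    """Jordan product x o z = (x^T z; x_0 zbar + z_0 xbar)."""
    return np.concatenate(([x @ z], x[0] * z[1:] + z[0] * x[1:]))

x = np.array([3.0, 2.0, -1.0, 0.5])
x0, xbar = x[0], x[1:]
e = np.zeros_like(x); e[0] = 1.0
# Quadratic identity: x o x - 2 x_0 x + (x_0^2 - ||xbar||^2) e = 0.
assert np.allclose(jordan_mul(x, x) - 2 * x0 * x + (x0**2 - xbar @ xbar) * e, 0)
# The roots of the characteristic polynomial are x_0 +- ||xbar||.
lam = np.roots([1.0, -2 * x0, x0**2 - xbar @ xbar])
nrm = np.linalg.norm(xbar)
assert np.allclose(np.sort(lam), np.sort([x0 - nrm, x0 + nrm]))
```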
The second important identity is

    x = (x_0 + ||x̄||) (1/2)(1; x̄/||x̄||) + (x_0 - ||x̄||) (1/2)(1; -x̄/||x̄||).

Defining

    c_1 := (1/2)(1; x̄/||x̄||),   c_2 := (1/2)(1; -x̄/||x̄||),
    λ_1 := x_0 + ||x̄||,          λ_2 := x_0 - ||x̄||,

it follows that x can be written as

    x = λ_1 c_1 + λ_2 c_2.

These relationships are the spectral decomposition of x. They play much the same role as the eigenvalue decomposition of symmetric matrices. Observe that

    c_1 ∘ c_2 = 0,
    c_1 ∘ c_1 = c_1 and c_2 ∘ c_2 = c_2,
    c_1 + c_2 = e,
    c_2 = R c_1 and c_1 = R c_2,
    c_1, c_2 ∈ bd Q.
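The spectral decomposition and the frame properties above can be confirmed numerically (NumPy sketch; R below is the matrix Diag(1, -I) from the notation section):

```python
import numpy as np

def jordan_mul(x, z):
    return np.concatenate(([x @ z], x[0] * z[1:] + z[0] * x[1:]))

def spectral(x):
    """Return lam1, lam2, c1, c2 with x = lam1*c1 + lam2*c2 (assumes xbar != 0)."""
    nrm = np.linalg.norm(x[1:])
    u = x[1:] / nrm
    c1 = 0.5 * np.concatenate(([1.0], u))
    c2 = 0.5 * np.concatenate(([1.0], -u))
    return x[0] + nrm, x[0] - nrm, c1, c2

x = np.array([3.0, 1.0, 2.0])
lam1, lam2, c1, c2 = spectral(x)
e = np.array([1.0, 0.0, 0.0])
R = np.diag([1.0, -1.0, -1.0])
assert np.allclose(lam1 * c1 + lam2 * c2, x)   # spectral decomposition of x
assert np.allclose(jordan_mul(c1, c2), 0)      # c1 o c2 = 0
assert np.allclose(jordan_mul(c1, c1), c1)     # frame elements are idempotent
assert np.allclose(c1 + c2, e)                 # c1 + c2 = e
assert np.allclose(R @ c1, c2)                 # c2 = R c1
```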
Any pair of vectors {c_1, c_2} that satisfies the properties above is called a Jordan frame. It can be shown that every Jordan frame {c_1, c_2} is of the form above; that is, Jordan frames are pairs of vectors (1/2)(1; ±u) with ||u|| = 1. When a vector has distinct eigenvalues, that is, when x̄ ≠ 0, the Jordan frame is determined uniquely. Therefore, the only vectors whose associated Jordan frames are ambiguous are multiples of the identity vector e. For these vectors both eigenvalues are equal, and it turns out that we can take any Jordan frame as their associated Jordan frame. This issue will cause a slight difficulty in our algorithm, which we will address when we discuss the Q method.

Figure 1: Construction of Jordan frames from a vector x.

We can also define the trace and determinant of x: tr(x) = λ_1 + λ_2 = 2 x_0 and det(x) = λ_1 λ_2 = x_0^2 - ||x̄||^2. Thus the barrier function defined earlier can be written as b(x) = -ln det(x). From the spectral decomposition we can prove that the binary operation ∘ is power associative; that is, as long as we are multiplying many copies of x by itself, the order of multiplication does not matter. Thus for any nonnegative integer p we can write x^p unambiguously. Another consequence is that the second-order cone consists of those vectors x whose eigenvalues are both nonnegative, and the interior of Q consists of vectors whose eigenvalues are both positive. Using the spectral decomposition one can extend any real-valued continuous function f defined on R to vectors: for x = λ_1 c_1 + λ_2 c_2,

    f(x) := f(λ_1) c_1 + f(λ_2) c_2   whenever f(λ_1), f(λ_2) are defined.

In particular, when x = λ_1 c_1 + λ_2 c_2, we may define

    x^{-1} := λ_1^{-1} c_1 + λ_2^{-1} c_2   when λ_1 ≠ 0 and λ_2 ≠ 0.

It is easily verified that x ∘ x^{-1} = e. Notice that the condition x ∘ z = e for a given x may be satisfied by an infinite number of vectors z; in other words, for the binary operation ∘ there is no unique inverse in general. Therefore when we say inverse we mean the one defined above. Also note that the definition of the central path is not ambiguous.
It turns out that the only
vectors x for which there is more than one z with x ∘ z = µe are those with det(x) = 0. Since on the central path we assume that x ∈ Int Q, it follows that det(x) > 0 and z is unique.

As we saw earlier, for two vectors x and z it is not necessarily the case that the matrices Arw(x) and Arw(z) commute. It turns out that the case when Arw(x) and Arw(z) commute is an important property that can be exploited in the design of interior point algorithms, including the Q method. Therefore we present the following:

Definition. Two vectors x and z operator commute if Arw(x) and Arw(z) commute.

Two matrices commute if and only if they share a system of common eigenvectors. Moreover, the eigenvalues and eigenvectors of Arw(x) are described by the Jordan frame associated with x.

Lemma. Let the spectral decomposition of x be x = λ_1 c_1 + λ_2 c_2, where λ_{1,2} and c_{1,2} are as above. Then the eigenvalues and eigenvectors of Arw(x) are given as follows:

i. x_0 + ||x̄|| is an eigenvalue with corresponding eigenvector c_1.

ii. x_0 - ||x̄|| is an eigenvalue with corresponding eigenvector c_2.

iii. All other eigenvalues are equal to x_0, and their corresponding eigenvectors are any set of orthonormal vectors orthogonal to c_1 and c_2.

It is fairly simple to verify this lemma. As a direct consequence we get:

Theorem. Two vectors x and z operator commute if and only if they share a common Jordan frame. In other words, if x and z operator commute, then there is a Jordan frame {c_1, c_2} such that x = λ_1 c_1 + λ_2 c_2 and z = ω_1 c_1 + ω_2 c_2.

On the central path the vectors x and z operator commute, since x = µ z^{-1} for µ > 0. Similarly, it is not hard to show that at the optimum x and z operator commute. Therefore, the complementary slackness theorem may be restated as:

Theorem. x = (x_1; ...; x_r) is a primal optimal and (y, z) with z = (z_1; ...; z_r) a dual optimal solution if and only if

i. there exist Jordan frames {c_{i1}, c_{i2}} for i = 1, ..., r and pairs of real numbers λ_{i1}, λ_{i2} and ω_{i1}, ω_{i2} such that x_i = λ_{i1} c_{i1} + λ_{i2} c_{i2} and z_i = ω_{i1} c_{i1} + ω_{i2} c_{i2} for i = 1, ..., r, and

ii.
λ_{ij} ω_{ij} = 0 for i = 1, ..., r and j = 1, 2.

It is possible to apply an orthogonal transformation to any Jordan frame {c_1, c_2} to change it to another Jordan frame {f_1, f_2}. Such transformations are essential for the Q method. More specifically, if {c_1, c_2} and {f_1, f_2} are two Jordan frames, then we wish to find a linear transformation that maps c_1 to f_1, c_2 to f_2, and the second-order cone Q_{n+1} back to itself. However, in general there may be many such transformations. To make the choice unique we take this transformation to be the unique rotation in the plane spanned by c_1 and f_1 that fixes the identity element e. This rotation will also map c_2 to f_2 and thus the frame {c_1, c_2} to {f_1, f_2}. In particular, when we use the standard frame {d_1, d_2}, defined by setting d_1 = (1/2; 1/2; 0) and d_2 = (1/2; -1/2; 0), the rotation that maps the frame {c_1, c_2} to {d_1, d_2} is

    Q_x := ( 1,  0;
             0,  x_1/||x̄||,   x̃^T/||x̄||;
             0,  -x̃/||x̄||,   I - x̃ x̃^T / (||x̄|| (||x̄|| + x_1)) )

         = ( 1,  0;
             0,  2 c_1,   2 c̃^T;
             0,  -2 c̃,   I - 4 c̃ c̃^T / (1 + 2 c_1) )
Figure 2: Regardless of the dimension, a two-dimensional rotation by θ in the plane spanned by c_1 and d_1 can map a frame {c_1, c_2} to the frame {d_1, d_2}.

where x̃ = (x_2; ...; x_n), and similarly for c̃. Also c_1 is the entry of c indexed by 1, noting that c = (c_0; c_1; ...) is indexed from 0. Observe that this transformation really depends on the Jordan frame of x rather than on x itself; thus all vectors sharing the same Jordan frame will have the same Q_x. We also note two special cases. First, if c_1 = d_1 and c_2 = d_2, then Q_x = I. Second, if c_1 = d_2 and c_2 = d_1, then Q_x = Q_flip, the rotation by π that maps d_1 to d_2 and d_2 to d_1.

It is well known that if S is a skew-symmetric matrix then the matrix exp(S) = Σ_{k≥0} S^k / k! is orthogonal. However, we are not interested in all orthogonal matrices, but only those that are of the form Q_x for some x. It turns out that a particular subset of skew-symmetric matrices generates all Q_x through the matrix exponential map. More precisely, let

    L := { Q_c : c = (c_0; c_1; c̃) and {c, Rc} is a Jordan frame }.

Now define the set

    l := { S_s = ( 0, 0, 0 ; 0, 0, -s^T ; 0, s, 0 ) : s ∈ R^{n-1} }

and l_π = { S_s ∈ l : ||s|| < π }. Then we have:

Proposition. The exponential map exp maps l onto L and l_π onto L \ {Q_flip}; the latter map is one-to-one.
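The defining properties of Q_x (orthogonality, fixing e, and mapping the frame of x to the standard frame) can be checked for a single block as below. The sign conventions inside the matrix are one consistent reading of the displayed formula, not necessarily the authors' exact layout:

```python
import numpy as np

def q_rot(x):
    """Rotation Q_x fixing e = (1; 0) that maps the Jordan frame of
    x = (x_0; xbar) to the standard frame {d1, d2} (a sketch)."""
    u = x[1:] / np.linalg.norm(x[1:])   # unit vector, u = 2*cbar
    n = u.size
    Qt = np.empty((n, n))
    Qt[0, 0] = u[0]
    Qt[0, 1:] = u[1:]
    Qt[1:, 0] = -u[1:]
    Qt[1:, 1:] = np.eye(n - 1) - np.outer(u[1:], u[1:]) / (1 + u[0])
    Q = np.eye(n + 1)
    Q[1:, 1:] = Qt                      # Q fixes the x_0 coordinate
    return Q

x = np.array([3.0, 1.0, 2.0, -2.0])
Q = q_rot(x)
u = x[1:] / np.linalg.norm(x[1:])
c1 = 0.5 * np.concatenate(([1.0], u))        # Jordan frame element of x
d1 = np.zeros(4); d1[0] = d1[1] = 0.5        # standard frame element
e = np.zeros(4); e[0] = 1.0
assert np.allclose(Q @ Q.T, np.eye(4))       # Q is orthogonal
assert np.allclose(Q @ e, e)                 # Q fixes the identity element
assert np.allclose(Q @ c1, d1)               # Q maps c1 to d1
```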
Proof: For any S_s ∈ l,

    S_s^2 = ( 0, 0, 0 ; 0, -||s||^2, 0 ; 0, 0, -s s^T ),

and hence S_s^{2k+1} = (-1)^k ||s||^{2k} S_s and S_s^{2k+2} = (-1)^k ||s||^{2k} S_s^2. Hence for s ≠ 0,

    exp(S_s) = I + ((1 - cos||s||)/||s||^2) S_s^2 + ((sin||s||)/||s||) S_s.

When s = 0, then S_s = 0 and exp(0) = I. Now let c ∈ R^{n+1} be such that {c, Rc} is a Jordan frame; thus, in particular, c_0 = 1/2. Assume |c_1| < 1/2. Then there is a unique 0 < α < π such that cos α = 2 c_1 and sin α = 2 ||c̃||. Since c̃ ≠ 0, let s = α c̃/||c̃||; then exp(S_s) = Q_c. On the other hand, given S_s ∈ l with S_s ≠ 0, let

    c = (1/2) ( 1 ; cos||s|| ; (sin||s||/||s||) s ).

Then exp(S_s) = Q_c ∈ L. For c_1 = 1/2 we have Q_c = I, and for c_1 = -1/2 note that as ||s|| → π the matrix Q_c tends to Q_flip.

Corollary. Every Q in L maps the second-order cone Q onto itself.

Proof: It is well known that the orthogonal transformations that fix the coordinate x_0 and act on x̄ of vectors x ∈ Q are part of the automorphism group of Q; see [4]. And L is a subset of the orthogonal transformations that fix e.

Instead of the exponential map we may use the Cayley transform to map skew-symmetric matrices to orthogonal matrices:

    cay(S) := (I + S/2)(I - S/2)^{-1}.

For the special case of S_s we write cay(s) for cay(S_s).

Proposition. Under the Cayley transform, l is mapped onto L \ {Q_flip}. Furthermore, this map is one-to-one.

Proof: When ||s|| < 2, all eigenvalues of S_s are smaller than two in modulus. Therefore the power series expansion of cay(S_s) is defined and can be summed as follows:

    cay(S_s) = (I + S_s/2)(I - S_s/2)^{-1} = I + (4/(4 + ||s||^2)) S_s + (2/(4 + ||s||^2)) S_s^2.

Since the right-hand side is well defined even for ||s|| ≥ 2, we use it as the definition of the Cayley transform for any S ∈ l. It is not hard to see that given S_s ∈ l, the Cayley transform of S_s is Q_c where

    c = (1/2) ( 1 ; (4 - ||s||^2)/(4 + ||s||^2) ; 4s/(4 + ||s||^2) ).

Next let Q_c ∈ L, where {c, Rc} is a Jordan frame, c_0 = 1/2 and c_1 ≠ -1/2. Set s = 2c̃/(c_1 + 1/2). Then cay(s) = Q_c. Note that as ||s|| → ∞ with s/||s|| fixed, cay(s) converges to Q_d where d = (1/2; -1/2; 0).

The propositions above show that the tangent space to L at the identity I is l.
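A small numerical check of the Cayley transform (NumPy sketch; the layout of S_s follows my reading of the definition of l above):

```python
import numpy as np

def s_mat(s):
    """Skew-symmetric S_s in l: first row and column zero, inner block
    (0, -s^T; s, 0)."""
    n = s.size + 2
    S = np.zeros((n, n))
    S[1, 2:] = -s
    S[2:, 1] = s
    return S

def cay(S):
    """Cayley transform cay(S) = (I + S/2)(I - S/2)^{-1}."""
    n = S.shape[0]
    return (np.eye(n) + S / 2) @ np.linalg.inv(np.eye(n) - S / 2)

Q = cay(s_mat(np.array([0.3, -0.4])))
e = np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(Q @ Q.T, np.eye(4))   # Cayley of a skew matrix is orthogonal
assert np.allclose(Q @ e, e)             # it fixes the identity element e
# cay and exp share their first two power-series terms: cay(S) ~ I + S.
S = s_mat(np.array([1e-6, 0.0]))
assert np.allclose(cay(S), np.eye(4) + S, atol=1e-9)
```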
3 Description of the Q Method

As stated earlier, the traditional interior point methods for SOCP are based on applying Newton's method to the perturbed optimality system. Assume that (x; y; z) is our current estimate of the optimal solution. In general there is no reason for x and z to operator commute, even though the optimal x and z do. In many traditional interior point algorithms a transformation is applied to x and z so that in the transformed space they operator commute. The price paid for this operation is that the algorithm deviates from Newton's method. In particular, the Schur complement A Arw(z)^{-1} Arw(x) A^T may be undefined or singular at the optimal point.

3.1 Change of variables by spectral decomposition

The Q method imposes the requirement that x and z operator commute at all times. To achieve this, we make a change of variables by replacing the blocks x_i and z_i with their spectral decompositions. Furthermore, we require that in their spectral decompositions x_i and z_i share a common Jordan frame {c_{i1}, c_{i2}}. Thus, if x = (x_1; ...; x_r) and z = (z_1; ...; z_r), where each x_i and z_i are in the simple second-order cone Q_{n_i+1}, then

    x_i ← λ_{i1} c_{i1} + λ_{i2} c_{i2},
    z_i ← ω_{i1} c_{i1} + ω_{i2} c_{i2}.

With this change of variables the central path system becomes

    Σ_{i=1}^r A_i (λ_{i1} c_{i1} + λ_{i2} c_{i2}) = b
    A_i^T y + ω_{i1} c_{i1} + ω_{i2} c_{i2} = c_i   for i = 1, ..., r
    λ_{i1} ω_{i1} = λ_{i2} ω_{i2} = µ               for i = 1, ..., r.

The difficulty with this change of variables is that we have to somehow ensure that {c_{i1}, c_{i2}} is a Jordan frame. To this end we write c_{i1} = Q_{c_i} d_{i1} and c_{i2} = Q_{c_i} d_{i2}, where {d_{i1}, d_{i2}} is the standard frame and Q_{c_i} ∈ L. Next we express each Q_{c_i} by either exp(S_{s_i}) or cay(S_{s_i}), where S_{s_i} ∈ l. With these modifications the system can be written as

    Σ_{i=1}^r A_i exp(S_{s_i}) (λ_{i1} d_{i1} + λ_{i2} d_{i2}) = b
    A_i^T y + exp(S_{s_i}) (ω_{i1} d_{i1} + ω_{i2} d_{i2}) = c_i   for i = 1, ..., r
    λ_{i1} ω_{i1} = λ_{i2} ω_{i2} = µ                              for i = 1, ..., r.

3.2 The Newton direction

The next step is to apply Newton's method to this system. First, for a multiple-block system, as usual, we set λ = (λ_{11}; λ_{12}; ...; λ_{r1}; λ_{r2}), ω = (ω_{11}; ω_{12}; ...; ω_{r1}; ω_{r2}), s = (s_1; ...; s_r), S_s = Diag(S_{s_1}, ..., S_{s_r}), and Q = Diag(Q_1, ..., Q_r). The same applies to Δλ, Δω, etc. To derive the next approximation to the optimal solution using Newton's method we make the following substitutions in the primal and dual feasibility and complementarity relations:

    λ ← λ + Δλ,   y ← y + Δy,   ω ← ω + Δω,   Q ← Q exp(S_{Δs})  or  Q ← Q cay(S_{Δs}).

To update the Q_i we may use cay(Δs_i) instead of the exponential map. Once we make this substitution we drop all nonlinear terms involving Δs_i, Δλ, Δy, and Δω and solve the resulting linear system in these variables. For exp(S_{Δs}) we simply write I + S_{Δs} and drop the higher-order terms of its power series expansion. Since cay and exp have the same first two terms in their power series expansions, we make the same replacement if we are using the Cayley transform. After simplification, the primal feasibility, dual feasibility and complementarity conditions result in the following linear Newton system:

    Σ_{i=1}^r B_i ( (Δλ_{i1} + Δλ_{i2})/2 ; (Δλ_{i1} - Δλ_{i2})/2 ; ((λ_{i1} - λ_{i2})/2) Δs_i ) = r_p
    B_i^T Δy + ( (Δω_{i1} + Δω_{i2})/2 ; (Δω_{i1} - Δω_{i2})/2 ; ((ω_{i1} - ω_{i2})/2) Δs_i ) = Q_i^T r_{d_i}   for i = 1, ..., r
    λ_{ij} Δω_{ij} + ω_{ij} Δλ_{ij} = µ - λ_{ij} ω_{ij}   for i = 1, ..., r, j = 1, 2,

where

    B_i = A_i Q_i,   r_p = b - Ax,   r_d = c - A^T y - z,   r_c = µ1 - λ*ω.

Here λ*ω is the vector which is the component-wise product of the entries of λ and ω, and r_d = (r_{d_1}; ...; r_{d_r}). Note that when x is primal feasible then r_p = 0; similarly, when (y; z) is dual feasible then r_d = 0. In general, even if (x; y; z) is feasible, the solution to this Newton system is not necessarily a primal or dual feasible direction. This is due to the fact that the change of variables has turned both the primal and dual feasibility equations nonlinear. In contrast, in traditional interior point methods, if (x; y; z) is primal (respectively dual) feasible, then the Newton direction is primal (respectively dual) feasible.

Let P = Diag(P_1, ..., P_r) where P_i = (1/√2)(1, 1; 1, -1). Note that P^{-1} = P^T = P. After collecting the first two columns of each B_i into B̄, the remaining columns of each B_i into B̂, and splitting Q^T r_d accordingly into r̄_d and r̂_d, we can rewrite the Newton system as

    P Δω + B̄^T Δy = r̄_d,
    ((ω_{i1} - ω_{i2})/2) Δs_i + B̂_i^T Δy = r̂_{d_i},   i = 1, ..., r,
    B̄ P Δλ + Σ_{i=1}^r ((λ_{i1} - λ_{i2})/2) B̂_i Δs_i = r_p,
    Λ Δω + Ω Δλ = r_c.

Let E_i := ((ω_{i1} - ω_{i2})/2) I, E = Diag(E_1, ..., E_r), D_i := ((λ_{i1} - λ_{i2})/2) I, and D = Diag(D_1, ..., D_r), and let Λ = Diag(λ) and Ω = Diag(ω).
11 To update Q i we may use cays i instead of the exponential map. Once we make this substitution in 8 we drop all nonlinear terms involving s i, λ, y, and ω and solve the resulting linear system in these variables. For exps s we simply write I + S s and drop higher order terms of its power series expansion. Since cay and exp have the same first two terms in their power series expansion we make the same replacement if we are using the Cayley transform. After simplification the primal feasibility, dual feasibility and complementarity conditions result in the following linear Newton system: λ i + λ i i B i λ i λ i λi λ = r p i s 9 ω i + ω i B i y + ω i ω i ωi ω = Q i r d i for i =,..., r i s λ ij ω ij + ω ij λ ij = µ λ ij ω ij for i =,..., r, j =,, where B i = A i Q i r p = b Ax r d = c A y z r c = µ λ ω Here λ ω is a vector which is the component-wise product of entries of λ and ω. Also r d = rd ;... ; r d r. Note that when x is primal-feasible then rp = 0. Similarly when y; z is dualfeasible then r d = 0. In general even if x; y; z is feasible the solution to the Newton system 9 is not necessarily primal or dual feasible direction. This is due to the fact that the change of variables has turned both primal and dual feasibility equations nonlinear. In contrast, in traditional interior point methods, if x; y; z is primal respectively dual feasible, then the Newton direction is primal respectively dual feasible. Let P = DiagP,..., P r where P i =, ;,. Note that P = P. Finally let Λ = Diagλ and Ω = Diagω. After collecting the first two columns of each B i into B, the remaining columns of each B i into ˆB, and splitting Q T r d accordingly to r d and ˆr d, we can rewrite the Newton system as P ω + B T y = r d, 30 ω i ω i s i + ˆB i T y = ˆr d i i =,..., n, n λi λ i BP λ + ˆB i s i = r p, i= Λ ω + Ω λ = r c. def Let E i = ωi ωi def I, E = DiagE,..., E r, D i = λi λi I, and D = DiagD,..., D r. 
Then the solution to the Newton linear system is given by y = BP Ω ΛP T BT ˆBDE ˆBT r p BP Ω r c ˆBDE ˆr d + BP Ω ΛP T r d ω = P r d 3 B T y λ = Ω ˆr r c Λ ω s = E d ˆB T y.
12 Comparison of Newton systems in the Q and traditional methods. In traditional methods our variables are x, y, z. When we have r blocks, i n i = n and A has m rows then we will have a total n + m variables. In the Q method for each block of size n i we have λ i, λ i, ω i, ω i and s i, for a total of n + r + m variables.. The Schur complement BP Ω ΛP T BT ˆBDE ˆBT is a symmetric positive definite matrix. This can be seen by writing the Schur complement as AQDiag P Ω i Λ i P T, D i E i Q T A T, The traditional interior point methods require a scaling in order to get a symmetric Schur complement. 3.3 Centrality requirement To ensure that the sequence of iterates in the Q method converges to an optimal solution when one exists, we define a region that includes the optimal solution and require that each new approximation remain in this region. The region or neighborhood N γ c, γ p, γ d depends on several parameters. Let ɛ p, ɛ d, and ɛ c be our tolerance for, respectively, primal infeasibility, dual infeasibility, and duality gap. In Appendix A we will explain the details of dependence of γ p, γ d, γ c on ɛ p, ɛ d, ɛ c and other parameters derived from the data. For now let us simply assume that the γ s are given. Then the neighborhood N is defined as { N γ c, γ p, γ d def = λ, ω, y, Q λ R r, ω R r, y R m, Q L, λ > 0, ω > 0, λ λ ij ω ij γ ω c r i =,..., r, j =, and λ AQ AQ ω γ p P λ b or P λ b ɛp and λ A ω γ d y + Q P ω c or A y + Q P } ω c ɛ d The first inequality is the centrality condition; it ensures that none of the products λ ij ω ij approach zero much faster than others. The second and third inequalities guarantee that the complementarity is not attained before primal or the dual feasibility. Obviously, when γ c, γ p, γ d γ c, γ p, γ d, N γ c, γ p, γ d N γ c, γ p, γ d, and γ c,γ p,γ d >0 N γ c, γ p, γ d = {λ, ω, y, Q: λ > 0, ω > 0}. 
Therefore, if (λ^k, ω^k, y^k, Q^k) ∈ N(γ_c, γ_p, γ_d) is a sequence of points where (λ^k)^T ω^k approaches zero, then (λ^k, ω^k, y^k, Q^k) necessarily approaches a point in the optimal solution set. The Q method is summarized in Algorithm 1 below. Here we have used the Cayley transformation; however, updating the Q_i through the exponential map can be achieved in a similar way, and the analysis presented in Appendix A carries over with slight modifications of constants. One issue that we need to address here is that if λ_{i1}^k = λ_{i2}^k or ω_{i1}^k = ω_{i2}^k, then x_i^k is a multiple of the identity vector. In this case the Jordan frame is not unique; indeed, as was stated earlier, every Jordan frame can be taken as the associated frame of a multiple of the identity vector. The result is that the vector Δs may not be uniquely determined, and in fact the Newton system may be singular. To avoid this situation, we take care to ensure that the eigenvalues of the primal and dual blocks are distinct
Algorithm 1

    Choose 0 < σ_1 < σ_2 < σ_3 < 1 and Υ > 0. To start from an arbitrary point (λ^0, ω^0, y^0, Q^0), one may select 0 < γ_c < 1, γ_p > 0, γ_d > 0 so that (λ^0, ω^0, y^0, Q^0) ∈ N(γ_c, γ_p, γ_d).
    Do
        1. Set µ = σ_1 (λ^k)^T ω^k / (2r).
        2. Compute the search direction (Δλ, Δω, Δy, Δs) from the Newton system.
        3. Choose step sizes α, β, γ and set λ^{k+1} = λ^k + α Δλ, y^{k+1} = y^k + β Δy, ω^{k+1} = ω^k + β Δω, and Q^{k+1} = Q_{c^{k+1}}, where c^{k+1} is computed from the exponential-map formula for exp(γ S_{Δs^k}) or from the Cayley formula for cay(γ Δs^k).
        4. k ← k + 1.
    Until either: ||r_p^k|| < ε_p, ||r_d^k|| < ε_d, and (λ^k)^T ω^k < ε_c; or: ||(λ^k; ω^k)|| > Υ.
    End

(Eigenvalues of different blocks may be equal; this will not cause any difficulty.) By careful choice of step sizes we can always ensure that λ_{i1}^{k+1} > λ_{i2}^{k+1} > 0 and ω_{i1}^{k+1} > ω_{i2}^{k+1} > 0 for each block i. First, if ω_{i2}^{k+1} > ω_{i1}^{k+1}, or λ_{i2}^k > λ_{i1}^k, then we simply swap the corresponding Jordan frame; that is, (c_{i1}, c_{i2}) is replaced by (c_{i2}, c_{i1}). This transformation is geometrically equivalent to a rotation by π and algebraically equivalent to multiplication by Q_flip. Now let ω_{i1}^k > ω_{i2}^k. Only when Δω_{i1} ≠ Δω_{i2} and β = (ω_{i1}^k - ω_{i2}^k)/(Δω_{i2} - Δω_{i1}) is it possible that ω_{i1}^k + β Δω_{i1} = ω_{i2}^k + β Δω_{i2}. In this case, instead of taking a uniform step size of length β, we simply take a smaller step size β_i for each block i (thus the β_i are not necessarily equal); for instance, β_i can be set to a value slightly below β. In this way we guarantee that in each block the eigenvalues are distinct for both x_i and z_i. Note that Algorithm 1 does not require a large number of numerical operations to calculate the updates of Q_i, since we use the closed-form rational expression for the Cayley transform and the closed-form trigonometric expression for the exponential map.

Let ᾱ^k be the maximum α ∈ [0, 1] such that for any α' ∈ [0, α],

    (λ^k + α' Δλ, ω^k + α' Δω, y^k + α' Δy, Q^k (I + (α'/2) S)(I - (α'/2) S)^{-1}) ∈ N,   and
    (λ^k + α' Δλ)^T (ω^k + α' Δω) ≥ [1 - α'(1 - σ_2)] (λ^k)^T ω^k.

The step sizes α ∈ (0, 1], β ∈ (0, 1], γ ∈ (0, 1] are chosen so that

    (λ^{k+1}, ω^{k+1}, y^{k+1}, Q^{k+1}) ∈ N(γ_c, γ_p, γ_d),   and
    (λ^{k+1})^T ω^{k+1} ≤ [1 - ᾱ^k (1 - σ_3)] (λ^k)^T ω^k.

Since σ_1 < σ_2 < σ_3, the primal and dual step sizes are not necessarily equal.
One important property of the Q method is that under primal and dual nondegeneracy and strict complementarity (see [] for definitions) the Newton system is nonsingular
at the optimum. A similar result holds for the so-called AHO method for SOCP (see []), but not for many other traditional methods. We will not prove the nonsingularity of the Newton system here, since it is proved in a more general setting in [8]. In Appendix A we prove that with an appropriate choice of step lengths the algorithm converges to an (ε_p, ε_d, ε_c)-optimal solution in a finite number of iterations.

4 Numerical Results

To test the Q method, we have implemented a version of Algorithm 1 in MATLAB. Below we present the results of our tests on 1,000 randomly generated problems with known solutions. We test the algorithm using the Cayley transform for the most part. For the step sizes we set α = min(1, τ ᾱ), β = min(1, τ β̄), γ = αβ, where ᾱ is the maximum possible step size such that x + ᾱ Δx ∈ Q; similarly, β̄ is the maximum step size such that z + β̄ Δz ∈ Q. These choices are in line with implementations of traditional methods.

4.1 Test results on randomly generated problems

We use starting points x_i^0 and s_i^0 with zero tail entries and y^0 = 0. We pick σ_1 = 0.5 and a fixed τ. Our code reduced the l_2 norm of primal infeasibility, the l_2 norm of dual infeasibility, and the duality gap to less than 5.0e for all problems. The data for A, b and c were randomly generated between -0.5 and 0.5; thus, in most cases the magnitudes of the solutions were comparable and there was no need to use relative accuracy.
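The maximum step size ᾱ with x + ᾱ Δx ∈ Q can be computed by keeping the smaller eigenvalue nonnegative; the paper does not spell out how this is done, so the bisection below is only an illustrative sketch:

```python
import numpy as np

def max_step(x, dx):
    """Largest alpha in [0, 1] with x + alpha*dx in Q, via bisection on
    the smaller eigenvalue lam2(v) = v_0 - ||vbar|| (a hedged sketch,
    not necessarily the authors' implementation)."""
    lam2 = lambda v: v[0] - np.linalg.norm(v[1:])
    if lam2(x + dx) >= 0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if lam2(x + mid * dx) >= 0:
            lo = mid     # still inside the cone: move the lower bound up
        else:
            hi = mid     # outside the cone: move the upper bound down
    return lo

x = np.array([1.0, 0.0])
assert abs(max_step(x, np.array([0.0, 2.0])) - 0.5) < 1e-9  # boundary at 1/2
assert max_step(x, np.array([1.0, 0.0])) == 1.0             # never leaves Q
```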
[Table: for each of 10 problem classes, the number of blocks (bk), the dimension and type (i, o, or b) of each block, the number of constraints m, the average initial infeasibilities ||r_p^0|| and ||r_d^0||, and the average iteration count (it).]

Figure 3: Computational test results of the performance of the Q method for 1000 randomly generated SOCPs. For each of 10 different classes, the size and type of each block, along with the average number of iterations for the 100 generated instances of that type, is given.

For our tests, we generated 100 random instances in each of 10 classes of SOCP problems. In each class, the number of blocks r, the dimensions n_i for i = 1, ..., r, the number of rows of the matrix A, and the type of each block at the optimum are fixed. Each block can be one of three types:

    Type i: x_i ∈ Int Q_{n_i+1}, z_i = 0,
    Type o: x_i = 0, z_i ∈ Int Q_{n_i+1},
Type b: x_i, z_i ∈ bd Q_{n_i+1}.

In the table, bk stands for the number of blocks, the type of each block is a string made of i, o and b, and m is the number of constraints. r_p^0 and r_d^0 give the average l2 norm of the initial primal and dual infeasibility over the 100 instances in each row, and it is the average number of iterations over the 100 instances. Every instance was solved to optimality in fewer than 50 iterations, indicating the robustness of the algorithm. Notice that problem type and size have little effect on the average number of iterations, which is typical of most interior point methods for SOCP.

A plot indicating the progress of one run of the algorithm is presented below. The logarithm of primal and dual infeasibility, as represented by the Euclidean norms of r_p and r_d, and the duality gap are plotted against the iteration count.

[Plot omitted: "Convergence in a typical run of Algorithm 1"; log10 of primal infeasibility, dual infeasibility, and duality gap versus iteration.]

The plot shows that, at least in this case, the order of convergence is well above 1. We have observed that this is typical behavior for almost all runs.

It is possible to adapt Mehrotra's predictor-corrector method, developed for linear programming in [7], to the Q method for SOCP. However, once this predictor-corrector step was added to our code, we observed inconsistent behavior. In most cases the number of iterations was reduced by as much as one third. In other cases, though, the number of iterations increased significantly, and sometimes no convergence was attained at all. We believe that adaptation of the predictor-corrector method to the Q method requires further research and experimentation.

Finally, we have tested the algorithm using the exponential map instead of the Cayley transform. The results were entirely comparable with the Cayley transform version of the algorithm.

4.2
Testing the Q method on the Steiner minimal tree problem

We also tested the Q method on problems arising from Euclidean Steiner minimal tree problems. Details can be found in the paper of Xue and Ye [9]. Here we briefly mention that in the Steiner tree problem we search for a tree consisting of a set of N given points, called regular points, and possibly additional vertices, called Steiner points, such that the sum of the Euclidean lengths of the edges of the tree is minimum. The problem may be decomposed into two subproblems. In the first subproblem we need to find a particular topology, that is, a set of Steiner points. In the second subproblem, given the topology, we need to find the optimal positions of the Steiner points. The first
subproblem, finding a particular topology, is combinatorial in nature; the second can be formulated as an SOCP problem. It can be shown that in the optimal tree the number of Steiner points is at most N − 2 and that the degree of each Steiner point is three. We tested our code on the example in [9]. We use the same starting points as we did for the randomly generated SOCPs of the previous section. The progress of the Q method in each iteration is summarized in Figure 4.

[Table in Figure 4 omitted; its columns are: it, network-cost, r_p, r_d, and gap.]

Figure 4: Progress of the Q method on one instance of a Steiner tree problem with a given topology.

In contrast to the algorithm used by Xue and Ye [9], we start ours from an infeasible point. We achieve the same accuracy as Xue and Ye.

4.3 Taking advantage of sparsity

Our implementation of Algorithm 1 in MATLAB does not take advantage of sparsity in the input matrix A. Here we briefly state how sparse input matrices A can be exploited, and compare the situation with traditional interior point methods. Let both A and AA^T be sparse matrices. It is well known that in interior point methods for linear programming the Schur complement in all versions is of the form A F A^T for some diagonal matrix F. Also, the sparsity patterns of AA^T and A F A^T are identical. The most difficult part of a sparse Cholesky factorization is finding a good order for pivoting. Once such an order is found for AA^T, the same ordering may be used in all subsequent iterations for factoring A F A^T.
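The pattern-reuse claim is easy to check numerically. The following NumPy sketch uses illustrative random data (entries kept nonnegative so that no accidental cancellation occurs) to verify that AA^T and A F A^T share a nonzero pattern for a positive diagonal F:

```python
import numpy as np

rng = np.random.default_rng(0)
# a sparse-looking A: roughly 80% of the entries are zero, the rest nonnegative
A = rng.random((5, 12)) * (rng.random((5, 12)) < 0.2)
F = np.diag(rng.random(12) + 0.1)        # positive diagonal scaling

pattern = lambda M: (np.abs(M) > 0)
# With F positive diagonal (and A >= 0 here, so no cancellation), A F A^T has
# exactly the same nonzero pattern as A A^T, so one fill-reducing ordering
# computed from A A^T can be reused at every interior point iteration.
same = np.array_equal(pattern(A @ F @ A.T), pattern(A @ A.T))
print(same)   # True
```

In a real implementation A would be stored in a sparse format and the symbolic (ordering) phase of the Cholesky factorization would be done once, before the first iteration.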
The situation for SOCP is more involved; see [] for a detailed analysis in the case of traditional interior point methods. However, in all methods the Schur complement at each iteration is of the form A F A^T + L, where L is of rank r, 2r or 3r, and is symmetric for most methods (the so-called AHO method is the exception). We now show that the Schur complement for the Q method is also of this format. For convenience, let us temporarily assume that r = 1. In this case, as stated in section 3, the Schur complement is given by

A Q Diag( P Ω_i Λ_i P^T, D_i E_i ) Q^T A^T.

The Cholesky factorization of this matrix is the main computational effort at each iteration. However, it is easily verified that the matrices Q_cL are all of the form I + a rank-one matrix. Furthermore, with some algebraic manipulation it can be verified that Q Diag( P Ω_i Λ_i P^T, D_i E_i ) Q^T is also the sum of a diagonal matrix with positive entries and a positive semidefinite rank-one matrix. Thus, the Schur complement is of the form A F A^T plus a rank-one matrix. With r blocks, the Schur complement is A F A^T plus a rank-r matrix. In both cases F is diagonal. As in linear programming, a sparse factorization of AA^T will yield a sparse factorization of A F A^T. Sophisticated numerical techniques are available to extend this sparse factorization to one with low-rank updates; see [] for details and references.

5 A Modified Q Method

In this section we give a variant of the Q method for SOCP which has properties and convergence results similar to those of Algorithm 1.

5.1 The Newton system

We can write dual feasibility, primal feasibility and the complementarity relations alternatively in the following form:

(32)  Q P ω + A^T y = c,  A Q P λ = b,  Λ ω = µ 1,

where P = Diag(P_1, ..., P_r), and each P_i = (d_i1, d_i2) is made up of the standard frame in R^{n_i+1}. Thus each block P_i is an (n_i+1)×2 matrix with only the top part nonzero. Now in (32) the matrix Q appears only as Q P. Thus, for each block i, only the columns indexed zero and one of each Q_i are involved.
Since each Q_i is of the form Q_cL, the 0th column is all zeros except for the first entry. The column indexed one is of the form (0; 2c̄), where {c, Rc} is a Jordan frame. In particular, defining q = 2c̄, it follows that the only relevant part of Q_i in (32) is the vector q_i of the column indexed one. Furthermore, q_i^T q_i = 1. Partitioning A_i into A_i = (A_i0, Ā_i), we write Ā := (Ā_1, ..., Ā_r). Observe that x_i and z_i may be written as

(33)  x_i = ( (λ_i1 + λ_i2)/2 ; ((λ_i1 − λ_i2)/2) q_i ),   z_i = ( (ω_i1 + ω_i2)/2 ; ((ω_i1 − ω_i2)/2) q_i ).

We set q_i = (1; 0) when x_i = 0. Then the decomposition (33) is unique when λ_i1 ≠ λ_i2 and ω_i1 ≠ ω_i2. We now substitute (33) into (32), and add the constraints q_i^T q_i = 1 for i = 1, ..., r. The
residuals r_p, r_d and r_c are defined as before. Let (r_d^k)_i0 be the first element of (r_d^k)_i, and (r̄_d^k)_i the remaining subvector. Then, similarly to Algorithm 1, we make the substitutions λ ← λ + ∆λ, ω ← ω + ∆ω, and y ← y + ∆y. But now, instead of the exponential map or the Cayley transform, we simply substitute q ← q + ∆q. After making these substitutions in (32) and dropping the nonlinear terms in the ∆'s, we get the linear Newton system

(34)
(∆ω_i1 + ∆ω_i2)/2 + A_i0^T ∆y = (r_d^k)_i0,
((∆ω_i1 − ∆ω_i2)/2) q_i^k + ((ω_i1^k − ω_i2^k)/2) ∆q_i + Ā_i^T ∆y = (r̄_d^k)_i,
Σ_{i=1}^r [ A_i0 (∆λ_i1 + ∆λ_i2)/2 + Ā_i q_i^k (∆λ_i1 − ∆λ_i2)/2 + Ā_i ∆q_i (λ_i1^k − λ_i2^k)/2 ] = r_p^k,
(q_i^k)^T ∆q_i = 0,  i = 1, ..., r,
Λ^k ∆ω + Ω^k ∆λ = r_c^k,

where, as before, Λ = Diag(λ) and Ω = Diag(ω). The modified algorithm is similar to Algorithm 1, with the exception that the new q_i should be normalized:

q_i^{k+1} = (q_i^k + γ ∆q_i) / ‖q_i^k + γ ∆q_i‖.

5.2 The solution to the modified Q method

Let

u_i := (λ_i1/ω_i1 + λ_i2/ω_i2)/2,   v_i := (λ_i1/ω_i1 − λ_i2/ω_i2)/2;

E_i and D_i are defined as in section 3, but with dimensions adjusted appropriately. Omitting the superscript k, the Schur complement M can be written as

M = Σ_{i=1}^r [ u_i A_i0 A_i0^T + v_i ( A_i0 q_i^T Ā_i^T + Ā_i q_i A_i0^T ) + Ā_i ( u_i q_i q_i^T + D_i E_i (I − q_i q_i^T) ) Ā_i^T ].

The solution to (34) is

∆y = M^{-1} { r_p − Σ_{i=1}^r [ ½ A_i0 ( (r_c)_i1/ω_i1 + (r_c)_i2/ω_i2 ) + ½ Ā_i q_i ( (r_c)_i1/ω_i1 − (r_c)_i2/ω_i2 ) ]
        + Σ_{i=1}^r [ u_i A_i0 (r_d)_i0 + v_i Ā_i q_i (r_d)_i0 + v_i A_i0 q_i^T (r̄_d)_i + (u_i − D_i E_i) Ā_i q_i q_i^T (r̄_d)_i + D_i E_i Ā_i (r̄_d)_i ] },
∆q_i = E_i^{-1} ( I − q_i q_i^T ) ( (r̄_d)_i − Ā_i^T ∆y ),
∆ω_i1 − ∆ω_i2 = 2 q_i^T ( (r̄_d)_i − Ā_i^T ∆y ),
∆ω_i1 + ∆ω_i2 = 2 ( (r_d)_i0 − A_i0^T ∆y ),
∆λ = Ω^{-1} ( r_c − Λ ∆ω ).

Properties of the solution:
1. The number of unknowns in the modified Q method is the same as in the Q method: instead of the vector s_i we now have q_i in each block i.
2. (a) The dimension of each block M_i of M is one smaller than in the Schur complement of the traditional methods.
(b) To keep each iterate in Q, we only need to compute α̂ = min{ −λ_i/∆λ_i : ∆λ_i < 0 }. In comparison, in traditional methods we need to solve a quadratic equation for each block.
(c) When λ_i1 > λ_i2 > 0 and ω_i2 > ω_i1 > 0, we have u_i > v_i > 0, and D_i E_i is a positive scalar matrix. Therefore it is easily verified that each M_i is symmetric positive definite, and we may use Cholesky factorization to solve systems of equations with coefficient matrix M.
3. Accuracy and stability. It can be shown that at the optimum the Newton system (34) is nonsingular if primal and dual nondegeneracy and strict complementarity hold. This in turn implies that the algorithm converges quadratically near the optimal solution.
4. We can show that with a careful choice of parameters the modified Q method converges to the optimal solution in a finite number of iterations. This is argued in Appendix A.

5.3 Numerical examples

We have implemented the basic algorithm for the modified Q method in MATLAB and have tested it on 1,000 randomly generated problems. The step sizes α, β, γ are chosen as in our implementation of Algorithm 1. We applied our modified Q code to the same set of 10 classes of SOCP problems, each with 100 instances. The results are summarized in the table below.

[Table omitted; its columns are: problem, r_p^0, r_d^0, and it.]

Although the algorithm finds an (ɛ_p, ɛ_d, ɛ_c)-optimal solution for all of the 1,000 problems, a small portion of them need more than 100 iterations to reach the required accuracy.

Finally, we applied our implementation of the modified Q method to the same instance of the Steiner tree problem we used for our implementation of Algorithm 1. The progress of the algorithm is reported in Figure 5. The performance is similar to that of Algorithm 1.

6 Conclusions

We have developed and analyzed the Q method and a modified variant for SOCP. Preliminary numerical results show that the algorithm is robust. We conclude by mentioning some future research directions. First, it remains open whether the Q method has polynomial-time iteration complexity.
The finite convergence result presented in this paper depends on the condition number of the input matrix A. For a practical and stable implementation of the method, Mehrotra's predictor-corrector method should be adjusted to perform more robustly. Finally, a serious implementation should take advantage of a sparse factorization of the Schur complement.
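The low-rank Schur complement structure discussed in the section on sparsity (a sparse A F Aᵀ plus a rank-r correction) can be exploited with the Sherman–Morrison–Woodbury formula, reusing a fast solver for the diagonal part. A minimal dense sketch with illustrative names and data (`solve_low_rank`, `M0_solve`, `U` are our own, not from the paper):

```python
import numpy as np

def solve_low_rank(M0_solve, U, r):
    """Solve (M0 + U U^T) y = r given a fast solver M0_solve(b) for M0 b = ...
    Sherman-Morrison-Woodbury:
    y = M0^{-1} r - M0^{-1} U (I + U^T M0^{-1} U)^{-1} U^T M0^{-1} r."""
    Z = M0_solve(U)                      # M0^{-1} U, one solve per column
    y0 = M0_solve(r)
    k = U.shape[1]
    small = np.eye(k) + U.T @ Z          # k x k, k = number of cone blocks
    return y0 - Z @ np.linalg.solve(small, U.T @ y0)

rng = np.random.default_rng(1)
n, k = 6, 2
M0 = np.diag(rng.random(n) + 1.0)        # stands in for the sparse A F A^T part
U = rng.random((n, k))                   # rank-k correction from the cone blocks
r = rng.random(n)
y = solve_low_rank(lambda b: np.linalg.solve(M0, b), U, r)
print(np.allclose((M0 + U @ U.T) @ y, r))   # True
```

In a serious implementation, `M0_solve` would be backed by a sparse Cholesky factorization whose symbolic ordering is computed once from AAᵀ and reused at every iteration; only k extra triangular solves and a small k×k system are needed per step.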
[Table in Figure 5 omitted; its columns are: it, network-cost, r_p, r_d, and gap.]

Figure 5: Progress of the modified Q method for one instance of a Steiner tree problem with given topology.

A Appendix: Convergence Analysis of the Q and Modified Q Methods

A.1 Convergence analysis

Global convergence of Algorithm 1 can be established by adapting ideas of Kojima, Megiddo and Mizuno [6] for infeasible interior point linear programming.

Theorem A.1. If Algorithm 1 does not stop after a finite number of steps, the smallest singular value of the Jacobian of (30) must converge to zero.

Proof: Let J be the Jacobian of (30). The key to the proof is to show that the step sizes are bounded below. Assume the algorithm does not stop after a finite number of steps. Define ɛ := min(ɛ_c, γ_p ɛ_p, γ_d ɛ_d). Then for each iteration k we have (λ^k)^T ω^k ≥ ɛ and ‖(λ^k; ω^k)‖ ≤ Υ; otherwise, the termination criterion would force the algorithm to stop. Boundedness of y^k follows from the dual feasibility constraint. Also, Q^k is orthogonal, and the set of orthogonal matrices is compact. Assume the smallest singular value of J does not converge to zero. Then there must exist a positive scalar d and a subsequence {(λ^{m_i}, ω^{m_i}, y^{m_i}, Q^{m_i})}_{i=1}^∞ such that for all m_i the smallest singular value of J is at least d. Both the right hand side and the left hand side of (30)
depend continuously on (λ, ω, y, Q). Since the set of (λ, ω, y, Q) derived from the Newton system forms a compact set, the Newton direction in (30) is a continuous function of (λ, ω, y, Q). Therefore, the solution of (30) is uniformly bounded over the subsequence {m_i}. Hence, there exists a positive constant η such that the search direction computed from (30) satisfies

|∆λ_ij ∆ω_ij − (γ_c/n) ∆λ^T ∆ω| ≤ η,  |∆λ^T ∆ω| ≤ η,  ‖∆λ‖ ≤ η,  ‖∆ω‖ ≤ η,  ‖∆s_i‖ ≤ η,  for i = 1, ..., n; j = 1, 2.

Note that ‖S_i‖ = ‖s_i‖ for i = 1, ..., n, and ‖S‖ = max_i ‖s_i‖. (Recall that ‖S‖, the spectral norm of S, is given by the modulus of the largest-modulus eigenvalue of S.) For k ∈ {m_i}_{i=1}^∞, adopting the notation used in [6], we define

f_ij(α) := [λ_ij^k + α ∆λ_ij][ω_ij^k + α ∆ω_ij] − (γ_c/n) [λ^k + α ∆λ]^T [ω^k + α ∆ω],

g_p(α) := [λ^k + α ∆λ]^T [ω^k + α ∆ω] − γ_p ‖ A Q^k (I + (α/2) S)(I − (α/2) S)^{-1} P (λ^k + α ∆λ) − b ‖,

g_d(α) := [λ^k + α ∆λ]^T [ω^k + α ∆ω] − γ_d ‖ A^T (y^k + α ∆y) + Q^k (I + (α/2) S)(I − (α/2) S)^{-1} P (ω^k + α ∆ω) − c ‖,

h(α) := [1 − α(1 − σ)] (λ^k)^T ω^k − [λ^k + α ∆λ]^T [ω^k + α ∆ω].

Therefore, α̂_k is the largest α satisfying the following inequalities:

f_ij(α) ≥ 0,  i = 1, ..., n; j = 1, 2,
g_p(α) ≥ 0  or  ‖A Q^k P λ^k − b‖ ≤ ɛ_p,
g_d(α) ≥ 0  or  ‖A^T y^k + Q^k P ω^k − c‖ ≤ ɛ_d,
h(α) ≥ 0.

Next, we give a lower bound for the set {α̂_k}. Recall that each block of the Cayley transformation is given by

(35)  (I + (α/2) S_i)(I − (α/2) S_i)^{-1} = I + α S_i − (α³ ‖s_i‖² / (4 + α² ‖s_i‖²)) S_i + (2 α² / (4 + α² ‖s_i‖²)) S_i².

The inequalities for f_ij and h can be obtained by arguments similar to those in [6]:

f_ij(α) ≥ σ (1 − γ_c)(ɛ/n) α − η α²,   h(α) ≥ (1 − σ) σ ɛ α − η α².

Next, we estimate g_p(α) and g_d(α). Note that the first columns of S_i and S_i² are zero, and the only nonzero entry of the second column of S_i² is −s_i^T s_i. Let Q̄^k be the matrix consisting of only the second column of each block of Q^k, let λ̄ be the vector made of the larger of the two eigenvalues of x_i, and let λ̲ be the vector made of the smaller of the two eigenvalues of x_i, for i = 1, ..., r. When
(λ^k)^T ω^k ≥ γ_p ‖A Q^k P λ^k − b‖, we have

(36)  g_p(α) ≥ (1 − α)(λ^k)^T ω^k + α σ (λ^k)^T ω^k + α² ∆λ^T ∆ω − γ_p (1 − α) ‖A Q^k P λ^k − b‖
        − γ_p α² ‖A‖ ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} )
      ≥ α σ ɛ − α² η − γ_p α² ‖A‖ ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} ).

The first inequality follows from the Newton linear system for the search directions (30), the fact that (λ^k)^T ω^k ≥ γ_p ‖A Q^k P λ^k − b‖, and the Cayley transform (35). The second inequality follows from the bounds on the variables and search directions and the fact that α ≤ 1. If ‖A Q^k P λ^k − b‖ ≤ ɛ_p, then

(37)  ‖A Q^k (I + (α/2) S)(I − (α/2) S)^{-1} P (λ^k + α ∆λ) − b‖
      ≤ (1 − α) ‖A Q^k P λ^k − b‖ + α² ‖A‖ ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} )
      ≤ (1 − α) ɛ_p + α² ‖A‖ ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} ).

Therefore, when

α ≤ ɛ_p / ( ‖A‖ ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} ) ),

we must have ‖A Q^{k+1} P λ^{k+1} − b‖ ≤ ɛ_p. Next, we consider the dual constraints. When (λ^k)^T ω^k ≥ γ_d ‖A^T y^k + Q^k P ω^k − c‖,

(38)  g_d(α) ≥ (1 − α)(λ^k)^T ω^k + α σ (λ^k)^T ω^k + α² ∆λ^T ∆ω − γ_d (1 − α) ‖A^T y^k + Q^k P ω^k − c‖
        − γ_d α² ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} )
      ≥ α σ ɛ − α² η − γ_d α² ( η^{3/2} + ¼ η Υ + η + ¼ η^{3/2} Υ + ¼ η^{5/2} ).
More informationInterior Point Methods in Mathematical Programming
Interior Point Methods in Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Brazil Journées en l honneur de Pierre Huard Paris, novembre 2008 01 00 11 00 000 000 000 000
More informationMath Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88
Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant
More informationL. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1
L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following
More informationECE 275A Homework #3 Solutions
ECE 75A Homework #3 Solutions. Proof of (a). Obviously Ax = 0 y, Ax = 0 for all y. To show sufficiency, note that if y, Ax = 0 for all y, then it must certainly be true for the particular value of y =
More informationSymmetric matrices and dot products
Symmetric matrices and dot products Proposition An n n matrix A is symmetric iff, for all x, y in R n, (Ax) y = x (Ay). Proof. If A is symmetric, then (Ax) y = x T A T y = x T Ay = x (Ay). If equality
More informationLecture: Examples of LP, SOCP and SDP
1/34 Lecture: Examples of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:
More informationLinear Algebra Review. Vectors
Linear Algebra Review 9/4/7 Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa (UCSD) Cogsci 8F Linear Algebra review Vectors
More informationSemidefinite Programming
Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationA Second Full-Newton Step O(n) Infeasible Interior-Point Algorithm for Linear Optimization
A Second Full-Newton Step On Infeasible Interior-Point Algorithm for Linear Optimization H. Mansouri C. Roos August 1, 005 July 1, 005 Department of Electrical Engineering, Mathematics and Computer Science,
More informationLinear Algebra. Min Yan
Linear Algebra Min Yan January 2, 2018 2 Contents 1 Vector Space 7 1.1 Definition................................. 7 1.1.1 Axioms of Vector Space..................... 7 1.1.2 Consequence of Axiom......................
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationCONSTRAINED NONLINEAR PROGRAMMING
149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach
More informationSTAT200C: Review of Linear Algebra
Stat200C Instructor: Zhaoxia Yu STAT200C: Review of Linear Algebra 1 Review of Linear Algebra 1.1 Vector Spaces, Rank, Trace, and Linear Equations 1.1.1 Rank and Vector Spaces Definition A vector whose
More informationLMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009
LMI MODELLING 4. CONVEX LMI MODELLING Didier HENRION LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ Universidad de Valladolid, SP March 2009 Minors A minor of a matrix F is the determinant of a submatrix
More informationSolving large Semidefinite Programs - Part 1 and 2
Solving large Semidefinite Programs - Part 1 and 2 Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Singapore workshop 2006 p.1/34 Overview Limits of Interior
More informationSecond Order Cone Programming Relaxation of Positive Semidefinite Constraint
Research Reports on Mathematical and Computing Sciences Series B : Operations Research Department of Mathematical and Computing Sciences Tokyo Institute of Technology 2-12-1 Oh-Okayama, Meguro-ku, Tokyo
More informationA Primal-Dual Second-Order Cone Approximations Algorithm For Symmetric Cone Programming
A Primal-Dual Second-Order Cone Approximations Algorithm For Symmetric Cone Programming Chek Beng Chua Abstract This paper presents the new concept of second-order cone approximations for convex conic
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory
More informationAn Introduction to Correlation Stress Testing
An Introduction to Correlation Stress Testing Defeng Sun Department of Mathematics and Risk Management Institute National University of Singapore This is based on a joint work with GAO Yan at NUS March
More informationAppendix A Taylor Approximations and Definite Matrices
Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary
More informationA full-newton step infeasible interior-point algorithm for linear programming based on a kernel function
A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with
More informationSolving Dual Problems
Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem
More informationOn implementing a primal-dual interior-point method for conic quadratic optimization
On implementing a primal-dual interior-point method for conic quadratic optimization E. D. Andersen, C. Roos, and T. Terlaky December 18, 2000 Abstract Conic quadratic optimization is the problem of minimizing
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 2 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory April 5, 2012 Andre Tkacenko
More information1 Outline Part I: Linear Programming (LP) Interior-Point Approach 1. Simplex Approach Comparison Part II: Semidenite Programming (SDP) Concludin
Sensitivity Analysis in LP and SDP Using Interior-Point Methods E. Alper Yldrm School of Operations Research and Industrial Engineering Cornell University Ithaca, NY joint with Michael J. Todd INFORMS
More information1. General Vector Spaces
1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule
More informationLecture notes: Applied linear algebra Part 1. Version 2
Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and
More informationChapter 7. Linear Algebra: Matrices, Vectors,
Chapter 7. Linear Algebra: Matrices, Vectors, Determinants. Linear Systems Linear algebra includes the theory and application of linear systems of equations, linear transformations, and eigenvalue problems.
More informationΩ R n is called the constraint set or feasible set. x 1
1 Chapter 5 Linear Programming (LP) General constrained optimization problem: minimize subject to f(x) x Ω Ω R n is called the constraint set or feasible set. any point x Ω is called a feasible point We
More informationALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA
ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA Kent State University Department of Mathematical Sciences Compiled and Maintained by Donald L. White Version: August 29, 2017 CONTENTS LINEAR ALGEBRA AND
More informationIr O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v )
Section 3.2 Theorem 3.6. Let A be an m n matrix of rank r. Then r m, r n, and, by means of a finite number of elementary row and column operations, A can be transformed into the matrix ( ) Ir O D = 1 O
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More information