An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP


An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP

Kaifeng Jiang, Defeng Sun, and Kim-Chuan Toh

March 3, 2012

Abstract. The accelerated proximal gradient (APG) method, first proposed by Nesterov for minimizing smooth convex functions, later extended by Beck and Teboulle to composite convex objective functions, and studied in a unifying manner by Tseng, has proven to be highly efficient in solving some classes of large scale structured convex optimization (possibly nonsmooth) problems, including nuclear norm minimization problems in matrix completion and l1 minimization problems in compressed sensing. The method has superior worst-case iteration complexity over the classical projected gradient method, and usually has good practical performance on problems with appropriate structures. In this paper, we extend the APG method to the inexact setting where the subproblem in each iteration is only solved approximately, and show that it enjoys the same worst-case iteration complexity as the exact counterpart if the subproblems are progressively solved to sufficient accuracy. We apply our inexact APG method to solve large scale convex quadratic semidefinite programming (QSDP) problems of the form min{ (1/2)⟨x, Q(x)⟩ + ⟨c, x⟩ : A(x) = b, x ⪰ 0 }, where Q, A are given linear maps and b, c are given data. The subproblem in each iteration is solved by a semismooth Newton-CG (SSNCG) method with warm-start using the iterate from the previous iteration. Our APG-SSNCG method is demonstrated to be efficient for QSDP problems whose positive semidefinite linear maps Q are highly ill-conditioned or rank deficient.

Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (kaifengjiang@nus.edu.sg). Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (matsundf@nus.edu.sg). Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (mattohkc@nus.edu.sg); and Singapore-MIT Alliance, 4 Engineering Drive 3, Singapore.

1 Introduction

Let S^n be the space of n x n real symmetric matrices endowed with the standard trace inner product ⟨·,·⟩ and Frobenius norm ‖·‖, and let S^n_+ (S^n_++) be the set of positive semidefinite (definite) matrices in S^n. We consider the following linearly constrained convex semidefinite programming problem:

(P)   min{ f(x) : A(x) = b, x ⪰ 0, x ∈ S^n },

where f is a smooth convex function on S^n_+, A : S^n → R^m is a linear map, b ∈ R^m, and x ⪰ 0 means that x ∈ S^n_+. Let A^* be the adjoint of A. The dual problem associated with (P) is given by

(D)   max{ f(x) − ⟨∇f(x), x⟩ + ⟨b, p⟩ : ∇f(x) − A^*p − z = 0, p ∈ R^m, z ⪰ 0, x ⪰ 0 }.

We assume that the linear map A is surjective, and that strong duality holds for (P) and (D). Let x* be an optimal solution of (P) and (x*, p*, z*) be an optimal solution of (D). Then, as a consequence of strong duality, they must satisfy the following KKT conditions:

A(x*) = b,   ∇f(x*) − A^*p* − z* = 0,   ⟨x*, z*⟩ = 0,   x*, z* ⪰ 0.

The problem (P) contains the following important special case of convex quadratic semidefinite programming (QSDP):

min{ (1/2)⟨x, Q(x)⟩ + ⟨c, x⟩ : A(x) = b, x ⪰ 0 },   (1)

where Q : S^n → S^n is a given self-adjoint positive semidefinite linear operator and c ∈ S^n. Note that the Lagrangian dual problem of (1) is given by

max{ −(1/2)⟨x, Q(x)⟩ + ⟨b, p⟩ : A^*(p) − Q(x) + z = c, z ⪰ 0 }.   (2)

A typical example of QSDP is the nearest correlation matrix problem, where given a symmetric matrix u ∈ S^n and a linear map L : S^n → R^{n x n}, one intends to solve

min{ (1/2)‖L(x − u)‖^2 : diag(x) = e, x ⪰ 0 },   (3)

where e ∈ R^n is the vector of all ones. If we let Q = L^*L and c = −L^*L(u) in (3), then we get the QSDP problem (1). A well studied special case of (3) is the W-weighted nearest correlation matrix problem, where L = W^{1/2} ⊛ W^{1/2} for a given W ∈ S^n_++ and Q = W ⊛ W. Here, for U ∈ R^{n x r} and V ∈ R^{n x s}, U ⊛ V : R^{r x s} → S^n is the symmetrized Kronecker product linear map defined by U ⊛ V (M) = (U M V^T + V M^T U^T)/2.
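For concreteness, the following small MATLAB sketch (my own illustration, not code from the paper) shows the symmetrized Kronecker product map U ⊛ V acting on a matrix, and verifies that the W-weighted operator Q = W ⊛ W reduces to Q(x) = W*x*W for symmetric x; the variable names are mine.

symkron = @(U,V,M) (U*M*V' + V*M'*U')/2;   % U (*) V applied to M

n = 5;
W = gallery('lehmer',n);              % some symmetric positive definite W
x = randn(n); x = (x + x')/2;         % a symmetric test matrix

Qx = symkron(W,W,x);
norm(Qx - W*x*W, 'fro')               % ~0: confirms Q(x) = W*x*W for symmetric x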

There are several methods available for solving this special case of (3), which include the alternating projection method [5], the quasi-Newton method [7], the inexact semismooth Newton-CG method [11] and the inexact interior-point method [14]. All these methods, excluding the inexact interior-point method, rely critically on the fact that the projection of a given matrix x ∈ S^n onto S^n_+ has an analytical formula with respect to the norm ‖W^{1/2}(·)W^{1/2}‖. However, the above mentioned techniques cannot be extended to efficiently solve the H-weighted case [5] of (3), where L(x) = H ∘ x for some H ∈ S^n with nonnegative entries and Q(x) = (H ∘ H) ∘ x, with ∘ denoting the Hadamard product of two matrices defined by (A ∘ B)_{ij} = A_{ij} B_{ij}. The aforementioned methods are not well suited for the H-weighted case of (3) because there is no explicitly computable formula for the solution of the following problem

min{ (1/2)‖H ∘ (x − u)‖^2 : x ⪰ 0 },   (4)

where u ∈ S^n is a given matrix. To tackle the H-weighted case of (3), Toh [13] proposed an inexact interior-point method for a general convex QSDP including the H-weighted nearest correlation matrix problem. Recently, Qi and Sun [12] introduced an augmented Lagrangian dual method for solving the H-weighted version of (3), where the inner subproblem is solved by a semismooth Newton-CG (SSNCG) method. The augmented Lagrangian dual method avoids solving (4) directly and it can be much faster than the inexact interior-point method [13]. However, if the weight matrix H is very sparse or ill-conditioned, the conjugate gradient (CG) method would have great difficulty in solving the linear system of equations in the semismooth Newton method, and the augmented Lagrangian method would not be efficient or could even fail.

Another example of QSDP comes from the civil engineering problem of estimating a positive semidefinite stiffness matrix for a stable elastic structure from r measurements of its displacements {u_1, ..., u_r} ⊂ R^n in response to a set of static loads {f_1, ..., f_r} ⊂ R^n [18]. In this application, one is interested in the QSDP problem min{ ‖f − L(x)‖^2 : x ∈ S^n_+ }, where L : S^n → R^{n x r} is defined by L(x) = xu, with f = [f_1, ..., f_r] and u = [u_1, ..., u_r]. In this case, the corresponding map Q = L^*L is given by Q(x) = (xB + Bx)/2 with B = uu^T.

The main purpose of this paper is to design an efficient algorithm to solve the problem (P). The algorithm we propose here is based on the accelerated proximal gradient (APG) method of Beck and Teboulle [1] (the method is called FISTA in [1]), where in the k-th iteration with iterate x_k, a subproblem of the following form must be solved:

min{ ⟨∇f(x_k), x − x_k⟩ + (1/2)⟨x − x_k, H_k(x − x_k)⟩ : A(x) = b, x ⪰ 0 },   (5)

where H_k : S^n → S^n is a given self-adjoint positive definite linear operator. In FISTA [1], H_k is restricted to L I, where I denotes the identity map and L is a Lipschitz constant for ∇f. More significantly, for FISTA in [1], the subproblem (5) must be solved exactly to generate the next iterate x_{k+1}. In this paper, we design an inexact APG method which

overcomes the two limitations just mentioned. Specifically, in our inexact algorithm, the subproblem (5) is only solved approximately and H_k is not restricted to be a scalar multiple of I. In addition, we are able to show that if the subproblem (5) is progressively solved with sufficient accuracy, then the number of iterations needed to achieve ε-optimality (in terms of the function value) is also proportional to 1/√ε, just as in the exact algorithm.

Another strong motivation for designing an inexact APG algorithm comes from the recent paper [2], which considered the following regularized inverse problem:

min_x { (1/2)‖Φx − y‖^2 + λ‖x‖_B },   (6)

where Φ : R^p → R^n is a given linear map and ‖x‖_B is the atomic norm induced by a given compact set of atoms B in R^p. It appears that the APG algorithm is highly suited for solving (6). But note that in each iteration of the APG algorithm, a subproblem of the form

min_z { (1/2)‖z − x‖^2 + μ‖z‖_B },   or its constrained counterpart   min{ (1/2)‖y − x‖^2 : ‖y‖_B ≤ μ },

must be solved. However, for most choices of B, the subproblem does not admit an analytical solution and has to be solved numerically. As a result, the subproblem is never solved exactly. In fact, it could be computationally very expensive to solve the subproblem to high accuracy. Our inexact APG algorithm thus has the attractive computational advantage that the subproblems need only be solved with progressively better accuracy while still maintaining the global iteration complexity.

We should mention that the fast gradient method of Nesterov [9] has also been extended in [4] to the problem min{ f(x) : x ∈ Q }, where the function f is convex (not necessarily smooth) on the closed convex set Q and is equipped with the so-called first-order (δ, L)-oracle: for any y ∈ Q, we can compute a pair (f_{δ,L}(y), g_{δ,L}(y)) such that

0 ≤ f(x) − f_{δ,L}(y) − ⟨g_{δ,L}(y), x − y⟩ ≤ (L/2)‖x − y‖^2 + δ   for all x ∈ Q.

In the inexact-oracle fast gradient method in [4], the subproblem of the form min{ ⟨g, x − y⟩ + (γ/2)‖x − y‖^2 : x ∈ Q } in each iteration must be solved exactly. Thus the kind of inexactness considered in [4] is very different from what we consider in this paper.

From a practical perspective, the extension of the APG algorithm in [1] to the inexact setting would not be as interesting if we cannot demonstrate it to be computationally viable or offer any computational advantage over alternative algorithms for solving a problem such as (1). Hence, even though the focus of this paper is on designing some theoretical inexact APG algorithms for solving the problem (P), and on establishing their iteration complexities, we do present some preliminary numerical results to demonstrate the practical viability of our proposed inexact algorithms. In particular, as we will demonstrate later in the paper, if the linear operator H_k is chosen appropriately so that the subproblem (5) is amenable to computation via the SSNCG method, then our inexact APG algorithm can be much more efficient than the state-of-the-art algorithm (the augmented Lagrangian method in [12]) for solving some convex QSDP problems arising from the H-weighted case of the nearest correlation matrix problem (3). In fact, from our preliminary numerical results, we observe that when the weight matrix H in (4) is

highly ill-conditioned, our inexact APG algorithm can be many times faster than the state-of-the-art augmented Lagrangian method designed in [12].

The paper is organized as follows. In section 2, we propose an inexact APG algorithm for solving a minimization problem of the form min{ f(x) + g(x) : x ∈ X }, where f is a smooth convex function with Lipschitz continuous gradient and g is a proper lower semicontinuous convex function. We also prove that the proposed inexact APG algorithm enjoys the same iteration complexity as the FISTA algorithm in [1]. In section 3, we propose and analyse an inexact APG algorithm for the problem (P) for which the semidefinite least squares subproblem in each iteration is not required to satisfy a stringent primal feasibility condition. In section 4, we conduct some preliminary numerical experiments to evaluate the practical performance of our proposed inexact APG algorithms for solving QSDP problems (1) arising from H-weighted nearest correlation matrix problems. We also evaluate the performance of the proposed algorithms on randomly generated QSDP problems for which the map Q takes the form arising in the stiffness matrix estimation problem in [18].

2 An inexact accelerated proximal gradient method

For more generality, we consider the following minimization problem

min{ F(x) := f(x) + g(x) : x ∈ X },   (7)

where X is a finite-dimensional Hilbert space. The functions f : X → R and g : X → R ∪ {+∞} are proper, lower semi-continuous convex functions (possibly nonsmooth). We assume that dom(g) := {x ∈ X : g(x) < ∞} is closed, f is continuously differentiable on X and its gradient ∇f is Lipschitz continuous with modulus L on X, i.e.,

‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖   for all x, y ∈ X.

We also assume that the problem (7) is solvable with an optimal solution x* ∈ dom(g). The inexact APG algorithm we propose for solving (7) is described as follows.

Algorithm 1. Given a tolerance ε > 0. Input y_1 = x_0 ∈ dom(g), t_1 = 1. Set k = 1. Iterate the following steps.

Step 1. Find an approximate minimizer

x_k ≈ argmin_{y ∈ X} { f(y_k) + ⟨∇f(y_k), y − y_k⟩ + (1/2)⟨y − y_k, H_k(y − y_k)⟩ + g(y) },   (8)

where H_k is a self-adjoint positive definite linear operator that is chosen by the user.

Step 2. Compute t_{k+1} = (1 + √(1 + 4 t_k^2))/2.

Step 3. Compute y_{k+1} = x_k + ((t_k − 1)/t_{k+1})(x_k − x_{k−1}).

Notice that Algorithm 1 is an inexact version of the algorithm FISTA in [1], where x_k need not be the exact minimizer of the subproblem (8). In addition, the quadratic term is not restricted to the form (L/2)‖y − y_k‖^2, where L is a positive scalar. Given any positive definite linear operator H_j : X → X and y_j ∈ X, we define q_j(·) : X → R by

q_j(x) = f(y_j) + ⟨∇f(y_j), x − y_j⟩ + (1/2)⟨x − y_j, H_j(x − y_j)⟩.   (9)

Note that if we choose H_j = L I, then we have f(x) ≤ q_j(x) for all x ∈ dom(g). Let {ξ_k}, {ϵ_k} be given convergent sequences of nonnegative numbers such that

Σ_{k=1}^∞ ξ_k < ∞   and   Σ_{k=1}^∞ ϵ_k < ∞.

Suppose for each j, we have an approximate minimizer

x_j ≈ argmin{ q_j(x) + g(x) : x ∈ X }   (10)

that satisfies the conditions

F(x_j) ≤ q_j(x_j) + g(x_j) + ξ_j/(2 t_j^2),   (11)

∇f(y_j) + H_j(x_j − y_j) + γ_j = δ_j   with   ‖H_j^{-1/2} δ_j‖ ≤ ϵ_j/(√2 t_j),   (12)

where γ_j ∈ ∂g(x_j; ξ_j/(2 t_j^2)) (the set of ξ_j/(2 t_j^2)-subgradients of g at x_j). Note that for x_j to be an approximate minimizer, we must have x_j ∈ dom(g). We should mention that the condition (11) is usually easy to satisfy. For example, if H_j is chosen such that f(x) ≤ q_j(x) for all x ∈ dom(g), then (11) is automatically satisfied.

To establish the iteration complexity result analogous to the one in [1] for Algorithm 1, we need to establish a series of lemmas whose proofs are extensions of those in [1] to

7 account for the inexactness in x. We should note that though the ideas in the proofs are similar, but as the reader will notice later, the technical details become much more involved due to the error terms induced by the inexact solutions of the subproblems. Lemma 2.1. Given y j X and a positive definite linear operator H j on X such that the conditions (11) and (12) hold. Then for any x X, we have F (x) F (x j ) 1 2 x j y j, H j (x j y j ) + y j x, H j (x j y j ) + δ j, x x j ξ j. (13) t 2 j Proof. The proof follows similar arguments as in [1, Lemma 2.3] and we omit it here. For later purpose, we define the following quantities: v = F (x ) F (x ) 0, u = t x (t 1)x 1 x, (14) a = t 2 v 0, b = 1 2 u, H (u ) 0, e = t δ, u, (15) τ = 1 2 x 0 x, H 1 (x 0 x ), ϵ = ϵ j, ξ = (ξ j + ϵ 2 j). (16) Note that for the choice where ϵ j = 1/j α = ξ j for all j 1, where α > 1 is fixed, we have ϵ 1 α 1, ξ α 1 1. Lemma 2.2. Suppose that H 1 H 0 for all. Then a 1 + b 1 a + b e ξ. (17) Proof. The proof follows similar arguments as in [1, Lemma 4.1] and we omit it here. Lemma 2.3. Suppose that H 1 H 0 for all. Then a ( τ + ϵ ) ξ. (18) Proof. Note that we have e H 1/2 δ H 1/2 u t ϵ H 1/2 u / 2 = ϵ b. First, we show that a 1 + b 1 τ + ϵ 1 b1 + ξ 1. Note that a 1 = F (x 1 ) F (x ) and b 1 = 1 2 x 1 x, H 1 (x 1 x ). By applying the inequality (13) to x = x with j = 1, and noting that y 1 = x 0, we have that a x 1 y 1, H 1 (x 1 y 1 ) + y 1 x, H 1 (x 1 y 1 ) + δ 1, x x 1 ξ 1 = 1 2 x 1 x, H 1 (x 1 x ) 1 2 y 1 x, H 1 (y 1 x ) + δ 1, x x 1 ξ 1 = b x 0 x, H 1 (x 0 x ) + δ 1, x x 1 ξ 1. 7

8 Hence, by using the fact that H 1/2 1 δ 1 ϵ 1 / 2, we get a 1 + b x 0 x, H 1 (x 0 x ) δ 1, x x 1 + ξ 1 τ + ϵ 1 b1 + ξ 1. (19) Let By Lemma 2.2, we have s = ϵ 1 b1 + + ϵ b + ξ ξ. τ a 1 + b 1 ϵ 1 b1 ξ 1 a 2 + b 2 ϵ 1 b1 ϵ 2 b2 ξ 1 ξ 2 a + b s. (20) Thus we have a + b τ + s, and hence s = s 1 + ϵ b + ξ s 1 + ϵ τ + s + ξ. (21) Note that since τ b 1 ϵ 1 b1 ξ 1, we have b (ϵ 1+ ϵ (τ + ξ 1 )) ϵ 1 + τ + ξ 1. Hence s 1 = ϵ 1 b1 + ξ 1 ϵ 1 (ϵ 1 + τ + ξ 1 ) + ξ 1 ϵ ξ 1 + ϵ 1 ( τ + ξ 1 ). The inequality (21) implies that Hence we must have (τ + s ) ϵ τ + s (τ + s 1 + ξ ) 0. τ + s 1 2 ( ) ϵ + ϵ 2 + 4(τ + s 1 + ξ ). Consequently s s ϵ2 + ξ ϵ ϵ 2 + 4(τ + s 1 + ξ ) s 1 + ϵ 2 + ξ + ϵ ( τ + s 1 + ξ ). This implies that s s 1 + ϵ 2 j + j=2 ξ + τ ϵ + ξ j + τ j=2 ϵ j sj ϵ j + j=2 ϵ j sj 1 + ξ j ξ + τ ϵ + s ϵ. (22) In the last inequality, we used the fact that s j 1 + ξ j s j and 0 s 1 s. The inequality (22) implies that s 1 ( ) ϵ + ( ϵ ξ + 4 ϵ τ) 1/ j=2

From here, we get s_k ≤ ϵ̄_k^2 + ξ̄_k + 2 ϵ̄_k √τ, and the required result follows from the fact that a_k ≤ τ + s_k in (20).

Now we are ready to state the iteration complexity result for the inexact APG algorithm described in Algorithm 1.

Theorem 2.1. Suppose the conditions (11) and (12) hold, and H_{k−1} ⪰ H_k ≻ 0 for all k. Then

0 ≤ F(x_k) − F(x*) ≤ (4/(k+1)^2) ( (√τ + ϵ̄_k)^2 + ξ̄_k ),   (23)

where τ = (1/2)⟨x_0 − x*, H_1(x_0 − x*)⟩, ϵ̄_k = Σ_{j=1}^k ϵ_j and ξ̄_k = Σ_{j=1}^k (ξ_j + ϵ_j^2) are as defined in (16).

Proof. By Lemma 2.3 and the fact that t_k ≥ (k+1)/2, we have

F(x_k) − F(x*) = a_k/t_k^2 ≤ (4/(k+1)^2) ( (√τ + ϵ̄_k)^2 + ξ̄_k ).

From the assumption on the sequences {ξ_k} and {ϵ_k}, we know that both {ϵ̄_k} and {ξ̄_k} are bounded. Then the required complexity result follows.

Observe that in Theorem 2.1, we recover the complexity result established in [1] if ϵ_j = 0 = ξ_j for all j.

2.1 Specialization to the case where g = δ_Ω

Problem (P) can be expressed in the form (7) with g = δ_Ω, where δ_Ω denotes the indicator function of the set

Ω = {x ∈ S^n : A(x) = b, x ⪰ 0}.   (24)

The subproblem (8), for a fixed y_k, then becomes the following constrained minimization problem:

min{ ⟨∇f(y_k), x − y_k⟩ + (1/2)⟨x − y_k, H_k(x − y_k)⟩ : A(x) = b, x ⪰ 0 }.   (25)

Suppose we have an approximate solution (x_k, p_k, z_k) to the KKT optimality conditions for (25). More specifically,

∇f(y_k) + H_k(x_k − y_k) − A^*p_k − z_k =: δ_k ≈ 0,
A(x_k) − b = 0,   (26)
⟨x_k, z_k⟩ =: ε_k ≈ 0,   x_k, z_k ⪰ 0.

To apply the complexity result established in Theorem 2.1, we need δ_k and ε_k to be sufficiently small so that the conditions (11) and (12) are satisfied. Observe that we need

x_k to be contained in Ω in (26). Note that the first equation in (26) is the feasibility condition for the dual problem of (25), and it corresponds to the condition in (12) with γ_k = −A^*p_k − z_k. Indeed, as we shall show next, γ_k is an ε_k-subgradient of g at x_k ∈ Ω if z_k ⪰ 0. Given any v ∈ Ω, we need to show that g(v) ≥ g(x_k) + ⟨γ_k, v − x_k⟩ − ε_k. We have g(v) = 0 and g(x_k) = 0 since v, x_k ∈ Ω, and

⟨γ_k, v − x_k⟩ = ⟨A^*p_k + z_k, x_k − v⟩ = ⟨p_k, A(x_k) − A(v)⟩ + ⟨z_k, x_k⟩ − ⟨z_k, v⟩ ≤ ⟨z_k, x_k⟩ = ε_k.   (27)

Note that in deriving (27), we used the fact that ⟨z_k, v⟩ ≥ 0 since v ⪰ 0 and z_k ⪰ 0. Thus the condition (12) is satisfied if ‖H_k^{-1/2} δ_k‖ ≤ ϵ_k/(√2 t_k) and ε_k ≤ ξ_k/(2 t_k^2).

As we have already noted in the last paragraph, the approximate solution x_k obtained by solving the subproblem (25) should be feasible, i.e., x_k ∈ Ω. In practice we can maintain the positive semidefiniteness of x_k by performing a projection onto S^n_+. But the residual vector r_k := A(x_k) − b is usually not exactly equal to 0, except in some special cases. For the nearest correlation matrix problem (3), one can indeed obtain an approximate solution x_k which is contained in Ω by performing a simple diagonal scaling to make the diagonal entries of the resulting matrix equal to one. In the following, we propose a strategy to find a feasible solution x̄_k ∈ Ω given an approximate solution x_k of (26) for which r_k is not necessarily 0, but (x_k, p_k, z_k) satisfies the conditions that x_k ⪰ 0, z_k ⪰ 0, ‖H_k^{-1/2} δ_k‖ ≤ (1/2) ϵ_k/(√2 t_k) and ε_k ≤ (1/2) ξ_k/(2 t_k^2).

Suppose that there exists x̂ ≻ 0 such that A(x̂) = b. Since A is surjective, AA^* is nonsingular. Let ω_k = A^*(AA^*)^{-1}(r_k). We note that ‖ω_k‖_2 ≤ ‖r_k‖/σ_min(A), and A(x_k + ω_k) = b, where ‖·‖_2 denotes the spectral norm. However, x_k + ω_k may not be positive semidefinite. Thus we consider the following iterate:

x̄_k = λ(x_k + ω_k) + (1 − λ) x̂ = λ x_k + (λ ω_k + (1 − λ) x̂),

where λ ∈ [0, 1]. It is clear that A(x̄_k) = b. By choosing λ = 1 − ‖ω_k‖_2/(‖ω_k‖_2 + λ_min(x̂)), we can guarantee that x̄_k is positive semidefinite. For x̄_k, we have

0 ≤ ⟨x̄_k, z_k⟩ ≤ λ ε_k + λ n ‖ω_k‖_2 ‖z_k‖ + (‖ω_k‖_2 n λ_max(x̂))/(‖ω_k‖_2 + λ_min(x̂)) ‖z_k‖
  ≤ ε_k + n ‖ω_k‖_2 ‖z_k‖ (1 + λ_max(x̂)/λ_min(x̂))
  ≤ 2 ε_k,   if ‖ω_k‖_2 ≤ (ε_k/(n ‖z_k‖)) (1 + λ_max(x̂)/λ_min(x̂))^{-1}.

Moreover,

∇f(y_k) + H_k(x̄_k − y_k) − (A^*p_k + z_k) = δ_k + H_k(x̄_k − x_k) =: δ̄_k.

Thus γ_k = −A^*p_k − z_k is a 2ε_k-subgradient of g at x̄_k ∈ Ω. Now ‖H_k^{-1/2} δ̄_k‖ ≤ ‖H_k^{-1/2} δ_k‖ + ‖H_k^{1/2}(x̄_k − x_k)‖, and

‖H_k^{1/2}(x̄_k − x_k)‖^2 = ⟨x̄_k − x_k, H_k(x̄_k − x_k)⟩ ≤ n ‖ω_k‖_2^2 λ_max(H_1) (1 + ‖x_k − x̂‖_2/λ_min(x̂))^2.

Thus we have ‖H_k^{-1/2} δ̄_k‖ ≤ ϵ_k/(√2 t_k) if

‖ω_k‖_2 ≤ [ ϵ_k / ( 2 √(2n) t_k (λ_max(H_1))^{1/2} ) ] (1 + ‖x_k − x̂‖_2/λ_min(x̂))^{-1}.

To conclude, (x̄_k, p_k, z_k) would satisfy the condition (12) if

‖ω_k‖_2 ≤ min{ [ ξ_k / (4 t_k^2 n ‖z_k‖) ] (1 + λ_max(x̂)/λ_min(x̂))^{-1},  [ ϵ_k / ( 2 √(2 n λ_max(H_1)) t_k ) ] (1 + ‖x_k − x̂‖_2/λ_min(x̂))^{-1} }.   (28)

We should note that even though we have succeeded in constructing a feasible x̄_k in Ω, the accuracy requirement in (28) could be too stringent for computational efficiency. For example, when σ_min(A) is small, or ‖z_k‖ is large, or x̂ has a large condition number, or λ_max(H_1) is large, we would expect that x_k must be computed to rather high accuracy so that ‖r_k‖ is small enough for (28) to be satisfied.

3 Analysis of an inexact APG method for (P)

To apply Algorithm 1 to solve the problem (P), the requirement that x_k must be primal feasible, i.e., x_k ∈ Ω, can be restrictive as it limits our flexibility in choosing a non-primal-feasible algorithm for solving (25). Even though the modification outlined in the last paragraph of section 2.1 is able to produce a primal feasible x̄_k, the main drawback is that the residual norm ‖ω_k‖ must satisfy the stringent accuracy condition in (28). To overcome the drawbacks just mentioned, here we propose an inexact APG algorithm for solving (P) for which the iterate x_k need not be strictly contained in Ω. As the reader will observe later, the analysis of the iteration complexity of the proposed inexact APG method becomes even more challenging than the analysis done in the previous section.

We let (x*, p*, z*) be an optimal solution pair of (P) and (D). In this section, we let

q_k(x) = f(y_k) + ⟨∇f(y_k), x − y_k⟩ + (1/2)⟨x − y_k, H_k(x − y_k)⟩,   x ∈ X.   (29)

Note that X = S^n. The inexact APG algorithm we propose for solving (P) is given as follows.

Algorithm 2. Given a tolerance ε > 0. Input y_1 = x_0 ∈ X, t_1 = 1. Set k = 1. Iterate the following steps.

Step 1. Find an approximate minimizer

x_k ≈ argmin{ q_k(x) : x ∈ Ω },   (30)

where H_k is a self-adjoint positive definite operator that is chosen by the user, and x_k is allowed to be contained in a suitable enlargement Ω_k of Ω.

Step 2. Compute t_{k+1} = (1 + √(1 + 4 t_k^2))/2.

Step 3. Compute y_{k+1} = x_k + ((t_k − 1)/t_{k+1})(x_k − x_{k−1}).

Note that when Ω_k = Ω, the dual problem of (30) is given by

max{ q_k(x) − ⟨∇q_k(x), x⟩ + ⟨b, p⟩ : ∇q_k(x) − A^*p − z = 0, z ⪰ 0, x ⪰ 0 }.   (31)

Let {ξ_k}, {ϵ_k}, {μ_k} be given convergent sequences of nonnegative numbers such that

Σ_{k=1}^∞ ξ_k < ∞,   Σ_{k=1}^∞ ϵ_k < ∞,   and   Σ_{k=1}^∞ μ_k < ∞.

We assume that the approximate minimizer x_k in (30) and its corresponding dual variables (p_k, z_k) satisfy the following conditions:

f(x_k) ≤ q_k(x_k) + ξ_k/(2 t_k^2),
∇q_k(x_k) − A^*p_k − z_k = δ_k,   with ‖H_k^{-1/2} δ_k‖ ≤ ϵ_k/(√2 t_k),
‖r_k‖ ≤ μ_k/t_k^2,   (32)
⟨x_k, z_k⟩ ≤ ξ_k/(2 t_k^2),
x_k ⪰ 0,   z_k ⪰ 0,

where r_k := A(x_k) − b. We assume that μ_k/t_k^2 ≥ μ_{k+1}/t_{k+1}^2 and ϵ_k/t_k ≥ ϵ_{k+1}/t_{k+1} for all k. Observe that the last five conditions in (32) stipulate that (x_k, p_k, z_k) is an approximate optimal solution of (30) and (31).

Just as in the previous section, we need to establish a series of lemmas to analyse the iteration complexity of Algorithm 2. However, we should mention that the lack of feasibility of x_k (i.e., x_k may not be contained in Ω) introduces nontrivial technical

13 difficulties in the proof of the complexity result for Algorithm 2. For example, F (x ) F (x ) no longer holds as in the feasible case when x Ω. Lemma 3.1. Given y j X and a positive definite linear operator H j on X such that the conditions in (32) hold. Then for any x S n +, we have f(x) f(x j ) 1 2 x j y j, H j (x j y j ) + y j x, H j (x j y j ) Proof. Since f(x j ) q j (x j ) + ξ j /(2t 2 j), we have + δ j + A p j, x x j ξ j /t 2 j. (33) f(x) f(x j ) f(x) q j (x j ) ξ j /(2t 2 j) = f(x) f(y j ) f(y j ), x j y j 1 2 x j y j, H j (x j y j ) ξ j /(2t 2 j) f(y j ), x x j 1 2 x j y j, H j (x j y j ) ξ j /(2t 2 j). Note that in the last inequality, we have used the fact that f(x) f(y j ) f(y j ), x y j for all x X. Now, by using (32), we get f(x) f(x j ) f(y j ), x x j 1 2 x j y j, H j (x j y j ) ξ j /(2t 2 j) = δ j + A p j H j (x j y j ), x x j + z j, x x j 1 2 x j y j, H j (x j y j ) ξ j /(2t 2 j) δ j + A p j H j (x j y j ), x x j 1 2 x j y j, H j (x j y j ) ξ j /t 2 j. From here, the required inequality (33) follows readily. Note that in deriving the last inequality, we have used the fact that z j, x j ξ j /(2t 2 j) and z j, x 0. For later purpose, we define the following quantities for 1: v = f(x ) f(x ), u = t x (t 1)x 1 x, a = t 2 v, b = 1 2 u, H (u ) 0, e = t δ, u, (34) η = p, t 2 r t 2 1 r 1, with η 1 = p 1, r 1, χ = p 1 p µ 1, with χ 1 = 0, ϵ = ϵ j, ξ = (ξ j + ϵ 2 j), χ = χ j, τ = 1 2 x 0 x, H 1 (x 0 x ). Note that unlie the analysis in the previous section, a may be negative. 13

14 Lemma 3.2. Suppose that H 1 H 0 for all. Then a 1 + b 1 a + b e ξ η. (35) Proof. By applying the inequality (33) to x = x 1 0 with j =, we get v 1 v = f(x 1 ) f(x ) (36) 1 2 x y, H (x y ) + y x 1, H (x y ) + δ + A p, x 1 x ξ /t 2. Similarly, by applying the inequality (33) to x = x 0 with j =, we get v = f(x ) f(x ) (37) 1 2 x y, H (x y ) + y x, H (x y ) + δ + A p, x x ξ /t 2. By multiplying (36) throughout by t 1 (note that t 1 for all 1) and adding that to (37), we get (t 1)v 1 t v t 2 x y, H (x y ) + t y (t 1)x 1 x, H (x y ) δ + A p, t x (t 1)x 1 x ξ /t. (38) Now, by multiplying (38) throughout by t and using the fact that t 2 1 = t (t 1), we get a 1 a = t 2 1v 1 t 2 v t2 2 x y, H (x y ) + t t y (t 1)x 1 x, H (x y ) δ + A p, t 2 x t 2 1x 1 t x ξ. Let a = t y, b = t x, and c = (t 1)x 1 + x. By using the fact that b a, H (b a) + 2 a c, H (b a) = b c, H (b c) a c, H (a c), we get a 1 a 1 2 b c, H (b c) 1 2 a c, H (a c) δ + A p, t 2 x t 2 1x 1 t x ξ. (39) Now a c = t y c = t x 1 + (t 1 1)(x 1 x 2 ) c = u 1, b c = u, and t 2 x t 2 1 x 1 t x = t u. Thus (39) implies that a 1 a 1 2 u, H (u ) 1 2 u 1, H (u 1 ) δ + A p, t u ξ b b 1 δ, t u p, A(t u ) ξ. (40) 14

15 Note that in deriving (40), we have used the fact that H 1 H. Now p, A(t u ) = p, t 2 (Ax b) t 2 1(Ax 1 b) = p, t 2 r t 2 1r 1. From here, the required result is proved. Lemma 3.3. Suppose that H 1 H 0 and the conditions in (32) are satisfied for all. Then where ω = ϵ j Aj, and a + b ( τ + ϵ ) 2 + p µ + 2( ξ + χ + ω ) (41) A j = p j µ j + a j, with a j = max{0, a j. Proof. Note that we have e H 1/2 δ H 1/2 u t ϵ H 1/2 u / 2 = ϵ b. First, we show that a 1 + b 1 τ + p 1, r 1 + ϵ 1 b1 + ξ 1. Note that a 1 = f(x 1 ) f(x ) and b 1 = 1 2 x 1 x, H 1 (x 1 x ). By applying the inequality (33) to x = x with j = 1, and noting that y 1 = x 0, we have that a x 1 y 1, H 1 (x 1 y 1 ) + y 1 x, H 1 (x 1 y 1 ) + δ 1 + A p 1, x x 1 ξ 1 = 1 2 x 1 x, H 1 (x 1 x ) 1 2 y 1 x, H 1 (y 1 x ) + δ 1 + A p 1, x x 1 ξ 1 = b x 0 x, H 1 (x 0 x ) + δ 1 + A p 1, x x 1 ξ 1. Hence, by using the fact that H 1/2 1 δ 1 ϵ 1 / 2, we get a 1 + b x 0 x, H 1 (x 0 x ) δ 1 + A p 1, x x 1 + ξ 1 τ + ϵ 1 b1 + p 1, r 1 + ξ 1 τ + p 1, r 1 + ϵ 1 b1 + ξ 1. Let s 1 = ϵ 1 b1 + ξ 1 and for 2, s = ϵ j bj + By Lemma 3.2, we have ξ j + τ a 1 + b 1 ϵ 1 b1 ξ 1 η 1 χ j. a 2 + b 2 ϵ 2 b2 ϵ 1 b1 ξ 1 ξ 2 η 1 η 2 a + b a + b +1 ϵ j bj ξ j ϵ j bj 15 η j ξ j p, t 2 r χ j.

16 Note that in the last inequality, we used the fact that 1 η j = p, t 2 r + p j p j+1, t 2 jr j p, t 2 r + Thus we have a + b τ + p, t 2 r + s, and this implies that Hence χ j. b τ + s where τ := τ + p, t 2 r a τ + A. (42) s = s 1 + ϵ b + ξ + χ s 1 + ϵ τ + s + ξ + χ. (43) Note that since τ 1 b 1 ϵ 1 b1 ξ 1, we have b 1 1(ϵ ϵ (τ 1 + ξ 1 )) ϵ 1 + τ1 + ξ 1. Hence s 1 = ϵ 1 b1 + ξ 1 ϵ 1 (ϵ 1 + τ 1 + ξ 1 ) + ξ 1 ϵ ξ 1 + ϵ 1 ( τ 1 + ξ 1 ). The inequality (43) implies that ) (τ + s ) ϵ τ + s (τ + s 1 + ξ + χ 0. Hence we must have τ + s 1 2 ( ) ϵ + ϵ 2 + 4(τ + s 1 + ξ + χ ). Consequently, by using the fact that a + b a + b for any a, b 0, we have s s ϵ2 + ξ + χ ϵ ϵ 2 + 4(τ + s 1 + ξ + χ ) s ϵ2 + ξ + χ ϵ ϵ 2 + 4(τ + A + s 1 + ξ + χ ) This implies that s 1 + ϵ 2 + ξ + χ + ϵ ( τ + A + s 1 + ξ + χ ). (44) s s 1 + (ϵ 2 j + ξ j + χ j ) + j=2 ξ + χ + ϵ j τ + Aj + j=2 ϵ j τ + Aj + ϵ j sj ϵ j sj 1 + ξ j + χ j ξ + χ + ω + ϵ τ + ϵ s. (45) In the last inequality, we used the fact that s j 1 + ξ j + χ j s j, and 0 s 1 s. The inequality (45) implies that s 1 ( ) ϵ + ϵ θ, (46) 16 j=2

17 where θ = ξ + χ + ω + ϵ τ. From here, we get s ϵ 2 + 2θ. (47) The required result follows from (47) and the fact that a + b τ + s + p, t 2 r τ + s + p µ. Let Ω := {x S n : A(x) b µ /t 2, x 0. (48) We assume that the following minimization problem min {f(x) : x Ω has an optimal solution x. Since x, x Ω, we have f(x ) f(x ) and f(x ) f(x ). Hence v = f(x ) f(x ) f(x ) f(x ). Also, since µ /t 2 µ +1/t 2 +1, we have f(x +1 ) f(x ) and Ω +1 Ω. Lemma 3.4. For all 1, we have Proof. By the convexity of f, we have 0 f(x ) f(x ) p µ /t 2. (49) f(x ) f(x ) f(x ), x x = A p + z, x x = p, A(x ) A(x ) + z, x z, x p b A(x ) p µ /t 2. Note that in deriving the last inequality, we have used the fact that z, x = 0, z, x 0, and A(x ) = b. Theorem 3.1. Suppose that H 1 H 0 for all. Let M := max 1 j { ( p + p j )µ j. Then we have 4 p µ ( + 1) f(x 4 ( ) f(x 2 ) ( ) τ + ϵ ( + 1) 2 ) 2 + p µ + 2 ϵ M + 2( ξ + χ ).(50) Proof. The inequality on the left-hand side of (50) follows from Lemma 3.4 and the fact that t ( + 1)/2 and f(x ) f(x ) f(x ) f(x ). Next, we prove the inequality on the right-hand side of (50). By Lemma 3.3 and noting that b 0, we have t 2 (f(x ) f(x )) = a ( τ + ϵ ) 2 + p µ + 2( ξ + χ ) + 2ω. 17
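The complexity arguments in this section and the previous one repeatedly invoke the bound t_k ≥ (k+1)/2 for the sequence produced in Step 2 of Algorithms 1 and 2. The following short MATLAB check (my own sanity check, not part of the paper) confirms the bound numerically; the induction argument is that t_{k+1} ≥ (1 + 2 t_k)/2 whenever t_k ≥ (k+1)/2.

t = 1; ok = true;
for k = 1:1000
    ok = ok && (t >= (k+1)/2);        % check t_k >= (k+1)/2
    t = (1 + sqrt(1 + 4*t^2))/2;      % Step 2 update for t_{k+1}
end
ok                                    % prints 1 (true)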

18 Now a j = t 2 j(f(x ) f(x j )) t 2 j(f(x ) f(x j )) p µ j. Hence a j p µ j, and ω ϵ j p j µ j + p µ j M ϵ. (51) From here, the required result follows. From the assumption on the sequences {ϵ, {ξ, and {µ, we now that the sequences { ϵ and { ξ are bounded. In order to show that the sequence of function values f(x ) converges to the optimal function value f(x ) with the convergent rate O(1/ 2 ), it is enough to show that the sequence { p is bounded under certain conditions, from which we can also have the boundedness of {M and { χ. Then the desired convergent rate of O(1/ 2 ) for our inexact APG method follows. 3.1 Boundedness of {p In this subsection, we consider sufficient conditions to ensure the boundedness of the sequence {p. Lemma 3.5. Suppose that there exists ( x, p, z) such that A( x) = b, x 0, f( x) = A p + z, z 0. (52) If the sequence {f(x ) is bounded from above, then the sequence {x is bounded. Proof. By using the convexity of f, we have Thus f( x) f(x ) f( x), x x = A p + z, x x = p, A( x) A(x ) + z, x z, x p b A(x ) + z, x z, x p µ /t 2 + z, x z, x p µ 1 + z, x z, x. λ min ( z)tr(x ) z, x p µ 1 + z, x f( x) + f(x ). (53) From here, the required result is proved. Remar 3.1. The condition that {f(x ) is bounded from above appears to be fairly wea. But unfortunately we are not able to prove that this condition holds true. In many cases of interest, such as the nearest correlation matrix problem (3), the condition that {f(x ) is bounded above or that {x is bounded can be ensured since Ω 1 is bounded. 18

19 Lemma 3.6. Suppose that H 1 H 0 for all, {x is bounded and there exists ˆx such that A(ˆx) = b, ˆx 0. Then the sequence {z is bounded. In addition, the sequence {p is also bounded. Proof. From (32), we have λ min (ˆx)Tr(z ) ˆx, z = ˆx, q (x ) A p δ = b, p + ˆx, q (x ) ˆx, δ + ˆx x, q (x ) ˆx, δ = + ˆx x, f(y ) + H (x y ) ˆx, δ + ˆx x f(y ) + H (x y ) + H 1/2 ˆx ϵ /( 2t ) + ˆx x f(y ) + H (x y ) + H 1/2 1 ˆx ϵ 1 / 2. (54) Since {x is bounded, it is clear that {y is also bounded. By the continuity of f and that fact that 0 H H 1, the sequence { f(y ) + H (x y ) is also bounded. From (54), we can now conclude that {z is bounded. Next, we show that {p is bounded. Let A = (AA ) 1 A. Note that the matrix AA is nonsingular since A is assumed to be surjective. From (32), we have p = A ( q (x ) z δ ), and hence ( ) p A q (x ) z δ A f(y ) + H (x y ) + z + δ. Since H H 1 λ max (H 1 )I, we have δ λ max (H 1 ) H 1/2 δ λ max (H 1 )ϵ 1 / 2. By using the fact that the sequences { f(y ) + H (x y ) and {z are bounded, the boundedness of {p follows. 3.2 A semismooth Newton-CG method for solving the inner subproblem (30) In Section 3, an inexact APG method (Algorithm 2) was presented for solving (P ) with the desired convergent rate of O(1/ 2 ). However, an important issue on how to efficiently solve the inner subproblem (30) has not been addressed. In this section, we propose the use of a SSNCG method to solve (30) with warm-start using the iterate from the previous iteration. Note that the self-adjoint positive definite linear operator H can be chosen by the user. Suppose that at each iteration we are able to choose a linear operator of the form: H := w w, where w S n ++ 19

such that f(x) ≤ q_k(x) for all x ∈ Ω. (Note that we can always choose w = L I if there are no other better choices.) Then the objective function q_k(·) in (30) can equivalently be written as

q_k(x) = (1/2)‖w^{1/2}(x − u_k)w^{1/2}‖^2 + f(y_k) − (1/2)‖w^{-1/2} ∇f(y_k) w^{-1/2}‖^2,

where u_k = y_k − w^{-1} ∇f(y_k) w^{-1}. By dropping the last two constant terms in the above equation, we can equivalently write (30) as the following well-studied W-weighted semidefinite least squares problem

min{ (1/2)‖w^{1/2}(x − u_k)w^{1/2}‖^2 : A(x) = b, x ⪰ 0 }.   (55)

Let x̄ = w^{1/2} x w^{1/2}, ū_k = w^{1/2} u_k w^{1/2}, and define the linear map Ā : S^n → R^m by Ā(x̄) = A(w^{-1/2} x̄ w^{-1/2}). Then (55) can equivalently be written as

min{ (1/2)‖x̄ − ū_k‖^2 : Ā(x̄) = b, x̄ ⪰ 0 },   (56)

whose Lagrangian dual problem is given by

max{ θ(p) := b^T p − (1/2)‖Π_{S^n_+}(ū_k + Ā^*p)‖^2 : p ∈ R^m },   (57)

where Π_{S^n_+}(u) denotes the metric projection of u ∈ S^n onto S^n_+. The problem (57) is an unconstrained continuously differentiable convex optimization problem, and it can be efficiently solved by the SSNCG method developed in [11]. Note that once an approximate solution p_k is computed from (57), an approximate solution for (55) can be computed by setting x_k = w^{-1/2} x̄_k w^{-1/2} with x̄_k = Π_{S^n_+}(ū_k + Ā^*p_k), and its complementary dual slack variable is z_k = w^{1/2}(x̄_k − ū_k − Ā^*p_k)w^{1/2}.

Note that the problem (57) is an unconstrained continuously differentiable convex optimization problem which can also be solved by a gradient ascent method. In our numerical implementation, we use a gradient method to solve (57) during the initial phase of Algorithm 2 when high accuracy solutions are not required. When the gradient method encounters difficulty in solving the subproblem to the required accuracy or becomes inefficient, we switch to the SSNCG method to solve (57). We should note that the approximate solution computed for the current subproblem can be used to warm-start the SSNCG and gradient methods for solving the next subproblem. In fact, the strategy of solving a semidefinite least squares subproblem (30) in each iteration of our inexact APG algorithm is practically viable precisely because we are able to warm-start the SSNCG or gradient methods when solving the subproblems. In our numerical experience, the SSNCG method typically takes less than 5 Newton steps to solve each subproblem with warm-start.
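The following hedged MATLAB sketch (my own illustration, not the paper's solver) evaluates the dual function θ(p) of (57) and its gradient for the unweighted case w = I with A(x) = diag(x) and b = e, and runs a plain gradient ascent on it; the paper uses such a gradient phase only initially and then switches to SSNCG. All data and names below are synthetic assumptions.

n = 30;
m0 = randn(n); ubar = (m0 + m0')/2;          % the matrix u_bar in (56) (synthetic)
b = ones(n,1);

theta  = @(p) b'*p - 0.5*norm(proj_psd(ubar + diag(p)), 'fro')^2;
gtheta = @(p) b - diag(proj_psd(ubar + diag(p)));   % gradient of theta

p = zeros(n,1);
for it = 1:500
    p = p + gtheta(p);                        % step 1 = 1/L, since the gradient is 1-Lipschitz here
end
xbar = proj_psd(ubar + diag(p));              % primal recovery: x_bar = Pi_{S+}(u_bar + Abar^* p)
norm(diag(xbar) - b)                          % residual ||A(x) - b||, should be small

function q = proj_psd(m)
% metric projection of a symmetric matrix onto the PSD cone
[v,d] = eig((m + m')/2);
q = v*max(d,0)*v';
end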

To successfully apply the SSNCG method to solve (30), we must find a suitable symmetrized Kronecker product approximation of Q. Note that for the H-weighted nearest correlation matrix problem where Q is a diagonal operator defined by Q(x) = (H ∘ H) ∘ x, a positive definite symmetrized Kronecker product approximation of Q can be derived as follows. Consider a rank-one approximation dd^T of H ∘ H; then diag(d) ⊛ diag(d) is a symmetrized Kronecker product approximation of Q. For the vector d ∈ R^n, one can simply take

d_j = max{ ϵ, max_{1 ≤ i ≤ n} H_{ij} },   j = 1, ..., n,   (58)

where ϵ > 0 is a small positive number.

For the convex QSDP problem (1) where the linear operator Q is defined by

Q(x) = B ⊛ I(x) = (Bx + xB)/2,   (59)

where B ∈ S^n_+, we propose the following strategy for constructing a suitable symmetrized Kronecker product approximation of Q = B ⊛ I. Suppose we have the eigenvalue decomposition B = P Λ P^T, where Λ = diag(λ) and λ = (λ_1, ..., λ_n)^T is the vector of eigenvalues of B. Then

⟨x, B ⊛ I(x)⟩ = (1/2)⟨x̂, Λ x̂ + x̂ Λ⟩ = (1/2) Σ_{i=1}^n Σ_{j=1}^n x̂_{ij}^2 (λ_i + λ_j) = Σ_{i=1}^n Σ_{j=1}^n x̂_{ij}^2 M_{ij},

where x̂ = P^T x P and M = (1/2)(λ e^T + e λ^T) with e ∈ R^n being the vector of all ones. For the choice of w, one may simply choose w = max(M) I, where max(M) is the largest element of M. However, if the matrix B is ill-conditioned, this choice of w may not work very well in practice since max(M) I ⊛ I may not be a good approximation of Q = B ⊛ I. To find a better approximation of Q, we propose to consider the following nonconvex minimization problem:

min{ Σ_{i=1}^n Σ_{j=1}^n h_i h_j : h_i h_j − M_{ij} ≥ 0, i, j = 1, ..., n, h ∈ R^n_+ }.   (60)

Thus if ĥ is a feasible solution to the above problem, then we have

⟨x, B ⊛ I(x)⟩ = Σ_{i=1}^n Σ_{j=1}^n x̂_{ij}^2 M_{ij} ≤ Σ_{i=1}^n Σ_{j=1}^n x̂_{ij}^2 ĥ_i ĥ_j = ⟨x, w x w⟩   for all x ∈ S^n,   with w = P diag(ĥ) P^T.

Note that the above strategy can also be used to obtain a suitable symmetrized Kronecker product approximation of the form diag(d) ⊛ diag(d) when Q is a diagonal operator.
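The following MATLAB sketch (my own illustration; all names are mine) builds the two ingredients just described: the vector d of (58) for the H-weighted case, and the matrix M of eigenvalue averages for Q = B ⊛ I, together with a numerical check of the identity ⟨x, B ⊛ I(x)⟩ = Σ_{ij} x̂_{ij}^2 M_{ij}.

n = 8; eps0 = 1e-3;
H = abs(randn(n)); H = (H + H')/2;
d = max(eps0, max(H,[],1))';                 % d_j = max(eps0, max_i H(i,j)), as in (58)

B = gallery('lehmer',n);
[P,Lam] = eig(B); lambda = diag(Lam);
e = ones(n,1);
M = (lambda*e' + e*lambda')/2;               % M_ij = (lambda_i + lambda_j)/2

% check <x, B(*)I(x)> = sum_ij xhat_ij^2 * M_ij with xhat = P'*x*P
x = randn(n); x = (x + x')/2; xhat = P'*x*P;
abs( sum(sum(x.*((B*x + x*B)/2))) - sum(sum(xhat.^2 .* M)) )   % ~0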

To find a good feasible solution of (60), we consider the following strategy. Suppose we are given an initial vector u ∈ R^n_+ such that uu^T − M ≥ 0 (entrywise). For example, we may take u = max(M) e. Our purpose is to find a correction vector ξ ∈ R^n_+ such that h := u − ξ satisfies the constraints in (60) while the objective value is reduced. Note that we have

h_i h_j − M_{ij} = u_i u_j − M_{ij} − (u_i ξ_j + u_j ξ_i) + ξ_i ξ_j ≥ u_i u_j − M_{ij} − (u_i ξ_j + u_j ξ_i).

Thus the constraints in (60) are satisfied if ξ ≤ u and

u_i ξ_j + u_j ξ_i ≤ u_i u_j − M_{ij},   i, j = 1, ..., n.

Since Σ_{i=1}^n Σ_{j=1}^n h_i h_j = (e^T ξ)^2 − 2(e^T u)(e^T ξ) + (e^T u)^2, and noting that 0 ≤ e^T ξ ≤ e^T u, the objective value in (60) is minimized if e^T ξ is maximized. Thus we propose to consider the following LP:

max{ e^T ξ : u_i ξ_j + u_j ξ_i ≤ u_i u_j − M_{ij}, i, j = 1, ..., n, 0 ≤ ξ ≤ u }.   (61)

Observe that the correction vector ξ depends on the given vector u. Thus if necessary, after a new u is obtained, one may repeat the process by solving the LP associated with the new u. (A small MATLAB sketch of this LP is given after the stopping criteria described below.)

4 Numerical Experiments

In this section, we report the performance of the inexact APG algorithm (Algorithm 2) on large scale linearly constrained QSDP problems. We implemented our algorithm in MATLAB 2008a (version 7.6), and the numerical experiments were run in MATLAB under a Linux operating system on an Intel Core 2 Duo 2.40GHz CPU with 4GB memory.

We measure the infeasibilities for the primal and dual problems (1) and (2) as follows:

R_P = ‖b − A(x)‖/(1 + ‖b‖),   R_D = ‖Q(x) + c − A^*p − z‖/(1 + ‖c‖),   (62)

where x, p, z are computed from (57). In our numerical experiments, we stop the inexact APG algorithm when

max{R_P, R_D} ≤ Tol,   (63)

where Tol is a pre-specified accuracy tolerance. Unless otherwise specified, we set Tol = 10^{-6} as the default tolerance. In addition, we also stop the inexact APG algorithm when the number of outer iterations exceeds 300. When solving the subproblem (57) at iteration k of our inexact APG method, we stop the SSNCG or gradient method when

‖∇θ(p_k)‖/(1 + ‖b‖) < min{ 1/t_k^{3.1},  0.2 ‖∇f(x_{k−1}) − A^*p_{k−1} − z_{k−1}‖/(1 + ‖c‖) }.
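As promised above, here is a hedged MATLAB sketch of the LP (61) that improves the initial vector u (it uses linprog from the Optimization Toolbox, which is assumed to be available; all variable names are mine, and the data is synthetic).

n = 8;
B = gallery('lehmer',n);
lambda = eig(B); e = ones(n,1);
M = (lambda*e' + e*lambda')/2;
u = max(M(:))*e;                       % initial u with u*u' - M >= 0 (valid here since max(M) >= 1)

% constraints u_i*xi_j + u_j*xi_i <= u_i*u_j - M_ij for all i <= j
rows = []; cols = []; vals = []; rhs = []; cnt = 0;
for i = 1:n
    for j = i:n
        cnt = cnt + 1;
        rows = [rows, cnt, cnt];  cols = [cols, j, i];
        vals = [vals, u(i), u(j)];     % duplicate entries at i==j sum to 2*u(i)
        rhs  = [rhs; u(i)*u(j) - M(i,j)];
    end
end
A  = sparse(rows, cols, vals, cnt, n);
xi = linprog(-ones(n,1), A, rhs, [], [], zeros(n,1), u);   % maximize e'*xi
h  = u - xi;                           % improved feasible vector for (60)
min(min(h*h' - M))                     % should be >= 0 up to solver tolerance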

4.1 Example 1

In this example, we consider the following H-weighted nearest correlation matrix problem:

min{ (1/2)‖H ∘ (x − u)‖^2 : diag(x) = e, x ⪰ 0 }.   (64)

We compare the performance of our inexact APG (IAPG) method and the augmented Lagrangian dual method (AL) studied by Qi and Sun in [12], whose MATLAB codes are available from D.F. Sun's homepage (matsundf). We consider the gene correlation matrices û from [6]. For testing purposes we perturb û to u := (1 − α)û + αE, where α ∈ (0, 1) and E is a randomly generated symmetric matrix with entries in [−1, 1]. We also set u_ii = 1, i = 1, ..., n. The weight matrix H is a sparse random symmetric matrix with about 50% nonzero entries. The MATLAB code for generating H and E is as follows: H = sprand(n,n,0.5); H = triu(H) + triu(H,1)'; H = (H + H')/2; E = 2*rand(n,n) - 1; E = triu(E) + triu(E,1)'; E = (E + E')/2.

In order to generate a good initial point, we use the SSNCG method in [11] to solve the following unweighted nearest correlation matrix problem:

min{ (1/2)‖x − u‖^2 : diag(x) = e, x ⪰ 0 }.   (65)

Due to the difference in stopping criteria for different algorithms, we set different accuracy tolerances for the IAPG and augmented Lagrangian methods. For the IAPG method, we set the tolerance Tol as above. For the augmented Lagrangian method, its stopping criterion depends on a tolerance parameter Tol1 which controls the three conditions in the KKT system (26).

Table 1 presents the numerical results obtained by the IAPG method (Algorithm 2) and the augmented Lagrangian dual method (AL) for various instances of Example 1. We use the primal infeasibility, the primal objective value and the computing time to compare the performance of the two algorithms. For each instance in the table, we report the matrix dimension (n), the noise level (α), the number of outer iterations (iter), the total number of Newton systems solved (newt), the primal infeasibility (R_P), the dual infeasibility (R_D), the primal objective value (pobj) in (64), the computation time (in the format hours:minutes:seconds) and the rank of the computed solution (rankx). We may observe from the table that the IAPG method can solve (64) very efficiently. For each instance, the IAPG method achieves nearly the same primal objective value as the augmented Lagrangian method, while achieving much better primal infeasibility and taking less than 50% of the time needed by the augmented Lagrangian method.
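To make the outer iteration concrete, the following toy MATLAB rendering (my own, not the paper's APG-SSNCG code) applies the iteration of Algorithm 2 to a synthetic instance of (64) with H_k = L I and a deliberately crude approximate subproblem solver: a PSD projection followed by the diagonal rescaling mentioned in section 2.1. It only illustrates the outer loop; the paper solves the subproblems by SSNCG and enforces the accuracy conditions (32).

n = 100;
H = full(abs(sprandsym(n,0.5))) + 1e-2;       % positive symmetric weights (synthetic)
u = 2*rand(n) - 1; u = (u + u')/2; u(1:n+1:end) = 1;
L = max(max(H.^2));                           % Lipschitz constant of grad f
gradf = @(x) (H.*H).*(x - u);

x = eye(n); y = x; t = 1;
for k = 1:100
    % Step 1 (approximate): project the gradient step onto the PSD cone,
    % then rescale the diagonal toward one so that the iterate stays near Omega
    xnew = proj_psd(y - gradf(y)/L);
    d = max(diag(xnew), 1e-8);
    xnew = diag(1./sqrt(d))*xnew*diag(1./sqrt(d));
    % Steps 2-3: update t_k and the extrapolated point y_{k+1}
    tnew = (1 + sqrt(1 + 4*t^2))/2;
    y = xnew + ((t - 1)/tnew)*(xnew - x);
    x = xnew; t = tnew;
end
0.5*norm(H.*(x - u),'fro')^2                  % objective value of (64) at the final iterate

function q = proj_psd(m)
[v,d] = eig((m + m')/2);
q = v*max(d,0)*v';
end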

Algo. problem n α iter/newt R_P R_D pobj time rankx IAPG Lymph / e e e+0 2: / e e e-1 4: AL Lymph e e e+0 5: e e e-1 30: IAPG ER / e e e+1 3: / e e e+0 3: AL ER e e e+1 9: e e e+0 14: IAPG Arabidopsis / e e e+1 4: / e e e+0 4: AL Arabidopsis e e e+1 12: e e e+0 22: IAPG Leukemia / e e e+2 9: / e e e+1 8: AL Leukemia e e e+2 22: e e e+1 28: IAPG hereditarybc / e e e+2 17: / e e e+2 17: AL hereditarybc e e e+2 38: e e e+2 36:

Table 1: Comparison of the inexact APG (IAPG) and augmented Lagrangian dual (AL) algorithms on (64) using sample correlation matrices from gene data sets. The weight matrix H is a sparse random matrix with about 50% nonzero entries.

4.2 Example 2

We consider the same problem as in Example 1, but the weight matrix H is generated from a weight matrix H0 used by a hedge fund company. The matrix H0 is a symmetric matrix with all positive entries. It has about 24% of the entries equal to 10^{-5} and the rest are distributed in the interval [2, ]. It has 28 eigenvalues in the interval [−520, −0.04], 11 eigenvalues in the interval [ , ], and the remaining 54 eigenvalues in the interval [10^{-4}, ]. The MATLAB code for generating the matrix H is given by: tmp = kron(ones(25,25),H0); H = tmp([1:n],[1:n]); H = (H+H')/2.

We use the same implementation techniques as in Example 1. The stopping tolerance for the IAPG method is set to Tol = 10^{-6}, while the tolerance for the augmented Lagrangian method is set to Tol1 = 10^{-2}. Table 2 presents the numerical results obtained by the IAPG and augmented Lagrangian dual (AL) methods. In the table, a dash means that the augmented Lagrangian method cannot achieve the required tolerance of 10^{-2} in 24 hours. As we can see from Table 2, the IAPG method is much more efficient than the augmented Lagrangian method, and it achieves much better primal infeasibility. For the last gene correlation matrix of size 1869, the IAPG method can find a good approximate solution within half an hour.

Algo. problem n α iter/newt R_P R_D pobj time rankx IAPG Lymph / e e e+6 1: / e e e+6 1: AL Lymph e e e+6 56: e e e+6 29: IAPG ER / e e e+7 2: / e e e+6 2: AL ER e e e+7 2:05: e e e+6 53: IAPG Arabidopsis / e e e+7 4: / e e e+6 3: AL Arabidopsis e e e+7 4:49: e e e+6 1:28: IAPG Leukemia / e e e+7 11: / e e e+7 10: AL Leukemia e e e+7 5:55: IAPG hereditarybc / e e e+8 29: / e e e+7 26: AL hereditarybc

Table 2: Same as Table 1, but with a bad weight matrix H.

For the augmented Lagrangian method, because the map Q associated with the weight matrix H is highly ill-conditioned, the CG method has great difficulty in solving the ill-conditioned linear system of equations arising in the semismooth Newton method.

Note that we have also used Algorithm 1 to solve the problems in Tables 1 and 2, since we can obtain an approximate solution x_k which is feasible for (64) for the subproblems (25) appearing in Algorithm 1. For these problems, the numerical results obtained by Algorithm 1 are quite similar to those of Algorithm 2, and hence we shall not report them here.

4.3 Example 3

In this example, we report the performance of the inexact APG method on the linearly constrained QSDP problem (1). The linear operator Q is given by Q(x) = (1/2)(Bx + xB) for a given B ⪰ 0, and the linear map A is given by A(x) = diag(x). We generate a positive definite matrix x and set b = A(x). Similarly, we generate a random vector p ∈ R^m and a positive definite matrix z, and set c = A^*(p) + z − Q(x). The MATLAB code for generating the matrix B is given by: randvec = 1 + 9*rand(n,1); tmp = randn(n,ceil(n/4)); B = diag(randvec) + (tmp*tmp')/n; B = (B+B')/2. Note that the matrix B generated is rather well conditioned.
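For reference, the following hedged MATLAB sketch puts the recipe just described into a self-contained script generating an Example-3-type QSDP instance of (1) with Q(x) = (B*x + x*B)/2 and A(x) = diag(x); the variable names are mine and the random data of course differs from the paper's instances.

n = 200;
randvec = 1 + 9*rand(n,1);
tmp = randn(n,ceil(n/4));
B = diag(randvec) + (tmp*tmp')/n;  B = (B + B')/2;   % well-conditioned B

Q = @(x) (B*x + x*B)/2;
m0 = randn(n); X = m0*m0' + eye(n);                  % a positive definite X
b  = diag(X);                                        % b = A(X)
p  = randn(n,1);
m1 = randn(n); Z = m1*m1' + eye(n);                  % a positive definite Z
c  = diag(p) + Z - Q(X);                             % c = A^*(p) + Z - Q(X)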

As discussed in section 3.2, we are able to find a good symmetrized Kronecker product approximation w ⊛ w of Q. By noting that

(1/2)⟨x, w ⊛ w(x)⟩ + ⟨c, x⟩ = (1/2)‖w^{1/2}(x − u)w^{1/2}‖^2 − (1/2)‖w^{-1/2} c w^{-1/2}‖^2,

where u = −w^{-1} c w^{-1}, and dropping the constant term, we propose to solve the following problem to generate a good initial point for the inexact APG method:

min{ (1/2)‖w^{1/2}(x − u)w^{1/2}‖^2 : A(x) = b, x ⪰ 0 },

which can be efficiently solved by the SSNCG method in [11].

n m cond(B) iter/newt R_P R_D pobj dobj time e+0 9/9 3.24e e e e e+0 9/9 3.68e e e e+4 1: e+0 9/9 3.16e e e e+5 8: e+0 9/9 3.32e e e e+5 16: e+0 9/9 2.98e e e e+5 29:02

Table 3: Numerical results of the inexact APG algorithm on (1), where the positive definite matrix B for the linear operator Q is well-conditioned.

The performance results of our IAPG method on convex QSDP problems are given in Table 3, where pobj and dobj are the primal and dual objective values for QSDP, respectively. We may see from the table that the IAPG method can solve all five instances of QSDP problems very efficiently with very good primal infeasibility.

4.4 Example 4

We consider the same problem as in Example 3, but the linear map A is generated by using the first generator in [8] with order p = 3. The positive definite matrix B is generated by using MATLAB's built-in function B = gallery('lehmer',n). The condition number cond(B) of the generated Lehmer matrix B is within the range [n, 4n^2]. For this example, the simple choice of w = λ_max(B) I in the symmetrized Kronecker product w ⊛ w for approximating Q does not work well. In our numerical implementation, we employ the strategy described in section 3.2 to find a suitable w. Table 4 presents the performance results of our IAPG method on convex QSDP problems where the matrix B is very ill-conditioned. As observed from the table, the condition numbers of B are large. We may see from the table that the IAPG method can solve the problem very efficiently with a very accurate approximate optimal solution.
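As a concrete rendering of the initial-point construction described at the beginning of this subsection, the short MATLAB sketch below forms u = −w^{-1} c w^{-1} for a simple scalar choice of w (my own illustration with synthetic data; section 3.2 gives better choices of w).

n = 50;
B = gallery('lehmer',n);
c = randn(n); c = (c + c')/2;
w = max(eig(B))*eye(n);                % the simple scalar choice mentioned in the text
u = -(w\c)/w;                          % u = -inv(w)*c*inv(w)
% the starting point is then argmin{ 0.5*||w^(1/2)*(x-u)*w^(1/2)||^2 : A(x) = b, x >= 0 },
% which is the W-weighted problem (55) and can be solved by the SSNCG method of [11].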

27 n m cond(b) iter/newt R P R D pobj dobj time e+5 51/ e e e e+3 1: e+6 62/ e e e e+4 11: e+6 76/ e e e e+4 1:14: e+6 80/ e e e e+4 2:11:01 Table 4: Same as Table 3, but the matrix B for the linear operator Q is ill-conditioned and the linear map A is randomly generated as in [8]. References [1] A. Bec, and M. Teboulle, A fast iterative shrinage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sciences, 2 (2009), pp [2] V. Chandrasearan, B. Recht, P.A. Parrilo, and A.S. Willsy, The convex geometry of linear inverse problems, prepint, December, [3] F.H. Clare, Optimization and Nonsmooth Analysis, John Wiley and Sons, New Yor, [4] O. Devolder, F. Glineur, and Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, preprint, [5] N.J. Higham, Computing the nearest correlation matrix a problem from finance, IMA Journal of Numerical Analysis, 22 (2002), [6] L. Li and K.C. Toh, An inexact interior point method for L1-regularized sparse covariance selection, Mathematical Programming Computation, 2 (2010), [7] J. Malic, A dual approach to semidefinite least-squares problems, SIAM Journal on Matrix Analysis and Applications, 26 (2004), [8] J. Malic, J. Povh, F. Rendl, and A. Wiegele, Regularization methods for semidefinite programming, SIAM Journal on Optimization, 20 (2009), [9] Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/ 2 ), Soviet Mathematics Dolady 27 (1983), [10] Y. Nesterov, Gradient method for minimizing composite objective function, preprint, [11] H. Qi and D.F. Sun, A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM J. on Matrix Analysis and Applications, 28 (2006),


Fast Algorithms for SDPs derived from the Kalman-Yakubovich-Popov Lemma Fast Algorithms for SDPs derived from the Kalman-Yakubovich-Popov Lemma Venkataramanan (Ragu) Balakrishnan School of ECE, Purdue University 8 September 2003 European Union RTN Summer School on Multi-Agent

More information

A two-phase augmented Lagrangian approach for linear and convex quadratic semidefinite programming problems

A two-phase augmented Lagrangian approach for linear and convex quadratic semidefinite programming problems A two-phase augmented Lagrangian approach for linear and convex quadratic semidefinite programming problems Defeng Sun Department of Mathematics, National University of Singapore December 11, 2016 Joint

More information

A Multiple-Cut Analytic Center Cutting Plane Method for Semidefinite Feasibility Problems

A Multiple-Cut Analytic Center Cutting Plane Method for Semidefinite Feasibility Problems A Multiple-Cut Analytic Center Cutting Plane Method for Semidefinite Feasibility Problems Kim-Chuan Toh, Gongyun Zhao, and Jie Sun Abstract. We consider the problem of finding a point in a nonempty bounded

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

Rank Constrained Matrix Optimization Problems

Rank Constrained Matrix Optimization Problems Rank Constrained Matrix Optimization Problems Defeng Sun Department of Mathematics and Risk Management Institute National University of Singapore This talk is based on a joint work with Yan Gao at NUS

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

arxiv: v1 [math.oc] 10 May 2016

arxiv: v1 [math.oc] 10 May 2016 Fast Primal-Dual Gradient Method for Strongly Convex Minimization Problems with Linear Constraints arxiv:1605.02970v1 [math.oc] 10 May 2016 Alexey Chernov 1, Pavel Dvurechensky 2, and Alexender Gasnikov

More information

SOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1

SOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1 SOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1 Masao Fukushima 2 July 17 2010; revised February 4 2011 Abstract We present an SOR-type algorithm and a

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624

More information

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Bulletin of the. Iranian Mathematical Society

Bulletin of the. Iranian Mathematical Society ISSN: 1017-060X (Print) ISSN: 1735-8515 (Online) Bulletin of the Iranian Mathematical Society Vol. 41 (2015), No. 5, pp. 1259 1269. Title: A uniform approximation method to solve absolute value equation

More information

Lecture 9: September 28

Lecture 9: September 28 0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS XIANTAO XIAO, YONGFENG LI, ZAIWEN WEN, AND LIWEI ZHANG Abstract. The goal of this paper is to study approaches to bridge the gap between

More information

Lecture: Duality of LP, SOCP and SDP

Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications

ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications Professor M. Chiang Electrical Engineering Department, Princeton University March

More information

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min.

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min. MA 796S: Convex Optimization and Interior Point Methods October 8, 2007 Lecture 1 Lecturer: Kartik Sivaramakrishnan Scribe: Kartik Sivaramakrishnan 1 Conic programming Consider the conic program min s.t.

More information

Positive semidefinite matrix approximation with a trace constraint

Positive semidefinite matrix approximation with a trace constraint Positive semidefinite matrix approximation with a trace constraint Kouhei Harada August 8, 208 We propose an efficient algorithm to solve positive a semidefinite matrix approximation problem with a trace

More information

A priori bounds on the condition numbers in interior-point methods

A priori bounds on the condition numbers in interior-point methods A priori bounds on the condition numbers in interior-point methods Florian Jarre, Mathematisches Institut, Heinrich-Heine Universität Düsseldorf, Germany. Abstract Interior-point methods are known to be

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems

An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems arxiv:1712.05910v1 [math.oc] 16 Dec 2017 An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems Yangjing Zhang Ning Zhang Defeng Sun Kim-Chuan Toh December 15, 2017 Abstract.

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via convex relaxations Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

A polynomial-time inexact primal-dual infeasible path-following algorithm for convex quadratic SDP

A polynomial-time inexact primal-dual infeasible path-following algorithm for convex quadratic SDP A polynomial-time inexact primal-dual infeasible path-following algorithm for convex quadratic SDP Lu Li, and Kim-Chuan Toh April 6, 2009 Abstract Convex quadratic semidefinite programming (QSDP) has been

More information

An Augmented Lagrangian Dual Approach for the H-Weighted Nearest Correlation Matrix Problem

An Augmented Lagrangian Dual Approach for the H-Weighted Nearest Correlation Matrix Problem An Augmented Lagrangian Dual Approach for the H-Weighted Nearest Correlation Matrix Problem Houduo Qi and Defeng Sun March 3, 2008 Abstract In [15], Higham considered two types of nearest correlation matrix

More information

arxiv: v1 [math.oc] 13 Dec 2018

arxiv: v1 [math.oc] 13 Dec 2018 A NEW HOMOTOPY PROXIMAL VARIABLE-METRIC FRAMEWORK FOR COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, LIANG LING, AND KIM-CHUAN TOH arxiv:8205243v [mathoc] 3 Dec 208 Abstract This paper suggests two novel

More information

Primal-Dual Interior-Point Methods

Primal-Dual Interior-Point Methods Primal-Dual Interior-Point Methods Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 Outline Today: Primal-dual interior-point method Special case: linear programming

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Research Article Solving the Matrix Nearness Problem in the Maximum Norm by Applying a Projection and Contraction Method

Research Article Solving the Matrix Nearness Problem in the Maximum Norm by Applying a Projection and Contraction Method Advances in Operations Research Volume 01, Article ID 357954, 15 pages doi:10.1155/01/357954 Research Article Solving the Matrix Nearness Problem in the Maximum Norm by Applying a Projection and Contraction

More information

Gradient based method for cone programming with application to large-scale compressed sensing

Gradient based method for cone programming with application to large-scale compressed sensing Gradient based method for cone programming with application to large-scale compressed sensing Zhaosong Lu September 3, 2008 (Revised: September 17, 2008) Abstract In this paper, we study a gradient based

More information

A Smoothing Newton Method for Solving Absolute Value Equations

A Smoothing Newton Method for Solving Absolute Value Equations A Smoothing Newton Method for Solving Absolute Value Equations Xiaoqin Jiang Department of public basic, Wuhan Yangtze Business University, Wuhan 430065, P.R. China 392875220@qq.com Abstract: In this paper,

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Copositive Programming and Combinatorial Optimization

Copositive Programming and Combinatorial Optimization Copositive Programming and Combinatorial Optimization Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria joint work with I.M. Bomze (Wien) and F. Jarre (Düsseldorf) IMA

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming

More information

A projection method based on KKT conditions for convex quadratic semidefinite programming with nonnegative constraints

A projection method based on KKT conditions for convex quadratic semidefinite programming with nonnegative constraints A projection method based on KKT conditions for convex quadratic semidefinite programming with nonnegative constraints Xiaokai Chang Sanyang Liu Pengjun Zhao December 4, 07 Abstract The dual form of convex

More information

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Department of Mathematics & Risk Management Institute National University of Singapore (Based on a joint work with Shujun

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

On the Moreau-Yosida regularization of the vector k-norm related functions

On the Moreau-Yosida regularization of the vector k-norm related functions On the Moreau-Yosida regularization of the vector k-norm related functions Bin Wu, Chao Ding, Defeng Sun and Kim-Chuan Toh This version: March 08, 2011 Abstract In this paper, we conduct a thorough study

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization

Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Mathematical Programming manuscript No. (will be inserted by the editor) Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Wei Bian Xiaojun Chen Yinyu Ye July

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Homotopy methods based on l 0 norm for the compressed sensing problem

Homotopy methods based on l 0 norm for the compressed sensing problem Homotopy methods based on l 0 norm for the compressed sensing problem Wenxing Zhu, Zhengshan Dong Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou 350108, China

More information