On the Second-Order Feasibility Cone: Primal-Dual Representation and Efficient Projection


Alexandre Belloni and Robert M. Freund

October, 2006

Abstract

We study the second-order feasibility cone F = {y ∈ IR^n : ||My|| <= g^T y} for given data (M, g). We construct a new representation for this cone and its dual based on the spectral decomposition of the matrix M^T M − gg^T. This representation is used to efficiently solve the problem of projecting an arbitrary point x ∈ IR^n onto F: min_y {||y − x|| : ||My|| <= g^T y}, which aside from theoretical interest also arises as a necessary subroutine in the re-scaled perceptron algorithm. We develop a method for solving the projection problem to an accuracy ε whose computational complexity is bounded by O(mn^2 + n ln ln(1/ε) + n ln ln(1/min{width(F), width(F*)})) operations after the spectral decomposition of M^T M − gg^T is computed. Here width(F) and width(F*) denote the widths of F and F*, respectively. This is a substantial improvement over the complexity of a generic interior-point method.

This research has been partially supported through the Singapore-MIT Alliance and an IBM PhD Fellowship. IBM T. J. Watson Research Center and MIT, 32-22, Kitchawan Road, Yorktown Heights, New York 10598, belloni@mit.edu. MIT Sloan School of Management, 50 Memorial Drive, Cambridge, MA 02142, USA, rfreund@mit.edu.

1 Introduction and Main Results

Our notation is as follows: let K* denote the dual of a convex cone K ⊆ IR^k, i.e., K* := {z ∈ IR^k : z^T y >= 0 for all y ∈ K}. A convex cone K is regular if it is closed, has nonempty interior, and contains no lines, in which case K* is also regular, see Rockafellar [3]. Define the standard second-order cone in IR^k to be Q^k := {y ∈ IR^k : ||(y_1, ..., y_{k−1})|| <= y_k}, where ||·|| denotes the Euclidean norm. Let B(y, r) denote the Euclidean ball of radius r centered at y. Given data (M, g) ∈ (IR^{m×n}, IR^n), our interest lies in the second-order feasibility cone F := {y ∈ IR^n : ||My|| <= g^T y} = {y ∈ IR^n : (My, g^T y) ∈ Q^{m+1}} and its dual cone F*. We make the following assumption about the data:

Assumption 1. rank(M) >= 2 and g ≠ 0.

We now describe our main representation result for F and F*. It is elementary to establish that M^T M − gg^T has at most one negative eigenvalue, and we can write its eigendecomposition as M^T M − gg^T = QDQ^T, where Q is orthonormal (Q^{-1} = Q^T) and D is the diagonal matrix of eigenvalues. For notational convenience we denote by D_i and Q_i the i-th diagonal component of D and the i-th column of Q, respectively. By reordering the columns of Q we can presume that D_1 >= ... >= D_n and D_1, ..., D_{n−1} >= 0. By choosing either Q_n or −Q_n we can further presume that g^T Q_n >= 0. We implicitly assume that Q and D can be computed to within machine precision (in the relative sense) in O(mn^2) operations, consistent with computational practice. Our interest lies in the case when F is a regular cone, so we will hypothesize that F is a regular cone for the remainder of this section. We indicate how to amend our results and proofs to relax this hypothesis at the ends of Sections 2 and 3. Our main representation result is as follows:

Theorem 1. Suppose that F is a regular cone. Then D_1, ..., D_{n−1} > 0 > D_n, and:

(i) F = {y : y^T QDQ^T y <= 0, y^T Q_n >= 0};

(ii) F* = {z : z^T QD^{-1}Q^T z <= 0, z^T Q_n >= 0};

(iii) If y ∈ F and α >= 0, then z := −αQDQ^T y ∈ F*.
Furthermore, if y ∈ ∂F, then z ∈ ∂F* and z^T y = 0;

(iv) If z ∈ F* and α >= 0, then y := −αQD^{-1}Q^T z ∈ F. Furthermore, if z ∈ ∂F*, then y ∈ ∂F and z^T y = 0.

Note that (i) and (ii) of Theorem 1 describe easily computable representations of F and F* that have the same computational structure, in that checking membership in each cone uses similar data, operations, etc., in a manner that is symmetric between the dual cones. Parts (iii) and (iv) indicate that the same matrices in (i) and (ii) can be used constructively to map points on the boundary of one cone to their orthogonal counterpart in the dual cone.
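Theorem 1 is easy to exercise numerically. The following sketch (Python with numpy; the random data generation, scaling of g, and tolerances are our own illustrative choices, not from the paper) builds an instance with int F nonempty, forms the eigendecomposition of M^T M − gg^T, checks that membership via the definition ||My|| <= g^T y agrees with the representation in part (i), and maps a boundary point of F into F* via part (iii).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
M = rng.standard_normal((m, n))
u = rng.standard_normal(n); u /= np.linalg.norm(u)
g = 2.0 * np.linalg.norm(M, 2) * u      # then ||Mu|| < g^T u, so int F is nonempty

# Eigendecomposition M^T M - g g^T = Q D Q^T with D_1 >= ... >= D_n and g^T Q_n >= 0.
vals, vecs = np.linalg.eigh(M.T @ M - np.outer(g, g))
order = np.argsort(vals)[::-1]
D, Q = vals[order], vecs[:, order]
if g @ Q[:, -1] < 0:
    Q[:, -1] *= -1.0                    # flip the sign of Q_n so that g^T Q_n >= 0

W = Q @ np.diag(D) @ Q.T                # equals M^T M - g g^T up to rounding

def in_F_def(y):                        # definition:  ||My|| <= g^T y
    return np.linalg.norm(M @ y) <= g @ y

def in_F_repr(y):                       # Theorem 1(i): y^T QDQ^T y <= 0, Q_n^T y >= 0
    return y @ W @ y <= 0.0 and Q[:, -1] @ y >= 0.0

# The two membership tests agree on random points (away from the boundary,
# where floating-point rounding could flip a sign).
agree = all(in_F_def(y) == in_F_repr(y)
            for y in rng.standard_normal((500, n)) if abs(y @ W @ y) > 1e-9)

# Part (iii): a boundary point y_b of F maps to z_b := -QDQ^T y_b in bd(F*),
# with z_b^T y_b = 0.  We build y_b directly in the spectral frame.
sb = rng.standard_normal(n)
sb[-1] = np.sqrt(np.sum(D[:-1] * sb[:-1] ** 2) / abs(D[-1]))
y_b = Q @ sb
z_b = -W @ y_b                          # z := -alpha QDQ^T y with alpha = 1
```

The orthogonality z_b^T y_b = 0 here is exactly the complementarity that the projection algorithm of Section 3 exploits.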

3 Remark Geometry of F and F. Examining (i) and the property that D n < 0, the orthonormal transformation y s := Q T y maps F onto the axes-aligned ellipsoidal cone S := {s IR n n : j= D is 2 i D n s n }, so that F is the image of S under Q, and F = n {y : i= D i(q T i y)2 D n Q T n y} and F n = {z : i= (/D i)(q T i z)2 / D n Q T n z}. This establishes that F is indeed simply an ellipsoidal cone whose axes are the eigenvectors of Q with dilations corresponding to the eigenvalues of M T M gg T. From this perspective, the representation of F via (ii) makes natural geometric sense. Also, the central axis of both F and F is the ray {αq n : α 0}. Last of all, note that F = {y : y T QDQ T y 0, y T Q n 0} and F = {z : z T QD Q T z 0, z T Q n 0}. It turns out that the eigendecomposition of M T M gg T = QDQ T, while useful both conceptually and algorithmically (as we shall see), is not even necessary for the above representation of F and F. Indeed, Theorem can alternatively be stated replacing QDQ T and QD Q T by M T M gg T and (M T M gg T ). Under the further hypothesis that M T M is invertible, the theorem can be restated as follows: Corollary Suppose that F is a regular cone and rank(m T M) = n. Then: (i) F = {y : y T (M T M)y g T y}; (ii) F = {z : z T (M T M) z gt (M T M) z }; g T (M T M) g (iii) If y F and α 0, then z := α(m T M gg T )y F. Furthermore, if y F, then z F and z T y = 0; (iv) If z F and α 0, then y := α [(M T M) (M ] T M) gg T (M T M) z F. Furthermore, if z F, then y F and z T y = 0. g T (M T M) g The proofs of Theorem and Corollary are presented in Section 2, along with proofs that all the stated quantities are well-defined: in particular D exists and g T (M T M) g > 0 under the given hypotheses. These representation results are used to solve the following dual pair of optimization problems, where x IR n is a given point: P : t := min y y x D : t := max z x T z s.t. y F s.t. z z F. 
(1)

The problem P is the classical projection problem onto the cone F, whose solution is the point in F closest to x, and strong duality is easily established for this pair of problems. The problem D arises as a necessary subroutine in the re-scaled perceptron algorithm in [1]: the subroutine needs to efficiently solve D using x = x^k that arises at each outer iteration k of the algorithm. It is this latter problem that motivated our interest in efficiently representing F* and solving both P and D. Notice that P/D involve intersections of a Euclidean ball and a

second-order feasibility cone. This dual pair of problems is therefore a modest generalization of the trust region problem of optimizing a quadratic function over a Euclidean ball, for which Ye [5] showed how to combine binary search and Newton's method to obtain double-logarithmic complexity. Using the representation results above, and extending ideas from [5], we develop an algorithm for solving (1) in Section 3. The complexity of the algorithm depends on the widths of the cones F and F*, where the width τ_K of a cone K is defined to be the radius of the largest ball contained in K that is centered at unit distance from the origin:

τ_K := max_{y,r} {r : B(y, r) ⊆ K, ||y|| <= 1}.

It readily follows from Theorem 1 that the widths of F and F* are simple functions of the largest and smallest positive eigenvalues and the negative eigenvalue of M^T M − gg^T, and it is straightforward to derive:

τ_F = sqrt( |D_n| / (|D_n| + D_1) )   and   τ_F* = sqrt( (1/|D_n|) / (1/|D_n| + 1/D_{n−1}) ).

The main complexity result, which is proved in Section 3, is:

Theorem 2. Suppose that F is a regular cone, and x ∈ IR^n satisfying ||x|| = 1 is given. Then feasible solutions (y, z) of (P, D) satisfying a duality gap of at most σ are computable in O(mn^2 + n ln ln(1/σ) + n ln ln(1/min{τ_F, τ_F*})) operations.

Note that this is a substantial improvement over the complexity of a generic interior-point method, which is O(mn^2 (ln(1/σ) + ln(1/min{τ_F, τ_F*}))). We note also that the assumption that F is regular can be relaxed with no loss of strength of the results herein, but with substantial expositional overhead. These matters are discussed at the end of Section 3.

2 Proofs of Representation Results

Recall the eigendecomposition M^T M − gg^T = QDQ^T with D_1 >= ... >= D_n. A simple dimension argument establishes that M^T M − gg^T has at most one negative eigenvalue, whereby D_1, ..., D_{n−1} >= 0. By choosing either Q_n or −Q_n we can ensure that g^T Q_n >= 0. In preparation for the proof of Theorem 1, we first prove some preliminary results.

Proposition 1. Suppose that int F ≠ ∅.
Then D_n < 0, and there exists y satisfying ||My|| < g^T y.

Proof: We first suppose that there exists ȳ that satisfies ||Mȳ|| < g^T ȳ. In this case it easily follows that 0 > ȳ^T (M^T M − gg^T) ȳ = ȳ^T QDQ^T ȳ, whereby D_n < 0. Next suppose that every y ∈ F satisfies ||My|| = g^T y, and let ȳ ∈ int F. Using the singular-value decomposition, we can write M = P R^T where P ∈ IR^{m×r}, R ∈ IR^{n×r}, P^T P = I, and R^T R = E for some positive

diagonal matrix E of rank r = rank(M). Since ȳ ∈ int F we have ||M(ȳ + βd)|| = g^T (ȳ + βd) for all d ∈ B(0, 1) and all sufficiently small positive β. Substituting M = P R^T, squaring the previous equation, and rearranging terms yields 2β(d^T R R^T ȳ − ȳ^T gg^T d) + β^2 (d^T R R^T d − d^T gg^T d) = 0, which is only true if R^T d = 0 whenever g^T d = 0. This in turn means that rank(R) = 1, and so rank(M) = 1, violating Assumption 1. Therefore there exists y satisfying ||My|| < g^T y.

The following straightforward characterization of F*, which was presented in more general form in [1], is included here for completeness.

Proposition 2. Let T := {M^T λ + gα : ||λ|| <= α}. Then F* = cl(T).

Proof: (⊆) Let ||λ|| <= α. Then for every x ∈ F it follows that λ^T Mx + αg^T x >= αg^T x − ||Mx|| ||λ|| >= 0. Thus T ⊆ F*, whereby cl(T) ⊆ F* since F* is closed. (⊇) Assume that there exists y ∈ F* \ cl(T). Then there exists h ≠ 0 satisfying h^T y < 0 and h^T w >= 0 for all w ∈ cl(T). Notice that λ^T Mh + αg^T h >= 0 for all λ, α satisfying ||λ|| <= α, which implies that ||Mh|| <= g^T h, and so h ∈ F. On the other hand, since y ∈ F*, it follows that h^T y >= 0, contradicting h^T y < 0.

The lack of closure of T can arise easily. Let M = [−1 0; 0 1] and g = (1, 0)^T. In this case, T = {(−λ_1 + α, λ_2) : ||(λ_1, λ_2)|| <= α}. It is easy to verify that (0, 1) ∉ T but (ε, 1) ∈ T for every ε > 0 (set λ_1 = 1/(2ε) − ε/2, λ_2 = 1, and α = 1/(2ε) + ε/2), which shows that T is not closed.

Proof of Theorem 1: Since int F ≠ ∅, Proposition 1 implies that D_n < 0, and so for the sake of this proof we re-scale (M, g) by 1/sqrt(|D_n|) in order to conveniently satisfy D_n = −1.

(i) Define H := {y : y^T QDQ^T y <= 0, y^T Q_n >= 0}. We first prove that F ⊆ H. If y ∈ F, then 0 >= y^T (M^T M − gg^T) y = y^T QDQ^T y. We now prove that y^T Q_n >= 0. Define λ := −MQ_n and α := g^T Q_n, whereby α >= 0 by the presumption above. Then ||λ||^2 = (Q_n)^T M^T M Q_n = (Q_n)^T (QDQ^T + gg^T) Q_n = −1 + (g^T Q_n)^2 < (g^T Q_n)^2, so ||λ|| < g^T Q_n = α, which also shows that g^T Q_n >= 1. Direct arithmetic substitution shows that M^T λ + gα = Q_n, whereby we have y^T Q_n = y^T (M^T λ + gα) >= −||My|| ||λ|| + g^T y α >= 0.
This shows that y ∈ H. Next suppose that y ∈ H. Then y^T M^T M y <= (g^T y)^2, whereby y ∈ F unless g^T y < 0. Supposing this is the case, it follows that −g^T y >= ||My||, and using the values of λ, α above, we have 0 <= y^T Q_n = y^T M^T λ + y^T g α <= ||λ|| ||My|| − α||My|| = −||My|| (α − ||λ||) <= 0. This then implies that ||My|| = 0 (since we showed above that α − ||λ|| > 0), and hence g^T y = 0 as well since all inequalities above are then equalities. This contradiction establishes that g^T y >= 0 and hence y ∈ F, completing the proof of (i).

(ii) Having established (i), suppose that D_i = 0 for some i ∈ {1, ..., n−1}. Then (θQ_i)^T QDQ^T (θQ_i) = 0 and Q_n^T (θQ_i) = 0, whereby θQ_i ∈ F for all θ, violating the hypothesis

that F is regular. Therefore D_i > 0 for all i ∈ {1, ..., n−1} and hence D^{-1} exists. Define J := {z : z^T QD^{-1}Q^T z <= 0, z^T Q_n >= 0}. Suppose that z ∈ J and y ∈ F, in which case

y^T z = y^T Q Q^T z = Σ_{i=1}^{n−1} [D_i^{1/2}(Q_i^T y)][D_i^{-1/2}(Q_i^T z)] + (y^T Q_n)(z^T Q_n) >= −sqrt(Σ_{i=1}^{n−1} D_i (Q_i^T y)^2) sqrt(Σ_{i=1}^{n−1} (1/D_i)(Q_i^T z)^2) + (y^T Q_n)(z^T Q_n) >= 0,

where the first inequality is an application of the Cauchy-Schwartz inequality, and the second inequality follows since z ∈ J and y ∈ F using part (i). Thus z ∈ F*, which shows that J ⊆ F*. Next let Q̄ denote the matrix of the first n−1 columns of Q and let D̄ denote the diagonal matrix composed of the n−1 diagonal components D_1, ..., D_{n−1}. Then from part (i) we have F = {y : y^T Q̄ D̄ Q̄^T y <= (Q_n^T y)^2, Q_n^T y >= 0} = {y : ||D̄^{1/2} Q̄^T y|| <= Q_n^T y}, and using Proposition 2 we know that F* = cl(T̄) where T̄ = {Q̄ D̄^{1/2} λ + Q_n α : ||λ|| <= α}. Let z ∈ T̄ where z = Q̄ D̄^{1/2} λ + Q_n α and ||λ|| <= α. Then z^T QD^{-1}Q^T z = (Q̄ D̄^{1/2} λ + Q_n α)^T QD^{-1}Q^T (Q̄ D̄^{1/2} λ + Q_n α) = λ^T λ − α^2 <= 0, and furthermore Q_n^T z = α >= 0, whereby z ∈ J. Thus T̄ ⊆ J. It then follows that F* = cl(T̄) ⊆ cl(J) = J, which completes the proof of (ii).

To prove (iii), notice that Q_n^T z = −αD_n Q_n^T y >= 0 and z^T QD^{-1}Q^T z = α^2 y^T QDQ^T QD^{-1}Q^T QDQ^T y = α^2 y^T QDQ^T y <= 0 (= 0), since y ∈ F (y ∈ ∂F) implies y^T QDQ^T y <= 0 (= 0), and hence z ∈ F* (z ∈ ∂F*) from part (ii). Furthermore, y^T z = −α y^T QDQ^T y = 0 when y ∈ ∂F, completing the proof of (iii). The proof of (iv) follows similar logic.

Before proving Corollary 1 we first prove:

Proposition 3. Suppose that int F ≠ ∅ and rank(M^T M) = n. Then g^T (M^T M)^{-1} g > 1 and ȳ := (M^T M)^{-1} g ∈ int F.

Proof: Let ᾱ := g^T (M^T M)^{-1} g, and note that ᾱ > 0 since g ≠ 0 from Assumption 1. From Proposition 1 we know there exists ŷ satisfying ||Mŷ|| < g^T ŷ; re-scale ŷ if necessary so that g^T ŷ = ᾱ. Notice that ȳ optimizes the function f(y) := y^T M^T M y − 2g^T y, whose optimal objective function value is −ᾱ. Therefore −ᾱ <= ŷ^T M^T M ŷ − 2g^T ŷ < ᾱ^2 − 2ᾱ, which implies that ᾱ^2 > ᾱ > 0 and hence ᾱ > 1. Next observe that ||Mȳ|| = sqrt(ȳ^T M^T M ȳ) = sqrt(ᾱ) < ᾱ = g^T ȳ, whereby ȳ ∈ int F.
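Proposition 3 is straightforward to check numerically. The sketch below (our own illustration with random data; the scaling of g is chosen so that int F is nonempty by construction) verifies that ᾱ = g^T (M^T M)^{-1} g exceeds 1 and that ȳ = (M^T M)^{-1} g is strictly interior to F, together with the two identities used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 7, 5
M = rng.standard_normal((m, n))                 # full column rank almost surely
u = rng.standard_normal(n); u /= np.linalg.norm(u)
g = 2.0 * np.linalg.norm(M, 2) * u              # guarantees ||Mu|| < g^T u, int F nonempty

MtM_inv = np.linalg.inv(M.T @ M)
alpha = g @ MtM_inv @ g                         # alpha-bar in Proposition 3
ybar = MtM_inv @ g

assert alpha > 1.0                              # g^T (M^T M)^{-1} g > 1
assert np.linalg.norm(M @ ybar) < g @ ybar      # ybar in int F
# Consistency with the proof:  ||M ybar|| = sqrt(alpha)  and  g^T ybar = alpha.
assert np.isclose(np.linalg.norm(M @ ybar), np.sqrt(alpha))
assert np.isclose(g @ ybar, alpha)
```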
Proof of Corollary 1: (i) is a restatement of the definition of F, (iii) is a restatement of part (iii) of Theorem 1, and (iv) is a restatement of part (iv) of Theorem 1 using the Sherman-Morrison formula:

QD^{-1}Q^T = (M^T M − gg^T)^{-1} = (M^T M)^{-1} + [(M^T M)^{-1} gg^T (M^T M)^{-1}] / (1 − g^T (M^T M)^{-1} g),

together with the fact from Proposition 3 that g^T (M^T M)^{-1} g > 1. It remains to prove (ii). Let K := {z ∈ IR^n : z^T QD^{-1}Q^T z <= 0}. Then from Theorem 1 we have K = F* ∪ −F*. Let ȳ = (M^T M)^{-1} g and note that ȳ ∈ int F from Proposition 3. Define H := {z ∈ IR^n : ȳ^T z >= 0}, and note that H ∩ F* = F* and H ∩ −F* = {0}. Therefore F* = K ∩ H = {z ∈ IR^n : z^T QD^{-1}Q^T z <= 0, g^T (M^T M)^{-1} z >= 0}. Using the Sherman-Morrison formula we obtain:

F* = {z : z^T [ (M^T M)^{-1} + ((M^T M)^{-1} gg^T (M^T M)^{-1}) / (1 − g^T (M^T M)^{-1} g) ] z <= 0, g^T (M^T M)^{-1} z >= 0},

which after rearranging yields the expression in (ii).

Remark 2 (The case when F is not regular). Let Z and N partition the set of indices according to zero and nonzero values of D_i. If D_n = 0, then one can show that F is a half-subspace in the subspace spanned by the Q_i for i ∈ Z. If D_n > 0, then F = {0}. If D_n < 0, then F has an interior, and we can interpret D_i^{-1} = ∞ for i ∈ Z. Then Theorem 1 remains valid if we interpret z^T QD^{-1}Q^T z <= 0 in (ii) as Σ_{i∈N} (1/D_i)(Q_i^T z)^2 <= 0 with (Q_i^T z)^2 = 0 for i ∈ Z, and y := −αQD^{-1}Q^T z in (iv) as Q_i^T y := −α(1/D_i) Q_i^T z for i ∈ N and Q_i^T y set arbitrarily for i ∈ Z.

Remark 3 (The case when rank(M) = 1). In this case M = fc^T for some f, c, and ||My|| = ||f|| |c^T y| for any y. This implies that F = {y ∈ IR^n : (g − ||f||c)^T y >= 0, (g + ||f||c)^T y >= 0}. Therefore F is the intersection of either one or two halfspaces.

3 An Algorithm for Approximately Solving (1)

3.1 Basic Properties of (1) and the Polar Problem Pair

Returning to (1) where x is the given vector, consider the following conditions in (y, z, θ):

y − θz = x,  y ∈ F,  z ∈ F*,  ||z|| <= 1,  θ >= 0,  θ||z|| = θ.   (2)

Examining (2), we see that x is decomposed into x = y − θz where y ∈ F and −θz ∈ −F*, and (y, z) is feasible for the problems (1). Let G denote the duality gap for (1), namely G = ||y − x|| + x^T z. We also consider the following pair of conic problems that are polar to (1):

P̄ : f* := min_s ||x − s||  s.t. s ∈ −F*        D̄ : f* := max_w −x^T w  s.t. ||w|| <= 1, w ∈ −F,   (3)

together with the following conditions in (s, w, ρ):

s − ρw = x,  s ∈ −F*,  w ∈ −F,  ||w|| <= 1,  ρ >= 0,  ρ||w|| = ρ;   (4)

here x is decomposed into x = s − ρw, where now (s, w) is feasible for the problems (3) and −ρw ∈ F and s ∈ −F*. Let Ḡ denote the duality gap for (3), namely Ḡ = ||s − x|| + x^T w. It is a straightforward exercise to show that conditions (2) together with the complementarity condition y^T z = 0 constitute necessary and sufficient optimality conditions for (1), and similarly (4) together with s^T w = 0 are necessary and sufficient for optimality for (3). Furthermore, solutions of (2) and (4) transform to one another:

(y, z, θ) → (s, w, ρ) = (−θz, −y/||y||, ||y||),   (s, w, ρ) → (y, z, θ) = (−ρw, −s/||s||, ||s||),

with necessary modifications for the cases when y = 0 (set w = 0) and/or s = 0 (set z = 0).

Proposition 4. Suppose (y, z, θ) satisfy (2) and (s, w, ρ) satisfy (4). Then (y, z) and (s, w) are feasible for their respective problems with respective duality gaps: (i) G = y^T z, (ii) Ḡ = s^T w. Furthermore, (iii) if (y, z) is optimal for (1), then t* = θ; (iv) if (s, w) is optimal for (3), then f* = ρ; (v) (t*)^2 + (f*)^2 = ||x||^2.

Proof: To prove (i), observe y^T z = z^T x + θ||z||^2 = z^T x + θ||z|| = z^T x + ||y − x|| = G, and a similar argument establishes (ii). To prove (iii), observe that t* = ||x − y|| = ||θz|| = θ, with similar arguments for (iv). To prove (v), notice that (y, z, θ) satisfy (2) and y^T z = 0 if and only if (y, z) is optimal for (1), in which case it is easy to verify that (s, w, ρ) = (−θz, −y/||y||, ||y||) satisfy (4) and (s, w) is optimal for (3). Therefore ||x||^2 = (y − θz)^T (y − θz) = y^T y + θ^2 = ρ^2 + θ^2 = (f*)^2 + (t*)^2.

Proposition 5. If Q_n^T x <= 0, then t* >= τ_F ||x||.

Proof: We assume for the proof that ||x|| = 1, since t* and f* scale positively with ||x||. Define c := (t*/f*)Q_n and note that ||c|| = t*/f*. By the definition of the width, B(c, (t*/f*)τ_F) ⊆ F. Note that ||x − c||^2 = x^T x − 2(t*/f*)Q_n^T x + (t*/f*)^2 Q_n^T Q_n >= 1 + (t*/f*)^2 = 1/(f*)^2. Therefore 1/f* <= ||x − c||.

Next observe that c + τ_F ||c|| (x − c)/||x − c|| ∈ F, which is equivalent to c + τ_F (t*/f*)(x − c)/||x − c|| ∈ F. Combining this with the previous inequality and rearranging, one obtains (t*)^2 = 1 − (f*)^2 >= τ_F t*, which implies that τ_F <= t*.

Proposition 6. Given x satisfying ||x|| = 1 and Q_n^T x <= 0, suppose that (s, w, ρ) satisfies (4) with duality gap Ḡ <= στ_F/2 for (3), where σ <= 1. Consider the assignment (y, z, θ) ← (−ρw, −s/||s||, ||s||) (with the necessary modification that z = 0 if s = 0). Then (y, z, θ) satisfies (2), with duality gap G <= σ for (1).

Proof: Note that y^T z = ρ(w^T s)/||s|| <= στ_F ρ/(2||s||), and we have the following relations: (i) w^T s <= στ_F/2 <= 1/2, (ii) ||s|| = θ = ||y − x|| >= t* >= τ_F from Proposition 5, and (iii) ρ = ||s − x|| = s^T w − w^T x <= 1/2 + f* <= 3/2 from Proposition 4. Therefore y^T z <= στ_F (3/2)/(2τ_F) = (3/4)σ <= σ.

3.2 The Six Cases

We assume here that the given x has unit norm, i.e., ||x|| = 1, and that we seek feasible solutions to (1) with duality gap at most σ, where σ <= 1. Armed with Propositions 4, 5, and 6, we now show how to compute a feasible solution (y, z) of (1) with duality gap G <= σ. Our method is best understood with the help of Figure 1. We know from Section 3.1 and the conditions (2) and/or (4) that we need to decompose x into the sum of a vector in F plus a vector in −F*, and that the central axes of F and −F* are the rays corresponding to Q_n and −Q_n, respectively. Define the dividing hyperplane L_F := {y : Q_n^T y = 0} perpendicular to the central axes of F and −F*, and define L_F^+ := {y ∈ IR^n : Q_n^T y >= 0} and L_F^− := −L_F^+. We divide L_F^+ into three regions: region 1 corresponds to points in F, region 2 corresponds to points in L_F^+ near the dividing hyperplane (where our nearness criterion will be defined shortly), and region 3 corresponds to points in L_F^+ \ F that are far from L_F. We divide L_F^− similarly, into regions 4, 5, and 6. For each of the three regions in L_F^+ we will work with the problem pair (1) and show how to compute a feasible solution (y, z) of (1) with duality gap G <= σ.
For each of the three regions in L_F^− we will instead work with the problem pair (3) and show how to compute a feasible solution (s, w) of (3) with duality gap Ḡ <= στ_F/2, whereby from Proposition 6 we obtain a feasible solution (y, z) of (1) with duality gap G <= σ. We will consider six cases, one for each of the regions described above and in Figure 1. We first describe how we choose whether x is in region 2 or 3. Let s := Q^T x, so that x = Qs, s_i = Q_i^T x for i = 1, ..., n, and ||s|| = 1. For x ∈ L_F^+ \ F, define:

ε_P = ε_P(x) := Q_n^T x sqrt(|D_n|) / sqrt(Σ_{i=1}^{n−1} D_i (Q_i^T x)^2),   (5)

and notice that x ∈ L_F^+ implies ε_P >= 0 and x ∉ F implies ε_P < 1, and smaller values of ε_P correspond to Q_n^T x closer to zero and hence x closer to L_F. We specify a tolerance ε̄_P and

determine whether x is in region 2 or 3 depending on whether ε_P <= ε̄ or ε_P > ε̄, respectively, where we set ε̄ = ε̄_P := στ_F.

Figure 1: The geometry of the sets F, −F*, and L_F, and the six cases. The central axes of F and −F* are the rays generated by ±Q_n, respectively, which are orthogonal to the hyperplane L_F. The regions corresponding to the six cases are shown as well.

Case 1: Q_n^T x >= 0 and x^T QDQ^T x <= 0. From Theorem 1 we know that x ∈ F. Then it is elementary to show that (y, z, θ) ← (x, 0, 0) satisfies (2) with y^T z = 0, whereby from Proposition 4 the duality gap is G = 0.

Case 2: Q_n^T x >= 0 and x^T QDQ^T x > 0, ε_P <= ε̄_P := στ_F. Let ŷ solve the following system of equations:

[I + (1/|D_n|)D] Q^T ŷ = Q^T x − e_n Q_n^T x,   Q_n^T ŷ = 0,   (6)

where e_n = (0, ..., 0, 1) ∈ IR^n. Notice that the last row of the first equation system has all zero entries. Therefore this system is not over-determined, and one can write the closed-form solution (Q^T ŷ)_i = (Q^T x)_i/(1 + D_i/|D_n|) for i = 1, ..., n−1 and (Q^T ŷ)_n = 0, in the transformed variables ŝ := Q^T ŷ. Having computed ŷ, next compute α := sqrt(ŷ^T QDQ^T ŷ/|D_n|),

and then make the following assignments to variables:

ȳ ← ŷ + αQ_n,   θ ← sqrt(ȳ^T QD^2 Q^T ȳ)/|D_n|,   z ← −QDQ^T ȳ/(|D_n|θ),   y ← ȳ + (Q_n^T x)Q_n.

Proposition 7. Suppose that ||x|| = 1, σ <= 1, and ε_P <= ε̄ < 1, and (y, z, θ) are computed according to Case 2 above. Then (y, z, θ) is feasible for (2) with duality gap G <= ε̄/τ_F for (1).

Applying Proposition 7 using ε̄ = ε̄_P := στ_F ensures that the resulting duality gap satisfies G <= ε̄/τ_F = σ. Note that the complexity of the computations in Case 2 is O(mn^2) (assuming that square roots are sufficiently accurately computed in O(1) operations).

Proof of Proposition 7: It is easy to establish that (Q_1^T x, ..., Q_{n−1}^T x) ≠ 0 and hence α > 0. This in turn implies that Q_n^T ȳ = α > 0 and hence θ > 0, so z is well-defined. It is straightforward to verify:

ȳ^T QDQ^T ȳ = (ŷ + αQ_n)^T QDQ^T (ŷ + αQ_n) = ŷ^T QDQ^T ŷ − α^2 |D_n| = 0,

which shows via Theorem 1 that ȳ ∈ ∂F, and therefore z ∈ ∂F* and z^T ȳ = 0. It is also straightforward to verify that ||z|| = 1. Finally, we have from (6) that

[I + (1/|D_n|)D] Q^T ȳ = [I + (1/|D_n|)D](Q^T ŷ + αe_n) = [I + (1/|D_n|)D](Q^T ŷ) = Q^T (x − Q_n Q_n^T x)

(where the second equality above follows since the last row and column of the matrix are zero), hence ȳ + (1/|D_n|)QDQ^T ȳ = x − Q_n Q_n^T x. Substituting the values of y, z, θ into this expression yields y − θz = x, which then shows that (y, z, θ) satisfy (2). Therefore from Proposition 4, (y, z) is feasible for (1) with duality gap

G = z^T y = z^T ȳ + z^T Q_n Q_n^T x <= Q_n^T x = ε_P sqrt(Σ_{i=1}^{n−1} D_i (Q_i^T x)^2)/sqrt(|D_n|) <= ε̄ sqrt(D_1)/sqrt(|D_n|) <= ε̄ sqrt(D_1 + |D_n|)/sqrt(|D_n|) = ε̄/τ_F.

Case 3: Q_n^T x >= 0 and x^T QDQ^T x > 0, ε_P > ε̄_P := στ_F. Here x is on the same side of the dividing hyperplane L_F as F but is neither in F nor close enough to L_F in the nearness measure. Consider the following univariate function in γ:

f(γ) := x^T Q[I + γD]^{-1} D [I + γD]^{-1} Q^T x = Σ_{i=1}^n D_i (x^T Q_i)^2 / (1 + D_i γ)^2,   (7)

shown canonically in Figure 2. Notice that f(0) = x^T QDQ^T x > 0, and since D_n < 0 we have f(γ) → −∞ as γ → 1/|D_n|.
Furthermore, f'(γ) = −2 Σ_{i=1}^n D_i^2 (x^T Q_i)^2/(1 + γD_i)^3 < 0 for γ ∈ [0, 1/|D_n|). Therefore f(γ) is strictly decreasing on the domain [0, 1/|D_n|), whereby from the intermediate value theorem there is a unique value γ* ∈ (0, 1/|D_n|) for which f(γ*) = 0. We show in Section 4 how to combine binary search and Newton's method to very efficiently compute

γ ∈ (0, 1/|D_n|) satisfying f(γ) <= 0 and f(γ) ≈ 0 (and γ >= γ*). Presuming that this can be done very efficiently, consider the following variable assignment:

y ← Q[I + γD]^{-1} Q^T x,   θ ← γ sqrt(y^T QD^2 Q^T y),   z ← −γQDQ^T y/θ.   (8)

We now show that (y, θ, z) satisfy (2). First note that Q_n^T y = Q_n^T x/(1 − γ|D_n|) > 0, and furthermore this shows that θ > 0 and so z is well-defined. By the hypothesis that f(γ) <= 0, we have y^T QDQ^T y = x^T Q[I + γD]^{-1} D [I + γD]^{-1} Q^T x = f(γ) <= 0, which implies that y ∈ F and hence z ∈ F* from Theorem 1. It is also straightforward to verify that ||z|| = 1. Finally, rearranging the formula for y yields x = y + γQDQ^T y = y − θz, which shows that (2) is satisfied. From Proposition 4, (y, z) is feasible for (1), and using the above assignments the duality gap works out to be

G = y^T z = −f(γ)/sqrt(x^T QD^2 [I + γD]^{-2} Q^T x),

whereby G will be small if f(γ) ≈ 0. To make this more precise requires a detailed analysis of binary search and Newton's method, which is postponed to Section 4, where we will prove:

Proposition 8. Suppose that ||x|| = 1, 1 > ε_P > ε̄, and ḡ > 0 is a given gap tolerance. If Q_n^T x > 0 and x^T QDQ^T x > 0, then a solution (y, z, θ) of (2) with duality gap G <= ḡ for (1) is computable in O(n ln ln(1/τ_F + 1/ε̄ + 1/ḡ)) operations.

Substituting ε̄ = ε̄_P := στ_F and ḡ = σ, it follows that the complexity of computing a feasible solution (y, z) of (1) with duality gap at most σ is O(n ln ln(1/τ_F + 1/σ)) = O(n ln ln(1/min{τ_F, τ_F*} + 1/σ)) operations.

Case 4: Q_n^T x <= 0 and x^T QD^{-1}Q^T x <= 0. From Theorem 1 we know that x ∈ −F*. Then it is elementary to show that (y, z, θ) ← (0, −x/||x||, ||x||) satisfies (2) with y^T z = 0, whereby from Proposition 4 the duality gap is G = 0.

Before describing how we treat Cases 5 and 6 (corresponding to regions 5 and 6), we need to describe how we choose whether x is in region 5 or 6. We use a parallel concept to that used to distinguish regions 2 and 3, except that F is replaced by −F*, see Figure 1.
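To make Case 3 concrete, here is a small numerical sketch (the spectral data D and the point x below are our own illustrative choices, with Q = I so that the spectral frame coincides with the coordinate frame; the safeguarded Newton loop is a simplified stand-in for the binary-search/Newton scheme analyzed in Section 4).

```python
import numpy as np

D = np.array([3.0, 2.0, 1.0, -1.0])      # D_1 >= ... >= D_{n-1} > 0 > D_n
x = np.array([1.0, 1.0, 1.0, 0.5])       # Q_n^T x > 0 and x^T QDQ^T x > 0

def f(gam):       # f(gamma) = sum_i D_i x_i^2 / (1 + gamma D_i)^2, cf. (7)
    return np.sum(D * x**2 / (1.0 + gam * D)**2)

def fprime(gam):  # f'(gamma) = -2 sum_i D_i^2 x_i^2 / (1 + gamma D_i)^3
    return -2.0 * np.sum(D**2 * x**2 / (1.0 + gam * D)**3)

# Safeguarded Newton on (0, 1/|D_n|): f is strictly decreasing there, so we keep
# a sign-changing bracket and fall back to bisection if a step leaves it.
lo, hi = 0.0, 1.0 / abs(D[-1]) - 1e-12
gam = 0.5 * (lo + hi)
for _ in range(100):
    step = gam - f(gam) / fprime(gam)
    gam = step if lo < step < hi else 0.5 * (lo + hi)
    if f(gam) > 0: lo = gam
    else:          hi = gam

# Assignment (8):  y = (I + gam D)^{-1} x,  theta = gam ||D y||,  z = -gam D y / theta.
y = x / (1.0 + gam * D)
theta = gam * np.linalg.norm(D * y)
z = -gam * D * y / theta

assert np.allclose(y - theta * z, x)     # the decomposition x = y - theta z of (2)
assert np.isclose(np.linalg.norm(z), 1.0)
assert abs(y @ z) < 1e-6                 # duality gap G = y^T z is (nearly) zero
```

The final assertions mirror the verification in the text: y − θz = x exactly by construction, ||z|| = 1, and G = y^T z is proportional to f(γ), hence tiny once the root of f is located.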
For x ∈ L_F^− \ −F*, define the following quantity analogous to (5):

ε_P = ε_P(x) := −Q_n^T x / ( sqrt(|D_n|) sqrt(Σ_{i=1}^{n−1} (1/D_i)(Q_i^T x)^2) ),   (9)

and notice that x ∈ L_F^− implies ε_P >= 0 and x ∉ −F* implies ε_P < 1, and smaller values of ε_P correspond to Q_n^T x closer to zero and hence x closer to L_F. We specify a tolerance ε̄_P and determine whether x is in region 5 or 6 depending on whether ε_P <= ε̄ or ε_P > ε̄, respectively, where we set ε̄ = ε̄_P := στ_F τ_F*/2.

Case 5: Q_n^T x <= 0 and x^T QD^{-1}Q^T x > 0, and ε_P <= ε̄_P := στ_F τ_F*/2. This case is an exact analog of Case 2, with F replaced by −F* and the pair (1) replaced by (3). Therefore the methodology of Case 2 can be used to compute (s, w, ρ) satisfying (4), and hence (s, w) is feasible for (3). Applying Proposition 7 in the context of the polar pair (3) with ε̄ = ε̄_P, it follows that the duality gap for (3) will be Ḡ = s^T w and will satisfy Ḡ <= ε̄/τ_F* = στ_F τ_F*/(2τ_F*) = στ_F/2. Converting (s, w, ρ) to (y, z, θ) using Proposition 6, we obtain (y, z) feasible for (1) with duality gap G <= σ. Here the complexity of the computations is of the same order as Case 2.

Case 6: Q_n^T x <= 0 and x^T QD^{-1}Q^T x > 0, and ε_P > ε̄_P := στ_F τ_F*/2. In concert with the previous case, this case is an exact analog of Case 3, with F replaced by −F* and the pair (1) replaced by (3). Therefore the methodology of Case 3 can be used to compute (s, w, ρ) satisfying (4), and hence (s, w) is feasible for (3). Applying Proposition 8 in the context of the polar pair (3) with ε̄ = ε̄_P and ḡ = στ_F/2, it follows that a solution (s, w, ρ) of (4) with duality gap Ḡ <= ḡ = στ_F/2 for (3) is computable in O(n ln ln(1/τ_F* + 1/ε̄ + 1/ḡ)) = O(n ln ln(1/min{τ_F, τ_F*} + 1/σ)) operations. Converting (s, w, ρ) to (y, z, θ) using Proposition 6, we obtain (y, z) feasible for (1) with duality gap G <= σ.

Proof of Theorem 2: The spectral decomposition M^T M − gg^T = QDQ^T is assumed to take O(mn^2) operations. The computations in Cases 1 and 4 are trivial after checking the conditions of the cases, which is O(mn^2) operations, and similarly for Cases 2 and 5. Regarding Cases 3 and 6, the discussion in the description of these cases establishes the desired operation bound.

Remark 4 (The case when F is not regular, again). As in Remark 2, let Z and N partition the set of indices according to zero and nonzero values of D_i. Consider the case when D_n < 0 (the cases when D_n > 0 and D_n = 0 were discussed in Remark 2). We interpret D_i^{-1} = ∞ for i ∈ Z.
Consider the orthonormal transformation Q^T x and Q^T y, Q^T z of the given vector x and the variables y, z. Then for i ∈ Z simply set Q_i^T y = Q_i^T x and set Q_i^T z = 0, and work in the lower-dimensional problem in the subspace spanned by the Q_i, i ∈ N.

4 Proof of Proposition 8

This section is devoted to the proof of Proposition 8. Our algorithmic approach is motivated by Ye [5], and consists of a combination of binary search and Newton's method to approximately solve f(γ) = 0 for the function f given in (7). An alternate approach would be to use interpolation methods as presented and analyzed in Meldman [2], for which global quadratic convergence is proved but there is no complexity analysis of associated constants. While Proposition 8 indicates that a solution (y, z, θ) of (2) with duality gap G <= ḡ for (1) can be computed extremely efficiently, unfortunately our proof is not nearly as efficient as we or the reader might wish. We assume throughout this section that the hypotheses of Proposition 8 hold. We start with a review of Smale's main result for Newton's method in [4].
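Before the formal statements, the "approximate zero" behavior that Section 4.1 formalizes can be seen on any analytic function: once Newton's method enters the basin of quadratic convergence, successive step sizes collapse doubly exponentially. The toy function and starting point below are our own choices, not from [4].

```python
# Newton's method on a simple analytic function, illustrating the quadratic
# convergence that the notion of an "approximate zero" captures.
g  = lambda t: t**3 - 2.0           # root t* = 2**(1/3)
gp = lambda t: 3.0 * t**2

t = 1.5                             # a starting point inside the quadratic basin
steps = []
for _ in range(6):
    t_next = t - g(t) / gp(t)       # Newton iterate:  gamma_+ = gamma - g/g'
    steps.append(abs(t_next - t))
    t = t_next

root = 2.0 ** (1.0 / 3.0)
assert abs(t - root) < 1e-12
# Step sizes shrink faster than geometrically (roughly squaring each time).
assert all(steps[k + 1] < steps[k] ** 1.5 for k in range(3))
```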

4.1 Newton's Method and Smale's Results

Let g be an analytic function, and consider the Newton iterate from a given point γ̂:

γ_+ = γ̂ − g(γ̂)/g'(γ̂),

and let {γ_k}_{k>=0} denote the sequence of points generated starting from γ̂ = γ_0.

Definition 1. A point γ_0 is said to be an approximate zero of g if |γ_k − γ_{k−1}| <= (1/2)^{2^{k−1} − 1} |γ_1 − γ_0| for k >= 1.

For an approximate zero γ_0, let γ* = lim_{k→∞} γ_k. Then γ* is a zero of g, and Newton's method starting from γ_0 converges quadratically to γ* from the very first iteration. The main result in [4] can be re-stated as follows.

Theorem 3 (Smale [4]). Let g be an analytic function. If γ̂ satisfies

sup_{k>1} | g^{(k)}(γ̂)/(k! g'(γ̂)) |^{1/(k−1)} <= |g'(γ̂)| / (8|g(γ̂)|),   (10)

then γ̂ is an approximate zero of g. Furthermore, if γ̂ is an approximate zero of g, then |γ_k − γ*| <= 2(1/2)^{2^k − 1} |γ* − γ_0| for all k.

4.2 Properties of f(γ)

We employ the change of variables s = Q^T x, whereby from the hypotheses of Proposition 8 we have s_n > 0, s^T D s > 0, and ε_P = s_n sqrt(|D_n|)/sqrt(Σ_{i=1}^{n−1} D_i s_i^2) > ε̄. We consider computing a zero of our function of interest:

f(γ) = s^T (I + γD)^{-2} D s = Σ_{i=1}^n D_i s_i^2/(1 + γD_i)^2.   (11)

Lemma 1. Under the hypotheses of Proposition 8, f has the following properties:

(i) f(0) > 0, lim_{γ→1/|D_n|} f(γ) = −∞, and f has a unique root γ* ∈ (0, 1/|D_n|);

(ii) f is analytic on (−1/D_1, 1/|D_n|), and for k >= 1 the k-th derivative of f is

f^{(k)}(γ) = (−1)^k (k+1)! s^T (I + γD)^{-(k+2)} D^{k+1} s = (−1)^k (k+1)! Σ_{i=1}^n D_i^{k+1} s_i^2/(1 + γD_i)^{k+2};

(iii) sup_{k>1} | f^{(k)}(γ)/(k! f'(γ)) |^{1/(k−1)} <= (3/2) max{ D_1/(1 + γD_1), |D_n|/(1 − γ|D_n|) };

(iv) (1 − ε_P)/(|D_n| + ε_P D_1) <= γ* <= (1 − ε_P)/|D_n|, where ε_P is given by (5);

(v) there exists a unique value γ̄ ∈ (−1/D_1, 1/|D_n|) such that f is convex on (−1/D_1, γ̄] and concave on [γ̄, 1/|D_n|).

Proof: (i) follows from the intermediate value theorem and the observation that f is decreasing on (0, 1/|D_n|), and (ii) follows using a standard derivation. To prove (iii), observe

| f^{(k)}(γ)/(k! f'(γ)) |^{1/(k−1)} = [ (k+1)!/(2 k!) ]^{1/(k−1)} [ | s^T (I + γD)^{-(k+2)} D^{k+1} s | / ( s^T (I + γD)^{-3} D^2 s ) ]^{1/(k−1)}
<= (3/2) [ | s^T (I + γD)^{-3/2} D [(I + γD)^{-1} D]^{k−1} D (I + γD)^{-3/2} s | / ( s^T (I + γD)^{-3/2} D^2 (I + γD)^{-3/2} s ) ]^{1/(k−1)}
<= (3/2) max_{v ≠ 0} [ |v^T P^{k−1} v| / (v^T v) ]^{1/(k−1)} = (3/2) max_{i=1,...,n} { |D_i|/(1 + γD_i) },

where P := (I + γD)^{-1} D. Therefore

| f^{(k)}(γ)/(k! f'(γ)) |^{1/(k−1)} <= (3/2) max_{i=1,...,n} { |D_i|/(1 + γD_i) } = (3/2) max{ D_1/(1 + γD_1), |D_n|/(1 − γ|D_n|) },

which proves (iii). To prove the first inequality of (iv), note that

f(γ) = Σ_{i=1}^n D_i s_i^2/(1 + γD_i)^2 >= (1/(1 + γD_1)^2) Σ_{i=1}^{n−1} D_i s_i^2 − |D_n| s_n^2/(1 − γ|D_n|)^2.

The right-hand side of the expression above equals zero only at γ̃ := (1 − ε_P)/(|D_n| + ε_P D_1). This implies that f(γ̃) >= 0, whereby γ̃ <= γ* since f is strictly decreasing. For the second inequality, note that ε_P ∈ (0, 1) since s_n > 0 and s^T D s > 0. We have f(γ) < Σ_{i=1}^{n−1} D_i s_i^2 − |D_n| s_n^2/(1 − γ|D_n|)^2, and substituting γ = (1 − ε_P)/|D_n| into this strict inequality yields f((1 − ε_P)/|D_n|) < 0, which then implies that γ* < (1 − ε_P)/|D_n|. To prove (v), examine the derivatives of f in (ii), and notice that f^{(k)}(γ) < 0 for any odd value of k, whereby f'' is strictly decreasing. Let γ̄ be the unique point in (−1/D_1, 1/|D_n|) such that f''(γ̄) = 0. Since f'' is strictly decreasing, f is convex on (−1/D_1, γ̄) and concave on (γ̄, 1/|D_n|).

Figure 2 illustrates the geometry underlying some of the analytical properties of f described by Lemma 1.

Remark 5. In the interval (−1/D_1, 1/(2|D_n|) − 1/(2D_1)], the maximum in (iii) of Lemma 1 is D_1/(1 + γD_1), and in the interval [1/(2|D_n|) − 1/(2D_1), 1/|D_n|), the maximum is |D_n|/(1 − γ|D_n|).
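The bracket in Lemma 1(iv) is easy to confirm numerically. In the sketch below, the spectral data D and transformed point s are our own illustrative choices; the root γ* is located by bisection and checked against the two endpoint formulas.

```python
import numpy as np

D = np.array([4.0, 2.0, 1.0, -1.5])      # D_1 >= ... >= D_{n-1} > 0 > D_n
s = np.array([0.8, -0.6, 1.1, 0.4])      # s_n > 0 and s^T D s > 0

# eps_P as in (5), written in the transformed variables s = Q^T x.
eps_P = s[-1] * np.sqrt(abs(D[-1])) / np.sqrt(np.sum(D[:-1] * s[:-1]**2))
assert 0 < eps_P < 1                     # x lies in L_F^+ but outside F

def f(gam):  # f(gamma) = sum_i D_i s_i^2 / (1 + gamma D_i)^2, cf. (11)
    return np.sum(D * s**2 / (1.0 + gam * D)**2)

lo, hi = 0.0, 1.0 / abs(D[-1]) - 1e-12   # f(lo) > 0 > f(hi); bisect for gamma*
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if f(mid) > 0: lo = mid
    else:          hi = mid
gamma_star = 0.5 * (lo + hi)

# Lemma 1(iv): (1 - eps_P)/(|D_n| + eps_P D_1) <= gamma* <= (1 - eps_P)/|D_n|.
lower = (1.0 - eps_P) / (abs(D[-1]) + eps_P * D[0])
upper = (1.0 - eps_P) / abs(D[-1])
assert lower <= gamma_star <= upper
```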

[Figure 2 appears here; its axis markers indicate −1/D_1, γ̄, γ*, and 1/|D_n|.]

Figure 2: The function f on the interval (−1/D_1, 1/|D_n|). Among many desirable properties, f is strictly decreasing, analytic, and has a unique root γ* ∈ (0, 1/|D_n|). Moreover, f is convex over (−1/D_1, γ̄) and concave over (γ̄, 1/|D_n|), where γ̄ is the unique point satisfying f′′(γ̄) = 0. Note that one can have γ̄ ≤ γ* or γ̄ ≥ γ*.

4.3 Locating an Approximate Zero of f by Binary Search

From Lemma 1 we know that γ* ∈ (0, Ū], where Ū := (1 − ε)/|D_n|. We will cover this interval with subintervals and use binary search to locate an approximate zero of f, motivated by the method of Ye [5]. Noticing from Remark 5 that the maximum in (iii) of Lemma 1 depends on the midpoint M := 1/(2|D_n|) − 1/(2D_1), we will consider two types of subintervals: the left intervals will cover [0, max{0, M}], and the right intervals will cover [max{0, M}, Ū]. (Of course, in the case when M ≤ 0 there is no need to create the left intervals.)

The left intervals will be of the form [L_{i−1}, L_i], where L_i := (1/D_1)((13/12)^i − 1) for i = 0, 1, .... If M ≤ 0 we do not consider creating these intervals. The right intervals will have the form [R_i, R_{i−1}], where R_i := 1/|D_n| − (1/|D_n| − Ū)(13/12)^i for i = 0, 1, ....
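The endpoint formulas can be sketched numerically as follows; D1, Dn_abs, eps and tau_F are hypothetical stand-ins for D_1, |D_n|, ε and τ_F (an illustration of the construction, not the paper's implementation):

```python
import math

# Hypothetical problem parameters (illustrative only).
D1, Dn_abs, eps, tau_F = 4.0, 0.5, 0.05, 0.3

U_bar = (1.0 - eps) / Dn_abs                 # upper bound on gamma*, from Lemma 1(iv)
M = 1.0 / (2 * Dn_abs) - 1.0 / (2 * D1)      # switch point of the max in Remark 5

r = 13.0 / 12.0
K_R = math.ceil(math.log(1.0 / eps) / math.log(r))
K_L = max(0, math.ceil((math.log(0.5) + 2 * math.log(1.0 / tau_F)) / math.log(r)))

# left endpoints grow from 0; right endpoints shrink from U_bar toward max{0, M}
L = [(r**i - 1.0) / D1 for i in range(K_L + 1)]
R = [1.0 / Dn_abs - (1.0 / Dn_abs - U_bar) * r**i for i in range(K_R + 1)]

print(L[0] == 0.0 and abs(R[0] - U_bar) < 1e-12)   # True
print(L[-1] >= M and R[-1] <= max(0.0, M))         # True: the two families cover [0, U_bar]
```

The geometric spacing is what keeps the endpoint counts K_L and K_R logarithmic in 1/τ_F and 1/ε, so that a binary search over them needs only O(ln ln(·)) function evaluations.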

Let [a, b] denote one of these intervals (either [L_{i−1}, L_i] or [R_i, R_{i−1}] for some i). Note that if f(a) ≥ 0 and f(b) ≤ 0, then γ* ∈ [a, b]. Supposing that this is the case, it follows from Lemma 1 that f is either convex on [a, γ*] or concave on [γ*, b] (or both); consider starting Newton's method from γ̂ = a in the first case or γ̂ = b in the second case. Then the Newton step satisfies

    |γ_+ − γ̂| = |f(γ̂)/f′(γ̂)| ≤ |γ* − γ̂| ≤ b − a,    (12)

where the first inequality follows either from the convexity of f on [a, γ*] or the concavity of f on [γ*, b]. In particular, we have

    |f(γ̂)/f′(γ̂)| ≤ |γ* − γ̂|,    (13)

which relates the value of the function at an approximate solution and the error in our approximation.

Lemma 2 Under the hypotheses of Proposition 8, the intervals described herein have the following properties:

(i) the total numbers of left intervals and right intervals needed to cover [0, Ū] are K_L := ⌈(ln(1/2) + 2 ln(1/τ_F))/ln(13/12)⌉⁺ and K_R := ⌈ln(1/ε)/ln(13/12)⌉, respectively;

(ii) let [a, b] denote one of these intervals, and suppose that f(a) ≥ 0 and f(b) ≤ 0; then either a or b is an approximate zero of f, and γ* ∈ [a, b];

(iii) R_{i−1} − R_i ≤ 1/(12|D_n|) for i = 1, ..., K_R, and L_i − L_{i−1} ≤ 1/(12|D_n|) for i = 1, ..., K_L.

Proof: We first prove (i) for the right intervals. We have R_0 = Ū and

    R_{K_R} = 1/|D_n| − (ε/|D_n|)(13/12)^{K_R} ≤ 1/|D_n| − (1/|D_n|) min{1, (D_1 + |D_n|)/(2D_1)} = max{0, 1/(2|D_n|) − 1/(2D_1)} = max{0, M},

thus the right intervals cover [max{0, M}, Ū]. Note that using the above reasoning one easily shows that because K_R ≤ 1 + ln(1/ε)/ln(13/12) one also has

    (13/12)^{K_R} ≤ 13/(12 ε).    (14)

For the left intervals, first consider the case when M ≥ 0. Then |D_n| ≤ D_1 and τ_F ≤ 1/√2, whereby there is no need to take the nonnegative part in the definition of K_L. We have L_0 = 0 and

    L_{K_L} = (1/D_1)((13/12)^{K_L} − 1) ≥ (1/D_1)(1/(2 τ_F^2) − 1) = (1/D_1)((D_1 + |D_n|)/(2|D_n|) − 1) = M,

thus the left intervals cover [0, M] = [0, max{0, M}]. Note that using the above reasoning one easily shows that because K_L ≤ 1 + (ln(1/2) + 2 ln(1/τ_F))/ln(13/12) one also has

    (13/12)^{K_L} ≤ 13/(24 τ_F^2).    (15)

When M ≤ 0 there is nothing to prove.

To prove (ii), we consider the two cases of [a, b] being either a left or a right interval. If [a, b] is a left interval, then M ≥ 0 and b = (13/12)a + 1/(12 D_1). In this case, for one of γ̂ = a or γ̂ = b we have for all k > 1:

    |f′(γ̂)|/(8 |f(γ̂)|) ≥ 1/(8(b − a)) = 1/(8 · (1/12)(a + 1/D_1)) = (3/2) · D_1/(1 + a D_1) ≥ (3/2) · D_1/(1 + γ̂ D_1) ≥ | f^(k)(γ̂)/(k! f′(γ̂)) |^(1/(k−1)),

where the first inequality uses (12), the second inequality uses a ≤ γ̂, and the third inequality uses Remark 5 and the fact that γ̂ ≤ M in conjunction with Lemma 1. Therefore γ̂ is an approximate zero of f.

If [a, b] is a right interval, then a = (13/12)b − 1/(12|D_n|) and M ≤ a ≤ b. In this case, for one of γ̂ = a or γ̂ = b we have for all k > 1:

    |f′(γ̂)|/(8 |f(γ̂)|) ≥ 1/(8(b − a)) = 1/(8 · (1/12)(1/|D_n| − b)) = (3/2) · |D_n|/(1 − b |D_n|) ≥ (3/2) · |D_n|/(1 − γ̂ |D_n|) ≥ | f^(k)(γ̂)/(k! f′(γ̂)) |^(1/(k−1)),

where the first inequality uses (12), the second inequality uses γ̂ ≤ b, and the third inequality uses Remark 5 and the fact that γ̂ ≥ M in conjunction with Lemma 1. Therefore γ̂ is an approximate zero of f.

To prove (iii), for the right intervals we have

    R_{i−1} − R_i = (ε/(13|D_n|))(13/12)^i ≤ (ε/(13|D_n|))(13/12)^{K_R} ≤ (ε/(13|D_n|)) · 13/(12 ε) = 1/(12|D_n|)

by the definition of K_R, where the second inequality derives from (14). For the left intervals we can assume M ≥ 0 (otherwise they are not constructed), in which case |D_n| ≤ D_1. In this case, we have

    L_i − L_{i−1} = (1/(13 D_1))(13/12)^i ≤ (1/(13 D_1))(13/12)^{K_L} ≤ 1/(24 D_1 τ_F^2) = (1/24)(1/D_1 + 1/|D_n|) ≤ 1/(12|D_n|)

by the definition of K_L, where the second inequality derives from (15). □

Based on these properties, consider the following method for locating an approximate zero of f. Perform binary search on the endpoints of the intervals, testing the endpoints to locate an interval [a, b] for which f(a) ≥ 0 and f(b) ≤ 0. Then either a or b is an approximate zero of f.
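The locate-then-polish scheme can be sketched as follows; everything here is an illustrative stand-in (in particular, a uniform grid replaces the L_i/R_i endpoints, and the approximate-zero certificate of Theorem 3 is not checked):

```python
import numpy as np

# Hypothetical spectral data (illustrative only).
D = np.array([4.0, 2.5, 1.0, -0.5])
s = np.array([0.6, 0.3, 0.2, 0.7])

def f(g):
    return float(np.sum(D * s**2 / (1.0 + g * D) ** 2))

def fprime(g):
    return float(-2.0 * np.sum(D**2 * s**2 / (1.0 + g * D) ** 3))

def sign_change_interval(pts):
    """Binary search for consecutive endpoints with f(pts[lo]) >= 0 > f(pts[hi]).

    Requires f(pts[0]) >= 0 > f(pts[-1]); since f is strictly decreasing, only
    O(log len(pts)) evaluations of f are needed, each costing O(n) operations."""
    lo, hi = 0, len(pts) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f(pts[mid]) >= 0:
            lo = mid
        else:
            hi = mid
    return pts[lo], pts[hi]

# uniform grid over (0, 1/|D_n|) as a stand-in for the L_i / R_i endpoints
pts = [i * (0.999 / abs(D[-1])) / 64 for i in range(65)]
a, b = sign_change_interval(pts)

for g0 in (a, b):              # Newton's method from both endpoints, as in the text
    g = g0
    for _ in range(8):
        g = g - f(g) / fprime(g)
    print(abs(f(g)) < 1e-10)   # True: both runs converge to the root gamma*
```

Running Newton from both bracketing endpoints sidesteps having to decide in advance whether f is convex on [a, γ*] or concave on [γ*, b]; at least one of the two starts is an approximate zero by Lemma 2(ii).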

Then initiate Newton's method from both a and b, either in parallel or iterate-sequentially. Notice that in order to perform binary search on the left and right intervals there is no need to compute and evaluate f at all of the endpoints. In fact, the operation complexity of the binary search will be O(n ln K_L) and O(n ln K_R), respectively, since each function evaluation of f requires O(n) operations.

4.4 Computing a Solution of (1) with Duality Gap at most ḡ

Under the hypotheses of Proposition 8, suppose that [a, b] is one of the constructed intervals, and f(a) ≥ 0 and f(b) ≤ 0. Then from Lemmas 1 and 2, γ* ∈ [a, b] and either f is convex on [a, γ*] or concave on [γ*, b] (or both). We first analyze the latter case, i.e., when f is concave on [γ*, b], whereby b is an approximate zero of f, and we analyze the iterates of Newton's method for k iterations starting at γ_0 = b. Let γ := γ_k be the final iterate. It follows from the concavity of f on [γ*, b] that γ ≥ γ* and consequently f(γ) ≤ 0. Then the analysis in Case 3 shows that the assignment (8) yields a feasible solution of (1) with duality gap G = |f(γ)| / √(s^T D^2 (I + γD)^{−2} s). The following result bounds the value of this duality gap:

Lemma 3 Let ḡ ∈ (0, 1] be the desired duality gap for (1), and let

    k := ⌈ ln( ln( (3/ḡ)(1/ε^2 + 1/τ_F^2) ) / ln 2 ) / ln 2 ⌉.

Under the hypotheses of Proposition 8 and the set-up above where b is an approximate zero of f, let γ_0 := b and γ_1, ..., γ_k be the Newton iterates, and define γ := γ_k. Then the assignment (8) will be feasible for (1) with duality gap at most ḡ.

Proof: We have |f(γ)| ≤ |f′(γ)| (γ − γ*) from the concavity of f on [γ*, b]. Also, we have

    |f′(γ)| = 2 Σ_{i=1}^{n} D_i^2 s_i^2/(1 + γ D_i)^3 ≤ 2 Σ_{i=1}^{n−1} D_i^2 s_i^2/(1 + γ D_i)^2 + 2 D_n^2 s_n^2/((1 + γ D_n)^2 (1 + γ D_n)).

Substituting 1/(1 + γ D_n) = 1 + γ|D_n|/(1 − γ|D_n|), we obtain

    |f′(γ)| ≤ 2 Σ_{i=1}^{n} D_i^2 s_i^2/(1 + γ D_i)^2 + 2 γ |D_n|^3 s_n^2/(1 − γ |D_n|)^3.

Let G = y^T z denote the duality gap. Then

    G = |f(γ)| / √(s^T D^2 [I + γD]^{−2} s)
      ≤ |f′(γ)| (γ − γ*) / √(s^T D^2 [I + γD]^{−2} s)
      ≤ ( 2 √(s^T D^2 [I + γD]^{−2} s) + 2 γ |D_n|^3 s_n^2 / ((1 − γ|D_n|)^3 √(s^T D^2 [I + γD]^{−2} s)) ) (γ − γ*)
      ≤ ( 2 D_1 + 2 |D_n|/(1 − γ|D_n|) + 2 γ |D_n|^2 s_n/(1 − γ|D_n|)^2 ) (γ − γ*),

where we used √(s^T D^2 [I + γD]^{−2} s) ≥ |D_n| s_n/(1 − γ|D_n|) in the last inequality. Next note that γ ≤ Ū = (1 − ε)/|D_n|, which implies that ε ≤ 1 − γ|D_n|. Therefore, recalling that γ is the k-th iterate, we have

    G ≤ 2 (γ − γ*) ( D_1 + |D_n|/ε + (1 − ε)|D_n|/ε^2 )
      ≤ 2 (γ − γ*) |D_n| ( 1/τ_F^2 + 2/ε^2 )
      ≤ 4 (1/2)^(2^k) |γ_1 − γ_0| · 2 |D_n| ( 1/τ_F^2 + 2/ε^2 )
      ≤ (2/3) (1/2)^(2^k) ( 1/τ_F^2 + 2/ε^2 ),

where the third inequality uses Theorem 3 and the fourth inequality uses Lemma 2(iii) together with (12), which give |γ_1 − γ_0| ≤ b − a ≤ 1/(12|D_n|). Substituting the value of k above yields G ≤ ḡ. □

Last of all, we analyze the case when f is convex on [a, γ*], whereby a is an approximate zero of f, and we analyze the iterates of Newton's method for k iterations starting at γ_0 = a. Let γ_k be the final iterate. It follows from the convexity of f on [a, γ*] that γ_k ≤ γ* and consequently f(γ_k) ≥ 0, in which case the assignment (8) is not necessarily feasible for (1). However, invoking Theorem 3 we know that γ_k + 2(1/2)^(2^k − 1) |γ_1 − γ_0| ≥ γ*, and we also know that Ū ≥ γ*, and we can set γ := min{γ_k + 2(1/2)^(2^k − 1) |γ_1 − γ_0|, Ū}. Then the analysis in Case 3 shows that the assignment (8) yields a feasible solution of (1) with duality gap G = |f(γ)| / √(s^T D^2 (I + γD)^{−2} s). The following result bounds the value of this duality gap:

Lemma 4 Let ḡ ∈ (0, 1] be the desired duality gap for (1), and let

    k := ⌈ ln( ln( (96/ḡ)(1/ε^2 + 1/τ_F^2)^2 ) / ln 2 ) / ln 2 ⌉.

Under the hypotheses of Proposition 8 and the set-up above where a is an approximate zero of f, let γ_0 := a and γ_1, ..., γ_k be the Newton iterates, and define γ := min{γ_k + 2(1/2)^(2^k − 1) |γ_1 − γ_0|, Ū}. Then the assignment (8) will be feasible for (1) with duality gap at most ḡ.

Proof: Define δ := γ − γ_k; it follows that δ ≥ 0 and γ_k + δ ≤ Ū. Furthermore,

    δ ≤ 2 (1/2)^(2^k − 1) |γ_1 − γ_0| ≤ (1/(3|D_n|)) (1/2)^(2^k) ≤ ḡ / ( 288 |D_n| (1/ε^2 + 1/τ_F^2)^2 ) ≤ (1/|D_n|) min{ ε, τ_F^2/(1 − τ_F^2) } = min{ ε/|D_n|, 1/D_1 }.    (16)

Therefore δ ≤ ε/|D_n|, whereby

    1 − γ_k|D_n| − 2 δ|D_n| = 1 − (γ_k + δ)|D_n| − δ|D_n| ≥ 1 − (1 − ε) − ε = 0,

where we also used γ_k + δ ≤ Ū = (1 − ε)/|D_n|. Therefore

    1 − γ_k|D_n| ≤ 2 (1 − (γ_k + δ)|D_n|) ≤ 2 (1 − t|D_n|) for all t ∈ [γ_k, γ_k + δ].    (17)

We also have from (16) that δ ≤ 1/D_1 ≤ 1/D_i ≤ 1/D_i + γ_k for i = 1, ..., n−1, hence

    1 + γ_k D_i + δ D_i ≤ 2 (1 + γ_k D_i), i = 1, ..., n−1.    (18)

The duality gap of the assignment (8) is

    G = y^T z = |f(γ)| / √(s^T D^2 [I + γD]^{−2} s) = |f(γ_k + δ)| / √(s^T D^2 [I + (γ_k + δ)D]^{−2} s).

We now proceed to bound the numerator and denominator of the right-most expression. For the numerator we have

    |f(γ_k + δ)| = −f(γ_k + δ) = −f(γ_k) − ∫_{γ_k}^{γ_k+δ} f′(t) dt ≤ ∫_{γ_k}^{γ_k+δ} |f′(t)| dt,

where we observe that f(γ_k) ≥ 0, f(γ_k + δ) ≤ 0, and f′(t) ≤ 0 for all t ∈ [0, 1/|D_n|). Using (17), for t ∈ [γ_k, γ_k + δ] we have

    |f′(t)| = 2 Σ_{i=1}^{n} D_i^2 s_i^2/(1 + t D_i)^3 ≤ 2 Σ_{i=1}^{n−1} D_i^2 s_i^2/(1 + γ_k D_i)^3 + 16 D_n^2 s_n^2/(1 − γ_k|D_n|)^3 ≤ 8 |f′(γ_k)|,

and it follows that |f(γ_k + δ)| ≤ 8 δ |f′(γ_k)|. To bound the denominator, simply notice from (18) and 1 − (γ_k + δ)|D_n| ≤ 1 − γ_k|D_n| that

    √(s^T D^2 [I + (γ_k + δ)D]^{−2} s) ≥ (1/2) √(s^T D^2 [I + γ_k D]^{−2} s).

Therefore

    G = |f(γ_k + δ)| / √(s^T D^2 [I + (γ_k + δ)D]^{−2} s) ≤ 16 δ |f′(γ_k)| / √(s^T D^2 [I + γ_k D]^{−2} s).

Next notice from the logic of the proof of Lemma 3 that

    |f′(γ_k)| / √(s^T D^2 [I + γ_k D]^{−2} s) ≤ 2 |D_n| ( 1/τ_F^2 + 2/ε^2 ),

therefore

    G ≤ 32 δ |D_n| ( 1/τ_F^2 + 2/ε^2 ) ≤ 32 |D_n| ( 1/τ_F^2 + 2/ε^2 ) · ḡ / ( 288 |D_n| (1/ε^2 + 1/τ_F^2)^2 ) ≤ ḡ,

where the last inequality uses the bound on δ from (16). □

Proof of Proposition 8: Note from the discussion at the end of Section 4.3 that the operation complexity of the binary search is O(n ln K_L + n ln K_R) = O(n ln ln(1/τ_F + 1/ε)) from Lemma 2. The number of Newton steps is O(ln ln(1/τ_F + 1/ε + 1/ḡ)) from Lemmas 3 and 4, with each Newton step requiring O(n) operations, yielding the desired complexity bound. □

References

[1] A. Belloni, R. M. Freund, and S. Vempala. Efficiency of a re-scaled perceptron algorithm for conic systems. Working Paper, MIT Operations Research Center, 2006.

[2] A. Melman. A unifying convergence analysis of second-order methods for secular equations. Mathematics of Computation, 66(217), 1997.

[3] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey, 1970.

[4] S. Smale. Newton's method estimates from data at one point. In The Merging of Disciplines: New Directions in Pure, Applied and Computational Mathematics, pages 185–196, 1986.

[5] Y. Ye. A new complexity result for minimizing a general quadratic function with a sphere constraint. In Recent Advances in Global Optimization, pages 19–31, 1992.

SIAM J. Optim., Vol. 19, No. 3, pp. 1073–1091. © 2008 Society for Industrial and Applied Mathematics.


More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Chapter 1. Preliminaries

Chapter 1. Preliminaries Introduction This dissertation is a reading of chapter 4 in part I of the book : Integer and Combinatorial Optimization by George L. Nemhauser & Laurence A. Wolsey. The chapter elaborates links between

More information

2 Sequences, Continuity, and Limits

2 Sequences, Continuity, and Limits 2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs

More information

ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, BAHMAN KALANTARI

ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, BAHMAN KALANTARI ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, MATRIX SCALING, AND GORDAN'S THEOREM BAHMAN KALANTARI Abstract. It is a classical inequality that the minimum of

More information

Some preconditioners for systems of linear inequalities

Some preconditioners for systems of linear inequalities Some preconditioners for systems of linear inequalities Javier Peña Vera oshchina Negar Soheili June 0, 03 Abstract We show that a combination of two simple preprocessing steps would generally improve

More information

A PREDICTOR-CORRECTOR PATH-FOLLOWING ALGORITHM FOR SYMMETRIC OPTIMIZATION BASED ON DARVAY'S TECHNIQUE

A PREDICTOR-CORRECTOR PATH-FOLLOWING ALGORITHM FOR SYMMETRIC OPTIMIZATION BASED ON DARVAY'S TECHNIQUE Yugoslav Journal of Operations Research 24 (2014) Number 1, 35-51 DOI: 10.2298/YJOR120904016K A PREDICTOR-CORRECTOR PATH-FOLLOWING ALGORITHM FOR SYMMETRIC OPTIMIZATION BASED ON DARVAY'S TECHNIQUE BEHROUZ

More information

x 1 x 2. x 1, x 2,..., x n R. x n

x 1 x 2. x 1, x 2,..., x n R. x n WEEK In general terms, our aim in this first part of the course is to use vector space theory to study the geometry of Euclidean space A good knowledge of the subject matter of the Matrix Applications

More information

In particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with

In particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with Appendix: Matrix Estimates and the Perron-Frobenius Theorem. This Appendix will first present some well known estimates. For any m n matrix A = [a ij ] over the real or complex numbers, it will be convenient

More information

Zero sum games Proving the vn theorem. Zero sum games. Roberto Lucchetti. Politecnico di Milano

Zero sum games Proving the vn theorem. Zero sum games. Roberto Lucchetti. Politecnico di Milano Politecnico di Milano General form Definition A two player zero sum game in strategic form is the triplet (X, Y, f : X Y R) f (x, y) is what Pl1 gets from Pl2, when they play x, y respectively. Thus g

More information

Chapter 6: Orthogonality

Chapter 6: Orthogonality Chapter 6: Orthogonality (Last Updated: November 7, 7) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved around.. Inner products

More information

L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1

L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1 L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

Exercises: Brunn, Minkowski and convex pie

Exercises: Brunn, Minkowski and convex pie Lecture 1 Exercises: Brunn, Minkowski and convex pie Consider the following problem: 1.1 Playing a convex pie Consider the following game with two players - you and me. I am cooking a pie, which should

More information

Lecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008

Lecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008 Lecture 9 Monotone VIs/CPs Properties of cones and some existence results October 6, 2008 Outline Properties of cones Existence results for monotone CPs/VIs Polyhedrality of solution sets Game theory:

More information

POLARS AND DUAL CONES

POLARS AND DUAL CONES POLARS AND DUAL CONES VERA ROSHCHINA Abstract. The goal of this note is to remind the basic definitions of convex sets and their polars. For more details see the classic references [1, 2] and [3] for polytopes.

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

Introduction and Math Preliminaries

Introduction and Math Preliminaries Introduction and Math Preliminaries Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Appendices A, B, and C, Chapter

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

Topics in Applied Linear Algebra - Part II

Topics in Applied Linear Algebra - Part II Topics in Applied Linear Algebra - Part II April 23, 2013 Some Preliminary Remarks The purpose of these notes is to provide a guide through the material for the second part of the graduate module HM802

More information

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator

More information

Using Schur Complement Theorem to prove convexity of some SOC-functions

Using Schur Complement Theorem to prove convexity of some SOC-functions Journal of Nonlinear and Convex Analysis, vol. 13, no. 3, pp. 41-431, 01 Using Schur Complement Theorem to prove convexity of some SOC-functions Jein-Shan Chen 1 Department of Mathematics National Taiwan

More information

Math 541 Fall 2008 Connectivity Transition from Math 453/503 to Math 541 Ross E. Staffeldt-August 2008

Math 541 Fall 2008 Connectivity Transition from Math 453/503 to Math 541 Ross E. Staffeldt-August 2008 Math 541 Fall 2008 Connectivity Transition from Math 453/503 to Math 541 Ross E. Staffeldt-August 2008 Closed sets We have been operating at a fundamental level at which a topological space is a set together

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Chapter 1 Preliminaries

Chapter 1 Preliminaries Chapter 1 Preliminaries 1.1 Conventions and Notations Throughout the book we use the following notations for standard sets of numbers: N the set {1, 2,...} of natural numbers Z the set of integers Q the

More information

Static Problem Set 2 Solutions

Static Problem Set 2 Solutions Static Problem Set Solutions Jonathan Kreamer July, 0 Question (i) Let g, h be two concave functions. Is f = g + h a concave function? Prove it. Yes. Proof: Consider any two points x, x and α [0, ]. Let

More information

Exercise Sheet 1.

Exercise Sheet 1. Exercise Sheet 1 You can download my lecture and exercise sheets at the address http://sami.hust.edu.vn/giang-vien/?name=huynt 1) Let A, B be sets. What does the statement "A is not a subset of B " mean?

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

2.1. Jordan algebras. In this subsection, we introduce Jordan algebras as well as some of their basic properties.

2.1. Jordan algebras. In this subsection, we introduce Jordan algebras as well as some of their basic properties. FULL NESTEROV-TODD STEP INTERIOR-POINT METHODS FOR SYMMETRIC OPTIMIZATION G. GU, M. ZANGIABADI, AND C. ROOS Abstract. Some Jordan algebras were proved more than a decade ago to be an indispensable tool

More information

How to Convexify the Intersection of a Second Order Cone and a Nonconvex Quadratic

How to Convexify the Intersection of a Second Order Cone and a Nonconvex Quadratic How to Convexify the Intersection of a Second Order Cone and a Nonconvex Quadratic arxiv:1406.1031v2 [math.oc] 5 Jun 2014 Samuel Burer Fatma Kılınç-Karzan June 3, 2014 Abstract A recent series of papers

More information

Summer School: Semidefinite Optimization

Summer School: Semidefinite Optimization Summer School: Semidefinite Optimization Christine Bachoc Université Bordeaux I, IMB Research Training Group Experimental and Constructive Algebra Haus Karrenberg, Sept. 3 - Sept. 7, 2012 Duality Theory

More information

MATH 31 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL

MATH 31 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL MATH 3 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL MAIN TOPICS FOR THE FINAL EXAM:. Vectors. Dot product. Cross product. Geometric applications. 2. Row reduction. Null space, column space, row space, left

More information

0 Sets and Induction. Sets

0 Sets and Induction. Sets 0 Sets and Induction Sets A set is an unordered collection of objects, called elements or members of the set. A set is said to contain its elements. We write a A to denote that a is an element of the set

More information

Convex Sets. Prof. Dan A. Simovici UMB

Convex Sets. Prof. Dan A. Simovici UMB Convex Sets Prof. Dan A. Simovici UMB 1 / 57 Outline 1 Closures, Interiors, Borders of Sets in R n 2 Segments and Convex Sets 3 Properties of the Class of Convex Sets 4 Closure and Interior Points of Convex

More information