An Efficient Affine-scaling Algorithm for Hyperbolic Programming
Jim Renegar
joint work with Mutiara Sondjaja
E = a Euclidean space. A homogeneous polynomial p : E → ℝ is hyperbolic if there is a vector e ∈ E such that for all x ∈ E, the univariate polynomial t ↦ p(x + te) has only real roots. We say p is hyperbolic in direction e.

Example: E = 𝕊ⁿˣⁿ (symmetric matrices), p(X) = det(X), e = I (identity matrix); then t ↦ p(X + tI) is, up to a change of sign in the argument, the characteristic polynomial of X, and all of its roots are real because symmetric matrices have only real eigenvalues.

The hyperbolicity cone Λ₊₊ is the connected component of {x : p(x) ≠ 0} containing e. For the example, Λ₊₊ = 𝕊ⁿˣⁿ₊₊ (the cone of positive-definite matrices). The convexity of this particular cone is true of hyperbolicity cones in general...
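The determinant example can be checked numerically; a minimal sketch (my own illustration, not from the slides), where the roots of t ↦ det(X + tI) are the negatives of the eigenvalues of X:

```python
import numpy as np

# For a symmetric matrix X, t -> det(X + t*I) has only real roots,
# namely t = -lambda_i(X); so p = det is hyperbolic in direction e = I.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = (B + B.T) / 2                      # a random symmetric matrix

eigs = np.linalg.eigvalsh(X)           # real, since X is symmetric
roots = -eigs                          # roots of t -> det(X + t*I)

# Cross-check: det(X + t*I) vanishes at each computed root.
for t in roots:
    assert abs(np.linalg.det(X + t * np.eye(4))) < 1e-8
```

For a non-symmetric matrix, by contrast, the characteristic polynomial generally has complex roots, which is exactly what hyperbolicity rules out.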
Thm (Gårding, '59): Λ₊₊ is a convex cone.

A hyperbolic program is an optimization problem of the form

    min ⟨c, x⟩  s.t.  Ax = b,  x ∈ Λ₊        (HP)

where Λ₊ is the closure of Λ₊₊.

Güler ('97) introduced hyperbolic programming, motivated largely by the realization that f(x) = −ln p(x) is a self-concordant barrier function, giving O(√n) iterations to halve the duality gap, where n is the degree of p. Güler showed the barrier functions f(x) = −ln p(x) possess many of the nice properties of X ↦ −ln det(X), although hyperbolicity cones in general are not symmetric (i.e., self-scaled).
There are natural ways to relax HP to hyperbolic programs for lower-degree polynomials. For example, to obtain a relaxation of SDP: fix n, and for 1 ≤ k ≤ n let

    σ_k(λ₁, …, λ_n) := Σ_{j₁ < … < j_k} λ_{j₁} ⋯ λ_{j_k}

be the elementary symmetric polynomial of degree k. Then X ↦ σ_k(λ(X)) is a hyperbolic polynomial in direction E = I of degree k, and its hyperbolicity cone contains 𝕊ⁿˣⁿ₊₊. These polynomials can be evaluated efficiently via the FFT. Perhaps relaxing SDPs in this and related ways will allow larger SDPs to be approximately solved efficiently. The relaxations easily generalize to all hyperbolic programs.
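The values σ_k(λ(X)) can be read off the characteristic polynomial of X without computing each k-subset product, since det(tI − X) = tⁿ − σ₁tⁿ⁻¹ + σ₂tⁿ⁻² − …; a small sketch of this (my own illustration), with a brute-force cross-check:

```python
import numpy as np
from itertools import combinations

# sigma_k of the eigenvalues of X from characteristic-polynomial coefficients.
# (The coefficients of prod(t - lam_i) can also be assembled by divide-and-
# conquer polynomial multiplication using the FFT, as the slide alludes to.)
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
X = (B + B.T) / 2

coeffs = np.poly(X)                    # [1, -sigma_1, +sigma_2, -sigma_3, ...]
sigma = [(-1) ** k * coeffs[k] for k in range(1, 6)]

lam = np.linalg.eigvalsh(X)
sigma_direct = [sum(np.prod(c) for c in combinations(lam, k))
                for k in range(1, 6)]
assert np.allclose(sigma, sigma_direct)
```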
Barrier function f(x) = −ln p(x), with gradient g(x) and Hessian H(x); the Hessian is positive-definite for all x ∈ Λ₊₊.

Local inner product at e ∈ Λ₊₊:  ⟨u, v⟩_e := ⟨u, H(e)v⟩;  the induced norm:  ‖v‖_e = √⟨v, v⟩_e.

Dikin ellipsoids:  B_e(e, r) = {x : ‖x − e‖_e ≤ r}.

The gist of the original affine-scaling method due to Dikin is simply: given a strictly feasible point e for HP and an appropriate value r > 0, move from e to the optimal solution e₊ of

    min ⟨c, x⟩  s.t.  Ax = b,  x ∈ B_e(e, r).

Dikin focused on linear programming and chose r = 1 (giving the largest Dikin ellipsoids contained in ℝⁿ₊); see also Vanderbei, Meketon and Freedman ('86).
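For LP (Λ₊ = ℝⁿ₊, barrier −Σ ln x_i, so H(x) = diag(1/x_i²)), the minimizer over the Dikin ellipsoid has a closed form, giving the classical affine-scaling iteration. A minimal sketch (my own, on a toy LP; `r=0.99` keeps iterates strictly positive):

```python
import numpy as np

def dikin_step(A, c, x, r=0.99):
    """One affine-scaling step for min c.x s.t. Ax = b, x >= 0.

    Minimizes c over the Dikin ellipsoid {y : Ay = Ax, ||y - x||_x <= r},
    where ||v||_x^2 = sum (v_i / x_i)^2 comes from the Hessian of -sum ln x_i.
    """
    D2 = np.diag(x ** 2)                      # H(x)^{-1} for the log barrier
    y = np.linalg.solve(A @ D2 @ A.T, A @ D2 @ c)
    s = c - A.T @ y                           # reduced cost
    dx = -D2 @ s                              # steepest descent in local metric
    return x + r * dx / np.sqrt(s @ D2 @ s)   # sqrt term equals ||dx||_x

# Toy LP: min x1  s.t.  x1 + x2 = 1, x >= 0   (optimum at x = (0, 1))
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 0.0])
x = np.array([0.5, 0.5])
for _ in range(50):
    x = dikin_step(A, c, x)
assert abs(c @ x) < 1e-3 and abs(x.sum() - 1.0) < 1e-9
```

Note that A·dx = 0 by construction, so feasibility Ax = b is preserved exactly; since r < 1, the ellipsoid lies inside the positive orthant and the iterates never leave it.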
In the mid-1980's there was considerable effort to prove that Dikin's affine-scaling method runs in polynomial time (perhaps with a choice r < 1). The efforts mostly ceased when, in 1986, Shub and Megiddo showed that the infinitesimal version of the algorithm can come near all vertices of a Klee–Minty cube.

Nevertheless, several algorithms with spirit similar to Dikin's method have been shown to halve the duality gap in polynomial time:
- Monteiro, Adler and Resende ('90): LP and convex QP
- Jansen, Roos and Terlaky ('96 LP, '97 PSD LCP problems)   (use scaling points and V-space)
- Sturm and Zhang ('96): SDP
- Chua (2007): symmetric cone programming   (uses ellipsoidal cones rather than ellipsoids)

These algorithms are primal-dual methods and rely heavily on the cones being self-scaled. Our framework shares some strong connections with the one developed by Chek Beng Chua, to whom we are indebted.
For e ∈ Λ₊₊ and 0 < α < √n, let

    K_e(α) := {x : ⟨e, x⟩_e ≥ α ‖x‖_e};

this happens to be the smallest cone containing the Dikin ellipsoid B_e(e, √(n − α²)). Keep in mind that the cone grows in size as α decreases.

    min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ₊   (HP)   →   min ⟨c, x⟩ s.t. Ax = b, x ∈ K_e(α)   (QP_e(α))

Definition:  Swath(α) := {e ∈ Λ₊₊ : Ae = b and QP_e(α) has an optimal solution}.

Prop:  Swath(0) ⊇ Central Path.

Thus, α can be regarded as a measure of the proximity of points in Swath(α) to the central path.
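The relation between K_e(α) and the Dikin ellipsoid can be sanity-checked numerically in the LP case, where at e = (1, …, 1) the local inner product is the ordinary dot product and ‖e‖_e² = n. The sketch below (my own illustration) samples boundary points of B_e(e, √(n − α²)) and verifies they satisfy the cone inequality:

```python
import numpy as np

# Check: every point of the Dikin ellipsoid B_e(e, r), r = sqrt(n - alpha^2),
# satisfies the cone inequality <e, x>_e >= alpha * ||x||_e  (LP case, e = ones).
rng = np.random.default_rng(2)
n, alpha = 6, 0.8
e = np.ones(n)
r = np.sqrt(n - alpha ** 2)

for _ in range(1000):
    v = rng.standard_normal(n)
    x = e + r * v / np.linalg.norm(v)   # a point on the boundary of B_e(e, r)
    assert e @ x >= alpha * np.linalg.norm(x) - 1e-9
```

A short calculation shows (⟨e, x⟩)² − α²‖x‖² = (n − α²)(⟨e, u⟩ + r)² ≥ 0 for x = e + ru with ‖u‖ = 1, with equality exactly at the tangent directions ⟨e, u⟩ = −r, which is why this r gives the smallest containing cone.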
Let x_e(α) = the optimal solution of QP_e(α) (assuming e ∈ Swath(α)); the main work in computing x_e(α) lies in solving a system of linear equations.

We assume 0 < α < 1, in which case Λ₊ ⊆ K_e(α); thus QP_e(α) is a relaxation of HP, and hence the optimal value of HP is ≥ ⟨c, x_e(α)⟩.

Current iterate: e ∈ Λ₊₊. The next iterate will be e′, a convex combination of e and x_e(α):

    e′ = (1/(1+t)) (e + t x_e(α)).

The choice of t is made through duality...
Notation: from here on, write x_e for the optimal solution x_e(α) of QP_e(α) (assuming e ∈ Swath(α)).
The dual problems:

    max bᵀy s.t. A*y + s = c, s ∈ Λ₊*   (HP*)   →   max bᵀy s.t. A*y + s = c, s ∈ K_e(α)*   (QP*_e(α))

First-order optimality conditions for x_e yield an optimal solution (y_e, s_e) for QP*_e(α). Moreover, (y_e, s_e) is feasible for HP*, because Λ₊ ⊆ K_e(α) and hence K_e(α)* ⊆ Λ₊*.

Primal-dual feasible pair:  e for HP,  (y_e, s_e) for HP*.
Duality gap:  gap_e := ⟨c, e⟩ − bᵀy_e.
Current iterate: e ∈ Λ₊₊. The next iterate will be a convex combination of e and x_e:

    e(t) = (1/(1+t)) (e + t x_e).

We want t to be large so as to improve the primal objective value, but we also want e(t) ∈ Swath(α). We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that

    e(t) ∈ Λ₊₊   and   s_e ∈ int(K_{e(t)}(α)*);

consequently, e(t) is strictly feasible for QP_{e(t)}(α) and (y_e, s_e) is strictly feasible for QP*_{e(t)}(α); hence e(t) ∈ Swath(α).
Choosing t as the minimizer of the quadratic polynomial also ensures that

    t ≥ (1 − α²)/‖x_e‖_e,

and thus ensures good improvement in the primal objective value if, say, ‖x_e‖_e ≤ √n.
The choice of t furthermore ensures that

    s_e ∈ K_{e(t)}(β)*   where   β = α/√(1 + α²),

which implies s_e is deep within K_{e(t)}(α)*, and hence (y_e, s_e) is very strongly feasible for QP*_{e(t)}(α).
To summarize: with e(t) = (1/(1+t))(e + t x_e), we choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:
1. There is good improvement in the primal objective value if ‖x_e‖_e ≤ √n.
2. (y_e, s_e) is very strongly feasible for QP*_{e(t)}(α).

Sequence of iterates: e₀, e₁, e₂, …  If i > 0, write x_i and (y_i, s_i) rather than x_{e_i} and (y_{e_i}, s_{e_i}):
1. ‖x_i‖_{e_i} ≤ √n  ⇒  ⟨c, e_{i+1}⟩ is significantly smaller than ⟨c, e_i⟩.
2. (y_{i−1}, s_{i−1}) is very strongly feasible for QP*_{e_i}(α).
On the other hand, we show:
3. (‖x_i‖_{e_i} ≥ √n) ∧ (2.)  ⇒  bᵀy_i is significantly larger than bᵀy_{i−1}.

In this manner we establish the Main Theorem...
Main Thm: The primal objective value improves monotonically, and so does the dual objective value. If i, k ≥ 0, then

    gap_{e_{i+k}} ≤ (1 − (α/(1+α)) · 1/√(8n))^k · gap_{e_i}.

In fact, the theorem holds if one simply chooses t = (1 − α²)/‖x_i‖_{e_i}, but choosing t to be the minimizer of the quadratic polynomial can result in steps that are far longer.

So what is the particular quadratic polynomial?
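A geometric contraction of this form gives O(√n) iterations per halving of the gap. A back-of-envelope sketch (my own; the per-iteration factor δ = (α/(1+α))/√(8n) is taken from the theorem as stated here, so treat the constant as indicative):

```python
import math

# If gap_{i+1} <= (1 - delta) * gap_i with delta = (alpha/(1+alpha))/sqrt(8n),
# then about ln(2)/delta iterations suffice to halve the duality gap,
# i.e., O(sqrt(n)) iterations for fixed alpha.
def iters_to_halve(n, alpha):
    delta = (alpha / (1 + alpha)) / math.sqrt(8 * n)
    return math.ceil(math.log(2) / delta)

counts = {n: iters_to_halve(n, alpha=0.5) for n in (10, 100, 1000)}
```

The counts grow roughly like √n: multiplying n by 100 multiplies the iteration bound by about 10.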
Special case of SDP:  E ∈ 𝕊ⁿˣⁿ₊₊ strictly feasible, X_E optimal for QP_E(α), (y_E, S_E) optimal for QP*_E(α). Let

    E(t) = (1/(1+t)) (E + t X_E).

Here is the quadratic polynomial:

    q(t) := tr( ((E + t X_E) S_E)² ).

Prop:  (t ≥ 0) ∧ (E(t) ≻ 0)  ⇒  min{β : S_E ∈ K_{E(t)}(β)*} = √(n − 1/q(t)).

Prop:  The minimizer t̄ of q satisfies t̄ > (1 − α²)/‖X_E‖_E and E(t̄) ≻ 0, with √(n − 1/q(t̄)) ≤ α/√(1 + α²).

Corollary:  S_E is deep within K_{E(t̄)}(α)* for the minimizer t̄ of q.
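Since (E + tX_E)S_E = ES_E + t·X_E S_E, the function q really is a quadratic in t, and its coefficients are three trace computations. A sketch of this expansion (my own illustration: E, X, S below are random stand-ins for E, X_E, S_E, with S taken positive definite since s_e lies in a dual cone, not actual QP solutions):

```python
import numpy as np

rng = np.random.default_rng(3)

def sym(n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2

E, X = sym(4), sym(4)
C = rng.standard_normal((4, 4))
S = C @ C.T + np.eye(4)                 # positive definite stand-in for S_E

# q(t) = tr(((E + tX)S)^2) = tr((XS)^2) t^2 + 2 tr((ES)(XS)) t + tr((ES)^2),
# a quadratic with positive leading coefficient; minimizer t_bar = -b/(2a).
M, N = E @ S, X @ S
a, b, c = np.trace(N @ N), 2 * np.trace(M @ N), np.trace(M @ M)
t_bar = -b / (2 * a)

# Cross-check the expansion by direct evaluation of q at a few points:
for t in (0.0, 0.7, t_bar):
    P = (E + t * X) @ S
    assert abs(np.trace(P @ P) - (a * t ** 2 + b * t + c)) < 1e-7
```

The leading coefficient a = tr((XS)²) = tr((S^{1/2} X S^{1/2})²) is positive whenever S ≻ 0 and XS ≠ 0, so the quadratic indeed has a minimizer.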
Hyperbolic programming in general:  e ∈ Λ₊₊, x_e optimal for QP_e(α), (y_e, s_e) optimal for QP*_e(α). Let

    e(t) = (1/(1+t)) (e + t x_e).

As happened for SDP, we would like an easily computable function q for which

    (t ≥ 0) ∧ (e(t) ∈ Λ₊₊)  ⇒  min{β : s_e ∈ K_{e(t)}(β)*} = √(n − 1/q(t)).

However, in general the resulting function q need not be a quadratic polynomial, nor do we see any reason that it should be efficiently computable; in fact, about the most we know is that q is semialgebraic. But we do know how to obtain a quadratic polynomial which serves as an appropriate upper bound for q. We do this by leveraging our SDP result with the (very deep) Helton–Vinnikov Theorem for hyperbolicity cones.
Helton–Vinnikov Theorem: If p : E → ℝ is hyperbolic in direction e and of degree n, and if L is a 3-dimensional subspace of E containing e, then there exists a linear transformation T : L → 𝕊ⁿˣⁿ satisfying T(e) = I and p(x) = p(e) det(T(x)) for all x ∈ L.
Luckily, the resulting quadratic polynomial can always be computed efficiently. First compute the five leading coefficients a_n, a_{n−1}, a_{n−2}, a_{n−3}, a_{n−4} of the univariate polynomial

    λ ↦ p(x_e + λe) = Σ_{i=1}^{n} a_i λ^i.

Then compute

    κ₁ = a_{n−1}/a_n,   κ₂ = (a_{n−1}/a_n)² − 2 a_{n−2}/a_n,

and the analogous (Newton-identity) expressions κ₃ and κ₄ in the ratios a_{n−1}/a_n, …, a_{n−4}/a_n; up to sign, κ₁, …, κ₄ are the first four power sums of the roots. The desired quadratic polynomial is

    t ↦ a t² + b t + c,

where a, b and c are explicit polynomial expressions in κ₁, κ₂, κ₃, κ₄ and α; in particular, c = (n − α²) κ₁².
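The computation underlying the κ's is classical: Newton's identities recover the power sums of a polynomial's roots from its leading coefficients. A generic sketch (my own helper illustrating the identities, not the slide's exact κ formulas), checked against explicitly chosen roots:

```python
import numpy as np

def power_sums(e, kmax):
    """Power sums p_k = sum_i lambda_i^k from elementary symmetric functions.

    Uses Newton's identities:
      p_k = e_1 p_{k-1} - e_2 p_{k-2} + ... + (-1)^{k-1} k e_k.
    `e` lists e_1, e_2, ... of the roots lambda_i.
    """
    p = []
    for k in range(1, kmax + 1):
        pk = sum((-1) ** (i - 1) * e[i - 1] * p[k - i - 1]
                 for i in range(1, k))
        pk += (-1) ** (k - 1) * k * (e[k - 1] if k <= len(e) else 0.0)
        p.append(pk)
    return p

# Cross-check against explicit roots: coefficients of prod(t - r_i) encode
# the elementary symmetric functions with alternating signs.
roots = np.array([2.0, -1.0, 0.5, 3.0])
coeffs = np.poly(roots)                       # monic: [1, -e1, e2, -e3, e4]
e = [(-1) ** k * coeffs[k] for k in range(1, 5)]
p = power_sums(e, 4)
assert np.allclose(p, [np.sum(roots ** k) for k in range(1, 5)])
```

This is why only the five leading coefficients of λ ↦ p(x_e + λe) are needed: they determine the first four power sums, which in turn determine the quadratic.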
Epilogue: Recently we learned of a paper on linear programming in which quadratic cones relaxing the non-negative orthant are used in devising a polynomial-time algorithm, albeit one with complexity O(nL) iterations rather than O(√n L) iterations:

I.S. Litvinchev, A circular cone relaxation primal interior point algorithm for LP, Optimization 52 (2003) 529–540.