The moment-LP and moment-SOS approaches in optimization
LAAS-CNRS and Institute of Mathematics, Toulouse, France
Workshop on Linear Matrix Inequalities, Semidefinite Programming and Quantum Information Theory, Toulouse, January 2016
See in particular the chapter by M. Laurent.
M.F. Anjos (Polytechnique Montréal, QC, Canada) and J.B. Lasserre (LAAS, Toulouse, France), Eds., Handbook on Semidefinite, Conic and Polynomial Optimization, International Series in Operations Research & Management Science, Springer, 2012, XI, 957 p.

Semidefinite and conic optimization is a major and thriving research area within the optimization community. Although semidefinite optimization has been studied (under different names) since at least the 1940s, its importance grew immensely during the 1990s after polynomial-time interior-point methods for linear optimization were extended to solve semidefinite optimization problems. Since the beginning of the 21st century, not only has research into semidefinite and conic optimization continued unabated, but a fruitful interaction has also developed with algebraic geometry, through the close connections between semidefinite matrices and polynomial optimization. This has brought about important new results and led to an even higher level of research activity. This Handbook on Semidefinite, Conic and Polynomial Optimization provides the reader with a snapshot of the state of the art in the growing and mutually enriching areas of semidefinite optimization, conic optimization, and polynomial optimization.
It contains a compendium of the recent research activity that has taken place in these thriving areas, and will appeal to doctoral students, young graduates, and experienced researchers alike. The Handbook's thirty-one chapters are organized into four parts: theory, covering significant theoretical developments as well as the interactions between conic optimization and polynomial optimization; algorithms, documenting the directions of current algorithmic development; software, providing an overview of the state of the art; and applications, dealing with the application areas where semidefinite and conic optimization has made a significant impact in recent years.
Outline:
Why polynomial optimization?
LP- and SDP-CERTIFICATES of POSITIVITY
The moment-LP and moment-SOS approaches
Consider the polynomial optimization problem:
P : f* = min { f(x) : g_j(x) ≥ 0, j = 1,...,m }
for some polynomials f, g_j ∈ R[x].
Why polynomial optimization? After all... P is just a particular case of Non Linear Programming (NLP)!
True!... if one is interested in a LOCAL optimum only! When searching for a local minimum, optimality conditions and descent algorithms use basic tools from REAL and CONVEX analysis and linear algebra. The focus is on how to improve f by looking at a NEIGHBORHOOD of a nominal point x ∈ K, i.e., LOCALLY AROUND x ∈ K; in general, no GLOBAL property of x can be inferred. The fact that f and the g_j are POLYNOMIALS does not help much!
BUT for GLOBAL optimization... the picture is different! Remember that for the GLOBAL minimum f*:
f* = sup { λ : f(x) − λ ≥ 0 ∀ x ∈ K }
(not true for a LOCAL minimum!), and so to compute f* one needs to handle EFFICIENTLY the difficult constraint f(x) − λ ≥ 0 ∀ x ∈ K; i.e., one needs TRACTABLE CERTIFICATES of POSITIVITY on K for the polynomial x ↦ f(x) − λ!
REAL ALGEBRAIC GEOMETRY helps! Indeed, POWERFUL CERTIFICATES OF POSITIVITY EXIST! Moreover, and importantly, such certificates are amenable to PRACTICAL COMPUTATION! (Stronger Positivstellensätze exist for analytic functions but are useless from a computational viewpoint.)
SOS-based certificate. K = { x : g_j(x) ≥ 0, j = 1,...,m }.
Theorem (Putinar's Positivstellensatz). If K is compact (+ a technical Archimedean assumption) and f > 0 on K, then:
f(x) = σ_0(x) + Σ_{j=1}^m σ_j(x) g_j(x), ∀ x ∈ R^n,
for some SOS polynomials σ_j ∈ R[x].
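A concrete one-variable instance of such a certificate can be checked by hand. The following sketch uses our own toy example (not one from the talk): on K = [−1, 1], described by g(x) = 1 − x², the polynomial f(x) = x has global minimum f* = −1, and f − f* admits a Putinar representation with σ_0 = (x+1)²/2 and σ_1 = 1/2, both SOS.

```python
# Toy Putinar certificate on K = [-1,1] with g(x) = 1 - x^2 (our own example):
#   f(x) - f*  =  x + 1  =  (1/2)(x + 1)^2  +  (1/2)(1 - x^2)
# sigma_0 = (x+1)^2/2 is SOS, sigma_1 = 1/2 is a nonnegative constant (SOS).

def lhs(x):
    return x - (-1.0)                       # f(x) - f*, with f* = -1

def putinar(x):
    sigma0 = 0.5 * (x + 1) ** 2             # SOS weight
    sigma1 = 0.5                            # SOS weight (constant)
    return sigma0 + sigma1 * (1 - x * x)    # sigma_0 + sigma_1 * g

# The identity holds as polynomials, hence at every sample point:
for t in [-2.0, -1.0, 0.0, 0.7, 1.0, 3.0]:
    assert abs(lhs(t) - putinar(t)) < 1e-12
```

Expanding the right-hand side, (x² + 2x + 1)/2 + (1 − x²)/2 = x + 1, confirms the identity algebraically; the certificate makes the bound f(x) ≥ −1 on K evident at a glance.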
However... in Putinar's theorem nothing is said about the DEGREE of the SOS polynomials σ_j! BUT... GOOD news! Testing whether the representation holds for some SOS σ_j ∈ R[x] with a degree bound is SOLVING an SDP!
Semidefinite Programming. The CONVEX optimization problem:
P : min_{x ∈ R^n} { c'x : Σ_{i=1}^n A_i x_i ⪯ b },
where c ∈ R^n and b, A_i ∈ S^m (m × m symmetric matrices), is called a semidefinite program. The notation "X ⪰ 0" means the real symmetric matrix X is positive semidefinite, i.e., all its (real) EIGENVALUES are nonnegative.
Example.
P : min { x1 + x2 : [ 3 + 2x1 + x2   5 + x1 ; 5 + x1   x1 + 2x2 ] ⪰ 0 }
or, equivalently,
P : min { x1 + x2 : [3 5; 5 0] + x1 [2 1; 1 1] + x2 [1 0; 0 2] ⪰ 0 }.
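Feasibility of a candidate point for this LMI can be checked directly. A minimal sketch (the test points are our own, not from the talk): evaluate the matrix at x and test positive semidefiniteness; for a symmetric 2×2 matrix, PSD holds exactly when the trace and the determinant are both nonnegative.

```python
# Evaluate the LMI A0 + x1*A1 + x2*A2 from the example and test PSD.
# For a symmetric 2x2 matrix, PSD <=> trace >= 0 and determinant >= 0.

def lmi(x1, x2):
    # A0=[[3,5],[5,0]], A1=[[2,1],[1,1]], A2=[[1,0],[0,2]]
    return [[3 + 2 * x1 + x2, 5 + x1],
            [5 + x1, x1 + 2 * x2]]

def is_psd_2x2(M):
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace >= 0 and det >= 0

print(is_psd_2x2(lmi(5.0, 5.0)))   # a feasible point: True
print(is_psd_2x2(lmi(0.0, 0.0)))   # infeasible: det([[3,5],[5,0]]) = -25 < 0
```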
P and its dual P* are convex problems that are solvable in polynomial time to arbitrary precision ε > 0. This is the generalization to the convex cone S^+_m (X ⪰ 0) of Linear Programming on the convex polyhedral cone R^m_+ (x ≥ 0). Indeed, with DIAGONAL matrices, semidefinite programming = linear programming! Several academic SDP software packages exist (e.g., the MATLAB LMI toolbox, SeDuMi, SDPT3, ...). However, so far, size limitation is more severe than for LP software packages. Pioneering contributions by A. Nemirovski, Y. Nesterov, N.Z. Shor, B.D. Yudin, ...
Dual side of Putinar's theorem: the K-moment problem. Given a real sequence y = (y_α), α ∈ N^n, does there exist a Borel measure µ on K such that
y_α = ∫_K x_1^{α_1} ··· x_n^{α_n} dµ, ∀ α ∈ N^n ?
Introduce the so-called Riesz linear functional L_y : R[x] → R:
f = Σ_α f_α x^α  ↦  L_y(f) = Σ_{α ∈ N^n} f_α y_α.
Theorem. If K = {x : g_j(x) ≥ 0, j = 1,...,m} is compact and satisfies an Archimedean assumption, then the moment conditions hold if and only if for every h ∈ R[x]:
(∗) L_y(h²) ≥ 0;  L_y(h² g_j) ≥ 0, j = 1,...,m.
The condition (∗) is equivalent to positive semidefiniteness of m + 1 moment and localizing matrices, i.e.,
M(y) ⪰ 0;  M(g_j y) ⪰ 0, j = 1,...,m,
whose rows & columns are indexed by N^n, and whose entries are LINEAR in the y_α's.
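The truncated moment matrix M_d(y) is easy to assemble: rows and columns are indexed by monomial exponents α with |α| ≤ d, and the (α, β) entry is y_{α+β}. The sketch below builds it for illustrative moment data of our own choosing (Lebesgue measure on [0, 1], so y_k = 1/(k+1)), which yields a Hilbert matrix, positive definite as the theorem requires for a genuine measure.

```python
from itertools import product

def monomials(n, d):
    """All exponent tuples alpha in N^n with |alpha| <= d, in a fixed order."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]

def moment_matrix(y, n, d):
    """M_d(y): the (alpha, beta) entry is y[alpha + beta] (entrywise sum)."""
    basis = monomials(n, d)
    return [[y[tuple(ai + bi for ai, bi in zip(a, b))] for b in basis]
            for a in basis]

# Illustrative data: moments of Lebesgue measure on [0,1], y_k = 1/(k+1).
y = {(k,): 1.0 / (k + 1) for k in range(5)}
M = moment_matrix(y, n=1, d=2)
# M is the 3x3 Hilbert matrix [[1, 1/2, 1/3], [1/2, 1/3, 1/4], [1/3, 1/4, 1/5]].
```

Note how every entry of M is linear in the y_α's, which is what makes the dual conditions M(y) ⪰ 0, M(g_j y) ⪰ 0 semidefinite constraints.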
LP-based certificate. K = { x : g_j(x) ≥ 0, 1 − g_j(x) ≥ 0, j = 1,...,m }.
Theorem (Krivine-Vasilescu-Handelman's Positivstellensatz). Let K be compact and let the family {1, g_j} generate R[x]. If f > 0 on K, then:
(∗∗) f(x) = Σ_{α,β} c_{αβ} Π_{j=1}^m g_j(x)^{α_j} (1 − g_j(x))^{β_j}, ∀ x ∈ R^n,
for some NONNEGATIVE scalars (c_{αβ}).
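Again a tiny instance can be verified by hand. The sketch below uses our own toy example (not from the talk): on K = [0, 1], described by g(x) = x (so g ≥ 0 and 1 − g ≥ 0), the polynomial f(x) = 2x² − 2x + 1, which is ≥ 1/2 on K, has the representation f = 1·g² + 1·(1 − g)², i.e. nonnegative coefficients c_{(2),(0)} = c_{(0),(2)} = 1.

```python
# Toy Krivine-Vasilescu-Handelman certificate on K = [0,1], g(x) = x:
#   2x^2 - 2x + 1  =  1 * g^2  +  1 * (1 - g)^2
# Both coefficients are nonnegative, as the theorem requires.

def f(x):
    return 2 * x * x - 2 * x + 1

def handelman(x):
    g = x                                   # the single constraint polynomial
    return 1.0 * g ** 2 + 1.0 * (1 - g) ** 2

# The two sides agree as polynomials, hence at every sample point:
for t in [-1.0, 0.0, 0.3, 0.5, 1.0, 2.0]:
    assert abs(f(t) - handelman(t)) < 1e-12
```

Expanding x² + (1 − x)² = 2x² − 2x + 1 confirms the identity; and since each product g^α(1 − g)^β is nonnegative on K, the representation certifies f ≥ 0 on K using only nonnegative scalars, which is why searching for such coefficients is linear programming.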
However... again, in Krivine's theorem nothing is said in (∗∗) about how many nonnegative scalars c_{αβ} are needed! BUT... GOOD news again! Testing whether (∗∗) holds for some nonnegative scalars (c_{αβ}) is SOLVING an LP!
Dual side of Krivine's theorem: the K-moment problem.
Theorem. If K = {x : g_j(x) ≥ 0, j = 1,...,m} is compact, 0 ≤ g_j ≤ 1 on K, and {1, g_j} generates R[x], then the moment conditions hold if and only if
(∗∗) L_y( Π_{j=1}^m g_j^{α_j} (1 − g_j)^{β_j} ) ≥ 0, ∀ α, β ∈ N^m.
The condition (∗∗) is equivalent to countably many LINEAR INEQUALITIES on the y_α's.
HENCE... SUCH POSITIVITY CERTIFICATES allow one to infer GLOBAL properties of FEASIBILITY and OPTIMALITY: the analogue of the well-known certificates previously valid in the CONVEX CASE ONLY!
In addition, polynomials NONNEGATIVE ON A SET K ⊂ R^n are ubiquitous. They also appear in many important applications (outside optimization) modeled as particular instances of the so-called Generalized Moment Problem, among which: probability, optimal and robust control, game theory, signal processing, multivariate integration, etc.
(GMP) : inf { Σ_{i=1}^s ∫_{K_i} f_i dµ_i : µ_i ∈ M(K_i); Σ_{i=1}^s ∫_{K_i} h_{ij} dµ_i = b_j, j ∈ J }
with M(K_i) the space of Borel measures on K_i ⊂ R^{n_i}, i = 1,...,s.
The DUAL of the GMP is the linear program
GMP* : sup_λ { Σ_{j ∈ J} λ_j b_j : f_i − Σ_{j ∈ J} λ_j h_{ij} ≥ 0 on K_i, i = 1,...,s }.
And one can see that the constraints of GMP* state that the functions f_i − Σ_{j ∈ J} λ_j h_{ij} must be nonnegative on a certain set K_i, i = 1,...,s.
A couple of examples. I: Global OPTIMIZATION. f* = inf_x { f(x) : x ∈ K } is the SIMPLEST example of the GMP because
f* = inf { ∫_K f dµ : µ ∈ M(K), ∫_K 1 dµ = 1 }.
Indeed, if f(x) ≥ f* for all x ∈ K and µ is a probability measure on K, then ∫_K f dµ ≥ ∫_K f* dµ = f*. On the other hand, for every x ∈ K the probability measure µ := δ_x is such that ∫ f dµ = f(x).
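A finite-dimensional illustration of this equivalence (our own toy, not from the talk): restrict the measures to a grid X ⊂ K. Minimizing ∫ f dµ over probability measures supported on X is then an LP in the weights p_i ≥ 0, Σ p_i = 1, whose optimum is min_i f(x_i), attained by the Dirac measure at a grid minimizer.

```python
import random

def f(x):
    return x ** 4 - x ** 2       # nonconvex; global minima at x = +/- 1/sqrt(2)

X = [-1 + k / 50 for k in range(101)]   # grid on K = [-1, 1]
fmin = min(f(x) for x in X)             # value of the Dirac at a grid minimizer

# Every probability vector p gives an integral sum(p_i f(x_i)) >= fmin:
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in X]
    s = sum(w)
    p = [wi / s for wi in w]            # a random probability vector
    assert sum(pi * f(xi) for pi, xi in zip(p, X)) >= fmin - 1e-12
```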
II. BOUNDS on measures with moment conditions. Let K ⊂ R^n and S ⊂ K be given, and let Γ ⊂ N^n also be given:
max_{µ ∈ M(K)} { ⟨1_S, µ⟩ : ∫_K x^α dµ = m_α, α ∈ Γ }
to compute an upper bound on µ(S) over all distributions µ ∈ M(K) with a certain fixed number of moments m_α. If Γ = N^n then one may use this to compute the Lebesgue volume of a compact basic semi-algebraic set S ⊂ K := [−1, 1]^n. Take
m_α := ∫_{[−1,1]^n} x^α dx, α ∈ N^n.
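The moment data m_α of the box are available in closed form, which is what makes this problem concrete. A small helper of our own devising: the integral factorizes coordinate-wise, and each one-dimensional factor ∫_{−1}^1 x^k dx is 0 for odd k and 2/(k+1) for even k.

```python
def box_moment(alpha):
    """m_alpha = integral of x^alpha over [-1,1]^n, alpha a tuple of exponents."""
    m = 1.0
    for k in alpha:
        if k % 2 == 1:
            return 0.0          # odd power integrates to zero by symmetry
        m *= 2.0 / (k + 1)
    return m

print(box_moment((0, 0)))   # volume of [-1,1]^2 -> 4.0
print(box_moment((2, 0)))   # integral of x1^2 over the square -> 4/3
print(box_moment((1, 2)))   # odd in x1 -> 0.0
```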
III. For instance, one may also want:
To approximate sets defined with QUANTIFIERS, e.g.,
R_f := {x ∈ B : f(x, y) ≤ 0 for all y such that (x, y) ∈ K}
D_f := {x ∈ B : f(x, y) ≤ 0 for some y such that (x, y) ∈ K}
where f ∈ R[x, y] and B is a simple set (box, ellipsoid).
To compute convex polynomial underestimators p ≤ f of a polynomial f on a box B ⊂ R^n. (Very useful in MINLP.)
The moment-LP and moment-SOS approaches consist of using a certain type of positivity certificate (Krivine-Vasilescu-Handelman's or Putinar's certificate) in potentially any application where such a characterization is needed. (Global optimization is only one example.) In many situations this amounts to solving a HIERARCHY of LINEAR PROGRAMS or SEMIDEFINITE PROGRAMS... of increasing size!
LP- and SDP-hierarchies for optimization. Replace f* = sup_λ { λ : f(x) − λ ≥ 0 ∀ x ∈ K } with:
The SDP-hierarchy indexed by d ∈ N:
f*_d = sup_{λ,σ_j} { λ : f − λ = σ_0 + Σ_{j=1}^m σ_j g_j ; σ_j SOS, deg(σ_j g_j) ≤ 2d }
or the LP-hierarchy indexed by d ∈ N:
θ_d = sup_{λ,c} { λ : f − λ = Σ_{α,β} c_{αβ} Π_{j=1}^m g_j^{α_j} (1 − g_j)^{β_j} ; c_{αβ} ≥ 0, |α + β| ≤ 2d }.
Theorem. Both sequences (f*_d) and (θ_d), d ∈ N, are MONOTONE NONDECREASING, and when K is compact (and satisfies a technical Archimedean assumption):
f* = lim_{d→∞} f*_d = lim_{d→∞} θ_d.
What makes this approach exciting is that it is at the crossroads of several disciplines/applications:
Commutative, non-commutative, and non-linear ALGEBRA
Real algebraic geometry and functional analysis
Optimization and convex analysis
Computational complexity in computer science,
all of which BENEFIT from the interactions! As mentioned... potential applications are ENDLESS!
Has already proved useful and successful in applications of modest problem size, notably in optimization, control, robust control, optimal control, estimation, computer vision, etc. (If sparsity is present, then problems of larger size can be addressed.) HAS initiated and stimulated new research issues: in convex algebraic geometry (e.g., semidefinite representation of convex sets, algebraic degree of semidefinite programming and polynomial optimization); in computational algebra (e.g., for solving polynomial equations via SDP and border bases); and in computational complexity, where LP- and SDP-HIERARCHIES have become an important tool to analyze hardness of approximation for 0/1 combinatorial problems (with links to quantum computing).
Recall that both LP- and SDP-hierarchies are GENERAL-PURPOSE METHODS... NOT TAILORED to solving specific hard problems!
A remarkable property of the SOS hierarchy: I. When solving the optimization problem
P : f* = min { f(x) : g_j(x) ≥ 0, j = 1,...,m }
one does NOT distinguish between CONVEX, CONTINUOUS NON-CONVEX, and 0/1 (and DISCRETE) problems! A Boolean variable x_i is modeled via the equality constraint "x_i² − x_i = 0". In Non Linear Programming (NLP), modeling a 0/1 variable with the polynomial equality constraint "x_i² − x_i = 0" and applying a standard descent algorithm would be considered "stupid"! Each class of problems has its own ad hoc tailored algorithms.
Even though the moment-SOS approach DOES NOT SPECIALIZE to each class of problems: it recognizes the class of (easy) SOS-convex problems, as FINITE CONVERGENCE occurs at the FIRST relaxation in the hierarchy. Finite convergence also occurs for general convex problems and, generically, for non-convex problems (NOT true for the LP-hierarchy). The SOS-hierarchy dominates other lift-and-project hierarchies (i.e., provides the best lower bounds) for hard 0/1 combinatorial optimization problems! The computer science community talks about a META-algorithm.
A remarkable property: II. FINITE CONVERGENCE of the SOS-hierarchy is GENERIC!... and provides a GLOBAL OPTIMALITY CERTIFICATE, the analogue for the NON-CONVEX CASE of the KKT-OPTIMALITY conditions in the CONVEX CASE!
Theorem (Marshall, Nie). Let x* ∈ K be a global minimizer of
P : f* = min { f(x) : g_j(x) ≥ 0, j = 1,...,m }
and assume that:
(i) the gradients {∇g_j(x*)} of the active constraints are linearly independent;
(ii) strict complementarity holds (λ*_j g_j(x*) = 0 for all j, with λ*_j > 0 whenever g_j(x*) = 0);
(iii) second-order sufficiency conditions hold at (x*, λ*) ∈ K × R^m_+.
Then
f(x) − f* = σ*_0(x) + Σ_{j=1}^m σ*_j(x) g_j(x), ∀ x ∈ R^n,
for some SOS polynomials {σ*_j}. Moreover, the conditions (i)-(ii)-(iii) HOLD GENERICALLY!
Certificates of positivity already exist in convex optimization
f* = f(x*) = min { f(x) : g_j(x) ≥ 0, j = 1,...,m }
when f is convex and the g_j are concave. Indeed, if Slater's condition holds, there exist nonnegative KKT multipliers λ* ∈ R^m_+ such that:
∇f(x*) − Σ_{j=1}^m λ*_j ∇g_j(x*) = 0;  λ*_j g_j(x*) = 0, j = 1,...,m.
... and so the Lagrangian L_{λ*}(x) := f(x) − f* − Σ_{j=1}^m λ*_j g_j(x) satisfies ∇L_{λ*}(x*) = 0 and L_{λ*}(x) ≥ 0 for all x. Therefore:
L_{λ*}(x) ≥ 0  ⟹  f(x) − f* ≥ 0  ∀ x ∈ K!
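A toy numerical check of this convex-case certificate (our own example, not from the talk): minimize f(x) = x² subject to g(x) = x − 1 ≥ 0. The minimizer is x* = 1 with multiplier λ* = 2, and the Lagrangian f(x) − f* − λ*g(x) = x² − 1 − 2(x − 1) = (x − 1)² is nonnegative on ALL of R, not just K, and vanishes at x*.

```python
# Convex-case positivity certificate: L(x) = f(x) - f* - lambda* * g(x)
# for min x^2 s.t. x - 1 >= 0, with x* = 1, f* = 1, lambda* = 2.

f_star, lam = 1.0, 2.0

def lagrangian(x):
    return x ** 2 - f_star - lam * (x - 1)

assert lagrangian(1.0) == 0.0                        # vanishes at x* = 1
for t in [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 10.0]:
    assert lagrangian(t) >= 0.0                      # global nonnegativity
    assert abs(lagrangian(t) - (t - 1.0) ** 2) < 1e-12   # it is (x-1)^2
```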
In summary:
KKT-OPTIMALITY when f is convex and the g_j are concave:
∇f(x*) − Σ_{j=1}^m λ*_j ∇g_j(x*) = 0  and  f(x) − f* − Σ_{j=1}^m λ*_j g_j(x) ≥ 0 for all x ∈ R^n.
PUTINAR's CERTIFICATE in the NON-CONVEX case:
∇f(x*) − Σ_{j=1}^m σ*_j(x*) ∇g_j(x*) = 0  and  f(x) − f* − Σ_{j=1}^m σ*_j(x) g_j(x) = σ*_0(x) ≥ 0 for all x ∈ R^n,
for some SOS {σ*_j}, with σ*_j(x*) = λ*_j.
The no-free-lunch rule... The size of the SDP-relaxations grows rapidly with the original problem size. In particular:
O(n^{2d}) variables for the d-th SDP-relaxation in the hierarchy;
O(n^d) matrix size for the LMIs.
In view of the present status of SDP-solvers, only small to medium size problems can be solved by "standard" SDP-relaxations... How to handle larger size problems?
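The growth quoted above can be made exact: the d-th relaxation has one moment variable per monomial of degree ≤ 2d in n variables, i.e. C(n+2d, 2d) = O(n^{2d}) variables, and a moment matrix of side C(n+d, d) = O(n^d). A quick sketch:

```python
from math import comb

def relaxation_size(n, d):
    """(number of moment variables, side of the moment matrix) at order d."""
    n_vars = comb(n + 2 * d, 2 * d)     # monomials of degree <= 2d
    lmi_side = comb(n + d, d)           # monomials of degree <= d
    return n_vars, lmi_side

print(relaxation_size(10, 2))   # -> (1001, 66)
print(relaxation_size(20, 3))   # -> (230230, 1771)
```

Already at n = 20, d = 3 the relaxation has over 200,000 variables and a 1771 × 1771 LMI, which illustrates why "standard" relaxations stall beyond medium-sized problems.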
Develop more efficient general-purpose SDP-solvers (limited impact)... or perhaps dedicated solvers?
Exploit symmetries when present: recent promising works by De Klerk, Gatermann, Gvozdenović, Laurent, Pasechnik, Parrilo, Schrijver, ..., in particular for combinatorial optimization problems. Algebraic techniques permit one to define an equivalent SDP of much smaller size.
Exploit sparsity in the data: in general, each constraint involves a small number of variables κ, and the cost criterion is a sum of polynomials, each also involving a small number of variables. Recent works by Kim, Kojima, Lasserre, Muramatsu and Waki.
In recent works, Kojima's group has developed a systematic procedure to discover a sparsity pattern
I = {1, 2,...,n} = ∪_{j=1}^p I_j;  J = {1, 2,...,m} = ∪_{k=1}^p J_k.
Essentially one looks for maximal cliques {I_j} in some chordal extension of a graph associated with the problem
P : f* = min { f(x) : g_j(x) ≥ 0, j = 1,...,m }.
Assumption I (sparsity pattern): f = Σ_{k=1}^p f_k with f_k ∈ R[x_i ; i ∈ I_k], and for all j ∈ J_k, g_j ∈ R[x_i ; i ∈ I_k].
Assumption II (the Running Intersection Property): for every k = 2,...,p,
I_k ∩ (∪_{j=1}^{k−1} I_j) ⊆ I_s for some s ≤ k − 1.
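Assumption II is straightforward to test once the cliques are known. A small utility of our own devising: for each k ≥ 2, the intersection of I_k with the union of the earlier cliques must be contained in a single earlier clique I_s.

```python
def has_rip(cliques):
    """Check the Running Intersection Property for ordered cliques I_1,...,I_p."""
    for k in range(1, len(cliques)):
        earlier = set().union(*cliques[:k])
        overlap = set(cliques[k]) & earlier
        # The overlap must fit inside some single earlier clique I_s.
        if not any(overlap <= set(I) for I in cliques[:k]):
            return False
    return True

print(has_rip([{1, 2}, {2, 3}, {3, 4}]))   # chain of cliques -> True
print(has_rip([{1, 2}, {3, 4}, {1, 3}]))   # {1,3} straddles two cliques -> False
```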
Theorem (Lasserre; sparse version of Putinar's Positivstellensatz). Let Assumptions I and II hold, and let an Archimedean assumption hold for K. If f > 0 on K, then:
f = Σ_{k=1}^p ( σ_{0k} + Σ_{j ∈ J_k} σ_{jk} g_j )
for some SOS polynomials σ_{jk} ∈ R[x_i : i ∈ I_k].
Examples with n large (say n = 1000) and small clique size κ (e.g., κ = 3, 4,...,7) are solved relatively easily with the SparsePOP software of Kojima's group.
There have also been recent attempts to use other types of algebraic certificates of positivity that try to avoid the size explosion due to the semidefinite matrices associated with the SOS weights in Putinar's positivity certificate. Recent work by Ahmadi et al. and by Lasserre, Toh and Zhang.