Moments and Positive Polynomials for Optimization II: LP- VERSUS SDP-relaxations

LAAS-CNRS and Institute of Mathematics, Toulouse, France. Tutorial, IMS, Singapore, 2012.

LP-relaxations

Recall the global optimization problem
$$\mathbf{P}:\quad f^* := \min\,\{\, f(x) \;:\; g_j(x) \ge 0,\ j = 1,\dots,m \,\},$$
where $f$ and the $g_j$ are all POLYNOMIALS, and let
$$\mathbf{K} := \{\, x \in \mathbb{R}^n \;:\; g_j(x) \ge 0,\ j = 1,\dots,m \,\}$$
be the feasible set (a compact basic semi-algebraic set).

Putinar's Positivstellensatz

Assumption 1: For some $M > 0$, the quadratic polynomial $M - \|X\|^2$ belongs to the quadratic module $Q(g_1,\dots,g_m)$.

Theorem (Putinar; Jacobi-Prestel)
Let $\mathbf{K}$ be compact and let Assumption 1 hold. Then
$$[\, f \in \mathbb{R}[X] \ \text{and} \ f > 0 \ \text{on } \mathbf{K} \,] \;\Longrightarrow\; f \in Q(g_1,\dots,g_m),$$
i.e.,
$$f(x) = \sigma_0(x) + \sum_{j=1}^{m} \sigma_j(x)\, g_j(x), \qquad \forall x \in \mathbb{R}^n,$$
for some s.o.s. polynomials $\{\sigma_j\}_{j=0}^{m}$.

If one fixes an a priori bound on the degree of the s.o.s. polynomials $\{\sigma_j\}$, checking $f \in Q(g_1,\dots,g_m)$ reduces to solving an SDP!!

Moreover, Assumption 1 holds true if e.g.:
- all the $g_j$'s are linear (hence $\mathbf{K}$ is a polytope), or
- the set $\{\, x : g_j(x) \ge 0 \,\}$ is compact for some $j \in \{1,\dots,m\}$.

If $x \in \mathbf{K} \Rightarrow \|x\| \le M$ for some (known) $M$, then it suffices to add the redundant quadratic constraint $M^2 - \|X\|^2 \ge 0$ in the definition of $\mathbf{K}$.
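To make the fixed-degree SDP concrete, here is a minimal sketch (not from the slides; it assumes Python with cvxpy and an SDP-capable solver): it certifies $f(x) = x + 2 > 0$ on $K = \{x : 1 - x^2 \ge 0\}$ by searching for $f = \sigma_0 + c\,(1 - x^2)$ with $\sigma_0$ s.o.s. of degree 2 and $c \ge 0$, via coefficient matching.

import cvxpy as cp

Q = cp.Variable((2, 2), symmetric=True)  # Gram matrix: sigma_0(x) = [1 x] Q [1 x]^T
c = cp.Variable()                        # scalar multiplier of g(x) = 1 - x^2

constraints = [
    Q >> 0,              # sigma_0 is a sum of squares
    c >= 0,
    Q[0, 0] + c == 2,    # match the constant coefficient of f
    2 * Q[0, 1] == 1,    # match the coefficient of x
    Q[1, 1] - c == 0,    # match the coefficient of x^2 (f has none)
]
prob = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility SDP
prob.solve()
print(prob.status)  # 'optimal' means a degree-2 Putinar certificate exists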

Krivine-Handelman-Vasilescu Positivstellensatz

Assumption I: With $g_0 = 1$, the family $\{g_0,\dots,g_m\}$ generates the algebra $\mathbb{R}[x]$, that is, $\mathbb{R}[x_1,\dots,x_n] = \mathbb{R}[g_0,\dots,g_m]$.

Assumption II: Recall that $\mathbf{K}$ is compact. Hence we may also assume, with no loss of generality (but possibly after scaling), that for every $j = 1,\dots,m$:
$$0 \le g_j(x) \le 1 \qquad \forall x \in \mathbf{K}.$$

Notation: for $\alpha, \beta \in \mathbb{N}^m$, let
$$g(x)^\alpha := g_1(x)^{\alpha_1} \cdots g_m(x)^{\alpha_m}, \qquad (1 - g(x))^\beta := (1 - g_1(x))^{\beta_1} \cdots (1 - g_m(x))^{\beta_m}.$$

Theorem (Krivine-Vasilescu Positivstellensatz)
Let Assumption I and Assumption II hold. If $f \in \mathbb{R}[x_1,\dots,x_n]$ is POSITIVE on $\mathbf{K}$, then
$$f(x) = \sum_{\alpha,\beta \in \mathbb{N}^m} c_{\alpha\beta}\, g(x)^\alpha (1 - g(x))^\beta, \qquad \forall x \in \mathbb{R}^n,$$
for finitely many positive coefficients $(c_{\alpha\beta})$.

Testing whether
$$f(x) = \sum_{\alpha,\beta \in \mathbb{N}^m} c_{\alpha\beta}\, g(x)^\alpha (1 - g(x))^\beta, \qquad \forall x \in \mathbb{R}^n,$$
for finitely many positive coefficients $(c_{\alpha\beta})$ with $\sum_i (\alpha_i + \beta_i) \le d$ ... reduces to solving an LP!

Indeed, recall that $f(x) = \sum_\gamma f_\gamma\, x^\gamma$. So expand
$$\sum_{\alpha,\beta \in \mathbb{N}^m} c_{\alpha\beta}\, g(x)^\alpha (1 - g(x))^\beta = \sum_{\gamma \in \mathbb{N}^n} \theta_\gamma(c)\, x^\gamma$$
and state that
$$f_\gamma = \theta_\gamma(c) \quad \forall \gamma \in \mathbb{N}^n_{2d}; \qquad c \ge 0:$$
a linear system!
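For comparison with the SDP sketch above, here is the analogous fixed-degree LP check, a minimal sketch (not from the slides; it assumes numpy and scipy): test whether $f(x) = x + 2$ admits a certificate of degree at most 1 on $K = [0, 1]$ (so $g_1 = x$ and $1 - g_1 = 1 - x$), i.e. $f = c_{00} + c_{10}\,x + c_{01}\,(1 - x)$ with $c \ge 0$.

import numpy as np
from scipy.optimize import linprog

# Columns: the products 1, x, (1 - x); rows: monomial coefficients (1 and x).
A_eq = np.array([[1.0, 0.0, 1.0],    # constant terms of the three products
                 [0.0, 1.0, -1.0]])  # coefficients of x
b_eq = np.array([2.0, 1.0])          # f(x) = 2 + 1*x

# Pure feasibility LP with c >= 0 and a zero objective.
res = linprog(c=np.zeros(3), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
print(res.status == 0, res.x)  # status 0 means a degree-1 certificate exists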

DUAL side: the $\mathbf{K}$-moment problem

Let $\{X^\alpha\}$ be the canonical basis of $\mathbb{R}[X]$, and let $y := \{y_\alpha\}$ be a given sequence indexed in that basis. Recall the $\mathbf{K}$-moment problem: given $\mathbf{K} \subseteq \mathbb{R}^n$, does there exist a measure $\mu$ on $\mathbf{K}$ such that
$$y_\alpha = \int_{\mathbf{K}} X^\alpha \, d\mu, \qquad \forall \alpha \in \mathbb{N}^n$$
(where $X^\alpha = X_1^{\alpha_1} \cdots X_n^{\alpha_n}$)?

Given $y = \{y_\alpha\}$, let $L_y : \mathbb{R}[X] \to \mathbb{R}$ be the linear functional
$$f \;\Big(= \sum_\alpha f_\alpha X^\alpha\Big) \;\longmapsto\; L_y(f) := \sum_{\alpha \in \mathbb{N}^n} f_\alpha\, y_\alpha.$$

Moment matrix $M_d(y)$, with rows and columns also indexed in the basis $\{X^\alpha\}$:
$$M_d(y)(\alpha, \beta) := L_y(X^{\alpha+\beta}) = y_{\alpha+\beta}, \qquad \alpha, \beta \in \mathbb{N}^n,\ |\alpha|, |\beta| \le d.$$

For instance, in $\mathbb{R}^2$ (rows and columns indexed by $1, X_1, X_2$):
$$M_1(y) = \begin{pmatrix} y_{00} & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ y_{01} & y_{11} & y_{02} \end{pmatrix}.$$

Importantly...
$$M_d(y) \succeq 0 \iff L_y(h^2) \ge 0 \quad \forall h \in \mathbb{R}[X]_d.$$

Localizing matrix

The "localizing matrix" $M_d(\theta\, y)$ w.r.t. a polynomial $\theta \in \mathbb{R}[X]$ with $\theta(x) = \sum_\gamma \theta_\gamma\, x^\gamma$ has its rows and columns also indexed in the basis $\{X^\alpha\}$ of $\mathbb{R}[X]_d$, with entries
$$M_d(\theta\, y)(\alpha, \beta) = L_y\big(\theta\, X^{\alpha+\beta}\big) = \sum_{\gamma \in \mathbb{N}^n} \theta_\gamma\, y_{\alpha+\beta+\gamma}, \qquad \alpha, \beta \in \mathbb{N}^n,\ |\alpha|, |\beta| \le d.$$

For instance, in $\mathbb{R}^2$ and with $\theta(x) := 1 - x_1^2 - x_2^2$ (rows and columns indexed by $1, X_1, X_2$):
$$M_1(\theta\, y) = \begin{pmatrix} y_{00} - y_{20} - y_{02} & y_{10} - y_{30} - y_{12} & y_{01} - y_{21} - y_{03} \\ y_{10} - y_{30} - y_{12} & y_{20} - y_{40} - y_{22} & y_{11} - y_{31} - y_{13} \\ y_{01} - y_{21} - y_{03} & y_{11} - y_{31} - y_{13} & y_{02} - y_{22} - y_{04} \end{pmatrix}.$$

Importantly...
$$M_d(\theta\, y) \succeq 0 \iff L_y(h^2\, \theta) \ge 0 \quad \forall h \in \mathbb{R}[X]_d.$$
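As a sanity check on these definitions, a minimal numpy sketch (not from the slides; the choice of measure is arbitrary): build $M_1(y)$ and $M_1(\theta\, y)$ from empirical moments of the uniform measure on the unit disk, where $\theta(x) = 1 - x_1^2 - x_2^2 \ge 0$ on the support, and verify that both matrices are PSD.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(200_000, 2))
pts = pts[np.linalg.norm(pts, axis=1) <= 1.0]   # uniform measure on the unit disk

def y(a, b):
    """Empirical moment y_{ab} = E[x1^a * x2^b]."""
    return float(np.mean(pts[:, 0] ** a * pts[:, 1] ** b))

basis = [(0, 0), (1, 0), (0, 1)]                # monomials 1, x1, x2
M1 = np.array([[y(a1 + b1, a2 + b2) for (b1, b2) in basis]
               for (a1, a2) in basis])
# Entry (alpha, beta) of M_1(theta y): y_{a+b} - y_{a+b+(2,0)} - y_{a+b+(0,2)}
M1_theta = np.array([[y(a1 + b1, a2 + b2)
                      - y(a1 + b1 + 2, a2 + b2)
                      - y(a1 + b1, a2 + b2 + 2)
                      for (b1, b2) in basis] for (a1, a2) in basis])
print(np.linalg.eigvalsh(M1))        # all >= 0 (up to sampling noise)
print(np.linalg.eigvalsh(M1_theta))  # all >= 0: support lies where theta >= 0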

Putinar's dual conditions

Again $\mathbf{K} := \{\, x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1,\dots,m \,\}$.

Assumption 1: For some $M > 0$, the quadratic polynomial $M - \|X\|^2$ is in the quadratic module $Q(g_1,\dots,g_m)$.

Theorem (Putinar: dual side)
Let $\mathbf{K}$ be compact, and let Assumption 1 hold. Then a sequence $y = (y_\alpha)$, $\alpha \in \mathbb{N}^n$, has a representing measure $\mu$ on $\mathbf{K}$ if and only if
$$(**) \qquad L_y(f^2) \ge 0; \quad L_y(f^2\, g_j) \ge 0,\ j = 1,\dots,m; \qquad \forall f \in \mathbb{R}[X].$$

Checking whether $(**)$ holds for all $f \in \mathbb{R}[X]$ with degree at most $d$ reduces to checking whether $M_d(y) \succeq 0$ and $M_d(g_j\, y) \succeq 0$ for all $j = 1,\dots,m$: $m + 1$ LMI conditions to verify!

Krivine-Vasilescu: dual side

Theorem
Let $\mathbf{K}$ be compact, and let Assumptions I and II hold. Then the sequence $y = (y_\alpha)$, $\alpha \in \mathbb{N}^n$, has a representing measure $\mu$ on $\mathbf{K}$ if and only if
$$L_y\big(g^\alpha (1 - g)^\beta\big) \ge 0, \qquad \forall \alpha, \beta \in \mathbb{N}^m.$$
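A quick numerical illustration of these dual conditions (not from the slides; the instance is mine): for $\mathbf{K} = [0, 1]$ with $g_1(x) = x$, the Lebesgue moments are $y_k = 1/(k+1)$, and each $L_y(x^i (1-x)^j)$ equals the Beta integral $i!\,j!/(i+j+1)! > 0$, as the theorem requires.

from math import comb, factorial

y = lambda k: 1.0 / (k + 1)   # moments of the Lebesgue measure on [0, 1]
# L_y(x^i (1-x)^j) via the binomial expansion of (1-x)^j
L = lambda i, j: sum(comb(j, k) * (-1) ** k * y(i + k) for k in range(j + 1))

for i in range(4):
    for j in range(4):
        assert abs(L(i, j) - factorial(i) * factorial(j) / factorial(i + j + 1)) < 1e-12
        assert L(i, j) > 0
print("all Krivine-Vasilescu moment conditions hold for Lebesgue on [0, 1]")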

LP-relaxations

With $f \in \mathbb{R}[x]$, consider the hierarchy of LP-relaxations
$$\rho_d = \min_y \Big\{\, L_y(f) \;:\; L_y\big(g^\alpha (1 - g)^\beta\big) \ge 0,\ |\alpha + \beta| \le 2d; \quad L_y(1) = 1 \,\Big\}$$

with the associated sequence of dual LPs
$$\rho_d^* = \max_{\lambda,\, c_{\alpha\beta}} \Big\{\, \lambda \;:\; f - \lambda = \sum_{\alpha,\beta \in \mathbb{N}^m} c_{\alpha\beta}\, g^\alpha (1 - g)^\beta; \quad c_{\alpha\beta} \ge 0,\ |\alpha + \beta| \le 2d \,\Big\},$$
and of course $\rho_d^* = \rho_d$ for all $d$.

Theorem
Assume that $\mathbf{K}$ is compact and Assumptions I and II hold. Then the LP-relaxations CONVERGE, that is, $\rho_d \uparrow f^*$ as $d \to \infty$.

The SHERALI-ADAMS RLT hierarchy is exactly this type of LP-relaxation. Its convergence for 0/1 programs was originally proved with ad-hoc arguments. In fact, the rationale behind such convergence is the Krivine-Vasilescu Positivstellensatz.

Some remarks on LP-relaxations

1. Notice the presence of binomial coefficients in both the primal and dual LP-relaxations... which yields numerical ill-conditioning for relatively large $d$.

2. Let $x^* \in \mathbf{K}$ be a global minimizer and, for $x \in \mathbf{K}$, let $J(x)$ be the set of active constraints at $x$, i.e., those with $g_j(x) = 0$ or $1 - g_j(x) = 0$. Then FINITE convergence CANNOT occur if there exists a nonoptimal $\tilde{x} \in \mathbf{K}$ with $J(x^*) \subseteq J(\tilde{x})$! (Indeed, in any representation $f - f^* = \sum c_{\alpha\beta}\, g^\alpha (1 - g)^\beta$, evaluating at $x^*$ forces every term with $c_{\alpha\beta} > 0$ to contain a factor active at $x^*$; all those terms then vanish at $\tilde{x}$ as well, yielding the contradiction $f(\tilde{x}) = f^*$.) And so finite convergence is not possible for CONVEX problems in general! For instance, if $\mathbf{K}$ is a polytope, then finite convergence is possible only if every global minimizer is a vertex of $\mathbf{K}$!

Example
Consider the CONVEX problem
$$f^* = \min_x \{\, x(x - 1) \;:\; 0 \le x \le 1 \,\},$$
so that $x^* = 0.5$ and $f^* = -0.25$. One CANNOT write
$$f(x) - f^* = f(x) + 0.25 = \sum_{i,j \in \mathbb{N}} c_{ij}\, x^i (1 - x)^j, \qquad c_{ij} \ge 0,$$
because evaluating at $x^*$ would give
$$0 = f(x^*) + 0.25 = \sum_{i,j \in \mathbb{N}} c_{ij}\, 2^{-i-j} > 0.$$
In addition, the convergence $\rho_d \to -0.25$ is very slow...
$$\rho_2 = \rho_4 = -1/3; \quad \rho_6 = -0.3; \quad \rho_{10} = -0.27, \dots$$
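These $\rho_d$ values can be reproduced numerically. A minimal sketch (not from the slides; assumes numpy and scipy) of the level-$d$ primal LP-relaxation for this instance, over pseudo-moments $y_0, \dots, y_{2d}$ with $g_1 = x$ and $1 - g_1 = 1 - x$:

import numpy as np
from math import comb
from scipy.optimize import linprog

def rho(d):
    """Level-d LP bound for min { x(x-1) : 0 <= x <= 1 }."""
    n_mom = 2 * d + 1                        # pseudo-moments y_0, ..., y_{2d}
    rows = []
    for i in range(2 * d + 1):
        for j in range(2 * d + 1 - i):       # all products x^i (1-x)^j, i + j <= 2d
            # L_y(x^i (1-x)^j) = sum_k C(j,k) (-1)^k y_{i+k} >= 0
            row = np.zeros(n_mom)
            for k in range(j + 1):
                row[i + k] += comb(j, k) * (-1) ** k
            rows.append(-row)                # linprog expects A_ub @ y <= b_ub
    obj = np.zeros(n_mom); obj[2], obj[1] = 1.0, -1.0   # L_y(f) = y_2 - y_1
    A_eq = np.zeros((1, n_mom)); A_eq[0, 0] = 1.0       # L_y(1) = y_0 = 1
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=A_eq, b_eq=[1.0], bounds=[(None, None)] * n_mom)
    return res.fun

print([rho(d) for d in (1, 2, 3)])  # slowly increasing lower bounds on f* = -0.25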

Consider now the CONCAVE minimization problem
$$f^* = \min_x \{\, x(1 - x) \;:\; 0 \le x \le 1 \,\},$$
so that $f^* = 0$ and $x^* = 0$ or $x^* = 1$ (both vertices of $\mathbf{K}$). Here
$$f(x) - f^* = x\,(1 - x), \qquad \forall x \in \mathbb{R},$$
so that the first LP-relaxation is exact!!

Hence we have the PARADOX that... the LP-relaxations behave much better for the (difficult) concave problem than for the (easy) convex one!!

LP- and SDP-relaxations with their duals

Primal LP-relaxation:
$$\min_y\ L_y(f) \quad \text{s.t.} \quad L_y\big(g^\alpha (1 - g)^\beta\big) \ge 0, \qquad \alpha, \beta \in \mathbb{N}^m,\ |\alpha + \beta| \le 2d.$$

Primal SDP-relaxation (with $g_0 = 1$):
$$\min_y\ L_y(f) \quad \text{s.t.} \quad L_y(h^2\, g_j) \ge 0 \quad \forall h \ \text{with} \ \deg(h^2 g_j) \le 2d, \qquad j = 0, 1, \dots, m.$$

Dual LP-relaxation:
$$\max_{\lambda, \{c_{\alpha\beta}\}}\ \lambda \quad \text{s.t.} \quad f - \lambda = \sum_{\alpha,\beta \in \mathbb{N}^m} c_{\alpha\beta}\, g^\alpha (1 - g)^\beta; \qquad c_{\alpha\beta} \ge 0,\ |\alpha + \beta| \le 2d.$$

Dual SDP-relaxation:
$$\max_{\lambda, \{\sigma_j\}}\ \lambda \quad \text{s.t.} \quad f - \lambda = \sum_{j=0}^{m} \sigma_j\, g_j; \qquad \sigma_j \ \text{s.o.s.},\ \deg(\sigma_j\, g_j) \le 2d,\ j = 0, \dots, m.$$

That is, the LP-relaxations certify positivity with nonnegative scalars $(c_{\alpha\beta})$ (a linear characterization), whereas the SDP-relaxations use s.o.s. multipliers $(\sigma_j)$ (a semidefinite characterization).
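By contrast with the slow LP convergence on the convex example above, the very first SDP-relaxation is already exact there. A minimal sketch (not from the slides; it assumes Python with cvxpy and an SDP solver) of the order-1 moment relaxation of $\min\{\, x(x-1) : 0 \le x \le 1 \,\}$, with $g_1 = x$ and $g_2 = 1 - x$:

import cvxpy as cp

M = cp.Variable((2, 2), symmetric=True)   # moment matrix M_1(y) in the basis [1, x]
y1, y2 = M[0, 1], M[1, 1]                 # y1 = L_y(x), y2 = L_y(x^2)
constraints = [
    M >> 0,            # M_1(y) is PSD
    M[0, 0] == 1,      # L_y(1) = 1
    y1 >= 0,           # M_0(g_1 y) = L_y(x) >= 0
    1 - y1 >= 0,       # M_0(g_2 y) = L_y(1 - x) >= 0
]
prob = cp.Problem(cp.Minimize(y2 - y1), constraints)   # L_y(f) = y_2 - y_1
prob.solve()
print(prob.value)  # approx. -0.25: the first SDP level is exact here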