A JOINT+MARGINAL APPROACH TO PARAMETRIC POLYNOMIAL OPTIMIZATION


JEAN B. LASSERRE

Abstract. Given a compact parameter set Y ⊂ R^p, we consider polynomial optimization problems (P_y) on R^n whose description depends on the parameter y ∈ Y. We assume that one can compute all moments of some probability measure ϕ on Y, absolutely continuous with respect to the Lebesgue measure (e.g. Y is a box or a simplex and ϕ is uniformly distributed). We then provide a hierarchy of semidefinite relaxations whose associated sequence of optimal solutions converges to the moment vector of a probability measure that encodes all information about all global optimal solutions x*(y) of P_y, as y varies in Y. In particular, one may approximate as closely as desired any polynomial functional of the optimal solutions, like e.g. their ϕ-mean. In addition, using this knowledge on moments, the measurable function y ↦ x*_k(y) of the k-th coordinate of optimal solutions can be estimated, e.g. by maximum-entropy methods. Also, for a boolean variable x_k, one may approximate as closely as desired its persistency ϕ({y : x*_k(y) = 1}), i.e. the probability that in an optimal solution x*(y), the coordinate x*_k(y) takes the value 1. Last but not least, from an optimal solution of the dual semidefinite relaxations, one obtains a sequence of polynomial (resp. piecewise polynomial) lower approximations with L_1(ϕ) (resp. ϕ-almost uniform) convergence to the optimal value function.

Key words. Parametric and polynomial optimization; semidefinite relaxations

AMS subject classifications. 65D15, 65K05, 46N10, 90C22

1. Introduction. Roughly speaking, given a set Y of parameters and an optimization problem whose description depends on y ∈ Y (call it P_y), parametric optimization is concerned with the behavior and properties of the optimal value as well as primal (and possibly dual) optimal solutions of P_y, when y varies in Y. This is quite a challenging problem and in general one may only obtain information locally around some nominal value y_0 of the parameter. There is a vast and rich literature on the topic and for a detailed treatment, the interested reader is referred to e.g. Bonnans and Shapiro [5] and the many references therein. Sometimes, in the context of optimization with data uncertainty, some probability distribution ϕ on the parameter set Y is available, and in this context one is also interested in e.g. the distribution of the optimal value and of the optimal solutions, all viewed as random variables. In fact, very often only some moments of ϕ are known (typically first- and second-order moments), and the goal of this more realistic moment-based approach is to obtain optimal bounds over all distributions that share this moment information. In particular, for discrete optimization problems where the coefficients of the cost vector (of the objective function to optimize) are random variables with joint distribution ϕ, some bounds on the expected optimal value have been obtained. More recently, Natarajan et al. [20] extended the earlier work in [4] to provide a convex optimization problem for computing the so-called persistency values¹ of (discrete) variables, for a particular distribution ϕ in a certain set Φ of distributions that share some common moment information. However, the resulting second-order cone program requires knowledge of the convex hull of some discrete set of points, which

LAAS-CNRS and Institute of Mathematics, University of Toulouse, 7 Avenue du Colonel Roche, 31077 Toulouse Cédex 4, France (lasserre@laas.fr).

¹ Given a 0-1 optimization problem max{c′x : x ∈ X ⊆ {0, 1}^n} and a distribution ϕ on c, the persistency value of the variable x_i is Prob_ϕ(x*_i = 1) at an optimal solution x*(c) = (x*_i).

2 2 Jean B. LASSERRE is possible when the number of vertices is small. The approach is nicely illustrated on a discrete choice problem and a stochastic knapsack problem. For more details on persistency in discrete optimization, the interested reader is referred to [2] and the references therein. Recently Natarajan et al. [21] have considered mixed zero-one linear programs with uncertainty in the objective function and first- and second-order moment information. They show that computing the supremum of the expected optimal value (where the supremum is over all distributions sharing this moment information) reduces to solving a completely positive program. Moment-based approaches also appear in robust optimization and stochastic programming under data uncertainty, where the goal is different since in this context, decisions of interest must be taken before the uncertain data is known. For these minmax type problems, a popular approach initiated in Arrow et al. [2] is to optimize decisions against the worst possible distribution θ (on uncertain data) taken in some set Φ of candidate distributions that share some common moment information (in general first- and second-order moments). Recently Delage and e [9]) have even considered the case where only a confidence interval is known for first- and second-order moments. For a nice discussion the interested reader is referred to [9] and the many references therein. In the context of solving systems of polynomial equations whose coefficients are themselves polynomials of some parameter y, specific parametric methods exist. For instance, one may compute symbolically once and for all, what is called a comprehensive Gröbner basis, i.e., a fixed basis that, for all instances of y, is a Gröbner basis of the ideal associated with the polynomials in the system of equations; see Weispfenning [29] and more recently Rostalski [23] for more details. Then when needed, and for any specific value of the parameter y, one may compute all complex solutions of the system of equations, e.g. by the eigenvalue method of Möller and Stetter [19]. However, one still needs to apply the latter method for each value of the prameter y. A similar two-step approach is also proposed for homotopy (instead of Gröbner bases) methods in [23]. The purpose of this paper devoted to parametric optimization is to show that in the case of polynomial optimization, all information about the optimal value and optimal solutions can be obtained, or at least, approximated as closely as desired. Contribution. We here restrict our attention to parametric polynomial optimization, that is, when P y is described by polynomial equality and inequality constraints on both the parameter vector y and the optimization variables x. Moreover, the set is restricted to be a compact basic semi-algebraic set of R p, and preferably a set sufficiently simple so that one may obtain the moments of some probability measure on, absolutely continuous with respect to the Lebesgue measure (or with respect to the counting measure if is discrete). For instance if is a simple set (like a simplex, a box) one may choose ϕ to be the probability measure uniformly distributed on ; typical candidates are polyhedra. Or sometimes, in the context of optimization with data uncertainty, ϕ is already specified. In this specific context we are going to show that one may get insightful information on the set of all global minimizers of P y and on the global optimum, via what we call a joint+marginal approach. 
Our contribution is as follows: (a) Call J(y) (resp. X*_y ⊂ R^n) the optimal value (resp. the set of optimal solutions) of P_y for the value y ∈ Y of the parameter. We first define an infinite-

3 Parametric polynomial optimization 3 dimensional optimization problem P whose optimal value is exactly ρ = J(y)dϕ(y), i.e. the ϕ-mean of the global optimum. Any optimal solution of P is a probability measure µ on R n R p with marginal ϕ on R p. It turns out that µ encodes all information on the set of global minimizers X y, y. Whence the name joint+marginal as µ is a joint distribution of x and y, and ϕ is the marginal of µ on R p. (b) Next, we provide a hierarchy of semidefinite relaxations of P with associated sequence of optimal values (ρ i ) i, in the spirit of the hierarchy defined in [15]. An optimal solution of the i-th semidefinite relaxation is a sequence z i =(z i αβ ) indexed in the monomial basis (x α y β ) of the subspace R[x, y] 2i of polynomials of degree at most 2i. If for ϕ-almost all y, P y has a unique global minimizer x (y) R n, then as i, z i converges pointwise to the sequence of moments of µ defined in (a). In particular, one obtains the distribution of the optimal solution x (y), and therefore, one may approximate as closely as desired any polynomial functional of the solution x (y), like e.g. the ϕ-mean or variance of x (y). In addition, if the optimization variable x k is boolean then one may approximate as closely as desired its persistency ϕ({y : x k (y) = 1} (i.e., the probability that x k (y) = 1 in an optimal solution x (y)), as well as a a necessary and sufficient condition for this persistency to be 1. (c) Let e(k) N n be the vector with e j (k) =δ j=k, j =1,..., n. Then as i, and for every β N p, the sequence (ze(k)β i ) converges to z kβ := yβ g k (y)dϕ(y) for the measurable function y g k (y) := x k (y). In other words, the sequence (z kβ ) β N p is the moment sequence of the measure dψ(y) := x k (y)dϕ(y) on. And so, the k-th coordinate function y x k (y) of the global minimizer of P y, y, can be estimated, e.g. by maximum entropy methods. Of course, the latter estimation is not pointwise but it still provides useful information on optimal solutions, e.g. the shape of the function y x k (y), especially if the function x k ( ) is continuous, as illustrated on some simple examples. For instance, for parametric polynomial equations, one may use this estimation of x (y) as an initial point for Newton s method for any given value of the parameter y. (d) At last but not least, from an optimal solution of the dual of the i-th semidefinite relaxation, one obtains a piecewise polynomial approximation of the optimal value function y J(y), that converges ϕ-almost surely to J. Finally, the computational complexity of the above methodology is roughly the same as the moment approach described in [15] for an optimization problem with n + p variables since we consider the joint distribution of the n variables x and the p parameters y. Hence, the approach is particularly interesting when the number of parameters is small, say 1 or 2. In addition, in the latter case the max-entropy estimation has been shown to be very efficient in several examples in the literature; see e.g. [6, 26, 27]. However, in view of the present status of SDP solvers, if no sparsity or symmetry is taken into account as proposed in e.g. [17], the approach is limited to small to medium size polynomial optimization problems. Alternatively one may also use LP-relaxations which can handle larger size problems but with probably less precise results because of their poor convergence properties in general. But this computational price may not seem that high in view of the ambitious goal of the approach. 
After all, keep in mind that by applying the moment approach to a single (n + p)-variables problem, one obtains information on global optimal solutions of an n-variables problem that depends on p parameters, that is, one approximates n functions of p variables!

2. A related linear program. For a Borel space X let B(X) denote the Borel σ-field associated with X. Let R[x, y] denote the ring of polynomials in the variables x = (x_1, ..., x_n) and the variables y = (y_1, ..., y_p), whereas R[x, y]_k denotes its subspace of polynomials of degree at most k. Let Σ[x, y] ⊂ R[x, y] denote the subset of polynomials that are sums of squares (in short s.o.s.). For a real symmetric matrix A, the notation A ⪰ 0 stands for "A is positive semidefinite".

The parametric optimization problem. Let Y ⊂ R^p be a compact set, called the parameter set, and let f, h_j : R^n × R^p → R, j = 1, ..., m, be continuous. For each y ∈ Y, fixed, consider the following optimization problem:

J(y) := inf_x { f_y(x) : h_{jy}(x) ≥ 0, j = 1, ..., m } (2.1)

where the functions f_y, h_{jy} : R^n → R are defined via

x ↦ f_y(x) := f(x, y); x ↦ h_{jy}(x) := h_j(x, y), j = 1, ..., m, for all x ∈ R^n, y ∈ R^p.

Next, let K ⊂ R^n × R^p be the set

K := { (x, y) : y ∈ Y; h_j(x, y) ≥ 0, j = 1, ..., m }, (2.2)

and for each y ∈ Y, let

K_y := { x ∈ R^n : h_j(x, y) ≥ 0, j = 1, ..., m }. (2.3)

The interpretation is as follows: Y is a set of parameters and for each instance y of the parameter, one wishes to compute an optimal decision vector x*(y) that solves problem (2.1). Let ϕ be a Borel probability measure on Y, with a positive density with respect to the Lebesgue measure on the smallest affine variety that contains Y. For instance, choose for ϕ the probability measure

ϕ(B) := ( ∫_Y dy )^{-1} ∫_B dy, B ∈ B(Y),

uniformly distributed on Y (assuming of course that Y has nonempty interior). Of course, one may also treat the case of a discrete set Y of parameters (finite or countable) by taking for ϕ a discrete probability measure on Y with strictly positive weight at each point of the support. Sometimes, e.g. in the context of optimization with data uncertainty, ϕ is already specified. We will use ϕ (or more precisely, its moments) to get information on the distribution of optimal solutions x*(y) of P_y, viewed as random vectors. In the rest of the paper we assume that for every y ∈ Y, the set K_y in (2.3) is nonempty.

2.1. A related infinite-dimensional linear program. Let M(K)_+ be the set of finite Borel measures on K, and consider the following infinite-dimensional linear program P:

ρ := inf_{µ ∈ M(K)_+} { ∫_K f dµ : πµ = ϕ } (2.4)

5 Parametric polynomial optimization 5 where πµ denotes the marginal of µ on R p, that is, πµ is a probability measure on R p defined by πµ(b) := µ(r n B), B B(R p ). Notice that µ() = 1 for any feasible solution µ of P. Indeed, as ϕ is a probability measure and πµ = ϕ one has 1 = ϕ() =µ(r n R p )=µ(). Recall that for two Borel spaces X,, the graph Gr ψ X of a set-valued mapping ψ : X is the set Gr ψ := {(x, y) : x X ; y ψ(x) }. If ψ is measurable then any measurable function h : X with h(x) ψ(x) for every x X, is called a measurable selector. (See 4.1 for more details.) Lemma 2.1. Let both R n and in (2.2) be compact. Then the set-valued mapping y y is Borel-measurable. In addition: (a) The mapping y J(y) is measurable. (b) There exists a measurable selector g : y such that J(y) =f(g(y), y) for every y. Proof. As and are both compact, the set valued mapping y y R n is compact-valued. Moreover, the graph of y is by definition the set, which is a Borel subset of R n R p. Next, since x f y (x) is continuous for every y, (a) and (b) follow from Proposition 4.3 and 4.4. Theorem 2.2. Let both R p and in (2.2) be compact and assume that for every y, the set y R n in (2.3) is nonempty. Let P be the optimization problem (2.4) and let X y := {x Rn : f(x, y) =J(y)}, y. Then: (a) ρ = J(y) dϕ(y) and P has an optimal solution. (b) For every optimal solution µ of P, and for ϕ-almost all y, there is a probability measure ψ (dx y) on R n, concentrated on X y, such that: µ (C B) = ψ (C y) dϕ(y), B B(), C B(R n ). (2.5) B (c) Assume that for ϕ-almost all y, the set of minimizers of X y is the singleton {x (y)} for some x (y) y. Then there is a measurable mapping g : y such that g(y) =x (y) for every y ; ρ = f(g(y), y) dϕ(y), (2.6) and for every α N n, and β N p : x α y β dµ (x, y) = y β g(y) α dϕ(y). (2.7) Proof. (a) As is compact then so is y for every y. Next, as y for every y and f is continuous, the set X y := {x Rn : f(x, y) =J(y)} is nonempty for every y. Let µ be any feasible solution of P and so by definition, its marginal on R p is just ϕ. Since X y, y, one has f y (x) J(y) for all x y and all y. So, f(x, y) J(y) for all (x, y) and therefore fdµ J(y) dµ = J(y) dϕ,

6 6 Jean B. LASSERRE which proves that ρ J(y) dϕ. On the other hand, recall that y, y. Consider the set-valued mapping y X y y. As f is continuous and is compact, then X y is compact-valued. In addition, as f y is continuous, by Proposition 4.4 there exists a measurable selector g : X y (and so f(g(y), y) =J(y)). Therefore, for every y, let ψ y := δ g(y) be the Dirac probability measure with support on the singleton g(y) X y, and let µ be the probability measure on defined by: µ(c B) := 1 C (g(y)) ϕ(dy), B B(R p ),C B(R n ). B (The measure µ is well-defined because g is measurable.) Then µ is feasible for P and [ ] ρ f dµ = f(x, y) dδ g(y) dϕ(y) = y f(g(y), y) dϕ(y) = J(y) dϕ(y), which shows that µ is an optimal solution of P and ρ = J(y)dϕ(y). (b) Let µ be an arbitrary optimal solution of P, hence on R n and concentrated on Gr y = y. Therefore by Proposition 4.6 the probability measure µ can be disintegrated as µ (C B) := ψ (C y) dϕ(y), B B(), C B(R n ), B where for all y, ψ ( y) is a probability measure on y. (The object ψ ( ) is called a stochastic kernel; see Proposition 4.6.) Hence from (a), ρ = J(y) dϕ(y) = f(x, y) dµ (x, y) ( ) = f(x, y) ψ (dx y) dϕ(y). y Therefore, using f(x, y) J(y) on, = J(y) f(x, y) ψ (dx y) dϕ(y), y }{{} which implies ψ (X (y) y) = 1 for ϕ-almost all y. (c) Let g : y be the measurable mapping of Lemma 2.1(b). As J(y) = f(g(y), y) and (g(y), y) then necessarily g(y) X y for every y. Next, let µ be an optimal solution of P, and let α N n, β N p. Then ( ) x α y β dµ (x, y) = y β x α ψ (dx y) dϕ(y) X y = y β g(y) α dϕ(y),

7 Parametric polynomial optimization 7 the desired result. An optimal solution µ of P encodes all information on the optimal solutions x (y) of P y. For instance, let B be a given Borel set of R n. Then from Theorem 2.2, Prob (x (y) B) =µ (B R p )= ψ (B y) dϕ(y), with ψ as in Theorem 2.2(b). Consequently, if one knows an optimal solution µ of P then one may evaluate functionals on the solutions of P y, y. That is, assuming that for ϕ-almost all y, problem P y has a unique optimal solution x (y), and given a measurable mapping h : R n R q, one may evaluate the functional h(x (y)) dϕ(y). For instance, with x h(x) := x one obtains the mean vector E ϕ (x (y)) := x (y)dϕ(y) of optimal solutions x (y), y. Corollary 2.3. Let both R p and in (2.2) be compact. Assume that for every y, the set y R n in (2.3) is nonempty, and for ϕ-almost all y, the set X y := {x y : J(y) =f(x, y)} is the singleton {x (y)}. Then for every measurable mapping h : R n R q, h(x (y)) dϕ(y) = h(x) dµ (x, y). (2.8) where µ is an optimal solution of P. Proof. By Theorem 2.2(c) [ ] h(x) dµ (x, y) = h(x)ψ (dx y) X y dϕ(y) = h(x (y)) dϕ(y). Remark 2.4. If the set X (y) is not a singleton on some set with positive ϕ-measure then Theorem 2.2(c) now becomes: For each α N n, there exists a measurable mapping g α : R such that: ( ) x α y β dµ = y β x α ψ (dx y) dy = y β g α (y) dy, X y where y g α (y) = x α ψ (dx y) = E [x α y], y, X y and E [ y] denotes the conditional expectation operator associated with µ Duality. Consider the following infinite-dimensional linear program P : ρ := sup p dϕ p R[y] (2.9) f(x, y) p(y) (x, y).

8 8 Jean B. LASSERRE Then P is a dual of P. Lemma 2.5. Let both R p and in (2.2) be compact and let P and P be as in (2.4) and (2.9) respectively. Then there is no duality gap, i.e., ρ = ρ. Proof. For a topological space X denote by C(X ) the space of bounded continuous functions on X. Let M() be the vector space of finite signed Borel measures on (and so M() is its positive cone). Let π : M() M() be defined by (πµ)(b) = µ((r n B) ) for all B B(), with adjoint mapping π : C() C() defined as (x, y) (π h)(x, y) := h(y), h C(). Put (2.4) in the framework of infinite-dimensional linear programs on vector spaces, as described in e.g. [1]. That is: with dual: ρ = inf { f, µ : πµ = ϕ, µ }, µ M() ρ = sup { h, ϕ : f π h h C() on }. Endow M() (respectively M()) with the weak topology σ(m(),c()) (respectively σ(m(),c()). One first proves that ρ = ρ and then ρ = ρ. By [1, Theor. 3.1], to get ρ = ρ, it suffices to prove that the set D := {(πµ, f, µ ) :µ M()} is closed for the respective weak topologies σ(m() R,C() R) of M() R and σ(m(),c()) of M(). Therefore consider a converging sequence πµ n a with µ n M(). The sequence (µ n ) is uniformly bounded because µ n () = (πµ n )() = 1, πµ n 1,a = a(). But by the Banach-Alaoglu Theorem (see e.g. [3, Theor ]), the bounded closed sets of M() are compact in the weak topology. And so µ nk µ for some µ M() and some subsequence (n k ). Next, observe that for h C() arbitrary, h, πµ nk = π h, µ nk π h, µ = h, πµ, where we have used that π h C(). Hence combining the above with πµ nk a, we obtain πµ = a. Similarly, f, µ nk f, µ because f C(). Hence D is closed and the desired result ρ = ρ follows. We next prove that ρ = ρ. Given ɛ> fixed arbitrary, there is a function h ɛ C() such that f h ɛ on and h ɛdϕ ρ ɛ. By compactness of and the Stone-Weierstrass theorem, there is p ɛ R[y] such that sup y h ɛ (y) p ɛ (y) ɛ. Hence the polynomial p ɛ := p ɛ ɛ is feasible with value p ɛdϕ ρ 3ɛ, and as ɛ was arbitrary, the result ρ = ρ follows. As next shown, optimal or nearly optimal solutions of P provide us with polynomial lower approximations of the optimal value function y J(y) that converges to J( ) in the L 1 (ϕ) norm. Moreover, one may also obtain a piecewise polynomial approximation that converges to J( ) ϕ-almost uniformly. (Recall that a sequence of measurable functions (g n ) on a measure space (, B(),ϕ) converges to gϕ-almost uniformly if and only if for every ɛ>, there is a set A B() such that ϕ(a) <ɛ and g n g uniformly on \ A.)

9 Parametric polynomial optimization 9 Corollary 2.6. Let both R p and in (2.2) be compact and assume that for every y, the set y is nonempty. Let P be as in (2.9). If (p i ) i N R[y] is a maximizing sequence of (2.9) then J(y) p i (y) dϕ as i. (2.1) Moreover, define the functions ( p i ) as follows: p := p, y p i (y) := max [ p i 1 (y),p i (y)], i =1, 2,... Then p i J( ) ϕ-almost uniformly. Proof. By Lemma 2.5, we already know that ρ = ρ and so p i (y) dϕ(y) ρ = ρ = J(y) dϕ. Next by feasibility of p i in (2.9) f(x, y) p i (y) (x, y) inf x y f(x, y) =J(y) p i (y) y. Hence (2.1) follows from p i (y) J(y) on. With y fixed, the sequence ( p i (y)) i is obviously monotone nondecreasing and bounded above by J(y), hence with a limit p (y) J(y). Therefore p i has the pointwise limit y p (y) J(y). Also, by the Montone convergence theorem, p i(y)dϕ(y) p (y)dϕ(y). This latter fact combined with (2.1) and p i (y) p i (y) J(y) yields = (J(y) p (y)) dϕ(y), which in turn implies that p (y) =J(y) for ϕ-almost all y. Therefore p i (y) J(y) for ϕ-almost all y. And so by Egoroff s Theorem [3, Theor ], p i J( ), ϕ-almost uniformly. 3. A hierarchy of semidefinite relaxations. In general, solving the infinitedimensional problem P and getting an optimal solution µ is impossible. One possibility is to use numerical discretization schemes on a box containing ; see for instance [14]. But in the present context of parametric optimization, if one selects finitely many grid points (x, y), one is implicitly considering solving (or rather approximating) P y for finitely many points y in a grid of, which we want to avoid. To avoid this numerical discretization scheme we will use specific features of P when its data f (resp. ) is a polynomial (resp. a compact basic semi-algebraic set). Therefore in this section we are now considering a polynomial parametric optimization problem, a special case of (2.1) as we assume the following: f R[x, y] and h j R[x, y], for every j =1,..., m. is compact and R p is a compact basic semi-algebraic set. Hence the set R n R p in (2.2) is a compact basic semi-algebraic set. We also assume that there is a probability measure ϕ on, absolutely continuous with respect to the Lebesgue measure, whose moments γ =(γ β ), β N p, are available. As already mentioned, if is a simple set (like e.g. a simplex or a box) then one may choose ϕ to be the probability measure uniformly distributed on, for which all moments can be computed easily. Sometimes, in the context of optimization with data uncertainty, the probability measure ϕ is already specified and in this case we assume that its moments γ =(γ β ), β N p, are available.
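When Y is a box and ϕ is the uniform probability measure, these moments factor coordinate-wise and are available in closed form. The short sketch below is our own illustration (not part of the original text; all names are ours): it computes γ_β = ∫_Y y^β dϕ(y) for ϕ uniform on a box Y = [a_1, b_1] × ... × [a_p, b_p], using ∫_a^b y^k dy / (b − a) = (b^{k+1} − a^{k+1}) / ((k+1)(b − a)). Similar closed-form (Dirichlet-type) expressions are available when Y is a simplex.

    from itertools import product

    def uniform_box_moment(beta, a, b):
        """gamma_beta = int_Y y^beta dphi(y), phi uniform on the box prod_i [a_i, b_i]."""
        g = 1.0
        for k, ai, bi in zip(beta, a, b):
            g *= (bi**(k + 1) - ai**(k + 1)) / ((k + 1) * (bi - ai))
        return g

    def uniform_box_moments(p, order, a=None, b=None):
        """All moments gamma_beta with |beta| <= order, as a dict {beta: value}."""
        a = [0.0] * p if a is None else a
        b = [1.0] * p if b is None else b
        return {beta: uniform_box_moment(beta, a, b)
                for beta in product(range(order + 1), repeat=p) if sum(beta) <= order}

    # Y = [0, 1]: gamma_k = 1/(k+1); these are the gamma_beta entering the relaxations defined next.
    print(uniform_box_moments(1, 6))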

3.1. Notation and preliminaries. Let N^n_i := {α ∈ N^n : |α| ≤ i}, with |α| = Σ_i α_i. With a sequence z = (z_{αβ}), α ∈ N^n, β ∈ N^p, indexed in the canonical basis (x^α y^β) of R[x, y], let L_z : R[x, y] → R be the linear mapping

f (= Σ_{αβ} f_{αβ} x^α y^β) ↦ L_z(f) := Σ_{αβ} f_{αβ} z_{αβ}, f ∈ R[x, y].

Moment matrix. The moment matrix M_i(z) associated with a sequence z = (z_{αβ}) has its rows and columns indexed in the canonical basis (x^α y^β), and with entries

M_i(z)((α, β), (δ, γ)) = L_z(x^α y^β · x^δ y^γ) = z_{(α+δ)(β+γ)},

for every α, δ ∈ N^n_i and every β, γ ∈ N^p_i.

Localizing matrix. Let q be the polynomial (x, y) ↦ q(x, y) := Σ_{u,v} q_{uv} x^u y^v. The localizing matrix M_i(q z) associated with q ∈ R[x, y] and a sequence z = (z_{αβ}) has its rows and columns indexed in the canonical basis (x^α y^β), and with entries

M_i(q z)((α, β), (δ, γ)) = L_z(q(x, y) x^α y^β x^δ y^γ) = Σ_{u ∈ N^n, v ∈ N^p} q_{uv} z_{(α+δ+u)(β+γ+v)},

for every α, δ ∈ N^n_i and every β, γ ∈ N^p_i.

A sequence z = (z_{αβ}) ⊂ R has a representing finite Borel measure supported on K if there exists a finite Borel measure µ on K such that

z_{αβ} = ∫_K x^α y^β dµ, for all α ∈ N^n, β ∈ N^p.

The next important result states a necessary and sufficient condition when K is compact and its defining polynomials (h_k) ⊂ R[x, y] satisfy some condition.

Assumption 3.1. Let (h_j)_{j=1}^t ⊂ R[x, y] be a given family of polynomials. There is some N such that the quadratic polynomial (x, y) ↦ N − ‖(x, y)‖² can be written

N − ‖(x, y)‖² = σ_0 + Σ_{j=1}^t σ_j h_j,

for some s.o.s. polynomials (σ_j)_{j=0}^t ⊂ Σ[x, y].

Theorem 3.2. Let K := {(x, y) : h_k(x, y) ≥ 0, k = 1, ..., t} and let (h_k)_{k=1}^t satisfy Assumption 3.1. A sequence z = (z_{αβ}) has a representing measure on K if and only if, for all i = 0, 1, ...,

M_i(z) ⪰ 0; M_i(h_k z) ⪰ 0, k = 1, ..., t.

Theorem 3.2 is a direct consequence of Putinar's Positivstellensatz [22] and [25]. Of course, when Assumption 3.1 holds then K is compact. On the other hand, if K is compact and one knows a bound N for ‖(x, y)‖ on K, then it suffices to add the redundant quadratic constraint h_{t+1}(x, y) (:= N² − ‖(x, y)‖²) ≥ 0 to the definition of K, and Assumption 3.1 holds.
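To fix ideas, the following small sketch is our own (the data layout is an assumption, not notation from the paper): it assembles M_i(z) and M_i(q z) from a moment sequence z stored as a dictionary keyed by the joint exponent in the n + p variables (x, y). The conditions of Theorem 3.2 can then be checked numerically through the smallest eigenvalues of these matrices, provided z contains all exponents up to the required degree.

    import numpy as np
    from itertools import combinations_with_replacement

    def monomials(nvars, max_deg):
        """Exponent tuples of all monomials of degree <= max_deg in nvars variables."""
        mons = []
        for d in range(max_deg + 1):
            for combo in combinations_with_replacement(range(nvars), d):
                e = [0] * nvars
                for j in combo:
                    e[j] += 1
                mons.append(tuple(e))
        return mons

    def moment_matrix(z, nvars, i):
        """M_i(z): the entry indexed by (alpha, delta) equals z[alpha + delta]."""
        B = monomials(nvars, i)
        return np.array([[z[tuple(np.add(a, b))] for b in B] for a in B])

    def localizing_matrix(z, q, nvars, i):
        """M_i(q z): q is a dict {exponent: coefficient} representing the polynomial q."""
        B = monomials(nvars, i)
        return np.array([[sum(c * z[tuple(np.add(np.add(a, b), u))] for u, c in q.items())
                          for b in B] for a in B])

    # Example: z = moments of the uniform probability measure on [0, 1]^2, i.e.
    # z[(a, b)] = 1/((a+1)(b+1)); by Theorem 3.2 its moment matrices must be PSD.
    z = {(a, b): 1.0 / ((a + 1) * (b + 1)) for a in range(9) for b in range(9)}
    print(np.linalg.eigvalsh(moment_matrix(z, 2, 2)).min() >= 0)                     # True
    print(np.linalg.eigvalsh(localizing_matrix(z, {(1, 0): 1.0}, 2, 2)).min() >= 0)  # True (q = x, nonnegative on [0,1]^2)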

3.2. Semidefinite relaxations. To compute (or at least, approximate) the optimal value ρ of problem P in (2.4), we now provide a hierarchy of semidefinite relaxations in the spirit of those defined in [15]. Let K ⊂ R^n × R^p be as in (2.2), and let Y ⊂ R^p be the compact semi-algebraic set defined by

Y := { y ∈ R^p : h_k(y) ≥ 0, k = m+1, ..., t } (3.1)

for some polynomials (h_k)_{k=m+1}^t ⊂ R[y]; let v_k := ⌈(deg h_k)/2⌉ for every k = 1, ..., t. Next, let γ = (γ_β), with γ_β = ∫_Y y^β dϕ(y), β ∈ N^p, be the moments of a probability measure ϕ on Y, absolutely continuous with respect to the Lebesgue measure, and let i_0 := max[⌈(deg f)/2⌉, max_k v_k]. For i ≥ i_0, consider the following semidefinite relaxations:

ρ_i = inf_z L_z(f)
s.t. M_i(z) ⪰ 0 (3.2)
M_{i−v_j}(h_j z) ⪰ 0, j = 1, ..., t
L_z(y^β) = γ_β, for all β ∈ N^p_{2i}.

Theorem 3.3. Let K, Y be as in (2.2) and (3.1) respectively, and let (h_k)_{k=1}^t satisfy Assumption 3.1. Assume that for every y ∈ Y the set K_y is nonempty and consider the semidefinite relaxations (3.2). Then:
(a) ρ_i ↑ ρ as i → ∞.
(b) Let z^i be a nearly optimal solution of (3.2), e.g. such that L_{z^i}(f) ≤ ρ_i + 1/i, and let g : Y → K_y be the measurable mapping in Theorem 2.2(c). If for ϕ-almost all y ∈ Y, J(y) is attained at a unique optimal solution x*(y), then:

lim_{i→∞} z^i_{αβ} = ∫_Y y^β g(y)^α dϕ(y), for all α ∈ N^n, β ∈ N^p. (3.3)

In particular, for every k = 1, ..., n,

lim_{i→∞} z^i_{e(k)β} = ∫_Y y^β g_k(y) dϕ(y), for all β ∈ N^p, (3.4)

where e(k)_j = δ_{j=k}, j = 1, ..., n (with δ being the Kronecker symbol).

The proof is postponed to §4.2.

Remark 3.4. Observe that if ρ_i = +∞ for some index i in the hierarchy (and hence for all larger i), then the set K_y is empty for all y in some Borel set B of Y with ϕ(B) > 0. Conversely, one may prove that if K_y is empty for all y ∈ B with ϕ(B) > 0, then necessarily ρ_i = +∞ for all i sufficiently large. In other words, the hierarchy of semidefinite relaxations (3.2) may also provide a certificate of emptiness of K_y for some Borel set of Y with positive ϕ-measure.

3.3. The dual semidefinite relaxations. The dual of the semidefinite relaxation (3.2) reads:

ρ*_i = sup_{p, (σ_j)} ∫_Y p dϕ (= Σ_{β ∈ N^p_{2i}} p_β γ_β)
s.t. f − p = σ_0 + Σ_{j=1}^t σ_j h_j
p ∈ R[y]; σ_j ∈ Σ[x, y], j = 0, 1, ..., t
deg p ≤ 2i; deg σ_0 ≤ 2i, deg σ_j h_j ≤ 2i, j = 1, ..., t. (3.5)

Observe that (3.5) is a strengthening of (2.9), as one restricts to polynomials p ∈ R[y] of degree at most 2i and the nonnegativity of f − p in (2.9) is replaced with the stronger weighted s.o.s. representation in (3.5). Therefore ρ*_i ≤ ρ* for every i.

Theorem 3.5. Let K, Y be as in (2.2) and (3.1) respectively, and let (h_k)_{k=1}^t satisfy Assumption 3.1. Assume that for every y ∈ Y the set K_y is nonempty, and consider the semidefinite relaxations (3.5). Then:
(a) ρ*_i ↑ ρ as i → ∞.
(b) Let (p_i, (σ^i_j)) be a nearly optimal solution of (3.5), e.g. such that ∫_Y p_i dϕ ≥ ρ*_i − 1/i. Then p_i ≤ J(·) and

lim_{i→∞} ∫_Y (J(y) − p_i(y)) dϕ(y) = 0. (3.6)

Moreover, if one defines

p̃_0 := p_0, y ↦ p̃_i(y) := max[ p̃_{i−1}(y), p_i(y) ], i = 1, 2, ...,

then p̃_i → J(·) ϕ-almost uniformly on Y.

Proof. Recall that by Lemma 2.5, ρ = ρ*. Moreover, let (p_k) ⊂ R[y] be a maximizing sequence of (2.9) as in Corollary 2.6 with value s_k := ∫ p_k dϕ, and let p'_k := p_k − 1/k for every k, so that f − p'_k > 1/k on K. By Putinar's Positivstellensatz [22] (recall Assumption 3.1), there exist s.o.s. polynomials (σ^k_j) ⊂ Σ[x, y] such that f − p'_k = σ^k_0 + Σ_j σ^k_j h_j. Letting d_k be the maximum degree of σ^k_0 and σ^k_j h_j, j = 1, ..., t, it follows that (p'_k, (σ^k_j)) is a feasible solution of (3.5) with i := d_k and value s_k − 1/k. Hence ρ ≥ ρ*_{d_k} ≥ s_k − 1/k, and the result (a) follows because s_k → ρ* = ρ and the sequence (ρ*_i) is monotone nondecreasing. Then (b) follows from Corollary 2.6.

Hence in Theorem 3.5, p_i ∈ R[y] provides a polynomial lower approximation of the optimal value function J(·), with degree at most 2i (the order of the moments (γ_β) of ϕ taken into account in the semidefinite relaxation (3.5)). Moreover, one may even define a piecewise polynomial lower approximation p̃_i that converges ϕ-almost uniformly to J(·) on Y.

Functionals of the optimal solutions. Theorem 3.3 provides a means of approximating any polynomial functional of the global minimizers of P_y, y ∈ Y. Indeed:

Corollary 3.6. Let K, Y be as in (2.2) and (3.1) respectively, and let (h_k)_{k=1}^t satisfy Assumption 3.1. Assume that for every y ∈ Y the set K_y is nonempty, and for ϕ-almost all y ∈ Y, J(y) is attained at a unique optimal solution x*(y) ∈ X*_y. Let x ↦ h(x) := Σ_{α ∈ N^n} h_α x^α,

13 Parametric polynomial optimization 13 and let z i be a nearly optimal solution of the semidefinite relaxations (3.2). Then, for i sufficiently large, h(x (y)) dϕ(y) h α zα i. α N n Proof. The proof is an immediate consequence of Theorem 3.3 and Corollary Persistence for Boolean variables. One interesting and potentially useful application is in Boolean optimization. Indeed suppose that for some subset I {1,..., n}, the variables (x i ), i I, are boolean, that is, the definition of in (2.2) includes the quadratic constraints x 2 i x i =, for every i I. Then for instance, one might be interested to determine whether in any optimal solution x (y) of P y, and for some index i I, one has x i (y) = 1 (or x i (y) = ) for ϕ-almost all values of the parameter y. In [4, 2] the probability that x k (y) is 1 is called the persistency of the boolean variable x k (y) Corollary 3.7. Let, be as in (2.2) and (3.1) respectively. Let (h k ) t k=1 satisfy (3.1). Assume that for every y the set y is nonempty. Let z i be a nearly optimal solution of the semidefinite relaxations (3.2). Then for k I fixed: (a) x k (y) = 1 in any optimal solution and for ϕ-almost all y, only if lim i zi e(k) =1. (b) x k (y) = in any optimal solution and for ϕ-almost all y, only if lim i zi e(k) =. Assume that for ϕ-almost all y, J(y) is attained at a unique optimal solution x (y) X y. Then Prob (x k (y) = 1) = lim i zi e(k), and so: (c) x k (y) =1for ϕ-almost all y, if and only if lim i zi e(k) =1. (d) x k (y) =for ϕ-almost all y, if and only if lim i zi e(k) =. Proof. (a) The only if part. Let α := e(k) N n. From the proof of Theorem 3.3, for an arbitrary converging subsequence (i l ) l (i) i as in (4.5), lim l zi l e(k) = x k dµ, where µ is some optimal solution of P. Hence, by Theorem 2.2(b), µ can be disintegrated into ψ (dx y)dϕ(y) where ψ (dx y) is a probability measure on X y for every y. Therefore, lim l zi l e(k) = = = ( X y x k ψ (dx y) ) dϕ(y), ψ (X y y) dϕ(y) [because x k = 1 in X y] dϕ(y) =1, and as the converging subsequence (i l ) l in (4.5) was arbitrary, the whole sequence (ze(k) i ) converges to 1, the desired result. The proof of (b) being exactly the same is

14 14 Jean B. LASSERRE omitted. Next, if for every y, J(y) is attained at a singleton, then by Theorem 3.3(b), from which (c) and (d) follow. lim i zi e(k) = x k(y) dϕ(y) =ϕ({y : x k(y) =1}) = Prob (x k(y) = 1), 3.5. Estimating the density g(y). By Corollary 3.6, one may approximate any polynomial functional of the optimal solutions, like for instance the mean, variance, etc. (with respect to the probability measure ϕ). However, one may also wish to approximate (in some sense) the function y g k (y), that is, the curve described by the k-th coordinate x k (y) of the optimal solution x (y) when y varies in. So let g : R n be the measurable mapping in Theorem 3.3 and suppose that one knows some lower bound vector a =(a k ) R n, where: a k inf { x k :(x, y) }, k =1,..., n. Then for every k =1,..., n, the measurable function ĝ k : R n defined by y ĝ k (y) := g k (y) a k, y, (3.7) is nonnegative and ϕ-integrable. Hence for every k =1,..., n, one may consider dλ := ĝ k dϕ as a Borel measure on with unknown density ĝ k with respect to ϕ, but with known moments u =(u β ). Indeed, using (3.4), u β := y β dλ(y) = a k y β dϕ(y)+ y β g k (y) dϕ(y) = a k γ β + z e(k)β, β N p, (3.8) where for every k =1,...,n, z e(k)β = lim i z i e(k)β, β Nn, with z i being an optimal (or nearly optimal) solution of the semidefinite relaxation (3.2). Hence we are now faced with a density estimation problem, that is: Given the sequence of moments u β = yβ g k (y)dϕ, β N p, of the unknown nonnegative measurable function ĝ k on, estimate ĝ k. One possibility is to use the so-called maximum entropy approach, briefly described in the next section. Maximum-entropy estimation. We briefly describe the maximum entropy estimation technique in the univariate case. The multivariate case generalizes easily. Let g L 1 ([, 1]) be a nonnegative function 2 only known via the first 2d + 1 moments u =(u j ) 2d j= of its associated measure dϕ = gdx on [, 1]. (In the context of previous section, the function g to estimate is y ĝ k (y) in (3.7) from the sequence u in (3.8) of its (multivariate) moments.) 2 L 1 ([, 1]) denote the Banach space of integrable functions on the interval [, 1] of the real line, equipped with the norm g 1 = R 1 b(x) dx.

From that partial knowledge one wishes (a) to provide an estimate h_d of g such that the first 2d + 1 moments of the measure h_d dx match those of g dx, and (b) to analyze the asymptotic behavior of h_d when d → ∞. This problem has important applications in various areas of physics, engineering, and signal processing in particular. An elegant methodology is to search for h_d in a (finitely) parametrized family {h_d(λ, x)} of functions, and to optimize over the unknown parameters λ via a suitable criterion. For instance, one may wish to select an estimate h_d that maximizes some appropriate entropy. Several choices of entropy functional are possible, as long as one obtains a convex optimization problem in the finitely many coefficients λ_j. For more details the interested reader is referred to e.g. Borwein and Lewis [7, 8] and the many references therein. We here choose the Boltzmann-Shannon entropy H : L_1([0, 1]) → R ∪ {−∞},

h ↦ H[h] := −∫_0^1 h(x) ln h(x) dx, (3.9)

a strictly concave functional. Therefore, the problem reduces to:

sup_h { H[h] : ∫_0^1 x^j h(x) dx = u_j, j = 0, ..., 2d }. (3.10)

The structure of this infinite-dimensional convex optimization problem permits to search for an optimal solution h*_d of the form

x ↦ h*_d(x) = exp( Σ_{j=0}^{2d} λ_j x^j ), (3.11)

and so λ* is an optimal solution of the finite-dimensional unconstrained convex problem

θ(u) := sup_λ { ⟨u, λ⟩ − ∫_0^1 exp( Σ_{j=0}^{2d} λ_j x^j ) dx }.

Notice that the above function θ is just the Legendre-Fenchel transform of the convex function λ ↦ ∫_0^1 exp( Σ_{j=0}^{2d} λ_j x^j ) dx. An optimal solution can be calculated by applying first-order methods, in which case the gradient ∇v_d of the function

λ ↦ v_d(λ) := ⟨u, λ⟩ − ∫_0^1 exp( Σ_{j=0}^{2d} λ_j x^j ) dx

is provided by:

∂v_d(λ)/∂λ_k = u_k − ∫_0^1 x^k exp( Σ_{j=0}^{2d} λ_j x^j ) dx, k = 0, ..., 2d.

If one applies second-order methods, e.g. Newton's method, then computing the Hessian ∇²v_d at the current iterate λ reduces to computing

∂²v_d(λ)/∂λ_k ∂λ_j = −∫_0^1 x^{k+j} exp( Σ_{l=0}^{2d} λ_l x^l ) dx, k, j = 0, ..., 2d.
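As an illustration (our own sketch, not from the original text), the Newton iteration just described can be implemented in a few lines when the parameter set is [0, 1]: the integrals defining ∇v_d and ∇²v_d are approximated on a Gauss-Legendre grid (a crude stand-in for the cubature formulas discussed below), and a simple backtracking step guards against overshooting. All function names are ours.

    import numpy as np

    def max_entropy_density(u, iters=100, tol=1e-10, nquad=400):
        """Fit h_d(x) = exp(sum_j lambda_j x^j) on [0, 1] whose first len(u) moments
        match the given vector u, by Newton's method on the concave dual v_d."""
        u = np.asarray(u, dtype=float)
        m = u.size                                   # m = 2d + 1 moment conditions
        xq, wq = np.polynomial.legendre.leggauss(nquad)
        xq, wq = 0.5 * (xq + 1.0), 0.5 * wq          # Gauss-Legendre nodes/weights on [0, 1]
        V = np.vander(xq, m, increasing=True)        # V[:, j] = xq**j
        vd = lambda l: u @ l - wq @ np.exp(V @ l)    # dual function v_d(lambda)
        lam = np.zeros(m)
        for _ in range(iters):
            h = np.exp(V @ lam)                      # current density at the quadrature nodes
            grad = u - V.T @ (wq * h)                # u_k - int x^k h(x) dx
            if np.linalg.norm(grad) < tol:
                break
            hess = -(V.T * (wq * h)) @ V             # -int x^(k+j) h(x) dx
            step, t = np.linalg.solve(hess, grad), 1.0
            while vd(lam - t * step) < vd(lam) and t > 1e-8:
                t *= 0.5                             # crude backtracking safeguard
            lam = lam - t * step
        return lam, (lambda s: np.exp(np.polynomial.polynomial.polyval(s, lam)))

    # Example: recover g(y) = sqrt(1 - y^2) (cf. Example 3.9 below) from its first 5 moments.
    yy = np.linspace(0.0, 1.0, 20001)
    u = [np.trapz(yy**k * np.sqrt(1.0 - yy**2), yy) for k in range(5)]
    lam, h4 = max_entropy_density(u)
    print(lam)    # coefficients lambda_0, ..., lambda_4 of the estimate (3.11)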

For simple sets like a box [a, b] (or [a, b]^n in the multivariate case) such quantities can be approximated quite accurately via cubature formulas, as described in e.g. [11]. In particular, several cubature formulas behave very well for exponentials of polynomials, as shown in e.g. Bender et al. [6]. An alternative with no cubature formula is also proposed in [18]. One has the following convergence result, which follows directly from [7, Theor. 1.7 and p. 259].

Proposition 3.8. Let g ∈ L_1([0, 1]) and for every d ∈ N, let h*_d in (3.11) be an optimal solution of (3.10). Then, as d → ∞,

∫_0^1 ψ(x)(h*_d(x) − g(x)) dx → 0,

for every bounded measurable function ψ : [0, 1] → R which is continuous ϕ-almost everywhere.

Hence, the max-entropy estimate we obtain is not a pointwise estimate of g, and so, at some points of [0, 1] the max-entropy density h*_d and the density g to estimate may differ significantly. However, for sufficiently large d, both curves of h*_d and g are close to each other. For instance, in our context and with a one-dimensional parameter y in say Y = [0, 1], recall that g is the mapping y ↦ x*_k(y); so in general, for fixed y, h*_d(y) is close to x*_k(y) and might be chosen as the k-th coordinate of an initial point x_0, input of a local minimization algorithm to find a local minimizer x(y) (with reasonable hope that it is a global minimizer).

3.6. LP-relaxations. Of course, in view of the present status of semidefinite programming solvers, the proposed approach is limited to small to medium size problems, because the size of the semidefinite relaxations (3.2) grows rapidly with the relaxation order i (the number of moment variables grows like O((n+p)^{2i})). On the other hand, if there is some structured sparsity pattern in the original problem, then one may use alternative sparse semidefinite relaxations as defined in e.g. [28], which increases significantly the size of problems that can be addressed by this technique. Alternatively, instead of the hierarchy of semidefinite relaxations (3.2), one may also use an analogous hierarchy of LP-relaxations as defined in [16], whose convergence is also guaranteed. In view of the status of LP solvers, and although the former relaxations have much better convergence properties, sometimes it may be better to use LP-relaxations, e.g. if one wishes to obtain only crude approximations but for larger size problems.

3.7. Illustrative examples. In this section we provide some simple illustrative examples. To show the potential of the approach we have voluntarily chosen very simple examples for which one knows the solutions exactly, so as to compare the results we obtain with the exact optimal value and optimal solutions. The semidefinite relaxations (3.2) were implemented by using the software package GloptiPoly [12]. The max-entropy estimate h*_d of g_k was computed by using Newton's method, where at each iterate (λ^{(k)}, h_d(λ^{(k)})):

λ^{(k+1)} = λ^{(k)} − (∇²v_d(λ^{(k)}))^{-1} ∇v_d(λ^{(k)}).

Example 3.9. For illustration purposes, consider the toy example where Y := [0, 1],

K := {(x, y) : 1 − x² − y² ≥ 0; x ≥ 0, y ≥ 0} ⊂ R², (x, y) ↦ f(x, y) := −x²y.

Hence for each value y ∈ Y of the parameter, the unique optimal solution is x*(y) := √(1 − y²). And so in Theorem 3.3(b), y ↦ g(y) = √(1 − y²).
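For this toy example the relaxation (3.2) is small enough to be written out explicitly. The sketch below is our own illustration (the paper itself uses GloptiPoly); it assumes CVXPY together with an SDP-capable solver such as SCS is installed, uses ϕ uniform on Y = [0, 1] so that γ_b = 1/(b+1), and returns ρ_i together with the moments L_z(x y^b) that feed the max-entropy estimation of g. The moment and localizing matrices are symmetric by construction.

    import cvxpy as cp
    from itertools import product

    def relaxation_order_i(i):
        """Order-i relaxation (3.2) for Example 3.9: n = p = 1, f(x, y) = -x^2*y,
        K = {1 - x^2 - y^2 >= 0, x >= 0, y >= 0}, phi uniform on Y = [0, 1]."""
        # one scalar unknown z[a, b] = L_z(x^a y^b) per monomial of degree <= 2i
        z = {(a, b): cp.Variable() for a, b in product(range(2*i + 1), repeat=2) if a + b <= 2*i}
        def basis(d):                                   # monomial exponents of degree <= d
            return [(a, b) for a, b in product(range(d + 1), repeat=2) if a + b <= d]
        def loc(q, d):                                  # localizing matrix M_d(q z), q given as a dict
            B = basis(d)
            return cp.bmat([[sum(c * z[a1 + a2 + u, b1 + b2 + v] for (u, v), c in q.items())
                             for (a2, b2) in B] for (a1, b1) in B])
        one, ball = {(0, 0): 1.0}, {(0, 0): 1.0, (2, 0): -1.0, (0, 2): -1.0}
        cons = [loc(one, i) >> 0,                       # M_i(z)          >= 0
                loc(ball, i - 1) >> 0,                  # M_{i-1}(h_1 z)  >= 0, h_1 = 1 - x^2 - y^2
                loc({(1, 0): 1.0}, i - 1) >> 0,         # M_{i-1}(h_2 z)  >= 0, h_2 = x
                loc({(0, 1): 1.0}, i - 1) >> 0]         # M_{i-1}(h_3 z)  >= 0, h_3 = y
        cons += [z[0, b] == 1.0/(b + 1) for b in range(2*i + 1)]   # L_z(y^b) = gamma_b
        prob = cp.Problem(cp.Minimize(-z[2, 1]), cons)  # minimize L_z(f), f = -x^2*y
        prob.solve(solver=cp.SCS)
        return prob.value, [z[1, b].value for b in range(2*i)]

    rho_i, z_e1 = relaxation_order_i(4)
    print(rho_i)   # lower bound on rho; should approach rho = -1/4 as i grows
    print(z_e1)    # should approach int_0^1 y^b * sqrt(1 - y^2) dy, b = 0, 1, ...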

17 Parametric polynomial optimization 17 Let ϕ be the probability measure uniformly distributed on [, 1]. Therefore, ρ = 1 J(y) dϕ(y) = 1 y(1 y 2 ) dy = 1/4. Solving (3.2) with i := 3, that is, with moments up to order 6, one obtains the optimal value Solving (3.2) with i := 4, one obtains the optimal value and the moment sequence z = (1,.7812,.5,.664,.3334,.3333,.5813,.25,.1964,.25,.5244,.2,.1333, Observe that.1334,.2,.481,.1667,.981,.833,.983,.1667) z 1k 1 y k 1 y 2 dy O(1 6 ), k =,...4, z 1k 1 y k 1 y 2 dy O(1 5 ), k =5, 6, 7. Using a max-entropy approach to approximate the density y g(y) on [, 1], with the first 5 moments z 1k, k =,..., 4, we find that the optimal function h 4 in (3.11) is obtained with λ =(.1564, , , , ). Both curves of g and h 4 are displayed in Figure 3.1. Observe that with only 5 moments, the max-entropy solution h 4 approximates g relatively well, even if it differs significantly at some points. Indeed, the shape of h 4 resembles very much that of g. Finally, from an optimal solution of (3.5) one obtains for p R[y], the degree-8 univariate polynomial y p(y) =.4.999y.876y y y y y y y 8 and Figure 3.2 displays the curve y J(y) p(y) on [, 1]. One observes that J p and the maximum difference is about close to and much less for y.1, a good precision with only 8 moments. Example 3.1. Again with := [, 1], let := {(x,y) : 1 x 2 1 x 2 2 } R 2, (x,y) f(x,y) := yx 1 + (1 y)x 2. For each value of the parameter y, the unique optimal solution x satisfies (x 1(y)) 2 +(x 2(y)) 2 = 1; (x 1(y)) 2 = with optimal value y 2 y 2 + (1 y) 2, (x 2(y)) 2 (1 y)2 = y 2 + (1 y) 2, y 2 J(y) = y2 + (1 y) (1 y) 2 2 y2 + (1 y) = y 2 + (1 y) 2. 2

18 18 Jean B. LASSERRE Fig Example 3.9: g(y) = p 1 y 2 versus h 4 (y) x Fig Example 3.9: J(y) p(y) on [, 1] So in Theorem 3.3(b), y g 1 (y) = y y2 + (1 y) 2, y g 2(y) = y 1 y2 + (1 y) 2,

19 Parametric polynomial optimization 19 and with ϕ being the probability measure uniformly distributed on [, 1], ρ = 1 J(y) dϕ(y) = 1 y2 + (1 y) 2 dy Solving (3.2) with i := 3, that is, with moments up to order 6, one obtains ρ with ρ 3 ρ O(1 5 ). Solving (3.2) with i := 4, one obtains ρ with ρ 4 ρ O(1 6 ), and the moment sequence (z k1 ), k =, 1, 2, 3, 4: and z k1 =(.6232,.458,.2971,.2328,.197), z k1 1 y k g 1 (y) dy O(1 5 ), k =,..., 4. Using a max-entropy approach to approximate the density y g 1 (y) on [, 1], with the first 5 moments z 1k, k =,..., 4, we find that the optimal function h 4 in (3.11) is obtained with and we find that λ =( , ). z k1 + 1 y k h 4 (y) dy O(1 11 ), k =,..., 4. In Figure 3.3 are displayed the two functions g 1 and h 4, and one observes a very good concordance Fig Example 3.1: h 4 (y) versus g 1(y) =y/ p y 2 + (1 y) 2
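Since g_1 is known in closed form here, the limits in (3.4) can be cross-checked by one-dimensional quadrature. The snippet below is our own check (not part of the paper): it computes the reference values ∫_0^1 y^k g_1(y) dy that the relaxation moments converge to.

    import numpy as np
    from scipy.integrate import quad

    # g_1(y) = y / sqrt(y^2 + (1 - y)^2): first coordinate of the optimizer, as displayed above
    g1 = lambda y: y / np.sqrt(y**2 + (1.0 - y)**2)

    for k in range(5):
        val, _ = quad(lambda y, k=k: y**k * g1(y), 0.0, 1.0)
        print(k, round(val, 4))
    # k = 0 gives 0.6232, in agreement with the first entry of the moment sequence reported
    # above; by (3.4) these integrals are the limits of the moments L_z(x_1 y^k).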

20 2 Jean B. LASSERRE Finally, from an optimal solution of (3.5) one obtains for p R[y], the following degree-8 univariate polynomial y p(y) := y.4537y y y y y y y 8 and Figure 3.4 displays the curve y J(y) p(y) on [, 1]. One observes that J p and the maximum difference is about 1 4, a good precision with only 8 moments. 1.2 x Fig Example 3.1: J(y) p(y) on [, 1] Example In this example one has = [, 1], (x,y) f(x,y) := yx 1 + (1 y)x 2, and := {(x,y):yx x 2 2 y<= ; x yx 2 y<=}. That is, for each y the set y is the intersection of two ellipsoids. It is easy to chack that 1 + x i (y) for all y, i := 1, 2. With i = 4, the max-entropy estimate y h 4(y) for 1 + x 1(y) is obtained with λ =(.2894, , , , ), whereas the max-entropy estimate y h 4 (y) for 1 + x 2 (y) is obtained with λ =(.118, 3.928, 4.468, 1.796, ). Figure 3.5 displays the curves of x 1 (y) and x 2 (y), as well as the constraint h 1(x (y),y). Observe that h 1 (x (y),y) on [, 1] which means that for ϕ-almost all y [, 1], at an optimal solution x (y), the constraint h 1 is saturated. Figure 3.6 displays the curves of h 1 (x (y),y) and h 2 (x (y),y). Example This time = [, 1], (x,y) f(x,y) := (1 2y)(x 1 + x 2 ), and := {(x,y):yx x2 2 y = ; x2 1 + yx2 y =}.

21 Parametric polynomial optimization Fig Example 3.11: x 1 (y), x 2 (y) and h 1(x (y),y) on [, 1] Fig Example 3.11: h 1 (x (y),y) and h 2 (x (y),y) on [, 1] That is, for each y the set y is the intersection of two ellipses, and ( ) y y y x = ± 1+y, ± ; J(y) = 2 1 2y 1+y 1+y. With i = 4 the max-entropy estimate y h 4 (y) for 1 + x 1 (y) is obtained with λ = ( , , , , ).

22 22 Jean B. LASSERRE In Figure 3.7 are displayed the curves y p(y) and y J(y), whereas in Figure 3.8 is displayed the curve y p(y) J(y). One may see that p is a good lower approximation of J even with only 8 moments Fig Example 3.12: p(y) and J(y) on [, 1] Fig Example 3.12: the curve p(y) J(y) on [, 1] On the other hand, in Figure 3.9 is displayed h 4 (y) versus x 1 (y) where the latter is y/(1 + y) on [, 1/2] and y/(1 + y) on [1/2, 1]. Here we see that the discontinuity

23 y=.1 y=.5 y=1 J(y) p 6 (y) p 8 (y) Parametric polynomial optimization 23 Table 3.1 J(y) versus p k (y) in Example 3.13 of x 1 (y) is difficult to approximate pointwise with few moments, and despite a very good precision on the five first moments. Indeed: 1 1 y k (h 4 (y) 1) dx y k x 1 (y) dx = O(1 14 ), k =,..., Fig Example 3.12: h 4 (y) 1 and x 1 (y) on [, 1] Example Consider the following system of 4 quadratic equations in 4 variables and one parameter y = [, 1]: x 1 x 2 x 1 x 3 x 4 = yx 2 x 3 x 2 x 4 x 1 = y x 1 x 3 + x 3 x 4 x 2 = yx 1 x 4 x 2 x 4 x 3 = y, for which one wishes to compute the minimum norm J(y) of real solutions as a function of y. We have computed the polynomial y p k (y) from (3.5) with degree k =6 and k = 8, with ϕ being the probability with uniform distribution on [, 1]. The results displayed in Table 1 show that a pretty good approximation is obtained with few moments of ϕ.

24 24 Jean B. LASSERRE We end up this section with the case where the density g k to estimate is a step function which would be the case in an optimization problem P y with boolean variables (e.g. the variable x k takes values in {, 1}). Example Assume that with a single parameter y [, 1], the density g k to estimate is the step function. { 1 if y [, 1/3] [2/3, 1] y g k (y) := otherwise. The max-entropy estimate h 4 in (3.11) with 5 moments is obtained with λ =[ ], and we have 1 1 y k h 4 (y) dy y k dg k (y) O(1 8 ), k =,..., 4. In particular, the persistency 1 g k(y)dy =2/3ofthe variable x k (y), is very well approximated (up to 1 8 precision) by h 4(y)dy, with only 5 moments Fig Example 3.14: Max-entropy estimate h 4 (y) of 1 [,1/3] [2/3,1] Of course, in this case and with only 5 moments, the density h 4 is not a good pointwise approximation of the step function g k (y) = 1 [,1/3] [2/3,1] ; however its shape in Figure 3.1 reveals the two steps of value 1 separated by a step of value. A better pointwise approximation would require more moments. 4. Appendix.

25 Parametric polynomial optimization Set-valued functions and measurable selectors. The material of this subsection is taken from [13, D]. Let X R p and R n be Borel spaces. A setvalued mapping ψ : X is a function such that ψ(x) is a nonempty subset of for all x X. The graph Gr ψ of the set-valued function ψ is the subset of X defined by: Gr ψ := { (x, y) : x X, y ψ(x) }. Definition 4.1. A set-valued function ψ : X is said to be: (a) Borel-measurable if ψ 1 (B) is a Borel subset of X for every open set B. (b) compact-valued if ψ(x) is compact for every x X. (c) closed if its graph Gr ψ is closed. Definition 4.2. Let ψ : X be a Borel-measurable set-valued function, and let F be the set of all measurable functions f : X with f(x) ψ(x) for all x X. A function f F is called a measurable selector. Let v : Gr ψ : R be a measurable function and let v : X R be defined by: v (x) := inf { v(x, y) : y ψ(x) }, x X. y Proposition 4.3. ([13, D.4]) Let ψ : X be compact-valued. Then the following are equivalent: (a) ψ is Borel-measurable. (b) ψ 1 (F ) is a Borel subset of X for every closed set F. (c) Gr ψ is a Borel subset of X. (d) ψ is a measurable function from X to the space of nonempty compact subsets of, topologized by the Hausdorff metric. Proposition 4.4. ([13, D.5]) Suppose that ψ : X is compact-valued and the function y v(x, y) is lower-semicontinuous on ψ(x) for every x X. Then there exists a measurable selector f F such that: v(x,f(x)) = v (x) = min y {v(x, y) : y ψ(x) }, x X. Definition 4.5. For two Borel spaces X,, a stochastic kernel on given X is a function P ( ) such that: (a) P ( x) is a probability measure on for each fixed x X. (b) P (B ) is a measurable function on X for each fixed Borel subset B of. Let ψ : X be a Borel-measurable set-valued function such that F is nonempty (equivalently, there is a measurable function f : X whose graph is contained in Gr ψ). Let Φ be the class of stochastic kernels ϕ on given such that ϕ(ψ(x) x) = 1 for every x X. Finally, for a probability measure µ on X, denote by µ 1 the marginal (or projection) of µ on X, i.e., µ 1 (B) := µ(b ), B B(X), where B(X) is the Borel σ-field associated with X. Proposition 4.6. ([13, D.8]) If µ is a probability measure on X, concentrated on the graph Gr ψ of ψ, then there exists a stochastic kernel ϕ Φ such that µ(b C) = ϕ(c x) µ 1 (dx), B B(X), C B( ). See also [1, p ]. B

26 26 Jean B. LASSERRE 4.2. Proof of Theorem We already know that ρ i ρ for all i i. We also need to prove that ρ i > for sufficiently large i. Let Q R[x, y] be the quadratic module generated by the polynomials {h j } R[x, y] that define, i.e., t Q := { σ R[x, y] : σ = σ + σ j h j j=1 with {σ j } t j= Σ[x, y]}. In addition, let Q(l) Q be the set of elements σ Q which have a representation σ + t j= σ j h j for some s.o.s. family {σ j } Σ 2 with deg σ 2l and deg σ j h j 2l for all j =1,...,t. Let i N be fixed. As is compact, there exists N such that N ± x α y β > on, for all α N n and β N p, with α + β 2i. Therefore, under Assumption 3.1(ii), the polynomial N ± x α y β belongs to Q; see Putinar [22]. But there is even some l(i) such that N ± x α y β Q(l(i)) for every α + β 2i. Of course we also have N ± x α y β Q(l) for every α + β 2i, whenever l l(i). Therefore, let us take l(i) i. For every feasible solution z of Q l(i) one has z αβ = L z (x α y β ) N, α + β 2i. This follows from z = 1, M l(i) (z) and M l(i) vj (h j z), which implies Nz ± z αβ = L z (N ± x α y β )=L z (σ )+ t L z (σ j h j ) for some {σ j } Σ[x, y] with deg σ j h j 2l(i). In particular, L z (f) N α,β f αβ, which proves that ρ l(i) >, and so ρ i > for all sufficiently large i. From what precedes, and with k N arbitrary, let l(k) k and N k be such that N k ± x α y β Q(l(k)) α N n,β N p with α + β 2k. (4.1) Let i l(i ), and let z i be a nearly optimal solution of (3.2) with value ρ i L z i(f) ρ i + 1 ( ρ + 1 ). (4.2) i i j=1 Fix k N. Notice that from (4.1), for every i l(k), one has L z i(x α y β ) N k z = N k, α N n,β N p with α + β 2k. Therefore, for all i l(i ), z i αβ = L z i(xα y β ) N k, α Nn,β N p with α + β 2k, (4.3) where N k = max[n k,v k ], with V k := max α,β,i { zi αβ : α + β 2k ; l(i ) i l(k) }. Complete each vector z i with zeros to make it an infinite bounded sequence in l, indexed in the canonical basis (x α y β ) of R[x, y]. In view of (4.3), z i αβ N k α N n,β N p with 2k 1 α + β 2k, (4.4)


Approximate Optimal Designs for Multivariate Polynomial Regression Approximate Optimal Designs for Multivariate Polynomial Regression Fabrice Gamboa Collaboration with: Yohan de Castro, Didier Henrion, Roxana Hess, Jean-Bernard Lasserre Universität Potsdam 16th of February

More information

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009 LMI MODELLING 4. CONVEX LMI MODELLING Didier HENRION LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ Universidad de Valladolid, SP March 2009 Minors A minor of a matrix F is the determinant of a submatrix

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

Convex Optimization & Parsimony of L p-balls representation

Convex Optimization & Parsimony of L p-balls representation Convex Optimization & Parsimony of L p -balls representation LAAS-CNRS and Institute of Mathematics, Toulouse, France IMA, January 2016 Motivation Unit balls associated with nonnegative homogeneous polynomials

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

CHAPTER I THE RIESZ REPRESENTATION THEOREM

CHAPTER I THE RIESZ REPRESENTATION THEOREM CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

On Polynomial Optimization over Non-compact Semi-algebraic Sets

On Polynomial Optimization over Non-compact Semi-algebraic Sets On Polynomial Optimization over Non-compact Semi-algebraic Sets V. Jeyakumar, J.B. Lasserre and G. Li Revised Version: April 3, 2014 Communicated by Lionel Thibault Abstract The optimal value of a polynomial

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

CONVEXITY IN SEMI-ALGEBRAIC GEOMETRY AND POLYNOMIAL OPTIMIZATION

CONVEXITY IN SEMI-ALGEBRAIC GEOMETRY AND POLYNOMIAL OPTIMIZATION CONVEXITY IN SEMI-ALGEBRAIC GEOMETRY AND POLYNOMIAL OPTIMIZATION JEAN B. LASSERRE Abstract. We review several (and provide new) results on the theory of moments, sums of squares and basic semi-algebraic

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 28. Suvrit Sra. (Algebra + Optimization) 02 May, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 28. Suvrit Sra. (Algebra + Optimization) 02 May, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 28 (Algebra + Optimization) 02 May, 2013 Suvrit Sra Admin Poster presentation on 10th May mandatory HW, Midterm, Quiz to be reweighted Project final report

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

arxiv: v2 [math.oc] 30 Sep 2015

arxiv: v2 [math.oc] 30 Sep 2015 Symmetry, Integrability and Geometry: Methods and Applications Moments and Legendre Fourier Series for Measures Supported on Curves Jean B. LASSERRE SIGMA 11 (215), 77, 1 pages arxiv:158.6884v2 [math.oc]

More information

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.

More information

Mean squared error minimization for inverse moment problems

Mean squared error minimization for inverse moment problems Mean squared error minimization for inverse moment problems Didier Henrion 1,2,3, Jean B. Lasserre 1,2,4, Martin Mevissen 5 August 28, 2012 Abstract We consider the problem of approximating the unknown

More information

A ten page introduction to conic optimization

A ten page introduction to conic optimization CHAPTER 1 A ten page introduction to conic optimization This background chapter gives an introduction to conic optimization. We do not give proofs, but focus on important (for this thesis) tools and concepts.

More information

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide aliprantis.tex May 10, 2011 Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide Notes from [AB2]. 1 Odds and Ends 2 Topology 2.1 Topological spaces Example. (2.2) A semimetric = triangle

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

Math 209B Homework 2

Math 209B Homework 2 Math 29B Homework 2 Edward Burkard Note: All vector spaces are over the field F = R or C 4.6. Two Compactness Theorems. 4. Point Set Topology Exercise 6 The product of countably many sequentally compact

More information

Lecture 3: Semidefinite Programming

Lecture 3: Semidefinite Programming Lecture 3: Semidefinite Programming Lecture Outline Part I: Semidefinite programming, examples, canonical form, and duality Part II: Strong Duality Failure Examples Part III: Conditions for strong duality

More information

DUALITY AND INTEGER PROGRAMMING. Jean B. LASSERRE

DUALITY AND INTEGER PROGRAMMING. Jean B. LASSERRE LABORATOIRE d ANALYSE et d ARCHITECTURE des SYSTEMES DUALITY AND INTEGER PROGRAMMING Jean B. LASSERRE 1 Current solvers (CPLEX, XPRESS-MP) are rather efficient and can solve many large size problems with

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

McGill University Math 354: Honors Analysis 3

McGill University Math 354: Honors Analysis 3 Practice problems McGill University Math 354: Honors Analysis 3 not for credit Problem 1. Determine whether the family of F = {f n } functions f n (x) = x n is uniformly equicontinuous. 1st Solution: The

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Integer programming, Barvinok s counting algorithm and Gomory relaxations

Integer programming, Barvinok s counting algorithm and Gomory relaxations Integer programming, Barvinok s counting algorithm and Gomory relaxations Jean B. Lasserre LAAS-CNRS, Toulouse, France Abstract We propose an algorithm based on Barvinok s counting algorithm for P max{c

More information

On a Class of Multidimensional Optimal Transportation Problems

On a Class of Multidimensional Optimal Transportation Problems Journal of Convex Analysis Volume 10 (2003), No. 2, 517 529 On a Class of Multidimensional Optimal Transportation Problems G. Carlier Université Bordeaux 1, MAB, UMR CNRS 5466, France and Université Bordeaux

More information

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 CHRISTIAN LÉONARD Contents Preliminaries 1 1. Convexity without topology 1 2. Convexity

More information

Continuous Functions on Metric Spaces

Continuous Functions on Metric Spaces Continuous Functions on Metric Spaces Math 201A, Fall 2016 1 Continuous functions Definition 1. Let (X, d X ) and (Y, d Y ) be metric spaces. A function f : X Y is continuous at a X if for every ɛ > 0

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

Lecture 5. The Dual Cone and Dual Problem

Lecture 5. The Dual Cone and Dual Problem IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

THE STONE-WEIERSTRASS THEOREM AND ITS APPLICATIONS TO L 2 SPACES

THE STONE-WEIERSTRASS THEOREM AND ITS APPLICATIONS TO L 2 SPACES THE STONE-WEIERSTRASS THEOREM AND ITS APPLICATIONS TO L 2 SPACES PHILIP GADDY Abstract. Throughout the course of this paper, we will first prove the Stone- Weierstrass Theroem, after providing some initial

More information

CHAPTER V DUAL SPACES

CHAPTER V DUAL SPACES CHAPTER V DUAL SPACES DEFINITION Let (X, T ) be a (real) locally convex topological vector space. By the dual space X, or (X, T ), of X we mean the set of all continuous linear functionals on X. By the

More information

Tools from Lebesgue integration

Tools from Lebesgue integration Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given

More information

Real Analysis Problems

Real Analysis Problems Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

More information

Your first day at work MATH 806 (Fall 2015)

Your first day at work MATH 806 (Fall 2015) Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Lecture 7: Semidefinite programming

Lecture 7: Semidefinite programming CS 766/QIC 820 Theory of Quantum Information (Fall 2011) Lecture 7: Semidefinite programming This lecture is on semidefinite programming, which is a powerful technique from both an analytic and computational

More information

Example: feasibility. Interpretation as formal proof. Example: linear inequalities and Farkas lemma

Example: feasibility. Interpretation as formal proof. Example: linear inequalities and Farkas lemma 4-1 Algebra and Duality P. Parrilo and S. Lall 2006.06.07.01 4. Algebra and Duality Example: non-convex polynomial optimization Weak duality and duality gap The dual is not intrinsic The cone of valid

More information

Semialgebraic Relaxations using Moment-SOS Hierarchies

Semialgebraic Relaxations using Moment-SOS Hierarchies Semialgebraic Relaxations using Moment-SOS Hierarchies Victor Magron, Postdoc LAAS-CNRS 17 September 2014 SIERRA Seminar Laboratoire d Informatique de l Ecole Normale Superieure y b sin( par + b) b 1 1

More information

REVIEW OF ESSENTIAL MATH 346 TOPICS

REVIEW OF ESSENTIAL MATH 346 TOPICS REVIEW OF ESSENTIAL MATH 346 TOPICS 1. AXIOMATIC STRUCTURE OF R Doğan Çömez The real number system is a complete ordered field, i.e., it is a set R which is endowed with addition and multiplication operations

More information

Review of measure theory

Review of measure theory 209: Honors nalysis in R n Review of measure theory 1 Outer measure, measure, measurable sets Definition 1 Let X be a set. nonempty family R of subsets of X is a ring if, B R B R and, B R B R hold. bove,

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Convex Geometry. Carsten Schütt

Convex Geometry. Carsten Schütt Convex Geometry Carsten Schütt November 25, 2006 2 Contents 0.1 Convex sets... 4 0.2 Separation.... 9 0.3 Extreme points..... 15 0.4 Blaschke selection principle... 18 0.5 Polytopes and polyhedra.... 23

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

WEAK CONVERGENCES OF PROBABILITY MEASURES: A UNIFORM PRINCIPLE

WEAK CONVERGENCES OF PROBABILITY MEASURES: A UNIFORM PRINCIPLE PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 126, Number 10, October 1998, Pages 3089 3096 S 0002-9939(98)04390-1 WEAK CONVERGENCES OF PROBABILITY MEASURES: A UNIFORM PRINCIPLE JEAN B. LASSERRE

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

MATH & MATH FUNCTIONS OF A REAL VARIABLE EXERCISES FALL 2015 & SPRING Scientia Imperii Decus et Tutamen 1

MATH & MATH FUNCTIONS OF A REAL VARIABLE EXERCISES FALL 2015 & SPRING Scientia Imperii Decus et Tutamen 1 MATH 5310.001 & MATH 5320.001 FUNCTIONS OF A REAL VARIABLE EXERCISES FALL 2015 & SPRING 2016 Scientia Imperii Decus et Tutamen 1 Robert R. Kallman University of North Texas Department of Mathematics 1155

More information

arxiv: v2 [math.ag] 24 Jun 2015

arxiv: v2 [math.ag] 24 Jun 2015 TRIANGULATIONS OF MONOTONE FAMILIES I: TWO-DIMENSIONAL FAMILIES arxiv:1402.0460v2 [math.ag] 24 Jun 2015 SAUGATA BASU, ANDREI GABRIELOV, AND NICOLAI VOROBJOV Abstract. Let K R n be a compact definable set

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

LEBESGUE INTEGRATION. Introduction

LEBESGUE INTEGRATION. Introduction LEBESGUE INTEGATION EYE SJAMAA Supplementary notes Math 414, Spring 25 Introduction The following heuristic argument is at the basis of the denition of the Lebesgue integral. This argument will be imprecise,

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

The optimal partial transport problem

The optimal partial transport problem The optimal partial transport problem Alessio Figalli Abstract Given two densities f and g, we consider the problem of transporting a fraction m [0, min{ f L 1, g L 1}] of the mass of f onto g minimizing

More information

CHAPTER VIII HILBERT SPACES

CHAPTER VIII HILBERT SPACES CHAPTER VIII HILBERT SPACES DEFINITION Let X and Y be two complex vector spaces. A map T : X Y is called a conjugate-linear transformation if it is a reallinear transformation from X into Y, and if T (λx)

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

Mean squared error minimization for inverse moment problems

Mean squared error minimization for inverse moment problems Mean squared error minimization for inverse moment problems Didier Henrion 1,2,3, Jean B. Lasserre 1,2,4, Martin Mevissen 5 June 19, 2013 Abstract We consider the problem of approximating the unknown density

More information

Towards Solving Bilevel Optimization Problems in Quantum Information Theory

Towards Solving Bilevel Optimization Problems in Quantum Information Theory Towards Solving Bilevel Optimization Problems in Quantum Information Theory ICFO-The Institute of Photonic Sciences and University of Borås 22 January 2016 Workshop on Linear Matrix Inequalities, Semidefinite

More information

Functional Analysis HW #3

Functional Analysis HW #3 Functional Analysis HW #3 Sangchul Lee October 26, 2015 1 Solutions Exercise 2.1. Let D = { f C([0, 1]) : f C([0, 1])} and define f d = f + f. Show that D is a Banach algebra and that the Gelfand transform

More information

Contents Real Vector Spaces Linear Equations and Linear Inequalities Polyhedra Linear Programs and the Simplex Method Lagrangian Duality

Contents Real Vector Spaces Linear Equations and Linear Inequalities Polyhedra Linear Programs and the Simplex Method Lagrangian Duality Contents Introduction v Chapter 1. Real Vector Spaces 1 1.1. Linear and Affine Spaces 1 1.2. Maps and Matrices 4 1.3. Inner Products and Norms 7 1.4. Continuous and Differentiable Functions 11 Chapter

More information

Stone-Čech compactification of Tychonoff spaces

Stone-Čech compactification of Tychonoff spaces The Stone-Čech compactification of Tychonoff spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto June 27, 2014 1 Completely regular spaces and Tychonoff spaces A topological

More information

BASICS OF CONVEX ANALYSIS

BASICS OF CONVEX ANALYSIS BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS. Didier HENRION henrion

COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS. Didier HENRION   henrion COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS Didier HENRION www.laas.fr/ henrion October 2006 Geometry of LMI sets Given symmetric matrices F i we want to characterize the shape in R n of the LMI set F

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

g 2 (x) (1/3)M 1 = (1/3)(2/3)M.

g 2 (x) (1/3)M 1 = (1/3)(2/3)M. COMPACTNESS If C R n is closed and bounded, then by B-W it is sequentially compact: any sequence of points in C has a subsequence converging to a point in C Conversely, any sequentially compact C R n is

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

Problem Set 6: Solutions Math 201A: Fall a n x n,

Problem Set 6: Solutions Math 201A: Fall a n x n, Problem Set 6: Solutions Math 201A: Fall 2016 Problem 1. Is (x n ) n=0 a Schauder basis of C([0, 1])? No. If f(x) = a n x n, n=0 where the series converges uniformly on [0, 1], then f has a power series

More information