Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a be the Dirac measure concentrated at a, that is { if a δ a () = if a /. Then µ := n λ iδ xi is a probability measure on, defined for all subsets of. Moreover, we can write n λ if(x i ) = f dµ and n λ ix i = x dµ(x), where the last integral is the integral of the vector-valued function ψ(x) = x. (Vector-valued integration will be recalled below.) Thus the finite Jensen inequality () can be written in the integral form (2) f ( x dµ(x)) f dµ. Our aim is to show that (2) holds, under suitable assumptions, also for more general probability measures on. Vector integration. Let (, Σ, µ) be a positive measure space, and F = (F,..., F d ): R d a measurable function (that is, F (A) Σ for each open set A R d ). It is easy to see that F is measurable if and only if each F k is a measurable function. We say that F is integrable on a set Σ if F k L (, µ) for each k =,..., d. It is easy to see that F is integrable on if and only if F dµ < +. In this case, we define ( ) F dµ = F dµ,..., F d dµ. Observe that, for each linear functional l (R d ), we have ( ) (3) l F dµ = l F dµ (this follows immediately by representing l with a vector of R d ). As an easy consequence, we obtain that F dµ exists if and only if l F L (, µ) for each l (R d ). Returning to our aim, our measure µ should be defined on a σ-algebra of subsets of, for which the restriction of the identity function x x to is measurable. The smallest such σ-algebra is the family B() = {B : B Borel(R d )} where Borel(R d ) is the Borel σ-algebra of R d (that is, the σ-algebra generated by the family of all open sets). Thus we shall consider the following family of measures: M () = { µ: B() [, + ) : µ measure, µ() = }. The barycenter x µ of a measure µ M () is defined by x µ = x dµ(x)
2 (if the vector integral exists). Notice that by (3) we have l(x µ ) = l dµ for each l (R d ). Observation.. Let R d be a nonempty convex set, µ M (). (a) x µ exists L (µ) l L (µ) for each l (R d ). (This follows easily from the fact that x µ exists if and only if x(i) dµ(x) < + for each i =,..., d.) (b) If µ is concentrated on a bounded subset of (in particular, if is bounded), then x µ exists. Proposition.2. Let R d be a nonempty convex set, µ M (). barycenter x µ exists, then x µ. If the Proof. Let us proceed by induction with respect to n := dim(). For n =, we have = {x } and hence µ = δ x, x µ = x. Now, fix m < d and suppose that the statement holds whenever n m. Now, let dim() = m + and x µ /. By the Relative Interior Theorem, ri(), and hence we can suppose that ri(). In this case, L := span() = aff() and int L (). Let us consider two cases. (a) If x µ / L, there exists l (R d ) such that l L and l(x µ ) >. (b) If x µ L, there exists l L \ {} such that l(x µ ) sup l() (by the H-B Separation Theorem), and this l can be extended to an element (denoted again by l) of (R d ). In both cases, we have [ l(xµ ) l(x) ] dµ(x) = l(x µ ) ( ) l dµ = l(x µ ) l x dµ(x) =. Since the expression in square brackets is nonnegative, it must be µ-a.e. null. This implies that necessarily x µ L. But in this case, l is not identically zero on L. onsequently, µ is concentrated on the set := H where H = {x L : l(x) = l(x µ )} is a hyperplane in L. Thus dim( ) m, µ := µ M ( ) and, by the induction assumption, x µ = x µ. This contradiction completes the proof. Let us state an interesting corollary to the above proposition. By an infinite convex combination of elements of we mean any point x of the form where c i, λ i, + j= λ j =. x = + i= orollary.3. Let be a finite-dimensional convex set in a Hausdorff t.v.s. X. Then each infinite convex combination of elements of belongs to. Proof. Let x = + i= λ ic i be an infinite convex combination of elements of. Of course, we can restrict ourselves to the subspace Y = span( {x }). Since Y, being a finite-dimensional t.v.s., is isomorphic to R d, d = dim(y ), we can suppose λ i c i
that X = R d. We can also suppose that c i c j whenever i j. onsider the measure µ concentrated on {c i : i N}, given by µ(c i ) = λ i. Since x µ = x, we can apply Proposition.2. Before proving the Jensen inequality, we shall need the following separation lemma. Lemma.4. Let X be a locally convex t.v.s., X a convex set, f : (, + ] a l.s.c. convex function, x dom(f), and t < f(x ). Then there exists a continuous affine function a: X R such that a < f on, and a(x ) > t. Proof. Fix α R such that t < α < f(x ). By lower semicontinuity, there exists a convex neighborhood V of x such that α < f(x) for each x V. xtend f to the whole X defining f(x) = + for x /, and consider the concave function { α if x V g(x) = otherwise. Since g f and g is continuous at x, we can use a H-B Theorem on Separation of Functions to get a continuous affine function a on X such that g a f. Now, t < α a (x ) f(x ). Thus, for a sufficiently small ε >, the function a := a ε has the required properties. Theorem.5 (Jensen inequality). Let R d be a nonempty convex set, f : (, + ] a convex l.s.c. function, and µ M (). Assume that the baricenter x µ exists (which is automatically true for a bounded ). Then x µ and (4) f(x µ ) f dµ (in particular, the integral on the right-hand side exists). Proof. We already know that x µ (Proposition.2). If f +, the assertion is obvious. Now, suppose that f is proper. Notice that f is B()-measurable since the sublevel sets {x : f(x) α} are relatively closed in. We claim that the right-hand-side integral in (4) exists. Indeed, by Lemma.4, there exists a continuous affine function a: R d R such that a < f on. Write a in the form a(x) = l(x) + β where l (R d ), β R. By Observation., a L (µ), which implies that f L (µ). The effective domain dom(f) = f (R) is convex and belongs to B(). If µ( \ dom(f)) >, then obviously f dµ = + and (4) trivially holds. Let µ( \ dom(f)) =. In this case, µ is concentrated on dom(f), and hence, by Proposition.2, x µ dom(f). Assume that (4) is false, that is t := f dµ < f(x µ ). By Lemma.4, there exist l (R d ) and β R such that l + β < f on, and l(x µ ) + β > t. But these two properties are in contradiction: t = f dµ > (l + β) dµ = l dµ + β = l(x µ ) + β > t. 3
4 orollary.6 (Hermite-Hadamard inequalities). Let f : [a, b] R be a continuous convex function. Then ( ) a + b f b f(a) + f(b) f(x) dx. 2 b a 2 a Proof. The measure µ, defined by dµ = dx b a, belongs to M ([a, b]). Moreover, its barycenter is x µ = b a+b b a a x dx = 2. Thus the first inequality follows from the integral Jensen inequality. Let us show the second inequality. Substituting x = a + t(b a), we get b b a a f(x) dx = f(( t)a + tb) dt [ ] ( t)f(a) + tf(b) dt = f(a)+f(b) 2. Image of a probability measure. Let (, Σ, µ) be a probability space, and (T, τ) a topological space. Let g : T be a measurable mapping, that is g (A) Σ whenever A T is an τ-open set. Let Borel(τ) denote the σ-algebra of all Borel sets in T. Since g (B) Σ for each B Borel(τ), we can consider the function ν : Borel(τ) [, ], ν(b) = µ(g (B)). It is easy to verify that ν is a (borelian) probability measure on T, called the image of µ by the map g. Moreover, ν is concentrated on the image g(), in the sense that ν(b) = whenever B Borel(τ) is disjoint from g(). Let us see how to integrate with respect to ν. Let s = n α iχ Bi be a nonnegative simple function on T such that the sets B i are borelian and pairwise disjoint. Then the composition s g is a measurable simple function on and it can be represented as s g = n α iχ g (B i). Thus we have T s dν = n α i ν(b i ) = n α i µ(g (B i )) = s g dµ. Now, if f is a nonnegative Borel-measurable function on T, we can approximate it by a pointwise converging nondecresing sequence of simple Borel-measurable functions. Passing to limits in the above formula, we get (5) f dν = f g dµ. T It follows that, for an arbitrary Borel-measurable function f on T, we have: T f dν exists if and only if f g dµ exists; in this case, the two integrals are equal; f L (ν) if and only if f g L (µ). Let us return to the Jensen inequality. We can apply it to an image measure to obtain the following Theorem.7 (Second Jensen inequality). Let (, Σ, µ) be a probability measure space, and g : R d a measurable mapping that is µ-integrable. Let R d be a convex set such that g(ω) for µ-a.e. ω, and f : (, + ] a l.s.c. convex function. Then: g dµ ;
f g dµ exists; we have the inequality ( ) f g dµ f g dµ. Proof. Let ν be the image of the measure µ by the mapping g :. Then ν is a probability measure on, defined on the relative Borel σ-algebra of B(). Its barycenter is x ν = x dν = g dµ. The rest follows directly from Theorem.5 since f dν = f g dµ. orollary.8 (xamples of applications). Let (, Σ, µ) be a probability measure space, and g L (). (a) Second Jensen inequality for = R and f(x) = x p (p ) gives ( /p g dµ g dµ) p. (b) Second Jensen inequality for = R and f(x) = e x = exp(x) gives ( ) exp g dµ e g dµ. (c) Second Jensen inequality for = (, + ) and f(x) = log x gives ( ) log g dµ log g dµ. Hölder via Jensen. Let us show how the well-known Hölder inequality can be derived from the Second Jensen inequality. Let (, Σ, µ) be a (not necessarily probability) nonnegative measure space. Let p, p (, + ) be two conjugate Hölder exponents (that is, p + p = ). Let f, g be nonnegative measurable functions on, such that their norms in L p (µ) and L p (µ) satisfy < f p < + and < g p < +. onsider the set = {g > } ( Σ) and the measure ν on Σ, defined by dν = dµ gp dµ. gp Then ν is a probability measure which is concentrated on. Applying orollary.8(a) to the function fg p (defined on the set ), we get gp dµ fg dµ = dν fg p ( ) /p f p g p( p ) dν ( ) /p = gp dµ f p g p+p pp dµ = ( gp dµ ) /p ( f p dµ ) /p 5
6 since p + p pp = pp ( p + p ) =. Multiplying by gp dµ, we obtain the Hölder inequality fg dµ f p g p. Finally, notice that if our assumption on the L p -norm of f and the L p -norm of g is not satisfied, the Hölder inequality is trivially satisfied (with the usual convention = ). A generalization of the Hermite-Hadamard inequalities. Let be an arbitrary norm on R d, B R d a closed -ball centered at a point x R d, and S = B. Let m denote the Lebesgue measure in R d and σ the surface measure. Then, for each continuous convex function f : B R, we have the inequalities (6) f(x ) m(b) B f dm σ(s) S f dσ. dm m(b), Proof. By translation, we can suppose that x =. For the probability measure dµ = we have x µ = by symmetry, and hence the Jensen inequality implies the first inequality in (6). Let us show the second inequality. Notice that m(b) = σ(rs) dr = σ(s) r d dr = d σ(s). Now, a similar onion-skin integration and convexity of f imply ( ) f dm = f dσ dr B = ( ( ( = rs rs f ( ) +r 2 ( x r ) + r 2 ( x r )) dσ(x) dr [ +r 2 f( x r ) + r 2 f( x r )] dσ(x) rs ) r d dr f dσ = m(b) f dσ. σ(s) The second inequality in (6) follows by dividing by m(b). S S ) dr Possible generalizations. Some of the above results, except Proposition.2, can be easily generalized to the infinite-dimensional setting. The main problem is that we have to introduce appropriately the barycenter of a probability measure. The most general is the following Pettis-integral approach which defines the integral of a vector-valued function by requiring the equality (3) for every continuous linear functional l. Let X be a t.v.s., and X a nonempty set. We shall say that a point x µ X is a barycenter for a probability measure µ M () (defined on the relative Borel σ-algebra B()) if y (x µ ) = y dµ for each y X. Then we have the following results. Let X be a locally convex t.v.s., X a nonempty convex set, and µ M (). (i) µ has at most one barycenter x µ. (This follows from the fact that X separates the points of X.) (ii) If x µ exists and is either closed or open, then x µ. (The proof by contradiction uses the H-B Separation Theorem, in the same spirit as in Proposition.2.)
(iii) If is compact, then x µ exists. (This is a bit more difficult.) (iv) Theorem.5 (Jensen inequality) holds with X in place of R d, under the additional assumption that is either closed or open, and f is finite. (The proof is the same.) 7