Integral Jensen inequality

Similar documents
Extreme points of compact convex sets

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

L p Spaces and Convexity

l(y j ) = 0 for all y j (1)

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

THEOREMS, ETC., FOR MATH 515

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

MATHS 730 FC Lecture Notes March 5, Introduction

(c) For each α R \ {0}, the mapping x αx is a homeomorphism of X.

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

g 2 (x) (1/3)M 1 = (1/3)(2/3)M.

REAL AND COMPLEX ANALYSIS

Continuity of convex functions in normed spaces

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

Appendix B Convex analysis

THEOREMS, ETC., FOR MATH 516

Banach Spaces V: A Closer Look at the w- and the w -Topologies

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

(U) =, if 0 U, 1 U, (U) = X, if 0 U, and 1 U. (U) = E, if 0 U, but 1 U. (U) = X \ E if 0 U, but 1 U. n=1 A n, then A M.

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Real Analysis Notes. Thomas Goller

Maths 212: Homework Solutions

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first

Signed Measures and Complex Measures

Final. due May 8, 2012

Lebesgue Integration: A non-rigorous introduction. What is wrong with Riemann integration?

Math 4121 Spring 2012 Weaver. Measure Theory. 1. σ-algebras

Chapter 2 Metric Spaces

CHAPTER 6. Differentiation

x 0 + f(x), exist as extended real numbers. Show that f is upper semicontinuous This shows ( ɛ, ɛ) B α. Thus

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

i=1 β i,i.e. = β 1 x β x β 1 1 xβ d

Convex Analysis and Economic Theory Winter 2018

Topology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:

CHAPTER I THE RIESZ REPRESENTATION THEOREM

1 Inner Product Space

A NICE PROOF OF FARKAS LEMMA

CHAPTER V DUAL SPACES

Lebesgue Integration on R n

Jensen s inequality for multivariate medians

Measurable functions are approximately nice, even if look terrible.

Chapter 4. Measure Theory. 1. Measure Spaces

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Reminder Notes for the Course on Measures on Topological Spaces

NOTES ON VECTOR-VALUED INTEGRATION MATH 581, SPRING 2017

Introduction to Topology

g(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0)

Lecture 4 Lebesgue spaces and inequalities

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

Problem Set. Problem Set #1. Math 5322, Fall March 4, 2002 ANSWERS

MAS3706 Topology. Revision Lectures, May I do not answer enquiries as to what material will be in the exam.

NOTES ON THE REGULARITY OF QUASICONFORMAL HOMEOMORPHISMS

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Analysis III Theorems, Propositions & Lemmas... Oh My!

Convex Analysis and Economic Theory AY Elementary properties of convex functions

JUHA KINNUNEN. Harmonic Analysis

Contents. Index... 15

3. The Sheaf of Regular Functions

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Introduction to Functional Analysis

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Constraint qualifications for convex inequality systems with applications in constrained optimization

Measure and integration

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

Review of measure theory

consists of two disjoint copies of X n, each scaled down by 1,

2 Lebesgue integration

Signed Measures. Chapter Basic Properties of Signed Measures. 4.2 Jordan and Hahn Decompositions

Math 209B Homework 2

Solutions to Tutorial 11 (Week 12)

Functional Analysis HW #3

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide

van Rooij, Schikhof: A Second Course on Real Functions

Sobolev spaces. May 18

be the set of complex valued 2π-periodic functions f on R such that

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

Tools from Lebesgue integration

LECTURE 15: COMPLETENESS AND CONVEXITY

Review of Multi-Calculus (Study Guide for Spivak s CHAPTER ONE TO THREE)

6.2 Fubini s Theorem. (µ ν)(c) = f C (x) dµ(x). (6.2) Proof. Note that (X Y, A B, µ ν) must be σ-finite as well, so that.

SOLUTIONS TO SOME PROBLEMS

The Lebesgue Integral

ANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2.

An introduction to some aspects of functional analysis

Convex Analysis and Economic Theory Winter 2018

Combinatorics in Banach space theory Lecture 12

Optimization and Optimal Control in Banach Spaces

6. Duals of L p spaces

INTRODUCTION TO REAL ANALYTIC GEOMETRY

+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1

Transcription:

Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a be the Dirac measure concentrated at a, that is { if a δ a () = if a /. Then µ := n λ iδ xi is a probability measure on, defined for all subsets of. Moreover, we can write n λ if(x i ) = f dµ and n λ ix i = x dµ(x), where the last integral is the integral of the vector-valued function ψ(x) = x. (Vector-valued integration will be recalled below.) Thus the finite Jensen inequality () can be written in the integral form (2) f ( x dµ(x)) f dµ. Our aim is to show that (2) holds, under suitable assumptions, also for more general probability measures on. Vector integration. Let (, Σ, µ) be a positive measure space, and F = (F,..., F d ): R d a measurable function (that is, F (A) Σ for each open set A R d ). It is easy to see that F is measurable if and only if each F k is a measurable function. We say that F is integrable on a set Σ if F k L (, µ) for each k =,..., d. It is easy to see that F is integrable on if and only if F dµ < +. In this case, we define ( ) F dµ = F dµ,..., F d dµ. Observe that, for each linear functional l (R d ), we have ( ) (3) l F dµ = l F dµ (this follows immediately by representing l with a vector of R d ). As an easy consequence, we obtain that F dµ exists if and only if l F L (, µ) for each l (R d ). Returning to our aim, our measure µ should be defined on a σ-algebra of subsets of, for which the restriction of the identity function x x to is measurable. The smallest such σ-algebra is the family B() = {B : B Borel(R d )} where Borel(R d ) is the Borel σ-algebra of R d (that is, the σ-algebra generated by the family of all open sets). Thus we shall consider the following family of measures: M () = { µ: B() [, + ) : µ measure, µ() = }. The barycenter x µ of a measure µ M () is defined by x µ = x dµ(x)

2 (if the vector integral exists). Notice that by (3) we have l(x µ ) = l dµ for each l (R d ). Observation.. Let R d be a nonempty convex set, µ M (). (a) x µ exists L (µ) l L (µ) for each l (R d ). (This follows easily from the fact that x µ exists if and only if x(i) dµ(x) < + for each i =,..., d.) (b) If µ is concentrated on a bounded subset of (in particular, if is bounded), then x µ exists. Proposition.2. Let R d be a nonempty convex set, µ M (). barycenter x µ exists, then x µ. If the Proof. Let us proceed by induction with respect to n := dim(). For n =, we have = {x } and hence µ = δ x, x µ = x. Now, fix m < d and suppose that the statement holds whenever n m. Now, let dim() = m + and x µ /. By the Relative Interior Theorem, ri(), and hence we can suppose that ri(). In this case, L := span() = aff() and int L (). Let us consider two cases. (a) If x µ / L, there exists l (R d ) such that l L and l(x µ ) >. (b) If x µ L, there exists l L \ {} such that l(x µ ) sup l() (by the H-B Separation Theorem), and this l can be extended to an element (denoted again by l) of (R d ). In both cases, we have [ l(xµ ) l(x) ] dµ(x) = l(x µ ) ( ) l dµ = l(x µ ) l x dµ(x) =. Since the expression in square brackets is nonnegative, it must be µ-a.e. null. This implies that necessarily x µ L. But in this case, l is not identically zero on L. onsequently, µ is concentrated on the set := H where H = {x L : l(x) = l(x µ )} is a hyperplane in L. Thus dim( ) m, µ := µ M ( ) and, by the induction assumption, x µ = x µ. This contradiction completes the proof. Let us state an interesting corollary to the above proposition. By an infinite convex combination of elements of we mean any point x of the form where c i, λ i, + j= λ j =. x = + i= orollary.3. Let be a finite-dimensional convex set in a Hausdorff t.v.s. X. Then each infinite convex combination of elements of belongs to. Proof. Let x = + i= λ ic i be an infinite convex combination of elements of. Of course, we can restrict ourselves to the subspace Y = span( {x }). Since Y, being a finite-dimensional t.v.s., is isomorphic to R d, d = dim(y ), we can suppose λ i c i

that X = R d. We can also suppose that c i c j whenever i j. onsider the measure µ concentrated on {c i : i N}, given by µ(c i ) = λ i. Since x µ = x, we can apply Proposition.2. Before proving the Jensen inequality, we shall need the following separation lemma. Lemma.4. Let X be a locally convex t.v.s., X a convex set, f : (, + ] a l.s.c. convex function, x dom(f), and t < f(x ). Then there exists a continuous affine function a: X R such that a < f on, and a(x ) > t. Proof. Fix α R such that t < α < f(x ). By lower semicontinuity, there exists a convex neighborhood V of x such that α < f(x) for each x V. xtend f to the whole X defining f(x) = + for x /, and consider the concave function { α if x V g(x) = otherwise. Since g f and g is continuous at x, we can use a H-B Theorem on Separation of Functions to get a continuous affine function a on X such that g a f. Now, t < α a (x ) f(x ). Thus, for a sufficiently small ε >, the function a := a ε has the required properties. Theorem.5 (Jensen inequality). Let R d be a nonempty convex set, f : (, + ] a convex l.s.c. function, and µ M (). Assume that the baricenter x µ exists (which is automatically true for a bounded ). Then x µ and (4) f(x µ ) f dµ (in particular, the integral on the right-hand side exists). Proof. We already know that x µ (Proposition.2). If f +, the assertion is obvious. Now, suppose that f is proper. Notice that f is B()-measurable since the sublevel sets {x : f(x) α} are relatively closed in. We claim that the right-hand-side integral in (4) exists. Indeed, by Lemma.4, there exists a continuous affine function a: R d R such that a < f on. Write a in the form a(x) = l(x) + β where l (R d ), β R. By Observation., a L (µ), which implies that f L (µ). The effective domain dom(f) = f (R) is convex and belongs to B(). If µ( \ dom(f)) >, then obviously f dµ = + and (4) trivially holds. Let µ( \ dom(f)) =. In this case, µ is concentrated on dom(f), and hence, by Proposition.2, x µ dom(f). Assume that (4) is false, that is t := f dµ < f(x µ ). By Lemma.4, there exist l (R d ) and β R such that l + β < f on, and l(x µ ) + β > t. But these two properties are in contradiction: t = f dµ > (l + β) dµ = l dµ + β = l(x µ ) + β > t. 3

4 orollary.6 (Hermite-Hadamard inequalities). Let f : [a, b] R be a continuous convex function. Then ( ) a + b f b f(a) + f(b) f(x) dx. 2 b a 2 a Proof. The measure µ, defined by dµ = dx b a, belongs to M ([a, b]). Moreover, its barycenter is x µ = b a+b b a a x dx = 2. Thus the first inequality follows from the integral Jensen inequality. Let us show the second inequality. Substituting x = a + t(b a), we get b b a a f(x) dx = f(( t)a + tb) dt [ ] ( t)f(a) + tf(b) dt = f(a)+f(b) 2. Image of a probability measure. Let (, Σ, µ) be a probability space, and (T, τ) a topological space. Let g : T be a measurable mapping, that is g (A) Σ whenever A T is an τ-open set. Let Borel(τ) denote the σ-algebra of all Borel sets in T. Since g (B) Σ for each B Borel(τ), we can consider the function ν : Borel(τ) [, ], ν(b) = µ(g (B)). It is easy to verify that ν is a (borelian) probability measure on T, called the image of µ by the map g. Moreover, ν is concentrated on the image g(), in the sense that ν(b) = whenever B Borel(τ) is disjoint from g(). Let us see how to integrate with respect to ν. Let s = n α iχ Bi be a nonnegative simple function on T such that the sets B i are borelian and pairwise disjoint. Then the composition s g is a measurable simple function on and it can be represented as s g = n α iχ g (B i). Thus we have T s dν = n α i ν(b i ) = n α i µ(g (B i )) = s g dµ. Now, if f is a nonnegative Borel-measurable function on T, we can approximate it by a pointwise converging nondecresing sequence of simple Borel-measurable functions. Passing to limits in the above formula, we get (5) f dν = f g dµ. T It follows that, for an arbitrary Borel-measurable function f on T, we have: T f dν exists if and only if f g dµ exists; in this case, the two integrals are equal; f L (ν) if and only if f g L (µ). Let us return to the Jensen inequality. We can apply it to an image measure to obtain the following Theorem.7 (Second Jensen inequality). Let (, Σ, µ) be a probability measure space, and g : R d a measurable mapping that is µ-integrable. Let R d be a convex set such that g(ω) for µ-a.e. ω, and f : (, + ] a l.s.c. convex function. Then: g dµ ;

f g dµ exists; we have the inequality ( ) f g dµ f g dµ. Proof. Let ν be the image of the measure µ by the mapping g :. Then ν is a probability measure on, defined on the relative Borel σ-algebra of B(). Its barycenter is x ν = x dν = g dµ. The rest follows directly from Theorem.5 since f dν = f g dµ. orollary.8 (xamples of applications). Let (, Σ, µ) be a probability measure space, and g L (). (a) Second Jensen inequality for = R and f(x) = x p (p ) gives ( /p g dµ g dµ) p. (b) Second Jensen inequality for = R and f(x) = e x = exp(x) gives ( ) exp g dµ e g dµ. (c) Second Jensen inequality for = (, + ) and f(x) = log x gives ( ) log g dµ log g dµ. Hölder via Jensen. Let us show how the well-known Hölder inequality can be derived from the Second Jensen inequality. Let (, Σ, µ) be a (not necessarily probability) nonnegative measure space. Let p, p (, + ) be two conjugate Hölder exponents (that is, p + p = ). Let f, g be nonnegative measurable functions on, such that their norms in L p (µ) and L p (µ) satisfy < f p < + and < g p < +. onsider the set = {g > } ( Σ) and the measure ν on Σ, defined by dν = dµ gp dµ. gp Then ν is a probability measure which is concentrated on. Applying orollary.8(a) to the function fg p (defined on the set ), we get gp dµ fg dµ = dν fg p ( ) /p f p g p( p ) dν ( ) /p = gp dµ f p g p+p pp dµ = ( gp dµ ) /p ( f p dµ ) /p 5

6 since p + p pp = pp ( p + p ) =. Multiplying by gp dµ, we obtain the Hölder inequality fg dµ f p g p. Finally, notice that if our assumption on the L p -norm of f and the L p -norm of g is not satisfied, the Hölder inequality is trivially satisfied (with the usual convention = ). A generalization of the Hermite-Hadamard inequalities. Let be an arbitrary norm on R d, B R d a closed -ball centered at a point x R d, and S = B. Let m denote the Lebesgue measure in R d and σ the surface measure. Then, for each continuous convex function f : B R, we have the inequalities (6) f(x ) m(b) B f dm σ(s) S f dσ. dm m(b), Proof. By translation, we can suppose that x =. For the probability measure dµ = we have x µ = by symmetry, and hence the Jensen inequality implies the first inequality in (6). Let us show the second inequality. Notice that m(b) = σ(rs) dr = σ(s) r d dr = d σ(s). Now, a similar onion-skin integration and convexity of f imply ( ) f dm = f dσ dr B = ( ( ( = rs rs f ( ) +r 2 ( x r ) + r 2 ( x r )) dσ(x) dr [ +r 2 f( x r ) + r 2 f( x r )] dσ(x) rs ) r d dr f dσ = m(b) f dσ. σ(s) The second inequality in (6) follows by dividing by m(b). S S ) dr Possible generalizations. Some of the above results, except Proposition.2, can be easily generalized to the infinite-dimensional setting. The main problem is that we have to introduce appropriately the barycenter of a probability measure. The most general is the following Pettis-integral approach which defines the integral of a vector-valued function by requiring the equality (3) for every continuous linear functional l. Let X be a t.v.s., and X a nonempty set. We shall say that a point x µ X is a barycenter for a probability measure µ M () (defined on the relative Borel σ-algebra B()) if y (x µ ) = y dµ for each y X. Then we have the following results. Let X be a locally convex t.v.s., X a nonempty convex set, and µ M (). (i) µ has at most one barycenter x µ. (This follows from the fact that X separates the points of X.) (ii) If x µ exists and is either closed or open, then x µ. (The proof by contradiction uses the H-B Separation Theorem, in the same spirit as in Proposition.2.)

(iii) If is compact, then x µ exists. (This is a bit more difficult.) (iv) Theorem.5 (Jensen inequality) holds with X in place of R d, under the additional assumption that is either closed or open, and f is finite. (The proof is the same.) 7