$L^p$ Spaces and Convexity

These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis.

1. Convex functions

Let $I \subset \mathbb{R}$ be an interval. For $I$ open, we say a function $\phi \colon I \to \mathbb{R}$ is convex if for every $a, b \in I$ and every $\lambda \in (0,1)$, we have
\[
\phi(\lambda b + (1-\lambda)a) \le \lambda \phi(b) + (1-\lambda)\phi(a). \tag{1}
\]
(Note we do not assume that $\phi$ is differentiable, as for example $\phi(x) = |x|$ is convex.) If $I$ is not open, then we say $\phi \colon I \to \mathbb{R}$ is convex if (1) is satisfied and $\phi$ is continuous at any endpoint of $I$. Geometrically, $\phi$ is convex if every secant line segment lies above the graph of $\phi$.

A convex function $\phi$ is said to be strictly convex if whenever equality in (1) is satisfied for some $\lambda \in (0,1)$ and $a, b \in I$, then $a = b$. In other words, $\phi$ is strictly convex if for every $a \ne b$ in $I$ and $\lambda \in (0,1)$,
\[
\phi(\lambda b + (1-\lambda)a) < \lambda \phi(b) + (1-\lambda)\phi(a).
\]

Here are a few lemmas about convex functions, whose proofs will be left as exercises.

Lemma 1. Let $\phi$ be a convex function, and let $a, b \in I$ with $a < b$. Assume there is a $\lambda \in (0,1)$ so that
\[
\phi(\lambda b + (1-\lambda)a) = \lambda \phi(b) + (1-\lambda)\phi(a).
\]
Then the restriction of $\phi$ to $[a,b]$ is linear.

Lemma 2. A convex function $\phi$ is strictly convex if and only if its graph contains no line segments.

Lemma 3. Each tangent line to the graph of a differentiable strictly convex function $\phi$ intersects the graph of $\phi$ only at the point of tangency.

Lemma 4. Any convex function is continuous.

If $\phi \colon I \to \mathbb{R}$ is convex and $x_0 \in I$, then the line given by the graph of $\ell(x) = \phi(x_0) + m(x - x_0)$ is a supporting line of $\phi$ at $x_0$ if $\phi(x) \ge \ell(x)$ for all $x \in I$.

Proposition 5. Let $\phi \colon I \to \mathbb{R}$ be convex, and let $x_0$ be an interior point of $I$. Then there is a supporting line for $\phi$ at $x_0$.

Proof. For $x \in I \setminus \{x_0\}$, let
\[
m(x) = \frac{\phi(x) - \phi(x_0)}{x - x_0}.
\]
Then we claim $m$ is an increasing function of $x$. To prove the claim, first let $x_0 < x' < x$ be points in $I$, and define $\lambda \in (0,1)$ so that $x' = \lambda x + (1-\lambda)x_0$. Consider the secant line from $x_0$ to $x$. Then the convexity of $\phi$ implies
\[
\phi(x') = \phi(\lambda x + (1-\lambda)x_0) \le \lambda \phi(x) + (1-\lambda)\phi(x_0).
\]
Compute $\lambda = \dfrac{x' - x_0}{x - x_0}$ and $1 - \lambda = \dfrac{x - x'}{x - x_0}$, so that
\[
\begin{aligned}
\phi(x') &\le \frac{x' - x_0}{x - x_0}\,\phi(x) + \frac{x - x'}{x - x_0}\,\phi(x_0)
= \frac{x' - x_0}{x - x_0}\,\phi(x) - \frac{x' - x_0}{x - x_0}\,\phi(x_0) + \frac{x' - x_0}{x - x_0}\,\phi(x_0) + \frac{x - x'}{x - x_0}\,\phi(x_0) \\
&= \frac{x' - x_0}{x - x_0}\,[\phi(x) - \phi(x_0)] + \phi(x_0)
= (x' - x_0)\,m(x) + \phi(x_0),
\end{aligned}
\]
and therefore
\[
\frac{\phi(x') - \phi(x_0)}{x' - x_0} \le m(x), \qquad \text{i.e.}\qquad m(x') \le m(x).
\]
The other cases $x' < x_0 < x$ and $x' < x < x_0$ are similar.

Now since $m$ is increasing on $I \setminus \{x_0\}$, the one-sided limits
\[
m^+ = \lim_{x \to x_0^+} m(x) \qquad\text{and}\qquad m^- = \lim_{x \to x_0^-} m(x)
\]
exist and satisfy $m^- \le m^+$. Then we claim that if $m^- \le m \le m^+$, then $\ell(x) = \phi(x_0) + m(x - x_0)$ is a supporting line of $\phi$ at $x_0$. Since $m \le m(x)$ for all $x > x_0$,
\[
\ell(x) = \phi(x_0) + m(x - x_0) \le \phi(x_0) + m(x)(x - x_0) = \phi(x).
\]
Similarly, since $m \ge m(x)$ for all $x < x_0$ (and $x - x_0 < 0$), we also see $\ell(x) \le \phi(x)$ for all $x < x_0$, and thus the graph of $\ell$ is a supporting line for $\phi$ at $x_0$. $\square$

Corollary 6. Let $\phi$ be a differentiable convex function, and let $a, b \in I$. Then
\[
\phi(b) \ge \phi(a) + \phi'(a)(b - a).
\]
In other words, the graph of $\phi$ lies above the graph of each tangent line. For the proof, just recognize $m^+ = m^- = \phi'(a)$ in this case.

Proposition 7. If $\phi \colon I \to \mathbb{R}$ is strictly convex, and the graph of $\ell$ is a supporting line of $\phi$ at $x_0 \in I$, then for all $x \in I \setminus \{x_0\}$, $\phi(x) > \ell(x)$.

Proof. Apply Lemma 2. $\square$

Proposition 8. Let $\phi \colon I \to \mathbb{R}$ be continuous, and assume $\phi'' > 0$ on the interior $\mathring{I}$ of $I$. Then $\phi$ is strictly convex on $I$.

Proof. Since $\phi'' > 0$, we see that $\phi'$ is strictly increasing on $\mathring{I}$. Let $a < b$ in $I$. Define
\[
\psi(t) = \phi(tb + (1-t)a) - t\phi(b) - (1-t)\phi(a).
\]
Then we want to show $\psi < 0$ on $(0,1)$. Note $\psi(0) = \psi(1) = 0$. Now
\[
\psi'(t) = \phi'(tb + (1-t)a)(b - a) - \phi(b) + \phi(a)
\]
is strictly increasing. Since $\psi(0) = \psi(1) = 0$, there is a $T \in (0,1)$ where $\psi'(T) = 0$ (either there is a local extremum point or $\psi$ is constant; this is Rolle's Theorem). Since $\psi'$ is strictly increasing, we have $\psi'(t) < 0$ for $t \in (0,T)$ and $\psi'(t) > 0$ for $t \in (T,1)$. Therefore,
$\psi$ is strictly decreasing on $[0,T]$ and strictly increasing on $[T,1]$. Since $\psi(0) = \psi(1) = 0$, we find $\psi(t) < 0$ for $t \in (0,1)$. $\square$

Corollary 9. For $p \in (1,\infty)$, the function $x \mapsto x^p$ is strictly convex on $[0,\infty)$. The exponential function $\exp x = e^x$ is strictly convex on $(-\infty,\infty)$.

2. The Banach space $L^p$

Let $(X, \mathcal{M}, \mu)$ be a measure space. For a measurable function $f$, define
\[
\|f\|_{L^p} = \left( \int_X |f|^p \, d\mu \right)^{1/p}.
\]
Then we define
\[
L^p(X) = \{ f \colon X \to \mathbb{R} \text{ (or } \mathbb{C}\text{)} : \|f\|_{L^p} < \infty \} / \sim,
\]
where as usual $f \sim g$ if $f = g$ almost everywhere.

Proposition 10. $\|\cdot\|_{L^p}$ is a norm on $L^p(X)$.

Proof. It is obvious that $\|f\|_{L^p} \ge 0$ always, and if $\|f\|_{L^p} = 0$, then $\int_X |f|^p \, d\mu = 0$, which implies $|f|^p = 0$ a.e. This is equivalent to $f = 0$ a.e. It is obvious that if $\alpha$ is a constant, then $\|\alpha f\|_{L^p} = |\alpha|\, \|f\|_{L^p}$. The Triangle Inequality is harder, and we cover it in Minkowski's Theorem below. $\square$

Theorem 1 (Minkowski's Theorem). Let $p \in [1,\infty]$. If $f, g \in L^p(X)$, then
\[
\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}. \tag{2}
\]
If $p \in (1,\infty)$, then equality can hold only if there are nonnegative constants $\alpha, \beta$, not both zero, so that $\beta f = \alpha g$. Moreover, if $f, g \ge 0$ are measurable (but not necessarily in $L^p(X)$), then (2) holds.

Proof. We have already addressed the cases $p = 1, \infty$, so we may assume $p \in (1,\infty)$. Also, if $\|f\|_{L^p} = 0$, then $f = 0$ a.e., and the conclusion is valid. So now assume $p \in (1,\infty)$, $\alpha = \|f\|_{L^p} > 0$, and $\beta = \|g\|_{L^p} > 0$. Choose functions $f_0 = \alpha^{-1} f$, $g_0 = \beta^{-1} g$, so that $\|f_0\|_{L^p} = \|g_0\|_{L^p} = 1$. Let $\lambda = \alpha/(\alpha+\beta)$, so that $1 - \lambda = \beta/(\alpha+\beta)$. Compute
\[
\begin{aligned}
|f(x) + g(x)|^p &\le (|f(x)| + |g(x)|)^p = [\alpha |f_0(x)| + \beta |g_0(x)|]^p \\
&= (\alpha+\beta)^p [\lambda |f_0(x)| + (1-\lambda)|g_0(x)|]^p \\
&\le (\alpha+\beta)^p [\lambda |f_0(x)|^p + (1-\lambda)|g_0(x)|^p]
\end{aligned}
\]
by the convexity of $\phi(t) = t^p$. Recall $p \in (1,\infty)$ implies this last inequality is strict unless $|f_0(x)| = |g_0(x)|$.

For $z \in \mathbb{C}$, define $\operatorname{sgn} 0 = 0$ and $\operatorname{sgn} z = z/|z|$ otherwise. Also define $\operatorname{sgn}(\infty) = \operatorname{sgn}(-\infty) = 0$. For $f(x), g(x)$ finite and nonzero, we see $|f(x) + g(x)|^p = (|f(x)| + |g(x)|)^p$ if and only
if $\operatorname{sgn} f(x) = \operatorname{sgn} g(x)$. Thus, when $f(x)$ and $g(x)$ are finite, by considering various cases, we find
\[
|f(x) + g(x)|^p \le (\alpha+\beta)^p [\lambda |f_0(x)|^p + (1-\lambda)|g_0(x)|^p], \tag{3}
\]
with equality if and only if $\alpha^{-1} f(x) = \beta^{-1} g(x)$ when $f(x), g(x)$ are finite. Integrating both sides of (3) gives
\[
\|f + g\|_{L^p}^p \le (\alpha+\beta)^p [\lambda \|f_0\|_{L^p}^p + (1-\lambda)\|g_0\|_{L^p}^p] = (\alpha+\beta)^p = (\|f\|_{L^p} + \|g\|_{L^p})^p.
\]
Therefore $\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}$ for $f, g \in L^p$. Moreover, if there is equality, then
\[
\int_X \big( (\alpha+\beta)^p [\lambda |f_0(x)|^p + (1-\lambda)|g_0(x)|^p] - |f(x) + g(x)|^p \big)\, d\mu = 0,
\]
and the integrand is nonnegative almost everywhere. Therefore the integrand must vanish almost everywhere, and thus $\alpha^{-1} f(x) = \beta^{-1} g(x)$ for almost every $x$. Finally, the remaining case in which $f, g$ are nonnegative and $\|f\|_{L^p}$ or $\|g\|_{L^p}$ is infinite is trivial. $\square$

Let $(V, \|\cdot\|)$ be a normed linear space. In other words, $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$ equipped with a norm. A series $\sum_{n=1}^\infty v_n$ for $v_n \in V$ is convergent if the partial sums converge to a limit in $V$. The series is said to be absolutely convergent if $\sum_{n=1}^\infty \|v_n\| < \infty$.

Proposition 11. Let $V$ be a vector space over the field $\mathbb{R}$ or $\mathbb{C}$ equipped with a norm. Consider the metric on $V$ with the distance function $\|x - y\|$. Then $V$ is complete if and only if every absolutely convergent series in $V$ is convergent.

Proof. First of all, assume $V$ is complete. Let $\sum_{n=1}^\infty v_n$ be an absolutely convergent series, and let $s_n = \sum_{j=1}^n v_j$ be the partial sum. Then if $m > n$, $s_m - s_n = \sum_{j=n+1}^m v_j$ and
\[
\|s_m - s_n\| = \Big\| \sum_{j=n+1}^m v_j \Big\| \le \sum_{j=n+1}^m \|v_j\| \le \sum_{j=n+1}^\infty \|v_j\|. \tag{4}
\]
But now since $\sum_{n=1}^\infty v_n$ is absolutely convergent, the sum $\sum_{n=1}^\infty \|v_n\|$ converges, and so the tail of the series $\sum_{j=n+1}^\infty \|v_j\| \to 0$ as $n \to \infty$. In other words, for every $\epsilon > 0$, there is an $N$ so that if $n \ge N$, then $\sum_{j=n+1}^\infty \|v_j\| \le \epsilon$. Then (4) shows the sequence of partial sums $s_n$ is a Cauchy sequence. Since $V$ is complete, it has a limit $s \in V$, which is the sum of the series.

On the other hand, assume every absolutely convergent series in $V$ is convergent. Let $w_n$ be a Cauchy sequence.
Define a subsequence $w_{n_k}$ as follows: for $\epsilon = \frac{1}{2}$, there is an $N$ so that if $n, m \ge N$, then $\|w_n - w_m\| \le \frac{1}{2}$. Let $n_1 = N$. Then define $n_k$ recursively as $n_k = \max\{n_{k-1} + 1, N\}$, for $N$ a constant so that if $n, m \ge N$, then $\|w_n - w_m\| \le 2^{-k}$. By induction, $w_{n_k}$ is a subsequence of $w_n$ so that $\|w_{n_k} - w_{n_{k+1}}\| \le 2^{-k}$ for all $k$. Now let $v_1 = w_{n_1}$ and $v_k = w_{n_k} - w_{n_{k-1}}$ for $k \ge 2$. By construction, $\|v_k\| \le 2^{-(k-1)}$ for $k \ge 2$, and so
\[
\sum_{k=1}^\infty \|v_k\| \le \|w_{n_1}\| + \sum_{k=2}^\infty 2^{-(k-1)} = \|w_{n_1}\| + 1 < \infty.
\]
Therefore $\sum_{k=1}^\infty v_k$ is absolutely convergent, and thus is convergent to a sum $s$ by our assumption. Now we show $w_n \to s$. Note the partial sum $\sum_{j=1}^k v_j = w_{n_k}$, and so $w_{n_k} \to s$ as $k \to \infty$. Let $\epsilon > 0$. There is a $K$ so that if $k \ge K$, then $\|w_{n_k} - s\| \le \frac{\epsilon}{2}$. Since $w_n$ is Cauchy, there is an $N$ so that if $n, m \ge N$, then $\|w_n - w_m\| \le \frac{\epsilon}{2}$. So choose $L \ge K$ so that $n_L \ge N$, and then for $n \ge n_L$, we have
\[
\|w_n - s\| \le \|w_n - w_{n_L}\| + \|w_{n_L} - s\| \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\]
So $w_n \to s$. $\square$

Theorem 2. For $p \in [1,\infty]$, $L^p(X)$ is a Banach space.

Proof. We have already addressed the case $p = \infty$. Thus we may assume that $p \in [1,\infty)$. We have also proved above that $\|\cdot\|_{L^p}$ is a norm. Thus we only need to prove $L^p(X)$ is complete. We will use the previous proposition to show that absolutely convergent series in $L^p(X)$ are convergent.

Let $\sum f_n$, with $f_n \in L^p(X)$, be an absolutely convergent series, so that $\sum_{n=1}^\infty \|f_n\|_{L^p} = M < \infty$. Define $g_n(x) = \sum_{k=1}^n |f_k(x)|$. By Minkowski's Inequality,
\[
\|g_n\|_{L^p} \le \sum_{k=1}^n \|f_k\|_{L^p} \le \sum_{k=1}^\infty \|f_k\|_{L^p} = M.
\]
Since $g_n$ is increasing pointwise, it converges $g_n(x) \to g(x)$ as $n \to \infty$ (where $g$ may take the value $\infty$). Moreover, $g_n(x)^p \to g(x)^p$. By Fatou's Lemma, we see
\[
\int_X g^p \, d\mu \le \liminf_{n\to\infty} \int_X g_n^p \, d\mu = \liminf_{n\to\infty} \|g_n\|_{L^p}^p \le M^p.
\]
So $g^p$ is integrable, and $g(x)$ is finite for almost every $x$. For $x$ with $g(x) < \infty$, $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent, and thus is convergent in $\mathbb{R}$ (or $\mathbb{C}$). So for almost every $x$, we define $s(x) = \sum_{n=1}^\infty f_n(x)$, and let $s_n(x)$ be the corresponding partial sum. Note $|s_n(x)| \le g(x)$ implies $|s(x)| \le g(x)$ and thus $s \in L^p(X)$. This implies $|s_n(x) - s(x)|^p \le 2^p g(x)^p$, and $2^p g^p$ is integrable. Therefore the Dominated Convergence Theorem applies, and since $|s_n(x) - s(x)|^p \to 0$ almost everywhere,
\[
\|s_n - s\|_{L^p}^p = \int_X |s_n - s|^p \, d\mu \to \int_X 0 \, d\mu = 0.
\]
Thus the sum $\sum_{n=1}^\infty f_n$ converges to $s \in L^p(X)$. $\square$

Theorem 3. Let $p \in [1,\infty)$, and consider $\mathbb{R}^d$ with Lebesgue measure. Then the following sets of functions are dense in $L^p(\mathbb{R}^d)$: simple functions; step functions; continuous functions with compact support.

The proof is very similar to the case $p = 1$.
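Minkowski's Inequality (2) is easy to sanity-check numerically on a finite measure space, where the integral is a weighted sum. The following Python sketch is not part of the notes; it is an illustration using numpy, and the helper name `lp_norm` and the random test data are my own choices. It checks the triangle inequality and the equality case $\beta f = \alpha g$ (here $g$ is a nonnegative multiple of $f$).

```python
import numpy as np

def lp_norm(f, p, mu):
    """||f||_{L^p} on a finite measure space: (sum of |f|^p * mu)^(1/p)."""
    return np.sum(np.abs(f) ** p * mu) ** (1.0 / p)

rng = np.random.default_rng(0)
n, p = 50, 3.0
mu = rng.uniform(0.1, 1.0, n)      # weights of a finite measure space
f = rng.normal(size=n)
g = rng.normal(size=n)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
lhs = lp_norm(f + g, p, mu)
rhs = lp_norm(f, p, mu) + lp_norm(g, p, mu)
assert lhs <= rhs + 1e-12

# Equality case: h a nonnegative multiple of f forces equality,
# since ||f + h||_p = 3.5 ||f||_p = ||f||_p + ||h||_p
h = 2.5 * f
assert abs(lp_norm(f + h, p, mu) - (lp_norm(f, p, mu) + lp_norm(h, p, mu))) < 1e-9
```

Of course a finite check is no proof, but it is a quick way to catch a misremembered exponent or a missing absolute value.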
3. Hölder's Inequality

For $p \in [1,\infty]$, the conjugate exponent is defined to be the $q$ so that $\frac{1}{p} + \frac{1}{q} = 1$. We consider $1, \infty$ to be conjugate exponents.

Theorem 4 (Hölder's Inequality). Let $p, q$ be conjugate exponents. Let $f \in L^p(X)$ and $g \in L^q(X)$. Then
\[
\|fg\|_{L^1} = \int_X |fg| \, d\mu \le \|f\|_{L^p} \|g\|_{L^q}. \tag{5}
\]
Moreover, if $p \in (1,\infty)$, equality holds in (5) if and only if there are constants $\alpha, \beta$ which are not both zero so that $\alpha |f|^p = \beta |g|^q$ almost everywhere. More generally, if $f, g$ are nonnegative measurable functions, then (5) holds.

Proof. First of all, if $\|f\|_{L^p} = 0$, then $f = 0$ a.e. and the result is trivial. The same is true if $\|g\|_{L^q} = 0$. If $p = 1$ and $q = \infty$, then $|g(x)| \le \|g\|_{L^\infty}$ for almost all $x$. Therefore,
\[
\int_X |fg| \, d\mu \le \|g\|_{L^\infty} \int_X |f| \, d\mu = \|f\|_{L^1} \|g\|_{L^\infty}.
\]
The same is true if $p = \infty$ and $q = 1$. Thus we assume $p, q \in (1,\infty)$. We may assume $\alpha = \|f\|_{L^p}$ and $\beta = \|g\|_{L^q}$ are positive. Let $f_0 = \alpha^{-1} f$ and $g_0 = \beta^{-1} g$. The convexity of the exponential function implies, since $\frac{1}{p} + \frac{1}{q} = 1$, that
\[
e^{\frac{s}{p} + \frac{t}{q}} \le p^{-1} e^s + q^{-1} e^t.
\]
Now for $x$ so that $|f_0(x)|, |g_0(x)| \in (0,\infty)$, define $s, t$ by $|f_0(x)| = \exp(\frac{s}{p})$ and $|g_0(x)| = \exp(\frac{t}{q})$. Therefore,
\[
|f_0(x)\, g_0(x)| \le p^{-1} |f_0(x)|^p + q^{-1} |g_0(x)|^q \tag{6}
\]
for every $x$. (The cases where $|f_0(x)|, |g_0(x)|$ are $0$ or $\infty$ are easy to analyze.) Moreover, the strict convexity of the exponential function implies that if there is equality in (6), then $s = t$, which implies $|f_0(x)|^p = |g_0(x)|^q$, at least in the case when $f_0(x), g_0(x)$ are both finite. Now integrate (6) to see
\[
\int_X |f_0 g_0| \, d\mu \le p^{-1} \int_X |f_0|^p \, d\mu + q^{-1} \int_X |g_0|^q \, d\mu = p^{-1} + q^{-1} = 1.
\]
Then the definitions of $f_0, g_0$ imply (5).

If Hölder's Inequality is an equality, then
\[
\int_X \big( p^{-1} |f_0|^p + q^{-1} |g_0|^q - |f_0 g_0| \big)\, d\mu = 0,
\]
while the integrand is nonnegative. Thus we have $|f_0(x)\, g_0(x)| = p^{-1} |f_0(x)|^p + q^{-1} |g_0(x)|^q$ for almost all $x$. This implies $|f_0(x)|^p = |g_0(x)|^q$ for almost all $x$.

One remaining case is that of $f, g \ge 0$ but $\|f\|_{L^p} = \infty$. The inequality is trivially true here. The last remaining case, of $\|g\|_{L^q} = \infty$, is handled the same way. $\square$
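As with Minkowski, Hölder's Inequality (5) and its equality condition $\alpha|f|^p = \beta|g|^q$ can be checked numerically on a finite measure space. This Python sketch is illustrative only (the helper `lp_norm` and the test data are my own, not from the notes); the equality case takes $|g| = |f|^{p-1}$, so that $|g|^q = |f|^{(p-1)q} = |f|^p$ pointwise.

```python
import numpy as np

def lp_norm(f, p, mu):
    """||f||_{L^p} for a finite measure with weights mu."""
    return np.sum(np.abs(f) ** p * mu) ** (1.0 / p)

rng = np.random.default_rng(1)
n = 40
mu = rng.uniform(0.1, 1.0, n)
p = 2.5
q = p / (p - 1.0)                 # conjugate exponent: 1/p + 1/q = 1
f = rng.normal(size=n)
g = rng.normal(size=n)

# Hölder: ||fg||_1 <= ||f||_p ||g||_q
assert np.sum(np.abs(f * g) * mu) <= lp_norm(f, p, mu) * lp_norm(g, q, mu) + 1e-12

# Equality case: |g| = |f|^(p-1) makes |f|^p proportional to |g|^q pointwise
g_eq = np.abs(f) ** (p - 1.0)
lhs = np.sum(np.abs(f * g_eq) * mu)
rhs = lp_norm(f, p, mu) * lp_norm(g_eq, q, mu)
assert abs(lhs - rhs) < 1e-9
```

In the equality case both sides reduce to $\int |f|^p\,d\mu$, since $\frac1p + \frac1q = 1$, which is why the check holds exactly up to rounding.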
4. Jensen's Inequality

A measure $\mu$ on a $\sigma$-algebra $\mathcal{M}$ on a set $X$ is called a probability measure if $\mu(X) = 1$.

Proposition 12. Let $(X, \mathcal{M}, \mu)$ be a measure space. Let $f$ be a nonnegative measurable function on $X$. For every $E \in \mathcal{M}$, define $\nu(E) = \int_E f \, d\mu$. Then $\nu$ is a measure on $\mathcal{M}$.

Proof. We need to check countable additivity. So let $\{E_j\}$ be a countable disjoint collection of measurable sets. Then
\[
\nu\Big( \bigcup_{j=1}^\infty E_j \Big) = \int_X f \,\chi_{\bigcup_{j=1}^\infty E_j} \, d\mu = \int_X \sum_{j=1}^\infty f \chi_{E_j} \, d\mu = \sum_{j=1}^\infty \int_X f \chi_{E_j} \, d\mu = \sum_{j=1}^\infty \nu(E_j).
\]
Here the second equality is by the assumption that the $E_j$ are disjoint, while the third follows from the Monotone Convergence Theorem, since $f \chi_{E_j} \ge 0$. $\square$

This proposition shows how to produce a probability measure from any measure space together with a measurable nonnegative function with integral 1.

Theorem 5 (Jensen's Inequality). Let $(X, \mathcal{M}, \mu)$ be a probability measure space. Let $g$ be an integrable function on $X$ with range in an interval $I \subset \mathbb{R}$. Let $\phi \colon I \to \mathbb{R}$ be convex. Then
\[
\phi\Big( \int_X g \, d\mu \Big) \le \int_X \phi \circ g \, d\mu.
\]

Proof. Let $\alpha = \int_X g \, d\mu$. Then we claim $\alpha \in \bar I$, the closure of $I$. To prove the claim, consider $b = \sup I$. If $b = \infty$, then $\alpha < b$ since $g$ is integrable. On the other hand, if $b$ is finite, then
\[
\alpha = \int_X g \, d\mu \le \int_X b \, d\mu = b\, \mu(X) = b.
\]
A similar analysis applies to $\inf I$, and this implies $\alpha \in \bar I$. Moreover, if $b$ is an endpoint of $I$, then $\alpha \ne b$ unless $g(x) = b$ for almost every $x$. (Why?)

Thus there are two cases. In the trivial case, $g(x) = b$ for almost every $x$. In this case,
\[
\phi\Big( \int_X g \, d\mu \Big) = \phi(b) = \int_X \phi(b) \, d\mu = \int_X \phi \circ g \, d\mu.
\]
Otherwise, $\alpha$ is an interior point of $I$. By Proposition 5, we may choose a supporting line $\ell(x) = \phi(\alpha) + m(x - \alpha) \le \phi(x)$ for all $x \in I$. Therefore,
\[
\int_X \phi \circ g \, d\mu \ge \int_X [\phi(\alpha) + m(g - \alpha)] \, d\mu = \phi(\alpha) = \phi\Big( \int_X g \, d\mu \Big). \qquad\square
\]

Corollary 13. If, on a measure space, $f$ is a positive measurable function with integral 1, let $g$ and $\phi$ satisfy the hypotheses of Jensen's Inequality with respect to the probability measure $f \, d\mu$. Then
\[
\phi\Big( \int_X g f \, d\mu \Big) \le \int_X (\phi \circ g) f \, d\mu.
\]
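Jensen's Inequality is also easy to check on a discrete probability measure, where $\int g\,d\mu$ is a weighted average. The sketch below is illustrative, not part of the notes; the weights `w` play the role of a probability measure as in Proposition 12, and the two convex functions tested are $\exp$ and $x \mapsto x^p$ from Corollary 9.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
w = rng.uniform(0.0, 1.0, n)
w /= w.sum()                      # weights of a discrete probability measure
g = rng.normal(size=n)            # an integrable function on the n points

# Jensen with the convex function phi = exp:
# exp( integral of g ) <= integral of exp(g)
assert np.exp(np.sum(w * g)) <= np.sum(w * np.exp(g)) + 1e-12

# Jensen with phi(x) = x^p on [0, infinity), p > 1
p = 3.0
h = np.abs(g)
assert np.sum(w * h) ** p <= np.sum(w * h ** p) + 1e-12
```

The first check is the discrete arithmetic–geometric mean inequality in disguise; the second says the $p$-th power of an average is at most the average of $p$-th powers.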