Generalized Orlicz spaces and Wasserstein distances for convex-concave scale functions

Bull. Sci. math. 135 (2011) 795-802
www.elsevier.com/locate/bulsci

Karl-Theodor Sturm
Institut für Angewandte Mathematik, Endenicher Allee 60, 53115 Bonn, Germany
E-mail address: sturm@uni-bonn.de
Available online 6 July 2011
0007-4497/$ - see front matter © 2011 Elsevier Masson SAS. All rights reserved.
doi:10.1016/j.bulsci.2011.07.013

Abstract

Given a strictly increasing, continuous function $\vartheta : \mathbb{R}_+ \to \mathbb{R}_+$, based on the cost functional $\int_{X\times X} \vartheta(d(x,y))\,dq(x,y)$, we define the $L^\vartheta$-Wasserstein distance $W_\vartheta(\mu,\nu)$ between probability measures $\mu,\nu$ on some metric space $(X,d)$. The function $\vartheta$ will be assumed to admit a representation $\vartheta = \varphi\circ\psi$ as a composition of a convex and a concave function $\varphi$ and $\psi$, resp. Besides convex functions and concave functions this includes all $C^2$ functions. For such functions $\vartheta$ we extend the concept of Orlicz spaces, defining the metric space $L^\vartheta(X,m)$ of measurable functions $f : X \to \mathbb{R}$ such that, for instance,

$$d_\vartheta(f,g) \le 1 \iff \int_X \vartheta\big(|f(x)-g(x)|\big)\,d\mu(x) \le 1.$$

1. Convex-concave compositions

Throughout this paper, $\vartheta$ will be a strictly increasing, continuous function from $\mathbb{R}_+$ to $\mathbb{R}_+$ with $\vartheta(0) = 0$.

Definition 1.1. $\vartheta$ will be called ccc function ("convex-concave composition") iff there exist two strictly increasing continuous functions $\varphi,\psi : \mathbb{R}_+ \to \mathbb{R}_+$ with $\varphi(0) = \psi(0) = 0$ s.t. $\varphi$ is convex, $\psi$ is concave and $\vartheta = \varphi\circ\psi$.

The pair $(\varphi,\psi)$ will be called convex-concave factorization of $\vartheta$. The factorization is called minimal (or non-redundant) if for any other factorization $(\tilde\varphi,\tilde\psi)$ the function $\varphi^{-1}\circ\tilde\varphi$ is convex.

Two minimal factorizations of a given function $\vartheta$ differ only by a linear change of variables. Indeed, if $\varphi^{-1}\circ\tilde\varphi$ is convex and also $\tilde\varphi^{-1}\circ\varphi$ is convex then there exists a $\lambda \in (0,\infty)$ s.t. $\tilde\varphi(t) = \varphi(\lambda t)$ and $\tilde\psi(t) = \lambda^{-1}\psi(t)$.

For each convex, concave or ccc function $f : \mathbb{R}_+ \to \mathbb{R}_+$ put

$$f'(t) := f'(t+) := \lim_{h\searrow 0} \frac{1}{h}\big[f(t+h) - f(t)\big].$$

Lemma 1.2.
(i) For any ccc function $\vartheta$, the function $\log\vartheta'$ is locally of bounded variation and the distribution $(\log\vartheta')'$ defines a signed Radon measure on $(0,\infty)$, henceforth denoted by $d(\log\vartheta')$.
(ii) A pair $(\varphi,\psi)$ of strictly increasing convex or concave, resp., continuous functions with $\varphi(0) = \psi(0) = 0$ is a factorization of $\vartheta$ iff

$$d(\log\vartheta') = (\psi^{-1})_*\,d(\log\varphi') + d(\log\psi') \tag{1}$$

in the sense of signed Radon measures.
(iii) The factorization $(\varphi,\psi)$ is minimal iff for any other factorization $(\tilde\varphi,\tilde\psi)$

$$-d(\log\psi') \le -d(\log\tilde\psi')$$

in the sense of nonnegative Radon measures on $(0,\infty)$.
(iv) Every ccc function $\vartheta$ admits a minimal factorization $(\check\vartheta,\hat\vartheta)$ given by $\check\vartheta := \vartheta\circ\hat\vartheta^{-1}$ and

$$\hat\vartheta(x) := \int_0^x \exp\Big(-\int_1^y d\nu_-(z)\Big)\,dy$$

where $d\nu_-(z)$ denotes the negative part of the Radon measure $d\nu(z) = d(\log\vartheta')(z)$.

Proof. (i), (ii): The chain rule for convex/concave functions yields

$$\vartheta'(t) = \varphi'\big(\psi(t)\big)\cdot\psi'(t)$$

for each factorization $(\varphi,\psi)$ of a ccc function $\vartheta$. Taking logarithms it implies that $\log\vartheta'$ locally is a BV function (as a difference of two increasing functions) and, hence, that the associated Radon measures satisfy

$$d(\log\vartheta') = d(\log\varphi'\circ\psi) + d(\log\psi') = (\psi^{-1})_*\,d(\log\varphi') + d(\log\psi').$$

(iii): The factorization $(\varphi,\psi)$ is minimal if and only if for any other factorization $(\tilde\varphi,\tilde\psi)$ the function $u = \varphi^{-1}\circ\tilde\varphi = \psi\circ\tilde\psi^{-1}$ is convex. Since $\log\psi' = \log u'(\tilde\psi) + \log\tilde\psi'$, the latter is equivalent to

$$-d(\log\psi') \le -d(\log\tilde\psi')$$

which is the claim.

(iv): Define $\hat\vartheta$ as above. It remains to verify that $\hat\vartheta < \infty$. Let $(\varphi,\psi)$ be any convex-concave factorization of $\vartheta$. Without restriction assume $\psi'(1) = 1$. Then the Hahn decomposition of (1) yields $d\nu_- \le -d(\log\psi')$. Hence, for all $0 \le x \le 1$

$$\hat\vartheta(x) = \int_0^x \exp\Big(\int_y^1 d\nu_-(z)\Big)\,dy \le \int_0^x \exp\Big(-\int_y^1 d(\log\psi')(z)\Big)\,dy = \psi(x) < \infty. \tag{2}$$

This already implies that $\hat\vartheta$ is finite, strictly increasing and continuous on $[0,\infty)$. (For instance, for $x > 1$ it follows $\hat\vartheta(x) \le \hat\vartheta(1) + x - 1$.) Moreover, one easily verifies that $\hat\vartheta$ is concave. Since $\nu_+,\nu_-$ are the minimal nonnegative measures in the (Hahn or Jordan) decomposition of $\nu = \nu_+ - \nu_-$, it follows that $(\check\vartheta,\hat\vartheta)$ is a minimal cc decomposition of $\vartheta$.

Examples 1.3.
Each convex function $\vartheta$ is a ccc function. A minimal factorization is given by $(\vartheta,\mathrm{Id})$.
Each concave function $\vartheta$ is a ccc function. A minimal factorization is given by $(\mathrm{Id},\vartheta)$.
Each $C^2$ function $\vartheta$ with $\vartheta'(0+) > 0$ is a ccc function. The minimal factorization is given by

$$\hat\vartheta(x) := \int_0^x \exp\Big(\int_1^y \frac{\vartheta''(z)\wedge 0}{\vartheta'(z)}\,dz\Big)\,dy$$

and $\check\vartheta := \vartheta\circ\hat\vartheta^{-1}$. (The condition $\vartheta'(0+) > 0$ can be replaced by the strictly weaker requirement that the previous integral defining $\hat\vartheta$ is finite.)

2. The metric space $L^\vartheta(X,\mu)$

Let $(X,\Xi,\mu)$ be a $\sigma$-finite measure space and $(\varphi,\psi)$ a minimal ccc factorization of a given function $\vartheta$. Then $L^\vartheta(X,\mu)$ will denote the space of all measurable functions $f : X \to \mathbb{R}$ such that

$$\int_X \varphi\Big(\frac{1}{t}\,\psi(|f|)\Big)\,d\mu < \infty$$

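for some $t \in (0,\infty)$, where as usual functions which agree almost everywhere are identified. Note that due to the fact that $r \mapsto \varphi(r)$ for large $r$ grows at least linearly the previous condition is equivalent to the condition

$$\int_X \varphi\Big(\frac{1}{t}\,\psi(|f|)\Big)\,d\mu \le 1 \quad\text{for some } t \in (0,\infty).$$

Theorem 2.1. $L^\vartheta(X,\mu)$ is a complete metric space with the metric

$$d_\vartheta(f,g) = \inf\Big\{ t \in (0,\infty) : \int_X \varphi\Big(\frac{1}{t}\,\psi(|f-g|)\Big)\,d\mu \le 1 \Big\}.$$

The definition of this metric does not depend on the choice of the minimal ccc factorization of the function $\vartheta$. However, choosing an arbitrary convex-concave factorization of $\vartheta$ might change the value of $d_\vartheta$.

Note that always $d_\vartheta(f,g) = d_\vartheta(f-g,0)$.

Proof of Theorem 2.1. Let $f,g,h \in L^\vartheta(X,\mu)$ be given and choose $r,s > 0$ with $d_\vartheta(f,g) < r$ and $d_\vartheta(g,h) < s$. The latter implies

$$\int_X \varphi\Big(\frac{1}{r}\,\psi(|f-g|)\Big)\,d\mu \le 1, \qquad \int_X \varphi\Big(\frac{1}{s}\,\psi(|g-h|)\Big)\,d\mu \le 1.$$

Concavity of $\psi$ yields $\psi(|f-h|) \le \psi(|f-g|) + \psi(|g-h|)$. Put $t = r+s$. Then convexity of $\varphi$ implies

$$\varphi\Big(\frac{1}{t}\,\psi(|f-h|)\Big) \le \varphi\Big(\frac{r}{t}\cdot\frac{1}{r}\,\psi(|f-g|) + \frac{s}{t}\cdot\frac{1}{s}\,\psi(|g-h|)\Big) \le \frac{r}{t}\,\varphi\Big(\frac{1}{r}\,\psi(|f-g|)\Big) + \frac{s}{t}\,\varphi\Big(\frac{1}{s}\,\psi(|g-h|)\Big).$$

Hence,

$$\int_X \varphi\Big(\frac{1}{t}\,\psi(|f-h|)\Big)\,d\mu \le \frac{r}{t}\int_X \varphi\Big(\frac{1}{r}\,\psi(|f-g|)\Big)\,d\mu + \frac{s}{t}\int_X \varphi\Big(\frac{1}{s}\,\psi(|g-h|)\Big)\,d\mu \le \frac{r}{t}\cdot 1 + \frac{s}{t}\cdot 1 = 1$$

and thus $d_\vartheta(f,h) \le t$. This proves that $d_\vartheta(f,h) \le d_\vartheta(f,g) + d_\vartheta(g,h)$.

In order to prove the completeness of the metric, let $(f_n)_n$ be a Cauchy sequence in $L^\vartheta$. Then $d_\vartheta(f_n,f_m) < \varepsilon_n$ for all $n,m$ with $m \ge n$ and suitable $\varepsilon_n \to 0$. Choose an increasing sequence of measurable sets $X_k$, $k \in \mathbb{N}$, with $\mu(X_k) < \infty$ and $\bigcup_k X_k = X$. Then

$$\int_{X_k} \varphi\Big(\frac{1}{\varepsilon_n}\,\psi(|f_n-f_m|)\Big)\,d\mu \le 1$$

for all $k,m,n$ with $m \ge n$. Jensen's inequality implies

$$\varphi\Big(\frac{1}{\mu(X_k)}\int_{X_k} \frac{1}{\varepsilon_n}\,\psi(|f_n-f_m|)\,d\mu\Big) \le \frac{1}{\mu(X_k)}$$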

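and thus

$$\int_{X_k} \big|\psi(f_n) - \psi(f_m)\big|\,d\mu \le \varepsilon_n\cdot\mu(X_k)\cdot\varphi^{-1}\Big(\frac{1}{\mu(X_k)}\Big).$$

In other words, $(\psi(f_n))_n$ is a Cauchy sequence in $L^1(X_k,\mu)$. It follows that it has a subsequence $(\psi(f_{n_i}))_i$ which converges $\mu$-almost everywhere on $X_k$. In particular, $(f_{n_i})_i$ converges almost everywhere on $X_k$ towards some limiting function $f$ (which easily is shown to be independent of $k$). Finally, Fatou's lemma now implies

$$\int_{X_k} \varphi\Big(\frac{1}{\varepsilon_n}\,\psi(|f_n-f|)\Big)\,d\mu \le \liminf_{m\to\infty} \int_{X_k} \varphi\Big(\frac{1}{\varepsilon_n}\,\psi(|f_n-f_m|)\Big)\,d\mu \le 1$$

for each $k$ and $n \in \mathbb{N}$. Hence,

$$\int_X \varphi\Big(\frac{1}{\varepsilon_n}\,\psi(|f_n-f|)\Big)\,d\mu \le 1,$$

that is, $d_\vartheta(f_n,f) \le \varepsilon_n$ which proves the claim.

Finally, it remains to verify that

$$d_\vartheta(f,g) = 0 \iff f = g \ \ \mu\text{-a.e. on } X.$$

The implication $\Leftarrow$ is trivial. For the reverse implication, we may argue as in the previous completeness proof: $d_\vartheta(f,g) = 0$ will yield

$$\int_{X_k} \varphi\Big(\frac{1}{t}\,\psi(|f-g|)\Big)\,d\mu \le 1$$

for all $k \in \mathbb{N}$ and all $t > 0$ which in turn implies

$$\int_{X_k} \big|\psi(f) - \psi(g)\big|\,d\mu = 0.$$

The latter proves $f = g$ $\mu$-a.e. on $X$ which is the claim.

Examples 2.2. If $\vartheta(r) = r^p$ for some $p \in (0,\infty)$ then

$$d_\vartheta(f,g) = \Big(\int_X |f-g|^p\,d\mu\Big)^{1/p^*}$$

with $p^* := p$ if $p \ge 1$ and $p^* := 1$ if $p \le 1$.

Proposition 2.3.
(i) If $\vartheta$ is convex then $\|f\|_{L^\vartheta(X,\mu)} := d_\vartheta(f,0)$ is indeed a norm and $L^\vartheta(X,\mu)$ is a Banach space, called Orlicz space. The norm is called Luxemburg norm.
(ii) If $\vartheta$ is concave then

$$d_\vartheta(f,g) = \int_X \vartheta(|f-g|)\,d\mu \ge \big\|\vartheta(f) - \vartheta(g)\big\|_{L^1(X,\mu)}.$$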

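(iii) For general ccc function $\vartheta = \varphi\circ\psi$

$$d_\vartheta(f,g) = \big\|\psi(|f-g|)\big\|_{L^\varphi(X,\mu)}.$$

(iv) If $\mu(X) = 1$ then for each strictly increasing, convex function $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$ with $\Phi^{-1}(1) = 1$

$$d_{\Phi\circ\vartheta}(f,g) \ge d_\vartheta(f,g) \qquad \text{("Jensen's inequality")}.$$

Proof. (i) If $\psi(r) = cr$ then obviously $d_\vartheta(tf,0) = |t|\cdot d_\vartheta(f,0)$. See also standard literature [2].

(ii) Concavity of $\vartheta$ implies $\vartheta(|f-g|) \ge |\vartheta(f) - \vartheta(g)|$.

(iv) Assume that $d_{\Phi\circ\vartheta}(f,g) < t$ for some $t \in (0,\infty)$. It implies

$$\int_X \Phi\Big(\varphi\Big(\frac{1}{t}\,\psi(|f-g|)\Big)\Big)\,d\mu \le 1.$$

Classical Jensen inequality for integrals yields

$$\Phi\Big(\int_X \varphi\Big(\frac{1}{t}\,\psi(|f-g|)\Big)\,d\mu\Big) \le 1$$

which due to the fact that $\Phi^{-1}(1) = 1$ in turn implies $d_\vartheta(f,g) \le t$.

3. The $L^\vartheta$-Wasserstein space

Let $(X,d)$ be a complete separable metric space and $\vartheta$ a ccc function with minimal factorization $(\varphi,\psi)$. The $L^\vartheta$-Wasserstein space $\mathcal{P}_\vartheta(X)$ is defined as the space of all probability measures $\mu$ on $X$ (equipped with its Borel $\sigma$-field) s.t.

$$\int_X \varphi\Big(\frac{1}{t}\,\psi\big(d(x,y)\big)\Big)\,d\mu(x) < \infty$$

for some $y \in X$ and some $t \in (0,\infty)$. The $L^\vartheta$-Wasserstein distance of two probability measures $\mu,\nu \in \mathcal{P}_\vartheta(X)$ is defined as

$$W_\vartheta(\mu,\nu) = \inf\Big\{ t > 0 : \inf_{q \in \Pi(\mu,\nu)} \int_{X\times X} \varphi\Big(\frac{1}{t}\,\psi\big(d(x,y)\big)\Big)\,dq(x,y) \le 1 \Big\}$$

where $\Pi(\mu,\nu)$ denotes the set of all couplings of $\mu$ and $\nu$, i.e. the set of all probability measures $q$ on $X\times X$ s.t. $q(A\times X) = \mu(A)$, $q(X\times A) = \nu(A)$ for all Borel sets $A \subset X$.

Given two probability measures $\mu,\nu \in \mathcal{P}_\vartheta(X)$, a coupling $q$ of them is called optimal iff

$$\int_{X\times X} \varphi\Big(\frac{1}{w}\,\psi\big(d(x,y)\big)\Big)\,dq(x,y) \le 1$$

for $w := W_\vartheta(\mu,\nu)$.

Proposition 3.1. For each pair of probability measures $\mu,\nu \in \mathcal{P}_\vartheta(X)$ there exists an optimal coupling $q$.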
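As a numerical illustration of the two definitions above (not part of the original argument), the following minimal Python sketch evaluates $d_\vartheta$ on a finite weighted measure space and $W_\vartheta$ for two uniform empirical measures on the real line with the same number of atoms. The function names `smallest_t`, `d_theta` and `w_theta` are hypothetical. The sketch assumes that the caller supplies a factorization `phi`, `psi` of $\vartheta$ (minimality is not checked) and that the functional is non-increasing in $t$, so that the infimum can be found by bisection. For uniform measures with $n$ atoms each, the transport cost is linear in the coupling, so its minimum is attained at a permutation coupling; the brute-force search over permutations is therefore exact but only practical for small $n$.

```python
from itertools import permutations


def smallest_t(functional, tol=1e-12):
    """Approximate inf{t > 0 : functional(t) <= 1} by bisection.

    Assumes functional is non-increasing in t and is <= 1 for large t.
    """
    hi = 1.0
    while functional(hi) > 1.0:  # enlarge hi until the constraint holds
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if functional(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi


def d_theta(f, g, weights, phi, psi):
    """d_theta(f, g) on a finite measure space with point masses `weights`."""
    def functional(t):
        return sum(w * phi(psi(abs(a - b)) / t)
                   for a, b, w in zip(f, g, weights))
    return smallest_t(functional)


def w_theta(xs, ys, phi, psi):
    """W_theta between uniform empirical measures on xs and ys (same length).

    Brute force over permutation couplings; intended for small inputs only.
    """
    n = len(xs)

    def functional(t):
        return min(
            sum(phi(psi(abs(x - ys[j])) / t) for x, j in zip(xs, sigma)) / n
            for sigma in permutations(range(n))
        )
    return smallest_t(functional)


if __name__ == "__main__":
    identity = lambda r: r
    square = lambda r: r * r
    root = lambda r: r ** 0.5
    # theta(r) = r^2 (convex): factorization (r^2, Id)
    print(d_theta([3.0, 0.0], [0.0, 4.0], [1.0, 1.0], square, identity))
    print(w_theta([0.0, 1.0], [0.0, 3.0], square, identity))
    # theta(r) = r^(1/2) (concave): factorization (Id, r^(1/2))
    print(w_theta([0.0, 1.0], [0.0, 3.0], identity, root))
```

For $\vartheta(r) = r^2$ the first two calls reduce to the usual $L^2$ distance and the classical $L^2$-Wasserstein distance; for $\vartheta(r) = r^{1/2}$ the last call returns the minimal transport cost for the cost function $\sqrt{d}$, in line with Examples 2.2.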

Proof. For $t \in (0,\infty)$ define the cost function $c_t(x,y) = \varphi\big(\frac{1}{t}\,\psi(d(x,y))\big)$. Note that $t \mapsto c_t(x,y)$ is continuous and decreasing. Given $\mu,\nu$ s.t. $w := W_\vartheta(\mu,\nu) < \infty$. Then for all $t > w$ the measures $\mu$ and $\nu$ have finite $c_t$-transportation costs. More precisely,

$$\inf_{q \in \Pi(\mu,\nu)} \int c_t(x,y)\,dq(x,y) \le 1.$$

Hence, there exists $q_n \in \Pi(\mu,\nu)$ s.t.

$$\int c_{w+\frac{1}{n}}(x,y)\,dq_n(x,y) \le 1 + \frac{1}{n}.$$

In particular, $\int c_{w+1}(x,y)\,dq_n(x,y) \le 2$ for all $n \in \mathbb{N}$. Hence, the family $(q_n)_n$ is tight [3, Lemma 4.4]. Therefore, there exists a converging subsequence $(q_{n_k})_k$ with limit $q \in \Pi(\mu,\nu)$ satisfying

$$\int c_{w+\frac{1}{n}}(x,y)\,dq(x,y) \le 1 + \frac{1}{n}$$

for all $n$ [3, Lemma 4.3] and thus

$$\int c_w(x,y)\,dq(x,y) \le 1.$$

Proposition 3.2. $W_\vartheta$ is a complete metric on $\mathcal{P}_\vartheta(X)$.

The triangle inequality for $W_\vartheta$ is valid not only on $\mathcal{P}_\vartheta(X)$ but on the whole space $\mathcal{P}(X)$ of probability measures on $X$. The triangle inequality implies that $W_\vartheta(\mu,\nu) < \infty$ for all $\mu,\nu \in \mathcal{P}_\vartheta(X)$.

Proof of Proposition 3.2. Given three probability measures $\mu_1,\mu_2,\mu_3$ on $X$ and numbers $r,s$ with $W_\vartheta(\mu_1,\mu_2) < r$ and $W_\vartheta(\mu_2,\mu_3) < s$. Then there exist a coupling $q_{12}$ of $\mu_1$ and $\mu_2$ and a coupling $q_{23}$ of $\mu_2$ and $\mu_3$ s.t.

$$\int \varphi\Big(\frac{1}{r}\,\psi\circ d\Big)\,dq_{12} \le 1, \qquad \int \varphi\Big(\frac{1}{s}\,\psi\circ d\Big)\,dq_{23} \le 1.$$

Let $q_{123}$ be the gluing of the two couplings $q_{12}$ and $q_{23}$, see e.g. [1, Lemma 11.8.3]. That is, $q_{123}$ is a probability measure on $X\times X\times X$ s.t. the projection onto the first two factors coincides with $q_{12}$ and the projection onto the last two factors coincides with $q_{23}$. Let $q_{13}$ denote the projection of $q_{123}$ onto the first and third factor. In particular, this will be a coupling of $\mu_1$ and $\mu_3$. Then for $t := r+s$

$$\int \varphi\Big(\frac{1}{t}\,\psi\big(d(x,z)\big)\Big)\,dq_{13}(x,z) \le \int \varphi\Big(\frac{1}{t}\,\psi\big(d(x,y) + d(y,z)\big)\Big)\,dq_{123}(x,y,z)$$

$$\le \int \varphi\Big(\frac{r}{t}\cdot\frac{1}{r}\,\psi\big(d(x,y)\big) + \frac{s}{t}\cdot\frac{1}{s}\,\psi\big(d(y,z)\big)\Big)\,dq_{123}(x,y,z)$$

$$\le \frac{r}{t}\int \varphi\Big(\frac{1}{r}\,\psi\big(d(x,y)\big)\Big)\,dq_{123}(x,y,z) + \frac{s}{t}\int \varphi\Big(\frac{1}{s}\,\psi\big(d(y,z)\big)\Big)\,dq_{123}(x,y,z) \le \frac{r}{t}\cdot 1 + \frac{s}{t}\cdot 1 = 1.$$

Hence, $W_\vartheta(\mu_1,\mu_3) \le t$. This proves the triangle inequality.

To prove completeness, assume that $(\mu_k)_k$ is a $W_\vartheta$-Cauchy sequence, say $W_\vartheta(\mu_n,\mu_k) \le t_n$ for all $k \ge n$ with $t_n \to 0$ as $n \to \infty$. Then there exist couplings $q_{n,k}$ of $\mu_n$ and $\mu_k$ s.t.

$$\int \varphi\Big(\frac{1}{t_n}\,\psi\big(d(x,y)\big)\Big)\,dq_{n,k}(x,y) \le 1. \tag{3}$$

Jensen's inequality implies

$$\int \tilde d(x,y)\,dq_{n,k}(x,y) \le t_n\cdot\varphi^{-1}(1)$$

with $\tilde d(x,y) := \psi(d(x,y))$. The latter is a complete metric on $X$ with the same topology as $d$. That is, $(\mu_k)_k$ is a Cauchy sequence w.r.t. the $L^1$-Wasserstein distance on $\mathcal{P}(X,\tilde d)$. Because of completeness of $\mathcal{P}_1(X,\tilde d)$, we thus obtain an accumulation point $\mu$ and a converging subsequence $(\mu_{k_i})_i$. According to [3, Lemma 4.4], this also yields an accumulation point $q_n$ of the sequence $(q_{n,k_i})_i$. Continuity of the involved cost functions together with Fatou's lemma allows to pass to the limit in (3) to derive

$$\int \varphi\Big(\frac{1}{t_n}\,\psi\big(d(x,y)\big)\Big)\,dq_n(x,y) \le 1$$

which proves that $W_\vartheta(\mu,\mu_n) \le t_n \to 0$ as $n \to \infty$.

With a similar argument, one verifies that $W_\vartheta(\mu,\nu) = 0$ if and only if $\mu = \nu$.

Remark 3.3. For each pair of probability measures $\mu,\nu$ on $X$

$$W_\vartheta(\mu,\nu) \le 1 \iff \inf_{q \in \Pi(\mu,\nu)} \int \vartheta\big(d(x,y)\big)\,dq(x,y) \le 1.$$

References

[1] R.M. Dudley, Real Analysis and Probability, Cambridge University Press, 2002.
[2] M.M. Rao, Z.D. Ren, Theory of Orlicz Spaces, Pure Appl. Math., Marcel Dekker, 1991.
[3] C. Villani, Optimal Transport, Old and New, Grundlehren Math. Wiss., vol. 338, Springer, Berlin/Heidelberg, 2009.