AN ELEMENTARY PROOF OF THE TRIANGLE INEQUALITY FOR THE WASSERSTEIN METRIC

Proceedings of the American Mathematical Society
Volume 136, Number 1, January 2008, Pages 333-339
S 0002-9939(07)09020-
Article electronically published on September 27, 2007

PHILIPPE CLEMENT AND WOLFGANG DESCH

(Communicated by Richard C. Bradley)

Received by the editors October 30, 2006.
2000 Mathematics Subject Classification. Primary 60B05.
Key words and phrases. Wasserstein metric, triangle inequality, probability measures on metric spaces.
© 2007 American Mathematical Society

Abstract. We give an elementary proof for the triangle inequality of the $p$-Wasserstein metric for probability measures on separable metric spaces. Unlike known approaches, our proof does not rely on the disintegration theorem in its full generality; therefore the additional assumption that the underlying space is Radon can be omitted. We also supply a proof, not depending on disintegration, that the Wasserstein metric is complete on Polish spaces.

1. Introduction

In [1] the Wasserstein metric $W_p$ (with $p \ge 1$) is defined for probability measures on a Radon space. The proof of the triangle inequality relies on the disintegration theorem [1, Theorem 5.3.1]. The aim of this paper is to give an elementary proof of the triangle inequality for a general separable metric space. To be more precise, we introduce the following notation and definitions (according to [1]):

Let $(X, d)$ be a separable metric space with its Borel $\sigma$-algebra $\mathcal{B}(X)$. Here $\mathcal{P}(X)$ denotes the set of Borel probability measures. If $(Y, d)$ is another separable metric space, if $\mu \in \mathcal{P}(X)$ and $f : X \to Y$ is a measurable map, then $f_\# \mu$ is the image measure on $\mathcal{B}(Y)$, i.e. $(f_\# \mu)(A) = \mu(f^{-1}(A))$ for $A \in \mathcal{B}(Y)$. In particular, considering the product $X_1 \times X_2$ of two separable metric spaces, we define the canonical projections $\pi^i : X_1 \times X_2 \to X_i$. If $\gamma \in \mathcal{P}(X_1 \times X_2)$, then $\pi^i_\# \gamma$ (for $i = 1, 2$) are the marginal distributions of $\gamma$. Similarly, for products of three spaces, $\pi^{i,j} : X_1 \times X_2 \times X_3 \to X_i \times X_j$ denotes the canonical projection. If $\mu^i \in \mathcal{P}(X_i)$, we define
\[
\Gamma(\mu^1, \mu^2) = \{ \gamma \in \mathcal{P}(X_1 \times X_2) : \pi^i_\# \gamma = \mu^i \text{ for } i = 1, 2 \}.
\]
Let $1 \le p < \infty$. By $\mathcal{P}_p(X)$ we denote the set of all $\mu \in \mathcal{P}(X)$ such that $\int_X d(x, y)^p \, d\mu(x) < \infty$ for some (equivalently, all) $y \in X$. For $\mu^1, \mu^2 \in \mathcal{P}_p(X)$ we define the Wasserstein metric as in [2, Section 11.8, Problem 7]:
\[
W_p(\mu^1, \mu^2) = \inf \left\{ \left[ \int_{X \times X} d(x_1, x_2)^p \, d\gamma(x_1, x_2) \right]^{1/p} : \gamma \in \Gamma(\mu^1, \mu^2) \right\}. \tag{1.1}
\]
In [2, Section 11.8.3] it is shown for $p = 1$ that $W_1$ is a metric. Moreover, [2, Section 11.8.3, Problem 9] consists in proving that $W_p$ is a metric for $p \ge 1$,
provided $(X, d)$ is a Polish space. In [1, Section 7.1] it is proved that $W_p$ is a metric for $p \ge 1$ under the more general assumption that $(X, d)$ is a separable Radon space.

In this note (Section 2) we show that the triangle inequality can be proved by more elementary means, thus extending its validity to general separable metric spaces. The strategy of our proof follows quite closely that of [2]. However, in the case of a countable metric space, the disintegration can be done in a very straightforward way without requiring higher tools of measure theory. The generalisation from countable to separable metric spaces is done by an approximation procedure.

In [1], the disintegration theorem is also used to prove the completeness of $(\mathcal{P}_p(X), W_p)$ when $(X, d)$ is complete. In Section 3 of this note we give a proof which does not rely on the disintegration theorem but on the completeness of $(\mathcal{P}(X), \beta)$, where $\beta$ is the dual bounded Lipschitz metric (see [2, p. 394]).

2. The triangle inequality

We begin with a proof of the triangle inequality in the case where $(X, d)$ is a countable metric space.

Proposition 2.1. Let $(X, d)$ be a countable metric space and let $1 \le p < \infty$. Let $\mu^1, \mu^2, \mu^3 \in \mathcal{P}_p(X)$, $\gamma^{1,2} \in \Gamma(\mu^1, \mu^2)$ and $\gamma^{2,3} \in \Gamma(\mu^2, \mu^3)$. Then there exists some $\gamma^{1,3} \in \Gamma(\mu^1, \mu^3)$ such that
\[
\left[ \int_{X \times X} d(x_1, x_3)^p \, d\gamma^{1,3}(x_1, x_3) \right]^{1/p}
\le \left[ \int_{X \times X} d(x_1, x_2)^p \, d\gamma^{1,2}(x_1, x_2) \right]^{1/p}
+ \left[ \int_{X \times X} d(x_2, x_3)^p \, d\gamma^{2,3}(x_2, x_3) \right]^{1/p}.
\]
In particular, $W_p(\mu^1, \mu^3) \le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3)$.

Proof. Let $X = \{v_1, v_2, \dots\}$. For short, we denote $\mu^i_k = \mu^i(\{v_k\})$ and $\gamma^{i,j}_{k,l} = \gamma^{i,j}(\{(v_k, v_l)\})$ whenever $i, j \in \{1, 2, 3\}$ and $k, l \in \mathbb{N}$. We define (in accordance with the notation just introduced) the measure
\[
\gamma = \sum_{k,m,n} \gamma_{k,m,n} \, (\delta_{v_k} \otimes \delta_{v_m} \otimes \delta_{v_n})
\quad \text{with} \quad
\gamma_{k,m,n} =
\begin{cases}
\dfrac{\gamma^{1,2}_{k,m} \, \gamma^{2,3}_{m,n}}{\mu^2_m} & \text{if } \mu^2_m \ne 0, \\[1ex]
0 & \text{if } \mu^2_m = 0.
\end{cases}
\]
Since the first marginal of $\gamma^{2,3}$ equals $\mu^2$, we obtain
\[
\pi^{1,2}_\# \gamma(\{(v_k, v_m)\}) = \sum_n \gamma_{k,m,n} =
\begin{cases}
\gamma^{1,2}_{k,m} \, \dfrac{\sum_n \gamma^{2,3}_{m,n}}{\mu^2_m} = \gamma^{1,2}_{k,m} & \text{if } \mu^2_m \ne 0, \\[1ex]
0 = \gamma^{1,2}_{k,m} & \text{if } \mu^2_m = 0.
\end{cases}
\]
Hence $\pi^{1,2}_\# \gamma = \gamma^{1,2}$. Similarly, $\pi^{2,3}_\# \gamma = \gamma^{2,3}$. Since the marginals are probability measures, we infer that $\gamma$ is a probability measure. Moreover it follows for $j = 1, 2, 3$ that $\pi^j_\# \gamma = \mu^j$. We define $\gamma^{1,3} = \pi^{1,3}_\# \gamma$, which is again a probability measure and has marginals $\mu^1$ and $\mu^3$.
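
The gluing step just described can be checked numerically on a finite space. The following is only a minimal sketch under stated assumptions, not part of the original argument: the space is a three-point subset of the real line with $d(a, b) = |a - b|$, the couplings are small hand-picked tables, and the names `glue`, `project_13` and `cost` are hypothetical helpers introduced for illustration. Exact rational arithmetic is used so that the marginal identities hold without rounding.

```python
from fractions import Fraction as F
from itertools import product


def glue(g12, g23):
    """Build gamma[k][m][n] = g12[k][m] * g23[m][n] / mu2[m] (0 where mu2[m] == 0).

    Assumes the second marginal of g12 equals the first marginal of g23.
    """
    K, M, N = len(g12), len(g23), len(g23[0])
    mu2 = [sum(g23[m]) for m in range(M)]  # first marginal of g23
    gamma = [[[0] * N for _ in range(M)] for _ in range(K)]
    for k, m, n in product(range(K), range(M), range(N)):
        if mu2[m] != 0:  # atoms of zero mu2-mass receive no mass
            gamma[k][m][n] = g12[k][m] * g23[m][n] / mu2[m]
    return gamma


def project_13(gamma):
    """Image of gamma under the projection onto coordinates (1, 3)."""
    K, M, N = len(gamma), len(gamma[0]), len(gamma[0][0])
    return [[sum(gamma[k][m][n] for m in range(M)) for n in range(N)]
            for k in range(K)]


def cost(g, pts, p):
    """Return (sum over i, j of |pts[i] - pts[j]|^p * g[i][j])^(1/p)."""
    total = sum(abs(a - b) ** p * g[i][j]
                for i, a in enumerate(pts) for j, b in enumerate(pts))
    return float(total) ** (1.0 / p)


# Illustrative data (an assumption): X = {0, 1, 2}, mu2 = (1/2, 0, 1/2).
pts = [0, 1, 2]
g12 = [[F(1, 6), 0, F(1, 6)],
       [F(1, 3), 0, F(1, 3)],
       [0, 0, 0]]
g23 = [[F(1, 2), 0, 0],
       [0, 0, 0],
       [0, F(1, 4), F(1, 4)]]

gamma = glue(g12, g23)
g13 = project_13(gamma)
```

With this data one can verify that summing `gamma` over its last index returns `g12`, summing over its first index returns `g23`, and `cost(g13, pts, 2)` does not exceed `cost(g12, pts, 2) + cost(g23, pts, 2)`. The middle atom was given zero mass on purpose, so that the case $\mu^2_m = 0$ of the definition is exercised.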

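Now use the definition of $\gamma^{1,3}$ and Minkowski's inequality to estimate
\[
\begin{aligned}
\left[ \int_{X \times X} d(x_1, x_3)^p \, d\gamma^{1,3}(x_1, x_3) \right]^{1/p}
&= \left[ \int_{X \times X \times X} d(x_1, x_3)^p \, d\gamma(x_1, x_2, x_3) \right]^{1/p} \\
&\le \left[ \int_{X \times X \times X} [d(x_1, x_2) + d(x_2, x_3)]^p \, d\gamma(x_1, x_2, x_3) \right]^{1/p} \\
&\le \left[ \int_{X \times X \times X} d(x_1, x_2)^p \, d\gamma(x_1, x_2, x_3) \right]^{1/p}
 + \left[ \int_{X \times X \times X} d(x_2, x_3)^p \, d\gamma(x_1, x_2, x_3) \right]^{1/p} \\
&= \left[ \int_{X \times X} d(x_1, x_2)^p \, d\gamma^{1,2}(x_1, x_2) \right]^{1/p}
 + \left[ \int_{X \times X} d(x_2, x_3)^p \, d\gamma^{2,3}(x_2, x_3) \right]^{1/p}.
\end{aligned}
\]
Thus $\gamma^{1,3}$ satisfies the desired estimate.

The extension from the case of a countable metric space to a separable space is done by an approximation procedure. We will need several lemmas.

Lemma 2.2. Let $(X, d)$ be a separable metric space and let $\tilde X$ be a countable dense subset of $X$. Then, for each $\epsilon > 0$, there exists a Borel measurable map $f : X \to \tilde X$ such that $d(x, f(x)) < \epsilon$ for each $x \in X$. The sets $S_i = f^{-1}(\{v_i\})$ form a partition of $X$.

Proof. Let $\tilde X = \{v_1, v_2, \dots\}$. We define inductively
\[
S_1 = B(v_1, \epsilon), \qquad S_i = B(v_i, \epsilon) \setminus \bigcup_{j < i} S_j.
\]
This is a partition of $X$ by Borel sets. Finally, we define $f(x) = v_i$ whenever $x \in S_i$.

Remark 2.3. By skipping the indices where $S_i = \emptyset$ and renumbering, we can obtain a partition of $X$ into nonempty sets $S_i$.

Lemma 2.4. Let $X$ be a separable metric space and let $\tilde X$ be a countable dense subset of $X$. Let $\epsilon > 0$ and let $f$ be given according to Lemma 2.2. Moreover, let $\gamma \in \mathcal{P}(X \times X)$ and $\tilde\gamma \in \mathcal{P}(\tilde X \times \tilde X)$ be such that $\tilde\gamma = (f \times f)_\# \gamma$. Then the following assertions hold:

(1) the marginals satisfy (for $i = 1, 2$) $\pi^i_\# \tilde\gamma = f_\#(\pi^i_\# \gamma)$,

(2) $\left| \left[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \right]^{1/p} - \left[ \int_{\tilde X \times \tilde X} d(x, y)^p \, d\tilde\gamma(x, y) \right]^{1/p} \right| \le 2\epsilon$.

Proof. Let $\tilde U \subset \tilde X$. Then
\[
(\pi^1_\# \tilde\gamma)(\tilde U) = \tilde\gamma(\tilde U \times \tilde X) = \gamma(f^{-1}(\tilde U) \times X) = (\pi^1_\# \gamma)(f^{-1}(\tilde U)) = [f_\#(\pi^1_\# \gamma)](\tilde U).
\]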

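Of course, the same proof holds for $\pi^2$. To obtain the estimate, we utilize the fact that $\tilde\gamma = (f \times f)_\# \gamma$ and Minkowski's inequality:
\[
\begin{aligned}
\left| \left[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \right]^{1/p} - \left[ \int_{\tilde X \times \tilde X} d(x, y)^p \, d\tilde\gamma(x, y) \right]^{1/p} \right|
&= \left| \left[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \right]^{1/p} - \left[ \int_{X \times X} d(f(x), f(y))^p \, d\gamma(x, y) \right]^{1/p} \right| \\
&\le \left[ \int_{X \times X} \left| d(x, y) - d(f(x), f(y)) \right|^p \, d\gamma(x, y) \right]^{1/p} \\
&\le \left[ \int_{X \times X} (2\epsilon)^p \, d\gamma(x, y) \right]^{1/p} = 2\epsilon.
\end{aligned}
\]

Theorem 2.5. Let $(X, d)$ be a separable metric space and let $1 \le p < \infty$. Let $\mu^1, \mu^2, \mu^3 \in \mathcal{P}_p(X)$. Then $W_p(\mu^1, \mu^3) \le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3)$.

Proof. Let $\tilde X = \{v_1, v_2, \dots\}$ be a dense countable subset of $X$, let $\epsilon > 0$, and let $f$ be given according to Lemma 2.2. Let $S_i = f^{-1}(\{v_i\})$. For $i = 1, 2, 3$ we define $\tilde\mu^i = f_\# \mu^i$. Now, for the pairs $(i, j) \in \{(1, 2), (2, 3)\}$, let $\gamma^{i,j} \in \Gamma(\mu^i, \mu^j)$ be such that
\[
\left[ \int_{X \times X} d(x_i, x_j)^p \, d\gamma^{i,j}(x_i, x_j) \right]^{1/p} < W_p(\mu^i, \mu^j) + \epsilon.
\]
On $\tilde X \times \tilde X$ we define the measures $\tilde\gamma^{i,j} = (f \times f)_\# \gamma^{i,j}$. By Lemma 2.4(1) we infer that $\tilde\gamma^{i,j} \in \Gamma(\tilde\mu^i, \tilde\mu^j)$. By Lemma 2.4(2) we have the estimates
\[
\left[ \int_{\tilde X \times \tilde X} d(x_i, x_j)^p \, d\tilde\gamma^{i,j} \right]^{1/p}
\le \left[ \int_{X \times X} d(x_i, x_j)^p \, d\gamma^{i,j} \right]^{1/p} + 2\epsilon
\le W_p(\mu^i, \mu^j) + 3\epsilon.
\]
Proposition 2.1 implies that there exists $\tilde\gamma^{1,3} \in \Gamma(\tilde\mu^1, \tilde\mu^3)$ such that
\[
\left[ \int_{\tilde X \times \tilde X} d(x_1, x_3)^p \, d\tilde\gamma^{1,3} \right]^{1/p}
\le \left[ \int_{\tilde X \times \tilde X} d(x_1, x_2)^p \, d\tilde\gamma^{1,2} \right]^{1/p}
+ \left[ \int_{\tilde X \times \tilde X} d(x_2, x_3)^p \, d\tilde\gamma^{2,3} \right]^{1/p}
\le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3) + 6\epsilon.
\]
For short, we write $\tilde\gamma^{1,3}_{k,n} = \tilde\gamma^{1,3}(\{(v_k, v_n)\})$ and $\tilde\mu^i_m = \tilde\mu^i(\{v_m\}) = \mu^i(S_m)$. On $X \times X$ we define the measure
\[
\gamma^{1,3}(U) = \sum_{\substack{k, n \in \mathbb{N} \\ \tilde\mu^1_k \ne 0, \ \tilde\mu^3_n \ne 0}} \frac{\tilde\gamma^{1,3}_{k,n}}{\tilde\mu^1_k \, \tilde\mu^3_n} \, (\mu^1 \otimes \mu^3)[U \cap (S_k \times S_n)].
\]
We will show below that $\gamma^{1,3} \in \Gamma(\mu^1, \mu^3)$ and $(f \times f)_\# \gamma^{1,3} = \tilde\gamma^{1,3}$. Then Lemma 2.4(2) implies that
\[
\left[ \int_{X \times X} d(x_1, x_3)^p \, d\gamma^{1,3} \right]^{1/p}
\le \left[ \int_{\tilde X \times \tilde X} d(x_1, x_3)^p \, d\tilde\gamma^{1,3} \right]^{1/p} + 2\epsilon
\le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3) + 8\epsilon.
\]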

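From this we conclude that
\[
W_p(\mu^1, \mu^3) \le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3) + 8\epsilon.
\]
With $\epsilon \to 0$ we infer the desired triangle inequality.

Now we show that $\gamma^{1,3} \in \Gamma(\mu^1, \mu^3)$. We will use the definition of $\gamma^{1,3}$ and the marginals of $\tilde\gamma^{1,3}$. In the following computation the sum has to be understood over all indices where the denominators are nonzero. Notice that $\tilde\mu^1_k = 0$ or $\tilde\mu^3_n = 0$ implies $\tilde\gamma^{1,3}_{k,n} = 0$. For a Borel set $V \subset X$,
\[
\begin{aligned}
\pi^1_\# \gamma^{1,3}(V) = \gamma^{1,3}(V \times X)
&= \sum_{k=1}^\infty \sum_{n=1}^\infty \frac{\tilde\gamma^{1,3}_{k,n}}{\tilde\mu^1_k \, \tilde\mu^3_n} \, (\mu^1 \otimes \mu^3)[(V \cap S_k) \times S_n] \\
&= \sum_{k=1}^\infty \frac{1}{\tilde\mu^1_k} \, \mu^1(V \cap S_k) \sum_{n=1}^\infty \tilde\gamma^{1,3}_{k,n} \\
&= \sum_{k=1}^\infty \mu^1(V \cap S_k) = \mu^1(V).
\end{aligned}
\]
The same computation applies to the second marginal.

To show that $(f \times f)_\# \gamma^{1,3} = \tilde\gamma^{1,3}$, consider $\tilde U \subset \tilde X \times \tilde X$. Then
\[
(f \times f)^{-1}(\tilde U) = \bigcup_{k, n \, : \, (v_k, v_n) \in \tilde U} S_k \times S_n.
\]
Consequently,
\[
\begin{aligned}
\left[ (f \times f)_\# \gamma^{1,3} \right](\tilde U)
&= \gamma^{1,3}\Big( \bigcup_{k, n \, : \, (v_k, v_n) \in \tilde U} S_k \times S_n \Big) \\
&= \sum_{k, n \, : \, (v_k, v_n) \in \tilde U} \frac{\tilde\gamma^{1,3}_{k,n}}{\tilde\mu^1_k \, \tilde\mu^3_n} \, (\mu^1 \otimes \mu^3)(S_k \times S_n) \\
&= \sum_{k, n \, : \, (v_k, v_n) \in \tilde U} \tilde\gamma^{1,3}_{k,n} = \tilde\gamma^{1,3}(\tilde U).
\end{aligned}
\]
Thus the proof is finished.

3. Completeness

Here we give an alternative proof of the completeness of $(\mathcal{P}_p(X), W_p)$ whenever $X$ is complete (see [1, Proposition 7.1.5]). Throughout this section, let $1 \le p < \infty$.

We recall the definition of the dual bounded Lipschitz metric for $\mu, \nu \in \mathcal{P}(X)$:
\[
\beta(\mu, \nu) = \sup \left| \int_X f(x) \, d\mu(x) - \int_X f(x) \, d\nu(x) \right|, \tag{3.1}
\]
where the supremum is taken over all bounded and Lipschitz continuous $f : X \to \mathbb{R}$ such that $\|f\|_\infty + [f]_{\mathrm{Lip}} \le 1$. Convergence with respect to $\beta$ is equivalent to narrow convergence [2, 11.3.3]. Moreover, for $\mu, \nu \in \mathcal{P}_p(X)$,
\[
\beta(\mu, \nu) \le W_p(\mu, \nu). \tag{3.2}
\]
Indeed, let $f$ be bounded and Lipschitz with $[f]_{\mathrm{Lip}} \le 1$. For $\gamma \in \Gamma(\mu, \nu)$ we have
\[
\begin{aligned}
\left| \int_X f(x) \, d\mu(x) - \int_X f(x) \, d\nu(x) \right|
&= \left| \int_{X \times X} (f(x) - f(y)) \, d\gamma(x, y) \right|
\le \int_{X \times X} |f(x) - f(y)| \, d\gamma(x, y) \\
&\le \int_{X \times X} d(x, y) \, d\gamma(x, y)
\le \left[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \right]^{1/p}.
\end{aligned}
\]
Taking the infimum over $\gamma \in \Gamma(\mu, \nu)$ and then the supremum over $f$ yields (3.2).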

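Let $(\mu_n)$ be a Cauchy sequence in $(\mathcal{P}_p(X), W_p)$. From (3.2) it is also a Cauchy sequence in $(\mathcal{P}(X), \beta)$. Since $(X, d)$ is complete, $(\mathcal{P}(X), \beta)$ is also complete by [2, Corollary 11.5.5]. Let $\mu$ denote the limit of $\mu_n$ with respect to the metric $\beta$. Given $\epsilon > 0$, let $N \ge 1$ be such that $W_p(\mu_m, \mu_n) \le \epsilon$ for all $m, n \ge N$. We claim that given $\bar x \in X$ and $n \ge N$,
\[
\int_X d(x, \bar x)^p \, d\mu(x) \le 2^p \epsilon^p + 2^p \int_X d(y, \bar x)^p \, d\mu_n(y), \tag{3.3}
\]
\[
W_p(\mu, \mu_n) \le \epsilon. \tag{3.4}
\]
This implies that $\mu \in \mathcal{P}_p(X)$ and $W_p(\mu_n, \mu) \to 0$.

By [1, Section 7.1], since $(X, d)$ is complete, the infimum in (1.1) is in fact a minimum. For $m, n \ge 1$ let $\gamma_{m,n} \in \Gamma(\mu_m, \mu_n)$ be such that
\[
W_p^p(\mu_m, \mu_n) = \int_{X \times X} d(x, y)^p \, d\gamma_{m,n}(x, y). \tag{3.5}
\]
Since $\mu_m \to \mu$ narrowly, [1, Lemma 5.2.2] implies that for each $n$ there exist $\gamma_n \in \mathcal{P}(X \times X)$ and a subsequence $\gamma_{m_k, n}$ such that
\[
\gamma_{m_k, n} \to \gamma_n \quad \text{narrowly in } \mathcal{P}(X \times X). \tag{3.6}
\]
Since the maps $\pi^i_\#$ ($i = 1, 2$) are continuous with respect to narrow convergence, we infer that
\[
\gamma_n \in \Gamma(\mu, \mu_n). \tag{3.7}
\]
Let $\bar x \in X$. Then, for $m_k, n \ge N$,
\[
\begin{aligned}
\int_X d(x, \bar x)^p \, d\mu_{m_k}(x)
&= \int_{X \times X} d(x, \bar x)^p \, d\gamma_{m_k, n}(x, y) \\
&\le 2^p \int_{X \times X} d(x, y)^p \, d\gamma_{m_k, n}(x, y) + 2^p \int_{X \times X} d(y, \bar x)^p \, d\gamma_{m_k, n}(x, y) \\
&\le 2^p \epsilon^p + 2^p \int_X d(y, \bar x)^p \, d\mu_n(y).
\end{aligned}
\]
In the last step we have utilized (3.5). Since $\mu_{m_k}$ converges narrowly to $\mu$, claim (3.3) follows from the Portmanteau theorem (see [1, (5.1.15)]). Similarly,
\[
\int_{X \times X} d(x, y)^p \, d\gamma_n(x, y)
\le \liminf_{k \to \infty} \int_{X \times X} d(x, y)^p \, d\gamma_{m_k, n}(x, y)
= \liminf_{k \to \infty} W_p^p(\mu_{m_k}, \mu_n) \le \epsilon^p.
\]
By (3.7) we have
\[
W_p^p(\mu, \mu_n) \le \int_{X \times X} d(x, y)^p \, d\gamma_n(x, y) \le \epsilon^p.
\]
This proves claim (3.4). Thus our proof is complete.

Acknowledgments

The authors wish to thank L. Ambrosio, G. Savaré, and C. Villani for their valuable comments. In particular, a suggestion by G. Savaré helped to make the technicalities in Section 2 shorter and the proof of Proposition 2.1 more transparent.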

References

[1] L. Ambrosio, N. Gigli, G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures, Birkhäuser, 2005. MR2129498 (2006k:49001)
[2] R. M. Dudley, Real Analysis and Probability, Cambridge Studies in Advanced Mathematics 74, Cambridge University Press, 2002. MR1932358 (2003h:60001)

Mathematical Institute, Leiden University, P.O. Box 9512, NL-2300 RA Leiden, The Netherlands
E-mail address: philippeclem@gmail.com

Institut für Mathematik und Wissenschaftliches Rechnen, Karl-Franzens-Universität Graz, Heinrichstrasse 36, 8010 Graz, Austria
E-mail address: georgdesch@uni-graz.at