AN ELEMENTARY PROOF OF THE TRIANGLE INEQUALITY FOR THE WASSERSTEIN METRIC


PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 136, Number 1, January 2008, Pages 333–339
S 0002-9939(07)09020-
Article electronically published on September 27, 2007

PHILIPPE CLEMENT AND WOLFGANG DESCH

(Communicated by Richard C. Bradley)

Abstract. We give an elementary proof for the triangle inequality of the $p$-Wasserstein metric for probability measures on separable metric spaces. Unlike known approaches, our proof does not rely on the disintegration theorem in its full generality; therefore the additional assumption that the underlying space is Radon can be omitted. We also supply a proof, not depending on disintegration, that the Wasserstein metric is complete on Polish spaces.

Received by the editors October 30, 2006.
2000 Mathematics Subject Classification. Primary 60B05.
Key words and phrases. Wasserstein metric, triangle inequality, probability measures on metric spaces.
© 2007 American Mathematical Society

1. Introduction

In [1] the Wasserstein metric $W_p$ (with $p \ge 1$) is defined for probability measures on a Radon space. The proof of the triangle inequality relies on the disintegration theorem [1, Theorem 5.3.1]. The aim of this paper is to give an elementary proof of the triangle inequality for a general separable metric space.

To be more precise, we introduce the following notation and definitions (according to [1]): Let $(X, d)$ be a separable metric space with its Borel $\sigma$-algebra $\mathcal{B}(X)$. Here $\mathcal{P}(X)$ denotes the set of Borel probability measures. If $(Y, d)$ is another separable metric space, if $\mu \in \mathcal{P}(X)$ and $f : X \to Y$ is a measurable map, then $f_\# \mu$ is the image measure on $\mathcal{B}(Y)$, i.e. $(f_\# \mu)(A) = \mu(f^{-1}(A))$ for $A \in \mathcal{B}(Y)$. In particular, considering the product $X_1 \times X_2$ of two separable metric spaces, we define the canonical projections $\pi^i : X_1 \times X_2 \to X_i$. If $\gamma \in \mathcal{P}(X_1 \times X_2)$, then $\pi^i_\# \gamma$ (for $i = 1, 2$) are the marginal distributions of $\gamma$. Similarly, for products of three spaces, $\pi^{i,j} : X_1 \times X_2 \times X_3 \to X_i \times X_j$ denotes the canonical projection. If $\mu^i \in \mathcal{P}(X_i)$, we define
\[
\Gamma(\mu^1, \mu^2) = \{\gamma \in \mathcal{P}(X_1 \times X_2) : \pi^i_\# \gamma = \mu^i \text{ for } i = 1, 2\}.
\]
Let $1 \le p < \infty$. By $\mathcal{P}_p(X)$ we denote the set of all $\mu \in \mathcal{P}(X)$ such that $\int_X d(x, y)^p \, d\mu(x) < \infty$ for some (equivalently, all) $y \in X$. For $\mu^1, \mu^2 \in \mathcal{P}_p(X)$ we define the Wasserstein metric as in [2, Section 11.8, Problem 7]:
\[
W_p(\mu^1, \mu^2) = \inf_{\gamma \in \Gamma(\mu^1, \mu^2)} \Big[ \int_{X \times X} d(x_1, x_2)^p \, d\gamma(x_1, x_2) \Big]^{1/p}. \tag{1.1}
\]
In [2, Section 11.8.3] it is shown for $p = 1$ that $W_1$ is a metric. Moreover, [2, Section 11.8.3, Problem 9] consists in proving that $W_p$ is a metric for $p \ge 1$, provided $(X, d)$ is a Polish space. In [1, Section 7.1] it is proved that $W_p$ is a metric for $p \ge 1$ under the more general assumption that $(X, d)$ is a separable Radon space. In this note (Section 2) we show that the triangle inequality can be proved by more elementary means, thus extending its validity to general separable metric spaces. The strategy of our proof follows quite closely that of [2]. However, in the case of a countable metric space, the disintegration can be done in a very straightforward way without requiring higher tools of measure theory. The generalisation from countable to separable metric spaces is done by an approximation procedure.

In [1], the disintegration theorem is also used to prove the completeness of $(\mathcal{P}_p(X), W_p)$ when $(X, d)$ is complete. In Section 3 of this note we give a proof which does not rely on the disintegration theorem but on the completeness of $(\mathcal{P}(X), \beta)$, where $\beta$ is the dual bounded Lipschitz metric (see [2, p. 394]).

2. The triangle inequality

We begin with a proof of the triangle inequality in the case where $(X, d)$ is a countable metric space.

Proposition 2.1. Let $(X, d)$ be a countable metric space and let $1 \le p < \infty$. Let $\mu^1, \mu^2, \mu^3 \in \mathcal{P}_p(X)$, $\gamma^{1,2} \in \Gamma(\mu^1, \mu^2)$ and $\gamma^{2,3} \in \Gamma(\mu^2, \mu^3)$. Then there exists some $\gamma^{1,3} \in \Gamma(\mu^1, \mu^3)$ such that
\[
\Big[ \int_{X \times X} d(x_1, x_3)^p \, d\gamma^{1,3}(x_1, x_3) \Big]^{1/p}
\le \Big[ \int_{X \times X} d(x_1, x_2)^p \, d\gamma^{1,2}(x_1, x_2) \Big]^{1/p}
+ \Big[ \int_{X \times X} d(x_2, x_3)^p \, d\gamma^{2,3}(x_2, x_3) \Big]^{1/p}.
\]
In particular, $W_p(\mu^1, \mu^3) \le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3)$.

Proof. Let $X = \{v_1, v_2, \dots\}$. For short, we denote $\mu^i_k = \mu^i(\{v_k\})$ and $\gamma^{i,j}_{k,l} = \gamma^{i,j}(\{(v_k, v_l)\})$ whenever $i, j \in \{1, 2, 3\}$ and $k, l \in \mathbb{N}$. We define (in accordance with the notation just introduced) the measure
\[
\gamma = \sum_{k,m,n} \gamma_{k,m,n} \, (\delta_{v_k} \otimes \delta_{v_m} \otimes \delta_{v_n})
\quad \text{with} \quad
\gamma_{k,m,n} =
\begin{cases}
\dfrac{\gamma^{1,2}_{k,m} \, \gamma^{2,3}_{m,n}}{\mu^2_m} & \text{if } \mu^2_m \ne 0, \\[1ex]
0 & \text{if } \mu^2_m = 0.
\end{cases}
\]
Since the first marginal of $\gamma^{2,3}$ equals $\mu^2$, we obtain
\[
\pi^{1,2}_\# \gamma(\{(v_k, v_m)\}) =
\begin{cases}
\displaystyle \sum_n \frac{\gamma^{1,2}_{k,m} \, \gamma^{2,3}_{m,n}}{\mu^2_m} = \gamma^{1,2}_{k,m} & \text{if } \mu^2_m \ne 0, \\[1ex]
0 = \gamma^{1,2}_{k,m} & \text{if } \mu^2_m = 0.
\end{cases}
\]
Similarly, $\pi^{2,3}_\# \gamma = \gamma^{2,3}$. Since the marginals are probability measures, we infer that $\gamma$ is a probability measure. Moreover it follows for $j = 1, 2, 3$ that $\pi^j_\# \gamma = \mu^j$. We define $\gamma^{1,3} = \pi^{1,3}_\# \gamma$, which is again a probability measure and has marginals $\mu^1$ and $\mu^3$.
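The construction of $\gamma$ and $\gamma^{1,3}$ above can be tried out on a finite space. The following is a minimal sketch in Python, added for illustration and not part of the original paper. The function names `glue` and `project_13` and the two example couplings are assumptions chosen here; exact rational arithmetic is used so that the marginal identities hold without rounding.

```python
from fractions import Fraction as F


def glue(g12, g23):
    """Glue two couplings on finite spaces along their common marginal mu^2.

    g12[k][m] plays the role of gamma^{1,2}({(v_k, v_m)}),
    g23[m][n] plays the role of gamma^{2,3}({(v_m, v_n)}).
    Returns g[k][m][n] = g12[k][m] * g23[m][n] / mu2[m], or 0 if mu2[m] == 0.
    """
    K, M, N = len(g12), len(g23), len(g23[0])
    mu2 = [sum(row) for row in g23]  # first marginal of gamma^{2,3}
    if mu2 != [sum(g12[k][m] for k in range(K)) for m in range(M)]:
        raise ValueError("couplings do not share the middle marginal")
    return [[[g12[k][m] * g23[m][n] / mu2[m] if mu2[m] != 0 else F(0)
              for n in range(N)]
             for m in range(M)]
            for k in range(K)]


def project_13(g):
    """Sum out the middle index: the analogue of gamma^{1,3} = pi^{1,3}_# gamma."""
    K, M, N = len(g), len(g[0]), len(g[0][0])
    return [[sum(g[k][m][n] for m in range(M)) for n in range(N)]
            for k in range(K)]


# Hypothetical example on a two-point space:
# mu^1 = (1/2, 1/2), mu^2 = (1/4, 3/4), mu^3 = (1/2, 1/2).
g12 = [[F(1, 4), F(1, 4)], [F(0), F(1, 2)]]
g23 = [[F(1, 4), F(0)], [F(1, 4), F(1, 2)]]

g = glue(g12, g23)
g13 = project_13(g)
print("gamma^{1,3} =", g13)
print("first marginal =", [sum(row) for row in g13])
print("second marginal =", [sum(g13[k][n] for k in range(2)) for n in range(2)])
```

In this example the two printed marginals are $\mu^1$ and $\mu^3$, and summing `g` over its last (respectively first) index returns `g12` (respectively `g23`), which mirrors the identities $\pi^{1,2}_\# \gamma = \gamma^{1,2}$ and $\pi^{2,3}_\# \gamma = \gamma^{2,3}$ in the proof.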

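Now use the definition of $\gamma^{1,3}$ and Minkowski's inequality to estimate
\[
\begin{aligned}
\Big[ \int_{X \times X} d(x_1, x_3)^p \, d\gamma^{1,3}(x_1, x_3) \Big]^{1/p}
&= \Big[ \int_{X \times X \times X} d(x_1, x_3)^p \, d\gamma(x_1, x_2, x_3) \Big]^{1/p} \\
&\le \Big[ \int_{X \times X \times X} [d(x_1, x_2) + d(x_2, x_3)]^p \, d\gamma(x_1, x_2, x_3) \Big]^{1/p} \\
&\le \Big[ \int_{X \times X \times X} d(x_1, x_2)^p \, d\gamma(x_1, x_2, x_3) \Big]^{1/p}
 + \Big[ \int_{X \times X \times X} d(x_2, x_3)^p \, d\gamma(x_1, x_2, x_3) \Big]^{1/p} \\
&= \Big[ \int_{X \times X} d(x_1, x_2)^p \, d\gamma^{1,2}(x_1, x_2) \Big]^{1/p}
 + \Big[ \int_{X \times X} d(x_2, x_3)^p \, d\gamma^{2,3}(x_2, x_3) \Big]^{1/p}.
\end{aligned}
\]
Thus $\gamma^{1,3}$ satisfies the desired estimate.

The extension from the case of a countable metric space to a separable space is done by an approximation procedure. We will need several lemmas.

Lemma 2.2. Let $(X, d)$ be a separable metric space and let $\tilde{X}$ be a countable dense subset of $X$. Then, for each $\epsilon > 0$, there exists a Borel measurable map $f : X \to \tilde{X}$ such that $d(x, f(x)) < \epsilon$ for each $x \in X$. The sets $S_i = f^{-1}(\{v_i\})$ form a partition of $X$.

Proof. Let $\tilde{X} = \{v_1, v_2, \dots\}$. We define inductively
\[
S_1 = B(v_1, \epsilon), \qquad S_i = B(v_i, \epsilon) \setminus \bigcup_{j < i} S_j.
\]
Since $\tilde{X}$ is dense, the balls $B(v_i, \epsilon)$ cover $X$, so this is a partition of $X$ by Borel sets. Finally, we define $f(x) = v_i$ whenever $x \in S_i$.

Remark 2.3. By skipping the indices where $S_i = \emptyset$ and renumbering, we can obtain a partition of $X$ into nonempty sets $S_i$.

Lemma 2.4. Let $X$ be a separable metric space and let $\tilde{X}$ be a countable dense subset of $X$. Let $\epsilon > 0$ and let $f$ be given according to Lemma 2.2. Moreover, let $\gamma \in \mathcal{P}(X \times X)$ and $\tilde{\gamma} \in \mathcal{P}(\tilde{X} \times \tilde{X})$ be such that $\tilde{\gamma} = (f \times f)_\# \gamma$. Then the following assertions hold:
(1) the marginals satisfy (for $i = 1, 2$) $\pi^i_\# \tilde{\gamma} = f_\#(\pi^i_\# \gamma)$,
(2) $\Big| \Big[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \Big]^{1/p} - \Big[ \int_{\tilde{X} \times \tilde{X}} d(x, y)^p \, d\tilde{\gamma}(x, y) \Big]^{1/p} \Big| \le 2\epsilon$.

Proof. Let $\tilde{U} \subset \tilde{X}$. Then
\[
(\pi^1_\# \tilde{\gamma})(\tilde{U}) = \tilde{\gamma}(\tilde{U} \times \tilde{X}) = \gamma(f^{-1}(\tilde{U}) \times X) = (\pi^1_\# \gamma)(f^{-1}(\tilde{U})) = [f_\#(\pi^1_\# \gamma)](\tilde{U}).
\]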

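Of course, the same proof holds for $\pi^2$. To obtain the estimate, we utilize the fact that $\tilde{\gamma} = (f \times f)_\# \gamma$ and Minkowski's inequality:
\[
\begin{aligned}
\Big| \Big[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \Big]^{1/p} - \Big[ \int_{\tilde{X} \times \tilde{X}} d(x, y)^p \, d\tilde{\gamma}(x, y) \Big]^{1/p} \Big|
&= \Big| \Big[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \Big]^{1/p} - \Big[ \int_{X \times X} d(f(x), f(y))^p \, d\gamma(x, y) \Big]^{1/p} \Big| \\
&\le \Big[ \int_{X \times X} |d(x, y) - d(f(x), f(y))|^p \, d\gamma(x, y) \Big]^{1/p} \\
&\le \Big[ \int_{X \times X} (2\epsilon)^p \, d\gamma(x, y) \Big]^{1/p} = 2\epsilon.
\end{aligned}
\]

Theorem 2.5. Let $(X, d)$ be a separable metric space and let $1 \le p < \infty$. Let $\mu^1, \mu^2, \mu^3 \in \mathcal{P}_p(X)$. Then $W_p(\mu^1, \mu^3) \le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3)$.

Proof. Let $\tilde{X} = \{v_1, v_2, \dots\}$ be a dense countable subset of $X$, let $\epsilon > 0$, and let $f$ be given according to Lemma 2.2. Let $S_i = f^{-1}(\{v_i\})$. For $i = 1, 2, 3$ we define $\tilde{\mu}^i = f_\# \mu^i$. Now, for the pairs $(i, j) \in \{(1, 2), (2, 3)\}$, let $\gamma^{i,j} \in \Gamma(\mu^i, \mu^j)$ be such that
\[
\Big[ \int_{X \times X} d(x_i, x_j)^p \, d\gamma^{i,j}(x_i, x_j) \Big]^{1/p} < W_p(\mu^i, \mu^j) + \epsilon.
\]
On $\tilde{X} \times \tilde{X}$ we define the measures $\tilde{\gamma}^{i,j} = (f \times f)_\# \gamma^{i,j}$. By Lemma 2.4(1) we infer that $\tilde{\gamma}^{i,j} \in \Gamma(\tilde{\mu}^i, \tilde{\mu}^j)$. By Lemma 2.4(2) we have the estimates
\[
\Big[ \int_{\tilde{X} \times \tilde{X}} d(x_i, x_j)^p \, d\tilde{\gamma}^{i,j} \Big]^{1/p}
\le \Big[ \int_{X \times X} d(x_i, x_j)^p \, d\gamma^{i,j} \Big]^{1/p} + 2\epsilon
\le W_p(\mu^i, \mu^j) + 3\epsilon.
\]
Proposition 2.1 implies that there exists $\tilde{\gamma}^{1,3} \in \Gamma(\tilde{\mu}^1, \tilde{\mu}^3)$ such that
\[
\Big[ \int_{\tilde{X} \times \tilde{X}} d(x_1, x_3)^p \, d\tilde{\gamma}^{1,3} \Big]^{1/p}
\le \Big[ \int_{\tilde{X} \times \tilde{X}} d(x_1, x_2)^p \, d\tilde{\gamma}^{1,2} \Big]^{1/p}
+ \Big[ \int_{\tilde{X} \times \tilde{X}} d(x_2, x_3)^p \, d\tilde{\gamma}^{2,3} \Big]^{1/p}
\le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3) + 6\epsilon.
\]
For short, we write $\tilde{\gamma}^{1,3}_{k,n} = \tilde{\gamma}^{1,3}(\{(v_k, v_n)\})$ and $\tilde{\mu}^i_m = \tilde{\mu}^i(\{v_m\}) = \mu^i(S_m)$. On $X \times X$ we define the measure
\[
\gamma^{1,3}(U) = \sum_{\substack{k, n \in \mathbb{N} \\ \tilde{\mu}^1_k \ne 0, \ \tilde{\mu}^3_n \ne 0}} \frac{\tilde{\gamma}^{1,3}_{k,n}}{\tilde{\mu}^1_k \, \tilde{\mu}^3_n} \, (\mu^1 \otimes \mu^3)[U \cap (S_k \times S_n)].
\]
We will show below that $\gamma^{1,3} \in \Gamma(\mu^1, \mu^3)$ and $(f \times f)_\# \gamma^{1,3} = \tilde{\gamma}^{1,3}$. Then Lemma 2.4(2) implies that
\[
\Big[ \int_{X \times X} d(x_1, x_3)^p \, d\gamma^{1,3} \Big]^{1/p}
\le \Big[ \int_{\tilde{X} \times \tilde{X}} d(x_1, x_3)^p \, d\tilde{\gamma}^{1,3} \Big]^{1/p} + 2\epsilon
\le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3) + 8\epsilon.
\]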

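From this we conclude that
\[
W_p(\mu^1, \mu^3) \le W_p(\mu^1, \mu^2) + W_p(\mu^2, \mu^3) + 8\epsilon.
\]
With $\epsilon \to 0$ we infer the desired triangle inequality.

Now we show that $\gamma^{1,3} \in \Gamma(\mu^1, \mu^3)$. We will use the definition of $\gamma^{1,3}$ and the marginals of $\tilde{\gamma}^{1,3}$. In the following computation the sum has to be understood over all indices where the denominators are nonzero. Notice that $\tilde{\mu}^1_k = 0$ or $\tilde{\mu}^3_n = 0$ implies $\tilde{\gamma}^{1,3}_{k,n} = 0$. For a Borel set $V \subset X$,
\[
\begin{aligned}
\pi^1_\# \gamma^{1,3}(V) = \gamma^{1,3}(V \times X)
&= \sum_{k,n} \frac{\tilde{\gamma}^{1,3}_{k,n}}{\tilde{\mu}^1_k \, \tilde{\mu}^3_n} \, (\mu^1 \otimes \mu^3)[(V \cap S_k) \times S_n] \\
&= \sum_{k} \frac{1}{\tilde{\mu}^1_k} \, \mu^1(V \cap S_k) \sum_{n} \tilde{\gamma}^{1,3}_{k,n} \\
&= \sum_{k} \frac{1}{\tilde{\mu}^1_k} \, \mu^1(V \cap S_k) \, \tilde{\mu}^1_k
 = \sum_{k} \mu^1(V \cap S_k) = \mu^1(V).
\end{aligned}
\]
The second marginal is computed in the same way.

To show that $(f \times f)_\# \gamma^{1,3} = \tilde{\gamma}^{1,3}$, consider $\tilde{U} \subset \tilde{X} \times \tilde{X}$. Then
\[
(f \times f)^{-1}(\tilde{U}) = \bigcup_{k,n : (v_k, v_n) \in \tilde{U}} S_k \times S_n.
\]
Consequently,
\[
\begin{aligned}
(f \times f)_\# \gamma^{1,3}(\tilde{U})
&= \gamma^{1,3}\Big( \bigcup_{k,n : (v_k, v_n) \in \tilde{U}} S_k \times S_n \Big) \\
&= \sum_{k,n : (v_k, v_n) \in \tilde{U}} \frac{\tilde{\gamma}^{1,3}_{k,n}}{\tilde{\mu}^1_k \, \tilde{\mu}^3_n} \, (\mu^1 \otimes \mu^3)(S_k \times S_n)
 = \sum_{k,n : (v_k, v_n) \in \tilde{U}} \tilde{\gamma}^{1,3}_{k,n} = \tilde{\gamma}^{1,3}(\tilde{U}).
\end{aligned}
\]
Thus the proof is finished.

3. Completeness

Here we give an alternative proof of the completeness of $(\mathcal{P}_p(X), W_p)$ whenever $X$ is complete (see [1, Proposition 7.1.5]). Throughout this section, let $1 \le p < \infty$. We recall the definition of the dual bounded Lipschitz metric for $\mu, \nu \in \mathcal{P}(X)$:
\[
\beta(\mu, \nu) = \sup \Big| \int_X f(x) \, d\mu(x) - \int_X f(x) \, d\nu(x) \Big|, \tag{3.1}
\]
where the supremum is taken over all bounded and Lipschitz continuous $f : X \to \mathbb{R}$ such that $\|f\|_\infty + [f]_{\mathrm{Lip}} \le 1$. Convergence with respect to $\beta$ is equivalent to narrow convergence [2, 11.3.3]. Moreover, for $\mu, \nu \in \mathcal{P}_p(X)$,
\[
\beta(\mu, \nu) \le W_p(\mu, \nu). \tag{3.2}
\]
Indeed, let $f$ be bounded and Lipschitz with $[f]_{\mathrm{Lip}} \le 1$. For $\gamma \in \Gamma(\mu, \nu)$ we have
\[
\begin{aligned}
\Big| \int_X f(x) \, d\mu(x) - \int_X f(x) \, d\nu(x) \Big|
&= \Big| \int_{X \times X} (f(x) - f(y)) \, d\gamma(x, y) \Big|
 \le \int_{X \times X} |f(x) - f(y)| \, d\gamma(x, y) \\
&\le \int_{X \times X} d(x, y) \, d\gamma(x, y)
 \le \Big[ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \Big]^{1/p}.
\end{aligned}
\]
Taking the infimum over $\gamma \in \Gamma(\mu, \nu)$ and then the supremum over $f$ yields (3.2).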
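As a quick sanity check of the inequality between $\beta$ and $W_p$, one can work out the case of two Dirac measures. This worked example is an addition for illustration and does not appear in the original text; the points $a, b \in X$ are hypothetical, and the supremum runs over the same class of $f$ as in the definition of $\beta$.

```latex
\[
\Gamma(\delta_a, \delta_b) = \{\delta_a \otimes \delta_b\},
\qquad
W_p(\delta_a, \delta_b)
  = \Big[ \int_{X \times X} d(x, y)^p \, d(\delta_a \otimes \delta_b)(x, y) \Big]^{1/p}
  = d(a, b),
\]
\[
\beta(\delta_a, \delta_b)
  = \sup_f |f(a) - f(b)|
  \le \sup_f \min\{ [f]_{\mathrm{Lip}} \, d(a, b), \; 2 \|f\|_\infty \}
  \le \min\{ d(a, b), \, 2 \}.
\]
```

So in this case $\beta(\delta_a, \delta_b) \le W_p(\delta_a, \delta_b)$, and the inequality is strict whenever $d(a, b) > 2$.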

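Let $\mu_n$ be a Cauchy sequence in $(\mathcal{P}_p(X), W_p)$. From (3.2) it is also a Cauchy sequence in $(\mathcal{P}(X), \beta)$. Since $(X, d)$ is complete, $(\mathcal{P}(X), \beta)$ is also complete by [2, Corollary 11.5.5]. Let $\mu$ denote the limit of $\mu_n$ with respect to the metric $\beta$. Given $\epsilon > 0$, let $N \ge 1$ be such that $W_p(\mu_m, \mu_n) \le \epsilon$ for all $m, n \ge N$. We claim that given $\bar{x} \in X$ and $n \ge N$,
\[
\int_X d(x, \bar{x})^p \, d\mu(x) \le 2^p \epsilon^p + 2^p \int_X d(y, \bar{x})^p \, d\mu_n(y), \tag{3.3}
\]
\[
W_p(\mu, \mu_n) \le \epsilon. \tag{3.4}
\]
This implies that $\mu \in \mathcal{P}_p(X)$ and $W_p(\mu_n, \mu) \to 0$.

By [1, Section 7.1], since $(X, d)$ is complete, the infimum in (1.1) is in fact a minimum. For $m, n \ge 1$ let $\gamma_{m,n} \in \Gamma(\mu_m, \mu_n)$ be such that
\[
W_p^p(\mu_m, \mu_n) = \int_{X \times X} d(x, y)^p \, d\gamma_{m,n}(x, y). \tag{3.5}
\]
Since $\mu_m \to \mu$ narrowly, [1, Lemma 5.2.2] implies that for each $n$ there exist $\gamma_n \in \mathcal{P}(X \times X)$ and a subsequence $\gamma_{m_k,n}$ such that
\[
\gamma_{m_k,n} \to \gamma_n \text{ narrowly in } \mathcal{P}(X \times X). \tag{3.6}
\]
Since the maps $\pi^i_\#$ ($i = 1, 2$) are continuous with respect to narrow convergence, we infer that
\[
\gamma_n \in \Gamma(\mu, \mu_n). \tag{3.7}
\]
Let $\bar{x} \in X$. Then, for $m_k, n \ge N$,
\[
\begin{aligned}
\int_X d(x, \bar{x})^p \, d\mu_{m_k}(x)
&= \int_{X \times X} d(x, \bar{x})^p \, d\gamma_{m_k,n}(x, y) \\
&\le 2^p \int_{X \times X} d(x, y)^p \, d\gamma_{m_k,n}(x, y) + 2^p \int_{X \times X} d(y, \bar{x})^p \, d\gamma_{m_k,n}(x, y) \\
&\le 2^p \epsilon^p + 2^p \int_X d(y, \bar{x})^p \, d\mu_n(y).
\end{aligned}
\]
In the last step we have utilized (3.5). Since $\mu_{m_k}$ converges narrowly to $\mu$, claim (3.3) follows from the Portmanteau theorem (see [1, (5.1.15)]). Similarly,
\[
\int_{X \times X} d(x, y)^p \, d\gamma_n(x, y)
\le \liminf_{k \to \infty} \int_{X \times X} d(x, y)^p \, d\gamma_{m_k,n}(x, y)
= \liminf_{k \to \infty} W_p^p(\mu_{m_k}, \mu_n) \le \epsilon^p.
\]
By (3.7) we have
\[
W_p^p(\mu, \mu_n) \le \int_{X \times X} d(x, y)^p \, d\gamma_n(x, y) \le \epsilon^p.
\]
This proves claim (3.4). Thus our proof is complete.

Acknowledgments

The authors wish to thank L. Ambrosio, G. Savaré, and C. Villani for their valuable comments. In particular, a suggestion by G. Savaré helped to make the technicalities in Section 2 shorter and the proof of Proposition 2.1 more transparent.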

References

[1] L. Ambrosio, N. Gigli, G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures, Birkhäuser, 2005. MR2129498 (2006k:49001)
[2] R. M. Dudley, Real Analysis and Probability, Cambridge Studies in Advanced Mathematics 74, Cambridge University Press, 2002. MR1932358 (2003h:60001)

Mathematical Institute, Leiden University, P.O. Box 9512, NL-2300 RA Leiden, The Netherlands
E-mail address: philippeclem@gmail.com

Institut für Mathematik und Wissenschaftliches Rechnen, Karl-Franzens-Universität Graz, Heinrichstrasse 36, 8010 Graz, Austria
E-mail address: georgdesch@uni-graz.at