The Wasserstein distances

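March 20, 2011

This document presents the proofs of the main results we proved on Wasserstein distances themselves (and not on curves in the Wasserstein space), in particular the triangle inequality and the characterization of the topology. These proofs are not easy to find in the same terms.

Definition of the distances and triangle inequality

First, for $\Omega \subset \mathbb{R}^d$ and $p \ge 1$, let us set
\[ \mathcal{P}_p(\Omega) := \Big\{ \mu \in \mathcal{P}(\Omega) \,:\, \int_\Omega |x|^p \, d\mu < +\infty \Big\}. \]
This subset of $\mathcal{P}(\Omega)$ will be the space where we define our distances. Obviously, if $\Omega$ is bounded then $\mathcal{P}_p(\Omega) = \mathcal{P}(\Omega)$.

For $\mu, \nu \in \mathcal{P}_p(\Omega)$, let us define
\[ W_p(\mu, \nu) := \inf \Big\{ \int_{\Omega \times \Omega} |x - y|^p \, d\gamma \,:\, \gamma \in \Pi(\mu, \nu) \Big\}^{1/p}, \]
i.e. the $p$-th root of the minimal transport cost for the cost $|x - y|^p$. The assumption $\mu, \nu \in \mathcal{P}_p(\Omega)$ guarantees finiteness of this value, since $|x - y|^p \le C(|x|^p + |y|^p)$ (one can take $C = 2^{p-1}$) and hence
\[ W_p(\mu, \nu)^p \le C \Big( \int |x|^p \, d\mu + \int |x|^p \, d\nu \Big). \]

Notice that, due to Jensen's inequality, since for any $\gamma \in \Pi(\mu, \nu)$ we have $\gamma(\Omega \times \Omega) = 1$, for $p \le q$ we can infer
\[ \Big( \int |x - y|^p \, d\gamma \Big)^{1/p} = \| x - y \|_{L^p(\gamma)} \le \| x - y \|_{L^q(\gamma)} = \Big( \int |x - y|^q \, d\gamma \Big)^{1/q}, \]
which implies $W_p(\mu, \nu) \le W_q(\mu, \nu)$. In particular $W_1(\mu, \nu) \le W_p(\mu, \nu)$ for every $p \ge 1$. We will not define here $W_\infty$ (as a limit for $p \to \infty$, or, which is the same, as the minimal value of the supremal problem $\min_{\gamma \in \Pi(\mu, \nu)} \| x - y \|_{L^\infty(\gamma)}$).

On the other hand, for bounded $\Omega$ an opposite inequality holds, since
\[ \Big( \int |x - y|^p \, d\gamma \Big)^{1/p} \le \operatorname{diam}(\Omega)^{\frac{p-1}{p}} \Big( \int |x - y| \, d\gamma \Big)^{1/p}, \]
which implies
\[ W_p(\mu, \nu) \le C \, W_1(\mu, \nu)^{1/p}, \qquad \text{for } C = \operatorname{diam}(\Omega)^{1/p'} \text{ and } p' = \frac{p}{p-1}. \]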
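The two comparison inequalities above can be checked numerically on a small example. The sketch below is not part of the original notes: it relies on the standard one-dimensional fact (assumed here, not proved in this document) that for two uniform empirical measures with the same number of atoms on the real line and $p \ge 1$, pairing the sorted atoms is an optimal plan. The function names `wasserstein_1d` and `check_comparison` and the sample points are purely illustrative.

```python
def wasserstein_1d(xs, ys, p):
    """W_p between the uniform empirical measures on xs and on ys.

    Assumption (not from the notes): on the real line, for p >= 1 and
    samples of equal size, pairing the sorted samples is optimal.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)


def check_comparison(xs, ys, p, diam):
    """Return (W_1, W_p, C * W_1**(1/p)) with C = diam**((p - 1) / p)."""
    w1 = wasserstein_1d(xs, ys, 1)
    wp = wasserstein_1d(xs, ys, p)
    upper = diam ** ((p - 1) / p) * w1 ** (1.0 / p)
    return w1, wp, upper


# Illustrative samples in Omega = [0, 1], whose diameter is 1.
sample_x = [0.0, 0.2, 0.9, 1.0]
sample_y = [0.1, 0.5, 0.6, 0.8]
for exponent in (1, 2, 3):
    w1, wp, upper = check_comparison(sample_x, sample_y, exponent, diam=1.0)
    # Expected ordering: W_1 <= W_p <= diam**((p-1)/p) * W_1**(1/p)
    print(exponent, w1, wp, upper)
```

Each printed triple should be non-decreasing from left to right, which is exactly the chain $W_1 \le W_p \le C\,W_1^{1/p}$ on a bounded domain.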

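Proposition 0.1. The quantity $W_p$ defined above is actually a distance over $\mathcal{P}_p(\Omega)$.

Proof. First, let us notice that $W_p \ge 0$, and that symmetry is evident from the definition. Then, we also notice that $W_p(\mu, \nu) = 0$ implies, as a consequence of the fact that the minimum in the definition of $W_p$ is attained, that there exists $\gamma \in \Pi(\mu, \nu)$ such that $\int |x - y|^p \, d\gamma = 0$, which means that $\gamma$ is concentrated on $\{x = y\}$. This implies $\mu = \nu$ since, for any test function $\phi$, we have
\[ \int \phi \, d\mu = \int \phi(x) \, d\gamma = \int \phi(y) \, d\gamma = \int \phi \, d\nu. \]

We need now to prove the triangle inequality. For that, let us take $\mu, \rho$ and $\nu \in \mathcal{P}_p(\Omega)$, $\gamma^+ \in \Pi(\mu, \rho)$ and $\gamma^- \in \Pi(\rho, \nu)$. We can also choose $\gamma^\pm$ to be optimal. Let us use Lemma 0.2 below to say that there exists a measure $\sigma \in \mathcal{P}(\Omega \times \Omega \times \Omega)$ such that $(\pi_{x,y})_\# \sigma = \gamma^+$ and $(\pi_{y,z})_\# \sigma = \gamma^-$, where $\pi_{x,y}$ and $\pi_{y,z}$ denote the projections on the first two and the last two variables, respectively. Let us take $\gamma := (\pi_{x,z})_\# \sigma$. By composition of the projections, it is easy to see that $(\pi_x)_\# \gamma = (\pi_x)_\# \sigma = (\pi_x)_\# \gamma^+ = \mu$ and, analogously, $(\pi_z)_\# \gamma = \nu$. This means $\gamma \in \Pi(\mu, \nu)$ and
\[
\begin{aligned}
W_p(\mu, \nu) &\le \Big( \int |x - z|^p \, d\gamma \Big)^{1/p} = \Big( \int |x - z|^p \, d\sigma \Big)^{1/p} = \| x - z \|_{L^p(\sigma)} \\
&\le \| x - y \|_{L^p(\sigma)} + \| y - z \|_{L^p(\sigma)} = \Big( \int |x - y|^p \, d\sigma \Big)^{1/p} + \Big( \int |y - z|^p \, d\sigma \Big)^{1/p} \\
&= \Big( \int |x - y|^p \, d\gamma^+ \Big)^{1/p} + \Big( \int |y - z|^p \, d\gamma^- \Big)^{1/p} = W_p(\mu, \rho) + W_p(\rho, \nu).
\end{aligned}
\]

Lemma 0.2. Given two measures $\gamma^+ \in \Pi(\mu, \rho)$ and $\gamma^- \in \Pi(\rho, \nu)$ there exists at least one measure $\sigma \in \mathcal{P}(\Omega \times \Omega \times \Omega)$ such that $(\pi_{x,y})_\# \sigma = \gamma^+$ and $(\pi_{y,z})_\# \sigma = \gamma^-$, where $\pi_{x,y}$ and $\pi_{y,z}$ denote the projections on the first two and the last two variables, respectively.

Proof. Start by taking $\gamma^+$ and disintegrate it w.r.t. the projection $\pi_y$. We get a family of measures $\gamma^+_y \in \mathcal{P}(\Omega)$ (we can think of them as measures over $\Omega$, instead of viewing them as measures over $\Omega \times \{y\} \subset \Omega \times \Omega$). They satisfy (and they are defined by)
\[ \int_{\Omega \times \Omega} \phi(x, y) \, d\gamma^+(x, y) = \int_\Omega d\rho(y) \int_\Omega \phi(x, y) \, d\gamma^+_y(x), \]
for every measurable function $\phi$ of two variables. In the same way, one has a family of measures $\gamma^-_y \in \mathcal{P}(\Omega)$ such that for every $\psi$ we have
\[ \int_{\Omega \times \Omega} \psi(y, z) \, d\gamma^-(y, z) = \int_\Omega d\rho(y) \int_\Omega \psi(y, z) \, d\gamma^-_y(z). \]
For every $y$ take now $\gamma^+_y \otimes \gamma^-_y$, which is a measure over $\Omega \times \Omega$. Define $\sigma$ through
\[ \int_{\Omega^3} \zeta(x, y, z) \, d\sigma(x, y, z) := \int_\Omega d\rho(y) \int_{\Omega \times \Omega} \zeta(x, y, z) \, d\big( \gamma^+_y \otimes \gamma^-_y \big)(x, z). \]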
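For measures with finitely many atoms, the definition of $\sigma$ becomes an explicit formula: the disintegrations are the conditional laws $\gamma^+_y(x) = \gamma^+(x, y)/\rho(y)$ and $\gamma^-_y(z) = \gamma^-(y, z)/\rho(y)$, so that $\sigma(x, y, z) = \gamma^+(x, y)\,\gamma^-(y, z)/\rho(y)$ whenever $\rho(y) > 0$. The following sketch, with the hypothetical helper names `glue`, `marginal_xy` and `marginal_yz` and made-up plans, builds this discrete $\sigma$ and lets one verify the two projection identities of the lemma.

```python
def glue(gamma_plus, gamma_minus):
    """Discrete gluing: sigma[i][j][k] = g+[i][j] * g-[j][k] / rho[j].

    gamma_plus is a matrix indexed by (x, y), gamma_minus by (y, z).
    Both are assumed to share the same marginal rho on the y variable.
    """
    nx, ny, nz = len(gamma_plus), len(gamma_minus), len(gamma_minus[0])
    rho = [sum(gamma_plus[i][j] for i in range(nx)) for j in range(ny)]
    sigma = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            if rho[j] > 0:  # atoms with rho[j] == 0 carry no mass
                for k in range(nz):
                    sigma[i][j][k] = gamma_plus[i][j] * gamma_minus[j][k] / rho[j]
    return sigma


def marginal_xy(sigma):
    """Push-forward of sigma under (x, y, z) -> (x, y)."""
    return [[sum(row_z) for row_z in plane] for plane in sigma]


def marginal_yz(sigma):
    """Push-forward of sigma under (x, y, z) -> (y, z)."""
    ny, nz = len(sigma[0]), len(sigma[0][0])
    return [[sum(plane[j][k] for plane in sigma) for k in range(nz)]
            for j in range(ny)]


# Illustrative plans: both have y-marginal rho = (0.25, 0.75).
gamma_plus = [[0.25, 0.25], [0.0, 0.5]]
gamma_minus = [[0.25, 0.0], [0.25, 0.5]]
sigma = glue(gamma_plus, gamma_minus)
print(marginal_xy(sigma))  # should reproduce gamma_plus up to rounding
print(marginal_yz(sigma))  # should reproduce gamma_minus up to rounding
```

The choice of the product $\gamma^+_y \otimes \gamma^-_y$ is only one admissible gluing; any coupling of the two conditional laws would give the same two projections.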

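It is easy to check that, for $\phi$ depending only on $x$ and $y$, we have
\[ \int_{\Omega^3} \phi(x, y) \, d\sigma = \int_\Omega d\rho(y) \int_{\Omega \times \Omega} \phi(x, y) \, d\big( \gamma^+_y \otimes \gamma^-_y \big)(x, z) = \int_\Omega d\rho(y) \int_\Omega \phi(x, y) \, d\gamma^+_y(x) = \int \phi \, d\gamma^+. \]
This proves $(\pi_{x,y})_\# \sigma = \gamma^+$ and the proof of $(\pi_{y,z})_\# \sigma = \gamma^-$ is completely analogous.

For the sake of completeness, we also give a proof of the triangle inequality which avoids using disintegrations. We first need the following lemma.

Lemma 0.3. Given $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^d)$ and $\chi_\varepsilon$ any usual regularizing kernel in $L^1$ with $\int \chi_\varepsilon = 1$ and $\operatorname{spt}(\chi_\varepsilon) \subset B(0, \varepsilon)$, we have
\[ \lim_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu) = W_p(\mu, \nu). \]

Proof. Take an optimal transport plan $\gamma \in \Pi(\mu, \nu)$ and define a measure $\gamma_\varepsilon \in \Pi(\mu * \chi_\varepsilon, \nu)$ through
\[ \int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(x, y) \, d\gamma_\varepsilon := \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \psi(x - z, y) \, \chi_\varepsilon(z) \, dz \, d\gamma(x, y). \]
We need to check that its marginals are actually $\mu * \chi_\varepsilon$ and $\nu$ (recall that the usual kernels are even, $\chi_\varepsilon(-z) = \chi_\varepsilon(z)$). For that just consider
\[
\begin{aligned}
\int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(x) \, d\gamma_\varepsilon &= \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \psi(x - z) \, \chi_\varepsilon(z) \, dz \, d\gamma(x, y) = \int_{\mathbb{R}^d} \chi_\varepsilon(z) \, dz \int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(x - z) \, d\gamma(x, y) \\
&= \int_{\mathbb{R}^d} \chi_\varepsilon(z) \, dz \int_{\mathbb{R}^d} \psi(x - z) \, d\mu(x) = \int_{\mathbb{R}^d} \psi \, d(\mu * \chi_\varepsilon)
\end{aligned}
\]
and, more easily,
\[ \int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(y) \, d\gamma_\varepsilon = \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \psi(y) \, \chi_\varepsilon(z) \, dz \, d\gamma(x, y) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(y) \, d\gamma(x, y) = \int_{\mathbb{R}^d} \psi \, d\nu. \]

It is then easy to show that $\int |x - y|^p \, d\gamma_\varepsilon \to \int |x - y|^p \, d\gamma$, since
\[ \Big| \int |x - y|^p \, d\gamma_\varepsilon - \int |x - y|^p \, d\gamma \Big| \le \int d\gamma(x, y) \int \big| \, |x - y|^p - |x - y - z|^p \, \big| \, \chi_\varepsilon(z) \, dz \le p \, \varepsilon \int d\gamma(x, y) \, (|x - y| + 1)^{p-1} \]
(we use the fact that $|z| \le \varepsilon$ on $\operatorname{spt}(\chi_\varepsilon)$ and we roughly estimate $(a + \varepsilon)^p - a^p \le \varepsilon \, p \, (a + 1)^{p-1}$ thanks to the mean value theorem, for $a \ge 0$ and $0 \le \varepsilon \le 1$). The last integral being finite since $\int |x - y|^p \, d\gamma < +\infty$, letting $\varepsilon \to 0$ we get
\[ \limsup_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu)^p \le \limsup_{\varepsilon \to 0} \int |x - y|^p \, d\gamma_\varepsilon = \int |x - y|^p \, d\gamma. \]
This shows $\limsup_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu) \le W_p(\mu, \nu)$.

One can also obtain the opposite inequality with the liminf in the following way. First fix a sequence $\varepsilon_k \to 0$ such that $\lim_k W_p(\mu * \chi_{\varepsilon_k}, \nu) = \liminf_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu)$. Then extract a subsequence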

$\varepsilon_{k_j}$ so as to guarantee that the optimal transport plans $\gamma_{\varepsilon_{k_j}}$ sending $\mu * \chi_{\varepsilon_{k_j}}$ to $\nu$ have a weak limit $\gamma_0$ (see the next section for the precise meaning of weak convergence). This weak limit must belong to $\Pi(\mu, \nu)$ (the fact that the marginals of $\gamma_0$ are $\mu$ and $\nu$ follows from the properties of composition of the weak convergence with continuous functions). Then we have
\[ W_p(\mu, \nu)^p \le \int |x - y|^p \, d\gamma_0 \le \liminf_j \int |x - y|^p \, d\gamma_{\varepsilon_{k_j}} = \liminf_j W_p(\mu * \chi_{\varepsilon_{k_j}}, \nu)^p = \liminf_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu)^p, \]
where the first inequality follows from the fact that $\gamma_0$ is not necessarily optimal but is admissible, and the second from semicontinuity (since $|x - y|^p$ is a positive and continuous function, which is the increasing limit of positive, continuous and bounded functions).

Then, we can perform a proof of the triangle inequality based on the use of optimal transport maps.

Proposition 0.4. Even if we refuse to use disintegrations, the triangle inequality is true for $W_p$.

Proof. First consider the case where $\mu$ and $\rho$ are absolutely continuous and $\nu$ is arbitrary. Let $T$ be the optimal transport from $\mu$ to $\rho$ and $S$ the one from $\rho$ to $\nu$. Then $S \circ T$ is an admissible transport from $\mu$ to $\nu$, since $(S \circ T)_\# \mu = S_\#(T_\# \mu) = S_\# \rho = \nu$. Then we have
\[ W_p(\mu, \nu) \le \Big( \int |S(T(x)) - x|^p \, d\mu \Big)^{1/p} = \| S \circ T - \mathrm{id} \|_{L^p(\mu)} \le \| S \circ T - T \|_{L^p(\mu)} + \| T - \mathrm{id} \|_{L^p(\mu)}. \]
Yet,
\[ \| S \circ T - T \|_{L^p(\mu)} = \Big( \int |S(T(x)) - T(x)|^p \, d\mu \Big)^{1/p} = \Big( \int |S(y) - y|^p \, d\rho \Big)^{1/p} = W_p(\rho, \nu) \]
and $\| T - \mathrm{id} \|_{L^p(\mu)} = W_p(\mu, \rho)$, hence
\[ W_p(\mu, \nu) \le W_p(\mu, \rho) + W_p(\rho, \nu). \]
This gives the proof when $\mu, \rho \ll \mathcal{L}^d$. If $\rho$ is arbitrary, take now $\rho * \chi_\varepsilon$ instead, thus obtaining
\[ W_p(\mu, \nu) \le W_p(\mu, \rho * \chi_\varepsilon) + W_p(\rho * \chi_\varepsilon, \nu). \]
By passing to the limit as $\varepsilon \to 0$ and using Lemma 0.3 the inequality follows for arbitrary $\rho$. Finally, $\mu$ may be taken arbitrary as well by considering now $\mu * \chi_\varepsilon$, with arbitrary $\rho$ and $\nu$, and letting $\varepsilon \to 0$.

Topology induced by $W_p$

First of all, let us clarify that we often use the term weak convergence, when speaking of probability measures, to denote the convergence in the duality with bounded continuous functions (which is often referred to as narrow convergence), and write $\mu_n \rightharpoonup \mu$ to say that $\mu_n$ converges in such a sense to $\mu$.
Notice also that, when both $\mu_n$ and $\mu$ are probability measures, this convergence coincides with the

convergence in the duality with functions $\phi \in C_0(\Omega)$, vanishing at infinity. To convince ourselves of such a fact, we only need to show that if we take $\phi \in C_b(\Omega)$, $\mu_n, \mu \in \mathcal{P}(\Omega)$ and we suppose $\int \psi \, d\mu_n \to \int \psi \, d\mu$ for every $\psi \in C_0(\Omega)$, then we also have $\int \phi \, d\mu_n \to \int \phi \, d\mu$. If all the measures are probability measures, we can add for free a constant $C$ to $\phi$ and, since $\phi$ is bounded, we can choose $C$ so that $\phi + C \ge 0$. Hence $\phi + C$ is the sup of an increasing family of functions in $C_0$ (take $(\phi + C)\chi_n$, $\chi_n$ being an increasing family of cut-off functions with $\chi_n = 1$ on $B(0, n)$). Hence, by semicontinuity we have $\int (\phi + C) \, d\mu \le \liminf_n \int (\phi + C) \, d\mu_n$, which implies $\int \phi \, d\mu \le \liminf_n \int \phi \, d\mu_n$. If the same argument is performed with $-\phi$ we have the desired convergence of the integrals.

Once the weak convergence is understood, we can start from the following result.

Theorem 0.5. If $\Omega$ is compact, then $\mu_n \rightharpoonup \mu$ if and only if $W_1(\mu_n, \mu) \to 0$.

Proof. Let us recall the duality formula, which gives for arbitrary $\mu, \nu \in \mathcal{P}(\Omega)$
\[ W_1(\mu, \nu) = \min \Big\{ \int |x - y| \, d\gamma \,:\, \gamma \in \Pi(\mu, \nu) \Big\} = \max \Big\{ \int \phi \, d(\mu - \nu) \,:\, \phi \in \mathrm{Lip}_1 \Big\}. \]
Let us start from a sequence $\mu_n$ such that $W_1(\mu_n, \mu) \to 0$. Thanks to the duality formula, for every $\phi \in \mathrm{Lip}_1(\Omega)$ we have $\int \phi \, d(\mu_n - \mu) \to 0$. By linearity, the same will be true for any Lipschitz function. By density, it is true for any function in $C_b(\Omega)$. This shows that the Wasserstein convergence implies the weak convergence.

To prove the opposite implication, let us first fix a subsequence $\mu_{n_k}$ such that $\lim_k W_1(\mu_{n_k}, \mu) = \limsup_n W_1(\mu_n, \mu)$. For every $k$, pick a function $\phi_{n_k} \in \mathrm{Lip}_1(\Omega)$ such that $\int \phi_{n_k} \, d(\mu_{n_k} - \mu) = W_1(\mu_{n_k}, \mu)$. Up to adding a constant, which does not affect the integral, we can suppose that the $\phi_{n_k}$ all vanish at the same point, and they are hence uniformly bounded and equicontinuous. By Ascoli's theorem we can extract a sub-subsequence uniformly converging to a certain $\phi \in \mathrm{Lip}_1(\Omega)$. By replacing the original subsequence with this new one we can avoid relabeling. We have now
\[ W_1(\mu_{n_k}, \mu) = \int \phi_{n_k} \, d(\mu_{n_k} - \mu) \le \int |\phi_{n_k} - \phi| \, d(\mu_{n_k} + \mu) + \int \phi \, d(\mu_{n_k} - \mu) \le 2 \| \phi_{n_k} - \phi \|_{L^\infty} + \int \phi \, d(\mu_{n_k} - \mu) \to 0, \]
where the first term goes to 0 by uniform convergence and the second by weak convergence. This shows that $\limsup_n W_1(\mu_n, \mu) \le 0$ and concludes the proof.

Theorem 0.6.
If $\Omega$ is compact and $p \ge 1$, then $\mu_n \rightharpoonup \mu$ if and only if $W_p(\mu_n, \mu) \to 0$.

Proof. We have already proved this equivalence for $p = 1$. For the other values of $p$, just use the inequalities
\[ W_1(\mu, \nu) \le W_p(\mu, \nu) \le C \, W_1(\mu, \nu)^{1/p}, \]
that give the equivalence between the convergence for $W_p$ and for $W_1$.

We can now pass to the case of unbounded domains.

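Theorem 0.7. Consider any $\Omega \subset \mathbb{R}^d$ and $p \ge 1$; then $W_p(\mu_n, \mu) \to 0$ if and only if $\mu_n \rightharpoonup \mu$ and $\int |x|^p \, d\mu_n \to \int |x|^p \, d\mu$.

Proof. Consider first a sequence $\mu_n \in \mathcal{P}_p(\Omega)$ which is converging to $\mu$ for the $W_p$ distance. It is still true in this case that
\[ \sup \Big\{ \int \phi \, d(\mu_n - \mu) \,:\, \phi \in \mathrm{Lip}_1 \Big\} \le W_1(\mu_n, \mu) \le W_p(\mu_n, \mu) \to 0, \]
which gives the weak convergence when testing against any Lipschitz function. Notice that Lipschitz functions are dense (for the uniform convergence) in the space $C_0(\Omega)$ (while this is not necessarily the case for $C_b(\Omega)$) and that this is enough to prove $\mu_n \rightharpoonup \mu$. To obtain the other condition, namely $\int |x|^p \, d\mu_n \to \int |x|^p \, d\mu$ (which is not a consequence of the weak convergence, since $|x|^p$ is not bounded), it is sufficient to notice that
\[ \int |x|^p \, d\mu_n = W_p^p(\mu_n, \delta_0) \to W_p^p(\mu, \delta_0) = \int |x|^p \, d\mu. \]

We need now to prove the opposite implication. Consider a sequence $\mu_n \rightharpoonup \mu$ satisfying also $\int |x|^p \, d\mu_n \to \int |x|^p \, d\mu$. Fix $R > 0$ and consider the function $\phi(x) := (|x| \wedge R)^p$, which is continuous and bounded. We have
\[ \int \big( |x|^p - (|x| \wedge R)^p \big) \, d\mu_n = \int |x|^p \, d\mu_n - \int \phi \, d\mu_n \to \int |x|^p \, d\mu - \int \phi \, d\mu = \int \big( |x|^p - (|x| \wedge R)^p \big) \, d\mu. \]
Since $\int \big( |x|^p - (|x| \wedge R)^p \big) \, d\mu \le \int_{B(0,R)^c} |x|^p \, d\mu$, it is possible to choose $R$ so that
\[ \int \big( |x|^p - (|x| \wedge R)^p \big) \, d\mu < \varepsilon / 2 \]
and hence one can also guarantee that $\int \big( |x|^p - (|x| \wedge R)^p \big) \, d\mu_n < \varepsilon$ for all $n$ large enough. We use now the inequality
\[ (|x| - R)^p \le |x|^p - R^p = |x|^p - (|x| \wedge R)^p, \]
which is valid for $|x| \ge R$ (see Lemma 0.8 below; for $|x| < R$ the right-hand side vanishes, and we replace the left-hand side by its positive part), to get
\[ \int (|x| - R)_+^p \, d\mu_n < \varepsilon \ \text{ for } n \text{ large enough} \quad \text{and} \quad \int (|x| - R)_+^p \, d\mu < \varepsilon. \]

Consider now $\pi_R : \mathbb{R}^d \to \overline{B(0, R)}$ defined as the projection over $\overline{B(0, R)}$. This map is well defined and continuous and is the identity on $\overline{B(0, R)}$. Moreover, for every $x \notin B(0, R)$ we have $|x - \pi_R(x)| = |x| - R$. We can deduce
\[ W_p(\mu, (\pi_R)_\# \mu) \le \Big( \int (|x| - R)_+^p \, d\mu \Big)^{1/p} \le \varepsilon^{1/p}, \qquad W_p(\mu_n, (\pi_R)_\# \mu_n) \le \Big( \int (|x| - R)_+^p \, d\mu_n \Big)^{1/p} \le \varepsilon^{1/p}. \]
Notice also that, due to the usual composition of the weak convergence with continuous functions, from $\mu_n \rightharpoonup \mu$ we also infer $(\pi_R)_\# \mu_n \rightharpoonup (\pi_R)_\# \mu$. Yet, these measures are all concentrated on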

the compact set $\overline{B(0, R)}$ and here we can use the equivalence between weak convergence and $W_p$ convergence. Hence, we get
\[
\begin{aligned}
\limsup_n W_p(\mu_n, \mu) &\le \limsup_n \Big( W_p(\mu_n, (\pi_R)_\# \mu_n) + W_p((\pi_R)_\# \mu_n, (\pi_R)_\# \mu) + W_p(\mu, (\pi_R)_\# \mu) \Big) \\
&\le 2 \varepsilon^{1/p} + \lim_n W_p((\pi_R)_\# \mu_n, (\pi_R)_\# \mu) = 2 \varepsilon^{1/p}.
\end{aligned}
\]
The parameter $\varepsilon > 0$ being arbitrary, we get $\limsup_n W_p(\mu_n, \mu) = 0$ and the proof is concluded.

Lemma 0.8. For $a, b \in \mathbb{R}_+$ and $p \ge 1$ we have $a^p + b^p \le (a + b)^p$.

Proof. Suppose without loss of generality that $a \ge b$. Then we can write $(a + b)^p = a^p + p \, \xi^{p-1} b$, for a point $\xi \in [a, a + b]$. Use now $p \ge 1$ and $\xi \ge a \ge b$ to get $(a + b)^p \ge a^p + b^p$.
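The role of the moment condition in Theorem 0.7 can be seen on a two-atom example, sketched below under illustrative choices that are not in the original notes: $\mu_n = (1 - 1/n)\,\delta_0 + (1/n)\,\delta_{x_n}$ on the real line, compared with $\delta_0$. The code uses the identity $W_p^p(\mu, \delta_0) = \int |x|^p \, d\mu$ from the proof above. With $x_n = n^{1/p}$ the mass escapes to infinity: integrals of bounded continuous functions converge, yet the $p$-th moment stays equal to 1 and so does $W_p(\mu_n, \delta_0)$. With $x_n = 1$ both quantities go to 0. The helper names `mu_n`, `integrate` and `wp_to_dirac_at_origin` are hypothetical.

```python
def mu_n(n, p, escaping):
    """Atoms and weights of (1 - 1/n) * delta_0 + (1/n) * delta_{x_n}.

    x_n = n**(1/p) if escaping is True (mass runs off to infinity),
    x_n = 1 otherwise (mass stays in a bounded set).
    """
    far_atom = n ** (1.0 / p) if escaping else 1.0
    return [0.0, far_atom], [1.0 - 1.0 / n, 1.0 / n]


def integrate(f, atoms, weights):
    """Integral of f against the discrete measure sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * f(x) for x, w in zip(atoms, weights))


def wp_to_dirac_at_origin(atoms, weights, p):
    """W_p(mu, delta_0), computed as the p-th root of the p-th moment of mu."""
    return integrate(lambda x: abs(x) ** p, atoms, weights) ** (1.0 / p)


def bounded_test_function(x):
    """A bounded continuous function, used to probe weak convergence."""
    return 1.0 / (1.0 + x * x)


p = 2
for n in (10, 1000, 100000):
    for escaping in (True, False):
        atoms, weights = mu_n(n, p, escaping)
        gap = abs(integrate(bounded_test_function, atoms, weights)
                  - bounded_test_function(0.0))
        # gap -> 0 in both cases; W_p -> 0 only when escaping is False
        print(n, escaping, gap, wp_to_dirac_at_origin(atoms, weights, p))
```

A single bounded test function does not prove weak convergence, of course; it only illustrates why weak convergence alone cannot control $W_p$ on unbounded domains.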