The Wasserstein distances


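March 20, 2011

This document presents the proofs of the main results we proved on the Wasserstein distances themselves (and not on curves in the Wasserstein space): in particular, the triangle inequality and the characterization of the topology. These proofs are not easy to find stated in the same terms.

Definition of the distances and triangle inequality

First, for $\Omega \subset \mathbb{R}^d$ and $p \ge 1$, let us set
\[
\mathcal{P}_p(\Omega) := \Bigl\{ \mu \in \mathcal{P}(\Omega) : \int |x|^p \, d\mu < +\infty \Bigr\}.
\]
This subset of $\mathcal{P}(\Omega)$ will be the space where we define our distances. Obviously, if $\Omega$ is bounded then $\mathcal{P}_p(\Omega) = \mathcal{P}(\Omega)$. For $\mu, \nu \in \mathcal{P}_p(\Omega)$, let us define
\[
W_p(\mu, \nu) := \Bigl( \inf \Bigl\{ \int |x-y|^p \, d\gamma \; : \; \gamma \in \Pi(\mu, \nu) \Bigr\} \Bigr)^{1/p},
\]
i.e. the $p$-th root of the minimal transport cost for the cost $|x-y|^p$. The assumption $\mu, \nu \in \mathcal{P}_p(\Omega)$ guarantees finiteness of this value, since $|x-y|^p \le C(|x|^p + |y|^p)$ and hence
\[
W_p(\mu, \nu)^p \le C \Bigl( \int |x|^p \, d\mu + \int |x|^p \, d\nu \Bigr).
\]

Notice that, due to Jensen's inequality, since for any $\gamma \in \Pi(\mu, \nu)$ we have $\gamma(\Omega \times \Omega) = 1$, for $p \le q$ we can infer
\[
\Bigl( \int |x-y|^p \, d\gamma \Bigr)^{1/p} = \|x-y\|_{L^p(\gamma)} \le \|x-y\|_{L^q(\gamma)} = \Bigl( \int |x-y|^q \, d\gamma \Bigr)^{1/q},
\]
which implies $W_p(\mu, \nu) \le W_q(\mu, \nu)$. In particular $W_1(\mu, \nu) \le W_p(\mu, \nu)$ for every $p \ge 1$. We will not define here $W_\infty$ (as a limit for $p \to \infty$, or, which is the same, as the minimal value of the supremal problem $\min_{\gamma \in \Pi(\mu,\nu)} \|x-y\|_{L^\infty(\gamma)}$).

On the other hand, for bounded $\Omega$ an opposite inequality holds, since $|x-y|^p \le \operatorname{diam}(\Omega)^{p-1} |x-y|$ gives
\[
\Bigl( \int |x-y|^p \, d\gamma \Bigr)^{1/p} \le \operatorname{diam}(\Omega)^{\frac{p-1}{p}} \Bigl( \int |x-y| \, d\gamma \Bigr)^{1/p},
\]
which implies
\[
W_p(\mu, \nu) \le C \, W_1(\mu, \nu)^{1/p}, \qquad \text{for } C = \operatorname{diam}(\Omega)^{1/p'} \text{ and } p' = \frac{p}{p-1},
\]
that is, $C = \operatorname{diam}(\Omega)^{(p-1)/p}$.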
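The two comparisons between $W_1$ and $W_p$ can be checked numerically. The sketch below is an illustration added here, not part of the original notes. It assumes $\Omega = [0,1]$ and two empirical measures with the same number of equally weighted atoms, and it relies on the standard one-dimensional fact (not proved in this document) that the sorted, monotone coupling is optimal for the cost $|x-y|^p$ with $p \ge 1$. The helper name `wasserstein_1d` is hypothetical.

```python
import numpy as np


def wasserstein_1d(xs, ys, p):
    """W_p between two empirical measures on the real line that have the
    same number of equally weighted atoms.

    Assumption: in dimension one the sorted (monotone) coupling is optimal
    for the cost |x - y|**p with p >= 1, so the i-th smallest atom of the
    first measure is matched with the i-th smallest atom of the second.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("both measures must have the same number of atoms")
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=500)  # atoms of mu, all inside Omega = [0, 1]
ys = rng.beta(2.0, 5.0, size=500)     # atoms of nu, all inside Omega = [0, 1]

p = 3
diam = 1.0                            # diameter of Omega = [0, 1]
C = diam ** ((p - 1) / p)             # the constant diam(Omega)^(1/p')

w1 = wasserstein_1d(xs, ys, 1)
wp = wasserstein_1d(xs, ys, p)

# Both comparisons from the text: W_1 <= W_p and W_p <= C * W_1^(1/p).
print(w1 <= wp, wp <= C * w1 ** (1 / p))
```

Since the same coupling is used for every $p$, the two printed comparisons are exactly Jensen's inequality and the bound $|x-y|^p \le \operatorname{diam}(\Omega)^{p-1}|x-y|$ applied to a single plan $\gamma$.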

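Proposition 0.1. The quantity $W_p$ defined above is actually a distance over $\mathcal{P}_p(\Omega)$.

Proof. First, let us notice that $W_p \ge 0$ and that symmetry is immediate from the definition. Then, we also notice that $W_p(\mu, \nu) = 0$ implies, as a consequence of the fact that the minimum in the definition of $W_p$ is attained, that there exists $\gamma \in \Pi(\mu, \nu)$ such that $\int |x-y|^p \, d\gamma = 0$, which means that $\gamma$ is concentrated on $\{x = y\}$. This implies $\mu = \nu$ since, for any test function $\varphi$, we have
\[
\int \varphi \, d\mu = \int \varphi(x) \, d\gamma = \int \varphi(y) \, d\gamma = \int \varphi \, d\nu.
\]

We need now to prove the triangle inequality. For that, let us take $\mu$, $\rho$ and $\nu \in \mathcal{P}_p(\Omega)$, $\gamma^+ \in \Pi(\mu, \rho)$ and $\gamma^- \in \Pi(\rho, \nu)$. We can also choose $\gamma^\pm$ to be optimal. Let us use Lemma 0.2 below to say that there exists a measure $\sigma \in \mathcal{P}(\Omega \times \Omega \times \Omega)$ such that $(\pi_{x,y})_\# \sigma = \gamma^+$ and $(\pi_{y,z})_\# \sigma = \gamma^-$, where $\pi_{x,y}$ and $\pi_{y,z}$ denote the projections on the two first and two last variables, respectively. Let us take $\gamma := (\pi_{x,z})_\# \sigma$. By composition of the projections, it is easy to see that $(\pi_x)_\# \gamma = (\pi_x)_\# \sigma = (\pi_x)_\# \gamma^+ = \mu$ and, analogously, $(\pi_z)_\# \gamma = \nu$. This means $\gamma \in \Pi(\mu, \nu)$ and
\[
\begin{aligned}
W_p(\mu, \nu) &\le \Bigl( \int |x-z|^p \, d\gamma \Bigr)^{1/p} = \Bigl( \int |x-z|^p \, d\sigma \Bigr)^{1/p} = \|x-z\|_{L^p(\sigma)} \\
&\le \|x-y\|_{L^p(\sigma)} + \|y-z\|_{L^p(\sigma)} = \Bigl( \int |x-y|^p \, d\sigma \Bigr)^{1/p} + \Bigl( \int |y-z|^p \, d\sigma \Bigr)^{1/p} \\
&= \Bigl( \int |x-y|^p \, d\gamma^+ \Bigr)^{1/p} + \Bigl( \int |y-z|^p \, d\gamma^- \Bigr)^{1/p} = W_p(\mu, \rho) + W_p(\rho, \nu).
\end{aligned}
\]

Lemma 0.2. Given two measures $\gamma^+ \in \Pi(\mu, \rho)$ and $\gamma^- \in \Pi(\rho, \nu)$ there exists at least one measure $\sigma \in \mathcal{P}(\Omega \times \Omega \times \Omega)$ such that $(\pi_{x,y})_\# \sigma = \gamma^+$ and $(\pi_{y,z})_\# \sigma = \gamma^-$, where $\pi_{x,y}$ and $\pi_{y,z}$ denote the projections on the two first and two last variables, respectively.

Proof. Start by taking $\gamma^+$ and disintegrate it w.r.t. the projection $\pi_y$. We get a family of measures $\gamma^+_y \in \mathcal{P}(\Omega)$ (we can think of them as measures over $\Omega$, instead of viewing them as measures over $\Omega \times \{y\}$). They satisfy (and they are defined by)
\[
\int_{\Omega \times \Omega} \varphi(x, y) \, d\gamma^+(x, y) = \int_\Omega d\rho(y) \int_\Omega \varphi(x, y) \, d\gamma^+_y(x),
\]
for every measurable function $\varphi$ of two variables. In the same way, one has a family of measures $\gamma^-_y \in \mathcal{P}(\Omega)$ such that for every $\psi$ we have
\[
\int_{\Omega \times \Omega} \psi(y, z) \, d\gamma^-(y, z) = \int_\Omega d\rho(y) \int_\Omega \psi(y, z) \, d\gamma^-_y(z).
\]
For every $y$ take now $\gamma^+_y \otimes \gamma^-_y$, which is a measure over $\Omega \times \Omega$. Define $\sigma$ through
\[
\int_{\Omega^3} \zeta(x, y, z) \, d\sigma(x, y, z) := \int_\Omega d\rho(y) \int_{\Omega \times \Omega} \zeta(x, y, z) \, d\bigl( \gamma^+_y \otimes \gamma^-_y \bigr)(x, z).
\]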

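It is easy to check that, for $\varphi$ depending only on $x$ and $y$, we have
\[
\int_{\Omega^3} \varphi(x, y) \, d\sigma = \int_\Omega d\rho(y) \int_{\Omega \times \Omega} \varphi(x, y) \, d\bigl( \gamma^+_y \otimes \gamma^-_y \bigr)(x, z) = \int_\Omega d\rho(y) \int_\Omega \varphi(x, y) \, d\gamma^+_y(x) = \int \varphi \, d\gamma^+.
\]
This proves $(\pi_{x,y})_\# \sigma = \gamma^+$, and the proof of $(\pi_{y,z})_\# \sigma = \gamma^-$ is completely analogous.

For the sake of completeness, we also give a proof of the triangle inequality which avoids using disintegrations. We first need the following lemma.

Lemma 0.3. Given $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^d)$ and $\chi_\varepsilon$ any usual regularizing kernel in $L^1$ with $\int \chi_\varepsilon = 1$ and $\operatorname{spt}(\chi_\varepsilon) \subset B(0, \varepsilon)$, we have
\[
\lim_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu) = W_p(\mu, \nu).
\]

Proof. Take an optimal transport plan $\gamma \in \Pi(\mu, \nu)$ and define a measure $\gamma_\varepsilon \in \Pi(\mu * \chi_\varepsilon, \nu)$ through
\[
\int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(x, y) \, d\gamma_\varepsilon := \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \psi(x - z, y) \, \chi_\varepsilon(z) \, dz \, d\gamma(x, y).
\]
We need to check that its marginals are actually $\mu * \chi_\varepsilon$ and $\nu$. For that just consider (the last equality below uses that the kernel is even, $\chi_\varepsilon(z) = \chi_\varepsilon(-z)$, as usual mollifiers are)
\[
\begin{aligned}
\int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(x) \, d\gamma_\varepsilon &= \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \psi(x - z) \, \chi_\varepsilon(z) \, dz \, d\gamma(x, y) = \int_{\mathbb{R}^d} \chi_\varepsilon(z) \, dz \int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(x - z) \, d\gamma(x, y) \\
&= \int_{\mathbb{R}^d} \chi_\varepsilon(z) \, dz \int_{\mathbb{R}^d} \psi(x - z) \, d\mu(x) = \int_{\mathbb{R}^d} \psi \, d(\mu * \chi_\varepsilon)
\end{aligned}
\]
and, more easily,
\[
\int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(y) \, d\gamma_\varepsilon = \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \psi(y) \, \chi_\varepsilon(z) \, dz \, d\gamma(x, y) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \psi(y) \, d\gamma(x, y) = \int_{\mathbb{R}^d} \psi \, d\nu.
\]
It is then easy to show that $\int |x-y|^p \, d\gamma_\varepsilon \to \int |x-y|^p \, d\gamma$, since
\[
\Bigl| \int |x-y|^p \, d\gamma_\varepsilon - \int |x-y|^p \, d\gamma \Bigr| \le \int d\gamma(x, y) \int \bigl| |x-y|^p - |x-y-z|^p \bigr| \, \chi_\varepsilon(z) \, dz \le p \varepsilon \int d\gamma(x, y) \, (|x-y| + 1)^{p-1}
\]
(we use the fact that $|z| \le \varepsilon$ on $\operatorname{spt}(\chi_\varepsilon)$ and we roughly estimate $(a + \varepsilon)^p - a^p \le \varepsilon p (a+1)^{p-1}$ thanks to the mean value theorem, for $a \ge 0$ and $0 \le \varepsilon \le 1$). The last integral being finite since $\int |x-y|^p \, d\gamma < +\infty$, letting $\varepsilon \to 0$ we get
\[
\limsup_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu)^p \le \limsup_{\varepsilon \to 0} \int |x-y|^p \, d\gamma_\varepsilon = \int |x-y|^p \, d\gamma.
\]
This shows $\limsup_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu) \le W_p(\mu, \nu)$.

One can also obtain the opposite inequality with the liminf in the following way. First fix a sequence $\varepsilon_k \to 0$ such that $\lim_k W_p(\mu * \chi_{\varepsilon_k}, \nu) = \liminf_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu)$. Then extract a subsequence $\varepsilon_{k_j}$ so as to guarantee that the optimal transport plans $\gamma_{\varepsilon_{k_j}}$ sending $\mu * \chi_{\varepsilon_{k_j}}$ to $\nu$ have a weak limit $\gamma_0$ (see the next section for the precise meaning of weak convergence). This weak limit must belong to $\Pi(\mu, \nu)$ (the fact that the marginals of $\gamma_0$ are $\mu$ and $\nu$ follows from the stability of weak convergence under composition with continuous functions). Then we have
\[
W_p(\mu, \nu)^p \le \int |x-y|^p \, d\gamma_0 \le \liminf_j \int |x-y|^p \, d\gamma_{\varepsilon_{k_j}} = \liminf_j W_p(\mu * \chi_{\varepsilon_{k_j}}, \nu)^p = \liminf_{\varepsilon \to 0} W_p(\mu * \chi_\varepsilon, \nu)^p,
\]
where the first inequality follows from the fact that $\gamma_0$ is not necessarily optimal but is admissible, and the second by semicontinuity (since $|x-y|^p$ is a positive and continuous function, which is the increasing limit of positive, continuous and bounded functions).

Then, we can perform a proof of the triangle inequality based on the use of optimal transport maps.

Proposition 0.4. Even if we refuse to use disintegrations, the triangle inequality is true for $W_p$.

Proof. First consider the case where $\mu$ and $\rho$ are absolutely continuous and $\nu$ is arbitrary. Let $T$ be the optimal transport from $\mu$ to $\rho$ and $S$ the one from $\rho$ to $\nu$. Then $S \circ T$ is an admissible transport from $\mu$ to $\nu$, since $(S \circ T)_\# \mu = S_\# (T_\# \mu) = S_\# \rho = \nu$. Then we have
\[
W_p(\mu, \nu) \le \Bigl( \int |S(T(x)) - x|^p \, d\mu \Bigr)^{1/p} = \|S \circ T - \mathrm{id}\|_{L^p(\mu)} \le \|S \circ T - T\|_{L^p(\mu)} + \|T - \mathrm{id}\|_{L^p(\mu)}.
\]
Yet,
\[
\|S \circ T - T\|_{L^p(\mu)} = \Bigl( \int |S(T(x)) - T(x)|^p \, d\mu \Bigr)^{1/p} = \Bigl( \int |S(y) - y|^p \, d\rho \Bigr)^{1/p} = W_p(\rho, \nu)
\]
and $\|T - \mathrm{id}\|_{L^p(\mu)} = W_p(\mu, \rho)$, hence
\[
W_p(\mu, \nu) \le W_p(\mu, \rho) + W_p(\rho, \nu).
\]
This gives the proof when $\mu, \rho \ll \mathcal{L}^d$. If $\rho$ is arbitrary, take now $\rho * \chi_\varepsilon$ instead, thus obtaining
\[
W_p(\mu, \nu) \le W_p(\mu, \rho * \chi_\varepsilon) + W_p(\rho * \chi_\varepsilon, \nu).
\]
By passing to the limit as $\varepsilon \to 0$ and using Lemma 0.3 the inequality follows for arbitrary $\rho$. Finally, $\mu$ may be taken arbitrary as well by considering now $\mu * \chi_\varepsilon$, with arbitrary $\rho$ and $\nu$, and letting $\varepsilon \to 0$.

Topology induced by $W_p$

First of all, let us clarify that we often use the term weak convergence, when speaking of probability measures, to denote the convergence in the duality with bounded continuous functions (which is often referred to as narrow convergence), and write $\mu_n \rightharpoonup \mu$ to say that $\mu_n$ converges in such a sense to $\mu$.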
Notice also that, when both $\mu_n$ and $\mu$ are probability measures, this convergence coincides with the convergence in the duality with functions $\varphi \in C_0(\Omega)$, vanishing at infinity. To be convinced of this fact, we only need to show that if we take $\varphi \in C_b(\Omega)$, $\mu_n, \mu \in \mathcal{P}(\Omega)$ and we suppose $\int \psi \, d\mu_n \to \int \psi \, d\mu$ for every $\psi \in C_0(\Omega)$, then we also have $\int \varphi \, d\mu_n \to \int \varphi \, d\mu$. If all the measures are probability measures, we can add for free a constant $C$ to $\varphi$ and, since $\varphi$ is bounded, we can choose $C$ so that $\varphi + C \ge 0$. Hence $\varphi + C$ is the sup of an increasing family of functions in $C_0$ (take $(\varphi + C)\chi_n$, $\chi_n$ being an increasing family of cut-off functions with $\chi_n = 1$ on $B(0, n)$). Hence, by semicontinuity we have
\[
\int (\varphi + C) \, d\mu \le \liminf_n \int (\varphi + C) \, d\mu_n,
\]
which implies $\int \varphi \, d\mu \le \liminf_n \int \varphi \, d\mu_n$. If the same argument is performed with $-\varphi$ we have the desired convergence of the integrals.

Once the weak convergence is understood, we can start from the following result.

Theorem 0.5. If $\Omega$ is compact, then $\mu_n \rightharpoonup \mu$ if and only if $W_1(\mu_n, \mu) \to 0$.

Proof. Let us recall the duality formula, which gives for arbitrary $\mu, \nu \in \mathcal{P}(\Omega)$
\[
W_1(\mu, \nu) = \min \Bigl\{ \int |x-y| \, d\gamma \; : \; \gamma \in \Pi(\mu, \nu) \Bigr\} = \max \Bigl\{ \int \varphi \, d(\mu - \nu) \; : \; \varphi \in \mathrm{Lip}_1 \Bigr\}.
\]
Let us start from a sequence $\mu_n$ such that $W_1(\mu_n, \mu) \to 0$. Thanks to the duality formula, for every $\varphi \in \mathrm{Lip}_1(\Omega)$ we have $\int \varphi \, d(\mu_n - \mu) \to 0$. By linearity, the same will be true for any Lipschitz function. By density, for any function in $C_b(\Omega)$. This shows that the Wasserstein convergence implies the weak convergence.

To prove the opposite implication, let us first fix a subsequence $\mu_{n_k}$ such that $\lim_k W_1(\mu_{n_k}, \mu) = \limsup_n W_1(\mu_n, \mu)$. For every $k$, pick a function $\varphi_{n_k} \in \mathrm{Lip}_1(\Omega)$ such that $\int \varphi_{n_k} \, d(\mu_{n_k} - \mu) = W_1(\mu_{n_k}, \mu)$. Up to adding a constant, which does not affect the integral, we can suppose that the $\varphi_{n_k}$ all vanish at a same point, and they are hence uniformly bounded and equicontinuous. By Ascoli's theorem we can extract a sub-subsequence uniformly converging to a certain $\varphi \in \mathrm{Lip}_1(\Omega)$. By replacing the original subsequence with this new one we can avoid relabeling. We have now
\[
W_1(\mu_{n_k}, \mu) = \int \varphi_{n_k} \, d(\mu_{n_k} - \mu) \le \int |\varphi_{n_k} - \varphi| \, d(\mu_{n_k} + \mu) + \int \varphi \, d(\mu_{n_k} - \mu) \le 2 \|\varphi_{n_k} - \varphi\|_{L^\infty} + \int \varphi \, d(\mu_{n_k} - \mu) \to 0,
\]
where the first term goes to 0 by uniform convergence and the second by weak convergence. This shows that $\limsup_n W_1(\mu_n, \mu) \le 0$ and concludes the proof.

Theorem 0.6. If $\Omega$ is compact and $p \ge 1$, then $\mu_n \rightharpoonup \mu$ if and only if $W_p(\mu_n, \mu) \to 0$.

Proof. We have already proved this equivalence for $p = 1$. For the other values of $p$, just use the inequalities
\[
W_1(\mu, \nu) \le W_p(\mu, \nu) \le C \, W_1(\mu, \nu)^{1/p},
\]
that give the equivalence between the convergence for $W_p$ and for $W_1$.

We can now pass to the case of unbounded domains.
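As a numerical aside on the compact case (Theorems 0.5 and 0.6), the sketch below watches $W_1$ and $W_2$ shrink along a weakly converging sequence on $\Omega = [0,1]$. It is an illustration added here, under several assumptions that are not in the original notes: $\mu_n$ is the uniform measure on the grid $\{1/n, \dots, 1\}$, the limit is replaced by a fine grid measure with $N = 1000$ atoms as a stand-in, and the sorted coupling is taken to be optimal in dimension one. The names `wasserstein_1d`, `grid_measure` and `distance_to_reference` are hypothetical helpers.

```python
import numpy as np


def wasserstein_1d(xs, ys, p):
    """W_p between two empirical measures on the line with the same number
    of equally weighted atoms (assumes the sorted coupling is optimal)."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("both measures must have the same number of atoms")
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))


def grid_measure(n):
    """Atoms k/n for k = 1..n, each carrying mass 1/n."""
    return np.arange(1, n + 1) / n


N = 1000                      # fine grid used as a stand-in for the limit
reference = grid_measure(N)


def distance_to_reference(n, p):
    """W_p between the n-point grid measure and the N-point reference.

    Repeating every atom N // n times does not change the measure but
    gives it N equally weighted atoms, as wasserstein_1d requires.
    n must divide N.
    """
    atoms = np.repeat(grid_measure(n), N // n)
    return wasserstein_1d(atoms, reference, p)


for n in (10, 100, 500):
    print(n, distance_to_reference(n, 1), distance_to_reference(n, 2))
```

Every atom moves by less than $1/n$ under the sorted coupling, so both columns are bounded by $1/n$, and since $\operatorname{diam}(\Omega) = 1$ the second column also stays below the square root of the first, as the inequality $W_p \le C\,W_1^{1/p}$ predicts.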

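Theorem 0.7. Consider any $\Omega \subset \mathbb{R}^d$ and $p \ge 1$. Then $W_p(\mu_n, \mu) \to 0$ if and only if $\mu_n \rightharpoonup \mu$ and $\int |x|^p \, d\mu_n \to \int |x|^p \, d\mu$.

Proof. Consider first a sequence $\mu_n \in \mathcal{P}_p(\Omega)$ which is converging to $\mu$ for the $W_p$ distance. Since $W_1 \le W_p$, it is still true in this case that
\[
\sup \Bigl\{ \int \varphi \, d(\mu_n - \mu) \; : \; \varphi \in \mathrm{Lip}_1 \Bigr\} \to 0,
\]
which gives the weak convergence testing against any Lipschitz function. Notice that Lipschitz functions are dense (for the uniform convergence) in the space $C_0(\Omega)$ (while it is not necessarily the case for $C_b(\Omega)$) and that this is enough to prove $\mu_n \rightharpoonup \mu$. To obtain the other condition, namely $\int |x|^p \, d\mu_n \to \int |x|^p \, d\mu$ (which is not a consequence of the weak convergence, since $|x|^p$ is not bounded), it is sufficient to notice that
\[
\int |x|^p \, d\mu_n = W_p^p(\mu_n, \delta_0) \to W_p^p(\mu, \delta_0) = \int |x|^p \, d\mu.
\]

We need now to prove the opposite implication. Consider a sequence $\mu_n \rightharpoonup \mu$ satisfying also $\int |x|^p \, d\mu_n \to \int |x|^p \, d\mu$. Fix $R > 0$ and consider the function $\varphi(x) := (|x| \wedge R)^p$, which is continuous and bounded. We have
\[
\int \bigl( |x|^p - (|x| \wedge R)^p \bigr) \, d\mu_n = \int |x|^p \, d\mu_n - \int \varphi \, d\mu_n \to \int |x|^p \, d\mu - \int \varphi \, d\mu = \int \bigl( |x|^p - (|x| \wedge R)^p \bigr) \, d\mu.
\]
Since
\[
\int \bigl( |x|^p - (|x| \wedge R)^p \bigr) \, d\mu \le \int_{B(0,R)^c} |x|^p \, d\mu,
\]
it is possible to choose $R$ so that $\int ( |x|^p - (|x| \wedge R)^p ) \, d\mu < \varepsilon/2$, and hence one can also guarantee that $\int ( |x|^p - (|x| \wedge R)^p ) \, d\mu_n < \varepsilon$ for all $n$ large enough. We use now the inequality
\[
(|x| - R)^p \le |x|^p - R^p = |x|^p - (|x| \wedge R)^p,
\]
which is valid for $|x| \ge R$ (see Lemma 0.8 below), to get, writing $(|x| - R)_+$ for the positive part,
\[
\int (|x| - R)_+^p \, d\mu_n < \varepsilon \ \text{ for } n \text{ large enough, and } \int (|x| - R)_+^p \, d\mu < \varepsilon.
\]
Consider now $\pi_R : \mathbb{R}^d \to \overline{B(0, R)}$ defined as the projection over $\overline{B(0, R)}$. This map is well defined and continuous and is the identity on $\overline{B(0, R)}$. Moreover, for every $x \notin \overline{B(0, R)}$ we have $|x - \pi_R(x)| = |x| - R$. We can deduce
\[
W_p(\mu, (\pi_R)_\# \mu) \le \Bigl( \int (|x| - R)_+^p \, d\mu \Bigr)^{1/p} \le \varepsilon^{1/p}, \qquad W_p(\mu_n, (\pi_R)_\# \mu_n) \le \Bigl( \int (|x| - R)_+^p \, d\mu_n \Bigr)^{1/p} \le \varepsilon^{1/p}.
\]
Notice also that, due to the usual composition of the weak convergence with continuous functions, from $\mu_n \rightharpoonup \mu$ we also infer $(\pi_R)_\# \mu_n \rightharpoonup (\pi_R)_\# \mu$. Yet, these measures are all concentrated on the compact set $\overline{B(0, R)}$ and here we can use the equivalence between weak convergence and $W_p$ convergence. Hence, we get
\[
\begin{aligned}
\limsup_n W_p(\mu_n, \mu) &\le \limsup_n \Bigl( W_p(\mu_n, (\pi_R)_\# \mu_n) + W_p((\pi_R)_\# \mu_n, (\pi_R)_\# \mu) + W_p(\mu, (\pi_R)_\# \mu) \Bigr) \\
&\le 2 \varepsilon^{1/p} + \lim_n W_p((\pi_R)_\# \mu_n, (\pi_R)_\# \mu) = 2 \varepsilon^{1/p}.
\end{aligned}
\]
The parameter $\varepsilon > 0$ being arbitrary, we get $\limsup_n W_p(\mu_n, \mu) = 0$ and the proof is concluded.

Lemma 0.8. For $a, b \in \mathbb{R}_+$ and $p \ge 1$ we have $a^p + b^p \le (a + b)^p$.

Proof. Suppose without loss of generality that $a \ge b$. Then we can write $(a + b)^p = a^p + p \xi^{p-1} b$ for a point $\xi \in [a, a + b]$. Use now $p \ge 1$ and $\xi \ge a \ge b$ to get $(a + b)^p \ge a^p + b^p$.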
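To see why the moment condition in Theorem 0.7 cannot be dropped, here is a short worked example. It is added for illustration and the sequence $\mu_n$ below is an assumption of this sketch, not taken from the original notes. A mass $1/n$ is sent to distance $n^{1/p}$ from the origin: the sequence converges weakly to $\delta_0$, yet its $p$-th moments do not converge and $W_p(\mu_n, \delta_0)$ stays equal to 1. The last line uses the same identity $W_p^p(\mu, \delta_0) = \int |x|^p \, d\mu$ as in the proof of Theorem 0.7, which holds because the only transport plan between $\mu$ and $\delta_0$ is the product measure.

```latex
% A sequence in R^d that converges weakly but not in W_p (p >= 1).
\[
\mu_n := \Bigl(1 - \tfrac{1}{n}\Bigr)\,\delta_0 + \tfrac{1}{n}\,\delta_{x_n},
\qquad |x_n| = n^{1/p}.
\]
% Weak convergence to delta_0: the escaping atom carries vanishing mass
% and \varphi is bounded.
\[
\int \varphi \, d\mu_n
= \Bigl(1 - \tfrac{1}{n}\Bigr)\varphi(0) + \tfrac{1}{n}\,\varphi(x_n)
\;\longrightarrow\; \varphi(0) = \int \varphi \, d\delta_0
\qquad \text{for every } \varphi \in C_b(\mathbb{R}^d).
\]
% The p-th moments do not converge to the moment of the limit.
\[
\int |x|^p \, d\mu_n = \tfrac{1}{n}\,|x_n|^p = 1
\;\not\longrightarrow\; 0 = \int |x|^p \, d\delta_0 .
\]
% Hence the Wasserstein distance to the weak limit does not vanish.
\[
W_p^p(\mu_n, \delta_0) = \int |x|^p \, d\mu_n = 1 \qquad \text{for all } n .
\]
```

On a compact set such an escape of mass is impossible, which is consistent with Theorems 0.5 and 0.6 needing no moment condition.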