Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line


Jérémie Bigot, Raúl Gouet, Thierry Klein, Alfredo López

To cite this version: Jérémie Bigot, Raúl Gouet, Thierry Klein, Alfredo López. Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line. Electronic Journal of Statistics, Shaker Heights, OH: Institute of Mathematical Statistics, 2018, 12 (02), pp. 2253-2289. <https://projecteuclid.org/euclid.ejs/1532333005#info>. <hal-01333401v2>

HAL Id: hal-01333401
https://hal.archives-ouvertes.fr/hal-01333401v2
Submitted on 29 Jan 2018

Jérémie Bigot (1), Raúl Gouet (2), Thierry Klein (3) & Alfredo López (4)

(1) Institut de Mathématiques de Bordeaux et CNRS (UMR 5251), Université de Bordeaux
(2) Depto. de Ingeniería Matemática and CMM (CNRS, UMI 2807), Universidad de Chile
(3) ENAC - Ecole nationale de l'aviation civile et Institut de Mathématiques de Toulouse et CNRS (UMR 5219), Université de Toulouse
(4) CSIRO Chile International Centre of Excellence

December 8, 2017

Abstract

This paper is focused on the statistical analysis of probability measures $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$ on $\mathbb{R}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\boldsymbol{\nu}_i$ are absolutely continuous with densities $\boldsymbol{f}_i$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1 \le i \le n;\, 1 \le j \le p_i}$ organized in the form of $n$ experimental units, such that $X_{i,1}, \ldots, X_{i,p_i}$ are iid observations sampled from a random measure $\boldsymbol{\nu}_i$ for each $1 \le i \le n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1 \le i \le n} p_i$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.

Keywords: Wasserstein space; Fréchet mean; Barycenter of probability measures; Functional data analysis; Phase and amplitude variability; Smoothing; Minimax optimality.

AMS classifications: Primary 62G08; secondary 62G20.

J. Bigot is a member of Institut Universitaire de France.

Acknowledgments

We are very much indebted to the referees and the Associate Editor for their constructive criticism, comments and remarks that resulted in a major revision of the original manuscript.

1 Introduction

In this paper, we are concerned with the statistical analysis of a set of absolutely continuous measures $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$ on the real line $\mathbb{R}$, with supports included in a (possibly unbounded) interval $\Omega \subset \mathbb{R}$, that can be viewed as independent copies of an underlying random measure $\boldsymbol{\nu}$. In this setting, it is of interest to define and estimate a mean measure $\nu_0$ of the random probability measure $\boldsymbol{\nu}$. The notion of mean or averaging depends on the metric that is chosen to compare elements in a given data set. In this work, we consider the Wasserstein metric $d_W$ associated to the quadratic cost for the comparison of probability measures and we define $\nu_0$ as the population Wasserstein barycenter of $\boldsymbol{\nu}$, given by

\[ \nu_0 = \arg\min_{\mu \in W_2(\Omega)} \mathbb{E}\left[ d_W^2(\boldsymbol{\nu}, \mu) \right], \]

where the above expectation is taken with respect to the distribution of $\boldsymbol{\nu}$, and $W_2(\Omega)$ denotes the space of probability measures with support included in $\Omega$ and with finite second moment. A Wasserstein barycenter corresponds to the Fréchet mean [Fré48] that is an extension of the usual Euclidean mean to non-linear metric spaces. Throughout the paper, the population mean measure $\nu_0$ is also referred to as the structural mean of $\boldsymbol{\nu}$, which is a terminology borrowed from curve registration (see [ZM11] and references therein). We choose to work with the Wasserstein metric because it has been shown to be successful in the presence of phase variation (we refer to Section 2 for more explanations).

Data sets leading to the analysis of absolutely continuous measures appear in various research fields. Examples can be found in neuroscience [WS11], demographic and genomics studies [Del11, ZM11], economics [KU01], as well as in biomedical imaging [PM16]. Nevertheless, in such applications one does not directly observe raw data in the form of absolutely continuous measures. Indeed, we generally only have access to random observations sampled from different distributions, that represent independent subjects or experimental units.

Thus, we propose to study the estimation of the structural mean measure $\nu_0$ (the population Wasserstein barycenter) from a data set consisting of independent real random variables $(X_{i,j})_{1 \le i \le n;\, 1 \le j \le p_i}$ organized in the form of $n$ experimental units, such that (conditionally on $\boldsymbol{\nu}_i$) the random variables $X_{i,1}, \ldots, X_{i,p_i}$ are iid observations sampled from the measure $\boldsymbol{\nu}_i$ with density $\boldsymbol{f}_i$, where $p_i$ denotes the number of observations for the $i$-th subject or experimental unit. The main purpose of this paper is to propose nonparametric estimators of the structural mean measure $\nu_0$ and to characterize their rates of convergence with respect to the Wasserstein metric in the asymptotic setting, where both $n$ and $\min_{1 \le i \le n} p_i$ may go to infinity.

1.1 Main contributions

Two types of nonparametric estimators are considered in this paper. The first one is given by the empirical Wasserstein barycenter of the set of measures $\tilde{\boldsymbol{\nu}}_1, \ldots, \tilde{\boldsymbol{\nu}}_n$, with $\tilde{\boldsymbol{\nu}}_i = \frac{1}{p_i} \sum_{j=1}^{p_i} \delta_{X_{i,j}}$ for $1 \le i \le n$. This estimator will be referred to as the non-smoothed empirical Wasserstein barycenter. Alternatively, since the unknown probability measures $\boldsymbol{\nu}_i$ are supposed to be absolutely continuous, a second estimator is based on a preliminary smoothing step which consists in using standard kernel smoothing to construct estimators $\hat{\boldsymbol{f}}_i$ of the unknown densities $\boldsymbol{f}_i$ for each $1 \le i \le n$. Then, an estimator of $\nu_0$ is obtained by taking the empirical Wasserstein barycenter of the measures $\hat{\boldsymbol{\nu}}_1, \ldots, \hat{\boldsymbol{\nu}}_n$, with $\hat{\boldsymbol{\nu}}_i(A) := \int_A \hat{\boldsymbol{f}}_i(x)\,dx$, $A \subset \mathbb{R}$ measurable. We refer to this class of estimators as smoothed empirical Wasserstein barycenters, whose smoothness depends on the choice of the bandwidths in the preliminary kernel smoothing step.

The rates of convergence of both types of estimators are derived for their squared Wasserstein risks, defined as their expected squared Wasserstein distances from $\nu_0$, and their optimality is discussed from the minimax point of view. Finally, some numerical experiments with simulated data are used to illustrate these results.

1.2 Related work in the literature

The notion of barycenter in the Wasserstein space, for a finite set of $n$ probability measures supported on $\mathbb{R}^d$ for any $d \ge 1$, has been recently introduced in [AC11] where a detailed characterization of such barycenters in terms of existence, uniqueness and regularity is given using arguments from duality and convex analysis. However, the convergence as $n \to \infty$ of such Wasserstein barycenters is not considered in that work.

In the one dimensional case $d = 1$, computing the Wasserstein barycenter of a finite set of probability measures simply amounts to averaging (in the usual way) their quantile functions. In statistics, this approach has been referred to as quantile synchronization [ZM11]. In the presence of phase variability in the data, quantile synchronization is known to be an appropriate alternative to the usual Euclidean mean of densities to compute a structural mean density that is more consistent with the data. Various asymptotic properties of quantile synchronization are studied in [ZM11] in a statistical model and asymptotic setting similar to that of this paper, under a condition on $\min_{1 \le i \le n} p_i$. However, other measures of risk than the one in this paper are considered in [ZM11], and the optimality of the resulting convergence rates of quantile synchronization is not discussed.

The results of this paper are very much connected with those in [PZ16] where a new framework is developed for the registration of multiple point processes on the real line for the purpose of separating amplitude and phase variation in such data. In [PZ16], consistent estimators of the structural mean of multiple point processes are obtained by the use of smoothed Wasserstein barycenters with an appropriate choice of kernel smoothing. Also, rates of convergence of such estimators are derived for the Wasserstein metric. The statistical analysis of multiple point processes is very much connected to the study of repeated observations organized in samples from independent subjects or experimental units. Therefore, some of our results in this paper on smoothed empirical Wasserstein barycenters are built upon the work in [PZ16]. Nevertheless, novel contributions include the derivation of an exact formula to compute the risk

of non-smoothed Wasserstein barycenters in the case of samples of equal size, and new upper bounds on the rate of convergence of the Wasserstein risk of non-smoothed and smoothed empirical Wasserstein barycenters, together with a discussion of their optimality from the minimax point of view.

The construction of consistent estimators of a population Wasserstein barycenter for semi-parametric models of random measures can also be found in [BK17] and [BLGL15], together with a discussion on their connection to the well known curve registration problem in statistics [RL01, WG97].

1.3 Organization of the paper

In Section 2, we first briefly explain why using statistics based on the Wasserstein metric is a relevant approach for the analysis of a set of random measures in the presence of phase and amplitude variations in their densities. Then, we introduce a deformable model for the registration of probability measures that is appropriate to study the statistical properties of empirical Wasserstein barycenters. The two types of nonparametric estimators described above are finally introduced at the end of Section 2. The convergence rates and the optimality of these estimators are studied in Section 3. Some numerical experiments with simulated data are proposed in Section 4 to highlight the finite sample performances of these estimators. Section 5 contains a discussion on the main contributions of this work and their potential extensions. The proofs of the main results are gathered in a technical Appendix. Finally, note that we use bold symbols $\boldsymbol{f}, \boldsymbol{\nu}, \ldots$ to denote random objects (except real random variables).

2 Wasserstein barycenters for the estimation of the structural mean in a deformable model of probability measures

2.1 The need to account for phase and amplitude variations

To estimate a mean measure from the data $(X_{i,j})_{1 \le i \le n;\, 1 \le j \le p_i}$, a natural approach is the following one. In a first step, one uses the $X_{i,j}$'s to compute estimators $\hat{\boldsymbol{f}}_1, \ldots, \hat{\boldsymbol{f}}_n$ (e.g. via kernel smoothing) of the unobserved density functions $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ of the measures $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$. Then, an estimator of a mean density might be defined as the usual Euclidean mean $\bar{\boldsymbol{f}}_n = \frac{1}{n} \sum_{i=1}^{n} \hat{\boldsymbol{f}}_i$, which is also classically referred to as the cross-sectional mean in curve registration. At the level of measures, it corresponds to computing the arithmetical mean measure $\bar{\boldsymbol{\nu}}_n = \frac{1}{n} \sum_{i=1}^{n} \hat{\boldsymbol{\nu}}_i$.

The Euclidean mean $\bar{\boldsymbol{f}}_n$ is the Fréchet mean of the $\hat{\boldsymbol{f}}_i$'s with respect to the usual squared distance in the Hilbert space $L^2(\Omega)$ of square integrable functions on $\Omega$. Therefore, it only accounts for linear variations in amplitude in the data. However, as remarked in [ZM11], in many applications it is often of interest to also incorporate an analysis of phase variability (i.e. time warping) in such functional objects, since it may lead to a better understanding of the structure of the data. In such settings, the use of the standard squared distance in $L^2(\Omega)$ to compare density functions ignores a possible significant source of phase variability in the data.

To better account for phase variability in the data, it has been proposed in [ZM11] to introduce the so-called method of quantile synchronization as an alternative to the cross sectional

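mean $\bar{\boldsymbol{f}}_n$. It amounts to computing the mean measure $\boldsymbol{\nu}_n^{\oplus}$ (and, if it exists, its density $\boldsymbol{f}_n^{\oplus}$) whose quantile function is

\[ \bar{\boldsymbol{F}}_n^{-1} = \frac{1}{n} \sum_{i=1}^{n} \boldsymbol{F}_i^{-1}, \tag{2.1} \]

where $\boldsymbol{F}_i^{-1}$ denotes the quantile function of the measure $\boldsymbol{\nu}_i$ with density $\boldsymbol{f}_i$. The statistical analysis of quantile synchronization, as studied in [ZM11], complements the quantile normalization method originally proposed in [BIAS03] to align density curves in microarray data analysis. This method is therefore appropriate for the registration of density functions and the estimation of phase and amplitude variations, as explained in detail in [PZ16].

Let us now assume that $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$ are random elements taking values in the set of absolutely continuous measures contained in $W_2(\Omega)$. In this setting, it can be checked (see e.g. Proposition 2.1 below) that quantile synchronization corresponds to computing the empirical Wasserstein barycenter of the random measures $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$, namely

\[ \boldsymbol{\nu}_n^{\oplus} = \arg\min_{\mu \in W_2(\Omega)} \frac{1}{n} \sum_{i=1}^{n} d_W^2(\boldsymbol{\nu}_i, \mu). \]

Therefore, the notion of averaging by quantile synchronization corresponds to using the Wasserstein distance $d_W$ to compare probability measures, which leads to a notion of measure averaging that may better reflect the structure of the data than the arithmetical mean in the presence of phase and amplitude variability.

To illustrate the differences between using Euclidean and Wasserstein distances to account for phase and amplitude variation, let us assume that the measures $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$ have densities $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ obtained from the following location-scale model: we let $f_0$ be a density on $\mathbb{R}$ having a finite second moment and, for $(\boldsymbol{a}_i, \boldsymbol{b}_i) \in (0, \infty) \times \mathbb{R}$, $i = 1, \ldots, n$, a given set of iid random vectors, we define

\[ \boldsymbol{f}_i(x) := \boldsymbol{a}_i^{-1} f_0\left( \boldsymbol{a}_i^{-1} (x - \boldsymbol{b}_i) \right), \quad x \in \mathbb{R}, \quad 1 \le i \le n. \tag{2.2} \]

The sources of variability of the densities from model (2.2) are the variation in location along the $x$-axis, and the scaling variation. In Figure 1(a), we plot a sample of $n = 100$ densities from model (2.2) with $f_0$ being the standard Gaussian density, $\boldsymbol{a}_i \sim U([0.8, 1.2])$ and $\boldsymbol{b}_i \sim U([-2, 2])$, where $U([x, y])$ denotes the uniform distribution on the interval $[x, y]$. In this numerical experiment, there is more variability in phase (i.e. location) than in amplitude (i.e. scaling), which can also be observed at the level of quantile functions as shown by Figure 1(b).

In the location-scale model (2.2), it can be checked, e.g. using the quantile averaging formula (2.1), that the empirical Wasserstein barycenter $\boldsymbol{\nu}_n^{\oplus}$ is the probability measure with density

\[ \boldsymbol{f}_n^{\oplus}(x) = \bar{\boldsymbol{a}}_n^{-1} f_0\left( \bar{\boldsymbol{a}}_n^{-1} (x - \bar{\boldsymbol{b}}_n) \right), \]

where $\bar{\boldsymbol{a}}_n = \frac{1}{n} \sum_{i=1}^{n} \boldsymbol{a}_i$ and $\bar{\boldsymbol{b}}_n = \frac{1}{n} \sum_{i=1}^{n} \boldsymbol{b}_i$. Hence, if we assume that $\mathbb{E}(\boldsymbol{a}_1) = 1$ and $\mathbb{E}(\boldsymbol{b}_1) = 0$, it follows that $d_W^2(\boldsymbol{\nu}_n^{\oplus}, \nu_0)$ converges almost surely to 0 as $n \to \infty$, meaning that $\boldsymbol{\nu}_n^{\oplus}$ is a consistent estimator of $\nu_0$, as shown by Figure 1(f). On the contrary, the arithmetical mean measure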

[Figure 1: An example of $n = 100$ random densities (a) with quantile functions (b) sampled from the location-scale model (2.2) with $f_0$ the standard Gaussian density, $\boldsymbol{a}_i \sim U([0.8, 1.2])$ and $\boldsymbol{b}_i \sim U([-2, 2])$. (c, d) The solid black curves are the Euclidean mean $\bar{\boldsymbol{f}}_n$ and its quantile function. (e, f) The solid red curves are the structural mean $\boldsymbol{f}_n^{\oplus}$ given by quantile synchronization and the quantile function of the empirical Wasserstein barycenter $\boldsymbol{\nu}_n^{\oplus}$. In all the figures, the dashed blue curves are either the density $f_0$ or its quantile function in the location-scale model (2.2). Panels: (a) densities $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ sampled from a location-scale model; (b) quantile functions $\boldsymbol{F}_1^{-1}, \ldots, \boldsymbol{F}_n^{-1}$ of $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$; (c) Euclidean mean density $\bar{\boldsymbol{f}}_n$; (d) quantile function of the arithmetical mean measure $\bar{\boldsymbol{\nu}}_n$ with density $\bar{\boldsymbol{f}}_n$; (e) density $\boldsymbol{f}_n^{\oplus}$ by quantile synchronization; (f) quantile function of the Wasserstein barycenter $\boldsymbol{\nu}_n^{\oplus}$ with density $\boldsymbol{f}_n^{\oplus}$.]

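$\bar{\boldsymbol{\nu}}_n$ is clearly not a consistent estimator of $\nu_0$, as can be observed in Figure 1(d).

Remark 2.1. It is clear that, in the above location-scale model, one may easily prove that $\boldsymbol{f}_n^{\oplus}$ converges almost surely to $f_0$ as $n \to \infty$ for various distances between density functions, as illustrated by Figure 1(e). However, in this paper, we restrict our attention to the problem of how the structural mean measure $\nu_0$ can be estimated from empirical Wasserstein barycenters with respect to the Wasserstein distance between probability measures. Showing that the density (if it exists) of such estimators converges to the density $f_0$ of $\nu_0$ is not considered in this work.

2.2 Barycenters in the Wasserstein space

Let $\Omega$ be an interval of $\mathbb{R}$, possibly unbounded. We let $W_2(\Omega)$ be the set of probability measures over $(\Omega, \mathcal{B}(\Omega))$ with finite second moment, where $\mathcal{B}(\Omega)$ is the $\sigma$-algebra of Borel subsets of $\Omega$. We also denote by $W_2^{ac}(\Omega)$ the set of measures $\nu \in W_2(\Omega)$ that are absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$. The cumulative distribution function (cdf), the quantile function and the density function (if $\nu \in W_2^{ac}(\Omega)$) of $\nu$ are denoted respectively by $F_\nu$, $F_\nu^{-1}$ and $f_\nu$.

Definition 2.1. The quadratic Wasserstein distance $d_W$ in $W_2(\Omega)$ is defined by

\[ d_W^2(\mu, \nu) := \int_0^1 \left( F_\mu^{-1}(\alpha) - F_\nu^{-1}(\alpha) \right)^2 d\alpha, \quad \text{for any } \mu, \nu \in W_2(\Omega). \tag{2.3} \]

It can be shown that $W_2(\Omega)$ endowed with $d_W$ is a metric space, usually called the Wasserstein space. For a detailed analysis of $W_2(\Omega)$ and its connection with optimal transport theory, we refer to [Vil03].

A $W_2(\Omega)$-valued random probability measure $\boldsymbol{\nu}$ is a measurable function from an abstract probability space to $(W_2(\Omega), \mathcal{B}(W_2(\Omega)))$, where $\mathcal{B}(W_2(\Omega))$ is the Borel $\sigma$-algebra of $W_2(\Omega)$. We denote by $\mathbb{P}$ the probability on $W_2(\Omega)$ induced by $\boldsymbol{\nu}$.

Definition 2.2 (Square-integrability). The random measure $\boldsymbol{\nu}$ is said to be square-integrable if

\[ \mathbb{E}\, d_W^2(\mu, \boldsymbol{\nu}) = \int_{W_2(\Omega)} d_W^2(\mu, \nu)\, d\mathbb{P}(\nu) < +\infty \]

for some (thus for every) $\mu \in W_2(\Omega)$.

Observe that the expectation in the definition above is well defined since $\nu \mapsto d_W(\mu, \nu)$ is continuous and, therefore, measurable. In the rest of the paper we assume that random measures $\boldsymbol{\nu}$ are square-integrable, in the sense of the previous definition, and so this property is not explicitly stated in definitions or results involving $\boldsymbol{\nu}$. We use the notations $\boldsymbol{F}$, $\boldsymbol{F}^{-1}$ and $\boldsymbol{f}$ to denote respectively the cumulative distribution function, the quantile function and the density function (if it exists) of $\boldsymbol{\nu}$.

Definition 2.3 (Population and empirical Wasserstein barycenters). A population Wasserstein barycenter of $\boldsymbol{\nu}$ is defined as a minimizer of

\[ \mu \mapsto \int_{W_2(\Omega)} d_W^2(\mu, \nu)\, d\mathbb{P}(\nu) \quad \text{over } \mu \in W_2(\Omega). \]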

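An empirical Wasserstein barycenter of $\nu_1, \ldots, \nu_n \in W_2(\Omega)$ is defined as a minimizer of

\[ \mu \mapsto \frac{1}{n} \sum_{i=1}^{n} d_W^2(\mu, \nu_i) \quad \text{over } \mu \in W_2(\Omega). \]

Proposition 2.1.
(i) There exists a unique barycenter of $\boldsymbol{\nu}$, denoted $\nu_0$.
(ii) $F_{\nu_0}^{-1} = \mathbb{E}\left[ \boldsymbol{F}^{-1} \right]$.
(iii) $\mathrm{Var}(\boldsymbol{\nu}) := \mathbb{E}\, d_W^2(\boldsymbol{\nu}, \nu_0) = \int_0^1 \mathrm{Var}\left( \boldsymbol{F}^{-1}(\alpha) \right) d\alpha < \infty$.

Proof. Observe that $\boldsymbol{F}^{-1}$ is measurable, considered as a random element of $L^2(0,1)$ (the space of real valued functions on $(0,1)$, square-integrable with respect to Lebesgue's measure), since the map $\nu \in W_2(\Omega) \mapsto F_\nu^{-1} \in L^2(0,1)$ is continuous, because of (2.3). Assertions (i) and (ii) are contained in Proposition 4.1 in [BGKL17]. Let us prove (iii). From (2.3) and Fubini's theorem, we have

\[ \mathrm{Var}(\boldsymbol{\nu}) = \mathbb{E} \int_0^1 \left( \boldsymbol{F}^{-1}(\alpha) - F_{\nu_0}^{-1}(\alpha) \right)^2 d\alpha = \int_0^1 \mathbb{E} \left( \boldsymbol{F}^{-1}(\alpha) - F_{\nu_0}^{-1}(\alpha) \right)^2 d\alpha = \int_0^1 \mathrm{Var}\left( \boldsymbol{F}^{-1}(\alpha) \right) d\alpha. \]

Finiteness of $\mathrm{Var}(\boldsymbol{\nu})$ is immediate from the square-integrability of $\boldsymbol{\nu}$.

2.3 A deformable model of probability measures

Let $\boldsymbol{\nu}_1, \ldots, \boldsymbol{\nu}_n$ be independent copies of the random measure $\boldsymbol{\nu}$, with barycenter $\nu_0$. We consider a deformable model of random probability measures satisfying the following assumptions:

Assumption 2.1. $\boldsymbol{\nu} \in W_2^{ac}(\Omega)$ a.s.

Assumption 2.2. $\nu_0 \in W_2^{ac}(\Omega)$.

Assumption 2.3. Conditionally on $\boldsymbol{\nu}_i$, the observations $X_{i,1}, \ldots, X_{i,p_i}$ are iid random variables sampled from $\boldsymbol{\nu}_i$, where $p_i \ge 1$ is a known integer, for $1 \le i \le n$.

Remark 2.2. Similar assumptions are considered in [PZ16] to characterize a population barycenter in $W_2(\Omega)$ for the purpose of estimating phase and amplitude variations from the observations of multiple point processes. For examples of parametric models satisfying Assumptions 2.1-2.3, we refer to [BK17] and [BLGL15]. The main restriction of this deformable model is that $\nu_0$ is assumed to be absolutely continuous.

Remark 2.3. Observe that Assumption 2.1 requires $W_2^{ac}(\Omega)$ to be a measurable subset of $W_2(\Omega)$. The proof of this technical result will appear in a forthcoming paper.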
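The quantile representation (2.3) and the averaging identity of Proposition 2.1 are easy to try numerically. The following is a minimal sketch, not taken from the paper: it discretizes $(0,1)$ on a midpoint grid, draws Gaussian location-scale measures as in model (2.2), and forms their empirical barycenter by averaging quantile functions. The helper name `w2_squared`, the grid size `m` and the random seed are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def w2_squared(q_mu, q_nu):
    """Approximate d_W^2 in (2.3) from quantile values on a common midpoint grid."""
    return float(np.mean((q_mu - q_nu) ** 2))

# Midpoint grid on (0, 1); m is an arbitrary discretization level (assumption).
m = 20000
alphas = (np.arange(m) + 0.5) / m

# Quantile function of f_0 = standard Gaussian, evaluated on the grid.
std = NormalDist()
z = np.array([std.inv_cdf(t) for t in alphas])

# Location-scale model (2.2): a_i ~ U([0.8, 1.2]), b_i ~ U([-2, 2]).
rng = np.random.default_rng(0)
n = 100
a = rng.uniform(0.8, 1.2, size=n)
b = rng.uniform(-2.0, 2.0, size=n)

# Row i holds the quantile function of nu_i, that is b_i + a_i * F_0^{-1}.
Q = b[:, None] + a[:, None] * z[None, :]

# Empirical Wasserstein barycenter: average the quantile functions.
Q_bar = Q.mean(axis=0)

# It should coincide with the Gaussian of mean b_bar and scale a_bar.
gap = w2_squared(Q_bar, b.mean() + a.mean() * z)

# Squared distance to nu_0 = N(0, 1), compared with b_bar^2 + (a_bar - 1)^2.
risk = w2_squared(Q_bar, z)
closed_form = b.mean() ** 2 + (a.mean() - 1.0) ** 2
```

Under these assumptions `gap` is zero up to rounding, and `risk` agrees with `closed_form` up to the discretization error of the grid in the Gaussian tails.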

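2.4 Non-smoothed empirical barycenter

To estimate the structural mean measure $\nu_0$ from the data $(X_{i,j})_{1 \le i \le n;\, 1 \le j \le p_i}$, a first approach consists in computing straightaway the barycenter of the empirical measures $\tilde{\boldsymbol{\nu}}_1, \ldots, \tilde{\boldsymbol{\nu}}_n$, where $\tilde{\boldsymbol{\nu}}_i = \frac{1}{p_i} \sum_{j=1}^{p_i} \delta_{X_{i,j}}$ ($\delta_x$ denotes the Dirac mass at point $x \in \Omega$). The non-smoothed empirical barycenter is thus defined as

\[ \hat{\boldsymbol{\nu}}_{n,\mathbf{p}} = \arg\min_{\mu \in W_2(\Omega)} \frac{1}{n} \sum_{i=1}^{n} d_W^2(\tilde{\boldsymbol{\nu}}_i, \mu), \tag{2.4} \]

where $\mathbf{p} = (p_1, \ldots, p_n)$. In the case $p_1 = p_2 = \ldots = p_n = p$, we have the following procedure for computing the non-smoothed empirical barycenter. For each $1 \le i \le n$, we denote by $X_{i,1}^* \le X_{i,2}^* \le \ldots \le X_{i,p}^*$ the order statistics corresponding to the $i$-th sample of observations $(X_{i,j})_{1 \le j \le p}$, and define

\[ \bar{X}_j^* = \frac{1}{n} \sum_{i=1}^{n} X_{i,j}^*, \quad \text{for all } 1 \le j \le p. \]

Thanks to Proposition 2.1, the quantile function of the empirical Wasserstein barycenter is the average of the quantile functions of $\tilde{\boldsymbol{\nu}}_1, \ldots, \tilde{\boldsymbol{\nu}}_n$, and thus we obtain the formula

\[ \hat{\boldsymbol{\nu}}_{n,p} = \frac{1}{p} \sum_{j=1}^{p} \delta_{\bar{X}_j^*}. \tag{2.5} \]

Note that we use $\hat{\boldsymbol{\nu}}_{n,p}$ instead of $\hat{\boldsymbol{\nu}}_{n,\mathbf{p}}$ to denote the non-smoothed empirical barycenter in the case $p_1 = p_2 = \ldots = p_n = p$.

2.5 Smoothed empirical barycenter

An alternative approach is to use a smoothing step to obtain estimated densities and then compute the barycenter.

1. In a first step we use kernel smoothing to obtain estimators $\hat{\boldsymbol{f}}_1^{h_1}, \ldots, \hat{\boldsymbol{f}}_n^{h_n}$ of $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$, where $h_1, \ldots, h_n$ are positive bandwidth parameters that may be different for each subject or experimental unit. In this paper we investigate a non-standard choice for the kernel function, proposed in [PZ16], to analyze the convergence of the smoothed empirical barycenter in $W_2(\Omega)$. In Section 3 we give a precise definition of the resulting estimators. However, at this point it is not necessary to go into such details.

2. In a second step, an estimator of $\nu_0$ is given by $\hat{\boldsymbol{\nu}}_{n,\mathbf{p}}^{h}$, defined as the measure whose quantile function is given by

\[ \hat{\boldsymbol{F}}_h^{-1}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} \hat{\boldsymbol{F}}_i^{-1}(\alpha), \quad \alpha \in [0, 1], \tag{2.6} \]

where $\hat{\boldsymbol{F}}_i^{-1}$ denotes the quantile function of the density $\hat{\boldsymbol{f}}_i^{h_i}$. If we denote by $\hat{\boldsymbol{\nu}}_i^{h_i}$ the measure with density $\hat{\boldsymbol{f}}_i^{h_i}$ then, by Proposition 2.1, one has that $\hat{\boldsymbol{\nu}}_{n,\mathbf{p}}^{h}$ is also defined as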

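the following smoothed empirical Wasserstein barycenter

\[ \hat{\boldsymbol{\nu}}_{n,\mathbf{p}}^{h} = \arg\min_{\mu \in W_2(\Omega)} \frac{1}{n} \sum_{i=1}^{n} d_W^2(\hat{\boldsymbol{\nu}}_i^{h_i}, \mu). \tag{2.7} \]

3 Convergence rate for estimators of the population Wasserstein barycenter

In this section we discuss the rates of convergence of the estimators $\hat{\boldsymbol{\nu}}_{n,\mathbf{p}}$ and $\hat{\boldsymbol{\nu}}_{n,\mathbf{p}}^{h}$, which are respectively characterized by equations (2.4) and (2.7). Some of the results presented below use the work in [BL17], a detailed study of the variety of rates of convergence of an empirical measure on the real line toward its population counterpart in the Wasserstein metric. Then, we discuss the optimality of these estimators from the minimax point of view, following the guidelines in nonparametric statistics to derive optimal rates of convergence (see e.g. [Tsy09] for an introduction to this topic).

3.1 Non-smoothed empirical barycenter in the case of samples of equal size

Let us first characterize the rate of convergence of $\hat{\boldsymbol{\nu}}_{n,p}$ in the specific case where samples of observations per unit are of equal size, namely when $p_1 = p_2 = \ldots = p_n = p$. In what follows, we let $Y_1, \ldots, Y_p$ be iid random variables sampled from the population mean measure $\nu_0$ independently of the data $(X_{i,j})_{1 \le i \le n;\, 1 \le j \le p}$, and we denote by $\mu_p = \frac{1}{p} \sum_{k=1}^{p} \delta_{Y_k}$ the corresponding empirical measure.

Theorem 3.1. If Assumptions 2.1, 2.2 and 2.3 are satisfied and if $p_1 = p_2 = \ldots = p_n = p$, then the estimator $\hat{\boldsymbol{\nu}}_{n,p}$ satisfies

\[
\begin{aligned}
\mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0)
&= \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \frac{1}{np} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*) + \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} \left( \mathbb{E}\, Y_j^* - F_0^{-1}(\alpha) \right)^2 d\alpha \\
&= \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \frac{1-n}{np} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*) + \mathbb{E}\, d_W^2(\mu_p, \nu_0),
\end{aligned}
\tag{3.1}
\]

where $Y_1^* \le Y_2^* \le \ldots \le Y_p^*$ denote the order statistics of the sample $Y_1, \ldots, Y_p$, and $F_0^{-1}$ is the quantile function of $\nu_0$.

Theorem 3.1 provides exact formulas to compute the rate of convergence for the expected squared Wasserstein distance of $\hat{\boldsymbol{\nu}}_{n,p}$. Formula (3.1) relies on the computation of the variances of the order statistics of iid variables $Y_1, \ldots, Y_p$ sampled from the population mean measure $\nu_0$, and on the computation of the rate of convergence of $\mathbb{E}\, d_W^2(\mu_p, \nu_0)$. However, deriving a sharp rate of convergence for $\hat{\boldsymbol{\nu}}_{n,p}$ using equality (3.1) requires computing the variances of the order statistics of iid random variables. To the best of our knowledge, obtaining a sharp estimate for $\sum_{j=1}^{p} \mathrm{Var}(Y_j^*)$ for any $p \ge 1$ remains a difficult task except for specific distributions.
For example, if $\nu_0$ is assumed to be a log-concave measure, then it is possible to use the results in Section 6 in [BL17] which provide sharp bounds on the variances of order statistics for such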

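probability measures. We discuss below some examples where equality (3.1) may be used to derive a sharp rate of convergence for $\hat{\boldsymbol{\nu}}_{n,p}$.

The case where $\nu_0$ is the uniform distribution on $[0, 1]$. In this setting, it is known (see e.g. Section 4.2 in [BL17]) that

\[ \mathrm{Var}(Y_j^*) = \frac{j (p - j + 1)}{(p+1)^2 (p+2)} \quad \text{and thus} \quad \frac{1}{p} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*) = \frac{1}{6(p+1)}. \]

Moreover, from Theorem 4.7 in [BL17], it follows that $\mathbb{E}\, d_W^2(\mu_p, \nu_0) = \frac{1}{6p}$. Therefore, thanks to (3.1) we obtain

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) = \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \frac{1-n}{6n(p+1)} + \frac{1}{6p} = \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \frac{1}{6n(p+1)} + \frac{1}{6p(p+1)}. \tag{3.2} \]

Equality (3.2) thus shows that, when $\nu_0$ is the uniform distribution on $[0, 1]$, the rate of convergence of $\hat{\boldsymbol{\nu}}_{n,p}$ is given by

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \asymp \frac{1}{n} + \frac{1}{np} + \frac{1}{p^2}, \tag{3.3} \]

and this rate is sharp.

Beyond the specific case where $\nu_0$ is a uniform distribution, it is in general difficult to compute $\mathbb{E}\, d_W^2(\mu_p, \nu_0)$. Nevertheless, thanks to Theorem 4.3 in [BL17], we have the following bounds

\[ \frac{1}{2p} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*) \le \mathbb{E}\, d_W^2(\mu_p, \nu_0) \le \frac{2}{p} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*), \tag{3.4} \]

for any distribution $\nu_0 \in W_2(\Omega)$.

The case where $\nu_0$ is the one-sided exponential distribution. Combining inequality (3.4) with (3.1), it follows that

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \le \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \frac{n+1}{np} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*). \tag{3.5} \]

Now, using e.g. Remark 6.3 in [BL17], one has that if $\nu_0$ is the one-sided exponential distribution (with density $e^{-x}$, for $x \ge 0$) then

\[ \frac{1}{p} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*) \sim \frac{\log p}{p}, \quad \text{as } p \to \infty. \]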

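Therefore, there exists a constant $c > 0$ such that

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \le \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + c\, \frac{n+1}{n}\, \frac{\log p}{p}, \tag{3.6} \]

for all sufficiently large $p$. Hence, when $\nu_0$ is the exponential distribution, the above inequalities show that the rate of convergence of $\hat{\boldsymbol{\nu}}_{n,p}$ is $O\left( \frac{1}{n} + \frac{n+1}{n} \frac{\log p}{p} \right)$.

The case where $\nu_0$ is a Gaussian distribution. By Theorem 4.3 and Corollary 6.4 in [BL17] there exist constants $c_1, c_2 > 0$ such that

\[ c_1 \frac{\log \log p}{p} \le \frac{1}{p} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*) \le c_2 \frac{\log \log p}{p}. \tag{3.7} \]

Therefore, combining the above upper bound with (3.5), one finally has that

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \le \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + c_2\, \frac{n+1}{n}\, \frac{\log \log p}{p}, \tag{3.8} \]

when $\nu_0$ is the standard Gaussian. In this setting, the rate of convergence is thus $O\left( \frac{1}{n} + \frac{n+1}{n} \frac{\log \log p}{p} \right)$.

Upper bounds in more general cases. If one is interested in deriving an upper bound on $\mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0)$ for a larger class of measures $\nu_0 \in W_2(\Omega)$ (e.g. beyond the log-concave case), another approach is as follows. Noting that the term $\frac{1-n}{np} \sum_{j=1}^{p} \mathrm{Var}(Y_j^*)$ in equality (3.1) is nonpositive, a straightforward consequence of Theorem 3.1 is the following upper bound

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \le \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \mathbb{E}\, d_W^2(\mu_p, \nu_0). \tag{3.9} \]

Then, thanks to inequality (3.9), to derive the rate of convergence of $\hat{\boldsymbol{\nu}}_{n,p}$, it remains to control the rate of convergence of the empirical measure $\mu_p$ to $\nu_0$ for the expected squared Wasserstein distance. This issue is discussed in detail in [BL17]. In particular, the work in [BL17] describes a variety of rates for the expected distance $\mathbb{E}\, d_W^2(\mu_p, \nu_0)$, from the standard one $O\left( \frac{1}{p} \right)$ to slower rates. For example, by Theorem 5.1 in [BL17], the following upper bound holds

\[ \mathbb{E}\, d_W^2(\mu_p, \nu_0) \le \frac{2}{p+1} J_2(\nu_0), \tag{3.10} \]

where the so-called $J_2$-functional is defined as $J_2 : W_2^{ac}(\Omega) \to \mathbb{R}_+ \cup \{+\infty\}$, with

\[ J_2(\nu) = \int_\Omega \frac{F_\nu(x) \left( 1 - F_\nu(x) \right)}{f_\nu(x)}\, dx, \quad \nu \in W_2^{ac}(\Omega). \tag{3.11} \]

The $J_2$ functional is shown to be measurable in Proposition A.1 of the Appendix. Therefore, provided that $J_2(\nu_0)$ is finite, the empirical measure $\mu_p$ converges to $\nu_0$ at the rate $O\left( \frac{1}{p} \right)$. Hence, using inequality (3.10), we have: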

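Corollary 3.1. Suppose that Assumptions 2.1, 2.2 and 2.3 are satisfied. Then, the estimator $\hat{\boldsymbol{\nu}}_{n,p}$ satisfies

\[ \mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \le \frac{1}{n} \mathrm{Var}(\boldsymbol{\nu}) + \frac{2}{p+1} J_2(\nu_0). \tag{3.12} \]

By Corollary 3.1, if $J_2(\nu_0) < \infty$, then it follows that $\hat{\boldsymbol{\nu}}_{n,p}$ converges to $\nu_0$ at the rate $O\left( \frac{1}{n} + \frac{1}{p} \right)$. Hence, in the setting where $p \ge n$ and $J_2(\nu_0) < \infty$, $\hat{\boldsymbol{\nu}}_{n,p}$ converges at the classical parametric rate $O\left( \frac{1}{n} \right)$. The case $p \ge n$, usually referred to as the dense case in the literature on functional data analysis (see e.g. [LH10] and references therein), corresponds to the situation where the number $p$ of observations per unit/subject is larger than the sample size $n$ of functional objects. In the sparse case, when $p < n$, the non-smoothed Wasserstein barycenter converges at the rate $O\left( \frac{1}{p} \right)$, if $J_2(\nu_0) < \infty$.

Remark 3.1. When $\nu_0$ is the uniform distribution on $[0, 1]$ one has that $J_2(\nu_0) < \infty$, but we have shown that $\mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0) \asymp \frac{1}{n} + \frac{1}{np} + \frac{1}{p^2}$. Hence, in this setting, $\hat{\boldsymbol{\nu}}_{n,p}$ converges at the parametric rate $O\left( \frac{1}{n} \right)$ provided that $p \ge \sqrt{n}$, which is a dense regime condition weaker than $p \ge n$.

To conclude this discussion on the rate of convergence of the non-smoothed Wasserstein barycenter in the case of samples of equal size, we study in more detail the control of the rate of convergence of the term $\mathbb{E}\, d_W^2(\mu_p, \nu_0)$ in inequality (3.9). As pointed out in many works (see for example [dBGU05], [BL17] and the references therein), the finiteness of $J_2(\nu_0)$ is the key point to control the convergence of the empirical measure $\mu_p$ to the population measure $\nu_0$ in the Wasserstein space. Some known facts concerning this issue are the following.

1. If $J_2(\nu_0) < \infty$ then $\nu_0$ is supported on an interval of $\mathbb{R}$ and its density is a.e. strictly positive on this interval.

2. If $\nu_0$ is compactly supported with a density bounded away from zero or with a log-concave density then $J_2(\nu_0) < \infty$.

3. If the density of $\nu_0$ is of the form $C_\alpha e^{-|x|^\alpha}$ then $J_2(\nu_0)$ is finite if and only if $\alpha > 2$. In particular, $J_2(\nu_0) = +\infty$ for the Gaussian distribution.

Some further comments can be made in the case where $\nu_0$ is Gaussian. In this setting, one has that $J_2(\nu_0) = +\infty$ and the rate of convergence of $\mathbb{E}\, d_W^2(\mu_p, \nu_0)$ to zero is slower than $O\left( \frac{1}{p} \right)$.
Indeed, from Corollary 6.4 in [BL17], if $\nu_0$ is the standard Gaussian, then the rate of convergence of $\mathbb{E}\, d_W^2(\mu_p, \nu_0)$ is $O\left( \frac{\log \log p}{p} \right)$, which leads to the upper bound (3.8) for the non-smoothed Wasserstein barycenter $\hat{\boldsymbol{\nu}}_{n,p}$ in the Gaussian case. Hence, thanks to (3.8), one has that if $p$ is sufficiently large with respect to $n$ (namely when $\frac{p}{\log \log p} \ge n$), then $\hat{\boldsymbol{\nu}}_{n,p}$ also converges at the classical parametric rate $O\left( \frac{1}{n} \right)$ when $\nu_0$ is the standard Gaussian distribution.

Remark 3.2. Following the work in [BL17], if $\nu_0$ has a log-concave distribution, then one may obtain rates of convergence for $\mathbb{E}\, d_W^2(\hat{\boldsymbol{\nu}}_{n,p}, \nu_0)$ that are slower than the standard rate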

e.g. for beta or exoetal dstrbutos. Moreover, t s also ossble to cosderer for ay q ad for ay absolutely cotuous robablty ν o Ω, the fuctoal J q ν = Ω F ν x F ν x q/2 f ν x q dx, order to cotrol the rate of covergece of the emrcal measure to ν for the q-wasserste dstace. 3.2 No-smoothed emrcal baryceter the geeral case Let us ow cosder the geeral stuato where the s are ossbly dfferet. The result below gves a uer boud o the rate of covergece of ˆν where =,...,., Theorem 3.2. Suose that Assumtos 2., 2.2 ad 2.3 are satsfed. The E d W ˆν, ν 0 /2 Varν +, where ν = δ X,j for each. E d 2 W ν, ν, For the radom measure ν, we cosder the exteded radom varable J 2 ν see Proosto A.. Sce the ν s are deedet coes of ν, by alyg equalty 3.0 t follows that E d 2 W ν, ν 2E J 2 ν /2. Hece, from Theorem 3.2, we fally obta the followg uer boud o the rate of covergece for the o-smoothed emrcal baryceter Corollary 3.2. Suose that Assumtos 2., 2.2 ad 2.3 are satsfed. If J 2 ν has a fte exectato, the E d W ˆν, ν 0 /2 Varν + 2E J 2 ν /2., From Corollary 3.2, oe has that f m dese case, the /2, ad thus, the o-smoothed emrcal baryceter coverges at the arametrc rate /2 rovded that E J 2 ν <, amely E d W ˆν,, ν 0 Varν + 2E J2 ν /2. 3.3 /2 Remark 3.3. Kowg f J 2 ν has a fte exectato s geeral a dffcult task. But, f we assume that the desty f of ν s bouded below by a o-radom ostve costat the obvously E J 2 ν <. 4
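As a rough numerical illustration of the functional $J_2$ and of Remark 3.3, the following Python sketch approximates $J_2(\nu) = \int_\Omega F_\nu(x)(1 - F_\nu(x))/f_\nu(x)\,dx$ by a trapezoidal rule on a compact interval. The helper names `j2_functional` and `empirical_rate_bound`, the grid size and the two example densities are assumptions made for illustration only; they do not come from the paper. When the density is bounded below by a constant $c > 0$ on an interval of length $|\Omega|$, the elementary bound $F(1-F) \le 1/4$ gives $J_2(\nu) \le |\Omega|/(4c)$, which the second example can be checked against.

```python
import numpy as np

def j2_functional(density, a, b, num=20001):
    """Approximate J_2(nu) = int_a^b F(x) (1 - F(x)) / f(x) dx on [a, b].

    `density` is a vectorised, strictly positive density on [a, b]
    (hypothetical input); the cdf F is obtained by cumulative
    trapezoidal integration of the density.
    """
    x = np.linspace(a, b, num)
    f = density(x)
    dx = np.diff(x)
    # Cumulative trapezoid rule for the cdf, with F(a) = 0.
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dx)))
    F = F / F[-1]  # renormalise so that F(b) = 1 despite quadrature error
    integrand = F * (1.0 - F) / f
    # Trapezoid rule for the J_2 integral itself.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dx))

def empirical_rate_bound(j2, p):
    """Right-hand side 2 J_2(nu_0) / (p + 1) of the bound on E d_W^2(mu_p, nu_0)."""
    return 2.0 * j2 / (p + 1)

# Uniform density on [0, 1]: F(x) = x, so J_2 = int_0^1 x (1 - x) dx = 1/6.
j2_uniform = j2_functional(lambda x: np.ones_like(x), 0.0, 1.0)

# A density bounded below by c = 0.5 on [0, 1]: f(x) = 0.5 + x.
j2_linear = j2_functional(lambda x: 0.5 + x, 0.0, 1.0)
```

For the uniform density the value can be compared with the exact integral $1/6$; for the second density the sketch only illustrates that $J_2$ stays below $|\Omega|/(4c) = 0.5$.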

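3.3 The case of smoothed empirical barycenters

In this section, we assume that $\Omega = [0,1]$, and we discuss the rate of convergence of smoothed empirical barycenters $\hat{\nu}^h_{n,p}$ (note that the following results hold if $\Omega$ is any compact interval). To choose an appropriate kernel function to study the convergence rate of the estimator $\hat{\nu}^h_{n,p}$, we follow the proposal made in [PZ16]. We let $\psi$ be a positive, smooth and symmetric density on the real line, such that $\int_{\mathbb{R}} x^2 \psi(x)\,dx = 1$. We also denote by $\Psi$ the cdf of the density $\psi$ and, for a bandwidth parameter $h > 0$, we let $\psi_h(x) = \frac{1}{h}\psi\left(\frac{x}{h}\right)$. Then, for any $y \in [0,1]$ and $h > 0$, we denote by $\mu^y_h$ the measure supported on $[0,1]$ whose density $f_{\mu^y_h}$ is defined as

$$f_{\mu^y_h}(x) = \psi_h(x-y) + 2 b_2\, \psi_h(x-y)\, 1_{\{x-y>0\}} + 2 b_1\, \psi_h(x-y)\, 1_{\{x-y<0\}} + 4 b_1 b_2, \quad x \in [0,1], \qquad (3.14)$$

where $b_1 = 1 - \Psi\left(\frac{1-y}{h}\right)$ and $b_2 = \Psi\left(-\frac{y}{h}\right)$. Then, for each $1 \le i \le n$, we construct a kernel density estimator of $f_i$ by defining $\hat{f}^h_i$ as the density associated to the measure

$$\hat{\nu}^h_i = \frac{1}{p_i}\sum_{j=1}^{p_i} \mu^{X_{i,j}}_{h_i}, \qquad (3.15)$$

where $h_i > 0$ is a bandwidth parameter depending on $i$. For a discussion on the intuition for this choice of kernel smoothing, we refer to [PZ16]. A key property to analyze the convergence rate of $\hat{\nu}^h_{n,p}$ is the following lemma, which relates the Wasserstein distance between $\hat{\nu}^h_i$ and the empirical measure $\tilde{\nu}_i = \frac{1}{p_i}\sum_{j=1}^{p_i} \delta_{X_{i,j}}$.

Remark 3.4. The results in [PZ16] strongly depend on the assumption that $\Omega$ is compact. To go beyond this assumption, one should be able to extend in an appropriate way the density $f_{\mu^y_h}(x)$ to a non-compact setting, which we believe to be a difficult task. Nevertheless, it should be remarked that our results on the non-smoothed empirical Wasserstein barycenter hold in the general case where $\Omega = \mathbb{R}$.

Lemma 3.1. Let $1 \le i \le n$. Suppose that $0 < h_i \le 1/4$, then one has the following upper bound

$$d_W^2(\hat{\nu}^h_i, \tilde{\nu}_i) \le 3 h_i^2 + 4\Psi\left(-1/\sqrt{h_i}\right). \qquad (3.16)$$

Furthermore, if there exist constants $C > 0$ and $\alpha \ge 5$ satisfying

$$\psi(x) \le C x^{-\alpha}, \quad \text{for all sufficiently large } x, \qquad (3.17)$$

then $d_W^2(\hat{\nu}^h_i, \tilde{\nu}_i) \le C_\psi h_i^2$, for $h_i$ small enough and some constant $C_\psi > 0$ depending only on $\psi$.

Proof. The upper bound (3.16) follows immediately from Lemma 1 in [PZ16] and the symmetry of $\psi$. Then, by applying inequality (3.17) and since $\psi$ is symmetric, it follows that for $h_i$ small enough

$$\Psi\left(-1/\sqrt{h_i}\right) = \int_{-\infty}^{-1/\sqrt{h_i}} \psi(x)\,dx = \int_{1/\sqrt{h_i}}^{+\infty} \psi(x)\,dx \le C \int_{1/\sqrt{h_i}}^{+\infty} x^{-\alpha}\,dx = \frac{C}{\alpha-1}\, h_i^{(\alpha-1)/2}.$$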

Hece, the secod art of Lemma 3. s a cosequece of the above equalty, the fact that α 5, ad the uer boud 3.6, whch comletes the roof. The result below gves a rate of covergece for the estmator ˆν h,. Theorem 3.3. Suose that Assumtos 2., 2.2 ad 2.3 are satsfed, ad that the desty ψ, used to defe kerel smoothg 3.5, satsfes equalty 3.7. If J 2 ν has a fte exectato, ad the badwdth arameters h are small eough, the we have E d W ˆν h, ν 0 /2 Varν + C /2 ψ, h + 2E J 2 ν /2. 3.8 Theorem 3.3 ca the be used to dscuss choces of badwdth arameters that may lead to a arametrc rate of covergece. For examle, f 0 < h /2 for all ad m dese case, the Theorem 3.3 mles that for suffcetly large to esure that max {h } s small eough E d W ˆν h /2,, ν 0 Varν + C ψ + 2E J 2 ν /2. 3.9 Remark 3.5. I the dese case amely m t ca be see, by comarg the uer bouds 3.3 ad 3.9, that a relmary smoothg ste of the data amely kerel smoothg the emrcal measures ν = δ X,j does ot mrove the arametrc rate of covergece /2. Moreover, the badwdth values have to be small to esure the rate of covergece /2 for E d W ˆν h,, ν 0. Ths result comes from the fact that we evaluate the rsk of emrcal baryceters at the level of measures W 2 Ω, ad that we do ot am to cotrol a estmato of the desty f 0 of the oulato mea measure ν 0. Remark 3.6. Theorem 3.3 shares smlartes wth the results from Theorem 2 PZ6, whch gves the rate of covergece for smoothed Wasserste baryceters, comuted from the realzatos of multle Posso rocesses a deformable model of measures smlar to that of ths aer. The ma dfferece PZ6 s that the umber of observatos for each exermetal ut are deedet Posso radom varables wth exectato E = τ for each they are ot determstc tegers. From such observatos ad uder smlar assumtos, t s roved PZ6 that the followg uer boud holds robablty h d W ˆν h, ν 0 O P, + O P + O P 4 τ. 3.20 The uer boud 3.20 s very smlar to 3.8, roosed ths aer. essetally slt the dstace d W ˆν h,, ν 0 to three terms: Both aroaches. 
a parametric term of the order $n^{-1/2}$ coming from the observation of a sample of size $n$ of the random measure $\nu$, 2. a term involving kernel smoothing, and

3. a term involving the Wasserstein distance between $\nu_0$ and its empirical counterpart $\mu_p = \frac{1}{p}\sum_{k=1}^{p} \delta_{Y_k}$, with $Y_1, \ldots, Y_p$ iid $\nu_0$-distributed random variables.

Under the conditions that $\tau_n \ge O(n^2)$ and $\max_i h_i \le O_P(n^{-1/2})$, it follows from Theorem 2 in [PZ16] that $\hat{\nu}^h_{n,p}$ converges at the parametric rate $O(n^{-1/2})$, for the Wasserstein distance. The quantity $\tau_n$ represents the averaged number of points observed for each Poisson process. As remarked in [PZ16], the condition $\tau_n \ge O(n^2)$ corresponds to a dense sampling regime where the number of observed Poisson processes should not grow too fast with respect to the expected number of points observed for each process.

Comparing the upper bounds (3.18) and (3.20), the main difference in the control of the risk of $\hat{\nu}^h_{n,p}$ between our approach and the one in [PZ16] is that we use the condition $E\, J_2(\nu) < +\infty$. Under such an assumption, the smoothed Wasserstein barycenter for the model considered in this paper may be shown to converge at the rate $O(n^{-1/2})$, for the expected Wasserstein distance, under the dense case setting $p := \min\{p_i,\ 1 \le i \le n\} \ge n$, which is somehow a weaker condition than $E\, p_i \ge n^2$ for all $i$, as in [PZ16]. Therefore, it might be argued that there is a sort of optimality gap in [PZ16], and that the results in this paper are a first step towards closing this gap. But, on the other hand, the result in [PZ16] is more general because nothing is assumed about the finiteness of the functional $J_2$. In particular, the upper bound (3.20) given in [PZ16] also holds when $J_2$ is infinite, which is not the case for the upper bound (3.18) in this paper.

3.4 A lower bound on the minimax risk

In the rest of this section, we show that, in the dense case and for the expected squared Wasserstein distance, the rate of convergence $O\left(\frac{1}{n}\right)$ for non-smoothed empirical Wasserstein barycenters is optimal from the minimax point of view over a large class of random measures $\nu$ satisfying the deformable model defined in Section 2.3 through Assumptions 2.1, 2.2 and 2.3.

Definition 3.1. For $\nu_0 \in W_2^{ac}(\Omega)$ and $\sigma > 0$, we define $\mathcal{D}(\Omega, \nu_0, \sigma^2)$ as the class of $W_2(\Omega)$-valued random measures $\nu$ that satisfy the deformable model defined in Section 2.3 with $\mathrm{Var}(\nu) < \sigma^2$.

Definition 3.2. Let $A > 0$. We denote by $\mathcal{F}(\mathbb{R}, A) \subset W_2^{ac}(\mathbb{R})$ a given set of measures with variance bounded by $A$, which contains at least all Gaussian distributions with variance bounded by $A$.
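Then, by inequality (3.9), we obtain the following corollary giving a uniform rate of convergence for the non-smoothed empirical barycenter in the case of samples of equal size.

Corollary 3.3. Let $A > 0$ and $\sigma > 0$. Suppose that $p_1 = p_2 = \ldots = p_n = p$. Then, if there exists a constant $c_0 > 0$ such that

$$\sup_{\nu_0 \in \mathcal{F}(\mathbb{R},A)} E\, d_W^2(\mu_p, \nu_0) \le \frac{c_0}{n}, \qquad (3.21)$$

it follows that

$$\sup_{\nu_0 \in \mathcal{F}(\mathbb{R},A)}\ \sup_{\nu \in \mathcal{D}(\mathbb{R},\nu_0,\sigma^2)} E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) \le \frac{\sigma^2 + c_0}{n}. \qquad (3.22)$$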

The codto 3.2 may be terreted as the geeralzato of the dese case settg that has bee dscussed the revous sectos as t s vald oly f s suffcetly large wth resect to. As a examle, let A 0 ad suose that the set FR, A ca be arttoed as FR, A = F 0 R, A GR, A, where F 0 R, A deotes a set of measures ν 0 W2 ac R wth varace bouded by A satsfyg A 0 := su J 2 ν 0 < +, ν 0 F 0 R,A whle GR, A deotes the set of Gaussa dstrbutos wth varace bouded by A. For ths examle, t follows from equaltes 3.4, 3.7 ad 3.0 Secto 3. wth samles of equal sze that su E d 2 W µ, ν 0 max ν 0 FR,A A 0 2 +, c 2A log log max2a 0, c 2 A log log, rovded that log log, where c 2 s a costat from equalty 3.7. Hece, f s such that log log the codto 3.2 s satsfed wth c 0 = max2a 0, c 2 A = max 2 su J 2 ν 0, c 2 A ν 0 F 0 R,A The followg theorem shows that the uer boud 3.22 Corollary 3.3 s otmal term of rate of covergece from the mmax ot of vew oarametrc statstcs. Theorem 3.4. Let A > 0 ad σ > 0. The the followg lower boud holds f ˆν su su ν 0 FR,A ν DR,ν 0,σ 2 E d W ˆν, ν 0 e 2 ma /2, σ /2, 3.23 4 where ˆν = ˆν X,j ; j deotes ay estmator takg values W 2 R, B W 2 R wth ˆν deotg a measurable fucto of the data X,j ; j samled from the deformable model defed Secto 2.3. Now, by usg equaltes 3.3 ad 3.9 ad Deftos 3. ad 3.2 troduced above, we also obta the followg corollary gvg uform rates of covergece for the o-smoothed Wasserste baryceter the geeral stuato where the s are ossbly dfferet. Corollary 3.4. Let A > 0 ad σ > 0. Suose that the assumtos of Corollary 3.2 are satsfed, ad that, for all. The, the followg uer boud holds su su E d W ˆν, ν 0 σ /2 + 2 su su E J2 ν., ν 0 FR,A ν DR,ν 0,σ 2 ν 0 FR,A ν DR,ν 0,σ 2. 8

Hece, uder the assumtos made Corollary 3.4, the estmator ˆν coverges at the, otmal rate of covergece /2 rovded that su su ν 0 FR,A ν DR,ν 0,σ 2 E J 2 ν < +. We coclude ths dscusso by a few remarks o the rate of covergece that may be obtaed the sarse case. Remark 3.7. I the case of samles of equal sze, the results above show that the rate of covergece s otmal the dese case for the rsk E d 2 W ˆν,, ν 0, amely whe the umber = =... = of observatos er uts s suffcetly large wth resect to. We beleve that dervg a lower boud o the mmax rsk deedg o the sarse case e.g. whe < s more volved. Ideed, from the dscusso Secto 3. o the rate of covergece of the o-smoothed emrcal baryceter, t aears that the exact decay of E d 2 W ˆν,, ν 0 as a fucto of s dffcult to establsh as t deeds o ν 0. Ideed, from Secto 3., oe has that - f ν 0 s the uform dstrbuto o 0,, the E d 2 W ˆν,, ν 0 + + 2, - f ν 0 s the oe-sded exoetal dstrbuto, the E d 2 W ˆν,, ν 0 = O + log + - f ν 0 s the stadard Gaussa dstrbuto, the E d 2 W ˆν,, ν 0 = O + log log + - f ν 0 s such that J 2 ν 0 < +, the E d 2 W ˆν,, ν 0 = O + From Theorem 3., oe has that the rsk of the o-smoothed emrcal baryceter may be bouded from below as follows su su E d 2 W ˆν,, ν 0 j/ su E Y j F 0 α 2 dα. 3.24 ν 0 FR,A ν DR,ν 0,σ 2 The quatty j/ j / E Y j ν 0 FR,A j / F 0 α 2 dα may be terreted as a bas term whe estmatg the ukow measure by the oarametrc estmator µ = δ Y j. Therefore, for samles of equal sze ad the sarse case whe <, the lower boud 3.24 may be used to cotrol as a fucto of the best rate of covergece for ˆν, that may be obtaed over the class of measures ν 0 FR, A. Remark 3.8. Fally, we remark that better rates of covergece may be obtaed f oe assumes a arametrc model for the radom measure ν. 
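Indeed, suppose that $\mu_0 \in W_2^{ac}(\Omega)$ denotes a known probability measure with expectation $m_0$ and variance $\sigma_0^2$, and consider that the data $(X_{i,j})_{1 \le i \le n;\ 1 \le j \le p_i}$ are sampled from iid random measures $\nu_1, \ldots, \nu_n$ satisfying the location model

$$F^{-1}_{\nu_i}(\alpha) = F^{-1}_{\mu_0}(\alpha) + a_i, \quad \alpha \in [0,1],\ 1 \le i \le n, \qquad (3.25)$$

where $a_1, \ldots, a_n$ are iid random variables with unknown expectation $\bar{a}$ and variance $\gamma^2$. In this model, the population Wasserstein barycenter is the measure $\nu_0$ with quantile function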

F ν 0 = F µ 0 + ā. Sce, the measure µ 0 s assumed to be kow, a atural estmator for ν 0 s to take the measure ˆν 0 wth quatle fucto F ˆν 0 = F µ 0 + â, wth â = X j m 0. The, t s clear that E d 2 W ˆν 0, ν 0 = 0 = σ2 0 + γ2 2 E F ˆν 0 α Fν 0 α dα = E â ā 2 + γ2. I the case where all the s are equal to, the the above equalty smlfes to E d 2 W ˆν 0, ν 0 = σ2 0 + γ2 + γ2, ad thus the arametrc estmator ˆν 0 coverges at the rate O +. Therefore, ether the dese or sarse case <, the arametrc estmator ˆν 0 coverges at the rate O the locato model 3.25 whe the referece measure µ0 s kow. Moreover, the sarse case < the arametrc estmator ˆν 0 coverges faster tha the o-smoothed emrcal Wasserste baryceter ˆν,, thaks to the results Secto 3.. 4 Numercal exermets I ths smulato study we erform Mote Carlo exermets to comare the decay of the squared Wasserte rsks E d 2 W ˆνh,, ν 0 ad E d 2 W ˆν,, ν 0, of the smoothed ad o-smoothed emrcal Wasserste baryceters ˆν h ad ˆν,, as a fucto of the umber of uts ad the samle sze. The theoretcal results, ths aer dcate that, the dese case, both estmators coverge at the otmal arametrc rate O. However, the sarse case t remas uclear f a relmary smoothg ste may mrove the qualty of estmato of the oulato Wasserste baryceter. The urose of these umercal exermets s thus to comare the behavor of smoothed ad o-smoothed emrcal Wasserste baryceters, these two settgs, ad aalyze the fluece of the umber of measures ad the samle sze. We aalyze the case of radom samles X,j ; j, wth 0 200 ad 0 200. Data are geerated from destes suorted o a comact terval Ω that are samled from the followg model, accoutg for vertcal ad horzotal varatos f x = a f a x b, x Ω,, 4. where f s ether the desty of the stadard Gaussa law trucated to the terval 3, 3 or the uform desty o the terval 0,, a U0.8,.2, b U 2, 2. Ths settg 20

corresponds to the simulation study conducted in [PM16]. For each choice of $f$ (either the Gaussian or Uniform case), the interval $\Omega$ is taken such that each random function $f_i$ has a compact support included in $\Omega$. Therefore, the population Wasserstein barycenter in model (4.1) is the measure with density $f$, thanks to the fact that $E\, b_i = 0$ and $E\, a_i = 1$. The Gaussian case (resp. Uniform case) corresponds to the estimation of a Wasserstein barycenter having a smooth (resp. non-differentiable) density $f$.

For given values of $n$ and $p$, we evaluate the Wasserstein risk of $\hat{\nu}^h_{n,p}$ by repeating $M = 100$ times the following experiment. First, data are simulated from model (4.1). Then, for each $1 \le i \le n$, we use kernel smoothing to compute the density $\hat{f}^h_i$ and its associated measure $\hat{\nu}^h_i$. We slightly deviate from the analysis carried out in Section 3, as we use a Gaussian kernel to smooth the data $(X_{i,j})_{1 \le j \le p}$, with bandwidth $h_i$ chosen by cross-validation, instead of the specific kernel defined in (3.14), which has been proposed for the convergence analysis of $\hat{\nu}^h_{n,p}$. We found that this modification has no substantial effect on the finite sample performance of the procedure, and a similar choice has been made in the numerical experiments in [PZ16]. In Figure 2(a) (resp. Figure 3(a)) we display an example of $n$ densities estimated from realizations of the model (4.1) with $n = p = 100$ and $f$ the truncated Gaussian density (resp. $f$ the Uniform density).

After computing the quantile function $F^{-1}_{\hat{\nu}^h_{n,p}}$ of the empirical smoothed Wasserstein barycenter $\hat{\nu}^h_{n,p}$, we approximate

$$d_W^2(\hat{\nu}^h_{n,p}, \nu_0) = \int_0^1 \left(F^{-1}_{\hat{\nu}^h_{n,p}}(\alpha) - F^{-1}_{\nu_0}(\alpha)\right)^2 d\alpha$$

by discretizing the integral over a fine grid of values for $\alpha \in [0,1]$. This approximated value of $d_W^2(\hat{\nu}^h_{n,p}, \nu_0)$ is then averaged over the $M = 100$ repeated experiments to approximate $E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)$. Thanks to the explicit expression (2.5) of the non-smoothed empirical Wasserstein barycenter $\hat{\nu}_{n,p}$, its quantile function $F^{-1}_{\hat{\nu}_{n,p}}$ is straightforward to compute on a grid of values for $\alpha$, and the Wasserstein risk $E\, d_W^2(\hat{\nu}_{n,p}, \nu_0)$ is then approximated in the same way by using Monte Carlo repetitions. For values of $n$ and $p$ ranging from 10 to 200, we display in Figures 2(c) and 2(d) (resp. Figures 3(c) and 3(d)) these approximations of $E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)$ and $E\, d_W^2(\hat{\nu}_{n,p}, \nu_0)$ in logarithmic scale for $f$ the truncated Gaussian density (resp. $f$ the Uniform density).
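The Monte Carlo procedure described above can be sketched in Python for the non-smoothed barycenter in the Uniform case with samples of equal size. This is only a minimal sketch under stated assumptions: the function names are hypothetical, the data are generated as $X_{i,j} = a_i U_{i,j} + b_i$ with $U_{i,j}$ uniform on $[0,1]$, and the range of $b_i$ is taken to be $(-2, 2)$, which is an assumption about the simulation design. Instead of a discretization grid, the integral defining $d_W^2$ is computed in closed form, which is possible here because the quantile function of the uniform law is $\alpha \mapsto \alpha$.

```python
import numpy as np

def sample_model(n, p, rng):
    """Draw p observations from each of n random densities (Uniform case):
    X_ij = a_i * U_ij + b_i, with a_i ~ U(0.8, 1.2) and b_i ~ U(-2, 2)
    (the range of b_i is an assumption of this sketch)."""
    a = rng.uniform(0.8, 1.2, size=(n, 1))
    b = rng.uniform(-2.0, 2.0, size=(n, 1))
    u = rng.uniform(0.0, 1.0, size=(n, p))
    return a * u + b

def barycenter_atoms(x):
    """Atoms of the non-smoothed barycenter for equal sample sizes:
    the average over units of the j-th order statistics."""
    return np.sort(x, axis=1).mean(axis=0)

def squared_wasserstein_to_uniform(atoms):
    """Exact d_W^2 between (1/p) sum_j delta_{atoms_j} and Uniform[0, 1].

    The quantile function of the discrete measure equals atoms_j on
    ((j-1)/p, j/p], and the uniform quantile function is alpha, so each
    piece int (c - alpha)^2 d alpha has a closed form.
    """
    atoms = np.sort(atoms)
    p = atoms.size
    lo = np.arange(p) / p
    hi = np.arange(1, p + 1) / p
    return float(np.sum(((atoms - lo) ** 3 - (atoms - hi) ** 3) / 3.0))

def monte_carlo_risk(n, p, repetitions=100, seed=0):
    """Average of d_W^2 over independent repetitions of the experiment."""
    rng = np.random.default_rng(seed)
    values = [
        squared_wasserstein_to_uniform(barycenter_atoms(sample_model(n, p, rng)))
        for _ in range(repetitions)
    ]
    return float(np.mean(values))
```

Calling, for instance, `monte_carlo_risk(50, 20)` and `monte_carlo_risk(5, 20)` gives a quick way to look at how the approximated risk changes with the number of units for a fixed sample size.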
For both estimators, it appears that the Wasserstein risk is clearly a decreasing function of the number $n$ of units. To the contrary, increasing $p$ does not lead to a significant decay of this risk. This suggests that $\frac{1}{n}\mathrm{Var}(\nu)$ is the most significant term in the upper bound (3.12) of the Wasserstein risk of $\hat{\nu}_{n,p}$. In Figure 2(b) and Figure 3(b), we also display the logarithm of the ratio $E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) / E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)$. It can be observed that:

- When the population Wasserstein barycenter has a smooth density $f$ (Gaussian case) then, for values of $p$ larger than 100, both estimators (smoothed and non-smoothed empirical Wasserstein barycenters) appear to have squared Wasserstein risks of approximately the same magnitude. This tends to confirm the results on convergence rates obtained in Section 3, in the dense case when $p$ is sufficiently large with respect to $n$, which show that a preliminary smoothing is not necessary in this setting. For smaller values of $p$ (between

10 and 50), the smoothed empirical Wasserstein barycenter has a smaller Wasserstein risk. This suggests that introducing a smoothing step through kernel smoothing of the data, in each experimental unit, may improve the quality of the estimation of $\nu_0$ when the sample size $p$ is small (which corresponds to the sparse case), in the setting where the population Wasserstein barycenter is a distribution with a smooth density. In this example, Gaussian kernel smoothing is particularly well suited, which may explain the better performances obtained with a smoothed empirical Wasserstein barycenter in the sparse case.

- When the population Wasserstein barycenter has a non-smooth density $f$ (Uniform case) then, except for very small values of $p \le 10$, the non-smoothed empirical Wasserstein barycenter always has a lower squared Wasserstein risk, both in the sparse and dense cases. A preliminary smoothing with a Gaussian kernel does not improve the estimation of a Wasserstein barycenter having a piecewise constant density and, in this setting, such a step is thus not necessary.

5 Conclusion and perspectives

In this paper we have studied the rate of convergence for the squared Wasserstein distance of (possibly smoothed) empirical barycenters in a deformable model of measures. The main contributions of this work can be summarized as follows. In the case of samples of equal size, we have derived a closed-form expression for the risk of the non-smoothed empirical barycenter, as a function of $n$ and $p$, which makes it possible to derive sharp rates of convergence whose decay depends on the population mean measure $\nu_0$. A second conclusion of the paper is that, in the dense case when the minimal number $\min_i p_i$ of observations per unit is sufficiently large with respect to the number $n$ of observed measures, the non-smoothed empirical barycenter converges at the parametric rate of convergence. Moreover, this rate is shown to be a lower bound on the decay of a novel notion of minimax risk, in the deformable model of measures introduced in this paper. In the dense case, the numerical experiments that have been carried out are in agreement with the theoretical results, which show that, in this setting, one may only consider the non-smoothed empirical Wasserstein barycenter and that a preliminary smoothing step is not necessary to obtain an optimal estimator.
A first perspective would be to find a lower bound on the minimax risk depending on $p$ in the sparse case. However, to this end, we believe that one has to first obtain sharper rates of convergence as a function of $p$ for the non-smoothed empirical barycenter. Finally, a natural perspective is to ask how these results can be extended to higher dimensional settings for measures supported on $\mathbb{R}^d$, with $d > 1$. However, we believe that this is far from being obvious, as the results in this paper rely heavily on the closed-form formula of Wasserstein barycenters in the one-dimensional setting through quantile averaging. Such results do not hold in higher dimension, for data sets consisting of iid random vectors sampled from unknown random measures supported on $\mathbb{R}^2$ or $\mathbb{R}^3$, for example.

[Figure 2, panels: (a) An example of estimated densities. (b) $\log\left(E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) / E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)\right)$. (c) Log-Wasserstein risk of $\hat{\nu}^h_{n,p}$. (d) Log-Wasserstein risk of $\hat{\nu}_{n,p}$.]

Figure 2: Gaussian case. (a) An example of $n = 100$ densities estimated from data sampled from model (4.1) with the choice of a standard Gaussian density $f$ truncated to the interval $[-3, 3]$ and $n = p = 100$; (b) logarithm of the ratio $E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) / E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)$; (c) Wasserstein risk of the smoothed empirical barycenter $\hat{\nu}^h_{n,p}$ with kernel bandwidths chosen by cross-validation; (d) Wasserstein risk of the non-smoothed empirical barycenter $\hat{\nu}_{n,p}$. The values of $n$ and $p$ vary from 10 to 200 by an increment of 10.

[Figure 3, panels: (a) An example of estimated densities. (b) $\log\left(E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) / E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)\right)$. (c) Log-Wasserstein risk of $\hat{\nu}^h_{n,p}$. (d) Log-Wasserstein risk of $\hat{\nu}_{n,p}$.]

Figure 3: Uniform case. (a) An example of $n = 100$ densities estimated from data sampled from model (4.1) with the choice of a uniform density $f$ on the interval $[0, 1]$ and $n = p = 100$; (b) logarithm of the ratio $E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) / E\, d_W^2(\hat{\nu}^h_{n,p}, \nu_0)$; (c) Wasserstein risk of the smoothed empirical barycenter $\hat{\nu}^h_{n,p}$ with kernel bandwidths chosen by cross-validation; (d) Wasserstein risk of the non-smoothed empirical barycenter $\hat{\nu}_{n,p}$. The values of $n$ and $p$ vary from 10 to 200 by an increment of 10.

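A Appendix

A.1 Auxiliary results

We recall that $Y_1, \ldots, Y_p$ denote iid random variables sampled from the measure $\nu_0$ independently of the data, and that the associated empirical measure is $\mu_p = \frac{1}{p}\sum_{j=1}^{p} \delta_{Y_j}$. By Corollary 4.5 in [BL17], it follows that

$$E\, d_W^2(\mu_p, \nu_0) = \frac{1}{p}\sum_{j=1}^{p} \mathrm{Var}(Y^*_j) + \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} \left(E\, Y^*_j - F_0^{-1}(\alpha)\right)^2 d\alpha, \qquad (A.1)$$

where $Y^*_1 \le Y^*_2 \le \ldots \le Y^*_p$ denote the order statistics of the sample $Y_1, \ldots, Y_p$. It is well known that the $j$-th order statistic $Y^*_j$ admits the density (see e.g. [BL17])

$$f_{Y^*_j}(y) = \frac{p!}{(j-1)!\,(p-j)!}\, f_0(y)\, F_0(y)^{j-1} \left(1 - F_0(y)\right)^{p-j}, \quad y \in \Omega. \qquad (A.2)$$

Moreover, under Assumption 2.2, one has, conditionally on $F_i$, that the $j$-th order statistic $X^*_{i,j}$ admits the density

$$f_{X^*_{i,j}}(x) = \frac{p!}{(j-1)!\,(p-j)!}\, f_i(x)\, F_i(x)^{j-1} \left(1 - F_i(x)\right)^{p-j}, \quad x \in \Omega. \qquad (A.3)$$

Let us recall the notation $\bar{X}^*_j = \frac{1}{n}\sum_{i=1}^{n} X^*_{i,j}$. Then the following result holds.

Lemma A.1. If Assumptions 2.1, 2.2 and 2.3 are satisfied, then, for each $1 \le j \le p$, one has $E\, \bar{X}^*_j = E\, Y^*_j$. Moreover,

$$\frac{1}{p}\sum_{j=1}^{p} \left(\mathrm{Var}(\bar{X}^*_j) - \mathrm{Var}(Y^*_j)\right) = \frac{1}{n}\mathrm{Var}(\nu) + \frac{1-n}{np}\sum_{j=1}^{p} \mathrm{Var}(Y^*_j).$$

Proof. Let $1 \le j \le p$ and $1 \le i \le n$. Thanks to the expression (A.3) for the density of $X^*_{i,j}$, one has that

$$E\left[X^*_{i,j} \mid F_i\right] = \int_\Omega x\, \frac{p!}{(j-1)!\,(p-j)!}\, f_i(x)\, F_i(x)^{j-1} \left(1 - F_i(x)\right)^{p-j} dx = \int_0^1 F_i^{-1}(\alpha)\, \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha,$$

where we used the change of variable $\alpha = F_i(x)$ to obtain the last equality. By Proposition 2.1,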

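$E\left[F_i^{-1}(\alpha)\right] = F_0^{-1}(\alpha)$ for each $1 \le i \le n$. Therefore, using Fubini's theorem, it follows that

$$\begin{aligned}
E\, X^*_{i,j} = E\left(E\left[X^*_{i,j} \mid F_i\right]\right) &= E \int_0^1 F_i^{-1}(\alpha)\, \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha \\
&= \int_0^1 E\left[F_i^{-1}(\alpha)\right] \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha \\
&= \int_0^1 F_0^{-1}(\alpha)\, \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha \\
&= \int_\Omega y\, \frac{p!}{(j-1)!\,(p-j)!}\, f_0(y)\, F_0(y)^{j-1} \left(1 - F_0(y)\right)^{p-j} dy = E\, Y^*_j, \qquad (A.4)
\end{aligned}$$

where we used the change of variable $y = F_0^{-1}(\alpha)$ and (A.2) to obtain the last equality above. Given that $E\, \bar{X}^*_j = \frac{1}{n}\sum_{i=1}^{n} E\, X^*_{i,j}$, the first statement of Lemma A.1 follows from equality (A.4).

Now, let us prove the second statement of Lemma A.1. Thanks to formula (A.3) for the density of $X^*_{i,j}$, one has that, for each $1 \le j \le p$ and $1 \le i \le n$,

$$\begin{aligned}
E\left(X^*_{i,j}\right)^2 = E\left(E\left[\left(X^*_{i,j}\right)^2 \mid F_i\right]\right) &= E \int_\Omega x^2\, \frac{p!}{(j-1)!\,(p-j)!}\, f_i(x)\, F_i(x)^{j-1} \left(1 - F_i(x)\right)^{p-j} dx \\
&= \int_0^1 E\left[\left(F_i^{-1}(\alpha)\right)^2\right] \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha, \qquad (A.5)
\end{aligned}$$

where, again, we use the change of variable $\alpha = F_i(x)$ and Fubini's theorem to obtain the last equality. Similarly, from (A.2) it follows that, for each $1 \le j \le p$,

$$E\left(Y^*_j\right)^2 = \int_\Omega y^2\, \frac{p!}{(j-1)!\,(p-j)!}\, f_0(y)\, F_0(y)^{j-1} \left(1 - F_0(y)\right)^{p-j} dy = \int_0^1 \left(F_0^{-1}(\alpha)\right)^2 \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha. \qquad (A.6)$$

Since $\bar{X}^*_j = \frac{1}{n}\sum_{i=1}^{n} X^*_{i,j}$, we obtain by independence that

$$\mathrm{Var}(\bar{X}^*_j) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X^*_{i,j}) = \frac{1}{n^2}\sum_{i=1}^{n} \left(E\left(X^*_{i,j}\right)^2 - \left(E\, X^*_{i,j}\right)^2\right).$$

Hence, using equalities (A.4), (A.5) and (A.6), and the fact that $E\left[\left(F_i^{-1}(\alpha)\right)^2\right] = E\left[\left(F^{-1}(\alpha)\right)^2\right]$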

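for each $1 \le i \le n$, we obtain

$$\begin{aligned}
\mathrm{Var}(\bar{X}^*_j) &= \frac{1}{n}\int_0^1 E\left[\left(F^{-1}(\alpha)\right)^2\right] \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha - \frac{1}{n}\left(E\, Y^*_j\right)^2 \\
&= \frac{1}{n}\int_0^1 E\left[\left(F^{-1}(\alpha)\right)^2\right] \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha + \frac{1}{n}\mathrm{Var}(Y^*_j) - \frac{1}{n} E\left(Y^*_j\right)^2 \\
&= \frac{1}{n} E \int_0^1 \left(\left(F^{-1}(\alpha)\right)^2 - \left(F_0^{-1}(\alpha)\right)^2\right) \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha + \frac{1}{n}\mathrm{Var}(Y^*_j) \\
&= \frac{1}{n}\int_0^1 \mathrm{Var}\left(F^{-1}(\alpha)\right) \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1} (1-\alpha)^{p-j}\, d\alpha + \frac{1}{n}\mathrm{Var}(Y^*_j),
\end{aligned}$$

where, for the last equalities, we used that $E\, F^{-1} = F_0^{-1}$ by Proposition 2.1. Therefore, summing the above equality over $j$ and using that $\sum_{j=1}^{p} \frac{p!}{(j-1)!\,(p-j)!}\, \alpha^{j-1}(1-\alpha)^{p-j} = p$ for every $\alpha \in [0,1]$, one finally obtains

$$\frac{1}{p}\sum_{j=1}^{p} \left(\mathrm{Var}(\bar{X}^*_j) - \mathrm{Var}(Y^*_j)\right) = \frac{1}{n}\mathrm{Var}(\nu) + \frac{1-n}{np}\sum_{j=1}^{p} \mathrm{Var}(Y^*_j),$$

which completes the proof of Lemma A.1.

A.2 Proof of Theorem 3.1

By Definition 2.1 of the Wasserstein distance, and since $\hat{\nu}_{n,p} = \frac{1}{p}\sum_{j=1}^{p} \delta_{\bar{X}^*_j}$, it follows by using Fubini's theorem that

$$\begin{aligned}
E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) &= E \int_0^1 \left(F^{-1}_{\hat{\nu}_{n,p}}(\alpha) - F_0^{-1}(\alpha)\right)^2 d\alpha \\
&= \sum_{j=1}^{p} E \int_{(j-1)/p}^{j/p} \left(\bar{X}^*_j - F_0^{-1}(\alpha)\right)^2 d\alpha \\
&= \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} E\left(\bar{X}^*_j - F_0^{-1}(\alpha)\right)^2 d\alpha \\
&= \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} \left[E\left(\bar{X}^*_j - E\, \bar{X}^*_j\right)^2 + \left(E\, \bar{X}^*_j - F_0^{-1}(\alpha)\right)^2\right] d\alpha \\
&= \frac{1}{p}\sum_{j=1}^{p} \mathrm{Var}(\bar{X}^*_j) + \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} \left(E\, \bar{X}^*_j - F_0^{-1}(\alpha)\right)^2 d\alpha. \qquad (A.7)
\end{aligned}$$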

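From Lemma A.1, one has that $E\, \bar{X}^*_j = E\, Y^*_j$. Therefore, by combining (A.7) with (A.1), we obtain

$$\begin{aligned}
E\, d_W^2(\hat{\nu}_{n,p}, \nu_0) &= \frac{1}{p}\sum_{j=1}^{p} \mathrm{Var}(\bar{X}^*_j) + \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} \left(E\, Y^*_j - F_0^{-1}(\alpha)\right)^2 d\alpha \\
&= \frac{1}{p}\sum_{j=1}^{p} \left(\mathrm{Var}(\bar{X}^*_j) - \mathrm{Var}(Y^*_j)\right) + E\, d_W^2(\mu_p, \nu_0) \\
&= \frac{1}{n}\mathrm{Var}(\nu) + \frac{1-n}{np}\sum_{j=1}^{p} \mathrm{Var}(Y^*_j) + E\, d_W^2(\mu_p, \nu_0) \qquad (A.8) \\
&= \frac{1}{n}\mathrm{Var}(\nu) + \frac{1}{np}\sum_{j=1}^{p} \mathrm{Var}(Y^*_j) + \sum_{j=1}^{p} \int_{(j-1)/p}^{j/p} \left(E\, Y^*_j - F_0^{-1}(\alpha)\right)^2 d\alpha,
\end{aligned}$$

where the last equalities also follow from Lemma A.1 and (A.1), which completes the proof of Theorem 3.1.

A.3 Proof of Theorem 3.2

We recall that $\bar{\nu}_n$ denotes the measure with quantile function given by equation (2.1). By the triangle inequality, we have that

$$d_W(\hat{\nu}_{n,p}, \nu_0) \le d_W(\hat{\nu}_{n,p}, \bar{\nu}_n) + d_W(\bar{\nu}_n, \nu_0). \qquad (A.9)$$

Thanks to Definition 2.1 of the Wasserstein distance, it follows by Fubini's theorem that

$$E\, d_W^2(\bar{\nu}_n, \nu_0) = \int_0^1 E\left(\frac{1}{n}\sum_{i=1}^{n} F_i^{-1}(\alpha) - F_0^{-1}(\alpha)\right)^2 d\alpha = \int_0^1 E\left(\frac{1}{n}\sum_{i=1}^{n} \left(F_i^{-1}(\alpha) - F_0^{-1}(\alpha)\right)\right)^2 d\alpha.$$

By Assumption 2.3, one has that $E\, F_i^{-1}(\alpha) = F_0^{-1}(\alpha)$ for any $\alpha \in [0,1]$, and thus, by independence of the random variables $F_i^{-1}(\alpha)$, one obtains

$$E\, d_W^2(\bar{\nu}_n, \nu_0) = \frac{1}{n}\mathrm{Var}(\nu). \qquad (A.10)$$

Hence, by (A.10) and the inequality $E\, d_W(\bar{\nu}_n, \nu_0) \le \sqrt{E\, d_W^2(\bar{\nu}_n, \nu_0)}$, one obtains

$$E\, d_W(\bar{\nu}_n, \nu_0) \le n^{-1/2} \sqrt{\mathrm{Var}(\nu)}. \qquad (A.11)$$

Now, let us remark that

$$d_W(\hat{\nu}_{n,p}, \bar{\nu}_n) = \left\| \frac{1}{n}\sum_{i=1}^{n} \left(F^{-1}_{\tilde{\nu}_i} - F_i^{-1}\right) \right\| \le \frac{1}{n}\sum_{i=1}^{n} \left\| F^{-1}_{\tilde{\nu}_i} - F_i^{-1} \right\|,$$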