On the estimation of the mean of a random vector



Emilie Joly
Université Paris Ouest Nanterre, France; emilie.joly@u-paris10.fr

Gábor Lugosi
ICREA and Department of Economics, Pompeu Fabra University, Barcelona, Spain; gabor.lugosi@upf.edu

Roberto Imbuzeiro Oliveira
IMPA, Rio de Janeiro, RJ, Brazil; rimfo@impa.br

July 8, 2016

Supported by the French Agence Nationale de la Recherche (ANR), under grant ANR-13-BS (project SPADRO). Supported by the Spanish Ministry of Economy and Competitiveness, Grant MTM P and FEDER, EU. Support from CNPq, Brazil via Ciência sem Fronteiras grant # / Supported by a Bolsa de Produtividade em Pesquisa from CNPq, Brazil. Supported by FAPESP Center for Neuromathematics (grant# 2013/ , FAPESP - S. Paulo Research Foundation).

Abstract

We study the problem of estimating the mean of a multivariate distribution based on independent samples. The main result is the proof of existence of an estimator with a non-asymptotic sub-Gaussian performance for all distributions satisfying some mild moment assumptions.

1 Introduction

Let $X$ be a random vector taking values in $\mathbb{R}^d$. We assume throughout the paper that the mean vector $\mu = \mathbb{E}X$ and covariance matrix $\Sigma = \mathbb{E}(X-\mu)(X-\mu)^T$ exist. Given $n$ independent, identically distributed samples $X_1,\ldots,X_n$ drawn from the distribution of $X$, one wishes to estimate the mean vector. A natural and popular choice is the sample mean $(1/n)\sum_{i=1}^n X_i$ that is known to have a near-optimal behavior whenever the distribution is sufficiently light tailed. However, whenever heavy tails are a concern, the sample mean is to be avoided as it may have a suboptimal performance. While the one-dimensional case (i.e., $d = 1$) is quite well understood (see [3], [5]), various aspects of the multidimensional problem are still to be revealed. This paper aims at contributing to the understanding of the multi-dimensional case.

Before stating the main results, we briefly survey properties of some mean estimators of real-valued random variables. Some of these techniques serve as basic building blocks for the estimators we propose for the vector-valued case.

1.1 Estimating the mean of a real-valued random variable

When $d = 1$, the simplest and most popular mean estimator is the sample mean $\overline{\mu}_n = (1/n)\sum_{i=1}^n X_i$. The sample mean is unbiased and the central limit theorem guarantees an asymptotically Gaussian distribution. However, unless the distribution of $X$ has a light (e.g., sub-Gaussian) tail, there are no non-asymptotic sub-Gaussian performance guarantees for $\overline{\mu}_n$. We refer the reader to Catoni [3] for details. However, perhaps surprisingly, there exist estimators of $\mu$ with much better concentration properties, see Catoni [3] and Devroye, Lerasle, Lugosi, and Oliveira [5].

A conceptually simple and quite powerful estimator is the so-called median-of-means estimator that has been proposed, in different forms, in various papers, see Nemirovsky and Yudin [14], Hsu [8], Jerrum, Valiant, and Vazirani [10], Alon, Matias, and Szegedy [1]. The median-of-means estimator is defined as follows. Given a positive integer $b$ and $x_1,\ldots,x_b \in \mathbb{R}$, let $q_{1/2}$ denote the median of these numbers, that is,

$$q_{1/2}(x_1,\ldots,x_b) = x_i, \quad \text{where } \#\{k \in [b] : x_k \le x_i\} \ge \frac{b}{2} \text{ and } \#\{k \in [b] : x_k \ge x_i\} \ge \frac{b}{2}.$$

(If several $i$ fit the above description, we take the smallest one.) For any fixed $\delta \in [e^{1-n/2}, 1)$, first choose $b = \lceil \ln(1/\delta) \rceil$ and note that $b \le n/2$ holds. Next, partition $[n] = \{1,\ldots,n\}$ into $b$ blocks $B_1,\ldots,B_b$, each of size $|B_i| \ge \lfloor n/b \rfloor \ge 2$. Given $X_1,\ldots,X_n$, we compute the sample mean in each block

$$Y_i = \frac{1}{|B_i|}\sum_{j \in B_i} X_j$$

and define the median-of-means estimator by $\widehat{\mu}_n^{(\delta)} = q_{1/2}(Y_1,\ldots,Y_b)$. One can show (see, e.g., Hsu [8]) that for any $n \ge 4$,

$$\mathbb{P}\left\{ \left|\widehat{\mu}_n^{(\delta)} - \mu\right| > 2e\sqrt{\frac{2\,\mathrm{Var}(X)\,(1+\ln(1/\delta))}{n}} \right\} \le \delta, \tag{1}$$

where $\mathrm{Var}(X)$ denotes the variance of $X$. Note that the median-of-means estimator $\widehat{\mu}_n^{(\delta)}$ does not require any knowledge of the variance of $X$. However, it depends on the desired confidence level $\delta$ and the partition $B_1,\ldots,B_b$. Any partition satisfying $|B_i| \ge \lfloor n/b \rfloor$ for all $i$ is valid in order to get (1). Hence, we do not keep the dependence on the partition $B_1,\ldots,B_b$ in the notation $\widehat{\mu}_n^{(\delta)}$.

Devroye, Lerasle, Lugosi, and Oliveira [5] introduce estimators that work for a large range of confidence levels under some mild assumptions. Catoni [3] introduces estimators of quite different flavor and gets a non-asymptotic result of the same form as (1). Bubeck, Cesa-Bianchi and Lugosi [2] apply these estimators in the context of bandit problems.

1.2 Estimating the mean of random vectors

Consider now the multi-dimensional case when $d > 1$. The sample mean $\overline{\mu}_n = (1/n)\sum_{i=1}^n X_i$ is still an obvious choice for estimating the mean vector $\mu$. If $X$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, then $\overline{\mu}_n$ is also multivariate normal with mean $\mu$ and covariance matrix $(1/n)\Sigma$ and therefore, for $\delta \in (0,1)$, with probability at least $1-\delta$,

$$\left\|\overline{\mu}_n - \mu\right\| \le \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{2\lambda_{\max}\log(1/\delta)}{n}}, \tag{2}$$

where $\mathrm{Tr}(\Sigma)$ and $\lambda_{\max}$ denote the trace and largest eigenvalue of the covariance matrix, respectively (Hanson and Wright [7]). For non-Gaussian and possibly heavy-tailed distributions, one cannot expect such a sub-Gaussian behavior of the sample mean. The main goal of this paper is to investigate under what conditions it is possible to define mean estimators that reproduce a (non-asymptotic) sub-Gaussian performance similar to (2).

Lerasle and Oliveira [11], Hsu and Sabato [9], and Minsker [13] extend the median-of-means estimator to more general spaces. In particular, Minsker's results imply that for each $\delta \in (0,1)$ there exists a mean estimator $\widetilde{\mu}_n^{(\delta)}$ and a universal constant $C$ such that, with probability at least $1-\delta$,

$$\left\|\widetilde{\mu}_n^{(\delta)} - \mu\right\| \le C\sqrt{\frac{\mathrm{Tr}(\Sigma)\log(1/\delta)}{n}}. \tag{3}$$

While this bound is quite remarkable (note that no assumption other than the existence of the covariance matrix is made), it does not quite achieve a sub-Gaussian performance bound that resembles (2). An instructive example is when all eigenvalues are identical and equal to $\lambda_{\max}$. If the dimension $d$ is large, (2) is of the order of $\sqrt{(\lambda_{\max}/n)(d + \log(\delta^{-1}))}$ while (3) gives the order $\sqrt{(\lambda_{\max}/n)(d\log(\delta^{-1}))}$.
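To get a feeling for the gap between (2) and (3) in the isotropic example, the two orders can be evaluated numerically. The following Python sketch is only an illustration: the function names are ours, and the universal constant $C$ in (3) is set to 1, which is an assumption made purely for the comparison.

```python
import math

def subgaussian_order(lam_max, d, n, delta):
    """Order of bound (2) when all d eigenvalues equal lam_max."""
    return math.sqrt((lam_max / n) * (d + math.log(1.0 / delta)))

def minsker_order(lam_max, d, n, delta):
    """Order of bound (3) in the same setting (constant C taken to be 1)."""
    return math.sqrt((lam_max / n) * (d * math.log(1.0 / delta)))

def order_ratio(d, delta):
    """Ratio of the order of (3) to the order of (2); lam_max and n cancel."""
    log_term = math.log(1.0 / delta)
    return math.sqrt(d * log_term / (d + log_term))
```

When $d$ is much larger than $\log(1/\delta)$, the ratio returned by `order_ratio` approaches $\sqrt{\log(1/\delta)}$ from below, which is the factor that separates (3) from the sub-Gaussian form (2).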
The main result of this paper is the construction of a mean estimator that, under some mild moment assumptions, achieves a sub-Gaussian performance bound in the sense of (2). More precisely, we prove the following.

Theorem 1 For all $\delta \in (0,1)$ there exists a mean estimator $\widehat{\mu}_n^{(\delta)}$ and a universal constant $C$ such that if $X_1,\ldots,X_n$ are i.i.d. random vectors in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma$ such that there exists a constant $K > 0$ such that, for all $v \in \mathbb{R}^d$ with $\|v\| = 1$,

$$\mathbb{E}\left[\left((X-\mu)^T v\right)^4\right] \le K\,(v^T\Sigma v)^2,$$

then for all $n \ge CK\log d\,(d + \log(1/\delta))$,

$$\left\|\widehat{\mu}_n^{(\delta)} - \mu\right\| \le C\left(\sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{\lambda_{\max}\log(\log d/\delta)}{n}}\right).$$

The theorem guarantees the existence of a mean estimator whose performance matches the sub-Gaussian bound (2), up to the additional term of the order of $\sqrt{(1/n)\lambda_{\max}\log\log d}$, for all distributions satisfying the fourth-moment assumption given above. The additional term is clearly of minor importance. (For example, it is dominated by the first term whenever $\mathrm{Tr}(\Sigma) > \lambda_{\max}\log\log d$.) With the estimator we construct, this term is inevitable. On the other hand, the inequality of the theorem only holds for sample sizes that are at least a constant times $d\log d$. This feature is not desirable for truly high-dimensional problems, especially taking into account that Minsker's bound is dimension-free.

The fourth-moment assumption can be interpreted as a boundedness assumption of the kurtosis of $(X-\mu)^T v$. The same assumption has been used in Catoni [4] and Giulini [6] for the robust estimation of the Gram matrix. The fourth-moment assumption may be weakened to an analogous $(2+\varepsilon)$-th moment assumption that we do not detail for the clarity of the exposition.

We prove the theorem by constructing an estimator in several steps. First we construct an estimator that performs well for spherical distributions (i.e., for distributions whose covariance matrix has a trace comparable to $d\lambda_{\max}$). This estimator is described in Section 2. In the second step, we decompose the space in a data-dependent way into the orthogonal sum of $O(\log d)$ subspaces such that all but one subspaces are such that the projection of $X$ to the subspace has a spherical distribution. The last subspace is such that the projection has a covariance matrix with a small trace. In each subspace we apply the first estimator and combine them to obtain the final estimator $\widehat{\mu}_n^{(\delta)}$. The proof below provides an explicit value of the constant $C$, though no attempt has been made to optimize its value.

The constructed estimator is computationally so demanding that even for moderate values of $d$ it is hopeless to compute it in reasonable time.
In this sense, Theorem 1 should be regarded as an existence result. It is an interesting and important challenge to construct estimators with similar statistical performance that can be computed in polynomial time (as a function of $n$ and $d$). Note that the estimator of Minsker cited above may be computed by solving a convex optimization problem, making it computationally feasible, see also Hsu and Sabato [9] for further computational considerations.

2 An estimator for spherical distributions

In this section we construct an estimator that works well whenever the distribution of $X$ is sufficiently spherical in the sense that a positive fraction of the eigenvalues of the covariance matrix is of the same order as $\lambda_{\max}$. More precisely, for $c \ge 1$, we call a distribution $c$-spherical if

$$d\lambda_{\max} \le c\,\mathrm{Tr}(\Sigma).$$
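The definition of $c$-sphericity depends on $\Sigma$ only through its eigenvalues, so it can be checked directly when the spectrum is known. The short Python sketch below does this; the function names are hypothetical and the spectrum is assumed to be given as a plain list of nonnegative numbers.

```python
def sphericity_constant(eigenvalues):
    """Smallest c such that d * lambda_max <= c * Tr(Sigma)."""
    d = len(eigenvalues)
    trace = sum(eigenvalues)
    if d == 0 or trace <= 0:
        raise ValueError("need a nonzero positive semidefinite covariance")
    return d * max(eigenvalues) / trace

def is_c_spherical(eigenvalues, c):
    """True if a distribution with this covariance spectrum is c-spherical."""
    return sphericity_constant(eigenvalues) <= c
```

For instance, an identity covariance gives the constant 1, while a spectrum with a single nonzero eigenvalue in dimension $d$ gives the constant $d$, the least spherical case.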

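For each $\delta \in (0,1)$ and unit vector $w \in S^{d-1}$ (where $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$), we may define $m_n^{(\delta)}(w)$ as the median-of-means estimate (as defined in Section 1.1) of $w^T\mu = \mathbb{E}w^TX$ based on the i.i.d. sample $w^TX_1,\ldots,w^TX_n$. Let $N_{1/2} \subset S^{d-1}$ be a minimal $1/2$-cover, that is, a set of smallest cardinality that has the property that for all $u \in S^{d-1}$ there exists $w \in N_{1/2}$ with $\|u - w\| \le 1/2$. It is well known (see, e.g., [12, Lemma ]) that $|N_{1/2}| \le 8^d$. Noting that $\mathrm{Var}(w^TX) \le \lambda_{\max}$, by (1) and the union bound, we have that, with probability at least $1-\delta$,

$$\sup_{w \in N_{1/2}} \left| m_n^{(\delta/8^d)}(w) - w^T\mu \right| \le 2e\sqrt{\frac{2\lambda_{\max}\ln(e8^d/\delta)}{n}}.$$

In other words, if, for $\lambda > 0$, we define the empirical polytope

$$P_{\delta,\lambda} = \left\{ x \in \mathbb{R}^d : \sup_{w \in N_{1/2}} \left| m_n^{(\delta/8^d)}(w) - w^Tx \right| \le 2e\sqrt{\frac{2\lambda\ln(e8^d/\delta)}{n}} \right\},$$

then with probability at least $1-\delta$, $\mu \in P_{\delta,\lambda_{\max}}$. In particular, on this event, $P_{\delta,\lambda_{\max}}$ is nonempty. Suppose that an upper bound $\lambda \ge \lambda_{\max}$ of the largest eigenvalue of the covariance matrix is available. Then we may define the mean estimator

$$\widehat{\mu}_{n,\lambda}^{(\delta)} = \begin{cases} \text{any element } y \in P_{\delta,\lambda} & \text{if } P_{\delta,\lambda} \neq \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

Now suppose that $\mu \in P_{\delta,\lambda}$ and let $y \in P_{\delta,\lambda}$ be arbitrary. Define $u = (y-\mu)/\|y-\mu\| \in S^{d-1}$, and let $w \in N_{1/2}$ be such that $\|w-u\| \le 1/2$. (Such a $w$ exists by definition of $N_{1/2}$.) Then

$$\|y-\mu\| = u^T(y-\mu) = (u-w)^T(y-\mu) + w^T(y-\mu) \le \frac{1}{2}\|y-\mu\| + 4e\sqrt{\frac{2\lambda\ln(e8^d/\delta)}{n}},$$

where we used Cauchy-Schwarz and the fact that $y, \mu \in P_{\delta,\lambda}$. Rearranging, we obtain that, on the event that $\mu \in P_{\delta,\lambda}$,

$$\left\|\widehat{\mu}_{n,\lambda}^{(\delta)} - \mu\right\| \le 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}},$$

provided that $\lambda \ge \lambda_{\max}$. Summarizing, we have proved the following.

Proposition 1 Let $\lambda > 0$ and $\delta \in (0,1)$. For any distribution with mean $\mu$ and covariance matrix $\Sigma$ such that $\lambda_{\max} = \|\Sigma\| \le \lambda$, the estimator $\widehat{\mu}_{n,\lambda}^{(\delta)}$ defined above satisfies, with probability at least $1-\delta$,

$$\left\|\widehat{\mu}_{n,\lambda}^{(\delta)} - \mu\right\| \le 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}}.$$

In particular, if the distribution is $c$-spherical and $\lambda \le 2\lambda_{\max}$, then

$$\left\|\widehat{\mu}_{n,\lambda}^{(\delta)} - \mu\right\| \le 16e\sqrt{\frac{c\,\mathrm{Tr}(\Sigma)\ln 8 + \lambda_{\max}\ln(e/\delta)}{n}}.$$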
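The ingredients of this construction can be made concrete in the plane ($d = 2$). The Python sketch below is a minimal illustration under several assumptions of ours, not the estimator as analyzed above: the cover consists of 8 equally spaced points on the circle (a valid but not necessarily minimal $1/2$-cover, since $2\sin(\pi/16) \le 1/2$), the cover size replaces $8^d$ in the confidence level and in the radius, the median is taken as the lower median, and the number of blocks is clamped so that small samples do not fail. All function names are hypothetical. The sketch only tests membership in the polytope; finding an element of $P_{\delta,\lambda}$ in general dimension is the computationally demanding part and is not attempted.

```python
import math
import statistics

def median_of_means(xs, delta):
    """Median-of-means estimate of the mean of xs at confidence level delta."""
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    # b = ceil(ln(1/delta)) blocks; clamped to [1, n // 2] (our safeguard).
    b = max(1, min(math.ceil(math.log(1.0 / delta)), n // 2))
    size = n // b
    # Contiguous blocks; the last block absorbs the remainder.
    blocks = [xs[i * size:(i + 1) * size] for i in range(b - 1)]
    blocks.append(xs[(b - 1) * size:])
    means = [sum(block) / len(block) for block in blocks]
    return statistics.median_low(means)

def circle_cover(k=8):
    """k equally spaced unit vectors; k = 8 gives a 1/2-cover of the circle."""
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
            for j in range(k)]

def polytope_radius(lam, n, delta, cover_size):
    """Half-width 2e * sqrt(2 * lam * ln(e * |N| / delta) / n) of each slab."""
    return 2 * math.e * math.sqrt(
        2 * lam * math.log(math.e * cover_size / delta) / n)

def in_polytope(x, samples, lam, delta, cover=None):
    """Check whether the point x lies in the empirical polytope."""
    cover = circle_cover() if cover is None else cover
    radius = polytope_radius(lam, len(samples), delta, len(cover))
    for w in cover:
        projections = [w[0] * p[0] + w[1] * p[1] for p in samples]
        m_w = median_of_means(projections, delta / len(cover))
        if abs(m_w - (w[0] * x[0] + w[1] * x[1])) > radius:
            return False
    return True
```

A call such as `in_polytope(candidate, samples, lam, delta)` returns `True` exactly when `candidate` satisfies all directional constraints, so any point for which it returns `True` is an admissible value of the estimator in this two-dimensional toy setting.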

The bound we obtained has the same sub-Gaussian form as (2), up to a multiplicative constant, whenever the distribution is $c$-spherical. To make the estimator fully data-dependent, we need to find an estimate $\widehat{\lambda}$ that falls in the interval $[\lambda_{\max}, 2\lambda_{\max}]$, with high probability. This may be achieved by splitting the sample in two parts of equal size (assuming $n$ is even), estimating $\lambda_{\max}$ using samples from one part and computing the mean estimate defined above using the other part. In the next section we describe such a method as a part of a more general procedure.

3 Empirical eigendecomposition

In the previous section we presented a mean estimate that works well for spherical distributions. We will use this estimator as a building block in the construction of an estimator that has the desirable performance guarantee for distributions with any covariance matrix. In addition to finite covariances, we assume that there exists a constant $K > 0$ such that, for all $v \in \mathbb{R}^d$ with $\|v\| = 1$,

$$\mathbb{E}\left[\left((X-\mu)^T v\right)^4\right] \le K\,(v^T\Sigma v)^2. \tag{4}$$

In this section we assume that

$$n \ge 2(400e)^2 K \log_{3/2} d \left( d\log 25 + \log(2\log_{3/2} d) + \log(1/\delta) \right).$$

The basic idea is the following. We split the data into two equal halves. We use the first half in order to decompose the space into the sum of orthogonal subspaces such that the projection of $X$ into each subspace is 4-spherical. Then we may estimate the projected means by the estimator of the previous section.

Next we describe how we obtain an orthogonal decomposition of the space based on $n$ i.i.d. observations $X_1,\ldots,X_n$. Let $s = \lceil \log_{3/2} d^2 \rceil$ and $m = \lfloor n/s \rfloor$. Divide the sample into $s$ blocks, each of size at least $m$. In what follows, we describe a way of sequentially decomposing $\mathbb{R}^d$ into the orthogonal sum of $s+1$ subspaces $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_{s+1}$. First we construct $V_1$ using the first block $X_1,\ldots,X_m$ of observations. Then we use the second block to build $V_2$, and so on, for $s$ blocks.
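The bookkeeping of this splitting step is simple to write down. The following Python sketch computes $s$ and $m$ and cuts a sample into $s$ consecutive blocks; the function names are ours, the sketch assumes $d \ge 2$ (for $d = 1$ the logarithm vanishes and no decomposition is needed), and letting the last block absorb the remainder is one arbitrary way to make every block have at least $m$ points.

```python
import math

def decomposition_schedule(n, d):
    """Return (s, m) with s = ceil(log_{3/2}(d^2)) and m = floor(n / s)."""
    if d < 2:
        raise ValueError("the schedule is only meaningful for d >= 2")
    s = math.ceil(2 * math.log(d) / math.log(1.5))
    return s, n // s

def split_into_blocks(samples, s):
    """Cut samples into s consecutive blocks, each of size at least floor(n/s)."""
    m = len(samples) // s
    blocks = [samples[i * m:(i + 1) * m] for i in range(s - 1)]
    blocks.append(samples[(s - 1) * m:])  # last block takes the remainder
    return blocks
```

The choice of $s$ guarantees $(2/3)^s \le 1/d^2$, so that after $s$ rounds of shrinking the largest eigenvalue by a factor $2/3$ it has dropped to at most $\lambda_{\max}/d^2$.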
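The key properties we need are that (a) the random vector $X$, projected to any of these subspaces, has a 4-spherical distribution; (b) the largest eigenvalue of the covariance matrix of $X$, projected on $V_i$, is at most $\lambda_{\max}(2/3)^{i-1}$.

To this end, just like in the previous section, let $N_\gamma \subset S^{d-1}$ be a minimal $\gamma$-cover of the unit sphere $S^{d-1}$ for a sufficiently small constant $\gamma \in (0,1)$. The value $\gamma = 1/100$ is sufficient for our purposes and in the sequel we assume this value. Note that $|N_\gamma| \le (4/\gamma)^d$ (see [12, Lemma ] for a proof of this fact).

Initially, we use the first block $X_1,\ldots,X_m$. We may assume that $m$ is even. Using these observations, for each $u \in N_\gamma$, we compute an estimate $V_m^{(\delta)}(u)$ of $u^T\Sigma u = \mathbb{E}(u^T(X-\mu))^2 = (1/2)\mathbb{E}(u^T(X-X'))^2$, where $X'$ is an i.i.d. copy of $X$. We may construct the estimate by forming $m/2$ i.i.d. random variables

$$\tfrac{1}{2}\left(u^T(X_1 - X_{m/2+1})\right)^2, \ldots, \tfrac{1}{2}\left(u^T(X_{m/2} - X_m)\right)^2$$

and estimate their mean by the median-of-means estimate $V_m^{(\delta)}(u)$ with parameter $\delta/(s(4/\gamma)^d)$. Then (1), together with assumption (4), implies that, with probability at least $1-\delta/s$,

$$\sup_{u \in N_\gamma} \frac{\left|u^T\Sigma u - V_m^{(\delta)}(u)\right|}{u^T\Sigma u} \le 4e\sqrt{\frac{K\log(s(4/\gamma)^d/\delta)}{m}} \stackrel{\mathrm{def}}{=} \varepsilon_m.$$

Our assumptions on the sample size guarantee that $\varepsilon_m < 1/100$. The event that the inequality above holds is denoted by $E_1$ so that $\mathbb{P}\{E_1\} \ge 1-\delta/s$.

Let $M_{\delta,m}$ be the set of all symmetric positive semidefinite $d \times d$ matrices $M$ satisfying

$$\sup_{u \in N_\gamma} \frac{\left|u^TMu - V_m^{(\delta)}(u)\right|}{u^TMu} \le \varepsilon_m.$$

By the argument above, $\Sigma \in M_{\delta,m}$ on the event $E_1$. In particular, on $E_1$, $M_{\delta,m}$ is nonempty. Define the estimated covariance matrix

$$\widehat{\Sigma}_m^{(\delta)} = \begin{cases} \text{any element of } M_{\delta,m} & \text{if } M_{\delta,m} \neq \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

Since on $E_1$ both $\widehat{\Sigma}_m^{(\delta)}$ and $\Sigma$ are in $M_{\delta,m}$, on this event, we have

$$\left(u^T\Sigma u\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} \le u^T\widehat{\Sigma}_m^{(\delta)}u \le \left(u^T\Sigma u\right)\frac{1+\varepsilon_m}{1-\varepsilon_m} \quad \text{for all } u \in N_\gamma. \tag{5}$$

Now compute the spectral decomposition

$$\widehat{\Sigma}_m^{(\delta)} = \sum_{i=1}^d \widehat{\lambda}_i \widehat{v}_i \widehat{v}_i^T,$$

where $\widehat{\lambda}_1 \ge \cdots \ge \widehat{\lambda}_d \ge 0$ are the eigenvalues and $\widehat{v}_1,\ldots,\widehat{v}_d$ the corresponding orthogonal eigenvectors. Let $u \in S^{d-1}$ be arbitrary and let $v$ be a point in $N_\gamma$ with smallest distance to $u$. Then

\begin{align*}
u^T\widehat{\Sigma}_m^{(\delta)}u &= v^T\widehat{\Sigma}_m^{(\delta)}v + 2(u-v)^T\widehat{\Sigma}_m^{(\delta)}v + (u-v)^T\widehat{\Sigma}_m^{(\delta)}(u-v) \\
&\le v^T\widehat{\Sigma}_m^{(\delta)}v + \widehat{\lambda}_1(2\gamma+\gamma^2) \tag{6} \\
&\qquad \text{(by Cauchy-Schwarz and using the fact that } \|u-v\| \le \gamma\text{)} \\
&\le (v^T\Sigma v)\frac{1+\varepsilon_m}{1-\varepsilon_m} + 3\gamma\widehat{\lambda}_1 \qquad \text{(by (5))} \\
&\le \frac{1+\varepsilon_m}{1-\varepsilon_m}\lambda_{\max} + 3\gamma\widehat{\lambda}_1.
\end{align*}

In particular, on $E_1$ we have $\widehat{\lambda}_1 \le \beta\lambda_{\max}$ where $\beta = \frac{1+\varepsilon_m}{1-\varepsilon_m}\big/(1-3\gamma) < 1.1$.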
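The directional variance estimate $V_m^{(\delta)}(u)$ described above can be sketched in Python as follows. This is an illustration under assumptions of ours: the function names are hypothetical, the median is the lower median, leftover values that do not fill a whole block are ignored, and the confidence level is passed as $\ln(1/\delta')$ rather than $\delta'$ because $\delta' = \delta/(s(4/\gamma)^d)$ underflows in floating point already for moderate $d$.

```python
import math
import statistics

def paired_half_squares(samples, u):
    """The m/2 values (1/2) * (u^T (X_i - X_{m/2+i}))^2, each with mean u^T Sigma u."""
    half = len(samples) // 2
    values = []
    for first, second in zip(samples[:half], samples[half:2 * half]):
        diff = sum(uj * (a - b) for uj, a, b in zip(u, first, second))
        values.append(0.5 * diff * diff)
    return values

def log_inverse_confidence(s, d, delta, gamma=0.01):
    """ln(1/delta') for delta' = delta / (s * (4/gamma)^d), computed in log scale."""
    return math.log(s) + d * math.log(4.0 / gamma) + math.log(1.0 / delta)

def directional_variance(samples, u, log_inv_delta):
    """Median-of-means estimate of u^T Sigma u from one block of samples."""
    values = paired_half_squares(samples, u)
    if not values:
        raise ValueError("need at least two samples")
    # Number of blocks ceil(ln(1/delta')), clamped to [1, len(values) // 2].
    b = max(1, min(math.ceil(log_inv_delta), len(values) // 2))
    size = len(values) // b
    means = [sum(values[k * size:(k + 1) * size]) / size for k in range(b)]
    return statistics.median_low(means)
```

Working with paired differences removes the unknown mean $\mu$ from the quantity being averaged, which is why the estimate does not require a preliminary estimate of $\mu$.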

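By a similar argument, we have that for any $u \in S^{d-1}$, if $v$ is the point in $N_\gamma$ with smallest distance to $u$, then on $E_1$,

$$u^T\Sigma u \le \left(v^T\widehat{\Sigma}_m^{(\delta)}v\right)\frac{1+\varepsilon_m}{1-\varepsilon_m} + 3\gamma\lambda_{\max} \le \frac{1+\varepsilon_m}{1-\varepsilon_m}\widehat{\lambda}_1 + 3\gamma\lambda_{\max}.$$

In particular, $\lambda_{\max} \le \beta\widehat{\lambda}_1 \le (4/3)\widehat{\lambda}_1$. Similarly,

\begin{align*}
u^T\Sigma u &\ge \left(v^T\widehat{\Sigma}_m^{(\delta)}v\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} - 3\gamma\widehat{\lambda}_1 \\
&\ge \left(\frac{1-\varepsilon_m}{1+\varepsilon_m}\right)u^T\widehat{\Sigma}_m^{(\delta)}u - 3\gamma\widehat{\lambda}_1 - 3\gamma\widehat{\lambda}_1 \\
&= \left(\frac{1-\varepsilon_m}{1+\varepsilon_m}\right)u^T\widehat{\Sigma}_m^{(\delta)}u - 6\gamma\widehat{\lambda}_1. \tag{7}
\end{align*}

Let $\widehat{d}_1$ be the number of eigenvalues $\widehat{\lambda}_i$ that are at least $\widehat{\lambda}_1/2$ and let $V_1$ be the subspace of $\mathbb{R}^d$ spanned by $\widehat{v}_1,\ldots,\widehat{v}_{\widehat{d}_1}$. Denote by $\Pi_1(X)$ the orthogonal projection of the random variable $X$ (independent of the $X_i$ used to build $V_1$) onto $V_1$. Then for any $u \in V_1 \cap S^{d-1}$, on the event $E_1$, by (7),

$$u^T\Sigma u \ge \frac{\widehat{\lambda}_1}{2}\left(\frac{1-\varepsilon_m}{1+\varepsilon_m} - 12\gamma\right) \ge \frac{\widehat{\lambda}_1}{3}$$

and therefore

$$\mathbb{E}\,u^T(\Pi_1(X) - \mathbb{E}\Pi_1(X))(\Pi_1(X) - \mathbb{E}\Pi_1(X))^Tu = u^T\Sigma u \in \left(\frac{\widehat{\lambda}_1}{3}, \frac{4\widehat{\lambda}_1}{3}\right).$$

In particular, the ratio of the largest and smallest eigenvalues of the covariance matrix of $\Pi_1(X)$ is at most 4 and therefore the distribution of $\Pi_1(X)$ is 4-spherical.

On the other hand, on the event $E_1$, for any unit vector $u \in V_1^\perp \cap S^{d-1}$ in the orthogonal complement of $V_1$, we have $u^T\Sigma u \le 2\lambda_{\max}/3$. To see this, note that $u^T\widehat{\Sigma}_m^{(\delta)}u \le \widehat{\lambda}_1/2$ and therefore, denoting by $v$ the point in $N_\gamma$ closest to $u$,

\begin{align*}
u^T\Sigma u &= u^T\widehat{\Sigma}_m^{(\delta)}u + v^T\left(\Sigma - \widehat{\Sigma}_m^{(\delta)}\right)v + \left(v^T\widehat{\Sigma}_m^{(\delta)}v - u^T\widehat{\Sigma}_m^{(\delta)}u\right) + \left(u^T\Sigma u - v^T\Sigma v\right) \\
&\le \frac{\widehat{\lambda}_1}{2} + 2\varepsilon_m\lambda_{\max} + 3\gamma\widehat{\lambda}_1 + 3\gamma\lambda_{\max} \\
&\qquad \text{(by (5), (6), and a similar argument for the last term)} \\
&\le \lambda_{\max}\left(\beta\left(\frac{1}{2} + 3\gamma\right) + 2\varepsilon_m + 3\gamma\right) \le \frac{2\lambda_{\max}}{3}.
\end{align*}

In other words, the largest eigenvalue of the covariance matrix of $\Pi_1^\perp(X)$ (the projection of $X$ to the subspace $V_1^\perp$) is at most $(2/3)\lambda_{\max}$.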
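The numerical constants appearing in this step can be checked by direct arithmetic at the extreme admissible value $\varepsilon_m = 1/100$ together with $\gamma = 1/100$ (the three quantities below are monotone in $\varepsilon_m$, so smaller values of $\varepsilon_m$ only help). The Python sketch below does this and also shows the thresholding rule that defines $\widehat{d}_1$; the function names are ours.

```python
def decomposition_constants(eps=0.01, gamma=0.01):
    """Return (beta, lower, upper) for the bounds used in the construction of V_1.

    beta  : hat-lambda_1 <= beta * lambda_max
    lower : u^T Sigma u >= lower * hat-lambda_1 for unit u in V_1
    upper : u^T Sigma u <= upper * lambda_max for unit u orthogonal to V_1
    """
    ratio = (1 + eps) / (1 - eps)
    beta = ratio / (1 - 3 * gamma)
    lower = 0.5 * (1 / ratio - 12 * gamma)
    upper = beta * (0.5 + 3 * gamma) + 2 * eps + 3 * gamma
    return beta, lower, upper

def leading_block_size(eigenvalues):
    """Number of eigenvalues that are at least half of the largest one."""
    top = max(eigenvalues)
    return sum(1 for lam in eigenvalues if lam >= top / 2)
```

With the default arguments, `decomposition_constants()` gives $\beta \approx 1.05$, a lower constant of about $0.43$ and an upper constant of about $0.61$, consistent with the bounds $\beta < 1.1$, $1/3$ and $2/3$ used in the text.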

In the next step we construct the subspace $V_2 \subset V_1^\perp$. To this end, we proceed exactly as in the first step but now we replace $\mathbb{R}^d$ by $V_1^\perp$ and the sample $X_1,\ldots,X_m$ of the first block by the variables $\Pi_1^\perp(X_{m+1}),\ldots,\Pi_1^\perp(X_{2m}) \in V_1^\perp$. (Recall that $\Pi_1^\perp(X_i)$ is the projection of $X_i$ to the subspace $V_1^\perp$.) Just like in the first step, with probability at least $1-\delta/s$ we obtain a (possibly empty) subspace $V_2$, orthogonal to $V_1$, such that $\Pi_2(X)$, the projection of $X$ on $V_2$, has a 4-spherical distribution and the largest eigenvalue of the covariance matrix of $\Pi_2^\perp(X)$ (the projection of $X$ to the subspace $(V_1 \oplus V_2)^\perp$) is at most $(2/3)^2\lambda_{\max}$.

We repeat the procedure $s$ times and use a union bound over the $s$ events. We obtain, with probability at least $1-\delta$, a sequence of subspaces $V_1,\ldots,V_s$ with the following properties:

(i) $V_1,\ldots,V_s$ are orthogonal subspaces.
(ii) For each $i = 1,\ldots,s$, $\Pi_i(X)$, the projection of $X$ on $V_i$, has a 4-spherical distribution.
(iii) The largest eigenvalue $\lambda_1^{(i)}$ of the covariance matrix of $\Pi_i(X)$ satisfies $\lambda_1^{(i)} \le (2/3)^{i-1}\lambda_{\max}$.
(iv) The largest eigenvalue $\widehat{\lambda}_1^{(i)}$ of the estimated covariance matrix of $\Pi_i(X)$ satisfies $(3/4)\lambda_1^{(i)} \le \widehat{\lambda}_1^{(i)} \le 1.1\lambda_1^{(i)}$.

Note that it may happen that for some $T < s$, we have $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_T$. In that case we define $V_{T+1} = \cdots = V_s = \emptyset$.

4 Putting it all together

In this section we construct our final multivariate mean estimator and prove Theorem 1. To simplify notation, we assume that the sample size is $2n$. This only affects the value of the universal constant $C$ in the statement of the theorem.

The data is split into two equal halves $(X_1,\ldots,X_n)$ and $(X_{n+1},\ldots,X_{2n})$. The second half is used to construct the orthogonal spaces $V_1,\ldots,V_s$ as described in the previous section. Let $\widehat{d}_1,\ldots,\widehat{d}_s$ denote the dimensions of these subspaces. Recall that, with probability at least $1-\delta$, the construction is successful in the sense that the subspaces satisfy properties (i) to (iv) described at the end of the previous section. Denote this event by $E$.
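In the rest of the argument we condition on $(X_{n+1},\ldots,X_{2n})$ and assume that $E$ occurs. All probabilities below are conditional.

If $\sum_{i=1}^s \widehat{d}_i < d$ (i.e., $V_1 \oplus \cdots \oplus V_s \neq \mathbb{R}^d$), then we define $V_{s+1} = (V_1 \oplus \cdots \oplus V_s)^\perp$ and denote by $\widehat{d}_{s+1} = d - \sum_{i=1}^s \widehat{d}_i$ the dimension of $V_{s+1}$. Let $\Pi_1,\ldots,\Pi_{s+1}$ denote the projection operators on the subspaces $V_1,\ldots,V_{s+1}$, respectively. For each $i = 1,\ldots,s+1$, we use the vectors $\Pi_i(X_1),\ldots,\Pi_i(X_n)$ to compute an estimator of the mean $\mathbb{E}[\Pi_i(X) \mid (X_{n+1},\ldots,X_{2n})] = \Pi_i(\mu)$.

For $i = 1,\ldots,s$, we use the estimator defined in Section 2. In particular, within the $\widehat{d}_i$-dimensional space $V_i$, we compute

$$\widehat{\mu}_i = \widehat{\mu}^{(\delta/(s+1))}_{n,(4/3)\widehat{\lambda}_1^{(i)}}.$$

Note that since $\widehat{\lambda}_1^{(i)}$ comes from an empirical estimation of $\Sigma$ restricted to an empirical subspace $V_i$, $\widehat{\mu}_i$ is an estimator constructed on the sample $X_1,\ldots,X_n$. Then, by Proposition 1, with probability $1-\delta/(s+1)$,

$$\left\|\widehat{\mu}_i - \Pi_i(\mu)\right\|^2 \le (8e)^2\,\frac{(8/3)\widehat{\lambda}_1^{(i)}\left(\widehat{d}_i\ln 8 + \ln\left(e(2\log_{3/2} d + 1)/\delta\right)\right)}{n}.$$

In the last subspace $V_{s+1}$, we may use Minsker's estimator, based on $\Pi_{s+1}(X_1),\ldots,\Pi_{s+1}(X_n)$, to compute an estimator $\widehat{\mu}_{s+1} = \widetilde{\mu}_n^{(\delta/(s+1))}$ of $\Pi_{s+1}(\mu)$. Since the largest eigenvalue of the covariance matrix of $\Pi_{s+1}(X)$ is at most $\lambda_{\max}/d^2$, using (3), we obtain that, with probability $1-\delta/(s+1)$,

$$\left\|\widehat{\mu}_{s+1} - \Pi_{s+1}(\mu)\right\|^2 \le C\,\frac{\lambda_{\max}\log\left((2\log_{3/2} d + 1)/\delta\right)}{n}.$$

Our final estimator is $\widehat{\mu}_n^{(\delta)} = \sum_{i=1}^{s+1}\widehat{\mu}_i$. By the union bound, we have that, with probability at least $1-\delta$,

\begin{align*}
\left\|\widehat{\mu}_n^{(\delta)} - \mu\right\|^2 &= \sum_{i=1}^{s+1}\left\|\widehat{\mu}_i - \Pi_i(\mu)\right\|^2 \\
&\le (8e)^2\,\frac{(8/3)\ln 8}{n}\sum_{i=1}^s \widehat{\lambda}_1^{(i)}\widehat{d}_i + (8e)^2\,\frac{(8/3)\ln\left(e(2\log_{3/2} d + 1)/\delta\right)}{n}\sum_{i=1}^s \widehat{\lambda}_1^{(i)} \\
&\quad + C\,\frac{\lambda_{\max}\log\left((2\log_{3/2} d + 1)/\delta\right)}{n}.
\end{align*}

First notice that, by properties (iii) and (iv) at the end of the previous section,

$$\sum_{i=1}^s \widehat{\lambda}_1^{(i)} \le 1.1\lambda_{\max}\sum_{i=1}^s (2/3)^{i-1} \le 3.3\lambda_{\max}.$$

On the other hand, since

$$\mathrm{Tr}(\Sigma) = \mathbb{E}\|X-\mu\|^2 = \sum_{i=1}^{s+1}\mathbb{E}\left\|\Pi_i(X) - \Pi_i(\mu)\right\|^2$$

and for $i \le s$ each $\Pi_i(X)$ has a 4-spherical distribution, we have that

$$\sum_{i=1}^s \widehat{\lambda}_1^{(i)}\widehat{d}_i \le 1.1\sum_{i=1}^s \lambda_1^{(i)}\widehat{d}_i \le 4.4\,\mathrm{Tr}(\Sigma).$$

Substituting these two bounds into the preceding inequality and using $\lambda_{\max} \le \mathrm{Tr}(\Sigma)$ shows that $\|\widehat{\mu}_n^{(\delta)} - \mu\|^2$ is at most a constant multiple of $\left(\mathrm{Tr}(\Sigma) + \lambda_{\max}\log((2\log_{3/2} d + 1)/\delta)\right)/n$, which is of the form stated in the theorem. This concludes the proof of Theorem 1.

References

[1] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58.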

[2] S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59.
[3] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4).
[4] O. Catoni. PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. arXiv preprint.
[5] L. Devroye, M. Lerasle, G. Lugosi, and R.I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics.
[6] I. Giulini. Robust dimension-free Gram operator estimates. arXiv preprint.
[7] D.L. Hanson and F.T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Annals of Mathematical Statistics, 42.
[8] D. Hsu. Robust statistics.
[9] D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1-40.
[10] M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43.
[11] M. Lerasle and R. I. Oliveira. Robust empirical mean estimators. arXiv preprint.
[12] J. Matoušek. Lectures on discrete geometry. Springer.
[13] S. Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21.
[14] A.S. Nemirovsky and D.B. Yudin. Problem complexity and method efficiency in optimization.
