arXiv: v1 [math.ST] 17 Apr 2015

Robust estimation of U-statistics

Emilie Joly, Gábor Lugosi

April 20, 2015

École Normale Supérieure, Paris. ICREA and Department of Economics and Business, Pompeu Fabra University. Supported by the Spanish Ministry of Science and Technology grant MTM

This paper is dedicated to the memory of Evarist Giné.

Abstract

An important part of the legacy of Evarist Giné lies in his fundamental contributions to our understanding of U-statistics and U-processes. In this paper we discuss the estimation of the mean of multivariate functions in the case of possibly heavy-tailed distributions. In such situations, reliable estimates of the mean cannot be obtained by usual U-statistics. We introduce a new estimator, based on the so-called median-of-means technique. We develop performance bounds for this new estimator that generalize an estimate of Arcones and Giné (1993), showing that the new estimator performs, under minimal moment conditions, as well as classical U-statistics for bounded random variables. We discuss an application of this estimator to clustering.

1 Introduction

Motivated by numerous applications, the theory of U-statistics and U-processes has received considerable attention in the past decades. U-statistics appear naturally in ranking (Clémençon et al., 2008), clustering (Clémençon, 2014) and learning on graphs (Biau and Bleakley, 2006), or as components of higher-order terms in expansions of smooth statistics, see, for example, Robins et al. (2009). The general setting may be described as follows. Let $X$ be a random variable taking values in some measurable space $\mathcal{X}$ and let $h : \mathcal{X}^m \to \mathbb{R}$ be a measurable function of $m \ge 2$ variables. Let $P$ be the probability measure of $X$. Suppose we have access to $n \ge m$ independent random variables $X_1,\dots,X_n$, all distributed as $X$. We define the U-statistic of order $m$ and kernel $h$ based on the sequence $\{X_i\}$ as

$$U_n(h) = \frac{(n-m)!}{n!}\sum_{(i_1,\dots,i_m)\in I_n^m} h(X_{i_1},\dots,X_{i_m}), \tag{1}$$
where $I_n^m = \{(i_1,\dots,i_m) : 1 \le i_j \le n,\ i_j \ne i_k \text{ if } j \ne k\}$ is the set of all $m$-tuples of different integers between $1$ and $n$. U-statistics are unbiased estimators of the mean $m_h = \mathbb{E}h(X_1,\dots,X_m)$ and have minimal variance among all unbiased estimators (Hoeffding, 1948). Understanding the concentration of a U-statistic around its expected value has been the subject of extensive study; de la Peña and Giné (1999) provide an excellent summary, but see also Giné et al. (2000) for a more recent development. By a classical inequality of Hoeffding (1963), for a bounded kernel $h$ and for all $\delta > 0$,

$$\mathbb{P}\left\{|U_n(h) - m_h| > \|h\|_\infty\sqrt{\frac{2\log(2/\delta)}{\lfloor n/m\rfloor}}\right\} \le \delta, \tag{2}$$

and we also have the Bernstein-type inequality

$$\mathbb{P}\left\{|U_n(h) - m_h| > \sqrt{\frac{4\sigma^2\log(2/\delta)}{\lfloor n/m\rfloor}} + \frac{4\|h\|_\infty\log(2/\delta)}{6\lfloor n/m\rfloor}\right\} \le \delta,$$

where $\sigma^2 = \mathrm{Var}(h(X_1,\dots,X_m))$. However, under certain degeneracy assumptions on the kernel, significantly sharper bounds have been proved. Following the exposition of de la Peña and Giné (1999), for convenience, we restrict our attention to symmetric kernels. A kernel $h$ is symmetric if for all $x_1,\dots,x_m \in \mathcal{X}$ and all permutations $s$, $h(x_1,\dots,x_m) = h(x_{s_1},\dots,x_{s_m})$. A symmetric kernel $h$ is said to be P-degenerate of order $q-1$, $1 < q \le m$, if for all $x_1,\dots,x_{q-1} \in \mathcal{X}$,

$$\int h(x_1,\dots,x_m)\,dP^{m-q+1}(x_q,\dots,x_m) = \int h(x_1,\dots,x_m)\,dP^m(x_1,\dots,x_m)$$

and $(x_1,\dots,x_q) \mapsto \int h(x_1,\dots,x_m)\,dP^{m-q}(x_{q+1},\dots,x_m)$ is not a constant function. In the special case of $m_h = 0$ and $q = m$ (i.e., when the kernel is $(m-1)$-degenerate), $h$ is said to be P-canonical. P-canonical kernels appear naturally in the Hoeffding decomposition of a U-statistic, see de la Peña and Giné (1999). Arcones and Giné (1993) proved the following important improvement of Hoeffding's inequalities for canonical kernels: if $h - m_h$ is a bounded, symmetric P-canonical kernel of
$m$ variables, there exist finite positive constants $c_1$ and $c_2$ depending only on $m$ such that for all $\delta \in (0,1)$,

$$\mathbb{P}\left\{|U_n(h) - m_h| \ge c_1\|h\|_\infty\left(\frac{\log(c_2/\delta)}{n}\right)^{m/2}\right\} \le \delta, \tag{3}$$

and also

$$\mathbb{P}\left\{|U_n(h) - m_h| > \left(\frac{\sigma^2\log(c_1/\delta)}{c_2 n}\right)^{m/2} + \|h\|_\infty\left(\frac{\log(c_1/\delta)}{c_2 n}\right)^{(m+1)/2}\right\} \le \delta. \tag{4}$$

In the special case of P-canonical kernels of order $m = 2$, (3) implies that

$$|U_n(h) - m_h| \le \frac{c_1\|h\|_\infty}{n}\log\frac{c_2}{\delta} \tag{5}$$

with probability at least $1-\delta$. Note that this rate of convergence is significantly faster than the rate $O_p(n^{-1/2})$ implied by (2).

All the results cited above require boundedness of the kernel. If the kernel is unbounded but $h(X_1,\dots,X_m)$ has sufficiently light (e.g., sub-Gaussian) tails, then some of these results may be extended, see, for example, Giné et al. (2000). However, if $h(X_1,\dots,X_m)$ may have a heavy-tailed distribution, exponential inequalities do not hold anymore (even in the univariate case $m = 1$). Even though U-statistics may behave erratically in the presence of heavy tails, in this paper we show that under minimal moment conditions one may construct estimators of $m_h$ that satisfy exponential inequalities analogous to (2) and (3). These are the main results of the paper. In particular, in Section 2 we introduce a robust estimator of the mean $m_h$. Theorems 1 and 3 establish exponential inequalities for the performance of the new estimator under minimal moment assumptions. More precisely, Theorem 1 only requires that $h(X_1,\dots,X_m)$ have a finite variance and establishes inequalities analogous to (3) for P-degenerate kernels. In Theorem 3 we further weaken the conditions and only assume that there exists $1 < p \le 2$ such that $\mathbb{E}|h|^p < \infty$.

The next example illustrates why classical U-statistics fail under heavy-tailed distributions.

Example. Consider the special case $m = 2$, $\mathbb{E}X_1 = 0$ and $h(X_1,X_2) = X_1X_2$. Note that this kernel is P-canonical. We define $Y_1,\dots,Y_n$ as independent copies of $X_1,\dots,X_n$.
By the decoupling inequalities for the tail of U-statistics given in Theorem 3.4.1 in de la Peña and Giné (1999) (see also Theorem 7 in the Appendix), $U_n(h)$ has a tail behavior similar to that of

$$\left(\frac{1}{n}\sum_{i=1}^n X_i\right)\left(\frac{1}{n}\sum_{j=1}^n Y_j\right).$$

Thus, $U_n(h)$ behaves like a product of two independent empirical mean estimators of the same distribution. When the $X_i$ are heavy tailed, the empirical mean is known to be a poor estimator of the mean. As an example, assume that $X$ follows an $\alpha$-stable law $S(\gamma,\alpha)$ for some $\alpha \in (1,2)$ and $\gamma > 0$. Recall that a random variable $X$ has an $\alpha$-stable law $S(\gamma,\alpha)$ if for all $u \in \mathbb{R}$,

$$\mathbb{E}\exp(iuX) = \exp(-\gamma^\alpha|u|^\alpha)$$
(see Zolotarev (1986), Nolan (2015)). Then it follows from the properties of $\alpha$-stable distributions (summarized in Proposition 9 in the Appendix) that there exists a constant $c > 0$ depending only on $\alpha$ and $\gamma$ such that

$$\mathbb{P}\left\{|U_n(h)| \ge n^{2/\alpha-2}\right\} \ge c,$$

and therefore there is no hope to reproduce an upper bound like (5). Below we show how this problem can be dealt with by replacing the U-statistic by a more robust estimator.

Our approach is based on robust mean estimators in the univariate setting. Estimation of the mean of a possibly heavy-tailed random variable $X$ from an i.i.d. sample $X_1,\dots,X_n$ has recently received increasing attention. Introduced by Nemirovsky and Yudin (1983), the median-of-means estimator takes a confidence level $\delta \in (0,1)$ and divides the data into $V \approx \log\delta^{-1}$ blocks. For each block $k = 1,\dots,V$, one computes the empirical mean $\hat\mu_k$ of the variables in the block. The median $\hat\mu$ of the $\hat\mu_k$ is the so-called median-of-means estimator. A short analysis of the resulting estimator shows that

$$|\hat\mu - \mathbb{E}X| \le c\sqrt{\mathrm{Var}(X)\,\frac{\log(1/\delta)}{n}}$$

with probability at least $1-\delta$ for a numerical constant $c$. For the details of the proof see Lerasle and Oliveira (2011). When the variance is infinite but a moment of order $1 < p \le 2$ exists, the median-of-means estimator is still useful, see Bubeck et al. (2013). This estimator has recently been studied in various contexts. M-estimation based on this technique has been developed by Lerasle and Oliveira (2011), and generalizations in a multivariate context have been discussed by Hsu and Sabato (2013) and Minsker (2015). A similar idea was used in Alon et al. (2002). An interesting alternative to the median-of-means estimator has been proposed by Catoni (2012).

The rest of the paper is organized as follows. In Section 2 we introduce a robust estimator of the mean $m_h$ and present performance bounds. In particular, Section 2.1 deals with the finite-variance case. Section 2.2 is dedicated to the case when $h$ has a finite $p$-th moment for some $1 < p < 2$ for P-degenerate kernels. Finally, in Section 3, we present an application to clustering problems.
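The median-of-means construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the number of blocks is taken to be $V = \lceil\log(1/\delta)\rceil$ (the text only says $V \approx \log\delta^{-1}$; analyses often use a larger constant multiple), the blocks are contiguous with nearly equal sizes, and the function name `median_of_means` is hypothetical.

```python
import math
import statistics


def median_of_means(xs, delta):
    """Median-of-means estimate of the mean of the sample xs.

    Assumption (illustrative only): V = ceil(log(1/delta)) blocks, capped at
    len(xs), formed from contiguous runs of nearly equal size.
    """
    n = len(xs)
    V = min(n, max(1, math.ceil(math.log(1.0 / delta))))
    base, extra = divmod(n, V)
    block_means = []
    start = 0
    for k in range(V):
        size = base + (1 if k < extra else 0)  # the first `extra` blocks get one more point
        block = xs[start:start + size]
        block_means.append(sum(block) / size)  # empirical mean of block k
        start += size
    # The median of the block means is the estimator.
    return statistics.median(block_means)
```

With `delta = 0.05` there are three blocks, so `median_of_means([1, 1, 1, 1, 1, 1e9], 0.05)` returns `1.0`, while the empirical mean of the same sample is about `1.7e8`: a single extreme value can corrupt at most one block mean.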
2 Robust U-estimation

In this section we introduce a "median-of-means"-style estimator of $m_h = \mathbb{E}h(X_1,\dots,X_m)$. To define the estimator, one divides the data into $V$ blocks. For any $m$-tuple of different blocks, one may compute a (decoupled) U-statistic. Finally, one computes the median of all the obtained values. The rigorous definition is as follows.

The estimator has a parameter $V$, the number of blocks. A partition $\mathcal{B} = (B_1,\dots,B_V)$ of $\{1,\dots,n\}$ is called regular if for all $K = 1,\dots,V$,

$$\left||B_K| - \frac{n}{V}\right| \le 1.$$
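A regular partition is easy to construct: giving each block either $\lfloor n/V\rfloor$ or $\lceil n/V\rceil$ indices keeps every $|B_K|$ within distance 1 of $n/V$. The sketch below, with the hypothetical helper name `regular_partition`, uses contiguous 0-based index ranges (the paper indexes from 1); this is only one convenient choice, since any assignment with these block sizes is regular.

```python
def regular_partition(n, V):
    """Split indices 0..n-1 into V contiguous blocks of size floor(n/V) or ceil(n/V)."""
    if not 1 <= V <= n:
        raise ValueError("need 1 <= V <= n")
    base, extra = divmod(n, V)
    blocks, start = [], 0
    for k in range(V):
        size = base + (1 if k < extra else 0)  # sizes differ from n/V by less than 1
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks
```

For instance, `regular_partition(10, 3)` returns blocks of sizes 4, 3 and 3, each within 1 of $10/3$.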

For any $B_{i_1},\dots,B_{i_m}$ in $\mathcal{B}$, we set

$$I_{B_{i_1},\dots,B_{i_m}} = \{(k_1,\dots,k_m) : k_j \in B_{i_j}\}$$

and

$$U_{B_{i_1},\dots,B_{i_m}}(h) = \frac{1}{|B_{i_1}|\cdots|B_{i_m}|}\sum_{(k_1,\dots,k_m)\in I_{B_{i_1},\dots,B_{i_m}}} h(X_{k_1},\dots,X_{k_m}).$$

For any integer $N$ and any vector $(a_1,\dots,a_N)\in\mathbb{R}^N$, we define the median $\mathrm{Med}(a_1,\dots,a_N)$ as any number $b$ such that

$$|\{i \le N : a_i \le b\}| \ge \frac{N}{2} \quad\text{and}\quad |\{i \le N : a_i \ge b\}| \ge \frac{N}{2}.$$

Finally, we define the robust estimator

$$U_{\mathcal{B}}(h) = \mathrm{Med}\{U_{B_{i_1},\dots,B_{i_m}}(h) : i_j \in \{1,\dots,V\},\ 1 \le i_1 < \dots < i_m \le V\}. \tag{6}$$

Note that, mostly in order to simplify notation, we only take into account those values of $U_{B_{i_1},\dots,B_{i_m}}(h)$ that correspond to distinct indices $i_1 < \dots < i_m$. Thus, each $U_{B_{i_1},\dots,B_{i_m}}(h)$ is a so-called decoupled U-statistic (see the Appendix for the definition). One may incorporate all $m$-tuples (not necessarily with distinct indices) in the computation of the median. However, this has a minor effect on the performance; similar bounds may be proven, though with more complicated notation.

A simpler alternative is obtained by taking only diagonal blocks into account. More precisely, let $U_{B_i}(h)$ be the U-statistic calculated using the variables in block $B_i$ (as defined in (1)). One may simply calculate the median of the $V$ different U-statistics $U_{B_i}(h)$. This version is easy to analyze because $|\{i \le V : U_{B_i}(h) \le b\}|$ is a sum of independent random variables. However, this simple version is wasteful in the sense that only a small fraction of the possible $m$-tuples are taken into account.

In the next two sections we analyze the performance of the estimator $U_{\mathcal{B}}(h)$.

2.1 Exponential inequalities for P-degenerate kernels with finite variance

Next we present a performance bound for the estimator $U_{\mathcal{B}}(h)$ in the case when $\sigma^2$ is finite. The somewhat more complicated case of infinite second moment is treated in Section 2.2.

Theorem 1. Let $X_1,\dots,X_n$ be i.i.d. random variables taking values in $\mathcal{X}$. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a symmetric kernel that is P-degenerate of order $q-1$. Assume $\mathrm{Var}(h(X_1,\dots,X_m)) = \sigma^2 < \infty$. Let $\delta \in (0,1/2)$ be such that $\lceil\log(1/\delta)\rceil \le n/(64m)$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 32m\lceil\log(1/\delta)\rceil$.
Then, with probability at least $1-2\delta$, we have

$$|U_{\mathcal{B}}(h) - m_h| \le K_m\,\sigma\left(\frac{\lceil\log(1/\delta)\rceil}{n}\right)^{q/2}, \tag{7}$$

where $K_m = 2^{m+1}(32m)^{m/2}$.
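The estimator (6) with the block count of Theorem 1 can be sketched as follows. This is an illustrative brute-force implementation under stated assumptions, not code from the paper: the kernel `h` is a Python function of `m` arguments, `V = 32 * m * ceil(log(1/delta))` follows Theorem 1, and the helper names `regular_partition`, `decoupled_block_u` and `robust_u_estimate` are hypothetical. Python's `statistics.median` averages the two middle values when the count is even; that value satisfies the definition of $\mathrm{Med}$ above, since at least half of the values lie on each side of it.

```python
import itertools
import math
import statistics


def regular_partition(n, V):
    """Contiguous 0-based blocks of size floor(n/V) or ceil(n/V) (a regular partition)."""
    base, extra = divmod(n, V)
    blocks, start = [], 0
    for k in range(V):
        size = base + (1 if k < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def decoupled_block_u(h, xs, blocks):
    """U_{B_{i_1},...,B_{i_m}}(h): average of h over all (k_1,...,k_m) with k_j in the j-th block."""
    total, count = 0.0, 0
    for ks in itertools.product(*blocks):
        total += h(*(xs[k] for k in ks))
        count += 1
    return total / count


def robust_u_estimate(h, m, xs, delta):
    """Median of the decoupled block U-statistics over block indices i_1 < ... < i_m, as in (6).

    Assumption: V = 32 * m * ceil(log(1/delta)), as in Theorem 1, which requires n >= 2V.
    """
    n = len(xs)
    V = 32 * m * math.ceil(math.log(1.0 / delta))
    if n < 2 * V:
        raise ValueError("Theorem 1 needs ceil(log(1/delta)) <= n / (64 m)")
    blocks = regular_partition(n, V)
    values = [decoupled_block_u(h, xs, [blocks[i] for i in idx])
              for idx in itertools.combinations(range(V), m)]
    return statistics.median(values)
```

Each decoupled U-statistic averages $h$ over $|B_{i_1}|\cdots|B_{i_m}|$ tuples and there are $\binom{V}{m}$ of them, so this direct version is only practical for small $m$. With $m = 2$, $\delta = 0.05$ and $n = 384$ there are $V = 192$ blocks of size 2; a single huge observation affects only the 191 block pairs containing its block, out of 18336, so the median is unaffected.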

When $q = m$, the kernel $h - m_h$ is P-canonical and the rate of convergence is then given by $(\log\delta^{-1}/n)^{m/2}$. Thus, the new estimator has a performance similar to that of standard U-statistics as in (3) and (4), but without the boundedness assumption on the kernel. It is important to note that a disadvantage of the estimator $U_{\mathcal{B}}(h)$ is that it depends on the confidence level $\delta$ (through the number of blocks). For different confidence levels, different estimators are used.

Because of its importance in applications, we spell out the special case when $m = q = 2$. In Section 3 we use this result in an example of cluster analysis.

Corollary 2. Let $\delta \in (0,1/2)$. Let $h : \mathcal{X}^2 \to \mathbb{R}$ be a P-canonical kernel with $\sigma^2 = \mathrm{Var}(h(X_1,X_2))$ and let $n \ge 128(1+\log(1/\delta))$. Then, with probability at least $1-2\delta$,

$$|U_{\mathcal{B}}(h) - m_h| \le 512\,\sigma\,\frac{1+\log(1/\delta)}{n}. \tag{8}$$

In the proof of Theorem 1 we need the notion of Hoeffding decomposition (Hoeffding, 1948) of U-statistics. For probability measures $P_1,\dots,P_m$, define $P_1\times\cdots\times P_m h = \int h\,d(P_1\times\cdots\times P_m)$. For a symmetric kernel $h : \mathcal{X}^m \to \mathbb{R}$, the Hoeffding projections are defined, for $0 \le k \le m$ and $x_1,\dots,x_k \in \mathcal{X}$, as

$$\pi_k h(x_1,\dots,x_k) := (\delta_{x_1} - P)\times\cdots\times(\delta_{x_k} - P)\times P^{m-k}h,$$

where $\delta_x$ denotes the Dirac measure at the point $x$. Observe that $\pi_0 h = P^m h$ and, for $k > 0$, $\pi_k h$ is a P-canonical kernel. The kernel $h$ can be decomposed as

$$h(x_1,\dots,x_m) = \sum_{k=0}^{m}\ \sum_{1 \le i_1 < \dots < i_k \le m}\pi_k h(x_{i_1},\dots,x_{i_k}). \tag{9}$$

If $h$ is assumed to be square-integrable (i.e., $P^m h^2 < \infty$), the terms in (9) are orthogonal. If $h$ is degenerate of order $q-1$, then $\pi_k h = 0$ for any $1 \le k \le q-1$.

Proof of Theorem 1. We begin with a weak concentration result on each $U_{B_{i_1},\dots,B_{i_m}}(h)$. Let $B_{i_1},\dots,B_{i_m}$ be elements of $\mathcal{B}$. For any $B \in \mathcal{B}$, we have $\frac{n}{2|\mathcal{B}|} \le |B| \le \frac{2n}{|\mathcal{B}|}$. We denote by $\mathbf{k} = (k_1,\dots,k_m)$ an element of $I_{B_{i_1},\dots,B_{i_m}}$. We have, by the above-mentioned orthogonality
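property, writing $I = I_{B_{i_1},\dots,B_{i_m}}$ for brevity,

$$\begin{aligned}
\mathrm{Var}\big(U_{B_{i_1},\dots,B_{i_m}}(h)\big) &= \mathbb{E}\Big[\big(U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h\big)^2\Big] \\
&= \frac{1}{(|B_{i_1}|\cdots|B_{i_m}|)^2}\sum_{\mathbf{k}\in I}\sum_{\mathbf{l}\in I}\mathbb{E}\Big[\big(h(X_{k_1},\dots,X_{k_m}) - P^m h\big)\big(h(X_{l_1},\dots,X_{l_m}) - P^m h\big)\Big] \\
&= \frac{1}{(|B_{i_1}|\cdots|B_{i_m}|)^2}\sum_{\mathbf{k}\in I}\sum_{\mathbf{l}\in I}\sum_{s=q}^{m}\binom{|\mathbf{k}\cap\mathbf{l}|}{s}\mathbb{E}\big[\pi_s h(X_1,\dots,X_s)^2\big] \qquad\text{(by orthogonality)} \\
&\le \frac{1}{(|B_{i_1}|\cdots|B_{i_m}|)^2}\sum_{\mathbf{k}\in I}\sum_{s=q}^{m}\sum_{t=0}^{m}\binom{t}{s}\mathbb{E}\big[\pi_s h(X_1,\dots,X_s)^2\big]\binom{m}{t}\Big(\frac{2n}{|\mathcal{B}|}\Big)^{m-t}.
\end{aligned}$$

The last inequality is obtained by counting, for any fixed $\mathbf{k}$ and $t$, the number of elements $\mathbf{l}$ such that $|\mathbf{k}\cap\mathbf{l}| = t$. Thus,

$$\begin{aligned}
\mathrm{Var}\big(U_{B_{i_1},\dots,B_{i_m}}(h)\big) &\le \frac{1}{|B_{i_1}|\cdots|B_{i_m}|}\sum_{s=q}^{m}\sum_{t=q}^{m}\binom{t}{s}\binom{m}{t}\Big(\frac{2n}{|\mathcal{B}|}\Big)^{m-t}\mathbb{E}\big[\pi_s h(X_1,\dots,X_s)^2\big] \\
&\le \frac{2^{2m-q+1}|\mathcal{B}|^q}{n^q}\sum_{s=q}^{m}\binom{m}{s}\mathbb{E}\big[\pi_s h(X_1,\dots,X_s)^2\big].
\end{aligned}$$

On the other hand, we have, by (9),

$$\begin{aligned}
\mathrm{Var}(h) &= \mathbb{E}\Bigg[\Big(\sum_{s=q}^{m}\ \sum_{1\le i_1<\dots<i_s\le m}\pi_s h(X_{i_1},\dots,X_{i_s})\Big)^2\Bigg] \\
&= \sum_{s=q}^{m}\ \sum_{1\le i_1<\dots<i_s\le m}\mathbb{E}\big[(\pi_s h(X_{i_1},\dots,X_{i_s}))^2\big] \\
&= \sum_{s=q}^{m}\binom{m}{s}\mathbb{E}\big[(\pi_s h(X_1,\dots,X_s))^2\big].
\end{aligned}$$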
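Combining the two displayed equations above,

$$\mathrm{Var}\big(U_{B_{i_1},\dots,B_{i_m}}(h)\big) \le \frac{2^{2m-q+1}|\mathcal{B}|^q}{n^q}\sigma^2 \le \frac{2^{2m}|\mathcal{B}|^q}{n^q}\sigma^2.$$

By Chebyshev's inequality, for all $r \in (0,1)$,

$$\mathbb{P}\left\{\big|U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h\big| > \frac{2^m\sigma|\mathcal{B}|^{q/2}}{n^{q/2}r^{1/2}}\right\} \le r. \tag{10}$$

We set $x = \frac{2^m\sigma|\mathcal{B}|^{q/2}}{n^{q/2}r^{1/2}}$ and

$$N_x = \Big|\Big\{(i_1,\dots,i_m) \in \{1,\dots,V\}^m : 1 \le i_1 < \dots < i_m \le |\mathcal{B}|,\ \big|U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h\big| > x\Big\}\Big|.$$

The random variable $\binom{|\mathcal{B}|}{m}^{-1}N_x$ is a U-statistic of order $m$ with the symmetric kernel $g : (i_1,\dots,i_m) \mapsto \mathbb{1}_{\{|U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h| > x\}}$. Thus, Hoeffding's inequality for centered U-statistics (2) gives

$$\mathbb{P}\left\{N_x - \mathbb{E}N_x \ge t\binom{|\mathcal{B}|}{m}\right\} \le \exp\left(-\frac{|\mathcal{B}|t^2}{2m}\right). \tag{11}$$

By (10) we have $\mathbb{E}N_x \le \binom{|\mathcal{B}|}{m}r$. Taking $t = r = 1/4$ in (11), by the definition of the median, we have

$$\mathbb{P}\{U_{\mathcal{B}}(h) - P^m h > x\} \le \mathbb{P}\left\{N_x \ge \frac{1}{2}\binom{|\mathcal{B}|}{m}\right\} \le \exp\left(-\frac{|\mathcal{B}|}{32m}\right).$$

Since $|\mathcal{B}| = 32m\lceil\log(\delta^{-1})\rceil$, the right-hand side is at most $\delta$, so with probability at least $1-\delta$ we have

$$U_{\mathcal{B}}(h) - P^m h \le K_m\sigma\left(\frac{\lceil\log\delta^{-1}\rceil}{n}\right)^{q/2}$$

with $K_m = 2^{m+1}(32m)^{m/2}$. The upper bound for the lower tail holds by the same argument.

2.2 Bounded moment of order $p$ with $1 < p \le 2$

In this section, we weaken the assumption of finite variance and only assume the existence of a centered moment of order $p$ for some $1 < p \le 2$. The outline of the argument is similar to the case of finite variance. First we obtain a weak concentration inequality for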
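the U-statistic in each block, and then we use the property of the median to boost the weak inequality. While for the case of finite variance weak concentration could be proved by a direct calculation of the variance, here we need the randomization inequalities for convex functions of U-statistics established by de la Peña (1992) and Arcones and Giné (1993). Note that, here, a technical assumption (P-canonicity of $h - m_h$) is needed.

Theorem 3. Let $h$ be a symmetric kernel of order $m$ such that $h - m_h$ is P-canonical. Assume that $M_p := \mathbb{E}\big[|h(X_1,\dots,X_m) - m_h|^p\big]^{1/p} < \infty$ for some $1 < p \le 2$. Let $\delta \in (0,1/2)$ be such that $\lceil\log(\delta^{-1})\rceil \le n/(64m)$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 32m\lceil\log(\delta^{-1})\rceil$. Then, with probability at least $1-2\delta$, we have

$$|U_{\mathcal{B}}(h) - m_h| \le K_m M_p\left(\frac{\lceil\log(\delta^{-1})\rceil}{n}\right)^{\frac{m(p-1)}{p}}, \tag{12}$$

where $K_m = 2^{4m+1}m^{m/2}$.

Proof. Define the centered version of $h$ by $g(x_1,\dots,x_m) := h(x_1,\dots,x_m) - m_h$. Let $\varepsilon_1,\dots,\varepsilon_n$ be i.i.d. Rademacher random variables (i.e., $\mathbb{P}\{\varepsilon_1 = -1\} = \mathbb{P}\{\varepsilon_1 = 1\} = 1/2$), independent of $X_1,\dots,X_n$. Write $I = I_{B_{i_1},\dots,B_{i_m}}$ and, for $\mathbf{k} = (k_1,\dots,k_m)$, $g(X_{\mathbf{k}}) = g(X_{k_1},\dots,X_{k_m})$ and $\varepsilon_{\mathbf{k}} = \varepsilon_{k_1}\cdots\varepsilon_{k_m}$. By the randomization inequalities (see de la Peña and Giné (1999) and also Theorem 8 in the Appendix), we have

$$\begin{aligned}
\mathbb{E}\Big|\sum_{\mathbf{k}\in I}g(X_{\mathbf{k}})\Big|^p &\le 2^{mp}\,\mathbb{E}_X\mathbb{E}_\varepsilon\Big|\sum_{\mathbf{k}\in I}\varepsilon_{\mathbf{k}}\,g(X_{\mathbf{k}})\Big|^p \qquad (13)\\
&\le 2^{mp}\,\mathbb{E}_X\Big(\mathbb{E}_\varepsilon\Big|\sum_{\mathbf{k}\in I}\varepsilon_{\mathbf{k}}\,g(X_{\mathbf{k}})\Big|^2\Big)^{p/2} \\
&= 2^{mp}\,\mathbb{E}_X\Big(\sum_{\mathbf{k}\in I}g(X_{\mathbf{k}})^2\Big)^{p/2} \\
&\le 2^{mp}\sum_{\mathbf{k}\in I}\mathbb{E}|g(X_{\mathbf{k}})|^p \\
&= 2^{mp}\,|B_{i_1}|\cdots|B_{i_m}|\;\mathbb{E}|g|^p. \qquad (14)
\end{aligned}$$

(The second step is Jensen's inequality since $p/2 \le 1$, the third uses that the products $\varepsilon_{\mathbf{k}}$, $\mathbf{k}\in I$, are orthonormal, and the fourth uses $(\sum_i a_i)^{p/2} \le \sum_i a_i^{p/2}$ for $a_i \ge 0$.)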

Thus, we have $\mathbb{E}\big[|U_{B_{i_1},\dots,B_{i_m}}(h) - m_h|^p\big] \le 2^{mp}(|B_{i_1}|\cdots|B_{i_m}|)^{1-p}\,\mathbb{E}|g|^p$ and, by Markov's inequality,

$$\mathbb{P}\left\{\big|U_{B_{i_1},\dots,B_{i_m}}(h) - m_h\big| > \frac{2^m M_p}{r^{1/p}}\left(\frac{2|\mathcal{B}|}{n}\right)^{m\frac{p-1}{p}}\right\} \le r. \tag{15}$$

Another use of (11) with $t = r = 1/4$ gives

$$|U_{\mathcal{B}}(h) - P^m h| \le 2^{4m+1}m^{m/2}M_p\left(\frac{\lceil\log\delta^{-1}\rceil}{n}\right)^{m\frac{p-1}{p}}.$$

To see why the bound of Theorem 3 gives essentially the right order of magnitude, consider again the example described in the introduction, when $m = 2$, $h(X_1,X_2) = X_1X_2$, and the $X_i$ have an $\alpha$-stable law $S(\gamma,\alpha)$ for some $\gamma > 0$ and $1 < \alpha < 2$. Note that an $\alpha$-stable random variable has finite moments up to (but not including) $\alpha$, and therefore we may take any $p = \alpha - \epsilon$ for any $\epsilon \in (0,\alpha-1)$. As we noted in the introduction, there exists a constant $c$ depending on $\alpha$ and $\gamma$ only such that for all $1 \le i_1 < i_2 \le V$,

$$\mathbb{P}\left\{\big|U_{B_{i_1},B_{i_2}}(h) - m_h\big| \ge \left(\frac{n}{|\mathcal{B}|}\right)^{2/\alpha-2}\right\} \ge c^{2/3},$$

and therefore (15) is essentially the best rate one can hope for.

3 Cluster analysis with U-statistics

In this section we illustrate the use of the proposed mean estimator in a clustering problem in which the presence of possibly heavy-tailed data requires robust techniques.

We consider the general statistical framework defined by Clémençon (2014), described as follows. Let $X, X'$ be i.i.d. random variables taking values in $\mathcal{X}$ where, typically but not necessarily, $\mathcal{X}$ is a subset of $\mathbb{R}^d$. For a partition $\mathcal{P}$ of $\mathcal{X}$ into $K$ disjoint sets, the so-called cells, define

$$\Phi_{\mathcal{P}}(x,x') = \sum_{C\in\mathcal{P}}\mathbb{1}_{\{(x,x')\in C^2\}},$$

the $\{0,1\}$-valued function that indicates whether two elements $x$ and $x'$ belong to the same cell $C$. Given a dissimilarity measure $D : \mathcal{X}^2 \to \mathbb{R}^+$, the clustering task consists in finding a partition of $\mathcal{X}$ minimizing the clustering risk

$$W(\mathcal{P}) = \mathbb{E}\big[D(X,X')\,\Phi_{\mathcal{P}}(X,X')\big].$$

Let $\Pi_K$ be a finite class of partitions $\mathcal{P}$ of $\mathcal{X}$ into $K$ cells and define $W^* = \min_{\mathcal{P}\in\Pi_K}W(\mathcal{P})$. Given i.i.d. random variables $X_1,\dots,X_n$ distributed as $X$, the goal is to find a partition $\mathcal{P} \in \Pi_K$ with risk as close to $W^*$ as possible. A natural idea (and this is the
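approach of Clémençon (2014)) is to estimate $W(\mathcal{P})$ by the U-statistic

$$\widehat{W}_n(\mathcal{P}) = \frac{2}{n(n-1)}\sum_{1\le i<j\le n}D(X_i,X_j)\,\Phi_{\mathcal{P}}(X_i,X_j)$$

and to choose a partition minimizing the empirical clustering risk $\widehat{W}_n(\mathcal{P})$. Clémençon (2014) uses the theory of U-processes to analyze the performance of such minimizers of U-statistics. However, in order to control uniform deviations of the form $\sup_{\mathcal{P}\in\Pi_K}|\widehat{W}_n(\mathcal{P}) - W(\mathcal{P})|$, exponential concentration inequalities are needed for U-statistics. This restricts one to bounded dissimilarity measures $D(X,X')$. When $D(X,X')$ may have a heavy tail, we propose to replace U-statistics by the median-of-means estimators of $W(\mathcal{P})$ introduced in this paper.

Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ and define the median-of-means estimator $W_{\mathcal{B}}(\mathcal{P})$ of $W(\mathcal{P})$ as in (6). Then Theorem 1 applies and we have the following simple corollary.

Corollary 4. Let $\Pi_K$ be a class of partitions of cardinality $|\Pi_K| = N$. Assume that $\sigma^2 := \mathbb{E}\big[D(X_1,X_2)^2\big] < \infty$. Let $\delta \in (0,1/2)$ be such that $n \ge 128\lceil\log(N/\delta)\rceil$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 64\lceil\log(N/\delta)\rceil$. Then there exists a constant $C$ such that, with probability at least $1-2\delta$,

$$\sup_{\mathcal{P}\in\Pi_K}\big|W_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P})\big| \le C\sigma\left(\frac{\lceil\log(N/\delta)\rceil}{n}\right)^{1/2}. \tag{16}$$

Proof. Since $\Phi_{\mathcal{P}}(x,x')$ is bounded by 1, $\mathrm{Var}\big(D(X_1,X_2)\Phi_{\mathcal{P}}(X_1,X_2)\big) \le \mathbb{E}\big[D(X_1,X_2)^2\big]$. For a fixed $\mathcal{P}\in\Pi_K$, Theorem 1 applies with $m = 2$ and $q = 1$. The inequality follows from the union bound.

Once uniform deviations of $W_{\mathcal{B}}(\mathcal{P})$ from its expected value are controlled, it is a routine exercise to derive performance bounds for clustering based on minimizing $W_{\mathcal{B}}(\mathcal{P})$ over $\mathcal{P}\in\Pi_K$. Let $\widehat{\mathcal{P}} = \mathrm{argmin}_{\mathcal{P}\in\Pi_K}W_{\mathcal{B}}(\mathcal{P})$ denote the empirical minimizer. (In case of multiple minimizers, one may select one arbitrarily.) Now for any $\mathcal{P}_0\in\Pi_K$,

$$\begin{aligned}
W(\widehat{\mathcal{P}}) - W^* &= W(\widehat{\mathcal{P}}) - W_{\mathcal{B}}(\widehat{\mathcal{P}}) + W_{\mathcal{B}}(\widehat{\mathcal{P}}) - W^* \\
&\le W(\widehat{\mathcal{P}}) - W_{\mathcal{B}}(\widehat{\mathcal{P}}) + W_{\mathcal{B}}(\mathcal{P}_0) - W(\mathcal{P}_0) + W(\mathcal{P}_0) - W^* \\
&\le 2\sup_{\mathcal{P}\in\Pi_K}\big|W_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P})\big| + W(\mathcal{P}_0) - W^*.
\end{aligned}$$

Taking the infimum over $\Pi_K$,

$$W(\widehat{\mathcal{P}}) - W^* \le 2\sup_{\mathcal{P}\in\Pi_K}\big|W_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P})\big|. \tag{17}$$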
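The selection rule analyzed above (estimate $W(\mathcal{P})$ by a median-of-means estimator for every $\mathcal{P}$ in the finite class $\Pi_K$ and return a minimizer $\widehat{\mathcal{P}}$) can be sketched as follows. This is a hypothetical illustration, not code from the paper: each partition is represented by a labelling function, so that $\Phi_{\mathcal{P}}(x,x')$ becomes `label(x) == label(x')`; $W_{\mathcal{B}}(\mathcal{P})$ is computed as in (6) with $m = 2$; and $|\mathcal{B}| = 64\lceil\log(N/\delta)\rceil$ as in Corollary 4. Ties are broken by taking the first minimizer, one of the arbitrary choices allowed in the text. The function names are assumptions.

```python
import itertools
import math
import statistics


def regular_partition(n, V):
    """Contiguous 0-based blocks of size floor(n/V) or ceil(n/V)."""
    base, extra = divmod(n, V)
    blocks, start = [], 0
    for k in range(V):
        size = base + (1 if k < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def mom_clustering_risk(label, D, xs, V):
    """W_B(P): median over block pairs i1 < i2 of the decoupled U-statistic of D * Phi_P."""
    blocks = regular_partition(len(xs), V)
    values = []
    for b1, b2 in itertools.combinations(blocks, 2):
        # Phi_P(x, x') = 1 exactly when x and x' receive the same cell label.
        s = sum(D(xs[k], xs[l]) for k in b1 for l in b2 if label(xs[k]) == label(xs[l]))
        values.append(s / (len(b1) * len(b2)))
    return statistics.median(values)


def select_partition(partitions, D, xs, delta):
    """Index of the candidate partition minimizing the median-of-means risk estimate."""
    N = len(partitions)
    V = 64 * math.ceil(math.log(N / delta))
    if len(xs) < 2 * V:
        raise ValueError("Corollary 4 needs n >= 128 * ceil(log(N / delta))")
    risks = [mom_clustering_risk(label, D, xs, V) for label in partitions]
    return min(range(N), key=risks.__getitem__)
```

For example, with `xs = [0.0, 10.0] * 256`, the squared distance as `D`, and the two candidates `lambda x: x < 5` (two cells) and `lambda x: True` (a single cell), every decoupled block estimate equals 0 for the first partition and 50 for the second, so `select_partition` returns index 0.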

Finally, (16) implies that

$$W(\widehat{\mathcal{P}}) - W^* \le 2C\sigma\left(\frac{1+\log(N/\delta)}{n}\right)^{1/2}.$$

This result is to be compared with Theorem 2 of Clémençon (2014). Our result holds under the only assumption that $D(X,X')$ has a finite second moment. (This may be weakened to assuming the existence of a finite $p$-th moment for some $1 < p \le 2$ by using Theorem 3.) On the other hand, our result holds only for a finite class of partitions, while Clémençon (2014) uses the theory of U-processes to obtain more sophisticated bounds for uniform deviations over possibly infinite classes of partitions. It remains a challenge to develop a theory to control processes of median-of-means estimators in the style of Arcones and Giné (1993), without having to resort to simple union bounds.

In the rest of this section we show that, under certain low-noise assumptions analogous to the ones introduced by Mammen and Tsybakov (1999) in the context of classification, one obtains faster rates of convergence. In this part we need bounds for P-canonical kernels and use the full power of Corollary 2. Similar arguments for the study of minimizing U-statistics appear in Clémençon et al. (2008) and Clémençon (2014). We assume the following conditions, also considered by Clémençon (2014):

1. There exists $\mathcal{P}^*$ such that $W(\mathcal{P}^*) = W^*$.
2. There exist $\alpha \in [0,1]$ and $\kappa < \infty$ such that for all $\mathcal{P}\in\Pi_K$ and for all $x \in \mathcal{X}$, $\mathbb{P}\{\Phi_{\mathcal{P}}(x,X') \ne \Phi_{\mathcal{P}^*}(x,X')\} \le \kappa\,(W(\mathcal{P}) - W^*)^\alpha$.

Note that $\alpha \le 2$ since, by the Cauchy-Schwarz inequality,

$$W(\mathcal{P}) - W^* \le \mathbb{E}\big[D(X_1,X_2)^2\big]^{1/2}\,\mathbb{P}\{\Phi_{\mathcal{P}}(X_1,X_2) \ne \Phi_{\mathcal{P}^*}(X_1,X_2)\}^{1/2}.$$

Corollary 5. Assume the conditions above and that $\sigma^2 := \mathbb{E}\big[D(X_1,X_2)^2\big] < \infty$. Let $\delta \in (0,1/2)$ be such that $n \ge 128\lceil\log(N/\delta)\rceil$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 64\lceil\log(N/\delta)\rceil$. Then there exists a constant $C$ such that, with probability at least $1-2\delta$,

$$W(\widehat{\mathcal{P}}) - W^* \le C\sigma^{2/(2-\alpha)}\left(\frac{\lceil\log(N/\delta)\rceil}{n}\right)^{1/(2-\alpha)}. \tag{18}$$

The proof of Corollary 5 is postponed to the Appendix.

4 Appendix

4.1 Decoupling and randomization

Here we summarize some of the key tools for analyzing U-statistics that we use in the paper. For an excellent exposition we refer to de la Peña and Giné (1999).

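Let $\{X_i\}$ be i.i.d. random variables taking values in $\mathcal{X}$ and let $\{X_i^k\}$, $k = 1,\dots,m$, be sequences of independent copies. Let $\Phi$ be a non-negative function. As a corollary of a theorem in de la Peña and Giné (1999), we have the following.

Theorem 6. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a measurable function with $\mathbb{E}|h(X_1,\dots,X_m)| < \infty$. Let $\Phi : [0,\infty) \to [0,\infty)$ be a convex nondecreasing function such that $\mathbb{E}\Phi(|h(X_1,\dots,X_m)|) < \infty$. Then

$$\mathbb{E}\Phi\Big(\Big|\sum_{I_n^m}h(X_{i_1},\dots,X_{i_m})\Big|\Big) \le \mathbb{E}\Phi\Big(C_m\Big|\sum_{I_n^m}h(X^1_{i_1},\dots,X^m_{i_m})\Big|\Big),$$

where $C_m = 2^m(m^m-1)\big((m-1)^{m-1}-1\big)\times\cdots\times 3$. Moreover, if the kernel $h$ is symmetric, then

$$\mathbb{E}\Phi\Big(c_m\Big|\sum_{I_n^m}h(X^1_{i_1},\dots,X^m_{i_m})\Big|\Big) \le \mathbb{E}\Phi\Big(\Big|\sum_{I_n^m}h(X_{i_1},\dots,X_{i_m})\Big|\Big),$$

where $c_m = 1/\big(2^{2m-2}(m-1)!\big)$.

An equivalent result for tail probabilities of U-statistics is the following (see de la Peña and Giné (1999)).

Theorem 7. Under the same hypotheses as Theorem 6, there exists a constant $C_m$ depending on $m$ only such that, for all $t > 0$,

$$\mathbb{P}\Big\{\Big|\sum_{I_n^m}h(X_{i_1},\dots,X_{i_m})\Big| > t\Big\} \le C_m\,\mathbb{P}\Big\{C_m\Big|\sum_{I_n^m}h(X^1_{i_1},\dots,X^m_{i_m})\Big| > t\Big\}.$$

If, moreover, the kernel $h$ is symmetric, then there exists a constant $c_m$ depending on $m$ only such that, for all $t > 0$,

$$c_m\,\mathbb{P}\Big\{c_m\Big|\sum_{I_n^m}h(X^1_{i_1},\dots,X^m_{i_m})\Big| > t\Big\} \le \mathbb{P}\Big\{\Big|\sum_{I_n^m}h(X_{i_1},\dots,X_{i_m})\Big| > t\Big\}.$$

The next theorem is a direct corollary of a theorem in de la Peña and Giné (1999).

Theorem 8. Let $1 < p \le 2$. Let $(\varepsilon_i)_i$ be i.i.d. Rademacher random variables independent of the $(X_i)_i$. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a P-degenerate measurable function such that $\mathbb{E}\big(|h(X_1,\dots,X_m)|^p\big) < \infty$. Then

$$c_m\,\mathbb{E}\Big|\sum_{I_n^m}\varepsilon_{i_1}\cdots\varepsilon_{i_m}h(X_{i_1},\dots,X_{i_m})\Big|^p \le \mathbb{E}\Big|\sum_{I_n^m}h(X_{i_1},\dots,X_{i_m})\Big|^p \le C_m\,\mathbb{E}\Big|\sum_{I_n^m}\varepsilon_{i_1}\cdots\varepsilon_{i_m}h(X_{i_1},\dots,X_{i_m})\Big|^p,$$

where $C_m = 2^{mp}$ and $c_m = 2^{-mp}$.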

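The same conclusion holds for decoupled U-statistics.

4.2 α-stable distributions

Proposition 9. Let $\alpha \in (0,2)$. Let $X_1,\dots,X_n$ be i.i.d. random variables of law $S(\gamma,\alpha)$. Let $f_{\gamma,\alpha} : \mathbb{R} \to \mathbb{R}$ be the density function of $X_1$. Let $S_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then

(i) $f_{\gamma,\alpha}$ is an even function.
(ii) $f_{\gamma,\alpha}(x) \sim \alpha\gamma^\alpha c_\alpha\,x^{-\alpha-1}$ as $x \to +\infty$, with $c_\alpha = \sin(\pi\alpha/2)\,\Gamma(\alpha)/\pi$.
(iii) $\mathbb{E}\big[|X_1|^p\big]$ is finite for any $p < \alpha$ and is infinite whenever $p \ge \alpha$.
(iv) $S_n$ has an $\alpha$-stable law $S(\gamma n^{1/\alpha-1},\alpha)$.

Proof. (i) and (iv) follow directly from the definition. (ii) is proved in the introduction of Zolotarev (1986). (iii) is a consequence of (ii).

4.3 Proof of Corollary 5

Define $\Lambda_n(\mathcal{P}) = \widehat{W}_n(\mathcal{P}) - \widehat{W}_n(\mathcal{P}^*)$, the U-statistic based on the sample $X_1,\dots,X_n$ with symmetric kernel $h_{\mathcal{P}}(x,x') = D(x,x')\big(\Phi_{\mathcal{P}}(x,x') - \Phi_{\mathcal{P}^*}(x,x')\big)$. We denote by $\Lambda(\mathcal{P}) = W(\mathcal{P}) - W^*$ the expected value of $\Lambda_n(\mathcal{P})$. The main argument in the following analysis is based on the Hoeffding decomposition. For all partitions $\mathcal{P}$,

$$\Lambda_n(\mathcal{P}) - \Lambda(\mathcal{P}) = 2L_n(\mathcal{P}) + M_n(\mathcal{P})$$

for $L_n(\mathcal{P}) = \frac{1}{n}\sum_i h^{(1)}(X_i)$ with $h^{(1)}(x) = \mathbb{E}[h_{\mathcal{P}}(X,x)] - \Lambda(\mathcal{P})$, and $M_n(\mathcal{P})$ the U-statistic based on the canonical kernel given by $h^{(2)}(x,x') = h_{\mathcal{P}}(x,x') - h^{(1)}(x) - h^{(1)}(x') - \Lambda(\mathcal{P})$.

Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$. For any $B\in\mathcal{B}$, $\Lambda_B(\mathcal{P})$ is the U-statistic with kernel $h_{\mathcal{P}}$ restricted to the set $B$, and $\Lambda_{\mathcal{B}}(\mathcal{P})$ is the median of the sequence $(\Lambda_B(\mathcal{P}))_{B\in\mathcal{B}}$. We define $L_B(\mathcal{P})$ and $M_B(\mathcal{P})$ similarly on the variables $(X_i)_{i\in B}$. For any $B\in\mathcal{B}$,

$$\mathrm{Var}(\Lambda_B(\mathcal{P})) = 4\,\mathrm{Var}(L_B(\mathcal{P})) + \mathrm{Var}(M_B(\mathcal{P})) = \frac{4}{|B|}\mathrm{Var}\big(h^{(1)}(X)\big) + \frac{2}{|B|(|B|-1)}\mathrm{Var}\big(h^{(2)}(X_1,X_2)\big).$$

Simple computations show that $\mathrm{Var}\big(h^{(2)}(X_1,X_2)\big) = 2\,\mathrm{Var}\big(h^{(1)}(X)\big)$ and therefore

$$\mathrm{Var}(\Lambda_B(\mathcal{P})) \le \frac{8}{|B|}\mathrm{Var}\big(h^{(1)}(X)\big).$$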
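Moreover,

$$\begin{aligned}
\mathrm{Var}\big(h^{(1)}(X)\big) &\le \mathbb{E}_X\Big[\mathbb{E}_{X'}\big[h_{\mathcal{P}}(X,X')\big]^2\Big] \\
&\le \mathbb{E}_X\Big[\mathbb{E}_{X'}\big[D(X,X')^2\big]\,\mathbb{E}_{X'}\big[(\Phi_{\mathcal{P}}(X,X') - \Phi_{\mathcal{P}^*}(X,X'))^2\big]\Big] \\
&= \mathbb{E}_X\Big[\mathbb{E}_{X'}\big[D(X,X')^2\big]\,\mathbb{P}_{X'}\{\Phi_{\mathcal{P}}(X,X') \ne \Phi_{\mathcal{P}^*}(X,X')\}\Big] \\
&\le \sigma^2\kappa\,(W(\mathcal{P}) - W^*)^\alpha,
\end{aligned}$$

where $\mathbb{E}_X$ (resp. $\mathbb{E}_{X'}$) refers to the expectation taken with respect to $X$ (resp. $X'$). Chebyshev's inequality gives, for $r \in (0,1)$,

$$\mathbb{P}\left\{|\Lambda_B(\mathcal{P}) - \Lambda(\mathcal{P})| > \sigma(W(\mathcal{P}) - W^*)^{\alpha/2}\sqrt{\frac{8\kappa}{r|B|}}\right\} \le r.$$

Using (11) again with $r = 1/4$, since $n \ge 128\lceil\log(N/\delta)\rceil$, there exists a constant $C$ such that for any $\mathcal{P}\in\Pi_K$, with probability at least $1 - 2\delta/N$,

$$|\Lambda_{\mathcal{B}}(\mathcal{P}) - \Lambda(\mathcal{P})| \le C\sigma(W(\mathcal{P}) - W^*)^{\alpha/2}\sqrt{\frac{\log(N/\delta)}{n}}.$$

This implies, by the union bound, that

$$|W_{\mathcal{B}}(\widehat{\mathcal{P}}) - W(\widehat{\mathcal{P}})| \le K\sigma(W(\widehat{\mathcal{P}}) - W^*)^{\alpha/2}\sqrt{\frac{\log(N/\delta)}{n}}$$

with probability at least $1-2\delta$. Using (17), we obtain

$$\big(W(\widehat{\mathcal{P}}) - W^*\big)^{1-\alpha/2} \le 2K\sigma\sqrt{\frac{\log(N/\delta)}{n}},$$

concluding the proof.

References

Alon, N., Y. Matias, and M. Szegedy (2002). The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58.
Arcones, M. A. and E. Giné (1993). Limit theorems for U-processes. The Annals of Probability 21.
Biau, G. and K. Bleakley (2006). Statistical inference on graphs. Statistics & Decisions 24(2).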
Bubeck, S., N. Cesa-Bianchi, and G. Lugosi (2013). Bandits with heavy tail. IEEE Transactions on Information Theory 59.
Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 48.
Clémençon, S. (2014). A statistical view of clustering performance through the theory of U-processes. Journal of Multivariate Analysis 124.
Clémençon, S., G. Lugosi, and N. Vayatis (2008). Ranking and empirical minimization of U-statistics. The Annals of Statistics.
de la Peña, V. and E. Giné (1999). Decoupling: From Dependence to Independence. New York: Springer.
de la Peña, V. H. (1992). Decoupling and Khintchine's inequalities for U-statistics. The Annals of Probability.
Giné, E., R. Latała, and J. Zinn (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability II, Progress in Probability. Birkhäuser.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58.
Hsu, D. and S. Sabato (2013). Approximate loss minimization with heavy tails. Computing Research Repository abs/
Lerasle, M. and R. Oliveira (2011). Robust empirical mean estimators.
Mammen, E. and A. Tsybakov (1999). Smooth discrimination analysis. The Annals of Statistics 27(6).
Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli.
Nemirovsky, A. and D. Yudin (1983). Problem complexity and method efficiency in optimization.
Nolan, J. P. (2015). Stable Distributions: Models for Heavy Tailed Data. Boston: Birkhäuser. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.
Robins, J., L. Li, E. Tchetgen, and A. van der Vaart (2009). Quadratic semiparametric von Mises calculus. Metrika 69(2-3).
Zolotarev, V. (1986). One-dimensional Stable Distributions, Volume 65. American Mathematical Society.
