arxiv: v1 [math.st] 17 Apr 2015


Robust estimation of U-statistics

Emilie Joly (École Normale Supérieure, Paris)
Gábor Lugosi (ICREA and Department of Economics and Business, Pompeu Fabra University; supported by the Spanish Ministry of Science and Technology grant MTM)

April 20, 2015

This paper is dedicated to the memory of Evarist Giné.

Abstract

An important part of the legacy of Evarist Giné is his fundamental contributions to our understanding of U-statistics and U-processes. In this paper we discuss the estimation of the mean of multivariate functions in case of possibly heavy-tailed distributions. In such situations, reliable estimates of the mean cannot be obtained by usual U-statistics. We introduce a new estimator, based on the so-called median-of-means technique. We develop performance bounds for this new estimator that generalize an estimate of Arcones and Giné (1993), showing that the new estimator performs, under minimal moment conditions, as well as classical U-statistics for bounded random variables. We discuss an application of this estimator to clustering.

1 Introduction

Motivated by numerous applications, the theory of U-statistics and U-processes has received considerable attention in the past decades. U-statistics appear naturally in ranking (Clémençon et al., 2008), clustering (Clémençon, 2014) and learning on graphs (Biau and Bleakley, 2006) or as components of higher-order terms in expansions of smooth statistics, see, for example, Robins et al. (2009).

The general setting may be described as follows. Let $X$ be a random variable taking values in some measurable space $\mathcal{X}$ and let $h : \mathcal{X}^m \to \mathbb{R}$ be a measurable function of $m \ge 2$ variables. Let $P$ be the probability measure of $X$. Suppose we have access to $n \ge m$ independent random variables $X_1,\dots,X_n$, all distributed as $X$. We define the U-statistic of order $m$ and kernel $h$ based on the sequence $\{X_i\}$ as

$$U_n(h) = \frac{(n-m)!}{n!} \sum_{(i_1,\dots,i_m) \in I_n^m} h(X_{i_1},\dots,X_{i_m}). \tag{1}$$
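As a concrete illustration of definition (1), the following Python sketch averages a kernel over all ordered $m$-tuples of distinct indices. The function name `u_statistic` and the list-based interface are illustrative assumptions, not part of the paper. The enumeration costs on the order of $n^m$ kernel evaluations, so it is only meant for small samples.

```python
from itertools import permutations
from math import perm


def u_statistic(sample, kernel, m):
    """Classical U-statistic of order m, following definition (1).

    `sample` is a sequence of observations and `kernel` a function of m
    arguments (both names are hypothetical, chosen for this sketch).
    """
    n = len(sample)
    if n < m:
        raise ValueError("need at least m observations")
    # Sum the kernel over all ordered m-tuples of distinct indices.
    total = sum(
        kernel(*(sample[i] for i in idx))
        for idx in permutations(range(n), m)
    )
    # perm(n, m) = n! / (n - m)! is the number of such tuples.
    return total / perm(n, m)
```

For instance, with the kernel $h(x, y) = (x - y)^2 / 2$ and $m = 2$, this average coincides with the usual unbiased sample variance.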

Here $I_n^m = \{(i_1,\dots,i_m) : 1 \le i_j \le n,\ i_j \ne i_k \text{ if } j \ne k\}$ is the set of all $m$-tuples of different integers between 1 and $n$. U-statistics are unbiased estimators of the mean $m_h = \mathbb{E}h(X_1,\dots,X_m)$ and have minimal variance among all unbiased estimators (Hoeffding, 1948). Understanding the concentration of a U-statistic around its expected value has been the subject of extensive study. de la Peña and Giné (1999) provide an excellent summary, but see also Giné et al. (2000) for a more recent development.

By a classical inequality of Hoeffding (1963), for a bounded kernel $h$, for all $\delta > 0$,

$$P\left\{ |U_n(h) - m_h| > \|h\|_\infty \sqrt{\frac{\log(2/\delta)}{2\lfloor n/m \rfloor}} \right\} \le \delta, \tag{2}$$

and we also have the Bernstein-type inequality

$$P\left\{ |U_n(h) - m_h| > \sqrt{\frac{4\sigma^2 \log(2/\delta)}{2\lfloor n/m \rfloor}} + \frac{4\|h\|_\infty \log(2/\delta)}{6\lfloor n/m \rfloor} \right\} \le \delta,$$

where $\sigma^2 = \mathrm{Var}(h(X_1,\dots,X_m))$.

However, under certain degeneracy assumptions on the kernel, significantly sharper bounds have been proved. Following the exposition of de la Peña and Giné (1999), for convenience, we restrict our attention to symmetric kernels. A kernel $h$ is symmetric if for all $x_1,\dots,x_m \in \mathcal{X}$ and all permutations $s$,

$$h(x_1,\dots,x_m) = h(x_{s_1},\dots,x_{s_m}).$$

A symmetric kernel $h$ is said to be $P$-degenerate of order $q-1$, $1 < q \le m$, if for all $x_1,\dots,x_{q-1} \in \mathcal{X}$,

$$\int h(x_1,\dots,x_m)\, dP^{m-q+1}(x_q,\dots,x_m) = \int h(x_1,\dots,x_m)\, dP^{m}(x_1,\dots,x_m)$$

and

$$(x_1,\dots,x_q) \mapsto \int h(x_1,\dots,x_m)\, dP^{m-q}(x_{q+1},\dots,x_m)$$

is not a constant function. In the special case of $m_h = 0$ and $q = m$ (i.e., when the kernel is $(m-1)$-degenerate), $h$ is said to be $P$-canonical. $P$-canonical kernels appear naturally in the Hoeffding decomposition of a U-statistic, see de la Peña and Giné (1999).

Arcones and Giné (1993) proved the following important improvement of Hoeffding's inequalities for canonical kernels: if $h - m_h$ is a bounded, symmetric $P$-canonical kernel of $m$ variables, there exist finite positive constants $c_1$ and $c_2$ depending only on $m$ such that for all $\delta \in (0,1)$,

$$P\left\{ |U_n(h) - m_h| \ge c_1 \|h\|_\infty \left( \frac{\log(c_2/\delta)}{n} \right)^{m/2} \right\} \le \delta, \tag{3}$$

and also

$$P\left\{ |U_n(h) - m_h| > \left( \frac{\sigma^2 \log(c_1/\delta)}{c_2 n} \right)^{m/2} \vee \|h\|_\infty \left( \frac{\log(c_1/\delta)}{c_2 n} \right)^{(m+1)/2} \right\} \le \delta. \tag{4}$$

In the special case of $P$-canonical kernels of order $m = 2$, (3) implies that

$$|U_n(h) - m_h| \le \frac{c_1 \|h\|_\infty}{n} \log\left( \frac{c_2}{\delta} \right), \tag{5}$$

with probability at least $1-\delta$. Note that this rate of convergence is significantly faster than the rate $O_p(n^{-1/2})$ implied by (2).

All the results cited above require boundedness of the kernel. If the kernel is unbounded but $h(X_1,\dots,X_m)$ has sufficiently light (e.g., sub-Gaussian) tails, then some of these results may be extended, see, for example, Giné et al. (2000). However, if $h(X_1,\dots,X_m)$ may have a heavy-tailed distribution, exponential inequalities do not hold anymore (even in the univariate $m = 1$ case). Even though U-statistics may behave erratically in the presence of heavy tails, in this paper we show that under minimal moment conditions, one may construct estimators of $m_h$ that satisfy exponential inequalities analogous to (2) and (3). These are the main results of the paper. In particular, in Section 2 we introduce a robust estimator of the mean $m_h$. Theorems 1 and 3 establish exponential inequalities for the performance of the new estimator under minimal moment assumptions. More precisely, Theorem 1 only requires that $h(X_1,\dots,X_m)$ has a finite variance and establishes inequalities analogous to (3) for $P$-degenerate kernels. In Theorem 3 we further weaken the conditions and only assume that there exists $1 < p \le 2$ such that $\mathbb{E}|h|^p < \infty$.

The next example illustrates why classical U-statistics fail under heavy-tailed distributions.

Example. Consider the special case $m = 2$, $\mathbb{E}X_1 = 0$ and $h(X_1,X_2) = X_1 X_2$. Note that this kernel is $P$-canonical. We define $Y_1,\dots,Y_n$ as independent copies of $X_1,\dots,X_n$. By decoupling inequalities for the tail of U-statistics given in Theorem 3.4.1 in de la Peña and Giné (1999) (see also Theorem 7 in the Appendix), $U_n(h)$ has a similar tail behavior to

$$\left( \frac{1}{n} \sum_{i=1}^n X_i \right) \left( \frac{1}{n} \sum_{j=1}^n Y_j \right).$$

Thus, $U_n(h)$ behaves like a product of two independent empirical mean estimators of the same distribution. When the $X_i$ are heavy tailed, the empirical mean is known to be a poor estimator of the mean. As an example, assume that $X$ follows an $\alpha$-stable law $S(\gamma,\alpha)$ for some $\alpha \in (1,2)$ and $\gamma > 0$. Recall that a random variable $X$ has an $\alpha$-stable law $S(\gamma,\alpha)$ if for all $u \in \mathbb{R}$,

$$\mathbb{E}\exp(iuX) = \exp(-\gamma^\alpha |u|^\alpha)$$

(see Zolotarev (1986), Nolan (2015)). Then it follows from the properties of $\alpha$-stable distributions (summarized in Proposition 9 in the Appendix) that there exists a constant $c > 0$ depending only on $\alpha$ and $\gamma$ such that

$$P\left\{ |U_n(h)| \ge n^{2/\alpha - 2} \right\} \ge c,$$

and therefore there is no hope to reproduce an upper bound like (5). Below we show how this problem can be dealt with by replacing the U-statistic by a more robust estimator.

Our approach is based on robust mean estimators in the univariate setting. Estimation of the mean of a possibly heavy-tailed random variable $X$ from an i.i.d. sample $X_1,\dots,X_n$ has recently received increasing attention. Introduced by Nemirovsky and Yudin (1983), the median-of-means estimator takes a confidence level $\delta \in (0,1)$ and divides the data into $V \approx \log \delta^{-1}$ blocks. For each block $k = 1,\dots,V$, one may compute the empirical mean $\hat{\mu}_k$ on the variables in the block. The median $\bar{\mu}$ of the $\hat{\mu}_k$ is the so-called median-of-means estimator. A short analysis of the resulting estimator shows that

$$|\bar{\mu} - m_h| \le c \sqrt{\mathrm{Var}(X)}\, \sqrt{\frac{\log(1/\delta)}{n}}$$

with probability at least $1-\delta$ for a numerical constant $c$. For the details of the proof see Lerasle and Oliveira (2011). When the variance is infinite but a moment of order $1 < p \le 2$ exists, the median-of-means estimator is still useful, see Bubeck et al. (2013). This estimator has recently been studied in various contexts. M-estimation based on this technique has been developed by Lerasle and Oliveira (2011), and generalizations in a multivariate context have been discussed by Hsu and Sabato (2013) and Minsker (2015). A similar idea was used in Alon et al. (2002). An interesting alternative to the median-of-means estimator has been proposed by Catoni (2012).

The rest of the paper is organized as follows. In Section 2 we introduce a robust estimator of the mean $m_h$ and present performance bounds. In particular, Section 2.1 deals with the finite variance case. Section 2.2 is dedicated to the case when $h$ has a finite $p$-th moment for some $1 < p < 2$ for $P$-degenerate kernels. Finally, in Section 3, we present an application to clustering problems.
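The univariate median-of-means estimator described in the introduction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the name `median_of_means` is hypothetical, the number of blocks is taken to be exactly $\lceil \log(1/\delta) \rceil$ (the text only says $V \approx \log \delta^{-1}$, so the constant is a choice made here), and the blocks are contiguous slices of the sample, which is harmless for i.i.d. data.

```python
import math
import statistics


def median_of_means(sample, delta):
    """Median-of-means estimate of the mean at confidence level delta."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # Assumed block count: ceil(log(1/delta)), capped so no block is empty.
    v = min(n, max(1, math.ceil(math.log(1.0 / delta))))
    # Contiguous blocks whose sizes differ by at most one.
    bounds = [k * n // v for k in range(v + 1)]
    block_means = [
        statistics.fmean(sample[bounds[k]:bounds[k + 1]]) for k in range(v)
    ]
    # The median of the block means is the estimate.
    return statistics.median(block_means)
```

A single extreme observation can only corrupt the mean of the one block that contains it, so the median of the block means is unaffected as long as most blocks are clean. This is the mechanism the U-statistic construction of Section 2 builds on.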
2 Robust U-estimation

In this section we introduce a "median-of-means"-style estimator of $m_h = \mathbb{E}h(X_1,\dots,X_m)$. To define the estimator, one divides the data into $V$ blocks. For any $m$-tuple of different blocks, one may compute a (decoupled) U-statistic. Finally, one computes the median of all the obtained values. The rigorous definition is as follows.

The estimator has a parameter $V$, the number of blocks. A partition $\mathcal{B} = (B_1,\dots,B_V)$ of $\{1,\dots,n\}$ is called regular if for all $K = 1,\dots,V$,

$$\left| |B_K| - \frac{n}{V} \right| \le 1.$$
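One simple way to build a regular partition is to cut the index set at the points $\lfloor kn/V \rfloor$, as in the sketch below. The function name `regular_partition` is hypothetical, and indices are 0-based here (the text uses $\{1,\dots,n\}$); any other partition satisfying the size condition would serve equally well.

```python
def regular_partition(n, v):
    """Split {0, ..., n-1} into v contiguous blocks of near-equal size.

    Each block has floor(n/v) or ceil(n/v) elements, so its size differs
    from n/v by less than 1, which satisfies the regularity condition.
    """
    if not 1 <= v <= n:
        raise ValueError("need 1 <= v <= n")
    bounds = [k * n // v for k in range(v + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(v)]
```

For example, `regular_partition(10, 3)` yields blocks of sizes 3, 3 and 4.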

For any $B_{i_1},\dots,B_{i_m}$ in $\mathcal{B}$, we set

$$I_{B_{i_1},\dots,B_{i_m}} = \left\{ (k_1,\dots,k_m) : k_j \in B_{i_j} \right\}$$

and

$$U_{B_{i_1},\dots,B_{i_m}}(h) = \frac{1}{|B_{i_1}| \cdots |B_{i_m}|} \sum_{(k_1,\dots,k_m) \in I_{B_{i_1},\dots,B_{i_m}}} h(X_{k_1},\dots,X_{k_m}).$$

For any integer $N$ and any vector $(a_1,\dots,a_N) \in \mathbb{R}^N$, we define the median $\mathrm{Med}(a_1,\dots,a_N)$ as any number $b$ such that

$$|\{i \le N : a_i \le b\}| \ge \frac{N}{2} \quad \text{and} \quad |\{i \le N : a_i \ge b\}| \ge \frac{N}{2}.$$

Finally, we define the robust estimator:

$$\overline{U}_{\mathcal{B}}(h) = \mathrm{Med}\left\{ U_{B_{i_1},\dots,B_{i_m}}(h) : i_j \in \{1,\dots,V\},\ 1 \le i_1 < \dots < i_m \le V \right\}. \tag{6}$$

Note that, mostly in order to simplify notation, we only take into account those values of $U_{B_{i_1},\dots,B_{i_m}}(h)$ that correspond to distinct indices $i_1 < \dots < i_m$. Thus, each $U_{B_{i_1},\dots,B_{i_m}}(h)$ is a so-called decoupled U-statistic (see the Appendix for the definition). One may incorporate all $m$-tuples (not necessarily with distinct indices) in the computation of the median. However, this has a minor effect on the performance. Similar bounds may be proven, though with a more complicated notation.

A simpler alternative is obtained by taking only "diagonal" blocks into account. More precisely, let $U_{B_i}(h)$ be the U-statistic calculated using the variables in block $B_i$ (as defined in (1)). One may simply calculate the median of the $V$ different U-statistics $U_{B_i}(h)$. This version is easy to analyze because $|\{i \le V : U_{B_i}(h) \ge b\}|$ is a sum of independent random variables. However, this simple version is wasteful in the sense that only a small fraction of possible $m$-tuples are taken into account. In the next two sections we analyze the performance of the estimator $\overline{U}_{\mathcal{B}}(h)$.

2.1 Exponential inequalities for P-degenerate kernels with finite variance

Next we present a performance bound of the estimator $\overline{U}_{\mathcal{B}}(h)$ in the case when $\sigma^2$ is finite. The somewhat more complicated case of infinite second moment is treated in Section 2.2.

Theorem 1. Let $X_1,\dots,X_n$ be i.i.d. random variables taking values in $\mathcal{X}$. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a symmetric kernel that is $P$-degenerate of order $q-1$. Assume $\mathrm{Var}(h(X_1,\dots,X_m)) = \sigma^2 < \infty$. Let $\delta \in (0, \frac{1}{2})$ be such that $\lceil \log(1/\delta) \rceil \le \frac{n}{64m}$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 32m \lceil \log(1/\delta) \rceil$.
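Then, with probability at least $1 - 2\delta$, we have

$$\left| \overline{U}_{\mathcal{B}}(h) - m_h \right| \le K_m \sigma \left( \frac{\lceil \log(1/\delta) \rceil}{n} \right)^{q/2}, \tag{7}$$

where $K_m = 2^{\frac{7}{2}m+1} m^{\frac{m}{2}}$.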
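The construction behind definition (6), with the block count prescribed by Theorem 1, can be sketched as follows. This is an illustration only: the name `robust_u_estimate` and its interface are assumptions made here, blocks are taken as contiguous slices of the sample, and the brute-force enumeration of all block tuples is practical only for small $m$ and moderate $n$.

```python
import math
import statistics
from itertools import combinations, product


def robust_u_estimate(sample, kernel, m, delta):
    """Median of decoupled U-statistics over m-tuples of distinct blocks."""
    if not 0.0 < delta < 0.5:
        raise ValueError("delta must lie in (0, 1/2)")
    n = len(sample)
    log_term = math.ceil(math.log(1.0 / delta))
    if n < 64 * m * log_term:
        raise ValueError("need ceil(log(1/delta)) <= n / (64 m)")
    # Number of blocks as in Theorem 1: 32 m ceil(log(1/delta)).
    v = 32 * m * log_term
    bounds = [k * n // v for k in range(v + 1)]
    blocks = [sample[bounds[k]:bounds[k + 1]] for k in range(v)]
    values = []
    # One decoupled U-statistic per choice of block indices i_1 < ... < i_m.
    for chosen in combinations(blocks, m):
        count = math.prod(len(block) for block in chosen)
        total = sum(kernel(*xs) for xs in product(*chosen))
        values.append(total / count)
    return statistics.median(values)
```

Note that the number of blocks, and hence the estimator itself, changes with `delta`, which is the dependence on the confidence level inherent in this construction.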

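When $q = m$, the kernel $h - m_h$ is $P$-canonical and the rate of convergence is then given by $(\log \delta^{-1} / n)^{m/2}$. Thus, the new estimator has a performance similar to standard U-statistics as in (3) and (4) but without the boundedness assumption for the kernel. It is important to note that a disadvantage of the estimator $\overline{U}_{\mathcal{B}}(h)$ is that it depends on the confidence level $\delta$ (through the number of blocks). For different confidence levels, different estimators are used.

Because of its importance in applications, we spell out the special case when $m = q = 2$. In Section 3 we use this result in an example of cluster analysis.

Corollary 2. Let $\delta \in (0,1/2)$. Let $h : \mathcal{X}^2 \to \mathbb{R}$ be a $P$-canonical kernel with $\sigma^2 = \mathrm{Var}(h(X_1,X_2))$ and let $n \ge 128(1+\log(1/\delta))$. Then, with probability at least $1 - 2\delta$,

$$\left| \overline{U}_{\mathcal{B}}(h) - m_h \right| \le 512\, \sigma\, \frac{1+\log(1/\delta)}{n}. \tag{8}$$

In the proof of Theorem 1 we need the notion of Hoeffding decomposition (Hoeffding, 1948) of U-statistics. For probability measures $P_1,\dots,P_m$, define $P_1 \times \dots \times P_m h = \int h \, d(P_1,\dots,P_m)$. For a symmetric kernel $h : \mathcal{X}^m \to \mathbb{R}$ the Hoeffding projections are defined, for $0 \le k \le m$ and $x_1,\dots,x_k \in \mathcal{X}$, as

$$\pi_k h(x_1,\dots,x_k) := (\delta_{x_1} - P) \times \dots \times (\delta_{x_k} - P) \times P^{m-k} h$$

where $\delta_x$ denotes the Dirac measure at the point $x$. Observe that $\pi_0 h = P^m h$ and for $k > 0$, $\pi_k h$ is a $P$-canonical kernel. The kernel $h$ can be decomposed as

$$h(x_1,\dots,x_m) = \sum_{k=0}^{m} \sum_{1 \le i_1 < \dots < i_k \le m} \pi_k h(x_{i_1},\dots,x_{i_k}). \tag{9}$$

If $h$ is assumed to be square-integrable (i.e., $P^m h^2 < \infty$), the terms in (9) are orthogonal. If $h$ is degenerate of order $q-1$, then for any $1 \le k \le q-1$, $\pi_k h = 0$.

Proof of Theorem 1. We begin with a weak concentration result on each $U_{B_{i_1},\dots,B_{i_m}}(h)$. Let $B_{i_1},\dots,B_{i_m}$ be elements of $\mathcal{B}$. For any $B \in \mathcal{B}$, we have $\frac{n}{2|\mathcal{B}|} \le |B| \le \frac{2n}{|\mathcal{B}|}$. We denote by $\mathbf{k} = (k_1,\dots,k_m)$ an element of $I_{B_{i_1},\dots,B_{i_m}}$, and write $I = I_{B_{i_1},\dots,B_{i_m}}$ for brevity. We have, by the above-mentioned orthogonality property,

$$\begin{aligned}
\mathrm{Var}\left( U_{B_{i_1},\dots,B_{i_m}}(h) \right)
&= \mathbb{E}\left[ \left( U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h \right)^2 \right] \\
&= \frac{1}{|B_{i_1}|^2 \cdots |B_{i_m}|^2} \sum_{\mathbf{k} \in I} \sum_{\mathbf{l} \in I} \mathbb{E}\left[ \left( h(X_{k_1},\dots,X_{k_m}) - P^m h \right) \left( h(X_{l_1},\dots,X_{l_m}) - P^m h \right) \right] \\
&= \frac{1}{|B_{i_1}|^2 \cdots |B_{i_m}|^2} \sum_{\mathbf{k} \in I} \sum_{\mathbf{l} \in I} \sum_{s=q}^{m} \binom{|\mathbf{k} \cap \mathbf{l}|}{s} \mathbb{E}\left[ \pi_s h(X_1,\dots,X_s)^2 \right] \quad \text{(by orthogonality)} \\
&\le \frac{1}{|B_{i_1}|^2 \cdots |B_{i_m}|^2} \sum_{\mathbf{k} \in I} \sum_{s=q}^{m} \sum_{t=0}^{m} \binom{t}{s} \mathbb{E}\left[ \pi_s h(X_1,\dots,X_s)^2 \right] \left( \frac{2n}{|\mathcal{B}|} \right)^{m-t}.
\end{aligned}$$

The last inequality is obtained by counting, for any fixed $\mathbf{k}$ and $t$, the number of elements $\mathbf{l}$ such that $|\mathbf{k} \cap \mathbf{l}| = t$. Thus,

$$\begin{aligned}
\mathrm{Var}\left( U_{B_{i_1},\dots,B_{i_m}}(h) \right)
&\le \frac{1}{|B_{i_1}| \cdots |B_{i_m}|} \sum_{s=q}^{m} \sum_{t=q}^{m} \binom{t}{s} \mathbb{E}\left[ \pi_s h(X_1,\dots,X_s)^2 \right] \left( \frac{2n}{|\mathcal{B}|} \right)^{m-t} \\
&\le \frac{1}{|B_{i_1}| \cdots |B_{i_m}|} \sum_{s=q}^{m} \binom{m}{s} \mathbb{E}\left[ \pi_s h(X_1,\dots,X_s)^2 \right] \sum_{t=q}^{m} \left( \frac{2n}{|\mathcal{B}|} \right)^{m-t} \\
&\le \frac{2^{2m-q+1} |\mathcal{B}|^q}{n^q} \sum_{s=q}^{m} \binom{m}{s} \mathbb{E}\left[ \pi_s h(X_1,\dots,X_s)^2 \right].
\end{aligned}$$

On the other hand, we have, by (9),

$$\begin{aligned}
\mathrm{Var}(h)
&= \mathbb{E}\left[ \left( \sum_{s=q}^{m} \sum_{1 \le i_1 < \dots < i_s \le m} \pi_s h(X_{i_1},\dots,X_{i_s}) \right)^2 \right] \\
&= \sum_{s=q}^{m} \sum_{1 \le i_1 < \dots < i_s \le m} \mathbb{E}\left[ \left( \pi_s h(X_{i_1},\dots,X_{i_s}) \right)^2 \right] \\
&= \sum_{s=q}^{m} \binom{m}{s} \mathbb{E}\left[ \left( \pi_s h(X_1,\dots,X_s) \right)^2 \right].
\end{aligned}$$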

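Combining the two displayed equations above,

$$\mathrm{Var}\left( U_{B_{i_1},\dots,B_{i_m}}(h) \right) \le \frac{2^{2m-q+1} |\mathcal{B}|^q}{n^q} \sigma^2 \le \frac{2^{2m} |\mathcal{B}|^q}{n^q} \sigma^2.$$

By Chebyshev's inequality, for all $r \in (0,1)$,

$$P\left\{ U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h > \frac{2^m \sigma |\mathcal{B}|^{q/2}}{n^{q/2} r^{1/2}} \right\} \le r. \tag{10}$$

We set $x = \frac{2^m \sigma |\mathcal{B}|^{q/2}}{n^{q/2} r^{1/2}}$, and

$$N_x = \left| \left\{ (i_1,\dots,i_m) \in \{1,\dots,V\}^m : 1 \le i_1 < \dots < i_m \le |\mathcal{B}|,\ U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h > x \right\} \right|.$$

The random variable $\binom{|\mathcal{B}|}{m}^{-1} N_x$ is a U-statistic of order $m$ with the symmetric kernel $g : (i_1,\dots,i_m) \mapsto \mathbb{1}_{\{U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h > x\}}$. Thus, Hoeffding's inequality for centered U-statistics (2) gives

$$P\left\{ N_x - \mathbb{E}N_x \ge t \binom{|\mathcal{B}|}{m} \right\} \le \exp\left( -\frac{|\mathcal{B}| t^2}{2m} \right). \tag{11}$$

By (10) we have $\mathbb{E}N_x \le \binom{|\mathcal{B}|}{m} r$. Taking $t = r = \frac{1}{4}$ in (11), by the definition of the median, we have

$$P\left\{ \overline{U}_{\mathcal{B}}(h) - P^m h > x \right\} \le P\left\{ N_x \ge \frac{1}{2} \binom{|\mathcal{B}|}{m} \right\} \le \exp\left( -\frac{|\mathcal{B}|}{32m} \right).$$

Since $|\mathcal{B}| \ge 32m \log(\delta^{-1})$, with probability at least $1 - \delta$, we have

$$\overline{U}_{\mathcal{B}}(h) - P^m h \le K_m \sigma \left( \frac{\lceil \log \delta^{-1} \rceil}{n} \right)^{q/2}$$

with $K_m = 2^{\frac{7}{2}m+1} m^{\frac{m}{2}}$. The upper bound for the lower tail holds by the same argument.

2.2 Bounded moment of order p with 1 < p ≤ 2

In this section, we weaken the assumption of finite variance and only assume the existence of a centered moment of order $p$ for some $1 < p \le 2$. The outline of the argument is similar to the case of finite variance. First we obtain a weak concentration inequality for the U-statistic in each block and then use the property of the median to boost the weak inequality. While for the case of finite variance weak concentration could be proved by a direct calculation of the variance, here we need the randomization inequalities for convex functions of U-statistics established by de la Peña (1992) and Arcones and Giné (1993). Note that, here, a $P$-canonical technical assumption is needed.

Theorem 3. Let $h$ be a symmetric kernel of order $m$ such that $h - m_h$ is $P$-canonical. Assume that $M_p := \mathbb{E}\left[ |h(X_1,\dots,X_m) - m_h|^p \right]^{1/p} < \infty$ for some $1 < p \le 2$. Let $\delta \in (0, \frac{1}{2})$ be such that $\lceil \log(\delta^{-1}) \rceil \le \frac{n}{64m}$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 32m \lceil \log(\delta^{-1}) \rceil$. Then, with probability at least $1 - 2\delta$, we have

$$\left| \overline{U}_{\mathcal{B}}(h) - m_h \right| \le K_m M_p \left( \frac{\lceil \log(\delta^{-1}) \rceil}{n} \right)^{m(p-1)/p} \tag{12}$$

where $K_m = 2^{4m+1} m^{\frac{m}{2}}$.

Proof. Define the centered version of $h$ by $g(x_1,\dots,x_m) := h(x_1,\dots,x_m) - m_h$. Let $\varepsilon_1,\dots,\varepsilon_n$ be i.i.d. Rademacher random variables (i.e., $P\{\varepsilon_1 = -1\} = P\{\varepsilon_1 = 1\} = 1/2$) independent of $X_1,\dots,X_n$. By the randomization inequalities (see de la Peña and Giné (1999) and also Theorem 8 in the Appendix), writing $I = I_{B_{i_1},\dots,B_{i_m}}$, we have

$$\begin{aligned}
\mathbb{E}\left| \sum_{(k_1,\dots,k_m) \in I} g(X_{k_1},\dots,X_{k_m}) \right|^p
&\le 2^{mp}\, \mathbb{E}_X \mathbb{E}_\varepsilon \left| \sum_{(k_1,\dots,k_m) \in I} \varepsilon_{k_1} \cdots \varepsilon_{k_m}\, g(X_{k_1},\dots,X_{k_m}) \right|^p \qquad (13) \\
&\le 2^{mp}\, \mathbb{E}_X \left( \mathbb{E}_\varepsilon \left| \sum_{(k_1,\dots,k_m) \in I} \varepsilon_{k_1} \cdots \varepsilon_{k_m}\, g(X_{k_1},\dots,X_{k_m}) \right|^2 \right)^{p/2} \\
&= 2^{mp}\, \mathbb{E}_X \left( \sum_{(k_1,\dots,k_m) \in I} g(X_{k_1},\dots,X_{k_m})^2 \right)^{p/2} \\
&\le 2^{mp} \sum_{(k_1,\dots,k_m) \in I} \mathbb{E}\left| g(X_{k_1},\dots,X_{k_m}) \right|^p \\
&= 2^{mp}\, |B_{i_1}| \cdots |B_{i_m}|\, \mathbb{E}|g|^p. \qquad (14)
\end{aligned}$$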

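Thus, we have

$$\mathbb{E}\left[ \left| U_{B_{i_1},\dots,B_{i_m}}(h) - m_h \right|^p \right] \le 2^{mp} \left( |B_{i_1}| \cdots |B_{i_m}| \right)^{1-p} \mathbb{E}|g|^p$$

and by Markov's inequality,

$$P\left\{ \left| U_{B_{i_1},\dots,B_{i_m}}(h) - m_h \right| > \frac{2^m M_p}{r^{1/p}} \left( \frac{n}{2|\mathcal{B}|} \right)^{m\frac{1-p}{p}} \right\} \le r. \tag{15}$$

Another use of (11) with $t = r = \frac{1}{4}$ gives, with probability at least $1 - 2\delta$,

$$\left| \overline{U}_{\mathcal{B}}(h) - P^m h \right| \le 2^{4m+1} m^{\frac{m}{2}} M_p \left( \frac{\lceil \log \delta^{-1} \rceil}{n} \right)^{m\frac{p-1}{p}}.$$

To see why the bound of Theorem 3 gives essentially the right order of magnitude, consider again the example described in the introduction, when $m = 2$, $h(X_1,X_2) = X_1 X_2$, and the $X_i$ have an $\alpha$-stable law $S(\gamma,\alpha)$ for some $\gamma > 0$ and $1 < \alpha \le 2$. Note that an $\alpha$-stable random variable has finite moments up to (but not including) $\alpha$ and therefore we may take $p = \alpha - \epsilon$ for any $\epsilon \in (0, \alpha - 1)$. As we noted in the introduction, there exists a constant $c$ depending on $\alpha$ and $\gamma$ only such that for all $1 \le i_1 < i_2 \le V$,

$$P\left\{ \left| U_{B_{i_1},B_{i_2}}(h) - m_h \right| \ge c \left( \frac{n}{|\mathcal{B}|} \right)^{2/\alpha - 2} \right\} \ge 2/3,$$

and therefore (15) is essentially the best rate one can hope for.

3 Cluster analysis with U-statistics

In this section we illustrate the use of the proposed mean estimator in a clustering problem when the presence of possibly heavy-tailed data requires robust techniques.

We consider the general statistical framework defined by Clémençon (2014), described as follows. Let $X, X'$ be i.i.d. random variables taking values in $\mathcal{X}$ where typically, but not necessarily, $\mathcal{X}$ is a subset of $\mathbb{R}^d$. For a partition $\mathcal{P}$ of $\mathcal{X}$ into $K$ disjoint sets (the so-called cells), define

$$\Phi_{\mathcal{P}}(x,x') = \sum_{C \in \mathcal{P}} \mathbb{1}_{\{(x,x') \in C^2\}},$$

the $\{0,1\}$-valued function that indicates whether two elements $x$ and $x'$ belong to the same cell $C$. Given a dissimilarity measure $D : \mathcal{X}^2 \to \mathbb{R}_+$, the clustering task consists in finding a partition of $\mathcal{X}$ minimizing the clustering risk

$$W(\mathcal{P}) = \mathbb{E}\left[ D(X,X')\, \Phi_{\mathcal{P}}(X,X') \right].$$

Let $\Pi_K$ be a finite class of partitions $\mathcal{P}$ of $\mathcal{X}$ into $K$ cells and define $W^* = \min_{\mathcal{P} \in \Pi_K} W(\mathcal{P})$. Given i.i.d. random variables $X_1,\dots,X_n$ distributed as $X$, the goal is to find a partition $\mathcal{P} \in \Pi_K$ with risk as close to $W^*$ as possible. A natural idea, and this is the approach of Clémençon (2014), is to estimate $W(\mathcal{P})$ by the U-statistic

$$\widehat{W}_n(\mathcal{P}) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} D(X_i,X_j)\, \Phi_{\mathcal{P}}(X_i,X_j)$$

and choose a partition minimizing the empirical clustering risk $\widehat{W}_n(\mathcal{P})$. Clémençon (2014) uses the theory of U-processes to analyze the performance of such minimizers of U-statistics. However, in order to control uniform deviations of the form $\sup_{\mathcal{P} \in \Pi_K} |\widehat{W}_n(\mathcal{P}) - W(\mathcal{P})|$, exponential concentration inequalities are needed for U-statistics. This restricts one to consider bounded dissimilarity measures $D(X,X')$. When $D(X,X')$ may have a heavy tail, we propose to replace U-statistics by the median-of-means estimators of $W(\mathcal{P})$ introduced in this paper.

Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ and define the median-of-means estimator $\overline{W}_{\mathcal{B}}(\mathcal{P})$ of $W(\mathcal{P})$ as in (6). Then Theorem 1 applies and we have the following simple corollary.

Corollary 4. Let $\Pi_K$ be a class of partitions of cardinality $|\Pi_K| = N$. Assume that $\sigma^2 := \mathbb{E}\left[ D(X_1,X_2)^2 \right] < \infty$. Let $\delta \in (0,1/2)$ be such that $n \ge 128 \lceil \log(N/\delta) \rceil$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 64 \lceil \log(N/\delta) \rceil$. Then there exists a constant $C$ such that, with probability at least $1 - 2\delta$,

$$\sup_{\mathcal{P} \in \Pi_K} \left| \overline{W}_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P}) \right| \le C\sigma \left( \frac{\lceil \log(N/\delta) \rceil}{n} \right)^{1/2}. \tag{16}$$

Proof. Since $\Phi_{\mathcal{P}}(x,x')$ is bounded by 1, $\mathrm{Var}(D(X_1,X_2)\Phi_{\mathcal{P}}(X_1,X_2)) \le \mathbb{E}\left[ D(X_1,X_2)^2 \right]$. For a fixed $\mathcal{P} \in \Pi_K$, Theorem 1 applies with $m = 2$ and $q = 1$. The inequality follows from the union bound.

Once uniform deviations of $\overline{W}_{\mathcal{B}}(\mathcal{P})$ from its expected value are controlled, it is a routine exercise to derive performance bounds for clustering based on minimizing $\overline{W}_{\mathcal{B}}(\mathcal{P})$ over $\mathcal{P} \in \Pi_K$. Let $\widehat{\mathcal{P}} = \mathrm{argmin}_{\mathcal{P} \in \Pi_K} \overline{W}_{\mathcal{B}}(\mathcal{P})$ denote the empirical minimizer. (In case of multiple minimizers, one may select one arbitrarily.) Now for any $\mathcal{P}_0 \in \Pi_K$,

$$\begin{aligned}
W(\widehat{\mathcal{P}}) - W^*
&= W(\widehat{\mathcal{P}}) - \overline{W}_{\mathcal{B}}(\widehat{\mathcal{P}}) + \overline{W}_{\mathcal{B}}(\widehat{\mathcal{P}}) - W^* \\
&\le W(\widehat{\mathcal{P}}) - \overline{W}_{\mathcal{B}}(\widehat{\mathcal{P}}) + \overline{W}_{\mathcal{B}}(\mathcal{P}_0) - W(\mathcal{P}_0) + W(\mathcal{P}_0) - W^* \\
&\le 2 \sup_{\mathcal{P} \in \Pi_K} \left| \overline{W}_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P}) \right| + W(\mathcal{P}_0) - W^*.
\end{aligned}$$

Taking the infimum over $\Pi_K$,

$$W(\widehat{\mathcal{P}}) - W^* \le 2 \sup_{\mathcal{P} \in \Pi_K} \left| \overline{W}_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P}) \right|. \tag{17}$$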

Finally, (16) implies that

$$W(\widehat{\mathcal{P}}) - W^* \le 2C\sigma \left( \frac{1+\log(N/\delta)}{n} \right)^{1/2}.$$

This result is to be compared with Theorem 2 of Clémençon (2014). Our result holds under the only assumption that $D(X,X')$ has a finite second moment. (This may be weakened to assuming the existence of a finite $p$-th moment for some $1 < p \le 2$ by using Theorem 3.) On the other hand, our result holds only for a finite class of partitions while Clémençon (2014) uses the theory of U-processes to obtain more sophisticated bounds for uniform deviations over possibly infinite classes of partitions. It remains a challenge to develop a theory to control processes of median-of-means estimators, in the style of Arcones and Giné (1993), without having to resort to simple union bounds.

In the rest of this section we show that, under certain "low-noise" assumptions, analogous to the ones introduced by Mammen and Tsybakov (1999) in the context of classification, it is possible to obtain faster rates of convergence. In this part we need bounds for $P$-canonical kernels and use the full power of Corollary 2. Similar arguments for the study of minimizing U-statistics appear in Clémençon et al. (2008) and Clémençon (2014).

We assume the following conditions, also considered by Clémençon (2014):

1. There exists $\mathcal{P}^*$ such that $W(\mathcal{P}^*) = W^*$.
2. There exist $\alpha \in [0,1]$ and $\kappa < \infty$ such that for all $\mathcal{P} \in \Pi_K$ and for all $x \in \mathcal{X}$,

$$P\left\{ \Phi_{\mathcal{P}}(x,X) \ne \Phi_{\mathcal{P}^*}(x,X) \right\} \le \kappa \left( W(\mathcal{P}) - W^* \right)^\alpha.$$

Note that $\alpha \le 2$ since by the Cauchy-Schwarz inequality,

$$W(\mathcal{P}) - W^* \le \mathbb{E}\left[ D(X_1,X_2)^2 \right]^{1/2} P\left\{ \Phi_{\mathcal{P}}(X_1,X_2) \ne \Phi_{\mathcal{P}^*}(X_1,X_2) \right\}^{1/2}.$$

Corollary 5. Assume the conditions above and that $\sigma^2 := \mathbb{E}\left[ D(X_1,X_2)^2 \right] < \infty$. Let $\delta \in (0,1/2)$ be such that $n \ge 128 \lceil \log(N/\delta) \rceil$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 64 \lceil \log(N/\delta) \rceil$. Then there exists a constant $C$ such that, with probability at least $1 - 2\delta$,

$$W(\widehat{\mathcal{P}}) - W^* \le C \sigma^{2/(2-\alpha)} \left( \frac{\log(N/\delta)}{n} \right)^{1/(2-\alpha)}. \tag{18}$$

The proof of Corollary 5 is postponed to the Appendix.

4 Appendix

4.1 Decoupling and randomization

Here we summarize some of the key tools for analyzing U-statistics that we use in the paper. For an excellent exposition we refer to de la Peña and Giné (1999).

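Let $\{X_i\}$ be i.i.d. random variables taking values in $\mathcal{X}$ and let $\{X_i^k\}$, $k = 1,\dots,m$, be sequences of independent copies. Let $\Phi$ be a non-negative function. As a corollary of a theorem in de la Peña and Giné (1999) we have the following:

Theorem 6. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a measurable function with $\mathbb{E}|h(X_1,\dots,X_m)| < \infty$. Let $\Phi : [0,\infty) \to [0,\infty)$ be a convex nondecreasing function such that $\mathbb{E}\Phi(|h(X_1,\dots,X_m)|) < \infty$. Then

$$\mathbb{E}\Phi\left( \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| \right) \le \mathbb{E}\Phi\left( C_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| \right)$$

where $C_m = 2^m (m^m - 1)((m-1)^{m-1} - 1) \cdots 3$. Moreover, if the kernel $h$ is symmetric, then

$$\mathbb{E}\Phi\left( c_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| \right) \le \mathbb{E}\Phi\left( \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| \right)$$

where $c_m = 1/(2^{2m-2}(m-1)!)$.

An equivalent result for tail probabilities of U-statistics is the following (see Theorem 3.4.1 in de la Peña and Giné (1999)):

Theorem 7. Under the same hypotheses as Theorem 6, there exists a constant $C_m$ depending on $m$ only such that, for all $t > 0$,

$$P\left\{ \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| > t \right\} \le C_m P\left\{ C_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| > t \right\}.$$

If, moreover, the kernel $h$ is symmetric, then there exists a constant $c_m$ depending on $m$ only such that, for all $t > 0$,

$$c_m P\left\{ c_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| > t \right\} \le P\left\{ \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| > t \right\}.$$

The next theorem is a direct corollary of a theorem in de la Peña and Giné (1999).

Theorem 8. Let $1 < p \le 2$. Let $(\varepsilon_i)_{i \le n}$ be i.i.d. Rademacher random variables independent of the $(X_i)_{i \le n}$. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a $P$-degenerate measurable function such that $\mathbb{E}(|h(X_1,\dots,X_m)|^p) < \infty$. Then

$$c_m \mathbb{E}\left| \sum_{I_n^m} \varepsilon_{i_1} \cdots \varepsilon_{i_m} h(X_{i_1},\dots,X_{i_m}) \right|^p \le \mathbb{E}\left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right|^p \le C_m \mathbb{E}\left| \sum_{I_n^m} \varepsilon_{i_1} \cdots \varepsilon_{i_m} h(X_{i_1},\dots,X_{i_m}) \right|^p,$$

where $C_m = 2^{mp}$ and $c_m = 2^{-mp}$.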

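The same conclusion holds for decoupled U-statistics.

4.2 α-stable distributions

Proposition 9. Let $\alpha \in (0,2)$. Let $X_1,\dots,X_n$ be i.i.d. random variables of law $S(\gamma,\alpha)$. Let $f_{\gamma,\alpha} : \mathbb{R} \to \mathbb{R}$ be the density function of $X_1$. Let $S_n = \sum_{1 \le i \le n} X_i$. Then

(i) $f_{\gamma,\alpha}(x)$ is an even function.
(ii) $f_{\gamma,\alpha}(x) \sim_{x \to +\infty} \alpha \gamma^\alpha c_\alpha x^{-\alpha-1}$ with $c_\alpha = \sin\left( \frac{\pi\alpha}{2} \right) \Gamma(\alpha)/\pi$.
(iii) $\mathbb{E}[|X_1|^p]$ is finite for any $p < \alpha$ and is infinite whenever $p \ge \alpha$.
(iv) $S_n$ has an $\alpha$-stable law $S(\gamma n^{1/\alpha}, \alpha)$.

Proof. (i) and (iv) follow directly from the definition. (ii) is proved in the introduction of Zolotarev (1986). (iii) is a consequence of (ii).

4.3 Proof of Corollary 5

Define $\Lambda_n(\mathcal{P}) = \widehat{W}_n(\mathcal{P}) - \widehat{W}_n(\mathcal{P}^*)$, the U-statistic based on the sample $X_1,\dots,X_n$, with symmetric kernel

$$h_{\mathcal{P}}(x,x') = D(x,x') \left( \Phi_{\mathcal{P}}(x,x') - \Phi_{\mathcal{P}^*}(x,x') \right).$$

We denote by $\Lambda(\mathcal{P}) = W(\mathcal{P}) - W^*$ the expected value of $\Lambda_n(\mathcal{P})$. The main argument in the following analysis is based on the Hoeffding decomposition. For all partitions $\mathcal{P}$,

$$\Lambda_n(\mathcal{P}) - \Lambda(\mathcal{P}) = 2L_n(\mathcal{P}) + M_n(\mathcal{P})$$

for $L_n(\mathcal{P}) = \frac{1}{n} \sum_{i \le n} h^{(1)}(X_i)$ with $h^{(1)}(x) = \mathbb{E}[h_{\mathcal{P}}(X,x)] - \Lambda(\mathcal{P})$ and $M_n(\mathcal{P})$ the U-statistic based on the canonical kernel given by $h^{(2)}(x,x') = h_{\mathcal{P}}(x,x') - h^{(1)}(x) - h^{(1)}(x') - \Lambda(\mathcal{P})$.

Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$. For any $B \in \mathcal{B}$, $\Lambda_B(\mathcal{P})$ is the U-statistic on the kernel $h_{\mathcal{P}}$ restricted to the set $B$ and $\overline{\Lambda}_{\mathcal{B}}(\mathcal{P})$ is the median of the sequence $(\Lambda_B(\mathcal{P}))_{B \in \mathcal{B}}$. We define similarly $L_B(\mathcal{P})$ and $M_B(\mathcal{P})$ on the variables $(X_i)_{i \in B}$. For any $B \in \mathcal{B}$,

$$\mathrm{Var}(\Lambda_B(\mathcal{P})) = 4\mathrm{Var}(L_B(\mathcal{P})) + \mathrm{Var}(M_B(\mathcal{P})) = \frac{4}{|B|} \mathrm{Var}\left( h^{(1)}(X) \right) + \frac{2}{|B|(|B|-1)} \mathrm{Var}\left( h^{(2)}(X_1,X_2) \right).$$

Simple computations show that $\mathrm{Var}\left( h^{(2)}(X_1,X_2) \right) = 2\mathrm{Var}\left( h^{(1)}(X) \right)$ and therefore,

$$\mathrm{Var}(\Lambda_B(\mathcal{P})) \le \frac{8}{|B|} \mathrm{Var}\left( h^{(1)}(X) \right).$$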

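Moreover,

$$\begin{aligned}
\mathrm{Var}\left( h^{(1)}(X) \right)
&\le \mathbb{E}_{X'}\left[ \left( \mathbb{E}_X\left[ h_{\mathcal{P}}(X,X') \right] \right)^2 \right] \\
&\le \mathbb{E}_{X'}\left[ \mathbb{E}_X\left[ D(X,X')^2 \right] \mathbb{E}_X\left[ \left( \Phi_{\mathcal{P}}(X,X') - \Phi_{\mathcal{P}^*}(X,X') \right)^2 \right] \right] \\
&= \mathbb{E}_{X'}\left[ \mathbb{E}_X\left[ D(X,X')^2 \right] P_X\left\{ \Phi_{\mathcal{P}}(X,X') \ne \Phi_{\mathcal{P}^*}(X,X') \right\} \right] \\
&\le \sigma^2 \kappa \left( W(\mathcal{P}) - W^* \right)^\alpha
\end{aligned}$$

where $\mathbb{E}_X$ (resp. $\mathbb{E}_{X'}$) refers to the expectation taken with respect to $X$ (resp. $X'$). Chebyshev's inequality gives, for $r \in (0,1)$,

$$P\left\{ \Lambda_B(\mathcal{P}) - \Lambda(\mathcal{P}) > \sigma \left( W(\mathcal{P}) - W^* \right)^{\alpha/2} \sqrt{\frac{8\kappa}{r|B|}} \right\} \le r.$$

Using again (11) with $r = \frac{1}{4}$, since $|\mathcal{B}| = 64 \lceil \log(N/\delta) \rceil$, there exists a constant $C$ such that for any $\mathcal{P} \in \Pi_K$, with probability at least $1 - 2\delta/N$,

$$\left| \overline{\Lambda}_{\mathcal{B}}(\mathcal{P}) - \Lambda(\mathcal{P}) \right| \le C\sigma \left( W(\mathcal{P}) - W^* \right)^{\alpha/2} \sqrt{\frac{\log(N/\delta)}{n}}.$$

This implies, by the union bound, that for a constant $K$,

$$\left| \overline{W}_{\mathcal{B}}(\widehat{\mathcal{P}}) - W(\widehat{\mathcal{P}}) \right| \le K\sigma \left( W(\widehat{\mathcal{P}}) - W^* \right)^{\alpha/2} \sqrt{\frac{\log(N/\delta)}{n}}$$

with probability at least $1 - 2\delta$. Using (17), we obtain

$$\left( W(\widehat{\mathcal{P}}) - W^* \right)^{1-\alpha/2} \le 2K\sigma \sqrt{\frac{\log(N/\delta)}{n}},$$

concluding the proof.

References

Alon, N., Y. Matias, and M. Szegedy (2002). The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58.

Arcones, M. A. and E. Giné (1993). Limit theorems for U-processes. The Annals of Probability 21.

Biau, G. and K. Bleakley (2006). Statistical inference on graphs. Statistics & Decisions 24(2).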

Bubeck, S., N. Cesa-Bianchi, and G. Lugosi (2013). Bandits with heavy tail. IEEE Transactions on Information Theory 59.

Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 48.

Clémençon, S. (2014). A statistical view of clustering performance through the theory of U-processes. Journal of Multivariate Analysis 124.

Clémençon, S., G. Lugosi, and N. Vayatis (2008). Ranking and empirical minimization of U-statistics. The Annals of Statistics.

de la Peña, V. and E. Giné (1999). Decoupling: from dependence to independence. New York: Springer.

de la Peña, V. H. (1992). Decoupling and Khintchine's inequalities for U-statistics. The Annals of Probability.

Giné, E., R. Latała, and J. Zinn (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability II, Progress in Probability. Birkhäuser.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58.

Hsu, D. and S. Sabato (2013). Approximate loss minimization with heavy tails. Computing Research Repository abs/

Lerasle, M. and R. Oliveira (2011). Robust empirical mean estimators.

Mammen, E. and A. Tsybakov (1999). Smooth discrimination analysis. The Annals of Statistics 27(6).

Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli.

Nemirovsky, A. and D. Yudin (1983). Problem complexity and method efficiency in optimization.

Nolan, J. P. (2015). Stable Distributions - Models for Heavy Tailed Data. Boston: Birkhäuser. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.

Robins, J., L. Li, E. Tchetgen, and A. van der Vaart (2009). Quadratic semiparametric von Mises calculus. Metrika 69(2-3).

Zolotarev, V. (1986). One-dimensional stable distributions, Volume 65. American Mathematical Soc.


Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

Berry-Esseen bounds for self-normalized martingales

Berry-Esseen bounds for self-normalized martingales Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Chapter 6 Infinite Series

Chapter 6 Infinite Series Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat

More information

This section is optional.

This section is optional. 4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore

More information

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Law of the sum of Bernoulli random variables

Law of the sum of Bernoulli random variables Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

Lecture 2: Concentration Bounds

Lecture 2: Concentration Bounds CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

A Note on the Kolmogorov-Feller Weak Law of Large Numbers

A Note on the Kolmogorov-Feller Weak Law of Large Numbers Joural of Mathematical Research with Applicatios Mar., 015, Vol. 35, No., pp. 3 8 DOI:10.3770/j.iss:095-651.015.0.013 Http://jmre.dlut.edu.c A Note o the Kolmogorov-Feller Weak Law of Large Numbers Yachu

More information

ON POINTWISE BINOMIAL APPROXIMATION

ON POINTWISE BINOMIAL APPROXIMATION Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe

More information

Chapter 7 Isoperimetric problem

Chapter 7 Isoperimetric problem Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the itroductio its coectio with ido s proble) is oe of the most classical problem of a shape optimizatio. It ca be formulated

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h

More information

Asymptotic distribution of products of sums of independent random variables

Asymptotic distribution of products of sums of independent random variables Proc. Idia Acad. Sci. Math. Sci. Vol. 3, No., May 03, pp. 83 9. c Idia Academy of Scieces Asymptotic distributio of products of sums of idepedet radom variables YANLING WANG, SUXIA YAO ad HONGXIA DU ollege

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

Empirical risk minimization for heavy-tailed losses

Empirical risk minimization for heavy-tailed losses Empirical risk miimizatio for heavy-tailed losses Christia Browlees Emilie Joly Gábor Lugosi Jue 8, 2014 Abstract The purpose of this paper is to discuss empirical risk miimizatio whe the losses are ot

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

Math 525: Lecture 5. January 18, 2018

Math 525: Lecture 5. January 18, 2018 Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Agnostic Learning and Concentration Inequalities

Agnostic Learning and Concentration Inequalities ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture

More information

On the convergence rates of Gladyshev s Hurst index estimator

On the convergence rates of Gladyshev s Hurst index estimator Noliear Aalysis: Modellig ad Cotrol, 2010, Vol 15, No 4, 445 450 O the covergece rates of Gladyshev s Hurst idex estimator K Kubilius 1, D Melichov 2 1 Istitute of Mathematics ad Iformatics, Vilius Uiversity

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Monte Carlo Integration

Monte Carlo Integration Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce

More information

Problem Set 2 Solutions

Problem Set 2 Solutions CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

Detailed proofs of Propositions 3.1 and 3.2

Detailed proofs of Propositions 3.1 and 3.2 Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

A Note on Sums of Independent Random Variables

A Note on Sums of Independent Random Variables Cotemorary Mathematics Volume 00 XXXX A Note o Sums of Ideedet Radom Variables Pawe l Hitczeko ad Stehe Motgomery-Smith Abstract I this ote a two sided boud o the tail robability of sums of ideedet ad

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information

PAijpam.eu ON TENSOR PRODUCT DECOMPOSITION

PAijpam.eu ON TENSOR PRODUCT DECOMPOSITION Iteratioal Joural of Pure ad Applied Mathematics Volume 103 No 3 2015, 537-545 ISSN: 1311-8080 (prited versio); ISSN: 1314-3395 (o-lie versio) url: http://wwwijpameu doi: http://dxdoiorg/1012732/ijpamv103i314

More information

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze

More information

arxiv: v1 [math.pr] 13 Oct 2011

arxiv: v1 [math.pr] 13 Oct 2011 A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,

More information

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero? 2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Entropy Rates and Asymptotic Equipartition

Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

Linear Support Vector Machines

Linear Support Vector Machines Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate

More information

Notes 27 : Brownian motion: path properties

Notes 27 : Brownian motion: path properties Notes 27 : Browia motio: path properties Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces:[Dur10, Sectio 8.1], [MP10, Sectio 1.1, 1.2, 1.3]. Recall: DEF 27.1 (Covariace) Let X = (X

More information

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15 17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig

More information

Measure and Measurable Functions

Measure and Measurable Functions 3 Measure ad Measurable Fuctios 3.1 Measure o a Arbitrary σ-algebra Recall from Chapter 2 that the set M of all Lebesgue measurable sets has the followig properties: R M, E M implies E c M, E M for N implies

More information

Week 5-6: The Binomial Coefficients

Week 5-6: The Binomial Coefficients Wee 5-6: The Biomial Coefficiets March 6, 2018 1 Pascal Formula Theorem 11 (Pascal s Formula For itegers ad such that 1, ( ( ( 1 1 + 1 The umbers ( 2 ( 1 2 ( 2 are triagle umbers, that is, The petago umbers

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference MPRA Muich Persoal RePEc Archive A costructive aalysis of covex-valued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.ui-mueche.de/55889/

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Lecture 4: April 10, 2013

Lecture 4: April 10, 2013 TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a

More information

Spectral Partitioning in the Planted Partition Model

Spectral Partitioning in the Planted Partition Model Spectral Graph Theory Lecture 21 Spectral Partitioig i the Plated Partitio Model Daiel A. Spielma November 11, 2009 21.1 Itroductio I this lecture, we will perform a crude aalysis of the performace of

More information

LECTURE 8: ASYMPTOTICS I

LECTURE 8: ASYMPTOTICS I LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece

More information