Robust estimation of U-statistics

arXiv: v1 [math.ST] 17 Apr 2015

Emilie Joly* and Gábor Lugosi†

April 20, 2015

* École Normale Supérieure, Paris.
† ICREA and Department of Economics and Business, Pompeu Fabra University. Supported by the Spanish Ministry of Science and Technology grant MTM

This paper is dedicated to the memory of Evarist Giné.

Abstract

An important part of the legacy of Evarist Giné is his fundamental contributions to our understanding of U-statistics and U-processes. In this paper we discuss the estimation of the mean of multivariate functions in the case of possibly heavy-tailed distributions. In such situations, reliable estimates of the mean cannot be obtained by usual U-statistics. We introduce a new estimator, based on the so-called median-of-means technique. We develop performance bounds for this new estimator that generalize an estimate of Arcones and Giné (1993). These bounds show that, under minimal moment conditions, the new estimator performs as well as classical U-statistics do for bounded random variables. We discuss an application of this estimator to clustering.

1 Introduction

Motivated by numerous applications, the theory of U-statistics and U-processes has received considerable attention in the past decades. U-statistics appear naturally in ranking (Clémençon et al., 2008), clustering (Clémençon, 2014) and learning on graphs (Biau and Bleakley, 2006). They also appear as components of higher-order terms in expansions of smooth statistics, see, for example, Robins et al. (2009).

The general setting may be described as follows. Let $X$ be a random variable taking values in some measurable space $\mathcal{X}$, and let $h : \mathcal{X}^m \to \mathbb{R}$ be a measurable function of $m \ge 2$ variables. Let $P$ be the probability measure of $X$. Suppose we have access to $n \ge m$ independent random variables $X_1,\dots,X_n$, all distributed as $X$. We define the U-statistic of order $m$ and kernel $h$ based on the sequence $\{X_i\}$ as

$$U_n(h) = \frac{(n-m)!}{n!} \sum_{(i_1,\dots,i_m) \in I_n^m} h(X_{i_1},\dots,X_{i_m}), \qquad (1)$$
where $I_n^m = \{(i_1,\dots,i_m) : 1 \le i_j \le n,\ i_j \ne i_k \text{ if } j \ne k\}$ is the set of all $m$-tuples of different integers between 1 and $n$. U-statistics are unbiased estimators of the mean $m_h = \mathbb{E}h(X_1,\dots,X_m)$, and they have minimal variance among all unbiased estimators (Hoeffding, 1948).

The concentration of a U-statistic around its expected value has been the subject of extensive study. de la Peña and Giné (1999) provide an excellent summary; see also Giné et al. (2000) for a more recent development. By a classical inequality of Hoeffding (1963), for a bounded kernel $h$ and for all $\delta > 0$,

$$\mathbb{P}\left\{ |U_n(h) - m_h| > \|h\|_\infty \sqrt{\frac{2\log(2/\delta)}{\lfloor n/m \rfloor}} \right\} \le \delta, \qquad (2)$$

and we also have the Bernstein-type inequality

$$\mathbb{P}\left\{ |U_n(h) - m_h| > \sqrt{\frac{4\sigma^2 \log(2/\delta)}{2\lfloor n/m \rfloor}} + \frac{4\|h\|_\infty \log(2/\delta)}{6\lfloor n/m \rfloor} \right\} \le \delta,$$

where $\sigma^2 = \operatorname{Var}(h(X_1,\dots,X_m))$.

Under certain degeneracy assumptions on the kernel, however, significantly sharper bounds have been proved. For convenience, we follow the exposition of de la Peña and Giné (1999) and restrict our attention to symmetric kernels. A kernel $h$ is symmetric if $h(x_1,\dots,x_m) = h(x_{s_1},\dots,x_{s_m})$ for all $x_1,\dots,x_m \in \mathcal{X}$ and all permutations $s$. A symmetric kernel $h$ is said to be P-degenerate of order $q-1$, $1 < q \le m$, if for all $x_1,\dots,x_{q-1} \in \mathcal{X}$,

$$\int h(x_1,\dots,x_m)\, dP^{m-q+1}(x_q,\dots,x_m) = \int h(x_1,\dots,x_m)\, dP^m(x_1,\dots,x_m)$$

and

$$(x_1,\dots,x_q) \mapsto \int h(x_1,\dots,x_m)\, dP^{m-q}(x_{q+1},\dots,x_m)$$

is not a constant function. In the special case $m_h = 0$ and $q = m$ (that is, when the kernel is $(m-1)$-degenerate), $h$ is said to be P-canonical. P-canonical kernels appear naturally in the Hoeffding decomposition of a U-statistic, see de la Peña and Giné (1999).

Arcones and Giné (1993) proved the following important improvement of Hoeffding's inequalities for canonical kernels. If $h - m_h$ is a bounded, symmetric P-canonical kernel of
$m$ variables, there exist finite positive constants $c_1$ and $c_2$, depending only on $m$, such that for all $\delta \in (0,1)$,

$$\mathbb{P}\left\{ |U_n(h) - m_h| \ge c_1 \|h\|_\infty \left( \frac{\log(c_2/\delta)}{n} \right)^{m/2} \right\} \le \delta, \qquad (3)$$

and also

$$\mathbb{P}\left\{ |U_n(h) - m_h| > \left( \frac{\sigma^2 \log(c_1/\delta)}{c_2 n} \right)^{m/2} \vee \|h\|_\infty \left( \frac{\log(c_1/\delta)}{c_2 n} \right)^{(m+1)/2} \right\} \le \delta. \qquad (4)$$

In the special case of P-canonical kernels of order $m = 2$, (3) implies that

$$|U_n(h) - m_h| \le \frac{c_1 \|h\|_\infty}{n} \log\frac{c_2}{\delta} \qquad (5)$$

with probability at least $1-\delta$. This rate of convergence is significantly faster than the rate $O_p(n^{-1/2})$ implied by (2).

All the results cited above require boundedness of the kernel. If the kernel is unbounded but $h(X_1,\dots,X_m)$ has sufficiently light (e.g., sub-Gaussian) tails, then some of these results may be extended, see, for example, Giné et al. (2000). If $h(X_1,\dots,X_m)$ may have a heavy-tailed distribution, however, exponential inequalities no longer hold, even in the univariate case $m = 1$.

U-statistics may behave erratically in the presence of heavy tails. Nevertheless, in this paper we show that under minimal moment conditions one may construct estimators of $m_h$ that satisfy exponential inequalities analogous to (2) and (3). These are the main results of the paper.

In particular, in Section 2 we introduce a robust estimator of the mean $m_h$. Theorems 1 and 3 establish exponential inequalities for the performance of the new estimator under minimal moment assumptions. More precisely, Theorem 1 only requires that $h(X_1,\dots,X_m)$ has a finite variance, and it establishes inequalities analogous to (3) for P-degenerate kernels. In Theorem 3 we further weaken the conditions and only assume that $\mathbb{E}|h|^p < \infty$ for some $1 < p \le 2$.

The next example illustrates why classical U-statistics fail under heavy-tailed distributions.

Example. Consider the special case $m = 2$, $\mathbb{E}X_1 = 0$ and $h(X_1,X_2) = X_1X_2$. Note that this kernel is P-canonical. We define $Y_1,\dots,Y_n$ as independent copies of $X_1,\dots,X_n$.
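Consider the decoupling inequalities for the tail of U-statistics given in Theorem 3.4.1 in de la Peña and Giné (1999) (see also Theorem 7 in the Appendix). By these inequalities, $U_n(h)$ has a tail behavior similar to that of

$$\left( \frac{1}{n} \sum_{i=1}^n X_i \right) \left( \frac{1}{n} \sum_{j=1}^n Y_j \right).$$

Thus, $U_n(h)$ behaves like a product of two independent empirical mean estimators of the same distribution. When the $X_i$ are heavy tailed, the empirical mean is known to be a poor estimator of the mean. As an example, assume that $X$ follows an $\alpha$-stable law $S(\gamma,\alpha)$ for some $\alpha \in (1,2)$ and $\gamma > 0$. Recall that a random variable $X$ has an $\alpha$-stable law $S(\gamma,\alpha)$ if for all $u \in \mathbb{R}$,

$$\mathbb{E}\exp(iuX) = \exp(-\gamma^\alpha |u|^\alpha)$$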
(see Zolotarev (1986) and Nolan (2015)). The properties of $\alpha$-stable distributions are summarized in Proposition 9 in the Appendix. It follows from them that there exists a constant $c > 0$, depending only on $\alpha$ and $\gamma$, such that

$$\mathbb{P}\left\{ |U_n(h)| \ge n^{2/\alpha - 2} \right\} \ge c.$$

Since $2/\alpha - 2 > -1$ for $\alpha > 1$, there is therefore no hope of reproducing an upper bound like (5). Below we show how this problem can be dealt with by replacing the U-statistic with a more robust estimator.

Our approach is based on robust mean estimators in the univariate setting. Estimating the mean of a possibly heavy-tailed random variable $X$ from an i.i.d. sample $X_1,\dots,X_n$ has recently received increasing attention. The median-of-means estimator was introduced by Nemirovsky and Yudin (1983). It takes a confidence level $\delta \in (0,1)$ and divides the data into $V \approx \log \delta^{-1}$ blocks. For each block $k = 1,\dots,V$, one computes the empirical mean $\mu_k$ of the variables in the block. The median $\hat\mu$ of the $\mu_k$ is the so-called median-of-means estimator. A short analysis of the resulting estimator shows that

$$|\hat\mu - \mathbb{E}X| \le c \sqrt{\operatorname{Var}(X) \frac{\log(1/\delta)}{n}}$$

with probability at least $1-\delta$, for a numerical constant $c$. For the details of the proof, see Lerasle and Oliveira (2011). When the variance is infinite but a moment of order $1 < p \le 2$ exists, the median-of-means estimator is still useful, see Bubeck et al. (2013).

This estimator has recently been studied in various contexts. Lerasle and Oliveira (2011) developed M-estimation based on this technique. Hsu and Sabato (2013) and Minsker (2015) discussed generalizations to multivariate settings. A similar idea was used in Alon et al. (2002). Catoni (2012) proposed an interesting alternative to the median-of-means estimator.

The rest of the paper is organized as follows. In Section 2 we introduce a robust estimator of the mean $m_h$ and present performance bounds. In particular, Section 2.1 deals with the finite-variance case. Section 2.2 is dedicated to P-degenerate kernels $h$ that have a finite $p$-th moment for some $1 < p \le 2$. Finally, in Section 3 we present an application to clustering problems.
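The univariate median-of-means estimator described above can be sketched in a few lines of Python using only the standard library. This is an illustration, not the authors' code. The function name `median_of_means` is hypothetical. The block count $V = \lceil 8 \log(1/\delta) \rceil$ is also an assumption: the text only asks for $V$ of order $\log \delta^{-1}$. The data are assumed to be i.i.d., so splitting them into consecutive blocks is harmless.

```python
import math
import statistics


def median_of_means(xs, delta):
    """Median-of-means estimate of the mean of the i.i.d. sample xs.

    Assumption (illustrative, not from the text): V = ceil(8 * log(1/delta))
    blocks, capped at len(xs).
    """
    n = len(xs)
    v = min(n, max(1, math.ceil(8 * math.log(1 / delta))))
    q, r = divmod(n, v)
    means, start = [], 0
    for k in range(v):
        size = q + (1 if k < r else 0)  # block sizes differ by at most one
        block = xs[start:start + size]
        means.append(sum(block) / size)  # empirical mean mu_k of block k
        start += size
    return statistics.median(means)  # median of the V block means


# One huge observation ruins the empirical mean but not the median of means.
sample = [0.0] * 99 + [1e9]
print(sum(sample) / len(sample))            # 10000000.0
print(median_of_means(sample, delta=0.05))  # 0.0
```

The single outlier only contaminates one of the 24 block means, so the median is unaffected. The empirical mean, by contrast, is pulled to $10^7$.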
2 Robust U-estimation

In this section we introduce a "median-of-means"-style estimator of $m_h = \mathbb{E}h(X_1,\dots,X_m)$. To define the estimator, one divides the data into $V$ blocks. For any $m$-tuple of different blocks, one computes a (decoupled) U-statistic. Finally, one computes the median of all the obtained values. The rigorous definition is as follows.

The estimator has a parameter $V \le n$, the number of blocks. A partition $\mathcal{B} = (B_1,\dots,B_V)$ of $\{1,\dots,n\}$ is called regular if for all $K = 1,\dots,V$,

$$\left| |B_K| - \frac{n}{V} \right| \le 1.$$
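A regular partition is easy to construct. Give every block either $\lfloor n/V \rfloor$ or $\lceil n/V \rceil$ indices, so that each block size differs from $n/V$ by less than one. The Python sketch below uses 0-based indices. The helper name `regular_partition` and its optional `seed` parameter (which shuffles indices before splitting) are hypothetical, not part of the paper.

```python
import random


def regular_partition(n, v, seed=None):
    """Partition {0, ..., n-1} into v blocks of size floor(n/v) or ceil(n/v).

    Every block size then differs from n/v by less than 1, so the partition
    is regular. The optional seed shuffles indices before splitting.
    """
    if not 1 <= v <= n:
        raise ValueError("need 1 <= v <= n")
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    q, r = divmod(n, v)
    blocks, start = [], 0
    for k in range(v):
        size = q + (1 if k < r else 0)  # the first r blocks get one extra index
        blocks.append(idx[start:start + size])
        start += size
    return blocks


print([len(b) for b in regular_partition(10, 3)])  # [4, 3, 3]
```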
For any $B_{i_1},\dots,B_{i_m}$ in $\mathcal{B}$, we set

$$I_{B_{i_1},\dots,B_{i_m}} = \left\{ (k_1,\dots,k_m) : k_j \in B_{i_j} \right\}$$

and

$$U_{B_{i_1},\dots,B_{i_m}}(h) = \frac{1}{|B_{i_1}| \cdots |B_{i_m}|} \sum_{(k_1,\dots,k_m) \in I_{B_{i_1},\dots,B_{i_m}}} h(X_{k_1},\dots,X_{k_m}).$$

For any integer $N$ and any vector $(a_1,\dots,a_N) \in \mathbb{R}^N$, we define the median $\operatorname{Med}(a_1,\dots,a_N)$ as any number $b$ such that

$$\left|\{ i \le N : a_i \le b \}\right| \ge \frac{N}{2} \quad \text{and} \quad \left|\{ i \le N : a_i \ge b \}\right| \ge \frac{N}{2}.$$

Finally, we define the robust estimator:

$$\bar U_{\mathcal{B}}(h) = \operatorname{Med}\left\{ U_{B_{i_1},\dots,B_{i_m}}(h) : i_j \in \{1,\dots,V\},\ 1 \le i_1 < \dots < i_m \le V \right\}. \qquad (6)$$

Mostly to simplify notation, we only take into account those values of $U_{B_{i_1},\dots,B_{i_m}}(h)$ that correspond to distinct indices $i_1 < \dots < i_m$. Thus, each $U_{B_{i_1},\dots,B_{i_m}}(h)$ is a so-called decoupled U-statistic (see the Appendix for the definition). One could also include all $m$-tuples, not necessarily with distinct indices, in the computation of the median. This has only a minor effect on the performance, and similar bounds may be proven, though with more complicated notation.

A simpler alternative takes only "diagonal" blocks into account. More precisely, let $U_{B_i}(h)$ be the U-statistic calculated from the variables in block $B_i$ (as defined in (1)). One may simply calculate the median of the $V$ different U-statistics $U_{B_i}(h)$. This version is easy to analyze because $|\{ i \le V : U_{B_i}(h) \ge b \}|$ is a sum of independent random variables. However, it is wasteful, in the sense that only a small fraction of the possible $m$-tuples are taken into account.

In the next two sections we analyze the performance of the estimator $\bar U_{\mathcal{B}}(h)$.

2.1 Exponential inequalities for P-degenerate kernels with finite variance

Next we present a performance bound for the estimator $\bar U_{\mathcal{B}}(h)$ when $\sigma^2$ is finite. The somewhat more complicated case of an infinite second moment is treated in Section 2.2.

Theorem 1. Let $X_1,\dots,X_n$ be i.i.d. random variables taking values in $\mathcal{X}$. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a symmetric kernel that is P-degenerate of order $q-1$. Assume $\operatorname{Var}(h(X_1,\dots,X_m)) = \sigma^2 < \infty$. Let $\delta \in (0, \frac12)$ be such that $\lceil \log(1/\delta) \rceil \le \frac{n}{64m}$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 32m \lceil \log(1/\delta) \rceil$.
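Then, with probability at least $1-2\delta$, we have

$$\left| \bar U_{\mathcal{B}}(h) - m_h \right| \le K_m \sigma \left( \frac{\log(1/\delta)}{n} \right)^{q/2}, \qquad (7)$$

where $K_m = 2^{m+1} m^{m/2}$.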
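The estimator (6) can be sketched directly in Python. The sketch computes one decoupled U-statistic per increasing $m$-tuple of blocks and returns their median. It is a minimal illustration under stated assumptions, not the authors' implementation. The names `blocks_for_theorem1`, `decoupled_u` and `robust_u` are hypothetical. Blocks are passed as lists of 0-based indices. The block count follows the choice $|\mathcal{B}| = 32m\lceil\log(1/\delta)\rceil$ stated in Theorem 1.

```python
import itertools
import math
import statistics


def blocks_for_theorem1(n, m, delta):
    """|B| = 32 m ceil(log(1/delta)), with the side condition of Theorem 1."""
    v = 32 * m * math.ceil(math.log(1 / delta))
    if 2 * v > n:
        raise ValueError("Theorem 1 requires ceil(log(1/delta)) <= n / (64 m)")
    return v


def decoupled_u(h, xs, blocks):
    """U_{B_{i_1},...,B_{i_m}}(h): mean of h over one index from each block."""
    total, count = 0.0, 0
    for ks in itertools.product(*blocks):
        total += h(*(xs[k] for k in ks))
        count += 1
    return total / count


def robust_u(h, m, xs, blocks):
    """Median of the decoupled U-statistics over i_1 < ... < i_m, as in (6)."""
    values = [
        decoupled_u(h, xs, [blocks[i] for i in combo])
        for combo in itertools.combinations(range(len(blocks)), m)
    ]
    return statistics.median(values)


# Toy check with the kernel h(x, y) = x * y of the introduction's example.
xs = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(robust_u(lambda a, b: a * b, 2, xs, [[0, 1], [2, 3], [4, 5]]))  # 3.0
```

In the toy check, the three decoupled values are 2, 3 and 6, so their median is 3. As for cost, the total number of kernel evaluations is the sum over all block tuples of $|B_{i_1}|\cdots|B_{i_m}|$. This is at most of the same order as for the full U-statistic (1).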
When $q = m$, the kernel $h - m_h$ is P-canonical, and the rate of convergence is then $(\log \delta^{-1}/n)^{m/2}$. Thus, the new estimator performs similarly to standard U-statistics, as in (3) and (4), but without the boundedness assumption on the kernel. Note, however, a disadvantage of the estimator $\bar U_{\mathcal{B}}(h)$: it depends on the confidence level $\delta$ through the number of blocks. Different confidence levels therefore require different estimators.

Because of its importance in applications, we spell out the special case $m = q = 2$. In Section 3 we use this result in an example of cluster analysis.

Corollary 2. Let $\delta \in (0,1/2)$. Let $h : \mathcal{X}^2 \to \mathbb{R}$ be a P-canonical kernel with $\sigma^2 = \operatorname{Var}(h(X_1,X_2))$, and let $n \ge 128(1+\log(1/\delta))$. Then, with probability at least $1-2\delta$,

$$\left| \bar U_{\mathcal{B}}(h) - m_h \right| \le 512\sigma \frac{1+\log(1/\delta)}{n}. \qquad (8)$$

The proof of Theorem 1 uses the Hoeffding decomposition of U-statistics (Hoeffding, 1948). For probability measures $P_1,\dots,P_m$, define $P_1 \times \dots \times P_m h = \int h \, d(P_1 \times \dots \times P_m)$. For a symmetric kernel $h : \mathcal{X}^m \to \mathbb{R}$, the Hoeffding projections are defined, for $0 \le k \le m$ and $x_1,\dots,x_k \in \mathcal{X}$, as

$$\pi_k h(x_1,\dots,x_k) := (\delta_{x_1} - P) \times \dots \times (\delta_{x_k} - P) \times P^{m-k} h,$$

where $\delta_x$ denotes the Dirac measure at the point $x$. Observe that $\pi_0 h = P^m h$ and that, for $k > 0$, $\pi_k h$ is a P-canonical kernel. The kernel $h$ can be decomposed as

$$h(x_1,\dots,x_m) = \sum_{k=0}^m \sum_{1 \le i_1 < \dots < i_k \le m} \pi_k h(x_{i_1},\dots,x_{i_k}). \qquad (9)$$

If $h$ is square-integrable (i.e., $P^m h^2 < \infty$), the terms in (9) are orthogonal. If $h$ is degenerate of order $q-1$, then $\pi_k h = 0$ for any $1 \le k \le q-1$.

Proof of Theorem 1. We begin with a weak concentration result for each $U_{B_{i_1},\dots,B_{i_m}}(h)$. Let $B_{i_1},\dots,B_{i_m}$ be elements of $\mathcal{B}$. Since $\mathcal{B}$ is regular and $|\mathcal{B}| \le n/2$, every $B \in \mathcal{B}$ satisfies $\frac{n}{2|\mathcal{B}|} \le |B| \le \frac{2n}{|\mathcal{B}|}$. We denote by $k = (k_1,\dots,k_m)$ an element of $I_{B_{i_1},\dots,B_{i_m}}$. We have, by the above-mentioned orthogonality
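property,

$$
\begin{aligned}
\operatorname{Var}\left(U_{B_{i_1},\dots,B_{i_m}}(h)\right)
&= \mathbb{E}\left[\left(U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h\right)^2\right] \\
&= \frac{1}{N^2} \sum_{k \in I} \sum_{l \in I} \mathbb{E}\left[\left(h(X_{k_1},\dots,X_{k_m}) - P^m h\right)\left(h(X_{l_1},\dots,X_{l_m}) - P^m h\right)\right] \\
&= \frac{1}{N^2} \sum_{k \in I} \sum_{l \in I} \sum_{s=q}^{m} \binom{|k \cap l|}{s} e_s \qquad \text{(by orthogonality)} \\
&\le \sum_{t=q}^{m} \binom{m}{t} \left(\frac{2|\mathcal{B}|}{n}\right)^{t} \sum_{s=q}^{t} \binom{t}{s} e_s ,
\end{aligned}
$$

with the following notation:
- $I = I_{B_{i_1},\dots,B_{i_m}}$, and $N = |B_{i_1}| \cdots |B_{i_m}|$ is the number of elements of $I$;
- $|k \cap l| = |\{ j : k_j = l_j \}|$;
- $e_s = \mathbb{E}[\pi_s h(X_1,\dots,X_s)^2]$;
- $\binom{t}{s} = 0$ for $s > t$.

The third line uses (9) and the fact that $\pi_s h = 0$ for $1 \le s \le q-1$. It also uses that, since the blocks are disjoint, two terms $\pi_s h$ evaluated at subvectors of $(X_{k_1},\dots,X_{k_m})$ and $(X_{l_1},\dots,X_{l_m})$ are correlated only if they involve the same coordinates $j$, all with $k_j = l_j$.

The last inequality is obtained by counting, for fixed $k$ and $t$, the elements $l$ such that $|k \cap l| = t$. There are $\binom{m}{t}$ choices for the coordinates on which $l$ agrees with $k$. Because every block has at least $n/(2|\mathcal{B}|)$ elements, there are at most $N (2|\mathcal{B}|/n)^t$ choices for the remaining coordinates.

Now exchange the order of summation. Using $2|\mathcal{B}|/n \le 1$ and $\sum_{t=s}^m \binom{m}{t}\binom{t}{s} = 2^{m-s}\binom{m}{s}$, we obtain

$$\operatorname{Var}\left(U_{B_{i_1},\dots,B_{i_m}}(h)\right) \le \left(\frac{2|\mathcal{B}|}{n}\right)^{q} \sum_{s=q}^{m} 2^{m-s} \binom{m}{s} e_s \le 2^{m} \frac{|\mathcal{B}|^q}{n^q} \sum_{s=q}^{m} \binom{m}{s} e_s .$$

On the other hand, by (9),

$$\operatorname{Var}(h) = \mathbb{E}\left[\left(\sum_{s=q}^{m} \sum_{1 \le i_1 < \dots < i_s \le m} \pi_s h(X_{i_1},\dots,X_{i_s})\right)^2\right] = \sum_{s=q}^{m} \sum_{1 \le i_1 < \dots < i_s \le m} \mathbb{E}\left[\left(\pi_s h(X_{i_1},\dots,X_{i_s})\right)^2\right] = \sum_{s=q}^{m} \binom{m}{s} e_s .$$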
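Combining the two displayed equations above,

$$\operatorname{Var}\left(U_{B_{i_1},\dots,B_{i_m}}(h)\right) \le 2^{2m-q+1} \frac{|\mathcal{B}|^q}{n^q} \sigma^2 \le 2^{2m} \frac{|\mathcal{B}|^q}{n^q} \sigma^2 .$$

By Chebyshev's inequality, for all $r \in (0,1)$,

$$\mathbb{P}\left\{ \left| U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h \right| > \frac{2^m \sigma |\mathcal{B}|^{q/2}}{n^{q/2} r^{1/2}} \right\} \le r. \qquad (10)$$

We set $x = \frac{2^m \sigma |\mathcal{B}|^{q/2}}{n^{q/2} r^{1/2}}$ and

$$N_x = \left|\left\{ (i_1,\dots,i_m) \in \{1,\dots,|\mathcal{B}|\}^m : 1 \le i_1 < \dots < i_m \le |\mathcal{B}|,\ \left| U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h \right| > x \right\}\right|.$$

The random variable $\binom{|\mathcal{B}|}{m}^{-1} N_x$ is a U-statistic of order $m$ with the symmetric kernel $g : (i_1,\dots,i_m) \mapsto \mathbb{1}\{ |U_{B_{i_1},\dots,B_{i_m}}(h) - P^m h| > x \}$. Thus, Hoeffding's inequality for centered U-statistics (2) gives

$$\mathbb{P}\left\{ N_x - \mathbb{E}N_x \ge t \binom{|\mathcal{B}|}{m} \right\} \le \exp\left( -\frac{|\mathcal{B}| t^2}{2m} \right). \qquad (11)$$

By (10), we have $\mathbb{E}N_x \le \binom{|\mathcal{B}|}{m} r$. Taking $t = r = \frac14$ in (11) and using the definition of the median, we have

$$\mathbb{P}\left\{ \left| \bar U_{\mathcal{B}}(h) - P^m h \right| > x \right\} \le \mathbb{P}\left\{ N_x \ge \frac{1}{2}\binom{|\mathcal{B}|}{m} \right\} \le \exp\left( -\frac{|\mathcal{B}|}{32m} \right).$$

Since $|\mathcal{B}| \ge 32m \log(\delta^{-1})$, with probability at least $1-\delta$ we have

$$\left| \bar U_{\mathcal{B}}(h) - P^m h \right| \le K_m \sigma \left( \frac{\log \delta^{-1}}{n} \right)^{q/2}$$

with $K_m = 2^{m+1} m^{m/2}$. The upper bound for the lower tail holds by the same argument.

2.2 Bounded moment of order $p$ with $1 < p \le 2$

In this section, we weaken the finite-variance assumption and only assume the existence of a centered moment of order $p$ for some $1 < p \le 2$. The outline of the argument is similar to the finite-variance case. First we obtain a weak concentration inequality for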
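the U-statistics in each block, and then we use the property of the median to boost the weak inequality. In the finite-variance case, weak concentration could be proved by directly calculating the variance. Here, instead, we need the randomization inequalities for convex functions of U-statistics established by de la Peña (1992) and Arcones and Giné (1993). For this reason, we need the technical assumption that $h - m_h$ is P-canonical.

Theorem 3. Let $h$ be a symmetric kernel of order $m$ such that $h - m_h$ is P-canonical. Assume that $M_p := \mathbb{E}\left[ |h(X_1,\dots,X_m) - m_h|^p \right]^{1/p} < \infty$ for some $1 < p \le 2$. Let $\delta \in (0,\frac12)$ be such that $\lceil \log(\delta^{-1}) \rceil \le \frac{n}{64m}$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 32m \lceil \log(\delta^{-1}) \rceil$. Then, with probability at least $1-2\delta$, we have

$$\left| \bar U_{\mathcal{B}}(h) - m_h \right| \le K_m M_p \left( \frac{\log(\delta^{-1})}{n} \right)^{m(p-1)/p}, \qquad (12)$$

where $K_m = 2^{4m+1} m^{m/2}$.

Proof. Define the centered version of $h$ by $g(x_1,\dots,x_m) := h(x_1,\dots,x_m) - m_h$. Let $\varepsilon_1,\dots,\varepsilon_n$ be i.i.d. Rademacher random variables (i.e., $\mathbb{P}\{\varepsilon_1 = -1\} = \mathbb{P}\{\varepsilon_1 = 1\} = 1/2$), independent of $X_1,\dots,X_n$. Write $I = I_{B_{i_1},\dots,B_{i_m}}$. By the randomization inequalities (see de la Peña and Giné (1999) and also Theorem 8 in the Appendix), we have

$$
\begin{aligned}
\mathbb{E}\left| \sum_{(k_1,\dots,k_m) \in I} g(X_{k_1},\dots,X_{k_m}) \right|^p
&\le 2^{mp}\, \mathbb{E}_X \mathbb{E}_\varepsilon \left| \sum_{(k_1,\dots,k_m) \in I} \varepsilon_{k_1} \cdots \varepsilon_{k_m} g(X_{k_1},\dots,X_{k_m}) \right|^p \qquad (13) \\
&\le 2^{mp}\, \mathbb{E}_X \left( \mathbb{E}_\varepsilon \left| \sum_{(k_1,\dots,k_m) \in I} \varepsilon_{k_1} \cdots \varepsilon_{k_m} g(X_{k_1},\dots,X_{k_m}) \right|^2 \right)^{p/2} \\
&= 2^{mp}\, \mathbb{E}_X \left( \sum_{(k_1,\dots,k_m) \in I} g(X_{k_1},\dots,X_{k_m})^2 \right)^{p/2} \\
&\le 2^{mp} \sum_{(k_1,\dots,k_m) \in I} \mathbb{E}\left| g(X_{k_1},\dots,X_{k_m}) \right|^p \\
&= 2^{mp}\, |B_{i_1}| \cdots |B_{i_m}|\, \mathbb{E}|g|^p . \qquad (14)
\end{aligned}
$$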
Thus, we have $\mathbb{E}\left[ |U_{B_{i_1},\dots,B_{i_m}}(h) - m_h|^p \right] \le 2^{mp} (|B_{i_1}| \cdots |B_{i_m}|)^{1-p}\, \mathbb{E}|g|^p$. By Markov's inequality,

$$\mathbb{P}\left\{ \left| U_{B_{i_1},\dots,B_{i_m}}(h) - m_h \right| > \frac{2^m M_p}{r^{1/p}} \left( \frac{2|\mathcal{B}|}{n} \right)^{m\frac{p-1}{p}} \right\} \le r. \qquad (15)$$

Another use of (11), with $t = r = \frac14$, gives

$$\left| \bar U_{\mathcal{B}}(h) - P^m h \right| \le 2^{4m+1} m^{m/2} M_p \left( \frac{\log \delta^{-1}}{n} \right)^{m\frac{p-1}{p}}.$$

To see why the bound of Theorem 3 has essentially the right order of magnitude, consider again the example described in the introduction. There $m = 2$, $h(X_1,X_2) = X_1X_2$, and the $X_i$ have an $\alpha$-stable law $S(\gamma,\alpha)$ for some $\gamma > 0$ and $1 < \alpha < 2$. An $\alpha$-stable random variable has finite moments up to, but not including, order $\alpha$. We may therefore take $p = \alpha - \epsilon$ for any $\epsilon \in (0, \alpha - 1)$. As noted in the introduction, there exists a constant $c$, depending only on $\alpha$ and $\gamma$, such that for all $1 \le i_1 < i_2 \le V$,

$$\mathbb{P}\left\{ \left| U_{B_{i_1},B_{i_2}}(h) - m_h \right| \ge \left( \frac{|\mathcal{B}|}{n} \right)^{2 - 2/\alpha} \right\} \ge c^{2/3}.$$

Therefore (15) is essentially the best rate one can hope for.

3 Cluster analysis with U-statistics

In this section we illustrate the use of the proposed mean estimator in a clustering problem where possibly heavy-tailed data call for robust techniques.

We consider the general statistical framework defined by Clémençon (2014), described as follows. Let $X, X'$ be i.i.d. random variables taking values in $\mathcal{X}$, where typically, but not necessarily, $\mathcal{X}$ is a subset of $\mathbb{R}^d$. Consider a partition $\mathcal{P}$ of $\mathcal{X}$ into $K$ disjoint sets, the so-called cells. Define $\Phi_{\mathcal{P}}(x,x') = \sum_{C \in \mathcal{P}} \mathbb{1}\{(x,x') \in C^2\}$, the $\{0,1\}$-valued function that indicates whether $x$ and $x'$ belong to the same cell $C$. Given a dissimilarity measure $D : \mathcal{X}^2 \to \mathbb{R}_+$, the clustering task consists in finding a partition of $\mathcal{X}$ that minimizes the clustering risk

$$W(\mathcal{P}) = \mathbb{E}\left[ D(X,X') \Phi_{\mathcal{P}}(X,X') \right].$$

Let $\Pi_K$ be a finite class of partitions $\mathcal{P}$ of $\mathcal{X}$ into $K$ cells, and define $W^* = \min_{\mathcal{P} \in \Pi_K} W(\mathcal{P})$. Given i.i.d. random variables $X_1,\dots,X_n$ distributed as $X$, the goal is to find a partition $\mathcal{P} \in \Pi_K$ with risk as close to $W^*$ as possible. A natural idea, and this is the
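approach of Clémençon (2014), is to estimate $W(\mathcal{P})$ by the U-statistic

$$\hat W_n(\mathcal{P}) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} D(X_i,X_j) \Phi_{\mathcal{P}}(X_i,X_j)$$

and to choose a partition minimizing the empirical clustering risk $\hat W_n(\mathcal{P})$. Clémençon (2014) uses the theory of U-processes to analyze the performance of such minimizers of U-statistics. However, controlling uniform deviations of the form $\sup_{\mathcal{P} \in \Pi_K} |\hat W_n(\mathcal{P}) - W(\mathcal{P})|$ requires exponential concentration inequalities for U-statistics. This restricts one to bounded dissimilarity measures $D(X,X')$. When $D(X,X')$ may have a heavy tail, we propose to replace the U-statistics by the median-of-means estimators of $W(\mathcal{P})$ introduced in this paper.

Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$, and define the median-of-means estimator $\bar W_{\mathcal{B}}(\mathcal{P})$ of $W(\mathcal{P})$ as in (6). Then Theorem 1 applies, and we have the following simple corollary.

Corollary 4. Let $\Pi_K$ be a class of partitions of cardinality $|\Pi_K| = N$. Assume that $\sigma^2 := \mathbb{E}[D(X_1,X_2)^2] < \infty$. Let $\delta \in (0,1/2)$ be such that $n \ge 128 \lceil \log(N/\delta) \rceil$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 64 \lceil \log(N/\delta) \rceil$. Then there exists a constant $C$ such that, with probability at least $1-2\delta$,

$$\sup_{\mathcal{P} \in \Pi_K} \left| \bar W_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P}) \right| \le C\sigma \left( \frac{\log(N/\delta)}{n} \right)^{1/2}. \qquad (16)$$

Proof. Since $\Phi_{\mathcal{P}}(x,x')$ is bounded by 1, $\operatorname{Var}(D(X_1,X_2)\Phi_{\mathcal{P}}(X_1,X_2)) \le \mathbb{E}[D(X_1,X_2)^2]$. For a fixed $\mathcal{P} \in \Pi_K$, Theorem 1 applies with $m = 2$ and $q = 1$. The inequality follows from the union bound.

Once the uniform deviations of $\bar W_{\mathcal{B}}(\mathcal{P})$ from its expected value are controlled, it is a routine exercise to derive performance bounds for clustering based on minimizing $\bar W_{\mathcal{B}}(\mathcal{P})$ over $\mathcal{P} \in \Pi_K$. Let $\hat{\mathcal{P}} = \operatorname{argmin}_{\mathcal{P} \in \Pi_K} \bar W_{\mathcal{B}}(\mathcal{P})$ denote the empirical minimizer. (If there are multiple minimizers, one may select one arbitrarily.) Now for any $\mathcal{P}_0 \in \Pi_K$,

$$
\begin{aligned}
W(\hat{\mathcal{P}}) - W^* &= W(\hat{\mathcal{P}}) - \bar W_{\mathcal{B}}(\hat{\mathcal{P}}) + \bar W_{\mathcal{B}}(\hat{\mathcal{P}}) - W^* \\
&\le W(\hat{\mathcal{P}}) - \bar W_{\mathcal{B}}(\hat{\mathcal{P}}) + \bar W_{\mathcal{B}}(\mathcal{P}_0) - W(\mathcal{P}_0) + W(\mathcal{P}_0) - W^* \\
&\le 2 \sup_{\mathcal{P} \in \Pi_K} \left| \bar W_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P}) \right| + W(\mathcal{P}_0) - W^*.
\end{aligned}
$$

Taking the infimum over $\mathcal{P}_0 \in \Pi_K$,

$$W(\hat{\mathcal{P}}) - W^* \le 2 \sup_{\mathcal{P} \in \Pi_K} \left| \bar W_{\mathcal{B}}(\mathcal{P}) - W(\mathcal{P}) \right|. \qquad (17)$$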
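The selection rule $\hat{\mathcal{P}} = \operatorname{argmin}_{\mathcal{P} \in \Pi_K} \bar W_{\mathcal{B}}(\mathcal{P})$ can be sketched on a toy example. This is an illustration under stated assumptions, not the method of Clémençon (2014) or a definitive implementation. Several choices below are hypothetical:
- partitions are represented as functions that map a point to a cell label;
- the data are one-dimensional;
- the dissimilarity is $D(x,x') = (x-x')^2$;
- the candidate class contains two threshold partitions.

The risk estimate is the median over block pairs of the decoupled averages of $D \cdot \Phi_{\mathcal{P}}$, that is, (6) with $m = 2$.

```python
import itertools
import statistics


def threshold_partition(t):
    """Hypothetical two-cell partition of the real line: cell 0 is (-inf, t]."""
    return lambda x: int(x > t)


def squared_gap(x, y):
    """Hypothetical dissimilarity D(x, y) = (x - y)^2."""
    return (x - y) ** 2


def mom_risk(cell_of, xs, blocks, dissim):
    """Median over block pairs of decoupled averages of D(x, x') Phi_P(x, x')."""
    values = []
    for b1, b2 in itertools.combinations(blocks, 2):
        total = sum(
            dissim(xs[i], xs[j])
            for i in b1
            for j in b2
            if cell_of(xs[i]) == cell_of(xs[j])  # Phi_P(x, x') = 1
        )
        values.append(total / (len(b1) * len(b2)))
    return statistics.median(values)


def select_partition(partitions, xs, blocks, dissim):
    """Index of a minimizer of the median-of-means risk (first one on ties)."""
    return min(
        range(len(partitions)),
        key=lambda k: mom_risk(partitions[k], xs, blocks, dissim),
    )


xs = [0.0, 0.1, 10.0, 10.1, 0.2, 9.9]
blocks = [[0, 1], [2, 3], [4, 5]]
candidates = [threshold_partition(5.0), threshold_partition(0.05)]
print(select_partition(candidates, xs, blocks, squared_gap))  # 0
```

The threshold at 5.0 separates the two groups of points and has an estimated risk of about 0.0125. The threshold at 0.05 merges distant points into one cell and has an estimated risk of about 48.5. The rule therefore selects index 0.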
Finally, (16) implies that

$$W(\hat{\mathcal{P}}) - W^* \le 2C\sigma \left( \frac{1+\log(N/\delta)}{n} \right)^{1/2}.$$

This result is to be compared with Theorem 2 of Clémençon (2014). Our result holds under the sole assumption that $D(X,X')$ has a finite second moment. (By Theorem 3, this may be weakened to the existence of a finite $p$-th moment for some $1 < p \le 2$.) On the other hand, our result holds only for a finite class of partitions. Clémençon (2014), in contrast, uses the theory of U-processes to obtain more sophisticated bounds for uniform deviations over possibly infinite classes of partitions. It remains a challenge to develop a theory for controlling processes of median-of-means estimators, in the style of Arcones and Giné (1993), that avoids resorting to simple union bounds.

In the rest of this section we show that faster rates of convergence can be obtained under certain low-noise assumptions. These are analogous to the ones introduced by Mammen and Tsybakov (1999) in the context of classification. In this part we need bounds for P-canonical kernels, and we use the full power of Corollary 2. Similar arguments for the study of minimizing U-statistics appear in Clémençon et al. (2008) and Clémençon (2014). We assume the following conditions, also considered by Clémençon (2014):

1. There exists $\mathcal{P}^*$ such that $W(\mathcal{P}^*) = W^*$.
2. There exist $\alpha \in [0,1]$ and $\kappa < \infty$ such that for all $\mathcal{P} \in \Pi_K$ and for all $x \in \mathcal{X}$,
$$\mathbb{P}\left\{ \Phi_{\mathcal{P}}(x,X) \ne \Phi_{\mathcal{P}^*}(x,X) \right\} \le \kappa \left(W(\mathcal{P}) - W^*\right)^\alpha .$$

Note that $\alpha \le 2$ since, by the Cauchy-Schwarz inequality,

$$W(\mathcal{P}) - W^* \le \mathbb{E}\left[ D(X_1,X_2)^2 \right]^{1/2} \mathbb{P}\left\{ \Phi_{\mathcal{P}}(X_1,X_2) \ne \Phi_{\mathcal{P}^*}(X_1,X_2) \right\}^{1/2}.$$

Corollary 5. Assume the conditions above and that $\sigma^2 := \mathbb{E}[D(X_1,X_2)^2] < \infty$. Let $\delta \in (0,1/2)$ be such that $n \ge 128 \lceil \log(N/\delta) \rceil$. Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$ with $|\mathcal{B}| = 64 \lceil \log(N/\delta) \rceil$. Then there exists a constant $C$ such that, with probability at least $1-2\delta$,

$$W(\hat{\mathcal{P}}) - W^* \le C \sigma^{2/(2-\alpha)} \left( \frac{\log(N/\delta)}{n} \right)^{1/(2-\alpha)}. \qquad (18)$$

The proof of Corollary 5 is postponed to the Appendix.

4 Appendix

4.1 Decoupling and randomization

Here we summarize some of the key tools for analyzing U-statistics that we use in the paper. For an excellent exposition, we refer to de la Peña and Giné (1999).
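Let $\{X_i\}$ be i.i.d. random variables taking values in $\mathcal{X}$, and let $\{X_i^k\}$, $k = 1,\dots,m$, be sequences of independent copies. Let $\Phi$ be a non-negative function. As a corollary of a theorem in de la Peña and Giné (1999), we have the following.

Theorem 6. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a measurable function with $\mathbb{E}|h(X_1,\dots,X_m)| < \infty$. Let $\Phi : [0,\infty) \to [0,\infty)$ be a convex nondecreasing function such that $\mathbb{E}\Phi(|h(X_1,\dots,X_m)|) < \infty$. Then

$$\mathbb{E}\Phi\left( \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| \right) \le \mathbb{E}\Phi\left( C_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| \right),$$

where $C_m = 2^m (m^m - 1)((m-1)^{m-1} - 1) \times \dots \times 3$. Moreover, if the kernel $h$ is symmetric, then

$$\mathbb{E}\Phi\left( c_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| \right) \le \mathbb{E}\Phi\left( \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| \right),$$

where $c_m = 1/(2^{2m-2}(m-1)!)$.

An equivalent result for tail probabilities of U-statistics is the following (see de la Peña and Giné (1999)).

Theorem 7. Under the same hypotheses as Theorem 6, there exists a constant $C_m$, depending only on $m$, such that for all $t > 0$,

$$\mathbb{P}\left\{ \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| > t \right\} \le C_m \mathbb{P}\left\{ C_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| > t \right\}.$$

If, moreover, the kernel $h$ is symmetric, then there exists a constant $c_m$, depending only on $m$, such that for all $t > 0$,

$$c_m \mathbb{P}\left\{ c_m \left| \sum_{I_n^m} h(X^1_{i_1},\dots,X^m_{i_m}) \right| > t \right\} \le \mathbb{P}\left\{ \left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right| > t \right\}.$$

The next theorem is a direct corollary of a theorem in de la Peña and Giné (1999).

Theorem 8. Let $1 < p \le 2$. Let $(\varepsilon_i)_i$ be i.i.d. Rademacher random variables, independent of the $(X_i)_i$. Let $h : \mathcal{X}^m \to \mathbb{R}$ be a P-degenerate measurable function such that $\mathbb{E}(|h(X_1,\dots,X_m)|^p) < \infty$. Then

$$c_m \mathbb{E}\left| \sum_{I_n^m} \varepsilon_{i_1} \cdots \varepsilon_{i_m} h(X_{i_1},\dots,X_{i_m}) \right|^p \le \mathbb{E}\left| \sum_{I_n^m} h(X_{i_1},\dots,X_{i_m}) \right|^p \le C_m \mathbb{E}\left| \sum_{I_n^m} \varepsilon_{i_1} \cdots \varepsilon_{i_m} h(X_{i_1},\dots,X_{i_m}) \right|^p,$$

where $C_m = 2^{mp}$ and $c_m = 2^{-mp}$.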
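The same conclusion holds for decoupled U-statistics.

4.2 α-stable distributions

Proposition 9. Let $\alpha \in (0,2)$. Let $X_1,\dots,X_n$ be i.i.d. random variables of law $S(\gamma,\alpha)$. Let $f_{\gamma,\alpha} : x \in \mathbb{R} \mapsto f_{\gamma,\alpha}(x)$ be the density function of $X_1$. Let $S_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then:
(i) $f_{\gamma,\alpha}$ is an even function.
(ii) $f_{\gamma,\alpha}(x) \sim \alpha \gamma^\alpha c_\alpha x^{-\alpha-1}$ as $x \to +\infty$, with $c_\alpha = \sin(\frac{\pi\alpha}{2})\Gamma(\alpha)/\pi$.
(iii) $\mathbb{E}[|X_1|^p]$ is finite for any $p < \alpha$ and infinite whenever $p \ge \alpha$.
(iv) $S_n$ has an $\alpha$-stable law $S(\gamma n^{1/\alpha - 1}, \alpha)$.

Proof. (i) and (iv) follow directly from the definition. (ii) is proved in the introduction of Zolotarev (1986). (iii) is a consequence of (ii).

4.3 Proof of Corollary 5

Define $\Lambda_n(\mathcal{P}) = \hat W_n(\mathcal{P}) - \hat W_n(\mathcal{P}^*)$. This is the U-statistic based on the sample $X_1,\dots,X_n$ with symmetric kernel $h_{\mathcal{P}}(x,x') = D(x,x')\left(\Phi_{\mathcal{P}}(x,x') - \Phi_{\mathcal{P}^*}(x,x')\right)$. We denote by $\Lambda(\mathcal{P}) = W(\mathcal{P}) - W^*$ the expected value of $\Lambda_n(\mathcal{P})$. The main argument in the following analysis is based on the Hoeffding decomposition. For all partitions $\mathcal{P}$,

$$\Lambda_n(\mathcal{P}) - \Lambda(\mathcal{P}) = 2L_n(\mathcal{P}) + M_n(\mathcal{P}).$$

Here $L_n(\mathcal{P}) = \frac{1}{n}\sum_i h^{(1)}(X_i)$, with $h^{(1)}(x) = \mathbb{E}[h_{\mathcal{P}}(X,x)] - \Lambda(\mathcal{P})$. The term $M_n(\mathcal{P})$ is the U-statistic based on the canonical kernel $h^{(2)}(x,x') = h_{\mathcal{P}}(x,x') - h^{(1)}(x) - h^{(1)}(x') - \Lambda(\mathcal{P})$.

Let $\mathcal{B}$ be a regular partition of $\{1,\dots,n\}$. For any $B \in \mathcal{B}$, let $\Lambda_B(\mathcal{P})$ be the U-statistic with kernel $h_{\mathcal{P}}$ restricted to the set $B$. Let $\bar\Lambda_{\mathcal{B}}(\mathcal{P})$ be the median of the sequence $(\Lambda_B(\mathcal{P}))_{B \in \mathcal{B}}$. We define $L_B(\mathcal{P})$ and $M_B(\mathcal{P})$ similarly, using the variables $(X_i)_{i \in B}$. For any $B \in \mathcal{B}$,

$$\operatorname{Var}(\Lambda_B(\mathcal{P})) = 4\operatorname{Var}(L_B(\mathcal{P})) + \operatorname{Var}(M_B(\mathcal{P})) = \frac{4}{|B|}\operatorname{Var}\left(h^{(1)}(X)\right) + \frac{2}{|B|(|B|-1)}\operatorname{Var}\left(h^{(2)}(X_1,X_2)\right).$$

Simple computations show that $\operatorname{Var}(h^{(2)}(X_1,X_2)) = 2\operatorname{Var}(h^{(1)}(X))$, and therefore

$$\operatorname{Var}(\Lambda_B(\mathcal{P})) \le \frac{8}{|B|}\operatorname{Var}\left(h^{(1)}(X)\right).$$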
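Moreover,

$$
\begin{aligned}
\operatorname{Var}\left(h^{(1)}(X)\right) &\le \mathbb{E}_X\left[ \mathbb{E}_{X'}\left[h_{\mathcal{P}}(X,X')\right]^2 \right] \\
&\le \mathbb{E}_X\left[ \mathbb{E}_{X'}\left[D(X,X')^2\right] \mathbb{E}_{X'}\left[\left(\Phi_{\mathcal{P}}(X,X') - \Phi_{\mathcal{P}^*}(X,X')\right)^2\right] \right] \\
&= \mathbb{E}_X\left[ \mathbb{E}_{X'}\left[D(X,X')^2\right] \mathbb{P}_{X'}\left\{\Phi_{\mathcal{P}}(X,X') \ne \Phi_{\mathcal{P}^*}(X,X')\right\} \right] \\
&\le \sigma^2 \kappa (W(\mathcal{P}) - W^*)^\alpha ,
\end{aligned}
$$

where $\mathbb{E}_X$ (resp. $\mathbb{E}_{X'}$) denotes expectation with respect to $X$ (resp. $X'$). Chebyshev's inequality gives, for $r \in (0,1)$,

$$\mathbb{P}\left\{ \left| \Lambda_B(\mathcal{P}) - \Lambda(\mathcal{P}) \right| > \sigma (W(\mathcal{P}) - W^*)^{\alpha/2} \sqrt{\frac{8\kappa}{r|B|}} \right\} \le r.$$

Apply (11) again with $r = \frac14$ and the choice of $|\mathcal{B}|$ in Corollary 5. This shows that there exists a constant $C$ such that, for any $\mathcal{P} \in \Pi_K$, with probability at least $1 - 2\delta/N$,

$$\left| \bar\Lambda_{\mathcal{B}}(\mathcal{P}) - \Lambda(\mathcal{P}) \right| \le C \sigma (W(\mathcal{P}) - W^*)^{\alpha/2} \sqrt{\frac{\log(N/\delta)}{n}}.$$

By the union bound, this implies that

$$\left| \bar W_{\mathcal{B}}(\hat{\mathcal{P}}) - W(\hat{\mathcal{P}}) \right| \le K \sigma (W(\hat{\mathcal{P}}) - W^*)^{\alpha/2} \sqrt{\frac{\log(N/\delta)}{n}}$$

with probability at least $1-2\delta$. Using (17), we obtain

$$\left(W(\hat{\mathcal{P}}) - W^*\right)^{1-\alpha/2} \le 2K\sigma \sqrt{\frac{\log(N/\delta)}{n}},$$

which concludes the proof.

References

Alon, N., Y. Matias, and M. Szegedy (2002). The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58.

Arcones, M. A. and E. Giné (1993). Limit theorems for U-processes. The Annals of Probability 21.

Biau, G. and K. Bleakley (2006). Statistical inference on graphs. Statistics & Decisions 24(2).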
Bubeck, S., N. Cesa-Bianchi, and G. Lugosi (2013). Bandits with heavy tail. IEEE Transactions on Information Theory 59.

Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 48.

Clémençon, S. (2014). A statistical view of clustering performance through the theory of U-processes. Journal of Multivariate Analysis 124.

Clémençon, S., G. Lugosi, and N. Vayatis (2008). Ranking and empirical minimization of U-statistics. The Annals of Statistics.

de la Peña, V. and E. Giné (1999). Decoupling: from Dependence to Independence. New York: Springer.

de la Peña, V. H. (1992). Decoupling and Khintchine's inequalities for U-statistics. The Annals of Probability.

Giné, E., R. Latała, and J. Zinn (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability II, Progress in Probability. Birkhäuser.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58.

Hsu, D. and S. Sabato (2013). Approximate loss minimization with heavy tails. Computing Research Repository abs/

Lerasle, M. and R. Oliveira (2011). Robust empirical mean estimators.

Mammen, E. and A. Tsybakov (1999). Smooth discrimination analysis. The Annals of Statistics 27(6).

Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli.

Nemirovsky, A. and D. Yudin (1983). Problem complexity and method efficiency in optimization.

Nolan, J. P. (2015). Stable Distributions - Models for Heavy Tailed Data. Boston: Birkhäuser. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.

Robins, J., L. Li, E. Tchetgen, and A. van der Vaart (2009). Quadratic semiparametric von Mises calculus. Metrika 69(2-3).
Zolotarev, V. (1986). One-dimensional Stable Distributions, Volume 65. American Mathematical Soc.
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More informationBinary classification, Part 1
Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y
More informationA RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS
J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a
More informationLaw of the sum of Bernoulli random variables
Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible
More information5.1 A mutual information bound based on metric entropy
Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationExponential Families and Bayesian Inference
Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where
More informationLecture 7: Density Estimation: k-nearest Neighbor and Basis Approach
STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationA Note on the Kolmogorov-Feller Weak Law of Large Numbers
Joural of Mathematical Research with Applicatios Mar., 015, Vol. 35, No., pp. 3 8 DOI:10.3770/j.iss:095-651.015.0.013 Http://jmre.dlut.edu.c A Note o the Kolmogorov-Feller Weak Law of Large Numbers Yachu
More informationON POINTWISE BINOMIAL APPROXIMATION
Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationLearning Theory: Lecture Notes
Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic
More informationLecture 2. The Lovász Local Lemma
Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationEconomics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator
Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationJournal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula
Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationChapter 7 Isoperimetric problem
Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the itroductio its coectio with ido s proble) is oe of the most classical problem of a shape optimizatio. It ca be formulated
More informationSequences. Notation. Convergence of a Sequence
Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h
More informationAsymptotic distribution of products of sums of independent random variables
Proc. Idia Acad. Sci. Math. Sci. Vol. 3, No., May 03, pp. 83 9. c Idia Academy of Scieces Asymptotic distributio of products of sums of idepedet radom variables YANLING WANG, SUXIA YAO ad HONGXIA DU ollege
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More informationRates of Convergence by Moduli of Continuity
Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More informationKernel density estimator
Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I
More informationEmpirical risk minimization for heavy-tailed losses
Empirical risk miimizatio for heavy-tailed losses Christia Browlees Emilie Joly Gábor Lugosi Jue 8, 2014 Abstract The purpose of this paper is to discuss empirical risk miimizatio whe the losses are ot
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationMath 525: Lecture 5. January 18, 2018
Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationOn the convergence rates of Gladyshev s Hurst index estimator
Noliear Aalysis: Modellig ad Cotrol, 2010, Vol 15, No 4, 445 450 O the covergece rates of Gladyshev s Hurst idex estimator K Kubilius 1, D Melichov 2 1 Istitute of Mathematics ad Iformatics, Vilius Uiversity
More informationLecture 12: September 27
36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.
More informationMonte Carlo Integration
Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationRiesz-Fischer Sequences and Lower Frame Bounds
Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.
More informationNotes 19 : Martingale CLT
Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall
More informationDetailed proofs of Propositions 3.1 and 3.2
Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationA Note on Sums of Independent Random Variables
Cotemorary Mathematics Volume 00 XXXX A Note o Sums of Ideedet Radom Variables Pawe l Hitczeko ad Stehe Motgomery-Smith Abstract I this ote a two sided boud o the tail robability of sums of ideedet ad
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More informationPAijpam.eu ON TENSOR PRODUCT DECOMPOSITION
Iteratioal Joural of Pure ad Applied Mathematics Volume 103 No 3 2015, 537-545 ISSN: 1311-8080 (prited versio); ISSN: 1314-3395 (o-lie versio) url: http://wwwijpameu doi: http://dxdoiorg/1012732/ijpamv103i314
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationarxiv: v1 [math.pr] 13 Oct 2011
A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,
More informationIf a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?
2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More informationEntropy Rates and Asymptotic Equipartition
Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,
More informationLinear Support Vector Machines
Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate
More informationNotes 27 : Brownian motion: path properties
Notes 27 : Browia motio: path properties Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces:[Dur10, Sectio 8.1], [MP10, Sectio 1.1, 1.2, 1.3]. Recall: DEF 27.1 (Covariace) Let X = (X
More information17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15
17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig
More informationMeasure and Measurable Functions
3 Measure ad Measurable Fuctios 3.1 Measure o a Arbitrary σ-algebra Recall from Chapter 2 that the set M of all Lebesgue measurable sets has the followig properties: R M, E M implies E c M, E M for N implies
More informationWeek 5-6: The Binomial Coefficients
Wee 5-6: The Biomial Coefficiets March 6, 2018 1 Pascal Formula Theorem 11 (Pascal s Formula For itegers ad such that 1, ( ( ( 1 1 + 1 The umbers ( 2 ( 1 2 ( 2 are triagle umbers, that is, The petago umbers
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationStochastic Simulation
Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso
More informationA constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference
MPRA Muich Persoal RePEc Archive A costructive aalysis of covex-valued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.ui-mueche.de/55889/
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationLecture 4: April 10, 2013
TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a
More informationSpectral Partitioning in the Planted Partition Model
Spectral Graph Theory Lecture 21 Spectral Partitioig i the Plated Partitio Model Daiel A. Spielma November 11, 2009 21.1 Itroductio I this lecture, we will perform a crude aalysis of the performace of
More informationLECTURE 8: ASYMPTOTICS I
LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece
More information