COMPLEXITY REGULARIZATION VIA LOCALIZED RANDOM PENALTIES


COMPLEXITY REGULARIZATION VIA LOCALIZED RANDOM PENALTIES

GÁBOR LUGOSI AND MARTEN WEGKAMP

Abstract. In this paper model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, which are based on estimates of the complexity of a small subclass of each model class, containing only those functions with small empirical loss. The penalties are novel since those considered in the literature are typically based on the entire model class. Oracle inequalities using these penalties are established, and the advantage of the new penalties over those based on the complexity of the whole model class is demonstrated.

1. Introduction. In this paper we propose a new complexity-penalized model selection method based on data-dependent penalties. We consider a simple binary classification problem, though most ideas may be extended to a more general framework. Given a random observation X ∈ R^d, one has to predict Y ∈ {0, 1}. A classifier or classification rule is a function f : R^d → {0, 1}, with loss

L(f) := P{ f(X) ≠ Y }.

A sample D_n = (X_1, Y_1), ..., (X_n, Y_n) of independent, identically distributed (i.i.d.) pairs is available. Each pair (X_i, Y_i) has the same distribution as (X, Y) and D_n is independent of (X, Y). The statistician's task is to select a classification rule f_n based on the data D_n such that the probability of error

L(f_n) = P{ f_n(X) ≠ Y | D_n }

is small. The Bayes classifier

f*(x) := I{ P{Y = 1 | X = x} ≥ P{Y = 0 | X = x} }

(where I denotes the indicator function) is the optimal rule, as

L* := inf_{f : R^d → {0,1}} L(f) = L(f*),

but both f* and L* are unknown to the statistician.

Date: March 3. Mathematics Subject Classification. Primary 62H30, 62G99; secondary 60E15. Key words and phrases. Classification, complexity regularization, concentration inequalities, oracle inequalities, Rademacher averages, random penalties, shatter coefficients. Supported by DGI grant BMF.
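The empirical risk minimization setup described above is easy to prototype. The following is a minimal sketch (not from the paper) of computing empirical losses over a finite class of candidate classifiers and returning the empirical minimizer; the toy class of decision stumps and all names are illustrative assumptions.

```python
import numpy as np

def empirical_loss(f, X, Y):
    """Empirical probability of error of classifier f on the sample (X, Y)."""
    return float(np.mean(f(X) != Y))

def empirical_minimizer(classifiers, X, Y):
    """Return the classifier with the smallest empirical loss in a finite class."""
    losses = [empirical_loss(f, X, Y) for f in classifiers]
    best = int(np.argmin(losses))
    return classifiers[best], losses[best]

# Toy example: decision stumps thresholding the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = (X[:, 0] > 0.3).astype(int)
stumps = [lambda x, t=t: (x[:, 0] > t).astype(int) for t in np.linspace(-1, 1, 21)]
f_hat, loss_hat = empirical_minimizer(stumps, X, Y)
print(loss_hat)
```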

In this note we study classifiers f̂_n : R^d → {0, 1} which minimize the empirical loss

L̂_n(f) = (1/n) Σ_{i=1}^n I{ f(X_i) ≠ Y_i }

over a class of rules F. For any f̂_n ∈ F minimizing the empirical probability of error, we have

E L(f̂_n) − L* = E[ L̂_n(f̂_n) − L* ] + E[ (L − L̂_n)(f̂_n) ]
= E[ inf_{f ∈ F} L̂_n(f) − L* ] + E[ (L − L̂_n)(f̂_n) ]
≤ inf_{f ∈ F} E L̂_n(f) − L* + E[ (L − L̂_n)(f̂_n) ]
= inf_{f ∈ F} L(f) − L* + E[ (L − L̂_n)(f̂_n) ].

Clearly, the approximation error inf_{f ∈ F} L(f) − L* is decreasing as F becomes richer. However, the more complex F, the more difficult the statistical problem becomes: the estimation error E[ (L − L̂_n)(f̂_n) ] increases with the complexity of F.

In many approaches to the problem described above, one fixes in advance a sequence of model classes F_1, F_2, ..., whose union is F. The problem of penalized model selection is to find a possibly data-dependent penalty Ĉ_{n,k}, assigned to each class F_k, such that minimizing the penalized empirical loss

L̂_n(f) + Ĉ_{n,k},   f ∈ F_k,  k = 1, 2, ...,

leads to a prediction rule f̂_n with smallest possible loss. Denote by f̂_{n,k} a function in F_k having minimal empirical loss and by L_k = inf_{f ∈ F_k} L(f) the minimal loss in class F_k. The main idea is that since f̂_{n,k} minimizes L̂_n(f) over F_k, we find, by the argument described above, that

E L(f̂_{n,k}) − L_k ≤ E[ (L − L̂_n)(f̂_{n,k}) ].

Our goal is to find the class F_k such that L(f̂_{n,k}) is as small as possible. To this end, a good balance has to be found between the approximation and estimation errors. The approximation error is unknown to us, but the estimation error may be estimated. The key to complexity-regularized model selection is that a tight bound for the estimation error is a good penalty Ĉ_{n,k}. More precisely, we show in Lemma 2.1 below that if for some constant κ > 0

P{ Ĉ_{n,k} < (L − L̂_n)(f̂_{n,k}) } ≤ κ n^{-2} k^{-2}   for all k,

then the oracle inequality

E L(f̂_n) − L* ≤ inf_k ( L_k − L* + E Ĉ_{n,k} ) + 2κ/n²

holds, and also a similar bound,

L(f̂_n) − L* ≤ inf_k ( L_k − L* + 2 Ĉ_{n,k} ),

holds with probability greater than 1 − 4κ/n².
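The selection criterion just described, minimizing the penalized empirical loss over all classes, is mechanical once the penalties are available. A minimal sketch (not from the paper), assuming finite classes given as lists of callables and penalties supplied externally; all names are illustrative.

```python
import numpy as np

def penalized_selection(classes, penalties, X, Y):
    """For each class F_k pick its empirical minimizer f_hat_{n,k}, then return the
    index k and the classifier minimizing the penalized empirical loss
        L_hat_n(f_hat_{n,k}) + C_hat_{n,k}."""
    best_score, best_k, best_f = np.inf, None, None
    for k, (F_k, C_k) in enumerate(zip(classes, penalties), start=1):
        losses = [float(np.mean(f(X) != Y)) for f in F_k]
        j = int(np.argmin(losses))
        if losses[j] + C_k < best_score:
            best_score, best_k, best_f = losses[j] + C_k, k, F_k[j]
    return best_k, best_f
```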

This simple result shows that the penalty should be, with large probability, an upper bound on the estimation error, and to guarantee good performance, the bound should be as tight as possible.

Originally, distribution-free bounds, based on uniform-deviation inequalities, were proposed as penalties. For example, the structural risk minimization method of Vapnik and Chervonenkis [27] uses penalties of the form

Ĉ_{n,k} = κ √( ( log S_k(2n) + log n ) / n ),

where κ is a constant and S_k(2n) is the 2n-maximal shatter coefficient of the class A_k = { {x : f(x) = 1} : f ∈ F_k }, that is,

(1.1)  S_k(2n) = max_{x_1,...,x_{2n}} | { {x_1, ..., x_{2n}} ∩ A : A ∈ A_k } | = max_{x_1,...,x_{2n}} | { (f(x_1), ..., f(x_{2n})) : f ∈ F_k } |,

see, for example, Vapnik [26], Devroye, Györfi, and Lugosi [9]. The fact that this type of penalty works follows from the Vapnik-Chervonenkis inequality. Such distribution-free bounds are attractive because of their simplicity, but precisely because of their distribution-free nature, they are necessarily loose in many cases. Recently various attempts have been made to define the penalties in a data-dependent way to achieve this goal, see, for example, Bartlett, Boucheron, and Lugosi [2], Koltchinskii [11], Koltchinskii and Panchenko [13], Lozano [15], Lugosi and Nobel [17], Massart [19], and Shawe-Taylor, Bartlett, Williamson, and Anthony [22]. For example, in [11] and [2], random complexity penalties based on Rademacher averages were proposed and investigated.
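For a concrete feel for such distribution-free penalties, here is a small sketch (not from the paper) that evaluates a penalty of the form κ √((log S_k(2n) + log n)/n), bounding the shatter coefficient through the VC dimension via Sauer's lemma, log S_k(2n) ≤ V_k log(2n + 1); the constant κ and the use of the VC bound are assumptions made for illustration.

```python
import math

def srm_penalty(n, vc_dim, kappa=1.0):
    """Distribution-free penalty kappa * sqrt((log S_k(2n) + log n) / n), with the
    worst-case shatter coefficient bounded by Sauer's lemma:
        log S_k(2n) <= V_k * log(2n + 1)."""
    log_shatter = vc_dim * math.log(2 * n + 1)
    return kappa * math.sqrt((log_shatter + math.log(n)) / n)

# Example: classes of increasing VC dimension on a sample of size n = 1000.
print([round(srm_penalty(1000, v), 4) for v in (1, 5, 20)])
```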

Rademacher averages are defined as

R_n(F_k) = E[ sup_{f ∈ F_k} (1/n) Σ_{i=1}^n σ_i I{ f(X_i) ≠ Y_i } | D_n ],

where σ_1, ..., σ_n are i.i.d. symmetric {−1, 1}-valued random variables, independent of D_n. The reason why this penalty was introduced is based on the fact that

E sup_{f ∈ F_k} (L − L̂_n)(f) ≤ 2 E R_n(F_k)

(see, e.g., van der Vaart and Wellner [25]), and since R_n(F_k) can be shown to be sharply concentrated around its mean. In fact, concentration inequalities have been a key tool in the analysis of data-based penalties (see Massart [19]) and this paper relies heavily on some recent concentration results. The model selection method based on Rademacher complexities satisfies an oracle inequality of the rough form

(1.2)  E L(f̂_n) − L* ≤ inf_k [ L_k − L* + κ_1 E R_n(F_k) + κ_2 √( log k / n ) ]

(see [2] and [11]) for values of the constants κ_1, κ_2 > 0. The advantage of this bound over the one obtained by the distribution-free penalties mentioned above may perhaps be better understood if we further bound

E R_n(F_k) ≤ E √( 2 log( 2 S_k(X_1^n) ) / n ),

where

(1.3)  S_k(X_1^n) = | { {X_1, ..., X_n} ∩ A : A ∈ A_k } | = | { (f(X_1), ..., f(X_n)) : f ∈ F_k } |

is the random shatter coefficient of the class F_k, which obviously never exceeds the worst-case shatter coefficient S_k(n) and may be significantly smaller for certain distributions. However, this improved penalty is still not completely satisfactory. To see this, recall that by a classical result of Vapnik and Chervonenkis, for any index k,

(1.4)  E L(f̂_{n,k}) − L_k ≤ c ( √( L_k E log S_k(X_1^n) / n ) + E log S_k(X_1^n) / n ),

which is much smaller than the corresponding expected Rademacher average if L_k is small. (For explicit constants we refer to Theorem 1.14 in Lugosi [16].) Since in typical classification problems the minimal error L_k in class F_k is often very small for some k, it is important to find penalties which allow to derive oracle inequalities with the appropriate dependence on L_k. In particular, a desirable goal would be to develop classifiers f̂_n for which an oracle inequality resembling

E L(f̂_n) − L* ≤ inf_k [ L_k − L* + κ_1 √( L_k E log S_k(X_1^n) / n ) + κ_2 E log S_k(X_1^n) / n ]

holds for all distributions.
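The conditional Rademacher average R_n(F_k) can be approximated numerically by drawing random signs; the sketch below (not from the paper) does this by Monte Carlo for a finite class, with the number of sign draws as an illustrative choice.

```python
import numpy as np

def rademacher_average(classifiers, X, Y, n_rounds=200, seed=0):
    """Monte Carlo estimate of the conditional Rademacher average
        R_n(F) = E[ sup_f (1/n) * sum_i sigma_i * 1{f(X_i) != Y_i} | D_n ],
    where sigma_1, ..., sigma_n are i.i.d. random signs."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    # Error indicator matrix, one row per classifier.
    errors = np.array([(f(X) != Y).astype(float) for f in classifiers])
    total = 0.0
    for _ in range(n_rounds):
        sigma = rng.choice([-1.0, 1.0], size=n)
        total += np.max(errors @ sigma) / n
    return total / n_rounds
```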

The main results of this note (Theorems 4.1 and 4.2) show that estimates of the desired property are indeed possible to construct in a conceptually simple way. By the key Lemma 2.1, it suffices to find a data-dependent upper estimate of (L − L̂_n)(f̂_{n,k}) which has the order of magnitude of the above upper bound. The difficulty is that L_k and E log S_k(X_1^n) both depend on the underlying distribution. The improvement is achieved by decreasing the penalties so that the supremum in the definition of the Rademacher average is not taken over the whole class F_k but rather over a small subclass F̂_k containing only functions which look good on the data. More precisely, define the random subclass F̂_k ⊂ F_k by

F̂_k = { f ∈ F_k : L̂_n(f) ≤ κ_1 L̂_n(f̂_{n,k}) + κ_2 (1/n) log S_k(X_1^n) + κ_3 (1/n) log(nk) }

for some non-negative constants κ_1, κ_2 and κ_3.

Risk estimates based on localized Rademacher averages have been considered in several recent works. The most closely related procedure is proposed by Koltchinskii and Panchenko [12], who, assuming inf_{f ∈ F_k} L̂_n(f) = 0, compute the Rademacher averages of subclasses of F_k with empirical loss less than r for different values of r obtained by a recursive procedure, and obtain bounds for the loss of the empirical risk minimizer in terms of the localized Rademacher averages obtained after a certain number of iterations. Our approach of bounding the loss is conceptually simpler: it suffices to compute the Rademacher complexities at only one scale which depends on the smallest empirical loss in the class and a term of a smaller order determined by the shatter coefficients of the whole class. Thus, we use global information to determine the scale of localization. Bartlett, Bousquet, and Mendelson [3] also derive closely related generalization bounds, based on localized Rademacher averages. In their approach the performance bounds also depend on Rademacher averages computed at different scales of localization, which are combined by the technique of peeling. For further recent related work we also refer to Bousquet [7], Bousquet, Koltchinskii, and Panchenko [8], and Tsybakov [24].

The rest of the paper is organized as follows. Section 2 presents some basic inequalities on model selection, which generalize some of the results in Bartlett, Boucheron, and Lugosi [2]. Section 3 proposes a simple but suboptimal penalty which already has some of the main features of the penalties presented in Section 4. It shows, in a transparent way, some of the underlying ideas of the main results. Section 4 introduces a new penalty based on the Rademacher average R_n(F̂_k) and it is shown that the new estimate yields an improvement of the desired form.
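The localization step itself is a one-line filter on empirical losses. Here is a minimal sketch (not from the paper) of forming the subclass F̂_k defined above; the constants κ_1, κ_2, κ_3 and the externally supplied empirical log shatter coefficient are placeholders.

```python
import numpy as np

def localized_subclass(classifiers, X, Y, log_shatter, k, kappas=(16.0, 1.0, 1.0)):
    """Keep only the classifiers whose empirical loss is at most
        kappa1 * min_f L_hat_n(f) + kappa2 * log_shatter / n + kappa3 * log(n*k) / n,
    mirroring the definition of the random subclass F_hat_k."""
    n = len(Y)
    losses = np.array([np.mean(f(X) != Y) for f in classifiers])
    threshold = (kappas[0] * losses.min()
                 + (kappas[1] * log_shatter + kappas[2] * np.log(n * k)) / n)
    return [f for f, loss in zip(classifiers, losses) if loss <= threshold]
```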

2. Preliminaries

In this section we present two basic auxiliary lemmata on model selection. The first lemma is general in the sense that it does not depend on the particular choice of the penalty Ĉ_{n,k}. This result was mentioned in the introduction and generalizes a result obtained by Bartlett, Boucheron, and Lugosi [2].

Lemma 2.1. Suppose that the random variables Ĉ_{n,1}, Ĉ_{n,2}, ... are such that

P{ Ĉ_{n,k} < (L − L̂_n)(f̂_{n,k}) } ≤ κ n^{-2} k^{-2}

for some κ > 0 and for all k. Then we have

E L(f̂_n) − L* ≤ inf_k ( L_k − L* + E Ĉ_{n,k} ) + 2κ/n².

It is clear that we can always take Ĉ_{n,k} ≤ 1.

Proof. Observe that

E[ sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) ]
≤ P{ sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) > 0 }   (since sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) ≤ 1)
≤ Σ_{k=1}^∞ P{ (L − L̂_n)(f̂_{n,k}) > Ĉ_{n,k} }   (by the union bound)
≤ Σ_{k=1}^∞ κ n^{-2} k^{-2}   (by assumption)
≤ 2κ/n².

Therefore, we may conclude that

E L(f̂_n) = E[ L̂_n(f̂_n) + Ĉ_{n,K} ] + E[ (L − L̂_n)(f̂_n) − Ĉ_{n,K} ]   (where K is the selected model index, that is, f̂_n = f̂_{n,K})
≤ E[ inf_k ( L̂_n(f̂_{n,k}) + Ĉ_{n,k} ) ] + E[ sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) ]   (by definition of f̂_n)
≤ E[ inf_k ( inf_{f ∈ F_k} L̂_n(f) + Ĉ_{n,k} ) ] + E[ sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) ]   (by definition of f̂_{n,k})
≤ inf_k ( inf_{f ∈ F_k} E L̂_n(f) + E Ĉ_{n,k} ) + E[ sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) ]   (interchange E and inf)
≤ inf_k ( L_k + E Ĉ_{n,k} ) + 2κ/n²   (by the preceding display),

and the proof is complete.

The preceding result is not entirely satisfactory on the following ground. Although it presents a sharp bound, it is a bound for the average risk behavior of f̂_n. However, the penalty is computed on the data at hand, and therefore the proposed criterion should have optimal performance for (almost) all possible sequences of the data. The following result presents a nonasymptotic oracle inequality which holds with large probability and an asymptotic almost sure version.

Lemma 2.2. Assume that for all n, k ≥ 1,

P{ Ĉ_{n,k} < (L − L̂_n)(f̂_{n,k}) } ≤ κ n^{-2} k^{-2}   and   P{ Ĉ_{n,k} < (L̂_n − L)(f_k^*) } ≤ κ n^{-2} k^{-2},

where f_k^* denotes a function in F_k with L(f_k^*) = L_k. Then for all n ≥ 1 we have

P{ L(f̂_n) − L* > inf_k ( L_k − L* + 2 Ĉ_{n,k} ) } ≤ 4κ/n²,

and the asymptotic almost sure bound

lim_{n→∞} I{ L(f̂_n) − L* ≤ inf_k ( L_k − L* + 2 Ĉ_{n,k} ) } = 1   almost surely.

Proof. Let K be the selected model index. Notice that

L(f̂_n) = L̂_n(f̂_n) + Ĉ_{n,K} + (L − L̂_n)(f̂_n) − Ĉ_{n,K}
≤ inf_k ( L̂_n(f̂_{n,k}) + Ĉ_{n,k} ) + sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} )
≤ inf_k ( L̂_n(f_k^*) + Ĉ_{n,k} ) + sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} )
≤ inf_k ( L_k + 2 Ĉ_{n,k} ) + sup_k ( (L̂_n − L)(f_k^*) − Ĉ_{n,k} ) + sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ).

By assumption, the last two terms on the right-hand side satisfy

P{ sup_k ( (L̂_n − L)(f_k^*) − Ĉ_{n,k} ) + sup_k ( (L − L̂_n)(f̂_{n,k}) − Ĉ_{n,k} ) > 0 } ≤ 2 Σ_{k=1}^∞ κ n^{-2} k^{-2} < 4κ/n²,

proving the first inequality. The almost sure statement is a direct consequence of the Borel-Cantelli lemma.

3. A simple version

The purpose of this short section is to offer a simplified, yet suggestive illustration of the ideas. As discussed in the introduction, an ideal penalty would be a tight upper bound for the expression on the right-hand side of (1.4). Motivated by this bound, we propose the simple penalty

Ĉ_{n,k} = 2 √( 2 L̂_n(f̂_{n,k}) ( 8 log S_k(2n) + 2 log(nk) ) / n ) + ( 8 log S_k(2n) + 2 log(nk) ) / n,

where S_k(2n) is the (worst-case) 2n-shatter coefficient defined in (1.1). Thus, the minimal loss L_k in class F_k is estimated by its natural empirical counterpart L̂_n(f̂_{n,k}) = inf_{f ∈ F_k} L̂_n(f), and the expected logarithmic shatter coefficient E log S_k(X_1^n) is estimated by the distribution-free upper bound log S_k(2n). (This term may be bounded further by V_k log(2n + 1), where V_k is the VC dimension of the class A_k.) The auxiliary terms of order (1/n) log(nk) are necessary to derive the desired oracle inequalities. The next theorem shows that the proposed penalty indeed works.

Theorem 3.1. Consider the penalized empirical loss minimizer f̂_n with the data-based penalty Ĉ_{n,k} defined above. Then for every n and for all distributions of (X, Y),

E L(f̂_n) − L* ≤ inf_k ( L_k − L* + E Ĉ_{n,k} ) + 16/n².

In particular,

E L(f̂_n) − L* ≤ inf_k [ L_k − L* + 4 √( L_k ( 8 log S_k(2n) + 2 log(nk) ) / n ) + 2 ( 8 log S_k(2n) + 2 log(nk) ) / n ] + 16/n².

The proof uses Lemma 2.1 and the following uniform deviation bound due to Vapnik and Chervonenkis [27]. (The slightly improved form used here is proved by Anthony and Shawe-Taylor [1].)

Proposition 3.2. Let S_k(X_1^{2n}) be the random shatter coefficient of A_k based on the i.i.d. observations X_1, ..., X_{2n}, defined as in (1.3). For all ε > 0 and n ≥ 1,

(3.1)  P{ sup_{f ∈ F_k} ( L(f) − 2 L̂_n(f) ) ≥ 2ε } ≤ 4 E S_k(X_1^{2n}) exp(−nε/4)

and

(3.2)  P{ sup_{f ∈ F_k} ( L̂_n(f) − 2 L(f) ) ≥ 2ε } ≤ 4 E S_k(X_1^{2n}) exp(−nε/4).

Proof. Observe that for all ε > 0 and n ≥ 1,

{ sup_{f ∈ F_k} ( L(f) − 2 L̂_n(f) ) ≥ 2ε } ⊂ { sup_{f ∈ F_k} ( L(f) − L̂_n(f) ) / √(L(f)) ≥ √ε },

and similarly,

{ sup_{f ∈ F_k} ( L̂_n(f) − 2 L(f) ) ≥ 2ε } ⊂ { sup_{f ∈ F_k} ( L̂_n(f) − L(f) ) / √(L̂_n(f)) ≥ √ε }.

The proposition then follows from the bounds of Anthony and Shawe-Taylor [1].

Proof of Theorem 3.1. We start with the proof of the first inequality of Theorem 3.1. In view of Lemma 2.1, it suffices to show that

P{ L(f̂_{n,k}) − L̂_n(f̂_{n,k}) > Ĉ_{n,k} } ≤ 8/(nk)².

By (3.1), applied with 2ε = ( 8 log S_k(2n) + 16 log(nk) ) / n,

P{ L(f̂_{n,k}) ≥ 2 L̂_n(f̂_{n,k}) + ( 8 log S_k(2n) + 16 log(nk) ) / n } ≤ 4 S_k(2n) exp( −( 8 log S_k(2n) + 16 log(nk) ) / 8 ) ≤ 4/(nk)²,

so that

P{ Ĉ_{n,k} < C_{n,k} } ≤ 4/(nk)²,

where

C_{n,k} = 2 √( L(f̂_{n,k}) ( log( 4 S_k(2n) ) + 2 log(nk) ) / n ).

Another application of inequality (3.1) yields

P{ L(f̂_{n,k}) − L̂_n(f̂_{n,k}) > Ĉ_{n,k} } ≤ P{ L(f̂_{n,k}) − L̂_n(f̂_{n,k}) > C_{n,k} } + 4/(nk)² ≤ 4 S_k(2n) exp( −log( 4 S_k(2n) ) − 2 log(nk) ) + 4/(nk)² ≤ 8/(nk)².

Conclude via Lemma 2.1 that

E L(f̂_n) ≤ min_k ( L_k + E Ĉ_{n,k} ) + 16/n².

For the second inequality, deduce that for all δ > 0,

E √( L̂_n(f̂_{n,k}) + δ ) ≤ √( E L̂_n(f̂_{n,k}) + δ ) ≤ √( E inf_{f ∈ F_k} L̂_n(f) + δ ) ≤ √( L_k + δ ),

by Jensen's inequality and the definition of f̂_{n,k}.

The bound of Theorem 3.1 has the right dependence on L_k as suggested by inequality (1.4) mentioned in the introduction. In particular, if L_k happens to equal zero for some class F_k, then the upper bound has an improved rate of convergence. The disadvantage of the simple penalty defined above is that instead of the expected shatter coefficients, a distribution-free (and therefore suboptimal) upper bound appears for each class F_k. Recently, Boucheron, Lugosi and Massart [4] proved that log S_k(X_1^n) concentrates sharply around its mean. For example, we have the following inequalities:

Proposition 3.3. For all ε > 0, n ≥ 1,

P{ E log S_k(X_1^n) > 2 log S_k(X_1^n) + 2ε } ≤ e^{−ε}

and

P{ log S_k(X_1^n) > 2 E log S_k(X_1^n) + 2ε } ≤ e^{−ε}.

Moreover, for each n ≥ 1,

E log S_k(X_1^n) ≤ log E S_k(X_1^n) ≤ (1/ln 2) E log S_k(X_1^n) ≤ 2 E log S_k(X_1^n).
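Proposition 3.3 is what lets one swap the expected log shatter coefficient for its empirical counterpart log S_k(X_1^n). For a finite class, the random shatter coefficient of (1.3) is just the number of distinct label patterns on the sample; a minimal sketch (not from the paper):

```python
import numpy as np

def random_shatter_coefficient(classifiers, X):
    """Random shatter coefficient S_k(X_1^n): the number of distinct binary vectors
    (f(X_1), ..., f(X_n)) obtained as f ranges over the (finite) class."""
    patterns = {tuple(f(X).astype(int)) for f in classifiers}
    return len(patterns)

# log S_k(X_1^n), usable as an empirical stand-in for E log S_k(X_1^n).
# Example with the hypothetical `stumps` and sample `X` from the earlier sketches:
# print(np.log(random_shatter_coefficient(stumps, X)))
```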

Proposition 3.3 implies that the expected random log shatter coefficient E log S_k(X_1^n) of F_k may be replaced by a constant times log S_k(X_1^n) and vice versa. Hence we may replace the distribution-free bounds log S_k(2n) by empirical estimates log S_k(X_1^n), at the price of slightly worse constants. The main oracle inequalities in Section 4 are accompanied by asymptotic almost-sure versions of bounds for the expected value. Such bounds are easy to obtain as well, simply by invoking Lemma 2.2 instead of Lemma 2.1. The details are omitted here.

4. Rademacher penalties

The main results of the paper are presented in this section. Assign, to each model class F_k,

(4.1)  û_{n,k} = 16 ( 4 log S_k(X_1^n) + 9 log(nk) ) / n,

with S_k(X_1^n) defined in (1.3), and the class

(4.2)  F̂_k = { f ∈ F_k : L̂_n(f) ≤ 16 L̂_n(f̂_{n,k}) + 15 û_{n,k} }.

Observe that the class F̂_k contains only those classifiers whose empirical loss is not much larger than that of the empirical minimizer. Note that the constant 16 has no special role; it has been chosen for convenience. Any constant larger than one would lead to similar results, at the price of modifying other constants. The term û_{n,k} depends on the shatter coefficient of the whole class F_k but it is typically small compared to L̂_n(f̂_{n,k}). The penalty is calculated in terms of the Rademacher average of this smaller class. More precisely, define the complexity estimate by

(4.3)  Ĉ_{n,k} = ( 8 R_n(F̂_k) + 10 ε_{n,k} + √( 2 ε_{n,k} ( 8 L̂_n(f̂_{n,k}) + 7 û_{n,k} ) ) ) ∧ 1,   where ε_{n,k} = (2/n) log(nk).

Again, not too much attention should be paid to the values of the constants involved. We favored simple readable proofs over optimal constants. Note that, through S_k(X_1^n), the penalty also depends on the random shatter coefficient of the whole class F_k. However, the term involving the shatter coefficient of the entire class F_k, of order (1/n) √( log(nk) log S_k(X_1^n) ), is typically much smaller (by a factor of order n^{-1/2}) than the Rademacher average of the whole class F_k. (For instance, see inequality (4.8) and Proposition 4.6 below.)
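Putting (4.1)-(4.3) together, the penalty is computable from the sample alone. The sketch below (not from the paper) strings the steps together for a finite class, with the Rademacher average approximated by Monte Carlo; the constants mirror the reconstruction above and, like all names here, should be read as illustrative placeholders.

```python
import numpy as np

def localized_rademacher_penalty(classifiers, X, Y, k, n_rounds=200, seed=0):
    """Localized Rademacher penalty, following the steps of (4.1)-(4.3):
    1. u_hat from the empirical log shatter coefficient and a log(nk)/n term,
    2. the subclass of classifiers with small empirical loss,
    3. a penalty built from the Rademacher average of that subclass, capped at 1."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    losses = np.array([np.mean(f(X) != Y) for f in classifiers])
    log_shatter = np.log(len({tuple(f(X).astype(int)) for f in classifiers}))
    u_hat = 16.0 * (4.0 * log_shatter + 9.0 * np.log(n * k)) / n
    min_loss = losses.min()
    keep = losses <= 16.0 * min_loss + 15.0 * u_hat
    errors = np.array([(f(X) != Y).astype(float)
                       for f, kept in zip(classifiers, keep) if kept])
    # Monte Carlo estimate of the conditional Rademacher average of the subclass.
    rad = np.mean([np.max(errors @ rng.choice([-1.0, 1.0], size=n)) / n
                   for _ in range(n_rounds)])
    eps = 2.0 * np.log(n * k) / n
    penalty = 8.0 * rad + 10.0 * eps + np.sqrt(2.0 * eps * (8.0 * min_loss + 7.0 * u_hat))
    return min(penalty, 1.0)
```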

We have the following performance bound for the expected loss of the minimizer f̂_n of the penalized empirical loss L̂_n(f) + Ĉ_{n,k}.

Theorem 4.1. For every n,

E L(f̂_n) − L* ≤ inf_k ( L_k − L* + E Ĉ_{n,k} ) + 22/n².

In addition, with probability greater than 1 − 44/n²,

L(f̂_n) − L* ≤ inf_k ( L_k − L* + 2 Ĉ_{n,k} ),

and also

lim_{n→∞} I{ L(f̂_n) − L* ≤ inf_k ( L_k − L* + 2 Ĉ_{n,k} ) } = 1   almost surely.

The next theorem is here to point out that the bound above is indeed a significant improvement over bounds of the type (1.2), and that the dependence on the minimal loss L_k and the random shatter coefficient has the form suggested by (1.4). For this purpose, we introduce

(4.4)  u_{n,k} = 16 ( 8 E log S_k(X_1^n) + 17 log(nk) ) / n

and the class

F̄_k = { f ∈ F_k : L(f) ≤ 64 L_k + 63 u_{n,k} }.

Recall from (4.3) that ε_{n,k} = (2/n) log(nk).

Theorem 4.2. The following oracle inequality holds:

E L(f̂_n) − L* ≤ min_{k ≥ 1} [ L_k − L* + 8 E R_n(F̄_k) + 15 ε_{n,k} + 16 √( ( L_k + u_{n,k} ) 2 ε_{n,k} ) ].

In particular, there exist universal constants κ_1 and κ_2 such that

E L(f̂_n) − L* ≤ inf_k [ L_k − L* + κ_1 √( L_k ( E log S_k(X_1^n) + log(nk) ) / n ) + κ_2 ( E log S_k(X_1^n) + log(nk) ) / n ].

This oracle inequality has the desired form outlined in the introduction and improves upon the results of [2] and [13]. For example, in the special case when L_k = 0 for k ≥ k_0, we obtain, for some numerical constants c_1 and c_2,

E L(f̂_n) ≤ min_{k ≥ k_0} c_1 ( E log S_k(X_1^n) + log(nk) ) / n + c_2/n²,

which is of a different order of magnitude from the penalties considered by [2] and [13]. Theorem 4.2 is only stated for the expected loss but an inequality which holds with large probability may be obtained just as in Theorem 4.1.

Proofs of Theorems 4.1 and 4.2. First, recall the definitions of û_{n,k} and u_{n,k} in (4.1) and (4.4), respectively, and in addition define

u_{n,k}^- = 8 ( 2 log E S_k(X_1^n) + 2 log(nk) ) / n

and the event

B_{n,k} := { u_{n,k}^- ≤ û_{n,k} ≤ u_{n,k} }.

Observe that Proposition 3.3 yields that, with probability at least 1 − 1/(nk)²,

u_{n,k}^- = 16 ( log E S_k(X_1^n) + log(nk) ) / n
≤ 16 ( 2 E log S_k(X_1^n) + log(nk) ) / n
≤ 16 ( 2 ( 2 log S_k(X_1^n) + 4 log(nk) ) + log(nk) ) / n = û_{n,k}

and

û_{n,k} = 16 ( 4 log S_k(X_1^n) + 9 log(nk) ) / n
≤ 16 ( 4 ( 2 E log S_k(X_1^n) + 2 log(nk) ) + 9 log(nk) ) / n = 16 ( 8 E log S_k(X_1^n) + 17 log(nk) ) / n = u_{n,k},

and therefore

(4.5)  P{ B_{n,k}^c } ≤ 1/(nk)².

Finally, we introduce the event

A_{n,k} = { sup_{f ∈ F_k} ( L(f) − 2 L̂_n(f) ) ≤ u_{n,k}^- } ∩ { sup_{f ∈ F_k} ( L̂_n(f) − 2 L(f) ) ≤ u_{n,k}^- }

and the class

F̃_k = { f ∈ F_k : L(f) ≤ 4 L_k + 3 u_{n,k}^- }.

The following intermediate result will be useful in the proofs of both theorems.

Lemma 4.3. We have

(4.6)  P{ A_{n,k} ∩ B_{n,k} } ≥ 1 − 9/(nk)²,

and on the set A_{n,k} ∩ B_{n,k}, the following hold:
(i) f̂_{n,k} ∈ F̃_k;
(ii) F̃_k ⊂ F̂_k, and in particular R_n(F̃_k) ≤ R_n(F̂_k);
(iii) L_k ≤ 2 L̂_n(f̂_{n,k}) + u_{n,k}^-.

Proof. To begin with, notice that

E S_k(X_1^{2n}) ≤ E[ S_k(X_1^n) S_k(X_{n+1}^{2n}) ] = ( E S_k(X_1^n) )²

by the definition of the shatter coefficient and by the independence of the X_i. Hence, by Proposition 3.2,

P{ A_{n,k}^c } ≤ 8 E S_k(X_1^{2n}) exp( −n u_{n,k}^- / 8 ) ≤ 8/(nk)².

This bound and (4.5) imply assertion (4.6).

To prove claim (i), observe that on A_{n,k},

L(f̂_{n,k}) ≤ 2 L̂_n(f̂_{n,k}) + u_{n,k}^-   (by definition of A_{n,k})
≤ 2 L̂_n(f_k^*) + u_{n,k}^-   (by definition of f̂_{n,k})
≤ 2 ( 2 L_k + u_{n,k}^- ) + u_{n,k}^-   (by definition of A_{n,k})
= 4 L_k + 3 u_{n,k}^-.

Thus f̂_{n,k} ∈ F̃_k. For claim (ii), notice that for any f ∈ F̃_k,

L̂_n(f) ≤ 2 L(f) + u_{n,k}^-   (by definition of A_{n,k})
≤ 2 ( 4 L_k + 3 u_{n,k}^- ) + u_{n,k}^-   (by definition of F̃_k)
= 8 L_k + 7 u_{n,k}^-
≤ 8 L(f̂_{n,k}) + 7 u_{n,k}^-   (by definition of L_k)
≤ 16 L̂_n(f̂_{n,k}) + 15 u_{n,k}^-   (by definition of A_{n,k})
≤ 16 L̂_n(f̂_{n,k}) + 15 û_{n,k}   (by definition of B_{n,k}).

Claim (ii) now follows. Claim (iii) is immediate from the definition of A_{n,k} since both f̂_{n,k} and f_k^* belong to F_k.

Next we link the Rademacher average R_n(F̃_k) to E sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ). By a classical symmetrization device (cf. Giné and Zinn [10] or van der Vaart and Wellner [25]),

(4.7)  E sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) ≤ 2 E R_n(F̃_k).

Also, R_n(F̃_k) is known to concentrate sharply around its mean. For example, we have, by results of Boucheron, Lugosi, and Massart [4, 5], the following bounds.

Proposition 4.4. For all ε > 0, n ≥ 1,

P{ R_n(F̃_k) ≥ 2 E R_n(F̃_k) + ε } ≤ e^{−6nε/5}

and

P{ R_n(F̃_k) ≤ (1/2) E R_n(F̃_k) − ε } ≤ e^{−nε}.

Proof. Define Z := n R_n(F̃_k). Then it follows from Boucheron, Lugosi, and Massart [4] that

log E exp( λ(Z − EZ) ) ≤ EZ ( e^λ − 1 − λ ),

which implies further that for 0 ≤ λ < 3,

log E exp( λ(Z − EZ) ) ≤ λ² EZ / ( 2(1 − λ/3) ).

After an application of Markov's inequality we find

P{ Z ≥ EZ + √(2 EZ x) + x/3 } ≤ e^{−x}.

We obtain the desired upper-tail bound by inserting Z = n R_n(F̃_k) in the preceding display and invoking the inequality 2√(xy) ≤ x + y. The bound for the lower tail follows from the inequality

P{ Z ≤ EZ − √(2 x EZ) } ≤ e^{−x}

(see [4]) and since x + y ≥ 2√(xy).

Finally, we make key use of the following concentration inequality for the supremum of an empirical process, recently established by Talagrand [23]; see also Ledoux [14], Massart [19], Rio [21]. The best known constants reported here have been obtained by Bousquet [6].

Proposition 4.5. Set Σ²_{F̃_k} = sup_{f ∈ F̃_k} L(f)(1 − L(f)). For all ε > 0, n ≥ 1,

P{ sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) ≥ 2 E sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) + Σ_{F̃_k} √(2ε/n) + 4ε/(3n) } ≤ e^{−ε}.

We are now ready to prove Theorems 4.1 and 4.2.

Proof of Theorem 4.1. Deduce, using (i), (ii) and (iii) of Lemma 4.3, the following string of inequalities:

P{ L(f̂_{n,k}) ≥ L̂_n(f̂_{n,k}) + Ĉ_{n,k}, A_{n,k} ∩ B_{n,k} }
= P{ L(f̂_{n,k}) ≥ L̂_n(f̂_{n,k}) + 8 R_n(F̂_k) + 10 ε_{n,k} + √( ( 8 L̂_n(f̂_{n,k}) + 7 û_{n,k} ) 2 ε_{n,k} ), A_{n,k} ∩ B_{n,k} }
≤ P{ ∃ f ∈ F̃_k : L(f) ≥ L̂_n(f) + 8 R_n(F̂_k) + 10 ε_{n,k} + √( ( 8 L̂_n(f̂_{n,k}) + 7 û_{n,k} ) 2 ε_{n,k} ), A_{n,k} ∩ B_{n,k} }   (by property (i))
≤ P{ ∃ f ∈ F̃_k : L(f) ≥ L̂_n(f) + 8 R_n(F̃_k) + 10 ε_{n,k} + √( ( 8 L̂_n(f̂_{n,k}) + 7 u_{n,k}^- ) 2 ε_{n,k} ), A_{n,k} ∩ B_{n,k} }   (by property (ii) and definition of B_{n,k})
≤ P{ ∃ f ∈ F̃_k : L(f) ≥ L̂_n(f) + 8 R_n(F̃_k) + 10 ε_{n,k} + √( ( 4 L_k + 3 u_{n,k}^- ) 2 ε_{n,k} ) }   (by property (iii))
≤ P{ sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) ≥ 8 R_n(F̃_k) + 10 ε_{n,k} + Σ_{F̃_k} √( 2 ε_{n,k} ) },

where the last inequality follows from

Σ²_{F̃_k} = sup_{f ∈ F̃_k} Var( I{f(X) ≠ Y} ) ≤ sup_{f ∈ F̃_k} L(f) ≤ 4 L_k + 3 u_{n,k}^-.

Invoke inequality (4.7), inequality (4.6) and Propositions 4.4 and 4.5 above to conclude that

P{ L(f̂_{n,k}) ≥ L̂_n(f̂_{n,k}) + Ĉ_{n,k} }
≤ P{ sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) ≥ 8 R_n(F̃_k) + 10 ε_{n,k} + Σ_{F̃_k} √(2 ε_{n,k}) } + 9/(nk)²   (since P{ (A_{n,k} ∩ B_{n,k})^c } ≤ 9/(nk)² by (4.6) in Lemma 4.3)
≤ P{ sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) ≥ 4 E R_n(F̃_k) + 2 ε_{n,k} + Σ_{F̃_k} √(2 ε_{n,k}) } + 10/(nk)²   (by Proposition 4.4)
≤ P{ sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) ≥ 2 E sup_{f ∈ F̃_k} ( L(f) − L̂_n(f) ) + (4/3) ε_{n,k} + Σ_{F̃_k} √(2 ε_{n,k}) } + 10/(nk)²   (by (4.7))
≤ 11/(n²k²)   (by Proposition 4.5).

This proves the first assertion. The almost sure statement follows by invoking Lemma 2.2 and the preceding argument (which also shows that

P{ Ĉ_{n,k} < (L̂_n − L)(f_k^*) } ≤ 11/(n²k²),

although the last assertion could be shown in a much easier way as it only involves a single function f_k^*). Theorem 4.1 follows from Lemma 2.1 and Lemma 2.2.

In the proof of Theorem 4.2 we need the symmetrization device

(4.8)  E R_n(F̄_k) ≤ 2 E sup_{f ∈ F̄_k} | L̂_n(f) − L(f) | + sup_{f ∈ F̄_k} √( L(f) / n )

(see, e.g., Mendelson [20], p. 18), and also the following result due to Massart [18]. (The version stated here is taken from Lugosi [16].)

Proposition 4.6. Set Σ² = sup_{f ∈ F̄_k} L(f)(1 − L(f)). Then for all n ≥ 1,

E sup_{f ∈ F̄_k} | L(f) − L̂_n(f) | ≤ (8/n) E log( 2 S_k(X_1^{2n}) ) + 2Σ √( E log( 2 S_k(X_1^{2n}) ) / n ).

Proof. The statement follows almost immediately from Theorem 1.10 in Lugosi [16] by noting that the worst-case shatter coefficients may be replaced with impunity by the random shatter coefficients.

Proof of Theorem 4.2. Observe that on the event A_{n,k} ∩ B_{n,k}, F̂_k ⊂ F̄_k, where F̄_k is as defined in Theorem 4.2. Indeed, for any f ∈ F̂_k,

L(f) ≤ 2 L̂_n(f) + u_{n,k}^-   (by definition of A_{n,k})
≤ 2 ( 16 L̂_n(f̂_{n,k}) + 15 û_{n,k} ) + u_{n,k}   (by definition of F̂_k)
≤ 32 L̂_n(f̂_{n,k}) + 31 u_{n,k}   (by definition of B_{n,k})
≤ 32 L̂_n(f_k^*) + 31 u_{n,k}   (by definition of f̂_{n,k})
≤ 32 ( 2 L_k + u_{n,k} ) + 31 u_{n,k}   (by definition of A_{n,k})
= 64 L_k + 63 u_{n,k}.

Also, we notice that on the event A_{n,k},

L̂_n(f̂_{n,k}) ≤ L̂_n(f_k^*) ≤ 2 L_k + u_{n,k}.

These observations imply that

Ĉ_{n,k} I_{A_{n,k} ∩ B_{n,k}} ≤ 8 R_n(F̄_k) + 10 ε_{n,k} + √( ( 64 L_k + 63 u_{n,k} ) 2 ε_{n,k} ) ≤ 8 R_n(F̄_k) + 10 ε_{n,k} + 16 √( ( L_k + u_{n,k} ) 2 ε_{n,k} ).

Consequently, it follows from Lemma 4.3 above that

E Ĉ_{n,k} ≤ E[ Ĉ_{n,k} I_{A_{n,k} ∩ B_{n,k}} ] + P{ (A_{n,k} ∩ B_{n,k})^c }
≤ 8 E R_n(F̄_k) + 10 ε_{n,k} + 16 √( ( L_k + u_{n,k} ) 2 ε_{n,k} ) + 9/(nk)²
≤ 8 E R_n(F̄_k) + 15 ε_{n,k} + 16 √( ( L_k + u_{n,k} ) 2 ε_{n,k} ).

This bound and Theorem 4.1 yield the first inequality of Theorem 4.2. The second inequality follows from the symmetrization inequality (4.8) and Proposition 4.6 above.

Acknowledgements. We thank Olivier Bousquet for his invaluable remarks and advice. We also appreciate the helpful remarks by two referees.

References

[1] M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47.
[2] P. Bartlett, S. Boucheron, and G. Lugosi. Model selection and error estimation. Machine Learning, 48:85-113.
[3] P. Bartlett, O. Bousquet, and S. Mendelson. Localized Rademacher complexities. In Proceedings of the 15th Annual Conference on Computational Learning Theory, pages 44-48.
[4] S. Boucheron, G. Lugosi, and P. Massart. A sharp concentration inequality with applications. Random Structures and Algorithms, 16.
[5] S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities using the entropy method. Annals of Probability, to appear.
[6] O. Bousquet. A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Acad. Sci. Paris, 334.
[7] O. Bousquet. New approaches to statistical learning theory. Annals of the Institute of Statistical Mathematics.
[8] O. Bousquet, V. Koltchinskii, and D. Panchenko. Some local measures of complexity of convex hulls and generalization bounds. In Proceedings of the 15th Annual Conference on Computational Learning Theory. Springer.
[9] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
[10] E. Giné and J. Zinn. Some limit theorems for empirical processes. Annals of Probability, 12.
[11] V. Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47.
[12] V. Koltchinskii and D. Panchenko. Rademacher processes and bounding the risk of function learning. In E. Giné, D. M. Mason, and J. A. Wellner, editors, High Dimensional Probability II.
[13] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30.
[14] M. Ledoux. On Talagrand's deviation inequalities for product measures. ESAIM: Probability and Statistics, 1:63-87.
[15] F. Lozano. Model selection using Rademacher penalization. In Proceedings of the Second ICSC Symposium on Neural Computation (NC2000). ICSC Academic Press, 2000.

[16] G. Lugosi. Pattern classification and learning theory. In L. Györfi, editor, Principles of Nonparametric Learning. Springer, Vienna.
[17] G. Lugosi and A. Nobel. Adaptive model selection using empirical complexities. Annals of Statistics, 27.
[18] P. Massart. About the constants in Talagrand's concentration inequalities for empirical processes. Annals of Probability, 28.
[19] P. Massart. Some applications of concentration inequalities to statistics. Annales de la Faculté des Sciences de Toulouse, IX.
[20] S. Mendelson. A few notes on statistical learning theory. In S. Mendelson and A. Smola, editors, Advanced Lectures in Machine Learning, LNCS 2600. Springer.
[21] E. Rio. Inégalités de concentration pour les processus empiriques de classes de parties. Probability Theory and Related Fields, 119.
[22] J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44.
[23] M. Talagrand. A new look at independence. Annals of Probability, 24:1-34. (Special Invited Paper).
[24] A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. C. R. Acad. Sci. Paris, to appear.
[25] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer-Verlag, New York.
[26] V. N. Vapnik. Statistical Learning Theory. John Wiley, New York.
[27] V. N. Vapnik and A. Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin.

Department of Economics, Pompeu Fabra University, Ramon Trias Fargas 25-27, Barcelona, Spain
E-mail address: lugosi@upf.es

Department of Statistics, Yale University, P.O. Box, New Haven, CT, United States of America
E-mail address: marten.wegkamp@yale.edu
