Glivenko-Cantelli Classes


CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 4

Glivenko-Cantelli Classes

Lecturer: Peter Bartlett    Scribe: Michelle Besi

1 Introduction

This lecture will cover Glivenko-Cantelli (GC) classes and introduce Rademacher averages. We are interested in GC classes because, for these classes, we get uniform convergence of the empirical average to the true expectation. Rademacher averages provide a measure of complexity. In this lecture, the primary focus will be on introducing the GC classes of functions and proving the GC Theorem. It will end with the definition of Rademacher averages.

Recall from previous lectures that we can choose

$$\hat{f}_n = \arg\min_{f \in F} \hat{R}_n(f),$$

and we want

$$R(\hat{f}_n) - \inf_{f \in F} R(f)$$

to be small, and it suffices to show that

$$\sup_{f \in F} |R(f) - \hat{R}_n(f)|$$

is small. Indeed, for any $f^* \in F$,

$$R(\hat{f}_n) - R(f^*) = \big(R(\hat{f}_n) - \hat{R}_n(\hat{f}_n)\big) + \big(\hat{R}_n(\hat{f}_n) - \hat{R}_n(f^*)\big) + \big(\hat{R}_n(f^*) - R(f^*)\big) \le 2 \sup_{f \in F} |R(f) - \hat{R}_n(f)|,$$

since the middle term is non-positive by the definition of $\hat{f}_n$. For GC classes of functions this sufficient condition is satisfied as $n$ gets large.

2 GC Classes

We begin with a definition of the GC class of functions.

Definition. $F$ is a GC class if, for all $\epsilon > 0$,

$$\lim_{n \to \infty} \sup_P P^n\Big( \sup_{f \in F} |Ef - \hat{E}_n f| > \epsilon \Big) = 0.$$

Note: $P^n$ means $n$ independent draws from the distribution $P$.
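For instance (a sanity check, under the assumption that $f$ takes values in $[0,1]$): any singleton class $F = \{f\}$ is GC, since Hoeffding's inequality gives, uniformly over distributions $P$,

$$P^n\big( |Ef - \hat{E}_n f| > \epsilon \big) \le 2 e^{-2n\epsilon^2} \longrightarrow 0.$$

The content of the GC property is that this convergence holds simultaneously over an entire, typically infinite, class of functions.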

2.1 The Glivenko-Cantelli Theorem

Let:

- $x_1, \ldots, x_n$ be i.i.d. data points from a distribution $P$,
- $F_n(t)$ be the empirical distribution function,
- $F(t)$ be the true distribution function of $P$.

We have the following expressions for the CDFs:

$$F_n(t) = \hat{E}_n 1[x \le t], \qquad F(t) = E\, 1[x \le t].$$

Now define:

$$G = \{ x \mapsto 1[x \le \theta] : \theta \in \mathbb{R} \}.$$

That is, there is a one-to-one mapping between $G$ and $\mathbb{R}$. Therefore the Glivenko-Cantelli Theorem states: for all $P$,

$$\sup_{g \in G} |Eg - \hat{E}_n g| \to 0.$$

Thus, we can interpret this classical result as a result about uniform convergence over this class of subsets of the reals.

2.2 GC Theorem

We'll now formally present the GC Theorem, and give a proof that is suggestive of an approach that applies much more generally (which we'll meet in the next lecture).

Theorem 2.1. Define:

$$F_n(t) = P_n((-\infty, t]) \quad \text{(the empirical distribution function)},$$

$$F(t) = P((-\infty, t]) \quad \text{(the true distribution function of $P$)}.$$

For all probability distributions $P$ on $\mathbb{R}$, $F_n \xrightarrow{a.s.} F$ uniformly on $\mathbb{R}$. Or, symbolically, we can write

$$\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \xrightarrow{a.s.} 0.$$

Note: the law of large numbers ensures pointwise convergence of distribution functions; with the GC class of functions, however, we obtain something stronger, namely uniform convergence.
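The theorem is easy to watch in simulation. A minimal sketch (assuming NumPy and SciPy are available; the standard normal $P$ and these sample sizes are arbitrary choices) that estimates $\sup_t |F_n(t) - F(t)|$, using the fact that the supremum is attained at a jump of $F_n$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    x = np.sort(rng.normal(size=n))
    cdf = norm.cdf(x)  # true F at the sample points
    # F_n equals i/n at x_(i) and (i-1)/n just before it, so the sup
    # over all t is the max deviation over these 2n values.
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    sup_dev = max(np.abs(upper - cdf).max(), np.abs(lower - cdf).max())
    print(f"n = {n:6d}   sup |F_n - F| ~ {sup_dev:.4f}")
```

The printed deviations shrink toward zero as $n$ grows, which is exactly the uniform convergence the theorem asserts.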

The proof of the Glivenko-Cantelli Theorem involves three parts:

1. Use of the McDiarmid concentration inequality
2. Use of symmetrization
3. Application of simple restrictions

Proof.

1. Through application of the McDiarmid concentration inequality (changing a single $x_i$ changes $\sup_{g \in G} (\hat{E}_n g - Eg)$ by at most $1/n$), we know that with probability at least $1 - \exp(-2\epsilon^2 n)$,

$$\sup_{g \in G} \big( \hat{E}_n g - Eg \big) \le E \sup_{g \in G} \big( \hat{E}_n g - Eg \big) + \epsilon.$$

That is, the deviations are concentrated around their expectation.

2. Next we apply symmetrization. Recall that we ultimately would like to prove

$$\sup_{g \in G} |\hat{E}_n g - Eg| \xrightarrow{a.s.} 0.$$

Also, note that we can write

$$\hat{E}_n g - Eg = \frac{1}{n} \sum_{i=1}^n g(x_i) - E g(x).$$

Let $x'_1, \ldots, x'_n$ be i.i.d. copies of $x_1, \ldots, x_n$. Then

$$E \sup_{g \in G} \big( \hat{E}_n g - Eg \big) = E \sup_{g \in G} \Big( \frac{1}{n} \sum_{i=1}^n g(x_i) - Eg \Big) \quad \text{(expanding the definition of $\hat{E}_n$)}$$

$$= E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \big( g(x_i) - E g(x'_i) \big)$$

$$= E \sup_{g \in G} E\Big[ \frac{1}{n} \sum_{i=1}^n \big( g(x_i) - g(x'_i) \big) \,\Big|\, x_1, \ldots, x_n \Big] \quad \text{(properties of conditional expectation)}$$

$$\le E\, E\Big[ \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \big( g(x_i) - g(x'_i) \big) \,\Big|\, x_1, \ldots, x_n \Big] \quad \text{(bringing the $E$ out front)}$$

$$= E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i \big( g(x_i) - g(x'_i) \big),$$

where $\epsilon_i$ is a Rademacher variable (uniform on $\{\pm 1\}$); the last equality holds because $g(x_i) - g(x'_i)$ has a symmetric distribution, so multiplying by $\epsilon_i$ does not change it. So we have the following upper bound on the previous expression:

$$E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i \big( g(x_i) - g(x'_i) \big) \le E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i g(x_i) + E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n (-\epsilon_i) g(x'_i) = 2\, E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i g(x_i).$$
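As a quick numerical check of the symmetrization step, here is a Monte Carlo sketch (assuming NumPy; $P = \mathrm{Uniform}[0,1]$ and $n = 50$ are arbitrary choices) comparing $E \sup_g (\hat{E}_n g - Eg)$ with $2\, E \sup_g \frac{1}{n} \sum_i \epsilon_i g(x_i)$ for the threshold class $G$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 4000

lhs = rhs = 0.0
for _ in range(trials):
    x = np.sort(rng.uniform(size=n))
    # sup_g (E_n g - Eg): for g = 1[. <= theta] under Uniform[0,1],
    # Eg = theta, and the sup is attained at a sample point (or is 0).
    lhs += max(0.0, (np.arange(1, n + 1) / n - x).max())
    # sup_g (1/n) sum_i eps_i g(x_i): the restrictions of G to the sample
    # are prefixes of the sorted data, so take a maximal prefix sum.
    eps = rng.choice([-1.0, 1.0], size=n)
    rhs += 2.0 * max(0.0, np.cumsum(eps).max()) / n

print(f"E sup (E_n g - Eg) ~ {lhs / trials:.3f}   2 E sup ... ~ {rhs / trials:.3f}")
```

The first estimate comes out below the second, as the symmetrization inequality requires.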

3. Next, we consider simple restrictions of $G$ to the sample. We can write

$$2\, E \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i g(x_i) = 2\, E\, E\Big[ \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i g(x_i) \,\Big|\, x_1, \ldots, x_n \Big],$$

where the inner conditional expectation is the Rademacher average of $G$ given $x_1, \ldots, x_n$. But

$$\{ (g(x_1), \ldots, g(x_n)) : g \in G \} = \{ (g(x_{(1)}), \ldots, g(x_{(n)})) : g \in G \},$$

and this set has cardinality at most $n + 1$, where we have ordered the data: $\{x_1, \ldots, x_n\} = \{x_{(1)}, \ldots, x_{(n)}\}$ with $x_{(1)} \le \cdots \le x_{(n)}$. (On the ordered sample, each threshold function restricts to a vector of the form $(1, \ldots, 1, 0, \ldots, 0)$, and there are only $n + 1$ such vectors.)

Next we apply the following lemma to bound the expression from above:

$$2\, E\, E\Big[ \sup_{g \in G} \frac{1}{n} \sum_{i=1}^n \epsilon_i g(x_i) \,\Big|\, x_1, \ldots, x_n \Big].$$

Lemma 2.2. For finite $A \subset \mathbb{R}^n$ with $R = \max_{a \in A} \big( \sum_i a_i^2 \big)^{1/2}$, we have

$$E \sup_{a \in A} \underbrace{\frac{1}{n} \sum_{i=1}^n \epsilon_i a_i}_{Z_a} \le \frac{R \sqrt{2 \log |A|}}{n}.$$

Proof. For any $s > 0$,

$$\exp\big( s\, E \sup_{a} Z_a \big) \le E \exp\big( s \sup_a Z_a \big) \quad \text{(because the exponential function is convex)}$$

$$= E \sup_a \exp(s Z_a) \le \sum_{a \in A} E \exp(s Z_a)$$

$$\le |A| \exp\Big( \frac{s^2 R^2}{2 n^2} \Big) \quad \text{(by Hoeffding's inequality, since $\sum_i a_i^2 \le R^2$)}.$$

Taking logarithms and dividing by $s$,

$$E \sup_a Z_a \le \inf_{s > 0} \Big( \frac{\log |A|}{s} + \frac{s R^2}{2 n^2} \Big) = \frac{R \sqrt{2 \log |A|}}{n}.$$

Note: for our application, $R \le \sqrt{n}$ and $|A| \le n + 1$. Hence

$$\Pr\Big( \sup_{g \in G} \big( \hat{E}_n g - Eg \big) \ge \epsilon + 2 \sqrt{\frac{2 \log(n+1)}{n}} \Big) \le \exp(-2 \epsilon^2 n),$$

which completes the proof. (Almost sure convergence follows by the Borel-Cantelli lemma, since the right-hand side is summable in $n$.)
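To get a feel for the rate, a worked instance with an arbitrarily chosen sample size: for $n = 10{,}000$,

$$2 \sqrt{\frac{2 \log(n+1)}{n}} = 2 \sqrt{\frac{2 \log(10001)}{10000}} \approx 2 \sqrt{0.00184} \approx 0.086,$$

so with probability at least $1 - e^{-2\epsilon^2 n}$ the uniform deviation is at most $\epsilon + 0.086$; the bound decays like $\sqrt{\log n / n}$.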

3 Rademacher averages

Definition. For a class $F$ of real-valued functions defined on $X$, for i.i.d. $x_1, \ldots, x_n \in X$, and for independent Rademacher random variables $\epsilon_1, \ldots, \epsilon_n$, define:

$$R_n(F) = E \sup_{f \in F} \frac{1}{n} \sum_{i=1}^n \epsilon_i f(x_i) \quad \text{(Rademacher averages of $F$)},$$

$$\hat{R}_n(F) = E\Big[ \sup_{f \in F} \frac{1}{n} \sum_{i=1}^n \epsilon_i f(x_i) \,\Big|\, x_1, \ldots, x_n \Big] \quad \text{(empirical Rademacher averages)}.$$
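For a finite class (or a finite set of restrictions, as in the proof above), $\hat{R}_n(F)$ is straightforward to approximate by Monte Carlo. A minimal sketch, assuming NumPy; the function name and the matrix representation of the class are illustrative choices, not from the notes:

```python
import numpy as np

def empirical_rademacher(fvals, trials=5000, seed=0):
    """Monte Carlo estimate of hat{R}_n(F) for a finite class,
    given fvals[j, i] = f_j(x_i) on a fixed sample x_1, ..., x_n."""
    rng = np.random.default_rng(seed)
    _, n = fvals.shape
    eps = rng.choice([-1.0, 1.0], size=(trials, n))  # Rademacher signs
    # For each draw of signs, take the sup over the class, then average.
    return (eps @ fvals.T / n).max(axis=1).mean()

# Example: the threshold class G restricted to any n-point sample has only
# n + 1 distinct restrictions: (0,...,0), (1,0,...,0), ..., (1,...,1).
n = 1000
fvals = np.tril(np.ones((n + 1, n)), k=-1)  # row k has k leading ones
print(empirical_rademacher(fvals))
# Comes out around sqrt(2/(pi*n)) ~ 0.025, below the Lemma 2.2
# bound sqrt(2*log(n+1)/n) ~ 0.118.
```

The gap between the Monte Carlo value and the lemma's bound reflects that Lemma 2.2 only uses the cardinality of the restricted class, not its structure.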