CONCENTRATION INEQUALITIES
- Georgia Crawford
- 5 years ago
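MAXIM RAGINSKY

Date: January 24,

In the previous lecture, the following result was stated without proof. If $X_1, \ldots, X_n$ are independent Bernoulli($\theta$) random variables representing the outcomes of a sequence of $n$ tosses of a coin with bias (probability of heads) $\theta$, then for any $\varepsilon \in (0,1)$
\[ \mathbb{P}\big( |\hat\theta_n - \theta| \ge \varepsilon \big) \le 2e^{-2n\varepsilon^2}, \tag{1} \]
where
\[ \hat\theta_n = \frac{1}{n} \sum_{i=1}^n X_i \]
is the fraction of heads in $X^n = (X_1, \ldots, X_n)$. Since $\theta = \mathbb{E}\hat\theta_n$, (1) says that the sample (or empirical) average of the $X_i$'s concentrates sharply around the statistical average $\theta = \mathbb{E}X_1$. Bounds like these are fundamental in statistical learning theory. In the next few lectures, we will learn the techniques needed to derive such bounds for settings much more complicated than coin tossing. This is not meant to be a complete picture; more details and additional results can be found in the excellent survey by Boucheron et al. [BBL04].

1. The basic tools

We start with Markov's inequality: Let $X \in \mathbb{R}$ be a nonnegative random variable. Then for any $t > 0$ we have
\[ \mathbb{P}(X \ge t) \le \frac{\mathbb{E}X}{t}. \tag{2} \]
The proof is simple:
\[ \mathbb{P}(X \ge t) = \mathbb{E}\big[ 1_{\{X \ge t\}} \big] \tag{3} \]
\[ \le \frac{\mathbb{E}\big[ X 1_{\{X \ge t\}} \big]}{t} \tag{4} \]
\[ \le \frac{\mathbb{E}X}{t}, \tag{5} \]
where:
- (3) uses the fact that the probability of an event can be expressed as the expectation of its indicator function:
\[ \mathbb{P}(X \in A) = \int_A P_X(dx) = \int 1_{\{x \in A\}} P_X(dx) = \mathbb{E}\big[ 1_{\{X \in A\}} \big]; \]
- (4) uses the fact that $X \ge t > 0 \implies X/t \ge 1$;
- (5) uses the fact that $X \ge 0 \implies X 1_{\{X \ge t\}} \le X$, so consequently $\mathbb{E}[X 1_{\{X \ge t\}}] \le \mathbb{E}X$.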
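As a quick numerical illustration of Markov's inequality (2), the sketch below compares an empirical tail frequency with the bound $\mathbb{E}X/t$. The exponential distribution, the sample size, the seed, and the helper name `markov_check` are all assumptions chosen for illustration; none of them come from the notes. Because (2) holds for every distribution of a nonnegative random variable, it holds in particular for the empirical distribution of the sample, so the printed tail frequency should never exceed the printed bound.

```python
import random

def markov_check(samples, t):
    """Return (empirical tail frequency, Markov bound) for nonnegative samples."""
    n = len(samples)
    tail = sum(1 for x in samples if x >= t) / n   # empirical P(X >= t)
    bound = (sum(samples) / n) / t                 # empirical EX / t
    return tail, bound

# Hypothetical test distribution: Exponential(1), which is nonnegative.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(10_000)]

for t in (0.5, 1.0, 2.0, 4.0):
    tail, bound = markov_check(samples, t)
    print(f"t={t}: tail={tail:.4f}  Markov bound={bound:.4f}")
```

For small $t$ the bound exceeds 1 and is vacuous; it only becomes informative once $t$ is larger than the mean.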
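Markov's inequality leads to our first bound on the probability that a random variable deviates from its expectation by more than a given amount: Chebyshev's inequality. Let $X$ be an arbitrary real random variable. Then for any $t > 0$
\[ \mathbb{P}\big( |X - \mathbb{E}X| \ge t \big) \le \frac{\operatorname{Var} X}{t^2}, \tag{6} \]
where $\operatorname{Var} X \triangleq \mathbb{E}\big[ |X - \mathbb{E}X|^2 \big] = \mathbb{E}X^2 - (\mathbb{E}X)^2$ is the variance of $X$. To prove (6), we apply Markov's inequality (2) to the nonnegative random variable $|X - \mathbb{E}X|^2$:
\[ \mathbb{P}\big( |X - \mathbb{E}X| \ge t \big) = \mathbb{P}\big( |X - \mathbb{E}X|^2 \ge t^2 \big) \tag{7} \]
\[ \le \frac{\mathbb{E}|X - \mathbb{E}X|^2}{t^2}, \tag{8} \]
where the first step uses the fact that the function $\varphi(x) = x^2$ is monotonically increasing on $[0, \infty)$, so that $a \ge b \ge 0$ if and only if $a^2 \ge b^2$.

Now let's apply these tools to the problem of bounding the probability that, for a coin with bias $\theta$, the fraction of heads in $n$ trials differs from $\theta$ by more than some $\varepsilon > 0$. To that end, let us represent the outcomes of the $n$ tosses by $n$ independent Bernoulli($\theta$) random variables $X_1, \ldots, X_n \in \{0, 1\}$, where $\mathbb{P}(X_i = 1) = \theta$ for all $i$. Let $\hat\theta_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then
\[ \mathbb{E}\hat\theta_n = \mathbb{E}\Big[ \frac{1}{n} \sum_{i=1}^n X_i \Big] = \frac{1}{n} \sum_{i=1}^n \underbrace{\mathbb{E}X_i}_{= \mathbb{P}(X_i = 1)} = \theta \]
and
\[ \operatorname{Var} \hat\theta_n = \operatorname{Var}\Big( \frac{1}{n} \sum_{i=1}^n X_i \Big) = \frac{1}{n^2} \sum_{i=1}^n \operatorname{Var} X_i = \frac{\theta(1-\theta)}{n}, \]
where we have used the fact that the $X_i$'s are i.i.d., so $\operatorname{Var}(X_1 + \ldots + X_n) = \sum_{i=1}^n \operatorname{Var} X_i = n \operatorname{Var} X_1$. Now we are in a position to apply Chebyshev's inequality:
\[ \mathbb{P}\big( |\hat\theta_n - \theta| \ge \varepsilon \big) \le \frac{\operatorname{Var} \hat\theta_n}{\varepsilon^2} = \frac{\theta(1-\theta)}{n\varepsilon^2}. \tag{9} \]
At the very least, (9) shows that the probability of getting a "bad" sample decreases with sample size. Unfortunately, it does not decrease fast enough. To see why, we can appeal to the Central Limit Theorem, which (roughly) states that
\[ \mathbb{P}\Big( \hat\theta_n - \theta \ge t \sqrt{\frac{\theta(1-\theta)}{n}} \Big) \xrightarrow{n \to \infty} 1 - \Phi(t) \le \frac{1}{\sqrt{2\pi}} \frac{e^{-t^2/2}}{t}, \]
where $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^t e^{-x^2/2} dx$ is the standard Gaussian CDF. This would suggest something like
\[ \mathbb{P}\big( \hat\theta_n - \theta \ge \varepsilon \big) \approx \exp\Big( -\frac{n\varepsilon^2}{2\theta(1-\theta)} \Big), \]
which decays with $n$ much faster than the right-hand side of (9).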
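To make the gap between polynomial and exponential decay concrete, the following sketch tabulates the Chebyshev bound (9) next to the exponential rate suggested by the Central Limit Theorem. The function names and the values $\theta = 0.5$, $\varepsilon = 0.1$ are illustrative assumptions, and the second quantity is only the heuristic from the text, not a proven bound.

```python
import math

def chebyshev_bound(theta, n, eps):
    """Right-hand side of (9): theta (1 - theta) / (n eps^2)."""
    return theta * (1.0 - theta) / (n * eps ** 2)

def gaussian_heuristic(theta, n, eps):
    """CLT-suggested rate exp(-n eps^2 / (2 theta (1 - theta))); a heuristic, not a bound."""
    return math.exp(-n * eps ** 2 / (2.0 * theta * (1.0 - theta)))

theta, eps = 0.5, 0.1  # illustrative values, not taken from the notes
for n in (100, 1_000, 10_000):
    print(f"n={n:6d}  Chebyshev={chebyshev_bound(theta, n, eps):.3e}  "
          f"CLT heuristic={gaussian_heuristic(theta, n, eps):.3e}")
```

Multiplying $n$ by 10 only divides the Chebyshev bound by 10, whereas the exponential rate is raised to the tenth power.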
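2. The Chernoff bounding trick and Hoeffding's inequality

To fix (9), we will use a very powerful technique, known as the Chernoff bounding trick [Che52]. Let $X$ be a nonnegative random variable. Suppose we are interested in bounding the probability $\mathbb{P}(X \ge t)$ for some particular $t > 0$. Observe that for any $s > 0$ we have
\[ \mathbb{P}(X \ge t) = \mathbb{P}\big( e^{sX} \ge e^{st} \big) \le e^{-st} \mathbb{E}\big[ e^{sX} \big], \tag{10} \]
where the first step is by monotonicity of the function $\varphi(x) = e^{sx}$ and the second step is by Markov's inequality (2). The Chernoff trick is to choose an $s > 0$ that would make the right-hand side of (10) suitably small. In fact, since (10) holds simultaneously for all $s > 0$, the optimal thing to do is to take
\[ \mathbb{P}(X \ge t) \le \inf_{s > 0} e^{-st} \mathbb{E}\big[ e^{sX} \big]. \]
However, often a good upper bound on the moment-generating function $\mathbb{E}[e^{sX}]$ is enough. One such bound was developed by Hoeffding [Hoe63] for the case when $X$ is bounded with probability one:

Lemma 1 (Hoeffding). Let $X$ be a random variable with $\mathbb{E}X = 0$ and $\mathbb{P}(a \le X \le b) = 1$ for some $-\infty < a \le b < \infty$. Then for all $s > 0$
\[ \mathbb{E}\big[ e^{sX} \big] \le e^{s^2 (b-a)^2 / 8}. \tag{11} \]

Proof. The proof uses elementary calculus and convexity. First we note that the function $\varphi(x) = e^{sx}$ is convex on $\mathbb{R}$. Any $x \in [a, b]$ can be written as
\[ x = \frac{x - a}{b - a}\, b + \frac{b - x}{b - a}\, a. \]
Hence
\[ e^{sx} \le \frac{x - a}{b - a}\, e^{sb} + \frac{b - x}{b - a}\, e^{sa}. \]
Since $\mathbb{E}X = 0$, we have
\[ \mathbb{E}\big[ e^{sX} \big] \le \frac{b}{b - a}\, e^{sa} - \frac{a}{b - a}\, e^{sb} = \Big( \frac{b}{b - a} - \frac{a}{b - a}\, e^{s(b-a)} \Big) e^{sa}. \]
We have $s(b - a)$ in the exponent in the parentheses. To get the same thing in the $e^{sa}$ term multiplying the parentheses, we (with a bit of foresight) seek $\lambda$ such that $sa = -\lambda s(b - a)$, which gives us $\lambda = -a/(b - a)$. Then
\[ \Big( \frac{b}{b - a} - \frac{a}{b - a}\, e^{s(b-a)} \Big) e^{sa} = \big( 1 - \lambda + \lambda e^{s(b-a)} \big) e^{-\lambda s(b-a)}. \]
Now let $u = s(b - a)$, so we can write
\[ \mathbb{E}\big[ e^{sX} \big] \le \big( 1 - \lambda + \lambda e^{u} \big) e^{-\lambda u}. \tag{12} \]
Again with a bit of foresight, let us express the right-hand side of (12) as an exponential of a function of $u$:
\[ \big( 1 - \lambda + \lambda e^{u} \big) e^{-\lambda u} = e^{\phi(u)}, \qquad \text{where } \phi(u) = \log\big( 1 - \lambda + \lambda e^{u} \big) - \lambda u. \]
Now the whole affair hinges on us being able to show that $\phi(u) \le u^2/8$ for any $u \ge 0$. To that end, we first note that $\phi(0) = \phi'(0) = 0$, and that $\phi''(u) \le 1/4$ for all $u \ge 0$. Therefore, by Taylor's theorem we have
\[ \phi(u) = \phi(0) + \phi'(0) u + \frac{1}{2} \phi''(\alpha) u^2 \]
for some $\alpha \in [0, u]$, and we can upper-bound the right-hand side of the above expression by $u^2/8$. Thus,
\[ \mathbb{E}\big[ e^{sX} \big] \le e^{\phi(u)} \le e^{u^2/8} = e^{s^2 (b-a)^2 / 8}, \]
which gives us (11).

We will now use the Chernoff method and the above lemma to prove the following

Theorem 1 (Hoeffding's inequality). Let $X_1, \ldots, X_n$ be independent random variables, such that $X_i \in [a_i, b_i]$ with probability one. Let $S_n \triangleq \sum_{i=1}^n X_i$. Then for any $t > 0$
\[ \mathbb{P}\big( S_n - \mathbb{E}S_n \ge t \big) \le \exp\Big( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big); \tag{13} \]
\[ \mathbb{P}\big( S_n - \mathbb{E}S_n \le -t \big) \le \exp\Big( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big). \tag{14} \]
Consequently,
\[ \mathbb{P}\big( |S_n - \mathbb{E}S_n| \ge t \big) \le 2 \exp\Big( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big). \tag{15} \]

Proof. By replacing each $X_i$ with $X_i - \mathbb{E}X_i$, we may as well assume that $\mathbb{E}X_i = 0$. Then $S_n = \sum_{i=1}^n X_i$. Using Chernoff's trick, we write
\[ \mathbb{P}(S_n \ge t) = \mathbb{P}\big( e^{sS_n} \ge e^{st} \big) \le e^{-st} \mathbb{E}\big[ e^{sS_n} \big]. \tag{16} \]
Since the $X_i$'s are independent,
\[ \mathbb{E}\big[ e^{sS_n} \big] = \mathbb{E}\big[ e^{s(X_1 + \ldots + X_n)} \big] = \mathbb{E}\Big[ \prod_{i=1}^n e^{sX_i} \Big] = \prod_{i=1}^n \mathbb{E}\big[ e^{sX_i} \big]. \tag{17} \]
Since $X_i \in [a_i, b_i]$, we can apply Lemma 1 to write $\mathbb{E}[e^{sX_i}] \le e^{s^2 (b_i - a_i)^2 / 8}$. Substituting this into (17) and (16), we obtain
\[ \mathbb{P}(S_n \ge t) \le e^{-st} \prod_{i=1}^n e^{s^2 (b_i - a_i)^2 / 8} = \exp\Big( -st + \frac{s^2}{8} \sum_{i=1}^n (b_i - a_i)^2 \Big). \]
If we choose $s = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}$, then we obtain (13). The proof of (14) is similar.

Now we will apply Hoeffding's inequality to improve our crude concentration bound (9) for the sum of $n$ independent Bernoulli($\theta$) random variables, $X_1, \ldots, X_n$. Since each $X_i \in \{0, 1\}$, we can apply Theorem 1 to get, for any $t > 0$,
\[ \mathbb{P}\Big( \Big| \sum_{i=1}^n X_i - n\theta \Big| \ge t \Big) \le 2e^{-2t^2/n}. \]
Therefore,
\[ \mathbb{P}\big( |\hat\theta_n - \theta| \ge \varepsilon \big) = \mathbb{P}\Big( \Big| \sum_{i=1}^n X_i - n\theta \Big| \ge n\varepsilon \Big) \le 2e^{-2n\varepsilon^2}, \]
which gives us the claimed bound (1).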
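The coin-tossing bound (1) can be sanity-checked by simulation. The sketch below is a Monte Carlo experiment under assumed parameters ($\theta = 0.5$, $n = 200$, $\varepsilon = 0.1$, 2000 trials, fixed seed); the function names are hypothetical. It estimates how often $|\hat\theta_n - \theta| \ge \varepsilon$ and compares that frequency with $2e^{-2n\varepsilon^2}$. A small tolerance is used in the comparison so that floating-point rounding does not drop samples that sit exactly on the boundary.

```python
import math
import random

def hoeffding_bound(n, eps):
    """Right-hand side of (1): 2 exp(-2 n eps^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

def empirical_deviation_prob(theta, n, eps, trials, seed=0):
    """Monte Carlo estimate of P(|theta_hat_n - theta| >= eps) for n coin tosses."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        heads = sum(1 for _ in range(n) if rng.random() < theta)
        # Tolerance guards against rounding when heads / n - theta equals eps exactly.
        if abs(heads / n - theta) >= eps - 1e-12:
            bad += 1
    return bad / trials

theta, n, eps = 0.5, 200, 0.1  # illustrative values, not from the notes
freq = empirical_deviation_prob(theta, n, eps, trials=2_000)
print(f"empirical frequency = {freq:.4f}")
print(f"Hoeffding bound     = {hoeffding_bound(n, eps):.4f}")
```

The bound is distribution-free over all $\theta$, so for a particular $\theta$ the simulated frequency will typically sit well below it.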
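3. From bounded variables to bounded differences: McDiarmid's inequality

Hoeffding's inequality applies to sums of independent random variables. We will now develop its generalization, due to McDiarmid [McD89], to arbitrary real-valued functions of independent random variables that satisfy a certain condition.

Let $\mathsf{X}$ be some set, and consider a function $g : \mathsf{X}^n \to \mathbb{R}$. We say that $g$ has bounded differences if there exist nonnegative numbers $c_1, \ldots, c_n$, such that
\[ \sup_{x_1, \ldots, x_n, x_i' \in \mathsf{X}} \big| g(x_1, \ldots, x_n) - g(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \big| \le c_i \tag{18} \]
for all $i = 1, \ldots, n$. In words, if we change the $i$th variable while keeping all the others fixed, the value of $g$ will not change by more than $c_i$.

Theorem 2 (McDiarmid's inequality [McD89]). Let $X^n = (X_1, \ldots, X_n) \in \mathsf{X}^n$ be an $n$-tuple of independent $\mathsf{X}$-valued random variables. If a function $g : \mathsf{X}^n \to \mathbb{R}$ has bounded differences, as in (18), then, for all $t > 0$,
\[ \mathbb{P}\big( g(X^n) - \mathbb{E}g(X^n) \ge t \big) \le \exp\Big( -\frac{2t^2}{\sum_{i=1}^n c_i^2} \Big); \tag{19} \]
\[ \mathbb{P}\big( \mathbb{E}g(X^n) - g(X^n) \ge t \big) \le \exp\Big( -\frac{2t^2}{\sum_{i=1}^n c_i^2} \Big). \tag{20} \]

Proof. Let me first sketch the general idea behind the proof. Let $Z = g(X^n)$ and $V = Z - \mathbb{E}Z = g(X^n) - \mathbb{E}g(X^n)$. The first step will be to write $V$ as a sum $\sum_{i=1}^n V_i$, where the terms $V_i$ are constructed so that:
(1) $V_i$ is a function only of $X^i = (X_1, \ldots, X_i)$.
(2) There exists a function $\Psi_i : \mathsf{X}^{i-1} \to \mathbb{R}$ such that, conditionally on $X^{i-1}$, $\Psi_i(X^{i-1}) \le V_i \le \Psi_i(X^{i-1}) + c_i$.
Provided we can arrange things in this way, we can apply Lemma 1 to $V_i$ conditionally on $X^{i-1}$:
\[ \mathbb{E}\big[ e^{sV_i} \,\big|\, X^{i-1} \big] \le e^{s^2 c_i^2 / 8}. \tag{21} \]
Then, using Chernoff's method, we have
\[ \begin{aligned}
\mathbb{P}(Z - \mathbb{E}Z \ge t) = \mathbb{P}(V \ge t) &\le e^{-st} \mathbb{E}\big[ e^{sV} \big] \\
&= e^{-st} \mathbb{E}\Big[ e^{s \sum_{i=1}^{n} V_i} \Big] \\
&= e^{-st} \mathbb{E}\Big[ e^{s \sum_{i=1}^{n-1} V_i} e^{sV_n} \Big] \\
&= e^{-st} \mathbb{E}\Big[ e^{s \sum_{i=1}^{n-1} V_i}\, \mathbb{E}\big[ e^{sV_n} \,\big|\, X^{n-1} \big] \Big] \\
&\le e^{-st} e^{s^2 c_n^2 / 8}\, \mathbb{E}\Big[ e^{s \sum_{i=1}^{n-1} V_i} \Big],
\end{aligned} \]
where in the next-to-last step we used the fact that $V_1, \ldots, V_{n-1}$ depend only on $X^{n-1}$, and in the last step we used (21) with $i = n$. If we continue peeling off the terms involving $V_{n-1}, V_{n-2}, \ldots, V_1$, we will get
\[ \mathbb{P}(Z - \mathbb{E}Z \ge t) \le \exp\Big( -st + \frac{s^2}{8} \sum_{i=1}^n c_i^2 \Big). \]
Taking $s = 4t / \sum_{i=1}^n c_i^2$, we end up with (19).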
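It remains to construct the $V_i$'s with the desired properties. To that end, let
\[ H_i(X^i) = \mathbb{E}[Z \mid X^i] \qquad \text{and} \qquad V_i = H_i(X^i) - H_{i-1}(X^{i-1}), \]
with the convention $H_0 = \mathbb{E}Z$. Then
\[ \sum_{i=1}^n V_i = \sum_{i=1}^n \big\{ \mathbb{E}[Z \mid X^i] - \mathbb{E}[Z \mid X^{i-1}] \big\} = \mathbb{E}[Z \mid X^n] - \mathbb{E}Z = Z - \mathbb{E}Z = V. \]
Note that $V_i$ depends only on $X^i$ by construction, and that $\mathbb{E}[V_i \mid X^{i-1}] = 0$ by the tower property of conditional expectation, which is the zero-mean condition needed to apply Lemma 1 conditionally on $X^{i-1}$. Moreover, let
\[ \Psi_i(X^{i-1}) = \inf_{x \in \mathsf{X}} \big\{ H_i(X^{i-1}, x) - H_{i-1}(X^{i-1}) \big\}, \]
\[ \Psi_i'(X^{i-1}) = \sup_{x' \in \mathsf{X}} \big\{ H_i(X^{i-1}, x') - H_{i-1}(X^{i-1}) \big\}, \]
where, owing to the fact that the $X_i$'s are independent, we have
\[ H_i(X^{i-1}, x) = \mathbb{E}[Z \mid X^{i-1}, X_i = x] = \int g(X^{i-1}, x, x_{i+1}^n)\, P_{X_{i+1}^n}(dx_{i+1}^n), \]
with $x_{i+1}^n$ denoting the tuple $(x_{i+1}, \ldots, x_n)$. Then
\[ \begin{aligned}
\Psi_i'(X^{i-1}) - \Psi_i(X^{i-1}) &= \sup_{x' \in \mathsf{X}} \big\{ H_i(X^{i-1}, x') - H_{i-1}(X^{i-1}) \big\} - \inf_{x \in \mathsf{X}} \big\{ H_i(X^{i-1}, x) - H_{i-1}(X^{i-1}) \big\} \\
&= \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \big\{ H_i(X^{i-1}, x') - H_i(X^{i-1}, x) \big\} \\
&= \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \big\{ \mathbb{E}[Z \mid X^{i-1}, X_i = x'] - \mathbb{E}[Z \mid X^{i-1}, X_i = x] \big\} \\
&= \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \int \big[ g(X^{i-1}, x', x_{i+1}^n) - g(X^{i-1}, x, x_{i+1}^n) \big]\, P_{X_{i+1}^n}(dx_{i+1}^n) \\
&\le \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \int \big| g(X^{i-1}, x', x_{i+1}^n) - g(X^{i-1}, x, x_{i+1}^n) \big|\, P_{X_{i+1}^n}(dx_{i+1}^n) \\
&\le c_i,
\end{aligned} \]
where the last step follows from the bounded difference property. Thus, we can write $\Psi_i'(X^{i-1}) \le \Psi_i(X^{i-1}) + c_i$, which implies that, indeed,
\[ \Psi_i(X^{i-1}) \le V_i \le \Psi_i(X^{i-1}) + c_i \]
conditionally on $X^{i-1}$.

4. McDiarmid's inequality in action

McDiarmid's inequality is an extremely powerful and often used tool in statistical learning theory. We will now discuss several examples of its use. To that end, we will first introduce some notation and definitions.

Let $\mathsf{X}$ be some (measurable) space. If $Q$ is a probability distribution of an $\mathsf{X}$-valued random variable $X$, then we can compute the expectation of any (measurable) function $f : \mathsf{X} \to \mathbb{R}$ w.r.t. $Q$. So far, we have denoted this expectation by $\mathbb{E}f(X)$ or by $\mathbb{E}_Q f(X)$. We will often find it convenient to use an alternative notation, $Q(f)$.

Let $X^n = (X_1, \ldots, X_n)$ be $n$ independent identically distributed (i.i.d.) $\mathsf{X}$-valued random variables with common distribution $P$. The main object of interest to us is the empirical distribution induced by $X^n$, which we will denote by $P_{X^n}$. The empirical distribution assigns the probability $1/n$ to each $X_i$, i.e.,
\[ P_{X^n} = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}. \]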
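Here, $\delta_x$ denotes a unit mass concentrated at a point $x \in \mathsf{X}$, i.e., the probability distribution on $\mathsf{X}$ defined by
\[ \delta_x(A) = 1_{\{x \in A\}}, \qquad \forall \text{ measurable } A \subseteq \mathsf{X}. \]
We note the following important facts about $P_{X^n}$:
(1) Being a function of the sample $X^n$, $P_{X^n}$ is a random variable taking values in the space of probability distributions over $\mathsf{X}$.
(2) The probability of a set $A \subseteq \mathsf{X}$ under $P_{X^n}$,
\[ P_{X^n}(A) = \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \in A\}}, \]
is the empirical frequency of the set $A$ on the sample $X^n$. The expectation of $P_{X^n}(A)$ is equal to $P(A)$, the $P$-probability of $A$. Indeed,
\[ \mathbb{E} P_{X^n}(A) = \mathbb{E}\Big[ \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \in A\}} \Big] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}\big[ 1_{\{X_i \in A\}} \big] = \frac{1}{n} \sum_{i=1}^n \mathbb{P}(X_i \in A) = P(A). \]
(3) Given a function $f : \mathsf{X} \to \mathbb{R}$, we can compute its expectation w.r.t. $P_{X^n}$:
\[ P_{X^n}(f) = \frac{1}{n} \sum_{i=1}^n f(X_i), \]
which is just the sample mean of $f$ on $X^n$. It is also referred to as the empirical expectation of $f$ on $X^n$. We have
\[ \mathbb{E} P_{X^n}(f) = \mathbb{E}\Big[ \frac{1}{n} \sum_{i=1}^n f(X_i) \Big] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}f(X_i) = \mathbb{E}f(X) = P(f). \]
We can now proceed to our examples.

4.1. Sums of bounded random variables. In the special case when $\mathsf{X} = \mathbb{R}$, $P$ is a probability distribution supported on a finite interval $[a, b]$, and $g(X^n)$ is the sum
\[ g(X^n) = \sum_{i=1}^n X_i, \]
McDiarmid's inequality simply reduces to Hoeffding's. Indeed, for any $x^n \in [a, b]^n$ and $x_i' \in [a, b]$ we have
\[ g(x^{i-1}, x_i, x_{i+1}^n) - g(x^{i-1}, x_i', x_{i+1}^n) = x_i - x_i' \le b - a. \]
Interchanging the roles of $x_i'$ and $x_i$, we get
\[ g(x^{i-1}, x_i', x_{i+1}^n) - g(x^{i-1}, x_i, x_{i+1}^n) = x_i' - x_i \le b - a. \]
Hence, we may apply Theorem 2 with $c_i = b - a$ for all $i$ to get
\[ \mathbb{P}\big( |g(X^n) - \mathbb{E}g(X^n)| \ge t \big) \le 2 \exp\Big( -\frac{2t^2}{n(b - a)^2} \Big). \]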
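The empirical quantities $P_{X^n}(A)$ and $P_{X^n}(f)$, and the bounded-difference constant $c_i = b - a$ for the sum in Section 4.1, are easy to compute directly. The following is a minimal sketch; the helper names (`empirical_measure`, `empirical_mean`, `max_single_change`), the Uniform$[0,1]$ sample, and the choice of test set and test function are assumptions made for illustration only. Sets $A$ are represented by their indicator predicates.

```python
import random

def empirical_measure(sample, event):
    """P_{X^n}(A): fraction of sample points x for which event(x) is True."""
    return sum(1 for x in sample if event(x)) / len(sample)

def empirical_mean(sample, f):
    """P_{X^n}(f): sample mean of f over the sample."""
    return sum(f(x) for x in sample) / len(sample)

def max_single_change(g, sample, i, candidates):
    """Largest |g(x^n) - g(x^n with x_i replaced by c)| over the candidate values c."""
    base = g(sample)
    return max(abs(g(sample[:i] + [c] + sample[i + 1:]) - base) for c in candidates)

# Hypothetical setup: n points drawn uniformly from [a, b] = [0, 1].
rng = random.Random(0)
a, b, n = 0.0, 1.0, 20
sample = [rng.uniform(a, b) for _ in range(n)]

print("P_n(A) for A = (-inf, 0.5]:", empirical_measure(sample, lambda x: x <= 0.5))
print("P_n(f) for f(x) = x^2:     ", empirical_mean(sample, lambda x: x * x))

# For g = sum, replacing one coordinate by an endpoint changes g by at most b - a.
worst = max(max_single_change(sum, sample, i, [a, b]) for i in range(n))
print("largest single-coordinate change:", worst, "<=", b - a)
```

Only the endpoints $a$ and $b$ are tried as replacement values because, for the sum, the change $|x_i - x_i'|$ is maximized at an endpoint.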
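4.2. Uniform deviations. Let $X_1, \ldots, X_n$ be $n$ i.i.d. $\mathsf{X}$-valued random variables with common distribution $P$. By the Law of Large Numbers, for any $A \subseteq \mathsf{X}$ and any $\varepsilon > 0$
\[ \lim_{n \to \infty} \mathbb{P}\big( |P_{X^n}(A) - P(A)| \ge \varepsilon \big) = 0. \]
In fact, we can use Hoeffding's inequality to show that
\[ \mathbb{P}\big( |P_{X^n}(A) - P(A)| \ge \varepsilon \big) \le 2e^{-2n\varepsilon^2}. \]
This probability bound holds for each $A$ separately. However, in learning theory we are often interested in the deviation of empirical frequencies from true probabilities simultaneously over some collection of the subsets of $\mathsf{X}$. To that end, let $\mathcal{A}$ be such a collection and consider the function
\[ g(X^n) \triangleq \sup_{A \in \mathcal{A}} |P_{X^n}(A) - P(A)|. \tag{22} \]
Later in the course we will see that, for certain choices of $\mathcal{A}$, $\mathbb{E}g(X^n) = O(1/\sqrt{n})$. However, regardless of what $\mathcal{A}$ is, it is easy to see that, by changing only one $X_i$, the value of $g(X^n)$ can change at most by $1/n$. Let $x^n = (x_1, \ldots, x_n)$, choose some other $x_i' \in \mathsf{X}$, and let $x^n_{(i)}$ denote $x^n$ with $x_i$ replaced by $x_i'$:
\[ x^n = (x^{i-1}, x_i, x_{i+1}^n), \qquad x^n_{(i)} = (x^{i-1}, x_i', x_{i+1}^n). \]
Then
\[ \begin{aligned}
g(x^n) - g(x^n_{(i)}) &= \sup_{A \in \mathcal{A}} |P_{x^n}(A) - P(A)| - \sup_{A' \in \mathcal{A}} |P_{x^n_{(i)}}(A') - P(A')| \\
&= \sup_{A \in \mathcal{A}} \inf_{A' \in \mathcal{A}} \big\{ |P_{x^n}(A) - P(A)| - |P_{x^n_{(i)}}(A') - P(A')| \big\} \\
&\le \sup_{A \in \mathcal{A}} \big\{ |P_{x^n}(A) - P(A)| - |P_{x^n_{(i)}}(A) - P(A)| \big\} \\
&\le \sup_{A \in \mathcal{A}} |P_{x^n}(A) - P_{x^n_{(i)}}(A)| \\
&= \frac{1}{n} \sup_{A \in \mathcal{A}} \big| 1_{\{x_i \in A\}} - 1_{\{x_i' \in A\}} \big| \\
&\le \frac{1}{n}.
\end{aligned} \]
Interchanging the roles of $x^n$ and $x^n_{(i)}$, we obtain
\[ g(x^n_{(i)}) - g(x^n) \le \frac{1}{n}. \]
Thus,
\[ |g(x^n) - g(x^n_{(i)})| \le \frac{1}{n}. \]
Note that this bound holds for all $i$ and all choices of $x^n$ and $x^n_{(i)}$. This means that the function $g$ defined in (22) has bounded differences with $c_1 = \ldots = c_n = 1/n$. Consequently, we can use Theorem 2 to get
\[ \mathbb{P}\big( |g(X^n) - \mathbb{E}g(X^n)| \ge \varepsilon \big) \le 2e^{-2n\varepsilon^2}. \]
This shows that the uniform deviation $g(X^n)$ concentrates sharply around its mean $\mathbb{E}g(X^n)$.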
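The bounded-difference property of the uniform deviation (22) can be checked numerically in one concrete case. The sketch below assumes, purely for illustration, that $P$ is the Uniform$[0,1]$ distribution and that $\mathcal{A}$ is the collection of half-lines $(-\infty, t]$, so that $P(A) = t$ on $[0,1]$ and $g$ is the Kolmogorov-Smirnov statistic, which has a closed form in terms of the sorted sample. The function names, the sample size, and the seed are hypothetical choices.

```python
import math
import random

def uniform_deviation(sample):
    """sup_t |P_{x^n}((-inf, t]) - t| for points in [0, 1] (the KS statistic vs. Uniform[0,1])."""
    xs = sorted(sample)
    n = len(xs)
    # At the i-th order statistic (0-based) the empirical CDF jumps from i/n to (i+1)/n.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def mcdiarmid_two_sided(n, eps):
    """Two-sided McDiarmid bound with c_1 = ... = c_n = 1/n: 2 exp(-2 n eps^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

rng = random.Random(0)
n = 50
x = [rng.random() for _ in range(n)]

# Replace one coordinate at a time and record the largest change in g.
worst = 0.0
for i in range(n):
    y = list(x)
    y[i] = rng.random()
    worst = max(worst, abs(uniform_deviation(x) - uniform_deviation(y)))

print("largest observed change:", worst, "<=", 1 / n)
print("McDiarmid bound at eps = 0.2:", mcdiarmid_two_sided(n, 0.2))
```

The check only samples random replacements, so it illustrates the $1/n$ bound rather than proving it; the proof is the chain of inequalities above.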
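4.3. Uniform deviations continued. The same idea applies to arbitrary real-valued functions over $\mathsf{X}$. Let $X^n = (X_1, \ldots, X_n)$ be as in the previous example. Given any function $f : \mathsf{X} \to [0, 1]$, Hoeffding's inequality tells us that
\[ \mathbb{P}\big( |P_{X^n}(f) - \mathbb{E}f(X)| \ge \varepsilon \big) \le 2e^{-2n\varepsilon^2}. \]
However, just as in the previous example, in learning theory we are primarily interested in controlling the deviations of empirical means from true means simultaneously over whole classes of functions. To that end, let $\mathcal{F}$ be such a class consisting of functions $f : \mathsf{X} \to [0, 1]$ and consider the uniform deviation
\[ g(X^n) \triangleq \sup_{f \in \mathcal{F}} |P_{X^n}(f) - P(f)|. \]
An argument entirely similar to the one in the previous example (exercise: verify this!) shows that this $g$ has bounded differences with $c_1 = \ldots = c_n = 1/n$. Therefore, applying McDiarmid's inequality, we obtain
\[ \mathbb{P}\big( |g(X^n) - \mathbb{E}g(X^n)| \ge \varepsilon \big) \le 2e^{-2n\varepsilon^2}. \]
We will see later that, for certain function classes $\mathcal{F}$, we will have $\mathbb{E}g(X^n) = O(1/\sqrt{n})$.

4.4. Kernel density estimation. For our final example, let $X^n = (X_1, \ldots, X_n)$ be an $n$-tuple of i.i.d. real-valued random variables whose common distribution $P$ has a probability density function (pdf) $f$, i.e.,
\[ P(A) = \int_A f(x)\, dx \]
for any measurable set $A \subseteq \mathbb{R}$. We wish to estimate $f$ from the sample $X^n$. A popular method is to use a kernel estimate (the book by Devroye and Lugosi [DL01] has plenty of material on density estimation, including kernel methods, from the viewpoint of statistical learning theory). To that end, we pick a nonnegative function $K : \mathbb{R} \to \mathbb{R}$ that integrates to one, $\int K(x)\, dx = 1$ (such a function is called a kernel), as well as a positive bandwidth (or smoothing constant) $h > 0$ and form the estimate
\[ \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^n K\Big( \frac{x - X_i}{h} \Big). \]
It is not hard to verify (another exercise!) that $\hat f_n$ is a valid pdf, i.e., that it is nonnegative and integrates to one. A common way of quantifying the performance of a density estimator is to use the $L_1$ distance to the true density $f$:
\[ \| \hat f_n - f \|_{L_1} = \int_{\mathbb{R}} |\hat f_n(x) - f(x)|\, dx. \]
Note that $\| \hat f_n - f \|_{L_1}$ is a random variable since it depends on the random sample $X^n$. Thus, we can write it as a function $g(X^n)$ of the sample $X^n$. Leaving aside the problem of actually bounding $\mathbb{E}g(X^n)$, we can easily establish a concentration bound for it using McDiarmid's inequality. To do that, we need to check that $g$ has bounded differences. Choosing $x^n$ and $x^n_{(i)}$ as before, we have
\[ \begin{aligned}
g(x^n) - g(x^n_{(i)}) &= \int_{\mathbb{R}} \Big| \frac{1}{nh} \sum_{j=1}^{i-1} K\Big( \frac{x - x_j}{h} \Big) + \frac{1}{nh} K\Big( \frac{x - x_i}{h} \Big) + \frac{1}{nh} \sum_{j=i+1}^{n} K\Big( \frac{x - x_j}{h} \Big) - f(x) \Big|\, dx \\
&\quad - \int_{\mathbb{R}} \Big| \frac{1}{nh} \sum_{j=1}^{i-1} K\Big( \frac{x - x_j}{h} \Big) + \frac{1}{nh} K\Big( \frac{x - x_i'}{h} \Big) + \frac{1}{nh} \sum_{j=i+1}^{n} K\Big( \frac{x - x_j}{h} \Big) - f(x) \Big|\, dx \\
&\le \frac{1}{nh} \int_{\mathbb{R}} \Big| K\Big( \frac{x - x_i}{h} \Big) - K\Big( \frac{x - x_i'}{h} \Big) \Big|\, dx \\
&\le \frac{2}{nh} \int_{\mathbb{R}} K\Big( \frac{x}{h} \Big)\, dx \\
&= \frac{2}{n}.
\end{aligned} \]
Thus, we see that $g(X^n)$ has the bounded differences property with $c_1 = \ldots = c_n = 2/n$, so that
\[ \mathbb{P}\big( |g(X^n) - \mathbb{E}g(X^n)| \ge \varepsilon \big) \le 2e^{-n\varepsilon^2/2}. \]

References

[BBL04] S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures in Machine Learning, pages Springer,
[Che52] H. Chernoff. A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23: ,
[DL01] L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer,
[Hoe63] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30,
[McD89] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages Cambridge University Press,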
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationThe random version of Dvoretzky s theorem in l n
The radom versio of Dvoretzky s theorem i l Gideo Schechtma Abstract We show that with high probability a sectio of the l ball of dimesio k cε log c > 0 a uiversal costat) is ε close to a multiple of the
More informationALLOCATING SAMPLE TO STRATA PROPORTIONAL TO AGGREGATE MEASURE OF SIZE WITH BOTH UPPER AND LOWER BOUNDS ON THE NUMBER OF UNITS IN EACH STRATUM
ALLOCATING SAPLE TO STRATA PROPORTIONAL TO AGGREGATE EASURE OF SIZE WIT BOT UPPER AND LOWER BOUNDS ON TE NUBER OF UNITS IN EAC STRATU Lawrece R. Erst ad Cristoper J. Guciardo Erst_L@bls.gov, Guciardo_C@bls.gov
More informationMaximum Likelihood Estimation and Complexity Regularization
ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationNotes 5 : More on the a.s. convergence of sums
Notes 5 : More o the a.s. covergece of sums Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: Dur0, Sectios.5; Wil9, Sectio 4.7, Shi96, Sectio IV.4, Dur0, Sectio.. Radom series. Three-series
More informationElements of Statistical Methods Lots of Data or Large Samples (Ch 8)
Elemets of Statistical Methods Lots of Data or Large Samples (Ch 8) Fritz Scholz Sprig Quarter 2010 February 26, 2010 x ad X We itroduced the sample mea x as the average of the observed sample values x
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More information1 = δ2 (0, ), Y Y n nδ. , T n = Y Y n n. ( U n,k + X ) ( f U n,k + Y ) n 2n f U n,k + θ Y ) 2 E X1 2 X1
8. The cetral limit theorems 8.1. The cetral limit theorem for i.i.d. sequeces. ecall that C ( is N -separatig. Theorem 8.1. Let X 1, X,... be i.i.d. radom variables with EX 1 = ad EX 1 = σ (,. Suppose
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationMathematics 170B Selected HW Solutions.
Mathematics 17B Selected HW Solutios. F 4. Suppose X is B(,p). (a)fidthemometgeeratigfuctiom (s)of(x p)/ p(1 p). Write q = 1 p. The MGF of X is (pe s + q), sice X ca be writte as the sum of idepedet Beroulli
More informationLecture 8: Convergence of transformations and law of large numbers
Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges
More informationHOMEWORK I: PREREQUISITES FROM MATH 727
HOMEWORK I: PREREQUISITES FROM MATH 727 Questio. Let X, X 2,... be idepedet expoetial radom variables with mea µ. (a) Show that for Z +, we have EX µ!. (b) Show that almost surely, X + + X (c) Fid the
More informationAsymptotic distribution of products of sums of independent random variables
Proc. Idia Acad. Sci. Math. Sci. Vol. 3, No., May 03, pp. 83 9. c Idia Academy of Scieces Asymptotic distributio of products of sums of idepedet radom variables YANLING WANG, SUXIA YAO ad HONGXIA DU ollege
More informationLecture 2: Poisson Sta*s*cs Probability Density Func*ons Expecta*on and Variance Es*mators
Lecture 2: Poisso Sta*s*cs Probability Desity Fuc*os Expecta*o ad Variace Es*mators Biomial Distribu*o: P (k successes i attempts) =! k!( k)! p k s( p s ) k prob of each success Poisso Distributio Note
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19
CS 70 Discrete Mathematics ad Probability Theory Sprig 2016 Rao ad Walrad Note 19 Some Importat Distributios Recall our basic probabilistic experimet of tossig a biased coi times. This is a very simple
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More information32 estimating the cumulative distribution function
32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio
More informationStatistical Theory; Why is the Gaussian Distribution so popular?
Statistical Theory; Why is the Gaussia Distributio so popular? Rob Nicholls MRC LMB Statistics Course 2014 Cotets Cotiuous Radom Variables Expectatio ad Variace Momets The Law of Large Numbers (LLN) The
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationn outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationMath 152. Rumbos Fall Solutions to Review Problems for Exam #2. Number of Heads Frequency
Math 152. Rumbos Fall 2009 1 Solutios to Review Problems for Exam #2 1. I the book Experimetatio ad Measuremet, by W. J. Youde ad published by the by the Natioal Sciece Teachers Associatio i 1962, the
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationLecture 7 Testing Nonlinear Inequality Restrictions 1
Eco 75 Lecture 7 Testig Noliear Iequality Restrictios I Lecture 6, we discussed te testig problems were te ull ypotesis is de ed by oliear equality restrictios: H : ( ) = versus H : ( ) 6= : () We sowed
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationLecture 2: April 3, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationEstimation of the Mean and the ACVF
Chapter 5 Estimatio of the Mea ad the ACVF A statioary process {X t } is characterized by its mea ad its autocovariace fuctio γ ), ad so by the autocorrelatio fuctio ρ ) I this chapter we preset the estimators
More information