CONCENTRATION INEQUALITIES

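MAXIM RAGINSKY

Date: January 24, 2011.

In the previous lecture, the following result was stated without proof. If $X_1, \ldots, X_n$ are independent Bernoulli($\theta$) random variables representing the outcomes of a sequence of $n$ tosses of a coin with bias (probability of heads) $\theta$, then for any $\varepsilon \in (0, 1)$

$$\mathbb{P}\left( |\hat\theta_n - \theta| \ge \varepsilon \right) \le 2e^{-2n\varepsilon^2} \tag{1}$$

where

$$\hat\theta_n = \frac{1}{n} \sum_{i=1}^n X_i$$

is the fraction of heads in $X^n = (X_1, \ldots, X_n)$. Since $\theta = \mathbb{E}\hat\theta_n$, (1) says that the sample (or empirical) average of the $X_i$'s concentrates sharply around the statistical average $\theta = \mathbb{E}X_1$. Bounds like these are fundamental in statistical learning theory. In the next few lectures, we will learn the techniques needed to derive such bounds for settings much more complicated than coin tossing. This is not meant to be a complete picture; more details and additional results can be found in the excellent survey by Boucheron et al. [BBL04].

1. The basic tools

We start with Markov's inequality: Let $X \in \mathbb{R}$ be a nonnegative random variable. Then for any $t > 0$ we have

$$\mathbb{P}(X \ge t) \le \frac{\mathbb{E}X}{t}. \tag{2}$$

The proof is simple:

$$\begin{aligned}
\mathbb{P}(X \ge t) &= \mathbb{E}\left[ 1_{\{X \ge t\}} \right] && (3) \\
&\le \frac{\mathbb{E}\left[ X 1_{\{X \ge t\}} \right]}{t} && (4) \\
&\le \frac{\mathbb{E}X}{t}, && (5)
\end{aligned}$$

where:

- (3) uses the fact that the probability of an event can be expressed as the expectation of its indicator function:
  $$\mathbb{P}(X \in A) = \int_A P_X(dx) = \int 1_{\{x \in A\}} P_X(dx) = \mathbb{E}\left[ 1_{\{X \in A\}} \right]$$
- (4) uses the fact that
  $$X \ge t > 0 \implies \frac{X}{t} \ge 1$$
- (5) uses the fact that
  $$X \ge 0 \implies X 1_{\{X \ge t\}} \le X,$$
  so consequently $\mathbb{E}\left[ X 1_{\{X \ge t\}} \right] \le \mathbb{E}X$.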
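As a quick numerical sanity check of Markov's inequality (2), the following minimal sketch draws samples of a nonnegative random variable and compares the empirical tail frequency with the bound $\mathbb{E}X / t$. The exponential distribution with mean 1, the sample size, and the function name `markov_check` are illustrative assumptions, not part of the lecture.

```python
import random


def markov_check(t, num_samples=100_000, seed=0):
    """Return (empirical P(X >= t), empirical mean / t) for X ~ Exp(1).

    The Exp(1) distribution is only an illustrative choice of a
    nonnegative random variable; any nonnegative samples would do.
    """
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(num_samples)]
    tail = sum(1 for x in xs if x >= t) / num_samples
    mean = sum(xs) / num_samples  # empirical stand-in for EX
    return tail, mean / t
```

Because every sample is nonnegative, the first returned value can never exceed the second: this is exactly steps (3)-(5) applied to the empirical distribution of the samples.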

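Markov's inequality leads to our first bound on the probability that a random variable deviates from its expectation by more than a given amount: Chebyshev's inequality. Let $X$ be an arbitrary real random variable. Then for any $t > 0$

$$\mathbb{P}\left( |X - \mathbb{E}X| \ge t \right) \le \frac{\operatorname{Var} X}{t^2}, \tag{6}$$

where $\operatorname{Var} X \triangleq \mathbb{E}\left[ |X - \mathbb{E}X|^2 \right] = \mathbb{E}X^2 - (\mathbb{E}X)^2$ is the variance of $X$. To prove (6), we apply Markov's inequality (2) to the nonnegative random variable $|X - \mathbb{E}X|^2$:

$$\begin{aligned}
\mathbb{P}\left( |X - \mathbb{E}X| \ge t \right) &= \mathbb{P}\left( |X - \mathbb{E}X|^2 \ge t^2 \right) && (7) \\
&\le \frac{\mathbb{E}|X - \mathbb{E}X|^2}{t^2}, && (8)
\end{aligned}$$

where the first step uses the fact that the function $\varphi(x) = x^2$ is monotonically increasing on $[0, \infty)$, so that, for $a, b \ge 0$, $a \ge b$ if and only if $a^2 \ge b^2$.

Now let's apply these tools to the problem of bounding the probability that, for a coin with bias $\theta$, the fraction of heads in $n$ trials differs from $\theta$ by more than some $\varepsilon > 0$. To that end, let us represent the outcomes of the $n$ tosses by $n$ independent Bernoulli($\theta$) random variables $X_1, \ldots, X_n \in \{0, 1\}$, where $\mathbb{P}(X_i = 1) = \theta$ for all $i$. Let $\hat\theta_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then

$$\mathbb{E}\hat\theta_n = \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n X_i \right] = \frac{1}{n} \sum_{i=1}^n \underbrace{\mathbb{E}X_i}_{= \mathbb{P}(X_i = 1)} = \theta$$

and

$$\operatorname{Var} \hat\theta_n = \operatorname{Var}\left[ \frac{1}{n} \sum_{i=1}^n X_i \right] = \frac{1}{n^2} \sum_{i=1}^n \operatorname{Var} X_i = \frac{\theta(1 - \theta)}{n},$$

where we have used the fact that the $X_i$'s are i.i.d., so $\operatorname{Var}(X_1 + \ldots + X_n) = \sum_{i=1}^n \operatorname{Var} X_i = n \operatorname{Var} X_1$. Now we are in a position to apply Chebyshev's inequality:

$$\mathbb{P}\left( |\hat\theta_n - \theta| \ge \varepsilon \right) \le \frac{\operatorname{Var} \hat\theta_n}{\varepsilon^2} = \frac{\theta(1 - \theta)}{n\varepsilon^2}. \tag{9}$$

At the very least, (9) shows that the probability of getting a "bad" sample decreases with sample size. Unfortunately, it does not decrease fast enough. To see why, we can appeal to the Central Limit Theorem, which roughly states that

$$\mathbb{P}\left( \hat\theta_n - \theta \ge t \sqrt{\frac{\theta(1 - \theta)}{n}} \right) \xrightarrow{n \to \infty} 1 - \Phi(t) \le \frac{1}{\sqrt{2\pi}} \frac{e^{-t^2/2}}{t},$$

where $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^t e^{-x^2/2} dx$ is the standard Gaussian CDF. This would suggest something like

$$\mathbb{P}\left( \hat\theta_n - \theta \ge \varepsilon \right) \approx \exp\left( -\frac{n\varepsilon^2}{2\theta(1 - \theta)} \right),$$

which decays with $n$ much faster than the right-hand side of (9).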
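To get a feel for the gap between polynomial and exponential decay, the following small sketch evaluates the right-hand side of the Chebyshev bound (9) next to the right-hand side of the exponential bound (1) stated at the start of the lecture. The function names and the particular values of $n$, $\varepsilon$ and $\theta$ mentioned afterwards are illustrative assumptions.

```python
import math


def chebyshev_bound(n, eps, theta):
    """Right-hand side of (9): theta * (1 - theta) / (n * eps^2)."""
    return theta * (1.0 - theta) / (n * eps ** 2)


def exponential_bound(n, eps):
    """Right-hand side of (1): 2 * exp(-2 * n * eps^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)
```

For $\theta = 1/2$ and $\varepsilon = 0.05$, both bounds are vacuous (at least 1) at $n = 100$, but at $n = 1000$ the Chebyshev bound equals $0.1$ while the exponential bound is about $0.0135$, and the gap widens rapidly as $n$ grows.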

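2. The Chernoff bounding trick and Hoeffding's inequality

To fix (9), we will use a very powerful technique, known as the Chernoff bounding trick [Che52]. Let $X$ be a nonnegative random variable. Suppose we are interested in bounding the probability $\mathbb{P}(X \ge t)$ for some particular $t > 0$. Observe that for any $s > 0$ we have

$$\mathbb{P}(X \ge t) = \mathbb{P}\left( e^{sX} \ge e^{st} \right) \le e^{-st} \mathbb{E}\left[ e^{sX} \right], \tag{10}$$

where the first step is by monotonicity of the function $\varphi(x) = e^{sx}$ and the second step is by Markov's inequality (2). Note that (10) does not actually require $X \ge 0$, since $e^{sX}$ is nonnegative for any real-valued $X$; this is how the trick is used below. The Chernoff trick is to choose an $s > 0$ that would make the right-hand side of (10) suitably small. In fact, since (10) holds simultaneously for all $s > 0$, the optimal thing to do is to take

$$\mathbb{P}(X \ge t) \le \inf_{s > 0} e^{-st} \mathbb{E}\left[ e^{sX} \right].$$

However, often a good upper bound on the moment-generating function $\mathbb{E}\left[ e^{sX} \right]$ is enough. One such bound was developed by Hoeffding [Hoe63] for the case when $X$ is bounded with probability one:

Lemma 1 (Hoeffding). Let $X$ be a random variable with $\mathbb{E}X = 0$ and $\mathbb{P}(a \le X \le b) = 1$ for some $-\infty < a \le b < \infty$. Then for all $s > 0$

$$\mathbb{E}\left[ e^{sX} \right] \le e^{s^2 (b - a)^2 / 8}. \tag{11}$$

Proof. The proof uses elementary calculus and convexity. First we note that the function $\varphi(x) = e^{sx}$ is convex on $\mathbb{R}$. Any $x \in [a, b]$ can be written as

$$x = \frac{x - a}{b - a} b + \frac{b - x}{b - a} a.$$

Hence

$$e^{sx} \le \frac{x - a}{b - a} e^{sb} + \frac{b - x}{b - a} e^{sa}.$$

Since $\mathbb{E}X = 0$, we have

$$\mathbb{E}\left[ e^{sX} \right] \le \frac{b}{b - a} e^{sa} - \frac{a}{b - a} e^{sb} = \left( \frac{b}{b - a} - \frac{a}{b - a} e^{s(b - a)} \right) e^{sa}.$$

We have $s(b - a)$ in the exponent in the parentheses. To get the same thing in the $e^{sa}$ term multiplying the parentheses, we (with a bit of foresight) seek $\lambda$ such that $sa = -\lambda s(b - a)$, which gives us $\lambda = -a/(b - a)$. Then

$$\left( \frac{b}{b - a} - \frac{a}{b - a} e^{s(b - a)} \right) e^{sa} = \left( 1 - \lambda + \lambda e^{s(b - a)} \right) e^{-\lambda s(b - a)}.$$

Now let $u = s(b - a)$, so we can write

$$\mathbb{E}\left[ e^{sX} \right] \le \left( 1 - \lambda + \lambda e^u \right) e^{-\lambda u}. \tag{12}$$

Again with a bit of foresight, let us express the right-hand side of (12) as an exponential of a function of $u$:

$$\left( 1 - \lambda + \lambda e^u \right) e^{-\lambda u} = e^{\phi(u)}, \qquad \text{where } \phi(u) = \log\left( 1 - \lambda + \lambda e^u \right) - \lambda u.$$

Now the whole affair hinges on us being able to show that $\phi(u) \le u^2/8$ for any $u \ge 0$. To that end, we first note that $\phi(0) = \phi'(0) = 0$, and that $\phi''(u) \le 1/4$ for all $u \ge 0$. (Indeed, $\phi''(u) = p(1 - p)$ with $p = \lambda e^u / (1 - \lambda + \lambda e^u)$, and $p \in [0, 1]$ because $\mathbb{E}X = 0$ forces $a \le 0 \le b$ and hence $\lambda \in [0, 1]$.) Therefore, by Taylor's theorem we have

$$\phi(u) = \phi(0) + \phi'(0) u + \frac{1}{2} \phi''(\alpha) u^2$$

for some $\alpha \in [0, u]$, and we can upper-bound the right-hand side of the above expression by $u^2/8$. Thus,

$$\mathbb{E}\left[ e^{sX} \right] \le e^{\phi(u)} \le e^{u^2/8} = e^{s^2 (b - a)^2 / 8},$$

which gives us (11).

We will now use the Chernoff method and the above lemma to prove the following

Theorem 1 (Hoeffding's inequality). Let $X_1, \ldots, X_n$ be independent random variables, such that $X_i \in [a_i, b_i]$ with probability one. Let $S_n \triangleq \sum_{i=1}^n X_i$. Then for any $t > 0$

$$\mathbb{P}\left( S_n - \mathbb{E}S_n \ge t \right) \le \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right); \tag{13}$$

$$\mathbb{P}\left( S_n - \mathbb{E}S_n \le -t \right) \le \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). \tag{14}$$

Consequently,

$$\mathbb{P}\left( |S_n - \mathbb{E}S_n| \ge t \right) \le 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). \tag{15}$$

Proof. By replacing each $X_i$ with $X_i - \mathbb{E}X_i$, we may as well assume that $\mathbb{E}X_i = 0$ (this shifts each interval $[a_i, b_i]$ without changing its length $b_i - a_i$). Then $S_n = \sum_{i=1}^n X_i$ has zero mean. Using Chernoff's trick, we write

$$\mathbb{P}(S_n \ge t) = \mathbb{P}\left( e^{sS_n} \ge e^{st} \right) \le e^{-st} \mathbb{E}\left[ e^{sS_n} \right]. \tag{16}$$

Since the $X_i$'s are independent,

$$\mathbb{E}\left[ e^{sS_n} \right] = \mathbb{E}\left[ e^{s(X_1 + \ldots + X_n)} \right] = \mathbb{E}\left[ \prod_{i=1}^n e^{sX_i} \right] = \prod_{i=1}^n \mathbb{E}\left[ e^{sX_i} \right]. \tag{17}$$

Since $X_i \in [a_i, b_i]$, we can apply Lemma 1 to write $\mathbb{E}\left[ e^{sX_i} \right] \le e^{s^2 (b_i - a_i)^2 / 8}$. Substituting this into (17) and (16), we obtain

$$\mathbb{P}(S_n \ge t) \le e^{-st} \prod_{i=1}^n e^{s^2 (b_i - a_i)^2 / 8} = \exp\left( -st + \frac{s^2}{8} \sum_{i=1}^n (b_i - a_i)^2 \right).$$

If we choose $s = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}$, then we obtain (13). The proof of (14) is similar.

Now we will apply Hoeffding's inequality to improve our crude concentration bound (9) for the sum of $n$ independent Bernoulli($\theta$) random variables, $X_1, \ldots, X_n$. Since each $X_i \in \{0, 1\}$, we can apply Theorem 1 to get, for any $t > 0$,

$$\mathbb{P}\left( \left| \sum_{i=1}^n X_i - n\theta \right| \ge t \right) \le 2e^{-2t^2/n}.$$

Therefore,

$$\mathbb{P}\left( |\hat\theta_n - \theta| \ge \varepsilon \right) = \mathbb{P}\left( \left| \sum_{i=1}^n X_i - n\theta \right| \ge n\varepsilon \right) \le 2e^{-2n\varepsilon^2},$$

which gives us the claimed bound (1).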
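The coin-tossing bound just derived can be checked by simulation. The following minimal sketch estimates $\mathbb{P}(|\hat\theta_n - \theta| \ge \varepsilon)$ by Monte Carlo and returns it together with $2e^{-2n\varepsilon^2}$. The function name `hoeffding_check`, the number of trials, and the small floating-point tolerance are assumptions made for illustration only.

```python
import math
import random


def hoeffding_check(n, theta, eps, trials=20_000, seed=0):
    """Monte Carlo estimate of P(|theta_hat - theta| >= eps) for n coin tosses.

    Returns (estimated probability, Hoeffding bound 2 * exp(-2 * n * eps^2)).
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        heads = sum(1 for _ in range(n) if rng.random() < theta)
        # The tiny tolerance guards against round-off when the deviation
        # equals eps exactly (e.g. 0.6 - 0.5 is slightly below 0.1 in floats).
        if abs(heads / n - theta) >= eps - 1e-12:
            bad += 1
    return bad / trials, 2.0 * math.exp(-2.0 * n * eps ** 2)
```

In runs of this kind the estimated frequency should sit well below the bound: Hoeffding's inequality holds for every $\theta$, so it cannot be tight for any particular one.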

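3. From bounded variables to bounded differences: McDiarmid's inequality

Hoeffding's inequality applies to sums of independent random variables. We will now develop its generalization, due to McDiarmid [McD89], to arbitrary real-valued functions of independent random variables that satisfy a certain condition. Let $\mathsf{X}$ be some set, and consider a function $g : \mathsf{X}^n \to \mathbb{R}$. We say that $g$ has bounded differences if there exist nonnegative numbers $c_1, \ldots, c_n$, such that

$$\sup_{x_1, \ldots, x_n, x_i' \in \mathsf{X}} \left| g(x_1, \ldots, x_n) - g(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \le c_i \tag{18}$$

for all $i = 1, \ldots, n$. In words, if we change the $i$th variable while keeping all the others fixed, the value of $g$ will not change by more than $c_i$.

Theorem 2 (McDiarmid's inequality [McD89]). Let $X^n = (X_1, \ldots, X_n) \in \mathsf{X}^n$ be an $n$-tuple of independent $\mathsf{X}$-valued random variables. If a function $g : \mathsf{X}^n \to \mathbb{R}$ has bounded differences, as in (18), then, for all $t > 0$,

$$\mathbb{P}\left( g(X^n) - \mathbb{E}g(X^n) \ge t \right) \le \exp\left( -\frac{2t^2}{\sum_{i=1}^n c_i^2} \right); \tag{19}$$

$$\mathbb{P}\left( \mathbb{E}g(X^n) - g(X^n) \ge t \right) \le \exp\left( -\frac{2t^2}{\sum_{i=1}^n c_i^2} \right). \tag{20}$$

Proof. Let me first sketch the general idea behind the proof. Let $Z = g(X^n)$ and $V = Z - \mathbb{E}Z = g(X^n) - \mathbb{E}g(X^n)$. The first step will be to write $V$ as a sum $\sum_{i=1}^n V_i$, where the terms $V_i$ are constructed so that:

1. $V_i$ is a function only of $X^i = (X_1, \ldots, X_i)$.
2. There exists a function $\Psi_i : \mathsf{X}^{i-1} \to \mathbb{R}$ such that, conditionally on $X^{i-1}$, $\Psi_i(X^{i-1}) \le V_i \le \Psi_i(X^{i-1}) + c_i$.

Provided we can arrange things in this way (and provided $\mathbb{E}[V_i | X^{i-1}] = 0$, since Lemma 1 requires a zero-mean variable), we can apply Lemma 1 to $V_i$ conditionally on $X^{i-1}$:

$$\mathbb{E}\left[ e^{sV_i} \middle| X^{i-1} \right] \le e^{s^2 c_i^2 / 8}. \tag{21}$$

Then, using Chernoff's method, we have

$$\begin{aligned}
\mathbb{P}(Z - \mathbb{E}Z \ge t) &= \mathbb{P}(V \ge t) \\
&\le e^{-st} \mathbb{E}\left[ e^{sV} \right] \\
&= e^{-st} \mathbb{E}\left[ e^{s \sum_{i=1}^n V_i} \right] \\
&= e^{-st} \mathbb{E}\left[ e^{s \sum_{i=1}^{n-1} V_i} e^{sV_n} \right] \\
&= e^{-st} \mathbb{E}\left[ e^{s \sum_{i=1}^{n-1} V_i} \mathbb{E}\left[ e^{sV_n} \middle| X^{n-1} \right] \right] \\
&\le e^{-st} e^{s^2 c_n^2 / 8} \mathbb{E}\left[ e^{s \sum_{i=1}^{n-1} V_i} \right],
\end{aligned}$$

where in the next-to-last step we used the fact that $V_1, \ldots, V_{n-1}$ depend only on $X^{n-1}$, and in the last step we used (21) with $i = n$. If we continue peeling off the terms involving $V_{n-1}, V_{n-2}, \ldots, V_1$, we will get

$$\mathbb{P}(Z - \mathbb{E}Z \ge t) \le \exp\left( -st + \frac{s^2}{8} \sum_{i=1}^n c_i^2 \right).$$

Taking $s = 4t / \sum_{i=1}^n c_i^2$, we end up with (19).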

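It remains to construct the $V_i$'s with the desired properties. To that end, let

$$H_i(X^i) = \mathbb{E}[Z | X^i] \qquad \text{and} \qquad V_i = H_i(X^i) - H_{i-1}(X^{i-1}),$$

with the convention $H_0 \triangleq \mathbb{E}Z$. Then

$$\sum_{i=1}^n V_i = \sum_{i=1}^n \left\{ \mathbb{E}[Z | X^i] - \mathbb{E}[Z | X^{i-1}] \right\} = \mathbb{E}[Z | X^n] - \mathbb{E}Z = Z - \mathbb{E}Z = V.$$

Note that $V_i$ depends only on $X^i$ by construction, and that $\mathbb{E}[V_i | X^{i-1}] = 0$ by the tower property of conditional expectation. Moreover, let

$$\Psi_i(X^{i-1}) = \inf_{x \in \mathsf{X}} \left\{ H_i(X^{i-1}, x) - H_{i-1}(X^{i-1}) \right\}$$

$$\Psi_i'(X^{i-1}) = \sup_{x \in \mathsf{X}} \left\{ H_i(X^{i-1}, x) - H_{i-1}(X^{i-1}) \right\},$$

where, owing to the fact that the $X_i$'s are independent, we have

$$H_i(X^{i-1}, x) = \mathbb{E}[Z | X^{i-1}, X_i = x] = \int g(X^{i-1}, x, x_{i+1}^n) P_{X_{i+1}^n}(dx_{i+1}^n)$$

($x_{i+1}^n$ denoting the tuple $(x_{i+1}, \ldots, x_n)$). Then

$$\begin{aligned}
\Psi_i'(X^{i-1}) - \Psi_i(X^{i-1}) &= \sup_{x \in \mathsf{X}} \left\{ H_i(X^{i-1}, x) - H_{i-1}(X^{i-1}) \right\} - \inf_{x \in \mathsf{X}} \left\{ H_i(X^{i-1}, x) - H_{i-1}(X^{i-1}) \right\} \\
&= \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \left\{ H_i(X^{i-1}, x) - H_i(X^{i-1}, x') \right\} \\
&= \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \left\{ \mathbb{E}[Z | X^{i-1}, X_i = x] - \mathbb{E}[Z | X^{i-1}, X_i = x'] \right\} \\
&= \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \int \left[ g(X^{i-1}, x, x_{i+1}^n) - g(X^{i-1}, x', x_{i+1}^n) \right] P_{X_{i+1}^n}(dx_{i+1}^n) \\
&\le \sup_{x \in \mathsf{X}} \sup_{x' \in \mathsf{X}} \int \left| g(X^{i-1}, x, x_{i+1}^n) - g(X^{i-1}, x', x_{i+1}^n) \right| P_{X_{i+1}^n}(dx_{i+1}^n) \\
&\le c_i,
\end{aligned}$$

where the last step follows from the bounded difference property. Thus, we can write $\Psi_i'(X^{i-1}) \le \Psi_i(X^{i-1}) + c_i$, which implies that, indeed,

$$\Psi_i(X^{i-1}) \le V_i \le \Psi_i(X^{i-1}) + c_i$$

conditionally on $X^{i-1}$.

4. McDiarmid's inequality in action

McDiarmid's inequality is an extremely powerful and often used tool in statistical learning theory. We will now discuss several examples of its use. To that end, we will first introduce some notation and definitions. Let $\mathsf{X}$ be some (measurable) space. If $Q$ is a probability distribution of an $\mathsf{X}$-valued random variable $X$, then we can compute the expectation of any (measurable) function $f : \mathsf{X} \to \mathbb{R}$ w.r.t. $Q$. So far, we have denoted this expectation by $\mathbb{E}f(X)$ or by $\mathbb{E}_Q f(X)$. We will often find it convenient to use an alternative notation, $Q(f)$. Let $X^n = (X_1, \ldots, X_n)$ be $n$ independent identically distributed (i.i.d.) $\mathsf{X}$-valued random variables with common distribution $P$. The main object of interest to us is the empirical distribution induced by $X^n$, which we will denote by $P_{X^n}$. The empirical distribution assigns the probability $1/n$ to each $X_i$, i.e.,

$$P_{X^n} = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}.$$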

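Here, $\delta_x$ denotes a unit mass concentrated at a point $x \in \mathsf{X}$, i.e., the probability distribution on $\mathsf{X}$ defined by

$$\delta_x(A) = 1_{\{x \in A\}}, \qquad \forall \text{ measurable } A \subseteq \mathsf{X}.$$

We note the following important facts about $P_{X^n}$:

1. Being a function of the sample $X^n$, $P_{X^n}$ is a random variable taking values in the space of probability distributions over $\mathsf{X}$.
2. The probability of a set $A \subseteq \mathsf{X}$ under $P_{X^n}$,
   $$P_{X^n}(A) = \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \in A\}},$$
   is the empirical frequency of the set $A$ on the sample $X^n$. The expectation of $P_{X^n}(A)$ is equal to $P(A)$, the $P$-probability of $A$. Indeed,
   $$\mathbb{E} P_{X^n}(A) = \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \in A\}} \right] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}\left[ 1_{\{X_i \in A\}} \right] = \frac{1}{n} \sum_{i=1}^n \mathbb{P}(X_i \in A) = P(A).$$
3. Given a function $f : \mathsf{X} \to \mathbb{R}$, we can compute its expectation w.r.t. $P_{X^n}$:
   $$P_{X^n}(f) = \frac{1}{n} \sum_{i=1}^n f(X_i),$$
   which is just the sample mean of $f$ on $X^n$. It is also referred to as the empirical expectation of $f$ on $X^n$. We have
   $$\mathbb{E} P_{X^n}(f) = \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n f(X_i) \right] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}f(X_i) = \mathbb{E}f(X) = P(f).$$

We can now proceed to our examples.

4.1. Sums of bounded random variables. In the special case when $\mathsf{X} = \mathbb{R}$, $P$ is a probability distribution supported on a finite interval $[a, b]$, and $g(X^n)$ is the sum

$$g(X^n) = \sum_{i=1}^n X_i,$$

McDiarmid's inequality simply reduces to Hoeffding's. Indeed, for any $x^n \in [a, b]^n$ and $x_i' \in [a, b]$ we have

$$g(x^{i-1}, x_i, x_{i+1}^n) - g(x^{i-1}, x_i', x_{i+1}^n) = x_i - x_i' \le b - a.$$

Interchanging the roles of $x_i$ and $x_i'$, we get

$$g(x^{i-1}, x_i', x_{i+1}^n) - g(x^{i-1}, x_i, x_{i+1}^n) = x_i' - x_i \le b - a.$$

Hence, we may apply Theorem 2 with $c_i = b - a$ for all $i$ to get

$$\mathbb{P}\left( |g(X^n) - \mathbb{E}g(X^n)| \ge t \right) \le 2 \exp\left( -\frac{2t^2}{n(b - a)^2} \right).$$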
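The notation above translates almost literally into code. The following minimal sketch computes the empirical probability $P_{X^n}(A)$ and the empirical expectation $P_{X^n}(f)$ of a finite sample, and evaluates the two-sided McDiarmid bound for given constants $c_1, \ldots, c_n$, so that the reduction to Hoeffding's inequality with $c_i = b - a$ can be checked numerically. The function names, and the representation of a set $A$ by its indicator function, are illustrative assumptions.

```python
import math


def empirical_prob(sample, in_set):
    """P_{X^n}(A): fraction of sample points x with in_set(x) true."""
    return sum(1 for x in sample if in_set(x)) / len(sample)


def empirical_mean(sample, f):
    """P_{X^n}(f): sample mean of f over the sample."""
    return sum(f(x) for x in sample) / len(sample)


def mcdiarmid_bound(t, c):
    """Two-sided bound 2 * exp(-2 t^2 / sum_i c_i^2) implied by Theorem 2."""
    return 2.0 * math.exp(-2.0 * t ** 2 / sum(ci ** 2 for ci in c))
```

For example, `mcdiarmid_bound(t, [b - a] * n)` agrees with the Hoeffding expression $2\exp(-2t^2 / (n(b - a)^2))$ up to floating-point round-off.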

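4.2. Uniform deviations. Let $X_1, \ldots, X_n$ be $n$ i.i.d. $\mathsf{X}$-valued random variables with common distribution $P$. By the Law of Large Numbers, for any $A \subseteq \mathsf{X}$ and any $\varepsilon > 0$

$$\lim_{n \to \infty} \mathbb{P}\left( |P_{X^n}(A) - P(A)| \ge \varepsilon \right) = 0.$$

In fact, we can use Hoeffding's inequality to show that

$$\mathbb{P}\left( |P_{X^n}(A) - P(A)| \ge \varepsilon \right) \le 2e^{-2n\varepsilon^2}.$$

This probability bound holds for each $A$ separately. However, in learning theory we are often interested in the deviation of empirical frequencies from true probabilities simultaneously over some collection of the subsets of $\mathsf{X}$. To that end, let $\mathcal{A}$ be such a collection and consider the function

$$g(X^n) \triangleq \sup_{A \in \mathcal{A}} |P_{X^n}(A) - P(A)|. \tag{22}$$

Later in the course we will see that, for certain choices of $\mathcal{A}$, $\mathbb{E}g(X^n) = O(1/\sqrt{n})$. However, regardless of what $\mathcal{A}$ is, it is easy to see that, by changing only one $X_i$, the value of $g(X^n)$ can change at most by $1/n$. Let $x^n = (x_1, \ldots, x_n)$, choose some other $x_i' \in \mathsf{X}$, and let $x^n_{(i)}$ denote $x^n$ with $x_i$ replaced by $x_i'$:

$$x^n = (x^{i-1}, x_i, x_{i+1}^n), \qquad x^n_{(i)} = (x^{i-1}, x_i', x_{i+1}^n).$$

Then

$$\begin{aligned}
g(x^n) - g(x^n_{(i)}) &= \sup_{A \in \mathcal{A}} |P_{x^n}(A) - P(A)| - \sup_{A' \in \mathcal{A}} |P_{x^n_{(i)}}(A') - P(A')| \\
&= \sup_{A \in \mathcal{A}} \inf_{A' \in \mathcal{A}} \left\{ |P_{x^n}(A) - P(A)| - |P_{x^n_{(i)}}(A') - P(A')| \right\} \\
&\le \sup_{A \in \mathcal{A}} \left\{ |P_{x^n}(A) - P(A)| - |P_{x^n_{(i)}}(A) - P(A)| \right\} \\
&\le \sup_{A \in \mathcal{A}} |P_{x^n}(A) - P_{x^n_{(i)}}(A)| \\
&= \frac{1}{n} \sup_{A \in \mathcal{A}} \left| 1_{\{x_i \in A\}} - 1_{\{x_i' \in A\}} \right| \\
&\le \frac{1}{n}.
\end{aligned}$$

Interchanging the roles of $x^n$ and $x^n_{(i)}$, we obtain

$$g(x^n_{(i)}) - g(x^n) \le \frac{1}{n}.$$

Thus, $|g(x^n) - g(x^n_{(i)})| \le 1/n$. Note that this bound holds for all $i$ and all choices of $x^n$ and $x^n_{(i)}$. This means that the function $g$ defined in (22) has bounded differences with $c_1 = \ldots = c_n = 1/n$. Consequently, we can use Theorem 2 to get

$$\mathbb{P}\left( |g(X^n) - \mathbb{E}g(X^n)| \ge \varepsilon \right) \le 2e^{-2n\varepsilon^2}.$$

This shows that the uniform deviation $g(X^n)$ concentrates sharply around its mean $\mathbb{E}g(X^n)$.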
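The bounded-difference property of the uniform deviation (22) can be observed numerically in one concrete case. The sketch below assumes, purely for illustration, that $\mathsf{X} = [0, 1]$, that $P$ is the uniform distribution, and that $\mathcal{A}$ is the class of half-lines $(-\infty, a]$, so that $g(x^n) = \sup_a |F_n(a) - a|$ is the familiar Kolmogorov-Smirnov statistic. The function names are hypothetical.

```python
import random


def ks_uniform(sample):
    """sup over a of |F_n(a) - a| for sample points in [0, 1].

    F_n is the empirical CDF; the supremum is attained at, or just to the
    left of, one of the order statistics.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def max_change_one_coordinate(n, trials=1000, seed=0):
    """Largest observed |g(x^n) - g(x^n_(i))| when one coordinate is resampled."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        y = list(x)
        y[rng.randrange(n)] = rng.random()  # replace a single coordinate
        worst = max(worst, abs(ks_uniform(x) - ks_uniform(y)))
    return worst
```

Whatever the seed, the value returned by `max_change_one_coordinate(n)` should not exceed $1/n$ (up to round-off), in agreement with $c_1 = \ldots = c_n = 1/n$.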

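4.3. Uniform deviations continued. The same idea applies to arbitrary real-valued functions over $\mathsf{X}$. Let $X^n = (X_1, \ldots, X_n)$ be as in the previous example. Given any function $f : \mathsf{X} \to [0, 1]$, Hoeffding's inequality tells us that

$$\mathbb{P}\left( |P_{X^n}(f) - \mathbb{E}f(X)| \ge \varepsilon \right) \le 2e^{-2n\varepsilon^2}.$$

However, just as in the previous example, in learning theory we are primarily interested in controlling the deviations of empirical means from true means simultaneously over whole classes of functions. To that end, let $\mathcal{F}$ be such a class consisting of functions $f : \mathsf{X} \to [0, 1]$ and consider the uniform deviation

$$g(X^n) \triangleq \sup_{f \in \mathcal{F}} |P_{X^n}(f) - P(f)|.$$

An argument entirely similar to the one in the previous example (exercise: verify this!) shows that this $g$ has bounded differences with $c_1 = \ldots = c_n = 1/n$. Therefore, applying McDiarmid's inequality, we obtain

$$\mathbb{P}\left( |g(X^n) - \mathbb{E}g(X^n)| \ge \varepsilon \right) \le 2e^{-2n\varepsilon^2}.$$

We will see later that, for certain function classes $\mathcal{F}$, we will have $\mathbb{E}g(X^n) = O(1/\sqrt{n})$.

4.4. Kernel density estimation. For our final example, let $X^n = (X_1, \ldots, X_n)$ be an $n$-tuple of i.i.d. real-valued random variables whose common distribution $P$ has a probability density function (pdf) $f$, i.e.,

$$P(A) = \int_A f(x) dx$$

for any measurable set $A \subseteq \mathbb{R}$. We wish to estimate $f$ from the sample $X^n$. A popular method is to use a kernel estimate (the book by Devroye and Lugosi [DL01] has plenty of material on density estimation, including kernel methods, from the viewpoint of statistical learning theory). To that end, we pick a nonnegative function $K : \mathbb{R} \to \mathbb{R}$ that integrates to one, $\int K(x) dx = 1$ (such a function is called a kernel), as well as a positive bandwidth (or smoothing constant) $h > 0$ and form the estimate

$$\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^n K\left( \frac{x - X_i}{h} \right).$$

It is not hard to verify (another exercise!) that $\hat f_n$ is a valid pdf, i.e., that it is nonnegative and integrates to one. A common way of quantifying the performance of a density estimator is to use the $L_1$ distance to the true density $f$:

$$\| \hat f_n - f \|_{L_1} = \int_{\mathbb{R}} |\hat f_n(x) - f(x)| dx.$$

Note that $\| \hat f_n - f \|_{L_1}$ is a random variable since it depends on the random sample $X^n$. Thus, we can write it as a function $g(X^n)$ of the sample $X^n$. Leaving aside the problem of actually bounding $\mathbb{E}g(X^n)$, we can easily establish a concentration bound for it using McDiarmid's inequality. To do that, we need to check that $g$ has bounded differences. Choosing $x^n$ and $x^n_{(i)}$ as before, we have

$$\begin{aligned}
g(x^n) - g(x^n_{(i)}) &= \int_{\mathbb{R}} \left| \frac{1}{nh} \sum_{j=1}^{i-1} K\left( \frac{x - x_j}{h} \right) + \frac{1}{nh} K\left( \frac{x - x_i}{h} \right) + \frac{1}{nh} \sum_{j=i+1}^n K\left( \frac{x - x_j}{h} \right) - f(x) \right| dx \\
&\quad - \int_{\mathbb{R}} \left| \frac{1}{nh} \sum_{j=1}^{i-1} K\left( \frac{x - x_j}{h} \right) + \frac{1}{nh} K\left( \frac{x - x_i'}{h} \right) + \frac{1}{nh} \sum_{j=i+1}^n K\left( \frac{x - x_j}{h} \right) - f(x) \right| dx \\
&\le \frac{1}{nh} \int_{\mathbb{R}} \left| K\left( \frac{x - x_i}{h} \right) - K\left( \frac{x - x_i'}{h} \right) \right| dx \\
&\le \frac{2}{nh} \int_{\mathbb{R}} K\left( \frac{x}{h} \right) dx = \frac{2}{n}.
\end{aligned}$$

Thus, we see that $g(X^n)$ has the bounded differences property with $c_1 = \ldots = c_n = 2/n$, so that

$$\mathbb{P}\left( |g(X^n) - \mathbb{E}g(X^n)| \ge \varepsilon \right) \le 2e^{-n\varepsilon^2/2}.$$

References

[BBL04] S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures in Machine Learning, pages 208-240. Springer, 2004.
[Che52] H. Chernoff. A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493-507, 1952.
[DL01] L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer, 2001.
[Hoe63] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.
[McD89] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148-188. Cambridge University Press, 1989.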