A Gentle Introduction to Concentration Inequalities

Karthik Sridharan

Abstract

These notes are meant to be a review of some basic inequalities and bounds on random variables. A basic understanding of probability theory and set algebra is required of the reader. The aim is to provide clear and complete proofs of some of these inequalities. For readers familiar with the topics many of the steps may seem trivial; nonetheless they are spelled out to make the proofs accessible to readers new to the topic. These notes also provide, to the best of my knowledge, the most general statement and proof of the symmetrization lemma. I also provide the less well-known but more general proofs of Jensen's inequality and the logarithmic Sobolev inequality. Refer to [2] for a more detailed review of many of these inequalities, with examples demonstrating their uses.

1 Preliminaries

Throughout these notes we consider a probability space $(\Omega, \mathcal{E}, P)$, where $\Omega$ is the sample space, $\mathcal{E}$ is the event class (a $\sigma$-algebra on $\Omega$) and $P$ is a probability measure. Further, we assume that there exists a Borel measurable function mapping every point $\omega \in \Omega$ uniquely to a real number, called a random variable. We call the space of random variables $X$ (note that $X \subseteq \mathbb{R}$). We also assume that all functions and sets defined in these notes are measurable under the probability measure $P$.

2 Chebyshev's Inequality

Let us start these notes by proving what is referred to as Chebyshev's inequality in [3]. Note that the name "Chebyshev's inequality" is usually given to an inequality derived from the theorem proved below; [3], however, refers to the general theorem itself as Tchebycheff's inequality, and that convention is followed in these notes.

Theorem 1 For some $a \in X$ where $X \subseteq \mathbb{R}$, let $f$ be a non-negative function such that $f(x) \ge b$ for all $x \ge a$, for some $b > 0$. Then the following inequality holds:

$P(x \ge a) \le \frac{E\{f(x)\}}{b}$

Proof Let $X_1 = \{x : x \ge a,\ x \in X\}$; therefore $X_1 \subseteq X$. Since $f$ is a non-negative function, taking the Lebesgue integral of the function over the sets $X$ and $X_1$ we have

$\int_X f\, dP \ \ge\ \int_{X_1} f\, dP \ \ge\ b\int_{X_1} dP$

where $\int_{X_1} dP = P(X_1)$. The Lebesgue integral of a function with respect to a probability measure is its expectation. Hence we have $E\{f(x)\} \ge b\, P(X_1)$, that is, $E\{f(x)\} \ge b\, P(x \ge a)$, and so

$P(x \ge a) \le \frac{E\{f(x)\}}{b}$    (1)

Now suppose that the function $f$ is monotonically increasing. Then for every $x \ge a$ we have $f(x) \ge f(a)$, so using $b = f(a)$ in (1) we get

$P(x \ge a) \le \frac{E\{f(x)\}}{f(a)}$

From this we obtain the well-known inequalities: Markov's inequality

$P(x \ge a) \le \frac{E\{x\}}{a}$

which holds for $a > 0$ and non-negative $x$;

$P(|x - E(x)| \ge a) \le \frac{E\{|x - E(x)|^2\}}{a^2} = \frac{Var\{x\}}{a^2}$    (2)

often called Chebyshev's inequality; and the Chernoff bound

$P(x \ge a) \le \frac{E\, e^{sx}}{e^{sa}}$    (3)

(In these notes, $Var\{\cdot\}$ and $\sigma^2$ are used interchangeably for the variance.)
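To make these bounds concrete, here is a small numerical illustration (my addition, not part of the original notes): it compares the empirical tail probability of a non-negative random variable with the Markov and Chebyshev bounds. The exponential distribution and the thresholds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A non-negative random variable: exponential with mean 1 (illustrative choice).
x = rng.exponential(scale=1.0, size=1_000_000)
mean, var = x.mean(), x.var()

for a in [2.0, 4.0, 8.0]:
    empirical = np.mean(x >= a)
    markov = mean / a                  # (Markov)    P(x >= a) <= E{x}/a
    chebyshev = var / (a - mean) ** 2  # (Chebyshev) P(|x - E(x)| >= a - E(x)) <= Var{x}/(a - E(x))^2
    print(f"a = {a}: empirical = {empirical:.4f}, "
          f"Markov <= {markov:.4f}, Chebyshev <= {chebyshev:.4f}")
```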

3 Information Theoretic Bounds

3.1 Jensen's Inequality

Here we state and prove a generalized, measure-theoretic version of Jensen's inequality. In probability theory a more specific form of Jensen's inequality is better known. Before that, we first define a convex function.

Definition 1 A function $\varphi(x)$ is convex on the interval $(a, b)$ if for every point $x_1 \in (a, b)$ there exists an $m$ such that for any $x \in (a, b)$

$\varphi(x) \ge m(x - x_1) + \varphi(x_1)$    (4)

Note that this definition can be proved equivalent to the more familiar definition of a convex function as one for which, on any subinterval of the interval of convexity, the value of the function at any point of the subinterval lies on or below the chord joining the function values at the endpoints of that subinterval. We chose this particular definition to simplify the proof of Jensen's inequality. Now, without further ado, let us state and prove Jensen's inequality. (See [4] for a similar generalized proof of Jensen's inequality.)

Theorem 2 Let $f$ and $\mu$ be measurable functions of $x$ which are finite a.e. on $A \subseteq \mathbb{R}^n$. Let $f\mu$ and $\mu$ be integrable on $A$ with $\mu \ge 0$. If $\varphi$ is a function which is convex on the interval $(a, b)$ containing the range of $f$, and $\int_A \varphi(f)\mu$ exists, then

$\varphi\!\left(\frac{\int_A f\mu}{\int_A \mu}\right) \le \frac{\int_A \varphi(f)\mu}{\int_A \mu}$    (5)

Proof From our assumptions the range of $f$ is contained in $(a, b)$, the interval on which $\varphi$ is convex. Hence, consider the number

$x_1 = \frac{\int_A f\mu}{\int_A \mu}$

Clearly $x_1$ lies within the interval $(a, b)$. Further, from Equation (4) we have, for almost every $x$,

$\varphi(f(x)) \ge m(f(x) - x_1) + \varphi(x_1)$

Multiplying by $\mu$ and integrating both sides we get

$\int_A \varphi(f)\mu \ \ge\ m\!\left(\int_A f\mu - x_1\int_A \mu\right) + \varphi(x_1)\int_A \mu$

Now observe that $\int_A f\mu - x_1\int_A \mu = 0$, and hence we have

$\int_A \varphi(f)\mu \ \ge\ \varphi(x_1)\int_A \mu$

hence we get the result

$\int_A \varphi(f)\mu \ \ge\ \varphi\!\left(\frac{\int_A f\mu}{\int_A \mu}\right)\int_A \mu$

Now note that if $\mu$ is a probability measure then $\int_A \mu = 1$, and since the expected value is simply the Lebesgue integral of a function w.r.t. the probability measure, we have

$E[\varphi(f(x))] \ge \varphi(E[f(x)])$    (6)

for any function $\varphi$ convex on the range of the function $f(x)$. A function $\varphi(x)$ is convex if $\varphi'(x)$ exists and is monotonically increasing, in particular if the second derivative exists and is non-negative. Therefore the function $-\log x$ is convex. Therefore, by Jensen's inequality, we have

$E[-\log f(x)] \ge -\log E[f(x)]$

Now, if we take the function $f(x)$ to be the probability $P(x)$ of a discrete random variable, we get the result that the entropy $H(P) = E[-\log P(x)]$ is always greater than or equal to $0$ (since $E[P(x)] = \sum_x P(x)^2 \le 1$). If we make $f(x)$ the ratio of two probability measures $dQ$ and $dP$, we get the result that the relative entropy, or KL divergence, of two distributions is always non-negative. That is,

$D(P\|Q) = E_P\!\left[-\log\frac{dQ(x)}{dP(x)}\right] \ \ge\ -\log\!\left(E_P\!\left[\frac{dQ(x)}{dP(x)}\right]\right) = -\log\!\left(\int dQ(x)\right) = -\log 1 = 0$

Therefore, $D(P\|Q) \ge 0$.
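As a quick numerical sanity check of these two corollaries (an illustration added here, not in the original notes), the sketch below verifies Jensen's inequality (6) for the convex function $-\log$ and the non-negativity of the KL divergence for two arbitrary discrete distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Jensen: E[-log f(x)] >= -log E[f(x)] for a positive random variable f(x).
f = rng.uniform(0.1, 5.0, size=500_000)
lhs, rhs = np.mean(-np.log(f)), -np.log(np.mean(f))
print(f"E[-log f] = {lhs:.4f} >= -log E[f] = {rhs:.4f}: {lhs >= rhs}")

# Relative entropy of two discrete distributions is non-negative.
p = rng.random(10); p /= p.sum()
q = rng.random(10); q /= q.sum()
kl = np.sum(p * np.log(p / q))
print(f"D(P||Q) = {kl:.4f} >= 0: {kl >= 0}")
```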

3.2 Han's Inequality

We shall first prove Han's inequality for entropy and then, using that result, prove Han's inequality for relative entropy.

Theorem 3 Let $x_1, x_2, \ldots, x_n$ be discrete random variables taking values in a sample space $X$. Then

$H(x_1, \ldots, x_n) \le \frac{1}{n-1}\sum_{i=1}^n H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$    (7)

Proof Note that $D(P_{X,Y}\,\|\,P_X \otimes P_Y) = H(X) - H(X|Y)$. Since we have already proved that relative entropy is non-negative, we have $H(X) \ge H(X|Y)$. Loosely speaking, this means that information (about some variable $Y$) can only reduce the entropy, or uncertainty, $H(X|Y)$, which makes intuitive sense.

Now consider the entropy

$H(x_1, \ldots, x_n) = H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + H(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$

Since we have already seen that $H(X) \ge H(X|Y)$, applying this to the last term we have

$H(x_1, \ldots, x_n) \le H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + H(x_i \mid x_1, \ldots, x_{i-1})$

Summing both sides over $i = 1, \ldots, n$ we get

$n\, H(x_1, \ldots, x_n) \le \sum_{i=1}^n H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + \sum_{i=1}^n H(x_i \mid x_1, \ldots, x_{i-1})$

Now note that by the definition of conditional entropy, $H(X|Y) = H(X, Y) - H(Y)$. Extending this to many variables we get the chain rule of entropy:

$H(x_1, \ldots, x_n) = \sum_{i=1}^n H(x_i \mid x_1, \ldots, x_{i-1})$

Therefore, using this chain rule,

$n\, H(x_1, \ldots, x_n) \le \sum_{i=1}^n H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + H(x_1, \ldots, x_n)$

Therefore,

$H(x_1, \ldots, x_n) \le \frac{1}{n-1}\sum_{i=1}^n H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$
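The following snippet (added here as an illustration) checks Han's inequality (7) exactly for a random joint distribution of three binary variables, by computing the joint entropy and the entropies of the leave-one-out marginals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3  # three binary random variables

# A random joint distribution over {0,1}^3.
p = rng.random(2 ** n)
p /= p.sum()
p = p.reshape((2,) * n)

def entropy(table):
    t = table[table > 0]
    return -np.sum(t * np.log(t))

joint = entropy(p)
# H(x_1,...,x_{i-1},x_{i+1},...,x_n): marginalize variable i out.
loo_sum = sum(entropy(p.sum(axis=i)) for i in range(n))

print(f"H(x_1,...,x_n)           = {joint:.4f}")
print(f"(1/(n-1)) sum_i H(x^(i)) = {loo_sum / (n - 1):.4f}")
print("Han's inequality holds:", joint <= loo_sum / (n - 1) + 1e-12)
```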

Now we shall prove Han's inequality for relative entropies. Let $x^n = (x_1, x_2, \ldots, x_n)$ be discrete random variables from the sample space $X$, just as in the previous case. Let $Q$ be a probability distribution on the product space $X^n$, and let $P$ be a distribution on $X^n$ such that

$\frac{dP(x_1, \ldots, x_n)}{dx^n} = \frac{dP_1(x_1)}{dx_1}\,\frac{dP_2(x_2)}{dx_2}\cdots\frac{dP_n(x_n)}{dx_n}$

That is, the distribution $P$ assumes independence of the variables $(x_1, \ldots, x_n)$, with the probability density function of each variable $x_i$ given by $\frac{dP_i(x_i)}{dx_i}$. Let $x^{(i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. With this setting we shall state and prove Han's inequality for relative entropy.

Theorem 4 Given any distribution $Q$ on the product space $X^n$ and a distribution $P$ on $X^n$ which assumes independence of the variables $(x_1, \ldots, x_n)$,

$D(Q\|P) \ge \frac{1}{n-1}\sum_{i=1}^n D(Q^{(i)}\|P^{(i)})$    (8)

where

$Q^{(i)}(x^{(i)}) = \int_X \frac{dQ(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\, dx_i$

and

$P^{(i)}(x^{(i)}) = \int_X \frac{dP(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\, dx_i$

Proof By the definition of relative entropy,

$D(Q\|P) = \int_{X^n} dQ\,\log\frac{dQ}{dP} = \int_{X^n} \frac{dQ}{dx^n}\log\frac{dQ}{dx^n}\,dx^n - \int_{X^n} \frac{dQ}{dx^n}\log\frac{dP}{dx^n}\,dx^n$    (9)

In the above equation, consider the term $\int_{X^n} \frac{dQ}{dx^n}\log\frac{dP}{dx^n}\,dx^n$. From our assumption about $P$ we know that

$\frac{dP(x^n)}{dx^n} = \frac{dP_1(x_1)}{dx_1}\,\frac{dP_2(x_2)}{dx_2}\cdots\frac{dP_n(x_n)}{dx_n}$

Now $P^{(i)}(x^{(i)}) = \int_X \frac{dP(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\,dx_i$, and therefore

$\frac{dP(x^n)}{dx^n} = \frac{dP_i(x_i)}{dx_i}\,\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}$

Therefore, averaging over the $n$ possible choices of $i$, we get

$\int_{X^n} \frac{dQ}{dx^n}\log\frac{dP(x^n)}{dx^n}\,dx^n = \frac{1}{n}\sum_{i=1}^n \int_{X^n} \frac{dQ}{dx^n}\left(\log\frac{dP_i(x_i)}{dx_i} + \log\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^n$

$= \frac{1}{n}\sum_{i=1}^n \int_{X^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\,dx^{(i)} + \frac{1}{n}\int_{X^n} \frac{dQ}{dx^n}\log\frac{dP(x^n)}{dx^n}\,dx^n$

Rearranging the terms we get

$\int_{X^n} \frac{dQ}{dx^n}\log\frac{dP(x^n)}{dx^n}\,dx^n = \frac{1}{n-1}\sum_{i=1}^n \int_{X^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\,dx^{(i)}$

Now also note that by Han's inequality for entropy,

$\int_{X^n} \frac{dQ}{dx^n}\log\frac{dQ(x^n)}{dx^n}\,dx^n \ \ge\ \frac{1}{n-1}\sum_{i=1}^n \int_{X^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}}\,dx^{(i)}$

Therefore, when we consider the relative entropy given by Equation (9), we get

$D(Q\|P) \ \ge\ \frac{1}{n-1}\sum_{i=1}^n \int_{X^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}}\,dx^{(i)} - \frac{1}{n-1}\sum_{i=1}^n \int_{X^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\,dx^{(i)}$

Thus, finally simplifying, we get the required result

$D(Q\|P) \ge \frac{1}{n-1}\sum_{i=1}^n D(Q^{(i)}\|P^{(i)})$

4 Inequalities for Sums of Random Variables

4.1 Hoeffding's Inequality

Theorem 5 Let $x_1, \ldots, x_n$ be independent bounded random variables such that the random variable $x_i$ falls in the interval $[p_i, q_i]$. Then for any $a > 0$ we have

$P\!\left(\sum_{i=1}^n x_i - E\!\left(\sum_{i=1}^n x_i\right) \ge a\right) \le e^{-\frac{2a^2}{\sum_{i=1}^n (q_i - p_i)^2}}$

Proof From the Chernoff bound given by (3) we get

$P(x - E(x) \ge a) \le \frac{E\, e^{s(x - E(x))}}{e^{sa}}$

Let $S_n = \sum_{i=1}^n x_i$. Therefore we have

$P(S_n - E(S_n) \ge a) \le \frac{E\, e^{s(S_n - E(S_n))}}{e^{sa}} = e^{-sa}\prod_{i=1}^n E\, e^{s(x_i - E(x_i))}$    (10)

Now let $y$ be any random variable such that $p \le y \le q$ and $E y = 0$. Then for any $s > 0$, due to the convexity of the exponential function, we have

$e^{sy} \le \frac{y - p}{q - p}\,e^{sq} + \frac{q - y}{q - p}\,e^{sp}$

Taking expectations on both sides we get

$E\, e^{sy} \le \frac{-p}{q - p}\,e^{sq} + \frac{q}{q - p}\,e^{sp}$

Now let $\alpha = \frac{-p}{q - p}$. Therefore

$E\, e^{sy} \le \left(\alpha\, e^{s(q-p)} + (1 - \alpha)\right)e^{-s\alpha(q-p)}$

$E\, e^{sy} \le e^{\log\left(\alpha e^{s(q-p)} + (1 - \alpha)\right) - s\alpha(q-p)}$

$E\, e^{sy} \le e^{\varphi(u)}$    (11)

where the function $\varphi(u) = \log(\alpha e^u + (1 - \alpha)) - u\alpha$ and $u = s(q - p)$. Now, using Taylor's theorem, we have for some $\eta$ between $0$ and $x$,

$\varphi(x) = \varphi(0) + x\varphi'(0) + \frac{x^2}{2}\varphi''(\eta)$    (12)

But we have $\varphi(0) = 0$, and

$\varphi'(x) = \frac{\alpha e^x}{\alpha e^x + (1 - \alpha)} - \alpha$

so $\varphi'(0) = 0$. Now

$\varphi''(x) = \frac{\alpha(1 - \alpha)e^x}{\left((1 - \alpha) + \alpha e^x\right)^2}$

If we consider $\varphi''(x)$, we see that it is maximized when

$\varphi'''(x) = \frac{\alpha(1 - \alpha)e^x}{\left((1 - \alpha) + \alpha e^x\right)^2} - \frac{2\alpha^2(1 - \alpha)e^{2x}}{\left((1 - \alpha) + \alpha e^x\right)^3} = 0 \quad\Longrightarrow\quad e^x = \frac{1 - \alpha}{\alpha}$

Substituting $e^x = \frac{1-\alpha}{\alpha}$ back into $\varphi''$, we see that for any $x$

$\varphi''(x) \le \frac{(1 - \alpha)^2}{\left(2(1 - \alpha)\right)^2} = \frac{1}{4}$

Therefore from (11) and (12) we have

$E\, e^{sy} \le e^{\frac{u^2}{8}}$

Therefore for any $p \le y \le q$

$E\, e^{sy} \le e^{\frac{s^2(q - p)^2}{8}}$    (13)

Using this in (10) we get

$P(S_n - E(S_n) \ge a) \le e^{-sa}\, e^{\frac{s^2\sum_{i=1}^n(q_i - p_i)^2}{8}}$

Now we find the best bound by minimizing the R.H.S. of the above inequality w.r.t. $s$. Therefore we have

$\frac{d}{ds}\, e^{\frac{s^2\sum_i(q_i - p_i)^2}{8} - sa} = e^{\frac{s^2\sum_i(q_i - p_i)^2}{8} - sa}\left(\frac{s\sum_i(q_i - p_i)^2}{4} - a\right) = 0$

Therefore for the best bound we have

$s = \frac{4a}{\sum_{i=1}^n(q_i - p_i)^2}$

and correspondingly we get

$P(S_n - E(S_n) \ge a) \le e^{-\frac{2a^2}{\sum_{i=1}^n(q_i - p_i)^2}}$    (14)

An interesting consequence of Hoeffding's inequality, often used in learning theory, bounds not the difference between a sum and its expectation but the difference between the empirical average of a loss function and its expectation. This is obtained from (14) as follows. Note that $E_n(x) = \frac{S_n}{n}$ denotes the empirical average of $x$, and $\frac{E(S_n)}{n} = E(x)$. Therefore we have

$P(S_n - E(S_n) \ge na) \le e^{-\frac{2(an)^2}{\sum_i(q_i - p_i)^2}}$

$P\!\left(\frac{S_n}{n} - \frac{E(S_n)}{n} \ge a\right) \le e^{-\frac{2(an)^2}{\sum_i(q_i - p_i)^2}}$

$P(E_n(x) - E(x) \ge a) \le e^{-\frac{2n^2a^2}{\sum_i(q_i - p_i)^2}}$    (15)
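A small Monte Carlo experiment (my addition, not from the notes) comparing the empirical one-sided deviation probability of the empirical mean of bounded i.i.d. variables with the Hoeffding bound (15). The Bernoulli model and the constants are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, a, p = 100, 200_000, 0.1, 0.3   # x_i ~ Bernoulli(p), so [p_i, q_i] = [0, 1]

samples = rng.binomial(1, p, size=(trials, n))
emp_means = samples.mean(axis=1)

empirical = np.mean(emp_means - p >= a)   # P(E_n(x) - E(x) >= a)
hoeffding = np.exp(-2 * n * a ** 2)       # (15) with sum_i (q_i - p_i)^2 = n

print(f"empirical deviation probability = {empirical:.5f}")
print(f"Hoeffding bound                 = {hoeffding:.5f}")
```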

4.2 Bernstein's Inequality

Hoeffding's inequality does not use any knowledge about the distribution of the variables. Bernstein's inequality [7] uses the variance of the distribution to get a tighter bound.

Theorem 6 Let $x_1, x_2, \ldots, x_n$ be independent bounded random variables such that $E x_i = 0$ and $|x_i| \le \varsigma$ with probability $1$, and let $\sigma^2 = \frac{1}{n}\sum_{i=1}^n Var\{x_i\}$. Then for any $\epsilon > 0$ we have

$P\!\left(\frac{1}{n}\sum_{i=1}^n x_i \ge \epsilon\right) \le e^{-\frac{n\epsilon^2}{2\sigma^2 + 2\varsigma\epsilon/3}}$

Proof We need to re-derive the bound starting from (10). Let

$F_i = \sum_{r=2}^{\infty} \frac{s^{r-2} E(x_i^r)}{r!\,\sigma_i^2}$

where $\sigma_i^2 = E x_i^2$. Now $e^x = 1 + x + \sum_{r=2}^{\infty}\frac{x^r}{r!}$. Therefore

$E\, e^{s x_i} = 1 + s\, E x_i + \sum_{r=2}^{\infty}\frac{s^r E(x_i^r)}{r!}$

Since $E x_i = 0$ we have

$E\, e^{s x_i} = 1 + F_i\, s^2\sigma_i^2 \le e^{F_i s^2\sigma_i^2}$

Consider the term $E x_i^r$. Since the expectation of a function is just the Lebesgue integral of the function with respect to the probability measure, we have $E x_i^r = \int x_i^{r-1}\, x_i\, dP$. Using Schwarz's inequality we get

$E x_i^r = \int x_i^{r-1} x_i\, dP \le \left(\int x_i^{2(r-1)}\, dP\right)^{1/2}\left(\int x_i^2\, dP\right)^{1/2}$

$E x_i^r \le \sigma_i\left(\int x_i^{2(r-1)}\, dP\right)^{1/2}$

Proceeding to use Schwarz's inequality recursively $k$ times we get

$E x_i^r \le \sigma_i^{\left(2 - 2^{1-k}\right)}\left(\int x_i^{2^k(r-2)+2}\, dP\right)^{1/2^k}$

Now we know that $|x_i| \le \varsigma$. Therefore

$\left(\int x_i^{2^k(r-2)+2}\, dP\right)^{1/2^k} \le \left(\varsigma^{\,2^k(r-2)+2}\right)^{1/2^k} = \varsigma^{\,(r-2) + 2^{1-k}}$

Hence we get

$E x_i^r \le \sigma_i^{\left(2 - 2^{1-k}\right)}\,\varsigma^{\,(r-2) + 2^{1-k}}$

Taking the limit as $k$ goes to infinity we get

$E x_i^r \le \lim_{k\to\infty} \sigma_i^{\left(2 - 2^{1-k}\right)}\,\varsigma^{\,(r-2) + 2^{1-k}}$

Therefore,

$E x_i^r \le \sigma_i^2\,\varsigma^{\,r-2}$    (16)

Applying (16) to the series defining $F_i$ we get

$F_i = \sum_{r=2}^{\infty}\frac{s^{r-2}E(x_i^r)}{r!\,\sigma_i^2} \le \sum_{r=2}^{\infty}\frac{s^{r-2}\varsigma^{\,r-2}}{r!} = \frac{e^{s\varsigma} - 1 - s\varsigma}{s^2\varsigma^2}$

so that

$E\, e^{s x_i} \le e^{F_i s^2\sigma_i^2} \le e^{\sigma_i^2\frac{e^{s\varsigma} - 1 - s\varsigma}{\varsigma^2}}$

Now, using (10) and the fact that $\sigma^2 = \frac{1}{n}\sum_{i=1}^n \sigma_i^2$, we get

$P(S_n \ge a) \le e^{-sa}\, e^{n\sigma^2\frac{e^{s\varsigma} - 1 - s\varsigma}{\varsigma^2}}$    (17)

Now, to obtain the tightest bound, we minimize the R.H.S. w.r.t. $s$. Therefore we get

$\frac{d}{ds}\, e^{n\sigma^2\frac{e^{s\varsigma} - 1 - s\varsigma}{\varsigma^2} - sa} = e^{n\sigma^2\frac{e^{s\varsigma} - 1 - s\varsigma}{\varsigma^2} - sa}\left(n\sigma^2\,\frac{\varsigma e^{s\varsigma} - \varsigma}{\varsigma^2} - a\right) = 0$

Therefore, for the tightest bound, we have

$e^{s\varsigma} - 1 = \frac{a\varsigma}{n\sigma^2}$

Therefore we have

$s = \frac{1}{\varsigma}\log\!\left(\frac{a\varsigma}{n\sigma^2} + 1\right)$

Using this $s$ in (17) we get

$P(S_n \ge a) \le e^{\frac{n\sigma^2}{\varsigma^2}\left(\frac{a\varsigma}{n\sigma^2} - \log\left(\frac{a\varsigma}{n\sigma^2}+1\right)\right) - \frac{a}{\varsigma}\log\left(\frac{a\varsigma}{n\sigma^2}+1\right)} = e^{-\frac{n\sigma^2}{\varsigma^2}\left(\left(1 + \frac{a\varsigma}{n\sigma^2}\right)\log\left(\frac{a\varsigma}{n\sigma^2}+1\right) - \frac{a\varsigma}{n\sigma^2}\right)}$

Let $H(x) = (1 + x)\log(1 + x) - x$. Therefore we get

$P(S_n \ge a) \le e^{-\frac{n\sigma^2}{\varsigma^2}\, H\left(\frac{a\varsigma}{n\sigma^2}\right)}$    (18)

This is called Bennett's inequality [6]. We can derive Bernstein's inequality by further bounding the function $H(x)$. Let the function $G(x) = \frac{3}{2}\cdot\frac{x^2}{x + 3}$. We see that $H(0) = G(0) = H'(0) = G'(0) = 0$, and that $H''(x) = \frac{1}{x+1}$ and $G''(x) = \frac{27}{(x+3)^3}$. Therefore $H''(0) \ge G''(0)$, and further, if $f^{(n)}(x)$ denotes the $n$-th derivative of a function $f$, then $H^{(n)}(0) \ge G^{(n)}(0)$ for any $n \ge 2$. Therefore, as a consequence of Taylor's theorem, we have

$H(x) \ge G(x) \qquad \forall\, x \ge 0$

Therefore, applying this to (18), we get

$P\!\left(\sum_i x_i \ge a\right) \le e^{-\frac{n\sigma^2}{\varsigma^2}\, G\left(\frac{a\varsigma}{n\sigma^2}\right)}$

$P\!\left(\sum_i x_i \ge a\right) \le e^{-\frac{3a^2}{2(a\varsigma + 3n\sigma^2)}}$

Now let $a = n\epsilon$. Therefore,

$P\!\left(\sum_i x_i \ge n\epsilon\right) \le e^{-\frac{3n\epsilon^2}{2(\epsilon\varsigma + 3\sigma^2)}}$

Therefore we get

$P\!\left(\frac{1}{n}\sum_i x_i \ge \epsilon\right) \le e^{-\frac{n\epsilon^2}{2\sigma^2 + 2\varsigma\epsilon/3}}$    (19)

An interesting phenomenon here is that if $\sigma^2 < \epsilon$, the upper bound decays roughly as $e^{-n\epsilon}$ rather than as $e^{-n\epsilon^2}$, as suggested by Hoeffding's inequality (14).
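The sketch below (added for illustration) compares the Hoeffding bound (15) with the Bernstein bound (19) for centered variables with small variance, where exploiting the variance pays off; the distribution and constants are arbitrary choices with $|x_i| \le 1$.

```python
import numpy as np

n, eps, p, c = 200, 0.05, 0.01, 1.0   # c plays the role of the bound varsigma, |x_i| <= 1
sigma2 = p * (1 - p)                  # variance of x_i = Bernoulli(p) - p

hoeffding = np.exp(-2 * n * eps ** 2)                               # (15), interval length 1
bernstein = np.exp(-n * eps ** 2 / (2 * sigma2 + 2 * c * eps / 3))  # (19)

print(f"Hoeffding bound = {hoeffding:.4g}")
print(f"Bernstein bound = {bernstein:.4g}")

# Monte Carlo estimate of the actual deviation probability, for comparison.
rng = np.random.default_rng(4)
x = rng.binomial(1, p, size=(200_000, n)) - p
print(f"empirical       = {np.mean(x.mean(axis=1) >= eps):.4g}")
```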

5 Inequalities for Functions of Random Variables

5.1 Efron-Stein Inequality

Till now we only considered sums of random variables. Now we shall consider general functions of random variables. The Efron-Stein inequality [9] is one of the tightest bounds of this kind known.

Theorem 7 Let $S : X^n \to \mathbb{R}$ be a measurable function which is invariant under permutations of its arguments, and let the random variable $Z$ be given by $Z = S(x_1, x_2, \ldots, x_n)$. Then we have

$Var(Z) \le \frac{1}{2}\sum_{i=1}^n E\big[(Z - Z_i')^2\big]$

where $Z_i' = S(x_1, \ldots, x_i', \ldots, x_n)$ and $\{x_1', \ldots, x_n'\}$ is another independent sample from the same distribution as $\{x_1, \ldots, x_n\}$.

Proof Let $E_i Z = E[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]$ and let $V = Z - EZ$. Now if we define $V_i$ as

$V_i = E[Z \mid x_1, \ldots, x_i] - E[Z \mid x_1, \ldots, x_{i-1}], \qquad i = 1, \ldots, n$

then $V = \sum_{i=1}^n V_i$ and

$Var(Z) = E V^2 = E\Big[\Big(\sum_i V_i\Big)^2\Big] = E\Big[\sum_i V_i^2\Big] + 2\,E\Big[\sum_{i>j} V_i V_j\Big]$

Now $E[XY] = E[E[XY \mid Y]] = E[Y\, E[X \mid Y]]$; therefore, for $i > j$, $E[V_i V_j] = E\big[V_j\, E[V_i \mid x_1, \ldots, x_{i-1}]\big]$. But $E[V_i \mid x_1, \ldots, x_{i-1}] = 0$. Therefore we have

$Var(Z) = E\Big[\sum_i V_i^2\Big] = \sum_i E[V_i^2]$

Now let $E_{x_i^j}$ denote expectation with respect to the variables $\{x_i, \ldots, x_j\}$. Then

$Var(Z) = \sum_{i=1}^n E\big[\big(E[Z \mid x_1, \ldots, x_i] - E[Z \mid x_1, \ldots, x_{i-1}]\big)^2\big]$

$= \sum_{i=1}^n E_{x_1^i}\Big[\big(E_{x_{i+1}^n}[Z \mid x_1, \ldots, x_i] - E_{x_i^n}[Z \mid x_1, \ldots, x_{i-1}]\big)^2\Big]$

$= \sum_{i=1}^n E_{x_1^i}\Big[\big(E_{x_{i+1}^n}[Z \mid x_1, \ldots, x_i] - E_{x_{i+1}^n}\big[E_{x_i}[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]\big]\big)^2\Big]$

However, $x \mapsto x^2$ is a convex function, and hence we can apply Jensen's inequality (6) and get

$Var(Z) \le \sum_{i=1}^n E_{x_1^i,\, x_{i+1}^n}\Big[\big(Z - E_{x_i}[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]\big)^2\Big]$

Therefore,

$Var(Z) \le \sum_{i=1}^n E\big[(Z - E_i[Z])^2\big]$

where $E_i[Z] = E[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]$.

Now let $x$ and $y$ be independent samples from the same distribution. Then

$E[(x - y)^2] = E[x^2 + y^2 - 2xy] = 2E[x^2] - 2(E[x])^2$

Hence, if $x$ and $y$ are i.i.d., then $Var\{x\} = \frac{1}{2}E[(x - y)^2]$. Thus we have

$E_i\big[(Z - E_i[Z])^2\big] = \frac{1}{2}\,E_i\big[(Z - Z_i')^2\big]$

Thus we have the Efron-Stein inequality

$Var(Z) \le \frac{1}{2}\sum_{i=1}^n E\big[(Z - Z_i')^2\big]$    (20)

Notice that if the function $S$ is the sum of the random variables, the inequality becomes an equality. Hence the bound is tight. It is often referred to as the jackknife bound.
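Below is a small simulation (my addition) estimating both sides of the Efron-Stein inequality (20) for $Z = \max(x_1, \ldots, x_n)$ with uniform inputs; the choice of function and distribution is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 10, 100_000

x = rng.random((trials, n))        # the original sample
x_prime = rng.random((trials, n))  # an independent copy

z = x.max(axis=1)                  # Z = S(x_1,...,x_n) = max_i x_i (permutation invariant)

# (1/2) * sum_i E[(Z - Z_i')^2], where Z_i' replaces coordinate i by x_i'.
es_bound = 0.0
for i in range(n):
    xi = x.copy()
    xi[:, i] = x_prime[:, i]
    es_bound += 0.5 * np.mean((z - xi.max(axis=1)) ** 2)

print(f"Var(Z)            = {z.var():.5f}")
print(f"Efron-Stein bound = {es_bound:.5f}")
```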

5.2 McDiarmid's Inequality

Theorem 8 Let $S : X^n \to \mathbb{R}$ be a measurable function which is invariant under permutations of its arguments, and let the random variable $Z$ be given by $Z = S(x_1, x_2, \ldots, x_n)$. Whenever the function has the bounded-difference property [10], that is,

$\sup_{x_1, \ldots, x_n,\, x_i'} \big|S(x_1, \ldots, x_n) - S(x_1, \ldots, x_i', \ldots, x_n)\big| \le \varsigma_i$

where $Z_i' = S(x_1, \ldots, x_i', \ldots, x_n)$ and $\{x_1', \ldots, x_n'\}$ is a sample from the same distribution as $\{x_1, \ldots, x_n\}$, then for any $a > 0$ we have

$P(|Z - E[Z]| \ge a) \le 2\, e^{-\frac{2a^2}{\sum_{i=1}^n \varsigma_i^2}}$

Proof Using the Chernoff bound (3) we get

$P(Z - E[Z] \ge a) \le e^{-sa}\, E\big[e^{s(Z - E[Z])}\big]$

Now let

$V_i = E[Z \mid x_1, \ldots, x_i] - E[Z \mid x_1, \ldots, x_{i-1}], \qquad i = 1, \ldots, n$

Then $V = \sum_{i=1}^n V_i = Z - E[Z]$. Therefore

$P(Z - E[Z] \ge a) \le e^{-sa}\, E\Big[\prod_{i=1}^n e^{sV_i}\Big]$    (21)

Now, conditioned on $x_1, \ldots, x_{i-1}$, let $V_i$ be bounded in an interval $[L_i, U_i]$. We know that $|Z - Z_i'| \le \varsigma_i$; hence it follows that $U_i - L_i \le \varsigma_i$. Since $E[V_i \mid x_1, \ldots, x_{i-1}] = 0$, using (13) on the conditional expectation of $e^{sV_i}$ we get

$E\big[e^{sV_i} \mid x_1, \ldots, x_{i-1}\big] \le e^{\frac{s^2(U_i - L_i)^2}{8}} \le e^{\frac{s^2\varsigma_i^2}{8}}$

Using this in (21), conditioning successively on $x_1, \ldots, x_{n-1}$, we get

$P(Z - E[Z] \ge a) \le e^{-sa}\prod_{i=1}^n e^{\frac{s^2\varsigma_i^2}{8}} = e^{\frac{s^2\sum_i \varsigma_i^2}{8} - sa}$

Now, to make the bound tight, we simply minimize it with respect to $s$. Therefore we set

$\frac{s\sum_i \varsigma_i^2}{4} - a = 0 \quad\Longrightarrow\quad s = \frac{4a}{\sum_i \varsigma_i^2}$

Therefore the bound is given by

$P(Z - E[Z] \ge a) \le e^{\left(\frac{4a}{\sum_i\varsigma_i^2}\right)^2\frac{\sum_i\varsigma_i^2}{8} - \frac{4a^2}{\sum_i\varsigma_i^2}}$

Hence we get

$P(Z - E[Z] \ge a) \le e^{-\frac{2a^2}{\sum_i \varsigma_i^2}}$

and, applying the same argument to $-Z$,

$P(|Z - E[Z]| \ge a) \le 2\, e^{-\frac{2a^2}{\sum_i \varsigma_i^2}}$    (22)
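As an illustration (not from the original notes), the snippet below applies (22) to a non-additive function with bounded differences: the number of distinct values among $n$ independent draws, for which changing one draw changes $Z$ by at most $1$, so $\varsigma_i = 1$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials, a = 100, 20_000, 15

# Z = number of distinct values among n draws from {0,...,49}; varsigma_i = 1.
draws = rng.integers(0, 50, size=(trials, n))
z = np.array([len(np.unique(row)) for row in draws])

empirical = np.mean(np.abs(z - z.mean()) >= a)   # z.mean() stands in for E[Z]
mcdiarmid = 2 * np.exp(-2 * a ** 2 / n)          # (22) with sum_i varsigma_i^2 = n

print(f"empirical P(|Z - E[Z]| >= {a}) = {empirical:.5f}")
print(f"McDiarmid bound                = {mcdiarmid:.5f}")
```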

5.3 Logarithmic Sobolev Inequality

This inequality is very useful for deriving simple proofs of many known bounds, using a method known as Ledoux's method, or the entropy method [11]. Before jumping into the theorem, let us first prove a useful lemma.

Lemma 9 For any positive random variable $y$ and any $\alpha > 0$ we have

$E\{y\log y\} - E y\,\log E y \le E\{y\log y - y\log\alpha - (y - \alpha)\}$    (23)

Proof For any $x > 0$ we have $\log x \le x - 1$. Therefore

$\log\frac{\alpha}{E y} \le \frac{\alpha}{E y} - 1$

Therefore

$E y\,\log\alpha - E y\,\log E y \le \alpha - E y$

Adding $E\{y\log y\}$ on both sides we get

$E y\,\log\alpha - E y\,\log E y + E\{y\log y\} \le \alpha - E y + E\{y\log y\}$

Therefore, simplifying, we get the required result

$E\{y\log y\} - E y\,\log E y \le E\{y\log y - y\log\alpha - (y - \alpha)\}$

Now, just as in the previous two sections, let $S : X^n \to \mathbb{R}$ be a measurable function which is invariant under permutations of its arguments and let the random variable $Z$ be given by $Z = S(x_1, x_2, \ldots, x_n)$. Here we assume independence of $(x_1, x_2, \ldots, x_n)$. As before, $Z_i' = S(x_1, \ldots, x_i', \ldots, x_n)$, where $\{x_1', \ldots, x_n'\}$ is another independent sample from the same distribution as $\{x_1, \ldots, x_n\}$. Now we are ready to state and prove the logarithmic Sobolev inequality.
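A quick numerical check of Lemma 9 (my addition): for a positive random variable $y$ and several values of $\alpha$, the right-hand side of (23) is never smaller than the left-hand side, with equality when $\alpha = E y$.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.exponential(2.0, size=500_000) + 0.1   # an arbitrary positive random variable

lhs = np.mean(y * np.log(y)) - y.mean() * np.log(y.mean())
for alpha in [0.5, y.mean(), 5.0]:
    rhs = np.mean(y * np.log(y) - y * np.log(alpha) - (y - alpha))
    print(f"alpha = {alpha:6.3f}: LHS = {lhs:.5f} <= RHS = {rhs:.5f}")
```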

Theorem 10 If $\psi(x) = e^x - x - 1$, then

$s\, E\{Z e^{sZ}\} - E\{e^{sZ}\}\log E\{e^{sZ}\} \le \sum_{i=1}^n E\big\{e^{sZ}\,\psi(-s(Z - Z_i'))\big\}$    (24)

Proof From Lemma 9 (Equation (23)) we have, for any positive variable $Y$ and $\alpha = Y_i' > 0$ (applying the lemma conditionally, with $E_i$ denoting expectation over $x_i$ and $x_i'$ given the remaining variables),

$E_i\{Y\log Y\} - E_i Y\,\log E_i Y \le E_i\{Y\log Y - Y\log Y_i' - (Y - Y_i')\}$

Now let $Y = e^{sZ}$ and $Y_i' = e^{sZ_i'}$; then

$E_i\{Y\log Y\} - E_i Y\,\log E_i Y \le E_i\big\{e^{sZ}(sZ - sZ_i') - (e^{sZ} - e^{sZ_i'})\big\}$

Writing this in terms of the function $\psi(x)$ we get

$E_i\{Y\log Y\} - E_i Y\,\log E_i Y \le E_i\big\{e^{sZ}\,\psi(-s(Z - Z_i'))\big\}$    (25)

Now let the measure $P$ denote the distribution of $(x_1, \ldots, x_n)$ and let the distribution $Q$ be given by

$dQ(x_1, \ldots, x_n) = dP(x_1, \ldots, x_n)\,Y(x_1, \ldots, x_n)$

Since both sides of (24) are simply multiplied by the same positive constant when $Z$ is shifted by a constant, we may assume without loss of generality that $E Y = E\{e^{sZ}\} = 1$, so that $Q$ is indeed a probability measure. Then we have

$D(Q\|P) = E_Q\{\log Y\} = E_P\{Y\log Y\} = E\{Y\log Y\}$

and, since $EY = 1$, this can be written as

$D(Q\|P) = E\{Y\log Y\} - EY\log EY$    (26)

However, by Han's inequality for relative entropy, rearranging Equation (8), we have

$D(Q\|P) \le \sum_{i=1}^n \big(D(Q\|P) - D(Q^{(i)}\|P^{(i)})\big)$    (27)

Now we have already shown that $D(Q\|P) = E\{Y\log Y\}$, and further $E\{E_i\{Y\log Y\}\} = E\{Y\log Y\}$; therefore $D(Q\|P) = E\{E_i\{Y\log Y\}\}$. Now, by definition,

$\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}} = \int_X \frac{dQ(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\,dx_i = \int_X Y(x_1, \ldots, x_n)\,\frac{dP(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\,dx_i$

Due to the independence assumption on the sample, we can rewrite the above as

$\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}} = \frac{dP(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)}{dx^{(i)}}\int_X Y(x_1, \ldots, x_n)\,dP_i(x_i) = \frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\,E_i Y$

Therefore $D(Q^{(i)}\|P^{(i)})$ is given by

$D(Q^{(i)}\|P^{(i)}) = \int_{X^{n-1}} E_i Y\,\log E_i Y\;dP^{(i)}(x^{(i)}) = E\{E_i Y\,\log E_i Y\}$

Now using this we get

$D(Q\|P) - D(Q^{(i)}\|P^{(i)}) = E\{E_i\{Y\log Y\}\} - E\{E_i Y\,\log E_i Y\}$

Therefore, using this in Equation (27), we get

$D(Q\|P) \le \sum_{i=1}^n \Big(E\{E_i\{Y\log Y\}\} - E\{E_i Y\,\log E_i Y\}\Big)$

and using this in turn with Equation (26) we get

$E\{Y\log Y\} - EY\log EY \le \sum_{i=1}^n E\big\{E_i\{Y\log Y\} - E_i Y\,\log E_i Y\big\}$

Using the above together with Equation (25) and substituting $Y = e^{sZ}$, we get the required result

$s\, E\{Z e^{sZ}\} - E\{e^{sZ}\}\log E\{e^{sZ}\} \le \sum_{i=1}^n E\big\{e^{sZ}\,\psi(-s(Z - Z_i'))\big\}$
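Since all the expectations in (24) can be computed exactly when the $x_i$ take finitely many values, the sketch below (my addition) verifies the inequality by full enumeration for three i.i.d. Bernoulli variables and an arbitrary permutation-invariant function $S$.

```python
import itertools
import numpy as np

p, n, s = 0.3, 3, 0.7          # Bernoulli parameter, number of variables, the parameter s

def S(x):                       # an arbitrary permutation-invariant function
    return max(x) - sum(x) / len(x)

def prob(x):                    # P(x_1,...,x_n) under independent Bernoulli(p)
    return np.prod([p if xi == 1 else 1 - p for xi in x])

psi = lambda t: np.exp(t) - t - 1

e_zexp = e_exp = rhs = 0.0      # E[Z e^{sZ}], E[e^{sZ}], and the right-hand side of (24)
for x in itertools.product([0, 1], repeat=n):
    px, z = prob(x), S(x)
    e_zexp += px * z * np.exp(s * z)
    e_exp += px * np.exp(s * z)
    for i in range(n):          # the independent copy enters only through coordinate i
        for xi_prime in [0, 1]:
            x_new = list(x)
            x_new[i] = xi_prime
            p_prime = p if xi_prime == 1 else 1 - p
            rhs += px * p_prime * np.exp(s * z) * psi(-s * (z - S(x_new)))

lhs = s * e_zexp - e_exp * np.log(e_exp)
print(f"LHS = {lhs:.6f} <= RHS = {rhs:.6f}: {lhs <= rhs + 1e-12}")
```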

6 Symmetrization Lemma

The symmetrization lemma is probably one of the easier bounds we review in these notes. However, it is extremely powerful, since it allows us to bound the difference between the empirical mean of a function and its expected value using the difference between the empirical means of the function over two independent samples of the same size as the original sample. Note that in most of the literature the symmetrization lemma is stated and proved only for zero-one functions, such as a loss function or the classification function itself. Here we derive a more general version, where we prove the lemma for bounding the difference between the expectation of any measurable function with bounded variance and its empirical mean.

Lemma 11 Let $f : X \to \mathbb{R}$ be a measurable function such that $Var\{f\} \le C$. Let $\hat{E}_n\{f\}$ be the empirical mean of the function $f(x)$ estimated using a set $(x_1, x_2, \ldots, x_n)$ of $n$ independent identical samples from the space $X$. Then for any $a > 0$, if $n > \frac{8C}{a^2}$, we have

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\Big) \ \ge\ \frac{1}{2}\,P\big(|E\{f\} - \hat{E}_n\{f\}| > a\big)$    (28)

where $\hat{E}_n\{f\}$ and $\hat{E}_n'\{f\}$ stand for the empirical means of the function $f(x)$ estimated using the samples $(x_1, x_2, \ldots, x_n)$ and $(x_1', x_2', \ldots, x_n')$ respectively.

Proof By the definition of probability,

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big) = \int_{X^{2n}} \mathbf{1}\Big[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big]\,dP$

where $\mathbf{1}[\cdot]$ is the indicator function, equal to $1$ when its argument holds and $0$ otherwise. Since $X^{2n}$ is the product space $X^n \times X^n$, using Fubini's theorem we have

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big) = \int_{X^n}\int_{X^n} \mathbf{1}\Big[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big]\,dP'\,dP$

Now, since the set $\mathcal{Y} = \{(x_1, x_2, \ldots, x_n) : |E\{f\} - \hat{E}_n\{f\}| > a\}$ is a subset of $X^n$ and the term inside the integral is always non-negative,

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big) \ \ge\ \int_{\mathcal{Y}}\int_{X^n} \mathbf{1}\Big[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big]\,dP'\,dP$

Now let $\mathcal{Z} = \{(x_1', x_2', \ldots, x_n') : |\hat{E}_n'\{f\} - E\{f\}| \le \tfrac{a}{2}\}$. Clearly, for any sample $(x_1, x_2, \ldots, x_{2n})$, if $(x_1, \ldots, x_n) \in \mathcal{Y}$ and $(x_{n+1}, \ldots, x_{2n}) \in \mathcal{Z}$, then $(x_1, \ldots, x_{2n})$ is a sample such that $|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}$. Therefore, coming back to the integral, since $\mathcal{Z} \subseteq X^n$,

$\int_{\mathcal{Y}}\int_{X^n} \mathbf{1}\Big[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big]\,dP'\,dP \ \ge\ \int_{\mathcal{Y}}\int_{\mathcal{Z}} \mathbf{1}\Big[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big]\,dP'\,dP$

Now, since on $\mathcal{Y} \times \mathcal{Z}$ the indicator inside the integral is always $1$, as we saw above,

$\int_{\mathcal{Y}}\int_{\mathcal{Z}} \mathbf{1}\Big[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big]\,dP'\,dP = \int_{\mathcal{Y}}\int_{\mathcal{Z}} dP'\,dP = \int_{\mathcal{Y}} P\Big(|\hat{E}_n'\{f\} - E\{f\}| \le \tfrac{a}{2}\Big)\,dP$

Therefore,

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big) \ \ge\ \int_{\mathcal{Y}} P\Big(|\hat{E}_n'\{f\} - E\{f\}| \le \tfrac{a}{2}\Big)\,dP$

Now,

$P\Big(|\hat{E}_n'\{f\} - E\{f\}| \le \tfrac{a}{2}\Big) = 1 - P\Big(|\hat{E}_n'\{f\} - E\{f\}| > \tfrac{a}{2}\Big)$

Using Equation (2) (often called Chebyshev's inequality) we get

$P\Big(|\hat{E}_n'\{f\} - E\{f\}| > \tfrac{a}{2}\Big) \le \frac{4\,Var\{f\}}{n a^2}$

Now if we choose $n$ such that $n > \frac{8C}{a^2}$, as per our assumption, then $\frac{4C}{n a^2} < \frac{1}{2}$. Therefore,

$P\Big(|\hat{E}_n'\{f\} - E\{f\}| \le \tfrac{a}{2}\Big) = 1 - P\Big(|\hat{E}_n'\{f\} - E\{f\}| > \tfrac{a}{2}\Big) \ge \frac{1}{2}$

Putting this back into the integral we get

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big) \ \ge\ \int_{\mathcal{Y}} \frac{1}{2}\,dP = \frac{1}{2}\int_{\mathcal{Y}} dP$

Therefore we get the final result

$P\Big(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \tfrac{a}{2}\Big) \ \ge\ \frac{1}{2}\,P\big(|E\{f\} - \hat{E}_n\{f\}| > a\big)$

Note that if we make the function $f$ a zero-one function, then the maximum possible variance is $\frac{1}{4}$. Hence, if we set $C = \frac{1}{4}$, the condition under which the inequality holds becomes $n > \frac{2}{a^2}$. Further note that if we choose the zero-one function $f(x)$ to be $1$ when, say, $x$ takes a particular value and $0$ otherwise, then the result bounds the absolute difference between the probability of the event that $x$ takes that value and the frequency estimate of $x$ taking that value in a sample of size $n$, using the difference between the frequencies of occurrence of that value in two independent samples of size $n$.
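The Monte Carlo sketch below (my addition) illustrates the lemma for a zero-one function, comparing the two probabilities in (28); the Bernoulli model and the constants are arbitrary, chosen so that $n > 8C/a^2$ holds with $C = 1/4$.

```python
import numpy as np

rng = np.random.default_rng(8)
C, a, n, trials, p = 0.25, 0.2, 100, 200_000, 0.5   # n > 8C/a^2 = 50

# f is a zero-one function, modelled here as Bernoulli(p) draws; E{f} = p.
e_n = rng.binomial(1, p, size=(trials, n)).mean(axis=1)        # \hat{E}_n{f}
e_n_prime = rng.binomial(1, p, size=(trials, n)).mean(axis=1)  # \hat{E}'_n{f} (ghost sample)

lhs = np.mean(np.abs(e_n - e_n_prime) >= a / 2)
rhs = 0.5 * np.mean(np.abs(p - e_n) > a)

print(f"P(|E_n - E_n'| >= a/2) = {lhs:.5f}")
print(f"(1/2) P(|E - E_n| > a) = {rhs:.5f}")
print("symmetrization bound holds:", lhs >= rhs)
```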

References

[1] V. N. Vapnik and A. Ya. Chervonenkis, On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, Theory of Probability and its Applications, 16(2):264-280, 1971.

[2] Gábor Lugosi, Concentration-of-Measure Inequalities, Lecture Notes, lugosi/anu.pdf

[3] A. N. Kolmogorov, Foundations of the Theory of Probability, Chelsea Publishing, NY, 1956.

[4] Richard L. Wheeden and Antoni Zygmund, Measure and Integral: An Introduction to Real Analysis, International Series of Monographs in Pure and Applied Mathematics.

[5] W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, 58:13-30, 1963.

[6] G. Bennett, Probability Inequalities for the Sum of Independent Random Variables, Journal of the American Statistical Association, 57:33-45, 1962.

[7] S. N. Bernstein, The Theory of Probabilities, Gastehizdat Publishing House, Moscow, 1946.

[8] V. V. Petrov, Sums of Independent Random Variables, Translated by A. A. Brown, Springer-Verlag, 1975.

[9] B. Efron and C. Stein, The Jackknife Estimate of Variance, Annals of Statistics, 9:586-596, 1981.

[10] C. McDiarmid, On the Method of Bounded Differences, in Surveys in Combinatorics, LMS Lecture Note Series 141, 1989.

[11] Michel Ledoux, The Concentration of Measure Phenomenon, American Mathematical Society, Mathematical Surveys and Monographs, Vol. 89, 2001.

[12] Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi, Introduction to Statistical Learning Theory, Lecture Notes, lugosi/mlss slt.pdf
