A Gentle Introduction to Concentration Inequalities
Karthik Sridharan

Abstract

These notes are meant to be a review of some basic inequalities and bounds on random variables. A basic understanding of probability theory and set algebra is required of the reader. The document aims to provide clear and complete proofs for some inequalities. For readers familiar with the topics, many of the steps may seem trivial; nonetheless they are provided to make the proofs accessible to readers new to the topic. These notes also provide, to the best of my knowledge, the most general statement and proof of the symmetrization lemma. I also provide the less famous but generalized proofs of Jensen's inequality and the logarithmic Sobolev inequality. Refer to [2] for a more detailed review of many of these inequalities, with examples demonstrating their uses.

1 Preliminary

Throughout these notes we consider a probability space $(\Omega, \mathcal{E}, P)$, where $\Omega$ is the sample space, $\mathcal{E}$ is the event class (a $\sigma$-algebra on $\Omega$), and $P$ is a probability measure. Further, we assume that there exists a Borel measurable function mapping every point $\omega \in \Omega$ uniquely to a real number, called a random variable. We call the space of random variables $\mathcal{X}$ (note that $\mathcal{X} \subseteq \mathbb{R}$). Further, we assume that all the functions and sets defined in these notes are measurable under the probability measure $P$.

2 Chebyshev's Inequality

Let us start these notes by proving what is referred to as Chebyshev's inequality in [3]. Note that the name "Chebyshev's inequality" is often used for an inequality derived from the theorem proved below; however, [3] refers to this theorem itself as Tchebycheff's inequality, and the same convention is followed in these notes.
Theorem 1 For some $a \in \mathcal{X}$ where $\mathcal{X} \subseteq \mathbb{R}$, let $f$ be a non-negative function such that $f(x) \ge b$ for all $x \ge a$, where $b \in \mathcal{Y} \subseteq \mathbb{R}$. Then the following inequality holds:
$$P(x \ge a) \ \le\ \frac{E\{f(x)\}}{b}$$

Proof  Let $\mathcal{X}' = \{x : x \ge a,\ x \in \mathcal{X}\}$, so that $\mathcal{X}' \subseteq \mathcal{X}$. Since $f$ is a non-negative function, taking the Lebesgue integral of the function over the sets $\mathcal{X}$ and $\mathcal{X}'$ we have
$$\int_{\mathcal{X}} f\, dP \ \ge\ \int_{\mathcal{X}'} f\, dP \ \ge\ b \int_{\mathcal{X}'} dP$$
where $\int_{\mathcal{X}'} dP = P(\mathcal{X}')$. However, the Lebesgue integral of a function with respect to a probability measure is its expectation. Hence we have $E\{f(x)\} \ge b\, P(x \ge a)$, and therefore
$$P(x \ge a) \ \le\ \frac{E\{f(x)\}}{b} \qquad (1)$$

Now suppose the function $f$ is monotonically increasing. Then for every $x \ge a$, $f(x) \ge f(a)$. Using $b = f(a)$ in (1) we get
$$P(x \ge a) \ \le\ \frac{E\{f(x)\}}{f(a)}$$
From this we can derive well-known inequalities such as
$$P(x \ge a) \ \le\ \frac{E\{x\}}{a}$$
called Markov's inequality, which holds for $a > 0$ and non-negative $x$;
$$P(|x - E(x)| \ge a) \ \le\ \frac{E\{|x - E(x)|^2\}}{a^2} = \frac{\mathrm{Var}\{x\}}{a^2} \qquad (2)$$
often called Chebyshev's inequality; and the Chernoff bound
$$P(x \ge a) \ \le\ \frac{E\, e^{sx}}{e^{sa}} \qquad (3)$$

In these notes, $\mathrm{Var}\{\cdot\}$ and $\sigma^2$ are used interchangeably for variance.
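As a quick sanity check of these three bounds, here is a minimal Monte Carlo sketch (Python with NumPy; the exponential distribution, the threshold $a = 3$, and the value $s = 0.5$ are illustrative choices, not from the text) comparing empirical tail estimates against each bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential random variable with mean 1 (non-negative, so Markov applies).
x = rng.exponential(scale=1.0, size=1_000_000)
a = 3.0

markov = x.mean() / a                   # Markov: P(x >= a) <= E{x}/a
chebyshev = x.var() / a**2              # Chebyshev: P(|x - Ex| >= a) <= Var{x}/a^2
s = 0.5                                 # any s in (0, 1) keeps E e^{sx} finite here
chernoff = np.exp(s * x).mean() / np.exp(s * a)   # Chernoff: P(x >= a) <= E e^{sx} / e^{sa}

print("P(x >= a)             ~", (x >= a).mean())
print("Markov bound           ", markov)
print("P(|x - Ex| >= a)      ~", (np.abs(x - x.mean()) >= a).mean())
print("Chebyshev bound        ", chebyshev)
print("Chernoff bound (s=0.5) ", chernoff)
```

All three printed bounds dominate the corresponding empirical tail probabilities, as the inequalities require.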
3 Information Theoretic Bounds

3.1 Jensen's Inequality

Here we state and prove a generalized, measure-theoretic form of Jensen's inequality. In probability theory a more specific form of Jensen's inequality is better known. But before that, we first define a convex function.

Definition 1 A function $\varphi(x)$ is defined to be convex in the interval $(a, b)$ if for every point $x'$ in the interval $(a, b)$ there exists an $m$ such that for any $x \in (a, b)$,
$$\varphi(x) \ \ge\ m(x - x') + \varphi(x') \qquad (4)$$

Note that this definition can be proved equivalent to the usual definition of a convex function as one whose value at any point of the interval of convexity always lies below the line segment joining the endpoints of any subinterval of the convex interval containing that point. We chose this particular definition to simplify the proof of Jensen's inequality. Now, without further ado, let us state and prove Jensen's inequality. (Refer to [4] for a similar generalized proof of Jensen's inequality.)

Theorem 2 Let $f$ and $\mu$ be measurable functions of $x$ which are finite a.e. on $A \subseteq \mathbb{R}^n$. Let $f\mu$ and $\mu$ be integrable on $A$ with $\mu \ge 0$. If $\varphi$ is a function which is convex in the interval $(a, b)$, the range of the function $f$, and $\int_A \varphi(f)\mu$ exists, then
$$\varphi\!\left(\frac{\int_A f\mu}{\int_A \mu}\right) \ \le\ \frac{\int_A \varphi(f)\mu}{\int_A \mu} \qquad (5)$$

Proof  From our assumptions, the range of $f$ is $(a, b)$, the interval on which $\varphi(x)$ is convex. Hence consider the number
$$x' = \frac{\int_A f\mu}{\int_A \mu}$$
Clearly $x'$ lies in the interval $(a, b)$. Further, from Equation (4) we have, for almost every $x$,
$$\varphi(f(x)) \ \ge\ m(f(x) - x') + \varphi(x')$$
Multiplying by $\mu$ and integrating both sides we get
$$\int_A \varphi(f)\mu \ \ge\ m\!\left(\int_A f\mu - x'\int_A \mu\right) + \varphi(x')\int_A \mu$$
Now see that $\int_A f\mu - x'\int_A \mu = 0$, and hence we have
$$\int_A \varphi(f)\mu \ \ge\ \varphi(x')\int_A \mu$$
hence we get the result
$$\int_A \varphi(f)\mu \ \ge\ \varphi\!\left(\frac{\int_A f\mu}{\int_A \mu}\right)\int_A \mu$$

Now note that if $\mu$ is a probability measure then $\int_A \mu = 1$, and since an expected value is simply the Lebesgue integral of a function with respect to a probability measure, we have
$$E[\varphi(f(x))] \ \ge\ \varphi(E[f(x)]) \qquad (6)$$
for any function $\varphi$ convex on the range of the function $f(x)$.

Now, a function $\varphi(x)$ is convex if $\varphi'(x)$ exists and is monotonically increasing, equivalently if its second derivative exists and is non-negative. We can therefore conclude that $-\log x$ is a convex function. Hence, by Jensen's inequality, we have
$$E[-\log f(x)] \ \ge\ -\log E[f(x)]$$
Now if we take the function $f(x)$ to be the probability of $x$, we get the result that the entropy $H(P)$ is always greater than or equal to $0$. If we make $f(x)$ the ratio of two probability measures $dQ$ and $dP$, we get the result that the relative entropy, or KL divergence, of two distributions is always non-negative. That is,
$$D(P\|Q) = E_P\!\left[-\log\frac{dQ(x)}{dP(x)}\right] \ \ge\ -\log E_P\!\left[\frac{dQ(x)}{dP(x)}\right] = -\log\!\left(\int dQ(x)\right) = -\log 1 = 0$$
Therefore $D(P\|Q) \ge 0$.
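Both consequences are easy to verify numerically. Here is a minimal sketch (Python/NumPy; the two discrete distributions are made-up examples) checking that $H(P) \ge 0$ and $D(P\|Q) \ge 0$:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # a discrete distribution P
q = np.array([0.25, 0.25, 0.25, 0.25])    # a discrete distribution Q

# Entropy H(P) = E_P[-log p(x)] >= 0.
entropy = -(p * np.log(p)).sum()

# KL divergence D(P||Q) = E_P[log(p(x)/q(x))] >= 0 by Jensen's inequality.
kl = (p * np.log(p / q)).sum()

print("H(P)    =", entropy)   # >= 0
print("D(P||Q) =", kl)        # >= 0, and 0 iff P == Q
```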
3.2 Han's Inequality

We shall first prove Han's inequality for entropy and then, using that result, prove Han's inequality for relative entropy.

Theorem 3 Let $x_1, x_2, \ldots, x_n$ be discrete random variables from sample space $\mathcal{X}$. Then
$$H(x_1, \ldots, x_n) \ \le\ \frac{1}{n-1}\sum_{i=1}^{n} H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \qquad (7)$$

Proof  Note that $D(P_{X,Y}\|P_X \otimes P_Y) = H(X) - H(X|Y)$. Since we have already proved that relative entropy is non-negative, we have $H(X) \ge H(X|Y)$. Loosely speaking, this means that information (about some variable $Y$) can only reduce the entropy, or uncertainty, $H(X|Y)$, which makes intuitive sense. Now consider the entropy
$$H(x_1, \ldots, x_n) = H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + H(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$$
Since we have already seen that $H(X) \ge H(X|Y)$, applying this to the second term gives
$$H(x_1, \ldots, x_n) \ \le\ H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + H(x_i \mid x_1, \ldots, x_{i-1})$$
Summing both sides over $i = 1, \ldots, n$ we get
$$n H(x_1, \ldots, x_n) \ \le\ \sum_{i=1}^{n} H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + \sum_{i=1}^{n} H(x_i \mid x_1, \ldots, x_{i-1})$$
Now note that by the definition of conditional entropy, $H(X|Y) = H(X, Y) - H(Y)$. Extending this to many variables we get the chain rule of entropy:
$$H(x_1, \ldots, x_n) = \sum_{i=1}^{n} H(x_i \mid x_1, \ldots, x_{i-1})$$
Therefore, using this chain rule,
$$n H(x_1, \ldots, x_n) \ \le\ \sum_{i=1}^{n} H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + H(x_1, \ldots, x_n)$$
Therefore
$$H(x_1, \ldots, x_n) \ \le\ \frac{1}{n-1}\sum_{i=1}^{n} H(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$$
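Inequality (7) can be checked numerically. The sketch below (Python/NumPy; the random joint distribution over three binary variables is an arbitrary test case) compares the joint entropy against the normalized sum of the leave-one-out entropies:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random joint distribution over three binary variables.
p = rng.random((2, 2, 2))
p /= p.sum()

def entropy(dist):
    """Shannon entropy (in nats) of a joint pmf given as an array."""
    q = dist.ravel()
    q = q[q > 0]
    return -(q * np.log(q)).sum()

n = p.ndim
joint = entropy(p)
# Leave-one-out entropies H(x_1,...,x_{i-1},x_{i+1},...,x_n): marginalize out axis i.
leave_one_out = sum(entropy(p.sum(axis=i)) for i in range(n))

print("H(x_1,...,x_n)             =", joint)
print("(1/(n-1)) * sum of H^{(i)} =", leave_one_out / (n - 1))  # Han: >= joint entropy
```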
Now we shall prove Han's inequality for relative entropies. Let $x^n = (x_1, x_2, \ldots, x_n)$ be discrete random variables from sample space $\mathcal{X}$, just as in the previous case. Now let $P$ and $Q$ be probability distributions on the product space $\mathcal{X}^n$, and let $P$ be a distribution such that
$$\frac{dP(x_1, \ldots, x_n)}{dx^n} = \frac{dP_1(x_1)}{dx_1}\,\frac{dP_2(x_2)}{dx_2}\cdots\frac{dP_n(x_n)}{dx_n}$$
That is, the distribution $P$ assumes independence of the variables $(x_1, \ldots, x_n)$, with the probability density function of each variable $x_i$ given by $dP_i(x_i)/dx_i$. Let $x^{(i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. Now with this setting we shall state and prove Han's relative entropy inequality.

Theorem 4 Given any distribution $Q$ on the product space $\mathcal{X}^n$ and a distribution $P$ on $\mathcal{X}^n$ which assumes independence of the variables $(x_1, \ldots, x_n)$,
$$D(Q\|P) \ \ge\ \frac{1}{n-1}\sum_{i=1}^{n} D(Q^{(i)}\|P^{(i)}) \qquad (8)$$
where
$$Q^{(i)}(x^{(i)}) = \int_{\mathcal{X}} \frac{dQ(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\, dx_i$$
and
$$P^{(i)}(x^{(i)}) = \int_{\mathcal{X}} \frac{dP(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\, dx_i$$

Proof  By the definition of relative entropy,
$$D(Q\|P) = \int_{\mathcal{X}^n} dQ\,\log\frac{dQ}{dP} = \int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log\!\left(\frac{dQ}{dx^n}\right)dx^n - \int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log\!\left(\frac{dP}{dx^n}\right)dx^n \qquad (9)$$
In the above equation, consider the term $\int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log(\frac{dP}{dx^n})\,dx^n$. From our assumption about $P$ we know that
$$\frac{dP(x^n)}{dx^n} = \frac{dP_1(x_1)}{dx_1}\,\frac{dP_2(x_2)}{dx_2}\cdots\frac{dP_n(x_n)}{dx_n}$$
Now $P^{(i)}(x^{(i)}) = \int_{\mathcal{X}} \frac{dP(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\,dx_i$, therefore
$$\frac{dP(x^n)}{dx^n} = \frac{dP_i(x_i)}{dx_i}\,\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}$$
Using this for each $i$ and averaging over $i = 1, \ldots, n$ we get
$$\int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log\!\left(\frac{dP(x^n)}{dx^n}\right)dx^n = \frac{1}{n}\sum_{i=1}^{n}\left(\int_{\mathcal{X}^n} \frac{dQ}{dx^n}\left(\log\frac{dP_i(x_i)}{dx_i} + \log\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^n\right)$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left(\int_{\mathcal{X}^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\!\left(\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^{(i)}\right) + \frac{1}{n}\int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log\!\left(\frac{dP(x^n)}{dx^n}\right)dx^n$$
Rearranging the terms we get
$$\int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log\!\left(\frac{dP(x^n)}{dx^n}\right)dx^n = \frac{1}{n-1}\sum_{i=1}^{n}\left(\int_{\mathcal{X}^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\!\left(\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^{(i)}\right)$$
Now also note that by Han's inequality for entropy,
$$\int_{\mathcal{X}^n} \frac{dQ}{dx^n}\log\!\left(\frac{dQ(x^n)}{dx^n}\right)dx^n \ \ge\ \frac{1}{n-1}\sum_{i=1}^{n}\int_{\mathcal{X}^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\!\left(\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^{(i)}$$
Therefore, when we consider the relative entropy given by Equation (9), we get
$$D(Q\|P) \ \ge\ \frac{1}{n-1}\sum_{i=1}^{n}\left(\int_{\mathcal{X}^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\!\left(\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^{(i)} - \int_{\mathcal{X}^{n-1}} \frac{dQ^{(i)}}{dx^{(i)}}\log\!\left(\frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\right)dx^{(i)}\right)$$
Thus, finally simplifying, we get the required result
$$D(Q\|P) \ \ge\ \frac{1}{n-1}\sum_{i=1}^{n} D(Q^{(i)}\|P^{(i)})$$

4 Inequalities for Sums of Random Variables

4.1 Hoeffding's Inequality

Theorem 5 Let $x_1, \ldots, x_n$ be independent bounded random variables such that the random variable $x_i$ falls in the interval $[p_i, q_i]$. Then for any $a > 0$ we have
$$P\!\left(\sum_{i=1}^{n} x_i - E\!\left(\sum_{i=1}^{n} x_i\right) \ge a\right) \ \le\ e^{-2a^2/\sum_{i=1}^{n}(q_i - p_i)^2}$$

Proof  From the Chernoff bound given by (3) we get
$$P(x - E(x) \ge a) \ \le\ \frac{E\, e^{s(x - E(x))}}{e^{sa}}$$
Let $S_n = \sum_{i=1}^{n} x_i$. Therefore we have
$$P(S_n - E(S_n) \ge a) \ \le\ \frac{E\, e^{s(S_n - E(S_n))}}{e^{sa}} = e^{-sa}\prod_{i=1}^{n} E\, e^{s(x_i - E(x_i))} \qquad (10)$$
Now let $y$ be any random variable such that $p \le y \le q$ and $Ey = 0$. Then for any $s > 0$, due to the convexity of the exponential function, we have
$$e^{sy} \ \le\ \frac{y - p}{q - p}\,e^{sq} + \frac{q - y}{q - p}\,e^{sp}$$
Taking the expectation on both sides we get
$$E\, e^{sy} \ \le\ \frac{-p}{q - p}\,e^{sq} + \frac{q}{q - p}\,e^{sp}$$
Now let $\alpha = \frac{-p}{q - p}$. Therefore
$$E\, e^{sy} \ \le\ \left(\alpha\, e^{s(q-p)} + (1 - \alpha)\right)e^{-s\alpha(q-p)}$$
$$E\, e^{sy} \ \le\ e^{\log\left(\alpha e^{s(q-p)} + (1 - \alpha)\right) - s\alpha(q-p)}$$
$$E\, e^{sy} \ \le\ e^{\varphi(u)} \qquad (11)$$
where the function $\varphi(u) = \log(\alpha e^u + (1 - \alpha)) - u\alpha$ and $u = s(q - p)$. Now, using Taylor's theorem, we have for some $\eta \in [0, u]$,
$$\varphi(u) = \varphi(0) + u\varphi'(0) + \frac{u^2}{2}\varphi''(\eta) \qquad (12)$$
But we have $\varphi(0) = 0$ and
$$\varphi'(x) = \frac{\alpha e^x}{\alpha e^x + (1 - \alpha)} - \alpha$$
Therefore $\varphi'(0) = 0$. Now,
$$\varphi''(x) = \frac{\alpha(1 - \alpha)e^x}{\left(1 - \alpha + \alpha e^x\right)^2}$$
If we consider $\varphi''(x)$, we see that the function is maximized when
$$\varphi'''(x) = \frac{\alpha(1 - \alpha)e^x}{\left(1 - \alpha + \alpha e^x\right)^2} - \frac{2\alpha^2(1 - \alpha)e^{2x}}{\left(1 - \alpha + \alpha e^x\right)^3} = 0
\qquad\Longrightarrow\qquad e^x = \frac{1 - \alpha}{\alpha}$$
Therefore, for any $x$,
$$\varphi''(x) \ \le\ \frac{(1 - \alpha)^2}{\left(2(1 - \alpha)\right)^2} = \frac{1}{4}$$
Therefore, from (11) and (12), we have
$$E\, e^{sy} \ \le\ e^{u^2/8}$$
Therefore, for any $p \le y \le q$,
$$E\, e^{sy} \ \le\ e^{s^2(q - p)^2/8} \qquad (13)$$
Using this in (10) we get
$$P(S_n - E(S_n) \ge a) \ \le\ e^{-sa}\, e^{s^2\sum_{i=1}^{n}(q_i - p_i)^2/8}$$
Now we find the best bound by minimizing the right-hand side of the above inequality with respect to $s$:
$$\frac{d}{ds}\, e^{s^2\sum_i(q_i - p_i)^2/8\, -\, sa} = e^{s^2\sum_i(q_i - p_i)^2/8\, -\, sa}\left(\frac{s\sum_i(q_i - p_i)^2}{4} - a\right) = 0$$
Therefore, for the best bound we have
$$s = \frac{4a}{\sum_{i=1}^{n}(q_i - p_i)^2}$$
and correspondingly we get
$$P(S_n - E(S_n) \ge a) \ \le\ e^{-2a^2/\sum_{i=1}^{n}(q_i - p_i)^2} \qquad (14)$$

Now, an interesting result from Hoeffding's inequality, often used in learning theory, bounds not the difference between a sum and its corresponding expectation but the difference between the empirical average of a loss function and its expectation. This is done by using (14) as follows. Note that $\hat{E}_n\{x\} = \frac{S_n}{n}$ denotes the empirical average of $x$, and $\frac{E(S_n)}{n} = E\{x\}$. Therefore we have
$$P(S_n - E(S_n) \ge na) \ \le\ e^{-2(an)^2/\sum_i(q_i - p_i)^2}$$
$$P\!\left(\frac{S_n}{n} - \frac{E(S_n)}{n} \ge a\right) \ \le\ e^{-2(an)^2/\sum_i(q_i - p_i)^2}$$
$$P\!\left(\hat{E}_n(x) - E(x) \ge a\right) \ \le\ e^{-2n^2a^2/\sum_{i=1}^{n}(q_i - p_i)^2} \qquad (15)$$
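To get a feel for (15), here is a minimal Monte Carlo sketch (Python/NumPy; the Bernoulli(1/2) variables and the particular $n$ and $a$ are illustrative choices, not from the text). For variables in $[0, 1]$ we have $\sum_i (q_i - p_i)^2 = n$ and the bound reduces to $e^{-2na^2}$:

```python
import numpy as np

rng = np.random.default_rng(2)

n, a, trials = 100, 0.1, 200_000
# Bernoulli(1/2) samples: bounded in [p_i, q_i] = [0, 1] with mean 1/2.
x = rng.integers(0, 2, size=(trials, n))
emp_means = x.mean(axis=1)

tail = (emp_means - 0.5 >= a).mean()   # P(E_n(x) - E(x) >= a), estimated
bound = np.exp(-2 * n * a**2)          # (15) with sum_i (q_i - p_i)^2 = n

print("empirical tail :", tail)
print("Hoeffding bound:", bound)
```

The empirical tail sits well below the bound, since Hoeffding's inequality ignores the actual variance of the variables; the next section exploits exactly that slack.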
4.2 Bernstein's Inequality

Hoeffding's inequality does not use any knowledge about the distribution of the variables. Bernstein's inequality [7] uses the variance of the distribution to get a tighter bound.

Theorem 6 Let $x_1, x_2, \ldots, x_n$ be independent bounded random variables such that $E x_i = 0$ and $|x_i| \le \varsigma$ with probability $1$, and let $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}\{x_i\}$. Then for any $\epsilon > 0$ we have
$$P\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i \ge \epsilon\right) \ \le\ e^{-\frac{n\epsilon^2}{2\sigma^2 + 2\varsigma\epsilon/3}}$$

Proof  We need to re-derive the bound starting from (10). Let
$$F_i = \sum_{r=2}^{\infty} \frac{s^{r-2}\,E(x_i^r)}{r!\,\sigma_i^2}$$
where $\sigma_i^2 = E x_i^2$. Now $e^x = 1 + x + \sum_{r=2}^{\infty}\frac{x^r}{r!}$. Therefore
$$E\, e^{s x_i} = 1 + sE x_i + \sum_{r=2}^{\infty} \frac{s^r\, E(x_i^r)}{r!}$$
Since $E x_i = 0$ we have
$$E\, e^{s x_i} = 1 + F_i\, s^2\sigma_i^2 \ \le\ e^{F_i s^2 \sigma_i^2}$$
Consider the term $E x_i^r$. Since the expectation of a function is just the Lebesgue integral of the function with respect to the probability measure, we have $E x_i^r = \int x_i^{r-1}\, x_i\, dP$. Using Schwarz's inequality we get
$$E x_i^r = \int x_i^{r-1}\, x_i\, dP \ \le\ \left(\int x_i^{2(r-1)}\, dP\right)^{1/2}\left(\int x_i^2\, dP\right)^{1/2}$$
$$E x_i^r \ \le\ \sigma_i \left(\int x_i^{2(r-1)}\, dP\right)^{1/2}$$
Proceeding to use Schwarz's inequality recursively $m$ times we get
$$E x_i^r \ \le\ \sigma_i^{2\left(1 - 2^{-m}\right)}\left(\int x_i^{2^m(r-2)+2}\, dP\right)^{2^{-m}}$$
Now we know that $|x_i| \le \varsigma$. Therefore
$$\left(\int x_i^{2^m(r-2)+2}\, dP\right)^{2^{-m}} \ \le\ \left(\varsigma^{2^m(r-2)+2}\right)^{2^{-m}}$$
Hence we get
$$E x_i^r \ \le\ \sigma_i^{2\left(1 - 2^{-m}\right)}\,\varsigma^{(r-2) + 2^{1-m}}$$
Taking the limit as $m$ goes to infinity we get
$$E x_i^r \ \le\ \lim_{m\to\infty} \sigma_i^{2\left(1 - 2^{-m}\right)}\,\varsigma^{(r-2) + 2^{1-m}}$$
Therefore
$$E x_i^r \ \le\ \sigma_i^2\,\varsigma^{r-2} \qquad (16)$$
Therefore
$$F_i = \sum_{r=2}^{\infty}\frac{s^{r-2}\,E(x_i^r)}{r!\,\sigma_i^2} \ \le\ \sum_{r=2}^{\infty}\frac{s^{r-2}\,\varsigma^{r-2}}{r!} = \frac{e^{s\varsigma} - s\varsigma - 1}{s^2\varsigma^2}$$
Applying this bound on $F_i$ to the inequality $E\, e^{sx_i} \le e^{F_i s^2\sigma_i^2}$ above, we get
$$E\, e^{s x_i} \ \le\ e^{\sigma_i^2\left(e^{s\varsigma} - s\varsigma - 1\right)/\varsigma^2}$$
Now, using (10) and the fact that $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2$, we get
$$P(S_n \ge a) \ \le\ e^{-sa}\, e^{n\sigma^2\left(e^{s\varsigma} - s\varsigma - 1\right)/\varsigma^2} \qquad (17)$$
Now, to obtain the tightest bound, we minimize the right-hand side with respect to $s$:
$$\frac{d}{ds}\, e^{n\sigma^2\left(e^{s\varsigma} - s\varsigma - 1\right)/\varsigma^2\, -\, sa} = e^{n\sigma^2\left(e^{s\varsigma} - s\varsigma - 1\right)/\varsigma^2\, -\, sa}\left(\frac{n\sigma^2\left(\varsigma e^{s\varsigma} - \varsigma\right)}{\varsigma^2} - a\right) = 0$$
Therefore, to get the tightest bound, we have
$$e^{s\varsigma} - 1 = \frac{a\varsigma}{n\sigma^2}$$
Therefore we have
$$s = \frac{1}{\varsigma}\log\!\left(\frac{a\varsigma}{n\sigma^2} + 1\right)$$
Using this $s$ in (17) we get
$$P(S_n \ge a) \ \le\ e^{\frac{n\sigma^2}{\varsigma^2}\left(\frac{a\varsigma}{n\sigma^2} - \log\left(\frac{a\varsigma}{n\sigma^2}+1\right)\right) - \frac{a}{\varsigma}\log\left(\frac{a\varsigma}{n\sigma^2}+1\right)}
= e^{-\frac{n\sigma^2}{\varsigma^2}\left(\left(1 + \frac{a\varsigma}{n\sigma^2}\right)\log\left(\frac{a\varsigma}{n\sigma^2}+1\right) - \frac{a\varsigma}{n\sigma^2}\right)}$$
Let $H(x) = (1 + x)\log(1 + x) - x$. Therefore we get
$$P(S_n \ge a) \ \le\ e^{-\frac{n\sigma^2}{\varsigma^2} H\left(\frac{a\varsigma}{n\sigma^2}\right)} \qquad (18)$$
This is called Bennett's inequality [6]. We can derive Bernstein's inequality by further bounding the function $H(x)$. Let $G(x) = \frac{3}{2}\cdot\frac{x^2}{x + 3}$. We see that $H(0) = G(0) = H'(0) = G'(0) = 0$, and that
$$H''(x) = \frac{1}{x + 1}, \qquad G''(x) = \frac{27}{(x + 3)^3}$$
For all $x \ge 0$ we have $(x+3)^3 \ge 27(1+x)$, so $H''(x) \ge G''(x)$. Therefore, as a consequence of Taylor's theorem (integrating twice from $0$), we have
$$H(x) \ \ge\ G(x) \qquad \forall x \ge 0$$
Therefore, applying this to (18), we get
$$P\!\left(\sum_{i=1}^{n} x_i \ge a\right) \ \le\ e^{-\frac{n\sigma^2}{\varsigma^2} G\left(\frac{a\varsigma}{n\sigma^2}\right)} = e^{-\frac{3a^2}{2\left(a\varsigma + 3n\sigma^2\right)}}$$
Now let $a = n\epsilon$. Therefore
$$P\!\left(\sum_{i=1}^{n} x_i \ge n\epsilon\right) \ \le\ e^{-\frac{3n\epsilon^2}{2\left(\epsilon\varsigma + 3\sigma^2\right)}}$$
Therefore we get
$$P\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i \ge \epsilon\right) \ \le\ e^{-\frac{n\epsilon^2}{2\sigma^2 + 2\varsigma\epsilon/3}} \qquad (19)$$
An interesting phenomenon here is that if $\sigma^2 < \epsilon$, then the upper bound decays as $e^{-n\epsilon}$ rather than the $e^{-n\epsilon^2}$ suggested by Hoeffding's inequality (14).
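The gain over Hoeffding's inequality when the variance is small is easy to see numerically. A small sketch (Python/NumPy; the values of $n$, $\epsilon$, $\sigma^2$, and $\varsigma$ are illustrative assumptions with $\sigma^2 < \epsilon$) comparing the two bounds:

```python
import numpy as np

n, eps, sigma2, varsigma = 1000, 0.05, 0.01, 1.0   # sigma^2 << eps: Bernstein should win

# Hoeffding (14) for centered x_i in [-1, 1]: sum_i (q_i - p_i)^2 = 4n, a = n * eps.
hoeffding = np.exp(-2 * (n * eps) ** 2 / (4 * n))  # = exp(-n eps^2 / 2)

# Bernstein (19), which uses the variance sigma^2.
bernstein = np.exp(-n * eps**2 / (2 * sigma2 + 2 * varsigma * eps / 3))

print("Hoeffding bound:", hoeffding)   # ~ exp(-n eps^2 / 2)
print("Bernstein bound:", bernstein)   # orders of magnitude smaller when sigma^2 < eps
```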
5 Inequalities for Functions of Random Variables

5.1 Efron-Stein's Inequality

Till now we only considered sums of random variables. Now we shall consider functions of random variables. The Efron-Stein inequality [9] is one of the tightest bounds known.

Theorem 7 Let $S : \mathcal{X}^n \to \mathbb{R}$ be a measurable function which is invariant under permutation, and let the random variable $Z$ be given by $Z = S(x_1, x_2, \ldots, x_n)$. Then we have
$$\mathrm{Var}(Z) \ \le\ \frac{1}{2}\sum_{i=1}^{n} E\!\left[(Z - Z_i')^2\right]$$
where $Z_i' = S(x_1, \ldots, x_i', \ldots, x_n)$ and $\{x_1', \ldots, x_n'\}$ is another sample from the same distribution as $\{x_1, \ldots, x_n\}$.

Proof  Let $E_i Z = E[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]$ and let $V = Z - EZ$. Now if we define $V_i$ as
$$V_i = E[Z \mid x_1, \ldots, x_i] - E[Z \mid x_1, \ldots, x_{i-1}], \qquad i = 1, \ldots, n$$
then $V = \sum_{i=1}^{n} V_i$ and
$$\mathrm{Var}(Z) = EV^2 = E\!\left[\left(\sum_{i=1}^{n} V_i\right)^2\right] = E\!\left[\sum_{i=1}^{n} V_i^2\right] + 2E\!\left[\sum_{i > j} V_i V_j\right]$$
Now $E[XY] = E[E[XY \mid Y]] = E[Y\, E[X \mid Y]]$, therefore $E[V_j V_i] = E[V_j\, E[V_i \mid x_1, \ldots, x_j]]$. But since $i > j$, $E[V_i \mid x_1, \ldots, x_j] = 0$. Therefore we have
$$\mathrm{Var}(Z) = E\!\left[\sum_{i=1}^{n} V_i^2\right] = \sum_{i=1}^{n} E[V_i^2]$$
Now let $E_{x_i^j}$ represent expectation with respect to the variables $\{x_i, \ldots, x_j\}$. Then
$$\mathrm{Var}(Z) = \sum_{i=1}^{n} E\!\left[\left(E[Z \mid x_1, \ldots, x_i] - E[Z \mid x_1, \ldots, x_{i-1}]\right)^2\right]$$
$$= \sum_{i=1}^{n} E_{x_1^i}\!\left[\left(E_{x_{i+1}^n}[Z \mid x_1, \ldots, x_i] - E_{x_i^n}[Z \mid x_1, \ldots, x_{i-1}]\right)^2\right]$$
$$= \sum_{i=1}^{n} E_{x_1^i}\!\left[\left(E_{x_{i+1}^n}[Z \mid x_1, \ldots, x_i] - E_{x_{i+1}^n}\!\left[E_{x_i}[Z \mid x_1, \ldots, x_{i-1}]\right]\right)^2\right]$$
However, $x^2$ is a convex function, and hence we can apply Jensen's inequality (6) to pull the square inside the inner expectation, giving
$$\mathrm{Var}(Z) \ \le\ \sum_{i=1}^{n} E_{x_1^i,\, x_{i+1}^n}\!\left[\left(Z - E_i[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]\right)^2\right]$$
Therefore
$$\mathrm{Var}(Z) \ \le\ \sum_{i=1}^{n} E\!\left[(Z - E_i[Z])^2\right]$$
where $E_i[Z] = E[Z \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]$. Now let $x$ and $y$ be independent samples from the same distribution. Then
$$\frac{1}{2}E\!\left[(x - y)^2\right] = \frac{1}{2}E\!\left[x^2 + y^2 - 2xy\right] = E[x^2] - (E[x])^2$$
Hence, if $x$ and $y$ are i.i.d., then $\mathrm{Var}\{x\} = \frac{1}{2}E[(x - y)^2]$. Thus we have
$$E_i\!\left[(Z - E_i[Z])^2\right] = \frac{1}{2}E_i\!\left[(Z - Z_i')^2\right]$$
Thus we have the Efron-Stein inequality
$$\mathrm{Var}(Z) \ \le\ \frac{1}{2}\sum_{i=1}^{n} E\!\left[(Z - Z_i')^2\right] \qquad (20)$$
Notice that if the function $S$ is the sum of the random variables, the inequality becomes an equality; hence the bound is tight. It is often referred to as the jackknife bound.
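Both sides of (20) can be estimated by Monte Carlo. The sketch below (Python/NumPy; choosing $S$ to be the maximum of the sample is an arbitrary illustrative choice of a permutation-invariant function) estimates the variance of $Z$ and the Efron-Stein bound:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 20, 100_000

x = rng.random((trials, n))          # samples x_1,...,x_n ~ Uniform[0,1]
x_prime = rng.random((trials, n))    # independent copies x_1',...,x_n'

z = x.max(axis=1)                    # Z = S(x_1,...,x_n) = max_i x_i

# Right-hand side of (20): (1/2) * sum_i E[(Z - Z_i')^2],
# where Z_i' replaces coordinate i of the sample with x_i'.
rhs = 0.0
for i in range(n):
    xi = x.copy()
    xi[:, i] = x_prime[:, i]
    z_i = xi.max(axis=1)
    rhs += 0.5 * np.mean((z - z_i) ** 2)

print("Var(Z)            ~", z.var())
print("Efron-Stein bound ~", rhs)    # should dominate Var(Z)
```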
5.2 McDiarmid's Inequality

Theorem 8 Let $S : \mathcal{X}^n \to \mathbb{R}$ be a measurable function which is invariant under permutation, and let the random variable $Z$ be given by $Z = S(x_1, x_2, \ldots, x_n)$. Then for any $a > 0$ we have
$$P(|Z - E[Z]| \ge a) \ \le\ 2\, e^{-2a^2/\sum_{i=1}^{n}\varsigma_i^2}$$
whenever the function has bounded differences [10], that is,
$$\sup_{x_1, \ldots, x_n,\, x_i'} \left|S(x_1, x_2, \ldots, x_n) - S(x_1, \ldots, x_i', \ldots, x_n)\right| \ \le\ \varsigma_i$$
where $Z_i' = S(x_1, \ldots, x_i', \ldots, x_n)$ and $\{x_1', \ldots, x_n'\}$ is a sample from the same distribution as $\{x_1, \ldots, x_n\}$.

Proof  Using the Chernoff bound (3) we get
$$P(Z - E[Z] \ge a) \ \le\ e^{-sa}\, E\!\left[e^{s(Z - E[Z])}\right]$$
Now let
$$V_i = E[Z \mid x_1, \ldots, x_i] - E[Z \mid x_1, \ldots, x_{i-1}], \qquad i = 1, \ldots, n$$
so that $V = \sum_{i=1}^{n} V_i = Z - E[Z]$. Therefore
$$P(Z - E[Z] \ge a) \ \le\ e^{-sa}\, E\!\left[e^{s\sum_{i=1}^{n} V_i}\right] = e^{-sa}\prod_{i=1}^{n} E\!\left[e^{sV_i}\right] \qquad (21)$$
(the factorization follows by iterated conditioning, bounding each conditional expectation $E[e^{sV_i} \mid x_1, \ldots, x_{i-1}]$ in turn). Now let $V_i$ be bounded by the interval $[L_i, U_i]$. We know that $|Z - Z_i'| \le \varsigma_i$; hence it follows that the range of $V_i$ is at most $\varsigma_i$, and so $U_i - L_i \le \varsigma_i$. Using (13) on $E[e^{sV_i}]$ we get
$$E\!\left[e^{sV_i}\right] \ \le\ e^{s^2(U_i - L_i)^2/8} \ \le\ e^{s^2\varsigma_i^2/8}$$
Using this in (21) we get
$$P(Z - E[Z] \ge a) \ \le\ e^{-sa}\prod_{i=1}^{n} e^{s^2\varsigma_i^2/8} = e^{s^2\sum_{i=1}^{n}\varsigma_i^2/8\, -\, sa}$$
Now, to make the bound tight, we simply minimize it with respect to $s$. To do that,
$$\frac{s\sum_{i=1}^{n}\varsigma_i^2}{4} - a = 0 \qquad\Longrightarrow\qquad s = \frac{4a}{\sum_{i=1}^{n}\varsigma_i^2}$$
Therefore the bound is given by substituting this $s$ back into the exponent:
$$\frac{s^2\sum_{i=1}^{n}\varsigma_i^2}{8} - sa = \frac{2a^2}{\sum_{i=1}^{n}\varsigma_i^2} - \frac{4a^2}{\sum_{i=1}^{n}\varsigma_i^2}$$
Hence we get
$$P(Z - E[Z] \ge a) \ \le\ e^{-2a^2/\sum_{i=1}^{n}\varsigma_i^2}$$
$$P(|Z - E[Z]| \ge a) \ \le\ 2\, e^{-2a^2/\sum_{i=1}^{n}\varsigma_i^2} \qquad (22)$$

5.3 Logarithmic Sobolev Inequality

This inequality is very useful in deriving simple proofs for many known bounds, using a method known as Ledoux's method [11]. Before jumping into the theorem, let us first prove a useful lemma.

Lemma 9 For any positive random variable $Y$ and any $\alpha > 0$ we have
$$E\{Y\log Y\} - EY\log EY \ \le\ E\{Y\log Y - Y\log\alpha - (Y - \alpha)\} \qquad (23)$$

Proof  For any $x > 0$ we have $\log x \le x - 1$. Therefore
$$\log\frac{\alpha}{EY} \ \le\ \frac{\alpha}{EY} - 1$$
Multiplying through by $EY > 0$,
$$EY\log\alpha - EY\log EY \ \le\ \alpha - EY$$
Adding $E\{Y\log Y\}$ to both sides we get
$$E\{Y\log Y\} + EY\log\alpha - EY\log EY \ \le\ E\{Y\log Y\} + \alpha - EY$$
Therefore, simplifying, we get the required result
$$E\{Y\log Y\} - EY\log EY \ \le\ E\{Y\log Y - Y\log\alpha - (Y - \alpha)\}$$

Now, just as in the previous two sections, let $S : \mathcal{X}^n \to \mathbb{R}$ be a measurable function which is invariant under permutation, and let the random variable $Z$ be given by $Z = S(x_1, x_2, \ldots, x_n)$. Here we assume independence of $(x_1, x_2, \ldots, x_n)$. Let $Z_i' = S(x_1, \ldots, x_i', \ldots, x_n)$, where $\{x_1', \ldots, x_n'\}$ is another sample from the same distribution as $\{x_1, \ldots, x_n\}$. Now we are ready to state and prove the logarithmic Sobolev inequality.
Theorem 10 If $\psi(x) = e^x - x - 1$, then
$$sE\{Ze^{sZ}\} - E\{e^{sZ}\}\log E\{e^{sZ}\} \ \le\ \sum_{i=1}^{n} E\!\left\{e^{sZ}\,\psi\!\left(-s(Z - Z_i')\right)\right\} \qquad (24)$$

Proof  From Lemma 9 (Equation (23)), applied conditionally for any positive variable $Y$ with $\alpha = Y_i > 0$, we have
$$E_i\{Y\log Y\} - E_iY\log E_iY \ \le\ E_i\{Y\log Y - Y\log Y_i - (Y - Y_i)\}$$
Now let $Y = e^{sZ}$ and $Y_i = e^{sZ_i'}$; then
$$E_i\{Y\log Y\} - E_iY\log E_iY \ \le\ E_i\!\left\{e^{sZ}(sZ - sZ_i') - e^{sZ}\!\left(1 - e^{-s(Z - Z_i')}\right)\right\}$$
Writing this in terms of the function $\psi(x)$ we get
$$E_i\{Y\log Y\} - E_iY\log E_iY \ \le\ E_i\!\left\{e^{sZ}\,\psi\!\left(-s(Z - Z_i')\right)\right\} \qquad (25)$$
Now let the measure $P$ denote the distribution of $(x_1, \ldots, x_n)$ and let the distribution $Q$ be given by
$$dQ(x_1, \ldots, x_n) = \frac{Y(x_1, \ldots, x_n)}{EY}\, dP(x_1, \ldots, x_n)$$
where the normalization by $EY$ makes $Q$ a probability measure. Then we have
$$D(Q\|P) = E_Q\!\left\{\log\frac{Y}{EY}\right\} = E_P\!\left\{\frac{Y}{EY}\log\frac{Y}{EY}\right\}$$
and therefore, since $Y$ is positive,
$$EY \cdot D(Q\|P) = E\{Y\log Y\} - EY\log EY \qquad (26)$$
However, by Han's inequality for relative entropy, rearranging Equation (8), we have
$$D(Q\|P) \ \le\ \sum_{i=1}^{n}\left(D(Q\|P) - D(Q^{(i)}\|P^{(i)})\right) \qquad (27)$$
Now, by definition,
$$\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}} = \int_{\mathcal{X}} \frac{dQ(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\, dx_i
= \frac{1}{EY}\int_{\mathcal{X}} Y(x_1, \ldots, x_n)\,\frac{dP(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{dx^n}\, dx_i$$
Due to the independence assumption on the sample, we can rewrite the above as
$$\frac{dQ^{(i)}(x^{(i)})}{dx^{(i)}} = \frac{1}{EY}\,\frac{dP(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)}{dx^{(i)}}\int_{\mathcal{X}} Y(x_1, \ldots, x_n)\, dP_i(x_i)
= \frac{dP^{(i)}(x^{(i)})}{dx^{(i)}}\,\frac{E_iY}{EY}$$
Therefore $D(Q^{(i)}\|P^{(i)})$ is given by
$$D(Q^{(i)}\|P^{(i)}) = \int_{\mathcal{X}^{n-1}} \frac{E_iY}{EY}\log\frac{E_iY}{EY}\, dP^{(i)} = \frac{1}{EY}\left(E\{E_iY\log E_iY\} - EY\log EY\right)$$
Using this, together with $E\{E_i\{Y\log Y\}\} = E\{Y\log Y\}$, we get
$$EY\left(D(Q\|P) - D(Q^{(i)}\|P^{(i)})\right) = E\{E_i\{Y\log Y\}\} - E\{E_iY\log E_iY\}$$
Therefore, using this in Equation (27) and multiplying through by $EY$, we get
$$EY \cdot D(Q\|P) \ \le\ \sum_{i=1}^{n}\left(E\{E_i\{Y\log Y\}\} - E\{E_iY\log E_iY\}\right)$$
and using this in turn with Equation (26) we get
$$E\{Y\log Y\} - EY\log EY \ \le\ \sum_{i=1}^{n} E\!\left\{E_i\{Y\log Y\} - E_iY\log E_iY\right\}$$
Using the above with Equation (25) and substituting $Y = e^{sZ}$, we get the required result:
$$sE\{Ze^{sZ}\} - E\{e^{sZ}\}\log E\{e^{sZ}\} \ \le\ \sum_{i=1}^{n} E\!\left\{e^{sZ}\,\psi\!\left(-s(Z - Z_i')\right)\right\}$$

6 Symmetrization Lemma

The symmetrization lemma is probably one of the easier bounds we review in these notes. However, it is extremely powerful, since it allows us to bound the difference between the empirical mean of a function and its expected value using the difference between the empirical means of the function
for independent samples of the same size as the original sample. Note that in most of the literature the symmetrization lemma is stated and proved only for zero-one functions, such as a loss function or the classification function itself. Here we derive a more general version, where we prove the lemma for bounding the difference between the expectation of any measurable function with bounded variance and its empirical mean.

Lemma 11 Let $f : \mathcal{X} \to \mathbb{R}$ be a measurable function such that $\mathrm{Var}\{f\} \le C$. Let $\hat{E}_n\{f\}$ be the empirical mean of the function $f(x)$ estimated using a set $(x_1, x_2, \ldots, x_n)$ of $n$ independent identically distributed samples from the space $\mathcal{X}$. Then for any $a > 0$, if $n > \frac{8C}{a^2}$, we have
$$P\!\left(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right) \ \ge\ \frac{1}{2}\, P\!\left(|E\{f\} - \hat{E}_n\{f\}| > a\right) \qquad (28)$$
where $\hat{E}_n\{f\}$ and $\hat{E}_n'\{f\}$ stand for the empirical means of the function $f(x)$ estimated using the samples $(x_1, x_2, \ldots, x_n)$ and $(x_1', x_2', \ldots, x_n')$ respectively.

Proof  By the definition of probability,
$$P\!\left(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right) = \int_{\mathcal{X}^{2n}} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP$$
where the indicator function $\mathbf{1}[z]$ is $1$ whenever the condition $z$ holds and $0$ otherwise. Since $\mathcal{X}^{2n}$ is the product space $\mathcal{X}^n \times \mathcal{X}^n$, using Fubini's theorem we have
$$P\!\left(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right) = \int_{\mathcal{X}^n}\int_{\mathcal{X}^n} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP\, dP'$$
Now, since the set $Y = \{(x_1, x_2, \ldots, x_n) : |E\{f\} - \hat{E}_n\{f\}| > a\}$ is a subset of $\mathcal{X}^n$ and the term inside the integral is always non-negative,
$$P\!\left(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right) \ \ge\ \int_{Y}\int_{\mathcal{X}^n} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP'\, dP$$
Now let $Z = \{(x_1', x_2', \ldots, x_n') : |\hat{E}_n'\{f\} - E\{f(x)\}| \le \frac{a}{2}\}$. Clearly, for any sample, if $(x_1, x_2, \ldots, x_n) \in Y$ and $(x_1', x_2', \ldots, x_n') \in Z$, then by the triangle inequality $|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}$. Therefore, coming back to the integral, since $Z \subseteq \mathcal{X}^n$,
$$\int_{Y}\int_{\mathcal{X}^n} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP'\, dP \ \ge\ \int_{Y}\int_{Z} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP'\, dP$$
Now, since the integrand equals $1$ on $Y \times Z$ as we saw above,
$$\int_{Y}\int_{Z} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP'\, dP = \int_{Y}\int_{Z} dP'\, dP = \int_{Y} P\!\left(|\hat{E}_n'\{f\} - E\{f\}| \le \frac{a}{2}\right) dP$$
Therefore
$$\int_{Y}\int_{\mathcal{X}^n} \mathbf{1}\!\left[|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right] dP'\, dP \ \ge\ \int_{Y} P\!\left(|\hat{E}_n'\{f\} - E\{f\}| \le \frac{a}{2}\right) dP$$
Now,
$$P\!\left(|\hat{E}_n'\{f\} - E\{f\}| \le \frac{a}{2}\right) = 1 - P\!\left(|\hat{E}_n'\{f\} - E\{f\}| > \frac{a}{2}\right)$$
Using Equation (2) (often called Chebyshev's inequality) we get
$$P\!\left(|\hat{E}_n'\{f\} - E\{f\}| > \frac{a}{2}\right) \ \le\ \frac{4\,\mathrm{Var}\{f\}}{na^2} \ \le\ \frac{4C}{na^2}$$
Now, if we choose $n$ such that $n > \frac{8C}{a^2}$, as per our assumption, then $\frac{4C}{na^2} < \frac{1}{2}$. Therefore
$$P\!\left(|\hat{E}_n'\{f\} - E\{f\}| \le \frac{a}{2}\right) \ \ge\ \frac{1}{2}$$
Putting this back into the integral we get
$$P\!\left(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right) \ \ge\ \frac{1}{2}\int_{Y} dP$$
Therefore we get the final result
$$P\!\left(|\hat{E}_n\{f\} - \hat{E}_n'\{f\}| \ge \frac{a}{2}\right) \ \ge\ \frac{1}{2}\, P\!\left(|E\{f\} - \hat{E}_n\{f\}| > a\right)$$

Note that if we make the function $f$ a zero-one function, the maximum possible variance is $\frac{1}{4}$. Hence, if we set $C = \frac{1}{4}$, the condition under which the inequality holds becomes $n > \frac{2}{a^2}$. Further note that if we choose the zero-one function $f(x)$ such that it is $1$ when, say, $x$ takes a particular value and $0$ if not, then the result basically bounds the absolute difference between the probability of the event that $x$ takes that value and the frequency estimate of that probability from a sample of size $n$, using the difference between the frequencies of occurrence of the value in two independent samples of size $n$.
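Finally, a quick simulation of Lemma 11 (a Python/NumPy sketch; taking $f(x) = x$ on Uniform$[0,1]$ samples, with $C = \mathrm{Var}\{f\} = 1/12$ and illustrative $n$ and $a$ satisfying $n > 8C/a^2$):

```python
import numpy as np

rng = np.random.default_rng(4)

n, a, trials = 500, 0.05, 20_000
C = 1.0 / 12.0                      # Var{f} for f(x) = x, x ~ Uniform[0,1]
assert n > 8 * C / a**2             # condition of Lemma 11

x = rng.random((trials, n))         # original samples
x_prime = rng.random((trials, n))   # independent "ghost" samples

emp = x.mean(axis=1)                # empirical means E_n{f}
emp_prime = x_prime.mean(axis=1)    # empirical means E'_n{f}

lhs = (np.abs(emp - emp_prime) >= a / 2).mean()
rhs = 0.5 * (np.abs(0.5 - emp) > a).mean()   # E{f} = 1/2 for this f

print("P(|E_n - E'_n| >= a/2)    ~", lhs)
print("(1/2) P(|E{f} - E_n| > a) ~", rhs)    # Lemma 11: lhs >= rhs
```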
References

[1] V.N. Vapnik and A.Ya. Chervonenkis, On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, Theory of Probability and its Applications, 16(2):264-280, 1971.

[2] Gábor Lugosi, Concentration-of-Measure Inequalities, Lecture Notes, lugosi/anu.pdf

[3] A.N. Kolmogorov, Foundations of Probability, Chelsea Publications, NY, 1956.

[4] Richard L. Wheeden and Antoni Zygmund, Measure and Integral: An Introduction to Real Analysis, International Series of Monographs in Pure and Applied Mathematics.

[5] W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, 58:13-30, 1963.

[6] G. Bennett, Probability Inequalities for the Sum of Independent Random Variables, Journal of the American Statistical Association, 57:33-45, 1962.

[7] S.N. Bernstein, The Theory of Probabilities, Gastehizdat Publishing House, Moscow, 1946.

[8] V.V. Petrov, Sums of Independent Random Variables, translated by A.A. Brown, Springer-Verlag, 1975.

[9] B. Efron and C. Stein, The Jackknife Estimate of Variance, Annals of Statistics, 9:586-596, 1981.

[10] C. McDiarmid, On the Method of Bounded Differences, in Surveys in Combinatorics, LMS Lecture Note Series 141, 1989.

[11] Michel Ledoux, The Concentration of Measure Phenomenon, American Mathematical Society, Mathematical Surveys and Monographs, Vol. 89, 2001.

[12] Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi, Introduction to Statistical Learning Theory, Lecture Notes, lugosi/mlss slt.pdf