Concentration of Measure with Applications in Information Theory, Communications, and Coding


1 Concentration of Measure with Applications in Information Theory, Communications, and Coding
Maxim Raginsky (UIUC) and Igal Sason (Technion)
A Tutorial, Presented at the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, June 2015
Part 2 of 2

2 The Plan
1. Prelude: the Chernoff bound
2. The entropy method
3. Logarithmic Sobolev inequalities
4. Transportation-cost inequalities
5. Some applications

3 Given: $X_1, X_2, \ldots, X_n$ independent random variables, and $Z = f(X^n)$ for some real-valued $f$.
Problem: derive sharp bounds on the deviation probabilities $P[Z - EZ \ge t]$ for $t \ge 0$.
Benchmark: $X_1, \ldots, X_n$ i.i.d. $N(0, \sigma^2)$, $Z = f(X^n) = \frac{1}{n}(X_1 + \cdots + X_n)$ (sample mean), for which $P[Z \ge t] \le \exp\!\left(-\frac{nt^2}{2\sigma^2}\right)$.
Goal: extend to other distributions besides Gaussians, and extend to nonlinear $f$.

4 Prelude: The Chernoff Bound

5 Review: The Chernoff Bound
Define the logarithmic moment-generating function $\psi(\lambda) \triangleq \log E[e^{\lambda(Z - EZ)}]$ and its Legendre dual $\psi^*(t) \triangleq \sup_{\lambda \ge 0}\{\lambda t - \psi(\lambda)\}$.
By Markov's inequality,
$$P[Z - EZ \ge t] = P\big[e^{\lambda(Z - EZ)} \ge e^{\lambda t}\big] \le e^{-\lambda t} E\big[e^{\lambda(Z - EZ)}\big] = e^{-\{\lambda t - \psi(\lambda)\}}.$$
Optimize over $\lambda$: $P[Z - EZ \ge t] \le e^{-\psi^*(t)}$.
Sanity check: $Z \sim N(0, \sigma^2)$ gives $\psi(\lambda) = \frac{\lambda^2\sigma^2}{2}$ and $\psi^*(t) = \frac{t^2}{2\sigma^2}$, so $P[Z \ge t] \le e^{-t^2/2\sigma^2}$.
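To make the sanity check concrete, here is a minimal numerical illustration (not part of the original slides); it assumes only NumPy, and the variance and thresholds are arbitrary choices.

```python
# Compare the optimized Chernoff bound exp(-t^2 / (2 sigma^2)) with a Monte
# Carlo estimate of the Gaussian tail P[Z >= t] for Z ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
samples = rng.normal(0.0, sigma, size=1_000_000)

for t in [1.0, 2.0, 3.0]:
    empirical_tail = np.mean(samples >= t)           # Monte Carlo estimate of P[Z >= t]
    chernoff_bound = np.exp(-t**2 / (2 * sigma**2))  # e^{-psi*(t)} for the Gaussian
    print(f"t={t}: empirical {empirical_tail:.4f} <= Chernoff {chernoff_bound:.4f}")
```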

6 Chernoff Bound: Subgaussian Random Variables
Definition. A real-valued r.v. $Z$ is $\sigma^2$-subgaussian if $\psi(\lambda) \le \frac{\lambda^2\sigma^2}{2}$.
Immediate: if $Z$ is $\sigma^2$-subgaussian, then
$$\psi^*(t) = \sup_{\lambda \ge 0}\{\lambda t - \psi(\lambda)\} \ge \sup_{\lambda \ge 0}\big\{\lambda t - \lambda^2\sigma^2/2\big\} = \frac{t^2}{2\sigma^2},$$
giving the Gaussian tail bound $P[Z - EZ \ge t] \le e^{-t^2/2\sigma^2}$.
How do we establish subgaussianity?

7 Review: Hoeffding's Lemma
Any almost surely bounded r.v. is subgaussian: if there exist $-\infty < a \le b < \infty$ such that $Z \in [a, b]$ a.s., then $\psi(\lambda) \le \frac{\lambda^2 (b-a)^2}{8}$. (Wassily Hoeffding)
Corollary. If $Z \in [a, b]$ a.s., then $P[Z - EZ \ge t] \le \exp\!\left(-\frac{2t^2}{(b-a)^2}\right)$.
Proof (of Corollary). By Hoeffding's lemma, $Z$ is subgaussian with $\sigma^2 = (b-a)^2/4$.
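A quick numerical companion to the corollary (not from the slides); it assumes NumPy, and the Beta distribution on $[0,1]$ is just one arbitrary bounded example.

```python
# For a bounded variable Z in [a, b] = [0, 1], compare P[Z - EZ >= t] estimated
# by Monte Carlo with the Hoeffding bound exp(-2 t^2 / (b - a)^2).
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.0, 1.0
z = rng.beta(2.0, 5.0, size=1_000_000)  # any distribution supported on [0, 1]
mean = z.mean()

for t in [0.1, 0.2, 0.3]:
    empirical = np.mean(z - mean >= t)
    hoeffding = np.exp(-2 * t**2 / (b - a)**2)
    print(f"t={t}: empirical {empirical:.4f} <= Hoeffding {hoeffding:.4f}")
```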

8 Hoeffding's Lemma: An Alternative Proof
1. Assume without loss of generality that $EZ = 0$.
2. Compute the first two derivatives of $\psi$: $\psi'(\lambda) = \frac{E[Z e^{\lambda Z}]}{E[e^{\lambda Z}]}$, $\psi''(\lambda) = \frac{E[Z^2 e^{\lambda Z}]}{E[e^{\lambda Z}]} - \left(\frac{E[Z e^{\lambda Z}]}{E[e^{\lambda Z}]}\right)^2$.
3. Tilted distribution: $P = \mathcal{L}(Z)$, $\frac{dQ}{dP}(Z) = \frac{e^{\lambda Z}}{E_P[e^{\lambda Z}]}$. Then $\psi'(\lambda) = E_Q[Z]$ and $\psi''(\lambda) = \mathrm{Var}_Q[Z]$.
4. $Z \in [a, b]$ $P$-a.s. $\Rightarrow$ $Z \in [a, b]$ $Q$-a.s. $\Rightarrow$ $\mathrm{Var}_Q[Z] \le \frac{(b-a)^2}{4}$.
5. Calculus: $\psi(\lambda) = \int_0^\lambda \int_0^\tau \psi''(\rho)\, d\rho\, d\tau \le \frac{\lambda^2 (b-a)^2}{8}$.

9 The Entropy Method

10 Exponential Tilting and (Relative) Entropy
Back to our setting: $Z = f(X)$, $X$ an arbitrary r.v.
We want to prove subgaussianity of $Z$, so we need to analyze $\psi(\lambda) = \log E[e^{\lambda(Z - EZ)}] = \log E[e^{\lambda(f(X) - Ef(X))}]$.
Let $P = \mathcal{L}(X)$ and introduce the tilted distribution $P^{\lambda f}$:
$$\frac{dP^{\lambda f}}{dP}(X) = \frac{e^{\lambda f(X)}}{E[e^{\lambda f(X)}]}.$$
We will relate $\psi(\lambda)$ to the relative entropy $D(P^{\lambda f} \| P)$.

11 Exponential Tilting and (Relative) Entropy
Tilting of $P = \mathcal{L}(X)$: $\frac{dP^{\lambda f}}{dP}(X) = \frac{e^{\lambda f(X)}}{E[e^{\lambda f(X)}]} = e^{\lambda(f(X) - Ef(X))}\, e^{-\psi(\lambda)}$.
Relative entropy:
$$D(P^{\lambda f} \| P) = \int dP^{\lambda f} \log\frac{dP^{\lambda f}}{dP} = \int dP^{\lambda f}\big(\lambda(f - E_P f) - \psi(\lambda)\big) = \lambda\, E_P\big[(f - E_P f)\, e^{\lambda(f - E_P f)}\big]\, e^{-\psi(\lambda)} - \psi(\lambda) = \lambda\psi'(\lambda) - \psi(\lambda).$$
With a bit of foresight,
$$\lambda\psi'(\lambda) - \psi(\lambda) = \lambda^2\left(\frac{\psi'(\lambda)}{\lambda} - \frac{\psi(\lambda)}{\lambda^2}\right) = \lambda^2\,\frac{d}{d\lambda}\left(\frac{\psi(\lambda)}{\lambda}\right).$$

12 The Herbst Argument
Tilting of $P = \mathcal{L}(X)$: $\frac{dP^{\lambda f}}{dP}(X) = \frac{e^{\lambda f(X)}}{E[e^{\lambda f(X)}]}$; relative entropy: $D(P^{\lambda f} \| P) = \lambda^2\,\frac{d}{d\lambda}\left(\frac{\psi(\lambda)}{\lambda}\right)$.
Since $\lim_{\lambda \to 0} \psi(\lambda)/\lambda = 0$ (by l'Hôpital), we have
$$\psi(\lambda) = \lambda \int_0^\lambda \frac{D(P^{\rho f} \| P)}{\rho^2}\, d\rho.$$
Suppose now that $P = \mathcal{L}(X)$ and $f$ are such that $D(P^{\rho f} \| P) \le \frac{\rho^2\sigma^2}{2}$ for all $\rho \ge 0$, for some $\sigma^2$. Then
$$\psi(\lambda) \le \lambda \int_0^\lambda \frac{\rho^2\sigma^2}{2\rho^2}\, d\rho = \frac{\lambda^2\sigma^2}{2}.$$

13 The Herbst Argument
Lemma (Herbst, 1975). Suppose that $Z = f(X)$ is such that $D(P^{\lambda f} \| P) \le \frac{\lambda^2\sigma^2}{2}$ for all $\lambda \ge 0$. Then $Z$ is $\sigma^2$-subgaussian, and so $P[f(X) - Ef(X) \ge t] \le e^{-t^2/2\sigma^2}$ for all $t \ge 0$. (Ira Herbst)
"For the sake of completeness, we ... prove a theorem which states roughly that the potential of an intrinsically hypercontractive Schrödinger operator must increase at least quadratically at infinity. ... [I]ts complete proof requires information in an unpublished letter of 1975 from I. Herbst to L. Gross. [C]ertain steps in the argument ... can be written down abstractly." -- from a 1984 paper by B. Simon and E. B. Davies

14 The Herbst Converse
Lemma (R. van Handel, 2014). Suppose $Z = f(X)$ is $\sigma^2/4$-subgaussian. Then $D(P^{\lambda f} \| P) \le \frac{\lambda^2\sigma^2}{2}$ for all $\lambda \ge 0$.
Proof. Let $\widetilde{E}[\cdot] \triangleq E_{P^{\lambda f}}[\cdot]$.
$$D(P^{\lambda f} \| P) = \widetilde{E}\!\left[\log\frac{dP^{\lambda f}}{dP}\right] \overset{\text{(Jensen)}}{\le} \log\widetilde{E}\!\left[\frac{dP^{\lambda f}}{dP}\right] = \log\widetilde{E}\!\left[\frac{e^{\lambda(f(X) - Ef(X))}}{E[e^{\lambda(f(X) - Ef(X))}]}\right] = \log E\big[e^{2\lambda(f - Ef)}\big] - 2\log\underbrace{E\big[e^{\lambda(f - Ef)}\big]}_{\ge 1} \le \log E\big[e^{2\lambda(f - Ef)}\big] \le \frac{(2\lambda)^2\,\sigma^2/4}{2} = \frac{\lambda^2\sigma^2}{2}.$$

15 The Herbst Argument: What Is It Good For?
Subgaussianity of $Z = f(X)$ is equivalent to $D(P^{\lambda f} \| P) = O(\lambda^2)$, but what does that give us?
Recall: we are interested in high-dimensional settings, $Z = f(X^n) = f(X_1, \ldots, X_n)$ with $X_1, \ldots, X_n$ independent r.v.'s.
Thus, $P$ is a product measure: $P = P_1 \otimes P_2 \otimes \ldots \otimes P_n$, $P_i = \mathcal{L}(X_i)$.
The relative entropy tensorizes!! We can break the (hard) $n$-dimensional problem into $n$ (hopefully) easier 1-dimensional problems.

16 Tensorization
$X_1, \ldots, X_n$ independent r.v.'s; $P = \mathcal{L}(X^n) = P_1 \otimes \ldots \otimes P_n$, $P_i = \mathcal{L}(X_i)$. (Here $X^{\bar i}$ denotes all coordinates except the $i$th.)
Recall the Efron-Stein-Steele inequality (tensorization of variance):
$$\mathrm{Var}_P[f(X^n)] \le \sum_{i=1}^n E\big[\mathrm{Var}_P[f(X^n) \mid X^{\bar i}]\big].$$
Tensorization of relative entropy: for an arbitrary probability measure $Q$ on $\mathcal{X}^n$,
$$D(Q \| P) \le \sum_{i=1}^n \underbrace{D\big(Q_{X_i \mid X^{\bar i}} \,\big\|\, P_{X_i \mid X^{\bar i}} \,\big|\, Q_{X^{\bar i}}\big)}_{\text{conditional divergence}}.$$
Independence of the $X_i$'s is key.
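A small numerical sanity check of the entropy tensorization bound (not from the slides); it assumes NumPy only, with $n = 2$, binary alphabets, and randomly drawn $P_1$, $P_2$, $Q$.

```python
# Check: for a product measure P = P1 x P2 on {0,1}^2 and an arbitrary joint Q,
#   D(Q || P) <= sum_i E_Q[ D(Q_{X_i | X_{~i}} || P_i) ].
import numpy as np

rng = np.random.default_rng(2)

def kl(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p1 = rng.dirichlet([1, 1])                      # marginal P_1 on {0,1}
p2 = rng.dirichlet([1, 1])                      # marginal P_2 on {0,1}
P = np.outer(p1, p2)                            # product measure P = P1 x P2
Q = rng.dirichlet([1, 1, 1, 1]).reshape(2, 2)   # arbitrary joint Q

lhs = kl(Q.ravel(), P.ravel())

# right-hand side: conditional divergences given the other coordinate
rhs = 0.0
for x2 in range(2):                             # condition on X_2, compare Q_{X1|X2} with P_1
    w = Q[:, x2].sum()
    rhs += w * kl(Q[:, x2] / w, p1)
for x1 in range(2):                             # condition on X_1, compare Q_{X2|X1} with P_2
    w = Q[x1, :].sum()
    rhs += w * kl(Q[x1, :] / w, p2)

print(f"D(Q||P) = {lhs:.4f} <= tensorized bound {rhs:.4f}")
```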

17 Tensorization: A Quick Proof
$$D(Q \| P) = \sum_{i=1}^n D\big(Q_{X_i \mid X^{i-1}} \,\big\|\, P_{X_i \mid X^{i-1}} \,\big|\, Q_{X^{i-1}}\big) \qquad \text{(chain rule)}$$
$$= \sum_{i=1}^n E_Q\!\left[\log\frac{dQ_{X_i \mid X^{i-1}}}{dP_{X_i}}\right] \qquad \text{(independence)}$$
$$= \sum_{i=1}^n E_Q\!\left[\log\frac{dQ_{X_i \mid X^{\bar i}}}{dP_{X_i}}\right] + \sum_{i=1}^n E_Q\!\left[\log\frac{dQ_{X_i \mid X^{i-1}}}{dQ_{X_i \mid X^{\bar i}}}\right]$$
$$= \sum_{i=1}^n D\big(Q_{X_i \mid X^{\bar i}} \,\big\|\, P_{X_i} \,\big|\, Q_{X^{\bar i}}\big) - \sum_{i=1}^n D\big(Q_{X_i \mid X^{\bar i}} \,\big\|\, Q_{X_i \mid X^{i-1}} \,\big|\, Q_{X^{\bar i}}\big)$$
$$\le \sum_{i=1}^n D\big(Q_{X_i \mid X^{\bar i}} \,\big\|\, P_{X_i} \,\big|\, Q_{X^{\bar i}}\big) = D^-(Q \| P) \qquad \text{(erasure divergence, Verdú-Weissman, 2008)}$$

18 Tensorization and Tilting
Recall: we are interested in $D(Q \| P)$, where $P = P_1 \otimes \ldots \otimes P_n$ and $dQ = \frac{e^{\lambda f}}{E_P[e^{\lambda f}]}\, dP$. Then
$$\frac{dQ_{X^{\bar i}}}{dP_{X^{\bar i}}}(x^{\bar i}) = \frac{\int P_i(dx_i)\, e^{\lambda f(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n)}}{E_P[e^{\lambda f(X^n)}]} = \frac{E_P\big[e^{\lambda f(X^n)} \,\big|\, X^{\bar i} = x^{\bar i}\big]}{E_P[e^{\lambda f(X^n)}]},$$
and therefore
$$\frac{dQ_{X_i \mid X^{\bar i} = x^{\bar i}}}{dP_{X_i}}(x_i) = \frac{e^{\lambda f(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n)}}{E_P\big[e^{\lambda f(x_1,\ldots,x_{i-1},X_i,x_{i+1},\ldots,x_n)}\big]}$$
(remember, $x^{\bar i}$ is fixed).

19 Tensorization and Tilting
$$\frac{dP^{\lambda f}_{X_i \mid X^{\bar i} = x^{\bar i}}}{dP_{X_i}}(x_i) = \frac{e^{\lambda f(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n)}}{E_{P_i}\big[e^{\lambda f(x_1,\ldots,x_{i-1},X_i,x_{i+1},\ldots,x_n)}\big]}$$
For a fixed $x^{\bar i}$, define the function $f_i(\cdot \mid x^{\bar i}) : \mathcal{X} \to \mathbb{R}$ by
$$f_i(x_i \mid x^{\bar i}) = f(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n) \equiv f_i(x_i)$$
(just shorthand; always remember $x^{\bar i}$).
Now observe that $Q_{X_i \mid X^{\bar i} = x^{\bar i}}$ is the tilting of $P_i \equiv P_{X_i}$:
$$dQ_{X_i \mid X^{\bar i} = x^{\bar i}} = \frac{e^{\lambda f_i}}{E_{P_i}[e^{\lambda f_i}]}\, dP_i.$$

20 Tensorization and Tilting
Lemma. If $X_1, \ldots, X_n$ are independent r.v.'s with joint law $P = P_1 \otimes \ldots \otimes P_n$, where $P_i = \mathcal{L}(X_i)$, then for any $f : \mathcal{X}^n \to \mathbb{R}$
$$D(P^{\lambda f} \| P) \le \sum_{i=1}^n \widetilde{E}\big[D(P_i^{\lambda f_i} \| P_i)\big],$$
where $\widetilde{E}[\cdot] \triangleq E_{P^{\lambda f}}[\cdot]$ and $f_i(\cdot) = f_i(\cdot \mid X^{\bar i})$ for each $i$.
Proof.
$$D(P^{\lambda f} \| P) \le \sum_{i=1}^n D\big(P^{\lambda f}_{X_i \mid X^{\bar i}} \,\big\|\, P_{X_i} \,\big|\, P^{\lambda f}_{X^{\bar i}}\big) = \sum_{i=1}^n \widetilde{E}\!\left[\log\frac{dP^{\lambda f}_{X_i \mid X^{\bar i}}}{dP_{X_i}}\right] = \sum_{i=1}^n \widetilde{E}\!\left[\log\frac{dP_i^{\lambda f_i}}{dP_i}\right] = \sum_{i=1}^n \widetilde{E}\big[D(P_i^{\lambda f_i} \| P_i)\big].$$

21 The Entropy Method: Divide and Conquer
Want to derive a subgaussian tail bound $P[f(X^n) - Ef(X^n) \ge t] \le e^{-t^2/2\sigma^2}$, $t \ge 0$, where $X_1, \ldots, X_n$ are independent r.v.'s.
Suppose we can prove that there exist constants $c_1, \ldots, c_n$ such that
$$(\ast) \qquad D(P_i^{\lambda f_i} \| P_i) \le \frac{\lambda^2 c_i^2}{2}, \qquad i \in \{1, \ldots, n\}.$$
Then
$$D(P^{\lambda f} \| P) \overset{\text{(tensor.)}}{\le} \frac{\lambda^2 \sum_{i=1}^n c_i^2}{2} \overset{\text{(Herbst)}}{\Longrightarrow} \sigma^2 = \sum_{i=1}^n c_i^2.$$
Now we just need to prove $(\ast)$!!

22 Logarithmic Sobolev Inequalities

23 Log-Sobolev in a Nutshell
Goal: control the relative entropy $D(P^{\lambda f} \| P)$.
A log-Sobolev inequality ties together:
(i) the underlying probability measure $P$
(ii) a function class $\mathcal{A}$ (containing the $f$ of interest)
(iii) an energy functional $\mathcal{E} : \mathcal{A} \to \mathbb{R}$ such that $\mathcal{E}(\alpha f) = \alpha\,\mathcal{E}(f)$ for all $f \in \mathcal{A}$, $\alpha \ge 0$,
and looks like this: $D(P^f \| P) \le \frac{c}{2}\,\mathcal{E}^2(f)$ for all $f \in \mathcal{A}$.
In that case, if $\mathcal{E}(f) \le L$, then $D(P^{\lambda f} \| P) \le \frac{c}{2}\,\mathcal{E}^2(\lambda f) = \frac{c}{2}\,\lambda^2\mathcal{E}^2(f) \le \frac{\lambda^2 c L^2}{2}$.
The name comes from an analogy with Sobolev inequalities in functional analysis.

24 The Bernoulli Log-Sobolev Inequality
Theorem (Gross, 1975). Let $X_1, \ldots, X_n$ be i.i.d. Bern(1/2) random variables. Then, for any function $f : \{0,1\}^n \to \mathbb{R}$,
$$D(P^f \| P) \le \frac{1}{8}\,\frac{E\big[|\mathrm{D}f(X^n)|^2\, e^{f(X^n)}\big]}{E[e^{f(X^n)}]},$$
where $|\mathrm{D}f(x^n)|^2 \triangleq \sum_{i=1}^n \big|f(x^n) - f(x^n \oplus e_i)\big|^2$ ($x^n \oplus e_i$ flips the $i$th bit), and $E[\cdot]$ is w.r.t. $P = (\mathrm{Bern}(1/2))^{\otimes n}$. (Leonard Gross)
Remarks:
- This is not the original form of the inequality from Gross's 1975 paper, but they are equivalent.
- Note that $\mathrm{D}(\lambda f) = \lambda\,\mathrm{D}f$ for all $\lambda \ge 0$.

25 Bernoulli LSI: Proof Sketch
Consider first $n = 1$ and $f : \{0,1\} \to \mathbb{R}$ with $a = f(0)$ and $b = f(1)$. In that case, the log-Sobolev inequality reads
$$\frac{e^a}{e^a + e^b}\log\frac{2e^a}{e^a + e^b} + \frac{e^b}{e^a + e^b}\log\frac{2e^b}{e^a + e^b} \le \frac{1}{8}(b - a)^2.$$
Proof: elementary (but tedious) exercise in calculus.
Tensorization:
$$D(P^f \| P) \le \sum_{i=1}^n \widetilde{E}\big[D(P_i^{f_i} \| P_i)\big], \qquad \widetilde{E}[h(X^n)] = \frac{E[h(X^n)\, e^{f(X^n)}]}{E[e^{f(X^n)}]},$$
$$\le \frac{1}{8}\sum_{i=1}^n \widetilde{E}\Big[\big|f(X^{i-1}, 0, X_{i+1}^n) - f(X^{i-1}, 1, X_{i+1}^n)\big|^2\Big] = \frac{1}{8}\,\widetilde{E}\big[|\mathrm{D}f(X^n)|^2\big] = \frac{1}{8}\,\frac{E[|\mathrm{D}f(X^n)|^2 e^{f(X^n)}]}{E[e^{f(X^n)}]}.$$
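The two-point inequality above can be spot-checked numerically; the following sketch (not from the slides, assuming NumPy) tests it on random pairs $(a, b)$.

```python
# Spot check of the n = 1 Bernoulli log-Sobolev inequality:
#   (e^a/(e^a+e^b)) log(2 e^a/(e^a+e^b)) + (e^b/(e^a+e^b)) log(2 e^b/(e^a+e^b))
#     <= (b - a)^2 / 8.
import numpy as np

rng = np.random.default_rng(3)
for _ in range(100_000):
    a, b = rng.normal(size=2)
    za, zb = np.exp(a), np.exp(b)
    s = za + zb
    lhs = (za / s) * np.log(2 * za / s) + (zb / s) * np.log(2 * zb / s)
    rhs = (b - a) ** 2 / 8
    assert lhs <= rhs + 1e-12, (a, b, lhs, rhs)
print("two-point LSI verified on 100000 random (a, b) pairs")
```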

26 The Gaussian Log-Sobolev Inequality
Theorem (Gross, 1975). Let $X_1, \ldots, X_n$ be i.i.d. $N(0,1)$ random variables. Then, for any smooth function $f : \mathbb{R}^n \to \mathbb{R}$,
$$D(P^f \| P) \le \frac{1}{2}\,\frac{E\big[\|\nabla f(X^n)\|_2^2\, e^{f(X^n)}\big]}{E[e^{f(X^n)}]},$$
where all expectations are w.r.t. $P = \mathcal{L}(X^n) = N(0, I_n)$. (Leonard Gross)
Remarks:
- This is not the original form of the inequality from Gross's 1975 paper, but they are equivalent.
- Equivalent forms of the Gaussian LSI were obtained independently by A. Stam (1959) and by P. Federbush (1969). The contribution of Stam (via the entropy power inequality) was first pointed out by E. A. Carlen in 1991.

27 Proof(s) of the Gaussian LSI
There are many ways of proving the Gaussian log-Sobolev inequality:
- Original proof by Gross: apply the Bernoulli LSI to $f\!\left(\frac{X_1 + \ldots + X_n - n/2}{\sqrt{n/4}}\right)$ with $X_i$ i.i.d. Bern(1/2), then use the Central Limit Theorem: $\frac{X_1 + \ldots + X_n - n/2}{\sqrt{n/4}} \to N(0,1)$ as $n \to \infty$
- via Markov semigroups
- via hypercontractivity (E. Nelson)
- via Stam's inequality for entropy power and Fisher information
- via the I-MMSE relation (cf. Raginsky and Sason)
- ...

28 Application of Gaussian LSI
Theorem (Tsirelson-Ibragimov-Sudakov, 1976). Let $X_1, \ldots, X_n$ be independent $N(0,1)$ random variables. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function which is $L$-Lipschitz:
$$|f(x^n) - f(y^n)| \le L\,\|x^n - y^n\|_2, \qquad \forall\, x^n, y^n \in \mathbb{R}^n.$$
Then $Z = f(X^n)$ is $L^2$-subgaussian:
$$\log E\big[e^{\lambda(f(X^n) - Ef(X^n))}\big] \le \frac{\lambda^2 L^2}{2}, \qquad \lambda \ge 0.$$
Remarks:
- The original proof did not rely on the Gaussian LSI.
- It is a striking result: $f$ can be an arbitrary nonlinear function, and the subgaussian constant is independent of the dimension $n$.

29 Tsirelson-Ibragimov-Sudakov: Proof via LSI
By an approximation argument, we can assume that $f$ is differentiable. Since it is $L$-Lipschitz, $\|\nabla f\|_2 \le L$.
By the Gaussian LSI, for any $\lambda \ge 0$,
$$D(P^{\lambda f} \| P) \le \frac{1}{2}\,\frac{E\big[\|\lambda\nabla f(X^n)\|_2^2\, e^{\lambda f(X^n)}\big]}{E[e^{\lambda f(X^n)}]} = \frac{\lambda^2}{2}\,\frac{E\big[\|\nabla f(X^n)\|_2^2\, e^{\lambda f(X^n)}\big]}{E[e^{\lambda f(X^n)}]} \le \frac{\lambda^2 L^2}{2}.$$
By the Herbst argument, $\log E\big[e^{\lambda(f(X^n) - Ef(X^n))}\big] \le \frac{\lambda^2 L^2}{2}$.

30 A Gaussian Concentration Bound
The Tsirelson-Ibragimov-Sudakov inequality gives us:
Corollary. Let $X_1, \ldots, X_n$ be i.i.d. $N(0,1)$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be an $L$-Lipschitz function. Then $P[f(X^n) - Ef(X^n) \ge t] \le e^{-t^2/2L^2}$.
Proof. Use the Chernoff bound.
Remarks:
- This is an example of dimension-free concentration: the tail bound does not depend on $n$.
- Applying the same result to $-f$ and using the union bound, we get $P[|f(X^n) - Ef(X^n)| \ge t] \le 2e^{-t^2/2L^2}$.
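The dimension-free nature of the bound is easy to see empirically. The sketch below (not part of the tutorial, assuming NumPy) uses the 1-Lipschitz function $f(x) = \|x\|_2$ and compares the two-sided tail with $2e^{-t^2/2}$ across dimensions.

```python
# Monte Carlo illustration of dimension-free Gaussian concentration:
# f(x) = ||x||_2 is 1-Lipschitz, so P[|f(X^n) - E f(X^n)| >= t] <= 2 exp(-t^2/2)
# for every n.
import numpy as np

rng = np.random.default_rng(4)
t = 2.0
for n in [10, 100, 1000]:
    x = rng.normal(size=(20_000, n))
    f = np.linalg.norm(x, axis=1)        # 1-Lipschitz w.r.t. the Euclidean norm
    dev = np.mean(np.abs(f - f.mean()) >= t)
    print(f"n={n:5d}: P[|f - Ef| >= {t}] ~ {dev:.5f} <= bound {2*np.exp(-t**2/2):.5f}")
```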

31 Deriving Log-Sobolev (1)
Are there systematic ways to derive log-Sobolev inequalities?
The usual (probabilistic) approach (a subtle art):
- Construct a continuous-time Markov process $\{X_t\}_{t \ge 0}$ with stationary distribution $P$ and Markov generator
$$\mathrm{L}f(x) \triangleq \lim_{t \downarrow 0}\frac{E[f(X_t) \mid X_0 = x] - f(x)}{t}.$$
- Use the structure of $\mathrm{L}$ to obtain an inequality of the form
$$D(P^f \| P) \le \frac{c}{2}\,\frac{\mathcal{E}(e^f, f)}{E_P[e^f]},$$
where $\mathcal{E}(g, h) \triangleq -E_P[g(X)\,\mathrm{L}h(X)]$ is the Dirichlet form.
- Extract $\Gamma$ by looking for a bound of the form $\mathcal{E}(e^f, f) \le E_P\big[|\Gamma f(X)|^2\, e^{f(X)}\big]$.
Different choices of $\mathrm{L}$ (for the same $P$) will yield different $\Gamma$'s, and hence different log-Sobolev inequalities.

32 Deriving Log-Sobolev (2)
An alternative (information-theoretic) approach, based on a recent paper of A. Maurer (2012):
Exploits a representation of $D(P^{\lambda f} \| P)$ in terms of the variance of $f(X)$ under the tilted distributions
$$dP^{sf} = \frac{e^{sf}}{E_P[e^{sf}]}\, dP.$$
An interpretation in terms of statistical physics: think of $f$ as energy and of $s \ge 0$ as inverse temperature. Then
$$\mathrm{Var}^{sf}[f(X)] \triangleq \frac{E_P[f^2(X)\, e^{sf(X)}]}{E_P[e^{sf(X)}]} - \left(\frac{E_P[f(X)\, e^{sf(X)}]}{E_P[e^{sf(X)}]}\right)^2$$
gives the thermal fluctuations of $f$ at temperature $T = 1/s$.
Infinite-temperature limit ($T \to \infty$): recover $\mathrm{Var}_P[f(X)]$.

33 Entropy via Thermal Fluctuations
Theorem (A. Maurer, 2012). Let $X$ be a random variable with law $P$. Then for any real-valued function $f$ and any $\lambda \ge 0$,
$$D(P^{\lambda f} \| P) = \int_0^\lambda\!\int_t^\lambda \mathrm{Var}^{sf}[f(X)]\, ds\, dt,$$
where $\mathrm{Var}^{sf}[f(X)]$ is the variance of $f(X)$ under the tilted distribution $P^{sf}$. (Andreas Maurer)
Recall:
$$\mathrm{Var}^{sf}[f(X)] = \frac{E[f^2(X)\, e^{sf(X)}]}{E[e^{sf(X)}]} - \left(\frac{E[f(X)\, e^{sf(X)}]}{E[e^{sf(X)}]}\right)^2 = \psi''(s), \qquad \text{where } \psi(s) = \log E\big[e^{s(f(X) - Ef(X))}\big].$$
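Maurer's integral representation is easy to verify numerically for a small discrete distribution. The sketch below (not from the slides, assuming NumPy; the distribution, function, and $\lambda$ are arbitrary) compares both sides, using Fubini to reduce the double integral to a single one.

```python
# Check: D(P^{lambda f} || P) = int_0^lambda int_t^lambda Var_{sf}[f(X)] ds dt
#                             = int_0^lambda s * Var_{sf}[f(X)] ds   (by Fubini).
import numpy as np

rng = np.random.default_rng(5)
p = rng.dirichlet(np.ones(5))       # a random distribution on 5 points
f = rng.normal(size=5)              # a random function f
lam = 0.7

def tilt(s):
    w = p * np.exp(s * f)
    return w / w.sum()

def var_sf(s):
    q = tilt(s)
    return np.sum(q * f**2) - np.sum(q * f) ** 2

# left-hand side: relative entropy between the tilted measure and P
q_lam = tilt(lam)
lhs = float(np.sum(q_lam * np.log(q_lam / p)))

# right-hand side: trapezoidal rule for int_0^lambda s * Var_{sf}[f] ds
ss = np.linspace(0.0, lam, 2001)
y = np.array([s * var_sf(s) for s in ss])
rhs = float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(ss)))

print(f"D(P^lf || P) = {lhs:.6f}, double integral = {rhs:.6f}")
```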

34 Proof
Recall $D(P^{\lambda f} \| P) = \lambda\psi'(\lambda) - \psi(\lambda)$, where $\psi(\lambda) = \log E[e^{\lambda(f - Ef)}]$.
Since $\psi(0) = \psi'(0) = 0$, we have
$$\lambda\psi'(\lambda) = \int_0^\lambda \psi'(\lambda)\, dt, \qquad \psi(\lambda) = \int_0^\lambda \psi'(t)\, dt.$$
Substitute:
$$D(P^{\lambda f} \| P) = \int_0^\lambda\big(\psi'(\lambda) - \psi'(t)\big)\, dt = \int_0^\lambda\!\int_t^\lambda \psi''(s)\, ds\, dt = \int_0^\lambda\!\int_t^\lambda \mathrm{Var}^{sf}[f(X)]\, ds\, dt.$$

35 From Thermal Fluctuations to Log-Sobolev
Theorem. Let $\mathcal{A}$ be a class of functions of $X$, and suppose that there is a mapping $\Gamma : \mathcal{A} \to \mathbb{R}$ such that:
1. For all $f \in \mathcal{A}$ and $\alpha \ge 0$, $\Gamma(\alpha f) = \alpha\Gamma(f)$.
2. There exists a constant $c > 0$ such that $\mathrm{Var}^{\lambda f}[f(X)] \le c\,\Gamma(f)^2$ for all $f \in \mathcal{A}$, $\lambda \ge 0$.
Then
$$D(P^{\lambda f} \| P) \le \frac{\lambda^2 c\,\Gamma(f)^2}{2}, \qquad f \in \mathcal{A},\ \lambda \ge 0.$$
Proof.
$$D(P^{\lambda f} \| P) \le c\,\Gamma(f)^2 \int_0^\lambda\!\int_t^\lambda ds\, dt = \frac{c\,\Gamma(f)^2\,\lambda^2}{2}.$$

36 From Thermal Fluctuations to Log-Sobolev: Example 1
Let's use Maurer's method to derive the Bernoulli LSI. For any $f : \{0,1\} \to \mathbb{R}$, define $\Gamma(f) \triangleq |f(0) - f(1)|$.
Since $f$ is obviously bounded, for every $s \ge 0$ we have
$$\mathrm{Var}^{sf}[f(X)] \le \frac{(f(0) - f(1))^2}{4} = \frac{\Gamma(f)^2}{4}.$$
Finally,
$$D(P^f \| P) = \int_0^1\!\int_t^1 \mathrm{Var}^{sf}[f(X)]\, ds\, dt \le \frac{\Gamma(f)^2}{4}\int_0^1\!\int_t^1 ds\, dt = \frac{1}{8}\,\Gamma(f)^2.$$
For $n > 1$, use tensorization.

37 From Thermal Fluctuations to Log-Sobolev: Example 2
Let's use Maurer's method to derive McDiarmid's inequality. We will use tensorization, so let's first consider $n = 1$.
We are interested in all functions $f : \mathcal{X} \to \mathbb{R}$ such that $\sup_{x \in \mathcal{X}} f(x) - \inf_{x \in \mathcal{X}} f(x) \le c$ for some $c < \infty$. Define $\Gamma(f) \triangleq \sup_{x \in \mathcal{X}} f(x) - \inf_{x \in \mathcal{X}} f(x)$.
Any $f$ satisfies $f(X) \in [\inf f, \sup f]$. If $\Gamma(f) < \infty$, then $[\inf f, \sup f]$ is a bounded interval. In that case, for any $P$,
$$\mathrm{Var}^{sf}[f(X)] \le \frac{(\sup f - \inf f)^2}{4} = \frac{\Gamma(f)^2}{4}.$$
Using the integral representation of the divergence, we get
$$D(P^{\lambda f} \| P) \le \frac{\lambda^2 c^2}{8}, \qquad \text{if } \sup f - \inf f \le c.$$

38 Proof of McDiarmid (cont.)
So far, we have obtained
$$(\ast) \qquad D(P^{\lambda f} \| P) \le \frac{\lambda^2 c^2}{8}, \qquad \text{if } \sup f - \inf f \le c.$$
Let $X_i \sim P_i$, $1 \le i \le n$, be independent r.v.'s. Consider $f : \mathcal{X}^n \to \mathbb{R}$ that has bounded differences:
$$\sup_{x^{\bar i}}\left(\sup_{x_i} f(x^{i-1}, x_i, x_{i+1}^n) - \inf_{x_i} f(x^{i-1}, x_i, x_{i+1}^n)\right) \le c_i$$
for all $i$, for some constants $0 \le c_1, \ldots, c_n < \infty$.
For each $i$, apply $(\ast)$ to $f_i(\cdot) \triangleq f(x^{i-1}, \cdot, x_{i+1}^n)$:
$$D(P_i^{\lambda f_i} \| P_i) \le \frac{\lambda^2}{8}\left(\sup_{x_i} f(x^{i-1}, x_i, x_{i+1}^n) - \inf_{x_i} f(x^{i-1}, x_i, x_{i+1}^n)\right)^2$$
[recall, $f_i(\cdot)$ depends on $x^{\bar i}$].

39 Proof of McDiarmid (cont.)
Now we tensorize: for $P = P_1 \otimes \ldots \otimes P_n$,
$$D(P^{\lambda f} \| P) \le \sum_{i=1}^n \widetilde{E}\big[D(P_i^{\lambda f_i} \| P_i)\big] \le \frac{\lambda^2}{8}\,\widetilde{E}\Bigg[\underbrace{\sum_{i=1}^n\left(\sup_{x_i} f(X^{i-1}, x_i, X_{i+1}^n) - \inf_{x_i} f(X^{i-1}, x_i, X_{i+1}^n)\right)^2}_{=\,|\Gamma f(X^n)|^2}\Bigg].$$

40 Theorem. Let $X_1, \ldots, X_n \in \mathcal{X}$ be independent r.v.'s with joint law $P = P_1 \otimes \ldots \otimes P_n$. Then, for any function $f : \mathcal{X}^n \to \mathbb{R}$,
$$D(P^f \| P) \le \frac{1}{8}\,\|\Gamma f\|_\infty^2,$$
where
$$\Gamma f(x^n) = \Bigg\{\sum_{i=1}^n\underbrace{\left(\sup_{x_i} f(x^{i-1}, x_i, x_{i+1}^n) - \inf_{x_i} f(x^{i-1}, x_i, x_{i+1}^n)\right)^2}_{=\,|\Gamma_i f(\cdot \mid x^{\bar i})|^2}\Bigg\}^{1/2}.$$
Remarks:
- McDiarmid: if $f$ has bounded differences with $c_1, \ldots, c_n$, then $f(X^n)$ is $\frac{\sum_{i=1}^n c_i^2}{4}$-subgaussian.
- Since $\|\Gamma f\|_\infty^2 \le \sum_{i=1}^n \|\Gamma_i f\|_\infty^2$, the above theorem is stronger than McDiarmid.
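As a concrete instance of the McDiarmid remark (not from the slides, assuming NumPy): the sample mean of variables in $[0,1]$ has bounded differences with $c_i = 1/n$, so the tail bound is $2\exp(-2nt^2)$; the illustrative choices of $n$, $t$, and the uniform distribution are arbitrary.

```python
# Sample mean f(x^n) = (1/n) sum_i x_i of variables in [0, 1] has bounded
# differences c_i = 1/n, so McDiarmid gives P[|f - Ef| >= t] <= 2 exp(-2 n t^2).
import numpy as np

rng = np.random.default_rng(6)
n, trials, t = 200, 50_000, 0.08
x = rng.uniform(size=(trials, n))
f = x.mean(axis=1)
empirical = np.mean(np.abs(f - 0.5) >= t)   # E f = 0.5 for Uniform(0, 1)
bound = 2 * np.exp(-2 * n * t**2)
print(f"empirical {empirical:.5f} <= McDiarmid {bound:.5f}")
```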

41 Transportation-Cost Inequalities

42 Concentration and the Lipschitz Property
A common theme:
- Let $f : \mathbb{R}^n \to \mathbb{R}$ be 1-Lipschitz w.r.t. the Euclidean norm: $|f(x^n) - f(y^n)| \le \|x^n - y^n\|_2$. Let $X_1, \ldots, X_n$ be i.i.d. $N(0,1)$ r.v.'s. Then $P[|f(X^n) - Ef(X^n)| \ge t] \le 2e^{-t^2/2}$.
- Let $f : \mathcal{X}^n \to \mathbb{R}$ be 1-Lipschitz w.r.t. a weighted Hamming metric: $|f(x^n) - f(y^n)| \le d_c(x^n, y^n)$, where $d_c(x^n, y^n) \triangleq \sum_{i=1}^n c_i\mathbf{1}\{x_i \neq y_i\}$ (this is equivalent to bounded differences). Let $X_1, \ldots, X_n$ be independent r.v.'s. Then $P[|f(X^n) - Ef(X^n)| \ge t] \le 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$.

43 The Setting: Probability in Metric Spaces
Let $(\mathcal{X}, d)$ be a metric space. A function $f : \mathcal{X} \to \mathbb{R}$ is $L$-Lipschitz (w.r.t. $d$) if $|f(x) - f(y)| \le L\, d(x, y)$ for all $x, y \in \mathcal{X}$.
Notation: $\mathrm{Lip}_L(\mathcal{X}, d)$ is the class of all $L$-Lipschitz functions.
Question: What conditions does a probability measure $P$ on $\mathcal{X}$ have to satisfy, so that $f(X)$, $X \sim P$, is $\sigma^2$-subgaussian for every $f \in \mathrm{Lip}_1(\mathcal{X}, d)$?

44 Some Definitions
A coupling of two probability measures $P$ and $Q$ on $\mathcal{X}$ is any probability measure $\pi$ on $\mathcal{X} \times \mathcal{X}$ such that $(X, Y) \sim \pi$ implies $X \sim P$, $Y \sim Q$. $\Pi(P, Q)$ denotes the family of all couplings of $P$ and $Q$.
Definition. For $p \ge 1$, the $L^p$ Wasserstein distance between $P$ and $Q$ is given by
$$W_p(P, Q) \triangleq \inf_{\pi \in \Pi(P, Q)}\big(E_\pi[d^p(X, Y)]\big)^{1/p}.$$
Kantorovich-Rubinstein formula. For any two $P, Q$,
$$W_1(P, Q) = \sup_{f \in \mathrm{Lip}_1(\mathcal{X}, d)}\big|E_P[f] - E_Q[f]\big|.$$
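For intuition, $W_1$ between discrete distributions on the real line (with $d(x,y) = |x - y|$) can be computed directly; the sketch below assumes SciPy, and the supports and weights are arbitrary examples.

```python
# L1 Wasserstein distance between two discrete distributions on the real line:
# W_1(P, Q) = inf_pi E_pi[|X - Y|] for d(x, y) = |x - y|.
import numpy as np
from scipy.stats import wasserstein_distance

support = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.4, 0.3, 0.2, 0.1])   # distribution P
q = np.array([0.1, 0.2, 0.3, 0.4])   # distribution Q
w1 = wasserstein_distance(support, support, u_weights=p, v_weights=q)
print(f"W1(P, Q) = {w1:.3f}")        # on the line this equals the L1 distance between CDFs
```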

45 Optimal Transportation
Monge-Kantorovich Optimal Transportation Problem. Given two probability measures $P, Q$ on a common space $\mathcal{X}$ and a cost function $c : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$, minimize $E_\pi[c(X, Y)]$ over all couplings $\pi \in \Pi(P, Q)$.
Interpretation:
- $P$ and $Q$: initial and final distributions of some material (say, sand) in space
- $c(x, y)$: cost of transporting a grain of sand from location $x$ to location $y$
- $\pi(dy \mid x)$: randomized strategy for transporting from location $x$
- Wasserstein distances: transportation cost is some power of a metric
(Gaspard Monge, Leonid Kantorovich)

46 Transportation-Cost Inequalities
Definition. A probability measure $P$ on a metric space $(\mathcal{X}, d)$ satisfies an $L^p$ transportation-cost inequality with constant $c$, or $T_p(c)$ for short, if
$$W_p(P, Q) \le \sqrt{2c\, D(Q \| P)}, \qquad \forall Q.$$
Preview: We will be primarily concerned with $T_1(c)$ and $T_2(c)$ inequalities. $T_1(c)$ is easier to work with, while $T_2(c)$ has strong properties (dimension-free concentration).

47 Examples of TC Inequalities: 1
$\mathcal{X}$: arbitrary space with the trivial metric $d(x, y) = \mathbf{1}\{x \neq y\}$. Then
$$W_1(P, Q) = \inf_{\pi \in \Pi(P, Q)} E_\pi[\mathbf{1}\{X \neq Y\}] = \inf_{\pi \in \Pi(P, Q)} P_\pi[X \neq Y] = \|P - Q\|_{\mathrm{TV}} \qquad \text{(total variation distance)}$$
(proof by explicit construction of the optimal coupling).
Any $P$ satisfies $T_1(1/4)$:
$$\|P - Q\|_{\mathrm{TV}} \le \sqrt{\tfrac{1}{2} D(Q \| P)} \qquad \text{(Csiszár-Kemperman-Kullback-Pinsker)}$$

48 Examples of TC Inequalities: 2
$\mathcal{X} = \{0, 1\}$ with the trivial metric $d(x, y) = \mathbf{1}\{x \neq y\}$; $P = \mathrm{Bern}(p)$.
Distribution-dependent refinement of Pinsker:
$$\|P - Q\|_{\mathrm{TV}} \le \sqrt{2c(p)\, D(Q \| P)}, \qquad \text{where } c(p) = \frac{p - \bar p}{2(\log p - \log \bar p)},\ \bar p \triangleq 1 - p \qquad \text{(Ordentlich-Weinberger, 2005)}$$
Thus, $P = \mathrm{Bern}(p)$ satisfies $T_1(c(p))$, and the constant is optimal:
$$\inf_{Q \neq P}\frac{D(Q \| P)}{\|Q - P\|_{\mathrm{TV}}^2} = \frac{1}{2c(p)}.$$
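A quick numerical look at the refined constant (a sketch, not from the slides, assuming NumPy): for $P = \mathrm{Bern}(p)$ and $Q = \mathrm{Bern}(q)$ the TV distance is $|p - q|$, and the refined bound is checked against a few random $q$; the value $p = 0.1$ is arbitrary.

```python
# Ordentlich-Weinberger refinement for P = Bern(p):
#   ||P - Q||_TV <= sqrt(2 c(p) D(Q || P)),
#   c(p) = (p - (1-p)) / (2 (log p - log(1-p))), with c(1/2) = 1/4 (Pinsker).
import numpy as np

def c_ow(p):
    return (2 * p - 1) / (2 * np.log(p / (1 - p))) if p != 0.5 else 0.25

def d_bern(q, p):  # D(Bern(q) || Bern(p)) in nats
    terms = []
    if q > 0: terms.append(q * np.log(q / p))
    if q < 1: terms.append((1 - q) * np.log((1 - q) / (1 - p)))
    return sum(terms)

rng = np.random.default_rng(7)
p = 0.1
print(f"c({p}) = {c_ow(p):.4f} (universal Pinsker constant would be 0.25)")
for q in rng.uniform(0.001, 0.999, size=5):
    tv = abs(p - q)                              # TV distance between two Bernoullis
    bound = np.sqrt(2 * c_ow(p) * d_bern(q, p))
    print(f"q={q:.3f}: TV={tv:.4f} <= {bound:.4f}")
```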

49 Examples of TC Inequalities: 3
$\mathcal{X} = \mathbb{R}^n$ with $d(x, y) = \|x - y\|_2$. The Gaussian measure $P = N(0, I_n)$ satisfies $T_2(1)$:
$$W_2(P, Q) \le \sqrt{2D(Q \| P)} \qquad \text{(Talagrand, 1996)}$$
Note: the constant is independent of $n$!! (Michel Talagrand)

50 Transportation and Concentration
Theorem (Bobkov-Götze, 1999). Let $P$ be a probability measure on a metric space $(\mathcal{X}, d)$. Then the following two statements are equivalent:
1. $f(X)$, $X \sim P$, is $\sigma^2$-subgaussian for every $f \in \mathrm{Lip}_1(\mathcal{X}, d)$.
2. $P$ satisfies $T_1(c)$ with $c = \sigma^2$ on $(\mathcal{X}, d)$: $W_1(P, Q) \le \sqrt{2\sigma^2 D(Q \| P)}$ for all $Q \ll P$.
(Sergey Bobkov, Friedrich Götze)
Remarks:
- The connection between concentration and transportation inequalities was first pointed out by Marton (1986).
- Remarkable result: the concentration phenomenon for Lipschitz functions can be expressed purely in terms of probabilistic and metric structures.

51 Proof of Bobkov-Götze (ultra-light version, due to R. van Handel)
Statement 1 of the theorem is equivalent to
$$(\ast) \qquad \sup_{\lambda \ge 0}\ \sup_{f \in \mathrm{Lip}_1(\mathcal{X}, d)}\left\{\log E_P\big[e^{\lambda(f - E_P f)}\big] - \frac{\lambda^2\sigma^2}{2}\right\} \le 0.$$
Use the Gibbs variational principle: for any $h$ such that $e^h \in L^1(P)$,
$$\log E_P[e^h] = \sup_Q\{E_Q[h] - D(Q \| P)\}$$
(the supremum is achieved by the tilted distribution $Q = P^h$). Then $(\ast)$ is equivalent to
$$(\ast\ast) \qquad \sup_{\lambda \ge 0}\ \sup_{f \in \mathrm{Lip}_1(\mathcal{X}, d)}\ \sup_Q\left\{\lambda\big(E_Q[f] - E_P[f]\big) - D(Q \| P) - \frac{\lambda^2\sigma^2}{2}\right\} \le 0.$$

52 Proof of Bobkov-Götze (cont.)
$$(\ast\ast) \qquad \sup_{\lambda \ge 0}\ \sup_{f \in \mathrm{Lip}_1(\mathcal{X}, d)}\ \sup_Q\left\{\lambda\big(E_Q[f] - E_P[f]\big) - D(Q \| P) - \frac{\lambda^2\sigma^2}{2}\right\} \le 0$$
Interchange the order of suprema:
$$\sup_{\lambda \ge 0}\ \sup_{f \in \mathrm{Lip}_1(\mathcal{X}, d)}\ \sup_Q\,[\ldots] = \sup_Q\ \sup_{\lambda \ge 0}\ \sup_{f \in \mathrm{Lip}_1(\mathcal{X}, d)}\,[\ldots]$$
Then
$$(\ast\ast) \iff \sup_Q\ \sup_{\lambda \ge 0}\left\{\lambda W_1(P, Q) - D(Q \| P) - \frac{\lambda^2\sigma^2}{2}\right\} \le 0 \qquad \text{(by Kantorovich-Rubinstein)}$$
$$\iff \sup_Q\left\{W_1(P, Q) - \sqrt{2\sigma^2 D(Q \| P)}\right\} \le 0 \qquad \text{(optimize over } \lambda\text{)}$$

53 Tensorization of Transportation Inequalities
At first sight, all we have is another equivalent characterization of concentration of Lipschitz functions. However, transportation inequalities tensorize!!
The proof of tensorization is through a beautiful result on couplings by Katalin Marton.

54 The Marton Coupling
Theorem (Marton, 1986). Let $(\mathcal{X}_i, P_i)$, $1 \le i \le n$, be probability spaces. Let $w_i : \mathcal{X}_i \times \mathcal{X}_i \to \mathbb{R}^+$, $1 \le i \le n$, be positive weight functions, and let $\varphi : \mathbb{R}^+ \to \mathbb{R}^+$ be a convex function. Suppose that, for each $i$,
$$\inf_{\pi \in \Pi(P_i, Q)}\varphi\big(E_\pi[w_i(X, Y)]\big) \le 2\sigma^2 D(Q \| P_i), \qquad \forall Q.$$
Then the following holds for the product measure $P = P_1 \otimes \ldots \otimes P_n$ on the product space $\mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_n$:
$$\inf_{\pi \in \Pi(P, Q)}\sum_{i=1}^n\varphi\big(E_\pi[w_i(X_i, Y_i)]\big) \le 2\sigma^2 D(Q \| P)$$
for every $Q$ on $\mathcal{X}$. (Katalin Marton)
Proof idea: chain rule for relative entropy + conditional coupling + induction.

55 Tensorization of Transportation-Cost Inequalities
Theorem. Let $(\mathcal{X}_i, P_i, d_i)$, $1 \le i \le n$, be probability metric spaces. If for some $1 \le p \le 2$ each $P_i$ satisfies $T_p(c)$ on $(\mathcal{X}_i, d_i)$, then the product measure $P = P_1 \otimes \ldots \otimes P_n$ on $\mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_n$ satisfies $T_p(c\, n^{2/p - 1})$ w.r.t. the metric
$$d_p(x^n, y^n) \triangleq \left(\sum_{i=1}^n d_i^p(x_i, y_i)\right)^{1/p}.$$
Remarks:
- If each $P_i$ satisfies $T_1(c)$, then $P = P_1 \otimes \ldots \otimes P_n$ satisfies $T_1(cn)$ with respect to the metric $\sum_i d_i$. Note: the constant deteriorates with $n$.
- If each $P_i$ satisfies $T_2(c)$, then $P$ satisfies $T_2(c)$ with respect to $\sqrt{\sum_i d_i^2}$. Note: the constant is independent of $n$.

56 Proof of Tensorization
By hypothesis, for each $i$,
$$\underbrace{\inf_{\pi \in \Pi(P_i, Q)}\big(E_\pi[d_i^p(X, Y)]\big)^{2/p}}_{W_{p, d_i}^2(P_i, Q)} \le 2c\, D(Q \| P_i), \qquad \forall Q.$$
$1 \le p \le 2$ implies $\varphi(u) = u^{2/p}$ is convex. Take $w_i = d_i^p$. Then
$$\inf_{\pi \in \Pi(P_i, Q)}\varphi\big(E_\pi[w_i(X, Y)]\big) \le 2c\, D(Q \| P_i), \qquad \forall Q.$$
By Marton's coupling,
$$(\star) \qquad \inf_{\pi \in \Pi(P, Q)}\sum_{i=1}^n\big(E_\pi[d_i^p(X_i, Y_i)]\big)^{2/p} \le 2c\, D(Q \| P), \qquad \forall Q.$$

57 Proof of Tensorization (cont.)
We have shown that if $P_i$ satisfies $T_p(c)$ w.r.t. $d_i$ for each $i$, then
$$(\star) \qquad \inf_{\pi \in \Pi(P, Q)}\sum_{i=1}^n\big(E_\pi[d_i^p(X_i, Y_i)]\big)^{2/p} \le 2c\, D(Q \| P), \qquad \forall Q.$$
We will now prove $\mathrm{LHS}(\star) \ge n^{1 - 2/p}\, W_{p, d_p}^2(P, Q)$. For any $\pi \in \Pi(P, Q)$,
$$\left(E_\pi\!\left[\sum_{i=1}^n d_i^p(X_i, Y_i)\right]\right)^{2/p} = \left(\sum_{i=1}^n E_\pi[d_i^p(X_i, Y_i)]\right)^{2/p} \le n^{2/p - 1}\sum_{i=1}^n\big(E_\pi[d_i^p(X_i, Y_i)]\big)^{2/p} \qquad \text{(convexity of } t \mapsto t^{2/p}\text{)}$$
Take the infimum over all $\pi \in \Pi(P, Q)$, and we are done.

58 (Yet Another) Proof of McDiarmid's Inequality
Product probability space: $(\mathcal{X}_1 \times \ldots \times \mathcal{X}_n,\ P_1 \otimes \ldots \otimes P_n)$.
Given $c_1, \ldots, c_n \ge 0$, equip $\mathcal{X}_i$ with $d_{c_i}(x_i, y_i) \triangleq c_i\mathbf{1}\{x_i \neq y_i\}$. Then any $P_i$ on $\mathcal{X}_i$ satisfies $T_1(c_i^2/4)$:
$$W_{1, d_{c_i}}(P_i, Q) = c_i\|P_i - Q\|_{\mathrm{TV}} \le \sqrt{\frac{c_i^2}{2} D(Q \| P_i)}$$
(by rescaling Pinsker).
By the Marton coupling, $P = P_1 \otimes \ldots \otimes P_n$ satisfies $T_1$ with constant $\frac{1}{4}\sum_{i=1}^n c_i^2$ with respect to the metric
$$d_c(x^n, y^n) \triangleq \sum_{i=1}^n d_{c_i}(x_i, y_i) = \sum_{i=1}^n c_i\mathbf{1}\{x_i \neq y_i\}.$$
By Bobkov-Götze, this is equivalent to the subgaussian property of all $f \in \mathrm{Lip}_1(\mathcal{X}_1 \times \ldots \times \mathcal{X}_n, d_c)$ (i.e., bounded differences):
$$P[|f(X^n) - Ef(X^n)| \ge t] \le 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right), \qquad f \in \mathrm{Lip}_1(d_c) \qquad \text{(McDiarmid)}$$

59 Some Applications

60 The Blowing-Up Lemma
Consider a product space $\mathcal{Y}^n$ equipped with the Hamming metric $d(y^n, z^n) = \sum_{i=1}^n\mathbf{1}\{y_i \neq z_i\}$.
For a set $A \subseteq \mathcal{Y}^n$ and for $r \in \{0, 1, \ldots, n\}$, define its $r$-blowup
$$[A]_r \triangleq \Big\{z^n \in \mathcal{Y}^n : \min_{y^n \in A} d(z^n, y^n) \le r\Big\}.$$
The following result, in a different (asymptotic) form, was first proved by Ahlswede-Gács-Körner (1976); a simple proof was given by Marton (1986): Let $Y_1, \ldots, Y_n$ be independent r.v.'s taking values in $\mathcal{Y}$. Then for every set $A \subseteq \mathcal{Y}^n$ with $P_{Y^n}(A) > 0$,
$$P_{Y^n}\{[A]_r\} \ge 1 - \exp\!\left(-\frac{2}{n}\left(r - \sqrt{\frac{n}{2}\log\frac{1}{P_{Y^n}(A)}}\right)_+^2\right).$$
Informally, any set in a product space can be "blown up" to engulf most of the probability mass.
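A simple illustration (not from the slides, assuming SciPy): for i.i.d. Bern(1/2) coordinates and a set $A$ that is monotone in the Hamming weight, the blowup probabilities are exact binomial CDF values, which can be compared with the lemma's lower bound; the values of $n$, $k$, and $r$ are arbitrary.

```python
# Take Y_i i.i.d. Bern(1/2) and A = {y^n : sum(y_i) <= k}. Then
# [A]_r = {y^n : sum(y_i) <= k + r}, so P(A) and P([A]_r) are binomial CDFs.
import numpy as np
from scipy.stats import binom

n, k = 400, 170                          # A has small but positive probability
p_A = binom.cdf(k, n, 0.5)
for r in [10, 20, 40, 60]:
    p_blowup = binom.cdf(k + r, n, 0.5)  # exact probability of the r-blowup
    slack = r - np.sqrt(n / 2 * np.log(1 / p_A))
    bound = 1 - np.exp(-2 / n * max(slack, 0.0) ** 2)
    print(f"r={r:3d}: P([A]_r) = {p_blowup:.4f} >= bound {bound:.4f}")
```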

61 Marton's Proof
Let $P_i = \mathcal{L}(Y_i)$, $1 \le i \le n$; $P = P_1 \otimes \ldots \otimes P_n \equiv \mathcal{L}(Y^n)$. By tensorization, $P$ satisfies the TC inequality
$$(\ast) \qquad W_1(P, Q) \le \sqrt{\frac{n}{2} D(Q \| P)}, \qquad \forall Q \in \mathcal{P}(\mathcal{Y}^n),$$
where
$$W_1(P, Q) = \inf_{\pi \in \Pi(P, Q)} E_\pi\!\left[\sum_{i=1}^n\mathbf{1}\{Y_i \neq Z_i\}\right], \qquad Y^n \sim P,\ Z^n \sim Q.$$
For any $B \subseteq \mathcal{Y}^n$ with $P(B) > 0$, consider the conditional distribution $P_B(\cdot) \triangleq \frac{P(\cdot \cap B)}{P(B)}$. Then $D(P_B \| P) = \log\frac{1}{P(B)}$, and therefore
$$(\ast\ast) \qquad W_1(P, P_B) \le \sqrt{\frac{n}{2}\log\frac{1}{P(B)}}.$$

62 Marton's Proof
$$(\ast\ast) \qquad W_1(P, P_B) \le \sqrt{\frac{n}{2}\log\frac{1}{P(B)}}$$
Apply $(\ast\ast)$ to $B = A$ and $B = [A]_r^c$:
$$W_1(P, P_A) \le \sqrt{\frac{n}{2}\log\frac{1}{P(A)}}, \qquad W_1\big(P, P_{[A]_r^c}\big) \le \sqrt{\frac{n}{2}\log\frac{1}{1 - P([A]_r)}}.$$
Then
$$\sqrt{\frac{n}{2}\log\frac{1}{P(A)}} + \sqrt{\frac{n}{2}\log\frac{1}{1 - P([A]_r)}} \ge W_1(P_A, P) + W_1\big(P, P_{[A]_r^c}\big) \ge W_1\big(P_A, P_{[A]_r^c}\big) \qquad \text{(triangle ineq.)}$$
$$\ge \min_{y^n \in A,\ z^n \in [A]_r^c} d(y^n, z^n) \ge r \qquad \text{(def. of } [\cdot]_r\text{)}$$
Rearrange to finish the proof.

63 The Blowing-Up Lemma: Consequences
Consider a DMC $(\mathcal{X}, \mathcal{Y}, T)$ with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and transition probabilities $T(y \mid x)$, $(x, y) \in \mathcal{X} \times \mathcal{Y}$.
$(n, M, \varepsilon)$-code $\mathcal{C}$: encoder $f : \{1, \ldots, M\} \to \mathcal{X}^n$ and decoder $g : \mathcal{Y}^n \to \{1, \ldots, M\}$ with
$$\max_{1 \le j \le M} P[g(Y^n) \neq j \mid X^n = f(j)] \le \varepsilon.$$
Equivalently, $\mathcal{C} = \{(u_j, D_j)\}_{j=1}^M$, where $u_j = f(j) \in \mathcal{X}^n$ (codewords) and $D_j = g^{-1}(j) = \{y^n \in \mathcal{Y}^n : g(y^n) = j\}$ (decoding sets), with $T^n(D_j \mid u_j) \ge 1 - \varepsilon$, $j = 1, \ldots, M$.
Lemma. There exists $\delta_n > 0$ such that
$$T^n\big([D_j]_{n\delta_n} \,\big|\, X^n = u_j\big) \ge 1 - \frac{1}{n}, \qquad j = 1, \ldots, M.$$

64 Proof
Choose
$$\delta_n = \frac{1}{n}\left(\sqrt{\frac{n\log n}{2}} + \sqrt{\frac{n}{2}\log\frac{1}{1 - \varepsilon}}\right) = \sqrt{\frac{\log n}{2n}} + \sqrt{\frac{1}{2n}\log\frac{1}{1 - \varepsilon}}.$$
For each $j$, apply the Blowing-Up Lemma to the product measure
$$P_j(y^n) = \prod_{i=1}^n T\big(y_i \mid u_j(i)\big),$$
where $u_j = (u_j(1), \ldots, u_j(n)) = f(j)$ is the $j$th codeword. With $r = n\delta_n$, this gives
$$T^n\big([D_j]_{n\delta_n} \,\big|\, X^n = u_j\big) \ge 1 - \exp\!\left(-\frac{2}{n}\left(n\delta_n - \sqrt{\frac{n}{2}\log\frac{1}{1 - \varepsilon}}\right)^2\right) \ge 1 - \frac{1}{n}.$$
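To see that the blow-up is asymptotically negligible, here is a tiny numerical sketch (not from the slides, assuming NumPy; $\varepsilon$ and the values of $n$ are arbitrary) evaluating $\delta_n$ and the corresponding radius $n\delta_n$.

```python
# delta_n = sqrt(log n / (2n)) + sqrt(log(1/(1-eps)) / (2n)) -> 0,
# so the blow-up radius n * delta_n grows only like sqrt(n log n).
import numpy as np

eps = 0.25
for n in [100, 1_000, 10_000, 100_000]:
    delta_n = np.sqrt(np.log(n) / (2 * n)) + np.sqrt(np.log(1 / (1 - eps)) / (2 * n))
    print(f"n={n:6d}: delta_n = {delta_n:.4f}, blow-up radius n*delta_n = {n * delta_n:.1f}")
```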

65 From Blowing-Up Lemma to Strong Converses
Informally, the Blowing-Up Lemma shows that any bad code contains a good subcode (Ahlswede and Dueck, 1976).
Consider an $(n, M, \varepsilon)$-code $\mathcal{C} = \{(u_j, D_j)\}_{j=1}^M$. Each decoding set $D_j$ can be blown up to a set $\widetilde{D}_j \subseteq \mathcal{Y}^n$ with $T^n(\widetilde{D}_j \mid u_j) \ge 1 - \frac{1}{n}$.
The object $\widetilde{\mathcal{C}} = \{(u_j, \widetilde{D}_j)\}_{j=1}^M$ is not a code (since the sets $\widetilde{D}_j$ are no longer disjoint), but a random coding argument can be used to extract an $(n, M', \varepsilon')$ subcode with $M'$ slightly smaller than $M$ and $\varepsilon' < \varepsilon$. Then one can apply the usual (weak) converse to the subcode.
Similar ideas can be used in multiterminal settings (starting with Ahlswede-Gács-Körner).

66 Example: Capacity-Achieving Channel Codes
The set-up:
- DMC $(\mathcal{X}, \mathcal{Y}, T)$ with capacity $C = C(T) = \max_{P_X} I(X; Y)$
- $(n, M)$-code: $\mathcal{C} = (f, g)$ with encoder $f : \{1, \ldots, M\} \to \mathcal{X}^n$ and decoder $g : \mathcal{Y}^n \to \{1, \ldots, M\}$
Capacity-achieving codes: A sequence $\{\mathcal{C}_n\}_{n=1}^\infty$, where each $\mathcal{C}_n$ is an $(n, M_n)$-code, is capacity-achieving if
$$\lim_{n \to \infty}\frac{1}{n}\log M_n = C.$$

67 Capacity-Achieving Channel Codes
Capacity-achieving input and output distributions:
$$P_X^* \in \arg\max_{P_X} I(X; Y) \quad \text{(may not be unique)}, \qquad P_X^* \to T \to P_Y^* \quad \text{(always unique)}.$$
Theorem (Shamai-Verdú, 1997). Let $\{\mathcal{C}_n\}$ be any capacity-achieving code sequence with vanishing error probability. Then
$$\lim_{n \to \infty}\frac{1}{n} D\big(P_{Y^n}^{(\mathcal{C}_n)} \,\big\|\, P_{Y^n}^*\big) = 0,$$
where $P_{Y^n}^{(\mathcal{C}_n)}$ is the output distribution induced by the code $\mathcal{C}_n$ when the messages in $\{1, \ldots, M_n\}$ are equiprobable.

68 Capacity-Achieving Channel Codes
$$\lim_{n \to \infty}\frac{1}{n} D\big(P_{Y^n} \,\big\|\, P_{Y^n}^*\big) = 0$$
Main message: channel output sequences induced by a good code resemble i.i.d. sequences drawn from the capacity-achieving output distribution (CAOD) $P_Y^*$.
Useful implications:
- estimate performance characteristics of good channel codes by their expectations w.r.t. $P_{Y^n}^* = (P_Y^*)^{\otimes n}$ -- often much easier to compute explicitly
- bound the estimation accuracy using large-deviation theory (e.g., Sanov's theorem)
Question: what about good codes with nonvanishing error probability?

69 Codes with Nonvanishing Error Probability
Y. Polyanskiy and S. Verdú, "Empirical distribution of good channel codes with non-vanishing error probability" (2012):
1. Let $\mathcal{C} = (f, g)$ be any $(n, M, \varepsilon)$-code for $T$: $\max_{1 \le j \le M} P[g(Y^n) \neq j \mid X^n = f(j)] \le \varepsilon$. Then
$$D\big(P_{Y^n}^{(\mathcal{C})} \,\big\|\, P_{Y^n}^*\big) \le nC - \log M + o(n).$$
2. If $\{\mathcal{C}_n\}_{n=1}^\infty$ is a capacity-achieving sequence, where each $\mathcal{C}_n$ is an $(n, M_n, \varepsilon)$-code for some fixed $\varepsilon > 0$, then
$$\lim_{n \to \infty}\frac{1}{n} D\big(P_{Y^n}^{(\mathcal{C}_n)} \,\big\|\, P_{Y^n}^*\big) = 0.$$
In some cases, the $o(n)$ term can be improved to $O(\sqrt{n})$.

70 Relative Entropy at the Output of a Code
Consider a DMC $T : \mathcal{X} \to \mathcal{Y}$ with $T(\cdot \mid \cdot) > 0$, and let
$$c(T) \triangleq 2\max_{x \in \mathcal{X}}\max_{y, y' \in \mathcal{Y}}\log\frac{T(y \mid x)}{T(y' \mid x)}.$$
Theorem (Raginsky-Sason, 2013). Any $(n, M, \varepsilon)$-code $\mathcal{C}$ for $T$, where $\varepsilon \in (0, 1/2)$, satisfies
$$D\big(P_{Y^n}^{(\mathcal{C})} \,\big\|\, P_{Y^n}^*\big) \le nC - \log M + \log\frac{1}{\varepsilon} + c(T)\sqrt{\frac{n}{2}\log\frac{1}{1 - 2\varepsilon}}.$$
Remark: Polyanskiy and Verdú show that $D\big(P_{Y^n}^{(\mathcal{C})} \,\big\|\, P_{Y^n}^*\big) \le nC - \log M + a\sqrt{n}$ for some constant $a = a(\varepsilon)$.
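The constant $c(T)$ above is easy to compute for a concrete channel; the following small helper (a sketch, not from the slides, assuming NumPy; the BSC crossover probability is an arbitrary example) evaluates it from a transition matrix with strictly positive entries.

```python
# c(T) = 2 max_x max_{y,y'} log( T(y|x) / T(y'|x) )  for a DMC with T(.|.) > 0.
import numpy as np

def c_of_T(T):
    """T[x, y] = T(y | x); all entries are assumed strictly positive."""
    logT = np.log(T)
    # for each input x, the largest log-ratio between two output probabilities
    return 2 * np.max(logT.max(axis=1) - logT.min(axis=1))

delta = 0.11                          # a BSC with crossover probability 0.11
bsc = np.array([[1 - delta, delta],
                [delta, 1 - delta]])
print(f"c(T) for BSC({delta}) = {c_of_T(bsc):.3f}")  # = 2 log((1 - delta) / delta)
```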

71 Proof Sketch (1)
Fix $x^n \in \mathcal{X}^n$ and study concentration of the function
$$h_{x^n}(y^n) \triangleq \log\frac{dP_{Y^n \mid X^n = x^n}}{dP_{Y^n}^{(\mathcal{C})}}(y^n)$$
around its expectation w.r.t. $P_{Y^n \mid X^n = x^n}$:
$$E[h_{x^n}(Y^n) \mid X^n = x^n] = D\big(P_{Y^n \mid X^n = x^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})}\big).$$
Step 1: Because $T(\cdot \mid \cdot) > 0$, the function $h_{x^n}(y^n)$ is 1-Lipschitz w.r.t. the scaled Hamming metric
$$d(y^n, \bar y^n) = c(T)\sum_{i=1}^n\mathbf{1}\{y_i \neq \bar y_i\}.$$

72 Proof Sketch (2)
Step 1: Because $T(\cdot \mid \cdot) > 0$, the function $h_{x^n}(y^n)$ is 1-Lipschitz w.r.t. the scaled Hamming metric $d(y^n, \bar y^n) = c(T)\sum_{i=1}^n\mathbf{1}\{y_i \neq \bar y_i\}$.
Step 2: Any product probability measure $\mu$ on $(\mathcal{Y}^n, d)$ satisfies
$$\log E_\mu\big[e^{tF(Y^n)}\big] \le \frac{n\, c(T)^2\, t^2}{8}$$
for any $F$ with $E_\mu F = 0$ and $F \in \mathrm{Lip}_1$.
Proof: tensorization of $T_1$ (Pinsker), followed by an appeal to Bobkov-Götze.

73 Proof Sketch (3)
$$h_{x^n}(y^n) = \log\frac{dP_{Y^n \mid X^n = x^n}}{dP_{Y^n}^{(\mathcal{C})}}(y^n), \qquad E[h_{x^n}(Y^n) \mid X^n = x^n] = D\big(P_{Y^n \mid X^n = x^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})}\big)$$
Step 3: For any $x^n$, $\mu = P_{Y^n \mid X^n = x^n}$ is a product measure, so
$$P\Big(h_{x^n}(Y^n) \ge D\big(P_{Y^n \mid X^n = x^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})}\big) + r\Big) \le \exp\!\left(-\frac{2r^2}{n\, c(T)^2}\right).$$
Use this with $r = c(T)\sqrt{\frac{n}{2}\log\frac{1}{1 - 2\varepsilon}}$:
$$P\Big(h_{x^n}(Y^n) \ge D\big(P_{Y^n \mid X^n = x^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})}\big) + c(T)\sqrt{\frac{n}{2}\log\frac{1}{1 - 2\varepsilon}}\Big) \le 1 - 2\varepsilon.$$
Remark: Polyanskiy-Verdú show $\mathrm{Var}[h_{x^n}(Y^n) \mid X^n = x^n] = O(n)$.

74 Proof Sketch (4)
Recall:
$$P\Big(h_{x^n}(Y^n) \ge D\big(P_{Y^n \mid X^n = x^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})}\big) + c(T)\sqrt{\frac{n}{2}\log\frac{1}{1 - 2\varepsilon}}\Big) \le 1 - 2\varepsilon.$$
Step 4: As in Polyanskiy-Verdú, appeal to Augustin's strong converse (1966) to get
$$\log M \le \log\frac{1}{\varepsilon} + D\big(P_{Y^n \mid X^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})} \,\big|\, P_{X^n}^{(\mathcal{C})}\big) + c(T)\sqrt{\frac{n}{2}\log\frac{1}{1 - 2\varepsilon}}.$$
Then
$$D\big(P_{Y^n}^{(\mathcal{C})} \,\big\|\, P_{Y^n}^*\big) = D\big(P_{Y^n \mid X^n} \,\big\|\, P_{Y^n}^* \,\big|\, P_{X^n}^{(\mathcal{C})}\big) - D\big(P_{Y^n \mid X^n} \,\big\|\, P_{Y^n}^{(\mathcal{C})} \,\big|\, P_{X^n}^{(\mathcal{C})}\big) \le nC - \log M + \log\frac{1}{\varepsilon} + c(T)\sqrt{\frac{n}{2}\log\frac{1}{1 - 2\varepsilon}}.$$

75 Relative Entropy at the Output of a Code
Theorem (Raginsky-Sason, 2013). Let $(\mathcal{X}, \mathcal{Y}, T)$ be a DMC with $C > 0$. Then, for any $0 < \varepsilon < 1$, any $(n, M, \varepsilon)$-code $\mathcal{C}$ for $T$ satisfies
$$D\big(P_{Y^n}^{(\mathcal{C})} \,\big\|\, P_{Y^n}^*\big) \le nC - \log M + b\,\sqrt{n}\,(\log n)^{3/2},$$
where the constant $b$ is explicit and depends on $|\mathcal{X}|$, $|\mathcal{Y}|$, and $\varepsilon$ (through $\log|\mathcal{X}|$, $\log(2|\mathcal{X}||\mathcal{Y}|^2)$, $\log\frac{1}{\varepsilon}$, and $\log|\mathcal{Y}|$).
Proof idea: apply the Blowing-Up Lemma to the code, then extract a good subcode.
Remark: Polyanskiy and Verdú show that $D\big(P_{Y^n}^{(\mathcal{C})} \,\big\|\, P_{Y^n}^*\big) \le nC - \log M + b'\sqrt{n}\,\log^{3/2} n$ for some constant $b' > 0$.

76 Concentration of Lipschitz Functions
Theorem (Raginsky-Sason, 2013). Let $(\mathcal{X}, \mathcal{Y}, T)$ be a DMC with $c(T) < \infty$. Let $d : \mathcal{Y}^n \times \mathcal{Y}^n \to \mathbb{R}^+$ be a metric, and suppose that $P_{Y^n \mid X^n = x^n}$, $x^n \in \mathcal{X}^n$, as well as $P_{Y^n}^*$, satisfy $T_1(c)$ for some $c > 0$. Then, for any $\varepsilon \in (0, 1/2)$, any $(n, M, \varepsilon)$-code $\mathcal{C}$ for $T$, and any function $f : \mathcal{Y}^n \to \mathbb{R}$, we have
$$P_{Y^n}^{(\mathcal{C})}\big(|f(Y^n) - E[f(\overline{Y}^n)]| \ge r\big) \le \frac{4}{\varepsilon}\exp\!\left(nC - \ln M + a\sqrt{n} - \frac{r^2}{8c\,\|f\|_{\mathrm{Lip}}^2}\right), \qquad r \ge 0,$$
where $\overline{Y}^n \sim P_{Y^n}^*$ and $a \triangleq c(T)\sqrt{\frac{1}{2}\ln\frac{1}{1 - 2\varepsilon}}$.

77 Proof Sketch
Step 1: For each $x^n \in \mathcal{X}^n$, let $\varphi(x^n) \triangleq E[f(Y^n) \mid X^n = x^n]$. Then, by Bobkov-Götze,
$$P\big(|f(Y^n) - \varphi(x^n)| \ge r \,\big|\, X^n = x^n\big) \le 2\exp\!\left(-\frac{r^2}{2c\,\|f\|_{\mathrm{Lip}}^2}\right).$$
Step 2: By restricting to a subcode with codewords $x^n \in \mathcal{X}^n$ satisfying $\varphi(x^n) \ge E[f(\overline{Y}^n)] + r$, we can show that
$$\frac{r}{\|f\|_{\mathrm{Lip}}} \le \sqrt{2c\left(nC - \log \widetilde{M} + a\sqrt{n} + \log\frac{1}{\varepsilon}\right)},$$
with $\widetilde{M} = M\, P_{X^n}^{(\mathcal{C})}\big(\varphi(X^n) \ge E[f(\overline{Y}^n)] + r\big)$. Solve to get
$$P_{X^n}^{(\mathcal{C})}\big(\varphi(X^n) \ge E[f(\overline{Y}^n)] + r\big) \le 2\exp\!\left(nC - \log M + a\sqrt{n} + \log\frac{1}{\varepsilon} - \frac{r^2}{2c\,\|f\|_{\mathrm{Lip}}^2}\right).$$
Step 3: Apply the union bound.

78 Empirical Averages at the Code Output
Equip $\mathcal{Y}^n$ with the Hamming metric $d(y^n, \bar y^n) = \sum_{i=1}^n\mathbf{1}\{y_i \neq \bar y_i\}$. Consider functions of the form
$$f(y^n) = \frac{1}{n}\sum_{i=1}^n f_i(y_i), \qquad \text{where } |f_i(y_i) - f_i(\bar y_i)| \le L\,\mathbf{1}\{y_i \neq \bar y_i\} \text{ for all } i, y_i, \bar y_i.$$
Then $\|f\|_{\mathrm{Lip}} \le L/n$.
Since $P_{Y^n \mid X^n = x^n}$ for all $x^n$ and $P_{Y^n}^*$ are product measures on $\mathcal{Y}^n$, they all satisfy $T_1(n/4)$ (by tensorization).
Therefore, for any $(n, M, \varepsilon)$-code and any such $f$ we have
$$P_{Y^n}^{(\mathcal{C})}\big(|f(Y^n) - E[f(\overline{Y}^n)]| \ge r\big) \le \frac{4}{\varepsilon}\exp\!\left(nC - \log M + a\sqrt{n} - \frac{nr^2}{2L^2}\right).$$

79 Concentration of Measure: Information-Theoretic Converse
Concentration phenomenon in a nutshell: if a subset of a metric probability space does not have too small a probability mass, then its blowups will eventually take up most of the probability mass.
Question: given a set whose blowups eventually take up most of the probability mass, how small can this set be?
This question was answered by Kontoyiannis (1999) as a consequence of a general information-theoretic converse.

80 Converse Concentration of Measure: The Set-Up
Let $\mathcal{X}$ be a finite set, together with a distortion function $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$ and a mass function $M : \mathcal{X} \to (0, \infty)$. Extend to the product space $\mathcal{X}^n$:
$$d_n(x^n, y^n) \triangleq \sum_{i=1}^n d(x_i, y_i), \qquad M_n(x^n) \triangleq \prod_{i=1}^n M(x_i), \qquad M_n(A) \triangleq \sum_{x^n \in A} M_n(x^n),\ A \subseteq \mathcal{X}^n.$$
Blowups: for $A \subseteq \mathcal{X}^n$,
$$[A]_r \triangleq \Big\{x^n \in \mathcal{X}^n : \min_{y^n \in A} d_n(x^n, y^n) \le r\Big\}.$$

81 Converse Concentration of Measure
Let $P$ be a probability measure on $\mathcal{X}$. Define
$$R_n(\delta) \triangleq \min\Big\{I(X^n; Y^n) + E\log M_n(Y^n) :\ P_{X^n Y^n} \text{ s.t. } P_{X^n} = P^{\otimes n},\ E[d_n(X^n, Y^n)] \le n\delta\Big\}.$$
Theorem (Kontoyiannis). Let $A_n \subseteq \mathcal{X}^n$ be an arbitrary set. Then
$$\frac{1}{n}\log M_n(A_n) \ge R(\delta), \qquad \text{where } \delta \triangleq \frac{1}{n} E\Big[\min_{y^n \in A_n} d_n(X^n, y^n)\Big] \text{ and } R(\delta) \triangleq \inf_{n \ge 1}\frac{R_n(\delta)}{n}.$$
Remark: It can be shown that
$$R(\delta) = \inf_{n \ge 1}\frac{R_n(\delta)}{n} = \lim_{n \to \infty}\frac{R_n(\delta)}{n} = R_1(\delta).$$

82 Proof
Define the mapping $\varphi_n : \mathcal{X}^n \to \mathcal{X}^n$ via $\varphi_n(x^n) \triangleq \arg\min_{y^n \in A_n} d_n(x^n, y^n)$, and let $Y^n = \varphi_n(X^n)$, $Q_n = \mathcal{L}(Y^n)$. Then
$$\log M_n(A_n) = \log\sum_{y^n \in A_n} M_n(y^n) \ge \log\sum_{y^n \in A_n:\ Q_n > 0} Q_n(y^n)\,\frac{M_n(y^n)}{Q_n(y^n)} \ge \sum_{y^n \in A_n} Q_n(y^n)\log\frac{M_n(y^n)}{Q_n(y^n)}$$
$$= -\sum_{y^n \in A_n} Q_n(y^n)\log Q_n(y^n) + \sum_{y^n \in A_n} Q_n(y^n)\log M_n(y^n) = H(Y^n) + E\log M_n(Y^n) = I(X^n; Y^n) + E\log M_n(Y^n) \ge R_n(\delta).$$

83 Converse Concentration of Measure
Consider a sequence of sets $\{A_n\}_{n=1}^\infty$ with
$$(\ast) \qquad P^{\otimes n}\big([A_n]_{n\delta}\big) \xrightarrow{n \to \infty} 1.$$
Apply Kontoyiannis' converse to the mass function $M = P$ to get the following:
Corollary. If the sequence $\{A_n\}$ satisfies $(\ast)$, then
$$\liminf_{n \to \infty}\frac{1}{n}\log P^{\otimes n}(A_n) \ge R(\delta),$$
where the concentration exponent is
$$R(\delta) = \min_{P_{XY}}\big\{I(X; Y) + E\log P(Y) :\ P_X = P,\ E[d(X, Y)] \le \delta\big\} = -\max_{P_{XY}}\big\{H(Y \mid X) + D(P_Y \| P) :\ P_X = P,\ E[d(X, Y)] \le \delta\big\}.$$

84 Example of the Concentration Exponent
Theorem (Raginsky-Sason, 2013). Let $P = \mathrm{Bern}(p)$. Then
$$R(\delta) \le \begin{cases} -\varphi(p)\,\delta^2 - (1-p)\, h\!\left(\dfrac{\delta}{1-p}\right), & \text{if } \delta \in [0, 1-p], \\[1ex] \log p, & \text{if } \delta \in [1-p, 1], \end{cases}$$
where $\varphi(p) = \dfrac{1}{1 - 2p}\log\dfrac{1-p}{p}$ and $h(\cdot)$ is the binary entropy function.
Remarks:
- The upper bound is not tight, but in this case $R(\delta)$ can be evaluated numerically (cf. Kontoyiannis, 2001).
- The proof is by a coupling argument.

85 Summary
Three related methods for obtaining sharp concentration inequalities in high dimension:
1. The entropy method
2. Log-Sobolev inequalities
3. Transportation-cost inequalities
All three methods crucially rely on tensorization: breaking the original high-dimensional problem into low-dimensional pieces, exploiting low-dimensional structure to control entropy locally, and assembling local information into a global bound.
Tensorization is a consequence of independence.
Applications to information theory: exploit the problem structure to isolate independence (e.g., the output distribution of a DMC for any fixed input block).

86 What We Had to Skip
- Log-Sobolev inequalities and hypercontractivity
- Log-Sobolev inequalities when Herbst fails (e.g., Poisson measures)
- Connections to isoperimetric inequalities
- HWI inequalities: tying together relative entropy, Wasserstein distance, and Fisher information
- Concentration inequalities for functions of dependent random variables
For this and more, consult our monograph: M. Raginsky and I. Sason, Concentration of Measure Inequalities in Information Theory, Communications and Coding, Foundations and Trends in Communications and Information Theory, 2nd edition, 2014.

87 Recent Books and Surveys - Concentration Inequalities
1. N. Alon and J. H. Spencer, The Probabilistic Method, Wiley, 3rd edition, 2008.
2. S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press, 2013.
3. F. Chung and L. Lu, Complex Graphs and Networks, Regional Conference Series in Mathematics, vol. 106, AMS, 2006.
4. D. P. Dubhashi and A. Panconesi, Concentration of Measure for the Analysis of Randomized Algorithms, Cambridge University Press, 2009.
5. M. Ledoux, The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, vol. 89, AMS, 2001.
6. C. McDiarmid, "Concentration," in Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 195-248, Springer, 1998.
7. M. Raginsky and I. Sason, Concentration of Measure Inequalities in Information Theory, Communications and Coding, Foundations and Trends in Communications and Information Theory, 2nd edition, 2014.
8. R. van Handel, Probability in High Dimension, lecture notes, 2014.


More information

Tightened Upper Bounds on the ML Decoding Error Probability of Binary Linear Block Codes and Applications

Tightened Upper Bounds on the ML Decoding Error Probability of Binary Linear Block Codes and Applications on the ML Decoding Error Probability of Binary Linear Block Codes and Moshe Twitto Department of Electrical Engineering Technion-Israel Institute of Technology Haifa 32000, Israel Joint work with Igal

More information

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122 Lecture 5: Channel Capacity Copyright G. Caire (Sample Lectures) 122 M Definitions and Problem Setup 2 X n Y n Encoder p(y x) Decoder ˆM Message Channel Estimate Definition 11. Discrete Memoryless Channel

More information

A Gentle Introduction to Concentration Inequalities

A Gentle Introduction to Concentration Inequalities A Gentle Introduction to Concentration Inequalities Karthik Sridharan Abstract This notes is ment to be a review of some basic inequalities and bounds on Random variables. A basic understanding of probability

More information

Introduction to Statistical Learning Theory

Introduction to Statistical Learning Theory Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated

More information

EXPONENTIAL INEQUALITIES IN NONPARAMETRIC ESTIMATION

EXPONENTIAL INEQUALITIES IN NONPARAMETRIC ESTIMATION EXPONENTIAL INEQUALITIES IN NONPARAMETRIC ESTIMATION Luc Devroye Division of Statistics University of California at Davis Davis, CA 95616 ABSTRACT We derive exponential inequalities for the oscillation

More information

MEASURE CONCENTRATION FOR COMPOUND POIS- SON DISTRIBUTIONS

MEASURE CONCENTRATION FOR COMPOUND POIS- SON DISTRIBUTIONS Elect. Comm. in Probab. 11 (26), 45 57 ELECTRONIC COMMUNICATIONS in PROBABILITY MEASURE CONCENTRATION FOR COMPOUND POIS- SON DISTRIBUTIONS I. KONTOYIANNIS 1 Division of Applied Mathematics, Brown University,

More information

Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence

Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Chao Zhang The Biodesign Institute Arizona State University Tempe, AZ 8587, USA Abstract In this paper, we present

More information

Series 7, May 22, 2018 (EM Convergence)

Series 7, May 22, 2018 (EM Convergence) Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

EE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions

EE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions EE/Stat 376B Handout #5 Network Information Theory October, 14, 014 1. Problem.4 parts (b) and (c). Homework Set # Solutions (b) Consider h(x + Y ) h(x + Y Y ) = h(x Y ) = h(x). (c) Let ay = Y 1 + Y, where

More information

Entropy, Inference, and Channel Coding

Entropy, Inference, and Channel Coding Entropy, Inference, and Channel Coding Sean Meyn Department of Electrical and Computer Engineering University of Illinois and the Coordinated Science Laboratory NSF support: ECS 02-17836, ITR 00-85929

More information

Superconcentration inequalities for centered Gaussian stationnary processes

Superconcentration inequalities for centered Gaussian stationnary processes Superconcentration inequalities for centered Gaussian stationnary processes Kevin Tanguy Toulouse University June 21, 2016 1 / 22 Outline What is superconcentration? Convergence of extremes (Gaussian case).

More information

National University of Singapore Department of Electrical & Computer Engineering. Examination for

National University of Singapore Department of Electrical & Computer Engineering. Examination for National University of Singapore Department of Electrical & Computer Engineering Examination for EE5139R Information Theory for Communication Systems (Semester I, 2014/15) November/December 2014 Time Allowed:

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately

More information

EE 4TM4: Digital Communications II. Channel Capacity

EE 4TM4: Digital Communications II. Channel Capacity EE 4TM4: Digital Communications II 1 Channel Capacity I. CHANNEL CODING THEOREM Definition 1: A rater is said to be achievable if there exists a sequence of(2 nr,n) codes such thatlim n P (n) e (C) = 0.

More information

Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes

Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel 2009 IEEE International

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

COMMENT ON : HYPERCONTRACTIVITY OF HAMILTON-JACOBI EQUATIONS, BY S. BOBKOV, I. GENTIL AND M. LEDOUX

COMMENT ON : HYPERCONTRACTIVITY OF HAMILTON-JACOBI EQUATIONS, BY S. BOBKOV, I. GENTIL AND M. LEDOUX COMMENT ON : HYPERCONTRACTIVITY OF HAMILTON-JACOBI EQUATIONS, BY S. BOBKOV, I. GENTIL AND M. LEDOUX F. OTTO AND C. VILLANI In their remarkable work [], Bobkov, Gentil and Ledoux improve, generalize and

More information

6.1 Moment Generating and Characteristic Functions

6.1 Moment Generating and Characteristic Functions Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,

More information

Lectures 6: Degree Distributions and Concentration Inequalities

Lectures 6: Degree Distributions and Concentration Inequalities University of Washington Lecturer: Abraham Flaxman/Vahab S Mirrokni CSE599m: Algorithms and Economics of Networks April 13 th, 007 Scribe: Ben Birnbaum Lectures 6: Degree Distributions and Concentration

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

ECE Information theory Final (Fall 2008)

ECE Information theory Final (Fall 2008) ECE 776 - Information theory Final (Fall 2008) Q.1. (1 point) Consider the following bursty transmission scheme for a Gaussian channel with noise power N and average power constraint P (i.e., 1/n X n i=1

More information

On Concentration of Martingales and Applications in Information Theory, Communication & Coding

On Concentration of Martingales and Applications in Information Theory, Communication & Coding On Concentration of Martingales and Applications in Information Theory, Communication & Coding Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel

More information

A new converse in rate-distortion theory

A new converse in rate-distortion theory A new converse in rate-distortion theory Victoria Kostina, Sergio Verdú Dept. of Electrical Engineering, Princeton University, NJ, 08544, USA Abstract This paper shows new finite-blocklength converse bounds

More information

Capacitary inequalities in discrete setting and application to metastable Markov chains. André Schlichting

Capacitary inequalities in discrete setting and application to metastable Markov chains. André Schlichting Capacitary inequalities in discrete setting and application to metastable Markov chains André Schlichting Institute for Applied Mathematics, University of Bonn joint work with M. Slowik (TU Berlin) 12

More information

Concentration Properties of Restricted Measures with Applications to Non-Lipschitz Functions

Concentration Properties of Restricted Measures with Applications to Non-Lipschitz Functions Concentration Properties of Restricted Measures with Applications to Non-Lipschitz Functions S G Bobkov, P Nayar, and P Tetali April 4, 6 Mathematics Subject Classification Primary 6Gxx Keywords and phrases

More information