Second-Order Asymptotics in Information Theory
Vincent Y. F. Tan (vtan@nus.edu.sg)
Dept. of ECE and Dept. of Mathematics, National University of Singapore (NUS)
National Taiwan University, November 2015
Outline
1 Motivation, Background and History
2 Binary Hypothesis Testing
3 Fixed-Length Lossless Source Coding
4 Channel Coding
5 Slepian-Wolf Coding
6 Summary and Open Problems
Transmission of Information
[Shannon's Figure 1: INFORMATION SOURCE → TRANSMITTER → SIGNAL / RECEIVED SIGNAL (with NOISE SOURCE) → RECEIVER → DESTINATION]
Shannon abstracted away the meaning and semantics of information: treat all data equally.
Shannon's bits serve as a universal currency, a crucial abstraction for modern communication and computing systems.
He also relaxed computation and delay constraints to discover a fundamental limit, capacity, providing a goal-post to work toward.
Information theory: finding the fundamental limits for reliable information transmission.
Channel coding: concerned with the maximum rate of communication in bits/channel use.
Channel Coding (One-Shot)
M → f → X → W → Y → ϕ → M̂
A code is a triple C = (M, f, ϕ), where M is the message set.
The average error probability is p_err(C) := Pr[M̂ ≠ M], where M is uniform on M.
A non-asymptotic fundamental limit can be defined as
    M*(W, ε) := sup{ m ∈ ℕ : ∃ C s.t. |M| = m, p_err(C) ≤ ε }.
It is a central problem in information theory to characterize M*(W, ε).
Channel Coding (n-Shot)
M → f → X^n → W^n → Y^n → ϕ → M̂
Consider n independent uses of a discrete memoryless channel (DMC) W.
For vectors x^n = (x_1, …, x_n) ∈ X^n and y^n = (y_1, …, y_n) ∈ Y^n, the channel law is
    W^n(y^n | x^n) = ∏_{i=1}^n W(y_i | x_i).
The non-asymptotic fundamental limit for n uses of W is M*(W^n, ε).
Background: Shannon's Channel Coding Theorem
Shannon's (1948) noisy channel coding theorem and Wolfowitz's (1959) strong converse state the following.
Theorem (Shannon (1948), Wolfowitz (1959))
    lim_{n→∞} (1/n) log M*(W^n, ε) = C   for all ε ∈ (0, 1),
where C = max_P I(P, W) is the capacity of the DMC.
Background: Shannon's Channel Coding Theorem
    lim_{n→∞} (1/n) log M*(W^n, ε) = C   bits/channel use.
The channel coding theorem for DMCs is independent of ε ∈ (0, 1).
[Plot: lim_{n→∞} p_err(C_n) versus the rate R; it equals 0 for R < C and jumps to 1 for R > C]
Phase transition at capacity.
Background: Second-Order Coding Rates
What happens at capacity? More precisely, what happens when
    log M_n ≈ nC + L√n
for some L ∈ ℝ?
Here L is known as the second-order coding rate of the code.
Note that L can be negative (cf. Hayashi (2008), Hayashi (2009)).
Background: Second-Order Coding Rates
Assume the rate of the code satisfies
    (1/n) log M_n = C + L/√n + o(1/√n).
[Plot: lim_{n→∞} p_err(C_n) versus L; a Gaussian-cdf-shaped curve passing through 1/2 at L = 0]
Then p_err(C_n) = Φ(L/√V) + o(1).
For an error probability ε, the optimum second-order coding rate is L*(ε) := √V Φ^{-1}(ε).
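To make L*(ε) concrete, here is a minimal numeric sketch (the dispersion V and target ε below are assumed, illustrative values; Python is used for all examples in these notes):

from math import sqrt
from statistics import NormalDist

V, eps = 0.3, 0.01
L_star = sqrt(V) * NormalDist().inv_cdf(eps)   # Phi^{-1}(eps) < 0 for eps < 1/2
print(f"L*(eps={eps}) = {L_star:.4f}")         # negative: back off below capacity

The sign is the whole point: for any target error below 1/2, the best codes operate strictly below capacity by an amount of order 1/√n.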
Error Exponents vs Normal Approximation
For error exponent analysis, we fix a rate R < C and study
    ε*(W^n, 2^{nR}) = min{ ε : ∃ a (2^{nR}, ε)-code for W^n }.
Most of the time, ε*(W^n, 2^{nR}) ≈ exp(−nE(R)).
For the normal approximation, or second-order analysis, we fix the error probability ε ∈ (0, 1) and seek
    R*(W^n, ε) = (1/n) log M*(W^n, ε) ≈ C + √(V/n) Φ^{-1}(ε).
There is some form of duality in the analyses...
Agenda: Part I
Agenda for today's tutorial: point-to-point communication.
1 Most results in point-to-point information theory can be derived by understanding the fundamental limits of binary hypothesis testing (Strassen (1962)).
2 Lossless source coding is an easy corollary of binary hypothesis testing (Strassen (1962)).
3 Prove the channel coding dispersion for DMCs:
   1 Strassen (1962)
   2 Hayashi (2009)
   3 Polyanskiy-Poor-Verdú (2010)
   4 Tomamichel-Tan (2013)
Agenda: Part II
An extension to a multiterminal (network) setting: the Slepian-Wolf problem (Tan-Kosut (2014), Nomura-Han (2015)).
I will be talking about a subset of the material from my monograph:
V. Y. F. Tan, "Asymptotic Expansions in Information Theory with Non-Vanishing Error Probabilities," Foundations and Trends in Communications and Information Theory, Now Publishers.
Setup of the Binary Hypothesis Testing Problem
In binary hypothesis testing, we are concerned with the problem
    H_0 : Z ∼ P   vs   H_1 : Z ∼ Q,
where P, Q ∈ P(Z) are distributions on the same space Z.
The alphabet Z may be assumed to be finite.
Design a test δ : Z → {0, 1} that outputs 0 if Z ∼ P and 1 otherwise.
Various Error Probabilities
For a given test δ : Z → {0, 1}, we may define:
Probability of false alarm: P_FA(δ) := Σ_z δ(z) P(z) = E_P[δ(Z)].
Probability of missed detection: P_MD(δ) := Σ_z (1 − δ(z)) Q(z) = E_Q[1 − δ(Z)].
Holy grail: design a test δ such that P_FA(δ) → 0 while P_D(δ) = 1 − P_MD(δ) → 1; this is impossible most of the time.
A Measure of the Performance of a Test δ
For given distributions P and Q and an ε ∈ (0, 1), we may define
    β_{1−ε}(P, Q) := inf_{δ : Z → {0,1}} { P_MD(δ) : P_FA(δ) ≤ ε }.
This is the same as
    β_{1−ε}(P, Q) := inf_{A ⊆ Z : P(A) ≤ ε} Q(A^c),
where A represents the acceptance region for H_1, i.e., δ(z) = 1 iff z ∈ A.
The larger the tolerance ε, the smaller β_{1−ε}.
ε-hypothesis Testing Divergence The ε-hypothesis testing divergence is D ε h(p Q) := log β 1 ε(p, Q) 1 ε Measure of distinguishability of P from Q Vincent Tan (NUS) Second-Order Asymptotics NTU 18 / 109
ε-hypothesis Testing Divergence The ε-hypothesis testing divergence is D ε h(p Q) := log β 1 ε(p, Q) 1 ε Measure of distinguishability of P from Q Similar to divergence, we have non-negativity and data processing inequality D ε h(p Q) 0. D ε h(p Q) D ε h(pw QW) where PW(z ) = z P(z)W(z z) for any channel W : Z Z Vincent Tan (NUS) Second-Order Asymptotics NTU 18 / 109
ε-information Spectrum Divergence While D ε h (P Q) is very useful and fundamental, it is hard to compute (optimization over functions δ : Z {0, 1}). Vincent Tan (NUS) Second-Order Asymptotics NTU 19 / 109
ε-information Spectrum Divergence While D ε h (P Q) is very useful and fundamental, it is hard to compute (optimization over functions δ : Z {0, 1}). Define another related quantity: The ε-information spectrum divergence { { D ε s(p Q) := sup R : P z Z : log P(z) } } Q(z) R ε Information Spectrum Methods in Information Theory by T. S. Han (2003) Vincent Tan (NUS) Second-Order Asymptotics NTU 19 / 109
ε-information Spectrum Divergence { { D ε s(p Q) := sup R : P z Z : log P(z) } } Q(z) R ε ε 1 ε Density of log P(Z) Q(Z) when Z P R D ε s (P Q) is the largest point R for which the probability mass to the left is no larger than ε. Vincent Tan (NUS) Second-Order Asymptotics NTU 20 / 109
ε-information Spectrum Divergence { { D ε s(p Q) := sup R : P z Z : log P(z) } } Q(z) R ε The ε-information Spectrum Divergence is easy to estimate Vincent Tan (NUS) Second-Order Asymptotics NTU 21 / 109
ε-information Spectrum Divergence { { D ε s(p Q) := sup R : P z Z : log P(z) } } Q(z) R ε The ε-information Spectrum Divergence is easy to estimate If P n and Q n are product distributions, i.e., n P n (z n ) = P(z i ), Q n (z n ) = i=1 n Q(z i ), then the probability { P n z n Z n : log Pn (z n } ( n ) ) Q n (z n ) R = Pr log P(Z i) Q(Z i ) R i=1 i=1 and log P(Z i) Q(Z i ) (where Z i P) are iid random variables. Vincent Tan (NUS) Second-Order Asymptotics NTU 21 / 109
ε-information Spectrum Divergence { { D ε s(p Q) := sup R : P z Z : log P(z) } } Q(z) R ε The ε-information Spectrum Divergence is easy to estimate If P n and Q n are product distributions, i.e., n P n (z n ) = P(z i ), Q n (z n ) = i=1 n Q(z i ), then the probability { P n z n Z n : log Pn (z n } ( n ) ) Q n (z n ) R = Pr log P(Z i) Q(Z i ) R i=1 i=1 and log P(Z i) Q(Z i ) (where Z i P) are iid random variables. Probability is that of the tail of a sum of iid rvs easy! Vincent Tan (NUS) Second-Order Asymptotics NTU 21 / 109
Relation Between Divergences
Lemma. For every ε ∈ (0, 1) and η ∈ (0, 1 − ε), we have
    D_s^ε(P‖Q) − log 1/(1 − ε) ≤ D_h^ε(P‖Q) ≤ D_s^{ε+η}(P‖Q) + log (1 − ε)/η.
Proof: We only prove the lower bound. Let δ be the likelihood ratio test
    δ(z) := 1{ log P(z)/Q(z) ≤ γ },   where γ := D_s^ε(P‖Q) − ξ.
Proof of Relation Between Divergences
By the definition of the ε-information spectrum divergence,
    E_P[δ(Z)] = P[ {z ∈ Z : log P(z)/Q(z) ≤ γ} ] ≤ ε,   since γ = D_s^ε(P‖Q) − ξ.
Next, we estimate
    E_Q[1 − δ(Z)] = Σ_z Q(z) 1{ log P(z)/Q(z) > γ }
                  ≤ Σ_z P(z) exp(−γ) 1{ log P(z)/Q(z) > γ } ≤ exp(−γ).
Thus, one has
    D_h^ε(P‖Q) ≥ γ − log 1/(1 − ε) = D_s^ε(P‖Q) − ξ − log 1/(1 − ε).
The proof is completed by taking ξ ↓ 0.
Basic Definitions
Define the product distributions
    P^{(n)}(z^n) = ∏_{i=1}^n P_i(z_i),   Q^{(n)}(z^n) = ∏_{i=1}^n Q_i(z_i).
The relative entropy is
    D(P‖Q) = Σ_z P(z) log P(z)/Q(z),
and the relative entropy variance is
    V(P‖Q) := Σ_z P(z) [ log P(z)/Q(z) − D(P‖Q) ]².
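Both quantities are plain finite sums. A small helper (the distributions below are assumed, illustrative values):

from math import log

def relative_entropy_and_variance(P, Q):
    """Return D(P||Q) and V(P||Q) in nats for finite pmfs P, Q."""
    llr = [log(p / q) for p, q in zip(P, Q)]          # log-likelihood ratios
    D = sum(p * l for p, l in zip(P, llr))
    V = sum(p * (l - D) ** 2 for p, l in zip(P, llr))
    return D, V

P, Q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
D, V = relative_entropy_and_variance(P, Q)
print(f"D(P||Q) = {D:.4f} nats, V(P||Q) = {V:.4f} nats^2")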
Basic Asymptotic Expansions
Lemma (Asymptotic Expansion for D_s^ε). Assume that V(P_i‖Q_i) ≥ V for all i and for some constant V > 0. Then
    D_s^ε(P^{(n)}‖Q^{(n)}) = n D_n + √(n V_n) Φ^{-1}(ε) + O(1),
where D_n := (1/n) Σ_{i=1}^n D(P_i‖Q_i) and V_n := (1/n) Σ_{i=1}^n V(P_i‖Q_i).
Corollary (Asymptotic Expansion for D_s^ε for Identical Distributions). If P_i = P and Q_i = Q for all i ∈ {1, 2, …, n} and P ≠ Q, then
    D_s^ε(P^{(n)}‖Q^{(n)}) = nD + √(nV) Φ^{-1}(ε) + O(1),
where D = D(P‖Q) and V = V(P‖Q).
Asymptotic Expansion for D_s^ε(P^n‖Q^n)
Moral:
    √n ( (1/n) Σ_{i=1}^n log P(Z_i)/Q(Z_i) − D(P‖Q) ) →_d N(0, V(P‖Q)).
[Plot: Gaussian density centred at D(P‖Q) with standard deviation V(P‖Q)^{1/2}]
Berry-Esseen Theorem
Theorem. Let X_1, …, X_n be independent random variables with
    E[X_i] = 0,   E[X_i²] = σ_i²,   E[|X_i|³] = T_i.
Define
    σ² := (1/n) Σ_{i=1}^n σ_i²,   T := (1/n) Σ_{i=1}^n T_i.
Then for every n ≥ 1,
    sup_{a ∈ ℝ} | Pr( (1/(σ√n)) Σ_{i=1}^n X_i < a ) − Φ(a) | ≤ 6T / (σ³ √n),
where Φ(a) = ∫_{−∞}^a (1/√(2π)) e^{−t²/2} dt.
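As a sanity check on the theorem, the following sketch (my own illustration; n and p are assumed values) computes the exact Kolmogorov distance for a sum of centred Bernoulli variables and compares it with the bound 6T/(σ³√n):

from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.3
sigma = sqrt(p * (1 - p))                      # per-summand std dev
T = p * (1 - p) * ((1 - p) ** 2 + p ** 2)      # E|X_i|^3 for X_i = B_i - p
Phi = NormalDist().cdf

# Exact binomial CDF at every lattice point.
cdf, acc = [], 0.0
for k in range(n + 1):
    acc += comb(n, k) * p ** k * (1 - p) ** (n - k)
    cdf.append(acc)

# sup_a |Pr(S < a) - Phi(a)| is attained at the jump points of the CDF.
sup_dist = 0.0
for k in range(n + 1):
    a = (k - n * p) / (sigma * sqrt(n))
    left = cdf[k - 1] if k > 0 else 0.0        # Pr(S < a): strict inequality
    sup_dist = max(sup_dist, abs(left - Phi(a)), abs(cdf[k] - Phi(a)))

print(f"sup |F_n - Phi| = {sup_dist:.4f} <= 6T/(sigma^3 sqrt(n)) = "
      f"{6 * T / (sigma ** 3 * sqrt(n)):.4f}")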
Berry-Esseen Theorem vs Central Limit Theorem
Recall that the CLT implies that
    Pr( (1/(σ√n)) Σ_{i=1}^n X_i < a ) → Φ(a) = Pr(Z < a)
for every a ∈ ℝ.
Thus, the Berry-Esseen theorem quantifies the rate of convergence of the distribution function of (1/(σ√n)) Σ_{i=1}^n X_i to that of the standard Gaussian Z.
Basic Asymptotic Expansions: Proof
Now we show D_s^ε(P^{(n)}‖Q^{(n)}) = n D_n + √(n V_n) Φ^{-1}(ε) + O(1).
We may write
    Pr[ log P^{(n)}(Z^n)/Q^{(n)}(Z^n) ≤ R ] = Pr[ Σ_{i=1}^n log P_i(Z_i)/Q_i(Z_i) ≤ R ].
By the Berry-Esseen theorem,
    Pr[ Σ_{i=1}^n log P_i(Z_i)/Q_i(Z_i) ≤ R ] = Φ( (R − n D_n)/√(n V_n) ) ± c/√n.
The constant c > 0 is finite because V(P_i‖Q_i) ≥ V > 0 for all i.
Now upper bound the RHS by ε and solve for R.
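The expansion can also be checked numerically. This Monte Carlo sketch (illustrative; P, Q, n and ε are assumed values) estimates D_s^ε(P^n‖Q^n) as the empirical ε-quantile of the iid sum of log-likelihood ratios and compares it with the two-term approximation nD + √(nV) Φ^{-1}(ε):

import random
from math import log, sqrt
from statistics import NormalDist

P, Q, n, eps, trials = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], 500, 0.1, 20000
llr = [log(p / q) for p, q in zip(P, Q)]
D = sum(p * l for p, l in zip(P, llr))
V = sum(p * (l - D) ** 2 for p, l in zip(P, llr))

sums = []
for _ in range(trials):
    zs = random.choices(range(len(P)), weights=P, k=n)
    sums.append(sum(llr[z] for z in zs))
sums.sort()

mc = sums[int(eps * trials)]                       # empirical eps-quantile
approx = n * D + sqrt(n * V) * NormalDist().inv_cdf(eps)
print(f"Monte Carlo ~ {mc:.1f} nats vs nD + sqrt(nV) Phi^{{-1}}(eps) = {approx:.1f} nats")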
Further Asymptotic Expansions
Our real objective in this section is
    β_{1−ε}(P^n, Q^n) := inf_{δ : Z^n → {0,1}} { P_MD(δ) : P_FA(δ) ≤ ε }.
Now, we assume that P^n and Q^n are product distributions whose component distributions are identical, i.e.,
    P^n(z^n) = ∏_{i=1}^n P(z_i),   Q^n(z^n) = ∏_{i=1}^n Q(z_i).
Recall that
    D_h^ε(P^n‖Q^n) := −log [ β_{1−ε}(P^n, Q^n) / (1 − ε) ]
and that D_h^ε is related to D_s^ε by simple bounds.
Further Asymptotic Expansions
Lemma. Assume that P ≠ Q. Then, for every ε ∈ (0, 1),
    D_h^ε(P^n‖Q^n) = nD + √(nV) Φ^{-1}(ε) + O(log n),
where D = D(P‖Q) and V = V(P‖Q).
Corollary. Assume that P ≠ Q. Then, for every ε ∈ (0, 1),
    β_{1−ε}(P^n, Q^n) = exp( −nD − √(nV) Φ^{-1}(ε) + O(log n) ).
Chernoff-Stein Lemma
[Portraits: H. Chernoff and C. Stein]
Corollary (Chernoff-Stein Lemma).
    lim_{n→∞} −(1/n) log β_{1−ε}(P^n, Q^n) = D(P‖Q)   for all ε ∈ (0, 1).
We have proved a refinement of the Chernoff-Stein lemma, cf.
    −log β_{1−ε}(P^n, Q^n) = nD(P‖Q) + √(nV(P‖Q)) Φ^{-1}(ε) + O(log n).
Further Asymptotic Expansions: Proof
Recall that
    D_s^ε(P^n‖Q^n) − log 1/(1 − ε) ≤ D_h^ε(P^n‖Q^n) ≤ D_s^{ε+η}(P^n‖Q^n) + log (1 − ε)/η.
From the lower bound, we obtain
    D_s^ε(P^n‖Q^n) − log 1/(1 − ε) = nD + √(nV) Φ^{-1}(ε) + O(1).
From the upper bound, setting η = 1/√n, we obtain
    D_s^{ε+η}(P^n‖Q^n) + log (1 − ε)/η = nD + √(nV) Φ^{-1}(ε + 1/√n) + (1/2) log n + O(1)
                                       = nD + √(nV) Φ^{-1}(ε) + (1/2) log n + O(1),
where the last step uses a Taylor expansion of Φ^{-1}.
Summary of Binary Hypothesis Testing
The main result of this section is
    −(1/n) log β_{1−ε}(P^n, Q^n) ≈ D(P‖Q) + √(V(P‖Q)/n) Φ^{-1}(ε).
We can be more precise about the third-order terms (omitted in this tutorial).
Sometimes we cannot guarantee that inf_{i≥1} V(P_i‖Q_i) > 0. In this case, we can use Chebyshev's inequality instead of the Berry-Esseen theorem to upper bound D_s^ε(P^{(n)}‖Q^{(n)}).
Setup for Lossless Source Coding
x → f → m → ϕ → x̂
Illustration of the fixed-to-fixed-length source coding problem.
An (M, ε)-code for the source P ∈ P(X) consists of an encoder f : X → {1, …, M} and a decoder ϕ : {1, …, M} → X such that the probability of error
    P( {x ∈ X : ϕ(f(x)) ≠ x} ) ≤ ε.
Non-Asymptotic Fundamental Limit for Source Coding
Define
    M*(P, ε) := min{ M : ∃ an (M, ε)-code for P }.
When we observe n independent and identically distributed realizations of the source, we are interested in M*(P^n, ε), where P^n(x^n) = ∏_{i=1}^n P(x_i).
The Source Coding Theorem
Shannon, in his seminal 1948 paper, showed using typicality arguments the following.
Theorem (Shannon (1948)). For any discrete memoryless source and any ε ∈ (0, 1),
    lim_{n→∞} (1/n) log M*(P^n, ε) = H(P) = Σ_x P(x) log 1/P(x)   bits per source symbol.
The quantity H(P) is known as the entropy of the source.
Interpretation: the entropy is the smallest exponent of the size of sets in X^n with P^n-probability at least 1 − ε.
Refinements to the Source Coding Theorem
Can we refine the remainder term in
    log M*(P^n, ε) = nH(P) + o(n)?
In fact, this can be done very easily by using binary hypothesis testing!
There is a strong and simple connection between binary hypothesis testing and fixed-length lossless source coding.
Non-Asymptotic Achievability for Source Coding
Lemma. Let ε ∈ (0, 1). We have
    log M*(P, ε) ≤ log β_{1−ε}(P, µ) = −D_h^ε(P‖µ) − log 1/(1 − ε),
where µ is the counting measure, i.e., µ(A) = |A| for A ⊆ X.
Remark: the second argument of β_{1−ε}(P, Q) can be any unnormalized measure (not necessarily a probability measure); here we take Q = µ.
Proof of Achievability for Source Coding
Let T ⊆ X be a typical set of symbols with P-probability at least 1 − ε, i.e., P(T) ≥ 1 − ε.
Search over all such sets for the one with the smallest cardinality.
Assign each symbol in T a unique index from {1, …, |T|} and each symbol in X \ T the index 1.
Clearly, the error probability is ≤ ε, and hence
    M*(P, ε) ≤ min_{T ⊆ X : P(T) ≥ 1−ε} |T|.
The RHS is exactly β_{1−ε}(P, µ).
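The proof is directly algorithmic. A tiny sketch (the source pmf and ε below are assumed, illustrative values): sort symbols by probability and keep the most likely ones until their mass reaches 1 − ε; the count upper-bounds M*(P, ε).

# Greedy construction of the smallest set T with P(T) >= 1 - eps.
P, eps = [0.4, 0.25, 0.15, 0.1, 0.06, 0.04], 0.1
mass, size = 0.0, 0
for p in sorted(P, reverse=True):
    if mass >= 1 - eps:
        break
    mass += p
    size += 1
print(f"M*(P, eps) <= {size}  (covered mass {mass:.2f})")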
Non-Asymptotic Converse for Source Coding
Lemma. Let ε ∈ (0, 1). For any η ∈ (0, 1 − ε), we have
    log M*(P, ε) ≥ −D_s^{ε+η}(P‖µ) − log 1/η.
The proof follows from the same idea as upper bounding D_h^ε with D_s^{ε+η}.
In summary,
    −D_s^{ε+η}(P‖µ) − log 1/η ≤ log M*(P, ε) ≤ −D_h^ε(P‖µ) − log 1/(1 − ε).
Asymptotic Expansion for Lossless Source Coding
Define the entropy variance as
    V(P) = Σ_x P(x) [ log 1/P(x) − H(P) ]².
Theorem (Strassen (1962)). For any source P with V(P) > 0, we have
    log M*(P^n, ε) = nH(P) − √(nV(P)) Φ^{-1}(ε) + O(log n).
[Portrait: V. Strassen]
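A worked instance of the theorem (the source pmf, n and ε below are assumed, illustrative values): evaluate the two-term approximation nH(P) − √(nV(P)) Φ^{-1}(ε), which exceeds nH(P) for ε < 1/2 since Φ^{-1}(ε) < 0 there.

from math import log2, sqrt
from statistics import NormalDist

P, n, eps = [0.6, 0.25, 0.1, 0.05], 1000, 0.01
H = sum(-p * log2(p) for p in P)                  # entropy in bits
V = sum(p * (-log2(p) - H) ** 2 for p in P)       # entropy variance in bits^2
logM = n * H - sqrt(n * V) * NormalDist().inv_cdf(eps)
print(f"H = {H:.4f} bits, V(P) = {V:.4f}, log2 M*(P^n, eps) ~ {logM:.1f} bits")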
Asymptotic Expansion for Source Coding: Proof
The lower bound reads
    log M*(P^n, ε) ≥ −D_s^{ε+η}(P^n‖µ^n) − log 1/η.
Choose η = 1/√n and use the asymptotic expansion for D_s^ε:
    log M*(P^n, ε) ≥ −D_s^{ε+1/√n}(P^n‖µ^n) + O(log n)
                   = nH(P) − √(nV(P)) Φ^{-1}(ε + 1/√n) + O(log n),
because D(P‖µ) = −H(P) and V(P‖µ) = V(P).
The proof is completed by Taylor expanding Φ^{-1}(·).
Asymptotic Expansion for Source Coding: Proof
The upper bound reads
    log M*(P^n, ε) ≤ −D_h^ε(P^n‖µ^n) − log 1/(1 − ε).
The asymptotic expansion for D_h^ε then yields
    log M*(P^n, ε) ≤ nH(P) − √(nV(P)) Φ^{-1}(ε) + O(log n),
again because D(P‖µ) = −H(P) and V(P‖µ) = V(P).
Lossless Source Coding: Remarks
The main takeaway here is that
    log M*(P^n, ε) = nH(P) − √(nV(P)) Φ^{-1}(ε) + O(log n).
We can be more precise about the O(log n) term: it equals −(1/2) log n + O(1). This requires some additional techniques.
Observe that fixed-length lossless source coding is nothing but binary hypothesis testing with Q taken to be the counting measure µ while P remains as P!
The proof follows directly from the expansions of D_h^ε(P^n‖Q^n) and D_s^ε(P^n‖Q^n).
The Setup of the Channel Coding Problem
m → f → x → W → y → ϕ → m̂
Illustration of the channel coding problem.
An (M, ε)_av-code for the channel W ∈ P(Y|X) consists of an encoder f : {1, …, M} → X and a decoder ϕ : Y → {1, …, M} such that the average probability of error
    (1/M) Σ_{m=1}^M W( Y \ ϕ^{-1}(m) | f(m) ) ≤ ε.
Non-Asymptotic Fundamental Limit for Channel Coding
Define
    M*_av(W, ε) := max{ M : ∃ an (M, ε)_av-code for W }.
When we use the channel n times, we are interested in M*_av(W^n, ε), where
    W^n(y^n | x^n) = ∏_{i=1}^n W(y_i | x_i).
This means that the channel is stationary and memoryless.
We also assume X and Y are finite, so W is a DMC.
Later we will also consider AWGN channels, in which there is a power constraint.
The Channel Coding Theorem
Shannon, in his seminal 1948 paper, showed using typicality arguments the following.
Theorem (Shannon (1948)). For any discrete memoryless channel and any ε ∈ (0, 1),
    lim_{n→∞} (1/n) log M*_av(W^n, ε) = C = max_P I(P, W)   bits per channel use.
The mutual information for an input distribution P and channel W is defined as
    I(P, W) = Σ_x P(x) D( W(·|x) ‖ PW ) = Σ_{x,y} P(x) W(y|x) log [ W(y|x) / PW(y) ].
Interpretation: we can send up to C bits per channel use over W.
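A small helper (my own sketch; the BSC parameters below are assumed) that evaluates I(P, W) for a DMC given as a row-stochastic matrix:

from math import log2

def mutual_information(P, W):
    """I(P, W) in bits for input pmf P and row-stochastic channel matrix W."""
    ny = len(W[0])
    PW = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(ny)]
    return sum(P[x] * W[x][y] * log2(W[x][y] / PW[y])
               for x in range(len(P)) for y in range(ny) if W[x][y] > 0)

p = 0.11                                  # BSC crossover probability (assumed)
W = [[1 - p, p], [p, 1 - p]]
print(f"I(uniform, BSC({p})) = {mutual_information([0.5, 0.5], W):.4f} bits")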
Refinements to the Channel Coding Theorem
Can we refine the remainder term in
    log M*_av(W^n, ε) = nC + o(n)?
This is not as simple as lossless source coding: it requires an understanding of non-asymptotic (finite blocklength) bounds and careful asymptotic evaluations.
Non-Asymptotic Achievability for Channel Coding
Lemma (Feinstein (1954)). Let ε ∈ (0, 1) and let W be any channel from X to Y. Then for any η ∈ (0, ε), we have
    log M*_av(W, ε) ≥ sup_P D_s^{ε−η}(P × W ‖ P × PW) − log 1/η.
In fact, the original Feinstein lemma (1954) works for the maximum probability of error:
    log M*_max(W, ε) ≥ sup_P D_s^{ε−η}(P × W ‖ P × PW) − log 1/η.
Again, there is a connection to binary hypothesis testing.
Proof of Feinstein's Lemma: Part I
Generate M codewords independently from P. This forms the codebook C = {x(1), …, x(M)}.
Given y, decode to m ∈ {1, …, M} if and only if
    log [ W(y | x(m)) / PW(y) ] > γ.
Assume m = 1 is sent. The error events are
    E_1 := { log [ W(Y | X(1)) / PW(Y) ] ≤ γ },
    E_2 := { ∃ m̃ ≠ 1 : log [ W(Y | X(m̃)) / PW(Y) ] > γ }.
Proof of Feinstein's Lemma: Part II
The probability of error is Pr(E) ≤ Pr(E_1) + Pr(E_2).
The first term (related to the information spectrum divergence) is
    Pr(E_1) = Pr[ log W(Y | X(1)) / PW(Y) ≤ γ ].
The second term is
    Pr(E_2) = Pr[ ∃ m̃ ≠ 1 : log W(Y | X(m̃)) / PW(Y) > γ ]
            ≤ M Σ_{x,y} P(x) PW(y) 1{ log W(y|x)/PW(y) > γ }
            ≤ M Σ_{x,y} P(x) W(y|x) exp(−γ) 1{ log W(y|x)/PW(y) > γ }
            ≤ M exp(−γ).
Proof of Feinstein's Lemma: Part III
Hence, there exists an (M, ε)_av-code whose error probability satisfies, for every P and every γ,
    ε ≤ Pr[ log W(Y | X(1)) / PW(Y) ≤ γ ] + M exp(−γ),
or, setting η := M exp(−γ),
    ε − η ≤ Pr[ log W(Y | X(1)) / PW(Y) ≤ log (M/η) ].
In other words, for every P and every η, we have an (M, ε)_av-code as long as
    log (M/η) ≤ D_s^{ε−η}(P × W ‖ P × PW).
This completes the proof of Feinstein's lemma.
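The proof is constructive enough to simulate. Below is a toy Monte Carlo sketch of Feinstein's threshold decoder on a BSC (my own illustration; all parameters, including the threshold margin, are assumed). The decoder outputs the first message whose codeword has information density with y above γ; at a rate well below capacity the empirical error is near zero.

import random
from math import log2

p, n, M, trials = 0.11, 100, 256, 300
a, b = log2(2 * (1 - p)), log2(2 * p)   # per-symbol densities for agree/disagree
gamma = log2(M) + 2.0                   # threshold: log M plus a margin

errors = 0
for _ in range(trials):
    book = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]
    j = random.randrange(M)                              # sent message
    y = [x ^ (random.random() < p) for x in book[j]]     # BSC output
    decoded = None
    for m in range(M):                                   # first index passing
        d = sum(c != s for c, s in zip(book[m], y))      # Hamming distance
        if (n - d) * a + d * b > gamma:
            decoded = m
            break
    errors += (decoded != j)
print(f"empirical error probability ~ {errors / trials:.3f}")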
Second-Order Achievability for Channel Coding
Feinstein's lemma says that
    log M*_av(W^n, ε) ≥ sup_{P ∈ P(X^n)} D_s^{ε−η}(P × W^n ‖ P × PW^n) − log 1/η.
Choose η = 1/√n and P(x^n) = ∏_{i=1}^n P*(x_i), where
    P* = arg max_P I(P, W),
which we assume to be unique.
Second-Order Achievability for Channel Coding
By the asymptotic expansion of D_s^ε, we obtain
    D_s^{ε−η}( (P*)^n × W^n ‖ (P*)^n × (P*W)^n ) = n I(P*, W) + √(n U(P*, W)) Φ^{-1}(ε) + O(log n).
Then, using the bound on log M*_av(W^n, ε), we obtain
    log M*_av(W^n, ε) ≥ n I(P*, W) + √(n U(P*, W)) Φ^{-1}(ε) + O(log n)
                      = nC + √(n U(P*, W)) Φ^{-1}(ε) + O(log n),
where the unconditional information variance is
    U(P*, W) = Σ_{x,y} P*(x) W(y|x) [ log W(y|x)/P*W(y) − C ]².
Second-Order Achievability for Channel Coding
Lemma. For a DMC with unique capacity-achieving input distribution P*,
    log M*_av(W^n, ε) ≥ nC + √(n U(P*, W)) Φ^{-1}(ε) + O(log n).
The second-order term contains
    U(P*, W) = Σ_{x,y} P*(x) W(y|x) [ log W(y|x)/P*W(y) − C ]².
This is not quite right, but almost right; we need a converse.
We can refine the third-order term (Polyanskiy's thesis (2010)).
Since Feinstein's lemma holds under the max-error setting, the same bound holds for log M*_max(W^n, ε).
Non-Asymptotic Converse for Channel Coding
Lemma (Hayashi-Nagaoka (2003)). Let ε ∈ (0, 1) and η ∈ (0, 1 − ε). Let W be any channel from X to Y. Then
    log M*_av(W, ε) ≤ inf_{Q ∈ P(Y)} sup_{P ∈ P(X)} D_s^{ε+η}(P × W ‖ P × Q) + log 1/η.
There is an amazing duality with Feinstein's lemma:
    log M*_av(W, ε) ≥ sup_{P ∈ P(X)} D_s^{ε−η}(P × W ‖ P × PW) − log 1/η.
But here the user can choose the output distribution Q.
The key to the second-order converse is how to choose Q so as to make the evaluation of the RHS easy.
Proof of Hayashi-Nagaoka Lemma: Part I
Fix any (M, ε)_av-code for W. This induces the Markov chain J → X → Y → Ĵ, where J is uniformly distributed on {1, …, M}. This Markov chain induces the code distribution
    P_{JXYĴ}(j, x, y, ĵ) = (1/M) 1{x = f(j)} W(y|x) 1{ĵ = ϕ(y)}.
Due to the data processing inequality for D_h^ε, we obtain
    D_h^ε(P × W ‖ P × Q) = D_h^ε(P_{XY} ‖ P_X × Q_Y) ≥ D_h^ε(P_{JĴ} ‖ P_J × Q_Ĵ),
where Q_Ĵ is induced by the decoder ϕ applied to Q_Y.
Consider the test δ(j, ĵ) = 1{j ≠ ĵ}.
Proof of Hayashi-Nagaoka Lemma: Part II
The test satisfies E_{P_{JĴ}}[δ(J, Ĵ)] = Pr(J ≠ Ĵ) ≤ ε.
Furthermore,
    E_{P_J × Q_Ĵ}[δ(J, Ĵ)] = Σ_{j,ĵ} P_J(j) Q_Ĵ(ĵ) 1{j ≠ ĵ}
                           = 1 − Σ_{j,ĵ} P_J(j) Q_Ĵ(ĵ) 1{j = ĵ}
                           = 1 − Σ_ĵ Q_Ĵ(ĵ) Σ_j P_J(j) 1{j = ĵ}
                           = 1 − Σ_ĵ Q_Ĵ(ĵ) (1/M) = 1 − 1/M.
Proof of Hayashi-Nagaoka Lemma: Part III
By the definition of the hypothesis testing divergence,
    D_h^ε(P_{JĴ} ‖ P_J × Q_Ĵ) ≥ log M + log(1 − ε).
By the relation between D_s^ε and D_h^ε,
    log M ≤ D_h^ε(P × W ‖ P × Q) + log 1/(1 − ε) ≤ D_s^{ε+η}(P × W ‖ P × Q) + log 1/η.
Maximize over P to make the bound code-independent:
    log M ≤ sup_P D_s^{ε+η}(P × W ‖ P × Q) + log 1/η.
Q is a free parameter; minimize over it.
Second-Order Converse for Channel Coding: Part I
Fix an (M, ε)_av-code. Starting from the Hayashi-Nagaoka converse for the channel W^n, we obtain
    log M ≤ sup_P D_s^{ε+η}(P × W^n ‖ P × Q) + log 1/η
for any fixed Q ∈ P(Y^n).
We can replace the optimization over P ∈ P(X^n) by an optimization over input sequences x^n ∈ X^n:
    log M ≤ max_{x^n} D_s^{ε+η}( W^n(·|x^n) ‖ Q ) + log 1/η.
Now choose Q ∈ P(Y^n) to be the convex combination over all n-types:
    Q(y^n) = (1/|P_n(X)|) Σ_{P ∈ P_n(X)} (PW)^n(y^n).
Second-Order Converse for Channel Coding: Part II
Lemma. Let θ_i ≥ 0 be such that Σ_i θ_i = 1. Then
    D_s^ε( P ‖ Σ_i θ_i Q_i ) ≤ inf_i { D_s^ε(P ‖ Q_i) + log 1/θ_i }.
From the previous derivations,
    log M ≤ max_{x^n} D_s^{ε+η}( W^n(·|x^n) ‖ (1/|P_n(X)|) Σ_{P ∈ P_n(X)} (PW)^n ) + log 1/η.
By sieving out the type of x^n, one has
    log M ≤ max_{x^n} D_s^{ε+η}( W^n(·|x^n) ‖ (P̂_{x^n}W)^n ) + log |P_n(X)| + log 1/η.
Second-Order Converse for Channel Coding: Part III
    log M ≤ max_{x^n} D_s^{ε+η}( W^n(·|x^n) ‖ (P̂_{x^n}W)^n ) + log |P_n(X)| + log 1/η,
where log |P_n(X)| + log 1/η = O(log n) after choosing η = 1/√n.
By the asymptotic expansion of the information spectrum divergence,
    log M ≤ max_{P ∈ P_n(X)} n I(P, W) + √(n V(P, W)) Φ^{-1}(ε) + O(log n),
where the conditional information variance is
    V(P, W) = Σ_x P(x) Σ_y W(y|x) [ log W(y|x)/PW(y) − D(W(·|x)‖PW) ]².
Now invoke the continuity of P ↦ I(P, W) and P ↦ V(P, W) to replace P with P* above.
Second-Order Converse for Channel Coding: Part IV
Lemma. For a DMC with unique capacity-achieving input distribution P*,
    log M*_av(W^n, ε) ≤ nC + √(n V(P*, W)) Φ^{-1}(ε) + O(log n).
This is almost the same as the achievability bound
    log M*_av(W^n, ε) ≥ nC + √(n U(P*, W)) Φ^{-1}(ε) + O(log n).
So is U(P*, W) = V(P*, W)?
Second-Order Asymptotics for Channel Coding
The conditional information variance is
    V(P, W) = Σ_x P(x) Σ_y W(y|x) [ log W(y|x)/PW(y) − D(W(·|x)‖PW) ]²,
and the unconditional information variance is
    U(P, W) = Σ_x P(x) Σ_y W(y|x) [ log W(y|x)/PW(y) − C ]².
In general, if (X, Y) ∼ P × W, then
    var(i(X; Y)) = U(P, W) ≥ V(P, W) = E[ var(i(X; Y) | X) ]
by the law of total variance.
Second-Order Asymptotics for Channel Coding
If we choose P* to be capacity-achieving (i.e., I(P*, W) = C), then by the KKT conditions,
    D( W(·|x) ‖ P*W ) = C   for all x with P*(x) > 0,
and so
    V(P*, W) = Σ_x P*(x) Σ_y W(y|x) [ log W(y|x)/P*W(y) − D(W(·|x)‖P*W) ]²
             = Σ_x P*(x) Σ_y W(y|x) [ log W(y|x)/P*W(y) − C ]² = U(P*, W).
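A numeric sanity check of this identity (a BSC with uniform input is assumed, illustrative values): compute C, U(P*, W) and V(P*, W) and verify that they agree.

from math import log2

p = 0.11
P = [0.5, 0.5]                                  # capacity-achieving for the BSC
W = [[1 - p, p], [p, 1 - p]]
PW = [sum(P[x] * W[x][y] for x in range(2)) for y in range(2)]
i = lambda x, y: log2(W[x][y] / PW[y])          # information density
pairs = [(x, y) for x in range(2) for y in range(2)]
C = sum(P[x] * W[x][y] * i(x, y) for x, y in pairs)
U = sum(P[x] * W[x][y] * (i(x, y) - C) ** 2 for x, y in pairs)
Dx = [sum(W[x][y] * i(x, y) for y in range(2)) for x in range(2)]
V = sum(P[x] * W[x][y] * (i(x, y) - Dx[x]) ** 2 for x, y in pairs)
print(f"C = {C:.4f} bits, U = {U:.6f}, V = {V:.6f}")   # U and V coincide here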
Second-Order Asymptotics for Channel Coding
Theorem (Strassen (1962)). For a DMC with unique capacity-achieving input distribution P*,
    log M*_av(W^n, ε) = nC + √(nV) Φ^{-1}(ε) + O(log n),   where V = V(P*, W).
The direct part is based on Feinstein's lemma.
The converse part is based on Hayashi-Nagaoka with a clever choice of Q.
We can optimize the third-order term; usually O(log n) = (1/2) log n + O(1). See Tomamichel-Tan (2013) and Altuğ-Wagner (2014).
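A worked example of the theorem for a BSC (the crossover p, blocklength n and ε below are assumed, illustrative values), using the closed forms C = 1 − h(p) and V = p(1 − p) log²((1 − p)/p):

from math import log2, sqrt
from statistics import NormalDist

p, n, eps = 0.11, 500, 1e-3
h = -p * log2(p) - (1 - p) * log2(1 - p)        # binary entropy
C = 1 - h                                       # capacity of BSC(p), bits
V = p * (1 - p) * log2((1 - p) / p) ** 2        # dispersion, bits^2
logM = n * C + sqrt(n * V) * NormalDist().inv_cdf(eps)
print(f"C = {C:.4f}, V = {V:.4f}, log2 M*(W^n, eps) ~ {logM:.1f} bits "
      f"({logM / n:.4f} bits/use)")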
Summary: Channel Coding
We derived the second-order asymptotic expansions for DMCs and AWGN channels.
Achievability hinges on Feinstein's lemma or its generalized version.
The converse hinges on the Hayashi-Nagaoka lemma with a good choice of output distribution.
Setup of the Slepian-Wolf Coding Problem
x_1^n → f_1 → m_1 ↘
                    ϕ → (x̂_1^n, x̂_2^n)
x_2^n → f_2 → m_2 ↗
Illustration of the Slepian-Wolf problem.
Two correlated sources: (X_1^n, X_2^n) ∼ ∏_{i=1}^n P_{X_1 X_2}(x_{1i}, x_{2i}).
The sources are separately encoded, and both are to be decoded at the destination.
The Slepian-Wolf Theorem
The sources are to be compressed to nR_1 and nR_2 bits respectively.
(R_1, R_2) is achievable if there exists a sequence of (2^{nR_1}, 2^{nR_2}, n)-codes such that
    lim_{n→∞} Pr( (X̂_1^n, X̂_2^n) ≠ (X_1^n, X_2^n) ) = 0.
R(P_{X_1 X_2}) is the set of all achievable (R_1, R_2) pairs.
Slepian and Wolf (1973):
    R(P_{X_1 X_2}) = { R_1 ≥ H(X_1|X_2), R_2 ≥ H(X_2|X_1), R_1 + R_2 ≥ H(X_1, X_2) }.
[Portraits: D. Slepian and J. Wolf]
The Slepian-Wolf Region
[Plot: the region R(P_{X_1 X_2}) in the (R_1, R_2) plane, with the R_1 axis marked at H(X_1|X_2) and H(X_1), the R_2 axis marked at H(X_2|X_1) and H(X_2), and the two corner points joined by the sum-rate face R_1 + R_2 = H(X_1, X_2)]
    R_1 ≥ H(X_1|X_2),   R_2 ≥ H(X_2|X_1),   R_1 + R_2 ≥ H(X_1, X_2).
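The corner points of the region are simple to compute from the joint pmf. A short sketch (the joint pmf below is assumed, illustrative values):

from math import log2

Pxy = [[0.4, 0.1], [0.1, 0.4]]                  # assumed joint pmf P_{X1 X2}
P1 = [sum(row) for row in Pxy]                  # marginal of X1
P2 = [sum(Pxy[x][y] for x in range(2)) for y in range(2)]   # marginal of X2
H12 = -sum(q * log2(q) for row in Pxy for q in row if q > 0)
H1 = -sum(q * log2(q) for q in P1)
H2 = -sum(q * log2(q) for q in P2)
print(f"H(X1|X2) = {H12 - H2:.4f}, H(X2|X1) = {H12 - H1:.4f}, "
      f"H(X1,X2) = {H12:.4f} bits")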
A Review of Slepian-Wolf Coding
Partition X_j^n randomly into exp(nR_j) bins, i.e.,
    Pr( x_j^n ∈ B_j(m_j) ) = exp(−nR_j),   j = 1, 2.
Transmit the bin index m_j ∈ [1 : exp(nR_j)] if X_j^n ∈ B_j(m_j).
At the decoder, declare that (x_1^n, x_2^n) ∈ B_1(m_1) × B_2(m_2) are the transmitted vectors iff (x_1^n, x_2^n) is the unique jointly typical pair in T_ε^{(n)} within the indexed bins.
By standard typicality arguments, we need
    R_1 ≥ H(X_1|X_2),   R_2 ≥ H(X_2|X_1),   R_1 + R_2 ≥ H(X_1, X_2).
Setup for Second-Order Asymptotics for Slepian-Wolf
Previously, we chose M_1 = exp(nR_1), M_2 = exp(nR_2) for the optimum rate region and sought rates (R_1, R_2) such that
    Pr( (X̂_1^n, X̂_2^n) ≠ (X_1^n, X_2^n) ) → 0.
Alternatively, we can fix (R_1, R_2) ∈ Bd(R(P_{X_1 X_2})) and ε ∈ (0, 1) and choose
    M_1 = exp(nR_1 + √n L_1),   M_2 = exp(nR_2 + √n L_2).
Then we seek (L_1, L_2) pairs such that
    Pr( (X̂_1^n, X̂_2^n) ≠ (X_1^n, X_2^n) ) ≤ ε + o(1).
Setup for Second-Order Asymptotics for SW
Note that we are operating on the boundary of the SW region!
[Plot: the SW region R(P_{X_1 X_2}) with a point (R_1, R_2) marked on its boundary Bd(R(P_{X_1 X_2}))]
Definitions for Second-Order Asymptotics for SW
(L_1, L_2) ∈ ℝ² is (R_1, R_2, ε)-achievable if there exists a sequence of (n, M_{1n}, M_{2n}, ε_n)-codes such that
    lim sup_{n→∞} (1/√n)(log M_{1n} − nR_1) ≤ L_1,
    lim sup_{n→∞} (1/√n)(log M_{2n} − nR_2) ≤ L_2,
and lim sup_{n→∞} ε_n ≤ ε.
L(ε; R_1, R_2) is the set of all (R_1, R_2, ε)-achievable (L_1, L_2) pairs.
Second-Order Asymptotics for SW
Joint work with Oliver Kosut.
Paper published in the February 2014 issue of the IEEE Transactions on Information Theory.