Channel Dispersion and Moderate Deviations Limits for Memoryless Channels


Yury Polyanskiy and Sergio Verdú

Abstract—Recently, Altug and Wagner [1] posed a question regarding the optimal behavior of the probability of error when the channel coding rate converges to the capacity sufficiently slowly. They gave a sufficient condition for the discrete memoryless channel (DMC) to satisfy a moderate deviation property (MDP) with the constant equal to the channel dispersion. Their sufficient condition excludes some practically interesting channels, such as the binary erasure channel and the Z-channel. We extend their result in two directions. First, we show that a DMC satisfies the MDP if and only if its channel dispersion is nonzero. Second, we prove that the AWGN channel also satisfies the MDP with a constant equal to the channel dispersion. While the methods used by Altug and Wagner are based on the method of types and other DMC-specific ideas, our proofs of both the achievability and the converse parts rely on tools from our recent work [2] on the finite-blocklength regime that are equally applicable to non-discrete channels and channels with memory.

Index Terms—Shannon theory, channel capacity, channel dispersion, moderate deviations, discrete channels, AWGN channel, finite blocklength regime.

(The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA; e-mail: {ypolyans,verdu}@princeton.edu. The research was supported by the National Science Foundation under Grants CCF-06-3554 and CCF-07-28445.)

I. INTRODUCTION

A random transformation is defined by a pair of measurable spaces of inputs A and outputs B and a conditional probability measure P_{Y|X}: A → B. An (M, ε) code (average probability of error) for the random transformation (A, B, P_{Y|X}) is a pair of (possibly randomized) maps f: {1, ..., M} → A (the encoder) and g: B → {1, ..., M} (the decoder), satisfying

\frac{1}{M} \sum_{m=1}^{M} P[g(Y) \neq m \mid X = f(m)] \le \epsilon.   (1)

Similarly, an (M, ε) code (maximal probability of error) is a pair of (possibly randomized) maps f: {1, ..., M} → A and g: B → {1, ..., M}, satisfying

\max_{m \in \{1, ..., M\}} P[g(Y) \neq m \mid X = f(m)] \le \epsilon.   (2)

In applications, we will take A and B to be n-fold Cartesian products of alphabets A and B, and a channel to be a sequence of random transformations {P_{Y^n|X^n}: A^n → B^n} indexed by the blocklength n [3]. An (M, ε) code for (A^n, B^n, P_{Y^n|X^n}) is called an (n, M, ε) code. For a chosen channel we define the following non-asymptotic fundamental limits:

\epsilon^*(n, M) = \inf\{\epsilon : \exists\ (n, M, \epsilon)\text{-code (maximal probability of error)}\},   (3)
\epsilon^*_{\mathrm{avg}}(n, M) = \inf\{\epsilon : \exists\ (n, M, \epsilon)\text{-code (average probability of error)}\}.   (4)

For several memoryless channels, as well as some channels with memory, it is known that

\lim_{n\to\infty} \epsilon^*(n, \exp\{nR\}) = \begin{cases} 0, & R < C, \\ 1, & R > C, \end{cases}   (5)

where C is the capacity of the channel. The convergence in (5) is known to be exponential, but the precise evaluation of this exponent is generally an open problem even for the simplest DMCs. If instead we replace exp{nR} with exp{nC - A\sqrt{n}}, then the probability of error converges to a number strictly between 0 and 1, as follows:

\lim_{n\to\infty} \epsilon^*\!\left(n, \exp\{nC - A\sqrt{n}\}\right) = Q\!\left(\frac{A}{\sqrt{V}}\right),   (6)

where V is the channel dispersion, a fundamental characteristic of the channel that is especially valuable in finite-blocklength analysis; see [2], [4].
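As a numerical illustration of the normal approximation (6) (this example is not part of the original paper), the following Python sketch evaluates Q(A/\sqrt{V}) for a binary symmetric channel, using the standard closed forms C = 1 - h(p) and V = p(1-p)\log_2^2((1-p)/p) for its capacity and dispersion in bits and bits^2; the crossover probability p and the back-off A are illustrative choices.

```python
from math import log2, sqrt, erfc

def Q(x):
    # Gaussian tail: Q(x) = P[N(0,1) > x]
    return 0.5 * erfc(x / sqrt(2.0))

def bsc_capacity_dispersion(p):
    # Standard closed forms for the BSC, in bits and bits^2
    h = -p * log2(p) - (1 - p) * log2(1 - p)
    return 1.0 - h, p * (1 - p) * log2((1 - p) / p) ** 2

p = 0.11              # illustrative crossover probability
C, V = bsc_capacity_dispersion(p)
A = 0.5               # back-off in log2 M = n*C - A*sqrt(n), in bits
# Eq. (6): the error probability of the best code of size exp{nC - A*sqrt(n)}
# converges to Q(A / sqrt(V)) as the blocklength grows.
print(f"C = {C:.4f} bit/use, V = {V:.4f} bit^2, limit eps = {Q(A / sqrt(V)):.4f}")
```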
Reference [1] raised the question of the best possible behavior of the probability of error when the coding rate approaches capacity more slowly than 1/\sqrt{n}. If we assume that (6) holds uniformly in A, then we expect that

\epsilon^*\!\left(n, \exp\{nC - n\rho_n\}\right) \approx Q\!\left(\sqrt{\frac{n\rho_n^2}{V}}\right) \approx e^{-\frac{n\rho_n^2}{2V}}.   (7)

This argument justifies the following definition:

Definition 1: A channel with capacity C is said to satisfy the moderate deviation property (MDP) with constant ν if for any sequence of integers M_n such that

\log M_n = nC - n\rho_n,   (8)

where ρ_n > 0, ρ_n → 0 and nρ_n^2 → ∞, we have

\lim_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon^*(n, M_n) = \lim_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon^*_{\mathrm{avg}}(n, M_n)   (9)
= -\frac{\log e}{2\nu}.   (10)

As usual, Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt.
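To see what Definition 1 predicts numerically (an illustration, not part of the paper), the sketch below takes ρ_n = n^{-1/3}, which satisfies ρ_n → 0 and nρ_n^2 → ∞, and compares the two approximations in (7) with all logarithms natural (so log e = 1): the normalized exponents of both Q(\sqrt{nρ_n^2/V}) and e^{-nρ_n^2/(2V)} approach -1/(2V), which is exactly the limit in (9)-(10).

```python
from math import sqrt, erfc, exp, log

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

V = 0.43  # illustrative dispersion value, in nats^2 (natural logarithms throughout)

for n in (10**3, 10**4, 10**5, 10**6):
    rho = n ** (-1.0 / 3.0)                  # rho_n -> 0 while n*rho_n^2 -> infinity
    gauss_tail = Q(sqrt(n * rho**2 / V))     # first approximation in (7)
    mdp_approx = exp(-n * rho**2 / (2 * V))  # second approximation in (7)
    # normalized exponents (1/(n rho_n^2)) * ln(eps); both tend to -1/(2V)
    print(n, round(log(gauss_tail) / (n * rho**2), 4),
          round(log(mdp_approx) / (n * rho**2), 4))
```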

The regime of rate slowly converging to capacity as in (8) falls between the central-limit-theorem (CLT) regime (6) and the large deviations (error-exponent) regime (5). In [1] it was shown that the MDP holds for a certain subset of the DMCs which excludes, for example, the binary erasure channel (BEC) and the Z-channel. We show how a refinement of their result can be easily derived using methods developed in [2]. Namely, we show that a DMC satisfies the MDP if and only if its channel dispersion is positive. Therefore, not only do we extend the subset of DMCs for which the MDP is shown, but we show that this subset cannot be further extended. Additionally, we show that the additive white Gaussian noise (AWGN) channel satisfies the MDP. We also show that for any channel (not necessarily stationary, memoryless, or even non-anticipatory) the constant ν in the MDP and the dispersion V in the central-limit statement (6) cannot differ.

One of the main tools in our treatment [2] is the performance of the optimal binary hypothesis test, defined as follows. Consider a W-valued random variable W which can take probability measures P or Q. A randomized test between those two distributions is defined by a random transformation P_{Z|W}: W → {0, 1}, where 0 indicates that the test chooses Q. The best performance achievable among those randomized tests is given by (we write summations over alphabets for simplicity; however, all of our general results hold for arbitrary probability spaces)

\beta_\alpha(P, Q) = \min \sum_{w \in \mathcal{W}} Q(w)\, P_{Z|W}(1|w),   (11)

where the minimum is over all conditional probability distributions P_{Z|W} satisfying

\sum_{w \in \mathcal{W}} P(w)\, P_{Z|W}(1|w) \ge \alpha.   (12)

The minimum in (11) is guaranteed to be achieved by the Neyman-Pearson lemma. Thus, β_α(P, Q) gives the minimum probability of error under hypothesis Q if the probability of error under hypothesis P is not larger than 1 - α. It is easy to show (e.g. [5]) that for any γ > 0

\alpha \le P\!\left[\frac{dP}{dQ} \ge \gamma\right] + \gamma\, \beta_\alpha(P, Q).   (13)

On the other hand,

\beta_\alpha(P, Q) \le \frac{1}{\gamma_0}   (14)

for any γ_0 that satisfies

P\!\left[\frac{dP}{dQ} \ge \gamma_0\right] \ge \alpha.   (15)
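For finite alphabets, β_α(P, Q) in (11)-(12) can be computed directly from the Neyman-Pearson lemma: accept outcomes in order of decreasing likelihood ratio P(w)/Q(w) until the constraint (12) is met, randomizing on the boundary outcome. The following Python sketch (an illustration consistent with (11)-(12), not code from the paper) implements this for distributions given as arrays.

```python
import numpy as np

def beta_alpha(P, Q, alpha):
    """Minimum Q-probability of deciding 'P' subject to the P-probability of
    deciding 'P' being at least alpha, cf. (11)-(12); Neyman-Pearson test with
    randomization on the threshold outcome."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    ratio = P / np.maximum(Q, 1e-300)         # likelihood ratio dP/dQ
    order = np.argsort(-ratio)                # most P-favourable outcomes first
    p_acc, q_acc = 0.0, 0.0
    for w in order:
        if p_acc + P[w] < alpha:
            p_acc += P[w]                     # accept outcome w with probability 1
            q_acc += Q[w]
        else:
            t = (alpha - p_acc) / max(P[w], 1e-300)   # randomized acceptance
            return q_acc + t * Q[w]
    return q_acc

# Tiny example: beta_alpha shrinks as alpha decreases.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
print(beta_alpha(P, Q, 0.9), beta_alpha(P, Q, 0.5))
```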
II. ON THE MDP CONSTANT

For an arbitrary channel, neither (6) implies (10), nor vice versa. However, if both limits hold, then the respective constants must be equal:

Theorem 1: Consider an arbitrary channel (i.e. a sequence of random transformations) with capacity C, and suppose that the central-limit statement (6) holds with dispersion V and the MDP holds with constant ν (thus, V > 0 and ν > 0). Then

V = \nu.   (16)

Proof: Define a sequence of cumulative distribution functions (CDFs) as follows:

F_n(x) = \epsilon^*\!\left(n, \exp\{nC + x\sqrt{nV}\}\right).   (17)

Then, on the one hand, the central-limit property (6) ensures that

F_n(x) \to \Phi(x)   (18)

for all x ∈ R, where Φ is the CDF of the standard Gaussian N(0, 1):

\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dy.   (19)

On the other hand, the MDP property (cf. Definition 1) can be reformulated as follows: for every sequence a_n > 0 such that a_n → ∞ and a_n/\sqrt{n} → 0 we have

\lim_{n\to\infty} \frac{1}{a_n^2} \log F_n(-a_n) = -\frac{\log e}{2\theta},   (20)

where θ = ν/V. We must show that θ = 1. To do so, define the following non-increasing sequence of numbers:

u_n = \sup_{m \ge n} \sup_{x \in \mathbb{R}} |F_m(x) - \Phi(x)|.   (21)

Since the convergence in (18) is necessarily uniform, we have

u_n \to 0.   (22)

Notice that if u_n = 0 for all sufficiently large n, then the result follows automatically, since for any sequence a_n satisfying the conditions for (20) we have

\lim_{n\to\infty} \frac{1}{a_n^2} \log \Phi(-a_n) = -\frac{\log e}{2}   (23)

and hence θ = 1. Thus, in the remainder we assume that u_n does not vanish for any n.

First, suppose that

\limsup_{n\to\infty} \frac{1}{n} \log u_n < 0.   (24)

Then for some δ > 0 and all n ≥ N we should have

u_n \le \exp\{-n\delta\}.   (25)

But then, for any admissible sequence a_n we have

\lim_{n\to\infty} \frac{1}{a_n^2} \log F_n(-a_n) = \lim_{n\to\infty} \frac{1}{a_n^2} \log \Phi(-a_n)   (26)

because

|F_n(-a_n) - \Phi(-a_n)| \le \exp\{-n\delta\}   (27)

and, by the conditions on a_n, exp{-nδ} is negligible compared to Φ(-a_n). Finally, application of (23) to (26) completes the proof in this case.

Second, suppose that

\limsup_{n\to\infty} \frac{1}{n} \log u_n = 0.   (28)

Assume, for example, that θ > 1. Fix any δ > 0 such that

\frac{1}{2} - \delta > \frac{1 + 2\delta}{2\theta},   (29)

which is possible since θ > 1. Choose the following sequence a_n > 0 (natural logarithms in (30)-(32)):

a_n^2 = \frac{2\theta}{1 + 2\delta} \log_e \frac{1}{u_n}.   (30)

The limit (23) implies that for all sufficiently large n we have

\Phi(-a_n) \le e^{\left(-\frac{1}{2} + \delta\right) a_n^2} = u_n^{r_1},   (31)

where r_1 > 1 by (29). Condition (28) shows that the sequence a_n satisfies the conditions for (20), from which we find that for all sufficiently large n we have

F_n(-a_n) \ge e^{-\frac{1+\delta}{2\theta} a_n^2} = u_n^{r_2},   (32)

where r_2 < 1. Consider the chain of inequalities:

u_n \ge |F_n(-a_n) - \Phi(-a_n)|   (33)
    \ge F_n(-a_n) - \Phi(-a_n)   (34)
    \ge u_n^{r_2} - u_n^{r_1},   (35)

where (33) is by the definition of u_n in (21) and (35) is by (31) and (32). Finally, (35) is a contradiction: since r_2 < 1 < r_1 and u_n → 0, the right-hand side of (35) eventually exceeds u_n. Similarly, one shows that θ < 1 is also impossible.

III. DISCRETE MEMORYLESS CHANNELS

In the sequel we use the notation of [2, Section IV.A]. In particular, the DMC has a finite input alphabet A, a finite output alphabet B, and conditional probabilities

P_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^{n} W(y_i|x_i),   (36)

where W(·|x) is a conditional probability mass function on B for each x ∈ A, abbreviated as W_x when notationally convenient. We denote the simplex of probability distributions on A by P. It is useful to partition P into n-types:

\mathcal{P}_n = \{P \in \mathcal{P} : nP(x) \in \mathbb{Z}_+ \ \forall x \in A\}.   (37)

We use the following notation and terminology:

- output distribution

  PW(y) = \sum_{x \in A} P(x) W(y|x);   (38)

- information density

  i(x; y) = \log \frac{W(y|x)}{PW(y)};   (39)

- mutual information

  I(P, W) = E[i(X; Y)]   (40)
          = \sum_{x \in A} \sum_{y \in B} P(x) W(y|x) \log \frac{W(y|x)}{PW(y)};   (41)

- conditional information variance

  V(P, W) = E[\mathrm{Var}(i(X; Y) \mid X)]   (42)
          = \sum_{x \in A} P(x) \left\{ \sum_{y \in B} W(y|x) \log^2 \frac{W(y|x)}{PW(y)} - D(W_x \| PW)^2 \right\};   (43)

- third absolute moment of the information density

  T(P, W) = \sum_{x \in A} \sum_{y \in B} P(x) W(y|x) \left| \log \frac{W(y|x)}{PW(y)} - D(W_x \| PW) \right|^3.   (44)

Note that I(P, W), V(P, W) and T(P, W) are continuous functions of P ∈ P; see [2, Lemma 62]. We also use:

- the compact subset of capacity-achieving distributions

  \Pi = \{P \in \mathcal{P} : I(P, W) = C\},   (45)

  where

  C = \max_{P \in \mathcal{P}} I(P, W);   (46)

- the channel dispersion, which according to [2, Theorem 49] is equal to

  V = \min_{P \in \Pi} V(P, W).   (47)
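For a concrete DMC, the quantities (38)-(44) are direct sums over the alphabets and can be evaluated for any input distribution P; the sketch below (an illustrative computation following those definitions, not code from the paper, all logarithms base 2) does so. Obtaining C in (46) and V in (47) would additionally require optimizing over P; for the symmetric example used here the uniform input is capacity-achieving.

```python
import numpy as np

def information_quantities(P, W):
    """I(P,W), V(P,W), T(P,W) of (40)-(44), in bits, for an input distribution
    P (length |A|) and channel matrix W of shape (|A|, |B|) with rows W(.|x).
    Terms with W(y|x) = 0 contribute zero."""
    P, W = np.asarray(P, float), np.asarray(W, float)
    PW = P @ W                                          # output distribution (38)
    with np.errstate(divide="ignore", invalid="ignore"):
        dens = np.where(W > 0, np.log2(W / PW), 0.0)    # information density (39)
    joint = P[:, None] * W                              # joint pmf P(x) W(y|x)
    I = np.sum(joint * dens)                            # mutual information (41)
    D = np.sum(W * dens, axis=1)                        # D(W_x || PW) for each x
    V = np.sum(P * (np.sum(W * dens**2, axis=1) - D**2))               # (43)
    T = np.sum(P * np.sum(W * np.abs(dens - D[:, None])**3, axis=1))   # (44)
    return I, V, T

# BSC(0.11) under the uniform (capacity-achieving) input distribution:
p = 0.11
W = np.array([[1 - p, p], [p, 1 - p]])
print(information_quantities([0.5, 0.5], W))
```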

Apart from analyzing the limit of ε*_avg, the result of [1] can be stated as follows:

Theorem 2 ([1]): Consider a DMC W. If W(y|x) > 0 for all x ∈ A, y ∈ B, and V > 0, then the DMC W satisfies the MDP with constant V.

The main result of this section is:

Theorem 3: The DMC W satisfies the MDP if and only if V > 0, in which case V is the MDP constant of the DMC.

Theorem 3 follows from Theorems 4 and 6 below.

Theorem 4: Consider a DMC W and a sequence ρ_n such that ρ_n > 0, ρ_n → 0 and nρ_n^2 → ∞. If V > 0 then there exists a sequence of (n, exp{nC - nρ_n}, ε_n) codes (maximal probability of error) with

\limsup_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon_n \le -\frac{\log e}{2V}.   (48)

On the other hand, when V = 0 there exists a sequence of (n, exp{nC - nρ_n}, ε_n) codes (maximal probability of error) with

\epsilon_n \le 2 \exp\{-n\rho_n\},   (49)

so that the channel cannot satisfy the MDP.

Proof: Denote by P* the capacity-achieving distribution that also achieves V in (47). According to the DT bound [2, Theorem 17], there exists an (n, 2 exp{nC - nρ_n}, ε'_n) code (average probability of error) with

\epsilon'_n \le E\left[\exp\left\{-\left[i(X^n; Y^n) - nC + n\rho_n\right]^+\right\}\right],   (50)

where

i(x^n; y^n) = \sum_{j=1}^{n} \log \frac{W(y_j|x_j)}{P^*W(y_j)}.   (51)

Therefore, by a standard purging method, there also exists an (n, exp{nC - nρ_n}, ε_n) code (maximal probability of error) with ε_n ≤ 2ε'_n, so that

\epsilon_n \le 2\, E\left[\exp\left\{-\left[i(X^n; Y^n) - nC + n\rho_n\right]^+\right\}\right].   (52)

If V = 0 then i(X^n; Y^n) = nC almost surely and (49) readily follows. Assume V > 0, fix an arbitrary λ < 1 and consider the chain of elementary inequalities:

\exp\left\{-\left[i(X^n; Y^n) - nC + n\rho_n\right]^+\right\}   (53)
\le 1\{i(X^n; Y^n) \le nC - \lambda n\rho_n\}   (54)
   + \exp\left\{-\left[i(X^n; Y^n) - nC + n\rho_n\right]^+\right\} 1\{i(X^n; Y^n) > nC - \lambda n\rho_n\}   (55)
\le 1\{i(X^n; Y^n) \le nC - \lambda n\rho_n\} + \exp\{-(1-\lambda) n\rho_n\}.   (56)

By [6, Theorem 3.7.1] we have

\limsup_{n\to\infty} \frac{1}{n\rho_n^2} \log P\left[i(X^n; Y^n) \le nC - \lambda n\rho_n\right] \le -\frac{\lambda^2 \log e}{2V}.   (57)

Therefore, taking the expectation in (56) and using the conditions on ρ_n (the second term, exp{-(1-λ)nρ_n}, is asymptotically dominated by the first because nρ_n ≫ nρ_n^2), we obtain

\limsup_{n\to\infty} \frac{1}{n\rho_n^2} \log E\left[\exp\left\{-\left[i(X^n; Y^n) - nC + n\rho_n\right]^+\right\}\right] \le -\frac{\lambda^2 \log e}{2V}.   (58)

Since λ < 1 was arbitrary, we can take λ → 1 to obtain (48).
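The probabilistic fact (57) driving the achievability proof can be checked numerically. For a BSC with a uniform (capacity- and dispersion-achieving) input, the information density i(X^n; Y^n) is a deterministic function of the number of channel flips, so the probability on the left of (57) is an exact binomial tail. The sketch below (an illustrative check under the BSC assumption, all logarithms base 2) prints the normalized exponent for growing n; it slowly approaches the limit -λ^2 log_2 e / (2V), the gap being the Q-function's polynomial prefactor.

```python
from math import log2, log, lgamma, ceil, exp, e as EULER_E

def log_binom_pmf(n, k, p):
    # natural log of the Binomial(n, p) pmf, via lgamma for numerical stability
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

p, lam = 0.11, 0.9                                    # assumed BSC(0.11), lambda < 1
C = 1 + p * log2(p) + (1 - p) * log2(1 - p)           # capacity, bits
V = p * (1 - p) * log2((1 - p) / p) ** 2              # dispersion, bits^2

for n in (500, 2000, 8000, 32000):
    rho = n ** (-1.0 / 3.0)
    # With k flipped symbols, i(X^n;Y^n) = n*log2(2(1-p)) - k*log2((1-p)/p),
    # so the event {i <= nC - lam*n*rho} equals {k >= k0}:
    k0 = ceil((n * log2(2 * (1 - p)) - n * C + lam * n * rho) / log2((1 - p) / p))
    tail = sum(exp(log_binom_pmf(n, k, p)) for k in range(k0, n + 1))
    print(n, round(log2(tail) / (n * rho ** 2), 3),
          "->", round(-lam ** 2 * log2(EULER_E) / (2 * V), 3))
```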
The main analytic tool required in proving the converse bound in this section is a tight non-asymptotic lower bound on the probability of a large deviation of a random variable from its mean. This question has been addressed by many authors working in probability and statistics, starting with Kolmogorov [7]. Currently, one of the most general such results belongs to Rozovsky [8], [9]. The following is a weakening of [8, Theorem 1], which plays the same role as the Berry-Esseen inequality does in the analysis of (6); see [2]. (Similar to well-known extensions of the Berry-Esseen inequality to random variables without a third absolute moment, Rozovsky does not require that E|X_k|^3 be bounded; however, we will only need this weaker result.)

Theorem 5 (Rozovsky): There exist universal constants A_1 > 0 and A_2 > 0 with the following property. Let X_k, k = 1, ..., n, be independent with finite third moments:

\mu_k = E[X_k], \qquad \sigma_k^2 = \mathrm{Var}[X_k], \qquad t_k = E\left[|X_k - \mu_k|^3\right].   (59)

Denote S = \sum_{k=1}^{n} \sigma_k^2 and T = \sum_{k=1}^{n} t_k. Whenever x \ge 1 we have

P\left[\sum_{k=1}^{n} (X_k - \mu_k) > x\sqrt{S}\right] \ge Q(x)\, e^{-A_1 \frac{T}{S^{3/2}} x^3} \left(1 - A_2 \frac{T}{S^{3/2}} x\right).   (60)

Theorem 6: Consider a DMC W and a sequence of (n, M_n, ε_n) codes (average probability of error) with

\log M_n \ge nC - n\rho_n,   (61)

where ρ_n > 0, ρ_n → 0 and nρ_n^2 → ∞. If V > 0 then we have

\liminf_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon_n \ge -\frac{\log e}{2V}.   (62)

Proof: Replacing the encoder with an optimal deterministic one can only reduce the average probability of error. Next, if we have an (n, M_n, ε_n) code (average probability of error) with a deterministic encoder, then a standard argument shows that there exists an (n, M_n/2, 2ε_n) subcode (maximal probability of error). Replacing M_n ← M_n/2 and ε_n ← 2ε_n, without loss of generality we may assume that the code has a deterministic encoder and maximal probability of error ε_n.

Now for each n denote by P_n ∈ P_n the n-type containing the largest number of codewords. A standard type-counting argument then shows that there exists an (n, M'_n, ε_n) constant-composition subcode (of composition P_n) with

\log M'_n \ge nC - n\rho_n - |A| \log(n+1).   (63)

By compactness of P, the sequence P_n has an accumulation point P*. Without loss of generality, we may assume P_n → P*. Now for each n define the following probability distribution Q_{Y^n} on B^n:

Q_{Y^n}(y^n) = \prod_{j=1}^{n} P_n W(y_j).   (64)

According to [2, Theorem 31] we have

\beta_{1-\epsilon_n}(P_{X^n Y^n}, P_{X^n} Q_{Y^n}) \le \frac{1}{M'_n},   (65)

where here and below P_{X^n} is the distribution induced by the encoder on A^n. Applying (13), we get that for any γ

\epsilon_n \ge P\left[\log \frac{W(Y^n|X^n)}{Q_{Y^n}(Y^n)} < \gamma\right] - \exp\{\gamma - \log M'_n\}.   (66)

We now fix an arbitrary λ > 1 and take γ = nC - λnρ_n to obtain

\epsilon_n \ge P\left[\log \frac{W(Y^n|X^n)}{Q_{Y^n}(Y^n)} < nC - \lambda n\rho_n\right] - \exp\{-(\lambda - 1) n\rho_n + |A| \log(n+1)\}.   (67)

Notice that since the code has constant composition P_n, the conditional distribution of \log \frac{W(Y^n|X^n)}{Q_{Y^n}(Y^n)} given X^n = x^n is the same for every codeword x^n. Therefore, assuming such conditioning, we have

\log \frac{W(Y^n|X^n)}{Q_{Y^n}(Y^n)} = \sum_{j=1}^{n} Z_j,   (68)

where the Z_j are independent, with

\sum_{j=1}^{n} E[Z_j] = n I(P_n, W),   (69)
\sum_{j=1}^{n} \mathrm{Var}[Z_j] = n V(P_n, W),   (70)
\sum_{j=1}^{n} E\left[|Z_j - E[Z_j]|^3\right] = n T(P_n, W).   (71)

In terms of the Z_j, the bound in (67) asserts

\epsilon_n \ge P\left[\sum_{j=1}^{n} Z_j < nC - \lambda n\rho_n\right] - \exp\{-(\lambda - 1) n\rho_n + |A| \log(n+1)\}.   (72)

First, suppose that I(P*, W) < C. Then a simple Chernoff bound implies that the right-hand side of (67) converges to 1, and (62) holds. Next, assume I(P*, W) = C. Since I(P_n, W) ≤ C we have from (72):

\epsilon_n \ge P\left[\sum_{j=1}^{n} Z_j - n I(P_n, W) < -\lambda n\rho_n\right] - \exp\{-(\lambda - 1) n\rho_n + |A| \log(n+1)\}.   (73)

Note that by the continuity of V(P, W) we have

V(P_n, W) \to V(P^*, W) \ge V > 0,   (74)

where V(P*, W) ≥ V since P* is capacity-achieving. Therefore, by Theorem 5 we obtain

P\left[\sum_{j=1}^{n} Z_j - n I(P_n, W) < -\lambda n\rho_n\right] \ge Q\!\left(\lambda \sqrt{\frac{n\rho_n^2}{V(P_n, W)}}\right) e^{-\lambda^3 A_1 \frac{T(P_n, W)}{V^3(P_n, W)} n\rho_n^3} \left(1 - \lambda A_2 \frac{T(P_n, W)}{V^2(P_n, W)} \rho_n\right).   (75)

Since T(P_n, W) is continuous and thus bounded on P, the term in parentheses is 1 + o(1) and the exponential factor is e^{o(n\rho_n^2)}, because nρ_n^3 = o(nρ_n^2) by the conditions on ρ_n. Therefore,

\liminf_{n\to\infty} \frac{1}{n\rho_n^2} \log P\left[\sum_{j=1}^{n} Z_j - n I(P_n, W) < -\lambda n\rho_n\right]
\ge \liminf_{n\to\infty} \frac{1}{n\rho_n^2} \left( \log Q\!\left(\lambda \sqrt{\frac{n\rho_n^2}{V(P_n, W)}}\right) - \lambda^3 A_1 \frac{T(P_n, W)}{V^3(P_n, W)}\, n\rho_n^3 \log e \right)   (76)
= -\frac{\lambda^2 \log e}{2 V(P^*, W)}   (77)
\ge -\frac{\lambda^2 \log e}{2V}.   (78)

Finally, it is easy to see that the second term in (73) is asymptotically dominated by the first term, by (78) and because nρ_n ≫ nρ_n^2. Thus, from (78), and since λ > 1 was arbitrary, we conclude that

\liminf_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon_n \ge -\frac{\log e}{2V}.   (79)

IV. THE AWGN CHANNEL

The AWGN channel with signal-to-noise ratio (SNR) equal to P is defined for each blocklength n as follows: the input space is the subset of vectors x^n ∈ R^n satisfying

\|x^n\|^2 \le nP,   (80)

the output space is R^n, and the channel acts by adding white Gaussian noise of unit variance:

Y^n = X^n + Z^n,   (81)

where Z^n ~ N(0, I_n). The channel dispersion of the AWGN channel is given by [2, Theorem 54]:

V(P) = \frac{\log^2 e}{2} \cdot \frac{P(P+2)}{(1+P)^2}.   (82)
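For the AWGN channel, the quantities entering (82) and the converse proof below are easy to check by simulation. Conditioned on the all-\sqrt{P} input and with Q_{Y^n} as in (85) below, the per-letter log-likelihood ratio equals C + (log_2 e / 2)(P/(1+P))(1 - Z^2 + 2Z/\sqrt{P}) with Z ~ N(0, 1); this closed form follows by expanding the two Gaussian densities (cf. the sum appearing in (88)). The sketch below (an illustrative simulation, not from the paper, in bits) verifies that its mean is C and its variance is V(P) from (82).

```python
import numpy as np
from math import log2

rng = np.random.default_rng(1)

P = 1.0                                       # illustrative SNR
C = 0.5 * log2(1 + P)                         # AWGN capacity, bits per channel use
V = (log2(np.e) ** 2 / 2) * P * (P + 2) / (1 + P) ** 2   # dispersion (82), bits^2

# Per-letter information density at the all-sqrt(P) input with Q_Y = N(0, 1+P):
Z = rng.standard_normal(2_000_000)
i = C + 0.5 * log2(np.e) * (P / (1 + P)) * (1 - Z**2 + 2 * Z / np.sqrt(P))

print("C    =", round(C, 4), " empirical mean     =", round(i.mean(), 4))
print("V(P) =", round(V, 4), " empirical variance =", round(i.var(), 4))
```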

Theorem 7: The AWGN channel with SNR P satisfies the MDP with constant V(P).

Proof: We rely heavily on the notation and results of [2, Section III.J].

Converse: Consider a sequence of (n, M_n, ε_n) codes (average probability of error) with

M_n = \exp\{nC - n\rho_n\},   (83)

where ρ_n > 0, ρ_n → 0 and nρ_n^2 → ∞. Following the method of [10] and [2, Lemma 39], we can assume without loss of generality that every codeword c_j ∈ R^n, j = 1, ..., M_n, lies on the power sphere:

\|c_j\|^2 = nP.   (84)

We apply the meta-converse bound [2, Theorem 27] with Q_{Y^n} chosen as

Q_{Y^n} = \prod_{i=1}^{n} N(0, 1+P),   (85)

to obtain

\beta_{1-\epsilon_n}(P_{X^n Y^n}, P_{X^n} Q_{Y^n}) \le \exp\{-nC + n\rho_n\},   (86)

where P_{X^n} is the distribution induced by the encoder on R^n. As explained in [2, Section III.J], we have the equality

\beta_{1-\epsilon_n}(P_{X^n Y^n}, P_{X^n} Q_{Y^n}) = \beta_{1-\epsilon_n}(P_{Y^n|X^n = x}, Q_{Y^n}),   (87)

where x = [\sqrt{P}, ..., \sqrt{P}]^T. Now applying (13) to \beta_{1-\epsilon_n}(P_{Y^n|X^n = x}, Q_{Y^n}) with γ = nC - λnρ_n, where λ > 1 is arbitrary, we obtain

\epsilon_n \ge P\left[\frac{P \log e}{2(1+P)} \sum_{i=1}^{n} \left(1 - Z_i^2 + \frac{2}{\sqrt{P}} Z_i\right) < -\lambda n\rho_n\right] - \exp\{-(\lambda - 1) n\rho_n\},   (88)

where we have written the distribution of \log \frac{dP_{Y^n|X^n = x}}{dQ_{Y^n}} explicitly in terms of the i.i.d. random variables Z_i ~ N(0, 1); see [2, (205)]. According to [6, Theorem 3.7.1], the first term on the right-hand side of (88) dominates the second one, and we have

\liminf_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon_n \ge -\frac{\lambda^2 \log e}{2 V(P)},   (89)

and taking λ ↓ 1 we obtain

\liminf_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon_n \ge -\frac{\log e}{2 V(P)}.   (90)

Achievability: Similarly to [2, Section III.J], we apply the κβ bound [2, Theorem 25] with F chosen to be the power sphere

F = \{x^n \in \mathbb{R}^n : \|x^n\|^2 = nP\}   (91)

and Q_{Y^n} as in (85). Using the identity (87) and the lower bound on κ_τ(F, Q_{Y^n}) given by [2, Lemma 61], one shows that for all 0 < ε < 1 and 0 < τ < ε there exists an (n, M, ε) code (maximal probability of error) with

M \ge \frac{C_1 \tau\, e^{-C_2 \sqrt{n}}}{\beta_{1-\epsilon+\tau}(P_{Y^n|X^n = x}, Q_{Y^n})},   (92)

where x = [\sqrt{P}, ..., \sqrt{P}]^T ∈ R^n is a vector on the power sphere, and C_1 and C_2 are some positive constants. We now take τ = ε/2 and apply the upper bound on β from (14) to obtain the following statement: for any γ there exists an (n, M, ε) code (maximal probability of error) with

M \ge \frac{C_1 \epsilon}{2}\, e^{-C_2 \sqrt{n}} \exp\{\gamma\}   (93)

and

\epsilon = 2\, P\left[\log \frac{dP_{Y^n|X^n = x}}{dQ_{Y^n}} < \gamma\right].   (94)

Now take γ_n = nC - λnρ_n, where λ < 1 is arbitrary. By [6, Theorem 3.7.1] we obtain a sequence of codes with

\log M_n \ge nC - n\rho_n   (95)

for all sufficiently large n, and

\limsup_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon_n \le -\frac{\lambda^2 \log e}{2 V(P)}.   (96)

In particular,

\limsup_{n\to\infty} \frac{1}{n\rho_n^2} \log \epsilon^*\!\left(n, \exp\{nC - n\rho_n\}\right) \le -\frac{\lambda^2 \log e}{2 V(P)},   (97)

and since λ < 1 is arbitrary, we can take λ ↑ 1 to finish the proof.

REFERENCES

[1] Y. Altug and A. B. Wagner, "Moderate deviation analysis of channel coding: Discrete memoryless case," in Proc. 2010 IEEE Int. Symp. Inf. Theory (ISIT), Austin, TX, USA, Jun. 2010.
[2] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307-2359, May 2010.
[3] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1147-1157, 1994.
[4] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of the Gilbert-Elliott channel," IEEE Trans. Inf. Theory, to appear.
[5] S. Verdú, EE528 Information Theory, Lecture Notes. Princeton, NJ: Princeton Univ., 2007.
[6] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. New York: Springer-Verlag, 2009.
[7] A. N. Kolmogorov, "Über das Gesetz des iterierten Logarithmus," Math. Ann., vol. 101, pp. 126-135, 1929.
[8] L. V. Rozovsky, "Estimate from below for large-deviation probabilities of a sum of independent random variables with finite variances" (in Russian), Zapiski Nauchn. Sem. POMI, vol. 260, pp. 218-239, 1999.
[9] L. V. Rozovsky, "Estimate from below for large-deviation probabilities of a sum of independent random variables with finite variances," J. Math. Sci., vol. 109, no. 6, pp. 2192-2209, May 2002.
[10] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell Syst. Tech. J., vol. 38, pp. 611-656, 1959.