Dispersion of the Gilbert-Elliott Channel

Yury Polyanskiy, Email: ypolyans@princeton.edu
H. Vincent Poor, Email: poor@princeton.edu
Sergio Verdú, Email: verdu@princeton.edu

Abstract—Channel dispersion plays a fundamental role in assessing the backoff from capacity due to finite blocklength. This paper analyzes the channel dispersion for a simple channel with memory: the Gilbert-Elliott communication model, in which the crossover probability of a binary symmetric channel evolves as a binary symmetric Markov chain, with and without side information at the receiver about the channel state. With side information, although capacity is invariant to the chain dynamics, dispersion is shown to be the sum of two terms, due to the Markov chain dynamics and to the randomness in the error generation, respectively.

I. INTRODUCTION

The fundamental performance limit for a channel in the finite blocklength regime is M*(n, ε), the maximal cardinality of a codebook of blocklength n which can be decoded with block error probability no greater than ε. Denoting the channel capacity by C, the approximation

    \log M^*(n, \epsilon) \approx nC    (1)

is asymptotically tight for channels that satisfy the strong converse. However, for many channels, error rates and blocklength ranges of practical interest, (1) is too optimistic. It has been shown in [1] that a much tighter approximation can be obtained by defining a second parameter referred to as the channel dispersion:

Definition 1: The dispersion V (measured in squared information units per channel use) of a channel with capacity C is equal to

    V = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{\left(nC - \log M^*(n, \epsilon)\right)^2}{2 n \ln \frac{1}{\epsilon}}.    (2)

In conjunction with the channel capacity C, channel dispersion emerges as a powerful analysis and design tool [1]. In order to achieve a given fraction η of capacity with a given error probability ε, the minimal required blocklength is proportional to V/C^2, namely,

    n \gtrsim \left(\frac{Q^{-1}(\epsilon)}{1 - \eta}\right)^2 \frac{V}{C^2}.    (3)

(The research was supported by the National Science Foundation under Grants CCF-06-3554 and CNS-06-5637. As usual, Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt.)

More specifically, [1] shows that for simple memoryless channels the two-term expansion

    \log M^*(n, \epsilon) = nC - \sqrt{nV}\, Q^{-1}(\epsilon) + O(\log n)    (4)

gives an excellent approximation (unless the blocklength is very small). The expansion (4) was first proved for the discrete memoryless channel by Strassen [2]. The new upper and lower bounds found in [1] allowed us to demonstrate the remarkable tightness of the approximation, prove (4) for the additive white Gaussian noise channel, refine Strassen's proof, and improve the bounds on the O(log n) term (see also [3], [4]).

In this paper, we initiate the study of the dispersion of channels subject to fading with memory. For coherent channels that behave ergodically, channel capacity is independent of the fading dynamics [5], since a sufficiently long codeword sees a channel realization whose empirical statistics have no randomness. In contrast, channel dispersion does depend on the extent of the fading memory, since it determines the blocklength required to ride out not only the noise but also the channel fluctuations due to fading. One of the simplest models that incorporates fading with memory is the Gilbert-Elliott channel (GEC): a binary symmetric channel whose crossover probability evolves as a binary Markov chain [6], [7]. The results and required tools depend crucially on whether the channel state is known at the decoder.
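To make the role of (3) concrete, here is a minimal numerical sketch (ours, not from the paper) that evaluates the blocklength estimate for a channel whose capacity C and dispersion V are given in bits; Q^{-1} is obtained from the standard normal inverse CDF. The forward reference to (16) is to the BSC dispersion formula recalled in Section IV.

```python
from math import log2
from statistics import NormalDist

def inv_q(eps):
    # Q^{-1}(eps), where Q(x) = P[N(0,1) > x]
    return NormalDist().inv_cdf(1.0 - eps)

def blocklength_estimate(C, V, eps, eta):
    # Eq. (3): n >~ (Q^{-1}(eps) / (1 - eta))^2 * V / C^2
    return (inv_q(eps) / (1.0 - eta)) ** 2 * V / C ** 2

if __name__ == "__main__":
    # Example: a BSC with crossover 0.11 (C ~ 0.5 bit), V from (16) below;
    # target: 90% of capacity at block error probability 1e-3.
    delta = 0.11
    C = 1 + delta * log2(delta) + (1 - delta) * log2(1 - delta)
    V = delta * (1 - delta) * log2((1 - delta) / delta) ** 2
    print(round(blocklength_estimate(C, V, eps=1e-3, eta=0.9)))
```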
II. CHANNEL MODEL

Let {S_j}_{j=1}^∞ be a homogeneous Markov process with states {1, 2}, transition probabilities

    P[S_2 = 1 \mid S_1 = 2] = P[S_2 = 2 \mid S_1 = 1] = \tau,    (5)
    P[S_2 = 1 \mid S_1 = 1] = P[S_2 = 2 \mid S_1 = 2] = 1 - \tau,    (6)

and initial distribution

    P[S_1 = 1] = P[S_1 = 2] = 1/2.    (7)

Now for δ_1, δ_2 ∈ [0, 1] we define {Z_j}_{j=1}^∞ as conditionally independent given S_1^∞, with

    P[Z_j = 0 \mid S_j = s] = 1 - \delta_s,    (8)
    P[Z_j = 1 \mid S_j = s] = \delta_s.    (9)

(The results in this paper can be readily generalized, at the expense of more cumbersome expressions, to Gilbert-Elliott channels with asymmetric Markov chains.)
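For readers who want to experiment, the following short simulation sketch (our own; the function names are not from the paper) samples the state and noise processes of (5)-(9) and applies the modulo-2 channel action defined in (10) below.

```python
import random

def gilbert_elliott_noise(n, tau, delta1, delta2, rng=random):
    """Sample (S_1..S_n, Z_1..Z_n) from the symmetric Gilbert-Elliott model (5)-(9)."""
    deltas = {1: delta1, 2: delta2}
    s = rng.choice([1, 2])                     # P[S_1 = 1] = P[S_1 = 2] = 1/2, eq. (7)
    states, noise = [], []
    for _ in range(n):
        states.append(s)
        noise.append(1 if rng.random() < deltas[s] else 0)   # Z_j ~ Bernoulli(delta_{S_j})
        if rng.random() < tau:                 # change state with probability tau
            s = 3 - s
    return states, noise

def ge_channel(x, z):
    """Channel action of (10): Y^n = X^n + Z^n (mod 2)."""
    return [(xi + zi) % 2 for xi, zi in zip(x, z)]

if __name__ == "__main__":
    s, z = gilbert_elliott_noise(20, tau=0.1, delta1=0.5, delta2=0.0)
    print(ge_channel([0] * 20, z))   # noise pattern seen when the all-zero word is sent
```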

The Gilbert-Elliott channel acts on an input binary vector X^n by adding (modulo 2) the vector Z^n:

    Y^n = X^n + Z^n.    (10)

III. CAPACITY

The capacity C_1 of a Gilbert-Elliott channel with the state S^n known perfectly at the receiver depends only on the stationary distribution P_S and is given by

    C_1 = \log 2 - E[h(\delta_S)]    (11)
        = \log 2 - P[S = 1]\, h(\delta_1) - P[S = 2]\, h(\delta_2),    (12)

where h(x) = -x \log x - (1-x)\log(1-x) is the binary entropy function. In the symmetric-chain special case considered in this paper,

    C_1 = \log 2 - \tfrac{1}{2} h(\delta_1) - \tfrac{1}{2} h(\delta_2).    (13)

When the state S^n is not known at the receiver, the capacity is given by [8]

    C_0 = \log 2 - E\left[h\left(P[Z_0 = 1 \mid Z_{-\infty}^{-1}]\right)\right]    (14)
        = \log 2 - \lim_{n \to \infty} E\left[h\left(P[Z_0 = 1 \mid Z_{-n}^{-1}]\right)\right].    (15)

Throughout the paper we use subscripts 1 and 0 to denote the cases in which the state S^n is known and is not known at the receiver, respectively.

IV. MAIN RESULTS

Before showing the expansion for the Gilbert-Elliott channel we recall the corresponding result for the binary symmetric channel (BSC) [1], [3].

Theorem 1: The dispersion of the BSC with crossover probability δ is

    V(\delta) = \delta(1-\delta) \log^2 \frac{1-\delta}{\delta}.    (16)

Furthermore, provided that V(δ) > 0, and regardless of whether ε ∈ (0, 1) is a maximal or average probability of error, we have

    \log M^*(n, \epsilon) = n(\log 2 - h(\delta)) - \sqrt{n V(\delta)}\, Q^{-1}(\epsilon) + \tfrac{1}{2} \log n + O(1).    (17)

Theorem 2: The dispersion of the Gilbert-Elliott channel with state S^n known at the receiver and state transition probability τ ∈ (0, 1) is

    V_1 = \tfrac{1}{2}\left(V(\delta_1) + V(\delta_2)\right) + \tfrac{1}{4}\left(h(\delta_1) - h(\delta_2)\right)^2 \frac{1-\tau}{\tau}.    (18)

Furthermore, provided that V_1 > 0, and regardless of whether ε ∈ (0, 1) is a maximal or average probability of error, we have

    \log M^*(n, \epsilon) = n C_1 - \sqrt{n V_1}\, Q^{-1}(\epsilon) + O(\log n),    (19)

where C_1 is given in (13). Moreover, (19) holds even if the transmitter knows the full state sequence S^n in advance (i.e., non-causally).

Note that the condition V_1 > 0 for (19) to hold excludes only some degenerate cases, for which we have M*(n, ε) = 2^n (when both crossover probabilities are 0 or 1) or M*(n, ε) = \lfloor 1/(1-\epsilon) \rfloor (when δ_1 = δ_2 = 1/2).

To formulate the result for the case of no state information at the receiver, we define the stationary process

    F_j = \log P_{Z_j \mid Z_{-\infty}^{j-1}}\!\left(Z_j \mid Z_{-\infty}^{j-1}\right).    (20)

Theorem 3: The dispersion of the Gilbert-Elliott channel with no state information and state transition probability τ ∈ (0, 1) is

    V_0 = \mathrm{Var}[F_0] + 2 \sum_{i=1}^{\infty} E\left[(F_i - E[F_i])(F_0 - E[F_0])\right].    (21)

Furthermore, provided that V_0 > 0, and regardless of whether ε is a maximal or average probability of error, we have

    \log M^*(n, \epsilon) = n C_0 - \sqrt{n V_0}\, Q^{-1}(\epsilon) + O(\log n),    (22)

where C_0 is given by (14).

It can be shown that the process F_j has a smooth spectral density S_F(f), and that

    V_0 = S_F(0),    (23)

which provides a way of computing V_0 by a Monte Carlo simulation paired with a spectral estimator. Another method is to notice that the terms in the infinite sum (21) decay as (1-2τ)^j. Hence, given any prescribed precision, it is sufficient to compute only finitely many terms in (21). Each term can in turn be computed with the required precision by noting that P_{Z_j | Z^{j-1}}[1 | Z^{j-1}] is a Markov process with a simple transition kernel.

The proof of Theorem 2 is outlined in the appendix, while the proof of Theorem 3, being somewhat more technical, can be found in [10].

V. DISCUSSION

The natural application of expansion (4) is in approximating the maximal achievable rate. Unlike the BSC case (17), the prelog constant for the GEC is unknown, and therefore a natural choice for the approximation would be 0 · log n. However, in view of the robustness of the prelog in (17) to variation in the crossover probability, we chose the following expression for numerical comparison:

    C_{0,1} - \sqrt{\frac{V_{0,1}}{n}}\, Q^{-1}(\epsilon) + \frac{\log n}{2n}.    (24)
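As a quick companion to (13), (16), (18) and (24), here is a small Python sketch (ours; all function names are our own) that evaluates the state-known capacity, the dispersion, and the normal approximation; quantities are in bits and bit² per channel use.

```python
from math import log2, sqrt
from statistics import NormalDist

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_dispersion(delta):
    # Eq. (16); V(delta) = 0 at delta in {0, 1/2, 1}
    return 0.0 if delta in (0.0, 0.5, 1.0) else delta * (1 - delta) * log2((1 - delta) / delta) ** 2

def ge_capacity_known(delta1, delta2):
    # Eq. (13), with log 2 = 1 bit
    return 1.0 - 0.5 * h(delta1) - 0.5 * h(delta2)

def ge_dispersion_known(delta1, delta2, tau):
    # Eq. (18); requires 0 < tau < 1
    return (0.5 * (bsc_dispersion(delta1) + bsc_dispersion(delta2))
            + 0.25 * (h(delta1) - h(delta2)) ** 2 * (1 - tau) / tau)

def normal_approximation(n, C, V, eps):
    # Eq. (24): C - sqrt(V/n) Q^{-1}(eps) + log2(n) / (2n)
    return C - sqrt(V / n) * NormalDist().inv_cdf(1 - eps) + log2(n) / (2 * n)

if __name__ == "__main__":
    d1, d2, tau = 0.5, 0.0, 0.1                    # the parameters used in Fig. 1
    C1 = ge_capacity_known(d1, d2)
    V1 = ge_dispersion_known(d1, d2, tau)
    print(C1, V1)                                  # 0.5 bit and 2.25 bit^2 for these parameters
    print(normal_approximation(2000, C1, V1, eps=1e-2))
```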
To demonstrate the tightness of (24), we have computed numerically the upper and lower bounds developed in the course of the proofs of Theorems 2 and 3. The comparison is shown in Fig. 1. For the case of state known (left plot), the achievability bound is (45) and the converse bound is (62); see the appendix. For the case of state not known (right plot), we computed the capacity and dispersion:

    C_0 \approx 0.280 \text{ bit},    (25)
    V_0 \approx 0.73 \text{ bit}^2.    (26)
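The paper obtains C_0 and V_0 by the exact methods described after (23). As a rough, purely illustrative alternative (our own sketch, not the authors' procedure), the following Monte Carlo estimate runs the belief recursion for P[Z_j = 1 | Z^{j-1}], accumulates the process F_j of (20), and uses a batch-means estimate of the variance rate in (21) as a crude stand-in for S_F(0). The function `simulate_F` and all parameter names are hypothetical, and the recursion assumes the predicted probability stays strictly inside (0, 1) along the run.

```python
import random
from math import log2
from statistics import mean, variance

def simulate_F(n_blocks, block_len, tau, delta1, delta2, burn_in=10_000, rng=random):
    """Crude Monte Carlo estimates of C_0 (bits) and V_0 (bit^2) via the process F_j of (20)."""
    s = rng.choice([1, 2])     # hidden state
    b = 0.5                    # belief P[S_j = 1 | Z^{j-1}]
    f_all = []
    for t in range(burn_in + n_blocks * block_len):
        d = delta1 if s == 1 else delta2
        z = 1 if rng.random() < d else 0                   # noise bit Z_j
        p = b * delta1 + (1 - b) * delta2                  # P[Z_j = 1 | Z^{j-1}]
        f = log2(p) if z else log2(1 - p)                  # F_j (in bits)
        post = b * delta1 / p if z else b * (1 - delta1) / (1 - p)
        b = post * (1 - tau) + (1 - post) * tau            # predict the next state
        if rng.random() < tau:
            s = 3 - s                                      # hidden state transition
        if t >= burn_in:
            f_all.append(f)
    block_sums = [sum(f_all[k * block_len:(k + 1) * block_len]) for k in range(n_blocks)]
    C0_hat = 1.0 + mean(f_all)                  # C_0 = log 2 + E[F_j]
    V0_hat = variance(block_sums) / block_len   # batch-means stand-in for S_F(0) in (23)
    return C0_hat, V0_hat

if __name__ == "__main__":
    print(simulate_F(n_blocks=200, block_len=2000, tau=0.1, delta1=0.5, delta2=0.0))
```

The output can be compared against (25)-(26) up to simulation and truncation error; the curves in the figures themselves were computed from the exact bounds.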

Fig. 1. Rate-blocklength tradeoff at block error rate ε = 10⁻² for the Gilbert-Elliott channel with parameters δ_1 = 1/2, δ_2 = 0 and state transition probability τ = 0.1. The left (right) plot is for the case when the state is known (not known) at the receiver. (Each plot shows rate R in bit/ch.use versus blocklength n, with curves for the capacity, the converse bound, the achievability bound, and the normal approximation.)

The plots in Fig. 1 suggest not only that our bounds tightly describe the value of (1/n) log M*(n, ε), but also that the simple expression (24) is precise enough for addressing many practical questions. Let us discuss two such questions.

First, for the state-known case, the capacity C_1 is independent of the state transition probability τ; see (13). However, as shown by Theorem 2, the channel dispersion V_1 does indeed depend on τ. Recalling from (3) that the blocklength required to achieve a given fraction of capacity is proportional to V/C^2, (18) implies that this minimal blocklength behaves as O(1/τ) when τ → 0. This has an intuitive explanation: to achieve the full capacity of a Gilbert-Elliott channel we need to wait until the influence of the random initial state washes away. Since transitions occur on average every 1/τ channel uses, the blocklength should be O(1/τ) as τ → 0. Comparing (16) and (18), we can ascribe a meaning to each of the two terms in (18): the first one gives the dispersion due to the usual BSC noise, whereas the second one is due to the memory in the channel.

Next, consider the case in which the state is not known at the decoder. As shown in [8], when the state transition probability τ decreases to 0, the capacity C_0(τ) increases to C_1. This is sometimes interpreted as implying that if the state is unknown at the receiver, slower dynamics are advantageous. Our refined analysis, however, shows that this is true only up to a point. Indeed, fix a rate R < C_1 and an ε > 0. In view of the tightness of (24), the minimum blocklength, as a function of the state transition probability τ, needed to achieve rate R is approximately given by

    N_0(\tau) \approx \left(\frac{Q^{-1}(\epsilon)}{C_0(\tau) - R}\right)^2 V_0(\tau),    (27)

provided that C_0(τ) > R, of course. Heuristically, there are two effects: when the state transition probability τ decreases, we can predict the current state better and therefore the capacity grows; on the other hand, as discussed above, when τ decreases we also have to wait longer until the chain forgets the initial state. The trade-off between these two effects is demonstrated in Fig. 2, where we plot N_0(τ) for the same parameters as in Fig. 1.

Fig. 2. Minimal blocklength needed to achieve R = 0.4 bit and ε = 0.01 as a function of the state transition probability τ. The channel is the Gilbert-Elliott channel with no state information at the receiver, δ_1 = 1/2, δ_2 = 0. (The blocklength N(τ) is plotted against τ on logarithmic axes.)
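A sketch of how (27) can be evaluated numerically (ours): the helper `simulate_F` referenced in the comments is the hypothetical estimator from the sketch after (26), and any other method of computing C_0(τ) and V_0(τ) can be substituted.

```python
from statistics import NormalDist

def min_blocklength(R, eps, C0, V0):
    """Approximate minimal blocklength N_0(tau) from (27) for target rate R and error eps."""
    if C0 <= R:
        return float("inf")        # rate R is not achievable at this tau
    return V0 * (NormalDist().inv_cdf(1 - eps) / (C0 - R)) ** 2

# Sweeping tau should trace out the trade-off discussed above (cf. Fig. 2), e.g.:
#   for tau in (0.01, 0.02, 0.05, 0.1, 0.2):        # smaller tau needs much longer runs
#       C0, V0 = simulate_F(200, 2000, tau, 0.5, 0.0)   # estimator from the earlier sketch
#       print(tau, min_blocklength(0.4, 0.01, C0, V0))
```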

VI. CONCLUSION

In this paper, we have proved an approximation of the form (4) for the Gilbert-Elliott channel. In Fig. 1, we have illustrated its relevance by comparing it numerically with upper and lower bounds. As we have found previously in [1] and [3], expansions such as (4) have practical importance by providing tight approximations of the speed of convergence to capacity, and by allowing for estimation of the blocklength needed to achieve a given fraction of capacity, as given by (3).

REFERENCES

[1] Y. Polyanskiy, H. V. Poor and S. Verdú, "Channel coding rate in the finite blocklength regime," submitted to IEEE Trans. Inform. Theory, Nov. 2008.
[2] V. Strassen, "Asymptotische Abschätzungen in Shannons Informationstheorie," Trans. Third Prague Conf. Information Theory, Czechoslovak Academy of Sciences, Prague, 1962, pp. 689-723.
[3] Y. Polyanskiy, H. V. Poor and S. Verdú, "New channel coding achievability bounds," Proc. IEEE Int. Symp. Information Theory (ISIT), Toronto, Canada, 2008.
[4] Y. Polyanskiy, H. V. Poor and S. Verdú, "Dispersion of Gaussian channels," Proc. IEEE Int. Symp. Information Theory (ISIT), Seoul, Korea, 2009.
[5] E. Biglieri, J. Proakis and S. Shamai (Shitz), "Fading channels: information-theoretic and communication aspects," IEEE Trans. Inform. Theory, 50th Anniversary Issue, vol. 44, no. 6, pp. 2619-2692, Oct. 1998.
[6] E. N. Gilbert, "Capacity of a burst-noise channel," Bell Syst. Tech. J., vol. 39, pp. 1253-1265, Sept. 1960.
[7] E. O. Elliott, "Estimates of error rates for codes on burst-noise channels," Bell Syst. Tech. J., vol. 42, pp. 1977-1997, Sept. 1963.
[8] M. Mushkin and I. Bar-David, "Capacity and coding for the Gilbert-Elliott channels," IEEE Trans. Inform. Theory, vol. 35, no. 6, pp. 1277-1290, 1989.
[9] S. Verdú, EE528 Information Theory, Lecture Notes, Princeton University, Princeton, NJ, 2007.
[10] Y. Polyanskiy, H. V. Poor and S. Verdú, "Dispersion of the Gilbert-Elliott channel," draft, 2009.
[11] A. N. Tikhomirov, "On the convergence rate in the central limit theorem for weakly dependent random variables," Theory of Probability and Its Applications, vol. XXV, no. 4, 1980.

APPENDIX
PROOF OF THEOREM 2

For our analysis we need to invoke a few relevant results from [1] and [3]. Consider an abstract channel P_{Y|X}; for an arbitrary input distribution P_X define an (extended) random variable

    i(X; Y) = \log \frac{dP_{Y|X}(Y \mid X)}{dP_Y(Y)},    (28)

where P_Y = \int dP_X\, P_{Y|X=x}.

Theorem 4 (DT bound): For an arbitrary P_X there exists a code with M codewords and average probability of error ε satisfying

    \epsilon \le E\left[\exp\left\{-\left[i(X; Y) - \log \tfrac{M-1}{2}\right]^+\right\}\right].    (29)

The optimal performance of binary hypothesis testing plays an important role in our development. Consider a random variable W taking values in a set 𝒲, which can take probability measures P or Q. A randomized test between those two distributions is defined by a random transformation P_{Z|W}: 𝒲 → {0, 1}, where 0 indicates that the test chooses Q. The best performance achievable among those randomized tests is given by

    \beta_\alpha(P, Q) = \min \sum_{w \in \mathcal{W}} Q(w)\, P_{Z|W}(1 \mid w),    (30)

where the minimum is taken over all conditional distributions P_{Z|W} satisfying

    \sum_{w \in \mathcal{W}} P(w)\, P_{Z|W}(1 \mid w) \ge \alpha.    (31)

(We write summations over alphabets for simplicity; however, all of our general results hold for arbitrary probability spaces.)

The minimum in (30) is guaranteed to be achieved by the Neyman-Pearson lemma. Thus, β_α(P, Q) gives the minimum probability of error under hypothesis Q if the probability of error under hypothesis P is not larger than 1 − α. It is easy to show (e.g. [9]) that for any γ > 0

    \alpha \le P\left[\frac{dP}{dQ} \ge \gamma\right] + \gamma\, \beta_\alpha(P, Q).    (32)

On the other hand,

    \beta_\alpha(P, Q) \le \frac{1}{\gamma_0}    (33)

for any γ_0 that satisfies

    P\left[\frac{dP}{dQ} \ge \gamma_0\right] \ge \alpha.    (34)
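As a concrete illustration of (30)-(31) on a finite alphabet, here is a short Neyman-Pearson routine (our own sketch, not code from [1]): outcomes are taken in decreasing order of the likelihood ratio P(w)/Q(w), with randomization on the boundary outcome.

```python
def beta_alpha(alpha, P, Q):
    """beta_alpha(P, Q) of (30)-(31); P and Q are dicts mapping outcomes to probabilities."""
    def ratio(w):
        return float("inf") if Q[w] == 0 else P[w] / Q[w]
    need, beta = alpha, 0.0
    for w in sorted(P, key=ratio, reverse=True):   # decreasing likelihood ratio
        if need <= 0:
            break
        if P[w] >= need and P[w] > 0:
            beta += Q[w] * need / P[w]             # randomized decision on the boundary outcome
            need = 0.0
        else:
            beta += Q[w]                           # include this outcome with probability one
            need -= P[w]
    return beta

# Sanity check: testing P = Bernoulli(0.9) against Q = Bernoulli(0.5) with alpha = 0.9
print(beta_alpha(0.9, {1: 0.9, 0: 0.1}, {0: 0.5, 1: 0.5}))   # prints 0.5
```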
Virtually all known converse results for channel coding follow from the next theorem by a judicious choice of Q_{Y|X} and a lower bound on β; see [1].

Theorem 5 (meta-converse): Consider two different abstract channels P_{Y|X} and Q_{Y|X} defined on the same input and output spaces. For a given code (possibly with a randomized encoder and decoder pair), let

    ε  = average error probability with P_{Y|X},
    ε' = average error probability with Q_{Y|X},
    P_X = Q_X = encoder output distribution with equiprobable codewords.

Then,

    \beta_{1-\epsilon}(P_{XY}, Q_{XY}) \le \epsilon',    (35)

where P_{XY} = P_X P_{Y|X} and Q_{XY} = Q_X Q_{Y|X}.

Proof of Theorem 2:

Achievability: We choose P_{X^n} equiprobable. Since the output of the channel is (Y^n, S^n), we need to write down the expression for i(X^n; Y^n S^n). To do that we define the operation a^{[b]} on ℝ × {0, 1}:

    a^{[b]} = \begin{cases} 1, & b = 0, \\ a, & b = 1. \end{cases}    (36)

Then we obtain

    i(X^n; Y^n S^n) = \log \frac{P_{Y^n | X^n S^n}}{P_{Y^n | S^n}}    (37)
                    = n \log 2 + \sum_{j=1}^{n} \log\left(\delta_{S_j}^{[Z_j]} \left(1 - \delta_{S_j}\right)^{[1 - Z_j]}\right),    (38)

where (37) is by the independence of X^n and S^n, and (38) is because under equiprobable X^n the distribution P_{Y^n | S^n} is also equiprobable. Using (38) we can find

    E[i(X^n; Y^n S^n)] = n C_1, and    (39)
    Var[i(X^n; Y^n S^n)] = n V_1 + O(1).    (40)
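The identities (39)-(40) are easy to spot-check numerically; the sketch below (ours) draws independent samples of i(X^n; Y^n S^n) directly from (38) and compares the empirical mean and variance per channel use with C_1 and V_1.

```python
import random
from math import log2
from statistics import mean, variance

def info_density_sample(n, tau, delta1, delta2, rng=random):
    """One draw of i(X^n; Y^n S^n) in bits, computed termwise from (38)."""
    s = rng.choice([1, 2])
    total = 0.0
    for _ in range(n):
        d = delta1 if s == 1 else delta2
        z = 1 if rng.random() < d else 0
        total += 1.0 + (log2(d) if z else log2(1 - d))   # log 2 + log(delta_s^[Z_j] (1-delta_s)^[1-Z_j])
        if rng.random() < tau:
            s = 3 - s
    return total

if __name__ == "__main__":
    n, tau, d1, d2 = 1000, 0.1, 0.11, 0.02
    samples = [info_density_sample(n, tau, d1, d2) for _ in range(2000)]
    print(mean(samples) / n)       # should be close to C_1 of (13)
    print(variance(samples) / n)   # approaches V_1 of (18) as n grows, per (40)
```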

By (38) we see that i(X^n; Y^n S^n) is a sum of weakly dependent random variables. For such a process, Tikhomirov's theorem [11] provides an extension of the Berry-Esseen inequality, namely:

    \left| P\left[i(X^n; Y^n S^n) > n C_1 + \sqrt{n V_1}\, \lambda\right] - Q(\lambda) \right| \le \frac{B_1 \log n}{\sqrt{n}}.    (41)

In addition, similarly to Lemma 45, Appendix G of [1], we can show for arbitrary A:

    E\left[\exp\{-i(X^n; Y^n S^n) + A\}\, 1\{i(X^n; Y^n S^n) \ge A\}\right] \le \frac{B_2 \log n}{\sqrt{n}}.    (42)

In (41) and (42), B_1 and B_2 are fixed constants which do not depend on A or λ. Finally, we set

    \log M = n C_1 - \sqrt{n V_1}\, Q^{-1}(\epsilon_n),    (43)

where

    \epsilon_n = \epsilon - (B_1 + B_2) \frac{\log n}{\sqrt{n}}.    (44)

Then, by Theorem 4 we know that there exists a code with M codewords and average probability of error p_e bounded by

    p_e \le E\left[\exp\left\{-\left[i(X^n; Y^n S^n) - \log M\right]^+\right\}\right]    (45)
        \le P\left[i(X^n; Y^n S^n) \le \log M\right] + \frac{B_2 \log n}{\sqrt{n}}    (46)
        \le \epsilon_n + (B_1 + B_2) \frac{\log n}{\sqrt{n}}    (47)
        \le \epsilon,    (48)

where (46) is by (42) with A = log M, (47) is by (41) and the definition of log M, and (48) is by (44). (In the statement of the theorem we claimed a stronger result about the maximal probability of error; its proof is only slightly different and is omitted.) Therefore, by invoking Taylor's expansion in (43) we have

    \log M^*(n, \epsilon) \ge \log M \ge n C_1 - \sqrt{n V_1}\, Q^{-1}(\epsilon) + O(\log n).    (49)

Converse: In the converse part we will assume that the transmitter has access to the full state sequence S^n and generates X^n based on both the input message and S^n. Take the best such code with M*(n, ε) codewords and average probability of error no greater than ε. We now propose to treat the pair (X^n, S^n) as a combined input to the channel and the pair (Y^n, S^n) as a combined output, available to the decoder. Note that in this situation the encoder induces a distribution P_{X^n S^n} and is necessarily randomized, because the distribution of S^n is given by the output of the Markov chain and is independent of the transmitted message W. To apply Theorem 5 we choose the auxiliary channel that passes S^n unchanged and generates Y^n equiprobably:

    Q_{Y^n | X^n S^n}(y^n, s^n \mid x^n) = 2^{-n}.    (50)

Note that by the constraint on the encoder, S^n is independent of the message W. Moreover, under the Q-channel, Y^n is also independent of W, and we clearly have

    \epsilon' \ge 1 - \frac{1}{M}.    (51)

Therefore by Theorem 5 we obtain

    \beta_{1-\epsilon}(P_{X^n Y^n S^n}, Q_{X^n Y^n S^n}) \le \frac{1}{M}.    (52)

To lower bound β_{1−ε}(P_{X^n Y^n S^n}, Q_{X^n Y^n S^n}) via (32), we notice that

    \log \frac{P_{X^n Y^n S^n}}{Q_{X^n Y^n S^n}} = \log \frac{P_{Y^n | X^n S^n} P_{X^n S^n}}{Q_{Y^n | X^n S^n} Q_{X^n S^n}}    (53)
                                                = \log \frac{P_{Y^n | X^n S^n}}{Q_{Y^n | X^n S^n}}    (54)
                                                = i(X^n; Y^n S^n),    (55)

where (53) is because P_{X^n S^n} = Q_{X^n S^n}, and (55) follows simply by noting that P_{Y^n | S^n} in (37) is also equiprobable. Now set

    \log \gamma = n C_1 - \sqrt{n V_1}\, Q^{-1}(\epsilon_n),    (56)

where this time

    \epsilon_n = \epsilon + \frac{B_1 \log n + 1}{\sqrt{n}}.    (57)

By (32) we have, for α = 1 − ε,

    \beta_{1-\epsilon} \ge \frac{1}{\gamma}\left(1 - \epsilon - P\left[\log \frac{P_{X^n Y^n S^n}}{Q_{X^n Y^n S^n}} \ge \log \gamma\right]\right)    (58)
                      = \frac{1}{\gamma}\left(1 - \epsilon - P\left[i(X^n; Y^n S^n) \ge \log \gamma\right]\right)    (59)
                      \ge \frac{1}{\gamma}\left(1 - \epsilon - (1 - \epsilon_n) - \frac{B_1 \log n}{\sqrt{n}}\right)    (60)
                      = \frac{1}{\sqrt{n}\, \gamma},    (61)

where (59) is by (55), (60) is by (41), and (61) is by (57). Finally,

    \log M^*(n, \epsilon) \le \log \frac{1}{\beta_{1-\epsilon}}    (62)
                          \le \log \gamma + \tfrac{1}{2} \log n    (63)
                          = n C_1 - \sqrt{n V_1}\, Q^{-1}(\epsilon_n) + \tfrac{1}{2} \log n    (64)
                          = n C_1 - \sqrt{n V_1}\, Q^{-1}(\epsilon) + O(\log n),    (65)

where (62) is just (52), (63) is by (61), (64) is by (56), and (65) is by Taylor's formula applied to Q^{-1}, using (57) for ε_n.
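For completeness, here is a rough Monte Carlo sketch (ours, and only a stand-in for the exact computations behind Fig. 1) of how the achievability bound (45) and a converse bound in the spirit of (52) and (58)-(59) can be evaluated numerically for the state-known channel. Everything is in bits, the tail probability in the converse is replaced by its empirical estimate, and the information-density sampler repeats the one from the previous sketch so that the block stays self-contained.

```python
import random
from math import log2

def info_density_sample(n, tau, delta1, delta2, rng=random):
    """One draw of i(X^n; Y^n S^n) in bits, as in the previous sketch (eq. (38))."""
    s, total = rng.choice([1, 2]), 0.0
    for _ in range(n):
        d = delta1 if s == 1 else delta2
        z = 1 if rng.random() < d else 0
        total += 1.0 + (log2(d) if z else log2(1 - d))
        if rng.random() < tau:
            s = 3 - s
    return total

def dt_achievability(n, eps, tau, d1, d2, samples=20000, rng=random):
    """Largest log2 M whose Monte Carlo estimate of the bound (45) stays below eps."""
    i_vals = [info_density_sample(n, tau, d1, d2, rng) for _ in range(samples)]
    def p_e(logM):
        return sum(2.0 ** -max(iv - logM, 0.0) for iv in i_vals) / samples
    lo, hi = 0.0, float(n)
    for _ in range(50):                      # bisection on log2 M
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if p_e(mid) <= eps else (lo, mid)
    return lo

def meta_converse(n, eps, tau, d1, d2, samples=20000, rng=random):
    """Upper bound on log2 M*(n, eps) from (52) and (58)-(59):
    log2 M <= log2(gamma) - log2(1 - eps - P[i >= log2(gamma)]) for any valid gamma."""
    i_vals = sorted(info_density_sample(n, tau, d1, d2, rng) for _ in range(samples))
    best = float("inf")
    for k, thr in enumerate(i_vals):         # candidate thresholds log2(gamma)
        margin = 1 - eps - (samples - k) / samples
        if margin > 0:
            best = min(best, thr - log2(margin))
    return best

if __name__ == "__main__":
    n, eps, tau, d1, d2 = 500, 1e-2, 0.1, 0.5, 0.0
    print(dt_achievability(n, eps, tau, d1, d2) / n)   # achievable rate, bit/ch.use
    print(meta_converse(n, eps, tau, d1, d2) / n)      # converse rate, bit/ch.use
```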