Asymptotics of Input-Constrained Erasure Channel Capacity


Asymptotics of Input-Constrained Erasure Channel Capacity

Yonglong Li
The University of Hong Kong

Guangyue Han
The University of Hong Kong

May 3, 2016

Abstract

In this paper, we examine an input-constrained erasure channel and characterize the asymptotics of its capacity when the erasure rate is low. More specifically, for a general memoryless erasure channel with its input supported on an irreducible finite-type constraint, we derive partial asymptotics of its capacity, using some series expansion type formulas for its mutual information rate; and for a binary erasure channel with its first-order Markovian input supported on the (1, ∞)-RLL constraint, based on the concavity of its mutual information rate with respect to some parameterization of the input, we numerically evaluate its first-order Markov capacity and further derive its full asymptotics. The asymptotic results obtained in this paper, when compared with the recently derived feedback capacity for a binary erasure channel with the same input constraint, enable us to conclude that feedback may increase the capacity of an input-constrained channel, even if the channel is memoryless.

Index Terms: erasure channel, input constraint, capacity, feedback.

(A preliminary version of this work has been presented at IEEE ISIT 2014.)

1 Introduction

The primary concern of this paper is the erasure channel, a common digital communication channel model that plays a fundamental role in coding and information theory. Throughout the paper, we assume that time is discrete and indexed by the integers. At time n, the erasure channel of interest can be described by the following equation:

Y_n = X_n E_n,    (1)

where the channel input {X_n}, supported on an irreducible finite-type constraint S, is a stationary process taking values from the input alphabet X = {1, 2, ..., K}, and the erasure process {E_n}, independent of {X_n}, is a binary stationary and ergodic process with erasure rate ε ≜ P(E_1 = 0).

The channel output process {Y_n} takes values in the output alphabet Y = {0, 1, ..., K}. The word erasure, as in the name of our channel, naturally arises if a 0 is interpreted as an erasure at the receiving end of the channel; so, at time n, the channel output Y_n is nothing but the channel input X_n if E_n = 1, but an erasure if E_n = 0.

Let X* denote the set of all finite-length words over X. Let F be a finite subset of X*, and let S be the finite-type constraint with respect to F, which is the subset of X* consisting of all finite-length words over X, each of which does not contain any element of F as a contiguous subsequence (or, roughly, elements of F are forbidden in S). The most well-known example is the (d, k)-run-length-limited (RLL) constraint over the alphabet {0, 1}, which forbids any sequence with fewer than d or more than k consecutive 0s in between two successive 1s; in particular, a prominent example is the (1, ∞)-RLL constraint, a widely used constraint in magnetic recording and data storage; see [30, 31]. For the (d, k)-RLL constraint with k < ∞, a forbidden set F is

F = {1 0^l 1 : 0 ≤ l < d} ∪ {0^{k+1}},

where 0^l denotes a run of l consecutive 0s. When k = ∞, one can choose F to be

F = {1 0^l 1 : 0 ≤ l < d};

in particular, when d = 1 and k = ∞, F can be chosen to be {11}. The length of F is defined to be that of the longest words in F. Generally speaking, there may be many such F's with different lengths that give rise to the same constraint S; the length of the shortest such F minus one gives the topological order of S. For example, the topological order of the (1, ∞)-RLL constraint, whose shortest F proves to be {11}, is 1. A finite-type constraint S is said to be irreducible if for any u, v ∈ S, there is a w ∈ S such that uwv ∈ S.

As mentioned before, the input process X of our channel (1) is assumed to be supported on an irreducible finite-type constraint S, namely, A(X) ⊆ S, where A(X) ≜ {x_i^j ∈ X* : p_X(x_i^j) > 0}. The capacity of the channel (1), denoted by C(S, ε), can be computed by

C(S, ε) = sup_{A(X) ⊆ S} I(X; Y),

where the supremum is taken over all stationary processes X supported on S. Here, we note that input constraints [50] are widely used in various real-life applications such as magnetic and optical recording [31] and communications over band-limited channels with inter-symbol interference [22]. In particular, we will pay special attention in this paper to a binary erasure channel with erasure rate ε (BEC(ε)) with the input supported on the (1, ∞)-RLL constraint, denoted by S_0 throughout the paper.

When there is no constraint imposed on the input process X, that is, S = X*, it is well known that C(S, ε) = (1 − ε) log K; see Theorem 5.1. When ε = 0, that is, when the channel is perfect with no erasures, C(S, ε) is the noiseless capacity of the constraint S, which can be achieved by a unique m-th order Markov chain X̂ with A(X̂) = S [33].
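To make the model concrete, the short sketch below simulates the channel (1) in the binary case that receives special attention later: a stationary first-order Markov input supported on S_0 (written over {1, 2}, so that the word 22 never occurs), passed through a memoryless erasure channel with erasure rate ε. This is only an illustrative sketch; the transition parameter θ and the numerical values are arbitrary choices, not quantities from the paper.

```python
import random

def sample_constrained_markov(n, theta, rng):
    """Stationary first-order Markov input on {1, 2} that never produces the word 22
    (the (1, inf)-RLL constraint S_0 written over {1, 2}); theta = P(X_{k+1} = 2 | X_k = 1)."""
    # stationary distribution of [[1 - theta, theta], [1, 0]] is (1/(1+theta), theta/(1+theta))
    x = [1 if rng.random() < 1 / (1 + theta) else 2]
    for _ in range(n - 1):
        if x[-1] == 2:
            x.append(1)                                  # a 2 is always followed by a 1
        else:
            x.append(2 if rng.random() < theta else 1)
    return x

def erasure_channel(xs, eps, rng):
    """Memoryless erasure channel Y_k = X_k * E_k with P(E_k = 0) = eps; 0 marks an erasure."""
    return [x if rng.random() >= eps else 0 for x in xs]

rng = random.Random(0)
x = sample_constrained_markov(20, 0.4, rng)
y = erasure_channel(x, 0.3, rng)
print(x)
print(y)
```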

On the other hand, apart from these two degenerate cases, explicit analytic formulas for the capacity in non-degenerate cases have remained elusive, and the problem of analytically characterizing the noisy constrained capacity is widely believed to be intractable. The problem of numerically computing the capacity C(S, ε) appears to be just as challenging: the computation of the capacity of a general channel with memory or input constraints is notoriously difficult and has been open for decades, and the fact that our erasure channel is only a special class of such channels does not appear to make the problem easier. Here, we note that for a discrete memoryless channel, Shannon gave a closed-form formula for the capacity in his celebrated paper [40], and Blahut [5] and Arimoto [1] independently proposed an algorithm which can efficiently compute the capacity and the capacity-achieving distribution simultaneously. However, unlike for discrete memoryless channels, the capacity of a channel with memory or input constraints in general admits no single-letter characterization, and very little is known about its efficient computation. To date, most known results in this regard have been in the form of numerically computed bounds: for instance, the bounds by Arnold and Loeliger [2], A. Kavcic [25], Pfister, Soriaga and Siegel [35], and Vontobel and Arnold [47].

One of the most effective strategies to compute the capacity of channels with memory or input constraints is the so-called Markov approximation scheme. The idea is that instead of maximizing the mutual information rate over all stationary processes, one maximizes the mutual information rate over all m-th order Markov processes to obtain the m-th order Markov capacity. Under suitable assumptions (see, e.g., [6]), when m tends to infinity, the corresponding sequence of Markov capacities tends to the channel capacity. For our erasure channel, the m-th order Markov capacity is defined as

C^(m)(S, ε) = sup I(X; Y),

where the supremum is taken over all m-th order Markov chains supported on S.

The main contribution of this work is the characterization of the asymptotics of the above-mentioned input-constrained erasure channel capacity. Of great relevance to this work are results by Han and Marcus [19] and Jacquet and Szpankowski [24], which characterize the asymptotics of the capacity of a binary symmetric channel with crossover probability ε (BSC(ε)) with the input supported on the (1, ∞)-RLL constraint. The approach in the above-mentioned work is to obtain the asymptotics of the mutual information rate first, and then apply a bounding argument to obtain that of the capacity. The approach in this work roughly follows the same strategy; however, as elaborated below, it differs from theirs to a great extent in terms of technical implementation. Throughout the paper, we use the logarithm with base e in the proofs and the logarithm with base 2 in the numerical computations of the channel capacity.

Below is a brief account of the results and methodology of this work. The starting point of our approach is Lemma 2.1 in Section 2, a key lemma that expresses the conditional entropy H(Y_0 | Y_{-n}^{-1}) in a form that is particularly effective for analyzing the asymptotics of C(S, ε) when ε is close to 0. As elaborated in Theorem 2.2, Lemma 2.1 naturally gives lower and upper bounds on C(S, ε), where the lower bound gives a counterpart of Wolf's conjecture for a BEC(ε). Moreover, when applied to the case when X is a Markov chain, Lemma 2.1 yields explicit series expansion type formulas in Theorem 2.4 and Corollary 2.5, which pave the way for characterizing the asymptotics of the input-constrained erasure channel capacity.

Here we remark that the method in [19, 24] has been further developed for more general families of channels with memory in [20, 21], via examining the contractiveness of an associated random dynamical system [4]. The methodology used in this work to derive asymptotics of the mutual information rate, however, capitalizes on certain characteristics that are in a sense unique to erasure channels.

In Section 3, we consider a memoryless erasure channel with the input supported on an irreducible finite-type constraint, and in Theorem 3.1 we derive partial asymptotics of its capacity C(S, ε) in the vicinity of ε = 0, where C(S, ε) is written as the sum of a constant term, a term linear in ε, and an O(ε²) term. The lower bound part of the proof of this theorem follows from an easy application of Theorem 2.4, and the upper bound part hinges on an argument adapted from [19]. In Section 4, we consider a BEC(ε) with the input being a first-order Markov process supported on the (1, ∞)-RLL constraint S_0. Within this special setup, we show in Theorem 4.1 that I(X; Y) is strictly concave with respect to some parameterization of X. In Section 4.2, we numerically evaluate C^(1)(S_0, ε) and the corresponding capacity-achieving distribution using the randomized algorithm proposed in [16], whose convergence is guaranteed by the concavity of I(X; Y). Moreover, the concavity of I(X; Y) guarantees the uniqueness of the capacity-achieving distribution, based on which we derive the full asymptotics of the above input-constrained BEC(ε) around ε = 0 in Theorem 4.2, where C^(1)(S_0, ε) is expressed as an infinite sum of O(ε^k) terms. In Section 5, we turn to scenarios in which there might be feedback in our erasure channel. We first prove in Theorem 5.1 that when there is no input constraint, feedback does not increase the capacity of the erasure channel, even in the presence of channel memory. When the input constraint is non-trivial, however, we show in Theorem 5.3 that feedback does increase the capacity, using the example of a BEC(ε) with the (1, ∞)-RLL input constraint; so feedback may increase the capacity of input-constrained erasure channels even if there is no channel memory. The results obtained in this section suggest the intricacy of the interplay between feedback, memory and input constraints.

2 A Key Lemma and Its Applications

In this section, we focus on the mutual information of the erasure channel (1) introduced in Section 1. The starting point of our approach is the following key lemma, which is particularly effective for the analysis of input-constrained erasure channels.

Lemma 2.1. For any n ≥ 1, we have

H(Y_0 | Y_{-n}^{-1}) = H(E_0 | E_{-n}^{-1}) + Σ_{D ⊆ [-n,-1]} H(X_0 | X_D) P(E_0 = 1, E_D = 1, E_{D^c} = 0),    (2)

where [-n, -1] ≜ {-n, ..., -1} and D^c denotes the complement of D in [-n, -1].

Proof. Note that

H(Y_0 | Y_{-n}^{-1}) = -Σ_{y_{-n}^0} p(y_{-n}^0) log p(y_0 | y_{-n}^{-1}) = T_1(n) + T_2(n),

5 where T (n) = y n,y 0=0 p(y 0 n) log p(y 0 y n) and T (n) = y n,y 0 0 From the independence of {X n } and {E n }, it follows that p(y j i ) = p X (x j i )P (E j I(yi ) =, E Ī(y j i ) = 0) for k I(y j i ) x j i : x k=y k p(y 0 n) log p(y 0 y n). ) = p X (y I(y j i ) P (E I(y j i ) =, E Ī(y j i ) = 0). (3) Here and throughout the paper, let Y be the set of all finite length words over Y and we define, for any y j i Y, and For y 0 0, I(y j i ) = {k : i k j, y k 0}, Ī(y j i ) = {k : i k j, y k = 0} y I(y j i ) = {y k : k I(y j i )}. p(y 0 y n) = p(y0 n) p(y n) ) p X (y I(y 0 n ) P (E I(y 0 n ) =, E Ī(y n 0 = ) = 0) ) p X (y I(y n ) P (E I(y n ) =, E Ī(y n ) = 0) ) (a) = p X (y 0 y I(y n ) P (E 0 = E I(y n ) =, E Ī(y n ) = 0), where (a) follows from the fact that Ī(y0 n) = Ī(y n). Similarly, for y 0 = 0, ) P (E I(y 0 n ) =, E Ī(y n 0 ) = 0) p(y 0 y n) = p X (y I(y 0 n ) ) p X (y I(y n ) P (E I(y n ) =, E Ī(y n ) = 0) (a) = P (E 0 = E I(y n ) =, E Ī(y n ) = 0), where (a) follows from the fact that I(y n) 0 = I(y n). Therefore, T (n) = p(y n) 0 log p(y 0 y n) y n,y 0=0 = y n,y 0=0 = D [ n,] (a) = D [ n,] p(y 0 n) log P (E 0 = 0 E I(y n ) =, E Ī(y n ) = 0) y 0 n :I(y n )=D,y 0=0 p(y 0 n) log P (E 0 = 0 E D ) =, E D c = 0) P (E 0 = 0, E D =, E D c = 0) log P (E 0 = 0 E D =, E D c = 0), (4) 5

6 where (a) follows from the fact that for any given D [ n, ], p(y n) 0 = P (E 0 = 0, E D =, E D c = 0). Also, we have T (n) = y n,y 0 0 = y n,y 0 0 = T 3 (n) y 0 n :I(y n )=D,y 0=0 p(y 0 n) log p(y 0 y n) ) p(y n) 0 log p X (y 0 y I(y n ) P (E 0 = E I(y l ) =, E Ī(y l ) = 0) y n,y 0 0 (a) = T 3 (n) D [ n,] p(y 0 n) log P (E 0 = E I(y n ) =, E Ī(y n ) = 0) P (E 0 =, E D =, E D c = 0) log P (E 0 = E D =, E D c = 0), (5) where (a) follows from a similar argument as in the proof of (4) and T 3 (n) = p(y n) 0 log p X (y 0 y I(y )). n y n,y 0 0 From (3), it then follows that T 3 (n) = p(y n) 0 log p X (y 0 y I(y )) n y n,y 0 0 = = D [ n,] D [ n,] y 0 n :I(y0 n )=D {0} p X (y D, y 0 )P (E 0 =, E D =, E D c = 0) log p X (y 0 y D ) H(X 0 X D )P (E 0 =, E D =, E D c = 0). (6) The desired formula for H(Y 0 Y n ) then follows from (4), (5) and (6). One of the immediate applications of Lemma. is the following lower and upper bounds on C(S, ε). Theorem.. ( ε)c(s, 0) C(S, ε) ( ε) log K. 6

7 Proof. For the upper bound, it follows from Lemma. that I(X; Y ) = lim (H(Y 0 Y n ) H(Y 0 Y n, X 0 n n)) = lim H(X 0 X D )P (E 0 =, E D =, E D c = 0) n D [ n,] (a) lim n D [ n,] lim n D [ n,] = P (E 0 = ) log K = ( ε) log K, H(X 0 )P (E 0 =, E D =, E D c = 0) P (E 0 =, E D =, E D c = 0) log K where we have used the fact that conditioning reduces entropy for (a). Assume S is of topological order m, and let ˆX be the m-order Markov chain that achieves the noiseless capacity C(S, 0) of the constraint S. Again, it follows from Lemma. that I( ˆX; Y ) = lim (H(Y 0 Y n ) H(Y 0 Y n, ˆX 0 n n)) = lim H( ˆX 0 ˆX D )P (E 0 =, E D =, E D c = 0) n D [ n,] lim n D [ n,] (a) = lim n D [ n,] H( ˆX 0 H( ˆX 0 = P (E 0 = )H( ˆX 0 = ( ε)c(s, 0), ˆX m) ˆX m, ˆX D )P (E 0 =, E D =, E D c = 0) ˆX m)p (E 0 =, E D =, E D c = 0) where we have used the fact that { ˆX n } is an m-th order Markov chain for (a). Remark.3. The upper bound part of Theorem. also follows from the well-known fact that (see Theorem 5.) C(X, ε) = ( ε) log K and for any S, C(S, ε) C(X, ε), which is obviously true. Let C (S, ε) denote the capacity of a BSC(ε) with the (d, k)-rll constraint. In [49] Wolf posed the following conjecture on C (S, ε): C (S, ε) C (S, 0)( H(ε)), where H(ε) ε log ε ( ε) log( ε). A weaker form of this bound has been established in [36] by counting the possible subcodes satisfying the (d, k)-rll constraint in some linear coding scheme, but the conjecture for the general case still remains open. 7
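Lemma 2.1 can also be checked by exact enumeration on small examples. The sketch below is a minimal illustration only, assuming i.i.d. erasures and a stationary first-order Markov input on S_0 (the parameter values are arbitrary): it computes H(Y_0 | Y_{-n}^{-1}) directly from the joint law of Y_{-n}^0 and compares it with the right-hand side of (2), where for i.i.d. erasures H(E_0 | E_{-n}^{-1}) = H(ε) and P(E_0 = 1, E_D = 1, E_{D^c} = 0) = (1 − ε)^{|D|+1} ε^{n−|D|}.

```python
import itertools, math

eps, theta, n = 0.3, 0.4, 3          # illustrative values; n = length of the conditioning window

# Stationary first-order Markov input on S_0 over {1, 2}: P(2|1) = theta, P(1|2) = 1.
P = {(1, 1): 1 - theta, (1, 2): theta, (2, 1): 1.0, (2, 2): 0.0}
pi = {1: 1 / (1 + theta), 2: theta / (1 + theta)}

def H(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def p_x(xs):                          # P(X_{-n} = xs[0], ..., X_0 = xs[-1])
    prob = pi[xs[0]]
    for a, b in zip(xs, xs[1:]):
        prob *= P[(a, b)]
    return prob

words = [xs for xs in itertools.product((1, 2), repeat=n + 1) if p_x(xs) > 0]

# Joint law of Y_{-n}^0 with Y_k = X_k * E_k and i.i.d. E_k, P(E_k = 0) = eps.
p_y = {}
for xs in words:
    for es in itertools.product((0, 1), repeat=n + 1):
        ys = tuple(x * e for x, e in zip(xs, es))
        w = p_x(xs) * math.prod((1 - eps) if e else eps for e in es)
        p_y[ys] = p_y.get(ys, 0.0) + w

p_y_past = {}
for ys, w in p_y.items():
    p_y_past[ys[:-1]] = p_y_past.get(ys[:-1], 0.0) + w
lhs = H(p_y.values()) - H(p_y_past.values())            # H(Y_0 | Y_{-n}^{-1})

# Right-hand side of (2).
rhs = H((eps, 1 - eps))                                  # H(E_0 | E_{-n}^{-1}) for i.i.d. erasures
for r in range(n + 1):
    for D in itertools.combinations(range(n), r):        # positions 0..n-1 stand for times -n..-1
        cond = {}                                        # joint law of (X_D, X_0)
        for xs in words:
            key = tuple(xs[i] for i in D)
            cond.setdefault(key, {}).setdefault(xs[-1], 0.0)
            cond[key][xs[-1]] += p_x(xs)
        h_x0_given_xd = sum(sum(v.values()) * H([q / sum(v.values()) for q in v.values()])
                            for v in cond.values())      # H(X_0 | X_D)
        rhs += h_x0_given_xd * (1 - eps) ** (r + 1) * eps ** (n - r)

print(lhs, rhs)    # the two values coincide (up to floating-point error)
```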

8 It is well known that H(ε) is the capacity of a BSC(ε) without any input constraint, and ε is the capacity of a BEC(ε) without any input constraint. So, for an inputconstrained BEC(ε), the lower bound part of Theorem. gives a counterpart result of Wolf s conjecture. When applied to the channel with a Markovian input, Lemma. gives a relatively explicit series expansion type formula for the mutual information rate of (). Theorem.4. Assume {X n } is an m-th order input Markov chain. Then, I(X; Y ) = b(k,m) t=0 {i t } B (k,t) H(X 0 X i t, X k k m )P (E A(k,i t ) =, E Ā(k,i t ) = 0), (7) where A(k, i t ) = { k m,, k, i t, 0} and Ā(k, it ) = { k m,, 0} A(k, i t ) and B (n, u) = {{i,, i u } [ n, ] : for all j =,, u, {i j, i j +,, i j + m} {i,, i u }} and b(k, m) = (m) k m +R(k), here R(k) denotes the remainder of k divided by m. Proof. Note that H(Y 0 X 0 n, Y n ) = H(X 0 E 0 X 0 n, E n, Y n ) (a) = H(E 0 E n), where (a) follows from the independence of {X n } and {E n }. From Lemma., it then follows that Now, letting I(X; Y ) = lim (H(Y 0 Y n ) H(Y 0 X 0 n n, Y n )) = lim H(X 0 X D )P (E 0 =, E D =, E D c = 0). (8) n D [ n,] B(n, u) = {D [ n, ] : D = u} and B (n, u) = B(n, u) B (n, u), we deduce that, for {i t } B (k, t) D [ n, k m] P (E A(k,i t ) =, E D =, E Ā(k,i t ) = 0, E [ n, k m] D = 0) = P (E A(k,i t ) =, E Ā(k,i t ) = 0). 8

9 and n k=m {i,...,i k } B (n,k) = n m+ b(k,m) t=0 H(X 0 X i k )P (E 0 =, E i k =, E ī k = 0) {i t } B(k,t) D [ n, k m] { H(X0 X A(k,i t ), X D ) } P (E A(k,i t ) =, E D =, E Ā(k,i t ) = 0, E [ n, k m] D = 0) (a) = = n m+ b(k,m) t=0 {i t } B(k,t) D [ n, k m] { H(X0 X A(k,i t ) ) } P (E A(k,i t ) =, E D =, E Ā(k,i t ) = 0, E [ n, k m] D = 0) n m+ b(k,m) t=0 {i t } B(k,t) H(X 0 X A(k,i t ))P (E A(k,i t ) =, E Ā(k,i t ) = 0), where (a) follows from the fact that {X n } is an m-th order Markov chain. Then it follows that H(X 0 X D )P (E 0 =, E D =, E D c = 0) D [ n,] = n = = where n k=m n m+ {i,...,i k } B(n,k) {i,...,i k } B (n,k) b(k,m) t=0 T (n) = H(X 0 X i k )P (E 0 =, E i k =, E ī k = 0) b(n,m) + {i,...,i k } B (n,k) H(X 0 X i k )P (E 0 =, E i k =, E ī k = 0) {i t } B(k,t) H(X 0 X A(k,i t ))P (E A(k,i t ) =, E Ā(k,i t ) = 0) + T (n), (9) b(n,m) {i k } B (n,k) It follows from H(X 0 X i k ) log K that T (n) b(n,m) {i k } B (n,k) H(X 0 X i k )P (E 0 =, E i k =, E ī k = 0). P (E 0 =, E i k =, E ī k = 0) log K P (F n) log K, where F n is the event that there is no m consecutive s in E n. Now, let W i = (E i,, E i m+ ) for i. Then it follows from the assumption that W i is also a stationary and ergodic process with P (W i = (,, )) > 0. Using Poincare s recurrence 9

10 theorem [0], we have that P (W i = (,, ) i.o.) =, which implies that P (F ) = 0, where F denotes the event that there is no m consecutive s in E. This, together with the fact that lim n P (F n ) = P (F ), implies that lim n T (n) = 0, and therefore the proof of the theorem is complete. The following corollary can be readily deduced from Theorem.4. Corollary.5. Assume that {E n } is i.i.d. and {X n } is an m-th order Markov chain. Then where I(X; Y ) = ( ε) m+ a(k, t) = {i...i t} B (k,t) b(k,m) t=0 In particular, if {X n } is a first-order Markov chain, I(X; Y ) = ( ε) a(k, t)( ε) t ε k t, (0) H(X 0 X i t, X k k m ). H(X 0 X k )ε k. () Remark.6. A series expansion type formula for H(X Y ) different from () is given in Theorem of [46] for a discrete memoryless erasure channel with a first-order input Markov chain. It can be verified that these two formulas are equivalent in the sense that either one can be deduced from the other one via simple derivations. The form that our formula takes however makes it particularly effective for the capacity analysis of an input-constrained erasure channel. 3 Input-Constrained Memoryless Erasure Channel In this section, we will focus on the case when {E n } is i.i.d. and S is an irreducible finite-type constraint of topological order m. With Lemma. and Corollary.5 established, we are ready to characterize the asymptotics of the capacity of this type of erasure channels. As mentioned in Section, when ε = 0, it is well known [33] that there exists an mth-order Markov chain ˆX with A( ˆX) = S such that H( ˆX) = H( ˆX 0 ˆX m) = max H(X) = C(S, 0), () A(X) S where the maximization is over all stationary processes supported on S. theorem characterizes the asymptotics of C(S, ε) near ε = 0. The following Theorem 3.. Assume that {E n } is i.i.d. Then, { C(S, ε) = C(S, 0) (m + )H( ˆX m 0 ˆX m) H( ˆX 0 i= } i ˆX i+, ˆX i m ) ε + O(ε ). (3) 0

11 Moreover, for any n m, C (n) (S, ε) is of the same asymptotic form as in (3.), namely, { } C (n) (S, ε) = C(S, 0) (m + )H( ˆX m 0 ˆX m) H( ˆX i 0 ˆX i+, ˆX i m ) ε + O(ε ). (4) Proof. To establish (3), we prove that C(S, ε) is lower and upper bounded by the same asymptotic form as in (3). For the lower bound part, we consider the channel () with ˆX as its input. Note that and furthermore, for k m + where we have defined It then follows that P ( ˆF 0 ) = ( ε) m and P ( ˆF k ) = ε( ε) m for k m, ˆF k = {E k m k P ( ˆF k ) = ( ε) m+ b(k,m) t=0 i= B (k, t) ( ε) t+m ε k t, =, E k = 0, E k+ contains no m consecutive s}. k=m+ ( ε) b(k,m) t=0 k=m+ (a) = ( ε)( a(k, t)( ε) t ε k t P ( ˆF k ) log K m P ( ˆF k ) log K = (mε ( ε) m + m u= ( ) m ( ε) m u ε u ) log K u = O(ε ), (5) where (a) follows from P ( k 0 ˆFk ) = and the constant in O(ε ) depends only on m and K. Then, from Corollary.5, it follows that C(S, ε) I( ˆX; Y ) = ( ε) m+ = ( ε) m+ m (b) = H( ˆX 0 ˆX m) + b(k,m) { t=0 b(k,m) t=0 a(k, t)( ε) t ε k t a(k, t)( ε) t ε k t + ( ε) m+ (m + )H( ˆX 0 ˆX m) m H( ˆX 0 i= k=m+ b(k,m) t=0 a(k, t)( ε) t ε k t } i ˆX i+, ˆX i m ) ε + O(ε ),

12 m where (b) follows from (5) and b(k,m) t=0 { a(k, t)( ε) t+m+ ε k t = H( ˆX 0 ˆX m )+ (m + )H( ˆX 0 ˆX m m ) H( ˆX 0 i= i ˆX i+, ˆX i m ) This, together with (), establishes that C(S, ε) is lower bounded by the asymptotic form in (3). For the upper bound part, we will adapt the argument in [9]. Let and S n = p n = (p(ˆx 0 n) : ˆx 0 n A( ˆX n)) 0 : p(ˆx 0 n) > 0, p(ˆx 0 n) = ˆx n 0 A( ˆX n 0 ) S n,δ = {p n S n : p(ˆx 0 n) > δ for any ˆx 0 n A( ˆX 0 n)}, where A( ˆX 0 n) = {ˆx 0 n : p(ˆx 0 n) > 0}. In this proof, we define It then follows from Lemma. that C n (ε, S) = sup p n S n H(Y 0 Y n ) H(ε). C n (S, ε) = sup p n S n f(p n, ε), } ε+o(ε ). where f(p n, ε) n H(X 0 X D )( ε) k+ ε n k. k= D [ n,], D =k Let p n (ε) maximize f(p n, ε). As f(p n, ε) is continuous in (p n, ε) and is maximized at ˆp n when ε = 0, there exists some ε 0 > 0 (depends on n) and δ > 0 such that for all ε < ε 0, p n (ε) S n,δ. Then for ε ε 0, there exists some constant M (depends on n) such that { ( ) } C n (S, ε) sup p n S n,δ From now on, we write H(X 0 X n) + g (p n ) = H(X 0 X n), g (p n ) = (n + )H(X 0 X n) ( n k= (n + )H(X 0 X n) H(X 0 X k+, X k n ) n k= H(X 0 X k+, X k n ) Letting H = H(p n ) be the Hessian of g (p n ), we now expand g (p n ) and g (p n ) around p n = ˆp n to obtain g (p n ) = g (ˆp n ) + qt nhq n + O( q n ) ε ) +Mε. ε. and g (p n ) = g (ˆp n ) + ˆx 0 n A( ˆX 0 n ) g (ˆp n ) p(ˆx 0 n) q(ˆx0 n) + O( q n ).

13 where q n p n ˆp n contains all q(ˆx 0 n) as its coordinates. Since H is negative definite (see Lemma 3. [9]), we deduce that, for q n sufficiently small, g (p n ) + g (p n )ε g (ˆp n ) + g (ˆp n )ε + 4 qt nhq n + g (ˆp n ) p(ˆx 0 n) q(ˆx0 n) ε. ˆx 0 n A( ˆX 0 n ) Without loss of generality, we henceforth assume H is a diagonal matrix with all diagonal entries denoted k(ˆx 0 n) < 0 (since otherwise we can diagonalize H). Now, let { A ( ˆX n) 0 = ˆx 0 n : 4 k(ˆx0 n) q(ˆx 0 n ) + g (ˆp n ) p(ˆx 0 q(ˆx 0 n ) } ε < 0 n) and let A ( ˆX 0 n) denote the complement of A ( ˆX 0 n) in A( ˆX 0 n): A ( ˆX 0 n) = A( ˆX 0 n) A ( ˆX 0 n). Then, we have f(p n, ε) g (ˆp n ) + g (ˆp n )ε + 4 qt nhq n + g (ˆp n ) p(ˆx 0 q(ˆx 0 n) ε + Mε n)) ˆx 0 n A( ˆX n 0 ) ( g (ˆp n ) + g (ˆp n )ε + 4 q(ˆx0 n) k(ˆx 0 n) + g (ˆp n ) p(ˆx 0 q(ˆx 0 n ) ) ε + Mε n) (a) g (ˆp n ) + g (ˆp n )ε + ˆx 0 n A ( ˆX 0 n ) ˆx 0 n A ( ˆX 0 n ) 4 g (ˆp n) p(ˆx ε 0 n ) + Mε, k(ˆx 0 n) where (a) follows from the easily verifiable fact that for any ˆx 0 n A ( ˆX 0 n), 4 q(ˆx0 n) k(ˆx 0 n) + g (ˆp n ) p(ˆx 0 q(ˆx 0 n ) 4 g (ˆp n) p(ˆx ε ε 0 n ). n) k(ˆx 0 n) Since ˆX n 0 is an m-th order Markov chain, we deduce that ( g (ˆp n ) + g (ˆp n )ε = H( ˆX 0 = H( ˆX 0 ˆX n) + ˆX m) + ( (n + )H( ˆX 0 (m + )H( ˆX 0 ˆX n) ˆX m) k= n H( ˆX 0 k= m H( ˆX 0 k= k ˆX k+, ˆX n ) ) k ˆX k+, ˆX m ) We now ready to deduce that, for some positive constant M, ( ) f(p n, ε) H( ˆX 0 ˆX m) + (m + )H( ˆX m 0 ˆX m) H( ˆX k 0 ˆX k+, ˆX m ) ε + M ε, 3 ε ) ε,

14 which further implies that C(ε, S) C n (ε, S) H( ˆX 0 ˆX m)+ ( (m + )H( ˆX 0 ˆX m) m H( ˆX 0 k= k ˆX k+, ˆX m ) The proof of (3) is then complete. With C(S, ε) replaced with C (m) (S, ε), the proof of (3) also establishes (4). ) ε+m ε. Remark 3.. In a fairly general setting (where input constraints are not considered), a similar asymptotic formula with a constant term, a term linear in ε and a residual o(ε)-term has been derived in Theorem 3 of [46]. As an immediate corollary of Theorem 3., the following result gives asymptotics of the capacity of a BEC(ε) with the input supported on the (, )-RLL constraint S 0. Corollary 3.3. Assume K = and {E n } is i.i.d. Then, we have C(S 0, ε) = log λ log + λ ε + O(ε ), and for any n, C (n) (S 0, ε) is of the same asymptotic form, namely, C (n) (S 0, ε) = log λ log + λ ε + O(ε ). (6) Proof. Let λ = ( + 5)/. It is well known [30] that the noiseless capacity C(S 0, 0) = log λ and the first-order Markov chain { ˆX n } with the following transition probability matrix [ ] /λ /λ Π =, (7) 0 achieves the noiseless capacity, that is, H( ˆX) = C(S 0, 0) = log λ. Furthermore, via straightforward computations, we deduce that H( ˆX 0 ˆX ) = λ + λ H(/λ ) + + λ H(/λ ) = log λ log + λ, which, together with the fact that implies H( ˆX) = H( ˆX 0 ˆX ) = log λ, C(S 0, ε) = log λ log + λ ε + O(ε ), as desired. And the asymptotic form of C (n) (S 0, ε) follows from similar computations. Remark 3.4. The asymptotic form in (6) only gives partial asymptotics of C (n) (S 0, ε) for any n. For the case n =, we will derive later on the full asymptotics of C () (S 0, ε); see (4) in Section 4. 4
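The constants entering Corollary 3.3 can be reproduced numerically from first principles: the sketch below computes the noiseless capacity C(S_0, 0) = log λ as the logarithm of the Perron eigenvalue of the adjacency matrix of S_0, and recovers the maxentropic (Parry) transition matrix of the constraint [33], which is the chain used in the proof above. This is only a small sanity-check sketch; it assumes nothing beyond the adjacency matrix of S_0.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])                       # adjacency matrix of S_0: the word "22" is forbidden
eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))
lam = float(eigvals[k].real)                     # Perron eigenvalue: the golden mean (1 + sqrt(5))/2
v = np.abs(eigvecs[:, k].real)                   # corresponding positive right eigenvector

# Parry's maxentropic chain: P[i, j] = A[i, j] * v[j] / (lam * v[i])
Pi = A * v[None, :] / (lam * v[:, None])

print(np.log2(lam))                              # noiseless capacity C(S_0, 0) ~ 0.6942 bits
print(Pi)                                        # first row (1/lam, 1/lam^2), second row (1, 0)
```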

4 Input-Constrained Binary Erasure Channel

In this section, we will focus on a BEC(ε) with the input being a first-order Markov process supported on the (1, ∞)-RLL constraint S_0. To be more precise, we assume that K = 2, {E_n} is i.i.d. and {X_n} is a first-order Markov chain, taking values in {1, 2} and having the following transition probability matrix:

Π = [[1 − θ, θ], [1, 0]].

In Section 4.1, we will show that I(X; Y) is concave with respect to θ; in Section 4.2, we apply the algorithm in [16] to numerically evaluate C^(1)(S_0, ε), the convergence of the algorithm being guaranteed by the above-mentioned concavity result; finally, in Section 4.3, we characterize the full asymptotics of C^(1)(S_0, ε) around ε = 0.

4.1 Concavity

The concavity of the mutual information rate of special families of finite-state machine channels (FSMCs) has been considered in [21] and [29]. The results therein imply the concavity of I(X; Y) with respect to some parameterization of the Markov chain X when ε is small enough. In this section, however, we will show that I(X; Y) is concave with respect to θ, irrespective of the value of ε. Below is the main theorem of this section.

Theorem 4.1. For all ε ∈ [0, 1), I(X; Y) is strictly concave with respect to θ, 0 ≤ θ ≤ 1.

Proof. From Corollary 2.5, it follows that to prove the theorem, it suffices to show that for any n ≥ 1, H(X_0 | X_{-n}) is strictly concave with respect to θ, 0 ≤ θ ≤ 1. To prove this, we will deal with the following cases.

Case 1: n = 1. Straightforward computations give (here ′ denotes differentiation with respect to θ)

H(X_0 | X_{-1}) = (−θ log θ − (1 − θ) log(1 − θ)) / (1 + θ)

and

H″(X_0 | X_{-1}) = (1 / (1 + θ)^3) { 2 log θ − 4 log(1 − θ) − (1 + θ)² / (θ(1 − θ)) }.

One checks that the function within the braces is negative and takes its maximum at θ = 1/2. Therefore H(X_0 | X_{-1}) is strictly concave in θ.

Case 2: n ≥ 2. By definition,

H(X_0 | X_{-n}) = P(X_{-n} = 1) H(X_0 | X_{-n} = 1) + P(X_{-n} = 2) H(X_0 | X_{-n} = 2).

The following facts can be verified easily:

(i) f(θ) ≜ P(X_n = 1) = P(X_{-n} = 1) = 1/(1 + θ);

(ii) the n-step transition probability matrix of the Markov chain {X_n} is

Π^n = [[g_{n+1}(θ), 1 − g_{n+1}(θ)], [g_n(θ), 1 − g_n(θ)]],

16 where g n (θ) ( θ)n ; + θ (iii) H(y) = y log y ( y) log( y) is strictly concave with respect to y for y (0, ). With the above notation, we have and H(X 0 X n ) = f(θ)h(g n+ (θ)) + ( f(θ))h(g n (θ)) H (X 0 X n ) = fh (g n+ )(g n+) + ( f)h (g n )(g n) (8) + f (H(g n+ ) H(g n )) (9) f H (g n )g n + ( f)h (g n )g n (0) + f H (g n+ )g n+ + fh (g n+ )g n+, () where f = f(θ) and g n = g n (θ). It follows from (i) and (iii) that the term (8) is strictly negative. So, to prove the theorem, it suffices to show that T (9) + (0) + () 0. By the mean value theorem, (9) = ( + θ) (g 3 n+ g n ) log z z = ( θ)n ( + θ) log z, ( + θ) 4 z where z lies between g n and g n+. As a function of z, ( θ)n (+θ) log z (+θ) 4 z at g n. It then follows that where and T c n{θ + ( θ) n [(n 3n)θ + (n n ) + n + n]} ( + θ) 4 + c n+{4 ( θ) n [(n 3n)θ + (n n ) + n + n]} ( + θ) 4 = c n[θ + Q(n, θ)( θ) n ] ( + θ) 4 + c n+[4 Q(n, θ)( θ) n ] ( + θ) 4, Q(n, θ) = (n 3n)θ + (n n )θ + n + n We then consider the following several cases: c n = log g n g n. takes the maximum Case.: n is a positive even number. We first consider the case that g n. For this case, obviously we have c n 0, g n+ > and Q(n, θ) > 0, which further implies that T c n[θ Q(n, θ)θ n ] ( + θ) 4 + c n+[4 + Q(n, θ)θ n ] ( + θ) 4 < 0. 6

17 Now, for the case that g n >, again obviously we have c n < 0 and furthermore, c n+ c n = log g n+ g n+ log g n g n 0, where we have used the fact that g n g n+ for a positive even number n. Now, we are ready to deduce that T c n[θ Q(n, θ)θ n ] + c n+[4 + Q(n, θ)θ n ] ( + θ) 4 ( + θ) 4 = (c n+ c n )(4 + Q(n, θ)θ n ) + c n(θ + ) ( + θ) 4 ( + θ) 4 < 0. Case.: n is a positive odd integer and n 3. In this case, we have T c n[θ + Q(n, θ)θ n ] + c n+[4 Q(n, θ)θ n ] ( + θ) 4 ( + θ) 4 = (c n+ c n )[4 Q(n, θ)θ n ] + c n(θ + ) ( + θ) 4 ( + θ) { 4 } [4 Q(n, θ)θ n ]θ n + (θ + )(/g ( + θ) 4 n ), ( y )y where the last inequality follows from the mean value theorem, the inequality log z z for z > 0 and the fact that z lies between g n and g n+. Let y = B n /( + θ) and C n = + θ B n, where B n [ θ n+, + θ n ]. Then { } [4 Q(n, θ)θ n ]θ n T + (θ + )(/g ( + θ) 4 n ) ( g )g (θ 4θn )B n C n + ( + θ)( + θ n )θ n (4 Q(n, θ)θ n ). B n C n ( + θ) 3 ( + θ n ) Note that the above numerator, as a function of B n, takes the maximum at B n = + θ n. Denote this maximum by θ( + θ n )h (n, θ), where h (n, θ) = θ 3θ n + θ n Q(n, θ) + θ n Q(n, θ) 4 + θ n. To complete the proof, it suffices to prove h (n, θ) 0. Substituting Q(n, θ) into h (n, θ), we have h (n, θ) = θ 3θ n + θ n Q(n, θ) + θ n Q(n, θ) 4 + θ n where = θ 3θ n + θ n + θn [(n 3n)θ 3 +(3n 5n 4)θ + (3n n 8)θ + n + n] h(n, θ), h(n, θ) = θ 3θ n + θ n + (4n 4n 6)θ n+. The following facts can be easily verified: 7

18 (a) h(n, θ) takes the minimum at some θ 0, where θ 0 satisfies the following equation h (n, θ) = 0; () (b) h (n, θ) < 0 for θ [0, /] and n. It then follows from (a) and (b) that θ 0 / for n >. Solving () in θ n, we have where θ n 0 = (3 θ 0)n 3 + ((3 θ 0 )n 3) + 4(n + )(4n 4n 6)θ0 5. (n + )(4n 4n 6)θ0 5 For n, substituting θ n 0 into h(n, θ), we have h (n, θ) h(n, θ) h(n, θ 0 ) = θ 0 3θ0 n + θ0 n + (4n 4n 6)θ0 n+ θ0 n [ 3 + θ 0 + (4n 4n 6)θ0 4 θ0 n ] θ0 n v(n), v(n) = 5 + n 3 + (n 3) + (n + )(4n 4n 6) 3. (n + ) One checks that v(n) > 0 for n 65. Now, with the fact that h (n, θ) > 0 for 3 n 65 (this can be verified via tedious yet straightforward computations since h (n, θ) is an elementary function), we conclude that for h (n, θ) 0 for all n 3 and θ [0, ]. 4. Numerical evaluation of C () (S 0, ε) When ε = 0, that is, when the channel is perfect with no erasures, both C(S, ε) and C (m) (S, ε) boil down to the noiseless capacity of the constraint S, which can be explicitly computed [33]; however, little progress has been made for the case when ε > 0 due to the lack of simple and explicit characterization for C(S, ε) and C (m) (S, ε). In terms of numerically computing C(S, ε) and C (m) (S, ε), relevant work can be found in the subject of FSMCs, as input-constrained memoryless erasure channels can be regarded as special cases of FSMCs. Unfortunately, the capacity of an FSMC is still largely unknown and the fact that our channel is only a special FSMC does not seem to make the problem easier. Recently, Vontobel et al. [48] proposed a generalized Blahut-Arimoto algoritm (GBAA) to compute the capacity of an FSMC; and in [6], Han also proposed a randomized algorithm to compute the capacity of an FSMC. For both algorithms, the concavity of the mutual information rate is a desired property for the convergence (the convergence of the GBAA requires, in addition, the concavity of certain conditional entropy rate). On the other hand, as elaborated in [9], such a desired property, albeit established for a few special cases [, 9], is not true in general. 8

[Figure 1: the capacity-achieving distribution θ_max(ε) versus the erasure rate ε. Figure 2: the first-order Markov capacity C^(1)(S_0, ε) versus ε.]

The concavity established in the previous section allows us to numerically compute C^(1)(S_0, ε) using the algorithm in [16]. The randomized algorithm proposed in [16] iteratively computes {θ_n} in the following way:

θ_{n+1} = θ_n, if θ_n + a_n ĝ_{n_b}(θ_n) ∉ [0, 1];
θ_{n+1} = θ_n + a_n ĝ_{n_b}(θ_n), otherwise,

where ĝ_{n_b}(θ_n) is a simulator for I′(X; Y) (for details, see [16]). The author shows that {θ_n} converges to the first-order capacity-achieving distribution if I(X; Y) is concave with respect to θ, which has been proven in Theorem 4.1. Therefore, with proven convergence, this algorithm can be used to compute the first-order capacity-achieving distribution θ_max(ε) and the first-order capacity C^(1)(S_0, ε) (in bits), which are shown in Fig. 1 and Fig. 2, respectively.

4.3 Full Asymptotics

As in Section 4.1, the noiseless capacity of the (1, ∞)-RLL constraint S_0 is achieved by the first-order Markov chain with the transition probability matrix (7). So, we have

θ_max(0) = argmax_θ I(X; Y)|_{ε=0} = 1/λ²,

where λ = (1 + √5)/2. In this section, we give a full asymptotic formula for θ_max(ε) around ε = 0, which further leads to a full asymptotic formula for C^(1)(S_0, ε) around ε = 0.
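The stochastic-approximation scheme above is what is actually used here, but the strict concavity of I(X; Y) in θ (Theorem 4.1) also justifies a much cruder deterministic substitute for small examples: truncate the series of Corollary 2.5, which for a first-order Markov input and i.i.d. erasures reads I(X; Y) = (1 − ε)² Σ_{k≥1} H(X_0 | X_{-k}) ε^{k−1}, and maximize it over a grid of θ values. The sketch below does this; the truncation length and grid size are arbitrary choices, and this is an illustration rather than the algorithm of [16].

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_info_rate(theta, eps, terms=200):
    """I(X;Y) in nats via the first-order case of Corollary 2.5:
    I(X;Y) = (1-eps)^2 * sum_{k>=1} H(X_0 | X_{-k}) * eps^(k-1)."""
    P = np.array([[1 - theta, theta], [1.0, 0.0]])            # input transition matrix
    pi = np.array([1.0, theta]) / (1 + theta)                 # its stationary distribution
    total, Pk = 0.0, P.copy()
    for k in range(1, terms + 1):
        h_k = sum(pi[i] * entropy(Pk[i]) for i in range(2))   # H(X_0 | X_{-k})
        total += h_k * eps ** (k - 1)
        Pk = Pk @ P
    return (1 - eps) ** 2 * total

def markov1_capacity(eps, grid=4001):
    """Grid-search substitute for the algorithm of [16]: by Theorem 4.1 the objective is
    strictly concave in theta, so the grid maximum approximates C^(1)(S_0, eps)."""
    thetas = np.linspace(1e-6, 1 - 1e-6, grid)
    vals = np.array([mutual_info_rate(t, eps) for t in thetas])
    j = int(vals.argmax())
    return thetas[j], vals[j] / np.log(2)                     # (theta_max(eps), capacity in bits)

print(markov1_capacity(0.0))   # ~ (0.382, 0.694): theta_max(0) = 1/lambda^2, C^(1)(S_0, 0) = log2(lambda)
print(markov1_capacity(0.2))
```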

20 The following theorem gives the Taylor series of θ max in ε around ε = 0, which leads to an explicit formula for the n-th derivative of C () (S 0, ε) at ε = 0, whose coefficients can be explicitly computed. Theorem 4.. a) θ max (ε) is analytic in ε for ε [0, ) and ( d H(X 0 X ) max(0) = dθ θ (n) where is taken over all nonnegative intergers m,, m n k satisfying the constraint and n k= + ( ) n k! k m,m,,m n k m,m,,m n=0 ( )) λ m,m,,m n k a(m,, m n k ) a(m,, m n ) dm + +m n+ H(X 0 X ) dθ m + +m n+ d m + +m n k + H(X 0 X k ) dθ m + +m n k + ( ) n (θ λ max(0)) (j) m j j= ( ) n k (θ λ max(0)) (j) m j j= }, (3) m + m + + (n k)m n k = n k a(m,, m n k ) = (n k)!. m! m mn k!(n k) m n k b) C () (S 0, ε) is analytic in ε for ε [0, ) with the following Taylor series expansion around ε = 0: ( C () d n G 0 (ε) (S 0, ε) = dε n + dn (G (ε) G 0 (ε)) n ( ) n d n k ε=0 dε n + ε=0 k dε n k {(G k(ε) + G k (ε) G k (ε))} n=0 where G k (ε) = H(X 0 X k )(θ max (ε)). Proof. a) For ε > 0, I(X; Y ) = k= { 0 θ = 0 or, > 0 θ (0, ). With Theorem 4. establishing the concavity of I(X; Y ), θ max should be the unique zero point of the derivative of the mutual information rate. So, θ max (ε) (0, ) and satisfies (4) ε=0 ) ε n, 0 = di(x; Y ) dθ = ( ε) dh(x 0 X k ) ε k. (5) dθ According to the analytic implicit function theorem [7], θ max (ε) is analytic in ε for ε [0, ). In the following the s-th order derivative of θ max (ε) at ε = 0 is computed. It follows from 0

21 the Leibniz formula and the Faa di Bruno formula [7] that 0 = = = { } d n dh(x0 X k ) ε k dε n dθ ε=0 n ( ) { n d (n k) dh(x0 X k ) k! k dε n k dθ n ( ) n k! k } ε=0 dm+ +mn k+ H(X 0 X k) a(m,, m n k ) dθ m + +m n k + m,m,,m n k ( ) n k (θ λ max(0)) (j) m j j= (6) which immediately implies a). b) Note that C () (S 0, ε) = ( ε) G k (ε)ε k = G 0 (ε) + (G (ε) G 0 (ε))ε + (G k (ε) + G k (ε) G k (ε))ε k. It then follows from the Leibniz formula that { d n } (G dε n k (ε) + G k (ε) G k (ε))ε k k= n ( ) n d n t = t dε {(G k(ε) + G n t k (ε) G k (ε))} ε k t. Therefore, d n C () (S 0, ε) dε s k= t=0 k= = dn G 0 (ε) ε=0 dε n + dn (G (ε) G 0 (ε)) ε=0 dε n ε=0 n ( ) n d n t + t dε {(G k(ε) + G n t k (ε) G k (ε))} ε k t k= t=0 = dn G 0 (ε) dε n + dn (G (ε) G 0 (ε)) ε=0 dε n ε=0 n ( ) n d n k + k dε {(G k(ε) + G n k k (ε) G k (ε))}, k= which immediately implies b). Despite their convoluted looks, (3) and (4) are explicit and computable. Below, we list the coefficients of C () (S 0, ε) (in bits) and θ max (ε) up to the third order, which are numerically computed according to (3) and (4) and rounded off to the ten thousandths decimal digit: ε=0 ε=0

[Table 1: Taylor coefficients of θ_max(ε) and C^(1)(S_0, ε) (in bits) up to the ε³ term.]

5 Feedback with Input-Constraint

In this section, we consider the input-constrained erasure channel (1) as in Section 1, however with possible feedback, and we are interested in comparing its feedback capacity C_FB(S, ε) with its non-feedback capacity C(S, ε).

The following theorem states that for the erasure channel without any input constraint, feedback does not increase the capacity, and both capacities can be computed explicitly. This result is in fact implied by a theorem in [46], where a random coding argument has been employed in the proof; we nonetheless give an alternative proof in Appendix A for completeness.

Theorem 5.1. For the erasure channel (1) without any input constraints, feedback does not increase the capacity, and we have C_FB(X*, ε) = C(X*, ε) = (1 − ε) log K.

On the other hand, we will show in the following that feedback may increase the capacity when the input constraint in the erasure channel is non-trivial. As elaborated below, this is achieved by comparing the asymptotics of the feedback capacity and the non-feedback capacity for a special input-constrained erasure channel. In [37], Sabag et al. computed an explicit formula for the feedback capacity of a BEC with the (1, ∞)-RLL input constraint S_0.

Theorem 5.2 ([37]). The feedback capacity of the (1, ∞)-RLL input-constrained erasure channel is

C_FB(S_0, ε) = max_{0 ≤ p ≤ 1} H(p) / (p + 1/(1 − ε)),

where the unique maximizer p(ε) satisfies p = (1 − p)^{2−ε}.

Clearly, the explicit formula in Theorem 5.2 readily gives the asymptotics of the feedback capacity. To see this, note that p(0) = 1/λ² and p(1) = 1/2. Straightforward computations yield

d log p(ε)/dε = −(1 − p(ε)) log p(ε) / ((1 − p(ε) + p(ε)(2 − ε))(2 − ε)).
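The maximization in Theorem 5.2 is one-dimensional and has a unique maximizer, so p(ε) and C_FB(S_0, ε) are easy to evaluate numerically. The sketch below is a minimal illustration (the grid resolution and the chosen ε values are arbitrary): it maximizes H(p)/(p + 1/(1 − ε)) on a grid, checks that the maximizer satisfies p = (1 − p)^{2−ε}, and at ε = 0 recovers p(0) = 1/λ² and the noiseless capacity log λ.

```python
import numpy as np

def feedback_capacity(eps, grid=200001):
    """Numerically evaluate Theorem 5.2: maximize H(p) / (p + 1/(1-eps)) over p in [0,1].
    Returns (maximizing p, capacity in bits)."""
    p = np.linspace(1e-9, 1 - 1e-9, grid)
    hb = -(p * np.log(p) + (1 - p) * np.log(1 - p))     # binary entropy in nats
    obj = hb / (p + 1.0 / (1.0 - eps))
    j = int(obj.argmax())
    return float(p[j]), float(obj[j] / np.log(2))

lam = (1 + 5 ** 0.5) / 2
for eps in (0.0, 0.25, 0.5, 0.99):
    p_star, c_fb = feedback_capacity(eps)
    # the maximizer should satisfy p = (1 - p)^(2 - eps)
    print(eps, p_star, (1 - p_star) ** (2 - eps), c_fb)
print(1 / lam ** 2, np.log2(lam))                       # p(0) and C_FB(S_0, 0) for comparison
```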

Combining the above computations, we obtain

C_FB(S_0, ε) = H(p(ε)) / (p(ε) + 1/(1 − ε))
= −((1 − ε)/(2 − ε)) log p(ε)
= −(1/2) log p(0) + ((1/4) log p(0) − (1/2) d log p(ε)/dε |_{ε=0}) ε + O(ε²)
= log λ − (λ² log λ / (λ + 2)) ε + O(ε²).

It then follows from straightforward computations that, when ε is close to 0, C(S_0, ε) < C_FB(S_0, ε). So, we have proven the following theorem:

Theorem 5.3. For a BEC(ε) with the (1, ∞)-RLL input constraint, feedback increases the channel capacity when ε is small enough.

Remark 5.4. An independent work [44] also found that feedback does increase the capacity of a BEC(ε) with the same input constraint S_0, by comparing a tighter upper bound on the non-feedback capacity C(S_0, ε), obtained via a dual capacity approach, with the feedback capacity C_FB(S_0, ε).

Remark 5.5. Recently, Sabag et al. [38] also computed an explicit asymptotic formula for the feedback capacity of a BSC(ε) with the input supported on the (1, ∞)-RLL constraint. By comparing the asymptotics of the feedback capacity with that of the non-feedback capacity [19], they showed that feedback does increase the channel capacity in the high SNR regime.

It is well known that for any memoryless channel without any input constraint, feedback does not increase the channel capacity. Theorem 5.1 states that when there is no input constraint, feedback does not increase the capacity of the erasure channel even in the presence of channel memory. Theorem 5.3 says that feedback may increase the capacity of input-constrained erasure channels even if there is no channel memory. These two theorems, together with the results in [38, 44], suggest the intricacy of the interplay between feedback, memory and input constraints.

Appendices

A Proof of Theorem 5.1

We first prove that

C(X*, ε) = (1 − ε) log K.    (27)

A similar argument using the independence of {X_i} and {E_i} as in the proof of (3) yields that

p(y^n) = P(E_{I(y^n)} = 1, E_{Ī(y^n)} = 0) P(X_{I(y^n)} = y_{I(y^n)}).

24 It then follows that H(Y n ) = y n p(y n ) log p(y n ) = p(y n ) log P (E I(y n ) =, E Ī(y n) = 0) y n y n = D [,n] D [,n] = = D [,n] D [,n] H(E n ) + P (y n ) log P (X I(y n ) = y I(y n )) P (E D =, E D C = 0)P (X D = y D ) log P (E D =, E D c = 0) y n:i(yn )=D P (E D =, E D c = 0)P (X D = y D ) log P (X D = y D ) y n:i(yn )=D P (E D =, E D c = 0) log P (E D =, E D c = 0) + P (E D =, E D c = 0)H(X D ) + H(E n ) D [,n] P (E D =, E D c = 0) D log K D [,n] P (E D =, E D c = 0)H(X D ) = H(E n ) + E[E + + E n ] log K, (8) where the only inequality becomes equality if {X n } is i.i.d. with the uniform distribution. It then further follows that C(X, ε) = lim n n sup I(X n p(x n ) ; Y n ) = lim n n sup (H(Y n p(x n ) ) H(Y n X n )) = lim n n sup (H(Y n p(x n ) ) H(E n )) (a) lim n n E[E + + E n ] log K (b) = P (E = ) log K = ( ε) log K, where (a) follows from (8) and (b) follows from the ergodicity of {E n }. The desired (7) then follows from the fact that the only inequality (a) becomes equality if {X n } is i.i.d. with the uniform distribution. We next prove that C F B (X, ε) ( ε) log K, which, together with (7), immediately implies the theorem. Let W, independent of {E i }, be the message to be sent and X i (W, Y i ) denote the encoding function. As shown in [4], C F B (X, ε) = lim n n sup {p(x i = X i,y i ):i=,,n} 4 I(W ; Y n ).

25 Using the chain rule for entropy, we have n H(Y n W ) = H(Y i W, Y i ) (a) = = (b) = i= n i= n i= n i= H(E i X i W, Y i, X i, E i ) H(E i W, Y i, X i, E i ) H(E i E i ) = H(E n ), (9) where (a) follows from the fact that X i is a function of W and Y i and E i = 0 if and only if Y i = 0, (b) follows from the independence of W and {E i }. Note that for y i 0, p(y i w, y i ) = P (X i (w, y i ) = y i, E i = w, y i ) = P (X i (w, y i ) = y i w, y i )P (E i = X i (w, y i ) = y i, w, y i ) = P (X i (w, y i ) = y i w, y i )P (E i = E I(y i ) =, E Ī(y i ) = 0). (30) And for y i = 0, p(y i w, y i ) = P (E i = 0 E I(y i ) =, E Ī(y i ) = 0, w, yi ) (a) = P (E i = 0 E I(y i ) =, E Ī(y i ) = 0), (3) where (a) follows from the independence of W and {E i }. It then follows that p(y n ) = w p(w)p(y n w) = w (a) = w (b) = w = p(w) n i= p(y i w, y i ) p(w) p(y i w, y i i I(y n ) p(w) p(w) w ) ) i I(y n ) p(y i w, y i ) p(y i w, y i i Ī(yn ) i I(y n ) P (X i (w, y i ) = y i w, y i P (E i = 0 E I(y i ) =, E Ī(y i ) = 0) i Ī(yn ) ) P (E I(y n ) =, E Ī(y n ) = 0), where (a) follows from (30) and (b) follows from (3). Since for any D [, n], y n :I(yn )=D p(y n ) = P (E D =, E D c = 0), 5

26 which implies that q(ỹn )) w p(w) P (X i (w, ỹ i ) = ỹ i w, ỹ i ) : I(ỹ n ) = I(y n ) i I(ỹ n) is an M I(yn ) -dimensional probability mass function. Therefore, through a similar argument as before, we have H(Y n ) = y n p(y n ) log p(y n ) = p(y n ) log P (E I(y n ) =, E Ī(y n) = 0) y n y n = H(E n ) H(E n ) + D [,n] D [,n] y n :I(yn )=D P (E D =, E D c = 0)q(y n ) log q(y n ) P (E D =, E D c = 0) D log K P (E I(y n ) =, E Ī(y n ) = 0)q(y n ) log q(y n ) = H(E n ) + E[E + + E n ] log K, (3) where the inequality follows from the fact that q(y n ) is an M D -dimensional probability mass function. Combining (9) and (3), we have as desired. C F B (X, ε) = lim n n n sup {p(x i = X i,y i ):i=,,n} lim n E[E + + E n ] log K = P (E = ) log K = ( ε) log K, I(W ; Y n ) Acknowledgement. We would like to thank Navin Kashyap, Haim Permuter, Oron Sabag and Wenyi Zhang for insightful discussions and suggestions and for pointing out relevant references that result in great improvements in many aspects of this work. References [] S. Arimoto, An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Trans. Inf. Theory, vol. 8, no., pp. 4 0, Jan. 97. [] D. M. Arnold and H. A. Loeliger, On the information rate of binary-input channels with memory, in Proceedings of IEEE International Conference on Communications, vol. 9, pp , Jun

27 [3] D. M. Arnold, H. A. Loeliger, P. O. Vontobel, A. Kavcic and W. Zeng, Simulation- Based Computation of Information Rates for Channels With Memory, IEEE. Trans. Inf. Theory, vol.5, no.8, pp , Aug [4] D. Blackwell, The entropy of functions of finite-state markov chains, Trans. First Prague Conf. Information Theory, Statistical Decision Functions, Random Processes, pp. 3 0, 957. [5] R. Blahut, Computation of channel capacity and rate-distortion functions, IEEE Trans. Inf. Theory, vol. 8, no. 4, pp , Apr. 97. [6] J. Chen and P. Siegel, Markov processes asymptotically achieve the capacity of finitestate intersymbol interference channels, IEEE Trans. Inf. Theory, vol. 54, no. 3, pp , Mar [7] G. Constantine and T. Savits. A Multivariate Faa Di Bruno Formula with Applications, Trans. Amer. Math. Soc., vol. 348, no., pp , Feb [8] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York: Wiley, 99. [9] A. Dembo, On Gaussian feedback capacity, IEEE Trans. Inf. Theory, vol. 35, no. 5, pp , Sep [0] R. Durrett, Probability: Theory and Examples, Cambridge University Press, 00. [] Y. Ephraim and N. Merhav, Hidden Markov processes, IEEE Trans. Inf. Theory, vol. 48, no. 6, pp , Jun. 00. [] G. Forney, Jr., Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference, IEEE Trans. Inf. Theory, vol. 8, no. 3, pp , Mar. 97. [3] R. Gallager, Information theory and reliable communication. New York: Wiley, 968. [4] A. Goldsmith and P. Varaiya, Capacity, mutual information, and coding for finitestate markov channels, IEEE Trans. Inf. Theory, vol. 4, no. 3, pp , Mar [5] R. M. Gray, Entropy and Information Theory. Springer US, 0. [6] G. Han, A randomized algorithm for the capacity of finite-state channels, IEEE Trans. Inf. Theory, vol. 6, no. 7, pp , July 05. [7] G. Han and B. Marcus, Analyticity of entropy rate of hidden Markov chains, IEEE Trans. Inf. Theory, vol. 5, no., pp , Dec [8] G. Han and B. Marcus, Derivatives of entropy rate in special families of hidden Markov chains, IEEE Trans. Inf. Theory, vol. 53, no. 7, pp , Jul

28 [9] G. Han and B. Marcus, Asymptotics of input-constrained binary symmetric channel capacity, Ann. Appl. Probab., vol. 9, no. 3, pp , 009. [0] G. Han and B. Marcus, Asymptotics of entropy rate in special families of hidden Markov chains, IEEE Trans. Inf. Theory, vol. 56, no. 3, pp , Mar. 00. [] G. Han and B. Marcus. Concavity of the mutual information rate for input-restricted memoryless channels at high SNR, IEEE Trans. Inf. Theory, vol. 58, no. 3, pp , Mar. 0. [] W. Hirt and J. L. Massey, Capacity of the discrete-time Gaussian channel with intersymbol interference, IEEE Trans. Inf. Theory, vol. 34, no. 3, pp , May 988. [3] P. Jacquet, G. Seroussi, and W. Szpankowski, On the entropy of a hidden Markov process, Theoret. Comput. Sci., vol. 395, no. -3, pp. 03 9, 008. [4] P. Jacquet and W. Szpankowski, Noisy constrained capacity for BSC channels, IEEE Trans. Inf. Theory, vol. 56, no., pp , Nov. 00. [5] A. Kavcic, On the capacity of Markov sources over noisy channels, in Proceedings of 00 IEEE Global Telecommunications Conference, vol. 5, pp , Nov. 00. [6] H. Kobayashi and D. T. Tang, Application of partial-response channel coding to magnetic recording systems, IBM Journal of Research and Development, vol. 4, no. 4, pp , 970. [7] S. G. Krantz and H. R. Parks, The Implicit Function Theorem: History, Theory, and Applications, Springer New York, 03. [8] Y. Li and G. Han, Input-constrained erasure channels: Mutual information and capacity, in Proceedings of the IEEE International Symposium on Information Theory, pp , Jul. 04. [9] Y. Li and G. Han, Concavity of mutual information rate of finite-state channels, in Proceedings of IEEE International Symposium on Information Theory, pp. 4 8, Jul. 03. [30] D. Lind and B. Marcus, An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 995. [3] B. Marcus, R. Roth, and P. Siegel, Constrained systems and coding for recording channels, Handbook of coding theory, Vol. I, II. Amsterdam: North-Holland, pp , 998. [3] M. Mushkin and I. Bar-David, Capacity and coding for the gilbert-elliott channels, IEEE Trans. Inf. Theory, vol. 35, no. 6, pp , Jun [33] W. Parry, Intrinsic markov chains, Transactions of the American Mathematical Society, vol., no., pp ,

29 [34] H. D. Pfister, The capacity of finite-state channels in the high-noise regime, Entropy of hidden Markov processes and connections to dynamical systems, London Math. Soc. Lecture Note Series. Cambridge: Cambridge Univ. Press, 0, vol. 385, pp. 79. [35] H. Pfister, J. B. Soriaga, and P. Siegel, On the achievable information rates of finite state ISI channels, in Proceedings of IEEE Global Telecommunications Conference, vol. 5, pp , Nov. 00. [36] A. Patapoutian and P. Vijay Kumar, The (d, k) Subcode of a Linear Block Code, IEEE Trans. Inf. Theory, vol. 38, no. 4, pp , Jul. 99. [37] O. Sabag, H. Permuter and N. Kashyap, The Feedback Capacity of the (, )-RLL Input-Constrained Erasure Channel, [38] O. Sabag, H. Permuter and N. Kashyap, The feedback capacity of the binary symmetric channel with a no-consecutive-ones input constraint, in 50th Ann. Allerton Conf. Urbana, IL, 05, to appear. [39] E. Seneta, Non-negative matrices and Markov chains, Springer Series in Statistics. New York: Springer, 98. [40] C. E. Shannon, A mathematical theory of communication, Bell Syst. Tech.J., vol. 7, pp , , 948. [4] C. Shannon, The zero error capacity of a noisy channel, IEEE Trans. Inf. Theory, vol., no. 3, pp. 8 9, Sep [4] S. Tatikonda and S. Mitter, The Capacity of Channels With Feedback, IEEE Trans. Inf. Theory, vol. 55, no., pp , Jan [43] J. Taylor, Several Complex Variables With Connections to Algebraic Geometry and Lie Groups, Graduate Studies in Mathematics. American Mathematical Society, 00. [44] A. Thangaraj Dual capacity upper bounds for noisy runlength constrained channels. Submitted to ITW 06. [45] Yang, Shaohua and Kavcic, A. and Tatikonda, S., Feedback Capacity of Stationary Sources over Gaussian Intersymbol Interference Channels, Global Telecommunications Conference, 006. GLOBECOM 06. IEEE,vol., no., pp. 6, Nov Dec [46] S. Verdu and T. Weissman, The Information Lost in Erasures, IEEE Trans. Inf. Theory, vol. 54, no., pp , Nov [47] P. O. Vontobel and D. M. Arnold, An upper bound on the capacity of channels with memory and constraint input, in Proceedings of IEEE Information Theory Workshop, pp , Sep.00. [48] P. O. Vontobel, A. Kavčić, D. M. Arnold, and H. A. Loeliger, A generalization of the Blahut-Arimoto algorithm to finite-state channels, IEEE Trans. Inf. Theory, vol. 54, no. 5, pp , May


More information

Can Feedback Increase the Capacity of the Energy Harvesting Channel?

Can Feedback Increase the Capacity of the Energy Harvesting Channel? Can Feedback Increase the Capacity of the Energy Harvesting Channel? Dor Shaviv EE Dept., Stanford University shaviv@stanford.edu Ayfer Özgür EE Dept., Stanford University aozgur@stanford.edu Haim Permuter

More information

Some Aspects of Finite State Channel related to Hidden Markov Process

Some Aspects of Finite State Channel related to Hidden Markov Process Some Aspects of Finite State Channel related to Hidden Markov Process Kingo Kobayashi National Institute of Information and Communications Tecnology(NICT), 4-2-1, Nukui-Kitamachi, Koganei, Tokyo, 184-8795,

More information

Recursions for the Trapdoor Channel and an Upper Bound on its Capacity

Recursions for the Trapdoor Channel and an Upper Bound on its Capacity 4 IEEE International Symposium on Information heory Recursions for the rapdoor Channel an Upper Bound on its Capacity obias Lutz Institute for Communications Engineering echnische Universität München Munich,

More information

(Classical) Information Theory III: Noisy channel coding

(Classical) Information Theory III: Noisy channel coding (Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way

More information

Arimoto Channel Coding Converse and Rényi Divergence

Arimoto Channel Coding Converse and Rényi Divergence Arimoto Channel Coding Converse and Rényi Divergence Yury Polyanskiy and Sergio Verdú Abstract Arimoto proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code

More information

An Achievable Rate Region for the 3-User-Pair Deterministic Interference Channel

An Achievable Rate Region for the 3-User-Pair Deterministic Interference Channel Forty-Ninth Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 8-3, An Achievable Rate Region for the 3-User-Pair Deterministic Interference Channel Invited Paper Bernd Bandemer and

More information

On the Duality between Multiple-Access Codes and Computation Codes

On the Duality between Multiple-Access Codes and Computation Codes On the Duality between Multiple-Access Codes and Computation Codes Jingge Zhu University of California, Berkeley jingge.zhu@berkeley.edu Sung Hoon Lim KIOST shlim@kiost.ac.kr Michael Gastpar EPFL michael.gastpar@epfl.ch

More information

On the Throughput, Capacity and Stability Regions of Random Multiple Access over Standard Multi-Packet Reception Channels

On the Throughput, Capacity and Stability Regions of Random Multiple Access over Standard Multi-Packet Reception Channels On the Throughput, Capacity and Stability Regions of Random Multiple Access over Standard Multi-Packet Reception Channels Jie Luo, Anthony Ephremides ECE Dept. Univ. of Maryland College Park, MD 20742

More information

MAGNETIC recording channels are typically modeled

MAGNETIC recording channels are typically modeled Write Channel Model for Bit-Patterned Media Recording Aravind R. Iyengar, Paul H. Siegel, Fellow, IEEE and Jack K. Wolf, Life Fellow, IEEE arxiv:63v [cs.it] Oct Abstract We propose a new write channel

More information

Extension of the Blahut Arimoto Algorithm for Maximizing Directed Information

Extension of the Blahut Arimoto Algorithm for Maximizing Directed Information 204 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 1, JANUARY 2013 Extension of the Blahut Arimoto Algorithm for Maximizing Directed Information Iddo Naiss and Haim H. Permuter, Member, IEEE Abstract

More information

On the Capacity of Free-Space Optical Intensity Channels

On the Capacity of Free-Space Optical Intensity Channels On the Capacity of Free-Space Optical Intensity Channels Amos Lapidoth TH Zurich Zurich, Switzerl mail: lapidoth@isi.ee.ethz.ch Stefan M. Moser National Chiao Tung University NCTU Hsinchu, Taiwan mail:

More information

WE study the capacity of peak-power limited, single-antenna,

WE study the capacity of peak-power limited, single-antenna, 1158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 3, MARCH 2010 Gaussian Fading Is the Worst Fading Tobias Koch, Member, IEEE, and Amos Lapidoth, Fellow, IEEE Abstract The capacity of peak-power

More information

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Nan Liu and Andrea Goldsmith Department of Electrical Engineering Stanford University, Stanford CA 94305 Email:

More information

Practical Polar Code Construction Using Generalised Generator Matrices

Practical Polar Code Construction Using Generalised Generator Matrices Practical Polar Code Construction Using Generalised Generator Matrices Berksan Serbetci and Ali E. Pusane Department of Electrical and Electronics Engineering Bogazici University Istanbul, Turkey E-mail:

More information

The Poisson Channel with Side Information

The Poisson Channel with Side Information The Poisson Channel with Side Information Shraga Bross School of Enginerring Bar-Ilan University, Israel brosss@macs.biu.ac.il Amos Lapidoth Ligong Wang Signal and Information Processing Laboratory ETH

More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Binary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity 1

Binary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity 1 5 Conference on Information Sciences and Systems, The Johns Hopkins University, March 6 8, 5 inary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity Ahmed O.

More information

EE229B - Final Project. Capacity-Approaching Low-Density Parity-Check Codes

EE229B - Final Project. Capacity-Approaching Low-Density Parity-Check Codes EE229B - Final Project Capacity-Approaching Low-Density Parity-Check Codes Pierre Garrigues EECS department, UC Berkeley garrigue@eecs.berkeley.edu May 13, 2005 Abstract The class of low-density parity-check

More information

Memory in Classical Information Theory: A Brief History

Memory in Classical Information Theory: A Brief History Beyond iid in information theory 8- January, 203 Memory in Classical Information Theory: A Brief History sergio verdu princeton university entropy rate Shannon, 948 entropy rate: memoryless process H(X)

More information

Analyticity and Derivatives of Entropy Rate for HMP s

Analyticity and Derivatives of Entropy Rate for HMP s Analyticity and Derivatives of Entropy Rate for HMP s Guangyue Han University of Hong Kong Brian Marcus University of British Columbia 1 Hidden Markov chains A hidden Markov chain Z = {Z i } is a stochastic

More information

Accessible Capacity of Secondary Users

Accessible Capacity of Secondary Users Accessible Capacity of Secondary Users Xiao Ma maxiao@mail.sysu.edu.cn Sun Yat-sen University November 14th, 2010 Xiao Ma (Sun Yat-sen Univ.) Accessible Capacity 11/14/2010 1 / 60 This talk is based on:

More information

Feedback Capacity of the Compound Channel

Feedback Capacity of the Compound Channel Feedback Capacity of the Compound Channel The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Shrader,

More information

A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources

A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources Wei Kang Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College

More information

Channel Polarization and Blackwell Measures

Channel Polarization and Blackwell Measures Channel Polarization Blackwell Measures Maxim Raginsky Abstract The Blackwell measure of a binary-input channel (BIC is the distribution of the posterior probability of 0 under the uniform input distribution

More information

On Common Information and the Encoding of Sources that are Not Successively Refinable

On Common Information and the Encoding of Sources that are Not Successively Refinable On Common Information and the Encoding of Sources that are Not Successively Refinable Kumar Viswanatha, Emrah Akyol, Tejaswi Nanjundaswamy and Kenneth Rose ECE Department, University of California - Santa

More information

Noisy channel communication

Noisy channel communication Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University

More information

Secure Degrees of Freedom of the MIMO Multiple Access Wiretap Channel

Secure Degrees of Freedom of the MIMO Multiple Access Wiretap Channel Secure Degrees of Freedom of the MIMO Multiple Access Wiretap Channel Pritam Mukherjee Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College Park, MD 074 pritamm@umd.edu

More information

Approximately achieving the feedback interference channel capacity with point-to-point codes

Approximately achieving the feedback interference channel capacity with point-to-point codes Approximately achieving the feedback interference channel capacity with point-to-point codes Joyson Sebastian*, Can Karakus*, Suhas Diggavi* Abstract Superposition codes with rate-splitting have been used

More information

An Outer Bound for the Gaussian. Interference channel with a relay.

An Outer Bound for the Gaussian. Interference channel with a relay. An Outer Bound for the Gaussian Interference Channel with a Relay Ivana Marić Stanford University Stanford, CA ivanam@wsl.stanford.edu Ron Dabora Ben-Gurion University Be er-sheva, Israel ron@ee.bgu.ac.il

More information

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15 EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15 1. Cascade of Binary Symmetric Channels The conditional probability distribution py x for each of the BSCs may be expressed by the transition probability

More information

Generalized Writing on Dirty Paper

Generalized Writing on Dirty Paper Generalized Writing on Dirty Paper Aaron S. Cohen acohen@mit.edu MIT, 36-689 77 Massachusetts Ave. Cambridge, MA 02139-4307 Amos Lapidoth lapidoth@isi.ee.ethz.ch ETF E107 ETH-Zentrum CH-8092 Zürich, Switzerland

More information

Channel Dispersion and Moderate Deviations Limits for Memoryless Channels

Channel Dispersion and Moderate Deviations Limits for Memoryless Channels Channel Dispersion and Moderate Deviations Limits for Memoryless Channels Yury Polyanskiy and Sergio Verdú Abstract Recently, Altug and Wagner ] posed a question regarding the optimal behavior of the probability

More information

Lecture 3: Channel Capacity

Lecture 3: Channel Capacity Lecture 3: Channel Capacity 1 Definitions Channel capacity is a measure of maximum information per channel usage one can get through a channel. This one of the fundamental concepts in information theory.

More information

Taylor series expansions for the entropy rate of Hidden Markov Processes

Taylor series expansions for the entropy rate of Hidden Markov Processes Taylor series expansions for the entropy rate of Hidden Markov Processes Or Zuk Eytan Domany Dept. of Physics of Complex Systems Weizmann Inst. of Science Rehovot, 7600, Israel Email: or.zuk/eytan.domany@weizmann.ac.il

More information

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122 Lecture 5: Channel Capacity Copyright G. Caire (Sample Lectures) 122 M Definitions and Problem Setup 2 X n Y n Encoder p(y x) Decoder ˆM Message Channel Estimate Definition 11. Discrete Memoryless Channel

More information

Chaos, Complexity, and Inference (36-462)

Chaos, Complexity, and Inference (36-462) Chaos, Complexity, and Inference (36-462) Lecture 7: Information Theory Cosma Shalizi 3 February 2009 Entropy and Information Measuring randomness and dependence in bits The connection to statistics Long-run

More information

New Results on the Equality of Exact and Wyner Common Information Rates

New Results on the Equality of Exact and Wyner Common Information Rates 08 IEEE International Symposium on Information Theory (ISIT New Results on the Equality of Exact and Wyner Common Information Rates Badri N. Vellambi Australian National University Acton, ACT 0, Australia

More information

Optimal Natural Encoding Scheme for Discrete Multiplicative Degraded Broadcast Channels

Optimal Natural Encoding Scheme for Discrete Multiplicative Degraded Broadcast Channels Optimal Natural Encoding Scheme for Discrete Multiplicative Degraded Broadcast Channels Bike ie, Student Member, IEEE and Richard D. Wesel, Senior Member, IEEE Abstract Certain degraded broadcast channels

More information

IN this work, we develop the theory required to compute

IN this work, we develop the theory required to compute IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 8, AUGUST 2006 3509 Capacity of Finite State Channels Based on Lyapunov Exponents of Random Matrices Tim Holliday, Member, IEEE, Andrea Goldsmith,

More information

Bounds on Achievable Rates of LDPC Codes Used Over the Binary Erasure Channel

Bounds on Achievable Rates of LDPC Codes Used Over the Binary Erasure Channel Bounds on Achievable Rates of LDPC Codes Used Over the Binary Erasure Channel Ohad Barak, David Burshtein and Meir Feder School of Electrical Engineering Tel-Aviv University Tel-Aviv 69978, Israel Abstract

More information

One Lesson of Information Theory

One Lesson of Information Theory Institut für One Lesson of Information Theory Prof. Dr.-Ing. Volker Kühn Institute of Communications Engineering University of Rostock, Germany Email: volker.kuehn@uni-rostock.de http://www.int.uni-rostock.de/

More information

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H. Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without

More information

Lecture 14 February 28

Lecture 14 February 28 EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables

More information

Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation

Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation Akisato KIMURA akisato@ss.titech.ac.jp Tomohiko UYEMATSU uematsu@ss.titech.ac.jp April 2, 999 No. AK-TR-999-02 Abstract

More information

Distributed Functional Compression through Graph Coloring

Distributed Functional Compression through Graph Coloring Distributed Functional Compression through Graph Coloring Vishal Doshi, Devavrat Shah, Muriel Médard, and Sidharth Jaggi Laboratory for Information and Decision Systems Massachusetts Institute of Technology

More information

Simulation-Based Computation of Information Rates for Channels with Memory

Simulation-Based Computation of Information Rates for Channels with Memory January 12, 2004 Submitted to IEEE Trans. on Information Theory Simulation-Based Computation of Information Rates for Channels with Memory Dieter Arnold, Hans-Andrea Loeliger, Pascal O. Vontobel, Aleksandar

More information

Intermittent Communication

Intermittent Communication Intermittent Communication Mostafa Khoshnevisan, Student Member, IEEE, and J. Nicholas Laneman, Senior Member, IEEE arxiv:32.42v2 [cs.it] 7 Mar 207 Abstract We formulate a model for intermittent communication

More information

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels Parastoo Sadeghi National ICT Australia (NICTA) Sydney NSW 252 Australia Email: parastoo@student.unsw.edu.au Predrag

More information

arxiv: v1 [cs.it] 5 Sep 2008

arxiv: v1 [cs.it] 5 Sep 2008 1 arxiv:0809.1043v1 [cs.it] 5 Sep 2008 On Unique Decodability Marco Dalai, Riccardo Leonardi Abstract In this paper we propose a revisitation of the topic of unique decodability and of some fundamental

More information

The Capacity of the Semi-Deterministic Cognitive Interference Channel and its Application to Constant Gap Results for the Gaussian Channel

The Capacity of the Semi-Deterministic Cognitive Interference Channel and its Application to Constant Gap Results for the Gaussian Channel The Capacity of the Semi-Deterministic Cognitive Interference Channel and its Application to Constant Gap Results for the Gaussian Channel Stefano Rini, Daniela Tuninetti, and Natasha Devroye Department

More information

A Half-Duplex Cooperative Scheme with Partial Decode-Forward Relaying

A Half-Duplex Cooperative Scheme with Partial Decode-Forward Relaying A Half-Duplex Cooperative Scheme with Partial Decode-Forward Relaying Ahmad Abu Al Haija, and Mai Vu, Department of Electrical and Computer Engineering McGill University Montreal, QC H3A A7 Emails: ahmadabualhaija@mailmcgillca,

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

Lecture 4: Proof of Shannon s theorem and an explicit code

Lecture 4: Proof of Shannon s theorem and an explicit code CSE 533: Error-Correcting Codes (Autumn 006 Lecture 4: Proof of Shannon s theorem and an explicit code October 11, 006 Lecturer: Venkatesan Guruswami Scribe: Atri Rudra 1 Overview Last lecture we stated

More information

Reliable Computation over Multiple-Access Channels

Reliable Computation over Multiple-Access Channels Reliable Computation over Multiple-Access Channels Bobak Nazer and Michael Gastpar Dept. of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA, 94720-1770 {bobak,

More information

Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information

Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information Jialing Liu liujl@iastate.edu Sekhar Tatikonda sekhar.tatikonda@yale.edu Nicola Elia nelia@iastate.edu Dept. of

More information

Channels with cost constraints: strong converse and dispersion

Channels with cost constraints: strong converse and dispersion Channels with cost constraints: strong converse and dispersion Victoria Kostina, Sergio Verdú Dept. of Electrical Engineering, Princeton University, NJ 08544, USA Abstract This paper shows the strong converse

More information

MMSE Dimension. snr. 1 We use the following asymptotic notation: f(x) = O (g(x)) if and only

MMSE Dimension. snr. 1 We use the following asymptotic notation: f(x) = O (g(x)) if and only MMSE Dimension Yihong Wu Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Email: yihongwu@princeton.edu Sergio Verdú Department of Electrical Engineering Princeton University

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

USING multiple antennas has been shown to increase the

USING multiple antennas has been shown to increase the IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 1, JANUARY 2007 11 A Comparison of Time-Sharing, DPC, and Beamforming for MIMO Broadcast Channels With Many Users Masoud Sharif, Member, IEEE, and Babak

More information

EE 4TM4: Digital Communications II. Channel Capacity

EE 4TM4: Digital Communications II. Channel Capacity EE 4TM4: Digital Communications II 1 Channel Capacity I. CHANNEL CODING THEOREM Definition 1: A rater is said to be achievable if there exists a sequence of(2 nr,n) codes such thatlim n P (n) e (C) = 0.

More information

Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk

Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk John Lafferty School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 lafferty@cs.cmu.edu Abstract

More information

On the Optimality of Treating Interference as Noise in Competitive Scenarios

On the Optimality of Treating Interference as Noise in Competitive Scenarios On the Optimality of Treating Interference as Noise in Competitive Scenarios A. DYTSO, D. TUNINETTI, N. DEVROYE WORK PARTIALLY FUNDED BY NSF UNDER AWARD 1017436 OUTLINE CHANNEL MODEL AND PAST WORK ADVANTAGES

More information

Control Over Noisy Channels

Control Over Noisy Channels IEEE RANSACIONS ON AUOMAIC CONROL, VOL??, NO??, MONH?? 004 Control Over Noisy Channels Sekhar atikonda, Member, IEEE, and Sanjoy Mitter, Fellow, IEEE, Abstract Communication is an important component of

More information

On Lossless Coding With Coded Side Information Daniel Marco, Member, IEEE, and Michelle Effros, Fellow, IEEE

On Lossless Coding With Coded Side Information Daniel Marco, Member, IEEE, and Michelle Effros, Fellow, IEEE 3284 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009 On Lossless Coding With Coded Side Information Daniel Marco, Member, IEEE, Michelle Effros, Fellow, IEEE Abstract This paper considers

More information

One-Bit LDPC Message Passing Decoding Based on Maximization of Mutual Information

One-Bit LDPC Message Passing Decoding Based on Maximization of Mutual Information One-Bit LDPC Message Passing Decoding Based on Maximization of Mutual Information ZOU Sheng and Brian M. Kurkoski kurkoski@ice.uec.ac.jp University of Electro-Communications Tokyo, Japan University of

More information

2012 IEEE International Symposium on Information Theory Proceedings

2012 IEEE International Symposium on Information Theory Proceedings Decoding of Cyclic Codes over Symbol-Pair Read Channels Eitan Yaakobi, Jehoshua Bruck, and Paul H Siegel Electrical Engineering Department, California Institute of Technology, Pasadena, CA 9115, USA Electrical

More information

ECE Information theory Final (Fall 2008)

ECE Information theory Final (Fall 2008) ECE 776 - Information theory Final (Fall 2008) Q.1. (1 point) Consider the following bursty transmission scheme for a Gaussian channel with noise power N and average power constraint P (i.e., 1/n X n i=1

More information

The Gallager Converse

The Gallager Converse The Gallager Converse Abbas El Gamal Director, Information Systems Laboratory Department of Electrical Engineering Stanford University Gallager s 75th Birthday 1 Information Theoretic Limits Establishing

More information

Multiaccess Channels with State Known to One Encoder: A Case of Degraded Message Sets

Multiaccess Channels with State Known to One Encoder: A Case of Degraded Message Sets Multiaccess Channels with State Known to One Encoder: A Case of Degraded Message Sets Shivaprasad Kotagiri and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame,

More information

On Weight Enumerators and MacWilliams Identity for Convolutional Codes

On Weight Enumerators and MacWilliams Identity for Convolutional Codes On Weight Enumerators and MacWilliams Identity for Convolutional Codes Irina E Bocharova 1, Florian Hug, Rolf Johannesson, and Boris D Kudryashov 1 1 Dept of Information Systems St Petersburg Univ of Information

More information

5958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 12, DECEMBER 2010

5958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 12, DECEMBER 2010 5958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 12, DECEMBER 2010 Capacity Theorems for Discrete, Finite-State Broadcast Channels With Feedback and Unidirectional Receiver Cooperation Ron Dabora

More information

Estimation-Theoretic Representation of Mutual Information

Estimation-Theoretic Representation of Mutual Information Estimation-Theoretic Representation of Mutual Information Daniel P. Palomar and Sergio Verdú Department of Electrical Engineering Princeton University Engineering Quadrangle, Princeton, NJ 08544, USA {danielp,verdu}@princeton.edu

More information

SHANNON [34] showed that the capacity of a memoryless. A Coding Theorem for a Class of Stationary Channels With Feedback Young-Han Kim, Member, IEEE

SHANNON [34] showed that the capacity of a memoryless. A Coding Theorem for a Class of Stationary Channels With Feedback Young-Han Kim, Member, IEEE 1488 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 54, NO 4, APRIL 2008 A Coding Theorem for a Cls of Stationary Channels With Feedback Young-Han Kim, Member, IEEE Abstract A coding theorem is proved for

More information