Noisy Constrained Capacity for BSC Channels

Size: px

Start display at page:

Download "Noisy Constrained Capacity for BSC Channels"

Daniel Chapman
6 years ago
Views:

1 Noisy Constrained Capacity for BSC Channels Philippe Jacquet INRIA Rocquencourt 7853 Le Chesnay Cedex France Wojciech Szpankowski Department of Computer Science Purdue University W. Lafayette, IN U.S.A. Abstract We study the classical problem of noisy constrained capacity in the case of the binary symmetric channel (BSC), namely, the capacity of a BSC whose input is a sequence from a constrained set. As stated in [4]... while calculation of the noise-free capacity of constrained sequences is well known, the computation of the capacity of a constraint in the presence of noise... has been an unsolved problem in the half-century since Shannon s landmark paper.... We first express the constrained capacity of a binary symmetric channel with (d, k)-constrained input as a limit of the top Lyapunov exponents of certain matrix random processes. Then, we compute asymptotic approximations of the noisy constrained capacity for cases where the noise parameter ε is small. In particular, we show that when k 2d, the error term (excess of capacity beyond the noise-free capacity) is O(ε), whereas it is O(ε log ε) when k > 2d. In both cases, we compute the coefficient of the error term. In the course of establishing these findings, we also extend our previous results on the entropy of a hidden Markov process to higher-order finite memory processes. These conclusions are proved by a combination of analytic and combinatorial methods. I. INTRODUCTION We consider a binary symmetric channel (BSC) with crossover probability ε, and a constrained set of inputs. More precisely, let S n denote the set of binary sequences of length n satisfying a given (d, k)-rll constraint [8], i.e., no sequence in S n contains a run of zeros of length shorter than d or longer than k (we assume that the values d and k, d k, are understood from the context). We write n S n for n =... n. Furthermore, we denote S = n>0 S n. We assume that the input to the channel is a stationary process = { k } k supported on S. We regard the BSC channel as emitting a Bernoulli noise sequence E = {E k } k, independent of, with P(E i = ) = ε. The channel output is Z i = i E i. where denotes addition modulo 2 (exclusive-or). For ease of notation, we identify the BSC channel with its parameter ε. Let C(ε) denote conventional BSC channel capacity (over unconstrained binary sequences), namely, C(ε) = H(ε), where H(ε) = ε logε ( ε)log( ε). We use natural logarithms throughout. Entropies are correspondingly Preliminary version of this paper was presented at ISIT, Nice, Work of W. Szpankowski was supported in part by NSF STC Grant CCF , NSF Grants DMS , and CCF , NSA Grant H , the AFOSR Grant FA , and Humboldt Foundation. measured in nats. The entropy of a random variable or process will be denoted H( n ), and the entropy rate by H(). The noisy constrained capacity C(S, ε) is defined [4] by C(S, ε) = sup I(; Z) = lim S n n sup I( n, Z n ), () n Sn where I(; Z) is the mutual information and the supreme are over all stationary processes supported on S and S n, respectively. The noiseless capacity of the constraint is C(S) = C(S, 0). This quantity has been extensively studied, and several interpretations and methods for its explicit derivation are known (see, e.g., [8] and extensive bibliography therein). As for C(S, ε), the best results in the literature have been in the form of bounds and numerical simulations based on producing random (and, hopefully, typical) channel output sequences (see, e.g., [26], [23], [] and references therein). These methods allow for fairly precise numerical approximations of the capacity for given constraints and channel parameters. Our approach to the noisy constrained capacity C(S, ε) is different. We first consider the corresponding mutual information, I(; Z) = H(Z) H(Z ). (2) Since H(Z ) = H(ε), the problem reduces to finding H(Z), the entropy rate of the output process. If we restrict our attention to constrained processes that are generated by Markov sources, the output process Z can be regarded as a hidden Markov process (HMP), and the problem of computing I(; Z) reduces to that of computing the entropy rate of this HMP. The noisy constrained capacity follows provided we find the imizing distribution P of. It is well known (see, e.g., [8]) that we can regard the (d, k) constraint as the output of a kth-order finite memory (Markov) stationary process, uniquely defined by conditional probabilities P(x t x t t k ), where for any sequence {x i} i, we denote by x j i, j i, the sub-sequence x i, x i+,...,x j. For nontrivial constraints, some of these conditional probabilities must be set to zero in order to enforce the constraint (for example, the probability of a zero after seeing k consecutive zeros, or of a one after seeing less than d consecutive zeros). When the remaining free probabilities are assigned so that the entropy of the process is imized, we say that the process is entropic, and we denote the imizing distribution by

2 P. The noiseless capacity C(S) is equal to the entropy of P [8]. The Shannon entropy (or, simply, entropy) of a HMP was studied as early as [2], where the analysis suggests the intrinsic complexity of the HMP entropy as a function of the process parameters. Blackwell [2] showed an expression of the entropy in terms of a measure Q, obtained by solving an integral equation dependent on the parameters of the process. The measure is hard to extract from the equation in any explicit way. Recently, we have seen a resurgence of interest in estimating HMP entropies [7], [8], [4], [9], [20], [27]. In particular, one recent approach is based on computing the coefficients of an asymptotic expansion of the entropy rate around certain values of the Markov and channel parameters. The first result along these lines was presented in [4], where the Taylor expansion around ε = 0 is studied for a binary HMP of order one. In particular, the first derivative of the entropy rate at ε = 0 is expressed very compactly as a Kullback-Liebler divergence between two distributions on binary triplets, derived from the marginals of the input process. It is also shown in [4], [5] that the entropy rate of a HMP can be expressed in terms of the top Lyapunov exponent of a random process of 2 2 matrices (cf. also [], where the capacity of certain channels with memory is also shown to be related to top Lyapunov exponents). Further improvements, and new methods for the asymptotic expansion approach were obtained in [9], [27], and [8]. In [20] the authors express the entropy rate for a binary HMP where one of the transition probabilities is equal to zero as an asymptotic expansion including a O(ε log ε) term. As we shall see in the sequel, this case is related to the (, ) (or the equivalent (0, )) RLL constraint. Analyticity of the entropy as a function of ε was studied in [7]. In Section II of this paper we extend the results of [4], [5] on HMP entropy to higher order Markov processes. We show that the entropy of a rth-order HMP can be expressed as the top Lyapunov exponent of a random process of matrices of dimensions 2 r 2 r (cf. Theorem ). As an additional result of this work, of interest on its own, we derive the asymptotic expansion of the HMP entropy rate around ε = 0 for the case where all transition probabilities are positive (cf. Theorem 2). In particular, we derive an expression for the first derivative of the entropy rate as the Kullback-Liebler divergence between two distributions on 2r+-tuples, again generalizing the formula for r= [4].The results of Section II are applied, in Section III, to express the noisy constrained capacity as a limit of top Lyapunov exponents of certain matrix processes. These exponents, however, are notoriously difficult to compute [25]. Hence, as in the case of the entropy of HMPs, it is interesting to study asymptotic expansions of the noisy constrained capacity. In Section III-B, we study the asymptotics of the noisy constrained capacity, and we show that for (d, k) constraints with k 2d, we have C(S, ε) = C(S) + K ε + O(ε 2 log ε), where K is a well characterized constant. On the other hand, when k > 2d, we have C(S, ε) = C(S)+L ε log ε+o(ε), where, again, L is an explicit constant. The latter case covers the (0, ) constraint (and also the equivalent (, ) constraint). Our formula for the constant L in this case is consistent with the one derived from the results of [20]. Preliminary results of this paper were presented in [6]. We also remark that recently Han and Marcus [9], [0] reached similar conclusions and obtained some generalizations using different methodology. II. ENTROPY OF HIGHER ORDER HMPS Let = { i } i be an rth-order stationary finite memory (Markov) process over a binary alphabet A={0, }. The process is defined by the set of conditional probabilities P( t = t r t = a r ), a r A r. The process is equivalently interpreted as the Markov chain of its states s t = t r t, t > 0 (we assume r+ 0 is defined and distributed according to the stationary distribution of the process). Clearly, a transition from a state u A r to a state v A r can have positive probability only if u and v satisfy u r 2=v r, in which case we say that (u, v) is an overlapping pair. The noise process E = {E i } i is Bernoulli (binary i.i.d.), independent of, with P(E i =) = ε. Finally, the HMP is Z={Z i } i, Z i = i E i, i. (3) Let Zi = (Z i, Z i+,...,z i+r ) and Ẽ i = (E i,..., E i+r ). Also, for e {0, }, let Ẽi e = (e, E i,...,e i+r ). We next compute 2 the probability of Z n := Zn+r. From the definitions of and E, we have P( Z n, Ẽn) = e AP( Z n, Ẽn, E n = e) (4) = n P( Z, Z n+r, E n = e, Ẽn) e A = n P(Z n+r, E n+r Z, Ẽe n n )P( Z, Ẽe n ) e A = P(E n+r )P ( Z n Ẽn Z n Ẽe n n )P( Z, Ẽe n ). e A Observe that in the last line the transition probabilities P ( ) are with respect to the original Markov chain. We next derive, from (4), an expression for P( Z n ) as a product of matrices extending our earlier work [4], [5]. In what follows, vectors are of dimension 2 r or r, and matrices are of dimensions 2 r 2 r. We denote row vectors by bold lowercase letters, matrices by bold uppercase letters, and we let = [,...,]; superscript t denotes transposition. Entries in vectors and matrices are indexed by vectors in A r. Let for a i A r p n = [P( Z n, Ẽn=a ), P( Z n, Ẽn=a 2 )...P( Z n, Ẽn=a 2 r)] be a vector of dimension 2 r, and let M( Z n Z n ) be a 2 r 2 r matrix defined as follows: if (e n,e n ) A r A r is an We generally use the term finite memory process for the first interpretation, and Markov chain for the second. 2 In general, the measures governing probability expressions will be clear from the context. In cases when confusion is possible, we will explicitly indicate the measure, e.g., P.

3 overlapping pair, then the entry (e n,e n ) of M( Z n Z n ) is M en,e n ( Z n Z n ) = P ( Z n e n Z n e n )P(Ẽn=e n ). (5) All other entries are zero. Clearly, M( Z n Z n ) is a random matrix, drawn from a set of 2 r+ possible realizations. With these definitions, it follows from (4) that p n = p n M( Z n Z n ). (6) Since P( Z n ) = p n t = e A r P( Z n, Ẽn = e), after iterating (6), we obtain P( Z n ) = p M( Z 2 Z ) M( Z n Z n ) t. (7) The joint distribution P(Z n ) of the HMP, presented in (7), has the form p A n t, where A n is the product of the first n random matrices of the process M = M( Z 2 Z ),M( Z 3 Z 2 ),...,M( Z n Z n ),... (8) Applying a subadditive ergodic theorem, and noting that p A n t is a matrix norm of A n (indeed, both p and t are positive and A is nonnegative, hence p A n t satisfies the conditions for a matrix norm, as already observed in []) is readily proved that n E[ logp Z (Z n )] must converge to a constant ξ known as the top Lyapunov exponent of the random process M (cf. [5], [2], [25]). This leads to the following theorem. Theorem : The entropy rate of the HMP Z of (3) satisfies [ H(Z) = lim E ] n n log P Z(Z n+r ) [ ( = lim n n E log p M( Z 2 Z ) M( Z n Z n ) t)] =ξ, where ξ is the top Lyapunov exponent of the process M of (8). Theorem and its derivation generalize the results, for r =, of [4], [5], [27], [28]. It is known that computing top Lyapunov exponents is hard (maybe infeasible), as shown in [25]. Therefore, we shift our attention to asymptotic approximations. We consider the entropy rate H(Z) for the HMP Z as a function of ε for small ε. In order to derive expressions for the entropy rate, we resort to the following formal definition (which was also used in entropy computations in [3] and [5]): R n (s, ε) = PZ s (zn ), (9) z n An where s is a real (or complex) variable, and the summation is over all binary n-tuples. In the sequel, we write P Z (z n ) for P(z n) to distinguish it from P (z n ). It is readily verified that H(Z n ) = E [ logp Z(Z n )] = s R n(s, ε). (0) s= The entropy of the underlying Markov sequence is H( n )= s R n(s, 0). s= Furthermore, let P = [p ei,e j ] ei,e j Ar be the transition matrix of the underlying rth order Markov chain, and let π = [π e ] e A r be the corresponding stationary distribution. Define also P(s) = [p s e i,e j ] ei,e j A r and π(s) = [πs e] e A r. Then R n (s, 0) = z n P s (zn ) = π(s)p(s)n t () which is entirely defined on the underlying Markov process. Using a formal Taylor expansion near ε = 0, we write R n (s, ε) = R n (s, 0) + ε ε R n(s, ε) + O(g(n)ε 2 ), (2) ε=0 where g(n) is the second derivative of R n (s, ε) with respect to ε, computed at some ε, provided these derivatives exist (the dependence on n stems from (9)). Using analyticity at ε = 0 (cf. [7], [5]), we find H(Z n ) = H(n ) ε 2 sε R n(s, ε) + O(g(n)ε 2 ) = H( n ) ε s ε z n ε=0, s= P s Z (zn ) ε=0, s= + O(g(n)ε 2 ). (3) To compute the linear term in the Taylor expansion (3), we differentiate with respect to s, and evaluate at s =. Proceeding in analogy to the derivation in [4], we obtain the following result basically proved in [5], so we omit details here. Theorem 2: If the conditional symbol probabilities in the finite memory (Markov) process satisfy P(a r+ a r ) > 0 for all a r+ A r+, then the entropy rate of Z for small ε is H(Z) = lim n n H n(z n ) = H()+f (P )ε+o(ε 2 ), (4) where, denoting by z i the Boolean complement of z i, and ž 2r+ =z... z r z r+ z r+2... z 2r+, we have f (P ) = P (z 2r+ )log P (z 2r+ P (ž 2r+ z 2r+ ) ) = D ( P (z 2r+ ) P (ž 2r+ ) ). (5) Here, D( ) is the Kullback-Liebler divergence, applied here to distributions on A 2r+ derived from the marginals of. A question arises about the asymptotic expansion of the entropy H(Z) when some of the conditional probabilities are zero. Clearly, when some transition probabilities are zero, then certain sequences x n are not reachable by the Markov process, which provides the link to constrained sequences. Example. Consider a Markov chain with the following transition probabilities [ ] p p P = (6) 0

4 where 0 p. This process generates sequences satisfying the (, ) constraint (or, under a different interpretation of rows and columns, the equivalent (0, ) constraint). The output sequence Z, however, will generally not satisfy the constraint. The probability of the constraint-violating sequences at the output of the channel is a polynomial in ε, which will generally contribute a term O(ε log ε) to the entropy rate H(Z) when ε is small. This was already observed in [20] for the transition matrix P as in (6), where it was shown that as ε 0. H(Z) = H() p(2 p) ε logε + O(ε) (7) + p In this paper, in Section IV and Appendix A we prove the following generalization of Theorem 2 for (d, k) sequences. Theorem 3: Let Z be a HMM. Then, in general H(Z) = H() f 0 (P )ε log ε + f (P )ε + O(ε 2 log ε) (8) for some f 0 (P ) and f (P ). Observe that if all transition probabilities of are positive, then f 0 (P ) = 0 and the coefficient f (P ) at ε is presented in Theorem 2. The coefficient f 0 (P ) is derived in Section IV for HMM representing a (d, k) sequences, and for the imizing distribution is presented in Theorems 5 and 6. We should point out, that recently Han and Marcus [9] showed that in general for any HMM H(Z) = H() f 0 (P )ε log ε + O(ε) which is further generalized in [0] to H(Z) = H() f 0 (P )ε log ε + f (P)ε + O(ε 2 log ε) when at least one of the transition probabilities in the Markov chain is zero. III. CAPACITY OF THE NOISY CONSTRAINED SYSTEM We now apply the results on HMPs to the problem of noisy constrained capacity. A. Capacity as a Lyapunov Exponent Recall that I(; Z) = H(Z) H(ε) and, by Theorem, when is a Markov process, we have H(Z) = ξ(p ) where ξ(p ) is the top Lyapunov exponent of the process {M( Z i Z i )} i>0. In [3] it is proved that the process optimizing the mutual information can be approached by a sequence of Markov representations of increasing order. Therefore, as a direct consequence of this fact and Theorem we conclude the following. Theorem 4: The noisy constrained capacity C(S, ε) for a (d, k) constraint through a BSC channel of parameter ε is given by C(S, ε) = lim r sup P (r) ξ(p (r) ) H(ε) (9) where P (r) denotes the probability law of an rth-order Markov process generating the (d, k) constraint S. Clearly, estimating the top Lyapunov exponents in (9) is computationally prohibitively expensive, if possible. Therefore, we next turn our attention to asymptotic expansions of C(S, ε) near ε = 0. B. Asymptotic Behavior A nontrivial constraint will necessarily have some zerovalued conditional probabilities. Therefore, the associated HMP will not be covered by Theorem 2, but rather by Theorem 3. For (d, k) sequences we have H(Z) = H() f 0 (P )ε log ε + f (P )ε + O(ε 2 log ε) (20) for some f 0 (P ) and f (P ) where P is the underlying Markov process. Since H(ε) = ε log ε+ε O(ε 2 ) for small ε, we obtain C(S, ε) = sup H(Z) ε log ε + ε O(ε 2 ). S We need to find the imizing distribution to estimate the capacity. We shall prove in Section IV.D that this imizing distribution is actually the entropic distribution P (imizing the entropy of the underlying Markov process). However, this introduces additional error term O(ε 2 log 2 ε) which exceeds the error term O(ε 2 log ε) of the entropy estimation in Theorem 3 The same result was established by Han and Marcus [9], [0] using different methodology. In summary, we are led to C(S, ε) = C(S) ( f 0 (P ))ε log ε+(f (P ) )ε + O(ε 2 log 2 ε) (2) where C(S) is the capacity of noiseless RLL system. Various methods exist to derive C(S) [8]. In particular, one can write [8], [24] C(S) = log ρ 0, where ρ 0 is the smallest real root of k ρ l+ 0 =. (22) Our goal is to derive explicit expressions for f 0 (P ) and f (P ) for (d, k) sequences. For example, we will show in Theorem 5 below that for some RLL constraints, we have f 0 (P ) = in (2), hence the noisy constrained capacity is of the form C(S, ε) = C(S) + O(ε). In Theorem 6 below we derive also f (P ). We apply the same approach as in previous section, that is, we use the auxiliary function R n (s, ε) defined in (9). To start, we find a simpler expression for P Z (z n ). Summing over the number of errors introduced by the channel, we find n P Z (z n ) = P (x n )( ε) n + ε( ε) n P (x n e i ) plus the error term O(ε 2 ) (resulting from two or more errors), where e j = (0,..., 0,, 0,..., 0) A n with a at position j. Let B n A n denote the set of sequence z n at Hamming distance one from S n, and C n = A n \ (S n B n ). Notice that i=

5 sequences in C n are at distance at least two from S n, and contribute the O(ε 2 ) term. From the above, we conclude R n (s, ε) = P Z (z n )s (23) z nan = P Z (z n )s + P Z (z n )s + P Z (z n )s. z n Sn z n Bn z n Cn We observe that z n Bn P Z (z n )s = O(ε s ), as ε 0. Defining φ n (s) = Q n (s) = z n Cn P Z (z n )s = O(ε 2s ) n P (z n ) s P (z n ) z n Sn i= ( n ) s P (z n e i) z n Bn i= we arrive, after some algebra, at the following expression for R n (s, ε) R n (s, ε) = ( ε) ns R n (s, 0) + ε( ε) ns φ n (s) (24) Notice that + ε s ( ε) (n )s Q n (s) + O(ε 2 + ε +s + ε 2s ). φ n () + Q n () = z n We now derive n P (z n e i ) = n. i= H(Z n ) = s R n(, ε) using the fact that R n (, ε) =. Since all the functions involved are analytic, we obtain from (24) H(Z n ) = H( n )( nε) + nε ε(φ n () + φ n()) ε log εq n () εq n () + O(nε2 log ε), (25) where the error term is derived in Appendix A. In the above, φ n() and Q n() are, respectively, the derivative of φ n (s) and Q n (s) at s =. Notice also that the term nh( n )ε of order n 2 ε is cancelled by (φ n ()+Q n ())ε = (H(n )n + O(n))ε and only nε term remains (see the next section for details). The case k 2d is interesting: one-bit flip in a (d, k) sequence is guaranteed to violate the constraint, and consequently z n S n and i: P (z n e i) = 0. Therefore φ n (s) = 0 in this case, leaving Q n () = n. Thus, in the case k 2d, we have f 0 (P) =, and the term O(ε log ε) in (2) cancels out. Further considerations are required to compute Q n () and obtain the coefficient of ε in (25). Here, we provide the necessary definitions, and state our result that are proved in Section IV. Ignoring border effects (which do not affect asymptotics, as easy to see 3 ), we restrict our analysis to (d, k) sequences over the extended alphabet (of phrases) [8] B = { 0 d, 0 d+,...,0 k }. In other words, we consider only (d, k) sequences that end with a. For such sequences, we assume that they are generated by a memoryless process over the super-alphabet. This is further discussed in Section IV. Let p l denote the probability of the super-symbol 0 l. We stress the fact that p l differs from the probability that l + consecutive symbols equals 0 l. In fact it is equal to the probability that two consecutive returns to symbol are separated by exactly l zeros. Therefore p l = P ( i+l+ i+ = 0 l i = ), i >, d l k. (26) In Appendix B we prove that the entropic distribution P corresponds to the case with ρ 0 as in (22). p l = ρ l+ 0, (27) Example 2. Consider again the Markov process discussed in Example with the transition matrix (6). This represents (, ) constraint system. Observe that p l = P (0 l ) = P(0 )P l (0 0)P( 0) = p( p) l. The imum entropy distribution is achieved for p = /ϕ 2 where ϕ = ( + 5)/2.; also ρ 0 = /ϕ. In this case p = ρ 2 0 and p = ρ 0, thus p l = ρ l+ 0. The expected length of a super-symbol in B is λ = k (l + )p l. We also introduce the generating function r(s, z) = l p s l zl+. By ρ(s) we denote the smallest root in z of r(s, z) =, that is r(s, ρ(s)) =. Clearly, ρ() = and ρ l () = p l log p l λ is the entropy rate per bit of the super-alphabet, and ρ () = H(). Furthermore, we define λ(s) = z r(s, z) z=ρ(s) and notice that λ() = λ. Finally, to present succinctly our results, we introduce some additional notation. Let α(s, z) = l (2d l)p s lz l+. For integers l, l 2, d l, l 2 k, let I l,l 2 denote the interval I l,l 2 = {l: min + {l d, k l 2 } l min + {l 2 d, k l }}, 3 Indeed, in general a (d, k) sequence may have at most k starting and ending zeros of total length n + O() that cannot affect the entropy rate.

6 where min + {a, b} = {min{a, b}, 0}. We shall write Il,l 2 = I l,l 2 \ {0}. At last, we define τ(s, z) = τ (s, z) + τ 2 (s, z) + τ 3 (s, z) where τ (s, z) = 2 {0, l + l 2 k d}p s l p s l 2 z l2+l2+2 τ 2 (s, z) = τ 3 (s, z) = l =d l 2=d θ Il,l 2 θ I l,l 2 p l+θp l2 θ z l2+l2+2 Now we are in a position to present our main results. The proofs are delayed till the next section. The following theorem summarizes our findings for the case k 2d. Theorem 5: Consider a (d, k) constrained system with k 2d. Then, C(S, ε) = C(S) ( f 0 (P ))ε + O(ε2 log 2 ε), where for p l = ρ l+ 0 f 0 (P ) = log λ + () 2λ λ + s τ(, ) + sα(, ) λ + ρ ()( 2 2 α(, ) + τ(, )) sz sz + ρ () λ ( z α(, ) + τ(, )) z for ε 0 and λ(s), α(s, z) and τ(s, z) are defined above. In the complementary case k > 2d, the term φ n (s) in (23) does not vanish, and thus the O(ε log ε) term in (2) is generally nonzero. For this case, using techniques similar to the ones leading to Theorem 5, we obtain the following result. Theorem 6: Consider the constrained (d, k) system with k 2d. Define γ = (l 2d)p l, δ = p l p l2, l>2d d l +l 2+ k and λ = k = (l + )p l. Then for p l = ρ l+ 0 C(S, ε) = C(S) ( f 0 (P ))εlog ε + O(ε), (28) where for ε 0. and δ =. Thus, f 0 (P ) = γ + δ p(p 2) = λ p, consistent with the calculation of the same quantity in [20]. The noisy constrained capacity is obtained when P = P, d l,l 2 k i.e., p = /ϕ k k 2, where ϕ = (+ 5)/2, the golden ratio. Then, 2 (p l p l2 + p l+θp l2 θ) s z l2+l2+2 f 0 (P ) = / 5, and by Theorem 6 C(S, ε) = log ϕ ( / 5)ε log(/ε) + O(ε) k k for ε 0. 2 min{k, l + l 2 d} (l + l 2 ) + l =d l 2=d IV. ANALYSIS s f 0 (P ) = γ + δ λ Example 3. We consider the (, ) constraint with transition matrix P as in (6). Computing the quantities called for in Theorem 6 for d = and k =, we obtain p l = ( p) l p as in Example 2, and λ = + p p ( p)2, γ =, p In this section, we derive explicit expression for the coefficients f 0 (P ) and f (P ) of Theorem 3, as well as f 0 (P ) and f (P ) of Theorems 5 and 6. We also establish the error term in Theorem 5. Throughout, we consider the super-alphabet approach. Recall that a super-symbol is a text 0 l for d l k which is drawn from a memoryless source. This model is equivalent to a Markov process with renewals at symbols. To simplify the analysis, we first consider sequences of length n consisting only of (full) super symbols. We call such sequence a reduced (d, k) sequence of length n. Let x n be a reduced (d, k)-sequence of length n made of m super-symbols: x n = 0 0l l2...0 lm. In the sequel, for reduced (d, k) sequences of length n, we define P(x n ) = m p li. Notice that P(x n ) = 0 if xn is not a reduced (d, k) sequence (i.e., it doesn t end on a ). We also notice that quantities P(x n ) do not form a probability distribution over the reduced (d, k)-sequence of length n because they don t sum to. In view of this we define where i= P,n (x n ) = P(x n ) P n P n = x n P(x n ). Note that P n is the probability that the n-th symbol is exactly a (in other words, x n is built from a finite number of super symbols). We observe that P,n (x n ) is not the original probability distribution P (x n ) generated by a memoryless source of super-symbols. In fact, it is no longer memoryless, but it converges to it in a way that allows us to cope with truncation problems. The later does not effect the asymptotic value of the entropy rate. Recalling the definition r(s, z) = l ps l zl+, we find P n z n = r(, z). n Indeed, every reduced (d, k) sequence consists of an empty string, one super symbol, two super symbols or more, thus

7 n P nz n = k rk (, z) = /( r(, z)) (cf. [24]). By the Cauchy formula [24] we obtain P n = dz 2πi r(, z) z n+ = z r(, ) + O(µ n ) = λ + O(µ n ) for some µ >, since is the largest root of = r(, z) and z r(, ) = λ = k (l + )p l. Let Sm be the set of (d, k) reduced sequences made of exactly m super-symbols with no restriction on its length. Let S = S m m. Let B be the set of sequences that are exactly at Hamming distance from a sequence in S.By our convention, if x S m for some m, (i.e. if x = 0 l 0 l2...0 lm ), then P(x) = i=m i= p l; otherwise P(x) = 0). We denote by L(x) the length of x. We call such a model the variable-length model. To derive H(Z n ) presented in (25) we need to evaluate φ n () and Q n (). We estimate these quantities in the variablelength model as described above and then re-interpret them in the original model. Define which we re-write as φ(s, z) = n Q(s, z) = n φ(s, z) = Q(s, z) = x S x B L(x) P s (x) i= P s n φ n(s)z n, (29) P s n Q n(s)z n (30) i= P(x e i )z L(x), s L(x) P(x e i ) z L(x). We notice that φ(, z) + Q(, z) = L(x)z L(x) = z z r(, z). x S We can also write where φ m (s, z) = Q m (s, z) = φ(s, z) = m Q(s, z) = m x S m P s (x) x S m j=l(x) j= i=l(x) i= φ m (s, z), (3) Q m (s, z), (32) L(x) i= P(x e i )z L(x), x ej / S B(x e j ) S P(x e j e i ) s z L(x) (33) where B(y) is the set of sequences that are at Hamming distance from sequence y. The factor y/ is there to S enforce that y should not be in S, and therefore is in B (since it is at Hamming distance from S ). The division by B(y) S is to ensure that we do not over-count: it expresses the number of ways y can be reached from S. We next evaluate φ(s, z) and Q(s, z). A. Computation of φ m (s, z) The case k 2d is easy since x e j / S m when x S m. Thus φ m (s, z) = 0. In the sequel we concentrate on k > 2d. Theorem 7: For reduced (d, k) sequences consisting of m super symbols, we have φ m (s, z) = mb (s, z)r m (s, z)+(m )b 2 (s, z)r m 2 (s, z), where b (s, z) = b 2 (s, z) = In particular, k p s l d l +l 2 k b (, ) = l p j p l j z l+, j= p s l p s l 2 p l+l 2+z l+l2+2. l=k p j p l j, b 2 (, ) = {0, l 2d}p l. l Proof. We need to consider two cases: one in which the error changes a 0 to a, and the other one when the error occurs on a. In the first case, m super symbols are not changed and each contributes r(s, z). The corrupted super symbol is divided into two and its contribution is summarized in b (s, z). In the second case, an ending is changed into a 0 so two super symbols (except the last one) collapsed into a one super symbol. This contribution is summarized by b 2 (s, z) while the other m 2 super symbols, represented by r(s, z) are unchanged. B. Computation of Qm (s, z) We recall the following definitions. For integers l, l 2, d l, l 2 k, let I l,l 2 denote the interval I l,l 2 = {l: min + {l d, k l 2 } l min + {l 2 d, k l }}, where min + {a, b} = {min{a, b}, 0}. We shall write I l,l 2 = I l,l 2 \ {0}. Theorem 8: For reduced (d, k) sequences consisting of m super symbols, the following holds Q m (s, z) = mα(s, z)r m (s, z) + (m )τ(s, z)r m 2 (s, z) where α(s, z) = l j {0, 2d l}p s lz l+

8 and τ(s, z) = τ (s, z) + τ 2 (s, z) + τ 3 (s, z) where τ (s, z) = τ 2 (s, z) = τ 3 (s, z) = k l =d l 2=d k ({0, d(l ) + l 2 k} + {0, d(l 2 ) + l k})p s l p s l 2 z l+l2+2, k k θ d (p l p l2 p l+θp l2 θ) s z l+l2+2 2 l =d l 2=d θ Il,l 2 k k l =d l 2=d l+l 2+>k 2 min{k, l + l 2 d} (l + l 2 ) + θ I l,l 2 p l+θp l2 θ s z l+l2+2, with d(l) = min{d, l d}. In particular, for k 2d we have the following simplifications: α(s, z) = (2d l)p s l zl+, l and τ (s, z) = l,l 2 2 {0, l + l 2 k d}p s l p s l 2 z l2+l2+2, τ 2 (s, z) = τ 3 (s, z) = k k l =d l 2=d θ Il,l 2 k k 2 min{k, l + l 2 d} (l + l 2 ) + l =d l 2=d s z l2+l2+2. θ I l,l 2 p l+θp l2 θ Proof. As in the previous proof, the main idea is to enumerate all possible ways a sequence x leaves the status of (d, k) after a bit corruption and returns to (d, k) status after a second bit corruption. In other words, x S, x e j / S, and x e j e i S. We often refer to representation (33) in the proof. We consider several cases: a) Property : Let x be a single super-symbol: x = 0 l. Consider now x e j. First, suppose l 2d and the error e j falls on a zero of x. If e j falls on a zero between l d and d, then 0 l e j = 0 l 0 l2, and at least one of l, l 2 is smaller than d. Therefore, x e j is not a (d, k) sequence. The only way e i can produce a (d, k) sequence is when it is equal to e j : B(x e j ) S =. Assume now l > 2d. If e j falls at distance greater than d from both ends, then x e j S and does not leave S. b) Property 2: If the error e j falls on a symbol 0 l in x = 0 l 0 l2, on the last min{d, l d} zeros, then with θ min{d, l 2 d} 0 l 0 l2 e j = 0 l θ 0 θ 0 l2, and x / S. We have: if it falls also on the last min{d, l d, k l 2 } zeros, i.e θ min{d, l d, k l 2 }, then the only e i that moves x e i e j back a (d, k) sequence is either e j = e i or e j such that it falls on the of 0 l, and B(x e j ) S = 2, otherwise, the only acceptable j is i, so that B(x e j ) S = and x e j / S. c) Property 2bis: If the error e j in x = 0 l 0 l2 falls on the first min{d, l 2 d} zeros of 0 l2, then if it falls also on the first min{d, l 2 d, k l } zeros, then the only e j that moves x e i e j back a (d, k) sequence is either e j = e i or e j such that it falls on the of 0 l, and B(x e j ) S = 2, otherwise, the only acceptable j is i so that B(x e j ) S = and x e j / S. d) Property 3: We still consider x = 0 l 0 l2. If the error falls on the of 0 l, then the only e j that moves x e j e i back (d, k) sequences are those that either fall back on the, or on the min{l 2 d, k l } first zeros of 0 l2, or on the min{l d, k l 2 } last zeros of )0 l, and then B(x e j ) S = + min{l d, k l 2 } + min{l 2 d, k l } = + 2 min{k, l + l 2 d} l l 2. 2 (p l p l2 + p l+θp l2 θ) s z l2+l2+2 Clearly,, then we must have l + l 2 + > k in order x e j / S m. Given these four properties we can define the following quantities α(s, z) = l {0, 2d l}p s l zl+ and τ(s, z) = τ (s, z)+τ 2 (s, z)+τ 3 (s, z) with the convention that α(s, z) corresponds to Property, τ (s, z) to Property 2 and 2bis (second bullet), τ 2 (s, z) to Property 2 and Property 2bis (first bullet), τ 3 (s, z) to Property 3. This completes the proof. C. Asymptotic analysis Finally, we can re-interpret our results for reduced (d, k) sequences of the variable-length model in terms of the original (d, k) sequences of the fixed length. Our aim is to provide an asymptotic evaluation of φ n (), Q n (), φ n() and Q n() as n. To this end, we will present an asymptotic evaluation of φ n (s) and Q n (s). Using Theorems 7 and 8 we easily conclude φ(s, z) = Q(s, z) = m m φ m (s, z) = b (s, z) + b 2 (s, z) ( r(s, z)) 2, Q m (s, z) = α(s, z) + τ(s, z) ( r(s, z)) 2. Then by Cauchy formula applied to (29) and (30) Pn s φ n(s, z) = φ(s, z) dz 2iπ z n+, PnQ s n (s, z) = Q(s, z) dz 2iπ z n+.

9 A simple application of the residue analysis leads to Pn s φ n(s) = ρ n (s) λ(s) 2 ((n + )(b (s, ρ(s)) + b 2 (s, ρ(s))) z b (s, ρ(s)) ) z b (s, ρ(s)) + O(µ n ), P s n Q n(s) = ρ n (s) λ(s) 2 ((n + )(α(s, ρ(s)) + τ(s, ρ(s))) z α(s, ρ(s)) z τ(s, ρ(s)) ) + O(µ n ). Since functions involved are analytic and uniformly bounded in s in a compact neighborhood, the asymptotic estimates of φ n() and Q n() can be easily derived. In summary, we find φ n() + Q n() = (n + )ρ ()(φ n () + Q n ()) + O(n) = nh( n ) + O(n), which cancels the coefficient nεh( n ) in the expansion of H(Z n ) in (25). More precisely, φ n () + Q n () = nh(n ) + n log λ () 2λ λ + n ( λ s b (, ) + s b 2(, ) + s α(, ) + τ(, ) ( s ρ 2 () sz b (, ) + 2 sz b 2(, ) 2 )) 2 + α(, ) + τ(, ) sz sz ( + n ρ () λ z b (, ) + z b 2(, ) + z α(, ) + z τ(, ) ) + O(). (34) The expression for f 0 (P ) in Theorem 5 follows directly from the expression (34) since the coefficient at ε is exactly nh( n ) + φ n() + Q n() + φ n () and φ n () = 0 when k 2d. The proof of Theorem 6 is even easier since f 0 (P ) = Q n() n We have from (34): = φ n() n. ( b (, ) + b 2 (, ) φ n () = n λ ). Observe that b (, ) exactly matches γ and b 2 (, ) matches δ in Theorem 6. D. Error Term in Theorem 5 To complete the proof of Theorem 5, we establish here that the dominating error term of the capacity C(S, ε) estimation is O(ε 2 log 2 ε). For this we need to show that the imizing distribution P (ε) of H(Z) introduces error of order O(ε 2 log 2 ε). Recall that P imizes H(). In Appendix A we show that H(Z) = O(log ε) ǫ uniformly in P. As a consequence H(Z) converges to H() uniformly in P as ε 0. We also prove in the Appendix that H(Z) = H()+f 0 (P )ε log ε+f (P )ε+g(p )O(ε 2 log ε), where the functions f 0, f and g of P are in C (all continuous and infinitely many differentiable functions). Let P (ε) be the distribution that imizes H(Z), hence the capacity C(S, ε). For α > 0 let K α be a compact set of distributions that are at topological distance smaller than or equal to α from P. Since H(Z) converges to H() uniformly, there exists ε > 0 such that ε < ε, ε > 0 we have P K α. Let now β = P K α {g(p )}. Clearly, β g(p ) as α 0. Let also and F(P, ε) = H() + f 0 (P )ε log ε + f (P )ε, F α (ε) = P K α {F(P, ε)}. The following inequality for ε < follows from our analysis in Appendix A F α (ε) + βε 2 log ε H(P (ε)) F α(ε) βε 2 log ε. We will prove here that Let F α (ε) = F(P, ε) + O(ε 2 log 2 ε). P P = arg{f(p, ε)}. We have F(, ε) = 0, where F denotes the gradient of F with respect to P. Defining dp = P P we find F( P, ε) = F(P, ε) + 2 F(P, ε)dp + O( dp 2 ) where 2 F is the second derivative matrix (i.e., Hessian) of F and v is the norm of vector v. Since F( P, ε) = 0 and H(P ) = 0, thus F(P, ε) = f 0 (P )ε log ε + f (P )ε. Denoting F 2 = 2 F(P ) and its inverse matrix as F we arrive at and F 2 dp = f 0 (P +O( dp 2 ), 2, )ε log ε + f (P )ε (35) dp = F2 ( f 0 (P )ε log ε + f (P )ε) +O( dp 2 ). (36)

10 This lead to dp = O(ε log ε) for sufficiently small ε such that dp α. Thus F( P, ε) = F(P, ε) + 2 dp F 2 dp + f 0 (P )dp ε log ε + f (P )dp ε + O( dp 3 ), plugging the expression of dp from (36) yields: F α (ε) = F( P, ε) = F(P, ε) 2 f 0(P ) F 2 f 0 (P )ε2 log 2 ε f 0 (P ) F 2 f (P )ε2 log ε 2 f (P ) F2 f (P )ε 2 + O(ε 3 log 3 ε). This completes the proof. V. CONCLUSION We study the capacity of the constrained BSC channel in which the input is a (d, k) sequence. After observing that a (d, k) sequence can be generated by a k-order Markov chain, we reduce the problem to estimating the entropy rate of the underlying hidden Markov process (HMM). In our previous paper [4], [5], we established that the entropy rate for a HMM process is equal to a Lyapunov exponent. After realizing that such an exponent is hard to compute, theoretically and numerically, we obtained an asymptotic expansion of the entropy rate when the error rate ε is small (cf. also [27]). In this paper, we extend previous results in several directions. First, we present asymptotic expansion of the HMM when some of the transition probabilities of the underlying Markov are zero. This adds additional term of order ε log ε to the asymptotic expansion. Then, we return to the noisy constrained capacity and prove that the exact capacity is related to supremum of Lyapunov exponents over increasing order Markov processes. Finally, for (d, k) sequences we obtain an asymptotic expansion for the noisy capacity when the noise ε 0. In particular, we prove that for k 2d the noisy capacity is equal to the noiseless capacity plus a term O(ε). In the case k > 2d, the correction term is O(ε log ε). We should point out that recently Han and Marcus [9], [0] reached similar conclusions (and obtained some generalizations) using quite different methodology. APPENDI A: PROOF OF THEOREM 3 In this Appendix we prove the error term in (8) in Theorem 3 using the methodology developed by us in [5]. We need to prove that for ε < /2 H(Z n ) = H( n )+nf (P )ε+nf 0 (P )ε logε+o(nε 2 log ε) (37) for some f (P ) and f 0 (P ). We start with H(Z n ) = H( n ) ε ǫ H(Zn ) + G n (38) and show at the end of this section that G n = O(nε 2 log ε). We first concentrate on proving that ǫ H(Zn ) = nf (P ) + nf 0 (P )log ε (39) for some f 0 (P ) and f (P ). We use equation (48) from [5] which we reproduce below ǫ P Z(z) = (P Z (z e i ) P Z (z)) 2ε i for any sequence z of length n (hereafter, we simply write x for x n and z for z n ). Consequently, ǫ H(Zn ) = 2ε that can be rewritten as ǫ H(Zn ) = 2ε (P Z (z e i ) P Z (z))log P Z (z) z i x i P Z (z)log P Z(z e i ). P Z (z) In order to estimate the ratio of P Z (z e i ) and P Z (z), we observe that P Z (z) = ( ε) ( ) dh(x,z) ε n P (x), ε x where d H (, x, z) is the Hamming distance between x and z. Similarly, P Z (z e i ) = ( ε) ( ) dh(x,z e ε i) n P (x). ε x The following inequality is easy to prove ( ) dh(x,z e ε i) d H(x,z) min P Z(z e i ) i ε P Z (z) ( ) dh(x,z e ε i) d H(x,z). i ε Since d H (x, z e i ) = d H (x, z) ± we conclude that z i ε ε P Z(z e i ) P Z (z) ε. ε Thus P Z (z)log P Z(z e i ) n log( ε) n log ε P Z (z) and this completes the proof of (37). To finish the proof of Theorem 3, it remains to show that that G n = O(nε 2 log ε), that is, uniformly in n and ε > 0 H(Z n ) = H(n ) ε ǫ H(Zn ) + O(nε2 log ε). (40) To this end, we make use of the Taylor expansion: H(Z n ) = H( n ) ε ǫ H(Zn ) ε 0 θ 2 ǫ 2H(Zn ) ε=θdθ,

11 and prove that for ε small enough we have uniformly in n and ε > 0 2 ǫ 2H(Zn ) = O(n log ε), (4) from which the error term O(nε 2 log ε) follows immediately. In [5] we proved that for all sequences z 2 ǫ 2 P Z(z) = 2 2ε ǫ P Z(z) ( 2ε) 2 (P Z (z e i e j ) P Z (z e i ) P Z (z e j ) + P Z (z)), which led to equation (49) of [5] repeated below 2 ǫ 2H(Zn ) = 2 2ε ǫ H(Zn ) i,j ( 2ε) 2 (D + D 2 ), where D = P Z (z e i e j ) P Z (z e i ) z i,j P Z (z e j ) + P Z (z)log P Z (z), and D 2 = (P Z (z e i )) P Z (z)) z ij (P Z (z e j )) P Z (z)) P Z (z). We will prove that D = O(n log ε) and D 2 = O(n). Let first deal with D. We can write it as D = P Z (z)log P Z(z e i e j )P Z (z) P z Z (z e i )P(z e j ). i,j We now split D = D + D where D involves the pairs (i, j) such that i j k + and D deals with such pairs that j i > k +. For all z and all i and j such that i j k +, we have ε 2 ( ε) 2 < P Z(z e i e j )P Z (z) ( ε)2 < P Z (z e i )P Z (z e j ) ε 2. (42) Therefore, D z j i k+ P Z (z)( 2 log( ε) 2 log ε) (k + )n( 2 log( ε) log ε). For j i > k +, we observe, as in [5], that there exists µ < such that for all z P Z (z e i e j )P Z (z) P Z (z e i )P Z (z e j ) = + O(µ i ) + O(µ j ) + O(µ j i ) +O(µ n i ) + O(µ n i ). Thus we find D = P Z (z)log ( + O(ρ i ) + O(µ j ) z j i >k+ ) +O(µ j i ) + O(µ n i ) + O(µ n i ) = z P Z (z)o(n/( µ)) = O(n). Now we turn our attention to D 2, and similarly we split D 2 = D 2 +D 2 with D 2 involving only i, j such that i j k + and D 2 involving i, j such that i j > k +. We easily see that D 2 n(k + ), and then D 2 = P Z (z) P Z (z e i ) z i j >k P Z (z e j ) + P Z (z e i e j ( ) PZ (z e i )P Z (z e j ) + P Z (z e i e j )P Z (z) P Z (z e i e j, ε). We now notice that P Z (z) P Z (z e i ) P Z (z e j )+P Z (z e i e j ) = 0. z i,j Restricting this sum to i j > k+ we observe that it gives the opposite of the sum for i j k +. Therefore, the total contribution is O((k + )n. Furthermore, ( ) PZ (z e i )P Z (z e j ) P Z (z e i e j )P Z (z) z i j >k+ P Z (z e i e j ) = z P Z (z)o(n/( µ)) = O(n), and this completes the proof of Theorem 3. APPENDI B: PROOF OF (27) Our aim is to show that p l that imize the sequence entropy rate is given by (27). Recall that the entropy rate is equal to ρ () with k ρ () = p l log p l k (l + )p. l If we extend above p l such that l=k p l, then we need to modify it to ρ () = ( k p l ) ( k log p l ) k k (l + )p l p l log p l. The optimal distribution (p d,...,p k ) is the one imizing the gradient of ρ () which implies that the gradient of the denominator is collinear with the gradient of the numerator. Thus there exists ν such that: ( k ) ( k k k p l log p l ) p l log p l = ν (l+)p l. All computations done, this implies that for all l between d and k ( k ) log p i log p l = (l + )ν. i=d For p l such that k p l =, the above identity becomes for all d l k log p l = (l + )ν, hence p l = (e ν ) l+. Setting ρ 0 = e ν with ρ 0 being the unique root of k ρl+ =, we establish (27).

12 ACKNOWLEDGMENT The authors thank B. Marcus for very helpful discussions during the summer of 2006 when this research was shaping up. We also thank G. Seroussi for participating in the initial stage of this research. REFERENCES [] D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, W. Zeng, Simulation-Based Computation of Information Rates for Channels With Memory, IEEE Trans. Information Theory, 52, , [2] D. Blackwell, The entropy of functions of finite-state Markov chains, in Trans. First Prague Conf. Information Theory, Statistical Decision Functions, Random Processes, (Prague, Czechoslovakia), pp. 3 20, Pub. House Chechoslovak Acad. Sci., 957. [3] J. Chen and P. Siegel, Markov processes asymptotically achieve the capacity of finite-state intersymbol interference channels, IEEE Trans. Information Theory, 54, , [4] J. Fan, T. L. Poo, and B. Marcus, Constraint Gain, IEEE Trans. Information Theory, 50, , 200. [5] H. Furstenberg and H. Kesten, Products of random matrices, Ann. Math. Statist, pp , 960. [6] R. Gharavi and V. Anantharam. An upper bound for the largest Lyapunov exponent of a Markovian product of nonnegative matrices. Theoretical Computer Science, Vol. 332, Nos. -3, pp , [7] G. Han and B. Marcus, Analyticity of Entropy Rate of Hidden Markov Chains,IEEE Trans. Information Theory, 52, , [8] G. Han and B. Marcus, Analyticity of Entropy Rate in Families of Hidden Markov Chains (II),,IEEE Trans. Information Theory, 52, 03-07, [9] G. Han and B. Marcus, Capacity of noisy constrained channels, Proc. ISIT 2007, , Nice, [0] G. Han and B. Marcus, Asymptotics of the input-constrained binary symmetric channel capacity, Annals of Applied Probability, 9, , [] T. Holliday, A. Goldsmith, and P. Glynn, Capacity of Finite State Channels Based on Lyapunov Exponents of Random Matrices, IEEE Trans. Information Theory, 52, , [2] K. A. Schouhammer Immink, Codes for Mass Data Storage Systems, Shannon Foundation Publishers, Eindhoven, [3] P. Jacquet and W. Szpankowski, Entropy computations via analytic depoissonization, IEEE Trans. Inform. Theory, 45, , 999. [4] P. Jacquet, G. Seroussi, and W. Szpankowski, On the Entropy of a Hidden Markov Process (extended abstract), Data Compression Conference, , Snowbird, [5] P. Jacquet, G. Seroussi, and W. Szpankowski, On the Entropy of a Hidden Markov Process (full version) Theoretical Computer Science, 395, , [6] P. Jacquet, G. Seroussi, and W. Szpankowski, Noisy Constrained Capacity 2007 International Symposium on Information Theory, , Nice, [7] S. Karlin and H. Taylor, A First Course in Stochastic Processes. New York: Academic Press, 975. [8] B. Marcus, R. Roth and P. Siegel, Constrained Systems and Coding for Recording Channels, Chap. 20 in Handbook of Coding Theory (eds. V. S. Pless and W. C. Huffman), Elsevier Science, 998. [9] E. Ordentlich and T. Weissman, New Bounds on th Entropy Rate of Hidden Markov Process, Information Theory Workshop, 7 22, San Antonio, [20] E. Ordentlich and T. Weissman, On the optimality of symbol by symbol filtering and denoising, IEEE Trans. Information Theory, 52, 9-40, [2] V. Oseledec, A multiplicative ergodic theorem, Trudy Moskov. Mat. Obshch., 968. [22] E. Seneta, Non-Negative Matrices. New York: John Wiley & Sons, 973. [23] S. Shamai (Shitz) and Y Kofman, On the capacity of binary and Gaussian channels with run-length limited inputs, IEEE Trans. Commun., 38, , 990. [24] W. Szpankowski, Average Case Analysis of Algorithms on Sequences. New York: John Wiley & Sons, Inc., 200. [25] J. Tsitsiklis and V. Blondel, The Lyapunov exponent and joint spectral radius of pairs of matrices are hard - when not impossible to compute and to approximate, Mathematics of Control, Signals, and Systems, 0, 3 40, 997. [26] E. Zehavi and J. K. Wolf, On runlength codes, IEEE Trans. Information Theory, 34, 45 54, 988. [27] O. Zuk, I. Kanter and E. Domany, Asymptotics of the Entropy Rate for a Hidden Markov Process, J. Stat. Phys., 2, , [28] O. Zuk, E. Domany, I. Kanter, and M. Aizenman. From Finite-System Entropy to Entropy Rate for a Hidden Markov Process. Signal Processing Letters, IEEE, 3, , Philippe Jacquet is a research director in INRIA. He graduated from Ecole Polytechnique in 98 and from Ecole Nationale des Mines in 984. He received his Ph.D. degree from Paris Sud University in 989 and his habilitation degree from Versailles University in 998. He is currently the head of HIPERCOM project that is devoted to high performance communications. His research interests cover information theory, probability theory, quantum telecommunication, evaluation of performance and algorithm design for telecommunication, wireless and ad hoc networking. Philippe Jacquet is expert in wireless ad hoc networking. He authored the communication protocol OLSR for mobile ad hoc networks (MANET). He is working in flooding techniques in ad hoc network, as well as on the Multipoint Relaying technique. His activities span space-time information theory considerations in massively dense mobile networks, complexity of distributed capacity, and quality of service management. He holds seven patents. Philippe Jacquet is author of numerous papers that have appeared in international journals. In 999 he co-chaired the Information Theory and Networking Workshop, Metsovo, Greece. Wojciech Szpankowski is Saul Rosen Professor of Computer Science and (by courtesy) Electrical and Computer Engineering at Purdue University where he teaches and conducts research in analysis of algorithms, information theory, bioinformatics, analytic combinatorics, random structures, and stability problems of distributed systems. He received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from Gdansk University of Technology. He held several Visiting Professor/Scholar positions, including McGill University, INRIA, France, Stanford, Hewlett-Packard Labs, Universite de Versailles, University of Canterbury, New Zealand, Ecole Polytechnique, France, and the Newton Institute, Cambridge, UK. He is a Fellow of IEEE, and the Erskine Fellow. In 2009 he received the Humboldt Research Award. In 200 he published the book Average Case Analysis of Algorithms on Sequences, John Wiley & Sons, 200. He has been a guest editor and an editor of technical journals, including THEO- RETICAL COMPUTER SCIENCE, the ACM TRANSACTION ON ALGORITHMS, the IEEE TRANSACTIONS ON INFORMATION THEORY, FOUNDATION AND TRENDS IN COMMUNICATIONS AND INFORMATION THEORY, COMBINATORICS, PROBABIL- ITY, AND COMPUTING, and ALGORITHMICA. In 2008 he launched the interdisciplinary Institute for Science of Information, and in 200 he became the Director of the newly established NSF Science and Technology Center for Science of Information.

Noisy Constrained Capacity

Noisy Constrained Capacity Philippe Jacquet INRIA Rocquencourt 7853 Le Chesnay Cedex France Email: philippe.jacquet@inria.fr Gadiel Seroussi Hewlett-Packard Laboratories Palo Alto, CA 94304 U.S.A. Email: