Research Collection Conference Paper

The Relation Between Block Length and Reliability for a Cascade of AWGN Links

Author: Subramanian, Ramanan
Publication Date: 2012
Permanent Link: https://doi.org/10.3929/ethz-a-00705046
Rights / License: In Copyright - Non-Commercial Use Permitted
Int. Zurich Seminar on Communications (IZS), February 29 - March 2, 2012

The Relation Between Block Length and Reliability for a Cascade of AWGN Links

Ramanan Subramanian
Institute for Telecommunications Research, University of South Australia, Mawson Lakes, SA 5095, Australia
Email: Ramanan.Subramanian@unisa.edu.au

Abstract: This paper presents a fundamental information-theoretic scaling law for a heterogeneous cascade of links corrupted by Gaussian noise. For any set of fixed-rate encoding/decoding strategies applied at the links, it is claimed that a scaling of $\Theta(\log n)$ in block length is both sufficient and necessary for reliable communication. A proof of this claim is provided using tools from the convergence theory for inhomogeneous Markov chains.

I. INTRODUCTION

Consider a line network in which a message originating at a certain node is conveyed to the intended destination through a series of hops. The reception at every hop is impaired by zero-mean additive white Gaussian noise (AWGN) of a certain variance. The originating node as well as the intermediate nodes are subject to an average transmit-power constraint $P_0$ for the duration of transmission of any message. The messages, drawn from a certain set of size $M$, are encoded and transmitted as codewords in $\mathbb{R}^N$ that satisfy the transmit power constraint. Every intermediate hop decodes (possibly erroneously) the original message from its noisy observation and re-encodes it before transmitting to the next hop. One can form the row-stochastic probability transition matrix $P_i$ for the $i$-th hop, in which each row gives the conditional probability distribution of the message decoded at hop $i$, given a certain message sent by the previous hop:

$$P_i = \begin{pmatrix} p_i(1\mid 1) & p_i(2\mid 1) & \cdots & p_i(M\mid 1)\\ p_i(1\mid 2) & p_i(2\mid 2) & \cdots & p_i(M\mid 2)\\ \vdots & \vdots & \ddots & \vdots\\ p_i(1\mid M) & p_i(2\mid M) & \cdots & p_i(M\mid M) \end{pmatrix}$$

Assume that a network of the type outlined above consists of $n$ hops. If the encoder-decoder pairs are identical at the hops, and if the noise variance at every hop is $\sigma^2$, then the matrices $P_i$ will be identical to, say, $P$ for all $i$. Then, the rows of $P^n$ will give
the conditional probability distributions of the message $\hat{W}_n$ decoded at the final destination, given a certain message $W$ at the originating node. For decisions made on signals corrupted by Gaussian noise, $P$ has the property that any column is either all-zero or contains only non-zero entries. For such a matrix $P$, the rows of $P^n$ tend to be identical for large $n$ (see Theorem 4.9 in [1]), if $M$, $N$, $P_0$, and $\sigma^2$ are independent of $n$. In other words, the probability distribution of $\hat{W}_n$ will be almost independent of the original message $W$. This implies that the mutual information $I(W; \hat{W}_n)$ between the original message and the decoded message tends to zero with $n$. Hence, communication in the network is highly unreliable if $N$ and $M$ do not vary with the number of hops $n$. On the other hand, if $N$ is large and if one chooses $M = 2^{NR}$ for any $R < \frac{1}{2}\log\left(1 + \frac{P_0}{\sigma^2}\right)$, the random-coding theorem states that there exists a channel code of block length $N$ such that the probability of error during decoding at any link is bounded above by $P_B \le e^{-N E_r(R,\, P_0/\sigma^2)}$, where $E_r$ is the random coding exponent for the corresponding AWGN channel [2], [3]. By employing such a code for each link in the line network model, choosing any $f(n) \in \omega(1)$, and letting $N$ vary with $n$ as $N = \frac{\log n + f(n)}{E_r(R,\, P_0/\sigma^2)}$, one obtains the following union bound on the probability of communication failure in the network:

$$\Pr\left[\hat{W}_n \ne W\right] \le n P_B \le n\, e^{-N E_r(R,\, P_0/\sigma^2)} = e^{-f(n)} \to 0. \tag{1}$$

Hence, it is possible to transmit messages with very high reliability in the network for a fixed code rate $R$, if the code length satisfies $N \in \Theta(\log n)$. On the other hand, the previous discussion about diminishing $I(W; \hat{W}_n)$ implies that any fixed-rate fixed-length coding scheme is not scalable for the line network. This leads to the natural question whether the scaling $\Theta(\log n)$ for $N$ is indeed optimal. In this paper, we show that for any scaling of $N$ satisfying $N \in o(\log n)$, the mutual information $I(W; \hat{W}_n)$ diminishes to zero with $n$ for any set of fixed-rate codes.

(This work has been supported by the Australian Research Council under the ARC Discovery Grant DP0986089.)
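The arithmetic behind the union bound in Eqn. (1) can be illustrated numerically. The sketch below is illustrative only: the random-coding exponent is replaced by an arbitrary placeholder constant `Er` (the true value depends on $R$ and $P_0/\sigma^2$ and is not computed here), and $f(n) = \sqrt{\log n}$ is used as one choice of $f(n) \in \omega(1)$.

```python
import math

def blocklength(n, Er, f):
    # Block length N = (log n + f(n)) / Er, rounded up to an integer.
    return math.ceil((math.log(n) + f(n)) / Er)

def network_failure_bound(n, Er, f):
    # Union bound over the n links: n * exp(-N * Er), as in Eqn. (1).
    N = blocklength(n, Er, f)
    return n * math.exp(-N * Er)

Er = 0.1                                  # assumed placeholder exponent
f = lambda n: math.sqrt(math.log(n))      # any f(n) in omega(1) works

for n in (10, 10**3, 10**6, 10**9):
    print(n, blocklength(n, Er, f), network_failure_bound(n, Er, f))
```

As the printout shows, the failure bound stays below $e^{-f(n)}$ and shrinks with $n$, while the block length grows only logarithmically in the number of hops.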
This implies that the scaling order $\Theta(\log n)$ for $N$ is necessary for reliable communication in the network. Hence, the sufficiency criterion for $N$ given by the union bound in Eqn. (1) is indeed optimal up to a constant factor. While this scaling problem has been solved partially in the past for cascades of homogeneous DMCs by Niesen et al. in [4], it has also been pointed out in the same work that the tools developed there apply neither to non-homogeneous cascades nor to cascades of continuous-input channels, as is the case in the current paper. In contrast, the results developed here are applicable to continuous-input AWGN links, where the links can be non-identical.

Footnote (asymptotic notation): For any $f(n) > 0$ and $g(n) > 0$: $f(n) \in \omega(g(n)) \Leftrightarrow \lim_{n\to\infty} f(n)/g(n) = \infty$; $f(n) \in o(g(n)) \Leftrightarrow \lim_{n\to\infty} f(n)/g(n) = 0$; $f(n) \in \Theta(g(n)) \Leftrightarrow \exists\, c > 0$ s.t. $\lim_{n\to\infty} f(n)/g(n) = c$.
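The coalescence of the rows of $P^n$ described above, and the resulting decay of $I(W; \hat{W}_n)$, can be checked numerically. A minimal sketch, assuming a small arbitrary 3-message per-hop transition matrix (all columns non-zero, as for decisions on Gaussian-corrupted signals) and a uniform prior, both chosen purely for illustration:

```python
import numpy as np

# Arbitrary row-stochastic matrix standing in for the per-hop matrix P.
P = np.array([[0.90, 0.05, 0.05],
              [0.06, 0.88, 0.06],
              [0.07, 0.07, 0.86]])

def mutual_information(prior, channel):
    # I(W; W_hat) in bits, for prior p_W and an n-hop transition matrix.
    joint = prior[:, None] * channel
    marg = joint.sum(axis=0)
    terms = joint * np.log2(joint / (prior[:, None] * marg[None, :]))
    return np.nansum(terms)

prior = np.ones(3) / 3
for n in (1, 10, 100):
    Pn = np.linalg.matrix_power(P, n)               # cascade of n hops
    spread = np.abs(Pn.max(axis=0) - Pn.min(axis=0)).max()  # row disagreement
    print(n, spread, mutual_information(prior, Pn))
```

For a fixed per-hop matrix, the maximum disagreement between rows of $P^n$ and the end-to-end mutual information both decay geometrically with the number of hops, which is exactly the unreliability phenomenon the paper quantifies.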
II. NOTATIONS AND DEFINITIONS

The set of all real numbers is denoted by $\mathbb{R}$ and the set of all natural numbers by $\mathbb{N}$. Natural logarithms are assumed in the notations, unless the base is specified. The notation $\|\cdot\|$ represents the $L^2$ norm throughout. We employ the following terminologies in this paper. Let $N \in \mathbb{N}$, called the code length or block length of the transmission scheme. A code rate $R > 0$ is a real number such that $NR$ is an integer. Let $\mathcal{M} \triangleq \{1, 2, 3, \ldots, 2^{NR}\}$, called the message alphabet. Let $P_0 > 0$ be the input power constraint for transmission.

Definition 1: A rate-$R$ length-$N$ code $\mathcal{C}$ with a power constraint of $P_0$ is an ordering of $M = 2^{NR}$ elements from $\mathbb{R}^N$, called code words, that satisfies
$$\mathcal{C} = (\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_M) \ \text{ s.t. }\ \forall w \in \mathcal{M},\ \|\mathbf{x}_w\|^2 \le N P_0. \tag{2}$$

Definition 2: A rate-$R$ length-$N$ decision rule $\mathcal{R}$ is an ordering of $M = 2^{NR}$ subsets of $\mathbb{R}^N$, called decision regions, spanning $\mathbb{R}^N$ and pairwise disjoint:
$$\mathcal{R} = (\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_M) \ \text{ s.t. }\ \bigcup_{w \in \mathcal{M}} \mathcal{R}_w = \mathbb{R}^N, \ \text{ and }\ \mathcal{R}_w \cap \mathcal{R}_{w'} = \emptyset \ \forall w \ne w'. \tag{3}$$

Definition 3: The encoding function $\mathrm{ENC}_{\mathcal{C}} : \mathcal{M} \to \mathbb{R}^N$ for a code $\mathcal{C}$ is defined by
$$\mathrm{ENC}_{\mathcal{C}}(w) = \mathbf{x}_w, \tag{4}$$
where $\mathbf{x}_w$ is the $w$-th code word in $\mathcal{C}$.

Definition 4: The decoding function $\mathrm{DEC}_{\mathcal{R}} : \mathbb{R}^N \to \mathcal{M}$ for a decision rule $\mathcal{R}$ is defined by
$$\mathrm{DEC}_{\mathcal{R}}(\mathbf{y}) = \sum_{w \in \mathcal{M}} w\, \mathbb{1}_{\mathcal{R}_w}(\mathbf{y}), \tag{5}$$
where $\mathbb{1}$ is the indicator function and $\mathcal{R}_w$ is the $w$-th subset in $\mathcal{R}$.

III. NETWORK MODEL

The line network model to be considered is described in Fig. 1. There are $n+1$ nodes in the network, identified by the indices $\{0, 1, 2, \ldots, n\}$. The $n$ hops in the network are each associated with noise variances $\sigma_i^2$, $1 \le i \le n$. For encoding, nodes $0, 1, \ldots, n-1$ choose $\mathcal{C}_0, \mathcal{C}_1, \ldots, \mathcal{C}_{n-1}$ respectively to transmit, each code being rate-$R$ length-$N$ with a power constraint of $P_0$. For decoding, nodes $1, 2, \ldots, n$ choose rate-$R$ length-$N$ decision rules $\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_n$ for reception. From here on, we write $\mathrm{ENC}_i$ and $\mathrm{DEC}_i$ in place of $\mathrm{ENC}_{\mathcal{C}_i}$ and $\mathrm{DEC}_{\mathcal{R}_i}$ respectively, for simplicity. Node 0 generates a random message $W \in \mathcal{M}$ with probability distribution $p_W(w)$ and intends to convey the same to node $n$ through the cascade of noisy links in a multihop fashion. Each of the $n-1$ intermediate nodes estimates
the message sent by the node in the previous hop from its own noisy observation, re-encodes the decoded message, and transmits the resulting codeword to the next hop. The codeword transmitted by node $i$, for any $0 \le i \le n-1$, is given by
$$\mathbf{X}_i = \mathrm{ENC}_i(\hat{W}_i), \tag{6}$$
where $\hat{W}_i$ is the estimate of the message at node $i$ after decoding. Note that $\hat{W}_0 = W$ in this notation. The observation received by node $i$, for any $1 \le i \le n$, is given by $\mathbf{Y}_i$, which follows a conditional density function that depends on the codeword $\mathbf{X}_{i-1}$ sent by the previous node:
$$p_{\mathbf{Y}_i \mid \mathbf{X}_{i-1}}(\mathbf{y} \mid \mathbf{x}) = \left(2\pi\sigma_i^2\right)^{-N/2} e^{-\frac{\|\mathbf{y}-\mathbf{x}\|^2}{2\sigma_i^2}}. \tag{7}$$
The above density function follows from the assumptions of AWGN noise and memorylessness in the channel. The message $\hat{W}_i$ decoded by node $i$ is given by
$$\hat{W}_i = \mathrm{DEC}_i(\mathbf{Y}_i). \tag{8}$$
Note that the random variable $\hat{W}_n$ represents the message decoded by the final destination, as per the above notation.

IV. AN UPPER BOUND ON THE RELIABILITY OF THE NETWORK

The key result presented here is summarized by the following theorem and its corollary:

Theorem 1: In a line network consisting of a cascade of $n$ AWGN links, let the noise variances of $m \le n$ of the links satisfy $\sigma_i^2 \ge \sigma_0^2 > 0$. Then, for any choice of codes $\mathcal{C}_0, \mathcal{C}_1, \ldots, \mathcal{C}_{n-1}$ and decision rules $\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_n$ with rate $R$ and length $N$, and for any $\epsilon > 0$, the following holds for sufficiently large $m$:
$$I(W; \hat{W}_n) \le 2^{NR}\left[1 - \frac{N^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)}\, e^{-N E(P_0/\sigma_0^2)}\right]^{m^{1-\epsilon}},$$
where
$$E(S) = \frac{1}{2}\left[\exp\left\{\cosh^{-1}\!\left(\frac{S}{2}+1\right)\right\} + \cosh^{-1}\!\left(\frac{S}{2}+1\right) + \ln 2\right].$$
The corollary to the above theorem leads to the upper bound on the reliability-vs-block-length scaling as claimed in the introductory section:

Corollary 1: Let $R > 0$, $\sigma_0 > 0$, and $r > 0$ be independent of $n$. In a line network of $n$ links, let the noise variance of at least $\lceil rn \rceil$ of the hops be at least $\sigma_0^2 > 0$. Then, for any $N \in o(\log n)$,
$$\lim_{n\to\infty} I(W; \hat{W}_n) = 0.$$

Proof of Theorem 1: At any hop $i$, the process of channel encoding, noisy reception, followed by decoding induces the conditional probabilities $p_{\hat{W}_i \mid \hat{W}_{i-1}}(k \mid j)$, $j, k \in \mathcal{M}$. For every hop $i$, let $P_i$ be the $M \times M$ row-stochastic matrix whose entry in row $j$ and column $k$ is $p_{\hat{W}_i \mid \hat{W}_{i-1}}(k \mid j)$. Note that the $j$-th row in $P_i$ gives the conditional probability mass function
on the estimate of the message $W$ at hop $i$ (i.e., $\hat{W}_i$), given that the message sent by hop $i-1$ was $j$. Let
$$P \triangleq \prod_{i=1}^{n} P_i. \tag{9}$$
Then, $P$ clearly represents the row-stochastic probability transition matrix between the original message $W$ and the message decoded at the final destination, $\hat{W}_n$. The transition matrix $P$,
along with $p_W$, the probability mass function of the original message $W$, together induce a joint distribution between $W$ and $\hat{W}_n$. Our goal is to find an upper bound on $I(W; \hat{W}_n)$, with the constraints as given in the statement of Theorem 1.

[Fig. 1: Line Network Model. Node 0 applies $\mathrm{ENC}_0$ to $W$; each intermediate node $i$ decodes $\hat{W}_i = \mathrm{DEC}_i(\mathbf{Y}_i)$ and re-encodes $\mathbf{X}_i = \mathrm{ENC}_i(\hat{W}_i)$; node $n$ outputs $\hat{W}_n$.]

For any $M \times M$ row-stochastic matrix $Q = \{Q_{jk}\}$, let us consider the following measures of deviation:
$$\delta(Q) \triangleq \max_{j, j'} \max_{k} \left|Q_{jk} - Q_{j'k}\right|, \tag{10}$$
$$\lambda(Q) \triangleq \min_{j, j'} \sum_{k=1}^{M} \min\left(Q_{jk}, Q_{j'k}\right). \tag{11}$$
Both the above deviations measure the degree to which the rows of $Q$ differ. The measure $1 - \lambda(Q)$ is known as Dobrushin's coefficient of ergodicity and plays a role in the convergence of non-homogeneous stochastic processes [5]. Moreover, for any stochastic matrix $Q$,
$$0 \le \lambda(Q) \le 1. \tag{12}$$
Now consider the total variational distance between the joint distribution of $W$ and $\hat{W}_n$ and the product of the respective marginal distributions:
$$\tau(W; \hat{W}_n) \triangleq \sum_{w, \hat{w} \in \mathcal{M}} \left| p_W(w)\, p_{\hat{W}_n}(\hat{w}) - p_{W, \hat{W}_n}(w, \hat{w}) \right|.$$
In the above expression, substituting $p_W(w)\, p_{\hat{W}_n \mid W}(\hat{w} \mid w)$ for $p_{W, \hat{W}_n}(w, \hat{w})$, and $\sum_{w' \in \mathcal{M}} p_{\hat{W}_n \mid W}(\hat{w} \mid w')\, p_W(w')$ for $p_{\hat{W}_n}(\hat{w})$, and by using the triangle inequality, one obtains the following bound:
$$\tau(W; \hat{W}_n) \le \sum_{w, \hat{w}} p_W(w) \sum_{w'} p_W(w')\, \delta(P) = M \delta(P) = 2^{NR} \delta(P). \tag{13}$$
Recalling the definition of $P$ in our case, and from Theorem 2 in [6] (equivalently, Lemma 2 in [7]), we have:
$$\delta(P) = \delta\!\left(\prod_{i=1}^{n} P_i\right) \le \prod_{i=1}^{n} \left[1 - \lambda(P_i)\right] \le \prod_{i : \sigma_i \ge \sigma_0} \left[1 - \lambda(P_i)\right]. \tag{14}$$
The second inequality in the above follows by applying Eqn. (12) for those hops for which $\sigma_i < \sigma_0$. Note that, as per the statement of the theorem, we have at least $m$ terms in the product in the last expression above. Now consider $\lambda(P_i)$ for any hop $i$ such that $\sigma_i \ge \sigma_0$. Suppose $P_i$ has $L$ distinct columns with indices $k_1, k_2, \ldots, k_L$ such that
$$\forall j, \; P_{j k_1} = p_{\hat{W}_i \mid \hat{W}_{i-1}}(k_1 \mid j) \ge p_1, \tag{15}$$
$$\vdots$$
$$\forall j, \; P_{j k_L} = p_{\hat{W}_i \mid \hat{W}_{i-1}}(k_L \mid j) \ge p_L. \tag{16}$$
Then from the definition in Eqn. (11), it follows that
$$\lambda(P_i) \ge \sum_{l=1}^{L} p_l. \tag{17}$$
We now show that the conditions given by Eqns. (15)-(16) hold true in our case, so that we can apply Eqn. (17) to our bound. Let $\alpha > 0$. Consider one particular hop $i$
satisfying $\sigma_i \ge \sigma_0$. Define the spherical regions
$$S = \left\{\mathbf{y} \in \mathbb{R}^N : \|\mathbf{y}\| \le \sqrt{N P_0}\right\}, \qquad S_\alpha = \left\{\mathbf{y} \in \mathbb{R}^N : \|\mathbf{y}\| \le \alpha \sqrt{N P_0}\right\}.$$
Clearly, any code word in $\mathcal{C}_{i-1}$ transmitted by the previous hop has to be in $S$. Next, observe that $\mathcal{R}_i$ has to contain decision regions, say $L$ in number, spanning $S_\alpha$ entirely. Let us designate them as $\mathcal{R}_{k_1}, \mathcal{R}_{k_2}, \ldots, \mathcal{R}_{k_L}$. Also define, for each $1 \le l \le L$, $V_l \triangleq \mathcal{R}_{k_l} \cap S_\alpha$. Hence, we have:
$$\bigcup_{l=1}^{L} V_l = S_\alpha, \ \text{ and }\ V_l \cap V_{l'} = \emptyset \ \forall l \ne l'. \tag{18}$$
Let $\hat{W}_{i-1} = j$ be the message as decoded by the previous hop. Hence, the code word $\mathbf{x}_j$ from $\mathcal{C}_{i-1}$ was transmitted by node $i-1$. The probability that this was decoded as message $k_l$ at hop $i$ is given by:
$$p_{\hat{W}_i \mid \hat{W}_{i-1}}(k_l \mid j) = \int_{\mathbf{y} \in \mathcal{R}_{k_l}} p_{\mathbf{Y}_i \mid \mathbf{X}_{i-1}}(\mathbf{y} \mid \mathbf{x}_j)\, d\mathbf{y} \ \ge\ \int_{V_l} p_{\mathbf{Y}_i \mid \mathbf{X}_{i-1}}(\mathbf{y} \mid \mathbf{x}_j)\, d\mathbf{y}$$
$$\stackrel{(a)}{=} \left(2\pi\sigma_i^2\right)^{-N/2} \int_{V_l} e^{-\frac{\|\mathbf{y}-\mathbf{x}_j\|^2}{2\sigma_i^2}}\, d\mathbf{y} \ \stackrel{(b)}{\ge}\ \left(2\pi\sigma_i^2\right)^{-N/2} \int_{V_l} e^{-\frac{\left(\|\mathbf{y}\| + \|\mathbf{x}_j\|\right)^2}{2\sigma_i^2}}\, d\mathbf{y}$$
$$\stackrel{(c)}{\ge} \left(2\pi\sigma_i^2\right)^{-N/2} \int_{V_l} e^{-\frac{(1+\alpha)^2 N P_0}{2\sigma_i^2}}\, d\mathbf{y} = \left(2\pi\sigma_i^2\right)^{-N/2} e^{-\frac{(1+\alpha)^2 N P_0}{2\sigma_i^2}} |V_l|, \tag{19}$$
where $|V_l|$ denotes the $N$-dimensional volume of $V_l$. In the above series of inequalities, step (a) follows from Eqn. (7), (b) from the triangle inequality, and (c) from the fact that any $\mathbf{y} \in V_l$ must satisfy $\|\mathbf{y}\| \le \alpha\sqrt{N P_0}$ (since $V_l \subseteq S_\alpha$), and from the fact that $\mathbf{x}_j$, being a code word in $\mathcal{C}_{i-1}$, naturally satisfies the power constraint criterion given by $\|\mathbf{x}_j\|^2 \le N P_0$. Note that the lower bound given by Eqn. (19) is independent of the message $j$ decoded by the previous hop. Hence, the conditions given by Eqns. (15)-(16) are satisfied for
$$p_l = \left(2\pi\sigma_i^2\right)^{-N/2} e^{-\frac{(1+\alpha)^2 N P_0}{2\sigma_i^2}} |V_l|. \tag{20}$$
Hence, the bound given by Eqn. (17) holds true. In our case, it gives:
$$\lambda(P_i) \ge \sum_{l=1}^{L} p_l = \left(2\pi\sigma_i^2\right)^{-N/2} e^{-\frac{(1+\alpha)^2 N P_0}{2\sigma_i^2}} \sum_{l=1}^{L} |V_l| = \frac{\left(\frac{\alpha^2 N P_0}{2\sigma_i^2}\right)^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)}\, e^{-\frac{(1+\alpha)^2 N P_0}{2\sigma_i^2}}. \tag{21}$$
The last equality follows from the fact that the $V_l$ are mutually disjoint and span the entire $N$-dimensional sphere $S_\alpha$ (Eqn. (18)). The volume of $S_\alpha$ is computed as:
$$|S_\alpha| = \frac{\pi^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)} \left(\alpha\sqrt{N P_0}\right)^N. \tag{22}$$
The ideas in the above argument are outlined in Figs. 2(a) and 2(b).

[Fig. 2: Geometrical argument used for bounding $\lambda(P_i)$. (a) Intersecting regions $V_1, \ldots, V_L$ of $S_\alpha$ with the decision rule $\mathcal{R}_i$. (b) Integrating the density function of the noise centered at $\mathbf{x}_j$ over a subset $V_l$ of $S_\alpha$.]

Next, note that the bound given by Eqn. (21) holds true for any $\alpha > 0$. Hence, we can maximize the right-hand side of Eqn. (21) over all $\alpha > 0$ to obtain
$$\lambda(P_i) \ge \frac{N^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)}\, e^{-N E(P_0/\sigma_i^2)}, \tag{23}$$
where the function $E(S)$ is as given in the statement of the theorem. Noting further that $E(S)$ is a non-decreasing function in $S$, we will have, for hops satisfying $\sigma_i \ge \sigma_0$,
$$\lambda(P_i) \ge \frac{N^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)}\, e^{-N E(P_0/\sigma_0^2)}. \tag{24}$$
Since at least $m$ hops satisfy the minimum noise variance criterion $\sigma_i \ge \sigma_0$, we can now combine Eqn. (24) with Eqn. (14) to obtain
$$\delta(P) \le \left[1 - \frac{N^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)}\, e^{-N E(P_0/\sigma_0^2)}\right]^m. \tag{25}$$
Applying the above to Eqn. (13), we will have
$$\tau(W; \hat{W}_n) \le 2^{NR} \left[1 - \frac{N^{N/2}}{\Gamma\!\left(\frac{N}{2}+1\right)}\, e^{-N E(P_0/\sigma_0^2)}\right]^m. \tag{26}$$
We can finally bound $I(W; \hat{W}_n)$ using Lemma 2.7 of Chapter 1 in [8] by:
$$I(W; \hat{W}_n) \le 2NR\, \tau(W; \hat{W}_n) + \tau(W; \hat{W}_n) \log \frac{1}{\tau(W; \hat{W}_n)}. \tag{27}$$
The bound given by the statement of the theorem then follows from combining the above with Eqn. (26) and by observing that, for sufficiently large $m$,
τ W ; Ŵn e,that xlog x is non-decreasing for x e, and finally from the fact that for any ɛ>0, x ɛ log x for sufficiently large x Proof of Corollary : For the corollary, we have m = nr Moreover, Γ + Hence, the bound given by Theorem reduces to: IW ; Ŵn R e E P 0/σ 0 nr ɛ 8 R e nr ɛe E P 0 /σ 0 9 = e Rlog elog nr ɛ E P 0 /σ 0 30 The last expression diminishes to 0 with n if = n o log n Hence, we will have IW ; Ŵn 0 for any o log n REFERECES [] David A Levin, Yuval Peres, and Elizabeth L Wilmer, Markov Chains and Mixing Times, American Mathematical Society, edition, 008 [] Claude E Shannon, Probability of error for optimal codes in a gaussian channel, Claude Elwood Shannon: Collected Papers, pp 79 34, 993 [3] R Gallager, A simple derivation of the coding theorem and some applications, IEEE Trans Inform Theory, vol, no, pp 3 8, Jan 965 [4] U iesen, C Fragouli, and D Tuninetti, On capacity of line networks, IEEE Trans Inform Theory, vol 53, no, pp 4039 4058, ov 007 [5] Adolf Rhodius, On the maximum of ergodicity coefficients, the Dobrushin ergodicity coefficient, and products of stochastic matrices, Linear Algebra and its Applications, vol 53, no -3, pp 4 54, 997 [6] J Hajnal and MS Bartlett, Weak ergodicity in non-homogeneous markov chains, Mathematical Proceedings of the Cambridge Philosophical Society, vol 54, pp 33 46, 958 [7] J Wolfowitz, Products of indecomposable, aperiodic, stochastic matrices, Proceedings of the American Mathematical Society, vol 4, no 5, pp pp 733 737, Oct 963 [8] I Csiszar and J Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems, volofprobability and Mathematical Statistics, Academic Press, 98 74