Symmetric Characterization of Finite State Markov Channels

Mohammad Rezaeian
Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia
Email: rezaeian@unimelb.edu.au

Abstract: In this paper the symmetry property of discrete memoryless channels is generalized to finite state Markov channels. We show that for symmetric Markov channels the noise process can be characterized as a hidden Markov process, and that the capacity of the channel is related to the entropy rate of this process. These results sharpen the characterization of symmetry for Markov channels given in earlier work and extend the capacity formula for symmetric discrete memoryless channels to symmetric Markov channels. Using a recent formulation of the entropy rate of a hidden Markov process, this capacity formula is shown to be equivalent to previous formulations obtained for special cases of symmetric Markov channels.

I. INTRODUCTION

For a discrete memoryless channel, symmetry is defined by a permutation-invariance property of the channel conditional probability table [1]. A similar characterization has been defined for the symmetric discrete memoryless multiple-access channel [2]. The capacity-achieving distribution for symmetric channels, or for the broader category of T-symmetric channels [2], is the uniform distribution, and for these cases direct formulae for the computation of capacity exist. For finite state Markov channels [3], in the special case of uniformly symmetric variable noise channels, an integral formula for the capacity has been obtained [3, Theorem 5.1].

In this paper we generalize the results in [3] by defining a subclass of finite state Markov channels, the symmetric Markov channels, through an extension of the symmetry property of discrete memoryless channels. We show that for such a channel, under any i.i.d. input distribution, the noise process is a unique hidden Markov process which is explicitly defined by the parameters of the channel. Moreover, the capacity of the channel can be described by the entropy rate of the noise process, and the capacity-achieving distribution is uniform i.i.d. Using the expression for the entropy rate of a hidden Markov process [4], we show that this formulation of the capacity is equivalent to the capacity formula [5] for the special case of the Gilbert-Elliott channel, as well as to the more general formula of [3].

In this paper a random variable is denoted by an upper case letter and its realization by a lower case letter. A Markov process $\{S_n\}_{n=0}^{\infty}$, $S_n \in S$, is represented by a pair $[S, P]$, where $S$ is the state set and $P$ is the state transition probability matrix. A Markov process is assumed to be stationary, irreducible and aperiodic [6]. A hidden Markov process (HMP) [7],[8] is represented by a quadruple $[S, P, Z, T]$, where $[S, P]$ is a Markov process and $T$ is the $|S| \times |Z|$ observation probability matrix, which defines a conditional probability distribution over $Z$ for each given $s \in S$. The Markov process $\{S_n\}_{n=0}^{\infty}$ over the domain $S$ is called the state process, and its dynamics are governed by $P$. The non-Markov process $\{Z_n\}_{n=0}^{\infty}$ over the domain $Z$ is called the hidden Markov process (or the observation process); it is a noisy version of the state process, corrupted by the discrete memoryless channel $T$, with the property that the state is a sufficient statistic for previous observations.

In the next section we define the symmetric Markov channel and compare it with previous notions of symmetry.
In Section III we obtain the hidden Markov process associated with a symmetric Markov channel and formulate the capacity of the channel in terms of the entropy rate of this process, followed by a specialization of the results to known cases of symmetric Markov channels.

II. SYMMETRIC MARKOV CHANNEL

A finite state Markov channel [3] is defined by a pentad $[X, Y, S, P, C]$, where $[S, P]$ is a Markov process and $C$, for each $x \in X$ and $s \in S$, defines a probability distribution over $Y$. A Markov channel is characterized by the following three properties of the sequences of random variables defined on the domains $X$, $Y$, $S$, referred to as the input, output, and state sets, respectively.

A1: Independence of state and input,
$$ p(s_n \mid x_n) = p(s_n). \qquad (1) $$

A2: Sufficient statistic of the state,
$$ p(s_n \mid s_{n-1}, x^{n-1}, y^{n-1}) = p_P(s_n \mid s_{n-1}), \qquad (2) $$
where $p_P(\cdot \mid \cdot)$ is defined by $P$.

A3: Memorylessness. For any integer $n$,
$$ p(y^n \mid x^n, s^n) = \prod_{i=1}^{n} p_C(y_i \mid x_i, s_i), \qquad (3) $$
where $p_C(\cdot \mid \cdot, \cdot)$ is defined by $C$.

Although Property A3 exhibits a memoryless behaviour of the channel under knowledge of the channel state, the Markov channel is in fact a channel with long memory in the absence of knowledge about the state.
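The following minimal Python sketch illustrates Properties A1 to A3 operationally: the state evolves according to $P$ independently of the inputs, and, given the current state and input, the output is drawn from $C$. The function name, the per-state tables and the numbers are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_fsmc(P, C, x_seq, rng=None):
    """Simulate a finite state Markov channel [X, Y, S, P, C].

    P : (|S|, |S|) state transition matrix, rows summing to 1.
    C : (|S|, |X|, |Y|) array; C[s, x] is a distribution over Y (Property A3).
    x_seq : input sequence, chosen independently of the state (Property A1).
    Returns the output sequence and the (hidden) state sequence.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_states = P.shape[0]
    # Start the state process from its stationary distribution.
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    s = rng.choice(n_states, p=pi)
    y_seq, s_seq = [], []
    for x in x_seq:
        y = rng.choice(C.shape[2], p=C[s, x])   # memoryless given (s, x)
        y_seq.append(y)
        s_seq.append(s)
        s = rng.choice(n_states, p=P[s])        # state evolves independently of the input
    return np.array(y_seq), np.array(s_seq)

# Hypothetical two-state channel with ternary input and output.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
C = np.array([[[0.7, 0.2, 0.1], [0.2, 0.1, 0.7], [0.1, 0.7, 0.2]],
              [[0.5, 0.3, 0.2], [0.3, 0.2, 0.5], [0.2, 0.5, 0.3]]])
x = np.random.default_rng(0).integers(0, 3, size=12)
y, s = simulate_fsmc(P, C, x, rng=np.random.default_rng(1))
print(x, y, s, sep="\n")
```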

Note that Property A2 implies Markovity of the state process. Property A3 implies
$$ p(y_n \mid s_n, x_n, s^{n-1}, x^{n-1}, y^{n-1}) = p_C(y_n \mid s_n, x_n), \qquad (4) $$
which holds for a general choice of the input distribution $p(x^n)$. If the inputs $x_n$ are independent, then we also have
$$ p(y_n, x_n \mid s_n, s^{n-1}, x^{n-1}, y^{n-1}) = p(y_n, x_n \mid s_n). \qquad (5) $$
Therefore, for the special case of i.i.d. inputs we have the extra property

A'3: Memorylessness for i.i.d. inputs,
$$ p(y^n, x^n \mid s^n) = \prod_{i=1}^{n} p(y_i, x_i \mid s_i). \qquad (6) $$

In this paper we are interested in a subclass of Markov channels in which the conditional probability tables in $C$ have a symmetry property similar to the characterization of symmetry for discrete memoryless channels.

Definition 1 (Symmetric Markov Channel): A Markov channel is symmetric if for each state the channel is output symmetric, i.e., for any $s$ the sets of numbers $p_C(y \mid s, x)$, where each set corresponds to one $x \in X$, are permutations of each other, and the same is true for the sets corresponding to different $y \in Y$. Moreover, the permutation pattern is the same for all states.

The unique permutation pattern in Definition 1 is described as follows. For any $s$, the conditional probability $p_C(y \mid x, s)$ defines a table consisting of a set of numbers that are repeated in the table. For different $s$ these numbers are different, but the collection of sets of pairs $(x, y)$ with equal entries remains invariant with $s$. According to the definition, in a symmetric Markov channel the conditional distribution $p(y_n \mid x_n) = \sum_{s_n} p_C(y_n \mid x_n, s_n)\, p_s(s_n)$ for any $n$ resembles the conditional probability matrix of an output symmetric discrete memoryless channel, albeit a matrix that changes with $n$.

Considering a set $Z$ with $|Z| = |Y|$, the output symmetry in Definition 1 implies that for each $s$ we can find functions $f_s(\cdot): Z \to [0, 1]$ and $\varphi_s(\cdot,\cdot): X \times Y \to Z$ such that $f_s(\varphi_s(x, y)) = p_C(y \mid x, s)$. The unique permutation pattern implies, in addition, that the functions $\varphi_s(\cdot,\cdot)$ are invariant with $s$. Therefore, for a symmetric Markov channel there exists a function $\varphi(\cdot,\cdot)$ such that the random variable $Z = \varphi(X, Y)$ has the conditional distribution
$$ p(z \mid x, s) = f_s(z). \qquad (7) $$
The set of conditional probabilities identifying the symmetric Markov channel therefore reduces to an $|S| \times |Z|$ matrix $T$ defined by
$$ T[s, z] = f_s(z). \qquad (8) $$
Thus a symmetric Markov channel is uniquely defined by the parameters $[X, Y, S, P, Z, T, \varphi]$, where besides Properties A1 to A3 (with $p_C(y \mid x, s) = T[s, \varphi(x, y)]$), the following property holds for the corresponding random variables.

A4: Symmetric Markov channel. For any $n$,
$$ p(z_n \mid x_n, s_n) = p(z_n \mid s_n) = T[s_n, z_n], \quad z_n = \varphi(x_n, y_n), \qquad (9) $$
for some $y_n \in Y$.

For example, the conditional probability tables of a two-state symmetric Markov channel with a unique permutation pattern are the two matrices $C_1$, $C_2$ below:
$$ C_1 = \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix}, \qquad C_2 = \begin{pmatrix} \bar a & \bar b & \bar c \\ \bar b & \bar c & \bar a \\ \bar c & \bar a & \bar b \end{pmatrix}. \qquad (10) $$
For this channel the function $z = \varphi(x, y)$ is defined such that each letter position in $C_1$ or $C_2$ corresponds to a unique $z$, and the letter at that position is the value of $f_s$ at that $z$; e.g., $\varphi(1, 1) = \varphi(2, 3) = \varphi(3, 2) = 1$, and $f_1(1) = a$, $f_2(1) = \bar a$. The set of conditional probabilities is then uniquely defined by $\varphi$ and $T$, where
$$ T = \begin{pmatrix} a & b & c \\ \bar a & \bar b & \bar c \end{pmatrix}. \qquad (11) $$
(A small sketch that reads off $\varphi$ and $T$ from such tables follows below.)

From Properties A1 and A4, $X_n$ is independent of both $S_n$ and $Z_n$:
$$ p(z_n \mid x_n) = \sum_{s_n} p(z_n \mid x_n, s_n)\, p(s_n) = \sum_{s_n} p(z_n \mid s_n)\, p(s_n) = p(z_n). \qquad (12) $$
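Definition 1 can be checked mechanically: group the $(x, y)$ cells of the per-state tables by value, verify that this grouping (the permutation pattern) is the same for every state, and read off $\varphi$ and $T$. The sketch below does this for tables shaped like (10); the function name and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def extract_phi_T(C, tol=1e-12):
    """For per-state tables C[s, x, y], return (phi, T) if the channel satisfies
    Definition 1 (output symmetry in every state, one common permutation pattern),
    otherwise None.  phi[x, y] is a noise letter z; T[s, z] = f_s(z)."""
    n_s, n_x, n_y = C.shape
    # Label the cells of state 0 by value: equal entries get the same noise letter z.
    values, phi = [], np.zeros((n_x, n_y), dtype=int)
    for x in range(n_x):
        for y in range(n_y):
            matches = [z for z, v in enumerate(values) if abs(C[0, x, y] - v) < tol]
            phi[x, y] = matches[0] if matches else len(values)
            if not matches:
                values.append(C[0, x, y])
    if len(values) != n_y:
        return None                      # state 0 is not output symmetric with |Z| = |Y|
    # Output symmetry: every row of phi holds each letter once; columns share one multiset.
    if any(sorted(phi[x]) != list(range(n_y)) for x in range(n_x)):
        return None
    if any(sorted(phi[:, y].tolist()) != sorted(phi[:, 0].tolist()) for y in range(n_y)):
        return None
    # Common permutation pattern: every state must be constant on each letter class.
    T = np.zeros((n_s, n_y))
    for s in range(n_s):
        for z in range(n_y):
            vals = C[s][phi == z]
            if np.ptp(vals) > tol:
                return None
            T[s, z] = vals[0]
    return phi, T

# The two-state example of (10), with arbitrary numbers a, b, c and a-bar, b-bar, c-bar.
a, b, c = 0.7, 0.2, 0.1
ab, bb, cb = 0.5, 0.3, 0.2
C = np.array([[[a, b, c], [b, c, a], [c, a, b]],
              [[ab, bb, cb], [bb, cb, ab], [cb, ab, bb]]])
phi, T = extract_phi_T(C)
print(phi)   # phi[0,0] == phi[1,2] == phi[2,1] == 0, matching phi(1,1)=phi(2,3)=phi(3,2)
print(T)     # rows (a, b, c) and (a-bar, b-bar, c-bar), as in (11)
```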
We can also show the following relations:
$$ p(s_n \mid z_n, x_n) = \frac{p(z_n \mid s_n, x_n)\, p(s_n \mid x_n)}{p(z_n \mid x_n)} = \frac{p(z_n \mid s_n)\, p(s_n)}{p(z_n)} = p(s_n \mid z_n), \qquad (13) $$
where the second equality follows from (9), (1) and (12);
$$ p(s_n \mid x_n, y_n) = p(s_n \mid x_n, z_n) = p(s_n \mid z_n), \quad z_n = \varphi(x_n, y_n); \qquad (14) $$
$$ p(s_n \mid z^n, x^n, y^n) = p(s_n \mid z^n, x^n) = p(s_n \mid z^n), \quad z_i = \varphi(x_i, y_i),\ i = 0, 1, \ldots, n, \qquad (15) $$
which indicate that $Z^n$ is a sufficient statistic for $S_n$ (a brute-force numerical check of the single-letter versions of these identities is sketched after this discussion).

Therefore, from the definition of the symmetric Markov channel we can identify a fundamental property of this channel: the existence of a random variable $Z_n = \varphi(X_n, Y_n)$ such that $Z_n$ is independent of $X_n$ and is a sufficient statistic for $S_n$. A Markov channel with this property is known as a variable noise channel [3]. Consequently, a symmetric Markov channel is a variable noise channel; the converse, however, need not hold.

Another class of Markov channels, the uniformly symmetric channels, is defined in [3] similarly to Definition 1, but without the requirement of a unique permutation pattern across states. This requirement is, however, implicitly assumed in the proof of [3, Lemma 5.2]. The second equality in [3, Eq. 96] holds only if the expression in the square bracket is independent of $x$, which is satisfied only if the channel has a unique permutation pattern. (This is needed so that the convex combination of the $C_s$, $s = 1, 2, \ldots, |S|$, with coefficients $\pi_n(k)$, $k = 1, 2, \ldots, |S|$, always has the same set of numbers in each row, i.e., for each $x$ the matrices must have a unique permutation pattern.)
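The single-letter identities (12) to (14) can be verified by brute force on the example of (10): build the joint distribution $p(s, x, y) = p(s)\,p(x)\,p_C(y \mid x, s)$, set $z = \varphi(x, y)$, and compare $p(z \mid x)$ with $p(z)$ and $p(s \mid x, z)$ with $p(s \mid z)$. A small self-contained sketch follows; the marginals $p(s)$ and $p(x)$ below are arbitrary choices.

```python
import numpy as np
from itertools import product

# Per-state tables of the example (10) and the map phi read off from C_1.
a, b, c = 0.7, 0.2, 0.1
ab, bb, cb = 0.5, 0.3, 0.2
C = np.array([[[a, b, c], [b, c, a], [c, a, b]],
              [[ab, bb, cb], [bb, cb, ab], [cb, ab, bb]]])
phi = np.array([[0, 1, 2],
                [1, 2, 0],
                [2, 0, 1]])           # phi[x, y]: which of a, b, c sits at (x, y)
p_s = np.array([0.4, 0.6])            # arbitrary state marginal
p_x = np.array([0.2, 0.5, 0.3])       # arbitrary input distribution, independent of the state

# Joint p(s, x, z) induced by p(s, x, y) = p(s) p(x) p_C(y | x, s) and z = phi(x, y).
p_sxz = np.zeros((2, 3, 3))
for s, x, y in product(range(2), range(3), range(3)):
    p_sxz[s, x, phi[x, y]] += p_s[s] * p_x[x] * C[s, x, y]

p_xz = p_sxz.sum(axis=0)              # p(x, z)
p_z = p_xz.sum(axis=0)                # p(z)
print(np.allclose(p_xz / p_x[:, None], p_z))                    # (12): p(z | x) = p(z)
p_s_given_xz = p_sxz / p_xz                                     # p(s | x, z)
p_s_given_z = p_sxz.sum(axis=1) / p_z                           # p(s | z)
print(np.allclose(p_s_given_xz, p_s_given_z[:, None, :]))       # (13), (14): p(s | x, z) = p(s | z)
```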

However, neither the uniformly symmetric assumption nor the variable noise assumption is shown to be sufficient for a unique permutation pattern. For example, by swapping the last two rows of the matrix $C_2$ in (10) we lose the unique permutation pattern, yet the channel is still uniformly symmetric. Therefore, although the class of uniformly symmetric variable noise channels is larger than the class of symmetric Markov channels (which are also variable noise channels), only the latter class obeys [3, Theorem 5.1], which relies on [3, Lemma 5.2]. In Section III we show the equivalence of the capacity formula of [3, Theorem 5.1] with the capacity of the symmetric Markov channel for the Gilbert-Elliott channel.

III. THE CAPACITY OF SYMMETRIC MARKOV CHANNELS

The objective of this section is to discuss the capacity of the Markov channel, defined as
$$ C = \lim_{n \to \infty} \max_{p(x^n)} \frac{1}{n} I(X^n; Y^n), \qquad (16) $$
where the maximization is over all probability distributions on the input $n$-sequences.

For an output symmetric discrete memoryless channel identified by a conditional probability matrix $\omega(y \mid x)$, there exist functions $\varphi(\cdot,\cdot): X \times Y \to Z$, with $|Z| = |Y|$, and $f: Z \to [0, 1]$ such that $f(\varphi(x, y)) = \omega(y \mid x)$. The random variable $Z = \varphi(X, Y)$ then has the conditional distribution $p(z \mid x) = f(z) = p(z)$, and
$$ H(Y \mid X) = -E[\log f(\varphi(X, Y))] = H(Z). \qquad (17) $$
Moreover, $H(Y \mid X = x) = H(Z)$ is the same for every $x$, so the expectation $H(Y \mid X)$ does not depend on the distribution $p(x)$. On the other hand, it can be shown that $H(Y)$ is maximized to $\log |Y|$ for this channel, so the capacity of the channel is
$$ C = \max_{p(x)} I(X; Y) = \max_{p(x)} H(Y) - H(Y \mid X) = \log |Y| - H(Z). \qquad (18) $$
The random variable $Z = \varphi(X, Y)$ represents the i.i.d. noise process of the channel, so the capacity is expressed through the entropy of the noise in a single use of the channel.

For a symmetric Markov channel, on the other hand, the process $\{Z_n\}_{n=0}^{\infty}$ defined by the random variables $Z_n = \varphi(X_n, Y_n)$ has memory of all its past samples. This noise process, being independent of the channel input, degrades the channel by its entropy, resulting in the information loss in the channel. In the following lemma we show that if the channel has i.i.d. inputs, the noise process can be characterized as a hidden Markov process.

Lemma 1: For a symmetric Markov channel $[X, Y, S, P, Z, T, \varphi]$ with i.i.d. inputs, the noise process is a hidden Markov process with parameters $[S, P, Z, T]$.

Proof: The state process of the Markov channel is a Markov process. It suffices to show that the random variable $Z = \varphi(X, Y)$ defined in the symmetric case makes the process $\{Z_n\}_{n=0}^{\infty}$ an observation of the state process through a memoryless channel, with the state being a sufficient statistic for past observations. From A2 and the fact that $z^{n-1}$ is a function of $(x^{n-1}, y^{n-1})$ we infer
$$ p(s_n \mid s_{n-1}, z^{n-1}) = p(s_n \mid s_{n-1}) = P[s_{n-1}, s_n], \qquad (19) $$
which establishes the sufficiency of the state for $z^{n-1}$ (viewed as the observation process of the corresponding hidden Markov model). On the other hand, from (5) we can write
$$ p(y_n, x_n \mid s_n, x^{n-1}, y^{n-1}) = p(y_n, x_n \mid s_n). \qquad (20) $$
Since $z$ is a function of $(x, y)$, this conditional independence implies
$$ p(y_n, x_n \mid s_n, z^{n-1}) = p(y_n, x_n \mid s_n), \qquad (21) $$
and consequently, from A4,
$$ p(z_n \mid s_n, z^{n-1}) = \sum_{x_n, y_n:\, z_n = \varphi(x_n, y_n)} p(y_n, x_n \mid s_n, z^{n-1}) = \sum_{x_n, y_n:\, z_n = \varphi(x_n, y_n)} p(y_n, x_n \mid s_n) = p(z_n \mid s_n) = T[s_n, z_n], \qquad (22) $$
which implies that the observations of the states are memoryless. (Note that without A4, $p(z_n \mid s_n)$ could vary with $n$, which would not match a hidden Markov model.)

According to Lemma 1, a subset of the parameters of a symmetric Markov channel defines a unique hidden Markov process, which is associated with the channel for any choice of i.i.d. inputs. In the following theorem we show that the capacity of the symmetric Markov channel can be expressed by the entropy rate of that unique hidden Markov process. The expression for the capacity does not require any further connection of that hidden Markov process to the input or the noise processes.

Theorem 1: The capacity of the symmetric Markov channel $[X, Y, S, P, Z, T, \varphi]$ is
$$ C = \log |Y| - \hat H(Z), \qquad (23) $$
where $\hat H(Z)$ is the entropy rate of the hidden Markov process $[S, P, Z, T]$.

Proof: From Definition 1, A1 and A'3, we first show that the uniform i.i.d. input distribution gives uniform marginal conditional output distributions $p(y_n \mid y^{n-1})$, so that $H(Y_n \mid Y^{n-1})$ is maximized to $\log |Y|$ for this input distribution. To see this, note that if the channel is output symmetric for every state and the marginal $p(x_n)$ is uniform, then
$$ p(y_n \mid s_n) = \sum_{x} p(y_n \mid s_n, x)\, p(x) = \frac{1}{|X|} \sum_{x} p(y_n \mid s_n, x) \qquad (24) $$
is independent of $y_n$, hence $p(y_n \mid s_n) = 1/|Y|$. Using the marginalization of (5), $p(y_n \mid s_n, y^{n-1}) = p(y_n \mid s_n)$, we obtain that for any choice of $y^{n-1}$, $p(y_n \mid y^{n-1})$ is uniform over $y_n$:
$$ p(y_n \mid y^{n-1}) = \sum_{s_n} p(y_n \mid s_n)\, p(s_n \mid y^{n-1}) = \frac{1}{|Y|}. \qquad (25) $$

Furthermore, for a symmetric Markov channel, $H(Y_n \mid X_n, X^{n-1}, Y^{n-1}) = H(Z_n \mid Z^{n-1})$, and this entropy is independent of the input distribution. This can be shown by defining
$$ C(x) \triangleq \sum_{y} \eta\Big(\sum_{s} p_C(y \mid x, s)\, p_s(s)\Big), \qquad (26) $$
where $\eta(t) = -t \log t$, and observing, from A4 and $p_C(y \mid x, s) = T[s, \varphi(x, y)]$, that $C(x)$ is independent of $x$ and equal to the constant
$$ C(x) = \sum_{z} \eta\Big(\sum_{s} p(z \mid s)\, p_s(s)\Big) \qquad (27) $$
for the same choice of $p_s(\cdot)$. Therefore the expectation of $C(x)$ (with respect to $x$) is independent of $p_x(\cdot)$ and depends only on $p_s(\cdot)$. This means that the following expression is independent of the marginal $p_{x_n}(x_n)$ and depends only on the distribution $p(s_n \mid x^{n-1}, y^{n-1})$:
$$ H(Y_n \mid X_n, X^{n-1}, Y^{n-1}) = E_{X^n, Y^{n-1}} \sum_{y_n} \eta\big(p(y_n \mid x_n, x^{n-1}, y^{n-1})\big) = E_{X^n, Y^{n-1}} \sum_{y_n} \eta\Big(\sum_{s_n} p_C(y_n \mid x_n, s_n)\, p(s_n \mid x^{n-1}, y^{n-1})\Big) = E_{X^{n-1}, Y^{n-1}} \sum_{z_n} \eta\Big(\sum_{s_n} p(z_n \mid s_n)\, p(s_n \mid x^{n-1}, y^{n-1})\Big), \qquad (28) $$
where the last equality is due to (27). But from A2 and (15),
$$ p(s_n \mid x^{n-1}, y^{n-1}) = \sum_{s_{n-1}} p(s_n \mid s_{n-1}, x^{n-1}, y^{n-1})\, p(s_{n-1} \mid z^{n-1}, x^{n-1}, y^{n-1}) = \sum_{s_{n-1}} p(s_n \mid s_{n-1})\, p(s_{n-1} \mid z^{n-1}) = p(s_n \mid z^{n-1}), \qquad (29) $$
and since $X_n$ is independent of both $S_n$ and $Z_n$, (29) is independent of the marginal $p_{x_n}(x_n)$. Using (29) in (28), we have shown that $H(Y_n \mid X_n, X^{n-1}, Y^{n-1}) = H(Z_n \mid Z^{n-1})$ is independent of $p_{x_n}(x_n)$.

Now, for the capacity of the channel we can write
$$ C = \lim_{n \to \infty} \max_{p(x^n)} \frac{1}{n} I(X^n; Y^n) = \lim_{n \to \infty} \max_{p(x^n)} \frac{1}{n} \sum_{i=1}^{n} \big[ H(Y_i \mid Y^{i-1}) - H(Y_i \mid X_i, X^{i-1}, Y^{i-1}) \big]. \qquad (30) $$
With the uniform i.i.d. input distribution $p(x^n)$, each term of the sum in (30) is maximized to $\log |Y| - H(Z_i \mid Z^{i-1})$. Therefore the summation is maximized by this input distribution to $n \log |Y| - H(Z^n)$. On the other hand, with i.i.d. inputs the term $\frac{1}{n} H(Z^n)$ converges, as $n \to \infty$, to the entropy rate of the hidden Markov process $[S, P, Z, T]$.

In the remainder of this section we compare the capacity formula (23) with previous results for a special case of the symmetric Markov channel, namely the Gilbert-Elliott channel.

A. Example: Gilbert-Elliott channel

The Gilbert-Elliott channel [9],[10] is a Markov channel $[X, Y, S, P, C]$ with $X = Y = S = \{0, 1\}$,
$$ P = \begin{pmatrix} 1-g & g \\ b & 1-b \end{pmatrix}, \quad 0 < g < 1,\ 0 < b < 1, \qquad (31) $$
and $C$ defined by the following two conditional probability matrices for the states $s = 0$ and $s = 1$:
$$ \begin{pmatrix} 1-p_G & p_G \\ p_G & 1-p_G \end{pmatrix},\ s = 0; \qquad \begin{pmatrix} 1-p_B & p_B \\ p_B & 1-p_B \end{pmatrix},\ s = 1. \qquad (32) $$
From (32) it is apparent that the Gilbert-Elliott channel is a symmetric Markov channel according to Definition 1. For the state $s = 0$, defining the function $f_0(\cdot)$ by $f_0(0) = 1 - p_G$, $f_0(1) = p_G$, we have $f_0(x \oplus y) = p_C(y \mid x, s = 0)$. Similarly, for $s = 1$, with $f_1(0) = 1 - p_B$, $f_1(1) = p_B$, we have $f_1(x \oplus y) = p_C(y \mid x, s = 1)$. Therefore the Gilbert-Elliott channel is a symmetric Markov channel $[X, Y, S, P, Z, T, \varphi]$, where $Z = \{0, 1\}$, $\varphi(X, Y) = X \oplus Y$, and
$$ T = \begin{pmatrix} 1-p_G & p_G \\ 1-p_B & p_B \end{pmatrix}. \qquad (33) $$
The channel is completely defined by the parameters $g$, $b$, $p_G$, $p_B$.

The capacity of the Gilbert-Elliott channel has been obtained in [5] as
$$ C = 1 - \int_{0}^{1} h_b(q)\, \psi(q)\, dq \qquad (34) $$
(or, in concise form, as $C = 1 - \lim_{n \to \infty} E[h_b(q_n)]$), where $h_b(\cdot)$ is the binary entropy function and $\psi(\cdot)$ is the limiting distribution of the continuous state space Markov process $\{q_n\}_{n=0}^{\infty}$ with transition probabilities
$$ \Pr(q_{n+1} = \alpha \mid q_n = \beta) = \begin{cases} 1 - \beta, & \alpha = v(0, \beta); \\ \beta, & \alpha = v(1, \beta), \end{cases} \qquad (35) $$
and initial distribution
$$ \Pr[q_0 = (g\, p_G + b\, p_B)/(g + b)] = 1. \qquad (36) $$
In (35) the function $v(z, q)$ is
$$ v(z, q) = \begin{cases} p_G + b(p_B - p_G) + \mu (q - p_G)\, \dfrac{1 - p_B}{1 - q}, & z = 0; \\[6pt] p_G + b(p_B - p_G) + \mu (q - p_G)\, \dfrac{p_B}{q}, & z = 1, \end{cases} \qquad (37) $$
where $\mu = 1 - g - b$. The expression (34) can be evaluated using Equations (35) to (37).
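One concrete way to evaluate (34) numerically is to simulate the chain $\{q_n\}$ defined by (35) to (37) and average $h_b(q_n)$ along the trajectory, so that $C \approx 1 - \frac{1}{N}\sum_n h_b(q_n)$. The minimal sketch below assumes the reconstruction of (35) to (37) given above, and uses, purely as an illustration, the parameter values of the example reported later in this section.

```python
import numpy as np

def h_b(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def v(z, q, g, b, pG, pB):
    """One-step update of q_n, per (37); g, b, pG, pB are used exactly as they appear there."""
    mu = 1.0 - g - b
    base = pG + b * (pB - pG)
    if z == 0:
        return base + mu * (q - pG) * (1.0 - pB) / (1.0 - q)
    return base + mu * (q - pG) * pB / q

def capacity_eq34(g, b, pG, pB, n_iter=200_000, burn_in=1_000, seed=0):
    """Monte Carlo estimate of (34): C = 1 - E[h_b(q)] under the limiting law of {q_n}."""
    rng = np.random.default_rng(seed)
    q = (g * pG + b * pB) / (g + b)          # initial value, per (36)
    acc = 0.0
    for n in range(n_iter):
        if n >= burn_in:
            acc += h_b(q)
        z = int(rng.random() < q)            # Pr(q_{n+1} = v(1, q_n)) = q_n, per (35)
        q = v(z, q, g, b, pG, pB)
    return 1.0 - acc / (n_iter - burn_in)

print(capacity_eq34(g=0.1, b=0.3, pG=0.2, pB=0.6))
```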
These equations are also the basis of a simple coin-tossing method for evaluating the capacity in [11].

Now we obtain the capacity of the Gilbert-Elliott channel based on Theorem 1. Equation (23) requires evaluation of the entropy rate of the hidden Markov process $[S, P, Z, T]$. According to [4], the entropy rate can be obtained as
$$ \hat H(Z) = \int_{\Delta} h(\xi T)\, \phi(\xi)\, d\xi, \qquad (38) $$
where $h(\cdot)$ is the entropy function over $\Delta$, the probability simplex in $R^2$, and $\phi$ is the limiting distribution of the information-state Markov process $\{\xi_n\}_{n=0}^{\infty}$, $\xi_n \in \Delta$.

Parameterizing $\xi$ by $\pi \in [0, 1]$, Equation (38) reduces to
$$ \hat H(Z) = \int_{0}^{1} h_b(q(\pi))\, \mu(\pi)\, d\pi, \qquad (39) $$
where $q(\pi) = p_G (1 - \pi) + p_B\, \pi$, and $\mu$ is the limiting distribution of the Markov process $\{\pi_n\}_{n=0}^{\infty}$, $\pi_n \in [0, 1]$, with transition probabilities
$$ \Pr(\pi_{n+1} = \alpha \mid \pi_n = \beta) = \begin{cases} 1 - q(\beta), & \alpha = f(0, \beta); \\ q(\beta), & \alpha = f(1, \beta), \end{cases} \qquad (40) $$
and initial distribution
$$ \Pr[\pi_0 = b/(g + b)] = 1. \qquad (41) $$
In (40) the function $f(\cdot,\cdot)$ is
$$ f(z, \pi) = \begin{cases} \dfrac{b(1-\pi)(1-p_G) + (1-g)\pi(1-p_B)}{1 - (1-\pi)p_G - \pi p_B}, & z = 0; \\[8pt] \dfrac{b(1-\pi)p_G + (1-g)\pi p_B}{(1-\pi)p_G + \pi p_B}, & z = 1. \end{cases} \qquad (42) $$
Using Equations (40) and (41), the limiting distribution $\mu(\cdot)$ can be obtained for sufficiently large $n$. Having obtained the distribution $\mu(\cdot)$, the capacity can be evaluated from (23) as
$$ C = 1 - \int_{0}^{1} h_b(q(\pi))\, \mu(\pi)\, d\pi. \qquad (43) $$
Numerical evaluation of (34) and (43) for different channels shows the equivalence of the two formulas. Figures 1 and 2 show the distributions $\psi$ and $\mu$ for the channel $g = 0.1$, $b = 0.3$, $p_G = 0.2$, $p_B = 0.6$ after a large number of iterations; both expressions (34) and (43) evaluate to $C = 0.1625$ bits.

Fig. 1. The distribution $\psi(\cdot)$ and the function $h_b(\cdot)$ in (34) for the example.

Fig. 2. The distribution $\mu(\cdot)$ and the function $h_b(q(\cdot))$ in (43) for the example.

Theorem 1 is also equivalent to [3, Theorem 5.1] for the capacity of symmetric Markov channels. This can be shown easily for the special case of the Gilbert-Elliott channel:
$$ C = 1 - \int \Big[ -\sum_{i=0}^{1} \log p(y = i \mid x = 0, \pi)\; p(y = i \mid x = 0, \pi) \Big] \mu(d\pi) = 1 - \int \Big[ -\sum_{i=0}^{1} \log p(z = i \mid \pi)\; p(z = i \mid \pi) \Big] \mu(d\pi) = 1 - \int_{0}^{1} h_b(q(\pi))\, \mu(\pi)\, d\pi = 1 - \hat H(Z). \qquad (44) $$
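The integral (38), and hence the capacity (23), can also be estimated without the closed form (42), by running the standard forward filter of the hidden Markov process $[S, P, Z, T]$ along a simulated noise sequence and averaging the entropy of the one-step predictive distribution $\xi_n T$. A minimal sketch, assuming $P$ and $T$ as reconstructed in (31) and (33); the function names are illustrative.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hmp_entropy_rate(P, T, n_iter=200_000, burn_in=1_000, seed=0):
    """Monte Carlo estimate of the entropy rate of the HMP [S, P, Z, T]:
    the average of H(xi_n T), where xi_n = p(s_n | z^{n-1}) is the information state."""
    rng = np.random.default_rng(seed)
    n_s = P.shape[0]
    # Stationary distribution of P, used both to start the chain and as xi_0.
    evals, evecs = np.linalg.eig(P.T)
    xi = np.real(evecs[:, np.argmax(np.real(evals))])
    xi = xi / xi.sum()
    s = rng.choice(n_s, p=xi)
    acc = 0.0
    for n in range(n_iter):
        if n >= burn_in:
            acc += entropy_bits(xi @ T)      # H(Z_n | Z^{n-1} = z^{n-1}) = h(xi_n T)
        z = rng.choice(T.shape[1], p=T[s])   # simulate the noise (observation) process
        s = rng.choice(n_s, p=P[s])
        xi = (xi * T[:, z]) @ P              # forward-filter update to p(s_{n+1} | z^n)
        xi = xi / xi.sum()
    return acc / (n_iter - burn_in)

# Gilbert-Elliott parameters plugged into P of (31) and T of (33).
g, b, pG, pB = 0.1, 0.3, 0.2, 0.6
P = np.array([[1 - g, g], [b, 1 - b]])
T = np.array([[1 - pG, pG], [1 - pB, pB]])
H_rate = hmp_entropy_rate(P, T)
print("entropy rate:", H_rate, " capacity via (23):", 1.0 - H_rate)   # log|Y| = 1 bit
```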
IV. CONCLUSION

In this paper we have presented a unified concept of symmetry for finite state Markov channels, based on the permutation-invariance property of the conditional probability tables of the channel, as a natural extension of symmetry for discrete memoryless channels. We showed that the capacity of the symmetric Markov channel can be expressed by the entropy rate of the noise process, a hidden Markov process associated with, and explicitly defined by, the symmetric Markov channel. Using recent expressions for the entropy rate of the hidden Markov process, we have shown that the resulting expression for the capacity of the symmetric Markov channel matches previous expressions obtained for special cases.

REFERENCES

[1] R. G. Gallager, Information Theory and Reliable Communication. New York: John Wiley, 1968.
[2] M. Rezaeian and A. Grant, "Computation of total capacity for discrete memoryless multiple-access channels," IEEE Trans. Inform. Theory, vol. 50, pp. 2779-2784, Nov. 2004.
[3] A. Goldsmith and P. Varaiya, "Capacity, mutual information, and coding for finite-state Markov channels," IEEE Trans. Inform. Theory, vol. 42, no. 3, pp. 868-886, May 1996.
[4] M. Rezaeian, "The entropy rate of the hidden Markov process," submitted to IEEE Trans. Inform. Theory, May 2005.
[5] M. Mushkin and I. Bar-David, "Capacity and coding for the Gilbert-Elliott channels," IEEE Trans. Inform. Theory, vol. 35, no. 6, pp. 1277-1290, Nov. 1989.
[6] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[7] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1518-1569, June 2002.
[8] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[9] E. N. Gilbert, "Capacity of a burst-noise channel," Bell Syst. Tech. J., vol. 39, pp. 1253-1265, Sept. 1960.
[10] E. O. Elliott, "Estimates of error rates for codes on burst-noise channels," Bell Syst. Tech. J., vol. 42, pp. 1977-1997, Sept. 1963.
[11] M. Rezaeian, "Computation of capacity for Gilbert-Elliott channels, using a statistical method," in Proceedings of the Sixth Australian Communications Theory Workshop, Brisbane, 2005, p. 52.