Computation of Total Capacity for Discrete Memoryless Multiple-Access Channels


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 11, NOVEMBER 2004

Computation of Total Capacity for Discrete Memoryless Multiple-Access Channels

Mohammad Rezaeian, Member, IEEE, and Alex Grant, Senior Member, IEEE

Abstract—The Arimoto–Blahut algorithm is generalized for computation of the total capacity of discrete memoryless multiple-access channels (MACs). In addition, a class of MACs is defined with the property that the uniform distribution achieves the total capacity. These results are based on the specialization of the Kuhn–Tucker condition for the total capacity of the MAC, and an extension of a known symmetry property for single-user channels.

Index Terms—Arimoto–Blahut algorithm, capacity, multiple-access channel (MAC), nonconvex optimization.

I. INTRODUCTION

Determination of the capacity region for multiterminal channels has attracted much attention in information theory. In many cases, single-letter representations of the capacity region are not known. Even in cases where a single-letter description has been found, such as the discrete memoryless multiple-access channel (MAC), evaluation of the capacity region is problematic. Specifically, computation of the boundary of the capacity region is a nonconvex optimization problem. In contrast, for single-user channels the capacity-achieving distribution is known in the case of symmetry, and in other cases channel capacity can be numerically approximated to arbitrary precision using the Arimoto–Blahut algorithm [1]–[3] or other numerical optimization procedures [4], [5]. Such techniques are still lacking for the MAC in its general form. A numerical method has been developed for two-user MACs with binary output [6]. Of particular interest for this correspondence is the computation of the total capacity C_total for the discrete memoryless MAC.
This is the solution of the following optimization problem:

C_total = max_{P_1(X_1)P_2(X_2)···P_M(X_M)} I(X_1, X_2, ..., X_M; Y).    (1)

The problem of capacity computation for a single-user channel is convex, and therefore the Kuhn–Tucker condition [7] is sufficient for a distribution to achieve capacity. For a MAC, this convexity is missing. Nevertheless, recent results [8] have shown that the Kuhn–Tucker condition is either sufficient for optimality in a MAC, or the channel can be decomposed into subchannels for which the Kuhn–Tucker condition is sufficient for optimality. In the latter case, at least one subchannel has an optimal distribution that achieves the capacity of the original channel. In light of the result reported in [8], this correspondence gives capacity computation methods for the MAC analogous to the methods used for single-user channels. Starting with an information function defined in Section II, a specialization of the Kuhn–Tucker condition for the maximization (1) is given in Section III. This allows the derivation of a generalized Arimoto–Blahut algorithm for total capacity, explained in Section IV. Section V defines a class of symmetric M-user discrete memoryless MACs for which the uniform distribution is the solution of (1). This is based upon a generalization of the well-known notion of a symmetric single-user channel [10, p. 94].

Manuscript received August 7, 2003; revised May 27, 2004. This work was supported by the Australian Government under ARC Grant DP. M. Rezaeian was with the Institute for Telecommunications Research, University of South Australia, Mawson Lakes SA 5095, Australia. He is now with the Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, Australia. A. Grant is with the Institute for Telecommunications Research, University of South Australia, Mawson Lakes SA 5095, Australia. Communicated by R. W. Yeung, Associate Editor for Shannon Theory. Digital Object Identifier /TIT
Furthermore, a subset of this class in fact exhibits a special symmetry property in the channel transition probability table.

Notation: P(Y) and P(Y|X) represent the probability distribution functions P_Y(·) and P_{Y|X}(·|·) (the latter being a conditional probability). Lowercase arguments denote that the variable has been set to a specific value in the corresponding function; e.g., P(Y|x) represents P_{Y|X}(·|x).

II. THE INFORMATION FUNCTION

A fundamental function, evaluating the information associated with individual elements x of the discrete support X of a random variable X in the context of the joint distribution P(X, Y), is

I(x; Y) = D(P(Y|x) || P(Y))    (2)

where D(·||·) is the Kullback–Leibler distance

D(P(Y) || P'(Y)) = Σ_y P(y) log ( P(y) / P'(y) ).

Based on this definition, mutual information and entropy are given by the following expectations, taken with respect to P(X):

I(X; Y) = E_X[I(x; Y)]    (3)
H(X) = E_X[I(x; X)]    (4)

where

I(x; X) = D(P(X|x) || P(X)) = −log P(x)

is the self-information associated with x. The function I(x;Y) measures the D-variation in the distribution on Y due to the revelation of x. Whereas a distribution P(Y) represents probabilistic knowledge of Y, its variation represents the information associated with a particular event. Gallager [10, p. 16] defines I(x;y) = log(P(x|y)/P(x)) as the mutual information and (2) as an average mutual information over Y (with respect to P(Y|x), rather than P(Y)). Now although I(x;y) is not always positive, I(x;Y) ≥ 0, and in light of the preceding discussion (2) will be referred to as the information function. The information function can be extended to a set of variables x_i ∈ X_i, namely

I(x_1, x_2, ..., x_M; Y) = D(P(Y | x_1, x_2, ..., x_M) || P(Y)).

For S ⊆ {1, 2, ..., M}, let x_S = {x_i : i ∈ S} and denote the marginalization of I(x_1, x_2, ..., x_M; Y) to S by

I_S(x_S; Y) = Σ_{x_S̄} P_S̄(x_S̄) I(x; Y)

where S̄ = {1, 2, ..., M} \ S. In the case of a singleton set, the notation will be abused to write I_m = I_{{m}}. These notations should not be confused with I(x_S; Y) = D(P(Y|x_S) || P(Y)).
Note that

I(X; Y) = E_X[I_S(x_S; Y)]    (5)
I(X_S; Y) = E_X[I(x_S; Y)].    (6)
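The definitions (2)–(4) translate directly into a few lines of code. The following sketch is my own illustration (not from the correspondence; function names are arbitrary) of the information function and the expectation (3) for a single-user channel given as a row-stochastic matrix, working in nats.

```python
import numpy as np

def information_function(Q, P):
    """I(x;Y) = D(P(Y|x) || P(Y)) of eq. (2), for every input letter x.

    Q: |X| x |Y| row-stochastic channel matrix, Q[x, y] = Q(y|x).
    P: input distribution over X.
    Returns the vector of I(x;Y) values in nats.
    """
    Py = P @ Q                                  # output distribution P(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(Q > 0, np.log(Q / Py), 0.0)
    return (Q * log_ratio).sum(axis=1)

def mutual_information(Q, P):
    """I(X;Y) = E_X[I(x;Y)], eq. (3)."""
    return P @ information_function(Q, P)
```

For a binary symmetric channel with crossover probability 0.1 and uniform input, I(x;Y) is the same for both letters, which is exactly the constancy that the Kuhn–Tucker specialization of Section III turns into an optimality test.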

The information function I(x_S; Y) depends on the joint distribution P(X_S, Y). For fixed conditional probability P(Y | X_S), the information function, and therefore the mutual information, is a functional of the joint distribution P(X_S). Each probability value P(x_S) can therefore be considered as a variable and the mutual information I(X_S; Y) as a function of these variables. Many information-theoretic problems are concerned with optimization of I(X_S; Y) over certain selections of the variables P(x_S). The first- and second-order derivatives of I(X_S; Y) are

∂I(X_S;Y)/∂P(x_S) = I(x_S;Y) − 1    (7)
∂²I(X_S;Y)/∂P(x_S)∂P(x'_S) = −Σ_y P(y|x_S)P(y|x'_S)/P(y).    (8)

If, however, the random variables X are mutually independent, with P(X) = Π_{m=1}^M P_m(x_m), then the mutual information I(X;Y) can be regarded as a function of the variables P_m(x_m) for x_m ∈ X_m, m ∈ {1, ..., M}, and it can be shown that

∂I(X;Y)/∂P_m(x_m) = I_m(x_m;Y) − 1    (9)
∂²I(X;Y)/∂P_m(x_m)∂P_m(x'_m) = −Σ_y P(y|x_m)P(y|x'_m)/P(y)    (10)
∂²I(X;Y)/∂P_m(x_m)∂P_{m'}(x_{m'}) = I_{{m,m'}}(x_m, x_{m'}; Y) − Σ_y P(y|x_m)P(y|x_{m'})/P(y),  m ≠ m'.    (11)

The derivatives (7) and (8) are along the lines of [10, eq. (4.5.5)], whereas the relations (9)–(11) appear to be new. The first-order derivative is used to obtain the Kuhn–Tucker condition for a distribution that maximizes mutual information. The second derivative can be used to obtain the Hessian matrix of the mutual information at any given distribution, which could be used for convexity analysis of the mutual information, although this approach is not pursued here. For a pair of random variables X, Y with fixed P(Y|X), the mutual information I(X;Y) is a concave function of the variables P(x). Therefore, a set of variables P(x) satisfying the Kuhn–Tucker condition is sufficient for maximization of the mutual information under the constraints Σ_x P(x) = 1 and P(x) ≥ 0.
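Since (9)–(11) are presented as new, a numerical sanity check is worthwhile. The sketch below is my own illustration (all names arbitrary): it verifies the first-order derivative (9) by central differences on a randomly generated two-user channel, treating each P_m(x_m) as a free variable of the formal function I(X;Y), as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random two-user channel: Q[x1, x2, y] = Q(y | x1, x2), rows summing to 1.
Q = rng.random((2, 3, 4))
Q /= Q.sum(axis=2, keepdims=True)
P1 = np.array([0.3, 0.7])
P2 = np.array([0.2, 0.5, 0.3])

def I_of(P1, P2):
    """I(X;Y) as a formal function of the entries of P1 and P2."""
    Px = np.einsum("i,j->ij", P1, P2)          # product input distribution
    Py = np.einsum("ij,ijy->y", Px, Q)         # output distribution
    return np.einsum("ij,ijy->", Px, Q * np.log(Q / Py))

def I_m(P1, P2):
    """I_1(x_1;Y): the information function marginalized to user 1."""
    Py = np.einsum("i,j,ijy->y", P1, P2, Q)
    return np.einsum("j,ijy->i", P2, Q * np.log(Q / Py))

# Central difference of I(X;Y) w.r.t. P1(0), all other entries held fixed.
eps = 1e-6
d = np.zeros(2)
d[0] = eps
numeric = (I_of(P1 + d, P2) - I_of(P1 - d, P2)) / (2 * eps)
analytic = I_m(P1, P2)[0] - 1.0                # eq. (9), in nats
```

The same objects also confirm the expectation identity (5) for the singleton set S = {1}: averaging I_1(x_1;Y) over P_1 recovers I(X;Y).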
Based on (7), for a pair of random variables X, Y, it is shown in [10, Theorem 4.5.1] that the Kuhn–Tucker condition is equivalent to the existence of a constant C such that

I(x; Y) = C,  P(x) > 0
I(x; Y) ≤ C,  P(x) = 0.    (12)

This specialization of the Kuhn–Tucker condition suggests that an observer Y can get maximum average information about X if each x for which P(x) > 0 has the same information for the observer. From (3), the unique number C satisfying (12) is the maximum of the mutual information I(X;Y). The condition (12) yielded a criterion for testing (and eventually an iterative algorithm for finding) the capacity-achieving distribution for single-user channels.

III. TOTAL CAPACITY

An M-user discrete memoryless MAC is defined by input alphabets X_m, m = 1, 2, ..., M, an output alphabet Y, and a conditional probability distribution Q(Y|X). Without loss of generality, assume that for each y ∈ Y there exists a set of values (x_1, x_2, ..., x_M) for which Q(y | x_1, x_2, ..., x_M) ≠ 0. A regular channel is a MAC satisfying |X_m| ≤ |Y| for all m. The total capacity of a MAC is the solution of (1), where the joint distribution is given by P_1(X_1)P_2(X_2)···P_M(X_M)Q(Y|X). Although the objective function is concave in the probability distribution of any one particular user, it is not concave over variations across users [6]. Despite this nonconvexity, it has been shown in [8] that for regular channels the Kuhn–Tucker condition is sufficient for a distribution to achieve the total capacity. Therefore, the total capacity of a regular channel can be obtained by finding an input distribution that satisfies the Kuhn–Tucker condition. Nonregular channels can be turned into regular channels via reduction of the input alphabets. Each regular channel resulting from deletion of particular letters from the input alphabets is called a sub-MAC. It has been shown in [8] that there exists a sub-MAC with capacity equal to that of the original channel. Therefore, the capacity of a MAC is the maximum of the capacities of all possible sub-MACs.
Note that reduction of the input alphabets cannot increase capacity. Since each sub-MAC is regular, its capacity can be obtained by finding a distribution that satisfies the Kuhn–Tucker condition. Accordingly, the discussion in this paper is restricted to regular channels. The following theorem, due to [8], states the necessary and sufficient¹ condition for a capacity-achieving distribution in a regular channel. To emphasize the similarity to the single-user case (12), the theorem has been reformulated in terms of the information function I_m(x_m;Y) defined in the previous section.

Theorem 1: For a regular MAC, a set of necessary and sufficient conditions for the input probability distributions P_m(X_m), m = 1, 2, ..., M, to achieve the total capacity is that there exists a number C such that for all m and x_m

I_m(x_m; Y) = C,  P_m(x_m) > 0
I_m(x_m; Y) ≤ C,  P_m(x_m) = 0.    (13)

Furthermore, the total capacity of the channel is C_total = C.

Proof: Application of the Kuhn–Tucker condition for the maximization of I(X;Y) subject to the constraints Σ_{x_m} P_m(x_m) = 1 and P_m(x_m) ≥ 0, m ∈ {1, ..., M}, shows that if the input distribution P*(x) = Π_{m=1}^M P*_m(x_m) achieves the total capacity, then there exist unique Lagrange multiplier vectors λ* = (λ_1, λ_2, ..., λ_M) and μ* = (μ_{1,1}, μ_{1,2}, ..., μ_{1,|X_1|}, μ_{2,1}, ..., μ_{M,|X_M|}) such that for all m and x_m

∂I(X;Y)/∂P_m(x_m) + λ_m + μ_{m,x_m} = 0
μ_{m,x_m} = 0,  P_m(x_m) > 0
μ_{m,x_m} ≥ 0,  P_m(x_m) = 0.

In other words, there exist numbers λ_m such that for all m and x_m

∂I(X;Y)/∂P_m(x_m) = −λ_m,  P_m(x_m) > 0
∂I(X;Y)/∂P_m(x_m) ≤ −λ_m,  P_m(x_m) = 0.

Since the channel is regular, according to [8] the above condition is also a sufficient condition for the global optimality of the distributions P_m(X_m).

¹Full details of the sufficiency proof in [8] can be found in [9].

Using (9), it follows that the set of distributions P_m(X_m) achieves the total capacity if and only if for all m and x_m

I_m(x_m; Y) = C_m,  P_m(x_m) > 0
I_m(x_m; Y) ≤ C_m,  P_m(x_m) = 0.    (14)

From (5), for any m,

Σ_{x_m} P_m(x_m) I_m(x_m; Y) = I(X;Y)

i.e., Σ_{x_m} P_m(x_m) C_m = C_m is invariant with m and is equal to the maximum of the mutual information. Therefore, in (14), C_m = C = C_total for all m.

Theorem 1 is a generalization of (12) for maximization of I(X;Y). It shows that an observer Y with sufficient domain of observation can get maximum average information about a set of independent sources X_1, X_2, ..., X_M if each probable outcome of each source gives the same average (over all possible outcomes of all other sources) information. An obvious avenue of generalization is to consider maximization of the conditional mutual informations I(X_S; Y | X_S̄), to obtain bounds on R(S) = Σ_{i∈S} R_i, S ⊆ {1, ..., M}. Let

I(x; Y|z) = D(P(Y | x, z) || P(Y|z))
I(x; Y|Z) = Σ_z P(z) I(x; Y|z).

Then for a regular MAC, a set of necessary, but not sufficient, conditions for the input distributions P_m(X_m) to maximize R(S) (over the space of input product distributions) is that there exists a number C such that for all m ∈ S and m' ∈ S̄

I_m(x_m; Y | X_S̄) = C,  P_m(x_m) > 0
I_m(x_m; Y | X_S̄) ≤ C,  P_m(x_m) = 0
I(X_S; Y | X_{S̄\{m'}}, x_{m'}) = C,  P_{m'}(x_{m'}) > 0
I(X_S; Y | X_{S̄\{m'}}, x_{m'}) ≤ C,  P_{m'}(x_{m'}) = 0.

In the case of a true maximum, max R(S) = C. The nonsufficiency results from the fact that although the Kuhn–Tucker conditions are necessary and sufficient for maximization of each I(X_S; Y | X_S̄ = x_S̄), this does not imply sufficiency for a convex combination Σ_{x_S̄} P(x_S̄) I(X_S; Y | X_S̄ = x_S̄). Theorem 1 cannot be used directly to obtain an analytical solution for maximization of I(X;Y). It can, however, be used to obtain an iterative solution for the total capacity of the MAC.
IV. ITERATIVE COMPUTATION OF THE TOTAL CAPACITY

In this section, a generalization of the Arimoto–Blahut algorithm [1], [2] for computation of the total capacity of regular MACs is presented. The Arimoto–Blahut algorithm gives a sequence of input probability distributions P^r(x), r = 0, 1, ..., that converges to a capacity-achieving distribution for a single-user channel. This sequence is defined by

P^{r+1}(x) = P^r(x) exp(I(x;Y)) / Σ_{x'} P^r(x') exp(I(x';Y))    (15)

for all x ∈ X, where P^0(x) ≠ 0 for all x. The generalization of the Arimoto–Blahut algorithm for computation of the total capacity of regular MACs is given by the following sequence of probability distributions in r = 0, 1, ..., for all m = 1, 2, ..., M and x_m ∈ X_m:

P^{r+1}_m(x_m) = P^r_m(x_m) exp(I_m(x_m;Y)) / Σ_{x'_m} P^r_m(x'_m) exp(I_m(x'_m;Y))    (16)

where P^0_m(x_m) ≠ 0 for all m and x_m. The probability distribution for user m is calculated based on the updated probability distributions for users 1 to m−1. For M = 1, this reduces to the usual Arimoto–Blahut algorithm (15).

Theorem 2: For a regular MAC, the sequence (16) converges to a total-capacity-achieving distribution.

The theorem is established by showing that the sequence (16) converges, and that any convergence point satisfies (13). Since mutual information is finite, the convergence of the sequence can be verified by showing that the sequence is nondecreasing in mutual information, which will be shown in Lemma 1. The characterization of the convergence point of the sequence is given in Lemma 2, which is actually applicable to a wider class of algorithms, where exp in (16) can be replaced by an arbitrary monotonic function. Thus, Lemmas 1 and 2 establish Theorem 2.

Lemma 1: The sequence (16) is nondecreasing in mutual information.

Proof: The sequence (16) has two nested loops. The outer loop is over m, i.e., different users. For each user, the inner loop is over the letters x_m ∈ X_m. It suffices to show that for each user m, updating the probabilities according to (16) does not decrease the mutual information.
For fixed m, let

P(x) = Π_n P_n(x_n)   and   P'(x) = P'_m(x_m) Π_{n≠m} P_n(x_n)

be two input product distributions with the same marginals for all users, except possibly for user m. Given the channel transition matrix Q(y|x), define

S(x; P') = Σ_y Q(y|x) log ( Q(y|x) Π_n P'_n(x_n) / Σ_{x'} Q(y|x') Π_n P'_n(x'_n) )    (17)
S'_m(x_m; P') = Σ_{x_m̄} S(x; P') Π_{n≠m} P'_n(x_n)    (18)
J(P, P') = Σ_x S(x; P') Π_n P_n(x_n) + Σ_n H(X_n)    (19)

where H(X_n) is computed with P_n. Note that J(P, P) = I(X;Y), where the mutual information is with the distribution P. Moreover,

S'_m(x_m; P) = log P_m(x_m) + I_m(x_m;Y) − Σ_{n≠m} H(X_n)    (20)

where I_m(x_m;Y) is with the distribution P. For fixed P', J(P, P') is a concave function of P. This is because H(X_m) is a concave function of P_m, which is a marginal of P, and the first term on the right-hand side of (19) is a linear function of P_m. Therefore, the solution of ∇_P J(P, P') = 0 maximizes J(P, P') for fixed P'.

Impose the constraint Σ_{x_m} P_m(x_m) = 1 by substituting P_m(x*_m) = 1 − Σ_{x_m≠x*_m} P_m(x_m), where x*_m is any selected element of X_m. Then P_m is a function of |X_m| − 1 variables, and

∂J(P, P')/∂P_m(x_m) = S'_m(x_m; P') − S'_m(x*_m; P') + log ( P_m(x*_m) / P_m(x_m) )

for x_m ≠ x*_m. Therefore, the solution of ∇_{P_m} J(P_m, P') = 0 is given by

log P_m(x_m) − S'_m(x_m; P') = c_m,  for all x_m ≠ x*_m.

Thus,

P_m(x_m) = exp(S'_m(x_m; P')) / Σ_{x'_m} exp(S'_m(x'_m; P'))

for all x_m ≠ x*_m, and from (20)

P_m(x_m) = P'_m(x_m) exp( I_m(x_m;Y) − Σ_{n≠m} H(X_n) ) / Σ_{x'_m} P'_m(x'_m) exp( I_m(x'_m;Y) − Σ_{n≠m} H(X_n) )

which reduces to

P_m(x_m) = P'_m(x_m) exp(I_m(x_m;Y)) / Σ_{x'_m} P'_m(x'_m) exp(I_m(x'_m;Y)).    (21)

Therefore, for fixed P', the function J(P, P') is maximized for P_m selected according to (21). Denoting by P* the distribution P with this choice of P_m,

I(P') = J(P', P') ≤ J(P*, P').    (22)

On the other hand, using an inequality similar to that used in [2],

J(P, P') ≤ J(P, P) = I(P)    (23)

where P'_m ≠ P_m. The two inequalities (22) and (23) now show that updating the distribution of user m from P' to P* according to (21) does not decrease the mutual information.

Now define the sequence of input probability distributions P^r_m(x_m), r = 0, 1, ...,

P^{r+1}_m(x_m) = P^r_m(x_m) f(I_m(x_m;Y)) / Σ_{x'_m} P^r_m(x'_m) f(I_m(x'_m;Y))    (24)

for all m = 1, 2, ..., M and x_m ∈ X_m, where f is any continuous, monotonically increasing, positive function over ℝ+. Also, assume that I_m(x_m;Y) is calculated as in (16) and that P^0_m(x_m) ≠ 0 for all m and x_m. The sequence (24) is defined to emphasize that the key property required for the following lemma is monotonicity, rather than special properties of the exponent in (16).

Lemma 2: For a regular MAC, the convergence point of the sequence P^r in (24), if it exists, achieves the total capacity.

Proof: Suppose that for each m and x_m, the sequence P^r_m(x_m), r → ∞, converges to P*_m(x_m). It will be shown that the set of probabilities P*_m(x_m) satisfies both of the conditions in (13).
From the sequence P^r_m(x_m), define the sequence

D^r_m(x_m) = f(I_m(x_m;Y)) / Σ_{x'_m} P^r_m(x'_m) f(I_m(x'_m;Y)).

Convergence of P^r_m(x_m) implies that lim_{r→∞} D^r_m(x_m) = 1 for all m and x_m such that P*_m(x_m) ≠ 0. The reason is that for r → ∞

P^{r+1}_m(x_m) = P^r_m(x_m) = P*_m(x_m).

Denoting by I*_m(x_m;Y) the value of I_m(x_m;Y) under the distribution P*,

f(I*_m(x_m;Y)) = Σ_{x'_m} P*_m(x'_m) f(I*_m(x'_m;Y))

subject to P*_m(x_m) ≠ 0, which in turn implies f(I*_m(x_m;Y)) = C_m for all x_m such that P*_m(x_m) ≠ 0, for some constant C_m. Since f is monotonic, this implies the first condition of (13). The next step is to show that P* not satisfying the second condition of (13) contradicts the convergence of P^r → P*. Now

P^r_m(x_m) = P^0_m(x_m) Π_{n=0}^{r−1} D^n_m(x_m).    (25)

If P* does not satisfy the second condition of (13), then since f is assumed to be monotonically increasing,

f(I*_m(x_m;Y)) > Σ_{x'_m} P*_m(x'_m) f(I*_m(x'_m;Y))

for some m and x_m such that P*_m(x_m) = 0. This means that the sequence D^r_m(x_m) for some m and x_m has converged to a value greater than one. This, along with the assumption P^0_m(x_m) > 0, contradicts the convergence of the sequence P^r_m(x_m) as a sequence of partial products in (25).

As the proof shows, the requirement that the initial probabilities be nonzero for all letters of all users is essential for convergence to a Kuhn–Tucker point: a starting zero probability for a letter remains zero forever. Theorem 2 is now established by Lemmas 1 and 2. From Theorem 1 and Lemma 2 it can be seen that if in (24), for a function f, a number m, and each value of x_m, the variation of I_m(x_m;Y) along the sequence P^r tends to zero, then the total capacity of the channel is C_total = I_m(x_m;Y) for x_m such that P(x_m) ≠ 0. Therefore, in a generalized algorithm based on (24), convergence of the algorithm can be used as a stopping criterion, and nonconvergence after a certain number of iterations as an indication to restart from a new initialization.
Under a general selection of f in Lemma 2, the algorithm converges only to the optimal probability distribution (if it converges). Arbitrary selection of f does not guarantee that the algorithm will always converge; such a guarantee required special properties of the exponential function in Lemma 1. Numerical investigations show, however, that the sequence almost always converges for a wide range of choices of the function f, and that it typically converges faster than with f = exp. Fig. 1 shows typical results for a two-user channel with X_1 = X_2 = {0, 1}, Y = {0, 1, 2}, and transition matrix

0.2 0.3 0.5
0.7 0.2 0.1
0.5 0.1 0.4
0.3 0.4 0.3

(rows are inputs, naturally ordered). The initial distributions P(X_1 = 0) = 0.3 and P(X_2 = 0) = 0.6 were used in the example. The figure shows the sequence (16) for various choices of f, namely, f(x) = e^x, f(x) = √x, f(x) = x, and f(x) = e^x − 1 (the latter two result in almost identical sequences). The results of Fig. 1 seem to be representative, in that similar behavior was observed, in terms of relative convergence speed, for many other (albeit arbitrarily chosen) channels (with differing input/output alphabet sizes).
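The iteration (16) on this two-user example can be sketched in a few lines. The following is my own illustration, not the authors' code; the function f is a parameter, so the generalized sequence (24) can be tried as well, and a fixed iteration count stands in for a proper convergence test.

```python
import numpy as np

# Two-user example channel from the text: X1 = X2 = {0,1}, Y = {0,1,2}.
# Rows of the printed 4x3 matrix are the input pairs (0,0),(0,1),(1,0),(1,1),
# so reshaping gives Q[x1, x2, y] = Q(y | x1, x2).
Q = np.array([[0.2, 0.3, 0.5],
              [0.7, 0.2, 0.1],
              [0.5, 0.1, 0.4],
              [0.3, 0.4, 0.3]]).reshape(2, 2, 3)

def total_capacity(Q, f=np.exp, iters=2000):
    """Generalized Arimoto-Blahut iteration (16)/(24) for a two-user MAC.

    Returns the final marginals and the achieved sum rate in nats.
    Assumes Q > 0 entrywise so all logarithms are finite.
    """
    P1 = np.array([0.3, 0.7])          # initial distributions from the text
    P2 = np.array([0.6, 0.4])
    for _ in range(iters):
        for m in (0, 1):               # user m sees the updated users before it
            Py = np.einsum("i,j,ijy->y", P1, P2, Q)       # output law
            D = (Q * np.log(Q / Py)).sum(axis=2)          # I(x1,x2;Y)
            if m == 0:
                w = P1 * f(D @ P2)     # reweight by f(I_1(x_1;Y))
                P1 = w / w.sum()
            else:
                w = P2 * f(P1 @ D)     # reweight by f(I_2(x_2;Y))
                P2 = w / w.sum()
    Py = np.einsum("i,j,ijy->y", P1, P2, Q)
    D = (Q * np.log(Q / Py)).sum(axis=2)
    return P1, P2, float(np.einsum("i,j,ij->", P1, P2, D))

P1, P2, C = total_capacity(Q)
```

Passing f=np.sqrt or f=lambda t: t exercises the generalized sequence (24); in line with the observations around Fig. 1, these choices tend to reach the same fixed point.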

Fig. 1. Typical algorithm convergence for different functions f(x).

Fig. 2. Values of (p_1, p_2) resulting in optimality of the uniform distribution for the channel (27).

An explanation for the convergence of (24) is as follows. At each iteration, the algorithm updates the probability distributions by giving more probability to letters that have a larger value of the information function. Increasing the probability of high-information letters has two effects. First, it increases the average information I(X;Y). On the other hand, the increased probability decreases the information of those letters, thus balancing the information function toward a constant, a requirement for the maximizing distribution. Conversely, the reduction in probability of low-information letters acts in favor of increasing the information of those letters. This acts toward a balance of information between letters, unless the probability of such low-information letters reaches zero, i.e., the second condition of (13) is satisfied. An interesting open problem is to find a better characterization of the monotonic functions f in (24) for which convergence always occurs, i.e., to find an analog of Lemma 1 for functions other than exp. The numerical results provide strong motivation for further investigation in that direction.

V. SYMMETRIC CHANNELS

A single-user discrete memoryless channel (DMC) is defined by an input alphabet X, an output alphabet Y, and a conditional probability distribution Q(Y|X). The Kuhn–Tucker condition [10, Theorem 4.5.1] implies that an input distribution P(X) is optimal if there exists a constant C such that (12) is satisfied. This shows that the set of all channels with a specific nonboundary distribution (a distribution with nonzero probability for all letters) P*(x) as optimal consists of the channels for which the information function I(x;Y) under the distribution P*(X) is constant for all x.
Of particular interest is the class of channels for which the uniform distribution U(x) = 1/|X| is the optimal input distribution. For such channels, the function

T(x) = Σ_{y∈Y} Q(y|x) log ( Q(y|x) / Σ_{x'∈X} Q(y|x') ) = I(x;Y) − log|X|    (26)

needs to be constant. The term I(x;Y) here is the information function for the uniform distribution. For the class of symmetric channels identified in Shannon's original work [11, Sec. 16] and further developed in [10, Sec. 4.5], it is known that the uniform distribution is optimal. The proof of this result, however, rests only on the fact that for such channels I(x;Y) is constant. This property is equivalent to satisfaction of the Kuhn–Tucker condition for the uniform distribution. Thus, the whole class of channels with an optimal uniform distribution is identified by uniformity of T(x), instead of permutation invariance in the channel transition probability table Q(Y|X). Note that T(x) is a function of this table. It could therefore be more natural to define a concept of symmetry by uniformity of T(x). This property for a channel is equivalent to optimality of the uniform input distribution. The purpose of this section is to extend this new definition of symmetry to the MAC. To distinguish it from the previous notions of symmetry, this definition will be referred to as T-symmetry.

Definition 1 (T-Symmetric DMC): A DMC is T-symmetric if the function T defined by (26) is constant.

Channels with row and column permutation symmetry in the transition probability matrix (i.e., symmetric according to [10, p. 94]) are T-symmetric, and it is well known that they are optimized by the uniform input distribution. It is possible, however, to construct nontrivial examples of T-symmetric channels that are not symmetric according to [10] (even allowing for partitions of the matrix). One such example is the channel defined by the conditional probability table

1/3   1/3   1/3
p_1   p_2   1−p_1−p_2    (27)

with p_1 = p_2 = 1/6.
Direct calculation shows that the uniform distribution satisfies (12) for this channel. Furthermore, this channel is not some isolated special case. In fact, infinitely many channels may be constructed with constant T(x) that are not symmetric according to [10]. As a demonstration, Fig. 2 shows the locus in the (p_1, p_2) plane of all channels of the form (27) having the uniform input distribution as optimal.

Theorem 1 shows that the set of all regular MACs with a given set of nonboundary distributions P*_m(x_m) as the optimal distribution consists of the channels for which I_m(x_m;Y) under that distribution is constant. Of special interest is the class of MACs for which the uniform input distribution for each user is optimal. The T-symmetric single-user channel of Definition 1 is the special case of this class for single-user channels. The class of T-symmetric MACs will be defined based on a function of the channel conditional probabilities. For a MAC Q(y | x_1, x_2, ..., x_M), define the function T(x) and its marginalization T_m(x_m) to user m by

T(x) = Σ_y Q(y|x) log ( Q(y|x) / Σ_{x'} Q(y|x') )    (28)
T_m(x_m) = Σ_{x_m̄} T(x).    (29)

Definition 2 (T-Symmetric MAC): A MAC is T-symmetric if, for all m, the functions T_m defined by (29) are constant.
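The claim about (27) is easy to verify numerically. The sketch below is my own check (natural logarithms assumed): it evaluates T(x) from (26) and confirms that it is constant for p_1 = p_2 = 1/6, even though the matrix has no row/column permutation symmetry.

```python
import numpy as np

def T(Q):
    """T(x) from eq. (26): sum_y Q(y|x) log( Q(y|x) / sum_{x'} Q(y|x') )."""
    col = Q.sum(axis=0)                       # column sums over the inputs
    with np.errstate(divide="ignore", invalid="ignore"):
        L = np.where(Q > 0, np.log(Q / col), 0.0)
    return (Q * L).sum(axis=1)

# Channel (27) with p1 = p2 = 1/6: not permutation-symmetric in the sense
# of Gallager, yet T(x) is constant, so the uniform input is optimal.
Q27 = np.array([[1/3, 1/3, 1/3],
                [1/6, 1/6, 2/3]])
t = T(Q27)
```

Both entries of t equal (2/3)log(2/3) + (1/3)log(1/3) ≈ −0.6365 nats, and by (26) the capacity of this channel is t[0] + log 2 ≈ 0.0566 nats, achieved by the uniform input.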

It can be inferred that for T-symmetric channels the dependency of T_m(x_m) on m can only be of the form T_m(x_m) = C / |X_m|, where C is a constant. Using Theorem 1, one can show that the T-symmetry property of a regular channel is equivalent to the property that the uniform distribution for all users is the capacity-achieving distribution. In fact, for the uniform input distribution,

I(x; Y) = T(x) + Σ_m log|X_m|

and

I_m(x_m; Y) = Π_{l≠m} (1/|X_l|) · T_m(x_m) + Σ_m log|X_m|.    (30)

If T_m(x_m) is independent of x_m, then so is I_m(x_m;Y) = C_m. Since I(X;Y) = Σ_{x_m} I_m(x_m;Y)/|X_m| = C_m, the constant C_m is independent of m, so I_m(x_m;Y) = C, which implies that T-symmetry results in (13) for uniform distributions. Conversely, if for the uniform input distribution I_m(x_m;Y) = C, then from (30), T_m(x_m) can only be independent of x_m. Equation (30) shows that the total capacity of a regular T-symmetric channel is

C_total = T_m(x_m) / Π_{l≠m} |X_l| + Σ_l log|X_l|    (31)

for an arbitrary m and x_m.

Finally, a subset of T-symmetric channels can be identified according to a certain symmetry property of their transition probabilities.

Definition 3 (Symmetric MAC): A MAC with identical² input alphabets X is symmetric if each row permutation of the MAC transition probability matrix corresponding to transposing source letters a, b ∈ X for all users is the same as a column permutation.

²Obviously, the equality of the alphabet sets is not important. It is sufficient that the alphabet sets can be mapped to an identical set.

The specialization of the condition in Definition 3 to single-user channels gives a subset of the condition in [10] for a symmetric single-user channel.

Theorem 3: The uniform input distribution achieves the total capacity of a regular symmetric MAC.

Proof: According to the preceding discussion, it suffices to show that any symmetric MAC is T-symmetric. Define the operation π_{a,b}(i_1, i_2, ..., i_M) that swaps a and b in its argument, e.g.,

π_{2,3}(1, 3, 4, 2, 3) = (1, 2, 4, 3, 2).

For the symmetric MAC, the row permutation π_{a,b} coincides with a column permutation σ of the outputs, so

T(π_{a,b}(x)) = Σ_y Q(y | π_{a,b}(x)) log ( Q(y | π_{a,b}(x)) / Σ_{x'} Q(y | x') ) = Σ_y Q(σ(y) | x) log ( Q(σ(y) | x) / Σ_{x'} Q(σ(y) | x') ) = T(x).

Consequently, for the uniform weighting in (29),

T_m(a) = Σ_{x_m̄} T(x; x_m = a) = Σ_{x_m̄} T(π_{a,b}(x; x_m = a)) = Σ_{x_m̄} T(x; x_m = b) = T_m(b)

which means the channel is T-symmetric.

One example of a symmetric MAC is the on–off fading MAC defined in [12]. This channel is a regular MAC, and it can be shown that in the special case of unique outputs [12], the channel is symmetric according to Definition 3. Therefore, the uniform distribution achieves capacity for this channel.

VI. CONCLUSION

The Arimoto–Blahut algorithm has been extended for computation of the total capacity of regular MACs. The total capacity of a general MAC can be broken down into capacity computations for a number of regular MACs. Moreover, a class of regular T-symmetric discrete memoryless MACs has been defined, with the property that the uniform input distribution for each user achieves the total capacity. Specialization of this class to the single-user case gives an extended concept of symmetry, namely T-symmetry, which identifies the class of all DMCs with uniform optimum input distribution. A subclass of MACs was also identified in which symmetry is a direct result of a transposition symmetry in the channel transition probabilities.

REFERENCES

[1] S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Trans. Inform. Theory, vol. IT-18, pp. 14-20, Jan. 1972.
[2] R. E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-18, pp. 460-473, July 1972.
[3] R. W. Yeung, A First Course in Information Theory. New York: Kluwer Academic/Plenum, 2002.
[4] M. Chiang and S. Boyd, "Geometric programming duals of channel capacity and rate distortion," IEEE Trans. Inform. Theory, vol. 50, Feb. 2004.
[5] J. Huang and S. Meyn, "Characterization and computation of optimal distributions for channel coding," in Proc. 37th Conf. Information Sciences and Systems, Baltimore, MD, Mar. 2003.
[6] Y. Watanabe, "The total capacity of two-user multiple-access channel with binary output," IEEE Trans. Inform. Theory, vol. 42, Sept. 1996.
[7] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[8] Y. Watanabe and K. Kamoi, "The total capacity of multiple access channel," in Proc. IEEE Int. Symp. Information Theory, Lausanne, Switzerland, June/July 2002.
[9] Y. Watanabe and K. Kamoi, "The total capacity of multiple access channel," IEEE Trans. Inform. Theory, submitted for publication.
[10] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[11] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, 1948.
[12] E. Perron, M. Rezaeian, and A. Grant, "The on-off fading channel," in Proc. IEEE Int. Symp. Information Theory, Yokohama, Japan, June/July 2003.


CPS 616 ITERATIVE IMPROVEMENTS 10-1 CPS 66 ITERATIVE IMPROVEMENTS 0 - APPROACH Algorithm design technique for solving optimization problems Start with a feasible solution Repeat the following step until no improvement can be found: change

More information

On Maximizing the Second Smallest Eigenvalue of a State-dependent Graph Laplacian

On Maximizing the Second Smallest Eigenvalue of a State-dependent Graph Laplacian 25 American Control Conference June 8-, 25. Portland, OR, USA WeA3.6 On Maimizing the Second Smallest Eigenvalue of a State-dependent Graph Laplacian Yoonsoo Kim and Mehran Mesbahi Abstract We consider

More information

ANALYTIC CENTER CUTTING PLANE METHODS FOR VARIATIONAL INEQUALITIES OVER CONVEX BODIES

ANALYTIC CENTER CUTTING PLANE METHODS FOR VARIATIONAL INEQUALITIES OVER CONVEX BODIES ANALYI ENER UING PLANE MEHODS OR VARIAIONAL INEQUALIIES OVER ONVE BODIES Renin Zen School of Mathematical Sciences honqin Normal Universit honqin hina ABSRA An analtic center cuttin plane method is an

More information

Perturbation Theory for Variational Inference

Perturbation Theory for Variational Inference Perturbation heor for Variational Inference Manfred Opper U Berlin Marco Fraccaro echnical Universit of Denmark Ulrich Paquet Apple Ale Susemihl U Berlin Ole Winther echnical Universit of Denmark Abstract

More information

Sample-based Optimal Transport and Barycenter Problems

Sample-based Optimal Transport and Barycenter Problems Sample-based Optimal Transport and Barcenter Problems MAX UANG New York Universit, Courant Institute of Mathematical Sciences AND ESTEBAN G. TABA New York Universit, Courant Institute of Mathematical Sciences

More information

CONTINUOUS SPATIAL DATA ANALYSIS

CONTINUOUS SPATIAL DATA ANALYSIS CONTINUOUS SPATIAL DATA ANALSIS 1. Overview of Spatial Stochastic Processes The ke difference between continuous spatial data and point patterns is that there is now assumed to be a meaningful value, s

More information

EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES

EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES 1 EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES CHANDRA NAIR 1 1 Dept. of Information Engineering, The Chinese Universit of Hong Kong Abstract We derive alternate characterizations

More information

Hamiltonicity and Fault Tolerance

Hamiltonicity and Fault Tolerance Hamiltonicit and Fault Tolerance in the k-ar n-cube B Clifford R. Haithcock Portland State Universit Department of Mathematics and Statistics 006 In partial fulfillment of the requirements of the degree

More information

Chapter 11 Optimization with Equality Constraints

Chapter 11 Optimization with Equality Constraints Ch. - Optimization with Equalit Constraints Chapter Optimization with Equalit Constraints Albert William Tucker 95-995 arold William Kuhn 95 oseph-ouis Giuseppe odovico comte de arane 76-. General roblem

More information

Symmetric Characterization of Finite State Markov Channels

Symmetric Characterization of Finite State Markov Channels Symmetric Characterization of Finite State Markov Channels Mohammad Rezaeian Department of Electrical and Electronic Eng. The University of Melbourne Victoria, 31, Australia Email: rezaeian@unimelb.edu.au

More information

Upper Bounds on the Capacity of Binary Intermittent Communication

Upper Bounds on the Capacity of Binary Intermittent Communication Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,

More information

Ordinal One-Switch Utility Functions

Ordinal One-Switch Utility Functions Ordinal One-Switch Utilit Functions Ali E. Abbas Universit of Southern California, Los Angeles, California 90089, aliabbas@usc.edu David E. Bell Harvard Business School, Boston, Massachusetts 0163, dbell@hbs.edu

More information

Glossary. Also available at BigIdeasMath.com: multi-language glossary vocabulary flash cards. An equation that contains an absolute value expression

Glossary. Also available at BigIdeasMath.com: multi-language glossary vocabulary flash cards. An equation that contains an absolute value expression Glossar This student friendl glossar is designed to be a reference for ke vocabular, properties, and mathematical terms. Several of the entries include a short eample to aid our understanding of important

More information

Finding Limits Graphically and Numerically. An Introduction to Limits

Finding Limits Graphically and Numerically. An Introduction to Limits 60_00.qd //0 :05 PM Page 8 8 CHAPTER Limits and Their Properties Section. Finding Limits Graphicall and Numericall Estimate a it using a numerical or graphical approach. Learn different was that a it can

More information

NUMERICAL COMPUTATION OF THE CAPACITY OF CONTINUOUS MEMORYLESS CHANNELS

NUMERICAL COMPUTATION OF THE CAPACITY OF CONTINUOUS MEMORYLESS CHANNELS NUMERICAL COMPUTATION OF THE CAPACITY OF CONTINUOUS MEMORYLESS CHANNELS Justin Dauwels Dept. of Information Technology and Electrical Engineering ETH, CH-8092 Zürich, Switzerland dauwels@isi.ee.ethz.ch

More information

Communication Limits with Low Precision Analog-to-Digital Conversion at the Receiver

Communication Limits with Low Precision Analog-to-Digital Conversion at the Receiver This full tet paper was peer reviewed at the direction of IEEE Communications Society subject matter eperts for publication in the ICC 7 proceedings. Communication Limits with Low Precision Analog-to-Digital

More information

Finding Limits Graphically and Numerically. An Introduction to Limits

Finding Limits Graphically and Numerically. An Introduction to Limits 8 CHAPTER Limits and Their Properties Section Finding Limits Graphicall and Numericall Estimate a it using a numerical or graphical approach Learn different was that a it can fail to eist Stud and use

More information

On the Spectral Theory of Operator Pencils in a Hilbert Space

On the Spectral Theory of Operator Pencils in a Hilbert Space Journal of Nonlinear Mathematical Phsics ISSN: 1402-9251 Print 1776-0852 Online Journal homepage: http://www.tandfonline.com/loi/tnmp20 On the Spectral Theor of Operator Pencils in a Hilbert Space Roman

More information

UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS )

UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS ) UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS ) CHAPTER : Limits and continuit of functions in R n. -. Sketch the following subsets of R. Sketch their boundar and the interior. Stud

More information

1.3 LIMITS AT INFINITY; END BEHAVIOR OF A FUNCTION

1.3 LIMITS AT INFINITY; END BEHAVIOR OF A FUNCTION . Limits at Infinit; End Behavior of a Function 89. LIMITS AT INFINITY; END BEHAVIOR OF A FUNCTION Up to now we have been concerned with its that describe the behavior of a function f) as approaches some

More information

On MIMO Fading Channels with Side Information at the Transmitter

On MIMO Fading Channels with Side Information at the Transmitter On MIMO Fading Channels with ide Information at the Transmitter EE850 Project, Ma 005 Tairan Wang twang@ece.umn.edu, ID: 340947 Abstract Transmission of information over a MIMO discrete-time flat fading

More information

Capacity Upper Bounds for the Deletion Channel

Capacity Upper Bounds for the Deletion Channel Capacity Upper Bounds for the Deletion Channel Suhas Diggavi, Michael Mitzenmacher, and Henry D. Pfister School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland Email: suhas.diggavi@epfl.ch

More information

Journal of Inequalities in Pure and Applied Mathematics

Journal of Inequalities in Pure and Applied Mathematics Journal of Inequalities in Pure and Applied Mathematics http://jipam.vu.edu.au/ Volume 2, Issue 2, Article 15, 21 ON SOME FUNDAMENTAL INTEGRAL INEQUALITIES AND THEIR DISCRETE ANALOGUES B.G. PACHPATTE DEPARTMENT

More information

2.5. Infinite Limits and Vertical Asymptotes. Infinite Limits

2.5. Infinite Limits and Vertical Asymptotes. Infinite Limits . Infinite Limits and Vertical Asmptotes. Infinite Limits and Vertical Asmptotes In this section we etend the concept of it to infinite its, which are not its as before, but rather an entirel new use of

More information

On the relation between the relative earth mover distance and the variation distance (an exposition)

On the relation between the relative earth mover distance and the variation distance (an exposition) On the relation between the relative earth mover distance and the variation distance (an eposition) Oded Goldreich Dana Ron Februar 9, 2016 Summar. In this note we present a proof that the variation distance

More information

Additional Material On Recursive Sequences

Additional Material On Recursive Sequences Penn State Altoona MATH 141 Additional Material On Recursive Sequences 1. Graphical Analsis Cobweb Diagrams Consider a generic recursive sequence { an+1 = f(a n ), n = 1,, 3,..., = Given initial value.

More information

Chapter 6* The PPSZ Algorithm

Chapter 6* The PPSZ Algorithm Chapter 6* The PPSZ Algorithm In this chapter we present an improvement of the pp algorithm. As pp is called after its inventors, Paturi, Pudlak, and Zane [PPZ99], the improved version, pps, is called

More information

4 Inverse function theorem

4 Inverse function theorem Tel Aviv Universit, 2013/14 Analsis-III,IV 53 4 Inverse function theorem 4a What is the problem................ 53 4b Simple observations before the theorem..... 54 4c The theorem.....................

More information

Duality, Geometry, and Support Vector Regression

Duality, Geometry, and Support Vector Regression ualit, Geometr, and Support Vector Regression Jinbo Bi and Kristin P. Bennett epartment of Mathematical Sciences Rensselaer Poltechnic Institute Tro, NY 80 bij@rpi.edu, bennek@rpi.edu Abstract We develop

More information

Research Article Equivalent Elastic Modulus of Asymmetrical Honeycomb

Research Article Equivalent Elastic Modulus of Asymmetrical Honeycomb International Scholarl Research Network ISRN Mechanical Engineering Volume, Article ID 57, pages doi:.5//57 Research Article Equivalent Elastic Modulus of Asmmetrical Honecomb Dai-Heng Chen and Kenichi

More information

Optimal scaling of the random walk Metropolis on elliptically symmetric unimodal targets

Optimal scaling of the random walk Metropolis on elliptically symmetric unimodal targets Optimal scaling of the random walk Metropolis on ellipticall smmetric unimodal targets Chris Sherlock 1 and Gareth Roberts 2 1. Department of Mathematics and Statistics, Lancaster Universit, Lancaster,

More information

Nash Equilibrium and the Legendre Transform in Optimal Stopping Games with One Dimensional Diffusions

Nash Equilibrium and the Legendre Transform in Optimal Stopping Games with One Dimensional Diffusions Nash Equilibrium and the Legendre Transform in Optimal Stopping Games with One Dimensional Diffusions J. L. Seton This version: 9 Januar 2014 First version: 2 December 2011 Research Report No. 9, 2011,

More information

Organization of a Modern Compiler. Front-end. Middle1. Back-end DO I = 1, N DO J = 1,M S 1 1 N. Source Program

Organization of a Modern Compiler. Front-end. Middle1. Back-end DO I = 1, N DO J = 1,M S 1 1 N. Source Program Organization of a Modern Compiler ource Program Front-end snta analsis + tpe-checking + smbol table High-level ntermediate Representation (loops,arra references are preserved) Middle loop-level transformations

More information

Functions of Several Variables

Functions of Several Variables Chapter 1 Functions of Several Variables 1.1 Introduction A real valued function of n variables is a function f : R, where the domain is a subset of R n. So: for each ( 1,,..., n ) in, the value of f is

More information

CHAPTER 3 Applications of Differentiation

CHAPTER 3 Applications of Differentiation CHAPTER Applications of Differentiation Section. Etrema on an Interval................... 0 Section. Rolle s Theorem and the Mean Value Theorem...... 0 Section. Increasing and Decreasing Functions and

More information

AE/ME 339. K. M. Isaac Professor of Aerospace Engineering. December 21, 2001 topic13_grid_generation 1

AE/ME 339. K. M. Isaac Professor of Aerospace Engineering. December 21, 2001 topic13_grid_generation 1 AE/ME 339 Professor of Aerospace Engineering December 21, 2001 topic13_grid_generation 1 The basic idea behind grid generation is the creation of the transformation laws between the phsical space and the

More information

4.7. Newton s Method. Procedure for Newton s Method HISTORICAL BIOGRAPHY

4.7. Newton s Method. Procedure for Newton s Method HISTORICAL BIOGRAPHY 4. Newton s Method 99 4. Newton s Method HISTORICAL BIOGRAPHY Niels Henrik Abel (18 189) One of the basic problems of mathematics is solving equations. Using the quadratic root formula, we know how to

More information

QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS

QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS Journal of the Operations Research Societ of Japan 008, Vol. 51, No., 191-01 QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS Tomonari Kitahara Shinji Mizuno Kazuhide Nakata Toko Institute of Technolog

More information

Stability Analysis for Linear Systems under State Constraints

Stability Analysis for Linear Systems under State Constraints Stabilit Analsis for Linear Sstems under State Constraints Haijun Fang Abstract This paper revisits the problem of stabilit analsis for linear sstems under state constraints New and less conservative sufficient

More information

Optimization in Information Theory

Optimization in Information Theory Optimization in Information Theory Dawei Shen November 11, 2005 Abstract This tutorial introduces the application of optimization techniques in information theory. We revisit channel capacity problem from

More information

1 Kernel methods & optimization

1 Kernel methods & optimization Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions

More information

Steepest descent on factor graphs

Steepest descent on factor graphs Steepest descent on factor graphs Justin Dauwels, Sascha Korl, and Hans-Andrea Loeliger Abstract We show how steepest descent can be used as a tool for estimation on factor graphs. From our eposition,

More information

CSE 546 Midterm Exam, Fall 2014

CSE 546 Midterm Exam, Fall 2014 CSE 546 Midterm Eam, Fall 2014 1. Personal info: Name: UW NetID: Student ID: 2. There should be 14 numbered pages in this eam (including this cover sheet). 3. You can use an material ou brought: an book,

More information

NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part I

NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part I NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Comple Analsis II Lecture Notes Part I Chapter 1 Preliminar results/review of Comple Analsis I These are more detailed notes for the results

More information

Computation of Information Rates from Finite-State Source/Channel Models

Computation of Information Rates from Finite-State Source/Channel Models Allerton 2002 Computation of Information Rates from Finite-State Source/Channel Models Dieter Arnold arnold@isi.ee.ethz.ch Hans-Andrea Loeliger loeliger@isi.ee.ethz.ch Pascal O. Vontobel vontobel@isi.ee.ethz.ch

More information

Math 20 Spring 2005 Final Exam Practice Problems (Set 2)

Math 20 Spring 2005 Final Exam Practice Problems (Set 2) Math 2 Spring 2 Final Eam Practice Problems (Set 2) 1. Find the etreme values of f(, ) = 2 2 + 3 2 4 on the region {(, ) 2 + 2 16}. 2. Allocation of Funds: A new editor has been allotted $6, to spend on

More information

Discrete Memoryless Channels with Memoryless Output Sequences

Discrete Memoryless Channels with Memoryless Output Sequences Discrete Memoryless Channels with Memoryless utput Sequences Marcelo S Pinho Department of Electronic Engineering Instituto Tecnologico de Aeronautica Sao Jose dos Campos, SP 12228-900, Brazil Email: mpinho@ieeeorg

More information

Language and Statistics II

Language and Statistics II Language and Statistics II Lecture 19: EM for Models of Structure Noah Smith Epectation-Maimization E step: i,, q i # p r $ t = p r i % ' $ t i, p r $ t i,' soft assignment or voting M step: r t +1 # argma

More information

Section 1.5 Formal definitions of limits

Section 1.5 Formal definitions of limits Section.5 Formal definitions of limits (3/908) Overview: The definitions of the various tpes of limits in previous sections involve phrases such as arbitraril close, sufficientl close, arbitraril large,

More information

Convexity/Concavity of Renyi Entropy and α-mutual Information

Convexity/Concavity of Renyi Entropy and α-mutual Information Convexity/Concavity of Renyi Entropy and -Mutual Information Siu-Wai Ho Institute for Telecommunications Research University of South Australia Adelaide, SA 5095, Australia Email: siuwai.ho@unisa.edu.au

More information

MAT 1275: Introduction to Mathematical Analysis. Graphs and Simplest Equations for Basic Trigonometric Functions. y=sin( x) Function

MAT 1275: Introduction to Mathematical Analysis. Graphs and Simplest Equations for Basic Trigonometric Functions. y=sin( x) Function MAT 275: Introduction to Mathematical Analsis Dr. A. Rozenblum Graphs and Simplest Equations for Basic Trigonometric Functions We consider here three basic functions: sine, cosine and tangent. For them,

More information

Square Estimation by Matrix Inverse Lemma 1

Square Estimation by Matrix Inverse Lemma 1 Proof of Two Conclusions Associated Linear Minimum Mean Square Estimation b Matri Inverse Lemma 1 Jianping Zheng State e Lab of IS, Xidian Universit, Xi an, 7171, P. R. China jpzheng@idian.edu.cn Ma 6,

More information

8.4. If we let x denote the number of gallons pumped, then the price y in dollars can $ $1.70 $ $1.70 $ $1.70 $ $1.

8.4. If we let x denote the number of gallons pumped, then the price y in dollars can $ $1.70 $ $1.70 $ $1.70 $ $1. 8.4 An Introduction to Functions: Linear Functions, Applications, and Models We often describe one quantit in terms of another; for eample, the growth of a plant is related to the amount of light it receives,

More information

SHANNON made the following well-known remarks in

SHANNON made the following well-known remarks in IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 2, FEBRUARY 2004 245 Geometric Programming Duals of Channel Capacity and Rate Distortion Mung Chiang, Member, IEEE, and Stephen Boyd, Fellow, IEEE

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I,

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I, FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 7 MATRICES II Inverse of a matri Sstems of linear equations Solution of sets of linear equations elimination methods 4

More information

H(X) = plog 1 p +(1 p)log 1 1 p. With a slight abuse of notation, we denote this quantity by H(p) and refer to it as the binary entropy function.

H(X) = plog 1 p +(1 p)log 1 1 p. With a slight abuse of notation, we denote this quantity by H(p) and refer to it as the binary entropy function. LECTURE 2 Information Measures 2. ENTROPY LetXbeadiscreterandomvariableonanalphabetX drawnaccordingtotheprobability mass function (pmf) p() = P(X = ), X, denoted in short as X p(). The uncertainty about

More information

Research Article On Sharp Triangle Inequalities in Banach Spaces II

Research Article On Sharp Triangle Inequalities in Banach Spaces II Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2010, Article ID 323609, 17 pages doi:10.1155/2010/323609 Research Article On Sharp Triangle Inequalities in Banach Spaces

More information

CHAPTER 2 THE MATHEMATICS OF OPTIMIZATION. Solutions. dπ =

CHAPTER 2 THE MATHEMATICS OF OPTIMIZATION. Solutions. dπ = CHAPTER THE MATHEMATICS OF OPTIMIZATION The problems in this chapter are primaril mathematical. The are intended to give students some practice with taking derivatives and using the Lagrangian techniques,

More information

Interspecific Segregation and Phase Transition in a Lattice Ecosystem with Intraspecific Competition

Interspecific Segregation and Phase Transition in a Lattice Ecosystem with Intraspecific Competition Interspecific Segregation and Phase Transition in a Lattice Ecosstem with Intraspecific Competition K. Tainaka a, M. Kushida a, Y. Ito a and J. Yoshimura a,b,c a Department of Sstems Engineering, Shizuoka

More information

You don't have to be a mathematician to have a feel for numbers. John Forbes Nash, Jr.

You don't have to be a mathematician to have a feel for numbers. John Forbes Nash, Jr. Course Title: Real Analsis Course Code: MTH3 Course instructor: Dr. Atiq ur Rehman Class: MSc-II Course URL: www.mathcit.org/atiq/fa5-mth3 You don't have to be a mathematician to have a feel for numbers.

More information

A set C R n is convex if and only if δ C is convex if and only if. f : R n R is strictly convex if and only if f is convex and the inequality (1.

A set C R n is convex if and only if δ C is convex if and only if. f : R n R is strictly convex if and only if f is convex and the inequality (1. ONVEX OPTIMIZATION SUMMARY ANDREW TULLOH 1. Eistence Definition. For R n, define δ as 0 δ () = / (1.1) Note minimizes f over if and onl if minimizes f + δ over R n. Definition. (ii) (i) dom f = R n f()

More information

On Information and Sufficiency

On Information and Sufficiency On Information and Sufficienc Huaiu hu SFI WORKING PAPER: 997-02-04 SFI Working Papers contain accounts of scientific work of the author(s) and do not necessaril represent the views of the Santa Fe Institute.

More information

All parabolas through three non-collinear points

All parabolas through three non-collinear points ALL PARABOLAS THROUGH THREE NON-COLLINEAR POINTS 03 All parabolas through three non-collinear points STANLEY R. HUDDY and MICHAEL A. JONES If no two of three non-collinear points share the same -coordinate,

More information

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence Chapter 6 Nonlinear Equations 6. The Problem of Nonlinear Root-finding In this module we consider the problem of using numerical techniques to find the roots of nonlinear equations, f () =. Initially we

More information

APPLICATION OF A CONSTRAINED OPTIMIZATION TECHNIQUE TO THE IMAGING OF HETEROGENEOUS OBJECTS USING DIFFUSION THEORY. A Thesis MATTHEW RYAN STERNAT

APPLICATION OF A CONSTRAINED OPTIMIZATION TECHNIQUE TO THE IMAGING OF HETEROGENEOUS OBJECTS USING DIFFUSION THEORY. A Thesis MATTHEW RYAN STERNAT APPLICATION OF A CONSTRAINED OPTIMIZATION TECHNIQUE TO THE IMAGING OF HETEROGENEOUS OBJECTS USING DIFFUSION THEORY A Thesis b MATTHEW RYAN STERNAT Submitted to the Office of Graduate Studies of Teas A&M

More information

General Vector Spaces

General Vector Spaces CHAPTER 4 General Vector Spaces CHAPTER CONTENTS 4. Real Vector Spaces 83 4. Subspaces 9 4.3 Linear Independence 4.4 Coordinates and Basis 4.5 Dimension 4.6 Change of Basis 9 4.7 Row Space, Column Space,

More information

An Alternative Proof of Channel Polarization for Channels with Arbitrary Input Alphabets

An Alternative Proof of Channel Polarization for Channels with Arbitrary Input Alphabets An Alternative Proof of Channel Polarization for Channels with Arbitrary Input Alphabets Jing Guo University of Cambridge jg582@cam.ac.uk Jossy Sayir University of Cambridge j.sayir@ieee.org Minghai Qin

More information

On the spectral formulation of Granger causality

On the spectral formulation of Granger causality Noname manuscript No. (will be inserted b the editor) On the spectral formulation of Granger causalit the date of receipt and acceptance should be inserted later Abstract Spectral measures of causalit

More information

Chapter 6. Self-Adjusting Data Structures

Chapter 6. Self-Adjusting Data Structures Chapter 6 Self-Adjusting Data Structures Chapter 5 describes a data structure that is able to achieve an epected quer time that is proportional to the entrop of the quer distribution. The most interesting

More information

MMJ1153 COMPUTATIONAL METHOD IN SOLID MECHANICS PRELIMINARIES TO FEM

MMJ1153 COMPUTATIONAL METHOD IN SOLID MECHANICS PRELIMINARIES TO FEM B Course Content: A INTRODUCTION AND OVERVIEW Numerical method and Computer-Aided Engineering; Phsical problems; Mathematical models; Finite element method;. B Elements and nodes, natural coordinates,

More information

Stability Analysis of a Geometrically Imperfect Structure using a Random Field Model

Stability Analysis of a Geometrically Imperfect Structure using a Random Field Model Stabilit Analsis of a Geometricall Imperfect Structure using a Random Field Model JAN VALEŠ, ZDENĚK KALA Department of Structural Mechanics Brno Universit of Technolog, Facult of Civil Engineering Veveří

More information

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY UNIVERSITY OF MARYLAND: ECON 600 1. Some Eamples 1 A general problem that arises countless times in economics takes the form: (Verbally):

More information

Comments on Problems. 3.1 This problem offers some practice in deriving utility functions from indifference curve specifications.

Comments on Problems. 3.1 This problem offers some practice in deriving utility functions from indifference curve specifications. CHAPTER 3 PREFERENCES AND UTILITY These problems provide some practice in eamining utilit unctions b looking at indierence curve maps and at a ew unctional orms. The primar ocus is on illustrating the

More information

THE HEATED LAMINAR VERTICAL JET IN A LIQUID WITH POWER-LAW TEMPERATURE DEPENDENCE OF DENSITY. V. A. Sharifulin.

THE HEATED LAMINAR VERTICAL JET IN A LIQUID WITH POWER-LAW TEMPERATURE DEPENDENCE OF DENSITY. V. A. Sharifulin. THE HEATED LAMINAR VERTICAL JET IN A LIQUID WITH POWER-LAW TEMPERATURE DEPENDENCE OF DENSITY 1. Introduction V. A. Sharifulin Perm State Technical Universit, Perm, Russia e-mail: sharifulin@perm.ru Water

More information

UNCORRECTED SAMPLE PAGES. 3Quadratics. Chapter 3. Objectives

UNCORRECTED SAMPLE PAGES. 3Quadratics. Chapter 3. Objectives Chapter 3 3Quadratics Objectives To recognise and sketch the graphs of quadratic polnomials. To find the ke features of the graph of a quadratic polnomial: ais intercepts, turning point and ais of smmetr.

More information

Optimal Power Control in Decentralized Gaussian Multiple Access Channels

Optimal Power Control in Decentralized Gaussian Multiple Access Channels 1 Optimal Power Control in Decentralized Gaussian Multiple Access Channels Kamal Singh Department of Electrical Engineering Indian Institute of Technology Bombay. arxiv:1711.08272v1 [eess.sp] 21 Nov 2017

More information

Mathematics. Polynomials and Quadratics. hsn.uk.net. Higher. Contents. Polynomials and Quadratics 52 HSN22100

Mathematics. Polynomials and Quadratics. hsn.uk.net. Higher. Contents. Polynomials and Quadratics 52 HSN22100 Higher Mathematics UNIT OUTCOME 1 Polnomials and Quadratics Contents Polnomials and Quadratics 5 1 Quadratics 5 The Discriminant 54 Completing the Square 55 4 Sketching Parabolas 57 5 Determining the Equation

More information

UNIT 2 QUADRATIC FUNCTIONS AND MODELING Lesson 2: Interpreting Quadratic Functions Instruction
