A Single-Letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources

Wei Kang          Sennur Ulukus
Department of Electrical and Computer Engineering
University of Maryland, College Park, MD 20742
wkang@eng.umd.edu    ulukus@umd.edu

arXiv:cs/0511096v1 [cs.IT] 28 Nov 2005

Abstract — The capacity region of the multiple access channel with arbitrarily correlated sources remains an open problem. Cover, El Gamal and Salehi gave an achievable region in the form of single-letter entropy and mutual information expressions, without a single-letter converse. Cover, El Gamal and Salehi also gave a converse in terms of some n-letter mutual informations, which are incomputable. In this paper, we derive an upper bound for the sum rate of this channel in a single-letter expression by using spectrum analysis. The incomputability of the sum rate of the Cover, El Gamal and Salehi scheme comes from the difficulty of characterizing the possible joint distributions for the n-letter channel inputs. Here we introduce a new data processing inequality, which leads to a single-letter necessary condition for these possible joint distributions. We develop a single-letter upper bound for the sum rate by using this single-letter necessary condition on the possible joint distributions.

I. INTRODUCTION

The problem of determining the capacity region of the multiple access channel with correlated sources can be formulated as follows. Given a pair of correlated sources $(U,V)$ described by the joint probability distribution $p(u,v)$, and a discrete, memoryless, multiple access channel characterized by the transition probability $p(y|x_1,x_2)$, what are the necessary and sufficient conditions for the reliable transmission of $n$ independent identically distributed (i.i.d.) samples of the sources through the channel, in $n$ channel uses, as $n \to \infty$? This problem was studied by Cover, El Gamal and Salehi in [1], where an achievable region expressed by single-letter entropies and mutual informations was given.
This region was shown to be suboptimal by Dueck [2]. Cover, El Gamal and Salehi [1] also provided a capacity result with both achievability and converse, but in incomputable expressions in the form of some n-letter mutual informations. In this paper, we derive an upper bound for the sum rate of this channel in a single-letter expression. The incomputability of the sum rate of the Cover, El Gamal and Salehi scheme is due to the difficulty of characterizing the possible joint distributions for the n-letter channel inputs.

The Cover, El Gamal, Salehi converse is

  $H(U,V) \le \frac{1}{n} I(X_1^n, X_2^n; Y^n)$    (1)

where the random variables involved have a joint distribution of the form

  $\prod_{i=1}^{n} p(u_i,v_i) \, p(x_1^n|u^n) \, p(x_2^n|v^n) \prod_{i=1}^{n} p(y_i|x_{1i},x_{2i})$    (2)

i.e., the sources and the channel inputs satisfy the Markov chain relation $X_1^n - U^n - V^n - X_2^n$. It is difficult to evaluate the mutual information on the right hand side of (1) when the joint probability distribution of the random variables involved is subject to (2). A usual way to upper bound the mutual information in (1) is

  $\frac{1}{n} I(X_1^n, X_2^n; Y^n) \le \frac{1}{n} \sum_{i=1}^{n} I(X_{1i}, X_{2i}; Y_i) \le \max I(X_1, X_2; Y)$    (3)

where the maximization in (3) is over all possible $X_1$ and $X_2$ such that $X_1 - U^n - V^n - X_2$. Therefore, combining (1) and (3), a single-letter upper bound for the sum rate is obtained as

  $H(U,V) \le \max I(X_1, X_2; Y)$    (4)

where the maximization is over all $X_1$, $X_2$ such that $X_1 - U^n - V^n - X_2$. However, a closed form expression for the set of $p(x_1,x_2)$ satisfying this Markov chain, for all $U$, $V$ and $n$, seems intractable to obtain.

The data processing inequality [3, p. 32] is an intuitive way to obtain a necessary condition on $p(x_1,x_2)$ for the above Markov chain constraint; i.e., we may try to solve the following problem as an upper bound for (4):

  $\max \; I(X_1, X_2; Y)$
  $\text{s.t.} \; I(X_1; X_2) \le I(U^n; V^n) = n I(U;V)$    (5)

where the constraint restricts the feasible set of $p(x_1,x_2)$.

———
This work was supported by NSF Grants CCR 03-11311, CCF 04-47613 and CCF 05-14846; and ARL/CTA Grant DAAD 19-01-2-0011.
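As the discussion below notes, the constraint in (5) loosens quickly with $n$. The following small numerical sketch (ours, not part of the paper; the binary source used is purely illustrative) shows how soon $nI(U;V)$ exceeds the largest possible value of $I(X_1;X_2)$ for binary inputs:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero entries are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A hypothetical correlated binary source, for illustration only.
P_UV = np.array([[1/3, 1/6],
                 [1/6, 1/3]])

p_u = P_UV.sum(axis=1)
p_v = P_UV.sum(axis=0)
iuv = entropy(p_u) + entropy(p_v) - entropy(P_UV)   # I(U;V), about 0.082 bits

# For binary X1, X2 we always have I(X1;X2) <= 1 bit, while n*I(U;V)
# grows linearly in n, so the constraint in (5) stops binding once
# n*I(U;V) exceeds 1 bit.
n_star = int(np.ceil(1.0 / iuv))   # smallest such n; here n_star = 13
assert n_star * iuv >= 1.0
```

Even for this mildly correlated source, the constraint is inactive for all $n \ge 13$, which is why (5) degenerates to the trivial bound.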
However, when $n$ is large, this upper bound becomes trivial: $nI(U;V)$ quickly exceeds $I(X_1;X_2)$ for every $p(x_1,x_2)$, even without the Markov chain constraint. Although the data processing inequality in its usual form does not prove useful in this problem, we will still use the basic methodology of employing a data processing inequality to represent the Markov chain constraint on the valid input
distributions. For this, we will introduce a new data processing inequality.

Spectrum analysis has been instrumental in the study of some properties of pairs of correlated random variables, especially those of i.i.d. sequences of pairs of correlated random variables, e.g., common information in [4] and isomorphism in [5]. In this paper, we use spectrum analysis to introduce a new data processing inequality. Our new data processing inequality provides a single-letter necessary condition for the joint distributions satisfying the Markov chain condition, and leads to a non-trivial single-letter upper bound for the sum rate of the multiple access channel with correlated sources.

II. SOME PRELIMINARIES

In this section, we provide some basic results that will be used in our later development. The concepts used here were originally introduced by Witsenhausen in [4] in the context of operator theory. Here, we limit ourselves to the finite alphabet case, and derive our results by means of matrix theory.

We first introduce our matrix notation for probability distributions. For a pair of discrete random variables $X$ and $Y$, which take values in $\mathcal{X} = \{x_1,x_2,\dots,x_m\}$ and $\mathcal{Y} = \{y_1,y_2,\dots,y_n\}$, respectively, the joint distribution matrix $P_{XY}$ is defined as $P_{XY}(i,j) \triangleq \Pr(X=x_i, Y=y_j)$, where $P_{XY}(i,j)$ denotes the $(i,j)$-th element of the matrix $P_{XY}$. From this definition, we have $P_{XY}^T = P_{YX}$. The marginal distribution of a random variable $X$ is defined as a diagonal matrix $P_X$ with $P_X(i,i) \triangleq \Pr(X=x_i)$. The vector-form marginal distribution is defined as $p_X(i) \triangleq \Pr(X=x_i)$, i.e., $p_X = P_{XY}e$, where $e$ is a vector of all ones. Similarly, we define $p_Y = P_{YX}e = P_Y e$. The conditional distribution of $X$ given $Y$ is defined in matrix form as $P_{X|Y}(i,j) \triangleq \Pr(X=x_i \mid Y=y_j)$, and $P_{XY} = P_{X|Y} P_Y$.

We define a new quantity, $\tilde{P}_{XY}$, which will play an important role in the rest of the paper, as

  $\tilde{P}_{XY} = P_X^{-1/2} P_{XY} P_Y^{-1/2}$    (6)

Our main theorem in this section identifies the spectral properties of $\tilde{P}_{XY}$.
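Definition (6) is straightforward to compute numerically. The following sketch (ours; the 2x2 joint distribution is an arbitrary illustrative choice) builds $\tilde{P}_{XY}$ and checks the spectral facts established in Theorem 1 below:

```python
import numpy as np

# An arbitrary joint distribution matrix P_XY (rows: x, cols: y),
# chosen only for illustration.
P_XY = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

p_x = P_XY.sum(axis=1)   # vector-form marginal p_X = P_XY @ e
p_y = P_XY.sum(axis=0)   # p_Y

# Equation (6): tilde(P)_XY = P_X^{-1/2} P_XY P_Y^{-1/2}
P_tilde = np.diag(p_x**-0.5) @ P_XY @ np.diag(p_y**-0.5)

# Spectral properties (Theorem 1): all singular values lie in [0,1],
# the largest equals 1, with singular vectors sqrt(p_x) and sqrt(p_y).
U, s, Vt = np.linalg.svd(P_tilde)
assert abs(s[0] - 1.0) < 1e-10 and np.all(s <= 1 + 1e-10)
assert np.allclose(np.abs(U[:, 0]), np.sqrt(p_x))
assert np.allclose(np.abs(Vt[0, :]), np.sqrt(p_y))
```

(The absolute values only absorb the sign ambiguity of numerical SVD routines.)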
Before stating our theorem, we provide the following lemma, which will be used in its proof.

Lemma 1 ([6, p. 49]): The spectral radius of a stochastic matrix is 1. A non-negative matrix $T$ is stochastic if and only if $e$ is an eigenvector of $T$ corresponding to the eigenvalue 1.

Theorem 1: An $m \times n$ non-negative matrix $P_{XY}$ is a joint distribution matrix with marginal distributions $P_X$ and $P_Y$, i.e., $P_{XY}e = p_X = P_X e$ and $P_{XY}^T e = p_Y = P_Y e$, if and only if the singular value decomposition (SVD) of $\tilde{P}_{XY} = P_X^{-1/2} P_{XY} P_Y^{-1/2}$ satisfies

  $\tilde{P}_{XY} = U \Lambda V^T = \sqrt{p_X}\sqrt{p_Y}^T + \sum_{i=2}^{l} \lambda_i u_i v_i^T$    (7)

where $U \triangleq [u_1,\dots,u_l]$ and $V \triangleq [v_1,\dots,v_l]$ are two unitary matrices, $\Lambda \triangleq \mathrm{diag}[\lambda_1,\dots,\lambda_l]$ and $l = \min(m,n)$; $u_1 = \sqrt{p_X}$, $v_1 = \sqrt{p_Y}$, and $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_l \ge 0$. That is, all of the singular values of $\tilde{P}_{XY}$ are between 0 and 1, the largest singular value of $\tilde{P}_{XY}$ is 1, and the corresponding left and right singular vectors are $\sqrt{p_X}$ and $\sqrt{p_Y}$.

Proof: Let $\tilde{P}_{XY}$ satisfy (7); then

  $P_{XY} e = P_X^{1/2} \tilde{P}_{XY} P_Y^{1/2} e = P_X^{1/2} \Big( \sqrt{p_X}\sqrt{p_Y}^T + \sum_{i=2}^{l} \lambda_i u_i v_i^T \Big) \sqrt{p_Y} = P_X^{1/2} \sqrt{p_X} = p_X$    (8)

since $\sqrt{p_Y}^T\sqrt{p_Y} = 1$ and $v_i^T \sqrt{p_Y} = v_i^T v_1 = 0$ for $i \ge 2$. Similarly, $e^T P_{XY} = p_Y^T$. Thus, the non-negative matrix $P_{XY}$ is a joint distribution matrix with marginal distributions $p_X$ and $p_Y$.

Conversely, consider a joint distribution $P_{XY}$ with marginal distributions $p_X$ and $p_Y$. We need to show that the singular values of $\tilde{P}_{XY}$ lie in $[0,1]$, the largest singular value is equal to 1, and $\sqrt{p_X}$ and $\sqrt{p_Y}$, respectively, are the left and right singular vectors corresponding to the singular value 1. To this end, we first construct a Markov chain $X - Y - Z$ with $P_{Z|Y} = P_{X|Y}$. Note that this also implies $p_Z = p_X$, $P_{ZY} = P_{XY}$, and $\tilde{P}_{ZY} = \tilde{P}_{XY}$. The special structure of the constructed Markov chain provides the following:

  $P_{Z|X} = P_{Z|Y} P_{Y|X} = P_{X|Y} P_{Y|X} = P_{XY} P_Y^{-1} P_{YX} P_X^{-1} = P_X^{1/2} \big( P_X^{-1/2} P_{XY} P_Y^{-1/2} \big)\big( P_Y^{-1/2} P_{YX} P_X^{-1/2} \big) P_X^{-1/2} = P_X^{1/2} \tilde{P}_{XY} \tilde{P}_{XY}^T P_X^{-1/2}$    (9)

We note that the matrix $P_{Z|X}$ is similar to the matrix $\tilde{P}_{XY}\tilde{P}_{XY}^T$ [7, p. 44]. Therefore, all eigenvalues of $P_{Z|X}$ are eigenvalues of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ as well, and if $v$ is a left eigenvector of $P_{Z|X}$ corresponding to an eigenvalue $\mu$, then $P_X^{1/2} v$ is a left eigenvector of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ corresponding to the same eigenvalue.
We note that $P_{Z|X}$ is a stochastic matrix; therefore, from Lemma 1, $e$ is a left eigenvector of $P_{Z|X}$ corresponding to the eigenvalue 1, which is also equal to the spectral radius of $P_{Z|X}$. Since $P_{Z|X}$ is similar to $\tilde{P}_{XY}\tilde{P}_{XY}^T$, we have that $\sqrt{p_X}$ is a left eigenvector of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ with eigenvalue 1, and the rest of the eigenvalues of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ lie in $[-1,1]$. In addition, $\tilde{P}_{XY}\tilde{P}_{XY}^T$ is a symmetric positive semi-definite matrix, which implies that the eigenvalues of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ are real and non-negative. Since the eigenvalues of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ are non-negative, and the largest eigenvalue is equal to 1, we conclude that all of the eigenvalues of $\tilde{P}_{XY}\tilde{P}_{XY}^T$ lie in the interval $[0,1]$. The singular values of $\tilde{P}_{XY}$ are the square roots of the eigenvalues of $\tilde{P}_{XY}\tilde{P}_{XY}^T$, and the left singular vectors of $\tilde{P}_{XY}$ are the eigenvectors of $\tilde{P}_{XY}\tilde{P}_{XY}^T$. Thus, the singular values of $\tilde{P}_{XY}$ lie in $[0,1]$, the largest singular value is equal to 1, and $\sqrt{p_X}$ is a
left singular vector corresponding to the singular value 1. The corresponding right singular vector is

  $v_1^T = u_1^T \tilde{P}_{XY} = \sqrt{p_X}^T P_X^{-1/2} P_{XY} P_Y^{-1/2} = e^T P_{XY} P_Y^{-1/2} = p_Y^T P_Y^{-1/2} = \sqrt{p_Y}^T$    (10)

which concludes the proof.

III. A NEW DATA PROCESSING INEQUALITY

In this section, we introduce a new data processing inequality in the following theorem. We first provide a lemma that will be used in its proof.

Lemma 2 ([8, p. 178]): For matrices $A$ and $B$,

  $\lambda_i(AB) \le \lambda_i(A)\,\lambda_1(B)$    (11)

where $\lambda_i(\cdot)$ denotes the $i$-th largest singular value of a matrix.

Theorem 2: If $X - U - V - Y$ forms a Markov chain, then

  $\lambda_i(\tilde{P}_{XY}) \le \lambda_2(\tilde{P}_{XU})\,\lambda_i(\tilde{P}_{UV})\,\lambda_2(\tilde{P}_{VY})$    (12)

where $i = 2,\dots,\mathrm{rank}(\tilde{P}_{XY})$.

Proof: From the structure of the Markov chain, and from the definition of $\tilde{P}$ in (6), we have

  $\tilde{P}_{XY} = \tilde{P}_{XU}\tilde{P}_{UV}\tilde{P}_{VY}$    (13)

Using (7) for $\tilde{P}_{XY}$, we obtain

  $\tilde{P}_{XY} = \sqrt{p_X}\sqrt{p_Y}^T + \sum_{i \ge 2} \lambda_i(\tilde{P}_{XY})\,u_i(\tilde{P}_{XY})\,v_i(\tilde{P}_{XY})^T$    (14)

and using (7) for $\tilde{P}_{XU}$, $\tilde{P}_{UV}$ and $\tilde{P}_{VY}$ yields

  $\tilde{P}_{XU}\tilde{P}_{UV}\tilde{P}_{VY} = \sqrt{p_X}\sqrt{p_Y}^T + \Big(\sum_{i \ge 2} \lambda_i(\tilde{P}_{XU})\,u_i(\tilde{P}_{XU})\,v_i(\tilde{P}_{XU})^T\Big)\Big(\sum_{i \ge 2} \lambda_i(\tilde{P}_{UV})\,u_i(\tilde{P}_{UV})\,v_i(\tilde{P}_{UV})^T\Big)\Big(\sum_{i \ge 2} \lambda_i(\tilde{P}_{VY})\,u_i(\tilde{P}_{VY})\,v_i(\tilde{P}_{VY})^T\Big)$    (15)

where all cross-terms vanish, since $\sqrt{p_U}$ is both $v_1(\tilde{P}_{XU})$ and $u_1(\tilde{P}_{UV})$, and $\sqrt{p_V}$ is both $v_1(\tilde{P}_{UV})$ and $u_1(\tilde{P}_{VY})$; hence these vectors are orthogonal to all of the singular vectors with index $i \ge 2$ of the neighboring factors. Using (13) and equating (14) and (15), we obtain

  $\sum_{i \ge 2} \lambda_i(\tilde{P}_{XY})\,u_i(\tilde{P}_{XY})\,v_i(\tilde{P}_{XY})^T = \Big(\sum_{i \ge 2} \lambda_i(\tilde{P}_{XU})\,u_i v_i^T\Big)\Big(\sum_{i \ge 2} \lambda_i(\tilde{P}_{UV})\,u_i v_i^T\Big)\Big(\sum_{i \ge 2} \lambda_i(\tilde{P}_{VY})\,u_i v_i^T\Big)$    (16)

The proof is completed by applying Lemma 2 (together with its transposed form $\lambda_i(AB) \le \lambda_1(A)\lambda_i(B)$) to (16): the largest singular values of the first and third factors are $\lambda_2(\tilde{P}_{XU})$ and $\lambda_2(\tilde{P}_{VY})$.

IV. ON I.I.D. SEQUENCES

Let $(U^n, V^n)$ be a pair of i.i.d. sequences, where each pair of letters of these sequences has joint distribution $P_{UV}$. Thus, the joint distribution of the sequences is $P_{U^nV^n} = P_{UV}^{\otimes n}$, where $A^{\otimes 1} \triangleq A$, $A^{\otimes k} \triangleq A \otimes A^{\otimes (k-1)}$, and $\otimes$ represents the Kronecker product of matrices [7]. From (6),

  $\tilde{P}_{UV} = P_U^{-1/2} P_{UV} P_V^{-1/2}$    (17)

Then,

  $P_{U^nV^n} = P_{UV}^{\otimes n} = \big(P_U^{1/2}\tilde{P}_{UV}P_V^{1/2}\big)^{\otimes n} = \big(P_U^{1/2}\big)^{\otimes n}\,\tilde{P}_{UV}^{\otimes n}\,\big(P_V^{1/2}\big)^{\otimes n}$    (18)

We also have $P_{U^n} = (P_U)^{\otimes n}$ and $P_{V^n} = (P_V)^{\otimes n}$. Thus,

  $\tilde{P}_{U^nV^n} = P_{U^n}^{-1/2}\,P_{U^nV^n}\,P_{V^n}^{-1/2} = \big(P_U^{-1/2}\big)^{\otimes n}\big(P_U^{1/2}\big)^{\otimes n}\,\tilde{P}_{UV}^{\otimes n}\,\big(P_V^{1/2}\big)^{\otimes n}\big(P_V^{-1/2}\big)^{\otimes n} = \tilde{P}_{UV}^{\otimes n}$    (19)

Applying the SVD to $\tilde{P}_{U^nV^n}$, we have

  $\tilde{P}_{U^nV^n} = U_n \Lambda_n V_n^T = \tilde{P}_{UV}^{\otimes n} = U^{\otimes n}\,\Lambda^{\otimes n}\,\big(V^{\otimes n}\big)^T$    (20)

From the uniqueness of the SVD, we know that $U_n = U^{\otimes n}$, $\Lambda_n = \Lambda^{\otimes n}$ and $V_n = V^{\otimes n}$. Then, the ordered singular values of $\tilde{P}_{U^nV^n}$ are $\{1, \lambda_2(\tilde{P}_{UV}), \dots, \lambda_2(\tilde{P}_{UV}), \dots\}$, where the second through the $(n+1)$-st singular values are all equal to $\lambda_2(\tilde{P}_{UV})$.
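Both Theorem 2 and the Kronecker-product spectrum claim of Section IV are easy to spot-check numerically. The sketch below (ours; the source and the test channels $p(x|u)$, $p(y|v)$ are arbitrary illustrative choices) does so for a binary chain:

```python
import numpy as np

def tilde(P):
    """tilde(P) = P_X^{-1/2} P P_Y^{-1/2} for a joint distribution matrix P."""
    px, py = P.sum(axis=1), P.sum(axis=0)
    return np.diag(px**-0.5) @ P @ np.diag(py**-0.5)

def lam2(P):
    """Second largest singular value of tilde(P)."""
    return np.linalg.svd(tilde(P), compute_uv=False)[1]

# A hypothetical Markov chain X - U - V - Y: a correlated source P_UV
# and arbitrary test channels p(x|u), p(y|v).
P_UV = np.array([[1/3, 1/6],
                 [1/6, 1/3]])
Px_u = np.array([[0.9, 0.2],   # Px_u[x, u] = p(x|u)
                 [0.1, 0.8]])
Py_v = np.array([[0.7, 0.3],   # Py_v[y, v] = p(y|v)
                 [0.3, 0.7]])

p_u, p_v = P_UV.sum(axis=1), P_UV.sum(axis=0)
P_XU = Px_u * p_u               # joint of (X, U)
P_VY = (Py_v * p_v).T           # joint of (V, Y)
P_XY = Px_u @ P_UV @ Py_v.T     # joint of the chain endpoints (X, Y)

# Theorem 2 with i = 2: lambda_2 can only shrink along the chain.
assert lam2(P_XY) <= lam2(P_XU) * lam2(P_UV) * lam2(P_VY) + 1e-9

# Section IV: the ordered singular values of tilde(P_UV)^{(x)2} are
# {1, l2, l2, l2^2} for a 2x2 source.
l2 = lam2(P_UV)
s = np.sort(np.linalg.svd(np.kron(tilde(P_UV), tilde(P_UV)),
                          compute_uv=False))[::-1]
assert np.allclose(s, [1.0, l2, l2, l2**2])
```

For binary alphabets the inequality in Theorem 2 actually holds with equality, since $\lambda_2$ then coincides with the correlation coefficient, which multiplies along a Markov chain.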
V. A NECESSARY CONDITION

As stated in Section I, the sum rate can be upper bounded as

  $H(U,V) \le \max I(X_1, X_2; Y)$    (21)

where the maximization is over all possible $X_1$ and $X_2$ that satisfy the Markov chain $X_1 - U^n - V^n - X_2$. From Theorem 2 in Section III, we know that if $X_1 - U^n - V^n - X_2$, then, for $i = 2,\dots,\mathrm{rank}(\tilde{P}_{X_1X_2})$,

  $\lambda_i(\tilde{P}_{X_1X_2}) \le \lambda_2(\tilde{P}_{X_1U^n})\,\lambda_i(\tilde{P}_{U^nV^n})\,\lambda_2(\tilde{P}_{V^nX_2})$    (22)

We showed in Section IV that $\lambda_i(\tilde{P}_{U^nV^n}) \le \lambda_2(\tilde{P}_{UV})$ for $i \ge 2$, and $\lambda_i(\tilde{P}_{U^nV^n}) = \lambda_2(\tilde{P}_{UV})$ for $i = 2,\dots,n+1$. Therefore, for $i = 2,\dots,\mathrm{rank}(\tilde{P}_{X_1X_2})$, we have

  $\lambda_i(\tilde{P}_{X_1X_2}) \le \lambda_2(\tilde{P}_{X_1U^n})\,\lambda_2(\tilde{P}_{UV})\,\lambda_2(\tilde{P}_{V^nX_2})$    (23)

From Theorem 1, we know that $\lambda_2(\tilde{P}_{X_1U^n}) \le 1$ and $\lambda_2(\tilde{P}_{V^nX_2}) \le 1$. Next, in Theorem 3, we determine that the least upper bound for $\lambda_2(\tilde{P}_{X_1U^n})$ and $\lambda_2(\tilde{P}_{V^nX_2})$ is in fact 1.

Theorem 3: Let $F(n, P_X)$ be the set of all joint distributions for $X$ and $U^n$ with a given marginal distribution for $X$, $P_X$. Then,

  $\sup_{F(n,P_X),\, n=1,2,\dots} \lambda_2(\tilde{P}_{XU^n}) = 1$    (24)
The proof of Theorem 3 is given in the Appendix. Combining (23) and Theorem 3, we obtain the main result of our paper, which is stated in the following theorem.

Theorem 4: If a pair of i.i.d. sources $(U,V)$ with joint distribution $P_{UV}$ can be transmitted reliably through a discrete, memoryless, multiple access channel characterized by $P_{Y|X_1X_2}$, then

  $H(U,V) \le I(X_1, X_2; Y)$    (25)

for some $(X_1, X_2)$ with

  $\lambda_i(\tilde{P}_{X_1X_2}) \le \lambda_2(\tilde{P}_{UV}), \quad i = 2,\dots,\mathrm{rank}(\tilde{P}_{X_1X_2})$.    (26)

VI. SOME SIMPLE EXAMPLES

We consider a multiple access channel where the alphabets of $X_1$, $X_2$ and $Y$ are all binary, and the channel transition probability matrix $p(y|x_1,x_2)$ is given as

  x1x2 \ y    0     1
    00        1     0
    01       1/2   1/2
    10       1/2   1/2
    11        0     1

The following is a trivial upper bound, which we provide as a benchmark:

  $\max_{p(x_1,x_2)} I(X_1, X_2; Y) = 1$    (27)

where the maximization is over all binary bivariate distributions. The maximum is achieved by $P(X_1=1, X_2=1) = P(X_1=0, X_2=0) = 1/2$. We note that this upper bound does not depend on the source distribution.

First, we consider a binary source $(U,V)$ with the following joint distribution $p(u,v)$:

  u \ v    0     1
   0      1/3   1/6
   1      1/6   1/3

In this case, $H(U,V) = 1.92$. We first note, using the trivial upper bound in (27), that it is impossible to transmit this source through the given channel reliably. The upper bound we developed in this paper gives $2/3$ for this source. We also note that, for this case, our upper bound coincides with the single-letter achievability expression given in [1], which is

  $H(U,V) \le I(X_1, X_2; Y)$    (28)

where $X_1$, $X_2$ are such that $X_1 - U - V - X_2$ holds. Therefore, for this case, our upper bound is the converse, as it matches the achievability expression.

Next, we consider a binary source $(U,V)$ with the following joint distribution $p(u,v)$:

  u \ v    0     1
   0      0.1    0
   1      0.1   0.8

In this case, $H(U,V) = 0.92$, the single-letter achievability expression in (28) reaches 0.5, and our upper bound is 0.56. The gap between the achievability expression and our upper bound is quite small.
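The numbers in the first example can be reproduced numerically. The sketch below (ours, not from the paper; it assumes the channel matrix as printed above) computes $H(U,V)$ and $\lambda_2(\tilde{P}_{UV})$, and evaluates $I(X_1,X_2;Y)$ for the natural uncoded choice $X_1 = U$, $X_2 = V$, which satisfies $X_1 - U - V - X_2$ and already meets the $2/3$ value:

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability array; zero entries are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def lambda2(P):
    """Second singular value of tilde(P) = P_X^{-1/2} P P_Y^{-1/2}."""
    px, py = P.sum(axis=1), P.sum(axis=0)
    T = np.diag(px**-0.5) @ P @ np.diag(py**-0.5)
    return np.linalg.svd(T, compute_uv=False)[1]

# First example source.
P_UV = np.array([[1/3, 1/6],
                 [1/6, 1/3]])
huv = H(P_UV)          # H(U,V) = 1.92 bits (rounded)
l2 = lambda2(P_UV)     # lambda_2(tilde P_UV) = 1/3

# Channel p(y|x1,x2), rows ordered (x1x2) = 00, 01, 10, 11.
W = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.5, 0.5],
              [0.0, 1.0]])

# Uncoded transmission X1 = U, X2 = V: the input-pair distribution is
# P_UV flattened in the same (x1x2) order.
p_in = P_UV.ravel()
p_y = p_in @ W
I = H(p_y) - sum(p_in[k] * H(W[k]) for k in range(4))   # = 2/3 bits
```

Here $p_Y = (1/2, 1/2)$, so $I = 1 - 1/3 = 2/3$, matching both the achievability expression and the upper bound for this source.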
We note that, in this case, the trivial upper bound in (27) fails to determine whether reliable transmission is possible or not, while our upper bound determines conclusively that reliable transmission is not possible.

Finally, we consider a binary source $(U,V)$ with the following joint distribution $p(u,v)$:

  u \ v    0      1
   0      0.85    0
   1      0.1    0.05

In this case, $H(U,V) = 0.75$, the single-letter achievability expression in (28) gives 0.57, and our upper bound is 0.9. We note that the joint entropy of the sources falls into the gap between the achievability expression and our upper bound, which means that we cannot conclude whether or not it is possible to transmit these sources through the channel reliably.

VII. CONCLUSION

In this paper, we investigated the problem of transmitting correlated sources through a multiple access channel. We utilized spectrum analysis to develop a new data processing inequality, which provides a single-letter necessary condition for the joint distributions satisfying the Markov chain condition. By using our new data processing inequality, we developed a new single-letter upper bound for the sum rate of the multiple access channel with correlated sources.

APPENDIX: PROOF OF THEOREM 3

To find $\sup_{F(n,P_X),\, n=1,2,\dots} \lambda_2(\tilde{P}_{XU^n})$, we need to exhaust the sets $F(n,P_X)$ with $n = 1,2,\dots$. In the following, we show that it suffices to check only the asymptotic case. For any joint distribution $P_{XU^n} \in F(n,P_X)$, we attach an independent $U$, say $U_{n+1}$, to the existing $n$-sequence, and obtain a new joint distribution $P_{XU^{n+1}} = P_{XU^n} \otimes p_U^T$, where $p_U$ is the marginal distribution of $U$ in vector form. By arguments similar to those in Section IV, we have $\lambda_i(\tilde{P}_{XU^{n+1}}) = \lambda_i(\tilde{P}_{XU^n})$. Therefore, for every $P_{XU^n} \in F(n,P_X)$, there exists some $P_{XU^{n+1}} \in F(n+1,P_X)$ such that $\lambda_2(\tilde{P}_{XU^{n+1}}) = \lambda_2(\tilde{P}_{XU^n})$. Thus,

  $\sup_{F(n,P_X)} \lambda_2(\tilde{P}_{XU^n}) \le \sup_{F(n+1,P_X)} \lambda_2(\tilde{P}_{XU^{n+1}})$    (29)

From (29), we see that $\sup_{F(n,P_X)} \lambda_2(\tilde{P}_{XU^n})$ is monotonically non-decreasing in $n$. We also note that $\lambda_2(\tilde{P}_{XU^n})$ is upper bounded by 1 for all $n$, i.e., $\lambda_2(\tilde{P}_{XU^n}) \le 1$.
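The spectrum-preserving step of attaching an independent letter is easy to verify numerically. A small sketch (ours; the joint distribution and the marginal $p_U$ are illustrative choices):

```python
import numpy as np

def tilde_svals(P):
    """Singular values of tilde(P) = P_X^{-1/2} P P_Y^{-1/2}."""
    px, py = P.sum(axis=1), P.sum(axis=0)
    return np.linalg.svd(np.diag(px**-0.5) @ P @ np.diag(py**-0.5),
                         compute_uv=False)

# A hypothetical joint distribution of X and a single source letter U.
P_XU = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_U = np.array([0.6, 0.4])   # marginal of the independent letter attached

# Attaching an independent U_{n+1}: P_{XU^{n+1}} = P_{XU^n} (x) p_U^T.
P_XU2 = np.kron(P_XU, p_U[None, :])

s1 = tilde_svals(P_XU)
s2 = tilde_svals(P_XU2)
# The nonzero spectrum is unchanged; in particular lambda_2 is preserved.
assert np.isclose(s2[0], 1.0) and np.isclose(s2[1], s1[1])
```

The reason is that $\tilde{P}$ of the attached pair factors as a Kronecker product with the rank-one matrix $\sqrt{p_U}^T$, whose only singular value is 1.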
Therefore,

  $\sup_{F(n,P_X),\, n=1,2,\dots} \lambda_2(\tilde{P}_{XU^n}) = \lim_{n \to \infty} \sup_{F(n,P_X)} \lambda_2(\tilde{P}_{XU^n})$    (30)

To complete the proof, we need the following lemma.

Lemma 3 ([4]): $\lambda_2(\tilde{P}_{XY}) = 1$ if and only if $P_{XY}$ decomposes. By "$P_{XY}$ decomposes" we mean that there exist sets $S_X \subset \mathcal{X}$ and $S_Y \subset \mathcal{Y}$ such that $P(S_X)$, $P(\bar{S}_X)$, $P(S_Y)$, $P(\bar{S}_Y)$ are positive, while $P(\bar{S}_X \times S_Y) = P(S_X \times \bar{S}_Y) = 0$.
In the following, we show by construction that there exists a joint distribution that decomposes asymptotically. For a given marginal distribution $p_X$, we arbitrarily choose a subset $S_X$ of the alphabet of $X$. We find a set $S_U$ in the alphabet of $U^n$ such that $P(S_U) = P(S_X)$, if possible. Otherwise, we pick $S_U$ such that $|P(S_U) - P(S_X)|$ is minimized. We denote by $S(n)$ the set of all subsets of the alphabet of $U^n$, and we define $P_{\max} = \max_u \Pr(U = u)$. Then, we have

  $\min_{S_U \in S(n)} |P(S_U) - P(S_X)| \le P_{\max}^n$    (31)

We construct a joint distribution for $X$ and $U^n$ as follows. First, we construct the joint distribution $P^i$ corresponding to the case where $X$ and $U^n$ are independent. Second, we rearrange the alphabets of $X$ and $U^n$ and partition $P^i$ according to the sets $S_X$, $\bar{S}_X$, $S_U$ and $\bar{S}_U$ as

  $P^i = \begin{bmatrix} P^i_{11} & P^i_{12} \\ P^i_{21} & P^i_{22} \end{bmatrix}$    (32)

where $P^i_{11}$, $P^i_{12}$, $P^i_{21}$, $P^i_{22}$ correspond to the sets $S_X \times S_U$, $S_X \times \bar{S}_U$, $\bar{S}_X \times S_U$, $\bar{S}_X \times \bar{S}_U$, respectively. Here, we assume that $P(S_U) \ge P(S_X)$. Then, we scale these four sub-matrices as $P'_{11} = P^i_{11}/P(S_U)$, $P'_{12} = 0$, $P'_{21} = P^i_{21}\,(P(S_U) - P(S_X))/(P(\bar{S}_X)P(S_U))$, $P'_{22} = P^i_{22}/P(\bar{S}_X)$, and let

  $P' = \begin{bmatrix} P'_{11} & 0 \\ P'_{21} & P'_{22} \end{bmatrix}$    (33)

We note that $P'$ is a joint distribution for $X$ and $U^n$ with the given marginal distributions. Next, we move the mass in the sub-matrix $P'_{21}$ to $P'_{11}$, which yields

  $P'' = \begin{bmatrix} P''_{11} & 0 \\ 0 & P'_{22} \end{bmatrix} = P' + E = \begin{bmatrix} P'_{11} & 0 \\ P'_{21} & P'_{22} \end{bmatrix} + \begin{bmatrix} E_{11} & 0 \\ -P'_{21} & 0 \end{bmatrix}$    (34)

where $E_{11} = P^i_{11}\,(P(S_U) - P(S_X))/(P(S_X)P(S_U))$, and $P''_{11} = P'_{11}\,P(S_U)/P(S_X)$. We denote by $P''_X$ and $P''_{U^n}$ the marginal distributions of $P''$. We note that $P''_{U^n} = P_{U^n}$ and $P''_X = P_X M$, where $M$ is a diagonal scaling matrix: the elements in the set $S_X$ are scaled up by a factor of $P(S_U)/P(S_X)$, and those in $\bar{S}_X$ are scaled down by a factor of $P(\bar{S}_U)/P(\bar{S}_X)$. Then,

  $\tilde{P}''_{XU^n} = M^{-1/2}\tilde{P}'_{XU^n} + M^{-1/2}\tilde{E}$    (35)

where $\tilde{E} = P_X^{-1/2} E P_{U^n}^{-1/2}$.

We will need the following lemmas in the remainder of our derivation. Lemma 5 can be proved using techniques similar to those in the proof of Lemma 4 [9].

Lemma 4 ([9]): If $A' = A + E$, then $|\lambda_i(A') - \lambda_i(A)| \le \|E\|$, where $\|E\|$ is the spectral norm of $E$.

Lemma 5: If $A' = MA$, where $M$ is an invertible matrix, then $\|M^{-1}\|^{-1} \le \lambda_i(A')/\lambda_i(A) \le \|M\|$.

Since $P''$ decomposes, using Lemma 3, we conclude that $\lambda_2(\tilde{P}''_{XU^n}) = 1$.
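The construction can be sketched numerically. The following is our illustration, not the paper's procedure verbatim: we take binary $X$ with $P(S_X) = 1/2$, an assumed source marginal $p_U = (0.6, 0.4)$, and a simple sorted-prefix rule for picking $S_U$ in place of the exact minimization behind (31). The second singular value $\lambda_2(\tilde{P}'_{XU^n})$ indeed climbs toward 1:

```python
import numpy as np

def lambda2(P):
    """Second singular value of tilde(P)."""
    px, py = P.sum(axis=1), P.sum(axis=0)
    return np.linalg.svd(np.diag(px**-0.5) @ P @ np.diag(py**-0.5),
                         compute_uv=False)[1]

p_X = np.array([0.5, 0.5])   # binary X, S_X = {0}, so P(S_X) = 0.5
p_U = np.array([0.6, 0.4])   # hypothetical single-letter source marginal

def construct(n):
    pUn = p_U
    for _ in range(n - 1):
        pUn = np.kron(pUn, p_U)          # product distribution of U^n
    order = np.argsort(-pUn)             # atoms, largest first
    csum = np.cumsum(pUn[order])
    k = int(np.searchsorted(csum, 0.5)) + 1
    in_SU = np.zeros(pUn.size, dtype=bool)
    in_SU[order[:k]] = True              # S_U with P(S_U) >= P(S_X)
    PSU = pUn[in_SU].sum()
    # Blocks of P' as in (33): row x = 0 (the set S_X) puts all of its
    # mass on S_U; row x = 1 supplies the remaining column mass.
    Pp = np.zeros((2, pUn.size))
    Pp[0, in_SU] = 0.5 * pUn[in_SU] / PSU
    Pp[1, in_SU] = 0.5 * pUn[in_SU] * (PSU - 0.5) / (0.5 * PSU)
    Pp[1, ~in_SU] = 0.5 * pUn[~in_SU] / 0.5
    return Pp

l2s = [lambda2(construct(n)) for n in (2, 6, 10)]
assert np.allclose(construct(2).sum(axis=1), p_X)   # marginals preserved
assert l2s[0] < l2s[1] < l2s[2]                     # lambda_2 increases
assert l2s[2] > 0.99                                # and approaches 1
```

For $n = 2$ the prefix rule overshoots $P(S_X)$ by 0.1 and $\lambda_2 \approx 0.816$; by $n = 10$ the overshoot is below $10^{-3}$ and $\lambda_2$ exceeds 0.99, consistent with the bound (43) below.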
We upper bound $\|\tilde{E}\| = \|P_X^{-1/2} E P_{U^n}^{-1/2}\|$ as follows:

  $\|\tilde{E}\| \le \|\tilde{E}\|_F$    (36)

where $\|\cdot\|_F$ is the Frobenius norm. Combining (32) and (34), we have

  $\|\tilde{E}\|_F \le \frac{P(S_U) - P(S_X)}{P_1\,P(S_U)}\,\|\tilde{P}^i_{XU^n}\|_F$    (37)

where $P_1 = \min(P(S_X), P(\bar{S}_X))$. Since $P^i$ corresponds to the independent case, we have $\|\tilde{P}^i_{XU^n}\|_F = 1$ from (7). Then, from (31), (36) and (37), we obtain

  $\|\tilde{E}\| \le c_1 P_{\max}^n$    (38)

where $c_1 = 1/(P_1\,P(S_X))$. From Lemma 2, we have

  $\|M^{-1/2}\tilde{E}\| = \lambda_1(M^{-1/2}\tilde{E}) \le \|M^{-1/2}\|\,c_1 P_{\max}^n \le c_2 P_{\max}^n$    (39)

From Lemma 4, we have

  $1 - c_2 P_{\max}^n \le \lambda_2(M^{-1/2}\tilde{P}'_{XU^n}) \le 1 + c_2 P_{\max}^n$    (40)

We upper bound $\|M^{1/2}\|$ as follows:

  $\|M^{1/2}\| = \sqrt{\frac{P(S_U)}{P(S_X)}} \le \sqrt{1 + \frac{P_{\max}^n}{P(S_X)}} \le 1 + c_3 P_{\max}^{n/2}$    (41)

Similarly, $\|M^{-1/2}\| \le 1 + c_4 P_{\max}^{n/2}$. From Lemma 5, we have

  $\big(1 - c_4 P_{\max}^{n/2}\big)\,\lambda_2(M^{-1/2}\tilde{P}'_{XU^n}) \le \lambda_2(\tilde{P}'_{XU^n}) \le \big(1 + c_3 P_{\max}^{n/2}\big)\,\lambda_2(M^{-1/2}\tilde{P}'_{XU^n})$    (42)

Since $P'$ is a joint distribution matrix, from Theorem 1, we know that $\lambda_2(\tilde{P}'_{XU^n}) \le 1$. Therefore, we have

  $\big(1 - c_4 P_{\max}^{n/2}\big)\big(1 - c_2 P_{\max}^n\big) \le \lambda_2(\tilde{P}'_{XU^n}) \le 1$    (43)

When $P_{\max} < 1$, corresponding to the non-trivial case, $\lim_{n \to \infty} P_{\max}^{n/2} = 0$, and using (30), (24) follows. The case $P(S_U) < P(S_X)$ can be proved similarly.

REFERENCES

[1] T. M. Cover, A. El Gamal, and M. Salehi, "Multiple access channels with arbitrarily correlated sources," IEEE Trans. Inform. Theory, vol. 26, pp. 648-657, Nov. 1980.
[2] G. Dueck, "A note on the multiple access channel with correlated sources," IEEE Trans. Inform. Theory, vol. 27, pp. 232-235, Mar. 1981.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons, 1991.
[4] H. S. Witsenhausen, "On sequences of pairs of dependent random variables," SIAM Journal on Applied Mathematics, vol. 28, pp. 100-113, Jan. 1975.
[5] K. Marton, "The structure of isomorphisms of discrete memoryless correlated sources," Zeitschrift fuer Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 56(3), pp. 317-327, 1981.
[6] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences. Academic Press, 1979.
[7] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, 1985.
[8] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, 1991.
[9] G. W. Stewart, "On the early history of the singular value decomposition," SIAM Review, vol. 35, pp. 551-566, Dec. 1993.