A New Data Processing Inequality and Its Applications in Distributed Source and Channel Coding

Size: px
Start display at page:

Download "A New Data Processing Inequality and Its Applications in Distributed Source and Channel Coding"

Transcription

1 A New Data Processing Inequality and Its Applications in Distributed Source and Channel Coding Wei Kang Sennur Ulukus ariv:cs/0607v [cs.it 3 Nov 006 Department of Electrical and Computer Engineering University of Maryland, College Park, MD 074 wkang@eng.umd.edu ulukus@umd.edu July 4, 08 Abstract In the distributed coding of correlated sources, the problem of characterizing the joint probability distribution of a pair of random variables satisfying an n-letter Markov chain arises. The exact solution of this problem is intractable. In this paper, we seek a singleletter necessary condition for this n-letter Markov chain. To this end, we propose a new data processing inequality on a new measure of correlation by means of spectrum analysis. Based on this new data processing inequality, we provide a single-letter necessary condition for the required joint probability distribution. We apply our results to two specific examples involving the distributed coding of correlated sources: multi-terminal rate-distortion region and multiple access channel with correlated sources, and propose new necessary conditions for these two problems. This work was supported by NSF Grants CCR 03-3, CCF and CCF It was presented in part at the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, October 005 [, the Conference on Information Sciences and Systems (CISS), Princeton, NJ, March 006 [, and the IEEE International Symposium on Information Theory (ISIT), Seattle, WA, July 006 [3.

2 Problem Formulation In this paper, we consider a pair of correlated discrete source sequences with length n, (U n,v n ) = {(U,V ),...,(U n,v n )}, which areindependent andidentically distributed(i.i.d.) in time, i.e., p(u n,v n ) = and n p(u i,v i ) () p(u i,v i ) = p(u,v), i =,...,n () wherethesingle-letterjointdistributionp(u,v)isdefinedonthealphabetu V. Let(, ) be two random variables such that (,,U n,v n ) satisfies i= p(x,x,u n,v n ) = p(u n,v n )p(x u n )p(x v n ) (3) or equivalently, U n V n This Markov chain appears in some problems involving the distributed coding of correlated sources. For example, in distributed rate-distortion problem [4 6, (, ) is used to reconstruct, (Ûn, ˆV n ), an estimate of the sources (U n,v n ), and in the problem of multiple access channel with correlated sources [7,8, (, ) is sent though a multiple access channel in one channel use. Although these specific problems have been studied separately in their own contexts, the common nature of these problems, the distributed coding of correlated sources, enables us to conduct a general study, which will be applicable to these specific problems. The study of the converse proofs of (or the necessary conditions for) the above specific problems raises the following questions. We know that the correlation between (, ) is limited, if a single-letter Markov chain U V is to be satisfied. With the help of more letters of the sources, i.e., U n V n with n larger than, the correlation between (, ) may increase. The question here is how correlated (, ) can be, when n goes to infinity. More specifically, can they be arbitrarily correlated? If not, then, how much extra correlation can (, ) gain when n goes from to? To answer these questions, we need to determine the set of all valid joint probability distributions p(x,x ), if U n V n is to be satisfied with n going to infinity, i.e., S {p(x,x ) : U n V n, n } (4) = f (U n ) and = f (V n ) is a degenerate case. We are also interested in determining the set of all valid probability distributions p(x,x,u,v ), or the set of all valid probability distributions p(x,x,u,u,v,v ), etc., if this Markov chain constraint is to be satisfied.

3 We note that it is practically impossible to exhaust the elements in the set S by searching over all conditional distribution pairs (p(x u n ),p(x v n )) when n. In other words, determining the set of all possible probability distributions p(x,x ) satisfying the n-letter Markov chain, i.e., the set S, seems computationally intractable. To avoid this problem, we seek a single-letter necessary condition for the above n-letter Markov chain. The resulting set, characterized by computable single-letter constraints, will contain the target set S. The most intuitive necessary condition for a Markov chain is the data processing inequality [9, p. 3, i.e., if U n V n, then I( ; ) I(U n ;V n ) = ni(u;v) (5) Since I(U n ;V n ) increases linearly with n, the constraint in (5) will be loose when n is sufficiently large. Although the data processing inequality in its usual form does not prove useful in this problem, we will still use the basic methodology of employing a data processing inequality to find a necessary condition for the n-letter Markov chain under consideration. For this, we will introduce a new measure of correlation, and develop a new data processing inequality based on this new measure of correlation. Spectrum analysis has been instrumental in the study of some properties of pairs of correlated random variables, especially, those of i.i.d. sequences of pairs of correlated random variables, e.g., common information in [0 and isomorphism in [. In this paper, we use spectrum analysis to introduce a new data processing inequality, which provides a singleletter necessary condition for the joint distributions satisfying the n-letter Markov chain. Main Results. Some Preliminaries In this section, we provide some basic results which will be used in our later development. The concepts used here are originally introduced by Witsenhausen in [0 in the context of operator theory. Here, we focus on the finite alphabet case, and derive our results by means of matrix theory. We first introduce our matrix notation for probability distributions. For a pair of discrete random variables and, which take values in and, respectively, the joint probability distribution matrix P is defined as P (i,j) Pr( = x i, = y j ) (6) where P (i,j) denotes the (i,j)-th element of the matrix P. The marginal distribution 3

4 matrix of a random variable, P, is defined as a diagonal matrix with P (i,i) Pr( = x i ) (7) and the vector-form marginal distribution, p, is defined as 3 p (i) Pr( = x i ) (8) or equivalently p = P e, where e is the vector of all ones. p can also be defined as p P for some degenerate random variable whose alphabet size is equal to one. For convenience, we define p P e (9) For conditional distributions, we define matrix P z as P z (i,j) Pr( = x i, = y j Z = z) (0) The vector-form conditional distribution p z is defined as p z (i) Pr( = x i Z = z) () or equivalently, p z (i) P z for some degenerate random variable whose alphabet size is equal to one. as We define a new matrix, P, which will play an important role in the rest of the paper, P P P P () Since p P for some degenerate random variable whose alphabet size is equal to one, we define p = P P P = P p = p (3) The counterparts for conditional distributions, P z and p y, can be defined similarly. A valid joint distribution matrix, P, is a matrix whose entries are non-negative and sum to. Due to this constraint, not every matrix will qualify as a P corresponding to a joint distribution matrix as defined in (). A necessary and sufficient condition for P to correspond to a joint distribution matrix is given in Theorem below, which identifies the spectral properties of P. Before stating the theorem, we provide a lemma and a definition regarding stochastic matrices, which will be used in the proof of the theorem. 3 In this paper, we only consider the case where p is a positive vector. 4

5 Definition [, p. 48 A square matrix T of order n is called (row) stochastic if T(i,j) 0 i,j =,...,n, T(i,j) = i =,...,n (4) j= Lemma [, p. 49 The spectral radius of a stochastic matrix is. A non-negative matrix T is stochastic if and only if e is an eigenvector of T corresponding to the eigenvalue. Theorem A non-negative matrix P is a joint distribution matrix with marginal distributions P and P, i.e., Pe = p P e and P T e = p P e, if and only if the singular value decomposition (SVD) of the non-negative matrix P P PP satisfies P = MΛN T = p (p ) T + l λ i µ i νi T (5) i= where M [µ,...,µ l and N [ν,...,ν l are two unitary matrices, Λ diag[λ,...,λ l and l = min(, ); µ = p, ν = p, and λ = λ λ l 0. That is, all of the singular values of P are between 0 and, the largest singular value of P is, and the corresponding left and right singular vectors are p and p. Proof: Let P satisfy (5), then P PP e = P (p (p ) T + ) l λ i µ i νi T i= p l = P p (p ) T p +P λ i µ i νi T ν i= = p (6) Similarly, e T P PP = pt. Thus, the non-negative matrix P PP matrix with marginal distributions p and p. is a joint distribution Conversely, we consider a joint distribution P with marginal distributions p and p. We need to show that the singular values of P lie in [0,, the largest singular value is equal to, and p and p, respectively, are the left and right singular vectors corresponding to the singular value. To this end, we first construct a Markov chain Z with P = P Z = P (this construction comes from [0). Note that this also implies P = P Z, P = P Z = P, and P = P Z. The special structure of the constructed Markov chain 5

6 provides the following: P Z = P P Z = P P = PP PT P = P (P PP )(P P T P )P = P P PT P (7) which implies that the matrix P Z is similar to the matrix P P T [3, p. 44. Therefore, all the eigenvalues of P Z are the eigenvalues of P P T as well, and if ν is a left eigenvector of P Z corresponding to an eigenvalue λ, then P ν is a left eigenvector of P P T corresponding to the same eigenvalue. We note that P Z T is a stochastic matrix, therefore, from Lemma, e is a left eigenvector of P Z corresponding the eigenvalue, which is equal to the spectral radius of P Z. Since P Z is similar to P P T, we have that p is a left eigenvector of P PT with eigenvalue, and all the eigenvalues of P PT lie in [,. In addition, P PT is a symmetric positive semi-definite matrix, which implies that the eigenvalues of P P T are real and non-negative. Since the eigenvalues of P P T are non-negative, and the largest eigenvalue is equal to, we conclude that all of the eigenvalues of P P T lie in the interval [0,. The singular values of P are the square roots of the eigenvalues of P PT, and the left singular vectors of P are the eigenvectors of P P T. Thus, the singular values of P lie in [0,, the largest singular value is equal to, and p is a left singular vector corresponding to the singular value. The corresponding right singular vector is which concludes the proof. ν T = µ T P = (p ) T P PP = e T PP = p T P = (p ) T (8) This theorem implies that there is a one-to-one relationship between P and P. It is easy to see from () that there is a unique P for every P. Conversely, any given P satisfying (5) gives a unique pair of marginal distributions (P,P ), which is specified by the left and right positive singular vectors corresponding to its largest singular value 4. Then, from (), using P and (P,P ) given by its singular vectors, we obtain a unique P as P = P PP (9) Because of this one-to-one relationship, exploring all possible joint distribution matrices P 4 We observe that there may exist multiple singular values equal to, but µ and ν are the only positive singular vectors. 6

7 is equivalent to exploring all possible non-negative matrices P satisfying (5). Here, λ,...,λ l can be viewed as a group of quantities, which measures the correlation between random variables and. We note that when λ = = λ l =, and are fully correlated, and, when λ = = λ l = 0, and are independent. In all the cases between these two extremes, and are arbitrarily correlated. Moreover, Witsenhausen showed that and have a common data if and only if λ = [0. In the next section, we will propose a new data processing inequality with respect to these new measures of correlation, λ,...,λ l. By utilizing this new data processing inequality, we will provide a single-letter necessary condition for the n-letter Markov chain U n V n.. A New Data Processing Inequality In this section, first, we introduce a new data processing inequality in the following theorem. Here, we provide a lemma that will be used in the proof of the theorem. Lemma [4, p. 78 For matrices A and B λ i (AB) λ i (A)λ (B) (0) where λ i ( ) denotes the i-th largest singular value of a matrix. Theorem If Z, then λ i ( P Z ) λ i ( P )λ ( P Z ) λ i ( P ) () where i =,...,rank( P Z ). Proof: From the structure of the Markov chain, and from the definition of P in (), we have P Z = P P ZP Z = P P P P P Z P Z = P PZ () Using (5) for P Z, we obtain P Z =p (p Z ) T + l λ i ( P Z )µ i ( P Z )ν i ( P Z ) T (3) i= 7

8 and applying (5) to P and P Z yields P PZ ( = p (p ) T + ) l l λ i ( P )µ i ( P )ν i ( P ) )(p T (p Z ) T + λ i ( P Z )µ i ( P Z )ν i ( P Z ) T i= ( l )( l ) =p (p Z ) T + λ i ( P )µ i ( P )ν i ( P ) T λ i ( P Z )µ i ( P Z )ν i ( P Z ) T i= i= i= (4) where the two cross-terms vanish because p plays the roles of both ν ( P ) and µ ( P Z ), and therefore, p is orthogonal to both ν i ( P ) and µ j ( P Z ), for all i,j. Using () and equating (3) and (4), we obtain l λ i ( P Z )µ i ( P Z )ν i ( P Z ) T i= ( l )( l ) = λ i ( P )µ i ( P )ν i ( P ) T λ i ( P Z )µ i ( P Z )ν i ( P Z ) T i= i= (5) The proof is completed by applying Lemma to (5) and also by noting that λ ( P Z ) from Theorem. Theorem is a new data processing inequality in the sense that the processing from to Z reduces the correlation measure λ i, i.e., the correlation between and Z, λ i ( P Z ), is less than or equal to the correlation measure between and, λ i ( P ). We note that this theorem is similar to the data processing inequality in [9, p. 3 except instead of mutual information, we use λ i ( P ) as the correlation measure. In the sequel, we will show that this new data processing inequality helps us develop a necessary condition for the n-letter Markov chain while the data processing inequality in its usual form [9, p. 3 is not useful in this context..3 A Necessary Condition Now, we switch our attention to i.i.d. sequences of correlated sources. Let (U n,v n ) be a pair of i.i.d. (in time) sequences, where each letter of these sequences satisfies a joint distribution P UV. Thus, the joint distribution of the sequences is P U n V n = P n UV, where A A, A k A A (k ), and denotes the Kronecker product of matrices [3. From (), we know that P UV = P U PUV P V (6) 8

9 Then, We also have P U n = P n U P U n V n = P n UV = (P U PUV P V ) n = (P n U ) P n UV (P V ) n (7) and P V n = P n V. Thus, P U n V n P U P n U n V np V n Now, applying SVD to P U n V n, we have n P n = (P U ) n (P U ) UV (P V ) n (P V ) n n = P UV (8) P U n V n = M nλ n N T n = P n UV = M n Λ n (N n ) T (9) From the uniqueness of the SVD, we know that M n = M n, Λ n = Λ n and N n = N n. Then, the ordered singular values of P U n V n are {,λ ( P UV ),...,λ ( P UV ),...} where the second through the n+-st singular values are all equal to λ ( P UV ). From Theorem, we know that if U n V n with n, then, for i =,...,min(, ), λ i ( P ) λ ( P U n)λ i( P U n V n)λ ( P V n ) (30) We showed above that λ i ( P U n V n) λ ( P UV ) for i, and λ i ( P U n V n) = λ ( P UV ) for i =,...,n+. Therefore, for i =,...,min(, ), we have λ i ( P ) λ ( P U n)λ ( P UV )λ ( P V n ) (3) From Theorem, we know that λ ( P U n) and λ ( P V n ). Next, in Theorem 3, we determine that the least upper bound for λ ( P U n) and λ ( P V n ) is also. Theorem 3 Let F(n,P ) be the set of all joint distributions for and U n with a given marginal distribution for, P. Then, The proof of Theorem 3 is given in Appendix B.. sup λ ( P Un) = (3) F(n,P ), n=,,... Based on the above discussion, we have the following theorem. 9

10 Theorem 4 If U n V n, then, for i =,...,min(, ), λ i ( P ) λ ( P UV ) (33) Theorem 4 provides a single-letter necessary condition for the n-letter Markov chain U n V n on the joint probability distribution p(x,x ). This theorem also answers the questions we posed in Section. Our first question was whether (, ) can be arbitrarily correlated, when n goes to infinity. Theorem 4 shows that (, ) cannot be arbitrarily correlated, as the correlation measures between (, ), λ i ( P ), are upper bounded by, λ ( P UV ), the second correlation measure of the single-letter sources (U,V). Our second question was how much extra correlation (, ) can gain when n goes from to. Although we have no exact answer for this question, the following observation may provide some insights into this problem. From Theorem, we know that, if U V, λ i ( P ) λ i ( P UV ) i =,...,min(, ) (34) Theorem 4 shows, on the other hand, that, if U n V n, λ i ( P ) λ ( P UV ) i =,...,min(, ) (35) Therefore, we note that n going from to increases the upper bounds 5 for the correlation measures λ i ( P ) from λ i ( P UV ) to λ ( P UV ) for i = 3,...,min(, ). As we mentioned in Section, the data processing inequality in its usual form [9, p. 3 is not helpful in this problem, while our new data processing inequality, i.e., Theorem, provides a single-letter necessary condition for this n-letter Markov chain. The main reason for this difference is that while themutual information, I(U n ;V n ), the correlationmeasure in the original data processing inequality, increases linearly with n, λ i ( P U n V n), the correlation measure in our new data processing inequality, is bounded as n increases, and therefore, makes the problem more tractable. Theorem 4 is valid for all discrete random variables. To illustrate the utility and also the limitations of Theorem 4, we will study a binary example in detail in Appendix A. In this example, (U,V) and (, ) are binary random variables. For this specific binary example, we will apply Theorem 4 to obtain a necessary condition for the n-letter Markov chain. Moreover, the special structure of this binary example will enable us to provide a sharper necessary condition than the one given in Theorem 4. We will compare these two necessary conditions and a sufficient condition for this binary example. 5 In general, these upper bounds are not tight. 0

11 .4 Conditional Distributions Theorem 4 in Section.3 provides a necessary condition for joint probability distributions p(x,x ), which satisfy the Markov chain U n V n. In certain specific problems, e.g., multi-terminal rate-distortion problem and multiple access channel with correlated sources, in addition to p(x,x ), the distributions of (, ) conditioned on parts of the n-letter sources may be needed, e.g., p(x,x u,v ), p(x,x u,u,v,v ), etc. 6 In this section, we will develop a result similar to that in Theorem 4 for conditional distributions. For a pair of i.i.d. sequences (U n,v n ) of length n, we define U as an arbitrary subset of {U,...,U n }, i.e., U {U i,...,u il } {U,...,U n } (36) and similarly, V {V j,...,v jk } {V,...,V n } (37) In the following theorem, we propose an upper bound for λ i ( P uv), when U n V n is satisfied. Theorem 5 Let (U n,v n ) be a pair of i.i.d. sequences of length n, and let the random variables, satisfy U n V n. Then, for i =,...,min(, ), λ i ( P uv) λ ( P UV ) (38) where U {U,...,U n } and V {V,...,V n }. Proof: We consider a special case of (U,V) as follows. We define U {U,...,U l } and V {V,...,V m,v l+,...,v l+k m }. We also define the complements of U and V as: U c {U,...,U n }\U and V c {V,...,V n }\V. If U and V take other forms, we can transform them to the form we defined above by permutations. We know that p(x,x,u c,v c u,v) = p(x u c,u,v)p(u c,v c u,v)p(x v c,v,u) (39) In other words, given U = u and V = v, (,U c,v c, ) form a Markov chain. Thus, from (), Furthermore, P uv = P U c uv P U c V c uv P V c uv (40) P U c V c uv = p T V p m+ l ul m+ U l+k m l+ v P l+k m U n l+ l+k m+ Vl+k m+ n (4) 6 The reader may wish to consult Sections 3 and 4 for further motivations to consider conditional probability distributions.

12 As mentioned earlier, a vector marginal distribution can be viewed as a joint distribution matrix with a degenerate random variable whose alphabet size is equal to. Since the rank of a vector is, from Theorem, the sole singular value of p V l m+ u l m+ (and of p U l+k m is equal to. Then, Combining (), (40), and (4), we obtain l+ v l+k m l+ λ i ( P U c V c uv) = λ i ( P U n l+k m+ V n l+k m+ ) (4) ) λ i ( P uv) λ ( P UV ) (43) which completes the proof..5 General Result In Sections.3 and.4, we proposed necessary conditions for the n-letter Markov chain U n V n on p(x,x ) and p(x,x uv), respectively. With these tools, we will develop a general result in this section. We define the set S UV as follows S UV {p(x,x u,v) : U n V n,n } (44) where U {U,...,U n } and V {V,...,V n }. We may invoke Theorem 5 with (U,V) = (U,V) and obtain S UV {p(x,x u,v) : λ i ( P uv) λ ( P UV ),i =,...,min(, )} S UV (45) In the following, we use Theorem 5 with different choices of set arguments to find a set that is smaller than S UV, but still contains S UV. We note that for a given source distribution p(u,v), we can obtain p(x,x u,v ) (or equivalently P u v ) for any U U and V V, from the conditional distribution p(x,x u,v). Thus, if we define S U V {p(x,x u,v) : λ i ( P u v ) λ ( P UV ),i =,...,min(, )} (46) then, by invoking Theorem 5 with (U,V) = (U,V ), we have S UV S U V (47) Consequently, if we define S UV S U V (48) U U,V V

13 then, we have S UV S UV S UV (49) That is, when we need a necessary condition on p(x,x u,v), even though S UV provides such a necessary condition, we can obtain a smaller probability set and therefore a stricter necessary condition by combining the necessary conditions for all p(x,x u,v ) where the sets U and V are included in the sets U and V, respectively. 3 Example I: Multi-terminal Rate-distortion Region Ever since the milestone paper of Wyner and Ziv [5 on the rate-distortion function of a single source with side information at the decoder, there has been a significant amount of efforts directed towards solving a generalization of this problem, the so called multi-terminal rate-distortion problem. Among all the attempts on this difficult problem, the notable works bytung [4andHousewright [5(seealso [6)providetheinner andouterboundsfortheratedistortion region. A more recent progress on this problem is by Wagner and Anantharam in [6, where a tighter outer bound is given. A very promising and very recent result can be found in [7. The multi-terminal rate-distortion problem can be formulated as follows. Consider a pair of discrete memoryless sources (U, V), with joint distribution p(u, v) defined on the finite alphabet U V. The reconstruction of the sources are built on another finite alphabet Û ˆV. The distortion measures are defined as d : U Û R+ {0} and d : V ˆV R + {0}. Assume that two distributed encoders are functions f : U n {,,...,M } and f : V n {,,...,M } and a joint decoder is the function g : {,,...,M } {,,...,M } Ûn ˆ V n, where n is a positive integer. A pair of distortion levels D (D,D ) is said to be R-attainable, for some rate pair R (R,R ), if for all ǫ > 0 and δ > 0, there exist, some positive integer n and a set of distributed encoders and joint decoder (f,f,g) with rates ( n log M, n log M ) = (R +δ,r +δ), such that the distortion between the sources (U n,v n ) and the decoder output (Ûn, ˆV n ) satisfies 7 ( Ed (U n, ˆV n ),Ed (V n, ˆV n ) ) < (D + ǫ,d + ǫ) where d (U n,ûn ) n n i= d (U i,ûi) and d (V n, ˆV n ) n n i= d (V i, ˆV i ). The problem here is to determine, for a fixed D, the set R(D) of all rate pairs R, for which D is R-attainable. 3. Existing Results We restate the outer bound provided in [4 and [5 in the following theorem. 7 By (A,B) < (C,D), we mean both A < B and C < D, and (A,B) (C,D) is defined in the similar manner. 3

14 Theorem 6 [4,5 R(D) R out, (D), where R out, (D) is the set of all R such that there exists a pair of discrete random variables (, ), for which the following three conditions are satisfied:. The joint distribution satisfies U V (50) U V (5). The rate pair satisfies R I(U,V; ) (5) R I(U,V; ) (53) R +R I(U,V;, ) (54) 3. There exists ( Û(, ), ˆV(, ) ) such that ( Ed (U,Û),Ed (V, ˆV ) ) D. An inner bound is also given in [4 and [5 as follows. Theorem 7 [4,5 R(D) R in (D), where R in (D) is the set of all R such that there exists a pair of discrete random variables (, ), for which the following three conditions are satisfied:. The joint distribution satisfies U V (55). The rate pair satisfies R I(U,V; ) (56) R I(U,V; ) (57) R +R I(U,V;, ) (58) 3. There exists ( Û(, ), ˆV(, ) ) such that ( Ed (U,Û),Ed (V, ˆV) ) D. We notethattheinner andouter boundsagreeonboththesecond condition, i.e., therate constraints in terms of some mutual information expressions, and the third condition, i.e., the reconstruction functions. However, the first condition in these two bounds constraining the underlying probability distributions p(x,x u,v) are different. It is easy to see that the Markov chain condition in the inner bound, i.e., U V, implies the Markov 4

15 chain conditions in the outer bound, i.e., U V and U V. Hence, if we define S out, {p(x,x u,v) : U V and U V } (59) S in {p(x,x u,v) : U V } (60) then, S in S out, (6) Using thetime-sharing argument, aconvexification oftheinner boundr in (D)yields another inner bound R in(d), which is larger than R in (D). This new inner bound may be expressed as a function of S in and D as follows, R in (D) R in (D) = F(S in,d) R(D) (6) where, using a time sharing random variable Q, which is known by the encoders and the decoder, F(S in,d) is defined as, F(S in,d) p P(S in,d) C(p) (63) p p(x,x,q u,v) = p q (x,x u,v)p(q) (64) p q (x,x u,v) S in ; P(S in,d) p : ( Û(,,Q), ˆV(,,Q) ), s.t. ( Ed (U,Û),Ed (V, ˆV) ) (65) D R I(U,V;,Q) C(p) (R,R ) : R I(U,V;,Q) (66) R +R I(U,V;, Q) From the definition of the function F, we can see that F is monotonic with respect to the set argument when the distortion argument is fixed, i.e., F(A,D) F(B,D), if A B (67) In [5, it was shown that R out, (D) is convex. Thus, R out, (D) can be represented in terms of function F as well, i.e., R out, (D) = F(S out,,d) (68) 5

16 F as 8 R out,(d) = F(S out,,d) (69) The result by Wagner and Anatharam [6 can also be expressed by using the function where S out, {p(x,x u,v) : w,p(x,x,w u,v) = p(w)p(x w,u)p(x w,v)} (70) The distribution in (70) may be represented by the following Markov chain like notation U V ց ր W (7) We note that S in S out, S out, (7) Therefore, we conclude that the gap between the inner and the outer bounds comes only from the difference between the feasible sets of the probability distributions p(x,x u,v). In the next section, we will provide a tighter outer bound for the rate region in the sense that it can be represented using the same mutual information expressions, however, on a smaller feasible set for p(x,x u,v) than R out, (D). 3. A New Outer Bound We propose a new outer bound for the multi-terminal rate-distortion region as follows. Theorem 8 R(D) R out, (D), where R out, (D) is the set of all R such that there exist some positive integer n, and discrete random variables Q,, for which the following three conditions are satisfied:. The joint distribution satisfies p(u n,v n,x,x,q) = p(q)p(x u n,q)p(x v n,q). The rate pair satisfies n p(u i,v i ) (73) i= R I(U,V ;,Q) (74) R I(U,V ;,Q) (75) R +R I(U,V ;, Q) (76) 8 This is a simplified version of [6 with the assumption that there is no hidden source behind (U n,v n ). 6

17 where (U,V ) is the first sample of the n-sequences (U n,v n ). 3. There exists ( Û(,,Q), ˆV(,,Q) ) such that ( Ed (U,Û),Ed (V, ˆV) ) D. or equivalently, where R out,3 (D) = F(S out,3,d) (77) S out,3 {p(x,x u,v ) : U n V n } (78) Proof: We consider an arbitrary triple (f,f,g) of two distributed encoders and one joint decoder with reconstructions (Ûn, ˆV n ) = g(,z), where = f (U n ) and Z = f (V n ), such that the distortions satisfy ( Ed (U n, ˆV n ),Ed (V n, ˆV n ) ) < (D + ǫ,d + ǫ). Here, we use R = n log (M ) = n log ( ) and R = n log (M ) = n log ( Z ). have where We define the auxiliary random variables i = (,U i ) and i = (Z,V i ). Then, we log (M ) H() = I(U n,v n ;) I(U n,v n ; Z) = I(U i,v i ; Z,U i,v i ) = = = 3 = = = i= I(U i,v i ;,Z U i,v i ) I(U i,v i ;Z U i,v i ) i= I(U i,v i ;,Z U i,v i ) I(U i,v i ;Z V i ) i= I(U i,v i ;,Z,U i V i ) I(U i,v i ;U i V i ) I(U i,v i ;Z V i ) i= I(U i,v i ;,Z,U i V i ) I(U i,v i ;Z V i ) i= I(U i,v i ;,U i Z,V i ) i= I(U i,v i ; i i ) (79) i=. follows from the fact that U n V n Z. We observe that the equality holds when is independent of Z; 7

18 . follows from the fact that p(z u i,v i,v i ) = p(z u i,v i,u i,v i ) (80) 3. follows from the memoryless property of the sources. Using a symmetrical argument, we obtain log (M ) I(U i,v i ; i i ) (8) i= Moreover, log (M M ) H(,Z) =I(U n,v n ;,Z) = H(U i,v i ) H(U i,v i,z,u i,v i ) = i= I(U i,v i ; i, i ) (8) i= We introduce a time-sharing random variable Q, which is uniformly distributed on {,...,n} and independent of U n and V n. Let the random variables and be such that p(x i,x i u i,v i,u c i,v c i) = p(x,x u,v,u c,v c,q = i) (83) where U c i {U,...,U i,u i+,...,u n } and V c i is defined similarly. Then, I(U i,v i ; i i ) = ni(u,v ;,Q) (84) i= I(U i,v i ; i i ) = ni(u,v ;,Q) (85) i= I(U i,v i ; i, i ) = ni(u,v ;, Q) (86) i= The reconstruction pair (Û, ˆV) is defined as follows. When Q = i, (Û, ˆV) (Ûi, ˆV i ), i.e., the i-th letter of (Ûn, ˆV n ) = g(,z). (Ûi, ˆV i ) is a function of (,Z), and, therefore, it is a function of (,,Q). Hence, we have that (Û, ˆV) is a function of (,,Q), i.e., ( Û(,,Q), ˆV(,,Q) ). It is easy to see that ( Ed (U,Û),Ed (V, ˆV) ) = ( Ed (U n, ˆV n ),Ed (V n, ˆV n ) ) < (D +ǫ,d +ǫ) (87) 8

19 which completes the proof. Next, we state and prove that our outer bound given in Theorem 8 is tighter than R out, (D) given in (69). Theorem 9 R out,3 (D) R out, (D) (88) Proof: Here, we provide two proofs. First, we prove this theorem by construction. For every (R,R ) point in R out,3 (D), there exist random variables Q,, satisfying (73), (R,R ) pair satisfying (74), (75) and (76), and a reconstruction pair ( Û(,,Q), ˆV(,,Q) ) suchthat ( Ed (U,Û),Ed (V, ˆV) ) D. Accordingto[5,let = (,Q)and = (,Q). Then, p(x,x u,v ) belongs to set S out,. Moreover, R I(U,V;,Q) = I(U,V; ) (89) and similarly, R I(U,V;,Q) = I(U,V; ) (90) and finally, R +R I(U,V;, Q) = H(U,V Q) H(U,V,,Q) = H(U,V) H(U,V,,Q) = H(U,V) H(U,V, ) = I(U,V;, ) (9) where. follows from the fact that Q is independent of (U,V). (Û, ˆV) is a function of (,,Q), and, therefore, it is a function of (, ) = ( (,Q),(,Q) ). Hence, for every rate pair (R,R ) R out,3 (D), there exist random variables, such that p(x,x u,v ) S out,, (R,R ) pair satisfies the mutual information constraints, and the reconstruction satisfies the distortion constraints. In other words, (R,R ) R out, (D), proving the theorem. An alternative proof comes from the comparison of S out, and S out,3, the feasible sets of probability distributions 9 p(x,x u,v ). We note that U n V n implies the Markov chain like condition in (7), which means that S out,3 S out, (9) 9 In S out,, the probability distribution is p(x,x u,v). Here, we just rename U = U and V = V. 9

20 and because of the monotonic property of F(,D) in (67), we have F(S out,3,d) = R out,3 (D) R out, (D) = F(S out,,d) (93) 3.3 A New Necessary Condition From the proof of Theorem 8, we note that ( i, i ) satisfies an n-letter Markov chain constraint i U n V n i. From the discussion in Section.5, we know that if the random variables and satisfy U n V n, then, λ i ( P ) λ ( P UV ) i =,...,min(, ) (94) λ i ( P u ) λ ( P UV ) i =,...,min(, ) (95) λ i ( P v ) λ ( P UV ) i =,...,min(, ) (96) λ i ( P u v ) λ ( P UV ) i =,...,min(, ) (97) or equivalently S out,3 S out,4 (98) where S out,4 {p(x,x u,v ) : (94),(95),(96), and (97) are satisfied} (99) Thus, we have the following theorem Theorem 0 R(D) R out,4 (D), where R out,4 (D) is the set of all R such that there exist discrete random variable Q independent of (U,V), and discrete random variables, for which the following three conditions are satisfied:. The joint distribution satisfies, λ i ( P q) λ ( P UV ) i =,...,min(, ) (00) λ i ( P uq) λ ( P UV ) i =,...,min(, ) (0) λ i ( P vq) λ ( P UV ) i =,...,min(, ) (0) λ i ( P uvq) λ ( P UV ) i =,...,min(, ) (03) 0

21 . The rate pair satisfies R I(U,V;,Q) (04) R I(U,V;,Q) (05) R +R I(U,V;, Q) (06) 3. There exists ( Û(,,Q), ˆV(,,Q) ) such that ( Ed (U,Û),Ed (V, ˆV) ) D. Equivalently, From Section.5, we have that R out,4 (D) = F(S out,4,d) (07) S out,3 S out,4 (08) and therefore R out,3 (D) = F(S out,3,d) R out,4 (D) = F(S out,4,d) (09) From Theorem 9, we know that S out,3 S out, (0) and R out,3 (D) = F(S out,3,d) R out, (D) = F(S out,,d) () So far, we have not been able to determine whether S out,4 S out, or S out, S out,4, however, we know that there exists some probability distribution p(x,x u,v ), which belongs to S out,, but does not belong to S out,4. For example, assume λ ( P UV ) < and some random variable W independent to (U,V). Let = (f (U ),W) and = (f (V ),W). We note that (,,U,V ) satisfies the Markov chain like condition in (7), i.e., p(x,x u,v ) S out,. But, (, ) contains common information W, which means that λ ( P ) = > λ ( P UV ) [0, and therefore, p(x,x u,v ) / S out,4. Based on this observation, we note that introducing S out,4 helps us rule out some unachievable probability distributions that may exist in S out,. The relation between different feasible sets of probability distributions p(x,x u,v ) is illustrated in Figure. Finally, we note that we can obtain a tighter outer bound interms of the function F(,D) by using a set argument which is the intersection of S out, and S out,4, i.e., R out, 4 (D) F(S out, S out,4,d) () It is straightforward to see that this outer bound R out, 4 (D) is in general tighter than the outer bound F(S out,,d).

22 all p(x, x u, v) S out, Sout, S out,3 S out,4 ~ λ i (P ) < ~ λ (P UV) ~ λ i (P ) < ~ u λ (P UV) ~ λ i (P < v ) ~ λ (P UV) ~ < ~ λ i (P ) λ (P UV) uv S in U V U V W U V U V for i =, 3,..., l n n U V Figure : Different sets of probability distributions p(x,x u,v). 4 Example II: Multiple Access Channel with Correlated Sources The problem of determining the capacity region of the multiple access channel with correlated sources can be formulated as follows. Given a pair of i.i.d. correlated sources(u, V) described by the joint probability distribution p(u, v), and a discrete, memoryless, multiple access channel characterized by the transition probability p(y x,x ), what are the necessary and sufficient conditions for the reliable transmission of n samples of the sources through the channel, in n channel uses, as n? 4. Existing Results The multiple access channel with correlated sources was studied by Cover, El Gamal and Salehi in [7 (a simpler proof was given in [8), where an achievable region expressed by single-letter entropies and mutual informations was given as follows. Theorem [7 A source (U,V) with joint distribution p(u,v) can be sent with arbitrarily small probability of error over a multiple access channel characterized by p(y x,x ), if there

23 exist probability mass functions p(s), p(x u,s), p(x v,s), such that H(U V) < I( ;,V,S) (3) H(V U) < I( ;,U,S) (4) H(U,V W) < I(, ; W,S) (5) H(U,V) < I(, ;) (6) where p(s,u,v,x,x,y) = p(s)p(u,v)p(x u,s)p(x v,s)p(y x,x ) (7) and w = f(u) = g(v) (8) is the common information in the sense of Witsenhausen, Gacs and Korner (see [0). The above region can be simplified if there is no common information between U and V as follows [7 H(U V) < I( ;,V) (9) H(V U) < I( ;,U) (0) H(U,V) < I(, ;) () where p(u,v,x,x,y) = p(u,v)p(x u)p(x v)p(y x,x ) () This achievable region was shown to be suboptimal by Dueck [8. Cover, El Gamal and Salehi [7 also provided a capacity result with both achievability and converse in the form of some incomputable n-letter mutual informations. Their result is restated in the following theorem. Theorem [7 The correlated sources (U,V) can be communicated reliably over the discrete memoryless multiple access channel p(y x,x ) if and only if [H(U V),H(V U),H(U,V) C n (3) n= where C n = [R,R,R 3 : R < n I(n ; n n,v n ) R < n I(n ; n n,un ) R 3 < n I(n,n ; n ) (4) 3

24 for some n n p(u n,v n,x n,xn,yn ) = p(x n un )p(x n vn ) p(u i,v i ) p(y i x i,x i ) (5) i= i= i.e., for some n and n that satisfy the Markov chain n Un V n n. Some recent results on the transmission of correlated sources over multiple access channels can be found in [9,0. 4. A New Outer Bound We propose a new outer bound for the multiple access channel with correlated sources as follows. Theorem 3 If a pair of i.i.d. sources (U,V) with joint distribution p(u,v) can be transmitted reliably through a discrete, memoryless, multiple access channel characterized by p(y x,x ), then H(U V) I( ;,U,Q) (6) H(V U) I( ;,V,Q) (7) H(U,V) I(, ; Q) (8) where random variables, and Q are such that n p(x,x,y,u n,v n,q) = p(q)p(x u n,q)p(x v n,q)p(y x,x ) p(u i,v i ) (9) where (U n,v n ) are n samples of the i.i.d. sources with n, U {U,...,U n } and V {V,...,V n } and both U and V contain finite number of elements. i= Proof: Consider a given block code of length n with the encoders f : U n n and f : V n n and decoder g : n U n V n. From Fano s inequality [9, p. 39, we have H(U n,v n n ) nlog U V P e + nǫ n (30) 4

25 Let G i be a permutation on the set {,...,n} (similarly on the set {U,...,U n }, and {V,...,V n }). We define 0 U i {G i (U k ) : U k U} (3) V i {G i (V k ) : V k V} (3) This definition provides that p(u i,v i ), the joint probabilities of U i and V i, are identical for i =,...,n. where For a code, for which P e 0, as n, we have ǫ n 0. Then, nh(u V) = H(U n V n ) = I(U n ; n V n )+H(U n n,v n ) I(U n ; n V n )+H(U n,v n n ) I(U n ; n V n )+nǫ n = H( n V n ) H( n U n,v n )+nǫ n = H( n n,v n ) H( n n,n,un,v n )+nǫ n 3 = H( n,v n n ) H( n, n )+nǫ n n [ 4 = H( i n,v n, i ) H( i i, i ) 5 6 = = i= i= i= [ H( i i,v i ) H( i i, i ) +nǫ n +nǫ n [ H( i i,v i ) H( i i, i,v i ) +nǫ n I( i ; i i,v i )+nǫ n (33) i=. from Fano s inequality in (30);. from the fact that n is the deterministic function of Un and n function of V n ; is the deterministic 3. from p(y n x n,xn,un,v n ) = p(y n x n,xn ); 4. from the chain rule and the memoryless nature of the channel; 0 Forexample,ifweletU = {U,U }andv = {V,V }andg () = 3andG () = 5,then, U = {U 3,U 5 } and V = {V 3,V 5 }. 5

26 5. from the property that conditioning reduces entropy; 6. from p(y i x i,x i,v i ) = p(y i x i,x i ). Using a symmetrical argument, we obtain nh(v U) I( i ; i i,u i )+nǫ n (34) i= Moreover, nh(u,v) = H(U n,v n ) = I(U n,v n ; n )+H(U n,v n n ) I(U n,v n ; n )+nǫ n I( n,n ; n )+nǫ n = H( n ) H( n, n )+nǫ n n [ = H( i i ) H( i i, i ) +nǫ n = i= i= [ H( i ) H( i i, i ) +nǫ n I( i, i ; i )+nǫ n (35) i= We introduce a time-sharing random variable Q [9, p. 397 as follows. Let Q be uniformly distributed on {,...,n} and be independent of U n, V n. Let the random variables and be such that where p(x i,x i u i,v i,u c i,vc i ) = p(x,x u,v,u c,v c,q = i) (36) U c {U,...,U n }\U (37) V c {V,...,V n }\V (38) U c i {G i(u k ) : U k U c } (39) V c i {G i(v k ) : V k V c } (40) 6

27 Then, I( i ; i i,v i ) = ni( ;,V,Q) (4) i= I( i ; i i,u i ) = ni( ;,U,Q) (4) i= I( i, i ; i ) = ni(, ; Q) (43) i= Combining (4), (4) and (43) with (33), (34) and (35) completes the proof. 4.3 A New Necessary Condition It can be shown that the outer bound in Theorem 3 is equivalent to the following { H R(S) co } R(p) (44) p S UV where H [H(U V),H(V U),H(U,V) (45) p p(x,x u,v) (46) S UV {p : U n V n,n } (47) R I( ;,V) R(p) [R,R,R 3 : R I( ;,U) (48) R 3 I(, ;) and co{ } represents the closure of the convex hull of the set argument. where From Section.5, we know that S UV S UV U U,V V S U V (49) S U V {p(x,x u,v) : λ i ( P u v ) λ ( P UV ),i =,...,min(, )} (50) Then, we obtain a single-letter outer bound for the multiple access channel with correlated sources as follows. Theorem 4 If a pair of i.i.d. sources (U,V) with joint distribution p(u,v) can be trans- 7

28 mitted reliably through a discrete, memoryless, multiple access channel characterized by p(y x,x ), then H(U V) I( ;,V,Q) (5) H(V U) I( ;,U,Q) (5) H(U,V) I(, ; Q) (53) where U {U,...,U n } and V {V,...,V n } are two sets containing finite letters of source samples, random variable Q independent of (U,V), and for random variables,, p(x,x u,v,q) such that, for any U U and V V, λ i ( P u v q) λ ( P UV ), i =,...,min(, ) (54) Equivalently, { H R(S ) co } R(p) (55) p S UV In the rest of this section, we will specialize our results to the case where we choose U = {U } and V = {V }. Here, we have the following definitions S out,3 S U V = {p(x,x u,v ) : U n V n } (56) and S out,4 = S S U S V S U V (57) where S {p(x,x u,v ) : λ i ( P ) λ ( P UV )} (58) S U {p(x,x u,v ) : λ i ( P u ) λ ( P UV )} (59) S V {p(x,x u,v ) : λ i ( P v ) λ ( P UV )} (60) S U V {p(x,x u,v ) : λ i ( P u v ) λ ( P UV )} (6) We note that when U = {U } and V = {V }, the expressions in (48) agree with those in the achievability scheme of Cover, El Gamal and Salehi when there is no common information, i.e., (9), (0), and (). Thus, the gap between the achievablity scheme of Cover, El Gamal and Salehi, and the converse in this paper results from the fact that the feasible sets for the conditional probability distribution p = p(x,x u,v) are different. In The notation S out,3, as well as S out,4 and S in in the sequel, is used in order to be consistent with the notations in Section 3. 8

29 the achievability scheme of Cover, El Gamal and Salehi, p belongs to S in {p(x,x u,v) : U V } (6) since for the achievability, we need U V. Whereas, in our converse, p S out,3 S out,4. Since U V implies U n V n and U n V n implies λ i ( P ) λ ( P UV ), λ i ( P u ) λ ( P UV ), λ i ( P v ) λ ( P UV ), and λ i ( P u v ) λ ( P UV ), we have S in S out,3 S out,4 (63) Therefore, when m =, even though the mutual information expressions in the achievability and the converse are the same, their actual values will be different, since they will be evaluated using the conditional probability distributions that belong to different feasible sets. 5 Conclusion In the distributed coding on correlated sources, the problem of describing a joint distribution involving an n-letter Markov chain arises. By means of spectrum analysis, we provided a new data processing inequality based on a new measure of correlation, which gave us a single-letter necessary condition for the n-letter Markov chain. We applied our results to two specific examples involving distributed coding of correlated sources: the multi-terminal rate-distortion region and the multiple access channel with correlated sources, and proposed two new outer bounds for these two problems. Appendices A An Illustrative Binary Example In this section, we will study a specific binary example in detail. The aims of this study are, first, to ilustrate the single-letter necessary condition we proposed for the n-letter Markov chain in Section.3, second, to develop a sharper necessary condition in this specific case, and finally, to compare different necessary conditions and a sufficient condition in this specific example. The binary example under consideration is as follows. Let U, V, and be binary random variables, which take values from {0,}. We assume that (U,V) are a pair of binary 9

30 symmetric sources, i.e., Pr(U = 0) = Pr(U = ) = Pr(V = 0) = Pr(V = ) = (64) From () and (5), we have P UV = [ [ +λ ( P UV )µ ( P UV )ν ( P UV ) T (65) Here we focus on the symmetric case, i.e., µ ( P UV ) = ν ( P UV ) = [ In addition, we assume the following marginal distributions for and, p = [ a p = a [ b b (66) (67) (68) where 0 a,b. Then, from () and (5), we have P = [ a a [ b b +λ ( P )µ ( P )ν ( P ) T (69) We note that µ ( P )ν ( P ) T = σ [ a a [ b b (70) where σ {, }. For the simplicity of the derivation in the sequel, we let λ = σλ ( P, ). Then, we have [ a [ P = b [ a [ b b +λ b a a From Theorem, we know that the entries of P are non-negative, i.e., (7) [ ab+λ ( a P = )( b ) a b λb a b a λa 0 (7) b ( a )( b )+λab 30

31 which implies that ξ λ ξ (73) where ξ min(a,b )min( a, b ) ab ( a )( b ) ξ min( a,b )min(a, b ) ab ( a )( b ) (74) (75) From Theorem 4, we have λ ( P UV ) λ λ ( P UV ) (76) Thus, from above, we have min(ξ,λ ( P UV )) λ min(ξ,λ ( P UV )) (77) A sharper bound in this special case can be obtained as follows. Theorem 5 If U n V n, and (,,U n,v n ) satisfies the above settings, then for sufficiently large n, ( min ξ,λ ( P UV ) +ξ ) ( λ min ξ,λ ( P UV ) +ξ ) The proof of Theorem 5 is given in Appendix B.. (78) The boundin (78) istighter thanthe onein(77) because ξ andtherefore +ξ. A similar argument holds for the other side of the inequality as well. In the above derivation, we provided two necessary conditions for the n-letter Markov chain U n V n, where n, in this special case of binary random variables. In other words, we provided two outer bounds for λ, where the joint distributions p(x,x,u n,v n ) satisfy the n-letter Markov chain U n V n with n and satisfy the fixed marginal distributions given in (67) and (68). For reference, we give a sufficient condition for U n V n, or equivalently, an inner bound for λ satisfying this n-letter Markov chain. This inner bound is obtained by noting that if (, ) satisfies U V, then it satisfies U n V n. In this case, using Theorem we have λ = λ L λ ( P UV )λ R (79) 3

32 where λ L and λ R are such that P U P V [ a a [ [ b [ b [ a [ +λ L a [ [ b +λ R b Due to the non-negativity of the matrices P U and P V, we have 0 (80) 0 (8) min(a, a ) a a λ L min(a, a ) a a (8) min(b, b ) b b λ R min(b, b ) b b (83) Thus, we have λ ( P UV )ξ 3 λ λ ( P UV )ξ 3 (84) where ξ 3 min(a, a )min(b, b ) ab ( a )( b ) (85) Then, combining (77), (78), and (84), we have the two outer bounds and one inner bound for λ as follows λ ( P UV )ξ 3 sup λ min(ξ,λ ( P UV ) +ξ ) min(ξ,λ ( P UV )) (86) U n V n min(ξ,λ ( P UV )) min(ξ,λ ( P UV ) +ξ ) inf λ λ ( P UV )ξ 3 (87) U n V n We illustrate these three bounds with λ ( P UV ) = 0.5 in Figure. B Proofs of Some Theorems B. Proof of Theorem 3 To find sup λ ( P U n), we need to exhaust the sets F(n,P ) with n. In the F(n,P ), n=,,... following, we show that it suffices to check only the asymptotic case. For any joint distribution P U n F(n,P ), we attach an independent U, say U n+, to the existing n-sequence, and get a new joint distribution P U n+ = P U n p U, where p U is the marginal distribution of U in the vector form. By arguments similar to those in Section.4, we have that λ i ( P U n+) = λ i( P U n). Therefore, for every P U n F(n,P ), there 3

33 outer bound outer bound inner bound b a b a b a Figure : (i) Outer bound, (ii) outer bound, and (iii) inner bound for λ. 33

34 exists some P U n+ F(n+,P ), such that λ i ( P U n+) = λ i( P Un). Thus, sup λ ( P U n) sup λ ( P Un+) (88) F(n,P ) F(n+,P ) From (88), we see that sup λ ( P Un) is monotonically non-decreasing in n. We also F(n,P ) note that λ ( P U n) is upper bounded by for all n, i.e., λ ( P Un). Therefore, sup λ ( P Un) = lim sup λ ( P Un) (89) F(n,P ), n=,,... n F(n,P ) To complete the proof, we need the following lemma. Lemma 3 [0 λ ( P ) = if and only if P decomposes. By P decomposes, we mean that there exist sets S, S, such that P(S ), P( S ), P(S ), P( S ) are positive, while P(( S ) S ) = P(S ( S )) = 0. In the following, we will show by construction that there exists a joint distribution that decomposes asymptotically. Foragivenmarginaldistribution P, wearbitrarilychooseasubset S fromthealphabet of with positive P(S ). We find a set S in the alphabet of U n such that P(S ) = P(S ) if it is possible. Otherwise, we pick S with positive P(S ) such that P(S ) P(S ) is minimized. We denote L(n) to be the set of all subsets of the alphabet of U n and we also define P max = maxpr(s) for all s U. Then, we have min P(S ) P(S ) Pmax n (90) S L(n) We construct a joint distribution for and U n as follows. First, we construct the joint distribution P i corresponding to the case where and U n are independent. Second, we rearrange the alphabets of and U n and group the sets S, S, S and U n S as follows P i = [ P i P i P i P i (9) where P i, P i, P i, P i correspond to the sets S S, S (U n S ), ( S ) S, ( S ) (U n S ),respectively. Here, weassumethatp(s ) P(S ). Then, wescalethese four sub-matrices as P = Pi P(S ), P P(S )P(S ) = 0, P = Pi (P(S ) P(S )), P ( P(S ))P(S ) = Pi ( P(S )), ( P(S ))( P(S )) and let [ P 0 P = (9) P P We note that P is a joint distribution for and U n with the given marginal distributions. 34

35 Next, we move the mass in the sub-matrix P to P, which yields [ [ [ P P 0 P 0 E 0 = P +E = + 0 P P P E 0 (93) where E P, E Pi (P(S ) P(S )) P(S )P(S ), and P = P P(S ) P(S ). We denote P and P U n as the marginal distributions of P. We note that P U n = P U n and P = P M where M is a scaling diagonal matrix. The elements in the set S are scaled up by a factor of P(S ) P(S ), and those in the set S are scaled down by a factor of P(S ) P(S ). Then, P = M P +M P EP U n (94) We will need the following lemmas in the remainder of our derivations. Lemma 5 can be proved using techniques similar to those in the proof of Lemma 4 [. Lemma 4 [ If A = A+E, then λ i (A ) λ i (A) E, where E is the spectral norm of E. Lemma 5 If A = MA, where M is an invertible matrix, then M λ i (A )/λ i (A) M. Since P decomposes, using Lemma 3, we conclude that λ ( P ) =. We upper bound P EP U n as follows, P EP U n P EP U n F (95) where F is the Frobenius norm. Combining (9) and (93), we have P EP U n F (P(S ) P(S )) P P(S ) P P i P U n F (96) where P min(p(s ), P(S )). Since P i corresponds to the independent case, we have P P i P U n F = from (5). Then, from (90), (95) and (96), we obtain where c P P(S ). From Lemma, we have M P EP U n = λ (M P EP U n ) P EP U n c P n max (97) ( ) P(S ) c Pmax n P(S ) c Pmax n (98) 35

36 From Lemma 4, we have We upper bound M as follows M = P(S ) P(S ) + c P n max λ (M P) +c P n max (99) P(S ) P(S ) P(S ) Similarly, M c 4 P n/ max. From Lemma 5, we have + Pn/ max P(S ) +c 3Pmax n/ (00) ( c 4 P n/ max) λ ( P) λ (M P) (+c 3 P n/ max) (0) Since P is a joint distribution matrix, from Theorem, we know that λ ( P). Therefore, we have ( c 4 P n/ max )( c P n max ) λ ( P) (0) When P max <, corresponding to the non-trivial case, lim n P n/ max = 0, and using (89), (3) follows. The case P(S ) < P(S ) can be proved similarly. B. Proof of Theorem 5 From (65), we know P UV = [ [ +λ ( P UV ) [ [ (03) From (9), we know n P U n V n = P UV = n [. + n i= λ ( P UV ) l i µ i ( P U n V n)νt i ( P U n V n) (04) where l i {,,...,n}, for i =,..., n. Due to the symmetric structure of P U n V n, we have µ i ( P U n V n) = ν i( P U n V n), i =,...,n (05) 36

A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources

A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources Wei Kang Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College

More information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information 204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener

More information

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Nan Liu and Andrea Goldsmith Department of Electrical Engineering Stanford University, Stanford CA 94305 Email:

More information

On Gaussian MIMO Broadcast Channels with Common and Private Messages

On Gaussian MIMO Broadcast Channels with Common and Private Messages On Gaussian MIMO Broadcast Channels with Common and Private Messages Ersen Ekrem Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College Park, MD 20742 ersen@umd.edu

More information

Degrees of Freedom Region of the Gaussian MIMO Broadcast Channel with Common and Private Messages

Degrees of Freedom Region of the Gaussian MIMO Broadcast Channel with Common and Private Messages Degrees of Freedom Region of the Gaussian MIMO Broadcast hannel with ommon and Private Messages Ersen Ekrem Sennur Ulukus Department of Electrical and omputer Engineering University of Maryland, ollege

More information

On the Rate-Limited Gelfand-Pinsker Problem

On the Rate-Limited Gelfand-Pinsker Problem On the Rate-Limited Gelfand-Pinsker Problem Ravi Tandon Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College Park, MD 74 ravit@umd.edu ulukus@umd.edu Abstract

More information

The Gallager Converse
