Group, Lattice and Polar Codes for Multi-terminal Communications


Group, Lattice and Polar Codes for Multi-terminal Communications

by

Aria Ghasemian Sahebi

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering - Systems) in the University of Michigan, 2014.

Doctoral Committee:
Associate Professor S. Sandeep Pradhan, Chair
Associate Professor Achilleas Anastasopoulos
Professor Robert L. Griess, Jr.
Professor David L. Neuhoff

© Aria Ghasemian Sahebi 2014
All Rights Reserved

To my family.

ACKNOWLEDGEMENTS

It is a pleasure to thank the many people who made this thesis possible. I would like to express my very great appreciation to my advisor, Professor Sandeep Pradhan, for his patient guidance, enthusiastic encouragement and constant support. I have been very fortunate to have an advisor who gave me the freedom to explore on my own and, at the same time, the guidance I needed. I am deeply grateful to him for the long discussions that helped me sort out the technical details of my work. I cannot express enough thanks to my thesis committee members, Professor Achilleas Anastasopoulos, Professor Robert Griess and Professor David Neuhoff, for their interest in my research and for helping me build the knowledge base needed for my research. I wish to thank Professors Neuhoff and Anastasopoulos for excellent courses in source coding theory and channel coding theory, and I am indebted to Professor Griess for teaching me abstract algebra and for his patience in answering my questions. I would like to express my deep gratitude to Professor Demos Teneketzis for being an excellent teacher and for his support over the years. I would also like to thank Professors Alfred Hero, Raj Rao Nadakuditi, Clayton Scott, Kim Winick, Mingyan Liu, Silvio Savarese, Andrew Yagle, Gregory Wakefield, Amir Mortazawi, Joseph Conlon, Edwin Romeijn and Roman Vershynin for their high standards of teaching and mentorship. I wish to thank Becky Turanski, Karen Liska, Shelly Feldkamp, Ann Pace and Beth Lawson for efficiently and cheerfully helping me deal with administrative matters.

I have thoroughly enjoyed living in Ann Arbor, and that is largely due to the wonderful friends I have had. A big thanks to Hamed, Kaveh, Curtis, Lisa, Dimitris, Molly, Raza, Mahmood, Ali Kakhbod, Ali Nazari, Nima and all of my friends at Michigan. Arun, Farhad, Mohsen, Deepanshu, Raj and David have been great officemates. Lastly, I thank my family for their constant encouragement, without which this dissertation would not have been possible.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER

I. Introduction

II. Abelian Group Codes for Point-to-Point Communications
    Preliminaries
    The Ensemble of Abelian Group Codes
        A Characterization of Abelian Groups
        The Image Ensemble
    The Performance of Abelian Group Codes
        Definitions
        Main Results
        Interpretation of the Results
    Proof for the Source Coding Problem
        The Coding Scheme
        Error Analysis
    Proof for the Channel Coding Problem
        The Coding Scheme
        Error Analysis
        Simplification of the Result for Symmetric Channels
    Examples
        Examples for Source Coding
        Examples for Channel Coding
    Appendix

III. Abelian Group Codes for Multi-terminal Communications
    Nested Codes for Channels with State Information
        Nested Random/Group Codes for Channel Coding
        Nested Group/Random Codes for Channel Coding
    Nested Codes for Sources with Side Information
        Nested Random/Group Codes for Source Coding
        Nested Group/Random Codes for Source Coding
    Distributed Source Coding
        Preliminaries
        The Main Result
        The Coding Scheme
        Error Analysis
        Examples
        Appendix
    The 3-User Interference Channel
        Problem Definition and the Coding Scheme
        Error Analysis

IV. Non-Abelian Group Codes
    Introduction
    Preliminaries
    Group Codes over D_2p
    The Ensemble of Codes
    Examples: Non-Abelian Group Codes Can Have a Good Performance
        Example 1: Point-to-Point Problem
        Example 2: Computation over MAC

V. Lattice Codes for Multi-terminal Communications
    Nested Lattices for Point-to-Point Communications
        Preliminaries
        Nested Lattice Codes for Channel Coding
        Nested Lattice Codes for Source Coding
        Appendix
    Distributed Source Coding
        The Main Result
        Examples
        Appendix

VI. Polar Codes for Point-to-Point Communications
    Polar Codes for Arbitrary DMCs
        Preliminaries
        Motivating Examples
        Polar Codes over Channels with Input Z_{p^r}
        Polar Codes over Arbitrary Channels
        Relation to Group Codes
        Appendix
    Polar Codes for Arbitrary DMSs
        Preliminaries
        Polar Codes for Sources with Reconstruction Alphabet Z_{p^r}
        Arbitrary Reconstruction Alphabets
    Nested Polar Codes for Point-to-Point Communications
        The Lossy Source Coding Problem
        The Channel Coding Problem

VII. Polar Codes for Multi-terminal Communications
    Introduction
    Distributed Source Coding: The Berger-Tung Problem
    Distributed Source Coding: Decoding the Sum of Variables
    Multiple Access Channels
    Computation over MAC
    The Broadcast Channel
    Multiple Description Coding
    Other Problems and Discussion

BIBLIOGRAPHY

LIST OF FIGURES

Comparison of the performance of random codes vs. group codes for a distributed source coding problem
A simple channel with input D_6
Two-user MAC: computation of the D_6 operation
Channel 1: the input of the channel has the structure of the group Z
The behavior of I(W | b_1 b_2 ... b_n) for n = 4 for Channel 1 when ε = 0.4 and λ =
The asymptotic behavior of I(W | b_1 b_2 ... b_n), N = 2^n = 2^4, 2^8, 2^12, 2^14, for Channel 1 when the data is sorted
Channel 2: a channel with a composite input alphabet size
Polarization of Channel 2 with parameters γ = 0, ε = 0.4, λ =
Polarization of Channel 2 with parameters γ = 0.4, ε = 0, λ =
Channel 3
Source Coding: test channel for the inner code
Source Coding: test channel for the outer code
Channel Coding: channel for the inner code
Channel Coding: channel for the outer code
Distributed Source Coding: test channel for the inner code
Distributed Source Coding: test channel for the outer code
Korner-Marton Problem, Terminal X: test channel for the inner code
Korner-Marton Problem, Terminal X: test channel for the outer code
Korner-Marton Problem, Terminal Y: test channel for the inner code
Korner-Marton Problem, Terminal Y: test channel for the outer code
Multiple-Access Channels: channel for the inner code
Multiple-Access Channels: channel for the outer code
Computation over MAC, Terminal X: channel for the inner code
Computation over MAC, Terminal X: channel for the outer code
Computation over MAC, Terminal Y: channel for the inner code
Computation over MAC, Terminal Y: channel for the outer code
Broadcast Channels: test channel for the inner code
Broadcast Channels: test channel for the outer code
Multiple Description Coding, Terminal X: channel for the inner code
Multiple Description Coding, Terminal X: channel for the outer code
Multiple Description Coding, Terminal Y: channel for the inner code
Multiple Description Coding, Terminal Y: channel for the outer code
Three-User MAC: channel for the inner code
Three-User MAC: channel for the outer code

ABSTRACT

Group, Lattice and Polar Codes for Multi-terminal Communications

by
Aria Ghasemian Sahebi

Chair: S. Sandeep Pradhan

We study the performance of algebraic codes for multi-terminal communications. This thesis consists of three parts. In the first part, we analyze the performance of group codes for communication systems. We observe that although group codes are not optimal for point-to-point scenarios, they can improve the achievable rate region for several multi-terminal communication settings such as distributed source coding and interference channels. The gains in the rates are particularly significant when the structure of the source/channel is matched to the structure of the underlying group. In the second part, we study the continuous-alphabet version of group/linear codes, namely lattice codes. We show that, similarly to group codes, lattice codes can improve the achievable rate region for multi-terminal problems. In the third part of the thesis, we present coding schemes based on polar codes to practically achieve the performance limits derived in the two earlier parts. We also present polar coding schemes that achieve the known achievable rate regions for multi-terminal communication problems such as distributed source coding, multiple description coding, broadcast channels, interference channels and multiple access channels.

CHAPTER I

Introduction

Approaching the information-theoretic performance limits of communications using structured codes has been of great interest for the last several decades. The earlier attempts to design computationally efficient encoding and decoding algorithms for point-to-point communication (both channel coding and source coding) resulted in the injection of finite-field structures into the coding schemes. In the channel coding problem, the channel input alphabets are replaced with algebraic fields and encoders are replaced with matrices. Similarly, in the source coding problem, the reconstruction alphabets are replaced with finite fields and decoders are replaced with matrices. Later, these coding approaches were extended to weaker algebraic structures such as rings and groups [3, 7, 8, 9, 23, 24, 28, 29, 35, 42, 45] (this is an incomplete list; there is a vast body of work on group codes, and we refer the reader to [9] for a more complete bibliography). The motivation for this is twofold: a) finite fields exist only for alphabets whose size is a prime power, and b) for communication under certain constraints, codes with weaker algebraic structures have better properties. For example, when communicating over an additive white Gaussian noise channel with an 8-PSK constellation, codes over Z_8, the cyclic group of size 8, are more desirable than binary linear codes because the structure of the code is matched to the structure of the signal set [2], and hence the former have superior error-correcting properties. As another example, the construction of polar

codes over alphabets of size p^r, for r > 1 and p prime, is simpler with a module structure rather than a vector space structure [54, 58, 66]. Subsequently, as interest in network information theory grew, these codes were used to approach the information-theoretic performance limits of certain special cases of multi-terminal communication problems [26, 57, 75, 78]. These limits were obtained earlier using random coding ensembles in the information theory literature. In 1979, Korner and Marton, in a significant departure from tradition, showed that for a binary distributed source coding problem, the asymptotic average performance of binary linear code ensembles can be superior to that of the standard random coding ensembles. Although structured codes were being used in communication mainly for computational complexity reasons, the duo showed that, in contrast, even when computational complexity is not an issue, the use of structured codes leads to superior asymptotic performance limits in multi-terminal communication problems. In the recent past, such gains were shown for a wide class of problems [4, 42, 50, 56, 72]. In [42, 60], an inner bound to the optimal rate-distortion region for the distributed source coding problem is developed in which Abelian group codes are used as building blocks in the coding schemes. Similar coding approaches were applied to the interference channel and the broadcast channel in [52, 53]. The motivation for studying Abelian group codes, beyond the non-existence of finite fields over arbitrary alphabets, is the following: the algebraic structure of the code imposes certain restrictions on the performance. For certain problems, linear codes were shown to be suboptimal [42], and finite Abelian group codes exhibit a superior performance. For example, consider a distributed source coding problem with two statistically correlated but individually uniform quaternary sources X and Y that are related via the relation X = Y + Z, where + denotes addition modulo 4 and Z is a hidden quaternary random variable that has a non-uniform distribution and is independent of Y. The joint decoder wishes to reconstruct Z losslessly. In this problem, Abelian group codes over the cyclic group Z_4 perform better than linear codes over the Galois field of size 4.
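To make this example concrete, the following minimal simulation sketch (not taken from the thesis; the distribution of Z is hypothetical) generates the source pair and prints H(Z), the entropy of the hidden variable that the joint decoder must reconstruct.

```python
# Minimal simulation sketch (hypothetical parameters): X = Y + Z (mod 4) with Y
# uniform over Z_4 and Z non-uniform and independent of Y; both X and Y are then
# individually uniform, and the decoder targets the hidden variable Z.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
p_Z = np.array([0.7, 0.1, 0.1, 0.1])   # hypothetical non-uniform law of Z

Y = rng.integers(0, 4, size=n)
Z = rng.choice(4, size=n, p=p_Z)
X = (Y + Z) % 4                        # X is also individually uniform

H_Z = -(p_Z * np.log2(p_Z)).sum()
print(f"H(Z) = {H_Z:.3f} bits per sample")
```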

In summary, the main reason for using algebraically structured codes in this context is performance rather than complexity of encoding and decoding. Hence, information-theoretic characterizations of the asymptotic performance of Abelian group code ensembles for various communication problems, and under various decoding constraints, became important. Such performance limits have been characterized in certain special cases. It is well known that binary linear codes achieve the capacity of binary symmetric channels [25]. More generally, it has also been shown that q-ary linear codes can achieve the capacity of symmetric memoryless channels [23] and that linear codes can be used to compress a source losslessly down to its entropy [4]. Goblick [3] showed that binary linear codes achieve the rate-distortion function of binary uniform sources with the Hamming distortion criterion. Group codes were first studied by Slepian [68] for the Gaussian channel. In [6], the capacity of group codes for certain classes of channels has been computed. Further results on the capacity of group codes were established in [7, 8]. The capacity of group codes over a class of channels exhibiting symmetries with respect to the action of a finite Abelian group has been investigated in [8].

The thesis is organized as follows. Chapter II is devoted to the introduction of finite Abelian group codes, and the performance of such codes is studied for point-to-point problems. We study both the channel coding and the source coding problems for arbitrary discrete memoryless sources and channels. Our contribution in this chapter is the source coding result and the generality of the channel coding result. Furthermore, we employ joint encoding and decoding schemes based on joint typicality, resulting in a simplified derivation. The existing results on the performance of Abelian group codes include [8], in which the performance of Abelian group codes over symmetric channels is investigated, and [42], in which Z_{p^r} alphabets are considered. In Chapter III, we study the performance of these codes for some multi-terminal problems.

We derive an achievable rate region for the distributed source coding problem and for interference channels as examples of multi-terminal problems in which structured codes prove to be superior to traditional random codes. Further results for other problems can be obtained in the same fashion. In Chapter IV, we consider a class of non-Abelian groups and investigate the coding performance of codes over these groups. The contribution of this chapter is the characterization of the ensemble of all group codes over dihedral groups and showing that the average performance of this ensemble can be superior to that of other coding schemes in some examples. Lattice codes are the analogue of linear/group codes for the case where the channel inputs or the source reconstructions take values from a continuous alphabet (R, for example). In Chapter V, we discuss the performance of lattice codes for some multi-terminal communication problems, namely the Gelfand-Pinsker, the Wyner-Ziv and the distributed source coding problems. For the Gelfand-Pinsker and Wyner-Ziv problems, we show that nested lattice codes are optimal. For the distributed source coding problem, we derive an achievable rate region which is strictly larger than known achievable regions. In Chapters VI and VII, we study polar codes. Polar codes were originally introduced as linear codes achieving the symmetric capacity of channels with binary input alphabets. They were later generalized to achieve the symmetric rate-distortion function for sources with binary reconstructions. Traditionally in information and coding theory, random ensembles of codes are considered and the average performance over the ensemble is evaluated. In contrast, polar codes constitute the first class of codes with an explicit construction and a (sub)optimal performance. We make the observation that polar codes can be extended to arbitrary DMCs and DMSs if they are viewed as nested group codes. In other words, we extend polar codes to achieve

the symmetric capacity of arbitrary DMCs and the symmetric rate-distortion function for arbitrary DMSs. Our next contribution is to show that polar codes are optimal (in the Shannon sense) for both the channel coding and the source coding problem. We also show that polar codes are optimal for many multi-terminal communication problems in the sense that they achieve the best known achievable rate regions for such problems. This includes the distributed source coding problem, the Korner-Marton problem, the multiple access channel, and computation over MAC. For the broadcast channel, we show that polar codes achieve Marton's inner bound with one additional constraint on the auxiliary random variables.

In summary, the thesis consists of results on the performance of structured codes for the following problems:

Abelian group codes:
- Point-to-point channel coding
- Lossy source coding
- Distributed source coding
- The interference channel

Non-Abelian group codes (over D_2p only):
- Point-to-point channel coding
- Computation over multiple-access channels

Lattice codes:
- Point-to-point channel coding with state information (Gelfand-Pinsker problem)
- Lossy source coding with state information (Wyner-Ziv problem)
- Distributed source coding

Polar codes:
- Point-to-point channel coding (symmetric capacity)
- Lossy source coding (symmetric rate-distortion)
- Point-to-point channel coding (Shannon capacity)
- Lossy source coding (Shannon rate-distortion)
- Distributed source coding (Berger-Tung and Korner-Marton problems)
- Multiple access channels and computation over MAC
- Broadcast channels
- Multiple description coding

CHAPTER II

Abelian Group Codes for Point-to-Point Communications

In this chapter, we focus on two problems. First, we consider the lossy source coding problem for arbitrary discrete memoryless sources in which the distortion is measured using a single-letter criterion and the reconstruction alphabet is equipped with the structure of a finite Abelian group G. We derive an upper bound on the rate-distortion function achievable using group codes over G of some arbitrarily large block length n. The average performance of the ensemble is shown to be the symmetric rate-distortion function of the source when the underlying group is a field, i.e., the Shannon rate-distortion function with the additional constraint that the reconstruction variable is uniformly distributed. In the general case, it turns out that several additional terms appear, corresponding to subgroups of the underlying group, in the form of a maximization, and this can result in a larger rate compared to the symmetric rate for a given distortion level. In the second part, we consider the channel coding problem for arbitrary discrete memoryless channels. Without loss of generality, we assume that the channel input alphabet is equipped with the structure of a finite Abelian group G. We derive a lower bound on the capacity of such channels achievable using group codes which are subgroups of G^n. We show that the achievable rate is equal to the symmetric

capacity of the channel when the underlying group is a field; i.e., it is equal to the Shannon mutual information between the channel input and the channel output when the channel input is uniformly distributed. Similarly to the source coding problem, we show that in the general case several additional terms appear, corresponding to subgroups of the underlying group, in the form of a minimization, and the achievable rate can be smaller than the symmetric capacity of the channel.

It can be noted that the bounds on the performance limits mentioned above apply to any arbitrary discrete memoryless case. Moreover, we use joint typicality encoding and decoding [2] for both problems at hand. This makes the analysis more tractable. In this approach we use a synergy of information-theoretic and group-theoretic tools. The traditional approaches have looked at encoding and decoding of structured codes based on either minimum distance or maximum likelihood. We introduce two information quantities, analogous to the mutual information which captures the Shannon performance limits when no algebraic structure is enforced on the codes, that capture the performance limits achievable using Abelian group codes: the source coding group mutual information and the channel coding group mutual information. The converse bounds for both problems will be addressed in future work.

2.1 Preliminaries

The Source Model

The source is modeled as a discrete-time memoryless random process with each sample taking values from a finite set X, called the alphabet, according to the distribution p_X. The reconstruction alphabet is denoted by U, and the quality of reconstruction is measured by a single-letter distortion function d : X x U -> R^+. We denote this source by (X, U, p_X, d). For two sequences x = (x_1, ..., x_n) ∈ X^n and u = (u_1, ..., u_n) ∈ U^n, with a slight abuse of notation, we denote the average distortion by

$d(x, u) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, u_i)$
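As a small illustration of this definition, the sketch below (assuming Hamming distortion as a hypothetical choice of the single-letter criterion) computes the average distortion between two sequences.

```python
# Minimal sketch: the average single-letter distortion d(x, u) = (1/n) * sum_i d(x_i, u_i),
# with Hamming distortion used as a hypothetical example of d.
def average_distortion(x, u, d=lambda a, b: float(a != b)):
    assert len(x) == len(u)
    return sum(d(a, b) for a, b in zip(x, u)) / len(x)

print(average_distortion([0, 1, 2, 3], [0, 1, 2, 0]))   # 0.25
```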

The Channel Model

We consider discrete memoryless channels used without feedback. We associate two finite sets X and Y with the channel as the input and output alphabets. The input-output relation of the channel is characterized by a conditional probability law W_{Y|X}(y|x) for x ∈ X and y ∈ Y. The channel is specified by (X, Y, W_{Y|X}).

Review of Groups

In this section, we review some of the basic concepts of group theory. For a more complete discussion, we refer the reader to [6]. A group (G, +) is a set G together with a binary operation + such that

- for all a, b ∈ G, a + b ∈ G;
- for all a, b, c ∈ G, a + (b + c) = (a + b) + c;
- there exists 0 ∈ G such that a + 0 = 0 + a = a for all a ∈ G;
- for all a ∈ G, there exists b ∈ G such that a + b = b + a = 0.

If, in addition to the above, the following condition is satisfied:

- for all a, b ∈ G, a + b = b + a,

then the group is called Abelian. We focus on finite groups, i.e., groups whose underlying set is finite. When the group operation is clear from the context, we sometimes denote the group (G, +) simply by G. Given a group G, a subset H of G is called a subgroup of G if it is closed under the group operation. In this case, (H, +) is a group in its own right.

This is denoted by H ≤ G. A coset C of a subgroup H is a shift of H by an arbitrary element a ∈ G (i.e., C = a + H for some a ∈ G). For a subgroup H of G, the cosets of H in G form a partition of G. The number of cosets of H in G is called the index of H in G and is denoted by |G : H|. The index of H in G is equal to |G|/|H|, where |G| and |H| are the cardinalities of G and H respectively. If |G| is a power of a prime p, we say G is a p-group. For a prime p dividing the cardinality of G, a Sylow-p subgroup of G is a subgroup of G whose cardinality is a power of p and which is not contained in another p-subgroup of G. Given two groups (G, +_G) and (K, +_K), a mapping φ : G -> K is called a homomorphism if for all a, b ∈ G, φ(a +_G b) = φ(a) +_K φ(b). The groups G and K are said to be isomorphic if there exists a bijective homomorphism φ between G and K. In this case, we write G ≅ K. All groups referred to in this chapter are Abelian groups.

Group Codes

Given a group G, a group code C over G with block length n is any subgroup of G^n. A shifted group code over G, C + b, is a translation of a group code C by a fixed vector b ∈ G^n. When the underlying group G is a finite field, the group code is a subspace over G and is called a linear code. Group codes generalize the notion of linear codes over fields to sources with reconstruction alphabets (and channels with input alphabets) having composite sizes.
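The following minimal sketch illustrates these notions for the cyclic group Z_8 and its subgroup H = 2Z_8: the cosets of H partition G and the index |G : H| equals |G|/|H|.

```python
# Minimal sketch: for G = Z_8 and the subgroup H = 2*Z_8 = {0, 2, 4, 6}, the cosets
# a + H partition G, and the index |G : H| equals |G| / |H|.
G = list(range(8))
H = sorted((2 * h) % 8 for h in range(4))
cosets = {tuple(sorted((a + h) % 8 for h in H)) for a in G}
print(sorted(cosets))                         # [(0, 2, 4, 6), (1, 3, 5, 7)]
print("index |G : H| =", len(G) // len(H))    # 2
```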

Achievability for Source Coding and the Rate-Distortion Function

For a group G, a group transmission system with parameters (n, Θ, Δ, τ) for compressing a given source (X, U = G, p_X, d) consists of a codebook C, an encoding mapping Enc(·), and a decoding mapping Dec(·). The codebook C is a shifted group code over G whose size is equal to Θ, and the mappings are defined as

Enc : X^n -> {1, 2, ..., Θ},   Dec : {1, 2, ..., Θ} -> C

such that

$P\left[ d\big(X^n, \mathrm{Dec}(\mathrm{Enc}(X^n))\big) > \Delta \right] \le \tau$

where X^n is the random vector of length n generated by the source. In this transmission system, n denotes the block length, log Θ denotes the number of channel uses, Δ denotes the distortion level, and τ is a bound on the probability of exceeding the distortion level. Given a source (X, U = G, p_X, d), a pair of non-negative real numbers (R, D) is said to be achievable using group codes if for every ε > 0 and for all sufficiently large n, there exists a group transmission system with parameters (n, Θ, Δ, τ) for compressing the source such that

$\frac{1}{n} \log \Theta \le R + \epsilon, \qquad \Delta \le D + \epsilon, \qquad \tau \le \epsilon$

The optimal group rate-distortion function R(D) of the source is given by the infimum of the rates R such that (R, D) is achievable using group codes.

Achievability for Channel Coding

For a group G, a group transmission system with parameters (n, Θ, τ) for reliable communication over a given channel (X = G, Y, W_{Y|X}) consists of a codebook C, an encoding mapping Enc(·), and a decoding mapping Dec(·). The codebook C is a shifted group code over G (a translated subgroup of G^n) whose size is equal to Θ, and the mappings are defined as

Enc : {1, 2, ..., Θ} -> C,   Dec : Y^n -> {1, 2, ..., Θ}

such that

$\frac{1}{\Theta} \sum_{m=1}^{\Theta} \; \sum_{y :\, \mathrm{Dec}(y) \ne m} W^n\big(y \mid \mathrm{Enc}(m)\big) \le \tau$

or, equivalently,

$\frac{1}{\Theta} \sum_{m=1}^{\Theta} \; \sum_{x \in X^n} 1_{\{x = \mathrm{Enc}(m)\}} \sum_{y \in Y^n} W^n(y \mid x)\, 1_{\{m \ne \mathrm{Dec}(y)\}} \le \tau$

Given a channel (X = G, Y, W_{Y|X}), the rate R is said to be achievable using group codes if for all ε > 0 and for all sufficiently large n, there exists a group transmission system for reliable communication with parameters (n, Θ, τ) such that

$\frac{1}{n} \log \Theta \ge R - \epsilon, \qquad \tau \le \epsilon$

The group capacity C of the channel is defined as the supremum of the set of all rates achievable using group codes.

Typicality

Consider two random variables X and Y with joint probability mass function p_{XY}(x, y) for (x, y) ∈ X x Y. Let n be an integer and let ε be a positive real number. The sequence pair (x, y) belonging to X^n x Y^n is said to be jointly ε-typical with respect to p_{XY} if

$\left| \frac{1}{n} N(a, b \mid x, y) - p_{XY}(a, b) \right| \le \frac{\epsilon}{|X|\,|Y|} \quad \text{for all } a \in X,\ b \in Y,$

and none of the pairs (a, b) with p_{XY}(a, b) = 0 occurs in (x, y). Here, N(a, b | x, y) counts the number of occurrences of the pair (a, b) in the sequence pair (x, y). We denote the set of all jointly ε-typical sequence pairs in X^n x Y^n by A^n_ε(X, Y). Given a sequence x ∈ A^n_ε(X), the set of conditionally ε-typical sequences A^n_ε(Y | x) is defined as

$A^n_\epsilon(Y \mid x) = \{\, y \in Y^n : (x, y) \in A^n_\epsilon(X, Y) \,\}$
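A direct transcription of this typicality test into code might look as follows; this is only a sketch, and it assumes p_xy is given as a dictionary listing the probability of every pair of X x Y (zeros included).

```python
# Minimal sketch: joint epsilon-typicality check following the definition above.
from collections import Counter

def jointly_typical(x, y, p_xy, eps):
    n = len(x)
    counts = Counter(zip(x, y))
    for (a, b), p in p_xy.items():
        if abs(counts.get((a, b), 0) / n - p) > eps / len(p_xy):
            return False
    # no pair with zero probability may occur in (x, y)
    return all(p_xy.get(pair, 0.0) > 0 for pair in counts)

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(jointly_typical([0, 0, 1, 1, 0, 1, 0, 1], [0, 0, 1, 1, 1, 0, 0, 1], p_xy, eps=0.6))
```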

Notation

In our notation, P is the set of all primes, Z^+ is the set of positive integers, R^+ is the set of non-negative reals, and for a prime p and a positive integer r, Z_{p^r} is the cyclic group of order p^r. Since we deal with summations over several groups in this thesis, when not clear from the context we indicate the underlying group above each summation; e.g., summation over the group G is denoted by $\overset{(G)}{\sum}$. The direct sum of groups is denoted by ⊕ and the direct product of sets is denoted by ×.

2.2 The Ensemble of Abelian Group Codes

In this section, we use a standard characterization of Abelian groups and introduce the ensemble of Abelian group codes used in the thesis.

2.2.1 A Characterization of Abelian Groups

For an Abelian group G, let P(G) denote the set of all prime divisors of |G| and, for a prime p ∈ P(G), let S_p(G) be the corresponding Sylow subgroup of G. It is known [36, Theorem 3.3.1] that any Abelian group G can be decomposed into a direct sum of its Sylow subgroups in the following manner:

$G = \bigoplus_{p \in \mathcal{P}(G)} S_p(G) \qquad (2.1)$

Furthermore, each Sylow subgroup S_p(G) can be decomposed into Z_{p^r} groups as follows:

$S_p(G) = \bigoplus_{r \in \mathcal{R}_p(G)} \mathbb{Z}_{p^r}^{M_{p,r}} \qquad (2.2)$

where R_p(G) ⊆ Z^+ and, for r ∈ R_p(G), M_{p,r} is a positive integer. Note that $\mathbb{Z}_{p^r}^{M_{p,r}}$ is defined as the direct sum of the ring Z_{p^r} with itself M_{p,r} times.
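As a toy illustration of the decomposition (2.1)-(2.2), the sketch below (a hypothetical example, not from the thesis) verifies that the cyclic group Z_12 decomposes into its Sylow subgroups as Z_4 ⊕ Z_3 and that addition acts componentwise.

```python
# Minimal sketch (hypothetical small example): Z_12 ~ Z_4 (+) Z_3, with addition in
# Z_12 corresponding to componentwise addition mod 4 and mod 3.
def split(a):                    # Z_12 -> Z_4 (+) Z_3
    return (a % 4, a % 3)

for a in range(12):
    for b in range(12):
        x4, x3 = split(a)
        y4, y3 = split(b)
        assert split((a + b) % 12) == ((x4 + y4) % 4, (x3 + y3) % 3)
print("Z_12 ~ Z_4 (+) Z_3 verified componentwise")
```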

Combining Equations (2.1) and (2.2), we can represent any Abelian group as follows:

$G = \bigoplus_{p \in \mathcal{P}(G)} \bigoplus_{r \in \mathcal{R}_p(G)} \mathbb{Z}_{p^r}^{M_{p,r}} = \bigoplus_{p \in \mathcal{P}(G)} \bigoplus_{r \in \mathcal{R}_p(G)} \bigoplus_{m=1}^{M_{p,r}} \mathbb{Z}_{p^r}^{(m)} \qquad (2.3)$

where $\mathbb{Z}_{p^r}^{(m)}$ is called the m-th Z_{p^r} ring of G, or the (p, r, m)-th ring of G. Equivalently, this can be written as

$G = \bigoplus_{(p,r,m) \in \mathcal{G}(G)} \mathbb{Z}_{p^r}^{(m)}$

where $\mathcal{G}(G) \subseteq \mathcal{P} \times \mathbb{Z}^+ \times \mathbb{Z}^+$ is defined as

$\mathcal{G}(G) = \{ (p, r, m) \in \mathcal{P} \times \mathbb{Z}^+ \times \mathbb{Z}^+ \mid p \in \mathcal{P}(G),\ r \in \mathcal{R}_p(G),\ m \in \{1, 2, \dots, M_{p,r}\} \}$

This means that any element a of the Abelian group G can be regarded as a vector whose components are indexed by (p, r, m) ∈ G(G) and whose (p, r, m)-th component a_{p,r,m} takes values from the ring Z_{p^r}. With a slight abuse of notation, we represent an element a of G as

$a = \bigoplus_{(p,r,m) \in \mathcal{G}(G)} a_{p,r,m}$

Furthermore, for two elements a, b ∈ G, we have

$a + b = \bigoplus_{(p,r,m) \in \mathcal{G}(G)} \big( a_{p,r,m} +_{p^r} b_{p,r,m} \big)$

where + denotes the group operation and $+_{p^r}$ denotes addition mod p^r. More generally, let a, b, ..., z be any number of elements of G. Then we have

$a + b + \dots + z = \bigoplus_{(p,r,m) \in \mathcal{G}(G)} \big( a_{p,r,m} +_{p^r} b_{p,r,m} +_{p^r} \dots +_{p^r} z_{p,r,m} \big) \qquad (2.4)$

This can equivalently be written as

$[a + b + \dots + z]_{p,r,m} = a_{p,r,m} +_{p^r} b_{p,r,m} +_{p^r} \dots +_{p^r} z_{p,r,m}$

where $[\cdot]_{p,r,m}$ denotes the (p, r, m)-th component of its argument.

26 Let I G:p,r,m G be a generator for the group which is isomorphic to the (p, r, m) th ring of G. Then we have a = (G) {}}{ (p,r,m) G (G) a p,r,m I G:p,r,m (2.5) where the summations are done with respect to the group operation and the multiplication a p,r,m I G:p,r,m is by definition the summation (with respect to the group operation) of I G:p,r,m to itself for a p,r,m times. In other words, a p,r,m I G:p,r,m is the short hand notation for a p,r,m I G:p,r,m = where the summation is the group operation. (G) {}}{ i {,,a p,r,m} I G:p,r,m Example: Let G = Z 4 Z 3 Z 2 9. Then we have P(G) = {2, 3}, S 2 (G) = Z 4 and S 3 (G) = Z 3 Z 2 9, R 2 (G) = {2}, R 3 (G) = {, 2}, M 2,2 =, M 3, =, M 3,2 = 2 and G (G) = {(2, 2, ), (3,, ), (3, 2, ), (3, 2, 2)} Each element a of G can be represented by a quadruple (a 2,2,, a 3,,, a 3,2,, a 3,2,2 ) where a 2,2, Z 4, a 3,, Z 3 and a 3,2,, a 3,2,2 Z 9. Finally, we have I G:2,2, = (, 0, 0, 0), I G:3,, = (0,, 0, 0), I G:3,2, = (0, 0,, 0), I G:3,2,2 = (0, 0, 0, ) so that Equation (2.5) holds. In the following section, we introduce the ensemble of Abelian group codes which we use in this chapter The Image Ensemble Recall that for a positive integer n, an Abelian group code of length n over the group G is a subgroup of G n. Our ensemble of codes consists of all Abelian group codes over G, i.e., we consider all subgroups of G n. We use the following fact to characterize all subgroups of G n : 5

27 Lemma II.. For an Abelian group G, let φ : J G be a homomorphism from some Abelian group J to G. Then φ(j) G, i.e., the image of the homomorphism is a subgroup of G. Moreover, for any subgroup H of G there exists a corresponding Abelian group J and a homomorphism φ : J G such that H = φ(j). Proof. The first part of the lemma is proved in [6, Theorem 2-]. For the second part, Let J be isomorphic to H and let φ be the identity mapping (more rigorously, let φ be the isomorphism between J and H). In order to use the above lemma to construct an ensemble of subgroups of G n, we need to identify all groups J from which there exist non-trivial homomorphisms to G n. Then the above lemma implies that for each such J and for each homomorphism φ : J G n, the image of the homomorphism is a group code over G of length n and for each group code C G n, there exists a group J and a homomorphism such that C is the image of the homomorphism. This ensemble corresponds to the ensemble of linear codes characterized by their generator matrix when the underlying group is a field of prime size. Note that as in the case of standard ensembles of linear codes, the correspondence between this ensemble and the set of Abelian group codes over G of length n may not be one-to-one. Let G and J be two Abelian groups with decompositions: G = J = (p,r,m) G ( G) (q,s,l) G (J) Z (m) p r and let φ be a homomorphism from J to G. For (q, s, l) G (J) and (p, r, m) G ( G), let Z (l) q s g (q,s,l) (p,r,m) = [φ(i J:q,s,l )] p,r,m 6

28 where I J:q,s,l J is the standard generator for the (q, s, l) th ring of J and [φ(i J:q,s,l )] p,r,m is the (p, r, m) th component of φ(i J:q,s,l ) G. For a = (q,s,l) G (J) a q,s,l J, let b = φ(a) and write b = (p,r,m) G ( G) b p,r,m. Note that as in Equation (2.5), we can write: a = = (J) {}}{ (q,s,l) G (J) (J) {}}{ a q,s,l I J:q,s,l (J) {}}{ (q,s,l) G (J) i {,,a q,s,l } I J:q,s,l where the summations are the group summations. We have b p,r,m = [φ(a)] p,r,m (J) {}}{ = φ (a) = (b) = (c) = = (J) {}}{ (q,s,l) G (J) i {,,a q,s,l } ( G) {}}{ ( G) {}}{ (q,s,l) G (J) i {,,a q,s,l } (Z p r ) {}}{ (q,s,l) G (J) i {,,a q,s,l } (Z p r ) {}}{ (q,s,l) G (J) (Z p r ) {}}{ (q,s,l) G (J) (Z p r ) {}}{ I J:q,s,l φ (I J:q,s,l ) [φ (I J:q,s,l )] p,r,m a q,s,l [φ (I J:q,s,l )] p,r,m a q,s,l g (q,s,l) (p,r,m) Note that (a) follows since φ is a homomorphism; (b) follows from Equation (2.4); and (c) follows by using a q,s,l [φ (I J:q,s,l )] p,r,m as the short hand notation for the summation of [φ (I J:q,s,l )] p,r,m to itself for a q,s,l times. Note that g (q,s,l) (p,r,m) represents the effect of the (q, s, l) th component of a on the (p, r, m) th component of b dictated by the homomorphism. This means that the p,r,m p,r,m 7

29 homomorphism φ can be represented by φ(a) = (Z p r ) {}}{ (p,r,m) G ( G) (q,s,l) G (J) a q,s,l g (q,s,l) (p,r,m) (2.6) where a q,s,l g (q,s,l) (p,r,m) is the short-hand notation for the mod-p r addition of g (q,s,l) (p,r,m) to itself for a q,s,l times. We have the following lemma on g (q,s,l) (p,r,m) : Lemma II.2. For a homomorphism described by (2.6), we have g (q,s,l) (p,r,m) = 0 g (q,s,l) (p,r,m) p r s Z p r If p q If p = q, r s Moreover, any mapping described by (2.6) and satisfying these conditions is a homomorphism. Proof. The proof is provided in Appendix This lemma implies that in order to construct a subgroup of G, we only need to consider homomorphisms from an Abelian group J to G such that P(J) P( G) since if for some (q, s, l) G (J), q / P( G) then φ(a) would not depend on a q,s,l. For p P( G), define r p = max R p (G) (2.7) We show that we can restrict ourselves to J s such that for all (q, s, l) G (J), s r q. For (q, s, l) G (J), assume s > r p. Then for all (p, r, m) G ( G), if p = q, we have s > r. Let (p, r, m) G ( G) be such that p = q. Since g (q,s,l) (p,r,m) Z p r and r r q, we have ( aq,s,l g (q,s,l) (p,r,m) ) (mod p r ) = ( (a q,s,l ) (mod p r )g (q,s,l) (p,r,m) ) = ( (a q,s,l ) (mod p rq )g (q,s,l) (p,r,m) ) (mod p r ) (mod p r ) 8

30 This implies that for all a J and all (q, s, l) G (J), in the expression for the (p, r, m) th component of φ(a) with p = q, a q,s,l appears as (a q,s,l ) (mod q rq ). Therefore, it suffices for a q,s,l to take values from Z q rq and this happens if s r q. To construct Abelian group codes of length n over G, let G = G n. we have G n = Define J as Z nmp,r p P(G) r R p J = q P(G) s= p r = r q for some positive integers k q,s. nm p,r p P(G) r R p m= Z kq,s q s = r q k q,s q P(G) s= l= Example: Let G = Z 8 Z 9 Z 5. Then we have Z (m) p r = (p,r,m) G (G n ) Z (l) q s = (q,s,l) G (J) Z (m) p r (2.8) Z (l) q s (2.9) J = Z k 2, 2 Z k 2,2 4 Z k 2,3 8 Z k 3, 3 Z k 3,2 9 Z k 5, 5 Define k = q P(G) s= r q k q,s and w q,s = kq,s k for q P(G) and s =,, r q so that we can write J = r q q P(G) s= for some constants w q,s adding up to one. kw q,s l= Z (l) q s (2.0) The ensemble of Abelian group encoders consists of all mappings φ : J G n of the form φ(a) = (Z p r ) {}}{ (p,r,m) G (G n ) (q,s,l) G (J) a q,s,l g (q,s,l) (p,r,m) (2.) 9

31 for a J where g (q,s,l) (p,r,m) = 0 if p q, g (q,s,l) (p,r,m) is a uniform random variable over Z p r if p = q, r s, and g (q,s,l) (p,r,m) is a uniform random variable over p r s Z p r if p = q, r s. The corresponding shifted group code is defined by where B is a uniform random variable over G n. C = {φ(a) + B a J} (2.2) Remark II.3. An alternate approach to characterizing Abelian group codes is to consider kernels of homomorphisms (the kernel ensemble). To construct an ensemble of Abelian group codes in this manner, let φ be a homomorphism from G n into J such that for a G n, where g (p,r,m) (q,s,l) φ(a) = (Z q s) {}}{ (q,s,l) G (J) (p,r,m) G (G n ) a p,r,m g (p,r,m) (q,s,l) = 0 if q p, g (p,r,m) (q,s,l) is a uniform random variable over Z q s if q = p, s r, and g (p,r,m) (q,s,l) is a uniform random variable over p s r Z q s if q = p, s r. The code is given by C = {a G n φ(a) = c} where c is a uniform random variable over J. In this paper, we use the image ensemble for both the channel and the source coding problem; however, similar results can be derived using the kernel ensemble as well. Remark II.4. For an Abelian group G, define Q(G) = {(p, r) p P(G), r R p (G)} (2.3) Consider the smaller ensemble of codes consisting of homomorphisms from Abelian groups J of the form J = (p,r) Q(G) for some integer k and some w p,r s adding up to one. Z kwp,r p r (2.4) The rate of a code in this ensemble is equal to R = n log J = k rw p,r log p (2.5) n (p,r) Q(G) 20

32 It can be shown that the average performance of codes over this ensemble is equal to the average performance of the ensemble of all codes considered above. In the rest of this paper, we consider this simpler ensemble to prove the achievability results. 2.3 The Performance of Abelian Group Codes In this section, we provide an upper bound on the rate-distortion function for a given source and a lower bound on the capacity of a given channel using group codes when the underlying group is an arbitrary Abelian group represented by Equation (2.3). We start by defining seven objects and then state two theorems using these objects, and finally provide an interpretation of the results and these objects with two examples Definitions For an Abelian group G, define Q(G) = {(p, r) p P(G), r R p (G)} (2.6) We denote vectors ˆθ, w and θ whose components are indexed by (p, r) Q(G) by (ˆθ p,r ) (p,r) Q(G), (w p,r ) (p,r) Q(G) and (θ p,r ) (p,r) Q(G) respectively. For ˆθ = (ˆθ p,r ) (p,r) Q(G), define θ(ˆθ) = min (q,s) Q(G) q=p r s + + ˆθ q,s (p,r) Q(G) Note that ˆθ and θ = θ(ˆθ) correspond to unique subgroups Hˆθ and H θ of J and G respectively where Hˆθ = H θ = (p,r) Q(G) (p,r,m) G (G) pˆθ p,r Z kwp,r p r p θp,r Z (m) p r J G 2

33 To give some intuition about the function θ( ), we state that for any homomorphism φ : J G n, we have φ(hˆθ) H θ and for some homomorphism φ : J G n, we have φ(hˆθ) = H θ. Let { Θ = θ(ˆθ) (ˆθ q,s ) (q,s) Q(G) : 0 ˆθ } q,s s (2.7) This set corresponds to a collection of subgroups of G which appear in the ratedistortion function. In other words only certain subgroups of the underlying group rather than all of them become important in the rate-distortion function. This will be clarified in the proof of the theorem. For θ Θ, define (p,r) Q(G) ω θ = θ p,rw p,r log p (p,r) Q(G) rw p,r log p (2.8) Let X and U be jointly distributed random variables and let [U] θ = U + H θ be a random variable taking values from the cosets of H θ in G. We define the source coding group mutual information between U and X as I G s.c.(u; X) = min max w p,r,(p,r) Q(G) θ Θ wp,r= θ 0 ω θ ( log G ) H θ H([U] θ X) (2.9) where 0 is a vector whose components are indexed by (p, r) Q(G) and whose (p, r) th component is equal to 0. Let X and Y be jointly distributed random variables and let [X] θ = X + H θ be a random variable taking values from the cosets of H θ in G. We define the channel coding group mutual information between X and Y as I G c.c.(x; Y ) = max w p,r,(p,r) Q(G) wp,r= min θ Θ θ r ( log H θ H(X Y [X] θ ) ω θ ) (2.20) where r is a vector whose components are indexed by (p, r) Q(G) and whose (p, r) th component is equal to r Main Results The following theorem provides an upper bound on the rate-distortion function achievable using group codes.. 22

Theorem II.5. For a source (X, U = G, p_X, d) and a given distortion level D, let p_{XU} be a joint distribution over X x U such that its first marginal is equal to the source distribution p_X, its second marginal p_U is uniform over U = G, and E{d(X, U)} ≤ D. Then the rate-distortion pair (R, D) is achievable using group codes, where R = I^G_{s.c.}(U; X).

Proof. The proof is provided in Section 2.4.

When the underlying group is a Z_{p^r} ring, this result can be simplified. We state this result in the form of a corollary:

Corollary II.6. Let X, U be jointly distributed random variables such that U is uniform over U = G = Z_{p^r} for some prime p and positive integer r. For θ = 1, 2, ..., r, let H_θ be the subgroup of Z_{p^r} defined by H_θ = p^θ Z_{p^r} and let [U]_θ = U + H_θ. Then,

$I^G_{s.c.}(U; X) = \max_{\theta = 1, \dots, r} \frac{r}{\theta}\, I([U]_\theta; X)$

Proof. Immediate from the theorem.

The following theorem is the dual channel coding result to Theorem II.5.

Theorem II.7. For a channel (X = G, Y, W_{Y|X}), the rate R = I^G_{c.c.}(X; Y) is achievable using group codes over G.

Proof. The proof is provided in Section 2.5.

When the underlying group is a Z_{p^r} ring, this result can be simplified. We state this result in the form of a corollary:

Corollary II.8. Let X, Y be jointly distributed random variables such that X is uniform over X = G = Z_{p^r} for some prime p and positive integer r. For θ = 0, 1, ..., r − 1, let H_θ be the subgroup of Z_{p^r} defined by H_θ = p^θ Z_{p^r} and let [X]_θ = X + H_θ. Then,

$I^G_{c.c.}(X; Y) = \min_{\theta = 0, \dots, r-1} \frac{r}{r - \theta}\, I(X; Y \mid [X]_\theta)$

Proof. Immediate from the theorem.
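The two corollaries are straightforward to evaluate numerically. The sketch below (not from the thesis; both the joint pmf and the channel are hypothetical) computes the Z_4 expressions, R = max(2 I([U]_1; X), I(U; X)) for source coding and R = min(I(X; Y), 2 I(X; Y | [X]_1)) for channel coding, where [·]_1 denotes the coset label with respect to H_1 = {0, 2}.

```python
# Minimal numerical sketch: Corollaries II.6 and II.8 for G = Z_4 (p = 2, r = 2)
# evaluated on a hypothetical joint pmf p(x, u) and a hypothetical channel W(y|x).
import numpy as np

def mi(p):
    """Mutual information (bits) of a joint pmf given as a 2-D array."""
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    m = p > 0
    return float((p[m] * np.log2(p[m] / (px @ py)[m])).sum())

# --- Corollary II.6: R = max( 2*I([U]_1; X), I(U; X) ), U-marginal uniform ---
p_xu = np.array([[0.15, 0.05, 0.03, 0.02],
                 [0.05, 0.15, 0.02, 0.03],
                 [0.03, 0.02, 0.15, 0.05],
                 [0.02, 0.03, 0.05, 0.15]])        # hypothetical p(x, u)
p_x_u1 = np.stack([p_xu[:, [0, 2]].sum(1), p_xu[:, [1, 3]].sum(1)], axis=1)
R_sc = max(2 * mi(p_x_u1), mi(p_xu))

# --- Corollary II.8: R = min( I(X;Y), 2*I(X;Y | [X]_1) ), X uniform ---
eps = 0.1                                          # hypothetical mod-4 noise channel
W = np.array([[1 - 3 * eps if y == x else eps for y in range(4)] for x in range(4)])
p_xy = W / 4
I_cond = sum(0.5 * mi(p_xy[rows] / p_xy[rows].sum()) for rows in ([0, 2], [1, 3]))
R_cc = min(mi(p_xy), 2 * I_cond)

print(f"source-coding bound  I_sc = {R_sc:.3f} bits")
print(f"channel-coding bound I_cc = {R_cc:.3f} bits")
```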

When dealing with group codes for the purpose of channel coding, an important case is when the channel exhibits some sort of symmetry. The capacity of group codes for channels with some notion of symmetry is found in [8]. The next corollary states that the result of this chapter simplifies to the result of [8] when the channel is symmetric in the sense defined there.

Corollary II.9. When the channel (X = G, Y, W_{Y|X}) is G-symmetric in the sense defined in [8], i.e., if

1. G acts simply transitively on X (which holds trivially in this case),
2. G acts isometrically on Y, and
3. for all x, g ∈ G and y ∈ Y, W(y | x) = W(g · y | g + x),

then I^G_{c.c.}(X; Y) is equal to the rate provided in [8, Equation (33)].

Proof. The proof is provided in Section 2.5.3.

2.3.3 Interpretation of the Results

In this section, we try to give some intuition about the result and the quantities defined above using several examples. At a high level, w_{p,r} denotes the normalized weight given to the Z_{p^r} component of the input group J in constructing the homomorphism from J to G^n, and θ indexes a subgroup H_θ of G that comes from a set Θ. The quantities $\frac{1}{\omega_\theta} I([U]_\theta; X)$ in source coding and $\frac{1}{1 - \omega_\theta} I(X; Y \mid [X]_\theta)$ in channel coding denote the rate constraints imposed by the subgroup H_θ. Due to the algebraic structure

36 of the code in the ensemble, two random codewords corresponding to two distinct indexes are statistically dependent, unless G is a finite field. For the source coding problem, when the code is chosen randomly, consider the event that all components of their difference belong to a proper subgroup H θ of G. Then if one of them is a poor representation of a given source sequence, so is the other with a probability that is higher than the case when no algebraic structure on the code is enforced. This means that the code size has to be larger so that with high probability one can find a good representation of the source. For the channel coding problem, when a random codeword corresponding to a given message index is transmitted over the channel, consider the event that all components of the difference between the codeword transmitted and a random codeword corresponding to another message index belong to a proper subgroup H θ of G. Then the probability that the latter is decoded instead of the former is higher than the case when no algebraic structure on the code is enforced. Example: We start with the simple example where G = Z 8. In this case, we have P(G) = {2} and Q(G) = {(2, 3)}. For vectors w, ˆθ and θ defined as above, we have w = w 2,3 =, ˆθ = ˆθ 2,3 and θ = θ 2,3. Recall that the ensemble of Abelian group codes used in the random coding argument consists of the set of all homomorphisms from some J = Z kw 2,3 8 = Z k 8. Any ˆθ = ˆθ 2,3 with 0 ˆθ 2,3 3 corresponds to a subgroup Kˆθ of the input group J given by Kˆθ = 2ˆθ 2,3 Z k 8 Similarly, any θ = θ 2,3 with 0 θ 2,3 3 corresponds to a subgroup H θ of the group space G n given by H θ = 2 θ 2,3 Z n 8 In this case, it turns out that if θ = θ(ˆθ) = ˆθ 2,3 = ˆθ (2.2) 25

37 then for any random homomorphism φ from J into G n, and for any a J with a 2ˆθ 2,3 Z k 2,3 8\2ˆθ + Z k 8, φ(a) is uniformly distributed over Hθ n. The set Θ consists of all vectors θ for which there exists at least one such a. Note that this set corresponds to a collection of subgroups of G n. The quantity ω θ is a measure of the number of elements a of J for which φ(a) is uniform over H θ. It turns out that for this example, Θ = {0,, 2, 3} and ω 0 = 0, ω = 3, ω 2 = 2 3 and ω 3 =. Example: Next, we consider the case where G = Z 4 Z 3. In this case, we have P(G) = {2, 3} and Q(G) = {(2, 2), (3, )}. For vectors w, ˆθ and θ defined as before, we have w = (w 2,2, w 3, ), ˆθ = (ˆθ 2,2, ˆθ 3, ) and θ = (θ 2,2, θ 3, ). The ensemble of Abelian group codes consists of the set of all homomorphisms from some J = Z kw 2,2 4 Z kw 3, 3. Any vector ˆθ = (ˆθ 2,2, ˆθ 3, ) with 0 ˆθ 2,2 2 and 0 ˆθ 3, corresponds to a subgroup Kˆθ of the input group J given by Kˆθ = 2ˆθ 2,2 Z kw 2,2 4 3ˆθ 3, Z kw 3, 8 Similarly, any θ = (θ 2,2, θ 3, ) with 0 θ 2,2 2 and 0 θ 3, corresponds to a subgroup H θ of the group space G n given by H θ = 2 θ 2,2 Z n 4 3 θ 3, Z n 3 It turns out that if θ 2,2 = ˆθ 2,2 (2.22) θ 3, = ˆθ 3, (2.23) then for any random homomorphism φ from J into G n, and for any a = (β, γ) J with β 2ˆθ 2,2 Z kw 2,2 4 \2ˆθ 2,2 + Z kw 2,2 4 and γ 3ˆθ 3, Z kw 3, 3 \3ˆθ 3, + Z kw 3, 3, φ(a) is uniformly distributed over Hθ n. Moreover, for this example we have Θ = {(0, 0), (, 0), (2, 0), (0, ), (, ), (2, )} 26
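The uniformity property invoked in the Z_8 example above can be checked empirically. The following Monte Carlo sketch (hypothetical minimal parameters: G = Z_8 with n = k = 1, so the homomorphism reduces to φ(a) = a·g with g uniform over Z_8) shows that φ(a) is uniform over the subgroup 2^t Z_8 when a has 2-adic order t.

```python
# Minimal Monte Carlo sketch: for G = Z_8 and g uniform over Z_8, phi(a) = a*g mod 8
# is uniform over 2^t * Z_8 when a has 2-adic order t.
import numpy as np

rng = np.random.default_rng(0)
for a in (1, 2, 4):                                  # 2-adic orders t = 0, 1, 2
    samples = (a * rng.integers(0, 8, size=24000)) % 8
    print(f"a = {a}: support of phi(a) =", np.unique(samples))
```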

2.4 Proof for the Source Coding Problem

2.4.1 The Coding Scheme

Following the analysis of Section 2.2.2, we construct the ensemble of group codes of length n over G as the image of all homomorphisms φ from some Abelian group J into G^n, where J and G^n are as in Equations (2.10) and (2.8) respectively. The random homomorphism φ is described in Equation (2.11). To find an achievable rate for a distortion level D, we use a random coding argument in which the random encoder is characterized by the random homomorphism φ, a random vector B uniformly distributed over G^n, and a joint distribution p_{XU} over X x U such that its first marginal is equal to the source distribution p_X, its second marginal p_U is uniform over U = G, and E{d(X, U)} ≤ D. The code is defined as in (2.12) and its rate is given by (2.15). Given the source output sequence x ∈ X^n, the random encoder looks for a codeword u ∈ C such that u is jointly typical with x with respect to p_{XU}. If it finds at least one such u, it encodes x to u (if it finds more than one such u, it picks one of them at random). Otherwise, it declares an error. The decoder outputs u as the source reconstruction.

2.4.2 Error Analysis

Let x = (x_1, ..., x_n) and u = (u_1, ..., u_n) be the source output and the encoder/decoder output respectively. Note that if the encoder declares no error, then since x and u are jointly typical, (d(x_i, u_i))_{i=1,...,n} is typical with respect to the distribution of d(X, U). Therefore, for large n,

$d(x, u) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, u_i) \approx E\{d(X, U)\} \le D.$

It remains to show that the rate can be as small as I^G_{s.c.}(U; X) while keeping the probability of encoding error small.
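The construction and the encoding rule described in Section 2.4.1 can be sketched in a few lines of code. The example below is a minimal, hypothetical instance (G = Z_4, small n and k, and a crude frequency-based stand-in for joint typicality), not the general construction of the thesis; the names Gmat, B and p_Z are ours.

```python
# Minimal sketch: sample one shifted group code over G = Z_4 from the image ensemble
# (J = Z_4^k, so the g coefficients are uniform over Z_4, plus a uniform shift B),
# then run a crude version of the joint-typicality encoder.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, k, eps = 2000, 3, 0.06
Gmat = rng.integers(0, 4, size=(k, n))              # random homomorphism coefficients
B = rng.integers(0, 4, size=n)                      # uniform shift
codebook = [tuple((np.array(a) @ Gmat + B) % 4) for a in product(range(4), repeat=k)]

# hypothetical test channel: X = U + Z (mod 4) with Z non-uniform
p_Z = np.array([0.7, 0.1, 0.1, 0.1])
u0 = np.array(codebook[rng.integers(len(codebook))])
x = (u0 + rng.choice(4, size=n, p=p_Z)) % 4

def looks_typical(x, u, eps):
    # stand-in for joint typicality: empirical law of (x - u) mod 4 close to p_Z
    emp = np.bincount((x - np.array(u)) % 4, minlength=4) / len(x)
    return np.abs(emp - p_Z).max() <= eps

matches = [u for u in codebook if looks_typical(x, u, eps)]
print("codebook size:", len(codebook), "| typical codewords found:", len(matches))
```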

39 Given the source output x X n, define α(x) = u A n ɛ (U x) {u C} = u A n ɛ (U x) a J {φ(a)+b=u} An encoding error occurs if and only if α(x) = 0. We use the following Chebyshev s inequality to show that under certain conditions the probability of error can be made arbitrarily small: P (α(x) = 0) var{α(x)} E{α(x)} 2 We need the following lemmas to proceed: Lemma II.0. For a, ã J, u, ũ G n and for (q, s, l) G (J), let ˆθ q,s,l {0,,, s} be such that ã q,s,l a q,s,l q ˆθ q,s,l Z q s\q ˆθ q,s,l + Z q s For (p, r) Q(G) define θ p,r (a, ã) = min (q,s,l) G (J) q=p r s + + ˆθ q,s,l and let θ p,r = θ p,r (a, ã). Define the subgroup H θ of G as H θ = (p,r,m) G (G) p θp,r Z (m) p r Then, P (φ(a) + B = u, φ(ã) + B = ũ) = G n H θ If ũ u H n n θ 0 Otherwise Proof. The proof is provided in Appendix Lemma II.. For a J and θ = (θ p,r ) (p,r) Q(G), let T θ (a) = {ã J (p, r) Q(G), θ p,r (a, ã) = θ p,r } 28

40 where θ p,r (a, ã) is defined as in the previous lemma. Then we have T θ (a) (p,r) Q(G) p (r θp,r)kwp,r = 2 nr( ω θ) In particular, T θ (a) does not depend on a. We denote this by T θ = T θ (a). Proof. The proof is provided in Appendix Lemma II.2. For a J and u G n, we have Proof. Immediate from Lemma II.0. and We have E{α(x)} = E{α(x) 2 } = E = P (φ(a) + B = u) = G n u A n ɛ (U x) a J = An ɛ (U x) J G n u,ũ A n ɛ (U x) a,ã J u,ũ A n ɛ (U x) a,ã J = θ Θ a J u A n ɛ (U x) ã T θ (a) P (φ(a) + B = u) {φ(a)+b=u,φ(ã)+b=ũ} P ({φ(a) + B = u, φ(ã) + B = ũ}) ũ A n ɛ (U x) ũ u H n θ G n H θ n Note that the term corresponding to θ = 0 is upper bounded by E{α(x)} 2. Using Lemma II.4, we have Therefore, A n ɛ (U x) (u + H n θ ) 2 n[h(u [U] θx)+o(ɛ)] var{α} = E{α(x) 2 } E{α(x)} 2 θ Θ θ 0 J A n ɛ (U x) T θ 2n[H(U [U] θx)+o(ɛ)] G n H θ n 29

41 Therefore, P (α(x) = 0) var{α(x)} E{α(x)} 2 θ Θ θ 0 2 nr( ω θ) 2 n[h(u X) H(U [U] θx) O(ɛ)] G n J H θ n Note that H(U X) H(U [U] θ X) = H([U] θ X) and J = 2 nr ; therefore, P (α(x) = 0) { [ ]} exp 2 n H([U] θ X) log G : H θ + ω θ R O(ɛ) θ Θ θ 0 In order for the probability of error to go to zero as n increases, we require the exponent of all the terms to be negative; or equivalently, R > ( ) log G : H θ H([U] θ X) ω θ with the convention 0 R = =. Therefore, the achievable rate is equal to min w p,r,(p,r) Q(G) wp,r= max θ Θ θ 0 ( ) log G : H θ H([U] θ X) ω θ 2.5 Proof for the Channel Coding Problem 2.5. The Coding Scheme Following the analysis of Section 2.2.2, we construct the ensemble of group codes of length n over G as the image of all homomorphisms φ from some Abelian group J into G n where J and G n are as in Equations (2.0) and (2.8) respectively. The random homomorphism φ is described in Equation (2.). To find an achievable rate, we use a random coding argument in which the random encoder is characterized by the random homomorphism φ and a random vector B uniformly distributed over G n. Given a message u J, the encoder maps it to x = φ(u) + B and x is then fed to the channel. At the receiver, after receiving the 30

42 channel output y Y n, the decoder looks for a unique ũ J such that φ(ũ) + B is jointly typical with y with respect to the distribution p X W Y X where p X is uniform over G. If the decoder does not find such ũ or if such ũ is not unique, it declares error Error Analysis Let u, x and y be the message, the channel input and the channel output respectively. The error event can be characterized by the union of two events: E(u) = E (u) E 2 (u) where E (u) is the event that φ(u) + B is not jointly typical with y and E 2 (u) is the event that there exists a ũ u such that φ(ũ) + B is jointly typical with y. We can provide an upper bound on the probability of the error event as P (E(u)) P (E (u)) + P (E 2 (u) (E (u)) c ). Using the standard approach, one can show that P (E (u)) 0 as n. The probability of the error event E 2 (u) (E (u)) c averaged over all messages can be written as P avg (E 2 (u) (E (u)) c ) = u J J x G n {φ(u)+b=x} y A n ɛ (Y x) W n Y X(y x) { ũ J:ũ u,φ(ũ)+b A n ɛ (X y)} The expected value of this probability over the ensemble is given by E{P avg (E 2 (u) (E (u)) c )} = P err where P err = u J J x G n Using the union bound, we have P err u J J y A n ɛ (Y x) W n Y X(y x)p (φ(u) + B = x, ũ J : ũ u, φ(ũ) + B A n ɛ (X y)) x G n y A n ɛ (Y x) ũ J ũ u x A n ɛ (X y) W n Y X(y x)p (φ(u) + B = x, φ(ũ) + B = x) Define Θ as in Equation (2.7) and for θ Θ and u J, define T θ (u) as in Lemma II.. It follows that P err u J J x G n y A n ɛ (Y x) θ Θ θ r ũ T θ (u) x A n ɛ (X y) W n Y X(y x)p (φ(u) + B = x, φ(ũ) + B = x) 3

43 Using Lemmas II.0, II.4 and II., we have P err θ Θ θ r θ Θ θ r θ Θ θ r u J u J J J x G n y A n ɛ (Y x) ũ T θ (u) x G n y A n ɛ (Y x) ũ T θ (u) T θ 2 n[h(x Y [X] θ)+o(ɛ)] H θ n Equivalently, this can be written as P err θ Θ θ r x A n ɛ (X y) x x+h θ n W n Y X(y x) G n H θ n W n Y X(y x)2 n[h(x Y [X] θ)+o(ɛ)] G n H θ n { [ ]} exp 2 n ( ω θ )R H(X Y [X] θ ) + log H θ O(ɛ) Therefore, the achievability condition is R = min θ Θ θ r ( ) log H θ H(X Y [X] θ ) ω θ If we maximize over the choice of w, we can conclude that the rate R = I G c.c(x; Y ) is achievable Simplification of the Result for Symmetric Channels In this section, we provide a proof of corollary II.9. Note that since we take X = G, we can take the action of G on X to be the group operation. We need to show that for all subgroups H of G, I(X; Y [X]) = C H where X = X + H and C H is the mutual information between the channel input and the channel output when the input is uniformly distributed over H; in other words, C H = I(X; Y [X] = H). This in turn follows by showing that for all g G I(X; Y [X] = g + H) = I(X; Y [X] = H) 32

44 This can be shown as follows: I(X; Y [X] = g + H) = x g+h y Y = x H y Y (a) = x H (b) = x H y Y y Y = I(X; Y [X] = H) W (y x) W (y x) log H P (y) W (y x + g) W (y x + g) log H P (y) W (g y x + g) W (g y x + g) log H P (y) W (y x) W (y x) log H P (y) where (a) follows since the action of g on Y is a bijection of Y and (b) follows from the symmetric property of the channel. Using this result, it can be shown that the rate provided in [8, Equation (33)] is equal to I G c.c.(x; Y ). The difference in the appearance of the two expressions is due to the fact that in [8, Equation (33)] the minimization is carried out over the subgroups of the input group whereas in the expression for Ic.c.(X; G Y ) the minimization is carried out over the resulting subgroups of the output group. 2.6 Examples In this section, we provide a few examples for both the source coding problem as well as the channel coding problem. We show that when the underlying group is a field, the source coding group mutual information and the channel coding group mutual information are both equal to the Shannon mutual information. We also provide several non-field examples for both problems Examples for Source Coding In this section, we find the rate-distortion region for a few examples. First, we consider the case where the underlying group is a field i.e. when G = Z m p for some 33

45 prime p and positive integer m. In this case, we have P(G) = {p}, R p (G) = {}, M p, = m and Q(G) = {(p, )}. Since the set Q(G) is a singleton, the only choice for the weights is w = w p, = and Θ = {0, } For θ =, we have w θ = 0 and [U] θ = U. Therefore, I G s.c. = I(U; X) This means when the underlying group is a field, the rate is equal to the regular mutual information between U and X when U is a uniform random variable. Next, we consider the case where the reconstruction alphabet is Z 4. In this case, we have p = 2 and r = 2. Therefore, R = max 2 2 θ= θ I([U] θ; X) = max(2i([u] ; X), I(U; X)) where U is uniform over Z 4, X is the source output and [U] = U +2 Z 4 = X +{0, 2} and the joint distribution is such that E{d(U, X)} D. Therefore, 2I([U] ; X) = I(U + {0, 2}; X) + I(U + {, 3}; X) Hence, R = max (I(U; X), I(U + {0, 2}; X) + I(U + {, 3}; X)) Next, we consider the case where the reconstruction alphabet is Z 8. For this source, we have p = 2 and r = 3. Following a similar argument as above we have: ( R = max I(U; X), 3 ) 2 I([U] 2; X), 3I([U] ; X) where U is uniform over Z 8, X is the source output, [U] = U + {0, 2, 4, 6} and [U] 2 = U + {0, 4}. 34

46 Similarly, for channels with input Z 9, we have p = 3, r = 2 and R = max (I(U; X), 2I([U] ; X)) where U is uniform over Z 9, X is the source output and [U] = U + {0, 3, 6}. Finally, we consider G = Z 2 Z 4. In this case, P(G) = {2}, R 2 (G) = {, 2}, Q(G) = {(2, ), (2, 2)}, 0 = (0, 0) and w = (w, w 2 ) such that w + w 2 =. We have Θ = {(0, 0), (0, ), (, ), (, 2)}. For θ = (0, ) we have ω θ = w +w 2 w +2w 2 = +w 2, for θ = (, ) we have ω θ = w 2 +w 2, and for θ = (, 2) we have ω θ = 0; therefore, ( R = min max ( + w 2 )I([U] θ=(0,) ; X), + w ) 2 I([U] θ=(,) ; X), I([U] θ=(,2) ; X) w,w 2 w ( 2 = min max ( + w 2 )I([U] θ=(0,) ; X), + w ) 2 I([U] θ=(,) ; X), I(U; X) w,w 2 w 2 The minimum of R is achieved when or equivalently Therefore, ( + w 2 )I([U] θ=(0,) ; X) = + w 2 w 2 I([U] θ=(,) ; X) w 2 = I([U] θ=(,); X) I([U] θ=(0,) ; X) R = max ( I([U] θ=(,) ; X) + I([U] θ=(0,) ; X), I(X; Y ) ) Examples for Channel Coding In this section, we find the achievable rate for a few examples: First, we consider the case where the underlying group is a field i.e. when G = Z m p for some prime p and positive integer m. As in the source coding case, the only choice for the weights is w = w p, = and Θ = {0, }. For θ = 0, we have w θ = and [U] θ is a trivial random variable. Hence I G s.c. = I(U; X) 35

This means that when the underlying group is a field, the rate is equal to the regular mutual information between the channel input and the channel output when the input is a uniform random variable.

Next, we consider the case where the channel input alphabet is $\mathbb{Z}_4$. In this case, we have $p=2$ and $r=2$. Therefore,
$$ R = \min_{\theta=0,1}\ \frac{2}{2-\theta}\,I(X;Y\mid[X]_\theta) = \min\bigl(I(X;Y),\ 2I(X;Y\mid[X]_1)\bigr) $$
where the channel input $X$ is uniform over $\mathbb{Z}_4$, $Y$ is the channel output and $[X]_1 = X+2\mathbb{Z}_4 = X+\{0,2\}$. Therefore,
$$ 2I(X;Y\mid[X]_1) = I(X;Y\mid X\in\{0,2\}) + I(X;Y\mid X\in\{1,3\}) $$
Hence,
$$ R = \min\bigl(I(X;Y),\ I(X;Y\mid X\in\{0,2\})+I(X;Y\mid X\in\{1,3\})\bigr) $$

Next, we consider a channel with input alphabet $\mathbb{Z}_8$. For this channel we have $p=2$ and $r=3$. Following a similar argument as above we have:
$$ R = \min\Bigl(I(X;Y),\ \tfrac{3}{2}I(X;Y\mid[X]_1),\ 3I(X;Y\mid[X]_2)\Bigr) $$
where the channel input $X$ is uniform over $\mathbb{Z}_8$, $Y$ is the channel output, $[X]_1 = X+\{0,2,4,6\}$ and $[X]_2 = X+\{0,4\}$.

Similarly, for channels with input $\mathbb{Z}_9$, we have $p=3$, $r=2$ and
$$ R = \min\bigl(I(X;Y),\ 2I(X;Y\mid[X]_1)\bigr) $$
where the channel input $X$ is uniform over $\mathbb{Z}_9$, $Y$ is the channel output and $[X]_1 = X+\{0,3,6\}$.

Finally, we consider $G=\mathbb{Z}_2\oplus\mathbb{Z}_4$. In this case, $\mathcal{P}(G)=\{2\}$, $\mathcal{R}_2(G)=\{1,2\}$, $\mathcal{Q}(G)=\{(2,1),(2,2)\}$, $r=(1,2)$ and $w=(w_1,w_2)$ such that $w_1+w_2=1$. We have

$\Theta=\{(0,0),(0,1),(1,1),(1,2)\}$. For $\theta=(0,0)$ we have $\omega_\theta=1$, for $\theta=(0,1)$ we have $\omega_\theta=\frac{w_1+w_2}{w_1+2w_2}=\frac{1}{1+w_2}$ and for $\theta=(1,1)$ we have $\omega_\theta=\frac{w_2}{1+w_2}$; therefore,
$$
\begin{aligned}
R &= \max_{w_1,w_2}\min\Bigl(\tfrac{1+w_2}{w_2}\,I(X;Y\mid[X]_{\theta=(1,1)}),\ (1+w_2)\,I(X;Y\mid[X]_{\theta=(0,1)}),\ I(X;Y\mid[X]_{\theta=(0,0)})\Bigr)\\
  &= \max_{w_1,w_2}\min\Bigl(\tfrac{1+w_2}{w_2}\,I(X;Y\mid[X]_{\theta=(1,1)}),\ (1+w_2)\,I(X;Y\mid[X]_{\theta=(0,1)}),\ I(X;Y)\Bigr)
\end{aligned}
$$
The maximum of $R$ is achieved when
$$ \frac{1+w_2}{w_2}\,I(X;Y\mid[X]_{\theta=(1,1)}) = (1+w_2)\,I(X;Y\mid[X]_{\theta=(0,1)}) $$
or equivalently
$$ w_2 = \frac{I(X;Y\mid[X]_{\theta=(1,1)})}{I(X;Y\mid[X]_{\theta=(0,1)})} $$
Therefore,
$$ R = \min\bigl(I(X;Y\mid[X]_{\theta=(1,1)})+I(X;Y\mid[X]_{\theta=(0,1)}),\ I(X;Y)\bigr) $$

2.7 Appendix

Proof of Lemma II.2

We first prove that for a homomorphism $\phi$, the coefficients $g^{(q,s,l)}_{(p,r,m)}$ satisfy the above conditions. First assume $p\neq q$. Note that the only nonzero component of $I_{J:q,s,l}$ takes values from $\mathbb{Z}_{q^s}$ and therefore
$$ q^s\, I_{J:q,s,l} = \sum_{i=1,\dots,q^s}^{(J)} I_{J:q,s,l} = 0 $$
where the sum is taken in the group $J$.

49 Note that since φ is a homomorphism, we have φ(q s I J:q,s,l ) = 0. On the other hand, φ(q s I J:q,s,l ) = φ( = = = = = (J) {}}{ i=,,q s I J:q,s,l ) ( G) {}}{ φ(i J:q,s,l ) i=,,q s (p,r,m) G ( G) (p,r,m) G ( G) (p,r,m) G ( G) (p,r,m) G ( G) ( G) {}}{ φ(i J:q,s,l ) i=,,q s (Z p r ) {}}{ p,r,m i=,,q s [φ(i J:q,s,l )] p,r,m q s [φ(i J:q,s,l )] p,r,m q s g (q,s,l) (p,r,m) Therefore, we have q s g (q,s,l) (p,r,m) = 0 (mod p r ) or equivalently q s g (q,s,l) (p,r,m) = Cp r for some integer C. Since p q, this implies p r g (q,s,l) (p,r,m) and since g (q,s,l) (p,r,m) takes value from Z p r, we have g (q,s,l) (p,r,m) = 0. Next, assume p = q and r s. Note that same as above, we have φ(q s I J:q,s,l ) = 0 and φ(q s I J:q,s,l ) = (p,r,m) G ( G) q s g (q,s,l) (p,r,m) and therefore, q s g (q,s,l) (p,r,m) = 0 (mod p r ). Since g (q,s,l) (p,r,m) takes values from Z p r and p = q, this implies p r s g (q,s,l) (p,r,m) or equivalently g (q,s,l) (p,r,m) p r s Z p r. Next we show that any mapping described by (2.6) satisfying the conditions of the lemma is a homomorphism. For two elements a, b J and for (p, r, m) G ( G) 38

50 we have [φ(a + b)] p,r,m = φ = φ = φ = = = On the other hand, we have (q,s,l) G (J) (J) {}}{ (q,s,l) G (J) (J) {}}{ (a q,s,l + q s b q,s,l ) p,r,m (a q,s,l + q s b q,s,l )I J:q,s,l (J) {}}{ I J:q,s,l (q,s,l) G (J) i=,,a q,s,l + q sb q,s,l ( G) {}}{ (q,s,l) G (J) (Z p r ) {}}{ (q,s,l) G (J) (Z p r ) {}}{ (q,s,l) G (J) ( {}}{ G) φ (I J:q,s,l ) i=,,a q,s,l + q sb q,s,l (Z p r ) {}}{ p,r,m p,r,m i=,,a q,s,l + q sb q,s,l [φ (I J:q,s,l )] p,r,m (Z p r ) {}}{ p,r,m i=,,a q,s,l + q sb q,s,l g (q,s,l) (p,r,m) (2.24) [φ(a) + φ(b)] p,r,m = [φ(a)] p,r,m + p r [φ(b)] p,r,m (Z p r ) {}}{ = a q,s,l g (q,s,l) (p,r,m) + p r = = (q,s,l) G (J) (Z p r ) {}}{ (Z p r ) {}}{ (q,s,l) G (J) i=,,a q,s,l (Z p r ) {}}{ (q,s,l) G (J) (Z p r ) {}}{ g (q,s,l) (p,r,m) + p r (Z p r ) {}}{ (q,s,l) G (J) (Z p r ) {}}{ (q,s,l) G (J) i=,,b q,s,l b q,s,l g (q,s,l) (p,r,m) (Z p r ) {}}{ g (q,s,l) (p,r,m) i=,,a q,s,l +b q,s,l g (q,s,l) (p,r,m) (2.25) where the addition in a q,s,l + b q,s,l is the integer addition. In order to show that φ is a homomorphism, it suffices to show that under the conditions of the lemma, Equations (2.24) and (2.25) are equivalent. We show that for a 39

51 fixed (q, s, l) G (J), if the conditions of the lemma are satisfied, then (Z p r ) {}}{ i=,,a q,s,l +b q,s,l g (q,s,l) (p,r,m) = (Z p r ) {}}{ i=,,a q,s,l + q sb q,s,l g (q,s,l) (p,r,m) (2.26) Note that if p q, then both summations are zero. Note that we have and (Z p r ) {}}{ i=,,a q,s,l +b q,s,l g (q,s,l) (p,r,m) = (Z p r ) {}}{ i=,,a q,s,l + q sb q,s,l g (q,s,l) (p,r,m) = (Z p r ) {}}{ i=,,(a q,s,l +b q,s,l) (mod p r ) (Z p r ) {}}{ i=,,(a q,s,l + q sb q,s,l) (mod p r ) g (q,s,l) (p,r,m) g (q,s,l) (p,r,m) If p = q and r s, then we have (a q,s,l + q s b q,s,l ) (mod p r ) = (a q,s,l + b q,s,l ) (mod p r ) and hence it follows that Equation (2.26) is satisfied. If p = q and r s, since g (q,s,l) (p,r,m) p r s Z p r we have (Z p r ) {}}{ i=,,a q,s,l +b q,s,l g (q,s,l) (p,r,m) = (Z p r ) {}}{ i=,,(a q,s,l +b q,s,l) (mod p s ) and hence it follows that Equation (2.26) is satisfied. g (q,s,l) (p,r,m) Proof of Lemma II.0 Note that since g (q,s,l) (p,r,m) s and B are uniformly distributed, in order to find the desired joint probability, we need to count the number of choices for g (q,s,l) (p,r,m) s and B such that for (p, r, m) G (G n ), (Z p r ) {}}{ a q,s,l g (q,s,l) (p,r,m) + p r B p,r,m = u p,r,m (q,s,l) G (J) (Z p r ) {}}{ (q,s,l) G (J) ã q,s,l g (q,s,l) (p,r,m) + p r B p,r,m = ũ p,r,m 40

52 and divide this number by the total number of choices which is equal to G n p min(r,s) = G n (p,r,m) G (G n ) (q,s,l) G (J) q=p (p,r,m) G (G) (q,s,l) G (J) q=p p min(r,s) n where the term p min(r,s) appears since the number of choices for g (q,s,l) (p,r,m) is p r if p = q, r s and is equal to p s if p = q, r s. Since B can take values arbitrarily from G n, the number of choices for the above set of conditions is equal to the number of choices for g (q,s,l) (p,r,m) s such that, (Z p r ) {}}{ (q,s,l) G (J) (ã q,s,l a q,s,l )g (q,s,l) (p,r,m) = ũ p,r,m u p,r,m Note that for all (q, s, l) G (J), (ã q,s,l a q,s,l )g (q,s,l) (p,r,m) p θp,r Z p r. Therefore we require ũ p,r,m u p,r,m p θp,r Z p r and therefore we require ũ u H n θ or otherwise the probability would be zero. For fixed p P(G) and r R p (G), let (q, s, l ) G (J) be such that q = p and θ p,r = r s + + ˆθ q,s,l For fixed (p, r, m) G (G n ), and for (q, s, l) (q, s, l ), choose g (q,s,l) (p,r,m) arbitrarily from it s domain. The number of choices for this is equal to (p,r,m) G (G) (q,s,l) G (J) q=p (q,s,l) (q,s,l ) p min(r,s) n 4

53 For each (p, r, m) G (G n ), we need to have (ã q,s,l a q,s,l )g (q,s,l ) (p,r,m) = ũ p,r,m u p,r,m (Z p r ) {}}{ (q,s,l) G (J) (q,s,l) (q,s,l ) (ã q,s,l a q,s,l )g (q,s,l) (p,r,m) Note that the right hand side is included in p θp,r Z p r and (ã q,s,l a q,s,l ) is included in pˆθ q,s,l Z (q ) (s ). We need to count the number of solutions for g (q,s,l ) (p,r,m) in p r s + Z p r. Using Lemma II.3, we can show that the number of solutions is equal to pˆθ q,s,l. The total number of solutions for φ is equal to Hence we have (p,r,m) G P (φ(a)+b =u, φ(ã)+b =ũ) = (q,s,l) G (J) q=p (q,s,l) (q,s,l ) (p,r,m) G (G) = (p,r,m) G (G) Note that for (q, s, l) = (q, s, l ) we have p min(r,s) pˆθ q,s,l pˆθ q,s,l (q,s,l) G (J) q=p (q,s,l) (q,s,l ) [ (p,r,m) G (G) (q,s,l) G (J) q=p (q,s,l)=(q,s,l ) n (q,s,l) G (J) q=p n pˆθ q,s,l p min(r,s) p min(r,s) ] n min(r, s) = min(r, s ) = r r s + = r (θ p,r ˆθ ) q,s,l p min(r,s) n 42

54 Therefore, the above probability is equal to (p,r,m) G (q,s,l) G (J) q=p (q,s,l)=(q,s,l ) pˆθ q,s,l p r (θp,r ˆθ q,s,l ) n = = Since the dither B is uniform, we conclude that P φ(u) + B = x φ(ũ) + B = x (p,r,m) G (p,r,m) G = G n H θ n (q,s,l) G (J) q=p (q,s,l)=(q,s,l ) p θp,r p r n p r θp,r = H θ n n Proof of Lemma II. Let ã T θ (a) be such that for (q, s, l) G (J), ã q,s,l a q,s,l q ˆθ q,s,l Z q s\q ˆθ q,s,l + Z q s for some 0 ˆθ q,s,l s. Since for all ã T θ (a) and all (p, r) Q(G) min (p,s,l) G (J) r s + + ˆθ q,s,l = θ p,r we require ˆθ p,s,l θ p,s for all (p, s, l) G (J). This means for (q, s, l) G (J), ã q,s,l can only take values from a q,s,l + q θq,s Z q s The cardinality of this set is equal to q s θq,s. Therefore, T θ (a) q s θq,s = (q,s,l) G (J) (q,s) Q(G) q (s θq,s)kwq,s The last part of the proof is straightforward given the definitions of ω θ and R. 43

55 Useful Lemmas Lemma II.3. Let p be a prime and s, r a positive integer such that s r. For a Z p s and b Z p r, let 0 ˆθ s and ˆθ θ r be such that a pˆθz p s\pˆθ+ Z p s b p θ Z p r Write a = pˆθα for some invertible element α Z p r and b = p θ β for some β β {0,,, p r θ }. Then, the set of solutions to the equation ax (mod p r ) = b is { p θ ˆθα β + iα p r ˆθ i } = 0,,, pˆθ Proof. Note that the representation of b as b = p θ β is not unique and for any β of the form β = β + ip r θ for i = 0,,, p θ, b can be written as p θ β. Also, the representation of a as a = pˆθα is not unique and for any α = α + ip r ˆθ for i = 0,,, pˆθ, we have a = pˆθ α. The set of solutions to ax = b is identical to the set of solutions to pˆθx = p θ α β. The set of solutions to the latter is { p θ ˆθα β + iα p r ˆθ i } = 0,,, pˆθ It remains to show that this set of solutions is independent of the choice of α and β. First, we show that the set of solutions is independent of the choice of β. For β = β + jp r θ for some j {0,,, p θ 2 }, we have { p θ ˆθα β + iα p r ˆθ i } = 0,,, pˆθ { = p θ ˆθα ( β + jp r θ) + iα p r ˆθ i } = 0,,, pˆθ { = p θ ˆθα β + (i + j) α p r ˆθ i } = 0,,, pˆθ { (a) = p θ ˆθα β + iα p r ˆθ i } = 0,,, pˆθ where (a) follows since the set p r ˆθ{0,,, pˆθ } is a subgroup of Z p r and jp r ˆθ lies in this set. 44

56 Next, we show that the set of solutions is independent of the choice of α. For α = α + jp r ˆθ for some j {0,,, pˆθ }, we have ) α (α α jp r ˆθ α = Therefore, it follows that the unique inverse of α satisfies α α α p r ˆθZ p r. Assume α = α + kα p r ˆθ. We have, { } p θ ˆθ α β + i α p r ˆθ i = 0,,, pˆθ { ( ) ( ) } = p θ ˆθ α + kα p r ˆθ β + i α + kα p r ˆθ p r ˆθ i = 0,,, pˆθ { ( ) } = p θ ˆθα β + i + ikp r ˆθ + kβp θ ˆθ α p r ˆθ i = 0,,, pˆθ { (a) = p θ ˆθα β + iα p r ˆθ i } = 0,,, pˆθ where same as above, (a) follows since the set p r ˆθ{0,,, pˆθ } is a subgroup of Z p r and (ikp r ˆθ + kβp θ ˆθ)p r ˆθ lies in this set. Lemma II.4. Let X be a random variable taking values from the group G and for a subgroup H of G, define [X] = X + H. For y A n ɛ (Y ) and x A n ɛ (X y), let z = [x] = x + H n. Then we have (x + H n ) A n ɛ (X y) = A n ɛ (X zy) and ( ɛ)2 n[h(x Y [X]) O(ɛ)] (x + H n ) A n n[h(x Y [X])+O(ɛ)] ɛ (X y) 2 Proof. First, we show that (x + H n ) A n ɛ (X y) is contained in A n ɛ (X zy). Since z is a function of x, we have (x, z, y) A n ɛ (X, [X], Y ). For x (x + H n ) A n ɛ (X y), we have [x ] = x + H n = x + H n = z and (x, z, y) = (x, [x ], y) A n ɛ (X, [X], Y ). Therefore, x A n ɛ (X zy) and hence, (x + H n ) A n ɛ (X y) A n ɛ (X zy) 45

Conversely, for $x'\in A^n_\epsilon(X|zy)$, since $(x',z)\in A^n_\epsilon(X,[X])$ where $[X]$ is a function of $X$, we have $[x']=z$. This implies $x'\in z+H^n = x+H^n$. Clearly, we also have $x'\in A^n_\epsilon(X|y)$. The claim on the size of the set follows since $(z,y)\in A^n_\epsilon([X]Y)$.

How to Compute the Rate

For $(p,r)\in\mathcal{Q}(G)$, define
$$ \Theta_{p,r} = \Bigl\{\theta\in\Theta \;\Big|\; \forall (p',r')\in\mathcal{Q}(G):\ \frac{\theta_{p,r}}{r} \ge \frac{\theta_{p',r'}}{r'}\Bigr\} $$
and distribute the break-evens so that $\Theta_{p,r}$, $(p,r)\in\mathcal{Q}(G)$, forms a partition of $\Theta$. Let $(w^*_{p,r})_{(p,r)\in\mathcal{Q}(G)}$ be the optimal weights (need not be unique) and let $\Theta^*$ be the set of the maximizing $\theta$'s for the optimal rate. Define $S^*(G)=\{(p,r)\in\mathcal{Q}(G)\mid w^*_{p,r}\neq 0\}$. Since $\Theta_{p,r}$, $(p,r)\in\mathcal{Q}(G)$, forms a partition of $\Theta$, we have $\theta\in\Theta_{p,r}$ for some $(p,r)\in\mathcal{Q}(G)$. For $(p',r')\in\mathcal{Q}(G)$ such that $(p',r')\neq(p,r)$, if there is no $\theta\in\Theta^*$ such that $\theta\in\Theta_{p',r'}$, we can decrease $w_{p',r'}$ and increase some of the other weights to get a better rate, which is a contradiction. Hence, the optimal weights can be found as follows: Let $S(G)$ be a subset of $\mathcal{Q}(G)$ and let $R(S(G))$ be an empty set. For $(p,r)\in S(G)$, choose $\theta^{p,r}$ from $\Theta_{p,r}$ and solve the system of equations: for all $(p,r)\in S(G)$ and $\theta=\theta^{p,r}$,
$$ \frac{I([U]_\theta;X)}{\omega_\theta} = C, \qquad \sum_{(p,r)\in S(G)} w_{p,r} = 1 $$
where $C$ is an arbitrary constant. Given the solutions $w_{p,r}$, $(p,r)\in S(G)$, find $C$ and add it to the set $R(S(G))$. Do this for all choices of the $\theta^{p,r}$'s and take the maximum of the set $R(S(G))$. We also minimize the rate over the choice of $S(G)$.
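The search over weights described above can also be carried out numerically. The sketch below is an added illustration, not the procedure used in the proofs: it does a brute-force grid search over $w=(w_1,w_2)$ for $G=\mathbb{Z}_2\oplus\mathbb{Z}_4$, using the coefficients worked out for that group in Section 2.6, and the three information values are hypothetical placeholders standing in for the quantities computed from a chosen test channel.

```python
import numpy as np

# Hypothetical coset informations for G = Z_2 (+) Z_4 (placeholders only --
# in practice these come from the chosen test channel p_{UX}).
I_theta_01 = 0.60   # I([U]_{(0,1)}; X)
I_theta_11 = 0.55   # I([U]_{(1,1)}; X)
I_U        = 1.00   # I(U; X) = I([U]_{(1,2)}; X)

def rate(w2):
    """Weighted objective for weights (w1, w2) = (1 - w2, w2), per the Z_2 (+) Z_4 example."""
    return max((1 + w2) * I_theta_01,
               (1 + w2) / w2 * I_theta_11,
               I_U)

# Brute-force grid search over w2 in (0, 1); w1 = 1 - w2 is implicit.
grid = np.linspace(1e-3, 1 - 1e-3, 10_000)
rates = np.array([rate(w2) for w2 in grid])
best = grid[np.argmin(rates)]
print(f"grid-search w2 ~ {best:.4f}, rate ~ {rates.min():.4f}")

# Closed form from the example: w2* = I([U]_{(1,1)};X) / I([U]_{(0,1)};X)
print(f"closed-form w2* = {I_theta_11 / I_theta_01:.4f}, "
      f"rate = {max(I_theta_01 + I_theta_11, I_U):.4f}")
```

For these placeholder values the grid search recovers the closed-form balancing point, which is the behavior the break-even argument above predicts.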

CHAPTER III

Abelian Group Codes for Multi-terminal Communications

3.1 Nested Codes for Channels with State Information

Consider a point-to-point channel coding problem with channel state information available at the transmitter. Denote the channel by $(\mathcal{X},\mathcal{S},\mathcal{Y},W)$ where $\mathcal{X}$ is the channel input alphabet, $\mathcal{S}$ is the channel state alphabet and $\mathcal{Y}$ is the channel output alphabet, and for the channel input $x\in\mathcal{X}$ and the channel state $s\in\mathcal{S}$, $W(y|x,s)$ denotes the conditional probability of observing $y\in\mathcal{Y}$ at the channel output. We assume $\mathcal{X}=G$ for some Abelian group $G$. We study the performance of nested random/group codes and nested group/random codes for this problem. These ensembles are important because they can be used in many multi-terminal communication problems such as broadcast channels and interference channels.

3.1.1 Nested Random/Group Codes for Channel Coding

A nested code consists of an outer code which is partitioned into smaller inner codes, and the set of messages is equal to the set of inner codes. We employ a nested code in which the inner code is a group code and the outer code consists of random shifts of the inner code. Let $C_{in}=\{\phi(a)\mid a\in J\}$ where $\phi$ and $J$ are defined in (2.1)

and (2.4) respectively, and let
$$ C_{out} = \bigcup_{m=1}^{2^{nR}} (C_{in}+B_m) $$
where the $B_m$'s are iid random variables distributed uniformly over $G^n$. Note that the rates of the inner and outer codes are equal to $R_{in}=\frac{1}{n}\log|J|$ and $R_{out}=R_{in}+R$, where $R$ is the communication rate of our coding scheme. The encoding and decoding rules are as follows: Given a message $m\in\{1,2,\dots,2^{nR}\}$ and the channel state $s\in\mathcal{S}^n$, define
$$ \alpha(m,s) = \sum_{a\in J}\ \sum_{x\in A^n_\epsilon(X|s)} \mathbb{1}_{\{\phi(a)+B_m=x\}} $$
Note that if $\alpha(m,s)>0$, then there exists at least one $a\in J$ with $\phi(a)+B_m\in A^n_\epsilon(X|s)$. In this case, the encoder picks one such $a$ and sends $x=\phi(a)+B_m$ over the channel. The encoder will declare an encoding error if $\alpha(m,s)=0$. Although it may be unnecessary, it is convenient in the proofs to assume that the encoder declares error if $\alpha(m,s)\le \frac{|J|\,|A^n_\epsilon(X|s)|}{2|G|^n}$. We denote this error event by $Err_e$. We also assume that $a\in J$ is picked with probability $\frac{\sum_{x\in A^n_\epsilon(X|s)}\mathbb{1}_{\{\phi(a)+B_m=x\}}}{\alpha(m,s)}$.

At the decoder, after receiving the channel output $y\in\mathcal{Y}^n$, the decoder looks for a unique message $\hat m\in\{1,2,\dots,2^{nR}\}$ for which there exists $a\in J$ with $\phi(a)+B_{\hat m}\in A^n_\epsilon(X|y)$. If it does not find such an $\hat m$ ($Err_{d1}$) or if it finds multiple such $\hat m$'s ($Err_{d2}$), it declares error. We show that if
$$ R \le \bar I(X;Y) - I^G_{s.c.}(X;S) $$
then the probability of all the errors ($Err_e$, $Err_{d1}$ and $Err_{d2}$) approaches zero as the block length approaches infinity. Note that by the standard typicality results [20,

60 Theorem 3..2] one can show that with probability approaching one as the block length increases, φ(a) + B m A ɛ (X y). Therefore, the probability of the error event Err d vanishes as the block length increases. It suffices to show that for any choice of weights (w p,r ) (p,r) Q(G), the probability of the two error events Err e and Err d2 vanish if R in > ( ) log G : H θ H([X] θ S) ω θ R + R in < Ī(X; Y ) 3... The Error Event Err e Note that given the message m and the channel state s n, the error event Err e occurs if α(m, s) J An ɛ (X s). We bound the probability of error using the following 2 G n Chebyshev s inequality: We have and ( ) ( P Err e m, s = P α(m, s) J An ɛ (X s) ) 2 G n E{α(m, s)} = a J E{α(m, s) 2 } = x A n ɛ (X s) = J An ɛ (X s) G n a,ã J x, x A n ɛ (X s) = a J x A n ɛ (X s) θ Θ ã T θ (a) ( ) P φ(a) + B m = x var{α(m, s)} E{α(m, s)} 2 ( ) P φ(a) + B m = x, φ(ã) + B m = x x A n ɛ (X s) x x+h n θ G n H θ n 49

61 Therefore, var{α(m, s)} = E{α(m, s) 2 } E{α(m, s)} 2 θ Θ θ 0 θ Θ θ 0 a J x A n ɛ (X s) ã T θ (a) x A n ɛ (X s) x x+h n θ G n H θ n J 2 n[h(x S)+δ] T θ 2 n[h(x [X] θs)+δ] G n H θ n Hence, ( ) P Err e m, s G n T θ 2 n[h(x [X] θs)+δ] J 2 n[h(x S)+δ] H θ n θ Θ θ 0 Note that J = 2 nr in and T θ = 2 n( ω θ)r in and H(X S) H(X [X] θ S) = H([X] θ S). Therefore, for θ Θ and θ 0, we require R in > ( ) log G : H θ H([X] θ ω θ The Error Event Err d2 Let Err = Err d2 Erre c Errd c. Then the probability of the error event Err is equal to ( ) P Err = 2 nr 2 nr 2 nr m= 2 nr m= p n S(s) s n S n a J y Y n W n (y x, s) p n S(s) s S n a J y Y n W n (y x, s) J A n {α(m,s)> ɛ (X s) x A n 2 G n } ɛ (X s) 2 nr m= m m x A n ɛ (X s) 2 nr m= m m {φ(ã)+b m = x} ã J x A n ɛ (X y) 2 G n J A n ɛ (X s) {φ(a)+b m=x} ã J x A n ɛ (X y) {φ(ã)+b m = x} α(m, s) {φ(a)+b m=x} 50

62 Therefore, { } E P (Err) 2 nr = 2 nr 2 nr m= 2 nr m= p n S(s) s S n a J 2 nr m= m m ã J x A n ɛ (X y) p n S(s) s S n a J 2 nr m= m m x A n ɛ (X s) 2 G n J A n ɛ (X s) y Y n W n (y x, s) ( ) P φ(a) + B m = x, φ(ã) + B m = x x A n ɛ (X s) G 2n ã J x A n ɛ (X y) 2nR J 2 n[h(x Y )+δ] 2 G n Therefore, we require to have 2 G n J A n ɛ (X s) R + R in < log G H(X Y ) = Ī(X; Y ) y Y n W n (y x, s) 3..2 Nested Group/Random Codes for Channel Coding We employ a nested code in which the outer code is a group code and the inner code is obtained by random binning of the outer code. Let C out = {φ(a) + B a J} where φ and J are defined in (2.) and (2.4) respectively and B is uniformly distributed over G n. Define the random mapping s : J {, 2,, 2 nr } where R is the rate of communication and for a J, s(a) s are independent and uniformly distributed over {, 2,, 2 nr }. Note the rate of the outer code is R out = log J. n The encoding and decoding rules are as follows: Given a message m from the set {, 2,, 2 nr }, and the channel state s S n, define α(m, s) = a J x A n ɛ (X s) {φ(a)+b=x,s(a)=m} Note that if α(m, s) > 0, then there exists at least one a J with s(a) = m and φ(a)+b A n ɛ (X s). In this case, the encoder picks one such a and sends x = φ(a)+b 5

63 over the channel. The encoder will declare an encoding error if α(m, s) = 0. Although it may be unnecessary, it is convenient in the proofs to assume that the encoder declares error if α(m, s) J An ɛ (X s) 2 2 nr G. We denote this error event by Err n e. We also assume that a J is picked with probability x A n ɛ (X s) {φ(a)+b=x,s(a)=m} α(m,s). At the decoder, after receiving the channel output y Y n, the decoder looks for a unique message ˆm {, 2,, 2 nr } for which there exists a J with s(a) = ˆm and φ(a) + B+ A ɛ (X y). If it doesn t find such ˆm (Err d ) or if it finds multiple such ˆm s (Err d2 ), it declares error. We show that if Is.c.(X; G S) Ic.c(X; G Y ), then for any rate R Ic.c.(X; G Y ) Ī(X; S) the probability of all the errors (Err e, Err d and Err d2 ) approach zero as the block length approaches infinity. Note that by the standard typicality results [20, Theorem 3..2] one can show that with probability approaching one as the block length increases, φ(a) + B A ɛ (X y). Therefore, the probability of the error event Err d vanishes as the block length increases. It suffices to show that for any choice of weights (w p,r ) (p,r) Q(G), the probability of the two error events Err e and Err d2 vanish if assuming I G s.c.(x; S) I G c.c(x; Y ). R out R > Ī(X; S) R out < ω θ ( log H θ H(X [X] θ Y ) ) The Error Event Err e Note that given the message m and the channel state s, the error event Err e occurs if α(m, s) J An ɛ (X s) 2 2 nr G n. We bound the probability of error using the following 52

64 Chebyshev s inequality: ( ) P Err e m, s We have and E{α(m, s) 2 } = Therefore, E{α(m, s)} = a J a,ã J x, x A n ɛ (X s) = a J ( = P α(m, s) J An ɛ (X s) ) 2 2 nr G n x A n ɛ (X s) + a J x A n ɛ (X s) = J An ɛ (X s) 2 nr G n x A n ɛ (X s) var{α(m, s)} E{α(m, s)} 2 ( ) P φ(a) + B = x, s(a) = m ( ) P φ(a) + B = x, φ(ã) + B = x, s(a) = m, s(ã) = m 2 nr G n θ Θ θ r ã T θ (a) var{α(m, s)} = E{α(m, s) 2 } E{α(m, s)} 2 Hence, = J An ɛ (X s) + 2 nr G n θ Θ a J θ 0 θ r x A n ɛ (X s) x x+h n θ x A n ɛ (X s) ã T θ (a) 2 2nR G n H θ n x A n ɛ (X s) x x+h n θ 2 2nR G n H θ n J An ɛ (X s) + J 2 n[h(x S)+δ] T θ 2 n[h(x [X] θs)+δ] 2 nr G n 2 2nR G n H θ n θ Θ θ 0 θ r ( ) P Err e m, s 2 nr G n J 2 n[h(x S)+δ] + θ Θ θ 0 θ r G n T θ 2 n[h(x [X] θs)+δ] J 2 n[h(x S)+δ] H θ n Note that J = 2 nr out and T θ = 2 n( ω θ)r out and H(X S) H(X [X] θ S) = H([X] θ S). Therefore, for θ Θ, θ 0, and θ r, we require R out > ( ) log G : H θ H([X] θ S) ω θ R out R > log G H(X S) 53

65 These conditions are equivalent to the following conditions: R out > ( ) log G : H θ H([X] θ S) ω θ R out R > log G H(X S) for θ Θ and θ r The Error Event Err d2 Let Err = Err d2 Erre c Errd c. Then the probability of the error event Err is equal to ( ) P Err = 2 nr 2 nr 2 nr m= 2 nr m= p n S(s) s n S n a J y Y n W n (y x, s) p n S(s) s n S n a J y Y n W n (y x, s) J A n {α(m,s)> ɛ (X s) x A n 2 2 ɛ (X s) nr G n } 2 nr m= m m {φ(ã)+b= x,s(ã)= m} ã J x A n ɛ (X y) x A n ɛ (X s) 2 nr m= m m α(m, s) {φ(a)+b=x,s(a)=m} 2 2nR G n J A n ɛ (X s n ) {φ(a)+b=x,s(a)=m} ã J x A n ɛ (X y) {φ(ã)+b= x,s(ã)= m} 54

66 Therefore, { } E P (Err) 2 nr 2 nr 2 nr θ Θ θ r 2 nr m= m= m m 2 nr p n S(s) s S n a J 2 nr m= ã J x A n ɛ (X y) p n S(s) s n S n a J m= θ Θ ã T θ (a) m m Therefore, we require to have x A n ɛ (X s) 2 2 nr G n J A n ɛ (X s n ) y Y n W n (y x, s) ( ) P φ(a)+b = x, φ(ã)+b = x, s(a) = m, s(ã) = m x A n ɛ (X y) x x+h n θ T θ 2 n[h(x [X] θy )+δ] 2 H θ n x A n ɛ (X s) 2 2nR G n J A n ɛ (X s n ) G n H θ n 2 2nR R out < I G c.c.(x; Y ) y Y n W n (y x, s) 3.2 Nested Codes for Sources with Side Information Consider a point to point source coding problem with side information available at the decoder. Denote the source by (X, S, U, p XS, d) where X, S and U are the source, side information and reconstruction alphabets correspondingly, p XS is the joint distribution of the source and the side information and d : X U R + is the measure of reconstruction. We assume U = G for some Abelian group G. We study the performance of nested random/group codes and nested group/random codes for this problem. These ensembles can be used in multi-terminal communications problems such as the distributed source coding and the multiple description coding Nested Random/Group Codes for Source Coding We employ a nested code in which the inner code is a group code and the outer code consists of random shifts of the inner code. Let C in = {φ(a) a J} where φ and 55

67 J are defined in (2.) and (2.4) respectively and let C out = 2 nr m= (C in + B m ) where B m s are iid random variables distributed uniformly over G n. Note that the rates of the inner and outer codes are equal to R in = n log J and R out = R in + R where R is the compression rate of our coding scheme. define The encoding and decoding rules are as follows: Given a source sequence x X n, α(x) = 2 nr {φ(a)+bm=u} m= a J u A n ɛ (U x) Note that if α(x) > 0, then there exists at least one m {,, 2 nr } and one a J with φ(a)+b m A n ɛ (U x). In this case, the encoder picks one such pair and sends m to the channel. The encoder will declare an encoding error if α(x) = 0. Although it may be unnecessary, it is convenient in the proofs to assume that the encoder declares error if α(x) 2nR J A n ɛ (U x) 2 G n that the pair (a, m) is picked with probability. We denote this error event by Err e. We also assume u A n ɛ (U x) {φ(a)+s i =u} α(x). At the decoder, having access to s and m, the decoder looks for a unique â J such that φ(â )+B m A n ɛ (U s). If it doesn t find such â (Err d ) or if it finds multiple such â s (Err d2 ), it declares error. We show that if R Ī(U; X) IG c.c.(u; S) then the probability of all the errors (Err e, Err d and Err d2 ) approach zero as the block length approaches infinity. Note that by the standard typicality results [20, 56

68 Theorem 3..2] one can show that with probability approaching one as the block length increases, φ(a) + B m A ɛ (U x). Therefore, the probability of the error event Err d vanishes as the block length increases. It suffices to show that for any choice of weights (w p,r ) (p,r) Q(G), the probability of the two error events Err e and Err d2 vanish if R + R in > Ī(U; X) R in < I G c.c.(u; S) We first show that the error events vanish if for any θ Θ, θ 0, R + ω θ R in > Ī([U] θ; X) R in < I G c.c.(u; S) The Error Event Err e Note that given the source sequence x, the error event Err e occurs if α(x) 2 nr J An ɛ (U x). We bound the probability of error using the following Chebyshev s 2 G n inequality: We have ( ) ( P Err e x = P α(x) 2nR J A n ɛ (U x) ) var{α(x)} 2 G n E{α(x)} 2 E{α(x)} = 2 nr m= a J u A n ɛ (U x) = 2nR J A n ɛ (U x) G n ( ) P φ(a) + B m = u 57

69 and E{α(x) 2 } = Therefore, Hence, = 2 nr P m, m= a,ã J u,ũ A n ɛ (U s) 2 nr m= a J u A n ɛ (U x) θ Θ ã T θ (a) θ Θ θ nr m, m= m m a,ã J ( ) φ(a) + B m = u, φ(ã) + B m = ũ ũ A n ɛ (U x) ũ u+h n θ G 2n u,ũ A n ɛ (U s) G n H θ n 2 nr J 2 n[h(u X)+δ] T θ 2 n[h(u [U] θx)+δ] + 22nR J 2 A n ɛ (U x) 2 G n H θ n G 2n var{α(x)} = E{α(x) 2 } E{α(x)} 2 θ Θ θ 0 2 nr J 2 n[h(u X)+δ] T θ 2 n[h(u [U] θx)+δ] G n H θ n ( ) P Err e x G n T θ 2 n[h(x [X] θs)+δ] 2 nr J 2 n[h(x S)+δ] H θ n θ Θ θ 0 Note that J = 2 nr in and T θ = 2 n( ω θ)r in and H(U X) H(U [U] θ X) = H([U] θ X). Therefore, for θ Θ and θ 0, we require R + ω θ R in > ( ) log G : H θ H([U] θ X) The Error Event Err d2 Let Err = Err d2 Erre c Errd c. Then the probability of the error event Err is equal to ( ) P Err x X n p n XS(x, s) s S n ã J ã a 2 nr m= a J ũ A n ɛ (U s) {φ(ã)+bm=ũ} {φ(a)+bm=u} α(x) u A n ɛ (U x) 58

70 Therefore, { } E P (Err) x X n x X n θ Θ θ r p n XS(x, s) s S n P ã J ã a ũ A n ɛ (U s) 2 nr p n XS(x, s) s S n θ Θ θ r ã T θ (a) m= a J u A n ɛ (U x) 2 G n 2 nr J A n ɛ (U x) ( φ(a) + B m = u, φ(ã) + B m = ũ 2 nr m= a J u A n ɛ (U x) ũ A n ɛ (U s) ũ u+h n θ T θ 2 n[h(u [U] θs)+δ] H θ n G n H θ n Therefore, we require to have R in < ω θ ( log H θ H(U [U] θ X) θ r. Equivalently, R in < I G c.c.(u; S) ) 2 G n 2 nr J A n ɛ (U x) ) for all θ Θ, Simplification of the Rate Region In this section, we show that if R in < Ic.c.(U; G S) then ( ) log G : H θ H([U] θ X) ω θ R in = log G H(U X) R in max θ Θ θ 0 We show this by contradiction. Note that the right-hand-side is equal to the lefthand-side for θ = r. Assume for some θ Θ, θ 0, Then we have log G : H θ H(U [U] θ X) ω θ R in > log G H(U X) R in ( ω θ )R in > log G H(U X) = log H θ H([U] θ X) ( ) log G : H θ H(U [U] θ X) which is a contradiction by the definition of I G c.c.(u; S) if we have the Markov chain U X S. 59
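For a quick sanity check of the rate expression of this subsection, recall from Section 2.6 that when $G$ is a field the group informations reduce to the Shannon quantities, so the achievable rate becomes the familiar Wyner-Ziv-type expression $\bar I(U;X)-I(U;S)$. The sketch below is an added illustration with a hypothetical binary ($G=\mathbb{Z}_2$) test setup, not part of the original development; it evaluates this difference directly from joint pmfs.

```python
import numpy as np
from itertools import product

def entropy(p):
    """Entropy in bits of a (possibly multi-dimensional) pmf array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical binary test setup, for illustration only:
#   X ~ Bern(1/2), U = X xor N with N ~ Bern(d), S = X xor W with W ~ Bern(q).
d, q = 0.1, 0.25
p = np.zeros((2, 2, 2))                 # p[u, x, s]
for x, n, w in product((0, 1), repeat=3):
    u, s = x ^ n, x ^ w
    p[u, x, s] += 0.5 * (d if n else 1 - d) * (q if w else 1 - q)

p_UX, p_US = p.sum(axis=2), p.sum(axis=1)
# Ibar(U;X) = log|G| - H(U|X); since U is uniform here this equals I(U;X).
Ibar_UX = 1.0 - (entropy(p_UX) - entropy(p_UX.sum(axis=0)))
I_US    = entropy(p_US.sum(axis=1)) + entropy(p_US.sum(axis=0)) - entropy(p_US)
print(f"Ibar(U;X) = {Ibar_UX:.4f}, I(U;S) = {I_US:.4f}, "
      f"achievable rate = {Ibar_UX - I_US:.4f} bits")
```

For non-field groups the term subtracted would be the group quantity $I^G_{c.c.}(U;S)$ rather than $I(U;S)$, computed via the minimization over the $\theta$'s as in Section 2.6.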

71 3.2.2 Nested Group/Random Codes for Source Coding We employ a nested code in which the outer code is a group code and the inner code is a random subset of the outer code. Let C out = {φ(a) + B a J} where φ and J are defined in (2.) and (2.4) respectively and B is uniformly distributed over G n. Define the mapping s : J {, 2,, 2 nr } such that for a J, s(a) s are iid random variables uniformly distributed over {, 2,, 2 nr } where R is the communication rate of the coding scheme. Note that the rates of the inner and outer codes are equal to R out = log J and R n in = R out R. define The encoding and decoding rules are as follows: Given a source sequence x X n, α(x) = a J u A n ɛ (U x) {φ(a)+b=u} Note that if α(x) > 0, then there exists at least one a J with φ(a) + B A n ɛ (U x). In this case, the encoder picks one such pair and sends m = s(a) to the channel. The encoder will declare an encoding error if α(x) = 0. Although it may be unnecessary, it is convenient in the proofs to assume that the encoder declares error if α(x) J An ɛ (U x) 2 G n is picked with probability. We denote this error event by Err e. We also assume that a J u A n ɛ (U x) {φ(a)+b=u} α(x). At the decoder, having access to s and m, the decoder looks for a unique â J such that φ(â ) + B A n ɛ (U s) and s(â) = m. If it doesn t find such â (Err d ) or if it finds multiple such â s (Err d2 ), it declares error. We show that if R Is.c.(U; G X) Ī(U; S) 60

72 then the probability of all the errors (Err e, Err d and Err d2 ) approach zero as the block length approaches infinity. Note that by the standard typicality results [20, Theorem 3..2] one can show that with probability approaching one as the block length increases, φ(a) + B A ɛ (U x). Therefore, the probability of the error event Err d vanishes as the block length increases. It suffices to show that for any choice of weights (w p,r ) (p,r) Q(G), the probability of the two error events Err e and Err d2 vanish if R out > I G s.c.(u; X) R out R < Ī(U; S) The Error Event Err e Note that given the source sequence x, the error event Err e occurs if α(x) J A n ɛ (U x). We bound the probability of error using the following Chebyshev s in- 2 G n equality: ( ) ( P Err e x = P α(x) J An ɛ (U x) ) var{α(x)} 2 G n E{α(x)} 2 We have E{α(x)} = 2 nr P m= a J u A n ɛ (U x) = J An ɛ (U x) G n ( ) φ(a) + B = u, s(a) = m and E{α(x) 2 } = = 2 nr P m, m= a,ã J u,ũ A n ɛ (U s) 2 nr m, m= a J u A n ɛ (U x) θ Θ ã T θ (a) θ Θ ( ) φ(a) + B m = u, φ(ã) + B m = ũ, s(a) = m, s(ã) = m ũ A n ɛ (U x) ũ u+h n θ J 2 n[h(u X)+δ] T θ 2 n[h(u [U] θx)+δ] G n H θ n G n H θ n 2 2nR 6

73 Therefore, var{α(x)} = E{α(x) 2 } E{α(x)} 2 θ Θ θ 0 J 2 n[h(u X)+δ] T θ 2 n[h(u [U] θx)+δ] G n H θ n Hence, ( ) P Err e x G n T θ 2 n[h(x [X] θs)+δ] J 2 n[h(x S)+δ] H θ n θ Θ θ 0 Note that J = 2 nr out and T θ = 2 n( ω θ)r out and H(U X) H(U [U] θ X) = H([U] θ X). Therefore, for θ Θ and θ 0, we require ω θ R out > ( ) log G : H θ H([U] θ X) Equivalently, we require R out > I G s.c.(u; X) The Error Event Err d2 Let Err = Err d2 Erre c Errd c. Then the probability of the error event Err is equal to ( ) P Err x X n p n XS(x, s) s S n ã J ã a 2 nr m= a J ũ A n ɛ (U s) {φ(ã)+bm=ũ,s(ã)=m} {φ(a)+b=u,s(a)=m} α(x) u A n ɛ (U x) 62

74 Therefore, { } E P (Err) x X n x X n θ Θ θ r p n XS(x, s) s S n P ã J ã a ũ A n ɛ (U s) 2 nr p n XS(x, s) s S n θ Θ θ r ã T θ (a) m= a J u A n ɛ (U x) 2 G n 2 2nR J A n ɛ (U x) ( φ(a) + B m = u, φ(ã) + B m = ũ 2 nr m= a J u A n ɛ (U x) ũ A n ɛ (U s) ũ u+h θ n T θ 2 n[h(u [U] θs)+δ] 2 nr H θ n Therefore, we require to have ( ω θ )R out R < θ Θ, θ r. G n H θ n ) 2 G n 2 2nR J A n ɛ (U x) ( ) log H θ H(U [U] θ S) for all Simplification of the Rate Region In this section, we show that if R out > I G s.c.(u; X) then max( ω θ )R out θ Θ θ r ( ) ( ) log H θ H(U [U] θ S) = R out log G H(U S) We show this by contradiction. Note that the right-hand-side is equal to the lefthand-side for θ = 0. Assume for some θ Θ, θ r, ( ω θ )R out ( ) ( ) log H θ H(U [U] θ S) > R out log G H(U S) Then we have ω θ R out < log G H(U S) ( ) log H θ H(U [U] θ S) = log G : H θ H([U] θ S) which is a contradiction by the definition of I G s.c.(u; S) if the Markov chain U X S holds. 63

3.3 Distributed Source Coding

In this section, we consider a distributed source coding problem with one distortion constraint and provide an information-theoretic inner bound to the optimal rate-distortion region using group codes. This inner bound strictly contains the available bounds based on random codes.

3.3.1 Preliminaries

The Source Model

Consider two distributed sources generating discrete random variables $X$ and $Y$. Assume $X$ and $Y$ take values from alphabets $\mathcal{X}$ and $\mathcal{Y}$ respectively with joint distribution $p_{XY}(\cdot,\cdot)$. The source sequence $(X^n,Y^n)$ is independent over time and has the product distribution
$$ P\bigl((X^n,Y^n)=(x,y)\bigr) = \prod_{i=1}^{n} p_{XY}(x_i,y_i) $$
for $x=(x_1,\dots,x_n)\in\mathcal{X}^n$ and $y=(y_1,\dots,y_n)\in\mathcal{Y}^n$. We consider the following distributed source coding problem: the two components, $X$ and $Y$, of the source are observed by two encoders which do not communicate with each other. Each encoder communicates a compressed version of its input through a noiseless channel to a joint decoder. For a discrete set $\mathcal{Z}$, the decoder wishes to reconstruct a function $f:\mathcal{X}\times\mathcal{Y}\to\mathcal{Z}$ of the sources with respect to a general fidelity criterion. Let $\hat{\mathcal{Z}}$ denote the reconstruction alphabet; the fidelity criterion is characterized by a mapping $d:\mathcal{X}\times\mathcal{Y}\times\hat{\mathcal{Z}}\to\mathbb{R}^+$. We restrict our attention to additive distortion measures, i.e., the distortion among three $n$-length sequences $x=(x_1,\dots,x_n)\in\mathcal{X}^n$, $y=(y_1,\dots,y_n)\in\mathcal{Y}^n$ and $\hat z=(\hat z_1,\dots,\hat z_n)\in\hat{\mathcal{Z}}^n$ is given by
$$ d(x,y,\hat z) = \frac{1}{n}\sum_{i=1}^{n} d(x_i,y_i,\hat z_i). $$
We denote this distributed source by $(\mathcal{X},\mathcal{Y},\mathcal{Z},p_{XY},d)$.
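To make the definitions concrete, the following small sketch (an added illustration; the joint pmf, target function and distortion are hypothetical choices, not taken from the text) instantiates a distributed source with $X,Y$ over $\mathbb{Z}_4$, target function $f(x,y)=x+y \pmod 4$, and the additive distortion obtained by averaging the per-letter indicator $\mathbb{1}\{\hat z\neq f(x,y)\}$ over a block.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                                           # alphabet Z_4

# Hypothetical joint pmf p_XY on Z_4 x Z_4: X uniform, Y = X + Z with noisy Z.
tau = 0.85
p_Z = np.array([tau] + [(1 - tau) / 3] * 3)
p_XY = np.array([[0.25 * p_Z[(y - x) % M] for y in range(M)] for x in range(M)])

f = lambda x, y: (x + y) % M                    # function the decoder wants
d = lambda x, y, zhat: float(zhat != f(x, y))   # per-letter Hamming-type distortion

# Draw an i.i.d. block from the product distribution and evaluate the additive
# distortion of a (deliberately imperfect) candidate reconstruction.
n = 10_000
flat = rng.choice(M * M, size=n, p=p_XY.ravel())
x, y = flat // M, flat % M
zhat = (x + y + (rng.random(n) < 0.05)) % M     # reconstruction wrong ~5% of the time
block_distortion = np.mean([d(xi, yi, zi) for xi, yi, zi in zip(x, y, zhat)])
print(f"d(x, y, zhat) over the block = {block_distortion:.4f}")
```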

Achievability and the Rate-Distortion Region

Given a distributed source $(\mathcal{X},\mathcal{Y},\mathcal{Z},p_{XY},d)$, a transmission system with parameters $(n,\Theta_1,\Theta_2,\Delta)$ is defined by the set of mappings
$$ \mathrm{Enc}_1:\mathcal{X}^n\to\{1,2,\dots,\Theta_1\},\qquad \mathrm{Enc}_2:\mathcal{Y}^n\to\{1,2,\dots,\Theta_2\} \tag{3.1} $$
$$ \mathrm{Dec}:\{1,\dots,\Theta_1\}\times\{1,\dots,\Theta_2\}\to\hat{\mathcal{Z}}^n \tag{3.2} $$
such that the following constraint is satisfied:
$$ E\Bigl\{d\bigl(X^n,Y^n,\mathrm{Dec}(\mathrm{Enc}_1(X^n),\mathrm{Enc}_2(Y^n))\bigr)\Bigr\} \le \Delta. \tag{3.3} $$
We say that a tuple $(R_1,R_2,D)$ is achievable if for all $\epsilon>0$ and for all sufficiently large $n$, there exists a transmission system with parameters $(n,\Theta_1,\Theta_2,\Delta)$ such that
$$ \frac{1}{n}\log\Theta_i \le R_i+\epsilon \quad\text{for } i=1,2, \qquad \Delta \le D+\epsilon. $$
The performance limit is given by the optimal rate-distortion region, which is defined as the set of all achievable tuples $(R_1,R_2,D)$.

The Berger-Tung Region

An achievable rate region for the problem defined in Section 3.3.1 can be obtained based on the Berger-Tung coding scheme [5, 70] as follows: Let $\mathcal{U}$ and $\mathcal{V}$ be two finite sets and let $U$ and $V$ be two auxiliary random variables distributed over $\mathcal{U}$ and $\mathcal{V}$ according to the conditional probabilities $P_{U|X}$ and $P_{V|Y}$ respectively. Define $g:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Z}}$ as that function of $U$ and $V$ which gives the optimal reconstruction $\hat Z$ with respect to the distortion measure $d(\cdot,\cdot,\cdot)$, so that $E\{d(X,Y,g(U,V))\}$ is minimized.

With these definitions, an achievable rate region for this problem is as follows: For a given distributed source $(\mathcal{X},\mathcal{Y},\mathcal{Z},p_{XY},d)$, let $D$ be a distortion level for the reconstruction. Let $U$ and $V$ be auxiliary random variables for which there exists a function $g(\cdot,\cdot)$ such that $E\{d(X,Y,g(U,V))\}\le D$. Then the rate-distortion tuple $(R_1,R_2,D)$ is achievable if
$$ R_1 \ge I(X;U\mid V),\qquad R_2 \ge I(Y;V\mid U),\qquad R_1+R_2 \ge I(XY;UV). $$
This assertion follows from the analysis of the Berger-Tung problem [5, 70] in a straightforward way.

The Korner-Marton and the Ahlswede-Han Schemes

Consider a distributed source coding problem in which two distributed binary sources $X$ and $Y$ seek to communicate the sum of the two sources $Z=X+Y$ to a centralized decoder losslessly. Korner and Marton [4] propose a coding scheme based on binary linear codes to achieve the rates $R_1=R_2=H(Z)$. For certain cases, this rate pair is not achievable using the Berger-Tung scheme. Ahlswede and Han [9] propose a two-layered coding scheme, consisting of a Berger-Tung layer and a Korner-Marton layer, to achieve the following rate region: Let $P$ and $Q$ be finite auxiliary random variables satisfying the Markov chain $P - X - Y - Q$. Then the rate pair $(R_1,R_2)$ is achievable if
$$ R_1 \ge I(X;P\mid Q)+H(Z\mid PQ),\qquad R_2 \ge I(Y;Q\mid P)+H(Z\mid PQ),\qquad R_1+R_2 \ge I(XY;PQ)+2H(Z\mid PQ). $$
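As a quick numerical illustration of why the Korner-Marton rates can beat unstructured coding (this computation is an added sketch; the doubly symmetric binary source below is a standard example, not one worked out in the text), compare the Korner-Marton sum rate $2H(Z)$ with the Slepian-Wolf sum-rate requirement $H(X,Y)$ for $Z=X\oplus Y$.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

# Doubly symmetric binary source: X ~ Bern(1/2), Y = X xor Z, Z ~ Bern(eps).
for eps in (0.05, 0.1, 0.2):
    H_Z  = h2(eps)              # Korner-Marton: R1 = R2 = H(Z)
    H_XY = 1.0 + h2(eps)        # H(X,Y) = H(X) + H(Z)
    print(f"eps={eps:4.2f}: Korner-Marton sum rate = {2 * H_Z:.3f}, "
          f"Slepian-Wolf sum rate = {H_XY:.3f}")
```

For every $\epsilon\neq 1/2$ the structured sum rate $2H(Z)$ is strictly below $H(X,Y)$, which is the gap that the group-code constructions in this chapter exploit for more general groups.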

3.3.2 The Main Result

In this section, we provide an inner bound to the achievable rate-distortion region which strictly contains the Berger-Tung rate region. The following theorem is the main result of this section.

Theorem III.1. For the distributed source $(\mathcal{X},\mathcal{Y},\mathcal{Z},p_{XY},d)$, let $U$, $V$, $P$ and $Q$ be random variables jointly distributed with $XY$ such that $U$ and $V$ take values from an Abelian group $G$, and $P$ and $Q$ take values from finite sets $\mathcal{P}$ and $\mathcal{Q}$ respectively. Assume the following Markov chains hold:
$$ P - X - Y - Q, \qquad U - (P,X) - (Y,Q) - V $$
and assume there exists a function $g:G\times\mathcal{P}\times\mathcal{Q}\to\hat{\mathcal{Z}}$ such that
$$ E\bigl\{d(X,Y,g(Z,P,Q))\bigr\} \le D $$
for $Z=U+V$, where $+$ is the group operation. We show that with these definitions the rate-distortion triple $(R_1,R_2,D)$ is achievable where
$$ R_1 \ge I(X;P\mid Q) + \bar I(U;XP) - I^G_{c.c.}(Z;PQ) $$
$$ R_2 \ge I(Y;Q\mid P) + \bar I(V;YQ) - I^G_{c.c.}(Z;PQ) $$
$$ R_1+R_2 \ge I(XY;PQ) + \bar I(UV;XYPQ) - 2I^G_{c.c.}(Z;PQ) $$

Note that the case where $U$ and $V$ are trivial corresponds to the Berger-Tung scheme, and the case where $P$ and $Q$ are trivial, $U=X$, $V=Y$ and the alphabets are binary corresponds to the Korner-Marton scheme. When $P$ and $Q$ are nontrivial, $U=X$, $V=Y$ and the alphabets are binary, this scheme corresponds to the Ahlswede-Han scheme.

79 3.3.3 The Coding Scheme In order to show the achievability, it suffices to show the achievability of the following corner point: R = I(X; P ) + Ī(U; XP ) IG c.c.(z; P Q) R 2 = I(Y ; Q P ) + Ī(V ; Y Q) IG c.c.(z; P Q) To show the achievability, we use a random coding argument as follows: Let C p be the code designed for the random variable P defined as follows: C p = { } C p (), C p (2),, C p (2 nrp ) where R p is the rate of this code and for i =,, 2 nrp, C p (i) s are iid random variables uniformly distributed over A n ɛ (P ). For the random variable Q, we use a nested code in which the outer code is defined as C qo = { } C qo (), C p (2),, C p (2 nrqo ) where R qo is the rate of the outer code and for i =,, 2 nrqo, C qo (i) s are iid random variables uniformly distributed over A n ɛ (Q). The outer code is partitioned into inner codes using the mapping m : C qo {, 2,, 2 nrq } where R q is the transmission rate for sending Q and for c C qo, m(c) s are iid random variables uniformly distributed over {, 2,, 2 nrq }. Note that the codes for P and Q are unstructured random codes. In the next layer of coding, we use structured random codes to transmit U + V where the random variables U and V are transmitted using nested codes with a common inner code. Let C g be a group code over G defined as follows: C g = {φ(a) a J} 68

80 where the Abelian group J and the homomorphism φ are defined according to (2.4) and (2.) respectively. The code C g is the common inner code between U and V and its rate is given by (2.5). The outer code for U is defined as C uo = s i + C g i {,,2 nru } where R u is the transmission rate for sending U and for i =,, 2 nru, s i s are iid random variables uniformly distributed over G n. Similarly, the outer code for V is defined as C vo = i {,,2 nrv }t i + C g where R v is the transmission rate for sending V and for i =,, 2 nrv, t i s are iid random variables uniformly distributed over G n. In the above code constructions, different random variables are assumed to be independent unless otherwise stated. For convenience, we also define the mappings s : C uo {,, 2 nru } and t : C vo {,, 2 nrv } where for i {,, 2 nru } and c s i + C g, s(c) = s i and the map t is similarly defined. The encoding and decoding rules are as follows: Given a source pair (x, y) X n Y n, if x / A n ɛ (X), the X-encoder declares error (Err x ); otherwise it looks for p C p such that p A n ɛ (P x). If it finds such p, it is sent to the decoder; otherwise, it declares error (Err p ). Similarly, if y / A n ɛ (Y ), the Y -encoder declares error (Err y ); otherwise it looks for q C qo such that q A n ɛ (Q y). If it finds such q, m(q) is sent to the decoder; otherwise, it declares error (Err q ). This stage of encoding is essentially the Berger-Tung layer of the coding scheme. In the second stage of encoding, assuming that no error occurred in the first stage, the X-encoder looks for u C uo such that u A n ɛ (U xp). Given the sequences 69

81 x X n and p P n, define α(x, p) = 2 nru i= a J u A n ɛ (U x,p) {φ(a)+si =u} Note that if α(x, p) > 0, then there exists at least one i {,, 2 nru } and one a J with φ(a) + s i A n ɛ (U x, p). In this case, the encoder picks one such pair and sends i to the channel. The encoder will declare an encoding error if α(x, p) = 0. Although it may be unnecessary, it is convenient in the proofs to assume that the encoder declares error if α(x, p) 2nRu J A n ɛ (U x,p) 2 G n also assume that the pair (a, i) is picked with probability. We denote this error event by Err u. We u A n ɛ (U x,p) {φ(a)+s i =u} α(x,p). Similarly, assuming that no error occurred in the first stage, the Y -encoder looks for v C vo such that v A n ɛ (V yq). Given the sequences y Y n and q Q n, define β(x, p) = 2 nrv j= {φ(b)+tj =v} b J v A n ɛ (V y,q) Note that if β(y, q) > 0, then there exists at least one j {,, 2 nrv } and one b J with φ(b) + t j A n ɛ (V y, q). In this case, the encoder picks one such pair and sends j to the channel. The encoder will declare an encoding error if β(y, q) = 0. Same as above, it is convenient in the proofs to assume that the encoder declares error if β(y, q) 2nRv J A n ɛ (V y,q) 2 G n that the pair (b, j) is picked with probability. We denote this error event by Err v. We also assume v A n ɛ (V y,q) {φ(b)+t j =v} β(y,q). At the receiver, assuming no encoding errors occurred in either of the terminals, p, m(q), s(u) and t(v) are available. The decoder sets ˆp = p and looks for ˆq C qo such that ˆq A n ɛ (Q ˆp). If it does not find such ˆq it declares error (Errˆq ). In the next stage of decoding, assuming no error occurred in the first stage, the decoder looks for ẑ A n ɛ (Z ˆpˆq) for which there exist û C uo and ˆv C vo such that s(û) = s(u), t(ˆv) = t(v) and û + ˆv = ẑ. If it does not find such ẑ, it declares error (Errẑ). 70

82 3.3.4 Error Analysis In Section 3.3.3, we defined the error events Err x, Err p, Err y, Err q, Err u, Err v, Errˆq and Errẑ. In addition, we define the following error events which may not be necessarily observable at any terminal: The error event Err xy is the event (x, y) / A n ɛ (XY ); Err q ˆq is the event that ˆq q; and Err z ẑ is the event that ẑ z. First we show that if none of the error events occur, then (x, y, ˆp, ˆq, ẑ) A n ɛ (XY P QZ). This is equivalent to showing (x, y, p, q, z) A n ɛ (XY P QZ) where z = u + v and in turn, it suffices to show (x, y, p, q, u, v) A n ɛ (XY P QUV ). Note that (x, y, ˆp, ˆq, ẑ) A n ɛ (XY P QZ) implies x, y and g(ẑ, ˆp, ˆq) are jointly typical which in turn implies d(x, y, g(ẑ, ˆp, ˆq)) E{d(X, Y, g(z, P, Q))} D. We need the following Markov lemma: Lemma III.2. Let X, Y, Z be random variables taking values from finite sets X, Y, Z respectively such that the Markov chain X Y Z holds. For n =, 2,, let (x (n), y (n) ) A n ɛ (XY ) and let K (n) be a random vector taking values from Z n with distribution satisfying (for simplicity of notation we call them x, y, K respectively) P (K = z) p n Z Y (z y)e ɛnn for some ɛ n 0 as n. Then, as n P ((x, y, K) A n ɛ (XY Z)) Proof. Provided in Section Note that if the error event Err xy does not happen, (x, y) A n ɛ (XY ) and therefore, the regular Markov lemma implies that (x, y, p, q) A n ɛ (XY ) since the Markov chain P X Y Q holds. Assuming that Err u does not occur, let K U n be 7

83 the output of the encoder. We have P (K = u φ, s, s 2, ) = a J Therefore, = a J a J 2 nru i= 2 nru i= 2 nru i= P (K = u) ũ A n ɛ (U x,p) {φ(a)+s i =ũ} {φ(a)+si =u} α(x, p) α(x, p) {φ(a)+s i =u,u A n ɛ (U x,p)} 2 G n 2 nru J A n ɛ (U x, p) {φ(a)+s i =u,u A n ɛ (U x,p)} 2 A n ɛ (U x, p) {u A n ɛ (U x,p)} Let U n be distributed according to p n U XY P Q ( x, y, p, q) = pn U XP ( x, p). It is known that for u A n ɛ (U x, p), P (U n = u) e δnn A n ɛ (U x,p) for some δ n converging to zero. This implies that there exist a sequence ɛ n converging to zero such that P (K = u) P (U n = u)e ɛnn = p n U XP (u x, p)e ɛnn Consider the Markov chain U (P, X) (Y, Q) and use Lemma III.2 for (x, p), (y, q), K. It follows that with high probability, (x, y, p, q, u) A n ɛ (XY P QU). Similarly, using the above argument for v and considering the Markov chain (U, P, X) (Y, Q) V, we can show that with high probability (x, y, p, q, u, v) A n ɛ (XY P QUV ) (3.4) Next we show that the expected value of the probability of all of the error events vanish zero as n increases for the following rates: R p > I(X; P ) R qo > I(Y ; Q) R q > I(Y ; Q) I(P ; Q) = I(Y ; Q P ) R g < I G c.c(z; P Q) R g + R u > Ī(U; XP ) R u > Ī(U; XP ) IG c.c(z; P Q) R g + R v > Ī(V ; Y Q) R v > Ī(V ; Y Q) IG c.c(z; P Q) 72

84 This will show that the claimed rates are achievable since R = R p + R u and R 2 = R q + R v. It is straightforward to show that the probabilities of the error events Err x, Err y, Err xy fall exponentially by n. It also follows from the standard approaches that the probabilities of the error events Err p, Err q and Err q ˆq vanish as n increases [5,70]. It remains to show the same for Err u, Err v, Errẑ, Errˆq and Err z ẑ The Error Events Err u and Err v For the source output x X n, assuming that Err p did not occur, let p be the output of the first step of encoding. The error event Err u occurs if α(x, p) 2 nru J A n ɛ (U x,p). We bound the probability of error using the following Chebyshev s 2 G n inequality: We have ( ) ( P Err u x, p = P α(x, p) 2nRu J A n ɛ (U x, p) ) 2 G n E{α(x, p)} = 2 nru i= a J u A n ɛ (U x,p) Note that φ(a) + s i is uniform over G n ; therefore, E{α(x, p)} = 2 nru i= a J u A n ɛ (U x,p) P (φ(a) + s i = u) G n = 2 nru J A n ɛ (U x, p) G n var{α(x, p)} E{α(x, p)} 2 73

85 Furthermore, E{α(x, p) 2 } = Therefore, Therefore, = + = + 2 nru 2 nru i= a J u A n ɛ (U x,p) j= ã J ũ A n ɛ (U x,p) 2 nru i= 2 nru i= 2 nru i= 2 nru i= a J u A n ɛ (U x,p) ã J a J u A n ɛ (U x,p) 2 nru j= j i a J u A n ɛ (U x,p) ã J a J u A n ɛ (U x,p) E{α(x, p)} 2 + θ Θ var{α(x, p)} θ Θ θ Θ 2 nru i= 2 nru j= j i 2 nru i= P (φ(a) + s i = u, φ(ã) + s j = ũ) ũ A n ɛ (U x,p) P (φ(a) + s i = u, φ(ã) + s i = ũ) ã J ũ A n ɛ (U x,p) P (φ(a) + s i = u, φ(ã) + s j = ũ) ũ A n ɛ (U x,p) G G 2n ã J ũ A n ɛ (U x,p) a J u A n ɛ (U x,p) ã T θ (a) a J u A n ɛ (U x,p) ã T θ (a) n P (φ(ã a) = ũ u) ũ A n ɛ (U x,p) ũ u H n θ ũ A n ɛ (U x,p) ũ u H n θ G n H θ n G n H θ n 2 nru J T θ 2 n[h(u XP )+ɛ] 2 n[h(u XP [U] θ)+ɛ] G n H θ n ( ) P Err u x, p G n J 2 nru 2 T θ 2 nh(u XP [U]θ) nh(u XP ) H θ n θ Θ Note that by Lemma II., T θ = 2 nr( ω θ). Therefore, R g < I G c.c.(z; P Q) implies for all θ Θ such that θ r, T θ 2 nh(z P Q[Z] θ) H θ n 0 74

86 as n increases. Note that H(U XP [U] θ ) (a) = H(U XP ) H([U] θ XP ) (b) = H(U XP QV ) H([U] θ XP QV ) = H(U XP QV [U] θ ) = H(UV XP QV [U] θ ) (c) = H(V Z XP QV [Z] θ ) = H(Z XP QV [Z] θ ) H(Z P Q[Z] θ ) where (a) follows since [U] θ is a function of U; (b) follows since the Markov chains U (P, X) (Q, V ) and [U] θ P (Q, V ) hold; and (c) holds since there are one to one correspondences between (U, V ) and (V, Z) and between ([U] θ, V ) and (V, [Z] θ ). Therefore, for θ r, Note that for θ = r, we have Therefore, it remains to show that T θ 2 nh(u XP [U] θ) H θ n T θ 2 nh(z P Q[Z]θ) H θ n 0 T θ 2 nh(u XP [U] θ) H θ n = G n J 2 nru 2 nh(u XP ) 0 as n. Note that J = 2 nrg ; therefore, G n J 2 nru 2 nh(u XP ) = G n 2 n(rg+ru) 2 nh(u XP ) 0 since R g + R u > log G H(U XP ). With a similar argument, we can show that the probability of the event Err v approaches zero as n increases. 75

87 The Error Events Errˆq and Errẑ Equation (3.4) implies that with hight probability, the choice of ˆq = q satisfies the conditions ˆq C qo and ˆq A n ɛ (Q ˆp). Therefore, the probability of the error event Errˆq vanishes as n increases. Similarly, ẑ = z = u + v satisfies the necessary conditions and it is straightforward to show that the probability of the event Errẑ approaches zero as n The Error Event Err z ẑ Let Err be the event that the error Err z ẑ occurs but none of the other error events occur. Then the probability of the error event Err is equal to ( ) P Err 2 nru i= 2 nrv j= 2 nru i= 2 nrv j= (x,y,p,q) A n ɛ (XY P Q) p n XY (x, y) a J u A n ɛ (U x,p) 2 nrp k= 2 nrqo l= α(x, p) {φ(a)+s i =u} β(y, q) {φ(b)+t j =v} b J v A n ɛ (V y,q) (x,y,p,q) A n ɛ (XY P Q) p n XY (x, y) a J u A n ɛ (U x,p) b J v A n ɛ (V y,q) 2 nrp k= 2 nrqo l= {Cp(k)=p,C qo(l)=q} z A n ɛ (Z p,q) z z c J {Cp(k)=p,C qo(l)=q} 2 G n 2 nru J A n ɛ (U x, p) {φ(a)+s i =u} 2 G n 2 nrv J A n ɛ (V y, q) {φ(b)+t j =v} {φ( c)+si +t j = z} z A n ɛ (Z p,q) z z c J {φ( c)+si +t j = z} 76

88 Therefore, { } E P (Err) (x,y) A n ɛ (XY ) p n XY (x, y) 2nRp A ɛ (P x) A ɛ (P ) 2 G n 2 nrv 2 nru J A n ɛ (U x, p) j= c J z A n ɛ (Z p,q) z z 2 nrv j= θ Θ θ r c J (x,y) A n ɛ (XY ) c J 2 G n v A n ɛ (V y,q) z A n ɛ (Z p,q) z z+h n θ 2nRqo A ɛ (Q y) A ɛ (Q) v A n ɛ (V y,q) 2 nru i= a J u A n ɛ (U x,p) 2 G n 2 nrv J A n ɛ (V y, q) ( P φ(a)+s i =u, φ(c)+s i +t j =u+v, φ( c)+s i +t j = z 2nRu p n nɛ XY (x, y)2 c T θ (c) i= a J u A n ɛ (U x,p) 2 nrv J A n ɛ (V y, q) G 2n H θ n 4 2 nɛ T θ 2 n[h(z [Z] θp Q)+ɛ] H θ n θ Θ θ r ( ) Therefore, we require to have R g < ω θ log H θ H(Z [Z] θ P Q) θ r. Equivalently, ) 2 G n 2 nru J A n ɛ (U x, p) for all θ Θ, R g < I G c.c.(z; P Q) Examples Consider a two-user distributed source coding problem in which the two sources X and Y take values from Z 4 and a centralized decoder in interested in decoding the sum of the two sources losslessly. Furthermore, assume that X is uniformly distributed over Z 4 and Y = X + Z where Z is independent from X and is distributed over Z 4 such that p Z (0) = τ and p Z () = p Z (2) = p Z (3) = τ 3 for some τ (0, ). Let R and R 2 be the rates of the two encoders. Using unstructured codes (Slepian-Wolf 77

coding), a sum rate of $R=R_1+R_2=H(X,Y)$ is achievable. We have
$$ R = H(X,Y) = H(X,X+Z) = H(X)+H(Z) = 2 + h(\tau) + (1-\tau)\log 3 $$
where $h(\cdot)$ denotes the binary entropy function. Using the scheme proposed by Krithivasan and Pradhan [42], the mod-$4$ operation can be embedded in the Abelian groups $\mathbb{Z}_4$, $\mathbb{Z}_7$, $\mathbb{Z}_2^3$ and $\mathbb{Z}_4^2$. Let the auxiliary random variables $U$ and $V$ be equal to $X$ and $Y$ respectively and let $P$ and $Q$ be trivial random variables. It turns out that the rate pair $(R_1,R_2)$ is achievable where
$$ R_1 = R_2 = 2-\min\bigl(2-H(Z),\ 2-2H([Z])\bigr) = \max\bigl(H(Z),\ 2H([Z])\bigr) $$
where $[Z]=Z+\{0,2\}$. Therefore, a sum rate of $R=R_1+R_2=2\max\bigl(H(Z),2H([Z])\bigr)$ is achievable using the scheme proposed in [42]. Note that any $a\in\mathbb{Z}_4$ can be uniquely represented by $a=\hat a+\tilde a$ where $\hat a\in\{0,1\}$ and $\tilde a\in\{0,2\}$. Now consider the following assignments for the auxiliary random variables: let $P=\hat X$, $Q=\hat Y$, $U=\tilde X$ and $V=\tilde Y$. It can be verified that $X+Y = g(U+V,P,Q) = \hat X+\hat Y+(\tilde X+\tilde Y) \pmod 4$. Therefore, the new coding theorem implies that the following sum rate is achievable:
$$ R = R_1+R_2 = I(XY;\hat X\hat Y) + H(\tilde X+\tilde Y\mid\hat X\hat Y) = H(\hat X\hat Y) + H(\tilde X+\tilde Y\mid\hat X\hat Y) = H(\hat X\hat Z) + H(\tilde Z\mid\hat X\hat Z) = H(\hat X\hat Z\tilde Z) = H(\hat X)+H(Z) $$
Figure 3.1 compares the sum rate of the coding schemes for different values of $\tau$. As can be seen in the figure, the scheme based on group codes presented in this section outperforms the other schemes.
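The comparison shown in Figure 3.1 can be reproduced numerically. The sketch below (added for illustration; it simply evaluates the three closed-form sum rates derived above, with $H([Z])$ computed for $[Z]=Z+\{0,2\}$, and the particular $\tau$ values are arbitrary sample points rather than the exact range plotted in the figure) prints the Slepian-Wolf, Krithivasan-Pradhan and new-scheme sum rates.

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

for tau in (0.3, 0.5, 0.7):
    p_Z = np.array([tau] + [(1 - tau) / 3] * 3)     # p_Z on Z_4 as in the example
    # [Z] = Z + {0,2}: cosets {0,2} and {1,3}.
    p_cosetZ = np.array([p_Z[0] + p_Z[2], p_Z[1] + p_Z[3]])

    sw  = 2.0 + H(p_Z)                              # Slepian-Wolf: H(X,Y) = 2 + H(Z)
    kp  = 2.0 * max(H(p_Z), 2.0 * H(p_cosetZ))      # Krithivasan-Pradhan embedding
    new = 1.0 + H(p_Z)                              # new scheme: H(Xhat) + H(Z)
    print(f"tau={tau:4.2f}:  SW={sw:.3f}  KP={kp:.3f}  New={new:.3f}")
```

At these sample points the new scheme gives the smallest sum rate, in line with the comparison of Figure 3.1.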

[Figure 3.1: Comparison of the performance of random codes vs. group codes for a distributed source coding problem; the sum rate $R$ of the new scheme, the KP scheme and Slepian-Wolf coding is plotted against $\tau$. As can be seen in the figure, group codes outperform random codes as the structure of the code is matched to the desired function of the two sources.]

3.3.6 Appendix

Proof of Lemma III.2

It suffices to show that for all $a\in\mathcal{X}$, $b\in\mathcal{Y}$ and $c\in\mathcal{Z}$,
$$ \Bigl|\frac{1}{n}N(a,b,c\mid x,y,K) - p_{XYZ}(a,b,c)\Bigr| \to 0 $$
with probability one as $n\to\infty$, where $N(a,b,c\mid x,y,K)$ counts the number of occurrences of the triple $(a,b,c)$ in the vector of triples $(x,y,K)$. We have
$$ \Bigl|\frac{1}{n}N(a,b,c\mid x,y,K) - p_{XYZ}(a,b,c)\Bigr| \le \Bigl|p_{XYZ}(a,b,c) - \frac{1}{n}N(a,b\mid x,y)\,p_{Z|Y}(c\mid b)\Bigr| + \Bigl|\frac{1}{n}N(a,b\mid x,y)\,p_{Z|Y}(c\mid b) - \frac{1}{n}N(a,b,c\mid x,y,K)\Bigr| $$
Note that it follows from standard typicality results that the first term in the equation above vanishes as $n$ increases almost surely. Next, we show that the second term also

91 vanishes almost surely. We have n N(a, b x, y)w n Z Y (K y) n N(a, b, c x, y, K)= n where for i =, 2,, n, n n ] {xi =a,y i =b}[ py Z (c y i ) {Ki =c} i= n i= θ i θ i = p Y Z (c y i ) {Ki =c} Let Z n be a random vector generated according to p Z Y ( y) and define θ i = p Y Z (c y i ) {Zi =c} Note that both θ i and θ i are binary random variables taking values from the set {p Z Y (c y i ), p Z Y (c y i ) } and θ i, θ i. We have E{ θ i } = 0 and var{ θ i }. It follows from [Proposition, Zhiyi Chi s paper] that θ i satisfied the large deviations principle with a good rate function I( ) such that ( θ + + P θ ) n t e ni(t) n where I(t) is positive. For b {p Z Y (c y i ), p Z Y (c y i ) } n, we have P (θ = b) = z Z n z i c if b i =p Z Y (c y i ) z i =c if b i =p Z Y (c y i ) z Z n z i c if b i =p Z Y (c y i ) z i =c if b i =p Z Y (c y i ) ) = e ɛnn P ( θ = b P (K = z) p n Z Y (z y)e ɛnn 80

We have
$$ P\Bigl(\frac{\theta_1+\dots+\theta_n}{n}\ge t\Bigr) = \sum_{b:\,\frac{b_1+\dots+b_n}{n}\ge t} P(\theta=b) \le e^{\epsilon_n n}\sum_{b:\,\frac{b_1+\dots+b_n}{n}\ge t} P(\tilde\theta=b) \le e^{\epsilon_n n}\,e^{-nI(t)} = e^{-n(I(t)-\epsilon_n)} $$
Note that since $e^{-n(I(t)-\epsilon_n)}$ is summable, the Borel-Cantelli lemma implies that for all $t>0$,
$$ \limsup_{n\to\infty}\ \frac{1}{n}\sum_{i=1}^{n}\theta_i \le t $$
Therefore, $\frac{1}{n}\sum_{i=1}^{n}\theta_i\to 0$ as $n\to\infty$ almost surely.

3.4 The 3-User Interference Channel

3.4.1 Problem Definition and the Coding Scheme

Consider a three-user discrete memoryless interference channel with inputs $X_1$, $X_2$ and $X_3$ and outputs $Y_1$, $Y_2$ and $Y_3$. Assume that $X_1$ takes values from a finite set $\mathcal{X}_1$, and $X_2$ and $X_3$ take values from an Abelian group $G$. Decoder 1 decodes $X_1$ and $Z=X_2+X_3$ jointly, where $+$ is the group operation, and decoders 2 and 3 decode $X_2$ and $X_3$ respectively. Encoder 1 uses the random code $C_r=\{C_r(1),\dots,C_r(2^{nR_1})\}$ where $R_1$ is the rate of the first user and $C_r(1),\dots,C_r(2^{nR_1})$ are iid random variables uniformly distributed over $A^n_\epsilon(X_1)$. Encoder 2 uses a nested code whose outer code is a shifted group code $C_g+b_2$ over $G$, where $b_2$ is uniform over $G^n$ and
$$ C_g = \{\phi(a)\mid a\in J\} $$

where $\phi$ and $J$ are defined in (2.1) and (2.4) respectively. The code $C_g$ is the common code between $X_2$ and $X_3$ and its rate is equal to
$$ R_g = \frac{k}{n}\sum_{(p,r)\in\mathcal{Q}(G)} r\,w_{p,r}\log p $$
The outer code is partitioned into inner codes by the mapping $s:J\to\{1,2,\dots,2^{nR_2}\}$, where $R_2$ is the rate of the second encoder and $\{s(a)\}_{a\in J}$ are iid and uniformly distributed over $\{1,2,\dots,2^{nR_2}\}$. Similarly, encoder 3 uses a nested code whose outer code is $C_g+b_3$, where $b_3$ is uniform over $G^n$ and whose inner code is determined by a mapping $t:J\to\{1,2,\dots,2^{nR_3}\}$, where $R_3$ is the rate of the third encoder and $\{t(a)\}_{a\in J}$ are iid and uniformly distributed over $\{1,2,\dots,2^{nR_3}\}$. Note that the above random codes and random mappings are independent from each other unless otherwise stated.

The encoding and decoding rules are as follows: Given a message $m_1\in\{1,\dots,2^{nR_1}\}$, encoder 1 sends $C_r(m_1)$. Given a message $m_2\in\{1,\dots,2^{nR_2}\}$, encoder 2 looks for $a\in J$ such that $\phi(a)+b_2\in A^n_\epsilon(X_2)$ and $s(a)=m_2$. If it finds such an $a$, it sends $x_2=\phi(a)+b_2$ over the channel; otherwise it declares error ($Err_{e2}$). Similarly, given a message $m_3\in\{1,\dots,2^{nR_3}\}$, encoder 3 looks for $b\in J$ such that $\phi(b)+b_3\in A^n_\epsilon(X_3)$ and $t(b)=m_3$. If it finds such a $b$, it sends $x_3=\phi(b)+b_3$ over the channel; otherwise it declares error ($Err_{e3}$). At the receiver side, decoder 1, after receiving $y_1\in\mathcal{Y}_1^n$, looks for an index $\hat m_1\in\{1,\dots,2^{nR_1}\}$ and $z\in C_g+b_2+b_3$ such that $(C_r(\hat m_1),z)\in A^n_\epsilon(X_1,X_2+X_3\mid y_1)$. If it does not find such a pair or if it finds more than one such index $\hat m_1$, it declares error ($Err_{d1}$). Receiver 2, after receiving $y_2\in\mathcal{Y}_2^n$, looks for the index $\hat m_2\in\{1,\dots,2^{nR_2}\}$ for which there exists a unique $\hat a\in J$ with $\phi(\hat a)+b_2\in A^n_\epsilon(X_2\mid y_2)$

and $s(\hat a)=\hat m_2$. If it does not find such an $\hat m_2$, it declares error ($Err_{d2}$). The decoding rule for the third receiver is similar to that of the second receiver.

In the following, we show that the expected value of the probability of all the error events over the ensemble approaches zero as the block length increases if
$$ R_1 < I(X_1;Y_1\mid Z) $$
$$ R_1+R_g < I(X_1;Y_1) + I_{c.c.}(Z;X_1Y_1) $$
$$ R_g-R_2 > \log|G|-H(X_2),\qquad R_g < I^G_{c.c.}(X_2;Y_2) $$
$$ R_g-R_3 > \log|G|-H(X_3),\qquad R_g < I^G_{c.c.}(X_3;Y_3) $$
and $R_g > I^G_{s.c.}(X_2;X_2)$. In the following analysis, for simplicity we assume $H(X_2)=H(X_3)$ so that one group code can be used for both terminals.

3.4.2 Error Analysis

The Error Event $Err_{e2}$: Given a message $m_2\in\{1,2,\dots,2^{nR_2}\}$, define
$$ \theta_2(m_2) = \sum_{a\in J}\ \sum_{x_2\in A^n_\epsilon(X_2)} \mathbb{1}_{\{\phi(a)+b_2=x_2,\ s(a)=m_2\}} $$
To simplify the analysis, we use the following modified encoding rule: if $\theta_2(m_2) < \frac{E\{\theta_2(m_2)\}}{2}$, declare error; otherwise pick one such $a$ uniformly and send $x_2=\phi(a)+b_2$. We use Chebyshev's inequality as follows:
$$ P(Err_{e2}\mid m_2) = P\Bigl(\theta_2(m_2) < \frac{E\{\theta_2(m_2)\}}{2}\Bigr) \le \frac{\mathrm{var}\{\theta_2(m_2)\}}{\bigl(E\{\theta_2(m_2)\}/2\bigr)^2} $$

95 We have E {θ 2 (m 2 )} = a J = a J x 2 A n ɛ (X 2 ) x 2 A n ɛ (X 2 ) = J An ɛ (X 2 ) G n 2 nr 2 P (φ(a) + b 2 = x 2, s(a) = m 2 ) G n 2 nr 2 and E { θ 2 (m 2 ) 2} = a,ã J x 2, x 2 A n ɛ (X 2 ) = θ Θ = a J + θ Θ a J θ r a J ã T θ (a) x 2 A n ɛ (X 2 ) x 2 A n ɛ (X 2) J An ɛ (X 2 ) G n 2 nr 2 P (φ(a)+b 2 =x 2, φ(ã)+b 2 = x 2, s(a)=m 2, s(ã)=m 2 ) G n 2 nr 2 ã T θ (a) x 2 A n ɛ (X 2) + θ Θ θ r x 2 A n ɛ (X 2 ) x 2 x 2 +H n θ x 2 A n ɛ (X 2) x 2 x 2 +H n θ G n H θ P (s(a)=m 2, s(ã)=m n 2 ) G n H θ n 2 2nR 2 J T θ A n ɛ (X 2 ) A n ɛ (X 2 ) (x 2 + H n θ ) G n H θ n 2 2nR 2 where, r is a vector whose components are indexed by (p, r) Q(G) and whose (p, r) th component is equal to r. Using Lemma II.4, we get var { θ 2 (m 2 ) 2} J An ɛ (X 2 ) G n 2 nr 2 + θ Θ θ 0,θ r J T θ 2 n[h(x 2)+ɛ] 2 n[h(x 2 [X 2 ] θ )+ɛ] G n H θ n 2 2nR 2 84

96 For some ɛ > 0 such that ɛ 0 as n. Here, 0 is a vector whose components are indexed by (p, r) Q(G) and whose (p, r) th component is equal to 0. We have P (Err e2 m 2 ) G n 2 nr 2 J 2 n[h(x 2) ɛ] + θ Θ θ 0,θ r G n T θ 2 n[h(x 2 [X 2 ] θ )+ɛ] H θ n J 2 n[h(x 2) ɛ] Note that J = 2 nrg, T θ = 2 n( ω θ)r g probability of error to go to zero, we require and G n H θ n = G : H θ n. In order for the which is equivalent to R g R 2 > log G H(X 2 ) R g > max [log G : H θ H([X 2 ] θ )] θ Θ ω θ θ 0,θ r or R g R 2 > log G H(X 2 ) R g > max θ Θ θ 0 ω θ [log G : H θ H([X 2 ] θ )] R g R 2 > log G H(X 2 ) R g > I G s.c.(x 2 ; X 2 ) The Error Event Err d2 : We have P avg (Err d2 Err c e2) = 2 nr 2 y 2 Y n 2 p n Y 2 X 2 (y 2 x 2 ) 2 nr 2 m 2 = x 2 A n ɛ (X 2) 2nR 2 m 2 = { a J {φ(a)+b 2 =x 2,s(a)=m 2 }}P (x 2 is sent) m 2 m 2 x2 A n ɛ (X 2 y 2 ) ã J {φ(ã) + b 2 = x 2, s(ã) = m 2 } 85

97 Therefore, E{P (Err d2 )} 2 nr 2 2 nr 2 m 2 = x 2 A n ɛ (X 2) x 2 A n ɛ (X 2 y 2 ) ã J = 2 nr 2 2 nr 2 m 2 = x 2 A n ɛ (X 2 y 2 ) = 2 nr 2 2 nr 2 m 2 = x 2 A n ɛ (X 2 y 2 ) x 2 x 2 +H n θ 2 θ Θ θ r 2 E{θ 2 (m 2 )} a J y 2 Y n 2 p n Y 2 X 2 (y 2 x 2 ) 2 nr 2 m 2 = m 2 m 2 P (φ(ã)+b 2 = x 2, φ(a)+b 2 =x 2 ) P (s(a)=m 2, s(ã)= m 2 ) 2 nr 2 m 2 = m 2 m 2 a J ã J ã a x 2 A n ɛ (X 2 ) 2 E{θ 2 (m 2 )} y 2 Y n 2 p n Y 2 X 2 (y 2 x 2 ) P (φ(ã)+b 2 = x 2, φ(a)+b 2 =x 2 ) P (s(a)=m 2, s(ã)= m 2 ) 2 nr 2 m 2 = m 2 m 2 a J θ Θ θ r G n H θ n 2 2nR 2 T θ 2 n[h(x 2 [X 2 ] θ Y 2 )+ɛ] H θ n ã T θ (a) x 2 A n ɛ (X 2) 2 E{θ 2 (m 2 )} y 2 Y n 2 Therefore, in order for the probability of error to go to zero, we require 2 nh(x 2 [X 2 ] θ Y 2) T θ H θ n = 2nH(X2 [X2]θY2) 2 n( ωθ)rg H θ n 0 for θ r or equivalently, we need to have R g < ω θ [log H θ H(X 2 [X 2 ] θ Y 2 )] for θ r. Therefore, it is sufficient to have R g < I G c.c.(x 2 ; Y 2 ). p n Y 2 X 2 (y 2 x 2 ) The Error Event Err d : Let P be the probability that both x and x 2 +x 3 are decoded incorrectly and let P 2 be the probability that x is decoded incorrectly but x 2 + x 3 is decoded correctly. We have P avg (Err d Err c e Err c e2 Err c e3) P + P 2 86

98 where and P 2 P 2 nr 2 nr 2 2 nr 3 2 nr {Cr (m )=x } 2 nr 2 2 nr 3 m = x A n ɛ (X ) m 2 = m 3 = x 2 A n ɛ (X 2 ) x 3 A n ɛ (X 3 ) 2 nr m = a,b J m m 2 E{θ 3 (m 3 )} ( x, z) A n ɛ (X,X 2 +X 3 y ) z x 2 +x 3 y Y n p n Y X (y x, x 2, x 3 ) {φ(a)+b2 =x 2,φ(b)+b 3 =x 3,s(a)=m 2,t(b)=m 3 } { x=cr( m)} { ã, b J:φ(ã)+b2+φ( b)+b3= z} 2 nr 2 nr 2 2 nr 3 2 nr {Cr (m )=x } 2 nr 2 2 nr 3 m = x A n ɛ (X ) m 2 = m 3 = x 2 A n ɛ (X 2) 2 E{θ 2 (m 2 )} 2 nr m = a,b J m m x 3 A n ɛ (X 3) 2 E{θ 3 (m 3 )} y Y n {φ(a)+b2 =x 2,φ(b)+b 3 =x 3,s(a)=m 2,t(b)=m 3 } p n Y X (y x, x 2, x 3 ) 2 E{θ 2 (m 2 )} ( x,x 2 +x 3 ) A n ɛ (X,X 2 +X 3 y ) { x =C r ( m )} Note that the event { ã, b J : φ(ã) + b 2 + φ( b) + b 3 = z} is equal to the event { c J : φ( c) + b 2 + b 3 = z}. Therefore, using the union bound, we get P 2 nr 2 nr 2 2 nr 3 2 nr {Cr (m )=x } 2 nr 2 2 nr 3 m = x A n ɛ (X ) m 2 = m 3 = x 2 A n ɛ (X 2 ) 2 E{θ 2 (m 2 )} 2 nr m = a,b J m m { x =C r ( m )} x 3 A n ɛ (X 3 ) 2 E{θ 3 (m 3 )} y Y n {φ(a)+b2 =x 2,φ(b)+b 3 =x 3,s(a)=m 2,t(b)=m 3 } c a+b {φ( c)+b2 +b 3 = z} p n Y X (y x, x 2, x 3 ) ( x, z) A n ɛ (X,X 2 +X 3 y ) z x 2 +x 3 87

99 Therefore, E{P } 2 nr 2 nr 2 2 nr 3 2 nr {Cr (m )=x } 2 nr 2 2 nr 3 m = x A n ɛ (X ) m 2 = m 3 = x 2 A n ɛ (X 2 ) x 3 A n ɛ (X 3 ) = 4 E{θ 2 (m 2 )} E{θ 3 (m 3 )} 2 nr m = a,b J m m θ Θ θ r y Y n ( x, z) A n ɛ (X,X 2 +X 3 y ) z x 2 +x 3 p n Y X (y x, x 2, x 3 ) c T θ (a+b) G n A n ɛ (X ) P (φ(a + b) + b 2 + b 3 = x 2 + x 3, φ( c) + b 2 + b 3 = z) 2 nr 2 nr 2 2 nr 3 2 nr {Cr (m )=x } 2 nr 2 2 nr 3 m = x A n ɛ (X ) m 2 = m 3 = x 2 A n ɛ (X 2 ) x 3 A n ɛ (X 3) θ Θ θ r 4 E{θ 2 (m 2 )} E{θ 3 (m 3 )} 2 nr m = a,b J m m θ Θ θ r y Y n ( x, z) A n ɛ (X,X 2 +X 3 y ) z x 2 +x 3 +H n θ p n Y X (y x, x 2, x 3 ) c T θ (a+b) 2 nr 2 n[h(x Y )+ɛ] 2 n[h(z [Z] θx Y )+ɛ] T θ 2 n[h(x ) ɛ] H n θ G n A n ɛ (X ) G n H θ n Note that T θ = 2 n( ω θ)r g. Therefore, in order for the probability of error to go to zero, it suffices to have R + ( ω θ )R g < I(X ; Y ) + log H θ H(Z [Z] θ X Y ) for θ r. For optimum weights {w p,r } (p,r) Q(G), the condition R + R g < I(X ; Y ) + I c.c. (Z; X Y ) implies R g < (I(X ; Y ) R ) + min θ Θ θ r (a) = min θ Θ θ r min θ Θ θ r [I(X ; Y ) R ] + min ω θ θ Θ θ r ω θ [log H θ H(Z [Z] θ X Y )] ω θ [log H θ H(Z [Z] θ X Y )] ω θ [I(X ; Y ) R + log H θ H(Z [Z] θ X Y )] 88

100 which is the desired condition. In the above equations, (a) follows since the maximum of ω θ is attained for θ = 0 and is equal to. Similarly, for P 2 we have E{P 2 } 2 nr 2 nr 2 2 nr 3 2 nr {Cr (m )=x } 2 nr 2 2 nr 3 m = x A n ɛ (X ) m 2 = m 3 = x 2 A n ɛ (X 2 ) 2 E{θ 2 (m 2 )} 2 nr m = a,b J m m x 3 A n ɛ (X 3 ) 2nR 2 n[h(x Y,X 2 +X 3 )+ɛ] 2 n[h(x ) ɛ] 2 E{θ 3 (m 3 )} y Y n G 2n 2 nr 2 2 nr 2 Therefore, it suffices to have R < I(X ; Y, X 2 + X 3 ). p n Y X (y x, x 2, x 3 ) ( x,x 2 +x 3 ) A n ɛ (X,X 2 +X 3 y ) A n ɛ (X ) 89

CHAPTER IV

Non-Abelian Group Codes

There are several results in the literature suggesting that non-Abelian group block codes do not exhibit good coding performance. Although there is no known general method to construct such codes, it is believed that these codes have poor Hamming distance properties and are inferior to Abelian group codes. In this chapter, we show that, contrary to this view, non-Abelian group codes can have good coding performance and should not be ignored. We show that these codes can be superior to their Abelian counterparts if they are employed with joint typicality decoding. Moreover, we show that in certain multi-terminal communication problems such codes outperform random codes and other known structured codes by achieving points outside the known rate region. To do so, we construct an ensemble of non-Abelian group codes over Dihedral groups, which constitute an important class of non-Abelian groups.

4.1 Introduction

Algebraic codes are an important class of codes in coding and information theory, and the information-theoretic performance limits of such codes have been studied extensively in the literature [7, 8, 23, 35, 42, 57, 6]. It is known that linear codes are optimal for the point-to-point symmetric channel coding problem when the size of

the channel input alphabet is a prime power [23, 25]. These codes are also optimal for the lossless compression of a binary source [4]. Linear codes are a special class of algebraic codes which can only be defined over finite fields, hence over alphabets whose size is a power of a prime. The natural extensions of linear codes to arbitrary alphabets are called group codes, which are classified as Abelian (commutative) and non-Abelian (non-commutative) group codes.

Structured codes are equally important in multi-terminal communication problems. It is shown in [4] that for a special case of the distributed source coding problem, the average performance of the ensemble of linear codes can be superior to that of random codes. In recent years, this phenomenon has been observed for a wide class of multi-terminal problems [42, 50, 55]. Thus, characterizing the information-theoretic performance limits of these codes has become important. However, the structure of the code restricts the encoder to abide by certain algebraic constraints, and hence the performance of such codes is inferior to that of random codes in some communication settings. For example, linear codes are the most structured class of codes, and for some problems in multi-terminal communications they are not optimal.

Abelian group codes are a generalization of linear codes which are algebraically structured and can be defined for any alphabet. Group codes were first studied by Slepian [68] for the Gaussian channel. In [6], the capacity of group codes for certain classes of channels has been found. Further results on the capacity of group codes were established in [7, 8, 6]. Abelian group codes can outperform unstructured codes as well as linear codes in certain communication problems [42]. It turns out that for point-to-point communications, Abelian group codes are inferior to linear and random codes.

The class of Abelian group codes is a small fraction of group codes. Another step towards reducing the constraints of the code while maintaining some algebraic structure would be to consider non-Abelian group codes. However, it has been sug-

gested by several authors that non-Abelian group codes are inferior to Abelian group codes [27], [34], [46]. Moreover, they suggest that asymptotically good group codes over non-Abelian groups may not exist.

In this chapter, we consider the problem of evaluating the performance of non-Abelian group codes. Since there is no known method to construct such codes, we first characterize an ensemble of non-Abelian group block codes over an important class of non-Abelian groups, namely the Dihedral groups. We show that these codes can be characterized as the product of two dependent linear subcodes, each built on one of the two generators of the group. The dependency of the two linear subcodes is dictated by the fact that the two subcodes must commute to ensure the closure of the code under the group operation. This is much like any code over Abelian groups; the difference is that in the Abelian case, the commutativity of the linear subcodes is automatically satisfied and hence the subcodes can be chosen independently. We use this ensemble for a simple point-to-point channel and observe that typical codes in this ensemble achieve the symmetric capacity of this specific channel. We show that this could not have been possible if we were to restrict ourselves to any Abelian subgroup of the input alphabet. Moreover, we show that this performance is superior to the performance of Abelian group codes built for this channel. We also consider a multi-terminal communication problem in which two users send codewords over a multiple access channel and, at the receiver, we wish to reconstruct the product of the two codewords, where the product is the group operation. We show that these codes are superior to random codes as well as linear codes in certain cases. We use a combination of algebraic and information-theoretic tools for this task.

4.2 Preliminaries

Dihedral Groups

A dihedral group of order 2p is the group of symmetries of a regular p-gon, including reflections and rotations and any combination of these operations. A dihedral group can be represented as a quotient of a free group as follows:

D_2p = ⟨ x, y | x^p = 1, y^2 = 1, xyxy = 1 ⟩.

Dihedral groups are an important class of non-Abelian groups. Note that N = ⟨ x | x^p = 1 ⟩ = {1, x, ..., x^{p−1}} is a normal subgroup of D_2p. The group D_6 is the smallest non-Abelian group. Note that for two elements g, h in D_6, g · h may not be equal to h · g.

Typicality

We use the notion of strong typicality throughout this chapter (see Section 2.1).

Notation

For a set A, |A| denotes its size (cardinality), and for an element g of a group G, |g| denotes its order. Let x be a generator of the group G whose order is a positive integer p, and let u = (u_1, ..., u_n) be a vector in Z_p^n. Then x^u denotes the element (x^{u_1}, ..., x^{u_n}) of G^n.

4.3 Group Codes over D_2p

Although there has been a lot of work on the properties of group codes in the literature, there is no universal approach to constructing the ensemble of such codes over arbitrary groups. Indeed, even for the smallest non-Abelian group, namely D_6, the ensemble of group codes has not been characterized. We do so in this section by constructing an ensemble of group codes over the group D_2p. First, we consider the case where p is a prime. The following theorem is the main result of this section:

Theorem IV.1. Direct: Let N be a subgroup of {1, x, ..., x^{p−1}}^n ≅ Z_p^n and let M be a subgroup of ∏_{i=1}^n {1, x^{α_i} y} ≅ Z_2^n for some α_1, ..., α_n ∈ {0, 1, ..., p−1}. If N and M commute, i.e. if N · M = M · N, then C = N · M = M · N is a group code over D_2p.

Converse: Let C be any group code over D_2p of length n. Then C can be decomposed as C = N · M, where N is a subgroup of {1, x, ..., x^{p−1}}^n ≅ Z_p^n and M is a subgroup of ∏_{i=1}^n {1, x^{α_i} y} ≅ Z_2^n for some α_1, ..., α_n ∈ {0, 1, ..., p−1}, such that N · M = M · N.

Note that this theorem facilitates the construction of group codes over D_2p. The two subcodes N and M are linear codes which can easily be constructed by taking the images of homomorphisms, i.e. for some positive integer l and matrix G ∈ Z_p^{l×n},

N = { x^{uG} | u ∈ Z_p^l },

and for some positive integer k, matrix H ∈ Z_2^{k×n} and numbers α_1, ..., α_n ∈ {0, 1, ..., p−1},

M = { (x^{α_1} y, ..., x^{α_n} y)^{vH} | v ∈ Z_2^k },

where (x^{α_1} y, ..., x^{α_n} y)^{vH} is the element of D_2p^n whose ith component is x^{α_i} y if the ith component of vH is one, and is 1 otherwise. Note that some care should be taken when choosing the matrices G and H to ensure that the two subcodes commute. The proof of the direct part of Theorem IV.1 is standard and will be provided in a more complete version of this work. The converse part of the theorem guarantees that all group codes can be constructed in this manner; i.e. all group codes can be decomposed into two subcodes which commute. The rest of this section is devoted to proving the converse. We do so using the following lemmas.

Lemma IV.2. For all g ∈ D_2p^n, we have
a) |g| ∈ {1, 2, p, 2p}; in particular, g^{2p} = 1.
b) g^2 ∈ N^n.
c) g^p ∈ {1, y, xy, ..., x^{p−1} y}^n.
d) If |g| = 2 then g ∈ {1, y, xy, ..., x^{p−1} y}^n.
e) If |g| = p then g ∈ N^n.

Proof. The proof is standard and will be provided in a more complete version of this work.
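The dihedral arithmetic behind Lemma IV.2 and the N · M decomposition of Theorem IV.1 can be checked numerically on a toy example. The following sketch is our own illustration (not part of the dissertation); the generator matrix, the parameter values, and all function names are chosen purely for illustration. It represents an element x^a y^b of D_6 as the pair (a, b), verifies the order claim of Lemma IV.2 for single components, and checks closure of a small length-2 code of the form N · M.

```python
# A minimal numerical check (ours) of the dihedral arithmetic and of the
# N*M construction in Theorem IV.1, for p = 3, i.e. the group D_6.
import itertools

p = 3  # write an element x^a y^b of D_6 as the pair (a, b)

def mul(g, h):
    """Product in D_2p, using the defining relation y x = x^{-1} y."""
    (a1, b1), (a2, b2) = g, h
    return ((a1 + (-1) ** b1 * a2) % p, (b1 + b2) % 2)

def order(g):
    """Order of g; by Lemma IV.2 it must lie in {1, 2, p, 2p}."""
    e, acc, k = (0, 0), g, 1
    while acc != e:
        acc, k = mul(acc, g), k + 1
    return k

assert {order((a, b)) for a in range(p) for b in range(2)} <= {1, 2, p, 2 * p}

def vec_mul(g, h):
    """Componentwise product of two length-n vectors over D_6."""
    return tuple(mul(gi, hi) for gi, hi in zip(g, h))

# Toy length-2 code C = N*M (with alpha_1 = alpha_2 = 0):
#   N = {x^{uG} : u in Z_3} with G = [1 2],  M = {identity, (y, y)}.
N = [(((1 * u) % p, 0), ((2 * u) % p, 0)) for u in range(p)]
M = [((0, 0), (0, 0)), ((0, 1), (0, 1))]
code = {vec_mul(a, b) for a in N for b in M}

# N and M commute as sets here, so C is closed under the group operation:
assert all(vec_mul(c1, c2) in code
           for c1, c2 in itertools.product(code, repeat=2))
print(f"|C| = {len(code)} (a group code over D_6 of length 2)")
```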

106 Lemma IV.3. For C D n 2p, let N = C N n. Then we have C = 2 r N for some non-negative integer r. Proof. Note that N n is a normal subgroup of D n p. Therefore the product CN n is also a subgroup of D n p and hence CN n divides D n p = (2p) n. Furthermore, we have CN n = C N n C N n = C pn N It follows that C N integer r. divides 2 n and this implies C = 2 r N for some non-negative In the following, we consider the implications of this lemma for a few special cases. These special cases are useful in proving the general case described in Lemmas IV.4 and IV.5. Special Case r = 0: In this case, the code C is contained in the subgroup N n of D n 2p. This means C is a linear code. Special Case r = : In this case, we have C = 2 N. Since N = C N n C, there exist an element g C such that g / N n. Since N C and g C, we must have N Ng C. Note that different cosets of a subgroup are disjoint, therefore N Ng = 2 N = C. Therefore, we must have C = N Ng. By part (b) of Lemma IV.2 we have g 2 N n. By the closure of the code C under multiplication, we also have g 2 C. Therefore, we have g 2 N or equivalently N = Ng. 2 Note that since g / N, N and g N are disjoint. Since g N C, we have Ng = g N. Note that g 2 N implies Ng = Ng. 3 Let h = g. 3 By part (c) of Lemma IV.2 the order of h is at most two. Therefore, we have C = N Nh where h takes values from {, y, xy,, x p y} n. Furthermore, Ng = g N implies Nh = h N and g / N n implies h. Note that always h 2 = N. 95

107 To summarize this case, for r =, we have C = N Nh for some h {, y, xy,, x p y} n such that h and Nh = h N. These conditions imply N Nh = N, h. Now Assume C = N, g where C N n = N. Then similarly to the above arguments, we have g 2 N. Also note that for integers a, b, if a + b is even, then gng a b N n and gng a b C. Since gng a b = N, we require gng a b = N. Similarly, we can show that if a+b is odd, then gng a b = Ng. This implies any sequence formed by N and g can be reduced in one of the following two forms: N or Ng. Hence, the size of C = N, g can be at most 2 N if we require N = C N n. Special Case r = 2: Similarly to the above case, for r = 2, we can show C = N Nh Nh 2 Nh h 2 for some h, h 2 {, y, xy,, x p y} n such that h, h 2, h h 2, Nh j = h j N for j =, 2 and h, h 2 commute. These conditions imply N Nh Nh 2 Nh h 2 = N, h, h 2. We address the general case in the following lemma: Lemma IV.4. For C D n 2p let N = C N n. Write C = N, g,, g k for some elements g,, g k C. Then we have (a) For all j =,, k, g j N = Ng j and g 2 j N. (b) For all A [, k] and for all permutations π : A A, ( ) ( ) N g π(j) = N g j (c) C = A [,k] [ ( )] N j A g j j A j A (4.) Proof. The proofs of parts (a) and (b) are through induction on k. We have shown above that these statement are valid for k =, 2. Assume that they are true for all 96

108 k K for some positive integer K > 3. We show that this implies the statements are true for k = K. Proof of (a): For j =,, K, let C = N, g j. We have N C N n C N n = N. Therefore, C N n = N and hence we can use the induction hypothesis to conclude g j N = Ng j and gj 2 N. Proof of (b): If A K, let C = N, g j : j A. Similarly to the argument above, we can show that C N n = N and therefore we can use the induction hypothesis to conclude (4.). Now assume A = K or equivalently A = [, K] and fix a permutation π : A A. If π(k), use part (a) to write g π() g π(k ) g π(k) N = g π() g π(k ) Ng π(k) (i) = g g j Ng π(k) = g (ii) = g j [2,K]\{π(K)} j [2,K]\{π(K)} j [2,K] g j g j g π(k) N N=g g 2 g K N where in (i) and (ii) we use the induction hypothesis for k = K. If π(k) =, use the induction hypothesis for k = 2 to write g π() g π(k ) g π(k) N = g π() g π(k) g π(k ) N After this step, we can use the same argument as above to show (4.). Proof of (c): For any w C = N, g,, g k we can find a sequence of integers α i,, α ik and β i for i Z such that w N i Z ( g α i g α ik k N β). Using the result of part (a) and the fact that N 2 = N, we get w N i Z (gα i g α ik k ). ( i Using the result of part (b) to reorder elements we obtain w N g α ) i i g α ik. Using the result of part (b) and the fact that gj 2 N for j =,, k, we get ( i w N g α ) ( i (mod 2) i g α ik (mod 2) ). This is equivalent to w N j A g j for A = {j =,, k i α i (mod 2) = }. k k 97

109 Lemma IV.5. For C D n 2p, we can find elements h,, h k {, y, xy,, x p y} and a subgroup N N n such that C = N, h,, h k and (a) For all j =,, k, h j N = Nh j. (b) All h j s commute. Proof. Let N = C N n and write C = N, g,, g k for some elements g,, g k C. For j =,, k, define h (0) j = g 3 j. Define h = h (0) and for j = 2,, k, define h j sequentially as follows: For l =,, j, let h (l) j = h (l ) j h l h (l ) j h l h (l ) j and finally let h j = h (j ) j. It is straightforward to verify that with these definitions C = N, h,, h k and (a) and (b) are satisfied. The complete proof will be provided in a more complete version of this work. We are ready to prove the converse part of Theorem IV.. For any C D n 2p, let N = C N n and let M = h,, h k where h,, h k are as in Lemma IV.5. Is it straightforward to verify that N and M satisfy the conditions of the theorem. Remark IV.6. Although in this section we addressed the case where p is a prime, Theorem IV. is valid for arbitrary Dihedral groups D 2q for an arbitrary integer q 3. The difference is that in the general case, the subcode N need not be a linear code but rather it is an Abelian group codes over the cyclic group Z q. The construction of Abelian group codes has been addressed in [6]. 4.4 The Ensemble of Codes In this section, we present an ensemble of codes which consists of all non-abelian group codes over D 2p for some prime p. We make use of Lemma IV.5 to construct an ensemble of codes of length n as follows: For i =,, n, choose α,, α n {0,,, p}. For some m n, choose a partition P = {P,, P m } of [, n] so that for i =,, m, P i = r i n for some 0 < r i. 98

110 For some 0 k n, choose subsets I,, I k of [, m]. For j =,, k, let A j = i Ij P i. For j =,, k, let h j = (h j,, h nj ) D n 2p where h ij = if i A j and h ij = x α i y if i A c j. For some 0 κ and for i =,, m, Let G i be a matrix in {0,, 2} [,κr in] P m. A message is indexed by a set J [, k] and by u i Z [,κr in] p for i =,, m. The encoder maps the message (J, u,, u m ) to Enc(J, u,, u m ) = x u G+ u mg m The rate of each code in this ensemble is equal to R = n (k + n i= κr i log p). The rest of this section is devoted to proving that this ensemble contains all group codes. As in the statement of Lemma IV.5, let C = N, h,, h k and for j =,, k, let h j = (h j,, h nj ). j J h j Lemma IV.7. For i =,, n, there exists α i such that for all j =,, k, h ji {, x α i y}. Proof. Fix an i [, n] and assume there exists a j [, k] with h ij. Since h ij {, y, xy,, x p y}, we can let h ij = x α i y for some α i. Since all of h j s commute, for all j =,, k, we have h ij h ij = h ij h ij where h ij {, y, xy,, x p y}. This can only happen if h ij {, x α i y}. This proves the claim. For j =,, k, let A j = {i h ij = }. Lemma IV.8. If Nh = h N, then N = N A N A c such that Proj A (N A c ) = 0 and Proj A c (N A ) = 0. 99

111 Proof. Let g = (g,, g 2 ) be a vector in N and with a slight abuse of notation let s write g = g A g A c and h = h A, h A c, (Note that h A, is a vector of all ones by definition and h A c, is a vector of elements of the form x α y). We have h gh = g A g A c C. Since h gh N n, we must have h gh = g A g A c N. Therefore, (g h gh ) p+ 2 = g A A c N. With a similar argument, we can show that A g A c N. To complete the proof, let N A = {g A A c g N} = Proj A (N) N A c = { A g A c g N} = Proj A c (N) In other words, we have shown that if Nh = h N, then N = Proj A (N) Proj A c (N). Lemma IV.9. The subcode N can be decomposed as J [,k] Proj ( j J A j ) ( j J c A c j ) (N) Proof. By Lemma IV.8, for all j =,, k, we have N = Proj Aj (N) Proj A c j (N). We have ) N = Proj A2 (Proj A (N) Proj A c (N) ) Proj A c 2 (Proj A (N) Proj A c (N) = Proj A A 2 (N) Proj A c A 2 (N) Proj A A c 2 (N) Proj A c Ac 2 (N) This proves the lemma for k = 2. The general case can be proved in a similar fashion. Define the collection of sets P as {( ) ( ) J } P = j J A j j J c A c j [, k] 00

Then P forms a partition of [1, n], P = {P_1, ..., P_m}, such that each A_j can be written as a union of P_i's. To summarize, we have N = ∏_{i=1}^m Proj_{P_i}(N). In the construction above, the matrix G_i is used to form the subgroup Proj_{P_i}(N).

4.5 Examples: Non-Abelian Group Codes Can Have a Good Performance

In this section, we consider two simple examples. These examples are chosen so that the construction and analysis of the ensemble of non-Abelian group codes and the computation of the achievable rate become simple. In the first example, we show that the achievable rate using non-Abelian group codes can be strictly larger than the rate achievable using Abelian group codes for the point-to-point problem. We also show that this rate is not achievable if we restrict ourselves to any Abelian subgroup of the alphabet. In the second example, we consider a scenario in which two users try to communicate the sum of two symbol streams with a joint decoder through a multiple access channel. We show that for this specific example, the achievable rate using non-Abelian codes can be strictly larger than the rate achievable using random codes. This shows that in certain multi-terminal communication problems, non-Abelian group codes can be beneficial by achieving points which are not achievable using other types of codes.

In both examples, the parameters of the ensemble of codes are as follows: α_1 = ... = α_n = 0, m = n, P_i = {i}, r_i = 1/n, k is a fixed parameter determined by the rate of the code, I_1, ..., I_k are uniformly random, κ = 1, and G_i is uniformly random. Note that with these parameters, N = N^n and M = { y^{vH} | v ∈ Z_2^k } for some uniformly random k × n matrix H over Z_2 (see the sketch below).
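The following sketch (ours; the tiny parameter values and all names are illustrative, not the dissertation's) draws one code from this simplified ensemble — a uniformly random binary matrix H, the full rotation subcode N = N^n, and M = { y^{vH} } — and checks that the resulting set of codewords is closed under the D_6 operation.

```python
# Drawing one code from the simplified ensemble used in these examples.
import itertools
import math
import random

p, n, k = 3, 2, 2
random.seed(0)
H = [[random.randrange(2) for _ in range(n)] for _ in range(k)]

def mul(g, h):
    """Componentwise D_6 product; a component (a, b) means x^a y^b."""
    return tuple(((a1 + (-1) ** b1 * a2) % p, (b1 + b2) % 2)
                 for (a1, b1), (a2, b2) in zip(g, h))

def codeword(u, v):
    """Encoder: the message (u, v) in Z_p^n x Z_2^k maps to x^u * y^{vH}."""
    m = [sum(v[j] * H[j][i] for j in range(k)) % 2 for i in range(n)]
    return tuple((u[i], m[i]) for i in range(n))

code = {codeword(u, v)
        for u in itertools.product(range(p), repeat=n)
        for v in itertools.product(range(2), repeat=k)}

# With N the full rotation subgroup, N and M commute as sets, so C is closed:
assert all(mul(c1, c2) in code
           for c1, c2 in itertools.product(code, repeat=2))
print(f"{len(code)} distinct codewords; nominal rate k/n + log2(p) = "
      f"{k / n + math.log2(p):.3f} bits per symbol")
```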

4.5.1 Example 1: Point-to-Point Problem

Consider the channel depicted in Figure 4.1.

[Figure 4.1: A simple channel with input alphabet D_6 = {1, x, x², y, xy, x²y}, output alphabet {a, b, c, d, e, f}, and crossover probability ε.]

The Shannon capacity of this channel is C = log 6 − h(ε) bits per channel use. It turns out that, using joint typicality decoding, the average code in the ensemble of non-Abelian group codes can achieve the Shannon capacity R = log 6 − h(ε). If we restrict ourselves to Abelian subgroups of D_6, we can achieve (in bits per channel use) 1.585 for {1, x, x²}, 1 − h(ε) for {1, y}, and 1.000 for {1, xy} and {1, x²y}. All of these rates are less than the rate achievable using non-Abelian group codes.

4.5.2 Example 2: Computation Over MAC

In this section, we use the ensemble of codes constructed in Section 4.4 for a problem of computation over multiple access channels. Consider the two-user MAC depicted in Figure 4.2, where X, Z take values from the Dihedral group D_6 and Y takes values from a finite set Y.

[Figure 4.2: Two-user MAC computing the D_6 operation: the inputs X and Z are multiplied to form W = X · Z, and W is passed through the point-to-point channel W_{Y|W}.]

When the inputs of the channel are x, z ∈ D_6, the channel output is y ∈ Y with conditional probability W_{Y|XZ}(y|x, z). Let n be the block length and let C_1 ⊆ D_6^n and C_2 ⊆ D_6^n be codebooks corresponding to Users 1 and 2 respectively. If User 1

sends a message x ∈ C_1 and User 2 sends a message z ∈ C_2, the decoder wishes to reconstruct x · z losslessly, where the multiplication is the component-wise group operation. Note that in this example, the MAC is multiplicative in the sense that the two terminals get multiplied (the group operation) and then the result is passed through a point-to-point channel. The channel W_{Y|W} is taken to be the channel of Example 1. The encoding/decoding strategy is as follows: let

C_1 = { B_1 x^u y^{vH} | u ∈ Z_3^n, v ∈ Z_2^k }   and   C_2 = { y^{vH} x^u B_2 | u ∈ Z_3^n, v ∈ Z_2^k }

for some H ∈ Z_2^{k×n}. The decoder, after receiving the channel output, looks for a unique codeword in C_1 · C_2 which is jointly typical with the channel output. If it doesn't find such a codeword, it declares an error. The average probability of error for the codes in this ensemble is given by

P_err = (1 / (3^{2n} 2^{2k})) ∑_{x ∈ C_1} ∑_{z ∈ C_2} ∑_{y ∈ Y^n} W^n_{Y|W}(y | x·z) ∑_{w ∈ C_1·C_2, w ≠ x·z} 1{ w ∈ A^n_ε(W | y) }.

This probability of error approaches zero as n increases for R < log 6 − h(ε) bits per channel use, which is equal to the point-to-point capacity of the channel. If we restrict ourselves to the Abelian subgroup {1, x, x²}, we can show that the rate R = 1.585 is achievable. The achievable rate using random codes is equal to R = I(X, Z; Y)/2 = (log 6 − h(ε))/2. We observe that for this example, non-Abelian group codes outperform Abelian group codes, and Abelian group codes outperform random codes. This is due to the fact that the structure of the channel is matched to the structure of non-Abelian group codes.
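To make the numbers quoted in Examples 1 and 2 concrete, the following short sketch (ours; the function and variable names are illustrative) evaluates the rate expressions above as a function of the crossover probability ε.

```python
# Evaluate the rates quoted in Examples 1 and 2 as a function of epsilon.
import math

def h(eps):
    """Binary entropy in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

for eps in (0.01, 0.05, 0.1):
    non_abelian = math.log2(6) - h(eps)        # D_6 ensemble (Example 1)
    abelian_rot = math.log2(3)                 # subgroup {1, x, x^2}: 1.585
    abelian_y   = 1 - h(eps)                   # subgroup {1, y}
    abelian_xy  = 1.0                          # subgroups {1, xy}, {1, x^2 y}
    random_mac  = (math.log2(6) - h(eps)) / 2  # random codes, Example 2
    print(f"eps={eps:.2f}: D_6 ensemble {non_abelian:.3f}, "
          f"{{1,x,x^2}} {abelian_rot:.3f}, {{1,y}} {abelian_y:.3f}, "
          f"{{1,xy}} {abelian_xy:.3f}, random (MAC) {random_mac:.3f}")
```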

CHAPTER V

Lattice Codes for Multi-terminal Communications

5.1 Nested Lattices for Point-to-Point Communications

In this section, we show that nested lattice codes achieve the capacity of arbitrary channels with or without non-causal state information at the transmitter. We also show that nested lattice codes are optimal for source coding with or without non-causal side information at the receiver for arbitrary continuous sources.

Lattice codes for continuous sources and channels are the analogue of linear codes for discrete sources and channels and play an important role in information theory and communications. Linear/lattice and nested linear/lattice codes have been used in many communication settings to improve upon the existing random coding bounds [7, 4 44, 50, 55, 69]. In [7] and [43], the existence of lattice codes satisfying Shannon's bound has been shown. These results have been generalized, and the close relation between linear and lattice codes has been pointed out in [44]. In [76], several results regarding lattice quantization noise in high resolution have been derived, and the problem of constructing lattices with an arbitrary quantization noise distribution has been studied in [30]. Nested lattice codes were introduced in [79], where the concept of structured binning is presented. Nested linear/lattice codes are important because in many com-

munication problems, especially multi-terminal settings, such codes can be superior in average performance compared to random codes [42]. It has been shown in [77] that nested lattice codes are optimal for the Wyner-Ziv problem when the source and side information are jointly Gaussian. The dual problem of channel coding with state information has been addressed in [7], and the optimality of lattice codes for Gaussian channels has been shown. In [5] it has been shown that nested linear codes are optimal for discrete channels with state information at the transmitter.

In this section, we focus on two problems: 1) point-to-point channel coding with state information at the encoder (the Gelfand-Pinsker problem [3]) and 2) lossy source coding with side information at the decoder (the Wyner-Ziv problem [74], [73]). We consider these two problems in their most general settings, i.e. when the source and the channel are arbitrary. We use nested lattice codes with joint typicality decoding rather than lattice decoding. We show that in both settings, from an information-theoretic point of view, nested lattice codes are optimal.

5.1.1 Preliminaries

5.1.1.1 Channel Model

We consider continuous alphabet memoryless channels with knowledge of channel state information at the transmitter, used without feedback. We associate two sets X and Y with the channel as the channel input and output alphabets. The set of channel states is denoted by S, and it is assumed that the channel state is distributed over S according to P_S. When the state of the channel is s ∈ S, the input-output relation of the channel is characterized by a transition kernel W_{Y|XS}(y|x, s) for x ∈ X and y ∈ Y. We assume that the state of the channel is known at the transmitter non-causally. The channel is specified by (X, Y, S, P_S, W_{Y|XS}, w), where w : X × S → R_+ is the cost function.

5.1.1.2 Source Model

The source is modeled as a discrete-time random process X with each sample taking values in a fixed set X called the alphabet. Assume X is distributed jointly with a random variable S according to the measure P_{XS} over X × S, where S is an arbitrary set. We assume that the side information S is known to the receiver non-causally. The reconstruction alphabet is denoted by U, and the quality of reconstruction is measured by the average of a single-letter distortion function d : X × U → R_+. We denote such sources by (X, S, U, P_{XS}, d).

Linear and Coset Codes Over Z_p

For a prime number p, a linear code over Z_p of length n and rate R = (k/n) log p is a collection of p^k codewords of length n which is closed under mod-p addition (and hence mod-p multiplication). In other words, linear codes over Z_p are subspaces of Z_p^n. Any such code can be characterized by its generator matrix G ∈ Z_p^{k×n}. This follows from the fact that any subgroup of an Abelian group corresponds to the image of a homomorphism into that group. The linear encoder maps a message tuple u ∈ Z_p^k to the codeword x, where x = uG and the operations are done mod-p. The set of all message tuples for this code is Z_p^k and the set of all codewords is the range of the matrix G, i.e.

C = { uG | u ∈ Z_p^k }.    (5.1)

A coset code over Z_p is a shift of a linear code by a fixed vector. A coset code of length n and rate R = (k/n) log p is characterized by its generator matrix G ∈ Z_p^{k×n} and its shift vector (dither) B ∈ Z_p^n. The encoding rule for the corresponding coset code is given by x = uG + B, where u is the message tuple and x is the corresponding

codeword, i.e.

C = { uG + B | u ∈ Z_p^k }.    (5.2)

In a similar manner, any linear code over Z_p of length n and rate (at least) R = ((n−k)/n) log p is characterized by its parity check matrix H ∈ Z_p^{k×n}. This follows from the fact that any subgroup of an Abelian group corresponds to the kernel of a homomorphism from that group. The set of all codewords of the code is the kernel of the matrix H, i.e.

C = { u ∈ Z_p^n | Hu = 0 },    (5.3)

where the operations are done mod-p. Note that there are at least p^{n−k} codewords in this set. A coset code over Z_p is a shift of a linear code by a fixed vector. A coset code of length n and rate (at least) R = ((n−k)/n) log p can be characterized by its parity check matrix H ∈ Z_p^{k×n} and its bias vector c ∈ Z_p^k as follows:

C = { u ∈ Z_p^n | Hu = c },    (5.4)

where the operations are done mod-p.

Lattice Codes and Shifted Lattice Codes

A lattice code of length n is a collection of codewords in R^n which is closed under real addition. A shifted lattice code is any translation of a lattice code by a real vector. In this chapter, we use coset codes to construct (shifted) lattice codes as follows. Given a coset code C of length n over Z_p and a step size γ, define

Λ(C, γ, p) = γ ( C − ((p−1)/2) 1 ),    (5.5)

where 1 = (1, ..., 1) ∈ Z_p^n. The corresponding mod-p lattice code Λ̄(C, γ, p) is the disjoint union of shifts of Λ by vectors in γpZ^n, i.e.

Λ̄(C, γ, p) = ∪_{v ∈ pZ^n} ( γv + Λ ).
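As a quick illustration of the coset-code and mod-p lattice constructions in (5.2) and (5.5), here is a minimal sketch. It is our own illustration: the parameter values and variable names are assumptions, and only one period of the lattice is materialized.

```python
# A small numerical sketch of a coset code over Z_p, eq. (5.2), and the shifted
# lattice points gamma*(C - (p-1)/2 * 1) of eq. (5.5) built from it.
import itertools
import numpy as np

p, n, k, gamma = 5, 4, 2, 0.5
rng = np.random.default_rng(0)
G = rng.integers(0, p, size=(k, n))   # generator matrix in Z_p^{k x n}
B = rng.integers(0, p, size=n)        # dither / shift vector in Z_p^n

def encode(u):
    """Coset-code encoder x = uG + B (mod p), eq. (5.2)."""
    return (np.asarray(u) @ G + B) % p

coset_code = [encode(u) for u in itertools.product(range(p), repeat=k)]
assert len(coset_code) == p ** k

# One period of the lattice code, eq. (5.5): shift the alphabet so that it is
# symmetric around zero and scale by the step size gamma.
lattice_pts = [gamma * (x - (p - 1) / 2) for x in coset_code]
print("a lattice point:", lattice_pts[0])
# The full mod-p lattice is the union of shifts of these points by gamma*p*Z^n;
# e.g. adding gamma*p to any coordinate yields another point of the lattice.
```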

119 It can be shown that this definition is equivalent to: Λ(C, γ, p) = { γ(v p } 2 ) v Zn, v mod p C Note that Λ(C, γ, p) Λ(C, γ, p) is a scaled and shifted copy of the linear code C Nested Linear Codes A nested linear code consists of two linear codes, with the property than one of the codes (the inner linear code) is a subset of the other code (the outer linear code). For positive integers k and l, let the outer and inner codes C i and C o be linear codes over Z p characterized by their generator matrices G Z l n p and G Z (k+l) n p and their shift vectors B Z n p and B Z n p respectively. Furthermore, assume G = G, B = B G For some G Z k n p. In this case, C o = { ag + m G + B a Z l p, m Zp} k, (5.6) C i = { } ag + B a Z l p (5.7) It is clear that the inner code is contained in the outer code. Furthermore, the inner code induces a partition of the outer code through its shifts. For m Z k p define the mth bin of C i in C o as B m = { } ag + m G + B a Z l p Similarly, Nested linear codes can be characterized by the parity check representation of linear codes. For positive integers k and l, let the outer and inner codes C o and C i be linear codes over Z p characterized by their parity check matrices H Z l n p and 08

120 H Z (k+l) n p assume: and their bias vectors c Z l p and c Z k+l p H = H H, c = c c respectively. Furthermore For some H Z k n p and c Z k p. In this case, C o = { u Z n p Hu = c }, (5.8) C i = { u Z n p Hu = c, Hu = c } (5.9) For m Z k p define the mth bin of C i in C o as B m = { u Z n p Hu = c, Hu = m } The outer code is the disjoint union of all the bins and each bin index m Z k p is considered as a message. We denote a nested linear code by a pair (C i, C o ) Nested Lattice Codes Given a nested linear code (C i, C o ) over Z p and a step size γ, define Λ i (C i, γ, p) = γ(c i p ), 2 (5.0) Λ o (C o, γ, p) = γ(c o p 2 ) (5.) Then the corresponding nested lattice code consists of an inner lattice code and an outer lattice code Λ i (C i, γ, p) = v pz n(γv + Λ i ) (5.2) Λ o (C o, γ, p) = v pz n(γv + Λ o ) (5.3) In this case as well, the inner lattice code induces a partition of the outer lattice code. For m Z k p, define B m = γ(b m p 2 ) (5.4) 09

121 where B m is the mth bin of C i in C o. The mth bin of the inner lattice code in the outer lattice code is defined by: B m = v pz n(γv + B m ) The set of messages consists of the set of all bins of Λi in Λ o. We denote a nested lattice code by a pair ( Λ i, Λ o ) Achievability for Channel Coding and the Capacity-Cost Function A transmission system with parameters (n, M, Γ, τ) for reliable communication over a given channel (X, Y, S, P S, W Y XS, w) with cost function w : X S R + consists of an encoding mapping and a decoding mapping e : S n {, 2,..., M} X n f : Y n {, 2,..., M} such that for all m =, 2,..., M, if s = (s,, s n ) and x = e(s, m) = (x,, x n ), then and E PS { M m= n n w(x i, s i ) < Γ i= } M P r (f(y n ) m X n = e(s n, m)) τ Given a channel (X, Y, S, P S, W Y XS, w), a pair of non negative numbers (R, W ) is said to be achievable if for all ɛ > 0 and for all sufficiently large n, there exists a transmission system for reliable communication with parameters (n, M, Γ, τ) such that log M R ɛ, Γ W + ɛ, τ ɛ n 0

122 The optimal capacity cost function C(W ) is given by the supremum of C such that (C, W ) is achievable Achievability for Source Coding and the Rate-Distortion Function A transmission system with parameters (n, Θ,, τ) for compressing a given source (X, S, U, P XS, d) consists of an encoding mapping and a decoding mapping e : X n {, 2,, Θ}, such that the following condition is met: g : S n {, 2,, Θ} U n P (d(x n, g(e(x n ))) > ) τ where X n is the random vector of length n generated by the source. In this transmission system, n denotes the block length, log Θ denotes the number of channel uses, denotes the distortion level and τ denotes the probability of exceeding the distortion level. Given a source, a pair of non-negative real numbers (R, D) is said to be achievable if there exists for every ɛ > 0, and for all sufficiently large numbers n a transmission system with parameters (n, Θ,, τ) for compressing the source such that log Θ R + ɛ, D + ɛ, τ ɛ n The optimal rate distortion function R (D) of the source is given by the infimum of the rates R such that (R, D) is achievable Typicality We use the notion of weak* typicality with Prokhorov metric introduced in [47]. Let M(R d ) be the set of probability measures on R d. For a subset A of R d define its

ε-neighborhood by

A^ε = { x ∈ R^d | ∃ y ∈ A such that ||x − y|| < ε },

where || · || denotes the Euclidean norm in R^d. The Prokhorov distance between two probability measures P_1, P_2 ∈ M(R^d) is defined as follows:

π_d(P_1, P_2) = inf{ ε > 0 | P_1(A) < P_2(A^ε) + ε and P_2(A) < P_1(A^ε) + ε for every Borel set A in R^d }.

Consider two random variables X and Y with joint distribution P_XY(·, ·) over X × Y ⊆ R^2. Let n be an integer and ε a positive real number. For a sequence pair (x, y) belonging to X^n × Y^n, where x = (x_1, ..., x_n) and y = (y_1, ..., y_n), define the empirical joint distribution by

P̂_xy(A, B) = (1/n) ∑_{i=1}^n 1{ x_i ∈ A, y_i ∈ B }

for Borel sets A and B. Let P̂_x and P̂_y be the corresponding marginal probability measures. The sequence x is said to be weakly* ε-typical with respect to P_X if

π_1(P̂_x, P_X) < ε.

We denote the set of all weakly* ε-typical sequences of length n by A^n_ε(X). Similarly, x and y are said to be jointly weakly* ε-typical with respect to P_XY if

π_2(P̂_xy, P_XY) < ε.

We denote the set of all weakly* ε-typical sequence pairs of length n by A^n_ε(XY). Given a sequence x ∈ A^n_ε(X), the set of conditionally ε-typical sequences A^n_ε(Y|x) is defined as

A^n_ε(Y|x) = { y ∈ Y^n | (x, y) ∈ A^n_ε(X, Y) }.
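The intuition behind weak* typicality is that the empirical measure of a long i.i.d. sequence converges to the true law, so for any fixed ε the Prokhorov-distance condition above is eventually satisfied with high probability. The sketch below (ours) illustrates this convergence numerically; note that it only compares empirical and true probabilities of a few fixed intervals, which is a crude surrogate for, not a computation of, the Prokhorov metric.

```python
# Empirical measures of i.i.d. Gaussian samples converge to the true law.
import math
import numpy as np

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

rng = np.random.default_rng(1)
intervals = [(-math.inf, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, math.inf)]

for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)
    worst = max(abs(np.mean((a < x) & (x <= b)) - (Phi(b) - Phi(a)))
                for a, b in intervals)
    print(f"n = {n:>9}: max |empirical - true| over test intervals = {worst:.4f}")
```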

5.1.1.10 Notation

In our notation, O(ε) is any function of ε such that lim_{ε→0} O(ε) = 0, and for a set G, |G| denotes the cardinality (size) of G.

5.1.2 Nested Lattice Codes for Channel Coding

We show the achievability of the rate R = I(U; Y) − I(U; S) for the Gelfand-Pinsker channel using nested lattice codes for U.

Theorem V.1. For the channel (X, Y, S, P_S, W_{Y|XS}, w), let w : X → R_+ be a continuous cost function. Let U be an arbitrary set and let the random variables S, U, X, Y be distributed over S × U × X × Y according to P_S P_{U|S} W_{X|US} W_{Y|SX}, where the conditional distribution P_{U|S} and the transition kernel W_{X|US} are such that E{w(X)} ≤ W. Then the pair (R, W) is achievable using nested lattice codes over U, where R = I(U; Y) − I(U; S).

5.1.2.1 Discrete U and Bounded Continuous Cost Function

In this section we prove the theorem for the case when U = Û takes values from the discrete set γ(Z_p − (p−1)/2), where p is a prime and γ is a positive number. We use a random coding argument over the ensemble of mod-p lattice codes to prove achievability. Let C_o and C_i be defined as in (5.6) and (5.7), where G is a random matrix in Z_p^{l×n}, ∆G is a random matrix in Z_p^{k×n}, and B is a random vector in Z_p^n. Define Λ_i(C_i, γ, p) and Λ_o(C_o, γ, p) accordingly. The ensemble of nested lattice codes consists of all lattices of the form (5.10) and (5.11). The set of messages consists of all bins B_m indexed by m ∈ Z_p^k. The encoder observes the message m ∈ Z_p^k and the channel state s ∈ S^n, looks for a vector u in the mth bin B_m which is jointly weakly* typical with s, and encodes the message m to x according to W_{X|SU} (a sketch of this bin search is given below). The encoder declares an error if it does not find such a vector.
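The following is a schematic and heavily simplified sketch (ours) of the encoder just described: given a message m, it searches the m-th bin of the nested code for a word that looks typical with the state sequence s. The typicality test below is a toy placeholder based on an empirical-statistics threshold, not the weak* test used in the proof, and all parameter values and names are assumptions for illustration.

```python
# Schematic bin-search encoder for the nested mod-p lattice ensemble.
import itertools
import numpy as np

p, n, l, k, gamma = 3, 6, 2, 1, 1.0
rng = np.random.default_rng(2)
G  = rng.integers(0, p, size=(l, n))   # inner-code generator
dG = rng.integers(0, p, size=(k, n))   # extra rows defining the outer code
B  = rng.integers(0, p, size=n)        # common dither

def bin_codewords(m):
    """All points of the m-th bin {aG + m*dG + B}, mapped to lattice points."""
    for a in itertools.product(range(p), repeat=l):
        word = (np.asarray(a) @ G + np.asarray(m) @ dG + B) % p
        yield gamma * (word - (p - 1) / 2)

def looks_typical(u, s, thresh=0.75):
    """Toy stand-in for the joint weak*-typicality test (assumes E[U S] ~ 0)."""
    return abs(np.mean(u * s)) < thresh

def encode(m, s):
    for u in bin_codewords(m):
        if looks_typical(u, s):
            return u          # then x would be drawn from W_{X|SU}(.|s, u)
    return None               # encoding error

s = rng.standard_normal(n)    # a stand-in state sequence
print(encode((1,), s))
```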

125 After receiving y Y n, the decoder decodes it to m Z k p if m is the unique tuple such that the mth bin B m contains a sequence jointly typical with y. Otherwise it declares error. Encoding Error We begin with some definitions and lemmas. Let For a Z k p, m Z l p, define g(a, m) = γ g(a, m) has the following properties: S = [ γp 2, γp 2 ]n γz n (5.5) ( (ag + m G + B) ) (p ) 2 Lemma V.2. For a Z l p and m Z k p, g(a, m) is uniformly distributed over S. i.e. For u S, P (g(a, m) = u) = p n Proof. Note that B is independent of G and G and therefore ag + m G + B is a uniform variable over Z n p. The lemma follows by noting that S = γ ( Z n p ) (p ) 2 Lemma V.3. For a, ã Z l p and m Z k p if a ã then g(a, m) and g(ã, m) are independent. i.e. For u S and ũ S, P (g(a, m) = u, g(ã, m) = ũ) = p 2n 4

126 Proof. It suffices to show that ag + m G + B and ãg + m G + B are uniform over Z n p and independent. Note that for u, ũ Z n p, P (ag + m G + B = u, ãg + m G + B = ũ) = P (ag + m G + B = u, (ã a)g = ũ u) (a) = P (ag + m G + B = u) P ((ã a)g = ũ u) (b) = p 2n where (a) follows since the B is uniform over Z n p and independent of G and (b) follows since B and G are uniform and ã a 0 Lemma V.4. For a, ã Z l p and m, m Z k p if m m then g(a, m) and g(ã, m) are independent. i.e. For u S and ũ S, P (g(a, m) = u, g(ã, m) = ũ) = p 2n Proof. The proof is similar to the proof of the previous lemma and is omitted. For a message m Z k p and state s S n, the encoder declares error if there is no sequence in B m jointly typical with s. Define θ(s) = {u A n ɛ (Û s)} = {g(a,m) A n ɛ (Û s)} u B m a Z l p ( ) Let Z be a uniform random variable over γ Z p (p ) and hence Z n a uniform 2 random variable over S. Then we have E{θ(s)} = a Z l p we need the following lemmas from to proceed: ) P (Z n A nɛ (Û s) Lemma V.5. Let P XY be a joint distribution on R 2 and P X and P Y denote its marginals. Let Z n be a random sequence drawn according to P n Z. If D(P XY P Z P Y ) is finite then for each δ > 0, there exist ɛ(δ) such that if ɛ < ɛ(δ) and y A n ɛ (P Y ) then lim sup n log P n Z ((Z n, y) A n ɛ (P XY ) D(P XY P Z P Y )+δ 5

127 Proof. This lemma is a generalization of Theorem 2 of [47]. The proof is provided in the Appendix. Lemma V.6. Let P XY be a joint distribution on R 2 and P X and P Y denote its marginals. Let Z n be a random sequence drawn according to PZ n. Then for each ɛ, δ > 0, there exist ɛ(ɛ, δ) such that if y A n ɛ (P Y ) then lim inf n log P n Z ((Z n, y) A n ɛ (P XY ) D(P XY P Z P Y ) δ Proof. This lemma is a generalization of Theorem 22 of [47]. The proof is provided in the Appendix. Using these lemmas we get E{θ(s)} = p l 2 n[d(p ÛS P ZP S )+O(ɛ)] Similarly, let Z n = g(a, m) and Z n = g(ã, m). Note that Z n and Z n are equal if a = ã and are independent if a ã. We have E{θ(s) 2 } = P (Z n, Z ) n A nɛ (Û s) a,ã Z l p = a Z l p ) P (Z n A nɛ (Û s) + a,ã Z l p a ã = p l 2 n[d(p ÛS P ZP S )+O(ɛ)] ) 2 P (Z n A nɛ (Û s) + p l (p l )2 2n[D(P ÛS P ZP S )+O(ɛ)] Therefore var{θ(s)} = E{θ(s) 2 } E{θ(s)} 2 p l 2 n[d(p ÛS P ZP S )+O(ɛ)] 6

128 Hence, P (θ(s) = 0) P ( θ(s) E{θ(s)} E{θ(s)) (a) var{θ(s)} E{θ(s)} 2 p l 2 n[d(p ÛS P ZP S ]+O(ɛ)] Where (a) follows from Chebyshev s inequality. This bound is valid for all s S n. Therefore if l n log p > D(P ÛS P ZP S ) (5.6) then the probability of encoding error goes to zero as the block length increases. Decoding Error The decoder declares error if there is no bin B m containing a sequence jointly typical with y where y is the received channel output or if there are multiple bins containing sequences jointly typical with y. Assume that the message m has been encoded to x according to W X SU where u = g(a, m) and the channel state is s. The channel output y is jointly typical with u with high probability. Given m, s, a and u, the probability of decoding error is upper bounded by P err m Z k pã Z l p m m ) P (g(ã, m) A nɛ (Û y) g(a, m) Anɛ (Û y) (a) = p l p k 2 n[d(p ÛY P ZP Y )+O(ɛ)] Where in (a) we use Lemmas V.4, V.5 and V.6. Hence the probability of decoding error goes to zero if k + l n log p < D(P ÛY P ZP Y ) (5.7) 7

129 The Achievable Rate Using (5.6) and (5.7), we conclude that if we choose l log p sufficiently close n to D(PÛS P Z P S ) and k+l n rate log p sufficiently close to D(P ÛS P ZP S ) we can achieve the R = k n log p D(P ÛY P ZP Y ) D(PÛS P Z P S ) = I(Û; Y ) I(Û; S) Arbitrary U and Bounded Continuous Cost Function Let Q = {A, A 2,, A r } be a finite measurable partition of R d. For random variables U and Y on R d with measure P UY define the quantized random variables U Q and Y Q on Q with measure P UQ Y Q (A i, A j ) = P UY (A i, A j ) The Kullback-Leibler divergence between U and Y is defined as D(U Y ) = sup D(U Q Y Q ) Q where D(U Q Y Q ) is the discrete Kullback-Leibler divergence and the supremum is taken over all finite partitions Q of R d. Similarly, the mutual information between U and Y is defined as I(U; Y ) = sup I(U Q ; Y Q ) Q where I(U Q ; Y Q ) is the discrete mutual information between the two random variables and the supremum is taken over all finite partitions Q of R d. We have shown in Section that for discrete random variables the region given in Theorem V. is achievable. In this part, we make a quantization argument to generalize this result to arbitrary auxiliary random variables. Let S, U, X, Y be distributed according to P S P U S W X US W Y X where in this case U is an arbitrary random variable. We start with the following theorem: 8

130 Theorem V.7. Let F F 2 be an increasing sequence of σ-algebras on a measurable set A. Let F denote the σ-algebra generated by the union n=f n. Let P and Q be probability measures on A. Then D(P Fn Q Fn ) D(P F Q F ) as n where P F denotes the restriction of P on F. Proof. Provided in [33] and [4] for example. For a prime p > 2, a real positive number γ and for i = 0, p define a i = γ(p ) 2 + γi Define the quantization Q γ,p as Q γ,p = {A 0, A 2,, A p } where A 0 = (, a 0 ] A i = (a i, a i ], for i =,, p 2 A p = (a p 2, + ) Let the random variable Ûγ,p take values from {a 0,, a p } according to joint measure P S ÛXY (Û = a i, SXY B) = P SUXY (U A i, SXY B) (5.8) For all Borel sets B R 3. For a fixed γ, let p q be two primes. Then the σ-algebra induced by Q γ,p is included in the σ-algebra induced by Q γ,q. Therefore, for a fixed γ, we can use the above theorem to get I(U Fγ,p ; Y Fγ,p ) I(U Fγ, ; Y Fγ, ) as p (5.9) where U Fγ, is a random variable over Q γ, = {A i i Z} where A i = γ +(γi, γ(i+)] 2 with measure P U Fγ, (A i ) = P U (A i ). Let γ 0 = and define γ n = 2 n. Note that if m > n then F γn, is included in F γm,. 9

131 Also, since dyadic intervals generate the Borel Sigma field ( [49] for example), the restriction of U to the sigma algebra generated by n=f γn, is U itself. We can use Theorem V.7 to get I(U Fγn, ; Y Fγn, ) I(U; Y ) as n (5.20) Combining (5.9) and (5.20) we conclude that for all ɛ > 0, there exist Γ and P such that if γ Γ and p Γ then I(U Fγ,p ; Y Fγ,p ) I(U; Y ) < ɛ Since quantization reduces the mutual information (X Q X Y ), we have I(U Fγ,p ; Y Fγ,p ) I(U Fγ,p ; Y ) I(U; Y ) Therefore I(U Fγ,p ; Y ) I(U; Y ) < ɛ. Also note that I(U Fγ,p ; Y ) = I(Ûγ,p; Y ) since we define the joint measure to be the same. Therefore I(Ûγ,p; Y ) I(U; Y ) ɛ (5.2) With a similar argument, for all ɛ > 0 there exist γ and p such that I(Ûγ,p; S) I(U; S) ɛ (5.22) if we take the maximum of the two p s and the minimum of the two γ s, we can say for all ɛ > 0 there exist γ and p such that both (5.2) and (5.22) happen. consider the sequence P S Û γn,px as n, p. In the next lemma we show that under certain conditions this sequence converges in the weak* sense to P SUX. Lemma V.8. Consider the sequence P S Û γn,px where n and p is such that γ n p as n (Take p to be the smallest prime larger than 2 2n for example.). Then the sequence converges to P SUX in the weak* sense as n. 20

132 Proof. It suffices to show that the three dimensional cumulative distribution function F S Û converges to F γn,px SUX point-wise in all points (s, u, x) R 3 where F is continuous. Let (s, u, x) be a point where F is continuous and for an arbitrary ɛ > 0, let δ be such that F SUX (s, u δ, x) F SUX (s, u, x) < ɛ F SUX (s, u + δ, x) F SUX (s, u, x) < ɛ Let p be such that γ n = 2 n < δ and find p accordingly. Then there exist points a i, a j such that a i [u δ, u] and a j [u, u + δ]. We have F SUX (s, u δ, x) F S Û γn,px (s, a i, x) F S Û γn,px (s, u, x) F S Û γn,px (s, a j, x) F SUX (s, u + δ, x) Therefore F S Û (s, u, x) F γn,px SUX(s, u, x) ɛ. This shows the point-wise convergence of F S Û γn,px. The above lemma implies E PS Ûγn,pX {w(x, S)} converges to E P SUX {w(x, S)} W since w is assumed to be bounded continuous. We have shown that for arbitrary P U S and W X SU, one can find PÛ S and W X S Û induced from (5.8) such that Û is a discrete variable and I(Û; Y ) I(Û; S) I(U; Y ) I(U; S) E PS ÛX {w(x, S)} E P SUX {w(x, S)} Hence, using the result of section 5..2., we have shown the achievability of the rate region given in Theorem V. for arbitrary auxiliary random variables when the cost function is bounded and continuous. 2

133 Arbitrary U and Continuous Cost Function For a positive number l, define the clipped random variable ˆX by ˆX = sign(x) min(l, X ) and let Ŷ be distributed according to W Ŷ ˆX (, ˆx) = W Y X(, ˆx). Lemma V.9. As l, I(U; Ŷ ) I(U; Y ). Proof. Note that for Borel sets B, B 2, B 3 if B 2 ( l, l) then P U ˆXŶ (B, B 2, B 3 ) = P UXY (B, B 2, B 3 ) For any ɛ > 0, let Q = {A,, A r } be a quantization such that I(U Q ; Y Q ) I(U; Y ) < ɛ For an arbitrary δ > 0, assume l is large enough such that P X (( l, l)) > δ. Then P UQ Y Q (A i, A j ) = P UXY (A i, R, A j ) = P UXY (A i, ( l, l), A j )+P UXY (A i, (, l] [l, ), A j ) P UXY (A i, ( l, l), A j ) + P UXY (R, (, l] [l, ), R) = P U ˆXŶ (A i, ( l, l), A j ) + P X ((, l] [l, )) P U Ŷ (A i, A j ) + δ = P UQ Ŷ Q (A i, A j ) + δ Also, P UQ Y Q (A i, A j ) = P UXY (A i, R, A j ) P UXY (A i, ( l, l), A j ) = P U ˆXŶ (A i, ( l, l), A j ) P U ˆXŶ (A i, R, A j ) δ = P U Ŷ (A i, A j ) δ = P UQ Ŷ Q (A i, A j ) δ 22

134 Since the choice of δ is arbitrary and since the discrete mutual information is continuous, we conclude that as ɛ, δ 0 (hence l ), I(U; Ŷ ) I(U; Y ). Since ˆX is bounded and w is assumed to be continuous, w is also bounded. This completes the proof Nested Lattice Codes for Source Coding In this section, we show the achievability of the rate R = I(U; X) I(U; S) for the Wyner-Ziv problem using nested lattice codes for U. Theorem V.0. For the source (X, S, U, P XS, d) assume d : X U R + is continuous. Let U be a random variable taking values from the set U jointly distributed with X and S according to P XS W U X where W U X ( ) is a transition kernel. Further assume that there exists a measurable function f : S U ˆ X such that E{d(X, f(s, U))} D. Then the rate R (D) = I(X; U) I(S; U) is achievable using nested lattice codes Discrete U and Bounded Continuous Distortion Function In this section we prove the theorem for the case when U takes values from the discrete set γ(z p p ) where p is a prime and γ is a positive number. The gener- 2 alization to the case where U is arbitrary and the distortion function is continuous is similar to the channel coding problem and is omitted. We use a random coding argument over the ensemble of mod-p lattice codes to prove the achievability. The ensemble of codes used for source coding is based on the parity check matrix representation of linear and lattice codes. Define the inner and outer linear codes as in (5.8) and (5.9) where H is a random matrix in Z l n p, H is a random matrix in Z k n p, c is a random vector in Z l p and c is a random vector in Z k p. Define Λ i (C i, γ, p) and Λ o (C o, γ, p) accordingly. The set of messages consists of all bins B m indexed by m Z k p. 23
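The binning used in this source coding scheme is algebraic: the outer code is cut out by the parity checks H (with bias c), and the bin index of a codeword is the value of the additional checks, cf. (5.8)–(5.9). The sketch below (ours; parameter values and names are illustrative) shows only this algebraic binning step; the actual encoder additionally searches the outer code for a word typical with the source sequence and transmits that word's bin index.

```python
# Schematic parity-check binning for the nested code used in source coding.
import numpy as np

p, n, l, k = 3, 8, 3, 2
rng = np.random.default_rng(3)
H  = rng.integers(0, p, size=(l, n))   # outer-code parity checks
dH = rng.integers(0, p, size=(k, n))   # extra checks that define the bins
c  = rng.integers(0, p, size=l)        # bias vector

def in_outer_code(u):
    return np.array_equal(H @ u % p, c)

def bin_index(u):
    """The message the source encoder would send for the chosen codeword u."""
    assert in_outer_code(u)
    return tuple(dH @ u % p)

# Find a few outer-code members by brute force and report their bin indices.
found = 0
while found < 3:
    u = rng.integers(0, p, size=n)
    if in_outer_code(u):
        print("codeword", u, "-> bin", bin_index(u))
        found += 1
```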

135 For m Z k p, Let B m be the mth bin of Λ i in Λ o. The encoder observes the source sequence x X n and looks for a vector u in the outer code Λ o which is typical with x and encodes the sequence x to the bin of Λ i in Λ o containing u. The encoder declares error if it does not find such a vector. Having observed the index of the bin m and the side information s, the decoder looks for a unique sequence u in the mth bin which is jointly typical with s and outputs f(u, s). Otherwise it declares error. Encoding Error Define S as in (5.5). For u S define g(u) has the following properties: g(u) = γ u + p 2 Lemma V.. For u S, P (u Λ o ) = P (Hg(u) = c) = p l i.e. All points of S lie on the outer lattice equiprobably. Proof. Follows from the fact that c is independent of H and is uniformly distributed over Z l p. Lemma V.2. For u S and ũ S, if u ũ, P (u Λ o, ũ Λ o ) = P (Hg(u) = c, Hg(ũ) = c) = p 2l i.e. All points of S lie on the outer lattice independently. 24

136 Proof. Note that P (Hg(u) = c, Hg(ũ) = c) = P (Hg(u) = c, H(g(ũ) g(u)) = 0) (a) = P (Hg(u) = c) P (H(g(ũ) g(u)) = 0) (b) = p 2l Where (a) follows since c is uniform and independent of H and (b) follows since H and c are uniform and g(ũ) g(u) is nonzero. For a source sequence x X n, the encoder declare error if there is no sequence u Λ o jointly typical with x. Define θ(x) = u Λ o {u A n ɛ (Û x)} Let Z be a uniform random variable over γ(z p p 2 )) and Zn a uniform random variable over S. We need the following lemmas to proceed: Lemma V.3. With the above construction Λ o = p n l with high probability. Specifically, P (rank(h) = l) = (pn )(p n p)(p n p 2 ) (p n p l ) p nl p n l and hence the probability that Λ o = p n l is close to one if n is large. Furthermore, for i =, 2,, l, P (rank(h) = i) ( ) l p i(l i) i p n(l i) Proof. The first part of the lemma follows since the total number of choices for H is equal to p nl and the number of choices with independent rows is equal to (p n )(p n p)(p n p 2 ) (p n p l ). Now we show the upper bounds. For a matrix H to 25

137 have a rank i, there should exist i independent rows and the rest of the rows must be a linear combination of these rows (There are p i of such linear combinations). Hence the total number of such matrices is upper bounded by ( ) l (p n )(p n p)(p n p 2 ) (p n p i )(p i ) l i i The lemma follows if we upper bound this quantity by ( ) l p ni p i(l i) i Lemma V.4. With θ(x) and Z n defined as above, we have ) E{θ(x)} p n l P (Z n A nɛ (Û s) + 2l E{θ(x)} ( p n l )pn l P p n(l ) (Z n A nɛ (Û s) ) Proof. Write the random lattice Λ o as {u (Λ o ), u 2 (Λ o ),, u r (Λ o )} where r is the cardinality of Λ o and u (Λ o ), u 2 (Λ o ),, u r (Λ o ) are picked without replacement from Λ o. It follow from Lemma V. that given Λ o = r = p n l, u (Λ o ), u 2 (Λ o ),, u r (Λ o ) are each uniformly distributed random variables over S. To see this note that for arbitrary u S, since u (Λ o ), u 2 (Λ o ),, u r (Λ o ) are picked randomly from Λ o, P (u = u (Λ o )) = P (u = u 2 (Λ o )) = = P (u = u r (Λ o )) Therefore P (u Λ o ) = r P (u = u i (Λ o )) i= = rp (u = u (Λ o )) = p l Hence if r = p n l then u (Λ o ) is uniform over S. This argument is valid for all i =,, r and hence if r = p n l then u i (Λ o ) is uniform over S. Note that E{θ(x)} = E{E{θ(x) Λ o = r}} 26

138 The conditional expectation on the right hand side of this equation is upper bounded by p n l and for r = p n l it is equal to E{θ(x) Λ o = p n l } = E{ u Λ o {u A n ɛ (Û x)}} p n l = E{ i= {ui (Λ o) A n ɛ (Û x)}} p n l ) = P (u i (Λ o ) A nɛ (Û x) i= ) P (Z n A nɛ (Û x) p n l (a) = i= ) = p n l P (Z n A nɛ (Û x) Where (a) follows since u i (Λ o ) is uniformly distributed over S for all i =,, r. Next note that Similarly, E{θ(x)} = E{θ(x)} = p n r=0 P ( Λ o = r) E{θ(x) Λ o = r} P ( Λ o = p n l) ) rp (Z n A nɛ (Û x) l + P ( Λ o = p n i) p n i i=0 ) p n l P (Z n A nɛ (Û s) + 2l p n r=0 p n(l ) P ( Λ o = r) E{θ(x) Λ o = r} P ( Λ o = p n l) ) rp (Z n A nɛ (Û x) ( ) p n l )pn l P (Z n A nɛ (Û s) Therefore, E{θ(s)} = p n l 2 n[d(p ÛX P ZP X )+O(ɛ)] 27

139 Similarly, θ(x) 2 = u,ũ Λ o {u,ũ A n ɛ (Û x)} = u Λ o {u A n ɛ (Û x)} + u ũ Λ o {u,ũ A n ɛ (Û x)} u Λ o {u A n ɛ (Û x)} + u,ũ Λ o {u,ũ A n ɛ (Û x)} It can be shown that ) E{θ(x) 2 } = E{ Λ o }P (Z n A nɛ (Û x) ) 2 + E{ Λ o } 2 P (Z n A nɛ (Û x) p n l 2 n[d(p ÛX P ZP X )+O(ɛ)] + p 2(n l) 2 2n[D(P ÛX P ZP X )+O(ɛ)] Hence var{θ(x)} p k 2 n[d(p ÛX P ZP X )+O(ɛ)] Hence, P (θ(s) = 0) var{θ(x)} E{θ(x)} 2 p (n l) 2 n[d(p ÛX P ZP X )+O(ɛ)] Therefore if l n log p < log p D(P ÛX P ZP X ) (5.23) then the probability of encoding error goes to zero as the block length increases. Decoding Error After observing m and the side information s, the decoder declares error if it does not find a sequence in the bin B m jointly typical with s or if there are multiple of 28

140 such sequences. We will show that the probability that a sequence ũ u is in the same bin as u and is jointly typical with s goes to zero as the block length increases if k+l n by log p > log p D(P ÛS P ZP S ). The probability of decoding error is upper bounded P err ũ S n P (u B m, u A nɛ (Û s) ) = ) P (u B m ) P (Z n A nɛ (Û s) ũ S n = pn p k+l 2 n[d(p ÛS P ZP S )+O(ɛ)] Hence the probability of decoding error goes to zero if The Achievable Rate k + l n log p > log p D(P ÛS P ZP S ) (5.24) Using (5.24) and (5.24), we conclude that if we choose l log p sufficiently close to n log p D(PÛX P Z P X ) and k+l n achieve the rate log p sufficiently close to log p D(P ÛS P ZP S ) we can R = k n log p D(PÛX P Z P X ) D(PÛS P Z P S ) = I(X; Û) I(S; Û) 5..4 Appendix Proof of Lemma V.5 The proof follows along the lines of the proof of Theorem 2 of [47]. Let Q = {A, A 2,, A r } be a finite partition of R. Let Q XY Z, Q XY, Q XZ, Q Y Z, Q X, Q Y and Q Z be measures induced by this partition, corresponding to P XY Z, P XY, P XZ, P Y Z, P X, P Y and P Z respectively. For the random sequence Z n = (Z,, Z n ) and the 29

141 deterministic sequence y = (y,, y n ) let Q y be the deterministic empirical measure of y and define the random empirical measures Q Zy (A i, A j ) = n Q Z (A i ) = n n i= n i= {Zi A i } {Zi A i,y i A j } for i, j =, 2,, r. As a property of weakly* typical sequences, for a fixed ɛ > 0, there exists a sufficiently small ɛ > 0 such that for a sequence pair (x, y) A n ɛ (XY ) and for all i, j =, 2,, r, Qxy (A i, A j ) Q XY (A i, A j ) ɛ where Q xy is the joint empirical measure of (x, y). It follows that the rare event (Z n, y) A n ɛ (XY ) is included in the intersection of events { QZy (A i, A j ) Q XY (A i, A j ) ɛ } (5.25) for i, j =, 2,, r. Therefore Q n Z ((Z n, y) A n ɛ (XY )) ( r { QZy (A i, A j ) Q XY (A i, A j ) } ) ɛ Q n Z i,j= Let ɛ(δ) be such that for j =,, r, Qy (A j ) Q Y (A j ) ɛ ɛ < Q y (A j ) Q Y (A j ) < + ɛ Note that if Q Y (A j ) = 0 then Q XY (A i, A j ) = 0 and hence QZy (A i, A j ) Q XY (A i, A j ) = QZy (A i, A j ) Q y (A j ) ɛ 30

142 and (5.25) is satisfied. If we choose ɛ smaller than any nonzero Q Y (A j ) it follows that Q y (A j ) > 0 whenever Q Y (A j ) > 0. Now assume that Q Y (A j ) > 0 and hence Q y (A j ) > 0. Define Q X Y (A i A j ) = Q XY (A i, A j ) Q Y (A j ) Q Z y (A i A j ) = Q Zy (A i, A j ) Q y (A j ) If Q Y (A j ) > 0, the event in (5.25) is included in the event { QZ y (A i A j ) Q y (A j ) Q X Y (A i A j ) Q y (A j ) +Q X Y (A i A j ) Q y (A j ) Q X Y (A i A j )Q Y (A j ) ɛ } (5.26) Note that QX Y (A i A j ) Q y (A j ) Q X Y (A i A j )Q Y (A j ) = Q X Y (A i A j ) Qy (A j ) Q Y (A j ) ɛ Therefore (5.26) implies { QZ y (A i A j ) Q y (A j ) Q X Y (A i A j ) Qy (A j ) 2ɛ } And this implies Let { QZ y (A i A j ) Q y (A j ) Q X Y (A i A j ) ɛ 2 = r max j= Q Y (A j )>0 then the event in (5.25) is included in the event 2ɛ Q y (A j )( ɛ ) 2ɛ Q y (A j )( ɛ ) } { QZ y (A i A j ) Q y (A j ) Q X Y (A i A j ) ɛ2 3

143 Therefore Q n Z ((Z n, y) A n ɛ (XY )) r { Q n Z QZ y (A i A j ) Q X Y (A i A j ) } ɛ2 i,j= Q Y (A j )>0 Note that since y is a deterministic sequence and Z i s are iid, the events { QZ y (A i A j ) Q X Y (A i A j ) ɛ 2 } are independent for different values of j =,, r. Let n j = n Q y (A j ). Then, Q n Z ((Z n, y) A n ɛ (XY )) ( r r { QZ y (A i A j ) Q X Y (A i A j ) } ) ɛ2 j= Q Y (A j )>0 Q n j Z i= Since for Q Y (A j ) > 0, n j as n, it follows from Sanov s theorem [22] that lim sup log n n j ( r { QZ y (A i A j ) Q X Y (A i A j ) } ) ɛ 2 Q n j Z i= where δ j 0 as ɛ 2 0. Therefore j= Q Y (A j )>0 [ D(Q X Y ( A j ) Q Z ( )) δ j ] lim sup n n log Qn Z ((Z n, y) A n ɛ (XY )) r lim sup n j n n D(Q X Y ( A j ) Q Z ( )) r j= Q Y (A j )>0 ( ɛ )Q Y (A j ) [ D(Q X Y ( A j ) Q Z ( )) δ j ] ( ɛ )D(Q XY Q Z Q Y ) + δ where δ 0 as ɛ 2 0. For finite D(P XY P Z P Y ) the statement of the lemma follows by choosing the quantization Q such that D(Q XY Q Z Q Y ) is sufficiently close to D(P XY P Z P Y ). 32

144 Proof of Lemma V.6 The proof follows along the lines of the proof of Theorem 22 of [47]. Let Q = {A, A 2,, A r } be a finite partition of R. Let Q XY Z, Q XY, Q XZ, Q Y Z, Q X, Q Y and Q Z be measures induced by this partition, corresponding to P XY Z, P XY, P XZ, P Y Z, P X, P Y and P Z respectively. For the random sequence Z n = (Z,, Z n ) and the deterministic sequence y = (y,, y n ) let Q y be the deterministic empirical measure of y and define the random empirical measures Q Zy (A i, A j ) = n Q Z (A i ) = n For arbitrary δ > 0, let Q be such that n i= n i= {Zi A i } {Zi A i,y i A j } π(q XY, P XY ) < ɛ π(q ZY, P ZY ) < ɛ D(P XY P Z P Y ) D(Q XY Q Z Q Y ) < ɛ We show that for such a quantization, under certain conditions, the probability of the event { π( QZy, Q XY ) < ɛ } is close to the probability of the event { π( PZy, P XY ) < 5ɛ } It follows from Theorem 8 of [47] that for arbitrary ɛ, δ > 0, there exists some ɛ > 0 such that for all n greater than some N if y A n ɛ (Y ), then lim P ( π( P Zy, P ZY ) < ɛ ) > δ n lim P ( π( Q Zy, Q ZY ) < ɛ ) > δ n 33

145 Consider the event { π( QZy, Q XY ) < ɛ, π( P Zy, P ZY ) < ɛ, π( Q Zy, Q ZY ) < ɛ } This event implies π( P Zy, P XY ) π( P Zy, P ZY ) + π(q ZY, P ZY ) + π( Q Zy, Q ZY ) + π( Q Zy, Q XY ) + π(q XY, P XY ) 5ɛ Therefore P ( π( P Zy, P XY ) 5ɛ ) P ( π( Q Zy, Q XY ) < ɛ, π( P Zy, P ZY ) < ɛ, Q Zy, Q ZY ) < ɛ ) The right hand side can be lower bounded by P ( π( Q Zy, Q XY ) ɛ ) (5.27) P ( π( P Zy, P ZY ) ɛ ) P ( QZy, Q ZY ) ɛ ) (5.28) P ( π( Q Zy, Q XY ) < ɛ ) δ δ (5.29) Note that for arbitrary δ and for sufficiently large n, P ( π( Q Zy, Q XY ) ) 2 n[d(q XY Q Z Q Y )+δ ] Since δ, δ are arbitrary and D(Q XY Q Z Q Y ) D(P XY P Z P Y ), it follows that P ( π( P Zy, P XY ) 5ɛ ) 2 n[d(p XY P Z P Y )+δ+ɛ ] 2δ 5.2 Distributed Source Coding In this section, we consider a distributed source coding problem in which the sources can take values from continuous alphabets and there is one distortion constraint. We provide an information-theoretic inner bound to the optimal rate-distortion 34

146 region using group codes which strictly contains the available bounds based on random codes. The problem definition and the Berger-Tung rate regions are the continuous alphabets versions of those provided in Section The Main Result In this section, we provide an inner bound to the achievable rate-distortion region which strictly contains the Berger-Tung rate region. Without a loss of generality, we assume that all the alphabets X, Y, Z ˆ, P, Q, U, V, Z are included in a Polish space R Finite Auxiliary Random variables and Bounded Continuous Distortion Function In this section, we consider the case where the sources are not necessarily discrete but all of the auxiliary random variables are finite (subsets of R). We generalize the definition of the channel coding mutual information as follows: I G c.c.(x; Y ) = max w p,r,(p,r) Q(G) wp,r= min θ Θ θ r ω θ D(p X[X]θ Y p W p [X]θ Y ) (5.30) where W is uniformly distributed over G. The following theorem is a generalization of Theorem III. to the case where the sources are not necessarily finite. Theorem V.5. For the distributed source (X, Y, Z, p XY, d) assume the distortion function d is bounded and continuous. Let Û, ˆV, ˆP and ˆQ be finite random variables jointly distributed with XY according to the channel p ˆP ˆQÛ ˆV XY such that Û and ˆV take values from a finite Abelian group G, and ˆP and ˆQ take values from finite sets P and Q respectively. Assume the following Markov chains hold ˆP X Y ˆQ Û ( ˆP, X) (Y, ˆQ) ˆV 35

147 and assume there exists a bounded and continuous (with respect to its first argument) function g : G P Q ˆ Z such that { E d(x, Y, g(ẑ, ˆP, ˆQ)) } D for Ẑ = Û + ˆV where + is the group operation. We show that with these definitions the rate-distortion triple (R, R 2, D) is achievable where R I(X; ˆP ˆQ) G + D(pÛX ˆP pŵ p X ˆP ) Ic.c.(Ẑ; ˆP ˆQ) R 2 I(Y ; ˆQ ˆP G ) + D(p ˆV Y ˆQ pŵ p Y ˆQ) Ic.c.(Ẑ; ˆP ˆQ) R + R 2 I(XY ; ˆP ˆQ) + D(pÛ ˆV XY ˆP ˆQ pŵ Ŵ p XY ˆP ˆQ) 2I G c.c.(z; P Q) where Ŵ and Ŵ are independent random variables uniformly distributed over G. The rest of this section is devoted to proving this theorem. In order to prove the theorem, it suffices to show the achievability of the following corner point: R = I(X; ˆP G ) + D(pÛX ˆP pŵ p X ˆP ) Ic.c.(Ẑ; ˆP ˆQ) R 2 = I(Y ; ˆQ ˆP G ) + D(p ˆV Y ˆQ pŵ p Y ˆQ) Ic.c.(Ẑ; ˆP ˆQ) The proof of this theorem is similar or the proof of Theorem III. with the difference that we use the notion of weak* typicality instead of the strong typicality. We need to show the following for the proof to go through: Size of the Typical Set: Lemma V.6. Let X and ˆP be jointly distributed random variables distributed according to the measure p X ˆP such that X is a random variable over a Polish alphabet X and ˆP is a finite random variable over P. Let x be a weakly* typical sequence in X n then for any ɛ > 0 there exists a δ > 0 such that 2 log P D(p X ˆP p XpŴ ) δ A n ɛ ( ˆP x) 2 log P D(p X ˆP p XpŴ )+δ 36

148 where Ŵ is a uniform random variable over P independent of X and ˆP and δ can be made to go to zero as ɛ 0. Proof. Let Ŵ n be random variable uniformly distributed over P n. Then we have A n ɛ ( ˆP x) = P n p ṋ W (Ŵ n A n ɛ ( ˆP x) ) = P n p ṋ W ((x, Ŵ n ) A n ɛ (X ˆP ) ) The rest of the proof follows from Lemmas V.5 and V.6. A special case is where both X = ˆX and ˆP are finite which is the standard strong typicality result since log P D(p ˆX ˆP p ˆXp ˆP ) = H( ˆP ˆX). Probability of the Typical Set and the Regular Markov Lemma: Let X and Y be two random variables over polish alphabets with joint distribution p XY. It is shown in [47] that the probability of the typical set P ((X n, Y n ) A n ɛ (XY )) approaches one as ɛ 0 and n. Let x A n ɛ (X) and let Y n be distributed according to p n Y X ( x) then it is shown in [47] that P ((x, Y n ) A n ɛ (XY )) approaches one as ɛ 0 and n (the regular Markov Lemma). Probability of a Typical Sequence: Let X and ˆP be jointly distributed random variables distributed according to the measure p XP such that X is a random variable over a Polish alphabet X and ˆP is a finite random variable over P. Let x be a weakly* typical sequence in X n. Then, for any ˆp A n ɛ ( ˆP x) and for any such ɛ > 0 there exists a δ > 0 such that 2 log P D(p X ˆP p XpŴ )+δ pṋ (ˆp x) P X 2 log P D(p X ˆP p XpŴ ) δ where Ŵ is a uniform random variable over P independent of X and ˆP and δ can be made to go to zero as ɛ 0. To show this, let Q, Q 2, be a sequence of increasing finite quantizations such that the sigma field generated by i=f Qi is equal to F X and let [X] Qi and [x] Qi be the corresponding quantized random variables and sequences. 37

149 Such a sequence exists by [47, Lemma]. It remains to show that lim i p ˆP [X]Qi (ˆp [x] Qi ) = p ˆP X (ˆp x) lim i = 2 log P D(p [X] Qi ˆP p [X]Qi pŵ ) 2 log P D(p X ˆP p XpŴ ) The first equality holds since p ˆP X is a channel (see [47, Definition 2]). The second equality holds since by definition, D(p [X]Qi ˆP p [X]Qi pŵ ) D(p X ˆP p X p ˆP ) and since is a continuous function of x. 2 log P x The Strong Markov Lemma: Lemma III.2 can be extended to the case where X and Y are not necessarily finite: Lemma V.7. Let X, Y, Z be random variables taking values from Polish alphabets X, Y, Z respectively such that Z is finite and the Markov chain X Y Z holds. For n =, 2,, let (x (n), y (n) ) A n ɛ (XY ) and let K (n) be a random vector taking values from Z n with distribution satisfying (for simplicity of notation we call them x, y, K respectively) P (K = z) p n Z Y (z y)e ɛnn for some ɛ n 0 as n. Then, as n Proof. Provided in Section P ((x, y, K) A n ɛ (XY Z)) Law of Large Numbers and the Convergence of the Average Distortion: We need to show that for (x, y, z, p, q) A n ɛ (XY ZP Q), n d(x i, y i, z i, p i, q i ) E{d(X, Y, g(z, P, Q))} n i= By definition of weak* typicality (and weak convergence of measures), the above happens if the function d(x, Y, g(z, P, Q)) is bounded and continuous. A sufficient condition is to have d bounded and both d and g continuous. 38

150 Arbitrary Auxiliary Random Variables and Bounded Continuous Distortion Function If we restrict the result of the previous section to the case where the Abelian groups are finite fields, then the following rates are achievable for finite auxiliary random variables: R = I(X; ˆP ) + D(pÛX ˆP pŵ p X ˆP ) D(pẐ ˆP ˆQ p W p ˆP ˆQ) R 2 = I(Y ; ˆQ ˆP ) + D(p ˆV Y ˆQ p W p Y ˆQ) D(pẐ ˆP ˆQ pŵ p ˆP ˆQ) Where Ẑ = Û + ˆV and Ŵ is a uniform random variable over the finite field. For random variables X, P, Q, U, Z, let the random variables Z, P, Q be identically distributed to Z, P, Q and be independent of X, P, Q, U, Z. Define r(x, P, Q, U, Z) = sup inf E{log p [U] Q [X] Q [P ] ([U] Q Q [X] Q [P ] Q ) Q Q 2 p [Z ] Q2 [P ] Q2 [Q ] Q2 ([Z ] Q2 [P ] Q2 [Q ] Q2 ) } where the supremum and infimum are taken over the set of all finite partitions of the Polish space and similarly define r(y, P, Q, V, Z). Then, the above rates are equivalent to R = I(X; ˆP ) + r(x, ˆP, ˆQ, Û, Ẑ) R 2 = I(Y ; ˆQ ˆP ) + r(y, ˆP, ˆQ, ˆV, Ẑ) It is straightforward to generalize the above result to the case where P and Q are not necessarily discrete to achieve the following corner point: R = I(X; P ) + r(x, P, Q, Û, Ẑ) R 2 = I(Y ; Q P ) + r(y, P, Q, ˆV, Ẑ) Definition Let U = V = Z = R be Polish spaces and let f : U V Z be an arbitrary function. Let G, G 2, G 3, be a sequence of finite fields and with a slight abuse of notation, for i =, 2,, define the corresponding quantization 39

151 mappings as follows: q i q i : R G i : G i R For i =, 2,, let Ûi = q i (q i (U)), ˆV i = q (q i (V )) and Ẑi = q i i ( ) q i (U) + Gi q i (V ). We say that the function f(, ) is embeddable in the sequence G, G 2, if there exist quantization mappings so that the sequence (X, Y, P, Q, Ûi, ˆV i, Ẑi) converges weakly (in distribution) to (X, Y, P, Q, U, V, Z). Lemma V.8. Let (X, Y, P, Q, Ûi, ˆV i, Ẑi) be a sequence of random variables converging in distribution to (X, Y, P, Q, U, V, Z). Then r(x, P, Q, Ûi, Ẑi) r(x, P, Q, U, Z) r(y, P, Q, ˆV i, Ẑi) r(y, P, Q, V, Z) if the quantities on the right-hand-side exist. Proof. For any ɛ > 0, let Q and Q 2 be finite partitions such that r(x, P, Q, U, Z) E{log p [U] Q [X] Q [P ] ([U] Q Q [X] Q [P ] Q ) p [Z ] Q2 [P ] Q2 [Q ] Q2 ([Z ] Q2 [P ] Q2 [Q ] Q2 ) } ɛ Using [47, Lemma 7], we can restrict attention to partitions Q and Q 2 consisting of continuity sets. It can be verified that r( ˆX, ˆP, ˆQ, Û, Ẑ) is a continuous function of the probability masses when all random variables are finite. Let δ > 0 be such that if the total variation distance between a probability mass functions of ( ˆX, ˆP, ˆP, ˆQ, Û, Ẑ ) and ([X] Q, [P ] Q, [P ] Q2 [Q] Q2, [U] Q, [Z] Q2 ) is less than δ then r( ˆX, ˆP, ˆQ, Û, Ẑ) E{log p [U] Q [X] Q [P ] ([U] Q Q [X] Q [P ] Q ) p [Z ] Q2 [P ] Q2 [Q ] Q2 ([Z ] Q2 [P ] Q2 [Q ] Q2 ) } ɛ 2 Let N be such that for i > N, the total variation distance between the probability mass density of ([ ˆX i ] Q, [ ˆP i ] Q, [ ˆP i ] Q2, [ ˆQ i] Q2, [Ûi] Q, [Ẑ i] Q2 ) and the probability mass 40

152 density of ([X] Q, [P ] Q, [P ] Q, [Q ] Q2, [U] Q, [Z ] Q2 ) is less than δ. Then for i > N, we have r( ˆX i, ˆP i, ˆQ i, Ûi, Ẑi) E{log p [U] Q [X] Q [P ] ([U] Q Q [X] Q [P ] Q ) p [Z ] Q2 [P ] Q2 [Q ] Q2 ([Z ] Q2 [P ] Q2 [Q ] Q2 ) } ɛ 2 Therefore, r(x, P, Q, U, Z) r( ˆX i, ˆP i, ˆQ i, Ûi, Ẑi) ɛ for all i > N. Theorem V.9. For the distributed source (X, Y, Z, p XY, d) assume X, Y and Zˆ are polish spaces and assume the distortion function d : X Y Z ˆ R + is bounded and continuous. Let U, V, P and Q be random variables jointly distributed with XY according to the channel p P QUV XY such that U and V take values from a polish spaces U = V = R, and P and Q take values from sets P and Q respectively. Assume the following Markov chains hold P X Y Q U (P, X) (Y, Q) V Let Z = f(u, V ) for some function f(, ) which is embeddable in a sequence G, G 2, of finite fields. Assume there exists a continuous function g : R P Q that { } E d(x, Y, g(z, P, Q)) D ˆ Z such Then, if r(x, P, Q, U, Z) and r(y, P, Q, V, Z) exist, the rate-distortion triple (R, R 2, D) is achievable where R = I(X; P ) + r(x, P, Q, U, Z) R 2 = I(Y ; Q P ) + r(y, P, Q, V, Z) 4

153 Proof. Note that for i =, 2,, Û i and ˆV i are functions of U and V respectively. Therefore, the following Markov chains hold: P X Y Q Û i (P, X) (Y, Q) ˆV i The weak convergence of (X, Y, P, Q, Ûi, ˆV i, Ẑi) to (X, Y, P, Q, U, V, Z) and the continuity of the functions g and d and the boundedness of d imply that { } { } E d(x, Y, g(ẑi, P, Q)) E d(x, Y, g(z, P, Q)) D Therefore, the rate-distortion tuple (R, R 2, D) is achievable where R = I(X; P ) + r(x, P, Q, Ûi, Ẑi) R 2 = I(Y ; Q P ) + r(y, P, Q, ˆV i, Ẑi) The proofs follows since r(x, P, Q, Ûi, Ẑi) r(x, P, Q, U, Z) r(y, P, Q, ˆV i, Ẑi) r(y, P, Q, V, Z) Arbitrary Auxiliary Random Variables and Bounded Continuous Distortion Function The result above can be generalized to the case where the distortion function is continuous but not necessarily bounded. The approach is similar to the one proposed in Section and is omitted Calculation of the Rates for Distributions with Densities In this section, we calculate the rates r(x, P, Q, U, Z) and r(y, P, Q, V, Z) for the case where all probability density functions are defined. It is straightforward to show 42

154 that in this case, { r(x, P, Q, U, Z) = E log f U XP (U XP ) } f Z P Q (Z P Q ) = h(z P Q) h(u XP ) Similarly, we can show that r(y, P, Q, V, Z) = h(z P Q) h(v Y Q) so that the rate-distortion tuple (R, R 2, D) is achievable where R = I(X; P ) + h(z P Q) h(u XP ) R 2 = I(Y ; Q P ) + h(z P Q) h(v Y Q) Examples In this section, we present two examples of mappings which are embeddable in a sequence of finite fields. Real Addition is Embeddable in a Sequence of Fields: Let all alphabets be equal to R and let f(u, V ) = U + V where + is the real addition. For i =, 2,, let γ i = 2 i and let p i be the smallest prime larger than 2 2i (so that γ i p i as i ). Let the sequence of finite fields be defined by G i = F pi for i =, 2,. Define the quantization mappings q i : R G i and q i follow: q i (x) = k p i q i 0 x < γ ip i 2 + γ i : G i R as x ( γ ip i 2 + kγ i, γ ip i 2 + (k + )γ i ), k = 2,, pi x > γ ip i 2 + (p i )γ i = γ ip i 2 γ i (k) = 2k + p i γ i k = 0,, p i 2 43

155 Note that this is essentially a uniform quantizer. We show that with these quantizers, the real addition is embeddable in the sequence G, G 2,. It suffices to show that (X, Y, P, Q, Ûi, ˆV i, Ẑi) converges in probability to (X, Y, P, Q, U, V, Z). We need to show that for any ɛ, δ > 0, there exists N > 0 such that for all i > N, P ( U Ûi < δ, V ˆV i < δ, Z Ẑi < δ) ɛ Let L be such that P (U ( L, L), V ( L, L)) > ɛ 3 and let N be such that for i = N, γ i δ 2 and γ ip i > 4L. These conditions guarantee that U Ûi < δ 2 and V ˆV i < δ with probability larger than ɛ. Furthermore, under the condition 2 3 γ i p i > 4L and for u, v L we have q i (q i (u) + Gi q i (v)) = q (q i (u)) + q (q i (v)) where the addition on the right-hand-side is the real addition. We have i i P ( Z Ẑi δ) P ( Z Ẑi δ, U L, V L) + ɛ 3 = P ( U + V Ûi ˆV i δ, U L, V L) + ɛ 3 P ( U Ûi + V ˆV i δ, U L, V L) + ɛ 3 = ɛ 3 Mod-2π Addition is Embeddable in a Sequence of Fields: This case is similar to the previous case. The rate-distortion tuple (R, R 2, D) is achievable where R = I(X; P ) + h(z P Q) h(u XP ) R 2 = I(Y ; Q P ) + h(z P Q) h(v Y Q) where Z = U + V (mod 2π). Other examples: Real addition in R n and mod-2π addition in R n can be embedded in the sequence F n p i. R + with multiplication operation is embeddable in F pi with pre-mappings u log u 44

156 and v log v and the post mapping log z z. (2 R, ) is embeddable in F pi with log pre-mappings Appendix Proof of Lemma V.7 Let f XY Z be a generating field defined over X Y Z according to [47, Cor. 8]. By definition of weak* typicality (see Definition and Theorem of [47]), we need to show that for any set S in f XY Z, lim n Px,y,K (S) = P XY Z (S) with probability one. Any open set in f XY Z can be represented as a disjoint countable union of sets of the form A B C where A X, B Y and C Z. Let f X, f Y and f Z be generating fields over X, Y and Z respectively defined as in [47, Cor. 8]. It suffices to show that for all A f X, B f Y and C f Z, Px,y,K (A, B, C) P XY Z (A, B, C) 0 with probability one as n. We have Px,y,K (A, B, C) p XY Z (A, B, C) p XY Z (A, B, C) P x,y W Z Y (A, B, C) + Px,y W Z Y (A, B, C) P x,y,k (A, B, C) Note that w-lim n Px,y = p XY. Therefore, [47, Lemma 6] implies w-lim n Px,y,K = p XY Z with probability one. This implies lim n Px,y,K (A, B, C) = p XY Z with probability one or equivalently, the first term in the equation above vanishes as n increases almost surely. Next, we show that the second term also vanishes almost surely. We have P x,y W Z Y (A, B, C) P x,y,k (A, B, C) = n = n n i= n i= [ {xi A,y i B} WY Z (C y i ) {Ki C}] θ i 45

157 where for i =, 2,, n, [ ] θ i = {xi A,y i B} WY Z (C y i ) {Ki C} It suffices to show that n n θ i 0 i= almost surely as n where θ i = W Y Z (C y i ) {Ki C} Let Z n be a random vector generated according to W Z Y ( y) and define θ i = W Y Z (C y i ) {Zi C} Note that both θ i and θ i are binary random variables taking values from the set {W Z Y (C y i ), W Z Y (C y i ) } and θ i, θ i. We have E{ θ i } = 0 and var{ θ i }. It follows from [Proposition, Zhiyi Chi s paper] that θ i satisfied the large deviations principle with a good rate function I( ) such that ( θ P + + θ ) n t e ni(t) n where I(t) is positive. For b {W Z Y (C y i ), W Z Y (C y i ) } n, we have ) P ( θ = b = z Z n b i =W Z Y (C y i ) z i / C b i =W Z Y (C y i ) z i C z Z n b i =W Z Y (C y i ) z i / C b i =W Z Y (C y i ) z i C ( θ = e ɛnn P = b) P (K = z) W n Z Y (z y)e ɛnn 46

158 We have P ( θ + + θ n n t ) = b: b + +bn n nt e ɛnn e n(i(t) ɛn) b: b + +bn n nt ) P ( θ = b ( θ P = b) Note that since e n(i(t) ɛn) is summable, the Borel-Cantelli lemma implies that for all t > 0, lim sup n n n θ i t i= Therefore, n n i= θ i 0 as n almost surely. 47
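Before moving on, the embedding of real addition into a sequence of finite fields ("Real Addition is Embeddable in a Sequence of Fields", Section 5.2.3) admits a quick numerical check. The sketch below is only an illustration, written with a simplified centered-lift labeling of the field elements rather than the exact index convention of q_i and q_i^{-1} used in Section 5.2.3; the parameter choice i = 4 (so gamma = 1/16 and p = 257, the smallest prime above 2^8) and the function names are mine.

import numpy as np

# Level i = 4 of the construction: gamma_i = 2^{-i}, p_i = smallest prime > 2^{2i}.
GAMMA, P = 2.0 ** -4, 257      # gamma_i * p_i ~ 16, so sums of magnitude < 8 stay representable

def quantize(x):
    """Map a real number to an element of F_P (simplified centered-lift labeling)."""
    return int(np.rint(x / GAMMA)) % P

def dequantize(k):
    """Map a field element back to its real representative in (-GAMMA*P/2, GAMMA*P/2]."""
    k = k % P
    return GAMMA * (k if k <= P // 2 else k - P)

def field_add(a, b):
    """Addition in F_P (the group operation +_{G_i})."""
    return (a + b) % P

# Embedding property: for bounded u, v the field addition of the quantized values
# reproduces real addition up to the quantization error (here |u|, |v| <= 3 < gamma*p/4).
rng = np.random.default_rng(0)
u, v = rng.uniform(-3, 3, size=1000), rng.uniform(-3, 3, size=1000)
z_hat = np.array([dequantize(field_add(quantize(a), quantize(b))) for a, b in zip(u, v)])
print("max |(u+v) - z_hat| =", np.max(np.abs(u + v - z_hat)))   # at most GAMMA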

159 CHAPTER VI Polar Codes for Point-to-Point Communications 6. Polar Codes for Arbitrary DMCs In this section, we show that polar codes with their original (u, u + v) kernel, achieve the symmetric capacity of discrete memoryless channels with arbitrary input alphabet sizes. It is shown that in general, channel polarization happens in several, rather than only two levels so that the synthesized channels are either useless, perfect or partially perfect. Any subset of the channel input alphabet which is closed under addition, induces a coset partition of the alphabet through its shifts. For any such partition of the input alphabet, there exists a corresponding partially perfect channel whose outputs uniquely determine the coset to which the channel input belongs. By a slight modification of the encoding and decoding rules, it is shown that perfect transmission of certain information symbols over partially perfect channels is possible. Our result is general regarding both the cardinality and the algebraic structure of the channel input alphabet; i.e we show that for any channel input alphabet size and any Abelian group structure on the alphabet, polar codes are optimal. Due to the modifications we make to the encoding rule of polar codes, the constructed codes fall into a larger class of structured codes called nested group codes. Polar codes were originally proposed by Arikan in [0] for discrete memoryless 48

160 channels with a binary input alphabet. Polar codes over binary input channels are shifted linear (coset) codes capable of achieving the symmetric capacity of chan- nels. These codes are constructed based on the Kronecker power of the 2 2 matrix 0 construction. and are the first known class of capacity achieving codes with an explicit It is known that non-binary codes outperform binary codes in certain communication settings. Therefore, constructing capacity achieving codes for channels of arbitrary input alphabet sizes is of great interest. In order to construct capacity achieving codes over non-binary channels, there have been attempts to extend polar coding techniques to channels of arbitrary input alphabet sizes. It is shown in [66] that polar codes achieve the symmetric capacity of channels when the size of the input alphabet is a prime. For channels of arbitrary input alphabet sizes, it is shown in [66] that the original construction of polar codes does not necessarily achieve the symmetric capacity of the channel due to the fact that polarization (into two levels) may not occur for arbitrary channels. In the same paper, a randomized construction of polar codes based on permutations is proposed. In this approach, the existence of a polarizing transformation is shown by a random coding argument over the ensemble of permutations of the input alphabet. In another approach in [66], a code construction method is proposed which is based on the decomposition of the composite input channel into sub-channels of prime input alphabet sizes. In this multilevel code construction method, a separate polar code is designed for each sub-channel of prime input alphabet size. In [48], the problem of channel polarization using arbitrary kernels is studied and several sufficient conditions are provided under which a kernel can polarize a non-binary channels. It is shown in [65] that the two-level polarization of arbitrary DMC s can be achieved using a variety of non-linear polarizing transforms. 49

161 Another related work is [5], in which the authors have shown that polar codes, with their original (u, u + v) kernel, are sufficient to achieve the uniform sum rate on any binary input MAC and it is stated that the same technique can be used for the point-to-point problem to achieve the symmetric capacity of the channel when the size of the alphabet is a power of 2. In a recent work, it has been shown in [54] that polar codes achieve the symmetric capacity of channels with input alphabet size a power of 2. The difference between the approach proposed in [5] and the result of [54] is that in the former, the channel s input alphabet is assumed to be the group Z r 2 (with componentwise mod-2 operation) for some integer r and in the latter, the channel s input alphabet is assumed to be the group Z 2 r (with mod-2 r operation) for some integer r. Both of these cases can be recovered from the general result we propose in this paper depending on how the channel input alphabet is endowed with an Abelian group structure. The techniques used in [54] to prove the polarization, although not explicitly using the group-theoretical terminology, are similar to the techniques used in [58] and the current paper when they are specialized to channels of size 2 r with mod-2 r operation. However in [54], the convergence of Bhattacharyya parameters is shown through a new martingale convergence type result which is different from the approach of this paper. In this section, we show that with a slight modification of the encoding and decoding rules, polar codes, with their original (u, u+v) kernel, are sufficient to achieve the symmetric capacity of all discrete memoryless channels. Our result is general regarding both the cardinality and the algebraic structure of the channel input alphabet; i.e we show that for any channel input alphabet size and any Abelian group structure on the alphabet, polar codes are optimal. This result was first reported in [58]. We use a combination of algebraic and coding techniques and show that in general, channel 50

162 polarization occurs in several levels rather than only two: Suppose the channel input alphabet is G and is endowed with an Abelian group structure. Then for any subset H of the channel input alphabet G which is closed under addition (i.e any subgroup of G), there may exist a corresponding polarized channel which can perfectly transmit the index of the shift (coset) of H in G which contains the input. As an example, for a channel of input alphabet Z 6, there are four subgroups of the input alphabet: i) {0} with cosets {0}, {}, {2}, {3}, {4} and {5}, ii) {0, 3} with cosets {0, 3}, {, 4} and {2, 5}, iii) {0, 2, 4} with cosets {0, 2, 4} and {, 3, 5} and iv) Z 6. For polar codes over Z 6, the asymptotic synthesized channels can exist in four forms: i) can determine which one of the cosets {0}, {}, {2}, {3}, {4} or {5} contains the input symbol, (perfect channels with capacity log 2 6 bits per channel use), ii) can determine which one of the cosets {0, 3}, {, 4} or {2, 5} contains the input symbol (partially perfect channels with capacity log 2 3 bits per channel use), iii) can determine which one of the cosets {0, 2, 4} or {, 3, 5} contains the input symbol (partially perfect channels with capacity bit per channel use), iv) can only determine the input belongs to {0,, 2, 3, 4, 5} (useless channel). Cases i,ii,iii and iv correspond to coset decompositions of Z 6 based on subgroups {0}, {0, 3}, {0, 2, 4} and {0,, 2, 3, 4, 5} respectively. Although standard binary polar codes are group (linear) codes, the class of capacity achieving codes constructed and analyzed in this paper are not group codes. If polar codes are used in their standard form, i.e. when only perfect channels are used and partially perfect and useless channels are ignored, it can be shown that they will form group codes. It is known that group codes do not generally achieve the symmetric capacity of discrete memoryless channels [6, 8]. Hence, one could have predicted that standard polar codes cannot achieve the symmetric capacity of arbitrary channels and a modification of the encoding rule is indeed necessary to achieve that goal. Due to the modifications we make to the encoding rule of polar codes, the 5

constructed codes fall into a larger class of structured codes called nested group codes.

6.1.1 Preliminaries

6.1.1.1 Symmetric Capacity and the Bhattacharyya Parameter

For a channel $(\mathcal{X}, \mathcal{Y}, W)$, the symmetric capacity is defined as $I_0(W) = I(X;Y)$ where the channel input $X$ is uniformly distributed over $\mathcal{X}$ and $Y$ is the output of the channel; i.e., for $q = |\mathcal{X}|$,

$I_0(W) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{1}{q} W(y|x) \log \frac{W(y|x)}{\frac{1}{q} \sum_{\tilde{x} \in \mathcal{X}} W(y|\tilde{x})}$

The Bhattacharyya distance between two distinct input symbols $x$ and $\tilde{x}$ is defined as

$Z(W_{\{x,\tilde{x}\}}) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|x) W(y|\tilde{x})}$

and the average Bhattacharyya distance is defined as

$Z(W) = \sum_{\substack{x, \tilde{x} \in \mathcal{X} \\ x \neq \tilde{x}}} \frac{1}{q(q-1)} Z(W_{\{x,\tilde{x}\}})$

6.1.1.2 Binary Polar Codes

For any $N = 2^n$, a polar code of length $N$ designed for the channel $(\mathbb{Z}_2, \mathcal{Y}, W)$ is a linear (coset) code characterized by a generator matrix $G_N$ and a set of indices $\mathcal{A} \subseteq \{1, \ldots, N\}$ of almost perfect channels. The generator matrix for polar codes is defined as $G_N = B_N F^{\otimes n}$ where $B_N$ is a permutation of rows, $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and $\otimes$ denotes the Kronecker product. The set $\mathcal{A}$ is a function of the channel and determines the locations of the information bits. The decoding algorithm for polar codes is a specific form of successive cancellation [10].
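Since the generator matrix and the two channel parameters above are used throughout this chapter, a small computational sketch may help fix ideas. The following is a minimal illustration (not part of the original development): it builds $G_N = B_N F^{\otimes n}$ assuming the usual bit-reversal convention for $B_N$ from [10], and evaluates $I_0(W)$ and $Z(W)$ for a channel given as a $q \times |\mathcal{Y}|$ transition matrix; the function names and the quaternary symmetric test channel are mine.

import numpy as np
from itertools import combinations

def polar_generator_matrix(n):
    """G_N = B_N F^{otimes n} with F = [[1, 0], [1, 1]]; B_N is taken to be the
    bit-reversal permutation of the rows, as in Arikan's construction."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)                      # Kronecker power F^{otimes n}
    bitrev = [int(format(i, f"0{n}b")[::-1], 2) for i in range(2 ** n)]
    return G[bitrev, :]                        # apply the row permutation B_N

def symmetric_capacity(W, q):
    """I_0(W) in bits for a channel given as a q x |Y| matrix with W[x, y] = W(y|x)."""
    py = W.mean(axis=0)                        # output distribution under a uniform input
    ratio = np.divide(W, py[None, :], out=np.ones_like(W), where=W > 0)
    return float(np.sum(W * np.log2(ratio)) / q)

def avg_bhattacharyya(W, q):
    """Z(W): average of Z(W_{{x, x~}}) over all unordered pairs of distinct inputs."""
    total = sum(np.sum(np.sqrt(W[x] * W[xt])) for x, xt in combinations(range(q), 2))
    return float(2.0 * total / (q * (q - 1)))

# Usage on a quaternary symmetric channel with error probability 0.1.
q = 4
W = np.full((q, q), 0.1 / (q - 1)); np.fill_diagonal(W, 0.9)
print(polar_generator_matrix(2))               # G_4
print(symmetric_capacity(W, q), avg_bhattacharyya(W, q))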

164 6...3 Polar Codes Over Abelian Groups For any discrete memoryless channel, there always exists an Abelian group of the same size as that of the channel input alphabet. In general, for an Abelian group, there may not exist a multiplication operation. Since polar encoders are characterized by a matrix multiplication, before using these codes for channels of arbitrary input alphabet sizes, a generator matrix for codes over Abelian groups needs to be properly defined. In Appendix 6..6., a convention is introduced to generate codes over groups using {0, }-valued generator matrices Notation We denote by O(ɛ) any function of ɛ which is right-continuous around 0 and that O(ɛ) 0 as ɛ 0. For positive integers N and r, let {A 0, A,, A r } be a partition of the index set {, 2,, N}. Given sets T t for t = 0,, r, the direct sum r t=0 T t At the set of all tuples u N = (u,, u N ) such that u i T t whenever i A t. is defined as 6..2 Motivating Examples A key property of the basic polarizing transforms used for binary polar codes is that they have perfect and useless channels as their fixed points ; in the sense that, if these transforms are applied to a perfect (useless) channel, the resulting channel is also perfect (useless). Moreover, these type of channels are the only fixed points of these transformations. In the following, we try to demonstrate that for non-binary channels, the basic transforms have fixed points which are neither perfect nor useless. Consider a 4-ary channel (Z 4, Y, W ) and assume the channel is such that W (y u) = W (y u + 2) for all y Y and all u Z 4 ; i.e. the channel cannot distinguish between inputs u and u + 2. Consider the transformed channels W and 53

165 W + originally introduced in [0] (Refer to Equations (6.32) and (6.33) of the current paper). It turns out that W + (y, y 2, u u 2 ) = W + (y, y 2, u u 2 + 2) W (y, y 2 u ) = W (y, y 2 u + 2) for all y, y 2 Y and all u, u 2 Z 4. This observation is closely related to the fact that {0, 2} is closed under addition mod-4; i.e. the fact that {0, 2} forms a subgroup of Z 4. This means that the transformed channels inherit this characteristic feature of the original channel, in the sense that they cannot distinguish between inputs u i and u i + 2 (i = 2 for W + and i = for W ). This suggests that even in the asymptotic regime, the transformed channels can only distinguish between the sets {0, 2} and {, 3}, and not within each set. In the following, we give an example for which such cases indeed exist in the asymptotic regime. Consider the channel depicted in Figure 6.. For this channel, the symmetric capacity is equal to C = I(X; Y ) = 2 ɛ 2λ bits per channel use. Depending on the values of the parameters ɛ and λ, this channel can present three extreme cases: ) If λ =, this channel is useless. 2) If ɛ =, this channel cannot distinguish between inputs u and u + 2 and has a capacity of bit per channel use. 3) If ɛ = λ = 0, this channel is perfect and has a capacity of 2 bits per channel use. Given a sequence of bits b b 2 b n, define W b b 2 b n as in [0, Section IV], and let I(W b b 2 b n ) be the mutual information between the input and output of W b b 2 b n when the input is uniformly distributed. We can find I(W b b 2 b n ) using the following recursion for which the proof can be found in Appendix Define ɛ 0 = ɛ and λ 0 = λ. For i =,, n, 54

[Figure 6.1 appears here: the transition diagram of Channel 1; only the caption is reproduced.]

Figure 6.1: Channel 1: The input of the channel has the structure of the group $\mathbb{Z}_4$. The parameters $\epsilon$ and $\lambda$ take values from $[0,1]$ such that $\epsilon + \lambda \leq 1$. $E_1$ and $E_2$ are erasures connected to cosets of the subgroup $\{0,2\}$. The lines connecting the output symbols $0, 2, 1, 3$ to their corresponding inputs represent a conditional probability of $1 - \epsilon - \lambda$. For this channel, the process $I(W^{b_1 b_2 \cdots b_n})$ can be explicitly found for each $n$ and the multilevel polarization can be observed.

If $b_i = 1$, let

$\epsilon_i = \epsilon_{i-1}^2 + 2\epsilon_{i-1}\lambda_{i-1}, \qquad \lambda_i = \lambda_{i-1}^2$  (6.1)

If $b_i = 0$, let

$\epsilon_i = 2\epsilon_{i-1} - (\epsilon_{i-1}^2 + 2\epsilon_{i-1}\lambda_{i-1}), \qquad \lambda_i = 2\lambda_{i-1} - \lambda_{i-1}^2$  (6.2)

Then we have $I(W^{b_1 b_2 \cdots b_n}) = 2 - \epsilon_n - 2\lambda_n$ bits per channel use.

Consider the function $f : [0,1]^2 \to [0,1]^2$, $f(\epsilon,\lambda) = (\epsilon^2 + 2\epsilon\lambda, \lambda^2)$ corresponding to Equation (6.1). The fixed points of this function are given by $(0,1)$, $(1,0)$ and $(0,0)$. Similarly, consider the function $g : [0,1]^2 \to [0,1]^2$, $g(\epsilon,\lambda) = (2\epsilon - (\epsilon^2 + 2\epsilon\lambda), 2\lambda - \lambda^2)$ corresponding to Equation (6.2). It turns out that the fixed points of $g$ are the same as those of $f$. This suggests that in the limit, the transformed channels converge to one of the three extreme cases discussed above. Figures 6.2 and 6.3 show that this is indeed the case and depict the three-level polarization of the mutual information process $I(W^{b_1 b_2 \cdots b_n})$ to a discrete random variable $I_\infty$ as $n$ grows.

When $N = 2^n$ is large, let $N_0$ be the number of useless channels (corresponding to the width of the first step in Figure 6.3), $N_1$ be the number of partially perfect

167 channels (corresponding to the width of the second step in Figure 6.3) and N 2 be the number of perfect channels (corresponding to the width of the third step in Figure 6.3). Since the mutual information process is a martingale, it follows that C = E{I } N 0 N 0 + N N + N 2 N 2 where C is the symmetric capacity of the channel. Consider the following encoding rule: For indices corresponding to useless channels, let the input symbol take values from {0} (from the transversal of the subgroup Z 4 of Z 4 i.e. fix the input). For indices corresponding to partially perfect channels, let the input symbol take values from {0, } (from the transversal of the subgroup {0, 2} of Z 4 ). For indices corresponding to perfect channels, let the input symbol take values from Z 4 (choose information symbols from the transversal of the subgroup {0} of Z 4 ). It turns out that this encoding rule used with an appropriate decoding rule has a vanishingly small probability of error as N becomes large. The rate of this code is equal to R = N (N 0 log 2 + N log N 2 log 2 4) This means R = C is achievable using polar codes. Next, we consider a channel with a composite input alphabet size. Consider the channel depicted in Figure 6.4. We call this Channel 2. It turns out that given a sequence of bits b b 2 b n, the transformed channel W b b 2 b n is (equivalent to) a channel of the same type as Channel 2 but with possibly different parameters ɛ, λ and γ. At each step n, the corresponding parameters can be found using the following recursion: Define ɛ 0 = ɛ, λ 0 = λ and γ 0 = γ. For i =,, n, If b i =, let γ i = γ 2 i + 2γ i ɛ i + 2γ i λ i ɛ i = ɛ 2 i + 2γ i ɛ i + 2ɛ i λ i λ i = λ 2 i 2γ i ɛ i (6.3) 56

168 If b i = 0, let γ i = 2γ i ( ) γi 2 + 2γ i ɛ i + 2γ i λ i ɛ i = 2ɛ i ( ) ɛ 2 i + 2γ i ɛ i + 2ɛ i λ i λ i = 2λ i + 2γ i ɛ i ( λ 2 i ) (6.4) Then we have I(W b b 2 b n ) = log 2 6 γ n log 2 2 ɛ n log 2 3 λ n log 2 6 The proof of the recursion formulas for Channel 2 is similar to that of Channel and is omitted. The fixed points of the functions corresponding to Equations (6.3) and (6.4) are given by (0, 0, 0), (, 0, 0), (0,, 0), (0, 0, ), (, 0, ), (0,, ), (,, ), and (,, 2), out of which (0, 0, 0), (, 0, 0), (0,, 0) and (0, 0, ) are admissible. Note that (0, 0, 0) corresponds to a perfect channel with a capacity of log 2 6 bits per channel use, (, 0, 0) corresponds to a partially perfect channel which can perfectly send the index of the coset of the subgroup {0, 3} to which the input belongs and has a capacity of log 2 3 bits per channel use, (0,, 0) corresponds to a partially perfect channel which can perfectly send the index of the coset of the subgroup {0, 2, 4} to which the input belongs and has a capacity of log 2 2 bits per channel use, and (0, 0, ) corresponds to a useless channel. This suggests that in the limit, the transformed channels converge to one of these four extreme cases. This can be confirmed using the recursion formulas for this channel as depicted in Figures 6.5 and 6.6. With encoding and decoding rules similar to those of Channel, we can show that polar codes achieve the symmetric capacity of this channel. In the next section, we show that polar codes achieve the symmetric capacity of channels with input alphabet size equal to a power of a prime Polar Codes Over Channels with input Z p r In this section, we consider channels of input alphabet size q = p r for some prime number p and a positive integer r. In this case, the input alphabet of the channel can 57

169 be considered as a ring with addition and multiplication modulo p r. We prove the achievability of the symmetric capacity of these channels using polar codes and later in Section 6..4 we will generalize this result to channels of arbitrary input alphabet sizes and arbitrary group operations Z p r Rings Let G = Z p r = {0,, 2,, p r } with addition and multiplication modulo p r be the input alphabet of the channel, where p is a prime and r is an integer. For t = 0,,, r, define the subgroup H t of G as the set: H t = p t G = {0, p t, 2p t,, (p r t )p t } and define H r+ =. For t = 0,,, r, define the subset K t of G as K t = H t \H t+ ; i.e. K t is defined as the set of elements of G which are a multiple of p t but are not a multiple of p t+. Note that K 0 is the set of all invertible elements (i.e. set of all elements with a multiplicative inverse) of G and K r = {0}. Let T t be a transversal of H t in G; i.e. a subset of G containing one and only one element from each coset of H t in G. One valid choice for T t is {0,,, p t }. Note that given H t and T t, each element g of G can be represented uniquely as a sum g = ĝ + g where ĝ T t and g H t Recursive Channel Transformation It has been shown in [0] that the error probability of polar codes over binary input channels is upper bounded by the sum of Bhattacharyya parameters of certain channels defined by a recursive channel transformation. Hence, the study of these channels is essential to show that polar codes achieve the symmetric capacity of channels. A similar set of synthesized channels appear for polar codes over channels 58

with arbitrary input alphabet sizes. The channel transformations are given by:

$W^-(y_1, y_2 | u_1) = \sum_{u_2 \in G} \frac{1}{q} W(y_1 | u_1 + u_2) W(y_2 | u_2)$  (6.5)

$W^+(y_1, y_2, u_1 | u_2) = \frac{1}{q} W(y_1 | u_1 + u_2) W(y_2 | u_2)$  (6.6)

for $y_1, y_2 \in \mathcal{Y}$ and $u_1, u_2 \in G$. Repeating these operations $n$ times recursively, we obtain $N = 2^n$ channels $W_N^{(1)}, \ldots, W_N^{(N)}$. For $i = 1, \ldots, N$, these channels are given by:

$W_N^{(i)}(y_1^N, u_1^{i-1} | u_i) = \sum_{u_{i+1}^N \in G^{N-i}} \frac{1}{q^{N-1}} W_N(y_1^N | u_1^N G_N)$

where $G_N$ is the generator matrix for polar codes. For the case of binary input channels, it has been shown in [10] that as $N \to \infty$, these channels polarize in the sense that their Bhattacharyya parameters approach either zero (perfect channels) or one (useless channels). In the next part, we formally state the following multilevel polarization result: in general, when the input alphabet is a prime power, polarization happens in multiple levels so that as $N \to \infty$, these channels become useless, perfect or partially perfect.

The Multilevel Polarization Result

In this section, we state the multilevel polarization result for the $\mathbb{Z}_{p^r}$ case and we prove it in the subsequent section. First, we define some quantities which are used in the statement and the proof of multilevel polarization. For an integer $n$, let $J_n$ be a uniform random variable over the set $\{1, 2, \ldots, N = 2^n\}$.

1) Define the random variable $I_n(W)$ as

$I_n(W) = I(X; Y)$  (6.7)

where $X$ and $Y$ are the input and the output of $W_N^{(J_n)}$ respectively and $X$ is uniformly distributed. It has been shown in [10] that the process $I_0, I_1, I_2, \ldots$ is a martingale

171 for the binary case. The same proof is valid for the general case as well. Hence E{I n } = I 0. 2) For an integer n and for d G, define the random variable Z n d (W ) = Z d(w (Jn) N ) where for a channel (G, Y, W ), Z d (W ) = q W (y x)w (y x + d) x G y Y = Z(W {x,x+d} ) (6.8) q x G Note that similar to the Bhattacharyya parameter, Z d (W ) takes values from [0, ]. In the extreme case when Z d (W ) is zero, for any x G, the channel can completely distinguish between x and x + d. On the other hand, when Z d (W ) is one, for any x G, the two input symbols x and x + d are equilikely given any channel output. 3) Let H be an arbitrary subgroup of G. Note that any uniform random variable defined over G can be decomposed into two uniform and independent random variables X and X where X takes values from the transversal T of H and X takes values from H. For an integer n, define the random variable IH n (W ) as I n H(W ) = I(X; Y X) = I( X; Y X) (6.9) where X and Y are the input and the output of W (Jn) N respectively. These processes along with Zd n (W ) processes are used to show the convergence of the mutual information process to a discrete random variable. 4) For t = 0,, r, define the random variable Z t (W (i) N ) = d/ H t Z d (W (i) N ) and the random process (Z t ) (n) (W ) = Z t (W (Jn) N ) where J n is a uniform random variable over {, 2,, N = 2 n }. We will see later that these processes are related to the probability of error incurred by polar codes. The following theorem states the multilevel polarization result for the Z p r case: 60

172 Theorem VI.. For all ɛ > 0 and β < 2, there exists a number N = N(ɛ) = 2n(ɛ) and disjoint subsets A ɛ 0, A ɛ,, A ɛ r of {,, N} such that for t = 0,, r and i A ɛ t, I(W (i) N ) t log 2 p ɛ and Z t (W (i) N ) < 2 2βn(ɛ). Moreover, as ɛ 0, Aɛ t p t for some probabilities p 0,, p r adding up to one. N Proof of Multilevel Polarization In this section, we prove the multilevel polarization through a series of lemmas. In the first lemma, we show that IH n (W ) is a super-martingale. Lemma VI.2. For an arbitrary group G and for any subgroup H of G, the random process IH n (W ) defined by Equation (6.9) is a super-martingale. Proof. Define the channels W and W + as in (6.32) and (6.33). Define the random variables U, U 2, X, X 2, Y and Y 2 where U and U 2 are uniformly distributed over G, X = U +U 2 where addition is the group operation, X 2 = U 2 and Y (respectively Y 2 ) is the channel output when the input is X (respectively X 2 ). Decompose the random variable U into two uniform and independent random variables Û and Ũ where Û takes values from the transversal T of H and Ũ takes values from H. Similarly define, Û2, X, X 2 and Ũ2, X, X 2. We need to show that I(Ũ; Y Y 2 Û) + I(Ũ2; Y Y 2 U Û2) 2I( X ; Y X ) Note that I( X ; Y X ) = I(X ; Y ) I( X ; Y ). Since I n is a martingale, we have I(U ; Y Y 2 ) + I(U 2 ; Y Y 2 U ) = 2I(X; Y ) Therefore, it suffices to show I(Û; Y Y 2 ) + I(Û2; Y Y 2 U ) 2I( X ; Y ) 6

173 We have I(Û2; Y Y 2 U ) = I(Û2; Y Y 2 Û Ũ ) = I(Û2; Y Y 2 Û ) + I(Û2; Ũ Y Y 2 Û ) I(Û2; Y Y 2 Û ) Hence, I(Û; Y Y 2 )+I(Û2; Y Y 2 U ) I(Û; Y Y 2 )+I(Û2; Y Y 2 Û ) = I(ÛÛ2; Y Y 2 ) (a) = I( X X2 ; Y Y 2 )=2I( X ; Y ) where (a) follows since Û and Û2 are recoverable from X and X 2 and vice versa as shown in the following: To show the forward direction, let U and U 2 take values from G and let X = U +U 2 and X 2 = U 2. We need to show that if X is in the same coset of H as X (i.e. if X X H or equivalently X = X ) and X 2 is in the same coset of H as X 2 (i.e. if X 2 X 2 H or equivalently X 2 = X 2 ), then U is in the same coset of H as U (i.e. U U H or equivalently Û = Û) and U 2 is in the same coset of H as U 2 (i.e. U 2 U 2 H or equivalently Û 2 = Û2). Note that X 2 X 2 H implies U 2 U 2 H and X X H implies U + U 2 U U 2 H. Since U 2 U 2 H (and hence U 2 U 2 H), U U H + U 2 U 2 implies U U H. For the other direction, note that X ) = T (Û + Û2 + H and X 2 = Û2. The next lemma is a restatement of Lemma 2 of [66] with a slight generalization: Lemma VI.3. Suppose B n, n Z + is a {, +}-valued process with P (B n = ) = P (B n = +) =. Suppose I 2 n and T n are two processes adapted to the process B n satisfying the following conditions. I n takes values in the interval [0, ]. 2. I n converges almost surely to a random variable I. 62

174 3. T n takes values in the interval [0, ]. 4. T n+ = T 2 n when B n+ = For all ɛ > 0, there exists δ > 0 such that for n Z +, T n δ implies I n ɛ. 6. For all ɛ > 0, there exists δ > 0 such that for n Z +, T n δ implies I n ɛ. Then T = lim n T n exists with probability and I, T both take values in {0, }. Proof. The proof follows from Lemma 2 of [66] in a straightforward fashion. In the next lemma, we show that for any d G, the random process Z n d converges to a Bernoulli random variable. Lemma VI.4. For all d G, Zd n (W ) converges almost surely to a {0, }-valued random variable Z d (W ) as n grows. Moreover, if d G is such that d = d then Z d (W ) = Z d (W ) almost surely; i.e. the random processes Zñ d (W ) and Zn d (W ) converge to the same random variable. Proof. Let H = d be the subgroup of G generated by d and let M be a maximal subgroup of H. Let d = argmax Z a (W ) (6.0) a H a/ M In Lemma VI.3, let I n (Here we use the notation I n instead of I n for notational convenience) be equal to the process IH n (W ) In M (W ) where In H (W ) and In M (W ) are defined by Equation (6.9) and let T n be equal to the process Zd n (W ) defined in (6.8). We claim that I n and T n satisfy the conditions of Lemma VI.3. The proof is given in the following: Recall that a uniform random variable X over G can be decomposed into two uniform and independent random variables X taking values from H and X taking values from 63

175 the transversal of H in G. Similarly, the uniform random variable X over H can be decomposed into two uniform and independent random variables X taking values from M H and X taking values from the transversal of M in H. Using the chain rule, we have: I( X; Y X) = I( X X; Y X) = I( X; Y X) + I( X; Y X X) Note that X M and ( X, X) indicate the coset of M in G to which X belongs. Therefore, the equation above implies that for each n, I n H (W ) In M (W ) = I( X; Y X) where X and Y are the input and the output of the channel W (Jn). Note that X N takes values from cosets of M in H and X takes values from cosets of H in G. Therefore, I( X; Y X) is the mutual information between the coset of M in H to which X belongs and Y given the coset of H in G to which X belongs. Since X can at most take H M values, by choosing the base of the log function to be equal to H M condition () of Lemma VI.3 is satisfied. We have shown in Lemma VI.2 that both processes IH n (W ) and In M (W ) are supermartingales and hence both converge almost surely. This means that the vector valued random process (IH n (W ), In M (W )) converges almost surely (refer to Proposition 5.25 of [37]). Hence condition (2) is satisfied. Condition (3) trivially holds and condition (4) is shown in Lemma VI.6 in Appendix To show (5), assume Z n d (W ) δ for some δ > 0 to be determined later. Let T H be a transversal of H in G and let T M be a transversal of M in H. Given X t H +H for some t H T H, the joint probability distribution of cosets of M in t H + H and the 64

176 channel output is given by: p(t H +t M +M, y) m MP (X=t H +t M +m, Y =y X t H +H) = m M P (X = t H + t M + m, Y = y) P (X t H + H) = P (X = t H + t M + m, Y = y) H / G m M = G H G W (y t H + t M + m) m M = W (y t H + t M + m) H m M where t M takes values from T M. The corresponding channel is defined as: m M W (y t H + t M + m) H W (y t H + t M + M) = P (X t H + t M + M X t H + H) = W (y t H + t M + m) (6.) M m M Note that the input of this channel takes values from the set {t H + t M + M t M T M } uniformly and the size of the input alphabet is q H M which is a prime (since M is maximal in H). Furthermore, by definition I( W ) = I( X; Y X = th ). It is shown in Appendix that Z d (W ) δ implies Z( W ) Cδ for a constant C = M H G H M. Therefore, part () of Lemma VI.4 in Appendix implies I( W q ) log q + C( q )δ = q log q + H G δ where the base of the logarithm in the calculation of the mutual information is set to be q. This result is valid for all t H T H. Therefore I H (W ) I M (W ) = t H T H P ( X = t H )I( X; Y X = th ) log q q + H G δ Hence, for any ɛ > 0, any choice of 0 < δ Z n d (W ) δ I H(W ) I M (W ) ɛ. qɛ H G guarantees for n Z +, 65

177 To show condition (6), assume that Z n d (W ) δ. For the channel W defined as above, it is shown in Appendix (An alternate proof for the Z p r case is provided in Appendix ) that Z d (W ) δ implies Z d +t H +M( W ) 2q 2δ δ 2 q M. Since the input alphabet of the channel W has a prime size and d H\M, we can use Lemma VI.5 in Appendix to conclude that Z( W ) > 2q q2 2δ δ 2 M therefore, Z( W ) 2 = ( + Z( W ))( Z( W )) 2( Z( W )) 4q q2 2δ δ 2. Now M we use part (2) of Lemma VI.4 in Appendix to conclude I( W 4q q ) 2( q ) log q e 2 2δ δ 2 C 4 δ M q for a constant C = 4 q( q ) log q e 2 where as above, the base of the logarithm in M the calculation of the mutual information is set to be q. This implies: I H (W ) I M (W ) = t H T H P ( X = t H )I( X; Y X = th ) C 4 δ and Hence, for any ɛ > 0, any choice of 0 < δ ( ɛ C ) 4 guarantees for n Z +, Z n d (W ) δ I H (W ) I M (W ) ɛ. So far, we have shown that for any d G, for H = d and d defined as in (6.0), the random variable Zd n (W ) converges to a Bernoulli random variable. Note that so far the proof is general and applies to arbitrary groups as well. We will use this part of the proof later in Section Next, we show that when G = Z p r, for any d H\M (including d itself), Z ñ (W ) converges to a Bernoulli random variable. Moreover, us- d ing the fact that they all take values from {0, }, we show that for all such d s, they converge to the same random variable. To see this, note that if Z n d δ, it follows that Z ñ d δ for all d H\M (since by definition, a = d achieves the maximum of Z a (W ) among all a H\M) and if Z n d δ it follows from Lemma VI.7 in 66

178 Appendix that for all d d = H, Z ñ d q3 δ. Note that when G = Z p r, H\M is the set of all elements d such that d = d. This completes the proof of the lemma. The next lemma gives a sufficient condition for two processes Z n d and Zñ d to converge to the same random variable. Recall that for 0 t r, K t = H t \H t+. Lemma VI.5. If d, d K t for some 0 t r, then Z n d and Zñ d converge to the same Bernoulli random variable. Proof. Note that d, d K t implies d = d = H t. Therefore, Lemma VI.4 implies Z n d and Zñ d converge to the same Bernoulli random variable. For t = 0,,, r, pick an arbitrary element k t K t. The lemma above suggests that we only need to study Z kt s rather than all Z d s. Lemma VI.6. For t s r, if Z kt δ for some k t K t, then Z ks q 3 δ for all k s K s. Proof. Follows from Lemma VI.7 in Appendix and the fact that k s k t. This lemma implies that for the group G = Z p r all possible asymptotic cases are: Case 0: Z k0 =, Z k =, Z k2 =,, Z kr = Case : Z k0 = 0, Z k =, Z k2 =,, Z kr = Case 2: Z k0 = 0, Z k = 0, Z k2 =,, Z kr =. Case r: Z k0 = 0, Z k = 0, Z k2 = 0,, Z kr = 0, where for t = 0,, r, case t happens with some probability p t. This implies (Z t ) (n) (W ) converges to a random variable (Z t ) ( ) (W ) almost surely and P ( (Z t ) ( ) = 0 ) = r s=t p s. Next, we study the behavior of I n in each of these asymptotic cases. 67

179 Lemma VI.7. For a channel (Z p r, Y, W ), let t be an integer taking values from {0,,, r}. For any ɛ > 0, there exists a δ > 0 such that if Z k0 δ, Z k δ,, Z kt δ, Z kt δ,, Z kr δ for all k s K s (s = 0,, r ), then t log 2 p ɛ I 0 (W ) t log 2 p + ɛ. Proof. Note that for all s = 0,, r, M s k s+ is a maximal subgroup of k s. In the proof of Lemma VI.4, if we let d = k 0 and M 0 = k, the choice of δ = pɛ/ log 2 q H G guarantees that ( ɛ/ log 2 q) log 2 p I G (W ) I M0 (W ) = I(W ) I M0 (W ) log 2 p (in bits). Similarly, it follows that ( ɛ/ log 2 q) log 2 p I Ms (W ) I Ms+ (W ) log 2 p ( 4 ɛ for all 0 s t. For s t, the choice of δ = C log 2 q) guarantees IMs I Ms+ (ɛ/ log 2 q) log 2 p = ɛ/r (in bits). Therefore, r ( I 0 (W ) = I G (W ) = IMs (W ) I Ms+(W ) ) s=0 t ( = IMs (W ) I Ms+(W ) ) + s=0 r ( IMs (W ) I Ms+(W ) ) s=t r t log 2 p + ɛ/r s=0 t log 2 p + ɛ s=t and t ( I 0 (W ) = I G (W ) = IMs (W ) I Ms+(W ) ) + s=0 r ( IMs (W ) I Ms+(W ) ) s=t t ( ɛ/ log 2 q) log 2 p s=0 t log 2 p ɛ 68

We have shown that the process $I_n$ converges to the following $(r+1)$-valued discrete random variable: $I_\infty = t \log_2 p$ with probability $p_t$ for $t = 0, \ldots, r$.

We are now ready to prove the theorem: For the channel $(\mathbb{Z}_{p^r}, \mathcal{Y}, W)$, consider the vector random process $V_n = (Z_{k_0}^n, Z_{k_1}^n, \ldots, Z_{k_{r-1}}^n, I_n)$. We have seen in the previous section that each component of this vector random process converges almost surely. Proposition 5.25 of [37] implies that the vector random process $V_n$ also converges almost surely to a random vector $V_\infty$. The random vector $V_\infty$ is a discrete random variable defined as follows:

$P\big(V_\infty = (\underbrace{0, \ldots, 0}_{t \text{ times}}, \underbrace{1, \ldots, 1}_{r-t \text{ times}}, \; t \log_2 p)\big) = p_t$ for $t = 0, 1, \ldots, r$

where the $p_t$'s are some probabilities. This implies that for all $\delta > 0$, there exists a number $N = N(\delta) = 2^{n(\delta)}$ and disjoint subsets $A_0^\delta, A_1^\delta, \ldots, A_r^\delta$ of $\{1, \ldots, N\}$ such that for $t = 0, \ldots, r$ and $i \in A_t^\delta$, $Z_{k_s}(W_N^{(i)}) \leq \delta$ if $0 \leq s < t$ and $Z_{k_s}(W_N^{(i)}) \geq 1 - \delta$ if $t \leq s < r$. This implies that if $i \in A_t^\delta$ then $Z^t(W_N^{(i)}) \leq q\delta$. Moreover, as $\delta \to 0$, $|A_t^\delta|/N \to p_t$ for some probabilities $p_0, \ldots, p_r$ adding up to one.

For $\epsilon > 0$, let $\delta$ be as in Lemma VI.7. Then, for $t = 0, \ldots, r$ and $i \in A_t^\delta$, we have $|I(W_N^{(i)}) - t \log_2 p| \leq \epsilon$. Similarly, for $\epsilon > 0$, if we let $\delta = \epsilon/q$, we get $Z^t(W_N^{(i)}) \leq \epsilon$. For any $\epsilon > 0$, taking the minimum of the two $\delta$'s guarantees the existence of a number $N = N(\epsilon) = 2^{n(\epsilon)}$ and disjoint subsets $A_0^\epsilon, A_1^\epsilon, \ldots, A_r^\epsilon$ of $\{1, \ldots, N\}$ such that for $t = 0, \ldots, r$ and $i \in A_t^\epsilon$, $|I(W_N^{(i)}) - t \log_2 p| \leq \epsilon$ and $Z^t(W_N^{(i)}) < \epsilon$.

Finally, in the Appendix, we show that for any $\beta < \frac{1}{2}$ and for $t = 0, \ldots, r$,

$\lim_{n \to \infty} P\big((Z^t)^{(n)} < 2^{-2^{\beta n}}\big) \geq P\big((Z^t)^{(\infty)} = 0\big) = \sum_{s=t}^{r} p_s$  (6.12)

This rate of polarization result concludes the proof of Theorem VI.1.
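Theorem VI.1 can be illustrated numerically with the $\mathbb{Z}_4$ example of Section 6.1.2: iterating the recursion (6.1)-(6.2) over all branches $b_1 \cdots b_n$ shows the synthesized mutual informations clustering around 0, 1 and 2 bits while their average remains at the symmetric capacity. The sketch below is only an illustration of that computation; the function name, the parameter choice $\epsilon = \lambda = 0.3$ and the tolerance 0.05 are mine.

import numpy as np

def channel1_mutual_informations(eps, lam, n):
    """Evolve (eps, lam) of Channel 1 (Figure 6.1) through all 2^n branches using the
    recursion (6.1)-(6.2) and return I(W^{b_1...b_n}) = 2 - eps_n - 2*lam_n per branch."""
    params = [(eps, lam)]
    for _ in range(n):
        nxt = []
        for e, l in params:
            minus = (2 * e - (e * e + 2 * e * l), 2 * l - l * l)   # b_i = 0, Eq. (6.2)
            plus = (e * e + 2 * e * l, l * l)                      # b_i = 1, Eq. (6.1)
            nxt.extend([minus, plus])
        params = nxt
    return np.array([2 - e - 2 * l for e, l in params])

# Usage: with eps = lam = 0.3 the symmetric capacity is C = 2 - eps - 2*lam = 0.8 bits.
I = channel1_mutual_informations(0.3, 0.3, n=14)
for level in (0, 1, 2):
    frac = np.mean(np.abs(I - level) < 0.05)
    print(f"fraction of synthesized channels with I close to {level} bits: {frac:.3f}")
print("average I (martingale check):", I.mean())   # stays equal to 0.8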

Encoding and Decoding

In the original construction of polar codes, we fix the input symbols corresponding to useless channels and send information symbols over perfect channels. Here, since the channels do not polarize into two levels, the encoding is slightly different and we send some information symbols over partially perfect channels. At the encoder, if $i \in A_t^\epsilon$ for some $t = 0, \ldots, r$, the information symbol is chosen from the transversal $T_t$ uniformly and not from the whole set $G$. As we will see later, the channel $W_N^{(i)}$ is perfect for symbols chosen from $T_t$ and perfect decoding is possible at the decoder. Let $X_N^\epsilon = \bigoplus_{t=0}^{r} T_t^{A_t^\epsilon}$ be the set of all valid input sequences. For the sake of analysis, as in the binary case, the message is dithered with a uniformly distributed random vector $b_1^N \in \bigoplus_{t=0}^{r} H_t^{A_t^\epsilon}$ revealed to both the encoder and the decoder. A message $v_1^N \in X_N^\epsilon$ is encoded to the vector $x_1^N = (v_1^N + b_1^N) G_N$. Note that $u_1^N = v_1^N + b_1^N$ is uniformly distributed over $G^N$. At the decoder, after observing the output vector $y_1^N$, for $t = 0, \ldots, r$ and $i \in A_t^\epsilon$, the following decoding rule is used:

$\hat{u}_i = f_i(y_1^N, \hat{u}_1^{i-1}) = \arg\max_{g \in b_i + T_t} W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} | g)$

where the ties are broken arbitrarily. Finally, the message is decoded as $\hat{v}_1^N = \hat{u}_1^N - b_1^N$. The total number of valid input sequences is equal to

$2^{NR} = \prod_{t=0}^{r} |T_t|^{|A_t^\epsilon|} = \prod_{t=0}^{r} p^{t |A_t^\epsilon|} \approx \prod_{t=0}^{r} p^{t p_t N}$

Therefore, $R \approx \sum_{t=0}^{r} p_t \, t \log_2 p$. On the other hand, since $I_n$ is a martingale, we have $E\{I_\infty\} = I_0$. Since $E\{I_\infty\} = \sum_{t=0}^{r} p_t \, t \log_2 p$, we observe that the rate $R$ is equal to the symmetric capacity $I_0$. We will see in the next section that this rate is achievable.
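The encoding rule above is simple enough to sketch in code. The following minimal example (not the dissertation's implementation) forms a valid input sequence for $G = \mathbb{Z}_4 = \mathbb{Z}_{2^2}$ with $N = 8$: each information symbol is drawn from the transversal $T_t = \{0, \ldots, p^t - 1\}$, the dither is drawn from $H_t = p^t G$, and $u_1^N$ is mapped to $x_1^N = u_1^N G_N$, interpreting the $\{0,1\}$-valued $G_N$ as selecting which $u_i$ enter each mod-$q$ sum (the convention mentioned in Section 6.1.1.3). The index sets $A_t$ used here are hypothetical, chosen only for illustration, and the successive-cancellation decoder is omitted since evaluating $W_N^{(i)}$ is more involved.

import numpy as np

p, r = 2, 2                     # G = Z_{p^r} = Z_4
q = p ** r
n = 3; N = 2 ** n

def polar_generator_matrix(n):
    """G_N = B_N F^{otimes n} with 0/1 entries (same helper as in the earlier sketch)."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    bitrev = [int(format(i, f"0{n}b")[::-1], 2) for i in range(2 ** n)]
    return G[bitrev, :]

# Hypothetical classification of the N synthesized channels: A[t] holds the indices
# whose symbol is drawn from the transversal T_t = {0, ..., p^t - 1} (t = 0 means frozen).
A = {0: [0, 1], 1: [2, 4], 2: [3, 5, 6, 7]}

rng = np.random.default_rng(1)
v = np.zeros(N, dtype=int)      # message symbols v_1^N
b = np.zeros(N, dtype=int)      # dither b_1^N, shared with the decoder
for t, idx in A.items():
    for i in idx:
        v[i] = rng.integers(p ** t)                    # information symbol from T_t
        b[i] = (p ** t) * rng.integers(p ** (r - t))   # dither from H_t = p^t * Z_{p^r}

u = (v + b) % q                                # uniformly distributed over G
x = u @ polar_generator_matrix(n) % q          # transmitted codeword x_1^N = u_1^N G_N
rate = sum(t * len(idx) for t, idx in A.items()) * np.log2(p) / N
print("codeword:", x, " rate:", rate, "bits per channel use")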

182 Error Analysis In this section, we show that the error probability of polar codes approaches zero as the block length increases when the rate of transmission is equal to the symmetric capacity of the channel. Let B i be the event that the first error occurs when the decoder decodes the ith symbol: B i = { (u N, y N ) G N Y N j < i, u j = f j (y N, u j ), u i f i (y N, u i ) } (6.3) { (u N, y N ) G N Y N u i f i (y N, u i ) } For t = 0,, r and i A ɛ t, define { E i = (u N, y N ) G N Y N W (i) N (yn, u i u i ) } W (i) N (yn, u i ũ i ) for some ũ i b i + T t, ũ i u i (6.4) Lemma VI.8. For t = 0,, r and i A ɛ t, P (E i ) q 2 Z t (W (i) N ). Proof. For u i G, write u i = b i (u i ) + v i (u i ) where b i (u i ) H t and v i (u i ) T t. We 7

183 have P (E i ) = u N,yN u N,yN = u i,yn q N W N(y N u N ) Ei (u N, y N ) q W N(y N N u N ) q u N i+ ũ i b i (u i )+T t ũ i u i q W N(y N N u N ) ũ i b i (u i )+T t ũ i u i = q W (i) N (yn, u i u i ) u i,yn = u i G q ũ i b i (u i )+T t ũ i u i = u i,yn u i G ũ i b i (u i )+T t ũ i u i ũ i b i (u i )+T t ũ i u i W (i) N (yn, u i ũ i ) W (i) N (yn, u i u i ) W (i) N (yn, u i ũ i ) W (i) N (yn, u i u i ) W (i) N (yn, u i ũ i ) W (i) N (yn, u i u i ) W (i) N (yn, u i ũ i )W (i) N (yn, u i u i ) q Z {u i,ũ i }(W (i) N ) For u i G and ũ i b i (u i ) + T t, if u i ũ i, then u i, ũ i are not in the same coset of H t and hence u i ũ i / H t. Therefore, u i ũ i G\H t. Note that for d = u i ũ i, Z {ui,ũ i }(W (i) N ) qz d(w (i) N ). Since d G\H t, we have Z d (W (i) N ) Zt (W (i) N ) and hence, Z {ui,ũ i }(W (i) N ) qzt (W (i) N ) Therefore, P (E i ) q T t Z t (W (i) N ) q2 Z t (W (i) N ). The probability of block error is given by P (err) = r t=0 i A P (B i). ɛ t Since 72

$B_i \subseteq E_i$, we get
$$P(\mathrm{err}) \le \sum_{t=0}^{r} \sum_{i \in A^\epsilon_t} q^2 Z^t(W^{(i)}_N) \qquad (6.5)$$
$$\le \sum_{t=0}^{r} |A^\epsilon_t|\, q^2\, 2^{-2^{\beta n}} \qquad (6.6)$$
$$\le q^2 N 2^{-2^{\beta n}} \qquad (6.7)$$
for any $\beta < \frac{1}{2}$, where the first inequality follows from Lemma VI.8 and the second from the bound $Z^t(W^{(i)}_N) < 2^{-2^{\beta n}}$ for $i \in A^\epsilon_t$ established above. Therefore, the probability of error goes to zero as $\epsilon \to 0$ (and hence $n \to \infty$).

Polar Codes Over Arbitrary Channels

For any channel input alphabet there always exists an Abelian group of the same size. In this section, we generalize the result of the previous section to channels with arbitrary input alphabet sizes and arbitrary Abelian group operations.

Abelian Groups

Let the Abelian group $G$ be the input alphabet of the channel. It is known that any Abelian group can be decomposed into a direct sum of $\mathbb{Z}_{p^r}$ rings [6]. Let $G = \bigoplus_{l=1}^{L} R_l$ with $R_l \cong \mathbb{Z}_{p_l^{r_l}}$, where the $p_l$'s are prime numbers and the $r_l$'s are positive integers. For $t = (t_1, t_2, \dots, t_L)$ with $t_l \in \{0, 1, \dots, r_l\}$, there exists a corresponding subgroup $H$ of $G$ defined by $H = \bigoplus_{l=1}^{L} p_l^{t_l} R_l$. For a subgroup $H$ of $G$, define $T_H$ to be a transversal of $H$ in $G$.

Recursive Channel Transformation

The Basic Channel Transforms

The transformed channels $W^+$ and $W^-$ and the process $I_n(W)$ are defined in the same way as in the $\mathbb{Z}_{p^r}$ case, through Equations (6.32), (6.33) and (6.7).
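The single-step transforms just referred to can be computed directly from a transition matrix. The sketch below is only an illustration (the function and variable names are not from the text): it builds $W^-$ and $W^+$ for a channel whose input alphabet is an Abelian group represented as $\mathbb{Z}_{q_1} \oplus \cdots \oplus \mathbb{Z}_{q_L}$ with componentwise modular addition, so that outputs of $W^-$ are pairs $(y_1, y_2)$ and outputs of $W^+$ are triples $(y_1, y_2, u_1)$.

import itertools
import numpy as np

def group_elements(moduli):
    return list(itertools.product(*[range(m) for m in moduli]))

def add(a, b, moduli):
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def transform(W, moduli):
    """W[u][y] holds the transition probabilities of (G, Y, W); returns (W_minus, W_plus) as dicts."""
    G = group_elements(moduli)
    q = len(G)
    Y = range(len(next(iter(W.values()))))
    W_minus = {u1: {} for u1 in G}       # W-(y1, y2 | u1) = (1/q) sum_{u2} W(y1 | u1+u2) W(y2 | u2)
    W_plus = {u2: {} for u2 in G}        # W+(y1, y2, u1 | u2) = (1/q) W(y1 | u1+u2) W(y2 | u2)
    for u1 in G:
        for y1 in Y:
            for y2 in Y:
                W_minus[u1][(y1, y2)] = sum(W[add(u1, u2, moduli)][y1] * W[u2][y2] for u2 in G) / q
    for u2 in G:
        for u1 in G:
            for y1 in Y:
                for y2 in Y:
                    W_plus[u2][(y1, y2, u1)] = W[add(u1, u2, moduli)][y1] * W[u2][y2] / q
    return W_minus, W_plus

# example over G = Z_2 (+) Z_2 with a uniform (useless) channel, just to exercise the code
moduli = (2, 2)
W = {u: np.full(4, 0.25) for u in group_elements(moduli)}
W_minus, W_plus = transform(W, moduli)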

185 Asymptotic Behavior of Synthesized Channels For d G, define Zd n (W ) similarly to (6.8) where q = G and for H G, define IH n (W ) by Equation (6.9). The following lemma is a restatement of Lemma VI.4. Here, we prove it for arbitrary groups. Lemma VI.9. For all d G, Zd n (W ) converges almost surely to a {0, }-valued random variable Z d (W ) as n grows. Moreover, if d G is such that d = d then Z d (W ) = Z d (W ) almost surely; i.e. the random processes Zñ d (W ) and Zn d (W ) converge to the same random variable. Proof. Similarly to the proof of Lemma VI.4, let H = d and let M be any maximal subgroup of H. Define d = argmax Z a (W ) (6.8) a H a/ M It is relatively straightforward to show that in the general case as well, Zd n (W ) converges to a {0, }-valued random variable Zd (W ). Indeed this part of the proof of Lemma VI.4 is general enough for arbitrary Abelian groups. Here we show that this implies Zd n (W ) also converges to a Bernoulli random variable. Let H = k i= qa i i where q i s are distinct primes and a i s are positive integers. Note that H is isomorphic to the cyclic group Z H. For i =,, k, define the subgroup M i = q i of Z H (and isomorphically of H) and let d i = argmax a H a/ M i Z a (W ). Note that for i =,, k, M i is a maximal subgroup of Z H (and isomorphically of H). Therefore, for i =,, k, Z n d i(w ) converges to a {0, }-valued random variable. If for some i =,, k, Z d i (W ) δ it follows that Z d (W ) δ (since d H\M i ) and if for all i =,, k, Z d i (W ) δ, it follows from Lemma VI.8 in Appendix that Z d(w ) 2kq 3 δ for any d d, d 2,, d k. Next, we show that d, d 2,, d k = H and this will prove that if for all i =,, k, Z d i (W ) δ 74

186 then Z d (W ) 2kq 3 δ. For i =,, k, since d i / M i it follows that d i 0 (mod q i ). Let a = i= k k j= j i q j d i Then we have a 0 (mod q i ) for all i =,, k. This implies a = H and hence d, d 2,, d k = H. Therefore, if in the limit Z d (W ) = 0 for some i =,, k then i Z d (W ) = 0 and if Z d i (W ) = 0 for all i =,, k then Z d (W ) =. This proves that Zd n (W ) converges to a Bernoulli random variable. If d G is such that d = d then it follows that d H and d / M i for i =,, k. Therefore if in the limit Z d i (W ) = 0 for some i =,, k then Z d(w ) = 0 and if Z d i (W ) = for all i =,, k then Z d(w ) =. This proves that the random processes Z ñ(w ) and d Zn d (W ) converge to the same random variable. In the asymptotic regime, let d, d 2,, d m be all elements of G such that Z di (W ) = and assume that for all other elements d G, Z d (W ) = 0 (we can make this assumption since in the limit Z d s are {0, }-valued). It is shown in Lemma VI.8 in Appendix that if Z di (W ) = for i =,, m then for any d d, d 2,, d m, Z d(w ) =. Therefore, d, d 2,, d m {d, d 2,, d m } and hence we must have {d, d 2,, d m } = d, d 2,, d m = H for some subgroup H of G. This means all possible asymptotic cases can be indexed by subgroups of G. i.e. for any H G, one possible asymptotic case is if d H; Case H: Z d (W ) = 0 Otherwise. where for H G, case H happens with some probability p H. Next, We study the behavior of I n in each of these cases. 75
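As a small illustration of the observation that the possible asymptotic cases are indexed by the subgroups of $G$, the following sketch (not taken from the text) lists the subgroups of a cyclic group $\mathbb{Z}_q$, one per divisor of $q$; each subgroup $H$ corresponds to the limiting pattern $Z_d = 1$ for $d \in H$ and $Z_d = 0$ otherwise.

def subgroups_of_Zq(q):
    """All subgroups of the cyclic group Z_q: one per divisor d of q, namely <d> = d*Z_q."""
    subs = {}
    for d in range(1, q + 1):
        if q % d == 0:
            subs[d] = sorted({(d * k) % q for k in range(q)})
    return subs

# each subgroup H is one possible asymptotic case for the vector (Z_d)_{d in G}
for d, H in subgroups_of_Zq(8).items():
    print(f"H = <{d}> = {H}   (index |G|/|H| = {8 // len(H)})")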

187 Lemma VI.0. For a channel (G, Y, W ) let S be a subgroup of G. For any ɛ > 0 there exists a δ > 0 such that if Z d δ for d S and Z d δ for d / S, then log 2 G S ɛ I 0 (W ) log 2 G S + ɛ. Proof. Let 0 = M 0 M M t S = M t M t+ G = M k for some positive integer k log 2 G be any chain of subgroups such that M s is maximal in M s for s =,, k. Fix some s {,, t} and let H = M s and M = M s and let T H be a transversal of H in G and let T M be a transversal of M in H. For d H, we have Z d (W ) δ. For t H T H define the channel W (y th + t M + M s ) similar to (6.). We have shown in Appendix that if for some d H\M, Z d (W ) δ then Z d+th +M( W ) 2q 2δ δ 2. Since the input alphabet of the channel W q M has a prime size (see Lemma VI.9 in Appendix ), we can use Lemma VI.5 in Appendix to conclude that Z( W ) 2q q2 2δ δ 2. Now, similarly to the proof M of Lemma VI.4, we use part (2) of Lemma VI.4 in Appendix to conclude I( W ) C 4 q δ for C = 4 q( q ) log 2 e 2. This result is valid for all t M H T H. Since I( W ) = I( X; Y X = th ), we conclude that I Ms (W ) I Ms (W ) = I H (W ) I M (W ) = t H T H P ( X = t H )I( X; Y X = th ) < C 4 δ Fix some s {t +,, k} and let H = M s and M = M s and let T H be a transversal of H in G and let T M be a transversal of M in H. For d H\M, we have Z d (W ) δ. For the channel W defined as above, we have shown in Appendix that if for all d H\M, Z d (W ) δ then Z( W ) M H G. Therefore, part () of H M 76

188 Lemma VI.4 in Appendix implies I( W q ) log 2. We conclude that + H G δ Therefore, I Ms (W ) I Ms (W ) = I H (W ) I M (W ) = log 2 q + H G δ log 2 M s / M s + G 2 δ t ( I G (W)= IMs (W ) I Ms (W ) ) k ( + IMs (W ) I Ms (W ) ) s= k s=t+ k s=t+ I Ms (W ) I Ms log 2 M s / M s + G 2 δ s=t+ log 2 G S k log 2( + G 2 δ) The choice of 0 < δ 2ɛ/k G 2 will guarantee that I 0 (W ) log 2 G S ɛ. Similarly, t ( I G (W)= IMs (W ) I Ms (W ) ) k ( + IMs (W ) I Ms (W ) ) s= t C 4 δ + s= k s=t+ kc 4 G δ + log 2 S s=t+ log 2 M s M s The choice of 0 < δ ( ) ɛ 4 kc will guarantee that I 0 G (W ) log 2 + ɛ. S We have shown that the process I n converges to the following discrete random variable: I = log 2 G H with probability p H for H G. For H G, define the random variable Z H (W (i) N ) = d/ H Z d(w (i) N ) and the random process (Z H ) (n) (W ) = Z H (W (Jn) N ) where J n is a uniform random variable over {, 2,, N = 2 n }. Note that (Z H ) (n) (W ) converges almost surely to a random variable (Z H ) ( ) (W ) and P ( (Z H ) ( ) = 0 ) = S H p S. 77
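The quantities driving this convergence are straightforward to evaluate for any finite channel. The sketch below is illustrative only (the example channel is made up, and $Z^H$ is taken here as a plain average over $d \notin H$, which may differ from the text's normalization by a constant): it computes $Z_d(W) = \frac{1}{q}\sum_{x}\sum_{y}\sqrt{W(y\mid x)W(y\mid x+d)}$ for a channel given as a $q \times |\mathcal{Y}|$ matrix with input alphabet $\mathbb{Z}_q$.

import numpy as np

def Z_d(W, d):
    """Z_d(W) = (1/q) * sum_x sum_y sqrt(W(y|x) W(y|x+d))."""
    q = W.shape[0]
    return sum(np.sum(np.sqrt(W[x] * W[(x + d) % q])) for x in range(q)) / q

def Z_H(W, H):
    """Average of Z_d over the elements d outside the subgroup H (one possible normalization)."""
    q = W.shape[0]
    outside = [d for d in range(q) if d not in H]
    return sum(Z_d(W, d) for d in outside) / len(outside)

# a Z_4 channel that never distinguishes inputs differing by 2: Z_2 = 1 while Z_1 = Z_3 = 0.5
W = np.array([[0.5, 0.0, 0.25, 0.25],
              [0.0, 0.5, 0.25, 0.25],
              [0.5, 0.0, 0.25, 0.25],
              [0.0, 0.5, 0.25, 0.25]])
print(Z_d(W, 1), Z_d(W, 2), Z_d(W, 3), Z_H(W, H=[0, 2]))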

Summary of Channel Transformation

For the channel $(G, \mathcal{Y}, W)$, the convergence of the processes $I_n$ and $(Z^H)^{(n)}$ for $H \le G$ implies that for all $\epsilon > 0$, there exists a number $N = N(\epsilon)$ and a partition $\{A^\epsilon_H \mid H \le G\}$ of $\{1, \dots, N\}$ such that for $H \le G$ and $i \in A^\epsilon_H$, $I(W^{(i)}_N) = \log_2 \frac{|G|}{|H|} + O(\epsilon)$ and $Z^H(W^{(i)}_N) = O(\epsilon)$. Moreover, as $\epsilon \to 0$, $\frac{|A^\epsilon_H|}{N} \to p_H$ for some probabilities $p_H$, $H \le G$. In the Appendix, we show that for any $\beta < \frac{1}{2}$ and for $H \le G$,
$$\lim_{n \to \infty} P\left( (Z^H)^{(n)} < 2^{-2^{\beta n}} \right) \ge P\left( (Z^H)^{(\infty)} = 0 \right) \qquad (6.9)$$
We have proved the following theorem:

Theorem VI.11. For all $\epsilon > 0$, there exists a number $N = N(\epsilon) = 2^{n(\epsilon)}$ and a partition $\{A^\epsilon_H \mid H \le G\}$ of $\{1, \dots, N\}$ such that for $H \le G$ and $i \in A^\epsilon_H$, $I(W^{(i)}_N) = \log_2 \frac{|G|}{|H|} + O(\epsilon)$ and $Z^H(W^{(i)}_N) < 2^{-2^{\beta n(\epsilon)}}$. Moreover, as $\epsilon \to 0$, $\frac{|A^\epsilon_H|}{N} \to p_H$ for some probabilities $p_H$, $H \le G$.

Encoding and Decoding

At the encoder, if $i \in A^\epsilon_H$ for some $H \le G$, the information symbol is chosen from the transversal $T_H$ arbitrarily. Let $\mathcal{X}^\epsilon_N = \bigoplus_{H \le G} T_H^{A^\epsilon_H}$ be the set of all valid input sequences. As in the $\mathbb{Z}_{p^r}$ case, the message is dithered with a uniformly distributed random vector $b^N \in \bigoplus_{H \le G} H^{A^\epsilon_H}$ revealed to both the encoder and the decoder. A message $v^N \in \mathcal{X}^\epsilon_N$ is encoded to the vector $x^N = (v^N + b^N) G_N$. Note that $u^N = v^N + b^N$ is uniformly distributed over $G^N$. At the decoder, after observing the output vector $y^N$, for $H \le G$ and $i \in A^\epsilon_H$, we use the following decoding rule:
$$\hat{u}_i = f_i(y^N, \hat{u}^{i-1}) = \operatorname*{argmax}_{g \in b_i + T_H} W^{(i)}_N(y^N, \hat{u}^{i-1} \mid g)$$

where the ties are broken arbitrarily. Finally, the message is recovered as $\hat{v}^N = \hat{u}^N - b^N$. The total number of valid input sequences is equal to
$$2^{NR} = \prod_{H \le G} |T_H|^{|A_H|} = \prod_{H \le G} \left( \frac{|G|}{|H|} \right)^{|A_H|}$$
Therefore, $R = \frac{1}{N} \sum_{H \le G} |A_H| \log_2 \frac{|G|}{|H|}$. On the other hand, since $I_n$ is a martingale, we have $E\{I_\infty\} = I_0$. Since $E\{I_\infty\} = \sum_{H \le G} p_H \log_2 \frac{|G|}{|H|}$, we observe that the rate $R$ converges to the symmetric capacity $I_0$ as $\epsilon \to 0$. We will see in the next section that this rate is achievable. It is worth mentioning that the complexity of these codes is similar to the binary case; i.e., the complexity of encoding and the complexity of successive cancellation decoding are both $O(N \log N)$ as functions of the code block length $N$.

6.1.4 Error Analysis

For $H \le G$ and $i \in A^\epsilon_H$, define the events $B_i$ and $E_i$ according to Equations (6.3) and (6.4). Similarly to the $\mathbb{Z}_{p^r}$ case, it is straightforward to show that for $H \le G$ and $i \in A^\epsilon_H$, $P(E_i) \le q^2 Z^H(W^{(i)}_N)$ where $q = |G|$. The probability of block error is given by $P(\mathrm{err}) = \sum_{H \le G} \sum_{i \in A^\epsilon_H} P(B_i)$. Since $B_i \subseteq E_i$, we get
$$P(\mathrm{err}) \le \sum_{H \le G} \sum_{i \in A^\epsilon_H} q^2 Z^H(W^{(i)}_N) \le \sum_{H \le G} |A^\epsilon_H|\, q^2\, 2^{-2^{\beta n}} \le q^2 N 2^{-2^{\beta n}}$$
for any $\beta < \frac{1}{2}$. Therefore, the probability of block error goes to zero as $\epsilon \to 0$ ($n \to \infty$).
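The rate expression $R = \frac{1}{N} \sum_{H} |A_H| \log_2 \frac{|G|}{|H|}$ derived before the error analysis is easy to evaluate once the index partition is known. The following sketch is illustrative only (the partition and the subgroup labels are made up); in the limit the fractions $|A_H|/N$ are replaced by the probabilities $p_H$.

import math

def polar_rate(A, G_size, subgroup_sizes, N):
    """R = (1/N) * sum_H |A_H| * log2(|G|/|H|) for a partition A = {label: set of indices}."""
    return sum(len(idx) * math.log2(G_size / subgroup_sizes[h]) for h, idx in A.items()) / N

# hypothetical partition of N = 8 indices among the subgroups {0}, {0,2} and Z_4 of Z_4
N = 8
A = {"{0}": {0, 1, 2}, "{0,2}": {3, 4}, "Z4": {5, 6, 7}}
sizes = {"{0}": 1, "{0,2}": 2, "Z4": 4}
print(polar_rate(A, G_size=4, subgroup_sizes=sizes, N=N))   # (3*2 + 2*1 + 3*0) / 8 = 1.0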

6.1.5 Relation to Group Codes

Recall that for an arbitrary group $G$, the polar encoder of length $N$ introduced in this paper maps the set $\bigoplus_{H \le G} T_H^{A_H}$ to $G^N$, where for a subgroup $H$ of $G$, $T_H$ is a transversal of $H$ and $\{A_H \mid H \le G\}$ is some partition of $\{1, \dots, N\}$. Note that the set of messages $\bigoplus_{H \le G} T_H^{A_H}$ is not necessarily closed under addition and hence, in general, the set of encoder outputs is not a subgroup of $G^N$; i.e., the polar codes constructed and analyzed in the preceding sections are not group encoders. In contrast, standard polar codes (i.e., polar codes in which only perfect channels are used) are indeed group codes, since their set of messages is of the form $G^{A} \times \{0\}^{\{1,\dots,N\}\setminus A}$ for some $A \subseteq \{1, \dots, N\}$, which is closed under addition.

It is worth mentioning that the polar encoders constructed in this paper fall into a larger class of structured codes called nested group codes. A nested group code consists of two group codes, the inner code $C_i$ and the outer code $C_o$, such that the inner code is a subgroup of the outer code ($C_i \le C_o$). The set of messages consists of the cosets of $C_i$ in $C_o$. For the case of polar codes, the inner code is given by
$$C_i = \left[ \bigoplus_{H \le G} H^{A_H} \right] G_N = \left\{ m G_N \;\middle|\; m \in \bigoplus_{H \le G} H^{A_H} \right\}$$
and the outer code is the whole group space: $C_o = G^N$. To verify that this is indeed the case, it suffices to show that the set of codewords of polar codes, $\left[ \bigoplus_{H \le G} T_H^{A_H} \right] G_N$, has exactly one element in common with each coset of $C_i$. Equivalently, it suffices to show that for $m_1, m_2 \in G^N$, if $m_1 G_N - m_2 G_N \in C_i$, then either $m_1 = m_2$, or $m_1 \notin \bigoplus_{H \le G} T_H^{A_H}$, or $m_2 \notin \bigoplus_{H \le G} T_H^{A_H}$.

Lemma VI.12. For $N = 2^n$ where $n$ is a positive integer, the generator matrix corresponding to polar codes, $G_N = B_N F^{\otimes n}$, is full rank.

192 Proof. Since G N = B N F n where B N is a permutation of rows, it suffices to show that F n is full rank. Note that the rank of the Kronecker product of two matrices is equal to the product of the ranks of matrices and the rank of F is equal to 2. Hence we have rank(g) = rank(f n ) = 2 n = N. This lemma implies that if m G m 2 G C i then m m 2 H G HA H. This means either m are indeed nested group codes. / H G T A H H or m 2 / H G T A H. This proves that polar codes H In this section, we consider two examples of channels over Z 4. The first example is Channel introduced in Section Based on the symmetry of this channel, we show that polar codes achieve the group capacity of this specific channel. The intent of the second example is to show that in general, polar codes do not achieve the group capacity of channels. In order to find the capacity of polar codes as group codes, we use the standard construction of polar codes, i.e. we only use perfect channels and fix partially perfect and useless channels Example Consider Channel of Figure 6.. Define H 0 = {0,, 2, 3}, H = {0, 2} and H 2 = {0} and define K 0 = {, 3}, K = {2} and K 2 = {0}. For this channel we have: I 0 I(X; Y ) = 2 ɛ 2λ I2 0 I(X ; Y ) = (ɛ + λ) (I 2) 0 I(X ; Y ) = (ɛ + λ) = I2 0 where X is uniform over Z 4, X is uniform over H and X is uniform over + H. The capacity of group codes over this symmetric channel is equal to [6]: C = min(i4, 0 I2 0 + (I 2) 0 ) = min(2 ɛ 2λ, 2 2ɛ 2λ) = 2 2ɛ 2λ 8

193 All possible cases for this channel are Case 0: Z = Z 3 =, Z 2 = Case : Z = Z 3 = 0, Z 2 = Case 2: Z = Z 3 = 0, Z 2 = 0 As we saw in Figures 6.2 and 6.3, this result agrees with the asymptotic behavior of I n predicted by the recursion formulas (6.) and (6.2). Define I(W b b 2 b n ) = I(X; Y ) where X, Y are the input and output of W b b 2 b n and X is uniform over Z 4. Similarly, define I 2 (W b b 2 b n ) = I(X ; Y ) where X, Y are the input and output of W b b 2 b n and X is uniform over H and define I 2(W b b 2 b n ) = I(X ; Y ) where X, Y are the input and output of W b b 2 b n and X is uniform over + H. Define the mutual information processes I n 4, I n 2 and (I 2) n to be equal to I(W b b 2 b n ), I 2 (W b b 2 b n ) and I 2(W b b 2 b n ) where for i =,, n, b i s are iid Bernoulli(0.5) random variables. For this channel, we can show that I 2 (W b b 2 b n ) = I 2(W b b 2 b n ) = (ɛ n + λ n ) and conclude that (I 2 + I 2) n I n 2 + (I 2) n is a martingale. Therefore I n 4 and (I 2 + I 2) n converge almost surely to random variables I 4 and (I 2 + I 2) respectively. This observation provides us with an ad-hoc way to find the probabilities p t, t = 0,, 2 of the limit random variable I 4 for this simple channel. We can show the following for the final states: case 0 I 4 = 0, (I 2 + I 2) = 0 case I 4 =, (I 2 + I 2) = 0 case 2 I 4 = 2, (I 2 + I 2) = 2 82

194 Therefore, we obtain the following three equations: E{I 4 } = p p + p 2 2 = I 0 4 = 2 ɛ 2λ E{(I 2 + I 2) }=p 0 0+p 0+p 2 2=(I 2 +I 2) 0 =2 2ɛ 2λ p 0 + p + p 2 = Solving this system of equations, we obtain: p 2 = ɛ λ = C/2 p = I 0 4 (I 2 + I 2) 0 p 0 = ( I 0 4 (I 2 + I 2) 0 /2 ) We see that the fraction of perfect channels is equal to the capacity of the channel achievable using group codes and therefore, polar codes achieve the capacity of group codes for this channel Example 2 The channel is depicted in Figure 6.7. We call this Channel 3. For this channel, when λ = 0.2 we have: I 0 = I(X; Y ) = (I I 2) 0 = 0.26 The rate C = min(i 0 4, (I 2 +I 2) 0 ) = (I 2 +I 2) 0 = 0.26 is achievable using group codes over this channel [6]. For this channel, we have three possible asymptotic cases: Case 0: Z =, Z 2 = I 4 = 0, (I 2 + I 2) = 0 Case : Z = 0, Z 2 = I 4 =, (I 2 + I 2) = 0 Case 2: Z = 0, Z 2 = 0 I 4 = 2, (I 2 + I 2) = 2 83

195 Therefore we obtain the following three equations: E{I 4 } = p p + p 2 2 E{(I 2 + I 2) } = p p 0 + p 2 2 p 0 + p + p 2 = Therefore, the achievable rate using polar codes over this channel is equal to R = 2p 2 = E{(I 2 + I 2) }. We have E{(I 2 + I 2) } = which is strictly less than (I 2 + I 2) 0. The following lemma implies R = E{(I 2 + I 2) } E{(I 2 + I 2) } < C = (I 2 + I 2) 0 and completes the proof. Lemma VI.3. For a channel (Z 4, Y, W ), the process (I 2 + I 2) n, n = 0,, 2, is a super-martingale. Proof. Follow from Lemma VI.2 with H = {0, 2} Appendix Polar Codes Over Abelian Groups Given a k n matrix G n of 0 s and s, one can construct a group code as follows: Given any message tuple u k G k, encode it to u k G n. Where the elements of G n determine whether an element of u k appears as a summand in the encoded word or not. For example consider the generator matrix G 4 =

196 Then u 4 G 4 is defined as [u u 2 u 3 u 4 ] = u + u 2 + u 3 + u 4 u 3 + u 4 u 2 + u 4 u 4 Using this convention, we can define a group code based on a given binary matrix without actually defining a multiplication operation for the group Recursion Formula for Channel Recursion for W + We show that W + (corresponding to b = ) is equivalent to a channel of the same type as W but with different parameters ɛ and λ corresponding to ɛ and λ respectively; where, ɛ = ɛ 2 + 2ɛλ λ = λ 2 We say an output tuple (y, y 2, u ) is connected to an input u 2 Z 4 if W + (y, y 2, u u 2 ) = W (y 4 u + u 2 )W (y 2 u 2 ) is strictly positive. First, let us assume the output tuple (y, y 2, u ) is connected to all u 2 Z 4. Then W (y 2 u 2 ) must be nonzero for all u 2 and hence y 2 = E 3. Similarly since W (y u +u 2 ) is nonzero for all u 2 (and hence all u + u 2 ) it follows that y = E 3. Therefore W + (E 3, E 3, u u 2 ) = 4 λ2 for all u, u 2 Z 4 and these are all output tuples connected to all inputs (with positive probability). Since all of these output tuples are equivalent we can combine them to get a single output symbol connected to all four inputs with probability λ 2. 85

197 Next we show that if an output tuple is connected to an input from {0, 2} and an input from {, 3}, then it is connected to all inputs. Consider the case where the output tuple (y, y 2, u ) is connected to both 0 and i.e. W + (y, y 2, u 0) and W + (y, y 2, u ) are both nonzero. Then since W (y 2 0) 0 and W (y 2 ) 0, it follows that y 2 = E 3. Similarly since W (y u ) 0 and W (y u + ) 0, it follows that y = E 3. We have already seen that for all u Z 4, the output tuple (E 3, E 3, u ) is connected to all input symbols. The proof is similar for other three cases i.e. when (y, y 2, u ) is connected to 0 and 3, when (y, y 2, u ) is connected to 2 and, and when (y, y 2, u ) is connected to 2 and 3. Next we find all output tuples which are connected to both 0 and 2 but are not connected to or 3. Let (y, y 2, u ) be an output tuple such that W + (y, y 2, u 0) 0, W + (y, y 2, u 2) 0, W + (y, y 2, u ) = 0 and W + (y, y 2, u 3) = 0. First assume u {0, 2}. Since W (y 2 0) 0 and W (y 2 2) 0, it follows that y 2 {E, E 3 } and since W (y u ) 0 and W (y u + 2) 0, it follows that y {E, E 3 }. Note that for y = E 3 and y 2 = E 3, the output tuple is connected to all inputs and therefore all possible cases are y = E, y 2 = E, y = E, y 2 = E 3 and y = E 3, y 2 = E. In all cases it can be shown that W + (y, y 2, u ) = 0 and W + (y, y 2, u 3) = 0. Hence for u {0, 2}, (E, E, u ) is connected to 0 and 2 with probabilities 4 ɛ2 and is not connected to or 3. (E, E 3, u ) is connected to 0 and 2 with probabilities 4 ɛλ and is not connected to or 3. (E 3, E, u ) is connected to 0 and 2 with probabilities ɛλ and is not connected to or 3. 4 Now assume u {, 3}. Same as above we have y 2 {E, E 3 } and since W (y u ) 0 and W (y u + 2) 0, it follows that y {E 2, E 3 }. In this case, all possible cases are y = E 2, y 2 = E, y = E 2, y 2 = E 3 and y = E 3, y 2 = E. In all cases it can be shown that W + (y, y 2, u ) = 0 and W + (y, y 2, u 3) = 0. Hence for u {, 3}, (E 2, E, u ) is connected to 0 and 2 with probabilities 4 ɛ2 and is not connected to or 86

198 3. (E 2, E 3, u ) is connected to 0 and 2 with probabilities ɛλ and is not connected to 4 or 3. (E 3, E, u ) is connected to 0 and 2 with probabilities ɛλ and is not connected 4 to or 3. Therefore, there are four equivalent outputs connected to 0 and 2 with probabilities 4 ɛ2 and not connected to or 3 and there are eight equivalent outputs connected to 0 and 2 with probabilities ɛλ and not connected to or 3. Since all of these 4 outputs are equivalent, we can combine them into one output connected to 0 and 2 with probabilities ( ) ( ) 4 4 ɛ ɛλ = ɛ 2 + 2ɛλ Now we find all output tuples which are connected to both and 3 but are not connected to 0 or 2. Let (y, y 2, u ) be an output tuple such that W + (y, y 2, u ) 0, W + (y, y 2, u 3) 0, W + (y, y 2, u 0) = 0 and W + (y, y 2, u 2) = 0. First assume u {0, 2}. Since W (y 2 ) 0 and W (y 2 3) 0, it follows that y 2 {E 2, E 3 } and since W (y u + ) 0 and W (y u + 3) 0, it follows that y {E 2, E 3 }. Note that for y = E 3 and y 3 = E 3, the output tuple is connected to all inputs and therefore all possible cases are y = E 2, y 2 = E 2, y = E 2, y 2 = E 3 and y = E 3, y 2 = E 2. In all cases it can be shown that W + (y, y 2, u 0) = 0 and W + (y, y 2, u 2) = 0. Hence for u {0, 2}, (E 2, E 2, u ) is connected to and 3 with probabilities 4 ɛ2 and is not connected to 0 or 2. (E 2, E 3, u ) is connected to and 3 with probabilities 4 ɛλ and is not connected to 0 or 2. (E 3, E 2, u ) is connected to and 3 with probabilities ɛλ and is not connected to 0 or 2. 4 Now assume u {, 3}. Same as above we have y 2 {E 2, E 3 } and since W (y u + ) 0 and W (y u + 3) 0, it follows that y {E, E 3 }. In this case, all possible cases are y = E, y 2 = E 2, y = E, y 2 = E 3 and y = E 3, y 2 = E 2. In all cases it can be shown that W + (y, y 2, u 0) = 0 and W + (y, y 2, u 2) = 0. Hence for u {, 3}, (E, E 2, u ) is connected to and 3 with probabilities 4 ɛ2 and is not connected to 0 or 2. (E, E 3, u ) is connected to and 3 with probabilities ɛλ and is not connected to 4 87

199 0 or 2. (E 3, E 2, u ) is connected to and 3 with probabilities ɛλ and is not connected 4 to 0 or 2. Therefore, there are four equivalent outputs connected to and 3 with probabilities 4 ɛ2 and not connected to 0 or 2 and there are eight equivalent outputs connected to and 3 with probabilities ɛλ and not connected to 0 or 2. Same as above, since all 4 of these outputs are equivalent, we can combine them into one output connected to and 3 with probabilities ɛ 2 + 2ɛλ. We have shown that there is (equivalently) one channel output (call it E 3 + ) connected to all inputs u 2 Z 4 with conditional probability λ = λ 2 and we have shown that if a channel output is connected to more that one input but is not connected to all inputs, it is either connected to {0, 2} and is not connected to {, 3} (call it E + ) or it is connected to {0, 2} and is not connected to {, 3} (call it E 2 + ). 0 and 2 are connected to E + with probabilities ɛ = ɛ 2 + 2ɛλ and and 3 are connected to E + 2 with probabilities ɛ = ɛ 2 + 2ɛλ. Then for each input u 2 Z 4 these exist several outputs which are only connected to u 2 and not other inputs and whose sum of probabilities add up to ɛ λ. This completes the proof for W +. Recursion for W We show that W (corresponding to b = 0) is equivalent to a channel of the same type as W but with different parameters ɛ and λ corresponding to ɛ and λ respectively; where, ɛ = 2ɛ ( ɛ 2 + 2ɛλ ) λ = 2λ λ 2 Note that each channel output is a pair (y, y 2 ) {0,, 2, 3, E, E 2, E 3 } 2. The channel W can be shown to be as following: 88

200 Output pairs (0, 0), (, ), (2, 2), (3, 3) are only connected to input 0 each with conditional probability 4 ( ɛ λ)2. This is equivalent to one channel output only connected to 0 with probability ( ɛ λ) 2. Output pairs (0, 2), (, 3), (2, 0), (3, ) are only connected to input 2 each with conditional probability 4 ( ɛ λ)2. This is equivalent to one channel output only connected to 2 with probability ( ɛ λ) 2. Output pairs (0, 3), (, 0), (2, ), (3, 2) are only connected to input each with conditional probability 4 ( ɛ λ)2. This is equivalent to one channel output only connected to with probability ( ɛ λ) 2. Output pairs (0, ), (, 2), (2, 3), (3, 0) are only connected to input 3 each with conditional probability 4 ( ɛ λ)2. This is equivalent to one channel output only connected to 3 with probability ( ɛ λ) 2. Output pairs (0, E ), (, E 2 ), (2, E ), (3, E 2 ), (E, 0), (E, 2), (E 2, ), (E 2, 3) are only connected to inputs 0 and 2 each with conditional probability ɛ( ɛ λ). Output 4 pairs (E, E ), (E 2, E 2 ) are only connected to inputs 0 and 2 each with conditional probability 2 ɛ2. This is equivalent to one channel output only connected to 0 and 2 with probability ɛ = 8 4 ɛ( ɛ λ) ɛ2 = 2ɛ ( ɛ 2 + 2ɛλ ) Output pairs (0, E 2 ), (, E ), (2, E 2 ), (3, E ), (E, ), (E, 3), (E 2, 0), (E 2, 2) are only connected to inputs and 3 each with conditional probability ɛ( ɛ λ). Output 4 pairs (E, E 2 ), (E 2, E ) are only connected to inputs and 3 each with conditional probability 2 ɛ2. This is equivalent to one channel output only connected to and 3 with probability 2ɛ (ɛ 2 + 2ɛλ). Output pairs (0, E 3 ), (, E 3 ), (2, E 3 ), (3, E 3 ), (E 3, 0), (E 3, ), (E 3, 2), (E 3, 3) are connected to all inputs each with conditional probability λ( ɛ λ). Output pairs 4 89

201 (E, E 3 ), (E 2, E 3 ), (E 3, E ), (E 3, E 2 ) are connected to all inputs each with conditional probability ɛλ. Output pair (E 2 3, E 3 ) is connected to all inputs with conditional probability λ 2. This is equivalent to one channel output only connected to all inputs with probability ɛ = 8 4 λ( ɛ λ) + 4 ɛλ + λ2 2 = 2λ λ 2 We have listed all 49 channel outputs and the corresponding probabilities. This completes the proof for W Some Useful Lemmas Lemma VI.4. Let W be a channel with prime input alphabet size q. We have the following relations between I( W ) (in bits) and Z( W ):. I( W ) log 2 q +( q )Z( W ) 2. I( W ) 2( q )(log 2 e) Z(W ) 2 Proof. This lemma is a restatement of Proposition 2 of [66]. Lemma VI.5. Let W be a channel with prime input alphabet size q and define d = argmax a 0 Z a ( W ). If Z d ( W ) δ for some δ > 0, then Z( W ) q( q ) 2 δ. Proof. This lemma has been proved in [66] (Lemma 4). Lemma VI.6. For any d G, we have Z d (W + ) = Z d (W ) 2 90

202 Proof. By definition, Z d (W + ) is equal to q q W(y u 2 +u 2 )W(y 2 u 2 )W(y u +u 2 +d)w(y 2 u 2 +d) u G u 2 G y,y 2 Y = q = q u 2 G y 2 Y W (y2 u 2 )W (y 2 u 2 + d) q u 2 G y 2 Y = Z d (W ) 2 u G y Y W (y u +u 2 )W (y u +u 2 +d) W (y2 u 2 )W (y 2 u 2 + d) q ũ G y Y W (y ũ )W (y ũ + d) Lemma VI.7. For d G, if Z d (W ) δ for some δ > 0, then for d d, Z d(w ) q 3 δ where q = G. Proof. First note that Z d (W ) = W (y x)w (y x + d) q x G y Y = 2q [ ] 2 W (y x) W (y x + d) x G y Y Therefore Z d (W ) δ implies [ ] 2 W (y x) W (y x + d) 2qδ y Y 9

203 for all x G. Since d d, we have d = id for some i q. Therefore, Z d(w ) = W (y x)w (y x + id) q x G y Y = [ ] 2 W (y x) W (y x + id) 2q = 2q x G y Y [ i x G y Y i 2q = i 2q ( ) ] 2 W (y x + jd) W (y x + (j + )d) j=0 i [ ] 2 W (y x + jd) W (y x + (j + )d) x G y Y j=0 i [ ] 2 W (y x + jd) W (y x + (j + )d) x G j=0 y Y i 2qδ 2 q 3 δ x G j=0 Lemma VI.8. For d,, d m G, if Z d (W ) δ, Z d2 (W ) δ,, Z dm (W ) δ, then Z d(w ) 2mq 3 δ for any d d, d 2,, d m where d d, d 2,, d m is the subgroup of G generated by d,, d m. Proof. We prove this theorem for m = 2. The general case is a straightforward generalization of this case. Similarly to the proof of Lemma VI.7, for all x G, we have [ ] 2 W (y x) W (y x + dl ) 2qδ y Y for l =, 2. Since d d, d 2, it can be written as d = id + jd 2 for some integers 92

204 i, j q. Therefore, Z d(w ) = W (y x)w (y x + id + jd 2 ) q x G y Y = [ 2 W (y x) W (y x + id + jd 2 )] 2q = 2q x G y Y [ i x G y Y ( W (y x + kd ) W (y x + (k + )d )) k=0 j ( + W(y x+id +kd 2 ) ) ] 2 W(y x+id +(k+)d 2 ) k=0 i+j 2q x G y Y ( i [ W (y x + kd ) 2 W (y x + (k+)d )] k=0 j [ + W(y x+id +kd 2 ) ) ] 2 W(y x+id +(k+)d 2 ) x G 4q 3 δ k=0 ( i ) j 2qδ + 2qδ k=0 k=0 Lemma VI.9. For an Abelian group G, let M be a maximal subgroup. Then the index of M in G is a prime. Proof. Since M is normal in G, there is a one-to-one correspondence between the subgroups of the quotient group G/M and the subgroups of G containing M. By maximality of M, the latter only contains G and M (which is not equal to G). Hence, the only subgroups of G/M are {0} and G/M (which is not equal to {0}). Hence the order of G/M must be a prime. 93

205 Upper Bound on Z( W ) Assume Z d (W ) δ. This implies q x G y Y for all d H\M. Therefore for each x G, y Y W (y x)w (y x + d) δ W (y x)w (y x + d) qδ (6.20) The Bhattacharyya parameter of the channel W, Z( W ), is given by: q( q ) y Y t M,t M T M t M t M W (y th + t M + M) W (y t H + t M + M) = / M W(y t H +t M +m) W(y t H +t M q( q ) + m ) y Y m M t M,t M T M t M t M = / M q( q ) y Y t M,t M T M t M t M y Y t M,t M T M t M t M m M m,m W(y t H +t M +m)w(y t H +t M +m ) / M W(y t H +t M +m)w(y t H +t M q( q ) +m ) m,m Let x = t H +t M +m and x = t H +t M +m. Note that x x = t M t M +m m H since t M, t M, m, m H. Also note that since t M t M and m m M, x x / M. Now we use (6.20) to conclude: Z( W ) q( q ) M t M,t M T M t M t M m,m M q( q ) M ( H M )2 M 2 qδ = qδ M H G δ H M 94
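Lemma VI.19 can be checked mechanically on small examples. The sketch below is only an illustration: it enumerates the subgroups of the cyclic group $\mathbb{Z}_n$ (one subgroup $\langle d \rangle$ per divisor $d$ of $n$), selects the maximal ones, and verifies that each has prime index.

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def is_prime(m):
    return m > 1 and all(m % k for k in range(2, int(m ** 0.5) + 1))

def subgroup(d, n):
    """The subgroup <d> = d*Z_n of Z_n."""
    return frozenset((d * k) % n for k in range(n))

def maximal_subgroups(n):
    proper = {subgroup(d, n) for d in divisors(n)} - {subgroup(1, n)}
    return [H for H in proper if not any(H < K for K in proper)]

for n in range(2, 31):
    for H in maximal_subgroups(n):
        assert is_prime(n // len(H)), (n, sorted(H))
print("every maximal subgroup of Z_n, 2 <= n <= 30, has prime index")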

206 Remark VI.20. For an arbitrary Abelian group G, let H G be an arbitrary subgroup and let M be any maximal subgroup of H. If for all d H\M, Z d(w ) δ then with a similar argument as above we can show that Z( W ) M H G δ where H M W is defined by (6.) Lower Bound on Z d +t H +M( W ) Assume Z d (W ) δ. Define D d (W ) = W (y x) W (y x + d ) 2q x G y Y First we show that Z d (W ) δ implies D d (W ) 2δ δ 2. Define the following quantities: q x,y = W (y x) + W (y x + d ) 2 δ x,y = 2 W (y x) W (y x + d ) Then we have Z d (W ) = q = q x G y Y x G y Y (q x,y δ x,y )(q x,y + δ x,y ) q 2 x,y δ 2 x,y Also we have D δ x,y = D d (W ), q x G y Y and 0 δ x,y q x,y Note that Z d (W ) q x G max d x,y: y Y dx,y=d q x G y Y q 2 x,y d 2 x,y 95

207 The Lagrangian for this optimization problem is given by ( L = qx,y q 2 d 2 x,y λ q we have x G y Y x G y Y d x,y D ) d L = x,y d x,y q 2 x,y d 2 x,y λ q and 2 d 2 x,y q 2 x,y L = 0 (qx,y 2 d 2 x,y) 3 2 Define γ = λ to get d γ q x,y = 2 q +γ 2 x,y. We have y Y q x,y =, therefore, Therefore we have D = have q q d x,y = q x G y Y γ = 2 + γ 2 q x G y Y γ = 2 + γ 2 x G y Y x G y Y γ 2 + γ 2 q x,y q x,y γ 2 and hence d +γ 2 x,y = Dq x,y. For this choice of d x,y we q 2 x,y d 2 x,y = D 2 q = D 2 x G y Y q x,y Therefore, we have shown that Z d (W ) D d (W ) 2. This implies that D d (W ) 2δ δ2. Next, we show that D d (W ) δ implies D d +t H +M( W ) 2qδ. By definition, q M 96

208 D d +t H +M( W ) is equal to 2q y Y t M T M W (y th +t M +M) W (y t H +t M +d +M) = q M H +t M y Y m MW(y t +m) W(y t H +t M +d +m) m M q q t M T M M W (y t H +t M +m) W (y t H +t M +d +m) y Y t M T M m M M 2qD d (W ) This shows that D d (W ) δ implies D d +t H +M( W ) 2qδ q M. Next, we show that D d (W ) δ implies Z d (W ) δ. We need the following lemma: Lemma VI.2. For constants 0 a b, with b a δ, Proof. Note that a + b 2 ab a + b 2 ab δ 2 a + x max ax 0 x a δ 2 We have [ a + x ] ax = x 2 2 a 2 ax 0 for all x a. Therefore the maximum is attained at x = a + δ. Therefore, a + b 2 ab a + (a + δ) 2 a(a + δ) The maximum of the right hand side is attained at a = 0, hence, a + b 2 ab δ 2 97

209 Assume D d (W ) δ. By definition, we have Z d (W ) is equal to W (y x)w (y x + d ) q x G y Y = ( W (y x)+w (y x+d ) q 2 (a) q y Y x G x G y Y = D d (W ) 2 W (y x) W (y x + d ) ) W (y x)w (y x+d ) where (a) follows from Lemma VI.2 with a = W (y x), b = W (y x + d ) and δ = W (y x) W (y x + d ). This shows that D d (W ) δ implies Z d (W ) > δ. We have shown that Z d (W ) δ implies D d (W ) 2δ δ 2. This implies D d +t H +M( W ) 2q 2δ δ 2 q M and this in turn implies Z d +t H +M( W ) 2q 2δ δ 2. q M Remark VI.22. For an arbitrary Abelian group G, let H G be an arbitrary subgroup and let M be any maximal subgroup of H. If for some d H\M, Z d(w ) δ then with a similar argument as above, we can show that Z d+th +M ( W ) 2q 2δ δ 2 q M where W is defined by (6.) Alternate Proof for a Lower Bound on Z d +t H +M( W ) In Appendix , we proved that Z d (W ) δ implies Z d +t H +M( W ) 2q 2δ δ 2 q M for the general case. In this part, we give a shorter proof for the Z p r case for a slightly different statement: If Z d (W ) δ then Z d +t H +M( W ) q 2 δ Assume Z d (W ) δ. It follows that for all x G, 2 y Y [ ] 2 W (y x) W (y x + d ) = y Y W (y x)w (y x + d ) qδ 98

210 For any d d we have d = id for some i q. Therefore, for any x G, y Y W (y x)w (y x + d) is equal to [ W ] 2 (y x) W (y x + 2 d) y Y = 2 2 y Y [ i j=0 ( W (y x+jd ) ) ] 2 W (y x+(j+)d ) i [ W (y x + jd ) 2 W (y x + (j + )d )] y Y j=0 i qδ q 2 δ (6.2) j=0 By definition, Z d +t H +M( W ) is equal to W (y th + t M + M) q W (y t H + t M + d + M) y Y t M T M = q M W(y t H+t 2 M +m)w(y t H +t M +d +m ) y Y t M T M (a) q q y Y t M T M m,m M m M m M M 2 W (y th +t M +m)w (y t H +t M +d +m ) min m,m t M T M y Y W(y th +t M +m)w(y t H +t M +d +m ) where (a) follows since is a concave function. Let x = t H + t M + m and x = t H + t M + d + m. It follows that x x = d + (m m). Since d, m, m H we have x x H. Since G and hence H are Z p r rings it follows that d H\M generates H; hence x x d. We can use (6.2) to get Z d +t H +M( W ) q t M T M The Rate of Polarization min ( m,m M q2 δ) = q 2 δ Recall that for t = 0,, r, (Z t ) (n) = d/ H t Z d (W (Jn) N ) where J n is uniform over {, 2,, 2 n }. For t = 0,, r, define (Zmax) t (n) = max d/ Ht Z d (W (Jn) N ) where J n is 99

211 same as above. Since for all d G, Z d (W + ) = Z d (W ) 2 it follows that Z t max(w + ) Z t max(w ) 2. It has been shown in [66, p. 6] that Z d (W ) 2Z d (W ) + 0 d Z (W )Z d+ (W ) Note that for any G, d / H t implies that either / H t or d+ / H t. Therefore, d / H t implies either Z (W ) Z t max(w ) or Z d+ (W ) Z t max(w ) (or both). Since Z (W ) and Z d+ (W ) both take values from [0, ], it follows that Z (W )Z d+ (W ) Z t max(w ) Therefore, for any d / H t, Z d (W ) 2Z d (W ) + qz t max(w ). Hence Z t max(w ) = max d/ H t Z d (W ) max d/ H t ( 2Zd (W ) + qz t max(w ) ) (q + 2)Z t max(w ) Since for all d G, Z n d converges to a Bernoulli random variable it follows that (Z t max) (n) also converges to a {0, }-valued random variable (Z t max) ( ). Note that P ( (Z t max) ( ) = 0 ) = P ((Z t ) = 0) = r s=t p s. Therefore, (Z t max) (n) satisfies the conditions of [3, Theorem ] and hence ( lim P (Zmax) t (n) < 2 2βn) = P ( (Zmax) t ( ) = 0 ) n for any β <. It clearly follows that lim 2 n P (q(z tmax) ) (n) < 2 2βn = P ( (Zmax) t ( ) = 0 ). Note that the event {(Z t ) (n) < 2 2βn } includes the event {q(z t max) (n) < 2 2βn }. Therefore, ( lim P (Z t ) (n) < 2 2βn) P ( (Z t ) = 0 ) n Similarly, for an arbitrary Abelian group G and a subgroup H of G, define (Zmax) H (n) = max d/ H Z d (W (Jn) N ) where J n is defined as above. It is straightforward 200

212 to show that (Z H max) (n) satisfies the conditions of [3, Theorem ]. Therefore, with an argument similar to above, we can show that, for any β < 2. ( lim P (Z H ) (n) < 2 2βn) P ( (Z H ) = 0 ) n 6.2 Polar Codes for Arbitrary DMSs In Section 6., we have shown that polar codes can achieve the symmetric capacity of arbitrary discrete memoryless channels regardless of the size of the channel input alphabet. It is shown in [39] that polar codes employed with a successive cancelation encoder can achieve the symmetric rate-distortion function for the lossy source coding problem when the size of the reconstruction alphabet is two. This result is extended to the case where the size of the reconstruction alphabet is a prime in [38]. In this section, we show that polar codes achieve the symmetric rate-distortion bound for the lossy source coding problem when the size of the reconstruction alphabet is finite. We show that similarly to the channel coding problem, polar transformations applied to (test) channels can converge to several asymptotic cases each corresponding to a subgroup H of the reconstruction alphabet G. We employ a modified randomized rounding encoding rule to achieve the symmetric rate-distortion bound Preliminaries The Rate-Distortion Function It is known that the optimal rate-distortion function is given by: R(D) = min I(X; U) p U X E px p U X {d(x,u)} D 20

where $p_{U|X}$ is the conditional probability of $U$ given $X$ and $p_X p_{U|X}$ is the joint distribution of $X$ and $U$. The symmetric rate-distortion function $\bar{R}(D)$ is defined as follows:
$$\bar{R}(D) = \min_{\substack{p_{U|X}:\; E_{p_X p_{U|X}}\{d(X,U)\} \le D,\; p_U = \frac{1}{q}}} I(X; U)$$
where $p_U$ is the marginal distribution of $U$ given by $p_U(u) = \sum_{x \in \mathcal{X}} p_X(x) p_{U|X}(u|x)$ and $q$ is the size of the reconstruction alphabet $\mathcal{U}$.

Channel Parameters

For a test channel $(\mathcal{U}, \mathcal{X}, W)$, assume $\mathcal{U}$ is equipped with the structure of a group $(G, +)$. The quantities $I_0(W) = I(U; X)$, $Z(W_{\{u,\tilde{u}\}})$ and $Z(W)$ are defined as in the channel coding sections. In addition, we use the following two quantities extensively:
$$D_d(W) = \frac{1}{2q} \sum_{u \in \mathcal{U}} \sum_{x \in \mathcal{X}} \left| W(x \mid u) - W(x \mid u+d) \right|$$
$$\tilde{D}_d(W) = \frac{1}{2q} \sum_{u \in \mathcal{U}} \sum_{x \in \mathcal{X}} \left( W(x \mid u) - W(x \mid u+d) \right)^2$$
where $d$ is some element of $G$ and $+$ is the group operation.

Polar Codes for Sources with Reconstruction Alphabet $\mathbb{Z}_{p^r}$

In this section, we consider sources whose reconstruction alphabet size $q$ is of the form $q = p^r$ for some prime number $p$ and a positive integer $r$. In this case, the reconstruction alphabet can be considered as a ring with addition and multiplication modulo $p^r$. We prove the achievability of the symmetric rate-distortion bound for these sources using polar codes, and later we will generalize this result to sources with arbitrary finite reconstruction alphabets.
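Both channel parameters defined above can be computed directly from a test-channel matrix. The short sketch below is illustrative only (the example channel is made up): it evaluates $D_d(W)$ and $\tilde D_d(W)$ for a test channel with reconstruction alphabet $\mathbb{Z}_q$, with rows indexed by $u$ and columns by $x$.

import numpy as np

def D_d(W, d):
    """D_d(W) = (1/2q) * sum_u sum_x |W(x|u) - W(x|u+d)|."""
    q = W.shape[0]
    return sum(np.sum(np.abs(W[u] - W[(u + d) % q])) for u in range(q)) / (2 * q)

def D_tilde_d(W, d):
    """tilde{D}_d(W) = (1/2q) * sum_u sum_x (W(x|u) - W(x|u+d))^2."""
    q = W.shape[0]
    return sum(np.sum((W[u] - W[(u + d) % q]) ** 2) for u in range(q)) / (2 * q)

# example test channel with U in Z_4 and |X| = 2
W = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.1, 0.9],
              [0.4, 0.6]])
print([D_d(W, d) for d in range(4)], [D_tilde_d(W, d) for d in range(4)])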

214 Recursive Channel Transformation The Basic Channel Transforms For a test channel (U = G, X, W ) where G = q, the channel transformations are given by: W (x, x 2 u ) = u 2 G q W (x u + u 2)W (x 2 u 2) (6.22) W + (x, x 2, u u 2 ) = q W (x u + u 2 )W (x 2 u 2 ) (6.23) for x, x 2 X and u, u 2 G. Repeating these operations n times recursively, we obtain N = 2 n channels W () N by:,, W (N) N. For i =,, N, these channels are given W (i) N (xn, u i u i ) = q W N (x N N u N G N ) (6.24) u N i+ GN i where G N is the generator matrix for polar codes. Let V N be a random vector uniformly distributed over G N and assume the random vectors V N, U N and X N are distributed over G N G N X N according to p V N U N XN (vn, u N, x n ) = q N {u N =vn G N } N W (x i u i ) The conditional probability distribution induced from the above equation is consistent with (6.24). We use this probability distribution extensively throughout the paper. i= Encoding and Decoding Let n be a positive integer and let G N be the generator matrix for polar codes where N = 2 n. Let {A t 0 t r} be a partition of the index set {, 2,, N}. Let b N be an arbitrary element from the set r t=0 HAt t. Assume that the partition {A t 0 t r} and the vector b N are known to both the encoder and the decoder. The encoder maps a source sequence x N to a vector v N G N by the following rule: 203

215 For i =,, N, if i A t, let v i be a random element g from the set b i + T t picked with probability P (v i = g) = P V i V i,x N (g vi, x N ) P Vi V i,x (b N i + T t v i, x N ) This encoding rule is a generalization of the randomized rounding encoding rule used for the binary case in [39]. Given a sequence v N G N, the decoder decodes it to v N G N. For the sake of analysis we assume that the vector b N is uniformly randomly distributed over the set r t=0 HAt t (Although it s common information between the encoder and the decoder). The average distortion is given by N E{d(XN, U N )} and the rate of the code is given by R = N r A t log T t = t=0 r t=0 A t N t log p Test Channel Polarization The following result has been proved in [59]: For all ɛ > 0, there exists a number N = N(ɛ) = 2 n(ɛ) and a partition {A ɛ 0, A ɛ,, A ɛ r} of {,, N} such that for t = 0,, r and i A ɛ t, Z d (W (i) N ) < O(ɛ) if d H s for 0 s < t and Z d (W (i) N ) > O(ɛ) if d H s for t s < r. For t = 0,, r and i A ɛ t, we have I(W (i) N ) = t log(p)+o(ɛ) and Z t (W (i) N ) = O(ɛ) where Z t (W ) = H t d H t Z d (W ) Moreover, as ɛ 0, Aɛ t N p t for some probabilities p 0,, p r. In the next section, we show that for any β < 2 and for t = 0,, r, ( lim P (Z t ) (n) > 2 2βn) P ( (Z t ) ( ) = ) (6.25) n r = s=t p s 204
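The randomized rounding rule described above restricts the successive cancellation encoder to the coset $b_i + T_t$ and samples within it in proportion to the posterior. A minimal sketch of this selection step is given below; it is illustrative only, and posterior(i, v_prefix, x, g) is a stand-in for $P_{V_i \mid V^{i-1}, X^N}(g \mid v^{i-1}, x^N)$, which in practice would be computed by the successive cancellation recursions.

import numpy as np

rng = np.random.default_rng(0)

def randomized_rounding_step(i, v_prefix, x, b_i, T_t, posterior, q):
    """Sample v_i from b_i + T_t with probability proportional to the posterior of each candidate."""
    candidates = [(b_i + g) % q for g in T_t]
    weights = np.array([posterior(i, v_prefix, x, g) for g in candidates])
    weights = weights / weights.sum()       # normalizes by P(b_i + T_t | v^{i-1}, x^N)
    return rng.choice(candidates, p=weights)

def encode(x, b, A, T, posterior, q):
    """Run the rule for i = 1..N; A[i] is the level t of index i and T[t] the transversal of H_t."""
    v = []
    for i in range(len(b)):
        v.append(randomized_rounding_step(i, v, x, b[i], T[A[i]], posterior, q))
    return np.array(v)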

216 Remark VI.23. This observation implies the following stronger result: For all ɛ > 0, there exists a number N = N(ɛ) = 2 n(ɛ) and a partition {A ɛ 0, A ɛ,, A ɛ r} of {,, N} such that for t = 0,, r and i A ɛ t, I(W (i) N ) = t log(p) + O(ɛ) and Z t (W (i) N ) > 2 2βn(ɛ). Moreover, as ɛ 0, Aɛ t N p t for some probabilities p 0,, p r Rate of Polarization In this section we derive a rate of polarization result for the source coding problem. In this proof, we do not assume q is a power of a prime and hence the rate of polarization result derived in this section is valid for the general case. It is shown in [3] (with a slight generalization) that if a random process Z n satisfies the following two properties Z n+ kz n w.p. 2 Z n+ Z 2 n w.p. 2 (6.26) (6.27) for some constant k, then for any β < 2, lim n P (Z n < 2 2βn ) = P (Z = 0). We prove that the random process D n d satisfied these properties. First note that by definition D d (W + ) = 2q u 2 G x,x 2 X u G [ q W (x u + u 2 )W (x 2 u 2 ) q W (x u + u 2 + d)w (x 2 u 2 + d) ] 2 If we add and subtract the term q W (x u + u 2 )W (x 2 u 2 + d) to the term inside brackets and use the inequality (a + b) 2 2(a 2 + b 2 ) with a = W(x u +u 2 )W(x 2 u 2 ) W(x u + u 2 )W(x 2 u 2 +d) b = W (x u + u 2 )W (x 2 u 2 + d) W (x u + u 2 + d)w (x 2 u 2 + d) 205

217 we obtain D d (W + ) 2q u 2 G x,x 2 X,u G 2 q 2 [ (W(x u +u 2 )W(x 2 u 2 ) W(x u +u 2 )W(x 2 u 2 +d)) 2 + (W (x u + u 2 )W (x 2 u 2 + d) W (x u + u 2 + d)w (x 2 u 2 + d)) 2] (6.28) This summation can be expanded into two separate summations. For the first summation, we have 2q u 2 G x,x 2 X,u G 2 q 2 (W (x u +u 2 )W (x 2 u 2 ) W (x u +u 2 )W (x 2 u 2 +d)) 2 2 q 2 2q 2q q 2 2q u 2 G x,x 2 X,u G W (x u + u 2 ) 2 (W (x 2 u 2 ) W (x 2 u 2 + d)) 2 (W (x 2 u 2 ) W (x 2 u 2 + d)) 2 u 2 G x 2 X = 2 q D d (W ) (6.29) Similarly, for the second summation we can show that 2q u 2 G x,x 2 X u G 2 q 2 (W (x u + u 2 )W (x 2 u 2 + d) W (x u + u 2 + d)w (x 2 u 2 + d)) 2 2 q D d (W ) (6.30) Therefore, it follows from (6.28), (6.29) and (6.30) that condition (6.26) is satisfied for k = 4 q. 206
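The single-step bound just established can be checked numerically. The sketch below is illustrative only: it builds $W^+$ from randomly generated test channels over $\mathbb{Z}_q$ and verifies $\tilde D_d(W^+) \le \frac{4}{q}\, \tilde D_d(W)$, with the constant $4/q$ read off from the two $\frac{2}{q}\tilde D_d(W)$ terms in (6.29) and (6.30).

import itertools
import numpy as np

rng = np.random.default_rng(1)

def D_tilde(W, d):
    q = W.shape[0]
    return sum(np.sum((W[u] - W[(u + d) % q]) ** 2) for u in range(q)) / (2 * q)

def W_plus(W):
    """W+(x1, x2, u1 | u2) = (1/q) W(x1 | u1+u2) W(x2 | u2), with outputs flattened over (x1, x2, u1)."""
    q, m = W.shape
    out = np.zeros((q, m * m * q))
    for u2, u1, x1, x2 in itertools.product(range(q), range(q), range(m), range(m)):
        out[u2, (x1 * m + x2) * q + u1] = W[(u1 + u2) % q, x1] * W[u2, x2] / q
    return out

q, m = 4, 3
for _ in range(100):
    W = rng.random((q, m)); W /= W.sum(axis=1, keepdims=True)   # random test channel
    for d in range(1, q):
        assert D_tilde(W_plus(W), d) <= 4 / q * D_tilde(W, d) + 1e-12
print("single-step bound verified on random channels")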

218 Next we show that D d (W ) ( Dd (W )) 2. Note that D d (W ) = = 2q 2q v G x,x 2 X ( q v G x,x 2 X [( q ) W (x v + v 2 )W (x 2 v 2 ) v 2 G )] 2 W (x v + d + v 2 )W (x 2 v 2 ) v 2 G q 2 [ W (x 2 v 2 ) (W (x v + v 2 ) v 2 G W (x v + d + v 2 ))] 2 The squared term in the brackets can be expanded as q 2 v 2,v 2 G W (x 2 v 2 )W (x 2 v 2) (W (x v + v 2 ) W (x v + d + v 2 )) (W (x v + v 2) W (x v + d + v 2)) Therefore, Dd (W ) can be written as a summation of four terms D d (W ) = D + D 2 + D 3 + D 4 where D = 2q v G x,x 2 X q 2 v 2,v 2 G W (x 2 v 2 ) D 2 D 3 D 4 W (x 2 v 2)W (x v + v 2 )W (x v + v 2) = W (x 2q q 2 2 v 2 ) v G x,x 2 X v 2,v 2 G W (x 2 v 2)W (x v + v 2 + d)w (x v + v 2 + d) = W (x 2q q 2 2 v 2 ) v G x,x 2 X v 2,v 2 G W (x 2 v 2)W (x v + v 2 )W (x v + v 2 + d) = W (x 2q q 2 2 v 2 ) v G x,x 2 X v 2,v 2 G W (x 2 v 2)W (x v + v 2 + d)w (x v + v 2) 207

219 For d G define Note that S d (W ) = S d (W ). We have D = q 2 2q S d (W ) = W (x v)w (x v + d) 2q x 2 X v G x X = q 2 v G x X W (x 2 v 2 )W (x 2 v 2) v 2,v 2 G W (x v + v 2 )W (x v + v 2) W (x 2 v 2 )W (x 2 v 2)S v2 v (W ) 2 x 2 X v 2,v 2 G = W(x q 2 2 v 2 )W(x 2 v 2 a)s a (W ) x 2 X v 2 G a G v 2 =v 2 a = 2 S a (W ) q 2q a G v 2 G x 2 X = 2 S a (W )S a (W ) q a G = 2 S a (W ) 2 q a G With similar arguments we can show that W (x 2 v 2 )W (x 2 v 2 a) D 2 D 3 D 4 = 2 S a (W ) 2 q a G = 2 S a (W )S a d (W ) q a G = 2 S a (W )S a d (W ) q a G Therefore D d (W ) = 4 q ( Sa (W ) 2 S a (W )S a d (W ) ) a G = 2 (S a (W ) S a d (W )) 2 q a G 208

220 Note that D d (W ) = (W (x v) W (x v + d)) 2 2q v G x X = 2S 0 (W ) 2S d (W ) Therefore ( ) 2 Dd (W ) = 4(S0 (W ) S d (W )) 2 To show that D d (W ) ( ) 2 Dd (W ) it suffices to show that Sa (W ) S a d (W ) S 0 (W ) S d (W ). We will make use of the rearrangement inequality: Lemma VI.24. Let π be an arbitrary permutation of the set {,, n. If a a n and b b n then n a i b i i= n a i b π(i) i= The rearrangement inequality implies that S 0 (W ) S d (W ) S a (W ) S a d (W ). Therefore it follows that condition (6.27) is also satisfied and hence lim n P ( D d n < 2 2βn ) = P ( D d = 0). It has been shown in Appendix D of [59] that D d (W ) < ɛ implies Z d (W ) > ɛ. This completes the rate of polarization result. 209

221 Polar Codes Achieve the Rate-Distortion Bound The average distortion for the encoding and decoding rules described in Section is given by D avg = p N X(x N ) x N X N ( r t=0 i A t b N r t=0 HA t t v N bn + r t=0 T A t t P Vi V i,x N (g vi, x N ) P t=0 i A t Vi V i,x (b N i + T t v i, x N ) ( r ) d(x N H t=0 t At, v N G N ) = p N X(x N ) x N X N b N r t=0 HA t t v N bn + r t=0 T A t t ( r ) P Vi V i,x N (g vi, x N ) P Vi V i,x N (b i + T t v i, x N ) H t ) d(x N, v N G N ) This can be written as D avg = E Q {d(x N, V N G N )} where the distribution Q is defined by and Q(v i v i, x N ) = P Vi V i,x (v i v i N, x N ) P Vi V i,x (b N i + T t v i, x N ) H t Q(x N ) = p N X(x N ) and hence Q(v N, x N ) = = N i= Q(v i v i, x N ) r t=0 i A t P Vi V i,x (v i v i N, x N ) P Vi V i,x (b N i + T t v i, x N ) H t 20

222 Recall that P (v N, x N ) = N i= P Vi V i,x N (v i v i, x N ) The total variation distance between the distributions P and Q is given by We have P Q t.v. = = v N GN,x N X N x N X N p N X(x N ) v N GN Q(v N, x N ) P (v N, x N ) Q(v N x N ) P (v N x N ) Q(v N x N ) P (v N x N ) v N GN = v N GN (a) = [( v N GN ( r ) i P (v i v, x N ) P (b t=0 i A i + T t v i t, x N ) H t ( r P (v i v i, x N )) t=0 i A t r t=0 i A t P (v i v i, x N ) P (b i + T t v i r t=0, x N ) H t P (v i v i N j=i+ j A t r t=0 i P (v i v j= j A t P (v i v i, x N ) P (b i + T t v i ), x N ) i, x N ), x N ) H t where in (a) we used the telescopic inequality introduced in [39]. It is straightforward to show that P Q t.v. r E t=0 i A t { } P (b i + T t v i, x N ) H t 2

223 It has been shown in Appendix D of [59] that if Z d (W ) > ɛ then D d (W ) 2ɛ ɛ 2. Therefore if Z d (W (i) N ) > ɛ for all d H we have D d (W (i) N ) = 2q v i G v i G i,x N X N W (i) N (vi, x N v i ) W (i) N (vi 2ɛ ɛ 2, x N v i + d) Therefore for all v i G v i G i x N X N W (i) N (vi, x N v i ) W (i) N (vi, x N v i + d) 2q(2ɛ ɛ 2 ) 22

224 We have { } E H t P (T t V i, X N ) = P (v i, x N ) H t P (T t v i, x N ) v i G i,x N X N = H t P (vi, x N ) P (g, v i, x N ) g T t = v i G i x N X N v i G i,x N X N v i G i,x N X N G H t g T t 2q(2ɛ ɛ2 ) H t d H g T t [ G W (vi, x N g) H t G W (vi, x N g + d)] d H [ W (v i, x N g) G d H g T t H t W (vi, x N g + d)] v i G i,x N X N W (v i, x N g) W (v i, x N g + d) Therefore for if for all d H t, Z d > ɛ then for δ = 2q(2ɛ ɛ2 ) H t we have { } E P H t P (b i + T t V i, X N ) < δ A similar argument as in [39] implies that D avg = N E Q{d(X N, V N G N )} N E P {d(x N, V N G N )} + N r A t d max δ where d max is the maximum value of the distortion function. Note that t=0 E P {d(x N, V N G N )} = ND 23

225 Therefore, D avg D + N r A t d max δ t=0 Note that from the rate of polarization derived in Section we can choose ɛ to be ɛ = 2 2βn. This implies that as n, D + N r t=0 A t d max δ D. Also note that the rate of the code R = r A t t=0 t log p converges to r N t=0 p tt log p and this last quantity is equal to the symmetric capacity of the test channel since the mutual information is a martingale. This means the rate I(X; U) is achievable with distortion D Arbitrary Reconstruction Alphabets For an arbitrary Abelian group G, The following polarization result has been provided in [59]: For all ɛ > 0, there exists a number N = N(ɛ) = 2 n(ɛ) and a partition {A ɛ H H G} of {,, N} such that for H G and i Aɛ H, Z d(w (i) N ) < O(ɛ) if d H and Z d (W (i) N ) > O(ɛ) if d / H. For I(W (i) G N ) = log + O(ɛ) and H ZH (W (i) N ) = O(ɛ) where Z H (W ) = H H G and i Aɛ H, we have Z d (W ) d H Moreover, as ɛ 0, Aɛ H N p H for some probabilities p H, H G. As mentioned earlier the rate of polarization result derived in Section is valid for the general case. Therefore it follows that for any β < 2 and for H G, ( lim P (Z H ) (n) > 2 2βn) P ( (Z H ) ( ) = ) (6.3) n = S H p H Remark VI.25. This observation implies the following stronger result: For all ɛ > 0, there exists a number N = N(ɛ) = 2 n(ɛ) and a partition {A ɛ H H G} of {,, N} 24

226 such that for H G and i A ɛ (i) G H, I(W N ) = log +O(ɛ) and H ZH (W (i) N ) > 2 2βn(ɛ). Moreover, as ɛ 0, Aɛ H N p H for some probabilities p H, H G. The encoding and decoding rules for the general case is as follows: Let n be a positive integer and let G N be the generator matrix for polar codes where N = 2 n. Let {A H H G} be a partition of the index set {, 2,, N}. Let b N be an arbitrary element from the set H G HA H. Assume that the partition {A H H G} and the vector b N are known to both the encoder and the decoder. The encoder maps a source sequence x N to a vector v N G N by the following rule: For a subgroup H of G, let T H be a transversal of H in G. For i =,, N, if i A H, let v i be a random element g from the set b i + T H picked with probability P (v i = g) = P Vi V i,x N (g vi, x N ) P Vi V i,x (b N i + T H v i, x N ) Given a sequence v N G N, the decoder decodes it to v N G N. It follows from the analysis of the Z p r case in a straightforward fashion that this encoding/decoding scheme achieves the symmetric rate-distortion bound when the group G is an arbitrary Abelian group. 6.3 Nested Polar Codes for Point-to-Point Communications In this section, we show that nested polar codes achieve the Shannon rate-distortion function for arbitrary (binary or non-binary) discrete memoryless sources and the Shannon capacity of arbitrary discrete memoryless channels. Polar codes for lossy source coding were investigated in [40] where it is shown that polar codes achieve the symmetric rate-distortion function for sources with binary reconstruction alphabets. For the lossless source coding problem, the source polarization phenomenon is introduced in [] to compress a source down to its entropy. 25

227 It is well known that linear codes can at most achieve the symmetric capacity of discrete memoryless channels and the symmetric rate-distortion function for discrete memoryless sources. This indicates that polar codes are optimal linear codes in terms of the achievable rate. It is also known that nested linear codes achieve the Shannon capacity of arbitrary discrete memoryless channels and the Shannon rate-distortion function for arbitrary discrete memoryless sources. In this paper, we investigate the performance of nested polar codes for the point-to-point channel and source coding problems and show that these codes achieve the Shannon capacity of arbitrary (binary or non-binary) DMCs and the Shannon rate-distortion function for arbitrary DMSs. The results of this chapter are general regarding the size of the channel and source alphabets. To generalize the results to non-binary cases, we use the approach of [62] in which it is shown that polar codes with their original (u, u + v) kernel, achieve the symmetric capacity of arbitrary discrete memoryless channels where + is the addition operation over any finite Abelian group The Lossy Source Coding Problem In this section, we prove the following theorem: Theorem VI.26. For an arbitrary discrete memoryless source (X, U, p X, d), nested polar codes achieve the Shannon rate-distortion function. For the source (X, U, p X, d), let U = G where G is an arbitrary Abelian group and let q = G be the size of the group. For a pair (R, D) R 2, let X be distributed according to p X and let U be a random variable such that E{d(X, U)} D. We prove that there exists a pair of polar codes C i C o such that C i induces a partition of C o through its shifts, C o is a good source code for X and each shift of C i is a good channel code for the test channel p X U. This will be made clear in the following. 26

228 Given the test channel p X U, define the artificial channels (G, G, W c ) and (G, X G, W s ) such that for s, z G and x X, W c (z s) = p U (z s) and W s (x, z s) = p XU (x, z s). These channels have been depicted in Figures 6.8 and 6.9. Let S be a random variable uniformly distributed over G which is independent from X and U. It is straightforward to show that in this case, Z is also uniformly distributed over G. The symmetric capacity of the channel W c is equal to Ī(W c) = log q H(U. For the channel W s, it is shown in [63] that the symmetric capacity of the channel W s is equal to Ī(W s) = log q H(U X). We employ a nested polar code in which the inner code C i is a good channel code for the channel W c and the outer code C o is a good source code for W s. The rate of this code is equal to R = Ī(Ws) Ī(W c). Therefore, R = log q H(U X) (log q H(U)) = I(X; U) Note that the channels W c and W s are chosen so that the difference of their symmetric capacities is equal to the Shannon mutual information between U and X. This enables us to use channel coding polar codes to achieve the symmetric capacity of W c (as the inner code) and source coding polar codes to achieve the symmetric capacity of the test channel W s (as the outer code). The exact proof is postponed to Section where the result is proved for the binary case and Section in which the general proof (for arbitrary Abelian groups) is presented. The next section is devoted to some general definitions and useful lemmas which are used in the proofs. 27

Given the test channel $p_{X|U}$, define the artificial channels $(G, G, W_c)$ and $(G, \mathcal{X} \times G, W_s)$ such that for $s, z \in G$ and $x \in \mathcal{X}$, $W_c(z \mid s) = p_U(z - s)$ and $W_s(x, z \mid s) = p_{XU}(x, z - s)$. These channels have been depicted in Figures 6.8 and 6.9. Let $S$ be a random variable uniformly distributed over $G$ which is independent of $X$ and $U$. It is straightforward to show that in this case, the output $Z = U + S$ of $W_c$ is also uniformly distributed over $G$. The symmetric capacity of the channel $W_c$ is equal to $\bar{I}(W_c) = \log q - H(U)$. For the channel $W_s$, it is shown in [63] that the symmetric capacity is equal to $\bar{I}(W_s) = \log q - H(U|X)$. We employ a nested polar code in which the inner code $C_i$ is a good channel code for the channel $W_c$ and the outer code $C_o$ is a good source code for $W_s$. The rate of this code is equal to $R = \bar{I}(W_s) - \bar{I}(W_c)$. Therefore,
$$R = \log q - H(U|X) - \left( \log q - H(U) \right) = I(X; U)$$
Note that the channels $W_c$ and $W_s$ are chosen so that the difference of their symmetric capacities is equal to the Shannon mutual information between $U$ and $X$. This enables us to use channel-coding polar codes to achieve the symmetric capacity of $W_c$ (as the inner code) and source-coding polar codes to achieve the symmetric capacity of the test channel $W_s$ (as the outer code). The exact proof is postponed to the following sections, where the result is proved first for the binary case and then in full generality for arbitrary Abelian groups. The next section is devoted to some general definitions and useful lemmas which are used in the proofs.
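The rate identity above is easy to verify numerically. The sketch below is an illustration with a made-up joint distribution: it builds $W_c(z \mid s) = p_U(z-s)$ and $W_s(x, z \mid s) = p_{XU}(x, z-s)$ from a joint pmf $p_{XU}$ over $\mathcal{X} \times \mathbb{Z}_q$ and checks that $\bar I(W_s) - \bar I(W_c) = I(X;U)$.

import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (pa * pb)[mask])))

def symmetric_capacity(W):
    """I(S; output) with S uniform over the rows of the transition matrix W."""
    q = W.shape[0]
    return mutual_information(W / q)

# made-up joint distribution p_XU with |X| = 3 and U in Z_4
rng = np.random.default_rng(2)
p_XU = rng.random((3, 4)); p_XU /= p_XU.sum()
q = 4

p_U = p_XU.sum(axis=0)
W_c = np.array([[p_U[(z - s) % q] for z in range(q)] for s in range(q)])
W_s = np.array([[p_XU[x, (z - s) % q] for x in range(3) for z in range(q)] for s in range(q)])

print(symmetric_capacity(W_s) - symmetric_capacity(W_c), mutual_information(p_XU))   # the two numbers agree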

230 Proof. Provided in a more complete version of this work [63]. A special case of this lemma is when G = Z 2 and d =. In this case, the lemma implies, Z(W ) Z(W 2 ) if W is degraded with respect to W 2. Lemma VI.29. The channel W c is degraded with respect to the channel W s in the sense of Definition VI.27. Proof. In Definition VI.27, let the channel (X G, G, W ) be such that for z, z G and x X, W (z x, z ) = {z=z }. Let the random vectors X N, U N be distributed according to P N XU and let ZN be a random variable uniformly distributed over G N which is independent of X N, U N. Let S N = Z N U N and V N = S N G (Here, G is the inverse of the one-two-one mapping G : G N G N ). In other words, the joint distribution of these random vectors is given by p V N S N U N XN ZN (vn, s N, u N, x N, z N ) = q N pn XU(x N, u N ) {s N =v N G,un =zn vn G} Source Coding: Proof for the Binary Case The standard result of channel polarization for the binary input channel W c implies [0] that for any ɛ > 0 and 0 < β <, there exist a large N = 2 2n and a partition (i) A 0, A of [, N] such that for t = 0, and i A t, Ī(W c,n ) t < ɛ and such that for i A Z(W (i) c,n ) < 2 N β. Moreover, as ɛ 0 (and N ), At N p t for some p 0, p adding up to one with p = Ī(W c). Similarly, for the channel W s we have the following: For any ɛ > 0 and 0 < β < 2, there exist a large N = 2 n and a partition B 0, B of [, N] such that for τ = 0, and (i) i B τ, Ī(W s,n ) τ < ɛ and such that for i B, Z(W (i) s,n ) < 2 N β. Moreover, as ɛ 0 (and N ), Bτ N q τ for some q 0, q adding up to one with q = Ī(W s). 29

Lemma VI.30. For $i = 1, \ldots, N$, $Z(W^{(i)}_{c,N}) \geq Z(W^{(i)}_{s,N})$.

Proof. Provided in a more complete version [63].

To introduce the encoding and decoding rules, we need to make the following definitions:
\[
A_0 = \bigl\{ i \in [1, N] : Z(W^{(i)}_{c,N}) > 2^{-N^\beta} \bigr\}, \qquad
B_0 = \bigl\{ i \in [1, N] : Z(W^{(i)}_{s,N}) > 2^{-N^\beta} \bigr\},
\]
and $A_1 = [1, N] \setminus A_0$ and $B_1 = [1, N] \setminus B_0$. For $t = 0, 1$ and $\tau = 0, 1$, define $A_{t,\tau} = A_t \cap B_\tau$. Note that if $i \in A_1$, then Lemma VI.30 implies $Z(W^{(i)}_{s,N}) \leq Z(W^{(i)}_{c,N}) \leq 2^{-N^\beta}$, so $i \notin B_0$; therefore $A_{1,0} = \emptyset$. Note also that the above polarization results imply that as $N$ increases, $|A_1|/N \to \bar{I}(W_c)$ and $|B_1|/N \to \bar{I}(W_s)$.

Encoding and Decoding

Let $z^N \in G^N$ be an outcome of the random variable $Z^N$ known to both the encoder and the decoder. Given a source sequence $x^N \in \mathcal{X}^N$, the encoding rule is as follows: For $i \in [1, N]$, if $i \in B_0$, then $v_i$ is uniformly distributed over $G$ and is known to both the encoder and the decoder (and is independent from other random variables). If $i \in B_1$, then $v_i = g$ for some $g \in G$ with probability
\[
P(v_i = g) = p_{V_i \mid X^N Z^N V^{i-1}}(g \mid x^N, z^N, v^{i-1}).
\]
Note that $[1, N]$ can be partitioned into $A_{0,0}$, $A_{0,1}$ and $A_{1,1}$ (since $A_{1,0}$ is empty), and $B_0 = A_{0,0}$, $B_1 = A_{0,1} \cup A_{1,1}$. Therefore, $v^N$ can be decomposed as $v^N = v_{A_{0,0}} + v_{A_{0,1}} + v_{A_{1,1}}$, in which $v_{A_{0,0}}$ is known to the decoder. The encoder sends $v_{A_{0,1}}$ to the decoder and the decoder uses the channel code to recover $v_{A_{1,1}}$.
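The index sets above are straightforward to form once the Bhattacharyya parameters of the two families of synthesized channels are available. The sketch below (hypothetical numbers, not from the dissertation) builds $A_{0,0}$, $A_{0,1}$, $A_{1,1}$ from arrays satisfying $Z_c \geq Z_s$ and reads off the transmitted fraction $|A_{0,1}|/N$, which plays the role of the rate $\bar{I}(W_s) - \bar{I}(W_c)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1 << 10
thresh = 2.0 ** (-N ** 0.3)          # stand-in for 2^{-N^beta}

# hypothetical Bhattacharyya parameters satisfying Z_c >= Z_s (Lemma VI.30)
Z_s = rng.uniform(0.0, 1.0, N) ** 8  # mostly small: W_s is the "better" channel
Z_c = np.minimum(1.0, Z_s + rng.uniform(0.0, 1.0, N) ** 8)

A0 = Z_c > thresh                    # bad for the inner (channel) code
B0 = Z_s > thresh                    # bad for the outer (source) code
A00 = A0 & B0                        # frozen, known to both encoder and decoder
A01 = A0 & ~B0                       # sent to the decoder: this carries the rate
A11 = ~A0 & ~B0                      # recovered by the decoder via the channel code

assert not np.any(~A0 & B0)          # A_{1,0} is empty because Z_c >= Z_s
print("rate |A_{0,1}|/N =", A01.mean())
```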

The decoding rule is as follows: Given $z^N$, $v_{A_{0,0}}$ and $v_{A_{0,1}}$, let $\hat{v}_{A_{0,0}} = v_{A_{0,0}}$ and $\hat{v}_{A_{0,1}} = v_{A_{0,1}}$. For $i \in A_{1,1}$, let
\[
\hat{v}_i = \arg\max_{g \in G} W^{(i)}_{c,N}(z^N, \hat{v}^{i-1} \mid g).
\]
Finally, the decoder outputs $z^N - \hat{v}^N G$.

Error Analysis

The analysis is a combination of the point-to-point channel coding and source coding results for polar codes. The average distortion between the encoder input and the decoder output is upper bounded by
\[
D_{\mathrm{avg}} \leq \sum_{z^N \in G^N} \sum_{x^N \in \mathcal{X}^N} \frac{1}{q^N}\, p_X^N(x^N) \sum_{v^N \in G^N} \frac{1}{q^{|B_0|}} \Bigl( \prod_{i \in B_1} p(v_i \mid x^N, z^N, v^{i-1}) \Bigr) \Bigl( d_{\max}\, \mathbb{1}_{\{\hat{v}^N \neq v^N\}} + d(x^N, z^N - v^N G) \Bigr),
\]
where we have replaced $p_{V_i \mid X^N Z^N V^{i-1}}(v_i \mid x^N, z^N, v^{i-1})$ with $p(v_i \mid x^N, z^N, v^{i-1})$ for simplicity of notation, and $d_{\max}$ is the maximum value of the $d(\cdot, \cdot)$ function. Let $q(x^N, z^N) = p(x^N, z^N)$ and
\[
q(v_i \mid x^N, z^N, v^{i-1}) =
\begin{cases}
\tfrac{1}{2} & \text{if } i \in B_0, \\[2pt]
p_{V_i \mid X^N Z^N V^{i-1}}(v_i \mid x^N, z^N, v^{i-1}) & \text{if } i \in B_1.
\end{cases}
\]
It is shown in [63] that we have $D_{\mathrm{avg}} \leq D_1 + D_2 + D_3$, where
\[
D_1 = \sum_{v^N, z^N \in G^N} \sum_{x^N \in \mathcal{X}^N} p(v^N, x^N, z^N)\, d_{\max}\, \mathbb{1}_{\{\hat{v}^N \neq v^N\}} \tag{6.34}
\]
\[
D_2 = \sum_{v^N, z^N \in G^N} \sum_{x^N \in \mathcal{X}^N} p(v^N, x^N, z^N)\, d(x^N, z^N - v^N G) \tag{6.35}
\]
\[
D_3 = \sum_{v^N, z^N \in G^N} \sum_{x^N \in \mathcal{X}^N} \bigl| q(v^N, x^N, z^N) - p(v^N, x^N, z^N) \bigr| \bigl( d_{\max}\, \mathbb{1}_{\{\hat{v}^N \neq v^N\}} + d(x^N, z^N - v^N G) \bigr) \tag{6.36}
\]

The proof proceeds as follows: It is straightforward to show that $D_2 \to D$ as $N$ increases. It can also be shown that $D_1 \to 0$ as $N$ increases, since the inner code is a good channel code. Finally, it can be shown that $D_3 \to 0$ as $N$ increases, since the total variation distance between the $P$ and $Q$ measures is small (in turn because the outer code is a good source code). For the complete proof, please see [63].

Source Coding: Proof for the General Case

The result of channel polarization for arbitrary discrete memoryless channels applied to $W_c$ implies [62] that for any $\epsilon > 0$ and $0 < \beta < 1/2$, there exist a large $N = 2^n$ and a partition $\{A_H : H \leq G\}$ of $[1, N]$ such that for $H \leq G$ and $i \in A_H$, $\bigl|\bar{I}(W^{(i)}_{c,N}) - \log\tfrac{|G|}{|H|}\bigr| < \epsilon$ and $Z_H(W^{(i)}_{c,N}) < 2^{-N^\beta}$. Moreover, as $\epsilon \to 0$ (and $N \to \infty$), $|A_H|/N \to p_H$ for some probabilities $p_H$, $H \leq G$, adding up to one with $\sum_{H \leq G} p_H \log\tfrac{|G|}{|H|} = \bar{I}(W_c)$. Similarly, for the channel $W_s$ we have the following: For any $\epsilon > 0$ and $0 < \beta < 1/2$, there exist a large $N = 2^n$ and a partition $\{B_H : H \leq G\}$ of $[1, N]$ such that for $H \leq G$ and $i \in B_H$, $\bigl|\bar{I}(W^{(i)}_{s,N}) - \log\tfrac{|G|}{|H|}\bigr| < \epsilon$ and $Z_H(W^{(i)}_{s,N}) < 2^{-N^\beta}$. Moreover, as $\epsilon \to 0$ (and $N \to \infty$), $|B_H|/N \to q_H$ for some probabilities $q_H$, $H \leq G$, adding up to one with $\sum_{H \leq G} q_H \log\tfrac{|G|}{|H|} = \bar{I}(W_s)$.

Lemma VI.31. For $i = 1, \ldots, N$ and for $d \in G$ and $H \leq G$, $Z_d(W^{(i)}_{c,N}) \geq Z_d(W^{(i)}_{s,N})$ and $Z_H(W^{(i)}_{c,N}) \geq Z_H(W^{(i)}_{s,N})$.

Proof. Provided in a more complete version [63].
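For a concrete feel for the multilevel statement, the following sketch (hypothetical limiting fractions, not from the dissertation) evaluates $\sum_{H \leq G} p_H \log\tfrac{|G|}{|H|}$ for $G = \mathbb{Z}_4$, whose subgroups are $\{0\}$, $\{0,2\}$ and $\mathbb{Z}_4$ itself; roughly, indices in $A_{\{0\}}$ are perfect, indices in $A_{\mathbb{Z}_4}$ are useless, and indices in $A_{\{0,2\}}$ are partially perfect.

```python
import numpy as np

# subgroups of Z_4 and hypothetical limiting fractions p_H of the index sets A_H
subgroups = {"{0}": 1, "{0,2}": 2, "Z_4": 4}
p_H = {"{0}": 0.55, "{0,2}": 0.25, "Z_4": 0.20}   # must sum to one

q = 4
capacity = sum(p_H[name] * np.log2(q / order) for name, order in subgroups.items())
print(f"multilevel symmetric capacity  sum_H p_H log(|G|/|H|) = {capacity:.3f} bits")
```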

We define some quantities before we introduce the encoding and decoding rules. For $H \leq G$, define
\[
A_H = \bigl\{ i \in [1, N] : Z_H(W^{(i)}_{c,N}) < 2^{-N^\beta},\ \nexists\, K \lneq H \text{ such that } Z_K(W^{(i)}_{c,N}) < 2^{-N^\beta} \bigr\}
\]
\[
B_H = \bigl\{ i \in [1, N] : Z_H(W^{(i)}_{s,N}) < 2^{-N^\beta},\ \nexists\, K \lneq H \text{ such that } Z_K(W^{(i)}_{s,N}) < 2^{-N^\beta} \bigr\}
\]
For $H \leq G$ and $K \leq G$, define $A_{H,K} = A_H \cap B_K$. Note that if $i \in A_H$, then Lemma VI.31 implies $Z_H(W^{(i)}_{s,N}) \leq Z_H(W^{(i)}_{c,N}) < 2^{-N^\beta}$, and hence $i \in \bigcup_{K \leq H} B_K$. Therefore, for $K \nleq H$, $A_{H,K} = \emptyset$, and $\{A_{H,K} : K \leq H \leq G\}$ is a partition of $[1, N]$. Note that the channel polarization results imply that as $N$ increases, $|A_H|/N \to p_H$ and $|B_H|/N \to q_H$.

Encoding and Decoding

Let $z^N \in G^N$ be an outcome of the random variable $Z^N$ known to both the encoder and the decoder. Given $K \leq H \leq G$, let $T_H$ be a transversal of $H$ in $G$ and let $T_{K|H}$ be a transversal of $K$ in $H$. Any element $g$ of $G$ can be represented as $g = [g]_K + [g]_{T_{K|H}} + [g]_{T_H}$ for unique $[g]_K \in K$, $[g]_{T_{K|H}} \in T_{K|H}$ and $[g]_{T_H} \in T_H$. Also note that $T_{K|H} + T_H$ is a transversal $T_K$ of $K$ in $G$, so that $g$ can be uniquely represented as $g = [g]_K + [g]_{T_K}$ for some $[g]_{T_K} \in T_K$, and $[g]_{T_K}$ can be uniquely represented as $[g]_{T_K} = [g]_{T_{K|H}} + [g]_{T_H}$. Given a source sequence $x^N \in \mathcal{X}^N$, the encoding rule is as follows: For $i \in [1, N]$, if $i \in A_{H,K}$ for some $K \leq H \leq G$, then $[v_i]_K$ is uniformly distributed over $K$ and is known to both the encoder and the decoder (and is independent from other random variables). The component $[v_i]_{T_K}$ is chosen randomly so that for $g \in [v_i]_K + T_K$,
\[
P(v_i = g) = \frac{p_{V_i \mid X^N Z^N V^{i-1}}(g \mid x^N, z^N, v^{i-1})}{p_{V_i \mid X^N Z^N V^{i-1}}([v_i]_K + T_K \mid x^N, z^N, v^{i-1})}.
\]
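The transversal decomposition used in this encoding rule can be checked mechanically. Below is a small sketch (not from the dissertation) for the hypothetical chain $K = \{0,4\} \leq H = \{0,2,4,6\} \leq G = \mathbb{Z}_8$ with transversals $T_{K|H} = \{0,2\}$ and $T_H = \{0,1\}$; every $g \in \mathbb{Z}_8$ has a unique representation $g = [g]_K + [g]_{T_{K|H}} + [g]_{T_H}$:

```python
from itertools import product

q = 8
K = [0, 4]            # subgroup of H = {0, 2, 4, 6}
T_KH = [0, 2]         # transversal of K in H
T_H = [0, 1]          # transversal of H in G = Z_8

# every g in Z_8 should arise exactly once as k + t_kh + t_h (mod 8)
decomp = {}
for k, t_kh, t_h in product(K, T_KH, T_H):
    g = (k + t_kh + t_h) % q
    assert g not in decomp, "representation is not unique"
    decomp[g] = (k, t_kh, t_h)

assert len(decomp) == q
print(decomp[7])      # e.g. 7 = 4 + 2 + 1  ->  (4, 2, 1)
# T_K = T_KH + T_H = {0, 1, 2, 3} is then a transversal of K in G, as noted above.
```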

Note that $v^N$ can be decomposed as $v^N = [v^N]_K + [v^N]_{T_{K|H}} + [v^N]_{T_H}$ (with a slight abuse of notation, since $K$ and $H$ depend on the index $i$), in which $[v^N]_K$ is known to the decoder. The encoder sends $[v^N]_{T_{K|H}}$ to the decoder and the decoder uses the channel code to recover $[v^N]_{T_H}$. The decoding rule is as follows: Given $z^N$, $[v^N]_K$ and $[v^N]_{T_{K|H}}$, for $i \in A_{H,K}$ let
\[
\hat{v}_i = \arg\max_{g \in [v_i]_K + [v_i]_{T_{K|H}} + T_H} W^{(i)}_{c,N}(z^N, \hat{v}^{i-1} \mid g).
\]
Finally, the decoder outputs $z^N - \hat{v}^N G$. Note that the rate of this code is equal to
\[
R = \sum_{K \leq H \leq G} \frac{|A_{H,K}|}{N} \log\frac{|H|}{|K|}
  = \sum_{K \leq H \leq G} \frac{|A_{H,K}|}{N} \log\frac{|G|}{|K|} - \sum_{K \leq H \leq G} \frac{|A_{H,K}|}{N} \log\frac{|G|}{|H|}
  \to \bar{I}(W_s) - \bar{I}(W_c) = I(X; U).
\]

Error Analysis

The average distortion between the encoder input and the decoder output is upper bounded by
\[
D_{\mathrm{avg}} \leq \sum_{z^N \in G^N} \sum_{x^N \in \mathcal{X}^N} \frac{1}{q^N}\, p_X^N(x^N) \sum_{v^N \in G^N} \Bigl( \prod_{K \leq G} \prod_{i \in B_K} \frac{1}{|K|} \cdot \frac{p(v_i \mid x^N, z^N, v^{i-1})}{p([v_i]_K + T_K \mid x^N, z^N, v^{i-1})} \Bigr) \bigl( d_{\max}\, \mathbb{1}_{\{\hat{v}^N \neq v^N\}} + d(x^N, z^N - v^N G) \bigr),
\]
where $p_{V_i \mid X^N Z^N V^{i-1}}(\cdot \mid x^N, z^N, v^{i-1})$ is replaced with $p(\cdot \mid x^N, z^N, v^{i-1})$ for simplicity of notation. The rest of the proof is essentially similar to the binary case. For the complete proof, please see [63].
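The rate identity above only uses the facts that $B_K = \cup_H A_{H,K}$ and $A_H = \cup_K A_{H,K}$. Here is a small sketch (hypothetical limiting fractions over the subgroup chain of $\mathbb{Z}_4$, not from the dissertation) verifying numerically that $\sum_{K \leq H} \tfrac{|A_{H,K}|}{N} \log\tfrac{|H|}{|K|}$ equals the difference of the two weighted sums:

```python
import numpy as np

# subgroup orders for G = Z_4: {0} -> 1, {0,2} -> 2, Z_4 -> 4
orders = {"0": 1, "02": 2, "Z4": 4}
q = 4

# hypothetical limiting fractions |A_{H,K}|/N, keyed by (H, K) with K <= H; they sum to one
frac = {("0", "0"): 0.45, ("02", "0"): 0.10, ("02", "02"): 0.15,
        ("Z4", "0"): 0.05, ("Z4", "02"): 0.10, ("Z4", "Z4"): 0.15}
assert abs(sum(frac.values()) - 1.0) < 1e-12

R = sum(f * np.log2(orders[H] / orders[K]) for (H, K), f in frac.items())

# I(W_s) ~ sum_K q_K log(|G|/|K|) with q_K = sum_H |A_{H,K}|/N, and similarly for I(W_c)
I_Ws = sum(f * np.log2(q / orders[K]) for (H, K), f in frac.items())
I_Wc = sum(f * np.log2(q / orders[H]) for (H, K), f in frac.items())
print(R, I_Ws - I_Wc)   # identical, by the telescoping identity above
```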

The Channel Coding Problem

In this section, we prove the following theorem:

Theorem VI.32. For an arbitrary discrete memoryless channel $(\mathcal{X}, \mathcal{Y}, W)$, nested polar codes achieve the Shannon capacity.

For the channel, let $\mathcal{X} = G$ for some Abelian group $G$ and let $|G| = q$. Similarly to the source coding problem, we show that there exists a nested polar code $C_i \subseteq C_o$ such that $C_o$ is a good channel code and each shift of $C_i$ is a good source code. This will be made clear in the following. Let $X$ be a random variable with the capacity-achieving distribution and let $U$ be uniformly distributed over $G$. Define the artificial channels $(G, G, W_s)$ and $(G, \mathcal{Y} \times G, W_c)$ such that for $u, z \in G$ and $y \in \mathcal{Y}$,
\[
W_s(z \mid u) = p_X(z - u) \qquad \text{and} \qquad W_c(y, z \mid u) = p_{XY}(z - u, y).
\]
These channels are depicted in Figures 6.10 and 6.11. Note that for $u, x, z \in G$ and $y \in \mathcal{Y}$, $p_{UXYZ}(u, x, y, z) = p_U(u)\, p_X(x)\, W(y \mid x)\, \mathbb{1}_{\{z = u + x\}}$. Similarly to the source coding case, one can show that the symmetric capacities of these channels are equal to $\bar{I}(W_s) = \log q - H(X)$ and $\bar{I}(W_c) = \log q - H(X \mid Y)$. We employ a nested polar code in which the inner code is a good source code for the test channel $W_s$ and the outer code is a good channel code for $W_c$. The rate of this code is equal to $R = \bar{I}(W_c) - \bar{I}(W_s) = I(X; Y)$. We only present our encoding and decoding rules here; the proofs can be found in [63].

To introduce our encoding and decoding rules, we need to make some definitions. Let $n$ be a positive integer and let $N = 2^n$. Similarly to the source coding case, for $i = 1, \ldots, N$, define the synthesized channels $W^{(i)}_{c,N}$ and $W^{(i)}_{s,N}$. Let the random vector $U^N$ be distributed according to $p_U^N$ (uniform) and let $V^N = U^N G^{-1}$, where $G$ is the polar coding matrix of dimension $N \times N$. Note that since $G$ is a one-to-one mapping, $V^N$ is also uniformly distributed. Let $Y^N$ and $Z^N$ be the outputs of the channel $W_c$ when the input is $U^N$.
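As in the source coding case, the symmetric-capacity claims follow from the additive structure $Z = U + X$ with $U$ uniform and independent of $X$. Below is a small numerical sketch (hypothetical $p_X$ over $\mathbb{Z}_4$, not from the dissertation) that builds $W_s(z \mid u) = p_X(z - u)$ and checks $\bar{I}(W_s) = \log q - H(X)$:

```python
import numpy as np

q = 4
p_X = np.array([0.5, 0.2, 0.2, 0.1])      # hypothetical capacity-achieving distribution

# W_s(z | u) = p_X(z - u): each row is a cyclic shift of p_X
W_s = np.array([[p_X[(z - u) % q] for z in range(q)] for u in range(q)])

# symmetric capacity = I(U; Z) with U uniform over Z_4
p_uz = W_s / q
p_z = p_uz.sum(axis=0)
I_sym = float(np.sum(p_uz * np.log2(p_uz / (p_z[None, :] / q))))

H_X = float(-np.sum(p_X * np.log2(p_X)))
print(I_sym, np.log2(q) - H_X)            # these agree
```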

For $H \leq G$, define
\[
A_H = \bigl\{ i \in [1, N] : Z_H(W^{(i)}_{s,N}) < 2^{-N^\beta},\ \nexists\, K \lneq H \text{ such that } Z_K(W^{(i)}_{s,N}) < 2^{-N^\beta} \bigr\}
\]
\[
B_H = \bigl\{ i \in [1, N] : Z_H(W^{(i)}_{c,N}) < 2^{-N^\beta},\ \nexists\, K \lneq H \text{ such that } Z_K(W^{(i)}_{c,N}) < 2^{-N^\beta} \bigr\}
\]
For $H \leq G$ and $K \leq G$, define $A_{H,K} = A_H \cap B_K$. Note that for $K \leq H \leq G$, $Z_H(W) \leq Z_K(W)$. Also note that $Z_H(W^{(i)}_{c,N}) \geq Z_H(W^{(i)}_{s,N})$. Therefore, if $K \nleq H$, then
\[
A_{H,K} \subseteq \bigl\{ i \in [1, N] : Z_H(W^{(i)}_{s,N}) < 2^{-N^\beta} < Z_H(W^{(i)}_{c,N}) \bigr\}.
\]
Since $Z_H(W^{(i)}_{c,N})$ and $Z_H(W^{(i)}_{s,N})$ both polarize to $\{0, 1\}$, as $N$ increases $|A_{H,K}|/N \to 0$ if $K \nleq H$. Note that the channel polarization results imply that as $N$ increases, $|A_H|/N \to p_H$ and $|B_H|/N \to q_H$.

The encoding and decoding rules are as follows: Let $z^N \in G^N$ be an outcome of the random variable $Z^N$ known to both the encoder and the decoder. Given $K \leq H \leq G$, let $T_H$ be a transversal of $H$ in $G$ and let $T_{K|H}$ be a transversal of $K$ in $H$. Any element $g$ of $G$ can be represented as $g = [g]_K + [g]_{T_{K|H}} + [g]_{T_H}$ for unique $[g]_K \in K$, $[g]_{T_{K|H}} \in T_{K|H}$ and $[g]_{T_H} \in T_H$. Also note that $T_{K|H} + T_H$ is a transversal $T_K$ of $K$ in $G$, so that $g$ can be uniquely represented as $g = [g]_K + [g]_{T_K}$ for some $[g]_{T_K} \in T_K$, and $[g]_{T_K}$ can be uniquely represented as $[g]_{T_K} = [g]_{T_{K|H}} + [g]_{T_H}$. Given a source sequence $x^N \in \mathcal{X}^N$, the encoding rule is as follows: For $i \in [1, N]$, if $i \in A_{H,K}$ for some $K \leq H \leq G$, then $[v_i]_K$ is uniformly distributed over $K$ and is known to both the encoder and the decoder (and is independent from other random variables). The component $[v_i]_{T_{K|H}}$ is the message and is uniformly distributed, but is only known to the encoder.

The component $[v_i]_{T_H}$ is chosen randomly so that for $g \in [v_i]_K + [v_i]_{T_{K|H}} + T_H$,
\[
P(v_i = g) = \frac{p_{V_i \mid X^N Z^N V^{i-1}}(g \mid x^N, z^N, v^{i-1})}{p_{V_i \mid X^N Z^N V^{i-1}}([v_i]_K + [v_i]_{T_{K|H}} + T_H \mid x^N, z^N, v^{i-1})}.
\]
For $i \in [1, N]$, if $i \in A_{H,K}$ for some $K \nleq H$, then $[v_i]_H$ is uniformly distributed over $H$ and is known to both the encoder and the decoder, and the component $[v_i]_{T_H}$ is chosen randomly so that for $g \in [v_i]_H + T_H$,
\[
P(v_i = g) = \frac{p_{V_i \mid X^N Z^N V^{i-1}}(g \mid x^N, z^N, v^{i-1})}{p_{V_i \mid X^N Z^N V^{i-1}}([v_i]_H + T_H \mid x^N, z^N, v^{i-1})}.
\]
For the moment, assume that in this case $v_i$ is known at the receiver. Note that for $i \in [1, N]$, if $i \in A_{H,K}$ for some $K \leq H \leq G$, then $v_i$ can be decomposed as $v_i = [v_i]_K + [v_i]_{T_{K|H}} + [v_i]_{T_H}$, in which $[v_i]_K$ is known to the decoder. The decoding rule is as follows: Given $z^N$, for $i \in A_{H,K}$ with $K \leq H \leq G$, let
\[
\hat{v}_i = \arg\max_{g \in [v_i]_K + [v_i]_{T_{K|H}} + T_H} W^{(i)}_{c,N}(z^N, \hat{v}^{i-1} \mid g).
\]
It is shown in [63] that with these encoding and decoding rules, the probability of error goes to zero. It remains to send the $v_i$, $i \in A_{H,K}$ with $K \nleq H$, to the decoder, which can be done using a regular polar code (which achieves the symmetric capacity of the channel). Note that since the fraction $|A_{H,K}|/N$ vanishes as $N$ increases if $K \nleq H$, the rate loss due to this transmission can be made arbitrarily small.

Figure 6.2: The behavior of $I(W^{b_1 b_2 \cdots b_n})$ for $n = 4$ for Channel 1 when $\epsilon = 0.4$ and $\lambda = 0.2$. The three solid lines represent the three discrete values of $I$ with positive probability.

Figure 6.3: The asymptotic behavior of $I(W^{b_1 b_2 \cdots b_n})$, $N = 2^n = 2^4, 2^8, 2^{12}, 2^{14}$, for Channel 1 when the data is sorted. We observe that for this channel, all three extreme cases appear with positive probability. In general, it is possible to have fewer cases in the asymptotic regime.

Figure 6.4: Channel 2: A channel with a composite input alphabet size. For this channel, the process $I_n$ can be explicitly found for each $n$ and the multilevel polarization can be observed. $E_1$, $E_2$ and $E_3$ are erasures corresponding to cosets of the subgroup $\{0, 3\}$, and $E_4$ and $E_5$ are erasures corresponding to cosets of the subgroup $\{0, 2, 4\}$. The lines connected to outputs $E_1$, $E_2$ and $E_3$ correspond to a conditional probability of $\gamma$, the lines connected to outputs $E_4$ and $E_5$ correspond to a conditional probability of $\epsilon$, the lines connected to the output $E_6$ correspond to a conditional probability of $\lambda$, and the lines connected to outputs $0, 1, 2, 3, 4$ and $5$ correspond to a conditional probability of $1 - \gamma - \epsilon - \lambda$. The parameters $\gamma, \epsilon, \lambda$ take values from $[0, 1]$ such that $\gamma + \epsilon + \lambda \leq 1$.
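The caption of Figure 6.4 fully specifies Channel 2, so its transition matrix and symmetric mutual information are easy to reproduce. Below is a sketch (not from the dissertation; the parameter values are hypothetical) that builds the matrix over $\mathbb{Z}_6$ and checks the symmetric mutual information against the closed form $(1-\gamma-\epsilon-\lambda)\log 6 + \gamma\log 3 + \epsilon\log 2$ implied by the coset structure (the $\lambda$ branch contributes zero):

```python
import numpy as np

g, e, l = 0.3, 0.4, 0.2          # gamma, epsilon, lambda with g + e + l <= 1
q = 6
# outputs: 0..5 exact, 6..8 = E1..E3 (cosets of {0,3}), 9..10 = E4..E5 (cosets of {0,2,4}), 11 = E6
W = np.zeros((q, 12))
for x in range(q):
    W[x, x] = 1 - g - e - l      # noiseless line
    W[x, 6 + x % 3] = g          # erased down to the coset of {0,3}, i.e. x mod 3
    W[x, 9 + x % 2] = e          # erased down to the coset of {0,2,4}, i.e. x mod 2
    W[x, 11] = l                 # complete erasure

def mutual_information(W):
    """I(X;Y) in bits for uniform X and row-stochastic transition matrix W."""
    nx = W.shape[0]
    p_xy = W / nx
    p_y = p_xy.sum(axis=0)
    denom = np.outer(np.full(nx, 1.0 / nx), p_y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / denom[mask])))

closed_form = (1 - g - e - l) * np.log2(6) + g * np.log2(3) + e * np.log2(2)
print(mutual_information(W), closed_form)   # these should agree
```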

Figure 6.5: Polarization of Channel 2 with parameters $\gamma = 0$, $\epsilon = 0.4$, $\lambda = 0.2$. The middle line represents the subgroup $\{0, 2, 4\}$ of $\mathbb{Z}_6$.

Figure 6.6: Polarization of Channel 2 with parameters $\gamma = 0.4$, $\epsilon = 0$, $\lambda = 0.2$. The middle line represents the subgroup $\{0, 3\}$ of $\mathbb{Z}_6$.

Figure 6.7: Channel 3
